├── LICENSE ├── .gitignore ├── README.md ├── exp_main.py ├── cgp_config.py ├── cnn_train.py ├── cnn_model.py └── cgp.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Masanori Suganuma and Shinichi Shirakawa 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 
Designing Convolutional Neural Network Architectures Based on Cartesian Genetic Programming 2 | 3 | This repository contains the code for the following paper: 4 | 5 | Masanori Suganuma, Shinichi Shirakawa, and Tomoharu Nagao, "A Genetic Programming Approach to Designing Convolutional Neural Network Architectures," 6 | Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17), pp. 497-504 (2017) [[DOI]](https://doi.org/10.1145/3071178.3071229) [[arXiv]](https://arxiv.org/abs/1704.00764) 7 | 8 | ## Requirements 9 | We use the [Chainer](https://chainer.org/) framework for neural networks and tested the code in the following environment: 10 | 11 | * Chainer version 1.16.0 12 | * GPU: GTX 1080 or 1070 13 | * Python version 3.5.2 (anaconda3-4.1.1) 14 | 15 | A [PyTorch](https://pytorch.org/) version is available [here](https://github.com/sg-nm/cgp-cnn-PyTorch) 16 | 17 | ## Usage 18 | 19 | ### Run the architecture search 20 | By default, this code reproduces the CIFAR-10 experiment with the same settings as the GECCO 2017 paper. The training data are split into training and validation sets. The validation set is used to assign a fitness value to each generated architecture. We use the maximum validation accuracy in the last 10 epochs as the fitness value. 21 | 22 | To run the search with the ResSet function set described in the paper: 23 | 24 | ```shell 25 | python exp_main.py -f ResSet 26 | ``` 27 | 28 | To run it with the ConvSet function set described in the paper: 29 | 30 | ```shell 31 | python exp_main.py -f ConvSet 32 | ``` 33 | 34 | To use multiple GPUs, specify their number with the `-g` option: 35 | 36 | ```shell 37 | python exp_main.py -f ConvSet -g 2 38 | ``` 39 | 40 | After execution, the files `network_info.pickle` and `log_cgp.txt` are generated. 
The file `network_info.pickle` contains the configuration for Cartesian genetic programming (CGP), and `log_cgp.txt` contains the optimization log together with the genotype lists of the discovered CNN architectures. 41 | 42 | Some parameters (e.g., the number of CGP rows and columns, and the number of epochs) can be changed easily by modifying the arguments in the script `exp_main.py`. 43 | 44 | ### Re-training 45 | 46 | The discovered architecture is re-trained with a different training scheme (500 epochs of momentum SGD) to refine the network parameters. All training data are used for re-training, and the accuracy on the test set is reported. 47 | 48 | ```shell 49 | python exp_main.py -m retrain 50 | ``` 51 | -------------------------------------------------------------------------------- /exp_main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import argparse 5 | import pickle 6 | import pandas as pd 7 | 8 | from cgp import * 9 | from cgp_config import * 10 | from cnn_train import CNN_train 11 | 12 | 13 | if __name__ == '__main__': 14 | 15 | func_set = { 16 | 'ConvSet': CgpInfoConvSet, 17 | 'ResSet': CgpInfoResSet, 18 | } 19 | 20 | parser = argparse.ArgumentParser(description='Evolving CNN structures of GECCO 2017 paper') 21 | parser.add_argument('--func_set', '-f', choices=func_set.keys(), default='ConvSet', help='Function set of CGP (ConvSet or ResSet)') 22 | parser.add_argument('--gpu_num', '-g', type=int, default=1, help='Num. of GPUs') 23 | parser.add_argument('--lam', '-l', type=int, default=2, help='Num. 
of offspring') 24 | parser.add_argument('--net_info_file', default='network_info.pickle', help='Network information file name') 25 | parser.add_argument('--log_file', default='./log_cgp.txt', help='Log file name') 26 | parser.add_argument('--mode', '-m', default='evolution', help='Mode (evolution / retrain)') 27 | args = parser.parse_args() 28 | 29 | # --- Optimization of the CNN architecture --- 30 | if args.mode == 'evolution': 31 | # Create CGP configuration and save network information 32 | network_info = func_set[args.func_set](rows=5, cols=30, level_back=10, min_active_num=10, max_active_num=50) 33 | with open(args.net_info_file, mode='wb') as f: 34 | pickle.dump(network_info, f) 35 | 36 | # Evaluation function for CGP (training CNN and return validation accuracy) 37 | eval_f = CNNEvaluation(gpu_num=args.gpu_num, dataset='cifar10', valid_data_ratio=0.1, verbose=True, 38 | epoch_num=50, batchsize=128) 39 | 40 | # Execute evolution 41 | cgp = CGP(network_info, eval_f, lam=args.lam) 42 | cgp.modified_evolution(max_eval=1000, mutation_rate=0.05, log_file=args.log_file) 43 | 44 | # --- Retraining evolved architecture --- 45 | elif args.mode == 'retrain': 46 | # Load CGP configuration 47 | with open(args.net_info_file, mode='rb') as f: 48 | network_info = pickle.load(f) 49 | 50 | # Load network architecture 51 | cgp = CGP(network_info, None) 52 | data = pd.read_csv(args.log_file, header=None) # Load log file 53 | cgp.load_log(list(data.tail(1).values.flatten().astype(int))) # Read the log at final generation 54 | 55 | # Retraining the network (retrain_mode=True selects momentum SGD, as described in the README) 56 | temp = CNN_train('cifar10', validation=False, verbose=True) 57 | acc = temp(cgp.pop[0].active_net_list(), 0, epoch_num=500, batchsize=128, weight_decay=5e-4, eval_epoch_num=450, 58 | data_aug=True, comp_graph=None, out_model='retrained_net.model', init_model=None, retrain_mode=True) 59 | print(acc) 60 | 61 | else: 62 | print('Undefined mode.') 63 | -------------------------------------------------------------------------------- 
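The README states that the fitness is the maximum validation accuracy over the last 10 epochs, and `exp_main.py` trains each candidate for 50 epochs. The bookkeeping behind that window can be sketched as below. This is a minimal illustration, not code from the repository: `fitness_from_history` and its arguments are hypothetical names, and the per-epoch accuracies are assumed to be already available as a list. It mirrors the `eval_index` arithmetic used in `cnn_train.py`.

```python
import numpy as np


def fitness_from_history(valid_accuracies, eval_epoch_num=10):
    """Return the max validation accuracy over the last `eval_epoch_num` epochs.

    `valid_accuracies[k]` is assumed to hold the accuracy after epoch k + 1.
    """
    epoch_num = len(valid_accuracies)
    # Never look at more epochs than were actually trained.
    eval_epoch_num = min(eval_epoch_num, epoch_num)
    window = np.zeros(eval_epoch_num)
    for epoch in range(1, epoch_num + 1):
        # Negative until the epoch enters the evaluation window.
        eval_index = epoch - (epoch_num - eval_epoch_num) - 1
        if eval_index >= 0:
            window[eval_index] = valid_accuracies[epoch - 1]
    return float(np.max(window))


# Example: an early peak (epoch 20) is ignored, a late one (epoch 46) counts.
history = [0.5] * 50
history[19] = 0.9
history[45] = 0.8
print(fitness_from_history(history))
```

Taking the maximum over a trailing window rather than the final value alone makes the fitness less sensitive to a single noisy epoch at the end of training.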
/cgp_config.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import multiprocessing as mp 5 | import numpy as np 6 | import cnn_train as cnn 7 | 8 | 9 | # wrapper function for multiprocessing 10 | def arg_wrapper_mp(args): 11 | return args[0](*args[1:]) 12 | 13 | 14 | # Evaluation of CNNs 15 | def cnn_eval(net, gpu_id, epoch_num, batchsize, dataset, valid_data_ratio, verbose): 16 | 17 | print('\tgpu_id:', gpu_id, ',', net) 18 | train = cnn.CNN_train(dataset, validation=True, valid_data_ratio=valid_data_ratio, verbose=verbose) 19 | evaluation = train(net, gpu_id, epoch_num=epoch_num, batchsize=batchsize, 20 | comp_graph='CNN%d.dot'%(gpu_id), out_model=None, init_model=None) 21 | print('\tgpu_id:', gpu_id, ', eval:', evaluation) 22 | return evaluation 23 | 24 | 25 | class CNNEvaluation(object): 26 | def __init__(self, gpu_num, epoch_num=50, batchsize=256, dataset='cifar10', valid_data_ratio=0.1, verbose=True): 27 | self.gpu_num = gpu_num 28 | self.epoch_num = epoch_num 29 | self.batchsize = batchsize 30 | self.dataset = dataset 31 | self.valid_data_ratio = valid_data_ratio 32 | self.verbose = verbose 33 | 34 | def __call__(self, net_lists): 35 | evaluations = np.zeros(len(net_lists)) 36 | 37 | for i in np.arange(0, len(net_lists), self.gpu_num): 38 | process_num = np.min((i + self.gpu_num, len(net_lists))) - i 39 | 40 | pool = mp.Pool(process_num) 41 | arg_data = [(cnn_eval, net_lists[i+j], j, self.epoch_num, self.batchsize, self.dataset, 42 | self.valid_data_ratio, self.verbose) for j in range(process_num)] 43 | evaluations[i:i+process_num] = pool.map(arg_wrapper_mp, arg_data) 44 | pool.terminate() 45 | 46 | return evaluations 47 | 48 | 49 | class CgpInfoConvSet(object): 50 | def __init__(self, rows=30, cols=40, level_back=40, min_active_num=8, max_active_num=50): 51 | # network configurations depending on the problem 52 | self.input_num = 1 53 | 54 | self.func_type = 
['ConvBlock32_3', 'ConvBlock32_5', 55 | 'ConvBlock64_3', 'ConvBlock64_5', 56 | 'ConvBlock128_3', 'ConvBlock128_5', 57 | 'pool_max', 'pool_ave', 58 | 'concat', 'sum'] 59 | self.func_in_num = [1, 1, 60 | 1, 1, 61 | 1, 1, 62 | 1, 1, 63 | 2, 2] 64 | 65 | self.out_num = 1 66 | self.out_type = ['full'] 67 | self.out_in_num = [1] 68 | 69 | # CGP network configuration 70 | self.rows = rows 71 | self.cols = cols 72 | self.node_num = rows * cols 73 | self.level_back = level_back 74 | self.min_active_num = min_active_num 75 | self.max_active_num = max_active_num 76 | 77 | self.func_type_num = len(self.func_type) 78 | self.out_type_num = len(self.out_type) 79 | self.max_in_num = np.max([np.max(self.func_in_num), np.max(self.out_in_num)]) 80 | 81 | 82 | class CgpInfoResSet(object): 83 | def __init__(self, rows=30, cols=40, level_back=40, min_active_num=8, max_active_num=50): 84 | # network configurations depending on the problem 85 | self.input_num = 1 86 | 87 | self.func_type = ['ResBlock32_3', 'ResBlock32_5', 88 | 'ResBlock64_3', 'ResBlock64_5', 89 | 'ResBlock128_3', 'ResBlock128_5', 90 | 'pool_max', 'pool_ave', 91 | 'concat', 'sum'] 92 | self.func_in_num = [1, 1, 93 | 1, 1, 94 | 1, 1, 95 | 1, 1, 96 | 2, 2] 97 | 98 | self.out_num = 1 99 | self.out_type = ['full'] 100 | self.out_in_num = [1] 101 | 102 | # CGP network configuration 103 | self.rows = rows 104 | self.cols = cols 105 | self.node_num = rows * cols 106 | self.level_back = level_back 107 | self.min_active_num = min_active_num 108 | self.max_active_num = max_active_num 109 | 110 | self.func_type_num = len(self.func_type) 111 | self.out_type_num = len(self.out_type) 112 | self.max_in_num = np.max([np.max(self.func_in_num), np.max(self.out_in_num)]) -------------------------------------------------------------------------------- /cnn_train.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import chainer 5 | from chainer 
import cuda 6 | from chainer import computational_graph 7 | import six 8 | import time 9 | import numpy as np 10 | from chainer import serializers 11 | 12 | from cnn_model import CGP2CNN 13 | 14 | 15 | # __init__: load dataset 16 | # __call__: training the CNN defined by CGP list 17 | class CNN_train(): 18 | def __init__(self, dataset_name, validation=True, valid_data_ratio=0.1, verbose=True): 19 | # dataset_name: name of data set ('cifar10' or 'cifar100' or 'mnist') 20 | # validation: [True] model validation mode 21 | # (split training data set according to valid_data_ratio for evaluation of CGP individual) 22 | # [False] model test mode for final evaluation of the evolved model 23 | # (training data : all training data, test data : all test data) 24 | # valid_data_ratio: ratio of the validation data 25 | # (e.g., if the number of all training data=50000 and valid_data_ratio=0.2, 26 | # the number of training data=40000, validation=10000) 27 | # verbose: flag of display 28 | self.verbose = verbose 29 | 30 | # load dataset 31 | if dataset_name == 'cifar10' or dataset_name == 'cifar100' or dataset_name == 'mnist': 32 | if dataset_name == 'cifar10': 33 | self.n_class = 10 34 | self.channel = 3 35 | self.pad_size = 4 36 | train, test = chainer.datasets.get_cifar10(withlabel=True, ndim=3, scale=1.0) 37 | elif dataset_name == 'cifar100': 38 | self.n_class = 100 39 | self.channel = 3 40 | self.pad_size = 4 41 | train, test = chainer.datasets.get_cifar100(withlabel=True, ndim=3, scale=1.0) 42 | else: # mnist 43 | self.n_class = 10 44 | self.channel = 1 45 | self.pad_size = 4 46 | train, test = chainer.datasets.get_mnist(withlabel=True, ndim=3, scale=1.0) 47 | 48 | # model validation mode 49 | if validation: 50 | # split into train and validation data 51 | np.random.seed(2016) # always same data splitting 52 | order = np.random.permutation(len(train)) 53 | np.random.seed() 54 | if self.verbose: 55 | print('\tdata split order: ', order) 56 | train_size = int(len(train) * (1. 
- valid_data_ratio)) 57 | # train data 58 | self.x_train, self.y_train = train[order[:train_size]][0], train[order[:train_size]][1] 59 | # test data (for validation) 60 | self.x_test, self.y_test = train[order[train_size:]][0], train[order[train_size:]][1] 61 | # model test mode 62 | else: 63 | # train data 64 | self.x_train, self.y_train = train[range(len(train))][0], train[range(len(train))][1] 65 | # test data 66 | self.x_test, self.y_test = test[range(len(test))][0], test[range(len(test))][1] 67 | else: 68 | print('\tInvalid input dataset name at CNN_train()') 69 | exit(1) 70 | 71 | # preprocessing (subtraction of mean pixel values) 72 | x_mean = 0 73 | for x in self.x_train: 74 | x_mean += x 75 | x_mean /= len(self.x_train) 76 | self.x_train -= x_mean 77 | self.x_test -= x_mean 78 | 79 | # data size 80 | self.train_data_num = len(self.x_train) 81 | self.test_data_num = len(self.x_test) 82 | if self.verbose: 83 | print('\ttrain data shape:', self.x_train.shape) 84 | print('\ttest data shape :', self.x_test.shape) 85 | 86 | def __call__(self, cgp, gpuID, epoch_num=200, batchsize=256, weight_decay=1e-4, eval_epoch_num=10, 87 | data_aug=True, comp_graph='comp_graph.dot', out_model='mymodel.model', init_model=None, 88 | retrain_mode=False): 89 | if self.verbose: 90 | print('\tGPUID :', gpuID) 91 | print('\tepoch_num:', epoch_num) 92 | print('\tbatchsize:', batchsize) 93 | 94 | chainer.cuda.get_device(gpuID).use() # Make a specified GPU current 95 | model = CGP2CNN(cgp, self.n_class) 96 | if init_model is not None: 97 | if self.verbose: 98 | print('\tLoad model from', init_model) 99 | serializers.load_npz(init_model, model) 100 | model.to_gpu(gpuID) 101 | optimizer = chainer.optimizers.Adam() if not retrain_mode else chainer.optimizers.MomentumSGD(lr=0.01) 102 | optimizer.setup(model) 103 | optimizer.add_hook(chainer.optimizer.WeightDecay(weight_decay)) 104 | 105 | eval_epoch_num = np.min((eval_epoch_num, epoch_num)) 106 | test_accuracies = np.zeros(eval_epoch_num) 
107 | for epoch in six.moves.range(1, epoch_num+1): 108 | if self.verbose: 109 | print('\tepoch', epoch) 110 | perm = np.random.permutation(self.train_data_num) 111 | train_accuracy = train_loss = 0 112 | start = time.time() 113 | for i in six.moves.range(0, self.train_data_num, batchsize): 114 | xx_train = self.data_augmentation(self.x_train[perm[i:i + batchsize]]) if data_aug else self.x_train[perm[i:i + batchsize]] 115 | x = chainer.Variable(cuda.to_gpu(xx_train)) 116 | t = chainer.Variable(cuda.to_gpu(self.y_train[perm[i:i + batchsize]])) 117 | try: 118 | optimizer.update(model, x, t) 119 | except: 120 | import traceback 121 | traceback.print_exc() 122 | return 0. 123 | 124 | if comp_graph is not None and epoch == 1 and i == 0: 125 | with open(comp_graph, 'w') as o: 126 | g = computational_graph.build_computational_graph((model.loss, )) 127 | o.write(g.dump()) 128 | del g 129 | if self.verbose: 130 | print('\tCNN graph generated.') 131 | 132 | train_loss += float(model.loss.data) * len(t.data) 133 | train_accuracy += float(model.accuracy.data) * len(t.data) 134 | elapsed_time = time.time() - start 135 | throughput = self.train_data_num / elapsed_time 136 | if self.verbose: 137 | print('\ttrain mean loss={}, train accuracy={}, time={}, throughput={} images/sec, paramNum={}'.format(train_loss / self.train_data_num, train_accuracy / self.train_data_num, elapsed_time, throughput, model.param_num)) 138 | 139 | # apply the model to test data 140 | # use the maximum validation accuracy in the last 10 epoch as the fitness value 141 | eval_index = epoch - (epoch_num - eval_epoch_num) -1 142 | if self.verbose or eval_index >= 0: 143 | test_accuracy, test_loss = self.__test(model, batchsize) 144 | if self.verbose: 145 | print('\tvalid mean loss={}, valid accuracy={}'.format(test_loss / self.test_data_num, test_accuracy / self.test_data_num)) 146 | if eval_index >= 0: 147 | test_accuracies[eval_index] = test_accuracy / self.test_data_num 148 | 149 | # decay the learning 
rate 150 | if not retrain_mode and epoch % 30 == 0: 151 | optimizer.alpha *= 0.1 152 | elif retrain_mode: 153 | if epoch == 5: 154 | optimizer.lr = 0.1 155 | if epoch == 250: 156 | optimizer.lr *= 0.1 157 | if epoch == 375: 158 | optimizer.lr *= 0.1 159 | 160 | # test_accuracy, test_loss = self.__test(model, batchsize) 161 | if out_model is not None: 162 | model.to_cpu() 163 | serializers.save_npz(out_model, model) 164 | 165 | return np.max(test_accuracies) 166 | 167 | def test(self, cgp, model_file, comp_graph='comp_graph.dot', batchsize=256): 168 | chainer.cuda.get_device(0).use() # Make a specified GPU current 169 | model = CGP2CNN(cgp, self.n_class) 170 | print('\tLoad model from', model_file) 171 | serializers.load_npz(model_file, model) 172 | model.to_gpu(0) 173 | test_accuracy, test_loss = self.__test(model, batchsize) 174 | print('\tparamNum={}'.format(model.param_num)) 175 | print('\ttest mean loss={}, test accuracy={}'.format(test_loss / self.test_data_num, test_accuracy / self.test_data_num)) 176 | 177 | if comp_graph is not None: 178 | with open(comp_graph, 'w') as o: 179 | g = computational_graph.build_computational_graph((model.loss,)) 180 | o.write(g.dump()) 181 | del g 182 | print('\tCNN graph generated ({}).'.format(comp_graph)) 183 | 184 | return test_accuracy, test_loss 185 | 186 | def __test(self, model, batchsize): 187 | model.train = False 188 | test_accuracy = test_loss = 0 189 | for i in six.moves.range(0, self.test_data_num, batchsize): 190 | x = chainer.Variable(cuda.to_gpu(self.x_test[i:i + batchsize]), volatile=True) 191 | t = chainer.Variable(cuda.to_gpu(self.y_test[i:i + batchsize]), volatile=True) 192 | loss = model(x, t) 193 | test_loss += float(loss.data) * len(t.data) 194 | test_accuracy += float(model.accuracy.data) * len(t.data) 195 | model.train = True 196 | return test_accuracy, test_loss 197 | 198 | def data_augmentation(self, x_train): 199 | _, c, h, w = x_train.shape 200 | pad_h = h + 2 * self.pad_size 201 | pad_w = w + 2 * 
self.pad_size 202 | aug_data = np.zeros_like(x_train) 203 | for i, x in enumerate(x_train): 204 | pad_img = np.zeros((c, pad_h, pad_w)) 205 | pad_img[:, self.pad_size:h+self.pad_size, self.pad_size:w+self.pad_size] = x 206 | 207 | # Randomly crop and horizontal flip the image 208 | top = np.random.randint(0, pad_h - h + 1) 209 | left = np.random.randint(0, pad_w - w + 1) 210 | bottom = top + h 211 | right = left + w 212 | if np.random.randint(0, 2): 213 | pad_img = pad_img[:, :, ::-1] 214 | 215 | aug_data[i] = pad_img[:, top:bottom, left:right] 216 | 217 | return aug_data 218 | -------------------------------------------------------------------------------- /cnn_model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import chainer 5 | import chainer.functions as F 6 | import chainer.links as L 7 | from chainer.functions.loss import softmax_cross_entropy 8 | from chainer.functions.evaluation import accuracy 9 | from chainer import reporter 10 | 11 | 12 | # CONV -> Batch -> ReLU 13 | class ConvBlock(chainer.Chain): 14 | def __init__(self, ksize, n_out, initializer): 15 | super(ConvBlock, self).__init__() 16 | pad_size = ksize // 2 17 | links = [('conv1', L.Convolution2D(None, n_out, ksize, pad=pad_size, initialW=initializer))] 18 | links += [('bn1', L.BatchNormalization(n_out))] 19 | for link in links: 20 | self.add_link(*link) 21 | self.forward = links 22 | 23 | def __call__(self, x, train): 24 | param_num = 0 25 | for name, f in self.forward: 26 | if 'conv1' in name: 27 | x = getattr(self, name)(x) 28 | param_num += (f.W.shape[0]*f.W.shape[2]*f.W.shape[3]*f.W.shape[1]+f.W.shape[0]) 29 | elif 'bn1' in name: 30 | x = getattr(self, name)(x, not train) 31 | param_num += x.data.shape[1]*2 32 | return (F.relu(x), param_num) 33 | 34 | 35 | # [(CONV -> Batch -> ReLU -> CONV -> Batch) + (x)] 36 | class ResBlock(chainer.Chain): 37 | def __init__(self, ksize, n_out, initializer): 
38 | super(ResBlock, self).__init__() 39 | pad_size = ksize // 2 40 | links = [('conv1', L.Convolution2D(None, n_out, ksize, pad=pad_size, initialW=initializer))] 41 | links += [('bn1', L.BatchNormalization(n_out))] 42 | links += [('_act1', F.ReLU())] 43 | links += [('conv2', L.Convolution2D(n_out, n_out, ksize, pad=pad_size, initialW=initializer))] 44 | links += [('bn2', L.BatchNormalization(n_out))] 45 | for link in links: 46 | if not link[0].startswith('_'): 47 | self.add_link(*link) 48 | self.forward = links 49 | 50 | def __call__(self, x, h, train): 51 | xp = chainer.cuda.get_array_module(x) 52 | param_num = 0 53 | for name, f in self.forward: 54 | if 'conv' in name: 55 | x = getattr(self, name)(x) 56 | param_num += (f.W.shape[0]*f.W.shape[2]*f.W.shape[3]*f.W.shape[1]+f.W.shape[0]) 57 | elif 'bn' in name: 58 | x = getattr(self, name)(x, not train) 59 | param_num += x.data.shape[1]*2 60 | elif 'act' in name: 61 | x = f(x) 62 | else: 63 | print('not defined function at ResBlock __call__') 64 | exit(1) 65 | in_data = [x, h] 66 | # check of the image size 67 | small_in_id, large_in_id = (0, 1) if in_data[0].shape[2] < in_data[1].shape[2] else (1, 0) 68 | pool_num = xp.floor(xp.log2(in_data[large_in_id].shape[2] / in_data[small_in_id].shape[2])) 69 | for _ in xp.arange(pool_num): 70 | in_data[large_in_id] = F.max_pooling_2d(in_data[large_in_id], 2, 2, 0, False) # 2x2 pooling; ResBlock defines no pool_size attribute 71 | # check of the channel size 72 | small_ch_id, large_ch_id = (0, 1) if in_data[0].shape[1] < in_data[1].shape[1] else (1, 0) 73 | pad_num = int(in_data[large_ch_id].shape[1] - in_data[small_ch_id].shape[1]) 74 | tmp = in_data[large_ch_id][:, :pad_num, :, :] 75 | in_data[small_ch_id] = F.concat((in_data[small_ch_id], tmp * 0), axis=1) 76 | return (F.relu(in_data[0]+in_data[1]), param_num) 77 | 78 | 79 | # Construct a CNN model using CGP (list) 80 | class CGP2CNN(chainer.Chain): 81 | def __init__(self, cgp, n_class, lossfun=softmax_cross_entropy.softmax_cross_entropy, 
accfun=accuracy.accuracy): 82 | super(CGP2CNN, self).__init__() 83 | self.cgp = cgp 84 | self.pool_size = 2 85 | initializer = chainer.initializers.HeNormal() 86 | links = [] 87 | i = 1 88 | for name, in1, in2 in self.cgp: 89 | if name == 'pool_max': 90 | links += [('_'+name+'_'+str(i), F.MaxPooling2D(self.pool_size, self.pool_size, 0, False))] 91 | elif name == 'pool_ave': 92 | links += [('_'+name+'_'+str(i), F.AveragePooling2D(self.pool_size, self.pool_size, 0, False))] 93 | elif name == 'concat': 94 | links += [('_'+name+'_'+str(i), F.Concat())] 95 | elif name == 'sum': 96 | links += [('_'+name+'_'+str(i), F.Concat())] # the F.Concat() is dummy 97 | elif name == 'ConvBlock32_3': 98 | links += [(name+'_'+str(i), ConvBlock(3, 32, initializer))] 99 | elif name == 'ConvBlock32_5': 100 | links += [(name+'_'+str(i), ConvBlock(5, 32, initializer))] 101 | elif name == 'ConvBlock32_7': 102 | links += [(name+'_'+str(i), ConvBlock(7, 32, initializer))] 103 | elif name == 'ConvBlock64_3': 104 | links += [(name+'_'+str(i), ConvBlock(3, 64, initializer))] 105 | elif name == 'ConvBlock64_5': 106 | links += [(name+'_'+str(i), ConvBlock(5, 64, initializer))] 107 | elif name == 'ConvBlock64_7': 108 | links += [(name+'_'+str(i), ConvBlock(7, 64, initializer))] 109 | elif name == 'ConvBlock128_3': 110 | links += [(name+'_'+str(i), ConvBlock(3, 128, initializer))] 111 | elif name == 'ConvBlock128_5': 112 | links += [(name+'_'+str(i), ConvBlock(5, 128, initializer))] 113 | elif name == 'ConvBlock128_7': 114 | links += [(name+'_'+str(i), ConvBlock(7, 128, initializer))] 115 | elif name == 'ResBlock32_3': 116 | links += [(name+'_'+str(i), ResBlock(3, 32, initializer))] 117 | elif name == 'ResBlock32_5': 118 | links += [(name+'_'+str(i), ResBlock(5, 32, initializer))] 119 | elif name == 'ResBlock32_7': 120 | links += [(name+'_'+str(i), ResBlock(7, 32, initializer))] 121 | elif name == 'ResBlock64_3': 122 | links += [(name+'_'+str(i), ResBlock(3, 64, initializer))] 123 | elif name == 
'ResBlock64_5': 124 | links += [(name+'_'+str(i), ResBlock(5, 64, initializer))] 125 | elif name == 'ResBlock64_7': 126 | links += [(name+'_'+str(i), ResBlock(7, 64, initializer))] 127 | elif name == 'ResBlock128_3': 128 | links += [(name+'_'+str(i), ResBlock(3, 128, initializer))] 129 | elif name == 'ResBlock128_5': 130 | links += [(name+'_'+str(i), ResBlock(5, 128, initializer))] 131 | elif name == 'ResBlock128_7': 132 | links += [(name+'_'+str(i), ResBlock(7, 128, initializer))] 133 | elif name == 'full': 134 | links += [(name+'_'+str(i), L.Linear(None, n_class, initialW=initializer))] 135 | i += 1 136 | for link in links: 137 | if not link[0].startswith('_'): 138 | self.add_link(*link) 139 | self.forward = links 140 | self.train = True 141 | self.lossfun = lossfun 142 | self.accfun = accfun 143 | self.loss = None 144 | self.accuracy = None 145 | self.outputs = [None for _ in range(len(self.cgp))] 146 | self.param_num = 0 147 | 148 | def __call__(self, x, t): 149 | xp = chainer.cuda.get_array_module(x) 150 | outputs = self.outputs 151 | outputs[0] = x # input image 152 | nodeID = 1 153 | param_num = 0 154 | for name, f in self.forward: 155 | if 'ConvBlock' in name: 156 | outputs[nodeID], tmp_num = getattr(self, name)(outputs[self.cgp[nodeID][1]], self.train) 157 | nodeID += 1 158 | param_num += tmp_num 159 | elif 'ResBlock' in name: 160 | outputs[nodeID], tmp_num = getattr(self, name)(outputs[self.cgp[nodeID][1]], outputs[self.cgp[nodeID][1]], self.train) 161 | nodeID += 1 162 | param_num += tmp_num 163 | elif 'pool' in name: 164 | # check of the image size 165 | if outputs[self.cgp[nodeID][1]].shape[2] > 1: 166 | outputs[nodeID] = f(outputs[self.cgp[nodeID][1]]) 167 | nodeID += 1 168 | else: 169 | outputs[nodeID] = outputs[self.cgp[nodeID][1]] 170 | nodeID += 1 171 | elif 'concat' in name: 172 | in_data = [outputs[self.cgp[nodeID][1]], outputs[self.cgp[nodeID][2]]] 173 | # check of the image size 174 | small_in_id, large_in_id = (0, 1) if in_data[0].shape[2] < 
in_data[1].shape[2] else (1, 0) 175 | pool_num = xp.floor(xp.log2(in_data[large_in_id].shape[2] / in_data[small_in_id].shape[2])) 176 | for _ in xp.arange(pool_num): 177 | in_data[large_in_id] = F.max_pooling_2d(in_data[large_in_id], self.pool_size, self.pool_size, 0, False) 178 | # concat 179 | outputs[nodeID] = f(in_data[0], in_data[1]) 180 | nodeID += 1 181 | elif 'sum' in name: 182 | in_data = [outputs[self.cgp[nodeID][1]], outputs[self.cgp[nodeID][2]]] 183 | # check of the image size 184 | small_in_id, large_in_id = (0, 1) if in_data[0].shape[2] < in_data[1].shape[2] else (1, 0) 185 | pool_num = xp.floor(xp.log2(in_data[large_in_id].shape[2] / in_data[small_in_id].shape[2])) 186 | for _ in xp.arange(pool_num): 187 | in_data[large_in_id] = F.max_pooling_2d(in_data[large_in_id], self.pool_size, self.pool_size, 0, False) 188 | # check of the channel size 189 | small_ch_id, large_ch_id = (0, 1) if in_data[0].shape[1] < in_data[1].shape[1] else (1, 0) 190 | pad_num = int(in_data[large_ch_id].shape[1] - in_data[small_ch_id].shape[1]) 191 | tmp = in_data[large_ch_id][:, :pad_num, :, :] 192 | in_data[small_ch_id] = F.concat((in_data[small_ch_id], tmp * 0), axis=1) 193 | # summation 194 | outputs[nodeID] = in_data[0] + in_data[1] 195 | nodeID += 1 196 | elif 'full' in name: 197 | outputs[nodeID] = getattr(self, name)(outputs[self.cgp[nodeID][1]]) 198 | nodeID += 1 199 | param_num += f.W.data.shape[0] * f.W.data.shape[1] + f.b.data.shape[0] 200 | else: 201 | print('not defined function at CGP2CNN __call__') 202 | exit(1) 203 | self.param_num = param_num 204 | 205 | if t is not None: 206 | self.loss = None 207 | self.accuracy = None 208 | self.loss = self.lossfun(outputs[-1], t) 209 | reporter.report({'loss': self.loss}, self) 210 | self.accuracy = self.accfun(outputs[-1], t) 211 | reporter.report({'accuracy': self.accuracy}, self) 212 | return self.loss 213 | else: 214 | return outputs[-1] 215 | 
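The `sum` node in `CGP2CNN.__call__` above first brings its two inputs to a common shape: the spatially larger feature map is max-pooled by 2 until the sizes match, and the input with fewer channels is padded with zero channels. The sketch below restates that alignment in plain NumPy so that it can be run without Chainer or a GPU. It is an illustration under stated assumptions, not the repository's code: `max_pool_2x2` and `align_and_sum` are hypothetical helper names, and the spatial sizes are assumed to be even and related by a power of two.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (N, C, H, W) array (H, W assumed even)."""
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))


def align_and_sum(a, b):
    """Element-wise sum of two feature maps after aligning size and channels."""
    maps = [a, b]
    # Spatial alignment: repeatedly pool the larger map.
    small, large = (0, 1) if maps[0].shape[2] < maps[1].shape[2] else (1, 0)
    pool_num = int(np.floor(np.log2(maps[large].shape[2] / maps[small].shape[2])))
    for _ in range(pool_num):
        maps[large] = max_pool_2x2(maps[large])
    # Channel alignment: append zero channels to the narrower map.
    small_ch, large_ch = (0, 1) if maps[0].shape[1] < maps[1].shape[1] else (1, 0)
    pad_num = maps[large_ch].shape[1] - maps[small_ch].shape[1]
    if pad_num > 0:
        n, _, h, w = maps[small_ch].shape
        zeros = np.zeros((n, pad_num, h, w), dtype=maps[small_ch].dtype)
        maps[small_ch] = np.concatenate((maps[small_ch], zeros), axis=1)
    return maps[0] + maps[1]


a = np.ones((1, 32, 8, 8))
b = np.full((1, 64, 32, 32), 2.0)
print(align_and_sum(a, b).shape)
```

Zero-padding the channels leaves the extra channels of the wider input unchanged in the sum, which is why the first 32 output channels here combine both inputs while the remaining 32 carry only `b`.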
--------------------------------------------------------------------------------
/cgp.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv
import time
import numpy as np


class Individual(object):

    def __init__(self, net_info):
        self.net_info = net_info
        self.gene = np.zeros((self.net_info.node_num + self.net_info.out_num, self.net_info.max_in_num + 1)).astype(int)
        self.is_active = np.empty(self.net_info.node_num + self.net_info.out_num).astype(bool)
        self.eval = None
        self.init_gene()

    def init_gene(self):
        # intermediate and output nodes
        for n in range(self.net_info.node_num + self.net_info.out_num):
            # type gene
            type_num = self.net_info.func_type_num if n < self.net_info.node_num else self.net_info.out_type_num
            self.gene[n][0] = np.random.randint(type_num)
            # connection gene
            col = np.min((int(n / self.net_info.rows), self.net_info.cols))
            max_connect_id = col * self.net_info.rows + self.net_info.input_num
            min_connect_id = (col - self.net_info.level_back) * self.net_info.rows + self.net_info.input_num \
                if col - self.net_info.level_back >= 0 else 0
            for i in range(self.net_info.max_in_num):
                self.gene[n][i + 1] = min_connect_id + np.random.randint(max_connect_id - min_connect_id)

        self.check_active()

    def __check_course_to_out(self, n):
        if not self.is_active[n]:
            self.is_active[n] = True
            t = self.gene[n][0]
            if n >= self.net_info.node_num:  # output node
                in_num = self.net_info.out_in_num[t]
            else:  # intermediate node
                in_num = self.net_info.func_in_num[t]

            for i in range(in_num):
                if self.gene[n][i+1] >= self.net_info.input_num:
                    self.__check_course_to_out(self.gene[n][i+1] - self.net_info.input_num)

    def check_active(self):
        # clear
        self.is_active[:] = False
        # start from output nodes
        for n in range(self.net_info.out_num):
            self.__check_course_to_out(self.net_info.node_num + n)

    def __mutate(self, current, min_int, max_int):
        # resample until the gene differs; callers guarantee max_int - min_int > 1
        mutated_gene = current
        while current == mutated_gene:
            mutated_gene = min_int + np.random.randint(max_int - min_int)
        return mutated_gene

    def mutation(self, mutation_rate=0.01):
        # returns True if at least one gene used by an active node was changed
        active_check = False

        for n in range(self.net_info.node_num + self.net_info.out_num):
            t = self.gene[n][0]
            # mutation for type gene
            type_num = self.net_info.func_type_num if n < self.net_info.node_num else self.net_info.out_type_num
            if np.random.rand() < mutation_rate and type_num > 1:
                self.gene[n][0] = self.__mutate(self.gene[n][0], 0, type_num)
                if self.is_active[n]:
                    active_check = True
            # mutation for connection gene
            col = np.min((int(n / self.net_info.rows), self.net_info.cols))
            max_connect_id = col * self.net_info.rows + self.net_info.input_num
            min_connect_id = (col - self.net_info.level_back) * self.net_info.rows + self.net_info.input_num \
                if col - self.net_info.level_back >= 0 else 0
            in_num = self.net_info.func_in_num[t] if n < self.net_info.node_num else self.net_info.out_in_num[t]
            for i in range(self.net_info.max_in_num):
                if np.random.rand() < mutation_rate and max_connect_id - min_connect_id > 1:
                    self.gene[n][i+1] = self.__mutate(self.gene[n][i+1], min_connect_id, max_connect_id)
                    if self.is_active[n] and i < in_num:
                        active_check = True

        self.check_active()
        return active_check

    def neutral_mutation(self, mutation_rate=0.01):
        # mutates only genes that do not affect the active network
        for n in range(self.net_info.node_num + self.net_info.out_num):
            t = self.gene[n][0]
            # mutation for type gene
            type_num = self.net_info.func_type_num if n < self.net_info.node_num else self.net_info.out_type_num
            if not self.is_active[n] and np.random.rand() < mutation_rate and type_num > 1:
                self.gene[n][0] = self.__mutate(self.gene[n][0], 0, type_num)
            # mutation for connection gene
            col = np.min((int(n / self.net_info.rows), self.net_info.cols))
            max_connect_id = col * self.net_info.rows + self.net_info.input_num
            min_connect_id = (col - self.net_info.level_back) * self.net_info.rows + self.net_info.input_num \
                if col - self.net_info.level_back >= 0 else 0
            in_num = self.net_info.func_in_num[t] if n < self.net_info.node_num else self.net_info.out_in_num[t]
            for i in range(self.net_info.max_in_num):
                if (not self.is_active[n] or i >= in_num) and np.random.rand() < mutation_rate \
                        and max_connect_id - min_connect_id > 1:
                    self.gene[n][i+1] = self.__mutate(self.gene[n][i+1], min_connect_id, max_connect_id)

        self.check_active()
        return False

    def count_active_node(self):
        return self.is_active.sum()

    def copy(self, source):
        self.net_info = source.net_info
        self.gene = source.gene.copy()
        self.is_active = source.is_active.copy()
        self.eval = source.eval

    def active_net_list(self):
        net_list = [["input", 0, 0]]
        active_cnt = np.arange(self.net_info.input_num + self.net_info.node_num + self.net_info.out_num)
        active_cnt[self.net_info.input_num:] = np.cumsum(self.is_active)

        for n, is_a in enumerate(self.is_active):
            if is_a:
                t = self.gene[n][0]
                if n < self.net_info.node_num:  # intermediate node
                    type_str = self.net_info.func_type[t]
                else:  # output node
                    type_str = self.net_info.out_type[t]

                connections = [active_cnt[self.gene[n][i+1]] for i in range(self.net_info.max_in_num)]
                net_list.append([type_str] + connections)
        return net_list


# CGP with (1 + \lambda)-ES
class CGP(object):

    def __init__(self, net_info, eval_func, lam=4):
        self.lam = lam
        self.pop = [Individual(net_info) for _ in range(1 + self.lam)]
        self.eval_func = eval_func

        self.num_gen = 0
        self.num_eval = 0

    def _evaluation(self, pop, eval_flag):
        # create network list
        net_lists = []
        active_index = np.where(eval_flag)[0]
        for i in active_index:
            net_lists.append(pop[i].active_net_list())

        # evaluation
        fp = self.eval_func(net_lists)
        for i, j in enumerate(active_index):
            pop[j].eval = fp[i]
        evaluations = np.zeros(len(pop))
        for i in range(len(pop)):
            evaluations[i] = pop[i].eval

        self.num_eval += len(net_lists)
        return evaluations

    def _log_data(self, net_info_type='active_only'):
        # log columns: generation, evaluations, timer value, parent fitness, active-node count
        log_list = [self.num_gen, self.num_eval, time.perf_counter(), self.pop[0].eval, self.pop[0].count_active_node()]
        if net_info_type == 'active_only':
            log_list.append(self.pop[0].active_net_list())
        elif net_info_type == 'full':
            log_list += self.pop[0].gene.flatten().tolist()
        else:
            pass
        return log_list

    def load_log(self, log_data):
        self.num_gen = log_data[0]
        self.num_eval = log_data[1]
        net_info = self.pop[0].net_info
        self.pop[0].eval = log_data[3]
        self.pop[0].gene = np.array(log_data[5:]).reshape((net_info.node_num + net_info.out_num, net_info.max_in_num + 1))
        self.pop[0].check_active()

    # Usual evolution procedure of CGP (not used for the GECCO 2017 paper)
    def evolution(self, max_eval=100, mutation_rate=0.01, log_file='./log.txt'):
        with open(log_file, 'w') as fw:
            writer = csv.writer(fw, lineterminator='\n')

            eval_flag = np.empty(self.lam)

            self._evaluation([self.pop[0]], np.array([True]))
            print(self._log_data(net_info_type='active_only'))

            while self.num_eval < max_eval:
                self.num_gen += 1

                # reproduction
                for i in range(self.lam):
                    self.pop[i+1].copy(self.pop[0])  # copy a parent
                    eval_flag[i] = self.pop[i+1].mutation(mutation_rate)  # mutation

                # evaluation and selection
                evaluations = self._evaluation(self.pop[1:], eval_flag=eval_flag)
                best_arg = evaluations.argmax()
                if evaluations[best_arg] >= self.pop[0].eval:
                    self.pop[0].copy(self.pop[best_arg+1])

                # display and save log
                if eval_flag.sum() > 0:
                    print(self._log_data(net_info_type='active_only'))
                    writer.writerow(self._log_data(net_info_type='full'))

    # Modified CGP (used for the GECCO 2017 paper):
    #   At each iteration:
    #     - Generate lambda individuals in which at least one active node changes (i.e., forced mutation)
    #     - If the best individual is not updated, apply a neutral mutation to it
    #       (one that leaves the active nodes unchanged).
    def modified_evolution(self, max_eval=100, mutation_rate=0.01, log_file='./log.txt'):
        with open(log_file, 'w') as fw:
            writer = csv.writer(fw, lineterminator='\n')

            eval_flag = np.empty(self.lam)

            active_num = self.pop[0].count_active_node()
            while active_num < self.pop[0].net_info.min_active_num or active_num > self.pop[0].net_info.max_active_num:
                self.pop[0].mutation(1.0)
                active_num = self.pop[0].count_active_node()
            self._evaluation([self.pop[0]], np.array([True]))
            print(self._log_data(net_info_type='active_only'))

            while self.num_eval < max_eval:
                self.num_gen += 1

                # reproduction
                for i in range(self.lam):
                    eval_flag[i] = False
                    self.pop[i + 1].copy(self.pop[0])  # copy a parent
                    active_num = self.pop[i + 1].count_active_node()

                    # forced mutation
                    while not eval_flag[i] or active_num < self.pop[i + 1].net_info.min_active_num \
                            or active_num > self.pop[i + 1].net_info.max_active_num:
                        self.pop[i + 1].copy(self.pop[0])  # copy a parent
                        eval_flag[i] = self.pop[i + 1].mutation(mutation_rate)  # mutation
                        active_num = self.pop[i + 1].count_active_node()

                # evaluation and selection
                evaluations = self._evaluation(self.pop[1:], eval_flag=eval_flag)
                best_arg = evaluations.argmax()
                if evaluations[best_arg] > self.pop[0].eval:
                    self.pop[0].copy(self.pop[best_arg + 1])
                else:
                    self.pop[0].neutral_mutation(mutation_rate)  # neutral mutation

                # display and save log
                print(self._log_data(net_info_type='active_only'))
                writer.writerow(self._log_data(net_info_type='full'))

--------------------------------------------------------------------------------
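The control flow of `modified_evolution` (forced mutation for every child, neutral mutation of the parent when no child is strictly better) can be tried on a toy problem without Chainer or a `net_info` object. The sketch below is an illustration under simplifying assumptions, not the repository's code: `modified_es` is a hypothetical function, the genome is a plain integer vector, and the set of active genes is a fixed boolean mask, whereas in `Individual` it is recomputed from the connection genes after every mutation. The mask must contain at least one active gene and `n_values` must exceed 1, otherwise the forced-mutation loop cannot terminate.

```python
import numpy as np


def modified_es(fitness, active_mask, genome_len, n_values,
                lam=2, max_eval=50, rate=0.1, seed=0):
    """Toy (1 + lambda)-ES with forced and neutral mutation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    parent = rng.integers(n_values, size=genome_len)
    parent_fit = fitness(parent)
    num_eval = 1
    history = [parent_fit]
    while num_eval < max_eval:
        children = []
        for _ in range(lam):
            # Forced mutation: retry until at least one active gene differs.
            while True:
                child = parent.copy()
                flip = rng.random(genome_len) < rate
                child[flip] = rng.integers(n_values, size=int(flip.sum()))
                if np.any(child[active_mask] != parent[active_mask]):
                    break
            children.append(child)
        fits = [fitness(c) for c in children]
        num_eval += lam
        best = int(np.argmax(fits))
        if fits[best] > parent_fit:
            # Strict improvement: the best child replaces the parent.
            parent, parent_fit = children[best], fits[best]
        else:
            # Neutral mutation: only inactive genes of the parent may change.
            flip = (rng.random(genome_len) < rate) & ~active_mask
            parent[flip] = rng.integers(n_values, size=int(flip.sum()))
        history.append(parent_fit)
    return parent, parent_fit, history


# Usage: fitness counts active genes that match a target pattern.
target = np.array([1, 2, 3, 0, 0, 0])
mask = np.array([True, True, True, False, False, False])


def count_matches(genome):
    return int(np.sum(genome[mask] == target[mask]))


best, best_fit, history = modified_es(count_matches, mask, genome_len=6, n_values=4)
print(best_fit, history[0])
```

Since the fitness here depends only on the active genes, the neutral step never changes the parent's score, so the recorded parent fitness cannot decrease from one generation to the next.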