├── .gitignore ├── LICENSE.txt ├── README.md ├── requirements.txt └── src ├── __init__.py ├── npi ├── __init__.py ├── add │ ├── __init__.py │ ├── config.py │ ├── create_training_data.py │ ├── lib.py │ ├── model.py │ ├── test_model.py │ └── training_model.py ├── core.py └── terminal_core.py ├── run_create_addition_data.sh ├── run_test_addition_model.sh └── run_train_addition_model.sh /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | .python-version 3 | *.pyc 4 | *.log 5 | *.png 6 | *.model 7 | *.pkl 8 | .DS_Store 9 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016 Ken Morishita 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining 4 | a copy of this software and associated documentation files (the 5 | "Software"), to deal in the Software without restriction, including 6 | without limitation the rights to use, copy, modify, merge, publish, 7 | distribute, sublicense, and/or sell copies of the Software, and to 8 | permit persons to whom the Software is furnished to do so, subject to 9 | the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be 12 | included in all copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 15 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 16 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 17 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 18 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 19 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 20 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 21 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | About 2 | ===== 3 | 4 | Implementation of [Neural Programmer-Interpreters](http://arxiv.org/abs/1511.06279) with Keras. 5 | 6 | How to Demo 7 | =========== 8 | 9 | [Demo Movie](https://youtu.be/s7PuBqwI2YA) 10 | 11 | requirements 12 | ----------- 13 | 14 | * Python3 15 | 16 | setup 17 | ----- 18 | 19 | ``` 20 | pip install -r requirements.txt 21 | ``` 22 | 23 | create training dataset 24 | ----------------------- 25 | ### create training dataset 26 | ``` 27 | sh src/run_create_addition_data.sh 28 | ``` 29 | 30 | ### create training dataset, showing each step on the terminal 31 | ``` 32 | DEBUG=1 sh src/run_create_addition_data.sh 33 | ``` 34 | 35 | training model 36 | ------------------ 37 | ### Create a new model (removes the old model if it exists, then creates a new one) 38 | ``` 39 | NEW_MODEL=1 sh src/run_train_addition_model.sh 40 | ``` 41 | 42 | ### Train an existing model (if a model already exists, it is reused) 43 | ``` 44 | sh src/run_train_addition_model.sh 45 | ``` 46 | 47 | test model 48 | ---------- 49 | ### check the model accuracy 50 | ``` 51 | sh src/run_test_addition_model.sh 52 | ``` 53 | 54 | ### check the model accuracy, showing each step on the terminal 55 | ``` 56 | DEBUG=1 sh src/run_test_addition_model.sh 57 | ``` 58 | 59 | Implementation FAQ 60 | ================== 61 | These are implementation questions I have received in the past. 62 | 63 | about pydot 64 | ----------- 65 | Q: I am using Python3.
I am getting an error "module 'pydot' has no attribute 'find_graphviz'". 66 | 67 | A: Try `pydot-ng` instead. 68 | 69 | `train_f_enc` method 70 | -------------------- 71 | Q: What is the purpose of 'env_model' in the 'train_f_enc' method, which gets called by the 'fit' method? My guess is that it is there to train the weights of the 'f_enc' layer. 72 | 73 | A: Yes, that's right. 74 | 75 | ---- 76 | 77 | Q: Why is the target output of 'env_model' [[first digit of sum], [carry of sum]]? 78 | Also, why does the target output not include 'output'? 79 | As I understand it, the weights of the 'f_enc' layer should be trained only in 'self.model'. 80 | 81 | A: Yes, in the original paper 'f_enc' is trained together with the other layers, and it would be better not to train it separately. 82 | 83 | The reason it is different in my implementation is simply that the model was hard to train. 84 | In particular, it seemed too hard to train the layers before the LSTMs (such as the f_enc layer): 85 | the f_enc weights often became NaN. (I don't know why... a Keras problem? Or something else?) 86 | So I tried training f_enc separately, and that seemed to work well (though not best). 87 | 88 | NOP program 89 | ----------- 90 | Q: What is the purpose of the NOP program? 91 | 92 | A: I do not remember it well, but NOP (No Operation) has program_id = 0. 93 | My thinking was that, early in training, the predicted value is often 0, so making 0 a harmless NOP that performs no unnecessary actions should let the model learn more efficiently. 94 | Although it is not certain whether this is actually effective... 95 | 96 | `weights = [[1.]]` 97 | ------------------ 98 | 99 | Q: What is the purpose of this `weights = [[1.]]` initialization? 100 | 101 | A: You mean `weights = [[1.]]` in `AdditionNPIModel#convert_output()`, don't you? 102 | 103 | The `weights` are the learning (sample) weights for [f_end, f_prog, f_args]. 104 | The first `weights = [[1.]]` means "f_end's weight = 1". 105 | The f_prog and f_args weights are set to 1 only if the teacher returns valid values. -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | h5py==2.6.0 2 | Keras==1.0.2 3 | numpy==1.11.0 4 | pydot-ng==1.0.0 5 | pyparsing==2.1.1 6 | PyYAML==3.11 7 | scipy==0.17.0 8 | six==1.10.0 9 | Theano==0.8.2 10 | -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mokemokechicken/keras_npi/05a4227effe9d1752fb953b9ffbd9ad29ba79e04/src/__init__.py -------------------------------------------------------------------------------- /src/npi/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | __author__ = 'k_morishita' 5 | -------------------------------------------------------------------------------- /src/npi/add/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | __author__ = 'k_morishita' 5 | -------------------------------------------------------------------------------- /src/npi/add/config.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | FIELD_ROW = 4 # Input1, Input2, Carry, Output 4 | FIELD_WIDTH = 9 # number of columns 5 | FIELD_DEPTH = 11 # number of characters(0~9 digits) and white space, per cell.
one-hot-encoding 6 | PROGRAM_VEC_SIZE = 10 7 | PROGRAM_KEY_VEC_SIZE = 5 8 | MAX_PROGRAM_NUM = 10 9 | -------------------------------------------------------------------------------- /src/npi/add/create_training_data.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import os 3 | import curses 4 | import pickle 5 | from copy import copy 6 | 7 | from npi.add.config import FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH 8 | from npi.add.lib import AdditionEnv, AdditionProgramSet, AdditionTeacher, create_char_map, create_questions, run_npi 9 | from npi.core import ResultLogger 10 | from npi.terminal_core import TerminalNPIRunner, Terminal 11 | 12 | 13 | def main(stdscr, filename: str, num: int, result_logger: ResultLogger): 14 | terminal = Terminal(stdscr, create_char_map()) 15 | terminal.init_window(FIELD_WIDTH, FIELD_ROW) 16 | program_set = AdditionProgramSet() 17 | addition_env = AdditionEnv(FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH) 18 | 19 | questions = create_questions(num) 20 | teacher = AdditionTeacher(program_set) 21 | npi_runner = TerminalNPIRunner(terminal, teacher) 22 | npi_runner.verbose = DEBUG_MODE 23 | steps_list = [] 24 | for data in questions: 25 | addition_env.reset() 26 | q = copy(data) 27 | run_npi(addition_env, npi_runner, program_set.ADD, data) 28 | steps_list.append({"q": q, "steps": npi_runner.step_list}) 29 | result_logger.write(data) 30 | terminal.add_log(data) 31 | 32 | if filename: 33 | with open(filename, 'wb') as f: 34 | pickle.dump(steps_list, f, protocol=pickle.HIGHEST_PROTOCOL) 35 | 36 | if __name__ == '__main__': 37 | import sys 38 | DEBUG_MODE = os.environ.get('DEBUG') 39 | if DEBUG_MODE: 40 | output_filename = None 41 | num_data = 3 42 | log_filename = 'result.log' 43 | else: 44 | output_filename = sys.argv[1] if len(sys.argv) > 1 else None 45 | num_data = int(sys.argv[2]) if len(sys.argv) > 2 else 1000 46 | log_filename = sys.argv[3] if len(sys.argv) > 3 else 'result.log' 47 | curses.wrapper(main, output_filename, num_data, ResultLogger(log_filename)) 48 | print("create %d training data" % num_data) 49 | -------------------------------------------------------------------------------- /src/npi/add/lib.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | from random import random 3 | 4 | import numpy as np 5 | 6 | from npi.core import Program, IntegerArguments, StepOutput, NPIStep, PG_CONTINUE, PG_RETURN 7 | from npi.terminal_core import Screen, Terminal 8 | 9 | __author__ = 'k_morishita' 10 | 11 | 12 | class AdditionEnv: 13 | """ 14 | Environment of Addition 15 | """ 16 | def __init__(self, height, width, num_chars): 17 | self.screen = Screen(height, width) 18 | self.num_chars = num_chars 19 | self.pointers = [0] * height 20 | self.reset() 21 | 22 | def reset(self): 23 | self.screen.fill(0) 24 | self.pointers = [self.screen.width-1] * self.screen.height # rightmost 25 | 26 | def get_observation(self) -> np.ndarray: 27 | value = [] 28 | for row in range(len(self.pointers)): 29 | value.append(self.to_one_hot(self.screen[row, self.pointers[row]])) 30 | return np.array(value) # shape of FIELD_ROW * FIELD_DEPTH 31 | 32 | def to_one_hot(self, ch): 33 | ret = np.zeros((self.num_chars,), dtype=np.int8) 34 | if 0 <= ch < self.num_chars: 35 | ret[ch] = 1 36 | else: 37 | raise IndexError("ch must be 0 <= ch < %s, but %s" % (self.num_chars, ch)) 38 | return ret 39 | 40 | def setup_problem(self, num1, num2): 41 | for i, s in enumerate(reversed("%s" % num1)): 42 | 
self.screen[0, -(i+1)] = int(s) + 1 43 | for i, s in enumerate(reversed("%s" % num2)): 44 | self.screen[1, -(i+1)] = int(s) + 1 45 | 46 | def move_pointer(self, row, left_or_right): 47 | if 0 <= row < len(self.pointers): 48 | self.pointers[row] += 1 if left_or_right == 1 else -1 # LEFT is 0, RIGHT is 1 49 | self.pointers[row] %= self.screen.width 50 | 51 | def write(self, row, ch): 52 | if 0 <= row < self.screen.height and 0 <= ch < self.num_chars: 53 | self.screen[row, self.pointers[row]] = ch 54 | 55 | def get_output(self): 56 | s = "" 57 | for ch in self.screen[3]: 58 | if ch > 0: 59 | s += "%s" % (ch-1) 60 | return int(s or "0") 61 | 62 | 63 | class MovePtrProgram(Program): 64 | output_to_env = True 65 | PTR_IN1 = 0 66 | PTR_IN2 = 1 67 | PTR_CARRY = 2 68 | PTR_OUT = 3 69 | 70 | TO_LEFT = 0 71 | TO_RIGHT = 1 72 | 73 | def do(self, env: AdditionEnv, args: IntegerArguments): 74 | ptr_kind = args.decode_at(0) 75 | left_or_right = args.decode_at(1) 76 | env.move_pointer(ptr_kind, left_or_right) 77 | 78 | 79 | class WriteProgram(Program): 80 | output_to_env = True 81 | WRITE_TO_CARRY = 0 82 | WRITE_TO_OUTPUT = 1 83 | 84 | def do(self, env: AdditionEnv, args: IntegerArguments): 85 | row = 2 if args.decode_at(0) == self.WRITE_TO_CARRY else 3 86 | digit = args.decode_at(1) 87 | env.write(row, digit+1) 88 | 89 | 90 | class AdditionProgramSet: 91 | NOP = Program('NOP') 92 | MOVE_PTR = MovePtrProgram('MOVE_PTR', 4, 2) # PTR_KIND(4), LEFT_OR_RIGHT(2) 93 | WRITE = WriteProgram('WRITE', 2, 10) # CARRY_OR_OUT(2), DIGITS(10) 94 | ADD = Program('ADD') 95 | ADD1 = Program('ADD1') 96 | CARRY = Program('CARRY') 97 | LSHIFT = Program('LSHIFT') 98 | RSHIFT = Program('RSHIFT') 99 | 100 | def __init__(self): 101 | self.map = {} 102 | self.program_id = 0 103 | self.register(self.NOP) 104 | self.register(self.MOVE_PTR) 105 | self.register(self.WRITE) 106 | self.register(self.ADD) 107 | self.register(self.ADD1) 108 | self.register(self.CARRY) 109 | self.register(self.LSHIFT) 110 | self.register(self.RSHIFT) 111 | 112 | def register(self, pg: Program): 113 | pg.program_id = self.program_id 114 | self.map[pg.program_id] = pg 115 | self.program_id += 1 116 | 117 | def get(self, i: int): 118 | return self.map.get(i) 119 | 120 | 121 | class AdditionTeacher(NPIStep): 122 | def __init__(self, program_set: AdditionProgramSet): 123 | self.pg_set = program_set 124 | self.step_queue = None 125 | self.step_queue_stack = [] 126 | self.sub_program = {} 127 | self.register_subprogram(program_set.MOVE_PTR, self.pg_primitive) 128 | self.register_subprogram(program_set.WRITE , self.pg_primitive) 129 | self.register_subprogram(program_set.ADD , self.pg_add) 130 | self.register_subprogram(program_set.ADD1 , self.pg_add1) 131 | self.register_subprogram(program_set.CARRY , self.pg_carry) 132 | self.register_subprogram(program_set.LSHIFT , self.pg_lshift) 133 | self.register_subprogram(program_set.RSHIFT , self.pg_rshift) 134 | 135 | def reset(self): 136 | super(AdditionTeacher, self).reset() 137 | self.step_queue_stack = [] 138 | self.step_queue = None 139 | 140 | def register_subprogram(self, pg, method): 141 | self.sub_program[pg.program_id] = method 142 | 143 | @staticmethod 144 | def decode_params(env_observation: np.ndarray, arguments: IntegerArguments): 145 | return env_observation.argmax(axis=1), arguments.decode_all() 146 | 147 | def enter_function(self): 148 | self.step_queue_stack.append(self.step_queue or []) 149 | self.step_queue = None 150 | 151 | def exit_function(self): 152 | self.step_queue = 
self.step_queue_stack.pop() 153 | 154 | def step(self, env_observation: np.ndarray, pg: Program, arguments: IntegerArguments) -> StepOutput: 155 | if not self.step_queue: 156 | self.step_queue = self.sub_program[pg.program_id](env_observation, arguments) 157 | if self.step_queue: 158 | ret = self.convert_for_step_return(self.step_queue[0]) 159 | self.step_queue = self.step_queue[1:] 160 | else: 161 | ret = StepOutput(PG_RETURN, None, None) 162 | return ret 163 | 164 | @staticmethod 165 | def convert_for_step_return(step_values: tuple) -> StepOutput: 166 | if len(step_values) == 2: 167 | return StepOutput(PG_CONTINUE, step_values[0], IntegerArguments(step_values[1])) 168 | else: 169 | return StepOutput(step_values[0], step_values[1], IntegerArguments(step_values[2])) 170 | 171 | @staticmethod 172 | def pg_primitive(env_observation: np.ndarray, arguments: IntegerArguments): 173 | return None 174 | 175 | def pg_add(self, env_observation: np.ndarray, arguments: IntegerArguments): 176 | ret = [] 177 | (in1, in2, carry, output), (a1, a2, a3) = self.decode_params(env_observation, arguments) 178 | if in1 == 0 and in2 == 0 and carry == 0: 179 | return None 180 | ret.append((self.pg_set.ADD1, None)) 181 | ret.append((self.pg_set.LSHIFT, None)) 182 | return ret 183 | 184 | def pg_add1(self, env_observation: np.ndarray, arguments: IntegerArguments): 185 | ret = [] 186 | p = self.pg_set 187 | (in1, in2, carry, output), (a1, a2, a3) = self.decode_params(env_observation, arguments) 188 | result = self.sum_ch_list([in1, in2, carry]) 189 | ret.append((p.WRITE, (p.WRITE.WRITE_TO_OUTPUT, result % 10))) 190 | if result > 9: 191 | ret.append((p.CARRY, None)) 192 | ret[-1] = (PG_RETURN, ret[-1][0], ret[-1][1]) 193 | return ret 194 | 195 | @staticmethod 196 | def sum_ch_list(ch_list): 197 | ret = 0 198 | for ch in ch_list: 199 | if ch > 0: 200 | ret += ch - 1 201 | return ret 202 | 203 | def pg_carry(self, env_observation: np.ndarray, arguments: IntegerArguments): 204 | ret = [] 205 | p = self.pg_set 206 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_CARRY, p.MOVE_PTR.TO_LEFT))) 207 | ret.append((p.WRITE, (p.WRITE.WRITE_TO_CARRY, 1))) 208 | ret.append((PG_RETURN, p.MOVE_PTR, (p.MOVE_PTR.PTR_CARRY, p.MOVE_PTR.TO_RIGHT))) 209 | return ret 210 | 211 | def pg_lshift(self, env_observation: np.ndarray, arguments: IntegerArguments): 212 | ret = [] 213 | p = self.pg_set 214 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_IN1, p.MOVE_PTR.TO_LEFT))) 215 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_IN2, p.MOVE_PTR.TO_LEFT))) 216 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_CARRY, p.MOVE_PTR.TO_LEFT))) 217 | ret.append((PG_RETURN, p.MOVE_PTR, (p.MOVE_PTR.PTR_OUT, p.MOVE_PTR.TO_LEFT))) 218 | return ret 219 | 220 | def pg_rshift(self, env_observation: np.ndarray, arguments: IntegerArguments): 221 | ret = [] 222 | p = self.pg_set 223 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_IN1, p.MOVE_PTR.TO_RIGHT))) 224 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_IN2, p.MOVE_PTR.TO_RIGHT))) 225 | ret.append((p.MOVE_PTR, (p.MOVE_PTR.PTR_CARRY, p.MOVE_PTR.TO_RIGHT))) 226 | ret.append((PG_RETURN, p.MOVE_PTR, (p.MOVE_PTR.PTR_OUT, p.MOVE_PTR.TO_RIGHT))) 227 | return ret 228 | 229 | 230 | def create_char_map(): 231 | char_map = dict((i+1, "%s" % i) for i in range(10)) 232 | char_map[0] = ' ' 233 | return char_map 234 | 235 | 236 | def create_questions(num=100, max_number=10000): 237 | questions = [] 238 | for in1 in range(10): 239 | for in2 in range(10): 240 | questions.append(dict(in1=in1, in2=in2)) 241 | 242 | for _ in range(100): 243 | 
questions.append(dict(in1=int(random() * 100), in2=int(random() * 100))) 244 | 245 | for _ in range(100): 246 | questions.append(dict(in1=int(random() * 1000), in2=int(random() * 1000))) 247 | 248 | questions += [ 249 | dict(in1=104, in2=902), 250 | ] 251 | 252 | questions += create_random_questions(num=num, max_number=max_number) 253 | return questions 254 | 255 | 256 | def create_random_questions(num=100, max_number=10000): 257 | questions = [] 258 | for _ in range(num): 259 | questions.append(dict(in1=int(random() * max_number), in2=int(random() * max_number))) 260 | return questions 261 | 262 | 263 | def run_npi(addition_env, npi_runner, program, data): 264 | data['expect'] = data['in1'] + data['in2'] 265 | 266 | addition_env.setup_problem(data['in1'], data['in2']) 267 | 268 | npi_runner.reset() 269 | npi_runner.display_env(addition_env, force=True) 270 | npi_runner.npi_program_interface(addition_env, program, IntegerArguments()) 271 | 272 | data['result'] = addition_env.get_output() 273 | data['correct'] = data['result'] == data['expect'] 274 | -------------------------------------------------------------------------------- /src/npi/add/model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | import os 4 | from collections import Counter 5 | from copy import copy 6 | 7 | import math 8 | import numpy as np 9 | from keras.engine.topology import Merge, InputLayer 10 | from keras.engine.training import Model 11 | from keras.layers.core import Dense, Activation, RepeatVector, MaxoutDense 12 | from keras.layers.embeddings import Embedding 13 | from keras.layers.recurrent import LSTM 14 | from keras.models import Sequential, model_from_yaml 15 | from keras.optimizers import Adam 16 | from keras.regularizers import l1, l2 17 | from keras.utils.visualize_util import plot 18 | 19 | from npi.add.config import FIELD_ROW, FIELD_DEPTH, PROGRAM_VEC_SIZE, PROGRAM_KEY_VEC_SIZE, FIELD_WIDTH 20 | from npi.add.lib import AdditionProgramSet, AdditionEnv, run_npi, create_questions, AdditionTeacher, \ 21 | create_random_questions 22 | from npi.core import NPIStep, Program, IntegerArguments, StepOutput, RuntimeSystem, PG_RETURN, StepInOut, StepInput, \ 23 | to_one_hot_array 24 | from npi.terminal_core import TerminalNPIRunner 25 | 26 | __author__ = 'k_morishita' 27 | 28 | 29 | class AdditionNPIModel(NPIStep): 30 | model = None 31 | f_enc = None 32 | 33 | def __init__(self, system: RuntimeSystem, model_path: str=None, program_set: AdditionProgramSet=None): 34 | self.system = system 35 | self.model_path = model_path 36 | self.program_set = program_set 37 | self.batch_size = 1 38 | self.build() 39 | self.weight_loaded = False 40 | self.load_weights() 41 | 42 | def build(self): 43 | enc_size = self.size_of_env_observation() 44 | argument_size = IntegerArguments.size_of_arguments 45 | input_enc = InputLayer(batch_input_shape=(self.batch_size, enc_size), name='input_enc') 46 | input_arg = InputLayer(batch_input_shape=(self.batch_size, argument_size), name='input_arg') 47 | input_prg = Embedding(input_dim=PROGRAM_VEC_SIZE, output_dim=PROGRAM_KEY_VEC_SIZE, input_length=1, 48 | batch_input_shape=(self.batch_size, 1)) 49 | 50 | f_enc = Sequential(name='f_enc') 51 | f_enc.add(Merge([input_enc, input_arg], mode='concat')) 52 | f_enc.add(MaxoutDense(128, nb_feature=4)) 53 | self.f_enc = f_enc 54 | 55 | program_embedding = Sequential(name='program_embedding') 56 | program_embedding.add(input_prg) 57 | 58 | f_enc_convert = 
Sequential(name='f_enc_convert') 59 | f_enc_convert.add(f_enc) 60 | f_enc_convert.add(RepeatVector(1)) 61 | 62 | f_lstm = Sequential(name='f_lstm') 63 | f_lstm.add(Merge([f_enc_convert, program_embedding], mode='concat')) 64 | f_lstm.add(LSTM(256, return_sequences=False, stateful=True, W_regularizer=l2(0.0000001))) 65 | f_lstm.add(Activation('relu', name='relu_lstm_1')) 66 | f_lstm.add(RepeatVector(1)) 67 | f_lstm.add(LSTM(256, return_sequences=False, stateful=True, W_regularizer=l2(0.0000001))) 68 | f_lstm.add(Activation('relu', name='relu_lstm_2')) 69 | # plot(f_lstm, to_file='f_lstm.png', show_shapes=True) 70 | 71 | f_end = Sequential(name='f_end') 72 | f_end.add(f_lstm) 73 | f_end.add(Dense(1, W_regularizer=l2(0.001))) 74 | f_end.add(Activation('sigmoid', name='sigmoid_end')) 75 | 76 | f_prog = Sequential(name='f_prog') 77 | f_prog.add(f_lstm) 78 | f_prog.add(Dense(PROGRAM_KEY_VEC_SIZE, activation="relu")) 79 | f_prog.add(Dense(PROGRAM_VEC_SIZE, W_regularizer=l2(0.0001))) 80 | f_prog.add(Activation('softmax', name='softmax_prog')) 81 | # plot(f_prog, to_file='f_prog.png', show_shapes=True) 82 | 83 | f_args = [] 84 | for ai in range(1, IntegerArguments.max_arg_num+1): 85 | f_arg = Sequential(name='f_arg%s' % ai) 86 | f_arg.add(f_lstm) 87 | f_arg.add(Dense(IntegerArguments.depth, W_regularizer=l2(0.0001))) 88 | f_arg.add(Activation('softmax', name='softmax_arg%s' % ai)) 89 | f_args.append(f_arg) 90 | # plot(f_arg, to_file='f_arg.png', show_shapes=True) 91 | 92 | self.model = Model([input_enc.input, input_arg.input, input_prg.input], 93 | [f_end.output, f_prog.output] + [fa.output for fa in f_args], 94 | name="npi") 95 | self.compile_model() 96 | plot(self.model, to_file='model.png', show_shapes=True) 97 | 98 | def reset(self): 99 | super(AdditionNPIModel, self).reset() 100 | for l in self.model.layers: 101 | if type(l) is LSTM: 102 | l.reset_states() 103 | 104 | def compile_model(self, lr=0.0001, arg_weight=1.): 105 | arg_num = IntegerArguments.max_arg_num 106 | optimizer = Adam(lr=lr) 107 | loss = ['binary_crossentropy', 'categorical_crossentropy'] + ['categorical_crossentropy'] * arg_num 108 | self.model.compile(optimizer=optimizer, loss=loss, loss_weights=[0.25, 0.25] + [arg_weight] * arg_num) 109 | 110 | def fit(self, steps_list, epoch=3000): 111 | """ 112 | 113 | :param int epoch: 114 | :param typing.List[typing.Dict[q=dict, steps=typing.List[StepInOut]]] steps_list: 115 | :return: 116 | """ 117 | 118 | def filter_question(condition_func): 119 | sub_steps_list = [] 120 | for steps_dict in steps_list: 121 | question = steps_dict['q'] 122 | if condition_func(question['in1'], question['in2']): 123 | sub_steps_list.append(steps_dict) 124 | return sub_steps_list 125 | 126 | # self.print_weights() 127 | if not self.weight_loaded: 128 | self.train_f_enc(filter_question(lambda a, b: 10 <= a < 100 and 10 <= b < 100), epoch=100) 129 | self.f_enc.trainable = False 130 | 131 | self.update_learning_rate(0.0001) 132 | 133 | # q_type = "training questions of a+b < 10" 134 | # print(q_type) 135 | # pr = 0.8 136 | # all_ok = self.fit_to_subset(filter_question(lambda a, b: a+b < 10), pass_rate=pr) 137 | # print("%s is pass_rate >= %s: %s" % (q_type, pr, all_ok)) 138 | # 139 | # q_type = "training questions of a<10 and b< 10 and 10 <= a+b" 140 | # print(q_type) 141 | # pr = 0.8 142 | # all_ok = self.fit_to_subset(filter_question(lambda a, b: a<10 and b<10 and a + b >= 10), pass_rate=pr) 143 | # print("%s is pass_rate >= %s: %s" % (q_type, pr, all_ok)) 144 | # 145 | # q_type = "training questions of 
a<10 and b<10" 146 | # print(q_type) 147 | # pr = 0.8 148 | # all_ok = self.fit_to_subset(filter_question(lambda a, b: a < 10 and b < 10), pass_rate=pr) 149 | # print("%s is pass_rate >= %s: %s" % (q_type, pr, all_ok)) 150 | 151 | q_type = "training questions of a<100 and b<100" 152 | print(q_type) 153 | pr = 0.8 154 | all_ok = self.fit_to_subset(filter_question(lambda a, b: a < 100 and b < 100), pass_rate=pr) 155 | print("%s is pass_rate >= %s: %s" % (q_type, pr, all_ok)) 156 | 157 | while True: 158 | if self.test_and_learn([10, 100, 1000]): 159 | break 160 | 161 | q_type = "training questions of ALL" 162 | print(q_type) 163 | 164 | q_num = 100 165 | skip_correct = False 166 | pr = 1.0 167 | questions = filter_question(lambda a, b: True) 168 | np.random.shuffle(questions) 169 | questions = questions[:q_num] 170 | all_ok = self.fit_to_subset(questions, pass_rate=pr, skip_correct=skip_correct) 171 | print("%s is pass_rate >= %s: %s" % (q_type, pr, all_ok)) 172 | 173 | def fit_to_subset(self, steps_list, pass_rate=1.0, skip_correct=False): 174 | for i in range(10): 175 | all_ok = self.do_learn(steps_list, 100, pass_rate=pass_rate, skip_correct=skip_correct) 176 | if all_ok: 177 | return True 178 | return False 179 | 180 | def test_and_learn(self, num_questions): 181 | for num in num_questions: 182 | print("test all type of %d questions" % num) 183 | cc, wc, wrong_questions = self.test_to_subset(create_random_questions(num)) 184 | acc_rate = cc/(cc+wc) 185 | print("Accuracy %s(OK=%d, NG=%d)" % (acc_rate, cc, wc)) 186 | if wc > 0: 187 | self.fit_to_subset(wrong_questions, pass_rate=1.0, skip_correct=False) 188 | return False 189 | return True 190 | 191 | def test_to_subset(self, questions): 192 | addition_env = AdditionEnv(FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH) 193 | teacher = AdditionTeacher(self.program_set) 194 | npi_runner = TerminalNPIRunner(None, self) 195 | teacher_runner = TerminalNPIRunner(None, teacher) 196 | correct_count = wrong_count = 0 197 | wrong_steps_list = [] 198 | for idx, question in enumerate(questions): 199 | question = copy(question) 200 | if self.question_test(addition_env, npi_runner, question): 201 | correct_count += 1 202 | else: 203 | self.question_test(addition_env, teacher_runner, question) 204 | wrong_steps_list.append({"q": question, "steps": teacher_runner.step_list}) 205 | wrong_count += 1 206 | return correct_count, wrong_count, wrong_steps_list 207 | 208 | @staticmethod 209 | def dict_to_str(d): 210 | return str(tuple([(k, d[k]) for k in sorted(d)])) 211 | 212 | def do_learn(self, steps_list, epoch, pass_rate=1.0, skip_correct=False): 213 | addition_env = AdditionEnv(FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH) 214 | npi_runner = TerminalNPIRunner(None, self) 215 | last_weights = None 216 | correct_count = Counter() 217 | no_change_count = 0 218 | last_loss = 1000 219 | for ep in range(1, epoch+1): 220 | correct_new = wrong_new = 0 221 | losses = [] 222 | ok_rate = [] 223 | np.random.shuffle(steps_list) 224 | for idx, steps_dict in enumerate(steps_list): 225 | question = copy(steps_dict['q']) 226 | question_key = self.dict_to_str(question) 227 | if self.question_test(addition_env, npi_runner, question): 228 | if correct_count[question_key] == 0: 229 | correct_new += 1 230 | correct_count[question_key] += 1 231 | print("GOOD!: ep=%2d idx=%3d :%s CorrectCount=%s" % (ep, idx, self.dict_to_str(question), correct_count[question_key])) 232 | ok_rate.append(1) 233 | cc = correct_count[question_key] 234 | if skip_correct or int(math.sqrt(cc)) ** 2 != cc: 235 | continue 236 | 
else: 237 | ok_rate.append(0) 238 | if correct_count[question_key] > 0: 239 | print("Degraded: ep=%2d idx=%3d :%s CorrectCount=%s -> 0" % (ep, idx, self.dict_to_str(question), correct_count[question_key])) 240 | correct_count[question_key] = 0 241 | wrong_new += 1 242 | 243 | steps = steps_dict['steps'] 244 | xs = [] 245 | ys = [] 246 | ws = [] 247 | for step in steps: 248 | xs.append(self.convert_input(step.input)) 249 | y, w = self.convert_output(step.output) 250 | ys.append(y) 251 | ws.append(w) 252 | 253 | self.reset() 254 | 255 | for i, (x, y, w) in enumerate(zip(xs, ys, ws)): 256 | loss = self.model.train_on_batch(x, y, sample_weight=w) 257 | if not np.isfinite(loss): 258 | print("Loss is not finite!, Last Input=%s" % ([i, (x, y, w)])) 259 | self.print_weights(last_weights, detail=True) 260 | raise RuntimeError("Loss is not finite!") 261 | losses.append(loss) 262 | last_weights = self.model.get_weights() 263 | if losses: 264 | cur_loss = np.average(losses) 265 | print("ep=%2d: ok_rate=%.2f%% (+%s -%s): ave loss %s (%s samples)" % 266 | (ep, np.average(ok_rate)*100, correct_new, wrong_new, cur_loss, len(steps_list))) 267 | # self.print_weights() 268 | if correct_new + wrong_new == 0: 269 | no_change_count += 1 270 | else: 271 | no_change_count = 0 272 | 273 | if math.fabs(1 - cur_loss/last_loss) < 0.001 and no_change_count > 5: 274 | print("math.fabs(1 - cur_loss/last_loss) < 0.001 and no_change_count > 5:") 275 | return False 276 | last_loss = cur_loss 277 | print("=" * 80) 278 | self.save() 279 | if np.average(ok_rate) >= pass_rate: 280 | return True 281 | return False 282 | 283 | def update_learning_rate(self, learning_rate, arg_weight=1.): 284 | print("Re-Compile Model lr=%s aw=%s" % (learning_rate, arg_weight)) 285 | self.compile_model(learning_rate, arg_weight=arg_weight) 286 | 287 | def train_f_enc(self, steps_list, epoch=50): 288 | print("training f_enc") 289 | f_add0 = Sequential(name='f_add0') 290 | f_add0.add(self.f_enc) 291 | f_add0.add(Dense(FIELD_DEPTH)) 292 | f_add0.add(Activation('softmax', name='softmax_add0')) 293 | 294 | f_add1 = Sequential(name='f_add1') 295 | f_add1.add(self.f_enc) 296 | f_add1.add(Dense(FIELD_DEPTH)) 297 | f_add1.add(Activation('softmax', name='softmax_add1')) 298 | 299 | env_model = Model(self.f_enc.inputs, [f_add0.output, f_add1.output], name="env_model") 300 | env_model.compile(optimizer='adam', loss=['categorical_crossentropy']*2) 301 | 302 | for ep in range(epoch): 303 | losses = [] 304 | for idx, steps_dict in enumerate(steps_list): 305 | prev = None 306 | for step in steps_dict['steps']: 307 | x = self.convert_input(step.input)[:2] 308 | env_values = step.input.env.reshape((4, -1)) 309 | in1 = np.clip(env_values[0].argmax() - 1, 0, 9) 310 | in2 = np.clip(env_values[1].argmax() - 1, 0, 9) 311 | carry = np.clip(env_values[2].argmax() - 1, 0, 9) 312 | y_num = in1 + in2 + carry 313 | now = (in1, in2, carry) 314 | if prev == now: 315 | continue 316 | prev = now 317 | y0 = to_one_hot_array((y_num % 10)+1, FIELD_DEPTH) 318 | y1 = to_one_hot_array((y_num // 10)+1, FIELD_DEPTH) 319 | y = [yy.reshape((self.batch_size, -1)) for yy in [y0, y1]] 320 | loss = env_model.train_on_batch(x, y) 321 | losses.append(loss) 322 | print("ep %3d: loss=%s" % (ep, np.average(losses))) 323 | if np.average(losses) < 1e-06: 324 | break 325 | 326 | def question_test(self, addition_env, npi_runner, question): 327 | addition_env.reset() 328 | self.reset() 329 | try: 330 | run_npi(addition_env, npi_runner, self.program_set.ADD, question) 331 | if question['correct']: 332 | 
return True 333 | except StopIteration: 334 | pass 335 | return False 336 | 337 | def convert_input(self, p_in: StepInput): 338 | x_pg = np.array((p_in.program.program_id,)) 339 | x = [xx.reshape((self.batch_size, -1)) for xx in (p_in.env, p_in.arguments.values, x_pg)] 340 | return x 341 | 342 | def convert_output(self, p_out: StepOutput): 343 | y = [np.array((p_out.r,))] 344 | weights = [[1.]] 345 | if p_out.program: 346 | arg_values = p_out.arguments.values 347 | arg_num = len(p_out.program.args or []) 348 | y += [p_out.program.to_one_hot(PROGRAM_VEC_SIZE)] 349 | weights += [[1.]] 350 | else: 351 | arg_values = IntegerArguments().values 352 | arg_num = 0 353 | y += [np.zeros((PROGRAM_VEC_SIZE, ))] 354 | weights += [[1e-10]] 355 | 356 | for v in arg_values: # split by each args 357 | y += [v] 358 | weights += [[1.]] * arg_num + [[1e-10]] * (len(arg_values) - arg_num) 359 | weights = [np.array(w) for w in weights] 360 | return [yy.reshape((self.batch_size, -1)) for yy in y], weights 361 | 362 | def step(self, env_observation: np.ndarray, pg: Program, arguments: IntegerArguments) -> StepOutput: 363 | x = self.convert_input(StepInput(env_observation, pg, arguments)) 364 | results = self.model.predict(x, batch_size=1) # if batch_size==1, returns single row 365 | 366 | r, pg_one_hot, arg_values = results[0], results[1], results[2:] 367 | program = self.program_set.get(pg_one_hot.argmax()) 368 | ret = StepOutput(r, program, IntegerArguments(values=np.stack(arg_values))) 369 | return ret 370 | 371 | def save(self): 372 | self.model.save_weights(self.model_path, overwrite=True) 373 | 374 | def load_weights(self): 375 | if os.path.exists(self.model_path): 376 | self.model.load_weights(self.model_path) 377 | self.weight_loaded = True 378 | 379 | def print_weights(self, weights=None, detail=False): 380 | weights = weights or self.model.get_weights() 381 | for w in weights: 382 | print("w%s: sum(w)=%s, ave(w)=%s" % (w.shape, np.sum(w), np.average(w))) 383 | if detail: 384 | for w in weights: 385 | print("%s: %s" % (w.shape, w)) 386 | 387 | @staticmethod 388 | def size_of_env_observation(): 389 | return FIELD_ROW * FIELD_DEPTH 390 | -------------------------------------------------------------------------------- /src/npi/add/test_model.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import curses 3 | import os 4 | import pickle 5 | 6 | from npi.add.config import FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH 7 | from npi.add.lib import AdditionEnv, AdditionProgramSet, AdditionTeacher, create_char_map, create_questions, run_npi 8 | from npi.add.model import AdditionNPIModel 9 | from npi.core import ResultLogger, RuntimeSystem 10 | from npi.terminal_core import TerminalNPIRunner, Terminal 11 | 12 | 13 | def main(stdscr, model_path: str, num: int, result_logger: ResultLogger): 14 | terminal = Terminal(stdscr, create_char_map()) 15 | terminal.init_window(FIELD_WIDTH, FIELD_ROW) 16 | program_set = AdditionProgramSet() 17 | addition_env = AdditionEnv(FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH) 18 | 19 | questions = create_questions(num) 20 | if DEBUG_MODE: 21 | questions = questions[-num:] 22 | system = RuntimeSystem(terminal=terminal) 23 | npi_model = AdditionNPIModel(system, model_path, program_set) 24 | npi_runner = TerminalNPIRunner(terminal, npi_model, recording=False) 25 | npi_runner.verbose = DEBUG_MODE 26 | correct_count = wrong_count = 0 27 | for data in questions: 28 | addition_env.reset() 29 | run_npi(addition_env, npi_runner, program_set.ADD, data) 30 | 
result_logger.write(data) 31 | terminal.add_log(data) 32 | if data['correct']: 33 | correct_count += 1 34 | else: 35 | wrong_count += 1 36 | return correct_count, wrong_count 37 | 38 | 39 | if __name__ == '__main__': 40 | import sys 41 | DEBUG_MODE = os.environ.get('DEBUG') 42 | model_path_ = sys.argv[1] 43 | num_data = int(sys.argv[2]) if len(sys.argv) > 2 else 1000 44 | log_filename = sys.argv[3] if len(sys.argv) > 3 else 'result.log' 45 | cc, wc = curses.wrapper(main, model_path_, num_data, ResultLogger(log_filename)) 46 | print("Accuracy %s(OK=%d, NG=%d)" % (cc/(cc+wc), cc, wc)) 47 | -------------------------------------------------------------------------------- /src/npi/add/training_model.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import os 3 | import pickle 4 | 5 | from npi.add.config import FIELD_ROW, FIELD_WIDTH, FIELD_DEPTH 6 | from npi.add.lib import AdditionEnv, AdditionProgramSet, AdditionTeacher, create_char_map, create_questions, run_npi 7 | from npi.add.model import AdditionNPIModel 8 | from npi.core import ResultLogger, RuntimeSystem 9 | from npi.terminal_core import TerminalNPIRunner, Terminal 10 | 11 | 12 | def main(filename: str, model_path: str): 13 | system = RuntimeSystem() 14 | program_set = AdditionProgramSet() 15 | 16 | with open(filename, 'rb') as f: 17 | steps_list = pickle.load(f) 18 | 19 | npi_model = AdditionNPIModel(system, model_path, program_set) 20 | npi_model.fit(steps_list) 21 | 22 | 23 | if __name__ == '__main__': 24 | import sys 25 | DEBUG_MODE = os.environ.get('DEBUG') 26 | train_filename = sys.argv[1] 27 | model_output = sys.argv[2] 28 | main(train_filename, model_output) 29 | 30 | -------------------------------------------------------------------------------- /src/npi/core.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | import json 4 | from copy import copy 5 | 6 | import numpy as np 7 | 8 | MAX_ARG_NUM = 3 9 | ARG_DEPTH = 10 # 0~9 digit. one-hot. 
10 | 11 | PG_CONTINUE = 0 12 | PG_RETURN = 1 13 | 14 | 15 | class IntegerArguments: 16 | depth = ARG_DEPTH 17 | max_arg_num = MAX_ARG_NUM 18 | size_of_arguments = depth * max_arg_num 19 | 20 | def __init__(self, args: list=None, values: np.ndarray=None): 21 | if values is not None: 22 | self.values = values.reshape((self.max_arg_num, self.depth)) 23 | else: 24 | self.values = np.zeros((self.max_arg_num, self.depth), dtype=np.float32) 25 | 26 | if args: 27 | for i, v in enumerate(args): 28 | self.update_to(i, v) 29 | 30 | def copy(self): 31 | obj = IntegerArguments() 32 | obj.values = np.copy(self.values) 33 | return obj 34 | 35 | def decode_all(self): 36 | return [self.decode_at(i) for i in range(len(self.values))] 37 | 38 | def decode_at(self, index: int) -> int: 39 | return self.values[index].argmax() 40 | 41 | def update_to(self, index: int, integer: int): 42 | self.values[index] = 0 43 | self.values[index, int(np.clip(integer, 0, self.depth-1))] = 1 44 | 45 | def __str__(self): 46 | return "<IntegerArguments: %s>" % self.decode_all() 47 | 48 | 49 | class Program: 50 | output_to_env = False 51 | 52 | def __init__(self, name, *args): 53 | self.name = name 54 | self.args = args 55 | self.program_id = None 56 | 57 | def description_with_args(self, args: IntegerArguments) -> str: 58 | int_args = args.decode_all() 59 | return "%s(%s)" % (self.name, ", ".join([str(x) for x in int_args])) 60 | 61 | def to_one_hot(self, size, dtype=np.float): 62 | ret = np.zeros((size,), dtype=dtype) 63 | ret[self.program_id] = 1 64 | return ret 65 | 66 | def do(self, env, args: IntegerArguments): 67 | raise NotImplementedError() 68 | 69 | def __str__(self): 70 | return "<Program: %s>" % self.name 71 | 72 | 73 | class StepInput: 74 | def __init__(self, env: np.ndarray, program: Program, arguments: IntegerArguments): 75 | self.env = env 76 | self.program = program 77 | self.arguments = arguments 78 | 79 | 80 | class StepOutput: 81 | def __init__(self, r: float, program: Program=None, arguments: IntegerArguments=None): 82 | self.r = r 83 | self.program = program 84 | self.arguments = arguments 85 | 86 | def __str__(self): 87 | return "<StepOutput: r=%s, program=%s, arguments=%s>" % (self.r, self.program, self.arguments) 88 | 89 | 90 | class StepInOut: 91 | def __init__(self, input: StepInput, output: StepOutput): 92 | self.input = input 93 | self.output = output 94 | 95 | 96 | class ResultLogger: 97 | def __init__(self, filename): 98 | self.filename = filename 99 | 100 | def write(self, obj): 101 | with open(self.filename, "a") as f: 102 | json.dump(obj, f) 103 | f.write("\n") 104 | 105 | 106 | class NPIStep: 107 | def reset(self): 108 | pass 109 | 110 | def enter_function(self): 111 | pass 112 | 113 | def exit_function(self): 114 | pass 115 | 116 | def step(self, env_observation: np.ndarray, pg: Program, arguments: IntegerArguments) -> StepOutput: 117 | raise NotImplementedError() 118 | 119 | 120 | class RuntimeSystem: 121 | def __init__(self, terminal=None): 122 | self.terminal = terminal 123 | 124 | def logging(self, message): 125 | if self.terminal: 126 | self.terminal.add_log(message) 127 | else: 128 | print(message) 129 | 130 | 131 | def to_one_hot_array(idx, size, dtype=np.int8): 132 | ret = np.zeros((size, ), dtype=dtype) 133 | ret[idx] = 1 134 | return ret 135 | -------------------------------------------------------------------------------- /src/npi/terminal_core.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | import curses 4 | import numpy as np 5 | 6 | from npi.core import Program,
IntegerArguments, NPIStep, StepOutput, StepInput, StepInOut 7 | 8 | __author__ = 'k_morishita' 9 | 10 | 11 | class Screen: 12 | data = None 13 | 14 | def __init__(self, height, width): 15 | self.height = height 16 | self.width = width 17 | self.init_screen() 18 | 19 | def init_screen(self): 20 | self.data = np.zeros([self.height, self.width], dtype=np.int8) 21 | 22 | def fill(self, ch): 23 | self.data.fill(ch) 24 | 25 | def as_float32(self): 26 | return self.data.astype(np.float32) 27 | 28 | def __setitem__(self, key, value): 29 | self.data[key] = value 30 | 31 | def __getitem__(self, item): 32 | return self.data[item] 33 | 34 | 35 | class Terminal: 36 | W_TOP = 1 37 | W_LEFT = 1 38 | LOG_WINDOW_HEIGHT = 10 39 | LOG_WINDOW_WIDTH = 80 40 | INFO_WINDOW_HEIGHT = 10 41 | INFO_WINDOW_WIDTH = 40 42 | 43 | main_window = None 44 | info_window = None 45 | log_window = None 46 | 47 | def __init__(self, stdscr, char_map=None): 48 | print(type(stdscr)) 49 | self.stdscr = stdscr 50 | self.char_map = char_map or dict((ch, chr(ch)) for ch in range(128)) 51 | self.log_list = [] 52 | 53 | def init_window(self, width, height): 54 | curses.curs_set(0) 55 | border_win = curses.newwin(height + 2, width + 2, self.W_TOP, self.W_LEFT) # h, w, y, x 56 | border_win.box() 57 | self.stdscr.refresh() 58 | border_win.refresh() 59 | self.main_window = curses.newwin(height, width, self.W_TOP + 1, self.W_LEFT + 1) 60 | self.main_window.refresh() 61 | self.main_window.timeout(1) 62 | self.info_window = curses.newwin(self.INFO_WINDOW_HEIGHT, self.INFO_WINDOW_WIDTH, 63 | self.W_TOP + 1, self.W_LEFT + width + 2) 64 | self.log_window = curses.newwin(self.LOG_WINDOW_HEIGHT, self.LOG_WINDOW_WIDTH, 65 | self.W_TOP + max(height, self.INFO_WINDOW_HEIGHT) + 5, self.W_LEFT) 66 | self.log_window.refresh() 67 | 68 | def wait_for_key(self): 69 | self.stdscr.getch() 70 | 71 | def update_main_screen(self, screen): 72 | for y in range(screen.height): 73 | line = "".join([self.char_map[ch] for ch in screen[y]]) 74 | self.ignore_error_add_str(self.main_window, y, 0, line) 75 | 76 | def update_main_window_attr(self, screen, y, x, attr): 77 | ch = screen[y, x] 78 | self.ignore_error_add_str(self.main_window, y, x, self.char_map[ch], attr) 79 | 80 | def refresh_main_window(self): 81 | self.main_window.refresh() 82 | 83 | def update_info_screen(self, info_list): 84 | self.info_window.clear() 85 | for i, info_str in enumerate(info_list): 86 | self.info_window.addstr(i, 2, info_str) 87 | self.info_window.refresh() 88 | 89 | def add_log(self, line): 90 | self.log_list.insert(0, str(line)[:self.LOG_WINDOW_WIDTH]) 91 | self.log_list = self.log_list[:self.LOG_WINDOW_HEIGHT-1] 92 | self.log_window.clear() 93 | for i, line in enumerate(self.log_list): 94 | line = str(line) + " " * (self.LOG_WINDOW_WIDTH - len(str(line))) 95 | self.log_window.addstr(i, 0, line) 96 | self.log_window.refresh() 97 | 98 | @staticmethod 99 | def ignore_error_add_str(win, y, x, s, attr=curses.A_NORMAL): 100 | """Writing to the bottom-right cell raises a curses exception, but apparently the proper etiquette is to just silently ignore it.""" 101 | try: 102 | win.addstr(y, x, s, attr) 103 | except curses.error: 104 | pass 105 | 106 | 107 | def show_env_to_terminal(terminal, env): 108 | terminal.update_main_screen(env.screen) 109 | for i, p in enumerate(env.pointers): 110 | terminal.update_main_window_attr(env.screen, i, p, curses.A_REVERSE) 111 | terminal.refresh_main_window() 112 | 113 | 114 | class TerminalNPIRunner: 115 | def __init__(self, terminal: Terminal, model: NPIStep=None, recording=True, max_depth=10, max_step=1000): 116 | self.terminal = terminal 117 |
self.model = model 118 | self.steps = 0 119 | self.step_list = [] 120 | self.alpha = 0.5 121 | self.verbose = True 122 | self.recording = recording 123 | self.max_depth = max_depth 124 | self.max_step = max_step 125 | 126 | def reset(self): 127 | self.steps = 0 128 | self.step_list = [] 129 | self.model.reset() 130 | 131 | def display_env(self, env, force=False): 132 | if (self.verbose or force) and self.terminal: 133 | show_env_to_terminal(self.terminal, env) 134 | 135 | def display_information(self, program: Program, arguments: IntegerArguments, result: StepOutput, depth: int): 136 | if self.verbose and self.terminal: 137 | information = [ 138 | "Step %2d Depth: %2d" % (self.steps, depth), 139 | program.description_with_args(arguments), 140 | 'r=%.2f' % result.r, 141 | ] 142 | if result.program: 143 | information.append("-> %s" % result.program.description_with_args(result.arguments)) 144 | self.terminal.update_info_screen(information) 145 | self.wait() 146 | 147 | def npi_program_interface(self, env, program: Program, arguments: IntegerArguments, depth=0): 148 | if self.max_depth < depth or self.max_step < self.steps: 149 | raise StopIteration() 150 | 151 | self.model.enter_function() 152 | 153 | result = StepOutput(0, None, None) 154 | while result.r < self.alpha: 155 | self.steps += 1 156 | if self.max_step < self.steps: 157 | raise StopIteration() 158 | 159 | env_observation = env.get_observation() 160 | result = self.model.step(env_observation, program, arguments.copy()) 161 | if self.recording: 162 | self.step_list.append(StepInOut(StepInput(env_observation, program, arguments.copy()), result)) 163 | self.display_information(program, arguments, result, depth) 164 | 165 | if program.output_to_env: 166 | program.do(env, arguments.copy()) 167 | self.display_env(env) 168 | else: 169 | if result.program: # modify original algorithm 170 | self.npi_program_interface(env, result.program, result.arguments, depth=depth+1) 171 | 172 | self.model.exit_function() 173 | 174 | def wait(self): 175 | self.terminal.wait_for_key() 176 | -------------------------------------------------------------------------------- /src/run_create_addition_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | THIS_DIR=$(cd $(dirname $0); pwd) 4 | DATA_DIR=${THIS_DIR}/../data 5 | OUTPUT_FILE=${1:-${DATA_DIR}/train.pkl} 6 | LOG=train_result.log 7 | export PYTHONPATH=${THIS_DIR} 8 | cd $THIS_DIR 9 | 10 | mkdir -p "$DATA_DIR" 11 | 12 | rm -f "$LOG" 13 | echo python npi/add/create_training_data.py "$OUTPUT_FILE" 1000 "$LOG" 14 | python npi/add/create_training_data.py "$OUTPUT_FILE" 1000 "$LOG" 15 | 16 | -------------------------------------------------------------------------------- /src/run_test_addition_model.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | THIS_DIR=$(cd $(dirname $0); pwd) 4 | DATA_DIR=${THIS_DIR}/../data 5 | MODEL_OUTPUT=${1:-${DATA_DIR}/addition.model} 6 | NUM_TEST=${2:-100} 7 | 8 | export PYTHONPATH=${THIS_DIR} 9 | cd "$THIS_DIR" 10 | 11 | mkdir -p "$DATA_DIR" 12 | 13 | echo python npi/add/test_model.py "$MODEL_OUTPUT" "$NUM_TEST" 14 | python npi/add/test_model.py "$MODEL_OUTPUT" "$NUM_TEST" 15 | -------------------------------------------------------------------------------- /src/run_train_addition_model.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | THIS_DIR=$(cd $(dirname $0); pwd) 4 | DATA_DIR=${THIS_DIR}/../data 5 | 
TRAIN_DATA=${1:-${DATA_DIR}/train.pkl} 6 | MODEL_OUTPUT=${2:-${DATA_DIR}/addition.model} 7 | 8 | export PYTHONPATH=${THIS_DIR} 9 | cd "$THIS_DIR" 10 | 11 | mkdir -p "$DATA_DIR" 12 | 13 | [ "$NEW_MODEL" != "" ] && rm -f "$MODEL_OUTPUT" 14 | 15 | echo python npi/add/training_model.py "$TRAIN_DATA" "$MODEL_OUTPUT" 16 | time python npi/add/training_model.py "$TRAIN_DATA" "$MODEL_OUTPUT" 17 | 18 | --------------------------------------------------------------------------------
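A quick way to sanity-check a generated training-data file (a minimal sketch under assumptions, not a file from the repository: the helper name `inspect_train_data.py` is hypothetical, `data/train.pkl` is the default path written by `run_create_addition_data.sh`, and `src` must be on `PYTHONPATH` so the pickled `npi` classes can be loaded). As `create_training_data.py` shows, the pickle holds a list of records with `q` (the question dict with `in1`/`in2`) and `steps` (the teacher's recorded `StepInOut` trace):

```
# inspect_train_data.py -- hypothetical helper, not part of the repository above.
# Usage: PYTHONPATH=src python inspect_train_data.py data/train.pkl
import pickle
import sys


def main(path):
    # Load the list of {"q": question_dict, "steps": [StepInOut, ...]} records
    # written by src/npi/add/create_training_data.py.
    with open(path, 'rb') as f:
        steps_list = pickle.load(f)
    print("records: %d" % len(steps_list))
    for record in steps_list[:3]:
        q = record["q"]
        print("in1=%(in1)s in2=%(in2)s" % q)
        print("  teacher trace: %d steps" % len(record["steps"]))


if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else 'data/train.pkl')
```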