├── README.md
└── dnc.ipynb

/README.md:
--------------------------------------------------------------------------------
1 | # differentiable_neural_computer
2 | 
3 | ## Overview
4 | 
5 | This is the code for [this](https://youtu.be/r5XKzjTFCZQ) video on YouTube by Siraj Raval as part of the Deep Learning Nanodegree with Udacity. We're going to build a Differentiable Neural Computer capable of learning the mapping between binary inputs and outputs. The point of this demo is to break the DNC down to its bare essentials so we can really understand how the architecture works. This is the most complex network I've ever built. And it's dope AF.
6 | 
7 | ## Dependencies
8 | 
9 | * tensorflow
10 | * numpy
11 | 
12 | ## Usage
13 | 
14 | Run `jupyter notebook` in a terminal to see the code pop up in your browser.
15 | 
16 | Install Jupyter [here](http://jupyter.readthedocs.io/en/latest/install.html)
17 | 
18 | 
19 | ## Credits
20 | 
21 | The credits for this code go to [claymcleod](https://github.com/claymcleod/tf-differentiable-neural-computer). I've merely created a wrapper to get people started.
22 | 
--------------------------------------------------------------------------------
/dnc.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# The Differentiable Neural Computer\n",
8 |     "\n",
9 |     "## The Problem - how do we create more general-purpose learning machines?\n",
10 |     "\n",
11 |     "Neural networks excel at pattern recognition and quick, reactive decision-making, but we are only just \n",
12 |     "beginning to build neural networks that can think slowly; that is, deliberate or reason using knowledge.\n",
13 |     "For example, how could a neural network store memories for facts, like the connections in a transport network, \n",
14 |     "and then logically reason about its pieces of knowledge to answer questions?\n",
15 |     "\n",
16 |     "![alt text](https://storage.googleapis.com/deepmind-live-cms/images/dnc_figure1.width-1500_Zfxk87k.png \"Logo Title Text 1\")\n",
17 |     "\n",
18 |     "A Differentiable Neural Computer (DNC) consists of a neural network that can read from and write to an external memory matrix,\n",
19 |     "analogous to the random-access memory in a conventional computer.\n",
20 |     "\n",
21 |     "Like a conventional computer, it can use its memory to represent and manipulate complex data structures, \n",
22 |     "but, like a neural network, it can learn to do so from data.\n",
23 |     "\n",
24 |     "DNCs have the capacity to solve complex, structured tasks that are \n",
25 |     "inaccessible to neural networks without external read–write memory.\n",
26 |     "\n",
27 |     "![alt text](https://storage.googleapis.com/deepmind-live-cms/images/dnc_figure2.width-1500_be2TeKT.png \"Logo Title Text 1\")\n",
28 |     "\n",
29 |     "[![IMAGE ALT TEXT HERE](http://img.youtube.com/vi/B9U8sI7TcMY/0.jpg)](http://www.youtube.com/watch?v=B9U8sI7TcMY)\n",
30 |     "\n",
31 |     "\n",
32 |     "\n",
33 |     "Modern computers separate computation and memory. Computation is performed by a processor, \n",
34 |     "which can use an addressable memory to bring operands in and out of play. \n",
35 |     "\n",
36 |     "In contrast to computers, the computational and memory resources of artificial neural networks \n",
37 |     "are mixed together in the network weights and neuron activity. This is a major liability: \n",
38 |     "as the memory demands of a task increase, these networks cannot allocate new storage \n",
39 |     "dynamically, nor easily learn algorithms that act independently of the values realized \n",
40 |     "by the task variables.\n",
41 |     " \n",
42 |     "The whole system is differentiable, and can therefore be trained \n",
43 |     "end-to-end with gradient descent, allowing the network to learn \n",
44 |     "how to operate and organize the memory in a goal-directed manner.\n",
45 |     "\n",
46 |     "If the memory can be thought of as the DNC’s RAM, then the network, referred to as the ‘controller’, \n",
47 |     "is a differentiable CPU whose operations are learned with gradient descent.\n",
48 |     "\n",
49 |     "\n",
50 |     "\n",
51 |     "How is it different from its predecessor, the Neural Turing Machine?\n",
52 |     "\n",
53 |     "Basically, it has more memory-access methods than the NTM.\n",
54 |     "\n",
55 |     "The DNC extends the NTM by addressing the following limitations:\n",
56 |     "\n",
57 |     "(1) Ensuring that blocks of allocated memory do not overlap and interfere.\n",
58 |     "\n",
59 |     "(2) Freeing memory that has already been written to.\n",
60 |     "\n",
61 |     "(3) Handling of non-contiguous memory through temporal links.\n",
62 |     "\n",
63 |     "\n",
64 |     "Note that the system requires hand-crafted, structured input for its learning and inference; it is not an NLP system where unstructured text is applied at the input. \n",
65 |     "\n",
66 |     "The read/write heads use 3 forms of attention:\n",
67 |     "- content lookup - a key emitted by the controller is compared to the content of each memory location.\n",
68 |     "- temporal linkage - records transitions between consecutively written locations in an N × N temporal link matrix L.\n",
69 |     "This gives a DNC the native ability to recover sequences in the order in which it wrote them, even\n",
70 |     "when consecutive writes did not occur in adjacent time-steps.\n",
71 |     "- allocation - the third form of attention allocates memory for writing. \n",
72 |     "\n",
73 |     "Content lookup enables the formation of associative data structures;\n",
74 |     "temporal links enable sequential retrieval of input sequences;\n",
75 |     "and allocation provides the write head with unused locations. \n",
76 |     "\n",
77 |     "DNC memory modification is fast and can be one-shot, resembling the associative \n",
78 |     "long-term potentiation of hippocampal CA3 and CA1 synapses.\n",
79 |     "\n",
80 |     "Human ‘free recall’ experiments demonstrate the increased probability of \n",
81 |     "item recall in the same order as first presented (temporal links).\n",
82 |     " \n",
83 |     "DeepMind hopes that DNCs provide both a new tool for computer science and a new metaphor for cognitive science\n",
84 |     "and neuroscience: here is a learning machine that, without prior programming, can organise information\n",
85 |     "into connected facts and use those facts to solve problems.\n",
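    "\n### A quick sketch of content-based addressing\n\nBefore walking through the full TensorFlow implementation below, here is a minimal, illustrative sketch of the first attention mechanism, content lookup: a key emitted by the controller is compared to every memory row by cosine similarity, sharpened by a strength scalar and normalized with a softmax. This is plain NumPy, and the function and variable names (content_weights, memory, key, strength) are our own shorthand rather than anything defined in the DNC class below.\n\n```python\nimport numpy as np\n\ndef content_weights(memory, key, strength):\n    # memory: (N, W) matrix of N words, key: (W,) vector, strength: scalar >= 1\n    eps = 1e-8\n    # cosine similarity between the key and every row of memory\n    mem_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)\n    key_norm = key / (np.linalg.norm(key) + eps)\n    sim = mem_norm @ key_norm                # (N,) similarity scores\n    scores = strength * sim                  # sharpen with the read/write strength\n    w = np.exp(scores - scores.max())        # numerically stable softmax\n    return w / w.sum()                       # (N,) weighting over memory rows\n\nmemory = np.random.randn(10, 4)              # N=10 words of size W=4, as in the demo below\nkey = memory[3] + 0.05 * np.random.randn(4)  # a noisy copy of row 3\nprint(content_weights(memory, key, strength=10.0))  # most of the weight should land on row 3\n```"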
86 |    ]
87 |   },
88 |   {
89 |    "cell_type": "code",
90 |    "execution_count": null,
91 |    "metadata": {
92 |     "collapsed": true
93 |    },
94 |    "outputs": [],
95 |    "source": [
96 |     "import numpy as np\n",
97 |     "import tensorflow as tf\n",
98 |     "import os"
99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "code",
103 |    "execution_count": null,
104 |    "metadata": {
105 |     "collapsed": true
106 |    },
107 |    "outputs": [],
108 |    "source": [
109 |     "class DNC:\n",
110 |     "    def __init__(self, input_size, output_size, seq_len, num_words=256, word_size=64, num_heads=4):\n",
111 |     "        #define data\n",
112 |     "        #input data - [[1 0] [0 1] [0 0] [0 0]]\n",
113 |     "        self.input_size = input_size #X\n",
114 |     "        #output data [[0 0] [0 0] [1 0] [0 1]]\n",
115 |     "        self.output_size = output_size #Y\n",
116 |     "        \n",
117 |     "        #define read + write vector size\n",
118 |     "        #number of memory words (10 in the demo below)\n",
119 |     "        self.num_words = num_words #N\n",
120 |     "        #size of each word (4 in the demo below)\n",
121 |     "        self.word_size = word_size #W\n",
122 |     "        \n",
123 |     "        #define number of read+write heads\n",
124 |     "        #we could have multiple, but just 1 in the demo for simplicity\n",
125 |     "        self.num_heads = num_heads #R\n",
126 |     "\n",
127 |     "        #size of output vector from controller that defines interactions with memory matrix\n",
128 |     "        self.interface_size = num_heads*word_size + 3*word_size + 5*num_heads + 3\n",
129 |     "\n",
130 |     "        #the actual size of the neural network input after flattening and\n",
131 |     "        # concatenating the input vector with the previously read vectors from memory\n",
132 |     "        self.nn_input_size = num_heads * word_size + input_size\n",
133 |     "        \n",
134 |     "        #size of output\n",
135 |     "        self.nn_output_size = output_size + self.interface_size\n",
136 |     "        \n",
137 |     "        #initialize both outputs from a truncated normal distribution\n",
138 |     "        self.nn_out = tf.truncated_normal([1, self.output_size], stddev=0.1)\n",
139 |     "        self.interface_vec = tf.truncated_normal([1, self.interface_size], stddev=0.1)\n",
140 |     "\n",
141 |     "        #Create memory matrix\n",
142 |     "        self.mem_mat = tf.zeros([num_words, word_size]) #N*W\n",
143 |     "        \n",
144 |     "        #other variables\n",
145 |     "        #The usage vector records which locations have been used so far, \n",
146 |     "        self.usage_vec = tf.fill([num_words, 1], 1e-6) #N*1\n",
147 |     "        #a temporal link matrix records the order in which locations were written;\n",
148 |     "        self.link_mat = tf.zeros([num_words,num_words]) #N*N\n",
149 |     "        #represents degrees to which last location was written to\n",
150 |     "        self.precedence_weight = tf.zeros([num_words, 1]) #N*1\n",
151 |     "\n",
152 |     "        #Read and write head variables\n",
153 |     "        self.read_weights = tf.fill([num_words, num_heads], 1e-6) #N*R\n",
154 |     "        self.write_weights = tf.fill([num_words, 1], 1e-6) #N*1\n",
155 |     "        self.read_vecs = tf.fill([num_heads, word_size], 1e-6) #R*W\n",
156 |     "\n",
157 |     "        ###NETWORK VARIABLES\n",
158 |     "        #gateways into the computation graph for input output pairs\n",
159 |     "        self.i_data = tf.placeholder(tf.float32, [seq_len*2, self.input_size], name='input_node')\n",
160 |     "        self.o_data = tf.placeholder(tf.float32, [seq_len*2, self.output_size], name='output_node')\n",
161 |     "        \n",
162 |     "        #2 layer feedforward network\n",
163 |     "        self.W1 = tf.Variable(tf.truncated_normal([self.nn_input_size, 32], stddev=0.1), name='layer1_weights', dtype=tf.float32)\n",
164 |     "        self.b1 = tf.Variable(tf.zeros([32]), name='layer1_bias', dtype=tf.float32)\n",
165 |     "        self.W2 = tf.Variable(tf.truncated_normal([32, self.nn_output_size], stddev=0.1), name='layer2_weights', dtype=tf.float32)\n",
166 |     "        self.b2 = tf.Variable(tf.zeros([self.nn_output_size]), name='layer2_bias', dtype=tf.float32)\n",
167 |     "\n",
168 |     "        ###DNC OUTPUT WEIGHTS\n",
169 |     "        self.nn_out_weights = tf.Variable(tf.truncated_normal([self.nn_output_size, self.output_size], stddev=0.1), name='net_output_weights')\n",
170 |     "        self.interface_weights = tf.Variable(tf.truncated_normal([self.nn_output_size, self.interface_size], stddev=0.1), name='interface_weights')\n",
171 |     "        \n",
172 |     "        self.read_vecs_out_weight = tf.Variable(tf.truncated_normal([self.num_heads*self.word_size, self.output_size], stddev=0.1), name='read_vector_weights')\n",
173 |     "\n",
174 |     "    #3 attention mechanisms for read/writes to memory \n",
175 |     "    \n",
176 |     "    #1\n",
177 |     "    #a key vector emitted by the controller is compared to the \n",
178 |     "    #content of each location in memory according to a similarity measure \n",
179 |     "    #The similarity scores determine a weighting that can be used by the read heads \n",
180 |     "    #for associative recall or by the write head to modify an existing vector in memory.\n",
181 |     "    def content_lookup(self, key, strength):\n",
182 |     "        #The l2 norm of a vector is the square root of the sum of the \n",
183 |     "        #squared absolute values\n",
184 |     "        norm_mem = tf.nn.l2_normalize(self.mem_mat, 1) #N*W\n",
185 |     "        norm_key = tf.nn.l2_normalize(key, 0) #1*W for write or R*W for read\n",
186 |     "        #get similarity measure between both vectors, transpose before multiplication\n",
187 |     "        ##(N*W,W*1)->N*1 for write\n",
188 |     "        #(N*W,W*R)->N*R for read\n",
189 |     "        sim = tf.matmul(norm_mem, norm_key, transpose_b=True) \n",
190 |     "        #strength is 1*1 or 1*R\n",
191 |     "        #returns similarity measure\n",
192 |     "        return tf.nn.softmax(sim*strength, 0) #N*1 or N*R\n",
193 |     "\n",
194 |     "    #2\n",
195 |     "    #retrieves the writing allocation weighting based on the usage free list\n",
196 |     "    #The ‘usage’ of each location is represented as a number between 0 and 1, \n",
197 |     "    #and a weighting that picks out unused locations is delivered to the write head. \n",
198 |     "    \n",
199 |     "    #this mechanism is independent of the size and contents of the memory, meaning that \n",
200 |     "    #DNCs can be trained to solve a task using one size of memory and later \n",
201 |     "    #upgraded to a larger memory without retraining\n",
202 |     "    def allocation_weighting(self):\n",
203 |     "        #sorted usage - the usage vector sorted in ascending order\n",
204 |     "        #the original indices of the sorted usage vector\n",
205 |     "        sorted_usage_vec, free_list = tf.nn.top_k(-1 * self.usage_vec, k=self.num_words)\n",
206 |     "        sorted_usage_vec *= -1\n",
207 |     "        cumprod = tf.cumprod(sorted_usage_vec, axis=0, exclusive=True)\n",
208 |     "        unorder = (1-sorted_usage_vec)*cumprod\n",
209 |     "\n",
210 |     "        alloc_weights = tf.zeros([self.num_words])\n",
211 |     "        I = tf.constant(np.identity(self.num_words, dtype=np.float32))\n",
212 |     "        \n",
213 |     "        #for each usage vec\n",
214 |     "        for pos, idx in enumerate(tf.unstack(free_list[0])):\n",
215 |     "            #flatten\n",
216 |     "            m = tf.squeeze(tf.slice(I, [idx, 0], [1, -1]))\n",
217 |     "            #add to weight matrix\n",
218 |     "            alloc_weights += m*unorder[0, pos]\n",
219 |     "        #the allocation weighting for each row in memory\n",
220 |     "        return tf.reshape(alloc_weights, [self.num_words, 1])\n",
221 |     "\n",
222 |     "    #at every time step the controller receives an input vector from the dataset and emits an output vector. \n",
223 |     "    #it also receives a set of read vectors from the memory matrix at the previous time step via \n",
224 |     "    #the read heads. It then emits an interface vector that defines its interactions with the memory\n",
225 |     "    #at the current time step\n",
226 |     "    def step_m(self, x):\n",
227 |     "        \n",
228 |     "        #reshape input\n",
229 |     "        input = tf.concat([x, tf.reshape(self.read_vecs, [1, self.num_heads*self.word_size])],1)\n",
230 |     "        \n",
231 |     "        #forward propagation\n",
232 |     "        l1_out = tf.matmul(input, self.W1) + self.b1\n",
233 |     "        l1_act = tf.nn.tanh(l1_out)\n",
234 |     "        l2_out = tf.matmul(l1_act, self.W2) + self.b2\n",
235 |     "        l2_act = tf.nn.tanh(l2_out)\n",
236 |     "        \n",
237 |     "        #output vector\n",
238 |     "        self.nn_out = tf.matmul(l2_act, self.nn_out_weights) #(1*eta+Y, eta+Y*Y)->(1*Y)\n",
239 |     "        #interaction vector - how to interact with memory\n",
240 |     "        self.interface_vec = tf.matmul(l2_act, self.interface_weights) #(1*eta+Y, eta+Y*eta)->(1*eta)\n",
241 |     "        \n",
242 |     "        \n",
243 |     "        partition = tf.constant([[0]*(self.num_heads*self.word_size) + [1]*(self.num_heads) + [2]*(self.word_size) + [3] + \\\n",
244 |     "                    [4]*(self.word_size) + [5]*(self.word_size) + \\\n",
245 |     "                    [6]*(self.num_heads) + [7] + [8] + [9]*(self.num_heads*3)], dtype=tf.int32)\n",
246 |     "\n",
247 |     "        #convert interface vector into a set of read write vectors\n",
248 |     "        #using tf.dynamic_partition (partitions interface_vec into 10 tensors using indices from partition)\n",
249 |     "        (read_keys, read_str, write_key, write_str,\n",
250 |     "         erase_vec, write_vec, free_gates, alloc_gate, write_gate, read_modes) = \\\n",
251 |     "            tf.dynamic_partition(self.interface_vec, partition, 10)\n",
252 |     "        \n",
253 |     "        #read vectors\n",
254 |     "        read_keys = tf.reshape(read_keys,[self.num_heads, self.word_size]) #R*W\n",
255 |     "        read_str = 1 + tf.nn.softplus(tf.expand_dims(read_str, 0)) #1*R\n",
256 |     "        \n",
257 |     "        #write vectors\n",
258 |     "        write_key = tf.expand_dims(write_key, 0) #1*W\n",
259 |     "        #help init our write weights\n",
260 |     "        write_str = 1 + tf.nn.softplus(tf.expand_dims(write_str, 0)) #1*1\n",
261 |     "        erase_vec = tf.nn.sigmoid(tf.expand_dims(erase_vec, 0)) #1*W\n",
262 |     "        write_vec = tf.expand_dims(write_vec, 0) #1*W\n",
263 |     "        \n",
264 |     "        #the degree to which locations at read heads will be freed\n",
265 |     "        free_gates = tf.nn.sigmoid(tf.expand_dims(free_gates, 0)) #1*R\n",
266 |     "        #the fraction of writing that is being allocated in a new location\n",
267 |     "        alloc_gate = tf.nn.sigmoid(alloc_gate) #1\n",
268 |     "        #the amount of information to be written to memory\n",
269 |     "        write_gate = tf.nn.sigmoid(write_gate) #1\n",
270 |     "        #the softmax distribution between the three read modes (backward, forward, lookup)\n",
271 |     "        #The read heads can use gates called read modes to switch between content lookup \n",
272 |     "        #using a read key and reading out locations either forwards or backwards \n",
273 |     "        #in the order they were written.\n",
274 |     "        read_modes = tf.nn.softmax(tf.reshape(read_modes, [3, self.num_heads])) #3*R\n",
275 |     "        \n",
276 |     "        #used to calculate usage vector, what's available to write to?\n",
277 |     "        retention_vec = tf.reduce_prod(1-free_gates*self.read_weights, reduction_indices=1)\n",
278 |     "        #used to dynamically allocate memory\n",
279 |     "        self.usage_vec = (self.usage_vec + self.write_weights - self.usage_vec * self.write_weights) * retention_vec\n",
280 |     "\n",
281 |     "        ##retrieves the writing allocation weighting \n",
282 |     "        alloc_weights = self.allocation_weighting() #N*1\n",
283 |     "        #where to write to?\n",
284 |     "        write_lookup_weights = self.content_lookup(write_key, write_str) #N*1\n",
285 |     "        #define our write weights now that we know how much space to allocate for them and where to write to\n",
286 |     "        self.write_weights = write_gate*(alloc_gate*alloc_weights + (1-alloc_gate)*write_lookup_weights)\n",
287 |     "\n",
288 |     "        #erase, then write to memory!\n",
289 |     "        self.mem_mat = self.mem_mat*(1-tf.matmul(self.write_weights, erase_vec)) + \\\n",
290 |     "                       tf.matmul(self.write_weights, write_vec)\n",
291 |     "\n",
292 |     "        #As well as writing, the controller can read from multiple locations in memory. \n",
293 |     "        #Memory can be searched based on the content of each location, or the associative \n",
294 |     "        #temporal links can be followed forward and backward to recall information written \n",
295 |     "        #in sequence or in reverse. (3rd attention mechanism)\n",
296 |     "        \n",
297 |     "        #updates and returns the temporal link matrix for the latest write\n",
298 |     "        #given the precedence vector and the link matrix from previous step\n",
299 |     "        nnweight_vec = tf.matmul(self.write_weights, tf.ones([1,self.num_words])) #N*N\n",
300 |     "        self.link_mat = (1 - nnweight_vec - tf.transpose(nnweight_vec))*self.link_mat + \\\n",
301 |     "                        tf.matmul(self.write_weights, self.precedence_weight, transpose_b=True)\n",
302 |     "        self.link_mat *= tf.ones([self.num_words, self.num_words]) - tf.constant(np.identity(self.num_words, dtype=np.float32))\n",
303 |     "\n",
304 |     "        \n",
305 |     "        self.precedence_weight = (1-tf.reduce_sum(self.write_weights, reduction_indices=0)) * \\\n",
306 |     "                                 self.precedence_weight + self.write_weights\n",
307 |     "        #3 modes - forward, backward, content lookup\n",
308 |     "        forw_w = read_modes[2]*tf.matmul(self.link_mat, self.read_weights) #(N*N,N*R)->N*R\n",
309 |     "        look_w = read_modes[1]*self.content_lookup(read_keys, read_str) #N*R\n",
310 |     "        back_w = read_modes[0]*tf.matmul(self.link_mat, self.read_weights, transpose_a=True) #N*R\n",
311 |     "\n",
312 |     "        #combine them to update the read weights\n",
313 |     "        self.read_weights = back_w + look_w + forw_w #N*R\n",
314 |     "        #create read vectors by applying read weights to memory matrix\n",
315 |     "        self.read_vecs = tf.transpose(tf.matmul(self.mem_mat, self.read_weights, transpose_a=True)) #(W*N,N*R)^T->R*W\n",
316 |     "\n",
317 |     "        #project the flattened read vectors through the output weights\n",
318 |     "        read_vec_mut = tf.matmul(tf.reshape(self.read_vecs, [1, self.num_heads * self.word_size]),\n",
319 |     "                                 self.read_vecs_out_weight)  # (1*RW, RW*Y)-> (1*Y)\n",
320 |     "        \n",
321 |     "        #return output + read vecs product\n",
322 |     "        return self.nn_out+read_vec_mut\n",
323 |     "\n",
324 |     "    #output list of numbers (one hot encoded) by running the step function\n",
325 |     "    def run(self):\n",
326 |     "        big_out = []\n",
327 |     "        for t, seq in enumerate(tf.unstack(self.i_data, axis=0)):\n",
328 |     "            seq = tf.expand_dims(seq, 0)\n",
329 |     "            y = self.step_m(seq)\n",
330 |     "            big_out.append(y)\n",
331 |     "        return tf.stack(big_out, axis=0)\n"
332 |    ]
333 |   },
334 |   {
335 |    "cell_type": "code",
336 |    "execution_count": null,
337 |    "metadata": {
338 |     "collapsed": true
339 |    },
340 |    "outputs": [],
341 |    "source": [
342 |     "def main(argv=None):\n",
343 |     "\n",
344 |     "    #generate the input output sequences, randomly initialized\n",
345 |     "    num_seq = 10\n",
346 |     "    seq_len = 6\n",
347 |     "    seq_width = 4\n",
348 |     "    iterations = 1000\n",
349 |     "    con = np.random.randint(0, seq_width,size=seq_len)\n",
350 |     "    seq = np.zeros((seq_len, seq_width))\n",
351 |     "    seq[np.arange(seq_len), con] = 1\n",
352 |     "    end = np.asarray([[-1]*seq_width])\n",
353 |     "    zer = np.zeros((seq_len, seq_width))\n",
354 |     "\n",
355 |     "    graph = tf.Graph()\n",
356 |     "    \n",
357 |     "    with graph.as_default():\n",
358 |     "        #training time\n",
359 |     "        with tf.Session() as sess:\n",
360 |     "            #init the DNC\n",
361 |     "            dnc = DNC(input_size=seq_width, output_size=seq_width, seq_len=seq_len, num_words=10, word_size=4, num_heads=1)\n",
362 |     "            \n",
363 |     "            #calculate the predicted output\n",
364 |     "            output = tf.squeeze(dnc.run())\n",
365 |     "            #compare prediction to reality, get loss via sigmoid cross entropy\n",
366 |     "            loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=dnc.o_data))\n",
367 |     "            #use regularizers for each layer of the controller\n",
368 |     "            regularizers = (tf.nn.l2_loss(dnc.W1) + tf.nn.l2_loss(dnc.W2) +\n",
369 |     "                            tf.nn.l2_loss(dnc.b1) + tf.nn.l2_loss(dnc.b2))\n",
370 |     "            #to help the loss converge faster\n",
371 |     "            loss += 5e-4 * regularizers\n",
372 |     "            #optimize the entire thing (memory + controller) using gradient descent. dope\n",
373 |     "            optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)\n",
374 |     "            \n",
375 |     "            #initialize the variables, then build the input/output pairs\n",
376 |     "            tf.global_variables_initializer().run()\n",
377 |     "            final_i_data = np.concatenate((seq, zer), axis=0)\n",
378 |     "            final_o_data = np.concatenate((zer, seq), axis=0)\n",
379 |     "            #for each iteration\n",
380 |     "            for i in range(0, iterations+1):\n",
381 |     "                #feed in each input output pair\n",
382 |     "                feed_dict = {dnc.i_data: final_i_data, dnc.o_data: final_o_data}\n",
383 |     "                #make predictions\n",
384 |     "                l, _, predictions = sess.run([loss, optimizer, output], feed_dict=feed_dict)\n",
385 |     "                if i%100==0:\n",
386 |     "                    print(i,l)\n",
387 |     "            #print results\n",
388 |     "            print(final_i_data)\n",
389 |     "            print(final_o_data)\n",
390 |     "            print(predictions)\n",
391 |     "\n",
392 |     "if __name__ == '__main__':\n",
393 |     "    tf.app.run()\n"
394 |    ]
395 |   }
396 |  ],
397 |  "metadata": {
398 |   "kernelspec": {
399 |    "display_name": "Python 3",
400 |    "language": "python",
401 |    "name": "python3"
402 |   },
403 |   "language_info": {
404 |    "codemirror_mode": {
405 |     "name": "ipython",
406 |     "version": 3
407 |    },
408 |    "file_extension": ".py",
409 |    "mimetype": "text/x-python",
410 |    "name": "python",
411 |    "nbconvert_exporter": "python",
412 |    "pygments_lexer": "ipython3",
413 |    "version": "3.6.0"
414 |   }
415 |  },
416 |  "nbformat": 4,
417 |  "nbformat_minor": 2
418 | }
419 | 
--------------------------------------------------------------------------------