├── .flake8 ├── .gitignore ├── README.md ├── code ├── 01_simple_logistic_classification_on_MNIST.py ├── 02_DNN_classification_on_MNIST.py ├── 03_CNN_classification_on_MNIST.py ├── 04_1_Autoencoder_on_MNIST.py ├── 04_2_DenoiseAutoencoder_on_MNIST.py ├── 05_1_word2vec_SkipGram.py ├── 05_2_word2vec_CBOW.py └── 06_LSTM.py ├── img ├── 04_output_11_0.png ├── 04_output_13_0.png ├── 04_output_15_1.png ├── 04_output_5_1.png ├── 04_output_7_1.png ├── 04_output_9_0.png ├── 05_output_13_0.png ├── 05_output_20_0.png ├── TensorflowTutorial.001.jpeg ├── TensorflowTutorial.002.jpeg ├── TensorflowTutorial.003.jpeg ├── TensorflowTutorial.004.jpeg ├── TensorflowTutorial.005.jpeg ├── TensorflowTutorial.006.jpeg ├── TensorflowTutorial.007.jpeg ├── TensorflowTutorial.008.jpeg ├── TensorflowTutorial.009.jpeg ├── TensorflowTutorial.010.jpeg ├── TensorflowTutorial.011.jpeg ├── TensorflowTutorial.012.jpeg └── TensorflowTutorial.013.jpeg ├── requirements.txt └── tutorial ├── 01_Simple_Logistic_Classification_on_MNIST.ipynb ├── 02_Build_First_DNN.ipynb ├── 03_Build_CNN.ipynb ├── 04_Autoencoder.ipynb ├── 05_word2vec.ipynb ├── 06_RNN_and_LSTM.ipynb ├── tensorflow_workshop_0630.ipynb └── tensorflow_workshop_0630_ans.ipynb /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | max-line-length = 99 3 | import-order-style = pep8 4 | extend-ignore = D1 5 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | db.sqlite3 61 | db.sqlite3-journal 62 | 63 | # Flask stuff: 64 | instance/ 65 | .webassets-cache 66 | 67 | # Scrapy stuff: 68 | .scrapy 69 | 70 | # Sphinx documentation 71 | docs/_build/ 72 | 73 | # PyBuilder 74 | target/ 75 | 76 | # Jupyter Notebook 77 | .ipynb_checkpoints 78 | 79 | # IPython 80 | profile_default/ 81 | ipython_config.py 82 | 83 | # pyenv 84 | .python-version 85 | 86 | # pipenv 87 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 88 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 89 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 90 | # install all needed dependencies. 
91 | #Pipfile.lock
92 | 
93 | # celery beat schedule file
94 | celerybeat-schedule
95 | 
96 | # SageMath parsed files
97 | *.sage.py
98 | 
99 | # Environments
100 | .env
101 | .venv
102 | env/
103 | venv/
104 | ENV/
105 | env.bak/
106 | venv.bak/
107 | 
108 | # Spyder project settings
109 | .spyderproject
110 | .spyproject
111 | 
112 | # Rope project settings
113 | .ropeproject
114 | 
115 | # mkdocs documentation
116 | /site
117 | 
118 | # mypy
119 | .mypy_cache/
120 | .dmypy.json
121 | dmypy.json
122 | 
123 | # Pyre type checker
124 | .pyre/
125 | 
126 | # data
127 | code/MNIST_data/
128 | code/text8.zip
129 | tutorial/MNIST_data/
130 | tutorial/text8.zip
131 | 
132 | # other
133 | .vscode
134 | .DS_Store
135 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Hands-on TensorFlow Tutorial Series
2 | 
3 | ## Learning TensorFlow One Step at a Time
4 | 
5 | I want to build a complete set of TensorFlow tutorials that point out the important concepts of Deep Learning one by one and implement or verify each of them with TensorFlow. This tutorial aims at three things: explaining the concepts clearly yet thoroughly, presenting the code with a rigorous structure, and showing the practical side of TensorFlow in full.
6 | 
7 | **The web version of this tutorial is available on my personal site: [http://www.ycc.idv.tw/tag__實作Tensorflow/](http://www.ycc.idv.tw/tag__實作Tensorflow/)**
8 | 
9 | ## Ch01 Simple Logistic Classification on MNIST
10 | 
11 | Build a simple single-layer neural network.
12 | 
13 | ![Simple Neural Network](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.002.jpeg)
14 | 
15 | ## Ch02 Build First Deep Neural Network (DNN)
16 | 
17 | Build the first deep learning model and take a careful look at the key components of deep learning, including: Hidden Layer, Activation Function, Mini-Batch Gradient Descent, Weight Regularization, Dropout, and Optimizer.
18 | 
19 | ![DNN](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.003.jpeg)
20 | 
21 | ## Ch03 Build First Convolutional Neural Network (CNN)
22 | 
23 | Introduce the Convolutional Neural Network, the most widely used model in image processing, bring in the concepts of the Convolution Layer and the Pooling Layer, and finish by building the simplest CNN architecture: LeNet5.
24 | 
25 | ![CNN](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.006.jpeg)
26 | 
27 | ## Ch04 Autoencoder
28 | 
29 | Build a DNN-based Autoencoder and reveal the power of the Embedding Code: by compressing and reconstructing the data we find an Embedding space that concisely describes a group of samples. In this space the data needs no human-given labels; the machine groups it into reasonable clusters by itself, so the Autoencoder can be used for Unsupervised Learning.
30 | 
31 | ![Autoencoder](https://github.com/GitYCC/Tensorflow_Tutorial/blob/master/img/TensorflowTutorial.007.jpeg?raw=true)
32 | 
33 | ![Embedding Code](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/04_output_9_0.png)
34 | 
35 | ## Ch05 Word2Vec
36 | 
37 | Introduce two Word2Vec models, Skip-gram and CBOW, and reveal the power of the Embedding Vector: by compressing the relationship between a word and its context we can build an Embedding space in which two words with similar meanings also have similar Embedding Vectors.
38 | 
39 | ![word2vec](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.008.jpeg)
40 | 
41 | ![Embedding Vector](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/05_output_13_0.png)
42 | 
43 | ## Ch06 Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)
44 | 
45 | Introduce the RNN, a neural network that handles sequences over time, and point out that a plain RNN, because it shares weights across time steps and is effectively a very deep network, runs into the exploding-gradient and vanishing-gradient problems. The LSTM is another type of RNN that avoids the vanishing-gradient problem by building a "long-term memory", while the exploding-gradient problem can be handled with Gradient Clipping.
46 | 
47 | ![LSTM](https://github.com/GitYCC/Tensorflow_Tutorial/raw/master/img/TensorflowTutorial.012.jpeg)
--------------------------------------------------------------------------------
/code/01_simple_logistic_classification_on_MNIST.py:
--------------------------------------------------------------------------------
1 | #!/usr/local/bin/python3.6
2 | 
3 | import numpy as np
4 | import tensorflow 
as tf 5 | from tensorflow.examples.tutorials.mnist import input_data 6 | 7 | tf.logging.set_verbosity(tf.logging.ERROR) 8 | 9 | 10 | class SimpleLogisticClassification: 11 | 12 | def __init__(self, n_features, n_labels, learning_rate=0.5): 13 | self.n_features = n_features 14 | self.n_labels = n_labels 15 | 16 | self.weights = None 17 | self.biases = None 18 | 19 | self.graph = tf.Graph() # initialize new graph 20 | self.build(learning_rate) # building graph 21 | self.sess = tf.Session(graph=self.graph) # create session by the graph 22 | 23 | def build(self, learning_rate): 24 | # Building Graph 25 | with self.graph.as_default(): 26 | ### Input 27 | self.train_features = tf.placeholder(tf.float32, shape=(None, self.n_features)) 28 | self.train_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels)) 29 | 30 | ### Optimalization 31 | # build neurel network structure and get their predictions and loss 32 | self.y_, self.loss = self.structure(features=self.train_features, 33 | labels=self.train_labels) 34 | # define training operation 35 | self.train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(self.loss) 36 | 37 | ### Prediction 38 | self.new_features = tf.placeholder(tf.float32, shape=(None, self.n_features)) 39 | self.new_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels)) 40 | self.new_y_, self.new_loss = self.structure(features=self.new_features, 41 | labels=self.new_labels) 42 | 43 | ### Initialization 44 | self.init_op = tf.global_variables_initializer() 45 | 46 | def structure(self, features, labels): 47 | # build neurel network structure and return their predictions and loss 48 | ### Variable 49 | if (not self.weights) or (not self.biases): 50 | self.weights = { 51 | 'fc1': tf.Variable(tf.truncated_normal(shape=(self.n_features, self.n_labels))), 52 | } 53 | self.biases = { 54 | 'fc1': tf.Variable(tf.zeros(shape=(self.n_labels))), 55 | } 56 | 57 | ### Structure 58 | # one fully connected layer 59 | logits = self.get_dense_layer(features, self.weights['fc1'], self.biases['fc1']) 60 | 61 | # predictions 62 | y_ = tf.nn.softmax(logits) 63 | 64 | # loss: softmax cross entropy 65 | loss = tf.reduce_mean( 66 | tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)) 67 | 68 | return (y_, loss) 69 | 70 | def get_dense_layer(self, input_layer, weight, bias, activation=None): 71 | # fully connected layer 72 | x = tf.add(tf.matmul(input_layer, weight), bias) 73 | if activation: 74 | x = activation(x) 75 | return x 76 | 77 | def fit(self, X, y, epochs=10, validation_data=None, test_data=None): 78 | X = self._check_array(X) 79 | y = self._check_array(y) 80 | 81 | self.sess.run(self.init_op) 82 | for epoch in range(epochs): 83 | print('Epoch %2d/%2d: ' % (epoch+1, epochs)) 84 | 85 | # fully gradient descent 86 | feed_dict = {self.train_features: X, self.train_labels: y} 87 | self.sess.run(self.train_op, feed_dict=feed_dict) 88 | 89 | # evaluate at the end of this epoch 90 | y_ = self.predict(X) 91 | train_loss = self.evaluate(X, y) 92 | train_acc = self.accuracy(y_, y) 93 | msg = ' loss = %8.4f, acc = %3.2f%%' % (train_loss, train_acc*100) 94 | 95 | if validation_data: 96 | val_loss = self.evaluate(validation_data[0], validation_data[1]) 97 | val_acc = self.accuracy(self.predict(validation_data[0]), validation_data[1]) 98 | msg += ', val_loss = %8.4f, val_acc = %3.2f%%' % (val_loss, val_acc*100) 99 | 100 | print(msg) 101 | 102 | if test_data: 103 | test_acc = self.accuracy(self.predict(test_data[0]), test_data[1]) 104 | print('test_acc = %3.2f%%' % 
(test_acc*100)) 105 | 106 | def accuracy(self, predictions, labels): 107 | return (np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0]) 108 | 109 | def predict(self, X): 110 | X = self._check_array(X) 111 | return self.sess.run(self.new_y_, feed_dict={self.new_features: X}) 112 | 113 | def evaluate(self, X, y): 114 | X = self._check_array(X) 115 | y = self._check_array(y) 116 | return self.sess.run(self.new_loss, feed_dict={self.new_features: X, self.new_labels: y}) 117 | 118 | def _check_array(self, ndarray): 119 | ndarray = np.array(ndarray) 120 | if len(ndarray.shape) == 1: 121 | ndarray = np.reshape(ndarray, (1, ndarray.shape[0])) 122 | return ndarray 123 | 124 | 125 | if __name__ == '__main__': 126 | print('Extract MNIST Dataset ...') 127 | 128 | mnist = input_data.read_data_sets('MNIST_data/', one_hot=True) 129 | 130 | train_data = mnist.train 131 | valid_data = mnist.validation 132 | test_data = mnist.test 133 | 134 | model = SimpleLogisticClassification(n_features=28*28, 135 | n_labels=10, 136 | learning_rate=0.5) 137 | model.fit(X=train_data.images, 138 | y=train_data.labels, 139 | epochs=10, 140 | validation_data=(valid_data.images, valid_data.labels), 141 | test_data=(test_data.images, test_data.labels), ) 142 | -------------------------------------------------------------------------------- /code/02_DNN_classification_on_MNIST.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import random 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | from tensorflow.examples.tutorials.mnist import input_data 8 | 9 | tf.logging.set_verbosity(tf.logging.ERROR) 10 | 11 | 12 | class DNNLogisticClassification: 13 | 14 | def __init__(self, n_features, n_labels, 15 | learning_rate=0.5, n_hidden=1000, activation=tf.nn.relu, 16 | dropout_ratio=0.5, alpha=0.0): 17 | 18 | self.n_features = n_features 19 | self.n_labels = n_labels 20 | 21 | self.weights = None 22 | self.biases = None 23 | 24 | self.graph = tf.Graph() # initialize new graph 25 | self.build(learning_rate, n_hidden, activation, 26 | dropout_ratio, alpha) # building graph 27 | self.sess = tf.Session(graph=self.graph) # create session by the graph 28 | 29 | def build(self, learning_rate, n_hidden, activation, dropout_ratio, alpha): 30 | # Building Graph 31 | with self.graph.as_default(): 32 | ### Input 33 | self.train_features = tf.placeholder(tf.float32, shape=(None, self.n_features)) 34 | self.train_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels)) 35 | 36 | ### Optimalization 37 | # build neurel network structure and get their predictions and loss 38 | self.y_, self.original_loss = self.structure(features=self.train_features, 39 | labels=self.train_labels, 40 | n_hidden=n_hidden, 41 | activation=activation, 42 | dropout_ratio=dropout_ratio, 43 | train=True) 44 | # regularization loss 45 | self.regularization = \ 46 | tf.reduce_sum([tf.nn.l2_loss(w) for w in self.weights.values()]) \ 47 | / tf.reduce_sum([tf.size(w, out_type=tf.float32) for w in self.weights.values()]) 48 | 49 | # total loss 50 | self.loss = self.original_loss + alpha * self.regularization 51 | 52 | # define training operation 53 | optimizer = tf.train.GradientDescentOptimizer(learning_rate) 54 | self.train_op = optimizer.minimize(self.loss) 55 | 56 | ### Prediction 57 | self.new_features = tf.placeholder(tf.float32, shape=(None, self.n_features)) 58 | self.new_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels)) 59 | self.new_y_, 
self.new_original_loss = self.structure(features=self.new_features, 60 | labels=self.new_labels, 61 | n_hidden=n_hidden, 62 | activation=activation) 63 | self.new_loss = self.new_original_loss + alpha * self.regularization 64 | 65 | ### Initialization 66 | self.init_op = tf.global_variables_initializer() 67 | 68 | def structure(self, features, labels, n_hidden, activation, dropout_ratio=0, train=False): 69 | # build neurel network structure and return their predictions and loss 70 | ### Variable 71 | if (not self.weights) or (not self.biases): 72 | self.weights = { 73 | 'fc1': tf.Variable(tf.truncated_normal(shape=(self.n_features, n_hidden))), 74 | 'fc2': tf.Variable(tf.truncated_normal(shape=(n_hidden, self.n_labels))), 75 | } 76 | self.biases = { 77 | 'fc1': tf.Variable(tf.zeros(shape=(n_hidden))), 78 | 'fc2': tf.Variable(tf.zeros(shape=(self.n_labels))), 79 | } 80 | ### Structure 81 | # layer 1 82 | fc1 = self.get_dense_layer(features, self.weights['fc1'], 83 | self.biases['fc1'], activation=activation) 84 | if train: 85 | fc1 = tf.nn.dropout(fc1, keep_prob=1-dropout_ratio) 86 | 87 | # layer 2 88 | logits = self.get_dense_layer(fc1, self.weights['fc2'], self.biases['fc2']) 89 | 90 | y_ = tf.nn.softmax(logits) 91 | 92 | loss = tf.reduce_mean( 93 | tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)) 94 | 95 | return (y_, loss) 96 | 97 | def get_dense_layer(self, input_layer, weight, bias, activation=None): 98 | # fully connected layer 99 | x = tf.add(tf.matmul(input_layer, weight), bias) 100 | if activation: 101 | x = activation(x) 102 | return x 103 | 104 | def fit(self, X, y, epochs=10, validation_data=None, test_data=None, batch_size=None): 105 | X = self._check_array(X) 106 | y = self._check_array(y) 107 | 108 | N = X.shape[0] 109 | random.seed(9000) 110 | if not batch_size: 111 | batch_size = N 112 | 113 | self.sess.run(self.init_op) 114 | for epoch in range(epochs): 115 | print('Epoch %2d/%2d: ' % (epoch+1, epochs)) 116 | 117 | # mini-batch gradient descent 118 | index = [i for i in range(N)] 119 | random.shuffle(index) 120 | while len(index) > 0: 121 | index_size = len(index) 122 | batch_index = [index.pop() for _ in range(min(batch_size, index_size))] 123 | 124 | feed_dict = { 125 | self.train_features: X[batch_index, :], 126 | self.train_labels: y[batch_index], 127 | } 128 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 129 | 130 | print('[%d/%d] loss = %9.4f ' % (N-len(index), N, loss), end='\r') 131 | 132 | # evaluate at the end of this epoch 133 | y_ = self.predict(X) 134 | train_loss = self.evaluate(X, y) 135 | train_acc = self.accuracy(y_, y) 136 | msg = '[%d/%d] loss = %8.4f, acc = %3.2f%%' % (N, N, train_loss, train_acc*100) 137 | 138 | if validation_data: 139 | val_loss = self.evaluate(validation_data[0], validation_data[1]) 140 | val_acc = self.accuracy(self.predict(validation_data[0]), validation_data[1]) 141 | msg += ', val_loss = %8.4f, val_acc = %3.2f%%' % (val_loss, val_acc*100) 142 | 143 | print(msg) 144 | 145 | if test_data: 146 | test_acc = self.accuracy(self.predict(test_data[0]), test_data[1]) 147 | print('test_acc = %3.2f%%' % (test_acc*100)) 148 | 149 | def accuracy(self, predictions, labels): 150 | return (np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0]) 151 | 152 | def predict(self, X): 153 | X = self._check_array(X) 154 | return self.sess.run(self.new_y_, feed_dict={self.new_features: X}) 155 | 156 | def evaluate(self, X, y): 157 | X = self._check_array(X) 158 | y = 
self._check_array(y) 159 | return self.sess.run(self.new_loss, feed_dict={self.new_features: X, 160 | self.new_labels: y}) 161 | 162 | def _check_array(self, ndarray): 163 | ndarray = np.array(ndarray) 164 | if len(ndarray.shape) == 1: 165 | ndarray = np.reshape(ndarray, (1, ndarray.shape[0])) 166 | return ndarray 167 | 168 | 169 | if __name__ == '__main__': 170 | print('Extract MNIST Dataset ...') 171 | 172 | mnist = input_data.read_data_sets('MNIST_data/', one_hot=True) 173 | 174 | train_data = mnist.train 175 | valid_data = mnist.validation 176 | test_data = mnist.test 177 | 178 | model = DNNLogisticClassification( 179 | n_features=28*28, 180 | n_labels=10, 181 | learning_rate=0.5, 182 | n_hidden=1000, 183 | activation=tf.nn.relu, 184 | dropout_ratio=0.5, 185 | alpha=0.01, 186 | ) 187 | model.fit( 188 | X=train_data.images, 189 | y=train_data.labels, 190 | epochs=3, 191 | validation_data=(valid_data.images, valid_data.labels), 192 | test_data=(test_data.images, test_data.labels), 193 | batch_size=32, 194 | ) 195 | -------------------------------------------------------------------------------- /code/03_CNN_classification_on_MNIST.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import random 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | from tensorflow.examples.tutorials.mnist import input_data 8 | import matplotlib.pyplot as plt 9 | 10 | tf.logging.set_verbosity(tf.logging.ERROR) 11 | 12 | 13 | class CNNLogisticClassification: 14 | 15 | def __init__(self, shape_picture, n_labels, 16 | learning_rate=0.5, dropout_ratio=0.5, alpha=0.0): 17 | self.shape_picture = shape_picture 18 | self.n_labels = n_labels 19 | 20 | self.weights = None 21 | self.biases = None 22 | 23 | self.graph = tf.Graph() # initialize new grap 24 | self.build(learning_rate, dropout_ratio, alpha) # building graph 25 | self.sess = tf.Session(graph=self.graph) # create session by the graph 26 | 27 | def build(self, learning_rate, dropout_ratio, alpha): 28 | with self.graph.as_default(): 29 | ### Input 30 | self.train_pictures = tf.placeholder(tf.float32, 31 | shape=[None]+self.shape_picture) 32 | self.train_labels = tf.placeholder(tf.int32, 33 | shape=(None, self.n_labels)) 34 | 35 | ### Optimalization 36 | # build neurel network structure and get their predictions and loss 37 | self.y_, self.original_loss = self.structure(pictures=self.train_pictures, 38 | labels=self.train_labels, 39 | dropout_ratio=dropout_ratio, 40 | train=True, ) 41 | # regularization loss 42 | self.regularization = \ 43 | tf.reduce_sum([tf.nn.l2_loss(w) for w in self.weights.values()]) \ 44 | / tf.reduce_sum([tf.size(w, out_type=tf.float32) for w in self.weights.values()]) 45 | 46 | # total loss 47 | self.loss = self.original_loss + alpha * self.regularization 48 | 49 | # define training operation 50 | optimizer = tf.train.GradientDescentOptimizer(learning_rate) 51 | self.train_op = optimizer.minimize(self.loss) 52 | 53 | ### Prediction 54 | self.new_pictures = tf.placeholder(tf.float32, 55 | shape=[None]+self.shape_picture) 56 | self.new_labels = tf.placeholder(tf.int32, 57 | shape=(None, self.n_labels)) 58 | self.new_y_, self.new_original_loss = self.structure(pictures=self.new_pictures, 59 | labels=self.new_labels) 60 | self.new_loss = self.new_original_loss + alpha * self.regularization 61 | 62 | ### Initialization 63 | self.init_op = tf.global_variables_initializer() 64 | 65 | def structure(self, pictures, labels, dropout_ratio=None, train=False): 66 | ### 
Variable 67 | ## LeNet5 Architecture(http://yann.lecun.com/exdb/lenet/) 68 | # input:(batch,28,28,1) => conv1[5x5,6] => (batch,24,24,6) 69 | # pool2 => (batch,12,12,6) => conv2[5x5,16] => (batch,8,8,16) 70 | # pool4 => fatten5 => (batch,4x4x16) => fc6 => (batch,120) 71 | # (batch,120) => fc7 => (batch,84) 72 | # (batch,84) => fc8 => (batch,10) => softmax 73 | 74 | if (not self.weights) and (not self.biases): 75 | self.weights = { 76 | 'conv1': tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), 77 | stddev=0.1)), 78 | 'conv3': tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), 79 | stddev=0.1)), 80 | 'fc6': tf.Variable(tf.truncated_normal(shape=(4*4*16, 120), 81 | stddev=0.1)), 82 | 'fc7': tf.Variable(tf.truncated_normal(shape=(120, 84), 83 | stddev=0.1)), 84 | 'fc8': tf.Variable(tf.truncated_normal(shape=(84, self.n_labels), 85 | stddev=0.1)), 86 | } 87 | self.biases = { 88 | 'conv1': tf.Variable(tf.zeros(shape=(6))), 89 | 'conv3': tf.Variable(tf.zeros(shape=(16))), 90 | 'fc6': tf.Variable(tf.zeros(shape=(120))), 91 | 'fc7': tf.Variable(tf.zeros(shape=(84))), 92 | 'fc8': tf.Variable(tf.zeros(shape=(self.n_labels))), 93 | } 94 | 95 | ### Structure 96 | conv1 = self.get_conv_2d_layer(pictures, 97 | self.weights['conv1'], self.biases['conv1'], 98 | activation=tf.nn.relu) 99 | pool2 = tf.nn.max_pool(conv1, 100 | ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID') 101 | conv3 = self.get_conv_2d_layer(pool2, 102 | self.weights['conv3'], self.biases['conv3'], 103 | activation=tf.nn.relu) 104 | pool4 = tf.nn.max_pool(conv3, 105 | ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID') 106 | fatten5 = self.get_flatten_layer(pool4) 107 | 108 | if train: 109 | fatten5 = tf.nn.dropout(fatten5, keep_prob=1-dropout_ratio[0]) 110 | 111 | fc6 = self.get_dense_layer(fatten5, 112 | self.weights['fc6'], self.biases['fc6'], 113 | activation=tf.nn.relu) 114 | 115 | if train: 116 | fc6 = tf.nn.dropout(fc6, keep_prob=1-dropout_ratio[1]) 117 | 118 | fc7 = self.get_dense_layer(fc6, 119 | self.weights['fc7'], self.biases['fc7'], 120 | activation=tf.nn.relu) 121 | 122 | logits = self.get_dense_layer(fc7, self.weights['fc8'], self.biases['fc8']) 123 | 124 | y_ = tf.nn.softmax(logits) 125 | loss = tf.reduce_mean( 126 | tf.nn.softmax_cross_entropy_with_logits(labels=labels, 127 | logits=logits)) 128 | 129 | return (y_, loss) 130 | 131 | def get_dense_layer(self, input_layer, weight, bias, activation=None): 132 | x = tf.add(tf.matmul(input_layer, weight), bias) 133 | if activation: 134 | x = activation(x) 135 | return x 136 | 137 | def get_conv_2d_layer(self, input_layer, 138 | weight, bias, 139 | strides=(1, 1), padding='VALID', activation=None): 140 | x = tf.add( 141 | tf.nn.conv2d(input_layer, 142 | weight, 143 | [1, strides[0], strides[1], 1], 144 | padding=padding), bias) 145 | if activation: 146 | x = activation(x) 147 | return x 148 | 149 | def get_flatten_layer(self, input_layer): 150 | shape = input_layer.get_shape().as_list() 151 | n = 1 152 | for s in shape[1:]: 153 | n *= s 154 | x = tf.reshape(input_layer, [-1, n]) 155 | return x 156 | 157 | def fit(self, X, y, epochs=10, 158 | validation_data=None, test_data=None, batch_size=None): 159 | X = self._check_array(X) 160 | y = self._check_array(y) 161 | 162 | N = X.shape[0] 163 | random.seed(9000) 164 | if not batch_size: 165 | batch_size = N 166 | 167 | self.sess.run(self.init_op) 168 | for epoch in range(epochs): 169 | print('Epoch %2d/%2d: ' % (epoch+1, epochs)) 170 | 171 | # mini-batch gradient descent 172 | index = [i for i in range(N)] 173 | 
random.shuffle(index) 174 | while len(index) > 0: 175 | index_size = len(index) 176 | batch_index = [index.pop() for _ in range(min(batch_size, index_size))] 177 | 178 | feed_dict = { 179 | self.train_pictures: X[batch_index, :], 180 | self.train_labels: y[batch_index], 181 | } 182 | _, loss = self.sess.run([self.train_op, self.loss], 183 | feed_dict=feed_dict) 184 | 185 | print('[%d/%d] loss = %.4f ' % (N-len(index), N, loss), end='\r') 186 | 187 | # evaluate at the end of this epoch 188 | y_ = self.predict(X) 189 | train_loss = self.evaluate(X, y) 190 | train_acc = self.accuracy(y_, y) 191 | msg = '[%d/%d] loss = %8.4f, acc = %3.2f%%' % (N, N, train_loss, train_acc*100) 192 | 193 | if validation_data: 194 | val_loss = self.evaluate(validation_data[0], validation_data[1]) 195 | val_acc = self.accuracy(self.predict(validation_data[0]), validation_data[1]) 196 | msg += ', val_loss = %8.4f, val_acc = %3.2f%%' % (val_loss, val_acc*100) 197 | 198 | print(msg) 199 | 200 | if test_data: 201 | test_acc = self.accuracy(self.predict(test_data[0]), test_data[1]) 202 | print('test_acc = %3.2f%%' % (test_acc*100)) 203 | 204 | def accuracy(self, predictions, labels): 205 | return (np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0]) 206 | 207 | def predict(self, X): 208 | X = self._check_array(X) 209 | return self.sess.run(self.new_y_, feed_dict={self.new_pictures: X}) 210 | 211 | def evaluate(self, X, y): 212 | X = self._check_array(X) 213 | y = self._check_array(y) 214 | return self.sess.run(self.new_loss, feed_dict={self.new_pictures: X, 215 | self.new_labels: y}) 216 | 217 | def _check_array(self, ndarray): 218 | ndarray = np.array(ndarray) 219 | if len(ndarray.shape) == 1: 220 | ndarray = np.reshape(ndarray, (1, ndarray.shape[0])) 221 | return ndarray 222 | 223 | 224 | if __name__ == '__main__': 225 | print('Extract MNIST Dataset ...') 226 | 227 | mnist = input_data.read_data_sets('MNIST_data/', one_hot=True) 228 | 229 | train_data = mnist.train 230 | valid_data = mnist.validation 231 | test_data = mnist.test 232 | 233 | train_img = np.reshape(train_data.images, [-1, 28, 28, 1]) 234 | valid_img = np.reshape(valid_data.images, [-1, 28, 28, 1]) 235 | test_img = np.reshape(test_data.images, [-1, 28, 28, 1]) 236 | 237 | model = CNNLogisticClassification( 238 | shape_picture=[28, 28, 1], 239 | n_labels=10, 240 | learning_rate=0.07, 241 | dropout_ratio=[0.2, 0.1], 242 | alpha=0.1, 243 | ) 244 | model.fit( 245 | X=train_img, 246 | y=train_data.labels, 247 | epochs=10, 248 | validation_data=(valid_img, valid_data.labels), 249 | test_data=(test_img, test_data.labels), 250 | batch_size=32, 251 | ) 252 | -------------------------------------------------------------------------------- /code/04_1_Autoencoder_on_MNIST.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import random 4 | import time 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | from tensorflow.python.framework import ops 9 | from tensorflow.examples.tutorials.mnist import input_data 10 | import matplotlib.pyplot as plt 11 | 12 | tf.logging.set_verbosity(tf.logging.ERROR) 13 | 14 | 15 | class Autoencoder: 16 | 17 | def __init__(self, n_features, learning_rate=0.5, n_hidden=[1000, 500, 250, 2], alpha=0.0): 18 | self.n_features = n_features 19 | 20 | self.weights = None 21 | self.biases = None 22 | 23 | self.graph = tf.Graph() # initialize new grap 24 | self.build(n_features, learning_rate, n_hidden, alpha) # building graph 25 | self.sess 
= tf.Session(graph=self.graph) # create session by the graph 26 | 27 | def build(self, n_features, learning_rate, n_hidden, alpha): 28 | with self.graph.as_default(): 29 | ### Input 30 | self.train_features = tf.placeholder(tf.float32, shape=(None, n_features)) 31 | self.train_targets = tf.placeholder(tf.float32, shape=(None, n_features)) 32 | 33 | ### Optimalization 34 | # build neurel network structure and get their predictions and loss 35 | self.y_, self.original_loss, _ = self.structure( 36 | features=self.train_features, 37 | targets=self.train_targets, 38 | n_hidden=n_hidden) 39 | 40 | # regularization loss 41 | # weight elimination L2 regularizer 42 | self.regularizer = \ 43 | tf.reduce_sum([tf.reduce_sum( 44 | tf.pow(w, 2)/(1+tf.pow(w, 2))) for w in self.weights.values()]) \ 45 | / tf.reduce_sum( 46 | [tf.size(w, out_type=tf.float32) for w in self.weights.values()]) 47 | 48 | # total loss 49 | self.loss = self.original_loss + alpha * self.regularizer 50 | 51 | # define training operation 52 | self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) 53 | self.train_op = self.optimizer.minimize(self.loss) 54 | 55 | ### Prediction 56 | self.new_features = tf.placeholder(tf.float32, shape=(None, n_features)) 57 | self.new_targets = tf.placeholder(tf.float32, shape=(None, n_features)) 58 | self.new_y_, self.new_original_loss, self.new_encoder = self.structure( 59 | features=self.new_features, 60 | targets=self.new_targets, 61 | n_hidden=n_hidden) 62 | self.new_loss = self.new_original_loss + alpha * self.regularizer 63 | 64 | ### Initialization 65 | self.init_op = tf.global_variables_initializer() 66 | 67 | def structure(self, features, targets, n_hidden): 68 | ### Variable 69 | if (not self.weights) and (not self.biases): 70 | self.weights = {} 71 | self.biases = {} 72 | 73 | n_encoder = [self.n_features]+n_hidden 74 | for i, n in enumerate(n_encoder[:-1]): 75 | self.weights['encode{}'.format(i+1)] = \ 76 | tf.Variable(tf.truncated_normal( 77 | shape=(n, n_encoder[i+1]), stddev=0.1), dtype=tf.float32) 78 | self.biases['encode{}'.format(i+1)] = \ 79 | tf.Variable(tf.zeros(shape=(n_encoder[i+1])), dtype=tf.float32) 80 | 81 | n_decoder = list(reversed(n_hidden))+[self.n_features] 82 | for i, n in enumerate(n_decoder[:-1]): 83 | self.weights['decode{}'.format(i+1)] = \ 84 | tf.Variable(tf.truncated_normal( 85 | shape=(n, n_decoder[i+1]), stddev=0.1), dtype=tf.float32) 86 | self.biases['decode{}'.format(i+1)] = \ 87 | tf.Variable(tf.zeros(shape=(n_decoder[i+1])), dtype=tf.float32) 88 | 89 | ### Structure 90 | activation = tf.nn.relu 91 | 92 | encoder = self.get_dense_layer(features, 93 | self.weights['encode1'], 94 | self.biases['encode1'], 95 | activation=activation) 96 | 97 | for i in range(1, len(n_hidden)-1): 98 | encoder = self.get_dense_layer( 99 | encoder, 100 | self.weights['encode{}'.format(i+1)], 101 | self.biases['encode{}'.format(i+1)], 102 | activation=activation, 103 | ) 104 | 105 | encoder = self.get_dense_layer( 106 | encoder, 107 | self.weights['encode{}'.format(len(n_hidden))], 108 | self.biases['encode{}'.format(len(n_hidden))], 109 | ) 110 | 111 | decoder = self.get_dense_layer(encoder, 112 | self.weights['decode1'], 113 | self.biases['decode1'], 114 | activation=activation) 115 | 116 | for i in range(1, len(n_hidden)-1): 117 | decoder = self.get_dense_layer( 118 | decoder, 119 | self.weights['decode{}'.format(i+1)], 120 | self.biases['decode{}'.format(i+1)], 121 | activation=activation, 122 | ) 123 | 124 | y_ = self.get_dense_layer( 125 | decoder, 126 | 
self.weights['decode{}'.format(len(n_hidden))], 127 | self.biases['decode{}'.format(len(n_hidden))], 128 | activation=tf.nn.sigmoid, 129 | ) 130 | 131 | loss = tf.reduce_mean(tf.pow(targets - y_, 2)) 132 | 133 | return (y_, loss, encoder) 134 | 135 | def get_dense_layer(self, input_layer, weight, bias, activation=None): 136 | x = tf.add(tf.matmul(input_layer, weight), bias) 137 | if activation: 138 | x = activation(x) 139 | return x 140 | 141 | def fit(self, X, Y, epochs=10, validation_data=None, test_data=None, batch_size=None): 142 | X = self._check_array(X) 143 | Y = self._check_array(Y) 144 | 145 | N = X.shape[0] 146 | random.seed(9000) 147 | if not batch_size: 148 | batch_size = N 149 | 150 | self.sess.run(self.init_op) 151 | for epoch in range(epochs): 152 | print('Epoch %2d/%2d: ' % (epoch+1, epochs)) 153 | start_time = time.time() 154 | 155 | # mini-batch gradient descent 156 | index = [i for i in range(N)] 157 | random.shuffle(index) 158 | while len(index) > 0: 159 | index_size = len(index) 160 | batch_index = [index.pop() for _ in range(min(batch_size, index_size))] 161 | 162 | feed_dict = {self.train_features: X[batch_index, :], 163 | self.train_targets: Y[batch_index, :]} 164 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 165 | 166 | print('[%d/%d] loss = %9.4f ' % (N-len(index), N, loss), end='\r') 167 | 168 | # evaluate at the end of this epoch 169 | msg_valid = '' 170 | if validation_data is not None: 171 | val_loss = self.evaluate(validation_data[0], validation_data[1]) 172 | msg_valid = ', val_loss = %9.4f' % (val_loss) 173 | 174 | train_loss = self.evaluate(X, Y) 175 | print('[%d/%d] %ds loss = %9.4f %s' % (N, N, time.time()-start_time, 176 | train_loss, msg_valid)) 177 | 178 | if test_data is not None: 179 | test_loss = self.evaluate(test_data[0], test_data[1]) 180 | print('test_loss = %9.4f' % (test_loss)) 181 | 182 | def encode(self, X): 183 | X = self._check_array(X) 184 | return self.sess.run(self.new_encoder, feed_dict={self.new_features: X}) 185 | 186 | def predict(self, X): 187 | X = self._check_array(X) 188 | return self.sess.run(self.new_y_, feed_dict={self.new_features: X}) 189 | 190 | def evaluate(self, X, Y): 191 | X = self._check_array(X) 192 | return self.sess.run(self.new_loss, feed_dict={self.new_features: X, 193 | self.new_targets: Y}) 194 | 195 | def _check_array(self, ndarray): 196 | ndarray = np.array(ndarray) 197 | if len(ndarray.shape) == 1: 198 | ndarray = np.reshape(ndarray, (1, ndarray.shape[0])) 199 | return ndarray 200 | 201 | 202 | if __name__ == '__main__': 203 | print('Extract MNIST Dataset ...') 204 | 205 | mnist = input_data.read_data_sets('MNIST_data/', one_hot=True) 206 | 207 | train_data = mnist.train 208 | valid_data = mnist.validation 209 | test_data = mnist.test 210 | 211 | model_2 = Autoencoder( 212 | n_features=28*28, 213 | learning_rate=0.0005, 214 | n_hidden=[512, 32, 4], 215 | alpha=0.001, 216 | ) 217 | model_2.fit( 218 | X=train_data.images, 219 | Y=train_data.images, 220 | epochs=20, 221 | validation_data=(valid_data.images, valid_data.images), 222 | test_data=(test_data.images, test_data.images), 223 | batch_size=8, 224 | ) 225 | 226 | fig, axis = plt.subplots(2, 15, figsize=(15, 2)) 227 | for i in range(0, 15): 228 | img_original = np.reshape(test_data.images[i], (28, 28)) 229 | axis[0][i].imshow(img_original, cmap='gray') 230 | img = np.reshape(model_2.predict(test_data.images[i]), (28, 28)) 231 | axis[1][i].imshow(img, cmap='gray') 232 | plt.show() 233 | 234 | ### get code 235 | encode = 
model_2.encode(test_data.images) 236 | 237 | ### PCA 2D visualization 238 | from sklearn.decomposition import PCA 239 | pca = PCA(n_components=2) 240 | X = pca.fit_transform(encode) 241 | Y = np.argmax(test_data.labels, axis=1) 242 | 243 | # plot 244 | plt.figure(figsize=(10, 8)) 245 | plt.scatter(X[:, 0], X[:, 1], c=Y) 246 | plt.colorbar() 247 | plt.show() 248 | 249 | ### TSNE 2D visualization 250 | from sklearn.manifold import TSNE 251 | tsne = TSNE(n_components=2) 252 | X_embedded = tsne.fit_transform(encode) 253 | Y = np.argmax(test_data.labels, axis=1) 254 | 255 | # plot 256 | plt.figure(figsize=(10, 8)) 257 | plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=Y) 258 | plt.colorbar() 259 | plt.show() 260 | -------------------------------------------------------------------------------- /code/04_2_DenoiseAutoencoder_on_MNIST.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import random 4 | import time 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | from tensorflow.python.framework import ops 9 | from tensorflow.examples.tutorials.mnist import input_data 10 | import matplotlib.pyplot as plt 11 | 12 | tf.logging.set_verbosity(tf.logging.ERROR) 13 | 14 | 15 | class Autoencoder: 16 | def __init__(self, n_features, learning_rate=0.5, n_hidden=[1000, 500, 250, 2], alpha=0.0): 17 | self.n_features = n_features 18 | 19 | self.weights = None 20 | self.biases = None 21 | 22 | self.graph = tf.Graph() # initialize new grap 23 | self.build(n_features, learning_rate, n_hidden, alpha) # building graph 24 | self.sess = tf.Session(graph=self.graph) # create session by the graph 25 | 26 | def build(self, n_features, learning_rate, n_hidden, alpha): 27 | with self.graph.as_default(): 28 | ### Input 29 | self.train_features = tf.placeholder(tf.float32, shape=(None, n_features)) 30 | self.train_targets = tf.placeholder(tf.float32, shape=(None, n_features)) 31 | 32 | ### Optimalization 33 | # build neurel network structure and get their predictions and loss 34 | self.y_, self.original_loss, _ = self.structure( 35 | features=self.train_features, 36 | targets=self.train_targets, 37 | n_hidden=n_hidden) 38 | 39 | # regularization loss 40 | # weight elimination L2 regularizer 41 | self.regularizer = \ 42 | tf.reduce_sum([tf.reduce_sum( 43 | tf.pow(w, 2)/(1+tf.pow(w, 2))) for w in self.weights.values()]) \ 44 | / tf.reduce_sum( 45 | [tf.size(w, out_type=tf.float32) for w in self.weights.values()]) 46 | 47 | # total loss 48 | self.loss = self.original_loss + alpha * self.regularizer 49 | 50 | # define training operation 51 | self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) 52 | self.train_op = self.optimizer.minimize(self.loss) 53 | 54 | ### Prediction 55 | self.new_features = tf.placeholder(tf.float32, shape=(None, n_features)) 56 | self.new_targets = tf.placeholder(tf.float32, shape=(None, n_features)) 57 | self.new_y_, self.new_original_loss, self.new_encoder = self.structure( 58 | features=self.new_features, 59 | targets=self.new_targets, 60 | n_hidden=n_hidden) 61 | self.new_loss = self.new_original_loss + alpha * self.regularizer 62 | 63 | ### Initialization 64 | self.init_op = tf.global_variables_initializer() 65 | 66 | def structure(self, features, targets, n_hidden): 67 | ### Variable 68 | if (not self.weights) and (not self.biases): 69 | self.weights = {} 70 | self.biases = {} 71 | 72 | n_encoder = [self.n_features]+n_hidden 73 | for i, n in enumerate(n_encoder[:-1]): 74 | 
self.weights['encode{}'.format(i+1)] = \ 75 | tf.Variable(tf.truncated_normal( 76 | shape=(n, n_encoder[i+1]), stddev=0.1), dtype=tf.float32) 77 | self.biases['encode{}'.format(i+1)] = \ 78 | tf.Variable(tf.zeros(shape=(n_encoder[i+1])), dtype=tf.float32) 79 | 80 | n_decoder = list(reversed(n_hidden))+[self.n_features] 81 | for i, n in enumerate(n_decoder[:-1]): 82 | self.weights['decode{}'.format(i+1)] = \ 83 | tf.Variable(tf.truncated_normal( 84 | shape=(n, n_decoder[i+1]), stddev=0.1), dtype=tf.float32) 85 | self.biases['decode{}'.format(i+1)] = \ 86 | tf.Variable(tf.zeros(shape=(n_decoder[i+1])), dtype=tf.float32) 87 | 88 | ### Structure 89 | activation = tf.nn.relu 90 | 91 | encoder = self.get_dense_layer(features, 92 | self.weights['encode1'], 93 | self.biases['encode1'], 94 | activation=activation) 95 | 96 | for i in range(1, len(n_hidden)-1): 97 | encoder = self.get_dense_layer( 98 | encoder, 99 | self.weights['encode{}'.format(i+1)], 100 | self.biases['encode{}'.format(i+1)], 101 | activation=activation 102 | ) 103 | 104 | encoder = self.get_dense_layer( 105 | encoder, 106 | self.weights['encode{}'.format(len(n_hidden))], 107 | self.biases['encode{}'.format(len(n_hidden))], 108 | ) 109 | 110 | decoder = self.get_dense_layer( 111 | encoder, 112 | self.weights['decode1'], 113 | self.biases['decode1'], 114 | activation=activation 115 | ) 116 | 117 | for i in range(1, len(n_hidden)-1): 118 | decoder = self.get_dense_layer( 119 | decoder, 120 | self.weights['decode{}'.format(i+1)], 121 | self.biases['decode{}'.format(i+1)], 122 | activation=activation 123 | ) 124 | 125 | y_ = self.get_dense_layer( 126 | decoder, 127 | self.weights['decode{}'.format(len(n_hidden))], 128 | self.biases['decode{}'.format(len(n_hidden))], 129 | activation=tf.nn.sigmoid, 130 | ) 131 | 132 | loss = tf.reduce_mean(tf.pow(targets - y_, 2)) 133 | 134 | return (y_, loss, encoder) 135 | 136 | def get_dense_layer(self, input_layer, weight, bias, activation=None): 137 | x = tf.add(tf.matmul(input_layer, weight), bias) 138 | if activation: 139 | x = activation(x) 140 | return x 141 | 142 | def fit(self, X, Y, epochs=10, validation_data=None, test_data=None, batch_size=None): 143 | X = self._check_array(X) 144 | Y = self._check_array(Y) 145 | 146 | N = X.shape[0] 147 | random.seed(9000) 148 | if not batch_size: 149 | batch_size = N 150 | 151 | self.sess.run(self.init_op) 152 | for epoch in range(epochs): 153 | print('Epoch %2d/%2d: ' % (epoch+1, epochs)) 154 | start_time = time.time() 155 | 156 | # mini-batch gradient descent 157 | index = [i for i in range(N)] 158 | random.shuffle(index) 159 | while len(index) > 0: 160 | index_size = len(index) 161 | batch_index = [index.pop() for _ in range(min(batch_size, index_size))] 162 | 163 | feed_dict = {self.train_features: X[batch_index, :], 164 | self.train_targets: Y[batch_index, :]} 165 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 166 | 167 | print('[%d/%d] loss = %9.4f ' % (N-len(index), N, loss), end='\r') 168 | 169 | # evaluate at the end of this epoch 170 | msg_valid = '' 171 | if validation_data is not None: 172 | val_loss = self.evaluate(validation_data[0], validation_data[1]) 173 | msg_valid = ', val_loss = %9.4f' % (val_loss, ) 174 | 175 | train_loss = self.evaluate(X, Y) 176 | print('[%d/%d] %ds loss = %9.4f %s' % (N, N, time.time()-start_time, 177 | train_loss, msg_valid)) 178 | 179 | if test_data is not None: 180 | test_loss = self.evaluate(test_data[0], test_data[1]) 181 | print('test_loss = %9.4f' % (test_loss)) 182 | 183 | def 
encode(self, X): 184 | X = self._check_array(X) 185 | return self.sess.run(self.new_encoder, feed_dict={self.new_features: X}) 186 | 187 | def predict(self, X): 188 | X = self._check_array(X) 189 | return self.sess.run(self.new_y_, feed_dict={self.new_features: X}) 190 | 191 | def evaluate(self, X, Y): 192 | X = self._check_array(X) 193 | return self.sess.run(self.new_loss, feed_dict={self.new_features: X, 194 | self.new_targets: Y}) 195 | 196 | def _check_array(self, ndarray): 197 | ndarray = np.array(ndarray) 198 | if len(ndarray.shape) == 1: 199 | ndarray = np.reshape(ndarray, (1, ndarray.shape[0])) 200 | return ndarray 201 | 202 | 203 | def add_noise(ndarr): 204 | noise_factor = 0.3 205 | noisy_ndarr = ndarr + noise_factor * np.random.normal(loc=0.0, 206 | scale=1.0, 207 | size=ndarr.shape) 208 | return noisy_ndarr 209 | 210 | 211 | if __name__ == '__main__': 212 | print('Extract MNIST Dataset ...') 213 | 214 | mnist = input_data.read_data_sets('MNIST_data/', one_hot=True) 215 | 216 | train_data = mnist.train 217 | valid_data = mnist.validation 218 | test_data = mnist.test 219 | 220 | # add noise 221 | noisy_train_img = add_noise(train_data.images) 222 | noisy_valid_img = add_noise(valid_data.images) 223 | noisy_test_img = add_noise(test_data.images) 224 | 225 | # train model 226 | denoise_model = Autoencoder( 227 | n_features=28*28, 228 | learning_rate=0.0003, 229 | n_hidden=[512, 32, 4], 230 | alpha=1.0, 231 | ) 232 | denoise_model.fit( 233 | X=noisy_train_img, 234 | Y=train_data.images, 235 | epochs=20, 236 | validation_data=(noisy_valid_img, valid_data.images), 237 | test_data=(noisy_test_img, test_data.images), 238 | batch_size=8, 239 | ) 240 | 241 | # plot 242 | fig, axis = plt.subplots(3, 15, figsize=(15, 3)) 243 | for i in range(0, 15): 244 | img_original = np.reshape(test_data.images[i], (28, 28)) 245 | axis[0][i].imshow(img_original, cmap='gray') 246 | img_noisy = np.reshape(noisy_test_img[i], (28, 28)) 247 | axis[1][i].imshow(img_noisy, cmap='gray') 248 | img = np.reshape(denoise_model.predict(noisy_test_img[i]), (28, 28)) 249 | axis[2][i].imshow(img, cmap='gray') 250 | plt.show() 251 | -------------------------------------------------------------------------------- /code/05_1_word2vec_SkipGram.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import collections 4 | import os 5 | import zipfile 6 | import random 7 | import math 8 | import time 9 | from urllib.request import urlretrieve 10 | 11 | import tensorflow as tf 12 | import numpy as np 13 | 14 | tf.logging.set_verbosity(tf.logging.ERROR) 15 | 16 | VOCABULARY_SIZE = 100000 17 | 18 | 19 | def maybe_download(url, filename, expected_bytes): 20 | """Download a file if not present, and make sure it's the right size.""" 21 | if not os.path.exists(filename): 22 | filename, _ = urlretrieve(url, filename) 23 | statinfo = os.stat(filename) 24 | if statinfo.st_size == expected_bytes: 25 | print('Found and verified %s' % filename) 26 | else: 27 | print(statinfo.st_size) 28 | raise Exception( 29 | 'Failed to verify ' + filename + '. 
Can you get to it with a browser?') 30 | return filename 31 | 32 | 33 | def read_data(filename): 34 | """Extract the first file enclosed in a zip file as a list of words""" 35 | with zipfile.ZipFile(filename) as f: 36 | data = tf.compat.as_str(f.read(f.namelist()[0])).split() 37 | return data 38 | 39 | 40 | def build_dataset(words, vocabulary_size=VOCABULARY_SIZE): 41 | count = [['UNK', -1]] 42 | count.extend(collections.Counter(words).most_common(vocabulary_size - 1)) 43 | dictionary = dict() 44 | for word, _ in count: 45 | dictionary[word] = len(dictionary) 46 | data = list() 47 | unk_count = 0 48 | for word in words: 49 | if word in dictionary: 50 | index = dictionary[word] 51 | else: 52 | index = 0 # dictionary['UNK'] 53 | unk_count = unk_count + 1 54 | data.append(index) 55 | count[0][1] = unk_count 56 | reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 57 | return data, count, dictionary, reverse_dictionary 58 | 59 | 60 | def skip_gram_batch_generator(data, batch_size, num_skips, skip_window): 61 | assert batch_size % num_skips == 0 62 | assert num_skips <= 2 * skip_window 63 | 64 | batch = np.ndarray(shape=(batch_size), dtype=np.int32) 65 | labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32) 66 | 67 | span = 2 * skip_window + 1 # [ skip_window target skip_window ] 68 | buffer = collections.deque(maxlen=span) 69 | 70 | # initialization 71 | data_index = 0 72 | for _ in range(span): 73 | buffer.append(data[data_index]) 74 | data_index = (data_index + 1) % len(data) 75 | 76 | # generate 77 | k = 0 78 | while True: 79 | target = skip_window # target label at the center of the buffer 80 | targets_to_avoid = [target] 81 | for _ in range(num_skips): 82 | while target in targets_to_avoid: 83 | target = random.randint(0, span - 1) 84 | targets_to_avoid.append(target) 85 | batch[k] = buffer[skip_window] 86 | labels[k, 0] = buffer[target] 87 | k += 1 88 | 89 | # Recycle 90 | if data_index == len(data): 91 | data_index = 0 92 | 93 | # scan data 94 | buffer.append(data[data_index]) 95 | data_index = (data_index + 1) % len(data) 96 | 97 | # Enough num to output 98 | if k == batch_size: 99 | k = 0 100 | yield (batch.copy(), labels.copy()) 101 | 102 | 103 | class SkipGram: 104 | 105 | def __init__(self, n_vocabulary, n_embedding, reverse_dictionary, learning_rate=1.0): 106 | self.n_vocabulary = n_vocabulary 107 | self.n_embedding = n_embedding 108 | self.reverse_dictionary = reverse_dictionary 109 | 110 | self.weights = None 111 | self.biases = None 112 | 113 | self.graph = tf.Graph() # initialize new grap 114 | self.build(learning_rate) # building graph 115 | self.sess = tf.Session(graph=self.graph) # create session by the graph 116 | 117 | def build(self, learning_rate): 118 | with self.graph.as_default(): 119 | ### Input 120 | self.train_dataset = tf.placeholder(tf.int32, shape=[None]) 121 | self.train_labels = tf.placeholder(tf.int32, shape=[None, 1]) 122 | 123 | ### Optimalization 124 | # build neurel network structure and get their loss 125 | self.loss = self.structure( 126 | dataset=self.train_dataset, 127 | labels=self.train_labels, 128 | ) 129 | 130 | # normalize embeddings 131 | self.norm = tf.sqrt( 132 | tf.reduce_sum( 133 | tf.square(self.weights['embeddings']), 1, keep_dims=True)) 134 | self.normalized_embeddings = self.weights['embeddings'] / self.norm 135 | 136 | # define training operation 137 | self.optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate) 138 | self.train_op = self.optimizer.minimize(self.loss) 139 | 140 | ### Prediction 141 | 
self.new_dataset = tf.placeholder(tf.int32, shape=[None]) 142 | self.new_labels = tf.placeholder(tf.int32, shape=[None, 1]) 143 | self.new_loss = self.structure( 144 | dataset=self.new_dataset, 145 | labels=self.new_labels, 146 | ) 147 | 148 | # similarity 149 | self.new_embed = tf.nn.embedding_lookup( 150 | self.normalized_embeddings, self.new_dataset) 151 | self.new_similarity = tf.matmul(self.new_embed, 152 | tf.transpose(self.normalized_embeddings)) 153 | 154 | ### Initialization 155 | self.init_op = tf.global_variables_initializer() 156 | 157 | def structure(self, dataset, labels): 158 | ### Variable 159 | if (not self.weights) and (not self.biases): 160 | self.weights = { 161 | 'embeddings': tf.Variable( 162 | tf.random_uniform([self.n_vocabulary, self.n_embedding], 163 | -1.0, 1.0)), 164 | 'softmax': tf.Variable( 165 | tf.truncated_normal([self.n_vocabulary, self.n_embedding], 166 | stddev=1.0/math.sqrt(self.n_embedding))) 167 | } 168 | self.biases = { 169 | 'softmax': tf.Variable(tf.zeros([self.n_vocabulary])) 170 | } 171 | 172 | ### Structure 173 | # Look up embeddings for inputs. 174 | embed = tf.nn.embedding_lookup(self.weights['embeddings'], dataset) 175 | 176 | # Compute the softmax loss, using a sample of the negative labels each time. 177 | num_softmax_sampled = 64 178 | 179 | loss = tf.reduce_mean( 180 | tf.nn.sampled_softmax_loss(weights=self.weights['softmax'], 181 | biases=self.biases['softmax'], 182 | inputs=embed, 183 | labels=labels, 184 | num_sampled=num_softmax_sampled, 185 | num_classes=self.n_vocabulary)) 186 | 187 | return loss 188 | 189 | def initialize(self): 190 | self.weights = None 191 | self.biases = None 192 | self.sess.run(self.init_op) 193 | 194 | def online_fit(self, X, Y): 195 | feed_dict = {self.train_dataset: X, 196 | self.train_labels: Y} 197 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 198 | 199 | return loss 200 | 201 | def nearest_words(self, X, top_nearest): 202 | similarity = self.sess.run(self.new_similarity, 203 | feed_dict={self.new_dataset: X}) 204 | X_size = X.shape[0] 205 | 206 | valid_words = [] 207 | nearests = [] 208 | for i in range(X_size): 209 | valid_word = self.find_word(X[i]) 210 | valid_words.append(valid_word) 211 | 212 | # select highest similarity word 213 | nearest = (-similarity[i, :]).argsort()[1:top_nearest+1] 214 | nearests.append(list(map(lambda x: self.find_word(x), nearest))) 215 | 216 | return (valid_words, np.array(nearests)) 217 | 218 | def evaluate(self, X, Y): 219 | return self.sess.run(self.new_loss, feed_dict={self.new_dataset: X, 220 | self.new_labels: Y}) 221 | 222 | def embedding_matrix(self): 223 | return self.sess.run(self.normalized_embeddings) 224 | 225 | def find_word(self, index): 226 | return self.reverse_dictionary[index] 227 | 228 | 229 | def main(): 230 | ### download and load data 231 | print('Downloading text8.zip') 232 | filename = maybe_download('http://mattmahoney.net/dc/text8.zip', './text8.zip', 31344016) 233 | 234 | print('=====') 235 | words = read_data(filename) 236 | print('Data size %d' % len(words)) 237 | print('First 10 words: {}'.format(words[:10])) 238 | 239 | print('=====') 240 | data, count, dictionary, reverse_dictionary = build_dataset(words, 241 | vocabulary_size=VOCABULARY_SIZE) 242 | del words # Hint to reduce memory. 
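    # --- Illustrative check (added, not part of the original script) ---
    # A quick look at what skip_gram_batch_generator yields, using made-up word
    # indices. With skip_window=1 and num_skips=2, every center word appears
    # twice in `batch`, once for each (randomly ordered) context word in `labels`,
    # which is exactly the (input, label) pairing the sampled-softmax loss expects.
    toy_gen = skip_gram_batch_generator(data=[5, 6, 7, 8, 9] * 20,
                                        batch_size=8, num_skips=2, skip_window=1)
    toy_batch, toy_labels = next(toy_gen)
    print(toy_batch)      # e.g. [6 6 7 7 8 8 9 9]    (center words, repeated num_skips times)
    print(toy_labels.T)   # e.g. [[5 7 6 8 7 9 8 5]]  (a sampled context word for each center)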
243 | 244 | print('Most common words (+UNK)', count[:5]) 245 | print('Sample data', data[:10]) 246 | 247 | ### train model 248 | # build skip-gram batch generator 249 | batch_generator = skip_gram_batch_generator(data=data, 250 | batch_size=128, 251 | num_skips=2, 252 | skip_window=1) 253 | 254 | # build skip-gram model 255 | model_SkipGram = SkipGram(n_vocabulary=VOCABULARY_SIZE, 256 | n_embedding=100, 257 | reverse_dictionary=reverse_dictionary, 258 | learning_rate=1.0) 259 | # initial model 260 | model_SkipGram.initialize() 261 | 262 | # online training 263 | epochs = 50 264 | num_batchs_in_epoch = 5000 265 | 266 | for epoch in range(epochs): 267 | start_time = time.time() 268 | avg_loss = 0 269 | for _ in range(num_batchs_in_epoch): 270 | batch, labels = next(batch_generator) 271 | loss = model_SkipGram.online_fit(X=batch, 272 | Y=labels) 273 | avg_loss += loss 274 | avg_loss = avg_loss / num_batchs_in_epoch 275 | print('Epoch %d/%d: %ds loss = %9.4f' 276 | % (epoch+1, epochs, time.time()-start_time, avg_loss)) 277 | 278 | ### nearest words 279 | valid_words_index = np.array([10, 20, 30, 40, 50, 210, 239, 392, 396]) 280 | 281 | valid_words, nearests = model_SkipGram.nearest_words(X=valid_words_index, top_nearest=8) 282 | for i in range(len(valid_words)): 283 | print('Nearest to \'{}\': '.format(valid_words[i]), nearests[i]) 284 | 285 | ### visualization 286 | from matplotlib import pylab 287 | from sklearn.manifold import TSNE 288 | 289 | def plot(embeddings, labels): 290 | assert embeddings.shape[0] >= len(labels), 'More labels than embeddings' 291 | pylab.figure(figsize=(15, 15)) # in inches 292 | for i, label in enumerate(labels): 293 | x, y = embeddings[i, :] 294 | pylab.scatter(x, y, color='blue') 295 | pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points', 296 | ha='right', va='bottom') 297 | pylab.show() 298 | 299 | visualization_words = 800 300 | # transform embeddings to 2D by t-SNE 301 | embed = model_SkipGram.embedding_matrix()[1:visualization_words+1, :] 302 | tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000, method='exact') 303 | two_d_embed = tsne.fit_transform(embed) 304 | # list labels 305 | words = [model_SkipGram.reverse_dictionary[i] for i in range(1, visualization_words+1)] 306 | # plot 307 | plot(two_d_embed, words) 308 | 309 | 310 | if __name__ == '__main__': 311 | main() 312 | -------------------------------------------------------------------------------- /code/05_2_word2vec_CBOW.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import collections 4 | import os 5 | import zipfile 6 | import math 7 | import time 8 | from urllib.request import urlretrieve 9 | 10 | import tensorflow as tf 11 | import numpy as np 12 | 13 | tf.logging.set_verbosity(tf.logging.ERROR) 14 | 15 | VOCABULARY_SIZE = 100000 16 | 17 | 18 | def maybe_download(url, filename, expected_bytes): 19 | """Download a file if not present, and make sure it's the right size.""" 20 | if not os.path.exists(filename): 21 | filename, _ = urlretrieve(url, filename) 22 | statinfo = os.stat(filename) 23 | if statinfo.st_size == expected_bytes: 24 | print('Found and verified %s' % filename) 25 | else: 26 | print(statinfo.st_size) 27 | raise Exception( 28 | 'Failed to verify ' + filename + '. 
Can you get to it with a browser?') 29 | return filename 30 | 31 | 32 | def read_data(filename): 33 | """Extract the first file enclosed in a zip file as a list of words""" 34 | with zipfile.ZipFile(filename) as f: 35 | data = tf.compat.as_str(f.read(f.namelist()[0])).split() 36 | return data 37 | 38 | 39 | def build_dataset(words, vocabulary_size=VOCABULARY_SIZE): 40 | count = [['UNK', -1]] 41 | count.extend(collections.Counter(words).most_common(vocabulary_size - 1)) 42 | dictionary = dict() 43 | for word, _ in count: 44 | dictionary[word] = len(dictionary) 45 | data = list() 46 | unk_count = 0 47 | for word in words: 48 | if word in dictionary: 49 | index = dictionary[word] 50 | else: 51 | index = 0 # dictionary['UNK'] 52 | unk_count = unk_count + 1 53 | data.append(index) 54 | count[0][1] = unk_count 55 | reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 56 | return data, count, dictionary, reverse_dictionary 57 | 58 | 59 | def cbow_batch_generator(data, batch_size, context_window): 60 | span = 2 * context_window + 1 # [ context_window target context_window ] 61 | num_bow = span - 1 62 | 63 | batch = np.ndarray(shape=(batch_size, num_bow), dtype=np.int32) 64 | labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32) 65 | 66 | buffer = collections.deque(maxlen=span) 67 | 68 | # initialization 69 | data_index = 0 70 | for _ in range(span): 71 | buffer.append(data[data_index]) 72 | data_index = (data_index + 1) % len(data) 73 | 74 | # generate 75 | k = 0 76 | target = context_window 77 | while True: 78 | bow = list(buffer) 79 | del bow[target] 80 | for i, w in enumerate(bow): 81 | batch[k, i] = w 82 | labels[k, 0] = buffer[target] 83 | k += 1 84 | 85 | # Recycle 86 | if data_index == len(data): 87 | data_index = 0 88 | 89 | # scan data 90 | buffer.append(data[data_index]) 91 | data_index = (data_index + 1) % len(data) 92 | 93 | # Enough num to output 94 | if k == batch_size: 95 | k = 0 96 | yield (batch, labels) 97 | 98 | 99 | class CBOW: 100 | 101 | def __init__(self, n_vocabulary, n_embedding, 102 | context_window, reverse_dictionary, learning_rate=1.0): 103 | self.n_vocabulary = n_vocabulary 104 | self.n_embedding = n_embedding 105 | self.context_window = context_window 106 | self.reverse_dictionary = reverse_dictionary 107 | 108 | self.weights = None 109 | self.biases = None 110 | 111 | self.graph = tf.Graph() # initialize new grap 112 | self.build(learning_rate) # building graph 113 | self.sess = tf.Session(graph=self.graph) # create session by the graph 114 | 115 | def build(self, learning_rate): 116 | with self.graph.as_default(): 117 | ### Input 118 | self.train_dataset = tf.placeholder(tf.int32, shape=[None, self.context_window*2]) 119 | self.train_labels = tf.placeholder(tf.int32, shape=[None, 1]) 120 | 121 | ### Optimalization 122 | # build neurel network structure and get their predictions and loss 123 | self.loss = self.structure( 124 | dataset=self.train_dataset, 125 | labels=self.train_labels, 126 | ) 127 | 128 | # normalize embeddings 129 | self.norm = tf.sqrt( 130 | tf.reduce_sum( 131 | tf.square(self.weights['embeddings']), 1, keep_dims=True)) 132 | self.normalized_embeddings = self.weights['embeddings'] / self.norm 133 | 134 | # define training operation 135 | self.optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate) 136 | self.train_op = self.optimizer.minimize(self.loss) 137 | 138 | ### Prediction 139 | self.new_dataset = tf.placeholder(tf.int32, shape=[None]) 140 | self.new_labels = tf.placeholder(tf.int32, shape=[None, 1]) 141 | 
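            # NOTE (added comment): unlike SkipGram in 05_1_word2vec_SkipGram.py,
            # the prediction branch here only builds the similarity op below;
            # self.new_loss is never created, so the evaluate() method further
            # down would raise an AttributeError if it were called (main() never
            # calls it).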
142 | # similarity 143 | self.new_embed = tf.nn.embedding_lookup( 144 | self.normalized_embeddings, self.new_dataset) 145 | 146 | self.new_similarity = tf.matmul(self.new_embed, 147 | tf.transpose(self.normalized_embeddings)) 148 | 149 | ### Initialization 150 | self.init_op = tf.global_variables_initializer() 151 | 152 | def structure(self, dataset, labels): 153 | ### Variable 154 | if (not self.weights) and (not self.biases): 155 | self.weights = { 156 | 'embeddings': tf.Variable( 157 | tf.random_uniform([self.n_vocabulary, self.n_embedding], 158 | -1.0, 1.0)), 159 | 'softmax': tf.Variable( 160 | tf.truncated_normal([self.n_vocabulary, self.n_embedding], 161 | stddev=1.0 / math.sqrt(self.n_embedding))) 162 | } 163 | self.biases = { 164 | 'softmax': tf.Variable(tf.zeros([self.n_vocabulary])) 165 | } 166 | 167 | ### Structure 168 | # Look up embeddings for inputs. 169 | embed_bow = tf.nn.embedding_lookup(self.weights['embeddings'], dataset) 170 | embed = tf.reduce_mean(embed_bow, axis=1) 171 | 172 | # Compute the softmax loss, using a sample of the negative labels each time. 173 | num_softmax_sampled = 64 174 | 175 | loss = tf.reduce_mean( 176 | tf.nn.sampled_softmax_loss(weights=self.weights['softmax'], 177 | biases=self.biases['softmax'], 178 | inputs=embed, 179 | labels=labels, 180 | num_sampled=num_softmax_sampled, 181 | num_classes=self.n_vocabulary)) 182 | 183 | return loss 184 | 185 | def initialize(self): 186 | self.weights = None 187 | self.biases = None 188 | self.sess.run(self.init_op) 189 | 190 | def online_fit(self, X, Y): 191 | feed_dict = {self.train_dataset: X, 192 | self.train_labels: Y} 193 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 194 | 195 | return loss 196 | 197 | def nearest_words(self, X, top_nearest): 198 | similarity = self.sess.run(self.new_similarity, feed_dict={self.new_dataset: X}) 199 | X_size = X.shape[0] 200 | 201 | valid_words = [] 202 | nearests = [] 203 | for i in range(X_size): 204 | valid_word = self.find_word(X[i]) 205 | valid_words.append(valid_word) 206 | 207 | # select highest similarity word 208 | nearest = (-similarity[i, :]).argsort()[1:top_nearest+1] 209 | nearests.append(list(map(lambda x: self.find_word(x), nearest))) 210 | 211 | return (valid_words, np.array(nearests)) 212 | 213 | def evaluate(self, X, Y): 214 | return self.sess.run(self.new_loss, feed_dict={self.new_dataset: X, 215 | self.new_labels: Y}) 216 | 217 | def embedding_matrix(self): 218 | return self.sess.run(self.normalized_embeddings) 219 | 220 | def find_word(self, index): 221 | return self.reverse_dictionary[index] 222 | 223 | 224 | def main(): 225 | ### download and load data 226 | print('Downloading text8.zip') 227 | filename = maybe_download('http://mattmahoney.net/dc/text8.zip', './text8.zip', 31344016) 228 | 229 | print('=====') 230 | words = read_data(filename) 231 | print('Data size %d' % len(words)) 232 | print('First 10 words: {}'.format(words[:10])) 233 | 234 | print('=====') 235 | data, count, dictionary, reverse_dictionary = build_dataset(words, 236 | vocabulary_size=VOCABULARY_SIZE) 237 | del words # Hint to reduce memory. 
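    # A rough illustration of what build_dataset and cbow_batch_generator
    # return, assuming the tiny toy corpus below; the exact indices depend on
    # word frequencies and tie ordering, so treat the numbers as illustrative:
    #
    #   toy = ['the', 'quick', 'brown', 'fox', 'the', 'quick']
    #   data, count, dic, rev = build_dataset(toy, vocabulary_size=5)
    #   # dic  -> {'UNK': 0, 'the': 1, 'quick': 2, 'brown': 3, 'fox': 4}
    #   # data -> [1, 2, 3, 4, 1, 2]
    #   gen = cbow_batch_generator(data, batch_size=2, context_window=1)
    #   batch, labels = next(gen)
    #   # batch  -> [[1, 3], [2, 4]]  (left/right context of each centre word)
    #   # labels -> [[2], [3]]        (the centre word the model must predict)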
238 | 239 | print('Most common words (+UNK)', count[:5]) 240 | print('Sample data', data[:10]) 241 | 242 | ### train model 243 | context_window = 1 244 | 245 | # build CBOW batch generator 246 | batch_generator = cbow_batch_generator(data=data, 247 | batch_size=128, 248 | context_window=context_window) 249 | 250 | # build CBOW model 251 | model_CBOW = CBOW(n_vocabulary=VOCABULARY_SIZE, 252 | n_embedding=100, 253 | context_window=context_window, 254 | reverse_dictionary=reverse_dictionary, 255 | learning_rate=1.0) 256 | 257 | # initialize model 258 | model_CBOW.initialize() 259 | 260 | # online training 261 | epochs = 50 262 | num_batchs_in_epoch = 5000 263 | 264 | for epoch in range(epochs): 265 | start_time = time.time() 266 | avg_loss = 0 267 | for _ in range(num_batchs_in_epoch): 268 | batch, labels = next(batch_generator) 269 | loss = model_CBOW.online_fit(X=batch, 270 | Y=labels) 271 | avg_loss += loss 272 | avg_loss = avg_loss / num_batchs_in_epoch 273 | print('Epoch %d/%d: %ds loss = %9.4f' % (epoch+1, epochs, time.time()-start_time, 274 | avg_loss)) 275 | 276 | 277 | ### nearest words 278 | valid_words_index = np.array([10, 20, 30, 40, 50, 210, 239, 392, 396]) 279 | 280 | valid_words, nearests = model_CBOW.nearest_words(X=valid_words_index, top_nearest=8) 281 | for i in range(len(valid_words)): 282 | print('Nearest to \'{}\': '.format(valid_words[i]), nearests[i]) 283 | 284 | 285 | ### visualization 286 | from matplotlib import pylab 287 | from sklearn.manifold import TSNE 288 | 289 | def plot(embeddings, labels): 290 | assert embeddings.shape[0] >= len(labels), 'More labels than embeddings' 291 | pylab.figure(figsize=(15, 15)) # in inches 292 | for i, label in enumerate(labels): 293 | x, y = embeddings[i, :] 294 | pylab.scatter(x, y, color='blue') 295 | pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points', 296 | ha='right', va='bottom') 297 | pylab.show() 298 | 299 | visualization_words = 800 300 | # transform embeddings to 2D by t-SNE 301 | embed = model_CBOW.embedding_matrix()[1:visualization_words+1, :] 302 | tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000, method='exact') 303 | two_d_embed = tsne.fit_transform(embed) 304 | # list labels 305 | words = [model_CBOW.reverse_dictionary[i] for i in range(1, visualization_words+1)] 306 | # plot 307 | plot(two_d_embed, words) 308 | 309 | 310 | if __name__ == '__main__': 311 | main() 312 | -------------------------------------------------------------------------------- /code/06_LSTM.py: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/python3.6 2 | 3 | import os 4 | import random 5 | import string 6 | import zipfile 7 | from urllib.request import urlretrieve 8 | import time 9 | 10 | import numpy as np 11 | import tensorflow as tf 12 | 13 | tf.logging.set_verbosity(tf.logging.ERROR) 14 | 15 | LETTER_SIZE = len(string.ascii_lowercase) + 1 # [a-z] + ' ' 16 | FIRST_LETTER_ASCII = ord(string.ascii_lowercase[0]) 17 | 18 | 19 | def maybe_download(url, filename, expected_bytes): 20 | """Download a file if not present, and make sure it's the right size.""" 21 | if not os.path.exists(filename): 22 | filename, _ = urlretrieve(url, filename) 23 | statinfo = os.stat(filename) 24 | if statinfo.st_size == expected_bytes: 25 | print('Found and verified %s' % filename) 26 | else: 27 | print(statinfo.st_size) 28 | raise Exception( 29 | 'Failed to verify ' + filename + '. 
Can you get to it with a browser?') 30 | return filename 31 | 32 | 33 | def read_data(filename): 34 | with zipfile.ZipFile(filename) as f: 35 | name = f.namelist()[0] 36 | data = tf.compat.as_str(f.read(name)) 37 | return data 38 | 39 | 40 | def char2id(char): 41 | if char in string.ascii_lowercase: 42 | return ord(char) - FIRST_LETTER_ASCII + 1 43 | elif char == ' ': 44 | return 0 45 | else: 46 | print('Unexpected character: %s' % char) 47 | return 0 48 | 49 | 50 | def id2char(dictid): 51 | if dictid > 0: 52 | return chr(dictid + FIRST_LETTER_ASCII - 1) 53 | else: 54 | return ' ' 55 | 56 | 57 | def characters(probabilities): 58 | """Turn a 1-hot encoding or a probability distribution over the possible 59 | characters back into its (most likely) character representation.""" 60 | return [id2char(c) for c in np.argmax(probabilities, 1)] 61 | 62 | 63 | def batches2string(batches): 64 | """Convert a sequence of batches back into their (most likely) string 65 | representation.""" 66 | s = [''] * batches[0].shape[0] 67 | for b in batches: 68 | s = [''.join(x) for x in zip(s, characters(b))] 69 | return s 70 | 71 | 72 | def rnn_batch_generator(text, batch_size, num_unrollings): 73 | text_size = len(text) 74 | 75 | ### initialization 76 | segment = text_size // batch_size 77 | cursors = [offset * segment for offset in range(batch_size)] 78 | 79 | batches = [] 80 | batch_initial = np.zeros(shape=(batch_size, LETTER_SIZE), dtype=np.float) 81 | for i in range(batch_size): 82 | cursor = cursors[i] 83 | id_ = char2id(text[cursor]) 84 | batch_initial[i][id_] = 1.0 85 | 86 | # move cursor 87 | cursors[i] = (cursors[i] + 1) % text_size 88 | 89 | batches.append(batch_initial) 90 | 91 | ### generate loop 92 | while True: 93 | batches = [batches[-1], ] 94 | for _ in range(num_unrollings): 95 | batch = np.zeros(shape=(batch_size, LETTER_SIZE), dtype=np.float) 96 | for i in range(batch_size): 97 | cursor = cursors[i] 98 | id_ = char2id(text[cursor]) 99 | batch[i][id_] = 1.0 100 | 101 | # move cursor 102 | cursors[i] = (cursors[i] + 1) % text_size 103 | batches.append(batch) 104 | 105 | yield batches # [last batch of previous batches] + [unrollings] 106 | 107 | 108 | def sample_distribution(distribution): 109 | """Sample one element from a distribution assumed to be an array of normalized 110 | probabilities. 
111 | """ 112 | r = random.uniform(0, 1) 113 | s = 0 114 | for i in range(len(distribution)): 115 | s += distribution[i] 116 | if s >= r: 117 | return i 118 | return len(distribution) - 1 119 | 120 | 121 | def sample(prediction): 122 | """Turn a (column) prediction into 1-hot encoded samples.""" 123 | p = np.zeros(shape=[1, LETTER_SIZE], dtype=np.float) 124 | p[0, sample_distribution(prediction[0])] = 1.0 125 | return p 126 | 127 | 128 | def logprob(predictions, labels): 129 | """Log-probability of the true labels in a predicted batch.""" 130 | predictions[predictions < 1e-10] = 1e-10 131 | return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0] 132 | 133 | 134 | class LSTM: 135 | 136 | def __init__(self, n_unrollings, n_memory, n_train_batch, learning_rate=1.0): 137 | self.n_unrollings = n_unrollings 138 | self.n_memory = n_memory 139 | 140 | self.weights = None 141 | self.biases = None 142 | self.saved = None 143 | 144 | self.graph = tf.Graph() # initialize new grap 145 | self.build(learning_rate, n_train_batch) # building graph 146 | self.sess = tf.Session(graph=self.graph) # create session by the graph 147 | 148 | def build(self, learning_rate, n_train_batch): 149 | with self.graph.as_default(): 150 | ### Input 151 | self.train_data = list() 152 | for _ in range(self.n_unrollings + 1): 153 | self.train_data.append( 154 | tf.placeholder(tf.float32, shape=[n_train_batch, LETTER_SIZE])) 155 | self.train_inputs = self.train_data[:self.n_unrollings] 156 | self.train_labels = self.train_data[1:] # labels are inputs shifted by one time step. 157 | 158 | 159 | ### Optimalization 160 | # build neurel network structure and get their loss 161 | self.y_, self.loss = self.structure( 162 | inputs=self.train_inputs, 163 | labels=self.train_labels, 164 | n_batch=n_train_batch, 165 | ) 166 | 167 | # define training operation 168 | 169 | self.optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate) 170 | 171 | # gradient clipping 172 | 173 | # output gradients one by one 174 | gradients, v = zip(*self.optimizer.compute_gradients(self.loss)) 175 | gradients, _ = tf.clip_by_global_norm(gradients, 1.25) # clip gradient 176 | # apply clipped gradients 177 | self.train_op = self.optimizer.apply_gradients(zip(gradients, v)) 178 | 179 | ### Sampling and validation eval: batch 1, no unrolling. 180 | self.sample_input = tf.placeholder(tf.float32, shape=[1, LETTER_SIZE]) 181 | 182 | saved_sample_output = tf.Variable(tf.zeros([1, self.n_memory])) 183 | saved_sample_state = tf.Variable(tf.zeros([1, self.n_memory])) 184 | self.reset_sample_state = tf.group( # reset sample state operator 185 | saved_sample_output.assign(tf.zeros([1, self.n_memory])), 186 | saved_sample_state.assign(tf.zeros([1, self.n_memory]))) 187 | 188 | sample_output, sample_state = self.lstm_cell( 189 | self.sample_input, saved_sample_output, saved_sample_state) 190 | with tf.control_dependencies([saved_sample_output.assign(sample_output), 191 | saved_sample_state.assign(sample_state)]): 192 | # use tf.control_dependencies to make sure 'saving' before 'prediction' 193 | 194 | self.sample_prediction = tf.nn.softmax( 195 | tf.nn.xw_plus_b(sample_output, 196 | self.weights['classifier'], 197 | self.biases['classifier'])) 198 | 199 | ### Initialization 200 | self.init_op = tf.global_variables_initializer() 201 | 202 | def lstm_cell(self, i, o, state): 203 | """Create a LSTM cell. 
See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf 204 | Note that in this formulation, we omit the various connections between the 205 | previous state and the gates.""" 206 | ## Build Input Gate 207 | ix = self.weights['input_gate_i'] 208 | im = self.weights['input_gate_o'] 209 | ib = self.biases['input_gate'] 210 | input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib) 211 | ## Build Forget Gate 212 | fx = self.weights['forget_gate_i'] 213 | fm = self.weights['forget_gate_o'] 214 | fb = self.biases['forget_gate'] 215 | forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb) 216 | ## Memory 217 | cx = self.weights['memory_i'] 218 | cm = self.weights['memory_o'] 219 | cb = self.biases['memory'] 220 | update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb 221 | ## Update State 222 | state = forget_gate * state + input_gate * tf.tanh(update) 223 | ## Build Output Gate 224 | ox = self.weights['output_gate_i'] 225 | om = self.weights['output_gate_o'] 226 | ob = self.biases['output_gate'] 227 | output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob) 228 | ## Ouput 229 | output = output_gate * tf.tanh(state) 230 | 231 | return output, state 232 | 233 | def structure(self, inputs, labels, n_batch): 234 | ### Variable 235 | if (not self.weights) or (not self.biases) or (not self.saved): 236 | self.weights = { 237 | 'input_gate_i': tf.Variable(tf.truncated_normal( 238 | [LETTER_SIZE, self.n_memory], -0.1, 0.1)), 239 | 'input_gate_o': tf.Variable(tf.truncated_normal( 240 | [self.n_memory, self.n_memory], -0.1, 0.1)), 241 | 'forget_gate_i': tf.Variable(tf.truncated_normal( 242 | [LETTER_SIZE, self.n_memory], -0.1, 0.1)), 243 | 'forget_gate_o': tf.Variable(tf.truncated_normal( 244 | [self.n_memory, self.n_memory], -0.1, 0.1)), 245 | 'output_gate_i': tf.Variable(tf.truncated_normal( 246 | [LETTER_SIZE, self.n_memory], -0.1, 0.1)), 247 | 'output_gate_o': tf.Variable(tf.truncated_normal( 248 | [self.n_memory, self.n_memory], -0.1, 0.1)), 249 | 'memory_i': tf.Variable(tf.truncated_normal( 250 | [LETTER_SIZE, self.n_memory], -0.1, 0.1)), 251 | 'memory_o': tf.Variable(tf.truncated_normal( 252 | [self.n_memory, self.n_memory], -0.1, 0.1)), 253 | 'classifier': tf.Variable(tf.truncated_normal( 254 | [self.n_memory, LETTER_SIZE], -0.1, 0.1)), 255 | 256 | } 257 | self.biases = { 258 | 'input_gate': tf.Variable(tf.zeros([1, self.n_memory])), 259 | 'forget_gate': tf.Variable(tf.zeros([1, self.n_memory])), 260 | 'output_gate': tf.Variable(tf.zeros([1, self.n_memory])), 261 | 'memory': tf.Variable(tf.zeros([1, self.n_memory])), 262 | 'classifier': tf.Variable(tf.zeros([LETTER_SIZE])), 263 | } 264 | 265 | # Variables saving state across unrollings. 266 | saved_output = tf.Variable(tf.zeros([n_batch, self.n_memory]), trainable=False) 267 | saved_state = tf.Variable(tf.zeros([n_batch, self.n_memory]), trainable=False) 268 | 269 | ### Structure 270 | # Unrolled LSTM loop. 271 | outputs = list() 272 | output = saved_output 273 | state = saved_state 274 | for input_ in inputs: 275 | output, state = self.lstm_cell(input_, output, state) 276 | outputs.append(output) 277 | 278 | # State saving across unrollings. 
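        # tf.control_dependencies guarantees that the two assign ops below run
        # before any op created inside the block, so saved_output / saved_state
        # hold the final unrolled step before the loss is evaluated. A minimal
        # sketch of the pattern, with hypothetical tensors for illustration:
        #
        #   counter = tf.Variable(0)
        #   with tf.control_dependencies([counter.assign_add(1)]):
        #       out = tf.identity(some_tensor)  # running `out` also runs the assign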
279 | with tf.control_dependencies([saved_output.assign(output), 280 | saved_state.assign(state)]): 281 | # use tf.control_dependencies to make sure 'saving' before 'calculating loss' 282 | 283 | # Classifier 284 | logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), 285 | self.weights['classifier'], 286 | self.biases['classifier']) 287 | y_ = tf.nn.softmax(logits) 288 | loss = tf.reduce_mean( 289 | tf.nn.softmax_cross_entropy_with_logits( 290 | labels=tf.concat(labels, 0), logits=logits)) 291 | 292 | return y_, loss 293 | 294 | def initialize(self): 295 | self.weights = None 296 | self.biases = None 297 | self.sess.run(self.init_op) 298 | 299 | def online_fit(self, X): 300 | feed_dict = dict() 301 | for i in range(self.n_unrollings + 1): 302 | feed_dict[self.train_data[i]] = X[i] 303 | 304 | _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) 305 | return loss 306 | 307 | def perplexity(self, X): 308 | sum_logprob = 0 309 | sample_size = len(X)-1 310 | batch_size = X[0].shape[0] 311 | 312 | for i in range(batch_size): 313 | self.sess.run(self.reset_sample_state) 314 | for j in range(sample_size): 315 | sample_input = np.reshape(X[j][i], newshape=(1, -1)) 316 | sample_label = np.reshape(X[j+1][i], newshape=(1, -1)) 317 | predictions = self.sess.run(self.sample_prediction, 318 | feed_dict={self.sample_input: sample_input}) 319 | sum_logprob += logprob(predictions, sample_label) 320 | perplexity = float(np.exp(sum_logprob / batch_size / sample_size)) 321 | return perplexity 322 | 323 | def generate(self, c, len_generate): 324 | feed = np.array([[1 if id2char(i) == c else 0 for i in range(LETTER_SIZE)]]) 325 | sentence = characters(feed)[0] 326 | self.sess.run(self.reset_sample_state) 327 | for _ in range(len_generate): 328 | prediction = self.sess.run(self.sample_prediction, feed_dict={self.sample_input: feed}) 329 | feed = sample(prediction) 330 | sentence += characters(feed)[0] 331 | return sentence 332 | 333 | 334 | def main(): 335 | ### download and load data 336 | print('Downloading text8.zip') 337 | filename = maybe_download('http://mattmahoney.net/dc/text8.zip', './text8.zip', 31344016) 338 | 339 | print('=====') 340 | text = read_data(filename) 341 | print('Data size %d letters' % len(text)) 342 | 343 | print('=====') 344 | valid_size = 1000 345 | valid_text = text[:valid_size] 346 | train_text = text[valid_size:] 347 | train_size = len(train_text) 348 | print('Train Dataset: size:', train_size, 'letters,\n first 64:', train_text[:64]) 349 | print('Validation Dataset: size:', valid_size, 'letters,\n first 64:', valid_text[:64]) 350 | 351 | # build training batch generator 352 | batch_size = 64 353 | num_unrollings = 10 354 | 355 | batch_generator = rnn_batch_generator(text=train_text, 356 | batch_size=batch_size, 357 | num_unrollings=num_unrollings) 358 | 359 | # build validation data 360 | valid_batches = rnn_batch_generator(text=valid_text, 361 | batch_size=1, 362 | num_unrollings=1) 363 | 364 | valid_data = [np.array(next(valid_batches)) for _ in range(valid_size)] 365 | 366 | # build LSTM model 367 | model_LSTM = LSTM(n_unrollings=num_unrollings, 368 | n_memory=128, 369 | n_train_batch=batch_size, 370 | learning_rate=0.9) 371 | # initial model 372 | model_LSTM.initialize() 373 | 374 | # online training 375 | epochs = 30 376 | num_batchs_in_epoch = 5000 377 | valid_freq = 5 378 | 379 | for epoch in range(epochs): 380 | start_time = time.time() 381 | avg_loss = 0 382 | for _ in range(num_batchs_in_epoch): 383 | batch = next(batch_generator) 384 | loss = 
model_LSTM.online_fit(X=batch) 385 | avg_loss += loss 386 | 387 | avg_loss = avg_loss / num_batchs_in_epoch 388 | 389 | train_perplexity = model_LSTM.perplexity(batch) 390 | print('Epoch %d/%d: %ds loss = %6.4f, perplexity = %6.4f' 391 | % (epoch+1, epochs, time.time()-start_time, avg_loss, train_perplexity)) 392 | 393 | if (epoch+1) % valid_freq == 0: 394 | print('') 395 | print('=============== Validation ===============') 396 | print('validation perplexity = %6.4f' % (model_LSTM.perplexity(valid_data))) 397 | print('Generate From \'a\': ', model_LSTM.generate(c='a', len_generate=50)) 398 | print('Generate From \'h\': ', model_LSTM.generate(c='h', len_generate=50)) 399 | print('Generate From \'m\': ', model_LSTM.generate(c='m', len_generate=50)) 400 | print('==========================================') 401 | print('') 402 | 403 | 404 | if __name__ == '__main__': 405 | main() 406 | -------------------------------------------------------------------------------- /img/04_output_11_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_11_0.png -------------------------------------------------------------------------------- /img/04_output_13_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_13_0.png -------------------------------------------------------------------------------- /img/04_output_15_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_15_1.png -------------------------------------------------------------------------------- /img/04_output_5_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_5_1.png -------------------------------------------------------------------------------- /img/04_output_7_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_7_1.png -------------------------------------------------------------------------------- /img/04_output_9_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/04_output_9_0.png -------------------------------------------------------------------------------- /img/05_output_13_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/05_output_13_0.png -------------------------------------------------------------------------------- /img/05_output_20_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/05_output_20_0.png -------------------------------------------------------------------------------- /img/TensorflowTutorial.001.jpeg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.001.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.002.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.002.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.003.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.003.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.004.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.004.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.005.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.005.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.006.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.006.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.007.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.007.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.008.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.008.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.009.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.009.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.010.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.010.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.011.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.011.jpeg 
-------------------------------------------------------------------------------- /img/TensorflowTutorial.012.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.012.jpeg -------------------------------------------------------------------------------- /img/TensorflowTutorial.013.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/8175d4167315e71d2529d661b8d73a445f6b0e22/img/TensorflowTutorial.013.jpeg -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # First make sure to update pip: 2 | # $ sudo pip install --upgrade pip 3 | # 4 | # Then you probably want to work in a virtualenv (optional): 5 | # $ sudo pip install --upgrade virtualenv 6 | # Or if you prefer you can install virtualenv using your favorite packaging 7 | # system. E.g., in Ubuntu: 8 | # $ sudo apt-get update && sudo apt-get install virtualenv 9 | # Then: 10 | # $ cd $my_work_dir 11 | # $ virtualenv my_env 12 | # $ . my_env/bin/activate 13 | # 14 | # Next, optionally uncomment the OpenAI gym lines (see below). 15 | # If you do, make sure to install the dependencies first. 16 | # If you are interested in xgboost for high performance Gradient Boosting, you 17 | # should uncomment the xgboost line (used in the ensemble learning notebook). 18 | # 19 | # Then install these requirements: 20 | # $ pip install --upgrade -r requirements.txt 21 | # 22 | # Finally, start jupyter: 23 | # $ jupyter notebook 24 | # 25 | 26 | 27 | ##### Core scientific packages 28 | jupyter==1.0.0 29 | matplotlib==2.2.2 30 | numpy==1.14.3 31 | pandas==0.22.0 32 | scipy==1.1.0 33 | 34 | 35 | ##### Machine Learning packages 36 | scikit-learn==0.19.1 37 | 38 | # Optional: the XGBoost library is only used in the ensemble learning chapter. 39 | #xgboost==0.71 40 | 41 | 42 | ##### Deep Learning packages 43 | 44 | # Replace tensorflow with tensorflow-gpu if you want GPU support. If so, 45 | # you need a GPU card with CUDA Compute Capability 3.0 or higher support, and 46 | # you must install CUDA, cuDNN and more: see tensorflow.org for the detailed 47 | # installation instructions. 48 | tensorflow==1.8.0 49 | #tensorflow-gpu==1.8.0 50 | 51 | # Forcing bleach to 1.5 to avoid version incompatibility when installing 52 | # TensorBoard. 
53 | bleach==1.5.0 54 | 55 | Keras==2.1.6 56 | 57 | ##### Image manipulation 58 | imageio==2.3.0 59 | Pillow==5.1.0 60 | scikit-image==0.13.1 61 | 62 | -------------------------------------------------------------------------------- /tutorial/01_Simple_Logistic_Classification_on_MNIST.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "qSLUyGtUfzXa" 8 | }, 9 | "source": [ 10 | "## Tensorflow Tutorial 1: Simple Logistic Classification on MNIST\n", 11 | "\n", 12 | "初次學習Tensorflow最困難的地方莫過於不知道從何下手,已經學會很多的Deep Learning理論,但是要自己使用Tensorflow將Network建起來卻是非常困難的,這篇文章我會先簡單的介紹幾個Tensorflow的概念,最後利用這些概念建立一個簡單的分類模型。\n", 13 | "\n", 14 | "本單元程式碼可於[Github]( https://github.com/GitYCC/Tensorflow_Tutorial/blob/master/code/01_simple_logistic_classification_on_MNIST.py)下載。\n", 15 | "\n", 16 | "首先,先`import`一些會用到的function" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 0, 22 | "metadata": { 23 | "colab": {}, 24 | "colab_type": "code", 25 | "id": "Lb4jih0XfzXc" 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np\n", 30 | "import tensorflow as tf\n", 31 | "import matplotlib.pyplot as plt\n", 32 | "\n", 33 | "tf.logging.set_verbosity(tf.logging.ERROR)\n", 34 | "\n", 35 | "# Config the matplotlib backend as plotting inline in IPython\n", 36 | "%matplotlib inline" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text", 43 | "id": "8__0TFoIfzXh" 44 | }, 45 | "source": [ 46 | "### MNIST Dataset\n", 47 | "\n", 48 | "定義`summary` function以便於觀察ndarray。" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 0, 54 | "metadata": { 55 | "colab": {}, 56 | "colab_type": "code", 57 | "id": "3gV2LmkHfzXj" 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "def summary(ndarr):\n", 62 | " print(ndarr)\n", 63 | " print('* shape: {}'.format(ndarr.shape))\n", 64 | " print('* min: {}'.format(np.min(ndarr)))\n", 65 | " print('* max: {}'.format(np.max(ndarr)))\n", 66 | " print('* avg: {}'.format(np.mean(ndarr)))\n", 67 | " print('* std: {}'.format(np.std(ndarr)))\n", 68 | " print('* unique: {}'.format(np.unique(ndarr)))" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "colab_type": "text", 75 | "id": "Xe2plTabfzXn" 76 | }, 77 | "source": [ 78 | "ndarray是numpy的基本元素,它非常便於我們做矩陣的運算。\n", 79 | "\n", 80 | "我們使用MNIST Dataset來當作我們練習的標的,MNIST包含一包手寫數字的圖片,每張圖片大小為28x28,每一張圖片都是一個手寫的阿拉伯數字包含0到9,並且標記上它所對應的數字。我們的目標就是要利用MNIST做到手寫數字辨識。\n", 81 | "\n", 82 | "在Tensorflow你可以很簡單的得到「處理過後的」MNIST,只要利用以下程式碼," 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 3, 88 | "metadata": { 89 | "colab": { 90 | "base_uri": "https://localhost:8080/", 91 | "height": 191 92 | }, 93 | "colab_type": "code", 94 | "id": "geqB9zFafzXp", 95 | "outputId": "32cdc790-f1f9-4a41-e4f3-520d1d0c7ed7" 96 | }, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 103 | "Extracting MNIST_data/train-images-idx3-ubyte.gz\n", 104 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 105 | "Extracting MNIST_data/train-labels-idx1-ubyte.gz\n", 106 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 107 | "Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n", 108 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 109 | "Extracting 
MNIST_data/t10k-labels-idx1-ubyte.gz\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "from tensorflow.examples.tutorials.mnist import input_data\n", 115 | "mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)\n", 116 | "\n", 117 | "train_data = mnist.train\n", 118 | "valid_data = mnist.validation\n", 119 | "test_data = mnist.test" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": { 125 | "colab_type": "text", 126 | "id": "lBRy-koufzXv" 127 | }, 128 | "source": [ 129 | "每個`train_data`、`valid_data`、`test_data`都包含兩部分:圖片和標籤。\n", 130 | "\n", 131 | "我們來看一下圖片的部分,`train_data.images`一共有55000張圖,每一張圖原本大小是28x28,不過特別注意這裡的Data已經先做過預先處理了,因此圖片已經被打平成28x28=784的一維矩陣了,另外每個Pixel的值也先做過「Normalization」了,通常會這樣處理,每個值減去128再除以128,所以你可以從以下的`summary`中看到它的最大最小值落在0到1之間,還有這個Dataset也已經做過亂數重排了。" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 4, 137 | "metadata": { 138 | "colab": { 139 | "base_uri": "https://localhost:8080/", 140 | "height": 966 141 | }, 142 | "colab_type": "code", 143 | "id": "zHqPB3cPfzXw", 144 | "outputId": "0fe76502-f755-4823-cdfc-9639a3d4bc1a" 145 | }, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "[[0. 0. 0. ... 0. 0. 0.]\n", 152 | " [0. 0. 0. ... 0. 0. 0.]\n", 153 | " [0. 0. 0. ... 0. 0. 0.]\n", 154 | " ...\n", 155 | " [0. 0. 0. ... 0. 0. 0.]\n", 156 | " [0. 0. 0. ... 0. 0. 0.]\n", 157 | " [0. 0. 0. ... 0. 0. 0.]]\n", 158 | "* shape: (55000, 784)\n", 159 | "* min: 0.0\n", 160 | "* max: 1.0\n", 161 | "* avg: 0.13070042431354523\n", 162 | "* std: 0.30815958976745605\n", 163 | "* unique: [0. 0.00392157 0.00784314 0.01176471 0.01568628 0.01960784\n", 164 | " 0.02352941 0.02745098 0.03137255 0.03529412 0.03921569 0.04313726\n", 165 | " 0.04705883 0.0509804 0.05490196 0.05882353 0.0627451 0.06666667\n", 166 | " 0.07058824 0.07450981 0.07843138 0.08235294 0.08627451 0.09019608\n", 167 | " 0.09411766 0.09803922 0.10196079 0.10588236 0.10980393 0.1137255\n", 168 | " 0.11764707 0.12156864 0.1254902 0.12941177 0.13333334 0.13725491\n", 169 | " 0.14117648 0.14509805 0.14901961 0.15294118 0.15686275 0.16078432\n", 170 | " 0.16470589 0.16862746 0.17254902 0.1764706 0.18039216 0.18431373\n", 171 | " 0.18823531 0.19215688 0.19607845 0.20000002 0.20392159 0.20784315\n", 172 | " 0.21176472 0.21568629 0.21960786 0.22352943 0.227451 0.23137257\n", 173 | " 0.23529413 0.2392157 0.24313727 0.24705884 0.2509804 0.25490198\n", 174 | " 0.25882354 0.2627451 0.26666668 0.27058825 0.27450982 0.2784314\n", 175 | " 0.28235295 0.28627452 0.2901961 0.29411766 0.29803923 0.3019608\n", 176 | " 0.30588236 0.30980393 0.3137255 0.31764707 0.32156864 0.3254902\n", 177 | " 0.32941177 0.33333334 0.3372549 0.34117648 0.34509805 0.34901962\n", 178 | " 0.3529412 0.35686275 0.36078432 0.3647059 0.36862746 0.37254903\n", 179 | " 0.37647063 0.3803922 0.38431376 0.38823533 0.3921569 0.39607847\n", 180 | " 0.40000004 0.4039216 0.40784317 0.41176474 0.4156863 0.41960788\n", 181 | " 0.42352945 0.427451 0.43137258 0.43529415 0.43921572 0.4431373\n", 182 | " 0.44705886 0.45098042 0.454902 0.45882356 0.46274513 0.4666667\n", 183 | " 0.47058827 0.47450984 0.4784314 0.48235297 0.48627454 0.4901961\n", 184 | " 0.49411768 0.49803925 0.5019608 0.5058824 0.50980395 0.5137255\n", 185 | " 0.5176471 0.52156866 0.5254902 0.5294118 0.53333336 0.5372549\n", 186 | " 0.5411765 0.54509807 0.54901963 0.5529412 0.5568628 0.56078434\n", 187 | " 0.5647059 0.5686275 0.57254905 0.5764706 0.5803922 0.58431375\n", 188 | " 0.5882353 0.5921569 0.59607846 
0.6 0.6039216 0.60784316\n", 189 | " 0.6117647 0.6156863 0.61960787 0.62352943 0.627451 0.6313726\n", 190 | " 0.63529414 0.6392157 0.6431373 0.64705884 0.6509804 0.654902\n", 191 | " 0.65882355 0.6627451 0.6666667 0.67058825 0.6745098 0.6784314\n", 192 | " 0.68235296 0.6862745 0.6901961 0.69411767 0.69803923 0.7019608\n", 193 | " 0.7058824 0.70980394 0.7137255 0.7176471 0.72156864 0.7254902\n", 194 | " 0.7294118 0.73333335 0.7372549 0.7411765 0.74509805 0.7490196\n", 195 | " 0.75294125 0.7568628 0.7607844 0.76470596 0.7686275 0.7725491\n", 196 | " 0.77647066 0.7803922 0.7843138 0.78823537 0.79215693 0.7960785\n", 197 | " 0.8000001 0.80392164 0.8078432 0.8117648 0.81568635 0.8196079\n", 198 | " 0.8235295 0.82745105 0.8313726 0.8352942 0.83921576 0.8431373\n", 199 | " 0.8470589 0.85098046 0.854902 0.8588236 0.86274517 0.86666673\n", 200 | " 0.8705883 0.8745099 0.87843144 0.882353 0.8862746 0.89019614\n", 201 | " 0.8941177 0.8980393 0.90196085 0.9058824 0.909804 0.91372555\n", 202 | " 0.9176471 0.9215687 0.92549026 0.9294118 0.9333334 0.93725497\n", 203 | " 0.94117653 0.9450981 0.9490197 0.95294124 0.9568628 0.9607844\n", 204 | " 0.96470594 0.9686275 0.9725491 0.97647065 0.9803922 0.9843138\n", 205 | " 0.98823535 0.9921569 0.9960785 1. ]\n" 206 | ] 207 | } 208 | ], 209 | "source": [ 210 | "summary(train_data.images)" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": { 216 | "colab_type": "text", 217 | "id": "wT5YM4oGfzX4" 218 | }, 219 | "source": [ 220 | "來試著畫圖來看看,我們使用ndarray的index功能來選出第10張圖片,`train_data.images[10,:]`表示的是選第一軸的第10個和第二軸的全部。" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 0, 226 | "metadata": { 227 | "colab": {}, 228 | "colab_type": "code", 229 | "id": "6R6j8WcafzX5" 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "def plot_fatten_img(ndarr):\n", 234 | " img = ndarr.copy()\n", 235 | " img.shape = (28,28)\n", 236 | " plt.imshow(img, cmap='gray')\n", 237 | " plt.show()" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 6, 243 | "metadata": { 244 | "colab": { 245 | "base_uri": "https://localhost:8080/", 246 | "height": 289 247 | }, 248 | "colab_type": "code", 249 | "id": "1iZMitxzfzX9", 250 | "outputId": "5d747a32-21ff-4a1e-d225-2bef7b5a1df1" 251 | }, 252 | "outputs": [ 253 | { 254 | "data": { 255 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAADclJREFUeJzt3X+IXfWZx/HPY36AJBHMlg6jTTbZ\nIMGaP+wy6IqxdDFWVwJJQSWiMKWlEyHCFldtTJEEiiCLreYfE6cYG7Vru6JiLNIfhlJT0WIM/krc\n6WRDYmfIj0qKsfpHnZln/7gn3VHnfs/NPffcc67P+wXD3Huee855uOSTc879njtfc3cBiOesqhsA\nUA3CDwRF+IGgCD8QFOEHgiL8QFCEHwiK8ANBEX4gqNnd3JmZcTshUDJ3t1ZeV+jIb2bXmNmImR00\ns41FtgWgu6zde/vNbJakP0q6StKYpFcl3ejuBxLrcOQHStaNI/8lkg66+yF3/5ukn0laU2B7ALqo\nSPjPl/Snac/HsmWfYGZDZrbXzPYW2BeADiv9Az93H5Y0LHHaD9RJkSP/uKRF055/KVsGoAcUCf+r\nki4ws6VmNlfSOkm7OtMWgLK1fdrv7hNmdqukX0maJWmHu+/vWGcAStX2UF9bO+OaHyhdV27yAdC7\nCD8QFOEHgiL8QFCEHwiK8ANBEX4gKMIPBEX4gaAIPxAU4QeCIvxAUIQfCIrwA0ERfiAowg8ERfiB\noAg/EBThB4Ii/EBQhB8IivADQRF+ICjCDwRF+IGgCD8QFOEHgiL8QFCEHwiq7Sm6JcnMDkv6QNKk\npAl3H+hEU/gks/Skq+vWrWta27x5c3Ld5cuXt9VTJ4yMjCTrV155ZbJ+/PjxZH1iYuKMe4qkUPgz\n/+ru73VgOwC6iNN+IKii4XdJvzaz18xsqBMNAeiOoqf9K9193My+KOk3ZvY/7v7i9Bdk/ynwHwNQ\nM4WO/O4+nv0+IekZSZfM8Jphdx/gw0CgXtoOv5nNM7MFpx9L+rqktzvVGIByFTnt75P0TDYMNVvS\nf7n7LzvSFYDSmbt3b2dm3dtZDznrrPQJ2IYNG5L1rVu3tr3vqampZP2jjz5K1mfNmpWsn3322Wfc\nU6v279+frK9atappLe8egV7m7ukbQzIM9QFBEX4gKMIPBEX4gaAIPxAU4QeCYqivBoaG0nc/b9++\nve1tT05OJutbtmxJ1u+5555kffHixcn6HXfc0bR2yy23JNfNG0bMkxoKvPzyy5Prnjp1qtC+q8RQ\nH4Akwg8ERfiBoAg/EBThB4Ii/EBQhB8IinH+Lsgbr37ssceS9dSf5s6TN05/9913t73toq6//vpk\n/YEHHkjW+/v72973eeedl6wfO3as7W1XjXF+AEmEHwiK8ANBEX4gKMIPBEX4gaAIPxAU4/xdkDce\nPT4+Xmj7qe+tr169OrnukSNHCu27TC+99FKyftlll7W9bcb5OfIDYRF+ICjCDwRF+IGgCD8QFOEH\ngiL8QFCz815gZjskrZZ0wt1XZMsWSvq5pCWSDku6wd3/Ul6bvW3t2rWF1v/444+T9TvvvLNprc7j\n+HluuummZP3ll19O1vv6+prWBgcHk+ved999yXrefAi9oJUj/08kXfOpZRsl7Xb3CyTtzp4D6CG5\n4Xf3FyWd/NTiNZJ2Zo93Sip2aAPQde1e8/e5+9Hs8TFJzc+vANRS7jV/Hnf31D37ZjYkKT0ZHYCu\na/fIf9zM+iUp+32i2QvdfdjdB9x9oM19AShBu+HfJen0x6WDkp7tTDsAuiU3/Gb2hKSXJS03szEz\n+7akeyVdZWajklZlzwH0EL7P3wELFixI1vft25esL1u2LFkfHR1N1pcvX56sf17de2/6mJO6/yHP\nhRdemKyPjIy0ve2y8X1+AEmEHwiK8ANBEX4gKMIPBEX4gaAK394Lae7cucl63lAe2nPgwIHStr1+\n/fpk/bbbbitt393CkR8IivADQRF+ICjCDwRF+IGgCD8QFOEHgmKcvwcUncIbmAlHfiAowg8ERfiB\noAg/EBThB4Ii/EBQhB8IinH+Drj55ptL3f4jjzxS6vYRE0d+ICjCDwRF+IGgCD8QFOEHgiL8QFCE\nHwgqd5zfzHZIWi3phLuvyJZtkfQdSX/OXrbJ3Z8vq8m6W7p0adUtAGeslSP/TyRdM8Py+9394uwn\nbPCBXpUbfnd/UdLJLvQCoIuKXPPfamZvmtkOMzu3Yx0B6Ip2w79N0jJJF0s6KumHzV5oZkNmttfM\n9ra5LwAlaCv87n7c3SfdfUrSjyVdknjtsLsPuPtAu00C6Ly2wm9m/dOefkPS251pB0C3tDLU94Sk\nr0n6gpmNSdos6WtmdrEkl3RYUno+YwC1kxt+d79xhsUPl9ALgC7iDj8gKMIPBEX4gaAIPxAU4QeC\nIvxAUPzp7hr48MMPk/V33323S53gtJGRkapbKB1HfiAowg8ERfiBoAg/EBThB4Ii/EBQhB8IinH+\nGpg7d26yfs4553Spk3pZvHhxsn777beXtu8nn3yytG3XBUd+ICjCDwRF+IGgCD8QFOEHgiL8QFCE\nHwiKcf4OeOONNwqtP2fOnGR906ZNyfpzzz1XaP919fjjjyfrK1asaHvbGzduTNbff//9trfdKzjy\nA0ERfiAowg8ERfiBoAg/EBThB4Ii/EBQueP8ZrZI0qOS+iS5pGF332pmCyX9XNISSYcl3eDufymv\n1fratWtXqdtfuHBhqduvyl133ZWsX3rppYW2n/rb+w899FBy3cnJyUL77gWtHPknJP2Hu39Z0r9I\n2mBmX5a0UdJud79A0u7sOYAekRt+dz/q7vuyxx9IekfS+ZLWSNqZvWynpLVlNQmg887omt/Mlkj6\niqQ/SOpz96NZ6ZgalwUAekTL9/ab2XxJT0n6rrufMrO/19zdzcybrDckaahoowA6q6Ujv5nNUSP4\nP3X3p7PFx82sP6v3Szox07ruPuzuA+4+0ImGAXRGbvitcYh/WNI77v6jaaVdkgazx4OSnu18ewDK\nYu4znq3//wvMVkraI+ktSVPZ4k1qXPf/t6TFko6oMdR3Mmdb6Z31qHnz5iXrr7zySrJ+0UUXJet5\nw07bt29vWrv//vuT6x46dChZL2rVqlVNa88//3xy3dmz01eledNoX3311U1rn+dpz93d8l/VwjW/\nu/9eUrONXXkmTQGoD+7wA4Ii/EBQhB8IivADQRF+ICjCDwSVO87f0Z19Tsf58/T1pb/28MILLyTr\nefcBpBw8eDBZf/DBB9vetiQNDg4m68uWLWtamz9/fqF9b9iwIVnftm1boe33qlbH+TnyA0ERfiAo\nwg8ERfiBoAg/EBThB4Ii/EBQjPPXwHXXXZesb968OVkvch9AlUZHR5P11Pfxpfzv5E9NTSXrn1eM\n8wNIIvxAUIQfCIrwA0ERfiAowg8ERfiBoBjn7wF5f78+
9fcC1q9fn1z3iiuuSNb37NmTrOfZsWNH\n09rY2Fhy3YmJiUL7jopxfgBJhB8IivADQRF+ICjCDwRF+IGgCD8QVO44v5ktkvSopD5JLmnY3bea\n2RZJ35H05+ylm9w9OeE64/xA+Vod528l/P2S+t19n5ktkPSapLWSbpD0V3e/r9WmCD9QvlbDn751\nrLGho5KOZo8/MLN3JJ1frD0AVTuja34zWyLpK5L+kC261czeNLMdZnZuk3WGzGyvme0t1CmAjmr5\n3n4zmy/pd5LucfenzaxP0ntqfA7wAzUuDb6Vsw1O+4GSdeyaX5LMbI6kX0j6lbv/aIb6Ekm/cPcV\nOdsh/EDJOvbFHjMzSQ9Lemd68LMPAk/7hqS3z7RJANVp5dP+lZL2SHpL0um/hbxJ0o2SLlbjtP+w\npPXZh4OpbXHkB0rW0dP+TiH8QPn4Pj+AJMIPBEX4gaAIPxAU4QeCIvxAUIQfCIrwA0ERfiAowg8E\nRfiBoAg/EBThB4Ii/EBQuX/As8Pek3Rk2vMvZMvqqK691bUvid7a1cne/rHVF3b1+/yf2bnZXncf\nqKyBhLr2Vte+JHprV1W9cdoPBEX4gaCqDv9wxftPqWtvde1Lord2VdJbpdf8AKpT9ZEfQEUqCb+Z\nXWNmI2Z20Mw2VtFDM2Z22MzeMrPXq55iLJsG7YSZvT1t2UIz+42ZjWa/Z5wmraLetpjZePbevW5m\n11bU2yIz+62ZHTCz/Wb279nySt+7RF+VvG9dP+03s1mS/ijpKkljkl6VdKO7H+hqI02Y2WFJA+5e\n+ZiwmX1V0l8lPXp6NiQz+09JJ9393uw/znPd/Xs16W2LznDm5pJ6azaz9DdV4XvXyRmvO6GKI/8l\nkg66+yF3/5ukn0laU0EftefuL0o6+anFayTtzB7vVOMfT9c16a0W3P2ou+/LHn8g6fTM0pW+d4m+\nKlFF+M+X9Kdpz8dUrym/XdKvzew1MxuqupkZ9E2bGemYpL4qm5lB7szN3fSpmaVr8961M+N1p/GB\n32etdPd/lvRvkjZkp7e15I1rtjoN12yTtEyNadyOSvphlc1kM0s/Jem77n5qeq3K926Gvip536oI\n/7ikRdOefylbVgvuPp79PiHpGTUuU+rk+OlJUrPfJyru5+/c/bi7T7r7lKQfq8L3LptZ+ilJP3X3\np7PFlb93M/VV1ftWRfhflXSBmS01s7mS1knaVUEfn2Fm87IPYmRm8yR9XfWbfXiXpMHs8aCkZyvs\n5RPqMnNzs5mlVfF7V7sZr9296z+SrlXjE///lfT9Knpo0tc/SXoj+9lfdW+SnlDjNPBjNT4b+bak\nf5C0W9KopBckLaxRb4+pMZvzm2oErb+i3laqcUr/pqTXs59rq37vEn1V8r5xhx8QFB/4AUERfiAo\nwg8ERfiBoAg/EBThB4Ii/EBQhB8I6v8A+Md7QMI5IyUAAAAASUVORK5CYII=\n", 256 | "text/plain": [ 257 | "
" 258 | ] 259 | }, 260 | "metadata": { 261 | "tags": [] 262 | }, 263 | "output_type": "display_data" 264 | } 265 | ], 266 | "source": [ 267 | "plot_fatten_img(train_data.images[10,:])" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": { 273 | "colab_type": "text", 274 | "id": "-xTkbTtEfzYA" 275 | }, 276 | "source": [ 277 | "很顯而易見的,這是一個0。\n", 278 | "\n", 279 | "接下來來看標籤的部分,`train_data.labels`不意外的一樣的也是有相應的55000筆資料,所對應的就是前面的每一張圖片,總共有10種類型:0到9,所以大小為(55000, 10)。" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 7, 285 | "metadata": { 286 | "colab": { 287 | "base_uri": "https://localhost:8080/", 288 | "height": 273 289 | }, 290 | "colab_type": "code", 291 | "id": "7T3xY8FefzYB", 292 | "outputId": "e98ee934-c542-4eec-cb56-2abfb1eba2e1" 293 | }, 294 | "outputs": [ 295 | { 296 | "name": "stdout", 297 | "output_type": "stream", 298 | "text": [ 299 | "[[0. 0. 0. ... 1. 0. 0.]\n", 300 | " [0. 0. 0. ... 0. 0. 0.]\n", 301 | " [0. 0. 0. ... 0. 0. 0.]\n", 302 | " ...\n", 303 | " [0. 0. 0. ... 0. 0. 0.]\n", 304 | " [0. 0. 0. ... 0. 0. 0.]\n", 305 | " [0. 0. 0. ... 0. 1. 0.]]\n", 306 | "* shape: (55000, 10)\n", 307 | "* min: 0.0\n", 308 | "* max: 1.0\n", 309 | "* avg: 0.1\n", 310 | "* std: 0.30000000000000004\n", 311 | "* unique: [0. 1.]\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "summary(train_data.labels)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": { 322 | "colab_type": "text", 323 | "id": "XA4oSl6rfzYG" 324 | }, 325 | "source": [ 326 | "所以我們來看看上面那張圖片的標籤," 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 8, 332 | "metadata": { 333 | "colab": { 334 | "base_uri": "https://localhost:8080/", 335 | "height": 74 336 | }, 337 | "colab_type": "code", 338 | "id": "fjQG0pT3fzYH", 339 | "outputId": "d178d1e5-8efa-4c27-b6e7-bbc59f83e366" 340 | }, 341 | "outputs": [ 342 | { 343 | "name": "stdout", 344 | "output_type": "stream", 345 | "text": [ 346 | "[1. 0. 0. 0. 0. 0. 0. 0. 0. 
0.]\n" 347 | ] 348 | } 349 | ], 350 | "source": [ 351 | "print(train_data.labels[10])" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": { 357 | "colab_type": "text", 358 | "id": "zqoZq-xofzYM" 359 | }, 360 | "source": [ 361 | "看起來的確沒錯,在0的位置標示1.,而其他地方標示為0.,因此這是一個標示為0的label沒有錯,這種表示方法稱為One-Hot Encoding,它具有機率的涵義,所代表的是有100%的機會落在0的類別上。\n", 362 | "\n", 363 | "### Softmax\n", 364 | "\n", 365 | "通常One-Hot Encoding會搭配Softmax一同服用,最後的Output結果如果是機率分布,那我也需要讓我的Neurel Network可以輸出機率分布。\n", 366 | "\n", 367 | "![softmax](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.001.jpeg)\n", 368 | "\n", 369 | "通過Softmax這一層,我們就可以將輸出轉變為以「機率」表示。\n", 370 | "\n", 371 | "我們可以來手刻一個Softmax Function,不過直接套用Tensorflow中函數的也是可以的。" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 9, 377 | "metadata": { 378 | "colab": { 379 | "base_uri": "https://localhost:8080/", 380 | "height": 74 381 | }, 382 | "colab_type": "code", 383 | "id": "un0KnUEGfzYN", 384 | "outputId": "32e074d3-7312-43d6-abe0-35a1a6e33281" 385 | }, 386 | "outputs": [ 387 | { 388 | "name": "stdout", 389 | "output_type": "stream", 390 | "text": [ 391 | "[0.8360188 0.11314284 0.05083836]\n" 392 | ] 393 | } 394 | ], 395 | "source": [ 396 | "import numpy as np\n", 397 | "\n", 398 | "def softmax(x):\n", 399 | " # avoid exp function go to too large,\n", 400 | " # pre-reduce before applying exp function\n", 401 | " max_score = np.max(x, axis=0)\n", 402 | " x = x - max_score\n", 403 | " \n", 404 | " exp_s = np.exp(x)\n", 405 | " sum_exp_s = np.sum(exp_s, axis=0)\n", 406 | " softmax = exp_s / sum_exp_s\n", 407 | " return softmax\n", 408 | "\n", 409 | "scores = [3.0, 1.0, 0.2]\n", 410 | "print(softmax(scores))" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": { 416 | "colab_type": "text", 417 | "id": "yb65iA9kfzYP" 418 | }, 419 | "source": [ 420 | "### Cross-Entropy Loss\n", 421 | "\n", 422 | "一旦我們要處理機率預測的問題,就不可以使用單純的「平方誤差」,而必須使用Cross-Entropy Loss,是這樣計算的:\n", 423 | "\n", 424 | "$$\n", 425 | "Loss_{cross-entropy} = - \\sum_i y_i ln(s_i)\n", 426 | "$$\n", 427 | "其中,$y_i$為目標Label,$s_i$為經過Softmax產生的預測值。\n", 428 | "\n", 429 | "至於如果你想要了解為何需要使用Cross-Entropy Loss?這我在機器學習基石的筆記中已經有提及過,請看[介紹Logistic Regression的部分](http://www.ycc.idv.tw/YCNote/post/27)。\n", 430 | "\n", 431 | "### 分離數據的重要性\n", 432 | "\n", 433 | "在MNIST Dataset中,你會發現分為Training Dataset、Validation Dataset和Testing Dataset,這樣的作法在Machine Learning中是常見且必要的。\n", 434 | "\n", 435 | "流程是這樣的,我們會先使用Training Dataset來訓練Model,並且使用Validation Dataset來檢驗Model的好壞,我們會依據Validation Dataset的檢驗調整Model上的參數,試著盡可能的壓低Validation Dataset的Error,記住!在過程中所產生的所有Models都要保留下來,因為最後選擇的Model並不是Validation Dataset的Error最小的,而是要再由Testing Dataset來做最後的挑選,挑選出能使Testing Dataset的Error最小的Model。\n", 436 | "\n", 437 | "這所有的作法都是為了避免Overfitting的情況發生,也就是機器可能因為看過一筆Data,結果就把這筆Data給完整記了起來,而Data本身含有雜訊,雜訊就這樣滲透到Model裡,確實做到分離是很重要的,讓Model在測試階段時可以使用沒有看過的Data。\n", 438 | "\n", 439 | "因此,Validation Dataset的分離是為了避免讓Model在Training階段看到要驗證的資料,所以更能正確的評估Model的好壞。但這樣是不夠的,人為會根據Validation Dataset來調整Model,這樣無形之中已經將Validation Dataset的資訊間接的經由人傳給了Model,所以還是沒有徹底分離,因此在最後挑選Models時,我們會使用另外一筆從沒看過的資料Testing Dataset來做挑選,一旦挑選完就不能再去調整任何參數了。\n", 440 | "\n", 441 | "\n", 442 | "### Tensorflow工作流程\n", 443 | "\n", 444 | "我們這一篇將會使用Tensorflow實作最簡單的單層Neurel Network,在這之前我們來看看Tensorflow是如何運作的?\n", 445 | "\n", 446 | "深度學習是由一層一層可以微分的神經元所連接而成,數學上可以表示為張量(Tensor)的表示式,我們一般講的矩陣運算是指2x2的矩陣運算,而張量(Tensor)則是拓寬到n維陣列做計算,在Machine 
Learning當中我們常常需要處理到相當高維度的計算,例如:有五張28x28的彩色圖的表示就必須使用到四維張量,第一維表示第幾張、第二、三維表示圖片的大小、第四維則表示RGB,如果你是物理系的學生應該也對張量不陌生,廣義相對論裡頭大量的使用四維張量運算,三維空間加一維時間。\n", 447 | "\n", 448 | "而在做Neurel Network時,我們會根據需求不同設計不同形式但合理的流程(Flow),再使用數據來訓練我的Model。所以,這就是Tensorflow命名由來:Tensor+Flow。\n", 449 | "\n", 450 | "因此,一開始要先設計Model的結構,這在Tensorflow裡頭稱為Graph,Graph的作用是事先決定Neurel Network的結構,決定Neuron要怎麼連接?決定哪一些窗口是可以由外部置放數據的?決定哪一些變數是可以被訓練的?哪一些變數是不可以被訓練的?定義將要怎麼樣優化這個系統?...等等。" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 0, 456 | "metadata": { 457 | "colab": {}, 458 | "colab_type": "code", 459 | "id": "k2Q9E5pifzYP" 460 | }, 461 | "outputs": [], 462 | "source": [ 463 | "my_graph = tf.Graph() # Initialize a new graph\n", 464 | "\n", 465 | "with my_graph.as_default(): # Create a scope to build graph\n", 466 | " # ...\n", 467 | " # detail of building graph" 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": { 473 | "colab_type": "text", 474 | "id": "LwigLuGvfzYT" 475 | }, 476 | "source": [ 477 | "Graph只是一個結構,它不具有有效的資訊,而當我們定義完成Graph之後,接下來我們需要創造一個環境叫做Session,Session會將Graph的結構複製一份,然後再放入資訊進行Training或是預測等等,因此Session是具有有效資訊的。" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 0, 483 | "metadata": { 484 | "colab": {}, 485 | "colab_type": "code", 486 | "id": "U_CrKKyNfzYU" 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "with tf.Session(graph=my_graph) as sess: # Copy graph into session\n", 491 | " # ...\n", 492 | " # detail of doing machine learning " 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": { 498 | "colab_type": "text", 499 | "id": "JNCDM4W7fzYW" 500 | }, 501 | "source": [ 502 | "還有另外一種寫法也是相同作用的,我個人比較喜歡下面這種寫法。" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": 0, 508 | "metadata": { 509 | "colab": {}, 510 | "colab_type": "code", 511 | "id": "Xw55D3iSfzYW" 512 | }, 513 | "outputs": [], 514 | "source": [ 515 | "my_session = tf.Session(graph=my_graph)\n", 516 | "my_session.run(...)" 517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "metadata": { 522 | "colab_type": "text", 523 | "id": "ye469pWkfzYb" 524 | }, 525 | "source": [ 526 | "### Tensorflow的基本「張量」元素\n", 527 | "\n", 528 | "接下來我們就來看看有哪些構成Graph的基本元素可以使用。\n", 529 | "\n", 530 | "(1) 常數張量:\n", 531 | "\n", 532 | "一開始來看看「常數張量」,常數指的是在Model中不會改變的數值。" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 0, 538 | "metadata": { 539 | "colab": {}, 540 | "colab_type": "code", 541 | "id": "F37wqyyjfzYc" 542 | }, 543 | "outputs": [], 544 | "source": [ 545 | "tensor = tf.constant([1, 2, 3, 4, 5, 6, 7], dtype=tf.int32)" 546 | ] 547 | }, 548 | { 549 | "cell_type": "markdown", 550 | "metadata": { 551 | "colab_type": "text", 552 | "id": "2YeRPOIwfzYd" 553 | }, 554 | "source": [ 555 | "(2) 變數張量:\n", 556 | "\n", 557 | "與常數截然不同的就是變數,「變數張量」是指在訓練當中可以改變的值,一般「變數張量」會用作於Machine Learning需要被訓練的參數,如果你沒有特別設定,在最佳化的過程中,Tensorflow會自動調整「變數張量」的數值來最佳化。" 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": 0, 563 | "metadata": { 564 | "colab": { 565 | "base_uri": "https://localhost:8080/", 566 | "height": 107 567 | }, 568 | "colab_type": "code", 569 | "id": "4dtZwlVrfzYe", 570 | "outputId": "6123477a-304d-46bf-f366-0c21b81b7665" 571 | }, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and 
will be removed in a future version.\n", 578 | "Instructions for updating:\n", 579 | "Colocations handled automatically by placer.\n" 580 | ] 581 | } 582 | ], 583 | "source": [ 584 | "tensor = tf.Variable(tf.truncated_normal(shape=(3, 5)))" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": { 590 | "colab_type": "text", 591 | "id": "w5u0rzpLfzYf" 592 | }, 593 | "source": [ 594 | "因為變數通常是未知且待優化的參數,所以我們一般會使用Initalizer來設定它的初始值,`tf.truncated_normal(shape=(3,5))`會隨機產生大小3x5的矩陣,它的值呈常態分佈但只取兩個標準差以內的數值。\n", 595 | "\n", 596 | "如果今天你想要有一個「變數張量」但是又不希望它因為最佳化而改變,這時你要特別指定`trainable`為`False`。" 597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": 0, 602 | "metadata": { 603 | "colab": {}, 604 | "colab_type": "code", 605 | "id": "qmr1KLJafzYf" 606 | }, 607 | "outputs": [], 608 | "source": [ 609 | "tensor = tf.Variable(5, trainable=False)" 610 | ] 611 | }, 612 | { 613 | "cell_type": "markdown", 614 | "metadata": { 615 | "colab_type": "text", 616 | "id": "ft68RaznfzYg" 617 | }, 618 | "source": [ 619 | "(3) 置放張量:\n", 620 | "\n", 621 | "另外有一些張量負責擔任輸入窗口的角色,稱為Placeholder。" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": 0, 627 | "metadata": { 628 | "colab": {}, 629 | "colab_type": "code", 630 | "id": "X3s760m5fzYg" 631 | }, 632 | "outputs": [], 633 | "source": [ 634 | "tensor = tf.placeholder(tf.float32, shape=(None, 1000))" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": { 640 | "colab_type": "text", 641 | "id": "OVKws2zQfzYi" 642 | }, 643 | "source": [ 644 | "因為我們在訓練之前還尚未知道Data的數量,所以這裡使用None來表示未知。`tf.placeholder`在Graph階段是沒有數值的,必須等到Session階段才將數值給輸入進去。\n", 645 | "\n", 646 | "(4) 操作型張量:\n", 647 | "\n", 648 | "這類張量並不含有實際數值,而是一種操作,常用的「操作型張量」有兩種,第一種是作為最佳化使用," 649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": 0, 654 | "metadata": { 655 | "colab": {}, 656 | "colab_type": "code", 657 | "id": "JUCAGyhkfzYi" 658 | }, 659 | "outputs": [], 660 | "source": [ 661 | "loss = ...\n", 662 | "train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": { 668 | "colab_type": "text", 669 | "id": "TvgxFoaPfzYj" 670 | }, 671 | "source": [ 672 | "選擇Optimizer和最佳化的方式來定義最佳化的操作方法,上述的例子是使用learning_rate為0.5的Gradient Descent來降低loss。\n", 673 | "\n", 674 | "另外一種是初始化的操作," 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 0, 680 | "metadata": { 681 | "colab": {}, 682 | "colab_type": "code", 683 | "id": "9lclfbDwfzYk" 684 | }, 685 | "outputs": [], 686 | "source": [ 687 | "init_op = tf.global_variables_initializer()" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": { 693 | "colab_type": "text", 694 | "id": "Z5pAzf3NfzYl" 695 | }, 696 | "source": [ 697 | "這一個步驟是必要的但常常被忽略,還記得剛剛我們定義「變數張量」時有用到Initalizer,這些Initalizer在Graph完成時還不具有數值,必須使用`init_op`來給予數值,所以記住一定要放`init_op`進去Graph裡頭,而且必須先定義完成所有會用到的Initalizer再來設定這個`init_op`。\n", 698 | "\n", 699 | "### Session的操作\n", 700 | "\n", 701 | "「張量」元素具有兩個面向:功能和數值,在Graph階段「張量」只具有功能但不具有數值,只有到了Session階段才開始有數值,那如何將這些數值取出來呢?有兩種方法,以1+1當作範例來看看," 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 10, 707 | "metadata": { 708 | "colab": { 709 | "base_uri": "https://localhost:8080/", 710 | "height": 74 711 | }, 712 | "colab_type": "code", 713 | "id": "bmwBI9VwfzYm", 714 | "outputId": "e2b8559f-8c89-4658-bb20-bed79674cadb" 715 | }, 716 | "outputs": [ 717 | { 718 | "name": "stdout", 719 | "output_type": "stream", 720 | "text": [ 721 | 
"Tensor(\"Add:0\", shape=(), dtype=int32)\n" 722 | ] 723 | } 724 | ], 725 | "source": [ 726 | "g1 = tf.Graph()\n", 727 | "with g1.as_default():\n", 728 | " x = tf.constant(1)\n", 729 | " y = tf.constant(1)\n", 730 | " sol = tf.add(x,y) # add x and y\n", 731 | "\n", 732 | "with tf.Session(graph=g1) as sess: \n", 733 | " print(sol) # print tensor, not their value" 734 | ] 735 | }, 736 | { 737 | "cell_type": "code", 738 | "execution_count": 11, 739 | "metadata": { 740 | "colab": { 741 | "base_uri": "https://localhost:8080/", 742 | "height": 74 743 | }, 744 | "colab_type": "code", 745 | "id": "miPCRXPxfzYo", 746 | "outputId": "9d05fbd2-2459-4c27-937f-9391c0445cb1" 747 | }, 748 | "outputs": [ 749 | { 750 | "name": "stdout", 751 | "output_type": "stream", 752 | "text": [ 753 | "2\n" 754 | ] 755 | } 756 | ], 757 | "source": [ 758 | "with tf.Session(graph=g1) as sess: \n", 759 | " print(sol.eval()) # evaluate their value" 760 | ] 761 | }, 762 | { 763 | "cell_type": "code", 764 | "execution_count": 12, 765 | "metadata": { 766 | "colab": { 767 | "base_uri": "https://localhost:8080/", 768 | "height": 74 769 | }, 770 | "colab_type": "code", 771 | "id": "7nuqkaoOfzYp", 772 | "outputId": "bd58d823-3797-4cee-913b-18267d786256" 773 | }, 774 | "outputs": [ 775 | { 776 | "name": "stdout", 777 | "output_type": "stream", 778 | "text": [ 779 | "2\n" 780 | ] 781 | } 782 | ], 783 | "source": [ 784 | "s1 = tf.Session(graph=g1)\n", 785 | "print(s1.run(sol)) # another way of evaluating value" 786 | ] 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "metadata": { 791 | "colab_type": "text", 792 | "id": "GTKAn9nafzYr" 793 | }, 794 | "source": [ 795 | "那如果我想使用placeholder來做到x+y呢?" 796 | ] 797 | }, 798 | { 799 | "cell_type": "code", 800 | "execution_count": 13, 801 | "metadata": { 802 | "colab": { 803 | "base_uri": "https://localhost:8080/", 804 | "height": 74 805 | }, 806 | "colab_type": "code", 807 | "id": "MKzo2tA-fzYr", 808 | "outputId": "91c99bf7-2afe-455c-fa2b-885b53e1e877" 809 | }, 810 | "outputs": [ 811 | { 812 | "name": "stdout", 813 | "output_type": "stream", 814 | "text": [ 815 | "5\n" 816 | ] 817 | } 818 | ], 819 | "source": [ 820 | "g2 = tf.Graph()\n", 821 | "with g2.as_default():\n", 822 | " x = tf.placeholder(tf.int32)\n", 823 | " y = tf.placeholder(tf.int32)\n", 824 | " sol = tf.add(x,y) # add x and y\n", 825 | "\n", 826 | "s2 = tf.Session(graph=g2)\n", 827 | "\n", 828 | "# if x = 2 and y = 3\n", 829 | "print(s2.run(sol, feed_dict={x: 2, y: 3})) " 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "execution_count": 14, 835 | "metadata": { 836 | "colab": { 837 | "base_uri": "https://localhost:8080/", 838 | "height": 74 839 | }, 840 | "colab_type": "code", 841 | "id": "Xm2uS2QCfzYu", 842 | "outputId": "d54d4029-b908-476c-d425-9483f3d22f09" 843 | }, 844 | "outputs": [ 845 | { 846 | "name": "stdout", 847 | "output_type": "stream", 848 | "text": [ 849 | "12\n" 850 | ] 851 | } 852 | ], 853 | "source": [ 854 | "# if x = 5 and y = 7\n", 855 | "print(s2.run(sol, feed_dict={x: 5, y: 7})) " 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": { 861 | "colab_type": "text", 862 | "id": "zROE5xCUfzYw" 863 | }, 864 | "source": [ 865 | "因為x和y是placeholder,所以必須使用`feed_dict`來餵入相關資訊,否則會報錯。" 866 | ] 867 | }, 868 | { 869 | "cell_type": "markdown", 870 | "metadata": { 871 | "colab_type": "text", 872 | "id": "zvY98ZFufzYw" 873 | }, 874 | "source": [ 875 | "### 第一個Tensorflow Model\n", 876 | "\n", 877 | "有了以上的認識我們就可以來建立我們第一個Model。\n", 878 | "\n", 879 | "以下我會使用物件導向的寫法,讓程式碼更有條理。\n", 880 | "\n", 881 
| "Machine Learning在操作上可以整理成三個大步驟:建構(Building)、訓練(Fitting)和推論(Inference),所以我們將會使用這三大步驟來建製我們的Model。\n", 882 | "\n", 883 | "在`SimpleLogisticClassification`裡頭,「建構」的動作在`__init__`中會進行,由`build`函式來建立Graph,其中我將Neurel Network的結構分離存於`structure`裡。「訓練」的動作在`fit`中進行,這裡採用傳統的Gradient Descent的方法,將所有Data全部考慮進去最佳化,未來會再介紹Batch Gradient Descent。最後,「推論」的部分在`predict`和`evaluate`中進行。\n", 884 | "\n", 885 | "`SimpleLogisticClassification`將會建構一個只有一層的Neurel Network,也就是說沒有Hidden Layer,畫個圖。\n", 886 | "\n", 887 | "![Simple Logistic Classification](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.002.jpeg)" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 0, 893 | "metadata": { 894 | "colab": {}, 895 | "colab_type": "code", 896 | "id": "IkFZPA_LfzYw" 897 | }, 898 | "outputs": [], 899 | "source": [ 900 | "class SimpleLogisticClassification:\n", 901 | "\n", 902 | " def __init__(self, n_features, n_labels, learning_rate=0.5):\n", 903 | " self.n_features = n_features\n", 904 | " self.n_labels = n_labels\n", 905 | "\n", 906 | " self.weights = None\n", 907 | " self.biases = None\n", 908 | "\n", 909 | " self.graph = tf.Graph() # initialize new graph\n", 910 | " self.build(learning_rate) # building graph\n", 911 | " self.sess = tf.Session(graph=self.graph) # create session by the graph\n", 912 | "\n", 913 | " def build(self, learning_rate):\n", 914 | " # Building Graph\n", 915 | " with self.graph.as_default():\n", 916 | " ### Input\n", 917 | " self.train_features = tf.placeholder(tf.float32, shape=(None, self.n_features))\n", 918 | " self.train_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels))\n", 919 | "\n", 920 | " ### Optimalization\n", 921 | " # build neurel network structure and get their predictions and loss\n", 922 | " self.y_, self.loss = self.structure(features=self.train_features,\n", 923 | " labels=self.train_labels)\n", 924 | " # define training operation\n", 925 | " self.train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(self.loss)\n", 926 | "\n", 927 | " ### Prediction\n", 928 | " self.new_features = tf.placeholder(tf.float32, shape=(None, self.n_features))\n", 929 | " self.new_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels))\n", 930 | " self.new_y_, self.new_loss = self.structure(features=self.new_features,\n", 931 | " labels=self.new_labels)\n", 932 | "\n", 933 | " ### Initialization\n", 934 | " self.init_op = tf.global_variables_initializer()\n", 935 | "\n", 936 | " def structure(self, features, labels):\n", 937 | " # build neurel network structure and return their predictions and loss\n", 938 | " ### Variable\n", 939 | " if (not self.weights) or (not self.biases):\n", 940 | " self.weights = {\n", 941 | " 'fc1': tf.Variable(tf.truncated_normal(shape=(self.n_features, self.n_labels))),\n", 942 | " }\n", 943 | " self.biases = {\n", 944 | " 'fc1': tf.Variable(tf.zeros(shape=(self.n_labels))),\n", 945 | " }\n", 946 | "\n", 947 | " ### Structure\n", 948 | " # one fully connected layer\n", 949 | " logits = self.get_dense_layer(features, self.weights['fc1'], self.biases['fc1'])\n", 950 | "\n", 951 | " # predictions\n", 952 | " y_ = tf.nn.softmax(logits)\n", 953 | "\n", 954 | " # loss: softmax cross entropy\n", 955 | " loss = tf.reduce_mean(\n", 956 | " tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))\n", 957 | "\n", 958 | " return (y_, loss)\n", 959 | "\n", 960 | " def get_dense_layer(self, input_layer, weight, bias, activation=None):\n", 961 | " # fully connected layer\n", 962 | " x = 
tf.add(tf.matmul(input_layer, weight), bias)\n", 963 | " if activation:\n", 964 | " x = activation(x)\n", 965 | " return x\n", 966 | "\n", 967 | " def fit(self, X, y, epochs=10, validation_data=None, test_data=None):\n", 968 | " X = self._check_array(X)\n", 969 | " y = self._check_array(y)\n", 970 | "\n", 971 | " self.sess.run(self.init_op)\n", 972 | " for epoch in range(epochs):\n", 973 | " print('Epoch %2d/%2d: ' % (epoch+1, epochs))\n", 974 | "\n", 975 | " # fully gradient descent\n", 976 | " feed_dict = {self.train_features: X, self.train_labels: y}\n", 977 | " self.sess.run(self.train_op, feed_dict=feed_dict)\n", 978 | "\n", 979 | " # evaluate at the end of this epoch\n", 980 | " y_ = self.predict(X)\n", 981 | " train_loss = self.evaluate(X, y)\n", 982 | " train_acc = self.accuracy(y_, y)\n", 983 | " msg = ' loss = %8.4f, acc = %3.2f%%' % (train_loss, train_acc*100)\n", 984 | "\n", 985 | " if validation_data:\n", 986 | " val_loss = self.evaluate(validation_data[0], validation_data[1])\n", 987 | " val_acc = self.accuracy(self.predict(validation_data[0]), validation_data[1])\n", 988 | " msg += ', val_loss = %8.4f, val_acc = %3.2f%%' % (val_loss, val_acc*100)\n", 989 | "\n", 990 | " print(msg)\n", 991 | "\n", 992 | " if test_data:\n", 993 | " test_acc = self.accuracy(self.predict(test_data[0]), test_data[1])\n", 994 | " print('test_acc = %3.2f%%' % (test_acc*100))\n", 995 | "\n", 996 | " def accuracy(self, predictions, labels):\n", 997 | " return (np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0])\n", 998 | "\n", 999 | " def predict(self, X):\n", 1000 | " X = self._check_array(X)\n", 1001 | " return self.sess.run(self.new_y_, feed_dict={self.new_features: X})\n", 1002 | "\n", 1003 | " def evaluate(self, X, y):\n", 1004 | " X = self._check_array(X)\n", 1005 | " y = self._check_array(y)\n", 1006 | " return self.sess.run(self.new_loss, feed_dict={self.new_features: X, self.new_labels: y})\n", 1007 | "\n", 1008 | " def _check_array(self, ndarray):\n", 1009 | " ndarray = np.array(ndarray)\n", 1010 | " if len(ndarray.shape) == 1:\n", 1011 | " ndarray = np.reshape(ndarray, (1, ndarray.shape[0]))\n", 1012 | " return ndarray" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "code", 1017 | "execution_count": 16, 1018 | "metadata": { 1019 | "colab": { 1020 | "base_uri": "https://localhost:8080/", 1021 | "height": 385 1022 | }, 1023 | "colab_type": "code", 1024 | "id": "dTVeEqwRfzYx", 1025 | "outputId": "960805d5-00a6-4cec-d83b-ab0c785d243f" 1026 | }, 1027 | "outputs": [ 1028 | { 1029 | "name": "stdout", 1030 | "output_type": "stream", 1031 | "text": [ 1032 | "Epoch 1/10: \n", 1033 | " loss = 9.2515, acc = 12.81%, val_loss = 9.4888, val_acc = 11.92%\n", 1034 | "Epoch 2/10: \n", 1035 | " loss = 8.2946, acc = 13.89%, val_loss = 8.5156, val_acc = 13.10%\n", 1036 | "Epoch 3/10: \n", 1037 | " loss = 7.5609, acc = 15.92%, val_loss = 7.7680, val_acc = 15.02%\n", 1038 | "Epoch 4/10: \n", 1039 | " loss = 6.9563, acc = 18.31%, val_loss = 7.1521, val_acc = 17.44%\n", 1040 | "Epoch 5/10: \n", 1041 | " loss = 6.4402, acc = 20.94%, val_loss = 6.6249, val_acc = 19.80%\n", 1042 | "Epoch 6/10: \n", 1043 | " loss = 5.9915, acc = 23.35%, val_loss = 6.1650, val_acc = 22.38%\n", 1044 | "Epoch 7/10: \n", 1045 | " loss = 5.5971, acc = 25.79%, val_loss = 5.7596, val_acc = 24.98%\n", 1046 | "Epoch 8/10: \n", 1047 | " loss = 5.2479, acc = 28.18%, val_loss = 5.4001, val_acc = 27.30%\n", 1048 | "Epoch 9/10: \n", 1049 | " loss = 4.9376, acc = 30.46%, val_loss = 5.0803, val_acc = 29.86%\n", 1050 | 
"Epoch 10/10: \n", 1051 | " loss = 4.6608, acc = 32.71%, val_loss = 4.7947, val_acc = 32.20%\n", 1052 | "test_acc = 33.58%\n" 1053 | ] 1054 | } 1055 | ], 1056 | "source": [ 1057 | "model = SimpleLogisticClassification(n_features=28*28, n_labels=10, learning_rate= 0.5)\n", 1058 | "model.fit(\n", 1059 | " X=train_data.images,\n", 1060 | " y=train_data.labels,\n", 1061 | " epochs=10,\n", 1062 | " validation_data=(valid_data.images, valid_data.labels),\n", 1063 | " test_data=(test_data.images, test_data.labels),\n", 1064 | ")" 1065 | ] 1066 | } 1067 | ], 1068 | "metadata": { 1069 | "colab": { 1070 | "name": "01_Simple_Logistic_Classification_on_MNIST.ipynb", 1071 | "provenance": [], 1072 | "version": "0.3.2" 1073 | }, 1074 | "kernelspec": { 1075 | "display_name": "Python 3", 1076 | "language": "python", 1077 | "name": "python3" 1078 | }, 1079 | "language_info": { 1080 | "codemirror_mode": { 1081 | "name": "ipython", 1082 | "version": 3 1083 | }, 1084 | "file_extension": ".py", 1085 | "mimetype": "text/x-python", 1086 | "name": "python", 1087 | "nbconvert_exporter": "python", 1088 | "pygments_lexer": "ipython3", 1089 | "version": "3.6.5" 1090 | } 1091 | }, 1092 | "nbformat": 4, 1093 | "nbformat_minor": 1 1094 | } 1095 | -------------------------------------------------------------------------------- /tutorial/06_RNN_and_LSTM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "QPBoriJk7PuK" 8 | }, 9 | "source": [ 10 | "## Tensorflow Tutorial 6: RNN and LSTM\n", 11 | "\n", 12 | "如果我們想要處理的問題是具有時序性的,該怎麼辦呢?本章將會介紹有時序性的Neurel Network。\n", 13 | "\n", 14 | "本單元程式碼LSTM部分可於[Github](https://github.com/GitYCC/Tensorflow_Tutorial/blob/master/code/06_LSTM.py)下載。\n", 15 | "\n", 16 | "\n", 17 | "### 概論RNN\n", 18 | "\n", 19 | "當我們想使得Neurel Network具有時序性,我們的Neurel Network就必須有記憶的功能,然後在我不斷的輸入新資訊時,也能同時保有歷史資訊的影響,最簡單的作法就是將Output的結果保留,等到新資訊進來時,將新的資訊和舊的Output一起考量來訓練Neurel Network。\n", 20 | "\n", 21 | "![unrolling](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.010.jpeg)\n", 22 | "\n", 23 | "這種將舊有資訊保留的Neurel Network統稱為Recurrent Neural Networks (RNN),這種不斷回饋的網路可以攤開來處理,如上圖,如果我有5筆數據,拿訓練一個RNN 5個回合並做了5次更新,其實就等效於攤開來一次處理5筆數據並做1次更新,這樣的手法叫做Unrolling,我們實作上會使用Unrolling的手法來增加計算效率。\n", 24 | "\n", 25 | "![RNN](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.011.jpeg)\n", 26 | "\n", 27 | "接下來來看RNN內部怎麼實現的,上圖是最簡單的RNN形式,我們將上一回產生的Output和這一回的Input一起評估出這一回的Output,詳細式子如下:\n", 28 | "\n", 29 | "$$\n", 30 | "o_{new}=tanh(i \\times W_i + o \\times W_o + B)\n", 31 | "$$\n", 32 | "\n", 33 | "如此一來RNN就具有時序性了,舊的歷史資料將可以被「記憶」起來,你可以把RNN的「記憶」看成是「短期記憶」,因為它只會記得上一回的Output而已。\n", 34 | "\n", 35 | "### 梯度消失與梯度爆炸\n", 36 | "\n", 37 | "但這種形式的RNN在實作上會遇到很大的問題,還記得第二章當中,我們有講過像是tanh這類有飽和區的函數,會造成梯度消失的問題,而我們如果使用Unrolling的觀點來看RNN,將會發現這是一個超級深的網路,Backpapagation必須一路通到t0的RNN,想當然爾,有些梯度將會消失,部分權重就更新不到了,那有一些聰明的讀者一定會想到,那就使用Relu就好啦!不過其實還有一個重要的因素造成梯度消失,同時也造成梯度爆炸。\n", 38 | "\n", 39 | "注意喔!雖然我們使用Unrolling的觀點,把網路看成是一個Deep網路的連接,但是和之前DNN不同之處,這些RNN彼此間是共享同一組權重的,這會造成梯度消失和梯度爆炸兩個問題,在RNN的結構裡頭,一個權重會隨著時間不斷的加強影響一個單一特徵,因為不同時間之下的RNN Cell共用同一個權重,這麼一來若是權重大於1,影響將會隨時間放大到梯度爆炸,若是權重小於1,影響將會隨時間縮小到梯度消失,就像是蝴蝶效應一般,微小的差異因為回饋的機制,而不合理的放大或是消失,因此RNN的Error Surface將會崎嶇不平,這會造成我們無法穩定的找到最佳解,難以收斂。這才是RNN難以使用的重要原因,把Activation Function換成Relu不會解決問題,文獻上反而告訴我們會變更差。\n", 40 | "\n", 41 | "解決梯度爆炸有一個聽起來很廢但廣為人們使用的方法,叫做Gradient 
Clipping,也就是只要在更新過程梯度超過一個值,我就切掉讓梯度維持在這個上限,這樣就不會爆炸啦,待會會講到的LSTM只能夠解決梯度消失問題,但不能解決梯度爆炸問題,因此我們還是需要Gradient Clipping方法的幫忙。\n", 42 | "\n", 43 | "在Tensorflow怎麼做到Gradient Clipping呢?作法是這樣的,以往我們使用`optimizer.minimize(loss)`來進行更新,事實上我們可以把這一步驟拆成兩部分,第一部分計算所有參數的梯度,第二部分使用這些梯度進行更新。因此我們可以從中作梗,把gradients偷天換日一番,一開始使用`optimizer.compute_gradients(loss)`來計算出個別的梯度,然後使用`tf.clip_by_global_norm(gradients, clip_norm)`來切梯度,最後再使用`optimizer.apply_gradients`把新的梯度餵入進行更新。\n", 44 | "\n", 45 | "### Long Short-Term Memory (LSTM)\n", 46 | "\n", 47 | "LSTM是現今RNN的主流,它可以解決梯度消失的問題,我們先來看看結構,先預告一下,LSTM是迄今為止這系列課程當中看過最複雜的Neurel Network。\n", 48 | "\n", 49 | "![LSTM](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.012.jpeg)\n", 50 | "\n", 51 | "最一開始和RNN一樣,Input會和上一回的Output一起評估一個「短期記憶」,\n", 52 | "\n", 53 | "$$\n", 54 | "f_m = tanh (i \\times W_{mi} + o \\times W_{mo} + B_m)\n", 55 | "$$\n", 56 | "\n", 57 | "但接下來不同於RNN直接輸出,LSTM做了一個類似於轉換成「長期記憶」的機制,「長期記憶」在這裡稱為State,State的狀態由三道門所控制,Input Gate負責控管哪些「短期記憶」可以進到「長期記憶」,Forget Gate負責調配哪一些「長期記憶」需要被遺忘,Output Gate則負責去決定需要從「長期記憶」中輸出怎樣的內容,先不要管這些Gate怎麼來,我們可以把這樣的記憶機制寫成以下的式子,假設State為$f_{state}$、Input Gate為$G_i$、Forget Gate為$G_f$和Output Gate為$G_o$。\n", 58 | "\n", 59 | "$$\n", 60 | "f_{state,new} = G_i \\times f_m + G_f \\times f_{state}\n", 61 | "$$\n", 62 | "\n", 63 | "$$\n", 64 | "o_{new} = G_o \\times tanh(f_{state,new})\n", 65 | "$$\n", 66 | "\n", 67 | "如果我們要使得上面中Gates的部分具有開關的功能的話,我們會希望Gates可以是0到1的值,0代表全關,1代表全開,sigmoid正可以幫我們做到這件事,那哪些因素會決定Gates的關閉與否呢?不妨考慮所有可能的因素,也就是所有輸入這個Cell的資訊都考慮進去,但上一回的State必須被剔除於外,因為上一回的State來決定下一個State的操作是不合理的,因此我們就可以寫下所有Gates的表示式了。\n", 68 | "\n", 69 | "$$\n", 70 | "G_i = Sigmoid (i \\times W_{ii} + o \\times W_{io} + B_i)\n", 71 | "$$\n", 72 | "\n", 73 | "$$\n", 74 | "G_f = Sigmoid (i \\times W_{fi} + o \\times W_{fo} + B_f)\n", 75 | "$$\n", 76 | "\n", 77 | "$$\n", 78 | "G_o = Sigmoid(i \\times W_{oi} + o \\times W_{oo} + B_o)\n", 79 | "$$\n", 80 | "\n", 81 | "這就是LSTM,「長期記憶」的出現可以解決掉梯度消失的問題,RNN只有「短期記憶」,所以一旦認為一個特徵不重要,經過幾回連乘,這個特徵的梯度就會消失殆盡,但是LSTM保留State,並且使用「加」的方法更新State,所以有一些重要的State得以留下來持續影響著Output,解決了梯度消失的問題。但是,不幸的LSTM還是免不了梯度爆炸,為什麼呢?如果一個特徵真的很重要,State會記住,Input也會強調,所以幾輪下來還是有可能出現爆炸的情況,這時候我們就需要Gradient Clipping的幫忙。" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "colab_type": "text", 88 | "id": "Mm0po4HK7PuN" 89 | }, 90 | "source": [ 91 | "### 使用LSTM實作文章產生器\n", 92 | "\n", 93 | "接下來我們來實作LSTM,目標是做一個文章產生器,我們希望機器可以不斷的根據前文猜測下一個「字母」(Letters)應該要下什麼,如此一來我只要給個開頭字母,LSTM就可以幫我腦補成一篇文章。" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 0, 99 | "metadata": { 100 | "colab": {}, 101 | "colab_type": "code", 102 | "id": "TUKnNDD87PuQ" 103 | }, 104 | "outputs": [], 105 | "source": [ 106 | "import os\n", 107 | "import random\n", 108 | "import string\n", 109 | "import zipfile\n", 110 | "from urllib.request import urlretrieve\n", 111 | "import time\n", 112 | "\n", 113 | "import numpy as np\n", 114 | "import tensorflow as tf\n", 115 | "\n", 116 | "tf.logging.set_verbosity(tf.logging.ERROR)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 2, 122 | "metadata": { 123 | "colab": { 124 | "base_uri": "https://localhost:8080/", 125 | "height": 206 126 | }, 127 | "colab_type": "code", 128 | "id": "op47Fre17Pud", 129 | "outputId": "2644ab5a-296c-4cf2-840d-2f19d0bef9e9" 130 | }, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "Downloading text8.zip\n", 137 | "Found and verified ./text8.zip\n", 138 | "=====\n", 139 | "Data size 100000000 
letters\n", 140 | "=====\n", 141 | "Train Dataset: size: 99999000 letters,\n", 142 | " first 64: ons anarchists advocate social relations based upon voluntary as\n", 143 | "Validation Dataset: size: 1000 letters,\n", 144 | " first 64: anarchism originated as a term of abuse first used against earl\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "LETTER_SIZE = len(string.ascii_lowercase) + 1 # [a-z] + ' '\n", 150 | "FIRST_LETTER_ASCII = ord(string.ascii_lowercase[0])\n", 151 | "\n", 152 | "def maybe_download(url, filename, expected_bytes):\n", 153 | " \"\"\"Download a file if not present, and make sure it's the right size.\"\"\"\n", 154 | " if not os.path.exists(filename):\n", 155 | " filename, _ = urlretrieve(url, filename)\n", 156 | " statinfo = os.stat(filename)\n", 157 | " if statinfo.st_size == expected_bytes:\n", 158 | " print('Found and verified %s' % filename)\n", 159 | " else:\n", 160 | " print(statinfo.st_size)\n", 161 | " raise Exception('Failed to verify ' + filename + '. Can you get to it with a browser?')\n", 162 | " return filename\n", 163 | "\n", 164 | "\n", 165 | "def read_data(filename):\n", 166 | " with zipfile.ZipFile(filename) as f:\n", 167 | " name = f.namelist()[0]\n", 168 | " data = tf.compat.as_str(f.read(name))\n", 169 | " return data\n", 170 | "\n", 171 | "\n", 172 | "def char2id(char):\n", 173 | " if char in string.ascii_lowercase:\n", 174 | " return ord(char) - FIRST_LETTER_ASCII + 1\n", 175 | " elif char == ' ':\n", 176 | " return 0\n", 177 | " else:\n", 178 | " print('Unexpected character: %s' % char)\n", 179 | " return 0\n", 180 | "\n", 181 | "\n", 182 | "def id2char(dictid):\n", 183 | " if dictid > 0:\n", 184 | " return chr(dictid + FIRST_LETTER_ASCII - 1)\n", 185 | " else:\n", 186 | " return ' '\n", 187 | "\n", 188 | " \n", 189 | "print('Downloading text8.zip')\n", 190 | "filename = maybe_download('http://mattmahoney.net/dc/text8.zip', './text8.zip', 31344016)\n", 191 | "\n", 192 | "print('=====')\n", 193 | "text = read_data(filename)\n", 194 | "print('Data size %d letters' % len(text))\n", 195 | "\n", 196 | "print('=====')\n", 197 | "valid_size = 1000\n", 198 | "valid_text = text[:valid_size]\n", 199 | "train_text = text[valid_size:]\n", 200 | "train_size = len(train_text)\n", 201 | "print('Train Dataset: size:', train_size, 'letters,\\n first 64:', train_text[:64])\n", 202 | "print('Validation Dataset: size:', valid_size, 'letters,\\n first 64:', valid_text[:64])" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": { 208 | "colab_type": "text", 209 | "id": "A3ChK14A7Pup" 210 | }, 211 | "source": [ 212 | "上面操作我們建制完成了字母庫,接下來就可以產生我們訓練所需要的Batch Data,所以我們來看看究竟要產生怎樣格式的資料。\n", 213 | "\n", 214 | "![LSTM Implement](https://raw.githubusercontent.com/GitYCC/Tensorflow_Tutorial/master/img/TensorflowTutorial.013.jpeg)\n", 215 | "\n", 216 | "如上圖所示,有點小複雜,假設我要設計一個LSTM Model,它的Unrolling Number為3,Batch Size為2,然後遇到的字串是\"abcde fghij klmno pqrst\",接下來就開始產生每個Round要用的Data,產生的結果如上圖所示,你會發現產生的Data第0軸表示的是考慮unrolling需要取樣的資料,總共應該會有(Unrolling Number+1)筆,如上圖例,共有4筆,3筆當作輸入而3筆當作Labels,中間有2筆重疊使用,另外還有一點,我們會保留最後一筆Data當作下一個回合的第一筆,這是為了不浪費使用每一個字母前後的組合。而第1軸則是餵入單一LSTM需要的資料,我們一次可以餵多組不相干的字母進去,如上圖例,Batch Size=2所以餵2個字母進去,那這些不相干的字母在取樣的時候,我們會盡量讓它平均分配在文字庫,才能確保彼此之間不相干,以增加LSTM的訓練效率和效果。\n", 217 | "\n", 218 | "因此,先產生Batch Data吧!" 
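Because this layout is easy to get wrong, the short check below may help; it assumes the `rnn_batch_generator` and `batches2string` helpers defined in the next cell and reuses the toy string from the figure, so run it after that cell.

```python
# Sanity check of the batch layout described above; assumes the
# rnn_batch_generator and batches2string helpers defined in the next cell.
demo_gen = rnn_batch_generator('abcde fghij klmno pqrst', batch_size=2, num_unrollings=3)

first = next(demo_gen)
print(len(first))             # 4 = num_unrollings + 1 one-hot arrays
print(first[0].shape)         # (2, 27) = (batch_size, LETTER_SIZE)
print(batches2string(first))  # two 4-letter strings, cursors spaced text_size // batch_size apart

second = next(demo_gen)
print(batches2string(second)) # each string starts with the last letter of the previous round

# When training, inputs are first[:-1] and labels are first[1:]:
# the same arrays shifted by one time step, with the last array carried
# over as the first array of the next round.
```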
219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 3, 224 | "metadata": { 225 | "colab": { 226 | "base_uri": "https://localhost:8080/", 227 | "height": 158 228 | }, 229 | "colab_type": "code", 230 | "id": "lSQM8Xm87Pur", 231 | "outputId": "abd3bf8f-8cfc-43df-b105-8e0745c98db5" 232 | }, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "*** train_batches:\n", 239 | "['ons anarchi', 'when milita', 'lleria arch', ' abbeys and', 'married urr', 'hel and ric', 'y and litur', 'ay opened f', 'tion from t', 'migration t', 'new york ot', 'he boeing s', 'e listed wi', 'eber has pr', 'o be made t', 'yer who rec', 'ore signifi', 'a fierce cr', ' two six ei', 'aristotle s', 'ity can be ', ' and intrac', 'tion of the', 'dy to pass ', 'f certain d', 'at it will ', 'e convince ', 'ent told hi', 'ampaign and', 'rver side s', 'ious texts ', 'o capitaliz', 'a duplicate', 'gh ann es d', 'ine january', 'ross zero t', 'cal theorie', 'ast instanc', ' dimensiona', 'most holy m', 't s support', 'u is still ', 'e oscillati', 'o eight sub', 'of italy la', 's the tower', 'klahoma pre', 'erprise lin', 'ws becomes ', 'et in a naz', 'the fabian ', 'etchy to re', ' sharman ne', 'ised empero', 'ting in pol', 'd neo latin', 'th risky ri', 'encyclopedi', 'fense the a', 'duating fro', 'treet grid ', 'ations more', 'appeal of d', 'si have mad']\n", 240 | "['ists advoca', 'ary governm', 'hes nationa', 'd monasteri', 'raca prince', 'chard baer ', 'rgical lang', 'for passeng', 'the nationa', 'took place ', 'ther well k', 'seven six s', 'ith a gloss', 'robably bee', 'to recogniz', 'ceived the ', 'icant than ', 'ritic of th', 'ight in sig', 's uncaused ', ' lost as in', 'cellular ic', 'e size of t', ' him a stic', 'drugs confu', ' take to co', ' the priest', 'im to name ', 'd barred at', 'standard fo', ' such as es', 'ze on the g', 'e of the or', 'd hiver one', 'y eight mar', 'the lead ch', 'es classica', 'ce the non ', 'al analysis', 'mormons bel', 't or at lea', ' disagreed ', 'ing system ', 'btypes base', 'anguages th', 'r commissio', 'ess one nin', 'nux suse li', ' the first ', 'zi concentr', ' society ne', 'elatively s', 'etworks sha', 'or hirohito', 'litical ini', 'n most of t', 'iskerdoo ri', 'ic overview', 'air compone', 'om acnm acc', ' centerline', 'e than any ', 'devotional ', 'de such dev']\n", 241 | "*** valid_batches:\n", 242 | "[' a']\n", 243 | "['an']\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "def characters(probabilities):\n", 249 | " \"\"\"Turn a 1-hot encoding or a probability distribution over the possible\n", 250 | " characters back into its (most likely) character representation.\"\"\"\n", 251 | " return [id2char(c) for c in np.argmax(probabilities, 1)]\n", 252 | "\n", 253 | "\n", 254 | "def batches2string(batches):\n", 255 | " \"\"\"Convert a sequence of batches back into their (most likely) string\n", 256 | " representation.\"\"\"\n", 257 | " s = [''] * batches[0].shape[0]\n", 258 | " for b in batches:\n", 259 | " s = [''.join(x) for x in zip(s, characters(b))]\n", 260 | " return s\n", 261 | "\n", 262 | "\n", 263 | "def rnn_batch_generator(text, batch_size, num_unrollings):\n", 264 | " text_size = len(text)\n", 265 | "\n", 266 | " ### initialization\n", 267 | " segment = text_size // batch_size\n", 268 | " cursors = [offset * segment for offset in range(batch_size)]\n", 269 | "\n", 270 | " batches = []\n", 271 | " batch_initial = np.zeros(shape=(batch_size, LETTER_SIZE), dtype=np.float)\n", 272 | " for i in 
range(batch_size):\n", 273 | " cursor = cursors[i]\n", 274 | " id_ = char2id(text[cursor])\n", 275 | " batch_initial[i][id_] = 1.0\n", 276 | "\n", 277 | " # move cursor\n", 278 | " cursors[i] = (cursors[i] + 1) % text_size\n", 279 | "\n", 280 | " batches.append(batch_initial)\n", 281 | "\n", 282 | " ### generate loop\n", 283 | " while True:\n", 284 | " batches = [batches[-1], ]\n", 285 | " for _ in range(num_unrollings):\n", 286 | " batch = np.zeros(shape=(batch_size, LETTER_SIZE), dtype=np.float)\n", 287 | " for i in range(batch_size):\n", 288 | " cursor = cursors[i]\n", 289 | " id_ = char2id(text[cursor])\n", 290 | " batch[i][id_] = 1.0\n", 291 | "\n", 292 | " # move cursor\n", 293 | " cursors[i] = (cursors[i] + 1) % text_size\n", 294 | " batches.append(batch)\n", 295 | "\n", 296 | " yield batches # [last batch of previous batches] + [unrollings]\n", 297 | "\n", 298 | "\n", 299 | "# demonstrate generator\n", 300 | "batch_size = 64\n", 301 | "num_unrollings = 10\n", 302 | "\n", 303 | "train_batches = rnn_batch_generator(train_text, batch_size, num_unrollings)\n", 304 | "valid_batches = rnn_batch_generator(valid_text, 1, 1)\n", 305 | "\n", 306 | "print('*** train_batches:')\n", 307 | "print(batches2string(next(train_batches)))\n", 308 | "print(batches2string(next(train_batches)))\n", 309 | "print('*** valid_batches:')\n", 310 | "print(batches2string(next(valid_batches)))\n", 311 | "print(batches2string(next(valid_batches)))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": { 317 | "colab_type": "text", 318 | "id": "0jD3Gb8I7Puw" 319 | }, 320 | "source": [ 321 | "定義一下待會會用到的函數。" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 0, 327 | "metadata": { 328 | "colab": {}, 329 | "colab_type": "code", 330 | "id": "hVjvGZok7Pux" 331 | }, 332 | "outputs": [], 333 | "source": [ 334 | "def sample_distribution(distribution):\n", 335 | " \"\"\"Sample one element from a distribution assumed to be an array of normalized\n", 336 | " probabilities.\n", 337 | " \"\"\"\n", 338 | " r = random.uniform(0, 1)\n", 339 | " s = 0\n", 340 | " for i in range(len(distribution)):\n", 341 | " s += distribution[i]\n", 342 | " if s >= r:\n", 343 | " return i\n", 344 | " return len(distribution) - 1\n", 345 | "\n", 346 | "\n", 347 | "def sample(prediction):\n", 348 | " \"\"\"Turn a (column) prediction into 1-hot encoded samples.\"\"\"\n", 349 | " p = np.zeros(shape=[1, LETTER_SIZE], dtype=np.float)\n", 350 | " p[0, sample_distribution(prediction[0])] = 1.0\n", 351 | " return p\n", 352 | "\n", 353 | "\n", 354 | "def logprob(predictions, labels):\n", 355 | " \"\"\"Log-probability of the true labels in a predicted batch.\"\"\"\n", 356 | " predictions[predictions < 1e-10] = 1e-10\n", 357 | " return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": { 363 | "colab_type": "text", 364 | "id": "87eIPKyb7Pu2" 365 | }, 366 | "source": [ 367 | "開始建制LSTM Model。" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 0, 373 | "metadata": { 374 | "colab": {}, 375 | "colab_type": "code", 376 | "id": "rzCS7NdQ7Pu3" 377 | }, 378 | "outputs": [], 379 | "source": [ 380 | "class LSTM:\n", 381 | "\n", 382 | " def __init__(self, n_unrollings, n_memory, n_train_batch, learning_rate=1.0):\n", 383 | " self.n_unrollings = n_unrollings\n", 384 | " self.n_memory = n_memory\n", 385 | "\n", 386 | " self.weights = None\n", 387 | " self.biases = None\n", 388 | " self.saved = None\n", 389 | 
"\n", 390 | " self.graph = tf.Graph() # initialize new grap\n", 391 | " self.build(learning_rate, n_train_batch) # building graph\n", 392 | " self.sess = tf.Session(graph=self.graph) # create session by the graph\n", 393 | "\n", 394 | " def build(self, learning_rate, n_train_batch):\n", 395 | " with self.graph.as_default():\n", 396 | " ### Input\n", 397 | " self.train_data = list()\n", 398 | " for _ in range(self.n_unrollings + 1):\n", 399 | " self.train_data.append(\n", 400 | " tf.placeholder(tf.float32, shape=[n_train_batch, LETTER_SIZE]))\n", 401 | " self.train_inputs = self.train_data[:self.n_unrollings]\n", 402 | " self.train_labels = self.train_data[1:] # labels are inputs shifted by one time step.\n", 403 | "\n", 404 | "\n", 405 | " ### Optimalization\n", 406 | " # build neurel network structure and get their loss\n", 407 | " self.y_, self.loss = self.structure(\n", 408 | " inputs=self.train_inputs,\n", 409 | " labels=self.train_labels,\n", 410 | " n_batch=n_train_batch,\n", 411 | " )\n", 412 | "\n", 413 | " # define training operation\n", 414 | "\n", 415 | " self.optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)\n", 416 | "\n", 417 | " # gradient clipping\n", 418 | "\n", 419 | " # output gradients one by one\n", 420 | " gradients, v = zip(*self.optimizer.compute_gradients(self.loss))\n", 421 | " gradients, _ = tf.clip_by_global_norm(gradients, 1.25) # clip gradient\n", 422 | " # apply clipped gradients\n", 423 | " self.train_op = self.optimizer.apply_gradients(zip(gradients, v))\n", 424 | "\n", 425 | " ### Sampling and validation eval: batch 1, no unrolling.\n", 426 | " self.sample_input = tf.placeholder(tf.float32, shape=[1, LETTER_SIZE])\n", 427 | "\n", 428 | " saved_sample_output = tf.Variable(tf.zeros([1, self.n_memory]))\n", 429 | " saved_sample_state = tf.Variable(tf.zeros([1, self.n_memory]))\n", 430 | " self.reset_sample_state = tf.group( # reset sample state operator\n", 431 | " saved_sample_output.assign(tf.zeros([1, self.n_memory])),\n", 432 | " saved_sample_state.assign(tf.zeros([1, self.n_memory])))\n", 433 | "\n", 434 | " sample_output, sample_state = self.lstm_cell(\n", 435 | " self.sample_input, saved_sample_output, saved_sample_state)\n", 436 | " with tf.control_dependencies([saved_sample_output.assign(sample_output),\n", 437 | " saved_sample_state.assign(sample_state)]):\n", 438 | " # use tf.control_dependencies to make sure 'saving' before 'prediction'\n", 439 | "\n", 440 | " self.sample_prediction = tf.nn.softmax(\n", 441 | " tf.nn.xw_plus_b(sample_output,\n", 442 | " self.weights['classifier'],\n", 443 | " self.biases['classifier']))\n", 444 | "\n", 445 | " ### Initialization\n", 446 | " self.init_op = tf.global_variables_initializer()\n", 447 | "\n", 448 | " def lstm_cell(self, i, o, state):\n", 449 | " \"\"\"\"Create a LSTM cell. 
See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf\n", 450 | " Note that in this formulation, we omit the various connections between the\n", 451 | " previous state and the gates.\"\"\"\n", 452 | " ## Build Input Gate\n", 453 | " ix = self.weights['input_gate_i']\n", 454 | " im = self.weights['input_gate_o']\n", 455 | " ib = self.biases['input_gate']\n", 456 | " input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)\n", 457 | " ## Build Forget Gate\n", 458 | " fx = self.weights['forget_gate_i']\n", 459 | " fm = self.weights['forget_gate_o']\n", 460 | " fb = self.biases['forget_gate']\n", 461 | " forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)\n", 462 | " ## Memory\n", 463 | " cx = self.weights['memory_i']\n", 464 | " cm = self.weights['memory_o']\n", 465 | " cb = self.biases['memory']\n", 466 | " update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb\n", 467 | " ## Update State\n", 468 | " state = forget_gate * state + input_gate * tf.tanh(update)\n", 469 | " ## Build Output Gate\n", 470 | " ox = self.weights['output_gate_i']\n", 471 | " om = self.weights['output_gate_o']\n", 472 | " ob = self.biases['output_gate']\n", 473 | " output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)\n", 474 | " ## Ouput\n", 475 | " output = output_gate * tf.tanh(state)\n", 476 | "\n", 477 | " return output, state\n", 478 | "\n", 479 | " def structure(self, inputs, labels, n_batch):\n", 480 | " ### Variable\n", 481 | " if (not self.weights) or (not self.biases) or (not self.saved):\n", 482 | " self.weights = {\n", 483 | " 'input_gate_i': tf.Variable(tf.truncated_normal(\n", 484 | " [LETTER_SIZE, self.n_memory], -0.1, 0.1)),\n", 485 | " 'input_gate_o': tf.Variable(tf.truncated_normal(\n", 486 | " [self.n_memory, self.n_memory], -0.1, 0.1)),\n", 487 | " 'forget_gate_i': tf.Variable(tf.truncated_normal(\n", 488 | " [LETTER_SIZE, self.n_memory], -0.1, 0.1)),\n", 489 | " 'forget_gate_o': tf.Variable(tf.truncated_normal(\n", 490 | " [self.n_memory, self.n_memory], -0.1, 0.1)),\n", 491 | " 'output_gate_i': tf.Variable(tf.truncated_normal(\n", 492 | " [LETTER_SIZE, self.n_memory], -0.1, 0.1)),\n", 493 | " 'output_gate_o': tf.Variable(tf.truncated_normal(\n", 494 | " [self.n_memory, self.n_memory], -0.1, 0.1)),\n", 495 | " 'memory_i': tf.Variable(tf.truncated_normal(\n", 496 | " [LETTER_SIZE, self.n_memory], -0.1, 0.1)),\n", 497 | " 'memory_o': tf.Variable(tf.truncated_normal(\n", 498 | " [self.n_memory, self.n_memory], -0.1, 0.1)),\n", 499 | " 'classifier': tf.Variable(tf.truncated_normal(\n", 500 | " [self.n_memory, LETTER_SIZE], -0.1, 0.1)),\n", 501 | "\n", 502 | " }\n", 503 | " self.biases = {\n", 504 | " 'input_gate': tf.Variable(tf.zeros([1, self.n_memory])),\n", 505 | " 'forget_gate': tf.Variable(tf.zeros([1, self.n_memory])),\n", 506 | " 'output_gate': tf.Variable(tf.zeros([1, self.n_memory])),\n", 507 | " 'memory': tf.Variable(tf.zeros([1, self.n_memory])),\n", 508 | " 'classifier': tf.Variable(tf.zeros([LETTER_SIZE])),\n", 509 | " }\n", 510 | "\n", 511 | " # Variables saving state across unrollings.\n", 512 | " saved_output = tf.Variable(tf.zeros([n_batch, self.n_memory]), trainable=False)\n", 513 | " saved_state = tf.Variable(tf.zeros([n_batch, self.n_memory]), trainable=False)\n", 514 | "\n", 515 | " ### Structure\n", 516 | " # Unrolled LSTM loop.\n", 517 | " outputs = list()\n", 518 | " output = saved_output\n", 519 | " state = saved_state\n", 520 | " for input_ in inputs:\n", 521 | " output, state = self.lstm_cell(input_, output, state)\n", 522 | " 
outputs.append(output)\n", 523 | "\n", 524 | " # State saving across unrollings.\n", 525 | " with tf.control_dependencies([saved_output.assign(output),\n", 526 | " saved_state.assign(state)]):\n", 527 | " # use tf.control_dependencies to make sure 'saving' before 'calculating loss'\n", 528 | "\n", 529 | " # Classifier\n", 530 | " logits = tf.nn.xw_plus_b(tf.concat(outputs, 0),\n", 531 | " self.weights['classifier'],\n", 532 | " self.biases['classifier'])\n", 533 | " y_ = tf.nn.softmax(logits)\n", 534 | " loss = tf.reduce_mean(\n", 535 | " tf.nn.softmax_cross_entropy_with_logits(\n", 536 | " labels=tf.concat(labels, 0), logits=logits))\n", 537 | "\n", 538 | " return y_, loss\n", 539 | "\n", 540 | " def initialize(self):\n", 541 | " self.weights = None\n", 542 | " self.biases = None\n", 543 | " self.sess.run(self.init_op)\n", 544 | "\n", 545 | " def online_fit(self, X):\n", 546 | " feed_dict = dict()\n", 547 | " for i in range(self.n_unrollings + 1):\n", 548 | " feed_dict[self.train_data[i]] = X[i]\n", 549 | "\n", 550 | " _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict)\n", 551 | " return loss\n", 552 | "\n", 553 | " def perplexity(self, X):\n", 554 | " sum_logprob = 0\n", 555 | " sample_size = len(X)-1\n", 556 | " batch_size = X[0].shape[0]\n", 557 | "\n", 558 | " for i in range(batch_size):\n", 559 | " self.sess.run(self.reset_sample_state)\n", 560 | " for j in range(sample_size):\n", 561 | " sample_input = np.reshape(X[j][i], newshape=(1, -1))\n", 562 | " sample_label = np.reshape(X[j+1][i], newshape=(1, -1))\n", 563 | " predictions = self.sess.run(self.sample_prediction,\n", 564 | " feed_dict={self.sample_input: sample_input})\n", 565 | " sum_logprob += logprob(predictions, sample_label)\n", 566 | " perplexity = float(np.exp(sum_logprob / batch_size / sample_size))\n", 567 | " return perplexity\n", 568 | "\n", 569 | " def generate(self, c, len_generate):\n", 570 | " feed = np.array([[1 if id2char(i) == c else 0 for i in range(LETTER_SIZE)]])\n", 571 | " sentence = characters(feed)[0]\n", 572 | " self.sess.run(self.reset_sample_state)\n", 573 | " for _ in range(len_generate):\n", 574 | " prediction = self.sess.run(self.sample_prediction, feed_dict={self.sample_input: feed})\n", 575 | " feed = sample(prediction)\n", 576 | " sentence += characters(feed)[0]\n", 577 | " return sentence" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 6, 583 | "metadata": { 584 | "colab": { 585 | "base_uri": "https://localhost:8080/", 586 | "height": 1326 587 | }, 588 | "colab_type": "code", 589 | "id": "zLo09yMl7Pu8", 590 | "outputId": "7c1d448d-b840-4d55-d073-b3870fc2d6e1" 591 | }, 592 | "outputs": [ 593 | { 594 | "name": "stdout", 595 | "output_type": "stream", 596 | "text": [ 597 | "Epoch 1/30: 66s loss = 1.8249, perplexity = 5.6840\n", 598 | "Epoch 2/30: 64s loss = 1.5348, perplexity = 5.7269\n", 599 | "Epoch 3/30: 63s loss = 1.4754, perplexity = 5.7866\n", 600 | "Epoch 4/30: 62s loss = 1.4412, perplexity = 5.3462\n", 601 | "Epoch 5/30: 62s loss = 1.4246, perplexity = 5.8845\n", 602 | "\n", 603 | "=============== Validation ===============\n", 604 | "validation perplexity = 3.7260\n", 605 | "Generate From 'a': ah plays agrestiom scattery at an experiments the a\n", 606 | "Generate From 'h': ht number om one nine six three kg aid rosta franci\n", 607 | "Generate From 'm': m within v like opens and solepolity ledania as was\n", 608 | "==========================================\n", 609 | "\n", 610 | "Epoch 6/30: 64s loss = 1.4094, perplexity = 6.0429\n", 
611 | "Epoch 7/30: 64s loss = 1.3954, perplexity = 5.6133\n", 612 | "Epoch 8/30: 63s loss = 1.3905, perplexity = 5.4791\n", 613 | "Epoch 9/30: 62s loss = 1.3675, perplexity = 5.7168\n", 614 | "Epoch 10/30: 62s loss = 1.3861, perplexity = 5.3937\n", 615 | "\n", 616 | "=============== Validation ===============\n", 617 | "validation perplexity = 3.5992\n", 618 | "Generate From 'a': ands their hypenman sam diversion passes to rouke t\n", 619 | "Generate From 'h': hash pryess the setuluply see include the grophistr\n", 620 | "Generate From 'm': merhouses tourism in vertic or influence carbon min\n", 621 | "==========================================\n", 622 | "\n", 623 | "Epoch 11/30: 64s loss = 1.3782, perplexity = 5.5835\n", 624 | "Epoch 12/30: 62s loss = 1.3802, perplexity = 6.0567\n", 625 | "Epoch 13/30: 62s loss = 1.3723, perplexity = 6.0672\n", 626 | "Epoch 14/30: 62s loss = 1.3729, perplexity = 6.4365\n", 627 | "Epoch 15/30: 62s loss = 1.3682, perplexity = 6.2878\n", 628 | "\n", 629 | "=============== Validation ===============\n", 630 | "validation perplexity = 3.7153\n", 631 | "Generate From 'a': ate at decade a july uses mobe on the john press to\n", 632 | "Generate From 'h': htell yullandi is u one five it naval railandly eng\n", 633 | "Generate From 'm': ment theory president and much three sinit in harde\n", 634 | "==========================================\n", 635 | "\n", 636 | "Epoch 16/30: 65s loss = 1.3647, perplexity = 5.5579\n", 637 | "Epoch 17/30: 63s loss = 1.3691, perplexity = 5.3885\n", 638 | "Epoch 18/30: 64s loss = 1.3535, perplexity = 6.4797\n", 639 | "Epoch 19/30: 63s loss = 1.3637, perplexity = 5.8126\n", 640 | "Epoch 20/30: 62s loss = 1.3567, perplexity = 5.9839\n", 641 | "\n", 642 | "=============== Validation ===============\n", 643 | "validation perplexity = 3.6210\n", 644 | "Generate From 'a': ate treaty jack a golderazogon develoged civilized \n", 645 | "Generate From 'h': hyene is ricpstowed dark preferent crurts annivaril\n", 646 | "Generate From 'm': mer centine all level end of a character of tracks \n", 647 | "==========================================\n", 648 | "\n", 649 | "Epoch 21/30: 65s loss = 1.3584, perplexity = 6.0557\n", 650 | "Epoch 22/30: 63s loss = 1.3535, perplexity = 7.0777\n", 651 | "Epoch 23/30: 63s loss = 1.3700, perplexity = 5.7674\n", 652 | "Epoch 24/30: 63s loss = 1.3609, perplexity = 6.1226\n", 653 | "Epoch 25/30: 64s loss = 1.3663, perplexity = 6.2711\n", 654 | "\n", 655 | "=============== Validation ===============\n", 656 | "validation perplexity = 3.6048\n", 657 | "Generate From 'a': an vary palest in some live halleten converting to \n", 658 | "Generate From 'h': heper could use that the l bidging the five zero th\n", 659 | "Generate From 'm': mer yort can the real forexanded or rather then for\n", 660 | "==========================================\n", 661 | "\n", 662 | "Epoch 26/30: 66s loss = 1.3551, perplexity = 6.1640\n", 663 | "Epoch 27/30: 65s loss = 1.3586, perplexity = 6.3620\n", 664 | "Epoch 28/30: 65s loss = 1.3744, perplexity = 5.5748\n", 665 | "Epoch 29/30: 64s loss = 1.3634, perplexity = 6.0498\n", 666 | "Epoch 30/30: 63s loss = 1.3671, perplexity = 6.2313\n", 667 | "\n", 668 | "=============== Validation ===============\n", 669 | "validation perplexity = 3.4751\n", 670 | "Generate From 'a': an one brivistrial empir thorodox to an of one city\n", 671 | "Generate From 'h': ho wing two he wonders marding where never boat lit\n", 672 | "Generate From 'm': mptemeignt linerical premore logical boldving on ch\n", 673 | 
"==========================================\n", 674 | "\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "# build training batch generator\n", 680 | "batch_generator = rnn_batch_generator(\n", 681 | " text=train_text,\n", 682 | " batch_size=batch_size,\n", 683 | " num_unrollings=num_unrollings,\n", 684 | ")\n", 685 | "\n", 686 | "# build validation data\n", 687 | "valid_batches = rnn_batch_generator(\n", 688 | " text=valid_text, \n", 689 | " batch_size=1, \n", 690 | " num_unrollings=1,\n", 691 | ")\n", 692 | "\n", 693 | "valid_data = [np.array(next(valid_batches)) for _ in range(valid_size)]\n", 694 | "\n", 695 | "# build LSTM model\n", 696 | "model_LSTM = LSTM(\n", 697 | " n_unrollings=num_unrollings,\n", 698 | " n_memory=128,\n", 699 | " n_train_batch=batch_size,\n", 700 | " learning_rate=0.9\n", 701 | ")\n", 702 | "\n", 703 | "# initial model\n", 704 | "model_LSTM.initialize()\n", 705 | "\n", 706 | "# online training\n", 707 | "epochs = 30\n", 708 | "num_batchs_in_epoch = 5000\n", 709 | "valid_freq = 5\n", 710 | "\n", 711 | "for epoch in range(epochs):\n", 712 | " start_time = time.time()\n", 713 | " avg_loss = 0\n", 714 | " for _ in range(num_batchs_in_epoch):\n", 715 | " batch = next(batch_generator)\n", 716 | " loss = model_LSTM.online_fit(X=batch)\n", 717 | " avg_loss += loss\n", 718 | " \n", 719 | " avg_loss = avg_loss / num_batchs_in_epoch\n", 720 | " \n", 721 | " train_perplexity = model_LSTM.perplexity(batch)\n", 722 | " print('Epoch %d/%d: %ds loss = %6.4f, perplexity = %6.4f'\n", 723 | " % ( epoch+1, epochs, time.time()-start_time, avg_loss, train_perplexity))\n", 724 | " \n", 725 | " if (epoch+1) % valid_freq == 0:\n", 726 | " print('')\n", 727 | " print('=============== Validation ===============')\n", 728 | " print('validation perplexity = %6.4f' % (model_LSTM.perplexity(valid_data)))\n", 729 | " print('Generate From \\'a\\': ', model_LSTM.generate(c='a', len_generate=50))\n", 730 | " print('Generate From \\'h\\': ', model_LSTM.generate(c='h', len_generate=50))\n", 731 | " print('Generate From \\'m\\': ', model_LSTM.generate(c='m', len_generate=50))\n", 732 | " print('==========================================')\n", 733 | " print('')" 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "metadata": { 739 | "colab_type": "text", 740 | "id": "kCwuspmv7PvA" 741 | }, 742 | "source": [ 743 | "最後來產生一篇以\"t\"為開頭的1000字文章吧!" 
744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": 7, 749 | "metadata": { 750 | "colab": { 751 | "base_uri": "https://localhost:8080/", 752 | "height": 94 753 | }, 754 | "colab_type": "code", 755 | "id": "mC8joiwD7PvC", 756 | "outputId": "ac8ec426-7ec2-47e3-e19c-7f2116a2efbc" 757 | }, 758 | "outputs": [ 759 | { 760 | "name": "stdout", 761 | "output_type": "stream", 762 | "text": [ 763 | "th the oppose asia college on all of indirect i suicide upse angence and including khazool cashle with jeremp of the case hasway was catiline tribui s law can be wounds to free from an eventually locations university colid for admirum syn semition goths display the might the official up it alder stowinity name like or day elenth names and lesk external links a loons for have the genione e elevang cress leven isbn effects on cultural leave to oldincil he hokerzon blacklomen with the known resolvement of literated by college founded to families in ak urke player jain of highling fake state a first o al reason into the son then mmpt one nine three three npunt university unexal and currently amnyanipation behavion from ber and ii variety of the gupife number topan has one three zero z capital prime genary brown one nine five nine so universities country recipient the vegetables bether form the distinct de plus out as a first a johnson quicky s remain which an death to anti in panibus series\n" 764 | ] 765 | } 766 | ], 767 | "source": [ 768 | "print(model_LSTM.generate(c='t', len_generate=1000))" 769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": { 774 | "colab_type": "text", 775 | "id": "74prLZwn7PvH" 776 | }, 777 | "source": [ 778 | "看得出來LSTM想表達什麼嗎,哈哈!\n", 779 | "\n", 780 | "### Reference\n", 781 | "* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/6_lstm.ipynb\n", 782 | "* http://colah.github.io/posts/2015-08-Understanding-LSTMs/\n" 783 | ] 784 | } 785 | ], 786 | "metadata": { 787 | "accelerator": "GPU", 788 | "colab": { 789 | "name": "06_RNN_and_LSTM.ipynb", 790 | "provenance": [], 791 | "version": "0.3.2" 792 | }, 793 | "kernelspec": { 794 | "display_name": "Python 3", 795 | "language": "python", 796 | "name": "python3" 797 | }, 798 | "language_info": { 799 | "codemirror_mode": { 800 | "name": "ipython", 801 | "version": 3 802 | }, 803 | "file_extension": ".py", 804 | "mimetype": "text/x-python", 805 | "name": "python", 806 | "nbconvert_exporter": "python", 807 | "pygments_lexer": "ipython3", 808 | "version": "3.6.5" 809 | } 810 | }, 811 | "nbformat": 4, 812 | "nbformat_minor": 1 813 | } 814 | -------------------------------------------------------------------------------- /tutorial/tensorflow_workshop_0630.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Tensorflow Workshop (2019/6/30)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import tensorflow as tf\n", 17 | "from tensorflow import keras\n", 18 | "import numpy as np\n", 19 | "import matplotlib.pyplot as plt" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## python tutorial\n", 27 | "\n", 28 | "參考:https://www.w3schools.com/python/python_lists.asp\n", 29 | "\n", 30 | "* print: \n", 31 | " * `print('Hello!')`\n", 32 | " * `print('Art: %5d, Price: %8.2f' % (453, 59.058))`\n", 33 | " * `print('Art: 
{0:5d}, Price: {1:8.2f}'.format(453, 59.058))`\n", 34 | " * 參考:https://www.python-course.eu/python3_formatted_output.php\n", 35 | "* list\n", 36 | "* tuples\n", 37 | "* dictionaries\n", 38 | "* if ... elif ... else\n", 39 | "* while loop\n", 40 | "* for loop\n", 41 | "* function\n", 42 | "* class\n", 43 | "\n", 44 | "\n", 45 | "### Mission\n", 46 | "\n", 47 | "假設今天有一燈泡組,第一顆燈泡會接上電源,然後第一顆燈泡(序號為0)接上第二顆燈泡(序號為1),第二顆燈泡(序號為1)接上第三顆燈泡(序號為2),依序連接,總共有5顆燈泡。另外,每顆燈泡都有一個開關,開關打開代表導通,燈泡就有機會亮,但還需要兩個額外條件:第一,這顆燈泡已接上電源,第二,前一顆燈泡亮了(已導通),所以如果依照順序 `[2, 0, 3, 1, 4]` 的開啟燈泡,則燈泡組會有 `3` 階段的亮法,如下所示:\n", 48 | "\n", 49 | "```\n", 50 | "switch on: 2\n", 51 | "bulbs: _ _ _ _ _\n", 52 | "switch on: 0\n", 53 | "bulbs: * _ _ _ _\n", 54 | "switch on: 3\n", 55 | "bulbs: * _ _ _ _\n", 56 | "switch on: 1\n", 57 | "bulbs: * * * * _\n", 58 | "switch on: 4\n", 59 | "bulbs: * * * * *\n", 60 | "```\n", 61 | "\n", 62 | "請大家幫我利用以下的code把這個demo實作出來。" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "class Bulb:\n", 72 | " def __init__(self, name):\n", 73 | " self.name = name\n", 74 | "\n", 75 | " self._connect_source = False\n", 76 | " self._switch_on = False\n", 77 | " self._prev = None\n", 78 | " self._next = None\n", 79 | " \n", 80 | " def append(self, next_bulb):\n", 81 | " self._next = next_bulb\n", 82 | " next_bulb._prev = self\n", 83 | " \n", 84 | " def connect_source(self):\n", 85 | " self._connect_source = True\n", 86 | "\n", 87 | " def switch_on(self):\n", 88 | " self._switch_on = True\n", 89 | " \n", 90 | " def is_light(self):\n", 91 | " # 幫我在這邊實作出兩種情形要回傳True(亮)\n", 92 | " # 第一種情況是,燈泡有接電源且開關打開\n", 93 | " # 第二種情況是,前面接的燈泡有亮且開關打開\n", 94 | " # hint: 記得檢查上一個燈泡是否存在\n", 95 | " # answer here\n", 96 | "\n", 97 | " \n", 98 | "def demo(order):\n", 99 | " N = 5\n", 100 | " bulbs = [Bulb(i) for i in range(N)]\n", 101 | "\n", 102 | " # 幫我在這邊讓第一顆燈泡接上電源,並且串接各個燈泡成為燈泡組\n", 103 | " # answer here\n", 104 | "\n", 105 | " for j in order:\n", 106 | " print('switch on: {0}'.format(j))\n", 107 | " bulbs[j].switch_on()\n", 108 | " light_status = ['*' if bulbs[i].is_light() else '_' for i in range(N)]\n", 109 | " print('bulbs: {0}'.format(' '.join(light_status)))\n", 110 | "\n", 111 | "\n", 112 | "demo([2, 0, 3, 1, 4])\n", 113 | "\n", 114 | "### Correct Output\n", 115 | "# switch on: 2\n", 116 | "# bulbs: _ _ _ _ _\n", 117 | "# switch on: 0\n", 118 | "# bulbs: * _ _ _ _\n", 119 | "# switch on: 3\n", 120 | "# bulbs: * _ _ _ _\n", 121 | "# switch on: 1\n", 122 | "# bulbs: * * * * _\n", 123 | "# switch on: 4" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "## Numpy tutorial\n", 131 | "\n", 132 | "參考:\n", 133 | "https://www.ycc.idv.tw/python-play-with-data_2.html \n", 134 | "https://www.ycc.idv.tw/python-play-with-data_3.html\n", 135 | "\n", 136 | "\n", 137 | "### ndarray\n", 138 | "Numpy最重要的元素就是ndarray,它是N-Dimensional Array的縮寫,在Numpy裡,dimesions被稱為axes,而axes的數量被稱為rank,axes是一個重要的概念,了解這個概念基本上就把Numpy搞懂一半以上了。\n", 139 | "\n", 140 | "先來建立一個簡單的ndarray" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "A = np.array(\n", 150 | " [\n", 151 | " [\n", 152 | " [1,2,3], [4,5,6]\n", 153 | " ],\n", 154 | " [\n", 155 | " [7,8,9], [10,11,12]\n", 156 | " ],\n", 157 | " ]\n", 158 | ")\n", 159 | "A" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | 
"source": [ 168 | "A.shape" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "下面這張圖可以幫助大家理解\n", 176 | "![](http://www.ycc.idv.tw/media/PlayDataWithPython/ndarray_axis.png)" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "np.sum(A, axis=None) # axis為None的時候則加總所有元素" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "np.sum(A, axis=0)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "np.sum(A, axis=1)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "np.sum(A, axis=2)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "### reshape" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": {}, 226 | "outputs": [], 227 | "source": [ 228 | "B = np.array([[1,2],[3,4],[5,6]])\n", 229 | "B.shape" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "`(3, 2)`這樣的shape我們就一點都不意外了,axis=0有三個元素,而axis=1有兩個元素。shape可以直接改,如果數量恰當的話就會自動重組。" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "B.shape = (2,1,3)\n", 246 | "B.shape" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "B" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "axis=0有兩個元素,axis=1有一個元素,axis=2有三個元素。\n", 263 | "\n", 264 | "同樣的概念也可以用在取出單一元素上。" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "B[1, 0, 1]" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "B[0, 0, 2]" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "在axis=0上選第二個元素(1),在axis=1上選第一個元素(0),在axis=2上選第二個元素(1),所以選出來的元素就是5啦!\n", 290 | "\n", 291 | "### dtype\n", 292 | "\n", 293 | "ndarray有其資料型別,這個資料型別就稱為dtype,有哪些內建的資料型別呢?我們可以透過numpy的內建資料來查看。" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "np.sctypes" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "有複數、浮點數、整數,另外每個資料型別還可以由資料的儲存容量大小來區分,例如:numpy.int32就代表是容量為32bits的整數。我們可以在設置ndarray的時候事先強迫設成某資料型別。" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": {}, 316 | "outputs": [], 317 | "source": [ 318 | "t1 = np.array([1, 2, 3], dtype='int32')\n", 319 | "t1" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "t1.dtype" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": {}, 335 | "outputs": [], 336 | "source": [ 337 | "t2 = np.array([1, 2, 3], dtype='float64')\n", 338 | "t2" 339 | ] 340 | }, 341 | { 342 | 
"cell_type": "code", 343 | "execution_count": null, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "t2.dtype" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "### Numpy的矩陣運算\n", 355 | "\n", 356 | "有了ndarray就可以作矩陣的運算了,矩陣運算有兩種系統,一種是element-wise(元素方面) operation,一種是matrix operation。\n", 357 | "\n", 358 | "這樣講好像很抽象,我來解釋一下,element-wise operation就是每個元素獨立運算,例如,以下例子就是element-wise的相加。" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 368 | "B = np.array([[5, 0], [0, 0]], dtype='float64')\n", 369 | "A + B " 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "A和B矩陣中同樣位置的元素相加,再放到新的矩陣中,這一種操作就叫做element-wise operation。\n", 377 | "\n", 378 | "在numpy中如果沒有特別指定,所有的運算都是這類的運算,我們來看一下減、乘和除。" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [ 387 | "A - B" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "A * B" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": {}, 403 | "outputs": [], 404 | "source": [ 405 | "B / A" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "那我如果想要作矩陣操作(matrix operation)呢?譬如說矩陣內積," 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": null, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "np.dot(A, B)" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "還有更多的矩陣操作,\n", 429 | "\n", 430 | "矩陣轉置" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": null, 436 | "metadata": {}, 437 | "outputs": [], 438 | "source": [ 439 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 440 | "A.T" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "垂直方向合併" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [ 456 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 457 | "B = np.array([[5, 0], [0, 0]], dtype='float64')\n", 458 | "V = np.vstack((A, B))\n", 459 | "V" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "水平方向合併" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "H = np.hstack((A, B))\n", 476 | "H" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "## Machine Learning tutorial\n", 484 | "\n", 485 | "* loss function\n", 486 | " * MSE (Mean Square Error)\n", 487 | " * Cross-Entropy Loss\n", 488 | "* optimization\n", 489 | " * grandient descent\n", 490 | " * back propagation\n", 491 | "* overfitting\n", 492 | " * validation\n", 493 | " * regularization" 494 | ] 495 | }, 496 | { 497 | "cell_type": "markdown", 498 | "metadata": {}, 499 | "source": [ 500 | "## 01_Simple_Logistic_Classification_on_MNIST" 501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": 2, 506 | "metadata": {}, 507 | "outputs": [], 508 | "source": [ 509 | "fashion_mnist = 
keras.datasets.fashion_mnist\n", 510 | "\n", 511 | "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "圖片是由28x28的NumPy arrays所構成,每個pixel值落在0到255之間。Labels則是有整數所構成,範圍從0到9,分別代表以下類別:\n", 519 | "\n", 520 | "\n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | "
| Label | Class |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
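For the missions below it helps to have this mapping available in code; a small list in the same order as the table is enough (a convenience added here for illustration, not something the loader returns):

```python
# Label-to-name mapping for Fashion-MNIST, in the same order as the table above.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[7])  # Sneaker
```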
" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "### Mission\n", 573 | "\n", 574 | "請問 `(train_images, train_labels), (test_images, test_labels)` 中的各個變數它們的 `shape` 各是? 請問train部分的圖片有幾張?test部份的圖片有幾張?" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": null, 580 | "metadata": {}, 581 | "outputs": [], 582 | "source": [] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "### Mission\n", 589 | "麻煩幫我從 `(train_images, train_labels)` 中隨便印出一張Sneaker和Coat的圖片,並且將相應的Label數字給顯示出來。" 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": null, 595 | "metadata": {}, 596 | "outputs": [], 597 | "source": [] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "做一些資料的前處理" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "execution_count": 3, 609 | "metadata": {}, 610 | "outputs": [], 611 | "source": [ 612 | "from sklearn.model_selection import train_test_split\n", 613 | "from sklearn.preprocessing import OneHotEncoder\n", 614 | "\n", 615 | "train_images.shape = (-1, 784)\n", 616 | "X_test = test_images.reshape((-1, 784))\n", 617 | "\n", 618 | "enc = OneHotEncoder(handle_unknown='ignore')\n", 619 | "enc.fit([[0, ], [1, ], [2, ], [3, ], [4, ], [5, ], [6, ], [7, ], [8, ], [9, ]])\n", 620 | "train_labels = enc.transform(train_labels.reshape((-1, 1))).toarray()\n", 621 | "y_test = enc.transform(test_labels.reshape((-1, 1))).toarray()\n", 622 | "\n", 623 | "X_train, X_valid, y_train, y_valid = train_test_split(train_images, train_labels, test_size=0.2)" 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 6, 629 | "metadata": {}, 630 | "outputs": [ 631 | { 632 | "data": { 633 | "text/plain": [ 634 | "((48000, 784),\n", 635 | " (12000, 784),\n", 636 | " (10000, 784),\n", 637 | " (48000, 10),\n", 638 | " (12000, 10),\n", 639 | " (10000, 10))" 640 | ] 641 | }, 642 | "execution_count": 6, 643 | "metadata": {}, 644 | "output_type": "execute_result" 645 | } 646 | ], 647 | "source": [ 648 | "X_train.shape, X_valid.shape, X_test.shape, y_train.shape, y_valid.shape, y_test.shape" 649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": null, 654 | "metadata": {}, 655 | "outputs": [], 656 | "source": [ 657 | "y_train[6]" 658 | ] 659 | }, 660 | { 661 | "cell_type": "markdown", 662 | "metadata": {}, 663 | "source": [ 664 | "### Mission\n", 665 | "\n", 666 | "麻煩幫我使用 `(X_train, y_train)` 去訓練一個 Simple Logistic Classification,並且使用 `(X_valid, y_valid)` 去作validation,最後用 `(test_images, test_labels)` 來test出它的精確度。" 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": null, 672 | "metadata": {}, 673 | "outputs": [], 674 | "source": [] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": {}, 679 | "source": [ 680 | "## Tensorflow補充資訊\n", 681 | "\n", 682 | "loss\n", 683 | "https://www.tensorflow.org/api_docs/python/tf/losses\n", 684 | "\n", 685 | "optimizer\n", 686 | "https://www.tensorflow.org/api_docs/python/tf/train\n", 687 | "* search XXXOptimizer\n", 688 | "\n", 689 | "dtype\n", 690 | "https://www.tensorflow.org/api_docs/python/tf/dtypes/DType\n", 691 | "\n", 692 | "math\n", 693 | "https://www.tensorflow.org/api_docs/python/tf/math\n", 694 | "\n", 695 | "nn\n", 696 | "https://www.tensorflow.org/api_docs/python/tf/nn\n", 697 | "\n", 698 | "layers\n", 699 | "https://www.tensorflow.org/api_docs/python/tf/layers" 700 | ] 701 | } 702 | ], 703 | "metadata": { 704 | 
"kernelspec": { 705 | "display_name": "Python 3", 706 | "language": "python", 707 | "name": "python3" 708 | }, 709 | "language_info": { 710 | "codemirror_mode": { 711 | "name": "ipython", 712 | "version": 3 713 | }, 714 | "file_extension": ".py", 715 | "mimetype": "text/x-python", 716 | "name": "python", 717 | "nbconvert_exporter": "python", 718 | "pygments_lexer": "ipython3", 719 | "version": "3.6.5" 720 | } 721 | }, 722 | "nbformat": 4, 723 | "nbformat_minor": 2 724 | } 725 | -------------------------------------------------------------------------------- /tutorial/tensorflow_workshop_0630_ans.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Tensorflow Workshop (2019/6/30)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import tensorflow as tf\n", 17 | "from tensorflow import keras\n", 18 | "import numpy as np\n", 19 | "import matplotlib.pyplot as plt" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## python tutorial\n", 27 | "\n", 28 | "參考:https://www.w3schools.com/python/python_lists.asp\n", 29 | "\n", 30 | "* print: \n", 31 | " * `print('Hello!')`\n", 32 | " * `print('Art: %5d, Price: %8.2f' % (453, 59.058))`\n", 33 | " * `print('Art: {0:5d}, Price: {1:8.2f}'.format(453, 59.058))`\n", 34 | " * 參考:https://www.python-course.eu/python3_formatted_output.php\n", 35 | "* list\n", 36 | "* tuples\n", 37 | "* dictionaries\n", 38 | "* if ... elif ... else\n", 39 | "* while loop\n", 40 | "* for loop\n", 41 | "* function\n", 42 | "* class\n", 43 | "\n", 44 | "\n", 45 | "### Mission\n", 46 | "\n", 47 | "假設今天有一燈泡組,第一顆燈泡會接上電源,然後第一顆燈泡(序號為0)接上第二顆燈泡(序號為1),第二顆燈泡(序號為1)接上第三顆燈泡(序號為2),依序連接,總共有5顆燈泡。另外,每顆燈泡都有一個開關,開關打開代表導通,燈泡就有機會亮,但還需要兩個額外條件:第一,這顆燈泡已接上電源,第二,前一顆燈泡亮了(已導通),所以如果依照順序 `[2, 0, 3, 1, 4]` 的開啟燈泡,則燈泡組會有 `3` 階段的亮法,如下所示:\n", 48 | "\n", 49 | "```\n", 50 | "switch on: 2\n", 51 | "bulbs: _ _ _ _ _\n", 52 | "switch on: 0\n", 53 | "bulbs: * _ _ _ _\n", 54 | "switch on: 3\n", 55 | "bulbs: * _ _ _ _\n", 56 | "switch on: 1\n", 57 | "bulbs: * * * * _\n", 58 | "switch on: 4\n", 59 | "bulbs: * * * * *\n", 60 | "```\n", 61 | "\n", 62 | "請大家幫我利用以下的code把這個demo實作出來。" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "class Bulb:\n", 72 | " def __init__(self, name):\n", 73 | " self.name = name\n", 74 | "\n", 75 | " self._connect_source = False\n", 76 | " self._switch_on = False\n", 77 | " self._prev = None\n", 78 | " self._next = None\n", 79 | " \n", 80 | " def append(self, next_bulb):\n", 81 | " self._next = next_bulb\n", 82 | " next_bulb._prev = self\n", 83 | " \n", 84 | " def connect_source(self):\n", 85 | " self._connect_source = True\n", 86 | "\n", 87 | " def switch_on(self):\n", 88 | " self._switch_on = True\n", 89 | " \n", 90 | " def is_light(self):\n", 91 | " # 幫我在這邊實作出兩種情形要回傳True(亮)\n", 92 | " # 第一種情況是,燈泡有接電源且開關打開\n", 93 | " # 第二種情況是,前面接的燈泡有亮且開關打開\n", 94 | " # hint: 記得檢查上一個燈泡是否存在\n", 95 | " if self._switch_on:\n", 96 | " if self._connect_source:\n", 97 | " return True\n", 98 | " elif self._prev and self._prev.is_light():\n", 99 | " return True\n", 100 | " return False\n", 101 | "\n", 102 | " \n", 103 | "def demo(order):\n", 104 | " N = 5\n", 105 | " bulbs = [Bulb(i) for i in range(N)]\n", 106 | "\n", 107 | " # 幫我在這邊讓第一顆燈泡接上電源,並且串接各個燈泡成為燈泡組\n", 108 | " 
bulbs[0].connect_source()\n", 109 | " for i in range(0, N-1):\n", 110 | " bulbs[i].append(bulbs[i+1])\n", 111 | "\n", 112 | " for j in order:\n", 113 | " print('switch on: {0}'.format(j))\n", 114 | " bulbs[j].switch_on()\n", 115 | " light_status = ['*' if bulbs[i].is_light() else '_' for i in range(N)]\n", 116 | " print('bulbs: {0}'.format(' '.join(light_status)))\n", 117 | "\n", 118 | "\n", 119 | "demo([2, 0, 3, 1, 4])\n", 120 | "\n", 121 | "### Correct Output\n", 122 | "# switch on: 2\n", 123 | "# bulbs: _ _ _ _ _\n", 124 | "# switch on: 0\n", 125 | "# bulbs: * _ _ _ _\n", 126 | "# switch on: 3\n", 127 | "# bulbs: * _ _ _ _\n", 128 | "# switch on: 1\n", 129 | "# bulbs: * * * * _\n", 130 | "# switch on: 4" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Numpy tutorial\n", 138 | "\n", 139 | "參考:\n", 140 | "https://www.ycc.idv.tw/python-play-with-data_2.html \n", 141 | "https://www.ycc.idv.tw/python-play-with-data_3.html\n", 142 | "\n", 143 | "\n", 144 | "### ndarray\n", 145 | "Numpy最重要的元素就是ndarray,它是N-Dimensional Array的縮寫,在Numpy裡,dimesions被稱為axes,而axes的數量被稱為rank,axes是一個重要的概念,了解這個概念基本上就把Numpy搞懂一半以上了。\n", 146 | "\n", 147 | "先來建立一個簡單的ndarray" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "A = np.array(\n", 157 | " [\n", 158 | " [\n", 159 | " [1,2,3], [4,5,6]\n", 160 | " ],\n", 161 | " [\n", 162 | " [7,8,9], [10,11,12]\n", 163 | " ],\n", 164 | " ]\n", 165 | ")\n", 166 | "A" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "A.shape" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "下面這張圖可以幫助大家理解\n", 183 | "![](http://www.ycc.idv.tw/media/PlayDataWithPython/ndarray_axis.png)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": null, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "np.sum(A, axis=None) # axis為None的時候則加總所有元素" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [ 201 | "np.sum(A, axis=0)" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "np.sum(A, axis=1)" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "np.sum(A, axis=2)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "### reshape" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "B = np.array([[1,2],[3,4],[5,6]])\n", 236 | "B.shape" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "`(3, 2)`這樣的shape我們就一點都不意外了,axis=0有三個元素,而axis=1有兩個元素。shape可以直接改,如果數量恰當的話就會自動重組。" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": {}, 250 | "outputs": [], 251 | "source": [ 252 | "B.shape = (2,1,3)\n", 253 | "B.shape" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "B" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | 
"source": [ 269 | "axis=0有兩個元素,axis=1有一個元素,axis=2有三個元素。\n", 270 | "\n", 271 | "同樣的概念也可以用在取出單一元素上。" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "B[1, 0, 1]" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "B[0, 0, 2]" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "在axis=0上選第二個元素(1),在axis=1上選第一個元素(0),在axis=2上選第二個元素(1),所以選出來的元素就是5啦!\n", 297 | "\n", 298 | "### dtype\n", 299 | "\n", 300 | "ndarray有其資料型別,這個資料型別就稱為dtype,有哪些內建的資料型別呢?我們可以透過numpy的內建資料來查看。" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "np.sctypes" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "有複數、浮點數、整數,另外每個資料型別還可以由資料的儲存容量大小來區分,例如:numpy.int32就代表是容量為32bits的整數。我們可以在設置ndarray的時候事先強迫設成某資料型別。" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": null, 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "t1 = np.array([1, 2, 3], dtype='int32')\n", 326 | "t1" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "t1.dtype" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "t2 = np.array([1, 2, 3], dtype='float64')\n", 345 | "t2" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": {}, 352 | "outputs": [], 353 | "source": [ 354 | "t2.dtype" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "### Numpy的矩陣運算\n", 362 | "\n", 363 | "有了ndarray就可以作矩陣的運算了,矩陣運算有兩種系統,一種是element-wise(元素方面) operation,一種是matrix operation。\n", 364 | "\n", 365 | "這樣講好像很抽象,我來解釋一下,element-wise operation就是每個元素獨立運算,例如,以下例子就是element-wise的相加。" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": {}, 372 | "outputs": [], 373 | "source": [ 374 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 375 | "B = np.array([[5, 0], [0, 0]], dtype='float64')\n", 376 | "A + B " 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "A和B矩陣中同樣位置的元素相加,再放到新的矩陣中,這一種操作就叫做element-wise operation。\n", 384 | "\n", 385 | "在numpy中如果沒有特別指定,所有的運算都是這類的運算,我們來看一下減、乘和除。" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": {}, 392 | "outputs": [], 393 | "source": [ 394 | "A - B" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": null, 400 | "metadata": {}, 401 | "outputs": [], 402 | "source": [ 403 | "A * B" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "B / A" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "那我如果想要作矩陣操作(matrix operation)呢?譬如說矩陣內積," 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": {}, 426 | "outputs": [], 427 | "source": [ 428 | "np.dot(A, B)" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "還有更多的矩陣操作,\n", 436 | "\n", 
437 | "矩陣轉置" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 447 | "A.T" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "垂直方向合併" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": null, 460 | "metadata": {}, 461 | "outputs": [], 462 | "source": [ 463 | "A = np.array([[1, 2], [3, 4]], dtype='float64')\n", 464 | "B = np.array([[5, 0], [0, 0]], dtype='float64')\n", 465 | "V = np.vstack((A, B))\n", 466 | "V" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "水平方向合併" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "H = np.hstack((A, B))\n", 483 | "H" 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": {}, 489 | "source": [ 490 | "## Machine Learning tutorial\n", 491 | "\n", 492 | "* loss function\n", 493 | " * MSE (Mean Square Error)\n", 494 | " * Cross-Entropy Loss\n", 495 | "* optimization\n", 496 | " * grandient descent\n", 497 | " * back propagation\n", 498 | "* overfitting\n", 499 | " * validation\n", 500 | " * regularization" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "## 01_Simple_Logistic_Classification_on_MNIST" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": 2, 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "fashion_mnist = keras.datasets.fashion_mnist\n", 517 | "\n", 518 | "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "圖片是由28x28的NumPy arrays所構成,每個pixel值落在0到255之間。Labels則是有整數所構成,範圍從0到9,分別代表以下類別:\n", 526 | "\n", 527 | "\n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | "
| Label | Class       |
| ----- | ----------- |
| 0     | T-shirt/top |
| 1     | Trouser     |
| 2     | Pullover    |
| 3     | Dress       |
| 4     | Coat        |
| 5     | Sandal      |
| 6     | Shirt       |
| 7     | Sneaker     |
| 8     | Bag         |
| 9     | Ankle boot  |
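A minimal sketch of how this label-to-class mapping can be used, assuming `train_labels` still holds the raw integer labels returned by `fashion_mnist.load_data()` (i.e. before the one-hot encoding applied later in this notebook); the `class_names` list is an illustrative helper and is not part of the original notebook:

```python
# Illustrative helper (assumption, not in the original notebook):
# look up the human-readable class name for an integer Fashion-MNIST label.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Print the label id of the first training image together with its class name.
print(train_labels[0], class_names[train_labels[0]])
```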
" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "### Mission\n", 580 | "\n", 581 | "請問 `(train_images, train_labels), (test_images, test_labels)` 中的各個變數它們的 `shape` 各是? 請問train部分的圖片有幾張?test部份的圖片有幾張?" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "metadata": {}, 588 | "outputs": [], 589 | "source": [ 590 | "train_images.shape" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": null, 596 | "metadata": {}, 597 | "outputs": [], 598 | "source": [ 599 | "train_labels.shape" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": null, 605 | "metadata": {}, 606 | "outputs": [], 607 | "source": [ 608 | "test_images.shape" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "metadata": {}, 615 | "outputs": [], 616 | "source": [ 617 | "test_labels.shape" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "### Mission\n", 625 | "麻煩幫我從 `(train_images, train_labels)` 中隨便印出一張Sneaker和Coat的圖片,並且將相應的Label數字給顯示出來。" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": null, 631 | "metadata": {}, 632 | "outputs": [], 633 | "source": [ 634 | "def plot_fatten_img(ndarr):\n", 635 | " img = ndarr.copy()\n", 636 | " plt.imshow(img, cmap='gray')\n", 637 | " plt.show()" 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": null, 643 | "metadata": {}, 644 | "outputs": [], 645 | "source": [ 646 | "plot_fatten_img(train_images[19, :, :])" 647 | ] 648 | }, 649 | { 650 | "cell_type": "code", 651 | "execution_count": null, 652 | "metadata": {}, 653 | "outputs": [], 654 | "source": [ 655 | "train_labels[19]" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "metadata": {}, 662 | "outputs": [], 663 | "source": [ 664 | "plot_fatten_img(train_images[6, :, :])" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": null, 670 | "metadata": {}, 671 | "outputs": [], 672 | "source": [ 673 | "train_labels[6]" 674 | ] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": {}, 679 | "source": [ 680 | "做一些資料的前處理" 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "execution_count": 3, 686 | "metadata": {}, 687 | "outputs": [], 688 | "source": [ 689 | "from sklearn.model_selection import train_test_split\n", 690 | "from sklearn.preprocessing import OneHotEncoder\n", 691 | "\n", 692 | "train_images.shape = (-1, 784)\n", 693 | "X_test = test_images.reshape((-1, 784))\n", 694 | "\n", 695 | "enc = OneHotEncoder(handle_unknown='ignore')\n", 696 | "enc.fit([[0, ], [1, ], [2, ], [3, ], [4, ], [5, ], [6, ], [7, ], [8, ], [9, ]])\n", 697 | "train_labels = enc.transform(train_labels.reshape((-1, 1))).toarray()\n", 698 | "y_test = enc.transform(test_labels.reshape((-1, 1))).toarray()\n", 699 | "\n", 700 | "X_train, X_valid, y_train, y_valid = train_test_split(train_images, train_labels, test_size=0.2)" 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": 6, 706 | "metadata": {}, 707 | "outputs": [ 708 | { 709 | "data": { 710 | "text/plain": [ 711 | "((48000, 784),\n", 712 | " (12000, 784),\n", 713 | " (10000, 784),\n", 714 | " (48000, 10),\n", 715 | " (12000, 10),\n", 716 | " (10000, 10))" 717 | ] 718 | }, 719 | "execution_count": 6, 720 | "metadata": {}, 721 | "output_type": "execute_result" 722 | } 723 | ], 724 | "source": [ 725 | "X_train.shape, X_valid.shape, 
X_test.shape, y_train.shape, y_valid.shape, y_test.shape" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": {}, 732 | "outputs": [], 733 | "source": [ 734 | "y_train[6]" 735 | ] 736 | }, 737 | { 738 | "cell_type": "markdown", 739 | "metadata": {}, 740 | "source": [ 741 | "### Mission\n", 742 | "\n", 743 | "麻煩幫我使用 `(X_train, y_train)` 去訓練一個 Simple Logistic Classification,並且使用 `(X_valid, y_valid)` 去作validation,最後用 `(test_images, test_labels)` 來test出它的精確度。" 744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": 7, 749 | "metadata": {}, 750 | "outputs": [], 751 | "source": [ 752 | "class SimpleLogisticClassification:\n", 753 | "\n", 754 | " def __init__(self, n_features, n_labels, learning_rate=0.5):\n", 755 | " self.n_features = n_features\n", 756 | " self.n_labels = n_labels\n", 757 | "\n", 758 | " self.weights = None\n", 759 | " self.biases = None\n", 760 | "\n", 761 | " self.graph = tf.Graph() # initialize new graph\n", 762 | " self.build(learning_rate) # building graph\n", 763 | " self.sess = tf.Session(graph=self.graph) # create session by the graph\n", 764 | "\n", 765 | " def build(self, learning_rate):\n", 766 | " # Building Graph\n", 767 | " with self.graph.as_default():\n", 768 | " ### Input\n", 769 | " self.train_features = tf.placeholder(tf.float32, shape=(None, self.n_features))\n", 770 | " self.train_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels))\n", 771 | "\n", 772 | " ### Optimalization\n", 773 | " # build neurel network structure and get their predictions and loss\n", 774 | " self.y_, self.loss = self.structure(features=self.train_features,\n", 775 | " labels=self.train_labels)\n", 776 | " # define training operation\n", 777 | " self.train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(self.loss)\n", 778 | "\n", 779 | " ### Prediction\n", 780 | " self.new_features = tf.placeholder(tf.float32, shape=(None, self.n_features))\n", 781 | " self.new_labels = tf.placeholder(tf.int32, shape=(None, self.n_labels))\n", 782 | " self.new_y_, self.new_loss = self.structure(features=self.new_features,\n", 783 | " labels=self.new_labels)\n", 784 | "\n", 785 | " ### Initialization\n", 786 | " self.init_op = tf.global_variables_initializer()\n", 787 | "\n", 788 | " def structure(self, features, labels):\n", 789 | " # build neurel network structure and return their predictions and loss\n", 790 | " ### Variable\n", 791 | " if (not self.weights) or (not self.biases):\n", 792 | " self.weights = {\n", 793 | " 'fc1': tf.Variable(tf.truncated_normal(shape=(self.n_features, self.n_labels))),\n", 794 | " }\n", 795 | " self.biases = {\n", 796 | " 'fc1': tf.Variable(tf.zeros(shape=(self.n_labels))),\n", 797 | " }\n", 798 | "\n", 799 | " ### Structure\n", 800 | " # one fully connected layer\n", 801 | " logits = self.get_dense_layer(features, self.weights['fc1'], self.biases['fc1'])\n", 802 | "\n", 803 | " # predictions\n", 804 | " y_ = tf.nn.softmax(logits)\n", 805 | "\n", 806 | " # loss: softmax cross entropy\n", 807 | " loss = tf.reduce_mean(\n", 808 | " tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))\n", 809 | "\n", 810 | " return (y_, loss)\n", 811 | "\n", 812 | " def get_dense_layer(self, input_layer, weight, bias, activation=None):\n", 813 | " # fully connected layer\n", 814 | " x = tf.add(tf.matmul(input_layer, weight), bias)\n", 815 | " if activation:\n", 816 | " x = activation(x)\n", 817 | " return x\n", 818 | "\n", 819 | " def fit(self, X, y, epochs=10, 
validation_data=None, test_data=None):\n", 820 | " X = self._check_array(X)\n", 821 | " y = self._check_array(y)\n", 822 | "\n", 823 | " self.sess.run(self.init_op)\n", 824 | " for epoch in range(epochs):\n", 825 | " print('Epoch %2d/%2d: ' % (epoch+1, epochs))\n", 826 | "\n", 827 | " # fully gradient descent\n", 828 | " feed_dict = {self.train_features: X, self.train_labels: y}\n", 829 | " self.sess.run(self.train_op, feed_dict=feed_dict)\n", 830 | "\n", 831 | " # evaluate at the end of this epoch\n", 832 | " y_ = self.predict(X)\n", 833 | " train_loss = self.evaluate(X, y)\n", 834 | " train_acc = self.accuracy(y_, y)\n", 835 | " msg = ' loss = %8.4f, acc = %3.2f%%' % (train_loss, train_acc*100)\n", 836 | "\n", 837 | " if validation_data:\n", 838 | " val_loss = self.evaluate(validation_data[0], validation_data[1])\n", 839 | " val_acc = self.accuracy(self.predict(validation_data[0]), validation_data[1])\n", 840 | " msg += ', val_loss = %8.4f, val_acc = %3.2f%%' % (val_loss, val_acc*100)\n", 841 | "\n", 842 | " print(msg)\n", 843 | "\n", 844 | " if test_data:\n", 845 | " test_acc = self.accuracy(self.predict(test_data[0]), test_data[1])\n", 846 | " print('test_acc = %3.2f%%' % (test_acc*100))\n", 847 | "\n", 848 | " def accuracy(self, predictions, labels):\n", 849 | " return (np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0])\n", 850 | "\n", 851 | " def predict(self, X):\n", 852 | " X = self._check_array(X)\n", 853 | " return self.sess.run(self.new_y_, feed_dict={self.new_features: X})\n", 854 | "\n", 855 | " def evaluate(self, X, y):\n", 856 | " X = self._check_array(X)\n", 857 | " y = self._check_array(y)\n", 858 | " return self.sess.run(self.new_loss, feed_dict={self.new_features: X, self.new_labels: y})\n", 859 | "\n", 860 | " def _check_array(self, ndarray):\n", 861 | " ndarray = np.array(ndarray)\n", 862 | " if len(ndarray.shape) == 1:\n", 863 | " ndarray = np.reshape(ndarray, (1, ndarray.shape[0]))\n", 864 | " return ndarray" 865 | ] 866 | }, 867 | { 868 | "cell_type": "code", 869 | "execution_count": 8, 870 | "metadata": {}, 871 | "outputs": [ 872 | { 873 | "name": "stdout", 874 | "output_type": "stream", 875 | "text": [ 876 | "Epoch 1/10: \n", 877 | " loss = 197411.9688, acc = 18.82%, val_loss = 203962.7031, val_acc = 19.03%\n", 878 | "Epoch 2/10: \n", 879 | " loss = 519169.7812, acc = 36.41%, val_loss = 533264.5625, val_acc = 36.18%\n", 880 | "Epoch 3/10: \n", 881 | " loss = 717693.1875, acc = 20.36%, val_loss = 725370.3125, val_acc = 20.16%\n", 882 | "Epoch 4/10: \n", 883 | " loss = 780948.0625, acc = 36.44%, val_loss = 792466.9375, val_acc = 36.39%\n", 884 | "Epoch 5/10: \n", 885 | " loss = 787304.8750, acc = 33.32%, val_loss = 792811.4375, val_acc = 33.44%\n", 886 | "Epoch 6/10: \n", 887 | " loss = 594957.0625, acc = 33.63%, val_loss = 602511.2500, val_acc = 33.96%\n", 888 | "Epoch 7/10: \n", 889 | " loss = 592702.2500, acc = 27.09%, val_loss = 601402.1250, val_acc = 27.38%\n", 890 | "Epoch 8/10: \n", 891 | " loss = 661588.1250, acc = 26.63%, val_loss = 662238.0000, val_acc = 26.66%\n", 892 | "Epoch 9/10: \n", 893 | " loss = 637350.1250, acc = 24.93%, val_loss = 642814.8750, val_acc = 24.39%\n", 894 | "Epoch 10/10: \n", 895 | " loss = 607272.1875, acc = 37.34%, val_loss = 611831.0625, val_acc = 36.69%\n", 896 | "test_acc = 36.91%\n" 897 | ] 898 | } 899 | ], 900 | "source": [ 901 | "model = SimpleLogisticClassification(n_features=28*28, n_labels=10, learning_rate= 0.5)\n", 902 | "model.fit(\n", 903 | " X=X_train,\n", 904 | " y=y_train,\n", 905 | " 
epochs=10,\n", 906 | " validation_data=(X_valid, y_valid),\n", 907 | " test_data=(X_test, y_test),\n", 908 | ")" 909 | ] 910 | }, 911 | { 912 | "cell_type": "markdown", 913 | "metadata": {}, 914 | "source": [ 915 | "## Tensorflow補充資訊\n", 916 | "\n", 917 | "loss\n", 918 | "https://www.tensorflow.org/api_docs/python/tf/losses\n", 919 | "\n", 920 | "optimizer\n", 921 | "https://www.tensorflow.org/api_docs/python/tf/train\n", 922 | "* search XXXOptimizer\n", 923 | "\n", 924 | "dtype\n", 925 | "https://www.tensorflow.org/api_docs/python/tf/dtypes/DType\n", 926 | "\n", 927 | "math\n", 928 | "https://www.tensorflow.org/api_docs/python/tf/math\n", 929 | "\n", 930 | "nn\n", 931 | "https://www.tensorflow.org/api_docs/python/tf/nn\n", 932 | "\n", 933 | "layers\n", 934 | "https://www.tensorflow.org/api_docs/python/tf/layers" 935 | ] 936 | }, 937 | { 938 | "cell_type": "code", 939 | "execution_count": null, 940 | "metadata": {}, 941 | "outputs": [], 942 | "source": [] 943 | } 944 | ], 945 | "metadata": { 946 | "kernelspec": { 947 | "display_name": "Python 3", 948 | "language": "python", 949 | "name": "python3" 950 | }, 951 | "language_info": { 952 | "codemirror_mode": { 953 | "name": "ipython", 954 | "version": 3 955 | }, 956 | "file_extension": ".py", 957 | "mimetype": "text/x-python", 958 | "name": "python", 959 | "nbconvert_exporter": "python", 960 | "pygments_lexer": "ipython3", 961 | "version": "3.5.2" 962 | } 963 | }, 964 | "nbformat": 4, 965 | "nbformat_minor": 2 966 | } 967 | --------------------------------------------------------------------------------