├── genetic ├── zombies │ ├── .gitignore │ ├── demo │ │ ├── webapp │ │ │ ├── public │ │ │ │ ├── images │ │ │ │ │ ├── boy.png │ │ │ │ │ ├── girl.png │ │ │ │ │ ├── bullet.png │ │ │ │ │ └── zombie.png │ │ │ │ ├── js │ │ │ │ │ ├── utils.js │ │ │ │ │ └── index.js │ │ │ │ └── css │ │ │ │ │ └── general.css │ │ │ └── index.html │ │ ├── README.md │ │ ├── package.json │ │ ├── app.yaml │ │ └── app.js │ ├── README.md │ ├── DEVELOPMENT.md │ ├── CONTRIBUTE.md │ ├── typedoc.json │ ├── package.json │ ├── tsconfig.json │ ├── webpack.config.js │ └── src │ │ └── index.ts └── introduction │ ├── simple_genetic_exemple.html │ └── js │ └── simple_genetic_example.js ├── images ├── algorithm_q_learning.png ├── q_learning_loss_with_target.png ├── optimal_action_value_function.png └── q_function_loss_without_target.png ├── README.md ├── rl ├── demo_gym.py ├── test_human.py ├── q_learning.py ├── dqn_use.py ├── grid_world_q_learning.py ├── sticks.py ├── GridWorld.ipynb ├── tictactoe.py ├── q_learning_nn.html ├── dqn.py ├── q_learning_visu.html ├── policy_gradient.py ├── policy_gradient_continuous_actions.py └── actor_critic.py ├── meta-learning ├── Neural Turing Machine.ipynb └── Meta Learning.ipynb ├── metacar └── index.html ├── tensorflow-js └── teachable_machine.html └── tensorflow ├── Eager Execution.ipynb └── TensorFlow MNIST tutorial.ipynb /genetic/zombies/.gitignore: -------------------------------------------------------------------------------- 1 | node_modules/* 2 | demo/dist/* 3 | demo/node_modules/* 4 | dist/* 5 | -------------------------------------------------------------------------------- /images/algorithm_q_learning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/images/algorithm_q_learning.png -------------------------------------------------------------------------------- /images/q_learning_loss_with_target.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/images/q_learning_loss_with_target.png -------------------------------------------------------------------------------- /images/optimal_action_value_function.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/images/optimal_action_value_function.png -------------------------------------------------------------------------------- /images/q_function_loss_without_target.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/images/q_function_loss_without_target.png -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/images/boy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/genetic/zombies/demo/webapp/public/images/boy.png -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/images/girl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/genetic/zombies/demo/webapp/public/images/girl.png -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/images/bullet.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/genetic/zombies/demo/webapp/public/images/bullet.png -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/images/zombie.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thibo73800/aihub/HEAD/genetic/zombies/demo/webapp/public/images/zombie.png -------------------------------------------------------------------------------- /genetic/zombies/demo/README.md: -------------------------------------------------------------------------------- 1 | # Metacar: Demo 2 | 3 | This is the demo website of the metacar project. You can use it as an example to learn how to used the environement. 4 | 5 | ## Install 6 | 7 | ``` 8 | npm install 9 | npm start 10 | ``` 11 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # aihub 2 | 3 | I use this repository for my Youtube channel where I share videos about Artificial Intelligence. The repository includes Machine Learning, Deep Learning, and Reinforcement learning's code. 4 | -------------------------------------------------------------------------------- /genetic/zombies/README.md: -------------------------------------------------------------------------------- 1 | ### Install 2 | 3 | ``` 4 | npm install 5 | ``` 6 | 7 | ### Develop 8 | 9 | ``` 10 | npm run watch 11 | ``` 12 | 13 | ### Build 14 | 15 | ``` 16 | npm run build 17 | ``` 18 | 19 | ### Run demo 20 | 21 | ``` 22 | cd demo 23 | npm run start 24 | ``` 25 | -------------------------------------------------------------------------------- /genetic/zombies/DEVELOPMENT.md: -------------------------------------------------------------------------------- 1 | # Metacar: Development 2 | 3 | ### Install 4 | 5 | ``` 6 | npm install 7 | ``` 8 | 9 | ### Develop 10 | 11 | ``` 12 | npm run watch 13 | ``` 14 | 15 | ### Build 16 | 17 | ``` 18 | npm run build 19 | ``` 20 | 21 | ### Build the docs 22 | 23 | ``` 24 | npm run docs 25 | ``` 26 | -------------------------------------------------------------------------------- /rl/demo_gym.py: -------------------------------------------------------------------------------- 1 | import gym 2 | import time 3 | 4 | env = gym.make('Breakout-v0') 5 | env.reset() 6 | 7 | for _ in range(1000): 8 | env.render() 9 | observation, reward, done, info = env.step(env.action_space.sample()) 10 | if reward != 0: 11 | print("Reward", reward) 12 | time.sleep(0.1) 13 | if done: 14 | break 15 | -------------------------------------------------------------------------------- /genetic/zombies/demo/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "metacar-demo", 3 | "version": "0.0.1", 4 | "description": "", 5 | "main": "app.js", 6 | "scripts": { 7 | "start": "node app.js" 8 | }, 9 | "author": "Thibault Neveu", 10 | "license": "ISC", 11 | "dependencies": { 12 | "cors": "^2.8.4", 13 | "express": "^4.16.3", 14 | "pixi.js": "^4.8.1" 15 | } 16 | } 17 | -------------------------------------------------------------------------------- /genetic/zombies/CONTRIBUTE.md: -------------------------------------------------------------------------------- 1 | # Metacar: Contribute 2 | 3 | If you want to be part of the project, whether to implement features in 
the environment or demonstrate algorithms, feel free to join the [slack channel](https://join.slack.com/t/metacar/shared_invite/enQtMzgyODI4NDMzMDc0LTY1MjIwNzk1MTAzOTBiZjJlOGUwM2YyYjA3MzBmNjQyNjUyMDZkOGNkYmU0MmUyYzUzNGRhNGJhZDE1M2EzNzM) to ask questions and talk about all your fantastic ideas! 4 | -------------------------------------------------------------------------------- /genetic/zombies/typedoc.json: -------------------------------------------------------------------------------- 1 | { 2 | "mode": "modules", 3 | "out": "demo/docs", 4 | "src": "src/index.ts", 5 | "theme": "default", 6 | "ignoreCompilerErrors": "true", 7 | "experimentalDecorators": "true", 8 | "emitDecoratorMetadata": "true", 9 | "target": "ES5", 10 | "moduleResolution": "node", 11 | "preserveConstEnums": "true", 12 | "stripInternal": "true", 13 | "suppressExcessPropertyErrors": "true", 14 | "suppressImplicitAnyIndexErrors": "true", 15 | "module": "commonjs", 16 | "hideGenerator": true, 17 | "excludePrivate": true 18 | } -------------------------------------------------------------------------------- /meta-learning/Neural Turing Machine.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Neural Turing Machine" 8 | ] 9 | } 10 | ], 11 | "metadata": { 12 | "kernelspec": { 13 | "display_name": "Python 3", 14 | "language": "python", 15 | "name": "python3" 16 | }, 17 | "language_info": { 18 | "codemirror_mode": { 19 | "name": "ipython", 20 | "version": 3 21 | }, 22 | "file_extension": ".py", 23 | "mimetype": "text/x-python", 24 | "name": "python", 25 | "nbconvert_exporter": "python", 26 | "pygments_lexer": "ipython3", 27 | "version": "3.5.2" 28 | } 29 | }, 30 | "nbformat": 4, 31 | "nbformat_minor": 2 32 | } 33 | -------------------------------------------------------------------------------- /genetic/zombies/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "zombie", 3 | "description": "A reinforcement learning environment for self-driving cars in the browser.", 4 | "version": "0.1.1", 5 | "main": "dist/geneticzombie.min.js", 6 | "author": "Thibault Neveu", 7 | "license": "MIT", 8 | "devDependencies": { 9 | "ts-loader": "^4.3.1", 10 | "typedoc": "^0.11.1", 11 | "typescript": "^2.9.1", 12 | "webpack": "^4.10.2", 13 | "webpack-cli": "^3.0.2" 14 | }, 15 | "dependencies": { 16 | "@types/pixi.js": "^4.7.5", 17 | "pixi.js": "^4.8.0" 18 | }, 19 | "scripts": { 20 | "build": "./node_modules/.bin/webpack-cli --mode production", 21 | "build-dev": "./node_modules/.bin/webpack-cli --mode development", 22 | "watch": "./node_modules/.bin/webpack-cli --watch --mode development", 23 | "docs": "./node_modules/.bin/typedoc --options typedoc.json" 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /genetic/zombies/tsconfig.json: -------------------------------------------------------------------------------- 1 | { 2 | "compilerOptions": { 3 | "module": "ES2015", 4 | "moduleResolution": "node", 5 | "noImplicitAny": true, 6 | "sourceMap": true, 7 | "removeComments": true, 8 | "preserveConstEnums": true, 9 | "declaration": true, 10 | "target": "es5", 11 | "lib": ["es2015", "dom"], 12 | "outDir": "./dist-es6", 13 | "noUnusedLocals": false, 14 | "noImplicitReturns": true, 15 | "noImplicitThis": true, 16 | "alwaysStrict": true, 17 | "noUnusedParameters": false, 18 | "pretty": true, 19 | 
"noFallthroughCasesInSwitch": true, 20 | "allowUnreachableCode": false, 21 | "experimentalDecorators": true 22 | }, 23 | "include": [ 24 | "src/**/*" 25 | ], 26 | "exclude": [ 27 | "node_modules", 28 | "demo" 29 | ] 30 | } 31 | -------------------------------------------------------------------------------- /genetic/zombies/demo/app.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2017, Google, Inc. 2 | # Licensed under the Apache License, Version 2.0 (the "License"); 3 | # you may not use this file except in compliance with the License. 4 | # You may obtain a copy of the License at 5 | # 6 | # http://www.apache.org/licenses/LICENSE-2.0 7 | # 8 | # Unless required by applicable law or agreed to in writing, software 9 | # distributed under the License is distributed on an "AS IS" BASIS, 10 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 11 | # See the License for the specific language governing permissions and 12 | # limitations under the License. 13 | 14 | # [START app_yaml] 15 | runtime: nodejs 16 | env: flex 17 | 18 | skip_files: 19 | - yarn.lock 20 | 21 | # This sample incurs costs to run on the App Engine flexible environment. 22 | # The settings below are to reduce costs during testing and are not appropriate 23 | # for production use. For more information, see: 24 | # https://cloud.google.com/appengine/docs/flexible/nodejs/configuring-your-app-with-app-yaml 25 | manual_scaling: 26 | instances: 1 27 | resources: 28 | cpu: 1 29 | memory_gb: 0.5 30 | disk_size_gb: 10 31 | 32 | # [END app_yaml] 33 | -------------------------------------------------------------------------------- /genetic/zombies/webpack.config.js: -------------------------------------------------------------------------------- 1 | const path = require('path'); 2 | var webpack = require('webpack') 3 | 4 | var config = { 5 | entry: './src/index', 6 | module: { 7 | rules: [ 8 | { 9 | test: /\.ts$/, 10 | use: 'ts-loader', 11 | } 12 | ], 13 | }, 14 | resolve: { 15 | extensions: [ 16 | '.ts', 17 | ], 18 | }, 19 | externals: [ 20 | // Don't bundle pixi.js, assume it'll be included in the HTML via a script 21 | // tag, and made available in the global variable PIXI. 22 | {"pixi.js": "PIXI"} 23 | ] 24 | }; 25 | 26 | var packageConfig = Object.assign({}, config, { 27 | output: { 28 | filename: 'geneticzombie.min.js', 29 | path: path.resolve(__dirname, './dist'), 30 | library: 'geneticzombie', 31 | libraryTarget: 'window', 32 | libraryExport: 'default' 33 | } 34 | }); 35 | 36 | var demoConfig = Object.assign({}, config,{ 37 | output: { 38 | filename: 'geneticzombie.min.js', 39 | path: path.resolve(__dirname, './demo/dist'), 40 | library: 'geneticzombie', 41 | libraryTarget: 'window', 42 | libraryExport: 'default' 43 | } 44 | }); 45 | 46 | module.exports = [ 47 | packageConfig, demoConfig, 48 | ]; 49 | -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Genetic-Zombie: A survival experiment based on genetic algorithm 6 | 7 | 8 | 9 | 10 | 12 | 13 | 14 | 15 |
Genetic-Zombie
A survival experiment based on genetic algorithm
34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | -------------------------------------------------------------------------------- /genetic/introduction/simple_genetic_exemple.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 33 | Simple Genetic Exemple 34 | 35 | 36 | 37 |
  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.

  1.
  2.
  3.
  4.
54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | -------------------------------------------------------------------------------- /genetic/zombies/demo/app.js: -------------------------------------------------------------------------------- 1 | 'use strict'; 2 | 3 | var fs = require('fs'); 4 | const express = require('express'); 5 | var path = require("path"); 6 | var cors = require('cors'); 7 | 8 | function fromDir(startPath,filter){ 9 | var files_list = []; 10 | var files=fs.readdirSync(startPath); 11 | for(var i=0;i=0) { 15 | files_list.push(filename); 16 | }; 17 | }; 18 | return files_list; 19 | } 20 | 21 | function get_path(file){ 22 | return path.join(path.join(__dirname, "webapp/"), file); 23 | } 24 | 25 | const app = express(); 26 | app.use(cors()); 27 | 28 | app.use("/dist", express.static(path.join(__dirname, "dist/"))); 29 | app.use("/public", express.static(path.join(__dirname, "webapp/public/"))); 30 | app.use("/docs", express.static(path.join(__dirname, "docs/"))); 31 | 32 | app.get('/', (req, res) => { 33 | res.sendFile(get_path("index.html")); 34 | }); 35 | 36 | /** 37 | * Create all HTML routes 38 | * **/ 39 | const files = fromDir(path.join(__dirname, "webapp/"),'.html'); 40 | files.forEach(file => { 41 | let route = file.split("/"); 42 | route = route[route.length - 1]; 43 | 44 | console.log("Open route:", route, file); 45 | app.get('/'+route, (req, res) => { 46 | res.sendFile(file); 47 | }); 48 | }); 49 | 50 | // Start the server 51 | const PORT = process.env.PORT || 3000; 52 | app.listen(PORT, () => { 53 | console.log(`App listening on port ${PORT}`); 54 | console.log('Press Ctrl+C to quit.'); 55 | }); 56 | -------------------------------------------------------------------------------- /rl/test_human.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from __future__ import division, print_function 4 | 5 | import sys 6 | import numpy 7 | import gym 8 | import time 9 | from optparse import OptionParser 10 | 11 | import gym_minigrid 12 | 13 | def main(): 14 | # Load the gym environment 15 | env = gym.make("MiniGrid-Empty-6x6-v0") 16 | 17 | env.reset() 18 | renderer = env.render('human') 19 | 20 | def keyDownCb(keyName): 21 | if keyName == 'BACKSPACE': 22 | resetEnv() 23 | return 24 | 25 | if keyName == 'ESCAPE': 26 | sys.exit(0) 27 | 28 | action = 0 29 | 30 | if keyName == 'LEFT': 31 | action = env.actions.left 32 | elif keyName == 'RIGHT': 33 | action = env.actions.right 34 | elif keyName == 'UP': 35 | action = env.actions.forward 36 | 37 | elif keyName == 'SPACE': 38 | action = env.actions.toggle 39 | elif keyName == 'PAGE_UP': 40 | action = env.actions.pickup 41 | elif keyName == 'PAGE_DOWN': 42 | action = env.actions.drop 43 | 44 | elif keyName == 'RETURN': 45 | action = env.actions.done 46 | 47 | else: 48 | print("unknown key %s" % keyName) 49 | return 50 | 51 | obs, reward, done, info = env.step(action) 52 | 53 | print('step=%s, reward=%.2f' % (env.step_count, reward)) 54 | 55 | if done: 56 | print('done!') 57 | resetEnv() 58 | 59 | renderer.window.setKeyDownCb(keyDownCb) 60 | 61 | while True: 62 | env.render('human') 63 | time.sleep(0.01) 64 | 65 | # If the window was closed 66 | if renderer.window == None: 67 | break 68 | 69 | if __name__ == "__main__": 70 | main() 71 | -------------------------------------------------------------------------------- /metacar/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Metacar: Documentation 6 
| 7 | 8 | 9 | 10 |
11 | 12 | 13 | 57 | 58 | 59 | -------------------------------------------------------------------------------- /rl/q_learning.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import numpy as np 4 | import gym 5 | import gym_minigrid 6 | import time 7 | 8 | def state_to_key(obs): 9 | return str(obs["image"].tolist()+[obs["direction"]]).strip() 10 | 11 | def update_Q(Q, s, sp, a, r, done): 12 | if s not in Q: 13 | Q[s] = np.array([0., 0., 0., 0.]) 14 | if sp not in Q: 15 | Q[sp] = np.array([0., 0., 0., 0.]) 16 | 17 | ap = np.argmax(Q[sp]) 18 | if not done: 19 | Q[s][a] = Q[s][a] + 0.01*(r + 0.99*Q[sp][ap] - Q[s][a]) 20 | else: 21 | Q[s][a] = Q[s][a] + 0.01*(r - Q[s][a]) 22 | 23 | def create_state_if_not_exist(Q, s): 24 | if s not in Q: 25 | Q[s] = np.array([0., 0., 0., 0.]) 26 | 27 | def main(): 28 | 29 | Q = {} 30 | 31 | env = gym.make("MiniGrid-Empty-6x6-v0") 32 | eps = 0.01 33 | 34 | for epoch in range(100): 35 | 36 | s = env.reset() 37 | s = state_to_key(s) 38 | done = False 39 | 40 | while not done: 41 | 42 | if np.random.rand() < eps: 43 | a = np.random.randint(0, 4) 44 | else: 45 | create_state_if_not_exist(Q, s) 46 | a = np.argmax(Q[s]) 47 | 48 | sp, r, done, info = env.step(a) 49 | sp = state_to_key(sp) 50 | 51 | update_Q(Q, s, sp, a, r, done) 52 | 53 | s = sp 54 | 55 | print("eps", eps) 56 | eps = max(0.1, eps*0.99) 57 | 58 | 59 | for epoch in range(100): 60 | s = env.reset() 61 | s = state_to_key(s) 62 | done = False 63 | 64 | while not done: 65 | create_state_if_not_exist(Q, s) 66 | a = np.argmax(Q[s]) 67 | sp, r, done, info = env.step(a) 68 | sp = state_to_key(sp) 69 | s = sp 70 | env.render() 71 | time.sleep(0.1) 72 | print("r", r) 73 | 74 | if __name__ == "__main__": 75 | main() 76 | -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/js/utils.js: -------------------------------------------------------------------------------- 1 | function update_genome_to_display(genomed, genome){ 2 | genomed = [ 3 | genomed[0]+1, 4 | genomed[1]+genome[0], 5 | genomed[2]+genome[1], 6 | genomed[3]+genome[2], 7 | genomed[4]+genome[3] 8 | ] 9 | return genomed; 10 | } 11 | 12 | function display_genome(genome){ 13 | console.log(chart_genome); 14 | genome = [ 15 | genome[0]/30, // Scale between 0 and 1 16 | genome[1]/genome[0], // Take the mean 17 | genome[2]/genome[0], // Take the mean 18 | genome[3]/genome[0], // Take the mean 19 | genome[4]/genome[0] // Take the mean 20 | ] 21 | chart_genome.data.datasets.forEach((dataset) => { 22 | dataset.data = genome; 23 | }); 24 | chart_genome.update(); 25 | } 26 | 27 | var ctx = document.getElementById('genomeChart').getContext('2d'); 28 | var chart_genome = new Chart(ctx, { 29 | type: 'bar', 30 | data: { 31 | labels: ['Size', 'Gen1 (Speed)', 'Gen2 (Perception)', 'Gen3 (Accuracy)', 'Gen4 (bullets)'], 32 | datasets: [{ 33 | label: 'Genome', 34 | labelColor: 'rgba(0, 99, 132, 0.2)', 35 | data: [], 36 | backgroundColor: [ 37 | 'rgba(255, 99, 132, 0.2)', 38 | 'rgba(54, 162, 235, 0.2)', 39 | 'rgba(255, 206, 86, 0.2)', 40 | 'rgba(75, 192, 192, 0.2)', 41 | 'rgba(153, 102, 255, 0.2)' 42 | ], 43 | borderColor: [ 44 | 'rgba(255, 99, 132, 1)', 45 | 'rgba(54, 162, 235, 1)', 46 | 'rgba(255, 206, 86, 1)', 47 | 'rgba(75, 192, 192, 1)', 48 | 'rgba(153, 102, 255, 1)' 49 | ], 50 | borderWidth: 1 51 | }] 52 | }, 53 | options: { 54 | responsive:false, 55 | maintainAspectRatio: false, 56 | scales: { 57 | yAxes: [{ 58 | ticks: { 59 | 
beginAtZero: true 60 | } 61 | }] 62 | } 63 | } 64 | }); 65 | -------------------------------------------------------------------------------- /rl/dqn_use.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import gym 4 | import time 5 | import os 6 | 7 | class StateProcessor(): 8 | 9 | def __init__(self): 10 | with tf.variable_scope("process"): 11 | self.input_state = tf.placeholder(shape=[210, 160, 3], dtype=tf.uint8, name="input_process") 12 | self.output = tf.image.rgb_to_grayscale(self.input_state) 13 | self.output = tf.image.crop_to_bounding_box(self.output, 34, 0, 160, 160) 14 | self.output = tf.image.resize_images( 15 | self.output, [84, 84], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 16 | self.output = tf.squeeze(self.output) 17 | 18 | def process(self, sess, state): 19 | return sess.run(self.output, { self.input_state: state }) 20 | 21 | tf.reset_default_graph() 22 | 23 | checkpoint = os.path.join("./checkpoints", "model") 24 | new_saver = tf.train.import_meta_graph(checkpoint + '.meta') 25 | 26 | with tf.Session() as sess: 27 | # Restore variables from disk. 28 | 29 | 30 | new_saver.restore(sess, checkpoint) 31 | 32 | 33 | input_data = tf.get_default_graph().get_tensor_by_name('dqn/X:0') 34 | input_process = tf.get_default_graph().get_tensor_by_name('process/input_process:0') 35 | probs = tf.get_default_graph().get_tensor_by_name('dqn/predictions:0') 36 | 37 | state_processor = StateProcessor() 38 | 39 | env = gym.make('Breakout-v0') 40 | env.reset() 41 | 42 | state = env.reset() 43 | state = state_processor.process(sess, state) 44 | state = np.stack([state] * 4, axis=2) 45 | 46 | for _ in range(1000): 47 | import time 48 | time.sleep(0.05) 49 | env.render() 50 | p = sess.run(probs, feed_dict={input_data: [state]})[0] 51 | action = np.argmax(p) 52 | next_state, reward, done, info = env.step(action) 53 | 54 | # Add this action to the stack of images 55 | next_state = state_processor.process(sess, next_state) 56 | next_state = np.append(state[:,:,1:], np.expand_dims(next_state, 2), axis=2) 57 | 58 | if reward != 0: 59 | print("Reward", reward) 60 | 61 | state = next_state 62 | 63 | if done: 64 | break 65 | -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/js/index.js: -------------------------------------------------------------------------------- 1 | /* 2 | Create the initial population in the environment 3 | */ 4 | function createPopulation(env) { 5 | // Create individuals with one specific or random genome 6 | for (let i = 0; i < 15; i++){ 7 | //env.createHuman([Math.random(), Math.random(), Math.random(), Math.random()]); 8 | env.createHuman([0.5, 0.5, 0.5, 0.5]); 9 | } 10 | // Create individuals with one specific or random genome 11 | for (let i = 0; i < 15; i++){ 12 | //env.createHuman([Math.random(), Math.random(), Math.random(), Math.random()]); 13 | env.createHuman([0.5, 0.5, 0.5, 0.5]); 14 | } 15 | } 16 | 17 | /* 18 | Method used to select amoung the last population the best 19 | individuals to create a new population 20 | */ 21 | function selection(env){ 22 | let genomed = [0, 0, 0, 0, 0]; 23 | 24 | // Compute the total fitness of all individuals 25 | let total_fitness = 0; 26 | for (let p in env.last_population){ 27 | total_fitness += env.last_population[p].lifeduration; 28 | } 29 | 30 | // Sort individual by their best fitness 31 | env.last_population.sort((i1, i2) => { 32 | i1.prob = i1.lifeduration / 
total_fitness; 33 | i2.prob = i2.lifeduration / total_fitness; 34 | return i2.lifeduration - i1.lifeduration; 35 | }); 36 | 37 | for (let i = 0; i<30; i++){ 38 | 39 | let rd = Math.random(); 40 | let lastprop = 0; 41 | 42 | for (let p in env.last_population){ 43 | let h = env.last_population[p]; 44 | if (rd >= lastprop && rd < h.prob + lastprop){ 45 | //if (h.genome[3] == 0.75){ 46 | genomed = update_genome_to_display(genomed, h.genome); 47 | //} 48 | env.createHuman(env.last_population[p].genome); 49 | break; 50 | } 51 | lastprop += env.last_population[p].prob; 52 | } 53 | 54 | } 55 | 56 | display_genome(genomed); 57 | } 58 | 59 | function step(env, delta){ 60 | // Retrieve the desired speed 61 | let speed = document.getElementById("speed").value; 62 | // Step in the environment with the given speed 63 | for (let i = 0; i { 76 | step(env); 77 | }); 78 | } 79 | 80 | function init(env){ 81 | env.reset(); 82 | createPopulation(env); 83 | step(env); 84 | } 85 | 86 | var env = new geneticzombie.env("canvas_container", {nbbullets : 10}); 87 | env.init(init, null); 88 | -------------------------------------------------------------------------------- /rl/grid_world_q_learning.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randint 3 | import random 4 | 5 | class EnvGrid(object): 6 | """ 7 | docstring forEnvGrid. 8 | """ 9 | def __init__(self): 10 | super(EnvGrid, self).__init__() 11 | 12 | self.grid = [ 13 | [0, 0, 1], 14 | [0, -1, 0], 15 | [0, 0, 0] 16 | ] 17 | # Starting position 18 | self.y = 2 19 | self.x = 0 20 | 21 | self.actions = [ 22 | [-1, 0], # Up 23 | [1, 0], #Down 24 | [0, -1], # Left 25 | [0, 1] # Right 26 | ] 27 | 28 | def reset(self): 29 | """ 30 | Reset world 31 | """ 32 | self.y = 2 33 | self.x = 0 34 | return (self.y*3+self.x+1) 35 | 36 | def step(self, action): 37 | """ 38 | Action: 0, 1, 2, 3 39 | """ 40 | self.y = max(0, min(self.y + self.actions[action][0],2)) 41 | self.x = max(0, min(self.x + self.actions[action][1],2)) 42 | 43 | return (self.y*3+self.x+1) , self.grid[self.y][self.x] 44 | 45 | def show(self): 46 | """ 47 | Show the grid 48 | """ 49 | print("---------------------") 50 | y = 0 51 | for line in self.grid: 52 | x = 0 53 | for pt in line: 54 | print("%s\t" % (pt if y != self.y or x != self.x else "X"), end="") 55 | x += 1 56 | y += 1 57 | print("") 58 | 59 | def is_finished(self): 60 | return self.grid[self.y][self.x] == 1 61 | 62 | def take_action(st, Q, eps): 63 | # Take an action 64 | if random.uniform(0, 1) < eps: 65 | action = randint(0, 3) 66 | else: # Or greedy action 67 | action = np.argmax(Q[st]) 68 | return action 69 | 70 | if __name__ == '__main__': 71 | env = EnvGrid() 72 | st = env.reset() 73 | 74 | Q = [ 75 | [0, 0, 0, 0], 76 | [0, 0, 0, 0], 77 | [0, 0, 0, 0], 78 | [0, 0, 0, 0], 79 | [0, 0, 0, 0], 80 | [0, 0, 0, 0], 81 | [0, 0, 0, 0], 82 | [0, 0, 0, 0], 83 | [0, 0, 0, 0], 84 | [0, 0, 0, 0] 85 | ] 86 | 87 | for _ in range(100): 88 | # Reset the game 89 | st = env.reset() 90 | while not env.is_finished(): 91 | #env.show() 92 | #at = int(input("$>")) 93 | at = take_action(st, Q, 0.4) 94 | 95 | stp1, r = env.step(at) 96 | #print("s", stp1) 97 | #print("r", r) 98 | 99 | # Update Q function 100 | atp1 = take_action(stp1, Q, 0.0) 101 | Q[st][at] = Q[st][at] + 0.1*(r + 0.9*Q[stp1][atp1] - Q[st][at]) 102 | 103 | st = stp1 104 | 105 | for s in range(1, 10): 106 | print(s, Q[s]) 107 | -------------------------------------------------------------------------------- 
/genetic/introduction/js/simple_genetic_example.js: -------------------------------------------------------------------------------- 1 | function individual_to_string(individual){ 2 | let st = ""; 3 | for (let i=0; i<8; i++){ 4 | st += individual[i].toString(); 5 | } 6 | return st; 7 | } 8 | 9 | function display_pop(pop){ 10 | // Display each individual 11 | for (let p=0; p<8; p++){ 12 | // Display the individual on the screen 13 | let st = individual_to_string(pop[p]); 14 | document.getElementById("ind_" + (p + 1)).innerHTML = st; 15 | } 16 | } 17 | 18 | function display_subpop(subpop){ 19 | // Display each individual 20 | for (let p=0; p<4; p++){ 21 | // Display the individual on the screen 22 | let st = individual_to_string(subpop[p]); 23 | document.getElementById("sel_" + (p + 1)).innerHTML = st; 24 | } 25 | } 26 | 27 | function compute_fitness(individual){ 28 | score = 0; 29 | for (let i=0; i<8; i++){ 30 | score += individual[i]; 31 | } 32 | return score; 33 | } 34 | 35 | function cross_over(i1, i2){ 36 | let individual = [ 37 | i1[0], 38 | i1[1], 39 | i1[2], 40 | i1[3], 41 | i2[4], 42 | i2[5], 43 | i2[6], 44 | i2[7] 45 | ]; 46 | return individual; 47 | } 48 | 49 | function generate_individual(){ 50 | let individual = []; 51 | for (let i=0; i<8; i++){ 52 | individual.push(Math.random() > 0.5 ? 1 : 0); 53 | } 54 | return individual; 55 | } 56 | 57 | function generate_population(){ 58 | // Init empty population 59 | let pop = [] 60 | for (let p=0; p<8; p++){ 61 | // Randomly generate one individual with random genes 62 | pop.push(generate_individual()); 63 | } 64 | return pop; 65 | } 66 | 67 | let pop = generate_population(); 68 | console.log(pop); 69 | display_pop(pop); 70 | //var n_pop = []; 71 | 72 | document.getElementById("sort").addEventListener("click", () => { 73 | // Sort function 74 | pop.sort((i1, i2) => { 75 | return compute_fitness(i2) - compute_fitness(i1); 76 | }); 77 | console.log(pop); 78 | display_pop(pop); 79 | }); 80 | 81 | document.getElementById("selection").addEventListener("click", () => { 82 | n_pop = []; 83 | n_pop.push(pop[0]); 84 | n_pop.push(cross_over(pop[0], pop[1])); 85 | n_pop.push(cross_over(pop[0], pop[2])); 86 | n_pop.push(cross_over(pop[1], pop[2])); 87 | display_subpop(n_pop); 88 | }); 89 | 90 | document.getElementById("mutate").addEventListener("click", () => { 91 | for (let i=0; i<4; i++){ 92 | let individual = n_pop[i]; 93 | let gene_to_mutate = Math.floor(Math.random()*individual.length); 94 | if (individual[gene_to_mutate] == 0) { 95 | individual[gene_to_mutate] = 1; 96 | } 97 | else { 98 | individual[gene_to_mutate] = 0; 99 | } 100 | display_subpop(n_pop); 101 | } 102 | }); 103 | 104 | document.getElementById("fillpop").addEventListener("click", () => { 105 | pop = [] 106 | pop.push(n_pop[0]); 107 | pop.push(n_pop[1]); 108 | pop.push(n_pop[2]); 109 | pop.push(n_pop[3]); 110 | 111 | pop.push(generate_individual()); 112 | pop.push(generate_individual()); 113 | pop.push(generate_individual()); 114 | pop.push(generate_individual()); 115 | 116 | display_pop(pop); 117 | 118 | document.getElementById("sel_1").innerHTML = ""; 119 | document.getElementById("sel_2").innerHTML = ""; 120 | document.getElementById("sel_3").innerHTML = ""; 121 | document.getElementById("sel_4").innerHTML = ""; 122 | }); 123 | -------------------------------------------------------------------------------- /genetic/zombies/demo/webapp/public/css/general.css: -------------------------------------------------------------------------------- 1 | body{ 2 | background: #030303; 3 
| font-family: 'Tajawal', sans-serif; 4 | padding: 0; 5 | margin: 0; 6 | padding-bottom: 100px; 7 | padding-top: 70px; 8 | } 9 | 10 | #genomeChart { 11 | display: block; 12 | float: left; 13 | width: 500px; 14 | height: 500px; 15 | } 16 | 17 | canvas { 18 | display: block; 19 | float: left; 20 | } 21 | 22 | header{ 23 | width: 100%; 24 | height: 70px; 25 | background: #383838; 26 | position: fixed; 27 | top:0px; 28 | left: 0px; 29 | } 30 | 31 | .header_container{ 32 | width: 100%; 33 | max-width: 800px; 34 | height: 70px; 35 | margin: auto; 36 | overflow: hidden; 37 | } 38 | 39 | .header_container h1{ 40 | line-height: 45px; 41 | color: white; 42 | float: left; 43 | font-weight: 100; 44 | } 45 | 46 | .header_container a, .header_container a:link{ 47 | color: white; 48 | text-decoration: none; 49 | } 50 | 51 | .header_container a{ 52 | float: right; 53 | } 54 | 55 | .header_container img{ 56 | width: 50px; 57 | margin-top: 10px; 58 | } 59 | 60 | .canvas_container{ 61 | background-color: #000000; 62 | height: auto; 63 | margin-bottom: 50px; 64 | width: 1200px; 65 | height: 800px; 66 | float: left; 67 | margin-left: 100px; 68 | } 69 | 70 | .canvas_container .canvas{ 71 | text-align: center; 72 | margin: auto; 73 | } 74 | 75 | .canvas_container .canvas canvas{ 76 | margin-bottom: -40px; 77 | -webkit-box-shadow: -1px 2px 20px -2px rgba(0,0,0,0.5); 78 | -moz-box-shadow: -1px 2px 20px -2px rgba(0,0,0,0.5); 79 | box-shadow: -1px 2px 20px -2px rgba(0,0,0,0.5); 80 | } 81 | 82 | .body_container{ 83 | width: 100%; 84 | max-width: 800px; 85 | min-height: 200px; 86 | margin: auto; 87 | } 88 | 89 | .body_container h2{ 90 | text-align: center; 91 | font-size: 50px; 92 | font-weight: 100; 93 | color: #cacaca; 94 | margin-top: 0px; 95 | margin-bottom: 0px; 96 | } 97 | 98 | .body_container h3{ 99 | margin-top: 0px; 100 | margin-bottom: 0px; 101 | text-align: center; 102 | font-size: 30px; 103 | font-weight: 100; 104 | color: #cacaca; 105 | } 106 | 107 | .body_container pre{ 108 | background: #383838; 109 | color: white; 110 | padding: 10px; 111 | font-size: 20px; 112 | } 113 | 114 | .body_container p{ 115 | font-size: 16px; 116 | } 117 | 118 | .level_link_box{ 119 | width: 100%; 120 | background: white; 121 | } 122 | 123 | .level_link_box h4, .level_link_box a{ 124 | margin-bottom: 0px; 125 | text-align: center; 126 | font-size: 22px; 127 | font-weight: 100; 128 | color: #484848; 129 | text-decoration: none; 130 | } 131 | 132 | .level_link_box p{ 133 | padding: 10px; 134 | font-size: 18px; 135 | } 136 | 137 | .level_link_box p a, .level_link_box p a:link{ 138 | color: #ed4532; 139 | text-decoration: none; 140 | font-weight: bold; 141 | } 142 | 143 | .floatleft { 144 | float: left; 145 | margin-right: 20px; 146 | } 147 | 148 | .floatright { 149 | float: right; 150 | margin-left: 20px; 151 | } 152 | 153 | .metacar_buttons_container{ 154 | text-align: center; 155 | margin: auto; 156 | margin-top: 60px; 157 | max-width: 800px; 158 | } 159 | 160 | .metacar_buttons_container button{ 161 | display: inline-block; 162 | padding: 10px; 163 | border: none; 164 | color: white; 165 | cursor: pointer; 166 | text-align: center; 167 | margin: auto; 168 | margin-right: 5px; 169 | border-radius:5px; 170 | -moz-border-radius:5px; 171 | -webkit-border-radius:5px; 172 | margin-bottom: 5px; 173 | background: #383838; 174 | } 175 | 176 | .metacar_img{ 177 | background: #71944c; 178 | display: inline-block; 179 | padding: 10px; 180 | margin: 5px; 181 | -moz-border-radius:5px; 182 | -webkit-border-radius:5px; 183 | margin-bottom: 
5px; 184 | } 185 | -------------------------------------------------------------------------------- /rl/sticks.py: -------------------------------------------------------------------------------- 1 | from random import randint 2 | import random 3 | import numpy as np 4 | 5 | class StickGame(object): 6 | """ 7 | StickGame. 8 | """ 9 | 10 | def __init__(self, nb): 11 | # @nb Number of stick to play with 12 | super(StickGame, self).__init__() 13 | self.original_nb = nb 14 | self.nb = nb 15 | 16 | def is_finished(self): 17 | # Check if the game is over @return Boolean 18 | if self.nb <= 0: 19 | return True 20 | return False 21 | 22 | def reset(self): 23 | # Reset the state of the game 24 | self.nb = self.original_nb 25 | return self.nb 26 | 27 | def display(self): 28 | # Display the state of the game 29 | print ("| " * self.nb) 30 | 31 | def step(self, action): 32 | # @action either 1, 2 or 3. Take an action into the environement 33 | self.nb -= action 34 | if self.nb <= 0: 35 | return None, -1 36 | else: 37 | return self.nb, 0 38 | 39 | class StickPlayer(object): 40 | """ 41 | Stick Player 42 | """ 43 | 44 | def __init__(self, is_human, size, trainable=True): 45 | # @nb Number of stick to play with 46 | super(StickPlayer, self).__init__() 47 | self.is_human = is_human 48 | self.history = [] 49 | self.V = {} 50 | for s in range(1, size+1): 51 | self.V[s] = 0. 52 | self.win_nb = 0. 53 | self.lose_nb = 0. 54 | self.rewards = [] 55 | self.eps = 0.99 56 | self.trainable = trainable 57 | 58 | def reset_stat(self): 59 | # Reset stat 60 | self.win_nb = 0 61 | self.lose_nb = 0 62 | self.rewards = [] 63 | 64 | def greedy_step(self, state): 65 | # Greedy step 66 | actions = [1, 2, 3] 67 | vmin = None 68 | vi = None 69 | for i in range(0, 3): 70 | a = actions[i] 71 | if state - a > 0 and (vmin is None or vmin > self.V[state - a]): 72 | vmin = self.V[state - a] 73 | vi = i 74 | return actions[vi if vi is not None else 1] 75 | 76 | def play(self, state): 77 | # PLay given the @state (int) 78 | if self.is_human is False: 79 | # Take random action 80 | if random.uniform(0, 1) < self.eps: 81 | action = randint(1, 3) 82 | else: # Or greedy action 83 | action = self.greedy_step(state) 84 | else: 85 | action = int(input("$>")) 86 | return action 87 | 88 | def add_transition(self, n_tuple): 89 | # Add one transition to the history: tuple (s, a , r, s') 90 | self.history.append(n_tuple) 91 | s, a, r, sp = n_tuple 92 | self.rewards.append(r) 93 | 94 | def train(self): 95 | if not self.trainable or self.is_human is True: 96 | return 97 | 98 | # Update the value function if this player is not human 99 | for transition in reversed(self.history): 100 | s, a, r, sp = transition 101 | if r == 0: 102 | self.V[s] = self.V[s] + 0.001*(self.V[sp] - self.V[s]) 103 | else: 104 | self.V[s] = self.V[s] + 0.001*(r - self.V[s]) 105 | 106 | self.history = [] 107 | 108 | def play(game, p1, p2, train=True): 109 | state = game.reset() 110 | players = [p1, p2] 111 | random.shuffle(players) 112 | p = 0 113 | while game.is_finished() is False: 114 | 115 | if players[p%2].is_human: 116 | game.display() 117 | 118 | action = players[p%2].play(state) 119 | n_state, reward = game.step(action) 120 | 121 | # Game is over. Ass stat 122 | if (reward != 0): 123 | # Update stat of the current player 124 | players[p%2].lose_nb += 1. if reward == -1 else 0 125 | players[p%2].win_nb += 1. if reward == 1 else 0 126 | # Update stat of the other player 127 | players[(p+1)%2].lose_nb += 1. if reward == 1 else 0 128 | players[(p+1)%2].win_nb += 1. 
if reward == -1 else 0 129 | 130 | # Add the reversed reward and the new state to the other player 131 | if p != 0: 132 | s, a, r, sp = players[(p+1)%2].history[-1] 133 | players[(p+1)%2].history[-1] = (s, a, reward * -1, n_state) 134 | 135 | players[p%2].add_transition((state, action, reward, None)) 136 | 137 | state = n_state 138 | p += 1 139 | 140 | if train: 141 | p1.train() 142 | p2.train() 143 | 144 | if __name__ == '__main__': 145 | game = StickGame(12) 146 | 147 | # PLayers to train 148 | p1 = StickPlayer(is_human=False, size=12, trainable=True) 149 | p2 = StickPlayer(is_human=False, size=12, trainable=True) 150 | # Human player and random player 151 | human = StickPlayer(is_human=True, size=12, trainable=False) 152 | random_player = StickPlayer(is_human=False, size=12, trainable=False) 153 | 154 | # Train the agent 155 | for i in range(0, 10000): 156 | if i % 10 == 0: 157 | p1.eps = max(p1.eps*0.996, 0.05) 158 | p2.eps = max(p2.eps*0.996, 0.05) 159 | play(game, p1, p2) 160 | p1.reset_stat() 161 | 162 | # Display the value function 163 | for key in p1.V: 164 | print(key, p1.V[key]) 165 | print("--------------------------") 166 | 167 | # Play agains a random player 168 | for _ in range(0, 1000): 169 | play(game, p1, random_player, train=False) 170 | print("p1 win rate", p1.win_nb/(p1.win_nb + p1.lose_nb)) 171 | print("p1 win mean", np.mean(p1.rewards)) 172 | 173 | # Play agains us 174 | while True: 175 | play(game, p1, human, train=False) 176 | -------------------------------------------------------------------------------- /tensorflow-js/teachable_machine.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
// Webcam
20 | 21 | 22 | 23 | 168 | 169 | 170 | -------------------------------------------------------------------------------- /rl/GridWorld.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Install the environement" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# $> git clone https://github.com/maximecb/gym-minigrid.git\n", 17 | "# $> pip3 install -e gym-minigrid\n", 18 | "# $> pip instal gym" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Import libs and usefull methods" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import gym\n", 35 | "import gym_minigrid\n", 36 | "\n", 37 | "import time\n", 38 | "import numpy as np\n", 39 | "\n", 40 | "from q_learning import state_to_key, update_Q, create_state_if_not_exist" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Demonstration of the environement" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 3, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "Q = {}\n", 57 | "env = gym.make(\"MiniGrid-Empty-6x6-v0\")\n", 58 | "\n", 59 | "for epoch in range(100):\n", 60 | " s = env.reset()\n", 61 | " s = state_to_key(s)\n", 62 | " done = False\n", 63 | "\n", 64 | " while not done:\n", 65 | " create_state_if_not_exist(Q, s)\n", 66 | " sp, r, done, info = env.step(np.random.randint(0, 4))\n", 67 | " sp = state_to_key(sp)\n", 68 | " s = sp\n", 69 | " env.render()" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 7, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "name": "stdout", 79 | "output_type": "stream", 80 | "text": [ 81 | "60\n", 82 | "[0. 0. 0. 0.]\n", 83 | "[0. 0. 0. 0.]\n", 84 | "[0. 0. 0. 0.]\n", 85 | "[0. 0. 0. 0.]\n", 86 | "[0. 0. 0. 0.]\n", 87 | "[0. 0. 0. 0.]\n", 88 | "[0. 0. 0. 0.]\n", 89 | "[0. 0. 0. 0.]\n", 90 | "[0. 0. 0. 0.]\n", 91 | "[0. 0. 0. 0.]\n", 92 | "[0. 0. 0. 0.]\n", 93 | "[0. 0. 0. 0.]\n", 94 | "[0. 0. 0. 0.]\n", 95 | "[0. 0. 0. 0.]\n", 96 | "[0. 0. 0. 0.]\n", 97 | "[0. 0. 0. 0.]\n", 98 | "[0. 0. 0. 0.]\n", 99 | "[0. 0. 0. 0.]\n", 100 | "[0. 0. 0. 0.]\n", 101 | "[0. 0. 0. 0.]\n", 102 | "[0. 0. 0. 0.]\n", 103 | "[0. 0. 0. 0.]\n", 104 | "[0. 0. 0. 0.]\n", 105 | "[0. 0. 0. 0.]\n", 106 | "[0. 0. 0. 0.]\n", 107 | "[0. 0. 0. 0.]\n", 108 | "[0. 0. 0. 0.]\n", 109 | "[0. 0. 0. 0.]\n", 110 | "[0. 0. 0. 0.]\n", 111 | "[0. 0. 0. 0.]\n", 112 | "[0. 0. 0. 0.]\n", 113 | "[0. 0. 0. 0.]\n", 114 | "[0. 0. 0. 0.]\n", 115 | "[0. 0. 0. 0.]\n", 116 | "[0. 0. 0. 0.]\n", 117 | "[0. 0. 0. 0.]\n", 118 | "[0. 0. 0. 0.]\n", 119 | "[0. 0. 0. 0.]\n", 120 | "[0. 0. 0. 0.]\n", 121 | "[0. 0. 0. 0.]\n", 122 | "[0. 0. 0. 0.]\n", 123 | "[0. 0. 0. 0.]\n", 124 | "[0. 0. 0. 0.]\n", 125 | "[0. 0. 0. 0.]\n", 126 | "[0. 0. 0. 0.]\n", 127 | "[0. 0. 0. 0.]\n", 128 | "[0. 0. 0. 0.]\n", 129 | "[0. 0. 0. 0.]\n", 130 | "[0. 0. 0. 0.]\n", 131 | "[0. 0. 0. 0.]\n", 132 | "[0. 0. 0. 0.]\n", 133 | "[0. 0. 0. 0.]\n", 134 | "[0. 0. 0. 0.]\n", 135 | "[0. 0. 0. 0.]\n", 136 | "[0. 0. 0. 0.]\n", 137 | "[0. 0. 0. 0.]\n", 138 | "[0. 0. 0. 0.]\n", 139 | "[0. 0. 0. 0.]\n", 140 | "[0. 0. 0. 0.]\n", 141 | "[0. 0. 0. 
0.]\n" 142 | ] 143 | } 144 | ], 145 | "source": [ 146 | "print(len(Q))\n", 147 | "for state in Q:\n", 148 | " print(Q[state])" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "# Methods you can use" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 11, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "\n", 168 | "\n", 169 | "Actions for state s [0. 0. 0. 0.]\n" 170 | ] 171 | } 172 | ], 173 | "source": [ 174 | "Q = {}\n", 175 | "env = gym.make(\"MiniGrid-Empty-6x6-v0\")\n", 176 | "s = env.reset()\n", 177 | "print(type(s))\n", 178 | "s = state_to_key(s)\n", 179 | "print(type(s))\n", 180 | "# Create the state in the Q table if this state doesn't exist yet\n", 181 | "create_state_if_not_exist(Q, s)\n", 182 | "print(\"Actions for state s\", Q[s])\n", 183 | "\n", 184 | "# This method can be used to update the Q Table\n", 185 | "update_Q(Q=Q, s=s, sp=s, a=0, r=0, done=False)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "## Train the agent" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 14, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stdout", 202 | "output_type": "stream", 203 | "text": [ 204 | "\r", 205 | " epoch= 0\n", 206 | "\r", 207 | " epoch= 1\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "Q = {}\n", 213 | "env = gym.make(\"MiniGrid-Empty-6x6-v0\")\n", 214 | "\n", 215 | "epochs_nb = 2\n", 216 | "for epoch in range(epochs_nb):\n", 217 | " s = env.reset()\n", 218 | " s = state_to_key(s)\n", 219 | " done = False\n", 220 | "\n", 221 | " print(\"\\r epoch=\", epoch)\n", 222 | " \n", 223 | " while not done:\n", 224 | " create_state_if_not_exist(Q, s)\n", 225 | " # TODO\n", 226 | " # Take an action here with epsilon greedy instead of randint\n", 227 | " sp, r, done, info = env.step(np.random.randint(0, 4))\n", 228 | " sp = state_to_key(sp)\n", 229 | " # TODO\n", 230 | " # Call the update method\n", 231 | " s = sp" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "## Check the result" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 15, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "s = env.reset()\n", 248 | "s = state_to_key(s)\n", 249 | "done = False\n", 250 | "\n", 251 | "while not done:\n", 252 | " create_state_if_not_exist(Q, s)\n", 253 | " sp, r, done, info = env.step(np.random.randint(0, 4))\n", 254 | " sp = state_to_key(sp)\n", 255 | " s = sp\n", 256 | " env.render()" 257 | ] 258 | } 259 | ], 260 | "metadata": { 261 | "kernelspec": { 262 | "display_name": "Python 3", 263 | "language": "python", 264 | "name": "python3" 265 | }, 266 | "language_info": { 267 | "codemirror_mode": { 268 | "name": "ipython", 269 | "version": 3 270 | }, 271 | "file_extension": ".py", 272 | "mimetype": "text/x-python", 273 | "name": "python", 274 | "nbconvert_exporter": "python", 275 | "pygments_lexer": "ipython3", 276 | "version": "3.6.7" 277 | } 278 | }, 279 | "nbformat": 4, 280 | "nbformat_minor": 2 281 | } 282 | -------------------------------------------------------------------------------- /rl/tictactoe.py: -------------------------------------------------------------------------------- 1 | from random import randint 2 | import random 3 | import numpy as np 4 | 5 | class TicTacToeGame(object): 6 | """ 7 | StickGame. 
8 | """ 9 | 10 | def __init__(self): 11 | # @nb Number of stick to play with 12 | super(TicTacToeGame, self).__init__() 13 | self.state = [0, 0, 0, 0, 0, 0, 0, 0, 0] 14 | self.finished = False 15 | 16 | def is_finished(self): 17 | # Check if the game is over @return Boolean 18 | return self.finished 19 | 20 | def reset(self): 21 | # Reset the state of the game 22 | self.state = [0, 0, 0, 0, 0, 0, 0, 0, 0] 23 | self.finished = False 24 | return self.state_to_nb(self.state) 25 | 26 | def display(self): 27 | # Display the state of the game 28 | print("State id:%s"%self.state_to_nb(self.state)) 29 | display = "%s %s %s \n%s %s %s \n%s %s %s" % tuple(self.state) 30 | print(display.replace("1", "O").replace("2", "X").replace("0", "-")) 31 | 32 | def next_actions(self): 33 | # Return the next possible actions 34 | actions = [] 35 | for p in range(0, 9): 36 | if self.state[p] == 0: 37 | actions.append(p) 38 | return actions 39 | 40 | def next_possible_state(self, action, p): 41 | # Return the next possible actions 42 | state = [v for v in self.state] 43 | state[action] = p 44 | return state 45 | 46 | def state_to_nb(self, state): 47 | return str(state) 48 | i = 0 49 | nb = 0 50 | for p in state: 51 | nb += p*3**i 52 | i += 1 53 | return nb 54 | 55 | def step(self, action, p): 56 | self.state[action] = p 57 | st = self.state 58 | n_actions = self.next_actions() 59 | if len(n_actions) == 0: 60 | self.finished = True 61 | if (st[0] == p and st[1] == p and st[2] == p) or \ 62 | (st[3] == p and st[4] == p and st[5] == p) or \ 63 | (st[6] == p and st[7] == p and st[8] == p) or \ 64 | (st[0] == p and st[3] == p and st[6] == p) or \ 65 | (st[1] == p and st[4] == p and st[7] == p) or \ 66 | (st[2] == p and st[5] == p and st[8] == p) or \ 67 | (st[0] == p and st[4] == p and st[8] == p) or \ 68 | (st[2] == p and st[4] == p and st[6] == p): 69 | self.finished = True 70 | return self.state_to_nb(self.state), 1 71 | else: 72 | return self.state_to_nb(self.state), 0 73 | 74 | class StickPlayer(object): 75 | """ 76 | Stick Player 77 | """ 78 | 79 | def __init__(self, is_human, p, trainable=True): 80 | # @nb Number of stick to play with 81 | super(StickPlayer, self).__init__() 82 | self.is_human = is_human 83 | self.history = [] 84 | self.V = {} 85 | self.p = p 86 | self.win_nb = 0. 87 | self.lose_nb = 0. 
88 | self.rewards = [] 89 | self.eps = 0.99 90 | self.trainable = trainable 91 | 92 | def reset_stat(self): 93 | # Reset stat 94 | self.win_nb = 0 95 | self.lose_nb = 0 96 | self.rewards = [] 97 | 98 | def greedy_step(self, state, game, display_value=False): 99 | # Greedy step 100 | actions = game.next_actions() 101 | vboard = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 102 | vmax = None 103 | vi = None 104 | for i in range(0, len(actions)): 105 | a = actions[i] 106 | n_state = game.next_possible_state(a, self.p) 107 | nb = game.state_to_nb(n_state) 108 | 109 | if nb not in self.V: 110 | self.V[nb] = 0 111 | 112 | vboard[a] = self.V[nb] 113 | if vmax is None or vmax < self.V[nb]: 114 | vmax = self.V[nb] 115 | vi = i 116 | 117 | if display_value: 118 | display = "%.2f %.2f %.2f \n%.2f %.2f %.2f \n%.2f %.2f %.2f" % tuple(vboard) 119 | print(display) 120 | 121 | return actions[vi] 122 | 123 | def play(self, state, game, display_value=False): 124 | # PLay given the @state (int) 125 | if self.is_human is False: 126 | # Take random action 127 | if random.uniform(0, 1) < self.eps: 128 | action = random.choice(game.next_actions()) 129 | else: # Or greedy action 130 | action = self.greedy_step(state, game, display_value) 131 | else: 132 | action = int(input("$>")) 133 | return action 134 | 135 | def add_transition(self, n_tuple): 136 | # Add one transition to the history: tuple (s, a , r, s') 137 | self.history.append(n_tuple) 138 | 139 | def train(self, display_value): 140 | if not self.trainable or self.is_human is True: 141 | return 142 | 143 | # Update the value function if this player is not human 144 | f = 0 145 | for transition in reversed(self.history): 146 | s, r, sp = transition 147 | 148 | if s not in self.V: 149 | self.V[s] = 0 150 | if sp not in self.V: 151 | self.V[sp] = 0 152 | 153 | if f == 0: 154 | # For the last element, we train it toward the reward 155 | self.V[sp] = self.V[sp] + 0.01*(r - self.V[sp]) 156 | self.V[s] = self.V[s] + 0.01*(self.V[sp] - self.V[s]) 157 | f += 1 158 | 159 | 160 | self.history = [] 161 | 162 | def play(game, p1, p2, train=True, display_value=False): 163 | state = game.reset() 164 | players = [p1, p2] 165 | random.shuffle(players) 166 | p = 0 167 | while game.is_finished() is False: 168 | 169 | if players[0].is_human or players[1].is_human: 170 | game.display() 171 | 172 | if state not in players[p%2].V: 173 | players[p%2].V[state] = 0 174 | action = players[p%2].play(state, game, display_value) 175 | n_state, reward = game.step(action, players[p%2].p) 176 | 177 | # Game is over. 
Ass stat 178 | if (reward == 1): 179 | players[p%2].win_nb += 1 180 | 181 | if display_value: 182 | print("reward", reward) 183 | 184 | players[p%2].add_transition((state, reward, n_state)) 185 | players[(p+1)%2].add_transition((state, reward * -1, n_state)) 186 | 187 | state = n_state 188 | p += 1 189 | 190 | if train: 191 | p1.train(display_value=display_value) 192 | p2.train(display_value=display_value) 193 | 194 | if __name__ == '__main__': 195 | game = TicTacToeGame() 196 | 197 | # Players to train 198 | p1 = StickPlayer(is_human=False, p=1, trainable=True) 199 | p2 = StickPlayer(is_human=False, p=2, trainable=True) 200 | # Human player and random player 201 | human = StickPlayer(is_human=True, p=2, trainable=False) 202 | random_player = StickPlayer(is_human=False, p=2, trainable=False) 203 | 204 | # Train the agent 205 | for i in range(0, 100000): 206 | if i % 10 == 0: 207 | p1.eps = max(0.05, p1.eps*0.999) 208 | p2.eps = max(0.05, p2.eps*0.999) 209 | if i % 1000 == 0: 210 | p1.reset_stat() 211 | # Play agains a random player 212 | for _ in range(0, 100): 213 | play(game, p1, random_player, train=False) 214 | print("eps=%sp1 win rate=%s" % (p1.eps, p1.win_nb/100.)) 215 | 216 | play(game, p1, p2) 217 | 218 | p1.eps = 0.0 219 | p1.reset_stat() 220 | # Play agains a random player 221 | for _ in range(0, 10000): 222 | play(game, p1, random_player, train=False) 223 | print("eps=%sp1 win rate=%s" % (p1.eps, p1.win_nb/10000.)) 224 | 225 | # Play agains us 226 | while True: 227 | play(game, p1, human, train=True, display_value=True) 228 | -------------------------------------------------------------------------------- /rl/q_learning_nn.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Metacar: Documentation 6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 204 | 205 | 206 | 207 | -------------------------------------------------------------------------------- /rl/dqn.py: -------------------------------------------------------------------------------- 1 | from collections import deque, namedtuple 2 | from gym.wrappers import Monitor 3 | import matplotlib.pyplot as plt 4 | import tensorflow as tf 5 | import numpy as np 6 | import itertools 7 | import random 8 | import gym 9 | import os 10 | import sys 11 | 12 | VALID_ACTIONS = [0, 1, 2, 3] 13 | 14 | 15 | from tensorflow.python.client import device_lib 16 | 17 | def get_available_gpus(): 18 | local_device_protos = device_lib.list_local_devices() 19 | return [x.name for x in local_device_protos if x.device_type == 'GPU'] 20 | get_available_gpus() 21 | 22 | 23 | class StateProcessor(): 24 | 25 | def __init__(self): 26 | with tf.variable_scope("process"): 27 | self.input_state = tf.placeholder(shape=[210, 160, 3], dtype=tf.uint8, name="input_process") 28 | self.output = tf.image.rgb_to_grayscale(self.input_state) 29 | self.output = tf.image.crop_to_bounding_box(self.output, 34, 0, 160, 160) 30 | self.output = tf.image.resize_images( 31 | self.output, [84, 84], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 32 | self.output = tf.squeeze(self.output) 33 | 34 | def process(self, sess, state): 35 | return sess.run(self.output, { self.input_state: state }) 36 | 37 | import gym 38 | import time 39 | 40 | env = gym.make('Breakout-v0') 41 | env.reset() 42 | 43 | for _ in range(1000): 44 | observation, reward, done, info = env.step(env.action_space.sample()) 45 | if done: 46 | break 47 | 48 | class DQN(): 49 | 50 | def __init__(self, scope): 51 | self.scope = scope 52 | with tf.variable_scope(self.scope): 53 | self._build_model() 54 | 55 | def _build_model(self): 56 | # 4 Last frames of the game 57 | self.X_pl = tf.placeholder(shape=[None, 84, 84, 4], dtype=tf.uint8, name="X") 58 | # The TD target value 59 | self.y_pl = tf.placeholder(shape=[None], dtype=tf.float32, name="y") 60 | # Integer id of which action was selected 61 | self.actions_pl = tf.placeholder(shape=[None], dtype=tf.int32, name="actions") 62 | 63 | # Rescale the image 64 | X = tf.to_float(self.X_pl) / 255.0 65 | # Get the batch size 66 | batch_size = tf.shape(self.X_pl)[0] 67 | 68 | # Three convolutional layers 69 | conv1 = tf.layers.conv2d(X, 32, 8, 4, activation=tf.nn.relu) 70 | conv2 = tf.layers.conv2d(conv1, 64, 4, 2, activation=tf.nn.relu) 71 | conv3 = tf.layers.conv2d(conv2, 64, 3, 1, activation=tf.nn.relu) 72 | 73 | # Fully connected layers 74 | flattened = tf.contrib.layers.flatten(conv3) 75 | fc1 = tf.layers.dense(flattened, 512, activation=tf.nn.relu) 76 | self.predictions = tf.layers.dense(fc1, len(VALID_ACTIONS)) 77 | tf.identity(self.predictions, name="predictions") 78 | 79 | # Get the predictions for the chosen actions only 80 | gather_indices = tf.range(batch_size) * tf.shape(self.predictions)[1] + self.actions_pl 81 | self.action_predictions = tf.gather(tf.reshape(self.predictions, [-1]), gather_indices) 82 | 83 | # Calculate the loss 84 | self.losses = tf.squared_difference(self.y_pl, self.action_predictions) 85 | self.loss = tf.reduce_mean(self.losses) 86 | 87 | # Optimizer Parameters from original paper 88 | self.optimizer = tf.train.RMSPropOptimizer(0.00025, 0.99, 0.0, 1e-6) 89 | self.train_op = self.optimizer.minimize(self.loss) 90 | 91 | def predict(self, sess, s): 92 | return sess.run(self.predictions, { self.X_pl: s }) 93 | 94 | def update(self, sess, s, a, y): 95 | feed_dict = { self.X_pl: s, 
self.y_pl: y, self.actions_pl: a } 96 | ops = [self.train_op, self.loss] 97 | _, loss = sess.run(ops, feed_dict) 98 | return loss 99 | 100 | def copy_model_parameters(sess, estimator1, estimator2): 101 | e1_params = [t for t in tf.trainable_variables() if t.name.startswith(estimator1.scope)] 102 | e1_params = sorted(e1_params, key=lambda v: v.name) 103 | e2_params = [t for t in tf.trainable_variables() if t.name.startswith(estimator2.scope)] 104 | e2_params = sorted(e2_params, key=lambda v: v.name) 105 | 106 | update_ops = [] 107 | for e1_v, e2_v in zip(e1_params, e2_params): 108 | op = e2_v.assign(e1_v) 109 | update_ops.append(op) 110 | 111 | sess.run(update_ops) 112 | 113 | tf.reset_default_graph() 114 | 115 | 116 | # DQN 117 | dqn = DQN(scope="dqn") 118 | # DQN target 119 | target_dqn = DQN(scope="target_dqn") 120 | 121 | 122 | # State processor 123 | state_processor = StateProcessor() 124 | 125 | num_episodes = 10000 126 | 127 | replay_memory_size = 250000 128 | replay_memory_init_size = 50000 129 | 130 | update_target_estimator_every = 10000 131 | 132 | epsilon_start = 1.0 133 | epsilon_end = 0.1 134 | 135 | 136 | epsilon_decay_steps = 500000 137 | discount_factor = 0.99 138 | batch_size = 32 139 | 140 | def make_epsilon_greedy_policy(estimator, nA): 141 | def policy_fn(sess, observation, epsilon): 142 | A = np.ones(nA, dtype=float) * epsilon / nA 143 | q_values = estimator.predict(sess, np.expand_dims(observation, 0))[0] 144 | best_action = np.argmax(q_values) 145 | A[best_action] += (1.0 - epsilon) 146 | return A 147 | return policy_fn 148 | 149 | #saver = tf.train.Saver() 150 | start_i_episode = 0 151 | opti_step = -1 152 | 153 | # The replay memory 154 | replay_memory = [] 155 | 156 | 157 | 158 | with tf.Session() as sess: 159 | sess.run(tf.global_variables_initializer()) 160 | 161 | Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"]) 162 | 163 | 164 | # Used to save the model 165 | checkpoint_dir = os.path.join("./", "checkpoints") 166 | checkpoint_path = os.path.join(checkpoint_dir, "model") 167 | 168 | if not os.path.exists(checkpoint_dir): 169 | os.makedirs(checkpoint_dir) 170 | 171 | saver = tf.train.Saver() 172 | # Load a previous checkpoint if we find one 173 | latest_checkpoint = tf.train.latest_checkpoint(checkpoint_dir) 174 | 175 | # Epsilon decay 176 | epsilons = np.linspace(epsilon_start, epsilon_end, epsilon_decay_steps) 177 | 178 | # Policy 179 | policy = make_epsilon_greedy_policy(dqn, len(VALID_ACTIONS)) 180 | 181 | epi_reward = [] 182 | best_epi_reward = 0 183 | 184 | for i_episode in range(start_i_episode, num_episodes): 185 | # Reset the environment 186 | state = env.reset() 187 | state = state_processor.process(sess, state) 188 | state = np.stack([state] * 4, axis=2) 189 | loss = None 190 | done = False 191 | r_sum = 0 192 | mean_epi_reward = np.mean(epi_reward) 193 | if best_epi_reward < mean_epi_reward: 194 | best_epi_reward = mean_epi_reward 195 | saver.save(tf.get_default_session(), checkpoint_path) 196 | 197 | len_replay_memory = len(replay_memory) 198 | while not done: 199 | # Get the epsilon for this step 200 | epsilon = epsilons[min(opti_step+1, epsilon_decay_steps-1)] 201 | 202 | 203 | # Update the target network 204 | if opti_step % update_target_estimator_every == 0: 205 | copy_model_parameters(sess, dqn, target_dqn) 206 | 207 | print("\r Epsilon ({}) ReplayMemorySize : ({}) rSum: ({}) best_epi_reward: ({}) OptiStep ({}) @ Episode {}/{}, loss: {}".format(epsilon, len_replay_memory, mean_epi_reward, 
best_epi_reward, opti_step, i_episode + 1, num_episodes, loss), end="") 208 | sys.stdout.flush() 209 | 210 | 211 | # Select an action with eps-greedy 212 | action_probs = policy(sess, state, epsilon) 213 | action = np.random.choice(np.arange(len(action_probs)), p=action_probs) 214 | 215 | # Step in the env with this action 216 | next_state, reward, done, _ = env.step(VALID_ACTIONS[action]) 217 | r_sum += reward 218 | 219 | # Add this action to the stack of images 220 | next_state = state_processor.process(sess, next_state) 221 | next_state = np.append(state[:,:,1:], np.expand_dims(next_state, 2), axis=2) 222 | 223 | # If our replay memory is full, pop the first element 224 | if len(replay_memory) == replay_memory_size: 225 | replay_memory.pop(0) 226 | 227 | 228 | # Save transition to replay memory 229 | replay_memory.append(Transition(state, action, reward, next_state, done)) 230 | 231 | if len_replay_memory > replay_memory_init_size: 232 | # Sample a minibatch from the replay memory 233 | samples = random.sample(replay_memory, batch_size) 234 | states_batch, action_batch, reward_batch, next_states_batch, done_batch = map(np.array, zip(*samples)) 235 | 236 | # We compute the next q value with 237 | q_values_next_target = target_dqn.predict(sess, next_states_batch) 238 | t_best_actions = np.argmax(q_values_next_target, axis=1) 239 | targets_batch = reward_batch + np.invert(done_batch).astype(np.float32) * discount_factor * q_values_next_target[np.arange(batch_size), t_best_actions] 240 | 241 | # Perform gradient descent update 242 | states_batch = np.array(states_batch) 243 | loss = dqn.update(sess, states_batch, action_batch, targets_batch) 244 | 245 | opti_step += 1 246 | 247 | state = next_state 248 | if done: 249 | break 250 | 251 | epi_reward.append(r_sum) 252 | if len(epi_reward) > 100: 253 | epi_reward = epi_reward[1:] 254 | -------------------------------------------------------------------------------- /tensorflow/Eager Execution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import tensorflow as tf\n", 10 | "tf.enable_eager_execution()" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 2, 16 | "metadata": {}, 17 | "outputs": [ 18 | { 19 | "name": "stdout", 20 | "output_type": "stream", 21 | "text": [ 22 | "hello, [[4.]]\n" 23 | ] 24 | } 25 | ], 26 | "source": [ 27 | "tf.executing_eagerly() # => True\n", 28 | "\n", 29 | "x = [[2.]]\n", 30 | "m = tf.matmul(x, x)\n", 31 | "print(\"hello, {}\".format(m)) # => \"hello, [[4.]]\"" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "tf.Tensor(\n", 44 | "[[1 2]\n", 45 | " [3 4]], shape=(2, 2), dtype=int32)\n", 46 | "tf.Tensor(\n", 47 | "[[2 3]\n", 48 | " [4 5]], shape=(2, 2), dtype=int32)\n", 49 | "tf.Tensor(\n", 50 | "[[ 2 6]\n", 51 | " [12 20]], shape=(2, 2), dtype=int32)\n", 52 | "[[ 2 6]\n", 53 | " [12 20]]\n", 54 | "[[1 2]\n", 55 | " [3 4]]\n" 56 | ] 57 | } 58 | ], 59 | "source": [ 60 | "a = tf.constant([[1, 2],\n", 61 | " [3, 4]])\n", 62 | "print(a)\n", 63 | "# => tf.Tensor([[1 2]\n", 64 | "# [3 4]], shape=(2, 2), dtype=int32)\n", 65 | "\n", 66 | "# Broadcasting support\n", 67 | "b = tf.add(a, 1)\n", 68 | "print(b)\n", 69 | "# => tf.Tensor([[2 3]\n", 70 | "# [4 5]], shape=(2, 2), dtype=int32)\n", 
71 | "\n", 72 | "# Operator overloading is supported\n", 73 | "print(a * b)\n", 74 | "# => tf.Tensor([[ 2 6]\n", 75 | "# [12 20]], shape=(2, 2), dtype=int32)\n", 76 | "\n", 77 | "# Use NumPy values\n", 78 | "import numpy as np\n", 79 | "\n", 80 | "c = np.multiply(a, b)\n", 81 | "print(c)\n", 82 | "# => [[ 2 6]\n", 83 | "# [12 20]]\n", 84 | "\n", 85 | "# Obtain numpy value from a tensor:\n", 86 | "print(a.numpy())\n", 87 | "# => [[1 2]\n", 88 | "# [3 4]]" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Eager training" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 10, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "tf.Tensor([[2.]], shape=(1, 1), dtype=float32)\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "tfe = tf.contrib.eager\n", 113 | "w = tfe.Variable([[1.0]])\n", 114 | "\n", 115 | "with tf.GradientTape() as tape:\n", 116 | " loss = w * w\n", 117 | "\n", 118 | "grad = tape.gradient(loss, w)\n", 119 | "print(grad) # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "Here's an example of tf.GradientTape that records forward-pass operations to train a simple model:" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 12, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "name": "stdout", 136 | "output_type": "stream", 137 | "text": [ 138 | "Initial loss: 69.297\n", 139 | "Loss at step 000: 66.579\n", 140 | "Loss at step 020: 30.117\n", 141 | "Loss at step 040: 13.941\n", 142 | "Loss at step 060: 6.764\n", 143 | "Loss at step 080: 3.579\n", 144 | "Loss at step 100: 2.167\n", 145 | "Loss at step 120: 1.540\n", 146 | "Loss at step 140: 1.261\n", 147 | "Loss at step 160: 1.138\n", 148 | "Loss at step 180: 1.083\n", 149 | "Final loss: 1.060\n", 150 | "W = 3.0000877380371094, B = 2.153679609298706\n" 151 | ] 152 | } 153 | ], 154 | "source": [ 155 | "# A toy dataset of points around 3 * x + 2\n", 156 | "NUM_EXAMPLES = 1000\n", 157 | "training_inputs = tf.random_normal([NUM_EXAMPLES])\n", 158 | "noise = tf.random_normal([NUM_EXAMPLES])\n", 159 | "training_outputs = training_inputs * 3 + 2 + noise\n", 160 | "\n", 161 | "def prediction(input, weight, bias):\n", 162 | " return input * weight + bias\n", 163 | "\n", 164 | "# A loss function using mean-squared error\n", 165 | "def loss(weights, biases):\n", 166 | " error = prediction(training_inputs, weights, biases) - training_outputs\n", 167 | " return tf.reduce_mean(tf.square(error))\n", 168 | "\n", 169 | "# Return the derivative of loss with respect to weight and bias\n", 170 | "def grad(weights, biases):\n", 171 | " with tf.GradientTape() as tape:\n", 172 | " loss_value = loss(weights, biases)\n", 173 | " return tape.gradient(loss_value, [weights, biases])\n", 174 | "\n", 175 | "train_steps = 200\n", 176 | "learning_rate = 0.01\n", 177 | "# Start with arbitrary values for W and B on the same batch of data\n", 178 | "W = tfe.Variable(5.)\n", 179 | "B = tfe.Variable(10.)\n", 180 | "\n", 181 | "print(\"Initial loss: {:.3f}\".format(loss(W, B)))\n", 182 | "\n", 183 | "for i in range(train_steps):\n", 184 | " dW, dB = grad(W, B)\n", 185 | " W.assign_sub(dW * learning_rate)\n", 186 | " B.assign_sub(dB * learning_rate)\n", 187 | " if i % 20 == 0:\n", 188 | " print(\"Loss at step {:03d}: {:.3f}\".format(i, loss(W, B)))\n", 189 | "\n", 190 | "print(\"Final loss: 
{:.3f}\".format(loss(W, B)))\n", 191 | "print(\"W = {}, B = {}\".format(W.numpy(), B.numpy()))" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "### Variables and optimizers" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 20, 204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "name": "stdout", 208 | "output_type": "stream", 209 | "text": [ 210 | "Initial loss: 68.871\n", 211 | "Loss at step 000: 66.188\n", 212 | "Loss at step 020: 30.088\n", 213 | "Loss at step 040: 13.975\n", 214 | "Loss at step 060: 6.783\n", 215 | "Loss at step 080: 3.573\n", 216 | "Loss at step 100: 2.140\n", 217 | "Loss at step 120: 1.500\n", 218 | "Loss at step 140: 1.215\n", 219 | "Loss at step 160: 1.087\n", 220 | "Loss at step 180: 1.030\n", 221 | "Loss at step 200: 1.005\n", 222 | "Loss at step 220: 0.993\n", 223 | "Loss at step 240: 0.988\n", 224 | "Loss at step 260: 0.986\n", 225 | "Loss at step 280: 0.985\n", 226 | "Final loss: 0.985\n", 227 | "W = 3.0072834491729736, B = 2.017597198486328\n" 228 | ] 229 | } 230 | ], 231 | "source": [ 232 | "class Model(tf.keras.Model):\n", 233 | " def __init__(self):\n", 234 | " super(Model, self).__init__()\n", 235 | " self.W = tfe.Variable(5., name='weight')\n", 236 | " self.B = tfe.Variable(10., name='bias')\n", 237 | " def call(self, inputs):\n", 238 | " return inputs * self.W + self.B\n", 239 | "\n", 240 | "# A toy dataset of points around 3 * x + 2\n", 241 | "NUM_EXAMPLES = 2000\n", 242 | "training_inputs = tf.random_normal([NUM_EXAMPLES])\n", 243 | "noise = tf.random_normal([NUM_EXAMPLES])\n", 244 | "training_outputs = training_inputs * 3 + 2 + noise\n", 245 | "\n", 246 | "# The loss function to be optimized\n", 247 | "def loss(model, inputs, targets):\n", 248 | " error = model(inputs) - targets\n", 249 | " return tf.reduce_mean(tf.square(error))\n", 250 | "\n", 251 | "def grad(model, inputs, targets):\n", 252 | " with tf.GradientTape() as tape:\n", 253 | " loss_value = loss(model, inputs, targets)\n", 254 | " return tape.gradient(loss_value, [model.W, model.B])\n", 255 | "\n", 256 | "# Define:\n", 257 | "# 1. A model.\n", 258 | "# 2. Derivatives of a loss function with respect to model parameters.\n", 259 | "# 3. 
A strategy for updating the variables based on the derivatives.\n", 260 | "model = Model()\n", 261 | "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n", 262 | "\n", 263 | "print(\"Initial loss: {:.3f}\".format(loss(model, training_inputs, training_outputs)))\n", 264 | "\n", 265 | "# Training loop\n", 266 | "for i in range(300):\n", 267 | " grads = grad(model, training_inputs, training_outputs)\n", 268 | " optimizer.apply_gradients(zip(grads, [model.W, model.B]),\n", 269 | " global_step=tf.train.get_or_create_global_step())\n", 270 | " #optimizer.minimize(lambda: loss(model, training_inputs, training_outputs))\n", 271 | " if i % 20 == 0:\n", 272 | " print(\"Loss at step {:03d}: {:.3f}\".format(i, loss(model, training_inputs, training_outputs)))\n", 273 | "\n", 274 | "print(\"Final loss: {:.3f}\".format(loss(model, training_inputs, training_outputs)))\n", 275 | "print(\"W = {}, B = {}\".format(model.W.numpy(), model.B.numpy()))" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [] 284 | } 285 | ], 286 | "metadata": { 287 | "kernelspec": { 288 | "display_name": "Python 3", 289 | "language": "python", 290 | "name": "python3" 291 | }, 292 | "language_info": { 293 | "codemirror_mode": { 294 | "name": "ipython", 295 | "version": 3 296 | }, 297 | "file_extension": ".py", 298 | "mimetype": "text/x-python", 299 | "name": "python", 300 | "nbconvert_exporter": "python", 301 | "pygments_lexer": "ipython3", 302 | "version": "3.5.2" 303 | } 304 | }, 305 | "nbformat": 4, 306 | "nbformat_minor": 2 307 | } 308 | -------------------------------------------------------------------------------- /rl/q_learning_visu.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 98 | 99 | 100 | 101 |
[q_learning_visu.html: Q-learning visualization page whose HTML markup and inline script were stripped during extraction. The recoverable content is a grid of nine state panels (1-9), each listing four action values (←, →, ↑, ↓) initialized to 0.]
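Since only the rendered values of that page survive, here is a minimal, self-contained sketch of the tabular Q-learning update that a visualization like this typically animates. The 3x3 layout, the goal position, and the hyperparameters below are illustrative assumptions; they are not taken from the stripped page or from the repository's q_learning.py.

```python
# Minimal tabular Q-learning sketch on a 3x3 grid (9 states, 4 actions), matching the
# layout the stripped page displays. Goal cell and hyperparameters are assumptions.
import random

ACTIONS = ["left", "right", "up", "down"]
GOAL = 8                                   # bottom-right cell, assumed terminal (+1 reward)
ALPHA, GAMMA = 0.1, 0.9
eps = 1.0                                  # exploration rate, decayed per episode

# Q-table initialized to 0, as shown on the page
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(9)}

def step(s, a):
    """Move on the 3x3 grid; bumping into a wall leaves the position unchanged."""
    y, x = divmod(s, 3)
    if a == "left":  x = max(x - 1, 0)
    if a == "right": x = min(x + 1, 2)
    if a == "up":    y = max(y - 1, 0)
    if a == "down":  y = min(y + 1, 2)
    s2 = y * 3 + x
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for episode in range(500):
    s = 0
    for _ in range(100):                   # step cap keeps early random episodes short
        # Epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps else max(Q[s], key=Q[s].get)
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else GAMMA * max(Q[s2].values()))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break
    eps = max(0.05, eps * 0.99)

print(Q[0])                                # learned action values for the start state
```

Running the script prints the learned action values of the start state; on the page above, each of the nine panels would display the corresponding row of this Q-table.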
344 | 345 | 438 | 439 | 440 | -------------------------------------------------------------------------------- /rl/policy_gradient.py: -------------------------------------------------------------------------------- 1 | from random import randint 2 | import tensorflow as tf 3 | import numpy as np 4 | import scipy.signal 5 | 6 | """ 7 | Exemple of the Policy Gradient Algorithm 8 | """ 9 | 10 | class Buffer: 11 | 12 | def __init__(self, obs_dim, act_dim, size, gamma=0.99, lam=0.95): 13 | self.obs_buf = np.zeros(Buffer.combined_shape(size, obs_dim), dtype=np.float32) 14 | # Actions buffer 15 | self.act_buf = np.zeros(size, dtype=np.float32) 16 | # Advantages buffer 17 | self.adv_buf = np.zeros(size, dtype=np.float32) 18 | # Rewards buffer 19 | self.rew_buf = np.zeros(size, dtype=np.float32) 20 | # Log probability of action a with the policy 21 | self.logp_buf = np.zeros(size, dtype=np.float32) 22 | # Gamma and lam to compute the advantage 23 | self.gamma, self.lam = gamma, lam 24 | # ptr: Position to insert the next tuple 25 | # path_start_idx Posittion of the current trajectory 26 | # max_size Max size of the buffer 27 | self.ptr, self.path_start_idx, self.max_size = 0, 0, size 28 | 29 | @staticmethod 30 | def discount_cumsum(x, discount): 31 | """ 32 | x = [x0, x1, x2] 33 | output: [x0 + discount * x1 + discount^2 * x2, x1 + discount * x2, x2] 34 | """ 35 | return scipy.signal.lfilter([1], [1, float(-discount)], x[::-1], axis=0)[::-1] 36 | 37 | @staticmethod 38 | def combined_shape(length, shape=None): 39 | if shape is None: 40 | return (length,) 41 | return (length, shape) if np.isscalar(shape) else (length, *shape) 42 | 43 | def store(self, obs, act, rew, logp): 44 | """ 45 | Append one timestep of agent-environment interaction to the buffer. 46 | """ 47 | assert self.ptr < self.max_size 48 | self.obs_buf[self.ptr] = obs 49 | self.act_buf[self.ptr] = act 50 | self.rew_buf[self.ptr] = rew 51 | self.logp_buf[self.ptr] = logp 52 | self.ptr += 1 53 | 54 | def finish_path(self, last_val=0): 55 | # Select the path 56 | path_slice = slice(self.path_start_idx, self.ptr) 57 | # Append the last_val to the trajectory 58 | rews = np.append(self.rew_buf[path_slice], last_val) 59 | # Advantage 60 | self.adv_buf[path_slice] = Buffer.discount_cumsum(rews, self.gamma)[:-1] 61 | self.path_start_idx = self.ptr 62 | 63 | def get(self): 64 | assert self.ptr == self.max_size # buffer has to be full before you can get 65 | self.ptr, self.path_start_idx = 0, 0 66 | # the next two lines implement the advantage normalization trick 67 | # Normalize the Advantage 68 | self.adv_buf = (self.adv_buf - np.mean(self.adv_buf)) / np.std(self.adv_buf) 69 | return self.obs_buf, self.act_buf, self.adv_buf, self.logp_buf 70 | 71 | 72 | class PolicyGradient(object): 73 | """ 74 | Implementation of Policy gradient algorithm 75 | """ 76 | def __init__(self, input_space, action_space, pi_lr, buffer_size, seed): 77 | super(PolicyGradient, self).__init__() 78 | 79 | # Stored the spaces 80 | self.input_space = input_space 81 | self.action_space = action_space 82 | self.seed = seed 83 | # NET Buffer defined above 84 | self.buffer = Buffer( 85 | obs_dim=input_space, 86 | act_dim=action_space, 87 | size=buffer_size 88 | ) 89 | # Learning rate of the policy network 90 | self.pi_lr = pi_lr 91 | # The tensorflow session (set later) 92 | self.sess = None 93 | # Apply a random seed on tensorflow and numpy 94 | tf.set_random_seed(42) 95 | np.random.seed(42) 96 | 97 | def compile(self): 98 | """ 99 | Compile the model 100 | """ 101 | # tf_map: 
Input: Input state 102 | # tf_adv: Input: Advantage 103 | self.tf_map, self.tf_a, self.tf_adv = PolicyGradient.inputs( 104 | map_space=self.input_space, 105 | action_space=self.action_space 106 | ) 107 | # mu_op: Used to get the exploited prediction of the model 108 | # pi_op: Used to get the prediction of the model 109 | # logp_a_op: Used to get the log likelihood of taking action a with the current policy 110 | # logp_pi_op: Used to get the log likelihood of the predicted action @pi_op 111 | # log_std: Used to get the currently used log_std 112 | self.mu_op, self.pi_op, self.logp_a_op, self.logp_pi_op = PolicyGradient.mlp( 113 | tf_map=self.tf_map, 114 | tf_a=self.tf_a, 115 | action_space=self.action_space, 116 | seed=self.seed 117 | ) 118 | # Error 119 | self.pi_loss = PolicyGradient.net_objectives( 120 | tf_adv=self.tf_adv, 121 | logp_a_op=self.logp_a_op 122 | ) 123 | # Optimization 124 | self.train_pi = tf.train.AdamOptimizer(learning_rate=self.pi_lr).minimize(self.pi_loss) 125 | # Entropy 126 | self.approx_ent = tf.reduce_mean(-self.logp_a_op) 127 | 128 | 129 | def set_sess(self, sess): 130 | # Set the tensorflow used to run this model 131 | self.sess = sess 132 | 133 | def step(self, states): 134 | # Take actions given the states 135 | # Return mu (policy without exploration), pi (policy with the current exploration) and 136 | # the log probability of the action chossen by pi 137 | mu, pi, logp_pi = self.sess.run([self.mu_op, self.pi_op, self.logp_pi_op], feed_dict={ 138 | self.tf_map: states 139 | }) 140 | return mu, pi, logp_pi 141 | 142 | def store(self, obs, act, rew, logp): 143 | # Store the observation, action, reward and the log probability of the action 144 | # into the buffer 145 | self.buffer.store(obs, act, rew, logp) 146 | 147 | def finish_path(self, last_val=0): 148 | self.buffer.finish_path(last_val=last_val) 149 | 150 | def train(self, additional_infos={}): 151 | # Get buffer 152 | obs_buf, act_buf, adv_buf, logp_last_buf = self.buffer.get() 153 | # Train the model 154 | pi_loss_list = [] 155 | entropy_list = [] 156 | 157 | for step in range(5): 158 | _, entropy, pi_loss = self.sess.run([self.train_pi, self.approx_ent, self.pi_loss], feed_dict= { 159 | self.tf_map: obs_buf, 160 | self.tf_a:act_buf, 161 | self.tf_adv: adv_buf 162 | }) 163 | 164 | pi_loss_list.append(pi_loss) 165 | entropy_list.append(entropy) 166 | 167 | print("Entropy : %s, Loss: %s" % (np.mean(entropy_list), np.mean(pi_loss_list)), end="\r") 168 | 169 | 170 | @staticmethod 171 | def gaussian_likelihood(x, mu, log_std): 172 | # Compute the gaussian likelihood of x with a normal gaussian distribution of mean @mu 173 | # and a std @log_std 174 | pre_sum = -0.5 * (((x-mu)/(tf.exp(log_std)+1e-8))**2 + 2*log_std + np.log(2*np.pi)) 175 | return tf.reduce_sum(pre_sum, axis=1) 176 | 177 | @staticmethod 178 | def inputs(map_space, action_space): 179 | """ 180 | @map_space Tuple of the space. Ex (size,) 181 | @action_space Tuple describing the action space. 
Ex (size,) 182 | """ 183 | # Map of the game 184 | tf_map = tf.placeholder(tf.float32, shape=(None, *map_space), name="tf_map") 185 | # Possible actions (Should be two: x,y for the beacon game) 186 | tf_a = tf.placeholder(tf.int32, shape=(None,), name="tf_a") 187 | # Advantage 188 | tf_adv = tf.placeholder(tf.float32, shape=(None,), name="tf_adv") 189 | return tf_map, tf_a, tf_adv 190 | 191 | @staticmethod 192 | def mlp(tf_map, tf_a, action_space, seed=None): 193 | if seed is not None: 194 | tf.random.set_random_seed(seed) 195 | 196 | # Expand the dimension of the input 197 | tf_map_expand = tf.expand_dims(tf_map, axis=3) 198 | 199 | flatten = tf.layers.flatten(tf_map_expand) 200 | hidden = tf.layers.dense(flatten, units=256, activation=tf.nn.relu) 201 | spacial_action_logits = tf.layers.dense(hidden, units=action_space, activation=None) 202 | 203 | # Add take the log of the softmax 204 | logp_all = tf.nn.log_softmax(spacial_action_logits) 205 | # Take random actions according to the logits (Exploration) 206 | pi_op = tf.squeeze(tf.multinomial(spacial_action_logits,1), axis=1) 207 | mu = tf.argmax(spacial_action_logits, axis=1) 208 | 209 | # Gives log probability, according to the policy, of taking actions @a in states @x 210 | logp_a_op = tf.reduce_sum(tf.one_hot(tf_a, depth=action_space) * logp_all, axis=1) 211 | # Gives log probability, according to the policy, of the action sampled by pi. 212 | logp_pi_op = tf.reduce_sum(tf.one_hot(pi_op, depth=action_space) * logp_all, axis=1) 213 | 214 | return mu, pi_op, logp_a_op, logp_pi_op 215 | 216 | @staticmethod 217 | def net_objectives(logp_a_op, tf_adv, clip_ratio=0.2): 218 | """ 219 | @v_op: Predicted value function 220 | @tf_tv: Expected advantage 221 | @logp_a_op: Log likelihood of taking action under the current policy 222 | @tf_logp_old_pi: Log likelihood of the last policy 223 | @tf_adv: Advantage input 224 | """ 225 | pi_loss = -tf.reduce_mean(logp_a_op*tf_adv) 226 | return pi_loss 227 | 228 | class GridWorld(object): 229 | """ 230 | docstring for GridWorld. 
231 | """ 232 | def __init__(self): 233 | super(GridWorld, self).__init__() 234 | 235 | self.rewards = [ 236 | [0, 0, 0, 0, -1, 0, 0], 237 | [0, -1, -1, 0, -1, 0, 0], 238 | [0, -1, -1, 1, -1, 0, 0], 239 | [0, -1, -1, 0, -1, 0, 0], 240 | [0, 0, 0, 0, 0, 0, 0], 241 | [0, 0, 0, 0, 0, 0, 0], 242 | [0, 0, 0, 0, 0, 0, 0], 243 | ] 244 | self.position = [6, 6] # y, x 245 | 246 | def gen_state(self): 247 | # Generate a state given the current position of the agent 248 | state = np.zeros((7, 7)) 249 | state[self.position[0]][self.position[1]] = 1 250 | return state 251 | 252 | def step(self, action): 253 | if action == 0: # Top 254 | self.position = [(self.position[0] - 1) % 7, self.position[1]] 255 | elif action == 1: # Left 256 | self.position = [self.position[0], (self.position[1] - 1) % 7] 257 | elif action == 2: # Right 258 | self.position = [self.position[0], (self.position[1] + 1) % 7] 259 | elif action == 3: # Down 260 | self.position = [(self.position[0] + 1) % 7, self.position[1]] 261 | 262 | reward = self.rewards[self.position[0]][self.position[1]] 263 | done = False if reward == 0 else True 264 | state = self.gen_state() 265 | if done: # The agent is dead, reset the game 266 | self.position = [6, 6] 267 | return state, reward, done 268 | 269 | def display(self): 270 | y = 0 271 | print("="*14) 272 | for line in self.rewards: 273 | x = 0 274 | for case in line: 275 | if case == -1: 276 | c = "0" 277 | elif (y == self.position[0] and x == self.position[1]): 278 | c = "A" 279 | elif case == 1: 280 | c = "T" 281 | else: 282 | c = "-" 283 | print(c, end=" ") 284 | x += 1 285 | y += 1 286 | print() 287 | 288 | def main(): 289 | grid = GridWorld() 290 | buffer_size = 1000 291 | 292 | # Create the NET class 293 | agent = PolicyGradient( 294 | input_space=(7, 7), 295 | action_space=4, 296 | pi_lr=0.001, 297 | buffer_size=buffer_size, 298 | seed=42 299 | ) 300 | agent.compile() 301 | # Init Session 302 | sess = tf.Session() 303 | # Init variables 304 | sess.run(tf.global_variables_initializer()) 305 | # Set the session 306 | agent.set_sess(sess) 307 | 308 | rewards = [] 309 | 310 | b = 0 311 | 312 | for epoch in range(10000): 313 | 314 | done = False 315 | state = grid.gen_state() 316 | 317 | while not done: 318 | _, pi, logpi = agent.step([state]) 319 | n_state, reward, done = grid.step(pi[0]) 320 | agent.store(state, pi[0], reward, logpi) 321 | b += 1 322 | 323 | state = n_state 324 | 325 | if done: 326 | agent.finish_path(reward) 327 | rewards.append(reward) 328 | if len(rewards) > 1000: 329 | rewards.pop(0) 330 | if b == buffer_size: 331 | if not done: 332 | agent.finish_path(0) 333 | agent.train() 334 | b = 0 335 | 336 | if epoch % 1000 == 0: 337 | print("Rewards mean:%s" % np.mean(rewards)) 338 | 339 | for epoch in range(10): 340 | import time 341 | print("=========================TEST=================================") 342 | done = False 343 | state = grid.gen_state() 344 | 345 | while not done: 346 | time.sleep(1) 347 | _, pi, logpi = agent.step([state]) 348 | n_state, _, done = grid.step(pi[0]) 349 | grid.display() 350 | state = n_state 351 | print("reward=>", reward) 352 | 353 | if __name__ == '__main__': 354 | main() 355 | -------------------------------------------------------------------------------- /rl/policy_gradient_continuous_actions.py: -------------------------------------------------------------------------------- 1 | from random import randint 2 | import tensorflow as tf 3 | import numpy as np 4 | import scipy.signal 5 | 6 | """ 7 | Exemple of the Policy Gradient Algorithm 8 
| """ 9 | 10 | class Buffer: 11 | 12 | def __init__(self, obs_dim, act_dim, size, gamma=0.99, lam=0.95): 13 | self.obs_buf = np.zeros(Buffer.combined_shape(size, obs_dim), dtype=np.float32) 14 | # Actions buffer 15 | self.act_buf = np.zeros((size, act_dim), dtype=np.float32) 16 | # Advantages buffer 17 | self.adv_buf = np.zeros(size, dtype=np.float32) 18 | # Rewards buffer 19 | self.rew_buf = np.zeros(size, dtype=np.float32) 20 | # Log probability of action a with the policy 21 | self.logp_buf = np.zeros(size, dtype=np.float32) 22 | # Gamma and lam to compute the advantage 23 | self.gamma, self.lam = gamma, lam 24 | # ptr: Position to insert the next tuple 25 | # path_start_idx Posittion of the current trajectory 26 | # max_size Max size of the buffer 27 | self.ptr, self.path_start_idx, self.max_size = 0, 0, size 28 | 29 | @staticmethod 30 | def discount_cumsum(x, discount): 31 | """ 32 | x = [x0, x1, x2] 33 | output: [x0 + discount * x1 + discount^2 * x2, x1 + discount * x2, x2] 34 | """ 35 | return scipy.signal.lfilter([1], [1, float(-discount)], x[::-1], axis=0)[::-1] 36 | 37 | @staticmethod 38 | def combined_shape(length, shape=None): 39 | if shape is None: 40 | return (length,) 41 | return (length, shape) if np.isscalar(shape) else (length, *shape) 42 | 43 | def store(self, obs, act, rew, logp): 44 | """ 45 | Append one timestep of agent-environment interaction to the buffer. 46 | """ 47 | assert self.ptr < self.max_size 48 | self.obs_buf[self.ptr] = obs 49 | self.act_buf[self.ptr] = act 50 | self.rew_buf[self.ptr] = rew 51 | self.logp_buf[self.ptr] = logp 52 | self.ptr += 1 53 | 54 | def finish_path(self, last_val=0): 55 | # Select the path 56 | path_slice = slice(self.path_start_idx, self.ptr) 57 | # Append the last_val to the trajectory 58 | rews = np.append(self.rew_buf[path_slice], last_val) 59 | # Advantage 60 | self.adv_buf[path_slice] = Buffer.discount_cumsum(rews, self.gamma)[:-1] 61 | self.path_start_idx = self.ptr 62 | 63 | def get(self): 64 | assert self.ptr == self.max_size # buffer has to be full before you can get 65 | self.ptr, self.path_start_idx = 0, 0 66 | # the next two lines implement the advantage normalization trick 67 | # Normalize the Advantage 68 | if np.std(self.adv_buf) != 0: 69 | self.adv_buf = (self.adv_buf - np.mean(self.adv_buf)) / np.std(self.adv_buf) 70 | return self.obs_buf, self.act_buf, self.adv_buf, self.logp_buf 71 | 72 | 73 | class PolicyGradient(object): 74 | """ 75 | Implementation of Policy gradient algorithm 76 | """ 77 | def __init__(self, input_space, action_space, pi_lr, buffer_size, seed): 78 | super(PolicyGradient, self).__init__() 79 | 80 | # Stored the spaces 81 | self.input_space = input_space 82 | self.action_space = action_space 83 | self.seed = seed 84 | # NET Buffer defined above 85 | self.buffer = Buffer( 86 | obs_dim=input_space, 87 | act_dim=action_space, 88 | size=buffer_size 89 | ) 90 | # Learning rate of the policy network 91 | self.pi_lr = pi_lr 92 | # The tensorflow session (set later) 93 | self.sess = None 94 | # Apply a random seed on tensorflow and numpy 95 | tf.set_random_seed(42) 96 | np.random.seed(42) 97 | 98 | def compile(self): 99 | """ 100 | Compile the model 101 | """ 102 | # tf_map: Input: Input state 103 | # tf_adv: Input: Advantage 104 | self.tf_map, self.tf_a, self.tf_adv = PolicyGradient.inputs( 105 | map_space=self.input_space, 106 | action_space=self.action_space 107 | ) 108 | # mu_op: Used to get the exploited prediction of the model 109 | # pi_op: Used to get the prediction of the model 110 | # 
logp_a_op: Used to get the log likelihood of taking action a with the current policy 111 | # logp_pi_op: Used to get the log likelihood of the predicted action @pi_op 112 | # log_std: Used to get the currently used log_std 113 | self.mu_op, self.pi_op, self.logp_a_op, self.logp_pi_op, self.std = PolicyGradient.mlp( 114 | tf_map=self.tf_map, 115 | tf_a=self.tf_a, 116 | action_space=self.action_space, 117 | seed=self.seed 118 | ) 119 | # Error 120 | self.pi_loss = PolicyGradient.net_objectives( 121 | tf_adv=self.tf_adv, 122 | logp_a_op=self.logp_a_op 123 | ) 124 | # Optimization 125 | self.train_pi = tf.train.AdamOptimizer(learning_rate=self.pi_lr).minimize(self.pi_loss) 126 | # Entropy 127 | self.approx_ent = tf.reduce_mean(-self.logp_a_op) 128 | 129 | 130 | def set_sess(self, sess): 131 | # Set the tensorflow used to run this model 132 | self.sess = sess 133 | 134 | def step(self, states): 135 | # Take actions given the states 136 | # Return mu (policy without exploration), pi (policy with the current exploration) and 137 | # the log probability of the action chossen by pi 138 | mu, pi, logp_pi = self.sess.run([self.mu_op, self.pi_op, self.logp_pi_op], feed_dict={ 139 | self.tf_map: states 140 | }) 141 | return mu, pi, logp_pi 142 | 143 | def store(self, obs, act, rew, logp): 144 | # Store the observation, action, reward and the log probability of the action 145 | # into the buffer 146 | self.buffer.store(obs, act, rew, logp) 147 | 148 | def finish_path(self, last_val=0): 149 | self.buffer.finish_path(last_val=last_val) 150 | 151 | def train(self, additional_infos={}): 152 | # Get buffer 153 | obs_buf, act_buf, adv_buf, logp_last_buf = self.buffer.get() 154 | # Train the model 155 | pi_loss_list = [] 156 | entropy_list = [] 157 | 158 | import time 159 | t = time.time() 160 | 161 | for step in range(5): 162 | _, entropy, pi_loss, std = self.sess.run([self.train_pi, self.approx_ent, self.pi_loss, self.std], feed_dict= { 163 | self.tf_map: obs_buf, 164 | self.tf_a:act_buf, 165 | self.tf_adv: adv_buf 166 | }) 167 | 168 | pi_loss_list.append(pi_loss) 169 | entropy_list.append(entropy) 170 | 171 | print("Std: %.2f, Entropy : %.2f" % (std[0], np.mean(entropy_list)), end="\r") 172 | 173 | 174 | @staticmethod 175 | def gaussian_likelihood(x, mu, log_std): 176 | # Compute the gaussian likelihood of x with a normal gaussian distribution of mean @mu 177 | # and a std @log_std 178 | pre_sum = -0.5 * (((x-mu)/(tf.exp(log_std)+1e-8))**2 + 2*log_std + np.log(2*np.pi)) 179 | return tf.reduce_sum(pre_sum, axis=1) 180 | 181 | @staticmethod 182 | def inputs(map_space, action_space): 183 | """ 184 | @map_space Tuple of the space. Ex (size,) 185 | @action_space Tuple describing the action space. 
Ex (size,) 186 | """ 187 | # Map of the game 188 | tf_map = tf.placeholder(tf.float32, shape=(None, *map_space), name="tf_map") 189 | # Possible actions (Should be two: x,y for the beacon game) 190 | tf_a = tf.placeholder(tf.float32, shape=(None, action_space), name="tf_a") 191 | # Advantage 192 | tf_adv = tf.placeholder(tf.float32, shape=(None,), name="tf_adv") 193 | return tf_map, tf_a, tf_adv 194 | 195 | @staticmethod 196 | def gaussian_likelihood(x, mu, log_std): 197 | # Compute the gaussian likelihood of x with a normal gaussian distribution of mean @mu 198 | # and a std @log_std 199 | pre_sum = -0.5 * (((x-mu)/(tf.exp(log_std)+1e-8))**2 + 2*log_std + np.log(2*np.pi)) 200 | return tf.reduce_sum(pre_sum, axis=1) 201 | #pre_sum = (1.*tf.exp((-(x-mu)**2)/(2*std**2)))/tf.sqrt(2*3.14*std**2) 202 | #retun 203 | 204 | @staticmethod 205 | def mlp(tf_map, tf_a, action_space, seed=None): 206 | if seed is not None: 207 | tf.random.set_random_seed(seed) 208 | 209 | # Expand the dimension of the input 210 | tf_map_expand = tf.expand_dims(tf_map, axis=3) 211 | 212 | flatten = tf.layers.flatten(tf_map_expand) 213 | hidden = tf.layers.dense(flatten, units=256, activation=tf.nn.relu) 214 | mu = tf.layers.dense(hidden, units=action_space, activation=tf.nn.relu) 215 | 216 | log_std = tf.get_variable(name='log_std', initializer=-0.5*np.ones(action_space, dtype=np.float32)) 217 | std = tf.exp(log_std) 218 | pi_op = mu + tf.random_normal(tf.shape(mu)) * std 219 | 220 | # Gives log probability, according to the policy, of taking actions @a in states @x 221 | logp_a_op = PolicyGradient.gaussian_likelihood(tf_a, mu, log_std) 222 | # Gives log probability, according to the policy, of the action sampled by pi. 223 | logp_pi_op = PolicyGradient.gaussian_likelihood(pi_op, mu, log_std) 224 | 225 | return mu, pi_op, logp_a_op, logp_pi_op, std 226 | 227 | @staticmethod 228 | def net_objectives(logp_a_op, tf_adv, clip_ratio=0.2): 229 | """ 230 | @v_op: Predicted value function 231 | @tf_tv: Expected advantage 232 | @logp_a_op: Log likelihood of taking action under the current policy 233 | @tf_logp_old_pi: Log likelihood of the last policy 234 | @tf_adv: Advantage input 235 | """ 236 | pi_loss = -tf.reduce_mean(logp_a_op*tf_adv) 237 | return pi_loss 238 | 239 | class GridWorld(object): 240 | """ 241 | docstring for GridWorld. 
242 | """ 243 | def __init__(self): 244 | super(GridWorld, self).__init__() 245 | 246 | self.rewards = [ 247 | [0, 0, 0, 0, -1, 0, 0], 248 | [0, -1, -1, 0, -1, 0, 0], 249 | [0, -1, -1, 1, -1, 0, 0], 250 | [0, -1, -1, 0, -1, 0, 0], 251 | [0, 0, 0, 0, 0, 0, 0], 252 | [0, 0, 0, 0, 0, 0, 0], 253 | [0, 0, 0, 0, 0, 0, 0], 254 | ] 255 | self.position = [6, 6] # y, x 256 | 257 | def gen_state(self): 258 | # Generate a state given the current position of the agent 259 | state = np.zeros((7, 7)) 260 | state[self.position[0]][self.position[1]] = 1 261 | return state 262 | 263 | def step(self, y, x): 264 | # y and x coordinates 265 | 266 | # Move in the direction of the "click" 267 | self.position = [ 268 | self.position[0] + min(1, max(-1, (y - self.position[0]))), 269 | self.position[1] + min(1, max(-1, (x - self.position[1]))) 270 | ] 271 | reward = self.rewards[self.position[0]][self.position[1]] 272 | done = False if reward == 0 else True 273 | state = self.gen_state() 274 | if done: # The agent is dead, reset the game 275 | self.position = [6, 6] 276 | return state, reward, done 277 | 278 | def display(self): 279 | y = 0 280 | print("="*14) 281 | for line in self.rewards: 282 | x = 0 283 | for case in line: 284 | if case == -1: 285 | c = "0" 286 | elif (y == self.position[0] and x == self.position[1]): 287 | c = "A" 288 | elif case == 1: 289 | c = "T" 290 | else: 291 | c = "-" 292 | print(c, end=" ") 293 | x += 1 294 | y += 1 295 | print() 296 | 297 | def main(): 298 | grid = GridWorld() 299 | buffer_size = 1000 300 | 301 | # Create the NET class 302 | agent = PolicyGradient( 303 | input_space=(7, 7), 304 | action_space=2, 305 | pi_lr=0.001, 306 | buffer_size=buffer_size, 307 | seed=42 308 | ) 309 | agent.compile() 310 | # Init Session 311 | sess = tf.Session() 312 | # Init variables 313 | sess.run(tf.global_variables_initializer()) 314 | # Set the session 315 | agent.set_sess(sess) 316 | 317 | rewards = [] 318 | 319 | b = 0 320 | display = False 321 | 322 | for epoch in range(100000): 323 | 324 | done = False 325 | state = grid.gen_state() 326 | 327 | s = 0 328 | while not done and s < 20: 329 | s += 1 330 | _, pi, logpi = agent.step([state]) 331 | 332 | y = max(0, min(6, int(round((pi[0][0]+1.) / 2*6)))) 333 | x = max(0, min(6, int(round((pi[0][1]+1.) / 2*6)))) 334 | 335 | if display: 336 | import time 337 | time.sleep(0.1) 338 | grid.display() 339 | 340 | n_state, reward, done = grid.step(y, x) 341 | agent.store(state, pi[0], -0.1 if reward == 0 else reward, logpi) 342 | b += 1 343 | 344 | state = n_state 345 | 346 | if done: 347 | agent.finish_path(reward) 348 | rewards.append(reward) 349 | if len(rewards) > 1000: 350 | rewards.pop(0) 351 | if b == buffer_size: 352 | if not done: 353 | agent.finish_path(0) 354 | agent.train() 355 | b = 0 356 | 357 | if epoch % 1000 == 0: 358 | print("\nEpoch: %s Rewards mean:%s" % (epoch, np.mean(rewards))) 359 | 360 | for epoch in range(10): 361 | import time 362 | print("=========================TEST=================================") 363 | done = False 364 | state = grid.gen_state() 365 | 366 | while not done: 367 | time.sleep(1) 368 | mu, pi, logpi = agent.step([state]) 369 | 370 | y = max(0, min(6, int(round((pi[0][0]+1.) / 2*6)))) 371 | x = max(0, min(6, int(round((pi[0][1]+1.) 
/ 2*6)))) 372 | 373 | n_state, _, done = grid.step(y, x) 374 | grid.display() 375 | state = n_state 376 | print("reward=>", reward) 377 | 378 | if __name__ == '__main__': 379 | main() 380 | -------------------------------------------------------------------------------- /genetic/zombies/src/index.ts: -------------------------------------------------------------------------------- 1 | 2 | function mod(x: number, n: number): number { 3 | return (x % n + n) % n; 4 | } 5 | 6 | export interface Zombie { 7 | sprite: PIXI.Sprite; 8 | vector: number[]; 9 | follow: boolean; 10 | speed: number; 11 | perception: number; 12 | } 13 | 14 | export interface Human { 15 | sprite: PIXI.Sprite; 16 | vector: number[]; 17 | followdist: number; 18 | speed: number; 19 | perception: number; 20 | ammunition: number; 21 | shot_accuracy: number; 22 | genome: number[]; 23 | lifeduration: number; 24 | } 25 | 26 | export interface Bullet { 27 | sprite: PIXI.Sprite; 28 | } 29 | 30 | export interface WorldConf { 31 | [key:string]: any, 32 | nbzombie: number; 33 | nbhuman: number; 34 | } 35 | 36 | export interface Phenotype { 37 | speed: number, 38 | perception: number, 39 | shot_accuracy: number, 40 | ammunition: number 41 | } 42 | 43 | export interface Position { 44 | y: number; 45 | x: number; 46 | } 47 | 48 | class GeneticZombie { 49 | 50 | private canvasId: string; 51 | protected app: PIXI.Application = null; // The pixi app. 52 | protected sprite: PIXI.Sprite = null; 53 | protected zombies: Zombie[] = []; 54 | protected bullets: Bullet[] = []; 55 | protected humans: Human[] = []; 56 | protected humansToConvert: string[] = []; 57 | protected zombieToKill: string[] = []; 58 | protected sounds: Position[] = []; 59 | protected isplaying: boolean = true; 60 | protected population: Human[] = []; 61 | protected last_population: Human[] = []; 62 | protected stepnb: number = 0; 63 | protected callback: any = null; 64 | 65 | protected width: number = 1200; 66 | protected height: number = 800; 67 | 68 | protected config: WorldConf = { 69 | nbzombie: 50, 70 | nbhuman: 50, 71 | nbbullets: 10, 72 | }; 73 | 74 | constructor(canvasId: string, config: WorldConf) { 75 | if (!canvasId){ 76 | console.error("You must specify the canvasId"); 77 | } 78 | 79 | for (let key in this.config){ 80 | if (config && key in config){ 81 | this.config[key] = config[key]; 82 | } 83 | } 84 | 85 | this.canvasId = canvasId;//Create a Pixi Application 86 | this.app = new PIXI.Application({ 87 | width: this.width, 88 | height: this.height, 89 | backgroundColor: 0x002628 90 | }); 91 | //Add the canvas that Pixi automatically created for you to the HTML document 92 | document.getElementById(this.canvasId).appendChild(this.app.view); 93 | 94 | this.app.ticker.add(delta => { 95 | if (this.callback){ 96 | this.callback(this, delta); 97 | } 98 | }); 99 | } 100 | 101 | protected vectorrand(vector: number []){ 102 | let a = Math.random(); 103 | let b = 1 - a; 104 | vector[0] = vector[0] * a; 105 | vector[1] = vector[1] * b; 106 | return vector; 107 | } 108 | 109 | public run(){ 110 | this.isplaying = true; 111 | } 112 | 113 | public stop(){ 114 | this.isplaying = false; 115 | } 116 | 117 | /* 118 | Step in the environment 119 | */ 120 | public step(delta: number){ 121 | this.stepnb += 1; 122 | 123 | // Track the human that are going to be catch by zombies 124 | this.humansToConvert = []; 125 | // Step all the zombies 126 | let n_zombies: Zombie[] = []; 127 | for (let i in this.zombies) { 128 | this.stepZombie(this.zombies[i], i); 129 | if 
(this.zombieToKill.indexOf(i) == -1){ 130 | n_zombies.push(this.zombies[i]); 131 | } 132 | } 133 | this.zombies = n_zombies; 134 | if (this.zombies.length == 0){ 135 | let zombie = this.createZombie(); 136 | } 137 | // Track the zombie that are going to be kill 138 | this.zombieToKill = []; 139 | // Step all the humans 140 | this.sounds = []; 141 | let n_humans: Human[] = []; 142 | for (let i in this.humans) { 143 | this.stepHuman(this.humans[i], i); 144 | if (this.humansToConvert.indexOf(i) == -1){ 145 | this.humans[i].lifeduration = this.stepnb; 146 | n_humans.push(this.humans[i]) 147 | } 148 | else { 149 | let zombie = this.createZombie(); 150 | zombie.sprite.position.x = this.humans[i].sprite.position.x; 151 | zombie.sprite.position.y = this.humans[i].sprite.position.y; 152 | this.humans[i].sprite.visible = false; 153 | } 154 | } 155 | this.humans = n_humans; 156 | } 157 | 158 | /* 159 | Move a particular Zombie 160 | */ 161 | public stepZombie(z: Zombie, hid: string){ 162 | for (let s in this.sounds){ 163 | let sound = this.sounds[s]; 164 | let dist = Math.sqrt((z.sprite.position.x-sound.x)**2 + (z.sprite.position.y-sound.y)**2); 165 | if (dist < 200){ 166 | z.vector[0] = sound.x <= z.sprite.position.x ? -1 : 1; 167 | z.vector[1] = sound.y <= z.sprite.position.y ? -1 : 1; 168 | } 169 | } 170 | for (let i in this.humans) { 171 | let h = this.humans[i]; 172 | 173 | let dist = Math.sqrt((h.sprite.position.x-z.sprite.position.x)**2 + (h.sprite.position.y-z.sprite.position.y)**2); 174 | 175 | if (dist < 10){ 176 | if (this.humansToConvert.indexOf(i) == -1){ 177 | this.humansToConvert.push(i); 178 | this.app.stage.removeChild(h.sprite); 179 | } 180 | } 181 | // The speed of the human increase the perception of the zombie 182 | // because of the sound 183 | let factornoise = h.speed > 1.0 ? (h.speed-1.0)*50 : 0.0; 184 | if (dist < z.perception * factornoise){ 185 | z.vector[0] = h.sprite.position.x <= z.sprite.position.x ? -1 : 1; 186 | z.vector[1] = h.sprite.position.y <= z.sprite.position.y ? -1 : 1; 187 | //z.vector = this.vectorrand(z.vector); 188 | } 189 | } 190 | z.sprite.position.x = mod(z.sprite.position.x + (z.speed*z.vector[0]) , this.width); 191 | z.sprite.position.y = mod(z.sprite.position.y + (z.speed*z.vector[1]), this.height); 192 | } 193 | 194 | /* 195 | Move a particular human 196 | */ 197 | public stepHuman(h: Human, hid: string): void { 198 | 199 | h.followdist = 10000; 200 | 201 | let n_bullets: Bullet[] = []; 202 | let bullet_to_add = 0; 203 | for (let b in this.bullets) { 204 | let bullet = this.bullets[b]; 205 | let dist = Math.sqrt((h.sprite.position.x-bullet.sprite.position.x)**2 + (h.sprite.position.y-bullet.sprite.position.y)**2); 206 | if (dist < h.perception){ 207 | h.vector[0] = h.sprite.position.x <= bullet.sprite.position.x ? 1 : -1; 208 | h.vector[1] = h.sprite.position.y <= bullet.sprite.position.y ? 
1 : -1; 209 | h.vector = this.vectorrand(h.vector); 210 | } 211 | // Take the bullets 212 | if (dist < 10){ 213 | bullet.sprite.visible = false; 214 | this.app.stage.removeChild(bullet.sprite); 215 | h.ammunition += 5; 216 | bullet_to_add += 1; 217 | } else{ 218 | n_bullets.push(bullet); 219 | } 220 | } 221 | this.bullets = n_bullets; 222 | for (let b=0; b < bullet_to_add; b++){ 223 | this.createBullet(); 224 | } 225 | 226 | for (let i in this.zombies) { 227 | let z = this.zombies[i]; 228 | 229 | let dist = Math.sqrt((h.sprite.position.x-z.sprite.position.x)**2 + (h.sprite.position.y-z.sprite.position.y)**2); 230 | 231 | if (dist < h.perception && h.followdist+10 > dist){ 232 | if (h.ammunition > 0){ // Try to shot the Zombie 233 | h.ammunition = h.ammunition - 1; 234 | this.sounds.push({ 235 | y: h.sprite.position.y, 236 | x: h.sprite.position.x 237 | }); 238 | if (Math.random() < h.shot_accuracy){ 239 | this.zombieToKill.push(i); 240 | this.app.stage.removeChild(z.sprite); 241 | } 242 | } 243 | h.vector[0] = h.sprite.position.x <= z.sprite.position.x ? -1 : 1; 244 | h.vector[1] = h.sprite.position.y <= z.sprite.position.y ? -1 : 1; 245 | h.vector = this.vectorrand(h.vector); 246 | h.followdist = dist; 247 | } 248 | } 249 | 250 | h.sprite.position.x = mod(h.sprite.position.x + (h.speed*h.vector[0]), this.width); 251 | h.sprite.position.y = mod(h.sprite.position.y + (h.speed*h.vector[1]), this.height); 252 | } 253 | 254 | public emptyHumans(){ 255 | for (let h in this.humans){ 256 | this.humans[h].sprite.visible = false; 257 | this.app.stage.removeChild(this.humans[h].sprite); 258 | } 259 | this.humans = []; 260 | this.last_population = this.population; 261 | this.population = []; 262 | } 263 | 264 | public emptyZomvies(){ 265 | for (let z in this.zombies){ 266 | this.zombies[z].sprite.visible = false; 267 | this.app.stage.removeChild(this.zombies[z].sprite); 268 | } 269 | this.zombies = []; 270 | } 271 | 272 | public createZombies(){ 273 | console.log("create zombies", this.config.nbzombie - this.zombies.length); 274 | for (let i = 0; i < this.config.nbzombie - this.zombies.length; i++) { 275 | this.createZombie(); 276 | } 277 | } 278 | 279 | public reset() { 280 | this.stepnb = 0; 281 | 282 | for (let b in this.bullets){ 283 | this.bullets[b].sprite.visible = false; 284 | this.app.stage.removeChild(this.bullets[b].sprite); 285 | } 286 | this.bullets = []; 287 | for (let z in this.zombies){ 288 | this.zombies[z].sprite.visible = false; 289 | this.app.stage.removeChild(this.zombies[z].sprite); 290 | } 291 | this.zombies = []; 292 | for (let h in this.humans){ 293 | this.humans[h].sprite.visible = false; 294 | this.app.stage.removeChild(this.humans[h].sprite); 295 | } 296 | this.humans = []; 297 | this.population = []; 298 | for (let i = 0; i < this.config.nbbullets; i++) { 299 | this.createBullet(); 300 | } 301 | for (let i = 0; i < this.config.nbzombie; i++) { 302 | this.createZombie(); 303 | } 304 | } 305 | 306 | public init(oninit: any, callback: any): void { 307 | PIXI.loader 308 | .add("/public/images/boy.png") 309 | .add("/public/images/girl.png") 310 | .add("/public/images/zombie.png") 311 | .add("/public/images/bullet.png") 312 | .load(() => { 313 | 314 | for (let i = 0; i < this.config.nbbullets; i++) { 315 | this.createBullet(); 316 | } 317 | 318 | for (let i = 0; i < this.config.nbzombie; i++) { 319 | this.createZombie(); 320 | } 321 | //for (let i = 0; i < this.config.nbhuman; i++) { 322 | // this.createHuman(); 323 | //} 324 | 325 | if (oninit){ 326 | oninit(this); 327 | } 328 
| 329 | this.callback = callback; 330 | 331 | }); 332 | } 333 | 334 | /** 335 | * Create a new Zombie 336 | */ 337 | public createZombie(): Zombie { 338 | let sprite = new PIXI.Sprite(PIXI.loader.resources["/public/images/zombie.png"].texture); 339 | sprite.scale.x = 0.05; 340 | sprite.scale.y = 0.05; 341 | // Position of the Zombie 342 | sprite.position.x = Math.floor(Math.random() * this.width); 343 | sprite.position.y = Math.floor(Math.random() * this.height); 344 | this.app.stage.addChild(sprite); 345 | 346 | let vector: number[] = [ 347 | Math.random() > 0.75 ? -1 : 1, 348 | Math.random() > 0.25 ? -1 : 1 349 | ] 350 | vector = this.vectorrand(vector); 351 | 352 | let zombie: Zombie = { 353 | sprite: sprite, 354 | vector: vector, 355 | follow: false, 356 | speed: 0.8, 357 | perception: 30.0 358 | } 359 | this.zombies.push(zombie); 360 | return zombie; 361 | } 362 | 363 | 364 | /** 365 | * Create a Bullet 366 | */ 367 | public createBullet(): Bullet { 368 | let sprite = new PIXI.Sprite(PIXI.loader.resources["/public/images/bullet.png"].texture); 369 | sprite.scale.x = 0.03; 370 | sprite.scale.y = 0.03; 371 | // Position of the Zombie 372 | sprite.position.x = Math.floor(Math.random() * this.width); 373 | sprite.position.y = Math.floor(Math.random() * this.height); 374 | this.app.stage.addChild(sprite); 375 | 376 | let bullet: Bullet = { 377 | sprite: sprite 378 | } 379 | 380 | this.bullets.push(bullet); 381 | return bullet; 382 | } 383 | 384 | /* 385 | Genome to phenotype 386 | */ 387 | protected genomeToPhenotype(genome: number[]): Phenotype { 388 | let phenotype: Phenotype = { 389 | speed: 1.0 + (0.4*genome[0]-0.2), 390 | perception: 50 + (300*genome[1]-150), 391 | shot_accuracy: 0.3 + (0.6*genome[2]-0.3), 392 | ammunition: 4 + (20*genome[3]-10), 393 | }; 394 | return phenotype; 395 | } 396 | 397 | /** 398 | * Create a new Human 399 | */ 400 | public createHuman(genome: number[]): void { 401 | let phenotype = this.genomeToPhenotype(genome); 402 | let sprite = new PIXI.Sprite(PIXI.loader.resources[ 403 | Math.random() > 0.5 ? "/public/images/boy.png" : "/public/images/girl.png" 404 | ].texture); 405 | sprite.scale.x = 0.05; 406 | sprite.scale.y = 0.05; 407 | // Position of the Zombie 408 | sprite.position.x = Math.floor(Math.random() * this.width); 409 | sprite.position.y = Math.floor(Math.random() * this.height); 410 | this.app.stage.addChild(sprite); 411 | 412 | let vector: number[] = [ 413 | Math.random() > 0.5 ? -1 : 1, 414 | Math.random() > 0.5 ? 
-1 : 1 415 | ] 416 | vector = this.vectorrand(vector); 417 | 418 | let n_human = { 419 | sprite: sprite, 420 | vector: vector, 421 | followdist: 10000, 422 | speed: phenotype.speed, 423 | perception: phenotype.perception, 424 | ammunition: phenotype.ammunition, 425 | shot_accuracy: phenotype.shot_accuracy, 426 | genome: genome, 427 | lifeduration: 0 428 | }; 429 | this.humans.push(n_human); 430 | this.population.push(n_human); 431 | } 432 | 433 | } 434 | 435 | const geneticzombie = { 436 | env: GeneticZombie 437 | } 438 | 439 | export default geneticzombie; 440 | -------------------------------------------------------------------------------- /rl/actor_critic.py: -------------------------------------------------------------------------------- 1 | from random import randint 2 | import tensorflow as tf 3 | import numpy as np 4 | import scipy.signal 5 | 6 | """ 7 | Exemple of the Policy Gradient Algorithm 8 | """ 9 | 10 | class Buffer: 11 | 12 | def __init__(self, obs_dim, act_dim, size, gamma=0.99, lam=0.95): 13 | self.obs_buf = np.zeros(Buffer.combined_shape(size, obs_dim), dtype=np.float32) 14 | # Actions buffer 15 | self.act_buf = np.zeros(size, dtype=np.float32) 16 | # Advantages buffer 17 | self.adv_buf = np.zeros(size, dtype=np.float32) 18 | # Value function buffer 19 | self.val_buf = np.zeros(size, dtype=np.float32) 20 | # Rewards buffer 21 | self.rew_buf = np.zeros(size, dtype=np.float32) 22 | # Log probability of action a with the policy 23 | self.logp_buf = np.zeros(size, dtype=np.float32) 24 | # Gamma and lam to compute the advantage 25 | self.gamma, self.lam = gamma, lam 26 | # Rreturn buffer (used to train the value fucntion) 27 | self.ret_buf = np.zeros(size, dtype=np.float32) 28 | # ptr: Position to insert the next tuple 29 | # path_start_idx Posittion of the current trajectory 30 | # max_size Max size of the buffer 31 | self.ptr, self.path_start_idx, self.max_size = 0, 0, size 32 | 33 | @staticmethod 34 | def discount_cumsum(x, discount): 35 | """ 36 | x = [x0, x1, x2] 37 | output: [x0 + discount * x1 + discount^2 * x2, x1 + discount * x2, x2] 38 | """ 39 | return scipy.signal.lfilter([1], [1, float(-discount)], x[::-1], axis=0)[::-1] 40 | 41 | @staticmethod 42 | def combined_shape(length, shape=None): 43 | if shape is None: 44 | return (length,) 45 | return (length, shape) if np.isscalar(shape) else (length, *shape) 46 | 47 | def store(self, obs, act, rew, logp, val): 48 | # Append one timestep of agent-environment interaction to the buffer. 
49 | assert self.ptr < self.max_size 50 | self.obs_buf[self.ptr] = obs 51 | self.act_buf[self.ptr] = act 52 | self.rew_buf[self.ptr] = rew 53 | self.logp_buf[self.ptr] = logp 54 | self.val_buf[self.ptr] = val 55 | self.ptr += 1 56 | 57 | def finish_path(self, last_val=0): 58 | # Select the path 59 | path_slice = slice(self.path_start_idx, self.ptr) 60 | # Append the last_val to the trajectory 61 | rews = np.append(self.rew_buf[path_slice], last_val) 62 | # Append the last value to the value function 63 | vals = np.append(self.val_buf[path_slice], last_val) 64 | # Deltas = r + y*v(s') - v(s) 65 | deltas = rews[:-1] + self.gamma * vals[1:] - vals[:-1] 66 | self.adv_buf[path_slice] = Buffer.discount_cumsum(deltas, self.gamma) 67 | # Advantage 68 | self.ret_buf[path_slice] = Buffer.discount_cumsum(rews, self.gamma)[:-1] 69 | self.path_start_idx = self.ptr 70 | 71 | def get(self): 72 | assert self.ptr == self.max_size # buffer has to be full before you can get 73 | self.ptr, self.path_start_idx = 0, 0 74 | # the next two lines implement the advantage normalization trick 75 | # Normalize the Advantage 76 | if np.std(self.adv_buf) != 0: 77 | self.adv_buf = (self.adv_buf - np.mean(self.adv_buf)) / np.std(self.adv_buf) 78 | return self.obs_buf, self.act_buf, self.adv_buf, self.logp_buf, self.ret_buf 79 | 80 | 81 | class ActorCritic(object): 82 | """ 83 | Implementation of Policy gradient algorithm 84 | """ 85 | def __init__(self, input_space, action_space, pi_lr, v_lr, buffer_size, seed): 86 | super(ActorCritic, self).__init__() 87 | 88 | # Stored the spaces 89 | self.input_space = input_space 90 | self.action_space = action_space 91 | self.seed = seed 92 | # NET Buffer defined above 93 | self.buffer = Buffer( 94 | obs_dim=input_space, 95 | act_dim=action_space, 96 | size=buffer_size 97 | ) 98 | # Learning rate of the policy network and the value network 99 | self.pi_lr = pi_lr 100 | self.v_lr = v_lr 101 | # The tensorflow session (set later) 102 | self.sess = None 103 | # Apply a random seed on tensorflow and numpy 104 | tf.set_random_seed(42) 105 | np.random.seed(42) 106 | 107 | def compile(self): 108 | """ 109 | Compile the model 110 | """ 111 | # tf_a : Input: Chosen action 112 | # tf_map: Input: Input state 113 | # tf_tv: Input: Target value function 114 | # tf_adv: Input: Advantage 115 | self.tf_map, self.tf_a, self.tf_adv, self.tf_tv = ActorCritic.inputs( 116 | map_space=self.input_space, 117 | action_space=self.action_space 118 | ) 119 | # mu_op: Used to get the exploited prediction of the model 120 | # pi_op: Used to get the prediction of the model 121 | # logp_a_op: Used to get the log likelihood of taking action a with the current policy 122 | # logp_pi_op: Used to get the log likelihood of the predicted action @pi_op 123 | # v_op: Used to get the value function of the given state 124 | self.mu_op, self.pi_op, self.logp_a_op, self.logp_pi_op, self.v_op = ActorCritic.mlp( 125 | tf_map=self.tf_map, 126 | tf_a=self.tf_a, 127 | action_space=self.action_space, 128 | seed=self.seed 129 | ) 130 | # Error 131 | self.pi_loss, self.v_loss = ActorCritic.net_objectives( 132 | tf_adv=self.tf_adv, 133 | logp_a_op=self.logp_a_op, 134 | v_op=self.v_op, 135 | tf_tv=self.tf_tv 136 | ) 137 | # Optimization 138 | self.train_pi = tf.train.AdamOptimizer(learning_rate=self.pi_lr).minimize(self.pi_loss) 139 | self.train_v = tf.train.AdamOptimizer(learning_rate=self.v_lr).minimize(self.v_loss) 140 | # Entropy 141 | self.approx_ent = tf.reduce_mean(-self.logp_a_op) 142 | 143 | 144 | def set_sess(self, sess): 145 
| # Set the tensorflow session used to run this model 146 | self.sess = sess 147 | 148 | def step(self, states): 149 | # Take actions given the states 150 | # Return mu (the greedy policy without exploration), pi (an action sampled from the current policy) and 151 | # the log probability of the action chosen by pi, along with the value estimate v 152 | mu, pi, logp_pi, v = self.sess.run([self.mu_op, self.pi_op, self.logp_pi_op, self.v_op], feed_dict={ 153 | self.tf_map: states 154 | }) 155 | return mu, pi, logp_pi, v 156 | 157 | def store(self, obs, act, rew, logp, val): 158 | # Store the observation, action, reward, log probability of the action and state value 159 | # into the buffer 160 | self.buffer.store(obs, act, rew, logp, val) 161 | 162 | def finish_path(self, last_val=0): 163 | self.buffer.finish_path(last_val=last_val) 164 | 165 | def train(self, additional_infos={}): 166 | # Get the content of the buffer 167 | obs_buf, act_buf, adv_buf, logp_last_buf, ret_buf = self.buffer.get() 168 | # Train the model 169 | pi_loss_list = [] 170 | entropy_list = [] 171 | v_loss_list = [] 172 | 173 | for step in range(5): 174 | _, entropy, pi_loss = self.sess.run([self.train_pi, self.approx_ent, self.pi_loss], feed_dict={ 175 | self.tf_map: obs_buf, 176 | self.tf_a: act_buf, 177 | self.tf_adv: adv_buf 178 | }) 179 | entropy_list.append(entropy) 180 | pi_loss_list.append(pi_loss) 181 | 182 | 183 | for step in range(5): 184 | _, v_loss = self.sess.run([self.train_v, self.v_loss], feed_dict={ 185 | self.tf_map: obs_buf, 186 | self.tf_tv: ret_buf, 187 | }) 188 | 189 | v_loss_list.append(v_loss) 190 | 191 | print("Entropy : %s" % (np.mean(entropy_list)), end="\r") 192 | 193 | 194 | @staticmethod 195 | def gaussian_likelihood(x, mu, log_std): 196 | # Compute the gaussian likelihood of x under a Gaussian distribution with mean @mu 197 | # and log standard deviation @log_std 198 | pre_sum = -0.5 * (((x-mu)/(tf.exp(log_std)+1e-8))**2 + 2*log_std + np.log(2*np.pi)) 199 | return tf.reduce_sum(pre_sum, axis=1) 200 | 201 | @staticmethod 202 | def inputs(map_space, action_space): 203 | """ 204 | @map_space Tuple describing the observation space. Ex (size,) 205 | @action_space Tuple describing the action space.
Ex (size,) 206 | """ 207 | # Map of the game 208 | tf_map = tf.placeholder(tf.float32, shape=(None, *map_space), name="tf_map") 209 | # Chosen action: a single discrete index per step 210 | tf_a = tf.placeholder(tf.int32, shape=(None,), name="tf_a") 211 | # Advantage 212 | tf_adv = tf.placeholder(tf.float32, shape=(None,), name="tf_adv") 213 | # Target value 214 | tf_tv = tf.placeholder(tf.float32, shape=(None,), name="tf_tv") 215 | return tf_map, tf_a, tf_adv, tf_tv 216 | 217 | @staticmethod 218 | def mlp(tf_map, tf_a, action_space, seed=None): 219 | if seed is not None: 220 | tf.random.set_random_seed(seed) 221 | 222 | # Expand the dimension of the input 223 | tf_map_expand = tf.expand_dims(tf_map, axis=3) 224 | 225 | flatten = tf.layers.flatten(tf_map_expand) 226 | hidden = tf.layers.dense(flatten, units=256, activation=tf.nn.relu) 227 | 228 | # Policy logits (one per action) 229 | action_logits = tf.layers.dense(hidden, units=action_space, activation=None) 230 | # Value function head 231 | v_op = tf.layers.dense(hidden, units=1, activation=None) 232 | 233 | # Take the log-softmax of the logits 234 | logp_all = tf.nn.log_softmax(action_logits) 235 | # Sample a random action according to the logits (exploration) 236 | pi_op = tf.squeeze(tf.multinomial(action_logits, 1), axis=1) 237 | mu = tf.argmax(action_logits, axis=1) 238 | 239 | # Gives log probability, according to the policy, of taking actions @a in states @x 240 | logp_a_op = tf.reduce_sum(tf.one_hot(tf_a, depth=action_space) * logp_all, axis=1) 241 | # Gives log probability, according to the policy, of the action sampled by pi. 242 | logp_pi_op = tf.reduce_sum(tf.one_hot(pi_op, depth=action_space) * logp_all, axis=1) 243 | 244 | return mu, pi_op, logp_a_op, logp_pi_op, v_op 245 | 246 | @staticmethod 247 | def net_objectives(logp_a_op, tf_adv, v_op, tf_tv, clip_ratio=0.2): 248 | """ 249 | @v_op: Predicted value function 250 | @tf_tv: Target value (rewards-to-go) 251 | @logp_a_op: Log likelihood of taking action under the current policy 252 | @clip_ratio: Unused in this simple objective 253 | @tf_adv: Advantage input 254 | """ 255 | pi_loss = -tf.reduce_mean(logp_a_op*tf_adv) 256 | v_loss = tf.reduce_mean((tf_tv - v_op)**2) 257 | return pi_loss, v_loss 258 | 259 | class GridWorld(object): 260 | """ 261 | A 7x7 grid world: -1 cells end the episode with a penalty, the +1 cell is the goal.
262 | """ 263 | def __init__(self): 264 | super(GridWorld, self).__init__() 265 | 266 | self.rewards = [ 267 | [0, 0, 0, 0, -1, 0, 0], 268 | [0, -1, -1, 0, -1, 0, 0], 269 | [0, -1, -1, 1, -1, 0, 0], 270 | [0, -1, -1, 0, -1, 0, 0], 271 | [0, 0, 0, 0, 0, 0, 0], 272 | [0, 0, 0, 0, 0, 0, 0], 273 | [0, 0, 0, 0, 0, 0, 0], 274 | ] 275 | self.position = [6, 6] # y, x 276 | 277 | def gen_state(self): 278 | # Generate a state given the current position of the agent 279 | state = np.zeros((7, 7)) 280 | state[self.position[0]][self.position[1]] = 1 281 | return state 282 | 283 | def step(self, action): 284 | if action == 0: # Top 285 | self.position = [(self.position[0] - 1) % 7, self.position[1]] 286 | elif action == 1: # Left 287 | self.position = [self.position[0], (self.position[1] - 1) % 7] 288 | elif action == 2: # Right 289 | self.position = [self.position[0], (self.position[1] + 1) % 7] 290 | elif action == 3: # Down 291 | self.position = [(self.position[0] + 1) % 7, self.position[1]] 292 | 293 | reward = self.rewards[self.position[0]][self.position[1]] 294 | done = False if reward == 0 else True 295 | state = self.gen_state() 296 | if done: # The agent is dead, reset the game 297 | self.position = [6, 6] 298 | return state, reward, done 299 | 300 | def display(self): 301 | y = 0 302 | print("="*14) 303 | for line in self.rewards: 304 | x = 0 305 | for case in line: 306 | if case == -1: 307 | c = "0" 308 | elif (y == self.position[0] and x == self.position[1]): 309 | c = "A" 310 | elif case == 1: 311 | c = "T" 312 | else: 313 | c = "-" 314 | print(c, end=" ") 315 | x += 1 316 | y += 1 317 | print() 318 | 319 | def main(): 320 | grid = GridWorld() 321 | buffer_size = 1000 322 | 323 | # Create the NET class 324 | agent = ActorCritic( 325 | input_space=(7, 7), 326 | action_space=4, 327 | pi_lr=0.001, 328 | v_lr=0.001, 329 | buffer_size=buffer_size, 330 | seed=42 331 | ) 332 | agent.compile() 333 | # Init Session 334 | sess = tf.Session() 335 | # Init variables 336 | sess.run(tf.global_variables_initializer()) 337 | # Set the session 338 | agent.set_sess(sess) 339 | 340 | rewards = [] 341 | 342 | b = 0 343 | 344 | for epoch in range(10000): 345 | 346 | done = False 347 | state = grid.gen_state() 348 | 349 | while not done: 350 | _, pi, logpi, v = agent.step([state]) 351 | n_state, reward, done = grid.step(pi[0]) 352 | agent.store(state, pi[0], reward, logpi, v) 353 | b += 1 354 | 355 | state = n_state 356 | 357 | if done: 358 | agent.finish_path(reward) 359 | rewards.append(reward) 360 | if len(rewards) > 1000: 361 | rewards.pop(0) 362 | if b == buffer_size: 363 | if not done: 364 | # Bootstrap the last value 365 | agent.finish_path(v) 366 | agent.train() 367 | b = 0 368 | 369 | if epoch % 1000 == 0: 370 | print("Rewards mean:%s" % np.mean(rewards)) 371 | 372 | for epoch in range(10): 373 | import time 374 | print("=========================TEST=================================") 375 | done = False 376 | state = grid.gen_state() 377 | 378 | while not done: 379 | time.sleep(1) 380 | _, pi, logpi, v = agent.step([state]) 381 | n_state, _, done = grid.step(pi[0]) 382 | print("v", v) 383 | grid.display() 384 | state = n_state 385 | print("reward=>", reward) 386 | 387 | if __name__ == '__main__': 388 | main() 389 | -------------------------------------------------------------------------------- /meta-learning/Meta Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | 
"## Classification Demo" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import numpy as np\n", 17 | "import matplotlib.pyplot as plt\n", 18 | "from scipy.spatial.distance import cdist\n", 19 | "import copy\n", 20 | "import sys\n", 21 | "import os\n", 22 | "\n", 23 | "DATASET = \"/home/thibault/work/datasets/omniglot/python/\"" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## One-shot classification demo with Modified Hausdorff Distance" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "Running this demo should lead to a result of 38.8 percent errors. M.-P. Dubuisson, A. K. Jain (1994). A modified hausdorff distance for object matching. International Conference on Pattern Recognition, pp. 566-568. ** Models should be trained on images in 'images_background' directory to avoid using images and alphabets used in the one-shot evaluation **" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "def mod_hausdorff_distance(itemA, itemB):\n", 47 | " \"\"\"\n", 48 | " Modified Hausdorff Distance\n", 49 | "\n", 50 | " Input:\n", 51 | " itemA : [n x 2] coordinates of \"inked\" pixels\n", 52 | " itemB : [m x 2] coordinates of \"inked\" pixels\n", 53 | " M.-P. Dubuisson, A. K. Jain (1994). A modified hausdorff distance for object matching.\n", 54 | " International Conference on Pattern Recognition, pp. 566-568.\n", 55 | " \"\"\"\n", 56 | " D = cdist(itemA,itemB)\n", 57 | " mindist_A = D.min(axis=1)\n", 58 | " mindist_B = D.min(axis=0)\n", 59 | " mean_A = np.mean(mindist_A)\n", 60 | " mean_B = np.mean(mindist_B)\n", 61 | " return max(mean_A,mean_B)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 23, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAQUAAAD8CAYAAAB+fLH0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADnhJREFUeJzt3V2oZXd5x/HvrzNGa6Tm7RDiTOxMMShBsJGDjaQUMUrTVEwuRBLEDjIwN7bGF9CkvZDeKYiagkgHo06LRG0MTQiipGOk9KJTz2gwL2PMNDZmhsQcqdFiLzT49GKvwfM/OafnzF77Ze053w8czl5rr733M/+Z/Naz/mvtlVQVknTG78y7AEnDYihIahgKkhqGgqSGoSCpYShIahgKkhpTCYUk1yV5LMnJJLdO4zMkTUcmffFSkl3AD4G3AqeA7wA3V9WjE/0gSVOxewrv+QbgZFU9AZDky8ANwKahcMkll9S+ffumUIqkM44fP/7TqlraartphMIe4Kk1y6eAP1q/UZJDwCGAV77ylaysrEyhFElnJHlyO9vNbaKxqg5X1XJVLS8tbRlekmZkGqFwGrh8zfLebp2kBTCNUPgOcEWS/UnOA24C7p3C50iagonPKVTV80n+EvgmsAv4fFU9MunPkTQd05hopKq+Dnx9Gu8tabq8olFSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUkNQ0FSw1CQ1DAUJDUMBUmNsUMhyeVJHkjyaJJHktzSrb8oyf1JHu9+Xzi5ciVNW59O4XngQ1V1JXA18N4kVwK3Aker6grgaLcsaUGMHQpV9XRVfbd7/D/ACWAPcANwpNvsCHBj3yIlzc5E5hSS7AOuAo4Bl1bV091TzwCXTuIzJM1G71BI8jLga8D7q+oXa5+rqgJqk9cdSrKSZGV1dbVvGZImpFcoJHkRo0D4UlXd3a3+SZLLuucvA57d6LVVdbiqlqtqeWlpqU8Zkiaoz9mHAHcAJ6rqk2ueuhc40D0+ANwzfnmSZm13j9deA7wbeCjJg926vwY+Bnw1yUHgSeCd/UqUNEtjh0JV/RuQTZ6+dtz3lTRfXtEoqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkRp9vSUoAjL5FPx2j+/RoluwUJDXsFPQC09zzn631tdg5TJ+dgqSGncION6SuYDvsHKbPTkFSw05hh1i0jmC7zvy57Bgmx05BUsNQkNQwFHROSHLOHiLNmqEgqeFEoyZuEpN+4+71nXjsz05BUsNOQVuax153/Weebeewdnu7hrNjpyCpYaegFxjinvVMTZ5hmD47BUkNO4UdYoh7/3GM0zF4RuLs2ClIahgKWkhV5Z5/SgwFSY3ecwpJdgErwOmqeluS/cCXgYuB48C7q+pXfT9nJ1iUmfVF3UM7t7A9k+gUbgFOrFn+OPCpqnoV8DPg4AQ+Q9KM9AqFJHuBPwc+1y0HeDNwV7fJEeDGPp9xLjjzDb6tfhbFotZ9xqLWPSt9O4VPAx8GftMtXww8V1XPd8ungD0bvTDJoSQrSVZWV1d7liFpUsYOhSRvA56tquPjvL6qDlfVclUtLy0tjVvGIJwrHcC45vnn7HMWYqf8/ZytPhON1wBvT3I98BLg94DbgQuS7O66hb3A6f5lSpqVsTuFqrqtqvZW1T7gJuBbVfUu4AHgHd1mB4B7elc5MDutE9iueY6LHcPkTOM6hY8AH0xyktEcwx1T+AxJUzKR7z5U1beBb3ePnwDeMIn3nQf3GJMzj/9xi9+m7M8rGiU1/JZkZ2h7liFeddd3jGZ5N6S177/dur3iccROQVLDTmGGFn0PNMnjdffKw2WnIKlhKEhqePgwRedqa9z39usbvXaaY3W2hz07/dDGTkFSw06hh526J1lvEhOQO33vPCR2CpIadgpjcG+2sUnMNUyzY/AS6O2xU5DUMBQ0NYt+G/ad+pVqQ0FSw1AYwyLsQYZU4zgdw5Dq32kMBUkNzz70MIRz61vtTTd7fh41j/N15mnW4RWOG7NTkNSwU+gs2jnsRamzr522lx4COwVJDUNhHWfKZ2MI1zAMoYYhMhQkNQwFaZt2SkdoKEhqGAqbcG5BO5WhIKnhdQoL6myvq3CWfXOLdo3KtNkpSGrYKWxhKNfrb8YOQJNmpyCpYShMgWchtMgMBUmNXqGQ5IIkdyX5QZITSd6Y5KIk9yd5vPt94aSKnTevlddO0LdTuB34RlW9BngdcAK4FThaVVcAR7tlSQti7FBI8nLgT4A7AKrqV1X1HHADcKTb7AhwY98iJc1On05hP7AKfCHJ95J8Lsn5wKVV9XS3zTPApX2LXFROOGoR9QmF3cDrgc9W1VXAL1l3qFCjA/AND8KTHEqykmRldXW1RxmSJqlPKJwCTlXVsW75LkYh8ZMklwF0v5/d6MVVdbiqlqtqeWlpqUcZkiZp7FCoqmeAp5K8ult1LfAocC9woFt3ALinV4WSZqrvZc5/BXwpyXnAE8B7GAXNV5McBJ4E3tnzMwbHW4SPx/mVxdArFKrqQWB5g6eu7fO+kubHL0TN0E7tGOwQFouXOUtq2Cn0MO7NOXZKxzCJDuFcH6MhslOQ1LBTmAA7hslzTObHTkFSw05Bg2KHMH92CpIadgoT5K3CW9sZDzuD4bFTkNSwU5iCcb8bsdF7nAvOpT/LTmCnIKlhKExRnxu9etcmzYuhIKnhnMLAbdYteJw+OXZkLTsFSQ07hRmYxvULZ/tedhbaLjsFSQ07hRlav7ee5bGscxO/Ne6475SxslOQ1DAUJDU8fJijIXyB6lz5spKnFSfHTkFSw05hAIbQMWxkaPXMyyJ1TJNgpyCpYacwIP/fHsm9tmbFTkFSw05hQZztca2dxfh22hzCenYKkhp2CueorfZ2dhK/tdM7g/XsFCQ17BR2qHP9TId7//H16hSSfCDJI0keTnJnkpck2Z/kWJKTSb6S5LxJFStp+sYOhSR7gPcBy1X1WmAXcBPwceBTVfUq4GfAwUkUqtk5c8PZRf7R+PrOKewGfjfJbuClwNPAm4G7uuePADf2/AxJMzR2KFTVaeATwI8ZhcHPgePAc1X1fLfZKWBP3yIlzU6fw4cLgRuA/cArgPOB687i9YeSrCRZWV1dHbcMSRPW5/DhLcCPqmq1qn4N3A1cA1zQHU4A7AVOb/TiqjpcVctVtby0tNSjDEmT1CcUfgxcneSlGZ3DuhZ4FHgAeEe3zQHgnn4lSpqlPnMKxxhNKH4XeKh7r8PAR4APJjkJXAzcMYE6Jc1Ir4uXquqjwEfXrX4CeEOf95U0P17mLKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGA
qSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKmxZSgk+XySZ5M8vGbdRUnuT/J49/vCbn2S/F2Sk0m+n+T10yxe0uRtp1P4InDdunW3Aker6grgaLcM8GfAFd3PIeCzkylT0qxsGQpV9a/Af69bfQNwpHt8BLhxzfp/qJF/By5IctmkipU0fePOKVxaVU93j58BLu0e7wGeWrPdqW7dCyQ5lGQlycrq6uqYZUiatN4TjVVVQI3xusNVtVxVy0tLS33LkDQh44bCT84cFnS/n+3WnwYuX7Pd3m6dpAUxbijcCxzoHh8A7lmz/i+6sxBXAz9fc5ghaQHs3mqDJHcCbwIuSXIK+CjwMeCrSQ4CTwLv7Db/OnA9cBL4X+A9U6hZ0hRtGQpVdfMmT127wbYFvLdvUZLmxysaJTUMBUkNQ0FSw1CQ1MhobnDORSSrwC+Bn867lm24hOHXaY2Tswh1brfG36+qLa8UHEQoACRZqarledexlUWo0xonZxHqnHSNHj5IahgKkhpDCoXD8y5gmxahTmucnEWoc6I1DmZOQdIwDKlTkDQAgwiFJNcleay7t+OtW79i+pJcnuSBJI8meSTJLd36De9POedadyX5XpL7uuX9SY514/mVJOcNoMYLktyV5AdJTiR549DGMskHur/rh5PcmeQlQxjLWd8nde6hkGQX8BlG93e8Erg5yZXzrQqA54EPVdWVwNXAe7u6Nrs/5TzdApxYs/xx4FNV9SrgZ8DBuVTVuh34RlW9Bngdo3oHM5ZJ9gDvA5ar6rXALuAmhjGWX2SW90mtqrn+AG8Evrlm+TbgtnnXtUGd9wBvBR4DLuvWXQY8Nue69nb/KN4M3AeE0YUsuzca3znV+HLgR3RzWGvWD2Ys+e2tBC9i9O3h+4A/HcpYAvuAh7caO+DvgZs32m67P3PvFDiL+zrOS5J9wFXAMTa/P+W8fBr4MPCbbvli4Lmqer5bHsJ47gdWgS90hzmfS3I+AxrLqjoNfAL4MfA08HPgOMMbyzN63yd1M0MIhUFL8jLga8D7q+oXa5+rURTP7fRNkrcBz1bV8XnVsE27gdcDn62qqxhd0t4cKgxgLC9kdDfy/cArgPN5Ycs+SJMeuyGEwmDv65jkRYwC4UtVdXe3erP7U87DNcDbk/wX8GVGhxC3M7q1/pkb6AxhPE8Bp6rqWLd8F6OQGNJYvgX4UVWtVtWvgbsZje/QxvKMqd0ndQih8B3gim6W9zxGkzv3zrkmkgS4AzhRVZ9c89Rm96ecuaq6rar2VtU+RuP2rap6F/AA8I5us7nWCFBVzwBPJXl1t+pa4FEGNJaMDhuuTvLS7u/+TI2DGss1pnef1HlN7KybRLke+CHwn8DfzLuerqY/ZtSSfR94sPu5ntEx+1HgceBfgIvmXWtX75uA+7rHfwD8B6N7Zf4T8OIB1PeHwEo3nv8MXDi0sQT+FvgB8DDwj8CLhzCWwJ2M5jl+zajrOrjZ2DGaaP5M99/SQ4zOppzV53lFo6TGEA4fJA2IoSCpYShIahgKkhqGgqSGoSCpYShIahgKkhr/ByMg6OKiVzxNAAAAAElFTkSuQmCC\n", 72 | "text/plain": [ 73 | "
" 74 | ] 75 | }, 76 | "metadata": {}, 77 | "output_type": "display_data" 78 | }, 79 | { 80 | "data": { 81 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQUAAAD8CAYAAAB+fLH0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADelJREFUeJzt3W+MZXV9x/H3p7uiFVMBmRDche42Eg0xsZgJxdA0RjSl1AgPjIGYdmM22Se24p9EoX1g+kwTo9LEmG5E3TYEtUgKIUaDK6bpg26ZFaLAimyxyBKQMQVs7INC/PbBPdPOb5xhZu65f2fer2Qy95x77r3f/c3u53zP75w9k6pCklb81rQLkDRbDAVJDUNBUsNQkNQwFCQ1DAVJDUNBUmMsoZDk6iSPJjmd5KZxfIak8cioL15Ksgf4CfAu4AxwP3BDVT0y0g+SNBZ7x/CelwOnq+pxgCRfA64FNgyF888/vw4cODCGUiStOHny5C+qamGz7cYRCvuAJ1ctnwH+YO1GSY4ARwAuvvhilpaWxlCKpBVJntjKdlObaKyqo1W1WFWLCwubhpekCRlHKDwFXLRqeX+3TtIcGEco3A9ckuRgkrOA64G7x/A5ksZg5HMKVfVSkr8AvgPsAb5cVQ+P+nMkjcc4Jhqpqm8B3xrHe0saL69olNQwFCQ1DAVJDUNBUsNQkNQwFCQ1DAVJDUNBUsNQkNQwFCQ1DAVJDUNBUsNQkNQwFCQ1DAVJDUNBUmMsN1nR5CXZ0naj/j0f2nnsFCQ1DAVJDUNBUsNQkNQwFCQ1PPsw57Z61kHaKjsFSQ1DQTtCErumETEUJDV2/ZzCdvYu83w14DzX/nLW/vxWlnfqn3cS7BQkNXZ9pzCvPH5+eavHx65he+wUJDXsFLZhHo9X56nW7dhOpzSPP7dpslOQ1Bg6FJJclOS+JI8keTjJjd3685Lcm+Sx7vu5oytXu53XI4xfn07hJeBjVXUpcAXwwSSXAjcBx6vqEuB4tyxpTgwdClX1dFX9oHv8X8ApYB9wLXCs2+wYcF3fIiVNzkgmGpMcAC4DTgAXVNXT3VPPABeM4jNmiRNXk+chw+T0nmhM8hrgm8CHq+qXq5+rwb+adf/lJDmSZCnJ0vLyct8yJI1Ir1BI8goGgXBbVd3Zrf55kgu75y8Enl3vtVV1tKoWq2pxYWGhTxm9VJV7/Bk2iolFf8bb0+fsQ4BbgVNV9dlVT90NHOoeHwLuGr48SZPWZ07hSuDPgB8lebBb91fAp4BvJDkMPAG8r1+Jk7GyJ/GimOly7mD6hg6FqvoXYKOf4FXDvq+k6fIy5zWG6Rg0vHGNs93b8LzMWVLDTmEE1u7t3Eutz+5rPtgpSGrYKYzBbv1lr3YCO4OdgqSGncIGJnEWYl72rPNSJ+y87msa7BQkNewUNrF6zzNPe8wV81gz/OYef17/HPPITkFSw05hG9x7jY9zAbPDTkFSw06hB/+fxPDsDGaXnYKkhp3CCAy719upHYZdwHyzU5DUsFOYomH2qNPsLuwAdgc7BUkNO4U5s90zHu7dtV12CpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkRu9QSLInyQNJ7umWDyY5keR0kq8nOat/mZImZRSdwo3AqVXLnwY+V1VvAJ4DDo/gMyRNSK9QSLIf+FPgS91ygHcAd3SbHAOu6/MZkiarb6fweeDjwK+75dcBz1fVS93yGWDfei9MciTJUpKl5eXlnmVIGpWhQyHJu4Fnq+rkMK+vqqNVtVhViwsLC8OWIWnE+tyj8UrgPUmuAV4F/A5wC3BOkr1dt7AfeKp/mZImZehOoapurqr9VXUAuB74XlW9H7gPeG+32SHgrt5VSpqYcVyn8Ango0lOM5hjuHUMnyFpTEZyi/eq+j7w/e7x48Dlo3hfSZPnFY2SGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpIahIKlhKEhqGAqSGoaCpMZIbvGu2TX4nb9QVSN/z7VG+RmaHjsFSQ07BW1qo85go+3sGOabnYKkhp3CnFrZG291L74d43hPzQ87BUkNO4Vdzq5Aa9kpSGrYKexS4+gQPOuwM9gpSGr0CoUk5yS5I8mPk5xK8rYk5yW5N8lj3fdzR1Ws+ksysi6hqpov7Qx9O4VbgG9X1ZuAtwCngJuA41V1CXC8W5Y0J4YOhSSvBf4IuBWgqv6nqp4HrgWOdZsdA67rW6SkyenTKRwEloGvJHkgyZeSnA1cUFVPd9s8A1zQt0j1N47DBu1MfUJhL/BW4ItVdRnwK9YcKtTgb866f3uSHEmylGRpeXm5RxmSRqlPKJwBzlTViW75DgYh8fMkFwJ0359d78VVdbSqFqtqcWFhoUcZGpe1E4l2CLvD0KFQVc8ATyZ5Y7fqKuAR4G7gULfuEHBXrwolTVTfi5f+ErgtyVnA48AHGATNN5IcBp4A3tfzMzRhdgO7W69QqKoHgcV1nrqqz/tKmh4vc9b/sUMQeJmzpDUMhTnnGYHWKK/H2K0MBUkNQ0FzwY5ocgwFSQ3PPuxy7n21lp2CpIadgnak1Wcg7Ia2x05BUsNOYYfY7i+Hmde95zh/CY4G7BQkNewUdpjN9qTz2iFocuwUJDXsFHaotR3Dbu4QHIPtsVOQ1LBT2OF26t7RsxDjY6cgqWEoSGoYCpIahoKkhqGgXcNbtW2NoSCpYShornmbttEzFCQ1DAXtOs4tvDxDQVLDUNCOMMzcgh3D+gwFSQ1DQVLDUJDUMBS0o3jdQn+9QiHJR5I8nOShJLcneVWSg0lOJDmd5OtJzhpVsZLGb+hQSLIP+BCwWFVvBvYA1wOfBj5XVW8AngMOj6JQaTtWOobVXxs9p1bfw4e9wG8n2Qu8GngaeAdwR/f8MeC6np8haYKGDoWqegr4DPAzBmHwAnASeL6qXuo2OwPs61ukNAp2BlvT5/DhXOBa4CDweuBs4OptvP5IkqUkS8vLy8OWIWnE+hw+vBP4aVUtV9WLwJ3AlcA53eEEwH7gqfVeXFVHq2qxqhYXFhZ6lCFplPqEws+AK5K8OoNrRa8CHgHuA97bbXMIuKtfiZImqc+cwgkGE4o/AH7UvddR4BPAR5OcBl4H3DqCOiVNSK/f+1BVnwQ+uWb148Dlfd5X0vR4RaOkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkhqEgqWEoSGoYCpIahoKkxqahkOTLSZ5N8tCqdeclu
TfJY933c7v1SfK3SU4n+WGSt46zeEmjt5VO4avA1WvW3QQcr6pLgOPdMsCfAJd0X0eAL46mTEmTsmkoVNU/A/+5ZvW1wLHu8THgulXr/74G/hU4J8mFoypW0vgNO6dwQVU93T1+Brige7wPeHLVdme6db8hyZEkS0mWlpeXhyxD0qj1nmisqgJqiNcdrarFqlpcWFjoW4akERk2FH6+cljQfX+2W/8UcNGq7fZ36yTNiWFD4W7gUPf4EHDXqvV/3p2FuAJ4YdVhhqQ5sHezDZLcDrwdOD/JGeCTwKeAbyQ5DDwBvK/b/FvANcBp4L+BD4yhZkljtGkoVNUNGzx11TrbFvDBvkVJmh6vaJTUMBQkNQwFSQ1DQVIjg7nBKReRLAO/An4x7Vq24Hxmv05rHJ15qHOrNf5uVW16peBMhAJAkqWqWpx2HZuZhzqtcXTmoc5R1+jhg6SGoSCpMUuhcHTaBWzRPNRpjaMzD3WOtMaZmVOQNBtmqVOQNANmIhSSXJ3k0e7ejjdt/orxS3JRkvuSPJLk4SQ3duvXvT/llGvdk+SBJPd0yweTnOjG8+tJzpqBGs9JckeSHyc5leRtszaWST7S/awfSnJ7klfNwlhO+j6pUw+FJHuALzC4v+OlwA1JLp1uVQC8BHysqi4FrgA+2NW10f0pp+lG4NSq5U8Dn6uqNwDPAYenUlXrFuDbVfUm4C0M6p2ZsUyyD/gQsFhVbwb2ANczG2P5VSZ5n9SqmuoX8DbgO6uWbwZunnZd69R5F/Au4FHgwm7dhcCjU65rf/eX4h3APUAYXMiyd73xnVKNrwV+SjeHtWr9zIwl/38rwfMY/O/he4A/npWxBA4AD202dsDfATest91Wv6beKbCN+zpOS5IDwGXACTa+P+W0fB74OPDrbvl1wPNV9VK3PAvjeRBYBr7SHeZ8KcnZzNBYVtVTwGeAnwFPAy8AJ5m9sVzR+z6pG5mFUJhpSV4DfBP4cFX9cvVzNYjiqZ2+SfJu4NmqOjmtGrZoL/BW4ItVdRmDS9qbQ4UZGMtzGdyN/CDweuBsfrNln0mjHrtZCIWZva9jklcwCITbqurObvVG96echiuB9yT5D+BrDA4hbmFwa/2VG+jMwnieAc5U1Ylu+Q4GITFLY/lO4KdVtVxVLwJ3MhjfWRvLFWO7T+oshML9wCXdLO9ZDCZ37p5yTSQJcCtwqqo+u+qpje5POXFVdXNV7a+qAwzG7XtV9X7gPuC93WZTrRGgqp4Bnkzyxm7VVcAjzNBYMjhsuCLJq7uf/UqNMzWWq4zvPqnTmthZM4lyDfAT4N+Bv552PV1Nf8igJfsh8GD3dQ2DY/bjwGPAd4Hzpl1rV+/bgXu6x78H/BuDe2X+I/DKGajv94Glbjz/CTh31sYS+Bvgx8BDwD8Ar5yFsQRuZzDP8SKDruvwRmPHYKL5C92/pR8xOJuyrc/zikZJjVk4fJA0QwwFSQ1DQVLDUJDUMBQkNQwFSQ1DQVLDUJDU+F8RThXf3CWT1wAAAABJRU5ErkJggg==\n", 82 | "text/plain": [ 83 | "
" 84 | ] 85 | }, 86 | "metadata": {}, 87 | "output_type": "display_data" 88 | }, 89 | { 90 | "name": "stdout", 91 | "output_type": "stream", 92 | "text": [ 93 | "Distance 0.0\n", 94 | "Distance 1.9464946780762098\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "def load_img_as_points(fn, show=False, inked=False): \n", 100 | " \"\"\"\n", 101 | " Load image file and return coordinates of 'inked' pixels in the binary image \n", 102 | " Output:\n", 103 | " D : [n x 2] rows are coordinates\n", 104 | " \"\"\"\n", 105 | " I = plt.imread(fn)\n", 106 | " I = np.array(I,dtype=bool)\n", 107 | "\n", 108 | " if inked:\n", 109 | " I = np.logical_not(I)\n", 110 | " (row,col) = I.nonzero()\n", 111 | " I = I[row.min():row.max(),col.min():col.max()]\n", 112 | " I = np.logical_not(I)\n", 113 | "\n", 114 | " if show:\n", 115 | " plt.imshow(I, cmap=\"gray\")\n", 116 | " plt.show()\n", 117 | "\n", 118 | " return I\n", 119 | "\n", 120 | "\n", 121 | "def test_load_img_as_points():\n", 122 | " img1 = load_img_as_points(os.path.join(DATASET, \"one-shot-classification/run01/training/class01.png\"))\n", 123 | " plt.imshow(img1, cmap=\"gray\")\n", 124 | " plt.show()\n", 125 | " \n", 126 | " img2 = load_img_as_points(os.path.join(DATASET, \"one-shot-classification/run01/training/class02.png\"))\n", 127 | " plt.imshow(img2, cmap=\"gray\")\n", 128 | " plt.show() \n", 129 | " \n", 130 | " print(\"Distance\", mod_hausdorff_distance(img1, img1))\n", 131 | " print(\"Distance\", mod_hausdorff_distance(img1, img2))\n", 132 | "\n", 133 | "test_load_img_as_points()" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 52, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "def classification_run(folder,f_load,f_cost,ftype='cost'):\n", 143 | " # Compute error rate for one run of one-shot classification\n", 144 | " #\n", 145 | " # Input\n", 146 | " # folder : contains images for a run of one-shot classification\n", 147 | " # f_load : itemA = f_load('file.png') should read in the image file and process it\n", 148 | " # f_cost : f_cost(itemA,itemB) should compute similarity between two images, using output of f_load\n", 149 | " # ftype : 'cost' if small values from f_cost mean more similar, or 'score' if large values are more similar\n", 150 | " #\n", 151 | " # Output\n", 152 | " # perror : percent errors (0 to 100% error)\n", 153 | " #\n", 154 | " fname_label = 'class_labels.txt' # where class labels are stored for each run\n", 155 | " assert ((ftype=='cost') | (ftype=='score'))\n", 156 | "\n", 157 | " base_folder = os.path.join(DATASET, \"one-shot-classification\")\n", 158 | " sub_folder = os.path.join(base_folder, folder)\n", 159 | " \n", 160 | " # get file names\n", 161 | " with open(os.path.join(sub_folder, fname_label), \"r\") as f:\n", 162 | " content = f.read().splitlines()\n", 163 | " \n", 164 | " pairs = [line.split() for line in content]\n", 165 | " test_files = [os.path.join(base_folder, pair[0]) for pair in pairs]\n", 166 | " train_files = [os.path.join(base_folder, pair[1]) for pair in pairs]\n", 167 | " answers_files = copy.copy(train_files)\n", 168 | " test_files.sort()\n", 169 | " train_files.sort()\n", 170 | " ntrain = len(train_files)\n", 171 | " ntest = len(test_files)\n", 172 | " \n", 173 | " # load the images (and, if needed, extract features)\n", 174 | " train_items = [f_load(f) for f in train_files]\n", 175 | " test_items = [f_load(f) for f in test_files ]\n", 176 | "\n", 177 | " # compute cost matrix\n", 178 | " costM = np.zeros((ntest,ntrain),float)\n", 179 | " for i 
in range(ntest):\n", 180 | " for c in range(ntrain):\n", 181 | " costM[i,c] = f_cost(test_items[i],train_items[c])\n", 182 | "\n", 183 | " if ftype == 'cost':\n", 184 | " YHAT = np.argmin(costM,axis=1)\n", 185 | " elif ftype == 'score':\n", 186 | " YHAT = np.argmax(costM,axis=1)\n", 187 | " else:\n", 188 | " assert False\n", 189 | " \n", 190 | " # compute the error rate\n", 191 | " correct = 0.0\n", 192 | " for i in range(ntest):\n", 193 | " if train_files[YHAT[i]] == answers_files[i]:\n", 194 | " correct += 1.0\n", 195 | " pcorrect = 100 * correct / ntest\n", 196 | " perror = 100 - pcorrect\n", 197 | "\n", 198 | " return perror" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "## Percentage of error with a simple dist" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 57, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "name": "stdout", 215 | "output_type": "stream", 216 | "text": [ 217 | "Percentage of error 81.25\n" 218 | ] 219 | } 220 | ], 221 | "source": [ 222 | "def run_one_shot_classificaion_with_hausdorff_distance():\n", 223 | " nrun = 20 # number of classification runs\n", 224 | " fname_label = 'class_labels.txt' # where class labels are stored for each run\n", 225 | " \n", 226 | " perror = np.zeros(nrun)\n", 227 | " for r in range(1, nrun+1):\n", 228 | " # Add 0 for number from 0 to 9 (2 -> 02)\n", 229 | " rs = '0' + str(r) if len(str(r)) == 1 else str(r)\n", 230 | " perror[r-1] = classification_run('run'+rs, load_img_as_points, mod_hausdorff_distance, 'cost')\n", 231 | " \n", 232 | " print(\"Percentage of error\", np.mean(perror))\n", 233 | "\n", 234 | "run_one_shot_classificaion_with_hausdorff_distance()" 235 | ] 236 | } 237 | ], 238 | "metadata": { 239 | "kernelspec": { 240 | "display_name": "Python 3", 241 | "language": "python", 242 | "name": "python3" 243 | }, 244 | "language_info": { 245 | "codemirror_mode": { 246 | "name": "ipython", 247 | "version": 3 248 | }, 249 | "file_extension": ".py", 250 | "mimetype": "text/x-python", 251 | "name": "python", 252 | "nbconvert_exporter": "python", 253 | "pygments_lexer": "ipython3", 254 | "version": "3.5.2" 255 | } 256 | }, 257 | "nbformat": 4, 258 | "nbformat_minor": 2 259 | } 260 | -------------------------------------------------------------------------------- /tensorflow/TensorFlow MNIST tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# TensorFlow Demo: MNIST for ML Beginners\n", 8 | "Before start using this, please select `Cell` - `All Output` - `Clear` to clear the old results. 
See [TensorFlow Tutorial](https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html) for details of the tutorial.\n", 9 | "\n", 10 | "# Loading MNIST training data" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [ 18 | { 19 | "name": "stdout", 20 | "output_type": "stream", 21 | "text": [ 22 | "WARNING:tensorflow:From :7: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n", 23 | "Instructions for updating:\n", 24 | "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n", 25 | "WARNING:tensorflow:From /home/thibault/work/env/gpu/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n", 26 | "Instructions for updating:\n", 27 | "Please write your own downloading logic.\n", 28 | "WARNING:tensorflow:From /home/thibault/work/env/gpu/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n", 29 | "Instructions for updating:\n", 30 | "Please use tf.data to implement this functionality.\n", 31 | "Extracting MNIST_data/train-images-idx3-ubyte.gz\n", 32 | "WARNING:tensorflow:From /home/thibault/work/env/gpu/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n", 33 | "Instructions for updating:\n", 34 | "Please use tf.data to implement this functionality.\n", 35 | "Extracting MNIST_data/train-labels-idx1-ubyte.gz\n", 36 | "WARNING:tensorflow:From /home/thibault/work/env/gpu/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n", 37 | "Instructions for updating:\n", 38 | "Please use tf.one_hot on tensors.\n", 39 | "Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n", 40 | "Extracting MNIST_data/t10k-labels-idx1-ubyte.gz\n", 41 | "WARNING:tensorflow:From /home/thibault/work/env/gpu/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n", 42 | "Instructions for updating:\n", 43 | "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n" 44 | ] 45 | } 46 | ], 47 | "source": [ 48 | "# import matplotlib\n", 49 | "import matplotlib.pyplot as plt\n", 50 | "%matplotlib inline\n", 51 | "\n", 52 | "# import MNIST data\n", 53 | "from tensorflow.examples.tutorials.mnist import input_data\n", 54 | "mnist = input_data.read_data_sets(\"MNIST_data/\", one_hot=True)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## Training Images\n", 62 | "![mnist.train.xs](https://www.tensorflow.org/versions/master/images/mnist-train-xs.png)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 2, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/plain": [ 73 | "(55000, 784)" 74 | ] 75 | }, 76 | 
"execution_count": 2, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "# check MNIST training images matrix shape\n", 83 | "mnist.train.images.shape" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 8, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "array([[0. , 0. , 0. , 0. , 0. ,\n", 95 | " 0. , 0. , 0. , 0. , 0. ,\n", 96 | " 0. , 0. , 0. , 0. , 0. ,\n", 97 | " 0. , 0. , 0. , 0. , 0. ,\n", 98 | " 0. , 0. , 0. , 0. , 0. ,\n", 99 | " 0. , 0. , 0. ],\n", 100 | " [0. , 0. , 0. , 0. , 0. ,\n", 101 | " 0. , 0. , 0. , 0. , 0. ,\n", 102 | " 0. , 0. , 0. , 0. , 0. ,\n", 103 | " 0. , 0. , 0. , 0. , 0. ,\n", 104 | " 0. , 0. , 0. , 0. , 0. ,\n", 105 | " 0. , 0. , 0. ],\n", 106 | " [0. , 0. , 0. , 0. , 0. ,\n", 107 | " 0. , 0. , 0. , 0. , 0. ,\n", 108 | " 0. , 0. , 0. , 0. , 0. ,\n", 109 | " 0. , 0. , 0. , 0. , 0. ,\n", 110 | " 0. , 0. , 0. , 0. , 0. ,\n", 111 | " 0. , 0. , 0. ],\n", 112 | " [0. , 0. , 0. , 0. , 0. ,\n", 113 | " 0. , 0. , 0. , 0. , 0. ,\n", 114 | " 0. , 0. , 0. , 0. , 0. ,\n", 115 | " 0. , 0. , 0. , 0. , 0. ,\n", 116 | " 0. , 0. , 0. , 0. , 0. ,\n", 117 | " 0. , 0. , 0. ],\n", 118 | " [0. , 0. , 0. , 0. , 0. ,\n", 119 | " 0. , 0. , 0. , 0. , 0. ,\n", 120 | " 0. , 0. , 0. , 0. , 0. ,\n", 121 | " 0.49411768, 0.8352942 , 0.13333334, 0. , 0. ,\n", 122 | " 0. , 0. , 0. , 0. , 0. ,\n", 123 | " 0. , 0. , 0. ],\n", 124 | " [0. , 0. , 0. , 0. , 0. ,\n", 125 | " 0. , 0. , 0. , 0. , 0. ,\n", 126 | " 0. , 0. , 0. , 0. , 0.07450981,\n", 127 | " 0.89019614, 0.9960785 , 0.32941177, 0. , 0. ,\n", 128 | " 0. , 0. , 0. , 0. , 0. ,\n", 129 | " 0. , 0. , 0. ],\n", 130 | " [0. , 0. , 0. , 0. , 0. ,\n", 131 | " 0. , 0. , 0. , 0. , 0. ,\n", 132 | " 0. , 0. , 0. , 0. , 0.44705886,\n", 133 | " 0.9960785 , 0.9960785 , 0.32941177, 0. , 0. ,\n", 134 | " 0. , 0. , 0. , 0. , 0. ,\n", 135 | " 0. , 0. , 0. ],\n", 136 | " [0. , 0. , 0. , 0. , 0. ,\n", 137 | " 0. , 0. , 0. , 0. , 0. ,\n", 138 | " 0. , 0. , 0. , 0. , 0.69803923,\n", 139 | " 0.9960785 , 0.9960785 , 0.32941177, 0. , 0. ,\n", 140 | " 0. , 0. , 0. , 0. , 0. ,\n", 141 | " 0. , 0. , 0. ],\n", 142 | " [0. , 0. , 0. , 0. , 0. ,\n", 143 | " 0. , 0. , 0. , 0. , 0. ,\n", 144 | " 0. , 0. , 0. , 0.23529413, 0.92549026,\n", 145 | " 0.9960785 , 0.9960785 , 0.32941177, 0. , 0. ,\n", 146 | " 0. , 0. , 0. , 0. , 0. ,\n", 147 | " 0. , 0. , 0. ],\n", 148 | " [0. , 0. , 0. , 0. , 0. ,\n", 149 | " 0. , 0. , 0. , 0. , 0. ,\n", 150 | " 0. , 0. , 0. , 0.30588236, 0.9960785 ,\n", 151 | " 0.9960785 , 0.9960785 , 0.32941177, 0. , 0. ,\n", 152 | " 0. , 0. , 0. , 0. , 0. ,\n", 153 | " 0. , 0. , 0. ],\n", 154 | " [0. , 0. , 0. , 0. , 0. ,\n", 155 | " 0. , 0. , 0. , 0. , 0. ,\n", 156 | " 0. , 0. , 0. , 0.30588236, 0.9960785 ,\n", 157 | " 0.9960785 , 0.9058824 , 0.21960786, 0. , 0. ,\n", 158 | " 0. , 0. , 0. , 0. , 0. ,\n", 159 | " 0. , 0. , 0. ],\n", 160 | " [0. , 0. , 0. , 0. , 0. ,\n", 161 | " 0. , 0. , 0. , 0. , 0. ,\n", 162 | " 0. , 0. , 0. , 0.7490196 , 0.9960785 ,\n", 163 | " 0.9960785 , 0.7254902 , 0. , 0. , 0. ,\n", 164 | " 0. , 0. , 0. , 0. , 0. ,\n", 165 | " 0. , 0. , 0. ],\n", 166 | " [0. , 0. , 0. , 0. , 0. ,\n", 167 | " 0. , 0. , 0. , 0. , 0. ,\n", 168 | " 0. , 0. , 0. , 0.91372555, 0.9960785 ,\n", 169 | " 0.9960785 , 0.41176474, 0. , 0. , 0. ,\n", 170 | " 0. , 0. , 0. , 0. , 0. ,\n", 171 | " 0. , 0. , 0. ],\n", 172 | " [0. , 0. , 0. , 0. , 0. ,\n", 173 | " 0. , 0. , 0. , 0. , 0. ,\n", 174 | " 0. , 0. , 0. 
, 0.91372555, 0.9960785 ,\n", 175 | " 0.9960785 , 0.11764707, 0. , 0. , 0. ,\n", 176 | " 0. , 0. , 0. , 0. , 0. ,\n", 177 | " 0. , 0. , 0. ],\n", 178 | " [0. , 0. , 0. , 0. , 0. ,\n", 179 | " 0. , 0. , 0. , 0. , 0. ,\n", 180 | " 0. , 0. , 0.3647059 , 0.9725491 , 0.9960785 ,\n", 181 | " 0.95294124, 0.10588236, 0. , 0. , 0. ,\n", 182 | " 0. , 0. , 0. , 0. , 0. ,\n", 183 | " 0. , 0. , 0. ],\n", 184 | " [0. , 0. , 0. , 0. , 0. ,\n", 185 | " 0. , 0. , 0. , 0. , 0. ,\n", 186 | " 0. , 0.0627451 , 0.75294125, 0.9960785 , 0.9960785 ,\n", 187 | " 0.50980395, 0. , 0. , 0. , 0. ,\n", 188 | " 0. , 0. , 0. , 0. , 0. ,\n", 189 | " 0. , 0. , 0. ],\n", 190 | " [0. , 0. , 0. , 0. , 0. ,\n", 191 | " 0. , 0. , 0. , 0. , 0. ,\n", 192 | " 0. , 0.1254902 , 0.9960785 , 0.9960785 , 0.9725491 ,\n", 193 | " 0.37254903, 0. , 0. , 0. , 0. ,\n", 194 | " 0. , 0. , 0. , 0. , 0. ,\n", 195 | " 0. , 0. , 0. ],\n", 196 | " [0. , 0. , 0. , 0. , 0. ,\n", 197 | " 0. , 0. , 0. , 0. , 0. ,\n", 198 | " 0. , 0.16078432, 0.9960785 , 0.9960785 , 0.9058824 ,\n", 199 | " 0. , 0. , 0. , 0. , 0. ,\n", 200 | " 0. , 0. , 0. , 0. , 0. ,\n", 201 | " 0. , 0. , 0. ],\n", 202 | " [0. , 0. , 0. , 0. , 0. ,\n", 203 | " 0. , 0. , 0. , 0. , 0. ,\n", 204 | " 0. , 0.7294118 , 0.9960785 , 0.9960785 , 0.627451 ,\n", 205 | " 0. , 0. , 0. , 0. , 0. ,\n", 206 | " 0. , 0. , 0. , 0. , 0. ,\n", 207 | " 0. , 0. , 0. ],\n", 208 | " [0. , 0. , 0. , 0. , 0. ,\n", 209 | " 0. , 0. , 0. , 0. , 0. ,\n", 210 | " 0.08235294, 0.7960785 , 0.9960785 , 0.7372549 , 0.04705883,\n", 211 | " 0. , 0. , 0. , 0. , 0. ,\n", 212 | " 0. , 0. , 0. , 0. , 0. ,\n", 213 | " 0. , 0. , 0. ],\n", 214 | " [0. , 0. , 0. , 0. , 0. ,\n", 215 | " 0. , 0. , 0. , 0. , 0. ,\n", 216 | " 0.3372549 , 0.9960785 , 1. , 0.7019608 , 0.03137255,\n", 217 | " 0.01568628, 0. , 0. , 0. , 0. ,\n", 218 | " 0. , 0. , 0. , 0. , 0. ,\n", 219 | " 0. , 0. , 0. ],\n", 220 | " [0. , 0. , 0. , 0. , 0. ,\n", 221 | " 0. , 0. , 0. , 0. , 0. ,\n", 222 | " 0.3372549 , 0.9960785 , 0.9960785 , 0.9960785 , 0.9960785 ,\n", 223 | " 0.50980395, 0. , 0. , 0. , 0. ,\n", 224 | " 0. , 0. , 0. , 0. , 0. ,\n", 225 | " 0. , 0. , 0. ],\n", 226 | " [0. , 0. , 0. , 0. , 0. ,\n", 227 | " 0. , 0. , 0. , 0. , 0. ,\n", 228 | " 0.26666668, 0.94117653, 0.9960785 , 0.9960785 , 0.9803922 ,\n", 229 | " 0.4039216 , 0. , 0. , 0. , 0. ,\n", 230 | " 0. , 0. , 0. , 0. , 0. ,\n", 231 | " 0. , 0. , 0. ],\n", 232 | " [0. , 0. , 0. , 0. , 0. ,\n", 233 | " 0. , 0. , 0. , 0. , 0. ,\n", 234 | " 0. , 0.36078432, 1. , 0.8117648 , 0.35686275,\n", 235 | " 0. , 0. , 0. , 0. , 0. ,\n", 236 | " 0. , 0. , 0. , 0. , 0. ,\n", 237 | " 0. , 0. , 0. ],\n", 238 | " [0. , 0. , 0. , 0. , 0. ,\n", 239 | " 0. , 0. , 0. , 0. , 0. ,\n", 240 | " 0. , 0. , 0. , 0. , 0. ,\n", 241 | " 0. , 0. , 0. , 0. , 0. ,\n", 242 | " 0. , 0. , 0. , 0. , 0. ,\n", 243 | " 0. , 0. , 0. ],\n", 244 | " [0. , 0. , 0. , 0. , 0. ,\n", 245 | " 0. , 0. , 0. , 0. , 0. ,\n", 246 | " 0. , 0. , 0. , 0. , 0. ,\n", 247 | " 0. , 0. , 0. , 0. , 0. ,\n", 248 | " 0. , 0. , 0. , 0. , 0. ,\n", 249 | " 0. , 0. , 0. ],\n", 250 | " [0. , 0. , 0. , 0. , 0. ,\n", 251 | " 0. , 0. , 0. , 0. , 0. ,\n", 252 | " 0. , 0. , 0. , 0. , 0. ,\n", 253 | " 0. , 0. , 0. , 0. , 0. ,\n", 254 | " 0. , 0. , 0. , 0. , 0. ,\n", 255 | " 0. , 0. , 0. ],\n", 256 | " [0. , 0. , 0. , 0. , 0. ,\n", 257 | " 0. , 0. , 0. , 0. , 0. ,\n", 258 | " 0. , 0. , 0. , 0. , 0. ,\n", 259 | " 0. , 0. , 0. , 0. , 0. ,\n", 260 | " 0. , 0. , 0. , 0. , 0. ,\n", 261 | " 0. , 0. , 0. 
]], dtype=float32)" 262 | ] 263 | }, 264 | "execution_count": 8, 265 | "metadata": {}, 266 | "output_type": "execute_result" 267 | } 268 | ], 269 | "source": [ 270 | "# check MNIST training images matrix data\n", 271 | "sample_img = mnist.train.images[6].reshape(28, 28)\n", 272 | "sample_img" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 9, 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "data": { 282 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADFRJREFUeJzt3V2IXOUdx/HfL9qKmIDRTJdotZsWKSxC0zIEIWpS+4KVQiyINhclBWm8aLBCLyr2ol5FLbUlQqmkGppKa1toY3KhtWlokIIUV0l9rTWVDSbE7ARfqviSav692JOyjTtnx5lz5szm//3AsGfOc+acPyf72+fMeSbzOCIEIJ9FTRcAoBmEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUqcP82DLli2L8fHxYR4SSGVqakpHjx51L9sOFH7bV0raIuk0SfdExO1l24+Pj2tycnKQQwIo0W63e96278t+26dJ+qmkr0iakLTe9kS/+wMwXIO8518laX9EvBgRxyT9RtK6asoCULdBwn++pJdmPT9YrPs/tjfanrQ92el0BjgcgCrVfrc/IrZGRDsi2q1Wq+7DAejRIOE/JOmCWc8/XqwDsAAMEv7HJF1ke4Xtj0r6uqRd1ZQFoG59D/VFxHu2N0l6WDNDfdsi4pnKKgNQq4HG+SPiQUkPVlQLgCHi471AUoQfSIrwA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJDXUKbpx6rn77rtL2++5556ubbt37y597dKlS/uqCb2h5weSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAYa57c9JekNSe9Lei8i2lUUhdHx9ttvl7bfcccdpe0HDhzo2vb888+XvvaSSy4pbcdgqviQz+cj4mgF+wEwRFz2A0kNGv6Q9Cfbj9veWEVBAIZj0Mv+SyPikO2PSdpt+x8R8cjsDYo/Chsl6cILLxzwcACqMlDPHxGHip/TknZIWjXHNlsjoh0R7VarNcjhAFSo7/DbPsv2khPLkr4s6emqCgNQr0Eu+8ck7bB9Yj+/jog/VlIVgNr1Hf6IeFHSZyqsBSPogQceKG0vG8fHaGOoD0iK8ANJEX4gKcIPJEX4gaQIP5AUX92NUg8//HDTJaAm9PxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/Mnt37+/tP2hhx4aUiUYNnp+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iKcf7kXn/99dL2TqczpEowbPT8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5DUvOP8trdJ+qqk6Yi4uFh3jqTfShqXNCXp2oh4tb4ysVCdd955XdvGxsaGWAlO1kvP/wtJV5607mZJeyLiIkl7iucAFpB5wx8Rj0h65aTV6yRtL5a3S7q64roA1Kzf9/xjEXG4WH5ZEtdvwAIz8A2/iAhJ0a3d9kbbk7Yn+Zw4MDr6Df8R28slqfg53W3DiNgaEe2IaLdarT4PB6Bq/YZ/l6QNxfIGSTurKQfAsMwbftv3S3pU0qdtH7R9vaTbJX3J9guSvlg8B7CAzDvOHxHruzR9oeJa0IDNmzfXuv/LLrusa9uKFStqPTbK8Qk/ICnCDyRF+IGkCD+QFOEHkiL8QFJ8dXdyjz76aK37X7++20gxmkbPDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJMc6PWq1Zs6bpEtAFPT+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJMU4/ylu7969pe2vvjrYzOqLFy8ubV+0iP5lVPEvAyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJzTvOb3ubpK9Kmo6Ii4t1t0r6lqROsdktEfFgXUWi3DvvvNO1bcuWLaWvfffddwc69nxTfC9ZsmSg/aM+vfT8v5B05RzrfxIRK4sHwQcWmHnDHxGPSHplCLUAGKJB3vNvsv2k7W22l1ZWEYCh6Df8P5P0KUkrJR2WdGe3DW1vtD1pe7LT6XTbDMCQ9RX+iDgSEe9HxHFJP5e0qmTbrRHRjoh2q9Xqt04AFesr/LaXz3r6NUlPV1MOgGHpZajvfklrJS2zfVDSDySttb1SUkiaknRDjTUCqMG84Y+IuSZYv7eGWtCnsnH+nTt3DrTvM844o7R9YmJioP2jOXzCD0iK8ANJEX4gKcIPJEX4gaQIP5AUX919Cjh27Fht+z733HNL26+44orajo160fMDSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKM858Cbrzxxtr2fd1119W2bzSLnh9IivADSRF+ICnCDyRF+IGkCD+QFOEHkmKcfwF46623Stunp6f73vc111xT2n7bbbf1vW+MNnp+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0hq3nF+2xdI+qWkMUkhaWtEbLF9jqTfShqXNCXp2oh4tb5S89q3b19p+969e7u2RUTpa88888zS9kWLyvuH48ePD/R6NKeXf5n3JH03IiYkXSLp27YnJN0saU9EXCRpT/EcwAIxb/gj4nBEPFEsvyHpOUnnS1onaXux2XZJV9dVJIDqfahrMtvjkj4r6W+SxiLicNH0smbeFgBYIHoOv+3Fkn4v6aaI+Pfstph5Yznnm0vbG21P2p7sdDoDFQugOj2F3/ZHNBP8X0XEH4rVR2wvL9qXS5rzf5dExNaIaEdEu9VqVVEzgArMG37blnSvpOci4sezmnZJ2lAsb5C0s/ryANSll//Su1rSNyQ9ZfvEmNMtkm6X9Dvb10s6IOnaekrEIGb+dnd33333DdR+1113lbZv2rSptB3NmTf8EfFXSd1+g75QbTkAhoVPYABJEX4gKcIPJEX4gaQIP5AU4QeS4qu7F4Czzz677/bXXnttoGOffnr5r8j4+PhA+0dz6PmBpAg/kBThB5Ii/EBShB9IivADSRF+ICnG+ReAiYmJ0vYdO3Z0bVu7dm3pa1evXl3avnnz5tL2yy+/vLQdo4ueH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSYpz/FLBmzZqubfNN0Y286PmBpAg/
kBThB5Ii/EBShB9IivADSRF+IKl5w2/7Att/sf2s7Wdsf6dYf6vtQ7b3FY+r6i8XQFV6+ZDPe5K+GxFP2F4i6XHbu4u2n0TEj+orD0Bd5g1/RByWdLhYfsP2c5LOr7swAPX6UO/5bY9L+qykvxWrNtl+0vY220u7vGaj7Unbk51OZ6BiAVSn5/DbXizp95Juioh/S/qZpE9JWqmZK4M753pdRGyNiHZEtFutVgUlA6hCT+G3/RHNBP9XEfEHSYqIIxHxfkQcl/RzSavqKxNA1Xq5229J90p6LiJ+PGv98lmbfU3S09WXB6AuvdztXy3pG5Kesr2vWHeLpPW2V0oKSVOSbqilQgC16OVu/18leY6mB6svB8Cw8Ak/ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUh7mFM62O5IOzFq1TNLRoRXw4YxqbaNal0Rt/aqytk9ERE/flzfU8H/g4PZkRLQbK6DEqNY2qnVJ1Navpmrjsh9IivADSTUd/q0NH7/MqNY2qnVJ1NavRmpr9D0/gOY03fMDaEgj4bd9pe3nbe+3fXMTNXRje8r2U8XMw5MN17LN9rTtp2etO8f2btsvFD/nnCatodpGYubmkpmlGz13ozbj9dAv+22fJumfkr4k6aCkxyStj4hnh1pIF7anJLUjovExYduXS3pT0i8j4uJi3Q8lvRIRtxd/OJdGxPdGpLZbJb3Z9MzNxYQyy2fPLC3paknfVIPnrqSua9XAeWui518laX9EvBgRxyT9RtK6BuoYeRHxiKRXTlq9TtL2Ynm7Zn55hq5LbSMhIg5HxBPF8huSTsws3ei5K6mrEU2E/3xJL816flCjNeV3SPqT7cdtb2y6mDmMFdOmS9LLksaaLGYO887cPEwnzSw9Mueunxmvq8YNvw+6NCI+J+krkr5dXN6OpJh5zzZKwzU9zdw8LHPMLP0/TZ67fme8rloT4T8k6YJZzz9erBsJEXGo+DktaYdGb/bhIycmSS1+Tjdcz/+M0szNc80srRE4d6M043UT4X9M0kW2V9j+qKSvS9rVQB0fYPus4kaMbJ8l6csavdmHd0naUCxvkLSzwVr+z6jM3NxtZmk1fO5GbsbriBj6Q9JVmrnj/y9J32+ihi51fVLS34vHM03XJul+zVwG/kcz90aul3SupD2SXpD0Z0nnjFBt90l6StKTmgna8oZqu1Qzl/RPStpXPK5q+tyV1NXIeeMTfkBS3PADkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5DUfwHXgNMxwvvjUAAAAABJRU5ErkJggg==\n", 283 | "text/plain": [ 284 | "
" 285 | ] 286 | }, 287 | "metadata": {}, 288 | "output_type": "display_data" 289 | } 290 | ], 291 | "source": [ 292 | "# plot the image\n", 293 | "plt.imshow(sample_img).set_cmap('Greys')" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "## Training Labels\n", 301 | "![mnist.train.ys](https://www.tensorflow.org/versions/master/images/mnist-train-ys.png)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 6, 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "data": { 311 | "text/plain": [ 312 | "(55000, 10)" 313 | ] 314 | }, 315 | "execution_count": 6, 316 | "metadata": {}, 317 | "output_type": "execute_result" 318 | } 319 | ], 320 | "source": [ 321 | "# check MNIST labels shape\n", 322 | "mnist.train.labels.shape" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 10, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "data": { 332 | "text/plain": [ 333 | "array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.])" 334 | ] 335 | }, 336 | "execution_count": 10, 337 | "metadata": {}, 338 | "output_type": "execute_result" 339 | } 340 | ], 341 | "source": [ 342 | "# show MNIST label data\n", 343 | "sample_label = mnist.train.labels[6]\n", 344 | "sample_label" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "# Defining a Neural Network" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "## in a graph:\n", 359 | "![](https://www.tensorflow.org/versions/master/images/softmax-regression-scalargraph.png)\n", 360 | "\n", 361 | "## in a vector equation:\n", 362 | "![](https://www.tensorflow.org/versions/master/images/softmax-regression-vectorequation.png)\n", 363 | "\n", 364 | "## so that we'll have the weights like this:\n", 365 | "blue: positive weights, red: negative weights\n", 366 | "![](https://www.tensorflow.org/versions/master/images/softmax-weights.png)" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 11, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "data": { 376 | "text/plain": [ 377 | "" 378 | ] 379 | }, 380 | "execution_count": 11, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "# define a neural network (softmax logistic regression)\n", 387 | "import tensorflow as tf\n", 388 | "\n", 389 | "x = tf.placeholder(tf.float32, [None, 784])\n", 390 | "\n", 391 | "W = tf.Variable(tf.zeros([784, 10]))\n", 392 | "b = tf.Variable(tf.zeros([10]))\n", 393 | "\n", 394 | "y = tf.nn.softmax(tf.matmul(x, W) + b) # the equation\n", 395 | "y" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "# Defining the Train Step" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 12, 408 | "metadata": {}, 409 | "outputs": [ 410 | { 411 | "data": { 412 | "text/plain": [ 413 | "" 414 | ] 415 | }, 416 | "execution_count": 12, 417 | "metadata": {}, 418 | "output_type": "execute_result" 419 | } 420 | ], 421 | "source": [ 422 | "# define the train step to minimize the cross entropy with SGD\n", 423 | "y_ = tf.placeholder(tf.float32, [None, 10])\n", 424 | "\n", 425 | "cross_entropy = -tf.reduce_sum(y_*tf.log(y))\n", 426 | "\n", 427 | "train_step = tf.train.GradientDescentOptimizer(0.0001).minimize(cross_entropy)\n", 428 | "train_step" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "## Use Gradient 
Descent to find the optimal weights\n", 436 | "![](http://blog.datumbox.com/wp-content/uploads/2013/10/gradient-descent.png)\n", 437 | "From: [Machine Learning Blog & Software Development News](http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/)" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "# Run 1000 steps of mini-batch training" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 14, 450 | "metadata": {}, 451 | "outputs": [ 452 | { 453 | "name": "stdout", 454 | "output_type": "stream", 455 | "text": [ 456 | "done\n" 457 | ] 458 | } 459 | ], 460 | "source": [ 461 | "# initialize variables and session\n", 462 | "init = tf.global_variables_initializer()\n", 463 | "sess = tf.Session()\n", 464 | "sess.run(init)\n", 465 | "\n", 466 | "# train the model with mini-batches of 100 examples, 1000 times\n", 467 | "for i in range(1000):\n", 468 | " batch_xs, batch_ys = mnist.train.next_batch(100)\n", 469 | " sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})\n", 470 | " \n", 471 | "print(\"done\")" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "# Test" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 16, 484 | "metadata": {}, 485 | "outputs": [ 486 | { 487 | "name": "stdout", 488 | "output_type": "stream", 489 | "text": [ 490 | "0.8695\n" 491 | ] 492 | } 493 | ], 494 | "source": [ 495 | "# evaluate the accuracy of the model\n", 496 | "correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))\n", 497 | "accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n", 498 | "print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "# TODO\n", 506 | "\n", 507 | "* Add a cell where you run an initialization and print the value of the weights before and after one optimization step. Do the weights actually change between before and after the optimization?\n", 508 | "\n", 509 | "\n", 510 | "* Modify this notebook to display two curves showing the progression of the loss/accuracy on the training/test sets once training is finished. You can use matplotlib.pyplot for this (a sketch is given after this notebook).\n", 511 | "\n", 512 | "\n", 513 | "* Which curve seems to dominate and obtain the best results?\n", 514 | "\n", 515 | "\n", 516 | "* Print logs of the training and test progression during training.\n", 517 | "\n", 518 | "\n", 519 | "* Can you think of a different error formula? Implement it and compare the results.\n", 520 | "\n", 521 | "\n", 522 | "* Change the learning rate from 0.0001 to 0.1. What are the consequences? Why?\n", 523 | "\n", 524 | "\n", 525 | "* Add a seed to Python and a seed to TensorFlow. What can you deduce about the importance of the seed in a training run?\n", 526 | "\n", 527 | "\n", 528 | "* Now, find the most favorable learning rate.\n", 529 | "\n", 530 | "\n", 531 | "* Try adding a hidden layer made of 16 neurons. The structure will thus go from 784 -> 10 to 784 -> 16 -> 10. The intermediate layer must be made of weights and a bias. Is initializing the weights to zero a good idea with two layers? Why? Can you reach the same performance with the additional layer? What can you deduce from this? (A sketch of the two-layer model is given after this notebook.)\n", 532 | "\n", 533 | "\n", 534 | "* There are other gradient-descent-based algorithms for optimizing your variables; try replacing the current one with a more sophisticated version. Read up on the exact differences." 535 | ] 536 | } 537 | ], 538 | "metadata": { 539 | "kernelspec": { 540 | "display_name": "Python 3", 541 | "language": "python", 542 | "name": "python3" 543 | }, 544 | "language_info": { 545 | "codemirror_mode": { 546 | "name": "ipython", 547 | "version": 3 548 | }, 549 | "file_extension": ".py", 550 | "mimetype": "text/x-python", 551 | "name": "python", 552 | "nbconvert_exporter": "python", 553 | "pygments_lexer": "ipython3", 554 | "version": "3.5.2" 555 | } 556 | }, 557 | "nbformat": 4, 558 | "nbformat_minor": 1 559 | } 560 | --------------------------------------------------------------------------------