├── 400by400.png ├── favicon.ico ├── google38bcc26191576bef.html ├── .gitignore ├── a06740a73b693e9847f815050642ff0f.html ├── assets ├── 400by400.png └── google-play-badge.png ├── about.markdown ├── 404.html ├── LICENSE ├── Gemfile ├── Gemfile.lock ├── _config.yml ├── README.md └── index.markdown /400by400.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai-boson/mcts/HEAD/400by400.png -------------------------------------------------------------------------------- /favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai-boson/mcts/HEAD/favicon.ico -------------------------------------------------------------------------------- /google38bcc26191576bef.html: -------------------------------------------------------------------------------- 1 | google-site-verification: google38bcc26191576bef.html -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | _site 2 | .sass-cache 3 | .jekyll-cache 4 | .jekyll-metadata 5 | vendor 6 | -------------------------------------------------------------------------------- /a06740a73b693e9847f815050642ff0f.html: -------------------------------------------------------------------------------- 1 | site-verification: a06740a73b693e9847f815050642ff0f -------------------------------------------------------------------------------- /assets/400by400.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/400by400.png -------------------------------------------------------------------------------- /assets/google-play-badge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/google-play-badge.png 
-------------------------------------------------------------------------------- /about.markdown: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: About 4 | permalink: /about/ 5 | --- 6 | 7 | I am interested in Algorithms, Game Design, Game Engines and Coding in general. I like making games and do game development in my free time. If you have any questions or want to collaborate on projects, contact me at bosonicstudios@gmail.com 8 | 9 | -------------------------------------------------------------------------------- /404.html: -------------------------------------------------------------------------------- 1 | --- 2 | permalink: /404.html 3 | layout: default 4 | --- 5 | 6 | 19 | 20 |
21 | 404 22 | 23 | Page not found :( 24 | The requested page could not be found. 25 |
26 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 ai-boson 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "https://rubygems.org" 2 | # Hello! This is where you manage which Jekyll version is used to run. 3 | # When you want to use a different version, change it below, save the 4 | # file and run `bundle install`. Run Jekyll with `bundle exec`, like so: 5 | # 6 | # bundle exec jekyll serve 7 | # 8 | # This will help ensure the proper Jekyll version is running. 9 | # Happy Jekylling! 10 | gem "jekyll", "~> 4.2.0" 11 | # This is the default theme for new Jekyll sites. 
You may change this to anything you like. 12 | gem "minima", "~> 2.5" 13 | # If you want to use GitHub Pages, remove the "gem "jekyll"" above and 14 | # uncomment the line below. To upgrade, run `bundle update github-pages`. 15 | # gem "github-pages", group: :jekyll_plugins 16 | # If you have any plugins, put them here! 17 | group :jekyll_plugins do 18 | gem "jekyll-feed", "~> 0.12" 19 | end 20 | 21 | # Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem 22 | # and associated library. 23 | platforms :mingw, :x64_mingw, :mswin, :jruby do 24 | gem "tzinfo", "~> 1.2" 25 | gem "tzinfo-data" 26 | end 27 | 28 | # Performance-booster for watching directories on Windows 29 | gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin] 30 | 31 | -------------------------------------------------------------------------------- /Gemfile.lock: -------------------------------------------------------------------------------- 1 | GEM 2 | remote: https://rubygems.org/ 3 | specs: 4 | addressable (2.7.0) 5 | public_suffix (>= 2.0.2, < 5.0) 6 | colorator (1.1.0) 7 | concurrent-ruby (1.1.8) 8 | em-websocket (0.5.2) 9 | eventmachine (>= 0.12.9) 10 | http_parser.rb (~> 0.6.0) 11 | eventmachine (1.2.7-x64-mingw32) 12 | ffi (1.14.2-x64-mingw32) 13 | forwardable-extended (2.6.0) 14 | http_parser.rb (0.6.0) 15 | i18n (1.8.8) 16 | concurrent-ruby (~> 1.0) 17 | jekyll (4.2.0) 18 | addressable (~> 2.4) 19 | colorator (~> 1.0) 20 | em-websocket (~> 0.5) 21 | i18n (~> 1.0) 22 | jekyll-sass-converter (~> 2.0) 23 | jekyll-watch (~> 2.0) 24 | kramdown (~> 2.3) 25 | kramdown-parser-gfm (~> 1.0) 26 | liquid (~> 4.0) 27 | mercenary (~> 0.4.0) 28 | pathutil (~> 0.9) 29 | rouge (~> 3.0) 30 | safe_yaml (~> 1.0) 31 | terminal-table (~> 2.0) 32 | jekyll-feed (0.15.1) 33 | jekyll (>= 3.7, < 5.0) 34 | jekyll-sass-converter (2.1.0) 35 | sassc (> 2.0.1, < 3.0) 36 | jekyll-seo-tag (2.7.1) 37 | jekyll (>= 3.8, < 5.0) 38 | jekyll-watch (2.2.1) 39 | listen (~> 3.0) 40 | kramdown 
(2.3.0) 41 | rexml 42 | kramdown-parser-gfm (1.1.0) 43 | kramdown (~> 2.0) 44 | liquid (4.0.3) 45 | listen (3.4.1) 46 | rb-fsevent (~> 0.10, >= 0.10.3) 47 | rb-inotify (~> 0.9, >= 0.9.10) 48 | mercenary (0.4.0) 49 | minima (2.5.1) 50 | jekyll (>= 3.5, < 5.0) 51 | jekyll-feed (~> 0.9) 52 | jekyll-seo-tag (~> 2.1) 53 | pathutil (0.16.2) 54 | forwardable-extended (~> 2.6) 55 | public_suffix (4.0.6) 56 | rb-fsevent (0.10.4) 57 | rb-inotify (0.10.1) 58 | ffi (~> 1.0) 59 | rexml (3.2.4) 60 | rouge (3.26.0) 61 | safe_yaml (1.0.5) 62 | sassc (2.4.0-x64-mingw32) 63 | ffi (~> 1.9) 64 | terminal-table (2.0.0) 65 | unicode-display_width (~> 1.1, >= 1.1.1) 66 | thread_safe (0.3.6) 67 | tzinfo (1.2.9) 68 | thread_safe (~> 0.1) 69 | tzinfo-data (1.2021.1) 70 | tzinfo (>= 1.0.0) 71 | unicode-display_width (1.7.0) 72 | wdm (0.1.1) 73 | 74 | PLATFORMS 75 | x64-mingw32 76 | 77 | DEPENDENCIES 78 | jekyll (~> 4.2.0) 79 | jekyll-feed (~> 0.12) 80 | minima (~> 2.5) 81 | tzinfo (~> 1.2) 82 | tzinfo-data 83 | wdm (~> 0.1.1) 84 | 85 | BUNDLED WITH 86 | 2.2.8 87 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | # Welcome to Jekyll! 2 | # 3 | # This config file is meant for settings that affect your whole blog, values 4 | # which you are expected to set up once and rarely edit after that. If you find 5 | # yourself editing this file very often, consider using Jekyll's data files 6 | # feature for the data you need to update frequently. 7 | # 8 | # For technical reasons, this file is *NOT* reloaded automatically when you use 9 | # 'bundle exec jekyll serve'. If you change this file, please restart the server process. 
10 | # 11 | # If you need help with YAML syntax, here are some quick references for you: 12 | # https://learn-the-web.algonquindesign.ca/topics/markdown-yaml-cheat-sheet/#yaml 13 | # https://learnxinyminutes.com/docs/yaml/ 14 | # 15 | # Site settings 16 | # These are used to personalize your new site. If you look in the HTML files, 17 | # you will see them accessed via {{ site.title }}, {{ site.email }}, and so on. 18 | # You can create any custom variable you would like, and they will be accessible 19 | # in the templates via {{ site.myvariable }}. 20 | 21 | title: Monte Carlo Tree Search (MCTS) algorithm tutorial and its explanation with Python code. 22 | email: bosonicstudios@gmail.com 23 | description: >- 24 | Apply the Monte Carlo Tree Search (MCTS) algorithm to create a strong A.I for a simple game. 25 | MCTS algorithm tutorial with Python code for students with no background 26 | in Computer Science or Machine Learning. Adapt it to board games like Go, 27 | Sudo Tic Tac Toe or Chess. 28 | baseurl: "/mcts" # the subpath of your site, e.g. /blog 29 | url: "https://ai-boson.github.io" # the base hostname & protocol for your site, e.g. http://example.com 30 | twitter_username: 31 | github_username: ai-boson 32 | 33 | # Build settings 34 | theme: minima 35 | plugins: 36 | - jekyll-feed 37 | 38 | # Exclude from processing. 39 | # The following items will not be processed, by default. 40 | # Any item listed under the `exclude:` key here will be automatically added to 41 | # the internal "default list". 42 | # 43 | # Excluded items can be processed by explicitly listing the directories or 44 | # their entries' file path in the `include:` list.
45 | # 46 | # exclude: 47 | # - .sass-cache/ 48 | # - .jekyll-cache/ 49 | # - gemfiles/ 50 | # - Gemfile 51 | # - Gemfile.lock 52 | # - node_modules/ 53 | # - vendor/bundle/ 54 | # - vendor/cache/ 55 | # - vendor/gems/ 56 | # - vendor/ruby/ 57 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Monte Carlo Tree Search (MCTS) algorithm tutorial with Python code. Application of MCTS to create A.I for a simple game. 2 | 3 | 4 | 5 | In this tutorial we explain the Monte Carlo Tree Search algorithm and each part of the code. We recently applied MCTS to develop the A.I for our game. 6 | 7 | The code is general and only assumes familiarity with basic Python. We explain it with respect to our game. If you want to use it for your own project or game, you will have to modify the game-specific functions mentioned below. 8 | 9 | 10 | 11 | Here is the Play Store link to the game: [Sudo Tic Tac Toe][jekyll-talk]. 12 | 13 | Rules for our game (mode 1) are as follows: 14 | 15 | 1. The game is played on a 9 by 9 grid like Sudoku. 16 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards). 17 | 3. The aim of the game is to win any one of the 9 local boards. 18 | 4. Your move determines in which local board the A.I has to make its move, and vice versa. 19 | 5. For example, suppose you make a move in position 1 of local board number 5. 20 | This forces the A.I to make its move in local board number 1. 21 | 6. The rules of normal Tic Tac Toe apply within each local board. 22 | 23 | ## Why MCTS? 24 | 25 | As you may have seen, this game has a very high branching factor. On the first move the entire board is empty, so there are 81 possible moves. On the second move, by rule 4, there are 8 or 9 possible moves. 26 | 27 | For the first 2 moves this gives at most 81*9 = 729 possible combinations (exactly 720, because the 9 first moves played in position k of local board k send the A.I back into that same local board, leaving it only 8 empty spots).
Thus the number of possible combinations multiplies as the game progresses. For both modes of our game the branching factor is very high. For games with such a high branching factor, a full minimax search is impractical, whereas MCTS works well for this kind of game. 28 | 29 | Also, as you may have noticed from playing the game, the A.I takes only about a second to make a move. MCTS can be stopped after a fixed number of simulations (100 in the code below), which keeps its thinking time per move bounded. MCTS has been applied to both modes of the game. 30 | 31 | [![homepage](/assets/google-play-badge.png)][2] 32 | 33 | Below we walk through the MCTS code in Python. 34 | First we import numpy and defaultdict. 35 | 36 | ```python 37 | import numpy as np 38 | from collections import defaultdict 39 | ``` 40 | Define the MCTS node class as shown below. 41 | 42 | ```python 43 | class MonteCarloTreeSearchNode(): 44 | def __init__(self, state, parent=None, parent_action=None): 45 | self.state = state 46 | self.parent = parent 47 | self.parent_action = parent_action 48 | self.children = [] 49 | self._number_of_visits = 0 50 | self._results = defaultdict(int) 51 | self._results[1] = 0 52 | self._results[-1] = 0 53 | self._untried_actions = None 54 | self._untried_actions = self.untried_actions() 55 | return 56 | 57 | ``` 58 | ### The constructor initializes the following variables. 59 | - **state**: For our game it represents the board state. The board state is usually represented by an array; for normal Tic Tac Toe, it is a 3 by 3 array. 60 | - **parent**: None for the root node; for every other node, the node it was expanded from. The root is the position the A.I is currently choosing a move for. 61 | - **children**: The child nodes expanded so far, one per tried action. Once the node is fully expanded there is one child per legal action; for the second turn in our game this is 9 or 8, depending on where you made your move. 62 | - **parent_action**: None for the root node; for every other node, the action its parent carried out to reach it.
63 | - **_number_of_visits**: Number of times the current node has been visited. 64 | - **_results**: A dictionary counting the outcomes of the rollouts that passed through this node, keyed by result (1 for a win, -1 for a loss, 0 for a tie). 65 | - **_untried_actions**: The legal actions from this state that have not yet been expanded into child nodes. 66 | - **action**: Not stored on the node; the move passed to expand() and move() to be carried out. 67 | 68 | The class consists of the following member functions. All the functions below are members of MonteCarloTreeSearchNode except main() and the game-specific functions get_legal_actions(), is_game_over(), game_result() and move(), which belong to your game-state class and are called through self.state. 69 | 70 | ```python 71 | def untried_actions(self): 72 | 73 | self._untried_actions = self.state.get_legal_actions() 74 | return self._untried_actions 75 | ``` 76 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions; for the second turn there are 8 or 9, depending on the first move. 77 | 78 | ```python 79 | def q(self): 80 | wins = self._results[1] 81 | losses = self._results[-1] 82 | return wins - losses 83 | ``` 84 | Returns the number of wins minus the number of losses. 85 | 86 | ```python 87 | def n(self): 88 | return self._number_of_visits 89 | ``` 90 | Returns the number of times this node has been visited. 91 | 92 | ```python 93 | def expand(self): 94 | 95 | action = self._untried_actions.pop() 96 | next_state = self.state.move(action) 97 | child_node = MonteCarloTreeSearchNode( 98 | next_state, parent=self, parent_action=action) 99 | 100 | self.children.append(child_node) 101 | return child_node 102 | ``` 103 | Pops one untried action, generates the next state by applying that action to the present state, appends the resulting child node to the children array and returns it. Each call expands exactly one child, so a node becomes fully expanded after one call per legal action. 104 | 105 | ```python 106 | def is_terminal_node(self): 107 | return self.state.is_game_over() 108 | ``` 109 | Checks whether the current node is terminal, that is, whether the game is over in this state.
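The node class never inspects the board itself; it only calls get_legal_actions(), move(), is_game_over() and game_result() on self.state. As a minimal sketch of such a state class, assuming plain 3 by 3 Tic Tac Toe rather than Sudo Tic Tac Toe, one possible implementation follows. The class name TicTacToeState, the 1/-1 player encoding and scoring game_result() from x's point of view are assumptions made for illustration, not the tutorial's own code.

```python
import numpy as np

# Hypothetical game-state class for plain 3x3 Tic Tac Toe (not the tutorial's game).
# Cells: 0 = empty, 1 = x (player 1), -1 = o (player 2).
class TicTacToeState:
    def __init__(self, board=None, next_to_move=1):
        self.board = np.zeros((3, 3), dtype=int) if board is None else board
        self.next_to_move = next_to_move

    def _winner(self):
        # Sum every column, row and diagonal; +3 or -3 means three in a row.
        lines = list(self.board.sum(axis=0)) + list(self.board.sum(axis=1))
        lines += [np.trace(self.board), np.trace(np.fliplr(self.board))]
        if 3 in lines:
            return 1
        if -3 in lines:
            return -1
        return 0

    def get_legal_actions(self):
        # Every empty cell is a legal action, given as a (row, col) tuple.
        return [(int(r), int(c)) for r, c in zip(*np.where(self.board == 0))]

    def is_game_over(self):
        # Over when someone has three in a row or no empty cell is left.
        return self._winner() != 0 or not (self.board == 0).any()

    def game_result(self):
        # 1 if x won, -1 if o won, 0 for a tie (from x's point of view).
        return self._winner()

    def move(self, action):
        # Return a NEW state so that nodes in the tree never share a board.
        row, col = action
        new_board = self.board.copy()
        new_board[row, col] = self.next_to_move
        return TicTacToeState(new_board, -self.next_to_move)
```

move() returns a new state instead of editing the board in place, because every node keeps a reference to its own state. Placing x in row 2, column 3 is `TicTacToeState().move((1, 2))` with 0-based indices.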
110 | 111 | ```python 112 | def rollout(self): 113 | current_rollout_state = self.state 114 | 115 | while not current_rollout_state.is_game_over(): 116 | 117 | possible_moves = current_rollout_state.get_legal_actions() 118 | 119 | action = self.rollout_policy(possible_moves) 120 | current_rollout_state = current_rollout_state.move(action) 121 | return current_rollout_state.game_result() 122 | ``` 123 | From the current state, the entire game is simulated until it ends, and the outcome is returned: 1 for a win, -1 for a loss and 0 for a tie. If every move in the simulation is selected at random from the set of possible moves, it is called a light playout. 124 | 125 | ```python 126 | def backpropagate(self, result): 127 | self._number_of_visits += 1 128 | self._results[result] += 1 129 | if self.parent: 130 | self.parent.backpropagate(result) 131 | ``` 132 | In this step the statistics of every node from this node up to the root are updated: the visit count of each node is incremented by 1, and so is the count stored for the result. A result of 1 (a win) increments the win count, a result of -1 (a loss) increments the loss count, and a tie increments the count stored under 0. 133 | 134 | ```python 135 | def is_fully_expanded(self): 136 | return len(self._untried_actions) == 0 137 | 138 | ``` 139 | expand() pops the actions out of _untried_actions one at a time. When the list becomes empty, the node is fully expanded. 140 | 141 | ```python 142 | def best_child(self, c_param=0.1): 143 | 144 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children] 145 | return self.children[np.argmax(choices_weights)] 146 | 147 | ``` 148 | Once a node is fully expanded, this function selects the best child out of the children array.
The first term, the average result q/n, corresponds to exploitation; the second term, which is larger for rarely visited children, corresponds to exploration, and c_param controls its weight. 149 | 150 | ```python 151 | def rollout_policy(self, possible_moves): 152 | 153 | return possible_moves[np.random.randint(len(possible_moves))] 154 | 155 | ``` 156 | Selects a move uniformly at random from the possible moves. This is an example of a random (light) playout. 157 | 158 | ```python 159 | def _tree_policy(self): 160 | 161 | current_node = self 162 | while not current_node.is_terminal_node(): 163 | 164 | if not current_node.is_fully_expanded(): 165 | return current_node.expand() 166 | else: 167 | current_node = current_node.best_child() 168 | return current_node 169 | ``` 170 | Selects the node to run the rollout from: starting at the current node, it follows best_child() down through fully expanded nodes and expands the first node that still has untried actions, or stops at a terminal node. 171 | 172 | ```python 173 | def best_action(self): 174 | simulation_no = 100 175 | 176 | 177 | for i in range(simulation_no): 178 | 179 | v = self._tree_policy() 180 | reward = v.rollout() 181 | v.backpropagate(reward) 182 | 183 | return self.best_child(c_param=0.) 184 | ``` 185 | This is the best action function, which returns the child node corresponding to the best possible move. 186 | Each iteration carries out selection and expansion (_tree_policy), simulation (rollout) and backpropagation; the final call uses c_param=0., so the child with the highest average result is returned. 187 | 188 | ```python 189 | def get_legal_actions(self): 190 | ''' 191 | Modify according to your game or 192 | needs. Constructs a list of all 193 | possible actions from current state. 194 | Returns a list. 195 | ''' 196 | ``` 197 | 198 | ```python 199 | def is_game_over(self): 200 | ''' 201 | Modify according to your game or 202 | needs. It is the game over condition 203 | and depends on your game. Returns 204 | True or False. 205 | ''' 206 | ``` 207 | 208 | ```python 209 | def game_result(self): 210 | ''' 211 | Modify according to your game or 212 | needs. Returns 1 or 0 or -1 depending 213 | on your state corresponding to win, 214 | tie or a loss. 215 | ''' 216 | ``` 217 | ```python 218 | def move(self,action): 219 | ''' 220 | Modify according to your game or 221 | needs.
Changes the state of your 222 | board with a new value. For a normal 223 | Tic Tac Toe game, it can be a 3 by 3 224 | array with all the elements of array 225 | being 0 initially. 0 means the board 226 | position is empty. If you place x in 227 | row 2, column 3, then with 0-based 228 | indexing it is board[1][2] = 1, where 1 229 | represents that x is placed. Returns 230 | the new state after making a move. 231 | ''' 232 | 233 | ``` 234 | 235 | ```python 236 | def main(initial_state): 237 | root = MonteCarloTreeSearchNode(state=initial_state) 238 | selected_node = root.best_action() 239 | return selected_node.parent_action 240 | ``` 241 | This is the main() function. It initializes the root node with the current game state, calls best_action() to get the best child node and returns that child's parent_action, which is the move to play. main() is not a member function of the class; apart from the game-specific state functions above, all the other functions are members of the node class. 242 | 243 | MCTS consists of 4 steps: 244 | 245 | ## SELECTION 246 | 247 | The idea is to keep selecting the best child node until we reach a leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence Bound applied to Trees) formula: 248 | 249 | wi/ni + c*sqrt(ln(t)/ni) 250 | 251 | 252 | wi = number of wins after the i-th move 253 | ni = number of simulations after the i-th move 254 | c = exploration parameter (theoretically equal to √2) 255 | t = total number of simulations for the parent node 256 | In best_child() above, wi is replaced by q() = wins - losses and the exploration term is written as c_param * sqrt(2*ln(t)/ni), so c_param = 1 corresponds to c = √2. 257 | ## EXPANSION: 258 | 259 | When the algorithm reaches a node that still has untried actions, it expands the game tree by adding a child node for one of them. The code above adds one child per iteration; some implementations append all possible child states at once. 260 | 261 | ## SIMULATION: 262 | 263 | After expansion, the algorithm simulates the entire game from the newly added node until the game ends. If moves are picked randomly during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions.
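To make the light/heavy distinction concrete, here is a sketch on a deliberately tiny game: a hypothetical one-pile Nim in which players take 1 to 3 stones and whoever takes the last stone wins. light_policy() mirrors rollout_policy() above; heavy_policy() is one example heuristic (leave the opponent a multiple of 4 stones, the known winning strategy for this Nim variant), not something taken from the tutorial. All names here are illustrative.

```python
import random

# Hypothetical toy game: one pile, remove 1-3 stones, taking the last stone wins.
class NimState:
    def __init__(self, stones, player=1):
        self.stones = stones
        self.player = player  # player to move: 1 or -1

    def get_legal_actions(self):
        return [k for k in (1, 2, 3) if k <= self.stones]

    def is_game_over(self):
        return self.stones == 0

    def game_result(self):
        # The player who just moved took the last stone and won (player 1's view).
        return -self.player

    def move(self, action):
        return NimState(self.stones - action, -self.player)

def light_policy(state, possible_moves):
    # Light playout: a uniformly random move, like rollout_policy() above.
    return random.choice(possible_moves)

def heavy_policy(state, possible_moves):
    # Heavy playout: use game knowledge. Leaving a multiple of 4 stones
    # wins this Nim variant; if no move does that, fall back to random.
    for k in possible_moves:
        if (state.stones - k) % 4 == 0:
            return k
    return random.choice(possible_moves)

def rollout(state, policy):
    # Same loop as rollout() above, with the playout policy passed in.
    while not state.is_game_over():
        state = state.move(policy(state, state.get_legal_actions()))
    return state.game_result()
```

Unlike rollout_policy(possible_moves), these policies also receive the state, since a heuristic usually needs to look at the position. A heavy playout costs more work per simulation but can give more informative results.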
264 | 265 | ## BACKPROPAGATION: 266 | 267 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upwards to the root, incrementing the visit count of every node on the path and recording the result in each node's statistics. The code above records the result from the same player's point of view at every level; a common two-player variant flips the sign at each level so that every node scores the result for the player who made the move leading to it. 268 | 269 | 270 | ## DESIGNING YOUR GAME: 271 | 272 | If you plan to make your own game, you will have to think about the following questions. 273 | 1. **How will you represent the state of your game? Think about the initial state in our game.** 274 | 2. **What will be the end game condition for your game? Compare it with the end game condition of our game.** 275 | 3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.** 276 | 277 | [Sudo Tic Tac Toe][jekyll-talk] 278 | 279 | [![homepage](/assets/google-play-badge.png)][2] 280 | 281 | If you have any questions or suggestions, feel free to contact us at bosonicstudios@gmail.com 282 | 283 | [jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo 284 | [2]: https://play.google.com/store/apps/details?id=com.myComp.sudo 285 | -------------------------------------------------------------------------------- /index.markdown: -------------------------------------------------------------------------------- 1 | --- 2 | # Feel free to add content and custom Front Matter to this file. 3 | # To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults 4 | 5 | layout: home 6 | --- 7 | 8 | In this tutorial we explain the Monte Carlo Tree Search algorithm and each part of the code. We recently applied MCTS to develop the A.I for our game. 9 | 10 | The code is general and only assumes familiarity with basic Python. We explain it with respect to our game. If you want to use it for your own project or game, you will have to modify the game-specific functions mentioned below.
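The functions you will have to modify are the four game-state methods that the node calls on self.state. One way to write that contract down, assuming Python 3.8 or later for typing.Protocol, is a structural interface like the one below; the name GameState is hypothetical and not part of the tutorial code.

```python
from typing import Any, List, Protocol, runtime_checkable

@runtime_checkable
class GameState(Protocol):
    """Methods MonteCarloTreeSearchNode calls on self.state (hypothetical name)."""

    def get_legal_actions(self) -> List[Any]:
        """Return the list of legal actions from this state."""
        ...

    def is_game_over(self) -> bool:
        """Return True when the game has ended."""
        ...

    def game_result(self) -> int:
        """Return 1 for a win, 0 for a tie, -1 for a loss."""
        ...

    def move(self, action: Any) -> "GameState":
        """Return the new state after playing action."""
        ...
```

Any class that defines these four methods satisfies the protocol without inheriting from it, so a type checker can flag a missing method before the search runs.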
11 | 12 | 13 | 14 | Here is the Play Store link to the game: [Fractio][jekyll-talk]. 15 | 16 | [![homepage](/assets/google-play-badge.png)][2] 17 | 18 | Rules for our game (mode 1) are as follows: 19 | 20 | 1. The game is played on a 9 by 9 grid like Sudoku. 21 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards). 22 | 3. The aim of the game is to win any one of the 9 local boards. 23 | 4. Your move determines in which local board the A.I has to make its move, and vice versa. 24 | 5. For example, suppose you make a move in position 1 of local board number 5. 25 | This forces the A.I to make its move in local board number 1. 26 | 6. The rules of normal Tic Tac Toe apply within each local board. 27 | 28 | ## Why MCTS? 29 | 30 | As you may have seen, this game has a very high branching factor. On the first move the entire board is empty, so there are 81 possible moves. On the second move, by rule 4, there are 8 or 9 possible moves. 31 | 32 | For the first 2 moves this gives at most 81*9 = 729 possible combinations (exactly 720, because the 9 first moves played in position k of local board k send the A.I back into that same local board, leaving it only 8 empty spots). Thus the number of possible combinations multiplies as the game progresses. For both modes of our game the branching factor is very high. For games with such a high branching factor, a full minimax search is impractical, whereas MCTS works well for this kind of game. 33 | 34 | Also, as you may have noticed from playing the game, the A.I takes only about a second to make a move. MCTS can be stopped after a fixed number of simulations (100 in the code below), which keeps its thinking time per move bounded. MCTS has been applied to both modes of the game. 35 | 36 | MCTS consists of 4 steps: 37 | 38 | Note: the steps may be hard to follow at first; the MCTS code below explains each of them in detail. 39 | 40 | ## SELECTION 41 | 42 | The idea is to keep selecting the best child node until we reach a leaf node of the tree.
A good way to select such a child node is to use the UCT (Upper Confidence Bound applied to Trees) formula: 43 | 44 | wi/ni + c*sqrt(ln(t)/ni) 45 | 46 | 47 | wi = number of wins after the i-th move 48 | ni = number of simulations after the i-th move 49 | c = exploration parameter (theoretically equal to √2) 50 | t = total number of simulations for the parent node 51 | In best_child() below, wi is replaced by q() = wins - losses and the exploration term is written as c_param * sqrt(2*ln(t)/ni), so c_param = 1 corresponds to c = √2. 52 | ## EXPANSION: 53 | 54 | When the algorithm reaches a node that still has untried actions, it expands the game tree by adding a child node for one of them. The code below adds one child per iteration; some implementations append all possible child states at once. 55 | 56 | ## SIMULATION: 57 | 58 | After expansion, the algorithm simulates the entire game from the newly added node until the game ends. If moves are picked randomly during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions. 59 | 60 | ## BACKPROPAGATION: 61 | 62 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upwards to the root, incrementing the visit count of every node on the path and recording the result in each node's statistics. The code below records the result from the same player's point of view at every level; a common two-player variant flips the sign at each level so that every node scores the result for the player who made the move leading to it. 63 | 64 | Below we walk through the MCTS code in Python. 65 | First we import numpy and defaultdict. 66 | 67 | ```python 68 | import numpy as np 69 | from collections import defaultdict 70 | ``` 71 | Define the MCTS node class as shown below.
72 | 73 | ```python 74 | class MonteCarloTreeSearchNode(): 75 | def __init__(self, state, parent=None, parent_action=None): 76 | self.state = state 77 | self.parent = parent 78 | self.parent_action = parent_action 79 | self.children = [] 80 | self._number_of_visits = 0 81 | self._results = defaultdict(int) 82 | self._results[1] = 0 83 | self._results[-1] = 0 84 | self._untried_actions = None 85 | self._untried_actions = self.untried_actions() 86 | return 87 | 88 | ``` 89 | ### The constructor initializes the following variables. 90 | - **state**: For our game it represents the board state. The board state is usually represented by an array; for normal Tic Tac Toe, it is a 3 by 3 array. 91 | - **parent**: None for the root node; for every other node, the node it was expanded from. The root is the position the A.I is currently choosing a move for. 92 | - **children**: The child nodes expanded so far, one per tried action. Once the node is fully expanded there is one child per legal action; for the second turn in our game this is 9 or 8, depending on where you made your move. 93 | - **parent_action**: None for the root node; for every other node, the action its parent carried out to reach it. 94 | - **_number_of_visits**: Number of times the current node has been visited. 95 | - **_results**: A dictionary counting the outcomes of the rollouts that passed through this node, keyed by result (1 for a win, -1 for a loss, 0 for a tie). 96 | - **_untried_actions**: The legal actions from this state that have not yet been expanded into child nodes. 97 | - **action**: Not stored on the node; the move passed to expand() and move() to be carried out. 98 | 99 | The class consists of the following member functions. All the functions below are members of MonteCarloTreeSearchNode except main() and the game-specific functions get_legal_actions(), is_game_over(), game_result() and move(), which belong to your game-state class and are called through self.state. 100 | 101 | ```python 102 | def untried_actions(self): 103 | 104 | self._untried_actions = self.state.get_legal_actions() 105 | return self._untried_actions 106 | ``` 107 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions; for the second turn there are 8 or 9, depending on the first move.
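To see where the 81 first moves and the 8 or 9 second moves come from, here is a sketch of the legal-action rule of mode 1 under an assumed representation: a 9 by 9 NumPy array indexed as board[local_board][position], with 0 for an empty spot and positions numbered 0 to 8 instead of 1 to 9. The helper names are hypothetical, and the sketch ignores what happens when the forced local board is already full, which the rules above do not cover.

```python
import numpy as np

# Assumed representation (not from the tutorial): board[b][p] is position p (0-8)
# of local board b (0-8); 0 = empty, 1 = player, -1 = A.I.
def legal_actions(board, last_action):
    """Legal (local_board, position) pairs under rule 4 of mode 1."""
    if last_action is None:
        boards = range(9)          # first move: any empty spot on any local board
    else:
        boards = [last_action[1]]  # the last position picks the next local board
    return [(b, p) for b in boards for p in range(9) if board[b][p] == 0]

def play(board, action, player):
    """Return a copy of board with player's mark placed at action."""
    new_board = board.copy()
    new_board[action] = player
    return new_board

empty = np.zeros((9, 9), dtype=int)
first_moves = legal_actions(empty, None)
second_counts = [len(legal_actions(play(empty, a, 1), a)) for a in first_moves]

print(len(first_moves))            # 81
print(sorted(set(second_counts)))  # [8, 9]
print(sum(second_counts))          # 720 = 72*9 + 9*8
```

Counting exhaustively confirms that 81 × 9 = 729 is only an upper bound on the two-move sequences.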
108 | 109 | ```python 110 | def q(self): 111 | wins = self._results[1] 112 | losses = self._results[-1] 113 | return wins - losses 114 | ``` 115 | Returns the number of wins minus the number of losses. 116 | 117 | ```python 118 | def n(self): 119 | return self._number_of_visits 120 | ``` 121 | Returns the number of times this node has been visited. 122 | 123 | ```python 124 | def expand(self): 125 | 126 | action = self._untried_actions.pop() 127 | next_state = self.state.move(action) 128 | child_node = MonteCarloTreeSearchNode( 129 | next_state, parent=self, parent_action=action) 130 | 131 | self.children.append(child_node) 132 | return child_node 133 | ``` 134 | Pops one untried action, generates the next state by applying that action to the present state, appends the resulting child node to the children array and returns it. Each call expands exactly one child, so a node becomes fully expanded after one call per legal action. 135 | 136 | ```python 137 | def is_terminal_node(self): 138 | return self.state.is_game_over() 139 | ``` 140 | Checks whether the current node is terminal, that is, whether the game is over in this state. 141 | 142 | ```python 143 | def rollout(self): 144 | current_rollout_state = self.state 145 | 146 | while not current_rollout_state.is_game_over(): 147 | 148 | possible_moves = current_rollout_state.get_legal_actions() 149 | 150 | action = self.rollout_policy(possible_moves) 151 | current_rollout_state = current_rollout_state.move(action) 152 | return current_rollout_state.game_result() 153 | ``` 154 | From the current state, the entire game is simulated until it ends, and the outcome is returned: 1 for a win, -1 for a loss and 0 for a tie.
If every move in the simulation is selected at random from the set of possible moves, it is called a light playout. 155 | 156 | ```python 157 | def backpropagate(self, result): 158 | self._number_of_visits += 1 159 | self._results[result] += 1 160 | if self.parent: 161 | self.parent.backpropagate(result) 162 | ``` 163 | In this step the statistics of every node from this node up to the root are updated: the visit count of each node is incremented by 1, and so is the count stored for the result. A result of 1 (a win) increments the win count, a result of -1 (a loss) increments the loss count, and a tie increments the count stored under 0. 164 | 165 | ```python 166 | def is_fully_expanded(self): 167 | return len(self._untried_actions) == 0 168 | 169 | ``` 170 | expand() pops the actions out of _untried_actions one at a time. When the list becomes empty, the node is fully expanded. 171 | 172 | ```python 173 | def best_child(self, c_param=0.1): 174 | 175 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children] 176 | return self.children[np.argmax(choices_weights)] 177 | 178 | ``` 179 | Once a node is fully expanded, this function selects the best child out of the children array. The first term, the average result q/n, corresponds to exploitation; the second term, which is larger for rarely visited children, corresponds to exploration, and c_param controls its weight. 180 | 181 | ```python 182 | def rollout_policy(self, possible_moves): 183 | 184 | return possible_moves[np.random.randint(len(possible_moves))] 185 | 186 | ``` 187 | Selects a move uniformly at random from the possible moves. This is an example of a random (light) playout. 188 | 189 | ```python 190 | def _tree_policy(self): 191 | 192 | current_node = self 193 | while not current_node.is_terminal_node(): 194 | 195 | if not current_node.is_fully_expanded(): 196 | return current_node.expand() 197 | else: 198 | current_node = current_node.best_child() 199 | return current_node 200 | ``` 201 | Selects the node to run the rollout from: starting at the current node, it follows best_child() down through fully expanded nodes and expands the first node that still has untried actions, or stops at a terminal node.
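The score that best_child() maximizes can be checked by hand. The sketch below re-implements the same expression as a standalone function (uct_score and best_by_uct are hypothetical helpers, and the visit statistics are made-up numbers) to show how c_param trades exploitation against exploration.

```python
import math

def uct_score(q, n, parent_n, c_param):
    """Same expression as best_child(): average result plus exploration bonus."""
    return q / n + c_param * math.sqrt(2 * math.log(parent_n) / n)

def best_by_uct(children, parent_n, c_param):
    """Name of the child with the highest UCT score."""
    return max(children, key=lambda name: uct_score(*children[name], parent_n, c_param))

# Made-up statistics (q = wins - losses, n = visits) for the three
# children of a node that has been visited 30 times.
children = {"a": (6, 20), "b": (0, 4), "c": (1, 6)}

for c_param in (0.0, 0.1, 1.0):
    print(c_param, best_by_uct(children, 30, c_param))
# 0.0 a   (a has the best average result, 6/20 = 0.3)
# 0.1 a
# 1.0 b   (b has the fewest visits, so the largest exploration bonus)
```

With c_param = 0 only the average result counts, which is why best_action() passes c_param=0. for the final choice; the small default of 0.1 in best_child() explores only slightly during the search.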

```python
def best_action(self):
    simulation_no = 100

    for _ in range(simulation_no):
        v = self._tree_policy()   # selection + expansion
        reward = v.rollout()      # simulation
        v.backpropagate(reward)   # backpropagation

    # With c_param=0 only the average result counts when picking the move
    return self.best_child(c_param=0.)
```
This is the best action function, which returns the node corresponding to the best possible move. The loop above carries out the selection, expansion, simulation and backpropagation steps, repeated `simulation_no` times.

```python
def get_legal_actions(self):
    '''
    Modify according to your game or
    needs. Constructs a list of all
    possible actions from current state.
    Returns a list.
    '''
    raise NotImplementedError
```

```python
def is_game_over(self):
    '''
    Modify according to your game or
    needs. It is the game over condition
    and depends on your game. Returns
    True or False.
    '''
    raise NotImplementedError
```

```python
def game_result(self):
    '''
    Modify according to your game or
    needs. Returns 1, 0 or -1 depending
    on whether your state corresponds to
    a win, a tie or a loss.
    '''
    raise NotImplementedError
```

```python
def move(self, action):
    '''
    Modify according to your game or
    needs. Changes the state of your
    board with a new value. For a normal
    Tic Tac Toe game, it can be a 3 by 3
    array with all the elements of the array
    being 0 initially. 0 means the board
    position is empty. If you place x in
    row 2, column 3 (counting from 1), then
    with 0-based indexing it would be
    something like board[1][2] = 1, where 1
    represents that x is placed. Returns
    the new state after making a move.
    '''
    raise NotImplementedError
```

```python
def main():
    # initial_state is your game's starting state, e.g. an empty board
    root = MonteCarloTreeSearchNode(state=initial_state)
    selected_node = root.best_action()
    # selected_node.parent_action is the move to play
    return selected_node
```
This is the main() function. It initializes the root node and calls the best_action function to get the best node.
This is not a member function of the class. The other node functions (`q`, `n`, `expand`, `rollout`, `backpropagate`, `best_child`, `best_action` and so on) are member functions of `MonteCarloTreeSearchNode`. By contrast, `get_legal_actions`, `is_game_over`, `game_result` and `move` belong to your game's state class, because the node calls them on its state.

If you liked the tutorial, please consider checking out the game page of my 3D racing game [Speed Surge][3].

## DESIGNING YOUR GAME:

If you plan to make your own game, you will have to think about the following questions.

1. **How will you represent the state of your game? Think about the initial state in our game.**
2. **What will be the end game condition for your game? Compare it with the end game condition of our game.**
3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.**

If you have any questions or suggestions, feel free to contact me at bosonicstudios@gmail.com

[jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo
[2]: https://play.google.com/store/apps/details?id=com.myComp.sudo
[3]: https://store.steampowered.com/app/3566670/Speed_Surge/
--------------------------------------------------------------------------------