├── 400by400.png
├── favicon.ico
├── google38bcc26191576bef.html
├── .gitignore
├── a06740a73b693e9847f815050642ff0f.html
├── assets
│   ├── 400by400.png
│   └── google-play-badge.png
├── about.markdown
├── 404.html
├── LICENSE
├── Gemfile
├── Gemfile.lock
├── _config.yml
├── README.md
└── index.markdown

--------------------------------------------------------------------------------
/400by400.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/400by400.png

--------------------------------------------------------------------------------
/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/favicon.ico

--------------------------------------------------------------------------------
/google38bcc26191576bef.html:
--------------------------------------------------------------------------------
1 | google-site-verification: google38bcc26191576bef.html

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | _site
2 | .sass-cache
3 | .jekyll-cache
4 | .jekyll-metadata
5 | vendor

--------------------------------------------------------------------------------
/a06740a73b693e9847f815050642ff0f.html:
--------------------------------------------------------------------------------
1 | site-verification: a06740a73b693e9847f815050642ff0f

--------------------------------------------------------------------------------
/assets/400by400.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/400by400.png

--------------------------------------------------------------------------------
/assets/google-play-badge.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/google-play-badge.png

--------------------------------------------------------------------------------
/about.markdown:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: About
4 | permalink: /about/
5 | ---
6 |
7 | I am interested in Algorithms, Game Design, Game Engines and Coding in general. I like making games and do game development in my free time. If you have any questions or want to collaborate on projects, contact me at bosonicstudios@gmail.com
8 |

--------------------------------------------------------------------------------
/404.html:
--------------------------------------------------------------------------------
1 | ---
2 | permalink: /404.html
3 | layout: default
4 | ---
5 |
20 |
21 | <div class="container">
22 |   <h1>404</h1>
23 |   <p><strong>Page not found :(</strong></p>
24 |   <p>The requested page could not be found.</p>
25 | </div>

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
10 |
11 | Here is the Play Store link to the game: [Sudo Tic Tac Toe][jekyll-talk].
12 |
13 | The rules for our game (mode 1) are as follows:
14 |
15 | 1. The game is played on a 9 by 9 grid, like Sudoku.
16 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards).
17 | 3. The aim of the game is to win any one of the 9 available local boards.
18 | 4. Your move determines in which local board the A.I. has to make a move, and vice versa.
19 | 5. For example, if you make a move in position 1 of local board number 5,
20 | this will force the A.I. to make a move in local board number 1.
21 | 6. The rules of normal Tic Tac Toe apply within each local board.
22 |
23 | ## Why MCTS?
24 |
25 | As you may have seen, this game has a very high branching factor. For the first move the entire board is empty, so there are 81 empty spots and hence 81 possible moves. For the second turn, by rule 4, there are 8 or 9 possible moves.
26 |
27 | For the first 2 moves this gives 81 * 9 = 729 possible combinations, and the number of combinations keeps growing as the game progresses, so the branching factor stays high. Both modes of our game have a very high branching factor, and for games with such a high branching factor it is impractical to apply the minimax algorithm. The MCTS algorithm works well for these kinds of games.
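
As a rough sanity check, the growth in the number of move sequences can be sketched in a few lines of Python (assuming, for simplicity, about 9 legal replies per move after the first):

```python
# Rough, simplified count of move sequences per turn: 81 options for
# the first move, then roughly 9 replies per move (rule 4).
sequences = 81
print(1, sequences)         # turn 1: 81
for turn in range(2, 5):
    sequences *= 9
    print(turn, sequences)  # turn 2: 729, turn 3: 6561, turn 4: 59049
```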
28 |
29 | Also, as you may have noticed while playing the game, the A.I. takes only about a second to make a move, so MCTS runs fast. MCTS has been applied to both modes of the game.
30 |
31 | [![Google Play badge](assets/google-play-badge.png)][2]
32 |
33 | Below we demonstrate the MCTS code in Python.
34 | First we need to import numpy and defaultdict.
35 |
36 | ```python
37 | import numpy as np
38 | from collections import defaultdict
39 | ```
40 | Define the MCTS class as shown below.
41 |
42 | ```python
43 | class MonteCarloTreeSearchNode:
44 |     def __init__(self, state, parent=None, parent_action=None):
45 |         self.state = state                    # board state at this node
46 |         self.parent = parent                  # None for the root node
47 |         self.parent_action = parent_action    # action that led to this node
48 |         self.children = []                    # child nodes found so far
49 |         self._number_of_visits = 0            # times this node was visited
50 |         self._results = defaultdict(int)      # win/loss statistics
51 |         self._results[1] = 0                  # wins
52 |         self._results[-1] = 0                 # losses
53 |         self._untried_actions = None
54 |         self._untried_actions = self.untried_actions()
55 |         return
56 |
57 | ```
58 | ### The constructor initializes the following variables:
59 | - **state**: For our game it represents the board state. Generally the board state is represented by an array. For normal Tic Tac Toe, it is a 3 by 3 array.
60 | - **parent**: None for the root node; for other nodes it is the node the current node is derived from. For the first turn, as you have seen in the game, it is None.
61 | - **children**: Contains the child nodes for all possible actions from the current node. For the second turn in our game there are 8 or 9 of them, depending on where you make your move.
62 | - **parent_action**: None for the root node; for other nodes it is the action that its parent carried out.
63 | - **_number_of_visits**: Number of times the current node has been visited.
64 | - **_results**: A dictionary holding the win and loss counts (keys 1 and -1).
65 | - **_untried_actions**: The list of all actions not yet tried from this node.
66 | - **action**: The move that has to be carried out.
67 |
68 | The class consists of the following member functions. All the functions below are member functions except the main() function.
69 |
70 | ```python
71 | def untried_actions(self):
72 |
73 | self._untried_actions = self.state.get_legal_actions()
74 | return self._untried_actions
75 | ```
76 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions. For the second turn it is 8 or 9. This varies in our game.
77 |
78 | ```python
79 | def q(self):
80 | wins = self._results[1]
81 | loses = self._results[-1]
82 | return wins - loses
83 | ```
84 | Returns the difference of wins and losses.
85 |
86 | ```python
87 | def n(self):
88 | return self._number_of_visits
89 | ```
90 | Returns the number of times the current node has been visited.
91 |
92 | ```python
93 | def expand(self):
94 |
95 | action = self._untried_actions.pop()
96 | next_state = self.state.move(action)
97 | child_node = MonteCarloTreeSearchNode(
98 | next_state, parent=self, parent_action=action)
99 |
100 | self.children.append(child_node)
101 | return child_node
102 | ```
103 | From the present state, the next state is generated for the action popped from _untried_actions. A child node corresponding to this next state is appended to the children array and returned. Over repeated calls, child nodes for all states reachable from the present state get appended to the children array, one per call.
104 |
105 | ```python
106 | def is_terminal_node(self):
107 | return self.state.is_game_over()
108 | ```
109 | This is used to check whether the current node is terminal. A terminal node is reached when the game is over.
110 |
111 | ```python
112 | def rollout(self):
113 | current_rollout_state = self.state
114 |
115 | while not current_rollout_state.is_game_over():
116 |
117 | possible_moves = current_rollout_state.get_legal_actions()
118 |
119 | action = self.rollout_policy(possible_moves)
120 | current_rollout_state = current_rollout_state.move(action)
121 | return current_rollout_state.game_result()
122 | ```
123 | From the current state, the entire game is simulated until there is an outcome, and that outcome is returned: 1 for a win, -1 for a loss and 0 for a tie. If the entire game is simulated randomly, that is, at each turn the move is selected at random from the set of possible moves, it is called a light playout.
124 |
125 | ```python
126 | def backpropagate(self, result):
127 | self._number_of_visits += 1.
128 | self._results[result] += 1.
129 | if self.parent:
130 | self.parent.backpropagate(result)
131 | ```
132 | In this step the statistics for the nodes are updated. Walking up from the current node until the root is reached, the number of visits of each node on the path is incremented by 1. If the result is 1, that is, it resulted in a win, the win count is incremented by 1; if the result is a loss, the loss count is incremented by 1.
133 |
134 | ```python
135 | def is_fully_expanded(self):
136 | return len(self._untried_actions) == 0
137 |
138 | ```
139 | Actions are popped out of _untried_actions one by one. When the list becomes empty, that is, when its size is zero, the node is fully expanded.
140 |
141 | ```python
142 | def best_child(self, c_param=0.1):
143 |
144 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
145 | return self.children[np.argmax(choices_weights)]
146 |
147 | ```
148 | Once the node is fully expanded, this function selects the best child out of the children array. The first term in the formula corresponds to exploitation and the second term corresponds to exploration.
149 |
150 | ```python
151 | def rollout_policy(self, possible_moves):
152 |
153 | return possible_moves[np.random.randint(len(possible_moves))]
154 |
155 | ```
156 | Randomly selects a move out of the possible moves. This is an example of a light playout policy.
157 |
158 | ```python
159 | def _tree_policy(self):
160 |
161 | current_node = self
162 | while not current_node.is_terminal_node():
163 |
164 | if not current_node.is_fully_expanded():
165 | return current_node.expand()
166 | else:
167 | current_node = current_node.best_child()
168 | return current_node
169 | ```
170 | Selects the node on which to run a rollout: walk down the tree, expanding any node that is not fully expanded and otherwise following the best child.
171 |
172 | ```python
173 | def best_action(self):
174 | simulation_no = 100
175 |
176 |
177 | for i in range(simulation_no):
178 |
179 | v = self._tree_policy()
180 | reward = v.rollout()
181 | v.backpropagate(reward)
182 |
183 | return self.best_child(c_param=0.)
184 | ```
185 | This is the best_action function, which returns the node corresponding to the best possible move.
186 | The steps of expansion, simulation and backpropagation are carried out by the code above.
187 |
188 | ```python
189 | def get_legal_actions(self):
190 | '''
191 | Modify according to your game or
192 | needs. Constructs a list of all
193 | possible actions from the current state.
194 | Returns a list.
195 | '''
196 | ```
197 |
198 | ```python
199 | def is_game_over(self):
200 | '''
201 | Modify according to your game or
202 | needs. It is the game over condition
203 | and depends on your game. Returns
204 | true or false
205 | '''
206 | ```
207 |
208 | ```python
209 | def game_result(self):
210 | '''
211 | Modify according to your game or
212 | needs. Returns 1 or 0 or -1 depending
213 | on your state corresponding to win,
214 | tie or a loss.
215 | '''
216 | ```
217 | ```python
218 | def move(self,action):
219 | '''
220 | Modify according to your game or
221 | needs. Changes the state of your
222 | board with a new value. For a normal
223 | Tic Tac Toe game, it can be a 3 by 3
224 | array with all the elements of array
225 | being 0 initially. 0 means the board
226 | position is empty. If you place x in
227 | row 2 column 3, then it would be
228 | something like board[2][3] = 1, where 1
229 | represents that x is placed. Returns
230 | the new state after making a move.
231 | '''
232 |
233 | ```
234 |
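Putting the four stubs together, here is a minimal sketch of a state class for normal Tic Tac Toe, assuming a 3 by 3 numpy board with 0 for an empty cell, 1 for x and -1 for o. The class and its details are illustrative only, not the actual code of our game:

```python
class TicTacToeState:
    def __init__(self, board=None, next_player=1):
        self.board = np.zeros((3, 3), dtype=int) if board is None else board
        self.next_player = next_player  # 1 plays x, -1 plays o

    def get_legal_actions(self):
        # Every empty cell is a legal (row, col) action.
        return [tuple(cell) for cell in np.argwhere(self.board == 0)]

    def is_game_over(self):
        return self._winner() != 0 or not (self.board == 0).any()

    def game_result(self):
        return self._winner()  # 1 = x wins, -1 = o wins, 0 = tie

    def move(self, action):
        # Return the new state after the current player plays `action`.
        new_board = self.board.copy()
        new_board[action] = self.next_player
        return TicTacToeState(new_board, -self.next_player)

    def _winner(self):
        # Check all rows, columns and both diagonals for three in a row.
        lines = [*self.board, *self.board.T,
                 self.board.diagonal(), np.fliplr(self.board).diagonal()]
        for line in lines:
            if abs(line.sum()) == 3:
                return int(np.sign(line.sum()))
        return 0
```
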
235 | ```python
236 | def main():
237 |     root = MonteCarloTreeSearchNode(state = initial_state)
238 | selected_node = root.best_action()
239 | return
240 | ```
241 | This is the main() function: initialize the root node and call the best_action function to get the best node. This is not a member function of the class; all the other functions are member functions of the class.
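
With the hypothetical TicTacToeState sketch above, the pieces could be wired together like this:

```python
# Hypothetical usage: build the root from an empty board; the selected
# child's parent_action is the recommended move.
initial_state = TicTacToeState()
root = MonteCarloTreeSearchNode(state=initial_state)
selected_node = root.best_action()
print(selected_node.parent_action)
```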
242 |
243 | MCTS consists of 4 steps:
244 |
245 | ## SELECTION
246 |
247 | The idea is to keep selecting the best child nodes until we reach a leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence bound applied to Trees) formula:
248 |
249 | wi/ni + c*sqrt(ln(t)/ni)
250 |
251 |
252 | wi = number of wins after the i-th move
253 | ni = number of simulations after the i-th move
254 | c = exploration parameter (theoretically equal to √2)
255 | t = total number of simulations for the parent node
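
As a quick worked example, here is the UCT value of a hypothetical child that has 7 wins out of 10 visits and whose parent has been visited 25 times:

```python
import math

# Hypothetical counts: wi = 7 wins, ni = 10 visits for the child,
# t = 25 visits for the parent, c = sqrt(2).
wi, ni, t, c = 7, 10, 25, math.sqrt(2)
uct = wi / ni + c * math.sqrt(math.log(t) / ni)
print(round(uct, 3))  # 0.7 exploitation + ~0.802 exploration = 1.502
```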
256 |
257 | ## EXPANSION:
258 |
259 | When UCT can no longer be applied to find a successor node, the algorithm expands the game tree by appending states reachable from the leaf node (in the code above, expand() adds one child per call by popping an untried action).
260 |
261 | ## SIMULATION:
262 |
263 | After expansion, the algorithm picks a child node arbitrarily and simulates the entire game from the selected node until it reaches a terminal state of the game. If moves are picked at random during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions.
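
For example, a heavy playout could be sketched by swapping the rollout_policy shown above for a heuristic version. The helpers is_winning_move and blocks_opponent below are hypothetical and would have to be implemented for your game:

```python
def heavy_rollout_policy(self, possible_moves, state):
    # Prefer an immediate winning move, then a move that blocks the
    # opponent, and fall back to a random move otherwise.
    for move in possible_moves:
        if state.is_winning_move(move):   # hypothetical helper
            return move
    for move in possible_moves:
        if state.blocks_opponent(move):   # hypothetical helper
            return move
    return possible_moves[np.random.randint(len(possible_moves))]
```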
264 |
265 | ## BACKPROPAGATION:
266 |
267 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upward to the root, incrementing the visit count of every visited node and updating the win count of each node whose player won the playout.
268 |
269 |
270 | ## DESIGNING YOUR GAME:
271 |
272 | If you plan to make your own game, you will have to think about the following questions.
273 | 1. **How will you represent the state of your game? Think about the initial state in our game.**
274 | 2. **What will be the end game condition for your game? Compare it with the end game condition of our game.**
275 | 3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.**
276 |
277 | [Sudo Tic Tac Toe][jekyll-talk]
278 |
279 | [![Google Play badge](assets/google-play-badge.png)][2]
280 |
281 | If you have any questions or suggestions, feel free to contact us at bosonicstudios@gmail.com
282 |
283 | [jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo
284 | [2]: https://play.google.com/store/apps/details?id=com.myComp.sudo
285 |
--------------------------------------------------------------------------------
/index.markdown:
--------------------------------------------------------------------------------
1 | ---
2 | # Feel free to add content and custom Front Matter to this file.
3 | # To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults
4 |
5 | layout: home
6 | ---
7 |
8 | In this tutorial we will be explaining the Monte Carlo Tree Search algorithm and each part of the code. Recently we applied MCTS to develop our game.
9 |
10 | The code is general and only assumes familiarity with basic Python. We have explained it with respect to our game. If you want to use it for your project or game, you will have to slightly modify the functions mentioned below.
11 |
12 |
13 |
14 | Here is the Play Store link to the game: [Fractio][jekyll-talk].
15 |
16 | [![Google Play badge](assets/google-play-badge.png)][2]
17 |
18 | The rules for our game (mode 1) are as follows:
19 |
20 | 1. The game is played on a 9 by 9 grid, like Sudoku.
21 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards).
22 | 3. The aim of the game is to win any one of the 9 available local boards.
23 | 4. Your move determines in which local board the A.I. has to make a move, and vice versa.
24 | 5. For example, if you make a move in position 1 of local board number 5,
25 | this will force the A.I. to make a move in local board number 1.
26 | 6. The rules of normal Tic Tac Toe apply within each local board.
27 |
28 | ## Why MCTS?
29 |
30 | As you may have seen, this game has a very high branching factor. For the first move the entire board is empty, so there are 81 empty spots and hence 81 possible moves. For the second turn, by rule 4, there are 8 or 9 possible moves.
31 |
32 | For the first 2 moves this gives 81 * 9 = 729 possible combinations, and the number of combinations keeps growing as the game progresses, so the branching factor stays high. Both modes of our game have a very high branching factor, and for games with such a high branching factor it is impractical to apply the minimax algorithm. The MCTS algorithm works well for these kinds of games.
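
As a rough sanity check, the growth in the number of move sequences can be sketched in a few lines of Python (assuming, for simplicity, about 9 legal replies per move after the first):

```python
# Rough, simplified count of move sequences per turn: 81 options for
# the first move, then roughly 9 replies per move (rule 4).
sequences = 81
print(1, sequences)         # turn 1: 81
for turn in range(2, 5):
    sequences *= 9
    print(turn, sequences)  # turn 2: 729, turn 3: 6561, turn 4: 59049
```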
33 |
34 | Also, as you may have noticed while playing the game, the A.I. takes only about a second to make a move, so MCTS runs fast. MCTS has been applied to both modes of the game.
35 |
36 | MCTS consists of 4 steps:
37 |
38 | Note: This might not all make sense at first; the MCTS code below gives a proper explanation.
39 |
40 | ## SELECTION
41 |
42 | The idea is to keep selecting the best child nodes until we reach a leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence bound applied to Trees) formula:
43 |
44 | wi/ni + c*sqrt(ln(t)/ni)
45 |
46 |
47 | wi = number of wins after the i-th move
48 | ni = number of simulations after the i-th move
49 | c = exploration parameter (theoretically equal to √2)
50 | t = total number of simulations for the parent node
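
As a quick worked example, here is the UCT value of a hypothetical child that has 7 wins out of 10 visits and whose parent has been visited 25 times:

```python
import math

# Hypothetical counts: wi = 7 wins, ni = 10 visits for the child,
# t = 25 visits for the parent, c = sqrt(2).
wi, ni, t, c = 7, 10, 25, math.sqrt(2)
uct = wi / ni + c * math.sqrt(math.log(t) / ni)
print(round(uct, 3))  # 0.7 exploitation + ~0.802 exploration = 1.502
```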
51 |
52 | ## EXPANSION:
53 |
54 | When UCT can no longer be applied to find a successor node, the algorithm expands the game tree by appending states reachable from the leaf node (in the code below, expand() adds one child per call by popping an untried action).
55 |
56 | ## SIMULATION:
57 |
58 | After expansion, the algorithm picks a child node arbitrarily and simulates the entire game from the selected node until it reaches a terminal state of the game. If moves are picked at random during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions.
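
For example, a heavy playout could be sketched by swapping the rollout_policy shown below for a heuristic version. The helpers is_winning_move and blocks_opponent here are hypothetical and would have to be implemented for your game:

```python
def heavy_rollout_policy(self, possible_moves, state):
    # Prefer an immediate winning move, then a move that blocks the
    # opponent, and fall back to a random move otherwise.
    for move in possible_moves:
        if state.is_winning_move(move):   # hypothetical helper
            return move
    for move in possible_moves:
        if state.blocks_opponent(move):   # hypothetical helper
            return move
    return possible_moves[np.random.randint(len(possible_moves))]
```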
59 |
60 | ## BACKPROPAGATION:
61 |
62 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upward to the root, incrementing the visit count of every visited node and updating the win count of each node whose player won the playout.
63 |
64 | Below we demonstrate the MCTS code in Python.
65 | First we need to import numpy and defaultdict.
66 |
67 | ```python
68 | import numpy as np
69 | from collections import defaultdict
70 | ```
71 | Define the MCTS class as shown below.
72 |
73 | ```python
74 | class MonteCarloTreeSearchNode:
75 |     def __init__(self, state, parent=None, parent_action=None):
76 |         self.state = state                    # board state at this node
77 |         self.parent = parent                  # None for the root node
78 |         self.parent_action = parent_action    # action that led to this node
79 |         self.children = []                    # child nodes found so far
80 |         self._number_of_visits = 0            # times this node was visited
81 |         self._results = defaultdict(int)      # win/loss statistics
82 |         self._results[1] = 0                  # wins
83 |         self._results[-1] = 0                 # losses
84 |         self._untried_actions = None
85 |         self._untried_actions = self.untried_actions()
86 |         return
87 |
88 | ```
89 | ### The constructor initializes the following variables:
90 | - **state**: For our game it represents the board state. Generally the board state is represented by an array. For normal Tic Tac Toe, it is a 3 by 3 array.
91 | - **parent**: None for the root node; for other nodes it is the node the current node is derived from. For the first turn, as you have seen in the game, it is None.
92 | - **children**: Contains the child nodes for all possible actions from the current node. For the second turn in our game there are 8 or 9 of them, depending on where you make your move.
93 | - **parent_action**: None for the root node; for other nodes it is the action that its parent carried out.
94 | - **_number_of_visits**: Number of times the current node has been visited.
95 | - **_results**: A dictionary holding the win and loss counts (keys 1 and -1).
96 | - **_untried_actions**: The list of all actions not yet tried from this node.
97 | - **action**: The move that has to be carried out.
98 |
99 | The class consists of the following member functions. All the functions below are member functions except the main() function.
100 |
101 | ```python
102 | def untried_actions(self):
103 |
104 | self._untried_actions = self.state.get_legal_actions()
105 | return self._untried_actions
106 | ```
107 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions. For the second turn it is 8 or 9. This varies in our game.
108 |
109 | ```python
110 | def q(self):
111 | wins = self._results[1]
112 | loses = self._results[-1]
113 | return wins - loses
114 | ```
115 | Returns the difference of wins and losses.
116 |
117 | ```python
118 | def n(self):
119 | return self._number_of_visits
120 | ```
121 | Returns the number of times the current node has been visited.
122 |
123 | ```python
124 | def expand(self):
125 |
126 | action = self._untried_actions.pop()
127 | next_state = self.state.move(action)
128 | child_node = MonteCarloTreeSearchNode(
129 | next_state, parent=self, parent_action=action)
130 |
131 | self.children.append(child_node)
132 | return child_node
133 | ```
134 | From the present state, the next state is generated for the action popped from _untried_actions. A child node corresponding to this next state is appended to the children array and returned. Over repeated calls, child nodes for all states reachable from the present state get appended to the children array, one per call.
135 |
136 | ```python
137 | def is_terminal_node(self):
138 | return self.state.is_game_over()
139 | ```
140 | This is used to check whether the current node is terminal. A terminal node is reached when the game is over.
141 |
142 | ```python
143 | def rollout(self):
144 | current_rollout_state = self.state
145 |
146 | while not current_rollout_state.is_game_over():
147 |
148 | possible_moves = current_rollout_state.get_legal_actions()
149 |
150 | action = self.rollout_policy(possible_moves)
151 | current_rollout_state = current_rollout_state.move(action)
152 | return current_rollout_state.game_result()
153 | ```
154 | From the current state, the entire game is simulated until there is an outcome, and that outcome is returned: 1 for a win, -1 for a loss and 0 for a tie. If the entire game is simulated randomly, that is, at each turn the move is selected at random from the set of possible moves, it is called a light playout.
155 |
156 | ```python
157 | def backpropagate(self, result):
158 | self._number_of_visits += 1.
159 | self._results[result] += 1.
160 | if self.parent:
161 | self.parent.backpropagate(result)
162 | ```
163 | In this step the statistics for the nodes are updated. Walking up from the current node until the root is reached, the number of visits of each node on the path is incremented by 1. If the result is 1, that is, it resulted in a win, the win count is incremented by 1; if the result is a loss, the loss count is incremented by 1.
164 |
165 | ```python
166 | def is_fully_expanded(self):
167 | return len(self._untried_actions) == 0
168 |
169 | ```
170 | Actions are popped out of _untried_actions one by one. When the list becomes empty, that is, when its size is zero, the node is fully expanded.
171 |
172 | ```python
173 | def best_child(self, c_param=0.1):
174 |
175 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
176 | return self.children[np.argmax(choices_weights)]
177 |
178 | ```
179 | Once the node is fully expanded, this function selects the best child out of the children array. The first term in the formula corresponds to exploitation and the second term corresponds to exploration.
180 |
181 | ```python
182 | def rollout_policy(self, possible_moves):
183 |
184 | return possible_moves[np.random.randint(len(possible_moves))]
185 |
186 | ```
187 | Randomly selects a move out of the possible moves. This is an example of a light playout policy.
188 |
189 | ```python
190 | def _tree_policy(self):
191 |
192 | current_node = self
193 | while not current_node.is_terminal_node():
194 |
195 | if not current_node.is_fully_expanded():
196 | return current_node.expand()
197 | else:
198 | current_node = current_node.best_child()
199 | return current_node
200 | ```
201 | Selects the node on which to run a rollout: walk down the tree, expanding any node that is not fully expanded and otherwise following the best child.
202 |
203 | ```python
204 | def best_action(self):
205 | simulation_no = 100
206 |
207 |
208 | for i in range(simulation_no):
209 |
210 | v = self._tree_policy()
211 | reward = v.rollout()
212 | v.backpropagate(reward)
213 |
214 | return self.best_child(c_param=0.)
215 | ```
216 | This is the best_action function, which returns the node corresponding to the best possible move.
217 | The steps of expansion, simulation and backpropagation are carried out by the code above.
218 |
219 | ```python
220 | def get_legal_actions(self):
221 | '''
222 | Modify according to your game or
223 | needs. Constructs a list of all
224 | possible actions from current state.
225 | Returns a list.
226 | '''
227 | ```
228 |
229 | ```python
230 | def is_game_over(self):
231 | '''
232 | Modify according to your game or
233 | needs. It is the game over condition
234 | and depends on your game. Returns
235 | true or false
236 | '''
237 | ```
238 |
239 | ```python
240 | def game_result(self):
241 | '''
242 | Modify according to your game or
243 | needs. Returns 1 or 0 or -1 depending
244 | on your state corresponding to win,
245 | tie or a loss.
246 | '''
247 | ```
248 | ```python
249 | def move(self,action):
250 | '''
251 | Modify according to your game or
252 | needs. Changes the state of your
253 | board with a new value. For a normal
254 | Tic Tac Toe game, it can be a 3 by 3
255 | array with all the elements of array
256 | being 0 initially. 0 means the board
257 | position is empty. If you place x in
258 | row 2 column 3, then it would be
259 | something like board[2][3] = 1, where 1
260 | represents that x is placed. Returns
261 | the new state after making a move.
262 | '''
263 |
264 | ```
265 |
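Putting the four stubs together, here is a minimal sketch of a state class for normal Tic Tac Toe, assuming a 3 by 3 numpy board with 0 for an empty cell, 1 for x and -1 for o. The class and its details are illustrative only, not the actual code of our game:

```python
class TicTacToeState:
    def __init__(self, board=None, next_player=1):
        self.board = np.zeros((3, 3), dtype=int) if board is None else board
        self.next_player = next_player  # 1 plays x, -1 plays o

    def get_legal_actions(self):
        # Every empty cell is a legal (row, col) action.
        return [tuple(cell) for cell in np.argwhere(self.board == 0)]

    def is_game_over(self):
        return self._winner() != 0 or not (self.board == 0).any()

    def game_result(self):
        return self._winner()  # 1 = x wins, -1 = o wins, 0 = tie

    def move(self, action):
        # Return the new state after the current player plays `action`.
        new_board = self.board.copy()
        new_board[action] = self.next_player
        return TicTacToeState(new_board, -self.next_player)

    def _winner(self):
        # Check all rows, columns and both diagonals for three in a row.
        lines = [*self.board, *self.board.T,
                 self.board.diagonal(), np.fliplr(self.board).diagonal()]
        for line in lines:
            if abs(line.sum()) == 3:
                return int(np.sign(line.sum()))
        return 0
```
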
266 | ```python
267 | def main():
268 | root = MonteCarloTreeSearchNode(state = initial_state)
269 | selected_node = root.best_action()
270 | return
271 | ```
272 | This is the main() function: initialize the root node and call the best_action function to get the best node. This is not a member function of the class; all the other functions are member functions of the class.
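
With the hypothetical TicTacToeState sketch above, the pieces could be wired together like this:

```python
# Hypothetical usage: build the root from an empty board; the selected
# child's parent_action is the recommended move.
initial_state = TicTacToeState()
root = MonteCarloTreeSearchNode(state=initial_state)
selected_node = root.best_action()
print(selected_node.parent_action)
```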
273 |
274 | If you liked the tutorial, please consider checking out the game page of my 3D racing game [Speed Surge][3].
275 |
276 | ## DESIGNING YOUR GAME:
277 |
278 | If you plan to make your own game, you will have to think about the following questions.
279 | 1. **How will you represent the state of your game? Think about the initial state in our game.**
280 | 2. **What will be the end game condition for your game? Compare it with the end game condition of our game.**
281 | 3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.**
282 |
283 |
284 |
285 | If you have any questions or suggestions, feel free to contact me at bosonicstudios@gmail.com
286 |
287 | [jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo
288 | [2]: https://play.google.com/store/apps/details?id=com.myComp.sudo
289 | [3]: https://store.steampowered.com/app/3566670/Speed_Surge/
290 |
--------------------------------------------------------------------------------