├── 400by400.png
├── favicon.ico
├── google38bcc26191576bef.html
├── .gitignore
├── a06740a73b693e9847f815050642ff0f.html
├── assets
│   ├── 400by400.png
│   └── google-play-badge.png
├── about.markdown
├── 404.html
├── LICENSE
├── Gemfile
├── Gemfile.lock
├── _config.yml
├── README.md
└── index.markdown

--------------------------------------------------------------------------------
/400by400.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/400by400.png

--------------------------------------------------------------------------------
/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/favicon.ico

--------------------------------------------------------------------------------
/google38bcc26191576bef.html:
--------------------------------------------------------------------------------
1 | google-site-verification: google38bcc26191576bef.html

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | _site
2 | .sass-cache
3 | .jekyll-cache
4 | .jekyll-metadata
5 | vendor

--------------------------------------------------------------------------------
/a06740a73b693e9847f815050642ff0f.html:
--------------------------------------------------------------------------------
1 | site-verification: a06740a73b693e9847f815050642ff0f

--------------------------------------------------------------------------------
/assets/400by400.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/400by400.png

--------------------------------------------------------------------------------
/assets/google-play-badge.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ai-boson/mcts/HEAD/assets/google-play-badge.png

--------------------------------------------------------------------------------
/about.markdown:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: About
4 | permalink: /about/
5 | ---
6 |
7 | I am interested in Algorithms, Game Design, Game Engines and Coding in general. I like making games and do game development in my free time. If you have any questions or want to collaborate on projects, contact me at bosonicstudios@gmail.com
8 |

--------------------------------------------------------------------------------
/404.html:
--------------------------------------------------------------------------------
1 | ---
2 | permalink: /404.html
3 | layout: default
4 | ---
5 |
20 |
21 | <div class="container">
22 |   <h1>404</h1>
23 |   <p><strong>Page not found :(</strong></p>
24 |   <p>The requested page could not be found.</p>
25 | </div>

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
10 |
11 | Here is the Play Store link to the game: [Sudo Tic Tac Toe][jekyll-talk].
12 |
13 | The rules for our game (mode 1) are as follows:
14 |
15 | 1. The game is played on a 9 by 9 grid, like Sudoku.
16 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards).
17 | 3. The aim of the game is to win any one of the 9 available local boards.
18 | 4. Your move determines in which local board the A.I. has to make a move, and vice versa.
19 | 5. For example, if you make a move in position 1 of local board number 5,
20 | this will force the A.I. to make a move in local board number 1.
21 | 6. The rules of normal Tic Tac Toe apply within each local board.
22 |
23 | ## Why MCTS?
24 |
25 | As you may have seen, this game has a very high branching factor. For the first move the entire board is empty, so there are 81 empty spots and hence 81 possible moves. For the second turn, by rule 4, there are 8 or 9 possible moves.
26 |
27 | For the first 2 moves this gives 81 * 9 = 729 possible combinations, and the number of combinations keeps growing as the game progresses, so the branching factor stays high. Both modes of our game have a very high branching factor, and for games with such a high branching factor it is impractical to apply the minimax algorithm. The MCTS algorithm works well for these kinds of games.
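
As a rough sanity check, the growth in the number of move sequences can be sketched in a few lines of Python (assuming, for simplicity, about 9 legal replies per move after the first):

```python
# Rough, simplified count of move sequences per turn: 81 options for
# the first move, then roughly 9 replies per move (rule 4).
sequences = 81
print(1, sequences)         # turn 1: 81
for turn in range(2, 5):
    sequences *= 9
    print(turn, sequences)  # turn 2: 729, turn 3: 6561, turn 4: 59049
```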
28 |
29 | Also, as you may have noticed while playing the game, the A.I. takes only about a second to make a move, so MCTS runs fast. MCTS has been applied to both modes of the game.
30 |
31 | [![Google Play badge](assets/google-play-badge.png)][2]
32 |
33 | Below we demonstrate the MCTS code in Python.
34 | First we need to import numpy and defaultdict.
35 |
36 | ```python
37 | import numpy as np
38 | from collections import defaultdict
39 | ```
40 | Define the MCTS class as shown below.
41 |
42 | ```python
43 | class MonteCarloTreeSearchNode:
44 |     def __init__(self, state, parent=None, parent_action=None):
45 |         self.state = state                    # board state at this node
46 |         self.parent = parent                  # None for the root node
47 |         self.parent_action = parent_action    # action that led to this node
48 |         self.children = []                    # child nodes found so far
49 |         self._number_of_visits = 0            # times this node was visited
50 |         self._results = defaultdict(int)      # win/loss statistics
51 |         self._results[1] = 0                  # wins
52 |         self._results[-1] = 0                 # losses
53 |         self._untried_actions = None
54 |         self._untried_actions = self.untried_actions()
55 |         return
56 |
57 | ```
58 | ### The constructor initializes the following variables:
59 | - **state**: For our game it represents the board state. Generally the board state is represented by an array. For normal Tic Tac Toe, it is a 3 by 3 array.
60 | - **parent**: None for the root node; for other nodes it is the node the current node is derived from. For the first turn, as you have seen in the game, it is None.
61 | - **children**: Contains the child nodes for all possible actions from the current node. For the second turn in our game there are 8 or 9 of them, depending on where you make your move.
62 | - **parent_action**: None for the root node; for other nodes it is the action that its parent carried out.
63 | - **_number_of_visits**: Number of times the current node has been visited.
64 | - **_results**: A dictionary holding the win and loss counts (keys 1 and -1).
65 | - **_untried_actions**: The list of all actions not yet tried from this node.
66 | - **action**: The move that has to be carried out.
67 |
68 | The class consists of the following member functions. All the functions below are member functions except the main() function.
69 |
70 | ```python
71 | def untried_actions(self):
72 |
73 | self._untried_actions = self.state.get_legal_actions()
74 | return self._untried_actions
75 | ```
76 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions. For the second turn it is 8 or 9. This varies in our game.
77 |
78 | ```python
79 | def q(self):
80 | wins = self._results[1]
81 | loses = self._results[-1]
82 | return wins - loses
83 | ```
84 | Returns the difference of wins and losses.
85 |
86 | ```python
87 | def n(self):
88 | return self._number_of_visits
89 | ```
90 | Returns the number of times the current node has been visited.
91 |
92 | ```python
93 | def expand(self):
94 |
95 | action = self._untried_actions.pop()
96 | next_state = self.state.move(action)
97 | child_node = MonteCarloTreeSearchNode(
98 | next_state, parent=self, parent_action=action)
99 |
100 | self.children.append(child_node)
101 | return child_node
102 | ```
103 | From the present state, the next state is generated for the action popped from _untried_actions. A child node corresponding to this next state is appended to the children array and returned. Over repeated calls, child nodes for all states reachable from the present state get appended to the children array, one per call.
104 |
105 | ```python
106 | def is_terminal_node(self):
107 | return self.state.is_game_over()
108 | ```
109 | This is used to check whether the current node is terminal. A terminal node is reached when the game is over.
110 |
111 | ```python
112 | def rollout(self):
113 | current_rollout_state = self.state
114 |
115 | while not current_rollout_state.is_game_over():
116 |
117 | possible_moves = current_rollout_state.get_legal_actions()
118 |
119 | action = self.rollout_policy(possible_moves)
120 | current_rollout_state = current_rollout_state.move(action)
121 | return current_rollout_state.game_result()
122 | ```
123 | From the current state, the entire game is simulated until there is an outcome, and that outcome is returned: 1 for a win, -1 for a loss and 0 for a tie. If the entire game is simulated randomly, that is, at each turn the move is selected at random from the set of possible moves, it is called a light playout.
124 |
125 | ```python
126 | def backpropagate(self, result):
127 | self._number_of_visits += 1.
128 | self._results[result] += 1.
129 | if self.parent:
130 | self.parent.backpropagate(result)
131 | ```
132 | In this step the statistics for the nodes are updated. Walking up from the current node until the root is reached, the number of visits of each node on the path is incremented by 1. If the result is 1, that is, it resulted in a win, the win count is incremented by 1; if the result is a loss, the loss count is incremented by 1.
133 |
134 | ```python
135 | def is_fully_expanded(self):
136 | return len(self._untried_actions) == 0
137 |
138 | ```
139 | Actions are popped out of _untried_actions one by one. When the list becomes empty, that is, when its size is zero, the node is fully expanded.
140 |
141 | ```python
142 | def best_child(self, c_param=0.1):
143 |
144 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
145 | return self.children[np.argmax(choices_weights)]
146 |
147 | ```
148 | Once the node is fully expanded, this function selects the best child out of the children array. The first term in the formula corresponds to exploitation and the second term corresponds to exploration.
149 |
150 | ```python
151 | def rollout_policy(self, possible_moves):
152 |
153 | return possible_moves[np.random.randint(len(possible_moves))]
154 |
155 | ```
156 | Randomly selects a move out of the possible moves. This is an example of a light playout policy.
157 |
158 | ```python
159 | def _tree_policy(self):
160 |
161 | current_node = self
162 | while not current_node.is_terminal_node():
163 |
164 | if not current_node.is_fully_expanded():
165 | return current_node.expand()
166 | else:
167 | current_node = current_node.best_child()
168 | return current_node
169 | ```
170 | Selects the node on which to run a rollout: walk down the tree, expanding any node that is not fully expanded and otherwise following the best child.
171 |
172 | ```python
173 | def best_action(self):
174 | simulation_no = 100
175 |
176 |
177 | for i in range(simulation_no):
178 |
179 | v = self._tree_policy()
180 | reward = v.rollout()
181 | v.backpropagate(reward)
182 |
183 | return self.best_child(c_param=0.)
184 | ```
185 | This is the best_action function, which returns the node corresponding to the best possible move.
186 | The steps of expansion, simulation and backpropagation are carried out by the code above.
187 |
188 | ```python
189 | def get_legal_actions(self):
190 | '''
191 | Modify according to your game or
192 | needs. Constructs a list of all
193 | possible actions from the current state.
194 | Returns a list.
195 | '''
196 | ```
197 |
198 | ```python
199 | def is_game_over(self):
200 | '''
201 | Modify according to your game or
202 | needs. It is the game over condition
203 | and depends on your game. Returns
204 | true or false
205 | '''
206 | ```
207 |
208 | ```python
209 | def game_result(self):
210 | '''
211 | Modify according to your game or
212 | needs. Returns 1 or 0 or -1 depending
213 | on your state corresponding to win,
214 | tie or a loss.
215 | '''
216 | ```
217 | ```python
218 | def move(self,action):
219 | '''
220 | Modify according to your game or
221 | needs. Changes the state of your
222 | board with a new value. For a normal
223 | Tic Tac Toe game, it can be a 3 by 3
224 | array with all the elements of array
225 | being 0 initially. 0 means the board
226 | position is empty. If you place x in
227 | row 2 column 3, then it would be
228 | something like board[2][3] = 1, where 1
229 | represents that x is placed. Returns
230 | the new state after making a move.
231 | '''
232 |
233 | ```
234 |
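Putting the four stubs together, here is a minimal sketch of a state class for normal Tic Tac Toe, assuming a 3 by 3 numpy board with 0 for an empty cell, 1 for x and -1 for o. The class and its details are illustrative only, not the actual code of our game:

```python
class TicTacToeState:
    def __init__(self, board=None, next_player=1):
        self.board = np.zeros((3, 3), dtype=int) if board is None else board
        self.next_player = next_player  # 1 plays x, -1 plays o

    def get_legal_actions(self):
        # Every empty cell is a legal (row, col) action.
        return [tuple(cell) for cell in np.argwhere(self.board == 0)]

    def is_game_over(self):
        return self._winner() != 0 or not (self.board == 0).any()

    def game_result(self):
        return self._winner()  # 1 = x wins, -1 = o wins, 0 = tie

    def move(self, action):
        # Return the new state after the current player plays `action`.
        new_board = self.board.copy()
        new_board[action] = self.next_player
        return TicTacToeState(new_board, -self.next_player)

    def _winner(self):
        # Check all rows, columns and both diagonals for three in a row.
        lines = [*self.board, *self.board.T,
                 self.board.diagonal(), np.fliplr(self.board).diagonal()]
        for line in lines:
            if abs(line.sum()) == 3:
                return int(np.sign(line.sum()))
        return 0
```
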
235 | ```python
236 | def main():
237 |     root = MonteCarloTreeSearchNode(state = initial_state)
238 | selected_node = root.best_action()
239 | return
240 | ```
241 | This is the main() function: initialize the root node and call the best_action function to get the best node. This is not a member function of the class; all the other functions are member functions of the class.
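
With the hypothetical TicTacToeState sketch above, the pieces could be wired together like this:

```python
# Hypothetical usage: build the root from an empty board; the selected
# child's parent_action is the recommended move.
initial_state = TicTacToeState()
root = MonteCarloTreeSearchNode(state=initial_state)
selected_node = root.best_action()
print(selected_node.parent_action)
```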
242 |
243 | MCTS consists of 4 steps:
244 |
245 | ## SELECTION
246 |
247 | The idea is to keep selecting the best child nodes until we reach a leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence bound applied to Trees) formula:
248 |
249 | wi/ni + c*sqrt(ln(t)/ni)
250 |
251 |
252 | wi = number of wins after the i-th move
253 | ni = number of simulations after the i-th move
254 | c = exploration parameter (theoretically equal to √2)
255 | t = total number of simulations for the parent node
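
As a quick worked example, here is the UCT value of a hypothetical child that has 7 wins out of 10 visits and whose parent has been visited 25 times:

```python
import math

# Hypothetical counts: wi = 7 wins, ni = 10 visits for the child,
# t = 25 visits for the parent, c = sqrt(2).
wi, ni, t, c = 7, 10, 25, math.sqrt(2)
uct = wi / ni + c * math.sqrt(math.log(t) / ni)
print(round(uct, 3))  # 0.7 exploitation + ~0.802 exploration = 1.502
```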
256 |
257 | ## EXPANSION:
258 |
259 | When UCT can no longer be applied to find a successor node, the algorithm expands the game tree by appending states reachable from the leaf node (in the code above, expand() adds one child per call by popping an untried action).
260 |
261 | ## SIMULATION:
262 |
263 | After expansion, the algorithm picks a child node arbitrarily and simulates the entire game from the selected node until it reaches a terminal state of the game. If moves are picked at random during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions.
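
For example, a heavy playout could be sketched by swapping the rollout_policy shown above for a heuristic version. The helpers is_winning_move and blocks_opponent below are hypothetical and would have to be implemented for your game:

```python
def heavy_rollout_policy(self, possible_moves, state):
    # Prefer an immediate winning move, then a move that blocks the
    # opponent, and fall back to a random move otherwise.
    for move in possible_moves:
        if state.is_winning_move(move):   # hypothetical helper
            return move
    for move in possible_moves:
        if state.blocks_opponent(move):   # hypothetical helper
            return move
    return possible_moves[np.random.randint(len(possible_moves))]
```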
264 |
265 | ## BACKPROPAGATION:
266 |
267 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upward to the root, incrementing the visit count of every visited node and updating the win count of each node whose player won the playout.
268 |
269 |
270 | ## DESIGNING YOUR GAME:
271 |
272 | If you plan to make your own game, you will have to think about the following questions.
273 | 1. **How will you represent the state of your game? Think about the initial state in our game.**
274 | 2. **What will be the end game condition for your game? Compare it with the end game condition of our game.**
275 | 3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.**
276 |
277 | [Sudo Tic Tac Toe][jekyll-talk]
278 |
279 | [![Google Play badge](assets/google-play-badge.png)][2]
280 |
281 | If you have any questions or suggestions, feel free to contact us at bosonicstudios@gmail.com
282 |
283 | [jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo
284 | [2]: https://play.google.com/store/apps/details?id=com.myComp.sudo
285 |
--------------------------------------------------------------------------------
/index.markdown:
--------------------------------------------------------------------------------
1 | ---
2 | # Feel free to add content and custom Front Matter to this file.
3 | # To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults
4 |
5 | layout: home
6 | ---
7 |
8 | In this tutorial we will be explaining the Monte Carlo Tree Search algorithm and each part of the code. Recently we applied MCTS to develop our game.
9 |
10 | The code is general and only assumes familiarity with basic Python. We have explained it with respect to our game. If you want to use it for your project or game, you will have to slightly modify the functions mentioned below.
11 |
12 |
13 |
14 | Here is the Play Store link to the game: [Fractio][jekyll-talk].
15 |
16 | [![Google Play badge](assets/google-play-badge.png)][2]
17 |
18 | The rules for our game (mode 1) are as follows:
19 |
20 | 1. The game is played on a 9 by 9 grid, like Sudoku.
21 | 2. This big 9 by 9 grid is divided into 9 smaller 3 by 3 grids (local boards).
22 | 3. The aim of the game is to win any one of the 9 available local boards.
23 | 4. Your move determines in which local board the A.I. has to make a move, and vice versa.
24 | 5. For example, if you make a move in position 1 of local board number 5,
25 | this will force the A.I. to make a move in local board number 1.
26 | 6. The rules of normal Tic Tac Toe apply within each local board.
27 |
28 | ## Why MCTS?
29 |
30 | As you may have seen, this game has a very high branching factor. For the first move the entire board is empty, so there are 81 empty spots and hence 81 possible moves. For the second turn, by rule 4, there are 8 or 9 possible moves.
31 |
32 | For the first 2 moves this gives 81 * 9 = 729 possible combinations, and the number of combinations keeps growing as the game progresses, so the branching factor stays high. Both modes of our game have a very high branching factor, and for games with such a high branching factor it is impractical to apply the minimax algorithm. The MCTS algorithm works well for these kinds of games.
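
As a rough sanity check, the growth in the number of move sequences can be sketched in a few lines of Python (assuming, for simplicity, about 9 legal replies per move after the first):

```python
# Rough, simplified count of move sequences per turn: 81 options for
# the first move, then roughly 9 replies per move (rule 4).
sequences = 81
print(1, sequences)         # turn 1: 81
for turn in range(2, 5):
    sequences *= 9
    print(turn, sequences)  # turn 2: 729, turn 3: 6561, turn 4: 59049
```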
33 |
34 | Also, as you may have noticed while playing the game, the A.I. takes only about a second to make a move, so MCTS runs fast. MCTS has been applied to both modes of the game.
35 |
36 | MCTS consists of 4 steps:
37 |
38 | Note: This might not all make sense at first; the MCTS code below gives a proper explanation.
39 |
40 | ## SELECTION
41 |
42 | The idea is to keep selecting the best child nodes until we reach a leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence bound applied to Trees) formula:
43 |
44 | wi/ni + c*sqrt(ln(t)/ni)
45 |
46 |
47 | wi = number of wins after the i-th move
48 | ni = number of simulations after the i-th move
49 | c = exploration parameter (theoretically equal to √2)
50 | t = total number of simulations for the parent node
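
As a quick worked example, here is the UCT value of a hypothetical child that has 7 wins out of 10 visits and whose parent has been visited 25 times:

```python
import math

# Hypothetical counts: wi = 7 wins, ni = 10 visits for the child,
# t = 25 visits for the parent, c = sqrt(2).
wi, ni, t, c = 7, 10, 25, math.sqrt(2)
uct = wi / ni + c * math.sqrt(math.log(t) / ni)
print(round(uct, 3))  # 0.7 exploitation + ~0.802 exploration = 1.502
```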
51 |
52 | ## EXPANSION:
53 |
54 | When UCT can no longer be applied to find a successor node, the algorithm expands the game tree by appending states reachable from the leaf node (in the code below, expand() adds one child per call by popping an untried action).
55 |
56 | ## SIMULATION:
57 |
58 | After expansion, the algorithm picks a child node arbitrarily and simulates the entire game from the selected node until it reaches a terminal state of the game. If moves are picked at random during the playout, it is called a light playout. You can also opt for a heavy playout by writing quality heuristics or evaluation functions.
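
For example, a heavy playout could be sketched by swapping the rollout_policy shown below for a heuristic version. The helpers is_winning_move and blocks_opponent here are hypothetical and would have to be implemented for your game:

```python
def heavy_rollout_policy(self, possible_moves, state):
    # Prefer an immediate winning move, then a move that blocks the
    # opponent, and fall back to a random move otherwise.
    for move in possible_moves:
        if state.is_winning_move(move):   # hypothetical helper
            return move
    for move in possible_moves:
        if state.blocks_opponent(move):   # hypothetical helper
            return move
    return possible_moves[np.random.randint(len(possible_moves))]
```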
59 |
60 | ## BACKPROPAGATION:
61 |
62 | Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It then traverses upward to the root, incrementing the visit count of every visited node and updating the win count of each node whose player won the playout.
63 |
64 | Below we demonstrate the MCTS code in Python.
65 | First we need to import numpy and defaultdict.
66 |
67 | ```python
68 | import numpy as np
69 | from collections import defaultdict
70 | ```
71 | Define the MCTS class as shown below.
72 |
73 | ```python
74 | class MonteCarloTreeSearchNode:
75 |     def __init__(self, state, parent=None, parent_action=None):
76 |         self.state = state                    # board state at this node
77 |         self.parent = parent                  # None for the root node
78 |         self.parent_action = parent_action    # action that led to this node
79 |         self.children = []                    # child nodes found so far
80 |         self._number_of_visits = 0            # times this node was visited
81 |         self._results = defaultdict(int)      # win/loss statistics
82 |         self._results[1] = 0                  # wins
83 |         self._results[-1] = 0                 # losses
84 |         self._untried_actions = None
85 |         self._untried_actions = self.untried_actions()
86 |         return
87 |
88 | ```
89 | ### The constructor initializes the following variables:
90 | - **state**: For our game it represents the board state. Generally the board state is represented by an array. For normal Tic Tac Toe, it is a 3 by 3 array.
91 | - **parent**: None for the root node; for other nodes it is the node the current node is derived from. For the first turn, as you have seen in the game, it is None.
92 | - **children**: Contains the child nodes for all possible actions from the current node. For the second turn in our game there are 8 or 9 of them, depending on where you make your move.
93 | - **parent_action**: None for the root node; for other nodes it is the action that its parent carried out.
94 | - **_number_of_visits**: Number of times the current node has been visited.
95 | - **_results**: A dictionary holding the win and loss counts (keys 1 and -1).
96 | - **_untried_actions**: The list of all actions not yet tried from this node.
97 | - **action**: The move that has to be carried out.
98 |
99 | The class consists of the following member functions. All the functions below are member functions except the main() function.
100 |
101 | ```python
102 | def untried_actions(self):
103 |
104 | self._untried_actions = self.state.get_legal_actions()
105 | return self._untried_actions
106 | ```
107 | Returns the list of untried actions from a given state. For the first turn of our game there are 81 possible actions. For the second turn it is 8 or 9. This varies in our game.
108 |
109 | ```python
110 | def q(self):
111 | wins = self._results[1]
112 | loses = self._results[-1]
113 | return wins - loses
114 | ```
115 | Returns the difference of wins and losses.
116 |
117 | ```python
118 | def n(self):
119 | return self._number_of_visits
120 | ```
121 | Returns the number of times the current node has been visited.
122 |
123 | ```python
124 | def expand(self):
125 |
126 | action = self._untried_actions.pop()
127 | next_state = self.state.move(action)
128 | child_node = MonteCarloTreeSearchNode(
129 | next_state, parent=self, parent_action=action)
130 |
131 | self.children.append(child_node)
132 | return child_node
133 | ```
134 | From the present state, the next state is generated for the action popped from _untried_actions. A child node corresponding to this next state is appended to the children array and returned. Over repeated calls, child nodes for all states reachable from the present state get appended to the children array, one per call.
135 |
136 | ```python
137 | def is_terminal_node(self):
138 | return self.state.is_game_over()
139 | ```
140 | This is used to check whether the current node is terminal. A terminal node is reached when the game is over.
141 |
142 | ```python
143 | def rollout(self):
144 | current_rollout_state = self.state
145 |
146 | while not current_rollout_state.is_game_over():
147 |
148 | possible_moves = current_rollout_state.get_legal_actions()
149 |
150 | action = self.rollout_policy(possible_moves)
151 | current_rollout_state = current_rollout_state.move(action)
152 | return current_rollout_state.game_result()
153 | ```
154 | From the current state, the entire game is simulated until there is an outcome, and that outcome is returned: 1 for a win, -1 for a loss and 0 for a tie. If the entire game is simulated randomly, that is, at each turn the move is selected at random from the set of possible moves, it is called a light playout.
155 |
156 | ```python
157 | def backpropagate(self, result):
158 | self._number_of_visits += 1.
159 | self._results[result] += 1.
160 | if self.parent:
161 | self.parent.backpropagate(result)
162 | ```
163 | In this step the statistics for the nodes are updated. Walking up from the current node until the root is reached, the number of visits of each node on the path is incremented by 1. If the result is 1, that is, it resulted in a win, the win count is incremented by 1; if the result is a loss, the loss count is incremented by 1.
164 |
165 | ```python
166 | def is_fully_expanded(self):
167 | return len(self._untried_actions) == 0
168 |
169 | ```
170 | Actions are popped out of _untried_actions one by one. When the list becomes empty, that is, when its size is zero, the node is fully expanded.
171 |
172 | ```python
173 | def best_child(self, c_param=0.1):
174 |
175 | choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
176 | return self.children[np.argmax(choices_weights)]
177 |
178 | ```
179 | Once the node is fully expanded, this function selects the best child out of the children array. The first term in the formula corresponds to exploitation and the second term corresponds to exploration.
180 |
181 | ```python
182 | def rollout_policy(self, possible_moves):
183 |
184 | return possible_moves[np.random.randint(len(possible_moves))]
185 |
186 | ```
187 | Randomly selects a move out of the possible moves. This is an example of a light playout policy.
188 |
189 | ```python
190 | def _tree_policy(self):
191 |
192 | current_node = self
193 | while not current_node.is_terminal_node():
194 |
195 | if not current_node.is_fully_expanded():
196 | return current_node.expand()
197 | else:
198 | current_node = current_node.best_child()
199 | return current_node
200 | ```
201 | Selects the node on which to run a rollout: walk down the tree, expanding any node that is not fully expanded and otherwise following the best child.
202 |
203 | ```python
204 | def best_action(self):
205 | simulation_no = 100
206 |
207 |
208 | for i in range(simulation_no):
209 |
210 | v = self._tree_policy()
211 | reward = v.rollout()
212 | v.backpropagate(reward)
213 |
214 | return self.best_child(c_param=0.)
215 | ```
216 | This is the best_action function, which returns the node corresponding to the best possible move.
217 | The steps of expansion, simulation and backpropagation are carried out by the code above.
218 |
219 | ```python
220 | def get_legal_actions(self):
221 | '''
222 | Modify according to your game or
223 | needs. Constructs a list of all
224 | possible actions from current state.
225 | Returns a list.
226 | '''
227 | ```
228 |
229 | ```python
230 | def is_game_over(self):
231 | '''
232 | Modify according to your game or
233 | needs. It is the game over condition
234 | and depends on your game. Returns
235 | true or false
236 | '''
237 | ```
238 |
239 | ```python
240 | def game_result(self):
241 | '''
242 | Modify according to your game or
243 | needs. Returns 1 or 0 or -1 depending
244 | on your state corresponding to win,
245 | tie or a loss.
246 | '''
247 | ```
248 | ```python
249 | def move(self,action):
250 | '''
251 | Modify according to your game or
252 | needs. Changes the state of your
253 | board with a new value. For a normal
254 | Tic Tac Toe game, it can be a 3 by 3
255 | array with all the elements of array
256 | being 0 initially. 0 means the board
257 | position is empty. If you place x in
258 | row 2 column 3, then it would be
259 | something like board[2][3] = 1, where 1
260 | represents that x is placed. Returns
261 | the new state after making a move.
262 | '''
263 |
264 | ```
265 |
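Putting the four stubs together, here is a minimal sketch of a state class for normal Tic Tac Toe, assuming a 3 by 3 numpy board with 0 for an empty cell, 1 for x and -1 for o. The class and its details are illustrative only, not the actual code of our game:

```python
class TicTacToeState:
    def __init__(self, board=None, next_player=1):
        self.board = np.zeros((3, 3), dtype=int) if board is None else board
        self.next_player = next_player  # 1 plays x, -1 plays o

    def get_legal_actions(self):
        # Every empty cell is a legal (row, col) action.
        return [tuple(cell) for cell in np.argwhere(self.board == 0)]

    def is_game_over(self):
        return self._winner() != 0 or not (self.board == 0).any()

    def game_result(self):
        return self._winner()  # 1 = x wins, -1 = o wins, 0 = tie

    def move(self, action):
        # Return the new state after the current player plays `action`.
        new_board = self.board.copy()
        new_board[action] = self.next_player
        return TicTacToeState(new_board, -self.next_player)

    def _winner(self):
        # Check all rows, columns and both diagonals for three in a row.
        lines = [*self.board, *self.board.T,
                 self.board.diagonal(), np.fliplr(self.board).diagonal()]
        for line in lines:
            if abs(line.sum()) == 3:
                return int(np.sign(line.sum()))
        return 0
```
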
266 | ```python
267 | def main():
268 | root = MonteCarloTreeSearchNode(state = initial_state)
269 | selected_node = root.best_action()
270 | return
271 | ```
272 | This is the main() function: initialize the root node and call the best_action function to get the best node. This is not a member function of the class; all the other functions are member functions of the class.
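
With the hypothetical TicTacToeState sketch above, the pieces could be wired together like this:

```python
# Hypothetical usage: build the root from an empty board; the selected
# child's parent_action is the recommended move.
initial_state = TicTacToeState()
root = MonteCarloTreeSearchNode(state=initial_state)
selected_node = root.best_action()
print(selected_node.parent_action)
```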
273 |
274 | If you liked the tutorial, please consider checking out the game page of my 3D racing game [Speed Surge][3].
275 |
276 | ## DESIGNING YOUR GAME:
277 |
278 | If you plan to make your own game, you will have to think about the following questions.
279 | 1. **How will you represent the state of your game? Think about the initial state in our game.**
280 | 2. **What will be the end game condition for your game? Compare it with the end game condition of our game.**
281 | 3. **How will you get the legal actions in your game? Try getting the legal actions for the first move of our game.**
282 |
283 |
284 |
285 | If you have any questions or suggestions, feel free to contact me at bosonicstudios@gmail.com
286 |
287 | [jekyll-talk]: https://play.google.com/store/apps/details?id=com.myComp.sudo
288 | [2]: https://play.google.com/store/apps/details?id=com.myComp.sudo
289 | [3]: https://store.steampowered.com/app/3566670/Speed_Surge/
290 |
--------------------------------------------------------------------------------