├── .gitignore
├── README.md
├── agent
│   ├── __init__.py
│   ├── basic_agent.py
│   ├── rl
│   │   ├── ApproxQAgent.npy
│   │   ├── ApproxQAgent_1e-3_4e-1randomoppo.npy
│   │   ├── ApproxQAgent_2e-3.npy
│   │   ├── ApproxQAgent_5e-4.npy
│   │   ├── __init__.py
│   │   ├── rl_agent.py
│   │   ├── rl_agentx.py
│   │   └── rl_env.py
│   ├── search
│   │   ├── __init__.py
│   │   ├── evaluation.py
│   │   └── search_agent.py
│   └── util.py
├── benchmark.py
├── game
│   ├── README.md
│   ├── __init__.py
│   ├── go.py
│   ├── images
│   │   └── ramin.jpg
│   ├── ui.py
│   └── util.py
├── img
│   └── Board.jpg
├── match.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | .DS_Store
3 | __pycache__
4 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## "Mini" Go Game with Classic AI Agents
2 | 
3 | An "easier" version of the Go game implemented in Python 3, with more constraints on legal moves and a simpler winning condition.
4 | 
5 | A GUI is provided for human players; legal actions at each turn are indicated on the board as blue dots.
6 | 
7 | The following AI agents are also provided:
8 | * Random agent
9 | * Greedy agent
10 | * Minimax search agent with alpha-beta pruning
11 | * Expectimax search agent
12 | * Approximate Q-learning agent
13 | 
14 | ![Board](img/Board.jpg)
15 | 
16 | ### Usage
17 | 
18 | Install dependencies: `pip install -r requirements.txt`
19 | 
20 | #### Start A Match
21 | 
22 | See the usage of `match.py` below.
23 | 
24 | #### Examples:
25 | 
26 | **human** vs. **human**: `./match.py`
27 | 
28 | **random agent** (BLACK) vs. **human** (WHITE): `./match.py -b random`
29 | 
30 | **minimax agent** with search depth 1 (BLACK) vs. **human** (WHITE): `./match.py -b minimax`
31 | 
32 | **expectimax agent** with search depth 2 (BLACK) vs. **human** (WHITE): `./match.py -b expectimax -d 2`
33 | 
34 | **Q-learning agent** (BLACK) vs. **human** (WHITE): `./match.py -b approx-q`
35 | 
36 | **Q-learning agent** (BLACK) vs. **random agent** (WHITE): `./match.py -b approx-q -w random`
37 | 
38 | 
39 | ```text
40 | usage: Mini Go Game [-h] [-b AGENT_BLACK] [-w AGENT_WHITE] [-d SEARCH_DEPTH]
41 |                     [-g GUI] [-s DIR_SAVE]
42 | 
43 | optional arguments:
44 |   -h, --help            show this help message and exit
45 |   -b AGENT_BLACK, --agent_black AGENT_BLACK
46 |                         possible agents: random; greedy; minimax; expectimax,
47 |                         approx-q; DEFAULT is None (human)
48 |   -w AGENT_WHITE, --agent_white AGENT_WHITE
49 |                         possible agents: random; greedy; minimax; expectimax,
50 |                         approx-q; DEFAULT is None (human)
51 |   -d SEARCH_DEPTH, --search_depth SEARCH_DEPTH
52 |                         the search depth for searching agents if applicable;
53 |                         DEFAULT is 1
54 |   -g GUI, --gui GUI     if show GUI; always true if human plays; DEFAULT is
55 |                         True
56 |   -s DIR_SAVE, --dir_save DIR_SAVE
57 |                         if not None, save the image of last board state to
58 |                         this directory; DEFAULT is None
59 | ```
60 | 
61 | #### Benchmark on AI Agents
62 | 
63 | See `benchmark.py`.
64 | 
65 | ### Game Rules
66 | 
67 | This "simplified" version of Go has the same rules and concepts (such as "liberties") as the original Go, except for the legal actions and the winning criteria.
68 | 
69 | * **Legal actions**: at each turn, the player can only place the stone on one of the opponent's liberties, unless the player has a self-group to save.
70 |     * If any opponent group has only one liberty, the legal actions are these liberties, which cause a direct win.
71 |     * Else, if any self-group has only one liberty, the legal actions are these liberties, to try to save those groups. If, sadly, there is more than one such liberty, the player will lose in the next round :(
72 |     * Else, neither player has an endangered group; the player should place the stone on one of the opponent's liberties.
73 |     * Suicidal moves are not legal actions.
74 | 
75 | * **Winning criteria** (one of the following):
76 |     * You remove any of the opponent's groups.
77 |     * There are no legal actions for the opponent (this happens in roughly 1.6% of games between random agents).
78 | 
79 | BLACK always has the first move; the first move is always the center of the board.
80 | 
81 | ### Code
82 | 
83 | * `match.py`: the full environment to play a match; a match can be started with or without the GUI.
84 | * `benchmark.py`: the tool to test the performance (e.g. win rate) of AI agents.
85 | 
86 | * `game.go`: the full backend of this Go game, with all the logic needed in the game.
87 | * `game.ui`: the game GUI on top of the backend.
88 | 
89 | * `agent.basic_agent`: basic agents, including the random agent and the greedy agent.
90 | * `agent.search.search_agent`: agents that utilize search techniques, including the AlphaBeta agent and the Expectimax agent.
91 | 
92 | 
--------------------------------------------------------------------------------
/agent/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/__init__.py
--------------------------------------------------------------------------------
/agent/basic_agent.py:
--------------------------------------------------------------------------------
1 | import random
2 | from game.go import Board, opponent_color
3 | 
4 | 
5 | 
6 | class Agent:
7 |     """Abstract stateless agent."""
8 |     def __init__(self, color):
9 |         """
10 |         :param color: 'BLACK' or 'WHITE'
11 |         """
12 |         self.color = color
13 | 
14 |     @classmethod
15 |     def terminal_test(cls, board):
16 |         return board.winner is not None
17 | 
18 |     def get_action(self, board: Board):
19 |         raise NotImplementedError
20 | 
21 |     def __str__(self):
22 |         return self.__class__.__name__ + '; color: ' + self.color
23 | 
24 | 
25 | class RandomAgent(Agent):
26 |     """Pick a random action."""
27 |     def __init__(self, color):
28 |         super().__init__(color)
29 | 
30 |     def get_action(self, board):
31 |         actions = board.get_legal_actions()
32 |         return random.choice(actions) if actions else None
33 | 
34 | 
35 | class GreedyAgent(Agent):
36 |     """Pick the action that takes a liberty from the most opponent groups."""
37 |     def __init__(self, color):
38 |         super().__init__(color)
39 | 
40 |     def get_action(self, board):
41 |         actions = board.get_legal_actions()
42 |         num_groups = [len(board.libertydict.get_groups(opponent_color(self.color), action)) for action in actions]
43 |         max_num_groups = max(num_groups) if num_groups else 0  # Guard against a board with no legal actions
44 |         idx_candidates = [idx for idx, num in enumerate(num_groups) if num == max_num_groups]
45 |         return actions[random.choice(idx_candidates)] if actions else None
46 | 
--------------------------------------------------------------------------------
/agent/rl/ApproxQAgent.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/rl/ApproxQAgent.npy
--------------------------------------------------------------------------------
/agent/rl/ApproxQAgent_1e-3_4e-1randomoppo.npy:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/rl/ApproxQAgent_1e-3_4e-1randomoppo.npy -------------------------------------------------------------------------------- /agent/rl/ApproxQAgent_2e-3.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/rl/ApproxQAgent_2e-3.npy -------------------------------------------------------------------------------- /agent/rl/ApproxQAgent_5e-4.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/rl/ApproxQAgent_5e-4.npy -------------------------------------------------------------------------------- /agent/rl/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/rl/__init__.py -------------------------------------------------------------------------------- /agent/rl/rl_agent.py: -------------------------------------------------------------------------------- 1 | from agent.basic_agent import Agent, RandomAgent 2 | from agent.search.search_agent import AlphaBetaAgent 3 | from agent.rl.rl_env import RlEnv 4 | import numpy as np 5 | from game.go import Board 6 | from game.go import opponent_color 7 | import random 8 | from statistics import mean 9 | 10 | 11 | class RlAgent(Agent): 12 | def __init__(self, color, rl_env): 13 | super().__init__(color) 14 | self.rl_env = rl_env 15 | self.w = None 16 | 17 | def get_action(self, board): 18 | raise NotImplementedError 19 | 20 | 21 | class ApproxQAgent(RlAgent): 22 | def __init__(self, color, rl_env): 23 | super().__init__(color, rl_env) 24 | 25 | def get_action(self, board): 26 | if self.w is None: 27 | raise RuntimeError('Agent needs to be trained or loaded!') 28 | 29 | legal_actions = board.get_legal_actions() 30 | if not legal_actions: 31 | return None 32 | 33 | return max(legal_actions, key=lambda action: self._calc_q(board, action)) 34 | 35 | def get_default_path(self): 36 | return '%s.npy' % self.__class__.__name__ 37 | 38 | def save(self, path_file=None): 39 | """Save the weight vector.""" 40 | if self.w is not None: 41 | if not path_file: 42 | path_file = self.get_default_path() 43 | np.save(path_file, self.w) 44 | print('Saved weights to ' + path_file) 45 | 46 | def load(self, path_file=None): 47 | """Load the weight vector.""" 48 | if path_file is None: 49 | path_file = self.get_default_path() 50 | self.w = np.load(path_file) 51 | print('Loaded weights from ' + path_file) 52 | 53 | def train(self, epochs, lr, discount, exploration_rate, decay_rate=0.9, decay_epoch=200): 54 | """ 55 | Use RandomAgent for opponent. 
56 | :param epochs: one epoch = one game 57 | :param lr: learning rate 58 | :param discount: 59 | :param exploration_rate: the probability to cause random move during training 60 | :param decay_rate: the rate to decay learning rate and exploration rate 61 | :param decay_epoch: the number of epochs to apply decay 62 | :return: 63 | """ 64 | if exploration_rate > 1 or exploration_rate < 0: 65 | raise ValueError('exploration_rate should be in [0, 1]!') 66 | 67 | num_feats = self.rl_env.get_num_feats() 68 | self.w = np.random.random(num_feats) 69 | 70 | print('Start training ' + str(self)) 71 | for epoch in range(epochs): 72 | diff_mean = self._train_one_epoch(lr, discount, exploration_rate) 73 | # Decay learning rate and exploration rate 74 | if epoch % decay_epoch == decay_epoch - 1: 75 | lr *= decay_rate 76 | exploration_rate *= decay_rate 77 | print('Decay learning rate to %f' % lr) 78 | print('Decay exploration rate to %f' % exploration_rate) 79 | # Echo performance 80 | if epoch % 5 == 4: 81 | print('Epoch %d: mean difference %f' % (epoch, diff_mean)) 82 | print('Finished training') 83 | 84 | def _train_one_epoch(self, lr, discount, exploration_rate): 85 | """Return the mean of difference during this epoch""" 86 | # Opponent: minimax with random move 87 | prob_oppo_random = 0.4 88 | agent_oppo = AlphaBetaAgent(opponent_color(self.color), depth=1) 89 | agent_oppo_random = RandomAgent(opponent_color(self.color)) 90 | 91 | board = Board() 92 | first_move = (10, 10) 93 | board.put_stone(first_move, check_legal=False) 94 | 95 | if board.next != self.color: 96 | board.put_stone(agent_oppo_random.get_action(board), check_legal=False) 97 | 98 | diffs = [] 99 | while board.winner is None: 100 | legal_actions = board.get_legal_actions() 101 | 102 | # Get next action with exploration 103 | if random.uniform(0, 1) < exploration_rate: 104 | action_next = random.choice(legal_actions) 105 | else: 106 | action_next = max(legal_actions, key=lambda action: self._calc_q(board, action)) 107 | 108 | # Keep current features 109 | feats = self.rl_env.extract_features(board, action_next, self.color) 110 | q = self.w.dot(feats) 111 | 112 | # Apply next action 113 | board.put_stone(action_next, check_legal=False) 114 | 115 | # Let opponent play 116 | if board.winner is None: 117 | if random.uniform(0, 1) < prob_oppo_random: 118 | board.put_stone(agent_oppo_random.get_action(board), check_legal=False) 119 | else: 120 | board.put_stone(agent_oppo.get_action(board), check_legal=False) 121 | 122 | # Calc difference 123 | reward_now = self.rl_env.get_reward(board, self.color) 124 | reward_future = 0 125 | if board.winner is None: 126 | next_legal_actions = board.get_legal_actions() 127 | next_qs = [self._calc_q(board, action) for action in next_legal_actions] 128 | reward_future = max(next_qs) 129 | difference = reward_now + discount * reward_future - q 130 | diffs.append(difference) 131 | 132 | # Apply weight update 133 | self.w += (lr * difference * feats) 134 | 135 | return mean(diffs) 136 | 137 | def _calc_q(self, board, action): 138 | return self.w.dot(self.rl_env.extract_features(board, action, self.color)) 139 | 140 | 141 | if __name__ == '__main__': 142 | # Train and save ApproxQAgent 143 | approx_q_agent = ApproxQAgent('BLACK', RlEnv()) 144 | approx_q_agent.train(2000, 0.001, 0.9, 0.1) 145 | approx_q_agent.save() 146 | -------------------------------------------------------------------------------- /agent/rl/rl_agentx.py: -------------------------------------------------------------------------------- 1 | 
from agent.basic_agent import Agent, RandomAgent 2 | from agent.search.search_agent import AlphaBetaAgent 3 | from agent.rl.rl_env import RlEnv2 4 | import numpy as np 5 | from game.go import Board 6 | from game.go import opponent_color 7 | import random 8 | from statistics import mean 9 | 10 | 11 | class RlAgent(Agent): 12 | def __init__(self, color, rl_env): 13 | super().__init__(color) 14 | self.rl_env = rl_env 15 | self.w = None 16 | 17 | def get_action(self, board): 18 | raise NotImplementedError 19 | 20 | 21 | class ApproxQAgent(RlAgent): 22 | def __init__(self, color, rl_env): 23 | super().__init__(color, rl_env) 24 | 25 | def get_action(self, board): 26 | if self.w is None: 27 | raise RuntimeError('Agent needs to be trained or loaded!') 28 | 29 | legal_actions = board.get_legal_actions() 30 | if not legal_actions: 31 | return None 32 | 33 | return max(legal_actions, key=lambda action: self._calc_q(board, action)) 34 | 35 | def get_default_path(self): 36 | return '%s_%s.npy' % (self.__class__.__name__, self.color) 37 | 38 | def save(self, path_file=None): 39 | """Save the weight vector.""" 40 | if self.w is not None: 41 | if not path_file: 42 | path_file = self.get_default_path() 43 | np.save(path_file, self.w) 44 | print('Saved weights to ' + path_file) 45 | 46 | def load(self, path_file=None): 47 | """Load the weight vector.""" 48 | if path_file is None: 49 | path_file = self.get_default_path() 50 | self.w = np.load(path_file) 51 | print('Loaded weights from ' + path_file) 52 | 53 | def train(self, epochs, lr, discount, exploration_rate, decay_rate=0.9, decay_epoch=500): 54 | """ 55 | Use RandomAgent for opponent. 56 | :param epochs: one epoch = one game 57 | :param lr: learning rate 58 | :param discount: 59 | :param exploration_rate: the probability to cause self random move during training 60 | :param decay_rate: the rate to decay learning rate and exploration rate 61 | :param decay_epoch: the number of epochs to apply decay 62 | :return: 63 | """ 64 | if exploration_rate > 1 or exploration_rate < 0: 65 | raise ValueError('exploration_rate should be in [0, 1]!') 66 | 67 | num_feats = self.rl_env.get_num_feats() 68 | self.w = np.array([-1]+[-0.5]*(-1+num_feats)+[0.8]+[0.4]*(-1+num_feats)) 69 | 70 | print('Start training ' + str(self)) 71 | for epoch in range(epochs): 72 | diff_mean = self._train_one_epoch(lr, discount, exploration_rate) 73 | # Decay learning rate and exploration rate 74 | if epoch % decay_epoch == decay_epoch - 1: 75 | lr *= decay_rate 76 | exploration_rate *= decay_rate 77 | print('Decay learning rate to %f' % lr) 78 | print('Decay exploration rate to %f' % exploration_rate) 79 | # Echo performance 80 | if epoch % 5 == 4: 81 | print('Epoch %d: mean difference %f' % (epoch, diff_mean)) 82 | print('Finished training') 83 | 84 | def _train_one_epoch(self, lr, discount, exploration_rate): 85 | """Return the mean of difference during this epoch""" 86 | # Opponent: minimax with random move 87 | prob_oppo_random = 0.1 88 | agent_oppo = AlphaBetaAgent(opponent_color(self.color), depth=1) 89 | agent_oppo_random = RandomAgent(opponent_color(self.color)) 90 | 91 | board = Board() 92 | first_move = (10, 10) 93 | board.put_stone(first_move, check_legal=False) 94 | 95 | if board.next != self.color: 96 | board.put_stone(agent_oppo_random.get_action(board), check_legal=False) 97 | 98 | diffs = [] 99 | while board.winner is None: 100 | legal_actions = board.get_legal_actions() 101 | # Get next action with exploration 102 | if random.uniform(0, 1) < exploration_rate: 103 | 
action_next = random.choice(legal_actions) 104 | else: 105 | action_next = max(legal_actions, key=lambda action: self._calc_q(board, action)) 106 | 107 | # Keep current features 108 | feats ,isself = self.rl_env.extract_features(board, action_next, self.color) 109 | if isself: 110 | q=self.w.dot(feats) 111 | else: 112 | q=-self.w.dot(self.rl_env.reverse_features(feats)) 113 | 114 | # Apply next action 115 | board.put_stone(action_next, check_legal=False) 116 | # Let opponent play 117 | if board.winner is None: 118 | if random.uniform(0, 1) < prob_oppo_random: 119 | board.put_stone(agent_oppo_random.get_action(board), check_legal=False) 120 | else: 121 | board.put_stone(agent_oppo.get_action(board), check_legal=False) 122 | # Calc difference 123 | reward_now = self.rl_env.get_reward(board, self.color) 124 | reward_future = 0 125 | if board.winner is None: 126 | next_legal_actions = board.get_legal_actions() 127 | next_qs = [self._calc_q(board, action) for action in next_legal_actions] 128 | reward_future = max(next_qs) 129 | difference = reward_now + discount * reward_future - q 130 | diffs.append(difference) 131 | 132 | # Apply weight update 133 | if isself: 134 | self.w += (lr * difference * feats) 135 | else: 136 | self.w -= (lr * difference * self.rl_env.reverse_features(feats)) 137 | 138 | return mean(diffs) 139 | 140 | def _calc_q(self, board, action): 141 | feats,isself = self.rl_env.extract_features(board, action, self.color) 142 | if isself: 143 | return self.w.dot(feats) 144 | else: 145 | return -self.w.dot(self.rl_env.reverse_features(feats)) 146 | 147 | 148 | if __name__ == '__main__': 149 | # Train and save ApproxQAgent 150 | approx_q_agent = ApproxQAgent('BLACK', RlEnv2()) 151 | approx_q_agent.train(2000, 0.001, 0.9, 0.1) 152 | approx_q_agent.save() 153 | -------------------------------------------------------------------------------- /agent/rl/rl_env.py: -------------------------------------------------------------------------------- 1 | from game.go import Board, opponent_color 2 | from agent.util import get_num_endangered_groups, get_liberties, is_dangerous_liberty, \ 3 | get_num_groups_with_k_liberties, calc_group_liberty_var, get_group_scores, get_liberty_score 4 | import numpy as np 5 | """ 6 | Environment for rl_agent. 
7 | """ 8 | 9 | 10 | class RlEnvBase: 11 | def __init__(self): 12 | pass 13 | 14 | @classmethod 15 | def get_reward(cls, board: Board, color): 16 | """Return a scalar reward""" 17 | if board.winner is None: 18 | return 0 19 | return 10 if board.winner == color else -10 20 | 21 | @classmethod 22 | def extract_features(cls, board: Board, action, color): 23 | raise NotImplementedError 24 | 25 | @classmethod 26 | def get_num_feats(cls): 27 | raise NotImplementedError 28 | 29 | 30 | class RlEnv(RlEnvBase): 31 | def __init__(self): 32 | super().__init__() 33 | 34 | @classmethod 35 | def extract_features(cls, board: Board, action, color): 36 | """Return a numpy array of features""" 37 | board = board.generate_successor_state(action) 38 | oppo = opponent_color(color) 39 | 40 | # Features for win 41 | feat_win = 1 if board.winner == color else 0 42 | if feat_win == 1: 43 | return np.array([feat_win] + [0] * (cls.get_num_feats() - 1)) 44 | 45 | # Features for endangered groups 46 | num_endangered_self, num_endangered_oppo = get_num_endangered_groups(board, color) 47 | feat_exist_endangered_self = 1 if num_endangered_self > 0 else 0 48 | feat_more_than_one_endangered_oppo = 1 if num_endangered_oppo > 1 else 0 49 | 50 | # Features for dangerous liberties 51 | feat_exist_guarantee_losing = 0 52 | feat_exist_guarantee_winning = 0 53 | liberties_self, liberties_oppo = get_liberties(board, color) 54 | for liberty in liberties_self: 55 | if is_dangerous_liberty(board, liberty, color): 56 | feat_exist_guarantee_losing = 1 57 | break 58 | for liberty in liberties_oppo: 59 | if is_dangerous_liberty(board, liberty, oppo): 60 | oppo_groups = board.libertydict.get_groups(oppo, liberty) 61 | liberties = oppo_groups[0].liberties | oppo_groups[1].liberties 62 | able_to_save = False 63 | for lbt in liberties: 64 | if len(board.libertydict.get_groups(color, lbt)) > 0: 65 | able_to_save = True 66 | break 67 | if not able_to_save: 68 | feat_exist_guarantee_winning = 1 69 | break 70 | 71 | # Features for groups 72 | num_groups_2lbt_self, num_groups_2lbt_oppo = get_num_groups_with_k_liberties(board, color, 2) 73 | feat_groups_2lbt = num_groups_2lbt_oppo - num_groups_2lbt_self 74 | 75 | # Features for shared liberties 76 | num_shared_liberties_self = 0 77 | num_shared_liberties_oppo = 0 78 | for liberty in liberties_self: 79 | num_shared_liberties_self += len(board.libertydict.get_groups(color, liberty)) - 1 80 | for liberty in liberties_oppo: 81 | num_shared_liberties_oppo += len(board.libertydict.get_groups(oppo, liberty)) - 1 82 | feat_shared_liberties = num_shared_liberties_oppo - num_shared_liberties_self 83 | 84 | # Features for number of groups 85 | feat_num_groups_diff = len(board.groups[color]) - len(board.groups[oppo]) 86 | 87 | # Features for liberty variance 88 | var_self, var_oppo = [], [] 89 | for group in board.groups[color]: 90 | var_self.append(calc_group_liberty_var(group)) 91 | for group in board.groups[oppo]: 92 | var_oppo.append(calc_group_liberty_var(group)) 93 | feat_var_self_mean = np.mean(var_self) 94 | feat_var_oppo_mean = np.mean(var_oppo) 95 | 96 | feats = [feat_win, feat_exist_endangered_self, feat_more_than_one_endangered_oppo, 97 | feat_exist_guarantee_losing, feat_exist_guarantee_winning, feat_groups_2lbt, 98 | feat_shared_liberties, feat_num_groups_diff, feat_var_self_mean, 99 | feat_var_oppo_mean, 1] # Add bias 100 | return np.array(feats) 101 | 102 | @classmethod 103 | def get_num_feats(cls): 104 | return 11 105 | 106 | 107 | class RlEnv2(RlEnvBase): 108 | def __init__(self): 109 | 
super().__init__() 110 | 111 | @classmethod 112 | def extract_features(cls, board: Board, action, color, isself=True, generatesuccessor=True): 113 | 114 | """Return a numpy array of features""" 115 | if generatesuccessor: 116 | board = board.generate_successor_state(action) 117 | else: 118 | board.put_stone(action) 119 | oppo = opponent_color(color) 120 | 121 | if board.winner == color: 122 | return np.array([0] * (cls.get_num_feats()) + [1] + [0] * (cls.get_num_feats() - 1)) , isself 123 | elif board.winner == oppo: 124 | return np.array([1] + [0] * (cls.get_num_feats() * 2 - 1)), isself 125 | 126 | if color == board.next: # Now opponent's move 127 | print('fuck! Extract features when color==next!') 128 | 129 | num_endangered_self, num_endangered_oppo = get_num_endangered_groups(board, color) 130 | 131 | if num_endangered_self>0: 132 | return np.array([1] + [0] * (cls.get_num_feats() * 2 - 1)) , isself # Doomed to lose 133 | 134 | elif len(board.legal_actions) == 1: #One choice only 135 | return cls.extract_features(board, board.legal_actions[0], oppo, not isself, False) 136 | 137 | elif num_endangered_oppo>1: 138 | return np.array([0] * (cls.get_num_feats()) + [1] + [0] * (cls.get_num_feats() - 1)) , isself # Doomed to win 139 | 140 | # Features for groups 141 | num_groups_2lbt_self, num_groups_2lbt_oppo = get_num_groups_with_k_liberties(board, color, 2) 142 | 143 | # Features for number of groups 144 | num_groups_self = len(board.groups[color])/3. 145 | num_groups_oppo = len(board.groups[oppo])/3. 146 | 147 | # Features for liberty variance 148 | self_group_score, oppo_group_score = get_group_scores(board, color) 149 | 150 | feats = [0,num_groups_2lbt_self, num_groups_self] + self_group_score + [0, num_groups_2lbt_oppo ,num_groups_oppo] + oppo_group_score # Add bias 151 | if len(feats) !=12: 152 | print('!!!!!!!!!!!!!!!!!!!',len(feats),'@@@@@@@@@@@@@@@@@@@@@') 153 | return np.array(feats), isself 154 | 155 | @classmethod 156 | def get_num_feats(cls): 157 | return 6 158 | 159 | @classmethod 160 | def reverse_features(cls, feat): 161 | length=cls.get_num_feats() 162 | return np.concatenate((feat[length:], feat[:length])) 163 | 164 | 165 | class RlEnv3(RlEnvBase): 166 | def __init__(self): 167 | super().__init__() 168 | 169 | @classmethod 170 | def extract_features(cls, board: Board, action, color, isself=True, generatesuccessor=True): 171 | """Return a numpy array of features""" 172 | if generatesuccessor: 173 | board = board.generate_successor_state(action) 174 | else: 175 | board.put_stone(action) 176 | oppo = opponent_color(color) 177 | 178 | if color == board.next: # Now opponent's move 179 | print('fuck! Extract features when color==next!') 180 | 181 | num_endangered_self, num_endangered_oppo = get_num_endangered_groups(board, color) 182 | if num_endangered_self>0: 183 | return np.array([1] + [0] * (cls.get_num_feats() * 2 - 1)) , isself # Doomed to lose 184 | 185 | elif len(board.legal_actions) == 1: #One choice only 186 | return cls.extract_features(board, board.legal_actions[0], oppo, not isself, False) 187 | 188 | elif num_endangered_oppo>1: 189 | return np.array([0] * (cls.get_num_feats() * 2) + [1] + [0] * (cls.get_num_feats() - 1)) , isself # Doomed to win 190 | 191 | # Features for groups 192 | num_groups_2lbt_self, num_groups_2lbt_oppo = get_num_groups_with_k_liberties(board, color, 2) 193 | 194 | # Features for number of groups 195 | num_groups_self = len(board.groups[color])/3. 196 | num_groups_oppo = len(board.groups[oppo])/3. 
197 | 198 | # Features for liberty variance 199 | self_group_score, oppo_group_score = get_group_scores(board, color) 200 | 201 | # Features for liberty score 202 | self_liberty_scorelist = get_liberty_score(board, color) 203 | oppo_liberty_scorelist = get_liberty_score(board, oppo) 204 | feats = [0 , num_groups_2lbt_self, num_groups_self] + self_group_score + self_liberty_scorelist + [0, num_groups_2lbt_oppo ,num_groups_oppo] + oppo_group_score + oppo_liberty_scorelist # Add bias 205 | if len(feats) !=12: 206 | print('!!!!!!!!!!!!!!!!!!!',len(feats),'@@@@@@@@@@@@@@@@@@@@@') 207 | return np.array(feats), isself 208 | 209 | @classmethod 210 | def get_num_feats(cls): 211 | return 9 212 | 213 | @classmethod 214 | def reverse_features(cls, feat): 215 | length=cls.get_num_feats() 216 | return np.concatenate((feat[length:], feat[:length])) 217 | -------------------------------------------------------------------------------- /agent/search/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/agent/search/__init__.py -------------------------------------------------------------------------------- /agent/search/evaluation.py: -------------------------------------------------------------------------------- 1 | from game.go import Board, opponent_color 2 | from agent.util import get_num_endangered_groups, get_liberties, is_dangerous_liberty, get_num_groups_with_k_liberties 3 | from numpy.random import normal 4 | """ 5 | Evaluation functions for search_agent. 6 | """ 7 | 8 | 9 | def evaluate(board: Board, color): 10 | """Color has the next action""" 11 | # Score for win or lose 12 | score_win = 1000 - board.counter_move # Prefer faster game 13 | if board.winner: 14 | return score_win if board.winner == color else -score_win 15 | 16 | oppo = opponent_color(color) 17 | # Score for endangered groups 18 | num_endangered_self, num_endangered_oppo = get_num_endangered_groups(board, color) 19 | if num_endangered_oppo > 0: 20 | return score_win - 10 # Win in the next move 21 | elif num_endangered_self > 1: 22 | return -(score_win - 10) # Lose in the next move 23 | 24 | # Score for dangerous liberties 25 | liberties_self, liberties_oppo = get_liberties(board, color) 26 | for liberty in liberties_oppo: 27 | if is_dangerous_liberty(board, liberty, oppo): 28 | return score_win / 2 # Good probability to win in the next next move 29 | for liberty in liberties_self: 30 | if is_dangerous_liberty(board, liberty, color): 31 | self_groups = board.libertydict.get_groups(color, liberty) 32 | liberties = self_groups[0].liberties | self_groups[1].liberties 33 | able_to_save = False 34 | for lbt in liberties: 35 | if len(board.libertydict.get_groups(oppo, lbt)) > 0: 36 | able_to_save = True 37 | break 38 | if not able_to_save: 39 | return -score_win / 2 # Good probability to lose in the next next move 40 | 41 | # Score for groups 42 | num_groups_2lbt_self, num_groups_2lbt_oppo = get_num_groups_with_k_liberties(board, color, 2) 43 | score_groups = num_groups_2lbt_oppo - num_groups_2lbt_self 44 | 45 | # Score for liberties 46 | num_shared_liberties_self = 0 47 | num_shared_liberties_oppo = 0 48 | for liberty in liberties_self: 49 | num_shared_liberties_self += len(board.libertydict.get_groups(color, liberty)) - 1 50 | for liberty in liberties_oppo: 51 | num_shared_liberties_oppo += len(board.libertydict.get_groups(oppo, liberty)) - 1 52 | score_liberties = num_shared_liberties_oppo - 
num_shared_liberties_self 53 | 54 | # Score for groups (doesn't help) 55 | # score_groups_self = [] 56 | # score_groups_oppo = [] 57 | # for group in board.groups[color]: 58 | # if group.num_liberty > 1: 59 | # score_groups_self.append(eval_group(group, board)) 60 | # for group in board.groups[opponent_color(color)]: 61 | # if group.num_liberty > 1: 62 | # score_groups_oppo.append(eval_group(group, board)) 63 | # score_groups_self.sort(reverse=True) 64 | # score_groups_self += [0, 0] 65 | # score_groups_oppo.sort(reverse=True) 66 | # score_groups_oppo += [0, 0] 67 | # finals = score_groups_oppo[0] - score_groups_self[0] + score_groups_oppo[1] - score_groups_self[1] 68 | 69 | return score_groups * normal(1, 0.1) + score_liberties * normal(1, 0.1) 70 | -------------------------------------------------------------------------------- /agent/search/search_agent.py: -------------------------------------------------------------------------------- 1 | from agent.basic_agent import Agent 2 | import random 3 | from agent.search.evaluation import evaluate 4 | 5 | 6 | class SearchAgent(Agent): 7 | def __init__(self, color, depth, eval_func): 8 | """ 9 | :param color: 10 | :param depth: search depth 11 | :param eval_func: evaluation function from the evaluation module 12 | """ 13 | super().__init__(color) 14 | self.depth = depth 15 | self.eval_func = eval_func 16 | self.pruning_actions = None 17 | 18 | def get_action(self, board): 19 | raise NotImplementedError 20 | 21 | def __str__(self): 22 | return '%s; color: %s; search_depth: %d' % (self.__class__.__name__, self.color, self.depth) 23 | 24 | 25 | class AlphaBetaAgent(SearchAgent): 26 | def __init__(self, color, depth, eval_func=evaluate): 27 | super().__init__(color, depth, eval_func) 28 | 29 | def get_action(self, board, pruning_actions=20): 30 | 31 | self.pruning_actions = pruning_actions 32 | score, actions = self.max_value(board, 0, float("-inf"), float("inf")) 33 | 34 | return actions[0] if len(actions) > 0 else None 35 | 36 | def max_value(self, board, depth, alpha, beta): 37 | """Return the highest score and the corresponding subsequent actions""" 38 | if self.terminal_test(board) or depth == self.depth: 39 | return self.eval_func(board, self.color), [] 40 | 41 | max_score = float("-inf") 42 | max_score_actions = None 43 | # Prune the legal actions 44 | legal_actions = board.get_legal_actions() 45 | if self.pruning_actions and len(legal_actions) > self.pruning_actions: 46 | legal_actions = random.sample(legal_actions, self.pruning_actions) 47 | 48 | for action in legal_actions: 49 | score, actions = self.min_value(board.generate_successor_state(action), depth, alpha, beta) 50 | if score > max_score: 51 | max_score = score 52 | max_score_actions = [action] + actions 53 | 54 | if max_score > beta: 55 | return max_score, max_score_actions 56 | 57 | if max_score > alpha: 58 | alpha = max_score 59 | 60 | return max_score, max_score_actions 61 | 62 | def min_value(self, board, depth, alpha, beta): 63 | """Return the lowest score and the corresponding subsequent actions""" 64 | if self.terminal_test(board) or depth == self.depth: 65 | return self.eval_func(board, self.color), [] 66 | 67 | min_score = float("inf") 68 | min_score_actions = None 69 | # Prune the legal actions 70 | legal_actions = board.get_legal_actions() 71 | if self.pruning_actions and len(legal_actions) > self.pruning_actions: 72 | legal_actions = random.sample(legal_actions, self.pruning_actions) 73 | 74 | for action in legal_actions: 75 | score, actions = 
self.max_value(board.generate_successor_state(action), depth+1, alpha, beta) 76 | if score < min_score: 77 | min_score = score 78 | min_score_actions = [action] + actions 79 | 80 | if min_score < alpha: 81 | return min_score, min_score_actions 82 | 83 | if min_score < beta: 84 | beta = min_score 85 | 86 | return min_score, min_score_actions 87 | 88 | 89 | class ExpectimaxAgent(SearchAgent): 90 | """Assume uniform distribution for opponent""" 91 | def __init__(self, color, depth, eval_func=evaluate): 92 | super().__init__(color, depth, eval_func) 93 | 94 | def get_action(self, board, pruning_actions=16): 95 | self.pruning_actions = pruning_actions 96 | score, actions = self.max_value(board, 0) 97 | return actions[0] if len(actions) > 0 else None 98 | 99 | def max_value(self, board, depth): 100 | if self.terminal_test(board) or depth == self.depth: 101 | return self.eval_func(board, self.color), [] 102 | 103 | max_score = float("-inf") 104 | max_score_actions = None 105 | # Prune the legal actions 106 | legal_actions = board.get_legal_actions() 107 | if self.pruning_actions and len(legal_actions) > self.pruning_actions: 108 | legal_actions = random.sample(legal_actions, self.pruning_actions) 109 | 110 | for action in legal_actions: 111 | score, actions = self.expected_value(board.generate_successor_state(action), depth) 112 | if score > max_score: 113 | max_score = score 114 | max_score_actions = [action] + actions 115 | 116 | return max_score, max_score_actions 117 | 118 | def expected_value(self, board, depth): 119 | if self.terminal_test(board) or depth == self.depth: 120 | return self.eval_func(board, self.color), [] 121 | 122 | expected_score = 0.0 123 | # Prune the legal actions 124 | legal_actions = board.get_legal_actions() 125 | if self.pruning_actions and len(legal_actions) > self.pruning_actions: 126 | legal_actions = random.sample(legal_actions, self.pruning_actions) 127 | 128 | for action in legal_actions: 129 | score, actions = self.max_value(board.generate_successor_state(action), depth+1) 130 | expected_score += score / len(legal_actions) 131 | 132 | return expected_score, [] 133 | -------------------------------------------------------------------------------- /agent/util.py: -------------------------------------------------------------------------------- 1 | from game.go import Board, opponent_color, Group 2 | import numpy as np 3 | 4 | 5 | def get_num_endangered_groups(board: Board, color): 6 | num_endangered_self = 0 7 | num_endangered_oppo = 0 8 | for group in board.endangered_groups: 9 | if group.color == color: 10 | num_endangered_self += 1 11 | else: 12 | num_endangered_oppo += 1 13 | return num_endangered_self, num_endangered_oppo 14 | 15 | 16 | def get_num_groups_with_k_liberties(board: Board, color, k): 17 | num_groups_self = 0 18 | num_groups_oppo = 0 19 | for group in board.groups[color]: 20 | if group.num_liberty == k: 21 | num_groups_self += 1 22 | for group in board.groups[opponent_color(color)]: 23 | if group.num_liberty == k: 24 | num_groups_oppo += 1 25 | return num_groups_self, num_groups_oppo 26 | 27 | 28 | def get_liberties(board: Board, color): 29 | liberties_self = set() 30 | liberties_oppo = set() 31 | for group in board.groups[color]: 32 | liberties_self = liberties_self | group.liberties 33 | for group in board.groups[opponent_color(color)]: 34 | liberties_oppo = liberties_oppo | group.liberties 35 | return liberties_self, liberties_oppo 36 | 37 | 38 | def is_dangerous_liberty(board: Board, point, color): 39 | self_groups = 
board.libertydict.get_groups(color, point)
40 |     return len(self_groups) == 2 and self_groups[0].num_liberty == 2 and self_groups[1].num_liberty == 2
41 | 
42 | 
43 | def calc_group_liberty_var(group: Group):
44 |     var_x = np.var([x[0] for x in group.liberties])
45 |     var_y = np.var([x[1] for x in group.liberties])
46 |     return var_x + var_y
47 | 
48 | 
49 | def eval_group(group: Group, board: Board):
50 |     """Evaluate the liveliness of group; higher score, more endangered"""
51 |     if group.num_liberty > 3:
52 |         return 0
53 |     elif group.num_liberty == 1:
54 |         return 5
55 | 
56 |     # Till here, group has either 2 or 3 liberties.
57 |     var_x = np.var([x[0] for x in group.liberties])
58 |     var_y = np.var([x[1] for x in group.liberties])
59 |     var_sum = var_x + var_y
60 |     if var_sum < 0.1:
61 |         print('Warning: var_sum < 0.1 in eval_group()')
62 | 
63 |     num_shared_liberty = 0
64 |     for liberty in group.liberties:
65 |         num_shared_self_groups = len(board.libertydict.get_groups(group.color, liberty))
66 |         num_shared_oppo_groups = len(board.libertydict.get_groups(opponent_color(group.color), liberty))
67 |         if num_shared_self_groups == 3 and num_shared_oppo_groups == 0:  # Group is safe
68 |             return 0
69 |         elif num_shared_self_groups == 2 or num_shared_self_groups == 3:
70 |             num_shared_liberty += 1
71 | 
72 |     if num_shared_liberty == 1 and var_sum <= 0.5:
73 |         score = 1/np.sqrt(group.num_liberty)/var_sum/4.
74 |     elif num_shared_liberty == 2 and var_sum > 0.3:
75 |         score = 1/np.sqrt(group.num_liberty)/var_sum/8.
76 |     else:
77 |         score = 1/np.sqrt(group.num_liberty)/var_sum/6.
78 |     if np.sqrt(group.num_liberty) < 1.1:
79 |         print('Warning: eval_group() reached an unexpected single-liberty group:', group.num_liberty, board.winner)
80 |     if var_sum < 0.2:
81 |         print('Warning: unexpectedly small liberty variance in eval_group():', var_sum)
82 |     return score
83 | 
84 | 
85 | def get_group_scores(board: Board, color):
86 |     selfscore = []
87 |     opponentscore = []
88 |     for group in board.groups[color]:
89 |         if group.num_liberty != 1:
90 |             selfscore.append(eval_group(group, board))
91 |     for group in board.groups[opponent_color(color)]:
92 |         if group.num_liberty != 1:
93 |             opponentscore.append(eval_group(group, board))
94 |     selfscore.sort(reverse=True)
95 |     selfscore.extend([0, 0, 0])
96 |     opponentscore.sort(reverse=True)
97 |     opponentscore.extend([0, 0, 0])
98 |     return selfscore[:3], opponentscore[:3]
99 | 
100 | 
101 | def get_liberty_score(board: Board, color):
102 |     scores = []
103 |     share3 = 0
104 |     for liberty, groups in board.libertydict.get_items(color):
105 |         if len(groups) == 0:
106 |             continue
107 |         elif len(groups) == 3:
108 |             share3 += 1
109 |             continue
110 |         else:
111 |             scores.append(sum([0.353 / group.num_liberty / calc_group_liberty_var(group) for group in groups]))
112 |     scores.sort(reverse=True)
113 |     scores.extend([0, 0])
114 |     return scores[:2] + [-share3 / 2.]
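# Note on get_liberty_score(): the result always has length 3 -- the two largest
# per-liberty scores (computed for every liberty of `color` not shared by exactly
# three of its groups; a liberty scores higher when the groups meeting there have
# few liberties with low spatial variance), followed by -share3 / 2., where share3
# counts the liberties shared by exactly three groups of `color`.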
115 | -------------------------------------------------------------------------------- /benchmark.py: -------------------------------------------------------------------------------- 1 | from match import Match 2 | from agent.basic_agent import RandomAgent, GreedyAgent 3 | from agent.search.search_agent import AlphaBetaAgent, ExpectimaxAgent 4 | from agent.rl.rl_agent import ApproxQAgent 5 | from agent.rl.rl_env import RlEnv 6 | from statistics import mean 7 | 8 | 9 | class Benchmark: 10 | def __init__(self, agent_self, agent_oppo): 11 | """ 12 | :param agent_self: the agent to evaluate 13 | :param agent_oppo: the opponent agent, such as RandomAgent, GreedyAgent 14 | """ 15 | if (agent_self.color == 'BLACK' and agent_oppo.color == 'WHITE') \ 16 | or (agent_self.color == 'WHITE' and agent_oppo.color == 'BLACK'): 17 | self.agent_self = agent_self 18 | self.agent_oppo = agent_oppo 19 | else: 20 | raise ValueError('Must have one BLACK agent and one WHITE agent!') 21 | 22 | def create_match(self, gui=False): 23 | if self.agent_self.color == 'BLACK': 24 | return Match(agent_black=self.agent_self, agent_white=self.agent_oppo, gui=gui) 25 | else: 26 | return Match(agent_white=self.agent_self, agent_black=self.agent_oppo, gui=gui) 27 | 28 | def run_benchmark(self, num_tests, gui=False): 29 | list_win = [] 30 | list_num_moves = [] 31 | list_time_elapsed = [] 32 | 33 | for i in range(num_tests): 34 | print('Running game %d: ' % i, end='') 35 | match = self.create_match(gui=gui) 36 | match.start() 37 | 38 | list_win.append(match.winner == self.agent_self.color) 39 | list_num_moves.append(match.counter_move) 40 | list_time_elapsed.append(match.time_elapsed) 41 | print('\tWinner: ' + match.winner) 42 | 43 | win_mean = mean(list_win) 44 | num_moves_mean = mean(list_num_moves) 45 | time_elapsed_mean = mean(list_time_elapsed) 46 | return win_mean, num_moves_mean, time_elapsed_mean 47 | 48 | 49 | if __name__ == '__main__': 50 | # agent_self = RandomAgent('BLACK') 51 | # agent_self = GreedyAgent('BLACK') 52 | # agent_self = AlphaBetaAgent('BLACK', 1) 53 | # agent_self = ExpectimaxAgent('BLACK', 1) 54 | agent_self = ApproxQAgent('WHITE', RlEnv()) 55 | agent_self.load('agent/rl/ApproxQAgent_1e-3.npy') 56 | 57 | # agent_oppo = RandomAgent('WHITE') 58 | # agent_oppo = GreedyAgent('WHITE') 59 | agent_oppo = AlphaBetaAgent('BLACK', 1) 60 | 61 | benchmark = Benchmark(agent_self=agent_self, agent_oppo=agent_oppo) 62 | win_mean, num_moves_mean, time_elapsed_mean = benchmark.run_benchmark(100, gui=True) 63 | print('Win rate: %f; Avg # moves: %f; Avg time: %f' % (win_mean, num_moves_mean, time_elapsed_mean)) 64 | -------------------------------------------------------------------------------- /game/README.md: -------------------------------------------------------------------------------- 1 | ui.py is inspired by [https://github.com/eagleflo/goban]. 2 | -------------------------------------------------------------------------------- /game/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/game/__init__.py -------------------------------------------------------------------------------- /game/go.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from copy import deepcopy 3 | from game.util import PointDict 4 | """ 5 | This file is the full backend environment of the game. 
6 | """ 7 | 8 | BOARD_SIZE = 20 # number of rows/cols = BOARD_SIZE - 1 9 | 10 | 11 | def opponent_color(color): 12 | if color == 'WHITE': 13 | return 'BLACK' 14 | elif color == 'BLACK': 15 | return 'WHITE' 16 | else: 17 | print('Invalid color: ' + color) 18 | return KeyError 19 | 20 | 21 | def neighbors(point): 22 | """Return a list of neighboring points.""" 23 | neighboring = [(point[0] - 1, point[1]), 24 | (point[0] + 1, point[1]), 25 | (point[0], point[1] - 1), 26 | (point[0], point[1] + 1)] 27 | return [point for point in neighboring if 0 < point[0] < BOARD_SIZE and 0 < point[1] < BOARD_SIZE] 28 | 29 | 30 | def cal_liberty(points, board): 31 | """Find and return the liberties of the point.""" 32 | liberties = [point for point in neighbors(points) 33 | if not board.stonedict.get_groups('BLACK', point) and not board.stonedict.get_groups('WHITE', point)] 34 | return set(liberties) 35 | 36 | 37 | class Group(object): 38 | def __init__(self, point, color, liberties): 39 | """ 40 | Create and initialize a new group. 41 | :param point: the initial stone in the group 42 | :param color: 43 | :param liberties: 44 | """ 45 | self.color = color 46 | if isinstance(point, list): 47 | self.points = point 48 | else: 49 | self.points = [point] 50 | self.liberties = liberties 51 | 52 | @property 53 | def num_liberty(self): 54 | return len(self.liberties) 55 | 56 | def add_stones(self, pointlist): 57 | """Only update stones, not liberties""" 58 | self.points.extend(pointlist) 59 | 60 | def remove_liberty(self, point): 61 | self.liberties.remove(point) 62 | 63 | def __str__(self): 64 | """Summarize color, stones, liberties.""" 65 | return '%s - stones: [%s]; liberties: [%s]' % \ 66 | (self.color, 67 | ', '.join([str(point) for point in self.points]), 68 | ', '.join([str(point) for point in self.liberties])) 69 | 70 | def __repr__(self): 71 | return str(self) 72 | 73 | 74 | class Board(object): 75 | """ 76 | get_legal_actions(), generate_successor_state() are the external game interface. 77 | put_stone() is the main internal method that contains all logic to update game state. 78 | create_group(), remove_group(), merge_groups() operations don't check winner or endangered groups. 79 | Winner or endangered groups are updated in put_stone(). 80 | Winning criteria: remove any opponent's group, or no legal actions for opponent. 81 | """ 82 | def __init__(self, next_color='BLACK'): 83 | self.winner = None 84 | self.next = next_color 85 | self.legal_actions = [] # Legal actions for current state 86 | self.end_by_no_legal_actions = False 87 | self.counter_move = 0 88 | 89 | # Point dict 90 | self.libertydict = PointDict() # {color: {point: {groups}}} 91 | self.stonedict = PointDict() 92 | 93 | # Group list 94 | self.groups = {'BLACK': [], 'WHITE': []} 95 | self.endangered_groups = [] # groups with only 1 liberty 96 | self.removed_groups = [] # This is assigned when game ends 97 | 98 | def create_group(self, point, color): 99 | """Create a new group.""" 100 | # Update group list 101 | ll = cal_liberty(point, self) 102 | group = Group(point, color, ll) 103 | self.groups[color].append(group) 104 | # Update endangered group 105 | if len(group.liberties) <= 1: 106 | self.endangered_groups.append(group) 107 | # Update stonedict 108 | self.stonedict.get_groups(color, point).append(group) 109 | # Update libertydict 110 | for liberty in group.liberties: 111 | self.libertydict.get_groups(color, liberty).append(group) 112 | return group 113 | 114 | def remove_group(self, group): 115 | """ 116 | Remove the group. 
117 | :param group: 118 | :return: 119 | """ 120 | color = group.color 121 | # Update group list 122 | self.groups[color].remove(group) 123 | # Update endangered_groups 124 | if group in self.endangered_groups: 125 | self.endangered_groups.remove(group) 126 | # Update stonedict 127 | for point in group.points: 128 | self.stonedict.get_groups(color, point).remove(group) 129 | # Update libertydict 130 | for liberty in group.liberties: 131 | self.libertydict.get_groups(color, liberty).remove(group) 132 | 133 | def merge_groups(self, grouplist, point): 134 | """ 135 | Merge groups (assuming same color). 136 | all groups already have this liberty removed; 137 | libertydict already has this point removed. 138 | :param grouplist: 139 | :param point: 140 | """ 141 | color = grouplist[0].color 142 | newgroup = grouplist[0] 143 | all_liberties = grouplist[0].liberties 144 | 145 | # Add last move (update newgroup and stonedict) 146 | newgroup.add_stones([point]) 147 | self.stonedict.get_groups(color, point).append(newgroup) 148 | all_liberties = all_liberties | cal_liberty(point, self) 149 | 150 | # Merge with other groups (update newgroup and stonedict) 151 | for group in grouplist[1:]: 152 | newgroup.add_stones(group.points) 153 | for p in group.points: 154 | self.stonedict.get_groups(color, p).append(newgroup) 155 | all_liberties = all_liberties | group.liberties 156 | self.remove_group(group) 157 | 158 | # Update newgroup liberties (point is already removed from group liberty) 159 | newgroup.liberties = all_liberties 160 | 161 | # Update libertydict 162 | for point in all_liberties: 163 | belonging_groups = self.libertydict.get_groups(color, point) 164 | if newgroup not in belonging_groups: 165 | belonging_groups.append(newgroup) 166 | 167 | return newgroup 168 | 169 | def get_legal_actions(self): 170 | """External interface to get legal actions""" 171 | # It is important NOT to calculate actions on the fly to keep the performance. 
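        # self.legal_actions is refreshed at the end of put_stone(); a copy is returned so callers cannot mutate the cached list.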
172 | return self.legal_actions.copy() 173 | 174 | def _get_legal_actions(self): 175 | """Internal method to calculate legal actions; shouldn't be called outside""" 176 | if self.winner: 177 | return [] 178 | 179 | endangered_lbt_self = set() 180 | endangered_lbt_opponent = set() 181 | for group in self.endangered_groups: 182 | if group.color == self.next: 183 | endangered_lbt_self = endangered_lbt_self | group.liberties 184 | else: 185 | endangered_lbt_opponent = endangered_lbt_opponent | group.liberties 186 | 187 | # If there are opponent's endangered points, return these points to win 188 | if len(endangered_lbt_opponent) > 0: 189 | return list(endangered_lbt_opponent) 190 | 191 | legal_actions = [] 192 | if len(endangered_lbt_self) > 0: 193 | # If there are more than one self endangered points, return these points (losing the game) 194 | if len(endangered_lbt_self) > 1: 195 | legal_actions = list(endangered_lbt_self) 196 | # Rescue the sole endangered liberty if existing 197 | if len(endangered_lbt_self) == 1: 198 | legal_actions = list(endangered_lbt_self) 199 | else: 200 | legal_actions = set() 201 | for group in self.groups[opponent_color(self.next)]: 202 | legal_actions = legal_actions | group.liberties 203 | legal_actions = list(legal_actions) 204 | 205 | # Final check: no suicidal move, either has liberties or any connected self-group has more than this liberty 206 | legal_actions_filtered = [] 207 | for action in legal_actions: 208 | if len(cal_liberty(action, self)) > 0: 209 | legal_actions_filtered.append(action) 210 | else: 211 | connected_self_groups = [self.stonedict.get_groups(self.next, p)[0] for p in neighbors(action) 212 | if self.stonedict.get_groups(self.next, p)] 213 | for self_group in connected_self_groups: 214 | if len(self_group.liberties) > 1: 215 | legal_actions_filtered.append(action) 216 | break 217 | 218 | return legal_actions_filtered 219 | 220 | def _shorten_liberty(self, group, point, color): 221 | group.remove_liberty(point) 222 | if group.color != color: # If opponent's group, check if winning or endangered groups 223 | if len(group.liberties) == 0: # The new stone is opponent's, check if winning 224 | self.removed_groups.append(group) # Set removed_group 225 | self.winner = opponent_color(group.color) 226 | elif len(group.liberties) == 1: 227 | self.endangered_groups.append(group) 228 | 229 | def shorten_liberty_for_groups(self, point, color): 230 | """ 231 | Remove the liberty from all belonging groups. 232 | For opponent's groups, update consequences such as libertydict, winner or endangered group. 
233 | endangered groups for self will be updated in put_stone() after self groups are merged 234 | :param point: 235 | :param color: 236 | :return: 237 | """ 238 | # Check opponent's groups first 239 | opponent = opponent_color(color) 240 | for group in self.libertydict.get_groups(opponent, point): 241 | self._shorten_liberty(group, point, color) 242 | self.libertydict.remove_point(opponent, point) # update libertydict 243 | 244 | # If any opponent's group dies, no need to check self group 245 | if not self.winner: 246 | for group in self.libertydict.get_groups(color, point): 247 | self._shorten_liberty(group, point, color) 248 | self.libertydict.remove_point(color, point) # update libertydict 249 | 250 | def put_stone(self, point, check_legal=False): 251 | if check_legal: 252 | if point not in self.legal_actions: 253 | print('Error: illegal move, try again.') 254 | return False 255 | # If more than 400 moves (which shouldn't happen), print the board for debug 256 | if self.counter_move > 400: 257 | print(self) 258 | raise RuntimeError('More than 400 moves in one game! Board is printed.') 259 | 260 | # Get all self-groups containing this liberty 261 | self_belonging_groups = self.libertydict.get_groups(self.next, point).copy() 262 | 263 | # Remove the liberty from all belonging groups (with consequences updated such as winner) 264 | self.shorten_liberty_for_groups(point, self.next) 265 | self.counter_move += 1 266 | if self.winner: 267 | self.next = opponent_color(self.next) 268 | return True 269 | 270 | # Update groups with the new point 271 | if len(self_belonging_groups) == 0: # Create a group for the new stone 272 | new_group = self.create_group(point, self.next) 273 | else: # Merge all self-groups in touch with the new stone 274 | new_group = self.merge_groups(self_belonging_groups, point) 275 | 276 | # Update whether is endangered group 277 | # endangered groups for opponent are already updated in shorten_liberty_for_groups 278 | if new_group in self.endangered_groups and len(new_group.liberties) > 1: 279 | self.endangered_groups.remove(new_group) 280 | elif new_group not in self.endangered_groups and len(new_group.liberties) == 1: 281 | self.endangered_groups.append(new_group) 282 | 283 | self.next = opponent_color(self.next) 284 | 285 | # Update legal_actions; if there are no legal actions for opponent, claim winning 286 | self.legal_actions = self._get_legal_actions() 287 | if not self.legal_actions: 288 | self.winner = opponent_color(self.next) 289 | self.end_by_no_legal_actions = True 290 | 291 | return True 292 | 293 | def generate_successor_state(self, action, check_legal=False): 294 | board = self.copy() 295 | board.put_stone(action, check_legal=check_legal) 296 | return board 297 | 298 | def __str__(self): 299 | str_groups = [str(group) for group in self.groups['BLACK']] + [str(group) for group in self.groups['WHITE']] 300 | return 'Next: %s\n%s' % (self.next, '\n'.join(str_groups)) 301 | 302 | def exist_stone(self, point): 303 | """To see if a stone has been placed on the board""" 304 | return len(self.stonedict.get_groups('BLACK', point)) > 0 or len(self.stonedict.get_groups('WHITE', point)) > 0 305 | 306 | def copy(self): 307 | """Manual copy because of group dependencies across self variables""" 308 | board = Board(self.next) 309 | board.winner = self.winner 310 | 311 | group_mapping = {group: deepcopy(group) for group in self.groups['BLACK'] + self.groups['WHITE']} 312 | board.groups['BLACK'] = [group_mapping[group] for group in self.groups['BLACK']] 313 | 
board.groups['WHITE'] = [group_mapping[group] for group in self.groups['WHITE']] 314 | 315 | board.endangered_groups = [group_mapping[group] for group in self.endangered_groups] 316 | board.removed_groups = [group_mapping[group] for group in self.removed_groups] 317 | 318 | for point, groups in self.libertydict.get_items('BLACK'): 319 | if groups: 320 | board.libertydict.set_groups('BLACK', point, [group_mapping[group] for group in groups]) 321 | for point, groups in self.libertydict.get_items('WHITE'): 322 | if groups: 323 | board.libertydict.set_groups('WHITE', point, [group_mapping[group] for group in groups]) 324 | 325 | for point, groups in self.stonedict.get_items('BLACK'): 326 | if groups: 327 | board.stonedict.set_groups('BLACK', point, [group_mapping[group] for group in groups]) 328 | for point, groups in self.stonedict.get_items('WHITE'): 329 | if groups: 330 | board.stonedict.set_groups('WHITE', point, [group_mapping[group] for group in groups]) 331 | 332 | return board 333 | -------------------------------------------------------------------------------- /game/images/ramin.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/game/images/ramin.jpg -------------------------------------------------------------------------------- /game/ui.py: -------------------------------------------------------------------------------- 1 | import pygame 2 | """ 3 | This file is the GUI on top of the game backend. 4 | """ 5 | 6 | BACKGROUND = 'game/images/ramin.jpg' 7 | BOARD_SIZE = (820, 820) 8 | BLACK = (0, 0, 0) 9 | 10 | 11 | def get_rbg(color): 12 | if color == 'WHITE': 13 | return 255, 255, 255 14 | elif color == 'BLACK': 15 | return 0, 0, 0 16 | else: 17 | return 0, 133, 211 18 | 19 | 20 | def coords(point): 21 | """Return the coordinate of a stone drawn on board""" 22 | return 5 + point[0] * 40, 5 + point[1] * 40 23 | 24 | 25 | def leftup_corner(point): 26 | return -15 + point[0] * 40, -15 + point[1] * 40 27 | 28 | 29 | class UI: 30 | def __init__(self): 31 | """Create, initialize and draw an empty board.""" 32 | self.outline = pygame.Rect(45, 45, 720, 720) 33 | self.screen = None 34 | self.background = None 35 | 36 | def initialize(self): 37 | """This method should only be called once, when initializing the board.""" 38 | # This method is from https://github.com/eagleflo/goban/blob/master/goban.py 39 | pygame.init() 40 | pygame.display.set_caption('Goban') 41 | self.screen = pygame.display.set_mode(BOARD_SIZE, 0, 32) 42 | self.background = pygame.image.load(BACKGROUND).convert() 43 | 44 | pygame.draw.rect(self.background, BLACK, self.outline, 3) 45 | # Outline is inflated here for future use as a collidebox for the mouse 46 | self.outline.inflate_ip(20, 20) 47 | for i in range(18): 48 | for j in range(18): 49 | rect = pygame.Rect(45 + (40 * i), 45 + (40 * j), 40, 40) 50 | pygame.draw.rect(self.background, BLACK, rect, 1) 51 | for i in range(3): 52 | for j in range(3): 53 | coords = (165 + (240 * i), 165 + (240 * j)) 54 | pygame.draw.circle(self.background, BLACK, coords, 5, 0) 55 | self.screen.blit(self.background, (0, 0)) 56 | pygame.display.update() 57 | 58 | def draw(self, point, color, size=20): 59 | color = get_rbg(color) 60 | pygame.draw.circle(self.screen, color, coords(point), size, 0) 61 | pygame.display.update() 62 | 63 | def remove(self, point): 64 | blit_coords = leftup_corner(point) 65 | area_rect = pygame.Rect(blit_coords, (40, 40)) 66 | 
self.screen.blit(self.background, blit_coords, area_rect) 67 | pygame.display.update() 68 | 69 | def save_image(self, path_to_save): 70 | pygame.image.save(self.screen, path_to_save) 71 | -------------------------------------------------------------------------------- /game/util.py: -------------------------------------------------------------------------------- 1 | class PointDict: 2 | def __init__(self): 3 | self.d = {'BLACK': {}, 'WHITE': {}} 4 | 5 | def get_groups(self, color, point): 6 | if point not in self.d[color]: 7 | self.d[color][point] = [] 8 | return self.d[color][point] 9 | 10 | def set_groups(self, color, point, groups): 11 | self.d[color][point] = groups 12 | 13 | def remove_point(self, color, point): 14 | if point in self.d[color]: 15 | del self.d[color][point] 16 | 17 | def get_items(self, color): 18 | return self.d[color].items() 19 | -------------------------------------------------------------------------------- /img/Board.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lxucs/go-game-easy/837578cb41e9481174a361ab2c487045f1bd7ae9/img/Board.jpg -------------------------------------------------------------------------------- /match.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from game.go import Board, opponent_color 3 | from game.ui import UI 4 | import pygame 5 | import time 6 | from agent.basic_agent import RandomAgent, GreedyAgent 7 | from agent.search.search_agent import AlphaBetaAgent, ExpectimaxAgent 8 | from agent.rl.rl_agent import ApproxQAgent 9 | from agent.rl.rl_env import RlEnv 10 | from os.path import join 11 | from argparse import ArgumentParser 12 | 13 | 14 | class Match: 15 | def __init__(self, agent_black=None, agent_white=None, gui=True, dir_save=None): 16 | """ 17 | BLACK always has the first move on the center of the board. 
18 | :param agent_black: agent or None(human) 19 | :param agent_white: agent or None(human) 20 | :param gui: if show GUI; always true if there are human playing 21 | :param dir_save: directory to save board image if GUI is shown; no save for None 22 | """ 23 | self.agent_black = agent_black 24 | self.agent_white = agent_white 25 | 26 | self.board = Board(next_color='BLACK') 27 | 28 | gui = gui if agent_black and agent_white else True 29 | self.ui = UI() if gui else None 30 | self.dir_save = dir_save 31 | 32 | # Metadata 33 | self.time_elapsed = None 34 | 35 | @property 36 | def winner(self): 37 | return self.board.winner 38 | 39 | @property 40 | def next(self): 41 | return self.board.next 42 | 43 | @property 44 | def counter_move(self): 45 | return self.board.counter_move 46 | 47 | def start(self): 48 | if self.ui: 49 | self._start_with_ui() 50 | else: 51 | self._start_without_ui() 52 | 53 | def _start_with_ui(self): 54 | """Start the game with GUI.""" 55 | self.ui.initialize() 56 | self.time_elapsed = time.time() 57 | 58 | # First move is fixed on the center of board 59 | first_move = (10, 10) 60 | self.board.put_stone(first_move, check_legal=False) 61 | self.ui.draw(first_move, opponent_color(self.board.next)) 62 | 63 | # Take turns to play move 64 | while self.board.winner is None: 65 | if self.board.next == 'BLACK': 66 | point = self.perform_one_move(self.agent_black) 67 | else: 68 | point = self.perform_one_move(self.agent_white) 69 | 70 | # Check if action is legal 71 | if point not in self.board.legal_actions: 72 | continue 73 | 74 | # Apply action 75 | prev_legal_actions = self.board.legal_actions.copy() 76 | self.board.put_stone(point, check_legal=False) 77 | # Remove previous legal actions on board 78 | for action in prev_legal_actions: 79 | self.ui.remove(action) 80 | # Draw new point 81 | self.ui.draw(point, opponent_color(self.board.next)) 82 | # Update new legal actions and any removed groups 83 | if self.board.winner: 84 | for group in self.board.removed_groups: 85 | for point in group.points: 86 | self.ui.remove(point) 87 | if self.board.end_by_no_legal_actions: 88 | print('Game ends early (no legal action is available for %s)' % self.board.next) 89 | else: 90 | for action in self.board.legal_actions: 91 | self.ui.draw(action, 'BLUE', 8) 92 | 93 | self.time_elapsed = time.time() - self.time_elapsed 94 | if self.dir_save: 95 | path_file = join(self.dir_save, 'go_' + str(time.time()) + '.jpg') 96 | self.ui.save_image(path_file) 97 | print('Board image saved in file ' + path_file) 98 | 99 | def _start_without_ui(self): 100 | """Start the game without GUI. 
Only possible when no human is playing."""
101 |         # First move is fixed on the center of board
102 |         self.time_elapsed = time.time()
103 |         first_move = (10, 10)
104 |         self.board.put_stone(first_move, check_legal=False)
105 | 
106 |         # Take turns to play move
107 |         while self.board.winner is None:
108 |             if self.board.next == 'BLACK':
109 |                 point = self.perform_one_move(self.agent_black)
110 |             else:
111 |                 point = self.perform_one_move(self.agent_white)
112 | 
113 |             # Apply action
114 |             self.board.put_stone(point, check_legal=False)  # Assuming agent always gives legal actions
115 | 
116 |         if self.board.end_by_no_legal_actions:
117 |             print('Game ends early (no legal action is available for %s)' % self.board.next)
118 | 
119 |         self.time_elapsed = time.time() - self.time_elapsed
120 | 
121 |     def perform_one_move(self, agent):
122 |         if agent:
123 |             return self._move_by_agent(agent)
124 |         else:
125 |             return self._move_by_human()
126 | 
127 |     def _move_by_agent(self, agent):
128 |         if self.ui:
129 |             pygame.time.wait(100)
130 |             pygame.event.get()
131 |         return agent.get_action(self.board)
132 | 
133 |     def _move_by_human(self):
134 |         while True:
135 |             pygame.time.wait(100)
136 |             for event in pygame.event.get():
137 |                 if event.type == pygame.QUIT:
138 |                     pygame.quit()
139 |                     raise SystemExit  # quit the program when the window is closed
140 |                 if event.type == pygame.MOUSEBUTTONDOWN:
141 |                     if event.button == 1 and self.ui.outline.collidepoint(event.pos):
142 |                         x = int(round(((event.pos[0] - 5) / 40.0), 0))
143 |                         y = int(round(((event.pos[1] - 5) / 40.0), 0))
144 |                         point = (x, y)
145 |                         stone = self.board.exist_stone(point)
146 |                         if not stone:
147 |                             return point
148 | 
149 | 
150 | def get_args():
151 |     parser = ArgumentParser('Mini Go Game')
152 |     parser.add_argument('-b', '--agent_black', default=None,
153 |                         help='possible agents: random; greedy; minimax; expectimax, approx-q; DEFAULT is None (human)')
154 |     parser.add_argument('-w', '--agent_white', default=None,
155 |                         help='possible agents: random; greedy; minimax; expectimax, approx-q; DEFAULT is None (human)')
156 |     parser.add_argument('-d', '--search_depth', type=int, default=1,
157 |                         help='the search depth for searching agents if applicable; DEFAULT is 1')
158 |     parser.add_argument('-g', '--gui', type=lambda s: s.lower() not in ('false', '0', 'no'), default=True,  # interpret 'False'/'0'/'no' as False
159 |                         help='if show GUI; always true if human plays; DEFAULT is True')
160 |     parser.add_argument('-s', '--dir_save', default=None,
161 |                         help='if not None, save the image of last board state to this directory; DEFAULT is None')
162 |     return parser.parse_args()
163 | 
164 | 
165 | def get_agent(str_agent, color, depth):
166 |     if str_agent is None:
167 |         return None
168 |     str_agent = str_agent.lower()
169 |     if str_agent == 'none':
170 |         return None
171 |     elif str_agent == 'random':
172 |         return RandomAgent(color)
173 |     elif str_agent == 'greedy':
174 |         return GreedyAgent(color)
175 |     elif str_agent == 'minimax':
176 |         return AlphaBetaAgent(color, depth=depth)
177 |     elif str_agent == 'expectimax':
178 |         return ExpectimaxAgent(color, depth=depth)
179 |     elif str_agent == 'approx-q':
180 |         agent = ApproxQAgent(color, RlEnv())
181 |         agent.load('agent/rl/ApproxQAgent.npy')
182 |         return agent
183 |     else:
184 |         raise ValueError('Invalid agent for ' + color)
185 | 
186 | 
187 | def main():
188 |     args = get_args()
189 |     depth = args.search_depth
190 |     agent_black = get_agent(args.agent_black, 'BLACK', depth)
191 |     agent_white = get_agent(args.agent_white, 'WHITE', depth)
192 |     gui = args.gui
193 |     dir_save = args.dir_save
194 | 
195 |     print('Agent for BLACK: ' + (str(agent_black) if agent_black else 'Human'))
196 | 
print('Agent for WHITE: ' + (str(agent_white) if agent_white else 'Human')) 197 | if dir_save: 198 | print('Directory to save board image: ' + dir_save) 199 | 200 | match = Match(agent_black=agent_black, agent_white=agent_white, gui=gui, dir_save=dir_save) 201 | 202 | print('Match starts!') 203 | match.start() 204 | 205 | print(match.winner + ' wins!') 206 | print('Match ends in ' + str(match.time_elapsed) + ' seconds') 207 | print('Match ends in ' + str(match.counter_move) + ' moves') 208 | 209 | 210 | if __name__ == '__main__': 211 | # match = Match() 212 | # match = Match(agent_black=RandomAgent('BLACK')) 213 | # match = Match(agent_black=RandomAgent('BLACK'), agent_white=RandomAgent('WHITE'), gui=True) 214 | # match = Match(agent_black=RandomAgent('BLACK'), agent_white=RandomAgent('WHITE'), gui=False) 215 | # match.start() 216 | # print(match.winner + ' wins!') 217 | # print('Match ends in ' + str(match.time_elapsed) + ' seconds') 218 | # print('Match ends in ' + str(match.counter_move) + ' moves') 219 | main() 220 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pygame 2 | numpy --------------------------------------------------------------------------------
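For reference, below is a minimal headless benchmark sketch assembled only from the `Match`, `RandomAgent`, and `GreedyAgent` APIs shown in `match.py` and `agent/basic_agent.py` above. It is not the repository's `benchmark.py` (whose contents are not included here); the `run_matches` helper, the greedy-vs-random pairing, and the match count are illustrative assumptions.

```python
# Hypothetical benchmark sketch -- not the repository's benchmark.py.
# Only relies on the Match, RandomAgent and GreedyAgent APIs shown above.
from agent.basic_agent import GreedyAgent, RandomAgent
from match import Match


def run_matches(n=100):
    """Play n GUI-less matches: GreedyAgent (BLACK) vs. RandomAgent (WHITE)."""
    wins = {'BLACK': 0, 'WHITE': 0}
    total_moves = 0
    for _ in range(n):
        match = Match(agent_black=GreedyAgent('BLACK'),
                      agent_white=RandomAgent('WHITE'),
                      gui=False)  # gui=False is honored only when both players are agents
        match.start()
        wins[match.winner] += 1            # winner is 'BLACK' or 'WHITE' once start() returns
        total_moves += match.counter_move
    print('BLACK (greedy) win rate: %.2f' % (wins['BLACK'] / n))
    print('Average moves per match: %.1f' % (total_moves / n))


if __name__ == '__main__':
    run_matches()
```

Because `Match` switches the GUI on whenever a side is human, `gui=False` only takes effect when both sides are agents, which is what makes an unattended loop like this possible.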