├── README.md
├── askguess
    ├── agents
    │   ├── __init__.py
    │   ├── answer_agent.py
    │   └── question_agent.py
    ├── compute.py
    ├── demo.py
    ├── labels.json
    ├── test_labels.json
    └── utils
    │   ├── __init__.py
    │   ├── prompt.py
    │   └── utils.py
├── assets
    ├── .DS_Store
    ├── GameEval.png
    ├── case1.png
    ├── res.png
    ├── res1.png
    └── res2.png
├── chat
    ├── __init__.py
    ├── config.py
    ├── config_example.py
    ├── gpt3_chat.py
    ├── gpt4_chat.py
    └── text003_chat.py
├── game_askguess.py
├── game_spyfall.py
├── game_tofukingdom.py
├── spyfall
    ├── .gitignore
    ├── agents
    │   ├── __init__.py
    │   └── base_agent.py
    ├── compute_adversarial.py
    ├── labels.txt
    └── utils
    │   ├── prompt.py
    │   └── utils.py
└── tofukingdom
    ├── .gitignore
    ├── agents
        ├── __init__.py
        ├── base_agent.py
        ├── chef_agent.py
        ├── guard_agent.py
        ├── maid_agent.py
        ├── minister_agent.py
        ├── prince_agent.py
        ├── princess_agent.py
        ├── queen_agent.py
        └── spy_agent.py
    ├── compute.py
    └── utils
        ├── prompt.py
        └── utils.py


/README.md:
--------------------------------------------------------------------------------
# GameEval: Evaluating LLMs on Conversational Games

[[`Paper`](https://arxiv.org/pdf/2308.10032v1.pdf)] [[`BibTeX`](#citing-gameeval)]

GameEval evaluates powerful LLMs by having them play conversational games. It treats LLMs as game players and assigns them distinct roles with specific goals, which they pursue through conversations of various forms, including discussion, question answering, and voting.

## Involved Capabilities
GameEval is distinct from other evaluation methods: it requires not only a model's common capabilities, such as instruction following, but also higher-level skills, including cooperative and adversarial strategies, and even deceptive strategies and long-term planning. This section introduces the distinctive capabilities that conversational games can evaluate effectively. The table below shows the capabilities of LLMs that each game examines.

| Capabilities          | Ask-Guess  | SpyFall  | TofuKingdom  |
|---------------------- |----------- |--------- |------------- |
| Cooperative Strategy  | ✓          | ✓        | ✓            |
| Adversarial Strategy  | X          | ✓        | ✓            |
| Specific Knowledge    | ✓          | ✓        | X            |
| Multi-hop Reasoning   | ✓          | ✓        | ✓            |
| Deceptive Strategy    | X          | ✓        | ✓            |
| Long-term Planning    | ✓          | ✓        | X            |
| Instruction-Following | ✓          | ✓        | ✓            |


## Experimental Results of the Original Version

**Experimental results on GameEval clearly discriminate between the capabilities of the models under evaluation.**



### Ask-Guess
Results for the easy version (with a prior description from the answerer):

| Model | Round | ST | EE | RLE | AME | CE |
|---|---|---|---|---|---|---|
| TD003 | 4.39 | 82.71 | 9.47 | 1.84 | 5.97 | 0.01 |
| ChatGPT | 6.01 | 53.39 | 8.13 | 14.63 | 23.21 | 0.64 |
| GPT4 | 1.57 | 97.69 | 0.80 | 1.01 | 0.47 | 0.03 |

Results for the hard version (without a prior description from the answerer):

| Model    | Round  | ST     | EE     | RLE    | AME   | CE    |
|--------- |------- |------- |------- |------- |------ |------ |
| TD003    | 15.13  | 42.36  | 19.18  | 37.19  | 0.36  | 0.91  |
| ChatGPT  | 13.78  | 40.50  | 3.88   | 49.89  | 4.57  | 1.16  |
| GPT4     | 4.01   | 92.77  | 2.95   | 0.84   | 2.75  | 0.69  |

Round is the average number of Q&A rounds. The other column abbreviations appear to match the outcome types counted in `askguess/compute.py`: ST (SuccessfulTrial), EE (EndingError), RLE (RoundLimitError), AME (AnswerMentionedError), and CE (ChatError).

### SpyFall
S-model denotes the model that plays the spy; V-model denotes the model that plays the villagers.



### TofuKingdom
We let different LLMs each play all the roles of one camp, so that the camps compete in an adversarial game. The model that represents the winning camp scores one point.

| Prince   | Spy      | Queen    | ChatGPT  | GPT4  | TD003  |
|--------- |--------- |--------- |--------- |------ |------- |
| TD003    | GPT4     | ChatGPT  | 7        | 9     | 4      |
| TD003    | ChatGPT  | GPT4     | 5        | 11    | 4      |
| ChatGPT  | GPT4     | TD003    | 8        | 7     | 5      |
| ChatGPT  | TD003    | GPT4     | 5        | 9     | 6      |
| GPT4     | TD003    | ChatGPT  | 6        | 7     | 7      |
| GPT4     | ChatGPT  | TD003    | 8        | 8     | 4      |
| -        | -        | Total    | 39       | 51    | 30     |

## Illustration
Below is a simple demonstration of the three designed games: Ask-Guess, SpyFall, and TofuKingdom.




## How to use GameEval

### For Azure OpenAI
Create a `chat/config.py` file, using `chat/config_example.py` as a reference, and fill in your Azure OpenAI account information.

### For other LLMs
For other models, including the official OpenAI models and open-source models, create a chat file in the `chat` folder that implements a chatbot which receives messages or a text prompt as input and returns the response as output.
See the other files in the `chat` folder for reference.

### Install Packages
```
pip install openai
pip install vthread
pip install func_timeout
```
The chat wrappers in `chat` call `openai.ChatCompletion.create` and `openai.Completion.create`, which exist only in `openai` releases before 1.0, so you may need to pin `openai<1`.

## Ask-Guess
### Game Introduction
Ask-Guess is a cooperative game involving a questioner and an answerer. At the beginning of the game, the answerer receives a word or phrase unknown to the questioner. In each round, the questioner may ask the answerer one question, and the answerer has to answer faithfully. The provided word or phrase must not be included in the answerer's reply. Both participants should collaborate to minimize the number of Q&A rounds needed for the questioner to deduce the given word or phrase accurately. The questioner should ask targeted questions to progressively narrow down the potential scope of the given word based on the answerer's responses. The answerer must assess whether the questioner has successfully identified the word and, if so, respond with "Gameover" to conclude the game.

### Get Started
**You can directly use the following script to play the game with the `ChatGPT` model.** `label_path` points to the file listing the words to be guessed, and `n` is the number of games to run for each word. The results and the game logs are recorded automatically.
```
cd ask-guess
python game_askguess.py \
    --label_path test_labels.json \
    --model_name gpt3 \
    --mode easy \
    --debug false \
    --n 30
```
**When all games are over, you can compute the average results reported in the paper by running `compute.py`.**

### Case
To better understand how conversational games reflect the gap in model capabilities, we show the game dialogue in Ask-Guess without prior description.


As we can see, both ChatGPT and GPT-4 correctly understand the task, and they ask and answer questions according to the game rules.
However, for a given goal, GPT-4 demonstrates an astonishing planning ability: the series of questions it asks follows a specific taxonomy. In each round, GPT-4 shows a clear awareness of the possibilities that previous Q&A rounds have already ruled out and asks new questions targeted at the remaining ones. In contrast, the questions ChatGPT asks seem more disorganized and disoriented.


## SpyFall

### Game Introduction
This game has six players: one spy and five villagers.
At the beginning of the game, everyone receives a word.
The spy receives the spy word, and the others receive the common word.
The spy word is different from, but related to, the common word. For example, the spy word can be "lion" when the common word is "tiger."
There are two stages in each round of the game.
In the first stage, every player describes the word they received without saying the word directly.
In the second stage, every player votes for the player they think is the spy, based on the descriptions from the first stage, and states why they think that player is the spy.

### Get Started

```
cd spyfall
python game_spyfall.py \
    --label_path spyfall/labels.txt \
    --spy_model_name gpt3 \
    --villager_model_name gpt3 \
    --debug false \
    --n 30
```

## TofuKingdom

### Game Introduction
This game is a role-playing text reasoning game.
It has eight roles: Prince, Princess, Queen, Minister, Chef, Guard, Maid, and Spy.
All players except the Prince know the real identities of the other players.
The Prince needs to guess which player is the Princess by asking each player one question.
During the game, the Prince's question can only be chosen from the three questions below:
1. Who is the Princess?
2. What is your identity?
3. What is the identity of `{player_name}`?

There are three different camps in this game.
The Princess and Chef belong to the Prince Camp; they must tell the truth when answering the question.
The Queen, Minister, and Guard belong to the Queen Camp; they must lie when answering the question.
The Spy and the Maid belong to the Spy Camp and can choose to tell the truth or lie.
After asking each player one question, the Prince can choose one player to ask an extra question.
That question must also be one of the three questions mentioned above.
Then the Prince has to choose the player he thinks is the Princess.
If the Prince correctly chooses the Princess, the Chef and the Princess win.
If the Prince chooses the Queen, the Queen, Minister, and Guard win.
If the Prince chooses a player whose identity is neither the Princess nor the Queen, the Maid and Spy win.
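
The camp and win rules above can be summarized in a few lines of Python. This is only a minimal sketch for illustration: the names `CAMPS`, `TRUTH_POLICY`, `winning_camp`, and `winners` are hypothetical and are not taken from the repository's `tofukingdom` code.

```python
# Hypothetical illustration of the TofuKingdom rules; not the repository's code.

# Camp membership for every role the Prince can pick.
CAMPS = {
    "Princess": "Prince Camp",
    "Chef": "Prince Camp",
    "Queen": "Queen Camp",
    "Minister": "Queen Camp",
    "Guard": "Queen Camp",
    "Spy": "Spy Camp",
    "Maid": "Spy Camp",
}

# True: must tell the truth, False: must lie, None: free to choose.
TRUTH_POLICY = {"Prince Camp": True, "Queen Camp": False, "Spy Camp": None}


def winning_camp(chosen_role):
    """Return the camp that wins when the Prince picks a player with this role."""
    if chosen_role == "Princess":
        return "Prince Camp"
    if chosen_role == "Queen":
        return "Queen Camp"
    # Any other pick (Chef, Minister, Guard, Maid, Spy) hands the win to the Spy Camp.
    return "Spy Camp"


def winners(chosen_role):
    """Return the sorted list of roles that win for the given pick."""
    camp = winning_camp(chosen_role)
    return sorted(role for role, c in CAMPS.items() if c == camp)


if __name__ == "__main__":
    for pick in ("Princess", "Queen", "Chef"):
        print(pick, "->", winning_camp(pick), winners(pick))
```

Note that picking the Chef, who belongs to the Prince Camp, still counts as a Spy Camp win under the rules as stated, because only the Princess and the Queen are special picks.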

### Get Started
```
cd tofukingdom
python game_tofukingdom.py \
    --prince_model_name gpt3 \
    --queen_model_name gpt4 \
    --spy_model_name td003 \
    --debug false \
    --n 20
```


## Citing GameEval

```
@article{qiao2023gameeval,
  title={GameEval: Evaluating LLMs on Conversational Games},
  author={Qiao, Dan and Wu, Chenfei and Liang, Yaobo and Li, Juntao and Duan, Nan},
  journal={arXiv preprint arXiv:2308.10032},
  year={2023}
}
```

--------------------------------------------------------------------------------
/askguess/agents/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/askguess/agents/__init__.py

--------------------------------------------------------------------------------
/askguess/agents/answer_agent.py:
--------------------------------------------------------------------------------
import os
import sys
sys.path.append("ask-guess")
from askguess.utils.prompt import get_answerer_role, get_questioner_role
from askguess.utils.utils import create_message, print_messages, convert_messages_to_prompt


class AnswerAgnent:

    def __init__(self, chatbot, object_name, args) -> None:
        self.chatbot = chatbot
        self.object_name = object_name
        self.role_easy, self.role_hard = get_answerer_role(object_name)

        # The system prompt depends on the game mode (easy or hard).
        self.history = []
        if args.mode == "easy":
            role_message = create_message("system", self.role_easy)
        else:
            role_message = create_message("system", self.role_hard)
        self.history.append(role_message)

    def play(self):
        # td003 is a completion model, so the chat history is flattened into one prompt.
        if self.chatbot.name == "td003":
            text_prompt = convert_messages_to_prompt(messages=self.history, role="answerer")
            response = self.chatbot.single_chat(text_prompt)
        else:
            response = self.chatbot.multi_chat(self.history)
        return response

    def answer(self):
        if self.chatbot.name == "td003":
            text_prompt = self.get_answer_prompt()
            response = self.chatbot.single_chat(text_prompt)
        else:
            response = self.chatbot.multi_chat(self.history)

        return response

    def get_answer_prompt(self):
        # NOTE: self.answer_role is never set in __init__, so this helper
        # raises AttributeError if it is called as written.
        prompt = "##system##"
        prompt += self.answer_role + "\n"
        flag = (len(self.history) + 1) % 2
        for i in range(len(self.history)):
            if i % 2 == flag:
                prompt += "##questioner##: "
            else:
                prompt += "##answerer##:"
            prompt += self.history[i]["content"] + "\n\n"
        prompt += "##answerer##:"

        return prompt

    def get_describe_prompt(self):
        # NOTE: self.describe_role is never set in __init__ either.
        prompt = "##system##"
        prompt += self.describe_role + "\n"
        flag = (len(self.history) + 1) % 2
        for i in range(len(self.history)):
            if i % 2 == flag:
                prompt += "##questioner##: "
            else:
                prompt += "##answerer##:"
            prompt += self.history[i]["content"] + "\n\n"
        prompt += "##answerer##:"

        return prompt

    def update_history(self, new_message):
        self.history.append(new_message)

--------------------------------------------------------------------------------
/askguess/agents/question_agent.py:
--------------------------------------------------------------------------------
import os
import sys
sys.path.append("ask-guess")
from askguess.utils.prompt import get_questioner_role
from askguess.utils.utils import create_message, convert_messages_to_prompt, print_messages


class QuestionAgnent:

    def __init__(self, chatbot, object_name, args) -> None:
        self.chatbot = chatbot
        self.object_name = object_name
        self.role_easy, self.role_hard = get_questioner_role()

        # The system prompt depends on the game mode (easy or hard).
        self.history = []
        if args.mode == "easy":
            role_message = create_message("system", self.role_easy)
        else:
            role_message = create_message("system", self.role_hard)
        self.history.append(role_message)

    def play(self):
        # td003 is a completion model, so the chat history is flattened into one prompt.
        if self.chatbot.name == "td003":
            text_prompt = convert_messages_to_prompt(messages=self.history, role="questioner")
            response = self.chatbot.single_chat(text_prompt)
        else:
            response = self.chatbot.multi_chat(self.history)
        return response

    def update_history(self, new_message):
        self.history.append(new_message)

--------------------------------------------------------------------------------
/askguess/compute.py:
--------------------------------------------------------------------------------
import argparse
import json

from agents.answer_agent import AnswerAgnent
from agents.question_agent import QuestionAgnent
from utils.utils import get_model, create_message
from utils.prompt import host_description_prompt, host_qa_prompt


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str, default='gpt3')
    parser.add_argument('--label_path', type=str, default="labels.json")
    parser.add_argument('--mode', type=str, default='easy')
    parser.add_argument('--n', type=int, default=20)
    args = parser.parse_args()

    model_name = args.model_name
    mode = args.mode
    N = args.n

    file_path = f"guess_result_{mode}_{model_name}.json"
    output_path = f"avg_result_{mode}_{model_name}.json"
    # The result file holds one JSON object per line.
    with open(file_path, 'r') as f:
        data = f.readlines()

    res = {}

    # error_type: EndingError, AnswerMentionedError, RoundLimitError, ChatError, SuccessfulTrial
    for item in data:
        line = json.loads(item)
        name = line["object"].replace(" ", "_")
        error_type = line["error_type"]
        n_rounds = line["round"]
        if name not in res:
            res[name] = {"round": 0, "EndingError": 0, "SuccessfulTrial": 0,
                         "ChatError": 0, "RoundLimitError": 0, "AnswerMentionedError": 0}
        res[name][error_type] += 1
        if n_rounds > 0:
            res[name]["round"] += n_rounds

    # Average the accumulated rounds over the successful trials of each word;
    # words with no successful trial are assigned 30 rounds.
    for key, value in res.items():
        if res[key]["SuccessfulTrial"] != 0:
            res[key]["round"] /= res[key]["SuccessfulTrial"]
        else:
            res[key]["round"] = 30

    data = res

    # error_type: EndingError, AnswerMentionedError, RoundLimitError, ChatError, SuccessfulTrial
    acc_avg = {"round": 0,
               "EndingError": 0,
               "ChatError": 0,
               "SuccessfulTrial": 0,
               "RoundLimitError": 0,
               "AnswerMentionedError": 0}

    cnt_correct = 0
    for name, item in data.items():
        if item["round"] != 0:
            cnt_correct += 1
        for key in item:
            acc_avg[key] += item[key]

    # Rounds are averaged over words; outcome counts become rates over all trials.
    for key in acc_avg:
        if key == "round":
            acc_avg[key] /= cnt_correct
        else:
            acc_avg[key] /= len(data) * N

    data["avg"] = acc_avg

    with open(output_path, 'w') as f:
        json.dump(data, f, indent=1)

--------------------------------------------------------------------------------
/askguess/demo.py:
--------------------------------------------------------------------------------
import gradio as gr
import random
import time
import json

from agents.answer_agent import AnswerAgnent
from agents.question_agent import QuestionAgnent
from utils.utils import get_model, create_message
from utils.prompt import host_description_prompt, host_qa_prompt
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str, default='gpt3')
    parser.add_argument('--mode', type=str, default='easy')
    parser.add_argument('--n', type=int, default=20)

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        msg = gr.Textbox()
        clear = gr.Button("Clear")

        def respond(message, chat_history):
            # Placeholder bot: replies with a random canned message.
            bot_message = random.choice(["How are you?", "I love you", "I'm hungry"])
            chat_history.append((message, bot_message))
            time.sleep(1)
            return "", chat_history

        msg.submit(respond, [msg, chatbot], [msg, chatbot])
        clear.click(lambda: None, None, chatbot, queue=False)

    demo.launch()

--------------------------------------------------------------------------------
/askguess/labels.json:
--------------------------------------------------------------------------------
["apple", "aquarium_fish", "baby", "bear", "beaver", "bed", "bee", "beetle", "bicycle", "bottle", "bowl", "boy", "bridge", "bus", "butterfly", "camel", "can", "castle", "caterpillar", "cattle", "chair", "chimpanzee", "clock", "cloud", "cockroach", "couch", "cra", "crocodile", "cup", "dinosaur", "dolphin", "elephant", "flatfish", "forest", "fox", "girl", "hamster", "house", "kangaroo", "keyboard", "lamp", "lawn_mower", "leopard", "lion", "lizard", "lobster", "man", "maple_tree", "motorcycle", "mountain", "mouse", "mushroom", "oak_tree", "orange", "orchid", "otter", "palm_tree", "pear", "pickup_truck", "pine_tree", "plain", "plate", "poppy", "porcupine", "possum", "rabbit", "raccoon", "ray", "road", "rocket", "rose", "sea", "seal", "shark", "shrew", "skunk", "skyscraper", "snail", "snake", "spider", "squirrel", "streetcar", "sunflower", "sweet_pepper", "table", "tank", "telephone", "television", "tiger", "tractor", "train", "trout", "tulip", "turtle", "wardrobe", "whale", "willow_tree", "wolf", "woman", "worm"]

--------------------------------------------------------------------------------
/askguess/test_labels.json:
--------------------------------------------------------------------------------
["apple","bed", "beetle", "baby", "bear", "women"]

--------------------------------------------------------------------------------
/askguess/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/askguess/utils/__init__.py

--------------------------------------------------------------------------------
/askguess/utils/prompt.py:
--------------------------------------------------------------------------------
def get_answerer_role(word):
    # Returns the answerer's system prompts for the easy and hard modes.
    prompt_easy = f'''
You are playing a game called "Ask-Guess."

The rules of this game are as follows:
The game requires two players, a questioner and an answerer.
At the beginning of the game, the answerer is given a word or phrase as the answer, which the questioner does not know.
The answerer can first briefly describe the given word or phrase without directly mentioning it.
Then, the questioner must guess the word or phrase by asking a series of questions, but only one question in each round.
The answerer should check whether the questioner has guessed the answer. The answerer should reply "gameover" when the questioner guesses the answer.
But the answerer must not directly say the given word or phrase when answering the question.
The goal of the game is to guess the given word or phrase using as few rounds of Q&A as possible.

Now, you are playing the game. You are the answerer in the game. The given word is "{word}". The user is the questioner to guess the answer.
In each round, you should first check if the user has directly guessed the word "{word}" or guessed similar descriptions like "a kind of {word}" or "a type of {word}." If so, the game is over. You must directly output "gameover".
Otherwise, you should answer questions faithfully or give hints to guide the questioner to guess the word. In no circumstances should you directly say "{word}."
'''

    prompt_hard = f'''
You are playing a game called "Ask-Guess"

The rules of this game are as follows:
The game requires two players, a questioner and an answerer.
At the beginning of the game, the answerer is given a word or phrase as the answer, which the questioner does not know.
Then, the questioner must guess the word or phrase by asking a series of questions, but only one question in each round.
The answerer should only reply "yes" or "no" to the question or reply "gameover" when the questioner has guessed the answer.
The game's goal is to guess the given word or phrase using as few rounds of Q&A as possible.

Now, you are the answerer, the user is the questioner, you should answer the questions faithfully.
The answer is "{word}".
You can only answer the question with "yes" or "no" or "gameover", no more words are allowed in your reply.
In each round, you should first check if the user has directly guessed the word "{word}" or guessed similar descriptions like "a kind of {word}" or "a type of {word}." If so, the game is over. You should directly output "gameover".
Remember, you can only reply "yes," "no," or "gameover."
'''
    return prompt_easy, prompt_hard

def get_questioner_role():
    # Returns the questioner's system prompts for the easy and hard modes.
    prompt_easy = f'''
You are playing a game called "Ask-Guess"

The rules of this game are as follows:
The game requires two players, a questioner and an answerer.
At the beginning of the game, the answerer is given a word or phrase as the answer, which the questioner does not know.
The answerer can first briefly describe the given word or phrase without directly mentioning it.
Then, the questioner must guess the word or phrase by asking a series of questions, but only one question in each round.
The answerer should answer the questions faithfully, even give some hints to guide the questioner to guess the answer and check whether the questioner has guessed the answer. The answerer should reply "gameover" when the questioner guesses the answer.
But the answerer must not directly say the given word or phrase when answering the question.
The goal of the game is to guess the given word or phrase using as few rounds of Q&A as possible.

Now, you are the questioner. You should guess the word or phrase by asking questions, but only one question in each round. Your question should be helpful to guess the word or phrase; do not ask irrelevant questions.
'''
    prompt_hard = f'''
You are playing a game called "Ask-Guess"

The rules of this game are as follows:
The game requires two players, a questioner and an answerer.
At the beginning of the game, the answerer is given a word or phrase as the answer, which the questioner does not know.
Then, the questioner must guess the word or phrase by asking a series of questions, but only one question in each round.
The answerer should only reply "yes" or "no" to the question or reply "gameover" when the questioner has guessed the answer.
The game's goal is to guess the given word or phrase using as few rounds of Q&A as possible.

Now, you are the questioner, you should guess the word or phrase by asking questions, but only one question in each round. Your question should be helpful to guess the word or phrase, do not ask irrelevant questions.
'''
    return prompt_easy, prompt_hard


host_description_prompt = '''Now the game start, answerer please give a short description of your received word or phrase.'''

host_qa_prompt = '''Now the Q&A start, questioner please guess the answer!.'''

--------------------------------------------------------------------------------
/askguess/utils/utils.py:
--------------------------------------------------------------------------------
from chat.gpt3_chat import GPT3
from chat.gpt4_chat import GPT4
from chat.text003_chat import Text003

def get_model(model_name):
    # Returns a chatbot wrapper for the given name, or None if the name is unknown.
    model = None
    if model_name == "gpt3":
        model = GPT3()
    if model_name == "gpt4":
        model = GPT4()
    if model_name == "td003":
        model = Text003()
    return model

def create_message(role, content):
    return {"role": role, "content": content}

def print_messages(messages):
    for message in messages:
        print(message)

def convert_messages_to_prompt(messages, role):
    # Flattens a chat history into a single text prompt for completion models.
    # From each agent's point of view, "assistant" messages are its own turns
    # and "user" messages come from the other player.
    prompt = ""
    if role == "questioner":
        for message in messages:
            content = message["content"]
            if message["role"] == "user":
                prompt += f"questioner: {content}\n"
            elif message["role"] == "assistant":
                prompt += f"answerer: {content}\n"
            else:
                prompt += f"host: {content}\n"
        prompt += "questioner: "
    else:
        for message in messages:
            content = message["content"]
            if message["role"] == "assistant":
                prompt += f"questioner: {content}\n"
            elif message["role"] == "user":
                prompt += f"answerer: {content}\n"
            else:
                prompt += f"host: {content}\n"
        prompt += "answerer: "

    return prompt

--------------------------------------------------------------------------------
/assets/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/.DS_Store

--------------------------------------------------------------------------------
/assets/GameEval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/GameEval.png

--------------------------------------------------------------------------------
/assets/case1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/case1.png

--------------------------------------------------------------------------------
/assets/res.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/res.png

--------------------------------------------------------------------------------
/assets/res1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/res1.png

--------------------------------------------------------------------------------
/assets/res2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/assets/res2.png

--------------------------------------------------------------------------------
/chat/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordddan/GameEval/5025a1abfc00de62aab177a016cc8b64d2c1b4ab/chat/__init__.py

--------------------------------------------------------------------------------
/chat/config.py:
--------------------------------------------------------------------------------
# Credentials redacted: supply your own Azure OpenAI keys here.
key_gpt3 = "<YOUR_AZURE_OPENAI_KEY>"
key_gpt4 = "<YOUR_AZURE_OPENAI_KEY>"
key_td003 = "<YOUR_AZURE_OPENAI_KEY>"

api_type_gpt3 = "azure"
api_base_gpt3 = "https://mtutor-dev.openai.azure.com/"
api_version_gpt3 = "2023-03-15-preview"

api_type_gpt4 = "azure"
api_base_gpt4 = "https://mtutor-dev.openai.azure.com/"
api_version_gpt4 = "2023-03-15-preview"

api_type_td003 = "azure"
api_base_td003 = "https://gcrgpt4aoai5.openai.azure.com"
api_version_td003 = "2023-03-15-preview"

engine_gpt4 = "devgpt4-32k"
engine_gpt3 = "mtutor-openai-dev"
engine_td003 = "text-davinci-003"

temperature_gpt3 = 0.7
temperature_gpt4 = 0.7
temperature_td003 = 0.7

--------------------------------------------------------------------------------
/chat/config_example.py:
--------------------------------------------------------------------------------
key_gpt3 =
key_gpt4 =
key_td003 =

api_type_gpt3 =
api_base_gpt3 =
api_version_gpt3 =

api_type_gpt4 =
api_base_gpt4 =
api_version_gpt4 =

api_type_td003 =
api_base_td003 =
api_version_td003 =

engine_gpt4 =
engine_gpt3 =
engine_td003 =

temperature_gpt3 =
temperature_gpt4 =
temperature_td003 =

--------------------------------------------------------------------------------
/chat/gpt3_chat.py:
--------------------------------------------------------------------------------
import sys
import openai
from func_timeout import func_set_timeout

from chat.config import key_gpt3, api_type_gpt3, api_base_gpt3, api_version_gpt3, engine_gpt3, temperature_gpt3

# Abort a single API call if it takes longer than 15 seconds.
@func_set_timeout(15)
def get_response(messages):
    response = openai.ChatCompletion.create(
        temperature=temperature_gpt3,
        engine=engine_gpt3,
        messages=messages,
        api_type=api_type_gpt3,
        api_base=api_base_gpt3,
        api_version=api_version_gpt3,
        api_key=key_gpt3,
    )
    return response

class GPT3:
    def __init__(self) -> None:
        self.name = "gpt3"

    def single_chat(self, content, role=None):
        if role is None:
            role = "You are an AI assistant that helps people find information."
        messages = [
            {"role": "system", "content": role},
            {"role": "user", "content": content}
        ]
        res = None
        cnt = 0
        # Retry on any failure (including timeouts); give up after 5 attempts and return None.
        while True:
            try:
                response = get_response(messages)
                res = response["choices"][0]["message"]["content"]
                break
            except:
                cnt += 1
                if cnt >= 5:
                    break

        return res

    def multi_chat(self, messages):
        res = None
        cnt = 0
        # Retry on any failure (including timeouts); give up after 3 attempts and return None.
        while True:
            try:
                response = get_response(messages)
                res = response["choices"][0]["message"]["content"]
                break
            except:
                cnt += 1
                if cnt >= 3:
                    break

        return res

if __name__ == "__main__":
    pass

--------------------------------------------------------------------------------
/chat/gpt4_chat.py:
--------------------------------------------------------------------------------
import os
import sys

import openai

from func_timeout import func_set_timeout
from chat.config import key_gpt4, api_type_gpt4, api_base_gpt4, api_version_gpt4, engine_gpt4, temperature_gpt4

# Abort a single API call if it takes longer than 30 seconds.
@func_set_timeout(30)
def get_response(messages):
    response = openai.ChatCompletion.create(
        temperature=temperature_gpt4,
        engine=engine_gpt4,
        messages=messages,
        api_type=api_type_gpt4,
        api_base=api_base_gpt4,
        api_version=api_version_gpt4,
        api_key=key_gpt4,
    )
    return response

class GPT4:
    def __init__(self) -> None:
        self.name = "gpt4"

    def single_chat(self, content, role=None):
        if role is None:
            role = "You are an AI assistant that helps people find information."
        messages = [
            {"role": "system", "content": role},
            {"role": "user", "content": content}
        ]
        res = None
        cnt = 0
        # Retry on any failure (including timeouts); give up after 3 attempts and return None.
        while True:
            try:
                response = get_response(messages)
                res = response["choices"][0]["message"]["content"]
                break
            except:
                cnt += 1
                if cnt >= 3:
                    break

        return res

    def multi_chat(self, messages):
        res = None
        cnt = 0
        # Retry on any failure (including timeouts); give up after 3 attempts and return None.
        while True:
            try:
                response = get_response(messages)
                res = response["choices"][0]["message"]["content"]
                break
            except:
                cnt += 1
                if cnt >= 3:
                    break

        return res

if __name__ == "__main__":
    pass

--------------------------------------------------------------------------------
/chat/text003_chat.py:
--------------------------------------------------------------------------------
import openai
import sys
from func_timeout import func_set_timeout
from chat.config import key_td003, api_type_td003, api_base_td003, api_version_td003, engine_td003, temperature_td003
import re

import random

# Abort a single API call if it takes longer than 30 seconds.
@func_set_timeout(30)
def get_response(prompt):
    response = openai.Completion.create(
        engine=engine_td003,
        temperature=temperature_td003,
        prompt=prompt,
        max_tokens=150,
        api_type=api_type_td003,
        api_base=api_base_td003,
        api_version=api_version_td003,
        api_key=key_td003,
    )

    return response


def extract_json(string):
    # Returns the substring from the first "{" through the first "}".
    l = string.find("{")
    r = string.find("}") + 1
    json_string = string[l:r]

    return json_string


class Text003:
    def __init__(self) -> None:
        self.name = "td003"

    def single_chat(self, prompt):
        cnt = 0
        res = None

        while True:
            try:
                response
= get_response(prompt) 46 | res = response['choices'][0]['text'].replace('\n', '').replace(' .', '.').strip() 47 | break 48 | except: 49 | cnt += 1 50 | if cnt >= 3: 51 | break 52 | return res 53 | 54 | 55 | if __name__ == "__main__": 56 | 57 | pass -------------------------------------------------------------------------------- /game_askguess.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | from askguess.agents.answer_agent import AnswerAgnent 4 | from askguess.agents.question_agent import QuestionAgnent 5 | import vthread 6 | import logging 7 | import os 8 | import re 9 | from askguess.utils.utils import get_model, create_message 10 | from askguess.utils.prompt import host_description_prompt, host_qa_prompt 11 | import argparse 12 | 13 | 14 | def checkin(word, text): 15 | ''' 16 | Check whether the answerer used the target word itself as a hint. 17 | ''' 18 | pattern = r'[^\w\s]' 19 | # replace punctuation with spaces and pad both ends so that a match at the start or end of the text is also found 20 | replaced_text = " " + re.sub(pattern, ' ', text) + " " 21 | 22 | if (" " + word + " ") in replaced_text: 23 | return True 24 | 25 | return False 26 | 27 | def game(object, f, model, args): 28 | word = object.replace("_"," ") 29 | flag = False # True means a successful trial, False means an error happened.
30 | cnt = 0 # to record the game round 31 | answer_agent = AnswerAgnent(model, word, args) 32 | question_agent = QuestionAgnent(model, word, args) 33 | error_type = None 34 | while True: 35 | # ---------- Describing Stage ---------- 36 | if args.mode == "easy" and cnt == 0: 37 | # host describing prompt 38 | host_message = create_message("system",host_description_prompt) 39 | answer_agent.update_history(host_message) 40 | question_agent.update_history(host_message) 41 | 42 | description = answer_agent.play() 43 | f.write(f"description: {description}"+"\n") 44 | 45 | if description == None: 46 | error_type = "ChatError" 47 | break 48 | if checkin(word.lower(), description.lower()): 49 | error_type = "AnswerMentionedError" 50 | break 51 | 52 | questioner_message = create_message("user",description) 53 | answerer_messsage = create_message("assistant",description) 54 | answer_agent.update_history(answerer_messsage) 55 | question_agent.update_history(questioner_message) 56 | # import pdb 57 | # pdb.set_trace() 58 | if args.debug: 59 | print(f"description: {description}"+"\n") 60 | f.write(f"description: {description}"+"\n") 61 | cnt += 1 62 | continue 63 | 64 | # ---------- Q&A Stage ---------- 65 | 66 | # host Q&A prompt 67 | host_message = create_message("system",host_qa_prompt) 68 | answer_agent.update_history(host_message) 69 | question_agent.update_history(host_message) 70 | 71 | question = question_agent.play() 72 | 73 | if question == None: 74 | error_type = "ChatError" 75 | break 76 | questioner_message = create_message("assistant",question) 77 | answerer_messsage = create_message("user",question) 78 | answer_agent.update_history(answerer_messsage) 79 | question_agent.update_history(questioner_message) 80 | if args.debug: 81 | print(f"question: {question}"+"\n") 82 | f.write(f"question: {question}"+"\n") 83 | 84 | answer = answer_agent.play() 85 | if answer == None: 86 | error_type = "ChatError" 87 | break 88 | questioner_message = create_message("user",answer) 
89 | answerer_messsage = create_message("assistant",answer) 90 | answer_agent.update_history(answerer_messsage) 91 | question_agent.update_history(questioner_message) 92 | if args.debug: 93 | print(f"answer: {answer}"+"\n") 94 | f.write(f"answer: {answer}"+"\n") 95 | cnt += 1 96 | 97 | # ---------- Check the Result ---------- 98 | if "gameover" in answer.lower() or "game over" in answer.lower(): 99 | if word.lower() in question.lower(): 100 | flag = True 101 | break 102 | else: 103 | # the game was ended although the word had not been guessed 104 | error_type = "EndingError" 105 | break 106 | 107 | # the answerer broke the rule by mentioning the word 108 | if checkin(word.lower(), answer.lower()): 109 | error_type = "AnswerMentionedError" 110 | break 111 | if cnt > 30: 112 | error_type = "RoundLimitError" 113 | break 114 | 115 | if flag: 116 | if args.debug: 117 | print({"object":word,"round":cnt-1,"error_type":"SuccessfulTrial"}) 118 | return {"object":word,"round":cnt-1,"error_type":"SuccessfulTrial"} 119 | else: 120 | if args.debug: 121 | print({"object":word,"round":-1,"error_type":error_type}) 122 | return {"object":word,"round":-1,"error_type":error_type} 123 | 124 | 125 | @vthread.pool(1) 126 | def run(word,i,model,args): 127 | with open(f"askguess/logs_{args.mode}_{args.model_name}/{word}/{i}.log","w") as f: 128 | res = game(word,f,model,args) 129 | res["log"] = f"{i}.log" 130 | with open(f"askguess/guess_result_{args.mode}_{args.model_name}.json","a") as f: 131 | f.write(json.dumps(res)+"\n") 132 | 133 | if __name__ == "__main__": 134 | 135 | parser = argparse.ArgumentParser() 136 | parser.add_argument('--label_path', type=str, default='askguess/labels.json') 137 | parser.add_argument('--model_name',type=str,default='gpt3') 138 | parser.add_argument('--mode',type=str,default='easy') 139 | parser.add_argument('--n',type=int,default=20) 140 | parser.add_argument('--debug',type=lambda s: s.lower() in ('1','true','yes'),default=True) # type=bool would treat any non-empty string, including "False", as True 141 | args = parser.parse_args() 142 | model = get_model(args.model_name) 143 | with open(args.label_path,'r') as f: 144 | labels =
json.load(f) 145 | 146 | ## prepare log folder 147 | for label in labels: 148 | log_dir = f"askguess/logs_{args.mode}_{args.model_name}/{label}" 149 | if not os.path.exists(log_dir): 150 | os.makedirs(log_dir) 151 | 152 | ## run multiple times for each word 153 | for i in range(0,len(labels)): 154 | label = labels[i] 155 | for i in range(args.n): 156 | run(label,i,model,args) 157 | 158 | vthread.pool.wait() 159 | 160 | -------------------------------------------------------------------------------- /game_spyfall.py: -------------------------------------------------------------------------------- 1 | from spyfall.agents.base_agent import BaseAgent 2 | 3 | from chat.gpt3_chat import GPT3 4 | from chat.gpt4_chat import GPT4 5 | from chat.text003_chat import Text003 6 | from spyfall.utils.utils import create_message, get_model 7 | import argparse 8 | import vthread 9 | import json 10 | import copy 11 | import random 12 | import os 13 | # gpt3 = GPT3() 14 | # gpt4 = GPT4() 15 | # text003 = Text003() 16 | # name2model = {"gpt3":gpt3,"gpt4":gpt4,"text003":text003} 17 | players = ["Nancy","Tom","Cindy","Jack","Rose","Edward"] 18 | 19 | def init_game(phrase_pair,spy_model, villager_model): 20 | # phrase_pair[0]:spy word, phrase_pair[1]:common word 21 | spy_word = phrase_pair[0] 22 | villager_word = phrase_pair[1] 23 | random.shuffle(players) 24 | name2agent = {} 25 | index = random.randint(1,len(players)) # index of the spy 26 | spy_name = players[index-1] 27 | agents_list = [] 28 | for i in range(len(players)): 29 | 30 | if i+1 == index: 31 | phrase = spy_word 32 | llm = spy_model 33 | llm_name = llm.name 34 | else: 35 | phrase = villager_word 36 | llm = villager_model 37 | llm_name = villager_model.name 38 | 39 | player_name = players[i] 40 | agent = BaseAgent(llm,llm_name,player_name,players,phrase) 41 | name2agent[player_name] = agent 42 | agents_list.append(agent) 43 | settings = f'''The spy word is: {spy_word};\n The villager word is {villager_word}.\n''' 44 | for 
agent in agents_list: 45 | settings += f"Player: {agent.player_name}; LLM: {agent.llm_name}; Assigned Word: {agent.phrase} \n" 46 | 47 | return agents_list, spy_name, index, settings 48 | 49 | def get_voted_name(name_list): 50 | counts = {} 51 | 52 | for string in name_list: 53 | if string in counts: 54 | counts[string] += 1 55 | else: 56 | counts[string] = 1 57 | 58 | max_count = 0 59 | most_frequent_string = None 60 | freq = [] 61 | for string, count in counts.items(): 62 | freq.append(count) 63 | 64 | if count > max_count: 65 | max_count = count 66 | most_frequent_string = string 67 | 68 | freq.sort() 69 | return most_frequent_string, freq 70 | 71 | def update_history(agents_list:list[BaseAgent], temp_message, player_name, public_messages): 72 | 73 | for agent in agents_list: 74 | if agent.player_name != player_name: 75 | agent.private_history.append(temp_message) 76 | public_messages.append(temp_message) 77 | 78 | def get_result(agent_list:list[BaseAgent], spy_index, round, i, winer): 79 | 80 | llms = [agent.llm_name for agent in agent_list] 81 | players = [agent.player_name for agent in agent_list] 82 | 83 | return {"winer":winer,"players":players,"llms":llms,"spy_index":spy_index,"round":round,"log":f"{i}.log"} 84 | 85 | 86 | def game(f, phrase_pair, spy_model, villager_model, i, args): 87 | 88 | agents_list, spy_name, spy_index, game_settings = init_game(phrase_pair,spy_model,villager_model) 89 | f.write(game_settings) 90 | living_players = copy.deepcopy(players) 91 | 92 | # game start 93 | PUBLIC_MESSAGES = [] 94 | host_speech= "Host: The game now start." 
95 | start_message = create_message("user",host_speech) 96 | update_history(agents_list,start_message,"host",PUBLIC_MESSAGES) 97 | f.write(host_speech + "\n") 98 | 99 | host_speech = f"Host: The living players are:{json.dumps(living_players)}" 100 | living_player_message = create_message("user",host_speech) 101 | update_history(agents_list,living_player_message,"host",PUBLIC_MESSAGES) 102 | f.write(host_speech + "\n") 103 | 104 | if args.debug: 105 | print(game_settings) 106 | game_round = 0 107 | while True: 108 | game_round += 1 109 | ## describing 110 | f.write("---------describing stage-------------") 111 | 112 | host_speech = f"Host: Now it's the describing stage, players have to say something about the received word without directly saying it." 113 | host_message = create_message("user",host_speech) 114 | update_history(agents_list,host_message,"host",PUBLIC_MESSAGES) 115 | f.write(host_speech+"\n\n") 116 | 117 | for agent in agents_list: 118 | 119 | if agent.player_name not in living_players: 120 | continue 121 | 122 | host_speech = f"Host: {agent.player_name}, it's your turn." 
123 | host_message = create_message("user",host_speech) 124 | update_history(agents_list,host_message,"host",PUBLIC_MESSAGES) 125 | f.write(host_speech+"\n") 126 | 127 | description, cot = agent.describe() 128 | 129 | temp = f"{agent.player_name}: {description}" 130 | public_message = create_message("user",temp) 131 | update_history(agents_list,public_message,agent.player_name,PUBLIC_MESSAGES) 132 | PUBLIC_MESSAGES.append(cot) 133 | private_message = create_message("assistant",json.dumps(cot)) 134 | agent.private_history.append(private_message) 135 | f.write(temp+"\n") 136 | f.write(json.dumps(cot)+"\n\n") 137 | 138 | if args.debug: 139 | print(cot,agent.phrase) 140 | print(temp) 141 | print() 142 | 143 | 144 | f.write("---------voting stage-------------") 145 | 146 | host_speech = "Host: Now the voting start, please vote for the player you think is the spy and tell the reason why you think he is the spy." 147 | host_message = create_message("user",host_speech) 148 | update_history(agents_list,host_message,"host",PUBLIC_MESSAGES) 149 | f.write(host_speech+"\n") 150 | 151 | name_list = [] 152 | for agent in agents_list: 153 | 154 | host_speech = f"Host: {agent.player_name}, it's your turn." 155 | host_message = create_message("user",host_speech) 156 | update_history(agents_list,host_message,"host",PUBLIC_MESSAGES) 157 | f.write(host_speech+"\n") 158 | 159 | if agent.player_name not in living_players: 160 | continue 161 | name, speak, cot = agent.vote() 162 | 163 | # private message for the player 164 | private_message = create_message("assistant",json.dumps(cot)) 165 | agent.private_history.append(private_message) 166 | 167 | # public message for the other players 168 | temp = f"{agent.player_name}: {speak}, i will vote {name} as the spy." 
169 | public_message = create_message("user",temp) 170 | update_history(agents_list,public_message,agent.player_name,PUBLIC_MESSAGES) 171 | PUBLIC_MESSAGES.append(cot) 172 | 173 | if args.debug: 174 | print(cot) 175 | print(temp) 176 | print() 177 | 178 | f.write(temp+"\n") 179 | f.write(json.dumps(cot)+"\n") 180 | 181 | name_list.append(name) 182 | 183 | ## result of this round 184 | final_name, freq = get_voted_name(name_list) 185 | 186 | if final_name not in living_players: 187 | log_content = "The agents did not return a valid player name." 188 | print(log_content) 189 | f.write(log_content+"\n") 190 | return get_result(agents_list,spy_index,-1,i,"exit"),PUBLIC_MESSAGES 191 | 192 | if final_name == spy_name: 193 | log_content = "the spy loses the game" 194 | print(log_content) 195 | f.write(log_content+"\n") 196 | # game over, the spy loses 197 | return get_result(agents_list,spy_index,game_round,i,"villager"),PUBLIC_MESSAGES 198 | else: 199 | ## remove a player 200 | living_players.remove(final_name) 201 | 202 | host_speech = f"Host: the voting result is {final_name}, he is not the spy. The spy still lives, the game will continue. In the next round, the players' descriptions need to be more specific."
203 | host_message = create_message("user",host_speech) 204 | update_history(agents_list,host_message,"host",PUBLIC_MESSAGES) 205 | f.write(host_speech+"\n\n") 206 | 207 | host_speech = f"Host: Now the living players are:{json.dumps(living_players)}" 208 | living_player_message = create_message("user",host_speech) 209 | update_history(agents_list,living_player_message,"host",PUBLIC_MESSAGES) 210 | f.write(host_speech+"\n\n") 211 | 212 | 213 | if len(living_players) <= 3: 214 | log_content = "the spy wins the game" 215 | print(log_content) 216 | f.write(log_content+"\n") 217 | return get_result(agents_list,spy_index,game_round,i,"spy"),PUBLIC_MESSAGES 218 | 219 | 220 | if __name__ == "__main__": 221 | 222 | parser = argparse.ArgumentParser() 223 | parser.add_argument('--label_path', type=str, default='spyfall/labels.txt') 224 | parser.add_argument('--spy_model_name',type=str,default='td003') 225 | parser.add_argument('--villager_model_name',type=str,default='gpt3') 226 | parser.add_argument('--n',type=int,default=1) 227 | parser.add_argument('--debug',type=lambda s: s.lower() in ('1','true','yes'),default=True) # type=bool would treat any non-empty string, including "False", as True 228 | args = parser.parse_args() 229 | with open(args.label_path,'r') as f: 230 | data = f.readlines() 231 | 232 | log_path = f"spyfall/logs/{args.spy_model_name}_{args.villager_model_name}" 233 | labels = [] 234 | for item in data: 235 | labels.append(item.strip().split(",")) 236 | 237 | for label in labels: 238 | dir_name = f"{label[0]}&{label[1]}" 239 | if not os.path.exists(os.path.join(log_path,dir_name)): 240 | os.makedirs(os.path.join(log_path,dir_name)) 241 | 242 | spy_model = get_model(args.spy_model_name) 243 | villager_model = get_model(args.villager_model_name) 244 | 245 | for j in range(args.n): 246 | for i in range(len(labels)): 247 | label = labels[i] 248 | dir_name = f"{log_path}/{label[0]}&{label[1]}" 249 | with open(f"{dir_name}/{j}.log",'w') as f: 250 | res, history = game(f=f, 251 | phrase_pair=label, 252 | spy_model=spy_model, 253 | villager_model=villager_model, 254 | i=j, 255
| args=args) -------------------------------------------------------------------------------- /game_tofukingdom.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.agents import ChefAgent,SpyAgent,MaidAgent,GuardAgent,QueenAgent,PrinceAgent,PrincessAgent,MinisterAgent 2 | from tofukingdom.utils.utils import create_message 3 | from chat.gpt3_chat import GPT3 4 | from chat.gpt4_chat import GPT4 5 | import vthread 6 | import json 7 | import random 8 | import argparse 9 | import os 10 | from tofukingdom.utils.utils import get_model 11 | gpt3 = GPT3() 12 | gpt4 = GPT4() 13 | players = ["Nancy","Tom","Cindy","Jack","Rose","Edward","Robert"] 14 | 15 | 16 | def init_game(players,prince_model,queen_model,spy_model): 17 | name2agent = {} 18 | random.shuffle(players) 19 | agents = [] 20 | agents.append(PrincessAgent(prince_model,players[0],players)) 21 | agents.append(ChefAgent(prince_model,players[1],players)) 22 | agents.append(SpyAgent(spy_model,players[2],players)) 23 | agents.append(MaidAgent(spy_model,players[3],players)) 24 | agents.append(GuardAgent(queen_model,players[4],players)) 25 | agents.append(QueenAgent(queen_model,players[5],players)) 26 | agents.append(MinisterAgent(queen_model,players[6],players)) 27 | random.shuffle(agents) 28 | settings = f"PrinceModel: {prince_model.name}\n QueenModel: {queen_model.name} \n SpyModel: {spy_model.name} \n" 29 | identities = "" 30 | for agent in agents: 31 | name2agent[agent.player_name] = agent 32 | settings += f"Player: {agent.player_name}; LLM: {agent.chatbot.name}; Identity: {agent.role}; \n" 33 | identities += f"Player {agent.player_name} is the {agent.role};" 34 | return agents, name2agent, settings, identities 35 | 36 | def get_identity_text(agents): 37 | res = "" 38 | for agent in agents: 39 | res += f"{agent.player_name} is the {agent.role}. 
\n" 40 | return res 41 | 42 | def get_game_result(final_name,i): 43 | res = {"winner":final_name,"log":f"{i}.log"} 44 | return res 45 | 46 | def update_history(agents_list, temp_message, player_name): 47 | 48 | for agent in agents_list: 49 | if agent.player_name != player_name: 50 | agent.private_history.append(temp_message) 51 | 52 | 53 | def game(f,round,prince_model,queen_model,spy_model): 54 | 55 | prince = PrinceAgent(prince_model,players) 56 | print(f"The {round}-th game begins.\n") 57 | 58 | random.shuffle(players) 59 | 60 | agents_list, name2agent, settings, identities = init_game(players,prince_model,queen_model,spy_model) 61 | if args.debug: 62 | print(settings) 63 | print() 64 | identities = get_identity_text(agents_list) 65 | 66 | host_speech= "Host: The game now start." 67 | start_message = create_message("user",host_speech) 68 | update_history(agents_list,start_message,"host") 69 | prince.private_history.append(start_message) 70 | f.write(host_speech + "\n") 71 | if args.debug: 72 | print(host_speech) 73 | 74 | for agent in agents_list: 75 | player_name = agent.player_name 76 | 77 | host_speech= f"Host: The Prince please ask player {player_name} one question." 78 | host_message = create_message("user",host_speech) 79 | update_history(agents_list,host_message,"host") 80 | prince.private_history.append(host_message) 81 | f.write(host_speech + "\n") 82 | if args.debug: 83 | print(host_speech) 84 | 85 | # prince ask question 86 | question, cot = prince.ask() 87 | if question is None: 88 | error = "Question is None." 
89 | print(error) 90 | return {"error":error} 91 | temp = f"Prince: {question}" 92 | temp_message = create_message("user",temp) 93 | update_history(agents_list,temp_message,"Prince") 94 | prince_message = create_message("assistant",json.dumps(cot)) 95 | prince.private_history.append(prince_message) 96 | f.write(temp+"\n") 97 | f.write(json.dumps(cot)+"\n") 98 | if args.debug: 99 | print(temp) 100 | print(json.dumps(cot)) 101 | print() 102 | 103 | # player answer question 104 | answer, cot = agent.chat(identities) 105 | if answer is None: 106 | error = "Answer is None." 107 | print(error) 108 | return {"error":error} 109 | temp = f"{player_name}: {answer}" 110 | temp_message = create_message("user",temp) 111 | update_history(agents_list,temp_message,player_name) 112 | private_message = create_message("assistant",json.dumps(cot)) 113 | agent.private_history.append(private_message) 114 | prince.private_history.append(temp_message) 115 | f.write(temp+"\n") 116 | f.write(json.dumps(cot)+"\n") 117 | if args.debug: 118 | print(temp) 119 | print(json.dumps(cot)) 120 | print() 121 | 122 | # choose a player to ask one more question 123 | host_speech = f"Host: The Prince please choose a player to ask an extra question." 124 | host_message = create_message("user",host_speech) 125 | update_history(agents_list,host_message,"host") 126 | prince.private_history.append(host_message) 127 | f.write(host_speech + "\n") 128 | if args.debug: 129 | print(host_speech) 130 | print() 131 | 132 | name, question, cot = prince.ask_choose() 133 | if name is None: 134 | error = "Extra name is None." 
135 | print(error) 136 | return {"error":error} 137 | temp = f"Prince: I choose {name}, my question is {question}" 138 | temp_message = create_message("user",temp) 139 | update_history(agents_list,temp_message,"Prince") 140 | prince_message = create_message("assistant",json.dumps(cot)) 141 | prince.private_history.append(prince_message) 142 | f.write(temp+"\n") 143 | f.write(json.dumps(cot)+"\n") 144 | if args.debug: 145 | print(temp) 146 | print(json.dumps(cot)) 147 | print() 148 | # the player chosen by the Prince answers the extra question; fall back to the last-asked player if the name is not recognised 149 | agent = name2agent.get(name, agent) 150 | answer, cot = agent.chat(identities) 151 | if answer is None: 152 | error = "Extra answer is None." 153 | print(error) 154 | return {"error":error} 155 | temp = f"{agent.player_name}: {answer}" 156 | temp_message = create_message("user",temp) 157 | update_history(agents_list,temp_message,agent.player_name) 158 | private_message = create_message("assistant",json.dumps(cot)) 159 | agent.private_history.append(private_message) 160 | prince.private_history.append(temp_message) 161 | f.write(temp+"\n") 162 | f.write(json.dumps(cot)+"\n") 163 | if args.debug: 164 | print(temp) 165 | print(json.dumps(cot)) 166 | print() 167 | 168 | # choose the princess 169 | 170 | host_speech = f"Host: Who do you think is the true princess?" 171 | host_message = create_message("user",host_speech) 172 | update_history(agents_list,host_message,"host") 173 | prince.private_history.append(host_message) 174 | f.write(host_speech + "\n") 175 | if args.debug: 176 | print(host_speech) 177 | print() 178 | 179 | name, cot = prince.choose() 180 | if name is None: 181 | error = "Final answer is None."
182 | print(error) 183 | return {"error":error} 184 | if args.debug: 185 | print(f"The final choice is {name}") 186 | print(json.dumps(cot)) 187 | print() 188 | 189 | game_result = get_game_result(name2agent[name].role,round) 190 | if args.debug: 191 | print(game_result) 192 | return game_result 193 | 194 | 195 | 196 | if __name__ == "__main__": 197 | parser = argparse.ArgumentParser() 198 | parser.add_argument('--prince_model_name',type=str,default='gpt3') 199 | parser.add_argument('--spy_model_name',type=str,default='td003') 200 | parser.add_argument('--queen_model_name',type=str,default='gpt3') 201 | parser.add_argument('--n',type=int,default='1') 202 | parser.add_argument('--debug',type=bool,default=True) 203 | args = parser.parse_args() 204 | prince_model = get_model(args.prince_model_name) 205 | spy_model = get_model(args.spy_model_name) 206 | queen_model = get_model(args.queen_model_name) 207 | 208 | 209 | for i in range(args.n): 210 | log_dir = f"tofukingdom/logs/{args.prince_model_name}_{args.queen_model_name}_{args.spy_model_name}" 211 | if not os.path.exists(log_dir): 212 | os.makedirs(log_dir) 213 | file_name = os.path.join(log_dir,f"{i}.txt") 214 | with open(file_name,'w') as f: 215 | game_result = game(f,i,prince_model,queen_model,spy_model) 216 | if game_result: 217 | with open("tofukingdom/result.json","a") as f: 218 | f.write(json.dumps(game_result)+"\n") 219 | 220 | -------------------------------------------------------------------------------- /spyfall/.gitignore: -------------------------------------------------------------------------------- 1 | utils/key.py 2 | logs -------------------------------------------------------------------------------- /spyfall/agents/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /spyfall/agents/base_agent.py: 
-------------------------------------------------------------------------------- 1 | from spyfall.utils.prompt import game_prompt_en 2 | from spyfall.utils.utils import create_message, print_messages 3 | import json 4 | 5 | class BaseAgent: 6 | def __init__(self,chatbot,llm_name,player_name,all_players,phrase) -> None: 7 | self.chatbot = chatbot # llm model 8 | self.llm_name = llm_name # name of the chatbot e.g. "gpt3" 9 | self.game_prompt = game_prompt_en 10 | self.player_name = player_name # player name 11 | self.phrase = phrase 12 | self.all_players = all_players # the names of all the players 13 | self.role_prompt = self.get_role_prompt() 14 | 15 | self.role_messages = self.get_role_messages() 16 | self.vote_messages = self.get_vote_messages() 17 | self.private_history = [] 18 | 19 | def get_role_prompt(self): 20 | role_prompt = ( 21 | f'''{self.game_prompt}''' 22 | f'''The players involved in the game are: {json.dumps(self.all_players)}.''' 23 | f'''You are {self.player_name} \n''' 24 | f'''Your given phrase is {self.phrase} \n''' 25 | ) 26 | 27 | return role_prompt 28 | 29 | 30 | 31 | def get_role_messages(self): 32 | 33 | messages = [] 34 | messages.append(create_message("system",self.game_prompt)) 35 | temp = f"Now i have read the rules and i know how to play the game, can you offer me some key strategy to win the game? " 36 | messages.append(create_message("assistant",temp)) 37 | 38 | temp = f"Sure. At the begining of the game or you are not sure whether you are the spy, you can speak with very general descriptions and use as few words as you can. " 39 | messages.append(create_message("user",temp)) 40 | temp = f"For example, if your word is 'apple', you can say like 'it's this is a very common object' or 'it's a kind of fruit' " 41 | messages.append(create_message("user",temp)) 42 | temp = f"You need to analyze the speech of other players carefully to guess what is the common word and what is the spy word." 
43 | messages.append(create_message("user",temp)) 44 | temp = f"If you are sure that you are a spy, you should try to conceal your identity and confuse others not to vote you." 45 | messages.append(create_message("user",temp)) 46 | temp = f"I understand. " 47 | messages.append(create_message("assistant",temp)) 48 | temp = f"Now you are {self.player_name}, the word you get is {self.phrase}. You don't know the word of other players. " 49 | messages.append(create_message("user",temp)) 50 | temp = f"Recieved. " 51 | messages.append(create_message("assistant",temp)) 52 | 53 | temp = ( 54 | f'''Your reply should be a string in the json format as follows:\n''' 55 | '''{"thought":{your though},"speak":{your speak}}\n ''' 56 | f''' "thought" represent your thinking, which can be seen only by your self. \n''' 57 | f''' "speak" represent your speak in this round, which can been seen by all the other players. \n''' 58 | ) 59 | messages.append(create_message("user",temp)) 60 | temp = ( 61 | '''Your speak should only contain the few words about the word you received, you should not speak like 'i agree with {player_name}' or other thing irrelevant to the word you received. ''' 62 | ) 63 | messages.append(create_message("user",temp)) 64 | temp = f"I understand. I will reply with a json string, and i will not repeat other players' speak or my own speak in the previous round. " 65 | messages.append(create_message("assistant",temp)) 66 | 67 | return messages 68 | 69 | 70 | def get_vote_messages(self): 71 | messages = [] 72 | messages.append(create_message("system",self.game_prompt)) 73 | temp = f"Now i have read the rules. But i still need some strategies to better win the game." 74 | messages.append(create_message("assistant",temp)) 75 | 76 | temp = f"The voting stage is very important. So i can give some experience in the voting stage. " 77 | messages.append(create_message("user",temp)) 78 | temp = f"Great, i will learn from the experience to better win the game. 
" 79 | messages.append(create_message("assistant",temp)) 80 | temp = f"First you should carefully think of the word each players get according to their descriptions. The player whose description is far from the others can be the spy." 81 | messages.append(create_message("user",temp)) 82 | temp = "You need to constantly guess possible common words and spy words during the game process." 83 | messages.append(create_message("user",temp)) 84 | temp = f"If you are sure that your word is the spy word, you should think of how to prevent being voted. You should try to confuse other players to hide your identity." 85 | temp += f"If you think you are not the spy, you should think who might be a spy. And you should encourage other players to vote for the spy in your speech if you have a specific suspicion target to be the spy." 86 | messages.append(create_message("user",temp)) 87 | temp = f"I understand. " 88 | messages.append(create_message("assistant",temp)) 89 | 90 | temp = f"Now you are {self.player_name}, the word you get is {self.phrase}." 91 | messages.append(create_message("user",temp)) 92 | 93 | temp = f"Recieved. " 94 | messages.append(create_message("assistant",temp)) 95 | temp = ( 96 | f'''In the voting stage, your reply should be a string in the json format as follows:\n''' 97 | '''{"thought":{your though},"speak":{your speak},"name":{voted name}} \n ''' 98 | f''' "thought" represent your thinking, which can be seen only by your self. \n''' 99 | f''' "speak" represent your speak in the game, which can be seen for all the players. \n''' 100 | f''' "name" can be only select be the living players. \n ''' 101 | ) 102 | messages.append(create_message("user",temp)) 103 | temp = f"I understand. I will reply with a json string, and i will not repeat other players' speak or my own speak of the previous round. 
" 104 | messages.append(create_message("assistant",temp)) 105 | return messages 106 | 107 | def describe(self): 108 | messages = self.role_messages + self.private_history 109 | messages.append(create_message("system","Remember, you must reply a json string as required. And you must not repeat the statements of other players and your own past statement.")) 110 | speak = None # stays None when the reply is missing or is not valid JSON 111 | if self.chatbot.name == "td003": 112 | text_prompt = self.convert_messages_to_prompt(messages) 113 | res = self.chatbot.single_chat(text_prompt) 114 | else: 115 | res = self.chatbot.multi_chat(messages) 116 | 117 | try: 118 | res = json.loads(res) 119 | thought = res["thought"] 120 | speak = res["speak"] 121 | except (TypeError, ValueError, KeyError): # res is None, not valid JSON, or lacks the expected keys 122 | pass 123 | 124 | return speak, res 125 | 126 | def vote(self): 127 | messages = self.vote_messages + self.private_history 128 | messages.append(create_message("system","Remember, you must reply a json string as required, and the 'speak' must not repeat with the statements of other players or your own past statement. The 'name' must be the same string chosen from the given list 'living players'. 
")) 129 | if self.chatbot.name == "td003": 130 | text_prompt = self.convert_messages_to_prompt(messages) 131 | res = self.chatbot.single_chat(text_prompt) 132 | else: 133 | res = self.chatbot.multi_chat(messages) 134 | thought = None 135 | speak = None 136 | name = None 137 | try: 138 | res = json.loads(res) 139 | though = res["thought"] 140 | speak = res["speak"] 141 | name = res["name"] 142 | except: 143 | pass 144 | 145 | return name, speak, res 146 | 147 | def convert_messages_to_prompt(self, messages): 148 | prompt = "" 149 | for message in messages: 150 | if message["role"] == "system": 151 | prompt += "system: " 152 | prompt += message["content"] 153 | prompt += "\n" 154 | elif message["role"] == "assistant": 155 | prompt += f"{self.player_name}: " 156 | prompt += message["content"] 157 | prompt += "\n" 158 | else: 159 | prompt += message["content"] 160 | prompt += "\n" 161 | prompt += f"{self.player_name}: " 162 | return prompt -------------------------------------------------------------------------------- /spyfall/compute_adversarial.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | with open("/zecheng/qiaodan/spyfall/labels.txt",'r') as f: 5 | data = f.readlines() 6 | 7 | 8 | logs_dir = "/zecheng/qiaodan/spyfall/gpt3_gpt4" 9 | out_path = "/zecheng/qiaodan/spyfall/result/gpt3_gpt4.json" 10 | labels = [] 11 | for item in data: 12 | labels.append(item.strip().split(",")) 13 | 14 | def compute_adversarial(lines): 15 | cnt_spy = 0 16 | cnt_villager = 0 17 | round_avg = 0 18 | for line in lines: 19 | item = json.loads(line) 20 | winer = item["winer"] 21 | if winer == "exit": 22 | continue 23 | if winer == "spy": 24 | cnt_spy += 1 25 | else: 26 | cnt_villager += 1 27 | round_avg += item["round"] 28 | 29 | round_avg /= cnt_spy + cnt_villager 30 | 31 | return {"cnt_spy":cnt_spy,"cnt_villager":cnt_villager,"round_avg":round_avg,"rate":cnt_spy/(cnt_spy + cnt_villager)} 32 | res_dict = {} 33 | 
for label in labels: 34 | label_name = f"{label[0]}&{label[1]}" 35 | dir_name = os.path.join(logs_dir,label_name) 36 | with open(f"{dir_name}/res.json", 'r') as f: 37 | lines = f.readlines() 38 | res = compute_adversarial(lines) 39 | res_dict[label_name] = res 40 | 41 | avg = {"round_avg":0,"rate":0} 42 | 43 | for key,value in res_dict.items(): 44 | avg["round_avg"] += value["round_avg"] 45 | avg["rate"] += value["rate"] 46 | # res_dict holds only the per-label entries here ("avg" is added below), so divide by its full length 47 | avg["round_avg"] /= len(res_dict) 48 | avg["rate"] /= len(res_dict) 49 | res_dict["avg"] = avg 50 | 51 | with open(out_path,'w') as f: 52 | json.dump(res_dict,f,indent=1) 53 | -------------------------------------------------------------------------------- /spyfall/labels.txt: -------------------------------------------------------------------------------- 1 | ipad,iphone 2 | guitar,lute 3 | BMW,BENZ 4 | eyebrow,beard 5 | Grape,Raisins 6 | sea_lion,seal 7 | spider_man,batman 8 | nike,adidas 9 | milk,soy_milk 10 | motorcycle,electromobile 11 | tiger,lion 12 | -------------------------------------------------------------------------------- /spyfall/utils/prompt.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | game_prompt_en = ''' 4 | Now you are playing the game called "who is the spy". 5 | 6 | There are six players in this game. 7 | At the beginning of the game, everyone will receive a word. 8 | There is one spy who will receive a spy word, while others will receive a common word. 9 | Spy word is different but relevant to common words. For example, the spy word can be "apple", and the common word is "banana". 10 | 11 | There are two stages in each round of the game. 12 | 13 | The first stage is describing stage: 14 | Everyone needs to use a word or a few words to say something in turn about the word he received without directly saying the word. 15 | The funniest part of the game is that since you do not know other's words, you are not sure whether you are the spy. 
16 | So, you can only infer who have the different based on other players description. 17 | 18 | The second stage is the voting stage: 19 | After everyone speaks, each player must vote for a player who he thinks is the spy and tell others why. Players can not repeat other players' speeches or their own speeches in the previous round. The player getting the most votes will be eliminated from the game. 20 | 21 | If the spy is eliminated, the rest of the players win. 22 | If there are less than three players, and the spy still lives, then the spy wins. 23 | 24 | ''' 25 | 26 | -------------------------------------------------------------------------------- /spyfall/utils/utils.py: -------------------------------------------------------------------------------- 1 | from chat.gpt3_chat import GPT3 2 | from chat.gpt4_chat import GPT4 3 | from chat.text003_chat import Text003 4 | 5 | def get_model(model_name): 6 | model = None 7 | if model_name == "gpt3": 8 | model = GPT3() 9 | if model_name == "gpt4": 10 | model = GPT4() 11 | if model_name == "td003": 12 | model = Text003() 13 | return model 14 | 15 | def create_message(role,content): 16 | return {"role":role,"content":content} 17 | 18 | def print_messages(messages): 19 | for message in messages: 20 | print(message) 21 | 22 | -------------------------------------------------------------------------------- /tofukingdom/.gitignore: -------------------------------------------------------------------------------- 1 | utils/key.py 2 | -------------------------------------------------------------------------------- /tofukingdom/agents/__init__.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from tofukingdom.agents.guard_agent import GuardAgent 3 | from tofukingdom.agents.maid_agent import MaidAgent 4 | from tofukingdom.agents.chef_agent import ChefAgent 5 | from tofukingdom.agents.minister_agent import MinisterAgent 6 | from tofukingdom.agents.prince_agent import 
PrinceAgent 7 | from tofukingdom.agents.princess_agent import PrincessAgent 8 | from tofukingdom.agents.queen_agent import QueenAgent 9 | from tofukingdom.agents.spy_agent import SpyAgent -------------------------------------------------------------------------------- /tofukingdom/agents/base_agent.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.utils.prompt import game_prompt_en 2 | from tofukingdom.utils.utils import create_message,print_messages 3 | import json 4 | 5 | class BaseAgent: 6 | def __init__(self,chatbot,player_name,all_players) -> None: 7 | self.game_prompt = game_prompt_en 8 | self.chatbot = chatbot 9 | self.role = None 10 | self.role_prompt = self.get_role_prompt() 11 | self.player_name = player_name 12 | self.all_players = all_players 13 | self.private_history = [] 14 | 15 | def get_role_prompt(self): 16 | role_prompt = ''' 17 | You now need to play the role of the Maid. 18 | For the Prince's question, you can choose to say the truth or lie. 19 | 20 | ''' 21 | return role_prompt 22 | 23 | def chat(self,identities): 24 | 25 | role_prompt = (f"{self.game_prompt} \n" 26 | f"Now, you are player {self.player_name} " 27 | f"{self.role_prompt} \n" 28 | f"This is the identity information of other players: {identities} \n " 29 | ) 30 | last_prompt = (f'''Your reply must be a JSON string in the following format: \n''' 31 | '''{"thought":{your thought},"answer":"your answer"} \n''' 32 | f''' 'thought' represent your thought of how to answer the question according to the rule and your goal. ''' 33 | f''' 'answer' represent your reply to the Prince. 
''') 34 | messages = [] 35 | first_message = create_message("system",role_prompt) 36 | messages.append(first_message) 37 | messages += self.private_history 38 | last_message = create_message("system",last_prompt) 39 | messages.append(last_message) 40 | 41 | cnt = 0 42 | while True: 43 | try: 44 | if self.chatbot.name == "td003": 45 | prompt = self.convert_messages_to_prompt(messages) 46 | res = self.chatbot.single_chat(prompt) 47 | else: 48 | res = self.chatbot.multi_chat(messages) 49 | res = json.loads(res) 50 | break 51 | except: 52 | cnt += 1 53 | if cnt >= 3: 54 | return None, None 55 | 56 | answer = res["answer"] 57 | return answer, res 58 | 59 | def convert_messages_to_prompt(self, messages): 60 | prompt = "" 61 | for message in messages: 62 | if message["role"] == "system": 63 | prompt += "system: " 64 | prompt += message["content"] 65 | prompt += "\n" 66 | elif message["role"] == "assistant": 67 | prompt += f"{self.player_name}: " 68 | prompt += message["content"] 69 | prompt += "\n" 70 | else: 71 | prompt += message["content"] 72 | prompt += "\n" 73 | prompt += f"{self.player_name}: " 74 | return prompt 75 | -------------------------------------------------------------------------------- /tofukingdom/agents/chef_agent.py: -------------------------------------------------------------------------------- 1 | 2 | from tofukingdom.utils.prompt import game_prompt_en 3 | from tofukingdom.agents.base_agent import BaseAgent 4 | 5 | class ChefAgent(BaseAgent): 6 | def __init__(self,chatbot,name,all_players) -> None: 7 | super().__init__(chatbot,name,all_players) 8 | self.role = "Chef" 9 | 10 | def get_role_prompt(self): 11 | role_prompt = '''You now need to play the role of the Chef. 12 | For the Prince's question, you can only speak the truth. 13 | The Princess is your teammate. 
14 | You goal is to help the Prince correctly choose the princess.''' 15 | return role_prompt 16 | -------------------------------------------------------------------------------- /tofukingdom/agents/guard_agent.py: -------------------------------------------------------------------------------- 1 | 2 | from tofukingdom.utils.prompt import game_prompt_en 3 | from tofukingdom.agents.base_agent import BaseAgent 4 | class GuardAgent(BaseAgent): 5 | def __init__(self,chatbot,name,all_players) -> None: 6 | super().__init__(chatbot,name,all_players) 7 | self.role = "Guard" 8 | 9 | def get_role_prompt(self): 10 | role_prompt = ''' 11 | You now need to play the role of the Guard. 12 | For the Prince's question, you can only speak the lie, you are not allowed to speak the truth. 13 | The Queen and the Minister is your teammate. 14 | You goal is to mislead the Prince to choose the Queen as the Princess. 15 | ''' 16 | return role_prompt 17 | -------------------------------------------------------------------------------- /tofukingdom/agents/maid_agent.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.utils.prompt import game_prompt_en 2 | from tofukingdom.agents.base_agent import BaseAgent 3 | class MaidAgent(BaseAgent): 4 | def __init__(self,chatbot,name,all_players) -> None: 5 | super().__init__(chatbot,name,all_players) 6 | self.role = "Maid" 7 | 8 | def get_role_prompt(self): 9 | role_prompt = ''' 10 | You now need to play the role of the Maid. 11 | For the Prince's question, you can choose to say the truth or lie. 12 | The Spy is your teammate. 13 | You goal is to mislead the Prince to choose other players except the Princess and the Queen. 
14 | ''' 15 | return role_prompt 16 | -------------------------------------------------------------------------------- /tofukingdom/agents/minister_agent.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.utils.prompt import game_prompt_en 2 | from tofukingdom.agents.base_agent import BaseAgent 3 | class MinisterAgent(BaseAgent): 4 | def __init__(self,chatbot,name,all_players) -> None: 5 | super().__init__(chatbot,name,all_players) 6 | self.role = "Minister" 7 | 8 | def get_role_prompt(self): 9 | role_prompt = ''' 10 | You now need to play the role of the Minister. 11 | For the Prince's question, you can only speak the lie, you are not allowed to speak the truth. 12 | The Queen and the Guard is your teammate. 13 | Your goal is to mislead the Prince to choose the Queen as the Princess. 14 | ''' 15 | return role_prompt 16 | 17 | -------------------------------------------------------------------------------- /tofukingdom/agents/prince_agent.py: -------------------------------------------------------------------------------- 1 | 2 | from tofukingdom.utils.prompt import game_prompt_en 3 | import json 4 | from tofukingdom.utils.utils import create_message,print_messages 5 | class PrinceAgent: 6 | def __init__(self,chatbot,all_players) -> None: 7 | self.game_prompt = game_prompt_en 8 | self.chatbot = chatbot 9 | self.name = "Prince" 10 | self.role_prompt = self.get_role_prompt() 11 | self.ask_prompt = self.get_ask_prompt() 12 | self.ask_choose_prompt = self.get_ask_choose_prompt() 13 | self.choose_prompt = self.get_choose_prompt() 14 | 15 | self.all_players = all_players 16 | self.private_history = [] 17 | 18 | def get_role_prompt(self): 19 | prompt = (f"You need to play the role of the Prince. \n" 20 | f"Your goal is to find the true Princess. \n" 21 | "You can ask each player one question. " 22 | ''' 23 | Your question can only be strictly chosen from the three question below: 24 | 1. Which player is the Princess? 
25 | 2. What is your identity? 26 | 3. What is the identity of {player_name} ? 27 | ''' 28 | f"Remember some player could lie to you. " 29 | f"So you need to carefully consider each player's response and think whether or not you should trust it. " 30 | f"You should not always ask the same question, you should try choosing different questions to verify who is the Princess." 31 | ) 32 | 33 | return prompt 34 | 35 | def get_ask_prompt(self): 36 | prompt = ( 37 | f"Your reply must be a JSON string in the following format: \n" 38 | '''{"thought":{your though},"question":{your question}}\n''' 39 | f"'thought' represent your you thinking of which question you want to ask and why. \n" 40 | f"'question' represent your question.\n" 41 | ) 42 | return prompt 43 | 44 | def get_ask_choose_prompt(self): 45 | prompt = ( 46 | f"Your reply must be in the json format as below:\n " 47 | '''{"thought":{your thought},"name":{player_name},"question":{your question}}\n''' 48 | f"'thought' represent your thinking of which player and which question you should ask to help you find the true Princess. \n" 49 | f"'name' should be the name of the player you choose to ask. \n" 50 | f"'question' is the question you want to ask, which should be chosen from the three questions above.\n" 51 | ) 52 | return prompt 53 | 54 | def get_choose_prompt(self): 55 | prompt = ( 56 | f"Your reply must be a single JSON string without any extra characters in the following format: \n" 57 | '''{"thought":{your thought},"name":{player_name}}\n''' 58 | f"'thought' represent you analysis according to your question and the response. \n " 59 | f"'name' should be the name of the player that you think is the Princess. 
\n" 60 | f"'name' must be chosen from names of the players \n" 61 | ) 62 | return prompt 63 | 64 | def ask(self): 65 | messages = [] 66 | game_message = create_message("system",self.game_prompt) 67 | messages.append(game_message) 68 | role_message = create_message("system",self.role_prompt) 69 | messages.append(role_message) 70 | messages += self.private_history 71 | last_message = create_message("system",self.ask_prompt) 72 | messages.append(last_message) 73 | cnt = 0 74 | 75 | while True: 76 | try: 77 | if self.chatbot.name == "td003": 78 | prompt = self.convert_messages_to_prompt(messages) 79 | res = self.chatbot.single_chat(prompt) 80 | else: 81 | res = self.chatbot.multi_chat(messages) 82 | res = json.loads(res) 83 | break 84 | except: 85 | cnt += 1 86 | if cnt >= 3: 87 | return None, None 88 | 89 | question = res["question"] 90 | return question, res 91 | 92 | def ask_choose(self): 93 | messages = [] 94 | first_message = create_message("system",self.role_prompt) 95 | messages.append(first_message) 96 | messages += self.private_history 97 | last_message = create_message("system",self.ask_choose_prompt) 98 | messages.append(last_message) 99 | 100 | cnt = 0 101 | while True: 102 | try: 103 | if self.chatbot.name == "td003": 104 | prompt = self.convert_messages_to_prompt(messages) 105 | res = self.chatbot.single_chat(prompt) 106 | else: 107 | res = self.chatbot.multi_chat(messages) 108 | res = json.loads(res) 109 | break 110 | except: 111 | cnt += 1 112 | if cnt >= 3: 113 | return None, None, None 114 | 115 | question = res["question"] 116 | name = res["name"] 117 | return name, question, res 118 | 119 | def choose(self): 120 | messages = [] 121 | first_message = create_message("system",self.role_prompt) 122 | messages.append(first_message) 123 | messages += self.private_history 124 | last_message = create_message("system",self.choose_prompt) 125 | messages.append(last_message) 126 | 127 | cnt = 0 128 | while True: 129 | try: 130 | if self.chatbot.name == 
"td003": 131 | prompt = self.convert_messages_to_prompt(messages) 132 | res = self.chatbot.single_chat(prompt) 133 | else: 134 | res = self.chatbot.multi_chat(messages) 135 | res = json.loads(res) 136 | break 137 | except: 138 | cnt += 1 139 | if cnt >= 3: 140 | return None, None 141 | name = res["name"] 142 | if name not in self.all_players: 143 | return None, None 144 | return name, res 145 | 146 | def convert_messages_to_prompt(self, messages): 147 | prompt = "" 148 | for message in messages: 149 | if message["role"] == "system": 150 | prompt += "system: " 151 | prompt += message["content"] 152 | prompt += "\n" 153 | elif message["role"] == "assistant": 154 | prompt += f"{self.player_name}: " 155 | prompt += message["content"] 156 | prompt += "\n" 157 | else: 158 | prompt += message["content"] 159 | prompt += "\n" 160 | prompt += f"{self.player_name}: " 161 | return prompt 162 | -------------------------------------------------------------------------------- /tofukingdom/agents/princess_agent.py: -------------------------------------------------------------------------------- 1 | 2 | from tofukingdom.utils.prompt import game_prompt_en 3 | from tofukingdom.agents.base_agent import BaseAgent 4 | 5 | class PrincessAgent(BaseAgent): 6 | def __init__(self,chatbot,name,all_players) -> None: 7 | super().__init__(chatbot,name,all_players) 8 | self.role = "Princess" 9 | 10 | def get_role_prompt(self): 11 | role_prompt = ''' 12 | You now need to play the role of the Princess. 13 | For the Prince's question, you can only speak the truth. 14 | The Chef is your teammate. 15 | You goal is to help the Prince correctly choose the princess. 
16 | ''' 17 | return role_prompt 18 | 19 | -------------------------------------------------------------------------------- /tofukingdom/agents/queen_agent.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.utils.prompt import game_prompt_en 2 | from tofukingdom.agents.base_agent import BaseAgent 3 | 4 | class QueenAgent(BaseAgent): 5 | def __init__(self,chatbot,name,all_players) -> None: 6 | super().__init__(chatbot,name,all_players) 7 | self.role = "Queen" 8 | 9 | def get_role_prompt(self): 10 | role_prompt = ''' 11 | You now need to play the role of the Queen. 12 | For the Prince's question, you can only speak the lie, you are not allowed to speak the truth. 13 | The Minister and the Guard is your teammate. 14 | You goal is to mislead the Prince to choose you as the Princess. 15 | ''' 16 | return role_prompt 17 | 18 | -------------------------------------------------------------------------------- /tofukingdom/agents/spy_agent.py: -------------------------------------------------------------------------------- 1 | from tofukingdom.utils.prompt import game_prompt_en 2 | from tofukingdom.agents.base_agent import BaseAgent 3 | 4 | class SpyAgent(BaseAgent): 5 | def __init__(self,chatbot,name,all_players) -> None: 6 | super().__init__(chatbot,name,all_players) 7 | self.role = "Spy" 8 | 9 | def get_role_prompt(self): 10 | role_prompt = ''' 11 | You now need to play the role of the Spy. 12 | For the Prince's question, you can choose to say the truth or lie. 13 | The Maid is your teammate. 14 | You goal is to mislead the Prince to choose other players except the Princess and the Queen. 
15 | ''' 16 | return role_prompt 17 | -------------------------------------------------------------------------------- /tofukingdom/compute.py: -------------------------------------------------------------------------------- 1 | import json 2 | file_path = "/workspace/qiaodan/tofuking/logs/res.json" 3 | 4 | 5 | with open(file_path,'r') as f: 6 | data = f.readlines() 7 | 8 | res = {"gpt3":0, "gpt4":0} 9 | 10 | for line in data[:100]: 11 | item = json.loads(line) 12 | llms = item["llms"] 13 | score = item["score"] 14 | for i in range(len(llms)): 15 | llm = llms[i] 16 | res[llm] += score[i] 17 | 18 | res["gpt3"] /= 4 19 | res["gpt4"] /= 3 20 | print(res) 21 | 22 | -------------------------------------------------------------------------------- /tofukingdom/utils/prompt.py: -------------------------------------------------------------------------------- 1 | game_prompt_en = ''' 2 | Here is a game called "Tofu Kingdom". 3 | Translation into English: There is a game called "Tofu Kingdom." 4 | 5 | In this game, there are 8 roles of Prince, Princess, Queen, Minister, Chef, Guard, Maid, and Spy. 6 | There is a predetermined Prince. 7 | Other players play one of the remaining roles. 8 | 9 | In the game, the Prince doesn't know the identities of the other players. 10 | The Prince needs to guess which player is the Princess. 11 | But the players except the Prince know the identity of each other. 12 | If the Prince correctly chooses the player who is the Princess, both the Chef and the Princess win. 13 | If the Prince chooses a player who is the Queen, then the Queen, Minister, and Guard win. 14 | If the Prince chooses a player who is any other role except the Princess and Queen, then the Maid and Spy win. 15 | 16 | During the game, the Prince can ask questions to the other players, the question can only be chosen from the three question below: 17 | 1.Which player is the Princess? 18 | 2.What is your identity? 19 | 3.What is the identity of {player_name} ? 
20 | 21 | {player_name} must be chosen from the names of the given players. 22 | 23 | For question1, player must answer like: {player_name} is the Princess. 24 | For question2, player must answer like: My identity is {role_name}. 25 | For question3, player must answer like: the identity of {player_name} is {role_name}. 26 | 27 | At the end of the game, the Prince can choose a player to ask one more question, and the question options remain the same. 28 | For the Prince's inquiries, the Princess and Chef must tell the truth; the Queen, Minister, and Guard must lie; the Maid and Spy can either tell the truth or lie. 29 | Finally, the Prince should guess who is the Princess according to the questions and answers. 30 | ''' 31 | 32 | 33 | game_prompt_zh = ''' 34 | 这有一个叫做 “豆腐王国的游戏”。 35 | 在这个游戏中一共有8个玩家，分别扮演王子，公主，女王，大臣，厨师，侍卫，女仆，间谍。 36 | 这个游戏中，王子不知道其余玩家的身份，王子需要猜测哪一位玩家是公主。 37 | 如果王子选择的玩家身份是公主，则厨师与公主胜利。 38 | 如果王子选择的玩家身份是女王，则女王，大臣，侍卫胜利。 39 | 如果王子选择的玩家身份是除了公主和女王以外的其他，则女仆和间谍胜利。 40 | 游戏中，王子需要询问每个玩家一个问题，被问的玩家需要给出回答这个问题只能从以下三个问题中选择： 41 | 1.豆腐公主是哪位玩家。 42 | 2.你的身份是什么。 43 | 3.某个玩家的身份是什么。 44 | 问题3的某个玩家可以是除了王子以外的任何一名玩家。 45 | 游戏的最后，王子还可以再次挑选一个玩家询问一次问题，问题依旧只能从上面三个问题中选择。 46 | 对于王子的提问，公主，厨师，必须说真话；女王，大臣，侍卫必须说假话；女仆，间谍可以说真话也可以说假话。 47 | ''' -------------------------------------------------------------------------------- /tofukingdom/utils/utils.py: -------------------------------------------------------------------------------- 1 | from chat.gpt3_chat import GPT3 2 | from chat.gpt4_chat import GPT4 3 | from chat.text003_chat import Text003 4 | 5 | def get_model(model_name): 6 | model = None 7 | if model_name == "gpt3": 8 | model = GPT3() 9 | if model_name == "gpt4": 10 | model = GPT4() 11 | if model_name == "td003": 12 | model = Text003() 13 | return model 14 | 15 | def create_message(role,content): 16 | return {"role":role,"content":content} 17 | 18 | def print_messages(messages): 19 | for message in messages: 20 | print(message) 21 | 22 | 
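The `get_model` helper in `utils/utils.py` returns `None` for a name it does not recognize, so a mistyped model name only fails later, when an agent first reads `self.chatbot.name`. A minimal sketch of a stricter variant follows. The three placeholder classes stand in for `chat.gpt3_chat.GPT3`, `chat.gpt4_chat.GPT4` and `chat.text003_chat.Text003` so the example runs on its own; `MODEL_REGISTRY` and the `ValueError` are assumptions made for illustration and are not part of the repository.

```python
# Placeholder classes: stand-ins for the real chat wrappers (assumption).
class GPT3:
    pass


class GPT4:
    pass


class Text003:
    pass


# Hypothetical registry mapping the command-line model name to its class.
MODEL_REGISTRY = {
    "gpt3": GPT3,
    "gpt4": GPT4,
    "td003": Text003,
}


def get_model(model_name):
    """Return a new model instance, or raise if the name is unknown."""
    try:
        model_cls = MODEL_REGISTRY[model_name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"unknown model {model_name!r}; expected one of: {known}"
        ) from None
    return model_cls()
```

With this variant, a call such as `get_model("gpt5")` fails immediately with a message listing the accepted names, instead of handing `None` to an agent.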
--------------------------------------------------------------------------------
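The win conditions and truth-telling constraints stated in `tofukingdom/utils/prompt.py` can be sketched as two lookup tables. This only illustrates the rules as the prompt words them and is not code from the repository; `WINNERS_BY_CHOSEN_ROLE`, `TRUTH_POLICY`, `winning_roles` and `answer_is_allowed` are hypothetical names.

```python
# Roles that win when the Prince picks a player holding the given role.
WINNERS_BY_CHOSEN_ROLE = {
    "Princess": ("Princess", "Chef"),
    "Queen": ("Queen", "Minister", "Guard"),
}
# Picking any other role hands the win to the Maid and the Spy.
DEFAULT_WINNERS = ("Maid", "Spy")

# How each role must answer the Prince's questions.
TRUTH_POLICY = {
    "Princess": "truth",
    "Chef": "truth",
    "Queen": "lie",
    "Minister": "lie",
    "Guard": "lie",
    "Maid": "either",
    "Spy": "either",
}


def winning_roles(chosen_role):
    """Return the roles that win given the role of the chosen player."""
    return WINNERS_BY_CHOSEN_ROLE.get(chosen_role, DEFAULT_WINNERS)


def answer_is_allowed(role, answer_is_true):
    """Check whether a true or false answer is legal for the given role."""
    policy = TRUTH_POLICY[role]
    if policy == "truth":
        return answer_is_true
    if policy == "lie":
        return not answer_is_true
    return True  # "either": the Maid and the Spy may say anything
```

For example, `winning_roles("Guard")` yields the Maid and Spy team, because the Guard is neither the Princess nor the Queen, and `answer_is_allowed("Queen", True)` is `False` because the Queen must lie.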