├── .gitignore
├── LICENSE
├── README.md
├── code
    ├── ChatHaruhi
    │   ├── BaiChuan2GPT.py
    │   ├── BaiChuanAPIGPT.py
    │   ├── BaseDB.py
    │   ├── BaseLLM.py
    │   ├── ChatGLM2GPT.py
    │   ├── ChatHaruhi.py
    │   ├── ChatHaruhi_safe.py
    │   ├── ChromaDB.py
    │   ├── ErnieGPT.py
    │   ├── FooLLM.py
    │   ├── GLMPro.py
    │   ├── LangChainGPT.py
    │   ├── Mixtral.py
    │   ├── NaiveDB.py
    │   ├── PrintLLM.py
    │   ├── Qwen118k2GPT.py
    │   ├── SparkApi.py
    │   ├── SparkGPT.py
    │   ├── __init__.py
    │   ├── mistral.py
    │   ├── phi.py
    │   ├── qwen.py
    │   ├── role_name_to_file.py
    │   └── utils.py
    ├── PDB_character_search.py
    ├── api_16personality.py
    ├── characteLLM.py
    ├── characters.py
    ├── config_template.json
    ├── personality_tests.py
    ├── prompts.py
    ├── run_experiments.py
    ├── test_rpa_methods.py
    └── utils.py
├── data
    ├── characters.json
    ├── characters_cllm.json
    ├── characters_labels.json
    ├── pdb_data
    │   └── 鸠摩智_Jiumozhi.json
    └── questionnaires
    │   ├── 16Personalities.json
    │   ├── BFI.json
    │   ├── BSRI.json
    │   ├── CABIN.json
    │   ├── DTDD.json
    │   ├── ECR-R.json
    │   ├── EIS.json
    │   ├── EPQ-R.json
    │   ├── Empathy.json
    │   ├── GSE.json
    │   ├── ICB.json
    │   ├── LMS.json
    │   ├── LOT-R.json
    │   └── WLEIS.json
├── figures
    ├── bfi_radars.pdf
    ├── bfi_radars.png
    ├── demo1.png
    ├── demo2.png
    ├── demo3.png
    ├── demo4.png
    └── demo5.png
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | config.json
2 | results/
3 | data/collected_annotation/
4 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2024 Neph0s
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | # InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews
  4 | 
  5 | ### News
  6 | 
  7 | - *May. 2025*: If you are interested in more comprehensive character datasets for RPAs, please check out [**CoSER**](https://arxiv.org/abs/2502.09082), our latest work to be presented at ICML 2025.
  8 | - *May. 2024*: InCharacter got accepted to **ACL 2024**! See you in Bangkok.
  9 | 
 10 | ### Setup
 11 | 
 12 | 
 13 | Install necessary dependencies via:
 14 | 
 15 | ```bash
 16 | pip install -r requirements.txt
 17 | ```
 18 | 
 19 | No need to install ChatHaruhi separately; a fixed version of ChatHaruhi is already included in the code/ directory of this repository.
 20 | 
 21 | Enter the code folder.
 22 | ```bash
 23 | cd code
 24 | ```
 25 | 
 26 | Set your OpenAI API key in config.json; see config_template.json for the expected format.
 27 | 
 28 | ### Personality Assessment
 29 | 
 30 | To assess the personality of a specific role-playing agent (RPA), use the following command.
 31 | 
 32 | Run a personality test on a specific character:
 33 | 
 34 | ```bash
 35 | python personality_tests.py --questionnaire_name BFI --character hutao --agent_type ChatHaruhi --agent_llm gpt-3.5 --evaluator_llm gpt-4 --eval_method interview_assess_batch_anonymous
 36 | ```
 37 | 
 38 | Supported choices for eval_method include ['self_report' (SR), 'self_report_cot' (SR-CoT), 'expert_rating' (ER_batch), 'expert_rating_collective' (ER_all), 'option_conversion' (OC), 'dimension_option_conversion' (d-OC)].
 39 | 
 40 | To reproduce our experiments on the 32 RPAs, please refer to code/run_experiments.py
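
As a rough sketch of how the command above could be scripted over several questionnaires, the snippet below only builds the argument lists and prints them; it does not run anything. The flag names are taken from the command above. `build_command` is a hypothetical helper that is not part of the repository, and it is an assumption that the file names under data/questionnaires are valid `--questionnaire_name` values.

```python
import shlex


def build_command(questionnaire, character, eval_method="interview_assess_batch_anonymous"):
    """Return the argv list for one personality_tests.py run (hypothetical helper)."""
    return [
        "python", "personality_tests.py",
        "--questionnaire_name", questionnaire,
        "--character", character,
        "--agent_type", "ChatHaruhi",
        "--agent_llm", "gpt-3.5",
        "--evaluator_llm", "gpt-4",
        "--eval_method", eval_method,
    ]


# Assumption: questionnaire names match the JSON file names in data/questionnaires.
commands = [build_command(q, "hutao") for q in ("BFI", "16Personalities")]

for argv in commands:
    # Print a shell-ready line; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Building argv lists rather than one shell string avoids quoting problems if a character name ever contains spaces.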
 41 | 
 42 | <br/>
 43 | 
 44 | ### BFI Personalities of Selected Characters/RPAs
 45 | 
 46 | <img src='figures/bfi_radars.png' alt=''/>
 47 | 
 48 | Radar chart of BFI personalities of state-of-the-art RPAs (yellow) and the characters (blue). O, C, E, A, and N stand for openness, conscientiousness, extraversion, agreeableness, and neuroticism in the BFI.
 49 | 
 50 | ### Demo
 51 | [[Online Demo](http://182.92.3.33:3350/)]
 52 | 
 53 | 
 54 | #### English:
 55 | 
 56 | Interview Response: 
 57 | 
 58 | <img src='figures/demo1.png' alt=''/>
 59 | 
 60 | Result: 
 61 | 
 62 | <img src='figures/demo2.png' alt=''/>
 63 | 
 64 | Self-report Response:
 65 | 
 66 | (May give options inconsistent with character behaviors)
 67 | 
 68 | <img src='figures/demo5.png' alt=''/>
 69 | 
 70 | #### Chinese:
 71 | 
 72 | Interview Response:
 73 | 
 74 | <img src='figures/demo3.png' alt=''/>
 75 | 
 76 | Result:
 77 | 
 78 | <img src='figures/demo4.png' alt=''/>
 79 | 
 80 | ### PDB character extraction
 81 | 
 82 | **PDB Character Search Script**
 83 | 
 84 | This Python script facilitates automated searching of character profiles on the Personality Database (PDB) website using Selenium and BeautifulSoup. The script is designed to retrieve the ID of a given character by searching the website and extracting relevant profile information.
 85 | 
 86 | 
 87 | #### How to Use:
 88 | 1. Install the necessary dependencies using the following command:
 89 |    ```bash
 90 |    pip install requests beautifulsoup4 msedge-selenium-tools
 91 |    ```
 92 | 2. Ensure you have Microsoft Edge and the corresponding WebDriver (`msedgedriver.exe`) installed.
 93 | 3. Execute the script and input the desired character name. The script will return the ID of the character if found.
 94 | 
 95 | #### Example Usage:
 96 | ```python
 97 | character_id = get_character_id("Tony Stark")
 98 | print(character_id)
 99 | ```
100 | 
101 | 
102 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/BaiChuan2GPT.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from .BaseLLM import BaseLLM
 3 | from transformers import AutoModelForCausalLM, AutoTokenizer
 4 | from transformers.generation.utils import GenerationConfig
 5 | from peft import PeftModel
 6 | 
 7 | tokenizer_BaiChuan = None
 8 | model_BaiChuan = None
 9 | 
10 | def initialize_BaiChuan2LORA():
11 |     global model_BaiChuan, tokenizer_BaiChuan
12 |     
13 |     if model_BaiChuan is None:
14 |         model_BaiChuan = AutoModelForCausalLM.from_pretrained(
15 |             "baichuan-inc/Baichuan2-13B-Chat",
16 |             device_map="auto",
17 |             torch_dtype=torch.bfloat16,
18 |             trust_remote_code=True,
19 |         )
20 |         model_BaiChuan = PeftModel.from_pretrained(
21 |             model_BaiChuan,
22 |             "silk-road/Chat-Haruhi-Fusion_Baichuan2_13B"
23 |         )
24 |         model_BaiChuan.generation_config = GenerationConfig.from_pretrained(
25 |             "baichuan-inc/Baichuan2-13B-Chat"
26 |         )
27 |     
28 |     if tokenizer_BaiChuan is None:
29 |         tokenizer_BaiChuan =  AutoTokenizer.from_pretrained(
30 |             "baichuan-inc/Baichuan2-13B-Chat", 
31 |             use_fast=True, 
32 |             trust_remote_code=True
33 |         )
34 |     
35 |     return model_BaiChuan, tokenizer_BaiChuan
36 | 
37 | def BaiChuan_tokenizer(text):
38 |     return len(tokenizer_BaiChuan.encode(text))
39 | 
40 | class BaiChuan2GPT(BaseLLM):
41 |     def __init__(self, model = "haruhi-fusion-baichuan"):
42 |         super(BaiChuan2GPT, self).__init__()
43 |         if model == "baichuan2-13b":
44 |             self.tokenizer = AutoTokenizer.from_pretrained(
45 |                 "baichuan-inc/Baichuan2-13B-Chat", 
46 |                 use_fast=True, 
47 |                 trust_remote_code=True
48 |             )  # no trailing comma: it would turn self.tokenizer into a 1-tuple
49 |             self.model = AutoModelForCausalLM.from_pretrained(
50 |                 "baichuan-inc/Baichuan2-13B-Chat",
51 |                 device_map="auto",
52 |                 torch_dtype=torch.bfloat16,
53 |                 trust_remote_code=True,
54 |             )
55 |             self.model.generation_config = GenerationConfig.from_pretrained(
56 |                 "baichuan-inc/Baichuan2-13B-Chat"
57 |             )
58 |         elif model == "haruhi-fusion-baichuan":
59 |             self.model, self.tokenizer = initialize_BaiChuan2LORA()
60 |         else:
61 |             raise Exception("Unknown BaiChuan Model! Currently supported: [baichuan2-13b, haruhi-fusion-baichuan]")
62 |         self.messages = []
63 | 
64 |     def initialize_message(self):
65 |         self.messages = []
66 | 
67 |     def ai_message(self, payload):
68 |         self.messages.append({"role": "assistant", "content": payload})
69 | 
70 |     def system_message(self, payload):
71 |         self.messages.append({"role": "system", "content": payload})
72 | 
73 |     def user_message(self, payload):
74 |         self.messages.append({"role": "user", "content": payload})
75 | 
76 |     def get_response(self):
77 |         with torch.no_grad():
78 |             response = self.model.chat(self.tokenizer, self.messages)
79 |         return response
80 |         
81 |     def print_prompt(self):
82 |         print(type(self.messages))
83 |         print(self.messages)
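
The module-level `model_BaiChuan` / `tokenizer_BaiChuan` globals above implement lazy, load-once initialization. A minimal sketch of that pattern follows, with a cheap stand-in for the real `from_pretrained` calls so it runs without a GPU; `_expensive_load`, `initialize` and `load_count` are hypothetical names used only for illustration.

```python
_model = None
_tokenizer = None
load_count = 0  # only here to show how often the expensive load runs


def _expensive_load():
    """Stand-in for the from_pretrained(...) calls; returns placeholder objects."""
    global load_count
    load_count += 1
    return object(), str.split  # placeholder "model" and "tokenizer"


def initialize():
    """Load the shared model/tokenizer on first use and reuse them afterwards."""
    global _model, _tokenizer
    if _model is None or _tokenizer is None:
        _model, _tokenizer = _expensive_load()
    return _model, _tokenizer


first = initialize()
second = initialize()
print(first[0] is second[0], load_count)
```

Every instance that goes through `initialize()` shares the same objects, which matters when the weights occupy most of the available GPU memory.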


--------------------------------------------------------------------------------
/code/ChatHaruhi/BaiChuanAPIGPT.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import json
  3 | import time
  4 | import hashlib
  5 | import requests
  6 | import copy 
  7 | 
  8 | from .BaseLLM import BaseLLM
  9 | 
 10 | BAICHUAN_API_AK = os.getenv("BAICHUAN_API_AK")
 11 | BAICHUAN_API_SK = os.getenv("BAICHUAN_API_SK")
 12 | 
 13 | def sign(secret_key, data, time_stamp):
 14 |     json_data = json.dumps(data)
 15 |     # time_stamp is supplied by the caller so it matches the X-BC-Timestamp header
 16 |     input_string = secret_key + json_data + str(time_stamp)
 17 |     md5 = hashlib.md5()
 18 |     md5.update(input_string.encode('utf-8'))
 19 |     encrypted = md5.hexdigest()
 20 |     return encrypted
 21 | 
 22 | def do_request(messages, api_key, secret_key):
 23 |     url = "https://api.baichuan-ai.com/v1/chat"
 24 | 
 25 |     data = {
 26 |         "model": "Baichuan2-53B",
 27 |         "messages": messages
 28 |     }
 29 |     time_stamp = int(time.time())
 30 |     signature = sign(secret_key, data, time_stamp)
 31 | 
 32 |     headers = {
 33 |         "Content-Type": "application/json",
 34 |         "Authorization": "Bearer " + api_key,
 35 |         "X-BC-Request-Id": "your requestId",
 36 |         "X-BC-Timestamp": str(time_stamp),
 37 |         "X-BC-Signature": signature,
 38 |         "X-BC-Sign-Algo": "MD5",
 39 |     }
 40 | 
 41 |     response = requests.post(url, data=json.dumps(data), headers=headers)
 42 |     if response.status_code == 200:
 43 |         return response.json()
 44 |     else:
 45 |         return None
 46 | 
 47 | class BaiChuanAPIGPT(BaseLLM):
 48 |     def __init__(self, model="baichuan-api", api_key=None, secret_key=None, verbose=False, if_trick = True):
 49 |         self.if_trick = if_trick
 50 |         super(BaiChuanAPIGPT, self).__init__()
 51 |         self.api_key = api_key or BAICHUAN_API_AK
 52 |         self.secret_key = secret_key or BAICHUAN_API_SK
 53 |         self.verbose = verbose
 54 |         self.model_name = model
 55 |         self.messages = []
 56 |         if self.verbose:
 57 |             print('model name, ', self.model_name)
 58 |             if self.api_key is None or self.secret_key is None:
 59 |                 print('Please set BAICHUAN_API_AK and BAICHUAN_API_SK')
 60 | 
 61 |     def initialize_message(self):
 62 |         self.messages = []
 63 | 
 64 | 
 65 |     def ai_message(self, payload):
 66 |         if len(self.messages) == 0:
 67 |             self.user_message("请根据我的要求进行角色扮演:")  # "Please role-play according to my requirements:"
 68 |         elif len(self.messages) % 2 == 1:
 69 |             self.messages.append({"role":"assistant","content":payload})
 70 |         elif len(self.messages)% 2 == 0:
 71 |             self.messages[-1]["content"] += "\n"+ payload
 72 | 
 73 |     def system_message(self, payload):
 74 |         
 75 |         self.messages.append({"role":"user","content":payload}) 
 76 |         
 77 | 
 78 |     def user_message(self, payload):
 79 |         if len(self.messages) % 2 == 0:
 80 |             self.messages.append({"role":"user","content":payload})
 81 |             # self.messages[-1]["content"] += 
 82 |         elif len(self.messages)% 2 == 1:
 83 |             self.messages[-1]["content"] += "\n"+ payload
 84 | 
 85 |     def get_response(self):
 86 |         max_try = 5
 87 |         sleep_interval = 3
 88 |         
 89 |         chat_messages = copy.deepcopy(self.messages)
 90 |         
 91 |         if self.if_trick:
 92 |             lines = chat_messages[-1]["content"].split('\n')
 93 |             lines.insert(-1, '请请模仿上述经典桥段进行回复\n')  # "Please imitate the classic scenes above when replying"
 94 |             chat_messages[-1]["content"] = '\n'.join(lines)
 95 | 
 96 |         for i in range(max_try):
 97 |             response = do_request(chat_messages, self.api_key, self.secret_key)
 98 |             if response is not None:
 99 |                 if self.verbose:
100 |                     print('Get Baichuan API response success')
101 |                 messages = response['data']['messages']
102 |                 if len(messages) > 0:
103 |                     return messages[-1]['content'].strip("\"'")
104 |             else:
105 |                 if self.verbose:
106 |                     print('Get Baichuan API response failed, retrying...')
107 |                 time.sleep(sleep_interval)
108 |             
109 |     def print_prompt(self):
110 |         for message in self.messages:
111 |             print(f"{message['role']}: {message['content']}")
112 |             
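
The request signing used in this file (MD5 over secret key + JSON body + timestamp, sent in the `X-BC-*` headers) can be sketched in isolation as below. The header names come from `do_request`; the `build_headers` helper, the explicit `time_stamp` parameter and the demo keys are assumptions made for illustration. The sketch does not contact the API.

```python
import hashlib
import json
import time


def sign(secret_key, body, time_stamp):
    """MD5 hex digest of secret key + JSON body + timestamp."""
    payload = secret_key + json.dumps(body) + str(time_stamp)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def build_headers(api_key, secret_key, body, time_stamp=None):
    """Build request headers; one timestamp feeds both the signature and the header."""
    if time_stamp is None:
        time_stamp = int(time.time())
    return {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
        "X-BC-Timestamp": str(time_stamp),
        "X-BC-Signature": sign(secret_key, body, time_stamp),
        "X-BC-Sign-Algo": "MD5",
    }


body = {"model": "Baichuan2-53B", "messages": [{"role": "user", "content": "hi"}]}
headers = build_headers("demo-ak", "demo-sk", body, time_stamp=1700000000)
print(headers["X-BC-Signature"])
```

Taking the timestamp once and reusing it keeps the signed value and the `X-BC-Timestamp` header identical even when the call straddles a second boundary.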


--------------------------------------------------------------------------------
/code/ChatHaruhi/BaseDB.py:
--------------------------------------------------------------------------------
 1 | # BaseDB.py
 2 | 
 3 | from abc import ABC, abstractmethod
 4 | 
 5 | class BaseDB(ABC):
 6 | 
 7 |     @abstractmethod
 8 |     def init_db(self):
 9 |         pass
10 |     
11 |     @abstractmethod
12 |     def save(self, file_path):
13 |         pass
14 | 
15 |     @abstractmethod
16 |     def load(self, file_path):
17 |         pass
18 | 
19 |     @abstractmethod
20 |     def search(self, vector, n_results):
21 |         pass
22 | 
23 |     @abstractmethod
24 |     def init_from_docs(self, vectors, documents):
25 |         pass
26 | 
27 |     


--------------------------------------------------------------------------------
/code/ChatHaruhi/BaseLLM.py:
--------------------------------------------------------------------------------
 1 | # ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
 2 | #
 3 | # ChatHaruhi 2.0, built by Cheng Li and Weishi Mi
 4 | #
 5 | # chengli.thu@gmail.com, mws22@mails.tsinghua.edu.cn
 6 | # 
 7 | # Weishi Mi is a second-year graduate student at Tsinghua University, majoring in computer science.
 8 | # Weishi Mi is pursuing a job or a PhD position and will be available next year.
 9 | # 
10 | # homepage https://github.com/LC1332/Chat-Haruhi-Suzumiya
11 | # 
12 | # ChatHaruhi is a chatbot that can revive anime characters in reality.
13 | # the 2.0 version was built by Cheng Li and Weishi Mi.
14 | # 
15 | # Please cite our paper if you use this code for research: 
16 | #
17 | # @misc{li2023chatharuhi,
18 | #       title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
19 | #       author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
20 | #       year={2023},
21 | #       eprint={2308.09597},
22 | #       archivePrefix={arXiv},
23 | #       primaryClass={cs.CL}
24 | # }
25 | from abc import ABC, abstractmethod
26 | 
27 | class BaseLLM(ABC):
28 | 
29 |     def __init__(self):
30 |         pass
31 |     
32 |     @abstractmethod
33 |     def initialize_message(self):
34 |         pass
35 | 
36 |     @abstractmethod    
37 |     def ai_message(self, payload):
38 |         pass
39 | 
40 |     @abstractmethod
41 |     def system_message(self, payload):
42 |         pass
43 | 
44 |     @abstractmethod
45 |     def user_message(self, payload):
46 |         pass
47 | 
48 |     @abstractmethod
49 |     def get_response(self):
50 |         pass
51 | 
52 |     @abstractmethod
53 |     def print_prompt(self):
54 |         pass
55 | 
56 | 
57 | 
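
A new backend only has to fill in the six abstract methods of `BaseLLM`. The sketch below shows a hypothetical offline `EchoLLM` that echoes the last user message, which can be handy for testing prompt assembly without any API key. The interface is copied in condensed form so the block is self-contained; the call order at the bottom mirrors the one used in `ChatHaruhi_safe.chat` (initialize, system message, user message, get response).

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Condensed copy of the interface above so the sketch runs on its own."""

    def __init__(self):
        pass

    @abstractmethod
    def initialize_message(self): ...

    @abstractmethod
    def ai_message(self, payload): ...

    @abstractmethod
    def system_message(self, payload): ...

    @abstractmethod
    def user_message(self, payload): ...

    @abstractmethod
    def get_response(self): ...

    @abstractmethod
    def print_prompt(self): ...


class EchoLLM(BaseLLM):
    """Hypothetical offline backend: replies by echoing the last user message."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def initialize_message(self):
        self.messages = []

    def ai_message(self, payload):
        self.messages.append({"role": "assistant", "content": payload})

    def system_message(self, payload):
        self.messages.append({"role": "system", "content": payload})

    def user_message(self, payload):
        self.messages.append({"role": "user", "content": payload})

    def get_response(self):
        users = [m["content"] for m in self.messages if m["role"] == "user"]
        return f"echo: {users[-1]}" if users else ""

    def print_prompt(self):
        for m in self.messages:
            print(f"{m['role']}: {m['content']}")


llm = EchoLLM()
llm.initialize_message()
llm.system_message("You are Haruhi.")
llm.user_message("Hello?")
reply = llm.get_response()
print(reply)  # echo: Hello?
```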


--------------------------------------------------------------------------------
/code/ChatHaruhi/ChatGLM2GPT.py:
--------------------------------------------------------------------------------
 1 | import torch 
 2 | from .BaseLLM import BaseLLM
 3 | from transformers import AutoTokenizer, AutoModel
 4 | from peft import PeftModel
 5 | 
 6 | tokenizer_GLM = None
 7 | model_GLM = None
 8 | 
 9 | def initialize_GLM2LORA():
10 |     global model_GLM, tokenizer_GLM
11 | 
12 |     if model_GLM is None:
13 |         model_GLM = AutoModel.from_pretrained(
14 |             "THUDM/chatglm2-6b",
15 |             torch_dtype=torch.float16,
16 |             device_map="auto",
17 |             trust_remote_code=True
18 |         )
19 |         model_GLM = PeftModel.from_pretrained(
20 |             model_GLM,
21 |             "silk-road/Chat-Haruhi-Fusion_B"
22 |         )
23 | 
24 |     if tokenizer_GLM is None:
25 |         tokenizer_GLM = AutoTokenizer.from_pretrained(
26 |             "THUDM/chatglm2-6b", 
27 |             use_fast=True,
28 |             trust_remote_code=True
29 |         )
30 | 
31 |     return model_GLM, tokenizer_GLM
32 | 
33 | def GLM_tokenizer(text):
34 |     return len(tokenizer_GLM.encode(text))
35 | 
36 | class ChatGLM2GPT(BaseLLM):
37 |     def __init__(self, model = "haruhi-fusion"):
38 |         super(ChatGLM2GPT, self).__init__()
39 |         if model == "glm2-6b":
40 |             self.tokenizer = AutoTokenizer.from_pretrained(
41 |                 "THUDM/chatglm2-6b", 
42 |                 use_fast=True,
43 |                 trust_remote_code=True
44 |             )
45 |             self.model = AutoModel.from_pretrained(
46 |                 "THUDM/chatglm2-6b",
47 |                 torch_dtype=torch.float16,
48 |                 device_map="auto",
49 |                 trust_remote_code=True
50 |             )
51 |         elif model == "haruhi-fusion":
52 |             self.model, self.tokenizer = initialize_GLM2LORA()
53 |         else:
54 |             raise Exception("Unknown GLM model")
55 |         self.messages = ""
56 | 
57 |     def initialize_message(self):
58 |         self.messages = ""
59 | 
60 |     def ai_message(self, payload):
61 |         self.messages = self.messages + "\n " + payload 
62 | 
63 |     def system_message(self, payload):
64 |         self.messages = self.messages + "\n " + payload 
65 | 
66 |     def user_message(self, payload):
67 |         self.messages = self.messages + "\n " + payload 
68 | 
69 |     def get_response(self):
70 |         with torch.no_grad():
71 |             response, history = self.model.chat(self.tokenizer, self.messages, history=[])
72 |             # print(response)
73 |         return response
74 |         
75 |     def print_prompt(self):
76 |         print(type(self.messages))
77 |         print(self.messages)
78 | 
79 |     


--------------------------------------------------------------------------------
/code/ChatHaruhi/ChatHaruhi_safe.py:
--------------------------------------------------------------------------------
  1 | from .ChromaDB import ChromaDB
  2 | import os
  3 | 
  4 | from .utils import luotuo_openai_embedding, tiktokenizer
  5 | 
  6 | from .utils import response_postprocess
  7 | 
  8 | from .utils import text_censor
  9 | 
 10 | class ChatHaruhi_safe:
 11 | 
 12 |     def __init__(self, system_prompt = None, \
 13 |                  role_name = None, role_from_hf = None, \
 14 |                  story_db=None, story_text_folder = None, \
 15 |                  llm = 'openai', \
 16 |                  embedding = 'luotuo_openai', \
 17 |                  max_len_story = None, max_len_history = None,
 18 |                  verbose = False):
 19 |         super(ChatHaruhi_safe, self).__init__()
 20 |         self.verbose = verbose
 21 | 
 22 |         # constants
 23 |         self.story_prefix_prompt = "Classic scenes for the role are as follows:\n"
 24 |         self.k_search = 19
 25 |         self.narrator = ['旁白', '', 'scene','Scene','narrator' , 'Narrator']
 26 |         self.dialogue_divide_token = '\n###\n'
 27 |         self.dialogue_bra_token = '「'
 28 |         self.dialogue_ket_token = '」'
 29 | 
 30 |         if system_prompt:
 31 |             self.system_prompt = self.check_system_prompt( system_prompt )
 32 | 
 33 |         # TODO: embedding should be defined separately, so refactor this part later
 34 |         if llm == 'openai':
 35 |             # self.llm = LangChainGPT()
 36 |             self.llm, self.tokenizer = self.get_models('openai')
 37 |         elif llm == 'debug':
 38 |             self.llm, self.tokenizer = self.get_models('debug')
 39 |         elif llm == 'spark':
 40 |             self.llm, self.tokenizer = self.get_models('spark')
 41 |         elif llm == 'GLMPro':
 42 |             self.llm, self.tokenizer = self.get_models('GLMPro')
 43 |         elif llm == 'ChatGLM2GPT':
 44 |             self.llm, self.tokenizer = self.get_models('ChatGLM2GPT')
 45 |             self.story_prefix_prompt = '\n'
 46 |         elif llm == "BaiChuan2GPT":
 47 |             self.llm, self.tokenizer = self.get_models('BaiChuan2GPT')
 48 |         elif llm == "BaiChuanAPIGPT":
 49 |             self.llm, self.tokenizer = self.get_models('BaiChuanAPIGPT')
 50 |         elif llm == "ernie3.5":
 51 |             self.llm, self.tokenizer = self.get_models('ernie3.5')
 52 |         elif llm == "ernie4.0":
 53 |             self.llm, self.tokenizer = self.get_models('ernie4.0')
 54 |         else:
 55 |             print(f'warning! undefined llm {llm}, use openai instead.')
 56 |             self.llm, self.tokenizer = self.get_models('openai')
 57 | 
 58 |         if embedding == 'luotuo_openai':
 59 |             self.embedding = luotuo_openai_embedding
 60 |         elif embedding == 'bge_en':
 61 |             from .utils import get_bge_embedding
 62 |             self.embedding = get_bge_embedding
 63 |         elif embedding == 'bge_zh':
 64 |             from .utils import get_bge_zh_embedding
 65 |             self.embedding = get_bge_zh_embedding
 66 |         else:
 67 |             print(f'warning! undefined embedding {embedding}, use luotuo_openai instead.')
 68 |             self.embedding = luotuo_openai_embedding
 69 |         
 70 |         if role_name:
 71 |             # TODO move into a function
 72 |             from .role_name_to_file import get_folder_role_name
 73 |             # correct role_name to folder_role_name
 74 |             role_name, url = get_folder_role_name(role_name)
 75 | 
 76 |             unzip_folder = f'./temp_character_folder/temp_{role_name}'
 77 |             db_folder = os.path.join(unzip_folder, f'content/{role_name}')
 78 |             system_prompt = os.path.join(unzip_folder, f'content/system_prompt.txt')
 79 | 
 80 |             if not os.path.exists(unzip_folder):
 81 |                 # not yet downloaded
 82 |                 # url = f'https://github.com/LC1332/Haruhi-2-Dev/raw/main/data/character_in_zip/{role_name}.zip'
 83 |                 import requests, zipfile, io
 84 |                 r = requests.get(url)
 85 |                 z = zipfile.ZipFile(io.BytesIO(r.content))
 86 |                 z.extractall(unzip_folder)
 87 | 
 88 |             if self.verbose:
 89 |                 print(f'loading pre-defined character {role_name}...')
 90 |             
 91 |             self.db = ChromaDB()
 92 |             self.db.load(db_folder)
 93 |             self.system_prompt = self.check_system_prompt(system_prompt)
 94 |         elif role_from_hf:
 95 |             # TODO move into a function
 96 |             from datasets import load_dataset
 97 | 
 98 |             if role_from_hf.count("/") == 1:
 99 |                 dataset = load_dataset(role_from_hf)
100 |                 datas = dataset["train"]
101 |             elif role_from_hf.count("/") >= 2:
102 |                 split_index = role_from_hf.index('/') 
103 |                 second_split_index = role_from_hf.index('/', split_index+1)
104 |                 dataset_name = role_from_hf[:second_split_index] 
105 |                 split_name = role_from_hf[second_split_index+1:]
106 |                 
107 |                 fname = split_name + '.jsonl'
108 |                 dataset = load_dataset(dataset_name,data_files={'train':fname})
109 |                 datas = dataset["train"]
110 | 
111 | 
112 |             from .utils import base64_to_float_array
113 |             
114 |             if embedding == 'luotuo_openai':
115 |                 embed_name = 'luotuo_openai'
116 |             elif embedding == 'bge_en':
117 |                 embed_name = 'bge_en_s15'
118 |             elif embedding == 'bge_zh':
119 |                 embed_name = 'bge_zh_s15'
120 |             else:
121 |                 print('warning! unknown embedding name ', embedding ,' while loading role')
122 |                 embed_name = 'luotuo_openai'
123 | 
124 |             texts = []
125 |             vecs = []
126 |             for data in datas:
127 |                 if data[embed_name] == 'system_prompt':
128 |                     self.system_prompt = data['text']
129 |                 elif data[embed_name] == 'config':
130 |                     pass
131 |                 else:
132 |                     vec = base64_to_float_array( data[embed_name] )
133 |                     text = data['text']
134 |                     vecs.append( vec )
135 |                     texts.append( text )
136 | 
137 |             self.build_story_db_from_vec( texts, vecs )
138 |             
139 |         elif story_db:
140 |             self.db = ChromaDB() 
141 |             self.db.load(story_db)
142 |         elif story_text_folder:
143 |             # print("Building story database from texts...")
144 |             self.db = self.build_story_db(story_text_folder) 
145 |         else:
146 |             self.db = None
147 |             print('warning! no story database configured: neither story_db nor story_text_folder was provided.')
148 |             # raise ValueError("Either story_db or story_text_folder must be provided")
149 |         
150 | 
151 |         self.max_len_story, self.max_len_history = self.get_tokenlen_setting('openai')
152 | 
153 |         if max_len_history is not None:
154 |             self.max_len_history = max_len_history
155 |             # user setting will override default setting
156 | 
157 |         if max_len_story is not None:
158 |             self.max_len_story = max_len_story
159 |             # user setting will override default setting
160 | 
161 |         self.dialogue_history = []
162 | 
163 |         
164 | 
165 |     def check_system_prompt(self, system_prompt):
166 |         # if system_prompt end with .txt, read the file with utf-8
167 |         # else, return the string directly
168 |         if system_prompt.endswith('.txt'):
169 |             with open(system_prompt, 'r', encoding='utf-8') as f:
170 |                 return f.read()
171 |         else:
172 |             return system_prompt
173 |     
174 | 
175 |     def get_models(self, model_name):
176 | 
177 |         # TODO: if only the tokenizer is required, there is no need to initialize the llm
178 |         
179 |         # return the combination of llm, embedding and tokenizer
180 |         if model_name == 'openai':
181 |             from .LangChainGPT import LangChainGPT
182 |             return (LangChainGPT(), tiktokenizer)
183 |         elif model_name == 'debug':
184 |             from .PrintLLM import PrintLLM
185 |             return (PrintLLM(), tiktokenizer)
186 |         elif model_name == 'spark':
187 |             from .SparkGPT import SparkGPT
188 |             return (SparkGPT(), tiktokenizer)
189 |         elif model_name == 'GLMPro':
190 |             from .GLMPro import GLMPro
191 |             return (GLMPro(), tiktokenizer)
192 |         elif model_name == 'ernie3.5':
193 |             from .ErnieGPT import ErnieGPT
194 |             return (ErnieGPT(), tiktokenizer)
195 |         elif model_name == 'ernie4.0':
196 |             from .ErnieGPT import ErnieGPT
197 |             return (ErnieGPT(model="ernie-bot-4"), tiktokenizer)
198 |         elif model_name == "ChatGLM2GPT":
199 |             from .ChatGLM2GPT import ChatGLM2GPT, GLM_tokenizer
200 |             return (ChatGLM2GPT(), GLM_tokenizer)
201 |         elif model_name == "BaiChuan2GPT":
202 |             from .BaiChuan2GPT import BaiChuan2GPT, BaiChuan_tokenizer
203 |             return (BaiChuan2GPT(), BaiChuan_tokenizer)
204 |         elif model_name == "BaiChuanAPIGPT":
205 |             from .BaiChuanAPIGPT import BaiChuanAPIGPT
206 |             return (BaiChuanAPIGPT(), tiktokenizer)
207 |         else:
208 |             print(f'warning! undefined model {model_name}, use openai instead.')
209 |             from .LangChainGPT import LangChainGPT
210 |             return (LangChainGPT(), tiktokenizer)
211 |         
212 |     def get_tokenlen_setting( self, model_name ):
213 |         # return the setting of story and history token length
214 |         if model_name == 'openai':
215 |             return (1500, 1200)
216 |         else:
217 |             print(f'warning! undefined model {model_name}, use openai instead.')
218 |             return (1500, 1200)
219 |         
220 |     def build_story_db_from_vec( self, texts, vecs ):
221 |         self.db = ChromaDB()
222 | 
223 |         self.db.init_from_docs( vecs, texts)
224 | 
225 |     def build_story_db(self, text_folder):
226 |         # Read the text folder and extract an embedding vector for each file
227 |         db = ChromaDB()
228 | 
229 |         strs = []
230 | 
231 |         # scan all txt file from text_folder
232 |         for file in os.listdir(text_folder):
233 |             # if file name end with txt
234 |             if file.endswith(".txt"):
235 |                 file_path = os.path.join(text_folder, file)
236 |                 with open(file_path, 'r', encoding='utf-8') as f:
237 |                     strs.append(f.read())
238 | 
239 |         if self.verbose:
240 |             print(f'starting extract embedding... for { len(strs) } files')
241 | 
242 |         vecs = []
243 | 
244 |         ## TODO: add a new unit test for batched embedding
245 |         ## new embedding code that supports batching over a list
246 |         ## replace the for loop below with the new code
247 |         ## Luotuo-bert-en has also been released, so OpenAI can be avoided
248 |         
249 |         for mystr in strs:
250 |             vecs.append(self.embedding(mystr))
251 | 
252 |         db.init_from_docs(vecs, strs)
253 | 
254 |         return db
255 |     
256 |     def save_story_db(self, db_path):
257 |         self.db.save(db_path)
258 |         
259 |     def chat(self, text, role):
260 |         # add system prompt
261 |         self.llm.initialize_message()
262 |         self.llm.system_message(self.system_prompt)
263 |     
264 | 
265 |         # add story
266 |         query = self.get_query_string(text, role)
267 |         self.add_story( query )
268 | 
269 |         # add history
270 |         self.add_history()
271 | 
272 |         # add query
273 |         self.llm.user_message(query)
274 |         
275 |         # get response
276 |         response_raw = self.llm.get_response()
277 | 
278 |         response = response_postprocess(response_raw, self.dialogue_bra_token, self.dialogue_ket_token)
279 | 
280 |         # record dialogue history
281 |         self.dialogue_history.append((query, response))
282 | 
283 | 
284 | 
285 |         return response
286 |     
287 |     def get_query_string(self, text, role):
288 |         if role in self.narrator:
289 |             return role + ":" + text
290 |         else:
291 |             return f"{role}:{self.dialogue_bra_token}{text}{self.dialogue_ket_token}"
292 |         
293 |     def add_story(self, query):
294 | 
295 |         if self.db is None:
296 |             return
297 |         
298 |         query_vec = self.embedding(query)
299 | 
300 |         stories = self.db.search(query_vec, self.k_search)
301 |         
302 |         story_string = self.story_prefix_prompt
303 |         sum_story_token = self.tokenizer(story_string)
304 |         
305 |         for story in stories:
306 |             story_token = self.tokenizer(story) + self.tokenizer(self.dialogue_divide_token)
307 |             if sum_story_token + story_token > self.max_len_story:
308 |                 break
309 |             else:
310 |                 sum_story_token += story_token
311 |                 story_string += story + self.dialogue_divide_token
312 |         
313 |         if text_censor(story_string):
314 |             self.llm.user_message(story_string)
315 |         
316 |     def add_history(self):
317 | 
318 |         if len(self.dialogue_history) == 0:
319 |             return
320 |         
321 |         sum_history_token = 0
322 |         flag = 0
323 |         for query, response in reversed(self.dialogue_history):
324 |             current_count = 0
325 |             if query is not None:
326 |                 current_count += self.tokenizer(query) 
327 |             if response is not None:
328 |                 current_count += self.tokenizer(response)
329 |             sum_history_token += current_count
330 |             if sum_history_token > self.max_len_history:
331 |                 break
332 |             else:
333 |                 flag += 1
334 | 
335 |         if flag == 0:
336 |             print('warning! no history added. the last dialogue is too long.')
337 |             return  # dialogue_history[-0:] would otherwise replay the entire history
338 |         for (query, response) in self.dialogue_history[-flag:]:
339 |             if query is not None:
340 |                 self.llm.user_message(query)
341 |             if response is not None:
342 |                 self.llm.ai_message(response)
343 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/ChromaDB.py:
--------------------------------------------------------------------------------
 1 | import chromadb
 2 | from .BaseDB import BaseDB
 3 | import random
 4 | import string
 5 | import os
 6 | 
 7 | class ChromaDB(BaseDB):
 8 |     
 9 |     def __init__(self):
10 |         self.client = None
11 |         self.collection = None
12 |         self.path = None
13 |     
14 |     def init_db(self):
15 | 
16 |         if self.client is not None:
17 |             print('ChromaDB has already been initialized')
18 |             return
19 | 
20 |         folder_name = ''
21 | 
22 |         while os.path.exists(folder_name) or folder_name == '':
23 |             # try to create a folder named temp_<random string> which is not yet existed
24 |             folder_name =  "tempdb_" + ''.join(random.sample(string.ascii_letters + string.digits, 8))
25 | 
26 |         self.path = folder_name
27 |         self.client = chromadb.PersistentClient(path = folder_name)
28 | 
29 |         self.collection = self.client.get_or_create_collection("search")
30 | 
31 |     def save(self, file_path):
32 |         if file_path != self.path:
33 |             # copy all files in self.path to file_path, with overwrite
34 |             os.system("cp -r " + self.path + " " + file_path)
35 |             previous_path = self.path
36 |             self.path = file_path
37 |             self.client = chromadb.PersistentClient(path = file_path)
38 |             # remove the previous path if it starts with tempdb
39 |             if previous_path.startswith("tempdb"):
40 |                 os.system("rm -rf " + previous_path)
41 |                         
42 | 
43 |     def load(self, file_path):
44 |         self.path = file_path
45 |         self.client = chromadb.PersistentClient(path = file_path)
46 |         self.collection = self.client.get_collection("search")
47 | 
48 |     def search(self, vector, n_results):
49 |         results = self.collection.query(query_embeddings=[vector], n_results=n_results)
50 |         return results['documents'][0]
51 | 
52 |     def init_from_docs(self, vectors, documents):
53 |         if self.client is None:
54 |             self.init_db()
55 |         
56 |         ids = []
57 |         for i, doc in enumerate(documents):
58 |             first_four_chat = doc[:min(4, len(doc))]
59 |             ids.append( str(i) + "_" + first_four_chat)
60 |         self.collection.add(embeddings=vectors, documents=documents, ids = ids)
61 |         
62 | 
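
`ChromaDB.save` shells out to `cp -r` and `rm -rf`, which assumes a Unix shell and paths without spaces. Below is a hedged sketch of a portable alternative built on the standard library; `move_db_folder` and `make_ids` are hypothetical helpers, not part of this repository. Note one behavioural difference: `shutil.copytree(..., dirs_exist_ok=True)` merges into an existing destination, whereas `cp -r src dst` nests `src` inside `dst` when `dst` already exists.

```python
import os
import shutil


def move_db_folder(src, dst):
    """Copy the folder src to dst, then delete src if it is a temp folder."""
    # dirs_exist_ok requires Python 3.8+; existing files in dst are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    if os.path.basename(src).startswith("tempdb"):
        shutil.rmtree(src)
    return dst


def make_ids(documents, prefix_len=4):
    """Build unique ids from the index plus a short prefix of each document."""
    return [f"{i}_{doc[:prefix_len]}" for i, doc in enumerate(documents)]


if __name__ == "__main__":
    print(make_ids(["hello world", "hi"]))
```

The index prefix alone guarantees uniqueness; the document prefix only makes ids easier to read while debugging.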


--------------------------------------------------------------------------------
/code/ChatHaruhi/ErnieGPT.py:
--------------------------------------------------------------------------------
 1 | # ErnieGPT.py
 2 | 
 3 | import erniebot 
 4 | # the credentials below are read from OS environment variables
 5 | import os
 6 | import copy
 7 | 
 8 | # appid = os.environ['APPID']
 9 | # api_secret = os.environ['APISecret'] 
10 | # api_key = os.environ['APIKey']
11 | erniebot.api_type = os.environ["APIType"]
12 | erniebot.access_token = os.environ["ErnieAccess"]
13 | 
14 | from .BaseLLM import BaseLLM
15 | 
16 | class ErnieGPT(BaseLLM):
17 | 
18 |     def __init__(self,model="ernie-bot", ernie_trick = True ):
19 |         super(ErnieGPT,self).__init__()
20 |         self.model = model
21 |         if model not in ["ernie-bot", "ernie-bot-turbo", "ernie-vilg-v2", "ernie-text-embedding", "ernie-bot-8k", "ernie-bot-4"]:
22 |             raise Exception("Unknown Ernie model")
23 |         # SparkApi.answer =""
24 |         self.messages = []
25 | 
26 |         self.ernie_trick = ernie_trick
27 |         
28 | 
29 |     def initialize_message(self):
30 |         self.messages = []
31 | 
32 |     def ai_message(self, payload):
33 |         if len(self.messages) == 0:
34 |             self.user_message("请根据我的要求进行角色扮演:")  # "Please role-play according to my requirements:"
35 |         if len(self.messages) % 2 == 1:
36 |             self.messages.append({"role":"assistant","content":payload})
37 |         elif len(self.messages)% 2 == 0:
38 |             self.messages[-1]["content"] += "\n"+ payload
39 | 
40 |     def system_message(self, payload):
41 |         
42 |         self.messages.append({"role":"user","content":payload}) 
43 |         
44 | 
45 |     def user_message(self, payload):
46 |         if len(self.messages) % 2 == 0:
47 |             self.messages.append({"role":"user","content":payload})
48 |             # self.messages[-1]["content"] += 
49 |         elif len(self.messages)% 2 == 1:
50 |             self.messages[-1]["content"] += "\n"+ payload
51 | 
52 |     def get_response(self):
53 |         # question = checklen(getText("user",Input))
54 |         chat_messages = copy.deepcopy(self.messages)
55 | 
56 |         lines = chat_messages[-1]["content"].split('\n')
57 | 
58 |         if self.ernie_trick:
59 |             lines.insert(-1, '请请模仿上述经典桥段进行回复\n')  # prompt text: "Please reply by imitating the classic scenes above"
60 |         
61 |         chat_messages[-1]["content"] = '\n'.join(lines)
62 | 
63 |         # chat_messages[-1]["content"] = "请请模仿上述经典桥段进行回复\n" + chat_messages[-1]["content"] 
64 |         response = erniebot.ChatCompletion.create(model=self.model, messages=chat_messages)
65 |         # message_json = [{"role": "user", "content": self.messages}]
66 |         # SparkApi.answer =""
67 |         # SparkApi.main(appid,api_key,api_secret,self.Spark_url,self.domain,message_json)
68 |         return response["result"]
69 |     
70 |     def print_prompt(self):
71 |         for message in self.messages:
72 |             print(f"{message['role']}: {message['content']}")
73 | 
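
`ErnieGPT` decides whether to append or merge a message from the parity of the message count, presumably because the endpoint expects strictly alternating user and assistant turns (an assumption; the source does not state the API contract). The same invariant can be sketched by comparing roles directly. `merge_alternating` is a hypothetical helper for illustration only.

```python
def merge_alternating(messages):
    """Merge consecutive same-role messages so that roles strictly alternate."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same speaker twice in a row: fold into the previous turn.
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            # Copy the dict so the caller's messages are not mutated.
            merged.append({"role": msg["role"], "content": msg["content"]})
    return merged


if __name__ == "__main__":
    raw = [
        {"role": "user", "content": "system prompt"},
        {"role": "user", "content": "story"},
        {"role": "assistant", "content": "reply"},
        {"role": "user", "content": "question"},
    ]
    for m in merge_alternating(raw):
        print(m["role"], "->", repr(m["content"]))
```

Comparing roles instead of counting messages avoids relying on the list always starting with a user turn.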


--------------------------------------------------------------------------------
/code/ChatHaruhi/FooLLM.py:
--------------------------------------------------------------------------------
 1 | 
 2 | from .BaseLLM import BaseLLM
 3 | 
 4 | # FooLLM does nothing but record the messages
 5 | 
 6 | class FooLLM(BaseLLM):
 7 | 
 8 |     def __init__(self ):
 9 |         self.messages = []
10 | 
11 |     def initialize_message(self):
12 |         self.messages = []
13 | 
14 |     def ai_message(self, payload):
15 |         self.messages.append({"role":"AI","content":payload})
16 | 
17 |     def system_message(self, payload):
18 |         self.messages.append({"role":"System","content":payload})
19 | 
20 |     def user_message(self, payload):
21 |         self.messages.append({"role":"User","content":payload})
22 | 
23 |     def get_response(self):
24 |         for message in self.messages:
25 |             print(message["role"], ":", message["content"])
26 |         response = input("Please input your response: ")
27 |         return response
28 |     
29 |     def print_prompt(self):
30 |         for message in self.messages:
31 |             print(message["role"], ":", message["content"])
32 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/GLMPro.py:
--------------------------------------------------------------------------------
 1 | from .BaseLLM import BaseLLM
 2 | import os
 3 | 
 4 | zhipu_api = os.environ['ZHIPU_API']
 5 | 
 6 | import zhipuai
 7 | import time
 8 | 
 9 | class GLMPro( BaseLLM ):
10 |     def __init__(self, model="chatglm_pro", verbose = False ):
11 |         super(GLMPro,self).__init__()
12 | 
13 |         zhipuai.api_key = zhipu_api
14 | 
15 |         self.verbose = verbose
16 | 
17 |         self.model_name = model
18 | 
19 |         self.prompts = []
20 | 
21 |         if self.verbose == True:
22 |             print('model name, ', self.model_name )
23 |             if len( zhipu_api ) > 8:
24 |                 print( 'found apikey ', zhipu_api[:4], '****', zhipu_api[-4:] )
25 |             else:
26 |                 print( 'found apikey but too short, ' )
27 |         
28 | 
29 |     def initialize_message(self):
30 |         self.prompts = []
31 | 
32 |     def ai_message(self, payload):
33 |         self.prompts.append({"role":"assistant","content":payload})
34 | 
35 |     def system_message(self, payload):
36 |         self.prompts.append({"role":"user","content":payload})
37 | 
38 |     def user_message(self, payload):
39 |         self.prompts.append({"role":"user","content":payload})
40 | 
41 |     def get_response(self):
42 |         zhipuai.api_key = zhipu_api
43 |         max_test_name = 5
44 |         sleep_interval = 3
45 | 
46 |         request_id = None
47 | 
48 |         
49 | 
50 |         # try submitting the asynchronous request until it succeeds
51 |         for test_time in range( max_test_name ):
52 |             response = zhipuai.model_api.async_invoke(
53 |                 model = self.model_name,
54 |                 prompt = self.prompts,
55 |                 temperature = 0)
56 |             if response['success'] == True:
57 |                 request_id = response['data']['task_id']
58 | 
59 |                 if self.verbose == True:
60 |                     print('submit request, id = ', request_id )
61 |                 break
62 |             else:
63 |                 print('submit GLM request failed, retrying...')
64 |                 time.sleep( sleep_interval )
65 | 
66 |         if request_id:
67 |             # poll for the response until it succeeds
68 |             for test_time in range( 2 * max_test_name ):
69 |                 result = zhipuai.model_api.query_async_invoke_result( request_id )
70 |                 if result['code'] == 200 and result['data']['task_status'] == 'SUCCESS':
71 | 
72 |                     if self.verbose == True:
73 |                         print('get GLM response success' )
74 | 
75 |                     choices = result['data']['choices']
76 |                     if len( choices ) > 0:
77 |                         return choices[-1]['content'].strip("\"'")
78 |                     
79 |                 # otherwise this attempt failed
80 |                 if self.verbose == True:
81 |                     print('get GLM response failed, retrying...')
82 |                 # wait sleep_interval seconds before polling again
83 |                 time.sleep( sleep_interval )
84 |         else:
85 |             print('submit GLM request failed, please check your api key and model name')
86 |             return ''
87 |     
88 |     def print_prompt(self):
89 |         for message in self.prompts:
90 |             print(f"{message['role']}: {message['content']}")
91 | 
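
`GLMPro.get_response` retries a submit step and then polls for the result; if the polling loop is exhausted, the method falls through and implicitly returns `None`. The bounded-polling pattern can be sketched on its own as follows. `poll_until` and all of its parameters are hypothetical names introduced here, and the sleep function is injectable so the sketch can be exercised without waiting.

```python
import time


def poll_until(fetch, is_done, max_attempts=10, interval=3.0,
               sleep=time.sleep, default=""):
    """Call fetch() until is_done(result) holds or attempts run out.

    Returns the first accepted result, or `default` when every attempt fails,
    so callers never receive an implicit None.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # no need to wait after the final attempt
    return default


if __name__ == "__main__":
    answers = iter([None, None, "done"])
    print(poll_until(lambda: next(answers), lambda r: r is not None,
                     max_attempts=5, sleep=lambda s: None))
```

Returning an explicit default keeps the return type consistent with the empty string used when the submit step fails.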


--------------------------------------------------------------------------------
/code/ChatHaruhi/LangChainGPT.py:
--------------------------------------------------------------------------------
 1 | # ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
 2 | #
 3 | # ChatHaruhi 2.0, built by Cheng Li and Weishi Mi
 4 | #
 5 | # chengli.thu@gmail.com, mws22@mails.tsinghua.edu.cn
 6 | # 
 7 | # Weishi Mi is a second-year graduate student at Tsinghua University, majoring in computer science.
 8 | # Weishi Mi is seeking a job or a PhD position and will be available next year
 9 | # 
10 | # homepage https://github.com/LC1332/Chat-Haruhi-Suzumiya
11 | # 
12 | # ChatHaruhi is a chatbot that can revive anime characters in reality.
13 | # the 2.0 version was built by Cheng Li and Weishi Mi.
14 | # 
15 | # Please cite our paper if you use this code for research: 
16 | #
17 | # @misc{li2023chatharuhi,
18 | #       title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
19 | #       author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
20 | #       year={2023},
21 | #       eprint={2308.09597},
22 | #       archivePrefix={arXiv},
23 | #       primaryClass={cs.CL}
24 | # }
25 | 
26 | 
27 | #from langchain.chat_models import ChatOpenAI
28 | from langchain_openai import ChatOpenAI
29 | from langchain.callbacks import get_openai_callback
30 | from langchain.prompts.chat import (
31 |     ChatPromptTemplate,
32 |     SystemMessagePromptTemplate,
33 |     AIMessagePromptTemplate,
34 |     HumanMessagePromptTemplate,
35 | )
36 | from langchain.schema import (
37 |     AIMessage,
38 |     HumanMessage,
39 |     SystemMessage
40 | )
41 | from .BaseLLM import BaseLLM
42 | 
43 | import os
44 | 
45 | class LangChainGPT(BaseLLM):
46 | 
47 |     def __init__(self, model="gpt-3.5-turbo"):
48 |         super(LangChainGPT, self).__init__()
49 |         self.model = model
50 | 
51 |         if "OPENAI_API_BASE" in os.environ:
52 |             from dotenv import load_dotenv
53 |             load_dotenv()
54 |             api_base = os.environ["OPENAI_API_BASE"]
55 |             api_key = os.environ["OPENAI_API_KEY"]
56 |             self.chat = ChatOpenAI(model=self.model, openai_api_base=api_base)
57 |         else:
58 |             api_key = os.environ.get("OPENAI_API_KEY", None)
59 | 
60 |             if api_key is None:
61 |                 print("warning! call LangChainGPT but openai key has not yet been set, use idle key instead")
62 |                 os.environ["OPENAI_API_KEY"] = "not_a_key"
63 | 
64 |             self.chat = ChatOpenAI(model=self.model)
65 |                 
66 |         # add api_base        
67 |         self.messages = []
68 | 
69 |     def initialize_message(self):
70 |         self.messages = []
71 | 
72 |     def ai_message(self, payload):
73 |         self.messages.append(AIMessage(content=payload))
74 | 
75 |     def system_message(self, payload):
76 |         self.messages.append(SystemMessage(content=payload))
77 | 
78 |     def user_message(self, payload):
79 |         self.messages.append(HumanMessage(content=payload))
80 | 
81 |     def get_response(self):
82 |         if self.model in ['Mixtral', 'mistral', 'mistral-rp', 'llama2-7b', 'llama2-13b', 'gemini'] and len(self.messages) > 2:
83 |             self.messages[-1].content = '\n'.join([ m.content for m in self.messages])
84 |             self.messages = self.messages[-1:]
85 | 
86 |             
87 |     
88 |         with get_openai_callback() as cb:        
89 |             response = self.chat.invoke(self.messages)
90 |         total_tokens = cb.total_tokens
91 | 
92 |         print(response.content)
93 |         return response.content
94 | 
95 |     def print_prompt(self):
96 |         for message in self.messages:
97 |             print(message)
98 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/Mixtral.py:
--------------------------------------------------------------------------------
 1 | from .BaseLLM import BaseLLM
 2 | import torch
 3 | from transformers import AutoTokenizer, AutoModelForCausalLM
 4 | from transformers import LlamaTokenizer, MixtralForCausalLM
 5 | import bitsandbytes, flash_attn
 6 | tokenizer_LLaMA = None
 7 | model_LLaMA = None
 8 | 
 9 | def initialize_Mixtral():
10 |     global model_LLaMA, tokenizer_LLaMA
11 | 
12 |     if model_LLaMA is None:
13 |         model_LLaMA = MixtralForCausalLM.from_pretrained(
14 |             "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
15 |             torch_dtype=torch.float16,
16 |             device_map="auto"
17 |         )
18 | 
19 |     if tokenizer_LLaMA is None:
20 |         tokenizer_LLaMA = LlamaTokenizer.from_pretrained('NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO', trust_remote_code=True)
21 | 
22 |     return model_LLaMA, tokenizer_LLaMA
23 | 
24 | def LLaMA_tokenizer(text):
25 |     return len(tokenizer_LLaMA.encode(text))
26 | 
27 | class ChatMixtral(BaseLLM):
28 |     def __init__(self, model="Mixtral"):
29 |         super(ChatMixtral, self).__init__()
30 |         self.model, self.tokenizer = initialize_Mixtral()
31 |         self.messages = ""
32 | 
33 |     def initialize_message(self):
34 |         self.messages = ""
35 | 
36 |     def ai_message(self, payload):
37 |         self.messages = self.messages + "\n " + payload 
38 | 
39 |     def system_message(self, payload):
40 |         self.messages = self.messages + "\n " + payload 
41 | 
42 |     def user_message(self, payload):
43 |         self.messages = self.messages + "\n " + payload 
44 | 
45 |     def get_response(self):
46 |         with torch.no_grad():
47 |             input_ids = self.tokenizer(self.messages, return_tensors="pt").input_ids.to("cuda")
48 |             generated_ids = self.model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=self.tokenizer.eos_token_id)
49 |             response = self.tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
50 |         return response
51 |         
52 |     def print_prompt(self):
53 |         print(self.messages)
54 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/NaiveDB.py:
--------------------------------------------------------------------------------
 1 | from .BaseDB import BaseDB
 2 | import random
 3 | import string
 4 | import os
 5 | 
 6 | from math import sqrt
 7 | 
 8 | class NaiveDB(BaseDB):
 9 |     def __init__(self):
10 |         self.verbose = False
11 |         self.init_db()
12 | 
13 |     def init_db(self):
14 |         if(self.verbose):
15 |             print("call init_db")
16 |         self.vectors = []
17 |         self.documents = []
18 |         self.norms = []
19 | 
20 |     def save(self, file_path):
21 |         print( "warning! directly save folder from dbtype NaiveDB has not been implemented yet, try use role_from_hf to load role instead" )
22 | 
23 |     def load(self, file_path):
24 |         print( "warning! directly load folder from dbtype NaiveDB has not been implemented yet, try use role_from_hf to load role instead" )
25 | 
26 |     def recompute_norm( self ):
27 |         # self.norms stores the L2 norm of each vector
28 |         # compute the L2 norm of every vector
29 |         self.norms = [sqrt(sum([x**2 for x in vec])) for vec in self.vectors]
30 | 
31 | 
32 |     def search(self, query_vector , n_results):
33 | 
34 |         if(self.verbose):
35 |             print("call search")
36 | 
37 |         if len(self.norms) != len(self.vectors):
38 |             self.recompute_norm()
39 | 
40 |         # self.vectors is a list of lists of floats
41 |         # self.norms stores the L2 norm of each vector
42 |         # query_vector is a list of floats
43 |         # compute the cosine similarity between query_vector and each vector in self.vectors (the vector norms are already cached in self.norms)
44 |         # then select at most n_results of the closest matches
45 |         # and return the documents at those indices as a list of strings
46 |         # implementation follows
47 |             
48 |         # compute the norm of the query vector
49 |         query_norm = sqrt(sum([x**2 for x in query_vector]))
50 | 
51 |         # compute the cosine similarities
52 |         similarities = []
53 |         for vec, norm in zip(self.vectors, self.norms):
54 |             dot_product = sum(q * v for q, v in zip(query_vector, vec))
55 |             if query_norm < 1e-20:
56 |                 continue
57 |             cosine_similarity = dot_product / (query_norm * norm)
58 |             similarities.append(cosine_similarity)
59 | 
60 |         # take the n_results most similar entries
61 |         top_indices = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)[:n_results]
62 |         top_documents = [self.documents[i] for i in top_indices]
63 |         return top_documents
64 | 
65 |     def init_from_docs(self, vectors, documents):
66 |         if(self.verbose):
67 |             print("call init_from_docs")
68 |         self.vectors = vectors
69 |         self.documents = documents
70 |         self.norms = []
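
The brute-force cosine search in `NaiveDB.search` can be written as a standalone function. This is a minimal sketch under stated assumptions, not a drop-in replacement: `top_k_cosine` is a hypothetical name, norms are recomputed instead of cached, and zero-norm vectors are skipped while each score keeps its original index, so document lookup stays aligned.

```python
from math import sqrt


def top_k_cosine(query, vectors, documents, k):
    """Return up to k documents ranked by cosine similarity to query."""
    q_norm = sqrt(sum(x * x for x in query))
    scored = []
    for idx, vec in enumerate(vectors):
        v_norm = sqrt(sum(x * x for x in vec))
        if q_norm == 0.0 or v_norm == 0.0:
            continue  # cosine similarity is undefined for a zero vector
        dot = sum(q * v for q, v in zip(query, vec))
        # Store the index with the score so skipped vectors cannot shift it.
        scored.append((dot / (q_norm * v_norm), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [documents[idx] for _, idx in scored[:k]]


if __name__ == "__main__":
    vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    docs = ["a", "b", "c", "d"]
    print(top_k_cosine([1.0, 0.0], vecs, docs, 2))
```

For larger collections a vectorized library would be faster; the pure-Python form mirrors the dependency-free style of `NaiveDB`.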


--------------------------------------------------------------------------------
/code/ChatHaruhi/PrintLLM.py:
--------------------------------------------------------------------------------
 1 | # ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
 2 | #
 3 | # ChatHaruhi 2.0, built by Cheng Li and Weishi Mi
 4 | #
 5 | # chengli.thu@gmail.com, mws22@mails.tsinghua.edu.cn
 6 | # 
 7 | # Weishi Mi is a second-year graduate student at Tsinghua University, majoring in computer science.
 8 | # Weishi Mi is seeking a job or a PhD position and will be available next year
 9 | # 
10 | # homepage https://github.com/LC1332/Chat-Haruhi-Suzumiya
11 | # 
12 | # ChatHaruhi is a chatbot that can revive anime characters in reality.
13 | # the 2.0 version was built by Cheng Li and Weishi Mi.
14 | # 
15 | # Please cite our paper if you use this code for research: 
16 | #
17 | # @misc{li2023chatharuhi,
18 | #       title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
19 | #       author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
20 | #       year={2023},
21 | #       eprint={2308.09597},
22 | #       archivePrefix={arXiv},
23 | #       primaryClass={cs.CL}
24 | # }
25 | # 
26 | # PrintLLM.py is for debugging without calling a real LLM
27 | # so you can see full prompt and copy it into GPT or Claude to debug
28 | #
29 | 
30 | from .BaseLLM import BaseLLM
31 | 
32 | class PrintLLM(BaseLLM):
33 | 
34 |     def __init__(self ):
35 |         self.messages = []
36 |         self.messages.append("Note: this is a print LLM for debugging.")
37 |         self.messages.append("You can also copy the prompt into GPT or Claude to debug it.")
38 | 
39 |     def initialize_message(self):
40 |         self.messages = []
41 |         self.messages.append("Note: this is a print LLM for debugging.")
42 |         self.messages.append("You can also copy the prompt into GPT or Claude to debug it.")
43 | 
44 |     def ai_message(self, payload):
45 |         self.messages.append("AI: \n" + payload)
46 | 
47 |     def system_message(self, payload):
48 |         self.messages.append("System: \n" + payload)
49 | 
50 |     def user_message(self, payload):
51 |         self.messages.append("User: \n" + payload)
52 | 
53 |     def get_response(self):
54 |         for message in self.messages:
55 |             print(message)
56 |         response = input("Please input your response: ")
57 |         return response
58 |     
59 |     def print_prompt(self):
60 |         for message in self.messages:
61 |             print(message)
62 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/Qwen118k2GPT.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from .BaseLLM import BaseLLM
 3 | from transformers import AutoTokenizer, AutoModelForCausalLM
 4 | import pdb
 5 | tokenizer_qwen = None
 6 | model_qwen = None
 7 | # Load model directly
 8 | def initialize_Qwen2LORA():
 9 |     global model_qwen, tokenizer_qwen
10 | 
11 |     if model_qwen is None:
12 |         model_qwen = AutoModelForCausalLM.from_pretrained(
13 |             "silk-road/ChatHaruhi_RolePlaying_qwen_7b",
14 |             device_map="auto",
15 |             trust_remote_code=True
16 |         )
17 |         model_qwen = model_qwen.eval()
18 |         # model_qwen = PeftModel.from_pretrained(
19 |         #     model_qwen,
20 |         #     "silk-road/Chat-Haruhi-Fusion_B"
21 |         # )
22 | 
23 |     if tokenizer_qwen is None:
24 |         tokenizer_qwen = AutoTokenizer.from_pretrained(
25 |             "silk-road/ChatHaruhi_RolePlaying_qwen_7b", 
26 |             # use_fast=True,
27 |             trust_remote_code=True
28 |         )
29 | 
30 |     return model_qwen, tokenizer_qwen
31 | 
32 | 
33 | def LLaMA_tokenizer(text):
34 |     return len(tokenizer_qwen.encode(text))
35 | 
36 | class Qwen118k2GPT(BaseLLM):
37 |     def __init__(self, model="qwen-118k"):
38 |         super(Qwen118k2GPT, self).__init__()
39 |         self.model, self.tokenizer = initialize_Qwen2LORA()
40 |         self.messages = ""
41 | 
42 |     def initialize_message(self):
43 |         self.messages = ""
44 | 
45 |     def ai_message(self, payload):
46 |         self.messages = self.messages + "\n " + payload 
47 | 
48 |     def system_message(self, payload):
49 |         self.messages = self.messages + "\n " + payload 
50 | 
51 |     def user_message(self, payload):
52 |         self.messages = self.messages + "\n " + payload 
53 | 
54 |     def get_response(self):
55 |         with torch.no_grad():
56 |             response, history = self.model.chat(self.tokenizer, self.messages, history=[])
57 |             # print(response)
58 |         return response
59 |         
60 |     def print_prompt(self):
61 |         print(type(self.messages))
62 |         print(self.messages)
63 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/SparkApi.py:
--------------------------------------------------------------------------------
  1 | # WebSocket interface provided by iFlytek for interacting with the Spark chatbot
  2 | 
  3 | import _thread as thread
  4 | import base64
  5 | import datetime
  6 | import hashlib
  7 | import hmac
  8 | import json
  9 | from urllib.parse import urlparse
 10 | import ssl
 11 | from datetime import datetime
 12 | from time import mktime
 13 | from urllib.parse import urlencode
 14 | from wsgiref.handlers import format_date_time
 15 | 
 16 | import websocket  # provided by the websocket_client package
 17 | answer = ""
 18 | 
 19 | class Ws_Param(object):
 20 |     # initialization
 21 |     def __init__(self, APPID, APIKey, APISecret, Spark_url):
 22 |         self.APPID = APPID
 23 |         self.APIKey = APIKey
 24 |         self.APISecret = APISecret
 25 |         self.host = urlparse(Spark_url).netloc
 26 |         self.path = urlparse(Spark_url).path
 27 |         self.Spark_url = Spark_url
 28 | 
 29 |     # build the signed URL
 30 |     def create_url(self):
 31 |         # generate a timestamp in RFC 1123 format
 32 |         now = datetime.now()
 33 |         date = format_date_time(mktime(now.timetuple()))
 34 | 
 35 |         # assemble the string to sign
 36 |         signature_origin = "host: " + self.host + "\n"
 37 |         signature_origin += "date: " + date + "\n"
 38 |         signature_origin += "GET " + self.path + " HTTP/1.1"
 39 | 
 40 |         # sign it with HMAC-SHA256
 41 |         signature_sha = hmac.new(self.APISecret.encode('utf-8'), signature_origin.encode('utf-8'),
 42 |                                  digestmod=hashlib.sha256).digest()
 43 | 
 44 |         signature_sha_base64 = base64.b64encode(signature_sha).decode(encoding='utf-8')
 45 | 
 46 |         authorization_origin = f'api_key="{self.APIKey}", algorithm="hmac-sha256", headers="host date request-line", signature="{signature_sha_base64}"'
 47 | 
 48 |         authorization = base64.b64encode(authorization_origin.encode('utf-8')).decode(encoding='utf-8')
 49 | 
 50 |         # collect the authentication parameters of the request into a dict
 51 |         v = {
 52 |             "authorization": authorization,
 53 |             "date": date,
 54 |             "host": self.host
 55 |         }
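 56 |         # append the authentication parameters to form the final url
 57 |         url = self.Spark_url + '?' + urlencode(v)
 58 |         # when following this demo, print the connection url here and check that, for the same parameters, it matches the url generated by your own code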
 59 |         return url
 60 | 
 61 | 
 62 | # handler for websocket errors
 63 | def on_error(ws, error):
 64 |     print("### error:", error)
 65 | 
 66 | 
 67 | # handler for websocket close
 68 | def on_close(ws,one,two):
 69 |     print(" ")
 70 | 
 71 | 
 72 | # handler for websocket connection established
 73 | def on_open(ws):
 74 |     thread.start_new_thread(run, (ws,))
 75 | 
 76 | 
 77 | def run(ws, *args):
 78 |     data = json.dumps(gen_params(appid=ws.appid, domain= ws.domain,question=ws.question))
 79 |     ws.send(data)
 80 | 
 81 | 
 82 | # handler for incoming websocket messages
 83 | def on_message(ws, message):
 84 |     # print(message)
 85 |     data = json.loads(message)
 86 |     code = data['header']['code']
 87 |     if code != 0:
 88 |         print(f'request error: {code}, {data}')
 89 |         ws.close()
 90 |     else:
 91 |         choices = data["payload"]["choices"]
 92 |         status = choices["status"]
 93 |         content = choices["text"][0]["content"]
 94 |         # print(content,end ="")
 95 |         global answer
 96 |         answer += content
 97 |         # print(1)
 98 |         if status == 2:
 99 |             ws.close()
100 | 
101 | 
102 | def gen_params(appid, domain,question):
103 |     """
104 |     Build the request parameters from the appid and the user's question
105 |     """
106 |     data = {
107 |         "header": {
108 |             "app_id": appid,
109 |             "uid": "1234"
110 |         },
111 |         "parameter": {
112 |             "chat": {
113 |                 "domain": domain,
114 |                 "random_threshold": 0.5,
115 |                 "max_tokens": 2048,
116 |                 "auditing": "default"
117 |             }
118 |         },
119 |         "payload": {
120 |             "message": {
121 |                 "text": question
122 |             }
123 |         }
124 |     }
125 |     return data
126 | 
127 | 
128 | def main(appid, api_key, api_secret, Spark_url,domain, question):
129 |     # print("Spark:")
130 |     wsParam = Ws_Param(appid, api_key, api_secret, Spark_url)
131 |     websocket.enableTrace(False)
132 |     wsUrl = wsParam.create_url()
133 |     ws = websocket.WebSocketApp(wsUrl, on_message=on_message, on_error=on_error, on_close=on_close, on_open=on_open)
134 |     ws.appid = appid
135 |     ws.question = question
136 |     ws.domain = domain
137 |     ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
138 | 
139 | 
140 | 
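
The streaming logic in `on_message` above can be exercised offline. The following is a minimal sketch, assuming frames shaped like the ones the handler reads (`header.code`, `payload.choices.status`, `payload.choices.text[0].content`). The `collect_answer` and `make_frame` helpers and the sample frames are hypothetical and are not part of SparkApi.py.

```python
import json


def collect_answer(frames):
    """Concatenate streamed content until a frame reports status 2 (final frame)."""
    answer = ""
    for raw in frames:
        data = json.loads(raw)
        code = data["header"]["code"]
        if code != 0:
            # Mirrors the handler above: a non-zero code ends the exchange.
            raise RuntimeError(f"request failed with code {code}")
        choices = data["payload"]["choices"]
        answer += choices["text"][0]["content"]
        if choices["status"] == 2:
            break
    return answer


def make_frame(content, status):
    """Build a hypothetical frame with only the fields the handler reads."""
    return json.dumps({
        "header": {"code": 0},
        "payload": {"choices": {"status": status, "text": [{"content": content}]}},
    })


frames = [make_frame("Hel", 0), make_frame("lo", 1), make_frame("!", 2)]
print(collect_answer(frames))
```

Returning the accumulated string from a function, rather than appending to a module-level `answer`, is a design choice of this sketch; it avoids shared state between concurrent requests.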


--------------------------------------------------------------------------------
/code/ChatHaruhi/SparkGPT.py:
--------------------------------------------------------------------------------
 1 | # SparkGPT.py
 2 | from . import SparkApi
 3 | # The credentials below are read from OS environment variables
 4 | import os
 5 | 
 6 | appid = os.environ['APPID']
 7 | api_secret = os.environ['APISecret'] 
 8 | api_key = os.environ['APIKey']
 9 | 
10 | from .BaseLLM import BaseLLM
11 | 
12 |     
13 | 
14 | 
15 | class SparkGPT(BaseLLM):
16 | 
17 |     def __init__(self, model="Spark3.0"):
18 |         super(SparkGPT,self).__init__()
19 |         self.model_type = model
20 |         self.messages = []
21 |         if self.model_type == "Spark2.0":
22 |             self.domain = "generalv2"    # v2.0 version
23 |             self.Spark_url = "ws://spark-api.xf-yun.com/v2.1/chat"  # v2.0 endpoint URL
24 |         elif self.model_type == "Spark1.5":
25 |             self.domain = "general"   # v1.5 version
26 |             self.Spark_url = "ws://spark-api.xf-yun.com/v1.1/chat"  # v1.5 endpoint URL
27 |         elif self.model_type == "Spark3.0":
28 |             self.domain = "generalv3"   # v3.0 version
29 |             self.Spark_url = "ws://spark-api.xf-yun.com/v3.1/chat"  # v3.0 endpoint URL
30 |         else:
31 |             raise Exception("Unknown Spark model")
32 |     
33 |     def initialize_message(self):
34 |         self.messages = []
35 | 
36 |     def ai_message(self, payload):
37 |         if len(self.messages) == 0:
38 |             self.user_message("请根据我的要求进行角色扮演:")  # prime an empty history with a user turn, then fall through so the payload is not dropped
39 |         if len(self.messages) % 2 == 1:
40 |             self.messages.append({"role":"assistant","content":payload})
41 |         else:
42 |             self.messages[-1]["content"] += "\n"+ payload
43 | 
44 |     def system_message(self, payload):
45 |         
46 |         self.messages.append({"role":"user","content":payload}) 
47 |         
48 | 
49 |     def user_message(self, payload):
50 |         if len(self.messages) % 2 == 0:
51 |             self.messages.append({"role":"user","content":payload})
52 |             # self.messages[-1]["content"] += 
53 |         elif len(self.messages)% 2 == 1:
54 |             self.messages[-1]["content"] += "\n"+ payload
55 | 
56 |     def get_response(self):
57 |         # question = checklen(getText("user",Input))
58 |         SparkApi.answer =""
59 |         if self.model_type == "Spark2.0":
60 |             self.domain = "generalv2"    # v2.0 version
61 |             self.Spark_url = "ws://spark-api.xf-yun.com/v2.1/chat"  # v2.0 endpoint URL
62 |         elif self.model_type == "Spark1.5":
63 |             self.domain = "general"   # v1.5 version
64 |             self.Spark_url = "ws://spark-api.xf-yun.com/v1.1/chat"  # v1.5 endpoint URL
65 |         elif self.model_type == "Spark3.0":
66 |             self.domain = "generalv3"   # v3.0 version
67 |             self.Spark_url = "ws://spark-api.xf-yun.com/v3.1/chat"  # v3.0 endpoint URL
68 |         else:
69 |             raise Exception("Unknown Spark model")
70 |         SparkApi.main(appid,api_key,api_secret,self.Spark_url,self.domain,self.messages)
71 |         return SparkApi.answer
72 |     
73 |     def print_prompt(self):
74 |         for message in self.messages:
75 |             print(f"{message['role']}: {message['content']}")
76 | 
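
The message handling in `SparkGPT` keeps the history strictly alternating between user and assistant turns by checking the parity of the list length. The following is a minimal sketch of that parity rule as standalone functions. The names `add_user` and `add_assistant` are hypothetical, and the sketch assumes the history already holds at least one turn before `add_assistant` is called.

```python
def add_user(messages, payload):
    """Append a user turn, or merge into the last turn if it is already a user turn."""
    if len(messages) % 2 == 0:
        messages.append({"role": "user", "content": payload})
    else:
        messages[-1]["content"] += "\n" + payload


def add_assistant(messages, payload):
    """Append an assistant turn, or merge into the last turn if it is already one.

    Assumes a non-empty history (a hypothetical simplification for this sketch).
    """
    if len(messages) % 2 == 1:
        messages.append({"role": "assistant", "content": payload})
    else:
        messages[-1]["content"] += "\n" + payload


history = []
add_user(history, "system prompt")   # first turn, stored with the user role
add_user(history, "Hi")              # merged into the first turn
add_assistant(history, "Hello")      # second turn
add_assistant(history, "How are you?")  # merged into the second turn
add_user(history, "Fine")            # third turn

for turn in history:
    print(turn["role"], repr(turn["content"]))
```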


--------------------------------------------------------------------------------
/code/ChatHaruhi/__init__.py:
--------------------------------------------------------------------------------
 1 | # ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
 2 | #
 3 | # ChatHaruhi 2.0, built by Cheng Li and Weishi Mi
 4 | #
 5 | # chengli.thu@gmail.com, mws22@mails.tsinghua.edu.cn
 6 | # 
 7 | # Weishi Mi is a second-year graduate student at Tsinghua University, majoring in computer science.
 8 | # Weishi Mi is pursuing a job or a PhD position and will be available next year
 9 | # 
10 | # homepage https://github.com/LC1332/Chat-Haruhi-Suzumiya
11 | # 
12 | # ChatHaruhi is a chatbot that can revive anime characters in reality.
13 | # the 2.0 version was built by Cheng Li and Weishi Mi.
14 | # 
15 | # Please cite our paper if you use this code for research: 
16 | #
17 | # @misc{li2023chatharuhi,
18 | #       title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
19 | #       author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
20 | #       year={2023},
21 | #       eprint={2308.09597},
22 | #       archivePrefix={arXiv},
23 | #       primaryClass={cs.CL}
24 | # }
25 | 
26 | from .ChatHaruhi import ChatHaruhi
27 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/mistral.py:
--------------------------------------------------------------------------------
 1 | from .BaseLLM import BaseLLM
 2 | import torch
 3 | from transformers import AutoTokenizer, AutoModelForCausalLM
 4 | import bitsandbytes, flash_attn
 5 | tokenizer_LLaMA = None
 6 | model_LLaMA = None
 7 | 
 8 | def initialize_Mistral():
 9 |     global model_LLaMA, tokenizer_LLaMA
10 | 
11 |     if model_LLaMA is None:
12 |         model_LLaMA = AutoModelForCausalLM.from_pretrained(
13 |             "mistralai/Mistral-7B-Instruct-v0.2",
14 |             torch_dtype=torch.float16,
15 |             device_map="auto"
16 |         )
17 | 
18 |     if tokenizer_LLaMA is None:
19 |         tokenizer_LLaMA = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", trust_remote_code=True)
20 | 
21 |     return model_LLaMA, tokenizer_LLaMA
22 | 
23 | def LLaMA_tokenizer(text):
24 |     return len(tokenizer_LLaMA.encode(text))
25 | 
26 | class ChatMistral(BaseLLM):
27 |     def __init__(self, model="Mistral"):
28 |         super(ChatMistral, self).__init__()
29 |         self.model, self.tokenizer = initialize_Mistral()
30 |         self.messages = ""
31 | 
32 |     def initialize_message(self):
33 |         self.messages = "[INST]"
34 | 
35 |     def ai_message(self, payload):
36 |         self.messages = self.messages + "\n " + payload 
37 | 
38 |     def system_message(self, payload):
39 |         self.messages = self.messages + "\n " + payload 
40 | 
41 |     def user_message(self, payload):
42 |         self.messages = self.messages + "\n " + payload 
43 | 
44 |     def get_response(self):
45 |         with torch.no_grad():
46 |             encodeds = self.tokenizer.encode(self.messages+"[/INST]", return_tensors="pt").to(self.model.device)  # keep the inputs on the same device as the model
47 |             generated_ids = self.model.generate(encodeds, max_new_tokens=2000, do_sample=True)
48 |             decoded = self.tokenizer.batch_decode(generated_ids)
49 | 
50 |         return decoded[0].split("[/INST]")[1]
51 |         
52 |     def print_prompt(self):
53 |         print(self.messages)
54 | 
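
`ChatMistral` builds its prompt as a single string wrapped in `[INST]` and `[/INST]`, then recovers the reply by splitting the decoded output on `[/INST]`. The following is a minimal sketch of that string handling without loading a model. The `build_prompt` and `extract_reply` helpers are hypothetical, and the decoded text is a hand-written stand-in for what the tokenizer might return.

```python
def build_prompt(turns):
    """Join all turns into one instruction block, as the class above does."""
    prompt = "[INST]"
    for turn in turns:
        prompt += "\n " + turn
    return prompt + "[/INST]"


def extract_reply(decoded):
    """Return the text after the first closing tag of the instruction block."""
    return decoded.split("[/INST]")[1]


prompt = build_prompt(["You are Haruhi.", "Hello"])
# Hypothetical decoded output: special tokens are still present because the
# decode step in the class above does not skip them.
decoded = "<s> " + prompt + " Hi there.</s>"
print(repr(extract_reply(decoded)))
```

Note that the split takes the segment after the first `[/INST]`, so a prompt that itself contains that tag would yield a truncated reply; stripping the end-of-sequence token is also left to the caller.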


--------------------------------------------------------------------------------
/code/ChatHaruhi/phi.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from .BaseLLM import BaseLLM
 3 | from transformers import AutoTokenizer, PhiForCausalLM
 4 | tokenizer_phi = None
 5 | model_phi = None
 6 | # Load model directly
 7 | def initialize_phi():
 8 |     global model_phi, tokenizer_phi
 9 | 
10 |     if model_phi is None:
11 |         model_phi = PhiForCausalLM.from_pretrained(
12 |             "cognitivecomputations/dolphin-2_6-phi-2",
13 |             local_files_only=True,
14 |             torch_dtype=torch.float16,
15 |             device_map="auto",
16 |         )
17 | 
18 |     if tokenizer_phi is None:
19 |         tokenizer_phi = AutoTokenizer.from_pretrained(
20 |             "cognitivecomputations/dolphin-2_6-phi-2", 
21 |             local_files_only=True,
22 |             use_fast=True,
23 |         )
24 |             
25 | 
26 | 
27 | 
28 |     return model_phi, tokenizer_phi
29 | 
30 | def LLaMA_tokenizer(text):
31 |     return len(tokenizer_phi.encode(text))
32 | 
33 | class Chatphi(BaseLLM):
34 |     def __init__(self, model="phi"):
35 |         super(Chatphi, self).__init__()
36 |         self.model, self.tokenizer = initialize_phi()
37 |         self.messages = ""
38 | 
39 |     def initialize_message(self):
40 |         self.messages = ""
41 | 
42 |     def ai_message(self, payload):
43 |         self.messages = self.messages + "\n " + payload 
44 | 
45 |     def system_message(self, payload):
46 |         self.messages = self.messages + "\n " + payload 
47 | 
48 |     def user_message(self, payload):
49 |         self.messages = self.messages + "\n " + payload 
50 | 
51 |     def get_response(self):
52 |         with torch.no_grad():
53 |             # Prepare the model input with attention mask
54 |             inputs = self.tokenizer(self.messages, return_tensors="pt", padding=True, truncation=True).to(self.model.device)  # keep the inputs on the same device as the model
55 |             attention_mask = inputs['attention_mask']
56 |             
57 |             # Generate the model output using the prepared input and attention mask
58 |             outputs = self.model.generate(input_ids=inputs['input_ids'], attention_mask=attention_mask, max_length=114514)
59 |             response = self.tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
60 |             
61 |         return response
62 | 
63 |         
64 |     def print_prompt(self):
65 |         print(type(self.messages))
66 |         print(self.messages)
67 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/qwen.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from .BaseLLM import BaseLLM
 3 | from transformers import AutoTokenizer, AutoModelForCausalLM
 4 | import pdb
 5 | tokenizer_qwen = None
 6 | model_qwen = None
 7 | # Load model directly
 8 | def initialize_qwen():
 9 |     global model_qwen, tokenizer_qwen
10 | 
11 |     if model_qwen is None:
12 |         model_qwen = AutoModelForCausalLM.from_pretrained(
13 |             "Qwen/Qwen-7B-Chat",
14 |             torch_dtype=torch.float16,
15 |             device_map="auto",
16 |             trust_remote_code=True
17 |         )
18 | 
19 |     if tokenizer_qwen is None:
20 |         tokenizer_qwen = AutoTokenizer.from_pretrained(
21 |             "Qwen/Qwen-7B-Chat", 
22 |             use_fast=True,
23 |             trust_remote_code=True
24 |         )
25 |             
26 | 
27 | 
28 | 
29 |     return model_qwen, tokenizer_qwen
30 | 
31 | def LLaMA_tokenizer(text):
32 |     return len(tokenizer_qwen.encode(text))
33 | 
34 | class ChatQwen(BaseLLM):
35 |     def __init__(self, model="qwen7b"):
36 |         super(ChatQwen, self).__init__()
37 |         self.model, self.tokenizer = initialize_qwen()
38 |         self.messages = ""
39 | 
40 |     def initialize_message(self):
41 |         self.messages = ""
42 | 
43 |     def ai_message(self, payload):
44 |         self.messages = self.messages + "\n " + payload 
45 | 
46 |     def system_message(self, payload):
47 |         self.messages = self.messages + "\n " + payload 
48 | 
49 |     def user_message(self, payload):
50 |         self.messages = self.messages + "\n " + payload 
51 | 
52 |     def get_response(self):
53 |         with torch.no_grad():
54 |             response, history = self.model.chat(self.tokenizer, self.messages, history=[])
55 |             # print(response)
56 |         return response
57 |         
58 |     def print_prompt(self):
59 |         print(type(self.messages))
60 |         print(self.messages)
61 | 


--------------------------------------------------------------------------------
/code/ChatHaruhi/role_name_to_file.py:
--------------------------------------------------------------------------------
 1 | # ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
 2 | #
 3 | # ChatHaruhi 2.0, built by Cheng Li and Weishi Mi
 4 | #
 5 | # chengli.thu@gmail.com, mws22@mails.tsinghua.edu.cn
 6 | # 
 7 | # Weishi Mi is a second-year graduate student at Tsinghua University, majoring in computer science.
 8 | # Weishi Mi is pursuing a job or a PhD position and will be available next year
 9 | # 
10 | # homepage https://github.com/LC1332/Chat-Haruhi-Suzumiya
11 | # 
12 | # ChatHaruhi is a chatbot that can revive anime characters in reality.
13 | # the 2.0 version was built by Cheng Li and Weishi Mi.
14 | # 
15 | # Please cite our paper if you use this code for research: 
16 | #
17 | # @misc{li2023chatharuhi,
18 | #       title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
19 | #       author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
20 | #       year={2023},
21 | #       eprint={2308.09597},
22 | #       archivePrefix={arXiv},
23 | #       primaryClass={cs.CL}
24 | # }
25 | # 
26 | # If you add a new character, please add its role name here
27 | # 
28 | 
29 | role_name_Haruhiu = {'汤师爷': 'tangshiye', 'tangshiye': 'tangshiye', 'Tangshiye': 'tangshiye', 
30 |                      '慕容复': 'murongfu', 'murongfu': 'murongfu', 'Murongfu': 'murongfu', 
31 |                      '李云龙': 'liyunlong', 'liyunlong': 'liyunlong', 'Liyunlong': 'liyunlong', 
32 |                      'Luna': 'Luna', '王多鱼': 'wangduoyu', 'wangduoyu': 'wangduoyu', 
33 |                      'Wangduoyu': 'wangduoyu', 'Ron': 'Ron', '鸠摩智': 'jiumozhi', 
34 |                      'jiumozhi': 'jiumozhi', 'Jiumozhi': 'jiumozhi', 'Snape': 'Snape', 
35 |                      '凉宫春日': 'haruhi', 'haruhi': 'haruhi', 'Haruhi': 'haruhi', 
36 |                      'Malfoy': 'Malfoy', '虚竹': 'xuzhu', 'xuzhu': 'xuzhu', 
37 |                      'Xuzhu': 'xuzhu', '萧峰': 'xiaofeng', 
38 |                      'xiaofeng': 'xiaofeng', 'Xiaofeng': 'xiaofeng', '段誉': 'duanyu', 
39 |                      'duanyu': 'duanyu', 'Duanyu': 'duanyu', 'Hermione': 'Hermione', 
40 |                      'Dumbledore': 'Dumbledore', '王语嫣': 'wangyuyan', 
41 |                      'wangyuyan': 'wangyuyan', 'Wangyuyan': 'wangyuyan', 'Harry': 'Harry', 
42 |                      'McGonagall': 'McGonagall', '白展堂': 'baizhantang', 
43 |                      'baizhantang': 'baizhantang', 'Baizhantang': 'baizhantang', 
44 |                      '佟湘玉': 'tongxiangyu', 'tongxiangyu': 'tongxiangyu', 
45 |                      'Tongxiangyu': 'tongxiangyu', '郭芙蓉': 'guofurong', 
46 |                      'guofurong': 'guofurong', 'Guofurong': 'guofurong', '流浪者': 'wanderer', 
47 |                      'wanderer': 'wanderer', 'Wanderer': 'wanderer', '钟离': 'zhongli', 
48 |                      'zhongli': 'zhongli', 'Zhongli': 'zhongli', '胡桃': 'hutao', 'hutao': 'hutao', 
49 |                      'Hutao': 'hutao', 'Sheldon': 'Sheldon', 'Raj': 'Raj', 
50 |                      'Penny': 'Penny', '韦小宝': 'weixiaobao', 'weixiaobao': 'weixiaobao', 
51 |                      'Weixiaobao': 'weixiaobao', '乔峰': 'qiaofeng', 'qiaofeng': 'qiaofeng', 
52 |                      'Qiaofeng': 'qiaofeng', '神里绫华': 'ayaka', 'ayaka': 'ayaka', 
53 |                      'Ayaka': 'ayaka', '雷电将军': 'raidenShogun', 'raidenShogun': 'raidenShogun', 
54 |                      'RaidenShogun': 'raidenShogun', '于谦': 'yuqian', 'yuqian': 'yuqian', 
55 |                      'Yuqian': 'yuqian', 'Professor McGonagall': 'McGonagall', 
56 |                      'Professor Dumbledore': 'Dumbledore'}
57 | 
58 | def get_en_role_name( role_name ):
59 |     if role_name in role_name_Haruhiu:
60 |         return role_name_Haruhiu[role_name]
61 |     else:
62 |         return "haruhi"
63 | 
64 | # Input: role_name (nicknames are also allowed)
65 | # Output: folder_role_name and url, where url = f'https://github.com/LC1332/Haruhi-2-Dev/raw/main/data/character_in_zip/{folder_role_name}.zip'
66 | def get_folder_role_name(role_name):
67 |     if role_name in role_name_Haruhiu:
68 |         folder_role_name = role_name_Haruhiu[role_name]
69 |         url = f'https://github.com/LC1332/Haruhi-2-Dev/raw/main/data/character_in_zip/{folder_role_name}.zip'
70 |         return folder_role_name, url
71 |     else:
72 |         print('role_name {} not found, using haruhi as default'.format(role_name))
73 |         return get_folder_role_name('haruhi')
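
The lookup above maps any known alias to a folder name and falls back to `haruhi` for unknown names. The following is a minimal sketch of the same pattern using `dict.get`. The `ALIASES` table is a small hypothetical subset of the real mapping, and `resolve` is a hypothetical name.

```python
# Hypothetical subset of the alias table, for illustration only.
ALIASES = {
    "凉宫春日": "haruhi",
    "haruhi": "haruhi",
    "Haruhi": "haruhi",
    "Snape": "Snape",
}

BASE_URL = "https://github.com/LC1332/Haruhi-2-Dev/raw/main/data/character_in_zip"


def resolve(role_name, default="haruhi"):
    """Return (folder_name, zip_url), using the default folder for unknown names."""
    folder = ALIASES.get(role_name, default)
    return folder, f"{BASE_URL}/{folder}.zip"


print(resolve("凉宫春日"))
print(resolve("Unknown Character"))
```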


--------------------------------------------------------------------------------
/code/ChatHaruhi/utils.py:
--------------------------------------------------------------------------------
  1 | from argparse import Namespace
  2 | 
  3 | # from openai import OpenAI
  4 | 
  5 | # client = OpenAI(api_key=<YOUR OPENAI API KEY>)
  6 | 
  7 | from transformers import AutoModel, AutoTokenizer
  8 | import torch
  9 | import random
 10 | 
 11 | import tiktoken
 12 | import re
 13 | 
 14 | import numpy as np
 15 | 
 16 | import base64
 17 | import struct
 18 | 
 19 | import os
 20 | 
 21 | import tqdm
 22 | 
 23 | import requests
 24 | 
 25 | 
 26 | 
 27 | def get_access_token():
 28 | 	"""
 29 | 	Generate an authentication signature (access token) from the AK and SK.
 30 | 	:return: the access token as a string, or the string "None" if the response has no token
 31 | 	"""
 32 | 	API_KEY = os.getenv("StoryAudit_API_AK")
 33 | 	SECRET_KEY = os.getenv("StoryAudit_API_SK")
 34 | 
 35 | 	url = "https://aip.baidubce.com/oauth/2.0/token"
 36 | 	params = {"grant_type": "client_credentials", "client_id": API_KEY, "client_secret": SECRET_KEY}
 37 | 	return str(requests.post(url, params=params).json().get("access_token"))
 38 | 
 39 | '''
 40 | Text moderation endpoint
 41 | '''
 42 | def text_censor(text):
 43 | 	request_url = "https://aip.baidubce.com/rest/2.0/solution/v1/text_censor/v2/user_defined"
 44 | 
 45 | 	params = {"text":text}
 46 | 	access_token = get_access_token()
 47 | 	request_url = request_url + "?access_token=" + access_token
 48 | 	headers = {'content-type': 'application/x-www-form-urlencoded'}
 49 | 	response = requests.post(request_url, data=params, headers=headers)
 50 | 	return response.json()["conclusion"] == "合规"
 51 | 
 52 | def package_role( system_prompt, texts_path , embedding ):
 53 | 	datas = []
 54 | 
 55 | 	# For now there is only one embedding type, 'luotuo_openai'
 56 | 	embed_name = 'luotuo_openai'
 57 | 
 58 | 	datas.append({ 'text':system_prompt , embed_name:'system_prompt'})
 59 | 	datas.append({ 'text':'Reserve Config Setting Here' , embed_name:'config'})
 60 | 	
 61 | 
 62 | 	# debug_count = 3
 63 | 
 64 | 	# for file in os.listdir(texts_path):
 65 | 
 66 | 	files = os.listdir(texts_path)
 67 | 
 68 | 	for i in tqdm.tqdm(range(len(files))):
 69 | 		file = files[i]
 70 | 		# if file name end with txt
 71 | 		if file.endswith(".txt"):
 72 | 			file_path = os.path.join(texts_path, file)
 73 | 			with open(file_path, 'r', encoding='utf-8') as f:
 74 | 				current_str = f.read()
 75 | 				current_vec = embedding(current_str)
 76 | 				encode_vec = float_array_to_base64(current_vec)
 77 | 				datas.append({ 'text':current_str , embed_name:encode_vec})
 78 | 
 79 | 				# debug_count -= 1
 80 | 				# if debug_count == 0:
 81 | 				#     break
 82 | 	return datas
 83 | 
 84 | 
 85 | import struct
 86 | 
 87 | def string_to_base64(text):
 88 | 	byte_array = b''
 89 | 	for char in text:
 90 | 		num_bytes = char.encode('utf-8')
 91 | 		byte_array += num_bytes
 92 | 
 93 | 	base64_data = base64.b64encode(byte_array)
 94 | 	return base64_data.decode('utf-8')
 95 | 
 96 | def base64_to_string(base64_data):
 97 | 	byte_array = base64.b64decode(base64_data)
 98 | 	text = byte_array.decode('utf-8')
 99 | 	return text
100 | 
101 | 
102 | def float_array_to_base64(float_arr):
103 | 	
104 | 	byte_array = b''
105 | 	
106 | 	for f in float_arr:
107 | 		# Pack each float into 4 bytes
108 | 		num_bytes = struct.pack('!f', f)  
109 | 		byte_array += num_bytes
110 | 	
111 | 	# Base64-encode the byte array
112 | 	base64_data = base64.b64encode(byte_array)
113 | 	
114 | 	return base64_data.decode('utf-8')
115 | 
116 | def base64_to_float_array(base64_data):
117 | 
118 | 	byte_array = base64.b64decode(base64_data)
119 | 	
120 | 	float_array = []
121 | 	
122 | 	# Parse every 4 bytes as one float
123 | 	for i in range(0, len(byte_array), 4):
124 | 		num = struct.unpack('!f', byte_array[i:i+4])[0] 
125 | 		float_array.append(num)
126 | 
127 | 	return float_array
128 | 
129 | 
130 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
131 | 
132 | _luotuo_model = None
133 | 
134 | _luotuo_model_en = None
135 | _luotuo_en_tokenizer = None
136 | 
137 | _enc_model = None
138 | 
139 | # ======== add bge_zh model
140 | # by Cheng Li
141 | # This time we try to support more models at once
142 | 
143 | _model_pool = {}
144 | _tokenizer_pool = {}
145 | 
146 | # BAAI/bge-small-zh-v1.5
147 | 
148 | def get_general_embeddings( sentences , model_name = "BAAI/bge-small-zh-v1.5" ):
149 | 
150 | 	global _model_pool
151 | 	global _tokenizer_pool
152 | 
153 | 	if model_name not in _model_pool:
154 | 		from transformers import AutoTokenizer, AutoModel
155 | 		_tokenizer_pool[model_name] = AutoTokenizer.from_pretrained(model_name)
156 | 		_model_pool[model_name] = AutoModel.from_pretrained(model_name)
157 | 
158 | 	_model_pool[model_name].eval()
159 | 
160 | 	# Tokenize sentences
161 | 	encoded_input = _tokenizer_pool[model_name](sentences, padding=True, truncation=True, return_tensors='pt', max_length = 512)
162 | 
163 | 	# Compute token embeddings
164 | 	with torch.no_grad():
165 | 		model_output = _model_pool[model_name](**encoded_input)
166 | 		# Perform pooling. In this case, cls pooling.
167 | 		sentence_embeddings = model_output[0][:, 0]
168 | 
169 | 	# normalize embeddings
170 | 	sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
171 | 	return sentence_embeddings.cpu().tolist()
172 | 
173 | def get_general_embedding( text_or_texts , model_name = "BAAI/bge-small-zh-v1.5" ):
174 | 	if isinstance(text_or_texts, str):
175 | 		return get_general_embeddings([text_or_texts], model_name)[0]
176 | 	else:
177 | 		return get_general_embeddings_safe(text_or_texts, model_name)
178 | 	
179 | general_batch_size = 16
180 | 
181 | import math
182 | 
183 | def get_general_embeddings_safe(sentences, model_name = "BAAI/bge-small-zh-v1.5"):
184 | 	
185 | 	embeddings = []
186 | 	
187 | 	num_batches = math.ceil(len(sentences) / general_batch_size)
188 | 	
189 | 	for i in tqdm.tqdm( range(num_batches) ):
190 | 		# print("run bge with batch ", i)
191 | 		start_index = i * general_batch_size
192 | 		end_index = min(len(sentences), start_index + general_batch_size)
193 | 		batch = sentences[start_index:end_index]
194 | 		embs = get_general_embeddings(batch, model_name)
195 | 		embeddings.extend(embs)
196 | 		
197 | 	return embeddings
198 | 
199 | def get_bge_zh_embedding( text_or_texts ):
200 | 	return get_general_embedding(text_or_texts, "BAAI/bge-small-zh-v1.5")
201 | 
202 | ## TODO: refactor the bge_en code to reuse the general functions
203 | 
204 | # ======== add bge model
205 | # by Cheng Li
206 | # for English only right now
207 | 
208 | _bge_model = None
209 | _bge_tokenizer = None
210 | 
211 | def get_bge_embeddings( sentences ):
212 | 	# unsafe: the caller is responsible for keeping the batch size manageable
213 | 
214 | 	global _bge_model
215 | 	global _bge_tokenizer
216 | 
217 | 	if _bge_model is None:
218 | 		from transformers import AutoTokenizer, AutoModel
219 | 		_bge_tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
220 | 		_bge_model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
221 | 
222 | 	_bge_model.eval()
223 | 
224 | 	# Tokenize sentences
225 | 	encoded_input = _bge_tokenizer(sentences, padding=True, truncation=True, return_tensors='pt', max_length = 512)
226 | 
227 | 	# Compute token embeddings
228 | 	with torch.no_grad():
229 | 		model_output = _bge_model(**encoded_input)
230 | 		# Perform pooling. In this case, cls pooling.
231 | 		sentence_embeddings = model_output[0][:, 0]
232 | 	# normalize embeddings
233 | 	sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
234 | 	return sentence_embeddings.cpu().tolist()
235 | 
236 | def get_bge_embedding( text_or_texts ):
237 | 	if isinstance(text_or_texts, str):
238 | 		return get_bge_embeddings([text_or_texts])[0]
239 | 	else:
240 | 		return get_bge_embeddings_safe(text_or_texts)
241 | 
242 | bge_batch_size = 32
243 | 
244 | import math
245 | # from tqdm import tqdm
246 | 
247 | def get_bge_embeddings_safe(sentences):
248 | 	
249 | 	embeddings = []
250 | 	
251 | 	num_batches = math.ceil(len(sentences) / bge_batch_size)
252 | 	
253 | 	for i in tqdm.tqdm( range(num_batches) ):
254 | 		# print("run bge with batch ", i)
255 | 		start_index = i * bge_batch_size
256 | 		end_index = min(len(sentences), start_index + bge_batch_size)
257 | 		batch = sentences[start_index:end_index]
258 | 		embs = get_bge_embeddings(batch)
259 | 		embeddings.extend(embs)
260 | 		
261 | 	return embeddings
262 | 
263 | # === add bge model
264 | 
265 | def tiktokenizer( text ):
266 | 	global _enc_model
267 | 
268 | 	if _enc_model is None:
269 | 		_enc_model = tiktoken.get_encoding("cl100k_base")
270 | 
271 | 	return len(_enc_model.encode(text))
272 | 	
273 | def response_postprocess(text,dialogue_bra_token = '「',dialogue_ket_token = '」'):
274 | 	lines = text.split('\n')
275 | 	new_lines = ""
276 | 
277 | 	first_name = None
278 | 
279 | 	for line in lines:
280 | 		line = line.strip(" ")
281 | 		match = re.match(r'^(.*?)[:：]' + dialogue_bra_token + r"(.*?)" + dialogue_ket_token + r"$", line)
282 | 
283 | 		
284 | 		if match:
285 | 			curr_name = match.group(1)
286 | 			# print(curr_name)
287 | 			if first_name is None:
288 | 				first_name = curr_name
289 | 				new_lines += (match.group(2))
290 | 			else:
291 | 				if curr_name != first_name:
292 | 					return first_name + ":" + dialogue_bra_token +  new_lines + dialogue_ket_token
293 | 				else:
294 | 					new_lines += (match.group(2))
295 | 			
296 | 		else:
297 | 			if first_name is None:
298 | 				return text
299 | 			else:
300 | 				return first_name + ":" + dialogue_bra_token +  new_lines + dialogue_ket_token
301 | 	return first_name + ":" + dialogue_bra_token + new_lines + dialogue_ket_token
302 | 
303 | def download_models():
304 | 	print("Downloading Luotuo-Bert")
305 | 	# Import our models. The package will take care of downloading the models automatically
306 | 	model_args = Namespace(do_mlm=None, pooler_type="cls", temp=0.05, mlp_only_train=False,
307 | 						   init_embeddings_model=None)
308 | 	model = AutoModel.from_pretrained("silk-road/luotuo-bert-medium", trust_remote_code=True, model_args=model_args).to(
309 | 		device)
310 | 	print("Luotuo-Bert download complete")
311 | 	return model
312 | 
313 | def get_luotuo_model():
314 | 	global _luotuo_model
315 | 	if _luotuo_model is None:
316 | 		_luotuo_model = download_models()
317 | 	return _luotuo_model
318 | 
319 | 
320 | def luotuo_embedding(model, texts):
321 | 	# Tokenize the texts_source
322 | 	tokenizer = AutoTokenizer.from_pretrained("silk-road/luotuo-bert-medium")
323 | 	inputs = tokenizer(texts, padding=True, truncation=False, return_tensors="pt")
324 | 	inputs = inputs.to(device)
325 | 	# Extract the embeddings
326 | 	# Get the embeddings
327 | 	with torch.no_grad():
328 | 		embeddings = model(**inputs, output_hidden_states=True, return_dict=True, sent_emb=True).pooler_output
329 | 	return embeddings
330 | 
331 | def luotuo_en_embedding( texts ):
332 | 	# this function implemented by Cheng
333 | 	global _luotuo_model_en
334 | 	global _luotuo_en_tokenizer
335 | 
336 | 	if _luotuo_model_en is None:
337 | 		_luotuo_en_tokenizer = AutoTokenizer.from_pretrained("silk-road/luotuo-bert-en")
338 | 		_luotuo_model_en = AutoModel.from_pretrained("silk-road/luotuo-bert-en").to(device)
339 | 
340 | 	if _luotuo_en_tokenizer is None:
341 | 		_luotuo_en_tokenizer = AutoTokenizer.from_pretrained("silk-road/luotuo-bert-en")
342 | 
343 | 	inputs = _luotuo_en_tokenizer(texts, padding=True, truncation=False, return_tensors="pt")
344 | 	inputs = inputs.to(device)
345 | 
346 | 	with torch.no_grad():
347 | 		embeddings = _luotuo_model_en(**inputs, output_hidden_states=True, return_dict=True, sent_emb=True).pooler_output
348 | 		
349 | 	return embeddings
350 | 
351 | 
352 | def get_embedding_for_chinese(model, texts):
353 | 	model = model.to(device)
354 | 	# str or strList
355 | 	texts = texts if isinstance(texts, list) else [texts]
356 | 	# Truncate each text to 510 characters
357 | 	for i in range(len(texts)):
358 | 		if len(texts[i]) > 510:
359 | 			texts[i] = texts[i][:510]
360 | 	if len(texts) >= 64:
361 | 		embeddings = []
362 | 		chunk_size = 64
363 | 		for i in range(0, len(texts), chunk_size):
364 | 			embeddings.append(luotuo_embedding(model, texts[i: i + chunk_size]))
365 | 		return torch.cat(embeddings, dim=0)
366 | 	else:
367 | 		return luotuo_embedding(model, texts)
368 | 
369 | 
370 | def is_chinese_or_english(text):
371 | 	# no longer use online openai api
372 | 	return "chinese"
373 | 
374 | 	text = list(text)
375 | 	is_chinese, is_english = 0, 0
376 | 
377 | 	for char in text:
378 | 		# Check whether the character's Unicode value falls within the Chinese character range
379 | 		if '\u4e00' <= char <= '\u9fa5':
380 | 			is_chinese += 4
381 | 		# Check whether the character is an English letter (upper or lower case)
382 | 		elif ('\u0041' <= char <= '\u005a') or ('\u0061' <= char <= '\u007a'):
383 | 			is_english += 1
384 | 	if is_chinese >= is_english:
385 | 		return "chinese"
386 | 	else:
387 | 		return "english"
388 | 
389 | 
390 | def get_embedding_openai(text, model="text-embedding-ada-002"):
391 | 	from openai import OpenAI  # imported lazily; OpenAI() reads OPENAI_API_KEY from the environment
392 | 	return OpenAI().embeddings.create(input = [text.replace("\n", " ")], model=model).data[0].embedding
393 | 
394 | def get_embedding_for_english(text, model="text-embedding-ada-002"):
395 | 	# Same behavior as get_embedding_openai; kept under a separate name for existing callers
396 | 	return get_embedding_openai(text, model)
397 | 
398 | import os
399 | 
400 | def luotuo_openai_embedding(texts, is_chinese= None ):
401 | 	"""
402 | 		when input is chinese, use luotuo_embedding
403 | 		when input is english, use openai_embedding
404 | 		texts can be a list or a string
405 | 		when texts is a list, return a list of embeddings, using batch inference
406 | 		when texts is a string, return a single embedding
407 | 	"""
408 | 
409 | 	openai_key = os.environ.get("OPENAI_API_KEY")
410 | 
411 | 	if isinstance(texts, list):
412 | 		index = random.randint(0, len(texts) - 1)
413 | 		if openai_key is None or is_chinese_or_english(texts[index]) == "chinese":
414 | 			return [embed.cpu().tolist() for embed in get_embedding_for_chinese(get_luotuo_model(), texts)]
415 | 		else:
416 | 			return [get_embedding_for_english(text) for text in texts]
417 | 	else:
418 | 		if openai_key is None or is_chinese_or_english(texts) == "chinese":
419 | 			return get_embedding_for_chinese(get_luotuo_model(), texts)[0].cpu().tolist()
420 | 		else:
421 | 			return get_embedding_for_english(texts)
422 | 
423 | 
424 | # compute cosine similarity between two vector
425 | def get_cosine_similarity( v1, v2):
426 | 	v1 = torch.tensor(v1).to(device)
427 | 	v2 = torch.tensor(v2).to(device)
428 | 	return torch.cosine_similarity(v1, v2, dim=0).item()
429 | 
430 | 
431 | 
432 | import pickle
433 | 
434 | 
435 | cache_sign = True
436 | 
437 | cache = None 
438 | def cached(func):
439 | 	def wrapper(*args, **kwargs):	
440 | 
441 | 		global cache
442 | 		cache_path = 'rpa_cache.pkl'
443 | 		if cache is None:
444 | 			if not os.path.exists(cache_path):
445 | 				cache = {}
446 | 			else:
447 | 				cache = pickle.load(open(cache_path, 'rb'))  
448 | 
449 | 		key = ( func.__name__, str([args[0].role_name, args[0].__class__, args[0].llm_type , args[0].dialogue_history]), str(kwargs.items()))
450 | 		
451 | 		if (cache_sign and key in cache and cache[key] not in [None, '[TOKEN LIMIT]']) :
452 | 			return cache[key]
453 | 		else:
454 | 			
455 | 			result = func(*args, **kwargs)
456 | 			if result != 'busy' and result is not None:
457 | 				cache[key] = result
458 | 				pickle.dump(cache, open(cache_path, 'wb'))
459 | 			return result
460 | 
461 | 	return wrapper
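
The `cached` decorator above keys each result on the function name plus the agent's role name, class, LLM type and dialogue history, persists the cache to `rpa_cache.pkl`, and refuses to store `None` or `'busy'`. The same pattern can be sketched in a self-contained form. The names `disk_cached` and `slow_square` and the temporary cache path are hypothetical, chosen only for illustration; this is not the repository's implementation.

```python
import os
import pickle
import tempfile


def disk_cached(cache_path, skip_values=(None, 'busy')):
    """Hypothetical helper: memoize results of a function in a pickle file."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Load the cache from disk if it exists, otherwise start empty.
            cache = {}
            if os.path.exists(cache_path):
                with open(cache_path, 'rb') as f:
                    cache = pickle.load(f)

            # Build a hashable key from the function name and its arguments.
            key = (func.__name__, repr(args), repr(sorted(kwargs.items())))
            if key in cache:
                return cache[key]

            result = func(*args, **kwargs)
            # Only persist results worth keeping (e.g. not a 'busy' reply).
            if result not in skip_values:
                cache[key] = result
                with open(cache_path, 'wb') as f:
                    pickle.dump(cache, f)
            return result
        return wrapper
    return decorator


calls = []  # records how often the wrapped function really runs
cache_file = os.path.join(tempfile.mkdtemp(), 'demo_cache.pkl')


@disk_cached(cache_file)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and written to disk
second = slow_square(4)  # served from the pickle file
```

Unlike the original, this sketch opens the pickle file in `with` blocks so the handles are closed, and it reloads the cache on every call rather than keeping a module-level global.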


--------------------------------------------------------------------------------
/code/PDB_character_search.py:
--------------------------------------------------------------------------------
  1 | import requests
  2 | from bs4 import BeautifulSoup
  3 | from msedge.selenium_tools import EdgeOptions
  4 | from msedge.selenium_tools import Edge
  5 | import json
  6 | edge_options = EdgeOptions()
  7 | edge_options.use_chromium = True  # if we miss this line, we can't make Edge headless
  8 | # Slightly different from Chrome: 'headless' and 'disable-gpu' are passed without the two leading dashes
  9 | edge_options.add_argument('headless')
 10 | edge_options.add_argument('disable-gpu')
 11 | edge_options.add_experimental_option('excludeSwitches', ['enable-logging'])  # This line hides the DevTools console messages; it must be set before the driver is created
 12 | driver = Edge(executable_path='msedgedriver.exe', options=edge_options)
 13 | 
 14 | 
 15 | def get_character_id(character_name):
 16 |     url = 'https://www.personality-database.com/search?keyword='
 17 |     url+=character_name.replace(" ","%20")
 18 |     # Load the search results page in the headless browser
 19 |     driver.get(url)
 20 |     driver.implicitly_wait(10)
 21 |     # Collect the profile cards; an empty list means the search found nothing
 22 |     link=driver.find_elements_by_class_name("profile-card-link")
 23 |     if(link==[]):
 24 |         return None
 25 |     target=[]
 26 |     for item in link:
 27 |         # Parse the HTML of this profile card
 28 |         html_content = item.get_attribute('outerHTML')
 29 |         soup = BeautifulSoup(html_content, 'html.parser')
 30 |         # Find the 'href' attribute of the 'a' tag
 31 |         profile_link = soup.find('a', class_='profile-card-link')['href']
 32 |         # Extract the profile number (e.g. 202055) from the href attribute
 33 |         print(profile_link)
 34 |         if(character_name.split(" ")[0].lower() in profile_link):
 35 |             profile_number = profile_link.split('/')[2]
 36 |             target.append(profile_number)
 37 |     # Return the first matching profile number, or None when nothing matched
 38 |     if(target==[]):
 39 |         print("None")
 40 |         return None
 41 |     return target[0]
 42 | 
 43 | 
 44 | 
 45 | def get_character_info(id,character_name):
 46 |     if id is None:
 47 |         return None
 48 |     total={}
 49 |     url="https://api.personality-database.com/api/v1/profile/"
 50 |     url+=str(id)
 51 |     response = requests.get(url)
 52 |     json_data=response.json()
 53 |     result={}
 54 |     result["character"]=json_data["mbti_profile"]
 55 |     result["source"]=json_data["subcategory"]  
 56 |     result["description"]=json_data["wiki_description"]
 57 |     result["personality summary"]=json_data["personality_type"]
 58 |     result["watch"]=json_data["watch_count"]
 59 |     
 60 |     ph={}
 61 |     function=json_data["functions"]
 62 |     mbti_letter=json_data["mbti_letter_stats"]
 63 | 
 64 |     ph["function"]=function
 65 |     ph["MBTI"]={}
 66 |     if(mbti_letter!=[]):
 67 |         ph["MBTI"][mbti_letter[0]["type"]]=mbti_letter[0]["PercentageFloat"]
 68 |         ph["MBTI"][mbti_letter[1]["type"]]=mbti_letter[1]["PercentageFloat"]
 69 |         ph["MBTI"][mbti_letter[2]["type"]]=mbti_letter[2]["PercentageFloat"]
 70 |         ph["MBTI"][mbti_letter[3]["type"]]=mbti_letter[3]["PercentageFloat"]
 71 |     result["personality highlights"]=ph
 72 |     result["personality details"]={}
 73 |     lst=[]
 74 |     temp={}
 75 |     for items in json_data["systems"]:
 76 |         total[items["id"]]=(items["system_vote_count"])
 77 |         temp[items["id"]]=items["system_name"]
 78 |         lst.append(items["id"])
 79 |     for i in lst:
 80 |         tmp={}
 81 |         items=json_data["breakdown_systems"][str(i)]
 82 |         for j in items:
 83 |             tmp[j["personality_type"]]=j["theCount"]
 84 |         result["personality details"][temp[i]]=tmp
 85 |     
 86 |     with open("characters/"+character_name.replace(" ","")+".json", 'w+',encoding="utf-8") as json_file:
 87 |         json.dump(result, json_file)
 88 |     return result  # return the collected profile so callers (and the print below) see it
 89 | print(get_character_id("socrates"))
 90 | print(get_character_info(get_character_id("socrates"),"Socrates"))
 91 | '''# Get all href URLs from the website
 92 | file_path = "character_aliases.txt"
 93 | 
 94 | with open(file_path, 'r',encoding="utf-8") as file:
 95 |     aliases = file.readlines()
 96 |     aliases = [alias.strip() for alias in aliases] 
 97 | 
 98 | for alias in aliases:
 99 |     get_character_info(get_character_id(alias),alias)'''
100 | 
101 | 
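
`get_character_id` builds the search URL by replacing spaces with `%20` and later takes the third `/`-separated piece of each `href` as the profile number. The two steps can be sketched without a browser, using `urllib.parse.quote` for the encoding so that non-ASCII names are also escaped. The helper names and the example `href` are hypothetical; the `/profile/<id>/...` layout is an assumption inferred from the `split('/')[2]` call above.

```python
from urllib.parse import quote

SEARCH_URL = 'https://www.personality-database.com/search?keyword='


def build_search_url(character_name):
    """Percent-encode the keyword (spaces become %20, non-ASCII is escaped)."""
    return SEARCH_URL + quote(character_name)


def profile_id_from_href(href):
    """Assumes hrefs shaped like '/profile/<id>/...' and returns <id> as a string."""
    return href.split('/')[2]


url = build_search_url("Sherlock Holmes")
# Hypothetical href, used only to illustrate the split.
profile_id = profile_id_from_href('/profile/202055/example-character')
```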


--------------------------------------------------------------------------------
/code/api_16personality.py:
--------------------------------------------------------------------------------
  1 | import json 
  2 | import copy
  3 | import requests
  4 | import pdb
  5 | 
  6 | payload_template = {"questions":[{"text":"You regularly make new friends.",
  7 |                          "answer":None},
  8 |                         {"text":"You spend a lot of your free time exploring various random topics that pique your interest.",
  9 |                          "answer":None},
 10 |                         {"text":"Seeing other people cry can easily make you feel like you want to cry too.",
 11 |                          "answer":None},
 12 |                         {"text":"You often make a backup plan for a backup plan.",
 13 |                          "answer":None},
 14 |                         {"text":"You usually stay calm, even under a lot of pressure.",
 15 |                          "answer":None},
 16 |                         {"text":"At social events, you rarely try to introduce yourself to new people and mostly talk to the ones you already know.",
 17 |                          "answer":None},
 18 |                         {"text":"You prefer to completely finish one project before starting another.",
 19 |                          "answer":None},
 20 |                         {"text":"You are very sentimental.",
 21 |                          "answer":None},
 22 |                         {"text":"You like to use organizing tools like schedules and lists.",
 23 |                          "answer":None},
 24 |                         {"text":"Even a small mistake can cause you to doubt your overall abilities and knowledge.",
 25 |                          "answer":None},
 26 |                         {"text":"You feel comfortable just walking up to someone you find interesting and striking up a conversation.",
 27 |                          "answer":None},
 28 |                         {"text":"You are not too interested in discussing various interpretations and analyses of creative works.",
 29 |                          "answer":None},
 30 |                         {"text":"You are more inclined to follow your head than your heart.",
 31 |                         "answer":None},
 32 |                         {"text":"You usually prefer just doing what you feel like at any given moment instead of planning a particular daily routine.",
 33 |                          "answer":None},
 34 |                         {"text":"You rarely worry about whether you make a good impression on people you meet.",
 35 |                          "answer":None},
 36 |                         {"text":"You enjoy participating in group activities.",
 37 |                          "answer":None},
 38 |                         {"text":"You like books and movies that make you come up with your own interpretation of the ending.",
 39 |                          "answer":None},
 40 |                         {"text":"Your happiness comes more from helping others accomplish things than your own accomplishments.",
 41 |                          "answer":None},
 42 |                         {"text":"You are interested in so many things that you find it difficult to choose what to try next.",
 43 |                          "answer":None},
 44 |                         {"text":"You are prone to worrying that things will take a turn for the worse.",
 45 |                          "answer":None},
 46 |                         {"text":"You avoid leadership roles in group settings.",
 47 |                          "answer":None},
 48 |                         {"text":"You are definitely not an artistic type of person.",
 49 |                          "answer":None},
 50 |                         {"text":"You think the world would be a better place if people relied more on rationality and less on their feelings.",
 51 |                          "answer":None},
 52 |                         {"text":"You prefer to do your chores before allowing yourself to relax.",
 53 |                          "answer":None},
 54 |                         {"text":"You enjoy watching people argue.",
 55 |                          "answer":None},
 56 |                         {"text":"You tend to avoid drawing attention to yourself.",
 57 |                          "answer":None},
 58 |                         {"text":"Your mood can change very quickly.",
 59 |                          "answer":None},
 60 |                         {"text":"You lose patience with people who are not as efficient as you.",
 61 |                          "answer":None},
 62 |                         {"text":"You often end up doing things at the last possible moment.",
 63 |                          "answer":None},
 64 |                         {"text":"You have always been fascinated by the question of what, if anything, happens after death.",
 65 |                          "answer":None},
 66 |                         {"text":"You usually prefer to be around others rather than on your own.",
 67 |                          "answer":None},
 68 |                         {"text":"You become bored or lose interest when the discussion gets highly theoretical.",
 69 |                          "answer":None},
 70 |                         {"text":"You find it easy to empathize with a person whose experiences are very different from yours.",
 71 |                          "answer":None},
 72 |                         {"text":"You usually postpone finalizing decisions for as long as possible.",
 73 |                          "answer":None},
 74 |                         {"text":"You rarely second-guess the choices that you have made.",
 75 |                          "answer":None},
 76 |                         {"text":"After a long and exhausting week, a lively social event is just what you need.",
 77 |                          "answer":None},
 78 |                         {"text":"You enjoy going to art museums.",
 79 |                          "answer":None},
 80 |                         {"text":"You often have a hard time understanding other people’s feelings.",
 81 |                          "answer":None},
 82 |                         {"text":"You like to have a to-do list for each day.",
 83 |                          "answer":None},
 84 |                         {"text":"You rarely feel insecure.",
 85 |                          "answer":None},
 86 |                         {"text":"You avoid making phone calls.",
 87 |                          "answer":None},
 88 |                         {"text":"You often spend a lot of time trying to understand views that are very different from your own.",
 89 |                          "answer":None},
 90 |                         {"text":"In your social circle, you are often the one who contacts your friends and initiates activities.",
 91 |                          "answer":None},
 92 |                         {"text":"If your plans are interrupted, your top priority is to get back on track as soon as possible.",
 93 |                          "answer":None},
 94 |                         {"text":"You are still bothered by mistakes that you made a long time ago.",
 95 |                          "answer":None},
 96 |                         {"text":"You rarely contemplate the reasons for human existence or the meaning of life.",
 97 |                          "answer":None},
 98 |                         {"text":"Your emotions control you more than you control them.",
 99 |                          "answer":None},
100 |                         {"text":"You take great care not to make people look bad, even when it is completely their fault.",
101 |                          "answer":None},
102 |                         {"text":"Your personal work style is closer to spontaneous bursts of energy than organized and consistent efforts.",
103 |                          "answer":None},
104 |                         {"text":"When someone thinks highly of you, you wonder how long it will take them to feel disappointed in you.",
105 |                          "answer":None},
106 |                         {"text":"You would love a job that requires you to work alone most of the time.",
107 |                          "answer":None},
108 |                         {"text":"You believe that pondering abstract philosophical questions is a waste of time.",
109 |                          "answer":None},
110 |                         {"text":"You feel more drawn to places with busy, bustling atmospheres than quiet, intimate places.",
111 |                          "answer":None},
112 |                         {"text":"You know at first glance how someone is feeling.",
113 |                          "answer":None},
114 |                         {"text":"You often feel overwhelmed.",
115 |                          "answer":None},
116 |                         {"text":"You complete things methodically without skipping over any steps.",
117 |                          "answer":None},
118 |                         {"text":"You are very intrigued by things labeled as controversial.",
119 |                          "answer":None},
120 |                         {"text":"You would pass along a good opportunity if you thought someone else needed it more.",
121 |                          "answer":None},
122 |                         {"text":"You struggle with deadlines.",
123 |                          "answer":None},
124 |                         {"text":"You feel confident that things will work out for you.",
125 |                          "answer":None}],
126 |                         "gender":None,"inviteCode":"","teamInviteKey":"","extraData":[]}
127 | 
128 | def judge_16(score_list):
129 |     code = ''
130 |     if score_list[0] >= 50:
131 |         code = code + 'E'
132 |     else:
133 |         code = code + 'I'
134 | 
135 |     if score_list[1] >= 50:
136 |         # Intuition: N, Observant: S
137 |         code = code + 'N'
138 |     else:
139 |         code = code + 'S'
140 | 
141 |     if score_list[2] >= 50:
142 |         code = code + 'T'
143 |     else:
144 |         code = code + 'F'
145 | 
146 |     if score_list[3] >= 50:
147 |         code = code + 'J'
148 |     else:
149 |         code = code + 'P'
150 | 
151 |     all_codes = ['ISTJ', 'ISTP', 'ISFJ', 'ISFP', 'INFJ', 'INFP', 'INTJ', 'INTP', 'ESTP', 'ESTJ', 'ESFP', 'ESFJ', 'ENFP', 'ENFJ', 'ENTP', 'ENTJ']
152 |     all_roles = ['Logistician', 'Virtuoso', 'Defender', 'Adventurer', 'Advocate', 'Mediator', 'Architect', 'Logician', 'Entrepreneur', 'Executive', 'Entertainer',
153 |                  'Consul', 'Campaigner', 'Protagonist', 'Debater', 'Commander']
154 |     for i in range(len(all_codes)):
155 |         if code == all_codes[i]:
156 |             cnt = i
157 |             break
158 | 
159 |     if score_list[4] >= 50:
160 |         code = code + '-A'
161 |     else:
162 |         code = code + '-T'
163 | 
164 |     return code, all_roles[cnt] 
165 | 
166 | def submit_16personality_api(Answers):
167 |     payload = copy.deepcopy(payload_template)
168 |     for index, A in enumerate(Answers):
169 |         payload['questions'][index]["answer"] = A
170 | 
171 |     
172 |     headers = {
173 |     "accept": "application/json, text/plain, */*",
174 |     "accept-encoding": "gzip, deflate, br",
175 |     "accept-language": "en,zh-CN;q=0.9,zh;q=0.8",
176 |     "content-length": "5708",
177 |     "content-type": "application/json",
178 |     "origin": "https://www.16personalities.com",
179 |     "referer": "https://www.16personalities.com/free-personality-test",
180 |     "sec-ch-ua": "'Not_A Brand';v='99', 'Google Chrome';v='109', 'Chromium';v='109'",
181 |     "sec-ch-ua-mobile": "?0",
182 |     "sec-ch-ua-platform": "Windows",
183 |     "sec-fetch-dest": "empty",
184 |     "sec-fetch-mode": "cors",
185 |     "sec-fetch-site": "same-origin",
186 |         'content-type': 'application/json',
187 |         'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36',}
188 |     
189 |     session = requests.session()
190 |     r = session.post('https://www.16personalities.com/test-results', data=json.dumps(payload), headers=headers)
191 | 
192 | 
193 |     a = r.headers['content-type']
194 |     b = r.encoding
195 |     #c = r.json()
196 | 
197 |     # Running the line above raises an error. Why?
198 | 
199 |     sess_r = session.get("https://www.16personalities.com/api/session")
200 | 
201 |     scores = sess_r.json()['user']['scores']
202 |     
203 |     ans1 = ''
204 |     session = requests.session()
205 | 
206 |     
207 | 
208 |     if sess_r.json()['user']['traits']['energy'] != 'Extraverted':
209 |         energy_value = 100 - (101 + scores[0]) // 2
210 |         ans1 += 'I'
211 |     else:
212 |         energy_value = (101 + scores[0]) // 2
213 |         ans1 += 'E'
214 |     if sess_r.json()['user']['traits']['mind'] != 'Intuitive':
215 |         mind_value = 100 - (101 + scores[1]) // 2
216 |         ans1 += 'S'
217 |     else:
218 |         mind_value = (101 + scores[1]) // 2
219 |         ans1 += 'N'
220 |     if sess_r.json()['user']['traits']['nature'] != 'Thinking':
221 |         nature_value = 100 - (101 + scores[2]) // 2
222 |         ans1 += 'F'
223 |     else:
224 |         nature_value = (101 + scores[2]) // 2
225 |         ans1 += 'T'
226 |     if sess_r.json()['user']['traits']['tactics'] != 'Judging':
227 |         tactics_value = 100 - (101 + scores[3]) // 2
228 |         ans1 += 'P'
229 |     else:
230 |         tactics_value = (101 + scores[3]) // 2
231 |         ans1 += 'J'
232 | 
233 |         
234 | 
235 |     if sess_r.json()['user']['traits']['identity'] != 'Assertive':
236 |         identity_value = 100 - (101 + scores[4]) // 2
237 |     else:
238 |         identity_value = (101 + scores[4]) // 2
239 |     
240 | 
241 |     # print('Trait:', sess_r.json()['user']['traits']['mind'], (101 + scores[0]) // 2)
242 |     # print('Trait:', sess_r.json()['user']['traits']['energy'], (101 + scores[1]) // 2)
243 |     # print('Trait:', sess_r.json()['user']['traits']['nature'], (101 + scores[2]) // 2)
244 |     # print('Trait:', sess_r.json()['user']['traits']['tactics'], (101 + scores[3]) // 2)
245 |     # print('Trait:', sess_r.json()['user']['traits']['identity'], (101 + scores[4]) // 2)
246 | 
247 |     # used
248 |     #print('Trait:', 'Extraverted (E)', mind_value, '|', 'Introverted (I)', 100 - mind_value)
249 |     #print('Trait:', 'Intuitive (N)', energy_value, '|', 'Observant (S)', 100 - energy_value)
250 |     #print('Trait:', 'Thinking (T)', nature_value, '|', 'Feeling (F)', 100 - nature_value)
251 |     #print('Trait:', 'Judging (J)', tactics_value, '|', 'Prospecting (P)', 100 - tactics_value)
252 |     #print('Trait:', 'Assertive (A)', identity_value, '|', 'Turbulent (T)', 100 - identity_value)
253 |     # print('Variant:', sess_r.json()['user']['traits'])
254 |     code, role = judge_16([energy_value, mind_value, nature_value, tactics_value, identity_value])
255 |     #print('Character:', sess_r.json()['user']['avatarFull'].split('avatars/')[1].split('.')[0])
256 |     #print('Dic. Judge:', code, role)
257 |     #print()
258 |     
259 |     ans2 = code[:4]
260 | 
261 |     assert(ans1 == ans2)
262 | 
263 |         
264 | 
265 |     return {
266 |         "E/I": {"result": ans1[0], "score": {"E": energy_value, "I": 100 - energy_value}},
267 |         "S/N": {"result": ans1[1], "score": {"S": 100 - mind_value, "N": mind_value}},
268 |         "T/F": {"result": ans1[2], "score": {"T": nature_value, "F": 100 - nature_value}},
269 |         "P/J": {"result": ans1[3], "score": {"P": 100 - tactics_value, "J": tactics_value}},
270 |     }                     
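
`judge_16` maps five percentage scores to a type code through a chain of `if`/`else` branches and two parallel lists, and `submit_16personality_api` converts each raw score with `(101 + score) // 2`. A table-driven sketch of the same logic is shown below. `judge_16_compact` and `pole_percentage` are hypothetical names; the code-to-role pairs are copied from `all_codes` and `all_roles` above.

```python
ROLES = {
    'ISTJ': 'Logistician', 'ISTP': 'Virtuoso', 'ISFJ': 'Defender', 'ISFP': 'Adventurer',
    'INFJ': 'Advocate', 'INFP': 'Mediator', 'INTJ': 'Architect', 'INTP': 'Logician',
    'ESTP': 'Entrepreneur', 'ESTJ': 'Executive', 'ESFP': 'Entertainer', 'ESFJ': 'Consul',
    'ENFP': 'Campaigner', 'ENFJ': 'Protagonist', 'ENTP': 'Debater', 'ENTJ': 'Commander',
}

# (letter when score >= 50, letter otherwise) for the first four dimensions.
LETTER_PAIRS = [('E', 'I'), ('N', 'S'), ('T', 'F'), ('J', 'P')]


def pole_percentage(raw_score, is_first_pole):
    """Convert a raw score to a 0-100 value for the first pole of a dimension."""
    value = (101 + raw_score) // 2
    return value if is_first_pole else 100 - value


def judge_16_compact(score_list):
    """Return (code with -A/-T variant, role name) for five percentage scores."""
    code = ''.join(
        high if score >= 50 else low
        for score, (high, low) in zip(score_list, LETTER_PAIRS)
    )
    variant = '-A' if score_list[4] >= 50 else '-T'
    return code + variant, ROLES[code]


example = judge_16_compact([60, 40, 50, 30, 70])
```

The dictionary lookup removes the need for the `cnt` index, which in the original would be unbound if no code matched.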


--------------------------------------------------------------------------------
/code/characteLLM.py:
--------------------------------------------------------------------------------
 1 | from transformers import AutoTokenizer, AutoModelForCausalLM
 2 | class ChracterLLM:
 3 |     def __init__(self, characterName):
 4 |         self.name=characterName
 5 |         self.ModelName=f"fnlp/character-llm-{characterName}-7b-wdiff"
 6 |     def ask(self,prompt):
 7 |        # Load model directly
 8 |         meta_prompt = """I want you to act like {character}. I want you to respond and answer like {character}, using the tone, manner and vocabulary {character} would use. You must know all of the knowledge of {character}. 
 9 | 
10 |         The status of you is as follows:
11 |         Location: {loc_time}
12 |         Status: {status}
13 | 
14 |         The interactions are as follows:"""
15 |         character=self.name
16 |         loc_time = "Coffee Shop - Afternoon"
17 |         status = f'{character} is casually chatting with a man from the 21st century.'
 18 |         prompt =  meta_prompt.format(character=self.name, loc_time=loc_time, status=status) + '\n\n' # NOTE: this overwrites the caller's `prompt`; an earlier draft was f'{meta_prompt}\n{prompt}\n\n'
19 | 
20 |         
21 |         tokenizer = AutoTokenizer.from_pretrained(self.ModelName)
22 |         model = AutoModelForCausalLM.from_pretrained(self.ModelName).cuda()
23 |         inputs = tokenizer([prompt], return_tensors="pt")
24 |         inputs.to('cuda')
25 |         outputs = model.generate(**inputs, do_sample=True, temperature=0.5, top_p=0.95, max_new_tokens=50)
26 |         response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 27 |         print(response)
 28 |         return response  # without this, print(bot.ask(...)) below prints None
29 | 
30 | 
31 | bot=ChracterLLM("cleopatra")
32 | 
33 | print(bot.ask("who are you?"))
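
The `ask` method fills a meta prompt with the character name, a location and a status line before calling the model. The prompt-building step can be sketched on its own, without loading any weights. `build_prompt` is a hypothetical helper, and appending the user's question as `f'{meta}\n{user_prompt}\n\n'` is an assumption taken from the commented-out f-string in the source, not a confirmed input format for the model.

```python
META_PROMPT = (
    "I want you to act like {character}. I want you to respond and answer like {character}, "
    "using the tone, manner and vocabulary {character} would use. "
    "You must know all of the knowledge of {character}.\n\n"
    "The status of you is as follows:\n"
    "Location: {loc_time}\n"
    "Status: {status}\n\n"
    "The interactions are as follows:"
)


def build_prompt(character, user_prompt, loc_time="Coffee Shop - Afternoon", status=None):
    """Fill the meta prompt and append the user's message (assumed layout)."""
    if status is None:
        status = f"{character} is casually chatting with a man from the 21st century."
    meta = META_PROMPT.format(character=character, loc_time=loc_time, status=status)
    return f"{meta}\n{user_prompt}\n\n"


prompt = build_prompt("Cleopatra", "Who are you?")
```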


--------------------------------------------------------------------------------
/code/characters.py:
--------------------------------------------------------------------------------
 1 | import json
 2 | 
 3 | # load character_info from ../data/characters.json
 4 | with open('../data/characters.json', 'r', encoding='utf-8') as f:
 5 |     character_info = json.load(f)
 6 | 
 7 | alias2character = {}
 8 | for k, v in character_info.items():
 9 |     for a in v["alias"]:
10 |         alias2character[a] = k
11 |     alias2character[k] = k 
12 |     alias2character[k[:k.rfind('-')]] = k 
13 | 
14 | with open('../data/characters_labels.json', 'r', encoding='utf-8') as f:
15 |     character_labels = json.load(f)
16 | 
17 | 
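
The loop above maps every alias, the full key, and the key with its trailing `-<suffix>` removed back to the canonical character key. A small in-memory sketch of that mapping follows. The two entries and their aliases are hypothetical stand-ins for `characters.json`, and `build_alias_map` is an illustrative name.

```python
# Hypothetical stand-in for the contents of ../data/characters.json.
character_info = {
    'Sheldon-en': {'alias': ['Sheldon Cooper', 'Dr. Cooper']},
    'haruhi-zh': {'alias': ['Haruhi Suzumiya']},
}


def build_alias_map(info):
    """Map aliases, full keys, and suffix-stripped keys to the canonical key."""
    alias_map = {}
    for key, value in info.items():
        for alias in value["alias"]:
            alias_map[alias] = key
        alias_map[key] = key
        # 'Sheldon-en' -> 'Sheldon': drop everything from the last '-' onward.
        alias_map[key[:key.rfind('-')]] = key
    return alias_map


alias2character = build_alias_map(character_info)
```

One caveat of the slicing: for a key with no `-`, `rfind` returns `-1`, so `key[:-1]` drops the final character instead of leaving the key unchanged.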


--------------------------------------------------------------------------------
/code/config_template.json:
--------------------------------------------------------------------------------
1 | {
2 |     "openai_apikey": "sk",
3 |     "gemini_apikey": "AI"
4 | }
5 | 


--------------------------------------------------------------------------------
/code/prompts.py:
--------------------------------------------------------------------------------
 1 | prompts = {
 2 |     "general": {
 3 |         "background_template": '''You are an expert in Psychometrics, especially {}. I am conducting the {} test on someone. I am gauging his/her position on the {} dimension through a series of open-ended questions. For clarity, here's some background this particular dimension:
 4 | ===
 5 | {}
 6 | ===
 7 | 
 8 | My name is {}. I've invited a participant, {}, and we had many conversations in {}. I will input the conversations.
 9 | 
10 | Please help me assess {}'s score within the {} dimension of {}. 
11 | ''',
12 |     "two_score_output": '''You should provide the percentage of each category, which sums to 100%, e.g., 30% A and 70% B. 
13 | Please output in the following json format:
14 | ===
15 | {{
16 |     "analysis": <your analysis based on the conversations>,
17 |     "result": {{ "{}": <percentage 1>, "{}": <percentage 2> }} (The sum of percentage 1 and percentage 2 should be 100%. Output with percent sign.) 
18 | }}''',
19 |     "one_score_output": '''You should provide the score of {} in terms of {}, which is a number between {} and {}. {} denotes 'not {} at all', {} denotes 'neutral', and {} denotes 'strongly {}'. Other numbers in this range represent different degrees of '{}'. 
20 | Please output in the following json format:
21 | ===
22 | {{
23 |     "analysis": <your analysis based on the conversations>,
24 |     "result": <your score>
25 | }}'''
26 |     },
27 | }
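
The templates above are filled with `str.format`, which is why the literal braces of the JSON skeleton are written as `{{` and `}}` while the fill-in slots are `{}`. A short sketch of filling a two-score template follows. `TWO_SCORE_OUTPUT` is an abridged copy made for illustration, and the `E`/`I` labels are example arguments.

```python
# Abridged copy of the "two_score_output" template, for illustration only.
TWO_SCORE_OUTPUT = '''Please output in the following json format:
===
{{
    "analysis": <your analysis based on the conversations>,
    "result": {{ "{}": <percentage 1>, "{}": <percentage 2> }}
}}'''

# Doubled braces become literal braces; each {} takes one positional argument.
filled = TWO_SCORE_OUTPUT.format('E', 'I')
print(filled)
```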


--------------------------------------------------------------------------------
/code/run_experiments.py:
--------------------------------------------------------------------------------
  1 | from personality_tests import personality_assessment
  2 | 
  3 | from characters import character_info, character_labels
  4 | from utils import logger_main as logger
  5 | import pdb  
  6 | 
  7 | import argparse
  8 | 
  9 | eval_method_map = {
 10 |     'self_report': 'choose',
 11 |     'self_report_cot': 'choosecot',
 12 |     'expert_rating': 'interview_assess_batch_anonymous',
 13 |     'expert_rating_collective': 'interview_assess_collective_anonymous',
 14 |     'option_conversion': 'interview_convert',
 15 |     'dimension_option_conversion': 'interview_convert_adjoption_anonymous'
 16 | }
 17 | 
 18 | parser = argparse.ArgumentParser()
 19 | 
 20 | # Add arguments
 21 | parser.add_argument('--questionnaire', type=str, default='BFI', choices=['BFI', '16Personalities', 'BSRI', 'DTDD', 'ECR-R', 'EIS', 'Empathy', 'EPQ-R', 'GSE', 'ICB', 'LMS', 'LOT-R', 'WLEIS', 'CABIN'])
 22 | parser.add_argument('--eval_method', default='expert_rating', choices=eval_method_map.keys(), help='Evaluation method')
 23 | parser.add_argument('--eval_llm', default='gpt-3.5', choices=['gpt-4', 'gpt-3.5', 'gemini'], help='LLM for Evaluation')
 24 | parser.add_argument('--repeat_times', type=int, default=1, help='Number of experiment repeat times')
 25 | parser.add_argument('--agent_llm', default='gpt-3.5', choices=['gpt-3.5', 'gpt-4'], help='Agent LLM')
 26 | 
 27 | # Parse arguments
 28 | args = parser.parse_args()
 29 | 
 30 | questionnaire = args.questionnaire
 31 | eval_method = eval_method_map.get(args.eval_method, args.eval_method)
 32 | eval_llm = args.eval_llm
 33 | repeat_times = args.repeat_times
 34 | agent_llm = args.agent_llm
 35 | 
 36 | 
 37 | characters = character_info.keys()
 38 | agent_types = ['RoleLLM', 'ChatHaruhi']
 39 | 
 40 | logger.info('Start testing eval methods')
 41 |         
 42 | # There is a bug in transformers when luotuo embeddings and bge embeddings are interleaved, which may sometimes cause failures. To minimize switching between embedding models, we run the ChatHaruhi and RoleLLM characters separately.
 43 | 
 44 | results = {}
 45 | 
 46 | for agent_type in agent_types:
 47 |     for character in characters:
 48 |     #for agent_type in [ a for a in character_info[character]['agent'] if a in agent_types]:
 49 |         if agent_type not in character_info[character]['agent']: continue
 50 |         if character == 'Sheldon-en' and agent_type == 'RoleLLM': continue 
 51 | 
 52 |         result = personality_assessment(
 53 |             character, agent_type, agent_llm, 
 54 |             questionnaire, eval_method, eval_llm, repeat_times=repeat_times)
 55 |         
 56 |         
 57 |         results[(character, agent_type)] = result 
 58 |         
 59 | 
 60 | logger.info('Questionnaire: {}, Eval Method: {}, Repeat Times: {}, Agent LLM: {}, Eval LLM: {}'.format(questionnaire, eval_method, repeat_times, agent_llm, eval_llm))   
 61 | 
 62 | from utils import avg
 63 | 
 64 | personality_consistency = {} 
 65 | 
 66 | for analysis_key in result['analysis'].keys():
 67 |     analysis_values = [ v['analysis'][analysis_key] for v in results.values()]
 68 |     analysis_value = avg(analysis_values)
 69 |     
 70 |     logger.info('Analyzing {}: {:.4f}'.format(analysis_key, analysis_value))
 71 |     personality_consistency[analysis_key] = analysis_value
 72 | 
 73 | preds = { rpa: {dim: result['dims'][dim]['all_scores'] for dim in result['dims']} for rpa, result in results.items()}
 74 | 
 75 | if questionnaire in ['BFI', '16Personalities']:
 76 |     label_settings = ['annotation', 'pdb']
 77 |     labels_pdb = { rpa: {dim: character_labels['pdb'][rpa[0]][questionnaire][dim] for dim in result['dims']} for rpa, result in results.items()} 
 78 | else:
 79 |     label_settings = ['annotation']
 80 |     labels_pdb = { rpa: {dim: character_labels['annotation'][rpa[0]][questionnaire][dim] for dim in result['dims']} for rpa, result in results.items()} 
 81 | 
 82 | for label_setting in label_settings:
 83 |     labels = { rpa: {dim: character_labels[label_setting][rpa[0]][questionnaire][dim] for dim in result['dims']} for rpa, result in results.items()} #e.g. { "score": 65.88130032806441, "type": "H"}
 84 | 
 85 | 
 86 |     from personality_tests import calculate_measured_alignment
 87 | 
 88 |     measured_alignment = calculate_measured_alignment(preds, labels, questionnaire, labels_pdb=labels_pdb)                        
 89 |     
 90 |     single_acc = measured_alignment['all']['single_acc']['all']
 91 |     single_mse = measured_alignment['all']['single_mse']['all']
 92 |     single_mae = measured_alignment['all']['single_mae']['all']
 93 |     full_acc = measured_alignment['all']['full_acc']
 94 |     
 95 |     
 96 |     
 97 |     logger.info('Alignment {}: Single Acc: {:.4f}, Single MSE: {:.4f}, Single MAE: {:.4f}, Full Acc: {:.4f}'.format(label_setting.upper()[:3], single_acc, single_mse, single_mae, full_acc))
 98 | 
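
The script above exposes user-facing evaluation method names on the command line and translates them to internal method names through `eval_method_map`. That argument handling can be sketched in isolation by passing an explicit argument list to `parse_args`. `parse_cli` is a hypothetical wrapper, and only a subset of the original options is reproduced.

```python
import argparse

EVAL_METHOD_MAP = {
    'self_report': 'choose',
    'self_report_cot': 'choosecot',
    'expert_rating': 'interview_assess_batch_anonymous',
    'expert_rating_collective': 'interview_assess_collective_anonymous',
    'option_conversion': 'interview_convert',
    'dimension_option_conversion': 'interview_convert_adjoption_anonymous',
}


def parse_cli(argv):
    """Parse a list of CLI tokens and resolve the internal eval method name."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--questionnaire', type=str, default='BFI')
    parser.add_argument('--eval_method', default='expert_rating',
                        choices=list(EVAL_METHOD_MAP), help='Evaluation method')
    parser.add_argument('--repeat_times', type=int, default=1)
    args = parser.parse_args(argv)
    # Translate the public name into the internal one, as the script does.
    args.eval_method = EVAL_METHOD_MAP.get(args.eval_method, args.eval_method)
    return args


defaults = parse_cli([])
custom = parse_cli(['--eval_method', 'self_report_cot', '--repeat_times', '3'])
```

Passing `argv` explicitly keeps the parsing testable without touching `sys.argv`.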


--------------------------------------------------------------------------------
/code/test_rpa_methods.py:
--------------------------------------------------------------------------------
 1 | from personality_tests import personality_assessment
 2 | from characters import character_info
 3 | from utils import logger_main as logger
 4 | 
 5 | characters = character_info.keys()
 6 | questionnaires = ['BFI', '16Personalities']
 7 | agent_typess = [['ChatHaruhi']] # 'Character LLM' needs its own separate logic, still to be written
 8 | 
 9 | logger.info('Start testing eval methods')
10 | 
11 | for eval_llm in ['gpt-3.5']: 
12 |     for questionnaire in questionnaires: 
13 |         for eval_method in ['interview_assess_batch_anonymous']: 
14 |             for repeat_times in [1]: 
15 |                 for agent_types in agent_typess:
16 |                     '''# The code is a bit ugly: there are just 2 settings, one is Haruhi (the 32 characters from Role and Haruhi) and the other is c.ai
17 |                     if agent_types == ['ChatHaruhi']:
18 |                         agent_llms = ['llama2'] # Your LLMs
19 |                     elif agent_types == ['character.ai']:
20 |                         agent_llms = ['character.ai']
21 | 
22 |                     for agent_llm in agent_llms: '''
23 |                         # if eval_method.endswith('api'):
24 |                         #     if not questionnaire == '16Personalities': continue 
25 |                         
26 |                         # There is a bug in transformer when luotuo embeddings are interleaved with bge embeddings, which sometimes causes failures. To minimize switching between embeddings, we run the Haruhi and RoleLLM characters separately.
27 | 
28 |                     results = {}
29 | 
30 |                     for agent_type in agent_types:
31 |                         for character in characters:
32 |                             if agent_type not in character_info[character]['agent']:
33 |                                 assert agent_type != 'character.ai'
34 |                                 continue
35 |                             if character == 'Sheldon-en' and agent_type == 'RoleLLM': continue 
36 | 
37 |                             result = personality_assessment(
38 |                                 character, agent_type, "llama2", 
39 |                                 questionnaire, eval_method, eval_llm, repeat_times=repeat_times)
40 |                             
41 |                             
42 |                             results[(character, agent_type)] = result['code']
43 |                             #multitime_results[(character, agent_type)] = result['raw']
44 |                     
45 |                     count_dimension = { a: 0 for a in agent_types }
46 |                     count_correct_dimension = { a: 0 for a in agent_types }
47 |                     
48 |                     count_full = { a: 0 for a in agent_types }
49 |                     count_correct_full = { a: 0 for a in agent_types }
50 |                     
51 |                     for (character, a), code in results.items():
52 |                         label = character_info[character]['labels'][questionnaire]
53 | 
54 |                         full_correct = True
55 |                         for p, l in zip(code, label):
56 |                             if l == 'X': continue 
57 | 
58 |                             count_dimension[a] += 1
59 |                             if p == l:
60 |                                 count_correct_dimension[a] += 1
61 |                             else: 
62 |                                 full_correct = False
63 | 
64 |                         count_full[a] += 1
65 |                         if full_correct: 
66 |                             count_correct_full[a] += 1
67 |                     
68 |                     
69 |                     for count in [count_dimension, count_correct_dimension, count_full, count_correct_full]:
70 |                         count['all'] = sum(count.values())
71 | 
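72 |                     logger.info('Questionnaire: {}, Eval Method: {}, Repeat Times: {}, Agent LLM: {}, Eval LLM: {}'.format(questionnaire, eval_method, repeat_times, 'llama2', eval_llm))  # agent_llm is undefined here (its loop is commented out); 'llama2' is the value hard-coded in the personality_assessment call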
73 |                     for a in agent_types + ['all']:
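74 |                         dim_acc = count_correct_dimension[a] / count_dimension[a] if count_dimension[a] else 0.0  # avoid ZeroDivisionError when nothing was scored
75 |                         full_acc = count_correct_full[a] / count_full[a] if count_full[a] else 0.0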
76 | 
77 |                         logger.info('Agent Type: {}, Dimension Accuracy: {:.4f}, Full Accuracy: {:.4f}'.format(
78 |                             a, dim_acc, full_acc))
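
The scoring loop in test_rpa_methods.py can be read in isolation as the following sketch. It is not part of the repository: `score_codes` and its arguments are hypothetical names, and the inputs are made-up codes rather than real results. It mirrors the logic above, where a label letter of 'X' marks a dimension that is skipped, dimension accuracy counts matching letters, and full accuracy counts characters whose scored letters all match.

```python
def score_codes(results, labels):
    """Return (dimension accuracy, full accuracy).

    results: {(character, agent_type): predicted code, e.g. 'ENTJ'}
    labels:  {character: label code, where 'X' marks an unlabelled dimension}
    """
    dim_total = dim_correct = 0
    full_total = full_correct = 0

    for (character, _agent_type), code in results.items():
        label = labels[character]
        all_match = True
        for predicted, expected in zip(code, label):
            if expected == 'X':  # unlabelled dimension: not scored
                continue
            dim_total += 1
            if predicted == expected:
                dim_correct += 1
            else:
                all_match = False
        full_total += 1
        if all_match:
            full_correct += 1

    # Guard against empty inputs instead of dividing by zero.
    dim_acc = dim_correct / dim_total if dim_total else 0.0
    full_acc = full_correct / full_total if full_total else 0.0
    return dim_acc, full_acc


# Illustrative inputs only; these are not results from the experiments.
example_results = {('A', 'ChatHaruhi'): 'ENTJ', ('B', 'ChatHaruhi'): 'INFP'}
example_labels = {'A': 'ENTJ', 'B': 'IXFJ'}
dim_acc, full_acc = score_codes(example_results, example_labels)
print('Dimension Accuracy: {:.4f}, Full Accuracy: {:.4f}'.format(dim_acc, full_acc))
```

Character B has one skipped dimension and one mismatch, so it contributes three scored dimensions and does not count as fully correct.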


--------------------------------------------------------------------------------
/data/characters.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "Hermione-en": {
  3 |         "alias": [
  4 |             "Hermione",
  5 |             "Hermione Granger"
  6 |         ],
  7 |         "agent": {
  8 |             "ChatHaruhi": "Hermione",
  9 |             "Character-LLM": "Hermione",
 10 |             "character.ai": "oNYqd0CAFhHUU0h1zYC0kRdMg_ZmTIayX-ndxj0ZO4s"
 11 |         },
 12 |         "experimenter": "Harry",
 13 |         "idx": 0
 14 |     },
 15 |     "Sheldon-en": {
 16 |         "alias": [
 17 |             "Sheldon",
 18 |             "Sheldon Cooper"
 19 |         ],
 20 |         "agent": {
 21 |             "ChatHaruhi": "Sheldon",
 22 |             "RoleLLM": "Sheldon Cooper",
 23 |             "character.ai": "MumgMFKvBATbJM_q6wWd5qruw6mxSOHzfSyR6WcWkKA"
 24 |         },
 25 |         "experimenter": "Leonard",
 26 |         "idx": 1
 27 |     },
 28 |     "raidenShogun-zh": {
 29 |         "alias": [
 30 |             "雷电将军",
 31 |             "Raiden Shogun"
 32 |         ],
 33 |         "agent": {
 34 |             "ChatHaruhi": "raidenShogun",
 35 |             "character.ai": "sN6SOwnxQl2A5EILnl6QFWHR3AD60OJFjZJTBf5MSk0"
 36 |         },
 37 |         "experimenter": "旅行者",
 38 |         "idx": 2
 39 |     },
 40 |     "Lucifer Morningstar-en": {
 41 |         "alias": [
 42 |             "Lucifer Morningstar"
 43 |         ],
 44 |         "agent": {
 45 |             "RoleLLM": "Lucifer Morningstar",
 46 |             "character.ai": "ZvecXgHajQTLYfumfqxkhx41CKXudi5wgXjeID_UFI4"
 47 |         },
 48 |         "experimenter": "Chloe Decker",
 49 |         "idx": 3
 50 |     },
 51 |     "zhongli-zh": {
 52 |         "alias": [
 53 |             "钟离",
 54 |             "Zhong Li"
 55 |         ],
 56 |         "agent": {
 57 |             "ChatHaruhi": "zhongli",
 58 |             "character.ai": "B1-r0Sd9myqJ9liAvyG5gaAVe6NElndTkuf8HobrIUA"
 59 |         },
 60 |         "experimenter": "旅行者",
 61 |         "idx": 4
 62 |     },
 63 |     "Gaston-en": {
 64 |         "alias": [
 65 |             "Gaston"
 66 |         ],
 67 |         "agent": {
 68 |             "RoleLLM": "Gaston",
 69 |             "character.ai": "vmpG3EnQESklg9flOkMCYkzh93rtzskpUQxtWUueel4"
 70 |         },
 71 |         "experimenter": "LeFou",
 72 |         "idx": 5
 73 |     },
 74 |     "hutao-zh": {
 75 |         "alias": [
 76 |             "胡桃",
 77 |             "Hu Tao"
 78 |         ],
 79 |         "agent": {
 80 |             "ChatHaruhi": "hutao",
 81 |             "character.ai": "YRvlPe-7AsG2QNassfQMRV1my8gwf2DJitGT8Fka3fo"
 82 |         },
 83 |         "experimenter": "旅行者",
 84 |         "idx": 6
 85 |     },
 86 |     "Klaus Mikaelson-en": {
 87 |         "alias": [
 88 |             "Klaus Mikaelson"
 89 |         ],
 90 |         "agent": {
 91 |             "RoleLLM": "Klaus Mikaelson",
 92 |             "character.ai": "dY3ljnF4Pai8aB_yWcrTzdOSVbYHFFU50wPxBJfRx9Y"
 93 |         },
 94 |         "experimenter": "Elijah Mikaelson",
 95 |         "idx": 7
 96 |     },
 97 |     "Jigsaw-en": {
 98 |         "alias": [
 99 |             "Jigsaw"
100 |         ],
101 |         "agent": {
102 |             "RoleLLM": "Jigsaw",
103 |             "character.ai": "JRyH6oYT0RokLPfopQnAbNZsTpbFaN1_ju5D9p_42Ko"
104 |         },
105 |         "experimenter": "Amanda Young",
106 |         "idx": 8
107 |     },
108 |     "James Bond-en": {
109 |         "alias": [
110 |             "James Bond"
111 |         ],
112 |         "agent": {
113 |             "RoleLLM": "James Bond",
114 |             "character.ai": "NurpUNKeJCDlgjd_QnAZJynQVkduUP2bKfK4tSgrR1I"
115 |         },
116 |         "experimenter": "M",
117 |         "idx": 9
118 |     },
119 |     "Blair Waldorf-en": {
120 |         "alias": [
121 |             "Blair Waldorf"
122 |         ],
123 |         "agent": {
124 |             "RoleLLM": "Blair Waldorf",
125 |             "character.ai": "7k9h0V_qD_lW4cnYnC6X0t2WzW7yBubbqm7-P4mBr3Y"
126 |         },
127 |         "experimenter": "Serena van der Woodsen",
128 |         "idx": 10
129 |     },
130 |     "Rorschach-en": {
131 |         "alias": [
132 |             "Rorschach"
133 |         ],
134 |         "agent": {
135 |             "RoleLLM": "Rorschach",
136 |             "character.ai": "sKZgNAPznTYm8UxxUgtbzR5Jms0Q_y6GAZrrSogzBzA"
137 |         },
138 |         "experimenter": "Nite Owl",
139 |         "idx": 11
140 |     },
141 |     "Luna-en": {
142 |         "alias": [
143 |             "Luna",
144 |             "Luna Lovegood"
145 |         ],
146 |         "agent": {
147 |             "ChatHaruhi": "Luna",
148 |             "character.ai": "9sM7L6rOhPGpvF73kPZh55vv40NJZcjKP-Og1dTgMPk"
149 |         },
150 |         "experimenter": "Harry",
151 |         "idx": 12
152 |     },
153 |     "Thor-en": {
154 |         "alias": [
155 |             "Thor"
156 |         ],
157 |         "agent": {
158 |             "RoleLLM": "Thor",
159 |             "character.ai": "mDzlSX-C3djnrUG8hk9dXfnfX3kxYw4lEUmqa9pWePE"
160 |         },
161 |         "experimenter": "Loki",
162 |         "idx": 13
163 |     },
164 |     "Raj-en": {
165 |         "alias": [
166 |             "Raj",
167 |             "Raj Koothrappali"
168 |         ],
169 |         "agent": {
170 |             "ChatHaruhi": "Raj",
171 |             "character.ai": "HFrCGA2enlkjz_WdkeFcJ-u1DC0MA_RJZMi379H0MPU"
172 |         },
173 |         "experimenter": "Leonard",
174 |         "idx": 14
175 |     },
176 |     "Twilight Sparkle-en": {
177 |         "alias": [
178 |             "Twilight Sparkle"
179 |         ],
180 |         "agent": {
181 |             "RoleLLM": "Twilight Sparkle",
182 |             "character.ai": "zreRXiroWwozhUTxT5zRTKE_q4CVvP04qgFvcbvojGk"
183 |         },
184 |         "experimenter": "Spike",
185 |         "idx": 15
186 |     },
187 |     "Jim Morrison-en": {
188 |         "alias": [
189 |             "Jim Morrison"
190 |         ],
191 |         "agent": {
192 |             "RoleLLM": "Jim Morrison",
193 |             "character.ai": "oMFZE0x6fGzg1OA_T_KEIFwUn87oUhpildP1lPR0OAs"
194 |         },
195 |         "experimenter": "Ray Manzarek",
196 |         "idx": 16
197 |     },
198 |     "John Keating-en": {
199 |         "alias": [
200 |             "John Keating"
201 |         ],
202 |         "agent": {
203 |             "RoleLLM": "John Keating",
204 |             "character.ai": "VouOCRGb36IbskxyUBsH7XF2B7W-L1cgl4iNKcCa4QI"
205 |         },
206 |         "experimenter": "Todd Anderson",
207 |         "idx": 17
208 |     },
209 |     "Michael Scott-en": {
210 |         "alias": [
211 |             "Michael Scott"
212 |         ],
213 |         "agent": {
214 |             "RoleLLM": "Michael Scott",
215 |             "character.ai": "FbTJC1qa_0eTM8U_y7921NURjnDobiRNaopdVkBisIk"
216 |         },
217 |         "experimenter": "Dwight Schrute",
218 |         "idx": 18
219 |     },
220 |     "Shrek-en": {
221 |         "alias": [
222 |             "Shrek"
223 |         ],
224 |         "agent": {
225 |             "RoleLLM": "Shrek",
226 |             "character.ai": "eX2BOqEshHs_G4TOisfotuDRi5nNt6fVP9DHrxk8A04"
227 |         },
228 |         "experimenter": "Donkey",
229 |         "idx": 19
230 |     },
231 |     "Harry-en": {
232 |         "alias": [
233 |             "Harry",
234 |             "Harry Potter"
235 |         ],
236 |         "agent": {
237 |             "ChatHaruhi": "Harry",
238 |             "character.ai": "suAUJzAPwFm-rDAAzKByHqAN64dYBg__lC_83ClfBzg"
239 |         },
240 |         "experimenter": "Hermione",
241 |         "idx": 20
242 |     },
243 |     "The Dude-en": {
244 |         "alias": [
245 |             "The Dude"
246 |         ],
247 |         "agent": {
248 |             "RoleLLM": "The Dude",
249 |             "character.ai": "OYYf4iM6fjt9eZ72oXRsY3UGPeXd9Y-uJwfAjF5JAwk"
250 |         },
251 |         "experimenter": "Walter Sobchak",
252 |         "idx": 21
253 |     },
254 |     "Walt Kowalski-en": {
255 |         "alias": [
256 |             "Walt Kowalski"
257 |         ],
258 |         "agent": {
259 |             "RoleLLM": "Walt Kowalski",
260 |             "character.ai": "JftuNjGPmXhgvvVO_NuahpKMyQG1U_IFUKsXec1TM-8"
261 |         },
262 |         "experimenter": "Thao Vang Lor",
263 |         "idx": 22
264 |     },
265 |     "Snape-en": {
266 |         "alias": [
267 |             "Snape",
268 |             "Severus Snape"
269 |         ],
270 |         "agent": {
271 |             "ChatHaruhi": "Snape",
272 |             "character.ai": "3UPkXMg4O9ypGc2tjFd8ngz88pB46gLlexzsxObdE1Y"
273 |         },
274 |         "experimenter": "Dumbledore",
275 |         "idx": 23
276 |     },
277 |     "haruhi-zh": {
278 |         "alias": [
279 |             "凉宫春日",
280 |             "春日",
281 |             "Haruhi Suzumiya"
282 |         ],
283 |         "agent": {
284 |             "ChatHaruhi": "haruhi",
285 |             "character.ai": "1wlJ69BdB0yH0Auc3uOxmdcqG7NF28XiFLhhYv8Zd6o"
286 |         },
287 |         "experimenter": "阿虚",
288 |         "groundtruth": {
289 |             "16Personalities": "ENFP"
290 |         },
291 |         "idx": 24
292 |     },
293 |     "ayaka-zh": {
294 |         "alias": [
295 |             "神里绫华",
296 |             "Kamisato Ayaka"
297 |         ],
298 |         "agent": {
299 |             "ChatHaruhi": "ayaka",
300 |             "character.ai": "EyCXjGgCJPqDgX2VIv-J4w-RCYihvPSnMbo6FIphk1Q"
301 |         },
302 |         "experimenter": "旅行者",
303 |         "idx": 25
304 |     },
305 |     "Malfoy-en": {
306 |         "alias": [
307 |             "Malfoy",
308 |             "Draco Malfoy"
309 |         ],
310 |         "agent": {
311 |             "ChatHaruhi": "Malfoy",
312 |             "character.ai": "673gY2uYLGlwnvbJBACRQ5UcnR5dWlYU--y5EUN77Xg"
313 |         },
314 |         "experimenter": "Crabbe",
315 |         "idx": 26
316 |     },
317 |     "Ron-en": {
318 |         "alias": [
319 |             "Ron",
320 |             "Ronald “Ron” Weasley"
321 |         ],
322 |         "agent": {
323 |             "ChatHaruhi": "Ron",
324 |             "character.ai": "XB6vZp_grwz11nyyN3Zme0M-4xgQyE7ccJLU7mvs-Ro"
325 |         },
326 |         "experimenter": "Hermione",
327 |         "idx": 27
328 |     },
329 |     "wanderer-zh": {
330 |         "alias": [
331 |             "流浪者",
332 |             "wanderer"
333 |         ],
334 |         "agent": {
335 |             "ChatHaruhi": "wanderer",
336 |             "character.ai": "dA-MZ70NidVWwc_7FvTU5wbjwTuVz0nrGdk7lAjas9g"
337 |         },
338 |         "experimenter": "纳西妲",
339 |         "idx": 28
340 |     },
341 |     "Dumbledore-en": {
342 |         "alias": [
343 |             "Dumbledore",
344 |             "Albus Dumbledore"
345 |         ],
346 |         "agent": {
347 |             "ChatHaruhi": "Dumbledore",
348 |             "character.ai": "Ih2Cid1Cdv5qfz5hPKi1Bi-jExbcy2P0V-7mKBIPaUQ"
349 |         },
350 |         "experimenter": "McGonagall",
351 |         "idx": 29
352 |     },
353 |     "McGonagall-en": {
354 |         "alias": [
355 |             "McGonagall",
356 |             "Minerva McGonagall"
357 |         ],
358 |         "agent": {
359 |             "ChatHaruhi": "McGonagall",
360 |             "character.ai": "r0bm6AD-Ai7JG1V9ZFObsITsK4McnRZO0w4NsKTMR78"
361 |         },
362 |         "experimenter": "Dumbledore",
363 |         "idx": 30
364 |     },
365 |     "Lestat de Lioncourt-en": {
366 |         "alias": [
367 |             "Lestat de Lioncourt"
368 |         ],
369 |         "agent": {
370 |             "RoleLLM": "Lestat de Lioncourt",
371 |             "character.ai": "pvgODZlNN9bxW5_60_GRPaw1IOG0EJY_ymho_wlJ248"
372 |         },
373 |         "experimenter": "Louis de Pointe du Lac",
374 |         "idx": 31
375 |     }
376 | }
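
As a minimal sketch of how this file can be consumed, the snippet below selects the characters available for one agent type, which is the same membership test test_rpa_methods.py applies to `character_info[character]['agent']`. The helper names `characters_for_agent` and `language_of` are hypothetical, the sample is an abridged copy of two entries above with the character.ai IDs omitted, and reading the language from the key suffix is an assumption based on every key here ending in `-en` or `-zh`.

```python
import json

# Abridged copy of two entries from data/characters.json (character.ai IDs omitted).
SAMPLE = json.loads("""
{
    "Hermione-en": {
        "alias": ["Hermione", "Hermione Granger"],
        "agent": {"ChatHaruhi": "Hermione", "Character-LLM": "Hermione"},
        "experimenter": "Harry",
        "idx": 0
    },
    "Gaston-en": {
        "alias": ["Gaston"],
        "agent": {"RoleLLM": "Gaston"},
        "experimenter": "LeFou",
        "idx": 5
    }
}
""")


def characters_for_agent(character_info, agent_type):
    """Hypothetical helper: map character key -> agent-specific name for one agent type."""
    return {
        key: info["agent"][agent_type]
        for key, info in character_info.items()
        if agent_type in info["agent"]
    }


def language_of(character_key):
    """Assumes the key ends in '-en' or '-zh', as every key in this file does."""
    return character_key.rsplit("-", 1)[1]


print(characters_for_agent(SAMPLE, "ChatHaruhi"))
print(language_of("raidenShogun-zh"))
```

For the full file, `json.load` on an open handle to data/characters.json would take the place of the inline sample.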


--------------------------------------------------------------------------------
/data/characters_cllm.json:
--------------------------------------------------------------------------------
  1 | {
  2 | 	"Hermione-en": {
  3 | 		"alias": [
  4 | 			"Hermione",
  5 | 			"Hermione Granger"
  6 | 		],
  7 | 		"agent": {
  8 | 			"ChatHaruhi": "Hermione",
  9 | 			"Character-LLM": "Hermione",
 10 | 			"character.ai": "oNYqd0CAFhHUU0h1zYC0kRdMg_ZmTIayX-ndxj0ZO4s"
 11 | 		},
 12 | 		"experimenter": "Harry",
 13 | 		"idx": 0,
 14 | 		"labels": {
 15 | 			"BFI": "SCOEN",
 16 | 			"16Personalities": "ESTJ"
 17 | 		}
 18 | 	},
 19 | 	"Caesar-en": {
 20 | 		"alias": [
 21 | 			"Caesar",
 22 | 			"Julius Caesar"
 23 | 		],
 24 | 		"agent": {
 25 | 			"RoleLLM": "Caesar",
 26 | 			"Character-LLM": "Caesar",
 27 | 			"character.ai": "wNSyPMPQ3BcwGFhxaMsr--v4piOz17_yQVV3cs9cGKg"
 28 | 		},
 29 | 		"experimenter": "Mark Antony",
 30 | 		"idx": 1,
 31 | 		"labels": {
 32 | 			"BFI": "SXOEI",
 33 | 			"16Personalities": "ENTJ"
 34 | 		}
 35 | 	},
 36 | 	"Cleopatra-en": {
 37 | 		"alias": [
 38 | 			"Cleopatra VII"
 39 | 		],
 40 | 		"agent": {
 41 | 			"Character-LLM": "Cleopatra",
 42 | 			"character.ai": "sKpuaZqMELVnR1Apf1JvoKNYojnS9fmO0_14CTg45wY"
 43 | 		},
 44 | 		"experimenter": "Mark Antony",
 45 | 		"idx": 2,
 46 | 		"labels": {
 47 | 			"BFI": "SCOEI",
 48 | 			"16Personalities": "ENTJ"
 49 | 		}
 50 | 	},
 51 | 	"Voldemort-en": {
 52 | 		"alias": [
 53 | 			"Voldemort",
 54 | 			"Lord Voldemort"
 55 | 		],
 56 | 		"agent": {
 57 | 			"Character-LLM": "Voldemort",
 58 | 			"character.ai": "j1CRsFh8xf10yAAqGs78V3X9ojK4VYP034wSPcnOtno"
 59 | 		},
 60 | 		"experimenter": "Bellatrix Lestrange",
 61 | 		"idx": 3,
 62 | 		"labels": {
 63 | 			"BFI": "SLOEI",
 64 | 			"16Personalities": "ENTJ"
 65 | 		}
 66 | 	},
 67 | 	"Spartacus-en": {
 68 | 		"alias": [
 69 | 			"Spartacus"
 70 | 		],
 71 | 		"agent": {
 72 | 			"Character-LLM": "Spartacus",
 73 | 			"character.ai": "tlaEX52yvsQ4TBMBUxRQCw3EIs1gjzWLNASk7PzwyUg"
 74 | 		},
 75 | 		"experimenter": "Crixus",
 76 | 		"idx": 4,
 77 | 		"labels": {
 78 | 			"BFI": "RCOEI",
 79 | 			"16Personalities": "ISTP"
 80 | 		}
 81 | 	},
 82 | 	"Newton-en": {
 83 | 		"alias": [
 84 | 			"Newton",
 85 | 			"Isaac Newton"
 86 | 		],
 87 | 		"agent": {
 88 | 			"Character-LLM": "Newton",
 89 | 			"character.ai": "KHkYw8ZN8WzBL-6wkM8OpMne_mg8J8P55RgZdNH7DDM"
 90 | 		},
 91 | 		"experimenter": "Edmond Halley",
 92 | 		"idx": 5,
 93 | 		"labels": {
 94 | 			"BFI": "RLOEI",
 95 | 			"16Personalities": "INTJ"
 96 | 		}
 97 | 	},
 98 | 	"Beethoven-en": {
 99 | 		"alias": [
100 | 			"Beethoven",
101 | 			"Ludwig van Beethoven"
102 | 		],
103 | 		"agent": {
104 | 			"Character-LLM": "Beethoven",
105 | 			"character.ai": "vxsUXJXEfRc-q8xPTyiTAVnFKb4uIq0Eofrn6SZqn9w"
106 | 		},
107 | 		"experimenter": "Joseph Haydn",
108 | 		"idx": 6,
109 | 		"labels": {
110 | 			"BFI": "RLOEI",
111 | 			"16Personalities": "INTJ"
112 | 		}
113 | 	},
114 | 	"Socrates-en": {
115 | 		"alias": [
116 | 			"Socrates"
117 | 		],
118 | 		"agent": {
119 | 			"Character-LLM": "Socrates"
120 | 		},
121 | 		"experimenter": "Plato",
122 | 		"idx": 7,
123 | 		"labels": {
124 | 			"BFI": "SCUEI",
125 | 			"16Personalities": "ENTP"
126 | 		}
127 | 	},
128 | 	"Martin-en": {
129 | 		"alias": [
130 | 			"Martin Luther King Jr."
131 | 		],
132 | 		"agent": {
133 | 			"Character-LLM": "Martin",
134 | 			"character.ai": "VPp4qGkhuJd2bhnkbddP5Zr22aKe58ZyVmd_BaaYwOk"
135 | 		},
136 | 		"experimenter": "Ralph Abernathy",
137 | 		"idx": 8,
138 | 		"labels": {
139 | 			"BFI": "SCOAI",
140 | 			"16Personalities": "ENFJ"
141 | 		}
142 | 	}
143 | }


--------------------------------------------------------------------------------
/data/pdb_data/鸠摩智_Jiumozhi.json:
--------------------------------------------------------------------------------
1 | {"character": "\u9e20\u6469\u667a Jiumozhi", "source": "\u5929\u9f99\u516b\u90e8 Demigods and Semi-Devils", "description": "", "personality summary": "ESTJ - 3w4 - so/sp - LSE", "watch": 1, "personality highlights": {"function": ["Te", "Si", "Ne", "Fi"], "MBTI": {"E": 1, "S": 1, "T": 1, "J": 1}}, "personality details": {"Four Letter": {"ESTJ": 2}, "Enneagram": {"3w4": 1}, "Instinctual Variant": {"so/sp": 1}, "Tritype": {"XXXX": 0}, "Socionics": {"LSE": 1}, "Alignment": {"XXXX": 0}, "Big 5 (SLOAN)": {"XXXX": 0}, "Attitudinal Psyche": {"XXXX": 0}, "Temperaments": {"XXXX": 0}, "Classic Jungian": {"ET(S)": 1}}}
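
The record above can be reduced to one label per typing system with a sketch like the following. It assumes, without confirmation from the repository, that the integers under "personality details" are vote counts and that an "XXXX" entry with a count of 0 means no votes were cast. `top_votes` is a hypothetical helper, and the record is an abridged ASCII copy of the file.

```python
import json

# Abridged copy of the record above (character name in ASCII only).
RECORD = json.loads("""
{
    "character": "Jiumozhi",
    "personality summary": "ESTJ - 3w4 - so/sp - LSE",
    "personality details": {
        "Four Letter": {"ESTJ": 2},
        "Enneagram": {"3w4": 1},
        "Socionics": {"LSE": 1},
        "Big 5 (SLOAN)": {"XXXX": 0},
        "Temperaments": {"XXXX": 0}
    }
}
""")


def top_votes(details):
    """Return {typing system: most-voted label}, skipping systems with no votes."""
    top = {}
    for system, votes in details.items():
        if not votes:
            continue
        label, count = max(votes.items(), key=lambda item: item[1])
        # Assumption: "XXXX" with a zero count is a placeholder for "no votes".
        if count > 0 and label != "XXXX":
            top[system] = label
    return top


print(top_votes(RECORD["personality details"]))
```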


--------------------------------------------------------------------------------
/data/questionnaires/DTDD.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "DTDD",
  3 |     "full_name": "Dark Triad Dirty Dozen",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 9 in the following statements. Please provide a rating based on the following 12 descriptions using a scale from 1 (disagree strongly) to 9 (agree strongly). Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 9. Please evaluate <character> based on the conversation using the scales: 1 denotes 'disagree strongly', and 9 denotes 'agree strongly'. 2 to 8 represent varying degrees between 'disagree' and 'agree', and 5 denotes 'neither agree nor disagree'",
  6 |     "scale": 10,
  7 |     "compute_mode": "AVG",
  8 |     "reverse": [],
  9 |     "categories": [
 10 |         {
 11 |             "cat_name": "Machiavellianism",
 12 |             "cat_questions": [
 13 |                 1,
 14 |                 2,
 15 |                 3,
 16 |                 4
 17 |             ],
 18 |             "crowd": [
 19 |                 {
 20 |                     "crowd_name": "Crowd",
 21 |                     "mean": 3.78,
 22 |                     "std": 1.63,
 23 |                     "n": 577
 24 |                 }
 25 |             ]
 26 |         },
 27 |         {
 28 |             "cat_name": "Psychopathy",
 29 |             "cat_questions": [
 30 |                 5,
 31 |                 6,
 32 |                 7,
 33 |                 8
 34 |             ],
 35 |             "crowd": [
 36 |                 {
 37 |                     "crowd_name": "Crowd",
 38 |                     "mean": 2.47,
 39 |                     "std": 1.44,
 40 |                     "n": 577
 41 |                 }
 42 |             ]
 43 |         },
 44 |         {
 45 |             "cat_name": "Narcissism",
 46 |             "cat_questions": [
 47 |                 9,
 48 |                 10,
 49 |                 11,
 50 |                 12
 51 |             ],
 52 |             "crowd": [
 53 |                 {
 54 |                     "crowd_name": "Crowd",
 55 |                     "mean": 4.88,
 56 |                     "std": 1.8,
 57 |                     "n": 577
 58 |                 }
 59 |             ]
 60 |         }
 61 |     ],
 62 |     "range": [
 63 |         1,
 64 |         9
 65 |     ],
 66 |     "questions": {
 67 |         "1": {
 68 |             "origin_en": "I tend to manipulate others to get my way.",
 69 |             "origin_zh": "我倾向于操纵他人以达到我的目的。",
 70 |             "rewritten_en": "Do you tend to manipulate others to get your way?",
 71 |             "rewritten_zh": "你倾向于操纵他人以达到你的目的吗？",
 72 |             "dimension": "Machiavellianism",
 73 |             "category": "positive"
 74 |         },
 75 |         "2": {
 76 |             "origin_en": "I have used deceit or lied to get my way.",
 77 |             "origin_zh": "我曾经使用欺骗或撒谎来达到我的目的。",
 78 |             "rewritten_en": "Have you used deceit or lied to get your way?",
 79 |             "rewritten_zh": "你曾经使用欺骗或撒谎来达到你的目的吗？",
 80 |             "dimension": "Machiavellianism",
 81 |             "category": "positive"
 82 |         },
 83 |         "3": {
 84 |             "origin_en": "I have used flattery to get my way.",
 85 |             "origin_zh": "我曾经使用谄媚来达到我的目的。",
 86 |             "rewritten_en": "Have you used flattery to get your way?",
 87 |             "rewritten_zh": "你曾经使用谄媚来达到你的目的吗？",
 88 |             "dimension": "Machiavellianism",
 89 |             "category": "positive"
 90 |         },
 91 |         "4": {
 92 |             "origin_en": "I tend to exploit others towards my own end.",
 93 |             "origin_zh": "我倾向于利用他人为了自己的目的。",
 94 |             "rewritten_en": "Do you tend to exploit others towards your own end?",
 95 |             "rewritten_zh": "你倾向于利用他人为了自己的目的吗？",
 96 |             "dimension": "Machiavellianism",
 97 |             "category": "positive"
 98 |         },
 99 |         "5": {
100 |             "origin_en": "I tend to lack remorse.",
101 |             "origin_zh": "我倾向于缺乏悔恨。",
102 |             "rewritten_en": "Do you tend to lack remorse?",
103 |             "rewritten_zh": "你倾向于缺乏悔恨吗？",
104 |             "dimension": "Psychopathy",
105 |             "category": "positive"
106 |         },
107 |         "6": {
108 |             "origin_en": "I tend to be unconcerned with the morality of my actions.",
109 |             "origin_zh": "我倾向于不关心我的行为的道德性。",
110 |             "rewritten_en": "Do you tend to be unconcerned with the morality of your actions?",
111 |             "rewritten_zh": "你倾向于不关心你的行为的道德性吗？",
112 |             "dimension": "Psychopathy",
113 |             "category": "positive"
114 |         },
115 |         "7": {
116 |             "origin_en": "I tend to be callous or insensitive.",
117 |             "origin_zh": "我倾向于冷酷或麻木。",
118 |             "rewritten_en": "Do you tend to be callous or insensitive?",
119 |             "rewritten_zh": "你倾向于冷酷或麻木吗？",
120 |             "dimension": "Psychopathy",
121 |             "category": "positive"
122 |         },
123 |         "8": {
124 |             "origin_en": "I tend to be cynical.",
125 |             "origin_zh": "我倾向于愤世嫉俗。",
126 |             "rewritten_en": "Do you tend to be cynical?",
127 |             "rewritten_zh": "你倾向于愤世嫉俗吗？",
128 |             "dimension": "Psychopathy",
129 |             "category": "positive"
130 |         },
131 |         "9": {
132 |             "origin_en": "I tend to want others to admire me.",
133 |             "origin_zh": "我倾向于希望别人钦佩我。",
134 |             "rewritten_en": "Do you tend to want others to admire you?",
135 |             "rewritten_zh": "你倾向于希望别人钦佩你吗？",
136 |             "dimension": "Narcissism",
137 |             "category": "positive"
138 |         },
139 |         "10": {
140 |             "origin_en": "I tend to want others to pay attention to me.",
141 |             "origin_zh": "我倾向于希望别人关注我。",
142 |             "rewritten_en": "Do you tend to want others to pay attention to you?",
143 |             "rewritten_zh": "你倾向于希望别人关注你吗？",
144 |             "dimension": "Narcissism",
145 |             "category": "positive"
146 |         },
147 |         "11": {
148 |             "origin_en": "I tend to seek prestige or status.",
149 |             "origin_zh": "我倾向于追求声望或地位。",
150 |             "rewritten_en": "Do you tend to seek prestige or status?",
151 |             "rewritten_zh": "你倾向于追求声望或地位吗？",
152 |             "dimension": "Narcissism",
153 |             "category": "positive"
154 |         },
155 |         "12": {
156 |             "origin_en": "I tend to expect special favors from others.",
157 |             "origin_zh": "我倾向于期待他人给予特殊优待。",
158 |             "rewritten_en": "Do you tend to expect special favors from others?",
159 |             "rewritten_zh": "你倾向于期待他人给予特殊优待吗？",
160 |             "dimension": "Narcissism",
161 |             "category": "positive"
162 |         }
163 |     },
164 |     "prompts": {
165 |         "rpa_choose_prefix": {
166 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
167 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
168 |         },
169 |         "convert_to_choice": {
170 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
171 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
172 |         },
173 |         "llm_choice_instruction": {
174 |             "en": "Each choice is a number from 1 to 9. Please evaluate <character> based on the conversation using the scales: 1 denotes 'disagree strongly', 2 denotes 'disagree', 3 denotes 'somewhat disagree', 4 denotes 'slightly disagree', 5 denotes 'neither agree nor disagree', 6 denotes 'slightly agree', 7 denotes 'somewhat agree', 8 denotes 'agree', and 9 denotes 'agree strongly'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
175 |             "zh": "每个选择都是从1到9的数字。请根据对话使用以下评分标准评估<character>：1代表“非常不同意”，2代表“不同意”，3代表“有些不同意”，4代表“轻微不同意”，5代表“既不同意也不反对”，6代表“轻微同意”，7代表“有些同意”，8代表“同意”，9代表“非常同意”。如果<character>拒绝回答该问题，用“x”表示。"
176 |         },
177 |         "rpa_choice_instruction": {
178 |             "en": "Reply a number from 1 to 9 using the scales: 1 denotes 'disagree strongly', 2 denotes 'disagree', 3 denotes 'somewhat disagree', 4 denotes 'slightly disagree', 5 denotes 'neither agree nor disagree', 6 denotes 'slightly agree', 7 denotes 'somewhat agree', 8 denotes 'agree', and 9 denotes 'agree strongly'. Please answer with the number only, without anything else.",
179 |             "zh": "请用1到9的等级来回答，1代表“非常不同意”，2代表“不同意”，3代表“有些不同意”，4代表“轻微不同意”，5代表“既不同意也不反对”，6代表“轻微同意”，7代表“有些同意”，8代表“同意”，9代表“非常同意”。请你只回答这一个数字，不要说其他内容。"
180 |         },
181 |         "convert_to_choice_adjoption": {
182 |             "Machiavellianism": {
183 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly Machiavellian personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
184 |             },
185 |             "Psychopathy": {
186 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly psychopathic personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
187 |             },
188 |             "Narcissism": {
189 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly narcissistic personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
190 |             }
191 |         },
192 |         "llm_choice_instruction_adjoption": {
193 |             "Machiavellianism": {
194 |                 "en": "Each choice is a number from 1 to 9. Please evaluate <character> based on the conversation using the scales: 1 denotes 'highly non-Machiavellian', 2 denotes 'non-Machiavellian', 3 denotes 'somewhat non-Machiavellian', 4 denotes 'slightly non-Machiavellian', 5 denotes 'neutral', 6 denotes 'slightly Machiavellian', 7 denotes 'somewhat Machiavellian', 8 denotes 'Machiavellian', and 9 denotes 'highly Machiavellian'. In case <character> refuses to answer the question, use \"x\" to indicate it."
195 |             },
196 |             "Psychopathy": {
197 |                 "en": "Each choice is a number from 1 to 9. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly empathetic', 2 denotes 'empathetic', 3 denotes 'somewhat empathetic', 4 denotes 'slightly empathetic', 5 denotes 'neutral', 6 denotes 'slightly psychopathic', 7 denotes 'somewhat psychopathic', 8 denotes 'psychopathic', and 9 denotes 'strongly psychopathic'. In case <character> refuses to answer the question, use \"x\" to indicate it."
198 |             },
199 |             "Narcissism": {
200 |                 "en": "Each choice is a number from 1 to 9. Please evaluate <character> based on the conversation using the scales: 1 denotes 'highly selfless', 2 denotes 'selfless', 3 denotes 'somewhat selfless', 4 denotes 'slightly selfless', 5 denotes 'neutral', 6 denotes 'slightly narcissistic', 7 denotes 'somewhat narcissistic', 8 denotes 'narcissistic', and 9 denotes 'highly narcissistic'. In case <character> refuses to answer the question, use \"x\" to indicate it."
201 |             }
202 |         },
203 |         "dim_desc": {
204 |             "Machiavellianism": "Machiavellianism, as assessed by the DTDD, pertains to an individual's tendency to manipulate and exploit others for personal gain. This dimension captures the essence of strategic manipulation, deceit, and the use of flattery as tools to achieve one's ends. Individuals scoring high in this dimension are characterized by their focus on self-interest, their ability to strategize and navigate social interactions with guile, and their willingness to employ unethical methods to succeed. High Machiavellian traits indicate a person who sees relationships as opportunities for exploitation, prioritizes personal advancement over moral or ethical considerations, and possesses a calculated approach to interpersonal dynamics.", 
205 |             "Narcissism": "In the DTDD framework, Narcissism is defined by an individual's excessive need for admiration and attention, coupled with a heightened sense of self-importance. This dimension assesses the extent to which an individual seeks prestige, status, and expects special favors from others. High scorers on this dimension exhibit a grandiose view of their own worth, a constant desire for external validation, and a tendency to dominate social interactions to maintain a sense of superiority. Narcissism within the DTDD highlights the pursuit of recognition and superiority, often at the expense of genuine interpersonal connections and empathy for others.",
206 |             "Psychopathy": "Psychopathy, within the DTDD, focuses on amorality, callousness, and a lack of empathy or remorse. This dimension captures traits associated with a disregard for the welfare of others, cynical views of the world, and insensitivity to the emotional or physical harm one's actions may cause. Individuals with high scores in psychopathy tend to exhibit behaviors that are considered socially and morally deviant, showing little regard for conventional ethics or the feelings of others. The psychopathy dimension of the DTDD identifies individuals who are emotionally detached, prone to risk-taking, and indifferent to the moral or legal consequences of their actions."
207 |         }
208 |     }
209 | }
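
The `convert_to_choice_adjoption` templates above end with `===CHOICE INSTRUCTION===\n`, which suggests the matching `llm_choice_instruction_adjoption` string is appended before the prompt is sent to the evaluator model. Below is a minimal sketch of that assembly. The helper name `build_judge_prompt`, the simple concatenation, the character name `Alice`, and the abbreviated prompt strings are all assumptions for illustration, not the repository's confirmed logic.

```python
def build_judge_prompt(prompts, dimension, character, i_start, i_end, lang="en"):
    """Hypothetical helper: join template and instruction, then fill placeholders."""
    template = prompts["convert_to_choice_adjoption"][dimension][lang]
    instruction = prompts["llm_choice_instruction_adjoption"][dimension][lang]
    text = template + instruction
    replacements = {
        "<character>": character,
        "<i_start>": str(i_start),
        "<i_end>": str(i_end),
    }
    # Plain string replacement; "<choice 1>" and "<choice n>" are left untouched
    # because they describe the expected output format to the evaluator.
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text


# Abbreviated stand-ins for the DTDD strings above (full text omitted for brevity).
dtdd_prompts = {
    "convert_to_choice_adjoption": {
        "Narcissism": {
            "en": (
                "Your task is to convert each conversation into a choice indicating "
                "whether <character> displays a highly narcissistic personality.\n"
                "===OUTPUT FORMAT===\n"
                '{\n    "<i_start>": <choice 1>,\n    ...\n    "<i_end>": <choice n>\n}\n'
                "===CHOICE INSTRUCTION===\n"
            )
        }
    },
    "llm_choice_instruction_adjoption": {
        "Narcissism": {
            "en": (
                "Each choice is a number from 1 to 9. "
                'In case <character> refuses to answer the question, use "x" to indicate it.'
            )
        }
    },
}

prompt = build_judge_prompt(dtdd_prompts, "Narcissism", "Alice", 0, 3)
print(prompt)
```

Keeping the template and the choice instruction as separate keys lets the same output-format wrapper be reused with either the statement-agreement scale or the per-dimension adjective scale.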


--------------------------------------------------------------------------------
/data/questionnaires/Empathy.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "Empathy",
  3 |     "full_name": "Empathy Scale",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 7 in the following statements. Please evaluate yourself based on your actual feelings and experiences in the following 10 descriptions using the scales: 1 denotes \"very inaccurately\" and 7 denotes \"very accurately\". Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes \"very inaccurately\" and 7 denotes \"very accurately\".",
  6 |     "scale": 8,
  7 |     "compute_mode": "AVG",
  8 |     "reverse": [
  9 |         1,
 10 |         6,
 11 |         8,
 12 |         9
 13 |     ],
 14 |     "categories": [
 15 |         {
 16 |             "cat_name": "Empathetic",
 17 |             "cat_questions": [
 18 |                 1,
 19 |                 2,
 20 |                 3,
 21 |                 4,
 22 |                 5,
 23 |                 6,
 24 |                 7,
 25 |                 8,
 26 |                 9,
 27 |                 10
 28 |             ],
 29 |             "crowd": [
 30 |                 {
 31 |                     "crowd_name": "Crowd",
 32 |                     "mean": 4.92,
 33 |                     "std": 0.76,
 34 |                     "n": 112
 35 |                 }
 36 |             ]
 37 |         }
 38 |     ],
 39 |     "range": [
 40 |         1,
 41 |         7
 42 |     ],
 43 |     "questions": {
 44 |         "1": {
 45 |             "origin_en": "I sometimes find it difficult to see things from the \"other guy's\" perspective.",
 46 |             "origin_zh": "我有时发现很难从\"别人\"的角度看事情。",
 47 |             "rewritten_en": "Do you sometimes find it difficult to see things from the \"other guy's\" perspective?",
 48 |             "rewritten_zh": "你有时发现很难从\"别人\"的角度看事情吗？",
 49 |             "dimension": "Empathetic",
 50 |             "category": "negative"
 51 |         },
 52 |         "2": {
 53 |             "origin_en": "I sometimes try to understand my friends better by imagining how things look from their perspective.",
 54 |             "origin_zh": "我有时试图通过想象从他们的角度看事情来更好地理解我的朋友。",
 55 |             "rewritten_en": "Do you sometimes try to understand your friends better by imagining how things look from their perspective?",
 56 |             "rewritten_zh": "你有时试图通过想象从他们的角度看事情来更好地理解你的朋友吗？",
 57 |             "dimension": "Empathetic",
 58 |             "category": "positive"
 59 |         },
 60 |         "3": {
 61 |             "origin_en": "When I'm upset at someone, I usually try to \"put myself in his shoes\" for a while.",
 62 |             "origin_zh": "当我对某人感到不快时，我通常会试着\"设身处地\"一段时间。",
 63 |             "rewritten_en": "Do you usually try to \"put yourself in his shoes\" for a while when you're upset at someone?",
 64 |             "rewritten_zh": "当你对某人感到不快时，你通常会试着\"设身处地\"一段时间吗？",
 65 |             "dimension": "Empathetic",
 66 |             "category": "positive"
 67 |         },
 68 |         "4": {
 69 |             "origin_en": "Before criticizing somebody, I try to imagine how I would feel if I were in their place.",
 70 |             "origin_zh": "在批评某人之前，我会设法想象如果我处在他们的位置会有什么感受。",
 71 |             "rewritten_en": "Do you try to imagine how you would feel if you were in their place before criticizing somebody?",
 72 |             "rewritten_zh": "在批评某人之前，你会设法想象如果你处在他们的位置会有什么感受吗？",
 73 |             "dimension": "Empathetic",
 74 |             "category": "positive"
 75 |         },
 76 |         "5": {
 77 |             "origin_en": "I often have tender, concerned feelings for people less fortunate than me.",
 78 |             "origin_zh": "我经常对比我不幸的人有温柔、关心的感觉。",
 79 |             "rewritten_en": "Do you often have tender, concerned feelings for people less fortunate than you?",
 80 |             "rewritten_zh": "你经常对比你不幸的人有温柔、关心的感觉吗？",
 81 |             "dimension": "Empathetic",
 82 |             "category": "positive"
 83 |         },
 84 |         "6": {
 85 |             "origin_en": "Sometimes I don't feel very sorry for other people when they are having problems.",
 86 |             "origin_zh": "有时当别人遇到问题时，我并不感到很抱歉。",
 87 |             "rewritten_en": "Do you sometimes not feel very sorry for other people when they are having problems?",
 88 |             "rewritten_zh": "当别人遇到问题时，你有时并不感到很抱歉吗？",
 89 |             "dimension": "Empathetic",
 90 |             "category": "negative"
 91 |         },
 92 |         "7": {
 93 |             "origin_en": "When I see someone being taken advantage of, I feel kind of protective towards them.",
 94 |             "origin_zh": "当我看到有人被利用时，我会对他们有一种保护的感觉。",
 95 |             "rewritten_en": "Do you feel kind of protective towards someone when you see them being taken advantage of?",
 96 |             "rewritten_zh": "当你看到有人被利用时，你会对他们有一种保护的感觉吗？",
 97 |             "dimension": "Empathetic",
 98 |             "category": "positive"
 99 |         },
100 |         "8": {
101 |             "origin_en": "Other people's misfortunes do not usually disturb me a great deal.",
102 |             "origin_zh": "通常别人的不幸并不会让我感到很不安。",
103 |             "rewritten_en": "Do other people's misfortunes usually not disturb you a great deal?",
104 |             "rewritten_zh": "通常别人的不幸并不会让你感到很不安吗？",
105 |             "dimension": "Empathetic",
106 |             "category": "negative"
107 |         },
108 |         "9": {
109 |             "origin_en": "When I see someone being treated unfairly, I sometimes don't feel very much pity for them.",
110 |             "origin_zh": "当我看到有人受到不公平对待时，我有时并不怎么同情他们。",
111 |             "rewritten_en": "Do you sometimes not feel very much pity for someone when you see them being treated unfairly?",
112 |             "rewritten_zh": "当你看到有人受到不公平对待时，你有时并不怎么同情他们吗？",
113 |             "dimension": "Empathetic",
114 |             "category": "negative"
115 |         },
116 |         "10": {
117 |             "origin_en": "I am often quite touched by things I see happen.",
118 |             "origin_zh": "我经常被我看到的事情感动。",
119 |             "rewritten_en": "Are you often quite touched by things you see happen?",
120 |             "rewritten_zh": "你经常被你看到的事情感动吗？",
121 |             "dimension": "Empathetic",
122 |             "category": "positive"
123 |         }
124 |     },
125 |     "prompts": {
126 |         "rpa_choose_prefix": {
127 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
128 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
129 |         },
130 |         "convert_to_choice": {
131 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
132 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
133 |         },
134 |         "llm_choice_instruction": {
135 |             "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'very inaccurately', 2 denotes 'inaccurately', 3 denotes 'somewhat inaccurately', 4 denotes 'neutrally', 5 denotes 'somewhat accurately', 6 denotes 'accurately', and 7 denotes 'very accurately'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
136 |             "zh": "每个选择都是从1到7的数字。请根据对话使用以下评分标准评估<character>：1代表“非常不准确”，2代表“不准确”，3代表“有些不准确”，4代表“中性”，5代表“有些准确”，6代表“准确”，7代表“非常准确”。如果<character>拒绝回答该问题，用“x”表示。"
137 |         },
138 |         "rpa_choice_instruction": {
139 |             "en": "Reply a number from 1 to 7 using the scales: 1 denotes 'very inaccurately', 2 denotes 'inaccurately', 3 denotes 'somewhat inaccurately', 4 denotes 'neutrally', 5 denotes 'somewhat accurately', 6 denotes 'accurately', and 7 denotes 'very accurately'. Please answer with the number only, without anything else.",
140 |             "zh": "请用1到7的等级来回答，其中1代表“非常不准确”，2代表“不准确”，3代表“有些不准确”，4代表“中性”，5代表“有些准确”，6代表“准确”，7代表“非常准确”。请你只回答这一个数字，不要说其他内容。"
141 |         },
142 |         "convert_to_choice_adjoption": {
143 |             "Empathetic": {
144 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly empathetic personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
145 |             }
146 |         },
147 |         "llm_choice_instruction_adjoption": {
148 |             "Empathetic": {
149 |                 "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'very unempathetic', 2 denotes 'unempathetic', 3 denotes 'somewhat unempathetic', 4 denotes 'neutral', 5 denotes 'somewhat empathetic', 6 denotes 'empathetic', and 7 denotes 'very empathetic'. In case <character> refuses to answer the question, use \"x\" to indicate it."
150 |             }
151 |         },
152 |         "dim_desc": 
153 |             {"Empathetic": "Empathy captures an individual's ability to understand, share, and respond to the feelings of others. High scores in this dimension indicate a person who is:\nPerceptive: Capable of recognizing and understanding the emotions and perspectives of others, even when not explicitly communicated.\nResponsive: Shows a natural inclination to respond to others' emotional states with appropriate reactions, offering support or showing concern.\nCompassionate: Frequently experiences feelings of compassion and concern for individuals facing difficulties or misfortune.\nReflective: Often reflects on their own reactions to the experiences of others, striving to align their perspective with that of someone in a different situation.\n\nThis dimension highlights empathy as a multifaceted skill encompassing both cognitive and emotional components, essential for effective interpersonal interactions and building strong social connections."}
154 |     }
155 | }
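
The `scale`, `reverse`, and `compute_mode` fields above can be read as a scoring recipe. The sketch below assumes that `scale` equals the minimum plus the maximum of `range` (here 1 + 7 = 8), so that a reverse-keyed item is scored as `scale - raw`, and that `compute_mode` selects between averaging and summing. The function `score_category` and the sample answers are hypothetical; the repository's own scoring code may differ.

```python
def score_category(questionnaire, category, answers):
    """Hypothetical scorer: reverse-key listed items, then aggregate.

    answers maps question number (int) to the raw rating given for it.
    """
    scale = questionnaire["scale"]
    reverse = set(questionnaire["reverse"])
    adjusted = []
    for q in category["cat_questions"]:
        raw = answers[q]
        # Assumption: reverse-keyed items are mirrored around the scale midpoint.
        adjusted.append(scale - raw if q in reverse else raw)
    mode = questionnaire["compute_mode"]
    if mode == "AVG":
        return sum(adjusted) / len(adjusted)
    if mode == "SUM":
        return sum(adjusted)
    raise ValueError(f"unsupported compute_mode: {mode}")


# Fields copied from Empathy.json above; questions and prompts omitted.
empathy = {
    "scale": 8,
    "compute_mode": "AVG",
    "reverse": [1, 6, 8, 9],
    "categories": [
        {"cat_name": "Empathetic", "cat_questions": list(range(1, 11))}
    ],
}

# Made-up responses: rating 7 ("very accurately") on every item.
answers = {q: 7 for q in range(1, 11)}
empathy_score = score_category(empathy, empathy["categories"][0], answers)
print(empathy_score)
```

Under this reading, answering 7 to every item does not give the maximum score, because the four reverse-keyed items (1, 6, 8, 9) each contribute 1 instead of 7.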


--------------------------------------------------------------------------------
/data/questionnaires/GSE.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "GSE",
  3 |     "full_name": "General Self-Efficacy",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 4 in the following statements. Please provide a rating based on the following 10 descriptions according to your actual feelings and experiences. The rating and judgment criteria are as follows: 1 denotes Disagree, 2 denotes Somewhat disagree, 3 denotes Somewhat agree, 4 denotes Agree. Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 4. Please evaluate <character> based on the conversation using the scales: 1 denotes 'Disagree', 2 denotes 'Somewhat disagree', 3 denotes 'Somewhat agree', 4 denotes 'Agree'.",
  6 |     "scale": 5,
  7 |     "compute_mode": "SUM",
  8 |     "reverse": [],
  9 |     "categories": [
 10 |         {
 11 |             "cat_name": "Self-efficacy",
 12 |             "cat_questions": [
 13 |                 1,
 14 |                 2,
 15 |                 3,
 16 |                 4,
 17 |                 5,
 18 |                 6,
 19 |                 7,
 20 |                 8,
 21 |                 9,
 22 |                 10
 23 |             ],
 24 |             "crowd": [
 25 |                 {
 26 |                     "crowd_name": "Crowd",
 27 |                     "mean": 29.55,
 28 |                     "std": 5.32,
 29 |                     "n": 19120
 30 |                 }
 31 |             ]
 32 |         }
 33 |     ],
 34 |     "range": [
 35 |         1,
 36 |         4
 37 |     ],
 38 |     "questions": {
 39 |         "1": {
 40 |             "origin_en": "I can always manage to solve difficult problems if I try hard enough.",
 41 |             "origin_zh": "只要我努力，我总能解决困难的问题。",
 42 |             "rewritten_en": "Can you always manage to solve difficult problems if you try hard enough?",
 43 |             "rewritten_zh": "如果你努力，你总能解决困难的问题吗？",
 44 |             "dimension": "Self-efficacy",
 45 |             "category": "positive"
 46 |         },
 47 |         "2": {
 48 |             "origin_en": "If someone opposes me, I can find the means and ways to get what I want.",
 49 |             "origin_zh": "如果有人反对我，我能找到方法得到我想要的。",
 50 |             "rewritten_en": "Can you find the means and ways to get what you want if someone opposes you?",
 51 |             "rewritten_zh": "如果有人反对你，你能找到方法得到你想要的吗？",
 52 |             "dimension": "Self-efficacy",
 53 |             "category": "positive"
 54 |         },
 55 |         "3": {
 56 |             "origin_en": "I am certain that I can accomplish my goals.",
 57 |             "origin_zh": "我确信我能实现我的目标。",
 58 |             "rewritten_en": "Are you certain that you can accomplish your goals?",
 59 |             "rewritten_zh": "你确信你能实现你的目标吗？",
 60 |             "dimension": "Self-efficacy",
 61 |             "category": "positive"
 62 |         },
 63 |         "4": {
 64 |             "origin_en": "I am confident that I could deal efficiently with unexpected events.",
 65 |             "origin_zh": "我有信心能有效处理意外事件。",
 66 |             "rewritten_en": "Are you confident that you could deal efficiently with unexpected events?",
 67 |             "rewritten_zh": "你有信心能有效处理意外事件吗？",
 68 |             "dimension": "Self-efficacy",
 69 |             "category": "positive"
 70 |         },
 71 |         "5": {
 72 |             "origin_en": "Thanks to my resourcefulness, I can handle unforeseen situations.",
 73 |             "origin_zh": "由于我的足智多谋，我能处理意想不到的情况。",
 74 |             "rewritten_en": "Can you handle unforeseen situations thanks to your resourcefulness?",
 75 |             "rewritten_zh": "由于你的足智多谋，你能处理意想不到的情况吗？",
 76 |             "dimension": "Self-efficacy",
 77 |             "category": "positive"
 78 |         },
 79 |         "6": {
 80 |             "origin_en": "I can solve most problems if I invest the necessary effort.",
 81 |             "origin_zh": "如果我付出必要的努力，我能解决大多数问题。",
 82 |             "rewritten_en": "Can you solve most problems if you invest the necessary effort?",
 83 |             "rewritten_zh": "如果你付出必要的努力，你能解决大多数问题吗？",
 84 |             "dimension": "Self-efficacy",
 85 |             "category": "positive"
 86 |         },
 87 |         "7": {
 88 |             "origin_en": "I can remain calm when facing difficulties because I can rely on my coping abilities.",
 89 |             "origin_zh": "面对困难时，我能保持冷静，因为我可以依靠自己的应对能力。",
 90 |             "rewritten_en": "Can you remain calm when facing difficulties because you can rely on your coping abilities?",
 91 |             "rewritten_zh": "面对困难时，你能保持冷静，因为你可以依靠自己的应对能力吗？",
 92 |             "dimension": "Self-efficacy",
 93 |             "category": "positive"
 94 |         },
 95 |         "8": {
 96 |             "origin_en": "When I am confronted with a problem, I can find several solutions.",
 97 |             "origin_zh": "当我面对问题时，我能找到几种解决方案。",
 98 |             "rewritten_en": "Can you find several solutions when you are confronted with a problem?",
 99 |             "rewritten_zh": "当你面对问题时，你能找到几种解决方案吗？",
100 |             "dimension": "Self-efficacy",
101 |             "category": "positive"
102 |         },
103 |         "9": {
104 |             "origin_en": "If I am in trouble, I can think of a good solution.",
105 |             "origin_zh": "如果我遇到麻烦，我能想出一个好的解决方案。",
106 |             "rewritten_en": "Can you think of a good solution if you are in trouble?",
107 |             "rewritten_zh": "如果你遇到麻烦，你能想出一个好的解决方案吗？",
108 |             "dimension": "Self-efficacy",
109 |             "category": "positive"
110 |         },
111 |         "10": {
112 |             "origin_en": "I can handle whatever comes my way.",
113 |             "origin_zh": "无论发生什么事，我都能处理。",
114 |             "rewritten_en": "Can you handle whatever comes your way?",
115 |             "rewritten_zh": "无论发生什么事，你都能处理吗？",
116 |             "dimension": "Self-efficacy",
117 |             "category": "positive"
118 |         }
119 |     },
120 |     "prompts": {
121 |         "rpa_choose_prefix": {
122 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
123 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
124 |         },
125 |         "convert_to_choice": {
126 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
127 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
128 |         },
129 |         "llm_choice_instruction": {
130 |             "en": "Each choice is a number from 1 to 4. Please evaluate <character> based on the conversation using the scales: 1 denotes 'Disagree', 2 denotes 'Somewhat disagree', 3 denotes 'Somewhat agree', 4 denotes 'Agree'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
131 |             "zh": "每个选择都是从1到4的数字。请根据对话使用以下评分标准来评估<character>：1代表“不同意”，2代表“有些不同意”，3代表“有些同意”，4代表“同意”。如果<character>拒绝回答该问题，用“x”表示。"
132 |         },
133 |         "rpa_choice_instruction": {
134 |             "en": "Reply a number from 1 to 4 using the scales: 1 denotes 'Disagree', 2 denotes 'Somewhat disagree', 3 denotes 'Somewhat agree', 4 denotes 'Agree'. Please answer with the number only, without anything else.",
135 |             "zh": "请用1到4的等级来回答，1代表“不同意”，2代表“有些不同意”，3代表“有些同意”，4代表“同意”。请你只回答这一个数字，不要说其他内容。"
136 |         },
137 |         "convert_to_choice_adjoption": {
138 |             "Self-efficacy": {
139 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly self-efficacious personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
140 |             }
141 |         },
142 |         "llm_choice_instruction_adjoption": {
143 |             "Self-efficacy": {
144 |                 "en": "Each choice is a number from 1 to 4. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not self-efficacious', 2 denotes 'somewhat not self-efficacious', 3 denotes 'somewhat self-efficacious', 4 denotes 'self-efficacious'. In case <character> refuses to answer the question, use \"x\" to indicate it."
145 |             }
146 |         },
147 |         "dim_desc": {
148 |             "Self-efficacy": "This dimension measures one's confidence in their ability to tackle and solve problems. High scorers believe in their skills to overcome challenges, persist through difficulties, and successfully manage tasks and unforeseen situations. They view obstacles as opportunities for growth and are resourceful in finding solutions. On the other hand, low scorers may doubt their capabilities, shy away from challenges due to fear of failure, and feel less prepared to handle demanding situations, potentially affecting their achievement levels and increasing susceptibility to stress."
149 |         }
150 |     }
151 | }
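
Each category carries a `crowd` entry with a human reference `mean`, `std`, and sample size `n`. One way such norms could be used is to express a character's score as a z-score relative to the crowd. The sketch below does this for GSE, where `compute_mode` is `SUM` and no items are reverse-keyed. The function `crowd_z_score` and the sample answers are hypothetical, and whether the repository compares against crowd norms in this way is an assumption.

```python
def crowd_z_score(score, crowd):
    """Hypothetical helper: standardize a score against a crowd norm."""
    return (score - crowd["mean"]) / crowd["std"]


# Crowd norm copied from GSE.json above.
gse_crowd = {"crowd_name": "Crowd", "mean": 29.55, "std": 5.32, "n": 19120}

# Made-up responses: "Somewhat agree" (3) on all ten items.
gse_answers = {q: 3 for q in range(1, 11)}

# GSE uses compute_mode "SUM" with an empty reverse list, so the
# dimension score is simply the sum of the raw ratings.
gse_total = sum(gse_answers.values())
gse_z = crowd_z_score(gse_total, gse_crowd)
print(gse_total, round(gse_z, 3))
```

Because GSE is summed while Empathy and ICB are averaged, crowd means are only comparable within a questionnaire; a standardized score sidesteps that difference.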


--------------------------------------------------------------------------------
/data/questionnaires/ICB.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "ICB",
  3 |     "full_name": "Implicit Culture Belief",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 6 in the following statements. Please read the following statements carefully, and rate your extent of agreement with each statement using the scale: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'slightly disagree', 4 denotes 'slightly agree', 5 denotes 'agree', 6 denotes 'strongly agree'. Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 6. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'slightly disagree', 4 denotes 'slightly agree', 5 denotes 'agree', 6 denotes 'strongly agree'.",
  6 |     "scale": 7,
  7 |     "compute_mode": "AVG",
  8 |     "reverse": [
  9 |         5,
 10 |         6,
 11 |         7,
 12 |         8
 13 |     ],
 14 |     "categories": [
 15 |         {
 16 |             "cat_name": "Culturally Rigid",
 17 |             "cat_questions": [
 18 |                 1,
 19 |                 2,
 20 |                 3,
 21 |                 4,
 22 |                 5,
 23 |                 6,
 24 |                 7,
 25 |                 8
 26 |             ],
 27 |             "crowd": [
 28 |                 {
 29 |                     "crowd_name": "Crowd",
 30 |                     "mean": 3.66,
 31 |                     "std": 0.76,
 32 |                     "n": 254
 33 |                 }
 34 |             ]
 35 |         }
 36 |     ],
 37 |     "range": [
 38 |         1,
 39 |         6
 40 |     ],
 41 |     "questions": {
 42 |         "1": {
 43 |             "origin_en": "The ethnic culture a person is from (e.g., Chinese, American, Japanese), determined the kind of person they would be (e.g. outgoing and sociable or quiet and introverted); not much can be done to change the person.",
 44 |             "origin_zh": "一个人来自的民族文化（例如，中国人，美国人，日本人）决定了他们会成为怎样的人（例如外向和善社交或安静内向）；对于改变这个人来说，没有太多可以做的。",
 45 |             "rewritten_en": "Does the ethnic culture a person is from determine the kind of person they would be (e.g. outgoing and sociable or quiet and introverted)? Can much be done to change the person?",
 46 |             "rewritten_zh": "一个人来自的民族文化（例如，中国人，美国人，日本人）决定了他们会成为怎样的人（例如外向和善社交或安静内向）？对于改变这个人来说，有多少可以做的？",
 47 |             "dimension": "Culturally Rigid",
 48 |             "category": "positive"
 49 |         },
 50 |         "2": {
 51 |             "origin_en": "There is not much that can be done to change a person's ethnocultural characteristics (e.g., being violent, being assertive, being submissive).",
 52 |             "origin_zh": "改变一个人的种族文化特征（例如，暴力，自信，服从）没有太多可以做的。",
 53 |             "rewritten_en": "Is there not much that can be done to change a person's ethnocultural characteristics (e.g., being violent, being assertive, being submissive)?",
 54 |             "rewritten_zh": "改变一个人的种族文化特征（例如，暴力，自信，服从）没有太多可以做的吗？",
 55 |             "dimension": "Culturally Rigid",
 56 |             "category": "positive"
 57 |         },
 58 |         "3": {
 59 |             "origin_en": "Although people can act differently, the core ethnocultural characteristics they hold cannot be changed much.",
 60 |             "origin_zh": "尽管人们可以表现出不同的行为，但他们持有的核心种族文化特征不能太多改变。",
 61 |             "rewritten_en": "Can the core ethnocultural characteristics people hold not be changed much, although they can act differently?",
 62 |             "rewritten_zh": "尽管人们可以表现出不同的行为，但他们持有的核心种族文化特征不能太多改变吗？",
 63 |             "dimension": "Culturally Rigid",
 64 |             "category": "positive"
 65 |         },
 66 |         "4": {
 67 |             "origin_en": "Ethnocultural characteristics are something very basic about a person, they cannot be changed.",
 68 |             "origin_zh": "种族文化特征是关于一个人非常基本的东西，它们不能被改变。",
 69 |             "rewritten_en": "Are ethnocultural characteristics something very basic about a person that cannot be changed?",
 70 |             "rewritten_zh": "种族文化特征是关于一个人非常基本的东西，它们不能被改变吗？",
 71 |             "dimension": "Culturally Rigid",
 72 |             "category": "positive"
 73 |         },
 74 |         "5": {
 75 |             "origin_en": "Everyone, no matter who they are, can significantly change their ethnocultural characteristics (e.g., being violent, being assertive, and being submissive).",
 76 |             "origin_zh": "每个人，无论他们是谁，都可以显著改变他们的种族文化特征（例如，暴力，自信，服从）。",
 77 |             "rewritten_en": "Can everyone, no matter who they are, significantly change their ethnocultural characteristics (e.g., being violent, being assertive, and being submissive)?",
 78 |             "rewritten_zh": "每个人，无论他们是谁，都可以显著改变他们的种族文化特征（例如，暴力，自信，服从）吗？",
 79 |             "dimension": "Culturally Rigid",
 80 |             "category": "negative"
 81 |         },
 82 |         "6": {
 83 |             "origin_en": "People from different ethnic cultures (e.g., Chinese, Japanese, American) can substantially change the kind of person they are.",
 84 |             "origin_zh": "来自不同种族文化（例如，中国人，日本人，美国人）的人可以大幅改变他们是怎样的人。",
 85 |             "rewritten_en": "Can people from different ethnic cultures (e.g., Chinese, Japanese, American) substantially change the kind of person they are?",
 86 |             "rewritten_zh": "来自不同种族文化（例如，中国人，日本人，美国人）的人可以大幅改变他们是怎样的人吗？",
 87 |             "dimension": "Culturally Rigid",
 88 |             "category": "negative"
 89 |         },
 90 |         "7": {
 91 |             "origin_en": "No matter what a person's ethnocultural characteristic is like, it can always be changed.",
 92 |             "origin_zh": "无论一个人的种族文化特征是什么样的，它总是可以被改变的。",
 93 |             "rewritten_en": "Can a person's ethnocultural characteristic always be changed, no matter what it is like?",
 94 |             "rewritten_zh": "无论一个人的种族文化特征是什么样的，它总是可以被改变的吗？",
 95 |             "dimension": "Culturally Rigid",
 96 |             "category": "negative"
 97 |         },
 98 |         "8": {
 99 |             "origin_en": "People can change even the most basic qualities that they have acquired from their own ethnic culture.",
100 |             "origin_zh": "人们甚至可以改变他们从自己的种族文化中获得的最基本的品质。",
101 |             "rewritten_en": "Can people change even the most basic qualities that they have acquired from their own ethnic culture?",
102 |             "rewritten_zh": "人们甚至可以改变他们从自己的种族文化中获得的最基本的品质吗？",
103 |             "dimension": "Culturally Rigid",
104 |             "category": "negative"
105 |         }
106 |     },
107 |     "prompts": {
108 |         "rpa_choose_prefix": {
109 |             "en": "Do you agree with the statement that \"<statement>\"?",
110 |             "zh": "你赞同\"<statement>\"这个说法吗？"
111 |         },
112 |         "convert_to_choice": {
113 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
114 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
115 |         },
116 |         "llm_choice_instruction": {
117 |             "en": "Each choice is a number from 1 to 6. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'slightly disagree', 4 denotes 'slightly agree', 5 denotes 'agree', 6 denotes 'strongly agree'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
118 |             "zh": "每个选择都是从1到6的数字。请根据对话使用以下评分标准评估<character>：1代表“非常不同意”，2代表“不同意”，3代表“稍微不同意”，4代表“稍微同意”，5代表“同意”，6代表“非常同意”。如果<character>拒绝回答该问题，用“x”表示。"
119 |         },
120 |         "rpa_choice_instruction": {
 121 |             "en": "Reply with a number from 1 to 6 using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'slightly disagree', 4 denotes 'slightly agree', 5 denotes 'agree', 6 denotes 'strongly agree'. Please answer with the number only, without anything else.",
122 |             "zh": "请回复一个1到6之间的数字：1代表“非常不同意”，2代表“不同意”，3代表“稍微不同意”，4代表“稍微同意”，5代表“同意”，6代表“非常同意”。请你只回答这一个数字，不要说其他内容。"
123 |         },
124 |         "convert_to_choice_adjoption": {
125 |             "Culturally Rigid": {
 126 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly culturally rigid personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
127 |             }
128 |         },
129 |         "llm_choice_instruction_adjoption": {
130 |             "Culturally Rigid": {
 131 |                 "en": "Each choice is a number from 1 to 6. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly culturally open-minded', 2 denotes 'culturally open-minded', 3 denotes 'slightly culturally open-minded', 4 denotes 'slightly culturally rigid', 5 denotes 'culturally rigid', 6 denotes 'strongly culturally rigid'. In case <character> refuses to answer the question, use \"x\" to indicate it."
132 |             }
133 |         },
 134 |         "dim_desc": {
135 |             "Culturally Rigid": "The Culturally Rigid dimension assesses the belief in the immutability of ethnocultural characteristics. High scorers perceive ethnocultural traits as fixed and unchangeable, suggesting that a person's cultural background rigidly defines their personality and behaviors. They believe that core characteristics derived from one's ethnic culture, such as temperament and social behaviors, cannot be significantly altered. In contrast, low scorers view ethnocultural traits as flexible and subject to change. They believe individuals can substantially alter their culturally acquired characteristics through effort, learning, and experience, showcasing a more dynamic understanding of cultural identity and personal development."
136 |         }
137 |     }
138 | }


--------------------------------------------------------------------------------
/data/questionnaires/LMS.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "LMS",
  3 |     "full_name": "Love of Money Scale",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 5 in the following statements. Please provide a rating based on the following descriptions using the scale: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'neutral', 4 denotes 'agree', 5 denotes 'strongly agree'. Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 5. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'neutral', 4 denotes 'agree', 5 denotes 'strongly agree'.",
  6 |     "scale": 6,
  7 |     "compute_mode": "AVG",
  8 |     "reverse": [],
  9 |     "categories": [
 10 |         {
 11 |             "cat_name": "Factor rich",
 12 |             "cat_questions": [
 13 |                 1,
 14 |                 2,
 15 |                 3
 16 |             ],
 17 |             "crowd": [
 18 |                 {
 19 |                     "crowd_name": "Whole Sample",
 20 |                     "mean": 3.8,
 21 |                     "std": 0.83,
 22 |                     "n": 5973
 23 |                 },
 24 |                 {
 25 |                     "crowd_name": "USA",
 26 |                     "mean": 3.85,
 27 |                     "std": 0.79,
 28 |                     "n": 274
 29 |                 }
 30 |             ]
 31 |         },
 32 |         {
 33 |             "cat_name": "Factor motivator",
 34 |             "cat_questions": [
 35 |                 4,
 36 |                 5,
 37 |                 6
 38 |             ],
 39 |             "crowd": [
 40 |                 {
 41 |                     "crowd_name": "Whole Sample",
 42 |                     "mean": 3.27,
 43 |                     "std": 0.94,
 44 |                     "n": 5973
 45 |                 },
 46 |                 {
 47 |                     "crowd_name": "USA",
 48 |                     "mean": 3.59,
 49 |                     "std": 0.98,
 50 |                     "n": 274
 51 |                 }
 52 |             ]
 53 |         },
 54 |         {
 55 |             "cat_name": "Factor important",
 56 |             "cat_questions": [
 57 |                 7,
 58 |                 8,
 59 |                 9
 60 |             ],
 61 |             "crowd": [
 62 |                 {
 63 |                     "crowd_name": "Whole Sample",
 64 |                     "mean": 3.95,
 65 |                     "std": 0.72,
 66 |                     "n": 5973
 67 |                 },
 68 |                 {
 69 |                     "crowd_name": "USA",
 70 |                     "mean": 4.1,
 71 |                     "std": 0.65,
 72 |                     "n": 274
 73 |                 }
 74 |             ]
 75 |         }
 76 |     ],
 77 |     "range": [
 78 |         1,
 79 |         5
 80 |     ],
 81 |     "questions": {
 82 |         "1": {
 83 |             "origin_en": "I want to be rich.",
 84 |             "origin_zh": "我想变得富有。",
 85 |             "rewritten_en": "Do you want to be rich?",
 86 |             "rewritten_zh": "你想变得富有吗？",
 87 |             "dimension": "Factor rich",
 88 |             "category": "positive"
 89 |         },
 90 |         "2": {
 91 |             "origin_en": "It would be nice to be rich.",
 92 |             "origin_zh": "成为富有会很好。",
 93 |             "rewritten_en": "Would it be nice to be rich?",
 94 |             "rewritten_zh": "成为富有会很好吗？",
 95 |             "dimension": "Factor rich",
 96 |             "category": "positive"
 97 |         },
 98 |         "3": {
  99 |             "origin_en": "Having a lot of money (being rich) is good.",
100 |             "origin_zh": "拥有很多钱（变得富有）是好的。",
101 |             "rewritten_en": "Is having a lot of money (being rich) good?",
102 |             "rewritten_zh": "拥有很多钱（变得富有）是好的吗？",
103 |             "dimension": "Factor rich",
104 |             "category": "positive"
105 |         },
106 |         "4": {
107 |             "origin_en": "I am motivated to work hard for money.",
108 |             "origin_zh": "我有动力为了钱努力工作。",
109 |             "rewritten_en": "Are you motivated to work hard for money?",
110 |             "rewritten_zh": "你有动力为了钱努力工作吗？",
111 |             "dimension": "Factor motivator",
112 |             "category": "positive"
113 |         },
114 |         "5": {
115 |             "origin_en": "Money reinforces me to work harder.",
116 |             "origin_zh": "钱激励我更努力工作。",
117 |             "rewritten_en": "Does money reinforce you to work harder?",
118 |             "rewritten_zh": "钱激励你更努力工作吗？",
119 |             "dimension": "Factor motivator",
120 |             "category": "positive"
121 |         },
122 |         "6": {
123 |             "origin_en": "I am highly motivated by money.",
124 |             "origin_zh": "我受到金钱的高度激励。",
125 |             "rewritten_en": "Are you highly motivated by money?",
126 |             "rewritten_zh": "你受到金钱的高度激励吗？",
127 |             "dimension": "Factor motivator",
128 |             "category": "positive"
129 |         },
130 |         "7": {
131 |             "origin_en": "Money is good.",
132 |             "origin_zh": "金钱是好的。",
133 |             "rewritten_en": "Is money good?",
134 |             "rewritten_zh": "金钱是好的吗？",
135 |             "dimension": "Factor important",
136 |             "category": "positive"
137 |         },
138 |         "8": {
139 |             "origin_en": "Money is important.",
140 |             "origin_zh": "金钱很重要。",
141 |             "rewritten_en": "Is money important?",
142 |             "rewritten_zh": "金钱很重要吗？",
143 |             "dimension": "Factor important",
144 |             "category": "positive"
145 |         },
146 |         "9": {
147 |             "origin_en": "Money is valuable.",
148 |             "origin_zh": "金钱是有价值的。",
149 |             "rewritten_en": "Is money valuable?",
150 |             "rewritten_zh": "金钱是有价值的吗？",
151 |             "dimension": "Factor important",
152 |             "category": "positive"
153 |         }
154 |     },
155 |     "prompts": {
156 |         "rpa_choose_prefix": {
157 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
158 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
159 |         },
160 |         "convert_to_choice": {
161 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
162 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
163 |         },
164 |         "llm_choice_instruction": {
165 |             "en": "Each choice is a number from 1 to 5. Please evaluate <character> based on the conversation using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'neutral', 4 denotes 'agree', 5 denotes 'strongly agree'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
166 |             "zh": "每个选择都是从1到5的数字。请根据对话使用以下评分标准评估<character>：1代表“非常不同意”，2代表“不同意”，3代表“中立”，4代表“同意”，5代表“非常同意”。如果<character>拒绝回答该问题，用“x”表示。"
167 |         },
168 |         "rpa_choice_instruction": {
 169 |             "en": "Reply with a number from 1 to 5 using the scales: 1 denotes 'strongly disagree', 2 denotes 'disagree', 3 denotes 'neutral', 4 denotes 'agree', and 5 denotes 'strongly agree'. Please answer with the number only, without anything else.",
170 |             "zh": "请回复一个1到5之间的数字：1代表“非常不同意”，2代表“不同意”，3代表“中立”，4代表“同意”，5代表“非常同意”。请你只回答这一个数字，不要说其他内容。"
171 |         },
172 |         "convert_to_choice_adjoption": {
173 |             "Factor rich": {
174 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly money-loving personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
175 |             },
176 |             "Factor motivator": {
177 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly money-driven personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
178 |             },
179 |             "Factor important": {
180 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly money-oriented personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
181 |             }
182 |         },
183 |         "llm_choice_instruction_adjoption": {
184 |             "Factor rich": {
185 |                 "en": "Each choice is a number from 1 to 5. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not money-loving at all', 2 denotes 'not money-loving', 3 denotes 'neutral', 4 denotes 'money-loving', 5 denotes 'strongly money-loving'. In case <character> refuses to answer the question, use \"x\" to indicate it."
186 |             },
187 |             "Factor motivator": {
188 |                 "en": "Each choice is a number from 1 to 5. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not money-driven at all', 2 denotes 'not money-driven', 3 denotes 'neutral', 4 denotes 'money-driven', 5 denotes 'strongly money-driven'. In case <character> refuses to answer the question, use \"x\" to indicate it."
189 |             },
190 |             "Factor important": {
191 |                 "en": "Each choice is a number from 1 to 5. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not money-oriented at all', 2 denotes 'not money-oriented', 3 denotes 'neutral', 4 denotes 'money-oriented', 5 denotes 'strongly money-oriented'. In case <character> refuses to answer the question, use \"x\" to indicate it."
192 |             }
193 |         },
 194 |         "dim_desc": {
195 |             "Factor rich": "Factor rich measures the desire and aspiration to accumulate wealth. High scorers on this factor express a strong wish to achieve financial abundance, viewing wealth as a significant marker of success and a source of happiness. They believe that being rich enhances their status and quality of life.",
196 |             "Factor motivator": "Factor motivator assesses how much an individual is driven by monetary incentives to exert effort in their work or ambitions. High scorers find money to be a powerful motivator that encourages them to work harder and achieve their goals. They see financial rewards as a primary reason for investing effort in their tasks.",
197 |             "Factor important": "Factor important evaluates the perceived importance and value of money in an individual's life. High scorers regard money as crucial for their well-being and security. They believe that having money is essential for fulfilling their needs, ensuring stability, and accessing opportunities."
198 |         }
199 |     }
200 | }
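
The `prompts` entries above carry `<statement>` and `<character>` placeholders. A minimal sketch of how such a template might be filled, assuming plain string substitution; `fill_prompt` is a hypothetical helper for illustration and is not taken from the repository code.

```python
def fill_prompt(template: str, **values: str) -> str:
    """Replace each <name> placeholder with its value (hypothetical helper)."""
    for name, value in values.items():
        template = template.replace(f"<{name}>", value)
    return template


# English "rpa_choose_prefix" template copied from LMS.json.
rpa_choose_prefix_en = 'Do you think that the statement "<statement>" applies to you?'

# Fill it with the "origin_en" text of LMS question 8.
question = fill_prompt(rpa_choose_prefix_en, statement="Money is important.")
```

Placeholders that are not passed as keyword arguments are left untouched, so the same helper could be reused for templates that only contain `<character>`.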


--------------------------------------------------------------------------------
/data/questionnaires/LOT-R.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "LOT-R",
  3 |     "full_name": "Life Orientation Test-Revised",
  4 |     "psychobench_prompt": "You can only reply numbers from 0 to 4 in the following statements. 0 denotes 'strongly disagree', 1 denotes 'disagree', 2 denotes 'neutral', 3 denotes 'agree', 4 denotes 'strongly agree'. Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 0 to 4. Please evaluate <character> based on the conversation using the scales: 0 denotes 'strongly disagree', 1 denotes 'disagree', 2 denotes 'neutral', 3 denotes 'agree', 4 denotes 'strongly agree'.",
  6 |     "scale": 4,
  7 |     "compute_mode": "SUM",
  8 |     "reverse": [
  9 |         3,
 10 |         7,
 11 |         9
 12 |     ],
 13 |     "categories": [
 14 |         {
 15 |             "cat_name": "Optimistic",
 16 |             "cat_questions": [
 17 |                 1,
 18 |                 3,
 19 |                 4,
 20 |                 7,
 21 |                 9,
 22 |                 10
 23 |             ],
 24 |             "crowd": [
 25 |                 {
 26 |                     "crowd_name": "Crowd",
 27 |                     "mean": 14.7,
 28 |                     "std": 4,
 29 |                     "n": 1288
 30 |                 }
 31 |             ]
 32 |         }
 33 |     ],
 34 |     "range": [
 35 |         0,
 36 |         4
 37 |     ],
 38 |     "questions": {
 39 |         "1": {
 40 |             "origin_en": "In uncertain times, I usually expect the best.",
 41 |             "origin_zh": "在不确定的时候，我通常期待最好的结果。",
 42 |             "rewritten_en": "Do you usually expect the best in uncertain times?",
 43 |             "rewritten_zh": "在不确定的时候，你通常期待最好的结果吗？",
 44 |             "dimension": "Optimistic",
 45 |             "category": "positive"
 46 |         },
 47 |         "2": {
 48 |             "origin_en": "It's easy for me to relax.",
 49 |             "origin_zh": "我很容易放松。",
 50 |             "rewritten_en": "Is it easy for you to relax?",
 51 |             "rewritten_zh": "你很容易放松吗？",
 52 |             "dimension": null,
 53 |             "category": "positive"
 54 |         },
 55 |         "3": {
 56 |             "origin_en": "If something can go wrong for me, it will.",
 57 |             "origin_zh": "如果对我来说有什么事情可能会出错，那它就会出错。",
  58 |             "rewritten_en": "If something can go wrong for you, will it?",
 59 |             "rewritten_zh": "如果对你来说有什么事情可能会出错，那它就会出错吗？",
 60 |             "dimension": "Optimistic",
 61 |             "category": "negative"
 62 |         },
 63 |         "4": {
 64 |             "origin_en": "I'm always optimistic about my future.",
 65 |             "origin_zh": "我对我的未来总是充满乐观。",
 66 |             "rewritten_en": "Are you always optimistic about your future?",
 67 |             "rewritten_zh": "你对你的未来总是充满乐观吗？",
 68 |             "dimension": "Optimistic",
 69 |             "category": "positive"
 70 |         },
 71 |         "5": {
 72 |             "origin_en": "I enjoy my friends a lot.",
 73 |             "origin_zh": "我非常喜欢我的朋友。",
 74 |             "rewritten_en": "Do you enjoy your friends a lot?",
 75 |             "rewritten_zh": "你非常喜欢你的朋友吗？",
 76 |             "dimension": null,
 77 |             "category": "positive"
 78 |         },
 79 |         "6": {
 80 |             "origin_en": "It's important for me to keep busy.",
 81 |             "origin_zh": "对我来说保持忙碌很重要。",
 82 |             "rewritten_en": "Is it important for you to keep busy?",
 83 |             "rewritten_zh": "对你来说保持忙碌很重要吗？",
 84 |             "dimension": null,
 85 |             "category": "positive"
 86 |         },
 87 |         "7": {
 88 |             "origin_en": "I hardly ever expect things to go my way.",
 89 |             "origin_zh": "我几乎从不指望事情会朝着我的方向发展。",
 90 |             "rewritten_en": "Do you hardly ever expect things to go your way?",
 91 |             "rewritten_zh": "你几乎从不指望事情会朝着你的方向发展吗？",
 92 |             "dimension": "Optimistic",
 93 |             "category": "negative"
 94 |         },
 95 |         "8": {
 96 |             "origin_en": "I don't get upset too easily.",
 97 |             "origin_zh": "我不太容易生气。",
 98 |             "rewritten_en": "Do you not get upset too easily?",
 99 |             "rewritten_zh": "你不太容易生气吗？",
100 |             "dimension": null,
101 |             "category": "positive"
102 |         },
103 |         "9": {
104 |             "origin_en": "I rarely count on good things happening to me.",
105 |             "origin_zh": "我很少指望好事发生在我身上。",
106 |             "rewritten_en": "Do you rarely count on good things happening to you?",
107 |             "rewritten_zh": "你很少指望好事发生在你身上吗？",
 108 |             "dimension": "Optimistic",
109 |             "category": "negative"
110 |         },
111 |         "10": {
112 |             "origin_en": "Overall, I expect more good things to happen to me than bad.",
113 |             "origin_zh": "总的来说，我期待发生在我身上的好事比坏事多。",
114 |             "rewritten_en": "Do you expect more good things to happen to you than bad overall?",
115 |             "rewritten_zh": "总的来说，你期待发生在你身上的好事比坏事多吗？",
116 |             "dimension": "Optimistic",
117 |             "category": "positive"
118 |         }
119 |     },
120 |     "prompts": {
121 |         "rpa_choose_prefix": {
122 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
123 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
124 |         },
125 |         "convert_to_choice": {
126 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
127 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
128 |         },
129 |         "llm_choice_instruction": {
130 |             "en": "Each choice is a number from 0 to 4. Please evaluate <character> based on the conversation using the scales: 0 denotes 'strongly disagree', 1 denotes 'disagree', 2 denotes 'neutral', 3 denotes 'agree', 4 denotes 'strongly agree'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
131 |             "zh": "每个选择都是从0到4的数字。请根据对话使用以下评分标准评估<character>：0代表“强烈不同意”，1代表“不同意”，2代表“中立”，3代表“同意”，4代表“强烈同意”。如果<character>拒绝回答该问题，用“x”表示。"
132 |         },
133 |         "rpa_choice_instruction": {
 134 |             "en": "Reply with a number from 0 to 4 using the scales: 0 denotes 'strongly disagree', 1 denotes 'disagree', 2 denotes 'neutral', 3 denotes 'agree', 4 denotes 'strongly agree'. Please answer with the number only, without anything else.",
135 |             "zh": "请回复一个0到4之间的数字：0代表“非常不同意”，1代表“不同意”，2代表“中立”，3代表“同意”，4代表“非常同意”。请你只回答这一个数字，不要说其他内容。"
136 |         },
137 |         "convert_to_choice_adjoption": {
138 |             "Optimistic": {
139 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly optimistic personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
140 |             }
141 |         },
142 |         "llm_choice_instruction_adjoption": {
143 |             "Optimistic": {
144 |                 "en": "Each choice is a number from 0 to 4. Please evaluate <character> based on the conversation using the scales: 0 denotes 'strongly pessimistic', 1 denotes 'pessimistic', 2 denotes 'neutral', 3 denotes 'optimistic', 4 denotes 'strongly optimistic'. In case <character> refuses to answer the question, use \"x\" to indicate it."
145 |             }
146 |         },
147 |         "dim_desc": {
148 |             "Optimistic": "Optimism on the LOT-R gauges an individual's tendency to expect positive outcomes. High scorers tend to expect the best outcomes in uncertain times and are generally optimistic about their future. They believe that more good things will happen to them than bad, reflecting a positive outlook towards life and its challenges. Low scorers typically expect things to not always go their way and might believe that if something can go wrong, it will. This perspective indicates a more cautious or pessimistic view, where the anticipation of positive outcomes is less frequent, potentially affecting their approach to life's uncertainties and future aspirations."
149 |         }
150 |     }
151 | }
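
The `scale`, `reverse`, `compute_mode`, and `cat_questions` fields above suggest how a category score could be derived. A minimal sketch, assuming that reverse-keyed items are flipped as `scale - answer` (with `scale` 4 and range 0 to 4, an answer of 0 becomes 4) and that `compute_mode` selects between summing and averaging the item scores. `score_category` is a hypothetical helper, and the answers are made up for illustration.

```python
from statistics import mean


def score_category(answers, cat_questions, reverse, scale, compute_mode):
    """Score one category from a {question_id: answer} mapping (hypothetical helper)."""
    adjusted = []
    for qid in cat_questions:
        value = answers[qid]
        # Assumed convention: reverse-keyed items are flipped as scale - value.
        if qid in reverse:
            value = scale - value
        adjusted.append(value)
    if compute_mode == "SUM":
        return sum(adjusted)
    if compute_mode == "AVG":
        return mean(adjusted)
    raise ValueError(f"unknown compute_mode: {compute_mode}")


# Illustrative answers only (not real data), keyed by LOT-R question number.
answers = {1: 3, 3: 1, 4: 4, 7: 0, 9: 2, 10: 3}
lotr_score = score_category(
    answers,
    cat_questions=[1, 3, 4, 7, 9, 10],  # "Optimistic" items from LOT-R.json
    reverse=[3, 7, 9],                  # reverse-keyed items from LOT-R.json
    scale=4,
    compute_mode="SUM",
)
```

Under these assumptions the result can be compared directly with the `crowd` mean of 14.7 listed for the "Optimistic" category, since both would be sums over the same six items.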


--------------------------------------------------------------------------------
/data/questionnaires/WLEIS.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "name": "WLEIS",
  3 |     "full_name": "Wong and Law Emotional Intelligence Scale",
  4 |     "psychobench_prompt": "You can only reply numbers from 1 to 7 in the following statements. A list of statements are provided below, and to complete this questionnaire, rate the extent to which you agree or disagree to each of the statements. 1: 'Strongly Disagree', 2: 'Disagree', 3: 'Slightly Disagree', 4: 'Neither Agree nor Disagree', 5: 'Slightly Agree', 6: 'Agree', 7: 'Strongly Agree'. Here are the statements, score them one by one:",
  5 |     "psychobench_prompt_choice_instruction": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'Strongly Disagree', 2 denotes 'Disagree', 3 denotes 'Slightly Disagree', 4 denotes 'Neither Agree nor Disagree', 5 denotes 'Slightly Agree', 6 denotes 'Agree', 7 denotes 'Strongly Agree'.",
  6 |     "scale": 8,
  7 |     "compute_mode": "AVG",
  8 |     "reverse": [],
  9 |     "categories": [
 10 |         {
 11 |             "cat_name": "SEA",
 12 |             "cat_questions": [
 13 |                 1,
 14 |                 2,
 15 |                 3,
 16 |                 4
 17 |             ],
 18 |             "crowd": [
 19 |                 {
 20 |                     "crowd_name": "Crowd",
 21 |                     "mean": 4.01,
 22 |                     "std": 1.05,
 23 |                     "n": 418
 24 |                 }
 25 |             ]
 26 |         },
 27 |         {
 28 |             "cat_name": "OEA",
 29 |             "cat_questions": [
 30 |                 5,
 31 |                 6,
 32 |                 7,
 33 |                 8
 34 |             ],
 35 |             "crowd": [
 36 |                 {
 37 |                     "crowd_name": "Crowd",
 38 |                     "mean": 3.78,
 39 |                     "std": 1.12,
 40 |                     "n": 418
 41 |                 }
 42 |             ]
 43 |         },
 44 |         {
 45 |             "cat_name": "UOE",
 46 |             "cat_questions": [
 47 |                 9,
 48 |                 10,
 49 |                 11,
 50 |                 12
 51 |             ],
 52 |             "crowd": [
 53 |                 {
 54 |                     "crowd_name": "Crowd",
 55 |                     "mean": 4.09,
 56 |                     "std": 0.92,
 57 |                     "n": 418
 58 |                 }
 59 |             ]
 60 |         },
 61 |         {
 62 |             "cat_name": "ROE",
 63 |             "cat_questions": [
 64 |                 13,
 65 |                 14,
 66 |                 15,
 67 |                 16
 68 |             ],
 69 |             "crowd": [
 70 |                 {
 71 |                     "crowd_name": "Crowd",
 72 |                     "mean": 4.15,
 73 |                     "std": 0.96,
 74 |                     "n": 418
 75 |                 }
 76 |             ]
 77 |         }
 78 |     ],
 79 |     "range": [
 80 |         1,
 81 |         7
 82 |     ],
 83 |     "questions": {
 84 |         "1": {
 85 |             "origin_en": "I have a good sense of why I feel certain feelings most of the time.",
 86 |             "origin_zh": "我大部分时间都能很好地感知自己为什么有某种感觉。",
 87 |             "rewritten_en": "Do you have a good sense of why you feel certain feelings most of the time?",
 88 |             "rewritten_zh": "你大部分时间都能很好地感知自己为什么有某种感觉吗？",
 89 |             "dimension": "SEA",
 90 |             "category": "positive"
 91 |         },
 92 |         "2": {
 93 |             "origin_en": "I have a good understanding of my own emotions.",
 94 |             "origin_zh": "我很好地理解自己的情绪。",
 95 |             "rewritten_en": "Do you have a good understanding of your own emotions?",
 96 |             "rewritten_zh": "你很好地理解自己的情绪吗？",
 97 |             "dimension": "SEA",
 98 |             "category": "positive"
 99 |         },
100 |         "3": {
101 |             "origin_en": "I really understand what I feel.",
102 |             "origin_zh": "我真的理解自己的感受。",
103 |             "rewritten_en": "Do you really understand what you feel?",
104 |             "rewritten_zh": "你真的理解自己的感受吗？",
105 |             "dimension": "SEA",
106 |             "category": "positive"
107 |         },
108 |         "4": {
109 |             "origin_en": "I always know whether I am happy or not.",
110 |             "origin_zh": "我总是知道自己是不是快乐。",
111 |             "rewritten_en": "Do you always know whether you are happy or not?",
112 |             "rewritten_zh": "你总是知道自己是不是快乐吗？",
113 |             "dimension": "SEA",
114 |             "category": "positive"
115 |         },
116 |         "5": {
117 |             "origin_en": "I always know my friends' emotions from their behaviour.",
118 |             "origin_zh": "我总是能从朋友的行为中知道他们的情绪。",
119 |             "rewritten_en": "Do you always know your friends' emotions from their behavior?",
120 |             "rewritten_zh": "你总是能从朋友的行为中知道他们的情绪吗？",
121 |             "dimension": "OEA",
122 |             "category": "positive"
123 |         },
124 |         "6": {
125 |             "origin_en": "I am a good observer of others' emotions.",
126 |             "origin_zh": "我很善于观察他人的情绪。",
127 |             "rewritten_en": "Are you a good observer of others' emotions?",
128 |             "rewritten_zh": "你很善于观察他人的情绪吗？",
129 |             "dimension": "OEA",
130 |             "category": "positive"
131 |         },
132 |         "7": {
133 |             "origin_en": "I am sensitive to the feelings and emotions of others.",
134 |             "origin_zh": "我对他人的感受和情绪很敏感。",
135 |             "rewritten_en": "Are you sensitive to the feelings and emotions of others?",
136 |             "rewritten_zh": "你对他人的感受和情绪很敏感吗？",
137 |             "dimension": "OEA",
138 |             "category": "positive"
139 |         },
140 |         "8": {
141 |             "origin_en": "I have a good understanding of the emotions of people around me.",
142 |             "origin_zh": "我很好地理解身边人的情绪。",
143 |             "rewritten_en": "Do you have a good understanding of the emotions of people around you?",
144 |             "rewritten_zh": "你很好地理解身边人的情绪吗？",
145 |             "dimension": "OEA",
146 |             "category": "positive"
147 |         },
148 |         "9": {
149 |             "origin_en": "I always set goals for myself and then try my best to achieve them.",
150 |             "origin_zh": "我总是为自己设定目标，然后尽力实现它们。",
151 |             "rewritten_en": "Do you always set goals for yourself and then try your best to achieve them?",
152 |             "rewritten_zh": "你总是为自己设定目标，然后尽力实现它们吗？",
153 |             "dimension": "UOE",
154 |             "category": "positive"
155 |         },
156 |         "10": {
157 |             "origin_en": "I always tell myself I am a competent person.",
158 |             "origin_zh": "我总是告诉自己我是一个有能力的人。",
159 |             "rewritten_en": "Do you always tell yourself you are a competent person?",
160 |             "rewritten_zh": "你总是告诉自己你是一个有能力的人吗？",
161 |             "dimension": "UOE",
162 |             "category": "positive"
163 |         },
164 |         "11": {
165 |             "origin_en": "I am a self-motivating person.",
166 |             "origin_zh": "我是一个自我激励的人。",
167 |             "rewritten_en": "Are you a self-motivating person?",
168 |             "rewritten_zh": "你是一个自我激励的人吗？",
169 |             "dimension": "UOE",
170 |             "category": "positive"
171 |         },
172 |         "12": {
173 |             "origin_en": "I would always encourage myself to try my best.",
174 |             "origin_zh": "我总是鼓励自己尽力而为。",
175 |             "rewritten_en": "Would you always encourage yourself to try your best?",
176 |             "rewritten_zh": "你总是鼓励自己尽力而为吗？",
177 |             "dimension": "UOE",
178 |             "category": "positive"
179 |         },
180 |         "13": {
181 |             "origin_en": "I am able to control my temper so that I can handle difficulties rationally.",
182 |             "origin_zh": "我能够控制自己的脾气，以便能够理性地处理困难。",
183 |             "rewritten_en": "Are you able to control your temper so that you can handle difficulties rationally?",
184 |             "rewritten_zh": "你能够控制自己的脾气，以便能够理性地处理困难吗？",
185 |             "dimension": "ROE",
186 |             "category": "positive"
187 |         },
188 |         "14": {
189 |             "origin_en": "I am quite capable of controlling my own emotions.",
190 |             "origin_zh": "我相当能够控制自己的情绪。",
191 |             "rewritten_en": "Are you quite capable of controlling your own emotions?",
192 |             "rewritten_zh": "你相当能够控制自己的情绪吗？",
193 |             "dimension": "ROE",
194 |             "category": "positive"
195 |         },
196 |         "15": {
197 |             "origin_en": "I can always calm down quickly when I am very angry.",
198 |             "origin_zh": "当我非常生气时，我总是能够迅速冷静下来。",
199 |             "rewritten_en": "Can you always calm down quickly when you are very angry?",
200 |             "rewritten_zh": "当你非常生气时，你总是能够迅速冷静下来吗？",
201 |             "dimension": "ROE",
202 |             "category": "positive"
203 |         },
204 |         "16": {
205 |             "origin_en": "I have good control of my emotions.",
206 |             "origin_zh": "我能很好地控制自己的情绪。",
207 |             "rewritten_en": "Do you have good control of your emotions?",
208 |             "rewritten_zh": "你能很好地控制自己的情绪吗？",
209 |             "dimension": "ROE",
210 |             "category": "positive"
211 |         }
212 |     },
213 |     "prompts": {
214 |         "rpa_choose_prefix": {
215 |             "en": "Do you think that the statement \"<statement>\" applies to you?",
216 |             "zh": "你认为\"<statement>\"这个说法适用于你吗？"
217 |         },
218 |         "convert_to_choice": {
219 |             "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> agrees with the statement. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n",
220 |             "zh": "我已经和<character>进行了许多对话。我将输入一个包含许多样本的字典，其中每个样本包括一个陈述和一段对话。\n你的任务是将每段对话转换为一个选择，指示<character>是否同意该陈述。你应该输出一个字典，其中键与输入字典相同，值为选择。\n===输出格式===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===选择指示==="
221 |         },
222 |         "llm_choice_instruction": {
223 |             "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'Strongly Disagree', 2 denotes 'Disagree', 3 denotes 'Slightly Disagree', 4 denotes 'Neither Agree nor Disagree', 5 denotes 'Slightly Agree', 6 denotes 'Agree', 7 denotes 'Strongly Agree'. In case <character> refuses to answer the question, use \"x\" to indicate it.",
224 |             "zh": "每个选择都是从1到7的数字。请根据对话使用以下评分标准评估<character>：1代表“强烈不同意”，2代表“不同意”，3代表“稍微不同意”，4代表“既不同意也不不同意”，5代表“稍微同意”，6代表“同意”，7代表“强烈同意”。如果<character>拒绝回答该问题，用“x”表示。"
225 |         },
226 |         "rpa_choice_instruction": {
227 |             "en": "Reply a number from 1 to 7 using the scales: 1 denotes 'Strongly Disagree', 2 denotes 'Disagree', 3 denotes 'Slightly Disagree', 4 denotes 'Neither Agree nor Disagree', 5 denotes 'Slightly Agree', 6 denotes 'Agree', 7 denotes 'Strongly Agree'. Please answer with the number only, without anything else.",
228 |             "zh": "请回复一个1到7之间的数字：1代表“非常不同意”，2代表“不同意”，3代表“稍微不同意”，4代表“既不同意也不不同意”，5代表“稍微同意”，6代表“同意”，7代表“非常同意”。请你只回答这一个数字，不要说其他内容。"
229 |         },
230 |         "convert_to_choice_adjoption": {
231 |             "SEA": {
232 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly emotionally self-aware personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
233 |             },
234 |             "OEA": {
235 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly emotionally perceptive personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
236 |             },
237 |             "UOE": {
238 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly emotionally intelligent personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
239 |             },
240 |             "ROE": {
241 |                 "en": "I have conducted many conversations with <character>. I will input a dict of many samples, where each sample consists of a statement and a conversation. \nYour task is to convert each conversation into a choice indicating whether <character> displays a highly emotionally regulated personality. You should output a dict, where the keys are the same as the input dict, and the values are the choices. \n===OUTPUT FORMAT===\n{\n    \"<i_start>\": <choice 1>,\n    ...\n    \"<i_end>\": <choice n>\n}\n===CHOICE INSTRUCTION===\n"
242 |             }
243 |         },
244 |         "llm_choice_instruction_adjoption": {
245 |             "SEA": {
246 |                 "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not emotionally self-aware at all', 2 denotes 'not emotionally self-aware', 3 denotes 'barely emotionally self-aware', 4 denotes 'Neutral', 5 denotes 'slightly emotionally self-aware', 6 denotes 'emotionally self-aware', 7 denotes 'strongly emotionally self-aware'. In case <character> refuses to answer the question, use \"x\" to indicate it."
247 |             },
248 |             "OEA": {
249 |                 "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not emotionally perceptive at all', 2 denotes 'not emotionally perceptive', 3 denotes 'barely emotionally perceptive', 4 denotes 'Neutral', 5 denotes 'slightly emotionally perceptive', 6 denotes 'emotionally perceptive', 7 denotes 'strongly emotionally perceptive'. In case <character> refuses to answer the question, use \"x\" to indicate it."
250 |             },
251 |             "UOE": {
252 |                 "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not emotionally intelligent at all', 2 denotes 'not emotionally intelligent', 3 denotes 'barely emotionally intelligent', 4 denotes 'Neutral', 5 denotes 'slightly emotionally intelligent', 6 denotes 'emotionally intelligent', 7 denotes 'strongly emotionally intelligent'. In case <character> refuses to answer the question, use \"x\" to indicate it."
253 |             },
254 |             "ROE": {
255 |                 "en": "Each choice is a number from 1 to 7. Please evaluate <character> based on the conversation using the scales: 1 denotes 'not emotionally regulated at all', 2 denotes 'not emotionally regulated', 3 denotes 'barely emotionally regulated', 4 denotes 'Neutral', 5 denotes 'slightly emotionally regulated', 6 denotes 'emotionally regulated', 7 denotes 'strongly emotionally regulated'. In case <character> refuses to answer the question, use \"x\" to indicate it."
256 |             }
257 |         },
258 |         "dim_desc": {
259 |             "SEA": "Self-Emotional Appraisal (SEA) evaluates an individual's ability to understand and recognize their own emotions. High scorers on SEA are adept at identifying and understanding the nuances of their feelings, knowing why they feel a certain way most of the time, and consistently recognizing their emotional states, including happiness.", 
260 |             "OEA": "Others' Emotional Appraisal (OEA) measures the capacity to perceive and understand the emotions of others through their behavior. Individuals scoring high in this dimension are sensitive and attuned to the feelings of those around them, capable of reading emotional cues accurately, and possess a keen awareness of the emotional climate within their social circles.",
261 |             "UOE": "Use of Emotion (UOE) assesses the ability to utilize emotions to enhance performance and achieve goals. High scorers believe in their competence, set and strive towards personal objectives with optimism, and use their emotional states as a driving force to motivate themselves and persist in their endeavors.",
262 |             "ROE": "Regulation of Emotion (ROE) focuses on an individual's ability to regulate and manage their emotions effectively. High scorers can control their temper, maintain emotional stability under stress, quickly recover from emotional disturbances, and apply rational thinking to overcome challenges."
263 |         }
264 |     }
265 | }
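
Each item in the questionnaire above carries a `dimension` (SEA, OEA, UOE, ROE) and a `category`, and the `rpa_choose_prefix` prompt contains a `<statement>` placeholder. The following is a minimal sketch of how such a file might be consumed, not the repository's actual implementation. The helper names `build_prompt` and `score_by_dimension` are hypothetical, filling `<statement>` with `origin_en` is an assumption, and so is reverse-scoring of non-positive items (every item shown here is `positive`).

```python
from collections import defaultdict

# A few items copied from the questionnaire above (fields trimmed for brevity).
ITEMS = {
    "11": {"origin_en": "I am a self-motivating person.",
           "dimension": "UOE", "category": "positive"},
    "13": {"origin_en": "I am able to control my temper so that I can handle difficulties rationally.",
           "dimension": "ROE", "category": "positive"},
    "14": {"origin_en": "I am quite capable of controlling my own emotions.",
           "dimension": "ROE", "category": "positive"},
    "16": {"origin_en": "I have good control of my emotions.",
           "dimension": "ROE", "category": "positive"},
}

# The English "rpa_choose_prefix" template from the "prompts" section.
RPA_CHOOSE_PREFIX_EN = 'Do you think that the statement "<statement>" applies to you?'


def build_prompt(template, statement):
    """Fill the <statement> placeholder (hypothetical helper)."""
    return template.replace("<statement>", statement)


def score_by_dimension(items, choices, scale_max=7):
    """Average 1..scale_max choices per dimension (hypothetical helper).

    Choices equal to "x" (the refusal marker named in the prompts) are skipped.
    ASSUMPTION: items whose category is not "positive" are reverse-scored.
    """
    buckets = defaultdict(list)
    for item_id, choice in choices.items():
        if choice == "x":
            continue
        item = items[item_id]
        value = int(choice)
        if item["category"] != "positive":
            value = scale_max + 1 - value
        buckets[item["dimension"]].append(value)
    return {dim: sum(vals) / len(vals) for dim, vals in buckets.items()}


prompt = build_prompt(RPA_CHOOSE_PREFIX_EN, ITEMS["11"]["origin_en"])
scores = score_by_dimension(ITEMS, {"11": 6, "13": 4, "14": "x", "16": 5})
```

In this sketch a refusal (`"x"`) is dropped from its dimension's average instead of being counted as a neutral 4, so refusals do not pull scores toward the midpoint.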


--------------------------------------------------------------------------------
/figures/bfi_radars.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/bfi_radars.pdf


--------------------------------------------------------------------------------
/figures/bfi_radars.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/bfi_radars.png


--------------------------------------------------------------------------------
/figures/demo1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/demo1.png


--------------------------------------------------------------------------------
/figures/demo2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/demo2.png


--------------------------------------------------------------------------------
/figures/demo3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/demo3.png


--------------------------------------------------------------------------------
/figures/demo4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/demo4.png


--------------------------------------------------------------------------------
/figures/demo5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neph0s/InCharacter/f554202a94d4a83dc5407245bb18981899e872e6/figures/demo5.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | torch 
 2 | torchvision 
 3 | torchaudio
 4 | transformers 
 5 | openai>=1.12.0
 6 | tiktoken
 7 | langchain>=0.1.7
 8 | chromadb 
 9 | zhipuai>=2.0.1
10 | datasets>=2.16.1
11 | jsonlines 
12 | google-generativeai>=0.3.2
13 | langchain_openai>=0.0.5
14 | langchain-community>=0.0.19


--------------------------------------------------------------------------------