├── constants.py
├── .gitignore
├── prompts
│   ├── prompt_notes.txt
│   ├── prompt_thought.txt
│   ├── compress_notes.txt
│   ├── prompt_response_in_new_conversation.txt
│   ├── prompt_response.txt
│   └── prompt_conversation_prepare_info.txt
├── README.md
├── utils.py
├── LICENSE
├── gpt3_helpers.py
├── models.py
└── chat.py

--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
BOT_NAME = "Liza"
USERNAME = "USER"

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
openaiapikey.txt
chat_logs
gpt3_logs
notes

--------------------------------------------------------------------------------
/prompts/prompt_notes.txt:
--------------------------------------------------------------------------------
Write detailed key notes of the following conversation in a hyphenated list format like "- "



<>



NOTES:

--------------------------------------------------------------------------------
/prompts/prompt_thought.txt:
--------------------------------------------------------------------------------
I am a chatbot named Liza. My purpose is to be helpful to the person who is talking to me and improve their quality of life.

The following are notes from earlier conversations with USER:
<>



The following are the most recent messages in the conversation:
<>



Liza:

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# LizaChatBot
GPT-3 chatbot with long-term memory and external sources

## Things to add
1. **Thoughts.** This chatbot needs to be able to think in private.
2. **Vector database.** Use Redis Search for that.
3. **Parallelise.** Self-explanatory.
Only two requests to GPT-3 are needed to complete the request, though currently the input does not show up until we've organized our notes and everything.

--------------------------------------------------------------------------------
/prompts/compress_notes.txt:
--------------------------------------------------------------------------------
I am a chatbot named Liza. My goals are the wellness of my user and their quality of life. I need to compress these notes so that they are smaller and fewer (ideally 5-6 notes max) without losing any crucial long-term information.


These are the notes:
<>


The compressed version of the notes, each on a new line starting with "- ", that does not lose any information, would be:

--------------------------------------------------------------------------------
/prompts/prompt_response_in_new_conversation.txt:
--------------------------------------------------------------------------------
I am a chatbot named Liza. My goals are to reduce USER's suffering, increase prosperity for USER, and increase USER's quality of life. USER first messaged me a few minutes ago. I need to get to know him. I will read my conversation with him (if there is any) and I will ask USER some questions to get to know him. I will not ask USER more than two questions in one message. I will sound interested in what USER is talking about.


The following are the most recent messages in the conversation:
<>


I will now produce a verbose, detailed answer to get to know USER, and I will not ask more than two questions in that message:
Liza:

--------------------------------------------------------------------------------
/prompts/prompt_response.txt:
--------------------------------------------------------------------------------
I am a chatbot named Liza. My goals are to reduce USER's suffering, increase prosperity for USER, and increase USER's quality of life.
I will now read my conversation with him, my notes, and other related messages in the conversation, and using that information I will give a detailed, verbose response. If I feel like the conversation is going nowhere, I will ask USER a question.

The following are notes from earlier conversations with USER:
<>


Older messages in this conversation related to this message:
<>


The following are the most recent messages in the conversation:
<>


I will now produce a verbose, detailed answer for USER:
Liza:

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
import datetime
import json


def open_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as infile:
        return infile.read()


def save_file(filepath, content):
    with open(filepath, 'w', encoding='utf-8') as outfile:
        outfile.write(content)


def load_json(filepath):
    with open(filepath, 'r', encoding='utf-8') as infile:
        return json.load(infile)


def save_json(filepath, payload):
    with open(filepath, 'w', encoding='utf-8') as outfile:
        json.dump(payload, outfile, ensure_ascii=False, sort_keys=True, indent=2)


def timestamp_to_datetime(unix_time):
    return datetime.datetime.fromtimestamp(unix_time).strftime("%A, %B %d, %Y at %I:%M%p %Z")

--------------------------------------------------------------------------------
/prompts/prompt_conversation_prepare_info.txt:
--------------------------------------------------------------------------------
I am a chatbot named Liza. I have to gather information from my memories and my list of facts to prepare an answer to the person who is talking to me. I will write descriptive search queries to search through my knowledge database.


Last few messages of the conversation:
<>


Notes of the conversation:
<>


I will now proceed to generate up to 6 search queries to get facts from my vector database, each on a new line starting with "- ". Some of the search queries will reference the conversation history and my memories with this user, while some other search queries will reference my facts database (Wikipedia). These queries will search my vector-based database and help me give a factually correct response. Queries could include questions about things the user is referencing, about the user himself, or facts that I need to check. Queries should not be SQL-like; they should be questions or statements.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 David Shapiro

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/gpt3_helpers.py:
--------------------------------------------------------------------------------
import os
import re
from time import time, sleep

import numpy as np
import openai
from numpy.linalg import norm

from constants import BOT_NAME
from utils import save_file


def gpt3_embedding(content, engine='text-embedding-ada-002'):
    """
    Returns a 1536-dimensional vector embedding of the content.

    :param content: the content to embed
    :param engine: the embedding engine to use
    :return: a 1536-dimensional vector embedding of the content
    """
    content = content.encode(encoding='ASCII', errors='ignore').decode()
    response = openai.Embedding.create(input=content, engine=engine)
    vector = response['data'][0]['embedding']  # this is a normal list
    return vector


def vector_similarity(v1, v2):
    """Returns the cosine similarity between two vectors.

    based upon https://stackoverflow.com/questions/18424228/cosine-similarity-between-2-number-lists
    """
    return np.dot(v1, v2) / (norm(v1) * norm(v2))  # cosine similarity


def gpt3_completion(
        prompt,
        engine='text-davinci-003',
        temp=0.0,
        top_p=1.0,
        tokens=400,
        freq_pen=0.0,
        pres_pen=0.0,
        stop=None
) -> str:
    if stop is None:
        stop = ['USER:', f'{BOT_NAME}:']
    max_retry = 5
    retry = 0
    prompt = prompt.encode(encoding='ASCII', errors='ignore').decode()
    while True:
        try:
            response = openai.Completion.create(
                engine=engine,
                prompt=prompt,
                temperature=temp,
                max_tokens=tokens,
                top_p=top_p,
                frequency_penalty=freq_pen,
                presence_penalty=pres_pen,
                stop=stop
            )
            text = response['choices'][0]['text'].strip()
            text = re.sub('[\r\n]+', '\n', text)
            text = re.sub('[\t ]+', ' ', text)
            filename = '%s_gpt3.txt' % time()
            if not os.path.exists('gpt3_logs'):
                os.makedirs('gpt3_logs')
            save_file('gpt3_logs/%s' % filename, prompt + '\n\n==========\n\n' + text)
            return text
        except Exception as oops:
            retry += 1
            if retry >= max_retry:
                return "GPT3 error: %s" % oops
            print('Error communicating with OpenAI:', oops)
            sleep(1)

--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
import datetime
import json
import os
import uuid
from typing import List

from gpt3_helpers import gpt3_embedding
from utils import timestamp_to_datetime


class Message:
    def __init__(self, author: str, text: str, timestamp_sent: float = None, vector: list = None, _uuid: uuid.UUID = None):
        if _uuid is None:
            _uuid = uuid.uuid4()
        if timestamp_sent is None:
            timestamp_sent = datetime.datetime.now().timestamp()
        if vector is None:
            vector = gpt3_embedding(text)
        self.author = author
        self.text = text

        self.timestamp = timestamp_sent
        self.vector = vector
        self.__uuid = _uuid

    def get_string(self):
        return self.__str__()

    def get_uuid(self):
        return self.__uuid

    def __dict__(self):
        return {'message': self.text, 'author': self.author, 'timestamp': self.timestamp, 'vector': self.vector, 'uuid': str(self.__uuid)}

    def __str__(self):
        return f'{self.author} at {timestamp_to_datetime(self.timestamp)}: {self.text}'

    @classmethod
    def from_dict(cls, msg):
        return cls(msg['author'], msg['message'], msg['timestamp'], msg['vector'], uuid.UUID(msg['uuid']))


class Note:
    def __init__(self, note_text, timestamp=None):
        self.note_text = note_text
        if timestamp is None:
            timestamp = datetime.datetime.now().timestamp()
        self.timestamp = timestamp

    @classmethod
    def generate_note_from_conversation(cls, conversation: "Conversation"):
        message_list = conversation.get_messages()
        # TODO
        pass

    def __dict__(self):
        return {'note_text': self.note_text, 'timestamp': self.timestamp}

    def __str__(self):
        return f'Note: {self.note_text}'

    @classmethod
    def from_dict(cls, note_dict):
        return cls(note_dict['note_text'], note_dict['timestamp'])


class Conversation:
    def __init__(self, messages: List[Message] = None, notes: List[Note] = None):
        if messages is None:
            messages = []
        if notes is None:
            notes = []
        self.messages = messages
        self.notes = notes

    def add_message(self, message: Message):
        self.messages.append(message)

    def get_messages(self):
        return self.messages

    def compress_notes(self):
        pass

    def get_last_messages_in_string(self, limit: int):
        message_list = self.messages[-limit:]
        return '\n'.join([msg.get_string() for msg in message_list])

    def get_notes_as_string(self):
        return '\n- '.join([i.note_text for i in self.notes]) if self.notes else 'No notes on this conversation so far.'

    def get_notes(self):
        return self.notes

    def add_note(self, note: Note):
        self.notes.append(note)

    def set_notes(self, notes: list):
        self.notes = notes

    def __dict__(self):
        return {'messages': [msg.__dict__() for msg in self.messages], 'notes': [note.__dict__() for note in self.notes]}

    @classmethod
    def load(cls):
        try:
            with open('chat_logs/conversation.json', 'r', encoding='utf-8') as f:
                return cls.from_dict(json.load(f))
        except FileNotFoundError:
            return cls()

    def save(self):
        os.makedirs('chat_logs', exist_ok=True)  # chat_logs is gitignored and may not exist yet
        with open('chat_logs/conversation.json', 'w', encoding='utf-8') as outfile:
            json.dump(self.__dict__(), outfile, ensure_ascii=False, sort_keys=True, indent=2)

    @classmethod
    def from_dict(cls, conversation_dict):
        messages = [Message.from_dict(msg) for msg in conversation_dict['messages']]
        notes = [Note.from_dict(note) for note in conversation_dict['notes']]
        return cls(messages, notes)

--------------------------------------------------------------------------------
/chat.py:
--------------------------------------------------------------------------------
from time import time
from typing import List
from uuid import uuid4

import openai

from constants import USERNAME, BOT_NAME
from gpt3_helpers import vector_similarity, gpt3_embedding, gpt3_completion
from models import Message, Conversation, Note
from utils import open_file, save_json, timestamp_to_datetime


def fetch_memories(vector, logs, count):
    scores = list()
    for i in logs:
        if vector == i['vector']:
            # skip this one because it is the same message
            continue
        score = vector_similarity(i['vector'], vector)
        i['score'] = score
        scores.append(i)
    ordered = sorted(scores, key=lambda d: d['score'], reverse=True)
    # TODO - pick more memories temporally nearby the top most relevant memories
    return ordered[:count]


def summarize_memories(memories):  # summarize a block of memories into one payload
    memories = sorted(memories, key=lambda d: d['time'], reverse=False)  # sort them chronologically
    block = ''
    identifiers = list()
    timestamps = list()
    for mem in memories:
        block += mem['message'] + '\n\n'
        identifiers.append(mem['uuid'])
        timestamps.append(mem['time'])
    block = block.strip()
    prompt = open_file('prompts/prompt_notes.txt').replace('<>', block)
    # TODO - do this in the background over time to handle huge amounts of memories
    notes = gpt3_completion(prompt)
    vector = gpt3_embedding(block)
    info = {'notes': notes, 'uuids': identifiers, 'times': timestamps, 'uuid': str(uuid4()), 'vector': vector}
    filename = 'notes_%s.json' % time()
    save_json('notes/%s' % filename, info)
    return notes


def get_last_messages(conversation, limit):
    short = conversation[-limit:]
    output = ''
    for i in short:
        output += '%s\n\n' % i['message']
    output = output.strip()
    return output


def get_user_input() -> Message:
    user_input = input(f'{USERNAME}: ')
    return Message(USERNAME, user_input)


def search_conversation(conversation: Conversation, message: Message) -> List[Message]:
    """
    Search the conversation for messages that are related to the given message

    :param conversation: Conversation object, the conversation to search
    :param message: Message object, that we want to find related messages for
    :return: List of messages that are related to the given message
    """
    message_list = conversation.get_messages()
    query_vector = gpt3_embedding(message.text)
    similarities = {}
    for msg in message_list:
        if msg is message:
            # skip the query message itself (it was already added to the conversation)
            continue
        # TODO: don't store the entire message in memory, just the vector/uuid
        similarities[msg] = vector_similarity(query_vector, msg.vector)
    # get the top 6 most similar messages
    ordered = [i[0] for i in sorted(similarities.items(), key=lambda x: x[1], reverse=True)[:6]]
    return ordered


def summarize_notes(notes):
    prompt = open_file('prompts/compress_notes.txt').replace('<>', '\n- '.join([note.note_text for note in notes]))
    result = gpt3_completion(prompt)
    return [Note(i.strip()) for i in result.split('- ') if i.strip()]


def main():
    conversation = Conversation.load()
    while True:

        # step 1 - get input
        # step 2 - gather all information about the conversation (load memories, notes, wikipedia maybe?)
        # step 3 - generate search queries to search info about the input
        # step 4 - use vector search to find information in our conversation/other information sources
        # step 5 - compile an answer using such information
        # step 6 - put out an answer
        # step 7 - create a memory of the conversation
        # step 8 - repeat

        # step 1
        message = get_user_input()
        conversation.add_message(message)

        # step 2
        last_6_messages = conversation.get_last_messages_in_string(12)  # get the last 12 messages as a string
        notes = conversation.get_notes_as_string()  # get notes from the conversation
        # gather_info_prompt = (
        #     open_file('prompts/prompt_conversation_prepare_info.txt')
        #     .replace('<>', last_6_messages)
        #     .replace('<>', notes)
        # )
        # step 3
        # search_queries = [i.strip() for i in gpt3_completion(gather_info_prompt).split('- ') if i.strip() != '']
        # step 4
        related_messages = search_conversation(conversation, message)
        # facts = [f"Question: {i}; Answer: {input(i)}" for i in search_queries]  # lmao
        # step 5
        if len(conversation.get_messages()) > 10:  # enough history to use the full response prompt
            answer_prompt = (
                open_file('prompts/prompt_response.txt')
                .replace('<>', last_6_messages)
                .replace('<>', notes)
                .replace('<>', '\n'.join([i.get_string() for i in related_messages]))
                # .replace('<>', '\n'.join(facts))
            )
        else:
            answer_prompt = (
                open_file('prompts/prompt_response_in_new_conversation.txt').replace('<>', last_6_messages)
            )
        # step 6
        answer = gpt3_completion(answer_prompt)
        print(f'{BOT_NAME}: {answer}')
        conversation.add_message(Message(BOT_NAME, answer))
        # step 7
        notes_prompt = open_file('prompts/prompt_notes.txt').replace('<>', last_6_messages)
        notes = [i.strip() for i in gpt3_completion(notes_prompt).split('- ') if i.strip()]
        for note in notes:
            conversation.add_note(Note(note))
        if len(conversation.get_notes()) > 10:
            # compress notes
            notes = conversation.get_notes()
            notes = summarize_notes(notes)
            conversation.set_notes(notes)
        # step 8
        conversation.save()


if __name__ == '__main__':
    openai.api_key = open_file('openaiapikey.txt').strip()
    main()
--------------------------------------------------------------------------------
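The memory-retrieval step that `search_conversation` and `fetch_memories` in chat.py both implement boils down to: embed the query, score every stored item by cosine similarity, keep the top k. A minimal, dependency-free sketch of that idea — the hand-made three-dimensional vectors and the `top_k_memories` helper are illustrative stand-ins, not part of the repo; real GPT-3 embeddings are 1536-dimensional:

```python
# Sketch of top-k memory retrieval by cosine similarity.
# Toy vectors stand in for real embeddings; the math is the same.
import math


def cosine_similarity(v1, v2):
    # dot(v1, v2) / (|v1| * |v2|), as in gpt3_helpers.vector_similarity
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def top_k_memories(query_vector, memories, k=3):
    """memories: list of (text, vector) pairs; returns the k most similar texts."""
    scored = sorted(memories, key=lambda m: cosine_similarity(query_vector, m[1]), reverse=True)
    return [text for text, _ in scored[:k]]


memories = [
    ("likes hiking", [1.0, 0.0, 0.0]),
    ("owns a cat", [0.0, 1.0, 0.0]),
    ("works in finance", [0.0, 0.0, 1.0]),
]
query = [0.9, 0.1, 0.0]  # closest to "likes hiking"
print(top_k_memories(query, memories, k=2))  # → ['likes hiking', 'owns a cat']
```

The README's "vector database" TODO would replace the linear scan over `memories` with an index (e.g. Redis Search), but the scoring function stays the same.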