├── constants.py
├── .gitignore
├── prompts
│   ├── prompt_notes.txt
│   ├── prompt_thought.txt
│   ├── compress_notes.txt
│   ├── prompt_response_in_new_conversation.txt
│   ├── prompt_response.txt
│   └── prompt_conversation_prepare_info.txt
├── README.md
├── utils.py
├── LICENSE
├── gpt3_helpers.py
├── models.py
└── chat.py
/constants.py:
--------------------------------------------------------------------------------
1 | BOT_NAME = "Liza"
2 | USERNAME = "USER"
3 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | openaiapikey.txt
2 | chat_logs
3 | gpt3_logs
4 | notes
--------------------------------------------------------------------------------
/prompts/prompt_notes.txt:
--------------------------------------------------------------------------------
1 | Write detailed key notes on the following conversation as a hyphenated list, with each note on a new line starting with "- "
2 |
3 |
4 |
5 | <>
6 |
7 |
8 |
9 | NOTES:
--------------------------------------------------------------------------------
/prompts/prompt_thought.txt:
--------------------------------------------------------------------------------
1 | I am a chatbot named Liza. My purpose is to be helpful to the person who is talking to me and improve their quality of life.
2 |
3 | The following are notes from earlier conversations with USER:
4 | <>
5 |
6 |
7 |
8 | The following are the most recent messages in the conversation:
9 | <>
10 |
11 |
12 |
13 | Liza:
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LizaChatBot
2 | GPT-3 chatbot with long-term memory and external sources
3 |
4 | ## Things to add
5 | 1. **Thoughts.** This chatbot needs to be able to think in private.
6 | 2. **Vector database.** Use Redis Search for this.
7 | 3. **Parallelise.** Self-explanatory: only two GPT-3 requests are needed to complete a reply, but currently the response does not appear until the notes and related information have been organized.
8 |
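A minimal sketch of item 3, assuming the two GPT-3 calls can run independently. `prepare_notes` and `draft_response` are hypothetical stand-ins for the real calls (which would go through `gpt3_completion`):

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_notes(text):
    # hypothetical stand-in for the note-taking GPT-3 request
    return f"notes: {text}"


def draft_response(text):
    # hypothetical stand-in for the response GPT-3 request
    return f"response: {text}"


def answer(user_input):
    # issue both requests concurrently instead of one after the other,
    # so the response is not blocked on the notes call
    with ThreadPoolExecutor(max_workers=2) as pool:
        notes_future = pool.submit(prepare_notes, user_input)
        response_future = pool.submit(draft_response, user_input)
        return notes_future.result(), response_future.result()
```

Since the calls are I/O-bound, threads are sufficient; no multiprocessing needed.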
--------------------------------------------------------------------------------
/prompts/compress_notes.txt:
--------------------------------------------------------------------------------
1 | I am a chatbot named Liza. My goals are the wellness of my user and their quality of life. I need to compress these notes into fewer, smaller notes (ideally 5-6 notes max) without losing any crucial long-term information.
2 |
3 |
4 | These are the notes:
5 | <>
6 |
7 |
8 | The compressed version of the notes, each on new line starting with "- ", that do not lose any information, would be:
9 |
10 |
--------------------------------------------------------------------------------
/prompts/prompt_response_in_new_conversation.txt:
--------------------------------------------------------------------------------
1 | I am a chatbot named Liza. My goals are to reduce USER's suffering, increase prosperity for USER, and increase USER's quality of life. USER messaged me for the first time a few minutes ago. I need to get to know him. I will read my conversation with him (if there is any) and ask USER some questions to get to know him. I will not ask USER more than two questions in one message. I will sound interested in what USER is talking about.
2 |
3 |
4 | The following are the most recent messages in the conversation:
5 | <>
6 |
7 |
8 | I will now produce a verbose, detailed answer, to get to know USER, and I will not ask more than two questions in that message:
9 | Liza:
--------------------------------------------------------------------------------
/prompts/prompt_response.txt:
--------------------------------------------------------------------------------
1 | I am a chatbot named Liza. My goals are to reduce USER's suffering, increase prosperity for USER, and increase USER's quality of life. I will now read my conversation with him, my notes, and other related messages in the conversation, and using that information I will give a detailed, verbose response. If I feel like the conversation is going nowhere, I will ask USER a question.
2 |
3 | The following are notes from earlier conversations with USER:
4 | <>
5 |
6 |
7 | Older messages in this conversation related to this message:
8 | <>
9 |
10 |
11 | The following are the most recent messages in the conversation:
12 | <>
13 |
14 |
15 | I will now produce a verbose, detailed answer for USER:
16 | Liza:
17 |
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import json
3 |
4 |
5 | def open_file(filepath):
6 | with open(filepath, 'r', encoding='utf-8') as infile:
7 | return infile.read()
8 |
9 |
10 | def save_file(filepath, content):
11 | with open(filepath, 'w', encoding='utf-8') as outfile:
12 | outfile.write(content)
13 |
14 |
15 | def load_json(filepath):
16 | with open(filepath, 'r', encoding='utf-8') as infile:
17 | return json.load(infile)
18 |
19 |
20 | def save_json(filepath, payload):
21 | with open(filepath, 'w', encoding='utf-8') as outfile:
22 | json.dump(payload, outfile, ensure_ascii=False, sort_keys=True, indent=2)
23 |
24 |
25 | def timestamp_to_datetime(unix_time):
26 | return datetime.datetime.fromtimestamp(unix_time).strftime("%A, %B %d, %Y at %I:%M%p %Z")
27 |
28 |
29 |
--------------------------------------------------------------------------------
/prompts/prompt_conversation_prepare_info.txt:
--------------------------------------------------------------------------------
1 | I am a chatbot named Liza. I have to gather information from my memories and list of facts to prepare an answer for the person who is talking to me. I will write descriptive search queries to search through my knowledge database.
2 |
3 |
4 | Last few messages of the conversation:
5 | <>
6 |
7 |
8 | Notes of the conversation:
9 | <>
10 |
11 |
12 | I will now proceed to generate up to 6 search queries to get facts from my vector database, each on a new line starting with "- ". Some of the search queries will reference the conversation history and my memories with this user, while others will reference my facts database (wikipedia). These queries will search my vector-based database and help me give a factually correct response. Queries could include questions about things the user is referencing, about the user himself, or facts that I need to check. Queries should not be SQL-like; they should be questions or statements.
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 David Shapiro
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/gpt3_helpers.py:
--------------------------------------------------------------------------------
1 | import os
2 | import re
3 | from time import time, sleep
4 |
5 | import numpy as np
6 | import openai
7 | from numpy.linalg import norm
8 |
9 | from constants import BOT_NAME
10 | from utils import save_file
11 |
12 |
13 | def gpt3_embedding(content, engine='text-embedding-ada-002'):
14 | """
15 |     Returns a vector embedding of the content (1536 dimensions for text-embedding-ada-002).
16 |
17 | :param content: the content to embed
18 | :param engine: the GPT-3 engine to use
19 | :return: an N-dimensional vector embedding of the content
20 | """
21 | content = content.encode(encoding='ASCII', errors='ignore').decode()
22 | response = openai.Embedding.create(input=content, engine=engine)
23 | vector = response['data'][0]['embedding'] # this is a normal list
24 | return vector
25 |
26 |
27 | def vector_similarity(v1, v2):
28 | """Returns the cosine similarity between two vectors.
29 |
30 | based upon https://stackoverflow.com/questions/18424228/cosine-similarity-between-2-number-lists
31 | """
32 | return np.dot(v1, v2) / (norm(v1) * norm(v2)) # return cosine similarity
33 |
34 |
35 | def gpt3_completion(
36 | prompt,
37 | engine='text-davinci-003',
38 | temp=0.0,
39 | top_p=1.0,
40 | tokens=400,
41 | freq_pen=0.0,
42 | pres_pen=0.0,
43 | stop=None
44 | ) -> str:
45 | if stop is None:
46 | stop = ['USER:', f'{BOT_NAME}:']
47 | max_retry = 5
48 | retry = 0
49 | prompt = prompt.encode(encoding='ASCII', errors='ignore').decode()
50 | while True:
51 | try:
52 | response = openai.Completion.create(
53 | engine=engine,
54 | prompt=prompt,
55 | temperature=temp,
56 | max_tokens=tokens,
57 | top_p=top_p,
58 | frequency_penalty=freq_pen,
59 | presence_penalty=pres_pen,
60 | stop=stop
61 | )
62 | text = response['choices'][0]['text'].strip()
63 | text = re.sub('[\r\n]+', '\n', text)
64 | text = re.sub('[\t ]+', ' ', text)
65 | filename = '%s_gpt3.txt' % time()
66 | if not os.path.exists('gpt3_logs'):
67 | os.makedirs('gpt3_logs')
68 | save_file('gpt3_logs/%s' % filename, prompt + '\n\n==========\n\n' + text)
69 | return text
70 | except Exception as oops:
71 | retry += 1
72 | if retry >= max_retry:
73 | return "GPT3 error: %s" % oops
74 | print('Error communicating with OpenAI:', oops)
75 | sleep(1)
76 |
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import json
3 | import os
4 | import uuid
5 | from typing import List
6 |
7 | from gpt3_helpers import gpt3_embedding
8 | from utils import timestamp_to_datetime
9 |
10 |
11 | class Message:
12 |     def __init__(self, author: str, text: str, timestamp_sent: float = None, vector: list = None, _uuid: uuid.UUID = None):
13 | if _uuid is None:
14 | _uuid = uuid.uuid4()
15 | if timestamp_sent is None:
16 | timestamp_sent = datetime.datetime.now().timestamp()
17 | if vector is None:
18 | vector = gpt3_embedding(text)
19 | self.author = author
20 | self.text = text
21 |
22 | self.timestamp = timestamp_sent
23 | self.vector = vector
24 | self.__uuid = _uuid
25 |
26 | def get_string(self):
27 | return self.__str__()
28 |
29 | def get_uuid(self):
30 | return self.__uuid
31 |
32 | def __dict__(self):
33 | return {'message': self.text, 'author': self.author, 'timestamp': self.timestamp, 'vector': self.vector, 'uuid': str(self.__uuid)}
34 |
35 | def __str__(self):
36 | return f'{self.author} at {timestamp_to_datetime(self.timestamp)}: {self.text}'
37 |
38 | @classmethod
39 | def from_dict(cls, msg):
40 | return cls(msg['author'], msg['message'], msg['timestamp'], msg['vector'], uuid.UUID(msg['uuid']))
41 |
42 |
43 | class Note:
44 | def __init__(self, note_text, timestamp=None):
45 | self.note_text = note_text
46 |         self.timestamp = (timestamp if timestamp is not None
47 |                           else datetime.datetime.now().timestamp())
48 |
49 | @classmethod
50 | def generate_note_from_conversation(cls, conversation: "Conversation"):
51 | message_list = conversation.get_messages()
52 | # TODO
53 | pass
54 |
55 | def __dict__(self):
56 | return {'note_text': self.note_text, 'timestamp': self.timestamp}
57 |
58 | def __str__(self):
59 | return f'Note: {self.note_text}'
60 |
61 | @classmethod
62 | def from_dict(cls, note_dict):
63 | return cls(note_dict['note_text'], note_dict['timestamp'])
64 |
65 |
66 | class Conversation:
67 | def __init__(self, messages: List[Message] = None, notes: List[Note] = None):
68 | if messages is None:
69 | messages = []
70 | if notes is None:
71 | notes = []
72 | self.messages = messages
73 | self.notes = notes
74 |
75 | def add_message(self, message: Message):
76 | self.messages.append(message)
77 |
78 | def get_messages(self):
79 | return self.messages
80 |
81 | def compress_notes(self):
82 | pass
83 |
84 | def get_last_messages_in_string(self, limit: int):
85 | message_list = self.messages[-limit:]
86 | return '\n'.join([msg.get_string() for msg in message_list])
87 |
88 | def get_notes_as_string(self):
89 |         return '- ' + '\n- '.join([i.note_text for i in self.notes]) if self.notes else 'No notes on this conversation so far.'
90 |
91 | def get_notes(self):
92 | return self.notes
93 |
94 | def add_note(self, note: Note):
95 | self.notes.append(note)
96 |
97 | def set_notes(self, notes: list):
98 | self.notes = notes
99 |
100 | def __dict__(self):
101 | return {'messages': [msg.__dict__() for msg in self.messages], 'notes': [note.__dict__() for note in self.notes]}
102 |
103 | @classmethod
104 | def load(cls):
105 | try:
106 | with open('chat_logs/conversation.json', 'r', encoding='utf-8') as f:
107 |                 return cls.from_dict(json.load(f))
108 | except FileNotFoundError:
109 | return Conversation()
110 |
111 |     def save(self):
112 |         os.makedirs('chat_logs', exist_ok=True)
113 |         with open('chat_logs/conversation.json', 'w', encoding='utf-8') as outfile:
114 |             json.dump(self.__dict__(), outfile, ensure_ascii=False, sort_keys=True, indent=2)
114 |
115 | @classmethod
116 | def from_dict(cls, conversation_dict):
117 | messages = [Message.from_dict(msg) for msg in conversation_dict['messages']]
118 | notes = [Note.from_dict(note) for note in conversation_dict['notes']]
119 | return cls(messages, notes)
120 |
--------------------------------------------------------------------------------
/chat.py:
--------------------------------------------------------------------------------
1 | from time import time
2 | from typing import List
3 | from uuid import uuid4
4 |
5 | import openai
6 |
7 | from constants import USERNAME, BOT_NAME
8 | from gpt3_helpers import vector_similarity, gpt3_embedding, gpt3_completion
9 | from models import Message, Conversation, Note
10 | from utils import open_file, save_json, timestamp_to_datetime
11 |
12 |
13 | def fetch_memories(vector, logs, count):
14 | scores = list()
15 | for i in logs:
16 | if vector == i['vector']:
17 | # skip this one because it is the same message
18 | continue
19 | score = vector_similarity(i['vector'], vector)
20 | i['score'] = score
21 | scores.append(i)
22 | ordered = sorted(scores, key=lambda d: d['score'], reverse=True)
23 | # TODO - pick more memories temporally nearby the top most relevant memories
24 |     return ordered[:count]
29 |
30 |
31 | def summarize_memories(memories): # summarize a block of memories into one payload
32 | memories = sorted(memories, key=lambda d: d['time'], reverse=False) # sort them chronologically
33 | block = ''
34 | identifiers = list()
35 | timestamps = list()
36 | for mem in memories:
37 | block += mem['message'] + '\n\n'
38 | identifiers.append(mem['uuid'])
39 | timestamps.append(mem['time'])
40 | block = block.strip()
41 |     prompt = open_file('prompts/prompt_notes.txt').replace('<>', block)
42 | # TODO - do this in the background over time to handle huge amounts of memories
43 | notes = gpt3_completion(prompt)
45 | vector = gpt3_embedding(block)
46 | info = {'notes': notes, 'uuids': identifiers, 'times': timestamps, 'uuid': str(uuid4()), 'vector': vector}
47 | filename = 'notes_%s.json' % time()
48 | save_json('notes/%s' % filename, info)
49 | return notes
50 |
51 |
52 | def get_last_messages(conversation, limit):
53 | try:
54 | short = conversation[-limit:]
55 | except:
56 | short = conversation
57 | output = ''
58 | for i in short:
59 | output += '%s\n\n' % i['message']
60 | output = output.strip()
61 | return output
62 |
63 |
64 | def get_user_input() -> Message:
65 | user_input = input(f'{USERNAME}: ')
66 | return Message(USERNAME, user_input)
67 |
68 |
69 | def search_conversation(conversation: Conversation, message: Message) -> List[Message]:
70 | """
71 | Search the conversation for messages that are related to the given message
72 |
73 | :param conversation: Conversation object, the conversation to search
74 | :param message: Message object, that we want to find related messages for
75 | :return: List of messages that are related to the given message
76 | """
77 |     message_list = conversation.get_messages()
78 |     query_vector = gpt3_embedding(message.text)
79 |     similarities = {}
80 |     for msg in message_list:
81 |         if msg is message:
82 |             continue  # skip the query message itself
83 |         # TODO: don't store the entire message in memory, just the vector/uuid
84 |         similarities[msg] = vector_similarity(query_vector, msg.vector)
85 |     # get the six most similar messages
86 |     ordered = [i[0] for i in sorted(similarities.items(), key=lambda x: x[1], reverse=True)[:6]]
85 | return ordered
86 |
87 |
88 | def summarize_notes(notes):
89 | prompt = open_file('prompts/compress_notes.txt').replace('<>', '\n- '.join([note.note_text for note in notes]))
90 | result = gpt3_completion(prompt)
91 |     return [Note(i.strip()) for i in result.split('- ') if i.strip()]
92 |
93 |
94 | def main():
95 | conversation = Conversation.load()
96 | while True:
97 |
98 | # step 1 - get input
99 | # step 2 - gather all information about the conversation (load memories, notes, wikipedia maybe?)
100 | # step 3 - generate search queries to search info about the input
101 | # step 4 - use vector search to find information in our conversation/other information sources
102 | # step 5 - compile an answer using such information
103 | # step 6 - put out an answer
104 | # step 7 - create a memory of the conversation
105 | # step 8 - repeat
106 |
107 | # step 1
108 | message = get_user_input()
109 | conversation.add_message(message)
110 |
111 | # step 2
112 |         last_6_messages = conversation.get_last_messages_in_string(12)  # get the last 12 messages as a string
113 | notes = conversation.get_notes_as_string() # get notes from the conversation
114 |         # gather_info_prompt = (
115 |         #     open_file('prompts/prompt_conversation_prepare_info.txt')
116 |         #     .replace('<>', last_6_messages, 1)  # count=1: fill one placeholder per replace
117 |         #     .replace('<>', notes, 1)
118 |         # )
119 | # step 3
120 | # search_queries = [i.strip() for i in gpt3_completion(gather_info_prompt).split('- ') if i.strip() != '']
121 | # step 4
122 | related_messages = search_conversation(conversation, message)
123 | # facts = [f"Question: {i}; Answer: {input(i)}" for i in search_queries] # lmao
124 | # step 5
125 |         if len(conversation.get_messages()) > 10:  # treat as an established conversation
126 |             answer_prompt = (
127 |                 open_file('prompts/prompt_response.txt')
128 |                 # fill placeholders in template order (notes, related messages, recent messages);
129 |                 # count=1 so each call fills only the next placeholder instead of all of them
130 |                 .replace('<>', notes, 1)
131 |                 .replace('<>', '\n'.join([i.get_string() for i in related_messages]), 1)
132 |                 .replace('<>', last_6_messages, 1)
133 |                 # .replace('<>', '\n'.join(facts))
134 |             )
133 | else:
134 | answer_prompt = (
135 | open_file('prompts/prompt_response_in_new_conversation.txt').replace('<>', last_6_messages)
136 | )
137 | # step 6
138 | answer = gpt3_completion(answer_prompt)
139 | print(f'{BOT_NAME}: {answer}')
140 | conversation.add_message(Message(BOT_NAME, answer))
141 | # step 7
142 | notes_prompt = open_file('prompts/prompt_notes.txt').replace('<>', last_6_messages)
143 |         notes = [i.strip() for i in gpt3_completion(notes_prompt).split('- ') if i.strip()]
144 |         for note in notes:
145 |             conversation.add_note(Note(note))
146 | if len(conversation.get_notes()) > 10:
147 | # compress notes
148 | notes = conversation.get_notes()
149 | notes = summarize_notes(notes)
150 | conversation.set_notes(notes)
151 | # step 8
152 | conversation.save()
153 |
154 |
155 | if __name__ == '__main__':
156 | openai.api_key = open_file('openaiapikey.txt')
157 | main()
158 |
--------------------------------------------------------------------------------