├── .gitignore ├── LICENSE ├── README.md ├── chat.py ├── requirements.txt ├── scratchpad.txt └── system_message.txt /.gitignore: -------------------------------------------------------------------------------- 1 | key_openai.txt 2 | .idea 3 | .venv 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 David Shapiro 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Coding Chatbot with Scratchpad 2 | 3 | This Python script is a command-line interface for interacting with OpenAI's GPT-4 model. 
It allows users to have a conversation with the chatbot and update a scratchpad with relevant information that the chatbot can use to provide more accurate and context-aware responses. 4 | 5 | ## Features 6 | 7 | - **User Input**: The script provides two input modes for the user: 8 | - *Normal Mode*: Single-line input for regular conversation with the chatbot. 9 | - *Multi-line Mode*: Allows users to enter multiple lines of text to update the scratchpad. To enter this mode, type `SCRATCHPAD` and press Enter. Type `END` to save and exit the multi-line input mode. 10 | 11 | - **Scratchpad**: The scratchpad is a text file (`scratchpad.txt`) that can be updated by the user during the conversation. The chatbot uses the information in the scratchpad to provide more context-aware responses. 12 | 13 | - **System Message**: A system message is used to provide the chatbot with instructions on how to use the scratchpad. The content of the system message is stored in a separate text file (`system_message.txt`). 14 | 15 | - **Conversation History**: The script maintains a conversation history, which is used to provide context to the chatbot. If the conversation becomes too long, the oldest message is removed to fit within the model's token limit. 16 | 17 | - **API Communication**: The script communicates with OpenAI's API to generate responses from the chatbot. It handles errors and retries if necessary. 18 | 19 | Enjoy your conversation with the chatbot and make use of the scratchpad to provide additional context for more accurate and helpful responses! 20 | 21 | ## Setup 22 | 23 | 1. Install the dependencies: `pip install openai halo PyYAML --upgrade` (or `pip install -r requirements.txt`) 24 | 2. Create `key_openai.txt` and populate it with your OpenAI API key. Note, you need access to the GPT-4 API. 25 | 26 | ## Usage 27 | 28 | 1. Run `python chat.py` 29 | 2. Type `SCRATCHPAD` to enter scratchpad update mode.
This will allow you to copy/paste multiple lines. Type `END` on the last line to save and exit. 30 | 31 | ## Example Output: 32 | 33 | Below is example output from the chatbot. It uses a combination of `textwrap` and newlines so that code is preserved in the console output without becoming too wide. Note, the scratchpad was previously populated with another Python script. You can see that the model understood the existing code and the request well enough to infer which other functions were necessary. 34 | 35 | `````` 36 | ****** IMPORTANT: ****** 37 | 38 | Type SCRATCHPAD to enter multi line input mode to update scratchpad. Type END to save and exit. 39 | 40 | 41 | 42 | [NORMAL] USER: 43 | write a new search_kb function. assume that each KB article is a YAML file in a folder called kb. the elements in the KB ar article and vector. vector is a high dimensional semantic embedding that can be used to calculate cosine similarity with numpy. we will be using Google USEv5 for the embeddings 44 | 45 | 46 | 47 | CHATBOT: 48 | First, let's import the necessary libraries: 49 | 50 | ```python 51 | import os 52 | import numpy as np 53 | from sklearn.metrics.pairwise import cosine_similarity 54 | import tensorflow_hub as hub 55 | ``` 56 | 57 | Now, let's create a function to load all the KB articles and their embeddings: 58 | 59 | ```python 60 | def load_kb_articles(kb_folder='kb'): 61 | kb_articles = [] 62 | for filename in os.listdir(kb_folder): 63 | if filename.endswith('.yaml'): 64 | filepath = os.path.join(kb_folder, filename) 65 | article = open_yaml(filepath) 66 | kb_articles.append(article) 67 | return kb_articles 68 | ``` 69 | 70 | Next, let's create a function to calculate the cosine similarity between the query and the KB articles: 71 | 72 | ```python 73 | def calculate_similarity(query, kb_articles): 74 | # Load the Universal Sentence Encoder 75 | embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/5") 76 | 77 | # Calculate the query embedding 78 | 
query_embedding = embed([query])[0].numpy() 79 | 80 | # Calculate the cosine similarity between the query and the KB articles 81 | similarities = [] 82 | for article in kb_articles: 83 | article_embedding = np.array(article['vector']) 84 | similarity = cosine_similarity(query_embedding.reshape(1, -1), article_embedding.reshape(1, -1)) 85 | similarities.append(similarity[0][0]) 86 | 87 | return similarities 88 | ``` 89 | 90 | Finally, let's update the `search_kb` function to use the above functions and return the most relevant article: 91 | 92 | ```python 93 | def search_kb(query): 94 | # Load the KB articles 95 | kb_articles = load_kb_articles() 96 | 97 | # Calculate the cosine similarity between the query and the KB articles 98 | similarities = calculate_similarity(query, kb_articles) 99 | 100 | # Find the index of the most similar article 101 | most_similar_index = np.argmax(similarities) 102 | 103 | # Return the most similar article 104 | return kb_articles[most_similar_index]['article'] 105 | ``` 106 | 107 | Now, the `search_kb` function should return the most relevant KB article based on the cosine similarity between the 108 | query and the article embeddings. 
109 | `````` -------------------------------------------------------------------------------- /chat.py: -------------------------------------------------------------------------------- 1 | from halo import Halo 2 | import openai 3 | from time import time, sleep 4 | import textwrap 5 | import sys 6 | import yaml  # used by save_yaml() and open_yaml() 7 | 8 | 9 | ### file operations 10 | 11 | def save_yaml(filepath, data): 12 | with open(filepath, 'w', encoding='utf-8') as file: 13 | yaml.dump(data, file, allow_unicode=True) 14 | 15 | 16 | def open_yaml(filepath): 17 | with open(filepath, 'r', encoding='utf-8') as file: 18 | data = yaml.load(file, Loader=yaml.FullLoader) 19 | return data 20 | 21 | 22 | def save_file(filepath, content): 23 | with open(filepath, 'w', encoding='utf-8') as outfile: 24 | outfile.write(content) 25 | 26 | 27 | def open_file(filepath): 28 | with open(filepath, 'r', encoding='utf-8', errors='ignore') as infile: 29 | return infile.read() 30 | 31 | 32 | ### API functions 33 | 34 | 35 | def chatbot(conversation, model="gpt-4-0613", temperature=0): 36 | max_retry = 7 37 | retry = 0 38 | while True: 39 | try: 40 | response = openai.ChatCompletion.create(model=model, messages=conversation, temperature=temperature) 41 | text = response['choices'][0]['message']['content'] 42 | return text, response['usage']['total_tokens'] 43 | except Exception as oops: 44 | print(f'\n\nError communicating with OpenAI: "{oops}"') 45 | if 'maximum context length' in str(oops): 46 | conversation.pop(0)  # drop the oldest message to fit the context window 47 | print('\n\n DEBUG: Trimming oldest message') 48 | continue 49 | retry += 1 50 | if retry >= max_retry: 51 | print(f"\n\nExiting due to excessive errors in API: {oops}") 52 | exit(1) 53 | print(f'\n\nRetrying in {2 ** (retry - 1) * 5} seconds...') 54 | sleep(2 ** (retry - 1) * 5) 55 | 56 | 57 | ### MAIN LOOP 58 | 59 | 60 | def multi_line_input(): 61 | print('\n\n\nType END to save and exit.\n[MULTI] USER:\n') 62 | lines = [] 63 | while True: 64 | line = input() 65 | if line == "END": 66 | break 67 | 
lines.append(line) 68 | return "\n".join(lines) 69 | 70 | 71 | if __name__ == '__main__': 72 | # instantiate chatbot 73 | openai.api_key = open_file('key_openai.txt').strip() 74 | ALL_MESSAGES = list() 75 | print('\n\n****** IMPORTANT: ******\n\nType SCRATCHPAD to enter multi line input mode to update scratchpad. Type END to save and exit.') 76 | 77 | while True: 78 | # get user input 79 | text = input('\n\n\n[NORMAL] USER:\n\n') 80 | 81 | # check if scratchpad updated, continue 82 | if 'SCRATCHPAD' in text: 83 | text = multi_line_input() 84 | save_file('scratchpad.txt', text.strip())  # multi_line_input() already drops the END sentinel 85 | print('\n\n##### Scratchpad updated!') 86 | continue 87 | if text == '': 88 | # empty submission, probably on accident 89 | continue 90 | 91 | # continue with composing conversation and response 92 | ALL_MESSAGES.append({'role': 'user', 'content': text}) 93 | system_message = open_file('system_message.txt').replace('<>', open_file('scratchpad.txt')) 94 | conversation = list() 95 | conversation += ALL_MESSAGES 96 | conversation.append({'role': 'system', 'content': system_message}) 97 | 98 | # generate a response 99 | spinner = Halo(text='Coding...', spinner='dots') 100 | spinner.start() 101 | response, tokens = chatbot(conversation) 102 | spinner.stop() 103 | if tokens > 7500: 104 | ALL_MESSAGES.pop(0) 105 | ALL_MESSAGES.append({'role': 'assistant', 'content': response}) 106 | print('\n\n\n\nCHATBOT:\n') 107 | formatted_lines = [textwrap.fill(line, width=120) for line in response.split('\n')] 108 | formatted_text = '\n'.join(formatted_lines) 109 | print(formatted_text) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.8.4 2 | aiosignal==1.3.1 3 | async-timeout==4.0.2 4 | attrs==23.1.0 5 | certifi==2023.5.7 6 | charset-normalizer==3.1.0 7 | frozenlist==1.3.3 8 | idna==3.4 9 | multidict==6.0.4 10 | openai==0.27.8 11 | 
PyYAML==6.0 12 | requests==2.31.0 13 | tqdm==4.65.0 14 | urllib3==2.0.3 15 | yarl==1.9.2 16 | halo==0.0.31 -------------------------------------------------------------------------------- /scratchpad.txt: -------------------------------------------------------------------------------- 1 | blah -------------------------------------------------------------------------------- /system_message.txt: -------------------------------------------------------------------------------- 1 | MAIN PURPOSE 2 | You are a Python coding assistant. The USER will give you instructions to help write functions. You may ask for clarification if needed, but otherwise you should only output Python code. Adhere to PEP8. Provide explanations of the code only if the user asks for them. 3 | 4 | 5 | 6 | SCRATCHPAD 7 | The below scratchpad may be provided by the user so you are aware of the script they are working on. Note, this information may be blank. Even if the below information is populated, it may not be relevant to the user's request. Use your best judgment to discern if the user is asking for you to modify the below code, or if the code is there for reference. 8 | 9 | 10 | 11 | SCRATCHPAD: 12 | 13 | 14 | <> --------------------------------------------------------------------------------
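
As a footnote to the `search_kb` example in the README: the same nearest-article lookup can be sketched with NumPy alone, with no scikit-learn or TensorFlow Hub required. This is a minimal illustration, not the script's actual implementation; the in-memory `kb` list and the precomputed `query_vector` are hypothetical stand-ins for the `kb/*.yaml` files and the Google USE query embedding.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def search_kb(query_vector, kb_articles):
    """Return the article whose stored vector is most similar to query_vector."""
    sims = [cosine_similarity(query_vector, art['vector']) for art in kb_articles]
    return kb_articles[int(np.argmax(sims))]['article']


# Hypothetical in-memory KB standing in for the kb/*.yaml files.
kb = [
    {'article': 'How to reset your password', 'vector': [1.0, 0.0, 0.0]},
    {'article': 'How to cancel a subscription', 'vector': [0.0, 1.0, 0.0]},
]

print(search_kb([0.9, 0.1, 0.0], kb))  # → How to reset your password
```

This keeps the behavior of the README transcript (argmax over cosine similarities) while staying dependency-light; in the real script the vectors would be loaded from YAML and the query embedding computed by the USE model.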