├── .gitignore ├── .local.env ├── LICENSE ├── README.md ├── cody.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | .venv 2 | .env 3 | -------------------------------------------------------------------------------- /.local.env: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY=YOUR_API_KEY_HERE -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Drew H. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # 🤖 Cody - Your AI Coding Assistant 3 | [![Star History Chart](https://api.star-history.com/svg?repos=ajhous44/cody&type=Date)](https://star-history.com/#ajhous44/cody&Date) 4 | 5 | Welcome to Cody! An AI assistant designed to let you interactively query your codebase using natural language. By using vector embeddings, chunking, and OpenAI's language models, Cody helps you navigate your code efficiently and intuitively. 💻 6 | 7 | ![image](https://github.com/ajhous44/cody/assets/42582780/f2a62a20-663c-4ec1-b000-67257331fb12) 8 | ## Demo 9 | https://www.loom.com/share/eba1d0dcee20430fbd412580d1c0ea0e?sid=4998cf6f-45b4-480d-b742-6f22f3a49dc3 10 | 11 | 12 | Cody updates its knowledge base every time you save a file, so you always have the most up-to-date information. You can customize your setup by listing directories and files to ignore in `IGNORE_THESE`. 13 | 14 | ## 🚀 Getting Started 15 | 16 | 1. Clone the repo 17 | 2. (Optionally) create and activate a virtual environment with `python -m venv .venv`, then install dependencies with `pip install -r requirements.txt` from the repo root 18 | 3. Rename the `.local.env` file to `.env` and replace `YOUR_API_KEY_HERE` with your OpenAI API key. 19 | 4. Modify the `IGNORE_THESE` global variable at the top of the script to specify directories and files to exclude from monitoring. (You should exclude any large directories or files, such as a virtual environment, caches, or downloaded JS libraries.) 20 | 5. Run the script with `python cody.py` and follow the terminal prompts. It will ask whether you want text chat (terminal) or conversational mode (speech I/O), and it will warn you if you remove `.env` from the ignore list.
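Step 4's ignore patterns are matched with Python's `fnmatch` module against both the full path and each individual path segment, mirroring the `should_ignore` helper in `cody.py`. A quick, self-contained sketch of how that matching behaves (the patterns here are illustrative examples, not the shipped defaults):

```python
import fnmatch
import os

# Mirrors should_ignore in cody.py: a path is skipped when the whole path,
# or any single segment of it, matches one of the ignore patterns.
def should_ignore(path, ignore_list):
    for pattern in ignore_list:
        if fnmatch.fnmatch(path, pattern) or any(
            fnmatch.fnmatch(part, pattern) for part in path.split(os.sep)
        ):
            return True
    return False

IGNORE_THESE = ['.venv', '*.env', 'node_modules']

print(should_ignore(os.path.join('.', '.venv', 'lib', 'site.py'), IGNORE_THESE))  # True: '.venv' segment
print(should_ignore(os.path.join('.', 'src', 'app.py'), IGNORE_THESE))            # False: nothing matches
print(should_ignore('.env', IGNORE_THESE))                                        # True: matches '*.env'
```

Because a bare name like `.venv` matches any path segment, adding a directory name alone is enough to skip everything under it.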
21 | 22 | ## 🎯 Features 23 | 24 | - **File Monitoring**: Real-time monitoring of all files in your project's directory and subdirectories. 👀 25 | - **Embedding-based Knowledge Base**: Cody collects the contents of all text and JSON files, embeds them with OpenAI Embeddings, and stores them in a searchable knowledge base. 📚 26 | - **Interactive Q&A**: Ask questions by text or voice, and Cody generates a response grounded in the knowledge base. 🧠 27 | - **Customizable**: Easily specify files or directories to ignore during monitoring. 28 | 29 | ## 🛠 Dependencies 30 | 31 | - `python-dotenv`: Load variables from a `.env` file into the environment. 32 | - `langchain-community`: Provides the FAISS vector store integration. Split out from `langchain`. 33 | - `langchain-openai`: Provides `OpenAIEmbeddings`, integrating OpenAI models with langchain's architecture. 34 | - `langchain`: Core library providing the `CharacterTextSplitter` used for chunking. 35 | - `watchdog`: Monitor filesystem events in real time. 36 | - `openai`: Generate responses using OpenAI's language models. 37 | - `SpeechRecognition`: Convert speech to text for voice interaction. 38 | - `gTTS`: Google Text-to-Speech library for generating audio from text. 39 | - `pygame`: Library for playing audio files. 40 | 41 | ## 💡 Usage 42 | 43 | - To stop the script, type 'exit' (or say "exit" in speech mode) and press Enter. Cody will gracefully terminate the program. 44 | 45 | ### Configuring the Ignore List 46 | 47 | Cody allows you to specify which files and directories should be ignored during file monitoring. This is particularly useful for excluding files that change frequently, are not relevant to your queries, or could contain sensitive information. 48 | 49 | To customize your ignore list, add patterns matching the files or directories you wish to exclude. Cody supports simple wildcard patterns for flexibility. 
Here are some examples to guide you: 50 | 51 | #### Examples 52 | 53 | - **Ignoring Specific Files**: If you want to ignore all `.env` files, you can add `*.env` to the ignore list. 54 | ```python 55 | IGNORE_THESE = ['*.env'] 56 | ``` 57 | 58 | - **Ignoring Directories**: To ignore an entire directory, such as `node_modules` or a virtual environment directory like `.venv`, simply add the directory name. 59 | ```python 60 | IGNORE_THESE = ['node_modules', '.venv'] 61 | ``` 62 | 63 | - **Ignoring File Extensions**: To ignore all files with a specific extension, such as `.log` or `.tmp`, use the wildcard pattern `*`. 64 | ```python 65 | IGNORE_THESE = ['*.log', '*.tmp'] 66 | ``` 67 | 68 | - **Complex Patterns**: You can combine directory names and wildcards to ignore specific types of files within certain directories. For example, to ignore all `.md` files in the `docs` directory: 69 | ```python 70 | IGNORE_THESE = ['docs/*.md'] 71 | ``` 72 | 73 | #### Tips for Configuring Your Ignore List 74 | 75 | - **Review Regularly**: As your project evolves, so too may the files and directories you need to ignore. Regularly reviewing and updating your `ignore_list` can help ensure Cody's performance remains optimal. 76 | 77 | - **Use Wildcards Wisely**: While wildcards offer powerful flexibility, they can also lead to unintentionally ignoring important files. Be specific in your patterns to avoid such issues. 78 | 79 | - **Test Changes**: After updating your `ignore_list`, perform a few tests to ensure that the changes behave as expected, especially if using complex patterns. 80 | 81 | By carefully configuring your `ignore_list`, you can tailor Cody to better suit your project's needs, enhancing both its efficiency and relevance to your coding tasks. 82 | 83 | 84 | ## ⚠️ Notes & Tips 85 | 86 | - Cody uses the FAISS library for efficient similarity search in storing vectors. Please ensure you have sufficient memory available, especially when monitoring a large number of files. 
87 | - Additionally, be sure to monitor your OpenAI API usage. A helpful tip is to set a monthly spend limit in your OpenAI account to avoid surprises; as an extra safeguard, Cody prints the number of tokens used in each call. 88 | - "Live" coding questions: to use Cody to its full potential, open a separate terminal, `cd` into your project directory, and launch `python cody.py`. Then place it split-screen with your code in a small window on the far left or right. This way you can use a separate terminal for actually running your code without worrying about Cody or having to relaunch it each time. Cody still updates on every file save, so it always works from the latest data. 89 | 90 | ## Contributing 91 | 92 | Contributions are welcome. Please submit a pull request or open an issue for any bugs or feature requests. 93 | 94 | Happy Coding with Cody! 💡🚀🎉 95 | -------------------------------------------------------------------------------- /cody.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | from langchain.text_splitter import CharacterTextSplitter 3 | from langchain_openai import OpenAIEmbeddings 4 | from langchain_community.vectorstores import FAISS 5 | from watchdog.observers import Observer 6 | from watchdog.events import FileSystemEventHandler 7 | import tempfile 8 | import json 9 | import time 10 | import threading 11 | import openai 12 | import os 13 | import speech_recognition as sr 14 | from gtts import gTTS 15 | import pygame 16 | import fnmatch 17 | 18 | 19 | # Load environment variable(s) 20 | load_dotenv() 21 | OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") 22 | 23 | ### USER OPTIONS ### 24 | # Maximum number of tokens to request per OpenAI call 25 | MAX_TOKENS_PER_CALL = 2500 26 | IGNORE_THESE = ['.venv', '.env', 'static', 
'dashboard/static', 'audio', 'license.md', '.github', '__pycache__'] 27 | 28 | r = sr.Recognizer() 29 | 30 | class FileChangeHandler(FileSystemEventHandler): 31 | def __init__(self, ignore_list=None): 32 | super().__init__() 33 | self._busy_files = {} 34 | self.cooldown = 5.0 # Cooldown in seconds 35 | self.ignore_list = ignore_list or [] # Patterns to skip (avoids a mutable default argument) 36 | self.data = {} 37 | self.knowledge_base = {} 38 | self.embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY) 39 | 40 | def should_ignore(self, path): 41 | for pattern in self.ignore_list: 42 | if fnmatch.fnmatch(path, pattern) or any(fnmatch.fnmatch(part, pattern) for part in path.split(os.sep)): 43 | return True 44 | return False 45 | 46 | def on_modified(self, event): 47 | if self.should_ignore(event.src_path): 48 | return 49 | print(f'\n🔄 The file {event.src_path} has changed!') 50 | self.update_file_content() 51 | 52 | def update_file_content(self): 53 | print("\n\U0001F4C1 Collecting files...") 54 | all_files_data = {} 55 | # Warn before proceeding if ".env" is not in the ignore list, since its contents would be sent to OpenAI 56 | if ".env" not in self.ignore_list: 57 | response = input("😨 You removed .env from ignore list. This may expose .env variables to OpenAI. Confirm? (1 for Yes, 2 for exit):") 58 | if response != "1": 59 | print("\n😅 Phew. Close one... Operation aborted. 
Please add '.env' to your ignore list and try again.") 60 | exit() 61 | for root, dirs, files in os.walk('.'): 62 | # Remove directories in the ignore list 63 | dirs[:] = [d for d in dirs if d not in self.ignore_list] 64 | for filename in files: 65 | if filename not in self.ignore_list: 66 | file_path = os.path.join(root, filename) 67 | try: 68 | with open(file_path, 'r') as file: 69 | if filename.endswith('.json'): 70 | json_data = json.load(file) 71 | all_files_data[file_path] = json_data # Store JSON data in the dictionary 72 | else: 73 | lines = file.readlines() 74 | line_data = {} 75 | for i, line in enumerate(lines): 76 | line_data[f"line {i + 1}"] = line.strip() 77 | all_files_data[file_path] = line_data 78 | except Exception: 79 | # Skip files that can't be read as text (e.g., binary files) 80 | continue 81 | 82 | # Create the final dictionary with the desired format 83 | final_data = {"files": all_files_data} 84 | combined_text = json.dumps(final_data) 85 | 86 | # Split combined text into chunks 87 | text_splitter = CharacterTextSplitter( 88 | separator=",", 89 | chunk_size=1000, 90 | chunk_overlap=200, 91 | length_function=len, 92 | ) 93 | chunks = text_splitter.split_text(combined_text) 94 | 95 | # Create or update the knowledge base 96 | self.knowledge_base = FAISS.from_texts(chunks, self.embeddings) 97 | 98 | print("\U00002705 All set!") 99 | audio_stream = create_audio("Files updated. 
Ready for questions") 100 | play_audio(audio_stream) 101 | 102 | def play_audio(file_path): 103 | """ 104 | Play audio from a file 105 | """ 106 | pygame.mixer.init() 107 | pygame.mixer.music.load(file_path) 108 | pygame.mixer.music.play() 109 | 110 | while pygame.mixer.music.get_busy(): 111 | continue 112 | 113 | pygame.mixer.music.unload() 114 | os.unlink(file_path) # Delete the temporary file 115 | print("Deleted temp audio file: " + file_path) 116 | 117 | def create_audio(text): 118 | """ 119 | Create an audio file from text and return the path to a temporary file 120 | """ 121 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") 122 | print(f"\nCreated temp audio file: {temp_file.name}") 123 | try: 124 | speech = gTTS(text=text, lang='en', slow=False) 125 | speech.save(temp_file.name) 126 | except Exception as e: 127 | print(f"\nError in creating audio: {e}") 128 | 129 | return temp_file.name 130 | 131 | def generate_response(prompt, speak_response=True): 132 | openai.api_key = OPENAI_API_KEY 133 | try: 134 | completion = openai.chat.completions.create( 135 | model="gpt-3.5-turbo", 136 | messages=[{"role": "user", "content": prompt}], 137 | max_tokens=MAX_TOKENS_PER_CALL, 138 | ) 139 | print("\n\U0001F4B0 Tokens used:", completion.usage.total_tokens) 140 | response_text = completion.choices[0].message.content 141 | print('\U0001F916', response_text) 142 | if speak_response: 143 | audio_stream = create_audio(response_text) 144 | play_audio(audio_stream) # Play the temp file just created (was a hardcoded, nonexistent path) 145 | except Exception as e: 146 | print(f"\U000026A0 Error in generating response: {e}") 147 | 148 | def monitor_input(handler, terminal_input=True): 149 | while True: 150 | try: 151 | if terminal_input: 152 | text = input("\U00002753 Please type your question (or 'exit' to quit): ") 153 | else: 154 | with sr.Microphone() as source: 155 | print("\nListening...") 156 | audio_data = r.listen(source) 157 | text = r.recognize_google(audio_data) 158 | 159 | if text.lower() == 'exit': 
160 | print("\n\U0001F44B Exiting the program...") 161 | os._exit(0) 162 | else: 163 | print(f"You said: {text}") 164 | question = text 165 | print("\n\U0001F9E0 You asked: " + question) 166 | docs = handler.knowledge_base.similarity_search(question) 167 | response = f"You are an expert programmer who is aware of this much of the code base:{str(docs)}. \n" 168 | response += "Please answer this: " + question + "..." # Add the rest of your instructions here 169 | generate_response(response, speak_response=not terminal_input) 170 | except sr.UnknownValueError: 171 | print("\nCould not understand audio") 172 | except sr.RequestError as e: 173 | print(f"\nCould not request results; {e}") 174 | except Exception as e: 175 | print(f"An error occurred: {e}") 176 | 177 | def start_cody(ignore_list=[]): 178 | handler = FileChangeHandler(ignore_list=ignore_list) # Use the argument rather than the global 179 | 180 | # Collect files before starting the observer 181 | handler.update_file_content() # Directly call the update_file_content method 182 | 183 | # Prompt user for interaction method 184 | interaction_method = input("\nHow should I talk to you? 
Enter 1 for Terminal or 2 for Speech I/O: ") 185 | 186 | terminal_input = interaction_method == '1' 187 | 188 | # Start a new thread to monitor input 189 | input_thread = threading.Thread(target=monitor_input, args=(handler, terminal_input)) 190 | input_thread.start() 191 | 192 | # Initialize the observer 193 | observer = Observer() 194 | observer.schedule(handler, path='.', recursive=True) 195 | observer.start() 196 | 197 | # Continue to observe for file changes 198 | try: 199 | while True: 200 | time.sleep(5) 201 | except KeyboardInterrupt: 202 | observer.stop() 203 | 204 | observer.join() 205 | 206 | if __name__ == "__main__": 207 | start_cody(ignore_list=IGNORE_THESE) 208 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.9.3 2 | aiosignal==1.3.1 3 | annotated-types==0.6.0 4 | anyio==4.3.0 5 | attrs==23.2.0 6 | certifi==2024.2.2 7 | charset-normalizer==3.3.2 8 | click==8.1.7 9 | colorama==0.4.6 10 | dataclasses-json==0.5.9 11 | distro==1.9.0 12 | faiss-cpu==1.8.0 13 | filelock==3.13.1 14 | frozenlist==1.4.1 15 | fsspec==2024.2.0 16 | greenlet==3.0.3 17 | gTTS==2.5.1 18 | h11==0.14.0 19 | httpcore==1.0.4 20 | httpx==0.27.0 21 | huggingface-hub==0.21.3 22 | idna==3.6 23 | importlib-metadata==7.0.1 24 | Jinja2==3.1.3 25 | jsonpatch==1.33 26 | jsonpointer==2.4 27 | langchain==0.1.10 28 | langchain-community==0.0.25 29 | langchain-core==0.1.28 30 | langchain-openai==0.0.8 31 | langchain-text-splitters==0.0.1 32 | langsmith==0.1.13 33 | MarkupSafe==2.1.5 34 | marshmallow==3.21.0 35 | marshmallow-enum==1.5.1 36 | multidict==6.0.5 37 | mypy-extensions==1.0.0 38 | numpy==1.26.4 39 | openai==1.13.3 40 | orjson==3.9.15 41 | packaging==23.2 42 | pydantic==2.6.3 43 | pydantic_core==2.16.3 44 | pygame==2.5.2 45 | python-dotenv==1.0.1 46 | PyYAML==6.0.1 47 | regex==2023.12.25 48 | requests==2.31.0 49 | sniffio==1.3.1 50 | 
SpeechRecognition==3.10.1 51 | SQLAlchemy==2.0.27 52 | tenacity==8.2.3 53 | tiktoken==0.6.0 54 | tokenizers==0.15.2 55 | tqdm==4.66.2 56 | typing-inspect==0.9.0 57 | typing_extensions==4.10.0 58 | urllib3==2.2.1 59 | watchdog==4.0.0 60 | yarl==1.9.4 61 | zipp==3.17.0 62 | --------------------------------------------------------------------------------