├── .gitignore ├── .local.env ├── LICENSE ├── README.md ├── cody.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | .venv 2 | .env 3 | -------------------------------------------------------------------------------- /.local.env: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY=YOUR_API_KEY_HERE -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Drew H. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # 🤖 Cody - Your AI Coding Assistant 3 | [![Star History Chart](https://api.star-history.com/svg?repos=ajhous44/cody&type=Date)](https://star-history.com/#ajhous44/cody&Date) 4 | 5 | Welcome to Cody! An AI assistant designed to let you interactively query your codebase using natural language. By using vector embeddings, chunking, and OpenAI's language models, Cody helps you navigate your code efficiently and intuitively. 💻 6 | 7 | ![image](https://github.com/ajhous44/cody/assets/42582780/f2a62a20-663c-4ec1-b000-67257331fb12) 8 | ## Demo 9 | https://www.loom.com/share/eba1d0dcee20430fbd412580d1c0ea0e?sid=4998cf6f-45b4-480d-b742-6f22f3a49dc3 10 | 11 | 12 | Cody updates its knowledge base every time you save a file, so you always have the most up-to-date information. You can customize your setup by listing directories and files to ignore in `IGNORE_THESE`. 13 | 14 | ## 🚀 Getting Started 15 | 16 | 1. Clone the repo 17 | 2. (Optionally) create and activate a virtual environment with `python -m venv .venv`, then install dependencies with `pip install -r requirements.txt` from the repo root 18 | 3. Rename the `.local.env` file to `.env` and replace `YOUR_API_KEY_HERE` with your OpenAI API key. 19 | 4. Modify the `IGNORE_THESE` global variable at the top of the script to specify directories and files to exclude from monitoring. (You should exclude any large directories or files, such as a virtual environment, caches, or downloaded JS libraries.) 20 | 5. Run the script with `python cody.py` and follow the terminal prompts. It will ask whether you want text chat (terminal) or conversational mode (speech I/O), and it will warn you if you remove `.env` from the ignore list.
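Step 4's ignore patterns are matched with Python's `fnmatch` module against both the full path and each individual path segment, mirroring the `should_ignore` helper in `cody.py`. A quick, self-contained sketch of how that matching behaves (the patterns here are illustrative examples, not the shipped defaults):

```python
import fnmatch
import os

# Mirrors should_ignore in cody.py: a path is skipped when the whole path,
# or any single segment of it, matches one of the ignore patterns.
def should_ignore(path, ignore_list):
    for pattern in ignore_list:
        if fnmatch.fnmatch(path, pattern) or any(
            fnmatch.fnmatch(part, pattern) for part in path.split(os.sep)
        ):
            return True
    return False

IGNORE_THESE = ['.venv', '*.env', 'node_modules']

print(should_ignore(os.path.join('.', '.venv', 'lib', 'site.py'), IGNORE_THESE))  # True: '.venv' segment
print(should_ignore(os.path.join('.', 'src', 'app.py'), IGNORE_THESE))            # False: nothing matches
print(should_ignore('.env', IGNORE_THESE))                                        # True: matches '*.env'
```

Because a bare name like `.venv` matches any path segment, adding a directory name alone is enough to skip everything under it.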
21 | 22 | ## 🎯 Features 23 | 24 | - **File Monitoring**: Real-time monitoring of all files in your project's directory and subdirectories. 👀 25 | - **Embedding-based Knowledge Base**: Cody collects the contents of all text and JSON files, embeds them with OpenAI Embeddings, and stores them in a searchable knowledge base. 📚 26 | - **Interactive Q&A**: Ask questions by text or voice, and Cody generates a response grounded in the knowledge base. 🧠 27 | - **Customizable**: Easily specify files or directories to ignore during monitoring. 28 | 29 | ## 🛠 Dependencies 30 | 31 | - `python-dotenv`: Load variables from a `.env` file into the environment. 32 | - `langchain-community`: Provides the FAISS vector store integration. Split out from `langchain`. 33 | - `langchain-openai`: Provides `OpenAIEmbeddings`, integrating OpenAI models with langchain's architecture. 34 | - `langchain`: Core library providing the `CharacterTextSplitter` used for chunking. 35 | - `watchdog`: Monitor filesystem events in real time. 36 | - `openai`: Generate responses using OpenAI's language models. 37 | - `SpeechRecognition`: Convert speech to text for voice interaction. 38 | - `gTTS`: Google Text-to-Speech library for generating audio from text. 39 | - `pygame`: Library for playing audio files. 40 | 41 | ## 💡 Usage 42 | 43 | - To stop the script, type 'exit' (or say "exit" in speech mode) and press Enter. Cody will gracefully terminate the program. 44 | 45 | ### Configuring the Ignore List 46 | 47 | Cody allows you to specify which files and directories should be ignored during file monitoring. This is particularly useful for excluding files that change frequently, are not relevant to your queries, or could contain sensitive information. 48 | 49 | To customize your ignore list, add patterns matching the files or directories you wish to exclude. Cody supports simple wildcard patterns for flexibility. 
Here are some examples to guide you: 50 | 51 | #### Examples 52 | 53 | - **Ignoring Specific Files**: If you want to ignore all `.env` files, you can add `*.env` to the ignore list. 54 | ```python 55 | IGNORE_THESE = ['*.env'] 56 | ``` 57 | 58 | - **Ignoring Directories**: To ignore an entire directory, such as `node_modules` or a virtual environment directory like `.venv`, simply add the directory name. 59 | ```python 60 | IGNORE_THESE = ['node_modules', '.venv'] 61 | ``` 62 | 63 | - **Ignoring File Extensions**: To ignore all files with a specific extension, such as `.log` or `.tmp`, use the wildcard pattern `*`. 64 | ```python 65 | IGNORE_THESE = ['*.log', '*.tmp'] 66 | ``` 67 | 68 | - **Complex Patterns**: You can combine directory names and wildcards to ignore specific types of files within certain directories. For example, to ignore all `.md` files in the `docs` directory: 69 | ```python 70 | IGNORE_THESE = ['docs/*.md'] 71 | ``` 72 | 73 | #### Tips for Configuring Your Ignore List 74 | 75 | - **Review Regularly**: As your project evolves, so too may the files and directories you need to ignore. Regularly reviewing and updating your `ignore_list` can help ensure Cody's performance remains optimal. 76 | 77 | - **Use Wildcards Wisely**: While wildcards offer powerful flexibility, they can also lead to unintentionally ignoring important files. Be specific in your patterns to avoid such issues. 78 | 79 | - **Test Changes**: After updating your `ignore_list`, perform a few tests to ensure that the changes behave as expected, especially if using complex patterns. 80 | 81 | By carefully configuring your `ignore_list`, you can tailor Cody to better suit your project's needs, enhancing both its efficiency and relevance to your coding tasks. 82 | 83 | 84 | ## ⚠️ Notes & Tips 85 | 86 | - Cody uses the FAISS library for efficient similarity search in storing vectors. Please ensure you have sufficient memory available, especially when monitoring a large number of files. 
87 | - Additionally, be sure to monitor your OpenAI API usage. A helpful tip is to set a monthly spend limit in your OpenAI account to avoid surprises; as an extra safeguard, Cody prints the number of tokens used in each call. 88 | - "Live" coding questions: to use Cody to its full potential, open a separate terminal, `cd` into your project directory, and launch `python cody.py`. Then place it split-screen with your code in a small window on the far left or right. This way you can use a separate terminal for actually running your code without worrying about Cody or having to relaunch it each time. Cody still updates on every file save, so it always works from the latest data. 89 | 90 | ## Contributing 91 | 92 | Contributions are welcome. Please submit a pull request or open an issue for any bugs or feature requests. 93 | 94 | Happy Coding with Cody! 💡🚀🎉 95 | -------------------------------------------------------------------------------- /cody.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | from langchain.text_splitter import CharacterTextSplitter 3 | from langchain_openai import OpenAIEmbeddings 4 | from langchain_community.vectorstores import FAISS 5 | from watchdog.observers import Observer 6 | from watchdog.events import FileSystemEventHandler 7 | import tempfile 8 | import json 9 | import time 10 | import threading 11 | import openai 12 | import os 13 | import speech_recognition as sr 14 | from gtts import gTTS 15 | import pygame 16 | import fnmatch 17 | 18 | 19 | # Load environment variable(s) 20 | load_dotenv() 21 | OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") 22 | 23 | ### USER OPTIONS ### 24 | # Maximum number of tokens to request per OpenAI call 25 | MAX_TOKENS_PER_CALL = 2500 26 | IGNORE_THESE = ['.venv', '.env', 'static', 
'dashboard/static', 'audio', 'license.md', '.github', '__pycache__'] 27 | 28 | r = sr.Recognizer() 29 | 30 | class FileChangeHandler(FileSystemEventHandler): 31 | def __init__(self, ignore_list=None): 32 | super().__init__() 33 | self._busy_files = {} 34 | self.cooldown = 5.0 # Cooldown in seconds 35 | self.ignore_list = ignore_list or [] # Patterns to skip (avoids a mutable default argument) 36 | self.data = {} 37 | self.knowledge_base = {} 38 | self.embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY) 39 | 40 | def should_ignore(self, path): 41 | for pattern in self.ignore_list: 42 | if fnmatch.fnmatch(path, pattern) or any(fnmatch.fnmatch(part, pattern) for part in path.split(os.sep)): 43 | return True 44 | return False 45 | 46 | def on_modified(self, event): 47 | if self.should_ignore(event.src_path): 48 | return 49 | print(f'\n🔄 The file {event.src_path} has changed!') 50 | self.update_file_content() 51 | 52 | def update_file_content(self): 53 | print("\n\U0001F4C1 Collecting files...") 54 | all_files_data = {} 55 | # Warn before proceeding if ".env" is not in the ignore list, since its contents would be sent to OpenAI 56 | if ".env" not in self.ignore_list: 57 | response = input("😨 You removed .env from ignore list. This may expose .env variables to OpenAI. Confirm? (1 for Yes, 2 for exit):") 58 | if response != "1": 59 | print("\n😅 Phew. Close one... Operation aborted. 
Please add '.env' to your ignore list and try again.") 60 | exit() 61 | for root, dirs, files in os.walk('.'): 62 | # Remove directories in the ignore list 63 | dirs[:] = [d for d in dirs if d not in self.ignore_list] 64 | for filename in files: 65 | if filename not in self.ignore_list: 66 | file_path = os.path.join(root, filename) 67 | try: 68 | with open(file_path, 'r') as file: 69 | if filename.endswith('.json'): 70 | json_data = json.load(file) 71 | all_files_data[file_path] = json_data # Store JSON data in the dictionary 72 | else: 73 | lines = file.readlines() 74 | line_data = {} 75 | for i, line in enumerate(lines): 76 | line_data[f"line {i + 1}"] = line.strip() 77 | all_files_data[file_path] = line_data 78 | except Exception: 79 | # Skip files that can't be read as text (e.g., binary files) 80 | continue 81 | 82 | # Create the final dictionary with the desired format 83 | final_data = {"files": all_files_data} 84 | combined_text = json.dumps(final_data) 85 | 86 | # Split combined text into chunks 87 | text_splitter = CharacterTextSplitter( 88 | separator=",", 89 | chunk_size=1000, 90 | chunk_overlap=200, 91 | length_function=len, 92 | ) 93 | chunks = text_splitter.split_text(combined_text) 94 | 95 | # Create or update the knowledge base 96 | self.knowledge_base = FAISS.from_texts(chunks, self.embeddings) 97 | 98 | print("\U00002705 All set!") 99 | audio_stream = create_audio("Files updated. 
Ready for questions") 100 | play_audio(audio_stream) 101 | 102 | def play_audio(file_path): 103 | """ 104 | Play audio from a file 105 | """ 106 | pygame.mixer.init() 107 | pygame.mixer.music.load(file_path) 108 | pygame.mixer.music.play() 109 | 110 | while pygame.mixer.music.get_busy(): 111 | continue 112 | 113 | pygame.mixer.music.unload() 114 | os.unlink(file_path) # Delete the temporary file 115 | print("Deleted temp audio file: " + file_path) 116 | 117 | def create_audio(text): 118 | """ 119 | Create an audio file from text and return the path to a temporary file 120 | """ 121 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") 122 | print(f"\nCreated temp audio file: {temp_file.name}") 123 | try: 124 | speech = gTTS(text=text, lang='en', slow=False) 125 | speech.save(temp_file.name) 126 | except Exception as e: 127 | print(f"\nError in creating audio: {e}") 128 | 129 | return temp_file.name 130 | 131 | def generate_response(prompt, speak_response=True): 132 | openai.api_key = OPENAI_API_KEY 133 | try: 134 | completion = openai.chat.completions.create( 135 | model="gpt-3.5-turbo", 136 | messages=[{"role": "user", "content": prompt}], 137 | max_tokens=MAX_TOKENS_PER_CALL, 138 | ) 139 | print("\n\U0001F4B0 Tokens used:", completion.usage.total_tokens) 140 | response_text = completion.choices[0].message.content 141 | print('\U0001F916', response_text) 142 | if speak_response: 143 | audio_stream = create_audio(response_text) 144 | play_audio(audio_stream) # Play the temp file just created (was a hardcoded, nonexistent path) 145 | except Exception as e: 146 | print(f"\U000026A0 Error in generating response: {e}") 147 | 148 | def monitor_input(handler, terminal_input=True): 149 | while True: 150 | try: 151 | if terminal_input: 152 | text = input("\U00002753 Please type your question (or 'exit' to quit): ") 153 | else: 154 | with sr.Microphone() as source: 155 | print("\nListening...") 156 | audio_data = r.listen(source) 157 | text = r.recognize_google(audio_data) 158 | 159 | if text.lower() == 'exit': 
160 | print("\n\U0001F44B Exiting the program...") 161 | os._exit(0) 162 | else: 163 | print(f"You said: {text}") 164 | question = text 165 | print("\n\U0001F9E0 You asked: " + question) 166 | docs = handler.knowledge_base.similarity_search(question) 167 | response = f"You are an expert programmer who is aware of this much of the code base:{str(docs)}. \n" 168 | response += "Please answer this: " + question + "..." # Add the rest of your instructions here 169 | generate_response(response, speak_response=not terminal_input) 170 | except sr.UnknownValueError: 171 | print("\nCould not understand audio") 172 | except sr.RequestError as e: 173 | print(f"\nCould not request results; {e}") 174 | except Exception as e: 175 | print(f"An error occurred: {e}") 176 | 177 | def start_cody(ignore_list=[]): 178 | handler = FileChangeHandler(ignore_list=ignore_list) # Use the argument rather than the global 179 | 180 | # Collect files before starting the observer 181 | handler.update_file_content() # Directly call the update_file_content method 182 | 183 | # Prompt user for interaction method 184 | interaction_method = input("\nHow should I talk to you? 
Enter 1 for Terminal or 2 for Speech I/O: ") 185 | 186 | terminal_input = interaction_method == '1' 187 | 188 | # Start a new thread to monitor input 189 | input_thread = threading.Thread(target=monitor_input, args=(handler, terminal_input)) 190 | input_thread.start() 191 | 192 | # Initialize the observer 193 | observer = Observer() 194 | observer.schedule(handler, path='.', recursive=True) 195 | observer.start() 196 | 197 | # Continue to observe for file changes 198 | try: 199 | while True: 200 | time.sleep(5) 201 | except KeyboardInterrupt: 202 | observer.stop() 203 | 204 | observer.join() 205 | 206 | if __name__ == "__main__": 207 | start_cody(ignore_list=IGNORE_THESE) 208 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.9.3 2 | aiosignal==1.3.1 3 | annotated-types==0.6.0 4 | anyio==4.3.0 5 | attrs==23.2.0 6 | certifi==2024.2.2 7 | charset-normalizer==3.3.2 8 | click==8.1.7 9 | colorama==0.4.6 10 | dataclasses-json==0.5.9 11 | distro==1.9.0 12 | faiss-cpu==1.8.0 13 | filelock==3.13.1 14 | frozenlist==1.4.1 15 | fsspec==2024.2.0 16 | greenlet==3.0.3 17 | gTTS==2.5.1 18 | h11==0.14.0 19 | httpcore==1.0.4 20 | httpx==0.27.0 21 | huggingface-hub==0.21.3 22 | idna==3.6 23 | importlib-metadata==7.0.1 24 | Jinja2==3.1.3 25 | jsonpatch==1.33 26 | jsonpointer==2.4 27 | langchain==0.1.10 28 | langchain-community==0.0.25 29 | langchain-core==0.1.28 30 | langchain-openai==0.0.8 31 | langchain-text-splitters==0.0.1 32 | langsmith==0.1.13 33 | MarkupSafe==2.1.5 34 | marshmallow==3.21.0 35 | marshmallow-enum==1.5.1 36 | multidict==6.0.5 37 | mypy-extensions==1.0.0 38 | numpy==1.26.4 39 | openai==1.13.3 40 | orjson==3.9.15 41 | packaging==23.2 42 | pydantic==2.6.3 43 | pydantic_core==2.16.3 44 | pygame==2.5.2 45 | python-dotenv==1.0.1 46 | PyYAML==6.0.1 47 | regex==2023.12.25 48 | requests==2.31.0 49 | sniffio==1.3.1 50 | 
SpeechRecognition==3.10.1 51 | SQLAlchemy==2.0.27 52 | tenacity==8.2.3 53 | tiktoken==0.6.0 54 | tokenizers==0.15.2 55 | tqdm==4.66.2 56 | typing-inspect==0.9.0 57 | typing_extensions==4.10.0 58 | urllib3==2.2.1 59 | watchdog==4.0.0 60 | yarl==1.9.4 61 | zipp==3.17.0 62 | --------------------------------------------------------------------------------