├── .askignore ├── .gitignore ├── LICENSE ├── README.md ├── askmyfiles.py ├── bin │   ├── start_env │   └── stop_env ├── requirements.txt └── tags /.askignore: -------------------------------------------------------------------------------- 1 | .vectordatadb/ 2 | .git/ 3 | tags 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .vectordatadb 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | © 2023 codeprimate, under the MIT License. 2 | 3 | ==== 4 | MIT License 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy of 7 | this software and associated documentation files (the "Software"), to deal in 8 | the Software without restriction, including without limitation the rights to 9 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 10 | of the Software, and to permit persons to whom the Software is furnished to do 11 | so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE.
23 | ==== 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # askmyfiles Python App 2 | 3 | This app creates a local database of the current directory and uses it together with ChatGPT to answer questions. 4 | 5 | Data is stored in a ChromaDB database in `.vectordatadb`. 6 | 7 | ### Install 8 | 9 | You need an OpenAI API account. You figure that out and get a key. 10 | Make sure the environment variable OPENAI_API_KEY is set. 11 | 12 | 13 | ``` 14 | # You've got python3 installed, right? 15 | # Clone this repo somewhere convenient. Run git pull if you want new stuff. 16 | cd ~/Code/ && git clone https://github.com/codeprimate/askmyfiles.git 17 | cd askmyfiles 18 | pip install -r requirements.txt 19 | 20 | # Make it easy to use. 21 | ln -sf /path/to/askmyfiles.py ~/bin/askmyfiles 22 | chmod u+x ~/bin/askmyfiles 23 | ``` 24 | 25 | ## Usage 26 | 27 | Always run askmyfiles at the root of your project folder! 28 | 29 | askmyfiles looks for new and updated information. 30 | 31 | Add a list of files or directories to ignore in `.askignore` (like a `.gitignore`). 32 | Add a list of hints/instructions for chat in `.askmyfileshints`. 33 | 34 | To add or update a directory, file, or single webpage in the local database, use "add": 35 | 36 | ``` 37 | ~/bin/askmyfiles add the/path/to/file.txt 38 | ~/bin/askmyfiles add /really/the/path/to/file.txt 39 | ~/bin/askmyfiles add the/path/ 40 | ~/bin/askmyfiles add /really/the/path/ 41 | ~/bin/askmyfiles add https://www.example.com/file.html 42 | 43 | # If there is a failure/interruption, touch the file and retry 44 | touch the/path/to/file.txt 45 | ~/bin/askmyfiles add the/path/to/file.txt 46 | ``` 47 | 48 | To ask a question using the gpt-3.5-turbo-16k model, use "ask": 49 | 50 | ``` 51 | ~/bin/askmyfiles ask "Your question."
52 | ~/bin/askmyfiles "Your question" 53 | ``` 54 | 55 | You can also pipe queries to askmyfiles: 56 | 57 | ``` 58 | echo "What is the name of my app" | ~/bin/askmyfiles 59 | ``` 60 | 61 | ### Note 62 | - Results are not perfect or deterministic. 63 | - Hallucinations can and will occur, so ask the question more than once. 64 | - Do some prompt engineering as needed to tease the information you want out of your data. 65 | 66 | 67 | To list all entries in the database, use "list": 68 | 69 | ``` 70 | ~/bin/askmyfiles list 71 | ``` 72 | 73 | To remove a file or URL from the database, use "remove": 74 | 75 | ``` 76 | # Specify a single resource to remove 77 | # Also useful when adding fails 78 | ~/bin/askmyfiles remove the/path/to/file.txt 79 | ~/bin/askmyfiles remove /really/the/path/to/file.txt 80 | ~/bin/askmyfiles remove the/path 81 | ~/bin/askmyfiles remove "https://www.example.com/example.html" 82 | ``` 83 | 84 | Once a file is loaded into the database, you don't NEED it in the project directory anymore. 85 | 86 | 87 | ### Back it Up 88 | 89 | ``` 90 | tar czvf my_db.tgz .vectordatadb 91 | ``` 92 | 93 | All file paths are relative. Drop your database anywhere. 94 | 95 | # TODO 96 | 97 | - oobabooga integration!
98 | - GitHub Issues integration (suggest root cause or solutions) 99 | -------------------------------------------------------------------------------- /askmyfiles.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import chromadb 4 | import concurrent.futures 5 | import fnmatch 6 | import hashlib 7 | import os 8 | import re 9 | import requests 10 | import sys 11 | import time 12 | from bs4 import BeautifulSoup 13 | from chromadb.config import Settings 14 | from langchain.chains import LLMChain 15 | from langchain.chains import SimpleSequentialChain 16 | from langchain.chat_models import ChatOpenAI 17 | from langchain.document_loaders import PyPDFLoader 18 | from langchain.embeddings import OpenAIEmbeddings 19 | from langchain.llms import OpenAI 20 | from langchain.prompts import PromptTemplate 21 | from langchain.text_splitter import RecursiveCharacterTextSplitter 22 | 23 | class AskMyFiles: 24 | def __init__(self, filename=None, using_stdin=False): 25 | self.filename = filename 26 | self.db_folder = '.vectordatadb' 27 | self.db_path = os.path.join(os.getcwd(), self.db_folder) 28 | self.relative_working_path = self.db_path + "/../" 29 | if filename is None: 30 | self.working_path = os.getcwd() 31 | self.recurse = True 32 | else: 33 | if os.path.isdir(filename): 34 | self.working_path = os.path.abspath(filename) 35 | self.recurse = True 36 | else: 37 | self.working_path = os.path.dirname(os.path.abspath(filename)) 38 | self.recurse = False 39 | 40 | self.askhints_file = ".askmyfileshints" 41 | self.askignore_file = ".askignore" 42 | self.askhints_path = f"{self.relative_working_path}{self.askhints_file}" 43 | self.collection_name = "filedata" 44 | self.chromadb = None 45 | self.api_key = os.getenv('OPENAI_API_KEY') 46 | self.embeddings_model = OpenAIEmbeddings(openai_api_key=self.api_key) 47 | 48 | self.max_excerpt_chars = 25000 49 | self.openai_model = "gpt-3.5-turbo-16k" 50 | 
self.model_temperature = 0.6 51 | self.chunk_size = 1000 52 | self.chunk_overlap = 100 53 | self.using_stdin = using_stdin 54 | 55 | def load_db(self): 56 | if self.chromadb is None: 57 | self.chromadb = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory=self.db_path)) 58 | self.files_collection = self.chromadb.get_or_create_collection(self.collection_name) 59 | if self.files_collection is None: 60 | self.files_collection = self.chromadb.get_or_create_collection(self.collection_name) 61 | 62 | def persist_db(self): 63 | self.chromadb.persist() 64 | 65 | def reset_db(self): 66 | self.load_db() 67 | self.chromadb.reset() 68 | 69 | def file_info(self,filename): 70 | self.load_db() 71 | file_hash = hashlib.sha256(filename.encode()).hexdigest() 72 | print(f"Finding '{filename}' ({file_hash})...") 73 | found_files = self.files_collection.get(where={"source": filename}) 74 | print(found_files) 75 | 76 | 77 | def join_strings(self,lst): 78 | result = '' 79 | for item in lst: 80 | if isinstance(item, list): 81 | result += self.join_strings(item) + '\n\n\n' 82 | else: 83 | result += item + '\n\n\n' 84 | return result.strip() 85 | 86 | def process_query_result(self, documents): 87 | output = [] 88 | max_excerpt_chars = self.max_excerpt_chars 89 | doc_count = len(documents['metadatas'][0]) 90 | references = [documents['metadatas'][0][index]['source'] for index in range(doc_count)] 91 | for index in range(doc_count): 92 | output.append(f"""### Start Excerpt from file source {documents['metadatas'][0][index]['source']} 93 | {documents['documents'][0][index]} 94 | ### End Excerpt from file source {documents['metadatas'][0][index]['source']}""") 95 | 96 | return [references, self.join_strings(output)[:max_excerpt_chars]] 97 | 98 | def query_db(self, string ): 99 | max_results = 50 100 | self.load_db() 101 | query_embedding = self.embeddings_model.embed_query(string) 102 | result =
self.files_collection.query(query_embeddings=[query_embedding],n_results=max_results,include=['documents','metadatas']) 103 | return self.process_query_result(result) 104 | 105 | def list_files(self): 106 | self.load_db() 107 | results = self.files_collection.get( 108 | where={"source": { "$ne": "FILELISTQUERYDUMMYCOMPARISON"}}, 109 | include=["metadatas"] 110 | ) 111 | 112 | files = sorted(set([results['metadatas'][index]['source'] for index in range(len(results['metadatas']))])) 113 | 114 | print("\n".join(files)) 115 | 116 | return True 117 | 118 | 119 | def get_ignore_list(self): 120 | ignore_files = [] 121 | 122 | ignore_files.append(self.db_folder) 123 | ignore_files.append('.git') 124 | image_formats = [ 'jpg', 'jpeg', 'png', 'gif', 'bmp', 'tif', 'tiff', 'ico', 'webp', 'svg', 'eps', 'raw', 'cr2', 'nef', 'orf', 'sr2', 'heif', 'bat', 'jpe', 'jfif', 'jif', 'jfi' ] 125 | for ext in image_formats: 126 | ignore_files.append(f"/*.{ext}") 127 | 128 | askignore_path = os.path.join(self.relative_working_path, self.askignore_file) 129 | if os.path.exists(askignore_path): 130 | with open(askignore_path, "r") as file: 131 | for line in file.read().splitlines(): 132 | ignore_files.append(line.strip()) 133 | 134 | return ignore_files 135 | 136 | def get_file_list(self): 137 | if not self.recurse: 138 | relative_file_path = os.path.relpath(self.filename, self.relative_working_path) 139 | return [relative_file_path] 140 | 141 | ignore_files = self.get_ignore_list() 142 | use_ignore = len(ignore_files) > 0 143 | 144 | file_list = [] 145 | for root, dirs, files in os.walk(self.working_path): 146 | for file in files: 147 | file_path = os.path.join(root, file) 148 | relative_file_path = os.path.relpath(file_path, self.relative_working_path) 149 | 150 | if not use_ignore: 151 | file_list.append(relative_file_path) 152 | continue 153 | 154 | if not any(pattern == file_path or pattern in file_path or fnmatch.fnmatch(file_path, pattern) for pattern in ignore_files): 155 |
file_list.append(relative_file_path) 156 | 157 | return file_list 158 | 159 | def remove_file(self,file_name): 160 | self.load_db() 161 | file_list = [] 162 | if os.path.isdir(file_name): 163 | print(f"Removing all files in {file_name} from database...") 164 | for root, dirs, files in os.walk(file_name): 165 | for file in files: 166 | file_path = os.path.join(root, file) 167 | relative_file_path = os.path.relpath(file_path, self.relative_working_path) 168 | file_list.append(relative_file_path) 169 | else: 170 | file_list = [file_name] 171 | 172 | found_ids = [] 173 | files_for_deletion = [] 174 | for file_path in file_list: 175 | found_file = self.files_collection.get(where={"source": file_path},include=['metadatas']) 176 | found_count = len(found_file['ids']) 177 | if found_count > 0: 178 | found_ids += found_file['ids'] 179 | files_for_deletion += [found_file['metadatas'][index]['source'] for index in range(found_count) ] 180 | 181 | found_ids = list(set(found_ids)) 182 | files_for_deletion = list(set(files_for_deletion)) 183 | 184 | if found_ids == []: 185 | print("File not found in database.") 186 | return 187 | 188 | print("Removing the following files from the database:") 189 | print(" - " + "\n - ".join(files_for_deletion)) 190 | self.files_collection.delete(ids=found_ids) 191 | self.persist_db() 192 | 193 | return True 194 | 195 | def vectorize_text(self, text): 196 | return self.embeddings_model.embed_query(text) 197 | 198 | def vectorize_chunk(self, chunk, metadata, index): 199 | embedding = self.vectorize_text(chunk) 200 | cid = f"{metadata['file_hash']}-{index}" 201 | return {"id": cid, "document": chunk, "embedding": embedding, "metadata": metadata} 202 | 203 | def vectorize_chunks(self, chunks, metadata): 204 | max_threads = min(len(chunks), 5) 205 | vectorized_chunks = {} 206 | cindex = 1 207 | # Group chunks by slicing so a trailing partial batch is not dropped 208 | 209 | with concurrent.futures.ThreadPoolExecutor(max_workers=max_threads) as executor: 210 | for chunk_group in [chunks[i:i + max_threads] for i in range(0, len(chunks), max_threads)]:
211 | starting_index = cindex 212 | num_threads = min(max_threads, len(chunk_group)) 213 | futures = [] 214 | for thread_index in range(num_threads): 215 | futures.append(executor.submit(self.vectorize_chunk, chunk_group[thread_index], metadata, cindex)) 216 | cindex += 1 217 | i = 0 218 | for future in concurrent.futures.as_completed(futures): 219 | result = future.result() 220 | chunk_index = starting_index + i 221 | vectorized_chunks[f"chunk-{chunk_index}"] = result 222 | print(".",end="",flush=True) 223 | i += 1 224 | concurrent.futures.wait(futures) 225 | 226 | return vectorized_chunks 227 | 228 | def read_file(self, file_path): 229 | with open(file_path, 'r') as file: 230 | try: 231 | if os.path.splitext(file_path)[1] == '.pdf': 232 | # PDF Processing 233 | loader = PyPDFLoader(file_path) 234 | pages = loader.load_and_split() 235 | content = [] 236 | for page in pages: 237 | content.append(str(page.page_content)) 238 | return self.join_strings(content) 239 | else: 240 | # Plain Text Processing 241 | return file.read() 242 | except Exception as e: 243 | print(f"Error reading {file_path}...[Skipped]") 244 | print(e) 245 | return None 246 | 247 | def save_vectorized_chunks(self, vectorized_chunks, group_size=10): 248 | chunk_keys = list(vectorized_chunks.keys()) 249 | if len(chunk_keys) == 0: 250 | return False 251 | 252 | batches = [chunk_keys[i:i+group_size] for i in range(0, len(chunk_keys), group_size)] 253 | 254 | 255 | for batch in batches: 256 | self.files_collection.add( 257 | ids=[vectorized_chunks[cid]['id'] for cid in batch], 258 | embeddings=[vectorized_chunks[cid]['embedding'] for cid in batch], 259 | documents=[vectorized_chunks[cid]['document'] for cid in batch], 260 | metadatas=[vectorized_chunks[cid]['metadata'] for cid in batch] 261 | ) 262 | print("+", end='', flush=True) 263 | self.persist_db() 264 | 265 | return True 266 | 267 | def split_text(self, content): 268 | splitter =
RecursiveCharacterTextSplitter(chunk_size=self.chunk_size, chunk_overlap=self.chunk_overlap) 269 | return splitter.split_text(content) 270 | 271 | def add_webpage(self, url): 272 | start_time = time.time() 273 | self.load_db() 274 | 275 | metadata = { 276 | "source": url, 277 | "file_path": url, 278 | "file_modified": time.time(), 279 | "file_hash": hashlib.sha256(url.encode()).hexdigest() 280 | } 281 | 282 | print(f"Fetching '{url}'...",end='',flush=True) 283 | headers = { 284 | 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', 285 | 'Accept-Language': 'en-US,en;q=0.9', 286 | 'Referer': 'https://www.google.com/' 287 | } 288 | response = requests.get(url, headers=headers) 289 | 290 | if response.status_code == 200: 291 | soup = BeautifulSoup(response.content, 'html.parser') 292 | content = soup.get_text() 293 | else: 294 | print(f"[Failed: {response.status_code}]") 295 | return False 296 | 297 | print(f"Creating embeddings...",end='',flush=True) 298 | chunks = self.split_text(content) 299 | chunk_count = len(chunks) 300 | print(f"[{len(chunks)} chunks]",end='',flush=True) 301 | vectorized_chunks = self.vectorize_chunks(chunks, metadata) 302 | self.files_collection.delete(where={"file_hash": metadata["file_hash"]}) 303 | self.save_vectorized_chunks(vectorized_chunks) 304 | 305 | elapsed_time = max(1, int( time.time() - start_time )) 306 | print(f"[OK] [{elapsed_time}s]", flush=True) 307 | 308 | def process_file(self,file_path): 309 | start_time = time.time() 310 | self.load_db() 311 | 312 | # Get file meta information 313 | metadata = { 314 | "source": file_path, 315 | "file_path": file_path, 316 | "file_modified": os.path.getmtime(file_path), 317 | "file_hash": hashlib.sha256(file_path.encode()).hexdigest() 318 | } 319 | 320 | # File exists? 
321 | existing_record = self.files_collection.get(where={"file_hash": metadata["file_hash"]}) 322 | existing = len(existing_record['ids']) != 0 and len(existing_record['metadatas']) != 0 323 | if existing: 324 | file_updated = existing_record['metadatas'][0]["file_modified"] < metadata["file_modified"] 325 | else: 326 | file_updated = True 327 | 328 | # Skip File? 329 | skip_file = existing and not file_updated 330 | if skip_file: 331 | return False 332 | 333 | print(f"Creating File Embeddings for: {file_path}...",end='',flush=True) 334 | 335 | # Read content and split 336 | content = self.read_file(file_path) 337 | if content is None or content.strip() == '': 338 | print("[EMPTY]", flush=True) 339 | return False 340 | 341 | chunks = self.split_text(content) 342 | print(f"[{len(chunks)} chunks]",end='',flush=True) 343 | 344 | # Vectorize Chunks 345 | vectorized_chunks = self.vectorize_chunks(chunks, metadata) 346 | self.files_collection.delete(where={"file_hash": metadata["file_hash"]}) 347 | self.save_vectorized_chunks(vectorized_chunks) 348 | 349 | # Print status 350 | elapsed_time = max(1, int( time.time() - start_time )) 351 | print(f"[OK] [{elapsed_time}s]", flush=True) 352 | 353 | return True 354 | 355 | def load_files(self): 356 | print("Updating AskMyFiles database...") 357 | saved_files = False 358 | for file_path in self.get_file_list(): 359 | try: 360 | file_saved = self.process_file(file_path) 361 | except Exception as e: 362 | print(f"Processing Error! ({e})") 363 | file_saved = False 364 | saved_files = file_saved or saved_files 365 | 366 | return saved_files 367 | 368 | def get_hints(self): 369 | if os.path.exists(self.askhints_path): 370 | with open(self.askhints_path, "r") as file: 371 | return file.read() 372 | else: 373 | return '' 374 | 375 | def ask(self, query): 376 | llm = ChatOpenAI(temperature=self.model_temperature,model=self.openai_model) 377 | 378 | # First Pass 379 | template = """ 380 | [ 381 | Important Knowledge from MyAskmyfilesLibrary: 382 | BEGIN
Important Knowledge 383 | {excerpts} 384 | END Important Knowledge 385 | ] 386 | 387 | [ 388 | {hints} 389 | ] 390 | 391 | [ 392 | Start with and prioritize knowledge from MyAskmyfilesLibrary when you answer my question. 393 | Answer in a very detailed manner when possible. 394 | If the question is regarding code: prefer to answer using service objects and other abstractions already defined in MyAskmyfilesLibrary and follow similar coding conventions. 395 | If the question is regarding code: identify if there is a tags file present to inform your answers about modules, classes, and methods. 396 | ] 397 | 398 | ### Question: {text} 399 | ### Answer: 400 | """ 401 | 402 | prompt_template = PromptTemplate(input_variables=["text","excerpts","hints"], template=template) 403 | answer_chain = LLMChain(llm=llm, prompt=prompt_template) 404 | if not self.using_stdin: 405 | print("...THINKING...", end='', flush=True) 406 | local_query_result = self.query_db(query) 407 | first_answer = answer_chain.run(excerpts=local_query_result[1],hints=self.get_hints(),text=query) 408 | 409 | # Second Pass 410 | index = first_answer.find("Sources:") 411 | sources = "" 412 | if index != -1: 413 | sources = first_answer[index + len("Sources:"):] 414 | 415 | second_pass_query = f""" 416 | [ 417 | Consider the following first question and answer: 418 | Question: {query} 419 | Answer: {first_answer} 420 | 421 | Sources: {sources} 422 | ] 423 | 424 | Reconsider the first Question and Answer to answer the following question: 425 | {query} 426 | """ 427 | 428 | if not self.using_stdin: 429 | print("THINKING MORE...", end='', flush=True) 430 | local_query_result2 = self.query_db(second_pass_query) 431 | second_answer = answer_chain.run(excerpts=local_query_result2[1],hints=self.get_hints(),text=second_pass_query) 432 | 433 | # Output 434 | if not self.using_stdin: 435 | print("\n=====================================================") 436 | print(second_answer) 437 | if not self.using_stdin: 438 |
print("\n\n=Sources=") 439 | print(" ".join(list(set(local_query_result[0])))) 440 | 441 | if __name__ == "__main__": 442 | if not sys.stdin.isatty(): 443 | query = "\n".join(sys.stdin.readlines()) 444 | service = AskMyFiles(using_stdin=True) 445 | service.ask(query) 446 | sys.exit() 447 | 448 | if len(sys.argv) > 1: 449 | command = sys.argv[1] 450 | if command == "ask": 451 | query = sys.argv[2] 452 | service = AskMyFiles() 453 | service.ask(query) 454 | sys.exit() 455 | 456 | if command == "add": 457 | path = sys.argv[2] 458 | if path.startswith('http'): 459 | service = AskMyFiles() 460 | service.add_webpage(path) 461 | sys.exit() 462 | 463 | service = AskMyFiles(path) 464 | service.load_files() 465 | sys.exit() 466 | 467 | if command == "remove": 468 | path = sys.argv[2] 469 | service = AskMyFiles() 470 | service.remove_file(path) 471 | sys.exit() 472 | 473 | if command == "info": 474 | path = sys.argv[2] 475 | service = AskMyFiles() 476 | service.file_info(path) 477 | sys.exit() 478 | 479 | if command == "add_webpage": 480 | url = sys.argv[2] 481 | service = AskMyFiles() 482 | service.add_webpage(url) 483 | sys.exit() 484 | 485 | if command == "list": 486 | service = AskMyFiles() 487 | service.list_files() 488 | sys.exit() 489 | 490 | 491 | service = AskMyFiles() 492 | query = ' '.join(sys.argv[1:]) 493 | service.ask(query) 494 | sys.exit() 495 | 496 | else: 497 | print("askmyfiles ask 'question' or askmyfiles add 'path/dir'") 498 | -------------------------------------------------------------------------------- /bin/start_env: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python3 -m venv venv 4 | -------------------------------------------------------------------------------- /bin/stop_env: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | deactivate 4 | --------------------------------------------------------------------------------
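A note on the helper scripts above: `bin/start_env` only creates the virtualenv, and `deactivate` in `bin/stop_env` runs in a child process, so neither can change your interactive shell's environment. A typical manual session (a sketch under that assumption; `venv` is the directory `start_env` creates) looks like:

```shell
# Sketch of the venv workflow the bin/ scripts hint at.
python3 -m venv venv            # what bin/start_env does
. venv/bin/activate             # must be sourced in your own shell, not a subscript
python -c 'import sys; print(sys.prefix)'   # now points inside ./venv
deactivate                      # works here because we are in the same shell
```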
/requirements.txt: -------------------------------------------------------------------------------- 1 | chromadb==0.3.29 2 | langchain[llms] 3 | openai 4 | pypdf 5 | beautifulsoup4 6 | -------------------------------------------------------------------------------- /tags: -------------------------------------------------------------------------------- 1 | !_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/ 2 | !_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/ 3 | !_TAG_PROGRAM_AUTHOR Darren Hiebert /dhiebert@users.sourceforge.net/ 4 | !_TAG_PROGRAM_NAME Exuberant Ctags // 5 | !_TAG_PROGRAM_URL http://ctags.sourceforge.net /official site/ 6 | !_TAG_PROGRAM_VERSION 5.9~svn20110310 // 7 | AskMyFiles ./askmyfiles.py /^class AskMyFiles:$/;" c 8 | __init__ ./askmyfiles.py /^ def __init__(self, filename=None, using_stdin=False):$/;" m class:AskMyFiles 9 | add_webpage ./askmyfiles.py /^ def add_webpage(self, url):$/;" m class:AskMyFiles 10 | ask ./askmyfiles.py /^ def ask(self, query):$/;" m class:AskMyFiles 11 | file_info ./askmyfiles.py /^ def file_info(self,filename):$/;" m class:AskMyFiles 12 | get_file_list ./askmyfiles.py /^ def get_file_list(self):$/;" m class:AskMyFiles 13 | get_hints ./askmyfiles.py /^ def get_hints(self):$/;" m class:AskMyFiles 14 | get_ignore_list ./askmyfiles.py /^ def get_ignore_list(self):$/;" m class:AskMyFiles 15 | join_strings ./askmyfiles.py /^ def join_strings(self,lst):$/;" m class:AskMyFiles 16 | list_files ./askmyfiles.py /^ def list_files(self):$/;" m class:AskMyFiles 17 | load_db ./askmyfiles.py /^ def load_db(self):$/;" m class:AskMyFiles 18 | load_files ./askmyfiles.py /^ def load_files(self):$/;" m class:AskMyFiles 19 | persist_db ./askmyfiles.py /^ def persist_db(self):$/;" m class:AskMyFiles 20 | process_file ./askmyfiles.py /^ def process_file(self,file_path):$/;" m class:AskMyFiles 21 | process_query_result ./askmyfiles.py /^ def process_query_result(self, documents):$/;" m class:AskMyFiles 22 | 
query_db ./askmyfiles.py /^ def query_db(self, string ):$/;" m class:AskMyFiles 23 | read_file ./askmyfiles.py /^ def read_file(self, file_path):$/;" m class:AskMyFiles 24 | remove_file ./askmyfiles.py /^ def remove_file(self,file_name):$/;" m class:AskMyFiles 25 | reset_db ./askmyfiles.py /^ def reset_db(self):$/;" m class:AskMyFiles 26 | save_vectorized_chunks ./askmyfiles.py /^ def save_vectorized_chunks(self, vectorized_chunks, group_size=10):$/;" m class:AskMyFiles 27 | split_text ./askmyfiles.py /^ def split_text(self, content):$/;" m class:AskMyFiles 28 | vectorize_chunk ./askmyfiles.py /^ def vectorize_chunk(self, chunk, metadata, index):$/;" m class:AskMyFiles 29 | vectorize_chunks ./askmyfiles.py /^ def vectorize_chunks(self, chunks, metadata):$/;" m class:AskMyFiles 30 | vectorize_text ./askmyfiles.py /^ def vectorize_text(self, text):$/;" m class:AskMyFiles 31 | --------------------------------------------------------------------------------
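One subtlety worth calling out from `vectorize_chunks` in `askmyfiles.py`: grouping work items with `zip(*[iter(chunks)] * n)` silently drops any trailing partial group, so the last chunks of a file would never be embedded whenever the chunk count is not a multiple of the thread-pool size; slicing keeps the remainder. A minimal standalone illustration (helper names are hypothetical, not part of the repo):

```python
# Why zip(*[iter(...)] * n) is the wrong grouping strategy for batched work.

def group_with_zip(items, n):
    it = iter(items)
    # zip stops as soon as one copy of the iterator is exhausted,
    # so a final group with fewer than n items is discarded.
    return [list(g) for g in zip(*[it] * n)]

def group_with_slices(items, n):
    # Slicing emits every item, including a short final group.
    return [items[i:i + n] for i in range(0, len(items), n)]

chunks = [f"chunk-{i}" for i in range(7)]
print(group_with_zip(chunks, 5))     # one group of 5; chunk-5 and chunk-6 are lost
print(group_with_slices(chunks, 5))  # two groups: 5 items, then the final 2
```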