├── .gitignore ├── README.md ├── TraceTalk ├── app.py ├── chatbot_agent.py ├── handle_multiprocessing.py ├── main.py ├── package-lock.json ├── prep_data.py ├── prompts │ ├── basic_prompt.py │ └── combine_prompt.py ├── requirements.txt ├── roots.sst ├── src.py ├── update_collection.py ├── utils │ ├── json_tokenizer.py │ └── test_tokenizer.py ├── vector-db-persist-directory │ ├── book data │ │ └── book data.csv │ └── resources │ │ ├── assets.txt │ │ ├── assignments.txt │ │ ├── data-science.txt │ │ ├── data.txt │ │ ├── deep-learning.txt │ │ ├── machine-learning-productionization.txt │ │ ├── ml-advanced.txt │ │ ├── ml-fundamentals.txt │ │ ├── prerequisites.txt │ │ ├── slides.txt │ │ └── supporting-materials.txt └── workflows │ └── update_source_link.py ├── docs ├── api_documentation.md ├── learning_mechanism.md └── setup.md ├── frontend └── main.py ├── notebooks └── experiment_adaptive_learning.ipynb ├── requirements.txt ├── setup.py ├── src ├── __init__.py ├── config.py ├── llm │ ├── __init__.py │ ├── api_integration.py │ └── model.py ├── main.py ├── nlu │ ├── __init__.py │ └── intent_recognition.py ├── online_learning │ ├── __init__.py │ ├── adaptive_model.py │ └── memory_manager.py ├── personalization │ ├── __init__.py │ ├── preference_learner.py │ └── user_profile.py ├── task_management │ ├── __init__.py │ └── task_handler.py └── utils │ ├── __init__.py │ └── helpers.py └── tests ├── __init__.py ├── test_adaptive_model.py ├── test_main.py └── test_personalization.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Virtual Environment 2 | venv/ 3 | env/ 4 | .venv/ 5 | .env/ 6 | 7 | # Python cache files 8 | __pycache__/ 9 | *.py[cod] 10 | *$py.class 11 | 12 | # C extensions 13 | *.so 14 | 15 | # Distribution / packaging 16 | .Python 17 | build/ 18 | develop-eggs/ 19 | dist/ 20 | downloads/ 21 | eggs/ 22 | .eggs/ 23 | lib/ 24 | lib64/ 25 | parts/ 26 | sdist/ 27 | var/ 28 | wheels/ 29 | share/python-wheels/ 30 | 
*.egg-info/ 31 | .installed.cfg 32 | *.egg 33 | 34 | # PyInstaller 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .nox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | *.py,cover 53 | .hypothesis/ 54 | .pytest_cache/ 55 | 56 | # Jupyter Notebook 57 | .ipynb_checkpoints 58 | 59 | # IPython 60 | profile_default/ 61 | ipython_config.py 62 | 63 | # pyenv 64 | .python-version 65 | 66 | # Environments 67 | .env 68 | .venv 69 | env/ 70 | venv/ 71 | ENV/ 72 | env.bak/ 73 | venv.bak/ 74 | 75 | # Spyder project settings 76 | .spyderproject 77 | .spyproject 78 | 79 | # Rope project settings 80 | .ropeproject 81 | 82 | # mkdocs documentation 83 | /site 84 | 85 | # mypy 86 | .mypy_cache/ 87 | .dmypy.json 88 | dmypy.json 89 | 90 | # Pyre type checker 91 | .pyre/ 92 | 93 | # pytype static type analyzer 94 | .pytype/ 95 | 96 | # Operating System Files (macOS .DS_Store, Windows Thumbs.db; gitignore patterns must not carry trailing comments) 97 | .DS_Store 98 | Thumbs.db 99 | 100 | # IDE specific files 101 | .vscode/ 102 | .idea/ 103 | *.swp 104 | *.swo 105 | *~ 106 | 107 | # LeAgent specific ignores (add any project-specific files/directories here) 108 | # data/user_data/ # Uncomment if you want to ignore user data 109 | # model_checkpoints/ # Uncomment if you want to ignore model checkpoints -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LeAgent: Your Adaptive AI Companion 2 | 3 | ### Note: My dear developers and users, this project was established in March 2023. At that time, the concept of RAG had not yet become widespread. However, I've recently come up with new ideas and plan to work on a project involving a learning RAG+agent (the ultimate goal is to realize an AI like Iron Man's Jarvis). Thanks! 
Please give me a few months to restructure this project. Thank you for your support. As a professional developer and algorithm engineer, I promise not to let you down. The vision is as follows: 4 | 5 | ## Overview 6 | 7 | LeAgent is an innovative AI assistant designed to grow and adapt alongside its user, creating a unique and evolving relationship. Inspired by AI assistants in science fiction, such as J.A.R.V.I.S. or Friday from the Iron Man films, LeAgent aims to bridge the gap between static, pre-programmed assistants and truly personalized AI companions. 8 | 9 | The name "LeAgent" combines the French article "Le" with "Agent," symbolizing a sophisticated, adaptive assistant that transcends language and cultural barriers. 10 | 11 | ## Key Features 12 | 13 | - **Adaptive Learning**: Learns and adapts in real-time based on user interactions. 14 | - **Personalization**: Tailors responses, suggestions, and behavior to individual user preferences and habits. 15 | - **Natural Interaction**: Utilizes advanced natural language understanding for more human-like conversations. 16 | - **Task Management**: Provides efficient assistance with various tasks, from scheduling to information retrieval. 17 | - **Continuous Growth**: Designed to continuously improve and expand capabilities over time. 18 | - **Privacy-Focused**: Ensures user data is handled securely and ethically. 
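To make the "Adaptive Learning" and "Personalization" features above concrete, here is a minimal, illustrative sketch of how a per-topic preference learner might work. All names here are hypothetical — the real `src/personalization/preference_learner.py` is still being restructured — and the update rule is simply an exponential moving average of feedback per topic:

```python
# Illustrative sketch only: a minimal incremental preference learner.
# The class name and update rule are hypothetical, not the shipped implementation.
from collections import defaultdict


class PreferenceLearner:
    """Tracks per-topic preference scores with an exponential moving average."""

    def __init__(self, learning_rate: float = 0.3):
        self.learning_rate = learning_rate
        self.scores = defaultdict(float)  # topic -> score, roughly in [-1, 1]

    def update(self, topic: str, feedback: float) -> None:
        """Nudge the stored score toward the latest feedback signal."""
        current = self.scores[topic]
        self.scores[topic] = current + self.learning_rate * (feedback - current)

    def top_topics(self, n: int = 3):
        """Return the n topics the user currently prefers most."""
        return sorted(self.scores, key=self.scores.get, reverse=True)[:n]
```

Each call to `update` moves the stored score toward the newest feedback, so recent interactions weigh more than old ones. This is the simplest form of the incremental, real-time adaptation the feature list describes.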
19 | 20 | ## Technology Stack 21 | 22 | - Large Language Models (LLMs) for natural language processing 23 | - Incremental learning algorithms for real-time adaptation 24 | - Advanced Natural Language Understanding (NLU) for intent recognition 25 | - Efficient data structures for memory management and quick retrieval 26 | 27 | [![What is vector search?](https://user-images.githubusercontent.com/65004114/226753565-e2230d59-5750-4d77-840f-4f777441a4dc.png)](https://www.elastic.co/cn/what-is/vector-search) 28 | 29 | ![Framework of TraceTalk](https://github.com/Appointat/Chat-with-Document-s-using-ChatGPT-API-and-Text-Embedding/assets/65004114/78a1b834-41cf-4ddc-bae3-26398ce53bb8) 30 | 31 | ## Getting Started 32 | 33 | ### Prerequisites 34 | 35 | - Python 3.8+ 36 | - pip (Python package manager) 37 | 38 | ### Installation 39 | 40 | 1. Clone the repository: 41 | ``` 42 | git clone https://github.com/your-username/LeAgent.git 43 | cd LeAgent 44 | ``` 45 | 46 | 2. Create a virtual environment: 47 | ``` 48 | python -m venv venv 49 | source venv/bin/activate # On Windows use `venv\Scripts\activate` 50 | ``` 51 | 52 | 3. Install the required packages: 53 | ``` 54 | pip install -r requirements.txt 55 | ``` 56 | 57 | ### Usage 58 | 59 | (Add basic usage instructions here once the core functionality is implemented) 60 | 61 | ## Project Structure 62 | 63 | ``` 64 | LeAgent/ 65 | ├── src/ 66 | │ ├── llm/ 67 | │ ├── online_learning/ 68 | │ ├── personalization/ 69 | │ ├── nlu/ 70 | │ ├── task_management/ 71 | │ └── utils/ 72 | ├── data/ 73 | ├── tests/ 74 | ├── docs/ 75 | └── notebooks/ 76 | ``` 77 | 78 | ## Contributing 79 | 80 | We welcome contributions from the community! If you're interested in improving LeAgent, please follow these steps: 81 | 82 | 1. Fork the repository 83 | 2. Create a new branch (`git checkout -b feature/AmazingFeature`) 84 | 3. Make your changes 85 | 4. Commit your changes (`git commit -m 'Add some AmazingFeature'`) 86 | 5. 
Push to the branch (`git push origin feature/AmazingFeature`) 87 | 6. Open a Pull Request 88 | 89 | Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests. 90 | 91 | ## License 92 | 93 | This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details. 94 | 95 | ## Acknowledgments 96 | 97 | - Inspiration from sci-fi AI assistants 98 | - OpenAI for advances in language models 99 | - The open-source community for various tools and libraries used in this project 100 | 101 | --- 102 | 103 | LeAgent: Votre compagnon AI évolutif - Your adaptive AI companion 104 | -------------------------------------------------------------------------------- /TraceTalk/app.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import codecs 3 | import os 4 | import re 5 | import sys 6 | from typing import List 7 | 8 | from fastapi import FastAPI, Request 9 | from fastapi.middleware.cors import CORSMiddleware 10 | from fastapi.responses import StreamingResponse 11 | from main import main as agent 12 | 13 | core_directory = os.path.dirname(os.path.abspath(__file__)) 14 | if core_directory not in sys.path: 15 | sys.path.append(core_directory) 16 | 17 | 18 | app = FastAPI() 19 | 20 | # Add CORS middleware to allow cross-origin requests 21 | app.add_middleware( 22 | CORSMiddleware, 23 | allow_origins=["*"], 24 | allow_credentials=True, 25 | allow_methods=["*"], 26 | allow_headers=["*"], 27 | ) 28 | 29 | sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()) 30 | 31 | 32 | async def process_messages(messages: List[str]): 33 | result = agent(messages=messages) 34 | chunks = re.split(r"(\n+)", result) 35 | for chunk in chunks: 36 | if chunk.strip(): 37 | # Split the chunk into sentences 38 | if len(chunk) > 100: 39 | sentences = re.split(r"([.!?。!?]+)", chunk) 40 | for i in range(0, len(sentences), 2): 41 | sentence = sentences[i] + 
( 42 | sentences[i + 1] if i + 1 < len(sentences) else "" 43 | ) 44 | yield sentence 45 | await asyncio.sleep(0.05) # pause between sentences 46 | else: 47 | yield chunk 48 | 49 | await asyncio.sleep(0.1) # pause between chunks 50 | else: 51 | yield chunk 52 | 53 | 54 | @app.post("/process") 55 | async def handler(request: Request): 56 | print("Received request...") 57 | try: 58 | data = await request.json() 59 | messages = data.get("messages", []) 60 | 61 | messages_str_list = [message["content"] for message in messages] 62 | print(f"Received messages:\n{messages_str_list}") 63 | 64 | return StreamingResponse( 65 | process_messages(messages_str_list), media_type="text/event-stream" 66 | ) 67 | except Exception as e: 68 | print(f"Error occurred: {e}") 69 | return {"error": str(e)} 70 | 71 | 72 | if __name__ == "__main__": 73 | print("Starting the server...") 74 | import uvicorn 75 | 76 | uvicorn.run(app, host="0.0.0.0", port=8101) 77 | -------------------------------------------------------------------------------- /TraceTalk/chatbot_agent.py: -------------------------------------------------------------------------------- 1 | # Import basic libraries. 2 | import os 3 | import re 4 | from collections import deque 5 | from typing import List 6 | 7 | # Import OpenAI API and Langchain libraries. 8 | from openai import OpenAI 9 | from langchain.prompts import PromptTemplate 10 | 11 | # Importing prompts. 12 | from prompts.basic_prompt import basic_prompt 13 | from prompts.combine_prompt import combine_prompt 14 | 15 | # Import Qdrant client (vector database). 16 | from qdrant_client import QdrantClient 17 | from src import get_emmbeddings, get_tokens_number 18 | 19 | 20 | class ChatbotAgent: 21 | """ 22 | A class used to represent a chatbot agent. 
23 | """ 24 | 25 | def __init__( 26 | self, 27 | openai_api_key: str, 28 | qdrant_url: str, 29 | qdrant_api_key: str, 30 | messages: List[str], 31 | ): 32 | """ 33 | Initializes an instance of the ChatbotAgent class. 34 | 35 | Args: 36 | openai_api_key (str): The API key provided by OpenAI to authenticate requests. 37 | qdrant_url (str): The URL for the Qdrant service to connect with. 38 | qdrant_api_key (str): The API key for the Qdrant service to authenticate requests. 39 | messages (List[str]): A list of messages that the chatbot agent will process. 40 | 41 | """ 42 | # Set OpenAI API key and initialize client. 43 | self._openai_api_key = openai_api_key 44 | os.environ["OPENAI_API_KEY"] = self._openai_api_key 45 | self.client = OpenAI(api_key=self._openai_api_key) 46 | 47 | # Initialize Qdrant client 48 | self.qdrant_client = QdrantClient( 49 | url=qdrant_url, 50 | prefer_grpc=False, 51 | api_key=qdrant_api_key, 52 | ) 53 | self.qdrant_client.get_collections() 54 | 55 | # Initialize the chat history. 56 | self.count = 1 # Count the number of times the chatbot has been called. 57 | self._max_chat_history_length = 20 58 | self.chat_history = deque(maxlen=self._max_chat_history_length) 59 | init_prompt = "I am TraceTalk, a cutting-edge chatbot designed to encapsulate the power of advanced AI technology, with a special focus on data science, machine learning, and deep learning. 
(https://github.com/Appointat/Chat-with-Document-s-using-ChatGPT-API-and-Text-Embedding)\n" 60 | self.chat_history.append({"role": "chatbot", "content": init_prompt}) 61 | for i in range(len(messages)): 62 | if i % 2 == 0: 63 | self.chat_history.append({"role": "user", "content": messages[i]}) 64 | else: 65 | self.chat_history.append({"role": "chatbot", "content": messages[i]}) 66 | 67 | self.query = "" 68 | self.answer = "" 69 | 70 | def search_context_qdrant( 71 | self, query, collection_name, vector_name="content", top_k=10 72 | ): 73 | """ 74 | Search the Qdrant database for the top k most similar vectors to the query. 75 | 76 | Args: 77 | query (str): The query to search for. 78 | collection_name (str): The name of the collection to search in. 79 | vector_name (str): The name of the vector to search for. 80 | top_k (int): The number of results to return. 81 | 82 | Returns: 83 | query_results (list): A list of the top k most similar vectors to the query. 84 | """ 85 | # Create embedding vector from user query. 86 | embedded_query = get_emmbeddings(query) 87 | 88 | query_results = self.qdrant_client.search( 89 | collection_name=collection_name, 90 | query_vector=(vector_name, embedded_query), 91 | limit=top_k, 92 | ) 93 | 94 | return query_results 95 | 96 | def prompt_chatbot(self, context, chat_history, resource, query): 97 | """ 98 | Prompt the chatbot to generate a response. 99 | 100 | Args: 101 | context (str): The context for the query. 102 | chat_history (str): The chat history. 103 | resource (str): The resource string. 104 | query (str): The user's query. 105 | 106 | Returns: 107 | str: The chatbot's response to the user's query. 108 | """ 109 | prompt = f"Context: {context}\nChat History: {chat_history}\nResources: {resource}\nQuestion: {query}\nPlease provide an answer based on the given context and resources." 
110 | 111 | response = self.client.chat.completions.create( 112 | model="gpt-4o-mini", 113 | messages=[{"role": "user", "content": prompt}], 114 | temperature=0.8, 115 | ) 116 | return response.choices[0].message.content 117 | 118 | def prompt_combine_chain(self, query, answer_list, link_list_list): 119 | """ 120 | Combine the per-source answers into a single response to the user's query. 121 | 122 | Args: 123 | query (str): The user's query. 124 | answer_list (list): A list of answers to the user's query. 125 | link_list_list (list): A list of link lists, one list of links per answer. 126 | Returns: 127 | chatbot_answer (str): The chatbot's response to the user's query. 128 | """ 129 | MAX_TOKENS_CHAT_HISTORY = 1000 130 | n = len(answer_list) 131 | 132 | if n == 0: 133 | return "I'm sorry, there is not enough information to provide a meaningful answer to your question. Can you please provide more context or a specific question?" 134 | else: 135 | chat_history = self.convert_chat_history_to_string() 136 | if get_tokens_number(chat_history) >= MAX_TOKENS_CHAT_HISTORY: 137 | print( 138 | f"Warning: chat history is too long, tokens: {get_tokens_number(chat_history)}." 
139 | ) 140 | chat_history = self.convert_chat_history_to_string( 141 | user_only=True, remove_resource=True 142 | ) 143 | 144 | prompt = combine_prompt( 145 | chat_history=chat_history, 146 | query=query, 147 | answer_list=answer_list, 148 | link_list_list=link_list_list, 149 | MAX_TOKENS=4096 - 1000, 150 | ) 151 | prompt = self.convert_links_in_text(prompt) 152 | 153 | if get_tokens_number(prompt) > 4096 - 1000: 154 | return "The prompt is too long: {} tokens.".format( 155 | get_tokens_number(prompt) 156 | ) 157 | else: 158 | print( 159 | "Prompt length: {} tokens.".format(get_tokens_number(prompt)) 160 | ) 161 | 162 | 163 | # Use the OpenAI API to generate a response based on the prompt 164 | response = self.client.chat.completions.create( 165 | model="gpt-4o-mini", 166 | messages=[{"role": "user", "content": prompt}], 167 | temperature=0.7, 168 | max_tokens=1000, 169 | n=1, 170 | stop=None, 171 | ) 172 | 173 | # Extract and return the generated response 174 | print(f"Chatbot response:\n{response.choices[0].message.content.strip()}") 175 | return response.choices[0].message.content.strip() 176 | 177 | def update_chat_history(self, query, answer): 178 | """ 179 | Update the chat history with the user's query and the chatbot's response. 180 | 181 | Args: 182 | query (str): The user's query. 183 | answer (str): The chatbot's response to the user's query. 184 | """ 185 | # The deque was created with maxlen, so it discards the oldest 186 | # entries automatically; no manual eviction is needed before appending. 187 | 188 | self.chat_history.append({"role": "user", "content": query}) 189 | self.chat_history.append({"role": "chatbot", "content": answer}) 190 | self.count += 1 191 | 192 | def convert_chat_history_to_string( 193 | self, 194 | new_query="", 195 | new_answser="", 196 | user_only=False, 197 | chatbot_only=False, 198 | remove_resource=False, 199 | ): 200 | """ 201 | Convert the chat history to a string. 202 | 203 | Args: 204 | new_query (str): The user's query. 
205 | new_answser (str): The chatbot's response to the user's query. 206 | user_only (bool): If True, only return the user's queries. 207 | chatbot_only (bool): If True, only return the chatbot's responses. 208 | remove_resource (bool): If True, remove the resource from the chatbot's responses. 209 | 210 | Returns: 211 | chat_string (str): The chat history as a string. 212 | """ 213 | if user_only and chatbot_only: 214 | raise ValueError( 215 | "user_only and chatbot_only cannot be True at the same time." 216 | ) 217 | chat_string = "" 218 | if len(self.chat_history) > 0: 219 | for message in self.chat_history: 220 | if message["role"] == "chatbot" and not user_only: 221 | # Drop the trailing section beginning with "RESOURCE:" or "REFERENCE" in message['content'], since it is not needed here. 222 | if remove_resource: 223 | chat_string += f"[{message['role']}]: {message['content'].split('RESOURCE:', 1)[0].split('REFERENCE', 1)[0]} \n" 224 | else: 225 | chat_string += f"[{message['role']}]: {message['content']} \n" 226 | elif message["role"] == "user" and not chatbot_only: 227 | chat_string += f"[{message['role']}]: {message['content']} \n" 228 | if new_query: 229 | chat_string += f"[user]: {new_query} \n" 230 | if new_answser: 231 | chat_string += f"[chatbot]: {new_answser} \n" 232 | 233 | if get_tokens_number(chat_string) >= 3000: # Max token length for GPT-3 is 4096. 234 | print( 235 | f"Warning: chat history is too long: {get_tokens_number(chat_string)} tokens; consider truncating it." 236 | ) 237 | return chat_string 238 | 239 | def convert_links_in_text(self, text): 240 | """ 241 | Convert links in the text to the correct format. 242 | 243 | Args: 244 | text (str): The text to convert. 245 | 246 | Returns: 247 | text (str): The text with the links converted. 
248 | """ 249 | links = re.findall( 250 | r"https://open-academy.github.io/machine-learning/[^\s]*", text 251 | ) 252 | for link in links: 253 | converted_link = ( 254 | link.replace("_sources/", "") 255 | .replace(".md", ".html") 256 | .replace("open-machine-learning-jupyter-book/", "") 257 | ) 258 | text = text.replace(link, converted_link) 259 | return text 260 | 261 | def markdown_to_python(self, markdown_text): 262 | """ 263 | Convert Markdown text to Python string. 264 | 265 | Args: 266 | markdown_text (str): The Markdown text to convert. 267 | 268 | Returns: 269 | python_string (str): The Python string. 270 | """ 271 | # Escape quotes and backslashes in the input. 272 | escaped_input = markdown_text.replace("\\", "\\\\").replace("'", "\\'") 273 | 274 | # Generate the Python string 275 | python_string = f"'{escaped_input}'" 276 | 277 | return python_string 278 | 279 | def chatbot_pipeline( 280 | self, query_pipeline, choose_GPTModel=False, updateChatHistory=False 281 | ): 282 | """ 283 | Chat with the chatbot using the pipeline. 284 | 285 | Args: 286 | query_pipeline (str): The user's query. 287 | choose_GPTModel (bool): If True, call the legacy completions model directly. 288 | updateChatHistory (bool): If True, update the chat history. 289 | 290 | Returns: 291 | result_pipeline (str): The chatbot's response to the user's query. 292 | """ 293 | # Choose which GPT model. 
294 | if choose_GPTModel: 295 | response = self.client.completions.create( 296 | model="gpt-3.5-turbo-instruct", # Replaces the retired "davinci" completions model. 297 | prompt=query_pipeline, 298 | temperature=0.7, 299 | max_tokens=150, 300 | n=1, 301 | stop=None, 302 | ) 303 | result_pipeline = response.choices[0].text.strip() 304 | else: 305 | result_pipeline = self.chatbot_qa( 306 | {"question": query_pipeline, "chat_history": self.chat_history} 307 | ) 308 | 309 | if updateChatHistory: 310 | self.query = query_pipeline 311 | self.result = result_pipeline 312 | self.chat_history.append( 313 | (self.query, self.result["answer"]) # chat_history is a deque; 'deque + list' would raise TypeError. 314 | ) 315 | return self.result 316 | else: 317 | return result_pipeline 318 | 319 | def prompt_engineering_for_non_library_content(self, query): 320 | """ 321 | Prompt the chatbot for non-library content. 322 | 323 | Args: 324 | query (str): The user's query. 325 | 326 | Returns: 327 | result_prompted (str): The chatbot's response to the user's query. 328 | """ 329 | # Please do not modify the value of query. 330 | query_prompted = query + " Please provide a verbose answer." 331 | 332 | result_prompted = self.chatbot_pipeline(query_prompted) 333 | # result_not_know_answer = [] # TBD 334 | # result_non_library_query = [] # TBD 335 | # result_official_keywords = [] # TBD 336 | # result_cheating = [] # TBD 337 | return result_prompted 338 | 339 | def chatbot_qa(self, input_data): 340 | """ 341 | Process the input data and generate a response using the chatbot. 342 | 343 | Args: 344 | input_data (dict): A dictionary containing the question and chat history. 345 | 346 | Returns: 347 | dict: A dictionary containing the chatbot's response. 
348 | """ 349 | question = input_data["question"] 350 | chat_history = input_data["chat_history"] 351 | 352 | # Prepare the messages for the ChatCompletion API 353 | messages = [ 354 | {"role": "system", "content": "You are a helpful AI assistant."}, 355 | ] 356 | 357 | # Add chat history to the messages 358 | for entry in chat_history: 359 | if isinstance(entry, dict): 360 | messages.append({"role": "assistant" if entry["role"] == "chatbot" else entry["role"], "content": entry["content"]}) # Map the internal "chatbot" role to "assistant", the role name the API accepts. 361 | elif isinstance(entry, tuple): 362 | messages.append({"role": "user", "content": entry[0]}) 363 | messages.append({"role": "assistant", "content": entry[1]}) 364 | 365 | # Add the current question 366 | messages.append({"role": "user", "content": question}) 367 | 368 | # Call the OpenAI API 369 | response = self.client.chat.completions.create( 370 | model="gpt-3.5-turbo", 371 | messages=messages, 372 | temperature=0.7, 373 | max_tokens=150, 374 | n=1, 375 | stop=None, 376 | ) 377 | 378 | # Extract the response 379 | answer = response.choices[0].message.content.strip() 380 | 381 | return {"answer": answer} 382 | -------------------------------------------------------------------------------- /TraceTalk/handle_multiprocessing.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | def process_request(params): 4 | """ 5 | Process a request to the chatbot. 6 | 7 | Args: 8 | params (tuple): A tuple of (chatbot_agent, context, chat_history, query, link, score). 9 | 10 | Returns: 11 | tuple: A tuple of the answer and the link. 12 | """ 13 | chatbot_agent, context, chat_history, query, link, score = params 14 | convert_link = link.replace("_sources", "").replace(".md", ".html") if score > 0.5 else "" 15 | reject_context = "Sorry, the question is not associated with the context. The chatbot should refuse to answer." 
16 | 17 | url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+|www.(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+|[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}' 18 | resource_list = re.findall(url_pattern, context) 19 | resource_str = "\n".join([f"[{i+1}] {link}" for i, link in enumerate(resource_list)]) 20 | resource_str = convert_link + "\n" + resource_str 21 | 22 | try: 23 | if score > 0.5: 24 | answer = chatbot_agent.prompt_chatbot(context, chat_history, resource_str, query) 25 | else: 26 | answer = reject_context 27 | except Exception as e: 28 | print(f"An error occurred: {e}") 29 | answer = "I'm sorry, but I encountered an error while processing your request." 30 | finally: 31 | # Release resources here, for example: 32 | # chatbot_agent.close() 33 | pass 34 | 35 | return answer, convert_link -------------------------------------------------------------------------------- /TraceTalk/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | from concurrent.futures import ThreadPoolExecutor 3 | 4 | from chatbot_agent import ChatbotAgent 5 | from handle_multiprocessing import process_request 6 | 7 | def main(message="", messages=[""]): 8 | # Initialize the ChatbotAgent. 
9 | openai_api_key = os.getenv("OPENAI_API_KEY") 10 | if not openai_api_key: 11 | raise ValueError("OPENAI_API_KEY environment variable not set.") 12 | qdrant_url = os.getenv("QDRANT_URL") 13 | if not qdrant_url: 14 | raise ValueError("QDRANT_URL environment variable not set.") 15 | qdrant_api_key = os.getenv("QDRANT_API_KEY") 16 | if not qdrant_api_key: 17 | raise ValueError("QDRANT_API_KEY environment variable not set.") 18 | 19 | messages = list(messages) # Copy so the mutable default argument is never mutated across calls. 20 | if message: 21 | messages.append(message) 22 | global chatbot_agent 23 | chatbot_agent = ChatbotAgent( 24 | openai_api_key=openai_api_key, 25 | qdrant_url=qdrant_url, 26 | qdrant_api_key=qdrant_api_key, 27 | messages=messages, 28 | ) 29 | 30 | # Start the conversation. 31 | query = messages[-1] 32 | answer_list = [] 33 | link_list = [] 34 | # Query the collection using the content vector. 35 | query_results = chatbot_agent.search_context_qdrant( 36 | chatbot_agent.convert_chat_history_to_string(new_query=query, user_only=True), 37 | "Articles", 38 | top_k=4, 39 | ) 40 | 41 | article_ids_plus_one = [min(article.id + 1, 932 - 1) for article in query_results] 42 | article_ids_minus_one = [max(article.id - 1, 0) for article in query_results] 43 | retrieved_articles_plus_one = chatbot_agent.qdrant_client.retrieve( 44 | collection_name="Articles", ids=article_ids_plus_one 45 | ) 46 | retrieved_articles_minus_one = chatbot_agent.qdrant_client.retrieve( 47 | collection_name="Articles", ids=article_ids_minus_one 48 | ) 49 | requests = [ 50 | ( 51 | chatbot_agent, 52 | # Concatenate the existing article content with the content retrieved using the article's id. 53 | # 'retrieve' returns a list of points; the code assumes they come back in the order of the requested ids, so it is indexed with 'i'. 
54 | (retrieved_articles_minus_one[i].payload["content"] + "\n" + 55 | article.payload["content"] + "\n" + 56 | retrieved_articles_plus_one[i].payload["content"]), 57 | chatbot_agent.convert_chat_history_to_string( 58 | user_only=True, remove_resource=True 59 | ), 60 | query, 61 | article.payload["link"], 62 | article.score, 63 | ) 64 | for i, article in enumerate(query_results) 65 | ] 66 | 67 | # Use a Pool to manage the processes. 68 | with ThreadPoolExecutor(max_workers=len(query_results)) as executor: 69 | results = list(executor.map(process_request, requests)) 70 | 71 | # Results is a list of tuples of the form (answer, link). 72 | answer_list, link_list = zip(*results) 73 | 74 | # Initialize link_list_list with each link from link_list as a separate list. 75 | link_list_list = [[link] for link in link_list] 76 | # For each answer, perform the query and add the result to the corresponding list in link_list_list. 77 | for i, answer in enumerate(answer_list): 78 | secondary_query_results_temp = chatbot_agent.search_context_qdrant( 79 | answer, "Articles", top_k=2 80 | ) 81 | link_list_list[i].extend( 82 | article.payload["link"].replace("_sources", "").replace(".md", ".html") 83 | for article in secondary_query_results_temp 84 | ) 85 | 86 | combine_answer = chatbot_agent.prompt_combine_chain( 87 | query=query, answer_list=answer_list, link_list_list=link_list_list 88 | ) 89 | 90 | return combine_answer 91 | 92 | 93 | if __name__ == "__main__": 94 | main() 95 | -------------------------------------------------------------------------------- /TraceTalk/package-lock.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "TraceTalk", 3 | "lockfileVersion": 3, 4 | "requires": true, 5 | "packages": { 6 | "": { 7 | "dependencies": { 8 | "axios": "^1.4.0" 9 | } 10 | }, 11 | "node_modules/asynckit": { 12 | "version": "0.4.0", 13 | "resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz", 14 | "integrity": 
"sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==" 15 | }, 16 | "node_modules/axios": { 17 | "version": "1.4.0", 18 | "resolved": "https://registry.npmjs.org/axios/-/axios-1.4.0.tgz", 19 | "integrity": "sha512-S4XCWMEmzvo64T9GfvQDOXgYRDJ/wsSZc7Jvdgx5u1sd0JwsuPLqb3SYmusag+edF6ziyMensPVqLTSc1PiSEA==", 20 | "dependencies": { 21 | "follow-redirects": "^1.15.0", 22 | "form-data": "^4.0.0", 23 | "proxy-from-env": "^1.1.0" 24 | } 25 | }, 26 | "node_modules/combined-stream": { 27 | "version": "1.0.8", 28 | "resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz", 29 | "integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==", 30 | "dependencies": { 31 | "delayed-stream": "~1.0.0" 32 | }, 33 | "engines": { 34 | "node": ">= 0.8" 35 | } 36 | }, 37 | "node_modules/delayed-stream": { 38 | "version": "1.0.0", 39 | "resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz", 40 | "integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==", 41 | "engines": { 42 | "node": ">=0.4.0" 43 | } 44 | }, 45 | "node_modules/follow-redirects": { 46 | "version": "1.15.2", 47 | "resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.2.tgz", 48 | "integrity": "sha512-VQLG33o04KaQ8uYi2tVNbdrWp1QWxNNea+nmIB4EVM28v0hmP17z7aG1+wAkNzVq4KeXTq3221ye5qTJP91JwA==", 49 | "funding": [ 50 | { 51 | "type": "individual", 52 | "url": "https://github.com/sponsors/RubenVerborgh" 53 | } 54 | ], 55 | "engines": { 56 | "node": ">=4.0" 57 | }, 58 | "peerDependenciesMeta": { 59 | "debug": { 60 | "optional": true 61 | } 62 | } 63 | }, 64 | "node_modules/form-data": { 65 | "version": "4.0.0", 66 | "resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz", 67 | "integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==", 68 | 
"dependencies": { 69 | "asynckit": "^0.4.0", 70 | "combined-stream": "^1.0.8", 71 | "mime-types": "^2.1.12" 72 | }, 73 | "engines": { 74 | "node": ">= 6" 75 | } 76 | }, 77 | "node_modules/mime-db": { 78 | "version": "1.52.0", 79 | "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz", 80 | "integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==", 81 | "engines": { 82 | "node": ">= 0.6" 83 | } 84 | }, 85 | "node_modules/mime-types": { 86 | "version": "2.1.35", 87 | "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz", 88 | "integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==", 89 | "dependencies": { 90 | "mime-db": "1.52.0" 91 | }, 92 | "engines": { 93 | "node": ">= 0.6" 94 | } 95 | }, 96 | "node_modules/proxy-from-env": { 97 | "version": "1.1.0", 98 | "resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz", 99 | "integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==" 100 | } 101 | } 102 | } 103 | -------------------------------------------------------------------------------- /TraceTalk/prep_data.py: -------------------------------------------------------------------------------- 1 | # Import basic libraries. 2 | import ast 3 | import os 4 | import re 5 | import warnings 6 | 7 | import pandas as pd 8 | import requests 9 | 10 | # Import Qdrant libraries. 11 | from qdrant_client import QdrantClient, models 12 | from src import get_emmbeddings, get_tokens_number 13 | 14 | 15 | def prep_book_data( 16 | csv_file_path=r"vector-db-persist-directory/book data/book data.csv", 17 | ): 18 | """ 19 | This function prepares the book data for the vector database. 20 | 21 | Args: 22 | csv_file_path (str): The path to the CSV file containing the book data. 23 | 24 | Returns: 25 | book_data (list): A list of dictionaries containing the book data. 
26 | """ 27 | book_data = [] # Create an empty list to store book data. 28 | doc_id = 0 # Set an initial value for the ID counter (avoid shadowing the built-in `id`). 29 | 30 | input_directory = ( 31 | r"vector-db-persist-directory/resources" # Set the default directory. 32 | ) 33 | 34 | for file in os.listdir(input_directory): 35 | if file.endswith(".txt"): 36 | with open(os.path.join(input_directory, file), "r") as f: 37 | txt_content = f.read() 38 | # Extract all Markdown file links from the text file. 39 | md_links = re.findall(r"'(https://[\w\d\-_/.]+\.md)',", txt_content) 40 | 41 | for link in md_links: 42 | md_file = link.rsplit("/", 1)[-1] 43 | md_title = md_file[:-3] # Remove the .md suffix. 44 | 45 | # Get the contents of the .md file. 46 | converted_link = ( 47 | link.replace("github.com/open-academy", "ocademy-ai.github.io") 48 | .replace("tree/main", "_sources") 49 | .replace("open-machine-learning-jupyter-book/", "") 50 | ) 51 | md_content_request = requests.get(converted_link) 52 | md_content = ( 53 | md_content_request.text 54 | if md_content_request.status_code == 200 55 | else "" 56 | ) 57 | print(f"Processing {md_title}: {link}...") 58 | 59 | md_content_split = split_text_into_chunks( 60 | md_content, chunk_max_tokens=300 61 | ) # Split the text into chunks. 62 | for text in md_content_split: 63 | if not text: 64 | continue 65 | doc_id += 1 66 | 67 | # Add the book data to the list. 68 | book_data.append( 69 | { 70 | "id": doc_id, 71 | "title": md_title, 72 | "title_vector": get_emmbeddings(md_title), 73 | "content": text, # Further optimization can be done by splitting files to reduce text volume. 74 | "content_vector": get_emmbeddings( 75 | text 76 | ), # Further optimization can be done by splitting files to reduce text volume. 77 | "link": converted_link, 78 | } 79 | ) 80 | 81 | # Convert the book_data list into a pandas DataFrame. 
82 | book_data_df = pd.DataFrame(book_data) 83 | print("Shape of the book data DataFrame:", book_data_df.shape) 84 | book_data_df.to_csv(csv_file_path, index=False) 85 | 86 | 87 | def update_collection_to_database( 88 | csv_file_path=r"vector-db-persist-directory/book data/book data.csv", 89 | ): 90 | """ 91 | This function updates the Qdrant database collection with the book data. 92 | 93 | Args: 94 | csv_file_path (str, optional): The path to the CSV file containing the book data. 95 | """ 96 | # Load the book data DataFrame. 97 | book_data_df = pd.read_csv(csv_file_path) 98 | 99 | def convert_string_to_list(s): 100 | return ast.literal_eval(s) 101 | 102 | book_data_df["title_vector"] = book_data_df["title_vector"].apply( 103 | convert_string_to_list 104 | ) 105 | book_data_df["content_vector"] = book_data_df["content_vector"].apply( 106 | convert_string_to_list 107 | ) 108 | 109 | # Initialize client. 110 | qdrant_url = os.getenv("QDRANT_URL") 111 | if not qdrant_url: 112 | raise ValueError("QDRANT_URL environment variable not set.") 113 | print("QDRANT_URL:", qdrant_url) 114 | qdrant_api_key = os.getenv("QDRANT_API_KEY") 115 | if not qdrant_api_key: 116 | raise ValueError("QDRANT_API_KEY environment variable not set.") 117 | print("QDRANT_API_KEY is set.")  # Never print the secret key itself. 118 | 119 | client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key) 120 | 121 | # Create a new collection of the Qdrant database. 122 | vector_size = 1536 123 | client.recreate_collection( 124 | collection_name="Articles", 125 | vectors_config={ 126 | "title": models.VectorParams( 127 | size=vector_size, distance=models.Distance.COSINE 128 | ), 129 | "content": models.VectorParams( 130 | size=vector_size, distance=models.Distance.COSINE 131 | ), 132 | }, 133 | ) 134 | 135 | # Upsert the data into the collection of the Qdrant database. 136 | batch_size = 50 # Adjust this value to fit within Qdrant's size limits. 137 | # Divide data into batches. 
138 | batches = [ 139 | book_data_df[i : i + batch_size] 140 | for i in range(0, book_data_df.shape[0], batch_size) 141 | ] 142 | 143 | for batch in batches: 144 | points = [] 145 | for _, row in batch.iterrows(): 146 | point = models.PointStruct( 147 | id=row["id"], 148 | vector={ 149 | "title": row["title_vector"], 150 | "content": row["content_vector"], 151 | }, 152 | payload=row.to_dict(), 153 | ) 154 | points.append(point) 155 | print(f"Upserting point with id: {row['id']}") 156 | 157 | client.upsert(collection_name="Articles", points=points) 158 | 159 | 160 | def split_text_into_chunks( 161 | text, delimiter="\n# ", chunk_max_tokens=600, MAX_TOKENS=4096 162 | ): 163 | begin_pattern = r"---.*?---" 164 | text = re.sub(begin_pattern, "", text, flags=re.DOTALL) 165 | text = "\n" + text 166 | 167 | # Remove code cells from text. 168 | code_pattern = r"(```{code-cell}.*?```)" 169 | code_cells = re.findall(code_pattern, text, flags=re.DOTALL) 170 | text = re.sub(code_pattern, "TEMPLATE_CODE_CELL\n", text, flags=re.DOTALL) 171 | 172 | chunks = re.split( 173 | "((?:^|\n)(?={}(?!#)))".format(delimiter), text, flags=re.MULTILINE 174 | ) 175 | chunks = [chunk for chunk in chunks if chunk.strip()] 176 | 177 | final_chunks = [] 178 | for chunk in chunks: 179 | # Split the chunk into sentences. 180 | sentences = re.split(r"([.?!])", chunk) 181 | current_n_sentences = [] 182 | for sentence in sentences: 183 | current_n_sentences.append(sentence) 184 | 185 | # When the number of tokens reaches chunk_max_tokens, start a new chunk. 186 | if get_tokens_number("".join(current_n_sentences)) >= chunk_max_tokens: 187 | final_chunks.append("".join(current_n_sentences)) 188 | current_n_sentences = [] 189 | # Add the last chunk. 
190 | if current_n_sentences: 191 | final_chunks.append("".join(current_n_sentences)) 192 | 193 | for i, chunk in enumerate(final_chunks): 194 | try: 195 | while "TEMPLATE_CODE_CELL" in chunk and code_cells: 196 | code_cell = code_cells.pop(0) 197 | 198 | # Split the code cell into lines. 199 | code_cell_lines = code_cell.split("\n") 200 | 201 | # If the code cell can be inserted without exceeding the chunk size, do it. 202 | if ( 203 | get_tokens_number(chunk.replace("TEMPLATE_CODE_CELL", code_cell, 1)) 204 | <= chunk_max_tokens * 2 205 | ): 206 | chunk = chunk.replace("TEMPLATE_CODE_CELL", code_cell, 1) 207 | # If not, insert as many lines as possible. 208 | else: 209 | inserted_lines = []  # Accumulate the lines that still fit (the original clobbered code_cell_lines here, iterating an empty list). 210 | for line in code_cell_lines: 211 | # If the next line can be inserted without exceeding the chunk size, do it. 212 | if ( 213 | get_tokens_number( 214 | chunk.replace( 215 | "TEMPLATE_CODE_CELL", "\n".join(inserted_lines + [line]), 1 216 | ) 217 | ) 218 | <= chunk_max_tokens * 2 219 | ): 220 | inserted_lines.append(line) 221 | # If not, drop the remaining lines and stop. 222 | else: 223 | break 224 | chunk = chunk.replace( 225 | "TEMPLATE_CODE_CELL", "\n".join(inserted_lines), 1 226 | ) 227 | 228 | final_chunks[i] = chunk 229 | print(f"Tokens number of chunk {i}: {get_tokens_number(chunk)}") 230 | except IndexError: 231 | warnings.warn( 232 | "Code cells mismatch. The number of 'TEMPLATE_CODE_CELL' placeholders and actual code cells do not match." 233 | ) 234 | 235 | return final_chunks 236 | -------------------------------------------------------------------------------- /TraceTalk/prompts/basic_prompt.py: -------------------------------------------------------------------------------- 1 | from langchain.prompts import PromptTemplate 2 | 3 | 4 | # Prompt the chatbot. 
5 | def basic_prompt(): 6 | template = """ 7 | In your answer you should add a part called RESOURCE, which extracts the corresponding links from the CONTEXT and lists them in Markdown citation format. 8 | It is strictly PROHIBITED to create or fabricate the links within RESOURCE; if no links are found, please say sorry. The RESOURCE should ONLY consist of LINKS that are directly drawn from the CONTEXT. 9 | If the answer to the QUESTION is not within your knowledge scope, admit it instead of concocting an answer. 10 | In the event that the QUESTION does not correlate with the CONTEXT, it is acceptable to respond with an apology indicating that more information is required for an accurate answer, or you may respectfully decline to answer. 11 | 12 | ===== CONTEXT ===== 13 | {{context}} 14 | 15 | ===== CHAT HISTORY ===== 16 | {{chat_history}} 17 | 18 | ===== RESOURCE ===== 19 | {{resource}} 20 | 21 | ========= 22 | ANSWER THE QUESTION "{{query}}" AND GIVE A FINAL, VERBOSE ANSWER; the language used for the answer must be CONSISTENT with the QUESTION: 23 | """ 24 | prompt = PromptTemplate( 25 | template=template, 26 | input_variables=["context", "chat_history", "resource", "query"], 27 | template_format="jinja2", 28 | validate_template=False, 29 | ) # Parameterize the prompt template. 30 | return prompt 31 | -------------------------------------------------------------------------------- /TraceTalk/prompts/combine_prompt.py: -------------------------------------------------------------------------------- 1 | import re 2 | from langchain.prompts import PromptTemplate 3 | from jinja2 import Template 4 | from src import get_tokens_number 5 | 6 | 7 | # Combine prompt. 8 | def combine_prompt(chat_history, query, answer_list, link_list_list, MAX_TOKENS=3000): 9 | n = len(answer_list) 10 | 11 | template = f""" 12 | ===== RULES ===== 13 | Now I will provide you with {n} chains; here is the definition of a chain: each chain contains an answer and a link. 
The answers in a chain are the results retrieved from its link. 14 | In theory, each chain should produce a paragraph with its links as the resources. This means that you MUST tell me from which references you made the summary. 15 | The smaller the number of the chain, the more important the information contained in the chain. 16 | Your final answer should be verbose. 17 | However, if the meaning of an answer in a certain chain is similar to 'I am not sure about your question' or 'I refuse to answer such a question', that answer chain is deprecated, and you should actively ignore the information in it. 18 | 19 | You are now asked to answer and integrate these {n} chains (integration means avoiding repetition, writing logically, writing smoothly, and giving a verbose answer), and answer in 2-4 paragraphs as appropriate. 20 | The final answer is ALWAYS in Markdown format. 21 | Provide your answer in a CITATION style, where you also list the resources from which you found the information at the end of the text. (An example is provided below.) 22 | In addition, in order to demonstrate the knowledge resources you have referred to, please ALWAYS return a "RESOURCE" part in your answer. 23 | RESOURCE can ONLY be a list of links, and each link represents the knowledge resource of one chain. Each chain has only one RESOURCE part. 24 | The RESOURCE should ONLY consist of LINKS that are directly drawn from the CHAIN. 25 | It is strictly PROHIBITED to create or fabricate the links within RESOURCE; if no links are found, please say sorry. 26 | 27 | ===== EXAMPLE ===== 28 | For example, if you are provided with 2 chains, the template is below: 29 | CHAIN 1: 30 | CONTEXT: 31 | Text of chain 1. ABCDEFGHIJKLMNOPQRSTUVWXYZ 32 | RESOURCE: 33 | https://link1.com 34 | CHAIN 2: 35 | CONTEXT: 36 | Text of chain 2. ABCDEFGHIJKLMNOPQRSTUVWXYZ 37 | RESOURCE: 38 | https://link2.com 39 | 40 | YOUR COMPLETE ANSWER SHOULD LOOK LIKE THIS: 41 | Integrated text of chain 1 [1] and chain 2 [2]. Blablabla. 
42 | REFERENCE: 43 | [1] [title_link1](https://link1.com) 44 | [2] [title_link2](https://link2.com) 45 | 46 | """ 47 | 48 | chat_history_text = """ 49 | ===== CHAT HISTORY ===== 50 | {{chat_history}} 51 | 52 | """ 53 | template += chat_history_text 54 | 55 | init_chain_tmp = f"Now I provide you with {n} chains:" 56 | template += init_chain_tmp 57 | i = -1  # Guard in case answer_list is empty, so the link loop below stays defined. 58 | for i in range(n): 59 | link_list = "\n".join(link_list_list[i]) 60 | template_tmp = f""" 61 | ===== CHAIN {i+1} ===== 62 | CONTEXT: 63 | {answer_list[i]} 64 | RESOURCE: 65 | {link_list} 66 | """ 67 | if get_tokens_number(template + template_tmp) > MAX_TOKENS: 68 | break 69 | template += template_tmp 70 | # After breaking from the loop, append only the remaining links. 71 | for j in range(i + 1, n): 72 | link_list = "\n".join(link_list_list[j]) 73 | template_tmp = f"{link_list}\n" 74 | if get_tokens_number(template + template_tmp) > MAX_TOKENS: 75 | break 76 | template += template_tmp 77 | 78 | template += """ 79 | ========= 80 | ANSWER THE QUESTION "{{query}}" AND GIVE A FINAL, VERBOSE ANSWER; the language used for the answer must be CONSISTENT with the QUESTION: 81 | """ 82 | 83 | prompt = Template(template).render(query=query, chat_history=chat_history) 84 | return prompt 85 | -------------------------------------------------------------------------------- /TraceTalk/requirements.txt: -------------------------------------------------------------------------------- 1 | qdrant-client==1.3.1 2 | langchain==0.0.220 3 | Flask==1.1.2 4 | flask-cors==3.0.10 5 | openai>=1.0.0 6 | pandas==2.2.2 7 | requests==2.26.0 8 | jinja2==3.0.1 9 | python-dotenv==0.19.1 10 | tiktoken 11 | -------------------------------------------------------------------------------- /TraceTalk/roots.sst: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/roots.sst
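The token-budget truncation that `combine_prompt` performs — append full chains until the budget is exhausted, then fall back to appending bare links — can be sketched in isolation. This is a hedged, standalone illustration: `count_tokens` is a hypothetical whitespace-based stand-in for `get_tokens_number` (which uses tiktoken), and the chain data is invented for the demo.

```python
# Standalone sketch of the token-budget truncation pattern used in
# TraceTalk/prompts/combine_prompt.py. The real code measures tokens with
# tiktoken; here a whitespace counter stands in so the example runs
# without external dependencies.

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())

def build_prompt(chains, max_tokens=20):
    """Append full chains until the budget is hit; afterwards append links only."""
    prompt = ""
    kept = 0
    for answer, link in chains:
        block = f"CONTEXT: {answer}\nRESOURCE: {link}\n"
        if count_tokens(prompt + block) > max_tokens:
            break
        prompt += block
        kept += 1
    # After the budget is exhausted, keep only the remaining links.
    for _, link in chains[kept:]:
        line = f"{link}\n"
        if count_tokens(prompt + line) > max_tokens:
            break
        prompt += line
    return prompt

chains = [
    ("Answer one is fairly long with several words.", "https://link1.example"),
    ("Answer two is also long enough to overflow the budget.", "https://link2.example"),
]
print(build_prompt(chains))
```

With the small budget above, the first chain is kept in full, the second answer is dropped, and only its link survives, mirroring the two-phase loop in `combine_prompt`.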
-------------------------------------------------------------------------------- /TraceTalk/src.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from openai import OpenAI 4 | import tiktoken 5 | 6 | 7 | def get_emmbeddings(text): 8 | """ 9 | Get the embeddings of the text. 10 | 11 | Args: 12 | text (str): The text to get the embeddings of. 13 | 14 | Returns: 15 | embedded_query (list): The embeddings of the text. 16 | """ 17 | client = OpenAI( 18 | api_key=os.getenv("OPENAI_API_KEY"), 19 | ) 20 | embedded_query = client.embeddings.create( 21 | input=text, 22 | model="text-embedding-ada-002", 23 | ).data[0].embedding 24 | 25 | return embedded_query # It is a vector of numbers. 26 | 27 | 28 | def get_tokens_number(text="", encoding_type="cl100k_base", model_name="gpt-4o-mini"): 29 | """ 30 | Get the number of tokens in the text. 31 | 32 | Args: 33 | text (str): The text to get the number of tokens of. 34 | encoding_type (str): The encoding type. 35 | model_name (str): The model name. 36 | 37 | Returns: 38 | tokens_number (int): The number of tokens in the text. 
39 | """ 40 | try: 41 | encoding = tiktoken.encoding_for_model(model_name) 42 | except KeyError: 43 | # Fall back to the explicit encoding if the model name is unknown to tiktoken. 44 | encoding = tiktoken.get_encoding(encoding_type) 45 | return len(encoding.encode(text)) 46 | -------------------------------------------------------------------------------- /TraceTalk/update_collection.py: -------------------------------------------------------------------------------- 1 | from prep_data import update_collection_to_database 2 | 3 | if __name__ == "__main__": 4 | csv_file_path = r"vector-db-persist-directory/book dada/book data.csv" 5 | # prep_book_data(csv_file_path) 6 | update_collection_to_database(csv_file_path) 7 | -------------------------------------------------------------------------------- /TraceTalk/utils/json_tokenizer.py: -------------------------------------------------------------------------------- 1 | class JSONTokenizer: 2 | def __init__(self): 3 | self.stack = []  # Tracks the currently open JSON scopes, e.g. an object or array being built. 4 | self.last_token_type = None 5 | 6 | def is_valid(self, token): 7 | # Check whether the token is legal here and update the internal state. 8 | if self.is_start_of_object(token) and ( 9 | self.last_token_type in [None, "start_array", "comma", "colon"] 10 | ): 11 | self.stack.append("object") 12 | self.last_token_type = "start_object" 13 | return True 14 | if self.is_start_of_array(token) and ( 15 | self.last_token_type in [None, "start_array", "comma", "colon"] 16 | ): 17 | self.stack.append("array") 18 | self.last_token_type = "start_array" 19 | return True 20 | if self.is_end_of_object(token) and self.stack and self.stack[-1] == "object": 21 | self.stack.pop() 22 | self.last_token_type = "end_object" 23 | return True 24 | if self.is_end_of_array(token) and self.stack and self.stack[-1] == "array": 25 | self.stack.pop() 26 | self.last_token_type = "end_array" 27 | return True 28 | if self.is_key_or_value(token) and ( 29 | self.last_token_type in ["start_object", "comma"] 30 | ): 31 | self.last_token_type = "key_or_value" 32 | return True 33 | if self.is_colon(token) and self.last_token_type == "key_or_value": 34 | self.last_token_type = 
"colon" 35 | return True 36 | if self.is_comma(token) and self.last_token_type in [ 37 | "key_or_value", 38 | "end_object", 39 | "end_array", 40 | ]: 41 | self.last_token_type = "comma" 42 | return True 43 | return False 44 | 45 | # NOTE: The helpers below are minimal implementations inferred from the calls above; 46 | # the original file leaves them undefined, so treat these as assumptions about the intended semantics. 47 | def is_start_of_object(self, token): 48 | return token.strip() == "{" 49 | 50 | def is_start_of_array(self, token): 51 | return token.strip() == "[" 52 | 53 | def is_end_of_object(self, token): 54 | return token.strip() == "}" 55 | 56 | def is_end_of_array(self, token): 57 | return token.strip() == "]" 58 | 59 | def is_colon(self, token): 60 | return token.strip() == ":" 61 | 62 | def is_comma(self, token): 63 | return token.strip() == "," 64 | 65 | def is_key_or_value(self, token): 66 | stripped = token.strip() 67 | return bool(stripped) and stripped not in {"{", "}", "[", "]", ":", ","} 68 | 69 | @property 70 | def has_content(self): 71 | # True once at least one token has been accepted. 72 | return self.last_token_type is not None 73 | 74 | def is_complete(self): 75 | # The generated JSON is complete once something was produced and all scopes are closed. 76 | return self.has_content and not self.stack 77 | -------------------------------------------------------------------------------- /TraceTalk/utils/test_tokenizer.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | from typing import Dict, List 5 | 6 | from openai import OpenAI 7 | 8 | from utils.json_tokenizer import JSONTokenizer 9 | 10 | 11 | def softmax(tokens: List[Dict[str, float]]) -> List[Dict[str, float]]: 12 | exp_probs = [{"token": t["token"], "prob": math.exp(t["logprob"])} for t in tokens] 13 | total = sum(t["prob"] for t in exp_probs) 14 | return [{"token": t["token"], "prob": t["prob"] / total} for t in exp_probs] 15 | 16 | 17 | def preprocessor( 18 | tokens: List[Dict[str, float]], json_tokenizer: JSONTokenizer 19 | ) -> List[Dict[str, float]]: 20 | valid_tokens = [t for t in tokens if json_tokenizer.is_valid(t["token"])] 21 | 22 | if not valid_tokens: 23 | if json_tokenizer.stack: 24 | closing_token = "}" if json_tokenizer.stack[-1] == "object" else "]"  # The stack stores scope names, not tokens. 25 | if json_tokenizer.is_valid(closing_token): 26 | valid_tokens = [ 27 | {"token": closing_token, "logprob": -10.0} 28 | ]  # Assign a very small probability. 29 | elif not json_tokenizer.has_content: 30 | valid_tokens = [ 31 | {"token": "{", "logprob": -1.0}, 32 | {"token": "[", "logprob": -1.0}, 33 | ] 34 | 35 | return softmax(valid_tokens) 36 | 37 | 38 | def generate_json_with_llm(prompt: str, max_tokens: int = 100) -> str: 39 | json_tokenizer = JSONTokenizer() 40 | result = "" 41 | client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) 42 | 43 | while len(result) < max_tokens and not json_tokenizer.is_complete(): 44 | print(f"Prompt: {prompt + result}")  # For debugging. 45 | response = client.chat.completions.create( 46 | model="gpt-4-turbo", 47 | messages=[ 48 | { 49 | "role": "system", 50 | "content": 
"You are a JSON generator. Generate valid JSON only.", 51 | }, 52 | {"role": "user", "content": prompt + result}, 53 | ], 54 | max_tokens=1,  # Generate only one token per call. 55 | n=1, 56 | temperature=0.7, 57 | logprobs=True, 58 | top_logprobs=5, 59 | ) 60 | 61 | if response.choices[0].logprobs and response.choices[0].logprobs.content: 62 | token_info = response.choices[0].logprobs.content[0] 63 | raw_tokens = [ 64 | {"token": logprob.token, "logprob": logprob.logprob} 65 | for logprob in token_info.top_logprobs 66 | ] 67 | 68 | processed_tokens = preprocessor(raw_tokens, json_tokenizer) 69 | 70 | if processed_tokens: 71 | next_token = max(processed_tokens, key=lambda x: x["prob"]) 72 | result += next_token["token"] 73 | json_tokenizer.is_valid(next_token["token"]) 74 | print(f"Generated token: {next_token['token']}")  # For debugging. 75 | else: 76 | print("No valid tokens available. Ending generation.") 77 | break 78 | 79 | prompt += f"\nPlease continue writing the answer in json format:\n{result}" 80 | 81 | return result 82 | 83 | 84 | if __name__ == "__main__": 85 | prompt = "Task: Generate a JSON object describing a person with name and age. 
The answer schema is as follows:" 86 | schema = { 87 | "name": "string", 88 | "age": "number", 89 | } 90 | 91 | print("Generating JSON...") 92 | generated_json = generate_json_with_llm( 93 | prompt + "\n" + json.dumps(schema, indent=2) 94 | ) 95 | print("\nGenerated JSON:") 96 | print(generated_json) 97 | 98 | try: 99 | parsed_json = json.loads(generated_json) 100 | print("\nSuccessfully parsed the generated JSON:") 101 | print(json.dumps(parsed_json, indent=2)) 102 | except json.JSONDecodeError as e: 103 | print(f"\nError: Generated JSON is not valid: {e}") 104 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/assets.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/assets.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/assignments.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/amazon-sagemaker-mlops-workshop-warm-up.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/apply-your-skills.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/build-your-own-custom-vis.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/classifying-datasets.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-processing-in-python.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-science-project-using-azure-ml-sdk.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-science-scenarios.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/dive-into-the-beehive.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/explore-a-planetary-computer-dataset.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/lines-scatters-and-bars.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/low-code-no-code-data-science-project-on-azure-ml.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/market-research.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/tell-a-story.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/try-it-in-excel.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/write-a-data-ethics-case-study.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/create-a-regression-model.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/explore-classification-methods.md', 18 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/exploring-visualizations.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/parameter-play.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/regression-with-scikit-learn.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/retrying-some-regression.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/study-the-solvers.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/try-a-different-model.md', 24 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/data-science.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-wild.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/data-science-in-the-cloud.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/introduction.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-azure-ml-sdk-way.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-low-code-no-code-way.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/analyzing.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/communication.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/data-science-lifecycle.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/introduction.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/data-visualization.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/meaningful-visualizations.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-distributions.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-proportions.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-relationships.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualizing-quantities.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/data-science-ethics.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data-science.md', 18 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction-to-statistics-and-probability.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/data-preparation.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/non-relational-data.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/numpy.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/pandas.md', 25 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/relational-databases.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/working-with-data.md', 27 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/data.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-wild.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/data-science-in-the-cloud.md', 3 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/introduction.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-azure-ml-sdk-way.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-low-code-no-code-way.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/analyzing.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/communication.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/data-science-lifecycle.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/introduction.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/data-visualization.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/meaningful-visualizations.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-distributions.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-proportions.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-relationships.md', 15 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualizing-quantities.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/data-science-ethics.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data-science.md', 18 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction-to-statistics-and-probability.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/data-preparation.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/non-relational-data.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/numpy.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/pandas.md', 25 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/relational-databases.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/working-with-data.md', 27 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/deep-learning.txt: 
-------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/autoencoder.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/cnn.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/difussion-model.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dl-overview.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dl-summary.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dqn.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/gan.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/image-classification.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/image-segmentation.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/lstm.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/nlp.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/object-detection.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/rnn.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/time-series.md', 15 | -------------------------------------------------------------------------------- 
/TraceTalk/vector-db-persist-directory/resources/machine-learning-productionization.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/model-deployment.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/model-training-and-evaluation.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/overview.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/problem-framing.md', 6 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/ml-advanced.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/kernel-method.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/model-selection.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/unsupervised-learning-pca-and-clustering.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/unsupervised-learning.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/clustering-models-for-machine-learning.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/introduction-to-clustering.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/k-means-clustering.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/feature-importance.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/random-forest.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/gradient-boosting-example.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/gradient-boosting.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/introduction-to-gradient-boosting.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/xgboost-k-fold-cv-feature-importance.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/xgboost.md', 17 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/ml-fundamentals.txt: -------------------------------------------------------------------------------- 1 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/build-a-web-app-to-use-a-machine-learning-model.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/ml-overview.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/ml-summary.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/getting-started-with-classification.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/introduction-to-classification.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/more-classifiers.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/yet-other-classifiers.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/fairness-and-machine-learning.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/introduction-to-machine-learning.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/introduction.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/techniques-of-machine-learning.md', 13 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/the-history-of-machine-learning-and-ai.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/autoencoders.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/convolutional-neural-networks.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/introduction.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/neural-network-overview.md', 18 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-basics.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-hands-on.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-implementation.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/recurrent-neural-networks.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/gradient-descent.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/loss-function.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/parameter-optimization.md', 25 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/linear-and-polynomial-regression.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/logistic-regression.md', 27 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/managing-data.md', 28 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/regression-models-for-machine-learning.md', 29 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/tools-of-the-trade.md', 30 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/prerequisites.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/prerequisites.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/slides.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/slides.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/supporting-materials.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/supporting-materials.txt 
-------------------------------------------------------------------------------- /TraceTalk/workflows/update_source_link.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 5 | from prep_data import prep_data 6 | 7 | 8 | def main(): 9 | # Update the source links. 10 | update_source_link() 11 | # Update the embeddings of the text. 12 | prep_data() 13 | 14 | 15 | def update_source_link(): 16 | path = "open-machine-learning-jupyter-book" 17 | folders = [] 18 | md_files = [] 19 | for root, dirs, files in os.walk(path): 20 | for file in files: 21 | if file.endswith(".md"): 22 | md_files.append(os.path.join(root, file)) 23 | for dirname in dirs: 24 | if not dirname.startswith("."): 25 | folders.append(dirname) 26 | 27 | # Write the link lists into the resources directory used by the vector DB. 28 | resources_dir = os.path.join("TraceTalk", "vector-db-persist-directory", "resources") 29 | os.makedirs(resources_dir, exist_ok=True) 30 | 31 | for folder in folders: 32 | file_content = f"#### {folder}:\n" 33 | folder_md_files = [] 34 | 35 | for md_file in md_files: 36 | if md_file.startswith(os.path.join(path, folder)): 37 | md_file = md_file.replace("\\", "/") 38 | folder_md_files.append( 39 | f"'https://github.com/open-academy/machine-learning/tree/main/{md_file}',\n" 40 | ) 41 | 42 | file_content += "".join(folder_md_files) 43 | 44 | # Only write a file for folders that actually contain Markdown sources. 45 | if folder_md_files: 46 | with open(os.path.join(resources_dir, f"{folder}.txt"), "w") as f: 47 | f.write(file_content) 48 | 49 | 50 | if __name__ == "__main__": 51 | main() 52 | -------------------------------------------------------------------------------- /docs/api_documentation.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/api_documentation.md -------------------------------------------------------------------------------- /docs/learning_mechanism.md:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/learning_mechanism.md -------------------------------------------------------------------------------- /docs/setup.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/setup.md -------------------------------------------------------------------------------- /frontend/main.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | 3 | from fastapi import FastAPI, Request 4 | from fastapi.responses import StreamingResponse 5 | 6 | app = FastAPI() 7 | 8 | 9 | async def process_request(content: str, user: dict): 10 | yield "LeAgent is processing your request...\n" 11 | await asyncio.sleep(0.5) 12 | yield f"LeAgent is analyzing the message from user {user['name']}: {content}\n" 13 | await asyncio.sleep(0.5) 14 | yield "LeAgent is generating a response...\n" 15 | await asyncio.sleep(0.5) 16 | yield "TASK_DONE" 17 | 18 | 19 | @app.post("/process") 20 | async def process(request: Request): 21 | data = await request.json() 22 | content = data["content"] 23 | user = data["user"] 24 | 25 | async def event_generator(): 26 | async for message in process_request(content, user): 27 | yield f"{message}\n" 28 | 29 | return StreamingResponse(event_generator(), media_type="text/plain") 30 | 31 | 32 | if __name__ == "__main__": 33 | import uvicorn 34 | 35 | uvicorn.run(app, host="0.0.0.0", port=8101) 36 | -------------------------------------------------------------------------------- /notebooks/experiment_adaptive_learning.ipynb: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/notebooks/experiment_adaptive_learning.ipynb -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/requirements.txt -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/setup.py -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/__init__.py -------------------------------------------------------------------------------- /src/config.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/config.py -------------------------------------------------------------------------------- /src/llm/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/__init__.py -------------------------------------------------------------------------------- /src/llm/api_integration.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/api_integration.py -------------------------------------------------------------------------------- /src/llm/model.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/model.py -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/main.py -------------------------------------------------------------------------------- /src/nlu/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/nlu/__init__.py -------------------------------------------------------------------------------- /src/nlu/intent_recognition.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/nlu/intent_recognition.py -------------------------------------------------------------------------------- /src/online_learning/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/__init__.py -------------------------------------------------------------------------------- /src/online_learning/adaptive_model.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/adaptive_model.py -------------------------------------------------------------------------------- /src/online_learning/memory_manager.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/memory_manager.py -------------------------------------------------------------------------------- /src/personalization/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/__init__.py -------------------------------------------------------------------------------- /src/personalization/preference_learner.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/preference_learner.py -------------------------------------------------------------------------------- /src/personalization/user_profile.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/user_profile.py -------------------------------------------------------------------------------- /src/task_management/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/task_management/__init__.py -------------------------------------------------------------------------------- /src/task_management/task_handler.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/task_management/task_handler.py -------------------------------------------------------------------------------- /src/utils/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/utils/__init__.py -------------------------------------------------------------------------------- /src/utils/helpers.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/utils/helpers.py -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_adaptive_model.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_adaptive_model.py -------------------------------------------------------------------------------- /tests/test_main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_main.py -------------------------------------------------------------------------------- /tests/test_personalization.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_personalization.py --------------------------------------------------------------------------------
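A note on the streaming pattern in `frontend/main.py`: the `/process` endpoint wraps an async generator that yields progress messages and finishes with a `"TASK_DONE"` sentinel, which the consumer uses to know when the stream is complete. The sketch below reproduces that pattern with plain `asyncio` and no running server; the `consume_stream` helper is illustrative and not part of the repository.

```python
import asyncio


async def process_request(content: str, user: dict):
    # Mirrors the generator in frontend/main.py (delays shortened for the demo):
    # yield progress messages, then a sentinel marking the end of the stream.
    yield "LeAgent is processing your request...\n"
    await asyncio.sleep(0.01)
    yield f"LeAgent is analyzing the message from user {user['name']}: {content}\n"
    await asyncio.sleep(0.01)
    yield "LeAgent is generating a response...\n"
    await asyncio.sleep(0.01)
    yield "TASK_DONE"


async def consume_stream(content: str, user: dict) -> list:
    # Client-side loop: collect messages until the sentinel arrives.
    messages = []
    async for message in process_request(content, user):
        if message == "TASK_DONE":
            break
        messages.append(message)
    return messages


if __name__ == "__main__":
    for line in asyncio.run(consume_stream("hello", {"name": "alice"})):
        print(line, end="")
```

Against the running FastAPI app, the same consumer loop would iterate over the chunks of the `StreamingResponse` body instead of a local generator, stopping at the same sentinel.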