├── .gitignore ├── README.md ├── TraceTalk ├── app.py ├── chatbot_agent.py ├── handle_multiprocessing.py ├── main.py ├── package-lock.json ├── prep_data.py ├── prompts │ ├── basic_prompt.py │ └── combine_prompt.py ├── requirements.txt ├── roots.sst ├── src.py ├── update_collection.py ├── utils │ ├── json_tokenizer.py │ └── test_tokenizer.py ├── vector-db-persist-directory │ ├── book data │ │ └── book data.csv │ └── resources │ │ ├── assets.txt │ │ ├── assignments.txt │ │ ├── data-science.txt │ │ ├── data.txt │ │ ├── deep-learning.txt │ │ ├── machine-learning-productionization.txt │ │ ├── ml-advanced.txt │ │ ├── ml-fundamentals.txt │ │ ├── prerequisites.txt │ │ ├── slides.txt │ │ └── supporting-materials.txt └── workflows │ └── update_source_link.py ├── docs ├── api_documentation.md ├── learning_mechanism.md └── setup.md ├── frontend └── main.py ├── notebooks └── experiment_adaptive_learning.ipynb ├── requirements.txt ├── setup.py ├── src ├── __init__.py ├── config.py ├── llm │ ├── __init__.py │ ├── api_integration.py │ └── model.py ├── main.py ├── nlu │ ├── __init__.py │ └── intent_recognition.py ├── online_learning │ ├── __init__.py │ ├── adaptive_model.py │ └── memory_manager.py ├── personalization │ ├── __init__.py │ ├── preference_learner.py │ └── user_profile.py ├── task_management │ ├── __init__.py │ └── task_handler.py └── utils │ ├── __init__.py │ └── helpers.py └── tests ├── __init__.py ├── test_adaptive_model.py ├── test_main.py └── test_personalization.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Virtual Environment 2 | venv/ 3 | env/ 4 | .venv/ 5 | .env/ 6 | 7 | # Python cache files 8 | __pycache__/ 9 | *.py[cod] 10 | *$py.class 11 | 12 | # C extensions 13 | *.so 14 | 15 | # Distribution / packaging 16 | .Python 17 | build/ 18 | develop-eggs/ 19 | dist/ 20 | downloads/ 21 | eggs/ 22 | .eggs/ 23 | lib/ 24 | lib64/ 25 | parts/ 26 | sdist/ 27 | var/ 28 | wheels/ 29 | share/python-wheels/ 30 | 
*.egg-info/ 31 | .installed.cfg 32 | *.egg 33 | 34 | # PyInstaller 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .nox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | *.py,cover 53 | .hypothesis/ 54 | .pytest_cache/ 55 | 56 | # Jupyter Notebook 57 | .ipynb_checkpoints 58 | 59 | # IPython 60 | profile_default/ 61 | ipython_config.py 62 | 63 | # pyenv 64 | .python-version 65 | 66 | # Environments 67 | .env 68 | .venv 69 | env/ 70 | venv/ 71 | ENV/ 72 | env.bak/ 73 | venv.bak/ 74 | 75 | # Spyder project settings 76 | .spyderproject 77 | .spyproject 78 | 79 | # Rope project settings 80 | .ropeproject 81 | 82 | # mkdocs documentation 83 | /site 84 | 85 | # mypy 86 | .mypy_cache/ 87 | .dmypy.json 88 | dmypy.json 89 | 90 | # Pyre type checker 91 | .pyre/ 92 | 93 | # pytype static type analyzer 94 | .pytype/ 95 | 96 | # Operating System Files (macOS .DS_Store, Windows Thumbs.db; gitignore patterns must not carry trailing comments) 97 | .DS_Store 98 | Thumbs.db 99 | 100 | # IDE specific files 101 | .vscode/ 102 | .idea/ 103 | *.swp 104 | *.swo 105 | *~ 106 | 107 | # LeAgent specific ignores (add any project-specific files/directories here) 108 | # data/user_data/ # Uncomment if you want to ignore user data 109 | # model_checkpoints/ # Uncomment if you want to ignore model checkpoints -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LeAgent: Your Adaptive AI Companion 2 | 3 | ### Note: My dear developers and users, this project was established in March 2023. At that time, the concept of RAG had not yet become widespread. However, I've recently come up with new ideas and plan to work on a project involving a learning RAG+agent (the ultimate goal is to realize an AI like Iron Man's Jarvis). Thanks! 
Please give me a few months to restructure this project. Thank you for your support. As a professional developer and algorithm engineer, I promise not to let you down. The vision is as follows: 4 | 5 | ## Overview 6 | 7 | LeAgent is an innovative AI assistant designed to grow and adapt alongside its user, creating a unique and evolving relationship. Inspired by AI assistants in science fiction, such as J.A.R.V.I.S. or Friday from the Iron Man films, LeAgent aims to bridge the gap between static, pre-programmed assistants and truly personalized AI companions. 8 | 9 | The name "LeAgent" combines the French article "Le" with "Agent," symbolizing a sophisticated, adaptive assistant that transcends language and cultural barriers. 10 | 11 | ## Key Features 12 | 13 | - **Adaptive Learning**: Learns and adapts in real-time based on user interactions. 14 | - **Personalization**: Tailors responses, suggestions, and behavior to individual user preferences and habits. 15 | - **Natural Interaction**: Utilizes advanced natural language understanding for more human-like conversations. 16 | - **Task Management**: Provides efficient assistance with various tasks, from scheduling to information retrieval. 17 | - **Continuous Growth**: Designed to continuously improve and expand capabilities over time. 18 | - **Privacy-Focused**: Ensures user data is handled securely and ethically. 
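To make the "Adaptive Learning" and "Personalization" features above concrete, here is a minimal, illustrative sketch of how a per-topic preference learner might work. All names here are hypothetical — the real `src/personalization/preference_learner.py` is still being restructured — and the update rule is simply an exponential moving average of feedback per topic:

```python
# Illustrative sketch only: a minimal incremental preference learner.
# The class name and update rule are hypothetical, not the shipped implementation.
from collections import defaultdict


class PreferenceLearner:
    """Tracks per-topic preference scores with an exponential moving average."""

    def __init__(self, learning_rate: float = 0.3):
        self.learning_rate = learning_rate
        self.scores = defaultdict(float)  # topic -> score, roughly in [-1, 1]

    def update(self, topic: str, feedback: float) -> None:
        """Nudge the stored score toward the latest feedback signal."""
        current = self.scores[topic]
        self.scores[topic] = current + self.learning_rate * (feedback - current)

    def top_topics(self, n: int = 3):
        """Return the n topics the user currently prefers most."""
        return sorted(self.scores, key=self.scores.get, reverse=True)[:n]
```

Each call to `update` moves the stored score toward the newest feedback, so recent interactions weigh more than old ones. This is the simplest form of the incremental, real-time adaptation the feature list describes.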
19 | 20 | ## Technology Stack 21 | 22 | - Large Language Models (LLMs) for natural language processing 23 | - Incremental learning algorithms for real-time adaptation 24 | - Advanced Natural Language Understanding (NLU) for intent recognition 25 | - Efficient data structures for memory management and quick retrieval 26 | 27 | [![What is vector search?](https://user-images.githubusercontent.com/65004114/226753565-e2230d59-5750-4d77-840f-4f777441a4dc.png)](https://www.elastic.co/cn/what-is/vector-search) 28 | 29 | ![Framework of TraceTalk](https://github.com/Appointat/Chat-with-Document-s-using-ChatGPT-API-and-Text-Embedding/assets/65004114/78a1b834-41cf-4ddc-bae3-26398ce53bb8) 30 | 31 | ## Getting Started 32 | 33 | ### Prerequisites 34 | 35 | - Python 3.8+ 36 | - pip (Python package manager) 37 | 38 | ### Installation 39 | 40 | 1. Clone the repository: 41 | ``` 42 | git clone https://github.com/your-username/LeAgent.git 43 | cd LeAgent 44 | ``` 45 | 46 | 2. Create a virtual environment: 47 | ``` 48 | python -m venv venv 49 | source venv/bin/activate # On Windows use `venv\Scripts\activate` 50 | ``` 51 | 52 | 3. Install the required packages: 53 | ``` 54 | pip install -r requirements.txt 55 | ``` 56 | 57 | ### Usage 58 | 59 | (Add basic usage instructions here once the core functionality is implemented) 60 | 61 | ## Project Structure 62 | 63 | ``` 64 | LeAgent/ 65 | ├── src/ 66 | │ ├── llm/ 67 | │ ├── online_learning/ 68 | │ ├── personalization/ 69 | │ ├── nlu/ 70 | │ ├── task_management/ 71 | │ └── utils/ 72 | ├── data/ 73 | ├── tests/ 74 | ├── docs/ 75 | └── notebooks/ 76 | ``` 77 | 78 | ## Contributing 79 | 80 | We welcome contributions from the community! If you're interested in improving LeAgent, please follow these steps: 81 | 82 | 1. Fork the repository 83 | 2. Create a new branch (`git checkout -b feature/AmazingFeature`) 84 | 3. Make your changes 85 | 4. Commit your changes (`git commit -m 'Add some AmazingFeature'`) 86 | 5. 
Push to the branch (`git push origin feature/AmazingFeature`) 87 | 6. Open a Pull Request 88 | 89 | Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests. 90 | 91 | ## License 92 | 93 | This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details. 94 | 95 | ## Acknowledgments 96 | 97 | - Inspiration from sci-fi AI assistants 98 | - OpenAI for advances in language models 99 | - The open-source community for various tools and libraries used in this project 100 | 101 | --- 102 | 103 | LeAgent: Votre compagnon AI évolutif - Your adaptive AI companion 104 | -------------------------------------------------------------------------------- /TraceTalk/app.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import codecs 3 | import os 4 | import re 5 | import sys 6 | from typing import List 7 | 8 | from fastapi import FastAPI, Request 9 | from fastapi.middleware.cors import CORSMiddleware 10 | from fastapi.responses import StreamingResponse 11 | from main import main as agent 12 | 13 | core_directory = os.path.dirname(os.path.abspath(__file__)) 14 | if core_directory not in sys.path: 15 | sys.path.append(core_directory) 16 | 17 | 18 | app = FastAPI() 19 | 20 | # Add CORS middleware to allow cross-origin requests 21 | app.add_middleware( 22 | CORSMiddleware, 23 | allow_origins=["*"], 24 | allow_credentials=True, 25 | allow_methods=["*"], 26 | allow_headers=["*"], 27 | ) 28 | 29 | sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()) 30 | 31 | 32 | async def process_messages(messages: List[str]): 33 | result = agent(messages=messages) 34 | chunks = re.split(r"(\n+)", result) 35 | for chunk in chunks: 36 | if chunk.strip(): 37 | # Split the chunk into sentences 38 | if len(chunk) > 100: 39 | sentences = re.split(r"([.!?。!?]+)", chunk) 40 | for i in range(0, len(sentences), 2): 41 | sentence = sentences[i] + 
( 42 | sentences[i + 1] if i + 1 < len(sentences) else "" 43 | ) 44 | yield sentence 45 | await asyncio.sleep(0.05) # pause between sentences 46 | else: 47 | yield chunk 48 | 49 | await asyncio.sleep(0.1) # pause between chunks 50 | else: 51 | yield chunk 52 | 53 | 54 | @app.post("/process") 55 | async def handler(request: Request): 56 | print("Received request...") 57 | try: 58 | data = await request.json() 59 | messages = data.get("messages", []) 60 | 61 | messages_str_list = [message["content"] for message in messages] 62 | print(f"Received messages:\n{messages_str_list}") 63 | 64 | return StreamingResponse( 65 | process_messages(messages_str_list), media_type="text/event-stream" 66 | ) 67 | except Exception as e: 68 | print(f"Error occurred: {e}") 69 | return {"error": str(e)} 70 | 71 | 72 | if __name__ == "__main__": 73 | print("Starting the server...") 74 | import uvicorn 75 | 76 | uvicorn.run(app, host="0.0.0.0", port=8101) 77 | -------------------------------------------------------------------------------- /TraceTalk/chatbot_agent.py: -------------------------------------------------------------------------------- 1 | # Import basic libraries. 2 | import os 3 | import re 4 | from collections import deque 5 | from typing import List 6 | 7 | # Import OpenAI API and Langchain libraries. 8 | from openai import OpenAI 9 | from langchain.prompts import PromptTemplate 10 | 11 | # Importing prompts. 12 | from prompts.basic_prompt import basic_prompt 13 | from prompts.combine_prompt import combine_prompt 14 | 15 | # Import Qdrant client (vector database). 16 | from qdrant_client import QdrantClient 17 | from src import get_emmbeddings, get_tokens_number 18 | 19 | 20 | class ChatbotAgent: 21 | """ 22 | A class used to represent a chatbot agent. 
23 | """ 24 | 25 | def __init__( 26 | self, 27 | openai_api_key: str, 28 | qdrant_url: str, 29 | qdrant_api_key: str, 30 | messages: List[str], 31 | ): 32 | """ 33 | Initializes an instance of the ChatbotAgent class. 34 | 35 | Args: 36 | openai_api_key (str): The API key provided by OpenAI to authenticate requests. 37 | qdrant_url (str): The URL for the Qdrant service to connect with. 38 | qdrant_api_key (str): The API key for the Qdrant service to authenticate requests. 39 | messages (List[str]): A list of messages that the chatbot agent will process. 40 | 41 | """ 42 | # Set OpenAI API key and initialize client. 43 | self._openai_api_key = openai_api_key 44 | os.environ["OPENAI_API_KEY"] = self._openai_api_key 45 | self.client = OpenAI(api_key=self._openai_api_key) 46 | 47 | # Initialize Qdrant client 48 | self.qdrant_client = QdrantClient( 49 | url=qdrant_url, 50 | prefer_grpc=False, 51 | api_key=qdrant_api_key, 52 | ) 53 | self.qdrant_client.get_collections() 54 | 55 | # Initialize the chat history. 56 | self.count = 1 # Count the number of times the chatbot has been called. 57 | self._max_chat_history_length = 20 58 | self.chat_history = deque(maxlen=self._max_chat_history_length) 59 | init_prompt = "I am TraceTalk, a cutting-edge chatbot designed to encapsulate the power of advanced AI technology, with a special focus on data science, machine learning, and deep learning. 
(https://github.com/Appointat/Chat-with-Document-s-using-ChatGPT-API-and-Text-Embedding)\n" 60 | self.chat_history.append({"role": "chatbot", "content": init_prompt}) 61 | for i in range(len(messages)): 62 | if i % 2 == 0: 63 | self.chat_history.append({"role": "user", "content": messages[i]}) 64 | else: 65 | self.chat_history.append({"role": "chatbot", "content": messages[i]}) 66 | 67 | self.query = "" 68 | self.answer = "" 69 | 70 | def search_context_qdrant( 71 | self, query, collection_name, vector_name="content", top_k=10 72 | ): 73 | """ 74 | Search the Qdrant database for the top k most similar vectors to the query. 75 | 76 | Args: 77 | query (str): The query to search for. 78 | collection_name (str): The name of the collection to search in. 79 | vector_name (str): The name of the vector to search for. 80 | top_k (int): The number of results to return. 81 | 82 | Returns: 83 | query_results (list): A list of the top k most similar vectors to the query. 84 | """ 85 | # Create embedding vector from user query. 86 | embedded_query = get_emmbeddings(query) 87 | 88 | query_results = self.qdrant_client.search( 89 | collection_name=collection_name, 90 | query_vector=(vector_name, embedded_query), 91 | limit=top_k, 92 | ) 93 | 94 | return query_results 95 | 96 | def prompt_chatbot(self, context, chat_history, resource, query): 97 | """ 98 | Prompt the chatbot to generate a response. 99 | 100 | Args: 101 | context (str): The context for the query. 102 | chat_history (str): The chat history. 103 | resource (str): The resource string. 104 | query (str): The user's query. 105 | 106 | Returns: 107 | str: The chatbot's response to the user's query. 108 | """ 109 | prompt = f"Context: {context}\nChat History: {chat_history}\nResources: {resource}\nQuestion: {query}\nPlease provide an answer based on the given context and resources." 
110 | 111 | response = self.client.chat.completions.create( 112 | model="gpt-4o-mini", 113 | messages=[{"role": "user", "content": prompt}], 114 | temperature=0.8, 115 | ) 116 | return response.choices[0].message.content 117 | 118 | def prompt_combine_chain(self, query, answer_list, link_list_list): 119 | """ 120 | Combine the per-source answers into a single response to the user's query. 121 | 122 | Args: 123 | query (str): The user's query. 124 | answer_list (list): A list of answers to the user's query. 125 | link_list_list (list): A list of link lists, one list of links per answer. 126 | Returns: 127 | chatbot_answer (str): The chatbot's response to the user's query. 128 | """ 129 | MAX_TOKENS_CHAT_HISTORY = 1000 130 | n = len(answer_list) 131 | 132 | if n == 0: 133 | return "I'm sorry, there is not enough information to provide a meaningful answer to your question. Can you please provide more context or a specific question?" 134 | else: 135 | chat_history = self.convert_chat_history_to_string() 136 | if get_tokens_number(chat_history) >= MAX_TOKENS_CHAT_HISTORY: 137 | print( 138 | f"Warning: chat history is too long, tokens: {get_tokens_number(chat_history)}." 
139 | ) 140 | chat_history = self.convert_chat_history_to_string( 141 | user_only=True, remove_resource=True 142 | ) 143 | 144 | prompt = combine_prompt( 145 | chat_history=chat_history, 146 | query=query, 147 | answer_list=answer_list, 148 | link_list_list=link_list_list, 149 | MAX_TOKENS=4096 - 1000, 150 | ) 151 | prompt = self.convert_links_in_text(prompt) 152 | 153 | if get_tokens_number(prompt) > 4096 - 1000: 154 | return "The prompt is too long: {} tokens.".format( 155 | get_tokens_number(prompt) 156 | ) 157 | else: 158 | print( 159 | "Prompt length: {} tokens.".format(get_tokens_number(prompt)) 160 | ) 161 | 162 | 163 | # Use the OpenAI API to generate a response based on the prompt 164 | response = self.client.chat.completions.create( 165 | model="gpt-4o-mini", 166 | messages=[{"role": "user", "content": prompt}], 167 | temperature=0.7, 168 | max_tokens=1000, 169 | n=1, 170 | stop=None, 171 | ) 172 | 173 | # Extract and return the generated response 174 | print(f"Chatbot response:\n{response.choices[0].message.content.strip()}") 175 | return response.choices[0].message.content.strip() 176 | 177 | def update_chat_history(self, query, answer): 178 | """ 179 | Update the chat history with the user's query and the chatbot's response. 180 | 181 | Args: 182 | query (str): The user's query. 183 | answer (str): The chatbot's response to the user's query. 184 | """ 185 | # The deque was created with maxlen, so it discards the oldest 186 | # entries automatically; no manual eviction is needed before appending. 187 | 188 | self.chat_history.append({"role": "user", "content": query}) 189 | self.chat_history.append({"role": "chatbot", "content": answer}) 190 | self.count += 1 191 | 192 | def convert_chat_history_to_string( 193 | self, 194 | new_query="", 195 | new_answser="", 196 | user_only=False, 197 | chatbot_only=False, 198 | remove_resource=False, 199 | ): 200 | """ 201 | Convert the chat history to a string. 202 | 203 | Args: 204 | new_query (str): The user's query. 
205 | new_answser (str): The chatbot's response to the user's query. 206 | user_only (bool): If True, only return the user's queries. 207 | chatbot_only (bool): If True, only return the chatbot's responses. 208 | remove_resource (bool): If True, remove the resource from the chatbot's responses. 209 | 210 | Returns: 211 | chat_string (str): The chat history as a string. 212 | """ 213 | if user_only and chatbot_only: 214 | raise ValueError( 215 | "user_only and chatbot_only cannot be True at the same time." 216 | ) 217 | chat_string = "" 218 | if len(self.chat_history) > 0: 219 | for message in self.chat_history: 220 | if message["role"] == "chatbot" and not user_only: 221 | # Drop the trailing section beginning with "RESOURCE:" or "REFERENCE" in message['content'], since it is not needed here. 222 | if remove_resource: 223 | chat_string += f"[{message['role']}]: {message['content'].split('RESOURCE:', 1)[0].split('REFERENCE', 1)[0]} \n" 224 | else: 225 | chat_string += f"[{message['role']}]: {message['content']} \n" 226 | elif message["role"] == "user" and not chatbot_only: 227 | chat_string += f"[{message['role']}]: {message['content']} \n" 228 | if new_query: 229 | chat_string += f"[user]: {new_query} \n" 230 | if new_answser: 231 | chat_string += f"[chatbot]: {new_answser} \n" 232 | 233 | if get_tokens_number(chat_string) >= 3000: # Max token length for GPT-3 is 4096. 234 | print( 235 | f"Warning: chat history is too long: {get_tokens_number(chat_string)} tokens; consider truncating it." 236 | ) 237 | return chat_string 238 | 239 | def convert_links_in_text(self, text): 240 | """ 241 | Convert links in the text to the correct format. 242 | 243 | Args: 244 | text (str): The text to convert. 245 | 246 | Returns: 247 | text (str): The text with the links converted. 
248 | """ 249 | links = re.findall( 250 | r"https://open-academy.github.io/machine-learning/[^\s]*", text 251 | ) 252 | for link in links: 253 | converted_link = ( 254 | link.replace("_sources/", "") 255 | .replace(".md", ".html") 256 | .replace("open-machine-learning-jupyter-book/", "") 257 | ) 258 | text = text.replace(link, converted_link) 259 | return text 260 | 261 | def markdown_to_python(self, markdown_text): 262 | """ 263 | Convert Markdown text to Python string. 264 | 265 | Args: 266 | markdown_text (str): The Markdown text to convert. 267 | 268 | Returns: 269 | python_string (str): The Python string. 270 | """ 271 | # Escape quotes and backslashes in the input. 272 | escaped_input = markdown_text.replace("\\", "\\\\").replace("'", "\\'") 273 | 274 | # Generate the Python string 275 | python_string = f"'{escaped_input}'" 276 | 277 | return python_string 278 | 279 | def chatbot_pipeline( 280 | self, query_pipeline, choose_GPTModel=False, updateChatHistory=False 281 | ): 282 | """ 283 | Chat with the chatbot using the pipeline. 284 | 285 | Args: 286 | query_pipeline (str): The user's query. 287 | choose_GPTModel (bool): If True, call the legacy completions model directly. 288 | updateChatHistory (bool): If True, update the chat history. 289 | 290 | Returns: 291 | result_pipeline (str): The chatbot's response to the user's query. 292 | """ 293 | # Choose which GPT model. 
294 | if choose_GPTModel: 295 | response = self.client.completions.create( 296 | model="gpt-3.5-turbo-instruct", # Replaces the retired "davinci" completions model. 297 | prompt=query_pipeline, 298 | temperature=0.7, 299 | max_tokens=150, 300 | n=1, 301 | stop=None, 302 | ) 303 | result_pipeline = response.choices[0].text.strip() 304 | else: 305 | result_pipeline = self.chatbot_qa( 306 | {"question": query_pipeline, "chat_history": self.chat_history} 307 | ) 308 | 309 | if updateChatHistory: 310 | self.query = query_pipeline 311 | self.result = result_pipeline 312 | self.chat_history.append( 313 | (self.query, self.result["answer"]) # chat_history is a deque; 'deque + list' would raise TypeError. 314 | ) 315 | return self.result 316 | else: 317 | return result_pipeline 318 | 319 | def prompt_engineering_for_non_library_content(self, query): 320 | """ 321 | Prompt the chatbot for non-library content. 322 | 323 | Args: 324 | query (str): The user's query. 325 | 326 | Returns: 327 | result_prompted (str): The chatbot's response to the user's query. 328 | """ 329 | # Please do not modify the value of query. 330 | query_prompted = query + " Please provide a verbose answer." 331 | 332 | result_prompted = self.chatbot_pipeline(query_prompted) 333 | # result_not_know_answer = [] # TBD 334 | # result_non_library_query = [] # TBD 335 | # result_official_keywords = [] # TBD 336 | # result_cheating = [] # TBD 337 | return result_prompted 338 | 339 | def chatbot_qa(self, input_data): 340 | """ 341 | Process the input data and generate a response using the chatbot. 342 | 343 | Args: 344 | input_data (dict): A dictionary containing the question and chat history. 345 | 346 | Returns: 347 | dict: A dictionary containing the chatbot's response. 
348 | """ 349 | question = input_data["question"] 350 | chat_history = input_data["chat_history"] 351 | 352 | # Prepare the messages for the ChatCompletion API 353 | messages = [ 354 | {"role": "system", "content": "You are a helpful AI assistant."}, 355 | ] 356 | 357 | # Add chat history to the messages 358 | for entry in chat_history: 359 | if isinstance(entry, dict): 360 | messages.append({"role": "assistant" if entry["role"] == "chatbot" else entry["role"], "content": entry["content"]}) # Map the internal "chatbot" role to "assistant", the role name the API accepts. 361 | elif isinstance(entry, tuple): 362 | messages.append({"role": "user", "content": entry[0]}) 363 | messages.append({"role": "assistant", "content": entry[1]}) 364 | 365 | # Add the current question 366 | messages.append({"role": "user", "content": question}) 367 | 368 | # Call the OpenAI API 369 | response = self.client.chat.completions.create( 370 | model="gpt-3.5-turbo", 371 | messages=messages, 372 | temperature=0.7, 373 | max_tokens=150, 374 | n=1, 375 | stop=None, 376 | ) 377 | 378 | # Extract the response 379 | answer = response.choices[0].message.content.strip() 380 | 381 | return {"answer": answer} 382 | -------------------------------------------------------------------------------- /TraceTalk/handle_multiprocessing.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | def process_request(params): 4 | """ 5 | Process a request to the chatbot. 6 | 7 | Args: 8 | params (tuple): A tuple of (chatbot_agent, context, chat_history, query, link, score). 9 | 10 | Returns: 11 | tuple: A tuple of the answer and the link. 12 | """ 13 | chatbot_agent, context, chat_history, query, link, score = params 14 | convert_link = link.replace("_sources", "").replace(".md", ".html") if score > 0.5 else "" 15 | reject_context = "Sorry, the question is not associated with the context. The chatbot should refuse to answer." 
16 | 17 | url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+|www.(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+|[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}' 18 | resource_list = re.findall(url_pattern, context) 19 | resource_str = "\n".join([f"[{i+1}] {link}" for i, link in enumerate(resource_list)]) 20 | resource_str = convert_link + "\n" + resource_str 21 | 22 | try: 23 | if score > 0.5: 24 | answer = chatbot_agent.prompt_chatbot(context, chat_history, resource_str, query) 25 | else: 26 | answer = reject_context 27 | except Exception as e: 28 | print(f"An error occurred: {e}") 29 | answer = "I'm sorry, but I encountered an error while processing your request." 30 | finally: 31 | # Release resources here, for example: 32 | # chatbot_agent.close() 33 | pass 34 | 35 | return answer, convert_link -------------------------------------------------------------------------------- /TraceTalk/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | from concurrent.futures import ThreadPoolExecutor 3 | 4 | from chatbot_agent import ChatbotAgent 5 | from handle_multiprocessing import process_request 6 | 7 | def main(message="", messages=[""]): 8 | # Initialize the ChatbotAgent. 
9 | openai_api_key = os.getenv("OPENAI_API_KEY") 10 | if not openai_api_key: 11 | raise ValueError("OPENAI_API_KEY environment variable not set.") 12 | qdrant_url = os.getenv("QDRANT_URL") 13 | if not qdrant_url: 14 | raise ValueError("QDRANT_URL environment variable not set.") 15 | qdrant_api_key = os.getenv("QDRANT_API_KEY") 16 | if not qdrant_api_key: 17 | raise ValueError("QDRANT_API_KEY environment variable not set.") 18 | 19 | messages = list(messages) # Copy so the mutable default argument is never mutated across calls. 20 | if message: 21 | messages.append(message) 22 | global chatbot_agent 23 | chatbot_agent = ChatbotAgent( 24 | openai_api_key=openai_api_key, 25 | qdrant_url=qdrant_url, 26 | qdrant_api_key=qdrant_api_key, 27 | messages=messages, 28 | ) 29 | 30 | # Start the conversation. 31 | query = messages[-1] 32 | answer_list = [] 33 | link_list = [] 34 | # Query the collection using the content vector. 35 | query_results = chatbot_agent.search_context_qdrant( 36 | chatbot_agent.convert_chat_history_to_string(new_query=query, user_only=True), 37 | "Articles", 38 | top_k=4, 39 | ) 40 | 41 | article_ids_plus_one = [min(article.id + 1, 932 - 1) for article in query_results] 42 | article_ids_minus_one = [max(article.id - 1, 0) for article in query_results] 43 | retrieved_articles_plus_one = chatbot_agent.qdrant_client.retrieve( 44 | collection_name="Articles", ids=article_ids_plus_one 45 | ) 46 | retrieved_articles_minus_one = chatbot_agent.qdrant_client.retrieve( 47 | collection_name="Articles", ids=article_ids_minus_one 48 | ) 49 | requests = [ 50 | ( 51 | chatbot_agent, 52 | # Concatenate the existing article content with the content retrieved using the article's id. 53 | # 'retrieve' returns a list of points; the code assumes they come back in the order of the requested ids, so it is indexed with 'i'. 
54 | (retrieved_articles_minus_one[i].payload["content"] + "\n" + 55 | article.payload["content"] + "\n" + 56 | retrieved_articles_plus_one[i].payload["content"]), 57 | chatbot_agent.convert_chat_history_to_string( 58 | user_only=True, remove_resource=True 59 | ), 60 | query, 61 | article.payload["link"], 62 | article.score, 63 | ) 64 | for i, article in enumerate(query_results) 65 | ] 66 | 67 | # Use a Pool to manage the processes. 68 | with ThreadPoolExecutor(max_workers=len(query_results)) as executor: 69 | results = list(executor.map(process_request, requests)) 70 | 71 | # Results is a list of tuples of the form (answer, link). 72 | answer_list, link_list = zip(*results) 73 | 74 | # Initialize link_list_list with each link from link_list as a separate list. 75 | link_list_list = [[link] for link in link_list] 76 | # For each answer, perform the query and add the result to the corresponding list in link_list_list. 77 | for i, answer in enumerate(answer_list): 78 | secondary_query_results_temp = chatbot_agent.search_context_qdrant( 79 | answer, "Articles", top_k=2 80 | ) 81 | link_list_list[i].extend( 82 | article.payload["link"].replace("_sources", "").replace(".md", ".html") 83 | for article in secondary_query_results_temp 84 | ) 85 | 86 | combine_answer = chatbot_agent.prompt_combine_chain( 87 | query=query, answer_list=answer_list, link_list_list=link_list_list 88 | ) 89 | 90 | return combine_answer 91 | 92 | 93 | if __name__ == "__main__": 94 | main() 95 | -------------------------------------------------------------------------------- /TraceTalk/package-lock.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "TraceTalk", 3 | "lockfileVersion": 3, 4 | "requires": true, 5 | "packages": { 6 | "": { 7 | "dependencies": { 8 | "axios": "^1.4.0" 9 | } 10 | }, 11 | "node_modules/asynckit": { 12 | "version": "0.4.0", 13 | "resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz", 14 | "integrity": 
"sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==" 15 | }, 16 | "node_modules/axios": { 17 | "version": "1.4.0", 18 | "resolved": "https://registry.npmjs.org/axios/-/axios-1.4.0.tgz", 19 | "integrity": "sha512-S4XCWMEmzvo64T9GfvQDOXgYRDJ/wsSZc7Jvdgx5u1sd0JwsuPLqb3SYmusag+edF6ziyMensPVqLTSc1PiSEA==", 20 | "dependencies": { 21 | "follow-redirects": "^1.15.0", 22 | "form-data": "^4.0.0", 23 | "proxy-from-env": "^1.1.0" 24 | } 25 | }, 26 | "node_modules/combined-stream": { 27 | "version": "1.0.8", 28 | "resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz", 29 | "integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==", 30 | "dependencies": { 31 | "delayed-stream": "~1.0.0" 32 | }, 33 | "engines": { 34 | "node": ">= 0.8" 35 | } 36 | }, 37 | "node_modules/delayed-stream": { 38 | "version": "1.0.0", 39 | "resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz", 40 | "integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==", 41 | "engines": { 42 | "node": ">=0.4.0" 43 | } 44 | }, 45 | "node_modules/follow-redirects": { 46 | "version": "1.15.2", 47 | "resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.2.tgz", 48 | "integrity": "sha512-VQLG33o04KaQ8uYi2tVNbdrWp1QWxNNea+nmIB4EVM28v0hmP17z7aG1+wAkNzVq4KeXTq3221ye5qTJP91JwA==", 49 | "funding": [ 50 | { 51 | "type": "individual", 52 | "url": "https://github.com/sponsors/RubenVerborgh" 53 | } 54 | ], 55 | "engines": { 56 | "node": ">=4.0" 57 | }, 58 | "peerDependenciesMeta": { 59 | "debug": { 60 | "optional": true 61 | } 62 | } 63 | }, 64 | "node_modules/form-data": { 65 | "version": "4.0.0", 66 | "resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz", 67 | "integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==", 68 | 
"dependencies": { 69 | "asynckit": "^0.4.0", 70 | "combined-stream": "^1.0.8", 71 | "mime-types": "^2.1.12" 72 | }, 73 | "engines": { 74 | "node": ">= 6" 75 | } 76 | }, 77 | "node_modules/mime-db": { 78 | "version": "1.52.0", 79 | "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz", 80 | "integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==", 81 | "engines": { 82 | "node": ">= 0.6" 83 | } 84 | }, 85 | "node_modules/mime-types": { 86 | "version": "2.1.35", 87 | "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz", 88 | "integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==", 89 | "dependencies": { 90 | "mime-db": "1.52.0" 91 | }, 92 | "engines": { 93 | "node": ">= 0.6" 94 | } 95 | }, 96 | "node_modules/proxy-from-env": { 97 | "version": "1.1.0", 98 | "resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz", 99 | "integrity": "sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==" 100 | } 101 | } 102 | } 103 | -------------------------------------------------------------------------------- /TraceTalk/prep_data.py: -------------------------------------------------------------------------------- 1 | # Import basic libraries. 2 | import ast 3 | import os 4 | import re 5 | import warnings 6 | 7 | import pandas as pd 8 | import requests 9 | 10 | # Import Qdrant libraries. 11 | from qdrant_client import QdrantClient, models 12 | from src import get_emmbeddings, get_tokens_number 13 | 14 | 15 | def prep_book_data( 16 | csv_file_path=r"vector-db-persist-directory/book data/book data.csv", 17 | ): 18 | """ 19 | This function prepares the book data for the vector database. 20 | 21 | Args: 22 | csv_file_path (str): The path to the CSV file containing the book data. 23 | 24 | Returns: 25 | book_data (list): A list of dictionaries containing the book data. 
26 | """ 27 | book_data = [] # Create an empty list to store book data. 28 | doc_id = 0 # Set an initial value for the ID counter (avoid shadowing the built-in `id`). 29 | 30 | input_directory = ( 31 | r"vector-db-persist-directory/resources" # Set the default directory. 32 | ) 33 | 34 | for file in os.listdir(input_directory): 35 | if file.endswith(".txt"): 36 | with open(os.path.join(input_directory, file), "r") as f: 37 | txt_content = f.read() 38 | # Extract all Markdown file links from the text file. 39 | md_links = re.findall(r"'(https://[\w\d\-_/.]+\.md)',", txt_content) 40 | 41 | for link in md_links: 42 | md_file = link.rsplit("/", 1)[-1] 43 | md_title = md_file[:-3] # Remove the .md suffix. 44 | 45 | # Get the contents of the .md file. 46 | converted_link = ( 47 | link.replace("github.com/open-academy", "ocademy-ai.github.io") 48 | .replace("tree/main", "_sources") 49 | .replace("open-machine-learning-jupyter-book/", "") 50 | ) 51 | md_content_request = requests.get(converted_link) 52 | md_content = ( 53 | md_content_request.text 54 | if md_content_request.status_code == 200 55 | else "" 56 | ) 57 | print(f"Processing {md_title}: {link}...") 58 | 59 | md_content_split = split_text_into_chunks( 60 | md_content, chunk_max_tokens=300 61 | ) # Split the text into chunks. 62 | for text in md_content_split: 63 | if not text: 64 | continue 65 | doc_id += 1 66 | 67 | # Add the book data to the list. 68 | book_data.append( 69 | { 70 | "id": doc_id, 71 | "title": md_title, 72 | "title_vector": get_emmbeddings(md_title), 73 | "content": text, # Further optimization can be done by splitting files to reduce text volume. 74 | "content_vector": get_emmbeddings( 75 | text 76 | ), # Further optimization can be done by splitting files to reduce text volume. 77 | "link": converted_link, 78 | } 79 | ) 80 | 81 | # Convert the book_data list into a pandas DataFrame. 
82 | book_data_df = pd.DataFrame(book_data) 83 | print("Shape of the book data DataFrame:", book_data_df.shape) 84 | book_data_df.to_csv(csv_file_path, index=False) 85 | 86 | 87 | def update_collection_to_database( 88 | csv_file_path=r"vector-db-persist-directory/book data/book data.csv", 89 | ): 90 | """ 91 | This function updates the Qdrant database collection with the book data. 92 | 93 | Args: 94 | csv_file_path (str, optional): The path to the CSV file containing the book data. 95 | """ 96 | # Load the book data DataFrame. 97 | book_data_df = pd.read_csv(csv_file_path) 98 | 99 | def convert_string_to_list(s): 100 | return ast.literal_eval(s) 101 | 102 | book_data_df["title_vector"] = book_data_df["title_vector"].apply( 103 | convert_string_to_list 104 | ) 105 | book_data_df["content_vector"] = book_data_df["content_vector"].apply( 106 | convert_string_to_list 107 | ) 108 | 109 | # Initialize client. 110 | qdrant_url = os.getenv("QDRANT_URL") 111 | if not qdrant_url: 112 | raise ValueError("QDRANT_URL environment variable not set.") 113 | print("QDRANT_URL:", qdrant_url) 114 | qdrant_api_key = os.getenv("QDRANT_API_KEY") 115 | if not qdrant_api_key: 116 | raise ValueError("QDRANT_API_KEY environment variable not set.") 117 | print("QDRANT_API_KEY is set.")  # Never print the secret key itself. 118 | 119 | client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key) 120 | 121 | # Create a new collection of the Qdrant database. 122 | vector_size = 1536 123 | client.recreate_collection( 124 | collection_name="Articles", 125 | vectors_config={ 126 | "title": models.VectorParams( 127 | size=vector_size, distance=models.Distance.COSINE 128 | ), 129 | "content": models.VectorParams( 130 | size=vector_size, distance=models.Distance.COSINE 131 | ), 132 | }, 133 | ) 134 | 135 | # Upsert the data into the collection of the Qdrant database. 136 | batch_size = 50 # Adjust this value to fit within Qdrant's size limits. 137 | # Divide data into batches. 
138 | batches = [ 139 | book_data_df[i : i + batch_size] 140 | for i in range(0, book_data_df.shape[0], batch_size) 141 | ] 142 | 143 | for batch in batches: 144 | points = [] 145 | for _, row in batch.iterrows(): 146 | point = models.PointStruct( 147 | id=row["id"], 148 | vector={ 149 | "title": row["title_vector"], 150 | "content": row["content_vector"], 151 | }, 152 | payload=row.to_dict(), 153 | ) 154 | points.append(point) 155 | print(f"Upserting point with id: {row['id']}") 156 | 157 | client.upsert(collection_name="Articles", points=points) 158 | 159 | 160 | def split_text_into_chunks( 161 | text, delimiter="\n# ", chunk_max_tokens=600, MAX_TOKENS=4096 162 | ): 163 | begin_pattern = r"---.*?---" 164 | text = re.sub(begin_pattern, "", text, flags=re.DOTALL) 165 | text = "\n" + text 166 | 167 | # Remove code cells from text. 168 | code_pattern = r"(```{code-cell}.*?```)" 169 | code_cells = re.findall(code_pattern, text, flags=re.DOTALL) 170 | text = re.sub(code_pattern, "TEMPLATE_CODE_CELL\n", text, flags=re.DOTALL) 171 | 172 | chunks = re.split( 173 | "((?:^|\n)(?={}(?!#)))".format(delimiter), text, flags=re.MULTILINE 174 | ) 175 | chunks = [chunk for chunk in chunks if chunk.strip()] 176 | 177 | final_chunks = [] 178 | for chunk in chunks: 179 | # Split the chunk into sentences. 180 | sentences = re.split(r"([.?!])", chunk) 181 | current_n_sentences = [] 182 | for sentence in sentences: 183 | current_n_sentences.append(sentence) 184 | 185 | # When the number of tokens reaches chunk_max_tokens, start a new chunk. 186 | if get_tokens_number("".join(current_n_sentences)) >= chunk_max_tokens: 187 | final_chunks.append("".join(current_n_sentences)) 188 | current_n_sentences = [] 189 | # Add the last chunk. 
190 | if current_n_sentences: 191 | final_chunks.append("".join(current_n_sentences)) 192 | 193 | for i, chunk in enumerate(final_chunks): 194 | try: 195 | while "TEMPLATE_CODE_CELL" in chunk and code_cells: 196 | code_cell = code_cells.pop(0) 197 | 198 | # Split the code cell into lines. 199 | code_cell_lines = code_cell.split("\n") 200 | 201 | # If the code cell can be inserted without exceeding the chunk size, do it. 202 | if ( 203 | get_tokens_number(chunk.replace("TEMPLATE_CODE_CELL", code_cell, 1)) 204 | <= chunk_max_tokens * 2 205 | ): 206 | chunk = chunk.replace("TEMPLATE_CODE_CELL", code_cell, 1) 207 | # If not, insert as many lines as possible. 208 | else: 209 | inserted_lines = []  # Accumulate the lines that still fit (the original clobbered code_cell_lines here, iterating an empty list). 210 | for line in code_cell_lines: 211 | # If the next line can be inserted without exceeding the chunk size, do it. 212 | if ( 213 | get_tokens_number( 214 | chunk.replace( 215 | "TEMPLATE_CODE_CELL", "\n".join(inserted_lines + [line]), 1 216 | ) 217 | ) 218 | <= chunk_max_tokens * 2 219 | ): 220 | inserted_lines.append(line) 221 | # If not, drop the remaining lines and stop. 222 | else: 223 | break 224 | chunk = chunk.replace( 225 | "TEMPLATE_CODE_CELL", "\n".join(inserted_lines), 1 226 | ) 227 | 228 | final_chunks[i] = chunk 229 | print(f"Tokens number of chunk {i}: {get_tokens_number(chunk)}") 230 | except IndexError: 231 | warnings.warn( 232 | "Code cells mismatch. The number of 'TEMPLATE_CODE_CELL' placeholders and actual code cells do not match." 233 | ) 234 | 235 | return final_chunks 236 | -------------------------------------------------------------------------------- /TraceTalk/prompts/basic_prompt.py: -------------------------------------------------------------------------------- 1 | from langchain.prompts import PromptTemplate 2 | 3 | 4 | # Prompt the chatbot. 
5 | def basic_prompt(): 6 | template = """ 7 | In your answer you should add a part called RESOURCE, which extracts the corresponding links from the CONTEXT and lists them in Markdown citation format. 8 | It is strictly PROHIBITED to create or fabricate the links within RESOURCE; if no links are found, please say sorry. The RESOURCE should ONLY consist of LINKS that are directly drawn from the CONTEXT. 9 | If the answer to the QUESTION is not within your knowledge scope, admit it instead of concocting an answer. 10 | In the event that the QUESTION does not correlate with the CONTEXT, it is acceptable to respond with an apology indicating that more information is required for an accurate answer, or you may respectfully decline to answer. 11 | 12 | ===== CONTEXT ===== 13 | {{context}} 14 | 15 | ===== CHAT HISTORY ===== 16 | {{chat_history}} 17 | 18 | ===== RESOURCE ===== 19 | {{resource}} 20 | 21 | ========= 22 | ANSWER THE QUESTION "{{query}}" AND GIVE A FINAL, VERBOSE ANSWER; the language used for the answer must be CONSISTENT with the QUESTION: 23 | """ 24 | prompt = PromptTemplate( 25 | template=template, 26 | input_variables=["context", "chat_history", "resource", "query"], 27 | template_format="jinja2", 28 | validate_template=False, 29 | ) # Parameterize the prompt template. 30 | return prompt 31 | -------------------------------------------------------------------------------- /TraceTalk/prompts/combine_prompt.py: -------------------------------------------------------------------------------- 1 | import re 2 | from langchain.prompts import PromptTemplate 3 | from jinja2 import Template 4 | from src import get_tokens_number 5 | 6 | 7 | # Combine prompt. 8 | def combine_prompt(chat_history, query, answer_list, link_list_list, MAX_TOKENS=3000): 9 | n = len(answer_list) 10 | 11 | template = f""" 12 | ===== RULES ===== 13 | Now I will provide you with {n} chains; here is the definition of a chain: each chain contains an answer and a link. 
The answers in a chain are the results retrieved from its link. 14 | In theory, each chain should produce a paragraph with its links as the resources. This means that you MUST tell me from which references you made the summary. 15 | The smaller the number of the chain, the more important the information contained in the chain. 16 | Your final answer should be verbose. 17 | However, if the meaning of an answer in a certain chain is similar to 'I am not sure about your question' or 'I refuse to answer such a question', that answer chain is deprecated, and you should actively ignore the information in it. 18 | 19 | You are now asked to answer and integrate these {n} chains (integration means avoiding repetition, writing logically, writing smoothly, and giving a verbose answer), and answer in 2-4 paragraphs as appropriate. 20 | The final answer is ALWAYS in Markdown format. 21 | Provide your answer in a CITATION style, where you also list the resources from which you found the information at the end of the text. (An example is provided below.) 22 | In addition, in order to demonstrate the knowledge resources you have referred to, please ALWAYS return a "RESOURCE" part in your answer. 23 | RESOURCE can ONLY be a list of links, and each link represents the knowledge resource of one chain. Each chain has only one RESOURCE part. 24 | The RESOURCE should ONLY consist of LINKS that are directly drawn from the CHAIN. 25 | It is strictly PROHIBITED to create or fabricate the links within RESOURCE; if no links are found, please say sorry. 26 | 27 | ===== EXAMPLE ===== 28 | For example, if you are provided with 2 chains, the template is below: 29 | CHAIN 1: 30 | CONTEXT: 31 | Text of chain 1. ABCDEFGHIJKLMNOPQRSTUVWXYZ 32 | RESOURCE: 33 | https://link1.com 34 | CHAIN 2: 35 | CONTEXT: 36 | Text of chain 2. ABCDEFGHIJKLMNOPQRSTUVWXYZ 37 | RESOURCE: 38 | https://link2.com 39 | 40 | YOUR COMPLETE ANSWER SHOULD LOOK LIKE THIS: 41 | Integrated text of chain 1 [1] and chain 2 [2]. Blablabla. 
42 | REFERENCE: 43 | [1] [title_link1](https://link1.com) 44 | [2] [title_link2](https://link2.com) 45 | 46 | """ 47 | 48 | chat_history_text = """ 49 | ===== CHAT HISTORY ===== 50 | {{chat_history}} 51 | 52 | """ 53 | template += chat_history_text 54 | 55 | init_chain_tmp = f"Now I provide you with {n} chains:" 56 | template += init_chain_tmp 57 | i = -1  # Guard in case answer_list is empty, so the link loop below stays defined. 58 | for i in range(n): 59 | link_list = "\n".join(link_list_list[i]) 60 | template_tmp = f""" 61 | ===== CHAIN {i+1} ===== 62 | CONTEXT: 63 | {answer_list[i]} 64 | RESOURCE: 65 | {link_list} 66 | """ 67 | if get_tokens_number(template + template_tmp) > MAX_TOKENS: 68 | break 69 | template += template_tmp 70 | # After breaking from the loop, append only the remaining links. 71 | for j in range(i + 1, n): 72 | link_list = "\n".join(link_list_list[j]) 73 | template_tmp = f"{link_list}\n" 74 | if get_tokens_number(template + template_tmp) > MAX_TOKENS: 75 | break 76 | template += template_tmp 77 | 78 | template += """ 79 | ========= 80 | ANSWER THE QUESTION "{{query}}" AND GIVE A FINAL, VERBOSE ANSWER; the language used for the answer must be CONSISTENT with the QUESTION: 81 | """ 82 | 83 | prompt = Template(template).render(query=query, chat_history=chat_history) 84 | return prompt 85 | -------------------------------------------------------------------------------- /TraceTalk/requirements.txt: -------------------------------------------------------------------------------- 1 | qdrant-client==1.3.1 2 | langchain==0.0.220 3 | Flask==1.1.2 4 | flask-cors==3.0.10 5 | openai>=1.0.0 6 | pandas==2.2.2 7 | requests==2.26.0 8 | jinja2==3.0.1 9 | python-dotenv==0.19.1 10 | tiktoken 11 | -------------------------------------------------------------------------------- /TraceTalk/roots.sst: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/roots.sst
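The token-budget truncation that `combine_prompt` performs — append full chains until the budget is exhausted, then fall back to appending bare links — can be sketched in isolation. This is a hedged, standalone illustration: `count_tokens` is a hypothetical whitespace-based stand-in for `get_tokens_number` (which uses tiktoken), and the chain data is invented for the demo.

```python
# Standalone sketch of the token-budget truncation pattern used in
# TraceTalk/prompts/combine_prompt.py. The real code measures tokens with
# tiktoken; here a whitespace counter stands in so the example runs
# without external dependencies.

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())

def build_prompt(chains, max_tokens=20):
    """Append full chains until the budget is hit; afterwards append links only."""
    prompt = ""
    kept = 0
    for answer, link in chains:
        block = f"CONTEXT: {answer}\nRESOURCE: {link}\n"
        if count_tokens(prompt + block) > max_tokens:
            break
        prompt += block
        kept += 1
    # After the budget is exhausted, keep only the remaining links.
    for _, link in chains[kept:]:
        line = f"{link}\n"
        if count_tokens(prompt + line) > max_tokens:
            break
        prompt += line
    return prompt

chains = [
    ("Answer one is fairly long with several words.", "https://link1.example"),
    ("Answer two is also long enough to overflow the budget.", "https://link2.example"),
]
print(build_prompt(chains))
```

With the small budget above, the first chain is kept in full, the second answer is dropped, and only its link survives, mirroring the two-phase loop in `combine_prompt`.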
-------------------------------------------------------------------------------- /TraceTalk/src.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from openai import OpenAI 4 | import tiktoken 5 | 6 | 7 | def get_emmbeddings(text): 8 | """ 9 | Get the embeddings of the text. 10 | 11 | Args: 12 | text (str): The text to get the embeddings of. 13 | 14 | Returns: 15 | embedded_query (list): The embeddings of the text. 16 | """ 17 | client = OpenAI( 18 | api_key=os.getenv("OPENAI_API_KEY"), 19 | ) 20 | embedded_query = client.embeddings.create( 21 | input=text, 22 | model="text-embedding-ada-002", 23 | ).data[0].embedding 24 | 25 | return embedded_query # It is a vector of numbers. 26 | 27 | 28 | def get_tokens_number(text="", encoding_type="cl100k_base", model_name="gpt-4o-mini"): 29 | """ 30 | Get the number of tokens in the text. 31 | 32 | Args: 33 | text (str): The text to get the number of tokens of. 34 | encoding_type (str): The encoding type. 35 | model_name (str): The model name. 36 | 37 | Returns: 38 | tokens_number (int): The number of tokens in the text. 
39 | """ 40 | try: 41 | encoding = tiktoken.encoding_for_model(model_name) 42 | except KeyError: 43 | # Fall back to the explicit encoding if the model name is unknown to tiktoken. 44 | encoding = tiktoken.get_encoding(encoding_type) 45 | return len(encoding.encode(text)) 46 | -------------------------------------------------------------------------------- /TraceTalk/update_collection.py: -------------------------------------------------------------------------------- 1 | from prep_data import update_collection_to_database 2 | 3 | if __name__ == "__main__": 4 | csv_file_path = r"vector-db-persist-directory/book dada/book data.csv" 5 | # prep_book_data(csv_file_path) 6 | update_collection_to_database(csv_file_path) 7 | -------------------------------------------------------------------------------- /TraceTalk/utils/json_tokenizer.py: -------------------------------------------------------------------------------- 1 | class JSONTokenizer: 2 | def __init__(self): 3 | self.stack = []  # Tracks the currently open JSON scopes, e.g. an object or array being built. 4 | self.last_token_type = None 5 | 6 | def is_valid(self, token): 7 | # Check whether the token is legal here and update the internal state. 8 | if self.is_start_of_object(token) and ( 9 | self.last_token_type in [None, "start_array", "comma", "colon"] 10 | ): 11 | self.stack.append("object") 12 | self.last_token_type = "start_object" 13 | return True 14 | if self.is_start_of_array(token) and ( 15 | self.last_token_type in [None, "start_array", "comma", "colon"] 16 | ): 17 | self.stack.append("array") 18 | self.last_token_type = "start_array" 19 | return True 20 | if self.is_end_of_object(token) and self.stack and self.stack[-1] == "object": 21 | self.stack.pop() 22 | self.last_token_type = "end_object" 23 | return True 24 | if self.is_end_of_array(token) and self.stack and self.stack[-1] == "array": 25 | self.stack.pop() 26 | self.last_token_type = "end_array" 27 | return True 28 | if self.is_key_or_value(token) and ( 29 | self.last_token_type in ["start_object", "comma"] 30 | ): 31 | self.last_token_type = "key_or_value" 32 | return True 33 | if self.is_colon(token) and self.last_token_type == "key_or_value": 34 | self.last_token_type = 
"colon" 35 | return True 36 | if self.is_comma(token) and self.last_token_type in [ 37 | "key_or_value", 38 | "end_object", 39 | "end_array", 40 | ]: 41 | self.last_token_type = "comma" 42 | return True 43 | return False 44 | 45 | # NOTE: The helpers below are minimal implementations inferred from the calls above; 46 | # the original file leaves them undefined, so treat these as assumptions about the intended semantics. 47 | def is_start_of_object(self, token): 48 | return token.strip() == "{" 49 | 50 | def is_start_of_array(self, token): 51 | return token.strip() == "[" 52 | 53 | def is_end_of_object(self, token): 54 | return token.strip() == "}" 55 | 56 | def is_end_of_array(self, token): 57 | return token.strip() == "]" 58 | 59 | def is_colon(self, token): 60 | return token.strip() == ":" 61 | 62 | def is_comma(self, token): 63 | return token.strip() == "," 64 | 65 | def is_key_or_value(self, token): 66 | stripped = token.strip() 67 | return bool(stripped) and stripped not in {"{", "}", "[", "]", ":", ","} 68 | 69 | @property 70 | def has_content(self): 71 | # True once at least one token has been accepted. 72 | return self.last_token_type is not None 73 | 74 | def is_complete(self): 75 | # The generated JSON is complete once something was produced and all scopes are closed. 76 | return self.has_content and not self.stack 77 | -------------------------------------------------------------------------------- /TraceTalk/utils/test_tokenizer.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | from typing import Dict, List 5 | 6 | from openai import OpenAI 7 | 8 | from utils.json_tokenizer import JSONTokenizer 9 | 10 | 11 | def softmax(tokens: List[Dict[str, float]]) -> List[Dict[str, float]]: 12 | exp_probs = [{"token": t["token"], "prob": math.exp(t["logprob"])} for t in tokens] 13 | total = sum(t["prob"] for t in exp_probs) 14 | return [{"token": t["token"], "prob": t["prob"] / total} for t in exp_probs] 15 | 16 | 17 | def preprocessor( 18 | tokens: List[Dict[str, float]], json_tokenizer: JSONTokenizer 19 | ) -> List[Dict[str, float]]: 20 | valid_tokens = [t for t in tokens if json_tokenizer.is_valid(t["token"])] 21 | 22 | if not valid_tokens: 23 | if json_tokenizer.stack: 24 | closing_token = "}" if json_tokenizer.stack[-1] == "object" else "]"  # The stack stores scope names, not tokens. 25 | if json_tokenizer.is_valid(closing_token): 26 | valid_tokens = [ 27 | {"token": closing_token, "logprob": -10.0} 28 | ]  # Assign a very small probability. 29 | elif not json_tokenizer.has_content: 30 | valid_tokens = [ 31 | {"token": "{", "logprob": -1.0}, 32 | {"token": "[", "logprob": -1.0}, 33 | ] 34 | 35 | return softmax(valid_tokens) 36 | 37 | 38 | def generate_json_with_llm(prompt: str, max_tokens: int = 100) -> str: 39 | json_tokenizer = JSONTokenizer() 40 | result = "" 41 | client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) 42 | 43 | while len(result) < max_tokens and not json_tokenizer.is_complete(): 44 | print(f"Prompt: {prompt + result}")  # For debugging. 45 | response = client.chat.completions.create( 46 | model="gpt-4-turbo", 47 | messages=[ 48 | { 49 | "role": "system", 50 | "content": 
"You are a JSON generator. Generate valid JSON only.", 51 | }, 52 | {"role": "user", "content": prompt + result}, 53 | ], 54 | max_tokens=1,  # Generate only one token per call. 55 | n=1, 56 | temperature=0.7, 57 | logprobs=True, 58 | top_logprobs=5, 59 | ) 60 | 61 | if response.choices[0].logprobs and response.choices[0].logprobs.content: 62 | token_info = response.choices[0].logprobs.content[0] 63 | raw_tokens = [ 64 | {"token": logprob.token, "logprob": logprob.logprob} 65 | for logprob in token_info.top_logprobs 66 | ] 67 | 68 | processed_tokens = preprocessor(raw_tokens, json_tokenizer) 69 | 70 | if processed_tokens: 71 | next_token = max(processed_tokens, key=lambda x: x["prob"]) 72 | result += next_token["token"] 73 | json_tokenizer.is_valid(next_token["token"]) 74 | print(f"Generated token: {next_token['token']}")  # For debugging. 75 | else: 76 | print("No valid tokens available. Ending generation.") 77 | break 78 | 79 | prompt += f"\nPlease continue writing the answer in json format:\n{result}" 80 | 81 | return result 82 | 83 | 84 | if __name__ == "__main__": 85 | prompt = "Task: Generate a JSON object describing a person with name and age. 
The answer schema is as follows:" 86 | schema = { 87 | "name": "string", 88 | "age": "number", 89 | } 90 | 91 | print("Generating JSON...") 92 | generated_json = generate_json_with_llm( 93 | prompt + "\n" + json.dumps(schema, indent=2) 94 | ) 95 | print("\nGenerated JSON:") 96 | print(generated_json) 97 | 98 | try: 99 | parsed_json = json.loads(generated_json) 100 | print("\nSuccessfully parsed the generated JSON:") 101 | print(json.dumps(parsed_json, indent=2)) 102 | except json.JSONDecodeError as e: 103 | print(f"\nError: Generated JSON is not valid: {e}") 104 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/assets.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/assets.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/assignments.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/amazon-sagemaker-mlops-workshop-warm-up.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/apply-your-skills.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/build-your-own-custom-vis.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/classifying-datasets.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-processing-in-python.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-science-project-using-azure-ml-sdk.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/data-science-scenarios.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/dive-into-the-beehive.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/explore-a-planetary-computer-dataset.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/lines-scatters-and-bars.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/low-code-no-code-data-science-project-on-azure-ml.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/market-research.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/tell-a-story.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/try-it-in-excel.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/data-science/write-a-data-ethics-case-study.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/create-a-regression-model.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/explore-classification-methods.md', 18 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/exploring-visualizations.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/parameter-play.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/regression-with-scikit-learn.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/retrying-some-regression.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/study-the-solvers.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/assignments/ml-fundamentals/try-a-different-model.md', 24 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/data-science.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-wild.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/data-science-in-the-cloud.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/introduction.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-azure-ml-sdk-way.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-low-code-no-code-way.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/analyzing.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/communication.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/data-science-lifecycle.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/introduction.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/data-visualization.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/meaningful-visualizations.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-distributions.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-proportions.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-relationships.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualizing-quantities.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/data-science-ethics.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data-science.md', 18 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction-to-statistics-and-probability.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/data-preparation.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/non-relational-data.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/numpy.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/pandas.md', 25 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/relational-databases.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/working-with-data.md', 27 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/data.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-wild.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/data-science-in-the-cloud.md', 3 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/introduction.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-azure-ml-sdk-way.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-in-the-cloud/the-low-code-no-code-way.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/analyzing.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/communication.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/data-science-lifecycle.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-science-lifecycle/introduction.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/data-visualization.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/meaningful-visualizations.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-distributions.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-proportions.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualization-relationships.md', 15 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/data-visualization/visualizing-quantities.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/data-science-ethics.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data-science.md', 18 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/defining-data.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction-to-statistics-and-probability.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/introduction/introduction.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/data-preparation.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/non-relational-data.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/numpy.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/pandas.md', 25 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/relational-databases.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/data-science/working-with-data/working-with-data.md', 27 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/deep-learning.txt: 
-------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/autoencoder.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/cnn.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/difussion-model.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dl-overview.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dl-summary.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/dqn.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/gan.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/image-classification.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/image-segmentation.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/lstm.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/nlp.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/object-detection.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/rnn.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/deep-learning/time-series.md', 15 | -------------------------------------------------------------------------------- 
/TraceTalk/vector-db-persist-directory/resources/machine-learning-productionization.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/model-deployment.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/model-training-and-evaluation.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/overview.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/machine-learning-productionization/problem-framing.md', 6 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/ml-advanced.txt: -------------------------------------------------------------------------------- 1 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/kernel-method.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/model-selection.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/unsupervised-learning-pca-and-clustering.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/unsupervised-learning.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/clustering-models-for-machine-learning.md', 6 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/introduction-to-clustering.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/clustering/k-means-clustering.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/feature-importance.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/random-forest.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/gradient-boosting-example.md', 13 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/gradient-boosting.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/introduction-to-gradient-boosting.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/xgboost-k-fold-cv-feature-importance.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-advanced/gradient-boosting/xgboost.md', 17 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/ml-fundamentals.txt: -------------------------------------------------------------------------------- 1 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/build-a-web-app-to-use-a-machine-learning-model.md', 2 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/ml-overview.md', 3 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/ml-summary.md', 4 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.md', 5 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/getting-started-with-classification.md', 6 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/introduction-to-classification.md', 7 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/more-classifiers.md', 8 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/classification/yet-other-classifiers.md', 9 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/fairness-and-machine-learning.md', 10 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/introduction-to-machine-learning.md', 11 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/introduction.md', 12 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/techniques-of-machine-learning.md', 13 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/introduction/the-history-of-machine-learning-and-ai.md', 14 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/autoencoders.md', 15 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/convolutional-neural-networks.md', 16 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/introduction.md', 17 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/neural-network-overview.md', 18 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-basics.md', 19 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-hands-on.md', 20 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/nn-implementation.md', 21 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/neural-network/recurrent-neural-networks.md', 22 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/gradient-descent.md', 23 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/loss-function.md', 24 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/parameter-optimization/parameter-optimization.md', 25 | 
'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/linear-and-polynomial-regression.md', 26 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/logistic-regression.md', 27 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/managing-data.md', 28 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/regression-models-for-machine-learning.md', 29 | 'https://github.com/open-academy/machine-learning/tree/main/open-machine-learning-jupyter-book/ml-fundamentals/regression/tools-of-the-trade.md', 30 | -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/prerequisites.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/prerequisites.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/slides.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/slides.txt -------------------------------------------------------------------------------- /TraceTalk/vector-db-persist-directory/resources/supporting-materials.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/TraceTalk/vector-db-persist-directory/resources/supporting-materials.txt 
-------------------------------------------------------------------------------- /TraceTalk/workflows/update_source_link.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 5 | from prep_data import prep_data 6 | 7 | 8 | def main(): 9 | # Update the source links. 10 | update_source_link() 11 | # Update the embeddings of the text. 12 | prep_data() 13 | 14 | 15 | def update_source_link(): 16 | path = "open-machine-learning-jupyter-book" 17 | folders = [] 18 | md_files = [] 19 | for root, dirs, files in os.walk(path): 20 | for file in files: 21 | if file.endswith(".md"): 22 | md_files.append(os.path.join(root, file)) 23 | for dirname in dirs: 24 | if not dirname.startswith("."): 25 | folders.append(dirname) 26 | 27 | # Write the link lists into the resources directory used by the vector DB. 28 | resources_dir = os.path.join("TraceTalk", "vector-db-persist-directory", "resources") 29 | os.makedirs(resources_dir, exist_ok=True) 30 | 31 | for folder in folders: 32 | file_content = f"#### {folder}:\n" 33 | folder_md_files = [] 34 | 35 | for md_file in md_files: 36 | if md_file.startswith(os.path.join(path, folder)): 37 | md_file = md_file.replace("\\", "/") 38 | folder_md_files.append( 39 | f"'https://github.com/open-academy/machine-learning/tree/main/{md_file}',\n" 40 | ) 41 | 42 | file_content += "".join(folder_md_files) 43 | 44 | # Only write a file for folders that actually contain Markdown sources. 45 | if folder_md_files: 46 | with open(os.path.join(resources_dir, f"{folder}.txt"), "w") as f: 47 | f.write(file_content) 48 | 49 | 50 | if __name__ == "__main__": 51 | main() 52 | -------------------------------------------------------------------------------- /docs/api_documentation.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/api_documentation.md -------------------------------------------------------------------------------- /docs/learning_mechanism.md:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/learning_mechanism.md -------------------------------------------------------------------------------- /docs/setup.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/docs/setup.md -------------------------------------------------------------------------------- /frontend/main.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | 3 | from fastapi import FastAPI, Request 4 | from fastapi.responses import StreamingResponse 5 | 6 | app = FastAPI() 7 | 8 | 9 | async def process_request(content: str, user: dict): 10 | yield "LeAgent is processing your request...\n" 11 | await asyncio.sleep(0.5) 12 | yield f"LeAgent is analyzing the message from user {user['name']}: {content}\n" 13 | await asyncio.sleep(0.5) 14 | yield "LeAgent is generating a response...\n" 15 | await asyncio.sleep(0.5) 16 | yield "TASK_DONE" 17 | 18 | 19 | @app.post("/process") 20 | async def process(request: Request): 21 | data = await request.json() 22 | content = data["content"] 23 | user = data["user"] 24 | 25 | async def event_generator(): 26 | async for message in process_request(content, user): 27 | yield f"{message}\n" 28 | 29 | return StreamingResponse(event_generator(), media_type="text/plain") 30 | 31 | 32 | if __name__ == "__main__": 33 | import uvicorn 34 | 35 | uvicorn.run(app, host="0.0.0.0", port=8101) 36 | -------------------------------------------------------------------------------- /notebooks/experiment_adaptive_learning.ipynb: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/notebooks/experiment_adaptive_learning.ipynb -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/requirements.txt -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/setup.py -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/__init__.py -------------------------------------------------------------------------------- /src/config.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/config.py -------------------------------------------------------------------------------- /src/llm/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/__init__.py -------------------------------------------------------------------------------- /src/llm/api_integration.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/api_integration.py -------------------------------------------------------------------------------- /src/llm/model.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/llm/model.py -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/main.py -------------------------------------------------------------------------------- /src/nlu/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/nlu/__init__.py -------------------------------------------------------------------------------- /src/nlu/intent_recognition.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/nlu/intent_recognition.py -------------------------------------------------------------------------------- /src/online_learning/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/__init__.py -------------------------------------------------------------------------------- /src/online_learning/adaptive_model.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/adaptive_model.py -------------------------------------------------------------------------------- /src/online_learning/memory_manager.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/online_learning/memory_manager.py -------------------------------------------------------------------------------- /src/personalization/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/__init__.py -------------------------------------------------------------------------------- /src/personalization/preference_learner.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/preference_learner.py -------------------------------------------------------------------------------- /src/personalization/user_profile.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/personalization/user_profile.py -------------------------------------------------------------------------------- /src/task_management/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/task_management/__init__.py -------------------------------------------------------------------------------- /src/task_management/task_handler.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/task_management/task_handler.py -------------------------------------------------------------------------------- /src/utils/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/utils/__init__.py -------------------------------------------------------------------------------- /src/utils/helpers.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/src/utils/helpers.py -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_adaptive_model.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_adaptive_model.py -------------------------------------------------------------------------------- /tests/test_main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_main.py -------------------------------------------------------------------------------- /tests/test_personalization.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Appointat/LeAgent/0a7d9aef4cded6bba667ccd3b0922db7b85da89c/tests/test_personalization.py --------------------------------------------------------------------------------
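A note on the streaming pattern in `frontend/main.py`: the `/process` endpoint wraps an async generator that yields progress messages and finishes with a `"TASK_DONE"` sentinel, which the consumer uses to know when the stream is complete. The sketch below reproduces that pattern with plain `asyncio` and no running server; the `consume_stream` helper is illustrative and not part of the repository.

```python
import asyncio


async def process_request(content: str, user: dict):
    # Mirrors the generator in frontend/main.py (delays shortened for the demo):
    # yield progress messages, then a sentinel marking the end of the stream.
    yield "LeAgent is processing your request...\n"
    await asyncio.sleep(0.01)
    yield f"LeAgent is analyzing the message from user {user['name']}: {content}\n"
    await asyncio.sleep(0.01)
    yield "LeAgent is generating a response...\n"
    await asyncio.sleep(0.01)
    yield "TASK_DONE"


async def consume_stream(content: str, user: dict) -> list:
    # Client-side loop: collect messages until the sentinel arrives.
    messages = []
    async for message in process_request(content, user):
        if message == "TASK_DONE":
            break
        messages.append(message)
    return messages


if __name__ == "__main__":
    for line in asyncio.run(consume_stream("hello", {"name": "alice"})):
        print(line, end="")
```

Against the running FastAPI app, the same consumer loop would iterate over the chunks of the `StreamingResponse` body instead of a local generator, stopping at the same sentinel.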