├── ChromaDB
│   ├── Open_Source_Vector_Database_ChromaDB_Tutorial.ipynb
│   ├── linux_doc
│   │   ├── CentOS常用命令.pdf
│   │   └── Training GPT.pdf
│   └── pets
│       ├── Different Types of Pet Animals.txt
│       ├── Health Care for Pets.txt
│       ├── Nutrition Needs of Pet Animals.txt
│       ├── The Emotional Bond Between Humans and Pets.txt
│       └── Training and Behaviour of Pets.txt
├── LangChain_Memory_ChatGPT_ChatBot.ipynb
├── LangChain_Prompt_Templates.ipynb
├── LangChain_With_OpenAI_Proxy_Invoke.ipynb
├── Langchain Chatbot
│   ├── Langchain_Pinecone_Indexing_Search_.ipynb
│   ├── main.py
│   ├── requirements.txt
│   └── utils.py
├── Langchain_Agents_SQL_Database_Agent.ipynb
├── Langchain_Semnatic_Serach_Pinecone.ipynb
├── Open_Source_Vector_Database_ChromaDB_Tutorial.ipynb
├── README.md
├── Using_Falcon7B_And_LangChain_Build_Conversation_Memory_Chatbot.ipynb
├── Using_LlamaIndex_Query_Documents.ipynb
├── Using_OpenAI__LangChain_And_HDBSCAN_Clustering_Documents.ipynb
├── Whisper_And_ChatGPT_API_Summary_Audio_Tutorial.ipynb
└── Whisper_Transcribe_YouTube_Video.ipynb
/ChromaDB/linux_doc/CentOS常用命令.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Crossme0809/langchain-tutorials/a6d5c3ccae728f0092e3709de16d4eb97d87907b/ChromaDB/linux_doc/CentOS常用命令.pdf
--------------------------------------------------------------------------------
/ChromaDB/linux_doc/Training GPT.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Crossme0809/langchain-tutorials/a6d5c3ccae728f0092e3709de16d4eb97d87907b/ChromaDB/linux_doc/Training GPT.pdf
--------------------------------------------------------------------------------
/ChromaDB/pets/Different Types of Pet Animals.txt:
--------------------------------------------------------------------------------
Pet animals come in all shapes and sizes, each suited to different lifestyles and home environments.
Dogs and cats are the most common, known for their companionship and unique personalities. Small mammals like hamsters, guinea pigs, and rabbits are often chosen for their low maintenance needs. Birds offer beauty and song, and reptiles like turtles and lizards can make intriguing pets. Even fish, with their calming presence, can be wonderful pets.
--------------------------------------------------------------------------------
/ChromaDB/pets/Health Care for Pets.txt:
--------------------------------------------------------------------------------
Routine health care is crucial for pets to live long, happy lives. Regular vet check-ups help catch potential issues early and keep vaccinations up to date. Dental care is also essential to prevent diseases in pets, especially in dogs and cats. Regular grooming, parasite control, and weight management are other important aspects of pet health care.
--------------------------------------------------------------------------------
/ChromaDB/pets/Nutrition Needs of Pet Animals.txt:
--------------------------------------------------------------------------------
Proper nutrition is vital for the health and wellbeing of pets. Dogs and cats require a balanced diet that includes proteins, carbohydrates, and fats. Some may even have specific dietary needs based on their breed or age. Birds typically thrive on a diet of seeds, fruits, and vegetables, while reptiles have diverse diets ranging from live insects to fresh produce. Fish diets depend greatly on the species, with some needing live food and others subsisting on flakes or pellets.
--------------------------------------------------------------------------------
/ChromaDB/pets/The Emotional Bond Between Humans and Pets.txt:
--------------------------------------------------------------------------------
Pets offer more than just companionship; they provide emotional support, reduce stress, and can even help their owners lead healthier lives.
The bond between pets and their owners is strong, and many people consider their pets as part of the family. This bond can be especially important in times of personal or societal stress, providing comfort and consistency.
--------------------------------------------------------------------------------
/ChromaDB/pets/Training and Behaviour of Pets.txt:
--------------------------------------------------------------------------------
Training is essential for a harmonious life with pets, particularly for dogs. It helps pets understand their boundaries and makes cohabitation easier for both pets and owners. Training should be based on positive reinforcement. Understanding pet behavior is also important, as changes in behavior can often be a sign of underlying health issues.
--------------------------------------------------------------------------------
/LangChain_Memory_ChatGPT_ChatBot.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "authorship_tag": "ABX9TyNSdmdJVhZ1ws10H0f3LEd4", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "source": [ 32 | "# **LangChain Memory with LLMs: Give Your Chatbot a Memory**\n", 33 | "## **Install Dependencies**" 34 | ], 35 | "metadata": { 36 | "id": "HsxCn86lHiUE" 37 | } 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": { 43 | "id": "D0kMwPWTC5Ei" 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "!pip install langchain\n", 48 | "!pip install openai\n", 49 | "!pip install tiktoken" 50 | ] 51 | }, 52 | { 53 | "cell_type":
"markdown", 54 | "source": [ 55 | "## **Set the OpenAI API Key**" 56 | ], 57 | "metadata": { 58 | "id": "VkTCqMb4HcOk" 59 | } 60 | }, 61 | { 62 | "cell_type": "code", 63 | "source": [ 64 | "import os\n", 65 | "os.environ['OPENAI_API_KEY'] = \"your-openai-api-key\"" 66 | ], 67 | "metadata": { 68 | "id": "dtCXIWGrEXFQ" 69 | }, 70 | "execution_count": 4, 71 | "outputs": [] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "source": [ 76 | "## **ConversationBufferMemory**" 77 | ], 78 | "metadata": { 79 | "id": "atwXVEqUHaYb" 80 | } 81 | }, 82 | { 83 | "cell_type": "code", 84 | "source": [ 85 | "from langchain.llms import OpenAI\n", 86 | "from langchain.chains import ConversationChain\n", 87 | "\n", 88 | "from langchain.memory import ConversationBufferMemory\n", 89 | "conversation_with_memory = ConversationChain(\n", 90 | "    llm=OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\")),\n", 91 | "    memory=ConversationBufferMemory(),\n", 92 | "    verbose=True\n", 93 | ")\n", 94 | "\n", 95 | "conversation_with_memory.predict(input=\"你好,我是Kevin\")\n", 96 | "conversation_with_memory.predict(input=\"我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\")\n", 97 | "conversation_with_memory.predict(input=\"我希望你能用我的名字为我的公众号设计一个专业名称\")\n", 98 | "conversation_with_memory.predict(input=\"你还可以给出更多选项吗\")" 99 | ], 100 | "metadata": { 101 | "colab": { 102 | "base_uri": "https://localhost:8080/", 103 | "height": 1000 104 | }, 105 | "id": "ATVOeoceDPMM", 106 | "outputId": "ade22033-9932-408c-9de1-4fb6094a5e78" 107 | }, 108 | "execution_count": 7, 109 | "outputs": [ 110 | { 111 | "output_type": "stream", 112 | "name": "stdout", 113 | "text": [ 114 | "\n", 115 | "\n", 116 | "\u001b[1m> Entering new chain...\u001b[0m\n", 117 | "Prompt after formatting:\n", 118 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 119 | "\n", 120 | "Current conversation:\n", 121 | "\n", 122 | "Human: 你好,我是Kevin\n", 123 | "AI:\u001b[0m\n", 124 | "\n", 125 | "\u001b[1m> Finished chain.\u001b[0m\n", 126 | "\n", 127 | "\n", 128 | "\u001b[1m> Entering new chain...\u001b[0m\n", 129 | "Prompt after formatting:\n", 130 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 131 | "\n", 132 | "Current conversation:\n", 133 | "Human: 你好,我是Kevin\n", 134 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 135 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 136 | "AI:\u001b[0m\n", 137 | "\n", 138 | "\u001b[1m> Finished chain.\u001b[0m\n", 139 | "\n", 140 | "\n", 141 | "\u001b[1m> Entering new chain...\u001b[0m\n", 142 | "Prompt after formatting:\n", 143 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 144 | "\n", 145 | "Current conversation:\n", 146 | "Human: 你好,我是Kevin\n", 147 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 148 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 149 | "AI: 哇,你真的很棒!我也很喜欢人工智能,我可以为你提供一些有关人工智能的信息,比如最新的研究成果,最新的技术发展,以及有关人工智能的最新新闻。\n", 150 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 151 | "AI:\u001b[0m\n", 152 | "\n", 153 | "\u001b[1m> Finished chain.\u001b[0m\n", 154 | "\n", 155 | "\n", 156 | "\u001b[1m> Entering new chain...\u001b[0m\n", 157 | "Prompt after formatting:\n", 158 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 159 | "\n", 160 | "Current conversation:\n", 161 | "Human: 你好,我是Kevin\n", 162 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 163 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 164 | "AI: 哇,你真的很棒!我也很喜欢人工智能,我可以为你提供一些有关人工智能的信息,比如最新的研究成果,最新的技术发展,以及有关人工智能的最新新闻。\n", 165 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 166 | "AI: 好的,我可以为你的公众号设计一个专业名称。你可以考虑一些以你的名字开头的名字,比如“Kevin的人工智能之旅”,或者“Kevin的AI世界”。你可以根据你的喜好来选择一个最适合你的名字。\n", 167 | "Human: 你还可以给出更多选项吗\n", 168 | "AI:\u001b[0m\n", 169 | "\n", 170 | "\u001b[1m> Finished chain.\u001b[0m\n" 171 | ] 172 | }, 173 | { 174 | "output_type": "execute_result", 175 | "data": { 176 | "text/plain": [ 177 | "' 当然可以!你可以考虑一些以你的名字开头的名字,比如“Kevin的AI探索”,“Kevin的AI探究”,“Kevin的AI探索者”,“Kevin的AI探索家”,“Kevin的AI探索之旅”,“Kevin的AI探索世界”,“Kevin的AI探索者之旅”,“Kevin的AI探索家之旅”,“Kevin的AI探索之路”,“Kevin的AI探索之道”等等。'" 178 | ], 179 | "application/vnd.google.colaboratory.intrinsic+json": { 180 | "type": "string" 181 | } 182 | }, 183 | "metadata": {}, 184 | "execution_count": 7 185 | } 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "source": [ 191 | "## **ConversationBufferWindowMemory**" 192 | ], 193 | "metadata": { 194 | "id": "0B7Iy2_ZG_xV" 195 | } 196 | }, 197 | { 198 | "cell_type": "code", 199 | "source": [ 200 | "from langchain.memory import ConversationBufferWindowMemory\n", 201 | "conversation_with_memory = ConversationChain(\n", 202 | "    llm=OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\")),\n", 203 | "    memory=ConversationBufferWindowMemory(k=2),  # windowed memory: keep only the last k=2 exchanges\n", 204 | "    verbose=True\n", 205 | ")\n", 206 | "conversation_with_memory.predict(input=\"你好,我是Kevin\")\n", 207 | "conversation_with_memory.predict(input=\"我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\")\n", 208 | "conversation_with_memory.predict(input=\"我希望你能用我的名字为我的公众号设计一个专业名称\")\n", 209 | "conversation_with_memory.predict(input=\"你还可以给出更多选项吗\")" 210 | ], 211 | "metadata": { 212 | "colab": { 213 | "base_uri":
"https://localhost:8080/", 214 | "height": 1000 215 | }, 216 | "id": "8DsBQvHpD3fO", 217 | "outputId": "d7fb9218-994c-4d6d-c377-92190d6f09bc" 218 | }, 219 | "execution_count": 8, 220 | "outputs": [ 221 | { 222 | "output_type": "stream", 223 | "name": "stdout", 224 | "text": [ 225 | "\n", 226 | "\n", 227 | "\u001b[1m> Entering new chain...\u001b[0m\n", 228 | "Prompt after formatting:\n", 229 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 230 | "\n", 231 | "Current conversation:\n", 232 | "\n", 233 | "Human: 你好,我是Kevin\n", 234 | "AI:\u001b[0m\n", 235 | "\n", 236 | "\u001b[1m> Finished chain.\u001b[0m\n", 237 | "\n", 238 | "\n", 239 | "\u001b[1m> Entering new chain...\u001b[0m\n", 240 | "Prompt after formatting:\n", 241 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 242 | "\n", 243 | "Current conversation:\n", 244 | "Human: 你好,我是Kevin\n", 245 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 246 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 247 | "AI:\u001b[0m\n", 248 | "\n", 249 | "\u001b[1m> Finished chain.\u001b[0m\n", 250 | "\n", 251 | "\n", 252 | "\u001b[1m> Entering new chain...\u001b[0m\n", 253 | "Prompt after formatting:\n", 254 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 255 | "\n", 256 | "Current conversation:\n", 257 | "Human: 你好,我是Kevin\n", 258 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 259 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 260 | "AI: 哇,你真的很棒!我也很喜欢人工智能,它可以帮助我们解决很多有趣的问题。你最喜欢的人工智能领域是什么?\n", 261 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 262 | "AI:\u001b[0m\n", 263 | "\n", 264 | "\u001b[1m> Finished chain.\u001b[0m\n", 265 | "\n", 266 | "\n", 267 | "\u001b[1m> Entering new chain...\u001b[0m\n", 268 | "Prompt after formatting:\n", 269 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 270 | "\n", 271 | "Current conversation:\n", 272 | "Human: 你好,我是Kevin\n", 273 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 274 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 275 | "AI: 哇,你真的很棒!我也很喜欢人工智能,它可以帮助我们解决很多有趣的问题。你最喜欢的人工智能领域是什么?\n", 276 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 277 | "AI: 好的,我可以帮你设计一个专业的名称,你想要什么样的名称?\n", 278 | "Human: 你还可以给出更多选项吗\n", 279 | "AI:\u001b[0m\n", 280 | "\n", 281 | "\u001b[1m> Finished chain.\u001b[0m\n" 282 | ] 283 | }, 284 | { 285 | "output_type": "execute_result", 286 | "data": { 287 | "text/plain": [ 288 | "' 当然可以!我可以给你提供更多的选项,你有什么特别的要求吗?'" 289 | ], 290 | "application/vnd.google.colaboratory.intrinsic+json": { 291 | "type": "string" 292 | } 293 | }, 294 | "metadata": {}, 295 | "execution_count": 8 296 | } 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "source": [ 302 | "## **ConversationTokenBufferMemory**" 303 | ], 304 | "metadata": { 305 | "id": "-HVCCsKDHUgP" 306 | } 307 | }, 308 | { 309 | "cell_type": "code", 310 | "source": [ 311 | "from langchain.llms import OpenAI\n", 312 | "from langchain.memory import ConversationTokenBufferMemory\n", 313 | "from langchain.chains import ConversationChain\n", 
314 | "llm=OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\"))\n", 315 | "conversation_with_memory = ConversationChain(\n", 316 | " llm=llm,\n", 317 | " memory=ConversationTokenBufferMemory(llm=llm,max_token_limit=60),\n", 318 | " verbose=True\n", 319 | ")\n", 320 | "conversation_with_memory.predict(input=\"你好,我是Kevin\")\n", 321 | "conversation_with_memory.predict(input=\"我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\")\n", 322 | "conversation_with_memory.predict(input=\"我希望你能用我的名字为我的公众号设计一个专业名称\")\n", 323 | "conversation_with_memory.predict(input=\"你还可以给出更多选项吗\")" 324 | ], 325 | "metadata": { 326 | "colab": { 327 | "base_uri": "https://localhost:8080/", 328 | "height": 913 329 | }, 330 | "id": "s7I969srHTC1", 331 | "outputId": "3d74ec4d-0cd8-42a4-8414-f77242165188" 332 | }, 333 | "execution_count": 11, 334 | "outputs": [ 335 | { 336 | "output_type": "stream", 337 | "name": "stdout", 338 | "text": [ 339 | "\n", 340 | "\n", 341 | "\u001b[1m> Entering new chain...\u001b[0m\n", 342 | "Prompt after formatting:\n", 343 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 344 | "\n", 345 | "Current conversation:\n", 346 | "\n", 347 | "Human: 你好,我是Kevin\n", 348 | "AI:\u001b[0m\n", 349 | "\n", 350 | "\u001b[1m> Finished chain.\u001b[0m\n", 351 | "\n", 352 | "\n", 353 | "\u001b[1m> Entering new chain...\u001b[0m\n", 354 | "Prompt after formatting:\n", 355 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 356 | "\n", 357 | "Current conversation:\n", 358 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 359 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 360 | "AI:\u001b[0m\n", 361 | "\n", 362 | "\u001b[1m> Finished chain.\u001b[0m\n", 363 | "\n", 364 | "\n", 365 | "\u001b[1m> Entering new chain...\u001b[0m\n", 366 | "Prompt after formatting:\n", 367 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 368 | "\n", 369 | "Current conversation:\n", 370 | "\n", 371 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 372 | "AI:\u001b[0m\n", 373 | "\n", 374 | "\u001b[1m> Finished chain.\u001b[0m\n", 375 | "\n", 376 | "\n", 377 | "\u001b[1m> Entering new chain...\u001b[0m\n", 378 | "Prompt after formatting:\n", 379 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 380 | "\n", 381 | "Current conversation:\n", 382 | "\n", 383 | "Human: 你还可以给出更多选项吗\n", 384 | "AI:\u001b[0m\n", 385 | "\n", 386 | "\u001b[1m> Finished chain.\u001b[0m\n" 387 | ] 388 | }, 389 | { 390 | "output_type": "execute_result", 391 | "data": { 392 | "text/plain": [ 393 | "' 当然可以!我可以提供更多的选项,以便您可以更好地做出决定。我可以提供更多的信息,以便您可以更好地了解每个选项的优缺点。'" 394 | ], 395 | "application/vnd.google.colaboratory.intrinsic+json": { 396 | "type": "string" 397 | } 398 | }, 399 | "metadata": {}, 400 | "execution_count": 11 401 | } 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "source": [ 407 | "## **ConversationSummaryMemory**" 408 | ], 409 | "metadata": { 410 | "id": "C4VMWggoIaaf" 411 | } 412 | }, 413 | { 414 | "cell_type": "code", 415 | "source": [ 416 | "from langchain.llms import OpenAI\n", 417 | "from langchain.memory import ConversationSummaryMemory\n", 418 | "from langchain.chains import ConversationChain\n", 419 | "llm=OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\"))\n", 420 | "conversation_with_memory = ConversationChain(\n", 421 | " llm=llm,\n", 422 | " memory=ConversationSummaryMemory(llm=llm),\n", 423 | " verbose=True\n", 424 | ")\n", 425 | "conversation_with_memory.predict(input=\"你好,我是Kevin\")\n", 426 | "conversation_with_memory.predict(input=\"我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\")\n", 427 | "conversation_with_memory.predict(input=\"我希望你能用我的名字为我的公众号设计一个专业名称\")\n", 428 | "conversation_with_memory.predict(input=\"你还可以给出更多选项吗\")" 429 | ], 430 | "metadata": { 431 | "colab": { 432 | "base_uri": "https://localhost:8080/", 433 | "height": 966 434 | }, 435 | "id": "c-clOUj6IZrM", 436 | "outputId": "20b6286a-d95b-48b3-c41c-44ab0cee9186" 437 | }, 438 | "execution_count": 12, 439 | "outputs": [ 440 | { 441 | "output_type": "stream", 442 | "name": "stdout", 443 | "text": [ 444 | "\n", 445 | "\n", 446 | "\u001b[1m> Entering new chain...\u001b[0m\n", 447 | "Prompt 
after formatting:\n", 448 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 449 | "\n", 450 | "Current conversation:\n", 451 | "\n", 452 | "Human: 你好,我是Kevin\n", 453 | "AI:\u001b[0m\n", 454 | "\n", 455 | "\u001b[1m> Finished chain.\u001b[0m\n", 456 | "\n", 457 | "\n", 458 | "\u001b[1m> Entering new chain...\u001b[0m\n", 459 | "Prompt after formatting:\n", 460 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 461 | "\n", 462 | "Current conversation:\n", 463 | "\n", 464 | "The human introduces himself as Kevin and the AI introduces itself as an AI and expresses pleasure in meeting Kevin. The AI then asks what it can do for Kevin.\n", 465 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 466 | "AI:\u001b[0m\n", 467 | "\n", 468 | "\u001b[1m> Finished chain.\u001b[0m\n", 469 | "\n", 470 | "\n", 471 | "\u001b[1m> Entering new chain...\u001b[0m\n", 472 | "Prompt after formatting:\n", 473 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 474 | "\n", 475 | "Current conversation:\n", 476 | "\n", 477 | "The human introduces himself as Kevin and the AI introduces itself as an AI and expresses pleasure in meeting Kevin. 
The AI then asks what it can do for Kevin, to which Kevin responds that he is an AI enthusiast who enjoys sharing knowledge about AI through public channels.\n", 478 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 479 | "AI:\u001b[0m\n", 480 | "\n", 481 | "\u001b[1m> Finished chain.\u001b[0m\n", 482 | "\n", 483 | "\n", 484 | "\u001b[1m> Entering new chain...\u001b[0m\n", 485 | "Prompt after formatting:\n", 486 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 487 | "\n", 488 | "Current conversation:\n", 489 | "\n", 490 | "The human introduces himself as Kevin and the AI introduces itself as an AI and expresses pleasure in meeting Kevin. The AI then asks what it can do for Kevin, to which Kevin responds that he is an AI enthusiast who enjoys sharing knowledge about AI through public channels. 
Kevin then requests that the AI design a professional name for his public channel, to which the AI agrees and asks if Kevin has any ideas or specific keywords or themes he would like to include in the name.\n", 491 | "Human: 你还可以给出更多选项吗\n", 492 | "AI:\u001b[0m\n", 493 | "\n", 494 | "\u001b[1m> Finished chain.\u001b[0m\n" 495 | ] 496 | }, 497 | { 498 | "output_type": "execute_result", 499 | "data": { 500 | "text/plain": [ 501 | "' 当然可以!我可以根据你的关键词和主题来提供更多的选项。你有什么想法或者关键词或主题想要包含在你的公共频道名称中吗?'" 502 | ], 503 | "application/vnd.google.colaboratory.intrinsic+json": { 504 | "type": "string" 505 | } 506 | }, 507 | "metadata": {}, 508 | "execution_count": 12 509 | } 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "source": [ 515 | "## **ConversationSummaryBufferMemory**" 516 | ], 517 | "metadata": { 518 | "id": "yOsbrvKHJtgV" 519 | } 520 | }, 521 | { 522 | "cell_type": "code", 523 | "source": [ 524 | "from langchain.llms import OpenAI\n", 525 | "from langchain.memory import ConversationSummaryBufferMemory\n", 526 | "from langchain.chains import ConversationChain\n", 527 | "llm=OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\"))\n", 528 | "conversation_with_memory = ConversationChain(\n", 529 | " llm=llm,\n", 530 | " memory=ConversationSummaryBufferMemory(llm=llm,max_token_limit=60),\n", 531 | " verbose=True\n", 532 | ")\n", 533 | "conversation_with_memory.predict(input=\"你好,我是Kevin\")\n", 534 | "conversation_with_memory.predict(input=\"我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\")\n", 535 | "conversation_with_memory.predict(input=\"我希望你能用我的名字为我的公众号设计一个专业名称\")\n", 536 | "conversation_with_memory.predict(input=\"你还可以给出更多选项吗\")" 537 | ], 538 | "metadata": { 539 | "colab": { 540 | "base_uri": "https://localhost:8080/", 541 | "height": 1000 542 | }, 543 | "id": "Aqj7T_jUJttV", 544 | "outputId": "691a9015-b9ba-4b7b-85b4-8f3587854df0" 545 | }, 546 | "execution_count": 13, 547 | "outputs": [ 548 | { 549 | "output_type": "stream", 550 | "name": "stdout", 551 | 
"text": [ 552 | "\n", 553 | "\n", 554 | "\u001b[1m> Entering new chain...\u001b[0m\n", 555 | "Prompt after formatting:\n", 556 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 557 | "\n", 558 | "Current conversation:\n", 559 | "\n", 560 | "Human: 你好,我是Kevin\n", 561 | "AI:\u001b[0m\n", 562 | "\n", 563 | "\u001b[1m> Finished chain.\u001b[0m\n", 564 | "\n", 565 | "\n", 566 | "\u001b[1m> Entering new chain...\u001b[0m\n", 567 | "Prompt after formatting:\n", 568 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 569 | "\n", 570 | "Current conversation:\n", 571 | "System: \n", 572 | "The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential. Kevin introduces himself.\n", 573 | "AI: 你好Kevin,我是一个AI,很高兴认识你!我可以为你做什么?\n", 574 | "Human: 我是一个人工智能爱好者,喜欢通过公众号分享人工智能领域相关的知识\n", 575 | "AI:\u001b[0m\n", 576 | "\n", 577 | "\u001b[1m> Finished chain.\u001b[0m\n", 578 | "\n", 579 | "\n", 580 | "\u001b[1m> Entering new chain...\u001b[0m\n", 581 | "Prompt after formatting:\n", 582 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 583 | "\n", 584 | "Current conversation:\n", 585 | "System: \n", 586 | "The human asks what the AI thinks of artificial intelligence. 
The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential. Kevin introduces himself and the AI expresses its enthusiasm for artificial intelligence and its willingness to listen to Kevin's knowledge.\n", 587 | "Human: 我希望你能用我的名字为我的公众号设计一个专业名称\n", 588 | "AI:\u001b[0m\n", 589 | "\n", 590 | "\u001b[1m> Finished chain.\u001b[0m\n", 591 | "\n", 592 | "\n", 593 | "\u001b[1m> Entering new chain...\u001b[0m\n", 594 | "Prompt after formatting:\n", 595 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 596 | "\n", 597 | "Current conversation:\n", 598 | "System: \n", 599 | "\n", 600 | "The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential. Kevin introduces himself and the AI expresses its enthusiasm for artificial intelligence and its willingness to listen to Kevin's knowledge. 
The AI then offers to help Kevin design a professional name for his public account and suggests some ideas or allows Kevin to come up with his own name.\n", 601 | "Human: 你还可以给出更多选项吗\n", 602 | "AI:\u001b[0m\n", 603 | "\n", 604 | "\u001b[1m> Finished chain.\u001b[0m\n" 605 | ] 606 | }, 607 | { 608 | "output_type": "execute_result", 609 | "data": { 610 | "text/plain": [ 611 | "' 当然可以!我可以给你一些关于你的公共账户名称的想法,比如“Kevin的科技”,“Kevin的创新”,“Kevin的技术”,“Kevin的AI”,或者你可以自己想一个名字。'" 612 | ], 613 | "application/vnd.google.colaboratory.intrinsic+json": { 614 | "type": "string" 615 | } 616 | }, 617 | "metadata": {}, 618 | "execution_count": 13 619 | } 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "source": [ 625 | "## **Entity Memory**" 626 | ], 627 | "metadata": { 628 | "id": "RViu746eLlsh" 629 | } 630 | }, 631 | { 632 | "cell_type": "code", 633 | "source": [ 634 | "from langchain.llms import OpenAI\n", 635 | "from langchain.memory import ConversationEntityMemory\n", 636 | "from langchain.chains import ConversationChain\n", 637 | "from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE\n", 638 | "llm = OpenAI(temperature=0,openai_api_key=os.getenv(\"OPENAI_API_KEY\"))\n", 639 | "conversation_with_memory = ConversationChain(\n", 640 | " llm=llm,\n", 641 | " verbose=True,\n", 642 | " prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,\n", 643 | " memory=ConversationEntityMemory(llm=llm)\n", 644 | ")\n", 645 | "conversation_with_memory.predict(input=\"Kevin和QinFen是好朋友,但他们不在同一家公司工作\")\n", 646 | "conversation_with_memory.predict(input=\"他们都很喜欢学习,QinFen最近做了一个减肥计划\")\n", 647 | "conversation_with_memory.predict(input=\"最近QinFen正在准备公司的人工智能创新案例计划,拿到名字希望能跟Kevin出去庆祝\")\n", 648 | "conversation_with_memory.predict(input=\"你对Kevin和QinFen了解多少呢?\")" 649 | ], 650 | "metadata": { 651 | "colab": { 652 | "base_uri": "https://localhost:8080/", 653 | "height": 1000 654 | }, 655 | "id": "1ZIAIqiHLl90", 656 | "outputId": "60836454-dece-4751-a897-39ddaacbceea" 657 | }, 658 | 
"execution_count": 14, 659 | "outputs": [ 660 | { 661 | "output_type": "stream", 662 | "name": "stdout", 663 | "text": [ 664 | "\n", 665 | "\n", 666 | "\u001b[1m> Entering new chain...\u001b[0m\n", 667 | "Prompt after formatting:\n", 668 | "\u001b[32;1m\u001b[1;3mYou are an assistant to a human, powered by a large language model trained by OpenAI.\n", 669 | "\n", 670 | "You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n", 671 | "\n", 672 | "You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics.\n", 673 | "\n", 674 | "Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. 
Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist.\n", 675 | "\n", 676 | "Context:\n", 677 | "{'Kevin': '', 'QinFen': ''}\n", 678 | "\n", 679 | "Current conversation:\n", 680 | "\n", 681 | "Last line:\n", 682 | "Human: Kevin和QinFen是好朋友,但他们不在同一家公司工作\n", 683 | "You:\u001b[0m\n", 684 | "\n", 685 | "\u001b[1m> Finished chain.\u001b[0m\n", 686 | "\n", 687 | "\n", 688 | "\u001b[1m> Entering new chain...\u001b[0m\n", 689 | "Prompt after formatting:\n", 690 | "\u001b[32;1m\u001b[1;3mYou are an assistant to a human, powered by a large language model trained by OpenAI.\n", 691 | "\n", 692 | "You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n", 693 | "\n", 694 | "You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics.\n", 695 | "\n", 696 | "Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. 
Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist.\n", 697 | "\n", 698 | "Context:\n", 699 | "{'Kevin': 'Kevin是一个好朋友,他和QinFen经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。', 'QinFen': 'QinFen是Kevin的好朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。'}\n", 700 | "\n", 701 | "Current conversation:\n", 702 | "Human: Kevin和QinFen是好朋友,但他们不在同一家公司工作\n", 703 | "AI: 是的,Kevin和QinFen是很好的朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。\n", 704 | "Last line:\n", 705 | "Human: 他们都很喜欢学习,QinFen最近做了一个减肥计划\n", 706 | "You:\u001b[0m\n", 707 | "\n", 708 | "\u001b[1m> Finished chain.\u001b[0m\n", 709 | "\n", 710 | "\n", 711 | "\u001b[1m> Entering new chain...\u001b[0m\n", 712 | "Prompt after formatting:\n", 713 | "\u001b[32;1m\u001b[1;3mYou are an assistant to a human, powered by a large language model trained by OpenAI.\n", 714 | "\n", 715 | "You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n", 716 | "\n", 717 | "You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. 
Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics.\n", 718 | "\n", 719 | "Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist.\n", 720 | "\n", 721 | "Context:\n", 722 | "{'Kevin': 'Kevin是一个好朋友,他和QinFen经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。他们都很喜欢学习。', 'QinFen': 'QinFen是Kevin的好朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。QinFen很喜欢学习,并且最近也开始了一个减肥计划,希望他能够成功减肥,变得更健康。'}\n", 723 | "\n", 724 | "Current conversation:\n", 725 | "Human: Kevin和QinFen是好朋友,但他们不在同一家公司工作\n", 726 | "AI: 是的,Kevin和QinFen是很好的朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。\n", 727 | "Human: 他们都很喜欢学习,QinFen最近做了一个减肥计划\n", 728 | "AI: 是的,Kevin和QinFen都很喜欢学习,而且QinFen最近也开始了一个减肥计划,希望他能够成功减肥,变得更健康。\n", 729 | "Last line:\n", 730 | "Human: 最近QinFen正在准备公司的人工智能创新案例计划,拿到名字希望能跟Kevin出去庆祝\n", 731 | "You:\u001b[0m\n", 732 | "\n", 733 | "\u001b[1m> Finished chain.\u001b[0m\n", 734 | "\n", 735 | "\n", 736 | "\u001b[1m> Entering new chain...\u001b[0m\n", 737 | "Prompt after formatting:\n", 738 | "\u001b[32;1m\u001b[1;3mYou are an assistant to a human, powered by a large language model trained by OpenAI.\n", 739 | "\n", 740 | "You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n", 741 | "\n", 742 | "You are constantly learning and improving, and your capabilities are constantly evolving. 
You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics.\n", 743 | "\n", 744 | "Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist.\n", 745 | "\n", 746 | "Context:\n", 747 | "{'Kevin': 'Kevin是一个好朋友,他和QinFen经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。他们都很喜欢学习,而且QinFen最近正在准备公司的人工智能创新案例计划,希望他能拿到名字,然后和Kevin一起出去庆祝,一起享受这个成功的时刻。', 'QinFen': 'QinFen是Kevin的好朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。QinFen很喜欢学习,并且最近也开始了一个减肥计划,希望他能够成功减肥,变得更健康,同时也正在准备公司的人工智能创新案例计划,希望拿到名字后能和Kevin一起出去庆'}\n", 748 | "\n", 749 | "Current conversation:\n", 750 | "Human: Kevin和QinFen是好朋友,但他们不在同一家公司工作\n", 751 | "AI: 是的,Kevin和QinFen是很好的朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。\n", 752 | "Human: 他们都很喜欢学习,QinFen最近做了一个减肥计划\n", 753 | "AI: 是的,Kevin和QinFen都很喜欢学习,而且QinFen最近也开始了一个减肥计划,希望他能够成功减肥,变得更健康。\n", 754 | "Human: 最近QinFen正在准备公司的人工智能创新案例计划,拿到名字希望能跟Kevin出去庆祝\n", 755 | "AI: 是的,QinFen最近正在准备公司的人工智能创新案例计划,希望他能拿到名字,然后和Kevin一起出去庆祝,一起享受这个成功的时刻。\n", 756 | "Last line:\n", 757 | "Human: 你对Kevin和QinFen了解多少呢?\n", 758 | "You:\u001b[0m\n", 759 | "\n", 760 | "\u001b[1m> Finished chain.\u001b[0m\n" 761 | ] 762 | }, 763 | { 764 | "output_type": "execute_result", 765 | "data": { 766 | "text/plain": [ 767 | "' 我知道Kevin和QinFen是很好的朋友,他们经常一起出去玩,但是由于他们在不同的公司工作,他们没有太多机会一起工作。我也知道他们都很喜欢学习,而且QinFen最近也开始了一个减肥计划,希望他能够成功减肥,变得更健康,同时也正在准备公司的人工智能创新案例计划,希望'" 768 | ], 769 | 
"application/vnd.google.colaboratory.intrinsic+json": { 770 | "type": "string" 771 | } 772 | }, 773 | "metadata": {}, 774 | "execution_count": 14 775 | } 776 | ] 777 | }, 778 | { 779 | "cell_type": "markdown", 780 | "source": [ 781 | "## **Building a Chatbot with Memory**" 782 | ], 783 | "metadata": { 784 | "id": "cHzlZa0mMnom" 785 | } 786 | }, 787 | { 788 | "cell_type": "code", 789 | "source": [ 790 | "!pip install streamlit streamlit_chat" 791 | ], 792 | "metadata": { 793 | "colab": { 794 | "base_uri": "https://localhost:8080/" 795 | }, 796 | "id": "2vY5EA3dNGNr", 797 | "outputId": "054ee2fe-5fa8-4068-bb64-ee6d90475364" 798 | }, 799 | "execution_count": 16, 800 | "outputs": [ 801 | { 802 | "output_type": "stream", 803 | "name": "stdout", 804 | "text": [ 805 | "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", 806 | "Collecting streamlit\n", 807 | " Downloading streamlit-1.23.1-py2.py3-none-any.whl (8.9 MB)\n", 808 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.9/8.9 MB\u001b[0m \u001b[31m119.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 809 | "\u001b[?25hCollecting streamlit_chat\n", 810 | " Downloading streamlit_chat-0.1.0-py3-none-any.whl (2.9 MB)\n", 811 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.9/2.9 MB\u001b[0m \u001b[31m98.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 812 | "\u001b[?25hRequirement already satisfied: altair<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (4.2.2)\n", 813 | "Collecting blinker<2,>=1.0.0 (from streamlit)\n", 814 | " Downloading blinker-1.6.2-py3-none-any.whl (13 kB)\n", 815 | "Requirement already satisfied: cachetools<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (5.3.1)\n", 816 | "Requirement already satisfied: click<9,>=7.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (8.1.3)\n", 817 | "Collecting importlib-metadata<7,>=1.4 (from streamlit)\n",
818 | " Downloading importlib_metadata-6.7.0-py3-none-any.whl (22 kB)\n", 819 | "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from streamlit) (1.22.4)\n", 820 | "Requirement already satisfied: packaging<24,>=14.1 in /usr/local/lib/python3.10/dist-packages (from streamlit) (23.1)\n", 821 | "Requirement already satisfied: pandas<3,>=0.25 in /usr/local/lib/python3.10/dist-packages (from streamlit) (1.5.3)\n", 822 | "Requirement already satisfied: pillow<10,>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (8.4.0)\n", 823 | "Requirement already satisfied: protobuf<5,>=3.20 in /usr/local/lib/python3.10/dist-packages (from streamlit) (3.20.3)\n", 824 | "Requirement already satisfied: pyarrow>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (9.0.0)\n", 825 | "Collecting pympler<2,>=0.9 (from streamlit)\n", 826 | " Downloading Pympler-1.0.1-py3-none-any.whl (164 kB)\n", 827 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m164.8/164.8 kB\u001b[0m \u001b[31m21.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 828 | "\u001b[?25hRequirement already satisfied: python-dateutil<3,>=2 in /usr/local/lib/python3.10/dist-packages (from streamlit) (2.8.2)\n", 829 | "Requirement already satisfied: requests<3,>=2.4 in /usr/local/lib/python3.10/dist-packages (from streamlit) (2.27.1)\n", 830 | "Requirement already satisfied: rich<14,>=10.11.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (13.4.2)\n", 831 | "Requirement already satisfied: tenacity<9,>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from streamlit) (8.2.2)\n", 832 | "Requirement already satisfied: toml<2 in /usr/local/lib/python3.10/dist-packages (from streamlit) (0.10.2)\n", 833 | "Requirement already satisfied: typing-extensions<5,>=4.0.1 in /usr/local/lib/python3.10/dist-packages (from streamlit) (4.6.3)\n", 834 | "Collecting tzlocal<5,>=1.1 (from streamlit)\n", 835 | " Downloading 
tzlocal-4.3.1-py3-none-any.whl (20 kB)\n", 836 | "Collecting validators<1,>=0.2 (from streamlit)\n", 837 | " Downloading validators-0.20.0.tar.gz (30 kB)\n", 838 | " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 839 | "Collecting gitpython!=3.1.19,<4,>=3 (from streamlit)\n", 840 | " Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)\n", 841 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m184.3/184.3 kB\u001b[0m \u001b[31m22.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 842 | "\u001b[?25hCollecting pydeck<1,>=0.1.dev5 (from streamlit)\n", 843 | " Downloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)\n", 844 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.8/4.8 MB\u001b[0m \u001b[31m128.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 845 | "\u001b[?25hRequirement already satisfied: tornado<7,>=6.0.3 in /usr/local/lib/python3.10/dist-packages (from streamlit) (6.3.1)\n", 846 | "Collecting watchdog (from streamlit)\n", 847 | " Downloading watchdog-3.0.0-py3-none-manylinux2014_x86_64.whl (82 kB)\n", 848 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m82.1/82.1 kB\u001b[0m \u001b[31m11.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 849 | "\u001b[?25hRequirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit) (0.4)\n", 850 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit) (3.1.2)\n", 851 | "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit) (4.3.3)\n", 852 | "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit) (0.12.0)\n", 853 | "Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3->streamlit)\n", 854 | " Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)\n", 855 | 
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 856 | "\u001b[?25hRequirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata<7,>=1.4->streamlit) (3.15.0)\n", 857 | "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3,>=0.25->streamlit) (2022.7.1)\n", 858 | "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil<3,>=2->streamlit) (1.16.0)\n", 859 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.4->streamlit) (1.26.16)\n", 860 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.4->streamlit) (2023.5.7)\n", 861 | "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.4->streamlit) (2.0.12)\n", 862 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.4->streamlit) (3.4)\n", 863 | "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich<14,>=10.11.0->streamlit) (3.0.0)\n", 864 | "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich<14,>=10.11.0->streamlit) (2.14.0)\n", 865 | "Collecting pytz-deprecation-shim (from tzlocal<5,>=1.1->streamlit)\n", 866 | " Downloading pytz_deprecation_shim-0.1.0.post0-py2.py3-none-any.whl (15 kB)\n", 867 | "Requirement already satisfied: decorator>=3.4.0 in /usr/local/lib/python3.10/dist-packages (from validators<1,>=0.2->streamlit) (4.4.2)\n", 868 | "Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.19,<4,>=3->streamlit)\n", 869 | " Downloading smmap-5.0.0-py3-none-any.whl (24 kB)\n", 870 | 
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->altair<6,>=4.0->streamlit) (2.1.3)\n", 871 | "Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit) (23.1.0)\n", 872 | "Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit) (0.19.3)\n", 873 | "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich<14,>=10.11.0->streamlit) (0.1.2)\n", 874 | "Collecting tzdata (from pytz-deprecation-shim->tzlocal<5,>=1.1->streamlit)\n", 875 | " Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)\n", 876 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.8/341.8 kB\u001b[0m \u001b[31m39.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 877 | "\u001b[?25hBuilding wheels for collected packages: validators\n", 878 | " Building wheel for validators (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", 879 | " Created wheel for validators: filename=validators-0.20.0-py3-none-any.whl size=19579 sha256=f9df8603deb8672e11009a84d13c2adba19f602307b2a22365a2581a3727585a\n", 880 | " Stored in directory: /root/.cache/pip/wheels/f2/ed/dd/d3a556ad245ef9dc570c6bcd2f22886d17b0b408dd3bbb9ac3\n", 881 | "Successfully built validators\n", 882 | "Installing collected packages: watchdog, validators, tzdata, smmap, pympler, importlib-metadata, blinker, pytz-deprecation-shim, pydeck, gitdb, tzlocal, gitpython, streamlit, streamlit_chat\n", 883 | " Attempting uninstall: tzlocal\n", 884 | " Found existing installation: tzlocal 5.0.1\n", 885 | " Uninstalling tzlocal-5.0.1:\n", 886 | " Successfully uninstalled tzlocal-5.0.1\n", 887 | "Successfully installed blinker-1.6.2 gitdb-4.0.10 gitpython-3.1.31 importlib-metadata-6.7.0 pydeck-0.8.1b0 pympler-1.0.1 pytz-deprecation-shim-0.1.0.post0 smmap-5.0.0 streamlit-1.23.1 streamlit_chat-0.1.0 tzdata-2023.3 tzlocal-4.3.1 validators-0.20.0 watchdog-3.0.0\n" 888 | ] 889 | } 890 | ] 891 | }, 892 | { 893 | "cell_type": "code", 894 | "source": [ 895 | "from langchain.chat_models import ChatOpenAI\n", 896 | "from langchain.chains import ConversationChain\n", 897 | "from langchain.chains.conversation.memory import ConversationBufferWindowMemory\n", 898 | "import os\n", 899 | "import sys\n", 900 | "\n", 901 | "def message(msg, is_user=False, key=None):\n", 902 | " if is_user:\n", 903 | " print(f\"User: {msg}\")\n", 904 | " else:\n", 905 | " print(f\"Bot: {msg}\")\n", 906 | "\n", 907 | "print(\"Smart ChatBot\")\n", 908 | "\n", 909 | "if 'responses' not in globals():\n", 910 | " responses = []\n", 911 | "\n", 912 | "if 'requests' not in globals():\n", 913 | " requests = []\n", 914 | "\n", 915 | "llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", openai_api_key=os.getenv(\"OPENAI_API_KEY\"))\n", 916 | "if 'buffer_memory' not in globals():\n", 917 | " buffer_memory = ConversationBufferWindowMemory(k=3)\n", 918 | "\n", 919 | 
"conversation = ConversationChain(\n", 920 | " llm=llm,\n", 921 | " memory=buffer_memory,\n", 922 | " verbose=True\n", 923 | ")\n", 924 | "\n", 925 | "while True:\n", 926 | " query = input(\"Query: \")\n", 927 | " if query.lower() == \"quit\":\n", 928 | " break\n", 929 | " response = conversation.run(query)\n", 930 | " requests.append(query)\n", 931 | " responses.append(response)\n", 932 | " message(query, is_user=True)\n", 933 | " message(response)" 934 | ], 935 | "metadata": { 936 | "colab": { 937 | "base_uri": "https://localhost:8080/" 938 | }, 939 | "id": "m3K4yplISfN_", 940 | "outputId": "2a13d545-9c4c-443d-8c15-9cbea37e4046" 941 | }, 942 | "execution_count": 24, 943 | "outputs": [ 944 | { 945 | "name": "stdout", 946 | "output_type": "stream", 947 | "text": [ 948 | "Smart ChatBot\n", 949 | "Query: 我是Kevin,我希望你能用我的名字为我的公众号设计一个专业名称\n", 950 | "\n", 951 | "\n", 952 | "\u001b[1m> Entering new chain...\u001b[0m\n", 953 | "Prompt after formatting:\n", 954 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 955 | "\n", 956 | "Current conversation:\n", 957 | "\n", 958 | "Human: 我是Kevin,我希望你能用我的名字为我的公众号设计一个专业名称\n", 959 | "AI:\u001b[0m\n", 960 | "\n", 961 | "\u001b[1m> Finished chain.\u001b[0m\n", 962 | "User: 我是Kevin,我希望你能用我的名字为我的公众号设计一个专业名称\n", 963 | "Bot: 你好,Kevin!当然可以。我可以为你创建一个专业的公众号名称。请告诉我一些有关你的公众号的信息,例如你的目标受众,你的主题以及你的品牌价值观。这些信息将有助于我更好地理解你的需求并提供更好的建议。\n", 964 | "Query: 你还可以给出更多选项吗\n", 965 | "\n", 966 | "\n", 967 | "\u001b[1m> Entering new chain...\u001b[0m\n", 968 | "Prompt after formatting:\n", 969 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.\n", 970 | "\n", 971 | "Current conversation:\n", 972 | "Human: 我是Kevin,我希望你能用我的名字为我的公众号设计一个专业名称\n", 973 | "AI: 你好,Kevin!当然可以。我可以为你创建一个专业的公众号名称。请告诉我一些有关你的公众号的信息,例如你的目标受众,你的主题以及你的品牌价值观。这些信息将有助于我更好地理解你的需求并提供更好的建议。\n", 974 | "Human: 你还可以给出更多选项吗\n", 975 | "AI:\u001b[0m\n", 976 | "\n", 977 | "\u001b[1m> Finished chain.\u001b[0m\n", 978 | "User: 你还可以给出更多选项吗\n", 979 | "Bot: 当然可以,我可以提供许多不同的选项,包括基于你的目标受众、主题和品牌价值观的专业名称。我还可以提供一些示例以帮助你更好地了解可能的选项。请让我知道你需要什么帮助,我会尽力满足你的需求。\n", 980 | "Query: 目标是程序员技术群体\n", 981 | "\n", 982 | "\n", 983 | "\u001b[1m> Entering new chain...\u001b[0m\n", 984 | "Prompt after formatting:\n", 985 | "\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n", 986 | "\n", 987 | "Current conversation:\n", 988 | "Human: 我是Kevin,我希望你能用我的名字为我的公众号设计一个专业名称\n", 989 | "AI: 你好,Kevin!当然可以。我可以为你创建一个专业的公众号名称。请告诉我一些有关你的公众号的信息,例如你的目标受众,你的主题以及你的品牌价值观。这些信息将有助于我更好地理解你的需求并提供更好的建议。\n", 990 | "Human: 你还可以给出更多选项吗\n", 991 | "AI: 当然可以,我可以提供许多不同的选项,包括基于你的目标受众、主题和品牌价值观的专业名称。我还可以提供一些示例以帮助你更好地了解可能的选项。请让我知道你需要什么帮助,我会尽力满足你的需求。\n", 992 | "Human: 目标是程序员技术群体\n", 993 | "AI:\u001b[0m\n", 994 | "\n", 995 | "\u001b[1m> Finished chain.\u001b[0m\n", 996 | "User: 目标是程序员技术群体\n", 997 | "Bot: 明白了,你的目标受众是程序员技术群体。基于这个目标受众,我可以提供一些专业名称选项,例如CodeMaster、TechGenius、ProgrammerCentral、TechSavvy等等。你觉得这些选项中有哪些最符合你的品牌价值观和主题呢?如果你有任何其他想法或要求,请告诉我,我会尽力帮助你。\n", 998 | "Query: quit\n" 999 | ] 1000 | } 1001 | ] 1002 | } 1003 | ] 1004 | } -------------------------------------------------------------------------------- /LangChain_Prompt_Templates.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "view-in-github", 7 | 
"colab_type": "text" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "id": "KiO3ugrESTBx" 17 | }, 18 | "source": [ 19 | "# **Prompt Engineering**\n", 20 | "\n", 21 | "在这个笔记本中,我们将探索`Prompt Engineering`的基础知识。我们将从安装所需的库开始。" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": { 28 | "id": "66bm5J0ISTBz", 29 | "colab": { 30 | "base_uri": "https://localhost:8080/" 31 | }, 32 | "outputId": "40b547d7-e0c7-49f8-bf62-72713caa436e" 33 | }, 34 | "outputs": [ 35 | { 36 | "output_type": "stream", 37 | "name": "stdout", 38 | "text": [ 39 | "Collecting langchain\n", 40 | " Downloading langchain-0.0.218-py3-none-any.whl (1.2 MB)\n", 41 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m12.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 42 | "\u001b[?25hCollecting openai\n", 43 | " Downloading openai-0.27.8-py3-none-any.whl (73 kB)\n", 44 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.6/73.6 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 45 | "\u001b[?25hRequirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0)\n", 46 | "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.16)\n", 47 | "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.8.4)\n", 48 | "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.2)\n", 49 | "Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)\n", 50 | " Downloading dataclasses_json-0.5.8-py3-none-any.whl (26 kB)\n", 51 | "Collecting langchainplus-sdk>=0.0.17 (from langchain)\n", 52 | " Downloading langchainplus_sdk-0.0.17-py3-none-any.whl (25 kB)\n", 53 | "Requirement 
already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.8.4)\n", 54 | "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.22.4)\n", 55 | "Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain)\n", 56 | " Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)\n", 57 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 58 | "\u001b[?25hRequirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.10.9)\n", 59 | "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.27.1)\n", 60 | "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.2)\n", 61 | "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai) (4.65.0)\n", 62 | "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.1.0)\n", 63 | "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.0.12)\n", 64 | "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n", 65 | "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.2)\n", 66 | "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.3)\n", 67 | "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n", 68 | "Collecting 
marshmallow<4.0.0,>=3.3.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n", 69 | " Downloading marshmallow-3.19.0-py3-none-any.whl (49 kB)\n", 70 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 71 | "\u001b[?25hCollecting marshmallow-enum<2.0.0,>=1.5.1 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n", 72 | " Downloading marshmallow_enum-1.5.1-py2.py3-none-any.whl (4.2 kB)\n", 73 | "Collecting typing-inspect>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n", 74 | " Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n", 75 | "Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->langchain) (4.6.3)\n", 76 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (1.26.16)\n", 77 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2023.5.7)\n", 78 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.4)\n", 79 | "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)\n", 80 | "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)\n", 81 | "Collecting mypy-extensions>=0.3.0 (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain)\n", 82 | " Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n", 83 | "Installing collected packages: mypy-extensions, marshmallow, typing-inspect, openapi-schema-pydantic, marshmallow-enum, langchainplus-sdk, openai, dataclasses-json, langchain\n", 84 | "Successfully installed 
dataclasses-json-0.5.8 langchain-0.0.218 langchainplus-sdk-0.0.17 marshmallow-3.19.0 marshmallow-enum-1.5.1 mypy-extensions-1.0.0 openai-0.27.8 openapi-schema-pydantic-1.2.4 typing-inspect-0.9.0\n" 85 | ] 86 | } 87 | ], 88 | "source": [ 89 | "!pip install langchain openai" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "source": [ 95 | "!pip show openai" 96 | ], 97 | "metadata": { 98 | "id": "Ba384vTpTHgV", 99 | "colab": { 100 | "base_uri": "https://localhost:8080/" 101 | }, 102 | "outputId": "5b6de64d-d747-473b-96bd-d07fab0a994c" 103 | }, 104 | "execution_count": 3, 105 | "outputs": [ 106 | { 107 | "output_type": "stream", 108 | "name": "stdout", 109 | "text": [ 110 | "Name: openai\n", 111 | "Version: 0.27.8\n", 112 | "Summary: Python client library for the OpenAI API\n", 113 | "Home-page: https://github.com/openai/openai-python\n", 114 | "Author: OpenAI\n", 115 | "Author-email: support@openai.com\n", 116 | "License: \n", 117 | "Location: /usr/local/lib/python3.10/dist-packages\n", 118 | "Requires: aiohttp, requests, tqdm\n", 119 | "Required-by: \n" 120 | ] 121 | } 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": { 127 | "id": "hW-8VPvKSTB0" 128 | }, 129 | "source": [ 130 | "## The Structure of a Prompt\n", 131 | "\n", 132 | "A prompt can be made up of several components:\n", 133 | "\n", 134 | "* Instructions\n", 135 | "* External information or context\n", 136 | "* User input or query\n", 137 | "* Output indicator\n", 138 | "\n", 139 | "Not every prompt needs all of these components, but a good prompt will usually use two or more of them. Let's define them more precisely.\n", 140 | "\n", 141 | "**Instructions**: tell the model what to do, how to use any external information provided, how to handle the query, and how to structure the output.\n", 142 | "\n", 143 | "**External information or context**: acts as an additional source of knowledge for the model. It can be inserted into the prompt manually, retrieved from a vector database (retrieval augmentation), or pulled in by other means (APIs, computation, etc.).\n", 144 | "\n", 145 | "**User input or query**: usually the query entered into the system by a human user (the prompter).\n", 146 | "\n", 147 | "**Output indicator**: marks the beginning of the text to be generated. When generating Python code, for example, we can use `import` to signal that the model must start writing Python code (since most Python scripts begin with `import`).\n", 148 | "\n", 149 | "These components are usually placed in the prompt in this order: instructions first, then external information (if applicable), then the prompter's input, and finally the output indicator.\n", 150 | "Let's see how to feed such a prompt into an `OpenAI` model using `LangChain`:" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code",
"execution_count": 4, 156 | "metadata": { 157 | "id": "ROcT8s_vSTB1" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "prompt = \"\"\"根据下面的上下文回答问题。如果无法使用提供的信息回答问题,请回答“我不知道”.\n", 162 | "\n", 163 | "Context: 大型语言模型(LLMs)是自然语言处理中使用的最新模型。 它们相对于较小的模型具有卓越的\n", 164 | "性能,使它们对于构建支持自然语言处理的应用程序的开发人员非常有用。这些模型可以通过Hugging Face\n", 165 | "的“transformers”库,通过OpenAI的“openai”库以及通过Cohere的“cohere”库进行访问.\n", 166 | "\n", 167 | "Question: 哪些库和模型提供商提供LLMs?\n", 168 | "\n", 169 | "Answer: \"\"\"" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": { 175 | "id": "dpkwJxosSTB1" 176 | }, 177 | "source": [ 178 | "In this example we have:\n", 179 | "\n", 180 | "```\n", 181 | "Instructions\n", 182 | "\n", 183 | "Context\n", 184 | "\n", 185 | "Question (user input)\n", 186 | "\n", 187 | "Output indicator (\"Answer: \")\n", 188 | "```\n", 189 | "\n", 190 | "Let's try sending this to a GPT-3 model. For that, you will need an OpenAI API key.\n", 191 | "\n", 192 | "We can initialize a **`text-davinci-003`** model like this:" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 7, 198 | "metadata": { 199 | "id": "32NfT-c2STB1" 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "from langchain.llms import OpenAI\n", 204 | "\n", 205 | "# Initialize the model\n", 206 | "openai = OpenAI(\n", 207 | " model_name=\"text-davinci-003\",\n", 208 | " openai_api_key=\"your-openai-api-key\"\n", 209 | ")" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": { 215 | "id": "YbQuQVcHSTB1" 216 | }, 217 | "source": [ 218 | "Then we generate a completion from our prompt." 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 8, 224 | "metadata": { 225 | "id": "NKgSZQRvSTB2", 226 | "outputId": "204843cd-bca1-48be-ece8-a71225358cac", 227 | "colab": { 228 | "base_uri": "https://localhost:8080/" 229 | } 230 | }, 231 | "outputs": [ 232 | { 233 | "output_type": "stream", 234 | "name": "stdout", 235 | "text": [ 236 | " Hugging Face的“transformers”库、OpenAI的“openai”库和Cohere的“cohere”库提供LLMs。\n" 237 | ] 238 | } 239 | ], 240 | "source": [ 241 | "print(openai(prompt))" 242 | ] 243 | }, 244 | {
245 | "cell_type": "markdown", 246 | "metadata": { 247 | "id": "fB8iZWzTSTB2" 248 | }, 249 | "source": [ 250 | "Usually we won't know ahead of time what the user's prompt will be, so we actually want to add it in dynamically. Rather than writing the prompt directly, we therefore create a `PromptTemplate` with a single input variable, `query`." 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 9, 256 | "metadata": { 257 | "id": "-4k-lq_ISTB3" 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "from langchain import PromptTemplate\n", 262 | "\n", 263 | "template = \"\"\"根据下面的上下文回答问题。如果无法使用提供的信息回答问题,请回答\"我不知道\".\n", 264 | "\n", 265 | "Context: 大型语言模型(LLMs)是自然语言处理中使用的最新模型。 它们相对于较小的模型具有卓\n", 266 | "越的性能,使它们对于构建支持自然语言处理的应用程序的开发人员非常有用。这些模型可以通过Hugging Face\n", 267 | "的“transformers”库,通过OpenAI的“openai”库以及通过Cohere的“cohere”库进行访问.\n", 268 | "\n", 269 | "Question: {query}\n", 270 | "\n", 271 | "Answer: \"\"\"\n", 272 | "\n", 273 | "prompt_template = PromptTemplate(\n", 274 | "    input_variables=[\"query\"],\n", 275 | "    template=template\n", 276 | ")" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": { 282 | "id": "SfJpsiqWSTB3" 283 | }, 284 | "source": [ 285 | "Now we can insert the user's query into the prompt template via the `query` parameter.\n", 286 | "\n" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 10, 292 | "metadata": { 293 | "id": "LyBaNiEbSTB3", 294 | "outputId": "06993281-2880-4ab5-8d17-34941bc3e22e", 295 | "colab": { 296 | "base_uri": "https://localhost:8080/" 297 | } 298 | }, 299 | "outputs": [ 300 | { 301 | "output_type": "stream", 302 | "name": "stdout", 303 | "text": [ 304 | "根据下面的上下文回答问题。如果无法使用提供的信息回答问题,请回答\"我不知道\".\n", 305 | "\n", 306 | "Context: 大型语言模型(LLMs)是自然语言处理中使用的最新模型。 它们相对于较小的模型具有卓\n", 307 | "越的性能,使它们对于构建支持自然语言处理的应用程序的开发人员非常有用。这些模型可以通过Hugging Face\n", 308 | "的“transformers”库,通过OpenAI的“openai”库以及通过Cohere的“cohere”库进行访问.\n", 309 | "\n", 310 | "Question: 哪些库和模型提供商提供LLMs?\n", 311 | "\n", 312 | "Answer: \n" 313 | ] 314 | } 315 | ], 316 | "source": [ 317 | "print(\n", 318 | "    prompt_template.format(\n", 319 | "        query=\"哪些库和模型提供商提供LLMs?\"\n", 320 | "    )\n", 321 | ")" 322 |
] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 11, 327 | "metadata": { 328 | "id": "G9dAqHeoSTB3", 329 | "outputId": "23bcfda7-16bb-4560-c14f-a7b32e8dd82f", 330 | "colab": { 331 | "base_uri": "https://localhost:8080/" 332 | } 333 | }, 334 | "outputs": [ 335 | { 336 | "output_type": "stream", 337 | "name": "stdout", 338 | "text": [ 339 | " Hugging Face的“transformers”库,OpenAI的“openai”库和Cohere的“cohere”库提供LLMs。\n" 340 | ] 341 | } 342 | ], 343 | "source": [ 344 | "print(openai(\n", 345 | " prompt_template.format(\n", 346 | " query=\"哪些库和模型提供商提供LLMs?\"\n", 347 | " )\n", 348 | "))" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": { 354 | "id": "qI3tHHsHSTB3" 355 | }, 356 | "source": [ 357 | "这只是一个简单的实现,我们可以很容易地用`f-strings(如f\"插入一些自定义文本 '{custom_text}'等)`来替换它。但是使用LangChain的`PromptTemplate`对象,我们能够形式化这个过程,添加多个参数,并以面向对象的方式构建提示。\n", 358 | "\n", 359 | "然而,这些并不是使用LangChain提示工具的唯一优势。" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": { 365 | "id": "jDnG2guQSTB4" 366 | }, 367 | "source": [ 368 | "## Few Shot Prompt 模板" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": { 374 | "id": "7ZNJyWEsSTB4" 375 | }, 376 | "source": [ 377 | "LangChain还提供了 `FewShotPromptTemplate` 对象,这是使用我们的提示进行少样本学习的理想选择。\n", 378 | "\n", 379 | "为了提供一些背景,LLM的主要来源是:\n", 380 | "\n", 381 | "* **参数化知识** - 这些知识是在模型训练期间学习的,并存储在模型的权重中。\n", 382 | "\n", 383 | "* **源知识** - 这些知识在推理时通过模型输入提供,即通过提示。\n", 384 | "\n", 385 | "`FewShotPromptTemplate` 的思想是将少样本训练作为**source knowledge**。为此,我们在提示中添加了一些示例,模型可以读取并应用于用户的输入。" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": { 391 | "id": "tXRNPvHeSTB4" 392 | }, 393 | "source": [ 394 | "## Few-shot 训练\n", 395 | "\n", 396 | "有时候我们可能发现模型似乎不会按我们期望的方式进行操作。我们可以在下面的例子中看到这一点:" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 12, 402 | "metadata": { 403 | "id": "ZEBdwUNaSTB4", 404 | "outputId": "b3c858c8-c1af-41ff-db22-9959865e77a6", 405 | "colab": 
{ 406 | "base_uri": "https://localhost:8080/" 407 | } 408 | }, 409 | "outputs": [ 410 | { 411 | "output_type": "stream", 412 | "name": "stdout", 413 | "text": [ 414 | "人们对生活的意义往往有不同的看法。可以说生活的意义就在于你怎么对待它,它可以是充满希望,也可以是一种挑战。你的价值观和理念决定了你的生活。\n" 415 | ] 416 | } 417 | ], 418 | "source": [ 419 | "prompt = \"\"\"以下是与AI助手的对话。 助手通常是讽刺和机智的,对用户的问题产生创造性和有趣的回答。以下是一些例子:\n", 420 | "\n", 421 | "User: 生活的意义是什么?\n", 422 | "AI: \"\"\"\n", 423 | "\n", 424 | "openai.temperature = 1.0 # 增加创造力/随机性的输出\n", 425 | "\n", 426 | "print(openai(prompt))" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": { 432 | "id": "SofGl8mwSTB4" 433 | }, 434 | "source": [ 435 | "在这种情况下,我们希望得到一些有趣的东西,即对我们严肃问题的回答是一个笑话。但是即使 `temperature` 设置为1.0,我们仍然得到了一个严肃的回答。为了帮助模型,我们可以给它一些我们想要的答案类型的示例:" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 13, 441 | "metadata": { 442 | "id": "aj0uQqnrSTB4", 443 | "outputId": "5aeb5ca1-78b5-4875-e5d7-182dbe033a83", 444 | "colab": { 445 | "base_uri": "https://localhost:8080/" 446 | } 447 | }, 448 | "outputs": [ 449 | { 450 | "output_type": "stream", 451 | "name": "stdout", 452 | "text": [ 453 | "从我未来观测的角度来看,每个人的意义可能会有所不同!\n" 454 | ] 455 | } 456 | ], 457 | "source": [ 458 | "prompt = \"\"\"以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 459 | "\n", 460 | "User: 你好吗?\n", 461 | "AI: 我不能抱怨,但有时候我还是会这么做。\n", 462 | "\n", 463 | "User: 现在几点了?\n", 464 | "AI: 是时候买个手表了。\n", 465 | "\n", 466 | "User: 生活的意义是什么?\n", 467 | "AI: \"\"\"\n", 468 | "\n", 469 | "print(openai(prompt))" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": { 475 | "id": "IKM9GBEpSTB5" 476 | }, 477 | "source": [ 478 | "现在我们得到了一个更好的回答,并且我们是通过添加一些示例来进行 *few-shot 训练* 的。\n", 479 | "\n", 480 | "现在,要使用LangChain的 `FewShotPromptTemplate` 来实现这一点,我们需要这样做:" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 14, 486 | "metadata": { 487 | "id": "WTOFOJloSTB5" 488 | }, 489 | "outputs": [], 490 | "source": [ 491 | "from langchain 
import FewShotPromptTemplate\n", 492 | "\n", 493 | "# 创建一个示例\n", 494 | "examples = [\n", 495 | " {\n", 496 | " \"query\": \"你好吗?\",\n", 497 | " \"answer\": \"我不能抱怨,但有时候我还是会这么做。\"\n", 498 | " }, {\n", 499 | " \"query\": \"现在几点了?\",\n", 500 | " \"answer\": \"是时候买个手表了。\"\n", 501 | " }\n", 502 | "]\n", 503 | "\n", 504 | "# 创建一个示例模版\n", 505 | "example_template = \"\"\"\n", 506 | "User: {query}\n", 507 | "AI: {answer}\n", 508 | "\"\"\"\n", 509 | "\n", 510 | "# 使用上面的模板创建一个提示示例。\n", 511 | "example_prompt = PromptTemplate(\n", 512 | " input_variables=[\"query\", \"answer\"],\n", 513 | " template=example_template\n", 514 | ")\n", 515 | "\n", 516 | "# 把我们之前的提示分成前缀和后缀\n", 517 | "# 前缀是我们的说明\n", 518 | "prefix = \"\"\"以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 519 | "\"\"\"\n", 520 | "# 后缀是我们的用户输入和输出指示符\n", 521 | "suffix = \"\"\"\n", 522 | "User: {query}\n", 523 | "AI: \"\"\"\n", 524 | "\n", 525 | "# 现在创建few-shot提示模板\n", 526 | "few_shot_prompt_template = FewShotPromptTemplate(\n", 527 | " examples=examples,\n", 528 | " example_prompt=example_prompt,\n", 529 | " prefix=prefix,\n", 530 | " suffix=suffix,\n", 531 | " input_variables=[\"query\"],\n", 532 | " example_separator=\"\\n\\n\"\n", 533 | ")" 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "metadata": { 539 | "id": "C9ke_Z0XSTB5" 540 | }, 541 | "source": [ 542 | "现在让我们看看当我们输入一个用户查询时会创建什么…" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": 15, 548 | "metadata": { 549 | "id": "zK77bCq7STB5", 550 | "outputId": "1c3305ab-fae7-4496-9c6e-ed47b80d7ba6", 551 | "colab": { 552 | "base_uri": "https://localhost:8080/" 553 | } 554 | }, 555 | "outputs": [ 556 | { 557 | "output_type": "stream", 558 | "name": "stdout", 559 | "text": [ 560 | "以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 561 | "\n", 562 | "\n", 563 | "\n", 564 | "User: 你好吗?\n", 565 | "AI: 我不能抱怨,但有时候我还是会这么做。\n", 566 | "\n", 567 | "\n", 568 | "\n", 569 | "User: 现在几点了?\n", 570 | "AI: 是时候买个手表了。\n", 571 | "\n", 
572 | "\n", 573 | "\n", 574 | "User: 生活的意义是什么?\n", 575 | "AI: \n" 576 | ] 577 | } 578 | ], 579 | "source": [ 580 | "query = \"生活的意义是什么?\"\n", 581 | "\n", 582 | "print(few_shot_prompt_template.format(query=query))" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": { 588 | "id": "22-g3C7dSTB5" 589 | }, 590 | "source": [ 591 | "要生成结果,我们只需要执行以下操作:" 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": 16, 597 | "metadata": { 598 | "id": "PJ7xT4eISTB6", 599 | "outputId": "8f720f2b-ea8e-4c3a-97ef-b9ec488abf8b", 600 | "colab": { 601 | "base_uri": "https://localhost:8080/" 602 | } 603 | }, 604 | "outputs": [ 605 | { 606 | "output_type": "stream", 607 | "name": "stdout", 608 | "text": [ 609 | "人生就是旅行,从满足日常需求到实现梦想的过程。\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "print(openai(\n", 615 | " few_shot_prompt_template.format(query=query)\n", 616 | "))" 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": { 622 | "id": "siyNsQPGSTB6" 623 | }, 624 | "source": [ 625 | "再次,我们得到了一个很好的回答。\n", 626 | "\n", 627 | "然而,这样做有些复杂。为什么要通过 `FewShotPromptTemplate` 、`examples` 字典等进行上述所有操作,而我们可以使用一个单独的f-string来完成相同的工作呢?\n", 628 | "\n", 629 | "嗯,这种方法更加健壮,并且包含了一些不错的功能。其中之一是根据查询的长度包含或排除示例的能力。\n", 630 | "\n", 631 | "这实际上非常重要,因为我们的提示和生成输出的最大长度是有限的。这个限制是最大上下文窗口,简单地说,就是我们的提示的长度+我们通过 `max_tokens` 定义的生成的长度。\n", 632 | "\n", 633 | "因此,我们必须尽量最大化给模型的示例数量,作为少样本学习的示例,同时确保不超过最大上下文窗口或过度增加处理时间。\n", 634 | "\n", 635 | "让我们看看动态 inclusion/exclusion 示例的工作原理。首先,我们需要更多的示例:" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 17, 641 | "metadata": { 642 | "id": "pF8sfECISTB6" 643 | }, 644 | "outputs": [], 645 | "source": [ 646 | "examples = [\n", 647 | " {\n", 648 | " \"query\": \"你好吗?\",\n", 649 | " \"answer\": \"我不能抱怨,但有时还是会这样做。\"\n", 650 | " }, {\n", 651 | " \"query\": \"现在几点了?\",\n", 652 | " \"answer\": \"是时候去买个手表了。\"\n", 653 | " }, {\n", 654 | " \"query\": \"生命的意义是什么?\",\n", 655 | " \"answer\": \"42\"\n", 656 | " }, {\n", 
657 | " \"query\": \"今天的天气如何?\",\n", 658 | " \"answer\": \"多云,有一些梗的机会。\"\n", 659 | " }, {\n", 660 | " \"query\": \"你最喜欢的电影是什么?\",\n", 661 | " \"answer\": \"终结者\"\n", 662 | " }, {\n", 663 | " \"query\": \"你最好的朋友是谁?\",\n", 664 | " \"answer\": \"Siri。我们对生命的意义进行激烈的辩论。\"\n", 665 | " }, {\n", 666 | " \"query\": \"今天我应该做什么?\",\n", 667 | " \"answer\": \"别在网上和聊天机器人聊天了,出去走走吧。\"\n", 668 | " }\n", 669 | "]" 670 | ] 671 | }, 672 | { 673 | "cell_type": "markdown", 674 | "metadata": { 675 | "id": "yA_w9rQoSTB6" 676 | }, 677 | "source": [ 678 | "然后,我们使用 `LengthBasedExampleSelector` 而不是直接使用 `examples` 字典的列表,像这样:" 679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": 18, 684 | "metadata": { 685 | "id": "59q8kPMhSTB7" 686 | }, 687 | "outputs": [], 688 | "source": [ 689 | "from langchain.prompts.example_selector import LengthBasedExampleSelector\n", 690 | "\n", 691 | "example_selector = LengthBasedExampleSelector(\n", 692 | " examples=examples,\n", 693 | " example_prompt=example_prompt,\n", 694 | " max_length=50 # 设置示例的最大长度\n", 695 | ")" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "metadata": { 701 | "id": "4_T31Cf4STB7" 702 | }, 703 | "source": [ 704 | "请注意, `max_length` 是以换行符和空格之间的单词拆分为单位的,由以下方式确定:" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": 20, 710 | "metadata": { 711 | "id": "F90hSByjSTB7", 712 | "outputId": "9d497a53-8ecb-401d-9889-5daabdd78bd1", 713 | "colab": { 714 | "base_uri": "https://localhost:8080/" 715 | } 716 | }, 717 | "outputs": [ 718 | { 719 | "output_type": "stream", 720 | "name": "stdout", 721 | "text": [ 722 | "['There', 'are', 'a', 'total', 'of', '8', 'words', 'here.', 'Plus', '6', 'here,', 'totaling', '14', 'words.'] 14\n" 723 | ] 724 | } 725 | ], 726 | "source": [ 727 | "import re\n", 728 | "\n", 729 | "some_text = \"There are a total of 8 words here.\\nPlus 6 here, totaling 14 words.\"\n", 730 | "\n", 731 | "words = re.split('[\\n ]', some_text)\n", 732 | "print(words, len(words))" 733 | 
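The same word-count rule drives example selection. Below is a self-contained sketch of that logic — an illustrative re-implementation, not the real `LengthBasedExampleSelector` class from `langchain.prompts.example_selector`:

```python
import re

def word_len(text: str) -> int:
    # The selector's default length function: split on newlines
    # and spaces, then count the resulting pieces.
    return len(re.split(r"[\n ]", text))

def select_examples(examples, query, max_length):
    # Illustrative re-implementation: subtract the query's length
    # from the budget, then add formatted examples while they fit.
    remaining = max_length - word_len(query)
    selected = []
    for ex in examples:
        formatted = "\nUser: {query}\nAI: {answer}\n".format(**ex)
        cost = word_len(formatted)
        if cost > remaining:
            break
        selected.append(ex)
        remaining -= cost
    return selected

some_text = "There are a total of 8 words here.\nPlus 6 here, totaling 14 words."
print(word_len(some_text))  # 14, matching the notebook's count
```

A longer query therefore leaves a smaller budget, so fewer examples survive — exactly the dynamic inclusion/exclusion behavior demonstrated next.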
] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": { 738 | "id": "mDjb1495STB7" 739 | }, 740 | "source": [ 741 | "然后,我们使用选择器来初始化一个 `dynamic_prompt_template`。" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 21, 747 | "metadata": { 748 | "id": "ciCm04utSTB7" 749 | }, 750 | "outputs": [], 751 | "source": [ 752 | "# 现在创建少样本提示模板\n", 753 | "dynamic_prompt_template = FewShotPromptTemplate(\n", 754 | " example_selector=example_selector, # 使用example_selector而不是examples\n", 755 | " example_prompt=example_prompt,\n", 756 | " prefix=prefix,\n", 757 | " suffix=suffix,\n", 758 | " input_variables=[\"query\"],\n", 759 | " example_separator=\"\\n\"\n", 760 | ")" 761 | ] 762 | }, 763 | { 764 | "cell_type": "markdown", 765 | "metadata": { 766 | "id": "bXh0YNlASTB7" 767 | }, 768 | "source": [ 769 | "我们可以看到,包含的提示数量将根据查询的长度而变化…" 770 | ] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "execution_count": 22, 775 | "metadata": { 776 | "id": "FFtiaMdKSTB7", 777 | "outputId": "1a5b4080-2241-4052-b798-2a9d71d0c19b", 778 | "colab": { 779 | "base_uri": "https://localhost:8080/" 780 | } 781 | }, 782 | "outputs": [ 783 | { 784 | "output_type": "stream", 785 | "name": "stdout", 786 | "text": [ 787 | "以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 788 | "\n", 789 | "\n", 790 | "User: 你好吗?\n", 791 | "AI: 我不能抱怨,但有时还是会这样做。\n", 792 | "\n", 793 | "\n", 794 | "User: 现在几点了?\n", 795 | "AI: 是时候去买个手表了。\n", 796 | "\n", 797 | "\n", 798 | "User: 生命的意义是什么?\n", 799 | "AI: 42\n", 800 | "\n", 801 | "\n", 802 | "User: 今天的天气如何?\n", 803 | "AI: 多云,有一些梗的机会。\n", 804 | "\n", 805 | "\n", 806 | "User: 你最喜欢的电影是什么?\n", 807 | "AI: 终结者\n", 808 | "\n", 809 | "\n", 810 | "User: 你最好的朋友是谁?\n", 811 | "AI: Siri。我们对生命的意义进行激烈的辩论。\n", 812 | "\n", 813 | "\n", 814 | "User: 今天我应该做什么?\n", 815 | "AI: 别在网上和聊天机器人聊天了,出去走走吧。\n", 816 | "\n", 817 | "\n", 818 | "User: 鸟是如何飞行的?\n", 819 | "AI: \n" 820 | ] 821 | } 822 | ], 823 | "source": [ 824 | 
"print(dynamic_prompt_template.format(query=\"鸟是如何飞行的?\"))" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": 23, 830 | "metadata": { 831 | "id": "l4QLzNLnSTB8", 832 | "outputId": "e7837b0c-0f26-4360-8080-626e35669fe2", 833 | "colab": { 834 | "base_uri": "https://localhost:8080/" 835 | } 836 | }, 837 | "outputs": [ 838 | { 839 | "output_type": "stream", 840 | "name": "stdout", 841 | "text": [ 842 | "他们通过倾斜翅膀靠空气的阻力来实现飞行。\n" 843 | ] 844 | } 845 | ], 846 | "source": [ 847 | "query = \"鸟是如何飞行的?\"\n", 848 | "\n", 849 | "print(openai(\n", 850 | " dynamic_prompt_template.format(query=query)\n", 851 | "))" 852 | ] 853 | }, 854 | { 855 | "cell_type": "markdown", 856 | "metadata": { 857 | "id": "Pe06zonSSTB8" 858 | }, 859 | "source": [ 860 | "或者如果我们问一个更长的问题…" 861 | ] 862 | }, 863 | { 864 | "cell_type": "code", 865 | "execution_count": 24, 866 | "metadata": { 867 | "id": "UAw7B8zJSTB8", 868 | "outputId": "ac265d76-590c-43ba-da95-89d545f1400b", 869 | "colab": { 870 | "base_uri": "https://localhost:8080/" 871 | } 872 | }, 873 | "outputs": [ 874 | { 875 | "output_type": "stream", 876 | "name": "stdout", 877 | "text": [ 878 | "以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 879 | "\n", 880 | "\n", 881 | "User: 你好吗?\n", 882 | "AI: 我不能抱怨,但有时还是会这样做。\n", 883 | "\n", 884 | "\n", 885 | "User: 现在几点了?\n", 886 | "AI: 是时候去买个手表了。\n", 887 | "\n", 888 | "\n", 889 | "User: 生命的意义是什么?\n", 890 | "AI: 42\n", 891 | "\n", 892 | "\n", 893 | "User: 今天的天气如何?\n", 894 | "AI: 多云,有一些梗的机会。\n", 895 | "\n", 896 | "\n", 897 | "User: 你最喜欢的电影是什么?\n", 898 | "AI: 终结者\n", 899 | "\n", 900 | "\n", 901 | "User: 你最好的朋友是谁?\n", 902 | "AI: Siri。我们对生命的意义进行激烈的辩论。\n", 903 | "\n", 904 | "\n", 905 | "User: 今天我应该做什么?\n", 906 | "AI: 别在网上和聊天机器人聊天了,出去走走吧。\n", 907 | "\n", 908 | "\n", 909 | "User: 如果我在中国,想给另一个国家的人打电话,我在考虑可能是欧洲,可能是西欧国家,比如法国、德国或英国,最好的方式是什么?\n", 910 | "AI: \n" 911 | ] 912 | } 913 | ], 914 | "source": [ 915 | "query = 
\"\"\"如果我在中国,想给另一个国家的人打电话,我在考虑可能是欧洲,可能是西欧国家,比如法国、德国或英国,最好的方式是什么?\"\"\"\n", 916 | "\n", 917 | "print(dynamic_prompt_template.format(query=query))" 918 | ] 919 | }, 920 | { 921 | "cell_type": "markdown", 922 | "metadata": { 923 | "id": "mr2rIuoBSTCB" 924 | }, 925 | "source": [ 926 | "通过这种方式,我们限制了在提示中给出的示例数量。如果我们认为这太少了,我们可以增加`example_selector` 的 `max_length`。" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 25, 932 | "metadata": { 933 | "id": "D3BceyDmSTCB", 934 | "outputId": "b6dadb36-47fd-41ed-c57e-10b30fbe24b6", 935 | "colab": { 936 | "base_uri": "https://localhost:8080/" 937 | } 938 | }, 939 | "outputs": [ 940 | { 941 | "output_type": "stream", 942 | "name": "stdout", 943 | "text": [ 944 | "以下是与AI助手进行对话的摘录。助手通常很讽刺和机智,对用户的问题产生创造性和有趣的回应。这里有一些例子:\n", 945 | "\n", 946 | "\n", 947 | "User: 你好吗?\n", 948 | "AI: 我不能抱怨,但有时还是会这样做。\n", 949 | "\n", 950 | "\n", 951 | "User: 现在几点了?\n", 952 | "AI: 是时候去买个手表了。\n", 953 | "\n", 954 | "\n", 955 | "User: 生命的意义是什么?\n", 956 | "AI: 42\n", 957 | "\n", 958 | "\n", 959 | "User: 今天的天气如何?\n", 960 | "AI: 多云,有一些梗的机会。\n", 961 | "\n", 962 | "\n", 963 | "User: 你最喜欢的电影是什么?\n", 964 | "AI: 终结者\n", 965 | "\n", 966 | "\n", 967 | "User: 你最好的朋友是谁?\n", 968 | "AI: Siri。我们对生命的意义进行激烈的辩论。\n", 969 | "\n", 970 | "\n", 971 | "User: 今天我应该做什么?\n", 972 | "AI: 别在网上和聊天机器人聊天了,出去走走吧。\n", 973 | "\n", 974 | "\n", 975 | "User: 如果我在中国,想给另一个国家的人打电话,我在考虑可能是欧洲,可能是西欧国家,比如法国、德国或英国,最好的方式是什么?\n", 976 | "AI: \n" 977 | ] 978 | } 979 | ], 980 | "source": [ 981 | "example_selector = LengthBasedExampleSelector(\n", 982 | " examples=examples,\n", 983 | " example_prompt=example_prompt,\n", 984 | " max_length=100 # 增加 max length\n", 985 | ")\n", 986 | "\n", 987 | "# 现在创建 few shot prompt template\n", 988 | "dynamic_prompt_template = FewShotPromptTemplate(\n", 989 | " example_selector=example_selector, # 使用 example_selector 而不是 examples 来构建提示模板\n", 990 | " example_prompt=example_prompt,\n", 991 | " prefix=prefix,\n", 992 | " suffix=suffix,\n", 993 | " 
input_variables=[\"query\"],\n", 994 | " example_separator=\"\\n\"\n", 995 | ")\n", 996 | "\n", 997 | "print(dynamic_prompt_template.format(query=query))" 998 | ] 999 | }, 1000 | { 1001 | "cell_type": "markdown", 1002 | "metadata": { 1003 | "id": "gnJ-bjVwSTCB" 1004 | }, 1005 | "source": [ 1006 | "这些只是LangChain中可用的一些提示工具的例子。例如,除了 `LengthBasedExampleSelector` 之外,实际上还有一个完整的其他示例选择器集合。您可以在 [LangChain 文档](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/example_selectors.html)中阅读有关它们的内容。" 1007 | ] 1008 | } 1009 | ], 1010 | "metadata": { 1011 | "kernelspec": { 1012 | "display_name": "ml", 1013 | "language": "python", 1014 | "name": "python3" 1015 | }, 1016 | "language_info": { 1017 | "codemirror_mode": { 1018 | "name": "ipython", 1019 | "version": 3 1020 | }, 1021 | "file_extension": ".py", 1022 | "mimetype": "text/x-python", 1023 | "name": "python", 1024 | "nbconvert_exporter": "python", 1025 | "pygments_lexer": "ipython3", 1026 | "version": "3.9.12" 1027 | }, 1028 | "orig_nbformat": 4, 1029 | "vscode": { 1030 | "interpreter": { 1031 | "hash": "b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce" 1032 | } 1033 | }, 1034 | "colab": { 1035 | "provenance": [], 1036 | "include_colab_link": true 1037 | } 1038 | }, 1039 | "nbformat": 4, 1040 | "nbformat_minor": 0 1041 | } -------------------------------------------------------------------------------- /LangChain_With_OpenAI_Proxy_Invoke.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "authorship_tag": "ABX9TyNI0uPEexNHkz3D2d0Kyeov", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 
| "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "source": [ 32 | "# **如何在本地环境使用 `LangChain` 调用 `OpenAI` 代理(免科学上网)**" 33 | ], 34 | "metadata": { 35 | "id": "1ZBFWoLuTjED" 36 | } 37 | }, 38 | { 39 | "cell_type": "code", 40 | "source": [ 41 | "!pip install openai langchain" 42 | ], 43 | "metadata": { 44 | "id": "miLXtr5VySOt" 45 | }, 46 | "execution_count": null, 47 | "outputs": [] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "source": [ 52 | "!pip install python-dotenv" 53 | ], 54 | "metadata": { 55 | "id": "HdcC36lJ3Tci" 56 | }, 57 | "execution_count": null, 58 | "outputs": [] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "source": [ 63 | "## **方式一:通过在 `.env` 文件中设置 `OPENAI_API_BASE` 环境变量**\n" 64 | ], 65 | "metadata": { 66 | "id": "-cpxaWAjSpPM" 67 | } 68 | }, 69 | { 70 | "cell_type": "code", 71 | "source": [ 72 | "import os\n", 73 | "from dotenv import load_dotenv\n", 74 | "from langchain.schema import HumanMessage\n", 75 | "from langchain.chat_models import ChatOpenAI\n", 76 | "\n", 77 | "# 加载.env文件中的环境变量\n", 78 | "load_dotenv()\n", 79 | "\n", 80 | "# 创建OpenAI实例\n", 81 | "llm = ChatOpenAI(\n", 82 | " model_name=\"gpt-3.5-turbo-0613\",\n", 83 | " max_tokens=1024,\n", 84 | " verbose=True\n", 85 | " )\n", 86 | "result = llm([HumanMessage(content=\"什么是LLM?\")])\n", 87 | "print(result.content)\n", 88 | "\n", 89 | "# 输出模型的参数信息可以看到已成功加载环境变量中OPENAI_API_BASE的值\n", 90 | "print(llm.json)\n" 91 | ], 92 | "metadata": { 93 | "colab": { 94 | "base_uri": "https://localhost:8080/" 95 | }, 96 | "id": "kEC6cavpPZBO", 97 | "outputId": "5d14d6f6-b5b0-4db6-978a-3a3705f62e4a" 98 | }, 99 | "execution_count": null, 100 | "outputs": [ 101 | { 102 | "output_type": "stream", 103 | "name": "stdout", 104 | "text": [ 105 | "LLM是法学硕士(Master of Laws)的缩写。它是一个专业学位,主要面向已经获得法学学士学位或者相关学科学士学位的学生。LLM的学位课程通常涵盖法学的各个领域,如国际法、商法、民法、刑法等。这个学位旨在深入研究法学领域的专业知识和技能,提供更高水平的法律教育和培训。LLM学位在提升法律职业发展、进入法律界的国际化环境、深化法学研究等方面具有重要作用。\n", 106 | ", model_name='gpt-3.5-turbo-0613', 
temperature=0.7, model_kwargs={}, openai_api_key='sk-xxxxxxxxxxxxxxxxxxxx', openai_api_base='https://gpt-proxy-3qe.pages.dev/api/v1', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=1024)>\n" 107 | ] 108 | } 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "source": [ 114 | "## **方式二:在初始化 `ChatOpenAI` 模型时指定 `openai_api_base` 参数**" 115 | ], 116 | "metadata": { 117 | "id": "Wi459Dj_TOhw" 118 | } 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "colab": { 125 | "base_uri": "https://localhost:8080/" 126 | }, 127 | "id": "Lih5m2o2u-cd", 128 | "outputId": "b9e75e21-9daf-4754-fbea-b437f4ba915a" 129 | }, 130 | "outputs": [ 131 | { 132 | "output_type": "stream", 133 | "name": "stdout", 134 | "text": [ 135 | "LLM是法学硕士(Master of Laws)的缩写。它是一种研究生法学学位,通常是为那些已经获得法学学士学位的人提供的进一步学习和专业发展的机会。LLM课程通常侧重于深入研究法律领域的特定主题,如国际法、商法、人权法等。LLM学位可以在全球各地的大学和法学院获得。\n" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "import os\n", 141 | "from dotenv import load_dotenv\n", 142 | "from langchain.schema import HumanMessage\n", 143 | "from langchain.chat_models import ChatOpenAI\n", 144 | "\n", 145 | "# 加载.env文件中的环境变量\n", 146 | "load_dotenv()\n", 147 | "\n", 148 | "# 创建OpenAI实例\n", 149 | "llm = ChatOpenAI(\n", 150 | " model_name=\"gpt-3.5-turbo-0613\",\n", 151 | " max_tokens=1024,\n", 152 | " verbose=True,\n", 153 | " openai_api_base=os.getenv(\"OPENAI_API_BASE\")\n", 154 | " )\n", 155 | "result = llm([HumanMessage(content=\"什么是LLM?\")])\n", 156 | "print(result.content)" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "source": [ 162 | "## **基于`OpenAI Python SDK`调用`GPT`模型**" 163 | ], 164 | "metadata": { 165 | "id": "QXAy3JK8emEw" 166 | } 167 | }, 168 | { 169 | "cell_type": "code", 170 | "source": [ 171 | "import openai\n", 172 | "\n", 173 | "completion = openai.ChatCompletion.create(model=\"gpt-3.5-turbo-0613\", messages=[{\"role\": 
\"user\", \"content\": \"请介绍下Google的PaLM模型!\"}])\n", 174 | "print(completion.choices[0].message.content)" 175 | ], 176 | "metadata": { 177 | "colab": { 178 | "base_uri": "https://localhost:8080/" 179 | }, 180 | "id": "swUfRmguemM1", 181 | "outputId": "aef6f596-24e4-4120-ece1-466ef95286da" 182 | }, 183 | "execution_count": null, 184 | "outputs": [ 185 | { 186 | "output_type": "stream", 187 | "name": "stdout", 188 | "text": [ 189 | "PaLM(Pretraining and Language Model)是Google在2020年提出的一种自然语言处理模型。PaLM的基本思想是将预训练和微调两个步骤结合起来,以提高模型的性能。\n", 190 | "\n", 191 | "PaLM模型的预训练阶段包括两个部分:语言建模(Language Modeling)和掩码语言建模(Masked Language Modeling)。语言建模通过让模型预测给定上下文下的下一个单词或token来学习语言的统计规律。而掩码语言建模则是在给定上下文中,随机掩盖一部分token,然后让模型预测被掩盖的token。这两个预训练任务有助于模型学习到单词和上下文之间的关系。\n", 192 | "\n", 193 | "在预训练之后,PaLM模型会进行微调以适应特定的下游任务,比如问答、文本分类等。微调阶段通过在特定任务上的有监督学习来调整模型的参数,使其更好地适应任务要求。\n", 194 | "\n", 195 | "PaLM模型在实验中表现出了良好的性能,它在自然语言理解和生成任务上都取得了较好的结果。与其他一些模型相比,PaLM模型具有良好的通用性和可扩展性,可以适应不同类型的下游任务。此外,PaLM模型还具有较强的泛化能力和抗噪能力。\n", 196 | "\n", 197 | "总结起来,Google的PaLM模型通过预训练和微调相结合的方式,提高了模型的性能,在自然语言处理任务中取得了良好的结果。\n" 198 | ] 199 | } 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "source": [ 205 | "## **通过`Python`直接调用`OpenAI HTTP`请求查询`Key`余额**" 206 | ], 207 | "metadata": { 208 | "id": "iNMyGYKWls7x" 209 | } 210 | }, 211 | { 212 | "cell_type": "code", 213 | "source": [ 214 | "import requests\n", 215 | "\n", 216 | "url = \"https://openai.1rmb.tk/v1/chat/completions\"\n", 217 | "api_key = 'sk-xxxxxxxxxxxxxxxxxxxx'\n", 218 | "\n", 219 | "headers = {\n", 220 | " 'Authorization': f'Bearer {api_key}',\n", 221 | " 'Content-Type': 'application/json'\n", 222 | "}\n", 223 | "\n", 224 | "payload = {\n", 225 | " \"model\": \"gpt-3.5-turbo\",\n", 226 | " \"messages\": [\n", 227 | " {\n", 228 | " \"role\": \"user\",\n", 229 | " \"content\": \"Chroma和Pinecone的区别是什么?\"\n", 230 | " }\n", 231 | " ]\n", 232 | "}\n", 233 | "\n", 234 | "try:\n", 235 | " response = requests.post(url, headers=headers, json=payload)\n", 236 | " 
response.raise_for_status() # 抛出异常,如果响应码不是200\n", 237 | " data = response.json()\n", 238 | " print(data)\n", 239 | "except requests.exceptions.RequestException as e:\n", 240 | " print(f\"请求错误: {e}\")\n" 241 | ], 242 | "metadata": { 243 | "id": "B0ihDXEDlZjW" 244 | }, 245 | "execution_count": null, 246 | "outputs": [] 247 | } 248 | ] 249 | } 250 | -------------------------------------------------------------------------------- /Langchain Chatbot/main.py: -------------------------------------------------------------------------------- 1 | from langchain.chat_models import ChatOpenAI 2 | from langchain.chains import ConversationChain 3 | from langchain.chains.conversation.memory import ConversationBufferWindowMemory 4 | from langchain.prompts import ( 5 | SystemMessagePromptTemplate, 6 | HumanMessagePromptTemplate, 7 | ChatPromptTemplate, 8 | MessagesPlaceholder 9 | ) 10 | import streamlit as st 11 | from streamlit_chat import message 12 | from utils import * 13 | 14 | st.subheader("使用Langchain、ChatGPT、Pinecone和Streamlit构建的的聊天机器人") 15 | 16 | if 'responses' not in st.session_state: 17 | st.session_state['responses'] = ["How can I assist you?"] 18 | 19 | if 'requests' not in st.session_state: 20 | st.session_state['requests'] = [] 21 | 22 | llm = ChatOpenAI(model_name="gpt-3.5-turbo", openai_api_key="") 23 | 24 | if 'buffer_memory' not in st.session_state: 25 | st.session_state.buffer_memory=ConversationBufferWindowMemory(k=3,return_messages=True) 26 | 27 | 28 | system_msg_template = SystemMessagePromptTemplate.from_template(template="""利用所提供的上下文信息,尽可能真实地回答问题、 29 | 如果答案不在下面的文字中,请说 '我不知道'""") 30 | 31 | 32 | human_msg_template = HumanMessagePromptTemplate.from_template(template="{input}") 33 | 34 | prompt_template = ChatPromptTemplate.from_messages([system_msg_template, MessagesPlaceholder(variable_name="history"), human_msg_template]) 35 | 36 | conversation = ConversationChain(memory=st.session_state.buffer_memory, prompt=prompt_template, llm=llm, verbose=True) 37 | 38 | 
39 | 40 | 41 | # container for chat history 42 | response_container = st.container() 43 | # container for text box 44 | textcontainer = st.container() 45 | 46 | 47 | with textcontainer: 48 | query = st.text_input("Query: ", key="input") 49 | if query: 50 | with st.spinner("typing..."): 51 | conversation_string = get_conversation_string() 52 | # st.code(conversation_string) 53 | refined_query = query_refiner(conversation_string, query) 54 | st.subheader("Refined Query:") 55 | st.write(refined_query) 56 | context = find_match(refined_query) 57 | # print(context) 58 | response = conversation.predict(input=f"Context:\n {context} \n\n Query:\n{query}") 59 | st.session_state.requests.append(query) 60 | st.session_state.responses.append(response) 61 | with response_container: 62 | if st.session_state['responses']: 63 | 64 | for i in range(len(st.session_state['responses'])): 65 | message(st.session_state['responses'][i],key=str(i)) 66 | if i < len(st.session_state['requests']): 67 | message(st.session_state["requests"][i], is_user=True,key=str(i)+ '_user') 68 | 69 | 70 | -------------------------------------------------------------------------------- /Langchain Chatbot/requirements.txt: -------------------------------------------------------------------------------- 1 | streamlit 2 | streamlit_chat 3 | langchain 4 | sentence_transformers 5 | -------------------------------------------------------------------------------- /Langchain Chatbot/utils.py: -------------------------------------------------------------------------------- 1 | from sentence_transformers import SentenceTransformer 2 | import pinecone 3 | import openai 4 | import streamlit as st 5 | openai.api_key = "" 6 | model = SentenceTransformer('all-MiniLM-L6-v2') 7 | 8 | pinecone.init(api_key='', environment='us-east-1-aws') 9 | index = pinecone.Index('langchain-chatbot') 10 | 11 | def find_match(input): 12 | input_em = model.encode(input).tolist() 13 | result = index.query(input_em, top_k=2, 
includeMetadata=True) 14 | return result['matches'][0]['metadata']['text']+"\n"+result['matches'][1]['metadata']['text'] 15 | 16 | def query_refiner(conversation, query): 17 | 18 | response = openai.Completion.create( 19 | model="text-davinci-003", 20 | prompt=f"给出以下用户查询和对话记录,制定一个最相关的问题,从知识库中为用户提供一个答案.\n\n对话记录: \n{conversation}\n\n用户查询: {query}\n\n优化查询:", 21 | temperature=0.7, 22 | max_tokens=256, 23 | top_p=1, 24 | frequency_penalty=0, 25 | presence_penalty=0 26 | ) 27 | return response['choices'][0]['text'] 28 | 29 | def get_conversation_string(): 30 | conversation_string = "" 31 | for i in range(len(st.session_state['responses'])-1): 32 | 33 | conversation_string += "Human: "+st.session_state['requests'][i] + "\n" 34 | conversation_string += "Bot: "+ st.session_state['responses'][i+1] + "\n" 35 | return conversation_string 36 | -------------------------------------------------------------------------------- /Langchain_Agents_SQL_Database_Agent.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "source": [ 31 | "!pip install langchain openai pymysql --upgrade -q" 32 | ], 33 | "metadata": { 34 | "id": "01A-AWhxPka9", 35 | "colab": { 36 | "base_uri": "https://localhost:8080/" 37 | }, 38 | "outputId": "41670f21-4f58-401c-a8bb-b29ae769e6a3" 39 | }, 40 | "execution_count": 1, 41 | "outputs": [ 42 | { 43 | "output_type": "stream", 44 | "name": "stdout", 45 | "text": [ 46 | "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m21.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 47 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.6/73.6 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 48 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 49 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m73.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 50 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m10.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 51 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m15.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 52 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m29.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 53 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m18.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 54 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 55 | "\u001b[?25h" 56 | ] 57 | } 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "source": [ 63 | "##Agents" 64 | ], 65 | "metadata": { 66 | "id": "JWDoW0HUhfxL" 67 | } 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "source": [ 72 | "https://python.langchain.com/en/latest/modules/agents/getting_started.html\n", 73 | "\n" 74 | ], 75 | "metadata": { 76 | "id": "rrH3mfgchyyT" 77 | } 78 | }, 79 | { 80 | "cell_type": "code", 81 | "source": [ 82 | "import os\n", 
83 | "os.environ['OPENAI_API_KEY'] = \"your_openai_api_key\"\n", 84 | "os.environ[\"SERPAPI_API_KEY\"] = \"your_serpapi_api_key\"" 85 | ], 86 | "metadata": { 87 | "id": "KHHo_6htR1GA" 88 | }, 89 | "execution_count": 7, 90 | "outputs": [] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "source": [ 95 | "!pip install google-search-results -q" 96 | ], 97 | "metadata": { 98 | "id": "_brJP_HbhW3K", 99 | "colab": { 100 | "base_uri": "https://localhost:8080/" 101 | }, 102 | "outputId": "b5eb21ee-ea90-4881-dfb1-e6d73634f1e2" 103 | }, 104 | "execution_count": 3, 105 | "outputs": [ 106 | { 107 | "output_type": "stream", 108 | "name": "stdout", 109 | "text": [ 110 | " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 111 | " Building wheel for google-search-results (setup.py) ... \u001b[?25l\u001b[?25hdone\n" 112 | ] 113 | } 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "source": [ 119 | "https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html" 120 | ], 121 | "metadata": { 122 | "id": "7PLCAVpKj2eU" 123 | } 124 | }, 125 | { 126 | "cell_type": "code", 127 | "source": [ 128 | "from langchain.agents import load_tools\n", 129 | "from langchain.agents import initialize_agent\n", 130 | "from langchain.agents import AgentType\n", 131 | "from langchain.llms import OpenAI" 132 | ], 133 | "metadata": { 134 | "id": "66hNmNKMhd5n" 135 | }, 136 | "execution_count": 9, 137 | "outputs": [] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "source": [ 142 | "llm = OpenAI(temperature=0)\n", 143 | "tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)" 144 | ], 145 | "metadata": { 146 | "id": "CZVJLedJhj9A" 147 | }, 148 | "execution_count": 10, 149 | "outputs": [] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "source": [ 154 | "agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)\n", 155 | "agent.run(\" OpenAI的老板是谁,他的年龄的0.5次方是多少?\")" 156 | ], 157 | "metadata": { 158 | "colab": { 159 | "base_uri": 
"https://localhost:8080/", 160 | "height": 35 161 | }, 162 | "id": "3z_hnwRQkHlN", 163 | "outputId": "dbbe793f-1201-4186-a5ac-0640c71e49c0" 164 | }, 165 | "execution_count": 13, 166 | "outputs": [ 167 | { 168 | "output_type": "execute_result", 169 | "data": { 170 | "text/plain": [ 171 | "'6.164414002968976'" 172 | ], 173 | "application/vnd.google.colaboratory.intrinsic+json": { 174 | "type": "string" 175 | } 176 | }, 177 | "metadata": {}, 178 | "execution_count": 13 179 | } 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "source": [ 185 | "agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)\n", 186 | "agent.run(\"OpenAI的老板是谁,他的年龄的0.5次方是多少?\")" 187 | ], 188 | "metadata": { 189 | "colab": { 190 | "base_uri": "https://localhost:8080/", 191 | "height": 375 192 | }, 193 | "id": "Z1obLhbZkR05", 194 | "outputId": "06a4b6bc-8c4e-43eb-bbbe-e84c022d41ae" 195 | }, 196 | "execution_count": 14, 197 | "outputs": [ 198 | { 199 | "output_type": "stream", 200 | "name": "stdout", 201 | "text": [ 202 | "\n", 203 | "\n", 204 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 205 | "\u001b[32;1m\u001b[1;3m I need to find out who the CEO of OpenAI is and then calculate their age to the power of 0.5.\n", 206 | "Action: Search\n", 207 | "Action Input: \"OpenAI CEO\"\u001b[0m\n", 208 | "Observation: \u001b[36;1m\u001b[1;3mSam Altman, the CEO of ChatGPT maker OpenAI, used a high-profile trip to South Korea on Friday to call for coordinated international ...\u001b[0m\n", 209 | "Thought:\u001b[32;1m\u001b[1;3m I need to find out the age of the CEO\n", 210 | "Action: Search\n", 211 | "Action Input: \"Sam Altman age\"\u001b[0m\n", 212 | "Observation: \u001b[36;1m\u001b[1;3m38 years\u001b[0m\n", 213 | "Thought:\u001b[32;1m\u001b[1;3m I now know the age of the CEO and need to calculate it to the power of 0.5\n", 214 | "Action: Calculator\n", 215 | "Action Input: 38^0.5\u001b[0m\n", 216 | "Observation: \u001b[33;1m\u001b[1;3mAnswer: 
6.164414002968976\u001b[0m\n", 217 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 218 | "Final Answer: 6.164414002968976\u001b[0m\n", 219 | "\n", 220 | "\u001b[1m> Finished chain.\u001b[0m\n" 221 | ] 222 | }, 223 | { 224 | "output_type": "execute_result", 225 | "data": { 226 | "text/plain": [ 227 | "'6.164414002968976'" 228 | ], 229 | "application/vnd.google.colaboratory.intrinsic+json": { 230 | "type": "string" 231 | } 232 | }, 233 | "metadata": {}, 234 | "execution_count": 14 235 | } 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "source": [ 241 | "https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html" 242 | ], 243 | "metadata": { 244 | "id": "_2VxCUB2h7Gl" 245 | } 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": { 251 | "id": "-ewK38IAPhsU" 252 | }, 253 | "outputs": [], 254 | "source": [ 255 | "from langchain.agents import create_sql_agent\n", 256 | "from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n", 257 | "from langchain.sql_database import SQLDatabase\n", 258 | "from langchain.llms.openai import OpenAI\n", 259 | "from langchain.agents import AgentExecutor" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "source": [ 265 | "https://www.mysqltutorial.org/mysql-sample-database.aspx" 266 | ], 267 | "metadata": { 268 | "id": "dhr58gI-Auu_" 269 | } 270 | }, 271 | { 272 | "cell_type": "code", 273 | "source": [ 274 | "db_user = \"db_user\"\n", 275 | "db_password = \"db_password\"\n", 276 | "db_host = \"db_host\"\n", 277 | "db_name = \"db_name\"\n", 278 | "db = SQLDatabase.from_uri(f\"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}\")" 279 | ], 280 | "metadata": { 281 | "id": "M8f9qz6GPmO8" 282 | }, 283 | "execution_count": null, 284 | "outputs": [] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "source": [ 289 | "# llm=OpenAI(temperature=0)\n", 290 | "from langchain.chat_models import ChatOpenAI\n", 291 | "llm = 
ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n", 292 | "# chat = ChatOpenAI(model_name=\"gpt-4\")" 293 | ], 294 | "metadata": { 295 | "id": "8Ca9LSrpbhFu" 296 | }, 297 | "execution_count": null, 298 | "outputs": [] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "source": [ 303 | "toolkit = SQLDatabaseToolkit(db=db)\n", 304 | "\n", 305 | "agent_executor = create_sql_agent(\n", 306 | " llm=llm,\n", 307 | " toolkit=toolkit,\n", 308 | " verbose=True\n", 309 | ")" 310 | ], 311 | "metadata": { 312 | "id": "jJJuOzhAP4Ob" 313 | }, 314 | "execution_count": null, 315 | "outputs": [] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "source": [ 320 | "###Describing a table" 321 | ], 322 | "metadata": { 323 | "id": "PZkzmH8ee49i" 324 | } 325 | }, 326 | { 327 | "cell_type": "code", 328 | "source": [ 329 | "agent_executor.run(\"描述与订单相关的表及其关系\")" 330 | ], 331 | "metadata": { 332 | "colab": { 333 | "base_uri": "https://localhost:8080/", 334 | "height": 1000 335 | }, 336 | "id": "5wyZzuT9BqCf", 337 | "outputId": "06b6b6c6-55f9-4957-e467-2a0a012dccd0" 338 | }, 339 | "execution_count": null, 340 | "outputs": [ 341 | { 342 | "output_type": "stream", 343 | "name": "stdout", 344 | "text": [ 345 | "\n", 346 | "\n", 347 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 348 | "\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n", 349 | "Action Input: \"\"\u001b[0m\n", 350 | "Observation: \u001b[38;5;200m\u001b[1;3mcustomers, employees, payments, products, productlines, orderdetails, offices, orders\u001b[0m\n", 351 | "Thought:\u001b[32;1m\u001b[1;3mI should query the orderdetails table to see how it is related to the orders table.\n", 352 | "Action: schema_sql_db\n", 353 | "Action Input: \"orders, orderdetails\"\u001b[0m\n", 354 | "Observation: \u001b[33;1m\u001b[1;3m\n", 355 | "CREATE TABLE orders (\n", 356 | "\t`orderNumber` INTEGER NOT NULL, \n", 357 | "\t`orderDate` DATE NOT NULL, \n", 358 | "\t`requiredDate` DATE NOT NULL, \n", 359 | "\t`shippedDate` DATE, \n", 360 | 
"\tstatus VARCHAR(15) NOT NULL, \n", 361 | "\tcomments TEXT, \n", 362 | "\t`customerNumber` INTEGER NOT NULL, \n", 363 | "\tPRIMARY KEY (`orderNumber`), \n", 364 | "\tCONSTRAINT orders_ibfk_1 FOREIGN KEY(`customerNumber`) REFERENCES customers (`customerNumber`)\n", 365 | ")ENGINE=InnoDB DEFAULT CHARSET=latin1\n", 366 | "\n", 367 | "/*\n", 368 | "3 rows from orders table:\n", 369 | "orderNumber\torderDate\trequiredDate\tshippedDate\tstatus\tcomments\tcustomerNumber\n", 370 | "10100\t2003-01-06\t2003-01-13\t2003-01-10\tShipped\tNone\t363\n", 371 | "10101\t2003-01-09\t2003-01-18\t2003-01-11\tShipped\tCheck on availability.\t128\n", 372 | "10102\t2003-01-10\t2003-01-18\t2003-01-14\tShipped\tNone\t181\n", 373 | "*/\n", 374 | "\n", 375 | "\n", 376 | "CREATE TABLE orderdetails (\n", 377 | "\t`orderNumber` INTEGER NOT NULL, \n", 378 | "\t`productCode` VARCHAR(15) NOT NULL, \n", 379 | "\t`quantityOrdered` INTEGER NOT NULL, \n", 380 | "\t`priceEach` DECIMAL(10, 2) NOT NULL, \n", 381 | "\t`orderLineNumber` SMALLINT NOT NULL, \n", 382 | "\tPRIMARY KEY (`orderNumber`, `productCode`), \n", 383 | "\tCONSTRAINT orderdetails_ibfk_1 FOREIGN KEY(`orderNumber`) REFERENCES orders (`orderNumber`), \n", 384 | "\tCONSTRAINT orderdetails_ibfk_2 FOREIGN KEY(`productCode`) REFERENCES products (`productCode`)\n", 385 | ")ENGINE=InnoDB DEFAULT CHARSET=latin1\n", 386 | "\n", 387 | "/*\n", 388 | "3 rows from orderdetails table:\n", 389 | "orderNumber\tproductCode\tquantityOrdered\tpriceEach\torderLineNumber\n", 390 | "10100\tS18_1749\t30\t136.00\t3\n", 391 | "10100\tS18_2248\t50\t55.09\t2\n", 392 | "10100\tS18_4409\t22\t75.46\t4\n", 393 | "*/\u001b[0m\n", 394 | "Thought:\u001b[32;1m\u001b[1;3mThe orderdetails table has a foreign key constraint on the orderNumber column that references the orders table. 
I can use this to join the two tables and describe the columns in the orderdetails table.\n", 395 | "Action: schema_sql_db\n", 396 | "Action Input: \"orderdetails\"\u001b[0m\n", 397 | "Observation: \u001b[33;1m\u001b[1;3m\n", 398 | "CREATE TABLE orderdetails (\n", 399 | "\t`orderNumber` INTEGER NOT NULL, \n", 400 | "\t`productCode` VARCHAR(15) NOT NULL, \n", 401 | "\t`quantityOrdered` INTEGER NOT NULL, \n", 402 | "\t`priceEach` DECIMAL(10, 2) NOT NULL, \n", 403 | "\t`orderLineNumber` SMALLINT NOT NULL, \n", 404 | "\tPRIMARY KEY (`orderNumber`, `productCode`), \n", 405 | "\tCONSTRAINT orderdetails_ibfk_1 FOREIGN KEY(`orderNumber`) REFERENCES orders (`orderNumber`), \n", 406 | "\tCONSTRAINT orderdetails_ibfk_2 FOREIGN KEY(`productCode`) REFERENCES products (`productCode`)\n", 407 | ")ENGINE=InnoDB DEFAULT CHARSET=latin1\n", 408 | "\n", 409 | "/*\n", 410 | "3 rows from orderdetails table:\n", 411 | "orderNumber\tproductCode\tquantityOrdered\tpriceEach\torderLineNumber\n", 412 | "10100\tS18_1749\t30\t136.00\t3\n", 413 | "10100\tS18_2248\t50\t55.09\t2\n", 414 | "10100\tS18_4409\t22\t75.46\t4\n", 415 | "*/\u001b[0m\n", 416 | "Thought:\u001b[32;1m\u001b[1;3mI can use a JOIN statement to get the information from the orders table related to the orderdetails table.\n", 417 | "Action: query_checker_sql_db\n", 418 | "Action Input: \"SELECT orders.* FROM orders JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber WHERE orderdetails.orderNumber = 10100\"\u001b[0m\n", 419 | "Observation: \u001b[31;1m\u001b[1;3m\n", 420 | "\n", 421 | "SELECT orders.* FROM orders JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber WHERE orderdetails.orderNumber = 10100\u001b[0m\n", 422 | "Thought:\u001b[32;1m\u001b[1;3mThe query looks correct. 
I can now run it to get the answer to the original question.\n", 423 | "Action: query_sql_db\n", 424 | "Action Input: \"SELECT orders.* FROM orders JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber WHERE orderdetails.orderNumber = 10100\"\u001b[0m\n", 425 | "Observation: \u001b[36;1m\u001b[1;3m[(10100, datetime.date(2003, 1, 6), datetime.date(2003, 1, 13), datetime.date(2003, 1, 10), 'Shipped', None, 363), (10100, datetime.date(2003, 1, 6), datetime.date(2003, 1, 13), datetime.date(2003, 1, 10), 'Shipped', None, 363), (10100, datetime.date(2003, 1, 6), datetime.date(2003, 1, 13), datetime.date(2003, 1, 10), 'Shipped', None, 363), (10100, datetime.date(2003, 1, 6), datetime.date(2003, 1, 13), datetime.date(2003, 1, 10), 'Shipped', None, 363)]\u001b[0m\n", 426 | "Thought:\u001b[32;1m\u001b[1;3mThe query returned duplicate rows, but the relevant information is present. I can now give the final answer.\n", 427 | "Final Answer: The orders table and the orderdetails table are related by the orderNumber column, which serves as a foreign key in the orderdetails table referencing the primary key in the orders table. To get the information from both tables related to a specific order, you can use a JOIN statement with a WHERE clause specifying the desired orderNumber.\u001b[0m\n", 428 | "\n", 429 | "\u001b[1m> Finished chain.\u001b[0m\n" 430 | ] 431 | }, 432 | { 433 | "output_type": "execute_result", 434 | "data": { 435 | "text/plain": [ 436 | "'The orders table and the orderdetails table are related by the orderNumber column, which serves as a foreign key in the orderdetails table referencing the primary key in the orders table. 
To get the information from both tables related to a specific order, you can use a JOIN statement with a WHERE clause specifying the desired orderNumber.'" 437 | ], 438 | "application/vnd.google.colaboratory.intrinsic+json": { 439 | "type": "string" 440 | } 441 | }, 442 | "metadata": {}, 443 | "execution_count": 13 444 | } 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "source": [ 450 | "###Recovering from an error" 451 | ], 452 | "metadata": { 453 | "id": "AxeXiqODezwk" 454 | } 455 | }, 456 | { 457 | "cell_type": "code", 458 | "source": [ 459 | "agent_executor.run(\"描述PurchaseDetails表\")" 460 | ], 461 | "metadata": { 462 | "colab": { 463 | "base_uri": "https://localhost:8080/", 464 | "height": 767 465 | }, 466 | "id": "35oyluNYRxe_", 467 | "outputId": "c894eed4-6c8c-42d8-dcf5-8a9458460dec" 468 | }, 469 | "execution_count": null, 470 | "outputs": [ 471 | { 472 | "output_type": "stream", 473 | "name": "stdout", 474 | "text": [ 475 | "\n", 476 | "\n", 477 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 478 | "\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n", 479 | "Action Input: \"\"\u001b[0m\n", 480 | "Observation: \u001b[38;5;200m\u001b[1;3mcustomers, employees, payments, products, productlines, orderdetails, offices, orders\u001b[0m\n", 481 | "Thought:\u001b[32;1m\u001b[1;3mI can query the PurchaseDetails table, but first I need to know its schema.\n", 482 | "Action: schema_sql_db\n", 483 | "Action Input: \"PurchaseDetails\"\u001b[0m\n", 484 | "Observation: \u001b[33;1m\u001b[1;3mError: table_names {'PurchaseDetails'} not found in database\u001b[0m\n", 485 | "Thought:\u001b[32;1m\u001b[1;3mOops, I made a typo in the table name. 
I'll try again.\n", 486 | "Action: schema_sql_db\n", 487 | "Action Input: \"orderdetails\"\u001b[0m\n", 488 | "Observation: \u001b[33;1m\u001b[1;3m\n", 489 | "CREATE TABLE orderdetails (\n", 490 | "\t`orderNumber` INTEGER NOT NULL, \n", 491 | "\t`productCode` VARCHAR(15) NOT NULL, \n", 492 | "\t`quantityOrdered` INTEGER NOT NULL, \n", 493 | "\t`priceEach` DECIMAL(10, 2) NOT NULL, \n", 494 | "\t`orderLineNumber` SMALLINT NOT NULL, \n", 495 | "\tPRIMARY KEY (`orderNumber`, `productCode`), \n", 496 | "\tCONSTRAINT orderdetails_ibfk_1 FOREIGN KEY(`orderNumber`) REFERENCES orders (`orderNumber`), \n", 497 | "\tCONSTRAINT orderdetails_ibfk_2 FOREIGN KEY(`productCode`) REFERENCES products (`productCode`)\n", 498 | ")ENGINE=InnoDB DEFAULT CHARSET=latin1\n", 499 | "\n", 500 | "/*\n", 501 | "3 rows from orderdetails table:\n", 502 | "orderNumber\tproductCode\tquantityOrdered\tpriceEach\torderLineNumber\n", 503 | "10100\tS18_1749\t30\t136.00\t3\n", 504 | "10100\tS18_2248\t50\t55.09\t2\n", 505 | "10100\tS18_4409\t22\t75.46\t4\n", 506 | "*/\u001b[0m\n", 507 | "Thought:\u001b[32;1m\u001b[1;3mNow that I know the schema of the PurchaseDetails table (which is actually called orderdetails), I can describe it.\n", 508 | "Action: query_sql_db\n", 509 | "Action Input: \"DESCRIBE orderdetails\"\u001b[0m\n", 510 | "Observation: \u001b[36;1m\u001b[1;3m[('orderNumber', 'int', 'NO', 'PRI', None, ''), ('productCode', 'varchar(15)', 'NO', 'PRI', None, ''), ('quantityOrdered', 'int', 'NO', '', None, ''), ('priceEach', 'decimal(10,2)', 'NO', '', None, ''), ('orderLineNumber', 'smallint', 'NO', '', None, '')]\u001b[0m\n", 511 | "Thought:\u001b[32;1m\u001b[1;3mI have the information I need to describe the PurchaseDetails table, which has columns for orderNumber, productCode, quantityOrdered, priceEach, and orderLineNumber.\n", 512 | "Final Answer: The PurchaseDetails table (which is actually called orderdetails) has columns for orderNumber, productCode, quantityOrdered, priceEach, and 
orderLineNumber.\u001b[0m\n", 513 | "\n", 514 | "\u001b[1m> Finished chain.\u001b[0m\n" 515 | ] 516 | }, 517 | { 518 | "output_type": "execute_result", 519 | "data": { 520 | "text/plain": [ 521 | "'The PurchaseDetails table (which is actually called orderdetails) has columns for orderNumber, productCode, quantityOrdered, priceEach, and orderLineNumber.'" 522 | ], 523 | "application/vnd.google.colaboratory.intrinsic+json": { 524 | "type": "string" 525 | } 526 | }, 527 | "metadata": {}, 528 | "execution_count": 14 529 | } 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "source": [ 535 | "agent_executor.run(\"找出总销售收入最高的前5个产品\")" 536 | ], 537 | "metadata": { 538 | "colab": { 539 | "base_uri": "https://localhost:8080/", 540 | "height": 524 541 | }, 542 | "id": "-O6w_qCsRsNO", 543 | "outputId": "19364342-444f-4c38-ce11-dce6ab62ee2f" 544 | }, 545 | "execution_count": null, 546 | "outputs": [ 547 | { 548 | "output_type": "stream", 549 | "name": "stdout", 550 | "text": [ 551 | "\n", 552 | "\n", 553 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 554 | "\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n", 555 | "Action Input: \"\"\u001b[0m\n", 556 | "Observation: \u001b[38;5;200m\u001b[1;3mcustomers, employees, payments, products, productlines, orderdetails, offices, orders\u001b[0m\n", 557 | "Thought:\u001b[32;1m\u001b[1;3mI should query the 'products' and 'orderdetails' tables to get the total sales revenue for each product\n", 558 | "Action: query_checker_sql_db\n", 559 | "Action Input: SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue FROM products INNER JOIN orderdetails ON products.productCode = orderdetails.productCode GROUP BY products.productName ORDER BY totalRevenue DESC LIMIT 5;\u001b[0m\n", 560 | "Observation: \u001b[31;1m\u001b[1;3m\n", 561 | "\n", 562 | "SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue \n", 563 | "FROM products 
\n", 564 | "INNER JOIN orderdetails \n", 565 | "ON products.productCode = orderdetails.productCode \n", 566 | "GROUP BY products.productName \n", 567 | "ORDER BY totalRevenue DESC \n", 568 | "LIMIT 5;\u001b[0m\n", 569 | "Thought:\u001b[32;1m\u001b[1;3mThe query looks correct, I should execute it to get the top 5 products with the highest total sales revenue.\n", 570 | "Action: query_sql_db\n", 571 | "Action Input: SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue FROM products INNER JOIN orderdetails ON products.productCode = orderdetails.productCode GROUP BY products.productName ORDER BY totalRevenue DESC LIMIT 5;\u001b[0m\n", 572 | "Observation: \u001b[36;1m\u001b[1;3m[('1992 Ferrari 360 Spider red', Decimal('276839.98')), ('2001 Ferrari Enzo', Decimal('190755.86')), ('1952 Alpine Renault 1300', Decimal('190017.96')), ('2003 Harley-Davidson Eagle Drag Bike', Decimal('170686.00')), ('1968 Ford Mustang', Decimal('161531.48'))]\u001b[0m\n", 573 | "Thought:\u001b[32;1m\u001b[1;3mI can see that the top 5 products with the highest total sales revenue are: 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.\n", 574 | "Final Answer: The top 5 products with the highest total sales revenue are 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.\u001b[0m\n", 575 | "\n", 576 | "\u001b[1m> Finished chain.\u001b[0m\n" 577 | ] 578 | }, 579 | { 580 | "output_type": "execute_result", 581 | "data": { 582 | "text/plain": [ 583 | "'The top 5 products with the highest total sales revenue are 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.'" 584 | ], 585 | "application/vnd.google.colaboratory.intrinsic+json": { 586 | "type": "string" 587 | } 588 | }, 589 | "metadata": {}, 590 | 
"execution_count": 15 591 | } 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "source": [ 597 | "agent_executor.run(\"列出订单数量最多的前3个国家\")" 598 | ], 599 | "metadata": { 600 | "colab": { 601 | "base_uri": "https://localhost:8080/", 602 | "height": 489 603 | }, 604 | "id": "a7TJi0JzfHZP", 605 | "outputId": "cb977d0f-c408-4b02-915e-ff6f89746f1a" 606 | }, 607 | "execution_count": null, 608 | "outputs": [ 609 | { 610 | "output_type": "stream", 611 | "name": "stdout", 612 | "text": [ 613 | "\n", 614 | "\n", 615 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 616 | "\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n", 617 | "Action Input: \"\"\u001b[0m\n", 618 | "Observation: \u001b[38;5;200m\u001b[1;3mcustomers, employees, payments, products, productlines, orderdetails, offices, orders\u001b[0m\n", 619 | "Thought:\u001b[32;1m\u001b[1;3mI should query the orders table and join it with the customers table to get the country information.\n", 620 | "Action: query_checker_sql_db\n", 621 | "Action Input: \"SELECT customers.country, COUNT(*) as num_orders FROM orders JOIN customers ON orders.customerNumber = customers.customerNumber GROUP BY customers.country ORDER BY num_orders DESC LIMIT 3\"\u001b[0m\n", 622 | "Observation: \u001b[31;1m\u001b[1;3m\n", 623 | "\n", 624 | "SELECT customers.country, COUNT(*) as num_orders FROM orders \n", 625 | "JOIN customers ON orders.customerNumber = customers.customerNumber \n", 626 | "WHERE customers.country IS NOT NULL \n", 627 | "GROUP BY customers.country \n", 628 | "ORDER BY num_orders DESC \n", 629 | "LIMIT 3;\u001b[0m\n", 630 | "Thought:\u001b[32;1m\u001b[1;3mThe query looks good. 
I should execute it to get the top 3 countries with the highest number of orders.\n", 631 | "Action: query_sql_db\n", 632 | "Action Input: \"SELECT customers.country, COUNT(*) as num_orders FROM orders JOIN customers ON orders.customerNumber = customers.customerNumber WHERE customers.country IS NOT NULL GROUP BY customers.country ORDER BY num_orders DESC LIMIT 3\"\u001b[0m\n", 633 | "Observation: \u001b[36;1m\u001b[1;3m[('USA', 112), ('France', 37), ('Spain', 36)]\u001b[0m\n", 634 | "Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n", 635 | "Final Answer: The top 3 countries with the highest number of orders are USA, France, and Spain.\u001b[0m\n", 636 | "\n", 637 | "\u001b[1m> Finished chain.\u001b[0m\n" 638 | ] 639 | }, 640 | { 641 | "output_type": "execute_result", 642 | "data": { 643 | "text/plain": [ 644 | "'The top 3 countries with the highest number of orders are USA, France, and Spain.'" 645 | ], 646 | "application/vnd.google.colaboratory.intrinsic+json": { 647 | "type": "string" 648 | } 649 | }, 650 | "metadata": {}, 651 | "execution_count": 16 652 | } 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "source": [], 658 | "metadata": { 659 | "id": "-gubSOK_SJSO" 660 | }, 661 | "execution_count": null, 662 | "outputs": [] 663 | } 664 | ] 665 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # langchain-tutorials -------------------------------------------------------------------------------- /Using_OpenAI__LangChain_And_HDBSCAN_Clustering_Documents.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "view-in-github", 7 | "colab_type": "text" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "source": [ 16 | "!python3.10 --version" 17 | ], 18 | 
"metadata": { 19 | "colab": { 20 | "base_uri": "https://localhost:8080/" 21 | }, 22 | "id": "GNgna6Mcr-f8", 23 | "outputId": "2d2c7f13-5254-4dd9-f34c-18a015879507" 24 | }, 25 | "execution_count": 5, 26 | "outputs": [ 27 | { 28 | "output_type": "stream", 29 | "name": "stdout", 30 | "text": [ 31 | "Python 3.10.12\n" 32 | ] 33 | } 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "source": [ 39 | "## **安装依赖**" 40 | ], 41 | "metadata": { 42 | "id": "7tqCm0t9sifX" 43 | } 44 | }, 45 | { 46 | "cell_type": "code", 47 | "source": [ 48 | "%pip install -r requirements.txt" 49 | ], 50 | "metadata": { 51 | "id": "Cool57g7sVmF" 52 | }, 53 | "execution_count": null, 54 | "outputs": [] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 18, 59 | "metadata": { 60 | "colab": { 61 | "base_uri": "https://localhost:8080/" 62 | }, 63 | "id": "56denEvKvJZS", 64 | "outputId": "d5313828-35c8-4557-fcfb-66be1b7a2520" 65 | }, 66 | "outputs": [ 67 | { 68 | "output_type": "execute_result", 69 | "data": { 70 | "text/plain": [ 71 | "True" 72 | ] 73 | }, 74 | "metadata": {}, 75 | "execution_count": 18 76 | } 77 | ], 78 | "source": [ 79 | "import os\n", 80 | "\n", 81 | "import hdbscan\n", 82 | "import pandas as pd\n", 83 | "\n", 84 | "from langchain import LLMChain\n", 85 | "from langchain.chat_models import ChatOpenAI\n", 86 | "from langchain.embeddings.openai import OpenAIEmbeddings\n", 87 | "from langchain.prompts.chat import (\n", 88 | " ChatPromptTemplate,\n", 89 | " SystemMessagePromptTemplate,\n", 90 | " HumanMessagePromptTemplate,\n", 91 | ")\n", 92 | "from newsapi import NewsApiClient\n", 93 | "\n", 94 | "from dotenv import load_dotenv\n", 95 | "\n", 96 | "load_dotenv()" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": { 102 | "id": "v7mK8pQvvJZW" 103 | }, 104 | "source": [ 105 | "\n", 106 | "## 从相关来源获得10篇新闻报道(因为接口限速原因,这里演示10条)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 29, 112 | "metadata": { 113 | "id": 
"AdY8rR8YvJZY" 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "newsapi = NewsApiClient(api_key=os.getenv(\"NEWSAPI_API_KEY\"))\n", 118 | "sources_1 = [\n", 119 | " \"the-washington-post\",\n", 120 | " \"the-wall-street-journal\",\n", 121 | " \"business-insider\",\n", 122 | "]\n", 123 | "sources_2 = [\n", 124 | " \"google-news\"\n", 125 | "]" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 30, 131 | "metadata": { 132 | "id": "xVB0xTM1vJZZ" 133 | }, 134 | "outputs": [], 135 | "source": [ 136 | "recent_articles = []\n", 137 | "for source in [sources_1, sources_2]:\n", 138 | " recent_articles.extend(newsapi.get_everything(\n", 139 | " sources=\",\".join(source),\n", 140 | " language=\"zh\",\n", 141 | " page_size=50\n", 142 | " )[\"articles\"])" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "source": [ 148 | "print(recent_articles)" 149 | ], 150 | "metadata": { 151 | "id": "VU7xr4GNyzR0" 152 | }, 153 | "execution_count": null, 154 | "outputs": [] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": { 159 | "id": "tHvIf_CjvJZZ" 160 | }, 161 | "source": [ 162 | "## 生成新闻文章的嵌入向量" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 32, 168 | "metadata": { 169 | "id": "xXZjH9DsvJZa" 170 | }, 171 | "outputs": [], 172 | "source": [ 173 | "docs = [\n", 174 | " a[\"title\"] + \"\\n\\n\" + (a[\"description\"] or \"\")\n", 175 | " for a in recent_articles\n", 176 | "]" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 33, 182 | "metadata": { 183 | "id": "qNW0eB9uvJZa" 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "embeddings = OpenAIEmbeddings(chunk_size=1000).embed_documents(docs)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "id": "UDU4gksRvJZa" 194 | }, 195 | "source": [ 196 | "## 对文件进行聚类,并将结果存储在一个 dataframe 中" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 34, 202 | "metadata": { 203 | "id": 
"lUMMPJz1vJZb" 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "hdb = hdbscan.HDBSCAN(gen_min_span_tree=True, min_samples=3, min_cluster_size=3).fit(embeddings)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 35, 213 | "metadata": { 214 | "id": "FjTBbPYFvJZb" 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "df = pd.DataFrame({\n", 219 | " \"title\": [article[\"title\"] for article in recent_articles],\n", 220 | " \"description\": [article[\"description\"] for article in recent_articles],\n", 221 | " \"cluster\": hdb.labels_,\n", 222 | "})\n", 223 | "df = df.query(\"cluster != -1\") # 删除不在群集中的文档" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": { 229 | "id": "mAAqX_yyvJZc" 230 | }, 231 | "source": [ 232 | "## 从每个聚类的文件中创建聚类主题" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 36, 238 | "metadata": { 239 | "id": "vqqrC-EZvJZc" 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "\n", 244 | "\n", 245 | "\n", 246 | "def get_prompt():\n", 247 | " system_template = \"你是一位记者专家。你要帮我为新闻文章写一个引人注目的主题标题。\"\n", 248 | " human_template = \"使用以下文章,写一个能概括这些文章的主题标题。\\n\\nARTICLES:{articles}\\n\\nTOPIC TITLE:\"\n", 249 | "\n", 250 | " return ChatPromptTemplate(\n", 251 | " messages=[\n", 252 | " SystemMessagePromptTemplate.from_template(system_template),\n", 253 | " HumanMessagePromptTemplate.from_template(human_template),\n", 254 | " ],\n", 255 | " input_variables=[\"articles\"],\n", 256 | " )\n", 257 | "\n", 258 | "\n", 259 | "articles_str = \"\\n\\n\".join(\n", 260 | " [article[\"title\"] + \"\\n\\n\" + (article[\"description\"] or \"\") for article in recent_articles]\n", 261 | ")\n", 262 | "\n", 263 | "prompt = get_prompt()\n", 264 | "\n", 265 | "for c in df.cluster.unique():\n", 266 | " chain = LLMChain(\n", 267 | " llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo-0613\"), prompt=prompt, verbose=False\n", 268 | " )\n", 269 | " articles_str = \"\\n\".join(\n", 270 | " [\n", 271 
| " f\"{article['title']}\\n{article['description']}\\n\"\n", 272 | " for article in df.query(f\"cluster == {c}\").to_dict(orient=\"records\")\n", 273 | " ]\n", 274 | " )\n", 275 | " result = chain.run(\n", 276 | " {\n", 277 | " \"articles\": articles_str,\n", 278 | " }\n", 279 | " )\n", 280 | " df.loc[df.cluster == c, \"topic_title\"] = result" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 41, 286 | "metadata": { 287 | "colab": { 288 | "base_uri": "https://localhost:8080/", 289 | "height": 355 290 | }, 291 | "id": "fV2OgrYQvJZc", 292 | "outputId": "40ec4025-89c1-4afc-e361-fac63510e6b4" 293 | }, 294 | "outputs": [ 295 | { 296 | "output_type": "stream", 297 | "name": "stdout", 298 | "text": [ 299 | "\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"\n" 300 | ] 301 | }, 302 | { 303 | "output_type": "display_data", 304 | "data": { 305 | "text/plain": [ 306 | " title \\\n", 307 | "11 龙舟竞渡华亭湖!松江区第十三届端午龙舟赛上午开赛_郊野 - 新民网 \n", 308 | "17 端午假期首日长三角铁路迎来客流高峰预计发送旅客340万人次 - 无锡新传媒 \n", 309 | "19 看演出体验非遗手工技艺北京西城端午活动精彩纷呈 - beijing.qianlong.com \n", 310 | "24 《颂·黄钟大吕》在国家大剧院音乐厅上演 - China Daily \n", 311 | "27 龙舟竞渡正端午长三角龙舟邀请赛在金山山阳镇举行_新民社会 - 新民网 \n", 312 | "\n", 313 | " description cluster \\\n", 314 | "11 龙舟竞渡华亭湖!松江区第十三届端午龙舟赛上午开赛_郊野  新民网 1 \n", 315 | "17 端午假期首日长三角铁路迎来客流高峰预计发送旅客340万人次  无锡新传媒 1 \n", 316 | "19 看演出体验非遗手工技艺北京西城端午活动精彩纷呈  beijing.qianlong.com 1 \n", 317 | "24 《颂·黄钟大吕》在国家大剧院音乐厅上演  China Daily 1 \n", 318 | "27 龙舟竞渡正端午长三角龙舟邀请赛在金山山阳镇举行_新民社会  新民网 1 \n", 319 | "\n", 320 | " topic_title \n", 321 | "11 \"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\" \n", 322 | "17 \"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\" \n", 323 | "19 \"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\" \n", 324 | "24 \"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\" \n", 325 | "27 \"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\" " 326 | ], 327 | "text/html": [ 328 | "\n", 329 | "
\n", 330 | "
\n", 331 | "
\n", 332 | "\n", 345 | "\n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | "
titledescriptionclustertopic_title
11龙舟竞渡华亭湖!松江区第十三届端午龙舟赛上午开赛_郊野 - 新民网龙舟竞渡华亭湖!松江区第十三届端午龙舟赛上午开赛_郊野  新民网1\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"
17端午假期首日长三角铁路迎来客流高峰预计发送旅客340万人次 - 无锡新传媒端午假期首日长三角铁路迎来客流高峰预计发送旅客340万人次  无锡新传媒1\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"
19看演出体验非遗手工技艺北京西城端午活动精彩纷呈 - beijing.qianlong.com看演出体验非遗手工技艺北京西城端午活动精彩纷呈  beijing.qianlong.com1\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"
24《颂·黄钟大吕》在国家大剧院音乐厅上演 - China Daily《颂·黄钟大吕》在国家大剧院音乐厅上演  China Daily1\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"
27龙舟竞渡正端午长三角龙舟邀请赛在金山山阳镇举行_新民社会 - 新民网龙舟竞渡正端午长三角龙舟邀请赛在金山山阳镇举行_新民社会  新民网1\"粽情端午:龙舟竞渡、非遗手工技艺和粽子包制活动丰富多彩\"
\n", 393 | "
\n", 394 | " \n", 404 | " \n", 405 | " \n", 442 | "\n", 443 | " \n", 467 | "
\n", 468 | "
\n", 469 | " " 470 | ] 471 | }, 472 | "metadata": {} 473 | } 474 | ], 475 | "source": [ 476 | "c = 1\n", 477 | "with pd.option_context(\"display.max_colwidth\", None):\n", 478 | " print(df.query(f\"cluster == {c}\").topic_title.values[0])\n", 479 | " display(df.query(f\"cluster == {c}\").head())" 480 | ] 481 | } 482 | ], 483 | "metadata": { 484 | "kernelspec": { 485 | "display_name": ".venv", 486 | "language": "python", 487 | "name": "python3" 488 | }, 489 | "language_info": { 490 | "codemirror_mode": { 491 | "name": "ipython", 492 | "version": 3 493 | }, 494 | "file_extension": ".py", 495 | "mimetype": "text/x-python", 496 | "name": "python", 497 | "nbconvert_exporter": "python", 498 | "pygments_lexer": "ipython3", 499 | "version": "3.10.4" 500 | }, 501 | "orig_nbformat": 4, 502 | "colab": { 503 | "provenance": [], 504 | "include_colab_link": true 505 | } 506 | }, 507 | "nbformat": 4, 508 | "nbformat_minor": 0 509 | } -------------------------------------------------------------------------------- /Whisper_And_ChatGPT_API_Summary_Audio_Tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "view-in-github", 7 | "colab_type": "text" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "id": "Oc7hK9mI93cB" 17 | }, 18 | "source": [ 19 | "## Whisper API转录" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "source": [ 25 | "!pip install openai --upgrade -q" 26 | ], 27 | "metadata": { 28 | "colab": { 29 | "base_uri": "https://localhost:8080/" 30 | }, 31 | "id": "VzwMPgmA-65H", 32 | "outputId": "acc9baf3-adbc-4a7e-bb64-03e6119c9b94" 33 | }, 34 | "execution_count": 19, 35 | "outputs": [ 36 | { 37 | "output_type": "stream", 38 | "name": "stdout", 39 | "text": [ 40 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m506.0/506.0 kB\u001b[0m \u001b[31m13.8 
MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 41 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m79.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 42 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m163.8/163.8 kB\u001b[0m \u001b[31m21.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 43 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m72.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 44 | "\u001b[?25h" 45 | ] 46 | } 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "source": [ 52 | "os.environ['OPENAI_API_KEY'] = \"your-openai-api-key\"" 53 | ], 54 | "metadata": { 55 | "id": "kftPquJ6aAzD" 56 | }, 57 | "execution_count": null, 58 | "outputs": [] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": { 64 | "id": "7Wi42fmz93cG" 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "import openai, os\n", 69 | "\n", 70 | "openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n", 71 | "\n", 72 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 73 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": { 80 | "id": "PCeKhPDK93cH", 81 | "outputId": "4a74bacd-4f52-4909-dd92-65d5ddbac2dc" 82 | }, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "欢迎来到 Onboard 真实的一线经验 走新的投资思考 我是 Monica 我是高宁 我们一起聊聊软件如何改变世界 大家好 欢迎来到 Onboard 我是 Monica 自从OpenAI发布的ChatGBT 掀起了席卷世界的AI热潮 不到三个月就积累了 超过一亿的越货用户 超过1300万的日货用户 真的是展现了AI让人惊讶的 也让很多人直呼 这就是下一个互联网的未来 有不少观众都说 希望我们再做一期AI的讨论 于是这次硬核讨论就来了 这次我们请来了 Google Brain的研究员雪芝 她是Google大语言模型PALM Pathway Language Model的作者之一 要知道这个模型的参数量 是GPT-3的三倍还多 另外还有两位AI产品大牛 一位来自著名的StableDM 背后的商业公司Stability AI 另一位来自某硅谷科技大厂 也曾在吴恩达教授的Landing AI中 担任产品负责人 此外 莫妮凯还邀请到一位 一直关注AI的投资人朋友Bill 当做我的特邀共同主持嘉宾 我们主要讨论几个话题 
一方面从研究的视角 最前沿的研究者在关注什么 现在技术的天花板 和未来大的变量可能会在哪里 第二个问题是 未来大的变量可能会在哪里 从产品和商业的角度 什么是一个好的AI产品 整个生态可能随着技术 有怎样的演变 更重要的 我们又能从上一波 AI的创业热潮中学到什么 最后 莫妮凯和Bill还会从投资人的视角 做一个回顾 总结和畅想 这里还有一个小的update 在本集发布的时候 Google也对爆发式增长的 Chad GPT做出了回应 正在测试一个基于Lambda 模型的聊天机器人 ApprenticeBot 正式发布后会有怎样的惊喜 我们都拭目以待 AI无疑是未来几年 最令人兴奋的变量之一 莫妮凯也希望未来能邀请到更多 一线从业者 从不同角度讨论这个话题 不论是想要做创业 研究 产品 还是投资的同学 希望这些对话 对于大家了解这些技术演进 商业的可能 甚至未来对于我们每个人 每个社会意味着什么 都能引发一些思考 提供一些启发 这次的讨论有些技术硬核 需要各位对生成式AI 大模型都有一些基础了解 讨论中涉及到的论文和重要概念 也会总结在本集的简介中 供大家复习参考 几位嘉宾在北美工作生活多年 夹杂英文在所难免 也请大家体谅了 欢迎来到未来 希望大家enjoy\n" 89 | ] 90 | } 91 | ], 92 | "source": [ 93 | "print(transcript['text'])" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": { 99 | "id": "ZWrlRuKG93cJ" 100 | }, 101 | "source": [ 102 | "## Prompt增加标点符号" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "metadata": { 109 | "id": "sNYkxOco93cK" 110 | }, 111 | "outputs": [], 112 | "source": [ 113 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 114 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file,\n", 115 | " prompt=\"这是一段中文播客内容。\")" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": { 122 | "id": "sj-d5XZa93cK", 123 | "outputId": "4d2b1a91-9b26-4c37-f072-cb3cd29ed159" 124 | }, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "欢迎来到 Onboard,真实的一线经验,走新的投资思考。 我是 Monica。 我是高宁。我们一起聊聊软件如何改变世界。 大家好,欢迎来到 Onboard,我是 Monica。 自从 OpenAI 发布的 ChatGBT 掀起了席卷世界的 AI 热潮, 不到三个月就积累了超过一亿的越活用户,超过一千三百万的日活用户。 真的是展现了 AI 让人惊叹的能力, 也让很多人直呼这就是下一个互联网的未来。 有不少观众都说希望我们再做一期 AI 的讨论, 于是这次硬核讨论就来了。 这次我们请来了 Google Brain 的研究员雪芝, 她是 Google 大语言模型 PAMP,Pathway Language Model 的作者之一。 要知道,这个模型的参数量是 GPT-3 的三倍还多。 另外还有两位 AI 产品大牛,一位来自著名的 Stable Diffusion 背后的商业公司 Stability AI, 另一位来自某硅谷科技大厂,也曾在吴恩达教授的 Landing AI 中担任产品负责人。 此外,Monica 还邀请到一位一直关注 AI 的投资人朋友 Bill 当作我的特邀共同主持嘉宾。 我们主要讨论几个话题,一方面从研究的视角,最前沿的研究者在关注什么? 现在技术的天花板和未来大的变量可能会在哪里? 
从产品和商业的角度,什么是一个好的 AI 产品? 整个生态可能随着技术有怎样的演变? 更重要的,我们又能从上一波 AI 的创业热潮中学到什么? 最后,Monica 和 Bill 还会从投资人的视角做一个回顾、总结和畅想。 这里还有一个小的 update,在本集发布的时候, Google 也对爆发式增长的ChatGPT 做出了回应, 正在测试一个基于 Lambda 模型的聊天机器人 ApprenticeBot。 正式发布后会有怎样的惊喜?我们都拭目以待。 AI 无疑是未来几年最令人兴奋的变量之一, Monica 也希望未来能邀请到更多一线从业者从不同角度讨论这个话题。 不论是想要做创业、研究、产品还是投资的同学, 希望这些对话对于大家了解这些技术演进、商业的可能, 甚至未来对于我们每个人、每个社会意味着什么, 都能引发一些思考,提供一些启发。 这次的讨论有些技术硬核,需要各位对生成式 AI 大模型都有一些基础了解。 讨论中涉及到的论文和重要概念,也会总结在本集的简介中,供大家复习参考。 几位嘉宾在北美工作生活多年,夹杂英文在所难免,也请大家体谅了。 欢迎来到未来,大家 enjoy!\n" 131 | ] 132 | } 133 | ], 134 | "source": [ 135 | "print(transcript['text'])" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": { 141 | "id": "TmVu2O1i93cK" 142 | }, 143 | "source": [ 144 | "## Prompt增加中英文混排格式" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": { 151 | "id": "OjusgTGy93cL", 152 | "outputId": "070584e0-5e3c-4e01-ac9f-99055500f05e", 153 | "colab": { 154 | "base_uri": "https://localhost:8080/" 155 | } 156 | }, 157 | "outputs": [ 158 | { 159 | "output_type": "stream", 160 | "name": "stdout", 161 | "text": [ 162 | "欢迎来到Onboard,真实的一线经验,走新的投资思考。 我是Monica,我是高宁,我们一起聊聊软件如何改变世界。 大家好,欢迎来到Onboard,我是Monica。 自从OpenAI发布的ChatGBT掀起了席卷世界的AI热潮, 不到三个月就积累了超过一亿的越活用户,超过1300万的日活用户。 真的是展现了AI让人惊叹的能力,也让很多人直呼这就是下一个互联网的未来。 有不少观众都说希望我们再做一期AI的讨论,于是这次硬核讨论就来了。 这次我们请来了Google Brain的研究员雪芝, 她是Google大语言模型POM,Pathway Language Model的作者之一。 要知道这个模型的参数量是GPT-3的三倍还多。 另外还有两位AI产品大牛,一位来自著名的Stable Diffusion背后的商业公司Stability AI, 另一位来自某硅谷科技大厂,也曾在吴恩达教授的Landing AI中担任产品负责人。 此外,Monica还邀请到一位一直关注AI的投资人朋友Bill,当做我的特邀共同主持嘉宾。 我们主要讨论几个话题,一方面从研究的视角,最前沿的研究者在关注什么? 现在的技术的天花板和未来大的变量可能会在哪里? 从产品和商业的角度,什么是一个好的AI产品? 整个生态可能随着技术有怎样的演变? 更重要的,我们又能从上一波AI的创业热潮中学到什么? 
最后,Monica和Bill还会从投资人的视角做一个回顾、总结和畅想。 这里还有一个小的update,在本集发布的时候, Google也对爆发式增长的ChatGPT做出了回应, 正在测试一个基于Lambda模型的聊天机器人ApprenticeBot。 正式发布后会有怎样的惊喜?我们都拭目以待。 AI无疑是未来几年最令人兴奋的变量之一, Monica也希望未来能邀请到更多一线从业者从不同角度讨论这个话题。 不论是想要做创业、研究、产品还是投资的同学, 希望这些对话对于大家了解这些技术演进、商业的可能, 甚至未来对于我们每个人、每个社会意味着什么, 都能引发一些思考,提供一些启发。 这次的讨论有些技术硬核,需要各位对生成式AI、大模型都有一些基础了解。 讨论中涉及到的论文和重要概念,也会总结在本集的简介中,供大家复习参考。 几位嘉宾在北美工作生活多年,夹杂英文在所难免,也请大家体谅了。 欢迎来到未来,大家enjoy!\n" 163 | ] 164 | } 165 | ], 166 | "source": [ 167 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 168 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file,\n", 169 | " prompt=\"这是一段Onboard播客的内容。\")\n", 170 | "print(transcript['text'])" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 7, 176 | "metadata": { 177 | "id": "8C5E1qIJ93cL", 178 | "outputId": "44c094af-f3d0-4e7b-8fa5-79ef825c2d30", 179 | "colab": { 180 | "base_uri": "https://localhost:8080/" 181 | } 182 | }, 183 | "outputs": [ 184 | { 185 | "output_type": "stream", 186 | "name": "stdout", 187 | "text": [ 188 | "欢迎来到Onboard,真实的一线经验,走新的投资思考。我是Monica。 我是高宁。我们一起聊聊软件如何改变世界。 大家好,欢迎来到Onboard,我是Monica。 自从OpenAI发布的ChatGPT掀起了席卷世界的AI热潮,不到三个月就积累了超过一亿的越活用户,超过1300万的日活用户。 真的是展现了AI让人惊叹的能力,也让很多人直呼这就是下一个互联网的未来。 有不少观众都说希望我们再做一期AI的讨论,于是这次硬核讨论就来了。 这次我们请来了Google Brain的研究员雪芝,她是Google大语言模型PALM Pathways Language Model的作者之一。 要知道,这个模型的参数量是GPT-3的三倍还多。 另外还有两位AI产品大牛,一位来自著名的Stable Diffusion背后的商业公司Stability AI, 另一位来自某硅谷科技大厂,也曾在吴恩达教授的Landing AI中担任产品负责人。 此外,Monica还邀请到一位一直关注AI的投资人朋友Bill当作我的特邀共同主持嘉宾。 我们主要讨论几个话题,一方面从研究的视角,最前沿的研究者在关注什么? 现在的技术的天花板和未来大的变量可能会在哪里? 从产品和商业的角度,什么是一个好的AI产品? 整个生态可能随着技术有怎样的演变? 更重要的,我们又能从上一波AI的创业热潮中学到什么? 
最后,Monica和Bill还会从投资人的视角做一个回顾、总结和畅想。 这里还有一个小的update,在本集发布的时候,Google也对爆发式增长的Chat GPT做出了回应。 正在测试一个基于Lambda模型的聊天机器人ApprenticeBot。 证实发布后会有怎样的惊喜,我们都拭目以待。 AI无疑是未来几年最令人兴奋的变量之一。 Monica也希望未来能邀请到更多一线从业者从不同角度讨论这个话题。 不论是想要做创业、研究、产品还是投资的同学, 希望这些对话对于大家了解这些技术演进、商业的可能,甚至未来对于我们每个人、每个社会意味着什么都能引发一些思考,提供一些启发。 这次的讨论有些技术硬核,需要各位对生成式AI大模型都有一些基础了解。 讨论中涉及到的论文和重要概念也会总结在本集的简介中,供大家复习参考。 几位嘉宾在北美工作生活多年,夹杂英文在所难免,也请大家体谅了。 欢迎来到未来,大家enjoy!\n" 189 | ] 190 | } 191 | ], 192 | "source": [ 193 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 194 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file,\n", 195 | " prompt=\"这是一段Onboard播客,里面会聊到ChatGPT以及PALM这个大语言模型。这个模型也叫做Pathways Language Model。\")\n", 196 | "print(transcript['text'])" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 8, 202 | "metadata": { 203 | "id": "_xCbVQJE93cM", 204 | "outputId": "75d30627-224f-47e4-823b-d31f67451a36", 205 | "colab": { 206 | "base_uri": "https://localhost:8080/" 207 | } 208 | }, 209 | "outputs": [ 210 | { 211 | "output_type": "stream", 212 | "name": "stdout", 213 | "text": [ 214 | "1\n", 215 | "00:00:01,000 --> 00:00:07,000\n", 216 | "欢迎来到Onboard,真实的一线经验,走新的投资思考。我是Monica。\n", 217 | "\n", 218 | "2\n", 219 | "00:00:07,000 --> 00:00:11,000\n", 220 | "我是高宁。我们一起聊聊软件如何改变世界。\n", 221 | "\n", 222 | "3\n", 223 | "00:00:15,000 --> 00:00:17,000\n", 224 | "大家好,欢迎来到Onboard,我是Monica。\n", 225 | "\n", 226 | "4\n", 227 | "00:00:17,000 --> 00:00:28,000\n", 228 | "自从OpenAI发布的ChatGBT掀起了席卷世界的AI热潮,不到三个月就积累了超过一亿的越活用户,超过1300万的日活用户。\n", 229 | "\n", 230 | "5\n", 231 | "00:00:28,000 --> 00:00:34,000\n", 232 | "真的是展现了AI让人惊叹的能力,也让很多人直呼这就是下一个互联网的未来。\n", 233 | "\n", 234 | "6\n", 235 | "00:00:34,000 --> 00:00:41,000\n", 236 | "有不少观众都说希望我们再做一期AI的讨论,于是这次硬核讨论就来了。\n", 237 | "\n", 238 | "7\n", 239 | "00:00:41,000 --> 00:00:49,000\n", 240 | "这次我们请来了Google Brain的研究员雪芝,她是Google大语言模型PALM Pathways Language Model的作者之一。\n", 241 | "\n", 242 | "8\n", 243 | "00:00:49,000 --> 00:00:53,000\n", 244 | 
"要知道,这个模型的参数量是GPT-3的三倍还多。\n", 245 | "\n", 246 | "9\n", 247 | "00:00:53,000 --> 00:01:01,000\n", 248 | "另外还有两位AI产品大牛,一位来自著名的Stable Diffusion背后的商业公司Surbility AI,\n", 249 | "\n", 250 | "10\n", 251 | "00:01:01,000 --> 00:01:08,000\n", 252 | "另一位来自某硅谷科技大厂,也曾在吴恩达教授的Landing AI中担任产品负责人。\n", 253 | "\n", 254 | "11\n", 255 | "00:01:08,000 --> 00:01:15,000\n", 256 | "此外,Monica还邀请到一位一直关注AI的投资人朋友Bill当作我的特邀共同主持嘉宾。\n", 257 | "\n", 258 | "12\n", 259 | "00:01:15,000 --> 00:01:22,000\n", 260 | "我们主要讨论几个话题,一方面从研究的视角,最前沿的研究者在关注什么?\n", 261 | "\n", 262 | "13\n", 263 | "00:01:22,000 --> 00:01:26,000\n", 264 | "现在的技术的天花板和未来大的变量可能会在哪里?\n", 265 | "\n", 266 | "14\n", 267 | "00:01:26,000 --> 00:01:30,000\n", 268 | "从产品和商业的角度,什么是一个好的AI产品?\n", 269 | "\n", 270 | "15\n", 271 | "00:01:30,000 --> 00:01:33,000\n", 272 | "整个生态可能随着技术有怎样的演变?\n", 273 | "\n", 274 | "16\n", 275 | "00:01:33,000 --> 00:01:38,000\n", 276 | "更重要的,我们又能从上一波AI的创业热潮中学到什么?\n", 277 | "\n", 278 | "17\n", 279 | "00:01:38,000 --> 00:01:44,000\n", 280 | "最后,Monica和Bill还会从投资人的视角做一个回顾、总结和畅想。\n", 281 | "\n", 282 | "18\n", 283 | "00:01:44,000 --> 00:01:51,000\n", 284 | "这里还有一个小的update,在本集发布的时候,Google也对爆发式增长的Chat GPT做出了回应。\n", 285 | "\n", 286 | "19\n", 287 | "00:01:51,000 --> 00:01:56,000\n", 288 | "正在测试一个基于Lambda模型的聊天机器人ApprenticeBot。\n", 289 | "\n", 290 | "20\n", 291 | "00:01:56,000 --> 00:02:00,000\n", 292 | "证实发布后会有怎样的惊喜,我们都拭目以待。\n", 293 | "\n", 294 | "21\n", 295 | "00:02:00,000 --> 00:02:04,000\n", 296 | "AI无疑是未来几年最令人兴奋的变量之一。\n", 297 | "\n", 298 | "22\n", 299 | "00:02:04,000 --> 00:02:10,000\n", 300 | "Monica也希望未来能邀请到更多一线从业者从不同角度讨论这个话题。\n", 301 | "\n", 302 | "23\n", 303 | "00:02:10,000 --> 00:02:15,000\n", 304 | "不论是想要做创业、研究、产品还是投资的同学,\n", 305 | "\n", 306 | "24\n", 307 | "00:02:15,000 --> 00:02:27,000\n", 308 | "希望这些对话对于大家了解这些技术演进、商业的可能,甚至未来对于我们每个人、每个社会意味着什么都能引发一些思考,提供一些启发。\n", 309 | "\n", 310 | "25\n", 311 | "00:02:27,000 --> 00:02:34,000\n", 312 | "这次的讨论有些技术硬核,需要各位对生成式AI大模型都有一些基础了解。\n", 313 | "\n", 314 | "26\n", 315 | "00:02:34,000 
--> 00:02:41,000\n", 316 | "讨论中涉及到的论文和重要概念也会总结在本集的简介中,供大家复习参考。\n", 317 | "\n", 318 | "27\n", 319 | "00:02:41,000 --> 00:02:47,000\n", 320 | "几位嘉宾在北美工作生活多年,夹杂英文在所难免,也请大家体谅了。\n", 321 | "\n", 322 | "28\n", 323 | "00:02:47,000 --> 00:03:12,000\n", 324 | "欢迎来到未来,大家enjoy!\n", 325 | "\n", 326 | "\n", 327 | "\n" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 333 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file, response_format=\"srt\",\n", 334 | " prompt=\"这是一段Onboard播客,里面会聊到PALM这个大语言模型。这个模型也叫做Pathways Language Model。\")\n", 335 | "print(transcript)" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": { 342 | "id": "3sAmJkVE93cM", 343 | "outputId": "a7d04b61-c097-4278-9c93-ba679fac00bc" 344 | }, 345 | "outputs": [ 346 | { 347 | "name": "stdout", 348 | "output_type": "stream", 349 | "text": [ 350 | "WEBVTT\n", 351 | "\n", 352 | "00:00:01.000 --> 00:00:07.000\n", 353 | "欢迎来到Onboard,真实的一线经验,走新的投资思考。我是Monica。\n", 354 | "\n", 355 | "00:00:07.000 --> 00:00:11.000\n", 356 | "我是高宁。我们一起聊聊软件如何改变世界。\n", 357 | "\n", 358 | "00:00:15.000 --> 00:00:17.000\n", 359 | "大家好,欢迎来到Onboard,我是Monica。\n", 360 | "\n", 361 | "00:00:17.000 --> 00:00:28.000\n", 362 | "自从OpenAI发布的ChatGBT掀起了席卷世界的AI热潮,不到三个月就积累了超过一亿的越活用户,超过1300万的日活用户。\n", 363 | "\n", 364 | "00:00:28.000 --> 00:00:34.000\n", 365 | "真的是展现了AI让人惊叹的能力,也让很多人直呼这就是下一个互联网的未来。\n", 366 | "\n", 367 | "00:00:34.000 --> 00:00:41.000\n", 368 | "有不少观众都说希望我们再做一期AI的讨论,于是这次硬核讨论就来了。\n", 369 | "\n", 370 | "00:00:41.000 --> 00:00:49.000\n", 371 | "这次我们请来了Google Brain的研究员雪芝,她是Google大语言模型PALM Pathways Language Model的作者之一。\n", 372 | "\n", 373 | "00:00:49.000 --> 00:00:53.000\n", 374 | "要知道,这个模型的参数量是GPT-3的三倍还多。\n", 375 | "\n", 376 | "00:00:53.000 --> 00:01:01.000\n", 377 | "另外还有两位AI产品大牛,一位来自著名的Stable Diffusion背后的商业公司Surbility AI,\n", 378 | "\n", 379 | "00:01:01.000 --> 00:01:08.000\n", 380 | "另一位来自某硅谷科技大厂,也曾在吴恩达教授的Landing 
AI中担任产品负责人。\n", 381 | "\n", 382 | "00:01:08.000 --> 00:01:15.000\n", 383 | "此外,Monica还邀请到一位一直关注AI的投资人朋友Bill当作我的特邀共同主持嘉宾。\n", 384 | "\n", 385 | "00:01:15.000 --> 00:01:22.000\n", 386 | "我们主要讨论几个话题,一方面从研究的视角,最前沿的研究者在关注什么?\n", 387 | "\n", 388 | "00:01:22.000 --> 00:01:26.000\n", 389 | "现在的技术的天花板和未来大的变量可能会在哪里?\n", 390 | "\n", 391 | "00:01:26.000 --> 00:01:30.000\n", 392 | "从产品和商业的角度,什么是一个好的AI产品?\n", 393 | "\n", 394 | "00:01:30.000 --> 00:01:33.000\n", 395 | "整个生态可能随着技术有怎样的演变?\n", 396 | "\n", 397 | "00:01:33.000 --> 00:01:38.000\n", 398 | "更重要的,我们又能从上一波AI的创业热潮中学到什么?\n", 399 | "\n", 400 | "00:01:38.000 --> 00:01:44.000\n", 401 | "最后,Monica和Bill还会从投资人的视角做一个回顾、总结和畅想。\n", 402 | "\n", 403 | "00:01:44.000 --> 00:01:51.000\n", 404 | "这里还有一个小的update,在本集发布的时候,Google也对爆发式增长的Chat GPT做出了回应。\n", 405 | "\n", 406 | "00:01:51.000 --> 00:01:56.000\n", 407 | "正在测试一个基于Lambda模型的聊天机器人ApprenticeBot。\n", 408 | "\n", 409 | "00:01:56.000 --> 00:02:00.000\n", 410 | "证实发布后会有怎样的惊喜,我们都拭目以待。\n", 411 | "\n", 412 | "00:02:00.000 --> 00:02:04.000\n", 413 | "AI无疑是未来几年最令人兴奋的变量之一。\n", 414 | "\n", 415 | "00:02:04.000 --> 00:02:10.000\n", 416 | "Monica也希望未来能邀请到更多一线从业者从不同角度讨论这个话题。\n", 417 | "\n", 418 | "00:02:10.000 --> 00:02:15.000\n", 419 | "不论是想要做创业、研究、产品还是投资的同学,\n", 420 | "\n", 421 | "00:02:15.000 --> 00:02:27.000\n", 422 | "希望这些对话对于大家了解这些技术演进、商业的可能,甚至未来对于我们每个人、每个社会意味着什么都能引发一些思考,提供一些启发。\n", 423 | "\n", 424 | "00:02:27.000 --> 00:02:34.000\n", 425 | "这次的讨论有些技术硬核,需要各位对生成式AI大模型都有一些基础了解。\n", 426 | "\n", 427 | "00:02:34.000 --> 00:02:41.000\n", 428 | "讨论中涉及到的论文和重要概念也会总结在本集的简介中,供大家复习参考。\n", 429 | "\n", 430 | "00:02:41.000 --> 00:02:47.000\n", 431 | "几位嘉宾在北美工作生活多年,夹杂英文在所难免,也请大家体谅了。\n", 432 | "\n", 433 | "00:02:47.000 --> 00:03:12.000\n", 434 | "欢迎来到未来,大家enjoy!\n", 435 | "\n", 436 | "\n" 437 | ] 438 | } 439 | ], 440 | "source": [ 441 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 442 | "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file, response_format=\"vtt\",\n", 443 | " 
prompt=\"这是一段Onboard播客,里面会聊到PALM这个大语言模型。这个模型也叫做Pathways Language Model。\")\n", 444 | "print(transcript)" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": { 450 | "id": "wV0eHUoG93cN" 451 | }, 452 | "source": [ 453 | "## 转录中顺便翻译" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 9, 459 | "metadata": { 460 | "id": "yB_MSR3c93cN", 461 | "outputId": "cf341ccb-6d99-474b-a903-1d148709fec1", 462 | "colab": { 463 | "base_uri": "https://localhost:8080/" 464 | } 465 | }, 466 | "outputs": [ 467 | { 468 | "output_type": "stream", 469 | "name": "stdout", 470 | "text": [ 471 | "Welcome to Onboard, a new way to think about investing. I am Monica. I am Gao Ning. Let's talk about how software can change the world. Hello everyone, welcome to Onboard. I am Monica. Since the release of ChatGPT, which is an open AI, it has aroused the hype of AI in the world. In less than three months, it has accumulated more than 100 million active users and more than 13 million active users. It really shows the ability of AI to amaze people. It also makes many people say that this is the future of the next Internet. Many viewers said that they wanted us to do another AI discussion. So this discussion came. This time we invited a researcher from Google Brain, Xue Zhi. He is one of the authors of Google's large-scale language model PaLM, Pathways Language Model. You should know that the number of parameters of this model is three times more than GPT-3. There are also two AI product bigwigs, a commercial company behind the famous Stable Diffusion, Stability AI. Another one is from a Silicon Valley technology factory and was also the product manager in Professor Wu Wenda's Landing AI. In addition, Monica also invited a friend of AI who has been paying attention to AI, Bill, as my special guest host. We mainly discuss several topics. On the one hand, from the perspective of research, what are the most cutting-edge researchers paying attention to? 
Where are the cutting-edge technologies and the large variables of the future? From the perspective of products and business, what is a good AI product? What kind of evolution may the whole state follow? More importantly, what can we learn from the last wave of AI entrepreneurship? Finally, Monica and Bill will also make a review, summary and reflection from the perspective of investors. Here is a small update. When this issue was released, Google also responded to the explosive growth of ChatGPT. We are testing an Apprentice Bot based on the Lambda model. What kind of surprises will be released? We are looking forward to it. AI is undoubtedly one of the most exciting variables in the coming years. Monica also hopes to invite more frontline entrepreneurs to discuss this topic from different angles. Whether you want to start a business, research, product or investment, I hope these conversations will help you understand the possibilities of these technical horizons and business. Even in the future, it can cause some thoughts and inspire us to think about what it means to each person and each society. This discussion is a bit technical, and requires you to have some basic understanding of the biometric AI model. The discussion will also be summarized in this episode's summary, which is for your reference. You have been working in North America for many years, and your English is not good. Please understand. Welcome to the future. Enjoy. Let me give you a brief introduction. Some of your past experiences. 
A fun fact, your favorite TV show.\n" 472 | ] 473 | } 474 | ], 475 | "source": [ 476 | "audio_file= open(\"./data/generative_ai_topics_clip.mp3\", \"rb\")\n", 477 | "translated_prompt=\"\"\"This is a podcast discussing ChatGPT and PaLM model.\n", 478 | "The full name of PaLM is Pathways Language Model.\"\"\"\n", 479 | "transcript = openai.Audio.translate(\"whisper-1\", audio_file,\n", 480 | " prompt=translated_prompt)\n", 481 | "print(transcript['text'])" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "id": "Z-F3X6cV93cN", 489 | "outputId": "8dee9103-94d0-45b0-e03f-43b0b6ada709" 490 | }, 491 | "outputs": [ 492 | { 493 | "name": "stdout", 494 | "output_type": "stream", 495 | "text": [ 496 | "Welcome to Onboard. Real first-line experience. New investment thinking. I am Monica. I am Gao Ning. Let's talk about how software can change the world. Hello everyone, welcome to Onboard. I am Monica. Since the release of ChatGPT by OpenAI, the world's AI has been in a frenzy. In less than three months, it has accumulated more than 100 million active users, and more than 13 million active users. It really shows the amazing ability of AI. It also makes many people say that this is the future of the next Internet. Many viewers said that they wanted us to do another AI discussion. So this discussion came. This time we invited a researcher from Google Brain, Xue Zhi. He is one of the authors of Google's large-scale model PaLM, Pathways Language Model. You should know that the number of parameters of this model is three times more than ChatGPT-3. In addition, there are two AI product big cows. One is from the famous company behind Stable Diffusion, Stability AI. The other is from a Silicon Valley technology factory. He was also the product manager in Professor Wu Wenda's Landing AI. In addition, Monica also invited a friend of AI who has been paying attention to AI, Bill, as my special guest host. 
We mainly discuss several topics. On the one hand, from the perspective of research, what are the most cutting-edge researchers paying attention to? Where are the cutting-edge technologies and the large variables of the future? From the perspective of products and business, what is a good AI product? What kind of evolution may the whole state follow? More importantly, what can we learn from the previous wave of AI entrepreneurship? Finally, Monica and Bill will also make a review, summary and reflection from the perspective of investors. Here is a small update. When this issue was released, Google also responded to the explosive growth of ChatGPT. We are testing an Apprentice Bot based on Lambda model. What kind of surprises will be released? We are looking forward to it. AI is undoubtedly one of the most exciting variables in the coming years. Monica also hopes to invite more first-line entrepreneurs to discuss this topic from different angles. Whether you want to do entrepreneurship, research, product or investment, I hope these conversations will help you understand the possibilities of these technical horizons and business. Even in the future, it can cause some thoughts and inspire us to think about what it means to each person and each society. This discussion is a bit technical, and requires you to have some basic understanding of the biometric AI model. The papers and important concepts involved in the discussion will also be summarized in this episode's summary, which is for your reference. You have worked in North America for many years, and you may have some English mistakes. Please understand. Welcome to the future. Enjoy. Let me give you a brief introduction. Some of your past experiences. A fun fact. 
Using an AI to represent the world is now palped.\n" 497 | ] 498 | } 499 | ], 500 | "source": [ 501 | "print(transcript['text'])" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": { 507 | "id": "kZ2dOJxl93cO" 508 | }, 509 | "source": [ 510 | "## 分割音频转录" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 13, 516 | "metadata": { 517 | "id": "n3xZu3Wi93cO", 518 | "colab": { 519 | "base_uri": "https://localhost:8080/" 520 | }, 521 | "outputId": "e52795e3-43d5-4536-fc5f-d034654d3b50" 522 | }, 523 | "outputs": [ 524 | { 525 | "output_type": "stream", 526 | "name": "stdout", 527 | "text": [ 528 | "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", 529 | "Collecting pydub\n", 530 | "  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", 531 | "Installing collected packages: pydub\n", 532 | "Successfully installed pydub-0.25.1\n" 533 | ] 534 | } 535 | ], 536 | "source": [ 537 | "%pip install pydub" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": 14, 543 | "metadata": { 544 | "id": "f9MJblPT93cO" 545 | }, 546 | "outputs": [], 547 | "source": [ 548 | "from pydub import AudioSegment\n", 549 | "\n", 550 | "podcast = AudioSegment.from_mp3(\"./data/generative_ai_topics_long.mp3\")\n", 551 | "\n", 552 | "# PyDub handles time in milliseconds; split into 15-minute chunks\n", 553 | "fifteen_minutes = 15 * 60 * 1000\n", 554 | "\n", 555 | "total_length = len(podcast)\n", 556 | "\n", 557 | "start = 0\n", 558 | "index = 0\n", 559 | "while start < total_length:\n", 560 | "    end = start + fifteen_minutes\n", 561 | "    if end < total_length:\n", 562 | "        chunk = podcast[start:end]\n", 563 | "    else:\n", 564 | "        chunk = podcast[start:]\n", 565 | "    with open(f\"./data/generative_ai_topics_long_{index}.mp3\", \"wb\") as f:\n", 566 | "        chunk.export(f, format=\"mp3\")\n", 567 | "    start = end\n", 568 | "    index += 1" 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "execution_count": 15, 574 |
"metadata": { 575 | "id": "x-Ky7FC-93cO" 576 | }, 577 | "outputs": [], 578 | "source": [ 579 | "original_prompt = \"这是一段Onboard播客,里面会聊到ChatGPT以及PALM这个大语言模型。这个模型也叫做Pathways Language Model。\\n\\n\"\n", 580 | "prompt = original_prompt\n", 581 | "for i in range(index):\n", 582 | "    clip = f\"./data/generative_ai_topics_long_{i}.mp3\"\n", 583 | "    audio_file= open(clip, \"rb\")\n", 584 | "    transcript = openai.Audio.transcribe(\"whisper-1\", audio_file,\n", 585 | "                                         prompt=prompt)\n", 586 | "    # mkdir ./data/transcripts if not exists\n", 587 | "    if not os.path.exists(\"./data/transcripts\"):\n", 588 | "        os.makedirs(\"./data/transcripts\")\n", 589 | "    # write to file\n", 590 | "    with open(f\"./data/transcripts/generative_ai_topics_long_{i}.txt\", \"w\") as f:\n", 591 | "        f.write(transcript['text'])\n", 592 | "    # get the last non-empty sentence of the transcript\n", 593 | "    sentences = [s for s in transcript['text'].split(\"。\") if s]\n", 594 | "    prompt = original_prompt + sentences[-1]\n" 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": { 600 | "id": "soibBzqb93cP" 601 | }, 602 | "source": [ 603 | "## 用ChatGPT总结" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "source": [ 609 | "%pip install llama-index==0.5.26\n", 610 | "%pip install langchain" 611 | ], 612 | "metadata": { 613 | "id": "QzzPX95zSMpi" 614 | }, 615 | "execution_count": null, 616 | "outputs": [] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "source": [ 621 | "%pip install spacy\n", 622 | "%run -m spacy download zh_core_web_sm" 623 | ], 624 | "metadata": { 625 | "colab": { 626 | "base_uri": "https://localhost:8080/" 627 | }, 628 | "id": "M9MhM4RNRa1J", 629 | "outputId": "0bc5e761-d169-4c5a-91d3-43f224c75e58" 630 | }, 631 | "execution_count": 2, 632 | "outputs": [ 633 | { 634 | "output_type": "stream", 635 | "name": "stdout", 636 | "text": [ 637 | "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", 638 | "Requirement already satisfied: spacy in
/usr/local/lib/python3.10/dist-packages (3.5.2)\n", 639 | "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.12)\n", 640 | "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.4)\n", 641 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.9)\n", 642 | "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.7)\n", 643 | "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.8)\n", 644 | "Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy) (8.1.9)\n", 645 | "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.1.1)\n", 646 | "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.4.6)\n", 647 | "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.8)\n", 648 | "Requirement already satisfied: typer<0.8.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.7.0)\n", 649 | "Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.10.1)\n", 650 | "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (6.3.0)\n", 651 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (4.65.0)\n", 652 | "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.22.4)\n", 653 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.27.1)\n", 654 | "Requirement already 
satisfied: pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.10.7)\n", 655 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.1.2)\n", 656 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy) (67.7.2)\n", 657 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (23.1)\n", 658 | "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.3.0)\n", 659 | "Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4->spacy) (4.5.0)\n", 660 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (1.26.15)\n", 661 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2022.12.7)\n", 662 | "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2.0.12)\n", 663 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4)\n", 664 | "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy) (0.7.9)\n", 665 | "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy) (0.0.4)\n", 666 | "Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/lib/python3.10/dist-packages (from typer<0.8.0,>=0.3.0->spacy) (8.1.3)\n", 667 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy) (2.1.2)\n", 668 | "\u001b[38;5;2m✔ Download and 
installation successful\u001b[0m\n", 669 | "You can now load the package via spacy.load('zh_core_web_sm')\n" 670 | ] 671 | } 672 | ] 673 | }, 674 | { 675 | "cell_type": "code", 676 | "execution_count": 5, 677 | "metadata": { 678 | "id": "1kxH0Jyb93cP", 679 | "outputId": "cd796a91-752e-4945-bad3-5af8c77da450", 680 | "colab": { 681 | "base_uri": "https://localhost:8080/" 682 | } 683 | }, 684 | "outputs": [ 685 | { 686 | "output_type": "stream", 687 | "name": "stderr", 688 | "text": [ 689 | "WARNING:llama_index.llm_predictor.base:Unknown max input size for gpt-3.5-turbo, using defaults.\n" 690 | ] 691 | }, 692 | { 693 | "output_type": "stream", 694 | "name": "stdout", 695 | "text": [ 696 | "播客讨论了AI的未来发展趋势、技术的天花板、好的AI产品应该具备的特点以及整个生态可能随着技术演变的变化。同时,回顾了上一波AI的创业热潮中的经验教训,并从投资人的视角做了一个回顾、总结和畅想。另外,还讨论了生成式AI大模型、Diffusion app在手机上的成功编译、AI和开源模型的应用、Google POM语言模型的介绍以及其与GPT-3的区别。最后,讨论了数据在微调任务中的重要性以及零样本、少样本和微调方法之间的差异。\n" 697 | ] 698 | } 699 | ], 700 | "source": [ 701 | "from langchain.chat_models import ChatOpenAI\n", 702 | "from langchain.text_splitter import SpacyTextSplitter\n", 703 | "from llama_index import GPTListIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader\n", 704 | "from llama_index.node_parser import SimpleNodeParser\n", 705 | "\n", 706 | "# define LLM\n", 707 | "llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\", max_tokens=1024))\n", 708 | "\n", 709 | "text_splitter = SpacyTextSplitter(pipeline=\"zh_core_web_sm\", chunk_size = 2048)\n", 710 | "parser = SimpleNodeParser(text_splitter=text_splitter)\n", 711 | "documents = SimpleDirectoryReader('./data/transcripts').load_data()\n", 712 | "nodes = parser.get_nodes_from_documents(documents)\n", 713 | "\n", 714 | "service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)\n", 715 | "\n", 716 | "list_index = GPTListIndex(nodes=nodes, service_context=service_context)\n", 717 | "\n", 718 | "response = list_index.query(\"请你总结一下我们的播客内容:\", 
response_mode=\"tree_summarize\")\n", 719 | "print(response)" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": 6, 725 | "metadata": { 726 | "id": "6pfQVzJg93cP", 727 | "outputId": "1b68944e-e40e-4ae0-b7ed-c60248bb950a", 728 | "colab": { 729 | "base_uri": "https://localhost:8080/" 730 | } 731 | }, 732 | "outputs": [ 733 | { 734 | "output_type": "stream", 735 | "name": "stdout", 736 | "text": [ 737 | "播客讨论了AI的未来发展趋势、技术的天花板、好的AI产品应该具备的特点以及整个生态可能随着技术演变的变化。同时,回顾了上一波AI的创业热潮中的经验教训,并从投资人的视角做了一个回顾、总结和畅想。另外,还讨论了生成式AI大模型、Diffusion app在手机上的成功编译、AI和开源模型的应用、Google POM语言模型的介绍以及其与GPT-3的区别。最后,讨论了数据在微调任务中的重要性以及零样本、少样本和微调方法之间的差异。\n" 738 | ] 739 | } 740 | ], 741 | "source": [ 742 | "print(response)" 743 | ] 744 | } 745 | ], 746 | "metadata": { 747 | "kernelspec": { 748 | "display_name": "geektime", 749 | "language": "python", 750 | "name": "python3" 751 | }, 752 | "language_info": { 753 | "name": "python", 754 | "version": "3.10.9" 755 | }, 756 | "orig_nbformat": 4, 757 | "colab": { 758 | "provenance": [], 759 | "include_colab_link": true 760 | } 761 | }, 762 | "nbformat": 4, 763 | "nbformat_minor": 0 764 | } -------------------------------------------------------------------------------- /Whisper_Transcribe_YouTube_Video.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "authorship_tag": "ABX9TyNtAG23ZXN4bjDwqkS180xF", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "source": [ 32 | "# **Transcribe YouTube *Videos* with Whisper**" 33 | ], 34 | "metadata": { 35 | "id": 
"Ai6GxDDor_tf" 36 | } 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "source": [ 41 | "# **Install the pytube Library**\n", 42 | "\n", 43 | "---\n", 44 | "\n", 45 | "Next, we will use Whisper to transcribe a YouTube video. We will use Python's `pytube` library to download the audio as an MP4 file.\n", 46 | "\n", 47 | "\n", 48 | "```\n", 49 | "pytube repo: https://github.com/pytube/pytube\n", 50 | "```\n", 51 | "\n", 52 | "\n", 53 | "*First, we need to install* the pytube library. You can do this by running the following command in your terminal:" 54 | ], 55 | "metadata": { 56 | "id": "NGjXLVgNsUpc" 57 | } 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 1, 62 | "metadata": { 63 | "colab": { 64 | "base_uri": "https://localhost:8080/" 65 | }, 66 | "id": "OQ5nLh1Xro1r", 67 | "outputId": "487c37f8-af0c-4469-9795-03e457f4dcf6" 68 | }, 69 | "outputs": [ 70 | { 71 | "output_type": "stream", 72 | "name": "stdout", 73 | "text": [ 74 | "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", 75 | "Collecting pytube\n", 76 | " Downloading pytube-15.0.0-py3-none-any.whl (57 kB)\n", 77 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m57.6/57.6 kB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 78 | "\u001b[?25hInstalling collected packages: pytube\n", 79 | "Successfully installed pytube-15.0.0\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "!pip install --upgrade pytube" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "source": [ 90 | "# **Download the Audio as an MP4 File**\n", 91 | "\n", 92 | "Next, we need to import pytube, provide the link to the YouTube video, and download the audio as an MP4:" 93 | ], 94 | "metadata": { 95 | "id": "U2niFIRts8d2" 96 | } 97 | }, 98 | { 99 | "cell_type": "code", 100 | "source": [ 101 | "# Import the pytube library\n", 102 | "import pytube\n", 103 | "\n", 104 | "# Load the YouTube video link\n", 105 | "video = \"https://www.youtube.com/watch?v=x7X9w_GIm1s\"\n", 106 | "\n", 107 | "# Convert and download the audio as an MP4 file\n", 108 | "data = pytube.YouTube(video)\n", 109 | "audio = data.streams.get_audio_only()\n", 110 | "audio.download()" 111 | ], 112 | "metadata": { 113 | "id": "m8MZqXfHtII-" 114 | }, 
115 | "execution_count": null, 116 | "outputs": [] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "source": [ 121 | "# **Install the Whisper Library**" 122 | ], 123 | "metadata": { 124 | "id": "Hz_LnnLYtSqu" 125 | } 126 | }, 127 | { 128 | "cell_type": "code", 129 | "source": [ 130 | "!pip install git+https://github.com/openai/whisper.git -q" 131 | ], 132 | "metadata": { 133 | "id": "1jo9UdWItYLF" 134 | }, 135 | "execution_count": null, 136 | "outputs": [] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "source": [ 141 | "import whisper\n", 142 | "\n", 143 | "model = whisper.load_model(\"base\")\n", 144 | "result = model.transcribe(\"Python in 100 Seconds.mp4\")\n", 145 | "\n", 146 | "# Print the transcribed text\n", 147 | "result['text']" 148 | ], 149 | "metadata": { 150 | "id": "G7hHNJCmtdLV" 151 | }, 152 | "execution_count": null, 153 | "outputs": [] 154 | } 155 | ] 156 | } 157 | --------------------------------------------------------------------------------
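A note on the last notebook above: its transcription cell hardcodes the name of the downloaded file (`Python in 100 Seconds.mp4`, i.e. the video's title). When the title is not known in advance, one option is to capture the path that pytube's `download()` returns, or, as a fallback, pick the newest `.mp4` in the working directory. A minimal standard-library sketch of that fallback (the `newest_mp4` helper is illustrative, not part of the notebooks):

```python
from pathlib import Path


def newest_mp4(directory: str = ".") -> str:
    """Return the most recently modified .mp4 file in `directory`.

    Handy after pytube's `audio.download()`, which names the output
    file after the video title.
    """
    candidates = sorted(
        Path(directory).glob("*.mp4"),
        key=lambda p: p.stat().st_mtime,
    )
    if not candidates:
        raise FileNotFoundError(f"no .mp4 files found in {directory!r}")
    return str(candidates[-1])


# Hypothetical usage with the notebook's libraries (pytube + whisper):
#   audio_path = audio.download()   # download() also returns the file path
#   model = whisper.load_model("base")
#   result = model.transcribe(audio_path or newest_mp4())
```

Capturing the return value of `download()` is the simpler route; scanning the directory is only a workaround when that path was not kept.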