├── 5generated questions from s23 manual.png ├── Architecture_VIT_Resnet.jpeg ├── Blip2Langchain.ipynb ├── Blip_Langchain.ipynb ├── Chat_with_CSV_&_Excel_using_LangChain_and_OpenAI.ipynb ├── Experimenting_with_Flan20B,_FlanT5,_Cohere_LLM.ipynb ├── LICENSE ├── LangChain Cookbook.ipynb ├── LangChain_HuggingFace_Models_Locally.ipynb ├── LangChain_Running_HuggingFace_Models_Locally.ipynb ├── Langchain_Private_PDF_Langchain_Summary_using_Flan5_LLM.ipynb ├── Langchain_Semnatic_Serach_Pinecone.ipynb ├── Q&A_Bot_with_Llama_Index_and_LangChain_on_ML_course_contents.ipynb ├── QA Generation results for s23 ultra user manual.csv ├── QA_app_using_Pinecone_openai.ipynb ├── README.md ├── S23UltraUserManual.txt ├── Talk_to_CSV_&_Excel_Files_with_LangChain.ipynb ├── gpt_cache_colab.ipynb ├── langChain_pandas_agent.ipynb ├── package-manifest.csv ├── s22_wiki.pdf ├── s23_wiki.pdf ├── the_constitution_of_india.pdf └── working_with_receipts_image_data_with_llm.ipynb /5generated questions from s23 manual.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/ef37bf48f653463a1a2580b20610061391512af6/5generated questions from s23 manual.png -------------------------------------------------------------------------------- /Architecture_VIT_Resnet.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/ef37bf48f653463a1a2580b20610061391512af6/Architecture_VIT_Resnet.jpeg -------------------------------------------------------------------------------- /Chat_with_CSV_&_Excel_using_LangChain_and_OpenAI.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | 
"display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "source": [ 31 | "Code Credit - https://twitter.com/pwang_szn/status/1642104548109201410/photo/1\n", 32 | "Video Source - https://youtu.be/nr-mDSi9LxA\n" 33 | ], 34 | "metadata": { 35 | "id": "r0alwu2qZzGU" 36 | } 37 | }, 38 | { 39 | "cell_type": "code", 40 | "source": [ 41 | "!pip install -q langchain openai chromadb\n" 42 | ], 43 | "metadata": { 44 | "id": "JGXR0-MsaE1T" 45 | }, 46 | "execution_count": null, 47 | "outputs": [] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "source": [ 52 | "from langchain.document_loaders import CSVLoader\n", 53 | "from langchain.indexes import VectorstoreIndexCreator\n", 54 | "from langchain.chains import RetrievalQA\n", 55 | "from langchain.llms import OpenAI\n", 56 | "import os" 57 | ], 58 | "metadata": { 59 | "id": "0X1EX8eRcRCk" 60 | }, 61 | "execution_count": 2, 62 | "outputs": [] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "source": [ 67 | "Get your OpenAI Key from here - https://platform.openai.com/account/api-keys" 68 | ], 69 | "metadata": { 70 | "id": "aJOrUeJedzMF" 71 | } 72 | }, 73 | { 74 | "cell_type": "code", 75 | "source": [ 76 | "os.environ[\"OPENAI_API_KEY\"] = \"sk-\"" 77 | ], 78 | "metadata": { 79 | "id": "Zmk3KQMqZvsO" 80 | }, 81 | "execution_count": 3, 82 | "outputs": [] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "source": [ 87 | "!wget https://gist.githubusercontent.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6/raw/92200bc0a673d5ce2110aaad4544ed6c4010f687/pokemon.csv " 88 | ], 89 | "metadata": { 90 | "id": "odUgyhg_c4Co" 91 | }, 92 | "execution_count": null, 93 | "outputs": [] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "source": [ 98 | "# Load the documents\n", 99 | "loader = 
CSVLoader(file_path='pokemon.csv')" 100 | ], 101 | "metadata": { 102 | "id": "TR8dXxxHbY_b" 103 | }, 104 | "execution_count": 5, 105 | "outputs": [] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "source": [ 110 | "# Create an index using the loaded documents\n", 111 | "index_creator = VectorstoreIndexCreator()\n", 112 | "docsearch = index_creator.from_loaders([loader])" 113 | ], 114 | "metadata": { 115 | "colab": { 116 | "base_uri": "https://localhost:8080/" 117 | }, 118 | "id": "3P4s3IKXaar8", 119 | "outputId": "1254fc84-5f05-4091-95c5-64846749519b" 120 | }, 121 | "execution_count": 6, 122 | "outputs": [ 123 | { 124 | "output_type": "stream", 125 | "name": "stderr", 126 | "text": [ 127 | "WARNING:chromadb:Using embedded DuckDB without persistence: data will be transient\n" 128 | ] 129 | } 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 7, 135 | "metadata": { 136 | "id": "-vruq68YZnmL" 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "# Create a question-answering chain using the index\n", 141 | "chain = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=docsearch.vectorstore.as_retriever(), input_key=\"question\")\n" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "source": [ 147 | "# Pass a query to the chain\n", 148 | "query = \"Do you have a column called age?\"\n", 149 | "response = chain({\"question\": query})\n", 150 | "print(response['result'])" 151 | ], 152 | "metadata": { 153 | "id": "Jwy7gjr0aXBr" 154 | }, 155 | "execution_count": 8, 156 | "outputs": [] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "source": [ 161 | "# Pass a query to the chain\n", 162 | "query = \"What are the columns available?\"\n", 163 | "response = chain({\"question\": query})\n", 164 | "print(response['result'])" 165 | ], 166 | "metadata": { 167 | "colab": { 168 | "base_uri": "https://localhost:8080/" 169 | }, 170 | "id": "9D8ajM74eC_c", 171 | "outputId": "c37ec4dc-d819-470a-fea6-5e593e786db0" 172 | }, 173 | 
"execution_count": 10, 174 | "outputs": [ 175 | { 176 | "output_type": "stream", 177 | "name": "stdout", 178 | "text": [ 179 | " The columns available are Name, Type 1, Type 2, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation, and Legendary.\n" 180 | ] 181 | } 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "source": [ 187 | "# Pass a query to the chain\n", 188 | "query = \"Which is the most powerful pokemon?\"\n", 189 | "response = chain({\"question\": query})\n", 190 | "print(response['result'])" 191 | ], 192 | "metadata": { 193 | "id": "G7gSpkQ0eEmM", 194 | "outputId": "541d9369-928e-4ec8-89f0-dff0e46cd0ff", 195 | "colab": { 196 | "base_uri": "https://localhost:8080/" 197 | } 198 | }, 199 | "execution_count": 11, 200 | "outputs": [ 201 | { 202 | "output_type": "stream", 203 | "name": "stdout", 204 | "text": [ 205 | " Mewtwo is the most powerful pokemon, with a total stat of 680.\n" 206 | ] 207 | } 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "source": [ 213 | "# Pass a query to the chain\n", 214 | "query = \"How good is Pikachu?\"\n", 215 | "response = chain({\"question\": query})\n", 216 | "print(response['result'])" 217 | ], 218 | "metadata": { 219 | "id": "NrVNoJvjJ8oo", 220 | "outputId": "d55560a5-0c05-4217-821c-a86213168f60", 221 | "colab": { 222 | "base_uri": "https://localhost:8080/" 223 | } 224 | }, 225 | "execution_count": 12, 226 | "outputs": [ 227 | { 228 | "output_type": "stream", 229 | "name": "stdout", 230 | "text": [ 231 | " Pikachu has a total stat value of 320, with 35 HP, 55 Attack, 40 Defense, 50 Special Attack, 50 Special Defense, and 90 Speed. 
These stats make Pikachu a fairly good Pokémon.\n" 232 | ] 233 | } 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "source": [ 239 | "# Pass a query to the chain\n", 240 | "query = \"How good is Pikachu vs Bulbasaur?\"\n", 241 | "response = chain({\"question\": query})\n", 242 | "print(response['result'])" 243 | ], 244 | "metadata": { 245 | "id": "Hy2poU9_KE4s", 246 | "outputId": "633822dc-136c-4182-bc82-703f81c3d9da", 247 | "colab": { 248 | "base_uri": "https://localhost:8080/" 249 | } 250 | }, 251 | "execution_count": 13, 252 | "outputs": [ 253 | { 254 | "output_type": "stream", 255 | "name": "stdout", 256 | "text": [ 257 | " Pikachu has a higher total stat score than Bulbasaur, and it has a higher speed stat, so it would have an advantage in a battle against Bulbasaur.\n" 258 | ] 259 | } 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "source": [ 265 | "# Pass a query to the chain\n", 266 | "query = \"What is the highest generation?\"\n", 267 | "response = chain({\"question\": query})\n", 268 | "print(response['result'])" 269 | ], 270 | "metadata": { 271 | "id": "hmT8g3SqKfZi", 272 | "outputId": "5d152f4e-674b-49e2-f0a1-40fd0746e3ad", 273 | "colab": { 274 | "base_uri": "https://localhost:8080/" 275 | } 276 | }, 277 | "execution_count": 18, 278 | "outputs": [ 279 | { 280 | "output_type": "stream", 281 | "name": "stdout", 282 | "text": [ 283 | " Generation 5.\n" 284 | ] 285 | } 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "source": [ 291 | "# Pass a query to the chain\n", 292 | "query = \"What are the stats of MeloettaAria Forme?\"\n", 293 | "response = chain({\"question\": query})\n", 294 | "print(response['result'])" 295 | ], 296 | "metadata": { 297 | "id": "crlnLdsmKzVk", 298 | "outputId": "9e5f32bc-6bc2-4a87-c6a2-5e45e502cee2", 299 | "colab": { 300 | "base_uri": "https://localhost:8080/" 301 | } 302 | }, 303 | "execution_count": 17, 304 | "outputs": [ 305 | { 306 | "output_type": "stream", 307 | "name": "stdout", 308 | "text":
[ 309 | " MeloettaAria Forme has 100 HP, 77 Attack, 77 Defense, 128 Sp. Atk, 128 Sp. Def, and 90 Speed.\n" 310 | ] 311 | } 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "source": [ 317 | "# Pass a query to the chain\n", 318 | "query = \"What is the stats of MeloettaAria Forme?\"\n", 319 | "response = chain({\"question\": query})\n", 320 | "print(response['result'])" 321 | ], 322 | "metadata": { 323 | "id": "5Wwjp4NPK--q", 324 | "outputId": "e4a45abe-b15a-4789-adba-829ea8b6a311", 325 | "colab": { 326 | "base_uri": "https://localhost:8080/" 327 | } 328 | }, 329 | "execution_count": 16, 330 | "outputs": [ 331 | { 332 | "output_type": "execute_result", 333 | "data": { 334 | "text/plain": [ 335 | "" 336 | ] 337 | }, 338 | "metadata": {}, 339 | "execution_count": 16 340 | } 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "source": [], 346 | "metadata": { 347 | "id": "mgcXDq0uLGrt" 348 | }, 349 | "execution_count": null, 350 | "outputs": [] 351 | } 352 | ] 353 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Navneet Kumar Chaudhary 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /LangChain_Running_HuggingFace_Models_Locally.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | }, 16 | "accelerator": "GPU", 17 | "gpuClass": "standard" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "source": [ 33 | "Based on this Video by Sam Witteveen\n", 34 | "https://www.youtube.com/watch?v=Kn7SX2Mx_Jk\n", 35 | "Colab Code Notebook: [https://drp.li/m1mbM](https://drp.li/m1mbM)" 36 | ], 37 | "metadata": { 38 | "id": "GJc0JBHbASzD" 39 | } 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "source": [ 44 | "Local LLM Models \n", 45 | "\n", 46 | "* google/flan-t5-xl\n", 47 | "* facebook/blenderbot-1B-distill\n", 48 | "* sentence-transformers/all-mpnet-base-v2,\n", 49 | "* gpt2-medium\n", 50 | "\n" 51 | ], 52 | "metadata": { 53 | "id": "92wQbt1fDw6Z" 54 | } 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 1, 59 | "metadata": { 60 | "id": "i9fJuC2tG0Cq" 61 | }, 62 | "outputs": [], 63 
| "source": [ 64 | "!pip -q install langchain huggingface_hub transformers sentence_transformers" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "source": [ 70 | "## HuggingFace\n", 71 | "\n", 72 | "There are two Hugging Face LLM wrappers, one for a local pipeline and one for a model hosted on Hugging Face Hub. Note that these wrappers only work for models that support the following tasks: text2text-generation, text-generation\n" 73 | ], 74 | "metadata": { 75 | "id": "VkVTT54xNq8T" 76 | } 77 | }, 78 | { 79 | "cell_type": "code", 80 | "source": [ 81 | "import os\n", 82 | "\n", 83 | "\n", 84 | "os.environ['HUGGINGFACEHUB_API_TOKEN'] = ''" 85 | ], 86 | "metadata": { 87 | "id": "2PpqSc3cG7jJ" 88 | }, 89 | "execution_count": 2, 90 | "outputs": [] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "source": [ 95 | "## Use the HuggingFaceHub" 96 | ], 97 | "metadata": { 98 | "id": "UhauDrynY0cj" 99 | } 100 | }, 101 | { 102 | "cell_type": "code", 103 | "source": [ 104 | "from langchain import PromptTemplate, HuggingFaceHub, LLMChain\n", 105 | "\n", 106 | "template = \"\"\"Question: {question}\n", 107 | "\n", 108 | "Answer: Let's think step by step.\"\"\"\n", 109 | "\n", 110 | "prompt = PromptTemplate(template=template, input_variables=[\"question\"])" 111 | ], 112 | "metadata": { 113 | "id": "Derb_0t-ZESh" 114 | }, 115 | "execution_count": 3, 116 | "outputs": [] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "source": [ 121 | "llm_chain = LLMChain(prompt=prompt, \n", 122 | " llm=HuggingFaceHub(repo_id=\"google/flan-t5-xl\", \n", 123 | " model_kwargs={\"temperature\":0, \n", 124 | " \"max_length\":64}))" 125 | ], 126 | "metadata": { 127 | "id": "lvO31GCGKhHs" 128 | }, 129 | "execution_count": 4, 130 | "outputs": [] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "source": [ 135 | "question = \"What is the capital of France?\"\n", 136 | "\n", 137 | "print(llm_chain.run(question))" 138 | ], 139 | "metadata": { 140 | "colab": { 141 | "base_uri": 
"https://localhost:8080/" 142 | }, 143 | "id": "uWIc38V4t5gA", 144 | "outputId": "b375e49b-2686-4caf-e351-f627384eafbb" 145 | }, 146 | "execution_count": 5, 147 | "outputs": [ 148 | { 149 | "output_type": "stream", 150 | "name": "stdout", 151 | "text": [ 152 | "Paris is the capital of France. The final answer: Paris.\n" 153 | ] 154 | } 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "source": [ 160 | "question = \"What area is best for growing wine in France?\"\n", 161 | "\n", 162 | "print(llm_chain.run(question))" 163 | ], 164 | "metadata": { 165 | "colab": { 166 | "base_uri": "https://localhost:8080/" 167 | }, 168 | "id": "uK61DssCdCmr", 169 | "outputId": "fbcb1af1-2f40-4f89-9d3c-97cc2af3ad5c" 170 | }, 171 | "execution_count": 6, 172 | "outputs": [ 173 | { 174 | "output_type": "stream", 175 | "name": "stdout", 176 | "text": [ 177 | "The best area for growing wine in France is the Loire Valley. The Loire Valley is located in the south of France. The area of France that is best for growing wine is the Loire Valley. 
The final answer: Loire Valley.\n" 178 | ] 179 | } 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "source": [ 185 | "## BlenderBot\n", 186 | "\n", 187 | "Doesn't work on the Hub" 188 | ], 189 | "metadata": { 190 | "id": "bsg2rRuNwNb9" 191 | } 192 | }, 193 | { 194 | "cell_type": "code", 195 | "source": [ 196 | "'''\n", 197 | "blenderbot_chain = LLMChain(prompt=prompt, \n", 198 | " llm=HuggingFaceHub(repo_id=\"facebook/blenderbot-1B-distill\", \n", 199 | " model_kwargs={\"temperature\":0, \n", 200 | " \"max_length\":64}))\n", 201 | " '''" 202 | ], 203 | "metadata": { 204 | "colab": { 205 | "base_uri": "https://localhost:8080/", 206 | "height": 54 207 | }, 208 | "id": "Io-4j_aat_Ll", 209 | "outputId": "afe1f8c0-4a3c-4523-a31f-bacbeff25dc2" 210 | }, 211 | "execution_count": 7, 212 | "outputs": [ 213 | { 214 | "output_type": "execute_result", 215 | "data": { 216 | "text/plain": [ 217 | "'\\nblenderbot_chain = LLMChain(prompt=prompt, \\n llm=HuggingFaceHub(repo_id=\"facebook/blenderbot-1B-distill\", \\n model_kwargs={\"temperature\":0, \\n \"max_length\":64}))\\n '" 218 | ], 219 | "application/vnd.google.colaboratory.intrinsic+json": { 220 | "type": "string" 221 | } 222 | }, 223 | "metadata": {}, 224 | "execution_count": 7 225 | } 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "source": [ 231 | "# question = \"What is the capital of France?\"\n", 232 | "# question = \"What area is best for growing wine in France?\"\n", 233 | "\n", 234 | "# print(blenderbot_chain = LLMChain(prompt=prompt, \n", 235 | "# .run(question))" 236 | ], 237 | "metadata": { 238 | "id": "F8hXzlWZuOhE" 239 | }, 240 | "execution_count": 8, 241 | "outputs": [] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "source": [ 246 | "## With Local model from HF \n", 247 | "\n", 248 | "### Why would you want to use local mode?\n", 249 | "\n", 250 | "- fine-tuned models\n", 251 | "- GPU hosted etc\n", 252 | "- some models only work locally" 253 | ], 254 | "metadata": { 255 | "id": 
"-xUmcqkUf9Dv" 256 | } 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "source": [ 261 | "## T5-Flan - Encoder-Decoder" 262 | ], 263 | "metadata": { 264 | "id": "WXOZ_Un6e1oo" 265 | } 266 | }, 267 | { 268 | "cell_type": "code", 269 | "source": [ 270 | "from langchain.llms import HuggingFacePipeline\n", 271 | "import torch\n", 272 | "from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM\n", 273 | "\n", 274 | "model_id = 'google/flan-t5-large'  # go for a smaller model if you don't have the VRAM\n", 275 | "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", 276 | "model = AutoModelForSeq2SeqLM.from_pretrained(model_id)\n", 277 | "\n", 278 | "pipe = pipeline(\n", 279 | " \"text2text-generation\",\n", 280 | " model=model, \n", 281 | " tokenizer=tokenizer, \n", 282 | " max_length=100\n", 283 | ")\n", 284 | "\n", 285 | "local_llm = HuggingFacePipeline(pipeline=pipe)\n" 286 | ], 287 | "metadata": { 288 | "id": "GMg2xiRnfm21" 289 | }, 290 | "execution_count": 9, 291 | "outputs": [] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "source": [ 296 | "print(local_llm('What is the capital of France? 
'))" 297 | ], 298 | "metadata": { 299 | "colab": { 300 | "base_uri": "https://localhost:8080/" 301 | }, 302 | "id": "LvsLFQxehfZe", 303 | "outputId": "d7f9b5c8-59fe-4142-9ff4-e63c0b17d046" 304 | }, 305 | "execution_count": 10, 306 | "outputs": [ 307 | { 308 | "output_type": "stream", 309 | "name": "stdout", 310 | "text": [ 311 | "paris\n" 312 | ] 313 | } 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "source": [ 319 | "llm_chain = LLMChain(prompt=prompt, \n", 320 | " llm=local_llm\n", 321 | " )\n", 322 | "\n", 323 | "question = \"What is the capital of England?\"\n", 324 | "\n", 325 | "print(llm_chain.run(question))" 326 | ], 327 | "metadata": { 328 | "colab": { 329 | "base_uri": "https://localhost:8080/" 330 | }, 331 | "id": "3UVVbwLegmU8", 332 | "outputId": "9ccc215a-561a-4f59-e105-515e67efd968" 333 | }, 334 | "execution_count": 11, 335 | "outputs": [ 336 | { 337 | "output_type": "stream", 338 | "name": "stdout", 339 | "text": [ 340 | "The capital of England is London. London is the capital of England. 
So the answer is London.\n" 341 | ] 342 | } 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "source": [ 348 | "## GPT2-medium - Decoder Only Model\n", 349 | "\n", 350 | "microsoft/DialoGPT-large" 351 | ], 352 | "metadata": { 353 | "id": "ccAHJARfnKSK" 354 | } 355 | }, 356 | { 357 | "cell_type": "code", 358 | "source": [ 359 | "model_id = \"gpt2-medium\"\n", 360 | "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", 361 | "model = AutoModelForCausalLM.from_pretrained(model_id)\n", 362 | "\n", 363 | "pipe = pipeline(\n", 364 | " \"text-generation\", \n", 365 | " model=model, \n", 366 | " tokenizer=tokenizer, \n", 367 | " max_length=100\n", 368 | ")\n", 369 | "\n", 370 | "local_llm = HuggingFacePipeline(pipeline=pipe)" 371 | ], 372 | "metadata": { 373 | "id": "wEKTalGcgxRg" 374 | }, 375 | "execution_count": 12, 376 | "outputs": [] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "source": [ 381 | "llm_chain = LLMChain(prompt=prompt, \n", 382 | " llm=local_llm\n", 383 | " )\n", 384 | "\n", 385 | "question = \"What is the capital of France?\"\n", 386 | "\n", 387 | "print(llm_chain.run(question))" 388 | ], 389 | "metadata": { 390 | "colab": { 391 | "base_uri": "https://localhost:8080/" 392 | }, 393 | "id": "MNMzu_bznhBp", 394 | "outputId": "f790fd3e-e287-41d8-c2bc-7c87dd09bb8e" 395 | }, 396 | "execution_count": 13, 397 | "outputs": [ 398 | { 399 | "output_type": "stream", 400 | "name": "stderr", 401 | "text": [ 402 | "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. 
Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)\n", 403 | " warnings.warn(\n", 404 | "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" 405 | ] 406 | }, 407 | { 408 | "output_type": "stream", 409 | "name": "stdout", 410 | "text": [ 411 | "\n", 412 | "\n", 413 | "1. Capital of Paris – France\n", 414 | "\n", 415 | "Source: Wikipedia\n", 416 | "\n", 417 | "2. Capital of Paris – Switzerland\n", 418 | "\n", 419 | "Source: Wikipedia\n", 420 | "\n", 421 | "3. Capital of Paris – Luxembourg\n", 422 | "\n", 423 | "Source: Wikipedia\n", 424 | "\n", 425 | "4. Capital of Paris – Netherlands\n", 426 | "\n", 427 | "Source: Wikipedia\n", 428 | "\n", 429 | "5. Capital of Paris – Germany\n", 430 | "\n", 431 | "Source: Wikipedia\n", 432 | "\n", 433 | "6. Capital of Paris – United Kingdom\n" 434 | ] 435 | } 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "source": [ 441 | "## BlenderBot - Encoder-Decoder" 442 | ], 443 | "metadata": { 444 | "id": "UHikkKC0u3dV" 445 | } 446 | }, 447 | { 448 | "cell_type": "code", 449 | "source": [ 450 | "from langchain.llms import HuggingFacePipeline\n", 451 | "import torch\n", 452 | "from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM\n", 453 | "\n", 454 | "model_id = 'facebook/blenderbot-1B-distill'\n", 455 | "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", 456 | "model = AutoModelForSeq2SeqLM.from_pretrained(model_id)\n", 457 | "\n", 458 | "pipe = pipeline(\n", 459 | " \"text2text-generation\",\n", 460 | " model=model, \n", 461 | " tokenizer=tokenizer, \n", 462 | " max_length=100\n", 463 | ")\n", 464 | "\n", 465 | "local_llm = HuggingFacePipeline(pipeline=pipe)" 466 | ], 467 | "metadata": { 468 | "id": "45AQKsOfu5WN" 469 | }, 470 | "execution_count": 14, 471 | "outputs": [] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "source": [ 476 | "llm_chain = LLMChain(prompt=prompt, \n", 477 | " 
llm=local_llm\n", 478 | " )\n", 479 | "\n", 480 | "question = \"What area is best for growing wine in France?\"\n", 481 | "\n", 482 | "print(llm_chain.run(question))" 483 | ], 484 | "metadata": { 485 | "colab": { 486 | "base_uri": "https://localhost:8080/" 487 | }, 488 | "id": "bpNLf2lovgO7", 489 | "outputId": "0f4f705b-6a45-45ce-ffa5-943622d27e0c" 490 | }, 491 | "execution_count": 15, 492 | "outputs": [ 493 | { 494 | "output_type": "stream", 495 | "name": "stdout", 496 | "text": [ 497 | " I'm not sure, but I do know that France is one of the largest producers of wine in the world.\n" 498 | ] 499 | } 500 | ] 501 | }, 502 | { 503 | "cell_type": "markdown", 504 | "source": [ 505 | "## SentenceTransformers" 506 | ], 507 | "metadata": { 508 | "id": "z66tHc3V5yv-" 509 | } 510 | }, 511 | { 512 | "cell_type": "code", 513 | "source": [ 514 | "from langchain.embeddings import HuggingFaceEmbeddings\n", 515 | "\n", 516 | "model_name = \"sentence-transformers/all-mpnet-base-v2\"\n", 517 | "\n", 518 | "hf = HuggingFaceEmbeddings(model_name=model_name)" 519 | ], 520 | "metadata": { 521 | "id": "aKeSAVQUn8fN" 522 | }, 523 | "execution_count": 16, 524 | "outputs": [] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "source": [ 529 | "hf.embed_query('this is an embedding')" 530 | ], 531 | "metadata": { 532 | "id": "ZORCrZkaykTC" 533 | }, 534 | "execution_count": null, 535 | "outputs": [] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "source": [ 540 | "hf.embed_documents(['this is an embedding','this is another embedding'])" 541 | ], 542 | "metadata": { 543 | "id": "-weIzgnSys2M" 544 | }, 545 | "execution_count": null, 546 | "outputs": [] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "source": [ 551 | "from langchain.embeddings import HuggingFaceHubEmbeddings\n", 552 | "\n", 553 | "hf = HuggingFaceHubEmbeddings(\n", 554 | " repo_id=model_name,\n", 555 | " task=\"feature-extraction\",\n", 556 | " # huggingfacehub_api_token=\"my-api-key\",\n", 557 | ")" 558 | ], 559 | "metadata": { 560 | "id": "KBwmKrk4smhr" 561 | }, 562 | 
"execution_count": null, 563 | "outputs": [] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "source": [], 568 | "metadata": { 569 | "id": "pR62_JRxARbA" 570 | } 571 | } 572 | ] 573 | } -------------------------------------------------------------------------------- /Q&A_Bot_with_Llama_Index_and_LangChain_on_ML_course_contents.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "view-in-github", 7 | "colab_type": "text" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": { 17 | "id": "udFz2NpdeXRw" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "!pip install llama-index\n", 22 | "!pip install langchain\n", 23 | "!pip install PyPDF2" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 2, 29 | "metadata": { 30 | "id": "nuyAKDcyea6-" 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "#from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper\n", 35 | "from langchain import OpenAI\n", 36 | "from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex,GPTSimpleVectorIndex, PromptHelper\n", 37 | "from llama_index import LLMPredictor, ServiceContext\n", 38 | "import sys\n", 39 | "#from google.colab import drive\n", 40 | "import os\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "source": [ 46 | "#from google.colab import drive\n", 47 | "#drive.mount('/content/drive')\n", 48 | "\n" 49 | ], 50 | "metadata": { 51 | "id": "UP6P0396R0Us" 52 | }, 53 | "execution_count": 3, 54 | "outputs": [] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 4, 59 | "metadata": { 60 | "id": "j6yx6eHDia8G" 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "os.environ[\"OPENAI_API_KEY\"] = ''  # provide your own API key\n" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 
| "execution_count": 5, 70 | "metadata": { 71 | "id": "MmoQ0jSRZXhj" 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "def construct_index(directory_path):\n", 76 | " # set maximum input size\n", 77 | " max_input_size = 4096\n", 78 | " # set number of output tokens\n", 79 | " num_outputs = 256\n", 80 | " # set maximum chunk overlap\n", 81 | " max_chunk_overlap = 20\n", 82 | " # set chunk size limit\n", 83 | " chunk_size_limit = 600\n", 84 | "\n", 85 | " prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)\n", 86 | "\n", 87 | " # define LLM\n", 88 | " llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name=\"text-davinci-002\", max_tokens=num_outputs))\n", 89 | " \n", 90 | " documents = SimpleDirectoryReader(directory_path).load_data()\n", 91 | " \n", 92 | " service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)\n", 93 | " index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)\n", 94 | " \n", 95 | " index.save_to_disk('/content/drive/MyDrive/ML/index_ml.json') # had to use Google Drive to include one file of 75 MB+ size\n", 96 | " \n", 97 | " return index" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "source": [ 103 | "# PDFs that I have indexed, plus one more 75 MB file\n", 104 | "!wget https://raw.githubusercontent.com/navneetkrc/Vector_search_POCs/main/index_ml.json\n", 105 | "!wget https://github.com/navneetkrc/Vector_search_POCs/blob/main/ML_Merged_All.pdf\n", 106 | "!wget https://github.com/navneetkrc/Vector_search_POCs/blob/main/ML-Prevous%20End%20term%20paper_merged.pdf\n", 107 | "!wget https://github.com/navneetkrc/Vector_search_POCs/blob/main/Previous%20yr%20Mid%20term%20ML%20paper-merged.pdf" 108 | ], 109 | "metadata": { 110 | "id": "IX1xYVSucOd9" 111 | }, 112 | "execution_count": null, 113 | "outputs": [] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 7, 118 | "metadata": { 119 | "id": "GW1tTw2VeWgS" 120 | }, 121 | 
"outputs": [], 122 | "source": [ 123 | "def ask_bot(input_index = '/content/drive/MyDrive/ML/index_ml.json'): #encoded and indexed \n", 124 | " index = GPTSimpleVectorIndex.load_from_disk(input_index)\n", 125 | " while True:\n", 126 | " query = input('What do you want to ask the bot? \\n')\n", 127 | " response = index.query(query, response_mode=\"compact\")\n", 128 | " print (\"\\nBot says: \\n\\n\" + response.response + \"\\n\\n\\n\")" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 8, 134 | "metadata": { 135 | "id": "gM00_PPYO2PB" 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "#index = construct_index(\"/content/drive/MyDrive/ML\") # uncomment to create new index\n", 140 | "\n", 141 | "# it has all the ML related pdfs and 1 book shared by varad as well\n", 142 | "\n", 143 | "# You can find books/pdfs here\n", 144 | "#https://drive.google.com/file/d/1vFIpNWyfrbLc-ce5lmClX5RsJiu-yHih/view?usp=share_link" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 9, 150 | "metadata": { 151 | "colab": { 152 | "base_uri": "https://localhost:8080/", 153 | "height": 939 154 | }, 155 | "id": "wOWoWflrP2pO", 156 | "outputId": "dd86e0ed-4b2a-49d4-97aa-e92baf96125c" 157 | }, 158 | "outputs": [ 159 | { 160 | "output_type": "stream", 161 | "name": "stdout", 162 | "text": [ 163 | "What do you want to ask the bot? \n", 164 | "Use kernel trick and find the equation for hyperplane using nonlinear SVM. Positive Points: {(7,0), (9,0), (11,0)} Negative Points: {(0,0), (8,0), (12,0), (10,0)}. Plot the point before and after the transformation.\n", 165 | "\n", 166 | "Bot says: \n", 167 | "\n", 168 | "\n", 169 | "\n", 170 | "Solution:\n", 171 | "The kernel trick used in this case is the modulo 2 transformation, which is defined as Φ(x) = x mod 2. This transformation maps the points (7,0), (9,0), (11,0), (0,0), (8,0), (12,0), (10,0) to (1,0), (1,0), (1,0), (0,0), (0,0), (0,0), (0,0) respectively. 
The equation of the hyperplane is then y = 0.5.\n", 172 | "\n", 173 | "For the given dataset, the separability of the dataset is not linear. This is because the points cannot be separated by a single line. To model this dataset with an Artificial Neural Network, we can use a non-linear transformation such as a polynomial kernel. This will map the points to a higher dimensional space, where they can be separated by a hyperplane. The equation of the hyperplane is then y = 0.5.\n", 174 | "\n", 175 | "\n", 176 | "\n", 177 | "What do you want to ask the bot? \n", 178 | "Let say we have a dice of 4 sides. Where { (x,P(x)) : (0,a), (1,(1-a)/3), (2,(1-a)/2), (3,(1-a)/6) }. If the tossing event is observed as (0,1,2,3,2,3,1,0) then what is the most probable value of a\n", 179 | "\n", 180 | "Bot says: \n", 181 | "\n", 182 | "\n", 183 | "The most probable value of a is still 0.5. The theorem provided does not change the answer to the original question.\n", 184 | "\n", 185 | "\n", 186 | "\n", 187 | "What do you want to ask the bot? \n", 188 | "Consider training a boosting classifier using decision stumps on the following data set. Circle the examples which will have their weights increased at the end of each iteration. Run the iteration till zero training error is achieved.\n", 189 | "\n", 190 | "Bot says: \n", 191 | "\n", 192 | "\n", 193 | "\n", 194 | "The examples which will have their weights increased at the end of each iteration are: 0.3, 0.3, 0.3, 0.3. 
This is because AdaBoost assigns higher weights to the data points which were predicted incorrectly, and the error value is used to determine the weights.\n", 195 | "\n", 196 | "\n", 197 | "\n" 198 | ] 199 | }, 200 | { 201 | "output_type": "error", 202 | "ename": "KeyboardInterrupt", 203 | "evalue": "ignored", 204 | "traceback": [ 205 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 206 | "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", 207 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mask_bot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'index_ml.json'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 208 | "\u001b[0;32m\u001b[0m in \u001b[0;36mask_bot\u001b[0;34m(input_index)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mindex\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mGPTSimpleVectorIndex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_from_disk\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_index\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mquery\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'What do you want to ask the bot? 
\\n'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse_mode\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"compact\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"\\nBot says: \\n\\n\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresponse\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m\"\\n\\n\\n\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 209 | "\u001b[0;32m/usr/local/lib/python3.9/dist-packages/ipykernel/kernelbase.py\u001b[0m in \u001b[0;36mraw_input\u001b[0;34m(self, prompt)\u001b[0m\n\u001b[1;32m 849\u001b[0m \u001b[0;34m\"raw_input was called, but this frontend does not support input requests.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 850\u001b[0m )\n\u001b[0;32m--> 851\u001b[0;31m return self._input_request(str(prompt),\n\u001b[0m\u001b[1;32m 852\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_parent_ident\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 853\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_parent_header\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 210 | "\u001b[0;32m/usr/local/lib/python3.9/dist-packages/ipykernel/kernelbase.py\u001b[0m in \u001b[0;36m_input_request\u001b[0;34m(self, prompt, ident, parent, password)\u001b[0m\n\u001b[1;32m 893\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 894\u001b[0m \u001b[0;31m# 
re-raise KeyboardInterrupt, to truncate traceback\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 895\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Interrupted by user\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 896\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 897\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlog\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwarning\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Invalid Message:\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexc_info\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 211 | "\u001b[0;31mKeyboardInterrupt\u001b[0m: Interrupted by user" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "ask_bot('index_ml.json')" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "source": [], 222 | "metadata": { 223 | "id": "7iIWhkibltL1" 224 | }, 225 | "execution_count": null, 226 | "outputs": [] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "source": [ 231 | "** What do you want to ask the bot? ** \n", 232 | "Use kernel trick and find the equation for hyperplane using nonlinear SVM. Positive Points: {(7,0), (9,0), (11,0)} Negative Points: {(0,0), (8,0), (12,0), (10,0)}. Plot the point before and after the transformation.\n", 233 | "\n", 234 | "Bot says: \n", 235 | "\n", 236 | "\n", 237 | "\n", 238 | "Solution:\n", 239 | "The kernel trick used in this case is the modulo 2 transformation, which is defined as Φ(x) = x mod 2. 
This transformation maps the points (7,0), (9,0), (11,0), (0,0), (8,0), (12,0), (10,0) to (1,0), (1,0), (1,0), (0,0), (0,0), (0,0), (0,0) respectively. The equation of the hyperplane is then y = 0.5.\n", 240 | "\n", 241 | "For the given dataset, the separability of the dataset is not linear. This is because the points cannot be separated by a single line. To model this dataset with an Artificial Neural Network, we can use a non-linear transformation such as a polynomial kernel. This will map the points to a higher dimensional space, where they can be separated by a hyperplane. The equation of the hyperplane is then y = 0.5.\n", 242 | "\n", 243 | "\n", 244 | "\n", 245 | "** What do you want to ask the bot? ** \n", 246 | "Let say we have a dice of 4 sides. Where { (x,P(x)) : (0,a), (1,(1-a)/3), (2,(1-a)/2), (3,(1-a)/6) }. If the tossing event is observed as (0,1,2,3,2,3,1,0) then what is the most probable value of a\n", 247 | "\n", 248 | "Bot says: \n", 249 | "\n", 250 | "\n", 251 | "The most probable value of a is still 0.5. The theorem provided does not change the answer to the original question.\n", 252 | "\n", 253 | "\n", 254 | "\n", 255 | "** What do you want to ask the bot? ** \n", 256 | "Consider training a boosting classifier using decision stumps on the following data set. Circle the examples which will have their weights increased at the end of each iteration. Run the iteration till zero training error is achieved.\n", 257 | "\n", 258 | "Bot says: \n", 259 | "\n", 260 | "\n", 261 | "\n", 262 | "The examples which will have their weights increased at the end of each iteration are: 0.3, 0.3, 0.3, 0.3. 
This is because AdaBoost assigns higher weights to the data points which were predicted incorrectly, and the error value is used to determine the weights.\n" 263 | ], 264 | "metadata": { 265 | "id": "jYlRO_p0ltpH" 266 | } 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": { 272 | "id": "OxfT37qTQVOx" 273 | }, 274 | "outputs": [], 275 | "source": [] 276 | } 277 | ], 278 | "metadata": { 279 | "colab": { 280 | "provenance": [], 281 | "include_colab_link": true 282 | }, 283 | "kernelspec": { 284 | "display_name": "Python 3", 285 | "name": "python3" 286 | }, 287 | "language_info": { 288 | "name": "python" 289 | } 290 | }, 291 | "nbformat": 4, 292 | "nbformat_minor": 0 293 | } -------------------------------------------------------------------------------- /QA Generation results for s23 ultra user manual.csv: -------------------------------------------------------------------------------- 1 | "question","answer","result","answerScore","retrievalScore","latency" 2 | "What is Samsung Pass and how can it be used?","Samsung Pass is a feature that allows users to register their biometric data and use it to verify their identity instead of entering their Samsung account password. It can also be used to sign in to websites and apps that support ID and password autofill, manage sign-in information, and enter personal information automatically.","Samsung Pass is a feature that allows users to use their registered biometric data to verify their identity instead of entering their Samsung account password. It can be used to sign in to websites and apps that support ID and password autofill, enter personal information automatically, and manage sign-in information. 
It can also be used to delete biometric data, sign-in information, and app data registered to Samsung Pass.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes Samsung Pass and its uses, including biometric data registration, signing in to websites and apps, managing sign-in information, and entering personal information automatically.""}","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The retrieved documents are relevant to the question and provide information about Samsung Pass and its uses, supporting the answer.""}",3.306061029434204 3 | "What is the purpose of the Universal switch feature?","The Universal switch feature allows you to control the touch screen by connecting an external switch or by using head movements and face gestures.","The Universal switch feature allows users to control the touch screen using an external switch or head movements and face gestures. It can be turned on in the Accessibility settings and at least one switch must be registered on the device to use it. 
The feature is designed to assist users with limited mobility or dexterity.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes the purpose of the Universal switch feature and its use for users with limited mobility or dexterity.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and support the answer, specifically Doc 1, which explains the Universal switch feature and its purpose.""}",2.6733109951019287 4 | "What are the different actions that can be taken in the Contacts app?","The different actions that can be taken in the Contacts app are making a voice call, composing a message, making a video call, and composing an email.","In the Contacts app, you can delete contacts, share contacts, create groups, merge duplicate contacts, import contacts, sync contacts with web accounts, and search for contacts.","{""score"":0,""justification"":""GRADE: Incorrect\nJUSTIFICATION: The student's answer focuses on managing contacts, while the true answer lists communication actions that can be taken with contacts.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and provide information about the different actions that can be taken in the Contacts app, supporting the answer.""}",1.6975047588348389 5 | "How can you move the keyboard to another location?","You can move the keyboard to another location by dragging the tab.","The context does not provide information on how to move the keyboard to another location.","{""score"":0,""justification"":""GRADE: Incorrect\nJUSTIFICATION: The student answer does not mention the correct method of moving the keyboard, which is by dragging the tab.""}","{""score"":0,""justification"":""GRADE: Incorrect\n\nJUSTIFICATION: The retrieved documents do not provide information on how to move the keyboard to another location.""}",0.9163455963134766 
6 | "What is the Super steady feature used for?","The Super steady feature is used for further video stabilisation when there is a lot of shaking while recording videos.","The Super steady feature is used for further video stabilization when there is a lot of shaking while recording videos.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes the purpose of the Super steady feature, which is to stabilize videos when there is a lot of shaking.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and support the answer, as Doc 1 specifically mentions the Super steady feature being used for further video stabilisation when there is a lot of shaking while recording videos.""}",1.3629720211029053 7 | "What are some topics covered in the appendix of this text?","Accessibility, troubleshooting, and removing the battery","The appendix covers topics such as accessibility, troubleshooting, and removing the battery. It also includes information on device layout and functions, apps and features, and Samsung Notes.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer includes all the topics mentioned in the true answer and provides additional relevant information without any conflicting statements.""}","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The retrieved documents contain an appendix section that covers topics such as accessibility, troubleshooting, and removing the battery, which directly answers the question.""}",1.5723686218261719 8 | "How do you add a phone number to speed dial?","To set a number to speed dial, open the Phone app, tap Keypad → → Speed dial numbers, select a speed dial number, and then add a phone number.","To add a phone number to speed dial, open the Phone app, tap Keypad, then tap the three dots icon and select Speed dial numbers. 
Choose a speed dial number and add a phone number to it.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes the process of adding a phone number to speed dial, even though the phrasing is slightly different from the true answer.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and Doc 1 provides a clear step-by-step process for adding a phone number to speed dial.""}",1.8915996551513672 9 | "What is the purpose of the Samsung Members app?","The Samsung Members app offers support services to customers, such as device problem diagnosis, and lets users submit questions and error reports. It also allows users to share information with others in the Galaxy users’ community or check the latest news and tips.","The Samsung Members app offers support services to customers, such as device problem diagnosis, and lets users submit questions and error reports. It also allows users to share information with others in the Galaxy users’ community or check the latest news and tips. The app can help users solve any problems they might encounter while using their device.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes the purpose of the Samsung Members app, including support services, problem diagnosis, and community features.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved document (Doc 1) contains information about the Samsung Members app, which directly answers the question about its purpose.""}",2.659925937652588 10 | "What is the purpose of the Always On Display feature on the Lock Screen?","To display information when the screen is turned off.","The Always On Display feature on the Lock Screen allows you to check information such as the clock, calendar, and notifications without having to unlock the screen. 
It can also be used to control music playback. The feature can be set to appear continuously, during a set time, or when receiving notifications.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes the purpose of the Always On Display feature, which is to display information when the screen is turned off.""}","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The retrieved documents discuss the Always On Display feature and its purpose of displaying information when the screen is turned off.""}",2.3131890296936035 11 | "What are some of the features listed under 'Device Protection'?","Some of the features listed under 'Device Protection' are Auto optimisation, Software update, Diagnostics, Maintenance mode, and Apps.","The features listed under 'Device Protection' include Secure Folder, Find My Mobile, and other security settings.","{""score"":0,""justification"":""GRADE: Incorrect\nJUSTIFICATION: The student's answer lists different features than those provided in the true answer, indicating a lack of factual accuracy.""}","{""score"":0,""justification"":""GRADE: Incorrect\n\nJUSTIFICATION: The retrieved documents do not provide information about features specifically listed under 'Device Protection'.""}",1.1237430572509766 12 | "What is AR Doodle and how does it work?","AR Doodle is a feature that allows you to record fun videos with virtual handwriting or drawings on people or pets’ faces or anywhere else. When the camera recognizes a face or space, the doodles on the face will follow the face as it moves, and the doodles in the space will be fixed in the same place even if the camera moves. To use AR Doodle, you need to open the AR Zone app and tap AR Doodle. When the camera recognizes the subject, the recognition area will appear on the screen. You can then write or draw in the recognition area. If you switch to the rear camera, you can also write or draw outside the recognition area. 
If you tap and then begin to doodle, you can record yourself doodling.","AR Doodle is a feature that allows you to record videos with virtual handwriting or drawings on people or pets' faces or anywhere else. The doodles on the face will follow the face as it moves, and the doodles in the space will be fixed in the same place even if the camera moves. You can open the AR Zone app and tap AR Doodle to use this feature.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes AR Doodle as a feature that allows recording videos with virtual handwriting or drawings on faces or spaces, and explains how the doodles follow faces or remain fixed in space.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and provide information about AR Doodle, its functionality, and how to use it.""}",3.315986394882202 13 | "What are the options available when selecting an area in the Screen write feature?","The options available when selecting an area in the Screen write feature are Pin/Insert, Write or draw, Share, and Save.","The options available when selecting an area in the Screen write feature include capturing screenshots to write or draw on them, cropping an area from the captured image, and capturing the current content and the hidden content on an elongated page.","{""score"":0,""justification"":""GRADE: Incorrect\nJUSTIFICATION: The student answer does not mention the correct options of Pin/Insert, Write or draw, Share, and Save.""}","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The retrieved documents are relevant to the question and provide information about the options available when selecting an area in the Screen write feature.""}",1.9342446327209473 14 | "What should you avoid doing to the touch screen to prevent damage?","Do not tap it with anything sharp or apply excessive pressure to it with your fingertips or other 
object.","To avoid damaging the touch screen, do not tap it with anything sharp or apply excessive pressure to it with your fingertips or other object. Additionally, do not use fixed graphics on part or all of the touch screen for extended periods as this may result in afterimages (screen burn-in) or ghosting. Finally, do not allow the touch screen to come into contact with other electrical devices as electrostatic discharges can cause the touch screen to malfunction.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer includes the true answer and provides additional relevant information on how to prevent damage to the touch screen.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and support the answer, as they provide information on how to avoid damaging the touch screen.""}",3.641409397125244 15 | "What must be done if the phone is replaced after registering the digital key?","The digital key must be registered again.","If the phone is replaced, the digital key must be registered again.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately states that the digital key must be registered again if the phone is replaced.""}","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The retrieved documents are relevant to the question and Doc 1 clearly states that if the phone is replaced, the digital key must be registered again.""}",0.6706328392028809 16 | "How can you adjust the size of the windows in split screen view?","You can drag the circles between the app windows to adjust the size of the windows.","You can adjust the size of the windows in split screen view by dragging the circles between the app windows. 
When you drag the circles to the edge of the screen, the window will be maximized.","{""score"":1,""justification"":""GRADE: Correct\nJUSTIFICATION: The student's answer accurately describes how to adjust the size of the windows in split screen view by dragging the circles between the app windows.""}","{""score"":1,""justification"":""GRADE: Correct\n\nJUSTIFICATION: The retrieved documents are relevant to the question and Doc 1 provides the necessary information to support the answer.""}",1.7671329975128174 -------------------------------------------------------------------------------- /QA_app_using_Pinecone_openai.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "source": [ 31 | "# Install" 32 | ], 33 | "metadata": { 34 | "id": "DUlgyRebGtNg" 35 | } 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": { 41 | "id": "YoZ7VisnE0iP" 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "!pip install openai pinecone-client python-docx" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "source": [ 51 | "# Import" 52 | ], 53 | "metadata": { 54 | "id": "Da7fjJn2JDkf" 55 | } 56 | }, 57 | { 58 | "cell_type": "code", 59 | "source": [ 60 | "import pinecone\n", 61 | "from openai.embeddings_utils import get_embedding\n", 62 | "from tqdm import tqdm\n", 63 | "import docx\n", 64 | "import os\n", 65 | "import openai\n", 66 | "\n", 67 | "openai.api_key = \"YOUR_API_KEY_HERE\"" 68 | ], 69 | "metadata": { 
70 | "id": "vzabQYDmIEYR" 71 | }, 72 | "execution_count": null, 73 | "outputs": [] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "source": [ 78 | "# Mount drive and specify the folder" 79 | ], 80 | "metadata": { 81 | "id": "Vc9vsSXNMsa3" 82 | } 83 | }, 84 | { 85 | "cell_type": "code", 86 | "source": [ 87 | "from google.colab import drive\n", 88 | "drive.mount('/content/drive', force_remount=True)" 89 | ], 90 | "metadata": { 91 | "id": "S3bU_eUOJCT-" 92 | }, 93 | "execution_count": null, 94 | "outputs": [] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "source": [ 99 | "docs_path = \"/content/drive/MyDrive/Your Docs Folder\"" 100 | ], 101 | "metadata": { 102 | "id": "rR7IbBDSMwaZ" 103 | }, 104 | "execution_count": null, 105 | "outputs": [] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "source": [ 110 | "# Parse Documents" 111 | ], 112 | "metadata": { 113 | "id": "B7C2eh4TVOPU" 114 | } 115 | }, 116 | { 117 | "cell_type": "code", 118 | "source": [ 119 | "text_chunks = []\n", 120 | "for f_name in os.listdir(docs_path):\n", 121 | " doc_path = os.path.join(docs_path, f_name)\n", 122 | " doc = docx.Document(doc_path)\n", 123 | " for para in doc.paragraphs:\n", 124 | " text_chunks.append(para.text)" 125 | ], 126 | "metadata": { 127 | "id": "PTDRN9P5T2B9" 128 | }, 129 | "execution_count": null, 130 | "outputs": [] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "source": [ 135 | "# remove all chunks shorter than 10 words and strip the rest\n", 136 | "text_chunks = [string.strip().strip('\\n') for string in text_chunks if len(string.split()) >= 10]" 137 | ], 138 | "metadata": { 139 | "id": "4aFLhBnbZ8-7" 140 | }, 141 | "execution_count": null, 142 | "outputs": [] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "source": [ 147 | "# Generate embeddings" 148 | ], 149 | "metadata": { 150 | "id": "RB7bq2ucbGeD" 151 | } 152 | }, 153 | { 154 | "cell_type": "code", 155 | "source": [ 156 | "chunks_with_embeddigns = []\n", 157 | "for chunk in tqdm(text_chunks):\n", 
158 | " embedding = get_embedding(chunk, engine='text-embedding-ada-002')\n", 159 | " chunks_with_embeddigns.append({\"text\": chunk, \"embedding\": embedding})" 160 | ], 161 | "metadata": { 162 | "id": "UOK6ovzbbFNs" 163 | }, 164 | "execution_count": null, 165 | "outputs": [] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "source": [ 170 | "# Upload to Pinecone" 171 | ], 172 | "metadata": { 173 | "id": "Kc_xdNnAduLx" 174 | } 175 | }, 176 | { 177 | "cell_type": "code", 178 | "source": [ 179 | "pinecone.init(\n", 180 | " api_key=\"\",\n", 181 | " environment=\"us-east1-gcp\"\n", 182 | ")" 183 | ], 184 | "metadata": { 185 | "id": "L1jcoDObeLO-" 186 | }, 187 | "execution_count": null, 188 | "outputs": [] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "source": [ 193 | "# create or connect to index\n", 194 | "index_name = \"tiktok-trends-2023\"\n", 195 | "\n", 196 | "if index_name not in pinecone.list_indexes():\n", 197 | " pinecone.create_index(index_name, dimension=1536)\n", 198 | "# connect to index\n", 199 | "index = pinecone.Index(index_name)" 200 | ], 201 | "metadata": { 202 | "id": "uq6ABbtVdGOS" 203 | }, 204 | "execution_count": null, 205 | "outputs": [] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "source": [ 210 | "batch_size = 64 # process everything in batches of 64\n", 211 | "for i in tqdm(range(0, len(chunks_with_embeddigns), batch_size)):\n", 212 | " data_batch = chunks_with_embeddigns[i: i+batch_size] # plain list slice (.iloc only exists on pandas objects)\n", 213 | " # set end position of batch\n", 214 | " i_end = min(i+batch_size, len(chunks_with_embeddigns))\n", 215 | " # get batch meta\n", 216 | " text_batch = [item['text'] for item in data_batch]\n", 217 | " # get ids\n", 218 | " ids_batch = [str(n) for n in range(i, i_end)]\n", 219 | " # get embeddings\n", 220 | " embeds = [item['embedding'] for item in data_batch]\n", 221 | " # prep metadata and upsert batch\n", 222 | " meta = [{'text': text} for text in text_batch] # you can add more fields here\n", 223 | " 
to_upsert = zip(ids_batch, embeds, meta)\n", 224 | " # upsert to Pinecone\n", 225 | " index.upsert(vectors=list(to_upsert))" 226 | ], 227 | "metadata": { 228 | "id": "0dSrfmyUe0p6" 229 | }, 230 | "execution_count": null, 231 | "outputs": [] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "source": [ 236 | "# Query Index" 237 | ], 238 | "metadata": { 239 | "id": "sWVFk3fyGzwy" 240 | } 241 | }, 242 | { 243 | "cell_type": "code", 244 | "source": [ 245 | "def search_docs(query):\n", 246 | " xq = openai.Embedding.create(input=query, engine=\"text-embedding-ada-002\")['data'][0]['embedding']\n", 247 | " res = index.query([xq], top_k=5, include_metadata=True)\n", 248 | " chosen_text = []\n", 249 | " for match in res['matches']:\n", 250 | " chosen_text.append(match['metadata']['text'])\n", 251 | " return res['matches']" 252 | ], 253 | "metadata": { 254 | "id": "MawACaJnG0we" 255 | }, 256 | "execution_count": null, 257 | "outputs": [] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "source": [ 262 | "matches = search_docs(\"What are some predictions for tiktok?\")\n", 263 | "for match in matches:\n", 264 | " print(f\"{match['score']:.2f}: {match['metadata']}\")" 265 | ], 266 | "metadata": { 267 | "id": "0Vnoi_NaHSu-" 268 | }, 269 | "execution_count": null, 270 | "outputs": [] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "source": [ 275 | "# Construct Prompt" 276 | ], 277 | "metadata": { 278 | "id": "H9C11iMnH8ux" 279 | } 280 | }, 281 | { 282 | "cell_type": "code", 283 | "source": [ 284 | "def construct_prompt(query):\n", 285 | " matches = search_docs(query)\n", 286 | "\n", 287 | " chosen_text = []\n", 288 | " for match in matches:\n", 289 | " chosen_text.append(match['metadata']['text'])\n", 290 | "\n", 291 | " prompt = \"\"\"Answer the question as truthfully as possible using the context below, and if the answer is not within the context, say 'I don't know.'\"\"\"\n", 292 | " prompt += \"\\n\\n\"\n", 293 | " prompt += \"Context: \" + \"\\n\".join(chosen_text)\n", 294 | " 
prompt += \"\\n\\n\"\n", 295 | " prompt += \"Question: \" + query\n", 296 | " prompt += \"\\n\"\n", 297 | " prompt += \"Answer: \"\n", 298 | " return prompt" 299 | ], 300 | "metadata": { 301 | "id": "IhxkkHunIA38" 302 | }, 303 | "execution_count": null, 304 | "outputs": [] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "source": [ 309 | "# Run the model" 310 | ], 311 | "metadata": { 312 | "id": "pXMXg_JHIh05" 313 | } 314 | }, 315 | { 316 | "cell_type": "code", 317 | "source": [ 318 | "def answer_question(query):\n", 319 | " prompt = construct_prompt(query)\n", 320 | " res = openai.Completion.create(\n", 321 | " prompt=prompt,\n", 322 | " model=\"text-davinci-003\",\n", 323 | " max_tokens=500,\n", 324 | " temperature=0.0,\n", 325 | " )\n", 326 | " \n", 327 | " return res.choices[0].text # Completion responses expose .text, not .message" 328 | ], 329 | "metadata": { 330 | "id": "OAVPi_p_MyuS" 331 | }, 332 | "execution_count": null, 333 | "outputs": [] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "source": [ 338 | "print(answer_question(\"What will be the top platform in 2023?\"))" 339 | ], 340 | "metadata": { 341 | "id": "nlrzWiqQIkGs" 342 | }, 343 | "execution_count": null, 344 | "outputs": [] 345 | } 346 | ] 347 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # langchain_colab_experiments 2 | Experiments with LangChain using different approaches on Google Colab 3 | 4 | 5 | Doing some experiments with query generation using LangChain to improve the search results: 6 | https://huggingface.co/spaces/rlancemartin/auto-evaluator 7 | https://github.com/rlancemartin/auto-evaluator 8 | https://github.com/langchain-ai/auto-evaluator - with a full explanation as well. 
We can use this deployment to create questions for openly available support documents. 9 | 10 | 11 | https://gptcache.readthedocs.io/en/latest/bootcamp/langchain/qa_generation.html - the GPTCache version of QA generation; auto-evaluator is the better option because it returns each generated answer together with an evaluation, which makes the generated questions easier to trust. 12 | 13 | https://autoevaluator.langchain.com/playground - the best place to try queries and compare the different outputs 14 | -------------------------------------------------------------------------------- /Talk_to_CSV_&_Excel_Files_with_LangChain.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "source": [ 31 | "# Talking to CSV and Excel files with LangChain" 32 | ], 33 | "metadata": { 34 | "id": "QrR8xfz8gAXK" 35 | } 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "source": [ 40 | "Follow this YouTube video \n", 41 | "https://www.youtube.com/watch?v=xQ3mZhw69bc&t=17s" 42 | ], 43 | "metadata": { 44 | "id": "iZLPj_WK3mzS" 45 | } 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": { 51 | "id": "RRYSu48huSUW" 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "!pip -q install langchain openai" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "source": [ 61 | "import os\n", 62 | "\n", 63 | "#os.environ[\"OPENAI_API_KEY\"] = \"sk-****\"" 64 | ], 65 | "metadata": { 66 | "id": "dNA4TsHpu6OM" 67 | }, 68 | "execution_count": 2, 69 | "outputs": [] 70 | }, 
71 | { 72 | "cell_type": "code", 73 | "source": [], 74 | "metadata": { 75 | "id": "JMf2h4iX3Te9" 76 | }, 77 | "execution_count": 3, 78 | "outputs": [] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "source": [ 83 | "!pip show langchain" 84 | ], 85 | "metadata": { 86 | "id": "J-KFB7J_u_3L", 87 | "colab": { 88 | "base_uri": "https://localhost:8080/" 89 | }, 90 | "outputId": "16927c2e-de6e-4787-f667-49ced241fac1" 91 | }, 92 | "execution_count": 4, 93 | "outputs": [ 94 | { 95 | "output_type": "stream", 96 | "name": "stdout", 97 | "text": [ 98 | "Name: langchain\n", 99 | "Version: 0.0.129\n", 100 | "Summary: Building applications with LLMs through composability\n", 101 | "Home-page: https://www.github.com/hwchase17/langchain\n", 102 | "Author: \n", 103 | "Author-email: \n", 104 | "License: MIT\n", 105 | "Location: /usr/local/lib/python3.9/dist-packages\n", 106 | "Requires: aiohttp, dataclasses-json, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity\n", 107 | "Required-by: \n" 108 | ] 109 | } 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "source": [ 115 | "## Dataset - Black Friday Sales\n", 116 | "\n", 117 | "\n", 118 | "https://www.kaggle.com/datasets/sdolezel/black-friday" 119 | ], 120 | "metadata": { 121 | "id": "pSa-ywnnWSAR" 122 | } 123 | }, 124 | { 125 | "cell_type": "code", 126 | "source": [ 127 | "!wget -q https://www.dropbox.com/s/8y6a1zloiscuo5d/black_friday_sales.zip\n", 128 | "!unzip -q black_friday_sales.zip" 129 | ], 130 | "metadata": { 131 | "id": "nU35TyN8W0HG" 132 | }, 133 | "execution_count": 5, 134 | "outputs": [] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "source": [ 139 | "import pandas as pd\n", 140 | "\n", 141 | "df = pd.read_csv('/content/train.csv')" 142 | ], 143 | "metadata": { 144 | "id": "k-gEnmv-WyVB" 145 | }, 146 | "execution_count": 6, 147 | "outputs": [] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "source": [ 152 | "df.head()" 153 | ], 154 | "metadata": { 155 | "colab": { 156 | "base_uri": 
"https://localhost:8080/", 157 | "height": 206 158 | }, 159 | "id": "PoBMknlZXY2j", 160 | "outputId": "8d6181c6-6707-4519-c036-5db5f9aac301" 161 | }, 162 | "execution_count": 7, 163 | "outputs": [ 164 | { 165 | "output_type": "execute_result", 166 | "data": { 167 | "text/plain": [ 168 | " User_ID Product_ID Gender Age Occupation City_Category \\\n", 169 | "0 1000001 P00069042 F 0-17 10 A \n", 170 | "1 1000001 P00248942 F 0-17 10 A \n", 171 | "2 1000001 P00087842 F 0-17 10 A \n", 172 | "3 1000001 P00085442 F 0-17 10 A \n", 173 | "4 1000002 P00285442 M 55+ 16 C \n", 174 | "\n", 175 | " Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", 176 | "0 2 0 3 \n", 177 | "1 2 0 1 \n", 178 | "2 2 0 12 \n", 179 | "3 2 0 12 \n", 180 | "4 4+ 0 8 \n", 181 | "\n", 182 | " Product_Category_2 Product_Category_3 Purchase \n", 183 | "0 NaN NaN 8370 \n", 184 | "1 6.0 14.0 15200 \n", 185 | "2 NaN NaN 1422 \n", 186 | "3 14.0 NaN 1057 \n", 187 | "4 NaN NaN 7969 " 188 | ], 189 | "text/html": [ 190 | "\n", 191 | "
\n", 192 | "
\n", 193 | "
\n", 194 | "\n", 207 | "\n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | "
User_IDProduct_IDGenderAgeOccupationCity_CategoryStay_In_Current_City_YearsMarital_StatusProduct_Category_1Product_Category_2Product_Category_3Purchase
01000001P00069042F0-1710A203NaNNaN8370
11000001P00248942F0-1710A2016.014.015200
21000001P00087842F0-1710A2012NaNNaN1422
31000001P00085442F0-1710A201214.0NaN1057
41000002P00285442M55+16C4+08NaNNaN7969
\n", 303 | "
\n", 304 | " \n", 314 | " \n", 315 | " \n", 352 | "\n", 353 | " \n", 377 | "
\n", 378 | "
\n", 379 | " " 380 | ] 381 | }, 382 | "metadata": {}, 383 | "execution_count": 7 384 | } 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "source": [ 390 | "## CSV Agent\n", 391 | "\n", 392 | "NOTE: this agent calls the Pandas DataFrame agent under the hood, which in turn calls the Python agent, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously." 393 | ], 394 | "metadata": { 395 | "id": "P0LaHRvoVGKS" 396 | } 397 | }, 398 | { 399 | "cell_type": "code", 400 | "source": [ 401 | "from langchain.agents import create_csv_agent\n", 402 | "from langchain.llms import OpenAI" 403 | ], 404 | "metadata": { 405 | "id": "GzQ6sbACUzSf" 406 | }, 407 | "execution_count": 8, 408 | "outputs": [] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "source": [ 413 | "agent = create_csv_agent(OpenAI(temperature=0), \n", 414 | " '/content/train.csv', \n", 415 | " verbose=True)" 416 | ], 417 | "metadata": { 418 | "id": "wFjXiYDjUzZ8" 419 | }, 420 | "execution_count": 9, 421 | "outputs": [] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "source": [ 426 | "agent" 427 | ], 428 | "metadata": { 429 | "colab": { 430 | "base_uri": "https://localhost:8080/" 431 | }, 432 | "id": "0PMNQ-WDuHnh", 433 | "outputId": "0992509d-2682-4e73-f584-ad15faabd222" 434 | }, 435 | "execution_count": 10, 436 | "outputs": [ 437 | { 438 | "output_type": "execute_result", 439 | "data": { 440 | "text/plain": [ 441 | "AgentExecutor(memory=None, callback_manager=, verbose=True, agent=ZeroShotAgent(llm_chain=LLMChain(memory=None, callback_manager=, verbose=False, prompt=PromptTemplate(input_variables=['input', 'agent_scratchpad'], output_parser=None, partial_variables={'df': ' User_ID Product_ID Gender Age Occupation City_Category \\\\\\n0 1000001 P00069042 F 0-17 10 A \\n1 1000001 P00248942 F 0-17 10 A \\n2 1000001 P00087842 F 0-17 10 A \\n3 1000001 P00085442 F 0-17 10 A \\n4 1000002 P00285442 M 55+ 16 C \\n\\n Stay_In_Current_City_Years 
Marital_Status Product_Category_1 \\\\\\n0 2 0 3 \\n1 2 0 1 \\n2 2 0 12 \\n3 2 0 12 \\n4 4+ 0 8 \\n\\n Product_Category_2 Product_Category_3 Purchase \\n0 NaN NaN 8370 \\n1 6.0 14.0 15200 \\n2 NaN NaN 1422 \\n3 14.0 NaN 1057 \\n4 NaN NaN 7969 '}, template='\\nYou are working with a pandas dataframe in Python. The name of the dataframe is `df`.\\nYou should use the tools below to answer the question posed of you:\\n\\npython_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [python_repl_ast]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... (this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\n\\nThis is the result of `print(df.head())`:\\n{df}\\n\\nBegin!\\nQuestion: {input}\\n{agent_scratchpad}', template_format='f-string', validate_template=True), llm=OpenAI(cache=None, verbose=False, callback_manager=, client=, model_name='text-davinci-003', temperature=0.0, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=1, best_of=1, model_kwargs={}, openai_api_key=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False), output_key='text'), allowed_tools=['python_repl_ast']), tools=[PythonAstREPLTool(name='python_repl_ast', description='A Python shell. Use this to execute python commands. Input should be a valid python command. 
When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.', return_direct=False, verbose=False, callback_manager=, globals={}, locals={'df': User_ID Product_ID Gender Age Occupation City_Category \\\n", 442 | "0 1000001 P00069042 F 0-17 10 A \n", 443 | "1 1000001 P00248942 F 0-17 10 A \n", 444 | "2 1000001 P00087842 F 0-17 10 A \n", 445 | "3 1000001 P00085442 F 0-17 10 A \n", 446 | "4 1000002 P00285442 M 55+ 16 C \n", 447 | "... ... ... ... ... ... ... \n", 448 | "550063 1006033 P00372445 M 51-55 13 B \n", 449 | "550064 1006035 P00375436 F 26-35 1 C \n", 450 | "550065 1006036 P00375436 F 26-35 15 B \n", 451 | "550066 1006038 P00375436 F 55+ 1 C \n", 452 | "550067 1006039 P00371644 F 46-50 0 B \n", 453 | "\n", 454 | " Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", 455 | "0 2 0 3 \n", 456 | "1 2 0 1 \n", 457 | "2 2 0 12 \n", 458 | "3 2 0 12 \n", 459 | "4 4+ 0 8 \n", 460 | "... ... ... ... \n", 461 | "550063 1 1 20 \n", 462 | "550064 3 0 20 \n", 463 | "550065 4+ 1 20 \n", 464 | "550066 2 0 20 \n", 465 | "550067 4+ 1 20 \n", 466 | "\n", 467 | " Product_Category_2 Product_Category_3 Purchase \n", 468 | "0 NaN NaN 8370 \n", 469 | "1 6.0 14.0 15200 \n", 470 | "2 NaN NaN 1422 \n", 471 | "3 14.0 NaN 1057 \n", 472 | "4 NaN NaN 7969 \n", 473 | "... ... ... ... 
\n", 474 | "550063 NaN NaN 368 \n", 475 | "550064 NaN NaN 371 \n", 476 | "550065 NaN NaN 137 \n", 477 | "550066 NaN NaN 365 \n", 478 | "550067 NaN NaN 490 \n", 479 | "\n", 480 | "[550068 rows x 12 columns]})], return_intermediate_steps=False, max_iterations=15, early_stopping_method='force')" 481 | ] 482 | }, 483 | "metadata": {}, 484 | "execution_count": 10 485 | } 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "source": [ 491 | "agent.agent.llm_chain.prompt.template" 492 | ], 493 | "metadata": { 494 | "colab": { 495 | "base_uri": "https://localhost:8080/", 496 | "height": 109 497 | }, 498 | "id": "rS-PPUZkvuS8", 499 | "outputId": "f9bd8f22-cec3-4c98-839a-e75a8f21c2fb" 500 | }, 501 | "execution_count": 11, 502 | "outputs": [ 503 | { 504 | "output_type": "execute_result", 505 | "data": { 506 | "text/plain": [ 507 | "'\\nYou are working with a pandas dataframe in Python. The name of the dataframe is `df`.\\nYou should use the tools below to answer the question posed of you:\\n\\npython_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [python_repl_ast]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... 
(this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\n\\nThis is the result of `print(df.head())`:\\n{df}\\n\\nBegin!\\nQuestion: {input}\\n{agent_scratchpad}'" 508 | ], 509 | "application/vnd.google.colaboratory.intrinsic+json": { 510 | "type": "string" 511 | } 512 | }, 513 | "metadata": {}, 514 | "execution_count": 11 515 | } 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "source": [ 521 | "\n", 522 | "You are working with a pandas dataframe in Python. The name of the dataframe is `df`.\n", 523 | "You should use the tools below to answer the question posed of you:\n", 524 | "\n", 525 | "python_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.\n", 526 | "\n", 527 | "Use the following format:\n", 528 | "\n", 529 | "Question: the input question you must answer\n", 530 | "Thought: you should always think about what to do\n", 531 | "Action: the action to take, should be one of [python_repl_ast]\n", 532 | "Action Input: the input to the action\n", 533 | "Observation: the result of the action\n", 534 | "... 
(this Thought/Action/Action Input/Observation can repeat N times)\n", 535 | "Thought: I now know the final answer\n", 536 | "Final Answer: the final answer to the original input question\n", 537 | "\n", 538 | "\n", 539 | "This is the result of `print(df.head())`:\n", 540 | "{df}\n", 541 | "\n", 542 | "Begin!\n", 543 | "Question: {input}\n", 544 | "{agent_scratchpad}" 545 | ], 546 | "metadata": { 547 | "id": "flo_FqmZwTPU" 548 | } 549 | }, 550 | { 551 | "cell_type": "code", 552 | "source": [ 553 | "agent.run(\"how many rows are there?\")" 554 | ], 555 | "metadata": { 556 | "colab": { 557 | "base_uri": "https://localhost:8080/", 558 | "height": 237 559 | }, 560 | "id": "inMx5EdoUzby", 561 | "outputId": "67b5f3c1-bc6e-4f58-dd08-e8cb58137b80" 562 | }, 563 | "execution_count": 12, 564 | "outputs": [ 565 | { 566 | "output_type": "stream", 567 | "name": "stdout", 568 | "text": [ 569 | "\n", 570 | "\n", 571 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 572 | "\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows\n", 573 | "Action: python_repl_ast\n", 574 | "Action Input: len(df)\u001b[0m\n", 575 | "Observation: \u001b[36;1m\u001b[1;3m550068\u001b[0m\n", 576 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 577 | "Final Answer: 550068\u001b[0m\n", 578 | "\n", 579 | "\u001b[1m> Finished chain.\u001b[0m\n" 580 | ] 581 | }, 582 | { 583 | "output_type": "execute_result", 584 | "data": { 585 | "text/plain": [ 586 | "'550068'" 587 | ], 588 | "application/vnd.google.colaboratory.intrinsic+json": { 589 | "type": "string" 590 | } 591 | }, 592 | "metadata": {}, 593 | "execution_count": 12 594 | } 595 | ] 596 | }, 597 | { 598 | "cell_type": "code", 599 | "source": [ 600 | "agent.run(\"how many people are female?\")" 601 | ], 602 | "metadata": { 603 | "colab": { 604 | "base_uri": "https://localhost:8080/", 605 | "height": 455 606 | }, 607 | "id": "LJ7IjBuJUzdm", 608 | "outputId": "3911d7bc-0551-42f0-940e-5b40d9e3d768" 609 | }, 610 | 
"execution_count": 13, 611 | "outputs": [ 612 | { 613 | "output_type": "stream", 614 | "name": "stdout", 615 | "text": [ 616 | "\n", 617 | "\n", 618 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 619 | "\u001b[32;1m\u001b[1;3mThought: I need to count the number of people who are female\n", 620 | "Action: python_repl_ast\n", 621 | "Action Input: df[df['Gender'] == 'F'].count()\u001b[0m\n", 622 | "Observation: \u001b[36;1m\u001b[1;3mUser_ID 135809\n", 623 | "Product_ID 135809\n", 624 | "Gender 135809\n", 625 | "Age 135809\n", 626 | "Occupation 135809\n", 627 | "City_Category 135809\n", 628 | "Stay_In_Current_City_Years 135809\n", 629 | "Marital_Status 135809\n", 630 | "Product_Category_1 135809\n", 631 | "Product_Category_2 91530\n", 632 | "Product_Category_3 37594\n", 633 | "Purchase 135809\n", 634 | "dtype: int64\u001b[0m\n", 635 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 636 | "Final Answer: 135809\u001b[0m\n", 637 | "\n", 638 | "\u001b[1m> Finished chain.\u001b[0m\n" 639 | ] 640 | }, 641 | { 642 | "output_type": "execute_result", 643 | "data": { 644 | "text/plain": [ 645 | "'135809'" 646 | ], 647 | "application/vnd.google.colaboratory.intrinsic+json": { 648 | "type": "string" 649 | } 650 | }, 651 | "metadata": {}, 652 | "execution_count": 13 653 | } 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "source": [ 659 | "agent.run(\"how many people have stayed more than 3 years in the city?\")" 660 | ], 661 | "metadata": { 662 | "colab": { 663 | "base_uri": "https://localhost:8080/", 664 | "height": 237 665 | }, 666 | "id": "CRV3rGBaUzfq", 667 | "outputId": "199c9a73-bfcd-4ad8-a822-241c1f461836" 668 | }, 669 | "execution_count": 14, 670 | "outputs": [ 671 | { 672 | "output_type": "stream", 673 | "name": "stdout", 674 | "text": [ 675 | "\n", 676 | "\n", 677 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 678 | "\u001b[32;1m\u001b[1;3mThought: I need to filter the dataframe to find the people who have 
stayed more than 3 years in the city\n", 679 | "Action: python_repl_ast\n", 680 | "Action Input: df[df['Stay_In_Current_City_Years'] > '3'].shape[0]\u001b[0m\n", 681 | "Observation: \u001b[36;1m\u001b[1;3m84726\u001b[0m\n", 682 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 683 | "Final Answer: 84726 people have stayed more than 3 years in the city.\u001b[0m\n", 684 | "\n", 685 | "\u001b[1m> Finished chain.\u001b[0m\n" 686 | ] 687 | }, 688 | { 689 | "output_type": "execute_result", 690 | "data": { 691 | "text/plain": [ 692 | "'84726 people have stayed more than 3 years in the city.'" 693 | ], 694 | "application/vnd.google.colaboratory.intrinsic+json": { 695 | "type": "string" 696 | } 697 | }, 698 | "metadata": {}, 699 | "execution_count": 14 700 | } 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "source": [ 706 | "agent.run(\"how many people have stayed more than 3 years in the city and are female?\")" 707 | ], 708 | "metadata": { 709 | "colab": { 710 | "base_uri": "https://localhost:8080/", 711 | "height": 237 712 | }, 713 | "id": "zuJ0AeKNYN4X", 714 | "outputId": "cd5aa7ba-cc1d-42bc-f756-b5c8dfb650a1" 715 | }, 716 | "execution_count": 15, 717 | "outputs": [ 718 | { 719 | "output_type": "stream", 720 | "name": "stdout", 721 | "text": [ 722 | "\n", 723 | "\n", 724 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 725 | "\u001b[32;1m\u001b[1;3mThought: I need to filter the dataframe to find the answer\n", 726 | "Action: python_repl_ast\n", 727 | "Action Input: df[(df['Stay_In_Current_City_Years'] > '3') & (df['Gender'] == 'F')].shape[0]\u001b[0m\n", 728 | "Observation: \u001b[36;1m\u001b[1;3m18596\u001b[0m\n", 729 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 730 | "Final Answer: 18596 people have stayed more than 3 years in the city and are female.\u001b[0m\n", 731 | "\n", 732 | "\u001b[1m> Finished chain.\u001b[0m\n" 733 | ] 734 | }, 735 | { 736 | "output_type": "execute_result", 737 | "data": { 738 
| "text/plain": [ 739 | "'18596 people have stayed more than 3 years in the city and are female.'" 740 | ], 741 | "application/vnd.google.colaboratory.intrinsic+json": { 742 | "type": "string" 743 | } 744 | }, 745 | "metadata": {}, 746 | "execution_count": 15 747 | } 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "source": [ 753 | "agent.run(\"Are there more males or females?\")" 754 | ], 755 | "metadata": { 756 | "colab": { 757 | "base_uri": "https://localhost:8080/", 758 | "height": 273 759 | }, 760 | "id": "PDM8TfbJv2Uy", 761 | "outputId": "ebd2b66b-9c93-4ddc-e40d-5532e8c95c7c" 762 | }, 763 | "execution_count": 16, 764 | "outputs": [ 765 | { 766 | "output_type": "stream", 767 | "name": "stdout", 768 | "text": [ 769 | "\n", 770 | "\n", 771 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 772 | "\u001b[32;1m\u001b[1;3mThought: I need to count the number of males and females\n", 773 | "Action: python_repl_ast\n", 774 | "Action Input: df['Gender'].value_counts()\u001b[0m\n", 775 | "Observation: \u001b[36;1m\u001b[1;3mM 414259\n", 776 | "F 135809\n", 777 | "Name: Gender, dtype: int64\u001b[0m\n", 778 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 779 | "Final Answer: There are more males than females.\u001b[0m\n", 780 | "\n", 781 | "\u001b[1m> Finished chain.\u001b[0m\n" 782 | ] 783 | }, 784 | { 785 | "output_type": "execute_result", 786 | "data": { 787 | "text/plain": [ 788 | "'There are more males than females.'" 789 | ], 790 | "application/vnd.google.colaboratory.intrinsic+json": { 791 | "type": "string" 792 | } 793 | }, 794 | "metadata": {}, 795 | "execution_count": 16 796 | } 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "source": [ 802 | "## LangChain CSV Loader" 803 | ], 804 | "metadata": { 805 | "id": "HqwsGJDhvAQ5" 806 | } 807 | }, 808 | { 809 | "cell_type": "code", 810 | "source": [ 811 | "from langchain.document_loaders.csv_loader import CSVLoader\n", 812 | "\n", 813 | "loader = 
CSVLoader(file_path='/content/train.csv')\n", 814 | "data = loader.load()" 815 | ], 816 | "metadata": { 817 | "id": "J-ilDbIBVC-H" 818 | }, 819 | "execution_count": 17, 820 | "outputs": [] 821 | }, 822 | { 823 | "cell_type": "code", 824 | "source": [ 825 | "#custom chain etc" 826 | ], 827 | "metadata": { 828 | "id": "i46ck8EPcruR" 829 | }, 830 | "execution_count": 18, 831 | "outputs": [] 832 | }, 833 | { 834 | "cell_type": "markdown", 835 | "source": [ 836 | "## Convert XLS file to csv" 837 | ], 838 | "metadata": { 839 | "id": "t8de30eojHeD" 840 | } 841 | }, 842 | { 843 | "cell_type": "code", 844 | "source": [ 845 | "!pip install openpyxl" 846 | ], 847 | "metadata": { 848 | "colab": { 849 | "base_uri": "https://localhost:8080/" 850 | }, 851 | "id": "1QSii5AGaS90", 852 | "outputId": "9cd60053-191b-4bc9-fa3a-fbf935967e84" 853 | }, 854 | "execution_count": 19, 855 | "outputs": [ 856 | { 857 | "output_type": "stream", 858 | "name": "stdout", 859 | "text": [ 860 | "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", 861 | "Requirement already satisfied: openpyxl in /usr/local/lib/python3.9/dist-packages (3.0.10)\n", 862 | "Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.9/dist-packages (from openpyxl) (1.1.0)\n" 863 | ] 864 | } 865 | ] 866 | }, 867 | { 868 | "cell_type": "code", 869 | "source": [ 870 | "!wget -q https://www.dropbox.com/s/co5kwipba9hpumt/excel_files.zip\n", 871 | "!unzip -q excel_files.zip" 872 | ], 873 | "metadata": { 874 | "id": "iFquHDs4jpfy" 875 | }, 876 | "execution_count": 20, 877 | "outputs": [] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "source": [ 882 | "xls_file = r'/content/excel_file_example.xlsx'\n", 883 | "output_csv = r'/content/excel_file_example.csv'\n", 884 | "\n", 885 | "# Read the XLS file using pandas and openpyxl as the engine\n", 886 | "data = pd.read_excel(xls_file, engine='openpyxl')\n", 887 | "\n", 888 | "# Save the data as a CSV file\n", 889 | 
"data.to_csv(output_csv, index=False)" 890 | ], 891 | "metadata": { 892 | "id": "S9lVOQk-mM-o" 893 | }, 894 | "execution_count": 21, 895 | "outputs": [] 896 | }, 897 | { 898 | "cell_type": "code", 899 | "source": [ 900 | "agent = create_csv_agent(OpenAI(temperature=0), \n", 901 | " '/content/excel_file_example.csv', \n", 902 | " verbose=True)" 903 | ], 904 | "metadata": { 905 | "id": "HgzQAEKPYN_b" 906 | }, 907 | "execution_count": 22, 908 | "outputs": [] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "source": [ 913 | "agent.run(\"What are the column names?\")" 914 | ], 915 | "metadata": { 916 | "colab": { 917 | "base_uri": "https://localhost:8080/", 918 | "height": 273 919 | }, 920 | "id": "cKULEHFeeu_0", 921 | "outputId": "560a9353-9af2-43f7-b6c8-a22ed3f2100e" 922 | }, 923 | "execution_count": 23, 924 | "outputs": [ 925 | { 926 | "output_type": "stream", 927 | "name": "stdout", 928 | "text": [ 929 | "\n", 930 | "\n", 931 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 932 | "\u001b[32;1m\u001b[1;3mThought: I need to find out what the column names are\n", 933 | "Action: python_repl_ast\n", 934 | "Action Input: df.columns\u001b[0m\n", 935 | "Observation: \u001b[36;1m\u001b[1;3mIndex(['0', 'First Name', 'Last Name', 'Gender', 'Country', 'Age', 'Date',\n", 936 | " 'Id'],\n", 937 | " dtype='object')\u001b[0m\n", 938 | "Thought:\u001b[32;1m\u001b[1;3m I now know the column names\n", 939 | "Final Answer: 0, First Name, Last Name, Gender, Country, Age, Date, Id\u001b[0m\n", 940 | "\n", 941 | "\u001b[1m> Finished chain.\u001b[0m\n" 942 | ] 943 | }, 944 | { 945 | "output_type": "execute_result", 946 | "data": { 947 | "text/plain": [ 948 | "'0, First Name, Last Name, Gender, Country, Age, Date, Id'" 949 | ], 950 | "application/vnd.google.colaboratory.intrinsic+json": { 951 | "type": "string" 952 | } 953 | }, 954 | "metadata": {}, 955 | "execution_count": 23 956 | } 957 | ] 958 | }, 959 | { 960 | "cell_type": "code", 961 | "source": [ 962 | 
"agent.run(\"What is the average age?\")" 963 | ], 964 | "metadata": { 965 | "colab": { 966 | "base_uri": "https://localhost:8080/", 967 | "height": 237 968 | }, 969 | "id": "DAnTdAlYevBn", 970 | "outputId": "40476556-9f7e-4aa0-f5e9-5c25406ea71a" 971 | }, 972 | "execution_count": 24, 973 | "outputs": [ 974 | { 975 | "output_type": "stream", 976 | "name": "stdout", 977 | "text": [ 978 | "\n", 979 | "\n", 980 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 981 | "\u001b[32;1m\u001b[1;3mThought: I need to calculate the average age of the people in the dataframe\n", 982 | "Action: python_repl_ast\n", 983 | "Action Input: df['Age'].mean()\u001b[0m\n", 984 | "Observation: \u001b[36;1m\u001b[1;3m33.26\u001b[0m\n", 985 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 986 | "Final Answer: The average age is 33.26.\u001b[0m\n", 987 | "\n", 988 | "\u001b[1m> Finished chain.\u001b[0m\n" 989 | ] 990 | }, 991 | { 992 | "output_type": "execute_result", 993 | "data": { 994 | "text/plain": [ 995 | "'The average age is 33.26.'" 996 | ], 997 | "application/vnd.google.colaboratory.intrinsic+json": { 998 | "type": "string" 999 | } 1000 | }, 1001 | "metadata": {}, 1002 | "execution_count": 24 1003 | } 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "code", 1008 | "source": [ 1009 | "agent.run(\"Which country appears the most and how many times does it appear?\")" 1010 | ], 1011 | "metadata": { 1012 | "id": "yxbY7aQ_evDa", 1013 | "colab": { 1014 | "base_uri": "https://localhost:8080/", 1015 | "height": 291 1016 | }, 1017 | "outputId": "b8cb0a75-37f4-4d57-bf19-4c678a7456cb" 1018 | }, 1019 | "execution_count": 25, 1020 | "outputs": [ 1021 | { 1022 | "output_type": "stream", 1023 | "name": "stdout", 1024 | "text": [ 1025 | "\n", 1026 | "\n", 1027 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 1028 | "\u001b[32;1m\u001b[1;3mThought: I need to count the number of times each country appears in the dataframe.\n", 1029 | "Action: 
python_repl_ast\n", 1030 | "Action Input: df['Country'].value_counts()\u001b[0m\n", 1031 | "Observation: \u001b[36;1m\u001b[1;3mUnited States 48\n", 1032 | "Great Britain 28\n", 1033 | "France 24\n", 1034 | "Name: Country, dtype: int64\u001b[0m\n", 1035 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 1036 | "Final Answer: The country that appears the most is United States, appearing 48 times.\u001b[0m\n", 1037 | "\n", 1038 | "\u001b[1m> Finished chain.\u001b[0m\n" 1039 | ] 1040 | }, 1041 | { 1042 | "output_type": "execute_result", 1043 | "data": { 1044 | "text/plain": [ 1045 | "'The country that appears the most is United States, appearing 48 times.'" 1046 | ], 1047 | "application/vnd.google.colaboratory.intrinsic+json": { 1048 | "type": "string" 1049 | } 1050 | }, 1051 | "metadata": {}, 1052 | "execution_count": 25 1053 | } 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "code", 1058 | "source": [ 1059 | "agent.run(\"What is the ratio of males to females?\")" 1060 | ], 1061 | "metadata": { 1062 | "id": "Kcf1mGHRyyMb", 1063 | "colab": { 1064 | "base_uri": "https://localhost:8080/", 1065 | "height": 273 1066 | }, 1067 | "outputId": "43df5fef-c7c4-467a-fbc8-0f10d016c8d1" 1068 | }, 1069 | "execution_count": 26, 1070 | "outputs": [ 1071 | { 1072 | "output_type": "stream", 1073 | "name": "stdout", 1074 | "text": [ 1075 | "\n", 1076 | "\n", 1077 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 1078 | "\u001b[32;1m\u001b[1;3mThought: I need to count the number of males and females\n", 1079 | "Action: python_repl_ast\n", 1080 | "Action Input: df['Gender'].value_counts()\u001b[0m\n", 1081 | "Observation: \u001b[36;1m\u001b[1;3mFemale 76\n", 1082 | "Male 24\n", 1083 | "Name: Gender, dtype: int64\u001b[0m\n", 1084 | "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n", 1085 | "Final Answer: 24:76\u001b[0m\n", 1086 | "\n", 1087 | "\u001b[1m> Finished chain.\u001b[0m\n" 1088 | ] 1089 | }, 1090 | { 1091 | "output_type": 
"execute_result", 1092 | "data": { 1093 | "text/plain": [ 1094 | "'24:76'" 1095 | ], 1096 | "application/vnd.google.colaboratory.intrinsic+json": { 1097 | "type": "string" 1098 | } 1099 | }, 1100 | "metadata": {}, 1101 | "execution_count": 26 1102 | } 1103 | ] 1104 | }, 1105 | { 1106 | "cell_type": "code", 1107 | "source": [], 1108 | "metadata": { 1109 | "id": "BOE02tgSy7NX" 1110 | }, 1111 | "execution_count": 26, 1112 | "outputs": [] 1113 | } 1114 | ] 1115 | } -------------------------------------------------------------------------------- /gpt_cache_colab.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "view-in-github", 7 | "colab_type": "text" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "6b3ba1cc", 16 | "metadata": { 17 | "id": "6b3ba1cc" 18 | }, 19 | "source": [ 20 | "# Chat\n", 21 | "\n", 22 | "[This example](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/chat.html) will show you how to chat with GPT, the original example is on [OpenAI Example](https://platform.openai.com/docs/guides/chat/introduction), the difference is that we will teach you how to cache the response for exact and similar matches with **gptcache**, it will be very simple, you just need to add an extra step to initialize the cache.\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "source": [ 28 | "!pip install -q gptcache\n", 29 | "!pip install openai" 30 | ], 31 | "metadata": { 32 | "id": "4aAjAIFCzny5" 33 | }, 34 | "id": "4aAjAIFCzny5", 35 | "execution_count": null, 36 | "outputs": [] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "source": [ 41 | "Before running the example, make sure the `OPENAI_API_KEY` environment variable is set by executing `echo $OPENAI_API_KEY`. 
If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems.\n", 42 | "\n", 43 | "> We can use `os.environ` to set the environment variable in Colab." 44 | ], 45 | "metadata": { 46 | "id": "WxCz_TiwzwbO" 47 | }, 48 | "id": "WxCz_TiwzwbO" 49 | }, 50 | { 51 | "cell_type": "code", 52 | "source": [ 53 | "import os\n", 54 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"" 55 | ], 56 | "metadata": { 57 | "id": "6x1GVChYy-pC" 58 | }, 59 | "id": "6x1GVChYy-pC", 60 | "execution_count": 4, 61 | "outputs": [] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "source": [ 66 | "The following code demonstrates the usage and the acceleration effect of gptcache. It consists of three parts: the original OpenAI usage, exact-match caching, and similarity caching.\n" 67 | ], 68 | "metadata": { 69 | "id": "_hzEQ9T90AOO" 70 | }, 71 | "id": "_hzEQ9T90AOO" 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "id": "aa0ba70e", 76 | "metadata": { 77 | "id": "aa0ba70e" 78 | }, 79 | "source": [ 80 | "## OpenAI API original usage" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 7, 86 | "id": "80e9dae2", 87 | "metadata": { 88 | "colab": { 89 | "base_uri": "https://localhost:8080/" 90 | }, 91 | "id": "80e9dae2", 92 | "outputId": "425b15f7-3d59-4930-948c-da078e824b63" 93 | }, 94 | "outputs": [ 95 | { 96 | "output_type": "stream", 97 | "name": "stdout", 98 | "text": [ 99 | "Question: what‘s github?\n", 100 | "Time consuming: 6.96s\n", 101 | "Answer: GitHub is a web-based platform that allows developers to collaborate on software development projects. It is a repository hosting service which provides version control, issue tracking, collaborative coding, and other features for free or at a cost. GitHub provides a platform for developers to store and manage their code, to track and manage changes to code, and to collaborate on coding projects with other developers. 
It is widely used in the open-source community and by companies for internal development projects. GitHub is also home to a vast library of open-source software that anyone can download, use, and contribute to.\n", 102 | "\n" 103 | ] 104 | } 105 | ], 106 | "source": [ 107 | "import time\n", 108 | "import openai\n", 109 | "\n", 110 | "\n", 111 | "def response_text(openai_resp):\n", 112 | " return openai_resp['choices'][0]['message']['content']\n", 113 | "\n", 114 | "\n", 115 | "question = 'what‘s github?'\n", 116 | "\n", 117 | "# OpenAI API original usage\n", 118 | "start_time = time.time()\n", 119 | "response = openai.ChatCompletion.create(\n", 120 | " model='gpt-3.5-turbo',\n", 121 | " messages=[\n", 122 | " {\n", 123 | " 'role': 'user',\n", 124 | " 'content': question\n", 125 | " }\n", 126 | " ],\n", 127 | ")\n", 128 | "print(f'Question: {question}')\n", 129 | "print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n", 130 | "print(f'Answer: {response_text(response)}\\n')" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "id": "9d871550", 136 | "metadata": { 137 | "id": "9d871550" 138 | }, 139 | "source": [ 140 | "## OpenAI API + GPTCache, exact match cache\n", 141 | "\n", 142 | "Initialize the cache to run GPTCache and import `openai` from `gptcache.adapter`, which will automatically set the map data manager to match the exact cache; for more details, refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache).\n", 143 | "\n", 144 | "And if you ask ChatGPT the exact same question twice, the answer to the second one will be obtained from the cache without requesting ChatGPT again." 
145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 8, 150 | "id": "024484f3", 151 | "metadata": { 152 | "colab": { 153 | "base_uri": "https://localhost:8080/" 154 | }, 155 | "id": "024484f3", 156 | "outputId": "82a4f39d-ef7e-4dad-8ca0-7ed1e227c375" 157 | }, 158 | "outputs": [ 159 | { 160 | "output_type": "stream", 161 | "name": "stdout", 162 | "text": [ 163 | "Cache loading.....\n", 164 | "Question: what's github\n", 165 | "Time consuming: 5.06s\n", 166 | "Answer: GitHub is a web-based platform that is designed for hosting and collaborating on software development projects using version control. It allows users to store their code in repositories and collaborate with other developers on projects. GitHub also offers a variety of tools and services for project management, like issue tracking, code review, and continuous integration. It is used by developers worldwide for open source and private projects, whether as a team or an individual developer.\n", 167 | "\n", 168 | "Question: what's github\n", 169 | "Time consuming: 6.25s\n", 170 | "Answer: GitHub is a web-based platform that is designed for hosting and collaborating on software development projects using version control. It allows users to store their code in repositories and collaborate with other developers on projects. GitHub also offers a variety of tools and services for project management, like issue tracking, code review, and continuous integration. 
It is used by developers worldwide for open source and private projects, whether as a team or an individual developer.\n", 171 | "\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "import time\n", 177 | "\n", 178 | "\n", 179 | "def response_text(openai_resp):\n", 180 | " return openai_resp['choices'][0]['message']['content']\n", 181 | "\n", 182 | "print(\"Cache loading.....\")\n", 183 | "\n", 184 | "# To use GPTCache, that's all you need\n", 185 | "# -------------------------------------------------\n", 186 | "from gptcache import cache\n", 187 | "from gptcache.adapter import openai\n", 188 | "\n", 189 | "cache.init()\n", 190 | "cache.set_openai_key()\n", 191 | "# -------------------------------------------------\n", 192 | "\n", 193 | "question = \"what's github\"\n", 194 | "for _ in range(2):\n", 195 | " start_time = time.time()\n", 196 | " response = openai.ChatCompletion.create(\n", 197 | " model='gpt-3.5-turbo',\n", 198 | " messages=[\n", 199 | " {\n", 200 | " 'role': 'user',\n", 201 | " 'content': question\n", 202 | " }\n", 203 | " ],\n", 204 | " )\n", 205 | " print(f'Question: {question}')\n", 206 | " print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n", 207 | " print(f'Answer: {response_text(response)}\\n')" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "id": "6f2ff699", 213 | "metadata": { 214 | "id": "6f2ff699" 215 | }, 216 | "source": [ 217 | "## OpenAI API + GPTCache, similar search cache\n", 218 | "\n", 219 | "Set the cache with `embedding_func` to generate embeddings for the text, `data_manager` to manage the cache data, and `similarity_evaluation` to evaluate similarity; for more details, refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache).\n", 220 | "\n", 221 | "After obtaining an answer from ChatGPT in response to several similar questions, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again." 
222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "id": "fd1ff06e", 228 | "metadata": { 229 | "colab": { 230 | "base_uri": "https://localhost:8080/" 231 | }, 232 | "id": "fd1ff06e", 233 | "outputId": "644b0961-ac73-4399-e2f1-be2cdae0f4fc" 234 | }, 235 | "outputs": [ 236 | { 237 | "output_type": "stream", 238 | "name": "stdout", 239 | "text": [ 240 | "Cache loading.....\n", 241 | "Question: what's github\n", 242 | "Time consuming: 8.48s\n", 243 | "Answer: GitHub is a platform for software developers to collaborate, share, and work on open-source code. It provides a web-based interface for version control and facilitates collaboration among developers. Using GitHub, developers can share and contribute to open-source projects, track issues, and merge code changes from multiple contributors. Additionally, GitHub provides a range of tools for project management, documentation, and code review. It has become one of the most popular platforms for open-source development due to its ease of use and extensive community support.\n", 244 | "\n", 245 | "Question: can you explain what GitHub is\n", 246 | "Time consuming: 0.75s\n", 247 | "Answer: GitHub is a platform for software developers to collaborate, share, and work on open-source code. It provides a web-based interface for version control and facilitates collaboration among developers. Using GitHub, developers can share and contribute to open-source projects, track issues, and merge code changes from multiple contributors. Additionally, GitHub provides a range of tools for project management, documentation, and code review. It has become one of the most popular platforms for open-source development due to its ease of use and extensive community support.\n", 248 | "\n", 249 | "Question: can you tell me more about GitHub\n", 250 | "Time consuming: 0.73s\n", 251 | "Answer: GitHub is a platform for software developers to collaborate, share, and work on open-source code. 
It provides a web-based interface for version control and facilitates collaboration among developers. Using GitHub, developers can share and contribute to open-source projects, track issues, and merge code changes from multiple contributors. Additionally, GitHub provides a range of tools for project management, documentation, and code review. It has become one of the most popular platforms for open-source development due to its ease of use and extensive community support.\n", 252 | "\n", 253 | "Question: what is the purpose of GitHub\n", 254 | "Time consuming: 0.72s\n", 255 | "Answer: GitHub is a platform for software developers to collaborate, share, and work on open-source code. It provides a web-based interface for version control and facilitates collaboration among developers. Using GitHub, developers can share and contribute to open-source projects, track issues, and merge code changes from multiple contributors. Additionally, GitHub provides a range of tools for project management, documentation, and code review. 
It has become one of the most popular platforms for open-source development due to its ease of use and extensive community support.\n", 256 | "\n" 257 | ] 258 | } 259 | ], 260 | "source": [ 261 | "import time\n", 262 | "\n", 263 | "\n", 264 | "def response_text(openai_resp):\n", 265 | " return openai_resp['choices'][0]['message']['content']\n", 266 | "\n", 267 | "from gptcache import cache\n", 268 | "from gptcache.adapter import openai\n", 269 | "from gptcache.embedding import Onnx\n", 270 | "from gptcache.manager import CacheBase, VectorBase, get_data_manager\n", 271 | "from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n", 272 | "\n", 273 | "print(\"Cache loading.....\")\n", 274 | "\n", 275 | "onnx = Onnx()\n", 276 | "data_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=onnx.dimension))\n", 277 | "cache.init(\n", 278 | " embedding_func=onnx.to_embeddings,\n", 279 | " data_manager=data_manager,\n", 280 | " similarity_evaluation=SearchDistanceEvaluation(),\n", 281 | " )\n", 282 | "cache.set_openai_key()\n", 283 | "\n", 284 | "questions = [\n", 285 | " \"what's github\",\n", 286 | " \"can you explain what GitHub is\",\n", 287 | " \"can you tell me more about GitHub\",\n", 288 | " \"what is the purpose of GitHub\"\n", 289 | "]\n", 290 | "\n", 291 | "for question in questions:\n", 292 | " start_time = time.time()\n", 293 | " response = openai.ChatCompletion.create(\n", 294 | " model='gpt-3.5-turbo',\n", 295 | " messages=[\n", 296 | " {\n", 297 | " 'role': 'user',\n", 298 | " 'content': question\n", 299 | " }\n", 300 | " ],\n", 301 | " )\n", 302 | " print(f'Question: {question}')\n", 303 | " print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n", 304 | " print(f'Answer: {response_text(response)}\\n')" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 9, 310 | "id": "07d92eae", 311 | "metadata": { 312 | "id": "07d92eae" 313 | }, 314 | "outputs": [], 315 | "source": [] 
316 | } 317 | ], 318 | "metadata": { 319 | "kernelspec": { 320 | "display_name": "Python 3 (ipykernel)", 321 | "language": "python", 322 | "name": "python3" 323 | }, 324 | "language_info": { 325 | "codemirror_mode": { 326 | "name": "ipython", 327 | "version": 3 328 | }, 329 | "file_extension": ".py", 330 | "mimetype": "text/x-python", 331 | "name": "python", 332 | "nbconvert_exporter": "python", 333 | "pygments_lexer": "ipython3", 334 | "version": "3.9.12" 335 | }, 336 | "colab": { 337 | "provenance": [], 338 | "include_colab_link": true 339 | } 340 | }, 341 | "nbformat": 4, 342 | "nbformat_minor": 5 343 | } -------------------------------------------------------------------------------- /langChain_pandas_agent.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "include_colab_link": true 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "source": [ 31 | "!pip install langchain -q" 32 | ], 33 | "metadata": { 34 | "id": "7XEY_qVZPAvv" 35 | }, 36 | "execution_count": 2, 37 | "outputs": [] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 3, 42 | "metadata": { 43 | "id": "MNjwx-WtLUN7" 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "import os\n", 48 | "\n", 49 | "import pandas as pd\n", 50 | "from langchain.agents import create_pandas_dataframe_agent\n", 51 | "from langchain.llms import OpenAI\n", 52 | "#" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "source": [ 58 | "#os.environ[\"OPENAI_API_KEY\"] = # Uncomment and provide your own API key" 59 | ], 60 | "metadata": { 61 | "id": 
"zvFkSnIyKFLe" 62 | }, 63 | "execution_count": 7, 64 | "outputs": [] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "source": [ 69 | "!wget https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/main/package-manifest.csv\n", 70 | "df = pd.read_csv(\"/content/package-manifest.csv\")\n", 71 | "df.head()" 72 | ], 73 | "metadata": { 74 | "id": "LdXj_ofPLtVh", 75 | "outputId": "3d665de1-b8f2-40c5-909a-f85663df0f12", 76 | "colab": { 77 | "base_uri": "https://localhost:8080/", 78 | "height": 397 79 | } 80 | }, 81 | "execution_count": 11, 82 | "outputs": [ 83 | { 84 | "output_type": "stream", 85 | "name": "stdout", 86 | "text": [ 87 | "--2023-04-15 18:25:53-- https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/main/package-manifest.csv\n", 88 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...\n", 89 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", 90 | "HTTP request sent, awaiting response... 200 OK\n", 91 | "Length: 51822 (51K) [text/plain]\n", 92 | "Saving to: ‘package-manifest.csv.1’\n", 93 | "\n", 94 | "package-manifest.cs 100%[===================>] 50.61K --.-KB/s in 0.01s \n", 95 | "\n", 96 | "2023-04-15 18:25:54 (3.89 MB/s) - ‘package-manifest.csv.1’ saved [51822/51822]\n", 97 | "\n" 98 | ] 99 | }, 100 | { 101 | "output_type": "execute_result", 102 | "data": { 103 | "text/plain": [ 104 | " package_name version summary \\\n", 105 | "0 Babel 2.7.0 Internationalization utilities \n", 106 | "1 Baker 1.3 Easy, powerful access to Python functions from... \n", 107 | "2 Boruta 0.3 Python Implementation of Boruta Feature Selection \n", 108 | "3 Bottleneck 1.2.1 Fast NumPy array functions written in C \n", 109 | "4 CVXcanon 0.1.1 A low-level library to perform the matrix buil... 
\n", 110 | "\n", 111 | " license metadata_source \n", 112 | "0 BSD PyPI \n", 113 | "1 Apache 2.0 PyPI \n", 114 | "2 BSD 3 clause PyPI \n", 115 | "3 Simplified BSD PyPI \n", 116 | "4 GPLv3 PyPI " 117 | ], 118 | "text/html": [ 119 | "\n", 120 | "
\n", 121 | "
\n", 122 | "
\n", 123 | "\n", 136 | "\n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | "
package_nameversionsummarylicensemetadata_source
0Babel2.7.0Internationalization utilitiesBSDPyPI
1Baker1.3Easy, powerful access to Python functions from...Apache 2.0PyPI
2Boruta0.3Python Implementation of Boruta Feature SelectionBSD 3 clausePyPI
3Bottleneck1.2.1Fast NumPy array functions written in CSimplified BSDPyPI
4CVXcanon0.1.1A low-level library to perform the matrix buil...GPLv3PyPI
\n", 190 | "
\n", 191 | " \n", 201 | " \n", 202 | " \n", 239 | "\n", 240 | " \n", 264 | "
\n", 265 | "
\n", 266 | " " 267 | ] 268 | }, 269 | "metadata": {}, 270 | "execution_count": 11 271 | } 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "source": [ 277 | "agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)" 278 | ], 279 | "metadata": { 280 | "id": "_CKCD9mLL1lv" 281 | }, 282 | "execution_count": 8, 283 | "outputs": [] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "source": [ 288 | "agent.run(\"what are the best packages for data visualization?\")" 289 | ], 290 | "metadata": { 291 | "colab": { 292 | "base_uri": "https://localhost:8080/", 293 | "height": 642 294 | }, 295 | "id": "4IT-G0IiMaU8", 296 | "outputId": "98183627-8d42-4eb0-dca0-c53edfb23099" 297 | }, 298 | "execution_count": 10, 299 | "outputs": [ 300 | { 301 | "output_type": "stream", 302 | "name": "stdout", 303 | "text": [ 304 | "\n", 305 | "\n", 306 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 307 | "\u001b[32;1m\u001b[1;3mThought: I need to find packages that are related to data visualization\n", 308 | "Action: python_repl_ast\n", 309 | "Action Input: df[df['summary'].str.contains('visualization')]\u001b[0m\n", 310 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 311 | "Thought:\u001b[32;1m\u001b[1;3m I need to filter out the NA/NaN values\n", 312 | "Action: python_repl_ast\n", 313 | "Action Input: df[df['summary'].str.contains('visualization', na=False)]\u001b[0m\n", 314 | "Observation: \u001b[36;1m\u001b[1;3m package_name version summary \\\n", 315 | "62 altair 3.2.0 Altair: A declarative statistical visualizatio... \n", 316 | "145 datashader 0.7.0 Data visualization toolchain based on aggregat... \n", 317 | "224 graphviz 0.8.4 Open Source graph visualization software. \n", 318 | "317 missingno 0.4.2 Missing data visualization module for Python. \n", 319 | "425 pyLDAvis 2.1.2 Interactive topic model visualization. Port of... 
\n", 320 | "519 seaborn 0.9.0 Statistical data visualization \n", 321 | "604 vida 0.3 Python binding for Vida data visualizations \n", 322 | "605 visvis 1.11.2 An object oriented approach to visualization o... \n", 323 | "607 vtk 8.1.2 VTK is an open-source toolkit for 3D computer ... \n", 324 | "\n", 325 | " license metadata_source \n", 326 | "62 BSD 3-clause PyPI \n", 327 | "145 BSD-3-Clause Anaconda \n", 328 | "224 EPL v1.0 Anaconda \n", 329 | "317 NaN PyPI \n", 330 | "425 MIT PyPI \n", 331 | "519 BSD 3-Clause Anaconda \n", 332 | "604 UNKNOWN PyPI \n", 333 | "605 BSD 3-Clause Anaconda \n", 334 | "607 BSD PyPI \u001b[0m\n", 335 | "Thought:\u001b[32;1m\u001b[1;3m I now know the best packages for data visualization\n", 336 | "Final Answer: altair, datashader, graphviz, missingno, pyLDAvis, seaborn, vida, visvis, and vtk.\u001b[0m\n", 337 | "\n", 338 | "\u001b[1m> Finished chain.\u001b[0m\n" 339 | ] 340 | }, 341 | { 342 | "output_type": "execute_result", 343 | "data": { 344 | "text/plain": [ 345 | "'altair, datashader, graphviz, missingno, pyLDAvis, seaborn, vida, visvis, and vtk.'" 346 | ], 347 | "application/vnd.google.colaboratory.intrinsic+json": { 348 | "type": "string" 349 | } 350 | }, 351 | "metadata": {}, 352 | "execution_count": 10 353 | } 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "source": [ 359 | "agent.run(\"what are the best packages for web scraping?\")" 360 | ], 361 | "metadata": { 362 | "id": "3Fg6GOEHKK1G", 363 | "outputId": "30e6e0b7-c73a-4121-cfc7-237a4e5a0d98", 364 | "colab": { 365 | "base_uri": "https://localhost:8080/", 366 | "height": 573 367 | } 368 | }, 369 | "execution_count": 12, 370 | "outputs": [ 371 | { 372 | "output_type": "stream", 373 | "name": "stdout", 374 | "text": [ 375 | "\n", 376 | "\n", 377 | "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", 378 | "\u001b[32;1m\u001b[1;3mThought: I need to find packages related to web scraping\n", 379 | "Action: python_repl_ast\n", 380 | "Action Input: 
df[df['summary'].str.contains('web scraping', case=False)]\u001b[0m\n", 381 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 382 | "Thought:\u001b[32;1m\u001b[1;3m I need to remove NA/NaN values\n", 383 | "Action: python_repl_ast\n", 384 | "Action Input: df[df['summary'].str.contains('web scraping', case=False)].dropna()\u001b[0m\n", 385 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 386 | "Thought:\u001b[32;1m\u001b[1;3m I need to find packages related to web scraping\n", 387 | "Action: python_repl_ast\n", 388 | "Action Input: df[df['summary'].str.contains('web scraping', case=False)].dropna()['package_name']\u001b[0m\n", 389 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 390 | "Thought:\u001b[32;1m\u001b[1;3m I need to find packages related to web scraping\n", 391 | "Action: python_repl_ast\n", 392 | "Action Input: df[df['summary'].str.contains('web scraping', case=False)].dropna()['package_name'].values\u001b[0m\n", 393 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 394 | "Thought:\u001b[32;1m\u001b[1;3m I need to find packages related to web scraping\n", 395 | "Action: python_repl_ast\n", 396 | "Action Input: df[df['summary'].str.contains('web scraping', case=False)].dropna()['package_name'].values.tolist()\u001b[0m\n", 397 | "Observation: \u001b[36;1m\u001b[1;3mCannot mask with non-boolean array containing NA / NaN values\u001b[0m\n", 398 | "Thought:\u001b[32;1m\u001b[1;3m I need to find packages related to web scraping\n", 399 | "Action: python_repl_ast\n", 400 | "Action Input: df[df['summary'].str.contains('web scraping', na=False)].dropna()['package_name'].values.tolist()\u001b[0m\n", 401 | "Observation: \u001b[36;1m\u001b[1;3m[]\u001b[0m\n", 402 | "Thought:\u001b[32;1m\u001b[1;3m There are no 
packages related to web scraping in this dataframe\n", 403 | "Final Answer: There are no packages related to web scraping in this dataframe.\u001b[0m\n", 404 | "\n", 405 | "\u001b[1m> Finished chain.\u001b[0m\n" 406 | ] 407 | }, 408 | { 409 | "output_type": "execute_result", 410 | "data": { 411 | "text/plain": [ 412 | "'There are no packages related to web scraping in this dataframe.'" 413 | ], 414 | "application/vnd.google.colaboratory.intrinsic+json": { 415 | "type": "string" 416 | } 417 | }, 418 | "metadata": {}, 419 | "execution_count": 12 420 | } 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "source": [], 426 | "metadata": { 427 | "id": "KkQg36fWK0S0" 428 | }, 429 | "execution_count": null, 430 | "outputs": [] 431 | } 432 | ] 433 | } -------------------------------------------------------------------------------- /package-manifest.csv: -------------------------------------------------------------------------------- 1 | package_name,version,summary,license,metadata_source 2 | Babel,2.7.0,Internationalization utilities,BSD,PyPI 3 | Baker,1.3,"Easy, powerful access to Python functions from the command line",Apache 2.0,PyPI 4 | Boruta,0.3,Python Implementation of Boruta Feature Selection,BSD 3 clause,PyPI 5 | Bottleneck,1.2.1,Fast NumPy array functions written in C,Simplified BSD,PyPI 6 | CVXcanon,0.1.1,"A low-level library to perform the matrix building step in cvxpy, a convex optimization modeling software.",GPLv3,PyPI 7 | CairoSVG,2.4.1,A Simple SVG Converter based on Cairo,LGPLv3+,PyPI 8 | Cartopy,0.17.0,A cartographic python library with Matplotlib support for visualisation,LGPLv3,PyPI 9 | Click,7.0,Composable command line interface toolkit,BSD,PyPI 10 | ConfigArgParse,0.14.0,A drop-in replacement for argparse that allows options to also be set via config files and/or environment variables.,MIT,PyPI 11 | Cython,0.29.13,The Cython compiler for writing C extensions for the Python language.,Apache,PyPI 12 | Delorean,1.0.0,library for manipulating 
datetimes with ease and clarity,MIT license,PyPI 13 | Deprecated,1.2.6,"Python @deprecated decorator to deprecate old python classes, functions or methods.",MIT,PyPI 14 | Fiona,1.8.6,Fiona reads and writes spatial data files,BSD,PyPI 15 | Flask,1.1.1,A simple framework for building complex web applications.,BSD-3-Clause,PyPI 16 | Flask-Cors,3.0.8,A Flask extension adding a decorator for CORS support,MIT,PyPI 17 | Geohash,1.0,Module to decode/encode Geohashes to/from latitude and longitude.,GNU Affero GPL.,PyPI 18 | ImageHash,4.0,Image Hashing library,BSD 2-clause (see LICENSE file),PyPI 19 | Janome,0.3.9,Japanese morphological analysis engine.,AL2,PyPI 20 | Jinja2,2.10.1,A small but fast and easy to use stand-alone template engine written in pure python.,BSD,PyPI 21 | Keras,2.2.4,Deep Learning for humans,MIT,PyPI 22 | Keras-Applications,1.0.8,Reference implementations of popular deep learning models,MIT,PyPI 23 | Keras-Preprocessing,1.1.0,Easy data preprocessing and data augmentation for deep learning models,MIT,PyPI 24 | Lasagne,0.2.dev1,A lightweight library to build and train neural networks in Theano,MIT,PyPI 25 | Mako,1.1.0,A super-fast templating language that borrows the best ideas from the existing templating languages.,MIT,PyPI 26 | Markdown,3.1.1,Python implementation of Markdown.,BSD License,PyPI 27 | MarkupSafe,1.1.1,Safely add untrusted strings to HTML/XML markup.,BSD-3-Clause,PyPI 28 | PDPbox,0.2.0+13.g73c6966,python partial dependence plot toolbox,MIT,PyPI 29 | Pillow,5.4.1,Python Imaging Library (Fork),,PyPI 30 | PyArabic,0.6.5,Arabic text tools for Python,GPL,PyPI 31 | PyAstronomy,0.13.0,A collection of astronomy related tools for Python.,,PyPI 32 | PyBrain,0.3.3,PyBrain is the Swiss army knife for neural networking.,BSD,PyPI 33 | PyOpenGL,3.1.0,Standard OpenGL bindings for Python,BSD,PyPI 34 | PyPrind,2.11.2,Python Progress Bar and Percent Indicator Utility,BSD 3-Clause,PyPI 35 | PySocks,1.7.0,A Python SOCKS client module. 
See https://github.com/Anorov/PySocks for more information.,BSD,PyPI 36 | PyUpSet,0.1.1.post7,Python implementation of the UpSet visualisation suite by Lex et al.,MIT,PyPI 37 | PyWavelets,1.0.3,"PyWavelets, wavelet transform module",MIT,PyPI 38 | PyYAML,5.1.2,YAML parser and emitter for Python,MIT,PyPI 39 | Pygments,2.4.2,Pygments is a syntax highlighting package written in Python.,BSD License,PyPI 40 | Pympler,0.7,"A development tool to measure, monitor and analyze the memory behavior of Python objects.","Apache License, Version 2.0",PyPI 41 | Pyphen,0.9.5,Pure Python module to hyphenate text,,PyPI 42 | QtAwesome,0.5.7,FontAwesome icons in PyQt and PySide applications,MIT,PyPI 43 | QtPy,1.9.0,"Provides an abstraction layer on top of the various Qt bindings (PyQt5, PyQt4 and PySide) and additional custom QWidgets.",MIT,PyPI 44 | Rtree,0.8.3,R-Tree spatial index for Python GIS,LGPL,PyPI 45 | SQLAlchemy,1.3.7,Database Abstraction Library,MIT,PyPI 46 | SecretStorage,3.1.1,Python bindings to FreeDesktop.org Secret Service API,BSD 3-Clause License,PyPI 47 | Send2Trash,1.5.0,"Send file to trash natively under Mac OS X, Windows and Linux.",BSD License,PyPI 48 | Shapely,1.6.4.post2,"Geometric objects, predicates, and operations",BSD,PyPI 49 | SimpleITK,1.2.2,SimpleITK is a simplified interface to the Insight Toolkit (ITK) for image registration and segmentation,Apache,PyPI 50 | SoundFile,0.10.2,"An audio library based on libsndfile, CFFI and NumPy",BSD 3-Clause License,PyPI 51 | Sphinx,2.1.2,Python documentation generator,BSD,PyPI 52 | TPOT,0.10.2,Tree-based Pipeline Optimization Tool,GNU/LGPLv3,PyPI 53 | Theano,1.0.4+12.g93e8180bf,Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.,BSD,PyPI 54 | Unidecode,1.1.1,ASCII transliterations of Unicode text,GPL,PyPI 55 | Wand,0.5.3,Ctypes-based simple MagickWand API binding for Python,MIT License,PyPI 56 | Werkzeug,0.15.5,The comprehensive WSGI web application library.,BSD-3-Clause,PyPI 57 | 
Wordbatch,1.4.0,"Python library for distributed AI processing pipelines, using swappable scheduler backends",GNU GPL 2.0,PyPI 58 | XlsxWriter,1.1.8,A Python module for creating Excel XLSX files.,BSD,PyPI 59 | absl-py,0.7.1,"Abseil Python Common Libraries, see https://github.com/abseil/abseil-py.",Apache 2.0,PyPI 60 | alabaster,0.7.12,"Configurable, Python 2+3 compatible Sphinx theme.",BSD 3-Clause,Anaconda 61 | albumentations,0.3.1,Fast image augmentation library and easy to use wrapper around other libraries,MIT,PyPI 62 | alembic,1.0.11,A database migration tool for SQLAlchemy.,MIT,Anaconda 63 | allennlp,0.8.4,"An open-source NLP research library, built on PyTorch.",Apache,PyPI 64 | altair,3.2.0,Altair: A declarative statistical visualization library for Python.,BSD 3-clause,PyPI 65 | anaconda-client,1.7.2,anaconda.org command line client library,BSD 3-clause,Anaconda 66 | anaconda-navigator,1.9.7,Anaconda Navigator,"proprietary - Continuum Analytics, Inc.",Anaconda 67 | anaconda-project,0.8.3,"Tool for encapsulating, running, and reproducing data science projects",BSD 3-Clause,Anaconda 68 | annoy,1.15.2,Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.,Apache License 2.0,PyPI 69 | appdirs,1.4.3,A small Python module for determining appropriate platform-specific dirs.,MIT,Anaconda 70 | arrow,0.14.5,Better dates & times for Python.,Apache-2.0,Anaconda 71 | asn1crypto,0.24.0,Python ASN.1 library with a focus on performance and a pythonic API,MIT,Anaconda 72 | astor,0.8.0,Read/rewrite/write Python ASTs,BSD-3-Clause,PyPI 73 | astroid,2.2.5,A abstract syntax tree for Python with inference support.,LGPL 2.1,Anaconda 74 | astropy,3.2.1,Community-developed Python Library for Astronomy,BSD,Anaconda 75 | atomicwrites,1.3.0,Atomic file writes,MIT,Anaconda 76 | attrs,19.1.0,attrs is the Python package that will bring back the joy of writing classes by relieving you from the drudgery of implementing object protocols (aka dunder 
methods).,MIT,Anaconda 77 | audioread,2.1.8,"multi-library, cross-platform audio decoding",MIT,PyPI 78 | backcall,0.1.0,Specifications for callback functions passed in to an API,BSD-3-Clause,Anaconda 79 | backports.functools-lru-cache,1.5,backports.functools_lru_cache,,PyPI 80 | backports.os,0.1.1,Backport of new features in Python's os module,Python Software Foundation,Anaconda 81 | backports.shutil-get-terminal-size,1.0.0,A backport of the get_terminal_size function from Python 3.3's shutil.,MIT,PyPI 82 | backports.tempfile,1.0,Backport of new features in Python's tempfile module,Python Software Foundation License,PyPI 83 | backports.weakref,1.0.post1,Backport of new features in Python's weakref module,Python-2.0,Anaconda 84 | basemap,1.2.0,Plot on map projections using matplotlib,MIT,Anaconda 85 | bayesian-optimization,1.0.2,Bayesian Optimization package,,PyPI 86 | bayespy,0.5.18,Variational Bayesian inference tools for Python,,PyPI 87 | bcolz,1.2.1,A columnar data container that can be compressed,MIT,Anaconda 88 | beautifulsoup4,4.8.0,Python library designed for screen-scraping,MIT,Anaconda 89 | biopython,1.74,Collection of freely available tools for computational molecular biology,Biopython License Agreement,Anaconda 90 | bitarray,1.0.1,efficient arrays of booleans -- C extension,PSF,Anaconda 91 | bkcharts,0.2,High level chart types built on top of Bokeh,BSD 3-Clause,Anaconda 92 | bleach,3.1.0,"Easy, whitelist-based HTML-sanitizing tool",Apache-2.0,Anaconda 93 | blis,0.2.4,"The Blis BLAS-like linear algebra library, as a self-contained C-extension.",BSD,PyPI 94 | blist,1.3.6,Replacement for Python lists with better performance,BSD 3-clause,Anaconda 95 | bokeh,1.3.4,Statistical and novel interactive HTML plots for Python,BSD 3-Clause,Anaconda 96 | boto,2.49.0,Amazon Web Services Library,MIT,Anaconda 97 | boto3,1.9.212,Amazon Web Services SDK for Python,Apache 2.0,Anaconda 98 | botocore,1.12.212,"Low-level, data-driven core of boto 3.",Apache 2.0,Anaconda 99 | 
bq-helper,0.4.1,,, 100 | branca,0.3.1,Generate complex HTML+JS pages with Python,"Copyright (C) 2013, Martin Journois",PyPI 101 | brewer2mpl,1.4.1,Connect colorbrewer2.org color maps to Python and matplotlib,UNKNOWN,PyPI 102 | cachetools,3.1.1,Extensible memoizing collections and decorators,MIT,PyPI 103 | cairocffi,1.0.2,cffi-based cairo bindings for Python,BSD,PyPI 104 | catboost,0.16.5,Catboost Python Package,"Apache License, Version 2.0",PyPI 105 | category-encoders,2.0.0,A collection sklearn transformers to encode categorical variables as numeric,BSD,PyPI 106 | certifi,2019.6.16,Python package for providing Mozilla's CA Bundle.,ISC,Anaconda 107 | cesium,0.9.9,Machine Learning Time-Series Platform,Modified BSD,PyPI 108 | cffi,1.12.3,Foreign Function Interface for Python calling C code.,MIT,Anaconda 109 | chainer,6.2.0,A flexible framework of neural networks,MIT License,Anaconda 110 | chainercv,0.13.1,,,PyPI 111 | chardet,3.0.4,Universal character encoding detector,LGPL2,Anaconda 112 | cleverhans,3.0.1,,MIT,PyPI 113 | click-plugins,1.1.1,An extension module for click to enable registering CLI commands via setuptools entry-points,BSD-3-Clause,Anaconda 114 | cliff,2.15.0,Command Line Interface Formulation Framework,,PyPI 115 | cligj,0.5.0,Click params for commmand line interfaces to GeoJSON.,BSD 3-Clause,Anaconda 116 | cloudpickle,1.2.1,Extended pickling support for Python objects,BSD 3-Clause,Anaconda 117 | clyent,1.2.2,Command line client library for windows and posix,BSD 3-Clause,Anaconda 118 | cmd2,0.9.16,cmd2 - quickly build feature-rich and user-friendly interactive command line applications in Python,MIT,PyPI 119 | cmudict,0.4.3,A versioned python wrapper package for The CMU Pronouncing Dictionary data files.,GPL-3.0,PyPI 120 | colorama,0.4.1,Cross-platform colored terminal text,BSD-3-Clause,Anaconda 121 | colorcet,2.0.1,Collection of perceptually uniform colormaps,CC-BY-4.0,Anaconda 122 | colorlog,4.0.2,Log formatting with colors!,MIT License,PyPI 123 | 
colorlover,0.3.0,Color scales for IPython notebook,,PyPI 124 | conda,4.7.11,"OS-agnostic, system-level binary package and environment manager.",BSD 3-Clause,Anaconda 125 | conda-build,3.18.9,tools for building conda packages,BSD 3-Clause,Anaconda 126 | conda-package-handling,1.3.11,,, 127 | conda-verify,3.4.2,tool for validating conda recipes and conda packages,BSD 3-Clause,Anaconda 128 | confuse,1.0.0,painless YAML configuration,MIT,PyPI 129 | conllu,0.11,CoNLL-U Parser parses a CoNLL-U formatted string into a nested python dictionary,,PyPI 130 | contextlib2,0.5.5,Backports and enhancements for the contextlib module,PSF,Anaconda 131 | convertdate,2.1.3,Converts between Gregorian dates and other calendar systems.,MIT,PyPI 132 | conx,3.7.10,On-Ramp to Deep Learning. Built on Keras,,PyPI 133 | coverage,4.5.4,Code coverage measurement for Python,Apache 2.0,Anaconda 134 | cryptography,2.3.1,Provides cryptographic recipes and primitives to Python developers,"Apache 2.0 or BSD 3-Clause, PSF 2",Anaconda 135 | cssselect2,0.2.1,CSS selectors for Python ElementTree,BSD,PyPI 136 | cufflinks,0.16,Productivity Tools for Plotly + Pandas,MIT,PyPI 137 | cvxpy,1.0.24,A domain-specific language for modeling convex optimization problems in Python.,"Apache License, Version 2.0",PyPI 138 | cycler,0.10.0,Composable style cycles.,BSD 3-Clause,Anaconda 139 | cymem,2.0.2,Manage calls to calloc/free through Cython,MIT,PyPI 140 | cysignals,1.10.2,Interrupt and signal handling for Cython,"GNU Lesser General Public License, version 3 or later",PyPI 141 | cytoolz,0.10.0,Cython implementation of Toolz. 
High performance functional utilities,BSD-3-Clause,Anaconda 142 | dask,2.3.0,Parallel PyData with Task Scheduling,BSD 3-Clause,Anaconda 143 | dask-glm,0.2.0,Generalized Linear Models in Dask,BSD-3-Clause,Anaconda 144 | dask-ml,1.0.0,Distributed and parallel machine learning using dask.,BSD-3-Clause,Anaconda 145 | dask-xgboost,0.1.7,Interactions between Dask and XGBoost,BSD,PyPI 146 | dataclasses,0.6,A backport of the dataclasses module for Python 3.6,Apache,PyPI 147 | datashader,0.7.0,Data visualization toolchain based on aggregating into a grid,BSD-3-Clause,Anaconda 148 | datashape,0.5.2,A data description language.,BSD 2-Clause,Anaconda 149 | deap,1.3.0,Distributed Evolutionary Algorithms in Python,LGPL,PyPI 150 | decorator,4.4.0,Better living through Python with decorators.,BSD 3-Clause,Anaconda 151 | deepdish,0.3.6,Deep Learning experiments from University of Chicago.,BSD,PyPI 152 | defusedxml,0.6.0,XML bomb protection for Python stdlib modules,PSF 2.0,Anaconda 153 | descartes,1.1.0,Use geometric objects as matplotlib paths and patches,BSD,PyPI 154 | dill,0.3.0,Serialize all of python (almost),BSD 3-Clause,Anaconda 155 | dipy,1.0.0,Diffusion MRI utilities in python,BSD license,PyPI 156 | distributed,2.3.0,Distributed computing with Dask,BSD 3-Clause,Anaconda 157 | docopt,0.6.2,"Pythonic argument parser, that will make you smile",MIT,Anaconda 158 | docutils,0.15.2,Docutils -- Python Documentation Utilities,Public Domain Dedictation and BSD 2-Clause and PSF 2.1.1 and GPL 3.0,Anaconda 159 | dora,0.1,Exploratory data analysis toolkit for Python,MIT,PyPI 160 | ecos,2.0.7.post1,Embedded Conic Solver (ECOS),GPL3,Anaconda 161 | editdistance,0.5.3,Fast implementation of the edit distance(Levenshtein distance),,PyPI 162 | edward,1.3.5,"A library for probabilistic modeling, inference, and criticism",Apache License 2.0,PyPI 163 | eli5,0.10.0,Debug machine learning classifiers and explain their predictions,MIT license,PyPI 164 | emoji,0.5.3,Emoji for Python,New BSD,PyPI 165 
| en-core-web-lg,2.1.0,,, 166 | en-core-web-sm,2.1.0,,, 167 | entrypoints,0.3,Discover and load entry points from installed packages.,MIT License,Anaconda 168 | ephem,3.7.7.0,Basic astronomical computations for Python,LGPL,Anaconda 169 | essentia,2.1b5.dev707,"Library for audio and music analysis, description and synthesis",AGPLv3,PyPI 170 | et-xmlfile,1.0.1,An implementation of lxml.xmlfile for the standard library,MIT,PyPI 171 | ethnicolr,0.2.1,Predict Race/Ethnicity Based on Name,MIT,PyPI 172 | fancyimpute,0.5.3,Matrix completion and feature imputation algorithms,http://www.apache.org/licenses/LICENSE-2.0.html,PyPI 173 | fastFM,0.2.11,,BSD,PyPI 174 | fastai,1.0.57,"fastai makes deep learning with PyTorch faster, more accurate, and easier",Apache Software License 2.0,PyPI 175 | fastcache,1.1.0,C implementation of Python 3 lru_cache,MIT,Anaconda 176 | fasteners,0.14.1,A python package that provides useful locks.,ASL 2.0,PyPI 177 | fastprogress,0.1.21,A nested progress with plotting options for fastai,Apache License 2.0,PyPI 178 | fasttext,0.9.1,fasttext Python bindings,MIT,PyPI 179 | fbpca,1.0,Fast computations of PCA/SVD/eigendecompositions via randomized methods,BSD License,PyPI 180 | fbprophet,0.5,Automatic Forecasting Procedure,BSD,PyPI 181 | feather-format,0.4.0,Simple wrapper library to the Apache Arrow-based Feather File Format,"Apache License, Version 2.0",PyPI 182 | featuretools,0.10.0,a framework for automated feature engineering,BSD 3-clause,PyPI 183 | filelock,3.0.12,A platform independent file lock.,Public Domain,Anaconda 184 | fitter,1.1.11,A tool to fit data to many distributions and best one(s),GPL,PyPI 185 | flake8,3.6.0,Your Tool For Style Guide Enforcement,MIT,Anaconda 186 | flashtext,2.7,Extract/Replaces keywords in sentences.,,PyPI 187 | folium,0.10.0,Make beautiful maps with Leaflet.js & Python,MIT,PyPI 188 | fsspec,0.4.0,File-system specification,BSD,PyPI 189 | ftfy,4.4.3,Fixes some problems with Unicode text after the fact,MIT,PyPI 190 | 
funcsigs,1.0.2,"Python function signatures from PEP362 for Python 2.6, 2.7 and 3.2+",ASL,PyPI 191 | funcy,1.13,A fancy and practical functional tools,BSD,PyPI 192 | fury,0.3.0,Free Unified Rendering in Python,BSD (3-clause),PyPI 193 | future,0.17.1,Clean single-source support for Python 3 and 2,MIT,Anaconda 194 | fuzzywuzzy,0.17.0,Fuzzy string matching in python,GPL,PyPI 195 | gast,0.2.2,Python AST that abstracts the underlying Python version,BSD 3-Clause,PyPI 196 | gatspy,0.3,General tools for Astronomical Time Series in Python,BSD 3-clause,PyPI 197 | gdbn,0.1,Pre-trained deep neural networks,MIT (see license.txt),PyPI 198 | gensim,3.8.0,Topic Modelling for Humans,LGPL-3.0,Anaconda 199 | geographiclib,1.49,The geodesic routines from GeographicLib,MIT,PyPI 200 | geojson,2.5.0,Python bindings and utilities for GeoJSON,BSD,PyPI 201 | geopandas,0.5.1,Geographic pandas extensions,BSD,PyPI 202 | geoplot,0.3.0,High-level geospatial plotting for Python.,,PyPI 203 | geopy,1.20.0,Python Geocoding Toolbox,MIT,PyPI 204 | geoviews,1.6.1,GeoViews is a Python library that makes it easy to explore and visualize geographical datasets.,BSD 3-Clause,PyPI 205 | gevent,1.4.0,Coroutine-based network library,MIT,Anaconda 206 | ggplot,0.11.5,ggplot for python,BSD,PyPI 207 | glmnet-py,0.1.0b2,"Python version of glmnet, originally from Stanford University, modified by Han Fang",GPL-2,PyPI 208 | glob2,0.7,"Version of the glob module that supports recursion via **, and can capture patterns.",BSD-2-Clause,Anaconda 209 | gluoncv,0.4.0.post0,MXNet Gluon CV Toolkit,Apache-2.0,PyPI 210 | gluonnlp,0.8.0,MXNet Gluon NLP Toolkit,Apache-2.0,PyPI 211 | gmpy2,2.0.8,"GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x",LGPL 3,Anaconda 212 | gnumpy,0.2,"Almost identical to numpy, but does its computations on your computer's GPU, using Cudamat.",BSD-derived (see LICENSE.txt),PyPI 213 | google-api-core,1.14.2,Google API client core library,Apache 2.0,PyPI 214 | google-api-python-client,1.7.11,Google 
API Client Library for Python,Apache 2.0,PyPI 215 | google-auth,1.6.3,Google Authentication Library,Apache 2.0,PyPI 216 | google-auth-httplib2,0.0.3,Google Authentication Library: httplib2 transport,Apache 2.0,PyPI 217 | google-cloud-automl,0.4.0,Cloud AutoML API client library,Apache 2.0,PyPI 218 | google-cloud-bigquery,1.12.1,Google BigQuery API client library,Apache 2.0,PyPI 219 | google-cloud-core,1.0.3,Google Cloud API client core library,Apache 2.0,PyPI 220 | google-cloud-storage,1.18.0,Google Cloud Storage API client library,Apache 2.0,PyPI 221 | google-pasta,0.1.7,pasta is an AST-based Python refactoring library,Apache 2.0,PyPI 222 | google-resumable-media,0.3.2,Utilities for Google Media Downloads and Resumable Uploads,Apache 2.0,PyPI 223 | googleapis-common-protos,1.6.0,Common protobufs used in Google APIs,Apache-2.0,PyPI 224 | gplearn,0.4.1,"Genetic Programming in Python, with a scikit-learn inspired API",new BSD,PyPI 225 | gpxpy,1.3.5,GPX file parser and GPS track manipulation library,"Apache License, Version 2.0",PyPI 226 | graphviz,0.8.4,Open Source graph visualization software.,EPL v1.0,Anaconda 227 | greenlet,0.4.15,Lightweight in-process concurrent programming,MIT,Anaconda 228 | grpcio,1.23.0,HTTP/2-based RPC framework,Apache License 2.0,PyPI 229 | gym,0.14.0,The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.,,PyPI 230 | h2o,3.26.0.2,hadoop-centric machine learning (core java package),None,Anaconda 231 | h5py,2.9.0,Read and write HDF5 files from Python.,BSD-3-Clause,Anaconda 232 | hallucinate,0.0.1,,, 233 | haversine,2.1.2,Calculate the distance between 2 points on Earth.,['MIT'],PyPI 234 | heamy,0.0.7,A set of useful tools for competitive data science.,MIT,PyPI 235 | heapdict,1.0.0,A heap with decrease-key and increase-key operations,BSD 3-Clause,Anaconda 236 | hep-ml,0.6.0,Machine Learning for High Energy Physics,Apache 2.0,PyPI 237 | hmmlearn,0.2.2,,new BSD,PyPI 238 | holidays,0.9.11,Generate and work with 
holidays in Python,MIT,PyPI 239 | holoviews,1.12.5,Stop plotting your data - annotate your data and let it visualize itself.,BSD 3-Clause,Anaconda 240 | hpsklearn,0.1.0,Hyperparameter Optimization for sklearn,BSD,PyPI 241 | html5lib,1.0.1,HTML parser based on the WHATWG HTML specification,MIT,Anaconda 242 | htmlmin,0.1.12,An HTML Minifier,BSD,PyPI 243 | httplib2,0.13.1,A comprehensive HTTP client library.,MIT,PyPI 244 | humanize,0.5.1,Python humanize utilities.,MIT,Anaconda 245 | hunspell,0.5.5,Module for the Hunspell spellchecker engine,LGPLv3,PyPI 246 | husl,4.0.3,Human-friendly HSL,MIT,PyPI 247 | hyperopt,0.1.2,Distributed Asynchronous Hyperparameter Optimization,BSD,PyPI 248 | hypertools,0.5.1,A python package for visualizing and manipulating high-dimensional data,MIT,PyPI 249 | hypothesis,4.33.1,A library for property based testing,MPL v2,PyPI 250 | ibis-framework,1.2.0,Productivity-centric Python Big Data Framework,Apache 2.0,Anaconda 251 | idna,2.8,Internationalized Domain Names in Applications (IDNA).,BSD Like,Anaconda 252 | ijson,2.4,Iterative JSON parser with a standard Python iterator interface,BSD,PyPI 253 | imageio,2.5.0,a Python library for reading and writing image data,BSD 2-Clause,Anaconda 254 | imagesize,1.1.0,Getting image size from png/jpeg/jpeg2000/gif file,MIT,Anaconda 255 | imbalanced-learn,0.5.0,Toolbox for imbalanced dataset in machine learning.,MIT,PyPI 256 | imgaug,0.2.6,Image augmentation library for deep neural networks,MIT,PyPI 257 | implicit,0.4.0,Collaborative Filtering for Implicit Datasets,MIT,PyPI 258 | importlib-metadata,0.19,Read metadata from Python packages,Apache Software License,PyPI 259 | ipykernel,5.1.2,IPython Kernel for Jupyter,BSD-3-Clause,Anaconda 260 | ipython,7.7.0,IPython: Productive Interactive Computing,BSD-3-Clause,Anaconda 261 | ipython-genutils,0.2.0,Vestigial utilities from IPython,BSD,PyPI 262 | ipywidgets,7.5.1,Jupyter Interactive Widgets,BSD 3-clause,Anaconda 263 | iso3166,1.0,Self-contained ISO 3166-1 
country definitions.,MIT,PyPI 264 | isort,4.3.21,"A Python utility,library to sort Python imports.",,Anaconda 265 | isoweek,1.3.3,Objects representing a week,BSD,PyPI 266 | itsdangerous,1.1.0,Various helpers to pass trusted data to untrusted environments.,BSD 3-Clause,Anaconda 267 | jdcal,1.4.1,Julian dates from proleptic Gregorian and Julian calendars.,BSD 2-Clause,Anaconda 268 | jedi,0.15.1,An autocompletion tool for Python that can be used for text editors.,MIT,Anaconda 269 | jeepney,0.4,Pure Python DBus interface,MIT,Anaconda 270 | jieba,0.39,Chinese Words Segementation Utilities,MIT,PyPI 271 | jmespath,0.9.4,JSON Matching Expressions,MIT,PyPI 272 | joblib,0.13.2,Python function as pipeline jobs.,BSD 3-Clause,Anaconda 273 | json5,0.8.5,A Python implementation of the JSON5 data format,Apache-2.0,Anaconda 274 | jsonnet,0.13.0,Python bindings for Jsonnet - The data templating language,,PyPI 275 | jsonpickle,0.9.6,Python library for serializing any arbitrary object graph into JSON,BSD,PyPI 276 | jsonschema,3.0.2,An implementation of JSON Schema validation for Python,MIT,Anaconda 277 | jupyter,1.0.0,Jupyter metapackage. Install all the Jupyter components in one go.,BSD 3-Clause,Anaconda 278 | jupyter-client,5.3.1,Jupyter protocol implementation and client libraries,BSD,PyPI 279 | jupyter-console,6.0.0,Jupyter terminal console,BSD,PyPI 280 | jupyter-core,4.5.0,Jupyter core package. A base package on which Jupyter projects rely.,BSD,PyPI 281 | jupyter-tensorboard,0.1.10,Start tensorboard in Jupyter!
Jupyter notebook integration for tensorboard.,MIT License,PyPI 282 | jupyterlab,1.0.2,JupyterLab,BSD-3-Clause,Anaconda 283 | jupyterlab-server,1.0.0,JupyterLab Server,BSD,PyPI 284 | keras-rcnn,0.0.2,,MIT,PyPI 285 | keras-resnet,0.2.0,,MIT,PyPI 286 | keras-rl,0.4.2,Deep Reinforcement Learning for Keras,MIT,PyPI 287 | keras-tqdm,2.0.1,Keras models with TQDM progress bars in Jupyter notebooks,MIT,PyPI 288 | keyring,18.0.0,Store and access your passwords safely,Python Software Foundation or MIT,Anaconda 289 | kiwisolver,1.1.0,A fast implementation of the Cassowary constraint solver,BSD 3-Clause,Anaconda 290 | kmapper,1.2.0,Python implementation of Mapper algorithm for Topological Data Analysis.,MIT,PyPI 291 | kmeans-smote,0.1.2,Oversampling for imbalanced learning based on k-means and SMOTE,MIT,PyPI 292 | kmodes,0.10.1,Python implementations of the k-modes and k-prototypes clustering algorithms for clustering categorical data.,MIT,PyPI 293 | knnimpute,0.1.0,k-Nearest Neighbor imputation,http://www.apache.org/licenses/LICENSE-2.0.html,PyPI 294 | ktext,0.40,Pre-processing text in parallel for Keras in python.,MIT,PyPI 295 | langdetect,1.0.7,Language detection library ported from Google's language-detection.,"Copyright 2014-2015 Michal ""Mimino"" Danilak, Apache License, Version 2.0",PyPI 296 | langid,1.1.6,langid.py is a standalone Language Identification (LangID) tool.,BSD,PyPI 297 | lazy-object-proxy,1.4.1,A fast and thorough lazy object proxy,BSD-2-Clause,Anaconda 298 | learntools,0.3.1,,, 299 | leven,1.0.4,Levenshtein edit distance library,UNKNOWN,PyPI 300 | libarchive-c,2.8,Python interface to libarchive,CC0,PyPI 301 | librosa,0.7.0,Python module for audio and music processing,ISC,PyPI 302 | lief,0.9.0,LIEF is a library to instrument executable formats,Apache 2.0,PyPI 303 | lightfm,1.15,LightFM recommendation model,MIT,PyPI 304 | lightgbm,2.2.3,LightGBM Python Package,The MIT License (Microsoft),PyPI 305 | lime,0.1.1.36,Local Interpretable Model-Agnostic 
Explanations for machine learning classifiers,BSD,PyPI 306 | line-profiler,2.1.2,Line-by-line profiler.,BSD,PyPI 307 | llvmlite,0.29.0,A lightweight LLVM python binding for writing JIT compilers,New BSD License,Anaconda 308 | lml,0.0.9,Load me later. A lazy plugin management system.,New BSD,PyPI 309 | locket,0.2.0,File-based locks for Python for Linux and Windows,BSD 2-Clause,Anaconda 310 | lunardate,0.2.0,A Chinese Calendar Library in Pure Python,GPLv3,PyPI 311 | lxml,4.4.1,Pythonic binding for the C libraries libxml2 and libxslt.,"BSD 3-Clause, GPL-2.0, ZPL-2.0, and ElementTree",Anaconda 312 | lz4,2.1.2,LZ4 Bindings for Python,BSD 3-Clause,Anaconda 313 | marisa-trie,0.7.5,Static memory-efficient and fast Trie-like structures for Python.,MIT,PyPI 314 | markovify,0.7.2,"A simple, extensible Markov chain generator. Uses include generating random semi-plausible sentences based on an existing text.",MIT,PyPI 315 | matplotlib,3.0.3,Publication quality figures in Python,PSF-based,Anaconda 316 | matplotlib-venn,0.11.5,Functions for plotting area-proportional two- and three-way Venn diagrams in matplotlib.,MIT,PyPI 317 | mccabe,0.6.1,McCabe complexity checker for Python,MIT,Anaconda 318 | memory-profiler,0.55.0,A module for monitoring memory usage of a python program,BSD,PyPI 319 | missingno,0.4.2,Missing data visualization module for Python.,,PyPI 320 | mistune,0.8.4,The fastest markdown parser in pure Python.,BSD 3-Clause,Anaconda 321 | mizani,0.6.0,Scales for Python,BSD (3-clause),PyPI 322 | mkl-fft,1.0.12,MKL-based FFT transforms for NumPy arrays,Proprietary - Intel,PyPI 323 | mkl-random,1.0.2,NumPy-based implementation of random number generation sampling using Intel (R) Math Kernel Library.,Proprietary - Intel,PyPI 324 | mkl-service,2.0.2,Python hooks for Intel(R) Math Kernel Library runtime control settings.,BSD-3-Clause,Anaconda 325 | ml-metrics,0.1.4,Machine Learning Evaluation Metrics,,PyPI 326 | mlcrate,0.2.0,"A collection of handy python tools and functions, 
mainly for ML and Kaggle.",MIT,PyPI 327 | mlens,0.2.3,Machine Learning Ensemble Library,MIT,PyPI 328 | mlxtend,0.17.0,Machine Learning Library Extensions,BSD 3-Clause,PyPI 329 | mmh3,2.5.1,"Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust hash functions.",License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication,PyPI 330 | mne,0.18.2,MNE python project for MEG and EEG data analysis.,BSD (3-clause),PyPI 331 | mnist,0.2.2,Python utilities to download and parse the MNIST dataset,BSD,PyPI 332 | mock,3.0.5,A library for testing in Python,BSD-2-Clause,Anaconda 333 | monotonic,1.5,An implementation of time.monotonic() for Python 2 & < 3.3,Apache,PyPI 334 | more-itertools,7.2.0,"More routines for operating on iterables, beyond itertools",MIT,Anaconda 335 | mpld3,0.3,D3 Viewer for Matplotlib.,BSD 3-Clause,Anaconda 336 | mplleaflet,0.0.5,Convert Matplotlib plots into Leaflet web maps,BSD 3-clause,PyPI 337 | mpmath,1.1.0,Python library for arbitrary-precision floating-point arithmetic,BSD 3-Clause,Anaconda 338 | msgpack,0.6.1,MessagePack (de)serializer.,Apache 2.0,PyPI 339 | msgpack-numpy,0.4.4.3,Numpy data serialization using msgpack,BSD,PyPI 340 | multipledispatch,0.6.0,Multiple dispatch in Python,BSD 3-Clause,Anaconda 341 | multiprocess,0.70.8,better multiprocessing and multithreading in python,BSD,PyPI 342 | munch,2.3.2,A dot-accessible dictionary (a la JavaScript objects).,MIT,Anaconda 343 | murmurhash,1.0.2,Cython bindings for MurmurHash,MIT,PyPI 344 | mxnet,1.5.0,MXNet is an ultra-scalable deep learning framework. 
This version uses openblas.,Apache 2.0,PyPI 345 | navigator-updater,0.2.1,Anaconda Navigator Updater,"proprietary - Continuum Analytics, Inc.",Anaconda 346 | nbconvert,5.5.0,Converting Jupyter Notebooks,BSD-3-Clause,Anaconda 347 | nbformat,4.4.0,The reference implementation of the Jupyter Notebook format,BSD 3-clause,Anaconda 348 | nervananeon,2.6.0,Intel's deep learning framework,License :: OSI Approved :: Apache Software License,PyPI 349 | networkx,2.3,Python package for creating and manipulating complex networks,BSD-3-Clause,Anaconda 350 | nibabel,2.5.0,Access a multitude of neuroimaging data formats,MIT License,PyPI 351 | nilearn,0.5.2,Statistical learning for neuroimaging in Python,new BSD,PyPI 352 | nltk,3.2.4,Build Python programs to work with human language data,Apache 2.0,Anaconda 353 | nolearn,0.6.1.dev0,scikit-learn compatible neural network library,MIT,PyPI 354 | nose,1.3.7,Nose extends unittest to make testing easier,LGPL-2.1,Anaconda 355 | notebook,5.5.0,Jupyter Notebook,BSD 3-clause,Anaconda 356 | numba,0.45.1,NumPy aware dynamic Python compiler using LLVM,BSD 2-clause,Anaconda 357 | numdifftools,0.9.39,Solves automatic numerical differentiation problems in one or more variables.,new BSD,PyPI 358 | numexpr,2.6.9,Fast numerical expression evaluator for NumPy.,MIT,Anaconda 359 | numpy,1.17.0,"Array processing for numbers, strings, records, and objects.",BSD 3-Clause,Anaconda 360 | numpydoc,0.9.1,Numpy's Sphinx extensions,BSD 3-Clause,Anaconda 361 | nvidia-ml-py3,7.352.0,Python Bindings for the NVIDIA Management Library,BSD,PyPI 362 | odfpy,1.4.0,Python API and tools to manipulate OpenDocument files,,PyPI 363 | olefile,0.46,"parse, read and write Microsoft OLE2 files",BSD 2-Clause,Anaconda 364 | onnx,1.5.0,Open Neural Network Exchange,,PyPI 365 | opencv-python,4.1.0.25,Wrapper package for OpenCV python bindings.,MIT,PyPI 366 | opencv-python-headless,4.1.0.25,Wrapper package for OpenCV python bindings.,MIT,PyPI 367 | openpyxl,2.6.2,A Python library to 
read/write Excel 2010 xlsx/xlsm files,MIT,Anaconda 368 | optuna,0.14.0,A hyperparameter optimization framework,,PyPI 369 | orderedmultidict,1.0.1,Ordered Multivalue Dictionary,Unlicense,PyPI 370 | ortools,7.3.7083,Google OR-Tools python libraries and modules,Apache 2.0,PyPI 371 | osmnx,0.10,"Retrieve, model, analyze, and visualize OpenStreetMap street networks and other spatial data",MIT,PyPI 372 | osqp,0.5.0,OSQP: The Operator Splitting QP Solver,Apache 2.0,PyPI 373 | overrides,1.9,A decorator to automatically detect mismatch when overriding a method.,"Apache License, Version 2.0",PyPI 374 | packaging,19.1,Core utilities for Python packages,Apache 2.0 or BSD 2-Clause,Anaconda 375 | palettable,3.2.0,Color palettes for Python,,PyPI 376 | pandas,0.25.0,"High-performance, easy-to-use data structures and data analysis tools.",BSD 3-clause,Anaconda 377 | pandas-datareader,0.7.4,"Up to date remote data access for pandas, works for multiple versions of pandas",BSD-3,Anaconda 378 | pandas-profiling,2.3.0,Generate profile report for pandas DataFrame,MIT,Anaconda 379 | pandas-summary,0.0.7,An extension to pandas describe function.,MIT,PyPI 380 | pandasql,0.7.3,Sqldf for pandas,BSD,Anaconda 381 | pandoc,1.0.2,Universal markup converter (repackaged binaries).,GPL-2.0,Anaconda 382 | pandocfilters,1.4.2,A python module for writing pandoc filters,BSD-3-Clause,Anaconda 383 | param,1.9.1,Param: Make your Python code clearer and more reliable by declaring Parameters,BSD-3-Clause,Anaconda 384 | paramnb,2.0.4,Generate ipywidgets from Parameterized objects in the notebook,BSD 3-Clause License,PyPI 385 | parsimonious,0.8.1,(Soon to be) the fastest pure-Python PEG parser I could muster,MIT,PyPI 386 | parso,0.5.1,A Python Parser,MIT,Anaconda 387 | partd,1.0.0,Data structure for on-disk shuffle operations,BSD 3-Clause,Anaconda 388 | path.py,12.0.1,A module wrapper for os.path,MIT,Anaconda 389 | pathlib2,2.3.4,Fork of pathlib aiming to support the full stdlib Python API,MIT,Anaconda 390 | 
pathos,0.2.4,parallel graph management and execution in heterogeneous computing,3-clause BSD,PyPI 391 | patsy,0.5.1,Describing statistical models in Python using symbolic formulas,"BSD-2-Clause, PSF2",Anaconda 392 | pbr,5.4.2,Python Build Reasonableness,,PyPI 393 | pdf2image,1.6.0,A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.,MIT,PyPI 394 | pep8,1.7.1,Python style guide checker,MIT,Anaconda 395 | pexpect,4.7.0,Pexpect makes Python a better tool for controlling other applications.,ISC,Anaconda 396 | phik,0.9.8,Phi_K correlation analyzer library,,PyPI 397 | pickleshare,0.7.5,Tiny 'shelve'-like database with concurrency support,MIT,Anaconda 398 | pip,19.2.2,PyPA recommended tool for installing Python packages,MIT,Anaconda 399 | pkginfo,1.5.0.1,"Query metadatdata from sdists,bdists",,Anaconda 400 | plac,0.9.6,The smartest command line arguments parser in the world,BSD 2-Clause,Anaconda 401 | plotly,4.1.0,"An interactive, browser-based graphing library for Python",MIT,Anaconda 402 | plotly-express,0.4.1,Plotly Express - a high level wrapper for Plotly.py,MIT,PyPI 403 | plotnine,0.4.0,A grammar of graphics for python,GPL-2,PyPI 404 | pluggy,0.12.0,Plugin registration and hook calling for Python,MIT,Anaconda 405 | ply,3.11,Python Lex-Yacc,BSD 3-clause,Anaconda 406 | polyglot,16.7.4,Polyglot is a natural language pipeline that supports massive multilingual applications.,GPLv3,PyPI 407 | posix-ipc,1.0.4,"POSIX IPC primitives (semaphores, shared memory and message queues) for Python",http://creativecommons.org/licenses/BSD/,PyPI 408 | pox,0.2.6,utilities for filesystem exploration and automated builds,3-clause BSD,PyPI 409 | ppca,0.0.4,Probabilistic PCA,,PyPI 410 | ppft,1.6.6.1,distributed and parallel python,BSD-like,PyPI 411 | preprocessing,0.1.13,pre-processing package for text strings,MIT,PyPI 412 | preshed,2.0.1,Cython hash table that trusts the keys are pre-hashed,MIT,PyPI 413 | prettytable,0.7.2,A simple Python
library for easily displaying tabular data in a visually appealing ASCII table format.,BSD,PyPI 414 | progressbar2,3.42.0,A Python Progressbar library to provide visual (yet text based) progress to long running operations.,BSD,PyPI 415 | prometheus-client,0.7.1,Python client for the Prometheus monitoring system.,Apache Software License 2.0,PyPI 416 | prompt-toolkit,2.0.9,Library for building powerful interactive command lines in Python,BSD-3-Clause,PyPI 417 | pronouncing,0.2.0,A simple interface for the CMU pronouncing dictionary,BSD,PyPI 418 | protobuf,3.7.1,Protocol Buffers - Google's data interchange format.,BSD 3-Clause,Anaconda 419 | psutil,5.6.3,A cross-platform process and system utilities module for Python,BSD 3-Clause,Anaconda 420 | ptyprocess,0.6.0,Run a subprocess in a pseudo terminal,ISC,Anaconda 421 | pudb,2019.1,"A full-screen, console-based Python debugger",,PyPI 422 | py,1.8.0,"library with cross-python path, ini-parsing, io, code, log facilities",MIT,Anaconda 423 | py-cpuinfo,5.0.0,Get CPU info with pure Python 2 & 3,MIT,PyPI 424 | py-lz4framed,0.13.0,LZ4Frame library for Python (via C bindings),Apache License 2.0,PyPI 425 | py-stringmatching,0.4.1,Python library for string matching.,BSD,PyPI 426 | py-stringsimjoin,0.3.1,Python library for performing string similarity joins.,BSD,PyPI 427 | pyLDAvis,2.1.2,Interactive topic model visualization. 
Port of the R package.,MIT,PyPI 428 | pyOpenSSL,19.0.0,Python wrapper module around the OpenSSL library,"Apache License, Version 2.0",PyPI 429 | pyPdf,1.13,PDF toolkit,UNKNOWN,PyPI 430 | pyahocorasick,1.4.0,pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search., BSD-3-Clause and Public-Domain,PyPI 431 | pyarrow,0.10.0,Python library for Apache Arrow,"Apache License, Version 2.0",PyPI 432 | pyasn1,0.4.6,ASN.1 types and codecs,BSD 2-Clause,Anaconda 433 | pyasn1-modules,0.2.6,A collection of ASN.1-based protocols modules.,BSD 2-Clause,Anaconda 434 | pybind11,2.3.0,Seamless operability between C++11 and Python,BSD,PyPI 435 | pycairo,1.18.1,Python interface for cairo,,PyPI 436 | pycodestyle,2.4.0,Python style guide checker,MIT,Anaconda 437 | pycosat,0.6.3,Bindings to picosat (a SAT solver),MIT,Anaconda 438 | pycountry,19.8.18,"ISO country, subdivision, language, currency and script definitions and their translations",LGPL 2.1,PyPI 439 | pycparser,2.19,Complete C99 parser in pure Python,BSD 3-clause,Anaconda 440 | pycrypto,2.6.1,Cryptographic modules for Python.,Public Domain,Anaconda 441 | pyct,0.4.6,"python package common tasks for users (e.g. copy examples, fetch data, ...)",BSD 3-Clause License,PyPI 442 | pycurl,7.43.0.2,A Python Interface To The cURL library,"LGPL, MIT",Anaconda 443 | pydash,4.7.5,"The kitchen sink of Python utility libraries for doing ""stuff"" in a functional way. 
Based on the Lo-Dash Javascript library.",MIT License,PyPI
444 | pydicom,1.3.0,Pure python package for DICOM medical file reading and writing,MIT,PyPI
445 | pydot,1.4.1,Python interface to Graphviz's Dot,MIT,Anaconda
446 | pyeconometrics,1.0.2,,,
447 | pyemd,0.5.1,A Python wrapper for the Earth Mover's Distance.,MIT,Anaconda
448 | pyexcel-io,0.5.20,"A python library to read and write structured data in csv, zipped csvformat and to/from databases",New BSD,PyPI
449 | pyexcel-ods,0.5.6,"A wrapper library to read, manipulate and write data in ods format",New BSD,PyPI
450 | pyfasttext,0.4.6,Yet another Python binding for fastText,GPLv3,PyPI
451 | pyflakes,2.0.0,Pyflakes analyzes programs and detects various errors.,MIT,Anaconda
452 | pyflux,0.4.15,PyFlux: A time-series analysis library for Python,BSD,PyPI
453 | pyglet,1.3.2,Cross-platform windowing and multimedia library,BSD,PyPI
454 | pykalman,0.9.5,"An implementation of the Kalman Filter, Kalman Smoother, and EM algorithm in Python",BSD,PyPI
455 | pykoko,0.1.8,KOKO is an easy-to-use entity extraction tool,Apache Software License 2.0,PyPI
456 | pylint,2.3.1,python code static checker,GPL,Anaconda
457 | pymagnitude,0.1.120,"A fast, efficient universal vector embedding utility package.",MIT,PyPI
458 | pymc3,3.7,Probabilistic Programming in Python.,Apache-2.0,Anaconda
459 | pymongo,3.9.0,Python driver for MongoDB http://www.mongodb.org,Apache 2.0,Anaconda
460 | pyocr,0.7.2,"A Python wrapper for OCR engines (Tesseract, Cuneiform, etc)",GPLv3+,PyPI
461 | pyodbc,4.0.27,DB API Module for ODBC,MIT,Anaconda
462 | pypandoc,1.4,Thin wrapper for pandoc.,MIT,PyPI
463 | pyparsing,2.4.2,Create and execute simple grammars,MIT,Anaconda
464 | pyperclip,1.7.0,A cross-platform clipboard module for Python. (Only handles plain text for now.),BSD,PyPI
465 | pyproj,2.3.0,Python interface to PROJ.4 library,OSI,Anaconda
466 | pyrsistent,0.14.11,Persistent/Functional/Immutable data structures,MIT,Anaconda
467 | pysal,2.1.0,Library of spatial analysis functions,BSD-3-Clause,Anaconda
468 | pyshp,2.1.0,Pure Python read/write support for ESRI Shapefile format,MIT,Anaconda
469 | pystan,2.19.0.0,"Python interface to Stan, a package for Bayesian inference",GPLv3,Anaconda
470 | pytagcloud,0.3.5,Create beautiful tag clouds as images or HTML,BSD,PyPI
471 | pytesseract,0.2.9,Python-tesseract is a python wrapper for Google's Tesseract-OCR,GPLv3,PyPI
472 | pytest,5.0.1,Simple and powerful testing with Python.,MIT,Anaconda
473 | pytest-arraydiff,0.3,pytest plugin to help with comparing array output from tests,BSD,Anaconda
474 | pytest-astropy,0.5.0,Meta-package containing dependencies for testing Astropy,BSD,Anaconda
475 | pytest-cov,2.7.1,Pytest plugin for measuring coverage,MIT,Anaconda
476 | pytest-doctestplus,0.3.0,Pytest plugin with advanced doctest features,BSD,Anaconda
477 | pytest-mock,1.10.4,Thin-wrapper around the mock package for easier use with py.test,MIT,Anaconda
478 | pytest-openfiles,0.3.2,Pytest plugin for detecting inadvertent open file handles,BSD,Anaconda
479 | pytest-pylint,0.14.1,pytest plugin to check source code with pylint,MIT,PyPI
480 | pytest-remotedata,0.3.2,Pytest plugin for controlling remote data access,BSD,Anaconda
481 | pytext-nlp,0.1.2,pytorch modeling framework and model zoo for text models,BSD,PyPI
482 | python-Levenshtein,0.12.0,Python extension for computing string edit distances and similarities.,GPL,PyPI
483 | python-dateutil,2.8.0,Extensions to the standard Python datetime module.,BSD 3 Clause,Anaconda
484 | python-editor,1.0.4,"Programmatically open an editor, capture the result.",Apache 2.0,Anaconda
485 | python-igraph,0.7.1.post7,High performance graph data structures and algorithms,GNU General Public License (GPL),PyPI
486 | python-louvain,0.13,Louvain algorithm for community detection,BSD,PyPI
487 | python-utils,2.3.0,Python Utils is a collection of small Python functions and classes which make common patterns shorter and easier.,BSD-3-Clause,Anaconda
488 | pytorch-ignite,0.2.0,A lightweight library to help with training neural networks in PyTorch.,BSD,PyPI
489 | pytorch-pretrained-bert,0.6.2,PyTorch version of Google AI BERT model with script to load Google pre-trained models,Apache,PyPI
490 | pytz,2019.2,"World timezone definitions, modern and historical.",MIT,Anaconda
491 | pyviz-comms,0.7.2,Bidirectional communication for the PyViz ecosystem.,BSD,PyPI
492 | pyzmq,18.1.0,Python bindings for zeromq,BSD 3-clause,Anaconda
493 | qtconsole,4.5.3,Jupyter Qt console,BSD 3-Clause,Anaconda
494 | raccoon,2.1.5,Python DataFrame with fast insert and appends,MIT,PyPI
495 | randomgen,1.16.6,Random generator supporting multiple PRNGs,NCSA,PyPI
496 | ray,0.7.3,A system for parallel and distributed Python that unifies the ML ecosystem.,Apache 2.0,PyPI
497 | redis,3.3.8,Advanced key-value cache and store,3-clause BSD,Anaconda
498 | regex,2019.8.19,"Alternative regular expression module, to replace re.",Python-2.0,Anaconda
499 | requests,2.22.0,Python HTTP for Humans,Apache 2.0,Anaconda
500 | resampy,0.2.2,Efficient signal resampling,ISC,PyPI
501 | retrying,1.3.3,Retrying,Apache 2.0,PyPI
502 | revrand,0.6.5,A library of scalable Bayesian generalized linear models with fancy features,Apache Software License 2.0,PyPI
503 | rf-perm-feat-import,0.1,Random Forest Permutate Feature Importance,MIT,PyPI
504 | rgf-python,3.6.0,Scikit-learn Wrapper for Regularized Greedy Forest,MIT License,PyPI
505 | rope,0.14.0,A python refactoring library,GPL 2.0,Anaconda
506 | rsa,4.0,Pure-Python RSA implementation,ASL 2,PyPI
507 | ruamel-yaml,0.15.46,"ruamel.yaml is a YAML parser/emitter that supports roundtrip preservation of comments, seq/map flow style, and map key order",MIT license,PyPI
508 | s2sphere,0.2.5,Python implementation of the S2 Geometry Library,MIT,PyPI
509 | s3fs,0.3.3,Convenient Filesystem interface over S3,BSD 3-Clause,Anaconda
510 | s3transfer,0.2.1,An Amazon S3 Transfer Manager,Apache License 2.0,PyPI
511 | sacred,0.7.5,Facilitates automated and reproducible experimental research,,PyPI
512 | scattertext,0.0.2.52,An NLP package to visualize interesting terms in text.,Apache 2.0,PyPI
513 | scikit-image,0.15.0,Image processing routines for SciPy.,BSD 3-Clause,Anaconda
514 | scikit-learn,0.21.3,A set of python modules for machine learning and data mining,BSD 3-Clause,Anaconda
515 | scikit-multilearn,0.2.0,Scikit-multilearn is a BSD-licensed library for multi-label classification that is built on top of the well-known scikit-learn ecosystem.,BSD,PyPI
516 | scikit-optimize,0.5.2,Sequential model-based optimization toolbox.,BSD,PyPI
517 | scikit-plot,0.3.7,An intuitive library to add plotting functionality to scikit-learn objects.,MIT License,PyPI
518 | scikit-surprise,1.0.6,An easy-to-use library for recommender systems.,GPLv3+,PyPI
519 | scipy,1.2.1,Scientific Library for Python,BSD 3-Clause,Anaconda
520 | scs,2.1.1.post2,scs: splitting conic solver,MIT,PyPI
521 | seaborn,0.9.0,Statistical data visualization,BSD 3-Clause,Anaconda
522 | sentencepiece,0.1.83,SentencePiece python wrapper,Apache,PyPI
523 | setuptools,41.0.1,"Download, build, install, upgrade, and uninstall Python packages",MIT,Anaconda
524 | setuptools-git,1.2,Setuptools revision control system plugin for Git,BSD,PyPI
525 | shap,0.29.3,A unified approach to explain the output of any machine learning model.,MIT,PyPI
526 | simplegeneric,0.8.1,"Simple generic functions (similar to Python's own len(), pickle.dump(), etc.)",Zope Public,Anaconda
527 | singledispatch,3.4.0.3,Transforms a function into a single-dispatch generic function,MIT,Anaconda
528 | six,1.12.0,Python 2 and 3 compatibility utilities,MIT,Anaconda
529 | sklearn,0.0,A set of python modules for machine learning and data mining,UNKNOWN,PyPI
530 | sklearn-contrib-lightning,0.5.0,"Large-scale sparse linear classification, regression and ranking in Python",new BSD,PyPI
531 | sklearn-contrib-py-earth,0.1.0+1.gdde5f89,A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines.,LICENSE.txt,PyPI
532 | sklearn-pandas,1.8.0,Pandas integration with sklearn,,PyPI
533 | smart-open,1.8.4,"Utils for streaming large files (S3, HDFS, gzip, bz2...)",MIT,PyPI
534 | smhasher,0.150.1,Python extension for smhasher hash functions,UNKNOWN,PyPI
535 | snowballstemmer,1.9.0,Snowball stemming library collection for Python,BSD-2-Clause,Anaconda
536 | sortedcollections,1.1.2,Python Sorted Collections,Apache-2.0,Anaconda
537 | sortedcontainers,2.1.0,"Python Sorted Container Types: SortedList, SortedDict, and SortedSet",Apache 2.0,Anaconda
538 | soupsieve,1.9.2,A modern CSS selector implementation for BeautifulSoup,MIT,Anaconda
539 | spacy,2.1.8,Industrial-strength Natural Language Processing,MIT,Anaconda
540 | spectral,0.19,Spectral Python (SPy) is a Python module for hyperspectral image processing.,GPL,PyPI
541 | speedml,0.9.3,Speedml Machine Learning Speed Start,MIT,PyPI
542 | sphinx-rtd-theme,0.2.4,Read the Docs theme for Sphinx,MIT,PyPI
543 | sphinxcontrib-applehelp,1.0.1,sphinxcontrib-applehelp is a sphinx extension which outputs Apple help books,BSD-2-Clause,Anaconda
544 | sphinxcontrib-devhelp,1.0.1,sphinxcontrib-devhelp is a sphinx extension which outputs Devhelp document,BSD-2-Clause,Anaconda
545 | sphinxcontrib-htmlhelp,1.0.2,sphinxcontrib-htmlhelp is a sphinx extension which ...,BSD-2-Clause,Anaconda
546 | sphinxcontrib-jsmath,1.0.1,A sphinx extension which renders display math in HTML via JavaScript,BSD-2-Clause,Anaconda
547 | sphinxcontrib-qthelp,1.0.2,sphinxcontrib-qthelp is a sphinx extension which outputs QtHelp document,BSD-2-Clause,Anaconda
548 | sphinxcontrib-serializinghtml,1.1.3,"sphinxcontrib-serializinghtml is a sphinx extension which outputs ""serialized"" HTML files (json and pickle).",BSD-2-Clause,Anaconda
549 | sphinxcontrib-websupport,1.1.2,Sphinx API for Web Apps,BSD-2-Clause,Anaconda
550 | spyder,3.3.6,The Scientific Python Development Environment,MIT,Anaconda
551 | spyder-kernels,0.5.1,Jupyter kernels for the Spyder console,MIT,Anaconda
552 | sqlparse,0.3.0,A non-validating SQL parser module for Python.,BSD 3-Clause,Anaconda
553 | squarify,0.4.3,Pure Python implementation of the squarify treemap layout algorithm,Apache v2,PyPI
554 | srsly,0.1.0,Modern high-performance serialization utilities for Python,MIT,PyPI
555 | statsmodels,0.10.1,Statistical computations and models for use with SciPy,BSD 3-Clause,Anaconda
556 | stemming,1.0.1,,,PyPI
557 | stevedore,1.30.1,Manage dynamic plugins for Python applications,,PyPI
558 | stop-words,2018.7.23,Get list of common stop words in various languages in Python,"Copyright (c) 2014, Alireza Savand, Contributors",PyPI
559 | stopit,1.1.2,"Timeout control decorator and context managers, raise any exception in another thread",GPLv3,PyPI
560 | svgwrite,1.3.1,A Python library to create SVG drawings.,MIT License,PyPI
561 | sympy,1.4,Python library for symbolic mathematics,BSD-3-Clause,Anaconda
562 | tables,3.5.1,Hierarchical datasets for Python,BSD 2-Clause,PyPI
563 | tabulate,0.8.3,Pretty-print tabular data,MIT,PyPI
564 | tblib,1.4.0,Traceback serialization library.,BSD 2-Clause,Anaconda
565 | tensorboard,1.14.0,TensorBoard lets you watch Tensors Flow,Apache 2.0,PyPI
566 | tensorboardX,1.8,TensorBoardX lets you watch Tensors Flow without Tensorflow,MIT license,PyPI
567 | tensorflow,1.14.0,TensorFlow is a machine learning library.,Apache 2.0,Anaconda
568 | tensorflow-estimator,1.14.0,TensorFlow Estimator.,Apache 2.0,PyPI
569 | tensorflow-hub,0.5.0,"TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models.",Apache 2.0,PyPI
570 | tensorflow-probability,0.7.0,Probabilistic modeling and statistical inference in TensorFlow,Apache 2.0,PyPI
571 | tensorforce,0.4.3,Reinforcement learning for TensorFlow,Apache 2.0,PyPI
572 | tensorpack,0.9.7.1,A Neural Network Training Interface on TensorFlow,Apache,PyPI
573 | termcolor,1.1.0,ANSII Color formatting for output in terminal.,MIT,Anaconda
574 | terminado,0.8.2,Terminals served by tornado websockets,BSD 2-Clause,Anaconda
575 | terminalplot,0.2.6,Plot points in terminal,GPL,PyPI
576 | testpath,0.3.1,Testpath is a collection of utilities for Python code working with files and commands.,BSD-3-Clause,Anaconda
577 | textacy,0.6.2,"NLP, before and after spaCy",Apache,PyPI
578 | textblob,0.15.3,"Simple, Pythonic text processing. Sentiment analysis, part-of-speech tagging, noun phrase parsing, and more.",MIT,PyPI
579 | tflearn,0.3.2,Deep Learning Library featuring a higher-level API for TensorFlow,MIT,PyPI
580 | thinc,7.0.8,Practical Machine Learning for NLP,MIT,PyPI
581 | tifffile,2019.7.26,Read and write TIFF(r) files,BSD,PyPI
582 | tinycss2,1.0.2,Low-level CSS parser for Python,BSD,PyPI
583 | toolz,0.10.0,A functional standard library for Python,BSD 3-Clause,Anaconda
584 | torch,1.1.0,Tensors and Dynamic neural networks in Python with strong GPU acceleration,BSD-3,PyPI
585 | torchaudio,0.4.0a0+42a705d,An audio package for PyTorch,,PyPI
586 | torchtext,0.4.0,Text utilities and datasets for PyTorch,BSD,PyPI
587 | torchvision,0.3.0,Image and video datasets and models for torch deep learning,BSD 3-Clause,Anaconda
588 | tornado,5.0.2,"A Python web framework and asynchronous networking library, originally developed at FriendFeed.",Apache 2.0,Anaconda
589 | tqdm,4.32.1,"A Fast, Extensible Progress Meter","MPLv2.0, MIT",Anaconda
590 | trackml,0.1.12,"An opinionated, minimal cookiecutter template for Python packages",,PyPI
591 | traitlets,4.3.2,Configuration system for Python applications,BSD 3-clause,Anaconda
592 | trueskill,0.4.5,The video game rating system,BSD,PyPI
593 | tsfresh,0.12.0,tsfresh extracts relevant characteristics from time series,MIT,PyPI
594 | typed-ast,1.3.4,a fork of Python 2 and 3 ast modules with type comment support,Apache 2.0,Anaconda
595 | typing,3.6.6,Type Hints for Python - backport for Python<3.5,PSF,Anaconda
596 | typing-extensions,3.6.6,Backported and Experimental Type Hints for Python 3.5+,PSF,PyPI
597 | tzlocal,2.0.0,tzinfo object for the local timezone,MIT,PyPI
598 | umap-learn,0.3.10,Uniform Manifold Approximation and Projection,BSD,PyPI
599 | unicodecsv,0.14.1,Drop-in replacement for csv module which supports unicode strings,BSD,Anaconda
600 | update-checker,0.16,A python module that will check for package updates.,Simplified BSD License,PyPI
601 | uritemplate,3.0.0,URI templates,"BSD 3-Clause License or Apache License, Version 2.0",PyPI
602 | urllib3,1.24.2,"HTTP library with thread-safe connection pooling, file post, and more.",MIT,Anaconda
603 | urwid,2.0.1,A full-featured console (xterm et al.) user interface library,LGPL,PyPI
604 | vecstack,0.4.0,Python package for stacking (machine learning technique),MIT,PyPI
605 | vega3,0.13.0,Deprecated: please use vega,BSD 3-clause,PyPI
606 | vida,0.3,Python binding for Vida data visualizations,UNKNOWN,PyPI
607 | visvis,1.11.2,An object oriented approach to visualization of 1D to 4D data,BSD 3-Clause,Anaconda
608 | vowpalwabbit,8.5.0,Vowpal Wabbit Python package,BSD 3-Clause License,PyPI
609 | vtk,8.1.2,"VTK is an open-source toolkit for 3D computer graphics, image processing, and visualization",BSD,PyPI
610 | wasabi,0.2.2,A lightweight console printing and formatting toolkit,MIT,PyPI
611 | wavio,0.0.4,A Python module for reading and writing WAV files using numpy arrays.,BSD,PyPI
612 | wcwidth,0.1.7,Measures number of Terminal column cells of wide-character codes.,MIT,Anaconda
613 | webencodings,0.5.1,Character encoding aliases for legacy web content,BSD 3-Clause,Anaconda
614 | websocket-client,0.56.0,WebSocket client for Python. hybi13 is supported.,BSD,PyPI
615 | wfdb,2.2.1,The WFDB Python Toolbox,MIT,PyPI
616 | wheel,0.33.4,A built-package format for Python.,MIT,Anaconda
617 | widgetsnbextension,3.5.1,Interactive Widgets for Jupyter,BSD 3-clause,Anaconda
618 | word2number,1.1,Convert number words eg. three hundred and forty two to numbers (342).,MIT,PyPI
619 | wordcloud,1.5.0,A little word cloud generator,MIT,PyPI
620 | wordsegment,1.3.1,English word segmentation.,Apache 2.0,PyPI
621 | wrapt,1.11.2,"Module for decorators, wrappers and monkey patching",BSD 2-Clause,Anaconda
622 | wurlitzer,1.0.3,Capture C-level stdout/stderr in Python,MIT,Anaconda
623 | xarray,0.12.3,N-D labeled arrays and datasets in Python.,Apache-2.0,Anaconda
624 | xgboost,0.90,XGBoost Python Package,Apache-2.0,PyPI
625 | xlrd,1.2.0,Library for developers to extract data from Microsoft Excel (tm) spreadsheet files,BSD 3-Clause and BSD with advertising,Anaconda
626 | xlwt,1.3.0,Writing data and formatting information to Excel files,BSD,Anaconda
627 | xvfbwrapper,0.2.9,run headless display inside X virtual framebuffer (Xvfb),MIT,PyPI
628 | xxhash,1.3.0,Python binding for xxHash,BSD,PyPI
629 | yellowbrick,0.9.1,A suite of visual analysis and diagnostic tools for machine learning.,Apache 2,PyPI
630 | zict,1.0.0,Composable Dictionary Classes,BSD 3-Clause,Anaconda
631 | zipp,0.5.2,A pathlib-compatible Zipfile object wrapper,MIT,Anaconda
632 | 
--------------------------------------------------------------------------------
/s22_wiki.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/ef37bf48f653463a1a2580b20610061391512af6/s22_wiki.pdf
--------------------------------------------------------------------------------
/s23_wiki.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/ef37bf48f653463a1a2580b20610061391512af6/s23_wiki.pdf
--------------------------------------------------------------------------------
/the_constitution_of_india.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/navneetkrc/langchain_colab_experiments/ef37bf48f653463a1a2580b20610061391512af6/the_constitution_of_india.pdf
--------------------------------------------------------------------------------
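Note on `package-manifest.csv`: several rows above carry commas inside double-quoted summary or license fields (e.g. `pykalman`, `tqdm`), so the file needs a real CSV parser rather than a naive `split(",")`. A minimal sketch of loading rows in this shape — the column names (name, version, summary, license, source) are an assumption inferred from the rows, since no header appears in this excerpt:

```python
import csv
import io

# Inline sample in the same shape as package-manifest.csv rows
# (assumed columns: name, version, summary, license, source).
sample = (
    'pydicom,1.3.0,Pure python package for DICOM medical file reading and writing,MIT,PyPI\n'
    'pykalman,0.9.5,"An implementation of the Kalman Filter, Kalman Smoother, '
    'and EM algorithm in Python",BSD,PyPI\n'
    'scipy,1.2.1,Scientific Library for Python,BSD 3-Clause,Anaconda\n'
)

rows = list(csv.reader(io.StringIO(sample)))

# A quoted field keeps its embedded commas as one column,
# so every row still parses to exactly five fields.
assert all(len(row) == 5 for row in rows)
assert rows[1][2].startswith("An implementation of the Kalman Filter,")

# Tally packages per distribution source (last column).
by_source = {}
for name, version, summary, license_, source in rows:
    by_source[source] = by_source.get(source, 0) + 1

print(by_source)  # {'PyPI': 2, 'Anaconda': 1}
```

The same `csv.reader` call works on the full manifest via `open("package-manifest.csv", newline="")`.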