├── 01_Introduction
│   ├── 01_1.png
│   ├── 01_2.png
│   ├── README.md
│   └── 01_Introduction.ipynb
├── 02_Core_Concepts
│   ├── 02_1.png
│   ├── 02_2.png
│   └── README.md
├── 04_Data_Connectors
│   ├── llama_hub.png
│   ├── README.md
│   └── 04_Data_Connectors.ipynb
├── README.md
├── LICENSE
├── 03_Customization
│   ├── README.md
│   └── 03_Customization.ipynb
├── 05_Documents_Nodes
│   ├── README.md
│   └── 05_Documents_Nodes.ipynb
└── llamaindex_llmlingua_prompt_compression.ipynb
/01_Introduction/01_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sugarforever/LlamaIndex-Tutorials/main/01_Introduction/01_1.png
--------------------------------------------------------------------------------
/01_Introduction/01_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sugarforever/LlamaIndex-Tutorials/main/01_Introduction/01_2.png
--------------------------------------------------------------------------------
/02_Core_Concepts/02_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sugarforever/LlamaIndex-Tutorials/main/02_Core_Concepts/02_1.png
--------------------------------------------------------------------------------
/02_Core_Concepts/02_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sugarforever/LlamaIndex-Tutorials/main/02_Core_Concepts/02_2.png
--------------------------------------------------------------------------------
/04_Data_Connectors/llama_hub.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sugarforever/LlamaIndex-Tutorials/main/04_Data_Connectors/llama_hub.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LlamaIndex Tutorials
2 |
3 | I recently started learning the `LlamaIndex` framework and, along the way, organized my notes into this introductory tutorial series.
4 |
5 | `LlamaIndex` project - [https://github.com/jerryjliu/llama_index](https://github.com/jerryjliu/llama_index)
6 |
7 | ## Table of Contents
8 |
9 | 1. What is LlamaIndex? [Link](./01_Introduction)
10 | 2. Core Concepts [Link](./02_Core_Concepts)
11 | 3. Customization [Link](./03_Customization)
12 | 4. Data Connectors [Link](./04_Data_Connectors)
13 | 5. Documents and Nodes [Link](./05_Documents_Nodes)
14 |
--------------------------------------------------------------------------------
/01_Introduction/README.md:
--------------------------------------------------------------------------------
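1 | # What is LlamaIndex
2 |
3 | `LlamaIndex` is a data framework for `LLM` applications, used to ingest, structure, and access private or domain-specific data.
4 |
5 | `LlamaIndex` was co-founded by Jerry Liu (Twitter: [@jerryjliu0](https://twitter.com/jerryjliu0)), who serves as its CEO.
6 |
7 | ## Why does LlamaIndex exist?
8 |
9 | At their core, `LLM`s (such as `GPT`) offer a natural-language interface between humans and inferred data. Widely available models are typically pre-trained on large amounts of publicly available data, including Wikipedia, mailing lists, books, and source code.
10 |
11 | Applications built on top of LLMs often need to augment these models with private or domain-specific data. Unfortunately, that data may be scattered across different applications and data stores. It may sit behind APIs, in SQL databases, or in PDF files and slide decks.
12 |
13 | LlamaIndex was created to address this.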
14 |
15 | 
16 | ## How does LlamaIndex solve this?
17 |
18 | `LlamaIndex` provides five core tools:
19 | - Data connectors
20 | - Data indexes
21 | - Engines
22 | - Data agents
23 | - Application integrations
24 |
25 | 
26 |
27 | ## A simple LlamaIndex application
28 |
29 | [01_Introduction.ipynb](./01_Introduction.ipynb)
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 sugarforever
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/04_Data_Connectors/README.md:
--------------------------------------------------------------------------------
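1 | # Data Connectors
2 |
3 | In `RAG` scenarios, data loading is a crucial step. `LlamaIndex` defines a data connector interface and provides a range of implementations that load data from different sources and formats. They include, but are not limited to: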
4 |
5 | - Simple Directory Reader
6 | - Psychic Reader
7 | - DeepLake Reader
8 | - Qdrant Reader
9 | - Discord Reader
10 | - MongoDB Reader
11 | - Chroma Reader
12 | - MyScale Reader
13 | - Faiss Reader
14 | - Obsidian Reader
15 | - Slack Reader
16 | - Web Page Reader
17 | - Pinecone Reader
18 | - Mbox Reader
19 | - Milvus Reader
20 | - Notion Reader
21 | - GitHub Repo Reader
22 | - Google Docs Reader
23 | - Database Reader
24 | - Twitter Reader
25 | - Weaviate Reader
26 | - Make Reader
27 |
28 | ## LlamaHub
29 |
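30 | `LlamaIndex` data connectors are provided through [LlamaHub](https://llamahub.ai/). `LlamaHub` is an open-source repository of data loaders that can be easily integrated into any `LlamaIndex` application.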
31 |
32 | 
33 |
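34 | ### Usage examples
35 |
36 | #### Using the data connectors built into LlamaIndex
37 |
38 | The `LlamaIndex` framework ships with a set of built-in data connectors. Developers can use them directly without loading anything from `LlamaHub`.
39 |
40 | The following code shows how to read web page data.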
41 |
42 | ```python
43 | from llama_index import SimpleWebPageReader
44 | documents = SimpleWebPageReader(html_to_text=True).load_data(
45 |     ["http://paulgraham.com/worked.html"]
46 | )
47 | ```
48 |
49 | #### Loading a data connector from LlamaHub
50 |
51 | The following sample code loads the Markdown document data connector from `LlamaHub`. For details about this connector, see [https://llamahub.ai/l/file-markdown](https://llamahub.ai/l/file-markdown).
52 |
53 | ```python
54 | from pathlib import Path
55 | from llama_index import download_loader
56 |
57 | MarkdownReader = download_loader("MarkdownReader")
58 |
59 | loader = MarkdownReader()
60 | documents = loader.load_data(file=Path('./README.md'))
61 | ```
62 |
63 | ## Complete example
64 |
65 | For both loading approaches described above, see the complete sample code in [04_Data_Connectors.ipynb](./04_Data_Connectors.ipynb).
66 |
--------------------------------------------------------------------------------
/03_Customization/README.md:
--------------------------------------------------------------------------------
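1 | # Customization
2 |
3 | `LlamaIndex` offers comprehensive configuration support for the `RAG` process, letting developers customize every stage. Common configuration scenarios include:
4 |
5 | - Custom document chunking
6 | - Custom vector store
7 | - Custom retrieval
8 | - Specifying the `LLM`
9 | - Specifying the response mode
10 | - Enabling streaming responses
11 |
12 | Note: customization is mainly done through the `ServiceContext` class provided by `LlamaIndex`.
13 |
14 | ## Configuration scenario examples
15 |
16 | The short code snippets below show how `LlamaIndex` supports each configuration scenario.
17 |
18 | ### Custom document chunking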
19 |
20 | ```python
21 | from llama_index import ServiceContext
22 | service_context = ServiceContext.from_defaults(chunk_size=500)
23 | ```
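
To give a feel for what a chunk size controls, here is a minimal pure-Python sketch that splits text into fixed-size chunks. It is an illustration only: `split_into_chunks` is a hypothetical helper, it counts whitespace-separated words rather than tokens, and it does not reproduce how `LlamaIndex` chunks documents internally.

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


chunks = split_into_chunks("one two three four five", chunk_size=2)
print(chunks)  # ['one two', 'three four', 'five']
```

A smaller chunk size yields more, shorter chunks, which tends to make retrieval more precise at the cost of less context per chunk.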
24 |
25 | ### Custom vector store
26 |
27 | ```python
28 | import chromadb
29 | from llama_index.vector_stores import ChromaVectorStore
30 | from llama_index import StorageContext
31 |
32 | chroma_client = chromadb.PersistentClient()
33 | chroma_collection = chroma_client.create_collection("quickstart")
34 | vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
35 | storage_context = StorageContext.from_defaults(vector_store=vector_store)
36 | ```
37 |
38 | ### Custom retrieval
39 |
40 | With custom retrieval, a parameter specifies how many similar documents the query engine requests during retrieval.
41 |
42 | ```python
43 | index = VectorStoreIndex.from_documents(documents)  # needs `from llama_index import VectorStoreIndex`; `documents` comes from a data connector
44 | query_engine = index.as_query_engine(similarity_top_k=5)
45 | ```
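
As a rough illustration of what a top-k similarity search does, the sketch below ranks a few hand-written vectors by cosine similarity to a query vector and keeps the best `k`. The vectors, the `top_k` helper, and the document names are all made up for this example. A real vector store computes embeddings with a model and uses its own search implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k vectors most similar to the query (hypothetical helper)."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" chosen by hand for illustration.
vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}
print(top_k([1.0, 0.1], vectors, k=2))  # ['doc_a', 'doc_b']
```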
46 |
47 | ### Specifying the `LLM`
48 |
49 | ```python
50 | service_context = ServiceContext.from_defaults(llm=OpenAI())  # needs `from llama_index import ServiceContext` and `from llama_index.llms import OpenAI`
51 | ```
52 |
53 | ### Specifying the response mode
54 |
55 | ```python
56 | query_engine = index.as_query_engine(response_mode='tree_summarize')
57 | ```
58 | ### Enabling streaming responses
59 |
60 | ```python
61 | query_engine = index.as_query_engine(streaming=True)
62 | ```
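
The idea behind a streaming response can be sketched with a plain Python generator: the caller receives the answer piece by piece instead of waiting for the whole text. `stream_tokens` is a hypothetical stand-in, not a `LlamaIndex` function, and it splits on whitespace purely for illustration.

```python
from typing import Iterator


def stream_tokens(answer: str) -> Iterator[str]:
    """Yield an answer one word at a time, the way a streamed response arrives."""
    for word in answer.split():
        yield word + " "


received = []
for token in stream_tokens("The author wrote essays"):
    received.append(token)  # a UI would display each piece as soon as it arrives

print("".join(received).strip())  # The author wrote essays
```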
63 |
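64 | ## Complete example
65 |
66 | See [03_Customization.ipynb](./03_Customization.ipynb), which builds on the Lesson 1 example and applies all of the customizations above:
67 | 1. Document chunk size: 500
68 | 2. Chromadb as the vector store
69 | 3. Number of retrieved documents set to 5
70 | 4. An OpenAI model as the LLM
71 | 5. Response mode set to `tree_summarize`
72 | 6. Streaming responses for question answering
73 |
74 | Note: response modes are covered in detail in a later lesson.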
75 |
--------------------------------------------------------------------------------
/02_Core_Concepts/README.md:
--------------------------------------------------------------------------------
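1 | # Core Concepts
2 |
3 | `LlamaIndex` helps build `LLM`-powered applications over personal or private data. RAG (Retrieval Augmented Generation) is the core concept behind `LlamaIndex` applications.
4 |
5 | ## RAG
6 |
7 | RAG, short for Retrieval Augmented Generation, is a paradigm for augmenting an `LLM` with personal or private data. It typically consists of two stages:
8 | 1. Indexing
9 |
10 | Build the knowledge base.
11 |
12 | 2. Querying
13 |
14 | Retrieve relevant context from the knowledge base to help the `LLM` answer questions.
15 |
16 | `LlamaIndex` provides a toolkit that makes both stages straightforward for developers.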
17 |
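18 | ### Indexing stage
19 |
20 | `LlamaIndex` helps developers build the knowledge base by providing Data connectors and Indexes.
21 |
22 | This stage uses the following tools or components:
23 |
24 | - Data connectors
25 |
26 | Data connectors ingest data in different formats from different sources and convert it into the Document representation that `LlamaIndex` supports, which contains text and metadata.
27 |
28 | - Documents / Nodes
29 |
30 | A Document is a container concept in `LlamaIndex`. It can wrap any data source, including PDF documents, API responses, or data from a database.
31 |
32 | A Node is the smallest unit of data in `LlamaIndex` and represents a chunk of a Document. It also carries metadata and relationship information to other Nodes, which makes more precise retrieval possible.
33 |
34 | - Data Indexes
35 |
36 | `LlamaIndex` provides convenient tools to index the ingested data so that later retrieval is simple and efficient.
37 |
38 | The most commonly used index is the vector store index, `VectorStoreIndex`.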
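
The flow of the indexing stage (document, then nodes, then index) can be sketched in plain Python. This is a toy under stated assumptions: `ToyDocument`, `ToyNode`, `to_nodes`, and `build_index` are hypothetical names, and the index is a simple keyword-to-node lookup rather than a vector store index.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Stand-in for a Document: raw text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyNode:
    """Stand-in for a Node: one chunk of a document, with a copy of its metadata."""
    text: str
    metadata: dict


def to_nodes(doc: ToyDocument, words_per_node: int) -> list[ToyNode]:
    """Chunk a document into nodes of at most words_per_node words."""
    words = doc.text.split()
    return [
        ToyNode(" ".join(words[i:i + words_per_node]), dict(doc.metadata))
        for i in range(0, len(words), words_per_node)
    ]


def build_index(nodes: list[ToyNode]) -> dict[str, list[int]]:
    """Map each lowercase word to the positions of the nodes containing it."""
    index: dict[str, list[int]] = {}
    for position, node in enumerate(nodes):
        for word in set(node.text.lower().split()):
            index.setdefault(word, []).append(position)
    return index


doc = ToyDocument("Llamas eat grass. Llamas live in herds.", {"source": "notes.txt"})
nodes = to_nodes(doc, words_per_node=3)
index = build_index(nodes)
print(len(nodes))       # 3
print(index["llamas"])  # [0, 1]
```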
39 |
40 | 
41 |
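42 | ### Querying stage
43 |
44 | In the querying stage, the `RAG` pipeline retrieves the most relevant context for the user's query and passes it, together with the query, to the `LLM` to synthesize a response. This gives the `LLM` up-to-date knowledge that is not in its original training data and also reduces hallucination. The key challenges in this stage are retrieval, orchestration, and reasoning over the knowledge base.
45 |
46 | `LlamaIndex` provides composable modules that help developers build and integrate `RAG` pipelines for question answering, chatbots, or as part of an agent. These building blocks can be customized to reflect ranking preferences and composed to reason over multiple knowledge bases in a structured way.
47 |
48 | The building blocks of this stage include:
49 |
50 | - Retrievers
51 |
52 | A retriever defines how to efficiently retrieve relevant context from the knowledge base for a given query.
53 |
54 | - Node Postprocessors
55 |
56 | A node postprocessor takes a set of Nodes and applies transformation, filtering, or re-ranking to them.
57 |
58 | - Response Synthesizers
59 |
60 | A response synthesizer uses the `LLM` to generate a response from the user's query and a set of retrieved text chunks that form the context.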
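
The three building blocks above can be sketched as three small functions chained together. Everything here is hypothetical and simplified: scoring is plain word overlap instead of embedding similarity, and `fake_llm` is a stub that stands in for a real model call.

```python
def retrieve(query: str, chunks: list[str]) -> list[tuple[str, float]]:
    """Retriever: score each chunk by the fraction of query words it contains."""
    query_words = set(query.lower().split())
    if not query_words:
        return []
    scored = []
    for chunk in chunks:
        chunk_words = set(chunk.lower().split())
        score = len(query_words & chunk_words) / len(query_words)
        scored.append((chunk, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def postprocess(scored: list[tuple[str, float]], cutoff: float) -> list[str]:
    """Node postprocessor: drop chunks whose score is below the cutoff."""
    return [chunk for chunk, score in scored if score >= cutoff]


def synthesize(query: str, context: list[str], llm) -> str:
    """Response synthesizer: build a prompt from the context and call the LLM."""
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; it only reports the prompt size."""
    return f"(answer based on {len(prompt.splitlines())} prompt lines)"


chunks = ["llamas eat grass", "llamas live in herds", "python is a language"]
query = "what do llamas eat"
context = postprocess(retrieve(query, chunks), cutoff=0.5)
print(context)  # ['llamas eat grass']
print(synthesize(query, context, fake_llm))
```

Swapping any one function (for example, a different scoring rule in `retrieve`) leaves the other two untouched, which is the point of keeping the stages composable.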
61 |
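62 | RAG pipelines include:
63 | - Query Engines
64 |
65 | Query engine: an end-to-end pipeline that lets users ask questions in natural language over the knowledge base and get back an answer along with the relevant context.
66 |
67 | - Chat Engines
68 |
69 | Chat engine: an end-to-end pipeline that lets users hold a conversation over the knowledge base (multiple turns, with conversation history).
70 |
71 | - Agents
72 |
73 | Agent: an automated decision maker powered by an `LLM`. An agent can be used just like a query engine or a chat engine. The main difference is that an agent dynamically decides the best sequence of actions instead of following predetermined logic, which gives it extra flexibility for more complex tasks.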
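
The difference between a fixed pipeline and an agent can be hinted at with a toy dispatcher that picks a tool at run time. This is only a sketch: `choose_tool` uses a hard-coded rule where a real agent would ask the `LLM` to decide, and both tools are hypothetical.

```python
def calculator(expression: str) -> str:
    """Toy tool: add two integers written as "a + b"."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def knowledge_base(question: str) -> str:
    """Toy tool: look up a canned answer."""
    facts = {"what do llamas eat": "Llamas eat grass."}
    return facts.get(question.lower().rstrip("?"), "I don't know.")


def choose_tool(request: str):
    """Stand-in for the LLM's decision: pick a tool based on the request."""
    return calculator if "+" in request else knowledge_base


def run_agent(request: str) -> str:
    """Pick a tool at run time, then call it with the request."""
    tool = choose_tool(request)
    return tool(request)


print(run_agent("2 + 3"))                # 5
print(run_agent("What do llamas eat?"))  # Llamas eat grass.
```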
74 |
75 | 
--------------------------------------------------------------------------------
/01_Introduction/01_Introduction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyP9UyuxS6D0UB2pyiMnxl/h",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "
"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {
33 | "id": "3gyCNIM1Im7-"
34 | },
35 | "outputs": [],
36 | "source": [
37 | "!pip install -q llama-index"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "source": [
43 | "!mkdir data\n",
44 | "!wget https://raw.githubusercontent.com/jerryjliu/llama_index/main/examples/paul_graham_essay/data/paul_graham_essay.txt -O data/paul_graham_essay.txt"
45 | ],
46 | "metadata": {
47 | "id": "5a1h2iL3I5Fs"
48 | },
49 | "execution_count": null,
50 | "outputs": []
51 | },
52 | {
53 | "cell_type": "code",
54 | "source": [
55 | "import os\n",
56 | "os.environ['OPENAI_API_KEY'] = 'your valid openai api key'"
57 | ],
58 | "metadata": {
59 | "id": "lmaZiSPUJSUw"
60 | },
61 | "execution_count": null,
62 | "outputs": []
63 | },
64 | {
65 | "cell_type": "code",
66 | "source": [
67 | "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
68 | "\n",
69 | "documents = SimpleDirectoryReader('data').load_data()\n",
70 | "index = VectorStoreIndex.from_documents(documents)"
71 | ],
72 | "metadata": {
73 | "id": "mWbUr_INJMbH"
74 | },
75 | "execution_count": null,
76 | "outputs": []
77 | },
78 | {
79 | "cell_type": "code",
80 | "source": [
81 | "query_engine = index.as_query_engine()\n",
82 | "response = query_engine.query(\"Who is the author?\")\n",
83 | "print(response)"
84 | ],
85 | "metadata": {
86 | "id": "4Z_ainQLJOcd"
87 | },
88 | "execution_count": null,
89 | "outputs": []
90 | },
91 | {
92 | "cell_type": "code",
93 | "source": [
94 | "response = query_engine.query(\"Introduce me Paul Graham\")\n",
95 | "print(response)"
96 | ],
97 | "metadata": {
98 | "id": "mKyLeFloJ8aF"
99 | },
100 | "execution_count": null,
101 | "outputs": []
102 | }
103 | ]
104 | }
--------------------------------------------------------------------------------
/05_Documents_Nodes/README.md:
--------------------------------------------------------------------------------
1 | # Documents & Nodes
2 |
3 | In `LlamaIndex`, `Document` and `Node` are the core data abstractions.
4 |
5 | ## Document
6 |
7 | A `Document` is a container for any data source:
8 |
9 | - PDF
10 | - API responses
11 | - Databases
12 | - ...
13 |
14 | A `Document` stores:
15 |
16 | - Text data
17 | - Attribute data
18 | - Metadata (`metadata`)
19 | - Relationship data (`relationships`)
20 |
21 | Example:
22 |
23 | ```python
24 | from llama_index import Document
25 | text_list = ["hello", "world"]
26 | documents = [Document(text=t) for t in text_list]
27 | ```
28 |
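29 | ### Customizing a Document
30 |
31 | Customizing a Document lets you set its metadata and document ID.
32 |
33 | Metadata can be specified when the document is constructed, modified on the document object afterwards, or set while using `SimpleDirectoryReader`.
34 |
35 | 1. Specify at construction time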
36 |
37 | ```python
38 | from llama_index import Document
39 | document = Document(
40 |     text='Hello World',
41 |     metadata={
42 |         'filename': 'hello_world.pdf',
43 |         'category': 'science'
44 |     }
45 | )
46 | ```
47 |
48 | 2. Modify on the document object
49 |
50 | ```python
51 | document.metadata = {'filename': 'hello_world_v2.pdf'}
52 | ```
53 |
54 | 3. Set while using `SimpleDirectoryReader`
55 |
56 | When loading documents with `SimpleDirectoryReader`, set the metadata through the `file_metadata` callback.
57 |
58 | ```python
59 | from llama_index import SimpleDirectoryReader
60 | filename_hook = lambda filename: {'file_name': filename}
61 | documents = SimpleDirectoryReader('./data', file_metadata=filename_hook).load_data()
62 | ```
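
The callback is just a function from a file path to a metadata dictionary, so it can derive more than the raw path. The sketch below is a hypothetical example of such a function. The keys `extension` and `parent_dir` are invented for illustration and are not required by `LlamaIndex`.

```python
from pathlib import Path


def file_metadata_hook(filename: str) -> dict:
    """Build a metadata dict from a file path (hypothetical callback)."""
    path = Path(filename)
    return {
        "file_name": path.name,
        "extension": path.suffix,
        "parent_dir": path.parent.name,
    }


print(file_metadata_hook("data/hello_llama.txt"))
# {'file_name': 'hello_llama.txt', 'extension': '.txt', 'parent_dir': 'data'}
```

Assuming the callback contract shown above (path in, dictionary out), a function like this could be passed as `file_metadata` in place of the lambda.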
63 |
64 | Note: see [simple-directory-reader](https://docs.llamaindex.ai/en/stable/examples/data_connectors/simple_directory_reader.html#simple-directory-reader) for the `SimpleDirectoryReader` interface definition.
65 |
66 | The document ID can be set after the `Document` is created:
67 |
68 | ```python
69 | from llama_index import Document
70 | document = Document(text='Hello World')
71 | document.doc_id = "xxxx-yyyy"
72 | ```
73 |
74 | ## Node
75 |
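76 | `Node` is a first-class citizen in `LlamaIndex`. A `Node` carries the same kinds of data and attributes as a `Document`.
77 |
78 | A `Node` is typically built in one of two ways:
79 |
80 | 1. Construct it directly through the API
81 | 2. Generate it from a `Document` using a node parser (NodeParser)
82 |
83 | Note: a `Node` derived from a `Document` also inherits the attributes of that `Document`.
84 |
85 | Example: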
86 |
87 | ```python
88 | # Construct a node directly through the API
89 | from llama_index.schema import TextNode
90 | node = TextNode(text="hello world", id_="1234-5678")
91 | ```
92 |
93 | ```python
94 | # Generate nodes with a node parser
95 | from llama_index import Document
96 | from llama_index.node_parser import SimpleNodeParser
97 |
98 | text_list = ["hello", "world"]
99 | documents = [Document(text=t) for t in text_list]
100 |
101 | parser = SimpleNodeParser.from_defaults()
102 | nodes = parser.get_nodes_from_documents(documents)
103 | ```
104 |
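105 | ### Customizing a Node
106 |
107 | Developers typically customize a Node to:
108 |
109 | 1. Define relationships between nodes
110 | 2. Set a custom node ID
111 |
112 | #### Defining relationships between nodes
113 |
114 | `RelatedNodeInfo` is used to establish relationships. It accepts a constructor argument for setting metadata.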
115 |
116 | ```python
117 | from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo
118 |
119 | hello_node = TextNode(text="Hello", id_="1111-1111")
120 | world_node = TextNode(text="World", id_="2222-2222")
121 |
122 | hello_node.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=world_node.node_id, metadata={"created_by": "VerySmallWoods"})
123 | world_node.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=hello_node.node_id)
124 | nodes = [hello_node, world_node]
125 | ```
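
To show why next/previous links are useful, the sketch below follows "next" links to read nodes back in order. It uses a plain dataclass rather than the `LlamaIndex` classes: `ToyNode` and `read_in_order` are hypothetical, and relationships are stored as simple string keys.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Stand-in for a text node with an ID and named links to other nodes."""
    node_id: str
    text: str
    relationships: dict = field(default_factory=dict)  # e.g. {"next": "<node id>"}


def read_in_order(start_id: str, nodes_by_id: dict) -> list[str]:
    """Follow "next" links from the start node and collect each node's text."""
    texts = []
    seen = set()
    current_id = start_id
    while current_id is not None and current_id not in seen:
        seen.add(current_id)  # guard against accidental cycles
        node = nodes_by_id[current_id]
        texts.append(node.text)
        current_id = node.relationships.get("next")
    return texts


hello = ToyNode("1111-1111", "Hello", {"next": "2222-2222"})
world = ToyNode("2222-2222", "World", {"previous": "1111-1111"})
nodes_by_id = {node.node_id: node for node in (hello, world)}
print(read_in_order("1111-1111", nodes_by_id))  # ['Hello', 'World']
```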
126 |
127 | #### Setting a custom node ID
128 |
129 | ```python
130 | from llama_index.schema import TextNode
131 |
132 | hello_node = TextNode(text="Hello", id_="1111-1111")
133 | hello_node.id_ = "3333-3333"
134 | ```
135 |
136 | ## Complete example
137 |
138 | For the Document and Node usage described above, see the complete sample code in [05_Documents_Nodes.ipynb](./05_Documents_Nodes.ipynb).
139 |
--------------------------------------------------------------------------------
/03_Customization/03_Customization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyMZHViR6npRf89UzVPvDKHG",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "
"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {
33 | "id": "3gyCNIM1Im7-"
34 | },
35 | "outputs": [],
36 | "source": [
37 | "!pip install -q -U llama-index chromadb"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "source": [
43 | "!mkdir data\n",
44 | "!wget https://raw.githubusercontent.com/jerryjliu/llama_index/main/examples/paul_graham_essay/data/paul_graham_essay.txt -O data/paul_graham_essay.txt"
45 | ],
46 | "metadata": {
47 | "id": "5a1h2iL3I5Fs"
48 | },
49 | "execution_count": null,
50 | "outputs": []
51 | },
52 | {
53 | "cell_type": "code",
54 | "source": [
55 | "!pip install openai"
56 | ],
57 | "metadata": {
58 | "id": "DAx-uT-XLgTf"
59 | },
60 | "execution_count": null,
61 | "outputs": []
62 | },
63 | {
64 | "cell_type": "code",
65 | "source": [
66 | "import os\n",
67 | "os.environ['OPENAI_API_KEY'] = 'your valid openai api key'\n",
68 | "\n",
69 | "import chromadb\n",
70 | "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
71 | "from llama_index import ServiceContext\n",
72 | "from llama_index.vector_stores import ChromaVectorStore\n",
73 | "from llama_index import StorageContext\n",
74 | "from llama_index.llms import OpenAI"
75 | ],
76 | "metadata": {
77 | "id": "lmaZiSPUJSUw"
78 | },
79 | "execution_count": null,
80 | "outputs": []
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "source": [
85 | "## Custom document chunking and specifying the LLM"
86 | ],
87 | "metadata": {
88 | "id": "vs48EWb3KGjh"
89 | }
90 | },
91 | {
92 | "cell_type": "code",
93 | "source": [
94 | "service_context = ServiceContext.from_defaults(chunk_size=500, llm=OpenAI())\n"
95 | ],
96 | "metadata": {
97 | "id": "mWbUr_INJMbH"
98 | },
99 | "execution_count": null,
100 | "outputs": []
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "source": [
105 | "## Custom vector store"
106 | ],
107 | "metadata": {
108 | "id": "-web3MtHKiaN"
109 | }
110 | },
111 | {
112 | "cell_type": "code",
113 | "source": [
114 | "chroma_client = chromadb.PersistentClient()\n",
115 | "chroma_collection = chroma_client.create_collection(\"quickstart\")\n",
116 | "vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
117 | "storage_context = StorageContext.from_defaults(vector_store=vector_store)"
118 | ],
119 | "metadata": {
120 | "id": "TOKP_4ljKl55"
121 | },
122 | "execution_count": null,
123 | "outputs": []
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "source": [
128 | "## Indexing documents"
129 | ],
130 | "metadata": {
131 | "id": "jxuR1Z21KyTJ"
132 | }
133 | },
134 | {
135 | "cell_type": "code",
136 | "source": [
137 | "documents = SimpleDirectoryReader('data').load_data()\n",
138 | "index = VectorStoreIndex.from_documents(documents, service_context=service_context,storage_context=storage_context)"
139 | ],
140 | "metadata": {
141 | "id": "JF9D7myqKf6f"
142 | },
143 | "execution_count": null,
144 | "outputs": []
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "source": [
149 | "## Specifying the response mode and enabling streaming responses"
150 | ],
151 | "metadata": {
152 | "id": "F2nCITY8LP5r"
153 | }
154 | },
155 | {
156 | "cell_type": "code",
157 | "source": [
158 | "query_engine = index.as_query_engine(similarity_top_k=5, response_mode='tree_summarize', streaming=True)\n",
159 | "response = query_engine.query(\"What did the author do?\")\n",
160 | "response.print_response_stream()"
161 | ],
162 | "metadata": {
163 | "id": "4Z_ainQLJOcd"
164 | },
165 | "execution_count": null,
166 | "outputs": []
167 | }
168 | ]
169 | }
--------------------------------------------------------------------------------
/04_Data_Connectors/04_Data_Connectors.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyPw1gwZ1xfqLolq4oowSNrn",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "
"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "source": [
32 | "# Data connectors"
33 | ],
34 | "metadata": {
35 | "id": "-VjYJcfAkTTu"
36 | }
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {
42 | "id": "3gyCNIM1Im7-"
43 | },
44 | "outputs": [],
45 | "source": [
46 | "!pip install -q -U llama-index"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "source": [
52 | "import os\n",
53 | "os.environ['OPENAI_API_KEY'] = 'your valid openai api key'"
54 | ],
55 | "metadata": {
56 | "id": "ZKgOR2HYk-vN"
57 | },
58 | "execution_count": null,
59 | "outputs": []
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "source": [
64 | "## Use LlamaIndex native data connectors\n",
65 | "\n",
66 | "We will use the open-source LangChain tutorial https://github.com/sugarforever/wtf-langchain and load it into LlamaIndex documents for use with a query engine."
67 | ],
68 | "metadata": {
69 | "id": "-gRIz-JDkWNb"
70 | }
71 | },
72 | {
73 | "cell_type": "code",
74 | "source": [
75 | "!git clone https://github.com/sugarforever/wtf-langchain.git"
76 | ],
77 | "metadata": {
78 | "id": "p8o-wQXDihsg"
79 | },
80 | "execution_count": null,
81 | "outputs": []
82 | },
83 | {
84 | "cell_type": "code",
85 | "source": [
86 | "from llama_index import SimpleDirectoryReader\n",
87 | "reader = SimpleDirectoryReader(\n",
88 | "    input_dir=\"./wtf-langchain\",\n",
89 | "    required_exts=[\".md\"],\n",
90 | "    recursive=True\n",
91 | ")\n",
92 | "docs = reader.load_data()"
93 | ],
94 | "metadata": {
95 | "id": "X8LnUM2Yiut7"
96 | },
97 | "execution_count": null,
98 | "outputs": []
99 | },
100 | {
101 | "cell_type": "code",
102 | "source": [
103 | "docs"
104 | ],
105 | "metadata": {
106 | "id": "TKn4vx29i8x_"
107 | },
108 | "execution_count": null,
109 | "outputs": []
110 | },
111 | {
112 | "cell_type": "code",
113 | "source": [
114 | "from llama_index import VectorStoreIndex\n",
115 | "\n",
116 | "index = VectorStoreIndex.from_documents(docs)\n",
117 | "query_engine = index.as_query_engine()\n",
118 | "response = query_engine.query(\"What is WTF LangChain?\")\n",
119 | "print(response)"
120 | ],
121 | "metadata": {
122 | "id": "Qu4lEW_7jJva"
123 | },
124 | "execution_count": null,
125 | "outputs": []
126 | },
127 | {
128 | "cell_type": "code",
129 | "source": [
130 | "response = query_engine.query(\"What is the purpose of the WTF LangChain tutorial?\")\n",
131 | "print(response)"
132 | ],
133 | "metadata": {
134 | "id": "GU7NmFnVjoaj"
135 | },
136 | "execution_count": null,
137 | "outputs": []
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "source": [
142 | "## Load data connectors from LlamaHub\n",
143 | "\n",
144 | "We will use the open-source LangChain tutorial https://github.com/sugarforever/wtf-langchain and load it into LlamaIndex documents for use with a query engine."
145 | ],
146 | "metadata": {
147 | "id": "wl-zzPA9ksay"
148 | }
149 | },
150 | {
151 | "cell_type": "code",
152 | "source": [
153 | "from pathlib import Path\n",
154 | "from llama_index import download_loader\n",
155 | "\n",
156 | "MarkdownReader = download_loader(\"MarkdownReader\")\n",
157 | "\n",
158 | "loader = MarkdownReader()\n",
159 | "documents = loader.load_data(file=Path('./wtf-langchain/README.md'))"
160 | ],
161 | "metadata": {
162 | "id": "H30CAo0mk4um"
163 | },
164 | "execution_count": null,
165 | "outputs": []
166 | },
167 | {
168 | "cell_type": "code",
169 | "source": [
170 | "index = VectorStoreIndex.from_documents(documents)\n",
171 | "query_engine = index.as_query_engine()\n",
172 | "response = query_engine.query(\"Which framework version does the minimalist LangChain beginner tutorial use?\")\n",
173 | "print(response)"
174 | ],
175 | "metadata": {
176 | "id": "AqLNKLkrlDkJ"
177 | },
178 | "execution_count": null,
179 | "outputs": []
180 | }
181 | ]
182 | }
--------------------------------------------------------------------------------
/05_Documents_Nodes/05_Documents_Nodes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyP9rGid9c/02pK6fB4pV9NS",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "
"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "source": [
32 | "# Documents & Nodes"
33 | ],
34 | "metadata": {
35 | "id": "-VjYJcfAkTTu"
36 | }
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 1,
41 | "metadata": {
42 | "id": "3gyCNIM1Im7-",
43 | "colab": {
44 | "base_uri": "https://localhost:8080/"
45 | },
46 | "outputId": "e047ccf8-896a-4abc-c373-0ab3a564a7e0"
47 | },
48 | "outputs": [
49 | {
50 | "output_type": "stream",
51 | "name": "stdout",
52 | "text": [
61 | ]
62 | }
63 | ],
64 | "source": [
65 | "!pip install -q -U llama-index"
66 | ]
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "source": [
71 | "## Construct Document"
72 | ],
73 | "metadata": {
74 | "id": "-gRIz-JDkWNb"
75 | }
76 | },
77 | {
78 | "cell_type": "code",
79 | "source": [
80 | "from llama_index import Document\n",
81 | "text_list = [\"hello\", \"world\"]\n",
82 | "documents = [Document(text=t) for t in text_list]"
83 | ],
84 | "metadata": {
85 | "id": "HlCpu-fhSB32"
86 | },
87 | "execution_count": 4,
88 | "outputs": []
89 | },
90 | {
91 | "cell_type": "code",
92 | "source": [
93 | "documents"
94 | ],
95 | "metadata": {
96 | "colab": {
97 | "base_uri": "https://localhost:8080/"
98 | },
99 | "id": "5xjbm28ESRHB",
100 | "outputId": "de5fa42c-cc34-4851-e191-1c414bdeef25"
101 | },
102 | "execution_count": 5,
103 | "outputs": [
104 | {
105 | "output_type": "execute_result",
106 | "data": {
107 | "text/plain": [
108 | "[Document(id_='7335bb8a-56dc-4074-8e79-0406237cadb1', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='7debe8d278fe6c55c45f979269ab268102d75f8d48644d244cd0050dae0846ac', text='hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n",
109 | " Document(id_='30041a91-a8e4-448f-971a-8203c0c77e06', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='12d52a924ac6bc76ec4101c5d1f55bb5b5365d5f4c524a21d6175a4b049a1962', text='world', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
110 | ]
111 | },
112 | "metadata": {},
113 | "execution_count": 5
114 | }
115 | ]
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "source": [
120 | "## Customize Document"
121 | ],
122 | "metadata": {
123 | "id": "RSFhimunSvIw"
124 | }
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "source": [
129 | "### Metadata customization"
130 | ],
131 | "metadata": {
132 | "id": "va6j1fZ1WCuN"
133 | }
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "source": [
138 | "1. Customize in document construction"
139 | ],
140 | "metadata": {
141 | "id": "ARSXGkrjS1GB"
142 | }
143 | },
144 | {
145 | "cell_type": "code",
146 | "source": [
147 | "from llama_index import Document\n",
148 | "document = Document(\n",
149 | "    text='Hello World',\n",
150 | "    metadata={\n",
151 | "        'filename': 'hello_world.pdf',\n",
152 | "        'category': 'science'\n",
153 | "    }\n",
154 | ")\n",
155 | "document"
156 | ],
157 | "metadata": {
158 | "colab": {
159 | "base_uri": "https://localhost:8080/"
160 | },
161 | "id": "WXOi9jtbTECK",
162 | "outputId": "6028214f-2fd3-4ce3-8f73-6f06d0b6505f"
163 | },
164 | "execution_count": 19,
165 | "outputs": [
166 | {
167 | "output_type": "execute_result",
168 | "data": {
169 | "text/plain": [
170 | "Document(id_='e38ed14a-b41b-439e-858a-10ac815b95bb', embedding=None, metadata={'filename': 'hello_world.pdf', 'category': 'science'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='5b1ac0318e8c80ae4c7d187cd698067606fd001256065952e78d7f6229d62e26', text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')"
171 | ]
172 | },
173 | "metadata": {},
174 | "execution_count": 19
175 | }
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "source": [
181 | "2. Customize after document is constructed"
182 | ],
183 | "metadata": {
184 | "id": "jnvmC0sMS56a"
185 | }
186 | },
187 | {
188 | "cell_type": "code",
189 | "source": [
190 | "document.metadata = {'filename': 'hello_world_v2.pdf'}\n",
191 | "document"
192 | ],
193 | "metadata": {
194 | "colab": {
195 | "base_uri": "https://localhost:8080/"
196 | },
197 | "id": "i4TMSsKUTMTF",
198 | "outputId": "6179448e-cb12-4049-a1bd-3b8b5ddd727b"
199 | },
200 | "execution_count": 20,
201 | "outputs": [
202 | {
203 | "output_type": "execute_result",
204 | "data": {
205 | "text/plain": [
206 | "Document(id_='e38ed14a-b41b-439e-858a-10ac815b95bb', embedding=None, metadata={'filename': 'hello_world_v2.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='5b1ac0318e8c80ae4c7d187cd698067606fd001256065952e78d7f6229d62e26', text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')"
207 | ]
208 | },
209 | "metadata": {},
210 | "execution_count": 20
211 | }
212 | ]
213 | },
214 | {
215 | "cell_type": "markdown",
216 | "source": [
217 | "3. Customize in `SimpleDirectoryReader` usage"
218 | ],
219 | "metadata": {
220 | "id": "s72NT50tS-Vb"
221 | }
222 | },
223 | {
224 | "cell_type": "code",
225 | "source": [
226 | "!rm -rf data && mkdir data\n",
227 | "!echo 'hello llama!' > data/hello_llama.txt\n",
228 | "\n",
229 | "from llama_index import SimpleDirectoryReader\n",
230 | "filename_hook = lambda filename: {'file_name': filename}\n",
231 | "documents = SimpleDirectoryReader('./data', file_metadata=filename_hook).load_data()\n",
232 | "documents"
233 | ],
234 | "metadata": {
235 | "colab": {
236 | "base_uri": "https://localhost:8080/"
237 | },
238 | "id": "bqyplmYZTSI5",
239 | "outputId": "a34d5bab-3eb2-4c1b-e86d-d284acf58446"
240 | },
241 | "execution_count": 7,
242 | "outputs": [
243 | {
244 | "output_type": "execute_result",
245 | "data": {
246 | "text/plain": [
247 | "[Document(id_='15db2314-2da5-4221-bd71-54c2e66b4908', embedding=None, metadata={'file_name': 'data/hello_llama.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='9787e1e664b99bd6bd78fbcc816ab052d8f0808e5fcda36a9d89d360ede017b6', text='hello llama!\\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
248 | ]
249 | },
250 | "metadata": {},
251 | "execution_count": 7
252 | }
253 | ]
254 | },
255 | {
256 | "cell_type": "markdown",
257 | "source": [
258 | "### Document ID Customization"
259 | ],
260 | "metadata": {
261 | "id": "BJ71E5AsWFha"
262 | }
263 | },
264 | {
265 | "cell_type": "code",
266 | "source": [
267 | "from llama_index import Document\n",
268 | "document = Document(text='Hello World')\n",
269 | "document.doc_id = \"xxxx-yyyy\"\n",
270 | "document"
271 | ],
272 | "metadata": {
273 | "colab": {
274 | "base_uri": "https://localhost:8080/"
275 | },
276 | "id": "btT5kT-CWIQd",
277 | "outputId": "39f3ef88-edda-4cd9-9ba0-ca616a0661d5"
278 | },
279 | "execution_count": 18,
280 | "outputs": [
281 | {
282 | "output_type": "execute_result",
283 | "data": {
284 | "text/plain": [
285 | "Document(id_='xxxx-yyyy', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='a339d80c761418702b627851cbd963eb722f12dfeb360d441f9123c2d3d4fcf7', text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')"
286 | ]
287 | },
288 | "metadata": {},
289 | "execution_count": 18
290 | }
291 | ]
292 | },
293 | {
294 | "cell_type": "markdown",
295 | "source": [
296 | "## Construct Node\n",
297 | "\n",
298 | "Nodes are first-class citizens in LlamaIndex. Developers can either define Nodes and all their attributes directly, or parse source Documents into Nodes with the `NodeParser` classes."
299 | ],
300 | "metadata": {
301 | "id": "BKTqmxHJTw0e"
302 | }
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "source": [
307 | "1. Construct directly"
308 | ],
309 | "metadata": {
310 | "id": "YJlU3R-0UGfp"
311 | }
312 | },
313 | {
314 | "cell_type": "code",
315 | "source": [
316 | "from llama_index.schema import TextNode\n",
317 | "node = TextNode(text=\"hello world\", id_=\"1234-5678\")\n",
318 | "node"
319 | ],
320 | "metadata": {
321 | "colab": {
322 | "base_uri": "https://localhost:8080/"
323 | },
324 | "id": "wKesz2hsUMae",
325 | "outputId": "04a23337-7e1d-4b81-8785-10331ee14728"
326 | },
327 | "execution_count": 9,
328 | "outputs": [
329 | {
330 | "output_type": "execute_result",
331 | "data": {
332 | "text/plain": [
333 | "TextNode(id_='1234-5678', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='351b5b79b4cf0d96b24050f568933500d93160ec03e5e996aa31c29cf2d2f654', text='hello world', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')"
334 | ]
335 | },
336 | "metadata": {},
337 | "execution_count": 9
338 | }
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "source": [
344 | "2. Construct with a `NodeParser`\n",
345 | "\n",
346 | " We will reuse the `documents` variable loaded above."
347 | ],
348 | "metadata": {
349 | "id": "GoF5FZmyUKt3"
350 | }
351 | },
352 | {
353 | "cell_type": "code",
354 | "source": [
355 | "from llama_index.node_parser import SimpleNodeParser\n",
356 | "\n",
357 | "parser = SimpleNodeParser.from_defaults()\n",
358 | "nodes = parser.get_nodes_from_documents(documents)\n",
359 | "nodes"
360 | ],
361 | "metadata": {
362 | "colab": {
363 | "base_uri": "https://localhost:8080/"
364 | },
365 | "id": "wz6SWDU6UUs7",
366 | "outputId": "4112e9fd-b67e-4bcb-c159-439b29d4c988"
367 | },
368 | "execution_count": 10,
369 | "outputs": [
370 | {
371 | "output_type": "stream",
372 | "name": "stderr",
373 | "text": [
374 | "[nltk_data] Downloading package punkt to /tmp/llama_index...\n",
375 | "[nltk_data] Unzipping tokenizers/punkt.zip.\n"
376 | ]
377 | },
378 | {
379 | "output_type": "execute_result",
380 | "data": {
381 | "text/plain": [
382 | "[TextNode(id_='45d7982c-c9d1-4384-b0e0-f32defff4722', embedding=None, metadata={'file_name': 'data/hello_llama.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='15db2314-2da5-4221-bd71-54c2e66b4908', node_type=None, metadata={'file_name': 'data/hello_llama.txt'}, hash='9787e1e664b99bd6bd78fbcc816ab052d8f0808e5fcda36a9d89d360ede017b6')}, hash='9d29d9fb54e3e9054a39b57a34665527f97c3bf22b6f3cd2b7fdc1d72d440527', text='hello llama!', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
383 | ]
384 | },
385 | "metadata": {},
386 | "execution_count": 10
387 | }
388 | ]
389 | },
390 | {
391 | "cell_type": "markdown",
392 | "source": [
393 | "## Customize Node\n"
394 | ],
395 | "metadata": {
396 | "id": "rHi-taOrUsQN"
397 | }
398 | },
399 | {
400 | "cell_type": "markdown",
401 | "source": [
402 | "1. Define node relationships"
403 | ],
404 | "metadata": {
405 | "id": "tMF9aWZ2UxMZ"
406 | }
407 | },
408 | {
409 | "cell_type": "code",
410 | "source": [
411 | "from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo\n",
412 | "\n",
413 | "hello_node = TextNode(text=\"Hello\", id_=\"1111-1111\")\n",
414 | "world_node = TextNode(text=\"World\", id_=\"2222-2222\")\n",
415 | "\n",
416 | "hello_node.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=world_node.node_id, metadata={\"created_by\": \"VerySmallWoods\"})\n",
417 | "world_node.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=hello_node.node_id)\n",
418 | "nodes = [hello_node, world_node]\n",
419 | "nodes"
420 | ],
421 | "metadata": {
422 | "colab": {
423 | "base_uri": "https://localhost:8080/"
424 | },
425 | "id": "N4iTvVf6Uu1N",
426 | "outputId": "67c5a97e-2241-4e14-fad5-6fdbb44ed854"
427 | },
428 | "execution_count": 21,
429 | "outputs": [
430 | {
431 | "output_type": "execute_result",
432 | "data": {
433 | "text/plain": [
434 | "[TextNode(id_='1111-1111', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='2222-2222', node_type=None, metadata={'created_by': 'VerySmallWoods'}, hash=None)}, hash='378b8aef7d6589c0b81c83cb8eaa637088d5ae46b9c27fd5ce9d4c56ed676b57', text='Hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n",
435 | " TextNode(id_='2222-2222', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='1111-1111', node_type=None, metadata={}, hash=None)}, hash='30cc2bcfbe4881f1655525a2a613acae3ab9cc0800b5bf21cac89fce84e8197e', text='World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
436 | ]
437 | },
438 | "metadata": {},
439 | "execution_count": 21
440 | }
441 | ]
442 | },
443 | {
444 | "cell_type": "markdown",
445 | "source": [
446 | "2. Customize node ID"
447 | ],
448 | "metadata": {
449 | "id": "pur6GMrKU1v2"
450 | }
451 | },
452 | {
453 | "cell_type": "code",
454 | "source": [
455 | "from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo\n",
456 | "\n",
457 | "hello_node = TextNode(text=\"Hello\", id_=\"1111-1111\")\n",
458 | "hello_node.id_ = '3333-3333'\n",
459 | "hello_node"
460 | ],
461 | "metadata": {
462 | "colab": {
463 | "base_uri": "https://localhost:8080/"
464 | },
465 | "id": "i6cDdDuXU4-3",
466 | "outputId": "4f2f3311-cfd5-4eff-faa7-a1bebe0738b4"
467 | },
468 | "execution_count": 16,
469 | "outputs": [
470 | {
471 | "output_type": "execute_result",
472 | "data": {
473 | "text/plain": [
474 | "TextNode(id_='xx', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='378b8aef7d6589c0b81c83cb8eaa637088d5ae46b9c27fd5ce9d4c56ed676b57', text='Hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')"
475 | ]
476 | },
477 | "metadata": {},
478 | "execution_count": 16
479 | }
480 | ]
481 | }
482 | ]
483 | }
--------------------------------------------------------------------------------
/llamaindex_llmlingua_prompt_compression.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "gpuType": "T4",
8 | "authorship_tag": "ABX9TyP2KgCcjpZhImtE7dVepHrt",
9 | "include_colab_link": true
10 | },
11 | "kernelspec": {
12 | "name": "python3",
13 | "display_name": "Python 3"
14 | },
15 | "language_info": {
16 | "name": "python"
17 | },
18 | "accelerator": "GPU",
19 | "widgets": {
20 | "application/vnd.jupyter.widget-state+json": {
21 | "53687d7d633941f0a54a6ec53a0db386": {
22 | "model_module": "@jupyter-widgets/controls",
23 | "model_name": "HBoxModel",
24 | "model_module_version": "1.5.0",
25 | "state": {
26 | "_dom_classes": [],
27 | "_model_module": "@jupyter-widgets/controls",
28 | "_model_module_version": "1.5.0",
29 | "_model_name": "HBoxModel",
30 | "_view_count": null,
31 | "_view_module": "@jupyter-widgets/controls",
32 | "_view_module_version": "1.5.0",
33 | "_view_name": "HBoxView",
34 | "box_style": "",
35 | "children": [
36 | "IPY_MODEL_8380e644a56b48c19a42e23bb39a98da",
37 | "IPY_MODEL_0ce7633498da4978932f3dc6c0b94659",
38 | "IPY_MODEL_2161e5d9426a416cb212f8107d23dcb5"
39 | ],
40 | "layout": "IPY_MODEL_5508d2bfa3454b16ae682d5bfa62b5ff"
41 | }
42 | },
43 | "8380e644a56b48c19a42e23bb39a98da": {
44 | "model_module": "@jupyter-widgets/controls",
45 | "model_name": "HTMLModel",
46 | "model_module_version": "1.5.0",
47 | "state": {
48 | "_dom_classes": [],
49 | "_model_module": "@jupyter-widgets/controls",
50 | "_model_module_version": "1.5.0",
51 | "_model_name": "HTMLModel",
52 | "_view_count": null,
53 | "_view_module": "@jupyter-widgets/controls",
54 | "_view_module_version": "1.5.0",
55 | "_view_name": "HTMLView",
56 | "description": "",
57 | "description_tooltip": null,
58 | "layout": "IPY_MODEL_82eebd4dfe764734854070009499bc53",
59 | "placeholder": "",
60 | "style": "IPY_MODEL_024b11dc68434750a3dca9d7be5b75b6",
61 | "value": "Loading checkpoint shards: 100%"
62 | }
63 | },
64 | "0ce7633498da4978932f3dc6c0b94659": {
65 | "model_module": "@jupyter-widgets/controls",
66 | "model_name": "FloatProgressModel",
67 | "model_module_version": "1.5.0",
68 | "state": {
69 | "_dom_classes": [],
70 | "_model_module": "@jupyter-widgets/controls",
71 | "_model_module_version": "1.5.0",
72 | "_model_name": "FloatProgressModel",
73 | "_view_count": null,
74 | "_view_module": "@jupyter-widgets/controls",
75 | "_view_module_version": "1.5.0",
76 | "_view_name": "ProgressView",
77 | "bar_style": "success",
78 | "description": "",
79 | "description_tooltip": null,
80 | "layout": "IPY_MODEL_6485161f74c744fd806e405a7a5e722a",
81 | "max": 2,
82 | "min": 0,
83 | "orientation": "horizontal",
84 | "style": "IPY_MODEL_422e02ef01614a8fb6c9baecb32abe00",
85 | "value": 2
86 | }
87 | },
88 | "2161e5d9426a416cb212f8107d23dcb5": {
89 | "model_module": "@jupyter-widgets/controls",
90 | "model_name": "HTMLModel",
91 | "model_module_version": "1.5.0",
92 | "state": {
93 | "_dom_classes": [],
94 | "_model_module": "@jupyter-widgets/controls",
95 | "_model_module_version": "1.5.0",
96 | "_model_name": "HTMLModel",
97 | "_view_count": null,
98 | "_view_module": "@jupyter-widgets/controls",
99 | "_view_module_version": "1.5.0",
100 | "_view_name": "HTMLView",
101 | "description": "",
102 | "description_tooltip": null,
103 | "layout": "IPY_MODEL_d54db20bb4d64240bea60519f3cc4650",
104 | "placeholder": "",
105 | "style": "IPY_MODEL_c5cb7c55d0684a60a60076459501ba65",
106 | "value": " 2/2 [01:04<00:00, 29.49s/it]"
107 | }
108 | },
109 | "5508d2bfa3454b16ae682d5bfa62b5ff": {
110 | "model_module": "@jupyter-widgets/base",
111 | "model_name": "LayoutModel",
112 | "model_module_version": "1.2.0",
113 | "state": {
114 | "_model_module": "@jupyter-widgets/base",
115 | "_model_module_version": "1.2.0",
116 | "_model_name": "LayoutModel",
117 | "_view_count": null,
118 | "_view_module": "@jupyter-widgets/base",
119 | "_view_module_version": "1.2.0",
120 | "_view_name": "LayoutView",
121 | "align_content": null,
122 | "align_items": null,
123 | "align_self": null,
124 | "border": null,
125 | "bottom": null,
126 | "display": null,
127 | "flex": null,
128 | "flex_flow": null,
129 | "grid_area": null,
130 | "grid_auto_columns": null,
131 | "grid_auto_flow": null,
132 | "grid_auto_rows": null,
133 | "grid_column": null,
134 | "grid_gap": null,
135 | "grid_row": null,
136 | "grid_template_areas": null,
137 | "grid_template_columns": null,
138 | "grid_template_rows": null,
139 | "height": null,
140 | "justify_content": null,
141 | "justify_items": null,
142 | "left": null,
143 | "margin": null,
144 | "max_height": null,
145 | "max_width": null,
146 | "min_height": null,
147 | "min_width": null,
148 | "object_fit": null,
149 | "object_position": null,
150 | "order": null,
151 | "overflow": null,
152 | "overflow_x": null,
153 | "overflow_y": null,
154 | "padding": null,
155 | "right": null,
156 | "top": null,
157 | "visibility": null,
158 | "width": null
159 | }
160 | },
161 | "82eebd4dfe764734854070009499bc53": {
162 | "model_module": "@jupyter-widgets/base",
163 | "model_name": "LayoutModel",
164 | "model_module_version": "1.2.0",
165 | "state": {
166 | "_model_module": "@jupyter-widgets/base",
167 | "_model_module_version": "1.2.0",
168 | "_model_name": "LayoutModel",
169 | "_view_count": null,
170 | "_view_module": "@jupyter-widgets/base",
171 | "_view_module_version": "1.2.0",
172 | "_view_name": "LayoutView",
173 | "align_content": null,
174 | "align_items": null,
175 | "align_self": null,
176 | "border": null,
177 | "bottom": null,
178 | "display": null,
179 | "flex": null,
180 | "flex_flow": null,
181 | "grid_area": null,
182 | "grid_auto_columns": null,
183 | "grid_auto_flow": null,
184 | "grid_auto_rows": null,
185 | "grid_column": null,
186 | "grid_gap": null,
187 | "grid_row": null,
188 | "grid_template_areas": null,
189 | "grid_template_columns": null,
190 | "grid_template_rows": null,
191 | "height": null,
192 | "justify_content": null,
193 | "justify_items": null,
194 | "left": null,
195 | "margin": null,
196 | "max_height": null,
197 | "max_width": null,
198 | "min_height": null,
199 | "min_width": null,
200 | "object_fit": null,
201 | "object_position": null,
202 | "order": null,
203 | "overflow": null,
204 | "overflow_x": null,
205 | "overflow_y": null,
206 | "padding": null,
207 | "right": null,
208 | "top": null,
209 | "visibility": null,
210 | "width": null
211 | }
212 | },
213 | "024b11dc68434750a3dca9d7be5b75b6": {
214 | "model_module": "@jupyter-widgets/controls",
215 | "model_name": "DescriptionStyleModel",
216 | "model_module_version": "1.5.0",
217 | "state": {
218 | "_model_module": "@jupyter-widgets/controls",
219 | "_model_module_version": "1.5.0",
220 | "_model_name": "DescriptionStyleModel",
221 | "_view_count": null,
222 | "_view_module": "@jupyter-widgets/base",
223 | "_view_module_version": "1.2.0",
224 | "_view_name": "StyleView",
225 | "description_width": ""
226 | }
227 | },
228 | "6485161f74c744fd806e405a7a5e722a": {
229 | "model_module": "@jupyter-widgets/base",
230 | "model_name": "LayoutModel",
231 | "model_module_version": "1.2.0",
232 | "state": {
233 | "_model_module": "@jupyter-widgets/base",
234 | "_model_module_version": "1.2.0",
235 | "_model_name": "LayoutModel",
236 | "_view_count": null,
237 | "_view_module": "@jupyter-widgets/base",
238 | "_view_module_version": "1.2.0",
239 | "_view_name": "LayoutView",
240 | "align_content": null,
241 | "align_items": null,
242 | "align_self": null,
243 | "border": null,
244 | "bottom": null,
245 | "display": null,
246 | "flex": null,
247 | "flex_flow": null,
248 | "grid_area": null,
249 | "grid_auto_columns": null,
250 | "grid_auto_flow": null,
251 | "grid_auto_rows": null,
252 | "grid_column": null,
253 | "grid_gap": null,
254 | "grid_row": null,
255 | "grid_template_areas": null,
256 | "grid_template_columns": null,
257 | "grid_template_rows": null,
258 | "height": null,
259 | "justify_content": null,
260 | "justify_items": null,
261 | "left": null,
262 | "margin": null,
263 | "max_height": null,
264 | "max_width": null,
265 | "min_height": null,
266 | "min_width": null,
267 | "object_fit": null,
268 | "object_position": null,
269 | "order": null,
270 | "overflow": null,
271 | "overflow_x": null,
272 | "overflow_y": null,
273 | "padding": null,
274 | "right": null,
275 | "top": null,
276 | "visibility": null,
277 | "width": null
278 | }
279 | },
280 | "422e02ef01614a8fb6c9baecb32abe00": {
281 | "model_module": "@jupyter-widgets/controls",
282 | "model_name": "ProgressStyleModel",
283 | "model_module_version": "1.5.0",
284 | "state": {
285 | "_model_module": "@jupyter-widgets/controls",
286 | "_model_module_version": "1.5.0",
287 | "_model_name": "ProgressStyleModel",
288 | "_view_count": null,
289 | "_view_module": "@jupyter-widgets/base",
290 | "_view_module_version": "1.2.0",
291 | "_view_name": "StyleView",
292 | "bar_color": null,
293 | "description_width": ""
294 | }
295 | },
296 | "d54db20bb4d64240bea60519f3cc4650": {
297 | "model_module": "@jupyter-widgets/base",
298 | "model_name": "LayoutModel",
299 | "model_module_version": "1.2.0",
300 | "state": {
301 | "_model_module": "@jupyter-widgets/base",
302 | "_model_module_version": "1.2.0",
303 | "_model_name": "LayoutModel",
304 | "_view_count": null,
305 | "_view_module": "@jupyter-widgets/base",
306 | "_view_module_version": "1.2.0",
307 | "_view_name": "LayoutView",
308 | "align_content": null,
309 | "align_items": null,
310 | "align_self": null,
311 | "border": null,
312 | "bottom": null,
313 | "display": null,
314 | "flex": null,
315 | "flex_flow": null,
316 | "grid_area": null,
317 | "grid_auto_columns": null,
318 | "grid_auto_flow": null,
319 | "grid_auto_rows": null,
320 | "grid_column": null,
321 | "grid_gap": null,
322 | "grid_row": null,
323 | "grid_template_areas": null,
324 | "grid_template_columns": null,
325 | "grid_template_rows": null,
326 | "height": null,
327 | "justify_content": null,
328 | "justify_items": null,
329 | "left": null,
330 | "margin": null,
331 | "max_height": null,
332 | "max_width": null,
333 | "min_height": null,
334 | "min_width": null,
335 | "object_fit": null,
336 | "object_position": null,
337 | "order": null,
338 | "overflow": null,
339 | "overflow_x": null,
340 | "overflow_y": null,
341 | "padding": null,
342 | "right": null,
343 | "top": null,
344 | "visibility": null,
345 | "width": null
346 | }
347 | },
348 | "c5cb7c55d0684a60a60076459501ba65": {
349 | "model_module": "@jupyter-widgets/controls",
350 | "model_name": "DescriptionStyleModel",
351 | "model_module_version": "1.5.0",
352 | "state": {
353 | "_model_module": "@jupyter-widgets/controls",
354 | "_model_module_version": "1.5.0",
355 | "_model_name": "DescriptionStyleModel",
356 | "_view_count": null,
357 | "_view_module": "@jupyter-widgets/base",
358 | "_view_module_version": "1.2.0",
359 | "_view_name": "StyleView",
360 | "description_width": ""
361 | }
362 | }
363 | }
364 | }
365 | },
366 | "cells": [
367 | {
368 | "cell_type": "markdown",
369 | "metadata": {
370 | "id": "view-in-github",
371 | "colab_type": "text"
372 | },
373 | "source": [
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "source": [
380 | "# Prompt Compression with LlamaIndex and LlmLingua\n",
381 | "\n",
382 | "In `RAG` (Retrieval-Augmented Generation), a user query is typically sent to a vector store to fetch the most similar pieces of information. The retrieved context is then added to the prompt, so the input tokens consume the most resources: depending on the documents retrieved for the user's query, the prompt (input) can reach thousands of tokens.\n",
383 | "\n",
384 | "`Prompt compression` is a technique that shortens the original prompt while keeping the most important information, which speeds up the language model's response generation.\n",
385 | "\n",
386 | "It is based on the observation that language often includes unnecessary repetition.\n",
387 | "\n",
388 | "## LLMLingua\n",
389 | "\n",
390 | "https://github.com/microsoft/LLMLingua\n",
391 | "\n",
392 | "LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss."
393 | ],
394 | "metadata": {
395 | "id": "8s5m5rOhvNio"
396 | }
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": 1,
401 | "metadata": {
402 | "id": "26mZc1-BHZBy"
403 | },
404 | "outputs": [],
405 | "source": [
406 | "!pip install llama-index openai -q -U"
407 | ]
408 | },
409 | {
410 | "cell_type": "code",
411 | "source": [
412 | "import openai\n",
413 | "\n",
414 | "from google.colab import userdata\n",
415 | "openai.api_key = userdata.get('OPENAI_API_KEY')"
416 | ],
417 | "metadata": {
418 | "id": "PFObN5uiM_sS"
419 | },
420 | "execution_count": 2,
421 | "outputs": []
422 | },
423 | {
424 | "cell_type": "code",
425 | "source": [
426 | "from llama_index import ServiceContext, set_global_service_context\n",
427 | "from llama_index.llms import OpenAI\n",
428 | "from llama_index.callbacks import CallbackManager, TokenCountingHandler\n",
429 | "import tiktoken\n",
430 | "\n",
431 | "OPENAI_MODEL_NAME = \"gpt-3.5-turbo-16k\"\n",
432 | "\n",
433 | "llm = OpenAI(model=OPENAI_MODEL_NAME)\n",
434 | "\n",
435 | "token_counter = TokenCountingHandler(\n",
436 | " tokenizer=tiktoken.encoding_for_model(OPENAI_MODEL_NAME).encode\n",
437 | ")\n",
438 | "callback_manager = CallbackManager([token_counter])\n",
439 | "\n",
440 | "service_context = ServiceContext.from_defaults(\n",
441 | " llm=llm, callback_manager=callback_manager\n",
442 | ")\n",
443 | "set_global_service_context(service_context)"
444 | ],
445 | "metadata": {
446 | "id": "JtBt-ucONiNg"
447 | },
448 | "execution_count": 3,
449 | "outputs": []
450 | },
451 | {
452 | "cell_type": "code",
453 | "source": [
454 | "from llama_index import VectorStoreIndex, download_loader\n",
455 | "\n",
456 | "WikipediaReader = download_loader(\"WikipediaReader\")\n",
457 | "loader = WikipediaReader()\n",
458 | "documents = loader.load_data(pages=['Premier League'])"
459 | ],
460 | "metadata": {
461 | "id": "Vv2TgiiBXS-W"
462 | },
463 | "execution_count": 4,
464 | "outputs": []
465 | },
466 | {
467 | "cell_type": "code",
468 | "source": [
469 | "print(len(documents))"
470 | ],
471 | "metadata": {
472 | "colab": {
473 | "base_uri": "https://localhost:8080/"
474 | },
475 | "id": "GKvzpjsjX-ru",
476 | "outputId": "91e1c0cf-90ec-4917-ac1c-43953378e2bd"
477 | },
478 | "execution_count": 5,
479 | "outputs": [
480 | {
481 | "output_type": "stream",
482 | "name": "stdout",
483 | "text": [
484 | "1\n"
485 | ]
486 | }
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "source": [
492 | "retriever = VectorStoreIndex.from_documents(documents).as_retriever(similarity_top_k=3)"
493 | ],
494 | "metadata": {
495 | "id": "3ah1InsfN-Hz"
496 | },
497 | "execution_count": 6,
498 | "outputs": []
499 | },
500 | {
501 | "cell_type": "code",
502 | "source": [
503 | "question = \"Why did English clubs get banned from European competition?\""
504 | ],
505 | "metadata": {
506 | "id": "Wtb8jeA6OHlc"
507 | },
508 | "execution_count": 7,
509 | "outputs": []
510 | },
511 | {
512 | "cell_type": "code",
513 | "source": [
514 | "relevant_documents = retriever.retrieve(question)"
515 | ],
516 | "metadata": {
517 | "id": "UtkLls8JOKxw"
518 | },
519 | "execution_count": 8,
520 | "outputs": []
521 | },
522 | {
523 | "cell_type": "code",
524 | "source": [
525 | "contexts = [n.get_content() for n in relevant_documents]\n",
526 | "contexts"
527 | ],
528 | "metadata": {
529 | "colab": {
530 | "base_uri": "https://localhost:8080/"
531 | },
532 | "id": "s6ATDj-MOREP",
533 | "outputId": "3c7d9c73-30e7-4e67-b95b-496d88d77ff3"
534 | },
535 | "execution_count": 9,
536 | "outputs": [
537 | {
538 | "output_type": "execute_result",
539 | "data": {
540 | "text/plain": [
541 | "['== History ==\\n\\n\\n=== Origins ===\\nDespite significant European success in the 1970s and early 1980s, the late 1980s marked a low point for English football. Stadiums were deteriorating and supporters endured poor facilities, hooliganism was rife, and English clubs had been banned from European competition for five years following the Heysel Stadium disaster in 1985. The Football League First Division, the top level of English football since 1888, was behind leagues such as Italy\\'s Serie A and Spain\\'s La Liga in attendances and revenues, and several top English players had moved abroad.By the turn of the 1990s, the downward trend was starting to reverse. At the 1990 FIFA World Cup, England reached the semi-finals; UEFA, European football\\'s governing body, lifted the five-year ban on English clubs playing in European competitions in 1990, resulting in Manchester United lifting the Cup Winners\\' Cup in 1991. The Taylor Report on stadium safety standards, which proposed expensive upgrades to create all-seater stadiums in the aftermath of the Hillsborough disaster, was published in January 1990.During the 1980s, major English clubs had begun to transform into business ventures, applying commercial principles to club administration to maximise revenue. Martin Edwards of Manchester United, Irving Scholar of Tottenham Hotspur, and David Dein of Arsenal were among the leaders in this transformation. The commercial imperative led to the top clubs seeking to increase their power and revenue: the clubs in Division One threatened to break away from the Football League, and in doing so, they managed to increase their voting power and gain a more favourable financial arrangement, taking a 50% share of all television and sponsorship income in 1986. They demanded that television companies should pay more for their coverage of football matches, and revenue from television grew in importance. 
The Football League received £6.3 million for a two-year agreement in 1986, but by 1988, in a deal agreed with ITV, the price rose to £44 million over four years, with the leading clubs taking 75% of the cash. According to Scholar, who was involved in the negotiations of television deals, each of the First Division clubs received only around £25,000 per year from television rights before 1986, this increased to around £50,000 in the 1986 negotiation, then to £600,000 in 1988. The 1988 negotiations were conducted under the threat of ten clubs leaving to form a \"super league\", but they were eventually persuaded to stay, with the top clubs taking the lion\\'s share of the deal. The negotiations also convinced the bigger clubs that in order to receive enough votes, they needed to take the whole of First Division with them instead of a smaller \"super league\". By the beginning of the 1990s, the big clubs again considered breaking away, especially now that they had to fund the cost of stadium upgrade as proposed by the Taylor Report.In 1990, the managing director of London Weekend Television (LWT), Greg Dyke, met with the representatives of the \"big five\" football clubs in England (Manchester United, Liverpool, Tottenham Hotspur, Everton and Arsenal) over a dinner. The meeting was to pave the way for a breakaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. 
The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League\\'s position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.',\n",
542 | " '=== Performance in international competition ===\\n\\nWith 48 continental trophies won, English clubs are the third-most successful in European football, behind Italy (49) and Spain (65). In the top-tier UEFA Champions League, a record six English clubs have won a total of 15 titles and lost a further 11 finals, behind Spanish clubs with 19 and 11, respectively. In the second-tier UEFA Europa League, English clubs are also second, with nine victories and eight losses in the finals. In the former second-tier UEFA Cup Winners\\' Cup, English teams won a record eight titles and had a further five finalists. In the non-UEFA organized Inter-Cities Fairs Cup, English clubs provided four winners and four runners-up, the second-most behind Spain with six and three, respectively. In the newly created third-tier UEFA Europa Conference League, English clubs have won a joint-record one title so far. In the former fourth-tier UEFA Intertoto Cup, England won four titles and had a further final appearance, placing it fifth in the rankings, although English clubs were notorious for treating the tournament with disdain, either sending \"B\" squads or withdrawing from it altogether. In the one-off UEFA Super Cup, England has ten winners and ten runners-up, the second-most behind Spain with 16 and 15, respectively. Similarly to the Intertoto Cup, English teams did not take the former Intercontinental Cup seriously enough, despite its international status of the Club World Championship. They a made a total of six appearances in the one-off competition, winning only one of them, and withdrew a further three times. English clubs have won the FIFA-organized Club World Cup four times, tied for the second-most with Brazil, and behind only Spain, with eight.\\n\\n\\n== Sponsorship ==\\n\\nAfter an inaugural season with no sponsorship, the Premier League was sponsored by Carling from 1993 until 2001, during which time it was known as the FA Carling Premiership. 
In 2001, a new sponsorship deal with Barclaycard saw the league rebranded the FA Barclaycard Premiership, which was changed to the FA Barclays Premiership in time for the 2004–05 season.\\nFor the 2007–08 season, the league was rebranded the Barclays Premier League.\\nBarclays\\' deal with the Premier League expired at the end of the 2015–16 season. The FA announced on 4 June 2015 that it would not pursue any further title sponsorship deals for the Premier League, arguing that they wanted to build a \"clean\" brand for the competition more in line with those of major U.S. sports leagues.\\nAs well as sponsorship for the league itself, the Premier League has a number of official partners and suppliers. The official ball supplier for the league is Nike who have had the contract since the 2000–01 season when they took over from Mitre. Under its Merlin brand, Topps held the licence to produce collectables for the Premier League between 1994 and 2019 including stickers (for their sticker album) and trading cards. Launched in the 2007–08 season, Topps\\' Match Attax, the official Premier League trading card game, is the best selling boys collectable in the UK, and is also the biggest selling sports trading card game in the world. In October 2018, Panini were awarded the licence to produce collectables from the 2019–20 season. The chocolate company Cadbury has been the official snack partner of the Premier League since 2017, and sponsored the Golden Boot, Golden Glove and Playmaker of the Season awards from the 2017–18 season to 2019–20 season. The Coca-Cola Company (under its Coca-Cola Zero Sugar product line) sponsored these awards during the 2020–21 season with Castrol being the current sponsor as of the 2021–22 season.',\n",
543 | " '=== Foundation (1990s) ===\\n\\nAt the close of the 1990–1991 season, a proposal was tabled for the establishment of a new league that would bring more money into the game overall. The Founder Members Agreement, signed on 17 July 1991 by the game\\'s top-flight clubs, established the basic principles for setting up the FA Premier League. The newly formed top division was to have commercial independence from The Football Association and the Football League, giving the FA Premier League licence to negotiate its own broadcast and sponsorship agreements. The argument given at the time was that the extra income would allow English clubs to compete with teams across Europe. Although Dyke played a significant role in the creation of the Premier League, he and ITV (of which LWT was part) lost out in the bidding for broadcast rights: BSkyB won with a bid of £304 million over five years, with the BBC awarded the highlights package broadcast on Match of the Day.The First Division clubs resigned en masse from the Football League in 1992, and on 27 May that year the FA Premier League was formed as a limited company, working out of an office at the Football Association\\'s then headquarters in Lancaster Gate. The 22 inaugural members of the new Premier League were:\\n\\nThis meant a break-up of the 104-year-old Football League that had operated until then with four divisions; the Premier League would operate with a single division and the Football League with three. There was no change in competition format; the same number of teams competed in the top flight, and promotion and relegation between the Premier League and the new First Division remained the same as the old First and Second Divisions with three teams relegated from the league and three promoted.The league held its first season in 1992–93. It was composed of 22 clubs for that season (reduced to 20 in the 1995–96 season). 
The first Premier League goal was scored by Brian Deane of Sheffield United in a 2–1 win against Manchester United.Luton Town, Notts County, and West Ham United were the three teams relegated from the old First Division at the end of the 1991–92 season, and did not take part in the inaugural Premier League season.Manchester United won the inaugural edition of the new league, ending a twenty-six year wait to be crowned champions of England. Bolstered by this breakthrough, United immediately became the competition\\'s dominant team, winning seven of the first nine trophies, two League and FA Cup \\'doubles\\' and a European treble, initially under a team of hardened veterans such as Bryan Robson, Steve Bruce, Paul Ince, Mark Hughes and Eric Cantona, before Cantona, Bruce and Roy Keane led a young dynamic new team filled with the Class of 92, a group of young players including David Beckham who came through the Manchester United Academy.\\nBetween 1993 and 1997, Blackburn Rovers and Newcastle United came close to challenging Manchester United\\'s early dominance; Blackburn won the 1994–95 FA Premier League and Newcastle led the title charge over United for much of the 1995–96 season. As the decade closed, Arsenal replicated Manchester United\\'s dominance by winning the League and FA Cup double in 1997–98 and together the \"Big 2\" would form a duopoly over the league between 1997 and 2004.']"
544 | ]
545 | },
546 | "metadata": {},
547 | "execution_count": 9
548 | }
549 | ]
550 | },
551 | {
552 | "cell_type": "code",
553 | "source": [
554 | "from llama_index.prompts import PromptTemplate\n",
555 | "\n",
556 | "template = (\n",
557 | " \"Given the context information below: \\n\"\n",
558 | " \"---------------------\\n\"\n",
559 | " \"{context_str}\"\n",
560 | " \"\\n---------------------\\n\"\n",
561 | " \"Please answer the question: {query_str}\\n\"\n",
562 | ")\n",
563 | "\n",
564 | "qa_template = PromptTemplate(template)\n",
565 | "prompt = qa_template.format(context_str=\"\\n\\n\".join(contexts), query_str=question)\n",
566 | "\n",
567 | "response = llm.complete(prompt)"
568 | ],
569 | "metadata": {
570 | "id": "sshxLpd-WB44"
571 | },
572 | "execution_count": 10,
573 | "outputs": []
574 | },
575 | {
576 | "cell_type": "code",
577 | "source": [
578 | "response.text"
579 | ],
580 | "metadata": {
581 | "colab": {
582 | "base_uri": "https://localhost:8080/",
583 | "height": 36
584 | },
585 | "id": "0CIih0WpWlJX",
586 | "outputId": "51f94ed9-5baf-43e2-dc55-59fd51dd1a22"
587 | },
588 | "execution_count": 11,
589 | "outputs": [
590 | {
591 | "output_type": "execute_result",
592 | "data": {
593 | "text/plain": [
594 | "'English clubs were banned from European competition for five years following the Heysel Stadium disaster in 1985.'"
595 | ],
596 | "application/vnd.google.colaboratory.intrinsic+json": {
597 | "type": "string"
598 | }
599 | },
600 | "metadata": {},
601 | "execution_count": 11
602 | }
603 | ]
604 | },
605 | {
606 | "cell_type": "code",
607 | "source": [
608 | "print(\n",
609 | " \"Embedding Tokens: \",\n",
610 | " token_counter.total_embedding_token_count,\n",
611 | " \"\\n\",\n",
612 | " \"LLM Prompt Tokens: \",\n",
613 | " token_counter.prompt_llm_token_count,\n",
614 | " \"\\n\",\n",
615 | " \"LLM Completion Tokens: \",\n",
616 | " token_counter.completion_llm_token_count,\n",
617 | " \"\\n\",\n",
618 | " \"Total LLM Token Count: \",\n",
619 | " token_counter.total_llm_token_count,\n",
620 | " \"\\n\",\n",
621 | ")"
622 | ],
623 | "metadata": {
624 | "colab": {
625 | "base_uri": "https://localhost:8080/"
626 | },
627 | "id": "nf30v7hiX0j-",
628 | "outputId": "a3936047-ef44-443b-9df9-a5da818b4ba3"
629 | },
630 | "execution_count": 12,
631 | "outputs": [
632 | {
633 | "output_type": "stream",
634 | "name": "stdout",
635 | "text": [
636 | "Embedding Tokens: 13551 \n",
637 | " LLM Prompt Tokens: 2363 \n",
638 | " LLM Completion Tokens: 22 \n",
639 | " Total LLM Token Count: 2385 \n",
640 | "\n"
641 | ]
642 | }
643 | ]
644 | },
645 | {
646 | "cell_type": "code",
647 | "source": [
648 | "!pip install llmlingua accelerate -q -U"
649 | ],
650 | "metadata": {
651 | "id": "Bn1B9Rz4afsp"
652 | },
653 | "execution_count": 13,
654 | "outputs": []
655 | },
656 | {
657 | "cell_type": "code",
658 | "source": [
659 | "from llama_index.query_engine import RetrieverQueryEngine\n",
660 | "from llama_index.response_synthesizers import CompactAndRefine\n",
661 | "from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n",
662 | "node_postprocessor = LongLLMLinguaPostprocessor(\n",
663 | " instruction_str=\"Given the context, please answer the final question\",\n",
664 | " target_token=300,\n",
665 | " rank_method=\"longllmlingua\",\n",
666 | " additional_compress_kwargs={\n",
667 | " \"condition_compare\": True,\n",
668 | " \"condition_in_question\": \"after\",\n",
669 | " \"context_budget\": \"+100\",\n",
670 | " \"reorder_context\": \"sort\",\n",
671 | " \"dynamic_context_compression_ratio\": 0.3,\n",
672 | " },\n",
673 | ")"
674 | ],
675 | "metadata": {
676 | "colab": {
677 | "base_uri": "https://localhost:8080/",
678 | "height": 253,
679 | "referenced_widgets": [
680 | "53687d7d633941f0a54a6ec53a0db386",
681 | "8380e644a56b48c19a42e23bb39a98da",
682 | "0ce7633498da4978932f3dc6c0b94659",
683 | "2161e5d9426a416cb212f8107d23dcb5",
684 | "5508d2bfa3454b16ae682d5bfa62b5ff",
685 | "82eebd4dfe764734854070009499bc53",
686 | "024b11dc68434750a3dca9d7be5b75b6",
687 | "6485161f74c744fd806e405a7a5e722a",
688 | "422e02ef01614a8fb6c9baecb32abe00",
689 | "d54db20bb4d64240bea60519f3cc4650",
690 | "c5cb7c55d0684a60a60076459501ba65"
691 | ]
692 | },
693 | "id": "tpYopcTlZWX2",
694 | "outputId": "188b114f-7f83-425c-d0fb-277a8f1593f0"
695 | },
696 | "execution_count": 14,
697 | "outputs": [
698 | {
699 | "output_type": "stream",
700 | "name": "stderr",
701 | "text": [
702 | "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
703 | "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
704 | "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
705 | "You will be able to reuse this secret in all of your notebooks.\n",
706 | "Please note that authentication is recommended but still optional to access public models or datasets.\n",
707 | " warnings.warn(\n"
708 | ]
709 | },
710 | {
711 | "output_type": "display_data",
712 | "data": {
713 | "text/plain": [
714 | "Loading checkpoint shards: 0%| | 0/2 [00:00, ?it/s]"
715 | ],
716 | "application/vnd.jupyter.widget-view+json": {
717 | "version_major": 2,
718 | "version_minor": 0,
719 | "model_id": "53687d7d633941f0a54a6ec53a0db386"
720 | }
721 | },
722 | "metadata": {}
723 | },
724 | {
725 | "output_type": "stream",
726 | "name": "stderr",
727 | "text": [
728 | "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
729 | " warnings.warn(\n",
730 | "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
731 | " warnings.warn(\n"
732 | ]
733 | }
734 | ]
735 | },
736 | {
737 | "cell_type": "code",
738 | "source": [
739 | "retrieved_nodes = retriever.retrieve(question)\n",
740 | "synthesizer = CompactAndRefine()"
741 | ],
742 | "metadata": {
743 | "id": "80z15WSjzReI"
744 | },
745 | "execution_count": 15,
746 | "outputs": []
747 | },
748 | {
749 | "cell_type": "code",
750 | "source": [
751 | "retrieved_nodes"
752 | ],
753 | "metadata": {
754 | "colab": {
755 | "base_uri": "https://localhost:8080/"
756 | },
757 | "id": "Ae8dt9qv54Xu",
758 | "outputId": "e0ee5f93-fd1d-4e06-aa1b-51656942edc8"
759 | },
760 | "execution_count": 16,
761 | "outputs": [
762 | {
763 | "output_type": "execute_result",
764 | "data": {
765 | "text/plain": [
766 | "[NodeWithScore(node=TextNode(id_='6ec9417e-decb-4290-a9bf-62630e844bb4', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={: RelatedNodeInfo(node_id='28ebd3bd-9540-4e8a-9bff-6976c3aae981', node_type=, metadata={}, hash='49f783bdfd0ac70d7235553dac9a4b23a646a54a3fa319a87aceffd06b3c1310'), : RelatedNodeInfo(node_id='5f5926b3-c364-4754-b171-a3620c0827d9', node_type=, metadata={}, hash='6a2a1d44721d8e629023dcd71977bcc3bdba201f052ded1ccd93271ba73f4178'), : RelatedNodeInfo(node_id='25597b8b-b1c9-4769-bc2d-ee1bada663af', node_type=, metadata={}, hash='05d3b56d1e30dbad1d491ac9f1dcddeaa4be4f5001d73f5c22c7153de2ede65b')}, hash='38b2bd679b8bd0bcf92b1b878dd1dce93b1e13fb3774cd7ef562e6e1e9236019', text='== History ==\\n\\n\\n=== Origins ===\\nDespite significant European success in the 1970s and early 1980s, the late 1980s marked a low point for English football. Stadiums were deteriorating and supporters endured poor facilities, hooliganism was rife, and English clubs had been banned from European competition for five years following the Heysel Stadium disaster in 1985. The Football League First Division, the top level of English football since 1888, was behind leagues such as Italy\\'s Serie A and Spain\\'s La Liga in attendances and revenues, and several top English players had moved abroad.By the turn of the 1990s, the downward trend was starting to reverse. At the 1990 FIFA World Cup, England reached the semi-finals; UEFA, European football\\'s governing body, lifted the five-year ban on English clubs playing in European competitions in 1990, resulting in Manchester United lifting the Cup Winners\\' Cup in 1991. 
The Taylor Report on stadium safety standards, which proposed expensive upgrades to create all-seater stadiums in the aftermath of the Hillsborough disaster, was published in January 1990.During the 1980s, major English clubs had begun to transform into business ventures, applying commercial principles to club administration to maximise revenue. Martin Edwards of Manchester United, Irving Scholar of Tottenham Hotspur, and David Dein of Arsenal were among the leaders in this transformation. The commercial imperative led to the top clubs seeking to increase their power and revenue: the clubs in Division One threatened to break away from the Football League, and in doing so, they managed to increase their voting power and gain a more favourable financial arrangement, taking a 50% share of all television and sponsorship income in 1986. They demanded that television companies should pay more for their coverage of football matches, and revenue from television grew in importance. The Football League received £6.3 million for a two-year agreement in 1986, but by 1988, in a deal agreed with ITV, the price rose to £44 million over four years, with the leading clubs taking 75% of the cash. According to Scholar, who was involved in the negotiations of television deals, each of the First Division clubs received only around £25,000 per year from television rights before 1986, this increased to around £50,000 in the 1986 negotiation, then to £600,000 in 1988. The 1988 negotiations were conducted under the threat of ten clubs leaving to form a \"super league\", but they were eventually persuaded to stay, with the top clubs taking the lion\\'s share of the deal. The negotiations also convinced the bigger clubs that in order to receive enough votes, they needed to take the whole of First Division with them instead of a smaller \"super league\". 
By the beginning of the 1990s, the big clubs again considered breaking away, especially now that they had to fund the cost of stadium upgrade as proposed by the Taylor Report.In 1990, the managing director of London Weekend Television (LWT), Greg Dyke, met with the representatives of the \"big five\" football clubs in England (Manchester United, Liverpool, Tottenham Hotspur, Everton and Arsenal) over a dinner. The meeting was to pave the way for a breakaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League\\'s position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.', start_char_idx=2624, end_char_idx=6734, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.878433755179298),\n",
767 | " NodeWithScore(node=TextNode(id_='2570d30c-e15f-4859-9623-a21679e064ac', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={: RelatedNodeInfo(node_id='28ebd3bd-9540-4e8a-9bff-6976c3aae981', node_type=, metadata={}, hash='49f783bdfd0ac70d7235553dac9a4b23a646a54a3fa319a87aceffd06b3c1310'), : RelatedNodeInfo(node_id='ad2559e9-27d9-470d-8743-85bdc486b9a4', node_type=, metadata={}, hash='b50f48d6a7e334808cab9581ed47e2caf0320017ce2a77382d905e465e4a4a70'), : RelatedNodeInfo(node_id='2c71c7b2-eb25-4350-9987-d631a79a2ab1', node_type=, metadata={}, hash='aafe86b41a60bf8a8cda2344bf827a9e4d4d34b0ce86fce663763cac40a3871f')}, hash='9f95728b7f8e639234e64f166776d5363b347a4c38cfe6720eb1a59b232f6150', text='=== Performance in international competition ===\\n\\nWith 48 continental trophies won, English clubs are the third-most successful in European football, behind Italy (49) and Spain (65). In the top-tier UEFA Champions League, a record six English clubs have won a total of 15 titles and lost a further 11 finals, behind Spanish clubs with 19 and 11, respectively. In the second-tier UEFA Europa League, English clubs are also second, with nine victories and eight losses in the finals. In the former second-tier UEFA Cup Winners\\' Cup, English teams won a record eight titles and had a further five finalists. In the non-UEFA organized Inter-Cities Fairs Cup, English clubs provided four winners and four runners-up, the second-most behind Spain with six and three, respectively. In the newly created third-tier UEFA Europa Conference League, English clubs have won a joint-record one title so far. In the former fourth-tier UEFA Intertoto Cup, England won four titles and had a further final appearance, placing it fifth in the rankings, although English clubs were notorious for treating the tournament with disdain, either sending \"B\" squads or withdrawing from it altogether. 
In the one-off UEFA Super Cup, England has ten winners and ten runners-up, the second-most behind Spain with 16 and 15, respectively. Similarly to the Intertoto Cup, English teams did not take the former Intercontinental Cup seriously enough, despite its international status of the Club World Championship. They a made a total of six appearances in the one-off competition, winning only one of them, and withdrew a further three times. English clubs have won the FIFA-organized Club World Cup four times, tied for the second-most with Brazil, and behind only Spain, with eight.\\n\\n\\n== Sponsorship ==\\n\\nAfter an inaugural season with no sponsorship, the Premier League was sponsored by Carling from 1993 until 2001, during which time it was known as the FA Carling Premiership. In 2001, a new sponsorship deal with Barclaycard saw the league rebranded the FA Barclaycard Premiership, which was changed to the FA Barclays Premiership in time for the 2004–05 season.\\nFor the 2007–08 season, the league was rebranded the Barclays Premier League.\\nBarclays\\' deal with the Premier League expired at the end of the 2015–16 season. The FA announced on 4 June 2015 that it would not pursue any further title sponsorship deals for the Premier League, arguing that they wanted to build a \"clean\" brand for the competition more in line with those of major U.S. sports leagues.\\nAs well as sponsorship for the league itself, the Premier League has a number of official partners and suppliers. The official ball supplier for the league is Nike who have had the contract since the 2000–01 season when they took over from Mitre. Under its Merlin brand, Topps held the licence to produce collectables for the Premier League between 1994 and 2019 including stickers (for their sticker album) and trading cards. 
Launched in the 2007–08 season, Topps\\' Match Attax, the official Premier League trading card game, is the best selling boys collectable in the UK, and is also the biggest selling sports trading card game in the world. In October 2018, Panini were awarded the licence to produce collectables from the 2019–20 season. The chocolate company Cadbury has been the official snack partner of the Premier League since 2017, and sponsored the Golden Boot, Golden Glove and Playmaker of the Season awards from the 2017–18 season to 2019–20 season. The Coca-Cola Company (under its Coca-Cola Zero Sugar product line) sponsored these awards during the 2020–21 season with Castrol being the current sponsor as of the 2021–22 season.', start_char_idx=30324, end_char_idx=34008, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.8618792149943213),\n",
768 | " NodeWithScore(node=TextNode(id_='25597b8b-b1c9-4769-bc2d-ee1bada663af', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={: RelatedNodeInfo(node_id='28ebd3bd-9540-4e8a-9bff-6976c3aae981', node_type=, metadata={}, hash='49f783bdfd0ac70d7235553dac9a4b23a646a54a3fa319a87aceffd06b3c1310'), : RelatedNodeInfo(node_id='6ec9417e-decb-4290-a9bf-62630e844bb4', node_type=, metadata={}, hash='38b2bd679b8bd0bcf92b1b878dd1dce93b1e13fb3774cd7ef562e6e1e9236019'), : RelatedNodeInfo(node_id='6e4e797a-5953-44c3-841a-19ed91c4326a', node_type=, metadata={}, hash='de9dbc63008a8f665a208abfdc0ac575cf6e69d56fc356e92a158d4927010539')}, hash='05d3b56d1e30dbad1d491ac9f1dcddeaa4be4f5001d73f5c22c7153de2ede65b', text='=== Foundation (1990s) ===\\n\\nAt the close of the 1990–1991 season, a proposal was tabled for the establishment of a new league that would bring more money into the game overall. The Founder Members Agreement, signed on 17 July 1991 by the game\\'s top-flight clubs, established the basic principles for setting up the FA Premier League. The newly formed top division was to have commercial independence from The Football Association and the Football League, giving the FA Premier League licence to negotiate its own broadcast and sponsorship agreements. The argument given at the time was that the extra income would allow English clubs to compete with teams across Europe. Although Dyke played a significant role in the creation of the Premier League, he and ITV (of which LWT was part) lost out in the bidding for broadcast rights: BSkyB won with a bid of £304 million over five years, with the BBC awarded the highlights package broadcast on Match of the Day.The First Division clubs resigned en masse from the Football League in 1992, and on 27 May that year the FA Premier League was formed as a limited company, working out of an office at the Football Association\\'s then headquarters in Lancaster Gate. 
The 22 inaugural members of the new Premier League were:\\n\\nThis meant a break-up of the 104-year-old Football League that had operated until then with four divisions; the Premier League would operate with a single division and the Football League with three. There was no change in competition format; the same number of teams competed in the top flight, and promotion and relegation between the Premier League and the new First Division remained the same as the old First and Second Divisions with three teams relegated from the league and three promoted.The league held its first season in 1992–93. It was composed of 22 clubs for that season (reduced to 20 in the 1995–96 season). The first Premier League goal was scored by Brian Deane of Sheffield United in a 2–1 win against Manchester United.Luton Town, Notts County, and West Ham United were the three teams relegated from the old First Division at the end of the 1991–92 season, and did not take part in the inaugural Premier League season.Manchester United won the inaugural edition of the new league, ending a twenty-six year wait to be crowned champions of England. Bolstered by this breakthrough, United immediately became the competition\\'s dominant team, winning seven of the first nine trophies, two League and FA Cup \\'doubles\\' and a European treble, initially under a team of hardened veterans such as Bryan Robson, Steve Bruce, Paul Ince, Mark Hughes and Eric Cantona, before Cantona, Bruce and Roy Keane led a young dynamic new team filled with the Class of 92, a group of young players including David Beckham who came through the Manchester United Academy.\\nBetween 1993 and 1997, Blackburn Rovers and Newcastle United came close to challenging Manchester United\\'s early dominance; Blackburn won the 1994–95 FA Premier League and Newcastle led the title charge over United for much of the 1995–96 season. 
As the decade closed, Arsenal replicated Manchester United\\'s dominance by winning the League and FA Cup double in 1997–98 and together the \"Big 2\" would form a duopoly over the league between 1997 and 2004.', start_char_idx=6737, end_char_idx=10023, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.8386493120583591)]"
769 | ]
770 | },
771 | "metadata": {},
772 | "execution_count": 16
773 | }
774 | ]
775 | },
776 | {
777 | "cell_type": "code",
778 | "source": [
779 | "from llama_index.indices.query.schema import QueryBundle\n",
780 | "\n",
781 | "new_retrieved_nodes = node_postprocessor.postprocess_nodes(\n",
782 | " retrieved_nodes, query_bundle=QueryBundle(query_str=question)\n",
783 | ")"
784 | ],
785 | "metadata": {
786 | "id": "RwBETqYpZZWO"
787 | },
788 | "execution_count": 17,
789 | "outputs": []
790 | },
791 | {
792 | "cell_type": "code",
793 | "source": [
794 | "original_contexts = \"\\n\\n\".join([n.get_content() for n in retrieved_nodes])\n",
795 | "compressed_contexts = \"\\n\\n\".join([n.get_content() for n in new_retrieved_nodes])\n",
796 | "original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)  # _llm_lingua is a private attribute and may change between versions\n",
797 | "compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)\n",
798 | "print(compressed_contexts)\n",
799 | "print()\n",
800 | "print(\"Original Tokens:\", original_tokens)\n",
801 | "print(\"Compressed Tokens:\", compressed_tokens)\n",
802 | "print(\"Compressed Ratio:\", f\"{original_tokens/(compressed_tokens + 1e-5):.2f}x\")"
803 | ],
804 | "metadata": {
805 | "colab": {
806 | "base_uri": "https://localhost:8080/"
807 | },
808 | "id": "WVTBwsFJZeqN",
809 | "outputId": "e9b30369-cf36-4cbb-b6d0-d2e240affce9"
810 | },
811 | "execution_count": 18,
812 | "outputs": [
813 | {
814 | "output_type": "stream",
815 | "name": "stdout",
816 | "text": [
817 | "\n",
818 | " ===Desite significant European success in the 197 and the late9s marked a low for footballs andters poor facilities,ism wasife, and English had been European in5 The First level of English since, wass A attendues, top English moved abroad.By9 theend was reverse9,s; UEFA European, lifted the five-year on English playing in European competitions in19 in Cupinners1 Report on safety, proposed to stadiums the, in09s, major English begun to commercial club administrationvenue Edwards Unitedolarur and were the this transformation The commercial imperative the clubs seeking to increase their power the away from League in so, to increase their voting and gain more arrangement50 ofship income in18. They demanded companies for of football re from in receivedyear in98, but by, deal price leading clubs taking ofash. Sch, whoations ofals each First5 rights800 in6,08.188 the clubs form a \" but were eventually to, top taking theion deal Theations also in receive needed take whole of First Division of a By the90 the big again that had of stad.In the man, withbig) meetingaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League's position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.\n",
819 | "\n",
820 | "Original Tokens: 2768\n",
821 | "Compressed Tokens: 443\n",
822 | "Compressed Ratio: 6.25x\n"
823 | ]
824 | }
825 | ]
826 | },
827 | {
828 | "cell_type": "code",
829 | "source": [
830 | "token_counter.reset_counts()  # zero the counters so the next report covers only the compressed run"
831 | ],
832 | "metadata": {
833 | "id": "yrppFM1z-JJR"
834 | },
835 | "execution_count": 19,
836 | "outputs": []
837 | },
838 | {
839 | "cell_type": "code",
840 | "source": [
841 | "response = synthesizer.synthesize(question, new_retrieved_nodes)"
842 | ],
843 | "metadata": {
844 | "id": "Snht3Z8SaCeI"
845 | },
846 | "execution_count": 20,
847 | "outputs": []
848 | },
849 | {
850 | "cell_type": "code",
851 | "source": [
852 | "response"
853 | ],
854 | "metadata": {
855 | "colab": {
856 | "base_uri": "https://localhost:8080/"
857 | },
858 | "id": "esMKWsrUaD3X",
859 | "outputId": "1383303c-edb6-4cf8-f191-e9933617488f"
860 | },
861 | "execution_count": 21,
862 | "outputs": [
863 | {
864 | "output_type": "execute_result",
865 | "data": {
866 | "text/plain": [
867 | "Response(response='English clubs were banned from European competition due to poor facilities, hooliganism, and a history of violence.', source_nodes=[NodeWithScore(node=TextNode(id_='d1b09aff-6ac7-40f0-9ef6-797a03e872e3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='ea6d1aeb9d0b0098d02bbc88f3338265ea944b0ec66744e065e8296a7dfe1068', text='\\n ===Desite significant European success in the 197 and the late9s marked a low for footballs andters poor facilities,ism wasife, and English had been European in5 The First level of English since, wass A attendues, top English moved abroad.By9 theend was reverse9,s; UEFA European, lifted the five-year on English playing in European competitions in19 in Cupinners1 Report on safety, proposed to stadiums the, in09s, major English begun to commercial club administrationvenue Edwards Unitedolarur and were the this transformation The commercial imperative the clubs seeking to increase their power the away from League in so, to increase their voting and gain more arrangement50 ofship income in18. They demanded companies for of football re from in receivedyear in98, but by, deal price leading clubs taking ofash. Sch, whoations ofals each First5 rights800 in6,08.188 the clubs form a \" but were eventually to, top taking theion deal Theations also in receive needed take whole of First Division of a By the90 the big again that had of stad.In the man, withbig) meetingaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. 
The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League\\'s position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=None)], metadata={'d1b09aff-6ac7-40f0-9ef6-797a03e872e3': {}})"
868 | ]
869 | },
870 | "metadata": {},
871 | "execution_count": 21
872 | }
873 | ]
874 | },
875 | {
876 | "cell_type": "code",
877 | "source": [
878 | "response.response"
879 | ],
880 | "metadata": {
881 | "colab": {
882 | "base_uri": "https://localhost:8080/",
883 | "height": 36
884 | },
885 | "id": "DenC2vcNy7Fd",
886 | "outputId": "210b7d9f-41c0-4120-fca3-f40a74a95bcc"
887 | },
888 | "execution_count": 22,
889 | "outputs": [
890 | {
891 | "output_type": "execute_result",
892 | "data": {
893 | "text/plain": [
894 | "'English clubs were banned from European competition due to poor facilities, hooliganism, and a history of violence.'"
895 | ],
896 | "application/vnd.google.colaboratory.intrinsic+json": {
897 | "type": "string"
898 | }
899 | },
900 | "metadata": {},
901 | "execution_count": 22
902 | }
903 | ]
904 | },
905 | {
906 | "cell_type": "code",
907 | "source": [
908 | "print(\n",
909 | " \"Embedding Tokens: \",\n",
910 | " token_counter.total_embedding_token_count,\n",
911 | " \"\\n\",\n",
912 | " \"LLM Prompt Tokens: \",\n",
913 | " token_counter.prompt_llm_token_count,\n",
914 | " \"\\n\",\n",
915 | " \"LLM Completion Tokens: \",\n",
916 | " token_counter.completion_llm_token_count,\n",
917 | " \"\\n\",\n",
918 | " \"Total LLM Token Count: \",\n",
919 | " token_counter.total_llm_token_count,\n",
920 | " \"\\n\",\n",
921 | ")"
922 | ],
923 | "metadata": {
924 | "colab": {
925 | "base_uri": "https://localhost:8080/"
926 | },
927 | "id": "p732JJs0-Nz0",
928 | "outputId": "51613b7c-a3c3-4154-cf84-16de440c6631"
929 | },
930 | "execution_count": 23,
931 | "outputs": [
932 | {
933 | "output_type": "stream",
934 | "name": "stdout",
935 | "text": [
936 | "Embedding Tokens: 0 \n",
937 | " LLM Prompt Tokens: 525 \n",
938 | " LLM Completion Tokens: 23 \n",
939 | " Total LLM Token Count: 548 \n",
940 | "\n"
941 | ]
942 | }
943 | ]
944 | },
945 | {
946 | "cell_type": "code",
947 | "source": [
948 | "retrieved_nodes = retriever.retrieve(question)  # fetch candidate nodes for the question\n",
949 | "\n",
950 | "new_retrieved_nodes = node_postprocessor.postprocess_nodes(  # run the postprocessor over the retrieved nodes\n",
951 | " retrieved_nodes, query_bundle=QueryBundle(query_str=question)\n",
952 | ")\n",
953 | "\n",
954 | "contexts = [n.get_content() for n in new_retrieved_nodes]  # text of each postprocessed node\n",
955 | "prompt = qa_template.format(context_str=\"\\n\\n\".join(contexts), query_str=question)  # fill the QA template by hand\n",
956 | "response = llm.complete(prompt)  # send the assembled prompt straight to the LLM"
957 | ],
958 | "metadata": {
959 | "id": "N3qQz83U5YEn"
960 | },
961 | "execution_count": 24,
962 | "outputs": []
963 | },
964 | {
965 | "cell_type": "code",
966 | "source": [
967 | "new_retrieved_nodes"
968 | ],
969 | "metadata": {
970 | "colab": {
971 | "base_uri": "https://localhost:8080/"
972 | },
973 | "id": "y6CKc0Zl5wGV",
974 | "outputId": "51b42d17-a11c-4690-dafa-f202dfdf2811"
975 | },
976 | "execution_count": 25,
977 | "outputs": [
978 | {
979 | "output_type": "execute_result",
980 | "data": {
981 | "text/plain": [
982 | "[NodeWithScore(node=TextNode(id_='bfc9ecb3-ad7d-4d50-a47f-86ab58af44db', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='ea6d1aeb9d0b0098d02bbc88f3338265ea944b0ec66744e065e8296a7dfe1068', text='\\n ===Desite significant European success in the 197 and the late9s marked a low for footballs andters poor facilities,ism wasife, and English had been European in5 The First level of English since, wass A attendues, top English moved abroad.By9 theend was reverse9,s; UEFA European, lifted the five-year on English playing in European competitions in19 in Cupinners1 Report on safety, proposed to stadiums the, in09s, major English begun to commercial club administrationvenue Edwards Unitedolarur and were the this transformation The commercial imperative the clubs seeking to increase their power the away from League in so, to increase their voting and gain more arrangement50 ofship income in18. They demanded companies for of football re from in receivedyear in98, but by, deal price leading clubs taking ofash. Sch, whoations ofals each First5 rights800 in6,08.188 the clubs form a \" but were eventually to, top taking theion deal Theations also in receive needed take whole of First Division of a By the90 the big again that had of stad.In the man, withbig) meetingaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. 
The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League\\'s position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=None)]"
983 | ]
984 | },
985 | "metadata": {},
986 | "execution_count": 25
987 | }
988 | ]
989 | },
990 | {
991 | "cell_type": "code",
992 | "source": [
993 | "response.text"
994 | ],
995 | "metadata": {
996 | "colab": {
997 | "base_uri": "https://localhost:8080/",
998 | "height": 73
999 | },
1000 | "id": "HWoklTzM5jqQ",
1001 | "outputId": "f46e0749-f852-49a0-92ff-e847f545a11d"
1002 | },
1003 | "execution_count": 26,
1004 | "outputs": [
1005 | {
1006 | "output_type": "execute_result",
1007 | "data": {
1008 | "text/plain": [
1009 | "'English clubs were banned from European competition due to a series of incidents and issues in the 1980s. These included poor facilities, hooliganism, and a lack of safety measures in stadiums. The ban was imposed by UEFA, the governing body for European football, in an effort to address these problems and improve the overall image and safety of the sport. The ban lasted for five years, from 1985 to 1990.'"
1010 | ],
1011 | "application/vnd.google.colaboratory.intrinsic+json": {
1012 | "type": "string"
1013 | }
1014 | },
1015 | "metadata": {},
1016 | "execution_count": 26
1017 | }
1018 | ]
1019 | }
1020 | ]
1021 | }
--------------------------------------------------------------------------------