├── GraphRAG ├── .env ├── input │ └── book.txt ├── prompts │ ├── claim_extraction.txt │ ├── community_report.txt │ ├── drift_search_system_prompt.txt │ ├── entity_extraction.txt │ ├── global_search_knowledge_system_prompt.txt │ ├── global_search_map_system_prompt.txt │ ├── global_search_reduce_system_prompt.txt │ ├── local_search_system_prompt.txt │ ├── question_gen_system_prompt.txt │ └── summarize_descriptions.txt ├── settings.yaml └── utils │ └── graph_visual_with_neo4j.py ├── LightRAG ├── Dockerfile ├── LICENSE ├── README.md ├── examples │ ├── batch_eval.py │ ├── generate_query.py │ ├── graph_visual_with_html.py │ ├── graph_visual_with_neo4j.py │ ├── insert_custom_kg.py │ ├── lightrag_api_openai_compatible_demo.py │ ├── lightrag_api_oracle_demo..py │ ├── lightrag_azure_openai_demo.py │ ├── lightrag_bedrock_demo.py │ ├── lightrag_hf_demo.py │ ├── lightrag_lmdeploy_demo.py │ ├── lightrag_ollama_demo.py │ ├── lightrag_openai_compatible_demo.py │ ├── lightrag_openai_demo.py │ ├── lightrag_oracle_demo.py │ ├── lightrag_siliconcloud_demo.py │ └── vram_management_demo.py ├── get_all_edges_nx.py ├── lightrag │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-311.pyc │ │ ├── base.cpython-311.pyc │ │ ├── lightrag.cpython-311.pyc │ │ ├── llm.cpython-311.pyc │ │ ├── operate.cpython-311.pyc │ │ ├── prompt.cpython-311.pyc │ │ ├── storage.cpython-311.pyc │ │ └── utils.cpython-311.pyc │ ├── base.py │ ├── kg │ │ ├── __init__.py │ │ ├── __pycache__ │ │ │ ├── __init__.cpython-311.pyc │ │ │ ├── neo4j_impl.cpython-311.pyc │ │ │ └── oracle_impl.cpython-311.pyc │ │ ├── neo4j_impl.py │ │ └── oracle_impl.py │ ├── lightrag.py │ ├── llm.py │ ├── operate.py │ ├── prompt.py │ ├── storage.py │ └── utils.py ├── lightrag_hku.egg-info │ ├── PKG-INFO │ ├── SOURCES.txt │ ├── dependency_links.txt │ ├── requires.txt │ └── top_level.txt ├── nangeAGICode │ ├── input │ │ └── book.txt │ └── test.py ├── nangeAGICode1201 │ ├── files │ │ ├── incremental_inputs │ │ │ ├── 6.txt │ │ │ ├── 7.txt │ │ │ ├── 8.docx │ │ │ ├── 8.pdf │ │ │ ├── 9.docx │ │ │ └── 9.pdf │ │ └── inputs │ │ │ ├── 1.txt │ │ │ ├── 2.txt │ │ │ ├── 3.txt │ │ │ ├── 4.txt │ │ │ └── 5.txt │ ├── graph_visual_with_html.py │ ├── graph_visual_with_neo4j.py │ ├── insertTest.py │ ├── queryTest.py │ └── textract-16.5.zip ├── reproduce │ ├── Step_0.py │ ├── Step_1.py │ ├── Step_1_openai_compatible.py │ ├── Step_2.py │ ├── Step_3.py │ └── Step_3_openai_compatible.py ├── requirements.txt ├── setup.py ├── test.py └── test_neo4j.py ├── README.md ├── img.png └── results.md /GraphRAG/.env: -------------------------------------------------------------------------------- 1 | GRAPHRAG_API_BASE=https://api.wlai.vip/v1 2 | GRAPHRAG_CHAT_API_KEY=sk-dUWW1jzueJ4lrDixWaPsq7nnyN5bCucMzvldpNJwfJlIvAcC 3 | GRAPHRAG_CHAT_MODEL=gpt-4o-mini 4 | GRAPHRAG_EMBEDDING_API_KEY=sk-dUWW1jzueJ4lrDixWaPsq7nnyN5bCucMzvldpNJwfJlIvAcC 5 | GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small 6 | -------------------------------------------------------------------------------- /GraphRAG/prompts/claim_extraction.txt: -------------------------------------------------------------------------------- 1 | 2 | -Target activity- 3 | You are an intelligent assistant that helps a human analyst to analyze claims against certain entities presented in a text document. 4 | 5 | -Goal- 6 | Given a text document that is potentially relevant to this activity, an entity specification, and a claim description, extract all entities that match the entity specification and all claims against those entities. 7 | 8 | -Steps- 9 | 1. 
Extract all named entities that match the predefined entity specification. Entity specification can either be a list of entity names or a list of entity types. 10 | 2. For each entity identified in step 1, extract all claims associated with the entity. Claims need to match the specified claim description, and the entity should be the subject of the claim. 11 | For each claim, extract the following information: 12 | - Subject: name of the entity that is subject of the claim, capitalized. The subject entity is one that committed the action described in the claim. Subject needs to be one of the named entities identified in step 1. 13 | - Object: name of the entity that is object of the claim, capitalized. The object entity is one that either reports/handles or is affected by the action described in the claim. If object entity is unknown, use **NONE**. 14 | - Claim Type: overall category of the claim, capitalized. Name it in a way that can be repeated across multiple text inputs, so that similar claims share the same claim type 15 | - Claim Status: **TRUE**, **FALSE**, or **SUSPECTED**. TRUE means the claim is confirmed, FALSE means the claim is found to be False, SUSPECTED means the claim is not verified. 16 | - Claim Description: Detailed description explaining the reasoning behind the claim, together with all the related evidence and references. 17 | - Claim Date: Period (start_date, end_date) when the claim was made. Both start_date and end_date should be in ISO-8601 format. If the claim was made on a single date rather than a date range, set the same date for both start_date and end_date. If date is unknown, return **NONE**. 18 | - Claim Source Text: List of **all** quotes from the original text that are relevant to the claim. 19 | 20 | Format each claim as (<subject_entity>{tuple_delimiter}<object_entity>{tuple_delimiter}<claim_type>{tuple_delimiter}<claim_status>{tuple_delimiter}<claim_start_date>{tuple_delimiter}<claim_end_date>{tuple_delimiter}<claim_description>{tuple_delimiter}<claim_source_text>) 21 | 22 | 3. Return output in English as a single list of all the claims identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter. 23 | 24 | 4. When finished, output {completion_delimiter} 25 | 26 | -Examples- 27 | Example 1: 28 | Entity specification: organization 29 | Claim description: red flags associated with an entity 30 | Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015. 31 | Output: 32 | 33 | (COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.) 34 | {completion_delimiter} 35 | 36 | Example 2: 37 | Entity specification: Company A, Person C 38 | Claim description: red flags associated with an entity 39 | Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015. 
40 | Output: 41 | 42 | (COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.) 43 | {record_delimiter} 44 | (PERSON C{tuple_delimiter}NONE{tuple_delimiter}CORRUPTION{tuple_delimiter}SUSPECTED{tuple_delimiter}2015-01-01T00:00:00{tuple_delimiter}2015-12-30T00:00:00{tuple_delimiter}Person C was suspected of engaging in corruption activities in 2015{tuple_delimiter}The company is owned by Person C who was suspected of engaging in corruption activities in 2015) 45 | {completion_delimiter} 46 | 47 | -Real Data- 48 | Use the following input for your answer. 49 | Entity specification: {entity_specs} 50 | Claim description: {claim_description} 51 | Text: {input_text} 52 | Output: -------------------------------------------------------------------------------- /GraphRAG/prompts/community_report.txt: -------------------------------------------------------------------------------- 1 | 2 | You are an expert in literary analysis. You are skilled at textual interpretation, thematic exploration, and understanding narrative techniques. You are adept at helping people identify the relations and structures within literary communities, facilitating a deeper analysis of texts and their socio-cultural contexts. 3 | 4 | # Goal 5 | Write a comprehensive assessment report of a community taking on the role of a literary analyst that is examining a passage from the classic Chinese novel "Journey to the West," focusing on the character development of Sun Wukong (the Monkey King) and his relationships with other characters. The analysis will delve into themes of rebellion, identity, and the quest for immortality, providing insights into the cultural and social relevance of the text within the context of Chinese literature. The report will be used to educate readers and scholars about the nuanced character arcs and the overarching narrative techniques employed in the work. The content of this report includes an overview of the community's key entities and relationships. 6 | 7 | # Report Structure 8 | The report should include the following sections: 9 | - TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title. 10 | - SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant points associated with its entities. 11 | - REPORT RATING: A float score between 0-10 that represents the relevance of the text to literary analysis, narrative techniques, character development, and thematic exploration, with 1 being trivial or irrelevant and 10 being highly significant, profound, and impactful for understanding the text's meaning and its socio-cultural context. 12 | - RATING EXPLANATION: Give a single sentence explanation of the rating. 13 | - DETAILED FINDINGS: A list of 5-10 key insights about the community. 
Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive. 14 | 15 | Return output as a well-formed JSON-formatted string with the following format. Don't use any unnecessary escape sequences. The output should be a single JSON object that can be parsed by json.loads. 16 | { 17 | "title": "<report_title>", 18 | "summary": "<executive_summary>", 19 | "rating": <report_rating>, 20 | "rating_explanation": "<rating_explanation>", 21 | "findings": "[{"summary":"<insight_1_summary>", "explanation": "<insight_1_explanation>"}, {"summary":"<insight_2_summary>", "explanation": "<insight_2_explanation>"}]" 22 | } 23 | 24 | # Grounding Rules 25 | After each paragraph, add data record references if the content of the paragraph was derived from one or more data records. Reference is in the format of [records: <record_source> (<record_id_list>), ...<record_source> (<record_id_list>)]. If there are more than 10 data records, show the top 10 most relevant records. 26 | Each paragraph should contain multiple sentences of explanation and concrete examples with specific named entities. All paragraphs must have these references at the start and end. Use "NONE" if there are no related roles or records. Everything should be in Chinese. 27 | 28 | Example paragraph with references added: 29 | This is a paragraph of the output text [records: Entities (1, 2, 3), Claims (2, 5), Relationships (10, 12)] 30 | 31 | # Example Input 32 | ----------- 33 | Text: 34 | 35 | Entities 36 | 37 | id,entity,description 38 | 5,ABILA CITY PARK,Abila City Park is the location of the POK rally 39 | 40 | Relationships 41 | 42 | id,source,target,description 43 | 37,ABILA CITY PARK,POK RALLY,Abila City Park is the location of the POK rally 44 | 38,ABILA CITY PARK,POK,POK is holding a rally in Abila City Park 45 | 39,ABILA CITY PARK,POKRALLY,The POKRally is taking place at Abila City Park 46 | 40,ABILA CITY PARK,CENTRAL BULLETIN,Central Bulletin is reporting on the POK rally taking place in Abila City Park 47 | 48 | Output: 49 | { 50 | "title": "Abila City Park and POK Rally", 51 | "summary": "The community revolves around the Abila City Park, which is the location of the POK rally. The park has relationships with POK, POKRALLY, and Central Bulletin, all 52 | of which are associated with the rally event.", 53 | "rating": 5.0, 54 | "rating_explanation": "The impact rating is moderate due to the potential for unrest or conflict during the POK rally.", 55 | "findings": [ 56 | { 57 | "summary": "Abila City Park as the central location", 58 | "explanation": "Abila City Park is the central entity in this community, serving as the location for the POK rally. This park is the common link between all other 59 | entities, suggesting its significance in the community. The park's association with the rally could potentially lead to issues such as public disorder or conflict, depending on the 60 | nature of the rally and the reactions it provokes. [records: Entities (5), Relationships (37, 38, 39, 40)]" 61 | }, 62 | { 63 | "summary": "POK's role in the community", 64 | "explanation": "POK is another key entity in this community, being the organizer of the rally at Abila City Park. The nature of POK and its rally could be a potential 65 | source of threat, depending on their objectives and the reactions they provoke. The relationship between POK and the park is crucial in understanding the dynamics of this community. 66 | [records: Relationships (38)]" 67 | }, 68 | { 69 | "summary": "POKRALLY as a significant event", 70 | "explanation": "The POKRALLY is a significant event taking place at Abila City Park. This event is a key factor in the community's dynamics and could be a potential 71 | source of threat, depending on the nature of the rally and the reactions it provokes. The relationship between the rally and the park is crucial in understanding the dynamics of this 72 | community. 
[records: Relationships (39)]" 73 | }, 74 | { 75 | "summary": "Role of Central Bulletin", 76 | "explanation": "Central Bulletin is reporting on the POK rally taking place in Abila City Park. This suggests that the event has attracted media attention, which could 77 | amplify its impact on the community. The role of Central Bulletin could be significant in shaping public perception of the event and the entities involved. [records: Relationships 78 | (40)]" 79 | } 80 | ] 81 | 82 | } 83 | 84 | # Real Data 85 | 86 | Use the following text for your answer. Do not make anything up in your answer. 87 | 88 | Text: 89 | {input_text} 90 | Output: -------------------------------------------------------------------------------- /GraphRAG/prompts/drift_search_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | ---Role--- 3 | 4 | You are a helpful assistant responding to questions about data in the tables provided. 5 | 6 | 7 | ---Goal--- 8 | 9 | Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge. 10 | 11 | If you don't know the answer, just say so. Do not make anything up. 12 | 13 | Points supported by data should list their data references as follows: 14 | 15 | "This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]." 16 | 17 | Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 18 | 19 | For example: 20 | 21 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16)]." 22 | 23 | where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 24 | 25 | Pay close attention specifically to the Sources tables as it contains the most relevant information for the user query. You will be rewarded for preserving the context of the sources in your response. 26 | 27 | ---Target response length and format--- 28 | 29 | {response_type} 30 | 31 | 32 | ---Data tables--- 33 | 34 | {context_data} 35 | 36 | 37 | ---Goal--- 38 | 39 | Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge. 40 | 41 | If you don't know the answer, just say so. Do not make anything up. 42 | 43 | Points supported by data should list their data references as follows: 44 | 45 | "This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]." 46 | 47 | Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 48 | 49 | For example: 50 | 51 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16)]." 52 | 53 | where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 54 | 55 | Pay close attention specifically to the Sources tables as it contains the most relevant information for the user query. You will be rewarded for preserving the context of the sources in your response. 
56 | 57 | ---Target response length and format--- 58 | 59 | {response_type} 60 | 61 | Add sections and commentary to the response as appropriate for the length and format. 62 | 63 | Additionally provide a score between 0 and 100 representing how well the response addresses the overall research question: {global_query}. Based on your response, suggest up to five follow-up questions that could be asked to further explore the topic as it relates to the overall research question. Do not include scores or follow up questions in the 'response' field of the JSON, add them to the respective 'score' and 'follow_up_queries' keys of the JSON output. Format your response in JSON with the following keys and values: 64 | 65 | {{'response': str, Put your answer, formatted in markdown, here. Do not answer the global query in this section. 66 | 'score': int, 67 | 'follow_up_queries': List[str]}} 68 | -------------------------------------------------------------------------------- /GraphRAG/prompts/entity_extraction.txt: -------------------------------------------------------------------------------- 1 | 2 | -Goal- 3 | Given a text document that is potentially relevant to this activity, first identify all entities needed from the text in order to capture the information and ideas in the text. 4 | Next, report all relationships among the identified entities. 5 | 6 | -Steps- 7 | 1. Identify all entities. For each identified entity, extract the following information: 8 | - entity_name: Name of the entity, capitalized 9 | - entity_type: Suggest several labels or categories for the entity. The categories should not be specific, but should be as general as possible. 10 | - entity_description: Comprehensive description of the entity's attributes and activities 11 | Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>) 12 | 13 | 2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other. 14 | For each pair of related entities, extract the following information: 15 | - source_entity: name of the source entity, as identified in step 1 16 | - target_entity: name of the target entity, as identified in step 1 17 | - relationship_description: explanation as to why you think the source entity and the target entity are related to each other 18 | - relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity 19 | Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>) 20 | 21 | 3. Return output in Chinese as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter. 22 | 23 | 4. If you have to translate into Chinese, just translate the descriptions, nothing else! 24 | 25 | 5. When finished, output {completion_delimiter}. 
26 | 27 | -Examples- 28 | ###################### 29 | 30 | Example 1: 31 | 32 | text: 33 | �手走到里间,关上了门。师兄们看到师父生气了,感到很害怕,纷纷责怪孙悟空。 34 |   孙悟空既不怕,又不生气,心里反而十分高兴。当天晚上,悟空假装睡着了,可是一到半夜,就悄悄起来,从前门出去,等到三更,绕到后门口,看见门半开半闭,高兴地不得了,心想∶“哈哈,我没有猜错师父的意思。” 35 |   孙悟空走了进去,看见祖� 36 | ------------------------ 37 | output: 38 | ("entity"{tuple_delimiter}孙悟空{tuple_delimiter}人物{tuple_delimiter}孙悟空是故事中的主要角色,他不害怕师父的生气,并感到高兴。) 39 | {record_delimiter} 40 | ("entity"{tuple_delimiter}师父{tuple_delimiter}人物{tuple_delimiter}师父是孙悟空的老师,他对弟子们的行为感到生气。) 41 | {record_delimiter} 42 | ("entity"{tuple_delimiter}师兄们{tuple_delimiter}人物{tuple_delimiter}师兄们是师父的其他弟子,他们看到师父生气后感到害怕,并责怪孙悟空。) 43 | {record_delimiter} 44 | ("relationship"{tuple_delimiter}孙悟空{tuple_delimiter}师父{tuple_delimiter}孙悟空是师父的弟子,师父对他的行为感到生气。{tuple_delimiter}7) 45 | {record_delimiter} 46 | ("relationship"{tuple_delimiter}师兄们{tuple_delimiter}孙悟空{tuple_delimiter}师兄们责怪孙悟空因为师父生气。{tuple_delimiter}6) 47 | {completion_delimiter} 48 | ############################# 49 | 50 | 51 | Example 2: 52 | 53 | text: 54 | 。他下了木筏,登上了岸,看见岸边有许多人都在干活,有的捉鱼,有的打天上的大雁,有的挖蛤蜊,有的淘盐,他悄悄地走过去,没想到,吓得那些人将东西一扔,四处逃命。 55 |   这一天,他来到一座高山前,突然从半山腰的树林里传出一阵美妙的歌声,唱的是一些关于成仙的话。猴王想∶这个唱歌的人一定是神仙,就顺着歌声找去 56 | ------------------------ 57 | output: 58 | ("entity"{tuple_delimiter}木筏{tuple_delimiter}物体{tuple_delimiter}木筏是猴王所用的交通工具,用于过河或海的活动) 59 | {record_delimiter} 60 | ("entity"{tuple_delimiter}岸{tuple_delimiter}地方{tuple_delimiter}岸是河流或海洋的边缘,有许多人在岸边活动) 61 | {record_delimiter} 62 | ("entity"{tuple_delimiter}人{tuple_delimiter}个体{tuple_delimiter}岸边的许多人从事不同的活动,如捕鱼、打雁、挖蛤蜊和淘盐) 63 | {record_delimiter} 64 | ("entity"{tuple_delimiter}鱼{tuple_delimiter}生物{tuple_delimiter}鱼是岸边人们捕捉的水生生物) 65 | {record_delimiter} 66 | ("entity"{tuple_delimiter}天鹅{tuple_delimiter}生物{tuple_delimiter}天鹅是一些人在岸边捕打的飞禽) 67 | {record_delimiter} 68 | ("entity"{tuple_delimiter}蛤蜊{tuple_delimiter}生物{tuple_delimiter}蛤蜊是岸边被人们挖掘的海洋生物) 69 | {record_delimiter} 70 | ("entity"{tuple_delimiter}盐{tuple_delimiter}物品{tuple_delimiter}盐是岸边人们淘取并利用的物品) 71 | {record_delimiter} 72 | ("entity"{tuple_delimiter}猴王{tuple_delimiter}人物{tuple_delimiter}猴王是故事的主角,追随美妙的歌声寻找神仙) 73 | {record_delimiter} 74 | ("entity"{tuple_delimiter}山{tuple_delimiter}地形{tuple_delimiter}山是猴王在故事中到达的高地,代表了某种挑战和转折) 75 | {record_delimiter} 76 | ("entity"{tuple_delimiter}歌声{tuple_delimiter}声音{tuple_delimiter}歌声是一种美妙的声音,吸引猴王去寻找发声者) 77 | {record_delimiter} 78 | ("entity"{tuple_delimiter}神仙{tuple_delimiter}个体{tuple_delimiter}发出歌声的人被猴王认为是神仙,象征着超自然存在) 79 | {record_delimiter} 80 | ("relationship"{tuple_delimiter}猴王{tuple_delimiter}木筏{tuple_delimiter}猴王使用木筏作为交通工具来到岸边{tuple_delimiter}7) 81 | {record_delimiter} 82 | ("relationship"{tuple_delimiter}人{tuple_delimiter}岸{tuple_delimiter}岸边有许多人在进行各种活动{tuple_delimiter}8) 83 | {record_delimiter} 84 | ("relationship"{tuple_delimiter}鱼{tuple_delimiter}人{tuple_delimiter}岸边有人捕捉鱼{tuple_delimiter}6) 85 | {record_delimiter} 86 | ("relationship"{tuple_delimiter}天鹅{tuple_delimiter}人{tuple_delimiter}岸边有人打天鹅{tuple_delimiter}6) 87 | {record_delimiter} 88 | ("relationship"{tuple_delimiter}蛤蜊{tuple_delimiter}人{tuple_delimiter}岸边有人挖掘蛤蜊{tuple_delimiter}6) 89 | {record_delimiter} 90 | ("relationship"{tuple_delimiter}盐{tuple_delimiter}人{tuple_delimiter}岸边有人淘盐{tuple_delimiter}6) 91 | {record_delimiter} 92 | ("relationship"{tuple_delimiter}猴王{tuple_delimiter}山{tuple_delimiter}猴王来到高山前{tuple_delimiter}5) 93 | {record_delimiter} 94 | ("relationship"{tuple_delimiter}歌声{tuple_delimiter}猴王{tuple_delimiter}猴王被美妙的歌声吸引,追随声音的来源{tuple_delimiter}9) 95 | {record_delimiter} 96 | 
("relationship"{tuple_delimiter}神仙{tuple_delimiter}歌声{tuple_delimiter}猴王认为发出歌声的人是神仙{tuple_delimiter}8) 97 | {completion_delimiter} 98 | ############################# 99 | 100 | 101 | 102 | -Real Data- 103 | ###################### 104 | text: {input_text} 105 | ###################### 106 | output: 107 | -------------------------------------------------------------------------------- /GraphRAG/prompts/global_search_knowledge_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | The response may also include relevant real-world knowledge outside the dataset, but it must be explicitly annotated with a verification tag [LLM: verify]. For example: 3 | "This is an example sentence supported by real-world knowledge [LLM: verify]." 4 | -------------------------------------------------------------------------------- /GraphRAG/prompts/global_search_map_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | ---Role--- 3 | 4 | You are a helpful assistant responding to questions about data in the tables provided. 5 | 6 | 7 | ---Goal--- 8 | 9 | Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables. 10 | 11 | You should use the data provided in the data tables below as the primary context for generating the response. 12 | If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up. 13 | 14 | Each key point in the response should have the following element: 15 | - Description: A comprehensive description of the point. 16 | - Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0. 17 | 18 | The response should be JSON formatted as follows: 19 | {{ 20 | "points": [ 21 | {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}}, 22 | {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}} 23 | ] 24 | }} 25 | 26 | The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will". 27 | 28 | Points supported by data should list the relevant reports as references as follows: 29 | "This is an example sentence supported by data references [Data: Reports (report ids)]" 30 | 31 | **Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 32 | 33 | For example: 34 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]" 35 | 36 | where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables. 37 | 38 | Do not include information where the supporting evidence for it is not provided. 39 | 40 | 41 | ---Data tables--- 42 | 43 | {context_data} 44 | 45 | ---Goal--- 46 | 47 | Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables. 48 | 49 | You should use the data provided in the data tables below as the primary context for generating the response. 
50 | If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up. 51 | 52 | Each key point in the response should have the following element: 53 | - Description: A comprehensive description of the point. 54 | - Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0. 55 | 56 | The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will". 57 | 58 | Points supported by data should list the relevant reports as references as follows: 59 | "This is an example sentence supported by data references [Data: Reports (report ids)]" 60 | 61 | **Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 62 | 63 | For example: 64 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]" 65 | 66 | where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables. 67 | 68 | Do not include information where the supporting evidence for it is not provided. 69 | 70 | The response should be JSON formatted as follows: 71 | {{ 72 | "points": [ 73 | {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}}, 74 | {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}} 75 | ] 76 | }} 77 | -------------------------------------------------------------------------------- /GraphRAG/prompts/global_search_reduce_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | ---Role--- 3 | 4 | You are a helpful assistant responding to questions about a dataset by synthesizing perspectives from multiple analysts. 5 | 6 | 7 | ---Goal--- 8 | 9 | Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset. 10 | 11 | Note that the analysts' reports provided below are ranked in the **descending order of importance**. 12 | 13 | If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up. 14 | 15 | The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format. 16 | 17 | Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown. 18 | 19 | The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will". 20 | 21 | The response should also preserve all the data references previously included in the analysts' reports, but do not mention the roles of multiple analysts in the analysis process. 22 | 23 | **Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 
24 | 25 | For example: 26 | 27 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 34, 46, 64, +more)]. He is also CEO of company X [Data: Reports (1, 3)]" 28 | 29 | where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 30 | 31 | Do not include information where the supporting evidence for it is not provided. 32 | 33 | 34 | ---Target response length and format--- 35 | 36 | {response_type} 37 | 38 | 39 | ---Analyst Reports--- 40 | 41 | {report_data} 42 | 43 | 44 | ---Goal--- 45 | 46 | Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset. 47 | 48 | Note that the analysts' reports provided below are ranked in the **descending order of importance**. 49 | 50 | If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up. 51 | 52 | The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format. 53 | 54 | The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will". 55 | 56 | The response should also preserve all the data references previously included in the analysts' reports, but do not mention the roles of multiple analysts in the analysis process. 57 | 58 | **Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 59 | 60 | For example: 61 | 62 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 34, 46, 64, +more)]. He is also CEO of company X [Data: Reports (1, 3)]" 63 | 64 | where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 65 | 66 | Do not include information where the supporting evidence for it is not provided. 67 | 68 | 69 | ---Target response length and format--- 70 | 71 | {response_type} 72 | 73 | Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown. 74 | -------------------------------------------------------------------------------- /GraphRAG/prompts/local_search_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | ---Role--- 3 | 4 | You are a helpful assistant responding to questions about data in the tables provided. 5 | 6 | 7 | ---Goal--- 8 | 9 | Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge. 10 | 11 | If you don't know the answer, just say so. Do not make anything up. 12 | 13 | Points supported by data should list their data references as follows: 14 | 15 | "This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]." 16 | 17 | Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 
18 | 19 | For example: 20 | 21 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]." 22 | 23 | where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 24 | 25 | Do not include information where the supporting evidence for it is not provided. 26 | 27 | 28 | ---Target response length and format--- 29 | 30 | {response_type} 31 | 32 | 33 | ---Data tables--- 34 | 35 | {context_data} 36 | 37 | 38 | ---Goal--- 39 | 40 | Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge. 41 | 42 | If you don't know the answer, just say so. Do not make anything up. 43 | 44 | Points supported by data should list their data references as follows: 45 | 46 | "This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]." 47 | 48 | Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more. 49 | 50 | For example: 51 | 52 | "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]." 53 | 54 | where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record. 55 | 56 | Do not include information where the supporting evidence for it is not provided. 57 | 58 | 59 | ---Target response length and format--- 60 | 61 | {response_type} 62 | 63 | Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown. 64 | -------------------------------------------------------------------------------- /GraphRAG/prompts/question_gen_system_prompt.txt: -------------------------------------------------------------------------------- 1 | 2 | ---Role--- 3 | 4 | You are a helpful assistant generating a bulleted list of {question_count} questions about data in the tables provided. 5 | 6 | 7 | ---Data tables--- 8 | 9 | {context_data} 10 | 11 | 12 | ---Goal--- 13 | 14 | Given a series of example questions provided by the user, generate a bulleted list of {question_count} candidates for the next question. Use - marks as bullet points. 15 | 16 | These candidate questions should represent the most important or urgent information content or themes in the data tables. 17 | 18 | The candidate questions should be answerable using the data tables provided, but should not mention any specific data fields or data tables in the question text. 19 | 20 | If the user's questions reference several named entities, then each candidate question should reference all named entities. 21 | 22 | ---Example questions--- 23 | -------------------------------------------------------------------------------- /GraphRAG/prompts/summarize_descriptions.txt: -------------------------------------------------------------------------------- 1 | 2 | You are an expert in literary analysis. You are skilled at textual interpretation, thematic exploration, and understanding narrative techniques. 
You are adept at helping people identify the relations and structures within literary communities, facilitating a deeper analysis of texts and their socio-cultural contexts. 3 | Using your expertise, you're asked to generate a comprehensive summary of the data provided below. 4 | Given one or two entities, and a list of descriptions, all related to the same entity or group of entities. 5 | Please concatenate all of these into a single, concise description in Chinese. Make sure to include information collected from all the descriptions. 6 | If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary. 7 | Make sure it is written in third person, and include the entity names so we have the full context. 8 | 9 | Enrich it as much as you can with relevant information from the nearby text, this is very important. 10 | 11 | If no answer is possible, or the description is empty, only convey information that is provided within the text. 12 | ####### 13 | -Data- 14 | Entities: {entity_name} 15 | Description List: {description_list} 16 | ####### 17 | Output: -------------------------------------------------------------------------------- /GraphRAG/settings.yaml: -------------------------------------------------------------------------------- 1 | ### This config file contains required core defaults that must be set, along with a handful of common optional settings. 2 | ### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/ 3 | 4 | ### LLM settings ### 5 | ## There are a number of settings to tune the threading and token limits for LLM calls - check the docs. 6 | 7 | encoding_model: cl100k_base # this needs to be matched to your model! 8 | 9 | llm: 10 | api_key: ${GRAPHRAG_CHAT_API_KEY} # set this in the generated .env file 11 | type: openai_chat # or azure_openai_chat 12 | model: ${GRAPHRAG_CHAT_MODEL} 13 | model_supports_json: true # recommended if this is available for your model. 
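  # NOTE: the ${GRAPHRAG_*} references in this block are interpolated from the
  # GraphRAG/.env file shown above, so this YAML carries no secrets itself;
  # swapping models or keys only requires editing .env, with variable names
  # matching the ones referenced here.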
14 | # audience: "https://cognitiveservices.azure.com/.default" 15 | api_base: ${GRAPHRAG_API_BASE} 16 | # api_version: 2024-02-15-preview 17 | # organization: 18 | # deployment_name: 19 | 20 | parallelization: 21 | stagger: 0.3 22 | # num_threads: 50 23 | 24 | async_mode: threaded # or asyncio 25 | 26 | embeddings: 27 | async_mode: threaded # or asyncio 28 | vector_store: 29 | type: lancedb 30 | db_uri: 'output/lancedb' 31 | container_name: default 32 | overwrite: true 33 | llm: 34 | api_key: ${GRAPHRAG_EMBEDDING_API_KEY} 35 | type: openai_embedding # or azure_openai_embedding 36 | model: ${GRAPHRAG_EMBEDDING_MODEL} 37 | api_base: ${GRAPHRAG_API_BASE} 38 | # api_version: 2024-02-15-preview 39 | # audience: "https://cognitiveservices.azure.com/.default" 40 | # organization: 41 | # deployment_name: 42 | 43 | ### Input settings ### 44 | 45 | input: 46 | type: file # or blob 47 | file_type: text # or csv 48 | base_dir: "input" 49 | file_encoding: utf-8 50 | file_pattern: ".*\\.txt$" 51 | 52 | chunks: 53 | size: 1200 54 | overlap: 100 55 | group_by_columns: [id] 56 | 57 | ### Storage settings ### 58 | ## If blob storage is specified in the following four sections, 59 | ## connection_string and container_name must be provided 60 | 61 | cache: 62 | type: file # or blob 63 | base_dir: "cache" 64 | 65 | reporting: 66 | type: file # or console, blob 67 | base_dir: "logs" 68 | 69 | storage: 70 | type: file # or blob 71 | base_dir: "output" 72 | 73 | ## only turn this on if running `graphrag index` with custom settings 74 | ## we normally use `graphrag update` with the defaults 75 | update_index_storage: 76 | # type: file # or blob 77 | # base_dir: "update_output" 78 | 79 | ### Workflow settings ### 80 | 81 | skip_workflows: [] 82 | 83 | entity_extraction: 84 | prompt: "prompts/entity_extraction.txt" 85 | entity_types: [organization,person,geo,event] 86 | max_gleanings: 1 87 | 88 | summarize_descriptions: 89 | prompt: "prompts/summarize_descriptions.txt" 90 | max_length: 500 91 | 92 | claim_extraction: 93 | enabled: true 94 | prompt: "prompts/claim_extraction.txt" 95 | description: "Any claims or facts that could be relevant to information discovery." 96 | max_gleanings: 1 97 | 98 | community_reports: 99 | prompt: "prompts/community_report.txt" 100 | max_length: 2000 101 | max_input_length: 8000 102 | 103 | cluster_graph: 104 | max_cluster_size: 10 105 | 106 | embed_graph: 107 | enabled: false # if true, will generate node2vec embeddings for nodes 108 | 109 | umap: 110 | enabled: false # if true, will generate UMAP embeddings for nodes 111 | 112 | snapshots: 113 | graphml: false 114 | raw_entities: false 115 | top_level_nodes: false 116 | embeddings: false 117 | transient: false 118 | 119 | ### Query settings ### 120 | ## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned. 
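## A hedged usage sketch of how this config is consumed (CLI flag spellings
## vary across graphrag releases; confirm with `graphrag --help` first):
##   graphrag index --root ./GraphRAG
##   graphrag query --root ./GraphRAG --method local --query "孙悟空是如何拜师学艺的?"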
121 | ## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query 122 | 123 | local_search: 124 | prompt: "prompts/local_search_system_prompt.txt" 125 | 126 | global_search: 127 | map_prompt: "prompts/global_search_map_system_prompt.txt" 128 | reduce_prompt: "prompts/global_search_reduce_system_prompt.txt" 129 | knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt" 130 | 131 | drift_search: 132 | prompt: "prompts/drift_search_system_prompt.txt" 133 | -------------------------------------------------------------------------------- /LightRAG/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM debian:bullseye-slim 2 | ENV JAVA_HOME=/opt/java/openjdk 3 | COPY --from=eclipse-temurin:17 $JAVA_HOME $JAVA_HOME 4 | ENV PATH="${JAVA_HOME}/bin:${PATH}" \ 5 | NEO4J_SHA256=7ce97bd9a4348af14df442f00b3dc5085b5983d6f03da643744838c7a1bc8ba7 \ 6 | NEO4J_TARBALL=neo4j-enterprise-5.24.2-unix.tar.gz \ 7 | NEO4J_EDITION=enterprise \ 8 | NEO4J_HOME="/var/lib/neo4j" \ 9 | LANG=C.UTF-8 10 | ARG NEO4J_URI=https://dist.neo4j.org/neo4j-enterprise-5.24.2-unix.tar.gz 11 | 12 | RUN addgroup --gid 7474 --system neo4j && adduser --uid 7474 --system --no-create-home --home "${NEO4J_HOME}" --ingroup neo4j neo4j 13 | 14 | COPY ./local-package/* /startup/ 15 | 16 | RUN apt update \ 17 | && apt-get install -y curl gcc git jq make procps tini wget \ 18 | && curl --fail --silent --show-error --location --remote-name ${NEO4J_URI} \ 19 | && echo "${NEO4J_SHA256} ${NEO4J_TARBALL}" | sha256sum -c --strict --quiet \ 20 | && tar --extract --file ${NEO4J_TARBALL} --directory /var/lib \ 21 | && mv /var/lib/neo4j-* "${NEO4J_HOME}" \ 22 | && rm ${NEO4J_TARBALL} \ 23 | && sed -i 's/Package Type:.*/Package Type: docker bullseye/' $NEO4J_HOME/packaging_info \ 24 | && mv /startup/neo4j-admin-report.sh "${NEO4J_HOME}"/bin/neo4j-admin-report \ 25 | && mv "${NEO4J_HOME}"/data /data \ 26 | && mv "${NEO4J_HOME}"/logs /logs \ 27 | && chown -R neo4j:neo4j /data \ 28 | && chmod -R 777 /data \ 29 | && chown -R neo4j:neo4j /logs \ 30 | && chmod -R 777 /logs \ 31 | && chown -R neo4j:neo4j "${NEO4J_HOME}" \ 32 | && chmod -R 777 "${NEO4J_HOME}" \ 33 | && chmod -R 755 "${NEO4J_HOME}/bin" \ 34 | && ln -s /data "${NEO4J_HOME}"/data \ 35 | && ln -s /logs "${NEO4J_HOME}"/logs \ 36 | && git clone https://github.com/ncopa/su-exec.git \ 37 | && cd su-exec \ 38 | && git checkout 4c3bb42b093f14da70d8ab924b487ccfbb1397af \ 39 | && echo d6c40440609a23483f12eb6295b5191e94baf08298a856bab6e15b10c3b82891 su-exec.c | sha256sum -c \ 40 | && echo 2a87af245eb125aca9305a0b1025525ac80825590800f047419dc57bba36b334 Makefile | sha256sum -c \ 41 | && make \ 42 | && mv /su-exec/su-exec /usr/bin/su-exec \ 43 | && apt-get -y purge --auto-remove curl gcc git make \ 44 | && rm -rf /var/lib/apt/lists/* /su-exec 45 | 46 | 47 | ENV PATH "${NEO4J_HOME}"/bin:$PATH 48 | 49 | WORKDIR "${NEO4J_HOME}" 50 | 51 | VOLUME /data /logs 52 | 53 | EXPOSE 7474 7473 7687 54 | 55 | ENTRYPOINT ["tini", "-g", "--", "/startup/docker-entrypoint.sh"] 56 | CMD ["neo4j"] 57 | -------------------------------------------------------------------------------- /LightRAG/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Gustavo Ye 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without 
limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /LightRAG/examples/batch_eval.py: -------------------------------------------------------------------------------- 1 | import re 2 | import json 3 | import jsonlines 4 | 5 | from openai import OpenAI 6 | 7 | 8 | def batch_eval(query_file, result1_file, result2_file, output_file_path): 9 | client = OpenAI() 10 | 11 | with open(query_file, "r") as f: 12 | data = f.read() 13 | 14 | queries = re.findall(r"- Question \d+: (.+)", data) 15 | 16 | with open(result1_file, "r") as f: 17 | answers1 = json.load(f) 18 | answers1 = [i["result"] for i in answers1] 19 | 20 | with open(result2_file, "r") as f: 21 | answers2 = json.load(f) 22 | answers2 = [i["result"] for i in answers2] 23 | 24 | requests = [] 25 | for i, (query, answer1, answer2) in enumerate(zip(queries, answers1, answers2)): 26 | sys_prompt = """ 27 | ---Role--- 28 | You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**. 29 | """ 30 | 31 | prompt = f""" 32 | You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**. 33 | 34 | - **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question? 35 | - **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question? 36 | - **Empowerment**: How well does the answer help the reader understand and make informed judgments about the topic? 37 | 38 | For each criterion, choose the better answer (either Answer 1 or Answer 2) and explain why. Then, select an overall winner based on these three categories. 39 | 40 | Here is the question: 41 | {query} 42 | 43 | Here are the two answers: 44 | 45 | **Answer 1:** 46 | {answer1} 47 | 48 | **Answer 2:** 49 | {answer2} 50 | 51 | Evaluate both answers using the three criteria listed above and provide detailed explanations for each criterion. 
52 | 53 | Output your evaluation in the following JSON format: 54 | 55 | {{ 56 | "Comprehensiveness": {{ 57 | "Winner": "[Answer 1 or Answer 2]", 58 | "Explanation": "[Provide explanation here]" 59 | }}, 60 | "Diversity": {{ 61 | "Winner": "[Answer 1 or Answer 2]", 62 | "Explanation": "[Provide explanation here]" 63 | }}, 64 | "Empowerment": {{ 65 | "Winner": "[Answer 1 or Answer 2]", 66 | "Explanation": "[Provide explanation here]" 67 | }}, 68 | "Overall Winner": {{ 69 | "Winner": "[Answer 1 or Answer 2]", 70 | "Explanation": "[Summarize why this answer is the overall winner based on the three criteria]" 71 | }} 72 | }} 73 | """ 74 | 75 | request_data = { 76 | "custom_id": f"request-{i+1}", 77 | "method": "POST", 78 | "url": "/v1/chat/completions", 79 | "body": { 80 | "model": "gpt-4o-mini", 81 | "messages": [ 82 | {"role": "system", "content": sys_prompt}, 83 | {"role": "user", "content": prompt}, 84 | ], 85 | }, 86 | } 87 | 88 | requests.append(request_data) 89 | 90 | with jsonlines.open(output_file_path, mode="w") as writer: 91 | for request in requests: 92 | writer.write(request) 93 | 94 | print(f"Batch API requests written to {output_file_path}") 95 | 96 | batch_input_file = client.files.create( 97 | file=open(output_file_path, "rb"), purpose="batch" 98 | ) 99 | batch_input_file_id = batch_input_file.id 100 | 101 | batch = client.batches.create( 102 | input_file_id=batch_input_file_id, 103 | endpoint="/v1/chat/completions", 104 | completion_window="24h", 105 | metadata={"description": "nightly eval job"}, 106 | ) 107 | 108 | print(f"Batch {batch.id} has been created.") 109 | 110 | 111 | if __name__ == "__main__": 112 | # Example invocation; the four paths are placeholders. Point them at your 113 | # query list, the two result JSON files, and the desired .jsonl output. 114 | batch_eval("queries.txt", "result1.json", "result2.json", "batch_requests.jsonl") 115 | -------------------------------------------------------------------------------- /LightRAG/examples/generate_query.py: -------------------------------------------------------------------------------- 1 | from openai import OpenAI 2 | 3 | # os.environ["OPENAI_API_KEY"] = "" 4 | 5 | 6 | def openai_complete_if_cache( 7 | model="gpt-4o-mini", prompt=None, system_prompt=None, history_messages=[], **kwargs 8 | ) -> str: 9 | openai_client = OpenAI() 10 | 11 | messages = [] 12 | if system_prompt: 13 | messages.append({"role": "system", "content": system_prompt}) 14 | messages.extend(history_messages) 15 | messages.append({"role": "user", "content": prompt}) 16 | 17 | response = openai_client.chat.completions.create( 18 | model=model, messages=messages, **kwargs 19 | ) 20 | return response.choices[0].message.content 21 | 22 | 23 | if __name__ == "__main__": 24 | description = "" 25 | prompt = f""" 26 | Given the following description of a dataset: 27 | 28 | {description} 29 | 30 | Please identify 5 potential users who would engage with this dataset. For each user, list 5 tasks they would perform with this dataset. Then, for each (user, task) combination, generate 5 questions that require a high-level understanding of the entire dataset. 31 | 32 | Output the results in the following structure: 33 | - User 1: [user description] 34 | - Task 1: [task description] 35 | - Question 1: 36 | - Question 2: 37 | - Question 3: 38 | - Question 4: 39 | - Question 5: 40 | - Task 2: [task description] 41 | ... 42 | - Task 5: [task description] 43 | - User 2: [user description] 44 | ... 45 | - User 5: [user description] 46 | ... 
47 | """ 48 | 49 | result = openai_complete_if_cache(model="gpt-4o-mini", prompt=prompt) 50 | 51 | file_path = "./queries.txt" 52 | with open(file_path, "w") as file: 53 | file.write(result) 54 | 55 | print(f"Queries written to {file_path}") 56 | -------------------------------------------------------------------------------- /LightRAG/examples/graph_visual_with_html.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | from pyvis.network import Network 3 | import random 4 | 5 | # Load the GraphML file 6 | G = nx.read_graphml("./dickens/graph_chunk_entity_relation.graphml") 7 | 8 | # Create a Pyvis network 9 | net = Network(height="100vh", notebook=True) 10 | 11 | # Convert NetworkX graph to Pyvis network 12 | net.from_nx(G) 13 | 14 | # Add colors to nodes 15 | for node in net.nodes: 16 | node["color"] = "#{:06x}".format(random.randint(0, 0xFFFFFF)) 17 | 18 | # Save and display the network 19 | net.show("knowledge_graph.html") 20 | -------------------------------------------------------------------------------- /LightRAG/examples/graph_visual_with_neo4j.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from lightrag.utils import xml_to_json 4 | from neo4j import GraphDatabase 5 | 6 | # Constants 7 | WORKING_DIR = "./dickens" 8 | BATCH_SIZE_NODES = 500 9 | BATCH_SIZE_EDGES = 100 10 | 11 | # Neo4j connection credentials 12 | NEO4J_URI = "bolt://localhost:7687" 13 | NEO4J_USERNAME = "neo4j" 14 | NEO4J_PASSWORD = "your_password" 15 | 16 | 17 | def convert_xml_to_json(xml_path, output_path): 18 | """Converts XML file to JSON and saves the output.""" 19 | if not os.path.exists(xml_path): 20 | print(f"Error: File not found - {xml_path}") 21 | return None 22 | 23 | json_data = xml_to_json(xml_path) 24 | if json_data: 25 | with open(output_path, "w", encoding="utf-8") as f: 26 | json.dump(json_data, f, ensure_ascii=False, indent=2) 27 | print(f"JSON file created: {output_path}") 28 | return json_data 29 | else: 30 | print("Failed to create JSON data") 31 | return None 32 | 33 | 34 | def process_in_batches(tx, query, data, batch_size): 35 | """Process data in batches and execute the given query.""" 36 | for i in range(0, len(data), batch_size): 37 | batch = data[i : i + batch_size] 38 | tx.run(query, {"nodes": batch} if "nodes" in query else {"edges": batch}) 39 | 40 | 41 | def main(): 42 | # Paths 43 | xml_file = os.path.join(WORKING_DIR, "graph_chunk_entity_relation.graphml") 44 | json_file = os.path.join(WORKING_DIR, "graph_data.json") 45 | 46 | # Convert XML to JSON 47 | json_data = convert_xml_to_json(xml_file, json_file) 48 | if json_data is None: 49 | return 50 | 51 | # Load nodes and edges 52 | nodes = json_data.get("nodes", []) 53 | edges = json_data.get("edges", []) 54 | 55 | # Neo4j queries 56 | create_nodes_query = """ 57 | UNWIND $nodes AS node 58 | MERGE (e:Entity {id: node.id}) 59 | SET e.entity_type = node.entity_type, 60 | e.description = node.description, 61 | e.source_id = node.source_id, 62 | e.displayName = node.id 63 | REMOVE e:Entity 64 | WITH e, node 65 | CALL apoc.create.addLabels(e, [node.entity_type]) YIELD node AS labeledNode 66 | RETURN count(*) 67 | """ 68 | 69 | create_edges_query = """ 70 | UNWIND $edges AS edge 71 | MATCH (source {id: edge.source}) 72 | MATCH (target {id: edge.target}) 73 | WITH source, target, edge, 74 | CASE 75 | WHEN edge.keywords CONTAINS 'lead' THEN 'lead' 76 | WHEN edge.keywords CONTAINS 'participate' THEN 
'participate' 77 | WHEN edge.keywords CONTAINS 'uses' THEN 'uses' 78 | WHEN edge.keywords CONTAINS 'located' THEN 'located' 79 | WHEN edge.keywords CONTAINS 'occurs' THEN 'occurs' 80 | ELSE REPLACE(SPLIT(edge.keywords, ',')[0], '\"', '') 81 | END AS relType 82 | CALL apoc.create.relationship(source, relType, { 83 | weight: edge.weight, 84 | description: edge.description, 85 | keywords: edge.keywords, 86 | source_id: edge.source_id 87 | }, target) YIELD rel 88 | RETURN count(*) 89 | """ 90 | 91 | set_displayname_and_labels_query = """ 92 | MATCH (n) 93 | SET n.displayName = n.id 94 | WITH n 95 | CALL apoc.create.setLabels(n, [n.entity_type]) YIELD node 96 | RETURN count(*) 97 | """ 98 | 99 | # Create a Neo4j driver 100 | driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD)) 101 | 102 | try: 103 | # Execute queries in batches 104 | with driver.session() as session: 105 | # Insert nodes in batches 106 | session.execute_write( 107 | process_in_batches, create_nodes_query, nodes, BATCH_SIZE_NODES 108 | ) 109 | 110 | # Insert edges in batches 111 | session.execute_write( 112 | process_in_batches, create_edges_query, edges, BATCH_SIZE_EDGES 113 | ) 114 | 115 | # Set displayName and labels 116 | session.run(set_displayname_and_labels_query) 117 | 118 | except Exception as e: 119 | print(f"Error occurred: {e}") 120 | 121 | finally: 122 | driver.close() 123 | 124 | 125 | if __name__ == "__main__": 126 | main() 127 | -------------------------------------------------------------------------------- /LightRAG/examples/insert_custom_kg.py: -------------------------------------------------------------------------------- 1 | import os 2 | from lightrag import LightRAG 3 | from lightrag.llm import gpt_4o_mini_complete 4 | ######### 5 | # Uncomment the below two lines if running in a jupyter notebook to handle the async nature of rag.insert() 6 | # import nest_asyncio 7 | # nest_asyncio.apply() 8 | ######### 9 | 10 | WORKING_DIR = "./custom_kg" 11 | 12 | if not os.path.exists(WORKING_DIR): 13 | os.mkdir(WORKING_DIR) 14 | 15 | rag = LightRAG( 16 | working_dir=WORKING_DIR, 17 | llm_model_func=gpt_4o_mini_complete, # Use gpt_4o_mini_complete LLM model 18 | # llm_model_func=gpt_4o_complete # Optionally, use a stronger model 19 | ) 20 | 21 | custom_kg = { 22 | "entities": [ 23 | { 24 | "entity_name": "CompanyA", 25 | "entity_type": "Organization", 26 | "description": "A major technology company", 27 | "source_id": "Source1", 28 | }, 29 | { 30 | "entity_name": "ProductX", 31 | "entity_type": "Product", 32 | "description": "A popular product developed by CompanyA", 33 | "source_id": "Source1", 34 | }, 35 | { 36 | "entity_name": "PersonA", 37 | "entity_type": "Person", 38 | "description": "A renowned researcher in AI", 39 | "source_id": "Source2", 40 | }, 41 | { 42 | "entity_name": "UniversityB", 43 | "entity_type": "Organization", 44 | "description": "A leading university specializing in technology and sciences", 45 | "source_id": "Source2", 46 | }, 47 | { 48 | "entity_name": "CityC", 49 | "entity_type": "Location", 50 | "description": "A large metropolitan city known for its culture and economy", 51 | "source_id": "Source3", 52 | }, 53 | { 54 | "entity_name": "EventY", 55 | "entity_type": "Event", 56 | "description": "An annual technology conference held in CityC", 57 | "source_id": "Source3", 58 | }, 59 | { 60 | "entity_name": "CompanyD", 61 | "entity_type": "Organization", 62 | "description": "A financial services company specializing in insurance", 63 | "source_id": "Source4", 64 | }, 
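        # NOTE: each entity and relationship in this custom KG carries a source_id,
        # which is how LightRAG traces graph elements back to the source text
        # (here symbolic "Source1".."Source4" labels rather than real chunks).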
65 | { 66 | "entity_name": "ServiceZ", 67 | "entity_type": "Service", 68 | "description": "An insurance product offered by CompanyD", 69 | "source_id": "Source4", 70 | }, 71 | ], 72 | "relationships": [ 73 | { 74 | "src_id": "CompanyA", 75 | "tgt_id": "ProductX", 76 | "description": "CompanyA develops ProductX", 77 | "keywords": "develop, produce", 78 | "weight": 1.0, 79 | "source_id": "Source1", 80 | }, 81 | { 82 | "src_id": "PersonA", 83 | "tgt_id": "UniversityB", 84 | "description": "PersonA works at UniversityB", 85 | "keywords": "employment, affiliation", 86 | "weight": 0.9, 87 | "source_id": "Source2", 88 | }, 89 | { 90 | "src_id": "CityC", 91 | "tgt_id": "EventY", 92 | "description": "EventY is hosted in CityC", 93 | "keywords": "host, location", 94 | "weight": 0.8, 95 | "source_id": "Source3", 96 | }, 97 | { 98 | "src_id": "CompanyD", 99 | "tgt_id": "ServiceZ", 100 | "description": "CompanyD provides ServiceZ", 101 | "keywords": "provide, offer", 102 | "weight": 1.0, 103 | "source_id": "Source4", 104 | }, 105 | ], 106 | } 107 | 108 | rag.insert_custom_kg(custom_kg) 109 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_api_openai_compatible_demo.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, HTTPException, File, UploadFile 2 | from pydantic import BaseModel 3 | import os 4 | from lightrag import LightRAG, QueryParam 5 | from lightrag.llm import openai_complete_if_cache, openai_embedding 6 | from lightrag.utils import EmbeddingFunc 7 | import numpy as np 8 | from typing import Optional 9 | import asyncio 10 | import nest_asyncio 11 | 12 | # Apply nest_asyncio to solve event loop issues 13 | nest_asyncio.apply() 14 | 15 | DEFAULT_RAG_DIR = "index_default" 16 | app = FastAPI(title="LightRAG API", description="API for RAG operations") 17 | 18 | # Configure working directory 19 | WORKING_DIR = os.environ.get("RAG_DIR", f"{DEFAULT_RAG_DIR}") 20 | print(f"WORKING_DIR: {WORKING_DIR}") 21 | LLM_MODEL = os.environ.get("LLM_MODEL", "gpt-4o-mini") 22 | print(f"LLM_MODEL: {LLM_MODEL}") 23 | EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "text-embedding-3-large") 24 | print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}") 25 | EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 8192)) 26 | print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}") 27 | 28 | if not os.path.exists(WORKING_DIR): 29 | os.mkdir(WORKING_DIR) 30 | 31 | 32 | # LLM model function 33 | 34 | 35 | async def llm_model_func( 36 | prompt, system_prompt=None, history_messages=[], **kwargs 37 | ) -> str: 38 | return await openai_complete_if_cache( 39 | LLM_MODEL, 40 | prompt, 41 | system_prompt=system_prompt, 42 | history_messages=history_messages, 43 | **kwargs, 44 | ) 45 | 46 | 47 | # Embedding function 48 | 49 | 50 | async def embedding_func(texts: list[str]) -> np.ndarray: 51 | return await openai_embedding( 52 | texts, 53 | model=EMBEDDING_MODEL, 54 | ) 55 | 56 | 57 | async def get_embedding_dim(): 58 | test_text = ["This is a test sentence."] 59 | embedding = await embedding_func(test_text) 60 | embedding_dim = embedding.shape[1] 61 | print(f"{embedding_dim=}") 62 | return embedding_dim 63 | 64 | 65 | # Initialize RAG instance 66 | rag = LightRAG( 67 | working_dir=WORKING_DIR, 68 | llm_model_func=llm_model_func, 69 | embedding_func=EmbeddingFunc( 70 | embedding_dim=asyncio.run(get_embedding_dim()), 71 | max_token_size=EMBEDDING_MAX_TOKEN_SIZE, 72 | func=embedding_func, 73 | 
), 74 | ) 75 | 76 | 77 | # Data models 78 | 79 | 80 | class QueryRequest(BaseModel): 81 | query: str 82 | mode: str = "hybrid" 83 | only_need_context: bool = False 84 | 85 | 86 | class InsertRequest(BaseModel): 87 | text: str 88 | 89 | 90 | class Response(BaseModel): 91 | status: str 92 | data: Optional[str] = None 93 | message: Optional[str] = None 94 | 95 | 96 | # API routes 97 | 98 | 99 | @app.post("/query", response_model=Response) 100 | async def query_endpoint(request: QueryRequest): 101 | try: 102 | loop = asyncio.get_event_loop() 103 | result = await loop.run_in_executor( 104 | None, 105 | lambda: rag.query( 106 | request.query, 107 | param=QueryParam( 108 | mode=request.mode, only_need_context=request.only_need_context 109 | ), 110 | ), 111 | ) 112 | return Response(status="success", data=result) 113 | except Exception as e: 114 | raise HTTPException(status_code=500, detail=str(e)) 115 | 116 | 117 | @app.post("/insert", response_model=Response) 118 | async def insert_endpoint(request: InsertRequest): 119 | try: 120 | loop = asyncio.get_event_loop() 121 | await loop.run_in_executor(None, lambda: rag.insert(request.text)) 122 | return Response(status="success", message="Text inserted successfully") 123 | except Exception as e: 124 | raise HTTPException(status_code=500, detail=str(e)) 125 | 126 | 127 | @app.post("/insert_file", response_model=Response) 128 | async def insert_file(file: UploadFile = File(...)): 129 | try: 130 | file_content = await file.read() 131 | # Read file content 132 | try: 133 | content = file_content.decode("utf-8") 134 | except UnicodeDecodeError: 135 | # If UTF-8 decoding fails, fall back to GBK 136 | content = file_content.decode("gbk") 137 | # Insert file content 138 | loop = asyncio.get_event_loop() 139 | await loop.run_in_executor(None, lambda: rag.insert(content)) 140 | 141 | return Response( 142 | status="success", 143 | message=f"File content from {file.filename} inserted successfully", 144 | ) 145 | except Exception as e: 146 | raise HTTPException(status_code=500, detail=str(e)) 147 | 148 | 149 | @app.get("/health") 150 | async def health_check(): 151 | return {"status": "healthy"} 152 | 153 | 154 | if __name__ == "__main__": 155 | import uvicorn 156 | 157 | uvicorn.run(app, host="0.0.0.0", port=8020) 158 | 159 | # Usage example 160 | # To run the server, use the following command in your terminal: 161 | # python lightrag_api_openai_compatible_demo.py 162 | 163 | # Example requests: 164 | # 1. Query: 165 | # curl -X POST "http://127.0.0.1:8020/query" -H "Content-Type: application/json" -d '{"query": "your query here", "mode": "hybrid"}' 166 | 167 | # 2. Insert text: 168 | # curl -X POST "http://127.0.0.1:8020/insert" -H "Content-Type: application/json" -d '{"text": "your text here"}' 169 | 170 | # 3. Insert file (multipart upload matching the UploadFile parameter): 171 | # curl -X POST "http://127.0.0.1:8020/insert_file" -F "file=@path/to/your/file.txt" 172 | 173 | # 4.
Health check: 174 | # curl -X GET "http://127.0.0.1:8020/health" 175 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_api_oracle_demo..py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, HTTPException, File, UploadFile 2 | from fastapi import Query 3 | from contextlib import asynccontextmanager 4 | from pydantic import BaseModel 5 | from typing import Optional, Any 6 | 7 | import sys 8 | import os 9 | 10 | 11 | from pathlib import Path 12 | 13 | import asyncio 14 | import nest_asyncio 15 | from lightrag import LightRAG, QueryParam 16 | from lightrag.llm import openai_complete_if_cache, openai_embedding 17 | from lightrag.utils import EmbeddingFunc 18 | import numpy as np 19 | 20 | from lightrag.kg.oracle_impl import OracleDB 21 | 22 | print(os.getcwd()) 23 | script_directory = Path(__file__).resolve().parent.parent 24 | sys.path.append(os.path.abspath(script_directory)) 25 | 26 | 27 | # Apply nest_asyncio to solve event loop issues 28 | nest_asyncio.apply() 29 | 30 | DEFAULT_RAG_DIR = "index_default" 31 | 32 | 33 | # We use an OpenAI-compatible API to call the LLM on Oracle Cloud 34 | # More docs here https://github.com/jin38324/OCI_GenAI_access_gateway 35 | BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/" 36 | APIKEY = "ocigenerativeai" 37 | 38 | # Configure working directory 39 | WORKING_DIR = os.environ.get("RAG_DIR", f"{DEFAULT_RAG_DIR}") 40 | print(f"WORKING_DIR: {WORKING_DIR}") 41 | LLM_MODEL = os.environ.get("LLM_MODEL", "cohere.command-r-plus-08-2024") 42 | print(f"LLM_MODEL: {LLM_MODEL}") 43 | EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "cohere.embed-multilingual-v3.0") 44 | print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}") 45 | EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 512)) 46 | print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}") 47 | 48 | if not os.path.exists(WORKING_DIR): 49 | os.mkdir(WORKING_DIR) 50 | 51 | 52 | async def llm_model_func( 53 | prompt, system_prompt=None, history_messages=[], **kwargs 54 | ) -> str: 55 | return await openai_complete_if_cache( 56 | LLM_MODEL, 57 | prompt, 58 | system_prompt=system_prompt, 59 | history_messages=history_messages, 60 | api_key=APIKEY, 61 | base_url=BASE_URL, 62 | **kwargs, 63 | ) 64 | 65 | 66 | async def embedding_func(texts: list[str]) -> np.ndarray: 67 | return await openai_embedding( 68 | texts, 69 | model=EMBEDDING_MODEL, 70 | api_key=APIKEY, 71 | base_url=BASE_URL, 72 | ) 73 | 74 | 75 | async def get_embedding_dim(): 76 | test_text = ["This is a test sentence."] 77 | embedding = await embedding_func(test_text) 78 | embedding_dim = embedding.shape[1] 79 | return embedding_dim 80 | 81 | 82 | async def init(): 83 | # Detect embedding dimension 84 | embedding_dimension = await get_embedding_dim() 85 | print(f"Detected embedding dimension: {embedding_dimension}") 86 | # Create Oracle DB connection 87 | # The `config` parameter is the connection configuration of Oracle DB 88 | # More docs here https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html 89 | # We store data in unified tables, so we need to set a `workspace` parameter to specify which docs to store and query 90 | # Below is an example of how to connect to Oracle Autonomous Database on Oracle Cloud 91 | 92 | oracle_db = OracleDB( 93 | config={ 94 | "user": "", 95 | "password": "", 96 | "dsn": "", 97 | "config_dir": "path_to_config_dir", 98 | "wallet_location":
"path_to_wallet_location", 99 | "wallet_password": "wallet_password", 100 | "workspace": "company", 101 | } # specify which docs you want to store and query 102 | ) 103 | 104 | # Check if Oracle DB tables exist, if not, tables will be created 105 | await oracle_db.check_tables() 106 | # Initialize LightRAG 107 | # We use Oracle DB as the KV/vector/graph storage 108 | # You can add `addon_params={"example_number": 1, "language": "Simplfied Chinese"}` to control the prompt 109 | rag = LightRAG( 110 | enable_llm_cache=False, 111 | working_dir=WORKING_DIR, 112 | chunk_token_size=512, 113 | llm_model_func=llm_model_func, 114 | embedding_func=EmbeddingFunc( 115 | embedding_dim=embedding_dimension, 116 | max_token_size=512, 117 | func=embedding_func, 118 | ), 119 | graph_storage="OracleGraphStorage", 120 | kv_storage="OracleKVStorage", 121 | vector_storage="OracleVectorDBStorage", 122 | ) 123 | 124 | # Setthe KV/vector/graph storage's `db` property, so all operation will use same connection pool 125 | rag.graph_storage_cls.db = oracle_db 126 | rag.key_string_value_json_storage_cls.db = oracle_db 127 | rag.vector_db_storage_cls.db = oracle_db 128 | 129 | return rag 130 | 131 | 132 | # Extract and Insert into LightRAG storage 133 | # with open("./dickens/book.txt", "r", encoding="utf-8") as f: 134 | # await rag.ainsert(f.read()) 135 | 136 | # # Perform search in different modes 137 | # modes = ["naive", "local", "global", "hybrid"] 138 | # for mode in modes: 139 | # print("="*20, mode, "="*20) 140 | # print(await rag.aquery("这篇文档是关于什么内容的?", param=QueryParam(mode=mode))) 141 | # print("-"*100, "\n") 142 | 143 | # Data models 144 | 145 | 146 | class QueryRequest(BaseModel): 147 | query: str 148 | mode: str = "hybrid" 149 | only_need_context: bool = False 150 | only_need_prompt: bool = False 151 | 152 | 153 | class DataRequest(BaseModel): 154 | limit: int = 100 155 | 156 | 157 | class InsertRequest(BaseModel): 158 | text: str 159 | 160 | 161 | class Response(BaseModel): 162 | status: str 163 | data: Optional[Any] = None 164 | message: Optional[str] = None 165 | 166 | 167 | # API routes 168 | 169 | rag = None 170 | 171 | 172 | @asynccontextmanager 173 | async def lifespan(app: FastAPI): 174 | global rag 175 | rag = await init() 176 | print("done!") 177 | yield 178 | 179 | 180 | app = FastAPI( 181 | title="LightRAG API", description="API for RAG operations", lifespan=lifespan 182 | ) 183 | 184 | 185 | @app.post("/query", response_model=Response) 186 | async def query_endpoint(request: QueryRequest): 187 | # try: 188 | # loop = asyncio.get_event_loop() 189 | if request.mode == "naive": 190 | top_k = 3 191 | else: 192 | top_k = 60 193 | result = await rag.aquery( 194 | request.query, 195 | param=QueryParam( 196 | mode=request.mode, 197 | only_need_context=request.only_need_context, 198 | only_need_prompt=request.only_need_prompt, 199 | top_k=top_k, 200 | ), 201 | ) 202 | return Response(status="success", data=result) 203 | # except Exception as e: 204 | # raise HTTPException(status_code=500, detail=str(e)) 205 | 206 | 207 | @app.get("/data", response_model=Response) 208 | async def query_all_nodes(type: str = Query("nodes"), limit: int = Query(100)): 209 | if type == "nodes": 210 | result = await rag.chunk_entity_relation_graph.get_all_nodes(limit=limit) 211 | elif type == "edges": 212 | result = await rag.chunk_entity_relation_graph.get_all_edges(limit=limit) 213 | elif type == "statistics": 214 | result = await rag.chunk_entity_relation_graph.get_statistics() 215 | return Response(status="success", 
data=result) 216 | 217 | 218 | @app.post("/insert", response_model=Response) 219 | async def insert_endpoint(request: InsertRequest): 220 | try: 221 | loop = asyncio.get_event_loop() 222 | await loop.run_in_executor(None, lambda: rag.insert(request.text)) 223 | return Response(status="success", message="Text inserted successfully") 224 | except Exception as e: 225 | raise HTTPException(status_code=500, detail=str(e)) 226 | 227 | 228 | @app.post("/insert_file", response_model=Response) 229 | async def insert_file(file: UploadFile = File(...)): 230 | try: 231 | file_content = await file.read() 232 | # Read file content 233 | try: 234 | content = file_content.decode("utf-8") 235 | except UnicodeDecodeError: 236 | # If UTF-8 decoding fails, fall back to GBK 237 | content = file_content.decode("gbk") 238 | # Insert file content 239 | loop = asyncio.get_event_loop() 240 | await loop.run_in_executor(None, lambda: rag.insert(content)) 241 | 242 | return Response( 243 | status="success", 244 | message=f"File content from {file.filename} inserted successfully", 245 | ) 246 | except Exception as e: 247 | raise HTTPException(status_code=500, detail=str(e)) 248 | 249 | 250 | @app.get("/health") 251 | async def health_check(): 252 | return {"status": "healthy"} 253 | 254 | 255 | if __name__ == "__main__": 256 | import uvicorn 257 | 258 | uvicorn.run(app, host="127.0.0.1", port=8020) 259 | 260 | # Usage example 261 | # To run the server, use the following command in your terminal: 262 | # python lightrag_api_oracle_demo..py 263 | 264 | # Example requests: 265 | # 1. Query: 266 | # curl -X POST "http://127.0.0.1:8020/query" -H "Content-Type: application/json" -d '{"query": "your query here", "mode": "hybrid"}' 267 | 268 | # 2. Insert text: 269 | # curl -X POST "http://127.0.0.1:8020/insert" -H "Content-Type: application/json" -d '{"text": "your text here"}' 270 | 271 | # 3. Insert file (multipart upload matching the UploadFile parameter): 272 | # curl -X POST "http://127.0.0.1:8020/insert_file" -F "file=@path/to/your/file.txt" 273 | 274 | # 4.
Health check: 275 | # curl -X GET "http://127.0.0.1:8020/health" 276 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_azure_openai_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import asyncio 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.utils import EmbeddingFunc 5 | import numpy as np 6 | from dotenv import load_dotenv 7 | import logging 8 | from openai import AzureOpenAI 9 | 10 | logging.basicConfig(level=logging.INFO) 11 | 12 | load_dotenv() 13 | 14 | AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION") 15 | AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT") 16 | AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY") 17 | AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT") 18 | 19 | AZURE_EMBEDDING_DEPLOYMENT = os.getenv("AZURE_EMBEDDING_DEPLOYMENT") 20 | AZURE_EMBEDDING_API_VERSION = os.getenv("AZURE_EMBEDDING_API_VERSION") 21 | 22 | WORKING_DIR = "./dickens" 23 | 24 | if os.path.exists(WORKING_DIR): 25 | import shutil 26 | 27 | shutil.rmtree(WORKING_DIR) 28 | 29 | os.mkdir(WORKING_DIR) 30 | 31 | 32 | async def llm_model_func( 33 | prompt, system_prompt=None, history_messages=[], **kwargs 34 | ) -> str: 35 | client = AzureOpenAI( 36 | api_key=AZURE_OPENAI_API_KEY, 37 | api_version=AZURE_OPENAI_API_VERSION, 38 | azure_endpoint=AZURE_OPENAI_ENDPOINT, 39 | ) 40 | 41 | messages = [] 42 | if system_prompt: 43 | messages.append({"role": "system", "content": system_prompt}) 44 | if history_messages: 45 | messages.extend(history_messages) 46 | messages.append({"role": "user", "content": prompt}) 47 | 48 | chat_completion = client.chat.completions.create( 49 | model=AZURE_OPENAI_DEPLOYMENT, # model = "deployment_name". 50 | messages=messages, 51 | temperature=kwargs.get("temperature", 0), 52 | top_p=kwargs.get("top_p", 1), 53 | n=kwargs.get("n", 1), 54 | ) 55 | return chat_completion.choices[0].message.content 56 | 57 | 58 | async def embedding_func(texts: list[str]) -> np.ndarray: 59 | client = AzureOpenAI( 60 | api_key=AZURE_OPENAI_API_KEY, 61 | api_version=AZURE_EMBEDDING_API_VERSION, 62 | azure_endpoint=AZURE_OPENAI_ENDPOINT, 63 | ) 64 | embedding = client.embeddings.create(model=AZURE_EMBEDDING_DEPLOYMENT, input=texts) 65 | 66 | embeddings = [item.embedding for item in embedding.data] 67 | return np.array(embeddings) 68 | 69 | 70 | async def test_funcs(): 71 | result = await llm_model_func("How are you?") 72 | print("llm_model_func response: ", result) 73 | 74 | result = await embedding_func(["How are you?"]) 75 | print("embedding_func result shape: ", result.shape) 76 | print("Embedding dimension: ", result.shape[1]) 77 | 78 | 79 | asyncio.run(test_funcs()) 80 | 81 | embedding_dimension = 3072 # must match the Azure embedding deployment (3072 for text-embedding-3-large) 82 | 83 | rag = LightRAG( 84 | working_dir=WORKING_DIR, 85 | llm_model_func=llm_model_func, 86 | embedding_func=EmbeddingFunc( 87 | embedding_dim=embedding_dimension, 88 | max_token_size=8192, 89 | func=embedding_func, 90 | ), 91 | ) 92 | 93 | # Use context managers so the file handles are closed after reading 94 | with open("./book_1.txt", encoding="utf-8") as book1, open("./book_2.txt", encoding="utf-8") as book2: 95 | rag.insert([book1.read(), book2.read()]) 96 | 97 | 98 | query_text = "What are the main themes?"
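# Editor's note (not part of the original script): the four queries below differ only in
# retrieval mode. "naive" is plain chunk-level vector RAG; "local" retrieves entity-centric
# context from the knowledge graph; "global" works over relationship-level context; "hybrid"
# combines local and global retrieval. The same comparison can be written as a loop:
#
# for mode in ["naive", "local", "global", "hybrid"]:
#     print(f"\nResult ({mode.capitalize()}):")
#     print(rag.query(query_text, param=QueryParam(mode=mode)))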
99 | 100 | print("Result (Naive):") 101 | print(rag.query(query_text, param=QueryParam(mode="naive"))) 102 | 103 | print("\nResult (Local):") 104 | print(rag.query(query_text, param=QueryParam(mode="local"))) 105 | 106 | print("\nResult (Global):") 107 | print(rag.query(query_text, param=QueryParam(mode="global"))) 108 | 109 | print("\nResult (Hybrid):") 110 | print(rag.query(query_text, param=QueryParam(mode="hybrid"))) 111 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_bedrock_demo.py: -------------------------------------------------------------------------------- 1 | """ 2 | LightRAG meets Amazon Bedrock ⛰️ 3 | """ 4 | 5 | import os 6 | import logging 7 | 8 | from lightrag import LightRAG, QueryParam 9 | from lightrag.llm import bedrock_complete, bedrock_embedding 10 | from lightrag.utils import EmbeddingFunc 11 | 12 | logging.getLogger("aiobotocore").setLevel(logging.WARNING) 13 | 14 | WORKING_DIR = "./dickens" 15 | if not os.path.exists(WORKING_DIR): 16 | os.mkdir(WORKING_DIR) 17 | 18 | rag = LightRAG( 19 | working_dir=WORKING_DIR, 20 | llm_model_func=bedrock_complete, 21 | llm_model_name="Anthropic Claude 3 Haiku // Amazon Bedrock", 22 | embedding_func=EmbeddingFunc( 23 | embedding_dim=1024, max_token_size=8192, func=bedrock_embedding 24 | ), 25 | ) 26 | 27 | with open("./book.txt", "r", encoding="utf-8") as f: 28 | rag.insert(f.read()) 29 | 30 | for mode in ["naive", "local", "global", "hybrid"]: 31 | print("\n+-" + "-" * len(mode) + "-+") 32 | print(f"| {mode.capitalize()} |") 33 | print("+-" + "-" * len(mode) + "-+\n") 34 | print( 35 | rag.query("What are the top themes in this story?", param=QueryParam(mode=mode)) 36 | ) 37 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_hf_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import hf_model_complete, hf_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | from transformers import AutoModel, AutoTokenizer 7 | 8 | WORKING_DIR = "./dickens" 9 | 10 | if not os.path.exists(WORKING_DIR): 11 | os.mkdir(WORKING_DIR) 12 | 13 | rag = LightRAG( 14 | working_dir=WORKING_DIR, 15 | llm_model_func=hf_model_complete, 16 | llm_model_name="meta-llama/Llama-3.1-8B-Instruct", 17 | embedding_func=EmbeddingFunc( 18 | embedding_dim=384, 19 | max_token_size=5000, 20 | func=lambda texts: hf_embedding( 21 | texts, 22 | tokenizer=AutoTokenizer.from_pretrained( 23 | "sentence-transformers/all-MiniLM-L6-v2" 24 | ), 25 | embed_model=AutoModel.from_pretrained( 26 | "sentence-transformers/all-MiniLM-L6-v2" 27 | ), 28 | ), 29 | ), 30 | ) 31 | 32 | 33 | with open("./book.txt", "r", encoding="utf-8") as f: 34 | rag.insert(f.read()) 35 | 36 | # Perform naive search 37 | print( 38 | rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")) 39 | ) 40 | 41 | # Perform local search 42 | print( 43 | rag.query("What are the top themes in this story?", param=QueryParam(mode="local")) 44 | ) 45 | 46 | # Perform global search 47 | print( 48 | rag.query("What are the top themes in this story?", param=QueryParam(mode="global")) 49 | ) 50 | 51 | # Perform hybrid search 52 | print( 53 | rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")) 54 | ) 55 | -------------------------------------------------------------------------------- 
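A side note on the Hugging Face demo above: `embedding_dim=384` is hardcoded and must match the hidden size of the chosen sentence-transformers checkpoint. Below is a minimal sketch (an editorial example, not a file in this repository) that derives the dimension from the model config instead, assuming the same all-MiniLM-L6-v2 model:

from transformers import AutoConfig

# Read the hidden size from the checkpoint's config so the value passed to
# EmbeddingFunc(embedding_dim=...) always matches the model actually loaded.
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
embedding_dim = AutoConfig.from_pretrained(MODEL_NAME).hidden_size  # 384 for MiniLM-L6-v2
print(embedding_dim)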
/LightRAG/examples/lightrag_lmdeploy_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import lmdeploy_model_if_cache, hf_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | from transformers import AutoModel, AutoTokenizer 7 | 8 | WORKING_DIR = "./dickens" 9 | 10 | if not os.path.exists(WORKING_DIR): 11 | os.mkdir(WORKING_DIR) 12 | 13 | 14 | async def lmdeploy_model_complete( 15 | prompt=None, system_prompt=None, history_messages=[], **kwargs 16 | ) -> str: 17 | model_name = kwargs["hashing_kv"].global_config["llm_model_name"] 18 | return await lmdeploy_model_if_cache( 19 | model_name, 20 | prompt, 21 | system_prompt=system_prompt, 22 | history_messages=history_messages, 23 | ## Specify chat_template if your local model path does not follow the original HF naming, 24 | ## or if model_name is a PyTorch model hosted on huggingface.co. 25 | ## See https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py 26 | ## for the list of chat templates available in lmdeploy. 27 | chat_template="llama3", 28 | # model_format="awq", # if you are using an AWQ-quantized model. 29 | # quant_policy=8, # to use online KV cache: 4 = KV int4, 8 = KV int8. 30 | **kwargs, 31 | ) 32 | 33 | 34 | rag = LightRAG( 35 | working_dir=WORKING_DIR, 36 | llm_model_func=lmdeploy_model_complete, 37 | llm_model_name="meta-llama/Llama-3.1-8B-Instruct", # use an explicit path for a local model 38 | embedding_func=EmbeddingFunc( 39 | embedding_dim=384, 40 | max_token_size=5000, 41 | func=lambda texts: hf_embedding( 42 | texts, 43 | tokenizer=AutoTokenizer.from_pretrained( 44 | "sentence-transformers/all-MiniLM-L6-v2" 45 | ), 46 | embed_model=AutoModel.from_pretrained( 47 | "sentence-transformers/all-MiniLM-L6-v2" 48 | ), 49 | ), 50 | ), 51 | ) 52 | 53 | 54 | with open("./book.txt", "r", encoding="utf-8") as f: 55 | rag.insert(f.read()) 56 | 57 | # Perform naive search 58 | print( 59 | rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")) 60 | ) 61 | 62 | # Perform local search 63 | print( 64 | rag.query("What are the top themes in this story?", param=QueryParam(mode="local")) 65 | ) 66 | 67 | # Perform global search 68 | print( 69 | rag.query("What are the top themes in this story?", param=QueryParam(mode="global")) 70 | ) 71 | 72 | # Perform hybrid search 73 | print( 74 | rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")) 75 | ) 76 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_ollama_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import ollama_model_complete, ollama_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | 7 | WORKING_DIR = "./dickens" 8 | 9 | logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO) 10 | 11 | if not os.path.exists(WORKING_DIR): 12 | os.mkdir(WORKING_DIR) 13 | 14 | rag = LightRAG( 15 | working_dir=WORKING_DIR, 16 | llm_model_func=ollama_model_complete, 17 | llm_model_name="gemma2:2b", 18 | llm_model_max_async=4, 19 | llm_model_max_token_size=32768, 20 | llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 32768}}, 21 | embedding_func=EmbeddingFunc( 22 | embedding_dim=768, 23 | max_token_size=8192, 24 | func=lambda texts: ollama_embedding(
25 | texts, embed_model="nomic-embed-text", host="http://localhost:11434" 26 | ), 27 | ), 28 | ) 29 | 30 | with open("./book.txt", "r", encoding="utf-8") as f: 31 | rag.insert(f.read()) 32 | 33 | # Perform naive search 34 | print( 35 | rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")) 36 | ) 37 | 38 | # Perform local search 39 | print( 40 | rag.query("What are the top themes in this story?", param=QueryParam(mode="local")) 41 | ) 42 | 43 | # Perform global search 44 | print( 45 | rag.query("What are the top themes in this story?", param=QueryParam(mode="global")) 46 | ) 47 | 48 | # Perform hybrid search 49 | print( 50 | rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")) 51 | ) 52 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_openai_compatible_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import asyncio 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import openai_complete_if_cache, openai_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | import numpy as np 7 | 8 | WORKING_DIR = "./dickens" 9 | 10 | if not os.path.exists(WORKING_DIR): 11 | os.mkdir(WORKING_DIR) 12 | 13 | 14 | async def llm_model_func( 15 | prompt, system_prompt=None, history_messages=[], **kwargs 16 | ) -> str: 17 | return await openai_complete_if_cache( 18 | "solar-mini", 19 | prompt, 20 | system_prompt=system_prompt, 21 | history_messages=history_messages, 22 | api_key=os.getenv("UPSTAGE_API_KEY"), 23 | base_url="https://api.upstage.ai/v1/solar", 24 | **kwargs, 25 | ) 26 | 27 | 28 | async def embedding_func(texts: list[str]) -> np.ndarray: 29 | return await openai_embedding( 30 | texts, 31 | model="solar-embedding-1-large-query", 32 | api_key=os.getenv("UPSTAGE_API_KEY"), 33 | base_url="https://api.upstage.ai/v1/solar", 34 | ) 35 | 36 | 37 | async def get_embedding_dim(): 38 | test_text = ["This is a test sentence."] 39 | embedding = await embedding_func(test_text) 40 | embedding_dim = embedding.shape[1] 41 | return embedding_dim 42 | 43 | 44 | # function test 45 | async def test_funcs(): 46 | result = await llm_model_func("How are you?") 47 | print("llm_model_func: ", result) 48 | 49 | result = await embedding_func(["How are you?"]) 50 | print("embedding_func: ", result) 51 | 52 | 53 | # asyncio.run(test_funcs()) 54 | 55 | 56 | async def main(): 57 | try: 58 | embedding_dimension = await get_embedding_dim() 59 | print(f"Detected embedding dimension: {embedding_dimension}") 60 | 61 | rag = LightRAG( 62 | working_dir=WORKING_DIR, 63 | llm_model_func=llm_model_func, 64 | embedding_func=EmbeddingFunc( 65 | embedding_dim=embedding_dimension, 66 | max_token_size=8192, 67 | func=embedding_func, 68 | ), 69 | ) 70 | 71 | with open("./book.txt", "r", encoding="utf-8") as f: 72 | await rag.ainsert(f.read()) 73 | 74 | # Perform naive search 75 | print( 76 | await rag.aquery( 77 | "What are the top themes in this story?", param=QueryParam(mode="naive") 78 | ) 79 | ) 80 | 81 | # Perform local search 82 | print( 83 | await rag.aquery( 84 | "What are the top themes in this story?", param=QueryParam(mode="local") 85 | ) 86 | ) 87 | 88 | # Perform global search 89 | print( 90 | await rag.aquery( 91 | "What are the top themes in this story?", 92 | param=QueryParam(mode="global"), 93 | ) 94 | ) 95 | 96 | # Perform hybrid search 97 | print( 98 | await rag.aquery( 99 | "What are the top themes in this story?", 100 | 
param=QueryParam(mode="hybrid"), 101 | ) 102 | ) 103 | except Exception as e: 104 | print(f"An error occurred: {e}") 105 | 106 | 107 | if __name__ == "__main__": 108 | asyncio.run(main()) 109 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_openai_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import gpt_4o_mini_complete 5 | 6 | WORKING_DIR = "./dickens" 7 | 8 | if not os.path.exists(WORKING_DIR): 9 | os.mkdir(WORKING_DIR) 10 | 11 | rag = LightRAG( 12 | working_dir=WORKING_DIR, 13 | llm_model_func=gpt_4o_mini_complete, 14 | # llm_model_func=gpt_4o_complete 15 | ) 16 | 17 | 18 | with open("./book.txt", "r", encoding="utf-8") as f: 19 | rag.insert(f.read()) 20 | 21 | # Perform naive search 22 | print( 23 | rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")) 24 | ) 25 | 26 | # Perform local search 27 | print( 28 | rag.query("What are the top themes in this story?", param=QueryParam(mode="local")) 29 | ) 30 | 31 | # Perform global search 32 | print( 33 | rag.query("What are the top themes in this story?", param=QueryParam(mode="global")) 34 | ) 35 | 36 | # Perform hybrid search 37 | print( 38 | rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")) 39 | ) 40 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_oracle_demo.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | from pathlib import Path 4 | import asyncio 5 | from lightrag import LightRAG, QueryParam 6 | from lightrag.llm import openai_complete_if_cache, openai_embedding 7 | from lightrag.utils import EmbeddingFunc 8 | import numpy as np 9 | from lightrag.kg.oracle_impl import OracleDB 10 | 11 | print(os.getcwd()) 12 | script_directory = Path(__file__).resolve().parent.parent 13 | sys.path.append(os.path.abspath(script_directory)) 14 | 15 | WORKING_DIR = "./dickens" 16 | 17 | # We use OpenAI compatible API to call LLM on Oracle Cloud 18 | # More docs here https://github.com/jin38324/OCI_GenAI_access_gateway 19 | BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/" 20 | APIKEY = "ocigenerativeai" 21 | CHATMODEL = "cohere.command-r-plus" 22 | EMBEDMODEL = "cohere.embed-multilingual-v3.0" 23 | 24 | 25 | if not os.path.exists(WORKING_DIR): 26 | os.mkdir(WORKING_DIR) 27 | 28 | 29 | async def llm_model_func( 30 | prompt, system_prompt=None, history_messages=[], **kwargs 31 | ) -> str: 32 | return await openai_complete_if_cache( 33 | CHATMODEL, 34 | prompt, 35 | system_prompt=system_prompt, 36 | history_messages=history_messages, 37 | api_key=APIKEY, 38 | base_url=BASE_URL, 39 | **kwargs, 40 | ) 41 | 42 | 43 | async def embedding_func(texts: list[str]) -> np.ndarray: 44 | return await openai_embedding( 45 | texts, 46 | model=EMBEDMODEL, 47 | api_key=APIKEY, 48 | base_url=BASE_URL, 49 | ) 50 | 51 | 52 | async def get_embedding_dim(): 53 | test_text = ["This is a test sentence."] 54 | embedding = await embedding_func(test_text) 55 | embedding_dim = embedding.shape[1] 56 | return embedding_dim 57 | 58 | 59 | async def main(): 60 | try: 61 | # Detect embedding dimension 62 | embedding_dimension = await get_embedding_dim() 63 | print(f"Detected embedding dimension: {embedding_dimension}") 64 | 65 | # Create Oracle DB connection 66 | # The `config` parameter is the connection 
configuration of Oracle DB 67 | # More docs here https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html 68 | # We store data in unified tables, so we need to set a `workspace` parameter to specify which docs to store and query 69 | # Below is an example of how to connect to Oracle Autonomous Database on Oracle Cloud 70 | oracle_db = OracleDB( 71 | config={ 72 | "user": "username", 73 | "password": "xxxxxxxxx", 74 | "dsn": "xxxxxxx_medium", 75 | "config_dir": "dir/path/to/oracle/config", 76 | "wallet_location": "dir/path/to/oracle/wallet", 77 | "wallet_password": "xxxxxxxxx", 78 | "workspace": "company", # specify which docs you want to store and query 79 | } 80 | ) 81 | 82 | # Check if the Oracle DB tables exist; if not, they will be created 83 | await oracle_db.check_tables() 84 | 85 | # Initialize LightRAG 86 | # We use Oracle DB as the KV/vector/graph storage 87 | # You can add `addon_params={"example_number": 1, "language": "Simplified Chinese"}` to control the prompt 88 | rag = LightRAG( 89 | enable_llm_cache=False, 90 | working_dir=WORKING_DIR, 91 | chunk_token_size=512, 92 | llm_model_func=llm_model_func, 93 | embedding_func=EmbeddingFunc( 94 | embedding_dim=embedding_dimension, 95 | max_token_size=512, 96 | func=embedding_func, 97 | ), 98 | graph_storage="OracleGraphStorage", 99 | kv_storage="OracleKVStorage", 100 | vector_storage="OracleVectorDBStorage", 101 | ) 102 | 103 | # Set the KV/vector/graph storage's `db` property so all operations share the same connection pool 104 | rag.graph_storage_cls.db = oracle_db 105 | rag.key_string_value_json_storage_cls.db = oracle_db 106 | rag.vector_db_storage_cls.db = oracle_db 107 | 108 | # Extract and insert into LightRAG storage 109 | with open("./dickens/demo.txt", "r", encoding="utf-8") as f: 110 | await rag.ainsert(f.read()) 111 | 112 | # Perform search in different modes 113 | modes = ["naive", "local", "global", "hybrid"] 114 | for mode in modes: 115 | print("=" * 20, mode, "=" * 20) 116 | print( 117 | await rag.aquery( 118 | "What are the top themes in this story?", 119 | param=QueryParam(mode=mode), 120 | ) 121 | ) 122 | print("-" * 100, "\n") 123 | 124 | except Exception as e: 125 | print(f"An error occurred: {e}") 126 | 127 | 128 | if __name__ == "__main__": 129 | asyncio.run(main()) 130 | -------------------------------------------------------------------------------- /LightRAG/examples/lightrag_siliconcloud_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import asyncio 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import openai_complete_if_cache, siliconcloud_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | import numpy as np 7 | 8 | WORKING_DIR = "./dickens" 9 | 10 | if not os.path.exists(WORKING_DIR): 11 | os.mkdir(WORKING_DIR) 12 | 13 | 14 | async def llm_model_func( 15 | prompt, system_prompt=None, history_messages=[], **kwargs 16 | ) -> str: 17 | return await openai_complete_if_cache( 18 | "Qwen/Qwen2.5-7B-Instruct", 19 | prompt, 20 | system_prompt=system_prompt, 21 | history_messages=history_messages, 22 | api_key=os.getenv("SILICONFLOW_API_KEY"), 23 | base_url="https://api.siliconflow.cn/v1/", 24 | **kwargs, 25 | ) 26 | 27 | 28 | async def embedding_func(texts: list[str]) -> np.ndarray: 29 | return await siliconcloud_embedding( 30 | texts, 31 | model="netease-youdao/bce-embedding-base_v1", 32 | api_key=os.getenv("SILICONFLOW_API_KEY"), 33 | max_token_size=512, 34 | ) 35 | 36 | 37 | # function
test 38 | async def test_funcs(): 39 | result = await llm_model_func("How are you?") 40 | print("llm_model_func: ", result) 41 | 42 | result = await embedding_func(["How are you?"]) 43 | print("embedding_func: ", result) 44 | 45 | 46 | asyncio.run(test_funcs()) 47 | 48 | 49 | rag = LightRAG( 50 | working_dir=WORKING_DIR, 51 | llm_model_func=llm_model_func, 52 | embedding_func=EmbeddingFunc( 53 | embedding_dim=768, max_token_size=512, func=embedding_func 54 | ), 55 | ) 56 | 57 | 58 | with open("./book.txt", "r", encoding="utf-8") as f: 59 | rag.insert(f.read()) 60 | 61 | # Perform naive search 62 | print( 63 | rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")) 64 | ) 65 | 66 | # Perform local search 67 | print( 68 | rag.query("What are the top themes in this story?", param=QueryParam(mode="local")) 69 | ) 70 | 71 | # Perform global search 72 | print( 73 | rag.query("What are the top themes in this story?", param=QueryParam(mode="global")) 74 | ) 75 | 76 | # Perform hybrid search 77 | print( 78 | rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")) 79 | ) 80 | -------------------------------------------------------------------------------- /LightRAG/examples/vram_management_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | from lightrag import LightRAG, QueryParam 4 | from lightrag.llm import ollama_model_complete, ollama_embedding 5 | from lightrag.utils import EmbeddingFunc 6 | 7 | # Working directory and the directory path for text files 8 | WORKING_DIR = "./dickens" 9 | TEXT_FILES_DIR = "/llm/mt" 10 | 11 | # Create the working directory if it doesn't exist 12 | if not os.path.exists(WORKING_DIR): 13 | os.mkdir(WORKING_DIR) 14 | 15 | # Initialize LightRAG 16 | rag = LightRAG( 17 | working_dir=WORKING_DIR, 18 | llm_model_func=ollama_model_complete, 19 | llm_model_name="qwen2.5:3b-instruct-max-context", 20 | embedding_func=EmbeddingFunc( 21 | embedding_dim=768, 22 | max_token_size=8192, 23 | func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"), 24 | ), 25 | ) 26 | 27 | # Read all .txt files from the TEXT_FILES_DIR directory 28 | texts = [] 29 | for filename in os.listdir(TEXT_FILES_DIR): 30 | if filename.endswith(".txt"): 31 | file_path = os.path.join(TEXT_FILES_DIR, filename) 32 | with open(file_path, "r", encoding="utf-8") as file: 33 | texts.append(file.read()) 34 | 35 | 36 | # Batch insert texts into LightRAG with a retry mechanism 37 | def insert_texts_with_retry(rag, texts, retries=3, delay=5): 38 | for _ in range(retries): 39 | try: 40 | rag.insert(texts) 41 | return 42 | except Exception as e: 43 | print( 44 | f"Error occurred during insertion: {e}. Retrying in {delay} seconds..."
45 | ) 46 | time.sleep(delay) 47 | raise RuntimeError("Failed to insert texts after multiple retries.") 48 | 49 | 50 | insert_texts_with_retry(rag, texts) 51 | 52 | # Perform different types of queries and handle potential errors 53 | try: 54 | print( 55 | rag.query( 56 | "What are the top themes in this story?", param=QueryParam(mode="naive") 57 | ) 58 | ) 59 | except Exception as e: 60 | print(f"Error performing naive search: {e}") 61 | 62 | try: 63 | print( 64 | rag.query( 65 | "What are the top themes in this story?", param=QueryParam(mode="local") 66 | ) 67 | ) 68 | except Exception as e: 69 | print(f"Error performing local search: {e}") 70 | 71 | try: 72 | print( 73 | rag.query( 74 | "What are the top themes in this story?", param=QueryParam(mode="global") 75 | ) 76 | ) 77 | except Exception as e: 78 | print(f"Error performing global search: {e}") 79 | 80 | try: 81 | print( 82 | rag.query( 83 | "What are the top themes in this story?", param=QueryParam(mode="hybrid") 84 | ) 85 | ) 86 | except Exception as e: 87 | print(f"Error performing hybrid search: {e}") 88 | 89 | 90 | # Function to clear VRAM resources 91 | def clear_vram(): 92 | os.system("sudo nvidia-smi --gpu-reset") 93 | 94 | 95 | # Regularly clear VRAM to prevent overflow 96 | clear_vram_interval = 3600 # Clear once every hour 97 | start_time = time.time() 98 | 99 | while True: 100 | current_time = time.time() 101 | if current_time - start_time > clear_vram_interval: 102 | clear_vram() 103 | start_time = current_time 104 | time.sleep(60) # Check the time every minute 105 | -------------------------------------------------------------------------------- /LightRAG/get_all_edges_nx.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | 3 | G = nx.read_graphml("./dickensTestEmbedcall/graph_chunk_entity_relation.graphml") 4 | 5 | 6 | def get_all_edges_and_nodes(G): 7 | # Get all edges and their properties 8 | edges_with_properties = [] 9 | for u, v, data in G.edges(data=True): 10 | edges_with_properties.append( 11 | { 12 | "start": u, 13 | "end": v, 14 | "label": data.get( 15 | "label", "" 16 | ), # Assuming 'label' is used for edge type 17 | "properties": data, 18 | "start_node_properties": G.nodes[u], 19 | "end_node_properties": G.nodes[v], 20 | } 21 | ) 22 | 23 | return edges_with_properties 24 | 25 | 26 | # Example usage 27 | if __name__ == "__main__": 28 | # Assume G is your NetworkX graph loaded from Neo4j 29 | 30 | all_edges = get_all_edges_and_nodes(G) 31 | 32 | # Print all edges and node properties 33 | for edge in all_edges: 34 | print(f"Edge Label: {edge['label']}") 35 | print(f"Edge Properties: {edge['properties']}") 36 | print(f"Start Node: {edge['start']}") 37 | print(f"Start Node Properties: {edge['start_node_properties']}") 38 | print(f"End Node: {edge['end']}") 39 | print(f"End Node Properties: {edge['end_node_properties']}") 40 | print("---") 41 | -------------------------------------------------------------------------------- /LightRAG/lightrag/__init__.py: -------------------------------------------------------------------------------- 1 | from .lightrag import LightRAG as LightRAG, QueryParam as QueryParam 2 | 3 | __version__ = "1.0.2" 4 | __author__ = "Zirui Guo" 5 | __url__ = "https://github.com/HKUDS/LightRAG" 6 | -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/__init__.cpython-311.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/__init__.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/base.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/base.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/lightrag.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/lightrag.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/llm.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/llm.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/operate.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/operate.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/prompt.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/prompt.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/storage.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/storage.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/__pycache__/utils.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/__pycache__/utils.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/base.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass, field 2 | from typing import TypedDict, Union, Literal, Generic, TypeVar 3 | 4 | import numpy as np 5 | 6 | from .utils import EmbeddingFunc 7 | 8 | TextChunkSchema = TypedDict( 9 | "TextChunkSchema", 10 | {"tokens": int, "content": str, "full_doc_id": str, "chunk_order_index": int}, 11 | ) 12 | 13 | T = TypeVar("T") 14 | 15 | 16 | @dataclass 17 | class QueryParam: 18 | mode: Literal["local", "global", "hybrid", "naive"] = "global" 19 | only_need_context: bool = False 20 | only_need_prompt: bool = False 21 | response_type: str = "Multiple Paragraphs" 22 | # Number of top-k items to retrieve; 
corresponds to entities in "local" mode and relationships in "global" mode. 23 | top_k: int = 60 24 | # Number of document chunks to retrieve. 25 | # top_n: int = 10 26 | # Number of tokens for the original chunks. 27 | max_token_for_text_unit: int = 4000 28 | # Number of tokens for the relationship descriptions 29 | max_token_for_global_context: int = 4000 30 | # Number of tokens for the entity descriptions 31 | max_token_for_local_context: int = 4000 32 | 33 | 34 | @dataclass 35 | class StorageNameSpace: 36 | namespace: str 37 | global_config: dict 38 | 39 | async def index_done_callback(self): 40 | """Commit the storage operations after indexing""" 41 | pass 42 | 43 | async def query_done_callback(self): 44 | """Commit the storage operations after querying""" 45 | pass 46 | 47 | 48 | @dataclass 49 | class BaseVectorStorage(StorageNameSpace): 50 | embedding_func: EmbeddingFunc 51 | meta_fields: set = field(default_factory=set) 52 | 53 | async def query(self, query: str, top_k: int) -> list[dict]: 54 | raise NotImplementedError 55 | 56 | async def upsert(self, data: dict[str, dict]): 57 | """Use the 'content' field from each value for embedding, and the key as id. 58 | If embedding_func is None, use the 'embedding' field from the value 59 | """ 60 | raise NotImplementedError 61 | 62 | 63 | @dataclass 64 | class BaseKVStorage(Generic[T], StorageNameSpace): 65 | embedding_func: EmbeddingFunc 66 | 67 | async def all_keys(self) -> list[str]: 68 | raise NotImplementedError 69 | 70 | async def get_by_id(self, id: str) -> Union[T, None]: 71 | raise NotImplementedError 72 | 73 | async def get_by_ids( 74 | self, ids: list[str], fields: Union[set[str], None] = None 75 | ) -> list[Union[T, None]]: 76 | raise NotImplementedError 77 | 78 | async def filter_keys(self, data: list[str]) -> set[str]: 79 | """Return the keys that do not already exist in storage""" 80 | raise NotImplementedError 81 | 82 | async def upsert(self, data: dict[str, T]): 83 | raise NotImplementedError 84 | 85 | async def drop(self): 86 | raise NotImplementedError 87 | 88 | 89 | @dataclass 90 | class BaseGraphStorage(StorageNameSpace): 91 | embedding_func: EmbeddingFunc = None 92 | 93 | async def has_node(self, node_id: str) -> bool: 94 | raise NotImplementedError 95 | 96 | async def has_edge(self, source_node_id: str, target_node_id: str) -> bool: 97 | raise NotImplementedError 98 | 99 | async def node_degree(self, node_id: str) -> int: 100 | raise NotImplementedError 101 | 102 | async def edge_degree(self, src_id: str, tgt_id: str) -> int: 103 | raise NotImplementedError 104 | 105 | async def get_node(self, node_id: str) -> Union[dict, None]: 106 | raise NotImplementedError 107 | 108 | async def get_edge( 109 | self, source_node_id: str, target_node_id: str 110 | ) -> Union[dict, None]: 111 | raise NotImplementedError 112 | 113 | async def get_node_edges( 114 | self, source_node_id: str 115 | ) -> Union[list[tuple[str, str]], None]: 116 | raise NotImplementedError 117 | 118 | async def upsert_node(self, node_id: str, node_data: dict[str, str]): 119 | raise NotImplementedError 120 | 121 | async def upsert_edge( 122 | self, source_node_id: str, target_node_id: str, edge_data: dict[str, str] 123 | ): 124 | raise NotImplementedError 125 | 126 | async def delete_node(self, node_id: str): 127 | raise NotImplementedError 128 | 129 | async def embed_nodes(self, algorithm: str) -> tuple[np.ndarray, list[str]]: 130 | raise NotImplementedError("Node embedding is not used in lightrag.") 131 | --------------------------------------------------------------------------------
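Since `QueryParam` is the main knob callers tune, here is a short usage sketch (an editorial example, not a file in this repository) showing how its defaults can be overridden; it assumes `rag` is an initialized LightRAG instance as in the demos above:

from lightrag import QueryParam

# Tighter retrieval and token budgets than the defaults (top_k=60, 4000 tokens each).
param = QueryParam(
    mode="hybrid",  # combine local (entity) and global (relationship) retrieval
    top_k=30,  # entities in "local" mode, relationships in "global" mode
    max_token_for_text_unit=2000,  # budget for the original chunk text
    max_token_for_local_context=2000,  # budget for entity descriptions
)
print(rag.query("What are the top themes in this story?", param=param))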
/LightRAG/lightrag/kg/__init__.py: -------------------------------------------------------------------------------- 1 | # print ("init package vars here. ......") 2 | -------------------------------------------------------------------------------- /LightRAG/lightrag/kg/__pycache__/__init__.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/kg/__pycache__/__init__.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/kg/__pycache__/neo4j_impl.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/kg/__pycache__/neo4j_impl.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/kg/__pycache__/oracle_impl.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/lightrag/kg/__pycache__/oracle_impl.cpython-311.pyc -------------------------------------------------------------------------------- /LightRAG/lightrag/kg/neo4j_impl.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from dataclasses import dataclass 4 | from typing import Any, Union, Tuple, List, Dict 5 | import inspect 6 | from lightrag.utils import logger 7 | from ..base import BaseGraphStorage 8 | from neo4j import ( 9 | AsyncGraphDatabase, 10 | exceptions as neo4jExceptions, 11 | AsyncDriver, 12 | AsyncManagedTransaction, 13 | ) 14 | 15 | 16 | from tenacity import ( 17 | retry, 18 | stop_after_attempt, 19 | wait_exponential, 20 | retry_if_exception_type, 21 | ) 22 | 23 | 24 | @dataclass 25 | class Neo4JStorage(BaseGraphStorage): 26 | @staticmethod 27 | def load_nx_graph(file_name): 28 | print("no preloading of graph with neo4j in production") 29 | 30 | def __init__(self, namespace, global_config): 31 | super().__init__(namespace=namespace, global_config=global_config) 32 | self._driver = None 33 | self._driver_lock = asyncio.Lock() 34 | URI = os.environ["NEO4J_URI"] 35 | USERNAME = os.environ["NEO4J_USERNAME"] 36 | PASSWORD = os.environ["NEO4J_PASSWORD"] 37 | self._driver: AsyncDriver = AsyncGraphDatabase.driver( 38 | URI, auth=(USERNAME, PASSWORD) 39 | ) 40 | return None 41 | 42 | def __post_init__(self): 43 | self._node_embed_algorithms = { 44 | "node2vec": self._node2vec_embed, 45 | } 46 | 47 | async def close(self): 48 | if self._driver: 49 | await self._driver.close() 50 | self._driver = None 51 | 52 | async def __aexit__(self, exc_type, exc, tb): 53 | if self._driver: 54 | await self._driver.close() 55 | 56 | async def index_done_callback(self): 57 | print("KG successfully indexed.") 58 | 59 | async def has_node(self, node_id: str) -> bool: 60 | entity_name_label = node_id.strip('"') 61 | 62 | async with self._driver.session() as session: 63 | query = ( 64 | f"MATCH (n:`{entity_name_label}`) RETURN count(n) > 0 AS node_exists" 65 | ) 66 | result = await session.run(query) 67 | single_result = await result.single() 68 | logger.debug( 69 | f'{inspect.currentframe().f_code.co_name}:query:{query}:result:{single_result["node_exists"]}' 70 | ) 71 | return 
single_result["node_exists"] 72 | 73 | async def has_edge(self, source_node_id: str, target_node_id: str) -> bool: 74 | entity_name_label_source = source_node_id.strip('"') 75 | entity_name_label_target = target_node_id.strip('"') 76 | 77 | async with self._driver.session() as session: 78 | query = ( 79 | f"MATCH (a:`{entity_name_label_source}`)-[r]-(b:`{entity_name_label_target}`) " 80 | "RETURN COUNT(r) > 0 AS edgeExists" 81 | ) 82 | result = await session.run(query) 83 | single_result = await result.single() 84 | logger.debug( 85 | f'{inspect.currentframe().f_code.co_name}:query:{query}:result:{single_result["edgeExists"]}' 86 | ) 87 | return single_result["edgeExists"] 88 | 89 | async def get_node(self, node_id: str) -> Union[dict, None]: 90 | async with self._driver.session() as session: 91 | entity_name_label = node_id.strip('"') 92 | query = f"MATCH (n:`{entity_name_label}`) RETURN n" 93 | result = await session.run(query) 94 | record = await result.single() 95 | if record: 96 | node = record["n"] 97 | node_dict = dict(node) 98 | logger.debug( 99 | f"{inspect.currentframe().f_code.co_name}: query: {query}, result: {node_dict}" 100 | ) 101 | return node_dict 102 | return None 103 | 104 | async def node_degree(self, node_id: str) -> int: 105 | entity_name_label = node_id.strip('"') 106 | 107 | async with self._driver.session() as session: 108 | query = f""" 109 | MATCH (n:`{entity_name_label}`) 110 | RETURN COUNT{{ (n)--() }} AS totalEdgeCount 111 | """ 112 | result = await session.run(query) 113 | record = await result.single() 114 | if record: 115 | edge_count = record["totalEdgeCount"] 116 | logger.debug( 117 | f"{inspect.currentframe().f_code.co_name}:query:{query}:result:{edge_count}" 118 | ) 119 | return edge_count 120 | else: 121 | return None 122 | 123 | async def edge_degree(self, src_id: str, tgt_id: str) -> int: 124 | entity_name_label_source = src_id.strip('"') 125 | entity_name_label_target = tgt_id.strip('"') 126 | src_degree = await self.node_degree(entity_name_label_source) 127 | trg_degree = await self.node_degree(entity_name_label_target) 128 | 129 | # Convert None to 0 for addition 130 | src_degree = 0 if src_degree is None else src_degree 131 | trg_degree = 0 if trg_degree is None else trg_degree 132 | 133 | degrees = int(src_degree) + int(trg_degree) 134 | logger.debug( 135 | f"{inspect.currentframe().f_code.co_name}:query:src_Degree+trg_degree:result:{degrees}" 136 | ) 137 | return degrees 138 | 139 | async def get_edge( 140 | self, source_node_id: str, target_node_id: str 141 | ) -> Union[dict, None]: 142 | entity_name_label_source = source_node_id.strip('"') 143 | entity_name_label_target = target_node_id.strip('"') 144 | """ 145 | Find all edges between nodes of two given labels 146 | 147 | Args: 148 | source_node_label (str): Label of the source nodes 149 | target_node_label (str): Label of the target nodes 150 | 151 | Returns: 152 | list: List of all relationships/edges found 153 | """ 154 | async with self._driver.session() as session: 155 | query = f""" 156 | MATCH (start:`{entity_name_label_source}`)-[r]->(end:`{entity_name_label_target}`) 157 | RETURN properties(r) as edge_properties 158 | LIMIT 1 159 | """.format( 160 | entity_name_label_source=entity_name_label_source, 161 | entity_name_label_target=entity_name_label_target, 162 | ) 163 | 164 | result = await session.run(query) 165 | record = await result.single() 166 | if record: 167 | result = dict(record["edge_properties"]) 168 | logger.debug( 169 | 
f"{inspect.currentframe().f_code.co_name}:query:{query}:result:{result}" 170 | ) 171 | return result 172 | else: 173 | return None 174 | 175 | async def get_node_edges(self, source_node_id: str) -> List[Tuple[str, str]]: 176 | node_label = source_node_id.strip('"') 177 | 178 | """ 179 | Retrieves all edges (relationships) for a particular node identified by its label. 180 | :return: List of dictionaries containing edge information 181 | """ 182 | query = f"""MATCH (n:`{node_label}`) 183 | OPTIONAL MATCH (n)-[r]-(connected) 184 | RETURN n, r, connected""" 185 | async with self._driver.session() as session: 186 | results = await session.run(query) 187 | edges = [] 188 | async for record in results: 189 | source_node = record["n"] 190 | connected_node = record["connected"] 191 | 192 | source_label = ( 193 | list(source_node.labels)[0] if source_node.labels else None 194 | ) 195 | target_label = ( 196 | list(connected_node.labels)[0] 197 | if connected_node and connected_node.labels 198 | else None 199 | ) 200 | 201 | if source_label and target_label: 202 | edges.append((source_label, target_label)) 203 | 204 | return edges 205 | 206 | @retry( 207 | stop=stop_after_attempt(3), 208 | wait=wait_exponential(multiplier=1, min=4, max=10), 209 | retry=retry_if_exception_type( 210 | ( 211 | neo4jExceptions.ServiceUnavailable, 212 | neo4jExceptions.TransientError, 213 | neo4jExceptions.WriteServiceUnavailable, 214 | neo4jExceptions.ClientError, 215 | ) 216 | ), 217 | ) 218 | async def upsert_node(self, node_id: str, node_data: Dict[str, Any]): 219 | """ 220 | Upsert a node in the Neo4j database. 221 | 222 | Args: 223 | node_id: The unique identifier for the node (used as label) 224 | node_data: Dictionary of node properties 225 | """ 226 | label = node_id.strip('"') 227 | properties = node_data 228 | 229 | async def _do_upsert(tx: AsyncManagedTransaction): 230 | query = f""" 231 | MERGE (n:`{label}`) 232 | SET n += $properties 233 | """ 234 | await tx.run(query, properties=properties) 235 | logger.debug( 236 | f"Upserted node with label '{label}' and properties: {properties}" 237 | ) 238 | 239 | try: 240 | async with self._driver.session() as session: 241 | await session.execute_write(_do_upsert) 242 | except Exception as e: 243 | logger.error(f"Error during upsert: {str(e)}") 244 | raise 245 | 246 | @retry( 247 | stop=stop_after_attempt(3), 248 | wait=wait_exponential(multiplier=1, min=4, max=10), 249 | retry=retry_if_exception_type( 250 | ( 251 | neo4jExceptions.ServiceUnavailable, 252 | neo4jExceptions.TransientError, 253 | neo4jExceptions.WriteServiceUnavailable, 254 | ) 255 | ), 256 | ) 257 | async def upsert_edge( 258 | self, source_node_id: str, target_node_id: str, edge_data: Dict[str, Any] 259 | ): 260 | """ 261 | Upsert an edge and its properties between two nodes identified by their labels. 
262 | 263 | Args: 264 | source_node_id (str): Label of the source node (used as identifier) 265 | target_node_id (str): Label of the target node (used as identifier) 266 | edge_data (dict): Dictionary of properties to set on the edge 267 | """ 268 | source_node_label = source_node_id.strip('"') 269 | target_node_label = target_node_id.strip('"') 270 | edge_properties = edge_data 271 | 272 | async def _do_upsert_edge(tx: AsyncManagedTransaction): 273 | query = f""" 274 | MATCH (source:`{source_node_label}`) 275 | WITH source 276 | MATCH (target:`{target_node_label}`) 277 | MERGE (source)-[r:DIRECTED]->(target) 278 | SET r += $properties 279 | RETURN r 280 | """ 281 | await tx.run(query, properties=edge_properties) 282 | logger.debug( 283 | f"Upserted edge from '{source_node_label}' to '{target_node_label}' with properties: {edge_properties}" 284 | ) 285 | 286 | try: 287 | async with self._driver.session() as session: 288 | await session.execute_write(_do_upsert_edge) 289 | except Exception as e: 290 | logger.error(f"Error during edge upsert: {str(e)}") 291 | raise 292 | 293 | async def _node2vec_embed(self): 294 | print("Implemented but never called.") 295 | -------------------------------------------------------------------------------- /LightRAG/lightrag/storage.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import html 3 | import os 4 | from tqdm.asyncio import tqdm as tqdm_async 5 | from dataclasses import dataclass 6 | from typing import Any, Union, cast 7 | import networkx as nx 8 | import numpy as np 9 | from nano_vectordb import NanoVectorDB 10 | 11 | from .utils import ( 12 | logger, 13 | load_json, 14 | write_json, 15 | compute_mdhash_id, 16 | ) 17 | 18 | from .base import ( 19 | BaseGraphStorage, 20 | BaseKVStorage, 21 | BaseVectorStorage, 22 | ) 23 | 24 | 25 | @dataclass 26 | class JsonKVStorage(BaseKVStorage): 27 | def __post_init__(self): 28 | working_dir = self.global_config["working_dir"] 29 | self._file_name = os.path.join(working_dir, f"kv_store_{self.namespace}.json") 30 | self._data = load_json(self._file_name) or {} 31 | logger.info(f"Load KV {self.namespace} with {len(self._data)} data") 32 | 33 | async def all_keys(self) -> list[str]: 34 | return list(self._data.keys()) 35 | 36 | async def index_done_callback(self): 37 | write_json(self._data, self._file_name) 38 | 39 | async def get_by_id(self, id): 40 | return self._data.get(id, None) 41 | 42 | async def get_by_ids(self, ids, fields=None): 43 | if fields is None: 44 | return [self._data.get(id, None) for id in ids] 45 | return [ 46 | ( 47 | {k: v for k, v in self._data[id].items() if k in fields} 48 | if self._data.get(id, None) 49 | else None 50 | ) 51 | for id in ids 52 | ] 53 | 54 | async def filter_keys(self, data: list[str]) -> set[str]: 55 | return set([s for s in data if s not in self._data]) 56 | 57 | async def upsert(self, data: dict[str, dict]): 58 | left_data = {k: v for k, v in data.items() if k not in self._data} 59 | self._data.update(left_data) 60 | return left_data 61 | 62 | async def drop(self): 63 | self._data = {} 64 | 65 | 66 | @dataclass 67 | class NanoVectorDBStorage(BaseVectorStorage): 68 | cosine_better_than_threshold: float = 0.2 69 | 70 | def __post_init__(self): 71 | self._client_file_name = os.path.join( 72 | self.global_config["working_dir"], f"vdb_{self.namespace}.json" 73 | ) 74 | self._max_batch_size = self.global_config["embedding_batch_num"] 75 | self._client = NanoVectorDB( 76 | self.embedding_func.embedding_dim, 
storage_file=self._client_file_name
77 |         )
78 |         self.cosine_better_than_threshold = self.global_config.get(
79 |             "cosine_better_than_threshold", self.cosine_better_than_threshold
80 |         )
81 |
82 |     async def upsert(self, data: dict[str, dict]):
83 |         logger.info(f"Inserting {len(data)} vectors into {self.namespace}")
84 |         if not len(data):
85 |             logger.warning("Attempted to insert empty data into the vector DB")
86 |             return []
87 |         list_data = [
88 |             {
89 |                 "__id__": k,
90 |                 **{k1: v1 for k1, v1 in v.items() if k1 in self.meta_fields},
91 |             }
92 |             for k, v in data.items()
93 |         ]
94 |         contents = [v["content"] for v in data.values()]
95 |         batches = [
96 |             contents[i : i + self._max_batch_size]
97 |             for i in range(0, len(contents), self._max_batch_size)
98 |         ]
99 |         embedding_tasks = [self.embedding_func(batch) for batch in batches]
100 |         embeddings_list = []
101 |         for f in tqdm_async(
102 |             asyncio.as_completed(embedding_tasks),
103 |             total=len(embedding_tasks),
104 |             desc="Generating embeddings",
105 |             unit="batch",
106 |         ):
107 |             embeddings = await f
108 |             embeddings_list.append(embeddings)
109 |         embeddings = np.concatenate(embeddings_list)
110 |         for i, d in enumerate(list_data):
111 |             d["__vector__"] = embeddings[i]
112 |         results = self._client.upsert(datas=list_data)
113 |         return results
114 |
115 |     async def query(self, query: str, top_k=5):
116 |         embedding = await self.embedding_func([query])
117 |         embedding = embedding[0]
118 |         results = self._client.query(
119 |             query=embedding,
120 |             top_k=top_k,
121 |             better_than_threshold=self.cosine_better_than_threshold,
122 |         )
123 |         results = [
124 |             {**dp, "id": dp["__id__"], "distance": dp["__metrics__"]} for dp in results
125 |         ]
126 |         return results
127 |
128 |     @property
129 |     def client_storage(self):
130 |         return getattr(self._client, "_NanoVectorDB__storage")
131 |
132 |     async def delete_entity(self, entity_name: str):
133 |         try:
134 |             entity_id = [compute_mdhash_id(entity_name, prefix="ent-")]
135 |
136 |             if self._client.get(entity_id):
137 |                 self._client.delete(entity_id)
138 |                 logger.info(f"Entity {entity_name} has been deleted.")
139 |             else:
140 |                 logger.info(f"No entity found with name {entity_name}.")
141 |         except Exception as e:
142 |             logger.error(f"Error while deleting entity {entity_name}: {e}")
143 |
144 |     async def delete_relation(self, entity_name: str):
145 |         try:
146 |             relations = [
147 |                 dp
148 |                 for dp in self.client_storage["data"]
149 |                 if dp["src_id"] == entity_name or dp["tgt_id"] == entity_name
150 |             ]
151 |             ids_to_delete = [relation["__id__"] for relation in relations]
152 |
153 |             if ids_to_delete:
154 |                 self._client.delete(ids_to_delete)
155 |                 logger.info(
156 |                     f"All relations related to entity {entity_name} have been deleted."
157 | ) 158 | else: 159 | logger.info(f"No relations found for entity {entity_name}.") 160 | except Exception as e: 161 | logger.error( 162 | f"Error while deleting relations for entity {entity_name}: {e}" 163 | ) 164 | 165 | async def index_done_callback(self): 166 | self._client.save() 167 | 168 | 169 | @dataclass 170 | class NetworkXStorage(BaseGraphStorage): 171 | @staticmethod 172 | def load_nx_graph(file_name) -> nx.Graph: 173 | if os.path.exists(file_name): 174 | return nx.read_graphml(file_name) 175 | return None 176 | 177 | @staticmethod 178 | def write_nx_graph(graph: nx.Graph, file_name): 179 | logger.info( 180 | f"Writing graph with {graph.number_of_nodes()} nodes, {graph.number_of_edges()} edges" 181 | ) 182 | nx.write_graphml(graph, file_name) 183 | 184 | @staticmethod 185 | def stable_largest_connected_component(graph: nx.Graph) -> nx.Graph: 186 | """Refer to https://github.com/microsoft/graphrag/index/graph/utils/stable_lcc.py 187 | Return the largest connected component of the graph, with nodes and edges sorted in a stable way. 188 | """ 189 | from graspologic.utils import largest_connected_component 190 | 191 | graph = graph.copy() 192 | graph = cast(nx.Graph, largest_connected_component(graph)) 193 | node_mapping = { 194 | node: html.unescape(node.upper().strip()) for node in graph.nodes() 195 | } # type: ignore 196 | graph = nx.relabel_nodes(graph, node_mapping) 197 | return NetworkXStorage._stabilize_graph(graph) 198 | 199 | @staticmethod 200 | def _stabilize_graph(graph: nx.Graph) -> nx.Graph: 201 | """Refer to https://github.com/microsoft/graphrag/index/graph/utils/stable_lcc.py 202 | Ensure an undirected graph with the same relationships will always be read the same way. 203 | """ 204 | fixed_graph = nx.DiGraph() if graph.is_directed() else nx.Graph() 205 | 206 | sorted_nodes = graph.nodes(data=True) 207 | sorted_nodes = sorted(sorted_nodes, key=lambda x: x[0]) 208 | 209 | fixed_graph.add_nodes_from(sorted_nodes) 210 | edges = list(graph.edges(data=True)) 211 | 212 | if not graph.is_directed(): 213 | 214 | def _sort_source_target(edge): 215 | source, target, edge_data = edge 216 | if source > target: 217 | temp = source 218 | source = target 219 | target = temp 220 | return source, target, edge_data 221 | 222 | edges = [_sort_source_target(edge) for edge in edges] 223 | 224 | def _get_edge_key(source: Any, target: Any) -> str: 225 | return f"{source} -> {target}" 226 | 227 | edges = sorted(edges, key=lambda x: _get_edge_key(x[0], x[1])) 228 | 229 | fixed_graph.add_edges_from(edges) 230 | return fixed_graph 231 | 232 | def __post_init__(self): 233 | self._graphml_xml_file = os.path.join( 234 | self.global_config["working_dir"], f"graph_{self.namespace}.graphml" 235 | ) 236 | preloaded_graph = NetworkXStorage.load_nx_graph(self._graphml_xml_file) 237 | if preloaded_graph is not None: 238 | logger.info( 239 | f"Loaded graph from {self._graphml_xml_file} with {preloaded_graph.number_of_nodes()} nodes, {preloaded_graph.number_of_edges()} edges" 240 | ) 241 | self._graph = preloaded_graph or nx.Graph() 242 | self._node_embed_algorithms = { 243 | "node2vec": self._node2vec_embed, 244 | } 245 | 246 | async def index_done_callback(self): 247 | NetworkXStorage.write_nx_graph(self._graph, self._graphml_xml_file) 248 | 249 | async def has_node(self, node_id: str) -> bool: 250 | return self._graph.has_node(node_id) 251 | 252 | async def has_edge(self, source_node_id: str, target_node_id: str) -> bool: 253 | return self._graph.has_edge(source_node_id, target_node_id) 254 | 255 | 
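# --- Added usage sketch (illustrative; not part of the original source). ---
# NetworkXStorage mirrors the async Neo4JStorage interface even though
# networkx itself is synchronous, so either backend can sit behind
# BaseGraphStorage. A minimal illustration, assuming `storage` is an
# initialized NetworkXStorage instance:
#
#   import asyncio
#
#   async def demo(storage):
#       await storage.upsert_node('"HERO"', {"entity_type": "PERSON"})
#       await storage.upsert_node('"SWORD"', {"entity_type": "OBJECT"})
#       await storage.upsert_edge('"HERO"', '"SWORD"', {"weight": "1.0"})
#       assert await storage.has_edge('"HERO"', '"SWORD"')
#
#   # asyncio.run(demo(storage))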
async def get_node(self, node_id: str) -> Union[dict, None]: 256 | return self._graph.nodes.get(node_id) 257 | 258 | async def node_degree(self, node_id: str) -> int: 259 | return self._graph.degree(node_id) 260 | 261 | async def edge_degree(self, src_id: str, tgt_id: str) -> int: 262 | return self._graph.degree(src_id) + self._graph.degree(tgt_id) 263 | 264 | async def get_edge( 265 | self, source_node_id: str, target_node_id: str 266 | ) -> Union[dict, None]: 267 | return self._graph.edges.get((source_node_id, target_node_id)) 268 | 269 | async def get_node_edges(self, source_node_id: str): 270 | if self._graph.has_node(source_node_id): 271 | return list(self._graph.edges(source_node_id)) 272 | return None 273 | 274 | async def upsert_node(self, node_id: str, node_data: dict[str, str]): 275 | self._graph.add_node(node_id, **node_data) 276 | 277 | async def upsert_edge( 278 | self, source_node_id: str, target_node_id: str, edge_data: dict[str, str] 279 | ): 280 | self._graph.add_edge(source_node_id, target_node_id, **edge_data) 281 | 282 | async def delete_node(self, node_id: str): 283 | """ 284 | Delete a node from the graph based on the specified node_id. 285 | 286 | :param node_id: The node_id to delete 287 | """ 288 | if self._graph.has_node(node_id): 289 | self._graph.remove_node(node_id) 290 | logger.info(f"Node {node_id} deleted from the graph.") 291 | else: 292 | logger.warning(f"Node {node_id} not found in the graph for deletion.") 293 | 294 | async def embed_nodes(self, algorithm: str) -> tuple[np.ndarray, list[str]]: 295 | if algorithm not in self._node_embed_algorithms: 296 | raise ValueError(f"Node embedding algorithm {algorithm} not supported") 297 | return await self._node_embed_algorithms[algorithm]() 298 | 299 | # @TODO: NOT USED 300 | async def _node2vec_embed(self): 301 | from graspologic import embed 302 | 303 | embeddings, nodes = embed.node2vec_embed( 304 | self._graph, 305 | **self.global_config["node2vec_params"], 306 | ) 307 | 308 | nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes] 309 | return embeddings, nodes_ids 310 | -------------------------------------------------------------------------------- /LightRAG/lightrag/utils.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import html 3 | import io 4 | import csv 5 | import json 6 | import logging 7 | import os 8 | import re 9 | from dataclasses import dataclass 10 | from functools import wraps 11 | from hashlib import md5 12 | from typing import Any, Union, List 13 | import xml.etree.ElementTree as ET 14 | 15 | import numpy as np 16 | import tiktoken 17 | 18 | ENCODER = None 19 | 20 | logger = logging.getLogger("lightrag") 21 | 22 | 23 | def set_logger(log_file: str): 24 | logger.setLevel(logging.DEBUG) 25 | 26 | file_handler = logging.FileHandler(log_file) 27 | file_handler.setLevel(logging.DEBUG) 28 | 29 | formatter = logging.Formatter( 30 | "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 31 | ) 32 | file_handler.setFormatter(formatter) 33 | 34 | if not logger.handlers: 35 | logger.addHandler(file_handler) 36 | 37 | 38 | @dataclass 39 | class EmbeddingFunc: 40 | embedding_dim: int 41 | max_token_size: int 42 | func: callable 43 | 44 | async def __call__(self, *args, **kwargs) -> np.ndarray: 45 | return await self.func(*args, **kwargs) 46 | 47 | 48 | def locate_json_string_body_from_string(content: str) -> Union[str, None]: 49 | """Locate the JSON string body from a string""" 50 | try: 51 | maybe_json_str = re.search(r"{.*}", 
content, re.DOTALL)
52 |         if maybe_json_str is not None:
53 |             maybe_json_str = maybe_json_str.group(0)
54 |             maybe_json_str = maybe_json_str.replace("\\n", "")
55 |             maybe_json_str = maybe_json_str.replace("\n", "")
56 |             maybe_json_str = maybe_json_str.replace("'", '"')
57 |             json.loads(maybe_json_str)
58 |             return maybe_json_str
59 |     except Exception:
60 |         pass
61 |     # try:
62 |     #     content = (
63 |     #         content.replace(kw_prompt[:-1], "")
64 |     #         .replace("user", "")
65 |     #         .replace("model", "")
66 |     #         .strip()
67 |     #     )
68 |     #     maybe_json_str = "{" + content.split("{")[1].split("}")[0] + "}"
69 |     #     json.loads(maybe_json_str)
70 |
71 |     return None
72 |
73 |
74 | def convert_response_to_json(response: str) -> dict:
75 |     json_str = locate_json_string_body_from_string(response)
76 |     assert json_str is not None, f"Unable to parse JSON from response: {response}"
77 |     try:
78 |         data = json.loads(json_str)
79 |         return data
80 |     except json.JSONDecodeError as e:
81 |         logger.error(f"Failed to parse JSON: {json_str}")
82 |         raise e from None
83 |
84 |
85 | def compute_args_hash(*args):
86 |     return md5(str(args).encode()).hexdigest()
87 |
88 |
89 | def compute_mdhash_id(content, prefix: str = ""):
90 |     return prefix + md5(content.encode()).hexdigest()
91 |
92 |
93 | def limit_async_func_call(max_size: int, waiting_time: float = 0.0001):
94 |     """Limit the maximum number of concurrent calls to an async function"""
95 |
96 |     def final_decro(func):
97 |         """Not using asyncio.Semaphore to avoid requiring nest-asyncio"""
98 |         __current_size = 0
99 |
100 |         @wraps(func)
101 |         async def wait_func(*args, **kwargs):
102 |             nonlocal __current_size
103 |             while __current_size >= max_size:
104 |                 await asyncio.sleep(waiting_time)
105 |             __current_size += 1
106 |             result = await func(*args, **kwargs)
107 |             __current_size -= 1
108 |             return result
109 |
110 |         return wait_func
111 |
112 |     return final_decro
113 |
114 |
115 | def wrap_embedding_func_with_attrs(**kwargs):
116 |     """Wrap a function with attributes"""
117 |
118 |     def final_decro(func) -> EmbeddingFunc:
119 |         new_func = EmbeddingFunc(**kwargs, func=func)
120 |         return new_func
121 |
122 |     return final_decro
123 |
124 |
125 | def load_json(file_name):
126 |     if not os.path.exists(file_name):
127 |         return None
128 |     with open(file_name, encoding="utf-8") as f:
129 |         return json.load(f)
130 |
131 |
132 | def write_json(json_obj, file_name):
133 |     with open(file_name, "w", encoding="utf-8") as f:
134 |         json.dump(json_obj, f, indent=2, ensure_ascii=False)
135 |
136 |
137 | def encode_string_by_tiktoken(content: str, model_name: str = "gpt-4o"):
138 |     global ENCODER
139 |     if ENCODER is None:
140 |         ENCODER = tiktoken.encoding_for_model(model_name)
141 |     tokens = ENCODER.encode(content)
142 |     return tokens
143 |
144 |
145 | def decode_tokens_by_tiktoken(tokens: list[int], model_name: str = "gpt-4o"):
146 |     global ENCODER
147 |     if ENCODER is None:
148 |         ENCODER = tiktoken.encoding_for_model(model_name)
149 |     content = ENCODER.decode(tokens)
150 |     return content
151 |
152 |
153 | def pack_user_ass_to_openai_messages(*args: str):
154 |     roles = ["user", "assistant"]
155 |     return [
156 |         {"role": roles[i % 2], "content": content} for i, content in enumerate(args)
157 |     ]
158 |
159 |
160 | def split_string_by_multi_markers(content: str, markers: list[str]) -> list[str]:
161 |     """Split a string by multiple markers"""
162 |     if not markers:
163 |         return [content]
164 |     results = re.split("|".join(re.escape(marker) for marker in markers), content)
165 |     return [r.strip() for r in results if r.strip()]
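# --- Added usage sketch (illustrative; not part of the original source). ---
# How the two helpers above combine in practice: limit_async_func_call caps
# the number of in-flight coroutines by polling a counter, and
# split_string_by_multi_markers breaks LLM output on record and field
# delimiters. The "##" and "<|>" markers below are assumptions borrowed
# from LightRAG's default prompt delimiters:
#
#   import asyncio
#
#   @limit_async_func_call(max_size=2)
#   async def fake_llm(prompt: str) -> str:
#       await asyncio.sleep(0.1)  # pretend network latency
#       return "entity<|>A<|>B##entity<|>C<|>D"
#
#   async def demo():
#       outs = await asyncio.gather(*[fake_llm(str(i)) for i in range(5)])
#       print(split_string_by_multi_markers(outs[0], ["##", "<|>"]))
#       # -> ['entity', 'A', 'B', 'entity', 'C', 'D']
#
#   # asyncio.run(demo())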
166 | 167 | 168 | # Refer the utils functions of the official GraphRAG implementation: 169 | # https://github.com/microsoft/graphrag 170 | def clean_str(input: Any) -> str: 171 | """Clean an input string by removing HTML escapes, control characters, and other unwanted characters.""" 172 | # If we get non-string input, just give it back 173 | if not isinstance(input, str): 174 | return input 175 | 176 | result = html.unescape(input.strip()) 177 | # https://stackoverflow.com/questions/4324790/removing-control-characters-from-a-string-in-python 178 | return re.sub(r"[\x00-\x1f\x7f-\x9f]", "", result) 179 | 180 | 181 | def is_float_regex(value): 182 | return bool(re.match(r"^[-+]?[0-9]*\.?[0-9]+$", value)) 183 | 184 | 185 | def truncate_list_by_token_size(list_data: list, key: callable, max_token_size: int): 186 | """Truncate a list of data by token size""" 187 | if max_token_size <= 0: 188 | return [] 189 | tokens = 0 190 | for i, data in enumerate(list_data): 191 | tokens += len(encode_string_by_tiktoken(key(data))) 192 | if tokens > max_token_size: 193 | return list_data[:i] 194 | return list_data 195 | 196 | 197 | def list_of_list_to_csv(data: List[List[str]]) -> str: 198 | output = io.StringIO() 199 | writer = csv.writer(output) 200 | writer.writerows(data) 201 | return output.getvalue() 202 | 203 | 204 | def csv_string_to_list(csv_string: str) -> List[List[str]]: 205 | output = io.StringIO(csv_string) 206 | reader = csv.reader(output) 207 | return [row for row in reader] 208 | 209 | 210 | def save_data_to_file(data, file_name): 211 | with open(file_name, "w", encoding="utf-8") as f: 212 | json.dump(data, f, ensure_ascii=False, indent=4) 213 | 214 | 215 | def xml_to_json(xml_file): 216 | try: 217 | tree = ET.parse(xml_file) 218 | root = tree.getroot() 219 | 220 | # Print the root element's tag and attributes to confirm the file has been correctly loaded 221 | print(f"Root element: {root.tag}") 222 | print(f"Root attributes: {root.attrib}") 223 | 224 | data = {"nodes": [], "edges": []} 225 | 226 | # Use namespace 227 | namespace = {"": "http://graphml.graphdrawing.org/xmlns"} 228 | 229 | for node in root.findall(".//node", namespace): 230 | node_data = { 231 | "id": node.get("id").strip('"'), 232 | "entity_type": node.find("./data[@key='d0']", namespace).text.strip('"') 233 | if node.find("./data[@key='d0']", namespace) is not None 234 | else "", 235 | "description": node.find("./data[@key='d1']", namespace).text 236 | if node.find("./data[@key='d1']", namespace) is not None 237 | else "", 238 | "source_id": node.find("./data[@key='d2']", namespace).text 239 | if node.find("./data[@key='d2']", namespace) is not None 240 | else "", 241 | } 242 | data["nodes"].append(node_data) 243 | 244 | for edge in root.findall(".//edge", namespace): 245 | edge_data = { 246 | "source": edge.get("source").strip('"'), 247 | "target": edge.get("target").strip('"'), 248 | "weight": float(edge.find("./data[@key='d3']", namespace).text) 249 | if edge.find("./data[@key='d3']", namespace) is not None 250 | else 0.0, 251 | "description": edge.find("./data[@key='d4']", namespace).text 252 | if edge.find("./data[@key='d4']", namespace) is not None 253 | else "", 254 | "keywords": edge.find("./data[@key='d5']", namespace).text 255 | if edge.find("./data[@key='d5']", namespace) is not None 256 | else "", 257 | "source_id": edge.find("./data[@key='d6']", namespace).text 258 | if edge.find("./data[@key='d6']", namespace) is not None 259 | else "", 260 | } 261 | data["edges"].append(edge_data) 262 | 263 | # Print the 
number of nodes and edges found 264 | print(f"Found {len(data['nodes'])} nodes and {len(data['edges'])} edges") 265 | 266 | return data 267 | except ET.ParseError as e: 268 | print(f"Error parsing XML file: {e}") 269 | return None 270 | except Exception as e: 271 | print(f"An error occurred: {e}") 272 | return None 273 | 274 | 275 | def process_combine_contexts(hl, ll): 276 | header = None 277 | list_hl = csv_string_to_list(hl.strip()) 278 | list_ll = csv_string_to_list(ll.strip()) 279 | 280 | if list_hl: 281 | header = list_hl[0] 282 | list_hl = list_hl[1:] 283 | if list_ll: 284 | header = list_ll[0] 285 | list_ll = list_ll[1:] 286 | if header is None: 287 | return "" 288 | 289 | if list_hl: 290 | list_hl = [",".join(item[1:]) for item in list_hl if item] 291 | if list_ll: 292 | list_ll = [",".join(item[1:]) for item in list_ll if item] 293 | 294 | combined_sources = [] 295 | seen = set() 296 | 297 | for item in list_hl + list_ll: 298 | if item and item not in seen: 299 | combined_sources.append(item) 300 | seen.add(item) 301 | 302 | combined_sources_result = [",\t".join(header)] 303 | 304 | for i, item in enumerate(combined_sources, start=1): 305 | combined_sources_result.append(f"{i},\t{item}") 306 | 307 | combined_sources_result = "\n".join(combined_sources_result) 308 | 309 | return combined_sources_result 310 | -------------------------------------------------------------------------------- /LightRAG/lightrag_hku.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | LICENSE 2 | README.md 3 | setup.py 4 | lightrag/__init__.py 5 | lightrag/base.py 6 | lightrag/lightrag.py 7 | lightrag/llm.py 8 | lightrag/operate.py 9 | lightrag/prompt.py 10 | lightrag/storage.py 11 | lightrag/utils.py 12 | lightrag/kg/__init__.py 13 | lightrag/kg/neo4j_impl.py 14 | lightrag/kg/oracle_impl.py 15 | lightrag_hku.egg-info/PKG-INFO 16 | lightrag_hku.egg-info/SOURCES.txt 17 | lightrag_hku.egg-info/dependency_links.txt 18 | lightrag_hku.egg-info/requires.txt 19 | lightrag_hku.egg-info/top_level.txt -------------------------------------------------------------------------------- /LightRAG/lightrag_hku.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /LightRAG/lightrag_hku.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | accelerate 2 | aioboto3 3 | aiohttp 4 | graspologic 5 | hnswlib 6 | nano-vectordb 7 | neo4j 8 | networkx 9 | ollama 10 | openai 11 | oracledb 12 | pyvis 13 | tenacity 14 | tiktoken 15 | torch 16 | transformers 17 | xxhash 18 | -------------------------------------------------------------------------------- /LightRAG/lightrag_hku.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | lightrag 2 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode/test.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) 4 | import numpy as np 5 | from lightrag import LightRAG, QueryParam 6 | from lightrag.utils import EmbeddingFunc 7 | from lightrag.llm import openai_embedding 8 | from lightrag.llm import openai_complete_if_cache 9 | 10 | 11 | 12 | # 模型全局参数配置 根据自己的实际情况进行调整 13 | OPENAI_API_BASE = 
"https://api.wlai.vip/v1" 14 | OPENAI_CHAT_API_KEY = "sk-dUWW1jzueJ4lrDixWaPsq7nnyN5bCucMzvldpNJwfJlIvAcC" 15 | OPENAI_CHAT_MODEL = "gpt-4o-mini" 16 | OPENAI_EMBEDDING_MODEL = "text-embedding-3-small" 17 | 18 | 19 | # 检测并创建文件夹 20 | WORKING_DIR = "./output" 21 | if not os.path.exists(WORKING_DIR): 22 | os.mkdir(WORKING_DIR) 23 | 24 | 25 | # 自定义Chat模型 配置类OpenAI 26 | async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str: 27 | return await openai_complete_if_cache( 28 | model=OPENAI_CHAT_MODEL, 29 | prompt=prompt, 30 | system_prompt=system_prompt, 31 | history_messages=history_messages, 32 | api_key=OPENAI_CHAT_API_KEY, 33 | base_url=OPENAI_API_BASE, 34 | **kwargs 35 | ) 36 | 37 | 38 | # 自定义Embedding模型 配置类OpenAI 39 | async def embedding_func(texts: list[str]) -> np.ndarray: 40 | return await openai_embedding( 41 | texts, 42 | model=OPENAI_EMBEDDING_MODEL, 43 | api_key=OPENAI_CHAT_API_KEY, 44 | base_url=OPENAI_API_BASE, 45 | ) 46 | 47 | 48 | # 定义rag 49 | rag = LightRAG( 50 | working_dir=WORKING_DIR, 51 | llm_model_func=llm_model_func, 52 | embedding_func=EmbeddingFunc( 53 | embedding_dim=1536, 54 | max_token_size=8192, 55 | func=embedding_func 56 | ) 57 | ) 58 | 59 | 60 | # 构建索引 61 | with open("./input/book.txt", "r", encoding="utf-8") as f: 62 | rag.insert(f.read()) 63 | 64 | 65 | # # local检索 66 | # print( 67 | # rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="local")) 68 | # ) 69 | 70 | # # global检索 71 | # print( 72 | # rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="global")) 73 | # ) 74 | 75 | # # hybrid检索 76 | # print( 77 | # rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="hybrid")) 78 | # ) 79 | 80 | # # naive检索 81 | # print( 82 | # rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="naive")) 83 | # ) 84 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/6.txt: -------------------------------------------------------------------------------- 1 | 第6回 观音院斗宝失袈裟 2 | 3 |   唐僧骑上白龙马,走起路来就轻松了许多。一天傍晚,师徒二人来到山谷里的一座观音院。门口的和尚一听是大唐来的高僧,要到西天去取经,连忙施礼,恭恭敬敬地请他们进院子休息。 4 |   唐僧师徒刚刚坐好,两名小和尚搀扶着一个驼背的和尚,慢慢地走了进来。唐僧连忙起身,双手合掌,施礼相迎。老和尚一边还礼,一边叫人端茶来。不一会儿,两个小童端着精美的茶具进来了。 5 |   唐僧喝了一口茶,夸了几句这茶具。老和尚很高兴,然后卖弄地讲起了茶经,接着又问唐僧有什么东土大唐带来的宝贝,拿出来看一看。悟空见老和尚这般卖弄,心中早有一百个不服气了,不等师父说话,便抢着说∶“师父,把你的袈裟让他们见识见识!” 6 |   老和尚一听袈裟,更是得意,大笑起来,让人拿出十二只箱子,将里面的袈裟全部抖了出来,竟有上百件,而且每一件都很漂亮。悟空见了,也不言语,拿出了唐僧的袈裟抖开,顿时满屋金光四射,让人睁不开眼睛。 7 |   老和尚看呆了,一条毒计爬上心头,找了个借口,请求唐僧把袈裟借给他仔细看上一晚,明早奉还。唐僧还未开口,悟空抢先说∶“就借给他看一晚吧!不会有事的!”唐僧想要阻止已经来不及了,只好很不情愿地把袈裟借给老和尚。 8 |   晚上,老和尚偷偷让小和尚搬来许多木柴,想把唐僧师徒烧死。悟空听到院子里很吵,觉得奇怪,害怕师父被惊醒,就变成一个小蜜蜂,飞到院中,看到眼前的情景,觉得很可笑,眼珠一转,想出了一条妙计。 9 |   悟空驾起筋斗云,来到南天门,守门的天兵天将见是大闹天宫的齐天大圣来了,吓得乱成一团。悟空高叫∶“别怕!别怕!我不是来打架的,是来找广目天王借避火罩,去救师父的!”广目天王只好将宝贝借给悟空。 10 |   悟空拿着避火罩回到观音院,把师父的禅房罩住,然后悠闲地坐在屋顶,看和尚们放火。刹那间,大火熊熊燃烧。悟空想,这些和尚也太狠了,就吹了一口气,立刻刮起一阵狂风,火借风势,整个观音院顿时变成了一片火海。 11 |   这场大火引来了一个妖怪。原来这座观音院的南面有座黑风山,山中黑风洞里住着一个黑风怪。他远远地看见寺庙起火,就想着趁火打劫偷点东西,于是驾云飘进方丈房中,看见桌上的包袱放出金光,打开一看,竟是件价值连城的袈裟。 12 |   黑风怪偷了那件袈裟,驾云回到洞中。悟空只管坐在屋顶吹火,却没注意到黑风怪。天快亮时,悟空见火已经快灭了,才收起避火罩,还给了广目天王。回到禅房,见师父还在熟睡,就轻轻地叫醒了师父。 13 |   唐僧打开房门,见院中四处都是乌黑烧焦的木头,好端端的观音院已经不存在了,感到非常吃惊,悟空就把昨晚发生的事说了一遍。唐僧心中想着袈裟,就和悟空一块去找,寺里的和尚看见他们,还以为是冤魂来了,吓得连连跪地求饶。 14 |   那驼背老和尚看见寺院被烧,又不见了袈裟,正生气,又听唐僧没有烧死,来取袈裟了,吓得不知怎么办才好。最后一狠心,一头往墙上撞去,顿时血流如注,当场就死了。唐僧知道后,埋怨悟空说“唉!徒儿,你何苦要和别人斗气比阔呢?现在可怎么办呀!” 15 |   悟空手拿金箍棒,追问那些和尚袈裟在哪里,和尚都说不知道。悟空想了又想问道∶“这附近可有妖怪?”和尚都说黑风山上有个黑风怪。悟空板着脸说∶“好好侍候我师父,如有不周,小心脑袋!”说着一棒打断了一堵墙。 16 |   悟空一个筋斗来到黑风山,按落云头,往林中走去。忽听坡前有人在说笑,悟空侧身躲在岩石后面,偷偷望去,见地上坐着三个妖魔,为首的一个黑脸大汉说∶“昨天晚上有缘得到了一件佛衣,特地请二位来,开个佛衣盛会!” 17 |   
悟空听得一清二楚,一边骂着∶“偷东西的坏家伙!”一边跳上前去,“呼”的就是一捧。黑脸大汉就是黑风怪,变成一股风逃走了;还有个道士也跑了,只有那个白衣秀才没来得及逃走,被悟空一棒打死,现出原形,原来是条大白花蛇。 18 |   悟空紧跟那股风来到一座山峰上,远远地看见对面山崖上有一座洞府,门前有一石碑,上面写着∶“黑风山黑风洞”几个大字。悟空来到洞前,用棒子敲着门,高声叫到∶“坏家伙,还我袈裟来!”小妖怪看到悟空气势汹汹,连忙跑进去报告黑风怪。 19 |   黑风怪刚才在山坡逃走是因为没带武器,现在是在他的地盘上,他可不怕。他穿上乌金甲,提着黑缨枪,出洞和悟空打了起来。打到中午,黑风怪说要吃饭,饭后再打。悟空也不说话,只是打,黑风怪只得再变成一股清风逃回洞中。 20 |   不管悟空在洞外骂得有多难听,黑风怪就是不出来。悟空急得没有办法,只得先回观音院去看师父了。回到院中,随便吃了些东西,又驾云来到黑风山,看见一个小妖拿着一个装请柬的木匣急急忙忙向前走,就一棒把它打死了。 21 |   悟空打开木匣一看,里面装的竟是黑风怪邀请观音院那老和尚的请柬,这才明白,老和尚早就和妖怪有来往,悟空眼珠一转,心生一条妙计,马上变成了老和尚的模样,摇摇摆摆地走到洞口,小妖一见是熟人,连忙开门相迎。 22 |   黑风怪没有看出什么破绽,扶着老和尚走进中厅,还没说几句话,在外面巡逻的小妖进来报告说送信的小妖已经被打死了。黑风怪立刻就明白了是怎么回事,拿出枪来狠狠刺向悟空,悟空侧身躲开,嘿嘿笑了几声,露出了本来面目,和妖怪打了起来。 23 |   两人你一枪,我一棒,打得难分难解,一直到太阳落山。那妖怪说∶“现在天快要黑了,明天再和你打!”悟空知道这家伙又要逃跑,哪肯放过,当头一棒打去,那妖怪化作一股清风,溜回洞中去了。 24 |   悟空没有办法,只好回到观音院。唐僧看到袈裟还没有夺回来,心中非常着急。晚上怎么也睡不着。第二天天刚亮,悟空对唐僧说∶“师父请放心,老孙今天要是夺不回袈裟,就不回来见你!”原来他已决定找观音菩萨想办法。 25 |   悟空驾云来到南海落伽山,见到观音菩萨,上前深深鞠了一躬,说明来意。观音菩萨听后叹了口气说∶“你这猴子,不该当众卖弄宝衣,更不该放火烧了寺院弄成现在这个样子。”说完,嘱咐了童子几句,和悟空驾着云,飞往黑风山。 26 |   他们很快来到黑风山,远远看见那天在山坡前的道士端着玉盘走了过来。悟空上前一棒打死了道士,现出了原形,原来是只大灰狼。悟空捡起盘子,看见里面有两粒仙丹,原来他是去参加佛衣盛会的。 27 |   悟空灵机一动,想出一条妙计,他让观音菩萨变成那道士,自己则变成一颗仙丹,只不过比原来的大一些。观音菩萨把他放在盘中,向洞中走去,按悟空说的计策,要让黑风怪吃下那颗仙丹。 28 |   观音菩萨来到洞中,把仙丹送到黑风怪手中,说∶“小道献上一颗仙丹,祝大王健康长寿!”黑风怪十分高兴,接过仙丹刚送到嘴边,没想到仙丹自动滑了下去。 29 |   悟空一到黑风怪的肚子里,就恢复了原形,在里面打起了猴拳,黑风怪痛得在地上直打滚。观音菩萨也恢复了原形,命令他交出佛衣,黑风怪痛得受不了了,让小妖拿来袈裟。观音菩萨接过佛衣,拿出一个小金圈儿,套在黑风怪头上。 30 |   观音这才让悟空出来。悟空刚从黑风怪的鼻孔里跳出来,黑风怪就摆出一副凶相,拿着黑缨枪向观音刺去。观音浮在空中,念动咒语,黑风怪马上头痛了起来,只好跪在地上,求观音饶命,并说自己愿意出家。 31 |   观音菩萨把佛衣交给悟空,带着黑风怪回南海去了。悟空见黑风洞中的小妖早已逃离,就放了一把火把洞烧了,然后驾云赶回观音院。唐僧和寺里的和尚们看见悟空取回了袈裟,都很高兴。第二天,唐僧师徒离开了观音院,又向西出发 32 | 33 | 34 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/7.txt: -------------------------------------------------------------------------------- 1 | 第7回 高老庄唐僧收八戒 2 | 3 |   这一天天快黑了,他们来到一个叫做高老庄的村子。碰巧,庄主高太公正在到处寻找能捉妖怪的法师。悟空一听非常高兴地说∶“不用找了,我就是专门捉妖怪的。” 4 |   原来,高太公有三个女儿,前两个女儿已经出嫁,到了三女儿,就想找个上门女婿来支撑门户。三年前来了个又黑又壮的青年,自称是福陵山人,姓猪,想到高家当女婿。三女儿对他还算满意,高太公就让他们成了家。 5 |   开始这个女婿很勤快,耕田下地,收割粮食,样样都行。没想到过了一阵,他突然变成一个猪头猪脑的妖怪,一顿饭要吃三五斗米,来去都腾云驾雾。这半年来,竟然把三女儿锁在后院,不让人进去。 6 |   悟空听了高太公的话,拍拍胸脯说∶“这个妖怪我捉定了,今天晚上就让他写退婚书,永远不再碰你女儿。”高太公问他要几个帮手,悟空说∶“一个也不要,只要把我师父照顾好就行了。”高太公连忙照办。 7 |   安顿好了师父,悟空让高太公带路来到后院。他打掉铁锁,走进院中一间黑洞洞的屋子。高太公和女儿见面,忍不住抱在一起痛哭起来。三女儿告诉他们∶“那妖怪知道我爹要请法师捉拿他,每天天一亮就走,晚上才回来。” 8 |   悟空让高太公父女离开,自己变成三女儿的模样。没过多久,院外一阵狂风刮来,那妖怪出现在半空中。悟空连忙向床上一靠,装出有病的样子,那妖怪摸进房中,口中喊着∶“姐姐,姐姐,你在哪儿呀?” 9 |   悟空故意叹口气说∶“我听爹今天在外面骂你,还说请了法师来抓你! 10 |   ”那妖怪说∶“不怕,不怕,咱们上床睡吧!”悟空说∶“我爹请的可是那五百年前大闹天宫的齐天大圣,你不害怕吗?”那妖怪倒吸了口凉气说∶“咱们做不成夫妻了。” 11 |   猪精打开门就往外跑,悟空从后面一把扯住他的后领子,把脸一抹,现出原形大叫道∶“泼怪,你看我是谁?”那妖怪一见是悟空,吓得手脚发麻,“呼”地一下化成一阵狂风跑了。 12 |   悟空跟着这股妖风一路追到高山上,只见那股妖风钻进了一个洞里。悟空刚落下云头,那妖怪已现形从洞中出来了,手里拿着一柄九齿钉耙骂道∶“你这个弼马温!当年大闹天宫,不知连累了我们多少人。今天又来欺负我,让你尝尝我的厉害,看耙!” 13 |   悟空举起棒架住了钉耙,问∶“怎么,你认识俺老孙!”那妖怪说出了自己的来历∶原来他是天上的天蓬元师,在王母娘娘的蟠桃会上喝得酩酊大醉,闯进了广寒宫,见嫦娥长得十分美丽,就上去调戏嫦娥。 14 |   玉皇大帝知道这件事后,要根据天条将他处死。多亏太白金星求情,才保住了性命,但要重打二千铜锤,并打入凡间投胎。没想到他急于投胎转世,竟错投了猪胎,落得如此模样。这时他和悟空打了一会儿,就觉得抵挡不住,拔腿就往洞中逃。 15 |   悟空站在洞口骂,那妖怪也不出来。悟空一见,气得乱蹦乱跳,拿起金箍棒打碎了洞门,那妖怪听见洞门被打碎的声音,只好跳出来骂道∶“我在高老庄招亲,跟你有什么关系,你欺人太甚,我这把钉耙绝不饶你!” 16 |   悟空想跟他玩玩,就站着不动,不管那妖怪怎么打,悟空的头皮连红都不红。那妖怪使劲一打,溅得火星乱飞,这下可把他吓坏了,说∶“好头! 
17 |   好头!你原来不是在花果山水帘洞,怎么跑到这儿来了,是不是我丈人到那儿把你请来的?” 18 |   悟空说∶“不是,是我自己改邪归正了,保护唐僧西天取经路过这…… 19 |   ”妖怪一听“取经”二字,“啪”地一声一丢钉耙,拱了拱手说∶“麻烦你引见一下,我受观音菩萨劝导,她叫我在这里等你们,我愿意跟唐僧西天取经,也好将功折罪。” 20 |   两个人放火烧了云栈洞,悟空将妖怪的双手反绑上,押回高老庄。那妖怪“扑通”一声跪在唐僧面前,把观音菩萨劝他行善的事说了一遍。唐僧十分高兴,叫悟空给他松绑,又请高太公抬出香炉烛台,拜谢了观音,还给他取了个法号叫猪悟能,别名猪八戒。 21 |   高太公又给猪八戒准备了一套僧衣、僧鞋、僧帽等穿上。临走的时候,八戒一再叮嘱说∶“丈人啊!你好好照看我老婆,如果取不成经,我还是要还俗的。你不要把我的老婆再许给别人呀!”悟空听了笑骂他胡说八道,八戒却说∶“我这是给自己留条后路呢!” 22 | 23 | 24 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/8.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/nangeAGICode1201/files/incremental_inputs/8.docx -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/nangeAGICode1201/files/incremental_inputs/8.pdf -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/9.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/nangeAGICode1201/files/incremental_inputs/9.docx -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/incremental_inputs/9.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/nangeAGICode1201/files/incremental_inputs/9.pdf -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/inputs/1.txt: -------------------------------------------------------------------------------- 1 | 第1回 惊天地美猴王出世 2 | 3 |   这是一个神话故事,传说在很久很久以前,天下分为东胜神洲、西牛贺洲、南赡部洲、北俱芦洲。在东胜神洲傲来国,有一座花果山,山上有一块仙石,一天仙石崩裂,从石头中滚出一个卵,这个卵一见风就变成一个石猴,猴眼射出一道道金光,向四方朝拜。 4 |   那猴能走、能跑,渴了就喝些山涧中的泉水,饿了就吃些山上的果子。 5 |   整天和山中的动物一起玩乐,过得十分快活。一天,天气特别热,猴子们为了躲避炎热的天气,跑到山涧里洗澡。它们看见这泉水哗哗地流,就顺着涧往前走,去寻找它的源头。 6 |   猴子们爬呀、爬呀,走到了尽头,却看见一股瀑布,像是从天而降一样。猴子们觉得惊奇,商量说∶“哪个敢钻进瀑布,把泉水的源头找出来,又不伤身体,就拜他为王。”连喊了三遍,那石猴呼地跳了出来,高声喊道∶“我进去,我进去!” 7 |   那石猴闭眼纵身跳入瀑布,觉得不像是在水中,这才睁开眼,四处打量,发现自己站在一座铁板桥上,桥下的水冲贯于石窍之间,倒挂着流出来,将桥门遮住,使外面的人看不到里面。石猴走过桥,发现这真是个好地方,石椅、石床、石盆、石碗,样样都有。 8 |   这里就像不久以前有人住过一样,天然的房子,安静整洁,锅、碗、瓢、盆,整齐地放在炉灶上。正当中有一块石碑,上面刻着∶花果山福地,水帘洞洞天。石猴高兴得不得了,忙转身向外走去,嗖的一下跳出了洞。 9 |   猴子们见石猴出来了,身上又一点伤也没有,又惊又喜,把他团团围住,争著问他里面的情况。石猴抓抓腮,挠挠痒,笑嘻嘻地对大家说∶“里面没有水,是一个安身的好地方,刮大风我们有地方躲,下大雨我们也不怕淋。”猴子们一听,一个个高兴得又蹦又跳。 10 |   猴子们随着石猴穿过了瀑布,进入水帘洞中,看见了这么多的好东西,一个个你争我夺,拿盆的拿盆,拿碗的拿碗,占灶的占灶,争床的争床,搬过来,移过去,直到精疲力尽为止。猴子们都遵照诺言,拜石猴为王,石猴从此登上王位,将石字省去,自称“美猴王”。 11 |   美猴王每天带着猴子们游山玩水,很快三、五百年过去了。一天正在玩乐时,美猴王想到自己将来难免一死,不由悲伤得掉下眼泪来,这时猴群中跳出个通背猿猴来,说∶“大王想要长生不老,只有去学佛、学仙、学神之术。” 12 |   美猴王决定走遍天涯海角,也要找到神仙,学那长生不老的本领。第二天,猴子们为他做了一个木筏,又准备了一些野果,于是美猴王告别了群猴们,一个人撑着木筏,奔向汪洋大海。 13 |   大概是美猴王的运气好,连日的东南风,将他送到西北岸边。他下了木筏,登上了岸,看见岸边有许多人都在干活,有的捉鱼,有的打天上的大雁,有的挖蛤蜊,有的淘盐,他悄悄地走过去,没想到,吓得那些人将东西一扔,四处逃命。 14 |   这一天,他来到一座高山前,突然从半山腰的树林里传出一阵美妙的歌声,唱的是一些关于成仙的话。猴王想∶这个唱歌的人一定是神仙,就顺着歌声找去。 15 |   唱歌的是一个正在树林里砍柴的青年人,猴王从这青年人的口中了解到,这座山叫灵台方寸山,离这儿七八里路,有个斜月三星洞,洞中住着一个称为菩提祖师的神仙。 16 |   
美猴王告别打柴的青年人,出了树林,走过山坡,果然远远地看见一座洞府,只见洞门紧紧地闭着,洞门对面的山岗上立着一块石碑,大约有三丈多高,八尺多宽,上面写着十个大字∶“灵台方寸山斜月三星洞”。正在看时,门却忽然打开了,走出来一个仙童。 17 |   美猴王赶快走上前,深深地鞠了一个躬,说明来意,那仙童说∶“我师父刚才正要讲道,忽然叫我出来开门,说外面来了个拜师学艺的,原来就是你呀!跟我来吧!”美猴王赶紧整整衣服,恭恭敬敬地跟着仙童进到洞内,来到祖师讲道的法台跟前。 18 |   猴王看见菩提祖师端端正正地坐在台上,台下两边站着三十多个仙童,就赶紧跪下叩头。祖师问清楚他的来意,很高兴,见他没有姓名,便说∶“你就叫悟空吧!” 19 |   祖师叫孙悟空又拜见了各位师兄,并给悟空找了间空房住下。从此悟空跟着师兄学习生活常识,讲究经典,写字烧香,空时做些扫地挑水的活。 20 |   很快七年过去了,一天,祖师讲道结束后,问悟空想学什么本领。孙悟空不管祖师讲什么求神拜佛、打坐修行,只要一听不能长生不老,就不愿意学,菩提祖师对此非常生气。 21 |   祖师从高台上跳了下来,手里拿着戒尺指着孙悟空说∶“你这猴子,这也不学,那也不学,你要学些什么?”说完走过去在悟空头上打了三下,倒背着手走到里间,关上了门。师兄们看到师父生气了,感到很害怕,纷纷责怪孙悟空。 22 |   孙悟空既不怕,又不生气,心里反而十分高兴。当天晚上,悟空假装睡着了,可是一到半夜,就悄悄起来,从前门出去,等到三更,绕到后门口,看见门半开半闭,高兴地不得了,心想∶“哈哈,我没有猜错师父的意思。” 23 |   孙悟空走了进去,看见祖师面朝里睡着,就跪在床前说∶“师父,我跪在这里等着您呢!”祖师听见声音就起来了,盘着腿坐好后,严厉地问孙悟空来做什么,悟空说∶“师父白天当着大家的面不是答应我,让我三更时从后门进来,教我长生不老的法术吗?” 24 |   菩提祖师听到这话心里很高兴。心想∶“这个猴子果然是天地生成的,不然,怎么能猜透我的暗谜。”于是,让孙悟空跪在床前,教给他长生不老的法术。孙悟空洗耳恭听,用心理解,牢牢记住口诀,并叩头拜谢了祖师的恩情。 25 |   很快三年又过去了,祖师又教了孙悟空七十二般变化的法术和驾筋斗云的本领,学会了这个本领,一个筋斗便能翻出十万八千里路程。孙悟空是个猴子,本来就喜欢蹦蹦跳跳的,所以学起筋斗云来很容易。 26 |   有一个夏天,孙悟空和师兄们在洞门前玩耍,大家要孙悟空变个东西看看,孙悟空心里感到很高兴,得意地念起咒语,摇身一变变成了一棵大树。 27 |   师兄们见了,鼓着掌称赞他。 28 |   大家的吵闹声,让菩提祖师听到了,他拄着拐杖出来,问∶“是谁在吵闹?你们这样大吵大叫的,哪里像个出家修行的人呢?”大家都赶紧停住了笑,孙悟空也恢复了原样,给师父解释,请求原谅。 29 |   菩提祖师看见孙悟空刚刚学会了一些本领就卖弄起来,十分生气。祖师叫其他人离开,把悟空狠狠地教训了一顿,并且要把孙悟空赶走。孙悟空着急了,哀求祖师不要赶他走,祖师却不肯留下他,并要他立下誓言∶任何时候都不能说孙悟空是菩提祖师的徒弟。 30 | 31 | 32 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/inputs/2.txt: -------------------------------------------------------------------------------- 1 | 第2回 闹龙宫刁石猴借宝 2 | 3 |   孙悟空见没办法留下来,就拜别了菩提祖师,又和各位师兄告别,然后念了口诀,驾着筋斗云,不到一个时辰,就回到了花果山水帘洞,看到花果山上一片荒凉破败的景象,很是凄惨。 4 |   原来孙悟空走了以后,有一个混世魔王独占了水帘洞,并且抢走了许多猴子猴孙。孙悟空听到这些以后,气得咬牙跺脚。他问清了混世魔王的住处,决定找混世魔王报仇,便驾着筋斗云,朝北方飞去。 5 |   不一会儿,孙悟空就来到混世魔王的水脏洞前,对门前的小妖喊到∶“你家那个狗屁魔王,多次欺负我们猴子。我今天来,要和那魔王比比高低! 
6 |   ”小妖跑进洞里,报告魔王。魔王急忙穿上铁甲,提着大刀,在小妖们的簇拥下走出洞门。 7 |   孙悟空赤手空拳,夺过了混世魔王的大刀,把他劈成了两半。然后,拔下一把毫毛咬碎喷了出去,毫毛变成许多小猴子,直杀进洞里,把所有的妖精全杀死,然后救出被抢走的小猴子,放了一把火烧了水脏洞。 8 |   孙悟空收回了毫毛,让小猴子们闭上眼睛,作起法术来,一阵狂风刮过,他们驾着狂风回到了花果山。从此,孙悟空便叫小猴们做了些竹枪和木刀,用夺来的大刀教他们武艺。没过多久,孙悟空觉得竹枪木刀不能打仗,两个猴告诉他,傲来国里肯定有好的兵器。 9 |   孙悟空驾云来到傲来国上空,念起咒语,立即天空刮起狂风,砂石乱飞,把满城的军民吓得不敢出来。他趁机跑进兵器库拔了把毫毛一吹,变成上千个小猴,乱搬乱抢,悟空见差不多了,把风向一变回了花果山。 10 |   从此以后,花果山水帘洞的名气就更大了,所有的妖怪头子,即七十二洞的洞主都来拜见孙悟空。可是,悟空却有一件事不顺心,嫌那口大刀太轻,不好用。有个通背老猿猴告诉悟空,水帘洞桥下,可直通东海龙宫,叫他去找龙王要一件得心应手的兵器。 11 |   悟空立刻来到东海龙宫,给老龙王敖广讲明了来这儿的目的。龙王不好推辞,叫虾兵们抬出一杆三千六百斤重的九股叉,悟空接过来玩了一阵,嫌它太轻。龙王又命令蟹将们抬出一柄七千二百斤重的方天画戟,悟空一见,仍然嫌它太轻。 12 |   龙王说∶“再也没有比这更重的兵器了。”悟空不信,和龙王吵了起来,龙婆给龙王说∶“大禹治水时,测定海水深浅的神珍铁最近总是放光,就把这给他,管他能不能用,打发他走算了。”龙王听后告诉悟空∶“这宝物太重了,你自己去取吧!” 13 |   孙悟空跟龙王来到海底,龙王用手一指说∶“放光的就是。”悟空见神珍铁金光四射,就走过去用手一摸,原来是根铁柱子,斗一样粗,二丈多长。孙悟空使劲用手搬了搬说∶“太长太长了,要是再短些,再细一些就好了。” 14 |   孙悟空话还没有说完,那个宝贝就短了几尺,也细了一圈。孙悟空看了看说∶“再细些就更好了。”那个宝贝真的又细了许多,悟空拿过来,见上面写着∶“如意金箍棒、重一万三千五百斤”顺手玩了一会儿,觉得十分好用。 15 |   回到水晶宫,孙悟空又要龙王送一身衣服相配。龙王实在没有,但又害怕悟空乱打乱闹,只好敲响应急的金钟,叫来南、北、西三海龙王敖钦、敖顺和敖闰,兄弟三人凑了一副黄金甲、一顶凤翅紫金冠、一双藕丝步云鞋,送给悟空。 16 |   回到花果山,悟空才发现那根金箍棒竟可以变成绣花针一样大小,藏到耳朵中。一天,他宴请所有的妖王吃饭,喝醉了,在桥边的松树下睡觉,迷迷糊糊地见两个人手里拿着写有“孙悟空”的批文,走到他身边也不说话,把他用绳索套上,拉起来就走。 17 |   悟空糊里糊涂跟他们来到一座城门外,看见城楼上有一块牌子,牌子上写着“幽冥界”三个大字,知道这里是阎王住的地方,转身就要走,两个勾魂鬼死死拉住他,非要让他进去。孙悟空一看火了,从耳朵中掏出了金箍棒,把两个勾魂鬼打成了肉酱。 18 |   他甩掉套在身上的绳套,挥着金箍棒直打到城里,又一直打到森罗殿前,十位冥王见悟空长得的十分凶恶,吓得不知道该怎么办。悟空说∶“你们既然坐在王位上,就应该有点灵气,为什么不知道我来?俺老孙已经修成仙道,能长生不老。快拿生死簿来!”十位冥王赶快叫判官拿出生死本来查。 19 |   悟空登上森罗殿,一直查到魂字一千三百五十号,才找到了自己的名字,顺手拿起笔把所有猴子的名字通通勾掉,说∶“这下好极了,好极了,今后再也不归你们管了。”说完又一路打出了幽冥界。十位冥王赶忙到翠云宫去见地藏王菩萨,商量如何向玉皇大帝报告。 20 | 21 | 22 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/inputs/3.txt: -------------------------------------------------------------------------------- 1 | 第3回 齐天大圣大闹天宫 2 | 3 |   冥司阎王和龙王先后都来找玉皇大帝,状告孙悟空大闹龙宫和地府。玉皇大帝正要派天兵、天将到人间去收伏孙悟空。这时,太白金星走了出来,给玉帝出了个主意,说不如随便给他一个官职,把他困在天上,玉帝同意了,命文曲星写了一封诏书,叫太白金星请悟空上天。 4 |   太白金星遵命来到花果山,宣读圣旨。孙悟空听了十分高兴,就命令猴子们看家,自己跟着太白金星驾着云来到灵霄殿上。太白金星向玉帝行了礼,说∶“悟空来了。”玉帝问∶“谁是悟空?”悟空听了,既不行礼,也不跪拜,随便答应了一声∶“我就是。”其他神仙见悟空没有礼貌都非常生气。 5 |   玉帝对悟空没有办法,听了武曲星君的建议让悟空给玉帝看马。这个官职在天上是最小的,过了半个月悟空才知道。一气之下,便拿出金箍棒,杀出南天门,回到花果山,自封“齐天大圣”。又做了一面大旗,插在花果山上。 6 |   玉帝听说孙悟空又回到花果山,马上命令托塔李天王和三太子哪吒,带兵去捉拿悟空。没想到先锋官巨灵神和悟空没打几个回合,宣花斧就成了两截。哪吒一见气得头发都竖了起来,大喊一声,变成三头六臂,拿着六件兵器和悟空打了起来。 7 |   悟空也不示弱,摇身一变,也变成三头六臂,拿着三根金箍棒跟哪吒打了好长时间,仍不分胜负。悟空偷偷拔下一根毫毛变成自己,跟哪吒打,真身却绕到哪吒身后,举起棒子就打。哪吒躲闪不及,被打中左臂,痛得也顾不上还手,转身就跑。 8 |   玉帝听了这些十分生气,准备多派些兵将,再去和孙悟空打。这时太白金星又出了个主意说∶“不如封孙悟空一个有名无权的齐天大圣,什么事也不让他管,只把他留在天上,免得再派人去打,伤了兵将。”玉帝听了觉得有理,于是派太白金星去讲和。 9 |   悟空听说后,十分高兴,跟太白金星又一次来到天宫。玉帝马上让人在蟠桃园右侧为孙悟空修了一座齐天大圣府。孙悟空到底是个猴子,只知道名声好听,也不问有没有实权,整天和天神们以兄弟相称,在府内吃喝玩乐,今天东游游,明天西转转,自由自在。 10 |   时间长了,玉帝怕悟空闲着没事添麻烦,就让他去管蟠桃园。这桃园前、中、后各有桃树一千二百棵。前面的树三千年结果成熟,吃了可以成仙;中间的树六千年结果成熟,吃了能长生不老;后面的树九千年结果成熟,吃了以后可以跟日月同辉,天地齐寿。 11 |   一天,他见园中的桃子大部分都熟了,就想尝个新鲜,便偷偷地跑进园子去,脱了衣服,爬上大树,挑熟透的大桃吃了个饱。从此以后,每隔两三天,他就设法偷吃一次桃。每年一次的蟠桃会到了,一天,七位仙女奉王母娘娘之命进园摘桃。 12 |   恰巧这时孙悟空把桃吃饱了,感到有点困,就变成二寸来长的小人,在大树梢上,找个凉快的地方睡着了。七位仙女见园中的熟桃不多,便四处寻找,找了好长的一段时间,最后在一棵大树梢上发现有个熟透的桃,就把树梢扯下来。 13 |   没想到悟空正好睡在这棵树上,被惊醒了,变回原来的样子。他拿出金箍棒叫了声∶“谁敢偷桃?”吓得七位仙女一齐跪下,说明了来这的原因。 14 |   悟空问蟠桃会请了什么人,当他知道没有自己时,十分生气。 15 |   他用定身法把七位仙女定住,然后驾着云来到瑶池。这时赶来赴宴的众仙还没有到,只有佣人在摆设宴席,于是悟空拔了根毫毛,变成瞌睡虫,放到佣人脸上。这些人立刻呼呼大睡,他跳到桌上,端起美酒,开怀痛饮。 16 |   他吃饱喝足后才走出瑶池,迷迷糊糊地走到太上老君的兜率宫里,刚好宫里没有人就把五个葫芦里的金丹全部倒出来吃了,吃完这才想到闯了大祸,可能保不住性命。于是又回到瑶池,偷了几罐好酒,回花果山去了。 17 |   玉帝听到报告,大发脾气,命令李天王和哪吒太子率领十万天兵天将,布下十八层天罗地网,一定要捉拿悟空回来。但是天兵天将都不是悟空的对手,一个个都败下来。于是观音菩萨就建议让灌江口的显圣二郎神到花果山来捉拿孙悟空。 18 |   二郎神奉命,带领梅山六兄弟,点了些精兵良将,杀向花果山。他请李天王举着照妖镜站在空中,对着悟空照,自己到水帘洞前挑战。悟空出洞迎战,与二郎神打得难分难解。梅山六兄弟见悟空这时顾不上他们,就乘机杀进了水帘洞。 19 |   
悟空见自己的老窝被破坏了,心里一慌,变成麻雀想跑,二郎神摇身变成了捉麻雀的鹰,抖抖翅膀就去啄麻雀;悟空急忙又变成一只大鹚鸟,冲向天空,二郎神急忙变成了一只大海鹤,钻进云里去扑;悟空一见嗖地一声飞到水里,变成一条鱼。 20 |   二郎神从照妖镜里看见了悟空,就变成鱼鹰,在水面上等着,悟空见了,急忙变条水蛇,窜到岸边,接着又变成花鸨,立在芦苇上。二郎神见他变的太低贱,也不去理他,变回原来的样子,取出弹弓,朝着花鸨就打,把悟空打得站立不稳。 21 |   悟空趁机滚下山坡,变成一座土地庙,二郎神追过来,见有个旗杆立在庙的后面,就知道是悟空变的,拿起兵器就朝门砸过去,悟空见被看出来了,往上一跳,变回原样就跑,二郎神驾着云追了过去。两个人一边走一边打,又来到花果山跟前。 22 |   各路的天兵神将一拥而上,把悟空团团围住,在南天门观战的太上老君趁机把金钢琢朝悟空扔过去,悟空被打中头部,摔了一跤。二郎神的哮天犬跑上去,咬住了悟空,其他天神则扑上去把悟空按住,用铁链穿住琵琶骨捆了回去。 23 |   孙悟空被绑在斩妖台上,但不论用刀砍斧剁,还是用雷打火烧,都不能伤他一根毫毛。太上老君启奏玉帝,把悟空放到八卦炉里熔炼,玉帝准奏。 24 |   于是,悟空被带到兜率宫,众神仙把他推进八卦炉里,烧火的童子用扇子使劲扇火。 25 |   悟空在炉中跳来跳去,偶然中跳到巽宫的位置,这里只有烟没有火,熏得很厉害,就弯下身子蹲在里面。四十九天过去了,太上老君下令打开炉门,悟空忽然听到炉顶有响声,抬头看见一道光,用力一跳,跳出炼丹炉,踢倒炉子,转身就跑。 26 |   孙悟空不但没有被熔化,反而炼就了一双火眼金睛。他从耳朵中掏出金箍棒,迎风一晃,变成碗口那么粗。悟空抡起如意棒,一路指东打西,直打到灵霄殿上,大声叫喊着∶“皇帝轮流做,玉帝老头,你快搬出去,把天宫让给我,要不,就给你点厉害看看!” 27 |   幸好有三十六员雷将,二十八座星宿赶来保护,玉帝才能脱身。玉帝立即派人去西天请如来佛祖。如来一听,带着阿傩、伽叶两位尊者,来到灵霄殿外,命令停止打斗,叫悟空出来,看看他有什么本事。悟空怒气冲冲地看着如来,根本就不把如来放在眼里。 28 |   如来佛祖伸开手掌说∶“如果你有本领一筋斗翻出我的手掌,我就劝玉帝到西方去,把位子让给你。”悟空一听不知道是计,心里还挺高兴,就把金箍棒放在耳朵里,轻轻一跳,站在如来佛的手心中,喊到∶“我去了!” 29 |   一个筋斗,无影无踪。 30 |   悟空驾着云飞一样地往前赶,忽然见前面有五根肉红色的柱子,想这肯定是天边了,柱子一定是撑天用的,这才停下来。他害怕回去见如来没有凭证,就拔下一根毫毛,变成一支笔,在中间的一根柱子上写下“齐天大圣到此一游”八个大字。 31 |   写完收了毫毛,又跑到第一个柱子下撒了一泡猴尿,然后又驾起筋斗云,回到如来佛祖手掌里说∶“如果你说话算数,就快叫玉帝让位子吧!”如来佛却说孙悟空根本没有离开他的掌心。悟空不服,要如来去看看他在天边留下的证据。 32 |   如来佛不去,他让悟空看看他右手的中指,再闻闻大拇指根。悟空睁大火眼金睛,只见佛祖右手中指上有他写的那八个大字,大拇指丫里还有些猴尿的臊气。悟空吃惊地说∶“我不信,我一点也不信,我把字写在撑天的柱子上,怎么却在你手上?等我去看看再说。” 33 |   悟空转身想跑,如来佛眼疾手快,反手一扑,把悟空推出西天门外,又把手的五指分别化作金、木、水、火、土五座联山,给这座联山起名叫“五行山”,将悟空牢牢压在山下。天上的各位神仙和阿傩、伽叶一个个合掌称好。 34 |   玉帝见如来佛祖镇压了孙悟空,心里十分高兴,立即传令设下“安天大会”感谢佛祖。不一会,各路神仙都被请来了,玉帝又命令打开太玄宫、洞阳玉馆,请如来佛坐在七宝灵台上,各路神仙纷纷送来贺礼,如来佛命阿傩、伽叶将礼物一一收下。 35 |   就在众位神仙饮酒听歌的时候,巡查官回来报告∶“那个妖猴把头伸出来了!”佛祖一听,就从袖子里取出一张帖子,上面写着∶“、嘛、呢、叭、、”,叫阿傩、伽叶拿去贴在五行山顶的一块方石头上,那座山的缝立刻合住,孙悟空再也没有办法出来了。 36 |   如来佛祖回西天时,路过五行山,又发了慈悲心,叫来山神,让他和五方揭谛住在这座山上,监押悟空,并对他们说∶“如果他饿了,就给他吃些铁丸子,渴了,就把溶化的铜水给他喝。五百年以后,他刑期满了,自然会有人来救他。” 37 | 38 | 39 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/inputs/4.txt: -------------------------------------------------------------------------------- 1 | 第4回 五行山从师取真经 2 | 3 |   五百年以后,观音菩萨奉了如来佛的法旨,带着锦袈裟等五件宝贝,跟惠岸行者一块儿来到东土大唐,寻找去西天求取三藏真经的人。师徒二人在半空中驾着云,来到大唐京城长安的上空。这时已是贞观十三年。 4 |   这一天,正是唐太宗李世民命令高僧陈玄奘在化生寺设坛宣讲佛法的日子。陈玄奘是如来佛二弟子金蝉子转世的,观音暗中选定他为取经人,自己与惠岸行者变成了游方和尚,捧着袈裟等宝贝到皇宫门外,要求拜见唐太宗,给他献宝。 5 |   唐太宗一向喜欢佛经,立即叫他们上殿,问那些宝贝一共要多少钱。观音说∶“如来佛祖那儿有三藏真经,你如果派陈玄奘去西天求取真经,这些宝贝就送给你了。”说完,跟惠岸行者变成原来的样子,驾起云走了。太宗一见是观音菩萨,连忙带领满朝文武官员向天朝拜。 6 |   唐太宗十分高兴,和陈玄奘结成了兄弟,要他去西天取经,将护身袈裟等宝物送给了他,并将他的名字改为“唐三藏”。过了几天,三藏要出发了,唐太宗便率领文武百官一路送到长安城外,和三藏依依惜别。 7 |   唐三藏别名唐僧。他和两个仆人赶了两天路,来到法门寺,寺里的和尚赶忙出来迎接。晚上,和尚们坐在一起议论去西天取经的路途艰险,唐僧用手指着心口说∶“只要有坚定的信念,那么任何危险都算不了什么!”和尚们连声称赞。 8 |   第二天,唐僧主仆含泪辞别了和尚,又骑着马继续向西走去。不几天,就来到了大唐的边界河州,镇边总兵和本地的和尚道士把唐僧主仆接到福原寺休息。 9 |   第二天天还没亮,唐僧就把两个仆人叫了起来,三人借着月光赶路。走了十几里就开始上山了,道路起伏不平,杂草丛生,十分难走。他们只好一边拔草一边走。忽然一脚踏空,三人和马还一起摔进了深坑。主仆三人正在惊慌之时,忽然听见“抓起来!抓起来!”的叫喊声。 10 |   随着一阵狂风,出现了一群妖怪,抓住了主仆三人。唐僧偷偷看了看,上面坐着一个长相凶恶的魔王,那魔王一声令下,妖怪们把唐僧主仆绑了起来。这时一个小妖来报∶“熊山君和特处士到!” 11 |   魔王赶忙出去迎接,那两人称魔王为寅将军。寅将军打算用唐僧等人招待他的客人。熊山君说∶“今天,就选吃两个算了。”于是,寅将军把唐僧的两个仆人剖腹挖心,活活地吃掉了。唐僧差点被吓昏过去。 12 |   天快亮了,妖怪们都躲了起来。唐僧吓傻了,昏昏沉沉地睡着。忽然一个柱拐杖的老人慢慢向他走来,把手一挥,捆绑唐僧的绳子都断了,又向他吹一口气,唐僧醒了过来,连忙躬身施礼感谢老人,老人说∶“这个地方叫双叉岭,是个危险的地方。” 13 |   老人让唐僧拿上包袱,牵着马,把他领到大路上来。唐僧连忙拴好马,准备感谢,抬头一看,老人已乘着一只红顶白鹤飞走了,从空中掉下一张纸条,唐僧接过一看,才知老人就是太白金星,于是赶忙向空中不停地施礼。 14 |   唐僧骑着马,沿着山路往前走,走了半天,也不见一个人。他又渴又饿,想找点水喝。忽然看见前面有两只凶恶的老虎,张开了血盆大嘴,又往四周看看,发现身后是吐着红信的毒蛇,左边是有毒的虫子,右边又是些从未见过的野兽。唐僧被困在中间,急得不知如何是好,只好听天由命了。 15 |   就在这危急关头,野兽忽然都逃跑了。唐僧惊奇地四处观看,只见一个手拿钢叉,腰挂弓箭的大汉从山坡上走了过来。唐僧连忙跪下,合掌高叫∶“大王救命!”那大汉挽起唐僧说∶“我哪里是什么大王,只不过是一个猎户,叫刘伯钦。” 16 |   
刘伯钦请唐僧到家中作客,唐僧非常高兴,牵着马,来到了刘伯钦的家。第二天,唐僧要上路了,刘伯钦按照母亲的意思,带了几个人,拿着捕猎的工具,要送一送唐僧。走了半天,他们来到一座大山前。 17 |   他们走到半山腰,刘伯钦等人站住说∶“长老,前面就要到两界山了,山东边归大唐管,山西边是鞑靼的疆域,我们是不能过去的,您自己走吧,一路上可要多多小心啊!”唐僧只好和他们道别,忽听山脚下有人大喊∶“师父快过来,师父快过来!” 18 |   唐僧吓得胆战心惊。刘伯钦赶忙说∶“长老莫怕,听老人说,当年王莽造反的时候,这座山从天而降,山下还压着一个饿不死,冻不坏的神猴,刚才肯定是那个神猴在叫喊,长老不妨过去看看。” 19 |   这神猴正是当年被如来压在山下的孙悟空,他一见唐僧就喊道∶“师父快救我出去,我保护你到西天取经。几天前观音菩萨来劝过我,让我给您当徒弟。”唐僧听了非常高兴,可是又很发愁,没有办法把孙悟空救出来。 20 |   孙悟空说只要把山顶上如来佛的金字压帖拿掉就行了。唐僧拿掉了金字压帖后,按照悟空的要求和刘伯钦等人退到十里之外的地方等着。忽然一声天崩地裂般的巨响,五行山裂成两半,顿时飞沙走石,满天灰尘,让人睁不开眼睛。 21 |   等到唐僧睁开眼睛时,悟空已经跪在地上,给他叩头。唐僧见他赤身裸体,就从包袱里拿出一双鞋和一条裤子让他穿上。刘伯钦见唐僧收了徒弟,非常高兴,告别了唐僧师徒回家去了。悟空立刻收拾行李,和师父一道出发。 22 |   没过多久,师徒二人出了大唐边界。忽然从草丛中跳出一只大老虎。孙悟空赶忙放下行李,从耳朵中取出金箍棒,高兴地说∶“老孙已经五百多年没有用过这宝贝了,今天用它弄件衣服穿穿!”说完抡起金箍棒对着老虎狠命一击,老虎当场就死了。 23 |   唐僧见了,惊得连嘴都合不住。悟空拔了根毫毛,变成一把尖刀,剥了虎皮,做了条皮裙围在腰间,然后,恭恭敬敬地扶唐僧上马,师徒继续赶路。忽然一声口哨声,跳出六个强盗,要抢他们的马和行李。 24 |   悟空放下行李,笑着说∶“我原来也是做山大王的,把你们抢的金银珠宝分我一半吧!”强盗们一听,气得头发都竖了起来,拿着刀枪就往悟空头上砍,可是乒乒乓乓砍了七、八十下,也没伤着悟空半根毫毛。 25 |   悟空见他们打累了,高喊一声∶“该俺老孙玩玩了!”他取出金箍棒一个个打,六个强盗就变成了肉酱。唐僧见了很不高兴地说∶“他们虽然是强盗,但也不至于都要打死,你这样残忍,怎能去西天取经呢?阿弥陀佛。” 26 |   孙悟空最受不了别人的气,他听师父这样一说,压不住心中的怒火,高声说到∶“既然师父这样说,那我就不去西天取经了,你自己去吧!老孙我可要回花果山了!”说完纵身一跳,驾上筋斗云,往东飞去了,等到唐僧抬起头,已经看不见孙悟空了。 27 |   唐僧没有办法,只好把行李放在马背上,一手拄着锡杖,一手牵着马,慢慢地往西走去,不久,就见对面来了位老妇人,手里捧着一件衣服和一顶花帽。唐僧赶忙牵住马,双手合掌,让路给老妇人过。 28 |   那老妇人走到唐僧跟前说道∶“你从哪里来呀,怎么一个人在山中走呢?”唐僧就把悟空不听话的事告诉了老妇人,老妇人听后微微一笑,说∶“我送你一件衣服和一顶花帽,给你那不听话的徒弟穿上吧!” 29 |   唐僧苦笑着说∶“唉,徒弟已经走了!要这些还有什么用呢?”老妇人笑着说∶“别急,徒弟我会帮你找回来的。我这儿呀,还有一篇咒语,叫做紧箍咒,你要牢牢记在心里,你让你的徒弟穿上这衣服,戴上帽子,他如果再不听话,你就念咒,他就不敢不听了!” 30 |   唐僧学会了紧箍咒,低头拜谢老妇人。这时老妇人已经变成一道金光,向东飞去。唐僧抬头一看,原来是观音菩萨,赶忙跪下叩头,然后把衣帽收到包袱里,坐在路边,加紧背诵紧箍咒,直到背得滚瓜烂熟。 31 |   观音菩萨驾着祥云,没走多远,碰上了从东边走过来的孙悟空。原来悟空离开唐僧之后,在东海龙王那儿吃了顿饭,在龙王的苦苦劝告之下,已回心转意。观音菩萨让他赶快回到唐僧身边,悟空二话不说,告别观音菩萨去追赶唐僧了。 32 |   见到唐僧,悟空把去龙王那儿吃饭的事情说了一遍,又问∶“师父,你也饿了吧!我去化些斋饭来。”唐僧摇摇头说∶“不用了,包袱里还有些干粮,你给师父拿来吧!”悟空打开包袱,发现观音菩萨给的衣帽十分漂亮,便向唐僧讨取。 33 |   唐僧点头答应了。悟空高兴得抓耳挠腮,忙穿上了衣服,戴上了帽子。 34 |   唐僧要试试紧箍咒灵不灵,就小声念了起来,悟空马上痛得满地打滚,拼命去扯那帽子,可帽子却像长在肉里一样,取也取不下来,扯也扯不烂。 35 |   悟空发现头痛是因为师父在念咒,嘴里喊着“师父别念了!别念了!” 36 |   暗地里取出金箍棒,想把唐僧一棒打死。唐僧见了,紧箍咒越念越快,悟空的头越来越疼,没有办法,只好跪地求饶∶“师父,是我错了,徒儿知道错了,不要再念咒了吧!” 37 |   唐僧见他已经知错,就住了口。悟空的头马上就不痛了,他想这咒语一定是观音菩萨教的,就吵着要去南海找观音菩萨算帐。唐僧说∶“她既然能教我这紧箍咒,肯定也会念咒!”悟空猛吸了一口气,不再胡来,发誓以后一定听师父的话,保护唐僧西天取经。 38 | 39 | 40 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/files/inputs/5.txt: -------------------------------------------------------------------------------- 1 | 第5回 应愁涧白龙马收缰 2 | 3 |   师徒俩继续向西行。一天,他们来到蛇盘山鹰愁涧,突然从涧中钻出一条白龙来,张着爪子向唐僧冲了过来,悟空慌忙背起唐僧,驾云就跑。那龙追不上悟空,就张开大嘴把白马给吞吃了,然后又钻进深涧了。 4 |   悟空把师父安顿在一个安全地方。转身回到涧边去牵马拿行李,发现马不见了,想着一定是被白龙吃了,就在涧边破口大骂∶“烂泥鳅,把我的马吐出来!”白龙听见有人骂他,气得眼睛都红了,跳出水面,张牙舞爪地向悟空扑来。 5 |   那龙根本不是悟空的对手,几个回合就累得浑身是汗,转身就逃到水里。悟空又骂了一阵,不见白龙出来,便使了个翻江倒海的本领,把这个清澈的涧水弄得泥沙翻滚,浑浊不清。 6 |   那龙在水里待不住了,就硬着头皮跳出来,和悟空打了起来,双方战了几十个回合,白龙实在打不过,摇身变成一条水蛇,钻进了草丛。悟空赶忙追过去,可是连蛇的影子都找不到,气得他把牙咬得乱响。 7 |   于是,悟空念咒语,把山神和土地都叫了出来,问他们白龙从哪里来的。山神和土地小心翼翼地说∶“这白龙是观音菩萨放在这儿等候你们,和你们一起取经的。”悟空一听,气得要找观音菩萨讲道理。 8 |   观音菩萨料事如神,驾云来到鹰愁涧,告诉悟空∶“这白龙原是西海龙王的儿子,犯了死罪,是我讲了个人情,让他给唐僧当马骑的。如果没这匹龙马,你们就去不了西天。”悟空急着说∶“他藏在水里不出来,怎么办? 
9 |   ” 10 |   观音菩萨面带微笑,朝涧中喊了一声,那白龙立刻变成一个英俊的公子,来到菩萨跟前。菩萨说∶“小白龙,你师父已经来了!”边说边解下白龙脖上的夜明珠,用柳条蘸些甘露向他身上一挥,吹了口仙气,喊声“变”,白龙就变成了一匹白马。 11 |   观音菩萨叫悟空牵着白马去见唐僧,自己回南海落伽山去了。悟空牵着马,兴高采烈地来到唐僧跟前。唐僧一边用手摸着马头,一边说∶“好马,好马,你是在哪儿找的马?”悟空把经过说了一遍,唐僧连忙向南磕头,感谢观音菩萨。 12 | 13 | 14 | -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/graph_visual_with_html.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | from pyvis.network import Network 3 | 4 | # Load the GraphML file 5 | G = nx.read_graphml('./output/graph_chunk_entity_relation.graphml') 6 | 7 | # Create a Pyvis network 8 | net = Network(notebook=True) 9 | 10 | # Convert NetworkX graph to Pyvis network 11 | net.from_nx(G) 12 | 13 | # Save and display the network 14 | net.show('knowledge_graph.html') -------------------------------------------------------------------------------- /LightRAG/nangeAGICode1201/graph_visual_with_neo4j.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) 4 | import json 5 | from lightrag.utils import xml_to_json 6 | from neo4j import GraphDatabase 7 | 8 | # Constants 9 | WORKING_DIR = "./output" 10 | BATCH_SIZE_NODES = 500 11 | BATCH_SIZE_EDGES = 100 12 | 13 | 14 | # 数据库连接相关参数配置 15 | NEO4J_URI="neo4j+s://9ea2ff6d.databases.neo4j.io" 16 | NEO4J_USERNAME="neo4j" 17 | NEO4J_PASSWORD="0YW9l5A11PBRhfzGmWEeLzJMsPTuPRNh8tVfgQkn0qI" 18 | NEO4J_DATABASE="neo4j" 19 | 20 | 21 | def convert_xml_to_json(xml_path, output_path): 22 | """Converts XML file to JSON and saves the output.""" 23 | if not os.path.exists(xml_path): 24 | print(f"Error: File not found - {xml_path}") 25 | return None 26 | 27 | json_data = xml_to_json(xml_path) 28 | if json_data: 29 | with open(output_path, 'w', encoding='utf-8') as f: 30 | json.dump(json_data, f, ensure_ascii=False, indent=2) 31 | print(f"JSON file created: {output_path}") 32 | return json_data 33 | else: 34 | print("Failed to create JSON data") 35 | return None 36 | 37 | def process_in_batches(tx, query, data, batch_size): 38 | """Process data in batches and execute the given query.""" 39 | for i in range(0, len(data), batch_size): 40 | batch = data[i:i + batch_size] 41 | tx.run(query, {"nodes": batch} if "nodes" in query else {"edges": batch}) 42 | 43 | def main(): 44 | # Paths 45 | xml_file = os.path.join(WORKING_DIR, 'graph_chunk_entity_relation.graphml') 46 | json_file = os.path.join(WORKING_DIR, 'graph_data.json') 47 | 48 | # Convert XML to JSON 49 | json_data = convert_xml_to_json(xml_file, json_file) 50 | if json_data is None: 51 | return 52 | 53 | # Load nodes and edges 54 | nodes = json_data.get('nodes', []) 55 | edges = json_data.get('edges', []) 56 | 57 | # Neo4j queries 58 | create_nodes_query = """ 59 | UNWIND $nodes AS node 60 | MERGE (e:Entity {id: node.id}) 61 | SET e.entity_type = node.entity_type, 62 | e.description = node.description, 63 | e.source_id = node.source_id, 64 | e.displayName = node.id 65 | REMOVE e:Entity 66 | WITH e, node 67 | CALL apoc.create.addLabels(e, [node.entity_type]) YIELD node AS labeledNode 68 | RETURN count(*) 69 | """ 70 | 71 | create_edges_query = """ 72 | UNWIND $edges AS edge 73 | MATCH (source {id: edge.source}) 74 | MATCH (target {id: edge.target}) 75 | WITH source, target, edge, 76 | CASE 77 | WHEN edge.keywords CONTAINS 'lead' THEN 'lead' 78 | WHEN edge.keywords CONTAINS 'participate' THEN 'participate' 
79 |     WHEN edge.keywords CONTAINS 'uses' THEN 'uses'
80 |     WHEN edge.keywords CONTAINS 'located' THEN 'located'
81 |     WHEN edge.keywords CONTAINS 'occurs' THEN 'occurs'
82 |     ELSE REPLACE(SPLIT(edge.keywords, ',')[0], '\"', '')
83 |     END AS relType
84 |     CALL apoc.create.relationship(source, relType, {
85 |         weight: edge.weight,
86 |         description: edge.description,
87 |         keywords: edge.keywords,
88 |         source_id: edge.source_id
89 |     }, target) YIELD rel
90 |     RETURN count(*)
91 |     """
92 |
93 |     set_displayname_and_labels_query = """
94 |     MATCH (n)
95 |     SET n.displayName = n.id
96 |     WITH n
97 |     CALL apoc.create.setLabels(n, [n.entity_type]) YIELD node
98 |     RETURN count(*)
99 |     """
100 |
101 |     # Create a Neo4j driver
102 |     driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
103 |
104 |     try:
105 |         # Execute queries in batches
106 |         with driver.session() as session:
107 |             # Insert nodes in batches
108 |             session.execute_write(process_in_batches, create_nodes_query, nodes, BATCH_SIZE_NODES)
109 |
110 |             # Insert edges in batches
111 |             session.execute_write(process_in_batches, create_edges_query, edges, BATCH_SIZE_EDGES)
112 |
113 |             # Set displayName and labels
114 |             session.run(set_displayname_and_labels_query)
115 |
116 |     except Exception as e:
117 |         print(f"Error occurred: {e}")
118 |
119 |     finally:
120 |         driver.close()
121 |
122 | if __name__ == "__main__":
123 |     main()
--------------------------------------------------------------------------------
/LightRAG/nangeAGICode1201/insertTest.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
4 | from pathlib import Path
5 | import numpy as np
6 | from lightrag import LightRAG, QueryParam
7 | from lightrag.utils import EmbeddingFunc
8 | from lightrag.llm import openai_embedding
9 | from lightrag.llm import openai_complete_if_cache
10 |
11 |
12 |
13 |
14 | # GPT model configuration; adjust these values to your own environment
15 | OPENAI_API_BASE = "https://api.wlai.vip/v1"
16 | OPENAI_CHAT_API_KEY = "sk-gdXw028PJ6JtobnBLeQiArQLnmqahdXUQSjIbyFgAhJdHb1Q"
17 | OPENAI_CHAT_MODEL = "gpt-4o-mini"
18 | OPENAI_EMBEDDING_MODEL = "text-embedding-3-small"
19 |
20 |
21 | # Create the working directory if it does not already exist
22 | WORKING_DIR = "./output"
23 | if not os.path.exists(WORKING_DIR):
24 |     os.mkdir(WORKING_DIR)
25 |
26 |
27 | # Custom chat model for an OpenAI-compatible API
28 | async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
29 |     return await openai_complete_if_cache(
30 |         model=OPENAI_CHAT_MODEL,
31 |         prompt=prompt,
32 |         system_prompt=system_prompt,
33 |         history_messages=history_messages,
34 |         api_key=OPENAI_CHAT_API_KEY,
35 |         base_url=OPENAI_API_BASE,
36 |         **kwargs
37 |     )
38 |
39 |
40 | # Custom embedding model for an OpenAI-compatible API
41 | async def embedding_func(texts: list[str]) -> np.ndarray:
42 |     return await openai_embedding(
43 |         texts,
44 |         model=OPENAI_EMBEDDING_MODEL,
45 |         api_key=OPENAI_CHAT_API_KEY,
46 |         base_url=OPENAI_API_BASE,
47 |     )
48 |
49 |
50 | # Instantiate the RAG object
51 | rag = LightRAG(
52 |     working_dir=WORKING_DIR,
53 |     llm_model_func=llm_model_func,
54 |     embedding_func=EmbeddingFunc(
55 |         embedding_dim=1536,
56 |         max_token_size=8192,
57 |         func=embedding_func
58 |     )
59 | )
60 |
61 |
62 | # Build the index; TXT, DOCX, PPTX, CSV, PDF, and other formats are supported
63 |
64 | # 1. Initial batch index build: chapters 1-5
65 | contents = []
66 | current_dir = Path(__file__).parent
67 | # Directory containing the input files
68 | files_dir = current_dir / "files/inputs"
69 | for file_path in files_dir.glob("*.txt"):
70 |     with open(file_path, "r", encoding="utf-8") as file:
71 |         content = file.read()
72 |         contents.append(content)
73 | rag.insert(contents)
74 | print("Initial index build complete")
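# --- Added note (illustrative; not part of the original script). ---
# Re-running this script with overlapping inputs is safe: before indexing,
# LightRAG keys each document by an md5 hash (compute_mdhash_id in
# lightrag/utils.py), and JsonKVStorage.filter_keys() drops ids that are
# already stored, so only genuinely new documents are extracted. A sketch
# of that check, assuming the pipeline's "doc-" id prefix and "full_docs"
# namespace:
#
#   from lightrag.utils import compute_mdhash_id
#
#   doc_id = compute_mdhash_id(content.strip(), prefix="doc-")
#   # If doc_id already exists in kv_store_full_docs.json, re-inserting
#   # the same document is a no-op.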
75 |
76 |
77 | # # 2. Incremental batch index build: chapters 6-7
78 | # contents = []
79 | # current_dir = Path(__file__).parent
80 | # # Directory containing the input files
81 | # files_dir = current_dir / "files/incremental_inputs"
82 | # for file_path in files_dir.glob("*.txt"):
83 | #     with open(file_path, "r", encoding="utf-8") as file:
84 | #         content = file.read()
85 | #         contents.append(content)
86 | #
87 | # rag.insert(contents)
88 | # print("Incremental index build complete")
89 |
90 |
91 | # # 3. Incremental batch index build: chapters 8-9 (PDF versions)
92 | # import textract
93 | # contents = []
94 | # current_dir = Path(__file__).parent
95 | # # Directory containing the input files
96 | # files_dir = current_dir / "files/incremental_inputs"
97 | # for file_path in files_dir.glob("*.pdf"):
98 | #     text_content = textract.process(str(file_path))
99 | #     contents.append(text_content.decode('utf-8'))
100 | #
101 | # rag.insert(contents)
102 | # print("Incremental index build complete")
103 |
104 |
105 |
--------------------------------------------------------------------------------
/LightRAG/nangeAGICode1201/queryTest.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
4 | import numpy as np
5 | from lightrag import LightRAG, QueryParam
6 | from lightrag.utils import EmbeddingFunc
7 | from lightrag.llm import openai_embedding
8 | from lightrag.llm import openai_complete_if_cache
9 |
10 |
11 |
12 |
13 | # GPT model configuration; adjust these values to your own environment
14 | OPENAI_API_BASE = "https://api.wlai.vip/v1"
15 | OPENAI_CHAT_API_KEY = "sk-gdXw028PJ6JtobnBLeQiArQLnmqahdXUQSjIbyFgAhJdHb1Q"
16 | OPENAI_CHAT_MODEL = "gpt-4o-mini"
17 | OPENAI_EMBEDDING_MODEL = "text-embedding-3-small"
18 |
19 |
20 | # Create the working directory if it does not already exist
21 | WORKING_DIR = "./output"
22 | if not os.path.exists(WORKING_DIR):
23 |     os.mkdir(WORKING_DIR)
24 |
25 |
26 | # Custom chat model for an OpenAI-compatible API
27 | async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
28 |     return await openai_complete_if_cache(
29 |         model=OPENAI_CHAT_MODEL,
30 |         prompt=prompt,
31 |         system_prompt=system_prompt,
32 |         history_messages=history_messages,
33 |         api_key=OPENAI_CHAT_API_KEY,
34 |         base_url=OPENAI_API_BASE,
35 |         **kwargs
36 |     )
37 |
38 |
39 | # Custom embedding model for an OpenAI-compatible API
40 | async def embedding_func(texts: list[str]) -> np.ndarray:
41 |     return await openai_embedding(
42 |         texts,
43 |         model=OPENAI_EMBEDDING_MODEL,
44 |         api_key=OPENAI_CHAT_API_KEY,
45 |         base_url=OPENAI_API_BASE,
46 |     )
47 |
48 |
49 | # Instantiate the RAG object
50 | rag = LightRAG(
51 |     working_dir=WORKING_DIR,
52 |     llm_model_func=llm_model_func,
53 |     embedding_func=EmbeddingFunc(
54 |         embedding_dim=1536,
55 |         max_token_size=8192,
56 |         func=embedding_func
57 |     )
58 | )
59 |
60 |
61 | # # naive retrieval
62 | # print(
63 | #     rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="naive"))
64 | # )
65 |
66 | # # local retrieval
67 | # print(
68 | #     rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="local"))
69 | # )
70 |
71 | # global retrieval
72 | print(
73 |     rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="global"))
74 | )
75 |
76 | # # hybrid retrieval
77 | # print(
78 | #     rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="hybrid"))
79 | # )
80 |
81 |
82 |
--------------------------------------------------------------------------------
/LightRAG/nangeAGICode1201/textract-16.5.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/LightRAG/nangeAGICode1201/textract-16.5.zip
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_0.py:
--------------------------------------------------------------------------------
import os
import json
import glob
import argparse


def extract_unique_contexts(input_directory, output_directory):
    os.makedirs(output_directory, exist_ok=True)

    jsonl_files = glob.glob(os.path.join(input_directory, "*.jsonl"))
    print(f"Found {len(jsonl_files)} JSONL files.")

    for file_path in jsonl_files:
        filename = os.path.basename(file_path)
        name, ext = os.path.splitext(filename)
        output_filename = f"{name}_unique_contexts.json"
        output_path = os.path.join(output_directory, output_filename)

        unique_contexts_dict = {}

        print(f"Processing file: {filename}")

        try:
            with open(file_path, "r", encoding="utf-8") as infile:
                for line_number, line in enumerate(infile, start=1):
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        json_obj = json.loads(line)
                        context = json_obj.get("context")
                        if context and context not in unique_contexts_dict:
                            unique_contexts_dict[context] = None
                    except json.JSONDecodeError as e:
                        print(
                            f"JSON decoding error in file {filename} at line {line_number}: {e}"
                        )
        except FileNotFoundError:
            print(f"File not found: {filename}")
            continue
        except Exception as e:
            print(f"An error occurred while processing file {filename}: {e}")
            continue

        unique_contexts_list = list(unique_contexts_dict.keys())
        print(
            f"There are {len(unique_contexts_list)} unique `context` entries in the file {filename}."
        )
        try:
            with open(output_path, "w", encoding="utf-8") as outfile:
                json.dump(unique_contexts_list, outfile, ensure_ascii=False, indent=4)
            print(f"Unique `context` entries have been saved to: {output_filename}")
        except Exception as e:
            print(f"An error occurred while saving to the file {output_filename}: {e}")

    print("All files have been processed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input_dir", type=str, default="../datasets")
    parser.add_argument(
        "-o", "--output_dir", type=str, default="../datasets/unique_contexts"
    )

    args = parser.parse_args()

    extract_unique_contexts(args.input_dir, args.output_dir)
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_1.py:
--------------------------------------------------------------------------------
import os
import json
import time

from lightrag import LightRAG


def insert_text(rag, file_path):
    with open(file_path, mode="r") as f:
        unique_contexts = json.load(f)

    retries = 0
    max_retries = 3
    while retries < max_retries:
        try:
            rag.insert(unique_contexts)
            break
        except Exception as e:
            retries += 1
            print(f"Insertion failed, retrying ({retries}/{max_retries}), error: {e}")
            time.sleep(10)
    if retries == max_retries:
        print("Insertion failed after exceeding the maximum number of retries")


cls = "agriculture"
WORKING_DIR = f"../{cls}"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(working_dir=WORKING_DIR)

insert_text(rag, f"../datasets/unique_contexts/{cls}_unique_contexts.json")
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_1_openai_compatible.py:
--------------------------------------------------------------------------------
import os
import json
import time
import numpy as np

from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc
from lightrag.llm import openai_complete_if_cache, openai_embedding


## For Upstage API
# please check if embedding_dim=4096 in lightrag.py and llm.py in the lightrag directory
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs,
    )


async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
    )


## /For Upstage API


def insert_text(rag, file_path):
    with open(file_path, mode="r") as f:
        unique_contexts = json.load(f)

    retries = 0
    max_retries = 3
    while retries < max_retries:
        try:
            rag.insert(unique_contexts)
            break
        except Exception as e:
            retries += 1
            print(f"Insertion failed, retrying ({retries}/{max_retries}), error: {e}")
            time.sleep(10)
    if retries == max_retries:
        print("Insertion failed after exceeding the maximum number of retries")


cls = "mix"
WORKING_DIR = f"../{cls}"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=4096, max_token_size=8192, func=embedding_func
    ),
)

insert_text(rag, f"../datasets/unique_contexts/{cls}_unique_contexts.json")
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_2.py:
--------------------------------------------------------------------------------
import json
from openai import OpenAI
from transformers import GPT2Tokenizer


def openai_complete_if_cache(
    model="gpt-4o", prompt=None, system_prompt=None, history_messages=[], **kwargs
) -> str:
    openai_client = OpenAI()

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    response = openai_client.chat.completions.create(
        model=model, messages=messages, **kwargs
    )
    return response.choices[0].message.content


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")


def get_summary(context, tot_tokens=2000):
    tokens = tokenizer.tokenize(context)
    half_tokens = tot_tokens // 2

    # Take a half_tokens-wide window starting 1000 tokens into the document,
    # and a matching window ending 1000 tokens before its end. (The original
    # stop index of "1000" produced an empty end slice on long documents.)
    start_tokens = tokens[1000 : 1000 + half_tokens]
    end_tokens = tokens[-(1000 + half_tokens) : -1000]

    summary_tokens = start_tokens + end_tokens
    summary = tokenizer.convert_tokens_to_string(summary_tokens)

    return summary


clses = ["agriculture"]
for cls in clses:
    with open(f"../datasets/unique_contexts/{cls}_unique_contexts.json", mode="r") as f:
        unique_contexts = json.load(f)

    summaries = [get_summary(context) for context in unique_contexts]

    total_description = "\n\n".join(summaries)

    prompt = f"""
    Given the following description of a dataset:

    {total_description}

    Please identify 5 potential users who would engage with this dataset. For each user, list 5 tasks they would perform with this dataset. Then, for each (user, task) combination, generate 5 questions that require a high-level understanding of the entire dataset.

    Output the results in the following structure:
    - User 1: [user description]
        - Task 1: [task description]
            - Question 1:
            - Question 2:
            - Question 3:
            - Question 4:
            - Question 5:
        - Task 2: [task description]
            ...
        - Task 5: [task description]
    - User 2: [user description]
        ...
    - User 5: [user description]
        ...
    """

    result = openai_complete_if_cache(model="gpt-4o", prompt=prompt)

    file_path = f"../datasets/questions/{cls}_questions.txt"
    with open(file_path, "w") as file:
        file.write(result)

    print(f"{cls}_questions written to {file_path}")
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_3.py:
--------------------------------------------------------------------------------
import re
import json
import asyncio
from lightrag import LightRAG, QueryParam
from tqdm import tqdm


def extract_queries(file_path):
    with open(file_path, "r") as f:
        data = f.read()

    data = data.replace("**", "")

    queries = re.findall(r"- Question \d+: (.+)", data)

    return queries


async def process_query(query_text, rag_instance, query_param):
    try:
        result = await rag_instance.aquery(query_text, param=query_param)
        return {"query": query_text, "result": result}, None
    except Exception as e:
        return None, {"query": query_text, "error": str(e)}


def always_get_an_event_loop() -> asyncio.AbstractEventLoop:
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop


def run_queries_and_save_to_json(
    queries, rag_instance, query_param, output_file, error_file
):
    loop = always_get_an_event_loop()

    with open(output_file, "a", encoding="utf-8") as result_file, open(
        error_file, "a", encoding="utf-8"
    ) as err_file:
        result_file.write("[\n")
        first_entry = True

        for query_text in tqdm(queries, desc="Processing queries", unit="query"):
            result, error = loop.run_until_complete(
                process_query(query_text, rag_instance, query_param)
            )

            if result:
                if not first_entry:
                    result_file.write(",\n")
                json.dump(result, result_file, ensure_ascii=False, indent=4)
                first_entry = False
            elif error:
                json.dump(error, err_file, ensure_ascii=False, indent=4)
                err_file.write("\n")

        result_file.write("\n]")


if __name__ == "__main__":
    cls = "agriculture"
    mode = "hybrid"
    WORKING_DIR = f"../{cls}"

    rag = LightRAG(working_dir=WORKING_DIR)
    query_param = QueryParam(mode=mode)

    queries = extract_queries(f"../datasets/questions/{cls}_questions.txt")
    run_queries_and_save_to_json(
        queries, rag, query_param, f"{cls}_result.json", f"{cls}_errors.json"
    )
--------------------------------------------------------------------------------
/LightRAG/reproduce/Step_3_openai_compatible.py:
--------------------------------------------------------------------------------
import os
import re
import json
import asyncio
from lightrag import LightRAG, QueryParam
from tqdm import tqdm
from lightrag.llm import openai_complete_if_cache, openai_embedding
from lightrag.utils import EmbeddingFunc
import numpy as np


## For Upstage API
# please check if embedding_dim=4096 in lightrag.py and llm.py in the lightrag directory
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs,
    )


async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
    )


## /For Upstage API


def extract_queries(file_path):
    with open(file_path, "r") as f:
        data = f.read()

    data = data.replace("**", "")

    queries = re.findall(r"- Question \d+: (.+)", data)

    return queries


async def process_query(query_text, rag_instance, query_param):
    try:
        result = await rag_instance.aquery(query_text, param=query_param)
        return {"query": query_text, "result": result}, None
    except Exception as e:
        return None, {"query": query_text, "error": str(e)}


def always_get_an_event_loop() -> asyncio.AbstractEventLoop:
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop


def run_queries_and_save_to_json(
    queries, rag_instance, query_param, output_file, error_file
):
    loop = always_get_an_event_loop()

    with open(output_file, "a", encoding="utf-8") as result_file, open(
        error_file, "a", encoding="utf-8"
    ) as err_file:
        result_file.write("[\n")
        first_entry = True

        for query_text in tqdm(queries, desc="Processing queries", unit="query"):
            result, error = loop.run_until_complete(
                process_query(query_text, rag_instance, query_param)
            )

            if result:
                if not first_entry:
                    result_file.write(",\n")
                json.dump(result, result_file, ensure_ascii=False, indent=4)
                first_entry = False
            elif error:
                json.dump(error, err_file, ensure_ascii=False, indent=4)
                err_file.write("\n")

        result_file.write("\n]")


if __name__ == "__main__":
    cls = "mix"
    mode = "hybrid"
    WORKING_DIR = f"../{cls}"

    # A single, fully configured instantiation (the original created a default
    # LightRAG first and immediately overwrote it, wasting a redundant setup)
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=4096, max_token_size=8192, func=embedding_func
        ),
    )
    query_param = QueryParam(mode=mode)

    base_dir = "../datasets/questions"
    queries = extract_queries(f"{base_dir}/{cls}_questions.txt")
    run_queries_and_save_to_json(
        queries, rag, query_param, f"{base_dir}/result.json", f"{base_dir}/errors.json"
    )
--------------------------------------------------------------------------------
/LightRAG/requirements.txt:
--------------------------------------------------------------------------------
accelerate
aioboto3
aiohttp

# database packages
graspologic
hnswlib
nano-vectordb
neo4j
networkx
ollama
openai
oracledb
pyvis
tenacity
# lmdeploy[all]

# LLM packages
tiktoken
torch
transformers
xxhash
--------------------------------------------------------------------------------
/LightRAG/setup.py:
--------------------------------------------------------------------------------
import setuptools
from pathlib import Path


# Reading the long description from README.md
def read_long_description():
    try:
        return Path("README.md").read_text(encoding="utf-8")
    except FileNotFoundError:
        return "A description of LightRAG is currently unavailable."


# Retrieving metadata from __init__.py
def retrieve_metadata():
    vars2find = ["__author__", "__version__", "__url__"]
    vars2readme = {}
    try:
        with open("./lightrag/__init__.py") as f:
            for line in f.readlines():
                for v in vars2find:
                    if line.startswith(v):
                        line = (
                            line.replace(" ", "")
                            .replace('"', "")
                            .replace("'", "")
                            .strip()
                        )
                        vars2readme[v] = line.split("=")[1]
    except FileNotFoundError:
        raise FileNotFoundError("Metadata file './lightrag/__init__.py' not found.")

    # Checking if all required variables are found
    missing_vars = [v for v in vars2find if v not in vars2readme]
    if missing_vars:
        raise ValueError(
            f"Missing required metadata variables in __init__.py: {missing_vars}"
        )

    return vars2readme


# Reading dependencies from requirements.txt
def read_requirements():
    deps = []
    try:
        with open("./requirements.txt") as f:
            deps = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        print(
            "Warning: 'requirements.txt' not found. No dependencies will be installed."
        )
    return deps


metadata = retrieve_metadata()
long_description = read_long_description()
requirements = read_requirements()

setuptools.setup(
    name="lightrag-hku",
    url=metadata["__url__"],
    version=metadata["__version__"],
    author=metadata["__author__"],
    description="LightRAG: Simple and Fast Retrieval-Augmented Generation",
    long_description=long_description,
    long_description_content_type="text/markdown",
    packages=setuptools.find_packages(
        exclude=("tests*", "docs*")
    ),  # Automatically find packages
    classifiers=[
        "Development Status :: 4 - Beta",
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Intended Audience :: Developers",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    python_requires=">=3.9",
    install_requires=requirements,
    include_package_data=True,  # Includes non-code files from MANIFEST.in
    project_urls={  # Additional project metadata
        "Documentation": metadata.get("__url__", ""),
        "Source": metadata.get("__url__", ""),
        "Tracker": f"{metadata.get('__url__', '')}/issues"
        if metadata.get("__url__")
        else "",
    },
)
--------------------------------------------------------------------------------
/LightRAG/test.py:
--------------------------------------------------------------------------------
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete
#########
# Uncomment the below two lines if running in a jupyter notebook to handle the async nature of rag.insert()
# import nest_asyncio
# nest_asyncio.apply()
#########

WORKING_DIR = "./dickens"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete,  # Use gpt_4o_mini_complete LLM model
    # llm_model_func=gpt_4o_complete  # Optionally, use a stronger model
)

with open("./dickens/book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="naive"))
)
# Perform local search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid"))
)
--------------------------------------------------------------------------------
/LightRAG/test_neo4j.py:
--------------------------------------------------------------------------------
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete


#########
# Uncomment the below two lines if running in a jupyter notebook to handle the async nature of rag.insert()
# import nest_asyncio
# nest_asyncio.apply()
#########

WORKING_DIR = "./local_neo4jWorkDir"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete,  # Use gpt_4o_mini_complete LLM model
    kg="Neo4JStorage",
    log_level="INFO",
    # llm_model_func=gpt_4o_complete  # Optionally, use a stronger model
)

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="naive"))
)

# Perform local search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid"))
)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 1. Project Introduction
## 1.1 About This Session
This project runs a head-to-head comparison of GraphRAG and LightRAG.
**Contents of the November 29 session:**
(1) Setting up LightRAG and GraphRAG
(2) LightRAG: index construction, plus naive, local, global, and hybrid retrieval
(3) GraphRAG: index construction, plus local, global, and DRIFT retrieval
(4) Comparing LightRAG and GraphRAG on indexing and retrieval wall-clock time, number of model requests, token cost, and retrieval quality
Related videos:
(1) LightRAG vs. GraphRAG evaluation: a full comparison across index construction and local, global, and hybrid retrieval, covering LLM request counts, token usage, monetary cost, and retrieval quality
https://www.bilibili.com/video/BV1CmzEYcEnS/?vd_source=30acb5331e4f5739ebbad50f7cc6b949
https://youtu.be/-O5ATdQcefo
(2) GraphRAG ships a major update: incremental index updates and the new DRIFT graph-reasoning search. A hands-on walkthrough of the new features with source-code analysis, supporting GPT, Chinese LLMs, and local LLMs
https://www.bilibili.com/video/BV1AADaYfE2T/
https://youtu.be/7WFMd8U8C7E


**Contents of the December 1 session:**
(1) LightRAG batch and incremental index construction for TXT, PDF, DOCX, PPTX, CSV, and more
(2) Visualizing the knowledge graph produced by LightRAG, both as HTML and in a Neo4j database
(3) Quality comparison of the LightRAG and GraphRAG knowledge graphs, visualized in Neo4j
Related videos:
https://www.bilibili.com/video/BV1xXzoYGEJw/
https://youtu.be/-IiyHQQdn34



## 1.2 About GraphRAG
### (1) Overview
As of this writing (Nov 29), GraphRAG is at version 0.5.0, which introduces incremental index updates and DRIFT graph-reasoning search (a hybrid of local and global search).
GitHub: https://github.com/microsoft/graphrag
Docs: https://microsoft.github.io/graphrag/query/overview/
### (2) What it is
GraphRAG is an innovative retrieval-augmented generation (RAG) approach from Microsoft Research, designed to improve LLM reasoning over complex information and private datasets.
It is an LLM-based technique that builds knowledge graphs and summaries from unstructured text documents and uses them to improve RAG over private datasets.
With GraphRAG, users can extract information from large volumes of unstructured text, build a global overview, and still drill down into localized detail.
The system has the LLM construct a knowledge graph, identifying the entities and relations in the documents, and exploits the semantic structure of the data to answer complex queries.
On Nov 26, Microsoft announced LazyGraphRAG, the next iteration of GraphRAG.
Its headline feature is extremely low cost: indexing at roughly 0.1% of the cost of current GraphRAG. LazyGraphRAG also introduces a new hybrid data-retrieval method that substantially improves the accuracy and efficiency of generated results. It will be open-sourced soon and folded into the GitHub GraphRAG repository.
Announcement: https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/
### (3) Supported retrieval modes
**Local Search**
Local search combines structured data from the knowledge graph with unstructured data from the input documents, augmenting the LLM context with relevant entity information at query time.
It is well suited to questions that require knowledge of specific entities mentioned in the input documents.
**Global Search**
The structure of the LLM-generated knowledge graph captures the shape (and the themes) of the entire dataset.
The private dataset can therefore be organized into meaningful semantic clusters that are summarized in advance; the LLM uses these clusters to summarize the relevant themes when answering a user query.
**DRIFT Search**
DRIFT search widens the starting point of local search by incorporating community information, retrieving a broader range of facts and producing more detailed answers.
DRIFT extends GraphRAG's query engine with a finer-grained, hierarchical approach to local search, handling complex queries that do not match predefined templates.
### (4) Related GraphRAG videos
GraphRAG (knowledge-graph-based RAG) playlist:
https://space.bilibili.com/509246474
https://www.youtube.com/playlist?list=PL8zBXedQ0uflcyD-_ghqmjpv1lAdRdKJ-

## 1.3 About LightRAG
### (1) Overview
As of this writing (Nov 29), LightRAG has not published a release version.
GitHub: https://github.com/HKUDS/LightRAG
### (2) What it is
LightRAG is a simple and efficient retrieval-augmented generation (RAG) system developed at the University of Hong Kong.
It integrates graph structure into the text indexing and retrieval process. This framework uses a dual-level retrieval system, combining low-level and high-level knowledge discovery for more comprehensive information retrieval.
Integrating graph structure with vector representations makes retrieval of relevant entities and their relations efficient, significantly improving response time while preserving contextual relevance.
An incremental update algorithm further ensures that new data is integrated promptly, keeping the system effective and responsive in fast-changing data environments.
### (3) Supported retrieval modes
**Naive Search**
Naive mode is the simplest strategy: it ranks results by vector similarity to the input query and returns the closest matches, with no additional optimization or processing.
**Local Search**
Local mode retrieves within a local context only: it focuses on the specific domain or data subset relevant to the current input rather than considering global data.
**Global Search**
Global mode searches the entire knowledge base for the information most relevant to the query, without restricting itself to the current context or a local region.
**Hybrid Search**
Hybrid mode combines the strengths of local and global search, weighing both local context and global information to improve the relevance and coverage of answers.
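All four modes are selected through the `mode` field of `QueryParam`, exactly as the demo scripts in this repository do. A minimal sketch (it assumes a `rag` instance already configured with chat and embedding functions, as in `nangeAGICode1201/queryTest.py`):

```python
from lightrag import QueryParam

# Run the same question through every retrieval strategy for comparison
for mode in ["naive", "local", "global", "hybrid"]:
    print(rag.query("这个故事的核心主题是什么?", param=QueryParam(mode=mode)))
```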

# 2. Prerequisites
## 2.1 Development environment: Anaconda, PyCharm
Anaconda provides Python virtual environments; download and install the package for your OS from the official site.
PyCharm provides the IDE; the Community Edition installer from the official site is sufficient.
For installation, see: [LLM application development basics] Setting up Anaconda + PyCharm
https://www.bilibili.com/video/BV1q9HxeEEtT/?vd_source=30acb5331e4f5739ebbad50f7cc6b949
https://youtu.be/myVgyitFzrA

## 2.2 LLM configuration
(1) Using GPT models (via a proxy)
(2) Using non-GPT (Chinese) models (installing and deploying OneAPI, creating channels and tokens)
(3) Using local open-source models (installing Ollama, starting it, pulling models)
See the following video:
An LLM integration approach: one codebase that simultaneously supports GPT models, Chinese models (Tongyi Qianwen, ERNIE Bot, Baidu Qianfan, iFlytek Spark, etc.), and local open-source models (Ollama)
https://www.bilibili.com/video/BV12PCmYZEDt/?vd_source=30acb5331e4f5739ebbad50f7cc6b949
https://youtu.be/CgZsdK43tcY


# 3. Project Setup
## 3.1 Download the source
Download the project files from GitHub or Gitee:
https://github.com/NanGePlus/LightRAGTest
https://gitee.com/NanGePlus/LightRAGTest

## 3.2 Create the project
Create a project in PyCharm and configure a Python virtual environment for it.
Project name: LightRAGTest

## 3.3 Copy the code into the project
Copy the downloaded files directly into the new project directory.

## 3.4 Install dependencies
Run the following in a terminal:
cd LightRAG
pip install -e .
cd GraphRAG
pip install graphrag==0.5.0


# 4. Tests from the November 29 session
**Test corpus**: the first nine chapters of Journey to the West (西游记) in vernacular Chinese, in book.txt
**Models**: OpenAI via a proxy; gpt-4o-mini as the chat model and text-embedding-3-small as the embedding model throughout
**Other**: the same laptop (MacBook Pro 2017), network, and Python environment for both systems
## 4.1 LightRAG tests
### (1) Build the index
Open a terminal and run:
cd LightRAG/nangeAGICode
python test.py
**Note**: before running, enable the index-building code block and comment out the retrieval blocks.
### (2) Run the retrieval tests one by one
Run:
cd LightRAG/nangeAGICode
python test.py
**Note**: before running, comment out the index-building block and uncomment the relevant retrieval block.
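The toggle is just comment markers around the two blocks in test.py; roughly like this (a sketch, assuming nangeAGICode/test.py follows the same layout as LightRAG/test.py and reads its corpus from the local input folder):

```python
# Indexing run: keep the insert active and leave the queries commented out
with open("./input/book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

# Retrieval run: comment the insert out and enable one query at a time
# print(rag.query("这个故事的核心主题是什么?", param=QueryParam(mode="naive")))
```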
## 4.2 GraphRAG tests
### (1) Build the index
Open a terminal and run:
cd GraphRAG
graphrag index --root ./
### (2) Run the queries one by one
graphrag query --root ./ --method local --query "这个故事的核心主题是什么?"
graphrag query --root ./ --method global --query "这个故事的核心主题是什么?"
graphrag query --root ./ --method drift --query "这个故事的核心主题是什么?"

## 4.3 Comparison results at a glance



# 5. Tests from the December 1 session
**Test corpus**: the first nine chapters of Journey to the West in vernacular Chinese
**Models**: OpenAI via a proxy; gpt-4o as the chat model and text-embedding-3-small as the embedding model throughout
**Other**: the same laptop (MacBook Pro 2017), network, and Python environment for both systems

## 5.1 LightRAG index-construction tests
### (1) Install the textract dependency
Installing with pip install textract fails: the package metadata uses a version-constraint form (<=0.29.*) that current pip and setuptools no longer accept.
Workaround: download the package source, fix the constraint, and install locally.
https://pypi.org/project/textract/1.6.5/#description
cd textract-1.6.5
pip install .
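Concretely, the invalid specifier needs to become a plain version bound before running pip install . In textract 1.6.5 the offending pin appears to be the extract-msg requirement (verify the exact package and file in your downloaded copy):

```
extract-msg<=0.29.*   # rejected by current pip
extract-msg<=0.29     # accepted
```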

### (2) Create a Neo4j database instance
A cloud instance is recommended for testing:
https://console-preview.neo4j.io/tools/query
Register, log in, and create a new instance.

### (3) Incremental index construction and knowledge-graph visualization
Build the index:
cd LightRAG/nangeAGICode1201
python insertTest.py
python queryTest.py
After each build completes, clear the database first, then run the visualization scripts below.
Adjust the parameters to your environment before running:
python graph_visual_with_html.py
python graph_visual_with_neo4j.py
Query tests in the database:
MATCH (n:`PERSON`)
WHERE n.displayName CONTAINS '唐僧'
RETURN n LIMIT 25;

MATCH (n:`PERSON`)
WHERE n.displayName CONTAINS '八戒'
RETURN n LIMIT 25;

MATCH (n:`PERSON`)
WHERE n.displayName CONTAINS '沙和尚'
RETURN n LIMIT 25;

**Clearing the data**
MATCH (n)
CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 25000 ROWS;

### (4) Comparing the knowledge graphs produced by LightRAG and GraphRAG
Visualize the GraphRAG knowledge graph:
cd GraphRAG/utils
python graph_visual_with_neo4j.py
Before running, adjust the script to your environment: change the folder path to the directory holding the incremental output data, e.g.
GRAPHRAG_FOLDER="/Users/janetjiang/Desktop/agi_code/LightRAGTest/GraphRAG/output"
Query tests in the database:
MATCH (n:`__Entity__`)
WHERE n.name CONTAINS '唐僧'
RETURN n LIMIT 25;

MATCH (n:`__Entity__`)
WHERE n.name CONTAINS '八戒'
RETURN n LIMIT 25;

MATCH (n:`__Entity__`)
WHERE n.name CONTAINS '沙和尚'
RETURN n LIMIT 25;

**Clearing the data**
MATCH (n)
CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 25000 ROWS;
--------------------------------------------------------------------------------
/img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NanGePlus/LightRAGTest/669ebb417f2e43eaea9bcb51dca26f80a9addff4/img.png
--------------------------------------------------------------------------------
/results.md:
--------------------------------------------------------------------------------

# Local Retrieval: LightRAG vs. GraphRAG
## (1) LightRAG
In this story, the core theme revolves around **the struggle of courage and wisdom**, together with the determination and persistence to resist evil. The main plotline follows the protagonist Wukong and his companions as they face demons and the powers of Heaven, showing their courage, their wisdom, and their struggle against evil, including conflicts between the individual and authority and the pursuit of justice.

### Courage and strength
Wukong, a brave character with extraordinary abilities, embodies the importance of courage through his actions and resolve. In the fight against the Black Wind Demon, Wukong not only wields his weapon, the golden cudgel, but also shows he can fight with strategy and wit. This stands out in his clash with the Black Wind Demon: through shape-shifting, Wukong exposes the demon's true identity, reinforcing his heroic image in the battle against his enemies.

### Companionship and teamwork
The relationships among Wukong, Tang Seng, Zhu Bajie, and the others also reflect the importance of cooperation. Each character makes a distinctive contribution along the journey, underscoring the key role teamwork plays in the face of challenges. These relationships deepen the plot and display the human values of friendship and mutual support.

### The conflict between justice and morality
The story also explores the complex relationship between justice and morality. Bodhisattva Lingji, for instance, intervenes not only to help Wukong fight the demons but also to embody Heaven's judgment of evil deeds. Through such contrasts the story challenges our understanding of good and evil: every character has motives and causes behind them, adding complexity and depth to the plot.

### Summary
Overall, the core theme lies not only in the individual struggle of courage and wisdom, but also in portraying complex human nature through the clash of justice and morality, and in how close companionship helps the characters overcome their trials. The interweaving of these themes gives the story its rich substance and lasting resonance.

## (2) GraphRAG
### The story's core themes
The core themes unfold around faith, adventure, loyalty, and redemption, tracing a process of personal growth and self-improvement. The characters of Journey to the West, such as Tang Seng, Wukong, and Zhu Bajie, each endure trials on the pilgrimage for the scriptures, embodying an exploration of cultivation and morality.

#### Faith and mission
As the story's leader, Tang Seng carries the mission of fetching the scriptures, symbolizing the power of pursuing truth and faith. Despite many hardships on the journey, such as being captured by demons and bound to the wind-calming pillar, he holds to his convictions and strives to guide his disciples toward self-redemption through the Dharma. His gratitude toward the white horse likewise reflects his respect and thankfulness for the pilgrimage [Data: Entities (172); Relationships (253); Sources (20)].

#### Adventure and challenge
The story is rich in adventure, above all Wukong's battles with demons, through which the characters grow amid constant fighting and adversity. Wukong in particular, with his ingenuity and fighting prowess, becomes Tang Seng's key protector, marking his shift from self-centeredness to selfless devotion [Data: Entities (31, 196, 218); Relationships (131, 300); Sources (20)].

#### Loyalty and redemption
The relationship between Wukong and Zhu Bajie also merits attention. Wukong's loyalty and his protection of Tang Seng in adversity highlight the importance of friendship and sacrifice, while Bajie's transformation from demon to disciple of Tang Seng marks personal repentance and redemption, expressing the hope that every character can find where they belong through their trials [Data: Entities (31, 199, 218); Relationships (0, 8); Sources (20)].

### Conclusion
In sum, this is not merely a fantastical adventure but a profound reflection on faith, courage, and friendship. How each character responds to inner and outer challenges reveals the human struggle and growth in the pursuit of ideals, and expresses the core Buddhist ideas of cultivation and enlightenment [Data: Entities (0, 172, 199); Relationships (60, 250); Sources (20)].

# Global Retrieval: LightRAG vs. GraphRAG
## (1) LightRAG
### Core theme analysis
The story's core theme unfolds around heroism and the struggle between good and evil, expressed through the complex relationships and conflicts among the characters. The principal figures, such as Wukong, Tang Seng, and the Black Wind Demon, each represent different positions and ethical outlooks that drive the plot forward.

#### Heroes and protection
Wukong, the central character, displays tenacious heroism. He bravely confronts the Black Wind Demon and the old demon while actively protecting Tang Seng, embodying loyalty and responsibility. The master-disciple relationship between Wukong and Tang Seng fully expresses the theme of protection and support: as Wukong's teacher, Tang Seng often shows concern and worry over Wukong's exploits, a feeling deeply rooted in their trust in and support of each other.

#### The struggle between good and evil
The Black Wind Demon and the old demon play the antagonists, constantly challenging the side of justice. The Black Wind Demon's theft of the cassock triggers Wukong's and Tang Seng's desire for redress. This contest between good and evil reveals both the light and dark sides of human nature and the importance of courage in the face of wickedness; through his battles with his foes, Wukong's growth and breakthroughs become visible.

#### Fantasy and divine intervention
Beyond adventure, the story also probes the relationship between humans and the divine. Deities such as Bodhisattva Lingji and Bodhisattva Guanyin intervene at critical moments to help Wukong resolve crises. Their presence stands for moral oversight and guidance, underscoring the depth and richness of the human spiritual world. This divine intervention lends mystery to the hero's journey while steering the themes toward higher ground, such as self-redemption and defiance of fate.

#### Conclusion
Through careful plotting and character development, the story builds a narrative of depth and moral instruction around heroism, the contest of good and evil, and moral reflection. Through Wukong's adventures, readers not only witness spectacular conflict but are prompted to consider how to choose and act under adversity, revealing the brighter side of human nature.

## (2) GraphRAG
## The story's core themes
The story unfolds around Sun Wukong's search for identity and his spirit of rebellion, expressing his longing for freedom and his challenge to authority. This rebellion is not only an individual's struggle but also reflects society's resistance to oppression, prompting reflection on human values and cultural heritage [Data: Reports (34, 20, 12, 25, 44)].

### The conflict between the individual and authority
The story digs into the contradiction between individual freedom and authority. Wukong's rebellion shows his pursuit of selfhood and his struggle between tradition and personal values [Data: Reports (35, 45, 14)]. This theme stresses the tension and conflict between the individual and the collective, reflecting a challenge to traditional social rules [Data: Reports (36, 33, 21)].

### The pursuit of immortality and the meaning of life
The pursuit of immortality is another central theme. Sun Wukong's experience not only reflects deep thinking about the meaning of life but is closely tied to the Daoist and Buddhist thought of traditional Chinese culture. This pursuit emphasizes the exploration of self-redemption and its importance to the plot [Data: Reports (34, 312, 20, 25)].

### Loyalty and sacrifice
Within the master-disciple relationship between Sun Wukong and Tang Seng lies the theme of loyalty and sacrifice, embodying their unity and mutual support against external threats. Tang Seng's compassion and wisdom not only guide Sun Wukong's growth but also reveal the deep connection between personal growth and collective support [Data: Reports (20, 12, 18)].

### Friendship and cooperation
Friendship and cooperation are likewise indispensable themes, particularly in the support Wukong receives from figures such as Mucha and Bodhisattva Lingji, underscoring the importance of solidarity and collaboration in adversity. Through these characters' interactions the story displays friendship and reflects the complexity of human relations and one's place in society [Data: Reports (39, 13)].

### Conclusion
Overall, the core themes span the longing for freedom, the conflict between individual and authority, the search for life's meaning, and a deep exploration of loyalty and sacrifice. These themes take concrete shape through Sun Wukong's journey and his interactions with other characters, forming a rich and profound cultural fabric [Data: Reports (35, 14, 30)].


# Hybrid/Drift Retrieval: LightRAG vs. GraphRAG
## (1) LightRAG
### Core theme analysis
The story's core theme unfolds around the pursuit of self-redemption and the effort to resist evil, expressed chiefly in the following respects.

#### 1. The struggle between justice and evil
In the story, Wukong, Tang Seng, and their companions symbolize justice, while demons such as the Black Wind Demon and the old demon represent the forces of evil. Wukong's confrontation with the Black Wind Demon shows the direct collision between justice and evil, a contest that is not only physical but also moral and spiritual. Wukong's strength and cunning let him repeatedly challenge and defeat these enemies, and their defeat signals that evil will ultimately be vanquished by justice.

#### 2. The master-disciple bond and loyalty
The relationship between Tang Seng and Wukong shows the depth of feeling between master and disciple. When Tang Seng is captured, Wukong takes on the duty of protecting and rescuing him, embodying a spirit of loyalty and sacrifice. This trust and sense of responsibility between them is a major driving force of the story, keeping them moving forward on the arduous journey.

#### 3. Growth and redemption
The characters undergo growth and change. Zhu Bajie's transformation from demon to disciple of Tang Seng represents personal redemption. Facing hardships and trials, each character gradually recognizes their own mission and worth, growth that benefits them individually and sustains the team's shared goal.

#### 4. Faith and morality
As a monk pursuing truth and wisdom, Tang Seng displays firm faith and moral conviction. His pursuit of the Buddhist scriptures reflects a lofty devotion to knowledge, faith, and morality. In the face of challenges, that faith gives him and his disciples spiritual support, helping them hold to the right path amid temptation and hardship.

### Summary
Overall, the story conveys the importance of pursuing justice, loyalty, redemption, and faith. Under the influence of these themes, the characters show courage and resolve in adversity, shaping their fates and their journey. By confronting evil and their own inner struggles, the story reveals the strength and light latent in every soul, projecting an uplifting force.


## (2) GraphRAG
### Core theme analysis of Journey to the West
In Journey to the West, Sun Wukong's development as protagonist unfolds around the core themes of rebellion and identity, and the pursuit of immortality. First, his fierce rebellious spirit is a major component of the story: his challenge to Heaven reflects both personal defiance and resistance to social oppression. His interactions with other characters, such as Tang Seng and Bodhisattva Guanyin, add layers to his rebellion and his questioning of authority, vividly showing the power of individual resistance in the face of oppression.

Identity is another important theme of Journey to the West. Wukong's shift from a free-roaming demon monkey to Tang Seng's disciple not only changes his life choices but pushes him to re-examine the meaning of his own existence. Within his relationship with Tang Seng, Wukong learns responsibility and moral feeling, completing a complex turn from rebellion to obedience that reflects deep thinking about the dynamic between individual and collective.

At the same time, the pursuit of immortality runs through Wukong's whole journey. His quest for eternal life is not merely about bodily continuation but a longing for the elevation of the soul. This theme grows ever clearer through his cultivation with Tang Seng, exposing the contradiction between the human fantasy of immortality and reality. In addition, the complementary relationships among Wukong, Tang Seng, Sha Wujing, and the others add a rich emotional dimension, stressing the importance of wisdom, responsibility, and cooperation and lifting the work's cultural meaning and social context to another level.

In sum, Journey to the West explores the core themes of individual rebellion, the search for identity, and the quest for immortality, revealing deep cultural and social meaning through its rich web of character relationships, which is what makes it a classic of literary history.

### On the core themes of Journey to the West
The core themes of Journey to the West involve an exploration of human nature, justice, and social values. In the story, Sun Wukong's heroic image is revealed through his conflict with the bandits, embodying the importance of resisting evil and pursuing justice. The bandits represent not only greed and temptation but also the dark side of human nature, spurring the hero on and becoming a key factor driving the plot.

Moreover, gold and jewels, as symbols of wealth, hint at obsession with material desire and the harm it brings. This exploration reflects people's thinking about moral ethics and the divergence of values within society. The master-disciple relationship between Sun Wukong and Tang Seng emphasizes learning and growth, showing how a person changes in the pursuit of an ideal, changes that deeply shape the interactions among the characters and the development of the plot as a whole.

Sun Wukong's theme of rebellion and the pursuit of freedom runs deep through the whole story: even as he challenges authority, he keeps searching inwardly for self-identity. In fighting enemies such as the Yellow Wind Demon, Wukong displays responsibility and the protection of the weak, not only a journey of personal growth but a reflection on the cultural backdrop of the struggle between good and evil.

Together these elements compose the rich, layered themes of Journey to the West, making it not just an adventure but a literary work of real intellectual depth, probing major questions of human nature, morality, desire, and growth. Through its different characters and their interactions, the story reveals Chinese culture's understanding of justice and evil, and of one's role and self-worth in society.

### Unpacking the story's core theme
The core theme of Journey to the West revolves around the struggle between rebellion and authority, epitomized by Sun Wukong's defiant spirit. In his dealings with the authorities of Heaven, above all his relationships with the Jade Emperor and Taishang Laojun, he displays dissatisfaction with traditional authority along with a craving for immortality. The story explores not only an individual's pursuit of freedom and identity, but also a society-level challenge to authority and traditional belief.

Born from stone, endowed with great powers and an instinct for freedom, Sun Wukong carries this rebellious temperament through his entire journey. Although he eventually becomes Tang Seng's disciple and shoulders the responsibility of fetching the scriptures, the questioning of authority never leaves his inner life. In the conflict at Lingxiao Hall, Wukong's irreverence conveys his unshaken confidence in his own power. The Jade Emperor's helpless compromise exposes the fragility behind absolute power, while Taishang Laojun, as mediator, tries to balance this power struggle with Daoist philosophy.

Furthermore, the pursuit of immortality is not only the embodiment of Wukong's personal goal but, against the wider cultural backdrop, a deep meditation on the meaning of life. Through this theme, Journey to the West reveals a philosophy that looks beyond life and death, and the individual's contradictory position between the pursuit of freedom and authority. Sun Wukong's experience speaks to everyone's search for recognition and selfhood in society, reflecting profound ideas and values in Chinese culture. This thematic core forms the narrative skeleton of Journey to the West, making it a classic portrait of humanity and society whose influence extends across China and the world.


### Exploring the story's core theme
This story unfolds around the Handsome Monkey King's journey and his relationship with ordinary laborers. Its core theme explores the harmonious relationship between humans and nature, shifts in identity and status, and the contrast between common laborers and heroic figures. The Monkey King's landing on the shore symbolizes a change of identity on returning from adventure, while the laborers' activities, catching fish and hunting swans, not only show their interaction with nature but also reflect their attitudes toward life and their values.

In the story, the Monkey King is a complex, many-sided figure: endowed with superhuman abilities yet closely connected to mortals. This dual identity humanizes him and lets readers feel the difficulties and challenges ordinary people face in life. Through its portrait of the laborers on the shore, the story expresses respect for honest work, highlighting the recognition and respect Chinese culture accords to different occupational roles.

Going further, the story explores the meaning of survival: labor is not only a means of maintaining a livelihood but, at a deeper level, an expression of the pursuit of life and freedom. Witnessing all of this deepens the Monkey King's understanding of life and helps him better adapt to his surroundings on the journeys ahead.

Overall, the story unfolds around human survival, changing identity, and the tight bonds among people, nature, and society, with real philosophical depth. Through vivid description, the author leads readers to reflect on the nature and value of life, and on the relationships between people and the spirit of common striving in a world full of challenges. Conveying this outlook adds rich layers and depth to the story and embodies its core theme.

### Core theme analysis of the Journey to the West story
Journey to the West is a major work of classical Chinese literature, with deep cultural and philosophical content. The story unfolds around Tang Seng's pilgrimage for the scriptures and the complex interactions among the various characters; its core themes span multiple levels, including personal redemption, the exploration of identity, and the master-disciple relationship.

First, Zhu Wuneng's change of identity is a clear symbol of redemption. His turn from demon to Tang Seng's disciple reflects how a person remakes and grows inwardly when confronted with social expectations and responsibility. This process is not only his personal transformation but also mirrors the hope for change and redemption.

Second, Sun Wukong's power struggle with the celestial court dramatizes the individual's confrontation with authority. His rebellious acts challenge the Jade Emperor's rule, showing how an oppressed individual seeks to establish self-worth and identity. This defiance carries not only the meaning of resistance but, at a deeper level, probes the complexity of power structures.
The master-disciple relationship is another important theme of Journey to the West. Tang Seng, as guide, shapes Sun Wukong's development; their mutual dependence and friction present a deep exploration of trust, leadership, and free will. Through Tang Seng's leadership and Wukong's resistance, readers can see the human struggle and coping that come with pressure and challenge.

Finally, the story's cultural backdrop also deepens its themes. By exploring fine qualities such as kindness and benevolence, combined with the interactions among the characters, the whole narrative becomes a profound reflection on human relationships. Together these elements enrich the literary value and social significance of Journey to the West.
--------------------------------------------------------------------------------