├── .gitignore ├── logos ├── chroma.png ├── meta.png ├── milvus.png ├── qdrant.png ├── chainlit.png ├── elastic.png ├── pinecone.png ├── weaviate.png ├── intel-labs.png └── opensearch.png ├── images └── chainlit-haystack.png ├── .github └── workflows │ └── publish_integrations.yml ├── integrations ├── text2speech.md ├── fastrag.md ├── mastodon-fetcher.md ├── chroma-documentstore.md ├── azure-translator.md ├── veracity.md ├── chainlit.md ├── document-threshold.md ├── entailment-checker.md ├── opensearch-document-store.md ├── readmedocs-fetcher.md ├── faiss-document-store.md ├── lemmatize.md ├── qdrant-document-store.md ├── elasticsearch-document-store.md ├── pinecone-document-store.md ├── milvus-document-store.md ├── weaviate-document-store.md ├── basic-agent-memory.md └── newspaper3k.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /logos/chroma.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/chroma.png -------------------------------------------------------------------------------- /logos/meta.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/meta.png -------------------------------------------------------------------------------- /logos/milvus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/milvus.png -------------------------------------------------------------------------------- /logos/qdrant.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/qdrant.png -------------------------------------------------------------------------------- /logos/chainlit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/chainlit.png -------------------------------------------------------------------------------- /logos/elastic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/elastic.png -------------------------------------------------------------------------------- /logos/pinecone.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/pinecone.png -------------------------------------------------------------------------------- /logos/weaviate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/weaviate.png -------------------------------------------------------------------------------- /logos/intel-labs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/intel-labs.png -------------------------------------------------------------------------------- /logos/opensearch.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/opensearch.png
--------------------------------------------------------------------------------
/images/chainlit-haystack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/images/chainlit-haystack.png
--------------------------------------------------------------------------------
/.github/workflows/publish_integrations.yml:
--------------------------------------------------------------------------------
1 | name: Publish integrations on Haystack Home
2 | 
3 | on:
4 |   workflow_dispatch:
5 |   push:
6 |     branches:
7 |       - main
8 | 
9 | 
10 | jobs:
11 |   publish-integrations:
12 |     runs-on: ubuntu-latest
13 | 
14 |     steps:
15 |       - name: trigger-hook
16 |         run: |
17 |           curl -X POST ${{ secrets.VERCEL_DEPLOY_HOOK }}
--------------------------------------------------------------------------------
/integrations/text2speech.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: AnswerToSpeech & DocumentToSpeech
4 | description: Convert Haystack Answers and Documents to audio files
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack-text2speech/
12 | repo: https://github.com/deepset-ai/haystack-extras/tree/main/nodes/text2speech
13 | type: Custom Node
14 | report_issue: https://github.com/deepset-ai/haystack-extras/issues
15 | ---
16 | 
17 | The `farm-haystack-text2speech` package contains two Nodes that allow you to convert Haystack `Answers` and `Documents` into audio files: `AnswerToSpeech` and `DocumentToSpeech`.
18 | 
19 | ## Installation
20 | 
21 | For Debian-based systems, first install the additional system dependencies:
22 | ```bash
23 | sudo apt-get install libsndfile1 ffmpeg
24 | ```
25 | 
26 | Install the package:
27 | ```bash
28 | pip install farm-haystack-text2speech
29 | ```
30 | 
31 | ## Usage
32 | 
33 | For a full example of how to use the `AnswerToSpeech` Node, try our "[Make Your QA Pipelines Talk](https://haystack.deepset.ai/tutorials/17_audio)" tutorial.
34 | 
35 | For example, in a simple Extractive QA Pipeline:
36 | 
37 | ```python
38 | from pathlib import Path
39 | 
40 | from haystack import Pipeline
41 | from haystack.nodes import BM25Retriever, FARMReader
42 | from text2speech import AnswerToSpeech
43 | 
44 | # document_store is an existing DocumentStore that already holds your indexed Documents
45 | retriever = BM25Retriever(document_store=document_store)
46 | reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
47 | answer2speech = AnswerToSpeech(
48 |     model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_answers")
49 | )
50 | 
51 | audio_pipeline = Pipeline()
52 | audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
53 | audio_pipeline.add_node(reader, name="Reader", inputs=["Retriever"])
54 | audio_pipeline.add_node(answer2speech, name="AnswerToSpeech", inputs=["Reader"])
55 | ```
--------------------------------------------------------------------------------
/integrations/fastrag.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: fastRAG
4 | description: A research framework designed to facilitate the building of retrieval augmented generative pipelines.
5 | authors:
6 |   - name: Intel Labs
7 |     socials:
8 |       github: IntelLabs
9 | pypi:
10 | repo: https://github.com/IntelLabs/fastRAG
11 | type: Custom Node
12 | report_issue: https://github.com/IntelLabs/fastRAG/issues
13 | logo: /logos/intel-labs.png
14 | ---
15 | 
16 | fast**RAG** is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible, through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive toolset for exploring and advancing the field of retrieval augmented generation.
17 | 
18 | It includes custom nodes such as:
19 | - Image Generators
20 | - Knowledge Graph Creator
21 | - Document Shapers
22 | - Reader with FiD implementation
23 | - Efficient document vector store (PLAID)
24 | - Benchmarking scripts
25 | 
26 | ## Installation
27 | 
28 | Preliminary requirements:
29 | 
30 | - Python 3.8+
31 | - PyTorch
32 | 
33 | Clone the repository, then, in a new virtual environment, run:
34 | 
35 | ```bash
36 | pip install .
37 | ```
38 | 
39 | There are various optional dependencies, based on usage:
40 | 
41 | ```bash
42 | # Additional engines/components
43 | pip install .[faiss-cpu]           # CPU-based Faiss
44 | pip install .[faiss-gpu]           # GPU-based Faiss
45 | pip install .[qdrant]              # Qdrant support
46 | pip install libs/colbert           # ColBERT/PLAID indexing engine
47 | pip install .[image-generation]    # Stable diffusion library
48 | pip install .[knowledge_graph]     # spacy and KG libraries
49 | 
50 | # REST API + UI
51 | pip install .[ui]
52 | 
53 | # Benchmarking
54 | pip install .[benchmark]
55 | 
56 | # Dev tools
57 | pip install .[dev]
58 | ```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Haystack Integrations
2 | 
3 | This repository is an index of Haystack integrations that can be used with a Haystack Pipeline or Agent.
4 | 
5 | These integrations are maintained by their respective owners or authors. You can browse them on the [Haystack Integrations](https://haystack.deepset.ai/integrations) page, where you will find information on the author(s), installation, and usage of each tool.
6 | 
7 | ## What are Haystack Integrations?
8 | 
9 | Haystack Integrations are Custom Nodes, DocumentStores, or Agent Tools, provided either as external packages or as additional technologies that can be used with Haystack. Some integrations are maintained by the deepset team; others are community contributions owned by the authors of the integration.
10 | 
11 | ## Looking for prompts?
12 | 
13 | Prompts for the `PromptNode` and `Agent` can be found on our [Prompt Hub](https://prompthub.deepset.ai).
14 | To contribute a prompt, follow the instructions in the [`prompthub`](https://github.com/deepset-ai/prompthub) repo.
15 | 
16 | ## How to contribute
17 | 
18 | To contribute, open a PR that adds an `.md` file to the `integrations/` directory.
A few things to include in the file 👇
19 | The frontmatter has to include the following:
20 | ```
21 | ---
22 | name: Name of your integration (required)
23 | description: A short description (this will appear on the front page element of your integration on the website) (required)
24 | authors:
25 |   - name: Name of Author 1 (required)
26 |     socials:
27 |       github: include if desired
28 |       twitter: include if desired
29 |   - name: Name of Author 2
30 |     socials:
31 |       github: include if desired
32 |       twitter: include if desired
33 | pypi: url of pypi package if exists
34 | repo: url of GitHub repo if exists
35 | type: Custom Node OR Document Store OR Agent Tool (required)
36 | report_issue: url to where people can report an issue with the integration
37 | ---
38 | ```
39 | Note that there should be at least one of the `pypi` or `repo` fields for us to merge the integration.
40 | 
41 | Then, please add as much information and instructions about your integration as possible in the body of your `.md` file.
42 | 
43 | Open a Pull Request, and congrats: if all goes well, you will see your integration on the integrations page in no time 🥳
--------------------------------------------------------------------------------
/integrations/mastodon-fetcher.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Mastodon Fetcher
4 | description: A custom component to fetch a Mastodon username's latest posts
5 | authors:
6 |   - name: Tuana Çelik
7 |     socials:
8 |       github: tuanacelik
9 |       twitter: tuanacelik
10 |       linkedin: tuanacelik
11 | pypi: https://pypi.org/project/mastodon-fetcher-haystack/
12 | repo: https://github.com/tuanacelik/mastodon-fetcher-haystack
13 | type: Custom Node
14 | report_issue: https://github.com/tuanacelik/mastodon-fetcher-haystack/issues
15 | ---
16 | 
17 | The `MastodonFetcher` is a simple custom component that fetches the `last_k_posts` of a given Mastodon username.
18 | 
19 | This component expects `query` to be a complete Mastodon username, for example "tuana@sigmoid.social". If the provided username is correct and public, `MastodonFetcher` will return a list of `Document` objects where the contents are the user's latest posts.
20 | 
21 | ## Installation
22 | 
23 | Run `pip install mastodon-fetcher-haystack` to install the latest available release.
24 | 
25 | ## Usage
26 | 
27 | Because the component returns a list of Documents, it can be used at the same step where a Retriever would normally be used. For example, use it in a Retrieval Augmented Generative (RAG) pipeline as follows:
28 | 
29 | ```python
30 | from haystack import Pipeline
31 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser
32 | from haystack.utils import print_answers
33 | from mastodon_fetcher_haystack.mastodon_fetcher import MastodonFetcher
34 | 
35 | mastodon_fetcher = MastodonFetcher()
36 | 
37 | prompt_template = PromptTemplate(prompt="Given the following Mastodon posts stream, create a short summary of the topics the account posts about. Mastodon posts stream: {join(documents)};\n Answer:",
38 |                                  output_parser=AnswerParser())
39 | prompt_node = PromptNode(default_prompt_template=prompt_template, model_name_or_path="text-davinci-003", api_key="YOUR_OPENAI_API_KEY")
40 | 
41 | pipe = Pipeline()
42 | pipe.add_node(component=mastodon_fetcher, name="MastodonFetcher", inputs=["Query"])
43 | pipe.add_node(component=prompt_node, name="PromptNode", inputs=["MastodonFetcher"])
44 | result = pipe.run(query="tuana@sigmoid.social", params={"MastodonFetcher": {"last_k_posts": 3}})
45 | ```
46 | 
47 | ## Limitations
48 | 1. The way this component is set up is very particular about how it expects usernames. Make sure you provide the full username, e.g.: `username@instance`
49 | 2. By default, the Mastodon API allows requesting up to 40 posts.
--------------------------------------------------------------------------------
/integrations/chroma-documentstore.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Chroma Document Store
4 | description: A Document Store for storing documents in and retrieving them from Chroma - built for Haystack 2.0
5 | authors:
6 |   - name: Massimiliano Pippi
7 |     socials:
8 |       github: masci
9 | pypi: https://pypi.org/project/chroma-store
10 | repo: https://github.com/masci/chroma-haystack
11 | type: Document Store
12 | report_issue: https://github.com/masci/chroma-haystack/issues
13 | logo: /logos/chroma.png
14 | ---
15 | # Chroma Document Store for Haystack
16 | 
17 | [![PyPI - Version](https://img.shields.io/pypi/v/chroma-haystack.svg)](https://pypi.org/project/chroma-haystack)
18 | [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/chroma-haystack.svg)](https://pypi.org/project/chroma-haystack)
19 | [![test](https://github.com/masci/chroma-haystack/actions/workflows/test.yml/badge.svg)](https://github.com/masci/chroma-haystack/actions/workflows/test.yml)
20 | 
21 | -----
22 | 
23 | **Table of Contents**
24 | 
25 | - [Chroma Document Store for Haystack](#chroma-document-store-for-haystack)
26 | - [Installation](#installation)
27 | - [Examples](#examples)
28 | - [License](#license)
29 | 
30 | ## Installation
31 | 
32 | ```console
33 | pip install chroma-haystack
34 | ```
35 | ## Usage
36 | Once installed, you can start using your Chroma database with Haystack 2.0 by initializing it:
37 | 
38 | ```python
39 | from chroma_haystack import ChromaDocumentStore
40 | 
41 | # Chroma is used in-memory so we use the same instances in the two pipelines below
42 | document_store = ChromaDocumentStore()
43 | ```
44 | 
45 | ### Writing Documents to ChromaDocumentStore
46 | To write documents to `ChromaDocumentStore`, create an indexing pipeline.
47 | 
48 | ```python
49 | from haystack.preview import Pipeline
50 | from haystack.preview.components.file_converters import TextFileToDocument
51 | from haystack.preview.components.writers import DocumentWriter
52 | 
53 | indexing = Pipeline()
54 | indexing.add_component("converter", TextFileToDocument())
55 | indexing.add_component("writer", DocumentWriter(document_store))
56 | indexing.connect("converter", "writer")
57 | indexing.run({"converter": {"paths": file_paths}})  # file_paths: a list of text files to index
58 | ```
59 | 
60 | ## Examples
61 | You can find a code example showing how to use the Document Store and the Retriever under the `example/` folder of this repo or in [this Colab](https://colab.research.google.com/drive/1YpDetI8BRbObPDEVdfqUcwhEX9UUXP-m?usp=sharing).
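62 | 
63 | Below is a minimal query sketch to pair with the indexing pipeline above. The import path and the retriever's `queries`/`top_k` inputs are assumptions based on the example in this repo; check the `example/` folder for the current API before relying on them.
64 | 
65 | ```python
66 | from haystack.preview import Pipeline
67 | from chroma_haystack.retriever import ChromaQueryRetriever  # assumed import path, see the repo's example
68 | 
69 | # Reuse the same document_store instance the indexing pipeline wrote to
70 | retriever = ChromaQueryRetriever(document_store)
71 | 
72 | querying = Pipeline()
73 | querying.add_component("retriever", retriever)
74 | results = querying.run({"retriever": {"queries": ["What is in the example files?"], "top_k": 3}})
75 | ```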
76 | 
77 | ## License
78 | 
79 | `chroma-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.
--------------------------------------------------------------------------------
/integrations/azure-translator.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Azure Translate Nodes
4 | description: TranslateAnswer and TranslateQuery Nodes that use the Azure Translate endpoint
5 | authors:
6 |   - name: recrudesce (Russ)
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 | pypi: https://pypi.org/project/haystack-translate-node/
11 | repo: https://github.com/recrudesce/haystack_translate_node
12 | type: Custom Node
13 | report_issue: https://github.com/recrudesce/haystack_translate_node/issues
14 | ---
15 | 
16 | This package allows you to use the Azure translation endpoints to separately translate the query and the answer. It's useful for scenarios where your dataset is in a different language from the one you expect the user query to be in. This way, you can translate the user query into your dataset's language, and translate the answer back into the user's language.
17 | 
18 | ## Installation
19 | Run `pip install haystack-translate-node` to install the latest available version.
20 | 
21 | ## Usage
22 | Include in your pipeline as follows:
23 | 
24 | ```python
25 | from haystack import Pipeline
26 | from haystack_translate_node import TranslateAnswer, TranslateQuery
27 | 
28 | translate_query = TranslateQuery(api_key="", location="", azure_translate_endpoint="", base_lang="en")
29 | translate_answer = TranslateAnswer(api_key="", location="", azure_translate_endpoint="", base_lang="en")
30 | 
31 | # retriever and prompt_node are nodes you have already defined elsewhere
32 | pipel = Pipeline()
33 | pipel.add_node(component=translate_query, name="TranslateQuery", inputs=["Query"])
34 | pipel.add_node(component=retriever, name="Retriever", inputs=["TranslateQuery"])
35 | pipel.add_node(component=prompt_node, name="prompt_node", inputs=["Retriever"])
36 | pipel.add_node(component=translate_answer, name="TranslateAnswer", inputs=["prompt_node"])
37 | ```
38 | 
39 | `location`, `azure_translate_endpoint`, and `base_lang` are optional, and default to `uksouth`, `https://api.cognitive.microsofttranslator.com/`, and `en`, respectively.
40 | 
41 | TranslateQuery will determine the language of the query and assign it to the `in_lang` JSON value.
42 | 
43 | TranslateQuery will take the original query, in any language, and assign it to the `in_query` JSON value.
44 | 
45 | TranslateQuery will overwrite the original `query` JSON value with the query translated into the `base_lang`.
46 | 
47 | You can then query your `base_lang` corpus using the `query` value as normal with a standard Haystack Retriever node, which will place your results in `results`.
48 | 
49 | TranslateAnswer translates the `base_lang` result stored in `results` back into the language stored in `in_lang`, and subsequently stores it in the `out_answer` JSON value.
--------------------------------------------------------------------------------
/integrations/veracity.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Veracity Node
4 | description: A node to check the validity of an answer, based on the given context.
5 | authors:
6 |   - name: Xceron
7 |     socials:
8 |       github: Xceron
9 | repo: https://github.com/Xceron/haystack_veracity_node
10 | type: Custom Node
11 | report_issue: https://github.com/Xceron/haystack_veracity_node/issues
12 | ---
13 | 
14 | This Node checks whether the given input is correctly answered by the given context (as judged by the given LLM). One example usage is together with [Haystack Memory](https://github.com/rolandtannous/haystack-memory): after the memory is retrieved, the given model checks whether the output satisfies the question.
15 | 
16 | **Important**:
17 | The Node expects the context to be passed into `results`. If the previous node in the pipeline puts the text somewhere else, use a [Shaper](https://docs.haystack.deepset.ai/docs/shaper) to `rename` the argument to `results`.
18 | 
19 | ## Installation
20 | 
21 | Clone the repo, change into that directory, then run `pip install '.'`. This will install the package into your Python libraries.
22 | 
23 | ## Usage
24 | ### Example Usage with Haystack Memory
25 | ```py
26 | from haystack_veracity_node.node import VeracityNode
27 | from haystack_memory.memory import RedisMemoryRecallNode
28 | from haystack_memory.prompt_templates import memory_template
29 | from haystack import Pipeline
30 | from haystack.agents import Agent, Tool
31 | from haystack.nodes import PromptNode
32 | 
33 | # Create VeracityNode
34 | veracity_node = VeracityNode(model_name_or_path="gpt-3.5-turbo", api_key="YOUR_KEY")
35 | 
36 | # Create Memory
37 | redis_memory_node = RedisMemoryRecallNode(memory_id="agent_memory",
38 |                                           host="localhost",
39 |                                           port=6379,
40 |                                           db=0)
41 | 
42 | # Add them together in a pipeline
43 | memory_pipeline = Pipeline()
44 | memory_pipeline.add_node(component=redis_memory_node, name="MemoryTool", inputs=["Query"])
45 | memory_pipeline.add_node(component=veracity_node, name="VeracityNode", inputs=["MemoryTool"])
46 | 
47 | # Create an agent and add the pipeline as a tool
48 | prompt_node = PromptNode(model_name_or_path="text-davinci-003", api_key="YOUR_KEY", max_length=512,
49 |                          stop_words=["Observation:"])
50 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
51 | memory_tool = Tool(name="Memory",
52 |                    pipeline_or_node=memory_pipeline,
53 |                    description="Your memory. Always access this tool first to remember what you have learned.")
54 | 
55 | memory_agent.add_tool(memory_tool)
56 | ```
--------------------------------------------------------------------------------
/integrations/chainlit.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Chainlit Agent UI
4 | description: Visualise and debug your agent's intermediary steps!
5 | authors:
6 |   - name: Chainlit Team
7 |     socials:
8 |       github: Chainlit
9 |       twitter: chainlit_io
10 | pypi: https://pypi.org/project/chainlit/
11 | repo: https://github.com/Chainlit/chainlit
12 | type: Custom Node
13 | report_issue: https://github.com/Chainlit/chainlit/issues
14 | logo: /logos/chainlit.png
15 | ---
16 | 
17 | Chainlit is an open-source Python package that makes it incredibly fast to build, test and share LLM apps. Integrate the Chainlit API in your existing code to spawn a ChatGPT-like interface in minutes. With a simple line of code, you can leverage Chainlit to interact with your agent, visualise intermediary steps, debug them in an advanced prompt playground and share your app to collect human feedback.
More info in the [documentation](https://docs.chainlit.io/).
18 | 
19 | ![Chainlit screenshot](https://raw.githubusercontent.com/deepset-ai/haystack-integrations/main/images/chainlit-haystack.png)
20 | 
21 | ## Installation
22 | 
23 | ```bash
24 | pip install chainlit
25 | ```
26 | 
27 | ## Usage
28 | 
29 | Create a new Python file named `app.py` with the code below. This code adds the Chainlit callback handler to the Haystack callback manager. The callback handler is responsible for listening to the Agent’s intermediate steps and sending them to the UI.
30 | 
31 | ```python
32 | from haystack.agents.conversational import ConversationalAgent
33 | import chainlit as cl
34 | 
35 | ## Your Agent code: define conversational_agent_prompt_node, memory, agent_prompt and search_tool here
36 | 
37 | agent = ConversationalAgent(
38 |     prompt_node=conversational_agent_prompt_node,
39 |     memory=memory,
40 |     prompt_template=agent_prompt,
41 |     tools=[search_tool],
42 | )
43 | 
44 | cl.HaystackAgentCallbackHandler(agent)
45 | 
46 | @cl.on_message
47 | async def main(message: str):
48 |     response = await cl.make_async(agent.run)(message)
49 |     await cl.Message(author="Agent", content=response["answers"][0].answer).send()
50 | ```
51 | 
52 | To kick off your LLM app, open a terminal, navigate to the directory containing `app.py`, and run the following command:
53 | 
54 | ```bash
55 | chainlit run app.py
56 | ```
57 | 
58 | ## Example
59 | Check out this full example from [the cookbook](https://github.com/Chainlit/cookbook/tree/main/haystack).
60 | 
61 | ## About Chainlit
62 | Chainlit is an open-source Python package that makes it incredibly fast to build, test and share LLM apps. Integrate the Chainlit API in your existing code to spawn a ChatGPT-like interface in minutes!
63 | 
64 | ### Key features
65 | - Build LLM Apps fast: Integrate seamlessly with an existing code base or start from scratch in minutes
66 | - Visualize multi-step reasoning: Understand the intermediary steps that produced an output at a glance
67 | - Iterate on prompts: Deep dive into prompts in the Prompt Playground to understand where things went wrong and iterate
68 | - Collaborate with teammates: Invite your teammates, create annotated datasets and run experiments together
69 | - Share your app: Publish your LLM app and share it with the world (coming soon)
--------------------------------------------------------------------------------
/integrations/document-threshold.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Document Threshold
4 | description: This component filters documents based on a minimum Confidence Score percentage, ensuring only the documents above the threshold get passed down the pipeline.
5 | authors:
6 |   - name: recrudesce
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 | pypi: https://pypi.org/project/haystack-threshold-node/
11 | repo: https://github.com/recrudesce/haystack_threshold_node
12 | type: Custom Node
13 | report_issue: https://github.com/recrudesce/haystack_threshold_node/issues
14 | ---
15 | # Haystack Threshold Node
16 | This component filters documents based on a threshold percentage, ensuring only the documents above the threshold get passed down the pipeline.
17 | This allows you to query your document store for a larger top_k, but then filter the results down to those which are above a set confidence score.
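18 | 
19 | Conceptually, the node keeps a retrieved document only if its relevance score meets the configured percentage. That is roughly the following filter, shown here as an illustrative sketch rather than the package's actual implementation:
20 | 
21 | ```python
22 | from typing import List
23 | 
24 | from haystack.schema import Document
25 | 
26 | def keep_above_threshold(documents: List[Document], threshold: int) -> List[Document]:
27 |     """Keep documents whose Retriever score is at least `threshold` percent (0-100)."""
28 |     return [doc for doc in documents if doc.score is not None and doc.score * 100 >= threshold]
29 | ```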
30 | 
31 | ## Installation
32 | 
33 | `pip install haystack-threshold-node`
34 | 
35 | ## Usage
36 | 
37 | Include it in your pipeline, for example as follows:
38 | 
39 | ```python
40 | import logging
41 | 
42 | from datasets import load_dataset
43 | from haystack.document_stores import InMemoryDocumentStore
44 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
45 | from haystack.pipelines import Pipeline
46 | from haystack_threshold_node import DocumentThreshold
47 | 
48 | 
49 | logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
50 | logging.getLogger("haystack").setLevel(logging.INFO)
51 | 
52 | document_store = InMemoryDocumentStore(use_bm25=True)
53 | 
54 | dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
55 | document_store.write_documents(dataset)
56 | 
57 | retriever = BM25Retriever(document_store=document_store, top_k=10)
58 | 
59 | lfqa_prompt = PromptTemplate(
60 |     name="lfqa",
61 |     prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
62 |     output_parser=AnswerParser(),
63 | )
64 | 
65 | prompt_node = PromptNode(
66 |     model_name_or_path="text-davinci-003",
67 |     default_prompt_template=lfqa_prompt,
68 |     max_length=500,
69 |     api_key="sk-OPENAIKEY",
70 | )
71 | 
72 | # The value you pass for threshold is the lowest % score you will accept. Whole numbers only.
73 | # In this example, the threshold is set to 80%.
74 | threshold = DocumentThreshold(threshold=80)
75 | 
76 | pipe = Pipeline()
77 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
78 | pipe.add_node(component=threshold, name="Threshold", inputs=["Retriever"])
79 | pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Threshold"])
80 | 
81 | query = "What does the Rhodes Statue look like?"
82 | 
83 | output = pipe.run(query=query)
84 | 
85 | print(output['answers'][0].answer)
86 | ```
--------------------------------------------------------------------------------
/integrations/entailment-checker.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Entailment Checker
4 | description: Haystack node for checking the entailment between a statement and a list of Documents
5 | authors:
6 |   - name: Stefano Fiorucci
7 |     socials:
8 |       github: anakin87
9 | pypi: https://pypi.org/project/haystack-entailment-checker/
10 | repo: https://github.com/anakin87/haystack-entailment-checker
11 | type: Custom Node
12 | report_issue: https://github.com/anakin87/haystack-entailment-checker/issues
13 | ---
14 | **Live Demo**: Fact Checking 🎸 Rocks!   [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/fact-checking-rocks)
15 | 
16 | ## How it works
17 | ![Entailment Checker Node](https://github.com/anakin87/haystack-entailment-checker/raw/main/images/entailment_checker_node.png)
18 | - The node takes a list of Documents (commonly returned by a [Retriever](https://docs.haystack.deepset.ai/docs/retriever)) and a statement as input.
19 | - Using a Natural Language Inference model, the text entailment between each text passage/Document (premise) and the statement (hypothesis) is computed.
For every text passage, we get 3 scores (summing to 1): entailment, contradiction and neutral.
20 | - The text entailment scores are aggregated using a weighted average. The weight is the relevance score of each passage returned by the Retriever, if available. It expresses the similarity between the text passage and the statement. **Now we have a summary score, so it is possible to tell if the passages confirm, are neutral toward, or disprove the user statement.**
21 | - *Empirical consideration: if, in the first N documents (N < K), the aggregate entailment or contradiction score already exceeds the **threshold**, it is better not to consider the less relevant other (K-N) documents.*
22 | 
23 | ## Installation
24 | ```bash
25 | pip install haystack-entailment-checker
26 | ```
27 | 
28 | ## Usage
29 | ### Basic example
30 | ```python
31 | from haystack import Document
32 | from haystack_entailment_checker import EntailmentChecker
33 | 
34 | ec = EntailmentChecker(
35 |     model_name_or_path = "microsoft/deberta-v2-xlarge-mnli",
36 |     use_gpu = False,
37 |     entailment_contradiction_threshold = 0.5)
38 | 
39 | doc = Document("My cat is lazy")
40 | 
41 | print(ec.run("My cat is very active", [doc]))
42 | # ({'documents': [...],
43 | # 'aggregate_entailment_info': {'contradiction': 1.0, 'neutral': 0.0, 'entailment': 0.0}}, ...)
44 | ```
45 | 
46 | ### Fact-checking pipeline (Retriever + EntailmentChecker)
47 | ```python
48 | from haystack import Document, Pipeline
49 | from haystack.nodes import BM25Retriever
50 | from haystack.document_stores import InMemoryDocumentStore
51 | from haystack_entailment_checker import EntailmentChecker
52 | 
53 | # INDEXING
54 | # the knowledge base can consist of many documents
55 | docs = [...]
56 | ds = InMemoryDocumentStore(use_bm25=True)
57 | ds.write_documents(docs)
58 | 
59 | # QUERYING
60 | retriever = BM25Retriever(document_store=ds)
61 | ec = EntailmentChecker()
62 | 
63 | pipe = Pipeline()
64 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
65 | pipe.add_node(component=ec, name="EntailmentChecker", inputs=["Retriever"])
66 | 
67 | pipe.run(query="YOUR STATEMENT TO CHECK")
68 | ```
--------------------------------------------------------------------------------
/integrations/opensearch-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: OpenSearch Document Store
4 | description: Use an OpenSearch database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/opensearch.png
16 | ---
17 | 
18 | You can use [OpenSearch](https://opensearch.org/docs/latest/#docker-quickstart) in your Haystack pipelines with the [OpenSearchDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization).
19 | 
20 | For a detailed overview of all the available methods and settings for the `OpenSearchDocumentStore`, visit the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#opensearchdocumentstore).
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[opensearch]
26 | ```
27 | 
28 | ## Usage
29 | 
30 | Once the package is installed and your OpenSearch instance is running, you can start using OpenSearch with Haystack by initializing the document store:
31 | 
32 | ```python
33 | from haystack.document_stores import OpenSearchDocumentStore
34 | 
35 | document_store = OpenSearchDocumentStore()
36 | ```
37 | 
38 | ### Writing Documents to OpenSearchDocumentStore
39 | 
40 | To write documents to your `OpenSearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
41 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
42 | 
43 | #### Indexing Pipeline
44 | 
45 | ```python
46 | from haystack import Pipeline
47 | from haystack.document_stores import OpenSearchDocumentStore
48 | from haystack.nodes import PDFToTextConverter, PreProcessor
49 | 
50 | document_store = OpenSearchDocumentStore()
51 | converter = PDFToTextConverter()
52 | preprocessor = PreProcessor()
53 | 
54 | indexing_pipeline = Pipeline()
55 | indexing_pipeline.add_node(component=converter, name="PDFConverter", inputs=["File"])
56 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["PDFConverter"])
57 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
58 | 
59 | indexing_pipeline.run(file_paths=["filename.pdf"])
60 | ```
61 | 
62 | ### Using OpenSearch in a Query Pipeline
63 | 
64 | Once you have documents in your `OpenSearchDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of the ["deepset/question-generation"](https://prompthub.deepset.ai/?prompt=deepset%2Fquestion-generation) prompt, which is designed to generate questions for the retrieved documents. If our `OpenSearchDocumentStore` had documents about food in it, you could generate questions about "Pizzas" in the following way:
65 | 
66 | ```python
67 | from haystack import Pipeline
68 | from haystack.document_stores import OpenSearchDocumentStore
69 | from haystack.nodes import BM25Retriever, PromptNode
70 | 
71 | document_store = OpenSearchDocumentStore()
72 | retriever = BM25Retriever(document_store=document_store)
73 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
74 |                          api_key = "YOUR_OPENAI_KEY",
75 |                          default_prompt_template = "deepset/question-generation")
76 | 
77 | query_pipeline = Pipeline()
78 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
79 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
80 | 
81 | query_pipeline.run(query = "Pizzas")
82 | ```
--------------------------------------------------------------------------------
/integrations/readmedocs-fetcher.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: ReadMeDocs Fetcher
4 | description: Fetch documentation pages from ReadMe docs sites.
5 | authors:
6 |   - name: Tuana Çelik
7 |     socials:
8 |       github: tuanacelik
9 |       twitter: tuanacelik
10 |       linkedin: tuanacelik
11 |   - name: Silvano Cerza
12 |     socials:
13 |       github: silvanocerza
14 |       linkedin: silvanocerza
15 | pypi: https://pypi.org/project/readmedocs-fetcher-haystack/
16 | repo: https://github.com/tuanacelik/readmedocs-fetcher-haystack
17 | type: Custom Node
18 | report_issue: https://github.com/tuanacelik/readmedocs-fetcher-haystack/issues
19 | ---
20 | 
21 | This custom component for Haystack is designed to fetch documentation pages from the [ReadMe](https://readme.com/) documentation you have access to.
It uses a `MarkdownConverter` to convert all of your documentation pages to a list of Haystack `Documents`. You can use this node as a standalone node or within an indexing pipeline.
22 | 
23 | ## Installation
24 | 
25 | ```bash
26 | pip install readmedocs-fetcher-haystack
27 | ```
28 | 
29 | ## Usage
30 | 
31 | 1. To initialize a `ReadmeDocsFetcher`, you have to provide an `api_key` parameter. This is your ReadMe Docs API key.
32 | 2. There are 4 optional parameters to initialize the `ReadmeDocsFetcher`:
33 |   - `slugs`: To fetch a list of specific pages from your documentation. E.g., if you want to fetch 'https://docs.haystack.deepset.ai/docs/installation', the slug would be `installation`. If not set, all of the available pages will be fetched.
34 |   - `base_url`: Optionally provide this to add the full URL of a documentation page to the `meta` of the created document. For example `base_url='https://docs.haystack.deepset.ai'`
35 |   - `version`: If not set, the latest stable version of your docs will be fetched.
36 |   - `markdown_converter`: When documents are fetched from ReadMe, temporary `.md` files are created and a [`MarkdownConverter`](https://docs.haystack.deepset.ai/reference/file-converters-api#markdownconverter) is used to create a list of Haystack `Documents`. If not provided at initialization, a `MarkdownConverter` with the default parameters is used.
37 | 
38 | ### Standalone
39 | ```python
40 | import os
41 | from dotenv import load_dotenv
42 | from haystack.nodes import MarkdownConverter
43 | from readmedocs_fetcher_haystack import ReadmeDocsFetcher
44 | 
45 | load_dotenv()
46 | README_API_KEY = os.getenv('README_API_KEY')
47 | 
48 | converter = MarkdownConverter(remove_code_snippets=False)
49 | readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
50 | readme_fetcher.fetch_docs()
51 | ```
52 | 
53 | To fetch a single doc from a specific version:
54 | ```python
55 | readme_fetcher.fetch_docs(slugs=["nodes_overview"], version="v1.18")
56 | ```
57 | ### In a Pipeline
58 | 
59 | ```python
60 | import os
61 | from dotenv import load_dotenv
62 | from haystack import Pipeline
63 | from haystack.nodes import MarkdownConverter, PreProcessor
64 | from haystack.document_stores import InMemoryDocumentStore
65 | from readmedocs_fetcher_haystack import ReadmeDocsFetcher
66 | 
67 | load_dotenv()
68 | README_API_KEY = os.getenv('README_API_KEY')
69 | 
70 | converter = MarkdownConverter(remove_code_snippets=False)
71 | readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
72 | 
73 | preprocessor = PreProcessor()
74 | doc_store = InMemoryDocumentStore()
75 | 
76 | pipe = Pipeline()
77 | pipe.add_node(component=readme_fetcher, name="ReadmeFetcher", inputs=["File"])
78 | pipe.add_node(component=preprocessor, name="Preprocessor", inputs=["ReadmeFetcher"])
79 | pipe.add_node(component=doc_store, name="DocumentStore", inputs=["Preprocessor"])
80 | pipe.run()
81 | ```
82 | 
83 | To fetch a single documentation page:
84 | ```python
85 | pipe.run(params={"ReadmeFetcher":{"slugs": ["nodes_overview"]}})
86 | ```
--------------------------------------------------------------------------------
/integrations/faiss-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: FAISS Document Store
4 | description: Use a FAISS vector database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/meta.png
16 | ---
17 | 
18 | [Faiss](https://github.com/facebookresearch/faiss#readme) is a project by Meta for efficient vector search. You can use it in your Haystack pipelines with the [FAISSDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization).
19 | 
20 | For a detailed explanation of the different initialization options of the `FAISSDocumentStore`, please visit the [Haystack Documentation](https://docs.haystack.deepset.ai/docs/document_store#initialization) and [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#faissdocumentstore). Below are some examples of how you might use it within a Haystack Pipeline.
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[faiss]
26 | ```
27 | 
28 | Or, to install `FAISSDocumentStore` with GPU support, run:
29 | ```bash
30 | pip install farm-haystack[faiss-gpu]
31 | ```
32 | 
33 | ## Usage
34 | 
35 | Once installed, you can start using FAISS with Haystack by initializing it:
36 | 
37 | ```python
38 | from haystack.document_stores import FAISSDocumentStore
39 | 
40 | document_store = FAISSDocumentStore()
41 | ```
42 | 
43 | ### Writing Documents to FAISSDocumentStore
44 | 
45 | To write documents to your `FAISSDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
46 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
47 | 
48 | #### Indexing Pipeline
49 | 
50 | ```python
51 | from haystack import Pipeline
52 | from haystack.document_stores import FAISSDocumentStore
53 | from haystack.nodes import PDFToTextConverter, PreProcessor
54 | 
55 | document_store = FAISSDocumentStore()
56 | converter = PDFToTextConverter()
57 | preprocessor = PreProcessor()
58 | 
59 | indexing_pipeline = Pipeline()
60 | indexing_pipeline.add_node(component=converter, name="PDFConverter", inputs=["File"])
61 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["PDFConverter"])
62 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
63 | 
64 | indexing_pipeline.run(file_paths=["filename.pdf"])
65 | ```
66 | 
67 | ### Using Faiss in a Query Pipeline
68 | 
69 | Once you have documents in your `FAISSDocumentStore`, it's ready to be used in any Haystack pipeline, such as a Retrieval Augmented Generation (RAG) pipeline. Learn more about [Retrievers](https://docs.haystack.deepset.ai/docs/retriever) to make use of vector search within your LLM pipelines.
70 | 
71 | ```python
72 | from haystack import Pipeline
73 | from haystack.document_stores import FAISSDocumentStore
74 | from haystack.nodes import EmbeddingRetriever, PromptNode
75 | 
76 | document_store = FAISSDocumentStore()
77 | retriever = EmbeddingRetriever(document_store = document_store,
78 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
79 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
80 |                          api_key = "YOUR_OPENAI_KEY",
81 |                          default_prompt_template = "deepset/question-answering-with-references")
82 | 
83 | query_pipeline = Pipeline()
84 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
85 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
86 | 
87 | query_pipeline.run(query = "What is Haystack?")
88 | ```
--------------------------------------------------------------------------------
/integrations/lemmatize.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Document Lemmatizer
4 | description: A lemmatizing node for documents which can potentially reduce token use by up to 30%.
5 | authors:
6 |   - name: recrudesce
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 |   - name: Xceron
11 |     socials:
12 |       github: Xceron
13 | pypi: https://pypi.org/project/haystack-lemmatize-node/
14 | repo: https://github.com/recrudesce/haystack_lemmatize_node
15 | type: Custom Node
16 | report_issue: https://github.com/recrudesce/haystack_lemmatize_node/issues
17 | ---
18 | 
19 | ## Lemmatization
20 | 
21 | Lemmatization is a text pre-processing technique used in natural language processing (NLP) models to break a word down to its root meaning to identify similarities. For example, a lemmatization algorithm would reduce the word "better" to its root word, or lemma, "good".
22 | 
23 | This node can be placed within a pipeline to lemmatize documents returned by a Retriever, prior to adding them as context to a prompt (for a PromptNode or similar).
24 | The process of lemmatizing the document content can potentially reduce the amount of tokens used by up to 30%, without drastically affecting the meaning of the document.
25 | 
26 | ![image](https://user-images.githubusercontent.com/6450799/230403871-d0299748-977c-4c9e-9d70-914d8ff2bf3b.png)
27 | 
28 | ### Before Lemmatization:
29 | ![image](https://user-images.githubusercontent.com/6450799/230404198-a3ed6382-03b8-4ec6-b88d-4232560752f8.png)
30 | 
31 | ### After Lemmatization:
32 | ![image](https://user-images.githubusercontent.com/6450799/230404246-a8488a57-73bd-4420-9f1b-8a080b84121b.png)
33 | 
34 | ## Installation
35 | 
36 | Run `pip install haystack-lemmatize-node` to install the latest available release.
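37 | 
38 | The languages the node's `base_lang` argument accepts are those supported by [simplemma](https://pypi.org/project/simplemma/), the library referenced in the usage example below. As a quick standalone illustration of the idea (a sketch only; the exact `simplemma` call signature has changed between releases, so check its docs):
39 | 
40 | ```python
41 | import simplemma  # pip install simplemma
42 | 
43 | text = "The cats were running better"
44 | # Reduce each word form to its lemma, e.g. "better" -> "good"
45 | print([simplemma.lemmatize(token, lang="en") for token in text.split()])
46 | ```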
47 | 
48 | ## Usage
49 | 
50 | Include it in your pipeline, for example as follows:
51 | 
52 | ```python
53 | import logging
54 | 
55 | from datasets import load_dataset
56 | from haystack.document_stores import InMemoryDocumentStore
57 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
58 | from haystack.pipelines import Pipeline
59 | from haystack_lemmatize_node import LemmatizeDocuments
60 | 
61 | 
62 | logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
63 | logging.getLogger("haystack").setLevel(logging.INFO)
64 | 
65 | document_store = InMemoryDocumentStore(use_bm25=True)
66 | 
67 | dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
68 | document_store.write_documents(dataset)
69 | 
70 | retriever = BM25Retriever(document_store=document_store, top_k=2)
71 | 
72 | lfqa_prompt = PromptTemplate(
73 |     name="lfqa",
74 |     prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
75 |     output_parser=AnswerParser(),
76 | )
77 | 
78 | prompt_node = PromptNode(
79 |     model_name_or_path="text-davinci-003",
80 |     default_prompt_template=lfqa_prompt,
81 |     max_length=500,
82 |     api_key="sk-OPENAIKEY",
83 | )
84 | 
85 | lemmatize = LemmatizeDocuments()  # you can pass the `base_lang=XX` argument here too, where XX is a language as listed here: https://pypi.org/project/simplemma/
86 | 
87 | pipe = Pipeline()
88 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
89 | pipe.add_node(component=lemmatize, name="Lemmatize", inputs=["Retriever"])
90 | pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Lemmatize"])
91 | 
92 | query = "What does the Rhodes Statue look like?"
93 | 
94 | output = pipe.run(query=query)
95 | 
96 | print(output['answers'][0].answer)
97 | ```
98 | 
99 | ## Caveats
100 | Sometimes lemmatization can be slow for large document content, but in the world of AI where we can potentially wait 30+ seconds for an LLM to respond (hello GPT-4), what's a couple more seconds?
101 | 
--------------------------------------------------------------------------------
/integrations/qdrant-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Qdrant Document Store
4 | description: Use the Qdrant vector database with Haystack
5 | authors:
6 |   - name: Qdrant
7 |     socials:
8 |       github: qdrant
9 |       twitter: qdrant_engine
10 | pypi: https://pypi.org/project/qdrant-haystack/
11 | repo: https://github.com/qdrant/qdrant-haystack
12 | type: Document Store
13 | report_issue: https://github.com/qdrant/qdrant-haystack/issues
14 | logo: /logos/qdrant.png
15 | ---
16 | 
17 | An integration of the [Qdrant](https://qdrant.tech) vector database with [Haystack](https://haystack.deepset.ai/)
18 | by [deepset](https://www.deepset.ai).
19 | 
20 | The library allows using Qdrant as a document store, and provides a drop-in replacement
21 | for any other vector embeddings store. Thus, you should expect any kind of application to work
22 | smoothly just by changing the provider to `QdrantDocumentStore`.
23 | 
24 | ## Installation
25 | 
26 | `qdrant-haystack` can be installed like any other Python library, using pip or Poetry:
27 | 
28 | ```bash
29 | pip install qdrant-haystack
30 | ```
31 | 
32 | ```bash
33 | poetry add qdrant-haystack
34 | ```
35 | 
36 | ## Usage
37 | 
38 | Once installed, you can start using `QdrantDocumentStore` like any other store that supports
39 | embeddings.
40 | 
41 | ```python
42 | from qdrant_haystack import QdrantDocumentStore
43 | 
44 | document_store = QdrantDocumentStore(
45 |     url="localhost",
46 |     index="Document",
47 |     embedding_dim=512,
48 |     recreate_index=True,
49 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
50 | )
51 | ```
52 | 
53 | The list of parameters accepted by `QdrantDocumentStore` complements those used in the
54 | official [Python Qdrant client](https://github.com/qdrant/qdrant_client).
55 | 
56 | ### Using local in-memory / disk-persisted mode
57 | 
58 | The Qdrant Python client, from version 1.1.1, supports a local in-memory/disk-persisted mode. That's
59 | a good choice for test scenarios and quick experiments in which you do not plan to store
60 | lots of vectors. In such cases, spinning up a Docker container might not even be required.
61 | 
62 | The local mode is also implemented in the `qdrant-haystack` integration.
63 | 
64 | #### In-memory storage
65 | 
66 | If you want transient storage, for example for automated tests launched
67 | during your CI/CD pipeline, using Qdrant local mode with in-memory storage might be the preferred
68 | option. It can be enabled by simply passing `:memory:` as the first parameter when creating an
69 | instance of `QdrantDocumentStore`.
70 | 
71 | ```python
72 | from qdrant_haystack import QdrantDocumentStore
73 | 
74 | document_store = QdrantDocumentStore(
75 |     ":memory:",
76 |     index="Document",
77 |     embedding_dim=512,
78 |     recreate_index=True,
79 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
80 | )
81 | ```
82 | 
83 | #### On-disk storage
84 | 
85 | However, if you prefer to keep the vectors between different runs of your application, it
86 | might be better to use on-disk storage and pass the path that should be used to persist
87 | the data.
88 | 
89 | ```python
90 | from qdrant_haystack import QdrantDocumentStore
91 | 
92 | document_store = QdrantDocumentStore(
93 |     path="/home/qdrant/storage_local",
94 |     index="Document",
95 |     embedding_dim=512,
96 |     recreate_index=True,
97 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
98 | )
99 | ```
100 | 
101 | ### Connecting to a Qdrant Cloud cluster
102 | 
103 | If you prefer not to manage your own Qdrant instance, [Qdrant Cloud](https://cloud.qdrant.io/)
104 | might be a better option.
105 | 
106 | ```python
107 | from qdrant_haystack import QdrantDocumentStore
108 | 
109 | document_store = QdrantDocumentStore(
110 |     url="https://YOUR-CLUSTER-URL.aws.cloud.qdrant.io",
111 |     index="Document",
112 |     api_key="<< YOUR QDRANT CLOUD API KEY >>",
113 |     embedding_dim=512,
114 |     recreate_index=True,
115 | )
116 | ```
117 | 
-------------------------------------------------------------------------------- /integrations/elasticsearch-document-store.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: integration 3 | name: Elasticsearch Document Store 4 | description: Use an Elasticsearch database with Haystack 5 | authors: 6 | - name: deepset 7 | socials: 8 | github: deepset-ai 9 | twitter: deepset_ai 10 | linkedin: deepset-ai 11 | pypi: https://pypi.org/project/farm-haystack 12 | repo: https://github.com/deepset-ai/haystack 13 | type: Document Store 14 | report_issue: https://github.com/deepset-ai/haystack/issues 15 | logo: /logos/elastic.png 16 | --- 17 | 18 | The `ElasticsearchDocumentStore` is maintained within the core Haystack project. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines. 19 | 20 | For a details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1) 21 | 22 | ## Installation 23 | 24 | To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides. 25 | 26 | ```bash 27 | pip install farm-haystack[elasticsearch] 28 | ``` 29 | 30 | To install Elasticsearch 7, you can run `pip install farm-haystac[elasticsearch7]`. 31 | 32 | ## Usage 33 | 34 | Once installed, you can start using your Elasticsearch database with Haystack by initializing it: 35 | 36 | ```python 37 | from haystack.document_stores import ElasticsearchDocumentStore 38 | 39 | document_store = ElasticsearchDocumentStore(host = "localhost", 40 | port = 9200, 41 | embedding_dim = 768) 42 | ``` 43 | 44 | ### Writing Documents to ElasticsearchDocumentStore 45 | 46 | To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function. 47 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. 48 | 49 | #### Indexing Pipeline 50 | 51 | ```python 52 | from haystack import Pipeline 53 | from haystack.document_stores import ElasticsearchDocumentStore 54 | from haystack.nodes import TextConverter, PreProcessor 55 | 56 | document_store = ElasticsearchDocumentStore(host = "localhost", port = 9200) 57 | converter = TextConverter() 58 | preprocessor = PreProcessor() 59 | 60 | indexing_pipeline = Pipeline() 61 | indexing_pipeline.add_node(component=converter, name="TextConverter", inputs=["File"]) 62 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"]) 63 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"]) 64 | 65 | indexing_pipeline.run(file_paths=["filename.txt"]) 66 | ``` 67 | 68 | ### Using Elasticsearch in a Query Pipeline 69 | 70 | Once you have documents in your `ElasitsearchDocumentStore`, it's ready to be used in any Haystack pipeline. Such as a Retrieval Augmented Generation (RAG) pipeline. 
Learn more about [Retrievers](https://docs.haystack.deepset.ai/docs/retriever) to make use of vector search within your LLM pipelines. 71 | 72 | ```python 73 | from haystack import Pipeline 74 | from haystack.document_stores import ElasticsearchDocumentStore 75 | from haystack.nodes import EmbeddingRetriever, PromptNode 76 | 77 | document_store = ElasticsearchDocumentStore() 78 | retriever = EmbeddingRetriever(document_store = document_store, 79 | embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1") 80 | prompt_node = PromptNode(model_name_or_path = "google/flan-t5-xl", default_prompt_template = "deepset/question-answering") 81 | 82 | query_pipeline = Pipeline() 83 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) 84 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"]) 85 | 86 | query_pipeline.run(query = "Where is Istanbul?") 87 | ``` -------------------------------------------------------------------------------- /integrations/pinecone-document-store.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: integration 3 | name: Pinecone Document Store 4 | description: Use a Pinecone database with Haystack 5 | authors: 6 | - name: deepset 7 | socials: 8 | github: deepset-ai 9 | twitter: deepset_ai 10 | linkedin: deepset-ai 11 | pypi: https://pypi.org/project/farm-haystack 12 | repo: https://github.com/deepset-ai/haystack 13 | type: Document Store 14 | report_issue: https://github.com/deepset-ai/haystack/issues 15 | logo: /logos/pinecone.png 16 | --- 17 | 18 | [Pinecone](https://www.pinecone.io/) is a fast and scalable vector database which you can use in Haystack pipelines with the [PineconeDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization) 19 | 20 | For a detailed overview of all the available methods and settings for the `PineconeDocumentStore`, visit the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#pineconedocumentstore) 21 | 22 | ## Installation 23 | 24 | ```bash 25 | pip install farm-haystack[pinecone] 26 | ``` 27 | 28 | ## Usage 29 | 30 | To use Pinecone as your data storage for your Haystack LLM pipelines, you must have an account with Pinecone and an API Key. Once you have those, you can initialize a `PineconeDocumentStore` for Haystack: 31 | 32 | ```python 33 | from haystack.document_stores import PineconeDocumentStore 34 | 35 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY', 36 | similarity="cosine", 37 | embedding_dim=768) 38 | ``` 39 | 40 | ### Writing Documents to PineconeDocumentStore 41 | 42 | To write documents to your `PineconeDocumentStore`, create an indexing pipeline, or use the `write_documents()` function. 43 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database. 
Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.
44 | 
45 | #### Indexing Pipeline
46 | 
47 | ```python
48 | from haystack import Pipeline
49 | from haystack.document_stores import PineconeDocumentStore
50 | from haystack.nodes import MarkdownConverter, PreProcessor
51 | 
52 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY',
53 |                                        similarity="cosine",
54 |                                        embedding_dim=768)
55 | converter = MarkdownConverter()
56 | preprocessor = PreProcessor()
57 | 
58 | indexing_pipeline = Pipeline()
59 | indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
60 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
61 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
62 | 
63 | indexing_pipeline.run(file_paths=["filename.md"])
64 | ```
Note that this pipeline writes the documents without embeddings; to populate the vectors, run `update_embeddings()` with a retriever, as sketched above.
65 | 
66 | ### Using Pinecone in a Query Pipeline
67 | 
68 | Once you have documents in your `PineconeDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of a custom prompt designed to answer questions based on the retrieved documents.
69 | 
70 | ```python
71 | from haystack import Pipeline
72 | from haystack.document_stores import PineconeDocumentStore
73 | from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate
74 | 
75 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY',
76 |                                        similarity="cosine",
77 |                                        embedding_dim=768)
78 | 
79 | retriever = EmbeddingRetriever(document_store = document_store,
80 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
81 | prompt_template = PromptTemplate(prompt = """Answer the following query based on the provided context. If the context does
82 |                                  not include an answer, reply with 'I don't know'.\n
83 |                                  Query: {query}\n
84 |                                  Documents: {join(documents)}
85 |                                  Answer:
86 |                                  """,
87 |                                  output_parser=AnswerParser())
88 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
89 |                          api_key = "YOUR_OPENAI_KEY",
90 |                          default_prompt_template = prompt_template)
91 | 
92 | query_pipeline = Pipeline()
93 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
94 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
95 | 
96 | query_pipeline.run(query = "What is Pinecone?", params={"Retriever" : {"top_k": 5}})
97 | ```
--------------------------------------------------------------------------------
/integrations/milvus-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Milvus Document Store
4 | description: Use the Milvus vector database with Haystack
5 | authors:
6 |   - name: Zilliz
7 |     socials:
8 |       github: zilliztech
9 |       twitter: zilliz_universe
10 | pypi: https://pypi.org/project/milvus-haystack/
11 | repo: https://github.com/milvus-io/milvus-haystack
12 | type: Document Store
13 | report_issue: https://github.com/milvus-io/milvus-haystack/issues
14 | logo: /logos/milvus.png
15 | ---
16 | 
17 | An integration of the [Milvus](https://milvus.io/) vector database with [Haystack](https://haystack.deepset.ai/).
18 | 
19 | Milvus is a flexible, reliable, and fast cloud-native, open-source vector database. It powers embedding similarity search and AI applications and strives to make vector databases accessible to every organization. Milvus can store, index, and manage a billion+ embedding vectors generated by deep neural networks and other machine learning (ML) models.
This level of scale is vital for handling the volumes of unstructured data organizations generate, helping them analyze and act on that data to provide better service, reduce fraud, avoid downtime, and make decisions faster.
20 | Milvus is a graduated-stage project of the LF AI & Data Foundation.
21 | 
22 | Use Milvus as storage for Haystack pipelines through the `MilvusDocumentStore`.
23 | 
24 | 🚀 See an example application that uses the `MilvusDocumentStore` to do Milvus documentation QA [here](https://github.com/TuanaCelik/milvus-documentation-qa).
25 | 
26 | ## Installation
27 | 
28 | ```bash
29 | pip install milvus-haystack
30 | ```
31 | 
32 | ## Usage
33 | 
34 | Once the package is installed and your Milvus instance is running, you can start using Milvus with Haystack by initializing the document store:
35 | 
36 | ```python
37 | from milvus_haystack import MilvusDocumentStore
38 | 
39 | document_store = MilvusDocumentStore()
40 | ```
41 | 
42 | ### Writing Documents to MilvusDocumentStore
43 | 
44 | To write documents to your `MilvusDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
45 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is the example indexing pipeline used in the Milvus Documentation QA demo, which makes use of the `Crawler` component.
46 | 
47 | #### Indexing Pipeline
48 | 
49 | ```python
50 | from haystack import Pipeline
51 | from haystack.nodes import Crawler, PreProcessor, EmbeddingRetriever
52 | from milvus_haystack import MilvusDocumentStore
53 | 
54 | document_store = MilvusDocumentStore(recreate_index=True, return_embedding=True, similarity="cosine")
55 | crawler = Crawler(urls=["https://milvus.io/docs/"], crawler_depth=1, overwrite_existing_files=True, output_dir="crawled_files")
56 | preprocessor = PreProcessor(
57 |     clean_empty_lines=True,
58 |     clean_whitespace=False,
59 |     clean_header_footer=True,
60 |     split_by="word",
61 |     split_length=500,
62 |     split_respect_sentence_boundary=True,
63 | )
64 | retriever = EmbeddingRetriever(document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
65 | 
66 | indexing_pipeline = Pipeline()
67 | indexing_pipeline.add_node(component=crawler, name="crawler", inputs=['File'])
68 | indexing_pipeline.add_node(component=preprocessor, name="preprocessor", inputs=['crawler'])
69 | indexing_pipeline.add_node(component=retriever, name="retriever", inputs=['preprocessor'])
70 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['retriever'])
71 | 
72 | indexing_pipeline.run()
73 | ```
Because the `EmbeddingRetriever` sits between the `PreProcessor` and the document store, this pipeline computes embeddings during indexing and writes them together with the documents.
74 | 
75 | ### Using Milvus in a Retrieval Augmented Generative Pipeline
76 | 
77 | Once you have documents in your `MilvusDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of the ["deepset/question-answering"](https://prompthub.deepset.ai/?prompt=deepset%2Fquestion-answering) prompt, which is designed to generate answers based on the retrieved documents.
This is the pipeline used in the Milvus Documentation QA demo; it generates replies to queries using GPT-4:
78 | 
79 | ```python
80 | from haystack import Pipeline
81 | from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser
82 | from milvus_haystack import MilvusDocumentStore
83 | 
84 | document_store = MilvusDocumentStore()
85 | 
86 | retriever = EmbeddingRetriever(document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
87 | template = PromptTemplate(prompt="deepset/question-answering", output_parser=AnswerParser())
88 | prompt_node = PromptNode(model_name_or_path="gpt-4", default_prompt_template=template, api_key="YOUR_OPENAI_API_KEY", max_length=200)
89 | 
90 | query_pipeline = Pipeline()
91 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
92 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
93 | ```
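The snippet above only assembles the pipeline. To actually ask a question, run it and read the parsed answer; since the template uses an `AnswerParser`, the result carries `Answer` objects (the query string is illustrative):

```python
# Run the RAG pipeline and print the parsed answer
result = query_pipeline.run(query="How do I install Milvus?", params={"Retriever": {"top_k": 5}})
print(result["answers"][0].answer)
```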
--------------------------------------------------------------------------------
/integrations/weaviate-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Weaviate Document Store
4 | description: Use a Weaviate database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/weaviate.png
16 | ---
17 | 
18 | Haystack supports the use of [Weaviate](https://weaviate.io/) as data storage for LLM pipelines, with the `WeaviateDocumentStore`. You can choose to run Weaviate locally yourself, or use a hosted Weaviate database.
19 | 
20 | For details on the available methods and parameters of the `WeaviateDocumentStore`, check out the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#weaviatedocumentstore) and [Documentation](https://docs.haystack.deepset.ai/docs/document_store#initialization).
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[weaviate]
26 | ```
27 | 
28 | ## Usage
29 | 
30 | To use Weaviate as the data storage for your Haystack LLM pipelines, you should have it running locally or have a hosted instance. Then, you can initialize a `WeaviateDocumentStore`:
31 | 
32 | ```python
33 | from haystack.document_stores import WeaviateDocumentStore
34 | 
35 | document_store = WeaviateDocumentStore(host="http://localhost",
36 |                                        port=8080,
37 |                                        embedding_dim=768)
38 | ```
39 | 
40 | ### Writing Documents to WeaviateDocumentStore
41 | 
42 | To write documents to your `WeaviateDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
43 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Weaviate database. The example pipeline indexes not only the contents of the files, but also their embeddings. This way, we can do vector search on our files.
44 | 
45 | #### Indexing Pipeline
46 | 
47 | ```python
48 | from haystack import Pipeline
49 | from haystack.document_stores import WeaviateDocumentStore
50 | from haystack.nodes import EmbeddingRetriever, MarkdownConverter, PreProcessor
51 | 
52 | document_store = WeaviateDocumentStore(host="http://localhost",
53 |                                        port=8080,
54 |                                        embedding_dim=768)
55 | converter = MarkdownConverter()
56 | preprocessor = PreProcessor()
57 | retriever = EmbeddingRetriever(document_store = document_store,
58 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
59 | 
60 | indexing_pipeline = Pipeline()
61 | indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
62 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
63 | indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["PreProcessor"])
64 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["Retriever"])
65 | 
66 | indexing_pipeline.run(file_paths=["filename.md"])
67 | ```
68 | 
69 | ### Using Weaviate in a Query Pipeline
70 | 
71 | Once you have documents in your `WeaviateDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of a custom prompt that, given a query, is designed to generate long answers based on the retrieved documents.
72 | 
73 | ```python
74 | from haystack import Pipeline
75 | from haystack.document_stores import WeaviateDocumentStore
76 | from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate
77 | 
78 | document_store = WeaviateDocumentStore(host="http://localhost",
79 |                                        port=8080,
80 |                                        embedding_dim=768)
81 | 
82 | retriever = EmbeddingRetriever(document_store = document_store,
83 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
84 | prompt_template = PromptTemplate(prompt = """Given the provided Documents, answer the Query. Make your answer detailed and long.\n
85 |                                  Query: {query}\n
86 |                                  Documents: {join(documents)}
87 |                                  Answer:
88 |                                  """,
89 |                                  output_parser=AnswerParser())
90 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
91 |                          api_key = "YOUR_OPENAI_KEY",
92 |                          default_prompt_template = prompt_template)
93 | 
94 | query_pipeline = Pipeline()
95 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
96 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
97 | 
98 | query_pipeline.run(query = "What is Weaviate?", params={"Retriever" : {"top_k": 5}})
99 | ```
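You can also narrow down retrieval with metadata filters at query time, using Haystack's standard filter syntax. A minimal sketch, assuming your documents were written with a `meta` field such as `category` (the field name and value are illustrative):

```python
# Only retrieve documents whose meta field "category" is "tutorial"
result = query_pipeline.run(
    query="What is Weaviate?",
    params={"Retriever": {"top_k": 5, "filters": {"category": ["tutorial"]}}},
)
```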
--------------------------------------------------------------------------------
/integrations/basic-agent-memory.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Basic Agent Memory Tool
4 | description: A working memory that stores the Agent's conversation memory
5 | authors:
6 |   - name: Roland Tannous
7 |     socials:
8 |       github: rolandtannous
9 |       twitter: rolandtannous
10 |   - name: Xceron
11 |     socials:
12 |       github: Xceron
13 | pypi: https://pypi.org/project/haystack-memory/
14 | repo: https://github.com/rolandtannous/haystack-memory
15 | type: Agent Tool
16 | report_issue: https://github.com/rolandtannous/haystack-memory/issues
17 | ---
18 | 
19 | This library implements a working memory that stores the Agent's conversation memory
20 | and a sensory memory that stores the agent's short-term sensory memory.
21 | 
22 | The working memory can be utilized in-memory or through Redis, with the Redis implementation featuring a sliding window. The sensory memory is an in-memory implementation that mimics
23 | a human's brief sensory memory, lasting only for the duration of one interaction.
24 | 
25 | ## Installation
26 | 
27 | - Python pip: ```pip install --upgrade haystack-memory```. This method will attempt to install the dependencies (farm-haystack>=1.15.0, redis).
28 | - Python pip (skip dependency installation): use ```pip install --upgrade haystack-memory --no-deps```
29 | - Using git: ```pip install git+https://github.com/rolandtannous/haystack-memory.git@main#egg=haystack-memory```
30 | 
31 | 
32 | ## Usage
33 | 
34 | To use memory in your agent, you need three components:
35 | - `MemoryRecallNode`: This node is added to the agent as a tool. It allows the agent to remember the conversation and make query-memory associations.
36 | - `MemoryUtils`: This class should be used to save the queries and the final agent answers to the conversation memory.
37 | - `chat`: A method of the `MemoryUtils` class used to chat with the agent. It saves the query and the answer to the memory, and returns the full result for further use.
38 | 
39 | ```py
40 | from haystack.agents import Agent, Tool
41 | from haystack.nodes import PromptNode
42 | from haystack_memory.prompt_templates import memory_template
43 | from haystack_memory.memory import MemoryRecallNode
44 | from haystack_memory.utils import MemoryUtils
45 | 
46 | # Initialize the memory and the memory tool so the agent can retrieve the memory
47 | working_memory = []
48 | sensory_memory = []
49 | memory_node = MemoryRecallNode(memory=working_memory)
50 | memory_tool = Tool(name="Memory",
51 |                    pipeline_or_node=memory_node,
52 |                    description="Your memory. Always access this tool first to remember what you have learned.")
53 | 
54 | prompt_node = PromptNode(model_name_or_path="text-davinci-003",
55 |                          api_key="",
56 |                          max_length=1024,
57 |                          stop_words=["Observation:"])
58 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
59 | memory_agent.add_tool(memory_tool)
60 | 
61 | # Initialize the utils to save the query and the answers to the memory
62 | memory_utils = MemoryUtils(working_memory=working_memory, sensory_memory=sensory_memory, agent=memory_agent)
63 | result = memory_utils.chat("")
64 | print(working_memory)
65 | ```
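Because `chat()` saves each query and answer to the working memory, calling it repeatedly builds up conversational context that the agent can recall through the memory tool. A small illustrative loop (the questions are made up):

```py
# Each call stores the query and the agent's final answer in working_memory
for question in ["What is Haystack?", "And who maintains it?"]:
    memory_utils.chat(question)

print(working_memory)  # now holds the accumulated conversation
```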
66 | 
67 | ### Redis
68 | 
69 | The working memory can also be stored in a Redis database, which makes it possible to use separate memories for multiple agents at the same time. Additionally, it supports a sliding window so that only the last k messages are utilized.
70 | 
71 | ```py
72 | from haystack.agents import Agent, Tool
73 | from haystack.nodes import PromptNode
74 | from haystack_memory.memory import RedisMemoryRecallNode
75 | from haystack_memory.prompt_templates import memory_template
76 | from haystack_memory.utils import RedisUtils
77 | 
78 | sensory_memory = []
79 | # Initialize the memory and the memory tool so the agent can retrieve the memory
80 | redis_memory_node = RedisMemoryRecallNode(memory_id="working_memory",
81 |                                           host="localhost",
82 |                                           port=6379,
83 |                                           db=0)
84 | memory_tool = Tool(name="Memory",
85 |                    pipeline_or_node=redis_memory_node,
86 |                    description="Your memory. Always access this tool first to remember what you have learned.")
87 | prompt_node = PromptNode(model_name_or_path="text-davinci-003",
88 |                          api_key="",
89 |                          max_length=1024,
90 |                          stop_words=["Observation:"])
91 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
92 | # Initialize the utils to save the query and the answers to the memory
93 | redis_utils = RedisUtils(agent=memory_agent,
94 |                          sensory_memory=sensory_memory,
95 |                          memory_id="working_memory",
96 |                          host="localhost",
97 |                          port=6379,
98 |                          db=0)
99 | result = redis_utils.chat("")
100 | ```
101 | 
102 | 
103 | ## Examples
104 | 
105 | Examples can be found in the `examples/` folder. They contain usage examples for both in-memory and Redis memory types.
106 | To open the examples in Colab, click on the following links:
107 | - Basic Memory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rolandtannous/HaystackAgentBasicMemory/blob/main/examples/example_basic_memory.ipynb)
108 | - Redis Memory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rolandtannous/HaystackAgentBasicMemory/blob/main/examples/example_redis_memory.ipynb)
109 | 
--------------------------------------------------------------------------------
/integrations/newspaper3k.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Newspaper3k Wrapper Nodes
4 | description: Newspaper3k wrapper nodes that let you scrape articles directly with the scraper node or crawl many pages with the crawler node.
5 | 
6 | authors:
7 |   - name: Haradai
8 |     socials:
9 |       github: haradai
10 | pypi: https://pypi.org/project/newspaper3k-haystack
11 | repo: https://github.com/Haradai/newspaper3k-haystack
12 | type: Custom Node
13 | report_issue: https://github.com/Haradai/newspaper3k-haystack/issues
14 | ---
15 | 
16 | Newspaper3k Haystack is a simple wrapper for the newspaper3k library within the Haystack framework. It lets you scrape articles from given URLs using the scraper node, or crawl many pages using the crawler node.
17 | 
18 | ## Installation:
19 | You can install Newspaper3k Haystack using pip:
20 | ```
21 | pip install newspaper3k-haystack
22 | ```
23 | 
24 | ## Usage:
25 | ### Scraper Node:
26 | ```
27 | from newspaper3k_haystack import newspaper3k_scraper
28 | scraper = newspaper3k_scraper()
29 | ```
30 | 
31 | You can also provide a header for the request and a timeout for the page loading.
32 | ```
33 | scraper = newspaper3k_scraper(
34 |     headers={'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
35 |              'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'},
36 |     request_timeout=10)
37 | ```
38 | 
39 | To run in standalone mode, use `run` to load a single URL or `run_batch` to load multiple URLs passed as an array.
40 | 
41 | **Available parameters:**
42 | ```
43 | :param query: list of strings containing the webpages to scrape.
44 | :param lang: (None by default) language to process the article with; if None, autodetected.
45 |     Available languages are: (more info at https://newspaper.readthedocs.io/en/latest/)
46 |     input code    full name
47 | 
48 |     ar            Arabic
49 |     ru            Russian
50 |     nl            Dutch
51 |     de            German
52 |     en            English
53 |     es            Spanish
54 |     fr            French
55 |     he            Hebrew
56 |     ...
57 | :param summary: (False by default) Whether to summarize the document (through newspaper3k) and save it as document metadata.
58 | :param path: (None by default) Path where to store the downloaded articles' HTML; if None, not downloaded. Ignored if load=True.
59 | :param load: (False by default) If True, query should be a local path to an HTML file to scrape.
60 | ```
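For example, to override automatic language detection, pass one of the language codes listed above when running the node. A small sketch (the URL is made up):
```
scraper.run(query="https://elpais.com/internacional/example-article.html",
            lang="es",
            summary=True)
```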
61 | **In Standalone:**
62 | ```
63 | scraper.run(query="https://www.lonelyplanet.com/articles/getting-around-norway",
64 |             metadata=True,
65 |             summary=True,
66 |             keywords=True,
67 |             path="articles")
68 | ```
69 | **In a Pipeline:**
70 | ```
71 | 
72 | from qdrant_haystack.document_stores import QdrantDocumentStore
73 | from haystack.nodes import EntityExtractor
74 | from haystack.pipelines import Pipeline
75 | from haystack.nodes import PreProcessor
76 | 
77 | document_store = QdrantDocumentStore(
78 |     ":memory:",
79 |     index="Document",
80 |     embedding_dim=768,
81 |     recreate_index=True,
82 | )
83 | 
84 | entity_extractor = EntityExtractor(model_name_or_path="dslim/bert-base-NER", flatten_entities_in_meta_data=True)
85 | 
86 | processor = PreProcessor(
87 |     clean_empty_lines=False,
88 |     clean_whitespace=False,
89 |     clean_header_footer=False,
90 |     split_by="sentence",
91 |     split_length=30,
92 |     split_respect_sentence_boundary=False,
93 |     split_overlap=0
94 | )
95 | 
96 | indexing_pipeline = Pipeline()
97 | indexing_pipeline.add_node(component=scraper, name="scraper", inputs=['File'])
98 | indexing_pipeline.add_node(component=processor, name="processor", inputs=['scraper'])
99 | indexing_pipeline.add_node(entity_extractor, "EntityExtractor", ["processor"])
100 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['EntityExtractor'])
101 | 
102 | # we can also pass the arguments shown earlier
103 | indexing_pipeline.run(query="https://www.roughguides.com/norway/",
104 |                       params={
105 |                           "scraper": {
106 |                               "metadata": True,
107 |                               "summary": True,
108 |                               "keywords": True
109 |                           }
110 |                       })
111 | ```
112 | ### Crawler Node:
113 | ```
114 | from newspaper3k_haystack import newspaper3k_crawler
115 | ```
116 | When initializing the crawler, you can pass the same parameters as for the scraper node.
117 | 
118 | ```
119 | crawler = newspaper3k_crawler(
120 |     headers={'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
121 |              'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'},
122 |     request_timeout=10)
123 | ```
124 | 
125 | **Available parameters:**
126 | ```
127 | :param query: list of initial URLs to start scraping from.
128 | :param n_articles: number of articles to scrape per initial URL.
129 | :param beam: number of articles from each scraped website to prioritize in the crawl queue.
130 |     If 0, newly found links are simply appended to the crawl queue after each scrape (BFS).
131 |     If 1, one link from the page just scraped is prioritized next, which amounts to a depth-first search (DFS).
132 | :param filters: dictionary of lists of strings that the URLs should or should not contain. Keys: positive and negative.
133 |     URLs are kept only if they contain at least one positive filter and none of the negative ones.
134 |     e.g.
135 |     {positive: [".com", ".es"],
136 |      negative: ["facebook", "instagram"]}
137 | :param keep_links: (False by default) Whether to keep the links found on each page as document metadata.
138 | 
139 | :param lang: (None by default) language to process the article with; if None, autodetected.
140 |     Available languages are: (more info at https://newspaper.readthedocs.io/en/latest/)
141 |     input code    full name
142 | 
143 |     ar            Arabic
144 |     ru            Russian
145 |     nl            Dutch
146 |     de            German
147 |     en            English
148 |     es            Spanish
149 |     fr            French
150 |     he            Hebrew
151 |     it            Italian
152 |     ko            Korean
153 |     no            Norwegian
154 |     fa            Persian
155 |     pl            Polish
156 |     pt            Portuguese
157 |     sv            Swedish
158 |     hu            Hungarian
159 |     fi            Finnish
160 |     da            Danish
161 |     zh            Chinese
162 |     id            Indonesian
163 |     vi            Vietnamese
164 |     sw            Swahili
165 |     tr            Turkish
166 |     el            Greek
167 |     uk            Ukrainian
168 |     bg            Bulgarian
169 |     hr            Croatian
170 |     ro            Romanian
171 |     sl            Slovenian
172 |     sr            Serbian
173 |     et            Estonian
174 |     ja            Japanese
175 |     be            Belarusian
176 | 
177 | :param metadata: (False by default) Whether to get article metadata.
178 | :param keywords: (False by default) Whether to save the detected article keywords as document metadata.
179 | :param summary: (False by default) Whether to summarize the document (through newspaper3k) and save it as document metadata.
180 | :param path: (None by default) Path where to store the downloaded articles' HTML; if None, not downloaded.
181 | ```
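To make the filter rule concrete, here is a conceptual sketch of the accept/reject logic described above; this is illustrative only, not the library's actual implementation:
```
filters = {"positive": ["norway"], "negative": ["facebook", "instagram"]}

def passes(url):
    # a URL is kept only if it contains at least one positive string
    # and none of the negative ones
    return (any(p in url for p in filters["positive"])
            and not any(n in url for n in filters["negative"]))

passes("https://www.visitnorway.com/plan-your-trip/")  # True
passes("https://facebook.com/visitnorway")             # False
```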
182 | **In Standalone:**
183 | 
184 | You can also use run_batch and pass a list of URLs in the query argument; it will scrape n_articles for each provided URL.
185 | ```
186 | docs = crawler.run(
187 |     query="https://www.roughguides.com/norway/",
188 |     n_articles=10,
189 |     beam=5,
190 |     filters={
191 |         "positive": ["norway"],
192 |         "negative": ["facebook", "instagram"]
193 |     },
194 |     keep_links=False,
195 |     metadata=True,
196 |     summary=True,
197 |     keywords=True,
198 |     path="articles")
199 | ```
200 | 
201 | **In a Pipeline:**
202 | ```
203 | from qdrant_haystack.document_stores import QdrantDocumentStore
204 | from haystack.nodes import EntityExtractor
205 | from haystack.pipelines import Pipeline
206 | from haystack.nodes import PreProcessor
207 | 
208 | document_store = QdrantDocumentStore(
209 |     ":memory:",
210 |     index="Document",
211 |     embedding_dim=768,
212 |     recreate_index=True,
213 | )
214 | 
215 | entity_extractor = EntityExtractor(model_name_or_path="dslim/bert-base-NER", flatten_entities_in_meta_data=True)
216 | 
217 | processor = PreProcessor(
218 |     clean_empty_lines=False,
219 |     clean_whitespace=False,
220 |     clean_header_footer=False,
221 |     split_by="sentence",
222 |     split_length=30,
223 |     split_respect_sentence_boundary=False,
224 |     split_overlap=0
225 | )
226 | 
227 | indexing_pipeline = Pipeline()
228 | indexing_pipeline.add_node(component=crawler, name="crawler", inputs=['File'])
229 | indexing_pipeline.add_node(component=processor, name="processor", inputs=['crawler'])
230 | indexing_pipeline.add_node(entity_extractor, "EntityExtractor", ["processor"])
231 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['EntityExtractor'])
232 | 
233 | # we can also pass the arguments shown earlier
234 | indexing_pipeline.run(query="https://www.roughguides.com/norway/",
235 |                       params={
236 |                           "crawler": {
237 |                               "n_articles": 500,
238 |                               "beam": 5,
239 |                               "filters": {
240 |                                   "positive": ["norway"],
241 |                                   "negative": ["facebook"]
242 |                               },
243 |                               "keep_links": False,
244 |                               "metadata": True,
245 |                               "summary": True,
246 |                               "keywords": True,
247 |                               "path": "articles"
248 |                           }
249 |                       })
250 | ```
--------------------------------------------------------------------------------