├── .gitignore ├── logos ├── chroma.png ├── meta.png ├── milvus.png ├── qdrant.png ├── chainlit.png ├── elastic.png ├── pinecone.png ├── weaviate.png ├── intel-labs.png └── opensearch.png ├── images └── chainlit-haystack.png ├── .github └── workflows │ └── publish_integrations.yml ├── integrations ├── text2speech.md ├── fastrag.md ├── mastodon-fetcher.md ├── chroma-documentstore.md ├── azure-translator.md ├── veracity.md ├── chainlit.md ├── document-threshold.md ├── entailment-checker.md ├── opensearch-document-store.md ├── readmedocs-fetcher.md ├── faiss-document-store.md ├── lemmatize.md ├── qdrant-document-store.md ├── elasticsearch-document-store.md ├── pinecone-document-store.md ├── milvus-document-store.md ├── weaviate-document-store.md ├── basic-agent-memory.md └── newspaper3k.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /logos/chroma.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/chroma.png -------------------------------------------------------------------------------- /logos/meta.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/meta.png -------------------------------------------------------------------------------- /logos/milvus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/milvus.png -------------------------------------------------------------------------------- /logos/qdrant.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/qdrant.png -------------------------------------------------------------------------------- /logos/chainlit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/chainlit.png -------------------------------------------------------------------------------- /logos/elastic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/elastic.png -------------------------------------------------------------------------------- /logos/pinecone.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/pinecone.png -------------------------------------------------------------------------------- /logos/weaviate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/weaviate.png -------------------------------------------------------------------------------- /logos/intel-labs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/intel-labs.png -------------------------------------------------------------------------------- /logos/opensearch.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/logos/opensearch.png
--------------------------------------------------------------------------------
/images/chainlit-haystack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLukas22/haystack-integrations/main/images/chainlit-haystack.png
--------------------------------------------------------------------------------
/.github/workflows/publish_integrations.yml:
--------------------------------------------------------------------------------
1 | name: Publish integrations on Haystack Home
2 | 
3 | on:
4 |   workflow_dispatch:
5 |   push:
6 |     branches:
7 |       - main
8 | 
9 | 
10 | jobs:
11 |   publish-integrations:
12 |     runs-on: ubuntu-latest
13 | 
14 |     steps:
15 |       - name: trigger-hook
16 |         run: |
17 |           curl -X POST ${{ secrets.VERCEL_DEPLOY_HOOK }}
--------------------------------------------------------------------------------
/integrations/text2speech.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: AnswerToSpeech & DocumentToSpeech
4 | description: Convert Haystack Answers and Documents to audio files
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack-text2speech/
12 | repo: https://github.com/deepset-ai/haystack-extras/tree/main/nodes/text2speech
13 | type: Custom Node
14 | report_issue: https://github.com/deepset-ai/haystack-extras/issues
15 | ---
16 | 
17 | The `farm-haystack-text2speech` package contains two Nodes that allow you to convert Haystack `Answers` and `Documents` into audio files: `AnswerToSpeech` and `DocumentToSpeech`.
18 | 
19 | ## Installation
20 | 
21 | For Debian-based systems, first install the additional system dependencies:
22 | ```bash
23 | sudo apt-get install libsndfile1 ffmpeg
24 | ```
25 | 
26 | Install the package:
27 | ```bash
28 | pip install farm-haystack-text2speech
29 | ```
30 | 
31 | ## Usage
32 | 
33 | For a full example of how to use the `AnswerToSpeech` Node, try our "[Make Your QA Pipelines Talk](https://haystack.deepset.ai/tutorials/17_audio)" tutorial.
34 | 
35 | For example, in a simple Extractive QA Pipeline:
36 | 
37 | ```python
38 | from pathlib import Path
39 | 
40 | from haystack import Pipeline
41 | from haystack.nodes import BM25Retriever, FARMReader
42 | from text2speech import AnswerToSpeech
43 | 
44 | # document_store is an existing DocumentStore that already holds your indexed Documents
45 | retriever = BM25Retriever(document_store=document_store)
46 | reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
47 | answer2speech = AnswerToSpeech(
48 |     model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_answers")
49 | )
50 | 
51 | audio_pipeline = Pipeline()
52 | audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
53 | audio_pipeline.add_node(reader, name="Reader", inputs=["Retriever"])
54 | audio_pipeline.add_node(answer2speech, name="AnswerToSpeech", inputs=["Reader"])
55 | ```
--------------------------------------------------------------------------------
/integrations/fastrag.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: fastRAG
4 | description: A research framework designed to facilitate the building of retrieval augmented generative pipelines.
5 | authors:
6 |   - name: Intel Labs
7 |     socials:
8 |       github: IntelLabs
9 | pypi:
10 | repo: https://github.com/IntelLabs/fastRAG
11 | type: Custom Node
12 | report_issue: https://github.com/IntelLabs/fastRAG/issues
13 | logo: /logos/intel-labs.png
14 | ---
15 | 
16 | fast**RAG** is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible, through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive toolset for exploring and advancing the field of retrieval augmented generation.
17 | 
18 | It includes custom nodes such as:
19 | - Image Generators
20 | - Knowledge Graph Creator
21 | - Document Shapers
22 | - Reader with FiD implementation
23 | - Efficient document vector store (PLAID)
24 | - Benchmarking scripts
25 | 
26 | ## Installation
27 | 
28 | Preliminary requirements:
29 | 
30 | - Python 3.8+
31 | - PyTorch
32 | 
33 | Clone the repository, then, in a new virtual environment, run:
34 | 
35 | ```bash
36 | pip install .
37 | ```
38 | 
39 | There are various optional dependencies, based on usage:
40 | 
41 | ```bash
42 | # Additional engines/components
43 | pip install .[faiss-cpu]           # CPU-based Faiss
44 | pip install .[faiss-gpu]           # GPU-based Faiss
45 | pip install .[qdrant]              # Qdrant support
46 | pip install libs/colbert           # ColBERT/PLAID indexing engine
47 | pip install .[image-generation]    # Stable diffusion library
48 | pip install .[knowledge_graph]     # spacy and KG libraries
49 | 
50 | # REST API + UI
51 | pip install .[ui]
52 | 
53 | # Benchmarking
54 | pip install .[benchmark]
55 | 
56 | # Dev tools
57 | pip install .[dev]
58 | ```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Haystack Integrations
2 | 
3 | This repository is an index of Haystack integrations that can be used with a Haystack Pipeline or Agent.
4 | 
5 | These integrations are maintained by their respective owners or authors. You can browse them on the [Haystack Integrations](https://haystack.deepset.ai/integrations) page, where you will find information on the author(s), installation, and usage of each tool.
6 | 
7 | ## What are Haystack Integrations?
8 | 
9 | Haystack Integrations are Custom Nodes, DocumentStores, or Agent Tools, provided either as external packages or as additional technologies that can be used with Haystack. Some integrations are maintained by the deepset team; others are community contributions owned by the authors of the integration.
10 | 
11 | ## Looking for prompts?
12 | 
13 | Prompts for the `PromptNode` and `Agent` can be found on our [Prompt Hub](https://prompthub.deepset.ai).
14 | To contribute a prompt, follow the instructions in the [`prompthub`](https://github.com/deepset-ai/prompthub) repo.
15 | 
16 | ## How to contribute
17 | 
18 | To contribute, open a PR that adds an `.md` file to the `integrations/` directory.
A few things to include in the file 👇
19 | The frontmatter has to include the following:
20 | ```
21 | ---
22 | name: Name of your integration (required)
23 | description: A short description (this will appear on the front page element of your integration on the website) (required)
24 | authors:
25 |   - name: Name of Author 1 (required)
26 |     socials:
27 |       github: include if desired
28 |       twitter: include if desired
29 |   - name: Name of Author 2
30 |     socials:
31 |       github: include if desired
32 |       twitter: include if desired
33 | pypi: url of pypi package if exists
34 | repo: url of GitHub repo if exists
35 | type: Custom Node OR Document Store OR Agent Tool (required)
36 | report_issue: url to where people can report an issue with the integration
37 | ---
38 | ```
39 | Note that there should be at least one of the `pypi` or `repo` fields for us to merge the integration.
40 | 
41 | Then, please add as much information and instructions about your integration as possible in the body of your `.md` file.
42 | 
43 | Open a Pull Request, and congrats: if all goes well, you will see your integration on the integrations page in no time 🥳
--------------------------------------------------------------------------------
/integrations/mastodon-fetcher.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Mastodon Fetcher
4 | description: A custom component to fetch a Mastodon username's latest posts
5 | authors:
6 |   - name: Tuana Çelik
7 |     socials:
8 |       github: tuanacelik
9 |       twitter: tuanacelik
10 |       linkedin: tuanacelik
11 | pypi: https://pypi.org/project/mastodon-fetcher-haystack/
12 | repo: https://github.com/tuanacelik/mastodon-fetcher-haystack
13 | type: Custom Node
14 | report_issue: https://github.com/tuanacelik/mastodon-fetcher-haystack/issues
15 | ---
16 | 
17 | The `MastodonFetcher` is a simple custom component that fetches the `last_k_posts` of a given Mastodon username.
18 | 
19 | This component expects `query` to be a complete Mastodon username, for example "tuana@sigmoid.social". If the provided username is correct and public, `MastodonFetcher` will return a list of `Document` objects where the contents are the user's latest posts.
20 | 
21 | ## Installation
22 | 
23 | Run `pip install mastodon-fetcher-haystack` to install the latest available release.
24 | 
25 | ## Usage
26 | 
27 | Because the component returns a list of Documents, it can be used at the same step where a Retriever would normally be used. For example, use it in a Retrieval Augmented Generative (RAG) pipeline as follows:
28 | 
29 | ```python
30 | from haystack import Pipeline
31 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser
32 | from haystack.utils import print_answers
33 | from mastodon_fetcher_haystack.mastodon_fetcher import MastodonFetcher
34 | 
35 | mastodon_fetcher = MastodonFetcher()
36 | 
37 | prompt_template = PromptTemplate(prompt="Given the following Mastodon posts stream, create a short summary of the topics the account posts about. Mastodon posts stream: {join(documents)};\n Answer:",
38 |                                  output_parser=AnswerParser())
39 | prompt_node = PromptNode(default_prompt_template=prompt_template, model_name_or_path="text-davinci-003", api_key="YOUR_OPENAI_API_KEY")
40 | 
41 | pipe = Pipeline()
42 | pipe.add_node(component=mastodon_fetcher, name="MastodonFetcher", inputs=["Query"])
43 | pipe.add_node(component=prompt_node, name="PromptNode", inputs=["MastodonFetcher"])
44 | result = pipe.run(query="tuana@sigmoid.social", params={"MastodonFetcher": {"last_k_posts": 3}})
45 | ```
46 | 
47 | ## Limitations
48 | 1. The way this component is set up is very particular about how it expects usernames. Make sure you provide the full username, e.g.: `username@instance`
49 | 2. By default, the Mastodon API allows requesting up to 40 posts.
--------------------------------------------------------------------------------
/integrations/chroma-documentstore.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Chroma Document Store
4 | description: A Document Store for storing documents in and retrieving them from Chroma - built for Haystack 2.0
5 | authors:
6 |   - name: Massimiliano Pippi
7 |     socials:
8 |       github: masci
9 | pypi: https://pypi.org/project/chroma-store
10 | repo: https://github.com/masci/chroma-haystack
11 | type: Document Store
12 | report_issue: https://github.com/masci/chroma-haystack/issues
13 | logo: /logos/chroma.png
14 | ---
15 | # Chroma Document Store for Haystack
16 | 
17 | [![PyPI - Version](https://img.shields.io/pypi/v/chroma-haystack.svg)](https://pypi.org/project/chroma-haystack)
18 | [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/chroma-haystack.svg)](https://pypi.org/project/chroma-haystack)
19 | [![test](https://github.com/masci/chroma-haystack/actions/workflows/test.yml/badge.svg)](https://github.com/masci/chroma-haystack/actions/workflows/test.yml)
20 | 
21 | -----
22 | 
23 | **Table of Contents**
24 | 
25 | - [Chroma Document Store for Haystack](#chroma-document-store-for-haystack)
26 | - [Installation](#installation)
27 | - [Examples](#examples)
28 | - [License](#license)
29 | 
30 | ## Installation
31 | 
32 | ```console
33 | pip install chroma-haystack
34 | ```
35 | ## Usage
36 | Once installed, you can start using your Chroma database with Haystack 2.0 by initializing it:
37 | 
38 | ```python
39 | from chroma_haystack import ChromaDocumentStore
40 | 
41 | # Chroma is used in-memory so we use the same instances in the two pipelines below
42 | document_store = ChromaDocumentStore()
43 | ```
44 | 
45 | ### Writing Documents to ChromaDocumentStore
46 | To write documents to `ChromaDocumentStore`, create an indexing pipeline.
47 | 
48 | ```python
49 | from haystack.preview import Pipeline
50 | from haystack.preview.components.file_converters import TextFileToDocument
51 | from haystack.preview.components.writers import DocumentWriter
52 | 
53 | indexing = Pipeline()
54 | indexing.add_component("converter", TextFileToDocument())
55 | indexing.add_component("writer", DocumentWriter(document_store))
56 | indexing.connect("converter", "writer")
57 | indexing.run({"converter": {"paths": file_paths}})  # file_paths: a list of text files to index
58 | ```
59 | 
60 | ## Examples
61 | You can find a code example showing how to use the Document Store and the Retriever under the `example/` folder of this repo or in [this Colab](https://colab.research.google.com/drive/1YpDetI8BRbObPDEVdfqUcwhEX9UUXP-m?usp=sharing).
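62 | 
63 | Below is a minimal query sketch to pair with the indexing pipeline above. The import path and the retriever's `queries`/`top_k` inputs are assumptions based on the example in this repo; check the `example/` folder for the current API before relying on them.
64 | 
65 | ```python
66 | from haystack.preview import Pipeline
67 | from chroma_haystack.retriever import ChromaQueryRetriever  # assumed import path, see the repo's example
68 | 
69 | # Reuse the same document_store instance the indexing pipeline wrote to
70 | retriever = ChromaQueryRetriever(document_store)
71 | 
72 | querying = Pipeline()
73 | querying.add_component("retriever", retriever)
74 | results = querying.run({"retriever": {"queries": ["What is in the example files?"], "top_k": 3}})
75 | ```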
76 | 
77 | ## License
78 | 
79 | `chroma-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.
--------------------------------------------------------------------------------
/integrations/azure-translator.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Azure Translate Nodes
4 | description: TranslateAnswer and TranslateQuery Nodes that use the Azure Translate endpoint
5 | authors:
6 |   - name: recrudesce (Russ)
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 | pypi: https://pypi.org/project/haystack-translate-node/
11 | repo: https://github.com/recrudesce/haystack_translate_node
12 | type: Custom Node
13 | report_issue: https://github.com/recrudesce/haystack_translate_node/issues
14 | ---
15 | 
16 | This package allows you to use the Azure translation endpoints to separately translate the query and the answer. It's useful for scenarios where your dataset is in a different language from the one you expect the user query to be in. This way, you can translate the user query into your dataset's language, and translate the answer back into the user's language.
17 | 
18 | ## Installation
19 | Run `pip install haystack-translate-node` to install the latest available version.
20 | 
21 | ## Usage
22 | Include in your pipeline as follows:
23 | 
24 | ```python
25 | from haystack import Pipeline
26 | from haystack_translate_node import TranslateAnswer, TranslateQuery
27 | 
28 | translate_query = TranslateQuery(api_key="", location="", azure_translate_endpoint="", base_lang="en")
29 | translate_answer = TranslateAnswer(api_key="", location="", azure_translate_endpoint="", base_lang="en")
30 | 
31 | # retriever and prompt_node are nodes you have already defined elsewhere
32 | pipel = Pipeline()
33 | pipel.add_node(component=translate_query, name="TranslateQuery", inputs=["Query"])
34 | pipel.add_node(component=retriever, name="Retriever", inputs=["TranslateQuery"])
35 | pipel.add_node(component=prompt_node, name="prompt_node", inputs=["Retriever"])
36 | pipel.add_node(component=translate_answer, name="TranslateAnswer", inputs=["prompt_node"])
37 | ```
38 | 
39 | `location`, `azure_translate_endpoint`, and `base_lang` are optional, and default to `uksouth`, `https://api.cognitive.microsofttranslator.com/`, and `en`, respectively.
40 | 
41 | TranslateQuery will determine the language of the query and assign it to the `in_lang` JSON value.
42 | 
43 | TranslateQuery will take the original query, in any language, and assign it to the `in_query` JSON value.
44 | 
45 | TranslateQuery will overwrite the original `query` JSON value with the query translated into the `base_lang`.
46 | 
47 | You can then query your `base_lang` corpus using the `query` value as normal with a standard Haystack Retriever node, which will place your results in `results`.
48 | 
49 | TranslateAnswer translates the `base_lang` result stored in `results` back into the language stored in `in_lang`, and subsequently stores it in the `out_answer` JSON value.
--------------------------------------------------------------------------------
/integrations/veracity.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Veracity Node
4 | description: A node to check the validity of an answer, based on the given context.
5 | authors:
6 |   - name: Xceron
7 |     socials:
8 |       github: Xceron
9 | repo: https://github.com/Xceron/haystack_veracity_node
10 | type: Custom Node
11 | report_issue: https://github.com/Xceron/haystack_veracity_node/issues
12 | ---
13 | 
14 | This Node checks whether the given input is correctly answered by the given context (as judged by the given LLM). One example usage is together with [Haystack Memory](https://github.com/rolandtannous/haystack-memory): after the memory is retrieved, the given model checks whether the output satisfies the question.
15 | 
16 | **Important**:
17 | The Node expects the context to be passed into `results`. If the previous node in the pipeline puts the text somewhere else, use a [Shaper](https://docs.haystack.deepset.ai/docs/shaper) to `rename` the argument to `results`.
18 | 
19 | ## Installation
20 | 
21 | Clone the repo, change into that directory, then run `pip install '.'`. This will install the package into your Python libraries.
22 | 
23 | ## Usage
24 | ### Example Usage with Haystack Memory
25 | ```py
26 | from haystack_veracity_node.node import VeracityNode
27 | from haystack_memory.memory import RedisMemoryRecallNode
28 | from haystack_memory.prompt_templates import memory_template
29 | from haystack import Pipeline
30 | from haystack.agents import Agent, Tool
31 | from haystack.nodes import PromptNode
32 | 
33 | # Create VeracityNode
34 | veracity_node = VeracityNode(model_name_or_path="gpt-3.5-turbo", api_key="YOUR_KEY")
35 | 
36 | # Create Memory
37 | redis_memory_node = RedisMemoryRecallNode(memory_id="agent_memory",
38 |                                           host="localhost",
39 |                                           port=6379,
40 |                                           db=0)
41 | 
42 | # Add them together in a pipeline
43 | memory_pipeline = Pipeline()
44 | memory_pipeline.add_node(component=redis_memory_node, name="MemoryTool", inputs=["Query"])
45 | memory_pipeline.add_node(component=veracity_node, name="VeracityNode", inputs=["MemoryTool"])
46 | 
47 | # Create an agent and add the pipeline as a tool
48 | prompt_node = PromptNode(model_name_or_path="text-davinci-003", api_key="YOUR_KEY", max_length=512,
49 |                          stop_words=["Observation:"])
50 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
51 | memory_tool = Tool(name="Memory",
52 |                    pipeline_or_node=memory_pipeline,
53 |                    description="Your memory. Always access this tool first to remember what you have learned.")
54 | 
55 | memory_agent.add_tool(memory_tool)
56 | ```
--------------------------------------------------------------------------------
/integrations/chainlit.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Chainlit Agent UI
4 | description: Visualise and debug your agent's intermediary steps!
5 | authors:
6 |   - name: Chainlit Team
7 |     socials:
8 |       github: Chainlit
9 |       twitter: chainlit_io
10 | pypi: https://pypi.org/project/chainlit/
11 | repo: https://github.com/Chainlit/chainlit
12 | type: Custom Node
13 | report_issue: https://github.com/Chainlit/chainlit/issues
14 | logo: /logos/chainlit.png
15 | ---
16 | 
17 | Chainlit is an open-source Python package that makes it incredibly fast to build, test and share LLM apps. Integrate the Chainlit API in your existing code to spawn a ChatGPT-like interface in minutes. With a simple line of code, you can leverage Chainlit to interact with your agent, visualise intermediary steps, debug them in an advanced prompt playground and share your app to collect human feedback.
More info in the [documentation](https://docs.chainlit.io/).
18 | 
19 | ![Chainlit screenshot](https://raw.githubusercontent.com/deepset-ai/haystack-integrations/main/images/chainlit-haystack.png)
20 | 
21 | ## Installation
22 | 
23 | ```bash
24 | pip install chainlit
25 | ```
26 | 
27 | ## Usage
28 | 
29 | Create a new Python file named `app.py` with the code below. This code adds the Chainlit callback handler to the Haystack callback manager. The callback handler is responsible for listening to the Agent’s intermediate steps and sending them to the UI.
30 | 
31 | ```python
32 | from haystack.agents.conversational import ConversationalAgent
33 | import chainlit as cl
34 | 
35 | ## Your Agent code: define conversational_agent_prompt_node, memory, agent_prompt and search_tool here
36 | 
37 | agent = ConversationalAgent(
38 |     prompt_node=conversational_agent_prompt_node,
39 |     memory=memory,
40 |     prompt_template=agent_prompt,
41 |     tools=[search_tool],
42 | )
43 | 
44 | cl.HaystackAgentCallbackHandler(agent)
45 | 
46 | @cl.on_message
47 | async def main(message: str):
48 |     response = await cl.make_async(agent.run)(message)
49 |     await cl.Message(author="Agent", content=response["answers"][0].answer).send()
50 | ```
51 | 
52 | To kick off your LLM app, open a terminal, navigate to the directory containing `app.py`, and run the following command:
53 | 
54 | ```bash
55 | chainlit run app.py
56 | ```
57 | 
58 | ## Example
59 | Check out this full example from [the cookbook](https://github.com/Chainlit/cookbook/tree/main/haystack).
60 | 
61 | ## About Chainlit
62 | Chainlit is an open-source Python package that makes it incredibly fast to build, test and share LLM apps. Integrate the Chainlit API in your existing code to spawn a ChatGPT-like interface in minutes!
63 | 
64 | ### Key features
65 | - Build LLM Apps fast: Integrate seamlessly with an existing code base or start from scratch in minutes
66 | - Visualize multi-step reasoning: Understand the intermediary steps that produced an output at a glance
67 | - Iterate on prompts: Deep dive into prompts in the Prompt Playground to understand where things went wrong and iterate
68 | - Collaborate with teammates: Invite your teammates, create annotated datasets and run experiments together
69 | - Share your app: Publish your LLM app and share it with the world (coming soon)
--------------------------------------------------------------------------------
/integrations/document-threshold.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Document Threshold
4 | description: This component filters documents based on a minimum Confidence Score percentage, ensuring only the documents above the threshold get passed down the pipeline.
5 | authors:
6 |   - name: recrudesce
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 | pypi: https://pypi.org/project/haystack-threshold-node/
11 | repo: https://github.com/recrudesce/haystack_threshold_node
12 | type: Custom Node
13 | report_issue: https://github.com/recrudesce/haystack_threshold_node/issues
14 | ---
15 | # Haystack Threshold Node
16 | This component filters documents based on a threshold percentage, ensuring only the documents above the threshold get passed down the pipeline.
17 | This allows you to query your document store for a larger top_k, but then filter the results down to those which are above a set confidence score.
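18 | 
19 | Conceptually, the node keeps a retrieved document only if its relevance score meets the configured percentage. That is roughly the following filter, shown here as an illustrative sketch rather than the package's actual implementation:
20 | 
21 | ```python
22 | from typing import List
23 | 
24 | from haystack.schema import Document
25 | 
26 | def keep_above_threshold(documents: List[Document], threshold: int) -> List[Document]:
27 |     """Keep documents whose Retriever score is at least `threshold` percent (0-100)."""
28 |     return [doc for doc in documents if doc.score is not None and doc.score * 100 >= threshold]
29 | ```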
30 | 
31 | ## Installation
32 | 
33 | `pip install haystack-threshold-node`
34 | 
35 | ## Usage
36 | 
37 | Include it in your pipeline, for example as follows:
38 | 
39 | ```python
40 | import logging
41 | 
42 | from datasets import load_dataset
43 | from haystack.document_stores import InMemoryDocumentStore
44 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
45 | from haystack.pipelines import Pipeline
46 | from haystack_threshold_node import DocumentThreshold
47 | 
48 | 
49 | logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
50 | logging.getLogger("haystack").setLevel(logging.INFO)
51 | 
52 | document_store = InMemoryDocumentStore(use_bm25=True)
53 | 
54 | dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
55 | document_store.write_documents(dataset)
56 | 
57 | retriever = BM25Retriever(document_store=document_store, top_k=10)
58 | 
59 | lfqa_prompt = PromptTemplate(
60 |     name="lfqa",
61 |     prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
62 |     output_parser=AnswerParser(),
63 | )
64 | 
65 | prompt_node = PromptNode(
66 |     model_name_or_path="text-davinci-003",
67 |     default_prompt_template=lfqa_prompt,
68 |     max_length=500,
69 |     api_key="sk-OPENAIKEY",
70 | )
71 | 
72 | # The value you pass for threshold is the lowest % score you will accept. Whole numbers only.
73 | # In this example, the threshold is set to 80%.
74 | threshold = DocumentThreshold(threshold=80)
75 | 
76 | pipe = Pipeline()
77 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
78 | pipe.add_node(component=threshold, name="Threshold", inputs=["Retriever"])
79 | pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Threshold"])
80 | 
81 | query = "What does the Rhodes Statue look like?"
82 | 
83 | output = pipe.run(query=query)
84 | 
85 | print(output['answers'][0].answer)
86 | ```
--------------------------------------------------------------------------------
/integrations/entailment-checker.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Entailment Checker
4 | description: Haystack node for checking the entailment between a statement and a list of Documents
5 | authors:
6 |   - name: Stefano Fiorucci
7 |     socials:
8 |       github: anakin87
9 | pypi: https://pypi.org/project/haystack-entailment-checker/
10 | repo: https://github.com/anakin87/haystack-entailment-checker
11 | type: Custom Node
12 | report_issue: https://github.com/anakin87/haystack-entailment-checker/issues
13 | ---
14 | **Live Demo**: Fact Checking 🎸 Rocks!   [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/fact-checking-rocks)
15 | 
16 | ## How it works
17 | ![Entailment Checker Node](https://github.com/anakin87/haystack-entailment-checker/raw/main/images/entailment_checker_node.png)
18 | - The node takes a list of Documents (commonly returned by a [Retriever](https://docs.haystack.deepset.ai/docs/retriever)) and a statement as input.
19 | - Using a Natural Language Inference model, the text entailment between each text passage/Document (premise) and the statement (hypothesis) is computed.
For every text passage, we get 3 scores (summing to 1): entailment, contradiction and neutral.
20 | - The text entailment scores are aggregated using a weighted average. The weight is the relevance score of each passage returned by the Retriever, if available. It expresses the similarity between the text passage and the statement. **Now we have a summary score, so it is possible to tell if the passages confirm, are neutral toward, or disprove the user statement.**
21 | - *Empirical consideration: if, in the first N documents (N < K), the aggregate entailment or contradiction score already exceeds the **threshold**, it is better not to consider the less relevant other (K-N) documents.*
22 | 
23 | ## Installation
24 | ```bash
25 | pip install haystack-entailment-checker
26 | ```
27 | 
28 | ## Usage
29 | ### Basic example
30 | ```python
31 | from haystack import Document
32 | from haystack_entailment_checker import EntailmentChecker
33 | 
34 | ec = EntailmentChecker(
35 |     model_name_or_path = "microsoft/deberta-v2-xlarge-mnli",
36 |     use_gpu = False,
37 |     entailment_contradiction_threshold = 0.5)
38 | 
39 | doc = Document("My cat is lazy")
40 | 
41 | print(ec.run("My cat is very active", [doc]))
42 | # ({'documents': [...],
43 | # 'aggregate_entailment_info': {'contradiction': 1.0, 'neutral': 0.0, 'entailment': 0.0}}, ...)
44 | ```
45 | 
46 | ### Fact-checking pipeline (Retriever + EntailmentChecker)
47 | ```python
48 | from haystack import Document, Pipeline
49 | from haystack.nodes import BM25Retriever
50 | from haystack.document_stores import InMemoryDocumentStore
51 | from haystack_entailment_checker import EntailmentChecker
52 | 
53 | # INDEXING
54 | # the knowledge base can consist of many documents
55 | docs = [...]
56 | ds = InMemoryDocumentStore(use_bm25=True)
57 | ds.write_documents(docs)
58 | 
59 | # QUERYING
60 | retriever = BM25Retriever(document_store=ds)
61 | ec = EntailmentChecker()
62 | 
63 | pipe = Pipeline()
64 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
65 | pipe.add_node(component=ec, name="EntailmentChecker", inputs=["Retriever"])
66 | 
67 | pipe.run(query="YOUR STATEMENT TO CHECK")
68 | ```
--------------------------------------------------------------------------------
/integrations/opensearch-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: OpenSearch Document Store
4 | description: Use an OpenSearch database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/opensearch.png
16 | ---
17 | 
18 | You can use [OpenSearch](https://opensearch.org/docs/latest/#docker-quickstart) in your Haystack pipelines with the [OpenSearchDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization).
19 | 
20 | For a detailed overview of all the available methods and settings for the `OpenSearchDocumentStore`, visit the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#opensearchdocumentstore).
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[opensearch]
26 | ```
27 | 
28 | ## Usage
29 | 
30 | Once the package is installed and your OpenSearch instance is running, you can start using OpenSearch with Haystack by initializing the document store:
31 | 
32 | ```python
33 | from haystack.document_stores import OpenSearchDocumentStore
34 | 
35 | document_store = OpenSearchDocumentStore()
36 | ```
37 | 
38 | ### Writing Documents to OpenSearchDocumentStore
39 | 
40 | To write documents to your `OpenSearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
41 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
42 | 
43 | #### Indexing Pipeline
44 | 
45 | ```python
46 | from haystack import Pipeline
47 | from haystack.document_stores import OpenSearchDocumentStore
48 | from haystack.nodes import PDFToTextConverter, PreProcessor
49 | 
50 | document_store = OpenSearchDocumentStore()
51 | converter = PDFToTextConverter()
52 | preprocessor = PreProcessor()
53 | 
54 | indexing_pipeline = Pipeline()
55 | indexing_pipeline.add_node(component=converter, name="PDFConverter", inputs=["File"])
56 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["PDFConverter"])
57 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
58 | 
59 | indexing_pipeline.run(file_paths=["filename.pdf"])
60 | ```
61 | 
62 | ### Using OpenSearch in a Query Pipeline
63 | 
64 | Once you have documents in your `OpenSearchDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of the ["deepset/question-generation"](https://prompthub.deepset.ai/?prompt=deepset%2Fquestion-generation) prompt, which is designed to generate questions for the retrieved documents. If our `OpenSearchDocumentStore` had documents about food in it, you could generate questions about "Pizzas" in the following way:
65 | 
66 | ```python
67 | from haystack import Pipeline
68 | from haystack.document_stores import OpenSearchDocumentStore
69 | from haystack.nodes import BM25Retriever, PromptNode
70 | 
71 | document_store = OpenSearchDocumentStore()
72 | retriever = BM25Retriever(document_store=document_store)
73 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
74 |                          api_key = "YOUR_OPENAI_KEY",
75 |                          default_prompt_template = "deepset/question-generation")
76 | 
77 | query_pipeline = Pipeline()
78 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
79 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
80 | 
81 | query_pipeline.run(query = "Pizzas")
82 | ```
--------------------------------------------------------------------------------
/integrations/readmedocs-fetcher.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: ReadMeDocs Fetcher
4 | description: Fetch documentation pages from ReadMe docs sites.
5 | authors:
6 |   - name: Tuana Çelik
7 |     socials:
8 |       github: tuanacelik
9 |       twitter: tuanacelik
10 |       linkedin: tuanacelik
11 |   - name: Silvano Cerza
12 |     socials:
13 |       github: silvanocerza
14 |       linkedin: silvanocerza
15 | pypi: https://pypi.org/project/readmedocs-fetcher-haystack/
16 | repo: https://github.com/tuanacelik/readmedocs-fetcher-haystack
17 | type: Custom Node
18 | report_issue: https://github.com/tuanacelik/readmedocs-fetcher-haystack/issues
19 | ---
20 | 
21 | This custom component for Haystack is designed to fetch documentation pages from the [ReadMe](https://readme.com/) documentation you have access to.
It uses a `MarkdownConverter` to convert all of your documentation pages to a list of Haystack `Documents`. You can use this node as a standalone node or within an indexing pipeline.
22 | 
23 | ## Installation
24 | 
25 | ```bash
26 | pip install readmedocs-fetcher-haystack
27 | ```
28 | 
29 | ## Usage
30 | 
31 | 1. To initialize a `ReadmeDocsFetcher`, you have to provide an `api_key` parameter. This is your ReadMe Docs API key.
32 | 2. There are 4 optional parameters to initialize the `ReadmeDocsFetcher`:
33 |   - `slugs`: To fetch a list of specific pages from your documentation. E.g., if you want to fetch 'https://docs.haystack.deepset.ai/docs/installation', the slug would be `installation`. If not set, all of the available pages will be fetched.
34 |   - `base_url`: Optionally provide this to add the full URL of a documentation page to the `meta` of the created document. For example `base_url='https://docs.haystack.deepset.ai'`
35 |   - `version`: If not set, the latest stable version of your docs will be fetched.
36 |   - `markdown_converter`: When documents are fetched from ReadMe, temporary `.md` files are created and a [`MarkdownConverter`](https://docs.haystack.deepset.ai/reference/file-converters-api#markdownconverter) is used to create a list of Haystack `Documents`. If not provided at initialization, a `MarkdownConverter` with the default parameters is used.
37 | 
38 | ### Standalone
39 | ```python
40 | import os
41 | from dotenv import load_dotenv
42 | from haystack.nodes import MarkdownConverter
43 | from readmedocs_fetcher_haystack import ReadmeDocsFetcher
44 | 
45 | load_dotenv()
46 | README_API_KEY = os.getenv('README_API_KEY')
47 | 
48 | converter = MarkdownConverter(remove_code_snippets=False)
49 | readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
50 | readme_fetcher.fetch_docs()
51 | ```
52 | 
53 | To fetch a single doc from a specific version:
54 | ```python
55 | readme_fetcher.fetch_docs(slugs=["nodes_overview"], version="v1.18")
56 | ```
57 | ### In a Pipeline
58 | 
59 | ```python
60 | import os
61 | from dotenv import load_dotenv
62 | from haystack import Pipeline
63 | from haystack.nodes import MarkdownConverter, PreProcessor
64 | from haystack.document_stores import InMemoryDocumentStore
65 | from readmedocs_fetcher_haystack import ReadmeDocsFetcher
66 | 
67 | load_dotenv()
68 | README_API_KEY = os.getenv('README_API_KEY')
69 | 
70 | converter = MarkdownConverter(remove_code_snippets=False)
71 | readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
72 | 
73 | preprocessor = PreProcessor()
74 | doc_store = InMemoryDocumentStore()
75 | 
76 | pipe = Pipeline()
77 | pipe.add_node(component=readme_fetcher, name="ReadmeFetcher", inputs=["File"])
78 | pipe.add_node(component=preprocessor, name="Preprocessor", inputs=["ReadmeFetcher"])
79 | pipe.add_node(component=doc_store, name="DocumentStore", inputs=["Preprocessor"])
80 | pipe.run()
81 | ```
82 | 
83 | To fetch a single documentation page:
84 | ```python
85 | pipe.run(params={"ReadmeFetcher":{"slugs": ["nodes_overview"]}})
86 | ```
--------------------------------------------------------------------------------
/integrations/faiss-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: FAISS Document Store
4 | description: Use a FAISS vector database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/meta.png
16 | ---
17 | 
18 | [Faiss](https://github.com/facebookresearch/faiss#readme) is a project by Meta for efficient vector search. You can use it in your Haystack pipelines with the [FAISSDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization).
19 | 
20 | For a detailed explanation of the different initialization options of the `FAISSDocumentStore`, please visit the [Haystack Documentation](https://docs.haystack.deepset.ai/docs/document_store#initialization) and [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#faissdocumentstore). Below are some examples of how you might use it within a Haystack Pipeline.
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[faiss]
26 | ```
27 | 
28 | Or, to install `FAISSDocumentStore` with GPU support, run:
29 | ```bash
30 | pip install farm-haystack[faiss-gpu]
31 | ```
32 | 
33 | ## Usage
34 | 
35 | Once installed, you can start using FAISS with Haystack by initializing it:
36 | 
37 | ```python
38 | from haystack.document_stores import FAISSDocumentStore
39 | 
40 | document_store = FAISSDocumentStore()
41 | ```
42 | 
43 | ### Writing Documents to FAISSDocumentStore
44 | 
45 | To write documents to your `FAISSDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
46 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
47 | 
48 | #### Indexing Pipeline
49 | 
50 | ```python
51 | from haystack import Pipeline
52 | from haystack.document_stores import FAISSDocumentStore
53 | from haystack.nodes import PDFToTextConverter, PreProcessor
54 | 
55 | document_store = FAISSDocumentStore()
56 | converter = PDFToTextConverter()
57 | preprocessor = PreProcessor()
58 | 
59 | indexing_pipeline = Pipeline()
60 | indexing_pipeline.add_node(component=converter, name="PDFConverter", inputs=["File"])
61 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["PDFConverter"])
62 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
63 | 
64 | indexing_pipeline.run(file_paths=["filename.pdf"])
65 | ```
66 | 
67 | ### Using Faiss in a Query Pipeline
68 | 
69 | Once you have documents in your `FAISSDocumentStore`, it's ready to be used in any Haystack pipeline, such as a Retrieval Augmented Generation (RAG) pipeline. Learn more about [Retrievers](https://docs.haystack.deepset.ai/docs/retriever) to make use of vector search within your LLM pipelines.
70 | 
71 | ```python
72 | from haystack import Pipeline
73 | from haystack.document_stores import FAISSDocumentStore
74 | from haystack.nodes import EmbeddingRetriever, PromptNode
75 | 
76 | document_store = FAISSDocumentStore()
77 | retriever = EmbeddingRetriever(document_store = document_store,
78 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
79 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
80 |                          api_key = "YOUR_OPENAI_KEY",
81 |                          default_prompt_template = "deepset/question-answering-with-references")
82 | 
83 | query_pipeline = Pipeline()
84 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
85 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
86 | 
87 | query_pipeline.run(query = "What is Haystack?")
88 | ```
--------------------------------------------------------------------------------
/integrations/lemmatize.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Document Lemmatizer
4 | description: A lemmatizing node for documents which can potentially reduce token use by up to 30%.
5 | authors:
6 |   - name: recrudesce
7 |     socials:
8 |       github: recrudesce
9 |       twitter: recrudesce
10 |   - name: Xceron
11 |     socials:
12 |       github: Xceron
13 | pypi: https://pypi.org/project/haystack-lemmatize-node/
14 | repo: https://github.com/recrudesce/haystack_lemmatize_node
15 | type: Custom Node
16 | report_issue: https://github.com/recrudesce/haystack_lemmatize_node/issues
17 | ---
18 | 
19 | ## Lemmatization
20 | 
21 | Lemmatization is a text pre-processing technique used in natural language processing (NLP) models to break a word down to its root meaning to identify similarities. For example, a lemmatization algorithm would reduce the word "better" to its root word, or lemma, "good".
22 | 
23 | This node can be placed within a pipeline to lemmatize documents returned by a Retriever, prior to adding them as context to a prompt (for a PromptNode or similar).
24 | The process of lemmatizing the document content can potentially reduce the amount of tokens used by up to 30%, without drastically affecting the meaning of the document.
25 | 
26 | ![image](https://user-images.githubusercontent.com/6450799/230403871-d0299748-977c-4c9e-9d70-914d8ff2bf3b.png)
27 | 
28 | ### Before Lemmatization:
29 | ![image](https://user-images.githubusercontent.com/6450799/230404198-a3ed6382-03b8-4ec6-b88d-4232560752f8.png)
30 | 
31 | ### After Lemmatization:
32 | ![image](https://user-images.githubusercontent.com/6450799/230404246-a8488a57-73bd-4420-9f1b-8a080b84121b.png)
33 | 
34 | ## Installation
35 | 
36 | Run `pip install haystack-lemmatize-node` to install the latest available release.
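37 | 
38 | The languages the node's `base_lang` argument accepts are those supported by [simplemma](https://pypi.org/project/simplemma/), the library referenced in the usage example below. As a quick standalone illustration of the idea (a sketch only; the exact `simplemma` call signature has changed between releases, so check its docs):
39 | 
40 | ```python
41 | import simplemma  # pip install simplemma
42 | 
43 | text = "The cats were running better"
44 | # Reduce each word form to its lemma, e.g. "better" -> "good"
45 | print([simplemma.lemmatize(token, lang="en") for token in text.split()])
46 | ```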
47 | 
48 | ## Usage
49 | 
50 | Include it in your pipeline, for example as follows:
51 | 
52 | ```python
53 | import logging
54 | 
55 | from datasets import load_dataset
56 | from haystack.document_stores import InMemoryDocumentStore
57 | from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
58 | from haystack.pipelines import Pipeline
59 | from haystack_lemmatize_node import LemmatizeDocuments
60 | 
61 | 
62 | logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
63 | logging.getLogger("haystack").setLevel(logging.INFO)
64 | 
65 | document_store = InMemoryDocumentStore(use_bm25=True)
66 | 
67 | dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
68 | document_store.write_documents(dataset)
69 | 
70 | retriever = BM25Retriever(document_store=document_store, top_k=2)
71 | 
72 | lfqa_prompt = PromptTemplate(
73 |     name="lfqa",
74 |     prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
75 |     output_parser=AnswerParser(),
76 | )
77 | 
78 | prompt_node = PromptNode(
79 |     model_name_or_path="text-davinci-003",
80 |     default_prompt_template=lfqa_prompt,
81 |     max_length=500,
82 |     api_key="sk-OPENAIKEY",
83 | )
84 | 
85 | lemmatize = LemmatizeDocuments()  # you can pass the `base_lang=XX` argument here too, where XX is a language as listed here: https://pypi.org/project/simplemma/
86 | 
87 | pipe = Pipeline()
88 | pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
89 | pipe.add_node(component=lemmatize, name="Lemmatize", inputs=["Retriever"])
90 | pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Lemmatize"])
91 | 
92 | query = "What does the Rhodes Statue look like?"
93 | 
94 | output = pipe.run(query=query)
95 | 
96 | print(output['answers'][0].answer)
97 | ```
98 | 
99 | ## Caveats
100 | Sometimes lemmatization can be slow for large document content, but in the world of AI where we can potentially wait 30+ seconds for an LLM to respond (hello GPT-4), what's a couple more seconds?
101 | 
--------------------------------------------------------------------------------
/integrations/qdrant-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Qdrant Document Store
4 | description: Use the Qdrant vector database with Haystack
5 | authors:
6 |   - name: Qdrant
7 |     socials:
8 |       github: qdrant
9 |       twitter: qdrant_engine
10 | pypi: https://pypi.org/project/qdrant-haystack/
11 | repo: https://github.com/qdrant/qdrant-haystack
12 | type: Document Store
13 | report_issue: https://github.com/qdrant/qdrant-haystack/issues
14 | logo: /logos/qdrant.png
15 | ---
16 | 
17 | An integration of the [Qdrant](https://qdrant.tech) vector database with [Haystack](https://haystack.deepset.ai/)
18 | by [deepset](https://www.deepset.ai).
19 | 
20 | The library allows using Qdrant as a document store, and provides a drop-in replacement
21 | for any other vector embeddings store. Thus, you should expect any kind of application to work
22 | smoothly just by changing the provider to `QdrantDocumentStore`.
23 | 
24 | ## Installation
25 | 
26 | `qdrant-haystack` can be installed like any other Python library, using pip or Poetry:
27 | 
28 | ```bash
29 | pip install qdrant-haystack
30 | ```
31 | 
32 | ```bash
33 | poetry add qdrant-haystack
34 | ```
35 | 
36 | ## Usage
37 | 
38 | Once installed, you can start using `QdrantDocumentStore` like any other store that supports
39 | embeddings.
40 | 
41 | ```python
42 | from qdrant_haystack import QdrantDocumentStore
43 | 
44 | document_store = QdrantDocumentStore(
45 |     url="localhost",
46 |     index="Document",
47 |     embedding_dim=512,
48 |     recreate_index=True,
49 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
50 | )
51 | ```
52 | 
53 | The list of parameters accepted by `QdrantDocumentStore` complements those used in the
54 | official [Python Qdrant client](https://github.com/qdrant/qdrant_client).
55 | 
56 | ### Using local in-memory / disk-persisted mode
57 | 
58 | The Qdrant Python client, from version 1.1.1, supports a local in-memory/disk-persisted mode. That's
59 | a good choice for test scenarios and quick experiments in which you do not plan to store
60 | lots of vectors. In such cases, spinning up a Docker container might not even be required.
61 | 
62 | The local mode is also implemented in the `qdrant-haystack` integration.
63 | 
64 | #### In-memory storage
65 | 
66 | If you want transient storage, for example for automated tests launched
67 | during your CI/CD pipeline, using Qdrant local mode with in-memory storage might be the preferred
68 | option. It can be enabled by simply passing `:memory:` as the first parameter when creating an
69 | instance of `QdrantDocumentStore`.
70 | 
71 | ```python
72 | from qdrant_haystack import QdrantDocumentStore
73 | 
74 | document_store = QdrantDocumentStore(
75 |     ":memory:",
76 |     index="Document",
77 |     embedding_dim=512,
78 |     recreate_index=True,
79 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
80 | )
81 | ```
82 | 
83 | #### On-disk storage
84 | 
85 | However, if you prefer to keep the vectors between different runs of your application, it
86 | might be better to use on-disk storage and pass the path that should be used to persist
87 | the data.
88 | 
89 | ```python
90 | from qdrant_haystack import QdrantDocumentStore
91 | 
92 | document_store = QdrantDocumentStore(
93 |     path="/home/qdrant/storage_local",
94 |     index="Document",
95 |     embedding_dim=512,
96 |     recreate_index=True,
97 |     hnsw_config={"m": 16, "ef_construct": 64}  # Optional
98 | )
99 | ```
100 | 
101 | ### Connecting to a Qdrant Cloud cluster
102 | 
103 | If you prefer not to manage your own Qdrant instance, [Qdrant Cloud](https://cloud.qdrant.io/)
104 | might be a better option.
105 | 
106 | ```python
107 | from qdrant_haystack import QdrantDocumentStore
108 | 
109 | document_store = QdrantDocumentStore(
110 |     url="https://YOUR-CLUSTER-URL.aws.cloud.qdrant.io",
111 |     index="Document",
112 |     api_key="<< YOUR QDRANT CLOUD API KEY >>",
113 |     embedding_dim=512,
114 |     recreate_index=True,
115 | )
116 | ```
117 | 
-------------------------------------------------------------------------------- /integrations/elasticsearch-document-store.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: integration 3 | name: Elasticsearch Document Store 4 | description: Use an Elasticsearch database with Haystack 5 | authors: 6 | - name: deepset 7 | socials: 8 | github: deepset-ai 9 | twitter: deepset_ai 10 | linkedin: deepset-ai 11 | pypi: https://pypi.org/project/farm-haystack 12 | repo: https://github.com/deepset-ai/haystack 13 | type: Document Store 14 | report_issue: https://github.com/deepset-ai/haystack/issues 15 | logo: /logos/elastic.png 16 | --- 17 | 18 | The `ElasticsearchDocumentStore` is maintained within the core Haystack project. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines. 19 | 20 | For a details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1) 21 | 22 | ## Installation 23 | 24 | To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides. 25 | 26 | ```bash 27 | pip install farm-haystack[elasticsearch] 28 | ``` 29 | 30 | To install Elasticsearch 7, you can run `pip install farm-haystac[elasticsearch7]`. 31 | 32 | ## Usage 33 | 34 | Once installed, you can start using your Elasticsearch database with Haystack by initializing it: 35 | 36 | ```python 37 | from haystack.document_stores import ElasticsearchDocumentStore 38 | 39 | document_store = ElasticsearchDocumentStore(host = "localhost", 40 | port = 9200, 41 | embedding_dim = 768) 42 | ``` 43 | 44 | ### Writing Documents to ElasticsearchDocumentStore 45 | 46 | To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function. 47 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. 48 | 49 | #### Indexing Pipeline 50 | 51 | ```python 52 | from haystack import Pipeline 53 | from haystack.document_stores import ElasticsearchDocumentStore 54 | from haystack.nodes import TextConverter, PreProcessor 55 | 56 | document_store = ElasticsearchDocumentStore(host = "localhost", port = 9200) 57 | converter = TextConverter() 58 | preprocessor = PreProcessor() 59 | 60 | indexing_pipeline = Pipeline() 61 | indexing_pipeline.add_node(component=converter, name="TextConverter", inputs=["File"]) 62 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"]) 63 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"]) 64 | 65 | indexing_pipeline.run(file_paths=["filename.txt"]) 66 | ``` 67 | 68 | ### Using Elasticsearch in a Query Pipeline 69 | 70 | Once you have documents in your `ElasitsearchDocumentStore`, it's ready to be used in any Haystack pipeline. Such as a Retrieval Augmented Generation (RAG) pipeline. 
Learn more about [Retrievers](https://docs.haystack.deepset.ai/docs/retriever) to make use of vector search within your LLM pipelines. 71 | 72 | ```python 73 | from haystack import Pipeline 74 | from haystack.document_stores import ElasticsearchDocumentStore 75 | from haystack.nodes import EmbeddingRetriever, PromptNode 76 | 77 | document_store = ElasticsearchDocumentStore() 78 | retriever = EmbeddingRetriever(document_store = document_store, 79 | embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1") 80 | prompt_node = PromptNode(model_name_or_path = "google/flan-t5-xl", default_prompt_template = "deepset/question-answering") 81 | 82 | query_pipeline = Pipeline() 83 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) 84 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"]) 85 | 86 | query_pipeline.run(query = "Where is Istanbul?") 87 | ``` -------------------------------------------------------------------------------- /integrations/pinecone-document-store.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: integration 3 | name: Pinecone Document Store 4 | description: Use a Pinecone database with Haystack 5 | authors: 6 | - name: deepset 7 | socials: 8 | github: deepset-ai 9 | twitter: deepset_ai 10 | linkedin: deepset-ai 11 | pypi: https://pypi.org/project/farm-haystack 12 | repo: https://github.com/deepset-ai/haystack 13 | type: Document Store 14 | report_issue: https://github.com/deepset-ai/haystack/issues 15 | logo: /logos/pinecone.png 16 | --- 17 | 18 | [Pinecone](https://www.pinecone.io/) is a fast and scalable vector database which you can use in Haystack pipelines with the [PineconeDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization) 19 | 20 | For a detailed overview of all the available methods and settings for the `PineconeDocumentStore`, visit the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#pineconedocumentstore) 21 | 22 | ## Installation 23 | 24 | ```bash 25 | pip install farm-haystack[pinecone] 26 | ``` 27 | 28 | ## Usage 29 | 30 | To use Pinecone as your data storage for your Haystack LLM pipelines, you must have an account with Pinecone and an API Key. Once you have those, you can initialize a `PineconeDocumentStore` for Haystack: 31 | 32 | ```python 33 | from haystack.document_stores import PineconeDocumentStore 34 | 35 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY', 36 | similarity="cosine", 37 | embedding_dim=768) 38 | ``` 39 | 40 | ### Writing Documents to PineconeDocumentStore 41 | 42 | To write documents to your `PineconeDocumentStore`, create an indexing pipeline, or use the `write_documents()` function. 43 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database. 
Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.
44 | 
45 | #### Indexing Pipeline
46 | 
47 | ```python
48 | from haystack import Pipeline
49 | from haystack.document_stores import PineconeDocumentStore
50 | from haystack.nodes import MarkdownConverter, PreProcessor
51 | 
52 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY',
53 |                                        similarity="cosine",
54 |                                        embedding_dim=768)
55 | converter = MarkdownConverter()
56 | preprocessor = PreProcessor()
57 | 
58 | indexing_pipeline = Pipeline()
59 | indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
60 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
61 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
62 | 
63 | indexing_pipeline.run(file_paths=["filename.md"])
64 | ```
Note that this pipeline writes the documents without embeddings; to populate the vectors, run `update_embeddings()` with a retriever, as sketched above.
65 | 
66 | ### Using Pinecone in a Query Pipeline
67 | 
68 | Once you have documents in your `PineconeDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of a custom prompt designed to answer questions based on the retrieved documents.
69 | 
70 | ```python
71 | from haystack import Pipeline
72 | from haystack.document_stores import PineconeDocumentStore
73 | from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate
74 | 
75 | document_store = PineconeDocumentStore(api_key='YOUR_API_KEY',
76 |                                        similarity="cosine",
77 |                                        embedding_dim=768)
78 | 
79 | retriever = EmbeddingRetriever(document_store = document_store,
80 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
81 | prompt_template = PromptTemplate(prompt = """Answer the following query based on the provided context. If the context does
82 |                                  not include an answer, reply with 'I don't know'.\n
83 |                                  Query: {query}\n
84 |                                  Documents: {join(documents)}
85 |                                  Answer:
86 |                                  """,
87 |                                  output_parser=AnswerParser())
88 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
89 |                          api_key = "YOUR_OPENAI_KEY",
90 |                          default_prompt_template = prompt_template)
91 | 
92 | query_pipeline = Pipeline()
93 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
94 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
95 | 
96 | query_pipeline.run(query = "What is Pinecone?", params={"Retriever" : {"top_k": 5}})
97 | ```
--------------------------------------------------------------------------------
/integrations/milvus-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Milvus Document Store
4 | description: Use the Milvus vector database with Haystack
5 | authors:
6 |   - name: Zilliz
7 |     socials:
8 |       github: zilliztech
9 |       twitter: zilliz_universe
10 | pypi: https://pypi.org/project/milvus-haystack/
11 | repo: https://github.com/milvus-io/milvus-haystack
12 | type: Document Store
13 | report_issue: https://github.com/milvus-io/milvus-haystack/issues
14 | logo: /logos/milvus.png
15 | ---
16 | 
17 | An integration of the [Milvus](https://milvus.io/) vector database with [Haystack](https://haystack.deepset.ai/).
18 | 
19 | Milvus is a flexible, reliable, and fast cloud-native, open-source vector database. It powers embedding similarity search and AI applications and strives to make vector databases accessible to every organization. Milvus can store, index, and manage a billion+ embedding vectors generated by deep neural networks and other machine learning (ML) models.
This level of scale is vital for handling the volumes of unstructured data organizations generate, helping them analyze and act on that data to provide better service, reduce fraud, avoid downtime, and make decisions faster.
20 | Milvus is a graduated-stage project of the LF AI & Data Foundation.
21 | 
22 | Use Milvus as storage for Haystack pipelines through the `MilvusDocumentStore`.
23 | 
24 | 🚀 See an example application that uses the `MilvusDocumentStore` to do Milvus documentation QA [here](https://github.com/TuanaCelik/milvus-documentation-qa).
25 | 
26 | ## Installation
27 | 
28 | ```bash
29 | pip install milvus-haystack
30 | ```
31 | 
32 | ## Usage
33 | 
34 | Once the package is installed and your Milvus instance is running, you can start using Milvus with Haystack by initializing the document store:
35 | 
36 | ```python
37 | from milvus_haystack import MilvusDocumentStore
38 | 
39 | document_store = MilvusDocumentStore()
40 | ```
41 | 
42 | ### Writing Documents to MilvusDocumentStore
43 | 
44 | To write documents to your `MilvusDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
45 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is the example indexing pipeline used in the Milvus Documentation QA demo, which makes use of the `Crawler` component.
46 | 
47 | #### Indexing Pipeline
48 | 
49 | ```python
50 | from haystack import Pipeline
51 | from haystack.nodes import Crawler, PreProcessor, EmbeddingRetriever
52 | from milvus_haystack import MilvusDocumentStore
53 | 
54 | document_store = MilvusDocumentStore(recreate_index=True, return_embedding=True, similarity="cosine")
55 | crawler = Crawler(urls=["https://milvus.io/docs/"], crawler_depth=1, overwrite_existing_files=True, output_dir="crawled_files")
56 | preprocessor = PreProcessor(
57 |     clean_empty_lines=True,
58 |     clean_whitespace=False,
59 |     clean_header_footer=True,
60 |     split_by="word",
61 |     split_length=500,
62 |     split_respect_sentence_boundary=True,
63 | )
64 | retriever = EmbeddingRetriever(document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
65 | 
66 | indexing_pipeline = Pipeline()
67 | indexing_pipeline.add_node(component=crawler, name="crawler", inputs=['File'])
68 | indexing_pipeline.add_node(component=preprocessor, name="preprocessor", inputs=['crawler'])
69 | indexing_pipeline.add_node(component=retriever, name="retriever", inputs=['preprocessor'])
70 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['retriever'])
71 | 
72 | indexing_pipeline.run()
73 | ```
Because the `EmbeddingRetriever` sits between the `PreProcessor` and the document store, this pipeline computes embeddings during indexing and writes them together with the documents.
74 | 
75 | ### Using Milvus in a Retrieval Augmented Generative Pipeline
76 | 
77 | Once you have documents in your `MilvusDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of the ["deepset/question-answering"](https://prompthub.deepset.ai/?prompt=deepset%2Fquestion-answering) prompt, which is designed to generate answers based on the retrieved documents.
This is the pipeline used in the Milvus Documentation QA demo; it generates replies to queries using GPT-4:
78 | 
79 | ```python
80 | from haystack import Pipeline
81 | from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser
82 | from milvus_haystack import MilvusDocumentStore
83 | 
84 | document_store = MilvusDocumentStore()
85 | 
86 | retriever = EmbeddingRetriever(document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
87 | template = PromptTemplate(prompt="deepset/question-answering", output_parser=AnswerParser())
88 | prompt_node = PromptNode(model_name_or_path="gpt-4", default_prompt_template=template, api_key="YOUR_OPENAI_API_KEY", max_length=200)
89 | 
90 | query_pipeline = Pipeline()
91 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
92 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
93 | ```
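The snippet above only assembles the pipeline. To actually ask a question, run it and read the parsed answer; since the template uses an `AnswerParser`, the result carries `Answer` objects (the query string is illustrative):

```python
# Run the RAG pipeline and print the parsed answer
result = query_pipeline.run(query="How do I install Milvus?", params={"Retriever": {"top_k": 5}})
print(result["answers"][0].answer)
```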
--------------------------------------------------------------------------------
/integrations/weaviate-document-store.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Weaviate Document Store
4 | description: Use a Weaviate database with Haystack
5 | authors:
6 |   - name: deepset
7 |     socials:
8 |       github: deepset-ai
9 |       twitter: deepset_ai
10 |       linkedin: deepset-ai
11 | pypi: https://pypi.org/project/farm-haystack
12 | repo: https://github.com/deepset-ai/haystack
13 | type: Document Store
14 | report_issue: https://github.com/deepset-ai/haystack/issues
15 | logo: /logos/weaviate.png
16 | ---
17 | 
18 | Haystack supports the use of [Weaviate](https://weaviate.io/) as data storage for LLM pipelines, with the `WeaviateDocumentStore`. You can choose to run Weaviate locally yourself, or use a hosted Weaviate database.
19 | 
20 | For details on the available methods and parameters of the `WeaviateDocumentStore`, check out the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#weaviatedocumentstore) and [Documentation](https://docs.haystack.deepset.ai/docs/document_store#initialization).
21 | 
22 | ## Installation
23 | 
24 | ```bash
25 | pip install farm-haystack[weaviate]
26 | ```
27 | 
28 | ## Usage
29 | 
30 | To use Weaviate as the data storage for your Haystack LLM pipelines, you should have it running locally or have a hosted instance. Then, you can initialize a `WeaviateDocumentStore`:
31 | 
32 | ```python
33 | from haystack.document_stores import WeaviateDocumentStore
34 | 
35 | document_store = WeaviateDocumentStore(host="http://localhost",
36 |                                        port=8080,
37 |                                        embedding_dim=768)
38 | ```
39 | 
40 | ### Writing Documents to WeaviateDocumentStore
41 | 
42 | To write documents to your `WeaviateDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
43 | For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Weaviate database. The example pipeline indexes not only the contents of the files, but also their embeddings. This way, we can do vector search on our files.
44 | 
45 | #### Indexing Pipeline
46 | 
47 | ```python
48 | from haystack import Pipeline
49 | from haystack.document_stores import WeaviateDocumentStore
50 | from haystack.nodes import EmbeddingRetriever, MarkdownConverter, PreProcessor
51 | 
52 | document_store = WeaviateDocumentStore(host="http://localhost",
53 |                                        port=8080,
54 |                                        embedding_dim=768)
55 | converter = MarkdownConverter()
56 | preprocessor = PreProcessor()
57 | retriever = EmbeddingRetriever(document_store = document_store,
58 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
59 | 
60 | indexing_pipeline = Pipeline()
61 | indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
62 | indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
63 | indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["PreProcessor"])
64 | indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["Retriever"])
65 | 
66 | indexing_pipeline.run(file_paths=["filename.md"])
67 | ```
68 | 
69 | ### Using Weaviate in a Query Pipeline
70 | 
71 | Once you have documents in your `WeaviateDocumentStore`, it's ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of a custom prompt that, given a query, is designed to generate long answers based on the retrieved documents.
72 | 
73 | ```python
74 | from haystack import Pipeline
75 | from haystack.document_stores import WeaviateDocumentStore
76 | from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate
77 | 
78 | document_store = WeaviateDocumentStore(host="http://localhost",
79 |                                        port=8080,
80 |                                        embedding_dim=768)
81 | 
82 | retriever = EmbeddingRetriever(document_store = document_store,
83 |                                embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
84 | prompt_template = PromptTemplate(prompt = """Given the provided Documents, answer the Query. Make your answer detailed and long.\n
85 |                                  Query: {query}\n
86 |                                  Documents: {join(documents)}
87 |                                  Answer:
88 |                                  """,
89 |                                  output_parser=AnswerParser())
90 | prompt_node = PromptNode(model_name_or_path = "gpt-4",
91 |                          api_key = "YOUR_OPENAI_KEY",
92 |                          default_prompt_template = prompt_template)
93 | 
94 | query_pipeline = Pipeline()
95 | query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
96 | query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
97 | 
98 | query_pipeline.run(query = "What is Weaviate?", params={"Retriever" : {"top_k": 5}})
99 | ```
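You can also narrow down retrieval with metadata filters at query time, using Haystack's standard filter syntax. A minimal sketch, assuming your documents were written with a `meta` field such as `category` (the field name and value are illustrative):

```python
# Only retrieve documents whose meta field "category" is "tutorial"
result = query_pipeline.run(
    query="What is Weaviate?",
    params={"Retriever": {"top_k": 5, "filters": {"category": ["tutorial"]}}},
)
```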
--------------------------------------------------------------------------------
/integrations/basic-agent-memory.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Basic Agent Memory Tool
4 | description: A working memory that stores the Agent's conversation memory
5 | authors:
6 |   - name: Roland Tannous
7 |     socials:
8 |       github: rolandtannous
9 |       twitter: rolandtannous
10 |   - name: Xceron
11 |     socials:
12 |       github: Xceron
13 | pypi: https://pypi.org/project/haystack-memory/
14 | repo: https://github.com/rolandtannous/haystack-memory
15 | type: Agent Tool
16 | report_issue: https://github.com/rolandtannous/haystack-memory/issues
17 | ---
18 | 
19 | This library implements a working memory that stores the Agent's conversation memory
20 | and a sensory memory that stores the agent's short-term sensory memory.
21 | 
22 | The working memory can be utilized in-memory or through Redis, with the Redis implementation featuring a sliding window. The sensory memory is an in-memory implementation that mimics
23 | a human's brief sensory memory, lasting only for the duration of one interaction.
24 | 
25 | ## Installation
26 | 
27 | - Python pip: ```pip install --upgrade haystack-memory```. This method will attempt to install the dependencies (farm-haystack>=1.15.0, redis).
28 | - Python pip (skip dependency installation): use ```pip install --upgrade haystack-memory --no-deps```
29 | - Using git: ```pip install git+https://github.com/rolandtannous/haystack-memory.git@main#egg=haystack-memory```
30 | 
31 | 
32 | ## Usage
33 | 
34 | To use memory in your agent, you need three components:
35 | - `MemoryRecallNode`: This node is added to the agent as a tool. It allows the agent to remember the conversation and make query-memory associations.
36 | - `MemoryUtils`: This class should be used to save the queries and the final agent answers to the conversation memory.
37 | - `chat`: A method of the `MemoryUtils` class used to chat with the agent. It saves the query and the answer to the memory, and returns the full result for further use.
38 | 
39 | ```py
40 | from haystack.agents import Agent, Tool
41 | from haystack.nodes import PromptNode
42 | from haystack_memory.prompt_templates import memory_template
43 | from haystack_memory.memory import MemoryRecallNode
44 | from haystack_memory.utils import MemoryUtils
45 | 
46 | # Initialize the memory and the memory tool so the agent can retrieve the memory
47 | working_memory = []
48 | sensory_memory = []
49 | memory_node = MemoryRecallNode(memory=working_memory)
50 | memory_tool = Tool(name="Memory",
51 |                    pipeline_or_node=memory_node,
52 |                    description="Your memory. Always access this tool first to remember what you have learned.")
53 | 
54 | prompt_node = PromptNode(model_name_or_path="text-davinci-003",
55 |                          api_key="",
56 |                          max_length=1024,
57 |                          stop_words=["Observation:"])
58 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
59 | memory_agent.add_tool(memory_tool)
60 | 
61 | # Initialize the utils to save the query and the answers to the memory
62 | memory_utils = MemoryUtils(working_memory=working_memory, sensory_memory=sensory_memory, agent=memory_agent)
63 | result = memory_utils.chat("")
64 | print(working_memory)
65 | ```
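Because `chat()` saves each query and answer to the working memory, calling it repeatedly builds up conversational context that the agent can recall through the memory tool. A small illustrative loop (the questions are made up):

```py
# Each call stores the query and the agent's final answer in working_memory
for question in ["What is Haystack?", "And who maintains it?"]:
    memory_utils.chat(question)

print(working_memory)  # now holds the accumulated conversation
```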
66 | 
67 | ### Redis
68 | 
69 | The working memory can also be stored in a Redis database, which makes it possible to use separate memories for multiple agents at the same time. Additionally, it supports a sliding window so that only the last k messages are utilized.
70 | 
71 | ```py
72 | from haystack.agents import Agent, Tool
73 | from haystack.nodes import PromptNode
74 | from haystack_memory.memory import RedisMemoryRecallNode
75 | from haystack_memory.prompt_templates import memory_template
76 | from haystack_memory.utils import RedisUtils
77 | 
78 | sensory_memory = []
79 | # Initialize the memory and the memory tool so the agent can retrieve the memory
80 | redis_memory_node = RedisMemoryRecallNode(memory_id="working_memory",
81 |                                           host="localhost",
82 |                                           port=6379,
83 |                                           db=0)
84 | memory_tool = Tool(name="Memory",
85 |                    pipeline_or_node=redis_memory_node,
86 |                    description="Your memory. Always access this tool first to remember what you have learned.")
87 | prompt_node = PromptNode(model_name_or_path="text-davinci-003",
88 |                          api_key="",
89 |                          max_length=1024,
90 |                          stop_words=["Observation:"])
91 | memory_agent = Agent(prompt_node=prompt_node, prompt_template=memory_template)
92 | # Initialize the utils to save the query and the answers to the memory
93 | redis_utils = RedisUtils(agent=memory_agent,
94 |                          sensory_memory=sensory_memory,
95 |                          memory_id="working_memory",
96 |                          host="localhost",
97 |                          port=6379,
98 |                          db=0)
99 | result = redis_utils.chat("")
100 | ```
101 | 
102 | 
103 | ## Examples
104 | 
105 | Examples can be found in the `examples/` folder. They contain usage examples for both in-memory and Redis memory types.
106 | To open the examples in Colab, click on the following links:
107 | - Basic Memory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rolandtannous/HaystackAgentBasicMemory/blob/main/examples/example_basic_memory.ipynb)
108 | - Redis Memory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rolandtannous/HaystackAgentBasicMemory/blob/main/examples/example_redis_memory.ipynb)
109 | 
--------------------------------------------------------------------------------
/integrations/newspaper3k.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: integration
3 | name: Newspaper3k Wrapper Nodes
4 | description: Newspaper3k wrapper nodes that let you scrape articles directly with the scraper node or crawl many pages with the crawler node.
5 | 
6 | authors:
7 |   - name: Haradai
8 |     socials:
9 |       github: haradai
10 | pypi: https://pypi.org/project/newspaper3k-haystack
11 | repo: https://github.com/Haradai/newspaper3k-haystack
12 | type: Custom Node
13 | report_issue: https://github.com/Haradai/newspaper3k-haystack/issues
14 | ---
15 | 
16 | Newspaper3k Haystack is a simple wrapper for the newspaper3k library within the Haystack framework. It lets you scrape articles from given URLs using the scraper node, or crawl many pages using the crawler node.
17 | 
18 | ## Installation:
19 | You can install Newspaper3k Haystack using pip:
20 | ```
21 | pip install newspaper3k-haystack
22 | ```
23 | 
24 | ## Usage:
25 | ### Scraper Node:
26 | ```
27 | from newspaper3k_haystack import newspaper3k_scraper
28 | scraper = newspaper3k_scraper()
29 | ```
30 | 
31 | You can also provide a header for the request and a timeout for the page loading.
32 | ```
33 | scraper = newspaper3k_scraper(
34 |     headers={'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
35 |              'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'},
36 |     request_timeout=10)
37 | ```
38 | 
39 | To run in standalone mode, use `run` to load a single URL or `run_batch` to load multiple URLs passed as an array.
40 | 
41 | **Available parameters:**
42 | ```
43 | :param query: list of strings containing the webpages to scrape.
44 | :param lang: (None by default) language to process the article with; if None, autodetected.
45 |     Available languages are: (more info at https://newspaper.readthedocs.io/en/latest/)
46 |     input code    full name
47 | 
48 |     ar            Arabic
49 |     ru            Russian
50 |     nl            Dutch
51 |     de            German
52 |     en            English
53 |     es            Spanish
54 |     fr            French
55 |     he            Hebrew
56 |     ...
57 | :param summary: (False by default) Whether to summarize the document (through newspaper3k) and save it as document metadata.
58 | :param path: (None by default) Path where to store the downloaded articles' HTML; if None, not downloaded. Ignored if load=True.
59 | :param load: (False by default) If True, query should be a local path to an HTML file to scrape.
60 | ```
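For example, to override automatic language detection, pass one of the language codes listed above when running the node. A small sketch (the URL is made up):
```
scraper.run(query="https://elpais.com/internacional/example-article.html",
            lang="es",
            summary=True)
```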
61 | **In Standalone:**
62 | ```
63 | scraper.run(query="https://www.lonelyplanet.com/articles/getting-around-norway",
64 |             metadata=True,
65 |             summary=True,
66 |             keywords=True,
67 |             path="articles")
68 | ```
69 | **In a Pipeline:**
70 | ```
71 | 
72 | from qdrant_haystack.document_stores import QdrantDocumentStore
73 | from haystack.nodes import EntityExtractor
74 | from haystack.pipelines import Pipeline
75 | from haystack.nodes import PreProcessor
76 | 
77 | document_store = QdrantDocumentStore(
78 |     ":memory:",
79 |     index="Document",
80 |     embedding_dim=768,
81 |     recreate_index=True,
82 | )
83 | 
84 | entity_extractor = EntityExtractor(model_name_or_path="dslim/bert-base-NER", flatten_entities_in_meta_data=True)
85 | 
86 | processor = PreProcessor(
87 |     clean_empty_lines=False,
88 |     clean_whitespace=False,
89 |     clean_header_footer=False,
90 |     split_by="sentence",
91 |     split_length=30,
92 |     split_respect_sentence_boundary=False,
93 |     split_overlap=0
94 | )
95 | 
96 | indexing_pipeline = Pipeline()
97 | indexing_pipeline.add_node(component=scraper, name="scraper", inputs=['File'])
98 | indexing_pipeline.add_node(component=processor, name="processor", inputs=['scraper'])
99 | indexing_pipeline.add_node(entity_extractor, "EntityExtractor", ["processor"])
100 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['EntityExtractor'])
101 | 
102 | # we can also pass the arguments shown earlier
103 | indexing_pipeline.run(query="https://www.roughguides.com/norway/",
104 |                       params={
105 |                           "scraper": {
106 |                               "metadata": True,
107 |                               "summary": True,
108 |                               "keywords": True
109 |                           }
110 |                       })
111 | ```
112 | ### Crawler Node:
113 | ```
114 | from newspaper3k_haystack import newspaper3k_crawler
115 | ```
116 | When initializing the crawler, you can pass the same parameters as for the scraper node.
117 | 
118 | ```
119 | crawler = newspaper3k_crawler(
120 |     headers={'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
121 |              'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'},
122 |     request_timeout=10)
123 | ```
124 | 
125 | **Available parameters:**
126 | ```
127 | :param query: list of initial URLs to start scraping from.
128 | :param n_articles: number of articles to scrape per initial URL.
129 | :param beam: number of articles from each scraped website to prioritize in the crawl queue.
130 |     If 0, newly found links are simply appended to the crawl queue after each scrape (BFS).
131 |     If 1, one link from the page just scraped is prioritized next, which amounts to a depth-first search (DFS).
132 | :param filters: dictionary of lists of strings that the URLs should or should not contain. Keys: positive and negative.
133 |     URLs are kept only if they contain at least one positive filter and none of the negative ones.
134 |     e.g.
135 |     {positive: [".com", ".es"],
136 |      negative: ["facebook", "instagram"]}
137 | :param keep_links: (False by default) Whether to keep the links found on each page as document metadata.
138 | 
139 | :param lang: (None by default) language to process the article with; if None, autodetected.
140 |     Available languages are: (more info at https://newspaper.readthedocs.io/en/latest/)
141 |     input code    full name
142 | 
143 |     ar            Arabic
144 |     ru            Russian
145 |     nl            Dutch
146 |     de            German
147 |     en            English
148 |     es            Spanish
149 |     fr            French
150 |     he            Hebrew
151 |     it            Italian
152 |     ko            Korean
153 |     no            Norwegian
154 |     fa            Persian
155 |     pl            Polish
156 |     pt            Portuguese
157 |     sv            Swedish
158 |     hu            Hungarian
159 |     fi            Finnish
160 |     da            Danish
161 |     zh            Chinese
162 |     id            Indonesian
163 |     vi            Vietnamese
164 |     sw            Swahili
165 |     tr            Turkish
166 |     el            Greek
167 |     uk            Ukrainian
168 |     bg            Bulgarian
169 |     hr            Croatian
170 |     ro            Romanian
171 |     sl            Slovenian
172 |     sr            Serbian
173 |     et            Estonian
174 |     ja            Japanese
175 |     be            Belarusian
176 | 
177 | :param metadata: (False by default) Whether to get article metadata.
178 | :param keywords: (False by default) Whether to save the detected article keywords as document metadata.
179 | :param summary: (False by default) Whether to summarize the document (through newspaper3k) and save it as document metadata.
180 | :param path: (None by default) Path where to store the downloaded articles' HTML; if None, not downloaded.
181 | ```
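To make the filter rule concrete, here is a conceptual sketch of the accept/reject logic described above; this is illustrative only, not the library's actual implementation:
```
filters = {"positive": ["norway"], "negative": ["facebook", "instagram"]}

def passes(url):
    # a URL is kept only if it contains at least one positive string
    # and none of the negative ones
    return (any(p in url for p in filters["positive"])
            and not any(n in url for n in filters["negative"]))

passes("https://www.visitnorway.com/plan-your-trip/")  # True
passes("https://facebook.com/visitnorway")             # False
```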
182 | **In Standalone:**
183 | 
184 | You can also use run_batch and pass a list of URLs in the query argument; it will scrape n_articles for each provided URL.
185 | ```
186 | docs = crawler.run(
187 |     query="https://www.roughguides.com/norway/",
188 |     n_articles=10,
189 |     beam=5,
190 |     filters={
191 |         "positive": ["norway"],
192 |         "negative": ["facebook", "instagram"]
193 |     },
194 |     keep_links=False,
195 |     metadata=True,
196 |     summary=True,
197 |     keywords=True,
198 |     path="articles")
199 | ```
200 | 
201 | **In a Pipeline:**
202 | ```
203 | from qdrant_haystack.document_stores import QdrantDocumentStore
204 | from haystack.nodes import EntityExtractor
205 | from haystack.pipelines import Pipeline
206 | from haystack.nodes import PreProcessor
207 | 
208 | document_store = QdrantDocumentStore(
209 |     ":memory:",
210 |     index="Document",
211 |     embedding_dim=768,
212 |     recreate_index=True,
213 | )
214 | 
215 | entity_extractor = EntityExtractor(model_name_or_path="dslim/bert-base-NER", flatten_entities_in_meta_data=True)
216 | 
217 | processor = PreProcessor(
218 |     clean_empty_lines=False,
219 |     clean_whitespace=False,
220 |     clean_header_footer=False,
221 |     split_by="sentence",
222 |     split_length=30,
223 |     split_respect_sentence_boundary=False,
224 |     split_overlap=0
225 | )
226 | 
227 | indexing_pipeline = Pipeline()
228 | indexing_pipeline.add_node(component=crawler, name="crawler", inputs=['File'])
229 | indexing_pipeline.add_node(component=processor, name="processor", inputs=['crawler'])
230 | indexing_pipeline.add_node(entity_extractor, "EntityExtractor", ["processor"])
231 | indexing_pipeline.add_node(component=document_store, name="document_store", inputs=['EntityExtractor'])
232 | 
233 | # we can also pass the arguments shown earlier
234 | indexing_pipeline.run(query="https://www.roughguides.com/norway/",
235 |                       params={
236 |                           "crawler": {
237 |                               "n_articles": 500,
238 |                               "beam": 5,
239 |                               "filters": {
240 |                                   "positive": ["norway"],
241 |                                   "negative": ["facebook"]
242 |                               },
243 |                               "keep_links": False,
244 |                               "metadata": True,
245 |                               "summary": True,
246 |                               "keywords": True,
247 |                               "path": "articles"
248 |                           }
249 |                       })
250 | ```
--------------------------------------------------------------------------------