├── README.md └── crew-agentic-pav.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Agentic RAG Using CrewAI & LangChain 2 | 3 | We are going to see how agents can be involved in a RAG system to retrieve the most relevant information. 4 | -------------------------------------------------------------------------------- /crew-agentic-pav.ipynb: -------------------------------------------------------------------------------- 1 | {"cells":[{"cell_type":"markdown","id":"5c2f2621-074e-45a3-b892-9fd4ac5dbc2a","metadata":{"language":"python"},"source":"# Agentic RAG Using CrewAI & LangChain\n\nWe are going to see how agents can be involved in a RAG system to retrieve the most relevant information."},{"cell_type":"code","execution_count":7,"id":"2408f415-d51c-4d75-ad71-e809b7072778","metadata":{"execution":{"iopub.execute_input":"2024-08-06T08:52:51.355586Z","iopub.status.busy":"2024-08-06T08:52:51.355253Z","iopub.status.idle":"2024-08-06T08:53:07.107101Z","shell.execute_reply":"2024-08-06T08:53:07.105613Z","shell.execute_reply.started":"2024-08-06T08:52:51.355560Z"},"language":"python","trusted":true},"outputs":[],"source":"!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 sentence-transformers langchain-groq --quiet"},{"cell_type":"code","execution_count":8,"id":"7d74a183-c473-4286-aa91-6ac1c89f8a08","metadata":{"execution":{"iopub.execute_input":"2024-08-06T08:53:11.142412Z","iopub.status.busy":"2024-08-06T08:53:11.141324Z","iopub.status.idle":"2024-08-06T08:53:17.696025Z","shell.execute_reply":"2024-08-06T08:53:17.695307Z","shell.execute_reply.started":"2024-08-06T08:53:11.142373Z"},"language":"python","trusted":true},"outputs":[],"source":"from langchain_openai import ChatOpenAI\nimport os\nfrom crewai_tools import PDFSearchTool\nfrom langchain_community.tools.tavily_search import TavilySearchResults\nfrom crewai_tools import tool\nfrom crewai import Crew\nfrom crewai import Task\nfrom crewai import Agent"},{"cell_type":"markdown","id":"f537530b-c530-43d6-afb7-d6a24d237908","metadata":{"language":"python"},"source":"## Mention the Groq API Key"},{"cell_type":"code","execution_count":9,"id":"b211674e-aed8-4732-beef-7c3922279138","metadata":{"execution":{"iopub.execute_input":"2024-08-06T08:53:26.673091Z","iopub.status.busy":"2024-08-06T08:53:26.672457Z","iopub.status.idle":"2024-08-06T08:53:26.678191Z","shell.execute_reply":"2024-08-06T08:53:26.677631Z","shell.execute_reply.started":"2024-08-06T08:53:26.673058Z"},"language":"python","trusted":true},"outputs":[],"source":"import os\n\n# Set the API key\nos.environ['GROQ_API_KEY'] = 'Add Your Groq API Key'"},{"cell_type":"markdown","id":"61808e55-3a5c-4977-a36f-6487740d20c9","metadata":{"language":"python"},"source":"## Mention the LLM being used"},{"cell_type":"code","execution_count":11,"id":"dcb3e7b8-04c1-4a4b-a78f-9f10cef13222","metadata":{"execution":{"iopub.execute_input":"2024-08-06T08:54:42.224934Z","iopub.status.busy":"2024-08-06T08:54:42.224560Z","iopub.status.idle":"2024-08-06T08:54:42.268360Z","shell.execute_reply":"2024-08-06T08:54:42.267668Z","shell.execute_reply.started":"2024-08-06T08:54:42.224905Z"},"language":"python","trusted":true},"outputs":[],"source":"llm = ChatOpenAI(\n openai_api_base=\"https://api.groq.com/openai/v1\",\n openai_api_key=os.environ['GROQ_API_KEY'],\n model_name=\"llama3-8b-8192\",\n temperature=0.1,\n 
max_tokens=1000,\n)"},{"cell_type":"code","execution_count":48,"id":"be093f01-bb3d-483e-9ea2-0ca9c5875000","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:09:42.443927Z","iopub.status.busy":"2024-08-06T09:09:42.443522Z","iopub.status.idle":"2024-08-06T09:09:43.219922Z","shell.execute_reply":"2024-08-06T09:09:43.219238Z","shell.execute_reply.started":"2024-08-06T09:09:42.443896Z"},"language":"python","trusted":true},"outputs":[],"source":"import requests\n\npdf_url = 'https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf'\nresponse = requests.get(pdf_url)\n\nwith open('attention_is_all_you_need.pdf', 'wb') as file:\n file.write(response.content)"},{"cell_type":"markdown","id":"c2cfdc74-f167-4a14-a4d6-6094cb5b71a0","metadata":{"language":"python"},"source":"## Create a RAG tool variable to pass our PDF"},{"cell_type":"code","execution_count":49,"id":"d97d2f22-8667-4799-bf90-1a9fec1dc540","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:10:08.404340Z","iopub.status.busy":"2024-08-06T09:10:08.403918Z","iopub.status.idle":"2024-08-06T09:10:12.883348Z","shell.execute_reply":"2024-08-06T09:10:12.880732Z","shell.execute_reply.started":"2024-08-06T09:10:08.404297Z"},"language":"python","trusted":true},"outputs":[{"name":"stderr","output_type":"stream","text":"Inserting batches in chromadb: 100%|██████████| 1/1 [00:03<00:00, 3.82s/it]\n"}],"source":"rag_tool = PDFSearchTool(pdf='attention_is_all_you_need.pdf',\n config=dict(\n llm=dict(\n provider=\"groq\", # or google, openai, anthropic, llama2, ...\n config=dict(\n model=\"llama3-8b-8192\",\n # temperature=0.5,\n # top_p=1,\n # stream=true,\n ),\n ),\n embedder=dict(\n provider=\"huggingface\", # or openai, ollama, ...\n config=dict(\n model=\"BAAI/bge-small-en-v1.5\",\n #task_type=\"retrieval_document\",\n # title=\"Embeddings\",\n ),\n ),\n )\n)"},{"cell_type":"code","execution_count":50,"id":"40e49ecd-7cd8-464c-be2f-5e4aa1f58de6","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:10:29.400488Z","iopub.status.busy":"2024-08-06T09:10:29.399575Z","iopub.status.idle":"2024-08-06T09:10:29.479174Z","shell.execute_reply":"2024-08-06T09:10:29.475340Z","shell.execute_reply.started":"2024-08-06T09:10:29.400441Z"},"language":"python","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"Using Tool: Search a PDF's content\n"},{"data":{"text/plain":"'Relevant Content:\\nThis makes it more difficult to learn dependencies between distant positions [ 11]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 22, 23, 19]. End-to-end memory networks are based on a recurrent attention mechanism instead of sequence- aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [28]. 
To the best of our knowledge, however, the Transformer\\n\\nIn terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence lengthnis smaller than the representation dimensionality d, which is most often the case with sentence representations used by state-of-the-art models in machine translations, such as word-piece [31] and byte-pair [ 25] representations. To improve computational performance for tasks involving very long sequences, self-attention could be restricted to considering only a neighborhood of size rin 6\\n\\nfactor of k. Separable convolutions [ 6], however, decrease the complexity considerably, to O(k·n·d+n·d2). Even with k=n, however, the complexity of a separable convolution is equal to the combination of a self-attention layer and a point-wise feed-forward layer, the approach we take in our model. As side benefit, self-attention could yield more interpretable models. We inspect attention distributions from our models and present and discuss examples in the appendix. Not only do individual attention heads clearly learn to perform different tasks, many appear to exhibit behavior related to the syntactic and semantic structure of the sentences. 5 Training This section describes the training regime for our models. 5.1 Training Data and Batching We trained on the standard WMT 2014 English-German dataset consisting of about 4.5 million sentence pairs. Sentences were encoded using byte-pair encoding [ 3], which has a shared source- target vocabulary of about 37000 tokens. For English-French,'"},"execution_count":50,"metadata":{},"output_type":"execute_result"}],"source":"rag_tool.run(\"How did self-attention mechanism evolve in large language models?\")"},{"cell_type":"markdown","id":"0ac98c3c-b02f-4680-8fab-99fbfa3c45a3","metadata":{"language":"python"},"source":"## Mention the Tavily API Key"},{"cell_type":"code","execution_count":51,"id":"4e1b4e5e-a4c4-4b81-8d11-97b9f7eb1d4f","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:10:33.516323Z","iopub.status.busy":"2024-08-06T09:10:33.515939Z","iopub.status.idle":"2024-08-06T09:10:33.520768Z","shell.execute_reply":"2024-08-06T09:10:33.520117Z","shell.execute_reply.started":"2024-08-06T09:10:33.516295Z"},"language":"python","trusted":true},"outputs":[],"source":"import os\n\n# Set the Tavily API key\nos.environ['TAVILY_API_KEY'] = 'Add Your Tavily API Key'"},{"cell_type":"code","execution_count":53,"id":"086f3f46-0750-4d59-a56a-02d13b802223","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:10:40.163940Z","iopub.status.busy":"2024-08-06T09:10:40.163524Z","iopub.status.idle":"2024-08-06T09:10:40.167780Z","shell.execute_reply":"2024-08-06T09:10:40.167028Z","shell.execute_reply.started":"2024-08-06T09:10:40.163904Z"},"language":"python","trusted":true},"outputs":[],"source":"web_search_tool = TavilySearchResults(k=3)"},{"cell_type":"code","execution_count":54,"id":"b445a6f5-ff84-4ee8-b633-f28be3d06b00","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:10:41.783474Z","iopub.status.busy":"2024-08-06T09:10:41.782945Z","iopub.status.idle":"2024-08-06T09:10:43.410666Z","shell.execute_reply":"2024-08-06T09:10:43.409742Z","shell.execute_reply.started":"2024-08-06T09:10:41.783445Z"},"language":"python","trusted":true},"outputs":[{"data":{"text/plain":"[{'url': 'https://cyrilzakka.github.io/llm-playbook/attention.html',\n 'content': \"Self-Attention. 
Self-attention is a specific type of attention mechanism where the Query (Q), Key (K), and Value (V) all come from the same place, usually the output of the previous layer in your network. In layman's terms, self-attention enables tokens in the input sequence to look at other tokens in the same sequence to gather contextual clues.\"},\n {'url': 'https://medium.com/data-science-community-srm/exploring-attention-mechanisms-in-large-language-models-33549ae2c5b8',\n 'content': 'The Self-Attention Mechanism of LLMs: In the Transformer architecture, the self-attention mechanism plays a crucial role by helping the model decide how important each word or token is in a ...'},\n {'url': 'https://medium.com/@vimal.parakhiya/decoding-large-language-models-a-deep-dive-into-the-self-attention-mechanism-967d6207d4ae',\n 'content': 'Simplified yet conceptual representation of self-attention mechanism. It emulates a compact information retrieval system, i.e. for a given sentence, it crafts questions for each word and seeks ...'},\n {'url': 'https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention',\n 'content': 'If we have a training text sample like \"Life is short eat desert first\" we have the following setup, where the context vectors for the word to the right side of the arrow should only incorporate itself and the previous words:\\n\"Life\" → \"is\"\\n\"Life is\" → \"short\"\\n\"Life is short\" → \"eat\"\\n\"Life is short eat\" → \"desert\"\\n\"Life is short eat desert\" → \"first\"\\nThe simplest way to achieve this setup above is to mask out all future tokens by applying a mask to the attention weight matrix above the diagonal, as illustrated in the figure below. In:\\nOut:\\nNow that we have all the required keys and values, we can proceed to the next step and compute the unnormalized attention weights ω (omega), which are illustrated in the figure below:\\nAs illustrated in the figure above, we compute ωi,j as the dot product between the query and key sequences, ωi,j = q(i) k(j).\\n First, we quickly recap the computation of the attention scores from the previous Self-Attention section:\\nIn:\\nOut:\\nSimilar to the Self-Attention section before, the output above is a 6×6 tensor containing these pairwise unnormalized attention weights (also called attention scores) for the 6 input tokens.\\n In:\\n(Note that ω is the symbol for the Greek letter \"omega\", hence the code variable with the same name above.)\\nOut:\\nSince we will need those unnormalized attention weights ω to compute the actual attention weights later, let\\'s compute the ω values for all input tokens as illustrated in the previous figure:\\n Since the sentence consists of 6 words, this will result in a 6×3-dimensional embedding:\\nIn:\\nOut:\\nDefining the Weight Matrices\\nNow, let\\'s discuss the widely utilized self-attention mechanism known as the scaled dot-product attention, which is an integral part of the transformer architecture.\\n'},\n {'url': 'https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html',\n 'content': 'In:\\nOut:\\nSince we will need those to compute the attention scores later, let’s compute the \\\\(\\\\omega\\\\) values for all input tokens as illustrated in the previous figure:\\nIn:\\nOut:\\nComputing the Attention Scores #\\nThe subsequent step in self-attention is to normalize the unnormalized attention weights, \\\\(\\\\omega\\\\), to obtain the normalized attention weights, \\\\(\\\\alpha\\\\), by applying the softmax function. 
In:\\nOut:\\nNow that we have all the required keys and values, we can proceed to the next step and compute the unnormalized attention weights \\\\(\\\\omega\\\\) , which are illustrated in the figure below:\\nAs illustrated in the figure above,\\nwe compute \\\\(\\\\omega_{i, j}\\\\) as the dot product between the query and key sequences, \\\\(\\\\omega_{i j}=\\\\mathbf{q}^{(i)^{\\\\top}} \\\\mathbf{k}^{(j)}\\\\).\\n = 24\\\\) and use \\\\(d_v = 28\\\\),\\ninitializing the projection matrices as follows:\\nIn:\\nComputing the Unnormalized Attention Weights #\\nNow, let’s suppose we are interested in computing the attention-vector for the second input element – the second input element acts as the query here:\\n Stable Diffusion uses cross-attention between the generated image in the U-Net model and the text prompts used for conditioning as described in High-Resolution Image Synthesis with Latent Diffusion Models – the original paper that describes the Stable Diffusion model that was later adopted by Stability AI to implement the popular Stable Diffusion model.\\nConclusion #\\nIn this article, we saw how self-attention works using a step-by-step coding approach. Since the sentence consists of 6 words, this will result in a \\\\(6 \\\\times 16\\\\)-dimensional embedding:\\nIn:\\nOut:\\nDefining the Weight Matrices #\\nNow, let’s discuss the widely utilized self-attention mechanism known as the scaled dot-product attention, which is integrated into the transformer architecture.\\n'}]"},"execution_count":54,"metadata":{},"output_type":"execute_result"}],"source":"web_search_tool.run(\"What is self-attention mechansim in large language models?\")"},{"cell_type":"markdown","id":"d1e69851-f7e3-4b9a-8c7c-7da0f7f04bde","metadata":{"language":"python"},"source":"## Define a Tool"},{"cell_type":"code","execution_count":55,"id":"907b371c-4b24-4c80-b833-971f56ad4c1f","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:34.946830Z","iopub.status.busy":"2024-08-06T09:11:34.946268Z","iopub.status.idle":"2024-08-06T09:11:34.952558Z","shell.execute_reply":"2024-08-06T09:11:34.951976Z","shell.execute_reply.started":"2024-08-06T09:11:34.946789Z"},"language":"python","trusted":true},"outputs":[],"source":"@tool\ndef router_tool(question):\n \"\"\"Router Function\"\"\"\n if 'self-attention' in question:\n return 'vectorstore'\n else:\n return 'web_search'"},{"cell_type":"markdown","id":"be17dc59-0e72-4eaa-9654-400df6cb371e","metadata":{"language":"python"},"source":"## Create Agents to work with"},{"cell_type":"code","execution_count":56,"id":"6cacf6a3-7f50-49d6-9ab0-b63433d7032c","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:39.241558Z","iopub.status.busy":"2024-08-06T09:11:39.241051Z","iopub.status.idle":"2024-08-06T09:11:39.249728Z","shell.execute_reply":"2024-08-06T09:11:39.248634Z","shell.execute_reply.started":"2024-08-06T09:11:39.241517Z"},"language":"python","trusted":true},"outputs":[],"source":"Router_Agent = Agent(\n role='Router',\n goal='Route user question to a vectorstore or web search',\n backstory=(\n \"You are an expert at routing a user question to a vectorstore or web search.\"\n \"Use the vectorstore for questions on concept related to Retrieval-Augmented Generation.\"\n \"You do not need to be stringent with the keywords in the question related to these topics. 
Otherwise, use web search.\"\n ),\n verbose=True,\n allow_delegation=False,\n llm=llm,\n)"},{"cell_type":"code","execution_count":57,"id":"4fc04747-af23-4b19-9ece-6473c683f739","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:41.155843Z","iopub.status.busy":"2024-08-06T09:11:41.155481Z","iopub.status.idle":"2024-08-06T09:11:41.162015Z","shell.execute_reply":"2024-08-06T09:11:41.161341Z","shell.execute_reply.started":"2024-08-06T09:11:41.155815Z"},"language":"python","trusted":true},"outputs":[],"source":"Retriever_Agent = Agent(\nrole=\"Retriever\",\ngoal=\"Use the information retrieved from the vectorstore to answer the question\",\nbackstory=(\n \"You are an assistant for question-answering tasks.\"\n \"Use the information present in the retrieved context to answer the question.\"\n \"You have to provide a clear, concise answer.\"\n),\nverbose=True,\nallow_delegation=False,\nllm=llm,\n)"},{"cell_type":"code","execution_count":58,"id":"25ece7eb-37fa-450d-991b-16dd8abca495","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:42.476081Z","iopub.status.busy":"2024-08-06T09:11:42.475654Z","iopub.status.idle":"2024-08-06T09:11:42.489959Z","shell.execute_reply":"2024-08-06T09:11:42.485582Z","shell.execute_reply.started":"2024-08-06T09:11:42.476042Z"},"language":"python","trusted":true},"outputs":[],"source":"Grader_agent = Agent(\n role='Answer Grader',\n goal='Filter out erroneous retrievals',\n backstory=(\n \"You are a grader assessing relevance of a retrieved document to a user question.\"\n \"If the document contains keywords related to the user question, grade it as relevant.\"\n \"It does not need to be a stringent test. You have to make sure that the answer is relevant to the question.\"\n ),\n verbose=True,\n allow_delegation=False,\n llm=llm,\n)"},{"cell_type":"code","execution_count":59,"id":"5cdb03f9-34bc-400b-8ada-80c08a71f1f8","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:43.725087Z","iopub.status.busy":"2024-08-06T09:11:43.724709Z","iopub.status.idle":"2024-08-06T09:11:43.731283Z","shell.execute_reply":"2024-08-06T09:11:43.730399Z","shell.execute_reply.started":"2024-08-06T09:11:43.725058Z"},"language":"python","trusted":true},"outputs":[],"source":"hallucination_grader = Agent(\n role=\"Hallucination Grader\",\n goal=\"Filter out hallucination\",\n backstory=(\n \"You are a hallucination grader assessing whether an answer is grounded in / supported by a set of facts.\"\n \"Make sure you meticulously review the answer and check if the response provided is in alignment with the question asked.\"\n ),\n verbose=True,\n allow_delegation=False,\n llm=llm,\n)"},{"cell_type":"code","execution_count":60,"id":"b6e3e019-18c4-4f33-be69-628274a544e2","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:48.359638Z","iopub.status.busy":"2024-08-06T09:11:48.359179Z","iopub.status.idle":"2024-08-06T09:11:48.372283Z","shell.execute_reply":"2024-08-06T09:11:48.369019Z","shell.execute_reply.started":"2024-08-06T09:11:48.359547Z"},"language":"python","trusted":true},"outputs":[],"source":"answer_grader = Agent(\n role=\"Answer Grader\",\n goal=\"Filter out hallucination from the answer.\",\n backstory=(\n \"You are a grader assessing whether an answer is useful to resolve a question.\"\n \"Make sure you meticulously review the answer and check if it makes sense for the question asked.\"\n \"If the answer is relevant, generate a clear and concise response.\"\n \"If the answer generated is not relevant, then perform a web search using 
'web_search_tool'\"\n ),\n verbose=True,\n allow_delegation=False,\n llm=llm,\n)"},{"cell_type":"markdown","id":"626670c9-717e-4bcd-b0b4-e4d9aaaae7eb","metadata":{"language":"python"},"source":"## Defining Tasks"},{"cell_type":"code","execution_count":61,"id":"5e9c9242-5b00-4e59-8e3d-d904c8e9d21f","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:50.881204Z","iopub.status.busy":"2024-08-06T09:11:50.880708Z","iopub.status.idle":"2024-08-06T09:11:50.886480Z","shell.execute_reply":"2024-08-06T09:11:50.885321Z","shell.execute_reply.started":"2024-08-06T09:11:50.881168Z"},"language":"python","trusted":true},"outputs":[],"source":"router_task = Task(\n description=(\"Analyse the keywords in the question {question}.\"\n \"Based on the keywords decide whether it is eligible for a vectorstore search or a web search.\"\n \"Return a single word 'vectorstore' if it is eligible for vectorstore search.\"\n \"Return a single word 'websearch' if it is eligible for web search.\"\n \"Do not provide any other preamble or explanation.\"\n ),\n expected_output=(\"Give a binary choice 'websearch' or 'vectorstore' based on the question.\"\n \"Do not provide any other preamble or explanation.\"),\n agent=Router_Agent,\n tools=[router_tool],\n)"},{"cell_type":"code","execution_count":62,"id":"5a497afe-0fd0-4b9f-a97b-64d1b5518ef0","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:52.682162Z","iopub.status.busy":"2024-08-06T09:11:52.681767Z","iopub.status.idle":"2024-08-06T09:11:52.689821Z","shell.execute_reply":"2024-08-06T09:11:52.689037Z","shell.execute_reply.started":"2024-08-06T09:11:52.682135Z"},"language":"python","trusted":true},"outputs":[],"source":"retriever_task = Task(\n description=(\"Based on the response from the router task extract information for the question {question} with the help of the respective tool.\"\n \"Use the web_search_tool to retrieve information from the web in case the router task output is 'websearch'.\"\n \"Use the rag_tool to retrieve information from the vectorstore in case the router task output is 'vectorstore'.\"\n ),\n expected_output=(\"You should analyse the output of the 'router_task'.\"\n \"If the response is 'websearch' then use the web_search_tool to retrieve information from the web.\"\n \"If the response is 'vectorstore' then use the rag_tool to retrieve information from the vectorstore.\"\n \"Return a clear and concise text as the response.\"),\n agent=Retriever_Agent,\n context=[router_task],\n #tools=[retriever_tool],\n)"},{"cell_type":"code","execution_count":63,"id":"ce51750e-941d-41fa-b4d4-d36f7e56961e","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:54.826584Z","iopub.status.busy":"2024-08-06T09:11:54.825901Z","iopub.status.idle":"2024-08-06T09:11:54.831307Z","shell.execute_reply":"2024-08-06T09:11:54.830655Z","shell.execute_reply.started":"2024-08-06T09:11:54.826545Z"},"language":"python","trusted":true},"outputs":[],"source":"grader_task = Task(\n description=(\"Based on the response from the retriever task for the question {question} evaluate whether the retrieved content is relevant to the question.\"\n ),\n expected_output=(\"Binary score 'yes' or 'no' to indicate whether the document is relevant to the question.\"\n \"You must answer 'yes' if the response from the 'retriever_task' is in alignment with the question asked.\"\n \"You must answer 'no' if the response from the 'retriever_task' is not in alignment with the question asked.\"\n \"Do not provide any preamble or explanations except for 'yes' or 
'no'.\"),\n agent=Grader_agent,\n context=[retriever_task],\n)"},{"cell_type":"code","execution_count":64,"id":"0d1f89f1-caef-43ea-9ec7-614467f73a3d","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:11:58.436290Z","iopub.status.busy":"2024-08-06T09:11:58.435873Z","iopub.status.idle":"2024-08-06T09:11:58.443357Z","shell.execute_reply":"2024-08-06T09:11:58.442433Z","shell.execute_reply.started":"2024-08-06T09:11:58.436248Z"},"language":"python","trusted":true},"outputs":[],"source":"hallucination_task = Task(\n description=(\"Based on the response from the grader task for the question {question} evaluate whether the answer is grounded in / supported by a set of facts.\"),\n expected_output=(\"Binary score 'yes' or 'no' to indicate whether the answer is in sync with the question asked.\"\n \"Respond 'yes' if the answer is useful and contains facts about the question asked.\"\n \"Respond 'no' if the answer is not useful and does not contain facts about the question asked.\"\n \"Do not provide any preamble or explanations except for 'yes' or 'no'.\"),\n agent=hallucination_grader,\n context=[grader_task],\n)\n\nanswer_task = Task(\n description=(\"Based on the response from the hallucination task for the question {question} evaluate whether the answer is useful to resolve the question.\"\n \"If the answer is 'yes' return a clear and concise answer.\"\n \"If the answer is 'no' then perform a 'websearch' and return the response.\"),\n expected_output=(\"Return a clear and concise response if the response from 'hallucination_task' is 'yes'.\"\n \"Perform a web search using 'web_search_tool' and return a clear and concise response only if the response from 'hallucination_task' is 'no'.\"\n \"Otherwise, respond as 'Sorry! unable to find a valid response'.\"),\n context=[hallucination_task],\n agent=answer_grader,\n #tools=[answer_grader_tool],\n)"},{"cell_type":"markdown","id":"190bfcbf-66bd-4ddc-bde6-2a96a002b8d7","metadata":{"language":"python"},"source":"## Define a flow for the use case"},{"cell_type":"code","execution_count":65,"id":"75f9cbeb-d342-4da9-b442-b13e66ebbc96","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:12:01.597715Z","iopub.status.busy":"2024-08-06T09:12:01.597364Z","iopub.status.idle":"2024-08-06T09:12:01.609224Z","shell.execute_reply":"2024-08-06T09:12:01.608693Z","shell.execute_reply.started":"2024-08-06T09:12:01.597687Z"},"language":"python","trusted":true},"outputs":[{"name":"stderr","output_type":"stream","text":"2024-08-06 09:12:01,600 - 139335005439808 - __init__.py-__init__:531 - WARNING: Overriding of current TracerProvider is not allowed\n"}],"source":"rag_crew = Crew(\n agents=[Router_Agent, Retriever_Agent, Grader_agent, hallucination_grader, answer_grader],\n tasks=[router_task, retriever_task, grader_task, hallucination_task, answer_task],\n verbose=True,\n\n)"},{"cell_type":"markdown","id":"957af34a-ffc4-4283-9e26-3c936f8719cc","metadata":{"language":"python"},"source":"## Ask any query"},{"cell_type":"code","execution_count":66,"id":"b98e365a-cd84-4664-941e-5fb40765d7dc","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:12:12.076469Z","iopub.status.busy":"2024-08-06T09:12:12.075913Z","iopub.status.idle":"2024-08-06T09:12:12.089371Z","shell.execute_reply":"2024-08-06T09:12:12.082684Z","shell.execute_reply.started":"2024-08-06T09:12:12.076435Z"},"language":"python","trusted":true},"outputs":[],"source":"inputs = {\"question\": \"How does self-attention mechanism help large language 
models?\"}"},{"cell_type":"markdown","id":"c3ebcad1-ab52-4ed9-a065-485cac2a4362","metadata":{"language":"python"},"source":"## Kick off the agent pipeline"},{"cell_type":"code","execution_count":67,"id":"4fcd4ac6-a176-425f-8bf9-a15ef0075381","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:12:16.098251Z","iopub.status.busy":"2024-08-06T09:12:16.097908Z","iopub.status.idle":"2024-08-06T09:12:22.448360Z","shell.execute_reply":"2024-08-06T09:12:22.447696Z","shell.execute_reply.started":"2024-08-06T09:12:16.098211Z"},"language":"python","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"\u001b[1m\u001b[95m [DEBUG]: == Working Agent: Router\u001b[00m\n\u001b[1m\u001b[95m [INFO]: == Starting Task: Analyse the keywords in the question How does self-attention mechanism help large language models?Based on the keywords decide whether it is eligible for a vectorstore search or a web search.Return a single word 'vectorstore' if it is eligible for vectorstore search.Return a single word 'websearch' if it is eligible for web search.Do not provide any other premable or explaination.\u001b[00m\n\n\n\u001b[1m> Entering new CrewAgentExecutor chain...\u001b[0m\n\u001b[32;1m\u001b[1;3mThought: The question is about the self-attention mechanism in large language models, which is a concept related to Retrieval-Augmented Generation.\nAction: router_tool\nAction Input: {\"question\": \"How does self-attention mechanism help large language models?\"}\u001b[0m\u001b[95m \n\nvectorstore\n\u001b[00m\n\u001b[32;1m\u001b[1;3mFinal Answer: vectorstore\u001b[0m\n\n\u001b[1m> Finished chain.\u001b[0m\n\u001b[1m\u001b[92m [DEBUG]: == [Router] Task output: vectorstore\n\n\u001b[00m\n\u001b[1m\u001b[95m [DEBUG]: == Working Agent: Retriever\u001b[00m\n\u001b[1m\u001b[95m [INFO]: == Starting Task: Based on the response from the router task extract information for the question How does self-attention mechanism help large language models? 
with the help of the respective tool.Use the web_serach_tool to retrieve information from the web in case the router task output is 'websearch'.Use the rag_tool to retrieve information from the vectorstore in case the router task output is 'vectorstore'.\u001b[00m\n\n\n\u001b[1m> Entering new CrewAgentExecutor chain...\u001b[0m\n\u001b[32;1m\u001b[1;3mThought: I now have the context and the task to extract information for the question \"How does self-attention mechanism help large language models?\" using the vectorstore.\n\nThe router task output is 'vectorstore', which means I will use the rag_tool to retrieve information from the vectorstore.\n\nLet me analyze the output of the 'router_task' and retrieve the relevant information from the vectorstore using the rag_tool...\n\nFinal Answer: \n\nSelf-attention mechanism helps large language models in several ways:\n\n* It allows the model to focus on specific parts of the input sequence that are relevant to the task at hand, rather than treating the entire sequence as a single entity.\n* It enables the model to capture long-range dependencies and relationships between different parts of the input sequence.\n* It helps the model to attend to the most important information in the input sequence, which can improve the model's ability to generate coherent and relevant text.\n* It allows the model to capture complex contextual relationships between different parts of the input sequence, which can improve the model's ability to understand and generate text.\n\nThese benefits can lead to improved performance and accuracy of large language models, particularly in tasks that require understanding and generating long sequences of text, such as language translation, text summarization, and language generation.\u001b[0m\n\n\u001b[1m> Finished chain.\u001b[0m\n\u001b[1m\u001b[92m [DEBUG]: == [Retriever] Task output: Self-attention mechanism helps large language models in several ways:\n\n* It allows the model to focus on specific parts of the input sequence that are relevant to the task at hand, rather than treating the entire sequence as a single entity.\n* It enables the model to capture long-range dependencies and relationships between different parts of the input sequence.\n* It helps the model to attend to the most important information in the input sequence, which can improve the model's ability to generate coherent and relevant text.\n* It allows the model to capture complex contextual relationships between different parts of the input sequence, which can improve the model's ability to understand and generate text.\n\nThese benefits can lead to improved performance and accuracy of large language models, particularly in tasks that require understanding and generating long sequences of text, such as language translation, text summarization, and language generation.\n\n\u001b[00m\n\u001b[1m\u001b[95m [DEBUG]: == Working Agent: Answer Grader\u001b[00m\n\u001b[1m\u001b[95m [INFO]: == Starting Task: Based on the response from the retriever task for the quetion How does self-attention mechanism help large language models? 
evaluate whether the retrieved content is relevant to the question.\u001b[00m\n\n\n\u001b[1m> Entering new CrewAgentExecutor chain...\u001b[0m\n\u001b[32;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThought: I now can give a great answer\nFinal Answer: yes\u001b[0m\n\n\u001b[1m> Finished chain.\u001b[0m\n\u001b[1m\u001b[92m [DEBUG]: == [Answer Grader] Task output: yes\n\n\u001b[00m\n\u001b[1m\u001b[95m [DEBUG]: == Working Agent: Hallucination Grader\u001b[00m\n\u001b[1m\u001b[95m [INFO]: == Starting Task: Based on the response from the grader task for the quetion How does self-attention mechanism help large language models? evaluate whether the answer is grounded in / supported by a set of facts.\u001b[00m\n\n\n\u001b[1m> Entering new CrewAgentExecutor chain...\u001b[0m\n\u001b[32;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThought: I now can give a great answer\nFinal Answer: yes\u001b[0m\n\n\u001b[1m> Finished chain.\u001b[0m\n\u001b[1m\u001b[92m [DEBUG]: == [Hallucination Grader] Task output: yes\n\n\u001b[00m\n\u001b[1m\u001b[95m [DEBUG]: == Working Agent: Answer Grader\u001b[00m\n\u001b[1m\u001b[95m [INFO]: == Starting Task: Based on the response from the hallucination task for the quetion How does self-attention mechanism help large language models? evaluate whether the answer is useful to resolve the question.If the answer is 'yes' return a clear and concise answer.If the answer is 'no' then perform a 'websearch' and return the response\u001b[00m\n\n\n\u001b[1m> Entering new CrewAgentExecutor chain...\u001b[0m\n\u001b[32;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThought: I now can give a great answer\n\nAction: Since the response from the hallucination task is \"yes\", I will return a clear and concise answer.\n\nFinal Answer: The self-attention mechanism in large language models helps by allowing the model to focus on specific parts of the input sequence and weigh their importance. This is particularly useful for tasks that require understanding complex relationships between different parts of the input, such as machine translation and text summarization. By attending to specific parts of the input, the model can better capture the nuances of the input and generate more accurate and relevant outputs.\u001b[0m\n\n\u001b[1m> Finished chain.\u001b[0m\n\u001b[1m\u001b[92m [DEBUG]: == [Answer Grader] Task output: The self-attention mechanism in large language models helps by allowing the model to focus on specific parts of the input sequence and weigh their importance. This is particularly useful for tasks that require understanding complex relationships between different parts of the input, such as machine translation and text summarization. 
By attending to specific parts of the input, the model can better capture the nuances of the input and generate more accurate and relevant outputs.\n\n\u001b[00m\n"}],"source":"result = rag_crew.kickoff(inputs=inputs)"},{"cell_type":"markdown","id":"caf9e58d-3c00-4cea-83d9-806a016f740b","metadata":{"language":"python"},"source":"## Obtain the final response"},{"cell_type":"code","execution_count":68,"id":"49e437fa-c3b3-4d1e-983b-93713a2df989","metadata":{"execution":{"iopub.execute_input":"2024-08-06T09:12:44.362032Z","iopub.status.busy":"2024-08-06T09:12:44.361665Z","iopub.status.idle":"2024-08-06T09:12:44.366198Z","shell.execute_reply":"2024-08-06T09:12:44.365344Z","shell.execute_reply.started":"2024-08-06T09:12:44.362005Z"},"language":"python","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":"The self-attention mechanism in large language models helps by allowing the model to focus on specific parts of the input sequence and weigh their importance. This is particularly useful for tasks that require understanding complex relationships between different parts of the input, such as machine translation and text summarization. By attending to specific parts of the input, the model can better capture the nuances of the input and generate more accurate and relevant outputs.\n"}],"source":"print(result)"}],"metadata":{"jupyterlab":{"notebooks":{"version_major":6,"version_minor":4}},"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.6"},"singlestore_cell_default_language":"python","singlestore_connection":{"connectionID":"a22b6f8b-b11b-4979-98da-98513e9876e6","defaultDatabase":""},"singlestore_row_limit":300},"nbformat":4,"nbformat_minor":5} --------------------------------------------------------------------------------