├── LICENSE ├── LLM-RAG-GRAPH.ipynb ├── README.md ├── assets ├── Q1-graph.jpg ├── Q1.jpg ├── Q2-graph.jpg └── Q2.jpg ├── coreconfigs.py ├── coreutils.py ├── example_query.py ├── pgdb_setup.sh ├── pgvector.sql ├── requirements.txt ├── setup.sh ├── store_embeddings.py ├── text_processed └── NBK549776.txt └── texts_input ├── NBK501509.txt └── NBK548420.txt /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /LLM-RAG-GRAPH.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "f5f52427", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import gradio as gr\n", 11 | "from coreutils import LLMOps" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 2, 17 | "id": "c30710dd", 18 | "metadata": {}, 19 | "outputs": [ 20 | { 21 | "data": { 22 | "application/vnd.jupyter.widget-view+json": { 23 | "model_id": "028bbaa9ff794017ae348c296d05019f", 24 | "version_major": 2, 25 | "version_minor": 0 26 | }, 27 | "text/plain": [ 28 | "Loading checkpoint shards: 0%| | 0/8 [00:00" 60 | ], 61 | "text/plain": [ 62 | "" 63 | ] 64 | }, 65 | "metadata": {}, 66 | "output_type": "display_data" 67 | }, 68 | { 69 | "data": { 70 | "text/plain": [] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | }, 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | 
"Embedding model ok.\n", 81 | "DB connection established.\n" 82 | ] 83 | } 84 | ], 85 | "source": [ 86 | "%%capture --no-display\n", 87 | " \n", 88 | "#define gradio interface and other parameters\n", 89 | "with gr.Blocks() as app:\n", 90 | " with gr.Row():\n", 91 | " with gr.Column():\n", 92 | " input = gr.Textbox(label=\"Question\", show_copy_button=True)\n", 93 | " with gr.Column():\n", 94 | " slider = gr.Slider(1, 10, value=1, label=\"Randomness\", show_label=True,\n", 95 | " step=1, info=\"High values generates diverse texts.\")\n", 96 | " submit_btn = gr.Button(\"submit\")\n", 97 | " with gr.Row():\n", 98 | " with gr.Column():\n", 99 | " ans = gr.Textbox(label=\"Answer with context\", show_copy_button=True)\n", 100 | " with gr.Column():\n", 101 | " img_html = gr.HTML()\n", 102 | " with gr.Row():\n", 103 | " doc_html = gr.HTML()\n", 104 | " with gr.Row():\n", 105 | " grph_html = gr.HTML()\n", 106 | " submit_btn.click(fn=mdl_gr_response, \n", 107 | " inputs=[input, slider], \n", 108 | " outputs=[ans, grph_html, img_html, doc_html])\n", 109 | "app.load(show_progress=\"minimal\") \n", 110 | "app.launch(share=False, quiet=True, show_api=False, height=1300, show_error=True)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "id": "77c6240a", 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [] 120 | } 121 | ], 122 | "metadata": { 123 | "kernelspec": { 124 | "display_name": "Python 3 (ipykernel)", 125 | "language": "python", 126 | "name": "python3" 127 | }, 128 | "language_info": { 129 | "codemirror_mode": { 130 | "name": "ipython", 131 | "version": 3 132 | }, 133 | "file_extension": ".py", 134 | "mimetype": "text/x-python", 135 | "name": "python", 136 | "nbconvert_exporter": "python", 137 | "pygments_lexer": "ipython3", 138 | "version": "3.10.13" 139 | } 140 | }, 141 | "nbformat": 4, 142 | "nbformat_minor": 5 143 | } 144 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Build relationship graphs using an LLM in a Retrieval-Augmented Generation (RAG) framework with pgvector as the vector database 2 | 3 | 4 | ## Overview 5 | 6 | A tool to build relationship graphs using a large language model (LLM). 7 | Supports adding context to the query using Retrieval-Augmented Generation (RAG). Context is built against an internal knowledge base. Context embeddings are stored in and retrieved from a vector database. Relationships are stored in the same database. 8 | 9 | 10 | ## Tool Features 11 | - Store context in the vector database 12 | - Retrieve context from the vector database and supplement the query with it, thus improving LLM response quality 13 | - Along with the LLM response, visualize the relationships in the document(s) and highlight related documents and images 14 | 15 | 16 | ## Installation 17 | ### Prerequisites 18 | 19 | - [Python](https://www.python.org/downloads/) 3.10 or greater 20 | - Check requirements.txt for the required Python libraries 21 | 22 | ### Supported Database 23 | 24 | - [PostgreSQL](https://www.postgresql.org/). Supports Postgres 11+. Tested on 14.10. 25 | 26 | ### Vector Database 27 | 28 | - [pgvector](https://github.com/pgvector/pgvector) 29 | 30 | 31 | ### Scripts 32 | 33 | - pgdb_setup.sh: Install the PostgreSQL 14.10 database on Ubuntu. 34 | - pgvector.sql: Configure the PostgreSQL database as a vector database. 35 | - setup.sh: Install the required Python packages and configure the vector database. Assumes the PostgreSQL database is on the same host. Review the file before execution. 36 | 37 | 38 | ## Application 39 | 40 | - coreconfigs.py: Application configurations. An important file to review and edit. 
41 | - store_embeddings.py: Wrapper script to read the text files, then generate and store embeddings and relationships in the pgvector database 42 | - example_query.py: Example that queries the LLM and saves the results as an HTML file 43 | - LLM-RAG-GRAPH.ipynb: Jupyter notebook with a Gradio interface that can also be used to interact with the LLM and visualize the graph 44 | 45 | 46 | ## Getting Started 47 | 48 | ### Application config and run 49 | - Download the repo 50 | - Perform the installation steps (see above) 51 | - #### Edit coreconfigs.py to update the PostgreSQL DB connection. 52 | 53 | - Run store_embeddings.py to store the embeddings and relationships in the pgvector DB 54 | 55 | ``` 56 | Embedding model ok. 57 | DB connection established. 58 | Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.39it/s] 59 | Processing text file: NBK548420.txt 60 | Get relations: Cetirizine and its enantiomer levocetirizine are second generation antihistamines that are used for the treatment of allergic rhinitis, angioedema and chronic urticaria. 61 | ... 62 | ... 63 | Embeddings commited for file: texts_input\NBK548420.txt 64 | ``` 65 | 66 | - Run example_query.py to test 67 | 68 | ``` 69 | python example_query.py 70 | Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.99it/s] 71 | WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu. 72 | Embedding model ok. 73 | DB connection established. 74 | View the html file: user_qry_results.html for the results 75 | 76 | ... 77 | ``` 78 | 79 | 80 | ## Example 1 81 | 82 |
83 | 84 |

Generated graph full resolution

85 | 86 |
87 | 88 | 89 | ## Example 2: Query with a typo 90 | 91 |
92 | 93 |

Generated graph full resolution

94 | 95 |
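The retrieval step (coreutils.py normalizes each embedding and ranks stored chunks with pgvector's inner-product operator `<#>`) can be sketched without a database. This is a minimal illustration under stated assumptions, not the repo's implementation: `normalize` and `top_k_by_inner_product` are hypothetical helper names, the vectors are toy 2-D values rather than 384-dimensional embeddings, and the default `k=4` simply mirrors `_MAX_SIM_TXTS` from coreconfigs.py.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def top_k_by_inner_product(query, chunks, k=4):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs. Both the query and the chunk
    vectors are normalized first, so the inner product equals cosine similarity.
    """
    q = normalize(query)
    scored = []
    for text, vec in chunks:
        v = normalize(vec)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, text))
    # Largest inner product first (most similar chunk at the top)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    toy_chunks = [
        ("chunk about antihistamines", [1.0, 0.1]),
        ("unrelated chunk", [0.0, 1.0]),
        ("partly related chunk", [0.9, 0.5]),
    ]
    print(top_k_by_inner_product([1.0, 0.0], toy_chunks, k=2))
```

Because every vector is scaled to unit length before comparison, ranking by inner product and ranking by cosine similarity give the same order. In the database, pgvector's `<#>` returns the negative inner product, so an ascending `ORDER BY` puts the most similar chunks first.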
-------------------------------------------------------------------------------- /assets/Q1-graph.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q1-graph.jpg -------------------------------------------------------------------------------- /assets/Q1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q1.jpg -------------------------------------------------------------------------------- /assets/Q2-graph.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q2-graph.jpg -------------------------------------------------------------------------------- /assets/Q2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q2.jpg -------------------------------------------------------------------------------- /coreconfigs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ All configuration items 5 | Important: CHANGE the postgresDB connection information below 6 | Optional: To ignore during text extraction, Change 7 | _IGNORE_SENTS: list of sentences (not words) 8 | """ 9 | 10 | # Models used: Embedding model, LLM and Spacy model (for sentence identification) 11 | # Embedding Model with lesser Dimension=384, better for vectorDB performance 12 | # Use MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard for model details 13 | # Embedding Model Sequence Length = 512, setting _MAX_TKNLEN to about 25% 14 | # Text chunks shouldn't 
be too short or too long to be of good context 15 | _EMBED_MDL = "khoa-klaytn/bge-small-en-v1.5-angle" 16 | _DB_EMBED_DIM = 384 17 | _MAX_TKNLEN = 120 18 | 19 | # LLM 20 | _LLM_NAME = "HuggingFaceH4/zephyr-7b-beta" 21 | _LLM_MSG_TMPLT = [{ "role": "system", "content": "",}, {"role": "user", "content": ''},] 22 | 23 | # LLM model sequence length = 4k, we will provide about 1k tokens, _MAX_TKNLEN*_MAX_SIM_TXTS 24 | # Higher length requires higher GPU processing, memory and can lead to OoM error on smaller GPUs. 25 | # Reducing context tokens, reduces processing costs. 26 | # But short contexts may lead to inaccurate or repetitive answers. 27 | _MAX_SIM_TXTS = 4 28 | 29 | # SciSpacy model 30 | # see https://github.com/allenai/scispacy?tab=readme-ov-file#available-models 31 | _SPACYMDL = "en_core_sci_lg" 32 | _SPACY_MAX_TKNLEN = 25 33 | 34 | # Directory to store extracted texts 35 | _TEXTDIR = "texts_input" 36 | # Directory to store texts once embeddings are stored in vector DB 37 | _TXTSREADDIR = "texts_processed" 38 | 39 | 40 | #PgVector DB details 41 | _PGHOST = "1.1.1.1" 42 | _PGPORT = 5432 43 | _PGUSER = "ragu" 44 | _PGDB = "ragdb" 45 | _PGPWD = "yourpassword" 46 | -------------------------------------------------------------------------------- /coreutils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ coreutils module: Provides common utilities for other modules """ 5 | from pathlib import Path 6 | from time import sleep 7 | import sys 8 | from datetime import datetime, timezone 9 | import json 10 | from base64 import b64encode 11 | 12 | import graphviz 13 | 14 | import numpy as np 15 | import psycopg 16 | 17 | import spacy 18 | 19 | import torch 20 | import transformers 21 | from sentence_transformers import SentenceTransformer 22 | 23 | from coreconfigs import _SPACYMDL, _LLM_NAME, _LLM_MSG_TMPLT, _EMBED_MDL, _TXTSREADDIR, \ 24 | _DB_EMBED_DIM, _MAX_SIM_TXTS, 
_MAX_TKNLEN, _SPACY_MAX_TKNLEN, \ 25 | _PGHOST, _PGPORT, _PGUSER, _PGDB, _PGPWD 26 | 27 | 28 | class DbOps(): 29 | """ For database operations """ 30 | def __init__(self): 31 | self.stmt = '' 32 | self.values = '' 33 | self._tmr = 0 34 | self._conn = '' 35 | self._dbconn_retry() 36 | 37 | def _dbconn_retry(self): 38 | try: 39 | self._conn = psycopg.connect(dbname=_PGDB, 40 | user=_PGUSER, 41 | password=_PGPWD, 42 | host=_PGHOST, 43 | port=_PGPORT, 44 | sslmode="prefer", 45 | connect_timeout=2) 46 | except psycopg.OperationalError: 47 | if self._tmr < 6: 48 | self._tmr += 3 49 | ### Server connection issue, try in few secs 50 | print(f"Unable to connect to database, trying in {self._tmr} secs...") 51 | sleep(self._tmr) 52 | self._dbconn_retry() 53 | else: 54 | self._tmr = 0 55 | raise 56 | 57 | def execstmt(self): 58 | """ Execute the DB statements """ 59 | cur = self._conn.execute(self.stmt, self.values) 60 | res = '' 61 | if cur.description: #check for return rows 62 | res = cur.fetchall() 63 | return res 64 | 65 | def commit(self): 66 | """ Commits the transaction""" 67 | self._conn.commit() 68 | 69 | def rollback(self): 70 | """ Rollback the transaction""" 71 | self._conn.rollback() 72 | 73 | 74 | class Embeds(): 75 | """ 76 | Provides helper functions to 77 | 1. Iterate all the directories under _TEXTDIR 78 | Read text, chunk and save in pgvector DB 79 | 2. Generate embedding and store in pgvector DB 80 | 3. If text_relations = True, then for each sentence generate a <subj, relation, obj> triplet 81 | 4. Search for similar texts in pgvector DB 82 | """ 83 | 84 | def __init__(self, dbconn=True, text_relations=True): 85 | self.emb_mdl = SentenceTransformer(_EMBED_MDL) 86 | self._text_relations = text_relations 87 | 88 | ## Verify embedding dimension size before processing 89 | embeddings = self.emb_mdl.encode("Hello World") 90 | if _DB_EMBED_DIM < embeddings.size: 91 | print(f"DB field length={_DB_EMBED_DIM}. 
Embedding dimension={embeddings.size}") 92 | print("Choose a different model or change embedding dimension on DB.") 93 | print("Exiting...") 94 | sys.exit(1) 95 | else: 96 | print("Embedding model ok.") 97 | if dbconn: 98 | self.dbo = DbOps() 99 | print("DB connection established.") 100 | # similarity: <=> cosine, <-> L2, <#> inner product 101 | # We normalize embeddings so use <#> 102 | # Ensure t_document_chunks index is using vector_ip_ops 103 | self.dbo_stmts = {"upd_doc":"update t_documents set created_at=%s where id=%s", 104 | "ins_doc": "insert into t_documents (doc_name) values(%s) RETURNING id", 105 | "sel_doc": "select id from t_documents where doc_name = %s", 106 | "doc_refs": "select doc_name, doc_reference from t_documents \ 107 | where id in ({qargs}) and doc_reference is NOT NULL", 108 | "doc_images": "select img_desc, img_reference from t_document_images \ 109 | where doc_id in ({qargs}) ", 110 | "del_txts": "delete from t_document_chunks where doc_id = %s", 111 | "del_relations": "delete from t_chunk_relations where doc_id = %s", 112 | "ins_txt": "insert into t_document_chunks (doc_id, chunk, embedding) \ 113 | values(%s, %s, %s) RETURNING id", 114 | "ins_relations": "insert into t_chunk_relations (doc_id, chunk_id, text_relation, json_relation) \ 115 | values(%s, %s, %s, %s)", 116 | "ins_relations_nj": "insert into t_chunk_relations (doc_id, chunk_id, text_relation) \ 117 | values(%s, %s, %s)", 118 | "sim_chunks": "select json_relation from t_chunk_relations \ 119 | where chunk_id in ({qargs}) and json_relation is NOT NULL", 120 | "sim_txts": f"SELECT id, doc_id, chunk FROM t_document_chunks \ 121 | ORDER BY embedding <#> %s LIMIT {_MAX_SIM_TXTS}" 122 | } 123 | if self._text_relations: 124 | self.prsr = spacy.load(_SPACYMDL) 125 | self.llm = LLMOps() 126 | self.llm.gconfigdct["temperature"] = .1 127 | self.llm.gconfigdct["max_new_tokens"] = 512 128 | system_content = """Translate the user content as entity relation triplet in 129 | {"subj": "", 
"relation": "", "obj": ""} json format.""" 130 | self.llm.msg_tmplt[0]['content'] = system_content 131 | self.gconfig = transformers.GenerationConfig(**self.llm.gconfigdct) 132 | 133 | def np_to_str(self, val): 134 | """Convert np.float32 to np.float64. json.dumps supports it.""" 135 | return np.float64(val) 136 | 137 | def dbexec(self, stmt, values, msg): 138 | """ 139 | Generic function for executing database statements 140 | If results=False, returns '' 141 | If results=True returns all rows 142 | """ 143 | self.dbo.stmt = stmt 144 | self.dbo.values = values 145 | retval = '' 146 | try: 147 | retval = self.dbo.execstmt() 148 | except Exception: 149 | print(f"{msg} failed....") 150 | print("Rolling back transaction") 151 | print(f"Statement: {self.dbo.stmt}") 152 | print(f"Values: {self.dbo.values}") 153 | self.dbo.rollback() 154 | raise 155 | self.dbo.stmt = '' 156 | self.dbo.values = '' 157 | return retval 158 | 159 | def save_chunk_relations(self, txtlst, docid, chunkid): 160 | """ Get the relation triplet from LLM and insert into DB """ 161 | def process_itm(jsn): 162 | json_val = '' 163 | if isinstance(jsn["obj"], str): 164 | if jsn.get("obj_qualifier"): 165 | jsn["relation"] = f'{jsn["obj_qualifier"]} {jsn["relation"]}' 166 | if jsn.get("context"): 167 | jsn["relation"] = f'{jsn["subj"]} {jsn["relation"]}' 168 | jsn["subj"] = jsn["context"] 169 | if jsn["obj"]: 170 | json_val = jsn 171 | elif isinstance(jsn["obj"], list): 172 | obj = '' 173 | try: 174 | if isinstance(jsn["obj"][0], str): 175 | obj = ', '.join(jsn["obj"]) 176 | elif isinstance(jsn["obj"][0], dict): 177 | obj = ', '.join({i["subj"] for i in jsn["obj"] if i["subj"]}) 178 | if obj: 179 | jsn["obj"] = obj 180 | json_val = jsn 181 | except TypeError: 182 | print("Ignoring list triplet due to incorrect json format") 183 | return json_val 184 | 185 | for text in txtlst: 186 | doc = self.prsr(text) 187 | pos = {tkn.pos_ for tkn in doc} 188 | # Generate relations only on sentences 189 | # with < 25 
(default) tokens, else the generated relations can be too complicated. 190 | # with Noun and Verb 191 | if len(doc) < _SPACY_MAX_TKNLEN and 'NOUN' in pos and 'VERB' in pos: 192 | self.llm.msg_tmplt[1]['content'] = text 193 | print(f"Get relations: {text}") 194 | prompt = self.llm.pipeline.tokenizer.apply_chat_template(self.llm.msg_tmplt, 195 | tokenize=False, 196 | add_generation_prompt=False) 197 | outputs = self.llm.pipeline(prompt, generation_config=self.gconfig) 198 | res = outputs[0]["generated_text"].split("<|assistant|>\n")[1] 199 | print(f"Generated triplet: {res}") 200 | jlst = [] 201 | try: 202 | jsn = json.loads(res) 203 | except json.decoder.JSONDecodeError: 204 | try: 205 | for itm in res.split('{')[1:]: 206 | jsn = json.loads('{'+itm.replace('\n','').strip(',')) 207 | jitm = process_itm(jsn) 208 | if jitm: 209 | jlst.append(jitm) 210 | except json.decoder.JSONDecodeError: 211 | print("Ignoring triplet due to incorrect json format") 212 | else: 213 | jitm = process_itm(jsn) 214 | if jitm: 215 | jlst.append(jitm) 216 | if jlst: 217 | _ = self.dbexec(self.dbo_stmts['ins_relations'], 218 | (docid, chunkid, res, json.dumps(jlst)), 219 | "Insert chunk relations") 220 | else: 221 | _ = self.dbexec(self.dbo_stmts['ins_relations_nj'], 222 | (docid, chunkid, res), 223 | "Insert chunk relations") 224 | 225 | def save_embeddings_to_db(self, fldr, parent='.'): 226 | """ 227 | Iterate all the directories under _TEXTDIR (fldr) 228 | Read text file, chunk texts and save chunk+embeddings in pgvector DB 229 | For each sentence get the relation triplet <subj, relation, obj>, 230 | LLM optionally provides "obj_qualifier" or "context" 231 | """ 232 | def emb_to_db(txtchunk, txtlst, docid): 233 | embeddings = self.emb_mdl.encode(txtchunk) 234 | # Normalizing the embeddings, just in case 235 | # default is Frobenius norm 236 | # https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html 237 | fnorm = np.linalg.norm(embeddings) 238 | lst = list(embeddings/fnorm) 239 | # json 
supports only np.float64. Convert np.float32 240 | embed_str = json.dumps(lst, default=np.float64) 241 | chunkid = self.dbexec(self.dbo_stmts['ins_txt'], 242 | (docid, json.dumps(txtlst), embed_str), 243 | "Insert chunk into Document") 244 | #print(f"docid: {docid}, chunkid:{chunkid[0][0]}") 245 | if self._text_relations: 246 | self.save_chunk_relations(txtlst, docid, chunkid[0][0]) 247 | 248 | for rfl in fldr.iterdir(): 249 | if rfl.is_file(): 250 | print(f"Processing text file: {rfl.name}") 251 | # If the file has been processed already, delete the document chunks and reprocess 252 | doc_id = self.dbexec(self.dbo_stmts['sel_doc'], (rfl.name, ), 253 | "Check for Document") 254 | 255 | if doc_id: 256 | docid = doc_id[0][0] 257 | _ = self.dbexec(self.dbo_stmts['del_relations'], (docid, ), 258 | "Deleting chunk relations") 259 | _ = self.dbexec(self.dbo_stmts['del_txts'], (docid, ), 260 | "Deleting document chunks") 261 | _ = self.dbexec(self.dbo_stmts['upd_doc'], 262 | (datetime.now(tz=timezone.utc), docid), 263 | "Updating document timestamp") 264 | else: 265 | doc_id = self.dbexec(self.dbo_stmts['ins_doc'], (rfl.name, ), 266 | "Insert Document") 267 | docid = doc_id[0][0] 268 | with open(rfl, encoding="utf-8", errors="replace") as txt_fl: 269 | filetexts = txt_fl.readlines() 270 | txtchunk = '' 271 | txtlst = [] 272 | for txt in filetexts: 273 | txt = txt.strip() 274 | txtchunk = f"{txtchunk} {txt}" 275 | txtlst.append(txt) 276 | if len(txtchunk.split()) >= _MAX_TKNLEN: 277 | emb_to_db(txtchunk, txtlst, docid) 278 | txtlst = [] 279 | txtchunk = '' 280 | emb_to_db(txtchunk, txtlst, docid) # Pending will be a separate chunk 281 | self.dbo.commit() 282 | print(f"Embeddings commited for file: {rfl}") 283 | try: 284 | _ = rfl.replace(Path(_TXTSREADDIR, parent, rfl.name)) 285 | except (PermissionError, FileExistsError, FileNotFoundError) as err: 286 | print(f"File not moved: {err}") 287 | print("Ignoring error...") 288 | 289 | if rfl.is_dir(): 290 | print(f"Creating 
text processed directory: {rfl.name}") 291 | Path(_TXTSREADDIR, rfl.name).mkdir(parents=True, exist_ok=True) 292 | self.save_embeddings_to_db(rfl, rfl.name) 293 | # Delete the processed text directory, ignore error if any file exists 294 | try: 295 | rfl.rmdir() 296 | except (OSError, FileNotFoundError) as err: 297 | print(f"Directory not deleted: {err}") 298 | print("Ignoring error...") 299 | 300 | def get_similar_texts(self, text): 301 | """ 302 | 1. Generate text embedding. 303 | 2. Compare similarity against vectorDB and get texts similar to the input text. 304 | """ 305 | embeddings = self.emb_mdl.encode(text) 306 | # Normalize before querying the DB 307 | fnorm = np.linalg.norm(embeddings) 308 | lst = list(embeddings/fnorm) 309 | # json supports only np.float64. Convert np.float32 310 | embed_str = json.dumps(lst, default=np.float64) 311 | sim_txts = self.dbexec(self.dbo_stmts['sim_txts'], (embed_str,), "Get similar texts") 312 | sim_chunk_ids = {itm[0] for itm in sim_txts} 313 | sim_doc_ids = {itm[1] for itm in sim_txts} 314 | all_txts = [] 315 | contxt = '' 316 | # Avoid duplicate sentences, less noise in context is better for LLM response 317 | for itm in sim_txts: 318 | for txt in itm[2]: 319 | if txt not in all_txts: 320 | all_txts.append(txt) 321 | contxt = f"{contxt} {txt}" 322 | # Do not exceed the tokens limit 323 | if len(contxt.split()) >= _MAX_TKNLEN*_MAX_SIM_TXTS: 324 | break 325 | return contxt, sim_chunk_ids, sim_doc_ids 326 | 327 | class LLMOps(): 328 | """For LLM operations """ 329 | def __init__(self): 330 | self.pipeline = transformers.pipeline("text-generation", 331 | model=_LLM_NAME, 332 | torch_dtype=torch.bfloat16, 333 | device_map="auto", 334 | ) 335 | self.gconfigdct = self.pipeline.model.generation_config.to_dict() 336 | self.gconfigdct["max_new_tokens"] =256 337 | self.gconfigdct["do_sample"] = True 338 | self.gconfigdct["top_k"] = 50 339 | self.gconfigdct["top_p"] = 0.95 340 | self.gconfigdct["pad_token_id"] = 
self.pipeline.model.config.eos_token_id 341 | self.gconfigdct["temperature"] = .7 342 | self.emb = '' 343 | self.msg_tmplt = _LLM_MSG_TMPLT 344 | 345 | def mdl_response(self, qry, temp=1): 346 | """ Function returns the answer from the LLM with context""" 347 | if temp < 1 or temp > 9: 348 | temp = 7 349 | if not self.emb: 350 | self.emb = Embeds(text_relations=False) 351 | contxt, sim_chunk_ids, sim_doc_ids = self.emb.get_similar_texts(qry) 352 | self.msg_tmplt[1]['content'] = contxt 353 | prompt = self.pipeline.tokenizer.apply_chat_template(self.msg_tmplt, tokenize=False, 354 | add_generation_prompt=True) 355 | self.gconfigdct["temperature"] = temp/10 356 | gconfig = transformers.GenerationConfig(**self.gconfigdct) 357 | outputs = self.pipeline(prompt, generation_config=gconfig) 358 | res = outputs[0]["generated_text"].split("<|assistant|>\n")[1] 359 | #print(f"Similar Chunk ids: {sim_chunk_ids}") 360 | #print(f"Similar Doc ids: {sim_doc_ids}") 361 | 362 | # Get document relations 363 | dbqry = self.emb.dbo_stmts['sim_chunks'] 364 | dbqry = dbqry.replace("{qargs}", ','.join(str(i) for i in sim_chunk_ids)) 365 | dbres = self.emb.dbexec(dbqry, None, "Get chunk relations") 366 | sim_chunk_lst = [] 367 | for each in dbres: 368 | for row in each: 369 | for itm in row: 370 | sim_chunk_lst.append((itm["subj"], itm['obj'], itm['relation'])) 371 | 372 | # Get document references, document images 373 | docs = ','.join(str(i) for i in sim_doc_ids) 374 | dbqry = self.emb.dbo_stmts['doc_refs'] 375 | dbqry = dbqry.replace("{qargs}", docs) 376 | dbres = self.emb.dbexec(dbqry, None, "Get document references") 377 | sim_doc_refs = {row[0]:row[1] for row in dbres} 378 | 379 | dbqry = self.emb.dbo_stmts['doc_images'] 380 | dbqry = dbqry.replace("{qargs}", docs) 381 | dbres = self.emb.dbexec(dbqry, None, "Get document images") 382 | sim_doc_imgs = {row[0]:row[1] for row in dbres} 383 | return res, set(sim_chunk_lst), sim_doc_refs, sim_doc_imgs 384 | 385 | def mdl_ui_response(self, 
qry, temp=1): 386 | """ Function returns ui friendly results from the LLM 387 | Returns a dictionary of UI elements {"answer", "graph", "images", "docs"} 388 | Query Answer, relations graph, images from documents, associated documents 389 | """ 390 | res, sim_chunk_lst, sim_doc_refs, sim_doc_imgs = self.mdl_response(qry, temp) 391 | # Build graph, return as jpeg image 392 | grph = graphviz.Digraph('wide') 393 | for row in sim_chunk_lst: 394 | grph.edge(row[0].lower(), row[1].lower(), row[2].lower()) 395 | unflt = grph.unflatten(stagger=5) 396 | grph_html = "

Relations graph

" 397 | grph = '
' 398 | grph_html += grph %(b64encode(unflt._repr_image_jpeg()).__repr__()[2:-1]) 399 | 400 | # Images in document 401 | img_html = '

Images in documents

' 402 | img = '

%s%s

' 403 | for key, val in sim_doc_imgs.items(): 404 | img_html += img %(key, val, key) 405 | 406 | # Documents queried for context 407 | doc_html = '

Documents referenced

' 408 | ref = '' 409 | for key, val in sim_doc_refs.items(): 410 | doc_html += ref %(val, val, key.split('.')[0]) 411 | return {"answer":res, "graph":grph_html, "images":img_html, "docs":doc_html} 412 | -------------------------------------------------------------------------------- /example_query.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ Example script to query LLM """ 5 | 6 | from coreutils import LLMOps 7 | 8 | 9 | llm = LLMOps() 10 | 11 | #typo in the query 12 | user_qry = "cetrizine guideline" 13 | dct = llm.mdl_ui_response(user_qry) 14 | 15 | html = f'

Question:

{user_qry}\ 16 |

LLM Response:

{dct["answer"]}
\ 17 | {dct["images"]} {dct["docs"]} {dct["graph"]} ' 18 | fname = "user_qry_results.html" 19 | with open(fname, 'wt', encoding="utf-8") as fl: 20 | fl.write(html) 21 | 22 | print(f"View the html file: {fname} for the results") 23 | -------------------------------------------------------------------------------- /pgdb_setup.sh: -------------------------------------------------------------------------------- 1 | # Refer: https://www.postgresql.org/download/linux/ubuntu/ 2 | sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs 2>/dev/null)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' 3 | 4 | # Import the repository signing key: 5 | wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - 6 | 7 | # Update the package lists: 8 | apt-get update 9 | 10 | # Install PostgreSQL 14 and its server development headers. 11 | # To use a different version, change '14' here and in the config paths below: 12 | apt-get -y install postgresql-14 postgresql-server-dev-14 13 | 14 | echo "host all all 192.168.0.0/16 scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf 15 | 16 | 17 | sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/" /etc/postgresql/14/main/postgresql.conf 18 | sed -i "s/shared_buffers = 128MB/shared_buffers = 256MB/" /etc/postgresql/14/main/postgresql.conf 19 | sed -i "s/#maintenance_work_mem = 64MB/maintenance_work_mem = 512MB/" /etc/postgresql/14/main/postgresql.conf 20 | sed -i "s/#jit = on/jit = off/" /etc/postgresql/14/main/postgresql.conf 21 | 22 | systemctl restart postgresql 23 | -------------------------------------------------------------------------------- /pgvector.sql: -------------------------------------------------------------------------------- 1 | create database ragdb; 2 | create user ragu with encrypted password 'yourpassword'; 3 | grant all privileges on database ragdb to ragu; 4 | 5 | \c ragdb 6 | CREATE EXTENSION if not exists vector; 7 | CREATE TABLE t_documents (id bigserial
PRIMARY KEY, 8 | doc_name varchar(256), 9 | doc_reference varchar, 10 | created_at timestamp default now()); 11 | 12 | CREATE TABLE t_document_chunks (id bigserial PRIMARY KEY, 13 | doc_id bigserial not null references t_documents(id), 14 | chunk jsonb, 15 | embedding vector(384), 16 | created_at timestamp default now()); 17 | 18 | CREATE INDEX ON t_document_chunks USING hnsw (embedding vector_ip_ops) WITH (m = 16, ef_construction = 128); 19 | 20 | CREATE TABLE t_chunk_relations (id bigserial PRIMARY KEY, 21 | doc_id bigserial not null references t_documents(id), 22 | chunk_id bigserial not null references t_document_chunks(id), 23 | text_relation varchar, 24 | json_relation jsonb); 25 | CREATE INDEX idx_t_chunk_relations ON t_chunk_relations (chunk_id); 26 | 27 | CREATE TABLE t_document_images (id bigserial PRIMARY KEY, 28 | doc_id bigserial not null references t_documents(id), 29 | img_reference varchar, 30 | img_desc varchar(256)); 31 | 32 | GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA PUBLIC to ragu; 33 | GRANT ALL ON ALL SEQUENCES IN SCHEMA PUBLIC to ragu; 34 | \q 35 | 36 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | ftfy==6.1.3 2 | numpy==1.26.3 3 | torch==2.0.1 4 | transformers==4.36.2 5 | accelerate==0.26.0 6 | sentence-transformers==2.2.2 7 | gradio==4.14.0 8 | psycopg[binary]==3.1.17 9 | spacy==3.7.2 10 | graphviz -------------------------------------------------------------------------------- /setup.sh: -------------------------------------------------------------------------------- 1 | 2 | # Install required python packages 3 | pip install -r requirements.txt 4 | 5 | # Install scispacy model, refer https://github.com/allenai/scispacy 6 | pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_lg-0.5.3.tar.gz 7 | 8 | # If postgresDB is not installed 9 |
sh pgdb_setup.sh 10 | 11 | # Setup the vector database for RAG 12 | su -c 'psql < pgvector.sql' postgres 13 | -------------------------------------------------------------------------------- /store_embeddings.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ Script to 5 | 1. Iterate all files under the directory 6 | 2. Read text file, chunk texts and save chunk+embeddings in pgvector DB 7 | """ 8 | 9 | from pathlib import Path 10 | 11 | from coreconfigs import _TEXTDIR 12 | from coreutils import Embeds 13 | 14 | 15 | texts = Path(_TEXTDIR) 16 | embd = Embeds(dbconn=True, text_relations=True) 17 | embd.save_embeddings_to_db(texts) 18 | -------------------------------------------------------------------------------- /text_processed/NBK549776.txt: -------------------------------------------------------------------------------- 1 | Cetirizine is a medication used in the treatment of allergic rhinitis and urticaria. 2 | It is a second-generation antihistamine. 3 | Cetirizine was FDA-approved in the United States as a prescription-only product in 1995, and later in 2007, it got approval as an over-the-counter medication. 4 | This activity reviews the indications, action, and contraindications for cetirizine as a valuable agent in treating rhinitis and urticaria. 5 | This activity will highlight the mechanism of action, adverse event profile, and other key factors (e.g., off-label uses, dosing, pharmacodynamics, pharmacokinetics, monitoring, relevant interactions) pertinent for members of the interprofessional team in the treatment of patients using cetirizine. 6 | Summarize the mechanism of action of cetirizine. 7 | Outline the adverse effects of cetirizine. 8 | Review the monitoring of cetirizine. 9 | Explain how healthcare professionals should educate patients on the potential adverse effects of cetirizine, such as drowsiness and fatigue.
10 | Cetirizine was FDA-approved in the United States as a prescription-only product in 1995, and later in 2007, it got approval as an over-the-counter medication. 11 | Derived from the first-generation antihistamine hydroxyzine, cetirizine does not cross the blood-brain barrier to the extent of its first-generation counterparts; as a result, cetirizine is an effective treatment of allergic rhinitis that simultaneously minimizes the possibility of adverse sedative effects. 12 | In addition, cetirizine is a second-generation antihistamine that effectively relieves sneezing, rhinorrhea, and watery eyes associated with seasonal allergies and allergic rhinitis due to allergens such as dust mites and molds. 13 | In addition, cetirizine is available as a prescription-only ophthalmic formulation to treat allergic conjunctivitis. 14 | Cetirizine is an FDA-approved medication for the relief and treatment of allergic rhinitis and chronic urticaria. 15 | Cetirizine also effectively reduces hives' severity and pruritus in patients with idiopathic urticaria. 16 | Cetirizine is also safe to use in the geriatric population. 17 | Cetirizine is safe to treat perennial allergic rhinitis and urticaria in adults and children over the age of 6 months; it is indicated for treating seasonal allergies in adults and children two years and older. 18 | The ophthalmic formulation of cetirizine is FDA-approved to treat allergic conjunctivitis. 19 | Second-generation antihistamines like cetirizine are safe and effective treatment options in patients with chronic urticaria and are considered first-line agents by the AAAAI and ACAAI guidelines. 20 | Cetirizine is used as an adjunct to epinephrine (off-label) for the management of anaphylaxis. 21 | (The American Academy of Allergy, Asthma & Immunology (AAAAI) and the American College of Allergy, Asthma & Immunology (ACAAI) guidelines). 22 | Cetirizine is a fast-acting, highly selective antagonist of the peripheral histamine H1 receptor. 
23 | The H1-receptors inhibited by cetirizine are primarily on respiratory smooth muscle cells, vascular endothelial cells, immune cells, and the gastrointestinal tract. 24 | Unlike first-generation antihistamines such as diphenhydramine and doxylamine, cetirizine does not cross the blood-brain barrier to a large extent, avoiding the neurons of the central nervous system. 25 | As a result, cetirizine produces minimal sedation compared to many first-generation antihistamines. 26 | Given its antagonism with histamine H1-receptors, cetirizine effectively reverses many of the effects of histamine. 27 | Like other second-generation antihistamines, cetirizine decreases vascular permeability, decreasing fluid escaping to tissues from capillaries. 28 | Cetirizine is also an inhibitor of histamine-induced bronchospasm. 29 | Cetirizine has been found to exert significant anti-inflammatory activity, reducing the infiltration of inflammatory cells in the setting of allergic rhinitis. 30 | Specifically, research has found that cetirizine minimizes the migration of neutrophils and eosinophils. 31 | Cetirizine is absorbed rapidly in the gastrointestinal tract and undergoes substantial excretion by the kidney. 32 | Cetirizine reaches peak plasma concentration after approximately one hour. 33 | Its effects typically begin after 20 to 60 minutes and persist for at least 24 hours. 34 | Food does not affect the extent of exposure (AUC) of cetirizine, but the time to attain peak concentration is delayed by 1.7 hours. 35 | The mean plasma protein binding of cetirizine is 93%. 36 | Cetirizine undergoes oxidative O-dealkylation to a metabolite with negligible antihistaminic activity. 37 | Cetirizine is not a substrate of the CYP450 system. 38 | Evidence indicates that cetirizine is a P-glycoprotein substrate, which should be considered in the concurrent use of cetirizine with P-gp inhibitors. 39 | The elimination half-life of cetirizine is 8.3 hours. 
40 | Cetirizine is primarily excreted through the kidney. 41 | Occasional and small doses of cetirizine are acceptable while breastfeeding. 42 | Prolonged use of larger doses may cause a decrease in the milk supply or drowsiness and other adverse effects in the infant, particularly in combination with pseudoephedrine. 43 | The use of an ophthalmic formulation of cetirizine by the mother is hypothesized to have minimal risk to the breastfed infant. 44 | Clinicians should advise the mother to apply pressure over the tear duct by the corner of the eye and remove the leftover solution to decrease the amount of drug that reaches the breast milk. 45 | Cetirizine is a former US FDA pregnancy category B medicine. 46 | The American College of Obstetricians and Gynecologists & the American College of Allergy, Asthma, and Immunology (ACOG-ACAAI) suggests cetirizine for pregnant women who requires antihistamine treatment. 47 | Cetirizine should be used in pregnancy only when necessary. 48 | According to manufacturers prescribing information, 12 years and older patients with hepatic impairment should reduce the dose to 5 mg once daily. 49 | The manufacturer also recommends lowering the dose for 6 to 11 years old patients with hepatic impairment. 50 | According to manufacturers prescribing information, 12 years and older patients with decreased renal function (CrCL 11-31 mL/min) and patients on hemodialysis (CrCL less than 7 mL/min) should reduce the dose to 5 mg once daily. 51 | The manufacturer also recommends lowering the dose for 6 to 11 years old patients with renal impairment. 52 | Each bottle of 0.24% cetirizine hydrochloride contains benzalkonium chloride and can be absorbed by contact lenses. 53 | Manufacturers advise patients to remove contact lenses before instilling eye drops and wait for 10 min until the reinsertion of lenses. 54 | If irritation or redness persists after this precaution, then avoid the use of contact lenses. 
55 | It is proven safe and effective for pediatric patients two years and above in clinical studies. 56 | Cetirizine is available as tablets, capsules, solutions, and orally disintegrating tablets. 57 | The dosing of cetirizine depends on the patient's age. 58 | In adults and children 12 years or older, the recommended dose is 5 or 10 mg per day orally, depending on symptom severity. 59 | It is available in 5 mg and 10 mg tablets and 5 mg/ 5 ml oral solution and elixir. 60 | The ophthalmic formulation is available as 0.24% cetirizine hydrochloride eye drops in 5 mL and 7.5 mL bottles. 61 | In children 6 to 11 years old, 5 or 10 mg (1 or 2 teaspoons) once daily in syrup form is recommended depending on symptom severity. 62 | In children 2 to 5 years old, the recommended dose is 2.5 mg (half a teaspoon) in syrup form once daily. 63 | In children six months to 23 months old, the recommended dose is 2.5 mg (half a teaspoon) in syrup form once daily. 64 | One drop (0.24% cetirizine hydrochloride ophthalmic solution) is instilled in the affected eye twice daily for patients with allergic conjunctivitis. 65 | Cetirizine is safe and relatively well-tolerated for treating allergic rhinitis and urticaria. 66 | Although uncommon, its primary adverse effects in adults include somnolence, fatigue, pharyngitis, dizziness, and dry mouth. 67 | Somnolence, as a result of cetirizine, appears to be dose-related. 68 | Research indicates that in some patients, cetirizine contributes to daytime sleepiness. 69 | Children taking cetirizine most commonly experience similar side effects as adults taking cetirizine (somnolence, fatigue, and dry mouth). 70 | Children, in particular, are more likely than adults to experience headaches while taking cetirizine. 71 | In pediatric patients aged 2 to 11 years, the majority of adverse reactions reported with cetirizine were mild or moderate. 72 | Among all, somnolence appeared to be dose-related and abdominal pain was considered treatment-related.
73 | Common adverse drug reactions of cetirizine ophthalmic solution are conjunctival hyperemia and instillation site pain. 74 | While on cetirizine therapy, few cases of transient, reversible hepatic transaminase elevations have been reported in the literature. 75 | Some reports exist of hepatitis with elevated bilirubin too. 76 | In postmarketing studies, rare, potentially severe adverse events like severe hypotension, anaphylaxis, hemolytic anemia, cholestasis, orofacial dyskinesia, glomerulonephritis, hepatitis, stillbirth, and thrombocytopenia are reported. 77 | Ophthalmic formulation's common adverse reactions are local pain at the instillation site, ocular hyperemia, and decreased visual acuity. 78 | Patients should be advised not to use cetirizine concurrently with alcohol or other CNS depressants, such as benzodiazepines or opioids, as it may cause dose-related sedation. 79 | Pitolisant is a histamine-3 receptor competitive antagonist and inverse agonist used in patients with narcolepsy. 80 | Concurrent use of pitolisant with antihistamines like cetirizine may diminish the therapeutic efficacy of pitolisant. 81 | Avoid combination. 82 | Cetirizine decreases gabapentin plasma concentrations and reduces systemic exposure to gabapentin. 83 | However, gabapentin is a CNS depressant; hence the pharmacodynamic synergism leading to additional CNS depression may also be observed. 84 | Cetirizine is a substrate of P-glycoprotein, and verapamil is an inhibitor of P-glycoprotein. 85 | Concurrent administration of both drugs prevents the efflux of cetirizine from the CNS and increases antihistaminic activity. 86 | Cetirizine should not be administered with erdafitinib as erdafitinib is also an inhibitor of P-glycoprotein (ABCB1, MDR1). 87 | Cetirizine is contraindicated in anyone with a known hypersensitivity to it or any of its ingredients.
88 | Cetirizine is also contraindicated in anyone with a known hypersensitivity to hydroxyzine, as cetirizine is a metabolite of hydroxyzine. 89 | There are few well-controlled human studies on cetirizine in pregnant mothers, although these showed it to be safe during pregnancy in animal studies. 90 | First-generation antihistamines, diphenhydramine, and doxylamine are safest to use during pregnancy. 91 | However, first-generation antihistamines are more likely than second-generation antihistamines to cause somnolence; clinicians should counsel the patients regarding the potential adverse effects of the medication they choose to take during pregnancy. 92 | Patients taking cetirizine require monitoring for the relief of symptoms. 93 | Healthcare team members should also monitor patients for adverse effects such as fatigue and somnolence in adults and headaches in children. 94 | The kidney primarily excretes cetirizine; as a result, the risk of toxicity is typically higher in patients with impaired renal function. 95 | Patients with renal impairment should take a lower medication dosage in their age bracket. 96 | Liver function and enzymes should also be closely monitored in patients with hepatic impairment. 97 | Healthcare providers should make dosage adjustments as needed for patients with hepatic impairment. 98 | Cetirizine may be confused with sertraline (look-alike-sound-alike drugs). 99 | Clinicians and pharmacists should be careful while prescribing and dispensing this drug. 100 | Research showed the minimal lethal dose to be approximately 460 times the maximum recommended daily dose for adults in rats. 101 | The primary target of acute toxicity in rodents was the central nervous system. 102 | The primary target of multiple-dose toxicity in rodents was the liver. 103 | A small number of cases of cetirizine overdose appear in the literature. 
104 | However, many overdoses of cetirizine in children result from improper medication storage by adults living in the same home. 105 | Most overdose incidents in children resolve spontaneously, with drowsiness and sedation being the main adverse effects observed. 106 | Drug-induced liver damage is common with numerous medications; there are reports of a small number of cases of cetirizine-induced liver damage; in all cases, liver enzyme values returned to normal after cessation of cetirizine. 107 | An adult who overdosed on 150 mg cetirizine had somnolence but did not have abnormal blood chemistry, hematology results, or other clinical signs. 108 | An infant overdosed on 180 mg of cetirizine and experienced restlessness and irritability, followed by drowsiness. 109 | Several hours after an accidental overdose of cetirizine, the six-year-old child presented with fixed and dilated pupils, tachycardia, agitation, hyperthermia, and hallucinations consistent with anticholinergic toxicity. 110 | There is no known specific antidote to cetirizine, and it can not be effectively removed by dialysis. 111 | When overdosed on cetirizine, treatment should be supportive and symptomatic, considering any concomitantly ingested medications. 112 | Cetirizine is a relatively safe and effective medication for treating allergic rhinitis, urticaria, and allergic conjunctivitis. 113 | As cetirizine is also available over the counter, prescribers should educate patients on possible side effects, such as drowsiness, fatigue, and dry mouth, while dispensing medicine. 114 | Healthcare providers should be careful when prescribing cetirizine to patients with impaired renal or hepatic function. 115 | Ophthalmologists should educate contact lens wearers to exercise precautions and proper direct use of eye drops. 
116 | Patients using eye drops should be informed that local pain at the instillation site, ocular hyperemia, and decreased visual acuity are common adverse reactions with the ophthalmic formulation. 117 | Immunologists play a crucial role in the management of refractory urticaria. 118 | Nurses should monitor therapeutic success and consult patients not to combine cetirizine with drugs that cause central nervous system depression. 119 | Pharmacists should perform thorough medication reconciliation and verify that the patient is not taking any medications or supplements that could exacerbate cetirizine's adverse effects. 120 | Clinicians(MD, DO, NP, PA), nurses, and pharmacists who prescribe or recommend cetirizine to patients should also provide information on the safe storage of cetirizine to prevent accidental overdose by children. 121 | If the overdose of cetirizine is intentional, a psychiatrist should be consulted. 122 | Communication and collaboration among interprofessional teams can achieve the best patient outcomes and reduce healthcare service utilization costs. 123 | -------------------------------------------------------------------------------- /texts_input/NBK501509.txt: -------------------------------------------------------------------------------- 1 | Small occasional doses of cetirizine are acceptable during breastfeeding. 2 | Larger doses or more prolonged use may cause drowsiness and other effects in the infant or decrease the milk supply, particularly in combination with a sympathomimetic such as pseudoephedrine or before lactation is well established. 3 | International guidelines recommend cetirizine as an acceptable choice if an antihistamine is required during breastfeeding. 4 | Cetirizine has been used successfully in cases of persistent pain of the breast during breastfeeding. 5 | Ophthalmic use of cetirizine by the mother should pose little risk to the breastfed infant. 
6 | To substantially diminish the amount of drug that reaches the breastmilk after using eye drops, place pressure over the tear duct by the corner of the eye for 1 minute or more, then remove the excess solution with an absorbent tissue. 7 | Three women who were exclusively breastfeeding their 5- to 6-month-old infants were taking cetirizine 10 mg daily by mouth. 8 | Each mother donated milk samples before a dose and 1, 2, 4, 6, 8, 10, 12 and 24 hours after the dose. 9 | An average peak level of 49 mcg/L occurred at an average of 2 hours after the dose. 10 | The average milk concentration over the 24-hour period was 21.1 mcg/L. An exclusively breastfed infant would receive an average of 3.1 mcg/kg daily or a weight-adjusted dosage of 1.77% of the maternal dosage. 11 | As part of a validation study on analysis of cetirizine and levocetirizine in breastmilk, 252 steady-state milk samples from 228 women taking either cetirizine 5 to 20 mg daily (n = 229) or levocetirizine 5 mg daily (n = 9) were analyzed. 12 | Specific dosages and times of milk collection were not given. 13 | The median milk concentrations of cetirizine and levocetirizine was 13 mcg/L (range 0.65 to 65 mcg/L; IQR 4.9 to 24.8 mcg/L) in 228 samples. 14 | Twenty-four samples had levels below the limit of quantification (<0.39 mcg/L). 15 | Women taking cetirizine (n = 31) collected complete sample of milk at about 0, 2, 4, 8, 12 and 24 hours after a daily dose and submitted aliquots for analysis. 16 | The average milk concentration was 16.8 mcg/L and the half-life in milk was 7 hours. 17 | The peak milk concentration averaged 41 mcg/L at an average time of 2.4 hours after a dose. 18 | Using the peak milk concentration, the authors calculated that a fully breastfed infant would receive a maximum of 2.5 mcg/kg daily, which represents a relative infant dose of 1.9%. 19 | Using the average concentration would result in a daily infant dosage of 1 mcg/kg daily and a relative infant dose of 0.8%. 
20 | In one telephone follow-up study, mothers reported irritability and colicky symptoms in 10% of infants exposed to various antihistamines and drowsiness was reported in 1.6% of infants. 21 | None of the reactions required medical attention. 22 | A woman who was nursing (extent not stated) her newborn infant was treated for pemphigus with oral prednisolone 25 mg daily, with the dosage increased over 2 weeks to 60 mg daily. 23 | She was also taking cetirizine 10 mg daily and topical betamethasone 0.1% twice daily to the lesions. 24 | Because of a poor response, the betamethasone was changed to clobetasol propionate ointment 0.05%. 25 | She continued breastfeeding throughout treatment and her infant was developing normally at 8 weeks of age and beyond. 26 | A woman with narcolepsy took sodium oxybate 4 grams each night at 10 pm and 2 am as well as fluoxetine 20 mg and cetirizine 5 mg daily throughout pregnancy and postpartum. 27 | She breastfed her infant except for 4 hours after the 10 pm oxybate dose and 4 hours after the 2 am dose. 28 | She either pumped breastmilk or breastfed her infant just before each dose of oxybate. 29 | The infant was exclusively breastfed or breastmilk fed for 6 months when solids were introduced. 30 | The infant was evaluated at 2, 4 and 6 months with the Ages and Stages Questionnaires, which were within the normal range as were the infant's growth and pediatrician's clinical impressions regarding the infant's growth and development. 31 | Three women were taking long-term cetirizine 10 mg daily by mouth while exclusively breastfeeding their 5- to 6-month-old infants. 32 | The mothers reported no adverse effects in their infants. 33 | Thirty-one women taking cetirizine 10 mg (n = 29) or 20 mg (n = 2) daily reported no adverse effects in 61% of their infants and minor adverse effects (fever, sedation, rash, poor feeding, bruising, refusing of the breast or constipation).
34 | But mothers attributed these effects to other causes such a cold, weaning or learning to crawl. 35 | Antihistamines in relatively high doses given by injection can decrease basal serum prolactin in nonlactating women and in early postpartum women. 36 | However, suckling-induced prolactin secretion is not affected by antihistamine pretreatment of postpartum mothers.[10] Whether lower oral doses of cetirizine have the same effect on serum prolactin or whether the effects on prolactin have any consequences on breastfeeding success have not been studied. 37 | The prolactin level in a mother with established lactation may not affect her ability to breastfeed. 38 | In a study of 31 women taking cetirizine 10 mg (n = 29) or 20 mg (n = 2) daily, 10 reported a perceived decrease in milk supply over the prior 3 days. 39 | -------------------------------------------------------------------------------- /texts_input/NBK548420.txt: -------------------------------------------------------------------------------- 1 | Cetirizine and its enantiomer levocetirizine are second generation antihistamines that are used for the treatment of allergic rhinitis, angioedema and chronic urticaria. 2 | Cetirizine and levocetirizine have been linked to rare, isolated instances of clinically apparent acute liver injury. 3 | Cetirizine (se tir' i zeen) is a second generation antihistamine (H1 receptor blocker) that is used widely to treat allergic symptoms associated with hay fever, seasonal allergies, urticaria, angioedema and atopic dermatitis. 4 | Levocetirizine (lee" voe se tir' i zeen) is the levorotatory R-enantiomer of cetirizine and its more active form. 5 | Cetirizine and levocetirizine belong to the piperazine class of antihistamines and, like other second generation antihistamines, are considered to be nonsedating. 
6 | Indeed, prospective studies have shown that sedation is less common with cetirizine and levocetirizine than with first generation antihistamines such as diphenhydramine, but some degree of sedation may still occur. 7 | Cetirizine was approved for use by prescription in the United States in 1995 and as an over-the-counter medication in 2007. 8 | Cetirizine is currently one of the most widely used medications with more than 5 million prescriptions filled yearly in addition to considerable nonprescription use. 9 | Cetirizine is available in 5 and 10 mg tablets and capsules in multiple generic forms and under the trade name Zyrtec. 10 | Oral solutions and fixed combinations with pseudoephedrine are also available. 11 | The typical dose of cetirizine is 5 to 10 mg once daily and it is often given chronically, at least during allergy season. 12 | Levocetirizine was approved for use in the United States in 2007 and is currently available by prescription only. 13 | Levocetirizine is available in 5 mg tablets and in an oral solution generically and under the brand name Xyzal. 14 | Common side effects of the second generation antihistamines include blurred vision, dry mouth and throat, palpitations, tachycardia, abdominal distress, constipation and headache. 15 | Although considered to be nonsedating antihistamines, cetirizine and levocetirizine can cause mild drowsiness particularly at higher doses. 16 | Antihistamines can worsen urinary retention and glaucoma. 17 | Cetirizine and levocetirizine use is not generally associated with liver enzyme elevations, but has been linked to rare instances of clinically apparent liver injury. 18 | In published reports, the time to onset varied widely, from 1 to 40 weeks and the pattern of injury ranged from cholestatic hepatitis to hepatocellular jaundice. 19 | The reported cases were mild to moderate in severity and self-limited in course with rapid recovery after stopping the medication. 
20 | Immunoallergic and autoimmune features were rare, but recurrence of acute liver injury upon reexposure to cetirizine has been described. 21 | Likelihood score: C (probable rare cause of clinically apparent liver injury). 22 | The cause of acute liver injury from cetirizine is not known. 23 | It is metabolized by the liver and a toxic metabolite may account for idiosyncratic injury. 24 | Acute liver injury from cetirizine and levocetirizine is rare and usually self-limited. 25 | Acute liver failure and vanishing bile duct syndrome have not been linked to these second generation antihistamines. 26 | Recurrence of liver injury has been described in patients who restart cetirizine. 27 | There is no information about cross reactivity among the various antihistamines after clinically apparent hepatotoxicity, but switching to another agent with a different structure and belonging to a separate class is probably safe. 28 | References on the safety and potential hepatotoxicity of antihistamines are given together after the Overview section on Antihistamines. 29 | A 28 year old man developed jaundice and pruritus after having taken cetirizine (10 mg daily) for 2 years for allergic rhinitis. 30 | He had no previous history of liver disease, drug allergies, or risk factors for viral hepatitis. 31 | He drank alcohol but only on weekends, although often consuming six 12-ounce cans of beer in one sitting. 32 | His other medical conditions included only allergic rhinitis and episodes of sinusitis. 33 | His only other medications were phenylephrine nasal sprays, and he specifically denied use of other over-the-counter drugs, herbal medications or nutritional supplements. 34 | On examination, he was jaundiced but had no fever, rash, hepatomegaly or signs of chronic liver disease. 35 | Laboratory testing showed a total serum bilirubin of 9.7 mg/dL, ALT 215 U/L, AST 61 U/L, and alkaline phosphatase 260 U/L (Table). 
36 | Tests for viral hepatitis and autoimmune liver disease were negative. 37 | Abdominal ultrasound and endoscopic retrograde cholangiopancreatography showed no evidence of gallstones or biliary obstruction. 38 | He stopped cetirizine at the time of presentation and was treated with hydroxyzine for control of pruritus. 39 | However, jaundice and pruritus persisted, and a liver biopsy was done which showed zone 3 cholestasis and mild hepatic necrosis and inflammation compatible with drug induced liver injury. 40 | There was no steatosis or changes suggestive of alcohol related liver injury. 41 | He was started on ursodiol and improved slowly, but serum enzymes and bilirubin remained slightly elevated at the time of the last follow up visit. 42 | A young man developed severe jaundice and pruritus after taking cetirizine for allergic rhinitis for several years. 43 | He denied taking other medications and had no clinical, serologic or radiologic evidence for viral hepatitis, autoimmune liver disease or biliary tract disease. 44 | A liver biopsy showed marked cholestasis but little or no inflammation, steatosis or fibrosis. 45 | The clinical phenotype might be best described as "bland cholestasis" which raises the possibility of unacknowledged use of anabolic steroids. 46 | The prolonged jaundice and pruritus also favor this diagnosis and the cetirizine may have been used as a cover for the illicit use of body building drugs. 47 | Actually, cetirizine is closely related to and metabolized to hydroxyzine (another member of the piperazine class of antihistamines), which he was given during the episode of jaundice because of pruritus. 48 | This report, like other case reports of cetirizine hepatotoxicity, is not totally convincing. 49 | --------------------------------------------------------------------------------