├── LICENSE
├── LLM-RAG-GRAPH.ipynb
├── README.md
├── assets
│   ├── Q1-graph.jpg
│   ├── Q1.jpg
│   ├── Q2-graph.jpg
│   └── Q2.jpg
├── coreconfigs.py
├── coreutils.py
├── example_query.py
├── pgdb_setup.sh
├── pgvector.sql
├── requirements.txt
├── setup.sh
├── store_embeddings.py
├── text_processed
│   └── NBK549776.txt
└── texts_input
    ├── NBK501509.txt
    └── NBK548420.txt
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/LLM-RAG-GRAPH.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "f5f52427",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "import gradio as gr\n",
11 | "from coreutils import LLMOps"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 2,
17 | "id": "c30710dd",
18 | "metadata": {},
19 | "outputs": [
20 | {
21 | "data": {
22 | "application/vnd.jupyter.widget-view+json": {
23 | "model_id": "028bbaa9ff794017ae348c296d05019f",
24 | "version_major": 2,
25 | "version_minor": 0
26 | },
27 | "text/plain": [
28 | "Loading checkpoint shards: 0%| | 0/8 [00:00, ?it/s]"
29 | ]
30 | },
31 | "metadata": {},
32 | "output_type": "display_data"
33 | },
34 | {
35 | "name": "stderr",
36 | "output_type": "stream",
37 | "text": [
38 | "WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.\n"
39 | ]
40 | }
41 | ],
42 | "source": [
43 | "llm = LLMOps()\n",
44 | "def mdl_gr_response(qry, temp):\n",
45 | " \"\"\" Gradio UI friendly results from the LLM \"\"\"\n",
46 | " dct = llm.mdl_ui_response(qry, temp)\n",
47 | " return (dct[\"answer\"], dct[\"graph\"], dct[\"images\"], dct[\"docs\"])"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 3,
53 | "id": "cda51ac7",
54 | "metadata": {},
55 | "outputs": [
56 | {
57 | "data": {
58 | "text/html": [
59 | "
"
60 | ],
61 | "text/plain": [
62 | ""
63 | ]
64 | },
65 | "metadata": {},
66 | "output_type": "display_data"
67 | },
68 | {
69 | "data": {
70 | "text/plain": []
71 | },
72 | "execution_count": 3,
73 | "metadata": {},
74 | "output_type": "execute_result"
75 | },
76 | {
77 | "name": "stdout",
78 | "output_type": "stream",
79 | "text": [
80 | "Embedding model ok.\n",
81 | "DB connection established.\n"
82 | ]
83 | }
84 | ],
85 | "source": [
86 | "%%capture --no-display\n",
87 | " \n",
88 | "#define gradio interface and other parameters\n",
89 | "with gr.Blocks() as app:\n",
90 | " with gr.Row():\n",
91 | " with gr.Column():\n",
92 | " input = gr.Textbox(label=\"Question\", show_copy_button=True)\n",
93 | " with gr.Column():\n",
94 | " slider = gr.Slider(1, 10, value=1, label=\"Randomness\", show_label=True,\n",
95 | " step=1, info=\"Higher values generate more diverse text.\")\n",
96 | " submit_btn = gr.Button(\"submit\")\n",
97 | " with gr.Row():\n",
98 | " with gr.Column():\n",
99 | " ans = gr.Textbox(label=\"Answer with context\", show_copy_button=True)\n",
100 | " with gr.Column():\n",
101 | " img_html = gr.HTML()\n",
102 | " with gr.Row():\n",
103 | " doc_html = gr.HTML()\n",
104 | " with gr.Row():\n",
105 | " grph_html = gr.HTML()\n",
106 | " submit_btn.click(fn=mdl_gr_response, \n",
107 | " inputs=[input, slider], \n",
108 | " outputs=[ans, grph_html, img_html, doc_html])\n",
109 | "app.load(show_progress=\"minimal\") \n",
110 | "app.launch(share=False, quiet=True, show_api=False, height=1300, show_error=True)"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": null,
116 | "id": "77c6240a",
117 | "metadata": {},
118 | "outputs": [],
119 | "source": []
120 | }
121 | ],
122 | "metadata": {
123 | "kernelspec": {
124 | "display_name": "Python 3 (ipykernel)",
125 | "language": "python",
126 | "name": "python3"
127 | },
128 | "language_info": {
129 | "codemirror_mode": {
130 | "name": "ipython",
131 | "version": 3
132 | },
133 | "file_extension": ".py",
134 | "mimetype": "text/x-python",
135 | "name": "python",
136 | "nbconvert_exporter": "python",
137 | "pygments_lexer": "ipython3",
138 | "version": "3.10.13"
139 | }
140 | },
141 | "nbformat": 4,
142 | "nbformat_minor": 5
143 | }
144 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Build relationship graphs using an LLM in a Retrieval-Augmented Generation (RAG) framework with pgvector as the vector database
2 |
3 |
4 | ## Overview
5 |
6 | Tool to build relationship graphs using a large language model (LLM).
7 | Supports adding context to the query using Retrieval-Augmented Generation (RAG). Context is built from an internal knowledge base. Context embeddings are stored in and retrieved from a vector database. Relationships are stored in the database.
8 |
9 |
10 | ## Tool Features
11 | - Store context in the vector database
12 | - Retrieve context from the vector database and supplement the query with it to improve LLM response quality
13 | - Along with the LLM response, visualize the relationships in the document(s) and highlight related documents and images
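
As an illustration of the retrieval feature, the sketch below mimics in plain Python how similarity ranking behaves when embeddings are normalized and compared with pgvector's inner-product operator `<#>`, which returns the negative inner product so that an ascending sort puts the most similar rows first. The helper names (`normalize`, `neg_inner_product`, `top_k`) and the three-dimensional toy vectors are assumptions made for this example; the application itself stores 384-dimensional embeddings and runs the ranking inside PostgreSQL.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def neg_inner_product(a, b):
    """Mimic pgvector's <#> operator: the negative inner product."""
    return -sum(x * y for x, y in zip(a, b))


def top_k(query, chunks, k):
    """Return the k chunk names with the smallest <#> distance to the query."""
    q = normalize(query)
    scored = [(neg_inner_product(q, normalize(vec)), name)
              for name, vec in chunks.items()]
    return [name for _, name in sorted(scored)[:k]]


# Hypothetical toy embeddings, for illustration only
chunks = {
    "antihistamines": [0.9, 0.1, 0.0],
    "dosage": [0.2, 0.8, 0.1],
    "unrelated": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], chunks, k=2))
```

Because every vector has unit length, the inner product equals the cosine similarity, which is why normalizing before storing and before querying lets the cheaper inner-product operator stand in for cosine distance.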
14 |
15 |
16 | ## Installation
17 | ### Prerequisites
18 |
19 | - [Python](https://www.python.org/downloads/) 3.10 or greater
20 | - See requirements.txt for the required Python libraries
21 |
22 | ### Supported Database
23 |
24 | - [PostgreSQL](https://www.postgresql.org/). Supports Postgres 11+. Tested on 14.10.
25 |
26 | ### Vector Database
27 |
28 | - [pgvector](https://github.com/pgvector/pgvector)
29 |
30 |
31 | ### Scripts
32 |
33 | - pgdb_setup.sh: Install PostgreSQL 14.10 on Ubuntu.
34 | - pgvector.sql: Configure the PostgreSQL database as a vector database.
35 | - setup.sh: Install the required Python packages and configure the vector database. Assumes the PostgreSQL database runs on the same host. Review the file before execution.
36 |
37 |
38 | ## Application
39 |
40 | - coreconfigs.py: Application configurations. An important file to review and edit.
41 | - store_embeddings.py: Wrapper script that reads the text files, then generates and stores embeddings and relationships in the pgvector database
42 | - example_query.py: Example that queries the LLM and saves the results as an HTML file
43 | - LLM-RAG-GRAPH.ipynb: Jupyter notebook with a Gradio interface for interacting with the LLM and visualizing the graph
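
When relationships are stored, the LLM is asked to return each sentence as a `{"subj": "", "relation": "", "obj": ""}` JSON triplet, and its raw output is not always a single valid JSON document. The sketch below shows one way such output can be parsed leniently. The function name `parse_triplets` and the sample string are made up for this example, and unlike the code in coreutils.py it skips a malformed fragment instead of abandoning the whole response; fragments with nested objects are not handled.

```python
import json


def parse_triplets(raw):
    """Parse LLM output into a list of triplet dicts.

    Tries the whole string first, then falls back to splitting on '{'
    so that several objects emitted back to back can still be recovered.
    """
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass

    triplets = []
    for piece in raw.split("{")[1:]:
        candidate = "{" + piece.replace("\n", "").strip().strip(",")
        try:
            triplets.append(json.loads(candidate))
        except json.JSONDecodeError:
            continue  # skip fragments that are still not valid JSON
    return triplets


# Hypothetical model output: two objects separated by a comma and a newline
raw = ('{"subj": "Cetirizine", "relation": "treats", "obj": "allergic rhinitis"},\n'
       '{"subj": "Cetirizine", "relation": "is", "obj": "antihistamine"}')
for triplet in parse_triplets(raw):
    print(triplet["subj"], "|", triplet["relation"], "|", triplet["obj"])
```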
44 |
45 |
46 | ## Getting Started
47 |
48 | ### Application config and run
49 | - Download the repo
50 | - Perform the installation steps (see above)
51 | - #### Edit coreconfigs.py to update the PostgreSQL DB connection.
52 |
53 | - Run store_embeddings.py to store the embeddings and relationships in the pgvector DB
54 |
55 | ```
56 | Embedding model ok.
57 | DB connection established.
58 | Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.39it/s]
59 | Processing text file: NBK548420.txt
60 | Get relations: Cetirizine and its enantiomer levocetirizine are second generation antihistamines that are used for the treatment of allergic rhinitis, angioedema and chronic urticaria.
61 | ...
62 | ...
63 | Embeddings commited for file: texts_input\NBK548420.txt
64 | ```
65 |
66 | - Run example_query.py to test
67 |
68 | ```
69 | python example_query.py
70 | Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.99it/s]
71 | WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
72 | Embedding model ok.
73 | DB connection established.
74 | View the html file: user_qry_results.html for the results
75 |
76 | ...
77 | ```
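
For orientation, the chunking that happens while the embeddings are stored can be sketched as follows: lines are accumulated until the running word count reaches a limit, and the accumulated lines then form one chunk. The function name `chunk_lines` is an assumption for this example, the default of 120 mirrors `_MAX_TKNLEN` in coreconfigs.py, and the count is of whitespace-separated words rather than model tokens. This sketch drops an empty trailing chunk, which is a simplification of the application code.

```python
def chunk_lines(lines, max_words=120):
    """Group stripped lines into chunks of at least max_words words.

    The last chunk holds whatever remains and may be shorter.
    """
    chunks, current, count = [], [], 0
    for line in lines:
        line = line.strip()
        current.append(line)
        count += len(line.split())
        if count >= max_words:
            chunks.append(current)
            current, count = [], 0
    if current:
        chunks.append(current)  # remaining lines become a separate chunk
    return chunks


# Tiny limit so the behaviour is visible on a few short lines
sample = ["a b c\n", "d e\n", "f g h i\n"]
print(chunk_lines(sample, max_words=5))
```

A chunk always ends on a line boundary, so a single long line can push a chunk well past the limit; keeping input lines to roughly one sentence each keeps chunk sizes predictable.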
78 |
79 |
80 | ## Example 1
81 |
82 |
83 |

84 |
Generated graph full resolution
85 |

86 |
87 |
88 |
89 | ## Example 2: Query with a typo
90 |
91 |
92 |

93 |
Generated graph full resolution
94 |

95 |
--------------------------------------------------------------------------------
/assets/Q1-graph.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q1-graph.jpg
--------------------------------------------------------------------------------
/assets/Q1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q1.jpg
--------------------------------------------------------------------------------
/assets/Q2-graph.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q2-graph.jpg
--------------------------------------------------------------------------------
/assets/Q2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryogesh/llm-rag-graph/37658d65b169b9342f37a567f31c63e8dc7b7488/assets/Q2.jpg
--------------------------------------------------------------------------------
/coreconfigs.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # -*- coding: utf-8 -*-
3 |
4 | """ All configuration items
5 | Important: CHANGE the postgresDB connection information below
6 | Optional: To ignore during text extraction, Change
7 | _IGNORE_SENTS: list of sentences (not words)
8 | """
9 |
10 | # Models used: Embedding model, LLM and Spacy model (for sentence identification)
11 | # Embedding model with a smaller dimension (384), which is better for vector DB performance
12 | # Use MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard for model details
13 | # Embedding Model Sequence Length = 512, setting _MAX_TKNLEN to about 25%
14 | # Text chunks shouldn't be too short or too long to be of good context
15 | _EMBED_MDL = "khoa-klaytn/bge-small-en-v1.5-angle"
16 | _DB_EMBED_DIM = 384
17 | _MAX_TKNLEN = 120
18 |
19 | # LLM
20 | _LLM_NAME = "HuggingFaceH4/zephyr-7b-beta"
21 | _LLM_MSG_TMPLT = [{ "role": "system", "content": "",}, {"role": "user", "content": ''},]
22 |
23 | # LLM model sequence length = 4k, we will provide about 1k tokens, _MAX_TKNLEN*_MAX_SIM_TXTS
24 | # Higher length requires higher GPU processing, memory and can lead to OoM error on smaller GPUs.
25 | # Reducing context tokens, reduces processing costs.
26 | # But short contexts may lead to inaccurate or repetitive answers.
27 | _MAX_SIM_TXTS = 4
28 |
29 | # SciSpacy model
30 | # see https://github.com/allenai/scispacy?tab=readme-ov-file#available-models
31 | _SPACYMDL = "en_core_sci_lg"
32 | _SPACY_MAX_TKNLEN = 25
33 |
34 | # Directory to store extracted texts
35 | _TEXTDIR = "texts_input"
36 | # Directory to store texts once embeddings are stored in vector DB
37 | _TXTSREADDIR = "texts_processed"
38 |
39 |
40 | #PgVector DB details
41 | _PGHOST = "1.1.1.1"
42 | _PGPORT = 5432
43 | _PGUSER = "ragu"
44 | _PGDB = "ragdb"
45 | _PGPWD = "yourpassword"
46 |
--------------------------------------------------------------------------------
/coreutils.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # -*- coding: utf-8 -*-
3 |
4 | """ coreutils module: Provides common utilities for other modules """
5 | from pathlib import Path
6 | from time import sleep
7 | import sys
8 | from datetime import datetime, timezone
9 | import json
10 | from base64 import b64encode
11 |
12 | import graphviz
13 |
14 | import numpy as np
15 | import psycopg
16 |
17 | import spacy
18 |
19 | import torch
20 | import transformers
21 | from sentence_transformers import SentenceTransformer
22 |
23 | from coreconfigs import _SPACYMDL, _LLM_NAME, _LLM_MSG_TMPLT, _EMBED_MDL, _TXTSREADDIR, \
24 | _DB_EMBED_DIM, _MAX_SIM_TXTS, _MAX_TKNLEN, _SPACY_MAX_TKNLEN, \
25 | _PGHOST, _PGPORT, _PGUSER, _PGDB, _PGPWD
26 |
27 |
28 | class DbOps():
29 | """ For database operations """
30 | def __init__(self):
31 | self.stmt = ''
32 | self.values = ''
33 | self._tmr = 0
34 | self._conn = ''
35 | self._dbconn_retry()
36 |
37 | def _dbconn_retry(self):
38 | try:
39 | self._conn = psycopg.connect(dbname=_PGDB,
40 | user=_PGUSER,
41 | password=_PGPWD,
42 | host=_PGHOST,
43 | port=_PGPORT,
44 | sslmode="prefer",
45 | connect_timeout=2)
46 | except psycopg.OperationalError:
47 | if self._tmr < 6:
48 | self._tmr += 3
49 | ### Server connection issue, try in few secs
50 | print(f"Unable to connect to database, trying in {self._tmr} secs...")
51 | sleep(self._tmr)
52 | self._dbconn_retry()
53 | else:
54 | self._tmr = 0
55 | raise
56 |
57 | def execstmt(self):
58 | """ Execute the DB statements """
59 | cur = self._conn.execute(self.stmt, self.values)
60 | res = ''
61 | if cur.description: #check for return rows
62 | res = cur.fetchall()
63 | return res
64 |
65 | def commit(self):
66 | """ Commits the transaction"""
67 | self._conn.commit()
68 |
69 | def rollback(self):
70 | """ Rollback the transaction"""
71 | self._conn.rollback()
72 |
73 |
74 | class Embeds():
75 | """
76 | Provides helper functions to
77 | 1. Iterate all the directories under _TEXTDIR
78 | Read text, chunk and save in pgvector DB
79 | 2. Generate embedding and store in pgvector DB
80 | 3. If text_relations = True, then for each sentence generate a <subj, relation, obj> triplet
81 | 4. search for similar texts in pgvector DB
82 | """
83 |
84 | def __init__(self, dbconn=True, text_relations=True):
85 | self.emb_mdl = SentenceTransformer(_EMBED_MDL)
86 | self._text_relations = text_relations
87 |
88 | ## Verify embedding dimension size before processing
89 | embeddings = self.emb_mdl.encode("Hello World")
90 | if _DB_EMBED_DIM != embeddings.size:
91 | print(f"DB field length={_DB_EMBED_DIM}. Embedding dimension={embeddings.size}")
92 | print("Choose a different model or change embedding dimension on DB.")
93 | print("Exiting...")
94 | sys.exit(1)
95 | else:
96 | print("Embedding model ok.")
97 | if dbconn:
98 | self.dbo = DbOps()
99 | print("DB connection established.")
100 | # similarity: <=> cosine, <-> L2, <#> inner product
101 | # We normalize embeddings so use <#>
102 | # Ensure t_document_chunks index is using vector_ip_ops
103 | self.dbo_stmts = {"upd_doc":"update t_documents set created_at=%s where id=%s",
104 | "ins_doc": "insert into t_documents (doc_name) values(%s) RETURNING id",
105 | "sel_doc": "select id from t_documents where doc_name = %s",
106 | "doc_refs": "select doc_name, doc_reference from t_documents \
107 | where id in ({qargs}) and doc_reference is NOT NULL",
108 | "doc_images": "select img_desc, img_reference from t_document_images \
109 | where doc_id in ({qargs}) ",
110 | "del_txts": "delete from t_document_chunks where doc_id = %s",
111 | "del_relations": "delete from t_chunk_relations where doc_id = %s",
112 | "ins_txt": "insert into t_document_chunks (doc_id, chunk, embedding) \
113 | values(%s, %s, %s) RETURNING id",
114 | "ins_relations": "insert into t_chunk_relations (doc_id, chunk_id, text_relation, json_relation) \
115 | values(%s, %s, %s, %s)",
116 | "ins_relations_nj": "insert into t_chunk_relations (doc_id, chunk_id, text_relation) \
117 | values(%s, %s, %s)",
118 | "sim_chunks": "select json_relation from t_chunk_relations \
119 | where chunk_id in ({qargs}) and json_relation is NOT NULL",
120 | "sim_txts": f"SELECT id, doc_id, chunk FROM t_document_chunks \
121 | ORDER BY embedding <#> %s LIMIT {_MAX_SIM_TXTS}"
122 | }
123 | if self._text_relations:
124 | self.prsr = spacy.load(_SPACYMDL)
125 | self.llm = LLMOps()
126 | self.llm.gconfigdct["temperature"] = .1
127 | self.llm.gconfigdct["max_new_tokens"] = 512
128 | system_content = """Translate the user content as entity relation triplet in
129 | {"subj": "", "relation": "", "obj": ""} json format."""
130 | self.llm.msg_tmplt[0]['content'] = system_content
131 | self.gconfig = transformers.GenerationConfig(**self.llm.gconfigdct)
132 |
133 | def np_to_str(self, val):
134 | """Convert np.float32 to np.float64. json.dumps supports it."""
135 | return np.float64(val)
136 |
137 | def dbexec(self, stmt, values, msg):
138 | """
139 | Generic function for executing database statements
140 | Returns '' if the statement produces no result set
141 | Returns all rows otherwise
142 | """
143 | self.dbo.stmt = stmt
144 | self.dbo.values = values
145 | retval = ''
146 | try:
147 | retval = self.dbo.execstmt()
148 | except Exception:
149 | print(f"{msg} failed....")
150 | print("Rolling back transaction")
151 | print(f"Statement: {self.dbo.stmt}")
152 | print(f"Values: {self.dbo.values}")
153 | self.dbo.rollback()
154 | raise
155 | self.dbo.stmt = ''
156 | self.dbo.values = ''
157 | return retval
158 |
159 | def save_chunk_relations(self, txtlst, docid, chunkid):
160 | """ Get the relation triplet from LLM and insert into DB """
161 | def process_itm(jsn):
162 | json_val = ''
163 | if isinstance(jsn["obj"], str):
164 | if jsn.get("obj_qualifier"):
165 | jsn["relation"] = f'{jsn["obj_qualifier"]} {jsn["relation"]}'
166 | if jsn.get("context"):
167 | jsn["relation"] = f'{jsn["subj"]} {jsn["relation"]}'
168 | jsn["subj"] = jsn["context"]
169 | if jsn["obj"]:
170 | json_val = jsn
171 | elif isinstance(jsn["obj"], list):
172 | obj = ''
173 | try:
174 | if isinstance(jsn["obj"][0], str):
175 | obj = ', '.join(jsn["obj"])
176 | elif isinstance(jsn["obj"][0], dict):
177 | obj = ', '.join({i["subj"] for i in jsn["obj"] if i["subj"]})
178 | if obj:
179 | jsn["obj"] = obj
180 | json_val = jsn
181 | except TypeError:
182 | print("Ignoring list triplet due to incorrect json format")
183 | return json_val
184 |
185 | for text in txtlst:
186 | doc = self.prsr(text)
187 | pos = {tkn.pos_ for tkn in doc}
188 | # Generate relations only on sentences
189 | # with < 25 (default) tokens, else the generated relations can be too complicated.
190 | # with Noun and Verb
191 | if len(doc) < _SPACY_MAX_TKNLEN and 'NOUN' in pos and 'VERB' in pos:
192 | self.llm.msg_tmplt[1]['content'] = text
193 | print(f"Get relations: {text}")
194 | prompt = self.llm.pipeline.tokenizer.apply_chat_template(self.llm.msg_tmplt,
195 | tokenize=False,
196 | add_generation_prompt=False)
197 | outputs = self.llm.pipeline(prompt, generation_config=self.gconfig)
198 | res = outputs[0]["generated_text"].split("<|assistant|>\n")[1]
199 | print(f"Generated triplet: {res}")
200 | jlst = []
201 | try:
202 | jsn = json.loads(res)
203 | except json.decoder.JSONDecodeError:
204 | try:
205 | for itm in res.split('{')[1:]:
206 | jsn = json.loads('{'+itm.replace('\n','').strip(','))
207 | jitm = process_itm(jsn)
208 | if jitm:
209 | jlst.append(jitm)
210 | except json.decoder.JSONDecodeError:
211 | print("Ignoring triplet due to incorrect json format")
212 | else:
213 | jitm = process_itm(jsn)
214 | if jitm:
215 | jlst.append(jitm)
216 | if jlst:
217 | _ = self.dbexec(self.dbo_stmts['ins_relations'],
218 | (docid, chunkid, res, json.dumps(jlst)),
219 | "Insert chunk relations")
220 | else:
221 | _ = self.dbexec(self.dbo_stmts['ins_relations_nj'],
222 | (docid, chunkid, res),
223 | "Insert chunk relations")
224 |
225 | def save_embeddings_to_db(self, fldr, parent='.'):
226 | """
227 | Iterate all the directories under _TEXTDIR (fldr)
228 | Read text file, chunk texts and save chunk+embeddings in pgvector DB
229 | For each sentence, get the <subj, relation, obj> relation triplet;
230 | the LLM optionally provides "obj_qualifier" or "context"
231 | """
232 | def emb_to_db(txtchunk, txtlst, docid):
233 | embeddings = self.emb_mdl.encode(txtchunk)
234 | # Normalizing the embeddings, just in case
235 | # For a 1-D array, np.linalg.norm defaults to the L2 (Euclidean) norm
236 | # https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html
237 | fnorm = np.linalg.norm(embeddings)
238 | lst = list(embeddings/fnorm)
239 | # json can serialize np.float64 (a Python float subclass) but not np.float32, so convert
240 | embed_str = json.dumps(lst, default=np.float64)
241 | chunkid = self.dbexec(self.dbo_stmts['ins_txt'],
242 | (docid, json.dumps(txtlst), embed_str),
243 | "Insert chunk into Document")
244 | #print(f"docid: {docid}, chunkid:{chunkid[0][0]}")
245 | if self._text_relations:
246 | self.save_chunk_relations(txtlst, docid, chunkid[0][0])
247 |
248 | for rfl in fldr.iterdir():
249 | if rfl.is_file():
250 | print(f"Processing text file: {rfl.name}")
251 | # If the file has been processed already, delete the document chunks and reprocess
252 | doc_id = self.dbexec(self.dbo_stmts['sel_doc'], (rfl.name, ),
253 | "Check for Document")
254 |
255 | if doc_id:
256 | docid = doc_id[0][0]
257 | _ = self.dbexec(self.dbo_stmts['del_relations'], (docid, ),
258 | "Deleting chunk relations")
259 | _ = self.dbexec(self.dbo_stmts['del_txts'], (docid, ),
260 | "Deleting document chunks")
261 | _ = self.dbexec(self.dbo_stmts['upd_doc'],
262 | (datetime.now(tz=timezone.utc), docid),
263 | "Updating document timestamp")
264 | else:
265 | doc_id = self.dbexec(self.dbo_stmts['ins_doc'], (rfl.name, ),
266 | "Insert Document")
267 | docid = doc_id[0][0]
268 | with open(rfl, encoding="utf-8", errors="replace") as txt_fl:
269 | filetexts = txt_fl.readlines()
270 | txtchunk = ''
271 | txtlst = []
272 | for txt in filetexts:
273 | txt = txt.strip()
274 | txtchunk = f"{txtchunk} {txt}"
275 | txtlst.append(txt)
276 | if len(txtchunk.split()) >= _MAX_TKNLEN:
277 | emb_to_db(txtchunk, txtlst, docid)
278 | txtlst = []
279 | txtchunk = ''
280 | if txtlst: emb_to_db(txtchunk, txtlst, docid) # Pending text becomes a separate chunk; skip if nothing is left
281 | self.dbo.commit()
282 | print(f"Embeddings committed for file: {rfl}")
283 | try:
284 | _ = rfl.replace(Path(_TXTSREADDIR, parent, rfl.name))
285 | except (PermissionError, FileExistsError, FileNotFoundError) as err:
286 | print(f"File not moved: {err}")
287 | print("Ignoring error...")
288 |
289 | if rfl.is_dir():
290 | print(f"Creating text processed directory: {rfl.name}")
291 | Path(_TXTSREADDIR, rfl.name).mkdir(parents=True, exist_ok=True)
292 | self.save_embeddings_to_db(rfl, rfl.name)
293 | # Delete the processed text directory, ignore error if any file exists
294 | try:
295 | rfl.rmdir()
296 | except (OSError, FileNotFoundError) as err:
297 | print(f"Directory not deleted: {err}")
298 | print("Ignoring error...")
299 |
300 | def get_similar_texts(self, text):
301 | """
302 | 1. Generate text embedding.
303 | 2. Compare similarity against vectorDB and get texts similar to the input text.
304 | """
305 | embeddings = self.emb_mdl.encode(text)
306 | # Normalize before querying the DB
307 | fnorm = np.linalg.norm(embeddings)
308 | lst = list(embeddings/fnorm)
309 | # json supports only np.float64. Convert np.float32
310 | embed_str = json.dumps(lst, default=np.float64)
311 | sim_txts = self.dbexec(self.dbo_stmts['sim_txts'], (embed_str,), "Get similar texts")
312 | sim_chunk_ids = {itm[0] for itm in sim_txts}
313 | sim_doc_ids = {itm[1] for itm in sim_txts}
314 | all_txts = []
315 | contxt = ''
316 | # Avoid duplicate sentences, less noise in context is better for LLM response
317 | for itm in sim_txts:
318 | for txt in itm[2]:
319 | if txt not in all_txts:
320 | all_txts.append(txt)
321 | contxt = f"{contxt} {txt}"
322 | # Do not exceed the tokens limit
323 | if len(contxt.split()) >= _MAX_TKNLEN*_MAX_SIM_TXTS:
324 | break
325 | return contxt, sim_chunk_ids, sim_doc_ids
326 |
327 | class LLMOps():
328 | """For LLM operations """
329 | def __init__(self):
330 | self.pipeline = transformers.pipeline("text-generation",
331 | model=_LLM_NAME,
332 | torch_dtype=torch.bfloat16,
333 | device_map="auto",
334 | )
335 | self.gconfigdct = self.pipeline.model.generation_config.to_dict()
336 | self.gconfigdct["max_new_tokens"] = 256
337 | self.gconfigdct["do_sample"] = True
338 | self.gconfigdct["top_k"] = 50
339 | self.gconfigdct["top_p"] = 0.95
340 | self.gconfigdct["pad_token_id"] = self.pipeline.model.config.eos_token_id
341 | self.gconfigdct["temperature"] = .7
342 | self.emb = ''
343 | self.msg_tmplt = _LLM_MSG_TMPLT
344 |
345 | def mdl_response(self, qry, temp=1):
346 | """ Function returns the answer from the LLM with context"""
347 | if temp < 1 or temp > 9:
348 | temp = 7
349 | if not self.emb:
350 | self.emb = Embeds(text_relations=False)
351 | contxt, sim_chunk_ids, sim_doc_ids = self.emb.get_similar_texts(qry)
352 | self.msg_tmplt[1]['content'] = contxt
353 | prompt = self.pipeline.tokenizer.apply_chat_template(self.msg_tmplt, tokenize=False,
354 | add_generation_prompt=True)
355 | self.gconfigdct["temperature"] = temp/10
356 | gconfig = transformers.GenerationConfig(**self.gconfigdct)
357 | outputs = self.pipeline(prompt, generation_config=gconfig)
358 | res = outputs[0]["generated_text"].split("<|assistant|>\n")[1]
359 | #print(f"Similar Chunk ids: {sim_chunk_ids}")
360 | #print(f"Similar Doc ids: {sim_doc_ids}")
361 |
362 | # Get document relations
363 | dbqry = self.emb.dbo_stmts['sim_chunks']
364 | dbqry = dbqry.replace("{qargs}", ','.join(str(i) for i in sim_chunk_ids))
365 | dbres = self.emb.dbexec(dbqry, None, "Get chunk relations")
366 | sim_chunk_lst = []
367 | for each in dbres:
368 | for row in each:
369 | for itm in row:
370 | sim_chunk_lst.append((itm["subj"], itm['obj'], itm['relation']))
371 |
372 | # Get document references, document images
373 | docs = ','.join(str(i) for i in sim_doc_ids)
374 | dbqry = self.emb.dbo_stmts['doc_refs']
375 | dbqry = dbqry.replace("{qargs}", docs)
376 | dbres = self.emb.dbexec(dbqry, None, "Get document references")
377 | sim_doc_refs = {row[0]:row[1] for row in dbres}
378 |
379 | dbqry = self.emb.dbo_stmts['doc_images']
380 | dbqry = dbqry.replace("{qargs}", docs)
381 | dbres = self.emb.dbexec(dbqry, None, "Get document images")
382 | sim_doc_imgs = {row[0]:row[1] for row in dbres}
383 | return res, set(sim_chunk_lst), sim_doc_refs, sim_doc_imgs
384 |
385 | def mdl_ui_response(self, qry, temp=1):
386 | """ Function returns ui friendly results from the LLM
387 | Returns a dictionary of UI elements {"answer", "graph", "images", "docs"}
388 | Query Answer, relations graph, images from documents, associated documents
389 | """
390 | res, sim_chunk_lst, sim_doc_refs, sim_doc_imgs = self.mdl_response(qry, temp)
391 | # Build graph, return as jpeg image
392 | grph = graphviz.Digraph('wide')
393 | for row in sim_chunk_lst:
394 | grph.edge(row[0].lower(), row[1].lower(), row[2].lower())
395 | unflt = grph.unflatten(stagger=5)
396 | grph_html = "<h3>Relations graph</h3>"
397 | grph = '<img src="data:image/jpeg;base64,%s"/>'
398 | grph_html += grph %(b64encode(unflt._repr_image_jpeg()).__repr__()[2:-1])
399 |
400 | # Images in document
401 | img_html = 'Images in documents
'
402 | img = '%s
'
403 | for key, val in sim_doc_imgs.items():
404 | img_html += img %(key, val, key)
405 |
406 | # Documents queried for context
407 | doc_html = 'Documents referenced
'
408 | ref = ''
409 | for key, val in sim_doc_refs.items():
410 | doc_html += ref %(val, val, key.split('.')[0])
411 | return {"answer":res, "graph":grph_html, "images":img_html, "docs":doc_html}
412 |
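The context assembly in `get_similar_texts` can be sketched on its own. This is a minimal illustration, not the repository's code: `build_context` and its parameters are hypothetical names, a set replaces the list membership test, and the word budget is assumed to be checked once per chunk because the flattened listing does not show the loop nesting.

```python
def build_context(chunks, max_words):
    """Join unique sentences from ranked chunks until a word budget is reached.

    chunks: list of chunks, each a list of sentence strings (most similar first).
    max_words: whitespace-separated word budget (hypothetical parameter).
    """
    seen = set()      # O(1) duplicate check instead of scanning a list
    parts = []
    n_words = 0
    for chunk in chunks:
        for sentence in chunk:
            if sentence in seen:
                continue
            seen.add(sentence)
            parts.append(sentence)
            n_words += len(sentence.split())
        # Assumption: the budget is checked after each whole chunk
        if n_words >= max_words:
            break
    return " ".join(parts)


if __name__ == "__main__":
    ranked = [["Cetirizine is an antihistamine.", "It is fast-acting."],
              ["Cetirizine is an antihistamine.", "It is excreted by the kidney."]]
    print(build_context(ranked, max_words=50))
```

Counting words incrementally avoids re-splitting the whole context string on every iteration, which matters only when many chunks are returned.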
--------------------------------------------------------------------------------
/example_query.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # -*- coding: utf-8 -*-
3 |
4 | """ Example script to query LLM """
5 |
6 | from coreutils import LLMOps
7 |
8 |
9 | llm = LLMOps()
10 |
11 | #typo in the query
12 | user_qry = "cetrizine guideline"
13 | dct = llm.mdl_ui_response(user_qry)
14 |
15 | html = f'<h3>Question:</h3> {user_qry}\
16 | <h3>LLM Response:</h3> {dct["answer"]} <br>\
17 | {dct["images"]} {dct["docs"]} {dct["graph"]} '
18 | fname = "user_qry_results.html"
19 | with open(fname, 'wt', encoding="utf-8") as fl:
20 | fl.write(html)
21 |
22 | print(f"View the html file: {fname} for the results")
23 |
--------------------------------------------------------------------------------
/pgdb_setup.sh:
--------------------------------------------------------------------------------
1 | # Refer: https://www.postgresql.org/download/linux/ubuntu/
2 | sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs 2>/dev/null)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
3 |
4 | # Import the repository signing key:
5 | wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
6 |
7 | # Update the package lists:
8 | apt-get update
9 |
10 | # Install PostgreSQL 14 and its development headers.
11 | # For a different version, change the '14' suffix here and in the config paths below:
12 | apt-get -y install postgresql-14 postgresql-server-dev-14
13 |
14 | echo "host all all 192.168.0.0/16 scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf
15 |
16 |
17 | sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/" /etc/postgresql/14/main/postgresql.conf
18 | sed -i "s/shared_buffers = 128MB/shared_buffers = 256MB/" /etc/postgresql/14/main/postgresql.conf
19 | sed -i "s/#maintenance_work_mem = 64MB/maintenance_work_mem = 512MB/" /etc/postgresql/14/main/postgresql.conf
20 | sed -i "s/#jit = on/jit = off/" /etc/postgresql/14/main/postgresql.conf
21 |
22 | systemctl restart postgresql
23 |
--------------------------------------------------------------------------------
/pgvector.sql:
--------------------------------------------------------------------------------
1 | create database ragdb;
2 | create user ragu with encrypted password 'yourpassword';
3 | grant all privileges on database ragdb to ragu;
4 |
5 | \c ragdb
6 | CREATE EXTENSION if not exists vector;
7 | CREATE TABLE t_documents (id bigserial PRIMARY KEY,
8 | doc_name varchar(256),
9 | doc_reference varchar,
10 | created_at timestamp default now());
11 |
12 | CREATE TABLE t_document_chunks (id bigserial PRIMARY KEY,
13 | doc_id bigint not null references t_documents(id),
14 | chunk jsonb,
15 | embedding vector(384),
16 | created_at timestamp default now());
17 |
18 | CREATE INDEX ON t_document_chunks USING hnsw (embedding vector_ip_ops) WITH (m = 16, ef_construction = 128);
19 |
20 | CREATE TABLE t_chunk_relations (id bigserial PRIMARY KEY,
21 | doc_id bigint not null references t_documents(id),
22 | chunk_id bigint not null references t_document_chunks(id),
23 | text_relation varchar,
24 | json_relation jsonb);
25 | CREATE INDEX idx_t_chunk_relations ON t_chunk_relations (chunk_id);
26 |
27 | CREATE TABLE t_document_images (id bigserial PRIMARY KEY,
28 | doc_id bigint not null references t_documents(id),
29 | img_reference varchar,
30 | img_desc varchar(256));
31 |
32 | GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA PUBLIC to ragu;
33 | GRANT ALL ON ALL SEQUENCES IN SCHEMA PUBLIC to ragu;
34 | \q
35 |
36 |
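The HNSW index above uses `vector_ip_ops` (inner product), while the Python code normalizes every embedding before storing or querying it. The sketch below, in plain Python with made-up two-dimensional vectors, shows why the two fit together: for unit-length vectors the inner product equals the cosine similarity. Note that pgvector's `<#>` operator returns the negative inner product, so ascending order means most similar first. The helper names here are illustrative and do not come from the repository.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]   # toy vectors, not real embeddings
    ua, ub = normalize(a), normalize(b)
    # Both values agree up to floating-point rounding
    print(inner_product(ua, ub), cosine_similarity(a, b))
```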
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | ftfy==6.1.3
2 | numpy==1.26.3
3 | torch==2.0.1
4 | transformers==4.36.2
5 | accelerate==0.26.0
6 | sentence-transformers==2.2.2
7 | gradio==4.14.0
8 | psycopg[binary]==3.1.17
9 | spacy==3.7.2
10 | graphviz
--------------------------------------------------------------------------------
/setup.sh:
--------------------------------------------------------------------------------
1 |
2 | # Install required python packages
3 | pip install -r requirements.txt
4 |
5 | # Install scispacy model, refer https://github.com/allenai/scispacy
6 | pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_lg-0.5.3.tar.gz
7 |
8 | # Install and configure PostgreSQL if it is not already installed
9 | sh pgdb_setup.sh
10 |
11 | # Setup the vector database for RAG
12 | su -c 'psql < pgvector.sql' postgres
13 |
--------------------------------------------------------------------------------
/store_embeddings.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # -*- coding: utf-8 -*-
3 |
4 | """ Script to
5 | 1. Iterate all files under the directory
6 | 2. Read text file, chunk texts and save chunk+embeddings in pgvector DB
7 | """
8 |
9 | from pathlib import Path
10 |
11 | from coreconfigs import _TEXTDIR
12 | from coreutils import Embeds
13 |
14 |
15 | texts = Path(_TEXTDIR)
16 | embd = Embeds(dbconn=True, text_relations=True)
17 | embd.save_embeddings_to_db(texts)
18 |
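The chunking step that `save_embeddings_to_db` performs on each file can be illustrated without a database or embedding model. This is a sketch under stated assumptions: `chunk_lines` and `max_words` are hypothetical names (the repository uses `_MAX_TKNLEN`, whose value is not shown here), and the sketch deliberately skips blank lines and an empty trailing chunk, which may differ from the original behavior.

```python
def chunk_lines(lines, max_words):
    """Group sentence lines into chunks of roughly max_words words.

    A chunk is closed as soon as its running word count reaches max_words,
    so chunks can slightly exceed the budget. Leftover lines form a last chunk.
    """
    chunks = []
    current = []
    n_words = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no content
        current.append(line)
        n_words += len(line.split())
        if n_words >= max_words:
            chunks.append(current)
            current = []
            n_words = 0
    if current:  # do not emit an empty trailing chunk
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = ["Cetirizine is an antihistamine.",
              "It is used for allergic rhinitis.",
              "It is excreted by the kidney."]
    for chunk in chunk_lines(sample, max_words=8):
        print(chunk)
```

Keeping each chunk as a list of sentences, rather than one joined string, matches how the chunk is stored as JSON and later deduplicated sentence by sentence.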
--------------------------------------------------------------------------------
/text_processed/NBK549776.txt:
--------------------------------------------------------------------------------
1 | Cetirizine is a medication used in the treatment of allergic rhinitis and urticaria.
2 | It is a second-generation antihistamine.
3 | Cetirizine was FDA-approved in the United States as a prescription-only product in 1995, and later in 2007, it got approval as an over-the-counter medication.
4 | This activity reviews the indications, action, and contraindications for cetirizine as a valuable agent in treating rhinitis and urticaria.
5 | This activity will highlight the mechanism of action, adverse event profile, and other key factors (e.g., off-label uses, dosing, pharmacodynamics, pharmacokinetics, monitoring, relevant interactions) pertinent for members of the interprofessional team in the treatment of patients using cetirizine.
6 | Summarize the mechanism of action of cetirizine.
7 | Outline the adverse effects of cetirizine.
8 | Review the monitoring of cetirizine.
9 | Explain how healthcare professionals should educate patients on the potential adverse effects of cetirizine, such as drowsiness and fatigue.
10 | Cetirizine was FDA-approved in the United States as a prescription-only product in 1995, and later in 2007, it got approval as an over-the-counter medication.
11 | Derived from the first-generation antihistamine hydroxyzine, cetirizine does not cross the blood-brain barrier to the extent of its first-generation counterparts; as a result, cetirizine is an effective treatment of allergic rhinitis that simultaneously minimizes the possibility of adverse sedative effects.
12 | In addition, cetirizine is a second-generation antihistamine that effectively relieves sneezing, rhinorrhea, and watery eyes associated with seasonal allergies and allergic rhinitis due to allergens such as dust mites and molds.
13 | In addition, cetirizine is available as a prescription-only ophthalmic formulation to treat allergic conjunctivitis.
14 | Cetirizine is an FDA-approved medication for the relief and treatment of allergic rhinitis and chronic urticaria.
15 | Cetirizine also effectively reduces hives' severity and pruritus in patients with idiopathic urticaria.
16 | Cetirizine is also safe to use in the geriatric population.
17 | Cetirizine is safe to treat perennial allergic rhinitis and urticaria in adults and children over the age of 6 months; it is indicated for treating seasonal allergies in adults and children two years and older.
18 | The ophthalmic formulation of cetirizine is FDA-approved to treat allergic conjunctivitis.
19 | Second-generation antihistamines like cetirizine are safe and effective treatment options in patients with chronic urticaria and are considered first-line agents by the AAAAI and ACAAI guidelines.
20 | Cetirizine is used as an adjunct to epinephrine (off-label) for the management of anaphylaxis.
21 | (The American Academy of Allergy, Asthma & Immunology (AAAAI) and the American College of Allergy, Asthma & Immunology (ACAAI) guidelines).
22 | Cetirizine is a fast-acting, highly selective antagonist of the peripheral histamine H1 receptor.
23 | The H1-receptors inhibited by cetirizine are primarily on respiratory smooth muscle cells, vascular endothelial cells, immune cells, and the gastrointestinal tract.
24 | Unlike first-generation antihistamines such as diphenhydramine and doxylamine, cetirizine does not cross the blood-brain barrier to a large extent, avoiding the neurons of the central nervous system.
25 | As a result, cetirizine produces minimal sedation compared to many first-generation antihistamines.
26 | Given its antagonism with histamine H1-receptors, cetirizine effectively reverses many of the effects of histamine.
27 | Like other second-generation antihistamines, cetirizine decreases vascular permeability, decreasing fluid escaping to tissues from capillaries.
28 | Cetirizine is also an inhibitor of histamine-induced bronchospasm.
29 | Cetirizine has been found to exert significant anti-inflammatory activity, reducing the infiltration of inflammatory cells in the setting of allergic rhinitis.
30 | Specifically, research has found that cetirizine minimizes the migration of neutrophils and eosinophils.
31 | Cetirizine is absorbed rapidly in the gastrointestinal tract and undergoes substantial excretion by the kidney.
32 | Cetirizine reaches peak plasma concentration after approximately one hour.
33 | Its effects typically begin after 20 to 60 minutes and persist for at least 24 hours.
34 | Food does not affect the extent of exposure (AUC) of cetirizine, but the time to attain peak concentration is delayed by 1.7 hours.
35 | The mean plasma protein binding of cetirizine is 93%.
36 | Cetirizine undergoes oxidative O-dealkylation to a metabolite with negligible antihistaminic activity.
37 | Cetirizine is not a substrate of the CYP450 system.
38 | Evidence indicates that cetirizine is a P-glycoprotein substrate, which should be considered in the concurrent use of cetirizine with P-gp inhibitors.
39 | The elimination half-life of cetirizine is 8.3 hours.
40 | Cetirizine is primarily excreted through the kidney.
41 | Occasional and small doses of cetirizine are acceptable while breastfeeding.
42 | Prolonged use of larger doses may cause a decrease in the milk supply or drowsiness and other adverse effects in the infant, particularly in combination with pseudoephedrine.
43 | The use of an ophthalmic formulation of cetirizine by the mother is hypothesized to have minimal risk to the breastfed infant.
44 | Clinicians should advise the mother to apply pressure over the tear duct by the corner of the eye and remove the leftover solution to decrease the amount of drug that reaches the breast milk.
45 | Cetirizine is a former US FDA pregnancy category B medicine.
46 | The American College of Obstetricians and Gynecologists & the American College of Allergy, Asthma, and Immunology (ACOG-ACAAI) suggests cetirizine for pregnant women who require antihistamine treatment.
47 | Cetirizine should be used in pregnancy only when necessary.
48 | According to the manufacturer's prescribing information, patients 12 years and older with hepatic impairment should reduce the dose to 5 mg once daily.
49 | The manufacturer also recommends lowering the dose for patients 6 to 11 years old with hepatic impairment.
50 | According to the manufacturer's prescribing information, patients 12 years and older with decreased renal function (CrCL 11-31 mL/min) and patients on hemodialysis (CrCL less than 7 mL/min) should reduce the dose to 5 mg once daily.
51 | The manufacturer also recommends lowering the dose for patients 6 to 11 years old with renal impairment.
52 | Each bottle of 0.24% cetirizine hydrochloride contains benzalkonium chloride and can be absorbed by contact lenses.
53 | Manufacturers advise patients to remove contact lenses before instilling eye drops and wait for 10 min until the reinsertion of lenses.
54 | If irritation or redness persists after this precaution, then avoid the use of contact lenses.
55 | It is proven safe and effective for pediatric patients two years and above in clinical studies.
56 | Cetirizine is available as tablets, capsules, solutions, and orally disintegrating tablets.
57 | The dosing of cetirizine depends on the patient's age.
58 | In adults and children 12 years or older, the recommended dose is 5 or 10 mg per day orally, depending on symptom severity.
59 | It is available in 5 mg and 10 mg tablets and 5 mg/ 5 ml oral solution and elixir.
60 | The ophthalmic formulation is available as 0.24% cetirizine hydrochloride eye drops in 5 mL and 7.5 mL bottles.
61 | In children 6 to 11 years old, 5 or 10 mg (1 or 2 teaspoons) once daily in syrup form is recommended depending on symptom severity.
62 | In children 2 to 5 years old, the recommended dose is 2.5 mg (half a teaspoon) in syrup form once daily.
63 | In children six months to 23 months old, the recommended dose is 2.5 mg (half a teaspoon) in syrup form once daily.
64 | One drop (0.24% cetirizine hydrochloride ophthalmic solution) is instilled in the affected eye twice daily for patients with allergic conjunctivitis.
65 | Cetirizine is safe and relatively well-tolerated for treating allergic rhinitis and urticaria.
66 | Although uncommon, its primary adverse effects in adults include somnolence, fatigue, pharyngitis, dizziness, and dry mouth.
67 | Somnolence, as a result of cetirizine, appears to be dose-related.
68 | Research indicates that in some patients, cetirizine contributes to daytime sleepiness.
69 | Children taking cetirizine most commonly experience similar side effects as adults taking cetirizine (somnolence, fatigue, and dry mouth).
70 | Children, in particular, are more likely than adults to experience headaches while taking cetirizine.
71 | In pediatric patients aged 2 to 11 years, the majority of adverse reactions reported with cetirizine were mild or moderate.
72 | Among all, somnolence appeared to be dose-related and abdominal pain was considered treatment-related.
73 | Common adverse drug reactions of cetirizine ophthalmic solution are conjunctival hyperemia and instillation site pain.
74 | While on cetirizine therapy, few cases of transient, reversible hepatic transaminase elevations have been reported in the literature.
75 | Some reports exist of hepatitis with elevated bilirubin too.
76 | In postmarketing studies, rare, potentially severe adverse events like severe hypotension, anaphylaxis, hemolytic anemia, cholestasis, orofacial dyskinesia, glomerulonephritis, hepatitis, stillbirth, and thrombocytopenia are reported.
77 | Ophthalmic formulation's common adverse reactions are local pain at the instillation site, ocular hyperemia, and decreased visual acuity.
78 | Patients should be advised not to use cetirizine concurrently with alcohol or other CNS depressants, such as benzodiazepines or opioids, as it may cause dose-related sedation.
79 | Pitolisant is a histamine-3 receptor competitive antagonist and inverse agonist used in patients with narcolepsy.
80 | Concurrent use of pitolisant with antihistamines like cetirizine may diminish the therapeutic efficacy of pitolisant.
81 | Avoid combination.
82 | Cetirizine decreases gabapentin plasma concentrations and reduces systemic exposure to gabapentin.
83 | However, gabapentin is a CNS depressant; hence the pharmacodynamic synergism leading to additional CNS depression may also be observed.
84 | Cetirizine is a substrate of P-glycoprotein, and verapamil is an inhibitor of P-glycoprotein.
85 | Concurrent administration of both drugs prevents the efflux of cetirizine from the CNS and increases antihistaminic activity.
86 | Cetirizine should not be administered with erdafitinib as erdafitinib is also an inhibitor of P-glycoprotein (ABCB1, MDR1).
87 | Cetirizine is contraindicated in anyone with a known hypersensitivity to it or any of its ingredients.
88 | Cetirizine is also contraindicated in anyone with a known hypersensitivity to hydroxyzine, as cetirizine is a metabolite of hydroxyzine.
89 | There are few well-controlled human studies on cetirizine in pregnant mothers, although these showed it to be safe during pregnancy in animal studies.
90 | First-generation antihistamines, diphenhydramine, and doxylamine are safest to use during pregnancy.
91 | However, first-generation antihistamines are more likely than second-generation antihistamines to cause somnolence; clinicians should counsel the patients regarding the potential adverse effects of the medication they choose to take during pregnancy.
92 | Patients taking cetirizine require monitoring for the relief of symptoms.
93 | Healthcare team members should also monitor patients for adverse effects such as fatigue and somnolence in adults and headaches in children.
94 | The kidney primarily excretes cetirizine; as a result, the risk of toxicity is typically higher in patients with impaired renal function.
95 | Patients with renal impairment should take a lower medication dosage in their age bracket.
96 | Liver function and enzymes should also be closely monitored in patients with hepatic impairment.
97 | Healthcare providers should make dosage adjustments as needed for patients with hepatic impairment.
98 | Cetirizine may be confused with sertraline (look-alike-sound-alike drugs).
99 | Clinicians and pharmacists should be careful while prescribing and dispensing this drug.
100 | Research in rats showed the minimal lethal dose to be approximately 460 times the maximum recommended daily dose for adults.
101 | The primary target of acute toxicity in rodents was the central nervous system.
102 | The primary target of multiple-dose toxicity in rodents was the liver.
103 | A small number of cases of cetirizine overdose appear in the literature.
104 | However, many overdoses of cetirizine in children result from improper medication storage by adults living in the same home.
105 | Most overdose incidents in children resolve spontaneously, with drowsiness and sedation being the main adverse effects observed.
106 | Drug-induced liver damage is common with numerous medications; there are reports of a small number of cases of cetirizine-induced liver damage; in all cases, liver enzyme values returned to normal after cessation of cetirizine.
107 | An adult who overdosed on 150 mg cetirizine had somnolence but did not have abnormal blood chemistry, hematology results, or other clinical signs.
108 | An infant overdosed on 180 mg of cetirizine and experienced restlessness and irritability, followed by drowsiness.
109 | Several hours after an accidental overdose of cetirizine, the six-year-old child presented with fixed and dilated pupils, tachycardia, agitation, hyperthermia, and hallucinations consistent with anticholinergic toxicity.
110 | There is no known specific antidote to cetirizine, and it cannot be effectively removed by dialysis.
111 | When overdosed on cetirizine, treatment should be supportive and symptomatic, considering any concomitantly ingested medications.
112 | Cetirizine is a relatively safe and effective medication for treating allergic rhinitis, urticaria, and allergic conjunctivitis.
113 | As cetirizine is also available over the counter, prescribers should educate patients on possible side effects, such as drowsiness, fatigue, and dry mouth, while dispensing medicine.
114 | Healthcare providers should be careful when prescribing cetirizine to patients with impaired renal or hepatic function.
115 | Ophthalmologists should educate contact lens wearers to exercise precautions and proper direct use of eye drops.
116 | Patients using eye drops should be informed that local pain at the instillation site, ocular hyperemia, and decreased visual acuity are common adverse reactions with the ophthalmic formulation.
117 | Immunologists play a crucial role in the management of refractory urticaria.
118 | Nurses should monitor therapeutic success and consult patients not to combine cetirizine with drugs that cause central nervous system depression.
119 | Pharmacists should perform thorough medication reconciliation and verify that the patient is not taking any medications or supplements that could exacerbate cetirizine's adverse effects.
120 | Clinicians (MD, DO, NP, PA), nurses, and pharmacists who prescribe or recommend cetirizine to patients should also provide information on the safe storage of cetirizine to prevent accidental overdose by children.
121 | If the overdose of cetirizine is intentional, a psychiatrist should be consulted.
122 | Communication and collaboration among interprofessional teams can achieve the best patient outcomes and reduce healthcare service utilization costs.
123 |
--------------------------------------------------------------------------------
/texts_input/NBK501509.txt:
--------------------------------------------------------------------------------
1 | Small occasional doses of cetirizine are acceptable during breastfeeding.
2 | Larger doses or more prolonged use may cause drowsiness and other effects in the infant or decrease the milk supply, particularly in combination with a sympathomimetic such as pseudoephedrine or before lactation is well established.
3 | International guidelines recommend cetirizine as an acceptable choice if an antihistamine is required during breastfeeding.
4 | Cetirizine has been used successfully in cases of persistent pain of the breast during breastfeeding.
5 | Ophthalmic use of cetirizine by the mother should pose little risk to the breastfed infant.
6 | To substantially diminish the amount of drug that reaches the breastmilk after using eye drops, place pressure over the tear duct by the corner of the eye for 1 minute or more, then remove the excess solution with an absorbent tissue.
7 | Three women who were exclusively breastfeeding their 5- to 6-month-old infants were taking cetirizine 10 mg daily by mouth.
8 | Each mother donated milk samples before a dose and 1, 2, 4, 6, 8, 10, 12 and 24 hours after the dose.
9 | An average peak level of 49 mcg/L occurred at an average of 2 hours after the dose.
10 | The average milk concentration over the 24-hour period was 21.1 mcg/L. An exclusively breastfed infant would receive an average of 3.1 mcg/kg daily or a weight-adjusted dosage of 1.77% of the maternal dosage.
11 | As part of a validation study on analysis of cetirizine and levocetirizine in breastmilk, 252 steady-state milk samples from 228 women taking either cetirizine 5 to 20 mg daily (n = 229) or levocetirizine 5 mg daily (n = 9) were analyzed.
12 | Specific dosages and times of milk collection were not given.
13 | The median milk concentration of cetirizine and levocetirizine was 13 mcg/L (range 0.65 to 65 mcg/L; IQR 4.9 to 24.8 mcg/L) in 228 samples.
14 | Twenty-four samples had levels below the limit of quantification (<0.39 mcg/L).
15 | Women taking cetirizine (n = 31) collected complete samples of milk at about 0, 2, 4, 8, 12 and 24 hours after a daily dose and submitted aliquots for analysis.
16 | The average milk concentration was 16.8 mcg/L and the half-life in milk was 7 hours.
17 | The peak milk concentration averaged 41 mcg/L at an average time of 2.4 hours after a dose.
18 | Using the peak milk concentration, the authors calculated that a fully breastfed infant would receive a maximum of 2.5 mcg/kg daily, which represents a relative infant dose of 1.9%.
19 | Using the average concentration would result in a daily infant dosage of 1 mcg/kg daily and a relative infant dose of 0.8%.
20 | In one telephone follow-up study, mothers reported irritability and colicky symptoms in 10% of infants exposed to various antihistamines, and drowsiness was reported in 1.6% of infants.
21 | None of the reactions required medical attention.
22 | A woman who was nursing (extent not stated) her newborn infant was treated for pemphigus with oral prednisolone 25 mg daily, with the dosage increased over 2 weeks to 60 mg daily.
23 | She was also taking cetirizine 10 mg daily and topical betamethasone 0.1% twice daily to the lesions.
24 | Because of a poor response, the betamethasone was changed to clobetasol propionate ointment 0.05%.
25 | She continued breastfeeding throughout treatment and her infant was developing normally at 8 weeks of age and beyond.
26 | A woman with narcolepsy took sodium oxybate 4 grams each night at 10 pm and 2 am as well as fluoxetine 20 mg and cetirizine 5 mg daily throughout pregnancy and postpartum.
27 | She breastfed her infant except for 4 hours after the 10 pm oxybate dose and 4 hours after the 2 am dose.
28 | She either pumped breastmilk or breastfed her infant just before each dose of oxybate.
29 | The infant was exclusively breastfed or breastmilk fed for 6 months when solids were introduced.
30 | The infant was evaluated at 2, 4 and 6 months with the Ages and Stages Questionnaires, which were within the normal range, as were the infant's growth and the pediatrician's clinical impressions regarding the infant's growth and development.
31 | Three women were taking long-term cetirizine 10 mg daily by mouth while exclusively breastfeeding their 5- to 6-month-old infants.
32 | The mothers reported no adverse effects in their infants.
33 | Thirty-one women taking cetirizine 10 mg (n = 29) or 20 mg (n = 2) daily reported no adverse effects in 61% of their infants and minor adverse effects, including fever, sedation, rash, poor feeding, bruising, refusing of the breast or constipation.
34 | But mothers attributed these effects to other causes such as a cold, weaning or learning to crawl.
35 | Antihistamines in relatively high doses given by injection can decrease basal serum prolactin in nonlactating women and in early postpartum women.
36 | However, suckling-induced prolactin secretion is not affected by antihistamine pretreatment of postpartum mothers.[10] Whether lower oral doses of cetirizine have the same effect on serum prolactin, or whether the effects on prolactin have any consequences for breastfeeding success, has not been studied.
37 | The prolactin level in a mother with established lactation may not affect her ability to breastfeed.
38 | In a study of 31 women taking cetirizine 10 mg (n = 29) or 20 mg (n = 2) daily, 10 reported a perceived decrease in milk supply over the prior 3 days.
39 |
--------------------------------------------------------------------------------
/texts_input/NBK548420.txt:
--------------------------------------------------------------------------------
1 | Cetirizine and its enantiomer levocetirizine are second generation antihistamines that are used for the treatment of allergic rhinitis, angioedema and chronic urticaria.
2 | Cetirizine and levocetirizine have been linked to rare, isolated instances of clinically apparent acute liver injury.
3 | Cetirizine (se tir' i zeen) is a second generation antihistamine (H1 receptor blocker) that is used widely to treat allergic symptoms associated with hay fever, seasonal allergies, urticaria, angioedema and atopic dermatitis.
4 | Levocetirizine (lee" voe se tir' i zeen) is the levorotatory R-enantiomer of cetirizine and its more active form.
5 | Cetirizine and levocetirizine belong to the piperazine class of antihistamines and, like other second generation antihistamines, are considered to be nonsedating.
6 | Indeed, prospective studies have shown that sedation is less common with cetirizine and levocetirizine than with first generation antihistamines such as diphenhydramine, but some degree of sedation may still occur.
7 | Cetirizine was approved for use by prescription in the United States in 1995 and as an over-the-counter medication in 2007.
8 | Cetirizine is currently one of the most widely used medications with more than 5 million prescriptions filled yearly in addition to considerable nonprescription use.
9 | Cetirizine is available in 5 and 10 mg tablets and capsules in multiple generic forms and under the trade name Zyrtec.
10 | Oral solutions and fixed combinations with pseudoephedrine are also available.
11 | The typical dose of cetirizine is 5 to 10 mg once daily and it is often given chronically, at least during allergy season.
12 | Levocetirizine was approved for use in the United States in 2007 and is currently available by prescription only.
13 | Levocetirizine is available in 5 mg tablets and in an oral solution generically and under the brand name Xyzal.
14 | Common side effects of the second generation antihistamines include blurred vision, dry mouth and throat, palpitations, tachycardia, abdominal distress, constipation and headache.
15 | Although considered to be nonsedating antihistamines, cetirizine and levocetirizine can cause mild drowsiness particularly at higher doses.
16 | Antihistamines can worsen urinary retention and glaucoma.
17 | Use of cetirizine and levocetirizine is not generally associated with liver enzyme elevations, but has been linked to rare instances of clinically apparent liver injury.
18 | In published reports, the time to onset varied widely, from 1 to 40 weeks, and the pattern of injury ranged from cholestatic hepatitis to hepatocellular jaundice.
19 | The reported cases were mild to moderate in severity and self-limited in course with rapid recovery after stopping the medication.
20 | Immunoallergic and autoimmune features were rare, but recurrence of acute liver injury upon reexposure to cetirizine has been described.
21 | Likelihood score: C (probable rare cause of clinically apparent liver injury).
22 | The cause of acute liver injury from cetirizine is not known.
23 | It is metabolized by the liver and a toxic metabolite may account for idiosyncratic injury.
24 | Acute liver injury from cetirizine and levocetirizine is rare and usually self-limited.
25 | Acute liver failure and vanishing bile duct syndrome have not been linked to these second generation antihistamines.
26 | Recurrence of liver injury has been described in patients who restart cetirizine.
27 | There is no information about cross reactivity among the various antihistamines after clinically apparent hepatotoxicity, but switching to another agent with a different structure and belonging to a separate class is probably safe.
28 | References on the safety and potential hepatotoxicity of antihistamines are given together after the Overview section on Antihistamines.
29 | A 28-year-old man developed jaundice and pruritus after having taken cetirizine (10 mg daily) for 2 years for allergic rhinitis.
30 | He had no previous history of liver disease, drug allergies, or risk factors for viral hepatitis.
31 | He drank alcohol but only on weekends, although often consuming six 12-ounce cans of beer in one sitting.
32 | His other medical conditions included only allergic rhinitis and episodes of sinusitis.
33 | His only other medications were phenylephrine nasal sprays, and he specifically denied use of other over-the-counter drugs, herbal medications or nutritional supplements.
34 | On examination, he was jaundiced but had no fever, rash, hepatomegaly or signs of chronic liver disease.
35 | Laboratory testing showed a total serum bilirubin of 9.7 mg/dL, ALT 215 U/L, AST 61 U/L, and alkaline phosphatase 260 U/L (Table).
36 | Tests for viral hepatitis and autoimmune liver disease were negative.
37 | Abdominal ultrasound and endoscopic retrograde cholangiopancreatography showed no evidence of gallstones or biliary obstruction.
38 | He stopped cetirizine at the time of presentation and was treated with hydroxyzine for control of pruritus.
39 | However, jaundice and pruritus persisted, and a liver biopsy was done, which showed zone 3 cholestasis and mild hepatic necrosis and inflammation compatible with drug-induced liver injury.
40 | There was no steatosis or changes suggestive of alcohol-related liver injury.
41 | He was started on ursodiol and improved slowly, but serum enzymes and bilirubin remained slightly elevated at the time of the last follow-up visit.
42 | A young man developed severe jaundice and pruritus after taking cetirizine for allergic rhinitis for several years.
43 | He denied taking other medications and had no clinical, serologic or radiologic evidence for viral hepatitis, autoimmune liver disease or biliary tract disease.
44 | A liver biopsy showed marked cholestasis but little or no inflammation, steatosis or fibrosis.
45 | The clinical phenotype might be best described as "bland cholestasis," which raises the possibility of unacknowledged use of anabolic steroids.
46 | The prolonged jaundice and pruritus also favor this diagnosis, and the cetirizine may have been used as a cover for the illicit use of body building drugs.
47 | Notably, cetirizine is closely related to hydroxyzine, another member of the piperazine class of antihistamines of which cetirizine is an active metabolite; he was given hydroxyzine during the episode of jaundice because of pruritus.
48 | This report, like other case reports of cetirizine hepatotoxicity, is not totally convincing.
49 |
--------------------------------------------------------------------------------