├── .gitignore
├── .streamlit
│   └── config.toml
├── Dockerfile
├── README.md
├── app
│   ├── __init__.py
│   ├── app.py
│   ├── config.py
│   ├── constants.py
│   ├── hotel_index_schema.yml
│   ├── prompt.py
│   └── retrieve.py
├── assets
│   ├── design-dark.png
│   ├── design.excalidraw
│   └── design.png
├── data
│   └── data.pkl
├── docker-compose.yml
├── env.template
├── requirements.txt
└── run.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | *__pycache__/
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
1 | [theme]
2 | primaryColor="#F63366"
3 | backgroundColor="#FFFFFF"
4 | secondaryBackgroundColor="#F0F2F6"
5 | textColor="#262730"
6 | font="sans serif"
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.8-slim
2 |
3 | RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
4 |
5 | COPY ./requirements.txt .
6 | RUN pip3 install --upgrade pip
7 |
8 | # install pytorch cpu to reduce docker image size
9 | RUN pip install --no-cache-dir torch==2.0.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
10 | RUN pip3 install -r requirements.txt
11 |
12 | WORKDIR /hotel
13 | COPY . /hotel
14 |
15 | ENTRYPOINT [ "streamlit", "run" ]
16 | CMD [ "run.py", "--server.fileWatcherType", "none", "--browser.gatherUsageStats", "false", "--server.address", "0.0.0.0"]
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
10 |
11 | # LLM Hotel Recommender
12 |
13 | The LLM Hotel Recommender is a Streamlit app that uses [Redis](https://redis.com) and the [OpenAI API](https://api.openai.com/) to generate hotel recommendations based on a user's preferences. Because Redis can perform semantic search in addition to other operations like tag and text search,
14 | users can search for hotels in the US by a variety of criteria, including:
15 |
16 | - State
17 | - City
18 | - Positive Qualities
19 | - Negative Qualities
20 |
21 | The application cites its sources (reviews) for each recommendation and provides all the reviews that were returned.
22 |
29 | ## Design
30 |
31 | The recommender uses the Hypothetical Document Embeddings (HyDE) approach: an LLM (OpenAI in this case)
32 | generates a fake review based on user input. The system then uses Redis vector search to semantically search
33 | for hotels with reviews that are similar to the fake review. The returned reviews are then passed to a second LLM call to
34 | generate a recommendation for the user.
35 |
38 |
39 | ## Run the Application
40 |
41 |
42 | ### Docker Compose
43 | 1. Create your env file:
44 |
45 | ```bash
46 | $ cp env.template .env
47 | ```
48 | *Fill out the values, most importantly your `OPENAI_API_KEY`.*
49 |
50 | 2. Run with docker compose:
51 | ```bash
52 | $ docker compose up
53 | ```
54 | *Add the `-d` option to run the containers in the background.*
55 |
56 | Issues with dependencies? Try force-building with no-cache:
57 | ```bash
58 | $ docker compose build --no-cache
59 | ```
60 |
61 | 3. Navigate to:
62 | ```
63 | http://localhost:8501/
64 | ```
65 |
66 |
67 | ### Local
68 |
69 | 1. Clone the repo:
70 | ```bash
71 | $ git clone https://github.com/redis-developer/LLM-Recommender.git && cd LLM-Recommender
72 | ```
73 |
74 | 2. Create your env file:
75 | ```bash
76 | $ cp env.template .env
77 | ```
78 | *Fill out the values, most importantly your `OPENAI_API_KEY`.*
79 |
80 | 3. Install dependencies:
81 | ```bash
82 | $ pip install -r requirements.txt
83 | ```
84 |
85 | 4. Run the app:
86 | ```bash
87 | $ streamlit run run.py
88 | ```
89 |
90 |
91 | ## Known Bugs
92 |
93 | 1. Hotels by the same name in different cities are not handled well
94 |
95 | ## Future Work
96 |
97 | 1. Add more search criteria (GeoFilter, Price, etc.)
98 | 2. Dataset is relatively sparse
99 | 3. Use OpenAI Functions or parsing to extract Hotel name from recommendation instead of LLM
100 |
--------------------------------------------------------------------------------
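The HyDE flow described in the README's Design section can be illustrated without Redis or OpenAI. The sketch below is not the repository's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the sentence-transformers model, and `fake_review` is hard-coded where the app would ask the chat model to write one. It only shows the core idea of ranking stored reviews by similarity to a hypothetical review.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviews(fake_review, reviews):
    """Order stored reviews by similarity to the hypothetical review."""
    query = toy_embed(fake_review)
    return sorted(
        reviews,
        key=lambda r: cosine_similarity(query, toy_embed(r)),
        reverse=True,
    )


# In the app the fake review comes from the chat model; here it is hard-coded.
fake_review = "quiet room with a clean pool and friendly staff"
reviews = [
    "loud street noise and a dirty bathroom",
    "friendly staff and a clean pool with a very quiet room",
    "average breakfast nothing special",
]
ranked = rank_reviews(fake_review, reviews)
print(ranked[0])
```

Searching with a generated review rather than the raw user input means the query resembles the stored documents in style and vocabulary, which is the motivation behind HyDE.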
/app/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/app/__init__.py
--------------------------------------------------------------------------------
/app/app.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import openai
4 | import pickle
5 | import streamlit as st
6 |
7 | from urllib.error import URLError
8 | from redisvl.vectorize.text import HFTextVectorizer
9 | os.environ["TOKENIZERS_PARALLELISM"] = "false"
10 |
11 | from app.config import (
12 |     CHAT_MODEL,
13 |     VECTORIZER,
14 |     DATAFILE
15 | )
16 | from app.retrieve import (
17 |     create_retrieval_index,
18 |     retrieve_context,
19 |     make_filter,
20 |     retrieve_top_three_hotels
21 | )
22 | from app.prompt import (
23 |     make_prompt,
24 |     generate_hyde_prompt,
25 |     format_prompt_reviews,
26 |     get_recommended_hotel_prompt
27 | )
28 |
29 | from app.constants import (
30 |     STATES,
31 |     CITIES
32 | )
33 |
34 | def recommend_hotel(positive, negative, reviews):
35 |
36 |     prompt = make_prompt(positive, negative, reviews)
37 |     retrieval = openai.ChatCompletion.create(
38 |         model=CHAT_MODEL,
39 |         messages=[{'role':"user",
40 |                    'content': prompt}],
41 |         max_tokens=1000)
42 |
43 |     # get the response
44 |     response = retrieval['choices'][0]['message']['content']
45 |     return response
46 |
47 |
48 | def get_hotel_name(recommendation):
49 |     prompt = get_recommended_hotel_prompt(recommendation)
50 |     retrieval = openai.ChatCompletion.create(
51 |         model=CHAT_MODEL,
52 |         messages=[{'role':"user",
53 |                    'content': prompt}],
54 |         max_tokens=1000)
55 |
56 |     # get the response
57 |     response = retrieval['choices'][0]['message']['content']
58 |     return response
59 |
60 | @st.cache_resource
61 | def vectorizer():
62 |     return HFTextVectorizer(f"sentence-transformers/{VECTORIZER}")
63 |
64 | @st.cache_data
65 | def load_data():
66 |     with open(DATAFILE, "rb") as f:
67 |         return pickle.load(f)
68 |
69 | def set_city():
70 |     # on_change callback for the State selectbox: reset the city to the first
71 |     # city listed for the newly selected state, or "" when the state has none
72 |     cities = CITIES.get(st.session_state["state"], [])
73 |     st.session_state["city"] = cities[0] if cities else ""
74 |
75 |
76 |
77 | def main():
78 |     data = load_data()
79 |     INDEX = create_retrieval_index(data)
80 |     EMBEDDING_MODEL = vectorizer()
81 |
82 |     try:
83 |         # Defining default values
84 |         defaults = {
85 |             "state": "",
86 |             "city": "",
87 |             "positive": "",
88 |             "negative": "",
89 |             "response": "",
90 |             "hotel_info": "",
91 |             "hotel_reviews": "",
92 |             "all_similar_reviews": ""
93 |         }
94 |
95 |         # Checking if keys exist in session state, if not, initializing them
96 |         for key, value in defaults.items():
97 |             if key not in st.session_state:
98 |                 st.session_state[key] = value
99 |
100 |         col1, col2 = st.columns([4,2])
101 |
102 |         st.write("# LLM Hotel Recommender")
103 |         with st.sidebar:
104 |             st.write("## Filter By Location")
105 |             st.selectbox("State", STATES, key="state", on_change=set_city)
106 |             st.selectbox("City", CITIES[st.session_state['state']], key="city")
107 |
108 |         st.write("The LLM Hotel Recommender is a Streamlit app that uses Redis and the " +
109 |                  "OpenAI API to generate hotel recommendations based on a user's preferences.")
110 |
111 |         st.text_input("What would you like in a hotel?", key="positive")
112 |         st.text_input("What would you like to avoid in a hotel?", key="negative")
113 |
114 |         if st.button("Find Hotel"):
115 |             with st.spinner("OpenAI and Redis are working to find you a hotel..."):
116 |                 # filter
117 |                 query_filter = make_filter(st.session_state['state'], st.session_state['city'])
118 |
119 |                 # make a hyde prompt
120 |                 hyde_prompt = generate_hyde_prompt(
121 |                     st.session_state['positive'],
122 |                     st.session_state['negative']
123 |                 )
124 |
125 |                 # Retrieve the context
126 |                 context = retrieve_context(INDEX,
127 |                                            hyde_prompt,
128 |                                            EMBEDDING_MODEL,
129 |                                            query_filter=query_filter)
130 |                 top_three_hotels = retrieve_top_three_hotels(context)
131 |
132 |                 if not top_three_hotels:
133 |                     st.warning("No hotels matched your search. Try a different location.")
134 |                 else:
135 |                     top_hotel = top_three_hotels[0]
136 |                     top_hotel_reviews = format_prompt_reviews([top_hotel])
137 |                     other_options = format_prompt_reviews(top_three_hotels)
138 |
139 |                     recommendation = recommend_hotel(
140 |                         st.session_state['positive'],
141 |                         st.session_state['negative'],
142 |                         top_hotel_reviews
143 |                     )
144 |
145 |                     hotel_info = {
146 |                         "Hotel Name": top_hotel['name'],
147 |                         "Hotel Address": top_hotel['address'],
148 |                         "City": top_hotel['city'],
149 |                         "State": top_hotel['state'],
150 |                     }
151 |                     hotel_info = "\n" + "\n".join([f"{k}: {v}" for k, v in hotel_info.items()])
152 |                     st.session_state['response'] = recommendation
153 |                     st.session_state['hotel_info'] = hotel_info
154 |                     st.session_state['hotel_reviews'] = top_hotel_reviews
155 |                     st.session_state['all_similar_reviews'] = other_options
156 |
157 |         st.write("### Recommendations")
158 |         st.write(f"{st.session_state['response']}")
159 |         with st.expander("Show Hotel Info"):
160 |             st.text(st.session_state['hotel_info'])
161 |         with st.expander("Show Hotel Reviews"):
162 |             st.text(st.session_state['hotel_reviews'])
163 |         with st.expander("Show All Similar Reviews"):
164 |             st.text(st.session_state['all_similar_reviews'])
165 |
166 |         st.write("\n")
167 |         st.write("---------")
168 |         st.write("\n")
169 |         st.write("### About")
170 |         st.write("The recommender uses the Hypothetical Document Embeddings (HyDE)" +
171 |                  " approach which uses an LLM (OpenAI in this case) to generate a fake review" +
172 |                  " based on user input. The system then uses Redis vector search to semantically search" +
173 |                  " for hotels with reviews that are similar to the fake review. The returned reviews" +
174 |                  " are then passed to another LLM to generate a recommendation for the user.")
175 |
176 |         st.write("#### Dataset")
177 |         st.write("The dataset is from [Datafiniti](https://data.world/datafiniti/hotel-reviews) and is hosted on data.world")
178 |
179 |     except URLError as e:
180 |         st.error(
181 |             """
182 |             **This demo requires internet access.**
183 |             Connection error: %s
184 |             """
185 |             % e.reason
186 |         )
--------------------------------------------------------------------------------
/app/config.py:
--------------------------------------------------------------------------------
1 | import os
2 | import openai
3 | from dotenv import load_dotenv
4 |
5 | load_dotenv()
6 |
7 | # get path to the directory where this file is located
8 | BASEDIR = os.path.abspath(os.path.dirname(__file__))
9 | # get parent directory of BASEDIR
10 | PARENTDIR = os.path.abspath(os.path.join(BASEDIR, os.pardir))
11 |
12 | # Index and Hotel data
13 | SCHEMA = os.getenv("SCHEMA", f"{BASEDIR}/hotel_index_schema.yml")
14 | DATADIR = os.getenv("DATADIR", f"{PARENTDIR}/data")
15 | DATAFILE = os.getenv("DATAFILE", f"{DATADIR}/data.pkl")
16 |
17 | # Redis information
18 | REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
19 | REDIS_PORT = os.getenv("REDIS_PORT", "6379")
20 | REDIS_ADDRESS = f"redis://{REDIS_HOST}:{REDIS_PORT}"
21 |
22 | # AI models
23 | openai.api_key = os.getenv('OPENAI_API_KEY')
24 | CHAT_MODEL = os.getenv('OPENAI_CHAT_MODEL', 'gpt-3.5-turbo')
25 | VECTORIZER = os.getenv('HF_VECTOR_MODEL', 'all-MiniLM-L6-v2')
--------------------------------------------------------------------------------
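`config.py` reads `OPENAI_API_KEY` with `os.getenv`, which returns `None` when the variable is unset, so a missing key only surfaces at the first API call. One possible way to fail earlier is sketched below. `require_env` is a hypothetical helper, not part of the repository, and the `settings` dict with its example values stands in for the real environment.

```python
import os


def require_env(name, env=None):
    """Return a required setting or raise a clear error (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example values only; in the app these would come from the .env file.
settings = {"OPENAI_API_KEY": "sk-example", "OPENAI_CHAT_MODEL": "gpt-3.5-turbo"}
print(require_env("OPENAI_CHAT_MODEL", settings))
```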
/app/constants.py:
--------------------------------------------------------------------------------
1 | STATES = [
2 | 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
3 | 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
4 | 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ',
5 | 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC',
6 | 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', ""
7 | ]
8 |
9 | CITIES = {"": [],
10 | 'AL': ['Mobile',
11 | 'Jasper',
12 | 'Clanton',
13 | 'Huntsville',
14 | 'Birmingham',
15 | 'Montgomery',
16 | 'Prattville',
17 | 'Enterprise',
18 | 'Anderson',
19 | 'Lincoln'],
20 | 'AK': ['Healy', 'Juneau', 'Dutch Harbor'],
21 | 'AZ': ['Glendale',
22 | 'Tucson',
23 | 'Eloy',
24 | 'Phoenix',
25 | 'Prescott Valley',
26 | 'Mesa',
27 | 'Payson',
28 | 'Wellton'],
29 | 'AR': ['Harrison',
30 | 'Bull Shoals',
31 | 'Yellville',
32 | 'West Memphis',
33 | 'Bryant',
34 | 'Batesville',
35 | 'Springdale',
36 | 'Bella Vista',
37 | 'Russellville'],
38 | 'CA': ['Palm Springs',
39 | 'Livermore',
40 | 'Barstow',
41 | 'Irvine',
42 | 'San Diego',
43 | 'Tracy',
44 | 'Carlsbad',
45 | 'Pasadena',
46 | 'Marina',
47 | 'San Jose',
48 | 'Napa',
49 | 'Marina Del Rey',
50 | 'Rancho Mirage',
51 | 'Sunnyvale',
52 | 'San Francisco',
53 | 'Colton',
54 | 'Gardena',
55 | 'San Bruno',
56 | 'Morro Bay',
57 | 'Joshua Tree',
58 | 'Long Beach',
59 | 'Anaheim',
60 | 'Garberville',
61 | 'Upland',
62 | 'Eureka',
63 | 'Inglewood',
64 | 'Santa Barbara',
65 | 'Redding',
66 | 'Signal Hill',
67 | 'Selma',
68 | 'Lake Forest',
69 | 'Garden Grove',
70 | 'Mariposa',
71 | 'San Clemente',
72 | 'Studio City',
73 | 'Rohnert Park',
74 | 'Hesperia',
75 | 'Victorville',
76 | 'Coronado'],
77 | 'CO': ['Silverton',
78 | 'Castle Rock',
79 | 'Denver',
80 | 'Aspen',
81 | 'Arvada',
82 | 'Griante',
83 | 'Cripple Creek',
84 | 'Mosca',
85 | 'Walden',
86 | 'Colorado Springs',
87 | 'Fort Collins',
88 | 'Pueblo',
89 | 'Grand Junction'],
90 | 'CT': ['Bristol', 'Plainville', 'Southport', 'Hartford'],
91 | 'DE': ['New Castle'],
92 | 'FL': ['West Palm Beach',
93 | 'Bonita Springs',
94 | 'Orlando',
95 | 'Palm Harbor',
96 | 'Lake Worth',
97 | 'Delray Beach',
98 | 'Naples',
99 | 'Jacksonville',
100 | 'Deland',
101 | 'Sarasota',
102 | 'Bradenton',
103 | 'Miami',
104 | 'Englewood',
105 | 'Fort Pierce'],
106 | 'GA': ['Mableton',
107 | 'Dublin',
108 | 'Dahlonega',
109 | 'Griffin',
110 | 'Douglasville',
111 | 'Macon',
112 | 'Atlanta',
113 | 'Norcross',
114 | 'Warner Robins',
115 | 'Calhoun',
116 | 'Savannah',
117 | 'Cave Spring',
118 | 'Lithia Springs',
119 | 'Madison',
120 | 'Lagrange',
121 | 'Silver Creek',
122 | 'Lavonia',
123 | 'Alpharetta',
124 | 'Whigham'],
125 | 'HI': ['Kapaa', 'Kihei', 'Princeville', 'Kailua Kona'],
126 | 'ID': ['Gooding', 'Coeur D Alene', 'Boise', 'Idaho Falls'],
127 | 'IL': ['Westmont',
128 | 'Arcola',
129 | 'Chicago',
130 | 'Troy',
131 | 'Rockford',
132 | 'Northbrook',
133 | 'Freeport',
134 | 'Des Plaines',
135 | 'Princeton',
136 | 'Deerfield',
137 | 'Galena',
138 | 'Glenview',
139 | 'Elk Grove Village'],
140 | 'IN': ['Terre Haute',
141 | 'Merrillville',
142 | 'Bloomington',
143 | 'Elkhart',
144 | 'Indianapolis',
145 | 'Fort Wayne',
146 | 'Plainfield'],
147 | 'IA': ['Humeston',
148 | 'Coralville',
149 | 'Ames',
150 | 'Raymond',
151 | 'Guernsey',
152 | 'Boone',
153 | 'Cedar Rapids'],
154 | 'KS': ['Olathe', 'Manhattan', 'Ulysses'],
155 | 'KY': ['Corbin',
156 | 'Kuttawa',
157 | 'Owensboro',
158 | 'Hopkinsville',
159 | 'Lexington',
160 | 'Park City'],
161 | 'LA': ['New Orleans',
162 | 'Bossier City',
163 | 'Opelousas',
164 | 'Monroe',
165 | 'Shreveport',
166 | 'Slidell',
167 | 'Sulphur'],
168 | 'ME': ['Waterville', 'Ogunquit'],
169 | 'MD': ['Columbia',
170 | 'Pittsville',
171 | 'Brandywine',
172 | 'Glen Burnie',
173 | 'Saint Leonard',
174 | 'Annapolis',
175 | 'Takoma Park'],
176 | 'MA': ['Boston',
177 | 'West Springfield',
178 | 'Cambridge',
179 | 'Brockton',
180 | 'Auburn',
181 | 'Fitchburg'],
182 | 'MI': ['Holland',
183 | 'Oscoda',
184 | 'Bay City',
185 | 'Prescott',
186 | 'Utica',
187 | 'Saginaw',
188 | 'Southfield',
189 | 'Grand Rapids',
190 | 'Alma',
191 | 'Saint Ignace',
192 | 'Battle Creek',
193 | 'Clinton Township',
194 | 'Grayling',
195 | 'Whitehall',
196 | 'Big Rapids'],
197 | 'MN': ['Harmony',
198 | 'Detroit Lakes',
199 | 'Shakopee',
200 | 'Windom',
201 | 'Worthington',
202 | 'Minneapolis',
203 | 'Fairmont',
204 | 'Lake Elmo',
205 | 'Burnsville',
206 | 'Saint Cloud'],
207 | 'MS': ['Batesville', 'Biloxi'],
208 | 'MO': ['Osage Beach',
209 | 'St. Robert',
210 | 'Lamar',
211 | 'Saint Mary',
212 | 'Licking',
213 | 'Saint Louis',
214 | 'Joplin',
215 | 'Sikeston',
216 | 'Kansas City',
217 | 'Springfield'],
218 | 'MT': ['Whitefish',
219 | 'Billings',
220 | 'Kalispell',
221 | 'Columbia Falls',
222 | 'Miles City',
223 | 'Big Sky'],
224 | 'NE': ['Valentine', 'Burwell', 'Cozad', 'North Platte', 'Gretna'],
225 | 'NV': ['Las Vegas', 'Incline Village', 'Laughlin'],
226 | 'NH': ['Hampton', 'Twin Mountain', 'Northwood', 'Gorham'],
227 | 'NJ': ['Clifton',
228 | 'Springfield',
229 | 'Mount Arlington',
230 | 'Bordentown',
231 | 'Ridgefield Park',
232 | 'Carlstadt'],
233 | 'NM': ['Albuquerque', 'Portales', 'Raton', 'Tatum'],
234 | 'NY': ['Ithaca',
235 | 'Staten Island',
236 | 'Deansboro',
237 | 'New York',
238 | 'Hauppauge',
239 | 'Geneva',
240 | 'Cooperstown',
241 | 'Newburgh',
242 | 'East Syracuse',
243 | 'Schenectady',
244 | 'Albany',
245 | 'Woodstock'],
246 | 'NC': ['Lake Lure',
247 | 'Forest City',
248 | 'Cary',
249 | 'Raleigh',
250 | 'Charlotte',
251 | 'Rocky Mount',
252 | 'Southern Pines',
253 | 'Lincolnton',
254 | 'Burlington'],
255 | 'ND': ['Devils Lake'],
256 | 'OH': ['Alliance',
257 | 'Stone Creek',
258 | 'Dayton',
259 | 'Bowling Green',
260 | 'Columbus',
261 | 'Portland',
262 | 'Troy',
263 | 'Springfield'],
264 | 'OK': ['El Reno',
265 | 'Oklahoma City',
266 | 'Owasso',
267 | 'Enid',
268 | 'Elk City',
269 | 'Seiling',
270 | 'Moore',
271 | 'Lebanon',
272 | 'Blackwell'],
273 | 'OR': ['Canyonville',
274 | 'Portland',
275 | 'Lincoln City',
276 | 'Bend',
277 | 'Woodburn',
278 | 'Newport',
279 | 'Eugene',
280 | 'Gresham',
281 | 'Forest Grove'],
282 | 'PA': ['Harleigh',
283 | 'Warren',
284 | 'Erie',
285 | 'Harrisburg',
286 | 'Altoona',
287 | 'Ramey',
288 | 'Palermo',
289 | 'Mercer',
290 | 'Tannersville',
291 | 'Cranberry Twp',
292 | 'East Hickory',
293 | 'Mendenhall',
294 | 'Huntingdon',
295 | 'Indiana'],
296 | 'RI': ['Providence'],
297 | 'SC': ['Columbia',
298 | 'Dillon',
299 | 'Aiken',
300 | 'Georgetown',
301 | 'Abbeville',
302 | 'Irmo',
303 | 'Blythewood',
304 | 'Mullins',
305 | 'Boiling Springs',
306 | 'Little River',
307 | 'Florence'],
308 | 'SD': ['Brookings', 'Spearfish'],
309 | 'TN': ['Lenoir City',
310 | 'Soddy Daisy',
311 | 'Madisonville',
312 | 'Nashville',
313 | 'Johnson City',
314 | 'Kingston',
315 | 'Chattanooga',
316 | 'Jackson',
317 | 'Knoxville',
318 | 'Clinton',
319 | 'Sweetwater',
320 | 'Cleveland',
321 | 'Hendersonville',
322 | 'Crossville',
323 | 'Gatlinburg'],
324 | 'TX': ['Houston',
325 | 'Beaumont',
326 | 'Mont Belvieu',
327 | 'Grand Prairie',
328 | 'Tyler',
329 | 'Waco',
330 | 'Austin',
331 | 'San Antonio',
332 | 'Dallas',
333 | 'Abilene',
334 | 'Fort Worth',
335 | 'Irving',
336 | 'Carrollton',
337 | 'El Paso',
338 | 'San Marcos',
339 | 'Corpus Christi',
340 | 'Carrizo Springs',
341 | 'Lubbock',
342 | 'Celina',
343 | 'Port Aransas',
344 | 'Junction'],
345 | 'UT': ['Eden', 'Park City', 'Springdale', 'Midway'],
346 | 'VT': [],
347 | 'VA': ['Virginia Beach',
348 | 'Leesburg',
349 | 'Alexandria',
350 | 'Manassas',
351 | 'Roanoke',
352 | 'Ruther Glen',
353 | 'Ashburn',
354 | 'Springfield',
355 | 'Dublin',
356 | 'Richmond',
357 | 'Arlington',
358 | 'Tappahannock',
359 | 'Max Meadows',
360 | 'Chantilly',
361 | 'Emporia'],
362 | 'WA': ['Spokane',
363 | 'Winthrop',
364 | 'Auburn',
365 | 'Seattle',
366 | 'Medina',
367 | 'Friday Harbor',
368 | 'Woodland',
369 | 'Oak Harbor'],
370 | 'WV': ['Charles Town'],
371 | 'WI': ['Fremont',
372 | 'Random Lake',
373 | 'Appleton',
374 | 'Williams Bay',
375 | 'Eau Claire',
376 | 'Genoa City',
377 | 'Oconomowoc',
378 | 'Fond Du Lac',
379 | 'Kenosha',
380 | 'Black River Falls',
381 | 'Green Bay'],
382 | 'WY': ['Cheyenne', 'Cody', 'Pinedale']}
--------------------------------------------------------------------------------
/app/hotel_index_schema.yml:
--------------------------------------------------------------------------------
1 |
2 | index:
3 |     name: hotelsindex
4 |     prefix: hotels
5 |
6 | fields:
7 |     tag:
8 |         - name: name
9 |         - name: city
10 |         - name: state
11 |         - name: country
12 |         - name: categories
13 |           separator: ","
14 |     text:
15 |         - name: review
16 |         - name: title
17 |         - name: address
18 |     numeric:
19 |         - name: rating
20 |     vector:
21 |         - name: embedding
22 |           dims: 384
23 |           distance_metric: cosine
24 |           algorithm: flat
25 |           datatype: float32
26 |
--------------------------------------------------------------------------------
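The schema above declares a 384-dimensional `float32` vector field, which matches the output size of the default `all-MiniLM-L6-v2` model. Redis stores such a field as raw bytes, so each embedding occupies 384 × 4 bytes. The sketch below shows that packing with the standard library only. `pack_vector` and `unpack_vector` are hypothetical helpers, and little-endian byte order is an assumption that holds on typical x86 and ARM hosts.

```python
import struct

DIMS = 384  # matches `dims` in the schema above


def pack_vector(values):
    """Pack floats into little-endian float32 bytes (hypothetical helper)."""
    if len(values) != DIMS:
        raise ValueError(f"expected {DIMS} dimensions, got {len(values)}")
    return struct.pack(f"<{DIMS}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: bytes back to a list of floats."""
    return list(struct.unpack(f"<{DIMS}f", blob))


blob = pack_vector([0.5] * DIMS)
print(len(blob))  # 384 values * 4 bytes each
```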
/app/prompt.py:
--------------------------------------------------------------------------------
1 | import openai
2 | from typing import List, Union, Dict
3 | from app.config import CHAT_MODEL
4 |
5 |
6 | def generate_hyde_prompt(positive, negative):
7 |
8 |     hyde_prompt = format_hyde_prompt(positive, negative)
9 |
10 |     hyde_review = openai.ChatCompletion.create(
11 |         model=CHAT_MODEL,
12 |         messages=[{'role':"user",
13 |                    'content': hyde_prompt}],
14 |         max_tokens=200)
15 |
16 |     hyde_review = hyde_review['choices'][0]['message']['content']
17 |
18 |     return hyde_review
19 |
20 |
21 |
22 | def format_hyde_prompt(positive: str, negative: str):
23 |     retrieval_prompt = f'''Your job is to generate a review for a hotel based on the positive and negative qualities provided.
24 | The review should present the positive qualities and the opposite of the negative qualities provided.
25 | The review should be at least 10 words long and from the perspective of a customer who stayed at the hotel. Be informal
26 | and concise. You aren't that smart.
27 |
28 | Positive Qualities the user would like:
29 |
30 | {positive}
31 |
32 | Negative Qualities the user would like to avoid:
33 |
34 | {negative}
35 | '''
36 |     return retrieval_prompt
37 |
38 | def get_recommended_hotel_prompt(generated_output):
39 |     prompt = f'''The following is a recommendation for a hotel based on reviews.
40 |
41 | {generated_output}
42 |
43 | What is the name of the recommended hotel? Include no extra information other than the name of the hotel
44 | as presented in the review.
45 | '''
46 |     return prompt
47 |
48 | def make_prompt(positive: str, negative: str, reviews: str):
49 |     retrieval_prompt = f'''You are a service dedicated to recommending hotels based on user reviews.
50 | You will be provided positive and negative qualities the user is looking for in a hotel, as well as a
51 | large number of reviews of a hotel that is likely to be a good candidate. Explain why the hotel is a good
52 | fit for the user based on the reviews and explain any drawbacks the user should be aware of but only if there
53 | are any. Always start the suggestion with "Based on user reviews, I suggest the following hotel:"
54 |
55 | Positive Qualities the user would like:
56 |
57 | {positive}
58 |
59 | Negative Qualities the user would like to avoid:
60 |
61 | {negative}
62 |
63 | Reviews:
64 |
65 | {reviews}
66 |
67 | Format for your response:
68 |
69 | Hotel: \n
70 | Reason: \n
71 | '''
72 |     return retrieval_prompt
73 |
74 |
75 | def format_prompt_reviews(top_hotels: List[Dict[str, Union[str, List[str]]]]):
76 |     content = []
77 |     if len(top_hotels) > 0:
78 |         # join the hotel name and the review
79 |         for hotel in top_hotels:
80 |             content.append(f"Hotel Name: {hotel['name']}\n")
81 |             for i, review in enumerate(hotel['reviews']):
82 |                 content.append(f"Review {i+1}: {review}\n")
83 |             content.append("\n")
84 |     return "".join(content)
85 |
--------------------------------------------------------------------------------
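`make_prompt` asks the model to answer in a `Hotel:` / `Reason:` layout, and the README's Future Work list mentions parsing the hotel name instead of asking the LLM for it. A minimal sketch of that parsing idea follows. It assumes the model actually follows the requested layout; `extract_hotel_name` and the sample text are hypothetical and not part of the repository.

```python
import re
from typing import Optional

# Matches a line that starts with the "Hotel:" label and captures the rest.
HOTEL_LINE = re.compile(r"^\s*Hotel:\s*(.+?)\s*$", re.MULTILINE)


def extract_hotel_name(recommendation: str) -> Optional[str]:
    """Return the text after the first 'Hotel:' label, or None if absent."""
    match = HOTEL_LINE.search(recommendation)
    return match.group(1) if match else None


# Invented sample response in the layout that make_prompt requests.
sample = (
    "Based on user reviews, I suggest the following hotel:\n"
    "Hotel: The Example Inn\n"
    "Reason: Guests praise the quiet rooms.\n"
)
print(extract_hotel_name(sample))
```

Returning `None` when the label is missing lets the caller fall back to another strategy, such as the existing `get_hotel_name` LLM call.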
/app/retrieve.py:
--------------------------------------------------------------------------------
1 |
2 | from redisvl.query import VectorQuery
3 | from redisvl.query.filter import Tag, FilterExpression
4 | from redisvl.index import SearchIndex
5 | from typing import List, Dict, Union, Any
6 |
7 | from app.config import REDIS_ADDRESS, SCHEMA
8 |
9 |
10 | def create_retrieval_index(data):
11 |     index = SearchIndex.from_yaml(SCHEMA)
12 |     index.connect(url=REDIS_ADDRESS)
13 |     if index.exists():
14 |         index = SearchIndex.from_existing('hotelsindex', REDIS_ADDRESS)
15 |     else:
16 |         index.create(overwrite=True)
17 |         # load data
18 |         index.load(data)
19 |     return index
20 |
21 |
22 | def retrieve_context(index, search_prompt, vectorizer, query_filter=None):
23 |
24 |     # Embed the query
25 |     query_embedding = vectorizer.embed(search_prompt)
26 |
27 |     # Get the 30 most similar reviews from the index
28 |     vector_query = VectorQuery(
29 |         vector=query_embedding,
30 |         vector_field_name="embedding",
31 |         return_fields=["review", "name", "title", "address", "city", "state"],
32 |         filter_expression=query_filter,
33 |         num_results=30,
34 |     )
35 |
36 |     results = index.query(vector_query)
37 |     return results
38 |
39 |
40 | def retrieve_top_three_hotels(results: List[Dict[str, Any]]):
41 |     # count the number of reviews for each hotel and return the three with the most reviews
42 |     hotel_reviews: Dict[str, List[Any]] = {}  # hotel key -> [hit count, [review texts]]
43 |     hotel_data: Dict[str, Dict] = {}
44 |
45 |     def get_fields(doc):
46 |         return {
47 |             "name": doc["name"],
48 |             "address": doc["address"],
49 |             "city": doc["city"],
50 |             "state": doc["state"],
51 |             "title": doc["title"],
52 |             "review": doc["review"],
53 |         }
54 |
55 |     for doc in results:
56 |         hash_key = str(hash(doc["name"] + doc["address"] + doc["city"] + doc["state"]))
57 |         if hash_key in hotel_reviews:
58 |             hotel_reviews[hash_key][0] += 1
59 |             hotel_reviews[hash_key][1].append(doc["review"])
60 |
61 |         else:
62 |             hotel_reviews[hash_key] = [1, [doc["review"]]]
63 |             hotel_data[hash_key] = get_fields(doc)
64 |
65 |     top_three = sorted(hotel_reviews.items(), key=lambda x: x[1][0], reverse=True)[:3]
66 |     top_three_hotels = []
67 |     for hash_key, review_data in top_three:
68 |         reviews = review_data[1]
69 |         hotel = hotel_data[hash_key]
70 |         top_three_hotels.append({**hotel, "reviews": reviews})
71 |     return top_three_hotels
72 |
73 |
74 | def make_filter(state: str = None, city: str = None) -> Union[FilterExpression, None]:
75 |     state_tag = Tag("state")
76 |     city_tag = Tag("city")
77 |     if state and city:
78 |         return (state_tag == state) & (city_tag == city)
79 |     elif state:
80 |         return state_tag == state
81 |     elif city:
82 |         return city_tag == city
83 |     else:
84 |         return None
85 |
--------------------------------------------------------------------------------
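The grouping step in `retrieve_top_three_hotels` can be tried on toy data without a Redis connection. The sketch below is a simplified variant, not the repository's function: `top_hotels_by_hits` is a hypothetical name, it keys on a tuple instead of a string hash, and the hits are invented for illustration.

```python
from collections import defaultdict


def top_hotels_by_hits(results, limit=3):
    """Group review hits by hotel and return the hotels with the most hits."""
    grouped = defaultdict(list)
    for doc in results:
        # A tuple key keeps hotels that share a name but differ in
        # address, city, or state in separate groups.
        key = (doc["name"], doc["address"], doc["city"], doc["state"])
        grouped[key].append(doc["review"])
    ranked = sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True)
    return [
        {"name": k[0], "address": k[1], "city": k[2], "state": k[3], "reviews": revs}
        for k, revs in ranked[:limit]
    ]


# Invented search hits in the shape returned by the vector query.
hits = [
    {"name": "Hotel A", "address": "1 Main St", "city": "Denver", "state": "CO", "review": "Great pool"},
    {"name": "Hotel B", "address": "9 Oak Ave", "city": "Aspen", "state": "CO", "review": "Cozy rooms"},
    {"name": "Hotel A", "address": "1 Main St", "city": "Denver", "state": "CO", "review": "Quiet at night"},
]
top = top_hotels_by_hits(hits)
print(top[0]["name"], len(top[0]["reviews"]))
```

Ranking by hit count favors hotels with several reviews close to the hypothetical review over a hotel with a single very close match.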
/assets/design-dark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/assets/design-dark.png
--------------------------------------------------------------------------------
/assets/design.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/assets/design.png
--------------------------------------------------------------------------------
/data/data.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/data/data.pkl
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: "3.9"
2 | services:
3 |   streamlit:
4 |     platform: linux/amd64
5 |     container_name: streamlit
6 |     build:
7 |       context: ./
8 |     ports:
9 |       - "8501:8501"
10 |     env_file:
11 |       - .env
12 |     environment:
13 |       # inside the compose network Redis is reachable by its service name, not localhost
14 |       - REDIS_HOST=redis
15 |     depends_on:
16 |       redis:
17 |         condition: service_healthy
18 |   redis:
19 |     image: redis/redis-stack-server:latest
20 |     ports:
21 |       - "6379:6379"
22 |     env_file:
23 |       - .env
24 |     volumes:
25 |       - redis_data:/data
26 |     healthcheck:
27 |       test: ["CMD", "redis-cli", "-h", "localhost", "-p", "6379", "ping"]
28 |       interval: 2s
29 |       timeout: 1m30s
30 |       retries: 5
31 |       start_period: 5s
32 |
33 | volumes:
34 |   redis_data:
--------------------------------------------------------------------------------
/env.template:
--------------------------------------------------------------------------------
1 | OPENAI_API_KEY=ADD_YOUR_KEY_HERE
2 | OPENAI_API_TYPE=openai
3 | OPENAI_COMPLETIONS_ENGINE=text-davinci-003
4 | OPENAI_CHAT_MODEL=gpt-3.5-turbo
5 | HF_VECTOR_MODEL=all-MiniLM-L6-v2
6 | REDIS_HOST=localhost
7 | REDIS_PORT=6379
8 | TOKENIZERS_PARALLELISM=false
9 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | redisvl==0.0.3
2 | streamlit>=1.25.0
3 | sentence-transformers>=2.2.2
4 | tiktoken>=0.4.0
5 | openai>=0.27.8,<1.0
6 | python-dotenv
--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
1 |
2 | from app.app import main
3 |
4 | main()
--------------------------------------------------------------------------------