├── .gitignore
├── .streamlit
│   └── config.toml
├── Dockerfile
├── README.md
├── app
│   ├── __init__.py
│   ├── app.py
│   ├── config.py
│   ├── constants.py
│   ├── hotel_index_schema.yml
│   ├── prompt.py
│   └── retrieve.py
├── assets
│   ├── design-dark.png
│   ├── design.excalidraw
│   └── design.png
├── data
│   └── data.pkl
├── docker-compose.yml
├── env.template
├── requirements.txt
└── run.py


/.gitignore:
--------------------------------------------------------------------------------
.env
*__pycache__/

--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[theme]
primaryColor="#F63366"
backgroundColor="#FFFFFF"
secondaryBackgroundColor="#F0F2F6"
textColor="#262730"
font="sans serif"

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.8-slim

RUN apt-get update && apt-get install python3 curl python3-pip -y

COPY ./requirements.txt .
RUN pip3 install --upgrade pip

# install pytorch cpu to reduce docker image size
RUN pip install --no-cache-dir torch==2.0.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip3 install -r requirements.txt

WORKDIR /hotel
COPY . /hotel

ENTRYPOINT [ "streamlit", "run" ]
CMD [ "run.py", "--server.fileWatcherType", "none", "--browser.gatherUsageStats", "false", "--server.address", "0.0.0.0"]

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# LLM Hotel Recommender

The LLM Hotel Recommender is a Streamlit app that uses [Redis](https://redis.com) and the [OpenAI API](https://api.openai.com/) to generate hotel recommendations based on a user's preferences. Because Redis can perform semantic search in addition to other operations like tag and text search,
users can search for hotels in the US by a variety of criteria, including:

- State
- City
- Positive Qualities
- Negative Qualities

The application cites its sources (reviews) for each recommendation and shows all the reviews that were returned.

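The State and City criteria act as exact-match tag filters that narrow the vector search. Below is a minimal sketch of how such a filter could be expressed, assuming RediSearch's `@field:{value}` tag syntax. The helpers `escape_tag` and `build_tag_filter` are hypothetical names used only for illustration; the app itself builds its filter with redisvl `Tag` objects in `app/retrieve.py`.

```python
import re
from typing import Optional

# Conservative choice for this sketch: escape every character that is not a
# letter, digit, or underscore with a backslash.
_NEEDS_ESCAPE = re.compile(r"([^A-Za-z0-9_])")


def escape_tag(value: str) -> str:
    """Backslash-escape spaces and punctuation inside a tag value."""
    return _NEEDS_ESCAPE.sub(r"\\\1", value)


def build_tag_filter(state: Optional[str] = None, city: Optional[str] = None) -> str:
    """Hypothetical helper: combine optional state/city into one filter string."""
    clauses = []
    if state:
        clauses.append(f"@state:{{{escape_tag(state)}}}")
    if city:
        clauses.append(f"@city:{{{escape_tag(city)}}}")
    # Space-separated clauses are intersected; "*" matches every document
    return " ".join(clauses) if clauses else "*"


print(build_tag_filter("CA", "San Diego"))  # @state:{CA} @city:{San\ Diego}
print(build_tag_filter())                   # *
```

Leaving both arguments empty yields `*`, which mirrors the app's behavior of searching all hotels when no location is selected.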
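Because one hotel can contribute several matching reviews, the app groups the retrieved reviews per hotel before it recommends one (see `retrieve_top_three_hotels` in `app/retrieve.py`). The sketch below is a simplified illustration of that grouping step, not the repository's exact code: the function name `top_hotels_by_review_count` and the sample rows are made up, and it keys on name, city, and state rather than the hash the app uses.

```python
from typing import Any, Dict, List, Tuple


def top_hotels_by_review_count(
    results: List[Dict[str, Any]], k: int = 3
) -> List[Dict[str, Any]]:
    """Group result rows by hotel and keep the k hotels with the most reviews."""
    grouped: Dict[Tuple[str, str, str], Dict[str, Any]] = {}
    for doc in results:
        # A name alone is ambiguous across cities, so key on name + city + state
        key = (doc["name"], doc["city"], doc["state"])
        if key not in grouped:
            grouped[key] = {
                "name": doc["name"],
                "city": doc["city"],
                "state": doc["state"],
                "reviews": [],
            }
        grouped[key]["reviews"].append(doc["review"])
    # sorted() is stable, so hotels with equal counts keep retrieval order
    ranked = sorted(grouped.values(), key=lambda h: len(h["reviews"]), reverse=True)
    return ranked[:k]


# Made-up rows for illustration only
sample_results = [
    {"name": "Hotel A", "city": "Austin", "state": "TX", "review": "Quiet rooms."},
    {"name": "Hotel B", "city": "Dallas", "state": "TX", "review": "Great pool."},
    {"name": "Hotel A", "city": "Austin", "state": "TX", "review": "Friendly staff."},
]

for hotel in top_hotels_by_review_count(sample_results):
    print(hotel["name"], len(hotel["reviews"]))
```

Ranking by the number of matching reviews favors hotels that repeatedly resemble the query, rather than a hotel with a single very close review.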

## Design

The recommender uses the Hypothetical Document Embeddings (HyDE) approach: an LLM (OpenAI in this case)
generates a fake review based on user input. The system then uses Redis vector search to semantically search
for hotels with reviews that are similar to the fake review. The returned reviews are then passed to a second LLM call to
generate a recommendation for the user.

![Design](./assets/design.png#gh-light-mode-only)
![Design](./assets/design-dark.png#gh-dark-mode-only)

## Run the Application


### Docker Compose
1. Create your env file:

```bash
$ cp env.template .env
```
*Fill out the values, most importantly your `OPENAI_API_KEY`. When running with Docker Compose, set `REDIS_HOST=redis` (the Compose service name) so the app container can reach Redis.*

2. Run with docker compose:
```bash
$ docker compose up
```
*Add the `-d` option to run the containers in the background if you wish.*

Issues with dependencies? Try force-building with no-cache:
```bash
$ docker compose build --no-cache
```

3. Navigate to:
```
http://localhost:8501/
```


### Local

*Running locally assumes a Redis instance with search support (for example Redis Stack) is reachable at the `REDIS_HOST` and `REDIS_PORT` set in your `.env`.*

1. Clone the repo:
```bash
$ git clone https://github.com/redis-developer/LLM-Recommender.git && cd LLM-Recommender
```

2. Create your env file:

```bash
$ cp env.template .env
```
*Fill out the values, most importantly your `OPENAI_API_KEY`.*

3. Install dependencies:
```bash
$ pip install -r requirements.txt
```

4. Run the app:
```bash
$ streamlit run run.py
```


## Known Bugs

1. Hotels with the same name in different cities are not handled well

## Future Work

1. Add more search criteria (GeoFilter, Price, etc.)
2. Expand the dataset, which is relatively sparse
3. Use OpenAI Functions or parsing, instead of a separate LLM call, to extract the hotel name from the recommendation

--------------------------------------------------------------------------------
/app/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/app/__init__.py

--------------------------------------------------------------------------------
/app/app.py:
--------------------------------------------------------------------------------

import os
import openai
import pickle
import streamlit as st

from urllib.error import URLError
from redisvl.vectorize.text import HFTextVectorizer
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from app.config import (
    CHAT_MODEL,
    VECTORIZER,
    DATAFILE
)
from app.retrieve import (
    create_retrieval_index,
    retrieve_context,
    make_filter,
    retrieve_top_three_hotels
)
from app.prompt import (
    make_prompt,
    generate_hyde_prompt,
    format_prompt_reviews,
    get_recommended_hotel_prompt
)

from app.constants import (
    STATES,
    CITIES
)

def recommend_hotel(positive, negative, reviews):
    """Ask the chat model for a recommendation based on the retrieved reviews."""
    prompt = make_prompt(positive, negative, reviews)
    retrieval = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[{'role':"user",
                   'content': prompt}],
        max_tokens=1000)

    # extract the text of the first completion
    response = retrieval['choices'][0]['message']['content']
    return response


def get_hotel_name(recommendation):
    """Ask the chat model to extract the hotel name from a recommendation."""
    prompt = get_recommended_hotel_prompt(recommendation)
    retrieval = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[{'role':"user",
                   'content': prompt}],
        max_tokens=1000)

    # extract the text of the first completion
    response = retrieval['choices'][0]['message']['content']
    return response

@st.cache_resource
def vectorizer():
    # Load the sentence-transformers model once and reuse it across reruns
    return HFTextVectorizer(f"sentence-transformers/{VECTORIZER}")

@st.cache_data
def load_data():
    # Use a context manager so the file handle is closed after loading
    with open(DATAFILE, "rb") as f:
        data = pickle.load(f)
    return data

def set_city():
    # NOTE: Streamlit discards the return value of on_change callbacks, so
    # this does not by itself change the selected city.
    state = st.session_state["state"]
    try:
        return CITIES[state][0]
    except IndexError:
        return []


def main():
    data = load_data()
    INDEX = create_retrieval_index(data)
    EMBEDDING_MODEL = vectorizer()

    try:
        # Defining default values
        defaults = {
            "state": "",
            "city": "",
            "positive": "",
            "negative": "",
            "response": "",
            "hotel_info": "",
            "hotel_reviews": "",
            "all_similar_reviews": ""
        }

        # Checking if keys exist in session state, if not, initializing them
        for key, value in defaults.items():
            if key not in st.session_state:
                st.session_state[key] = value

        col1, col2 = st.columns([4,2])

        st.write("# LLM Hotel Recommender")
        with st.sidebar:
            st.write("## Filter By Location")
            st.selectbox("State", STATES, key="state", on_change=set_city)
            st.selectbox("City", CITIES[st.session_state['state']], key="city")

        st.write("The LLM Hotel Recommender is a Streamlit app that uses Redis and the " +
                 "OpenAI API to generate hotel recommendations based on a user's preferences.")


        st.text_input("What would you like in a hotel?", key="positive")
        st.text_input("What would you like to avoid in a hotel?", key="negative")

        if st.button("Find Hotel"):
            with st.spinner("OpenAI and Redis are working to find you a hotel..."):
                # filter
                query_filter = make_filter(st.session_state['state'], st.session_state['city'])

                # generate a hypothetical review (HyDE) to use as the search text
                hyde_review = generate_hyde_prompt(
                    st.session_state['positive'],
                    st.session_state['negative']
                )

                # Retrieve the context
                context = retrieve_context(INDEX,
                                           hyde_review,
                                           EMBEDDING_MODEL,
                                           query_filter=query_filter)
                top_three_hotels = retrieve_top_three_hotels(context)

                if top_three_hotels:
                    top_hotel = top_three_hotels[0]
                    top_hotel_reviews = format_prompt_reviews([top_hotel])
                    other_options = format_prompt_reviews(top_three_hotels)

                    recommendation = recommend_hotel(
                        st.session_state['positive'],
                        st.session_state['negative'],
                        top_hotel_reviews
                    )

                    hotel_info = {
                        "Hotel Name": top_hotel['name'],
                        "Hotel Address": top_hotel['address'],
                        "City": top_hotel['city'],
                        "State": top_hotel['state'],
                    }
                    hotel_info = "\n" + "\n".join([f"{k}: {v}" for k, v in hotel_info.items()])
                    st.session_state['response'] = recommendation
                    st.session_state['hotel_info'] = hotel_info
                    st.session_state['hotel_reviews'] = top_hotel_reviews
                    st.session_state['all_similar_reviews'] = other_options
                else:
                    # No reviews matched the filter; avoid indexing an empty list
                    st.session_state['response'] = "No matching hotels were found. Try a different location or description."
                    st.session_state['hotel_info'] = ""
                    st.session_state['hotel_reviews'] = ""
                    st.session_state['all_similar_reviews'] = ""


        st.write("### Recommendations")
        st.write(f"{st.session_state['response']}")
        with st.expander("Show Hotel Info"):
            st.text(st.session_state['hotel_info'])
        with st.expander("Show Hotel Reviews"):
            st.text(st.session_state['hotel_reviews'])
        with st.expander("Show All Similar Reviews"):
            st.text(st.session_state['all_similar_reviews'])

        st.write("\n")
        st.write("---------")
        st.write("\n")
        st.write("### About")
        st.write("The recommender uses the Hypothetical Document Embeddings (HyDE)" +
                 " approach which uses an LLM (OpenAI in this case) to generate a fake review" +
                 " based on user input. The system then uses Redis vector search to semantically search" +
                 " for hotels with reviews that are similar to the fake review. The returned reviews" +
                 " are then passed to another LLM to generate a recommendation for the user.")

        st.write("#### Dataset")
        st.write("The dataset is from [Datafiniti](https://data.world/datafiniti/hotel-reviews) and is hosted on data.world")

    except URLError as e:
        st.error(
            """
            **This demo requires internet access.**
            Connection error: %s
            """
            % e.reason
        )

--------------------------------------------------------------------------------
/app/config.py:
--------------------------------------------------------------------------------
import os
import openai
from dotenv import load_dotenv

load_dotenv()

# get path to the directory where this file is located
BASEDIR = os.path.abspath(os.path.dirname(__file__))
# get parent directory of BASEDIR
PARENTDIR = os.path.abspath(os.path.join(BASEDIR, os.pardir))

# Index and Hotel data
SCHEMA = os.getenv("SCHEMA", f"{BASEDIR}/hotel_index_schema.yml")
DATADIR = os.getenv("DATADIR", f"{PARENTDIR}/data")
DATAFILE = os.getenv("DATAFILE", f"{DATADIR}/data.pkl")

# Redis information
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_ADDRESS = f"redis://{REDIS_HOST}:{REDIS_PORT}"

# AI models
openai.api_key = os.getenv('OPENAI_API_KEY')
# fall back to the model named in env.template when the variable is unset
CHAT_MODEL = os.getenv('OPENAI_CHAT_MODEL', 'gpt-3.5-turbo')
VECTORIZER = os.getenv('HF_VECTOR_MODEL', 'all-MiniLM-L6-v2')

--------------------------------------------------------------------------------
/app/constants.py:
--------------------------------------------------------------------------------
STATES = [
    'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
    'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
    'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ',
    'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC',
    'SD', 'TN',
    'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', ""
]

CITIES = {"": [],
 'AL': ['Mobile',
        'Jasper',
        'Clanton',
        'Huntsville',
        'Birmingham',
        'Montgomery',
        'Prattville',
        'Enterprise',
        'Anderson',
        'Lincoln'],
 'AK': ['Healy', 'Juneau', 'Dutch Harbor'],
 'AZ': ['Glendale',
        'Tucson',
        'Eloy',
        'Phoenix',
        'Prescott Valley',
        'Mesa',
        'Payson',
        'Wellton'],
 'AR': ['Harrison',
        'Bull Shoals',
        'Yellville',
        'West Memphis',
        'Bryant',
        'Batesville',
        'Springdale',
        'Bella Vista',
        'Russellville'],
 'CA': ['Palm Springs',
        'Livermore',
        'Barstow',
        'Irvine',
        'San Diego',
        'Tracy',
        'Carlsbad',
        'Pasadena',
        'Marina',
        'San Jose',
        'Napa',
        'Marina Del Rey',
        'Rancho Mirage',
        'Sunnyvale',
        'San Francisco',
        'Colton',
        'Gardena',
        'San Bruno',
        'Morro Bay',
        'Joshua Tree',
        'Long Beach',
        'Anaheim',
        'Garberville',
        'Upland',
        'Eureka',
        'Inglewood',
        'Santa Barbara',
        'Redding',
        'Signal Hill',
        'Selma',
        'Lake Forest',
        'Garden Grove',
        'Mariposa',
        'San Clemente',
        'Studio City',
        'Rohnert Park',
        'Hesperia',
        'Victorville',
        'Coronado'],
 'CO': ['Silverton',
        'Castle Rock',
        'Denver',
        'Aspen',
        'Arvada',
        'Griante',
        'Cripple Creek',
        'Mosca',
        'Walden',
        'Colorado Springs',
        'Fort Collins',
        'Pueblo',
        'Grand Junction'],
 'CT': ['Bristol', 'Plainville', 'Southport', 'Hartford'],
 'DE': ['New Castle'],
 'FL': ['West Palm Beach',
        'Bonita Springs',
        'Orlando',
        'Palm Harbor',
        'Lake Worth',
        'Delray Beach',
        'Naples',
        'Jacksonville',
        'Deland',
        'Sarasota',
        'Bradenton',
        'Miami',
        'Englewood',
        'Fort Pierce'],
 'GA': ['Mableton',
        'Dublin',
        'Dahlonega',
        'Griffin',
        'Douglasville',
        'Macon',
        'Atlanta',
        'Norcross',
        'Warner Robins',
        'Calhoun',
        'Savannah',
        'Cave Spring',
        'Lithia Springs',
        'Madison',
        'Lagrange',
        'Silver Creek',
        'Lavonia',
        'Alpharetta',
        'Whigham'],
 'HI': ['Kapaa', 'Kihei', 'Princeville', 'Kailua Kona'],
 'ID': ['Gooding', 'Coeur D Alene', 'Boise', 'Idaho Falls'],
 'IL': ['Westmont',
        'Arcola',
        'Chicago',
        'Troy',
        'Rockford',
        'Northbrook',
        'Freeport',
        'Des Plaines',
        'Princeton',
        'Deerfield',
        'Galena',
        'Glenview',
        'Elk Grove Village'],
 'IN': ['Terre Haute',
        'Merrillville',
        'Bloomington',
        'Elkhart',
        'Indianapolis',
        'Fort Wayne',
        'Plainfield'],
 'IA': ['Humeston',
        'Coralville',
        'Ames',
        'Raymond',
        'Guernsey',
        'Boone',
        'Cedar Rapids'],
 'KS': ['Olathe', 'Manhattan', 'Ulysses'],
 'KY': ['Corbin',
        'Kuttawa',
        'Owensboro',
        'Hopkinsville',
        'Lexington',
        'Park City'],
 'LA': ['New Orleans',
        'Bossier City',
        'Opelousas',
        'Monroe',
        'Shreveport',
        'Slidell',
        'Sulphur'],
 'ME': ['Waterville', 'Ogunquit'],
 'MD': ['Columbia',
        'Pittsville',
        'Brandywine',
        'Glen Burnie',
        'Saint Leonard',
        'Annapolis',
        'Takoma Park'],
 'MA': ['Boston',
        'West Springfield',
        'Cambridge',
        'Brockton',
        'Auburn',
        'Fitchburg'],
 'MI': ['Holland',
        'Oscoda',
        'Bay City',
        'Prescott',
        'Utica',
        'Saginaw',
        'Southfield',
        'Grand Rapids',
        'Alma',
        'Saint Ignace',
        'Battle Creek',
        'Clinton Township',
        'Grayling',
        'Whitehall',
        'Big Rapids'],
 'MN': ['Harmony',
        'Detroit Lakes',
        'Shakopee',
        'Windom',
        'Worthington',
        'Minneapolis',
        'Fairmont',
        'Lake Elmo',
        'Burnsville',
        'Saint Cloud'],
 'MS': ['Batesville', 'Biloxi'],
 'MO': ['Osage Beach',
        'St. Robert',
        'Lamar',
        'Saint Mary',
        'Licking',
        'Saint Louis',
        'Joplin',
        'Sikeston',
        'Kansas City',
        'Springfield'],
 'MT': ['Whitefish',
        'Billings',
        'Kalispell',
        'Columbia Falls',
        'Miles City',
        'Big Sky'],
 'NE': ['Valentine', 'Burwell', 'Cozad', 'North Platte', 'Gretna'],
 'NV': ['Las Vegas', 'Incline Village', 'Laughlin'],
 'NH': ['Hampton', 'Twin Mountain', 'Northwood', 'Gorham'],
 'NJ': ['Clifton',
        'Springfield',
        'Mount Arlington',
        'Bordentown',
        'Ridgefield Park',
        'Carlstadt'],
 'NM': ['Albuquerque', 'Portales', 'Raton', 'Tatum'],
 'NY': ['Ithaca',
        'Staten Island',
        'Deansboro',
        'New York',
        'Hauppauge',
        'Geneva',
        'Cooperstown',
        'Newburgh',
        'East Syracuse',
        'Schenectady',
        'Albany',
        'Woodstock'],
 'NC': ['Lake Lure',
        'Forest City',
        'Cary',
        'Raleigh',
        'Charlotte',
        'Rocky Mount',
        'Southern Pines',
        'Lincolnton',
        'Burlington'],
 'ND': ['Devils Lake'],
 'OH': ['Alliance',
        'Stone Creek',
        'Dayton',
        'Bowling Green',
        'Columbus',
        'Portland',
        'Troy',
        'Springfield'],
 'OK': ['El Reno',
        'Oklahoma City',
        'Owasso',
        'Enid',
        'Elk City',
        'Seiling',
        'Moore',
        'Lebanon',
        'Blackwell'],
 'OR': ['Canyonville',
        'Portland',
        'Lincoln City',
        'Bend',
        'Woodburn',
        'Newport',
        'Eugene',
        'Gresham',
        'Forest Grove'],
 'PA': ['Harleigh',
        'Warren',
        'Erie',
        'Harrisburg',
        'Altoona',
        'Ramey',
        'Palermo',
        'Mercer',
        'Tannersville',
        'Cranberry Twp',
        'East Hickory',
        'Mendenhall',
        'Huntingdon',
        'Indiana'],
 'RI': ['Providence'],
 'SC': ['Columbia',
        'Dillon',
        'Aiken',
        'Georgetown',
        'Abbeville',
        'Irmo',
        'Blythewood',
        'Mullins',
        'Boiling Springs',
        'Little River',
        'Florence'],
 'SD': ['Brookings', 'Spearfish'],
 'TN': ['Lenoir City',
        'Soddy Daisy',
        'Madisonville',
        'Nashville',
        'Johnson City',
        'Kingston',
        'Chattanooga',
        'Jackson',
        'Knoxville',
        'Clinton',
        'Sweetwater',
        'Cleveland',
        'Hendersonville',
        'Crossville',
        'Gatlinburg'],
 'TX': ['Houston',
        'Beaumont',
        'Mont Belvieu',
        'Grand Prairie',
        'Tyler',
        'Waco',
        'Austin',
        'San Antonio',
        'Dallas',
        'Abilene',
        'Fort Worth',
        'Irving',
        'Carrollton',
        'El Paso',
        'San Marcos',
        'Corpus Christi',
        'Carrizo Springs',
        'Lubbock',
        'Celina',
        'Port Aransas',
        'Junction'],
 'UT': ['Eden', 'Park City', 'Springdale', 'Midway'],
 'VT': [],
 'VA': ['Virginia Beach',
        'Leesburg',
        'Alexandria',
        'Manassas',
        'Roanoke',
        'Ruther Glen',
        'Ashburn',
        'Springfield',
        'Dublin',
        'Richmond',
        'Arlington',
        'Tappahannock',
        'Max Meadows',
        'Chantilly',
        'Emporia'],
 'WA': ['Spokane',
        'Winthrop',
        'Auburn',
        'Seattle',
        'Medina',
        'Friday Harbor',
        'Woodland',
        'Oak Harbor'],
 'WV': ['Charles Town'],
 'WI': ['Fremont',
        'Random Lake',
        'Appleton',
        'Williams Bay',
        'Eau Claire',
        'Genoa City',
        'Oconomowoc',
        'Fond Du Lac',
        'Kenosha',
        'Black River Falls',
        'Green Bay'],
 'WY': ['Cheyenne', 'Cody', 'Pinedale']}

--------------------------------------------------------------------------------
/app/hotel_index_schema.yml:
--------------------------------------------------------------------------------

index:
  name: hotelsindex
  prefix: hotels

fields:
  tag:
    - name: name
    - name: city
    - name: state
    - name: country
    - name: categories
      separator: ","
  text:
    - name: review
    - name: title
    - name: address
  numeric:
    - name: rating
  vector:
    - name: embedding
      dims: 384
      distance_metric: cosine
      algorithm: flat
      datatype: float32

--------------------------------------------------------------------------------
/app/prompt.py:
--------------------------------------------------------------------------------
import openai
from typing import List, Union, Dict
from app.config import CHAT_MODEL


def generate_hyde_prompt(positive, negative):
    # Despite the name, this returns the generated hypothetical review text
    hyde_prompt = format_hyde_prompt(positive, negative)

    hyde_review = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[{'role':"user",
                   'content': hyde_prompt}],
        max_tokens=200)

    hyde_review = hyde_review['choices'][0]['message']['content']

    return hyde_review



def format_hyde_prompt(positive: str, negative: str):
    retrieval_prompt = f'''Your job is to generate a review for a hotel based on the positive and negative qualities provided.
The review should present the positive qualities and the opposite of the negative qualities provided.
The review should be at least 10 words long and from the perspective of a customer who stayed at the hotel. Be informal
and concise. You aren't that smart.

Positive Qualities the user would like:

{positive}

Negative Qualities the user would like to avoid:

{negative}
'''
    return retrieval_prompt

def get_recommended_hotel_prompt(generated_output):
    prompt = f'''The following is a recommendation for a hotel based on reviews.

{generated_output}

What is the name of the recommended hotel? Include no extra information other than the name of the hotel
as presented in the review.
'''
    return prompt

def make_prompt(positive: str, negative: str, reviews: str):
    retrieval_prompt = f'''You are a service dedicated to recommending hotels based on user reviews.
You will be provided positive and negative qualities the user is looking for in a hotel, as well as a
large number of reviews of a hotel that is likely to be a good candidate. Explain why the hotel is a good
fit for the user based on the reviews and explain any drawbacks the user should be aware of but only if there
are any. Always start the suggestion with "Based on user reviews, I suggest the following hotel:"

Positive Qualities the user would like:

{positive}

Negative Qualities the user would like to avoid:

{negative}

Reviews:

{reviews}

Format for your response:

Hotel: \n
Reason: \n
'''
    return retrieval_prompt


def format_prompt_reviews(top_hotels: List[Dict[str, Union[str, List[str]]]]):
    content = []
    if len(top_hotels) > 0:
        # emit each hotel name followed by its numbered reviews
        for hotel in top_hotels:
            content.append(f"Hotel Name: {hotel['name']}\n")
            for i, review in enumerate(hotel['reviews']):
                content.append(f"Review {i+1}: {review}\n")
            content.append("\n")
    return "".join(content)

--------------------------------------------------------------------------------
/app/retrieve.py:
--------------------------------------------------------------------------------

from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, FilterExpression
from redisvl.index import SearchIndex
from typing import List, Dict, Union, Any

from app.config import REDIS_ADDRESS, SCHEMA


def create_retrieval_index(data):
    index = SearchIndex.from_yaml(SCHEMA)
    index.connect(url=REDIS_ADDRESS)
    if index.exists():
        index = SearchIndex.from_existing('hotelsindex', REDIS_ADDRESS)
    else:
        index.create(overwrite=True)
        # load data only when the index is first created
        index.load(data)
    return index


def retrieve_context(index, search_prompt, vectorizer, query_filter=None):

    # Embed the query
    query_embedding = vectorizer.embed(search_prompt)

    # Get the 30 most similar reviews from the index
    vector_query = VectorQuery(
        vector=query_embedding,
        vector_field_name="embedding",
        return_fields=["review", "name", "title", "address", "city", "state"],
        filter_expression=query_filter,
        num_results=30,
    )

    results = index.query(vector_query)
    return results


def retrieve_top_three_hotels(results: List[Dict[str, Any]]):
    # count the number of reviews for each hotel and return the three with the most reviews
    # hash_key -> [review_count, [review_text, ...]]
    hotel_reviews: Dict[str, List[Any]] = {}
    hotel_data: Dict[str, Dict] = {}

    def get_fields(doc):
        return {
            "name": doc["name"],
            "address": doc["address"],
            "city": doc["city"],
            "state": doc["state"],
            "title": doc["title"],
            "review": doc["review"],
        }

    for doc in results:
        hash_key = str(hash(doc["name"] + doc["address"] + doc["city"] + doc["state"]))
        if hash_key in hotel_reviews:
            hotel_reviews[hash_key][0] += 1
            hotel_reviews[hash_key][1].append(doc["review"])

        else:
            hotel_reviews[hash_key] = [1, [doc["review"]]]
            hotel_data[hash_key] = get_fields(doc)

    top_three = sorted(hotel_reviews.items(), key=lambda x: x[1][0], reverse=True)[:3]
    top_three_hotels = []
    for hash_key, review_data in top_three:
        reviews = review_data[1]
        hotel = hotel_data[hash_key]
        top_three_hotels.append({**hotel, "reviews": reviews})
    return top_three_hotels


def make_filter(state: str = None, city: str = None) -> Union[FilterExpression, None]:
    # Build a tag filter from whichever of state/city is set; None means no filter
    state_tag = Tag("state")
    city_tag = Tag("city")
    if state and city:
        return (state_tag == state) & (city_tag == city)
    elif state:
        return state_tag == state
    elif city:
        return city_tag == city
    else:
        return None

--------------------------------------------------------------------------------
/assets/design-dark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/assets/design-dark.png

--------------------------------------------------------------------------------
/assets/design.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/assets/design.png

--------------------------------------------------------------------------------
/data/data.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/redis-developer/LLM-Recommender/0b803d9605a14f3852df949f699a4988a7e962b6/data/data.pkl

--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
version: "3.9"
services:
  streamlit:
    platform: linux/amd64
    container_name: streamlit
    build:
      context: ./
    ports:
      - "8501:8501"
    env_file:
      - .env
    depends_on:
      redis:
        condition: service_healthy
  redis:
    image: redis/redis-stack-server:latest
    ports:
      - "6379:6379"
    env_file:
      - .env
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-h", "localhost", "-p", "6379", "ping"]
      interval: 2s
      timeout: 1m30s
      retries: 5
      start_period: 5s

volumes:
  redis_data:

--------------------------------------------------------------------------------
/env.template:
--------------------------------------------------------------------------------
OPENAI_API_KEY=ADD_YOUR_KEY_HERE
OPENAI_API_TYPE=openai
OPENAI_COMPLETIONS_ENGINE=text-davinci-003
OPENAI_CHAT_MODEL=gpt-3.5-turbo
HF_VECTOR_MODEL=all-MiniLM-L6-v2
REDIS_HOST=localhost
REDIS_PORT=6379
TOKENIZERS_PARALLELISM=false

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
redisvl==0.0.3
streamlit>=1.25.0
sentence-transformers>=2.2.2
tiktoken>=0.4.0
openai>=0.27.8,<1.0  # the app calls openai.ChatCompletion, which openai 1.0 removed
python-dotenv

--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------

from app.app import main

main()

--------------------------------------------------------------------------------