├── .gitignore
├── README.md
├── app.py
├── assets
│   └── ecommerce-demo.gif
├── data
│   ├── gs-all-cat-sample-200k.csv
│   └── marqo-gs_100k.csv
├── marqo
│   ├── add_documents.py
│   ├── create_index.py
│   └── get_stats.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
# Virtual Environment
venv

# Environment Variables
.env
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Ecommerce Search with Marqo

This repository contains a multimodal ecommerce search application built using Marqo's [Cloud-based search engine](https://www.marqo.ai/cloud?utm_source=github&utm_medium=organic&utm_campaign=marqo-ai&utm_term=2024-11-07-04-36-utc) and Marqo's state-of-the-art [ecommerce embedding models](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb).

![Ecommerce search demo](assets/ecommerce-demo.gif)

## Step 1: Set Up
First, you will need a Marqo Cloud API Key. To obtain one, see this [article](https://www.marqo.ai/blog/finding-my-marqo-api-key).
Once you have your API Key, place it in a `.env` file at the root of this repository:
```env
MARQO_API_KEY="XXXXXXXX" # Visit https://www.marqo.ai/blog/finding-my-marqo-api-key
```

To install all packages needed for this search demo:
```bash
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate for Windows
pip install -r requirements.txt
```

Now you're ready to create your Marqo index!

## Step 2: Create Your Marqo Index
For this search demo, we will be using Marqo's state-of-the-art ecommerce embedding models, `marqo-ecommerce-embeddings-B` and `marqo-ecommerce-embeddings-L`. The file `marqo/create_index.py` contains index settings for both models: the `marqo-ecommerce-embeddings-L` settings are used by default, and the `marqo-ecommerce-embeddings-B` settings are included as a commented-out alternative you can swap in. For more information on these models, see our [Hugging Face](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb) collection.

To create your index and add documents to it:
```bash
python3 marqo/create_index.py
python3 marqo/add_documents.py
```

If you visit [Marqo Cloud](https://cloud.marqo.ai/indexes/), you will be able to see the status of your index (and when it's ready to add documents to). The second command adds data from `data/marqo-gs_100k.csv`, which is a 100k subset of the [Marqo-GS-10M](https://huggingface.co/datasets/Marqo/marqo-GS-10M) dataset. We also include `data/gs-all-cat-sample-200k.csv`, which contains 200k items across all categories in the Google Shopping dataset. To use it instead, change `path_to_data` in `marqo/add_documents.py`.
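
`marqo/add_documents.py` reads the whole CSV with pandas and builds the complete `documents` list before uploading anything. If you switch to the larger 200k file, one alternative is to stream the CSV and hand out one batch at a time. This is a sketch under assumptions, not code from this repo: `iter_document_batches` is a hypothetical helper built on Python's standard `csv` module; it reuses the column names (`image`, `query`, `title`, `score`) and the batch size of 64 from `add_documents.py`, and assumes every `score` value is numeric.

```python
import csv
from typing import Dict, Iterator, List, TextIO


def iter_document_batches(csv_file: TextIO, batch_size: int = 64) -> Iterator[List[Dict]]:
    """Yield lists of Marqo documents, at most `batch_size` per list.

    Hypothetical helper: maps the CSV columns used by marqo/add_documents.py
    onto the same document fields, without loading the whole file first.
    """
    reader = csv.DictReader(csv_file)
    batch: List[Dict] = []
    for row in reader:
        batch.append({
            "image_url": row["image"],
            "query": row["query"],
            "title": row["title"],
            "score": float(row["score"]),  # assumption: the score column is numeric
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


# Usage sketch (needs the `mq` client from add_documents.py and an existing index):
# with open("data/gs-all-cat-sample-200k.csv", newline="") as f:
#     for batch in iter_document_batches(f):
#         # pass the same mappings and tensor_fields that add_documents.py uses
#         mq.index("marqo-ecommerce-search").add_documents(batch, ...)
```

Because the generator holds only one batch in memory at a time, memory use does not grow with the size of the file. The trade-off is that a malformed row surfaces as an exception partway through the upload rather than before it starts.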

To check the status of your index while documents are being added, you can run:
```bash
python3 marqo/get_stats.py
```
This prints the number of documents and vectors in your index. These numbers will keep increasing as more data is added.

## Step 3: Run the Application
You can run the UI while documents are still being added to your index. To run the search demo:
```bash
python3 app.py
```
This launches the Gradio UI shown in the demo GIF at the top of this README. Open the local URL that Gradio prints in your terminal to use it.

## Step 4 (Optional): Deploy on Hugging Face
This ecommerce search demo is set up so it can be deployed to Hugging Face. Create a Gradio Hugging Face Space, copy in the contents of `app.py`, and add a `requirements.txt` so the Space installs the dependencies (such as `marqo`). You will also need to store your Marqo API Key as a secret named `MARQO_API_KEY` in your Space, since `app.py` reads the key from that environment variable.

To see this demo live on Hugging Face, visit our [Ecommerce Search Space](https://huggingface.co/spaces/Marqo/Ecommerce-Search)!

## Step 5: Clean Up
If you follow the steps in this guide, you will create an index with CPU large inference and a basic storage shard. This index costs $0.38 per hour. When you are done with the index, you can delete it with the following code:
```python
import marqo
import os
from dotenv import load_dotenv

load_dotenv()  # read MARQO_API_KEY from .env

index_name = "marqo-ecommerce-search"  # the index created by marqo/create_index.py
mq = marqo.Client("https://api.marqo.ai", api_key=os.getenv("MARQO_API_KEY"))
mq.delete_index(index_name)
```

**If you do not delete your index you will continue to be charged for it.**

## Running This Project Locally
If you'd prefer to run this project locally rather than with Marqo Cloud, you can do so using our [open source version of Marqo](https://github.com/marqo-ai).
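
The scripts in this repo hard-code the Marqo Cloud endpoint `https://api.marqo.ai`, while a local Marqo instance listens on `http://localhost:8882`. Instead of editing the client line in every file, you could pick the endpoint from the environment. This is a sketch under assumptions: `MARQO_URL` is a hypothetical environment variable and `marqo_endpoint` a hypothetical helper; neither exists in this repo or in the Marqo client.

```python
import os
from typing import Mapping, Optional, Tuple

CLOUD_URL = "https://api.marqo.ai"


def marqo_endpoint(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Return (url, api_key) to pass to marqo.Client.

    Hypothetical helper: MARQO_URL (an assumed variable) overrides the Cloud
    endpoint. The API key is only returned for Marqo Cloud, matching the
    README's advice to use api_key=None for a local instance.
    """
    env = os.environ if env is None else env
    url = env.get("MARQO_URL", CLOUD_URL)
    api_key = env.get("MARQO_API_KEY") if url == CLOUD_URL else None
    return url, api_key


# Usage sketch, mirroring how the scripts already construct the client:
# import marqo
# url, api_key = marqo_endpoint()
# mq = marqo.Client(url, api_key=api_key)
```

With this in place, adding `MARQO_URL=http://localhost:8882` to `.env` (which `load_dotenv()` copies into the environment) would point every script at the local container.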
To run Marqo using Docker:
```bash
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
```
Once Marqo is running, you can follow the same steps as above, but you will need to replace `mq = marqo.Client("https://api.marqo.ai", api_key=api_key)` with `mq = marqo.Client("http://localhost:8882", api_key=None)` in `app.py`, `marqo/create_index.py`, `marqo/add_documents.py` and `marqo/get_stats.py`.

## Questions? Contact Us!
If you have any questions about this search demo or about Marqo's capabilities, you can:
* [Join Our Slack Community](https://join.slack.com/t/marqo-community/shared_invite/zt-2ry33y71j-H0WUeQvFaVlKuuZwl38BeA)
* [Book a Demo](https://www.marqo.ai/book-demo?utm_source=github&utm_medium=organic&utm_campaign=marqo-ai&utm_term=2024-11-07-04-36-utc)

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import marqo
import requests
import io
from PIL import Image
import gradio as gr
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize Marqo client
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

def search_marqo(query, themes, negatives):
    """
    Searches the Marqo index with a query, additional themes to emphasize, and negative themes to avoid.
    Args:
        query (str): Main search query.
        themes (str): Additional positive theme for emphasis.
        negatives (str): Negative theme to de-emphasize in search.
    Returns:
        list: A list of (image, title) tuples for display in the Gradio gallery.
    """
    # An empty main query cannot be searched, so return no results
    if not query or not query.strip():
        return []

    # Build query weights
    query_weights = {query: 1.0}
    if themes:
        query_weights[themes] = 0.75
    if negatives:
        query_weights[negatives] = -1.1

    # Perform search with Marqo
    res = mq.index("marqo-ecommerce-search").search(query_weights, limit=10)  # limit to top 10 results

    # Prepare results
    products = []
    for hit in res['hits']:
        image_url = hit.get('image_url')
        title = hit.get('title', 'No Title')
        if not image_url:
            continue  # skip hits without an image to display

        # Fetch the image from the URL; skip products whose image cannot be downloaded or decoded
        try:
            response = requests.get(image_url, timeout=10)
            response.raise_for_status()
            image = Image.open(io.BytesIO(response.content))
        except (requests.RequestException, OSError):
            continue

        # Append product details for Gradio display
        products.append((image, title))

    return products

def clear_inputs():
    """
    Clears input fields and results in the Gradio interface.
    Returns:
        tuple: Empty values to reset query, themes, negatives, and results gallery.
    """
    return "", "", "", []  # Clears query, themes, negatives, and results

# Gradio Blocks Interface for Custom Layout
with gr.Blocks(css=".orange-button { background-color: orange; color: black; }") as interface:
    gr.Markdown("# Multimodal Ecommerce Search with Marqo's SOTA Embedding Models")
    gr.Markdown("### This ecommerce search demo uses:")
    gr.Markdown("### 1. [Marqo Cloud](https://www.marqo.ai/cloud) for the Search Engine.")
    gr.Markdown("### 2. [Marqo-Ecommerce-Embeddings](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb) for the multimodal embedding model. Specifically, `marqo-ecommerce-embeddings-L`.")
    gr.Markdown("### 3. Products from the [Marqo-GS-10M](https://huggingface.co/datasets/Marqo/marqo-GS-10M) dataset.")

    gr.Markdown("")
    # gr.Markdown("If you can't find the item you're looking for, let a member of our team know and we'll add it to the dataset.")

    with gr.Row():
        query_input = gr.Textbox(placeholder="coffee machine", label="Search Query")
        themes_input = gr.Textbox(placeholder="silver", label="More of...")
        negatives_input = gr.Textbox(placeholder="pods", label="Less of...")

    with gr.Row():
        search_button = gr.Button("Submit", elem_classes="orange-button")
        # clear_button = gr.Button("Clear")

    results_gallery = gr.Gallery(label="Top 10 Results", columns=4)

    # Run the search when the Submit button is clicked
    search_button.click(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

    # Clear button functionality
    # clear_button.click(fn=clear_inputs, inputs=[], outputs=[query_input, themes_input, negatives_input, results_gallery])

    # Enable Enter key submission for all input fields
    query_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)
    themes_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)
    negatives_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

# Launch the app
interface.launch()
--------------------------------------------------------------------------------
/assets/ecommerce-demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/marqo-ai/ecommerce-search/bb86e38f66e98cf43b9abb1466c8b75b6dee7f53/assets/ecommerce-demo.gif
--------------------------------------------------------------------------------
/marqo/add_documents.py:
--------------------------------------------------------------------------------
import marqo
import os
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

# Initialize the Marqo Cloud Client
print("Initializing Marqo Cloud client...")
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Read the CSV data file (use "data/gs-all-cat-sample-200k.csv" for the 200k all-category sample)
path_to_data = "data/marqo-gs_100k.csv"
df = pd.read_csv(path_to_data)

# Convert the data into the required document format
print("Converting data into the required document format...")
documents = []
for index, row in df.iterrows():
    document = {
        "image_url": row["image"],
        "query": row["query"],
        "title": row["title"],
        "score": row["score"],
    }
    documents.append(document)

    # Print progress for every 1000 rows processed
    if (index + 1) % 1000 == 0:
        print(f"Processed {index + 1} rows...")

print("Data conversion completed. Starting document upload...")

# Add the documents to the Marqo index in batches
batch_size = 64
for i in range(0, len(documents), batch_size):
    batch = documents[i:i + batch_size]
    try:
        print(f"Uploading batch {i // batch_size + 1} (rows {i + 1} to {i + len(batch)})...")
        res = mq.index("marqo-ecommerce-search").add_documents(
            batch,
            client_batch_size=batch_size,
            mappings={
                "image_title_multimodal": {
                    "type": "multimodal_combination",
                    "weights": {"title": 0.1, "query": 0.1, "image_url": 0.8},
                }
            },
            tensor_fields=["image_title_multimodal"],
        )
        print(f"Batch {i // batch_size + 1} upload response: {res}")
    except Exception as e:
        print(f"Error uploading batch {i // batch_size + 1}: {e}")

print("All batches processed.")
--------------------------------------------------------------------------------
/marqo/create_index.py:
--------------------------------------------------------------------------------
import marqo
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize Marqo Cloud Client
# Fetch API key from environment variable for secure access
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Define settings for the Marqo index
settings = {
    "type": "unstructured",
    "model": "Marqo/marqo-ecommerce-embeddings-L",  # Default model; see the B alternative at the end of this file
    "modelProperties": {
        "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-L",
        "dimensions": 1024,  # Larger dimensionality than the B model (768)
        "type": "open_clip"
    },
    "treatUrlsAndPointersAsImages": True,  # Enable image URLs as image sources
    "inferenceType": "marqo.CPU.large",  # Inference type for Marqo Cloud resources
}

# Specify the name of the index
index_name = "marqo-ecommerce-search"

# Delete the existing index if it already exists to avoid conflicts
try:
    mq.index(index_name).delete()
except Exception:
    pass  # If the index does not exist, skip deletion

# Create a new index with the specified settings
mq.create_index(index_name, settings_dict=settings)

# Alternative model configuration: marqo-ecommerce-embeddings-B
# To use it, uncomment this block and move it above the create_index call, replacing the L settings
# settings = {
#     "type": "unstructured",  # Set the index type as unstructured data
#     "model": "Marqo/marqo-ecommerce-embeddings-B",  # Specify model name
#     "modelProperties": {  # Set model properties to use Marqo's ecommerce embeddings on HF Hub
#         "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-B",
#         "dimensions": 768,  # Dimensionality of the embedding model
#         "type": "open_clip"  # Model type (OpenCLIP architecture)
#     },
#     "treatUrlsAndPointersAsImages": True,  # Enable image URLs as image sources
#     "inferenceType": "marqo.CPU.large",  # Inference type for Marqo Cloud resources
# }
--------------------------------------------------------------------------------
/marqo/get_stats.py:
--------------------------------------------------------------------------------
import marqo
import os
from dotenv import load_dotenv

load_dotenv()

# Retrieve the Marqo Cloud API key from environment variables for secure access
api_key = os.getenv("MARQO_API_KEY")

# Initialize Marqo client for Marqo Cloud using the API key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Define the name of the index to retrieve statistics from
index_name = "marqo-ecommerce-search"

# Fetch statistics for the specified index
results = mq.index(index_name).get_stats()

# Print the results to view the statistics of the index
print(results)

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
marqo==3.9.0
requests==2.32.3
Pillow==11.0.0
python-dotenv==1.0.1
gradio==5.5.0
pandas

--------------------------------------------------------------------------------
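
`search_marqo` in `app.py` builds a weighted query before searching: the main query gets weight 1.0, the "More of..." text 0.75 and the "Less of..." text -1.1, and the resulting dictionary is passed to `search`. Pulling that step into a pure function lets it be tested without a Marqo connection. This is a hypothetical refactor (`build_query_weights` is not part of the repo), and it deliberately differs from `app.py` in two ways: it strips whitespace so a blank box is ignored, and when the same text appears in two boxes it sums their weights instead of letting the later entry overwrite the earlier one.

```python
from typing import Dict, Optional


def build_query_weights(
    query: Optional[str],
    themes: Optional[str] = "",
    negatives: Optional[str] = "",
    theme_weight: float = 0.75,
    negative_weight: float = -1.1,
) -> Dict[str, float]:
    """Return a {text: weight} dict in the shape search_marqo passes to Marqo.

    Hypothetical refactor of the weighting in app.py: blank fields are skipped,
    and repeated text accumulates its weights rather than overwriting them.
    """
    weights: Dict[str, float] = {}
    for text, weight in ((query, 1.0), (themes, theme_weight), (negatives, negative_weight)):
        text = (text or "").strip()
        if text:
            weights[text] = weights.get(text, 0.0) + weight
    return weights


# Usage sketch inside search_marqo (an empty dict means there is nothing to search):
# query_weights = build_query_weights(query, themes, negatives)
# if query_weights:
#     res = mq.index("marqo-ecommerce-search").search(query_weights, limit=10)
```

For example, `build_query_weights("coffee machine", "silver", "pods")` produces the same dictionary that `app.py` builds for the placeholder inputs shown in the UI.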