├── .gitignore
├── README.md
├── app.py
├── assets
│   └── ecommerce-demo.gif
├── data
│   ├── gs-all-cat-sample-200k.csv
│   └── marqo-gs_100k.csv
├── marqo
│   ├── add_documents.py
│   ├── create_index.py
│   └── get_stats.py
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
# Virtual Environment
venv

# Environment Variables
.env
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Ecommerce Search with Marqo

This repository contains a multimodal ecommerce search application built using Marqo's [Cloud-based search engine](https://www.marqo.ai/cloud?utm_source=github&utm_medium=organic&utm_campaign=marqo-ai&utm_term=2024-11-07-04-36-utc) and Marqo's state-of-the-art [ecommerce embedding models](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb).

## Step 1: Set Up
First, you will need a Marqo Cloud API key. To obtain one, follow this [article](https://www.marqo.ai/blog/finding-my-marqo-api-key).
Once you have your API key, place it in a `.env` file at the root of the repository (`.env` is already listed in `.gitignore`, so your key will not be committed):
```env
# Visit https://www.marqo.ai/blog/finding-my-marqo-api-key
MARQO_API_KEY="XXXXXXXX"
```
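
Because every script silently passes `None` to `marqo.Client` when the key is missing, it can help to fail fast with a clear message. A minimal sketch, assuming you call it right after `load_dotenv()` as the scripts in this repo do; the helper name `require_api_key` is hypothetical and not part of this repository:

```python
import os
from typing import Mapping

PLACEHOLDER = "XXXXXXXX"  # the dummy value shown in the .env example above


def require_api_key(env: Mapping[str, str] = os.environ, var_name: str = "MARQO_API_KEY") -> str:
    """Return the Marqo API key from `env`, failing fast if it is missing, blank, or the placeholder."""
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{var_name} is not set to a real key; check your .env file")
    return key
```

Taking the environment as a parameter (defaulting to `os.environ`) keeps the check testable without touching your real environment.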

To install all packages needed for this search demo:
```bash
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

Now you're ready to create your Marqo index!

## Step 2: Create Your Marqo Index
For this search demo, we use Marqo's state-of-the-art ecommerce embedding models, `marqo-ecommerce-embeddings-B` and `marqo-ecommerce-embeddings-L`. By default, the demo uses `marqo-ecommerce-embeddings-L`. The file `marqo/create_index.py` also contains the settings for `marqo-ecommerce-embeddings-B` (commented out), so you can switch models by swapping the two `settings` dictionaries. For more information on these models, see our [Hugging Face collection](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb).
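
The two configurations in `marqo/create_index.py` differ only in the model name and the embedding dimensions (1024 for L, 768 for B). A minimal sketch of a helper that builds either settings dictionary, so switching models becomes a one-argument change; `build_index_settings` and `MODEL_DIMENSIONS` are hypothetical names, while the dictionary keys and values are copied from `create_index.py`:

```python
# Embedding dimensions per model, as configured in marqo/create_index.py
MODEL_DIMENSIONS = {
    "marqo-ecommerce-embeddings-B": 768,
    "marqo-ecommerce-embeddings-L": 1024,
}


def build_index_settings(model: str = "marqo-ecommerce-embeddings-L") -> dict:
    """Return Marqo index settings for one of the two ecommerce embedding models."""
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_DIMENSIONS)}")
    return {
        "type": "unstructured",
        "model": f"Marqo/{model}",
        "modelProperties": {
            "name": f"hf-hub:Marqo/{model}",
            "dimensions": MODEL_DIMENSIONS[model],
            "type": "open_clip",
        },
        "treatUrlsAndPointersAsImages": True,  # treat image URLs as images to embed
        "inferenceType": "marqo.CPU.large",  # Marqo Cloud inference resources
    }
```

You would then pass the result to `mq.create_index(index_name, settings_dict=build_index_settings("marqo-ecommerce-embeddings-B"))`.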
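
To create your index and then add documents to it (run these from the repository root, since `marqo/add_documents.py` reads the CSV through the relative path `data/marqo-gs_100k.csv`):
```bash
python3 marqo/create_index.py
python3 marqo/add_documents.py
```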
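
`marqo/add_documents.py` turns each CSV row into a document with `image_url`, `query`, `title`, and `score` fields, then uploads the documents in batches of 64. A stdlib-only sketch of those two steps (no pandas, no network), useful for checking a CSV before uploading it. It assumes the `image`, `query`, `title`, and `score` columns used by the script and a numeric `score`; the function names are hypothetical:

```python
import csv
from typing import Iterable, Iterator


def rows_to_documents(csv_lines: Iterable[str]) -> list[dict]:
    """Map CSV rows (columns: image, query, title, score) to Marqo documents."""
    documents = []
    for row in csv.DictReader(csv_lines):
        documents.append({
            "image_url": row["image"],
            "query": row["query"],
            "title": row["title"],
            "score": float(row["score"]),  # csv yields strings; cast the numeric column
        })
    return documents


def batched(items: list, batch_size: int = 64) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items, like the upload loop."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, `with open("data/marqo-gs_100k.csv", newline="", encoding="utf-8") as f: docs = rows_to_documents(f)` followed by `len(list(batched(docs)))` tells you how many upload requests the script will make.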

Index creation is not instant. You can check the status of your index in [Marqo Cloud](https://cloud.marqo.ai/indexes/); wait until it is ready before documents are added to it.

`marqo/add_documents.py` adds data from `data/marqo-gs_100k.csv`, a 100k-item subset of the [Marqo-GS-10M](https://huggingface.co/datasets/Marqo/marqo-GS-10M) dataset. The repository also includes `data/gs-all-cat-sample-200k.csv`, which contains 200k items across all categories of the Google Shopping dataset; to use it instead, change `path_to_data` in `marqo/add_documents.py`.

To check the status of your index while documents are being added, run:
```bash
python3 marqo/get_stats.py
```
This prints how many documents and vectors are in your index. These numbers will keep increasing as more data is added.
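
To watch progress without rerunning `get_stats.py` by hand, you could poll the stats in a loop. A minimal sketch, assuming the stats dictionary exposes a `numberOfDocuments` key (check the output of `get_stats.py` for your Marqo version). The stats function is passed in so the loop can be tried without a live index, and `wait_for_documents` is a hypothetical helper:

```python
import time
from typing import Callable


def wait_for_documents(get_stats: Callable[[], dict], target: int,
                       interval_s: float = 30.0, max_checks: int = 120) -> int:
    """Poll get_stats() until numberOfDocuments reaches target; return the last count seen."""
    count = 0
    for check in range(max_checks):
        count = get_stats().get("numberOfDocuments", 0)
        print(f"check {check + 1}: {count}/{target} documents")
        # Stop once the target is reached, or without a final sleep on the last check
        if count >= target or check == max_checks - 1:
            break
        time.sleep(interval_s)
    return count
```

With a live client this would be called as `wait_for_documents(lambda: mq.index("marqo-ecommerce-search").get_stats(), target=100_000)`.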

## Step 3: Run the Application
You don't need to wait for all documents to be added: you can run the UI while indexing is still in progress. To run the search demo:
```bash
python3 app.py
```
Then open the local URL that Gradio prints (by default `http://127.0.0.1:7860`). You will see a UI like the one shown in `assets/ecommerce-demo.gif`.
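
The three text boxes map onto a weighted Marqo query: in `app.py` the search query gets weight 1.0, the "More of..." theme 0.75, and the "Less of..." theme -1.1, and the resulting dictionary is passed to `search()`. The same weighting logic as a standalone pure function that can be tested without a live index (the name `build_query_weights` is hypothetical; `app.py` builds the dictionary inline):

```python
def build_query_weights(query: str, themes: str = "", negatives: str = "") -> dict[str, float]:
    """Combine the three UI inputs into a weighted query, using app.py's weights."""
    weights = {query: 1.0}
    if themes:
        weights[themes] = 0.75  # "More of...": pull results toward this theme
    if negatives:
        weights[negatives] = -1.1  # "Less of...": push results away from this theme
    return weights


print(build_query_weights("coffee machine", "silver", "pods"))
# {'coffee machine': 1.0, 'silver': 0.75, 'pods': -1.1}
```

Because the weights are keyed by text, typing the same phrase into two boxes overwrites the earlier weight rather than adding to it.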

## Step 4 (Optional): Deploy on Hugging Face
This ecommerce search demo can also be deployed on Hugging Face. Create a Gradio Hugging Face Space, copy in the contents of `app.py`, and add a `requirements.txt` that includes the packages `app.py` imports (`marqo`, `requests`, `Pillow`, `python-dotenv`). You will also need to add your Marqo API key as a secret named `MARQO_API_KEY` in your Space settings, since `app.py` reads the key from that environment variable.

To see this demo live on Hugging Face, visit our [Ecommerce Search Space](https://huggingface.co/spaces/Marqo/Ecommerce-Search)!

## Step 5: Clean Up
If you follow the steps in this guide, you will create an index with CPU large inference and a basic storage shard. This index costs $0.38 per hour. When you are done with it, you can delete it with the following code:
```python
import os

import marqo
from dotenv import load_dotenv

load_dotenv()

# Same client and index name as in marqo/create_index.py
mq = marqo.Client("https://api.marqo.ai", api_key=os.getenv("MARQO_API_KEY"))
index_name = "marqo-ecommerce-search"
mq.delete_index(index_name)
```

**If you do not delete your index, you will continue to be charged for it.**
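
At the quoted rate, a forgotten index adds up: about $9.12 per day and about $273.60 over 30 days. A small sketch of that arithmetic, assuming the $0.38 hourly rate above (pricing may change, so check Marqo Cloud for current rates); `index_cost` is a hypothetical helper:

```python
HOURLY_RATE_USD = 0.38  # CPU large inference + basic storage shard, as quoted above


def index_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Return the cost in USD of keeping the index running for `hours` hours."""
    return round(hours * rate, 2)


print(index_cost(24))       # one day
print(index_cost(24 * 30))  # 30 days
```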

## Running This Project Locally
If you'd prefer to run this project locally rather than on Marqo Cloud, you can use our [open source version of Marqo](https://github.com/marqo-ai/marqo). To run Marqo using Docker:
```bash
docker rm -f marqo # remove any existing container named "marqo"
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
```
Once Marqo is running, follow the same steps as above, but replace `mq = marqo.Client("https://api.marqo.ai", api_key=api_key)` with `mq = marqo.Client("http://localhost:8882", api_key=None)` in `app.py`, `marqo/create_index.py`, `marqo/add_documents.py`, and `marqo/get_stats.py`.
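
Rather than editing the client line in four files, you could read the endpoint from an environment variable. A minimal sketch, assuming a hypothetical `MARQO_URL` variable (not used anywhere in this repository) that defaults to Marqo Cloud; it only computes the client arguments, so it runs without a Marqo server:

```python
import os
from typing import Mapping, Optional, Tuple

CLOUD_URL = "https://api.marqo.ai"


def marqo_client_args(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (url, api_key) for marqo.Client: Marqo Cloud by default, another URL if MARQO_URL is set."""
    url = env.get("MARQO_URL", CLOUD_URL)  # hypothetical variable, e.g. http://localhost:8882
    # Local open source Marqo runs without an API key; Marqo Cloud needs one
    api_key = env.get("MARQO_API_KEY") if url == CLOUD_URL else None
    return url, api_key
```

Each script could then use `url, api_key = marqo_client_args()` followed by `mq = marqo.Client(url, api_key=api_key)`, and switching to Docker becomes a matter of adding `MARQO_URL=http://localhost:8882` to `.env`.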

## Questions? Contact Us!
If you have any questions about this search demo or about Marqo's capabilities, you can:
* [Join Our Slack Community](https://join.slack.com/t/marqo-community/shared_invite/zt-2ry33y71j-H0WUeQvFaVlKuuZwl38BeA)
* [Book a Demo](https://www.marqo.ai/book-demo?utm_source=github&utm_medium=organic&utm_campaign=marqo-ai&utm_term=2024-11-07-04-36-utc)
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import marqo
import requests
import io
from PIL import Image
import gradio as gr
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize Marqo client
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

def search_marqo(query, themes, negatives):
    """
    Searches the Marqo index with a query, an optional theme to emphasize, and an optional theme to avoid.

    Args:
        query (str): Main search query.
        themes (str): Additional positive theme for emphasis.
        negatives (str): Negative theme to de-emphasize in search.

    Returns:
        list: A list of (image, caption) tuples for the Gradio gallery, where the caption is the product title.
    """
    # Nothing to search for: return an empty gallery instead of sending an empty query
    if not query or not query.strip():
        return []

    # Build query weights: main query 1.0, "More of..." 0.75, "Less of..." negative
    query_weights = {query: 1.0}
    if themes:
        query_weights[themes] = 0.75
    if negatives:
        query_weights[negatives] = -1.1

    # Perform search with Marqo
    res = mq.index("marqo-ecommerce-search").search(query_weights, limit=10)  # limit to top 10 results

    # Prepare results
    products = []
    for hit in res['hits']:
        image_url = hit.get('image_url')
        title = hit.get('title', 'No Title')

        # Fetch the image from the URL; skip hits whose image cannot be downloaded or decoded
        if not image_url:
            continue
        try:
            response = requests.get(image_url, timeout=10)
            response.raise_for_status()
            image = Image.open(io.BytesIO(response.content))
        except (requests.RequestException, OSError):
            continue

        # Append product details for Gradio display
        products.append((image, title))

    return products

def clear_inputs():
    """
    Clears input fields and results in the Gradio interface.

    Returns:
        tuple: Empty values to reset query, themes, negatives, and results gallery.
    """
    return "", "", "", []  # Clears query, themes, negatives, and results

# Gradio Blocks Interface for Custom Layout
with gr.Blocks(css=".orange-button { background-color: orange; color: black; }") as interface:
    gr.Markdown("# Multimodal Ecommerce Search with Marqo's SOTA Embedding Models")
    gr.Markdown("### This ecommerce search demo uses:")
    gr.Markdown("### 1. [Marqo Cloud](https://www.marqo.ai/cloud) for the Search Engine.")
    gr.Markdown("### 2. [Marqo-Ecommerce-Embeddings](https://huggingface.co/collections/Marqo/marqo-ecommerce-embeddings-66f611b9bb9d035a8d164fbb) for the multimodal embedding model. Specifically, `marqo-ecommerce-embeddings-L`.")
    gr.Markdown("### 3. Products from the [Marqo-GS-10M](https://huggingface.co/datasets/Marqo/marqo-GS-10M) dataset.")

    gr.Markdown("")
    # gr.Markdown("If you can't find the item you're looking for, let a member of our team know and we'll add it to the dataset.")

    with gr.Row():
        query_input = gr.Textbox(placeholder="coffee machine", label="Search Query")
        themes_input = gr.Textbox(placeholder="silver", label="More of...")
        negatives_input = gr.Textbox(placeholder="pods", label="Less of...")

    with gr.Row():
        search_button = gr.Button("Submit", elem_classes="orange-button")
        # clear_button = gr.Button("Clear")

    results_gallery = gr.Gallery(label="Top 10 Results", columns=4)

    # Run the search when the Submit button is clicked
    search_button.click(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

    # Clear button functionality
    # clear_button.click(fn=clear_inputs, inputs=[], outputs=[query_input, themes_input, negatives_input, results_gallery])

    # Enable Enter key submission for all input fields
    query_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)
    themes_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)
    negatives_input.submit(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

# Launch the app
interface.launch()
--------------------------------------------------------------------------------
/assets/ecommerce-demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/marqo-ai/ecommerce-search/bb86e38f66e98cf43b9abb1466c8b75b6dee7f53/assets/ecommerce-demo.gif
--------------------------------------------------------------------------------
/marqo/add_documents.py:
--------------------------------------------------------------------------------
import marqo
import os
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

# Initialize the Marqo Cloud Client
print("Initializing Marqo Cloud client...")
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Read the CSV data file (relative path, so run this script from the repository root)
path_to_data = "data/marqo-gs_100k.csv"
df = pd.read_csv(path_to_data)

# Convert the data into the required document format
print("Converting data into the required document format...")
documents = []
for index, row in df.iterrows():
    document = {
        "image_url": row["image"],
        "query": row["query"],
        "title": row["title"],
        "score": row["score"],
    }
    documents.append(document)

    # Print progress for every 1000 rows processed
    if (index + 1) % 1000 == 0:
        print(f"Processed {index + 1} rows...")

print("Data conversion completed. Starting document upload...")

# Add the documents to the Marqo index in batches
batch_size = 64
for i in range(0, len(documents), batch_size):
    batch = documents[i:i + batch_size]
    try:
        print(f"Uploading batch {i // batch_size + 1} (rows {i + 1} to {i + len(batch)})...")
        res = mq.index("marqo-ecommerce-search").add_documents(
            batch,
            client_batch_size=batch_size,
            # Combine title, query and image into one multimodal field;
            # the image carries most of the weight (0.8)
            mappings={
                "image_title_multimodal": {
                    "type": "multimodal_combination",
                    "weights": {"title": 0.1, "query": 0.1, "image_url": 0.8},
                }
            },
            # Only the combined field is embedded; the other fields are stored as-is
            tensor_fields=["image_title_multimodal"],
        )
        print(f"Batch {i // batch_size + 1} upload response: {res}")
    except Exception as e:
        print(f"Error uploading batch {i // batch_size + 1}: {e}")

print("All batches processed.")
--------------------------------------------------------------------------------
/marqo/create_index.py:
--------------------------------------------------------------------------------
import marqo
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize Marqo Cloud Client
# Fetch API key from environment variable for secure access
api_key = os.getenv("MARQO_API_KEY")  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Define settings for the Marqo index (default model: marqo-ecommerce-embeddings-L)
settings = {
    "type": "unstructured",  # Set the index type as unstructured data
    "model": "Marqo/marqo-ecommerce-embeddings-L",  # Specify model name
    "modelProperties": {  # Use Marqo's ecommerce embeddings from the HF Hub
        "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-L",
        "dimensions": 1024,  # Dimensionality of the L model's embeddings
        "type": "open_clip"  # Model type (OpenCLIP architecture)
    },
    "treatUrlsAndPointersAsImages": True,  # Enable image URLs as image sources
    "inferenceType": "marqo.CPU.large",  # Inference type for Marqo Cloud resources
}

# Specify the name of the index
index_name = "marqo-ecommerce-search"

# Delete the existing index if it already exists to avoid conflicts
try:
    mq.index(index_name).delete()
except Exception:
    pass  # If the index does not exist, skip deletion

# Create a new index with the specified settings
mq.create_index(index_name, settings_dict=settings)

# Alternative model configuration: marqo-ecommerce-embeddings-B
# To use it, replace the settings dictionary above with this one
# settings = {
#     "type": "unstructured",  # Set the index type as unstructured data
#     "model": "Marqo/marqo-ecommerce-embeddings-B",  # Specify model name
#     "modelProperties": {  # Set model properties to use Marqo's ecommerce embeddings on HF Hub
#         "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-B",
#         "dimensions": 768,  # Dimensionality of the embedding model
#         "type": "open_clip"  # Model type (OpenCLIP architecture)
#     },
#     "treatUrlsAndPointersAsImages": True,  # Enable image URLs as image sources
#     "inferenceType": "marqo.CPU.large",  # Inference type for Marqo Cloud resources
# }
--------------------------------------------------------------------------------
/marqo/get_stats.py:
--------------------------------------------------------------------------------
import marqo
import os
from dotenv import load_dotenv

load_dotenv()

# Retrieve the Marqo Cloud API key from environment variables for secure access
api_key = os.getenv("MARQO_API_KEY")

# Initialize Marqo client for Marqo Cloud using the API key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)  # To find your Marqo API key, visit https://www.marqo.ai/blog/finding-my-marqo-api-key

# If you'd rather run Marqo locally, swap out the code above with
# mq = marqo.Client("http://localhost:8882", api_key=None)
# For more information on running Marqo with Docker, see our GitHub: https://github.com/marqo-ai/marqo

# Define the name of the index to retrieve statistics from
index_name = "marqo-ecommerce-search"

# Fetch statistics for the specified index
results = mq.index(index_name).get_stats()

# Print the results to view the statistics of the index
print(results)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
marqo==3.9.0
requests==2.32.3
Pillow==11.0.0
python-dotenv==1.0.1
gradio==5.5.0
pandas
--------------------------------------------------------------------------------