├── Autocomplete.md
├── Matching.md
├── Measuring.md
├── README.md
├── Ranking.md
└── Typeahead-Examples.md

--------------------------------------------------------------------------------
/Autocomplete.md:
--------------------------------------------------------------------------------
# Autocomplete

Autocomplete can save users time and help them find what they’re looking for. A common approach for e-commerce apps is to use popular queries. We can:

- start with common queries
- filter duplicates, misspellings, and queries without results
- approve suggestions

Then, we can implement and measure.

### Top Queries

First, we need to measure search. You can read more about how to do this in the [Measuring](Measuring.md) chapter.

Once it’s measured, get a list of queries and the number of distinct users who searched for each. With SQL:

```sql
SELECT LOWER(query), COUNT(DISTINCT user_id) FROM searches
WHERE exclude = FALSE
GROUP BY LOWER(query)
HAVING COUNT(DISTINCT user_id) > 5
```

> Note: If you have different results for different stores or regions, do this separately for each of them. For instance, `Kirkland Signature` may be a top query in one store (Costco) but not in other stores.

### Duplicates

The first thing you’ll notice is that many queries are similar.

- apple, apples (plural)
- hand soap, handsoap (space)
- ben & jerry's, ben and jerry's (ampersand)
- organic milk vs milk organic (order)
- amy's, amys (apostrophe)

We need to:

1. detect these
2. decide which to show

We can use a custom hashing algorithm for detection:

1. tokenize (whitespace works)
2. stem each token
3. sort
4. concat (without spaces)

For instance, `organic soaps` and `soap organic` both hash to `organsoap`.

When deciding which to show, a good heuristic is to show the most searched. However, you may like `amy's` to show up over `amys`. In this case, you may want to check against brands and prefer them.

### Misspellings

We don’t want `zuchini`, `zuchinni`, and `zucchini` all showing up in our suggestions. We could use a spelling library like [Hunspell](http://hunspell.sourceforge.net/) or [Aspell](http://aspell.net/). However, most of the time, queries will be domain-specific.

One solution is to create your own corpus. At Instacart, we use product names. Since even these could have misspellings, set a minimum number of times a word must appear before being added to the corpus.

### Ensure Results

We don’t want to suggest a query with no results, so each suggested query should be checked for results. If you use fuzzy searching, turn it off for this check.

### Approve Suggestions

Once the automated work has been completed, you should look over the suggestions and bulk approve them by hand. If you have multiple stores or regions, you can take shortcuts here: if a query is already approved for one store, approve it for others.

## Implement

Once we have suggestions, it’s time to implement and measure. There are a number of decisions to make:

- which library to use
- the number of suggestions to show
- how to rank

We’ll take it one by one.
### Libraries

Here’s our list of recommended client libraries:

- web - [Typeahead.js](https://twitter.github.io/typeahead.js/)
- iOS - [MLPAutoCompleteTextField](https://github.com/EddyBorja/MLPAutoCompleteTextField)
- Android - [AutoCompleteTextView](http://developer.android.com/reference/android/widget/AutoCompleteTextView.html)

### Number of Suggestions

The number of suggestions varies from site to site.

Site | Suggestions
--- | ---
Amazon | 8
Overstock | 10
Etsy | 11
eBay | 12

Between 8 and 12 is probably good. You can always A/B test if needed.

### Ranking

To start, it’s easiest to rank by popularity (the distinct number of users who searched). Eventually, you could optimize for other objectives, like basket size.

### Tip

There’s no need to wait for a user to start typing to show suggestions. You can show popular ones as soon as the search box is focused.

## Performance

Responsiveness is essential for autocomplete. You should keep the number of network requests to a minimum and filter client-side when possible. If you have under 10k suggestions, prefetch all of them in a single request.

## Measure and Analyze

We recommend adding a single field to the searches table to help with analysis: `typed_query`. It should be null for searches that weren’t autocompleted.

From this, you can analyze the percent of searches that use autocomplete. You can also see if the overall conversion rate increases after adding it (or do an A/B test).

## Conclusion

This should give you a nice foundation for getting started with autocomplete. Check out [Autosuggest](https://github.com/ankane/autosuggest) for a Ruby implementation.

If you use Typeahead.js, we also have [examples](Typeahead-Examples.md) for how to prefetch and measure.

--------------------------------------------------------------------------------
/Matching.md:
--------------------------------------------------------------------------------
# Matching

The goal of matching is to return relevant results without returning irrelevant ones. There are a number of general techniques you can use to make this happen.

### Stemming

If a user searches for `apples`, we want to return results with `apple`. Stemming is one way of accomplishing this. Stemming reduces a word to its stem (similar to its root word). There are [many different algorithms](https://en.wikipedia.org/wiki/Stemming) for stemming. [Porter](http://snowball.tartarus.org/algorithms/porter/stemmer.html) is a popular one for English. With the Porter stemmer, both `apples` and `apple` stem to `appl`.

### Synonyms

When a user searches for `coke`, they likely want results with `coca-cola` to be returned as well. The same goes for `tissues` and `kleenex`.

A key consideration in how to implement this is the difficulty of updating synonyms. With Elasticsearch, we recommend doing synonym expansion at query time so you can update synonyms without reindexing. You can read more about the tradeoff [here](https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html).

There are some common symbols you may want to expand, like `&` to `and` and `%` to `percent`. Also, the [WordNet database](https://en.wikipedia.org/wiki/WordNet) has a list of English synonyms. However, loading the entire database can significantly impact performance, so we recommend building a smaller list by hand.
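Search engines like Elasticsearch handle this with synonym token filters, which is what we recommend. To illustrate the idea (or if you’re rolling your own), here’s a minimal sketch of query-time expansion in JavaScript. The synonym groups and the `OR` syntax are placeholders for whatever your engine expects, and this naive version only handles single-word synonyms.

```js
// Minimal sketch of query-time synonym expansion.
// The groups below are hypothetical examples of a hand-built list.
var synonymGroups = [
  ['coke', 'coca-cola'],
  ['tissues', 'kleenex']
]

// build a lookup from each term to its group
var synonymsFor = {}
synonymGroups.forEach(function (group) {
  group.forEach(function (term) {
    synonymsFor[term] = group
  })
})

// expand each query token into an OR of its synonyms
function expandQuery(query) {
  return query.toLowerCase().split(/\s+/).map(function (token) {
    var group = synonymsFor[token]
    return group ? '(' + group.join(' OR ') + ')' : token
  }).join(' ')
}

// expandQuery('coke zero') => '(coke OR coca-cola) zero'
```

Because the expansion happens on the query, the hand-built list can be updated without reindexing.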
### Misspellings

We aren’t always great spellers. We type `zuchini` when we want `zucchini` and `siracha` when we want `sriracha`. Common misspellings will emerge and can be mapped to correct spellings. However, this won’t catch the long tail of typos.

A more general approach is fuzzy searching. This typically returns results that are within a certain [edit distance](https://en.wikipedia.org/wiki/Edit_distance). The Damerau–Levenshtein distance is a good choice. It counts an edit as an insertion, deletion, substitution, or transposition of two adjacent characters. For instance, `hello` and `helm` have an edit distance of two (substitute `m` for `l` and delete the `o`). Also, it’s available on popular search engines. An edit distance of one is a good place to start.

There are a few downsides to fuzzy searching to be aware of. The biggest is that it can return irrelevant results. For instance, a search for `beet` will return `beef` and `beer`. It’s also less performant.

Both of these can be addressed by fuzzy searching selectively. If a search returns many results without fuzziness, fuzzy searching is unlikely to be useful. So for each search, first perform it without fuzziness, and only if there are too few results, search again with it.

### Special Characters

Some results may have special characters, like `jalapeño`. We want a match if the user searches `jalapeno` (without the `ñ`). ASCII folding is a technique to map characters to their ASCII equivalents. This maps `ñ` to `n`. For English, this works well, but it can be problematic for other languages.

### Spaces

Search engines often use tokenization to split text into words, so whitespace (or lack of whitespace) can be problematic. Let’s examine the phrases `dish washer` and `dishwasher`. We likely want them to return similar results. We could map them as synonyms, but it would be tedious to do this for many phrases.

One approach is to use word n-grams, or shingles. With this approach, adjacent words are combined into single tokens, so `red dish washer` produces the shingles `reddish` and `dishwasher`. When indexing, index the shingles in addition to the individual words. When querying, try both the original query as well as shingles. Let’s look at two examples.

#### Example 1: Spaces in Result

You have an item named `Red Dish Washer` and a user searches for `red dishwasher`. The tokens produced are:

- `red`, `dish`, `washer`, `reddish`, `dishwasher` for the item
- `red`, `dishwasher` for the original query
- `reddishwasher` for the query with shingles

All tokens in the original query match the item, so it’s a match.

#### Example 2: Spaces in Query

You have an item named `Red Dishwasher` and a user searches for `dish washer`. The tokens produced are:

- `red`, `dishwasher`, `reddishwasher` for the item
- `dish`, `washer` for the original query
- `dishwasher` for the query with shingles

All tokens in the query with shingles match the item, so it’s a match.

> Note: the query `red dish washer` will not match, as it’s tokenized to `reddish` and `dishwasher`. One way around this is to include the individual words as tokens as well and use an `OR` condition, but this can have other side effects.
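Search engines provide shingle token filters that do this for you (Elasticsearch’s `shingle` filter, for instance), so you typically won’t write it by hand. Still, a minimal sketch helps make the tokenization above concrete:

```js
// Minimal sketch: produce two-word shingles (concatenated, without spaces)
// alongside the individual words, mirroring the examples above.
function shingleTokens(text) {
  var words = text.toLowerCase().split(/\s+/).filter(function (w) { return w.length > 0 })
  var shingles = []
  for (var i = 0; i < words.length - 1; i++) {
    shingles.push(words[i] + words[i + 1])
  }
  return {words: words, shingles: shingles}
}

// shingleTokens('Red Dish Washer')
// => {words: ['red', 'dish', 'washer'], shingles: ['reddish', 'dishwasher']}
```

When indexing, you’d store both `words` and `shingles`; when querying, you’d try the original words as well as the shingles, as in the examples above.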
### False Matches

When searching for `butter`, you probably aren’t looking for `peanut butter`. An easy fix is to add a *NOT* condition for `peanut butter` to this search. Keep a mapping of these and apply them as needed.

### Unavailable Results

Sometimes, you understand exactly what the user wants, but it’s not available. You may have personally encountered this problem with Netflix. They know what movie you want to watch, but it’s unavailable for streaming. At Instacart, people sometimes search for products we don’t sell - like cigarettes - or produce that’s only available during certain seasons - like strawberries.

In this case, you can explain it’s unavailable and show related items.

--------------------------------------------------------------------------------
/Measuring.md:
--------------------------------------------------------------------------------
# Measuring

The first step to improving is measuring. This allows us to see where we stand and track progress over time.

We want to know what users are searching for. For instance, what are the most popular queries? We also want to know if a search is successful. Did the user find what they were looking for? For an e-commerce site, a good indicator may be if she added an item to her cart or made a purchase. You may want to have multiple conversion goals, but for now, let’s start with one.

When a user searches, track the:

- query
- user id (or visitor id)
- number of results
- time

When a user converts, track the:

- result id
- position
- time

We also want a flag to exclude searches by admins and bots from analysis. We can use the user agent to detect bots. There are many open source projects for this. It’s also good to exclude users who search excessively. They may be bots as well, and at the very least can throw off analysis. We can retroactively determine this and update the flag.

We want to collect all the data in a single table to simplify analysis. Our initial searches table will have:

- query
- user_id
- results_count
- searched_at (time)
- result_id
- position
- converted_at (time)
- exclude (boolean)

If you have different results for different stores or regions, store those values as well.

You could also have a separate table for conversions to record multiple conversions for a single search, in addition to storing the first conversion in the searches table. If users can sort or filter results, we recommend storing those actions in a separate table as well. However, these are outside the scope of this post.

Now, let’s analyze.

## Analyze

With the fields above, you can calculate:

- top queries
- overall conversion rate
- queries with a low conversion rate
- queries with no results (or few results)
- average searches per user
- average time to conversion
- average position of conversions

There are a number of things we can improve (time to conversion, position of conversions, etc.), but it often makes sense to start with the overall conversion rate. Plot it over time so we can see our progress.

Start by getting the top 100 queries and sorting by conversion rate (lowest first); a query sketch follows below. If we’ve set the `exclude` flag properly, the data should be pretty clean. For each query, perform a search and try to identify the issue. Likely, a number of themes will emerge.
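To make this concrete, here’s a sketch of that query, assuming the searches table described above (one row per search, `converted_at` set on conversion); the exact syntax may need adjusting for your database.

```sql
SELECT * FROM (
  SELECT
    LOWER(query) AS query,
    COUNT(DISTINCT user_id) AS searchers,
    COUNT(DISTINCT CASE WHEN converted_at IS NOT NULL THEN user_id END) * 1.0
      / COUNT(DISTINCT user_id) AS conversion_rate
  FROM searches
  WHERE exclude = FALSE
  GROUP BY LOWER(query)
  ORDER BY searchers DESC
  LIMIT 100
) top_queries
ORDER BY conversion_rate
```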
Our general approach to improve will be to first improve matching, then improve ranking.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Practical Search

Best search practices for developers and code to implement them. Let’s make search a better experience for our users.

## Chapters

1. [Measuring](Measuring.md)
1. [Matching](Matching.md)
1. [Ranking](Ranking.md)
1. [Autocomplete](Autocomplete.md)

## Related Projects

- [Searchkick](https://github.com/ankane/searchkick) - Intelligent search made easy for Rails
- [Searchjoy](https://github.com/ankane/searchjoy) - Search analytics made easy
- [Autosuggest](https://github.com/ankane/autosuggest) - Autocomplete suggestions based on what your users search

## Contribute

This is a work in progress, built for the open-source community. If you have great practices, articles, or videos, [please share](https://github.com/ankane/search_guide/issues/new).

--------------------------------------------------------------------------------
/Ranking.md:
--------------------------------------------------------------------------------
# Ranking

There are many different strategies for ranking. If we want the most relevant results to show up first, we can take into account:

- number of terms that match
- significance of each term
- popularity of each result (such as number of times ordered or viewed)

One extremely effective strategy is the *popularity given the specific query*. To accomplish this, we’ll use the data we collected in the [Measuring](Measuring.md) chapter.

### Conversions

Conversions are a great source of data for relevance. Algorithms like TF-IDF and BM25 work great when dealing only with text, but we now have powerful metadata.

If a user searches for `ice cream` and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item should get a little more weight for similar queries. To prevent specific users from throwing off this approach, we count only one conversion per user.

This basic method has two drawbacks:

1. New items are ranked last
2. Top results stay top results, because top results convert better even if they aren’t the most relevant

There are a number of ways to address each of these issues. We’ve opted for simple ones.

- For #1, assign new items a weight until enough data is collected.
- For #2, randomly penalize top results to give other results a better chance to convert.

Another good strategy for ranking, which can be combined with this one, is personalization. However, let’s save that for another post. For now, we have a strategy for helping with precision when there’s little data and getting rid of the pesky irrelevant results at the bottom.

### Learning to Rank

Another, more advanced strategy is called “learning to rank”. This uses machine learning to rank results. It typically has two steps:

1. Retrieve relevant results from your search engine
2. Rerank the results with a machine learning model

The features of the model are often specific to the user searching, like whether they’ve bought the brand of the result before.
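Here’s a minimal sketch of that retrieve-then-rerank flow. The `searchEngine` client, the trained `model`, and the example features are all placeholders for whatever you actually use.

```js
// Minimal sketch of retrieve-then-rerank. `searchEngine` and `model` are
// placeholders for your actual search client and trained ranking model.
async function search(user, query) {
  // Step 1: retrieve candidate results from the search engine
  var candidates = await searchEngine.query(query, {limit: 200})

  // Step 2: score each candidate with the model and rerank
  var scored = candidates.map(function (result) {
    var features = {
      engineScore: result.score,
      purchasedBrandBefore: user.purchasedBrands.includes(result.brand),
      conversionRate: result.conversionRate || 0
    }
    return {result: result, score: model.predict(features)}
  })

  scored.sort(function (a, b) { return b.score - a.score })
  return scored.map(function (s) { return s.result }).slice(0, 24)
}
```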
## Other

You’ll likely find other issues that don’t fit into the categories above. You should still classify them. A few we’ve encountered at Instacart are:

### Unexpected Searches

People often type the name of a retailer, like Whole Foods or Costco, into the search box when trying to change stores. We now automatically switch stores when this happens.

### Missing Images

People like to see images of the products they’re buying, so having search results with few images leads to low conversions. In this case, we need to get images. A few different ways we’ll do this are licensing the images through a 3rd party API, reaching out to manufacturers directly, or photographing them ourselves.

### Missing Products

In the early days, products sometimes weren’t added to our catalog, even though we carried them. Once, we were missing all the cream cheese from a popular retailer. This was easy to identify after looking at search data. The fix was pretty straightforward: add them.

--------------------------------------------------------------------------------
/Typeahead-Examples.md:
--------------------------------------------------------------------------------
# Typeahead.js Examples

Typeahead.js offers prefetch, which loads terms in a single request after the initial page load. This keeps the initial page load fast, and results show up instantly as the user types. However, prefetch uses local storage and [it’s not recommended to be used for the entire data set](https://github.com/twitter/typeahead.js/blob/master/doc/bloodhound.md#prefetch), so we use a custom prefetch that doesn’t use local storage.

```js
// create the suggestion engine with an empty local data set
var engine = new Bloodhound({
  datumTokenizer: Bloodhound.tokenizers.obj.whitespace('value'),
  queryTokenizer: Bloodhound.tokenizers.whitespace,
  limit: 4,
  local: []
})
engine.initialize()

// fetch all suggestions in a single request and add them to the engine
$.getJSON('/suggestions', function (suggestions) {
  engine.add(suggestions)
})

$searchInput.typeahead({highlight: true}, {source: engine.ttAdapter()})
```

Measure the typed query (only populated when a suggestion is autocompleted or selected):

```js
var typedQuery

$searchInput.on('keyup', function () {
  typedQuery = $searchInput.typeahead('val')
}).on('typeahead:selected typeahead:autocompleted', function () {
  $('#typed_query').val(typedQuery) // autocompleted!!
})
```

Measure typing time:

```js
var typingStartedAt

$searchInput.on('keyup', function (e) {
  // ignore the enter key so submitting doesn't count as typing
  if (!typingStartedAt && e.keyCode != 13) {
    typingStartedAt = new Date()
  }
})
$searchInput.closest('form').on('submit', function () {
  if (typingStartedAt) {
    $('#typing_time').val(((new Date()) - typingStartedAt) / 1000.0)
    typingStartedAt = null
  }
})
```
--------------------------------------------------------------------------------