├── Autocomplete.md
├── Matching.md
├── Measuring.md
├── README.md
├── Ranking.md
└── Typeahead-Examples.md

--------------------------------------------------------------------------------
/Autocomplete.md:
--------------------------------------------------------------------------------
# Autocomplete

Autocomplete can save users time and help them find what they’re looking for. A common approach for e-commerce apps is to use popular queries. We can:

- start with common queries
- filter duplicates, misspellings, and queries without results
- approve suggestions

Then, we can implement and measure.

### Top Queries

First, we need to measure search. You can read more about how to do this in the [Measuring](Measuring.md) chapter.

Once it’s measured, get a list of queries and the number of distinct users who searched for each. With SQL:

```sql
SELECT LOWER(query), COUNT(DISTINCT user_id) FROM searches
WHERE exclude = FALSE
GROUP BY LOWER(query)
HAVING COUNT(DISTINCT user_id) > 5
```

> Note: If you have different results for different stores or regions, do this separately for each of them. For instance, `Kirkland Signature` may be a top query in one store (Costco) but not in other stores.

### Duplicates

The first thing you’ll notice is that many queries are similar.

- apple, apples (plural)
- hand soap, handsoap (space)
- ben & jerry's, ben and jerry's (ampersand)
- organic milk vs milk organic (order)
- amy's, amys (apostrophe)

We need to:

1. detect these
2. decide which to show

We can use a custom hashing algorithm for detection:

1. tokenize (whitespace works)
2. stem each token
3. sort
4. concat (without spaces)

For instance, `organic soaps` and `soap organic` both hash to `organsoap`.

When deciding which to show, a good heuristic is to show the most searched. However, you may like `amy's` to show up over `amys`. In this case, you may want to check against brands and prefer them.

### Misspellings

We don’t want `zuchini`, `zuchinni`, and `zucchini` all showing up in our suggestions. We could use a spelling library like [Hunspell](http://hunspell.sourceforge.net/) or [Aspell](http://aspell.net/). However, most of the time, queries will be domain-specific.

One solution is to create your own corpus. At Instacart, we use product names. Since even these could have misspellings, set a minimum number of times a word must appear before being added to the corpus.

### Ensure Results

We don’t want to suggest a query with no results, so each suggested query should be checked for results. If you use fuzzy searching, turn it off for this check.

### Approve Suggestions

Once the automated work has been completed, you should look over the suggestions and bulk approve them by hand. If you have multiple stores or regions, you can take shortcuts here: if a query is already approved for one store, approve it for others.

## Implement

Once we have suggestions, it’s time to implement and measure. There are a number of decisions to make:

- which library to use
- the number of suggestions to show
- how to rank

We’ll take it one by one.
### Libraries

Here’s our list of recommended client libraries:

- web - [Typeahead.js](https://twitter.github.io/typeahead.js/)
- iOS - [MLPAutoCompleteTextField](https://github.com/EddyBorja/MLPAutoCompleteTextField)
- Android - [AutoCompleteTextView](http://developer.android.com/reference/android/widget/AutoCompleteTextView.html)

### Number of Suggestions

The number of suggestions varies from site to site.

Site | Suggestions
--- | ---
Amazon | 8
Overstock | 10
Etsy | 11
eBay | 12

Between 8 and 12 is probably good. You can always A/B test if needed.

### Ranking

To start, it’s easiest to rank by popularity (the distinct number of users who searched). Eventually, you could optimize for other objectives, like basket size.

### Tip

There’s no need to wait for a user to start typing to show suggestions. You can show popular ones as soon as the search box is focused.

## Performance

Responsiveness is essential for autocomplete. You should keep the number of network requests to a minimum and filter client-side when possible. If you have under 10k suggestions, prefetch all of them in a single request.

## Measure and Analyze

We recommend adding a single field to the searches table to help with analysis: `typed_query`. It should be null for searches that weren’t autocompleted.

From this, you can analyze the percent of searches that use autocomplete. You can also see if the overall conversion rate increases after adding it (or do an A/B test).

## Conclusion

This should give you a nice foundation for getting started with autocomplete. Check out [Autosuggest](https://github.com/ankane/autosuggest) for a Ruby implementation.

If you use Typeahead.js, we also have [examples](Typeahead-Examples.md) for how to prefetch and measure.

--------------------------------------------------------------------------------
/Matching.md:
--------------------------------------------------------------------------------
# Matching

The goal of matching is to return relevant results without returning irrelevant ones. There are a number of general techniques you can use to make this happen.

### Stemming

If a user searches for `apples`, we want to return results with `apple`. Stemming is one way of accomplishing this. Stemming reduces a word to its stem (similar to its root word). There are [many different algorithms](https://en.wikipedia.org/wiki/Stemming) for stemming. [Porter](http://snowball.tartarus.org/algorithms/porter/stemmer.html) is a popular one for English. With the Porter stemmer, both `apples` and `apple` stem to `appl`.

### Synonyms

When a user searches for `coke`, they likely want results with `coca-cola` to be returned as well. The same goes for `tissues` and `kleenex`.

A key consideration in how to implement this is the difficulty of updating synonyms. With Elasticsearch, we recommend doing synonym expansion at query time so you can update synonyms without reindexing. You can read more about the tradeoff [here](https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html).

There are some common symbols you may want to expand, like `&` to `and` and `%` to `percent`. Also, the [WordNet database](https://en.wikipedia.org/wiki/WordNet) has a list of English synonyms. However, loading the entire database can significantly impact performance, so we recommend building a smaller list by hand.
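Search engines like Elasticsearch handle this with synonym token filters, which is what we recommend. To illustrate the idea (or if you’re rolling your own), here’s a minimal sketch of query-time expansion in JavaScript. The synonym groups and the `OR` syntax are placeholders for whatever your engine expects, and this naive version only handles single-word synonyms.

```js
// Minimal sketch of query-time synonym expansion.
// The groups below are hypothetical examples of a hand-built list.
var synonymGroups = [
  ['coke', 'coca-cola'],
  ['tissues', 'kleenex']
]

// build a lookup from each term to its group
var synonymsFor = {}
synonymGroups.forEach(function (group) {
  group.forEach(function (term) {
    synonymsFor[term] = group
  })
})

// expand each query token into an OR of its synonyms
function expandQuery(query) {
  return query.toLowerCase().split(/\s+/).map(function (token) {
    var group = synonymsFor[token]
    return group ? '(' + group.join(' OR ') + ')' : token
  }).join(' ')
}

// expandQuery('coke zero') => '(coke OR coca-cola) zero'
```

Because the expansion happens on the query, the hand-built list can be updated without reindexing.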
### Misspellings

We aren’t always great spellers. We type `zuchini` when we want `zucchini` and `siracha` when we want `sriracha`. Common misspellings will emerge and can be mapped to correct spellings. However, this won’t catch the long tail of typos.

A more general approach is fuzzy searching. This typically returns results that are within a certain [edit distance](https://en.wikipedia.org/wiki/Edit_distance). The Damerau–Levenshtein distance is a good choice. It counts an edit as an insertion, deletion, substitution, or transposition of two adjacent characters. For instance, `hello` and `helm` have an edit distance of two (substitute `m` for `l` and delete the `o`). Also, it’s available on popular search engines. An edit distance of one is a good place to start.

There are a few downsides to fuzzy searching to be aware of. The biggest is that it can return irrelevant results. For instance, a search for `beet` will return `beef` and `beer`. It’s also less performant.

Both of these can be addressed by fuzzy searching selectively. If a search returns many results without fuzziness, fuzzy searching is unlikely to be useful. So for each search, first perform it without fuzziness, and only if there are too few results, search again with it.

### Special Characters

Some results may have special characters, like `jalapeño`. We want a match if the user searches `jalapeno` (without the `ñ`). ASCII folding is a technique to map characters to their ASCII equivalents. This maps `ñ` to `n`. For English, this works well, but it can be problematic for other languages.

### Spaces

Search engines often use tokenization to split text into words, so whitespace (or lack of whitespace) can be problematic. Let’s examine the phrases `dish washer` and `dishwasher`. We likely want them to return similar results. We could map them as synonyms, but it would be tedious to do this for many phrases.

One approach is to use word n-grams, or shingles. With this approach, adjacent words are combined into single tokens, so `red dish washer` produces the shingles `reddish` and `dishwasher`. When indexing, index the shingles in addition to the individual words. When querying, try both the original query as well as shingles. Let’s look at two examples.

#### Example 1: Spaces in Result

You have an item named `Red Dish Washer` and a user searches for `red dishwasher`. The tokens produced are:

- `red`, `dish`, `washer`, `reddish`, `dishwasher` for the item
- `red`, `dishwasher` for the original query
- `reddishwasher` for the query with shingles

All tokens in the original query match the item, so it’s a match.

#### Example 2: Spaces in Query

You have an item named `Red Dishwasher` and a user searches for `dish washer`. The tokens produced are:

- `red`, `dishwasher`, `reddishwasher` for the item
- `dish`, `washer` for the original query
- `dishwasher` for the query with shingles

All tokens in the query with shingles match the item, so it’s a match.

> Note: the query `red dish washer` will not match, as it’s tokenized to `reddish` and `dishwasher`. One way around this is to include the individual words as tokens as well and use an `OR` condition, but this can have other side effects.
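Search engines provide shingle token filters that do this for you (Elasticsearch’s `shingle` filter, for instance), so you typically won’t write it by hand. Still, a minimal sketch helps make the tokenization above concrete:

```js
// Minimal sketch: produce two-word shingles (concatenated, without spaces)
// alongside the individual words, mirroring the examples above.
function shingleTokens(text) {
  var words = text.toLowerCase().split(/\s+/).filter(function (w) { return w.length > 0 })
  var shingles = []
  for (var i = 0; i < words.length - 1; i++) {
    shingles.push(words[i] + words[i + 1])
  }
  return {words: words, shingles: shingles}
}

// shingleTokens('Red Dish Washer')
// => {words: ['red', 'dish', 'washer'], shingles: ['reddish', 'dishwasher']}
```

When indexing, you’d store both `words` and `shingles`; when querying, you’d try the original words as well as the shingles, as in the examples above.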
### False Matches

When searching for `butter`, you probably aren’t looking for `peanut butter`. An easy fix is to add a *NOT* condition for `peanut butter` to this search. Keep a mapping of these and apply them as needed.

### Unavailable Results

Sometimes, you understand exactly what the user wants, but it’s not available. You may have personally encountered this problem with Netflix. They know what movie you want to watch, but it’s unavailable for streaming. At Instacart, people sometimes search for products we don’t sell - like cigarettes - or produce that’s only available during certain seasons - like strawberries.

In this case, you can explain it’s unavailable and show related items.

--------------------------------------------------------------------------------
/Measuring.md:
--------------------------------------------------------------------------------
# Measuring

The first step to improving is measuring. This allows us to see where we stand and track progress over time.

We want to know what users are searching for. For instance, what are the most popular queries? We also want to know if a search is successful. Did the user find what they were looking for? For an e-commerce site, a good indicator may be if she added an item to her cart or made a purchase. You may want to have multiple conversion goals, but for now, let’s start with one.

When a user searches, track the:

- query
- user id (or visitor id)
- number of results
- time

When a user converts, track the:

- result id
- position
- time

We also want a flag to exclude searches by admins and bots from analysis. We can use the user agent to detect bots. There are many open source projects for this. It’s also good to exclude users who search excessively. They may be bots as well, and at the very least can throw off analysis. We can retroactively determine this and update the flag.

We want to collect all the data in a single table to simplify analysis. Our initial searches table will have:

- query
- user_id
- results_count
- searched_at (time)
- result_id
- position
- converted_at (time)
- exclude (boolean)

If you have different results for different stores or regions, store those values as well.

You could also have a separate table for conversions to record multiple conversions for a single search, in addition to storing the first conversion in the searches table. If users can sort or filter results, we recommend storing those actions in a separate table as well. However, these are outside the scope of this post.

Now, let’s analyze.

## Analyze

With the fields above, you can calculate:

- top queries
- overall conversion rate
- queries with a low conversion rate
- queries with no results (or few results)
- average searches per user
- average time to conversion
- average position of conversions

There are a number of things we can improve (time to conversion, position of conversions, etc.), but it often makes sense to start with the overall conversion rate. Plot it over time so we can see our progress.

Start by getting the top 100 queries and sorting by conversion rate (lowest first); a query sketch follows below. If we’ve set the `exclude` flag properly, the data should be pretty clean. For each query, perform a search and try to identify the issue. Likely, a number of themes will emerge.
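To make this concrete, here’s a sketch of that query, assuming the searches table described above (one row per search, `converted_at` set on conversion); the exact syntax may need adjusting for your database.

```sql
SELECT * FROM (
  SELECT
    LOWER(query) AS query,
    COUNT(DISTINCT user_id) AS searchers,
    COUNT(DISTINCT CASE WHEN converted_at IS NOT NULL THEN user_id END) * 1.0
      / COUNT(DISTINCT user_id) AS conversion_rate
  FROM searches
  WHERE exclude = FALSE
  GROUP BY LOWER(query)
  ORDER BY searchers DESC
  LIMIT 100
) top_queries
ORDER BY conversion_rate
```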
Our general approach to improve will be to first improve matching, then improve ranking.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Practical Search

Best search practices for developers and code to implement them. Let’s make search a better experience for our users.

## Chapters

1. [Measuring](Measuring.md)
1. [Matching](Matching.md)
1. [Ranking](Ranking.md)
1. [Autocomplete](Autocomplete.md)

## Related Projects

- [Searchkick](https://github.com/ankane/searchkick) - Intelligent search made easy for Rails
- [Searchjoy](https://github.com/ankane/searchjoy) - Search analytics made easy
- [Autosuggest](https://github.com/ankane/autosuggest) - Autocomplete suggestions based on what your users search

## Contribute

This is a work in progress, built for the open-source community. If you have great practices, articles, or videos, [please share](https://github.com/ankane/search_guide/issues/new).

--------------------------------------------------------------------------------
/Ranking.md:
--------------------------------------------------------------------------------
# Ranking

There are many different strategies for ranking. If we want the most relevant results to show up first, we can take into account:

- number of terms that match
- significance of each term
- popularity of each result (such as number of times ordered or viewed)

One extremely effective strategy is the *popularity given the specific query*. To accomplish this, we’ll use the data we collected in the [Measuring](Measuring.md) chapter.

### Conversions

Conversions are a great source of data for relevance. Algorithms like TF-IDF and BM25 work great when dealing only with text, but we now have powerful metadata.

If a user searches for `ice cream` and adds Ben & Jerry’s Chunky Monkey to the cart (our conversion metric at Instacart), that item should get a little more weight for similar queries. To prevent specific users from throwing off this approach, we count only one conversion per user.

This basic method has two drawbacks:

1. New items are ranked last
2. Top results stay top results, because top results convert better even if they aren’t the most relevant

There are a number of ways to address each of these issues. We’ve opted for simple ones.

- For #1, assign new items a weight until enough data is collected.
- For #2, randomly penalize top results to give other results a better chance to convert.

Another good strategy for ranking, which can be combined with this one, is personalization. However, let’s save that for another post. For now, we have a strategy for helping with precision when there’s little data and getting rid of the pesky irrelevant results at the bottom.

### Learning to Rank

Another, more advanced strategy is called “learning to rank”. This uses machine learning to rank results. It typically has two steps:

1. Retrieve relevant results from your search engine
2. Rerank the results with a machine learning model

The features of the model are often specific to the user searching, like whether they’ve bought the brand of the result before.
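Here’s a minimal sketch of that retrieve-then-rerank flow. The `searchEngine` client, the trained `model`, and the example features are all placeholders for whatever you actually use.

```js
// Minimal sketch of retrieve-then-rerank. `searchEngine` and `model` are
// placeholders for your actual search client and trained ranking model.
async function search(user, query) {
  // Step 1: retrieve candidate results from the search engine
  var candidates = await searchEngine.query(query, {limit: 200})

  // Step 2: score each candidate with the model and rerank
  var scored = candidates.map(function (result) {
    var features = {
      engineScore: result.score,
      purchasedBrandBefore: user.purchasedBrands.includes(result.brand),
      conversionRate: result.conversionRate || 0
    }
    return {result: result, score: model.predict(features)}
  })

  scored.sort(function (a, b) { return b.score - a.score })
  return scored.map(function (s) { return s.result }).slice(0, 24)
}
```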
## Other

You’ll likely find other issues that don’t fit into the categories above. You should still classify them. A few we’ve encountered at Instacart are:

### Unexpected Searches

People often type the name of a retailer, like Whole Foods or Costco, into the search box when trying to change stores. We now automatically switch stores when this happens.

### Missing Images

People like to see images of the products they’re buying, so having search results with few images leads to low conversions. In this case, we need to get images. A few different ways we’ll do this are licensing the images through a 3rd party API, reaching out to manufacturers directly, or photographing them ourselves.

### Missing Products

In the early days, products sometimes weren’t added to our catalog, even though we carried them. Once, we were missing all the cream cheese from a popular retailer. This was easy to identify after looking at search data. The fix was pretty straightforward: add them.

--------------------------------------------------------------------------------
/Typeahead-Examples.md:
--------------------------------------------------------------------------------
# Typeahead.js Examples

Typeahead.js offers prefetch, which loads terms in a single request after the initial page load. This keeps the initial page load fast, and results show up instantly as the user types. However, prefetch uses local storage and [it’s not recommended to be used for the entire data set](https://github.com/twitter/typeahead.js/blob/master/doc/bloodhound.md#prefetch), so we use a custom prefetch that doesn’t use local storage.

```js
// create the suggestion engine with an empty local data set
var engine = new Bloodhound({
  datumTokenizer: Bloodhound.tokenizers.obj.whitespace('value'),
  queryTokenizer: Bloodhound.tokenizers.whitespace,
  limit: 4,
  local: []
})
engine.initialize()

// fetch all suggestions in a single request and add them to the engine
$.getJSON('/suggestions', function (suggestions) {
  engine.add(suggestions)
})

$searchInput.typeahead({highlight: true}, {source: engine.ttAdapter()})
```

Measure the typed query (only populated when a suggestion is autocompleted or selected):

```js
var typedQuery

$searchInput.on('keyup', function () {
  typedQuery = $searchInput.typeahead('val')
}).on('typeahead:selected typeahead:autocompleted', function () {
  $('#typed_query').val(typedQuery) // autocompleted!!
})
```

Measure typing time:

```js
var typingStartedAt

$searchInput.on('keyup', function (e) {
  // ignore the enter key so submitting doesn't count as typing
  if (!typingStartedAt && e.keyCode != 13) {
    typingStartedAt = new Date()
  }
})
$searchInput.closest('form').on('submit', function () {
  if (typingStartedAt) {
    $('#typing_time').val(((new Date()) - typingStartedAt) / 1000.0)
    typingStartedAt = null
  }
})
```
--------------------------------------------------------------------------------