├── assignments
    ├── assignment0
    │   ├── excercise
    │   │   ├── python-env.sh
    │   │   ├── requirements.txt
    │   │   ├── get_dataset.sh
    │   │   └── setup_solr.sh
    │   └── README.md
    ├── assignment1
    │   ├── exercise
    │   │   └── src
    │   │       ├── frontend
    │   │       │   ├── templates
    │   │       │   │   ├── layout.html
    │   │       │   │   └── simpleIndexSearchPage.html
    │   │       │   └── app.py
    │   │       ├── understand_data.py
    │   │       └── simple_index.py
    │   └── README.md
    └── assignment2
        ├── excercise
        │   └── src
        │       ├── frontend
        │       │   ├── templates
        │       │   │   ├── layout.html
        │       │   │   ├── simpleIndexSearchPage.html
        │       │   │   └── entityAwareSearchPage.html
        │       │   └── app.py
        │       └── entity_aware_index.py
        └── README.md
├── finished-product
    ├── resources
    │   ├── sigir2017-demo.gif
    │   ├── stanford_title_ner_tags_case_sensitive.csv.gz
    │   ├── setup_solr.sh
    │   └── create_fields.sh
    ├── data
    │   └── get_dataset.sh
    └── src
        ├── frontend
        │   ├── templates
        │   │   ├── layout.html
        │   │   ├── simpleIndexSearchPage.html
        │   │   └── entityAwareSearchPage.html
        │   └── app.py
        ├── backend
        │   ├── understand_data.py
        │   └── nlp.py
        └── indexing
            ├── simple_index.py
            └── entity_aware_index.py
├── README.md
└── LICENSE


/assignments/assignment0/excercise/python-env.sh:
--------------------------------------------------------------------------------
#!/bin/bash

pip install -r requirements.txt
python -m nltk.downloader punkt
python -m nltk.downloader stopwords


--------------------------------------------------------------------------------
/finished-product/resources/sigir2017-demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/candidate-selection-tutorial-sigir2017/candidate-selection-tutorial/HEAD/finished-product/resources/sigir2017-demo.gif


--------------------------------------------------------------------------------
/finished-product/resources/stanford_title_ner_tags_case_sensitive.csv.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/candidate-selection-tutorial-sigir2017/candidate-selection-tutorial/HEAD/finished-product/resources/stanford_title_ner_tags_case_sensitive.csv.gz


--------------------------------------------------------------------------------
/finished-product/resources/setup_solr.sh:
--------------------------------------------------------------------------------
#!/bin/bash

rm -rf solr-6.6.0
wget -nc http://www-eu.apache.org/dist/lucene/solr/6.6.0/solr-6.6.0.tgz
tar -zxf solr-6.6.0.tgz
cd solr-6.6.0
bin/solr restart
cd ..
./create_fields.sh


--------------------------------------------------------------------------------
/assignments/assignment0/excercise/requirements.txt:
--------------------------------------------------------------------------------
cycler==0.10.0
httplib2==0.10.3
lxml==3.8.0
nltk==3.2.4
numpy==1.13.1
pandas==0.20.3
PTable==0.9.2
pyparsing==2.2.0
pysolr==3.6.0
python-dateutil==2.6.1
pytz==2017.2
requests==2.10.0
six==1.10.0
sunburnt==0.6
web.py==0.37
sner==0.2.1


--------------------------------------------------------------------------------
/assignments/assignment0/excercise/get_dataset.sh:
--------------------------------------------------------------------------------
#!/bin/bash

set -e

# Escape code
esc=`echo -en "\033"`

# Set colors
cc_red="${esc}[0;31m"
cc_green="${esc}[0;32m"
cc_yellow="${esc}[0;33m"
cc_blue="${esc}[0;34m"
cc_normal=`echo -en "${esc}[m\017"`

function ec () {
    echo -e "${cc_green}${1}${cc_normal}"
}

cd ~/workspace/candidate-selection-tutorial/finished-product/data
./get_dataset.sh
cd ~/workspace/candidate-selection-tutorial/assignments/assignment0/excercise
ec "\n\nNews aggregator dataset has been downloaded!"
--------------------------------------------------------------------------------
/assignments/assignment0/excercise/setup_solr.sh:
--------------------------------------------------------------------------------
#!/bin/bash
set -e

# Escape code
esc=`echo -en "\033"`

# Set colors
cc_red="${esc}[0;31m"
cc_green="${esc}[0;32m"
cc_yellow="${esc}[0;33m"
cc_blue="${esc}[0;34m"
cc_normal=`echo -en "${esc}[m\017"`

function ec () {
    echo -e "${cc_green}${1}${cc_normal}"
}

cd ~/workspace/candidate-selection-tutorial/finished-product/resources
./setup_solr.sh
cd ~/workspace/candidate-selection-tutorial/assignments/assignment0/excercise
ec "\n\nSolr has been started and collections with specific dataset fields have been created!"


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## SIGIR 2017 - Candidate Selection for Personalized Search and Recommender Systems

### Abstract
Modern-day social media search and recommender systems require complex query formulation that incorporates both the user's context and their explicit search queries. Users expect these systems to be fast and to provide results relevant to their query and context. With millions of documents to choose from, these systems utilize a multi-pass scoring function to narrow the results and provide the most relevant ones to users. Candidate selection is required to sift through all the documents in the index and select a relevant few to be ranked by subsequent scoring functions. It is crucial to narrow down the document set while retaining the relevant documents in the resulting set. In this tutorial we survey various candidate selection techniques and take a deep dive into case studies on a large-scale social media platform.
In the latter half we provide a hands-on tutorial in which we build these candidate selection models on a real-world dataset and see how to balance the tradeoff between relevance and latency.

### Presenters
Dhruv Arya, Ganesh Venkataraman, Aman Grover, Krishnaram Kenthapadi, Yiqun Liu

### Final Output


--------------------------------------------------------------------------------
/assignments/assignment1/exercise/src/frontend/templates/layout.html:
--------------------------------------------------------------------------------
$def with (content)

| Search Results | Publisher | Category | Url |
|---|---|---|---|
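
The README abstract describes candidate selection as a first pass that narrows the index to a relevant few documents before later scoring passes rank them. A minimal sketch of that two-pass flow in plain Python follows. It is not the tutorial's Solr-based implementation: the sample documents, the function names, and the term-overlap score are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real text analysis."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokenize(doc["title"]):
            index[token].add(doc_id)
    return index


def select_candidates(index, query):
    """First pass: cheaply collect every document matching any query term."""
    candidates = set()
    for token in tokenize(query):
        candidates |= index.get(token, set())
    return candidates


def rank(docs, candidates, query, limit=10):
    """Second pass: score only the candidates by the number of matched query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id in candidates:
        overlap = len(query_terms & set(tokenize(docs[doc_id]["title"])))
        scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Hypothetical documents; the field names only mirror the result columns
# (title, publisher, category, url) and are not taken from the real dataset.
DOCS = {
    1: {"title": "Fed raises interest rates", "publisher": "Example Times",
        "category": "business", "url": "http://example.com/1"},
    2: {"title": "New phone raises privacy questions", "publisher": "Example Post",
        "category": "technology", "url": "http://example.com/2"},
    3: {"title": "Interest in electric cars grows", "publisher": "Example Daily",
        "category": "technology", "url": "http://example.com/3"},
}

index = build_inverted_index(DOCS)
candidates = select_candidates(index, "interest rates")
results = rank(DOCS, candidates, "interest rates")
print(results)
```

The ranking function only ever sees the candidate set, so a more expensive scorer can be swapped in without touching the whole collection. That separation is where the relevance and latency tradeoff mentioned in the abstract is decided.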