├── .gitignore
├── Dockerfile
├── README.md
├── address_to_geocoding.json
├── binder
│   └── requirements.txt
├── dedupe-settings.pickle
├── dedupe-simple-settings.pickle
├── dedupe-simple-training.json
├── dedupe-slides-training.json
├── dedupe
│   └── variables
│       ├── __init__.py
│       └── custom_variables.py
├── full-indexing.png
├── graph_utils.py
├── index.html
├── requirements.in
├── requirements.txt
├── restaurant-training.csv
├── restaurant.csv
├── restaurant.original.csv
├── rise.css
├── slides-pt-br.ipynb
├── slides-pycon-us-2020.ipynb
├── slides-reduced.ipynb
├── slides.ipynb
├── sorted-neighbourhood.png
├── standard-blocking.png
├── svm_dedupe.py
├── training-input-output.txt
├── training-simple-input-output.txt
└── vinta.png

/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
106 | # Sublime
107 | *.sublime-workspace
108 | *.sublime-project
109 |
110 | # Local-only files
111 | _personal/
112 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM jupyter/scipy-notebook:1145fb1198b2
2 |
3 | USER ${NB_USER}
4 | COPY reduced-requirements.txt /tmp/
5 | RUN pip install -r /tmp/reduced-requirements.txt
6 | CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | | :warning: Looking for PyCon US 2020 slides? They're on [`pycon-2020` branch.
Please click here!](https://github.com/vintasoftware/deduplication-slides/tree/pycon-2020) |
2 | | --- |
3 |
4 | # 1 + 1 = 1 or Record Deduplication with Python
5 |
6 | Jupyter Notebook from the talk "1 + 1 = 1 or Record Deduplication with Python", presented at [PyBay 2018](https://www.youtube.com/channel/UC51aOZF5nnderbuar5D5ifw/playlists) and [PyGotham 2018](https://2018.pygotham.org/talks/). The `slides.ipynb` version was presented at PyBay, while the `slides-reduced.ipynb` version was presented at PyGotham.
7 |
8 | ## Running (Binder)
9 | You can run the `slides-reduced.ipynb` version online! Click here: [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/vintasoftware/deduplication-slides/master?filepath=slides-reduced.ipynb)
10 |
11 | ### Errors on Binder?
12 | If you run into errors on Binder, try again later or [use the non-interactive rendered version](https://nbviewer.jupyter.org/github/vintasoftware/deduplication-slides/blob/master/slides-reduced.ipynb).
13 |
14 | ## Running (Local)
15 | Install `libpostal` (instructions [here](https://github.com/openvenues/libpostal)) and run `pip install -r requirements.txt`. Then run `jupyter notebook` and open `slides.ipynb` or `slides-reduced.ipynb`.
16 |
17 | ---
18 |
19 | # 🇧🇷 1 + 1 = 1 ou Pareamento de Registros com Python
20 |
21 | Jupyter Notebook from the talk "1 + 1 = 1 ou Pareamento de Registros com Python" (Portuguese for "1 + 1 = 1 or Record Matching with Python"), presented at Python Brasil 2019.
22 |
23 | ## Running (Binder)
24 | You can run it online without installing anything on your computer! Click here: [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/vintasoftware/deduplication-slides/master?filepath=slides-pt-br.ipynb)
25 |
26 | ### Errors on Binder?
27 | If Binder doesn't work, try again later or [use the non-interactive rendered version](https://nbviewer.jupyter.org/github/vintasoftware/deduplication-slides/blob/master/slides-pt-br.ipynb).
28 |
29 | ## Running (Local)
30 | Install `libpostal` (instructions [here](https://github.com/openvenues/libpostal)) and run `pip install -r requirements.txt`. Then run `jupyter notebook` and open `slides-pt-br.ipynb`.
31 |
--------------------------------------------------------------------------------
/binder/requirements.txt:
--------------------------------------------------------------------------------
1 | affinegap==1.10
2 | appnope==0.1.0
3 | asn1crypto==0.24.0
4 | backcall==0.1.0
5 | bleach==2.1.3
6 | BTrees==4.5.0
7 | categorical-distance==1.9
8 | certifi==2019.9.11
9 | cffi==1.11.5
10 | chardet==3.0.4
11 | click==6.7
12 | colorama==0.3.9
13 | cryptography==2.3.1
14 | cycler==0.10.0
15 | dateparser==0.7.0
16 | datetime-distance==0.1.3
17 | decorator==4.3.0
18 | dedupe==1.9.9
19 | dedupe-hcluster==0.3.6
20 | dedupe-variable-address==0.0.8
21 | dedupe-variable-datetime==0.1.5
22 | dedupe-variable-fuzzycategory==0.0.1
23 | dedupe-variable-name==0.0.13
24 | DoubleMetaphone==0.1
25 | entrypoints==0.2.3
26 | fastcluster==1.1.24
27 | future==0.16.0
28 | fuzzycategory==0.0.4
29 | geocoder==1.38.1
30 | gitdb2==2.0.4
31 | GitPython==2.1.11
32 | haversine==0.4.5
33 | highered==0.2.1
34 | html5lib==1.0.1
35 | idna==2.7
36 | ipykernel==4.8.2
37 | ipython==6.5.0
38 | ipython-genutils==0.2.0
39 | ipywidgets==7.4.0
40 | jedi==0.12.1
41 | jellyfish==0.6.1
42 | Jinja2==2.10
43 | joblib==0.14.0
44 | jsonschema==2.6.0
45 | jupyter==1.0.0
46 | jupyter-client==5.2.3
47 | jupyter-console==5.2.0
48 | jupyter-contrib-core==0.3.3
49 | jupyter-core==4.4.0
50 | jupyter-nbextensions-configurator==0.4.0
51 | kiwisolver==1.0.1
52 | Levenshtein-search==1.4.4
53 | MarkupSafe==1.0
54 | matplotlib==3.0.0
55 | mistune==0.8.3
56 | nameparser==1.0.1
57 | nbconvert==5.3.1
58 | nbdime==1.0.2
59 | nbformat==4.4.0
60 | networkx==2.2
61 | notebook==5.6.0
62 | numexpr==2.6.8
63 | numpy==1.15.1
64 | pandas==0.23.4
65 | pandocfilters==1.4.2
66 | parseratorvariable==0.0.18
67 | parso==0.3.1
68 | persistent==4.3.0
69 | pexpect==4.6.0
70 | phonenumbers==8.9.14
71 | pickleshare==0.7.4
72 | probableparsing==0.0.1
73 | probablepeople==0.5.4
74 | prometheus-client==0.3.1
75 | prompt-toolkit==1.0.15
76 | ptyprocess==0.6.0
77 | pycparser==2.18
78 | Pygments==2.2.0
79 | pyhacrf-datamade==0.2.3
80 | PyLBFGS==0.2.0.11
81 | pyparsing==2.2.0
82 | python-crfsuite==0.9.6
83 | python-dateutil==2.7.3
84 | python-dotenv==0.10.3
85 | python-geohash==0.8.5
86 | pytz==2018.5
87 | PyYAML==3.13
88 | pyzmq==17.1.0
89 | qtconsole==4.3.1
90 | ratelim==0.1.6
91 | recordlinkage==0.13.2
92 | regex==2018.7.11
93 | requests==2.19.1
94 | rise==5.5.1
95 | rlr==2.4.5
96 | scikit-learn==0.19.2
97 | scipy==1.1.0
98 | Send2Trash==1.5.0
99 | simplecosine==1.2
100 | simplegeneric==0.8.1
101 | simplejson==3.16.0
102 | six==1.11.0
103 | smmap2==2.0.4
104 | terminado==0.8.2
105 | testpath==0.3.1
106 | tornado==5.1
107 | traitlets==4.3.2
108 | treeinterpreter==0.2.1
109 | tzlocal==1.5.1
110 | Unidecode==1.0.22
111 | urllib3==1.23
112 | usaddress==0.5.10
113 | wcwidth==0.1.7
114 | webencodings==0.5.1
115 | widgetsnbextension==3.4.0
116 | zope.index==4.3.0
117 | zope.interface==4.5.0
118 |
--------------------------------------------------------------------------------
/dedupe-settings.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/dedupe-settings.pickle
--------------------------------------------------------------------------------
/dedupe-simple-settings.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/dedupe-simple-settings.pickle
--------------------------------------------------------------------------------
/dedupe-simple-training.json:
-------------------------------------------------------------------------------- 1 | {"distinct": [{"__class__": "tuple", "__value__": [{"name": "philippe the original", "addr": "1001 north alameda", "city": "los angeles", "postal": "90012", "cluster": 17, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}, {"name": "pisces", "addr": "95 ave. a", "city": "new york city", "postal": "10009", "cluster": 54, "latlng": {"__class__": "tuple", "__value__": [40.7256332, -73.984031]}}]}, {"__class__": "tuple", "__value__": [{"name": "philippe the original", "addr": "1001 n. alameda st.", "city": "chinatown", "postal": "90012", "cluster": 17, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}, {"name": "mon kee seafood restaurant", "addr": "679 n. spring st.", "city": "los angeles", "postal": "90012", "cluster": 147, "latlng": {"__class__": "tuple", "__value__": [34.0595568, -118.2382488]}}]}, {"__class__": "tuple", "__value__": [{"name": "caffe vivaldi", "addr": "32 jones st. at bleecker st.", "city": "new york", "postal": "10014", "cluster": 211, "latlng": {"__class__": "tuple", "__value__": [40.7317316, -74.00298049999999]}}, {"name": "patria", "addr": "250 park ave. s at 20th st.", "city": "new york", "postal": "10003", "cluster": 323, "latlng": {"__class__": "tuple", "__value__": [40.7382552, -73.988214]}}]}, {"__class__": "tuple", "__value__": [{"name": "i trulli", "addr": "122 e. 27th st. between lexington and park aves.", "city": "new york", "postal": "10028", "cluster": 265, "latlng": {"__class__": "tuple", "__value__": [40.77961699999999, -73.95631999999999]}}, {"name": "otabe", "addr": "68 e. 56th st.", "city": "new york", "postal": "10022", "cluster": 318, "latlng": {"__class__": "tuple", "__value__": [40.7611775, -73.9720541]}}]}, {"__class__": "tuple", "__value__": [{"name": "viva mercado s", "addr": "6182 w. 
flamingo rd.", "city": "las vegas", "postal": "89103", "cluster": 451, "latlng": {"__class__": "tuple", "__value__": [36.1149027, -115.2269398]}}, {"name": "cafe con leche", "addr": "424 amsterdam ave.", "city": "new york city", "postal": "10024", "cluster": 605, "latlng": {"__class__": "tuple", "__value__": [40.7841454, -73.9778061]}}]}, {"__class__": "tuple", "__value__": [{"name": "faz", "addr": "161 sutter st.", "city": "san francisco", "postal": "94104", "cluster": 469, "latlng": {"__class__": "tuple", "__value__": [37.78973999999999, -122.4032937]}}, {"name": "splendido embarcadero", "addr": "4", "city": "san francisco", "postal": null, "cluster": 513, "latlng": {"__class__": "tuple", "__value__": [37.7779649, -122.3962019]}}]}, {"__class__": "tuple", "__value__": [{"name": "hi life restaurant and lounge", "addr": "1340 1st ave. at 72nd st.", "city": "new york", "postal": "10021", "cluster": 262, "latlng": {"__class__": "tuple", "__value__": [40.7675807, -73.95580029999999]}}, {"name": "trattoria dell arte", "addr": "900 7th ave. between 56th and 57th sts.", "city": "new york", "postal": "10106", "cluster": 362, "latlng": {"__class__": "tuple", "__value__": [40.7654454, -73.9805358]}}]}, {"__class__": "tuple", "__value__": [{"name": "lattanzi ristorante", "addr": "361 w. 46th st.", "city": "new york", "postal": "10036", "cluster": 283, "latlng": {"__class__": "tuple", "__value__": [40.7608494, -73.990016]}}, {"name": "pomaire", "addr": "371 w. 46th st. off 9th ave.", "city": "new york", "postal": "10036", "cluster": 329, "latlng": {"__class__": "tuple", "__value__": [40.7609632, -73.9902681]}}]}, {"__class__": "tuple", "__value__": [{"name": "empire korea", "addr": "6 e. 
32nd st.", "city": "new york", "postal": "10016", "cluster": 233, "latlng": {"__class__": "tuple", "__value__": [40.7465392, -73.9849772]}}, {"name": "tu lan", "addr": "8 sixth st.", "city": "san francisco", "postal": "94103", "cluster": 750, "latlng": {"__class__": "tuple", "__value__": [37.7818759, -122.41013]}}]}, {"__class__": "tuple", "__value__": [{"name": "mesa grill", "addr": "102 5th ave. between 15th and 16th sts.", "city": "new york", "postal": "10011", "cluster": 47, "latlng": {"__class__": "tuple", "__value__": [40.7370445, -73.9931189]}}, {"name": "fiorello s roman cafe", "addr": "1900 broadway between 63rd and 64th sts.", "city": "new york", "postal": "10023", "cluster": 241, "latlng": {"__class__": "tuple", "__value__": [40.7715867, -73.98138519999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "second street grill", "addr": "200 e. fremont st.", "city": "las vegas", "postal": "89101", "cluster": 71, "latlng": {"__class__": "tuple", "__value__": [36.1709271, -115.1431516]}}, {"name": "paty s", "addr": "10001 riverside dr.", "city": "toluca lake", "postal": "91602", "cluster": 154, "latlng": {"__class__": "tuple", "__value__": [34.1524376, -118.3496191]}}]}, {"__class__": "tuple", "__value__": [{"name": "le colonial", "addr": "149 e. 57th st.", "city": "new york", "postal": "10022", "cluster": 286, "latlng": {"__class__": "tuple", "__value__": [40.7608569, -73.9683494]}}, {"name": "cassell s", "addr": "3266 w. sixth st.", "city": "la", "postal": "90020", "cluster": 547, "latlng": {"__class__": "tuple", "__value__": [34.0634809, -118.2936285]}}]}, {"__class__": "tuple", "__value__": [{"name": "cafe ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "postal": "30326", "cluster": 89, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "ritz carlton dining room buckhead ", "addr": "3434 peachtree rd. 
ne", "city": "atlanta", "postal": "30326", "cluster": 90, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}, {"__class__": "tuple", "__value__": [{"name": "pamir", "addr": "1065 1st ave. at 58th st.", "city": "new york", "postal": "10022", "cluster": 321, "latlng": {"__class__": "tuple", "__value__": [40.7591914, -73.9626112]}}, {"name": "rosa mexicano", "addr": "1063 1st ave. at 58th st.", "city": "new york", "postal": "10022", "cluster": 337, "latlng": {"__class__": "tuple", "__value__": [40.75896580000001, -73.9627039]}}]}, {"__class__": "tuple", "__value__": [{"name": "le gamin", "addr": "50 macdougal st. between houston and prince sts.", "city": "new york", "postal": "10012", "cluster": 287, "latlng": {"__class__": "tuple", "__value__": [40.7273246, -74.0024635]}}, {"name": "le marais", "addr": "150 w. 46th st.", "city": "new york", "postal": "10036", "cluster": 290, "latlng": {"__class__": "tuple", "__value__": [40.7580041, -73.98431149999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "restaurant ritz carlton atlanta", "addr": "181 peachtree st.", "city": "atlanta", "postal": "30303", "cluster": 91, "latlng": {"__class__": "tuple", "__value__": [33.7585793, -84.3870657]}}, {"name": "ritz carlton cafe atlanta ", "addr": "181 peachtree st.", "city": "atlanta", "postal": "30303", "cluster": 711, "latlng": {"__class__": "tuple", "__value__": [33.7585793, -84.3870657]}}]}, {"__class__": "tuple", "__value__": [{"name": "tillerman", "addr": "2245 e. flamingo rd.", "city": "las vegas", "postal": "89119", "cluster": 73, "latlng": {"__class__": "tuple", "__value__": [36.114384, -115.1218936]}}, {"name": "original pantry bakery", "addr": "875 s. figueroa st. 
downtown", "city": "la", "postal": "90017", "cluster": 583, "latlng": {"__class__": "tuple", "__value__": [34.0464451, -118.2628321]}}]}, {"__class__": "tuple", "__value__": [{"name": "paty s", "addr": "10001 riverside dr.", "city": "toluca lake", "postal": "91602", "cluster": 154, "latlng": {"__class__": "tuple", "__value__": [34.1524376, -118.3496191]}}, {"name": "restaurant horikawa", "addr": "111 s. san pedro st.", "city": "los angeles", "postal": "90012", "cluster": 160, "latlng": {"__class__": "tuple", "__value__": [34.0500968, -118.2413802]}}]}, {"__class__": "tuple", "__value__": [{"name": "cafe ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "postal": "30326", "cluster": 89, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "dining room ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "postal": "30326", "cluster": 90, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}, {"__class__": "tuple", "__value__": [{"name": "arnie morton s of chicago", "addr": "435 s. la cienega blvd.", "city": "los angeles", "postal": "90048", "cluster": 0, "latlng": {"__class__": "tuple", "__value__": [34.070609, -118.376722]}}, {"name": "sarabeth s kitchen", "addr": "423 amsterdam ave. between 80th and 81st sts.", "city": "new york", "postal": "10024", "cluster": 344, "latlng": {"__class__": "tuple", "__value__": [40.7838797, -73.97742439999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "le colonial", "addr": "8783 beverly blvd.", "city": "los angeles", "postal": "90048", "cluster": 144, "latlng": {"__class__": "tuple", "__value__": [34.0773657, -118.3833566]}}, {"name": "cendrillon asian grill marimba bar", "addr": "45 mercer st. 
between broome and grand sts.", "city": "new york", "postal": "10013", "cluster": 216, "latlng": {"__class__": "tuple", "__value__": [40.721674, -74.001407]}}]}, {"__class__": "tuple", "__value__": [{"name": "ritz carlton cafe buckhead ", "addr": "3434 peachtree rd. ne", "city": "atlanta", "postal": "30326", "cluster": 89, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "dining room ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "postal": "30326", "cluster": 90, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}], "match": [{"__class__": "tuple", "__value__": [{"name": "le bernardin", "addr": "155 w. 51st st.", "city": "new york", "postal": "10019", "cluster": 41, "latlng": {"__class__": "tuple", "__value__": [40.7615691, -73.98180479999999]}}, {"name": "le bernardin", "addr": "155 w. 51st st.", "city": "new york city", "postal": "10019", "cluster": 41, "latlng": {"__class__": "tuple", "__value__": [40.7615691, -73.98180479999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "cafe lalo", "addr": "201 w. 83rd st.", "city": "new york", "postal": "10024", "cluster": 26, "latlng": {"__class__": "tuple", "__value__": [40.78598119999999, -73.97672659999999]}}, {"name": "cafe lalo", "addr": "201 w. 83rd st.", "city": "new york city", "postal": "10024", "cluster": 26, "latlng": {"__class__": "tuple", "__value__": [40.78598119999999, -73.97672659999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "aquavit", "addr": "13 w. 54th st.", "city": "new york", "postal": "10019", "cluster": 24, "latlng": {"__class__": "tuple", "__value__": [40.7616767, -73.976345]}}, {"name": "aquavit", "addr": "13 w. 54th st.", "city": "new york city", "postal": "10019", "cluster": 24, "latlng": {"__class__": "tuple", "__value__": [40.7616767, -73.976345]}}]}, {"__class__": "tuple", "__value__": [{"name": "second avenue deli", "addr": "156 2nd ave. 
at 10th st.", "city": "new york", "postal": "10003", "cluster": 58, "latlng": {"__class__": "tuple", "__value__": [40.7296096, -73.9867012]}}, {"name": "second avenue deli", "addr": "156 second ave.", "city": "new york city", "postal": "10003", "cluster": 58, "latlng": {"__class__": "tuple", "__value__": [40.7296096, -73.9867012]}}]}, {"__class__": "tuple", "__value__": [{"name": "cafe ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "postal": "30326", "cluster": 89, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "ritz carlton cafe buckhead ", "addr": "3434 peachtree rd. ne", "city": "atlanta", "postal": "30326", "cluster": 89, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}, {"__class__": "tuple", "__value__": [{"name": "smith wollensky", "addr": "201 e. 49th st.", "city": "new york", "postal": "10017", "cluster": 62, "latlng": {"__class__": "tuple", "__value__": [40.755156, -73.9707177]}}, {"name": "smith wollensky", "addr": "797 third ave.", "city": "new york city", "postal": "10022", "cluster": 62, "latlng": {"__class__": "tuple", "__value__": [40.7551704, -73.9707437]}}]}, {"__class__": "tuple", "__value__": [{"name": "lespinasse", "addr": "2 e. 55th st.", "city": "new york", "postal": "10022", "cluster": 43, "latlng": {"__class__": "tuple", "__value__": [40.7613979, -73.9746128]}}, {"name": "lespinasse new york city ", "addr": "2 e. 55th st.", "city": "new york city", "postal": "10022", "cluster": 43, "latlng": {"__class__": "tuple", "__value__": [40.7613979, -73.9746128]}}]}, {"__class__": "tuple", "__value__": [{"name": "philippe s the original", "addr": "1001 n. 
alameda st.", "city": "los angeles", "postal": "90012", "cluster": 17, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}, {"name": "philippe the original", "addr": "1001 north alameda", "city": "los angeles", "postal": "90012", "cluster": 17, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}]}, {"__class__": "tuple", "__value__": [{"name": "georgia grille", "addr": "2290 peachtree rd. peachtree square shopping center", "city": "atlanta", "postal": "30309", "cluster": 81, "latlng": {"__class__": "tuple", "__value__": [33.8168771, -84.3905065]}}, {"name": "georgia grille", "addr": "2290 peachtree rd.", "city": "atlanta", "postal": "30309", "cluster": 81, "latlng": {"__class__": "tuple", "__value__": [33.8171632, -84.3900366]}}]}, {"__class__": "tuple", "__value__": [{"name": "mifune japan center kintetsu building", "addr": "1737 post st.", "city": "san francisco", "postal": "94115", "cluster": 107, "latlng": {"__class__": "tuple", "__value__": [37.785329, -122.430369]}}, {"name": "mifune", "addr": "1737 post st.", "city": "san francisco", "postal": "94115", "cluster": 107, "latlng": {"__class__": "tuple", "__value__": [37.785329, -122.430369]}}]}, {"__class__": "tuple", "__value__": [{"name": "restaurant ritz carlton atlanta", "addr": "181 peachtree st.", "city": "atlanta", "postal": "30303", "cluster": 91, "latlng": {"__class__": "tuple", "__value__": [33.7585793, -84.3870657]}}, {"name": "ritz carlton restaurant", "addr": "181 peachtree st.", "city": "atlanta", "postal": "30303", "cluster": 91, "latlng": {"__class__": "tuple", "__value__": [33.7585793, -84.3870657]}}]}, {"__class__": "tuple", "__value__": [{"name": "le montrachet", "addr": "3000 w. 
paradise rd.", "city": "las vegas", "postal": "89109", "cluster": 69, "latlng": {"__class__": "tuple", "__value__": [36.1362611, -115.1512539]}}, {"name": "le montrachet bistro", "addr": "3000 paradise rd.", "city": "las vegas", "postal": "89109", "cluster": 69, "latlng": {"__class__": "tuple", "__value__": [36.1362611, -115.1512539]}}]}]} -------------------------------------------------------------------------------- /dedupe-slides-training.json: -------------------------------------------------------------------------------- 1 | {"distinct": [{"__class__": "tuple", "__value__": [{"name": "cite", "addr": "120 w. 51st st.", "city": "new york", "type": "french", "postal": "10019", "addr_variations": {"__class__": "frozenset", "__value__": ["120 w 51st street", "120 w 51 saint", "120 west 51 street", "120 west 51 saint", "120 west 51st street", "120 west 51st saint", "120 w 51st saint", "120 w 51 street"]}, "latlng": {"__class__": "tuple", "__value__": [40.7607952, -73.9812268]}}, {"name": "new york noodletown", "addr": "28 1/2 bowery at bayard st.", "city": "new york", "type": "asian", "postal": "10013", "addr_variations": {"__class__": "frozenset", "__value__": ["28 1 2 bowery at bayard saint", "28 1 2 bowery at bayard street"]}, "latlng": {"__class__": "tuple", "__value__": [40.7150317, -73.9970383]}}]}, {"__class__": "tuple", "__value__": [{"name": "bernardin", "addr": "155 w. 51st st.", "city": "new york city", "type": "seafood", "postal": "10019", "addr_variations": {"__class__": "frozenset", "__value__": ["155 west 51st saint", "155 w 51 street", "155 west 51 street", "155 w 51st street", "155 west 51 saint", "155 w 51st saint", "155 west 51st street", "155 w 51 saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.7615691, -73.98180479999999]}}, {"name": "republic", "addr": "37a union sq. 
w between 16th and 17th sts.", "city": "new york", "type": "asian", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["37a union square west between 16th and 17 streets", "37 a union square west between 16th and 17th streets", "37 a union square w between 16th and 17 streets", "37 a union square west between 16 and 17 streets", "37a union square west between 16 and 17 streets", "37a union square w between 16 and 17 streets", "37 a union square west between 16th and 17 streets", "37a union square w between 16 and 17th streets", "37a union square west between 16 and 17th streets", "37a union square w between 16th and 17th streets", "37 a union square w between 16th and 17th streets", "37a union square west between 16th and 17th streets", "37a union square w between 16th and 17 streets", "37 a union square w between 16 and 17th streets", "37 a union square west between 16 and 17th streets", "37 a union square w between 16 and 17 streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.7369985, -73.9907851]}}]}, {"__class__": "tuple", "__value__": [{"name": "dawat", "addr": "210 e. 58th st.", "city": "new york", "type": "asian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["210 e 58th street", "210 east 58 street", "210 east 58th street", "210 e 58 street", "210 e 58 saint", "210 e 58th saint", "210 east 58 saint", "210 east 58th saint"]}, "latlng": null}, {"name": "rainbow restaurant", "addr": "2118 n. decatur rd.", "city": "decatur", "type": "vegetarian", "postal": "30033", "addr_variations": {"__class__": "frozenset", "__value__": ["2118 n decatur road", "2118 north decatur road"]}, "latlng": {"__class__": "tuple", "__value__": [33.7908588, -84.3052307]}}]}, {"__class__": "tuple", "__value__": [{"name": "yujean kang gourmet chinese cuisine", "addr": "67 n. 
raymond ave.", "city": "los angeles", "type": "asian", "postal": "91103", "addr_variations": {"__class__": "frozenset", "__value__": ["67 north raymond avenue", "67 nord raymond avenue", "67 n raymond avenue"]}, "latlng": {"__class__": "tuple", "__value__": [34.147086, -118.1490988]}}, {"name": "ruby", "addr": "45 s. fair oaks ave.", "city": "pasadena", "type": "diners", "postal": "91105", "addr_variations": {"__class__": "frozenset", "__value__": ["45 san fair oaks avenue", "45 south fair oaks avenue", "45 s fair oaks avenue"]}, "latlng": {"__class__": "tuple", "__value__": [34.1449715, -118.1506038]}}]}, {"__class__": "tuple", "__value__": [{"name": "coyote cafe", "addr": "3799 las vegas blvd. s", "city": "las vegas", "type": "southwestern", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3799 las vegas boulevard san", "3799 las vegas boulevard south", "3799 las vegas boulevard s", "3799 las vegas boulevard sur"]}, "latlng": {"__class__": "tuple", "__value__": [36.1022507, -115.1699679]}}, {"name": "tre visi", "addr": "3799 las vegas blvd. 
s.", "city": "las vegas", "type": "italian", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3799 las vegas boulevard san", "3799 las vegas boulevard south", "3799 las vegas boulevard s", "3799 las vegas boulevard sur"]}, "latlng": {"__class__": "tuple", "__value__": [36.1022507, -115.1699679]}}]}, {"__class__": "tuple", "__value__": [{"name": "osteria del forno", "addr": "519 columbus ave.", "city": "san francisco", "type": "italian", "postal": "94133", "addr_variations": {"__class__": "frozenset", "__value__": ["519 columbus avenue"]}, "latlng": {"__class__": "tuple", "__value__": [37.799736, -122.4096355]}}, {"name": "caffe greco", "addr": "423 columbus ave.", "city": "san francisco", "type": "continental", "postal": "94133", "addr_variations": {"__class__": "frozenset", "__value__": ["423 columbus avenue"]}, "latlng": {"__class__": "tuple", "__value__": [37.7989568, -122.4086733]}}]}, {"__class__": "tuple", "__value__": [{"name": "orangerie", "addr": "903 n. la cienega blvd.", "city": "los angeles", "type": "french", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["903 n lane cienega boulevard", "903 north louisiana cienega boulevard", "903 n louisiana cienega boulevard", "903 norte la cienega boulevard", "903 n la cienega boulevard", "903 norte lane cienega boulevard", "903 north lane cienega boulevard", "903 norte louisiana cienega boulevard", "903 north la cienega boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.0870981, -118.376626]}}, {"name": "drai", "addr": "730 n. 
la cienega blvd.", "city": "los angeles", "type": "french", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["730 n lane cienega boulevard", "730 north lane cienega boulevard", "730 norte la cienega boulevard", "730 norte louisiana cienega boulevard", "730 n louisiana cienega boulevard", "730 norte lane cienega boulevard", "730 north louisiana cienega boulevard", "730 n la cienega boulevard", "730 north la cienega boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.0845064, -118.3761899]}}]}, {"__class__": "tuple", "__value__": [{"name": "chin", "addr": "3200 las vegas blvd. s", "city": "las vegas", "type": "asian", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3200 las vegas boulevard s", "3200 las vegas boulevard south", "3200 las vegas boulevard san", "3200 las vegas boulevard sur"]}, "latlng": {"__class__": "tuple", "__value__": [36.1275236, -115.1715003]}}, {"name": "morton chicago las vegas", "addr": "3200 las vegas blvd. 
s.", "city": "las vegas", "type": "steakhouses", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3200 las vegas boulevard s", "3200 las vegas boulevard south", "3200 las vegas boulevard san", "3200 las vegas boulevard sur"]}, "latlng": {"__class__": "tuple", "__value__": [36.1275236, -115.1715003]}}]}, {"__class__": "tuple", "__value__": [{"name": "locanda veneta", "addr": "3rd st.", "city": "los angeles", "type": "italian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}, {"name": "sofi", "addr": "3rd st.", "city": "los angeles", "type": "mediterranean", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}]}, {"__class__": "tuple", "__value__": [{"name": "postrio", "addr": "545 post st.", "city": "san francisco", "type": "american", "postal": "94102", "addr_variations": {"__class__": "frozenset", "__value__": ["545 post street", "545 post saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.78782959999999, -122.4107561]}}, {"name": "pacific pan pacific hotel", "addr": "500 post st.", "city": "san francisco", "type": "french", "postal": "94102", "addr_variations": {"__class__": "frozenset", "__value__": ["500 post saint", "500 post street"]}, "latlng": {"__class__": "tuple", "__value__": [37.7883396, -122.4103029]}}]}, {"__class__": "tuple", "__value__": [{"name": "teresa", "addr": "103 1st ave. 
between 6th and 7th sts.", "city": "new york", "type": "east european", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["103 1 avenue between 6 and 7 streets", "103 1 avenue between 6th and 7 streets", "103 1st avenue between 6 and 7th streets", "103 1st avenue between 6th and 7 streets", "103 1 avenue between 6th and 7th streets", "103 1st avenue between 6th and 7th streets", "103 1st avenue between 6 and 7 streets", "103 1 avenue between 6 and 7th streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.7266961, -73.9861943]}}, {"name": "teresa", "addr": "80 montague st.", "city": "queens", "type": "polish", "postal": "11201", "addr_variations": {"__class__": "frozenset", "__value__": ["80 montague saint", "80 montague street"]}, "latlng": {"__class__": "tuple", "__value__": [40.6951748, -73.9962484]}}]}, {"__class__": "tuple", "__value__": [{"name": "cafe ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "type": "ext 6108 international", "postal": "30326", "addr_variations": {"__class__": "frozenset", "__value__": ["3434 peachtree road"]}, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "dining room ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "type": "international", "postal": "30326", "addr_variations": {"__class__": "frozenset", "__value__": ["3434 peachtree road"]}, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}, {"__class__": "tuple", "__value__": [{"name": "c3", "addr": "103 waverly pl. 
near washington sq.", "city": "new york", "type": "american", "postal": "10011", "addr_variations": {"__class__": "frozenset", "__value__": ["103 waverly plain near washington square", "103 waverly place near washington square"]}, "latlng": {"__class__": "tuple", "__value__": [40.7324496, -73.9987276]}}, {"name": "caffe dell artista", "addr": "46 greenwich ave.", "city": "new york", "type": "coffee bar", "postal": "10011", "addr_variations": {"__class__": "frozenset", "__value__": ["46 greenwich avenue"]}, "latlng": {"__class__": "tuple", "__value__": [40.735596, -74.000357]}}]}, {"__class__": "tuple", "__value__": [{"name": "main street", "addr": "446 columbus ave. between 81st and 82nd sts.", "city": "new york", "type": "american", "postal": "10024", "addr_variations": {"__class__": "frozenset", "__value__": ["446 columbus avenue between 81st and 82 streets", "446 columbus avenue between 81 and 82 streets", "446 columbus avenue between 81st and 82nd streets", "446 columbus avenue between 81 and 82nd streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.784841, -73.97746599999999]}}, {"name": "rain", "addr": "100 w. 
82nd st.", "city": "new york", "type": "asian", "postal": "10024", "addr_variations": {"__class__": "frozenset", "__value__": ["100 w 82 street", "100 w 82nd saint", "100 w 82 saint", "100 west 82nd street", "100 w 82nd street", "100 west 82nd saint", "100 west 82 saint", "100 west 82 street"]}, "latlng": {"__class__": "tuple", "__value__": [40.7839758, -73.9745045]}}]}, {"__class__": "tuple", "__value__": [{"name": "folie", "addr": "2316 polk st.", "city": "san francisco", "type": "french", "postal": "94109", "addr_variations": {"__class__": "frozenset", "__value__": ["2316 polk street", "2316 polk saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.7981417, -122.4220609]}}, {"name": "mario bohemian cigar store cafe", "addr": "2209 polk st.", "city": "san francisco", "type": "italian", "postal": "94109", "addr_variations": {"__class__": "frozenset", "__value__": ["2209 polk street", "2209 polk saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.7971197, -122.4222379]}}]}, {"__class__": "tuple", "__value__": [{"name": "locanda veneta", "addr": "3rd st.", "city": "los angeles", "type": "italian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}, {"name": "cava", "addr": "3rd st.", "city": "los angeles", "type": "mediterranean", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}]}, {"__class__": "tuple", "__value__": [{"name": "lanza restaurant", "addr": "168 1st ave. 
between 10th and 11th sts.", "city": "new york", "type": "italian", "postal": "10009", "addr_variations": {"__class__": "frozenset", "__value__": ["168 1st avenue between 10 and 11 streets", "168 1st avenue between 10 and 11th streets", "168 1st avenue between 10th and 11th streets", "168 1 avenue between 10 and 11th streets", "168 1 avenue between 10 and 11 streets", "168 1 avenue between 10th and 11th streets", "168 1st avenue between 10th and 11 streets", "168 1 avenue between 10th and 11 streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.728755, -73.98406500000002]}}, {"name": "xunta", "addr": "174 1st ave. between 10th and 11th sts.", "city": "new york", "type": "mediterranean", "postal": "10009", "addr_variations": {"__class__": "frozenset", "__value__": ["174 1 avenue between 10th and 11th streets", "174 1 avenue between 10th and 11 streets", "174 1st avenue between 10 and 11th streets", "174 1st avenue between 10th and 11 streets", "174 1st avenue between 10th and 11th streets", "174 1 avenue between 10 and 11 streets", "174 1 avenue between 10 and 11th streets", "174 1st avenue between 10 and 11 streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.72907, -73.983948]}}]}, {"__class__": "tuple", "__value__": [{"name": "caffe lure", "addr": "169 sullivan st. between houston and bleecker sts.", "city": "new york", "type": "french", "postal": "10012", "addr_variations": {"__class__": "frozenset", "__value__": ["169 sullivan street between houston and bleecker streets", "169 sullivan saint between houston and bleecker streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.7279278, -74.0009847]}}, {"name": "caffe reggio", "addr": "119 macdougal st. 
between 3rd and bleecker sts.", "city": "new york", "type": "coffee bar", "postal": "10012", "addr_variations": {"__class__": "frozenset", "__value__": ["119 macdougal saint between 3rd and bleecker streets", "119 macdougal street between 3rd and bleecker streets", "119 macdougal street between 3 and bleecker streets", "119 macdougal saint between 3 and bleecker streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.73030790000001, -74.0003706]}}]}, {"__class__": "tuple", "__value__": [{"name": "ritz carlton dining room buckhead", "addr": "3434 peachtree rd. ne", "city": "atlanta", "type": "american (new)", "postal": "30326", "addr_variations": {"__class__": "frozenset", "__value__": ["3434 peachtree road northeast", "3434 peachtree road nebraska", "3434 peachtree road ne"]}, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "monte carlo", "addr": "3145 las vegas blvd. s.", "city": "las vegas", "type": "french (new)", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3145 las vegas boulevard sur", "3145 las vegas boulevard s", "3145 las vegas boulevard san", "3145 las vegas boulevard south"]}, "latlng": {"__class__": "tuple", "__value__": [36.127675, -115.1664725]}}]}, {"__class__": "tuple", "__value__": [{"name": "park avenue cafe", "addr": "100 e. 63rd st.", "city": "new york", "type": "american", "postal": "10065", "addr_variations": {"__class__": "frozenset", "__value__": ["100 e 63 saint", "100 east 63rd saint", "100 east 63rd street", "100 e 63rd street", "100 e 63rd saint", "100 e 63 street", "100 east 63 street", "100 east 63 saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.7650225, -73.9676044]}}, {"name": "west beach cafe", "addr": "60 n. 
venice blvd.", "city": "los angeles", "type": "american", "postal": "90291", "addr_variations": {"__class__": "frozenset", "__value__": ["60 north venice boulevard", "60 n venice boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [33.984674, -118.4703147]}}]}, {"__class__": "tuple", "__value__": [{"name": "rain", "addr": "100 w. 82nd st.", "city": "new york", "type": "asian", "postal": "10024", "addr_variations": {"__class__": "frozenset", "__value__": ["100 w 82 street", "100 w 82nd saint", "100 w 82 saint", "100 west 82nd street", "100 w 82nd street", "100 west 82nd saint", "100 west 82 saint", "100 west 82 street"]}, "latlng": {"__class__": "tuple", "__value__": [40.7839758, -73.9745045]}}, {"name": "splendido embarcadero", "addr": "4", "city": "san francisco", "type": "mediterranean", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["4"]}, "latlng": {"__class__": "tuple", "__value__": [37.7773755, -122.395447]}}]}, {"__class__": "tuple", "__value__": [{"name": "cava", "addr": "3rd st.", "city": "los angeles", "type": "mediterranean", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}, {"name": "veniero pasticceria", "addr": "342 e. 11th st. 
near 1st ave.", "city": "new york", "type": "coffee bar", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["342 east 11th saint near 1 avenue", "342 e 11th saint near 1 avenue", "342 east 11 saint near 1 avenue", "342 east 11 street near 1st avenue", "342 east 11th street near 1st avenue", "342 e 11 saint near 1 avenue", "342 east 11th street near 1 avenue", "342 e 11 street near 1st avenue", "342 east 11th saint near 1st avenue", "342 e 11th street near 1 avenue", "342 e 11th saint near 1st avenue", "342 e 11 saint near 1st avenue", "342 east 11 saint near 1st avenue", "342 e 11 street near 1 avenue", "342 e 11th street near 1st avenue", "342 east 11 street near 1 avenue"]}, "latlng": {"__class__": "tuple", "__value__": [40.7294893, -73.98452019999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "szechuan hunan cottage", "addr": "1588 york ave.", "city": "new york city", "type": "chinese", "postal": "10028", "addr_variations": {"__class__": "frozenset", "__value__": ["1588 york avenue"]}, "latlng": {"__class__": "tuple", "__value__": [40.7743013, -73.94803689999999]}}, {"name": "szechuan kitchen", "addr": "1460 first ave.", "city": "new york city", "type": "chinese", "postal": "10021", "addr_variations": {"__class__": "frozenset", "__value__": ["1460 1st avenue", "1460 1 avenue"]}, "latlng": {"__class__": "tuple", "__value__": [40.7700976, -73.95371999999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "jody maroni sausage kingdom", "addr": "2011 ocean front walk", "city": "venice", "type": "hot dogs", "postal": "90291", "addr_variations": {"__class__": "frozenset", "__value__": ["2011 ocean front walk"]}, "latlng": {"__class__": "tuple", "__value__": [33.9846332, -118.471432]}}, {"name": "joe", "addr": "1023 abbot kinney blvd.", "city": "venice", "type": "american (new)", "postal": "90291", "addr_variations": {"__class__": "frozenset", "__value__": ["1023 abbot kinney boulevard"]}, "latlng": {"__class__": "tuple", 
"__value__": [33.9922429, -118.4718658]}}]}], "match": [{"__class__": "tuple", "__value__": [{"name": "dawat", "addr": "210 e. 58th st.", "city": "new york", "type": "asian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["210 e 58th street", "210 east 58 street", "210 east 58th street", "210 e 58 street", "210 e 58 saint", "210 e 58th saint", "210 east 58 saint", "210 east 58th saint"]}, "latlng": null}, {"name": "dawat", "addr": "210 e. 58th st.", "city": "new york city", "type": "indian", "postal": "10022", "addr_variations": {"__class__": "frozenset", "__value__": ["210 e 58th street", "210 east 58 street", "210 east 58th street", "210 e 58 street", "210 e 58 saint", "210 e 58th saint", "210 east 58 saint", "210 east 58th saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.7604227, -73.9664276]}}]}, {"__class__": "tuple", "__value__": [{"name": "art delicatessen", "addr": "12224 ventura blvd.", "city": "studio city", "type": "american", "postal": "91604", "addr_variations": {"__class__": "frozenset", "__value__": ["12224 ventura boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.1429661, -118.3994688]}}, {"name": "art deli", "addr": "12224 ventura blvd.", "city": "los angeles", "type": "delis", "postal": "91604", "addr_variations": {"__class__": "frozenset", "__value__": ["12224 ventura boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.1429661, -118.3994688]}}]}, {"__class__": "tuple", "__value__": [{"name": "gotham bar grill", "addr": "12 e 12th st", "city": "new york city", "type": "new american", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["12 east 12 saint", "12 east 12 street", "12 east 12th street", "12 e 12th saint", "12 east 12th saint", "12 e 12th street", "12 e 12 street", "12 e 12 saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.734207, -73.99369899999999]}}, {"name": "gotham", "addr": "12 e 12th st", "city": "new york", "type": "new 
american", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["12 east 12 saint", "12 east 12 street", "12 east 12th street", "12 e 12th saint", "12 east 12th saint", "12 e 12th street", "12 e 12 street", "12 e 12 saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.734207, -73.99369899999999]}}]}, {"__class__": "tuple", "__value__": [{"name": "plumpjack cafe", "addr": "3201 fillmore st.", "city": "san francisco", "type": "mediterranean", "postal": "94123", "addr_variations": {"__class__": "frozenset", "__value__": ["3201 fillmore saint", "3201 fillmore street"]}, "latlng": {"__class__": "tuple", "__value__": [37.79911990000001, -122.4360911]}}, {"name": "plumpjack cafe", "addr": "3127 fillmore st.", "city": "san francisco", "type": "american (new)", "postal": "94123", "addr_variations": {"__class__": "frozenset", "__value__": ["3127 fillmore street", "3127 fillmore saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.79834090000001, -122.4359412]}}]}, {"__class__": "tuple", "__value__": [{"name": "dining room ritz carlton buckhead", "addr": "3434 peachtree rd.", "city": "atlanta", "type": "international", "postal": "30326", "addr_variations": {"__class__": "frozenset", "__value__": ["3434 peachtree road"]}, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}, {"name": "ritz carlton dining room buckhead", "addr": "3434 peachtree rd. ne", "city": "atlanta", "type": "american (new)", "postal": "30326", "addr_variations": {"__class__": "frozenset", "__value__": ["3434 peachtree road northeast", "3434 peachtree road nebraska", "3434 peachtree road ne"]}, "latlng": {"__class__": "tuple", "__value__": [33.8508073, -84.364227]}}]}, {"__class__": "tuple", "__value__": [{"name": "philippe original", "addr": "1001 n. 
alameda st.", "city": "los angeles", "type": "american", "postal": "90012", "addr_variations": {"__class__": "frozenset", "__value__": ["1001 nosso alameda saint", "1001 north alameda street", "1001 n alameda santo", "1001 n alameda street", "1001 north alameda santo", "1001 norte alameda street", "1001 north alameda saint", "1001 nosso alameda santo", "1001 n alameda saint", "1001 norte alameda saint", "1001 norte alameda santo", "1001 nosso alameda street"]}, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}, {"name": "philippe original", "addr": "1001 north alameda", "city": "los angeles", "type": "sandwiches", "postal": "90012", "addr_variations": {"__class__": "frozenset", "__value__": ["1001 north alameda"]}, "latlng": {"__class__": "tuple", "__value__": [34.059721, -118.237025]}}]}, {"__class__": "tuple", "__value__": [{"name": "hotel bel air", "addr": "701 stone canyon rd.", "city": "bel air", "type": "californian", "postal": "90077", "addr_variations": {"__class__": "frozenset", "__value__": ["701 stone canyon road"]}, "latlng": {"__class__": "tuple", "__value__": [34.0865944, -118.4463507]}}, {"name": "bel air hotel", "addr": "701 stone canyon rd.", "city": "bel air", "type": "californian", "postal": "90077", "addr_variations": {"__class__": "frozenset", "__value__": ["701 stone canyon road"]}, "latlng": {"__class__": "tuple", "__value__": [34.0865944, -118.4463507]}}]}, {"__class__": "tuple", "__value__": [{"name": "fenix", "addr": "8358 sunset blvd. west", "city": "hollywood", "type": "american", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["8358 sunset boulevard west"]}, "latlng": {"__class__": "tuple", "__value__": [34.0950968, -118.3719666]}}, {"name": "fenix at argyle", "addr": "8358 sunset blvd. 
west", "city": "hollywood", "type": "american", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["8358 sunset boulevard west"]}, "latlng": {"__class__": "tuple", "__value__": [34.0950968, -118.3719666]}}]}, {"__class__": "tuple", "__value__": [{"name": "lulu", "addr": "816 folsom st.", "city": "san francisco", "type": "mediterranean", "postal": "94107", "addr_variations": {"__class__": "frozenset", "__value__": ["816 folsom street", "816 folsom saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.7817926, -122.4018175]}}, {"name": "lulu restaurant bis cafe", "addr": "816 folsom st.", "city": "san francisco", "type": "mediterranean", "postal": "94107", "addr_variations": {"__class__": "frozenset", "__value__": ["816 folsom street", "816 folsom saint"]}, "latlng": {"__class__": "tuple", "__value__": [37.7817926, -122.4018175]}}]}, {"__class__": "tuple", "__value__": [{"name": "pinot bistro", "addr": "12969 ventura blvd.", "city": "los angeles", "type": "french", "postal": "91604", "addr_variations": {"__class__": "frozenset", "__value__": ["12969 ventura boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.14571950000001, -118.4160795]}}, {"name": "pinot bistro", "addr": "12969 ventura boulevard", "city": "studio city", "type": "bistro", "postal": "91604", "addr_variations": {"__class__": "frozenset", "__value__": ["12969 ventura boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.14571950000001, -118.4160795]}}]}, {"__class__": "tuple", "__value__": [{"name": "gramercy tavern", "addr": "42 e. 20th st. between park ave. 
s and broadway", "city": "new york", "type": "american", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["42 east 20 street between park avenue south and broadway", "42 e 20 saint between park avenue san and broadway", "42 east 20th street between park avenue s and broadway", "42 east 20 saint between park avenue san and broadway", "42 e 20 street between park avenue san and broadway", "42 e 20th saint between park avenue s and broadway", "42 east 20th street between park avenue south and broadway", "42 east 20 street between park avenue s and broadway", "42 e 20th street between park avenue s and broadway", "42 east 20th saint between park avenue san and broadway", "42 east 20 street between park avenue san and broadway", "42 e 20th saint between park avenue south and broadway", "42 e 20th saint between park avenue san and broadway", "42 east 20 saint between park avenue s and broadway", "42 east 20 saint between park avenue south and broadway", "42 east 20th saint between park avenue south and broadway", "42 e 20 saint between park avenue south and broadway", "42 e 20th street between park avenue san and broadway", "42 e 20 street between park avenue s and broadway", "42 e 20 saint between park avenue s and broadway", "42 east 20th street between park avenue san and broadway", "42 east 20th saint between park avenue s and broadway", "42 e 20 street between park avenue south and broadway", "42 e 20th street between park avenue south and broadway"]}, "latlng": {"__class__": "tuple", "__value__": [40.7384555, -73.98850639999999]}}, {"name": "gramercy tavern", "addr": "42 e. 
20th st.", "city": "new york city", "type": "american (new)", "postal": "10003", "addr_variations": {"__class__": "frozenset", "__value__": ["42 east 20th street", "42 e 20th saint", "42 e 20th street", "42 east 20 street", "42 e 20 saint", "42 e 20 street", "42 east 20th saint", "42 east 20 saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.7384647, -73.9884665]}}]}, {"__class__": "tuple", "__value__": [{"name": "montrachet", "addr": "239 w. broadway between walker and white sts.", "city": "new york", "type": "french", "postal": "10013", "addr_variations": {"__class__": "frozenset", "__value__": ["239 w broadway between walker and white streets", "239 west broadway between walker and white streets"]}, "latlng": {"__class__": "tuple", "__value__": [40.7195598, -74.0057489]}}, {"name": "montrachet", "addr": "239 w. broadway", "city": "new york city", "type": "french bistro", "postal": "10013", "addr_variations": {"__class__": "frozenset", "__value__": ["239 west broadway", "239 w broadway"]}, "latlng": {"__class__": "tuple", "__value__": [40.7194666, -74.0057516]}}]}, {"__class__": "tuple", "__value__": [{"name": "smith wollensky", "addr": "201 e. 
49th st.", "city": "new york", "type": "american", "postal": "10017", "addr_variations": {"__class__": "frozenset", "__value__": ["201 east 49 saint", "201 e 49 saint", "201 east 49th street", "201 e 49th street", "201 e 49 street", "201 east 49th saint", "201 e 49th saint", "201 east 49 street"]}, "latlng": {"__class__": "tuple", "__value__": [40.755156, -73.9707177]}}, {"name": "smith wollensky", "addr": "797 third ave.", "city": "new york city", "type": "steakhouses", "postal": "10022", "addr_variations": {"__class__": "frozenset", "__value__": ["797 3rd avenue", "797 3 avenue"]}, "latlng": {"__class__": "tuple", "__value__": [40.7551704, -73.9707437]}}]}, {"__class__": "tuple", "__value__": [{"name": "tavern green", "addr": "in central park at 67th st.", "city": "new york", "type": "american", "postal": "10023", "addr_variations": {"__class__": "frozenset", "__value__": ["indiana central park at 67 saint", "in central park at 67 saint", "indiana central park at 67 street", "indiana central park at 67th street", "in central park at 67th street", "in central park at 67 street", "indiana central park at 67th saint", "in central park at 67th saint"]}, "latlng": {"__class__": "tuple", "__value__": [40.7730403, -73.97829449999999]}}, {"name": "tavern green", "addr": "central park west", "city": "new york city", "type": "american (new)", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["central park west"]}, "latlng": {"__class__": "tuple", "__value__": [40.7848582, -73.9696519]}}]}, {"__class__": "tuple", "__value__": [{"name": "montrachet", "addr": "3000 w. 
paradise rd.", "city": "las vegas", "type": "continental", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3000 w paradise road", "3000 west paradise road"]}, "latlng": {"__class__": "tuple", "__value__": [36.1362611, -115.1512539]}}, {"name": "montrachet bistro", "addr": "3000 paradise rd.", "city": "las vegas", "type": "french bistro", "postal": "89109", "addr_variations": {"__class__": "frozenset", "__value__": ["3000 paradise road"]}, "latlng": {"__class__": "tuple", "__value__": [36.1362611, -115.1512539]}}]}, {"__class__": "tuple", "__value__": [{"name": "ritz carlton restaurant dining room", "addr": "600 stockton st.", "city": "san francisco", "type": "american", "postal": "94108", "addr_variations": {"__class__": "frozenset", "__value__": ["600 stockton saint", "600 stockton street"]}, "latlng": {"__class__": "tuple", "__value__": [37.7918754, -122.4070392]}}, {"name": "ritz carlton dining room san francisco", "addr": "600 stockton st.", "city": "san francisco", "type": "french (new)", "postal": "94108", "addr_variations": {"__class__": "frozenset", "__value__": ["600 stockton saint", "600 stockton street"]}, "latlng": {"__class__": "tuple", "__value__": [37.7918754, -122.4070392]}}]}, {"__class__": "tuple", "__value__": [{"name": "arnie morton chicago", "addr": "435 s. 
la cienega blv.", "city": "los angeles", "type": "american", "postal": "90048", "addr_variations": {"__class__": "frozenset", "__value__": ["435 sur louisiana cienega boulevard", "435 s la cienega bulevar", "435 san louisiana cienega bulevar", "435 s lane cienega boulevard", "435 sur louisiana cienega bulevar", "435 san la cienega boulevard", "435 south lane cienega bulevar", "435 san lane cienega bulevar", "435 south lane cienega boulevard", "435 san la cienega bulevar", "435 sur lane cienega boulevard", "435 s louisiana cienega bulevar", "435 south louisiana cienega boulevard", "435 s louisiana cienega boulevard", "435 san lane cienega boulevard", "435 sur la cienega boulevard", "435 san louisiana cienega boulevard", "435 s lane cienega bulevar", "435 sur lane cienega bulevar", "435 s la cienega boulevard", "435 sur la cienega bulevar", "435 south la cienega bulevar", "435 south louisiana cienega bulevar", "435 south la cienega boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.070609, -118.376722]}}, {"name": "arnie morton", "addr": "435 s. 
la cienega boulevard", "city": "los angeles", "type": "steakhouses", "postal": "90048", "addr_variations": {"__class__": "frozenset", "__value__": ["435 south louisiana cienega boulevard", "435 s la cienega boulevard", "435 s louisiana cienega boulevard", "435 san lane cienega boulevard", "435 san la cienega boulevard", "435 south lane cienega boulevard", "435 san louisiana cienega boulevard", "435 s lane cienega boulevard", "435 south la cienega boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.070609, -118.376722]}}]}, {"__class__": "tuple", "__value__": [{"name": "palm", "addr": "9001 santa monica blvd.", "city": "los angeles", "type": "american", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["9001 santa monica boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.083064, -118.387282]}}, {"name": "palm los angeles", "addr": "9001 sta monica boulevard", "city": "hollywood", "type": "steakhouses", "postal": "90069", "addr_variations": {"__class__": "frozenset", "__value__": ["9001 santa monica boulevard", "9001 station monica boulevard"]}, "latlng": {"__class__": "tuple", "__value__": [34.083064, -118.387282]}}]}, {"__class__": "tuple", "__value__": [{"name": "fringale", "addr": "570 4th st.", "city": "san francisco", "type": "french", "postal": "94107", "addr_variations": {"__class__": "frozenset", "__value__": ["570 4th saint", "570 4 street", "570 4 saint", "570 4th street"]}, "latlng": {"__class__": "tuple", "__value__": [37.7785416, -122.3971931]}}, {"name": "fringale", "addr": "570 fourth st.", "city": "san francisco", "type": "french bistro", "postal": "94107", "addr_variations": {"__class__": "frozenset", "__value__": ["570 4th saint", "570 4 street", "570 4 saint", "570 4th street"]}, "latlng": {"__class__": "tuple", "__value__": [37.7785416, -122.3971931]}}]}, {"__class__": "tuple", "__value__": [{"name": "locanda veneta", "addr": "8638 w 3rd", "city": "st los angeles", "type": "italian", "postal": 
"90048", "addr_variations": {"__class__": "frozenset", "__value__": ["8638 west 3", "8638 w 3", "8638 wohnung 3rd", "8638 weg 3", "8638 west 3rd", "8638 wohnung 3", "8638 weg 3rd", "8638 w 3rd"]}, "latlng": {"__class__": "tuple", "__value__": [34.0734172, -118.3810964]}}, {"name": "locanda", "addr": "w. third st.", "city": "st los angeles", "type": "italian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["west 3 street", "w 3 street", "west 3rd saint", "w 3rd street", "west 3rd street", "west 3 saint", "w 3 saint", "w 3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [34.0689584, -118.3209281]}}]}, {"__class__": "tuple", "__value__": [{"name": "locanda veneta", "addr": "3rd st.", "city": "los angeles", "type": "italian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}, {"name": "locanda veneta", "addr": "8638 w. 
third st.", "city": "los angeles", "type": "italian", "postal": "90048", "addr_variations": {"__class__": "frozenset", "__value__": ["8638 west 3rd saint", "8638 w 3rd street", "8638 w 3 street", "8638 west 3 saint", "8638 w 3 saint", "8638 west 3 street", "8638 west 3rd street", "8638 w 3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [34.0734172, -118.3810964]}}]}, {"__class__": "tuple", "__value__": [{"name": "locanda veneta", "addr": "3rd st.", "city": "los angeles", "type": "italian", "postal": null, "addr_variations": {"__class__": "frozenset", "__value__": ["3 saint", "3rd street", "3 street", "3rd saint"]}, "latlng": {"__class__": "tuple", "__value__": [33.4947903, -112.069374]}}, {"name": "locanda veneta", "addr": "8638 w 3rd", "city": "st los angeles", "type": "italian", "postal": "90048", "addr_variations": {"__class__": "frozenset", "__value__": ["8638 west 3", "8638 w 3", "8638 wohnung 3rd", "8638 weg 3", "8638 west 3rd", "8638 wohnung 3", "8638 weg 3rd", "8638 w 3rd"]}, "latlng": {"__class__": "tuple", "__value__": [34.0734172, -118.3810964]}}]}]}
--------------------------------------------------------------------------------
/dedupe/variables/__init__.py:
--------------------------------------------------------------------------------
1 | from pkgutil import extend_path
2 | __path__ = extend_path(__path__, __name__)
3 |
--------------------------------------------------------------------------------
/dedupe/variables/custom_variables.py:
--------------------------------------------------------------------------------
1 | import logging
2 |
3 | from dedupe.variables.latlong import LatLongType
4 | from dedupe.variables.string import ShortStringType
5 | from recordlinkage.algorithms.distance import _haversine_distance
6 | from recordlinkage.algorithms.numeric import _exp_sim
7 | import jellyfish
8 | import numpy as np
9 |
10 |
11 | logger = logging.getLogger(__name__)
12 |
13 |
14 | class JaroWinklerType(ShortStringType):
15 |     type = "JaroWinkler"
16 |
17 |     def __init__(self, definition):
18 |         super().__init__(definition)
19 |
20 |         self.comparator = jellyfish.jaro_winkler
21 |
22 |
23 | class ExpLatLongType(LatLongType):
24 |     type = 'ExpLatLong'
25 |
26 |     @staticmethod
27 |     def comparator(x, y):
28 |         dist = _haversine_distance(*[*x, *y])
29 |         return _exp_sim(
30 |             np.float32(dist),
31 |             scale=np.float32(0.1),
32 |             offset=np.float32(0.01))
33 |
--------------------------------------------------------------------------------
/full-indexing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/full-indexing.png
--------------------------------------------------------------------------------
/graph_utils.py:
--------------------------------------------------------------------------------
1 | import itertools
2 | import matplotlib.pyplot as plt
3 | import networkx as nx
4 |
5 |
6 | def get_diff_pairs(
7 |     golden_pairs_set,
8 |     dedupe_found_pairs_set,
9 |     dedupe_unclustered_found_pairs_set,
10 |     diff_set_ids
11 | ):
12 |     diff_dedupe_clustered_pairs = [
13 |         (x, y) for x, y in dedupe_found_pairs_set
14 |         if x in diff_set_ids or y in diff_set_ids]
15 |     diff_dedupe_unclustered_pairs = [
16 |         (x, y) for x, y in dedupe_unclustered_found_pairs_set
17 |         if x in diff_set_ids or y in diff_set_ids]
18 |     diff_true_pairs = [
19 |         (x, y) for x, y in golden_pairs_set
20 |         if x in diff_set_ids or y in diff_set_ids]
21 |     diff_all_ids = set(itertools.chain.from_iterable(
22 |         diff_dedupe_clustered_pairs +
23 |         diff_dedupe_unclustered_pairs +
24 |         diff_true_pairs))
25 |     return (
26 |         diff_dedupe_clustered_pairs,
27 |         diff_dedupe_unclustered_pairs,
28 |         diff_true_pairs,
29 |         diff_all_ids
30 |     )
31 |
32 |
33 | def draw_pairs_graph(df, edges, nodes, edge_labels_dict, title):
34 |     G = nx.Graph()
35 |     for node in nodes:
36 |         G.add_node(node,
37 |                    name=str(node) + ':' + df.loc[node]['name'])
38 |     G.add_edges_from(edges)
39 |
40 |     plt.figure(figsize=(10, 6))
41 |     pos = nx.circular_layout(G)
42 |     nx.draw_networkx_nodes(G, pos, alpha=0.3, node_size=1000)
43 |     nx.draw_networkx_labels(G, pos, labels=nx.get_node_attributes(G, 'name'), font_size=20)
44 |     nx.draw_networkx_edges(G, pos, alpha=0.3, width=4)
45 |     edge_labels = {pair: edge_labels_dict[pair] for pair in edges
46 |                    if pair in edge_labels_dict}
47 |     nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=20)
48 |     plt.margins(0.5, 0.2)
49 |     plt.axis('off')
50 |     plt.title(title, fontdict={'fontsize': 24, 'fontweight': 'bold'})
51 |     plt.show()
52 |
53 |
54 | def show_cluster_graphs(
55 |     df,
56 |     golden_pairs_set,
57 |     dedupe_found_pairs_set,
58 |     dedupe_unclustered_found_pairs_set,
59 |     dedupe_unclustered_pairs_score_dict,
60 |     diff_set_ids
61 | ):
62 |     (
63 |         diff_dedupe_clustered_pairs,
64 |         diff_dedupe_unclustered_pairs,
65 |         diff_true_pairs,
66 |         diff_all_ids
67 |     ) = get_diff_pairs(
68 |         golden_pairs_set,
69 |         dedupe_found_pairs_set,
70 |         dedupe_unclustered_found_pairs_set,
71 |         diff_set_ids
72 |     )
73 |     display(df.loc[list(diff_all_ids)])
74 |     draw_pairs_graph(
75 |         df, diff_true_pairs, diff_set_ids, {}, "Truth")
76 |     draw_pairs_graph(
77 |         df, diff_dedupe_unclustered_pairs, diff_set_ids, dedupe_unclustered_pairs_score_dict, "Unclustered")
78 |     draw_pairs_graph(
79 |         df, diff_dedupe_clustered_pairs, diff_set_ids, dedupe_unclustered_pairs_score_dict, "Clustered")
80 |
--------------------------------------------------------------------------------
/requirements.in:
--------------------------------------------------------------------------------
1 | dateparser
2 | dedupe
3 | DoubleMetaphone
4 | geocoder
5 | jupyter
6 | matplotlib
7 | nameparser
8 | nbdime
9 | networkx
10 | numexpr
11 | numpy
12 | pandas
13 | phonenumbers
14 | postal
15 | probablepeople
16 | recordlinkage
17 | requests
18 | rise
19 | unidecode
20 | usaddress
21 | pip-tools
-------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # 2 | # This file is autogenerated by pip-compile 3 | # To update, run: 4 | # 5 | # pip-compile --build-isolation --output-file=requirements.txt requirements.in 6 | # 7 | affinegap==1.11 # via dedupe 8 | appnope==0.1.0 # via ipykernel, ipython 9 | attrs==19.3.0 # via jsonschema 10 | backcall==0.1.0 # via ipython 11 | bleach==3.1.1 # via nbconvert 12 | btrees==4.6.1 # via dedupe, zope.index 13 | categorical-distance==1.9 # via dedupe 14 | certifi==2019.11.28 # via requests 15 | cffi==1.14.0 # via persistent 16 | chardet==3.0.4 # via requests 17 | click==7.0 # via geocoder, pip-tools 18 | colorama==0.4.3 # via nbdime 19 | cycler==0.10.0 # via matplotlib 20 | dateparser==0.7.2 # via -r requirements.in 21 | datetime-distance==0.1.3 # via dedupe-variable-datetime 22 | decorator==4.4.1 # via ipython, networkx, ratelim, traitlets 23 | dedupe-hcluster==0.3.8 # via dedupe 24 | dedupe-variable-datetime==0.1.5 # via dedupe 25 | dedupe==1.10.0 # via -r requirements.in, dedupe-variable-datetime 26 | defusedxml==0.6.0 # via nbconvert 27 | doublemetaphone==0.1 # via -r requirements.in, dedupe, probablepeople 28 | entrypoints==0.3 # via nbconvert 29 | fastcluster==1.1.26 # via dedupe 30 | future==0.18.2 # via datetime-distance, dedupe-variable-datetime, geocoder, probablepeople, rlr, usaddress 31 | geocoder==1.38.1 # via -r requirements.in 32 | gitdb==4.0.2 # via gitpython 33 | gitpython==3.1.0 # via nbdime 34 | haversine==2.2.0 # via dedupe 35 | highered==0.2.1 # via dedupe 36 | idna==2.9 # via requests 37 | importlib-metadata==1.5.0 # via jsonschema 38 | ipykernel==5.1.4 # via ipywidgets, jupyter, jupyter-console, notebook, qtconsole 39 | ipython-genutils==0.2.0 # via nbformat, notebook, qtconsole, traitlets 40 | ipython==7.12.0 # via ipykernel, ipywidgets, jupyter-console 41 | ipywidgets==7.5.1 # 
via jupyter 42 | jedi==0.16.0 # via ipython 43 | jellyfish==0.7.2 # via recordlinkage 44 | jinja2==2.11.1 # via nbconvert, nbdime, notebook 45 | joblib==0.14.1 # via recordlinkage, scikit-learn 46 | jsonschema==3.2.0 # via nbformat 47 | jupyter-client==6.0.0 # via ipykernel, jupyter-console, notebook, qtconsole 48 | jupyter-console==6.1.0 # via jupyter 49 | jupyter-core==4.6.3 # via jupyter-client, nbconvert, nbformat, notebook, qtconsole 50 | jupyter==1.0.0 # via -r requirements.in 51 | kiwisolver==1.1.0 # via matplotlib 52 | levenshtein-search==1.4.5 # via dedupe 53 | markupsafe==1.1.1 # via jinja2 54 | matplotlib==3.1.3 # via -r requirements.in 55 | mistune==0.8.4 # via nbconvert 56 | nameparser==1.0.6 # via -r requirements.in 57 | nbconvert==5.6.1 # via jupyter, notebook 58 | nbdime==1.1.0 # via -r requirements.in 59 | nbformat==5.0.4 # via ipywidgets, nbconvert, nbdime, notebook 60 | networkx==2.4 # via -r requirements.in 61 | notebook==6.0.3 # via jupyter, nbdime, rise, widgetsnbextension 62 | numexpr==2.7.1 # via -r requirements.in 63 | numpy==1.18.1 # via -r requirements.in, categorical-distance, dedupe, dedupe-hcluster, fastcluster, highered, matplotlib, numexpr, pandas, pyhacrf-datamade, pylbfgs, recordlinkage, rlr, scikit-learn, scipy, simplecosine 64 | pandas==1.0.1 # via -r requirements.in, recordlinkage 65 | pandocfilters==1.4.2 # via nbconvert 66 | parso==0.6.2 # via jedi 67 | persistent==4.5.1 # via btrees, zope.index 68 | pexpect==4.8.0 # via ipython 69 | phonenumbers==8.11.4 # via -r requirements.in 70 | pickleshare==0.7.5 # via ipython 71 | pip-tools==4.5.1 # via -r requirements.in 72 | postal==1.1.8 # via -r requirements.in 73 | probableparsing==0.0.1 # via probablepeople, usaddress 74 | probablepeople==0.5.4 # via -r requirements.in 75 | prometheus-client==0.7.1 # via notebook 76 | prompt-toolkit==3.0.3 # via ipython, jupyter-console 77 | ptyprocess==0.6.0 # via pexpect, terminado 78 | pycparser==2.19 # via cffi 79 | pygments==2.5.2 # via 
ipython, jupyter-console, nbconvert, nbdime, qtconsole 80 | pyhacrf-datamade==0.2.5 # via highered 81 | pylbfgs==0.2.0.13 # via pyhacrf-datamade, rlr 82 | pyparsing==2.4.6 # via matplotlib 83 | pyrsistent==0.15.7 # via jsonschema 84 | python-crfsuite==0.9.6 # via probablepeople, usaddress 85 | python-dateutil==2.8.1 # via dateparser, datetime-distance, jupyter-client, matplotlib, pandas 86 | pytz==2019.3 # via dateparser, pandas, tzlocal 87 | pyzmq==19.0.0 # via jupyter-client, notebook 88 | qtconsole==4.6.0 # via jupyter 89 | ratelim==0.1.6 # via geocoder 90 | recordlinkage==0.14 # via -r requirements.in 91 | regex==2020.2.20 # via dateparser 92 | requests==2.23.0 # via -r requirements.in, geocoder, nbdime 93 | rise==5.6.1 # via -r requirements.in 94 | rlr==2.4.5 # via dedupe 95 | scikit-learn==0.22.1 # via recordlinkage 96 | scipy==1.4.1 # via recordlinkage, scikit-learn 97 | send2trash==1.5.0 # via notebook 98 | simplecosine==1.2 # via dedupe 99 | six==1.14.0 # via bleach, cycler, geocoder, jsonschema, nbdime, pip-tools, postal, pyrsistent, python-dateutil, traitlets, zope.index 100 | smmap==3.0.1 # via gitdb 101 | terminado==0.8.3 # via notebook 102 | testpath==0.4.4 # via nbconvert 103 | tornado==6.0.3 # via ipykernel, jupyter-client, nbdime, notebook, terminado 104 | traitlets==4.3.3 # via ipykernel, ipython, ipywidgets, jupyter-client, jupyter-core, nbconvert, nbformat, notebook, qtconsole 105 | tzlocal==2.0.0 # via dateparser 106 | unidecode==1.1.1 # via -r requirements.in 107 | urllib3==1.25.8 # via requests 108 | usaddress==0.5.10 # via -r requirements.in 109 | wcwidth==0.1.8 # via prompt-toolkit 110 | webencodings==0.5.1 # via bleach 111 | widgetsnbextension==3.5.1 # via ipywidgets 112 | zipp==3.0.0 # via importlib-metadata 113 | zope.index==5.0.0 # via dedupe 114 | zope.interface==4.7.1 # via btrees, persistent, zope.index 115 | 116 | # The following packages are considered to be unsafe in a requirements file: 117 | # setuptools 118 | 
-------------------------------------------------------------------------------- /restaurant-training.csv: -------------------------------------------------------------------------------- 1 | name,addr,city,phone,type,cluster locanda veneta,3rd st.,los angeles,310/274-1893,italian,13 locanda veneta,8638 w. third st.,los angeles,310-274-1893,italian,13 locanda veneta,8638 w 3rd,st los angeles,+1 310-274-1893,italian,13 cafe lalo,201 w. 83rd st.,new york,212/496-6031,coffee bar,26 cafe lalo,201 w. 83rd st.,new york city,212-496-6031,coffeehouses,26 les celebrites,160 central park s,new york,212/484-5113,french,42 les celebrites,155 w. 58th st.,new york city,212-484-5113,french (classic),42 second avenue deli,156 2nd ave. at 10th st.,new york,212/677-0606,delicatessen,58 second avenue deli,156 second ave.,new york city,212-677-0606,delis,58 smith & wollensky,201 e. 49th st.,new york,212/753-1530,american,62 smith & wollensky,797 third ave.,new york city,212-753-1530,steakhouses,62 chin's,3200 las vegas blvd. s,las vegas,702/733-8899,asian,67 chin's,3200 las vegas blvd. s.,las vegas,702-733-8899,chinese,67 toulouse,b peachtree rd.,atlanta,404/351-9533,french,92 toulouse,293-b peachtree rd.,atlanta,404-351-9533,french (new),92 rose pistola,532 columbus ave.,san francisco,415/399-0499,italian,111 rose pistola,532 columbus ave.,san francisco,415-399-0499,italian,111 bistro garden,176 n. canon dr.,los angeles,310/550-3900,californian,115 remi,3rd st. promenade,santa monica,310/393-6545,italian,159 remi,145 w. 53rd st.,new york,212/581-4242,italian,334 west,63rd street steakhouse 44 w. 63rd st.,new york,212/246-6363,american,375 bistro,3400 las vegas blvd. s,las vegas,702/791-7111,continental,429 mikado,3400 las vegas blvd. 
s,las vegas,702/791-7111,asian,446 l'osteria del forno,519 columbus ave.,san francisco,415/982-1124,italian,490 stars,150 redwood alley,san francisco,415/861-7827,american,514 stars cafe,500 van ness ave.,san francisco,415/861-4344,american,515 belvedere the,9882 little santa monica blvd.,beverly hills,310-788-2306,pacific new wave,537 bernard's,515 s. olive st.,los angeles,213-612-1580,continental,539 bistro 45,45 s. mentor ave.,pasadena,818-795-2478,californian,540 cafe '50s,838 lincoln blvd.,venice,310-399-1955,american,545 cafe blanc,9777 little santa monica blvd.,beverly hills,310-888-0108,pacific new wave,546 la cachette,10506 little santa monica blvd.,century city,310-470-4992,french (new),568 moongate,3400 las vegas blvd. s.,las vegas,702-791-7352,chinese,666 -------------------------------------------------------------------------------- /restaurant.original.csv: -------------------------------------------------------------------------------- 1 | name,addr,city,phone,type,class 2 | "arnie morton's of chicago","435 s. la cienega blv.","los angeles","310/246-1501","american",0 3 | "arnie morton's of chicago","435 s. la cienega blvd.","los angeles","310-246-1501","steakhouses",0 4 | "art's delicatessen","12224 ventura blvd.","studio city","818/762-1221","american",1 5 | "art's deli","12224 ventura blvd.","studio city","818-762-1221","delis",1 6 | "hotel bel-air","701 stone canyon rd.","bel air","310/472-1211","californian",2 7 | "bel-air hotel","701 stone canyon rd.","bel air","310-472-1211","californian",2 8 | "cafe bizou","14016 ventura blvd.","sherman oaks","818/788-3536","french",3 9 | "cafe bizou","14016 ventura blvd.","sherman oaks","818-788-3536","french bistro",3 10 | "campanile","624 s. la brea ave.","los angeles","213/938-1447","american",4 11 | "campanile","624 s. 
la brea ave.","los angeles","213-938-1447","californian",4 12 | "chinois on main","2709 main st.","santa monica","310/392-9025","french",5 13 | "chinois on main","2709 main st.","santa monica","310-392-9025","pacific new wave",5 14 | "citrus","6703 melrose ave.","los angeles","213/857-0034","californian",6 15 | "citrus","6703 melrose ave.","los angeles","213-857-0034","californian",6 16 | "fenix","8358 sunset blvd. west","hollywood","213/848-6677","american",7 17 | "fenix at the argyle","8358 sunset blvd.","w. hollywood","213-848-6677","french (new)",7 18 | "granita","23725 w. malibu rd.","malibu","310/456-0488","californian",8 19 | "granita","23725 w. malibu rd.","malibu","310-456-0488","californian",8 20 | "grill on the alley","9560 dayton way","los angeles","310/276-0615","american",9 21 | "grill the","9560 dayton way","beverly hills","310-276-0615","american (traditional)",9 22 | "restaurant katsu","1972 n. hillhurst ave.","los angeles","213/665-1891","asian",10 23 | "katsu","1972 hillhurst ave.","los feliz","213-665-1891","japanese",10 24 | "l'orangerie","903 n. la cienega blvd.","los angeles","310/652-9770","french",11 25 | "l'orangerie","903 n. la cienega blvd.","w. hollywood","310-652-9770","french (classic)",11 26 | "le chardonnay","8284 melrose ave.","los angeles","213/655-8880","french",12 27 | "le chardonnay (los angeles)","8284 melrose ave.","los angeles","213-655-8880","french bistro",12 28 | "locanda veneta","3rd st.","los angeles","310/274-1893","italian",13 29 | "locanda veneta","8638 w. third st.","los angeles","310-274-1893","italian",13 30 | "matsuhisa","129 n. la cienega blvd.","beverly hills","310/659-9639","asian",14 31 | "matsuhisa","129 n. la cienega blvd.","beverly hills","310-659-9639","seafood",14 32 | "the palm","9001 santa monica blvd.","los angeles","310/550-8811","american",15 33 | "palm the (los angeles)","9001 santa monica blvd.","w. 
hollywood","310-550-8811","steakhouses",15 34 | "patina","5955 melrose ave.","los angeles","213/467-1108","californian",16 35 | "patina","5955 melrose ave.","los angeles","213-467-1108","californian",16 36 | "philippe's the original","1001 n. alameda st.","los angeles","213/628-3781","american",17 37 | "philippe the original","1001 n. alameda st.","chinatown","213-628-3781","cafeterias",17 38 | "pinot bistro","12969 ventura blvd.","los angeles","818/990-0500","french",18 39 | "pinot bistro","12969 ventura blvd.","studio city","818-990-0500","french bistro",18 40 | "rex il ristorante","617 s. olive st.","los angeles","213/627-2300","italian",19 41 | "rex il ristorante","617 s. olive st.","los angeles","213-627-2300","nuova cucina italian",19 42 | "spago","1114 horn ave.","los angeles","310/652-4025","californian",20 43 | "spago (los angeles)","8795 sunset blvd.","w. hollywood","310-652-4025","californian",20 44 | "valentino","3115 pico blvd.","santa monica","310/829-4313","italian",21 45 | "valentino","3115 pico blvd.","santa monica","310-829-4313","italian",21 46 | "yujean kang's gourmet chinese cuisine","67 n. raymond ave.","los angeles","818/585-0855","asian",22 47 | "yujean kang's","67 n. raymond ave.","pasadena","818-585-0855","chinese",22 48 | "21 club","21 w. 52nd st.","new york","212/582-7200","american",23 49 | "21 club","21 w. 52nd st.","new york city","212-582-7200","american (new)",23 50 | "aquavit","13 w. 54th st.","new york","212/307-7311","continental",24 51 | "aquavit","13 w. 54th st.","new york city","212-307-7311","scandinavian",24 52 | "aureole","34 e. 61st st.","new york","212/ 319-1660","american",25 53 | "aureole","34 e. 61st st.","new york city","212-319-1660","american (new)",25 54 | "cafe lalo","201 w. 83rd st.","new york","212/496-6031","coffee bar",26 55 | "cafe lalo","201 w. 83rd st.","new york city","212-496-6031","coffeehouses",26 56 | "cafe des artistes","1 w. 
67th st.","new york","212/877-3500","continental",27 57 | "cafe des artistes","1 w. 67th st.","new york city","212-877-3500","french (classic)",27 58 | "carmine's","2450 broadway between 90th and 91st sts.","new york","212/362-2200","italian",28 59 | "carmine's","2450 broadway","new york city","212-362-2200","italian",28 60 | "carnegie deli","854 7th ave. between 54th and 55th sts.","new york","212/757-2245","delicatessen",29 61 | "carnegie deli","854 seventh ave.","new york city","212-757-2245","delis",29 62 | "chanterelle","2 harrison st. near hudson st.","new york","212/966-6960","american",30 63 | "chanterelle","2 harrison st.","new york city","212-966-6960","french (new)",30 64 | "daniel","20 e. 76th st.","new york","212/288-0033","french",31 65 | "daniel","20 e. 76th st.","new york city","212-288-0033","french (new)",31 66 | "dawat","210 e. 58th st.","new york","212/355-7555","asian",32 67 | "dawat","210 e. 58th st.","new york city","212-355-7555","indian",32 68 | "felidia","243 e. 58th st.","new york","212/758-1479","italian",33 69 | "felidia","243 e. 58th st.","new york city","212-758-1479","italian",33 70 | "four seasons grill room","99 e. 52nd st.","new york","212/754-9494","american",34 71 | "four seasons","99 e. 52nd st.","new york city","212-754-9494","american (new)",34 72 | "gotham bar & grill","12 e. 12th st.","new york","212/620-4020","american",35 73 | "gotham bar & grill","12 e. 12th st.","new york city","212-620-4020","american (new)",35 74 | "gramercy tavern","42 e. 20th st. between park ave. s and broadway","new york","212/477-0777","american",36 75 | "gramercy tavern","42 e. 20th st.","new york city","212-477-0777","american (new)",36 76 | "island spice","402 w. 44th st.","new york","212/765-1737","tel caribbean",37 77 | "island spice","402 w. 44th st.","new york city","212-765-1737","caribbean",37 78 | "jo jo","160 e. 64th st.","new york","212/223-5656","american",38 79 | "jo jo","160 e. 
64th st.","new york city","212-223-5656","french bistro",38 80 | "la caravelle","33 w. 55th st.","new york","212/586-4252","french",39 81 | "la caravelle","33 w. 55th st.","new york city","212-586-4252","french (classic)",39 82 | "la cote basque","60 w. 55th st. between 5th and 6th ave.","new york","212/688-6525","french",40 83 | "la cote basque","60 w. 55th st.","new york city","212-688-6525","french (classic)",40 84 | "le bernardin","155 w. 51st st.","new york","212/489-1515","french",41 85 | "le bernardin","155 w. 51st st.","new york city","212-489-1515","seafood",41 86 | "les celebrites","160 central park s","new york","212/484-5113","french",42 87 | "les celebrites","155 w. 58th st.","new york city","212-484-5113","french (classic)",42 88 | "lespinasse","2 e. 55th st.","new york","212/339-6719","american",43 89 | "lespinasse (new york city)","2 e. 55th st.","new york city","212-339-6719","asian",43 90 | "lutece","249 e. 50th st.","new york","212/752-2225","french",44 91 | "lutece","249 e. 50th st.","new york city","212-752-2225","french (classic)",44 92 | "manhattan ocean club","57 w. 58th st.","new york","212/ 371-7777","seafood",45 93 | "manhattan ocean club","57 w. 58th st.","new york city","212-371-7777","seafood",45 94 | "march","405 e. 58th st.","new york","212/754-6272","american",46 95 | "march","405 e. 58th st.","new york city","212-754-6272","american (new)",46 96 | "mesa grill","102 5th ave. between 15th and 16th sts.","new york","212/807-7400","american",47 97 | "mesa grill","102 fifth ave.","new york city","212-807-7400","southwestern",47 98 | "mi cocina","57 jane st. off hudson st.","new york","212/627-8273","mexican",48 99 | "mi cocina","57 jane st.","new york city","212-627-8273","mexican",48 100 | "montrachet","239 w. broadway between walker and white sts.","new york","212/ 219-2777","french",49 101 | "montrachet","239 w. broadway","new york city","212-219-2777","french bistro",49 102 | "oceana","55 e. 
54th st.","new york","212/759-5941","seafood",50 103 | "oceana","55 e. 54th st.","new york city","212-759-5941","seafood",50 104 | "park avenue cafe","100 e. 63rd st.","new york","212/644-1900","american",51 105 | "park avenue cafe (new york city)","100 e. 63rd st.","new york city","212-644-1900","american (new)",51 106 | "petrossian","182 w. 58th st.","new york","212/245-2214","french",52 107 | "petrossian","182 w. 58th st.","new york city","212-245-2214","russian",52 108 | "picholine","35 w. 64th st.","new york","212/724-8585","mediterranean",53 109 | "picholine","35 w. 64th st.","new york city","212-724-8585","mediterranean",53 110 | "pisces","95 ave. a at 6th st.","new york","212/260-6660","seafood",54 111 | "pisces","95 ave. a","new york city","212-260-6660","seafood",54 112 | "rainbow room","30 rockefeller plaza","new york","212/632-5000","or 212/632-5100 american",55 113 | "rainbow room","30 rockefeller plaza","new york city","212-632-5000","american (new)",55 114 | "river cafe","1 water st. at the east river","brooklyn","718/522-5200","american",56 115 | "river cafe","1 water st.","brooklyn","718-522-5200","american (new)",56 116 | "san domenico","240 central park s","new york","212/265-5959","italian",57 117 | "san domenico","240 central park s.","new york city","212-265-5959","italian",57 118 | "second avenue deli","156 2nd ave. at 10th st.","new york","212/677-0606","delicatessen",58 119 | "second avenue deli","156 second ave.","new york city","212-677-0606","delis",58 120 | "seryna","11 e. 53rd st.","new york","212/980-9393","asian",59 121 | "seryna","11 e. 53rd st.","new york city","212-980-9393","japanese",59 122 | "shun lee west","43 w. 65th st.","new york","212/371-8844","asian",60 123 | "shun lee palace","155 e. 55th st.","new york city","212-371-8844","chinese",60 124 | "sign of the dove","1110 3rd ave. 
at 65th st.","new york","212/861-8080","american",61 125 | "sign of the dove","1110 third ave.","new york city","212-861-8080","american (new)",61 126 | "smith & wollensky","201 e. 49th st.","new york","212/753-1530","american",62 127 | "smith & wollensky","797 third ave.","new york city","212-753-1530","steakhouses",62 128 | "tavern on the green","in central park at 67th st.","new york","212/873-3200","american",63 129 | "tavern on the green","central park west","new york city","212-873-3200","american (new)",63 130 | "uncle nick's","747 9th ave. between 50th and 51st sts.","new york","212/315-1726","mediterranean",64 131 | "uncle nick's","747 ninth ave.","new york city","212-245-7992","greek",64 132 | "union square cafe","21 e. 16th st.","new york","212/243-4020","american",65 133 | "union square cafe","21 e. 16th st.","new york city","212-243-4020","american (new)",65 134 | "virgil's","152 w. 44th st.","new york","212/ 921-9494","american",66 135 | "virgil's real bbq","152 w. 44th st.","new york city","212-921-9494","bbq",66 136 | "chin's","3200 las vegas blvd. s","las vegas","702/733-8899","asian",67 137 | "chin's","3200 las vegas blvd. s.","las vegas","702-733-8899","chinese",67 138 | "coyote cafe","3799 las vegas blvd. s","las vegas","702/891-7349","southwestern",68 139 | "coyote cafe (las vegas)","3799 las vegas blvd. s.","las vegas","702-891-7349","southwestern",68 140 | "le montrachet","3000 w. paradise rd.","las vegas","702/732-5111","continental",69 141 | "le montrachet bistro","3000 paradise rd.","las vegas","702-732-5651","french bistro",69 142 | "palace court","3570 las vegas blvd. s","las vegas","702/731-7547","continental",70 143 | "palace court","3570 las vegas blvd. s.","las vegas","702-731-7110","french (new)",70 144 | "second street grille","200 e. fremont st.","las vegas","702/385-3232","seafood",71 145 | "second street grill","200 e. fremont st.","las vegas","702-385-6277","pacific rim",71 146 | "steak house","2880 las vegas blvd. 
s","las vegas","702/734-0410","steak houses",72 147 | "steak house the","2880 las vegas blvd. s.","las vegas","702-734-0410","steakhouses",72 148 | "tillerman","2245 e. flamingo rd.","las vegas","702/731-4036","seafood",73 149 | "tillerman the","2245 e. flamingo rd.","las vegas","702-731-4036","steakhouses",73 150 | "abruzzi","2355 peachtree rd. peachtree battle shopping center","atlanta","404/261-8186","italian",74 151 | "abruzzi","2355 peachtree rd. ne","atlanta","404-261-8186","italian",74 152 | "bacchanalia","3125 piedmont rd. near peachtree rd.","atlanta","404/365-0410","international",75 153 | "bacchanalia","3125 piedmont rd.","atlanta","404-365-0410","californian",75 154 | "bone's","3130 piedmont road","atlanta","404/237-2663","american",76 155 | "bone's restaurant","3130 piedmont rd. ne","atlanta","404-237-2663","steakhouses",76 156 | "brasserie le coze","3393 peachtree rd. lenox square mall near neiman marcus","atlanta","404/266-1440","french",77 157 | "brasserie le coze","3393 peachtree rd.","atlanta","404-266-1440","french bistro",77 158 | "buckhead diner","3073 piedmont road","atlanta","404/262-3336","american",78 159 | "buckhead diner","3073 piedmont rd.","atlanta","404-262-3336","american (new)",78 160 | "ciboulette","1529 piedmont ave.","atlanta","404/874-7600","french",79 161 | "ciboulette restaurant","1529 piedmont ave.","atlanta","404-874-7600","french (new)",79 162 | "delectables","1 margaret mitchell sq.","atlanta","404/681-2909","american",80 163 | "delectables","1 margaret mitchell sq.","atlanta","404-681-2909","cafeterias",80 164 | "georgia grille","2290 peachtree rd. peachtree square shopping center","atlanta","404/352-3517","american",81 165 | "georgia grille","2290 peachtree rd.","atlanta","404-352-3517","southwestern",81 166 | "hedgerose heights inn","490 e. paces ferry rd.","atlanta","404/233-7673","international",82 167 | "hedgerose heights inn the","490 e. paces ferry rd. 
ne","atlanta","404-233-7673","continental",82 168 | "heera of india","595 piedmont ave. rio shopping mall","atlanta","404/876-4408","asian",83 169 | "heera of india","595 piedmont ave.","atlanta","404-876-4408","indian",83 170 | "indigo coastal grill","1397 n. highland ave.","atlanta","404/876-0676","caribbean",84 171 | "indigo coastal grill","1397 n. highland ave.","atlanta","404-876-0676","eclectic",84 172 | "la grotta","2637 peachtree rd. peachtree house condominium","atlanta","404/231-1368","italian",85 173 | "la grotta","2637 peachtree rd. ne","atlanta","404-231-1368","italian",85 174 | "mary mac's tea room","224 ponce de leon ave.","atlanta","404/876-1800","southern",86 175 | "mary mac's tea room","224 ponce de leon ave.","atlanta","404-876-1800","southern/soul",86 176 | "nikolai's roof","255 courtland st. at harris st.","atlanta","404/221-6362","continental",87 177 | "nikolai's roof","255 courtland st.","atlanta","404-221-6362","continental",87 178 | "pano's and paul's","1232 w. paces ferry rd.","atlanta","404/261-3662","international",88 179 | "pano's & paul's","1232 w. paces ferry rd.","atlanta","404-261-3662","american (new)",88 180 | "cafe ritz-carlton buckhead","3434 peachtree rd.","atlanta","404/237-2700","ext 6108 international",89 181 | "ritz-carlton cafe (buckhead)","3434 peachtree rd. ne","atlanta","404-237-2700","american (new)",89 182 | "dining room ritz-carlton buckhead","3434 peachtree rd.","atlanta","404/237-2700","international",90 183 | "ritz-carlton dining room (buckhead)","3434 peachtree rd. 
ne","atlanta","404-237-2700","american (new)",90 184 | "restaurant ritz-carlton atlanta","181 peachtree st.","atlanta","404/659-0400","continental",91 185 | "ritz-carlton restaurant","181 peachtree st.","atlanta","404-659-0400","french (classic)",91 186 | "toulouse","b peachtree rd.","atlanta","404/351-9533","french",92 187 | "toulouse","293-b peachtree rd.","atlanta","404-351-9533","french (new)",92 188 | "veni vidi vici","41 14th st.","atlanta","404/875-8424","italian",93 189 | "veni vidi vici","41 14th st.","atlanta","404-875-8424","italian",93 190 | "alain rondelli","126 clement st.","san francisco","415/387-0408","french",94 191 | "alain rondelli","126 clement st.","san francisco","415-387-0408","french (new)",94 192 | "aqua","252 california st.","san francisco","415/956-9662","seafood",95 193 | "aqua","252 california st.","san francisco","415-956-9662","american (new)",95 194 | "boulevard","1 mission st.","san francisco","415/543-6084","american",96 195 | "boulevard","1 mission st.","san francisco","415-543-6084","american (new)",96 196 | "cafe claude","7 claude la.","san francisco","415/392-3505","french",97 197 | "cafe claude","7 claude ln.","san francisco","415-392-3505","french bistro",97 198 | "campton place","340 stockton st.","san francisco","415/955-5555","american",98 199 | "campton place","340 stockton st.","san francisco","415-955-5555","american (new)",98 200 | "chez michel","804 northpoint","san francisco","415/775-7036","french",99 201 | "chez michel","804 north point st.","san francisco","415-775-7036","californian",99 202 | "fleur de lys","777 sutter st.","san francisco","415/673-7779","french",100 203 | "fleur de lys","777 sutter st.","san francisco","415-673-7779","french (new)",100 204 | "fringale","570 4th st.","san francisco","415/543-0573","french",101 205 | "fringale","570 fourth st.","san francisco","415-543-0573","french bistro",101 206 | "hawthorne lane","22 hawthorne st.","san francisco","415/777-9779","american",102 207 | 
"hawthorne lane","22 hawthorne st.","san francisco","415-777-9779","californian",102 208 | "khan toke thai house","5937 geary blvd.","san francisco","415/668-6654","asian",103 209 | "khan toke thai house","5937 geary blvd.","san francisco","415-668-6654","thai",103 210 | "la folie","2316 polk st.","san francisco","415/776-5577","french",104 211 | "la folie","2316 polk st.","san francisco","415-776-5577","french (new)",104 212 | "lulu","816 folsom st.","san francisco","415/495-5775","mediterranean",105 213 | "lulu restaurant-bis-cafe","816 folsom st.","san francisco","415-495-5775","mediterranean",105 214 | "masa's","648 bush st.","san francisco","415/989-7154","french",106 215 | "masa's","648 bush st.","san francisco","415-989-7154","french (new)",106 216 | "mifune japan center kintetsu building","1737 post st.","san francisco","415/922-0337","asian",107 217 | "mifune","1737 post st.","san francisco","415-922-0337","japanese",107 218 | "plumpjack cafe","3201 fillmore st.","san francisco","415/563-4755","mediterranean",108 219 | "plumpjack cafe","3127 fillmore st.","san francisco","415-563-4755","american (new)",108 220 | "postrio","545 post st.","san francisco","415/776-7825","american",109 221 | "postrio","545 post st.","san francisco","415-776-7825","californian",109 222 | "ritz-carlton restaurant and dining room","600 stockton st.","san francisco","415/296-7465","american",110 223 | "ritz-carlton dining room (san francisco)","600 stockton st.","san francisco","415-296-7465","french (new)",110 224 | "rose pistola","532 columbus ave.","san francisco","415/399-0499","italian",111 225 | "rose pistola","532 columbus ave.","san francisco","415-399-0499","italian",111 226 | "adriano's ristorante","2930 beverly glen circle","los angeles","310/475-9807","italian",112 227 | "barney greengrass","9570 wilshire blvd.","beverly hills","310/777-5877","american",113 228 | "beaurivage","26025 pacific coast hwy.","malibu","310/456-5733","french",114 229 | "bistro garden","176 n. 
canon dr.","los angeles","310/550-3900","californian",115 230 | "border grill","4th st.","los angeles","310/451-1655","mexican",116 231 | "broadway deli","3rd st. promenade","santa monica","310/451-0616","american",117 232 | "ca'brea","346 s. la brea ave.","los angeles","213/938-2863","italian",118 233 | "ca'del sol","4100 cahuenga blvd.","los angeles","818/985-4669","italian",119 234 | "cafe pinot","700 w. fifth st.","los angeles","213/239-6500","californian",120 235 | "california pizza kitchen","207 s. beverly dr.","los angeles","310/275-1101","californian",121 236 | "canter's","419 n. fairfax ave.","los angeles","213/651-2030.","american",122 237 | "cava","3rd st.","los angeles","213/658-8898","mediterranean",123 238 | "cha cha cha","656 n. virgil ave.","los angeles","213/664-7723","caribbean",124 239 | "chan dara","310 n. larchmont blvd.","los angeles","213/467-1052","asian",125 240 | "clearwater cafe","168 w. colorado blvd.","los angeles","818/356-0959","health food",126 241 | "dining room","9500 wilshire blvd.","los angeles","310/275-5200","californian",127 242 | "dive!","10250 santa monica blvd.","los angeles","310/788-","dive american",128 243 | "drago","2628 wilshire blvd.","santa monica","310/828-1585","italian",129 244 | "drai's","730 n. la cienega blvd.","los angeles","310/358-8585","french",130 245 | "dynasty room","930 hilgard ave.","los angeles","310/208-8765","continental",131 246 | "eclipse","8800 melrose ave.","los angeles","310/724-5959","californian",132 247 | "ed debevic's","134 n. la cienega","los angeles","310/659-1952","american",133 248 | "el cholo","1121 s. western ave.","los angeles","213/734-2773","mexican",134 249 | "gilliland's","2424 main st.","santa monica","310/392-3901","american",135 250 | "gladstone's","4 fish 17300 pacific coast hwy. 
at sunset blvd.","pacific palisades","310/454-3474","american",136 251 | "hard rock cafe","8600 beverly blvd.","los angeles","310/276-7605","american",137 252 | "harry's bar & american grill","2020 ave. of the stars","los angeles","310/277-2333","italian",138 253 | "il fornaio cucina italiana","301 n. beverly dr.","los angeles","310/550-8330","italian",139 254 | "jack sprat's grill","10668 w. pico blvd.","los angeles","310/837-6662","health food",140 255 | "jackson's farm","439 n. beverly drive","los angeles","310/273-5578","californian",141 256 | "jimmy's","201 moreno dr.","los angeles","310/552-2394","continental",142 257 | "joss","9255 sunset blvd.","los angeles","310/276-1886","asian",143 258 | "le colonial","8783 beverly blvd.","los angeles","310/289-0660","asian",144 259 | "le dome","8720 sunset blvd.","los angeles","310/659-6919","french",145 260 | "louise's trattoria","4500 los feliz blvd.","los angeles","213/667-0777","italian",146 261 | "mon kee seafood restaurant","679 n. spring st.","los angeles","213/628-6717","asian",147 262 | "morton's","8764 melrose ave.","los angeles","310/276-5205","american",148 263 | "nate 'n' al's","414 n. beverly dr.","los angeles","310/274-0101","american",149 264 | "nicola","601 s. figueroa st.","los angeles","213/485-0927","american",150 265 | "ocean avenue","1401 ocean ave.","santa monica","310/394-5669","american",151 266 | "orleans","11705 national blvd.","los angeles","310/479-4187","cajun",152 267 | "pacific dining car","6th st.","los angeles","213/483-6000","american",153 268 | "paty's","10001 riverside dr.","toluca lake","818/761-9126","american",154 269 | "pinot hollywood","1448 n. gower st.","los angeles","213/461-8800","californian",155 270 | "posto","14928 ventura blvd.","sherman oaks","818/784-4400","italian",156 271 | "prego","362 n. camden dr.","los angeles","310/277-7346","italian",157 272 | "rj's the rib joint","252 n. beverly dr.","los angeles","310/274-7427","american",158 273 | "remi","3rd st. 
promenade","santa monica","310/393-6545","italian",159 274 | "restaurant horikawa","111 s. san pedro st.","los angeles","213/680-9355","asian",160 275 | "roscoe's house of chicken 'n' waffles","1514 n. gower st.","los angeles","213/466-9329","american",161 276 | "schatzi on main","3110 main st.","los angeles","310/399-4800","continental",162 277 | "sofi","3rd st.","los angeles","213/651-0346","mediterranean",163 278 | "swingers","8020 beverly blvd.","los angeles","213/653-5858","american",164 279 | "tavola calda","7371 melrose ave.","los angeles","213/658-6340","italian",165 280 | "the mandarin","430 n. camden dr.","los angeles","310/859-0926","asian",166 281 | "tommy tang's","7313 melrose ave.","los angeles","213/937-5733","asian",167 282 | "tra di noi","3835 cross creek rd.","los angeles","310/456-0169","italian",168 283 | "trader vic's","9876 wilshire blvd.","los angeles","310/276-6345","asian",169 284 | "vida","1930 north hillhurst ave.","los feliz","213/660-4446","american",170 285 | "west beach cafe","60 n. venice blvd.","los angeles","310/823-5396","american",171 286 | "20 mott","20 mott st. between bowery and pell st.","new york","212/964-0380","asian",172 287 | "9 jones street","9 jones st.","new york","212/989-1220","american",173 288 | "adrienne","700 5th ave. at 55th st.","new york","212/903-3918","french",174 289 | "agrotikon","322 e. 14 st. between 1st and 2nd aves.","new york","212/473-2602","mediterranean",175 290 | "aja","937 broadway at 22nd st.","new york","212/473-8388","american",176 291 | "alamo","304 e. 48th st.","new york","212/ 759-0590","mexican",177 292 | "alley's end","311 w. 17th st.","new york","212/627-8899","american",178 293 | "ambassador grill","1 united nations plaza at 44th st.","new york","212/702-5014","american",179 294 | "american place","2 park ave. at 32nd st.","new york","212/684-2122","american",180 295 | "anche vivolo","222 e. 58th st. 
between 2nd and 3rd aves.","new york","212/308-0112","italian",181 296 | "arizona","206 206 e. 60th st.","new york","212/838-0440","american",182 297 | "arturo's","106 w. houston st. off thompson st.","new york","212/677-3820","italian",183 298 | "au mandarin","200-250 vesey st. world financial center","new york","212/385-0313","asian",184 299 | "bar anise","1022 3rd ave. between 60th and 61st sts.","new york","212/355-1112","mediterranean",185 300 | "barbetta","321 w. 46th st.","new york","212/246-9171","italian",186 301 | "ben benson's","123 w. 52nd st.","new york","212/581-8888","american",187 302 | "big cup","228 8th ave. between 21st and 22nd sts.","new york","212/206-0059","coffee bar",188 303 | "billy's","948 1st ave. between 52nd and 53rd sts.","new york","212/753-1870","american",189 304 | "boca chica","13 1st ave. near 1st st.","new york","212/473-0108","latin american",190 305 | "bolo","23 e. 22nd st.","new york","212/228-2200","mediterranean",191 306 | "boonthai","1393a 2nd ave. between 72nd and 73rd sts.","new york","212/249-8484","asian",192 307 | "bouterin","420 e. 59th st. off 1st ave.","new york","212/758-0323","french",193 308 | "brothers bar-b-q","225 varick st. at clarkston st.","new york","212/727-2775","american",194 309 | "bruno","240 e. 58th st.","new york","212/688-4190","italian",195 310 | "bryant park grill roof restaurant and bp cafe","25 w. 40th st. between 5th and 6th aves.","new york","212/840-6500","american",196 311 | "c3","103 waverly pl. near washington sq.","new york","212/254-1200","american",197 312 | "ct","111 e. 22nd st. between park ave. s and lexington ave.","new york","212/995-8500","french",198 313 | "cafe bianco","1486 2nd ave. between 77th and 78th sts.","new york","212/988-2655","coffee bar",199 314 | "cafe botanica","160 central park s","new york","212/484-5120","french",200 315 | "cafe la fortuna","69 w. 71st st.","new york","212/724-5846","coffee bar",201 316 | "cafe luxembourg","200 w. 
70th st.","new york","212/873-7411","french",202 317 | "cafe pierre","2 e. 61st st.","new york","212/940-8185","french",203 318 | "cafe centro","200 park ave. between 45th st. and vanderbilt ave.","new york","212/818-1222","french",204 319 | "cafe fes","246 w. 4th st. at charles st.","new york","212/924-7653","mediterranean",205 320 | "caffe dante","81 macdougal st. between houston and bleeker sts.","new york","212/982-5275","coffee bar",206 321 | "caffe dell'artista","46 greenwich ave.","new york","212/645-4431","coffee bar",207 322 | "caffe lure","169 sullivan st. between houston and bleecker sts.","new york","212/473-2642","french",208 323 | "caffe reggio","119 macdougal st. between 3rd and bleecker sts.","new york","212/475-9557","coffee bar",209 324 | "caffe roma","385 broome st. at mulberry","new york","212/226-8413","coffee bar",210 325 | "caffe vivaldi","32 jones st. at bleecker st.","new york","212/691-7538","coffee bar",211 326 | "caffe bondi ristorante","7 w. 20th st.","new york","212/691-8136","italian",212 327 | "capsouto freres","451 washington st. near watts st.","new york","212/966-4900","french",213 328 | "captain's table","860 2nd ave. at 46th st.","new york","212/697-9538","seafood",214 329 | "casa la femme","150 wooster st. between houston and prince sts.","new york","212/505-0005","middle eastern",215 330 | "cendrillon asian grill & marimba bar","45 mercer st. between broome and grand sts.","new york","212/343-9012","asian",216 331 | "chez jacqueline","72 macdougal st. between w. houston and bleecker sts.","new york","212/505-0727","french",217 332 | "chiam","160 e. 48th st.","new york","212/371-2323","asian",218 333 | "china grill","60 w. 53rd st.","new york","212/333-7788","american",219 334 | "cite","120 w. 51st st.","new york","212/956-7100","french",220 335 | "coco pazzo","23 e. 
74th st.","new york","212/794-0205","italian",221 336 | "columbus bakery","53rd sts.","new york","212/421-0334","coffee bar",222 337 | "corrado cafe","1013 3rd ave. between 60th and 61st sts.","new york","212/753-5100","coffee bar",223 338 | "cupcake cafe","522 9th ave. at 39th st.","new york","212/465-1530","coffee bar",224 339 | "da nico","164 mulberry st. between grand and broome sts.","new york","212/343-1212","italian",225 340 | "dean & deluca","121 prince st.","new york","212/254-8776","coffee bar",226 341 | "diva","341 w. broadway near grand st.","new york","212/941-9024","italian",227 342 | "dix et sept","181 w. 10th st.","new york","212/645-8023","french",228 343 | "docks","633 3rd ave. at 40th st.","new york","212/ 986-8080","seafood",229 344 | "duane park cafe","157 duane st. between w. broadway and hudson st.","new york","212/732-5555","american",230 345 | "el teddy's","219 w. broadway between franklin and white sts.","new york","212/941-7070","mexican",231 346 | "emily's","1325 5th ave. at 111th st.","new york","212/996-1212","american",232 347 | "empire korea","6 e. 32nd st.","new york","212/725-1333","asian",233 348 | "ernie's","2150 broadway between 75th and 76th sts.","new york","212/496-1588","american",234 349 | "evergreen cafe","1288 1st ave. at 69th st.","new york","212/744-3266","asian",235 350 | "f. ille ponte ristorante","39 desbrosses st. near west st.","new york","212/226-4621","italian",236 351 | "felix","340 w. broadway at grand st.","new york","212/431-0021","french",237 352 | "ferrier","29 e. 65th st.","new york","212/772-9000","french",238 353 | "fifty seven fifty seven","57 e. 57th st.","new york","212/758-5757","american",239 354 | "film center cafe","635 9th ave. between 44th and 45th sts.","new york","212/ 262-2525","american",240 355 | "fiorello's roman cafe","1900 broadway between 63rd and 64th sts.","new york","212/595-5330","italian",241 356 | "firehouse","522 columbus ave. 
between 85th and 86th sts.","new york","212/595-3139","american",242 357 | "first","87 1st ave. between 5th and 6th sts.","new york","212/674-3823","american",243 358 | "fishin eddie","73 w. 71st st.","new york","212/874-3474","seafood",244 359 | "fleur de jour","348 e. 62nd st.","new york","212/355-2020","coffee bar",245 360 | "flowers","21 west 17th st. between 5th and 6th aves.","new york","212/691-8888","american",246 361 | "follonico","6 w. 24th st.","new york","212/691-6359","italian",247 362 | "fraunces tavern","54 pearl st. at broad st.","new york","212/269-0144","american",248 363 | "french roast","458 6th ave. at 11th st.","new york","212/533-2233","french",249 364 | "french roast cafe","2340 broadway at 85th st.","new york","212/799-1533","coffee bar",250 365 | "frico bar","402 w. 43rd st. off 9th ave.","new york","212/564-7272","italian",251 366 | "fujiyama mama","467 columbus ave. between 82nd and 83rd sts.","new york","212/769-1144","asian",252 367 | "gabriela's","685 amsterdam ave. at 93rd st.","new york","212/961-0574","mexican",253 368 | "gallagher's","228 w. 52nd st.","new york","212/245-5336","american",254 369 | "gianni's","15 fulton st.","new york","212/608-7300","seafood",255 370 | "girafe","208 e. 58th st. between 2nd and 3rd aves.","new york","212/752-3054","italian",256 371 | "global","33 93 2nd ave. between 5th and 6th sts.","new york","212/477-8427","american",257 372 | "golden unicorn","18 e. broadway at catherine st.","new york","212/ 941-0911","asian",258 373 | "grand ticino","228 thompson st. between w. 3rd and bleecker sts.","new york","212/777-5922","italian",259 374 | "halcyon","151 w. 54th st. in the rihga royal hotel","new york","212/468-8888","american",260 375 | "hard rock cafe","221 w. 57th st.","new york","212/489-6565","american",261 376 | "hi-life restaurant and lounge","1340 1st ave. at 72nd st.","new york","212/249-3600","american",262 377 | "home","20 cornelia st. between bleecker and w. 
4th st.","new york","212/243-9579","american",263 378 | "hudson river club","4 world financial center","new york","212/786-1500","american",264 379 | "i trulli","122 e. 27th st. between lexington and park aves.","new york","212/481-7372","italian",265 380 | "il cortile","125 mulberry st. between canal and hester sts.","new york","212/226-6060","italian",266 381 | "il nido","251 e. 53rd st.","new york","212/753-8450","italian",267 382 | "inca grill","492 broome st. near w. broadway","new york","212/966-3371","latin american",268 383 | "indochine","430 lafayette st. between 4th st. and astor pl.","new york","212/505-5111","asian",269 384 | "internet cafe","82 e. 3rd st. between 1st and 2nd aves.","new york","212/ 614-0747","coffee bar",270 385 | "ipanema","13 w. 46th st.","new york","212/730-5848","latin american",271 386 | "jean lafitte","68 w. 58th st.","new york","212/751-2323","french",272 387 | "jewel of india","15 w. 44th st.","new york","212/869-5544","asian",273 388 | "jimmy sung's","219 e. 44th st. between 2nd and 3rd aves.","new york","212/682-5678","asian",274 389 | "joe allen","326 w. 46th st.","new york","212/581-6464","american",275 390 | "judson grill","152 w. 52nd st.","new york","212/582-5252","american",276 391 | "l'absinthe","227 e. 67th st.","new york","212/794-4950","french",277 392 | "l'auberge","1191 1st ave. between 64th and 65th sts.","new york","212/288-8791","middle eastern",278 393 | "l'auberge du midi","310 w. 4th st. between w. 12th and bank sts.","new york","212/242-4705","french",279 394 | "l'udo","432 lafayette st. near astor pl.","new york","212/388-0978","french",280 395 | "la reserve","4 w. 49th st.","new york","212/247-2993","french",281 396 | "lanza restaurant","168 1st ave. between 10th and 11th sts.","new york","212/674-7014","italian",282 397 | "lattanzi ristorante","361 w. 46th st.","new york","212/315-0980","italian",283 398 | "layla","211 w. 
broadway at franklin st.","new york","212/431-0700","middle eastern",284 399 | "le chantilly","106 e. 57th st.","new york","212/751-2931","french",285 400 | "le colonial","149 e. 57th st.","new york","212/ 752-0808","asian",286 401 | "le gamin","50 macdougal st. between houston and prince sts.","new york","212/254-4678","coffee bar",287 402 | "le jardin","25 cleveland pl. near spring st.","new york","212/343-9599","french",288 403 | "le madri","168 w. 18th st.","new york","212/727-8022","italian",289 404 | "le marais","150 w. 46th st.","new york","212/869-0900","american",290 405 | "le perigord","405 e. 52nd st.","new york","212/755-6244","french",291 406 | "le select","507 columbus ave. between 84th and 85th sts.","new york","212/875-1993","american",292 407 | "les halles","411 park ave. s between 28th and 29th sts.","new york","212/679-4111","french",293 408 | "lincoln tavern","51 w. 64th st.","new york","212/721-8271","american",294 409 | "lola","30 west 22nd st. between 5th and 6th ave.","new york","212/675-6700","american",295 410 | "lucky strike","59 grand st. between wooster st. and w. broadway","new york","212/941-0479","or 212/941-0772 american",296 411 | "mad fish","2182 broadway between 77th and 78th sts.","new york","212/787-0202","seafood",297 412 | "main street","446 columbus ave. between 81st and 82nd sts.","new york","212/873-5025","american",298 413 | "mangia e bevi","800 9th ave. at 53rd st.","new york","212/956-3976","italian",299 414 | "manhattan cafe","1161 1st ave. between 63rd and 64th sts.","new york","212/888-6556","american",300 415 | "manila garden","325 e. 14th st. between 1st and 2nd aves.","new york","212/777-6314","asian",301 416 | "marichu","342 e. 46th st. between 1st and 2nd aves.","new york","212/370-1866","french",302 417 | "marquet patisserie","15 e. 12th st. between 5th ave. and university pl.","new york","212/229-9313","coffee bar",303 418 | "match","160 mercer st. 
between houston and prince sts.","new york","212/906-9173","american",304 419 | "matthew's","1030 3rd ave. at 61st st.","new york","212/838-4343","american",305 420 | "mavalli palace","46 e. 29th st.","new york","212/679-5535","asian",306 421 | "milan cafe and coffee bar","120 w. 23rd st.","new york","212/807-1801","coffee bar",307 422 | "monkey bar","60 e. 54th st.","new york","212/838-2600","american",308 423 | "montien","1134 1st ave. between 62nd and 63rd sts.","new york","212/421-4433","asian",309 424 | "morton's","551 5th ave. at 45th st.","new york","212/972-3315","american",310 425 | "motown cafe","104 w. 57th st. near 6th ave.","new york","212/581-8030","american",311 426 | "new york kom tang soot bul house","32 w. 32nd st.","new york","212/ 947-8482","asian",312 427 | "new york noodletown","28 1/2 bowery at bayard st.","new york","212/349-0923","asian",313 428 | "newsbar","2 w. 19th st.","new york","212/255-3996","coffee bar",314 429 | "odeon","145 w. broadway at thomas st.","new york","212/233-0507","american",315 430 | "orso","322 w. 46th st.","new york","212/489-7212","italian",316 431 | "osteria al droge","142 w. 44th st.","new york","212/944-3643","italian",317 432 | "otabe","68 e. 56th st.","new york","212/223-7575","asian",318 433 | "pacifica","138 lafayette st. between canal and howard sts.","new york","212/941-4168","asian",319 434 | "palio","151 w. 51st. st.","new york","212/245-4850","italian",320 435 | "pamir","1065 1st ave. at 58th st.","new york","212/644-9258","middle eastern",321 436 | "parioli romanissimo","24 e. 81st st.","new york","212/288-2391","italian",322 437 | "patria","250 park ave. s at 20th st.","new york","212/777-6211","latin american",323 438 | "peacock alley","301 park ave. between 49th and 50th sts.","new york","212/872-4895","french",324 439 | "pen & pencil","205 e. 45th st.","new york","212/682-8660","american",325 440 | "penang soho","109 spring st. 
between greene and mercer sts.","new york","212/274-8883","asian",326 441 | "persepolis","1423 2nd ave. between 74th and 75th sts.","new york","212/535-1100","middle eastern",327 442 | "planet hollywood","140 w. 57th st.","new york","212/333-7827","american",328 443 | "pomaire","371 w. 46th st. off 9th ave.","new york","212/ 956-3055","latin american",329 444 | "popover cafe","551 amsterdam ave. between 86th and 87th sts.","new york","212/595-8555","american",330 445 | "post house","28 e. 63rd st.","new york","212/935-2888","american",331 446 | "rain","100 w. 82nd st.","new york","212/501-0776","asian",332 447 | "red tulip","439 e. 75th st.","new york","212/734-4893","eastern european",333 448 | "remi","145 w. 53rd st.","new york","212/581-4242","italian",334 449 | "republic","37a union sq. w between 16th and 17th sts.","new york","212/627-7172","asian",335 450 | "roettelle a. g","126 e. 7th st. between 1st ave. and ave. a","new york","212/674-4140","continental",336 451 | "rosa mexicano","1063 1st ave. at 58th st.","new york","212/753-7407","mexican",337 452 | "ruth's chris","148 w. 51st st.","new york","212/245-9600","american",338 453 | "s.p.q.r","133 mulberry st. between hester and grand sts.","new york","212/925-3120","italian",339 454 | "sal anthony's","55 irving pl.","new york","212/982-9030","italian",340 455 | "sammy's roumanian steak house","157 chrystie st. at delancey st.","new york","212/673-0330","east european",341 456 | "san pietro","18 e. 54th st.","new york","212/753-9015","italian",342 457 | "sant ambroeus","1000 madison ave. between 77th and 78th sts.","new york","212/570-2211","coffee bar",343 458 | "sarabeth's kitchen","423 amsterdam ave. between 80th and 81st sts.","new york","212/496-6280","american",344 459 | "sea grill","19 w. 49th st.","new york","212/332-7610","seafood",345 460 | "serendipity","3 225 e. 60th st.","new york","212/838-3531","american",346 461 | "seventh regiment mess and bar","643 park ave. 
at 66th st.","new york","212/744-4107","american",347 462 | "sfuzzi","58 w. 65th st.","new york","212/873-3700","american",348 463 | "shaan","57 w. 48th st.","new york","212/ 977-8400","asian",349 464 | "sofia fabulous pizza","1022 madison ave. near 79th st.","new york","212/734-2676","italian",350 465 | "spring street natural restaurant & bar","62 spring st. at lafayette st.","new york","212/966-0290","american",351 466 | "stage deli","834 7th ave. between 53rd and 54th sts.","new york","212/245-7850","delicatessen",352 467 | "stingray","428 amsterdam ave. between 80th and 81st sts.","new york","212/501-7515","seafood",353 468 | "sweet'n'tart cafe","76 mott st. at canal st.","new york","212/334-8088","asian",354 469 | "t salon","143 mercer st. at prince st.","new york","212/925-3700","coffee bar",355 470 | "tang pavillion","65 w. 55th st.","new york","212/956-6888","asian",356 471 | "tapika","950 8th ave. at 56th st.","new york","212/ 397-3737","american",357 472 | "teresa's","103 1st ave. between 6th and 7th sts.","new york","212/228-0604","east european",358 473 | "terrace","400 w. 119th st. between amsterdam and morningside aves.","new york","212/666-9490","continental",359 474 | "the coffee pot","350 9th ave. at 49th st.","new york","212/265-3566","coffee bar",360 475 | "the savannah club","2420 broadway at 89th st.","new york","212/496-1066","american",361 476 | "trattoria dell'arte","900 7th ave. between 56th and 57th sts.","new york","212/245-9800","italian",362 477 | "triangolo","345 e. 83rd st.","new york","212/472-4488","italian",363 478 | "tribeca grill","375 greenwich st. near franklin st.","new york","212/941-3900","american",364 479 | "trois jean","154 e. 79th st. between lexington and 3rd aves.","new york","212/988-4858","coffee bar",365 480 | "tse yang","34 e. 51st st.","new york","212/688-5447","asian",366 481 | "turkish kitchen","386 3rd ave. between 27th and 28th sts.","new york","212/679-1810","middle eastern",367 482 | "two two two","222 w. 
79th st.","new york","212/799-0400","american",368 483 | "veniero's pasticceria","342 e. 11th st. near 1st ave.","new york","212/674-7264","coffee bar",369 484 | "verbena","54 irving pl. at 17th st.","new york","212/260-5454","american",370 485 | "victor's cafe","52 236 w. 52nd st.","new york","212/586-7714","latin american",371 486 | "vince & eddie's","70 w. 68th st.","new york","212/721-0068","american",372 487 | "vong","200 e. 54th st.","new york","212/486-9592","american",373 488 | "water club","500 e. 30th st.","new york","212/683-3333","american",374 489 | "west","63rd street steakhouse 44 w. 63rd st.","new york","212/246-6363","american",375 490 | "xunta","174 1st ave. between 10th and 11th sts.","new york","212/614-0620","mediterranean",376 491 | "zen palate","34 union sq. e at 16th st.","new york","212/614-9291","and 212/614-9345 asian",377 492 | "zoe","90 prince st. between broadway and mercer st.","new york","212/966-6722","american",378 493 | "abbey","163 ponce de leon ave.","atlanta","404/876-8532","international",379 494 | "aleck's barbecue heaven","783 martin luther king jr. dr.","atlanta","404/525-2062","barbecue",380 495 | "annie's thai castle","3195 roswell rd.","atlanta","404/264-9546","asian",381 496 | "anthonys","3109 piedmont rd. just south of peachtree rd.","atlanta","404/262-7379","american",382 497 | "atlanta fish market","265 pharr rd.","atlanta","404/262-3165","american",383 498 | "beesley's of buckhead","260 e. paces ferry road","atlanta","404/264-1334","continental",384 499 | "bertolini's","3500 peachtree rd. phipps plaza","atlanta","404/233-2333","italian",385 500 | "bistango","1100 peachtree st.","atlanta","404/724-0901","mediterranean",386 501 | "cafe renaissance","7050 jimmy carter blvd. norcross","atlanta","770/441--0291","american",387 502 | "camille's","1186 n. highland ave.","atlanta","404/872-7203","italian",388 503 | "cassis","3300 peachtree rd. 
grand hyatt","atlanta","404/365-8100","mediterranean",389 504 | "city grill","50 hurt plaza","atlanta","404/524-2489","international",390 505 | "coco loco","40 buckhead crossing mall on the sidney marcus blvd.","atlanta","404/364-0212","caribbean",391 506 | "colonnade restaurant","1879 cheshire bridge rd.","atlanta","404/874-5642","southern",392 507 | "dante's down the hatch buckhead","3380 peachtree rd.","atlanta","404/266-1600","continental",393 508 | "dante's down the hatch","underground underground mall underground atlanta","atlanta","404/577-1800","continental",394 509 | "fat matt's rib shack","1811 piedmont ave. near cheshire bridge rd.","atlanta","404/607-1622","barbecue",395 510 | "french quarter food shop","923 peachtree st. at 8th st.","atlanta","404/875-2489","southern",396 511 | "holt bros. bar-b-q","6359 jimmy carter blvd. at buford hwy. norcross","atlanta","770/242-3984","barbecue",397 512 | "horseradish grill","4320 powers ferry rd.","atlanta","404/255-7277","southern",398 513 | "hsu's gourmet","192 peachtree center ave. at international blvd.","atlanta","404/659-2788","asian",399 514 | "imperial fez","2285 peachtree rd. peachtree battle condominium","atlanta","404/351-0870","mediterranean",400 515 | "kamogawa","3300 peachtree rd. grand hyatt","atlanta","404/841-0314","asian",401 516 | "la grotta at ravinia dunwoody rd.","holiday inn/crowne plaza at ravinia dunwoody","atlanta","770/395-9925","italian",402 517 | "little szechuan","c buford hwy. northwoods plaza doraville","atlanta","770/451-0192","asian",403 518 | "lowcountry barbecue","6301 roswell rd. sandy springs plaza sandy springs","atlanta","404/255-5160","barbecue",404 519 | "luna si","1931 peachtree rd.","atlanta","404/355-5993","continental",405 520 | "mambo restaurante cubano","1402 n. 
highland ave.","atlanta","404/874-2626","caribbean",406 521 | "mckinnon's louisiane","3209 maple dr.","atlanta","404/237-1313","southern",407 522 | "mi spia dunwoody rd.","park place across from perimeter mall dunwoody","atlanta","770/393-1333","italian",408 523 | "nickiemoto's: a sushi bar","247 buckhead ave. east village sq.","atlanta","404/842-0334","fusion",409 524 | "palisades","1829 peachtree rd.","atlanta","404/350-6755","continental",410 525 | "pleasant peasant","555 peachtree st. at linden ave.","atlanta","404/874-3223","american",411 526 | "pricci","500 pharr rd.","atlanta","404/237-2941","italian",412 527 | "r.j.'s uptown kitchen & wine bar","870 n. highland ave.","atlanta","404/875-7775","american",413 528 | "rib ranch","25 irby ave.","atlanta","404/233-7644","barbecue",414 529 | "sa tsu ki","3043 buford hwy.","atlanta","404/325-5285","asian",415 530 | "sato sushi and thai","6050 peachtree pkwy. norcross","atlanta","770/449-0033","asian",416 531 | "south city kitchen","1144 crescent ave.","atlanta","404/873-7358","southern",417 532 | "south of france","2345 cheshire bridge rd.","atlanta","404/325-6963","french",418 533 | "stringer's fish camp and oyster bar","3384 shallowford rd. chamblee","atlanta","770/458-7145","southern",419 534 | "sundown cafe","2165 cheshire bridge rd.","atlanta","404/321-1118","american",420 535 | "taste of new orleans","889 w. peachtree st.","atlanta","404/874-5535","southern",421 536 | "tomtom","3393 peachtree rd.","atlanta","404/264-1163","continental",422 537 | "antonio's","3700 w. flamingo","las vegas","702/252-7737","italian",423 538 | "bally's big kitchen","3645 las vegas blvd. s","las vegas","702/739-4111","buffets",424 539 | "bamboo garden","4850 flamingo rd.","las vegas","702/871-3262","asian",425 540 | "battista's hole in the wall","4041 audrie st. at flamingo rd.","las vegas","702/732-1424","italian",426 541 | "bertolini's","3570 las vegas blvd. 
s","las vegas","702/735-4663","italian",427 542 | "binion's coffee shop","128 fremont st.","las vegas","702/382-1600","coffee shops/diners",428 543 | "bistro","3400 las vegas blvd. s","las vegas","702/791-7111","continental",429 544 | "broiler","4111 boulder hwy.","las vegas","702/432-7777","american",430 545 | "bugsy's diner","3555 las vegas blvd. s","las vegas","702/733-3111","coffee shops/diners",431 546 | "cafe michelle","1350 e. flamingo rd.","las vegas","702/735-8686","american",432 547 | "cafe roma","3570 las vegas blvd. s","las vegas","702/731-7547","coffee shops/diners",433 548 | "capozzoli's","3333 s. maryland pkwy.","las vegas","702/731-5311","italian",434 549 | "carnival world","3700 w. flamingo rd.","las vegas","702/252-7777","buffets",435 550 | "center stage plaza hotel","1 main st.","las vegas","702/386-2512","american",436 551 | "circus circus","2880 las vegas blvd. s","las vegas","702/734-0410","buffets",437 552 | "empress court","3570 las vegas blvd. s","las vegas","702/731-7888","asian",438 553 | "feast","2411 w. sahara ave.","las vegas","702/367-2411","buffets",439 554 | "golden nugget hotel","129 e. fremont st.","las vegas","702/385-7111","buffets",440 555 | "golden steer","308 w. sahara ave.","las vegas","702/384-4470","steak houses",441 556 | "lillie langtry's","129 e. fremont st.","las vegas","702/385-7111","asian",442 557 | "mandarin court","1510 e. flamingo rd.","las vegas","702/737-1234","asian",443 558 | "margarita's mexican cantina","3120 las vegas blvd. s","las vegas","702/794-8200","mexican",444 559 | "mary's diner","5111 w. boulder hwy.","las vegas","702/454-8073","coffee shops/diners",445 560 | "mikado","3400 las vegas blvd. s","las vegas","702/791-7111","asian",446 561 | "pamplemousse","400 e. sahara ave.","las vegas","702/733-2066","continental",447 562 | "ralph's diner","3000 las vegas blvd. s","las vegas","702/732-6330","coffee shops/diners",448 563 | "the bacchanal","3570 las vegas blvd. 
s","las vegas","702/731-7525","only in las vegas",449 564 | "venetian","3713 w. sahara ave.","las vegas","702/876-4190","italian",450 565 | "viva mercado's","6182 w. flamingo rd.","las vegas","702/871-8826","mexican",451 566 | "yolie's","3900 paradise rd.","las vegas","702/794-0700","steak houses",452 567 | "2223","2223 market st.","san francisco","415/431-0692","american",453 568 | "acquarello","1722 sacramento st.","san francisco","415/567-5432","italian",454 569 | "bardelli's","243 o'farrell st.","san francisco","415/982-0243","old san francisco",455 570 | "betelnut","2030 union st.","san francisco","415/929-8855","asian",456 571 | "bistro roti","155 steuart st.","san francisco","415/495-6500","french",457 572 | "bix","56 gold st.","san francisco","415/433-6300","american",458 573 | "bizou","598 fourth st.","san francisco","415/543-2222","french",459 574 | "buca giovanni","800 greenwich st.","san francisco","415/776-7766","italian",460 575 | "cafe adriano","3347 fillmore st.","san francisco","415/474-4180","italian",461 576 | "cafe marimba","2317 chestnut st.","san francisco","415/776-1506","mexican/latin american/spanish",462 577 | "california culinary academy","625 polk st.","san francisco","415/771-3500","french",463 578 | "capp's corner","1600 powell st.","san francisco","415/989-2589","italian",464 579 | "carta","1772 market st.","san francisco","415/863-3516","american",465 580 | "chevys","4th and howard sts.","san francisco","415/543-8060","mexican/latin american/spanish",466 581 | "cypress club","500 jackson st.","san francisco","415/296-8555","american",467 582 | "des alpes","732 broadway","san francisco","415/788-9900","french",468 583 | "faz","161 sutter st.","san francisco","415/362-0404","greek and middle eastern",469 584 | "fog city diner","1300 battery st.","san francisco","415/982-2000","american",470 585 | "garden court","market and new montgomery sts.","san francisco","415/546-5011","old san francisco",471 586 | "gaylord's","ghirardelli 
sq.","san francisco","415/771-8822","asian",472 587 | "grand cafe hotel monaco","501 geary st.","san francisco","415/292-0101","american",473 588 | "greens","bldg. a fort mason","san francisco","415/771-6222","vegetarian",474 589 | "harbor village","4 embarcadero center","san francisco","415/781-8833","asian",475 590 | "harris'","2100 van ness ave.","san francisco","415/673-1888","steak houses",476 591 | "harry denton's","161 steuart st.","san francisco","415/882-1333","american",477 592 | "hayes street grill","320 hayes st.","san francisco","415/863-5545","seafood",478 593 | "helmand","430 broadway","san francisco","415/362-0641","greek and middle eastern",479 594 | "hong kong flower lounge","5322 geary blvd.","san francisco","415/668-8998","asian",480 595 | "hong kong villa","2332 clement st.","san francisco","415/752-8833","asian",481 596 | "hyde street bistro","1521 hyde st.","san francisco","415/441-7778","italian",482 597 | "il fornaio levi's plaza","1265 battery st.","san francisco","415/986-0100","italian",483 598 | "izzy's steak & chop house","3345 steiner st.","san francisco","415/563-0487","steak houses",484 599 | "jack's","615 sacramento st.","san francisco","415/986-9854","old san francisco",485 600 | "kabuto sushi","5116 geary blvd.","san francisco","415/752-5652","asian",486 601 | "katia's","600 5th ave.","san francisco","415/668-9292","",487 602 | "kuleto's","221 powell st.","san francisco","415/397-7720","italian",488 603 | "kyo-ya. sheraton palace hotel","2 new montgomery st. 
at market st.","san francisco","415/546-5000","asian",489 604 | "l'osteria del forno","519 columbus ave.","san francisco","415/982-1124","italian",490 605 | "le central","453 bush st.","san francisco","415/391-2233","french",491 606 | "le soleil","133 clement st.","san francisco","415/668-4848","asian",492 607 | "macarthur park","607 front st.","san francisco","415/398-5700","american",493 608 | "manora","3226 mission st.","san francisco","415/861-6224","asian",494 609 | "maykadeh","470 green st.","san francisco","415/362-8286","greek and middle eastern",495 610 | "mccormick & kuleto's","ghirardelli sq.","san francisco","415/929-1730","seafood",496 611 | "millennium","246 mcallister st.","san francisco","415/487-9800","vegetarian",497 612 | "moose's","1652 stockton st.","san francisco","415/989-7800","mediterranean",498 613 | "north india","3131 webster st.","san francisco","415/931-1556","asian",499 614 | "one market","1 market st.","san francisco","415/777-5577","american",500 615 | "oritalia","1915 fillmore st.","san francisco","415/346-1333","italian",501 616 | "pacific pan pacific hotel","500 post st.","san francisco","415/929-2087","french",502 617 | "palio d'asti","640 sacramento st.","san francisco","415/395-9800","italian",503 618 | "pane e vino","3011 steiner st.","san francisco","415/346-2111","italian",504 619 | "pastis","1015 battery st.","san francisco","415/391-2555","french",505 620 | "perry's","1944 union st.","san francisco","415/922-9022","american",506 621 | "r&g lounge","631 b kearny st.","san francisco","415/982-7877","or 415/982-3811 asian",507 622 | "rubicon","558 sacramento st.","san francisco","415/434-4100","american",508 623 | "rumpus","1 tillman pl.","san francisco","415/421-2300","american",509 624 | "sanppo","1702 post st.","san francisco","415/346-3486","asian",510 625 | "scala's bistro","432 powell st.","san francisco","415/395-8555","italian",511 626 | "south park cafe","108 south park","san francisco","415/495-7275","french",512 
627 | "splendido embarcadero","4","san francisco","415/986-3222","mediterranean",513 628 | "stars","150 redwood alley","san francisco","415/861-7827","american",514 629 | "stars cafe","500 van ness ave.","san francisco","415/861-4344","american",515 630 | "stoyanof's cafe","1240 9th ave.","san francisco","415/664-3664","greek and middle eastern",516 631 | "straits cafe","3300 geary blvd.","san francisco","415/668-1783","asian",517 632 | "suppenkuche","601 hayes st.","san francisco","415/252-9289","russian/german",518 633 | "tadich grill","240 california st.","san francisco","415/391-2373","seafood",519 634 | "the heights","3235 sacramento st.","san francisco","415/474-8890","french",520 635 | "thepin","298 gough st.","san francisco","415/863-9335","asian",521 636 | "ton kiang","3148 geary blvd.","san francisco","415/752-4440","asian",522 637 | "vertigo","600 montgomery st.","san francisco","415/433-7250","mediterranean",523 638 | "vivande porta via","2125 fillmore st.","san francisco","415/346-4430","italian",524 639 | "vivande ristorante","670 golden gate ave.","san francisco","415/673-9245","italian",525 640 | "world wrapps","2257 chestnut st.","san francisco","415/563-9727","american",526 641 | "wu kong","101 spear st.","san francisco","415/957-9300","asian",527 642 | "yank sing","427 battery st.","san francisco","415/541-4949","asian",528 643 | "yaya cuisine","1220 9th ave.","san francisco","415/566-6966","greek and middle eastern",529 644 | "yoyo tsumami bistro","1611 post st.","san francisco","415/922-7788","french",530 645 | "zarzuela","2000 hyde st.","san francisco","415/346-0800","mexican/latin american/spanish",531 646 | "zuni cafe & grill","1658 market st.","san francisco","415/552-2522","mediterranean",532 647 | "apple pan the","10801 w. 
pico blvd.","west la","310-475-3585","american",534 648 | "asahi ramen","2027 sawtelle blvd.","west la","310-479-2231","noodle shops",535 649 | "baja fresh","3345 kimber dr.","westlake village","805-498-4049","mexican",536 650 | "belvedere the","9882 little santa monica blvd.","beverly hills","310-788-2306","pacific new wave",537 651 | "benita's frites","1433 third st. promenade","santa monica","310-458-2889","fast food",538 652 | "bernard's","515 s. olive st.","los angeles","213-612-1580","continental",539 653 | "bistro 45","45 s. mentor ave.","pasadena","818-795-2478","californian",540 654 | "brent's deli","19565 parthenia ave.","northridge","818-886-5679","delis",541 655 | "brighton coffee shop","9600 brighton way","beverly hills","310-276-7732","coffee shops",542 656 | "bristol farms market cafe","1570 rosecrans ave. s.","pasadena","310-643-5229","californian",543 657 | "bruno's","3838 centinela ave.","mar vista","310-397-5703","italian",544 658 | "cafe '50s","838 lincoln blvd.","venice","310-399-1955","american",545 659 | "cafe blanc","9777 little santa monica blvd.","beverly hills","310-888-0108","pacific new wave",546 660 | "cassell's","3266 w. sixth st.","la","213-480-8668","hamburgers",547 661 | "chez melange","1716 pch","redondo beach","310-540-1222","eclectic",548 662 | "diaghilev","1020 n. san vicente blvd.","w. hollywood","310-854-1111","russian",549 663 | "don antonio's","1136 westwood blvd.","westwood","310-209-1422","italian",550 664 | "duke's","8909 sunset blvd.","w. hollywood","310-652-3100","coffee shops",551 665 | "falafel king","1059 broxton ave.","westwood","310-208-4444","middle eastern",552 666 | "feast from the east","1949 westwood blvd.","west la","310-475-0400","chinese",553 667 | "gumbo pot the","6333 w. 
third st.","la","213-933-0358","cajun/creole",554 668 | "hollywood hills coffee shop","6145 franklin ave.","hollywood","213-467-7678","coffee shops",555 669 | "indo cafe","10428 1/2 national blvd.","la","310-815-1290","indonesian",556 670 | "jan's family restaurant","8424 beverly blvd.","la","213-651-2866","coffee shops",557 671 | "jiraffe","502 santa monica blvd","santa monica","310-917-6671","californian",558 672 | "jody maroni's sausage kingdom","2011 ocean front walk","venice","310-306-1995","hot dogs",559 673 | "joe's","1023 abbot kinney blvd.","venice","310-399-5811","american (new)",560 674 | "john o'groats","10516 w. pico blvd.","west la","310-204-0692","coffee shops",561 675 | "johnnie's pastrami","4017 s. sepulveda blvd.","culver city","310-397-6654","delis",562 676 | "johnny reb's southern smokehouse","4663 long beach blvd.","long beach","310-423-7327","southern/soul",563 677 | "johnny rockets (la)","7507 melrose ave.","la","213-651-3361","american",564 678 | "killer shrimp","4000 colfax ave.","studio city","818-508-1570","seafood",565 679 | "kokomo cafe","6333 w. third st.","la","213-933-0773","american",566 680 | "koo koo roo","8393 w. beverly blvd.","la","213-655-9045","chicken",567 681 | "la cachette","10506 little santa monica blvd.","century city","310-470-4992","french (new)",568 682 | "la salsa (la)","22800 pch","malibu","310-456-6299","mexican",569 683 | "la serenata de garibaldi","1842 e. first","st. boyle hts.","213-265-2887","mexican/tex-mex",570 684 | "langer's","704 s. alvarado st.","la","213-483-8050","delis",571 685 | "local nochol","30869 thousand oaks blvd.","westlake village","818-706-7706","health food",572 686 | "main course the","10509 w. pico blvd.","rancho park","310-475-7564","american",573 687 | "mani's bakery & espresso bar","519 s. fairfax ave.","la","213-938-8800","desserts",574 688 | "martha's","22nd street grill 25 22nd","st. 
hermosa beach","310-376-7786","american",575 689 | "maxwell's cafe","13329 washington blvd.","marina del rey","310-306-7829","american",576 690 | "michael's (los angeles)","1147 third st.","santa monica","310-451-0843","californian",577 691 | "mishima","8474 w. third st.","la","213-782-0181","noodle shops",578 692 | "mo better meatty meat","7261 melrose ave.","la","213-935-5280","hamburgers",579 693 | "mulberry st.","17040 ventura blvd.","encino","818-906-8881","pizza",580 694 | "ocean park cafe","3117 ocean park blvd.","santa monica","310-452-5728","american",581 695 | "ocean star","145 n. atlantic blvd.","monterey park","818-308-2128","seafood",582 696 | "original pantry bakery","875 s. figueroa st. downtown","la","213-627-6879","diners",583 697 | "parkway grill","510 s. arroyo pkwy.","pasadena","818-795-1001","californian",584 698 | "pho hoa","642 broadway","chinatown","213-626-5530","vietnamese",585 699 | "pink's famous chili dogs","709 n. la brea ave.","la","213-931-4223","hot dogs",586 700 | "poquito mas","2635 w. olive ave.","burbank","818-563-2252","mexican",587 701 | "r-23","923 e. third st.","los angeles","213-687-7178","japanese",588 702 | "rae's","2901 pico blvd.","santa monica","310-828-7937","diners",589 703 | "rubin's red hots","15322 ventura blvd.","encino","818-905-6515","hot dogs",590 704 | "ruby's (la)","45 s. fair oaks ave.","pasadena","818-796-7829","diners",591 705 | "russell's burgers","1198 pch","seal beach","310-596-9556","hamburgers",592 706 | "ruth's chris steak house (los angeles)","224 s. beverly dr.","beverly hills","310-859-8744","steakhouses",593 707 | "shiro","1505 mission st. 
s.","pasadena","818-799-4774","pacific new wave",594 708 | "sushi nozawa","11288 ventura blvd.","studio city","818-508-7017","japanese",595 709 | "sweet lady jane","8360 melrose ave.","la","213-653-7145","desserts",596 710 | "taiko","11677 san vicente blvd.","brentwood","310-207-7782","noodle shops",597 711 | "tommy's","2575 beverly blvd.","la","213-389-9060","hamburgers",598 712 | "uncle bill's pancake house","1305 highland ave.","manhattan beach","310-545-5177","diners",599 713 | "water grill","544 s. grand ave.","los angeles","213-891-0900","seafood",600 714 | "zankou chicken","1415 e. colorado st.","glendale","818-244-1937","middle eastern",601 715 | "afghan kebab house","764 ninth ave.","new york city","212-307-1612","afghan",602 716 | "arcadia","21 e. 62nd st.","new york city","212-223-2900","american (new)",603 717 | "benny's burritos","93 ave. a","new york city","212-254-2054","mexican",604 718 | "cafe con leche","424 amsterdam ave.","new york city","212-595-7000","cuban",605 719 | "corner bistro","331 w. fourth st.","new york city","212-242-9502","hamburgers",606 720 | "cucina della fontana","368 bleecker st.","new york city","212-242-0636","italian",607 721 | "cucina di pesce","87 e. fourth st.","new york city","212-260-6800","seafood",608 722 | "darbar","44 w. 56th st.","new york city","212-432-7227","indian",609 723 | "ej's luncheonette","432 sixth ave.","new york city","212-473-5555","diners",610 724 | "edison cafe","228 w. 47th st.","new york city","212-840-5000","diners",611 725 | "elias corner","24-02 31st st.","queens","718-932-1510","greek",612 726 | "good enough to eat","483 amsterdam ave.","new york city","212-496-0163","american",613 727 | "gray's papaya","2090 broadway","new york city","212-799-0243","hot dogs",614 728 | "il mulino","86 w. 
third st.","new york city","212-673-3783","italian",615 729 | "jackson diner","37-03 74th st.","queens","718-672-1232","indian",616 730 | "joe's shanghai","9 pell st.","queens","718-539-3838","chinese",617 731 | "john's pizzeria","48 w. 65th st.","new york city","212-721-7001","pizza",618 732 | "kelley & ping","127 greene st.","new york city","212-228-1212","pan-asian",619 733 | "kiev","117 second ave.","new york city","212-674-4040","ukrainian",620 734 | "kuruma zushi","2nd fl.","new york city","212-317-2802","japanese",621 735 | "la caridad","2199 broadway","new york city","212-874-2780","cuban",622 736 | "la grenouille","3 e. 52nd st.","new york city","212-752-1495","french (classic)",623 737 | "lemongrass grill","61a seventh ave.","brooklyn","718-399-7100","thai",624 738 | "lombardi's","32 spring st.","new york city","212-941-7994","pizza",625 739 | "marnie's noodle shop","466 hudson st.","new york city","212-741-3214","asian",626 740 | "menchanko-tei","39 w. 55th st.","new york city","212-247-1585","japanese",627 741 | "mitali east-west","296 bleecker st.","new york city","212-989-1367","indian",628 742 | "monsoon (ny)","435 amsterdam ave.","new york city","212-580-8686","thai",629 743 | "moustache","405 atlantic ave.","brooklyn","718-852-5555","middle eastern",630 744 | "nobu","105 hudson st.","new york city","212-219-0500","japanese",631 745 | "one if by land tibs","17 barrow st.","new york city","212-228-0822","continental",632 746 | "oyster bar","lower level","new york city","212-490-6650","seafood",633 747 | "palm","837 second ave.","new york city","212-687-2953","steakhouses",634 748 | "palm too","840 second ave.","new york city","212-697-5198","steakhouses",635 749 | "patsy's pizza","19 old fulton st.","brooklyn","718-858-4300","pizza",636 750 | "peter luger steak house","178 broadway","brooklyn","718-387-7400","steakhouses",637 751 | "rose of india","308 e. 
sixth st.","new york city","212-533-5011","indian",638 752 | "sam's noodle shop","411 third ave.","new york city","212-213-2288","chinese",639 753 | "sarabeth's","1295 madison ave.","new york city","212-410-7335","american",640 754 | "sparks steak house","210 e. 46th st.","new york city","212-687-4855","steakhouses",641 755 | "stick to your ribs","5-16 51st ave.","queens","718-937-3030","bbq",642 756 | "sushisay","38 e. 51st st.","new york city","212-755-1780","japanese",643 757 | "sylvia's","328 lenox ave.","new york city","212-996-0660","southern/soul",644 758 | "szechuan hunan cottage","1588 york ave.","new york city","212-535-5223","chinese",645 759 | "szechuan kitchen","1460 first ave.","new york city","212-249-4615","chinese",646 760 | "teresa's","80 montague st.","queens","718-520-2910","polish",647 761 | "thai house cafe","151 hudson st.","new york city","212-334-1085","thai",648 762 | "thailand restaurant","106 bayard st.","new york city","212-349-3132","thai",649 763 | "veselka","144 second ave.","new york city","212-228-9682","ukrainian",650 764 | "westside cottage","689 ninth ave.","new york city","212-245-0800","chinese",651 765 | "windows on the world","107th fl.","new york city","212-524-7000","eclectic",652 766 | "wollensky's grill","205 e. 49th st.","new york city","212-753-0444","steakhouses",653 767 | "yama","122 e. 17th st.","new york city","212-475-0969","japanese",654 768 | "zarela","953 second ave.","new york city","212-644-6740","mexican",655 769 | "andre's french restaurant","401 s. 6th st.","las vegas","702-385-5016","french (classic)",656 770 | "buccaneer bay club","3300 las vegas blvd. s.","las vegas","702-894-7350","continental",657 771 | "buzio's in the rio","3700 w. flamingo rd.","las vegas","702-252-7697","seafood",658 772 | "emeril's new orleans fish house","3799 las vegas blvd. s.","las vegas","702-891-7374","seafood",659 773 | "fiore rotisserie & grille","3700 w. 
flamingo rd.","las vegas","702-252-7702","italian",660 774 | "hugo's cellar","202 e. fremont st.","las vegas","702-385-4011","continental",661 775 | "madame ching's","3300 las vegas blvd. s.","las vegas","702-894-7111","asian",662 776 | "mayflower cuisinier","4750 w. sahara ave.","las vegas","702-870-8432","chinese",663 777 | "michael's (las vegas)","3595 las vegas blvd. s.","las vegas","702-737-7111","continental",664 778 | "monte carlo","3145 las vegas blvd. s.","las vegas","702-733-4524","french (new)",665 779 | "moongate","3400 las vegas blvd. s.","las vegas","702-791-7352","chinese",666 780 | "morton's of chicago (las vegas)","3200 las vegas blvd. s.","las vegas","702-893-0703","steakhouses",667 781 | "nicky blair's","3925 paradise rd.","las vegas","702-792-9900","italian",668 782 | "piero's restaurant","355 convention center dr.","las vegas","702-369-2305","italian",669 783 | "spago (las vegas)","3500 las vegas blvd. s.","las vegas","702-369-6300","californian",670 784 | "steakhouse the","128 e. fremont st.","las vegas","702-382-1600","steakhouses",671 785 | "stefano's","129 fremont st.","las vegas","702-385-7111","italian",672 786 | "sterling brunch","3645 las vegas blvd. s.","las vegas","702-739-4651","eclectic",673 787 | "tre visi","3799 las vegas blvd. s.","las vegas","702-891-7331","italian",674 788 | "103 west","103 w. paces ferry rd.","atlanta","404-233-5993","continental",675 789 | "alon's at the terrace","659 peachtree st.","atlanta","404-724-0444","sandwiches",676 790 | "baker's cajun cafe","1134 euclid ave.","atlanta","404-223-5039","cajun/creole",677 791 | "barbecue kitchen","1437 virginia ave.","atlanta","404-766-9906","bbq",678 792 | "bistro the","56 e. andrews dr. nw","atlanta","404-231-5733","french bistro",679 793 | "bobby & june's kountry kitchen","375 14th st.","atlanta","404-876-3872","southern/soul",680 794 | "bradshaw's restaurant","2911 s. 
pharr court","atlanta","404-261-7015","southern/soul",681 795 | "brookhaven cafe","4274 peachtree rd.","atlanta","404-231-5907","vegetarian",682 796 | "cafe sunflower","5975 roswell rd.","atlanta","404-256-1675","health food",683 797 | "canoe","4199 paces ferry rd.","atlanta","770-432-2663","american (new)",684 798 | "carey's","1021 cobb pkwy. se","marietta","770-422-8042","hamburgers",685 799 | "carey's corner","1215 powers ferry rd.","marietta","770-933-0909","hamburgers",686 800 | "chops","70 w. paces ferry rd.","atlanta","404-262-2675","steakhouses",687 801 | "chopstix","4279 roswell rd.","atlanta","404-255-4868","chinese",688 802 | "deacon burton's soulfood restaurant","1029 edgewood ave. se","atlanta","404-523-1929","southern/soul",689 803 | "eats","600 ponce de leon ave.","atlanta","404-888-9149","italian",690 804 | "flying biscuit the","1655 mclendon ave.","atlanta","404-687-8888","eclectic",691 805 | "frijoleros","1031 peachtree st. ne","atlanta","404-892-8226","tex-mex",692 806 | "greenwood's","1087 green st.","roswell","770-992-5383","southern/soul",693 807 | "harold's barbecue","171 mcdonough blvd.","atlanta","404-627-9268","bbq",694 808 | "havana sandwich shop","2905 buford hwy.","atlanta","404-636-4094","cuban",695 809 | "house of chan","2469 cobb pkwy.","smyrna","770-955-9444","chinese",696 810 | "indian delights","3675 satellite blvd.","duluth","100-813-8212","indian",697 811 | "java jive","790 ponce de leon ave.","atlanta","404-876-6161","coffee shops",698 812 | "johnny rockets (at)","2970 cobb pkwy.","atlanta","770-955-6068","american",699 813 | "kalo's coffee house","1248 clairmont rd.","decatur","404-325-3733","coffeehouses",700 814 | "la fonda latina","4427 roswell rd.","atlanta","404-303-8201","spanish",701 815 | "lettuce souprise you (at)","3525 mall blvd.","duluth","770-418-9969","cafeterias",702 816 | "majestic","1031 ponce de leon ave.","atlanta","404-875-0276","diners",703 817 | "morton's of chicago (atlanta)","303 peachtree st. 
ne","atlanta","404-577-4366","steakhouses",704 818 | "my thai","1248 clairmont rd.","atlanta","404-636-4280","thai",705 819 | "nava","3060 peachtree rd.","atlanta","404-240-1984","southwestern",706 820 | "nuevo laredo cantina","1495 chattahoochee ave. nw","atlanta","404-352-9009","mexican",707 821 | "original pancake house (at)","4330 peachtree rd.","atlanta","404-237-4116","american",708 822 | "palm the (atlanta)","3391 peachtree rd. ne","atlanta","404-814-1955","steakhouses",709 823 | "rainbow restaurant","2118 n. decatur rd.","decatur","404-633-3538","vegetarian",710 824 | "ritz-carlton cafe (atlanta)","181 peachtree st.","atlanta","404-659-0400","american (new)",711 825 | "riviera","519 e. paces ferry rd.","atlanta","404-262-7112","mediterranean",712 826 | "silver skillet the","200 14th st. nw","atlanta","404-874-1388","coffee shops",713 827 | "soto","3330 piedmont rd.","atlanta","404-233-2005","japanese",714 828 | "thelma's kitchen","764 marietta st. nw","atlanta","404-688-5855","cafeterias",715 829 | "tortillas","774 ponce de leon ave. ne","atlanta","404-892-0193","tex-mex",716 830 | "van gogh's restaurant & bar","70 w. crossville rd.","roswell","770-993-1156","american (new)",717 831 | "veggieland","220 sandy springs circle","atlanta","404-231-3111","vegetarian",718 832 | "white house restaurant","3172 peachtree rd. 
ne","atlanta","404-237-7601","diners",719 833 | "zab-e-lee","4837 old national hwy.","college park","404-768-2705","thai",720 834 | "bill's place","2315 clement st.","san francisco","415-221-5262","hamburgers",721 835 | "cafe flore","2298 market st.","san francisco","415-621-8579","californian",722 836 | "caffe greco","423 columbus ave.","san francisco","415-397-6261","continental",723 837 | "campo santo","240 columbus ave.","san francisco","415-433-9623","mexican",724 838 | "cha cha cha's","1805 haight st.","san francisco","415-386-5758","caribbean",725 839 | "doidge's","2217 union st.","san francisco","415-921-2149","american",726 840 | "dottie's true blue cafe","522 jones st.","san francisco","415-885-2767","diners",727 841 | "dusit thai","3221 mission st.","san francisco","415-826-4639","thai",728 842 | "ebisu","1283 ninth ave.","san francisco","415-566-1770","japanese",729 843 | "emerald garden restaurant","1550 california st.","san francisco","415-673-1155","vietnamese",730 844 | "eric's chinese restaurant","1500 church st.","san francisco","415-282-0919","chinese",731 845 | "hamburger mary's","1582 folsom st.","san francisco","415-626-1985","hamburgers",732 846 | "kelly's on trinity","333 bush st.","san francisco","415-362-4454","californian",733 847 | "la cumbre","515 valencia st.","san francisco","415-863-8205","mexican",734 848 | "la mediterranee","288 noe st.","san francisco","415-431-7210","mediterranean",735 849 | "la taqueria","2889 mission st.","san francisco","415-285-7117","mexican",736 850 | "mario's bohemian cigar store cafe","2209 polk st.","san francisco","415-776-8226","italian",737 851 | "marnee thai","2225 irving st.","san francisco","415-665-9500","thai",738 852 | "mel's drive-in","3355 geary st.","san francisco","415-387-2244","hamburgers",739 853 | "mo's burgers","1322 grant st.","san francisco","415-788-3779","hamburgers",740 854 | "phnom penh cambodian restaurant","631 larkin st.","san francisco","415-775-5979","cambodian",741 855 | 
"roosevelt tamale parlor","2817 24th st.","san francisco","415-550-9213","mexican",742 856 | "sally's cafe & bakery","300 de haro st.","san francisco","415-626-6006","american",743 857 | "san francisco bbq","1328 18th st.","san francisco","415-431-8956","thai",744 858 | "slanted door","584 valencia st.","san francisco","415-861-8032","vietnamese",745 859 | "swan oyster depot","1517 polk st.","san francisco","415-673-1101","seafood",746 860 | "thep phanom","400 waller st.","san francisco","415-431-2526","thai",747 861 | "ti couz","3108 16th st.","san francisco","415-252-7373","french",748 862 | "trio cafe","1870 fillmore st.","san francisco","415-563-2248","american",749 863 | "tu lan","8 sixth st.","san francisco","415-626-0927","vietnamese",750 864 | "vicolo pizzeria","201 ivy st.","san francisco","415-863-2382","pizza",751 865 | "wa-ha-ka oaxaca mexican grill","2141 polk st.","san francisco","415-775-1055","mexican",752 -------------------------------------------------------------------------------- /rise.css: -------------------------------------------------------------------------------- 1 | .rise-enabled { 2 | background-color: #fff; 3 | } 4 | 5 | 6 | .rendered_html table, .rendered_html th, .rendered_html tr, .rendered_html td { 7 | font-size: 100%; 8 | } 9 | 10 | .rendered_html img.vinta_logo { 11 | display: inline-block; 12 | margin-left: 0; 13 | } 14 | -------------------------------------------------------------------------------- /sorted-neighbourhood.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/sorted-neighbourhood.png -------------------------------------------------------------------------------- /standard-blocking.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/standard-blocking.png -------------------------------------------------------------------------------- /svm_dedupe.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import numpy 4 | from dedupe.api import Dedupe 5 | from dedupe.labeler import DedupeDisagreementLearner 6 | from dedupe.labeler import RLRLearner 7 | from sklearn.svm import SVC 8 | 9 | 10 | def _build_model(): 11 | return SVC(kernel='rbf', probability=True) 12 | 13 | 14 | class SVMLearner(RLRLearner): 15 | 16 | def __init__(self, data_model, *args, **kwargs): 17 | self.svm_classifier = _build_model() 18 | super().__init__(data_model, *args, **kwargs) 19 | 20 | def fit(self, X, y): 21 | y = numpy.array(y) 22 | 23 | # This replicates Dedupe's behavior, adapting it to sklearn: 24 | # if there are only non-matching examples in y, 25 | # grab a random record and consider it as a match with itself 26 | # if there are only matching examples in y, 27 | # grab a random pair and consider it as a non-match 28 | # Also, if both X and y are empty, do both things above. 29 | # This happens during active learning when there are no existing training_pairs.
30 | if not y.any(): 31 | random_pair = random.choice(self.candidates) 32 | exact_match = (random_pair[0], random_pair[0]) 33 | X = numpy.vstack([X, self.transform([exact_match])]) 34 | y = numpy.concatenate([y, [1]]) 35 | if numpy.count_nonzero(y) == len(y): 36 | random_pair = random.choice(self.candidates) 37 | X = numpy.vstack([X, self.transform([random_pair])]) 38 | y = numpy.concatenate([y, [0]]) 39 | 40 | self.y = y 41 | self.X = X 42 | self.svm_classifier.fit(X, y) 43 | 44 | def predict_proba(self, examples): 45 | return self.svm_classifier.predict_proba(examples)[:, 1].reshape(-1, 1) 46 | 47 | 48 | class SVMDisagreementLearner(DedupeDisagreementLearner): 49 | 50 | def _common_init(self): 51 | self.classifier = SVMLearner(self.data_model, 52 | candidates=self.candidates) 53 | self.learners = (self.classifier, self.blocker) 54 | self.y = numpy.array([]) 55 | self.pairs = [] 56 | 57 | 58 | class SVMDedupe(Dedupe): 59 | classifier = _build_model() 60 | ActiveLearner = SVMDisagreementLearner 61 | -------------------------------------------------------------------------------- /training-input-output.txt: -------------------------------------------------------------------------------- 1 | name : cite 2 | addr : 120 w. 51st st. 3 | city : new york 4 | postal : 10019 5 | latlng : (40.7607952, -73.9812268) 6 | addr_variations : frozenset({'120 west 51st saint', '120 west 51 saint', '120 w 51st street', '120 w 51st saint', '120 west 51st street', '120 w 51 saint', '120 w 51 street', '120 west 51 street'}) 7 | 8 | name : new york noodletown 9 | addr : 28 1/2 bowery at bayard st. 10 | city : new york 11 | postal : 10013 12 | latlng : (40.7150317, -73.9970383) 13 | addr_variations : frozenset({'28 1 2 bowery at bayard saint', '28 1 2 bowery at bayard street'}) 14 | 15 | 0/10 positive, 0/10 negative 16 | Do these records refer to the same thing? 17 | (y)es / (n)o / (u)nsure / (f)inished 18 | n 19 | name : bernardin 20 | addr : 155 w. 51st st. 
21 | city : new york city 22 | postal : 10019 23 | latlng : (40.7615691, -73.98180479999999) 24 | addr_variations : frozenset({'155 w 51st saint', '155 west 51st saint', '155 w 51st street', '155 west 51 saint', '155 w 51 street', '155 w 51 saint', '155 west 51 street', '155 west 51st street'}) 25 | 26 | name : republic 27 | addr : 37a union sq. w between 16th and 17th sts. 28 | city : new york 29 | postal : 10003 30 | latlng : (40.7369985, -73.9907851) 31 | addr_variations : frozenset({'37a union square w between 16th and 17 streets', '37 a union square west between 16th and 17 streets', '37a union square w between 16th and 17th streets', '37 a union square west between 16th and 17th streets', '37 a union square west between 16 and 17th streets', '37 a union square w between 16th and 17th streets', '37 a union square w between 16 and 17 streets', '37 a union square w between 16th and 17 streets', '37 a union square west between 16 and 17 streets', '37 a union square w between 16 and 17th streets', '37a union square w between 16 and 17 streets', '37a union square w between 16 and 17th streets', '37a union square west between 16 and 17th streets', '37a union square west between 16 and 17 streets', '37a union square west between 16th and 17 streets', '37a union square west between 16th and 17th streets'}) 32 | 33 | 0/10 positive, 1/10 negative 34 | Do these records refer to the same thing? 35 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 36 | n 37 | name : dawat 38 | addr : 210 e. 58th st. 39 | city : new york 40 | postal : None 41 | latlng : None 42 | addr_variations : frozenset({'210 e 58 street', '210 e 58th saint', '210 east 58 saint', '210 east 58th street', '210 east 58 street', '210 e 58 saint', '210 e 58th street', '210 east 58th saint'}) 43 | 44 | name : dawat 45 | addr : 210 e. 58th st. 
46 | city : new york city 47 | postal : 10022 48 | latlng : (40.7604227, -73.9664276) 49 | addr_variations : frozenset({'210 e 58 street', '210 e 58th saint', '210 east 58 saint', '210 east 58th street', '210 east 58 street', '210 e 58 saint', '210 e 58th street', '210 east 58th saint'}) 50 | 51 | 0/10 positive, 2/10 negative 52 | Do these records refer to the same thing? 53 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 54 | y 55 | name : art delicatessen 56 | addr : 12224 ventura blvd. 57 | city : studio city 58 | postal : 91604 59 | latlng : (34.1429661, -118.3994688) 60 | addr_variations : frozenset({'12224 ventura boulevard'}) 61 | 62 | name : art deli 63 | addr : 12224 ventura blvd. 64 | city : los angeles 65 | postal : 91604 66 | latlng : (34.1429661, -118.3994688) 67 | addr_variations : frozenset({'12224 ventura boulevard'}) 68 | 69 | 1/10 positive, 2/10 negative 70 | Do these records refer to the same thing? 71 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 72 | y 73 | INFO:dedupe.training:Final predicate set: 74 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (wholeFieldPredicate, name)) 75 | name : gotham bar grill 76 | addr : 12 e 12th st 77 | city : new york city 78 | postal : 10003 79 | latlng : (40.734207, -73.99369899999999) 80 | addr_variations : frozenset({'12 e 12th street', '12 east 12 saint', '12 e 12 street', '12 e 12 saint', '12 east 12th saint', '12 east 12th street', '12 east 12 street', '12 e 12th saint'}) 81 | 82 | name : gotham 83 | addr : 12 e 12th st 84 | city : new york 85 | postal : 10003 86 | latlng : (40.734207, -73.99369899999999) 87 | addr_variations : frozenset({'12 e 12th street', '12 east 12 saint', '12 e 12 street', '12 e 12 saint', '12 east 12th saint', '12 east 12th street', '12 east 12 street', '12 e 12th saint'}) 88 | 89 | 2/10 positive, 2/10 negative 90 | Do these records refer to the same thing? 
91 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 92 | y 93 | INFO:dedupe.training:Final predicate set: 94 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (sortedAcronym, name)) 95 | name : dawat 96 | addr : 210 e. 58th st. 97 | city : new york 98 | postal : None 99 | latlng : None 100 | addr_variations : frozenset({'210 e 58 street', '210 e 58th saint', '210 east 58 saint', '210 east 58th street', '210 east 58 street', '210 e 58 saint', '210 e 58th street', '210 east 58th saint'}) 101 | 102 | name : rainbow restaurant 103 | addr : 2118 n. decatur rd. 104 | city : decatur 105 | postal : 30033 106 | latlng : (33.7908588, -84.3052307) 107 | addr_variations : frozenset({'2118 north decatur road', '2118 n decatur road'}) 108 | 109 | 3/10 positive, 2/10 negative 110 | Do these records refer to the same thing? 111 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 112 | n 113 | INFO:dedupe.training:Final predicate set: 114 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), TfidfNGramCanopyPredicate: (0.6, name)) 115 | name : yujean kang gourmet chinese cuisine 116 | addr : 67 n. raymond ave. 117 | city : los angeles 118 | postal : 91103 119 | latlng : (34.147086, -118.1490988) 120 | addr_variations : frozenset({'67 nord raymond avenue', '67 n raymond avenue', '67 north raymond avenue'}) 121 | 122 | name : ruby 123 | addr : 45 s. fair oaks ave. 124 | city : pasadena 125 | postal : 91105 126 | latlng : (34.1449715, -118.1506038) 127 | addr_variations : frozenset({'45 s fair oaks avenue', '45 south fair oaks avenue', '45 san fair oaks avenue'}) 128 | 129 | 3/10 positive, 3/10 negative 130 | Do these records refer to the same thing? 131 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 132 | n 133 | name : plumpjack cafe 134 | addr : 3201 fillmore st. 
135 | city : san francisco 136 | postal : 94123 137 | latlng : (37.79911990000001, -122.4360911) 138 | addr_variations : frozenset({'3201 fillmore street', '3201 fillmore saint'}) 139 | 140 | name : plumpjack cafe 141 | addr : 3127 fillmore st. 142 | city : san francisco 143 | postal : 94123 144 | latlng : (37.79834090000001, -122.4359412) 145 | addr_variations : frozenset({'3127 fillmore street', '3127 fillmore saint'}) 146 | 147 | 3/10 positive, 4/10 negative 148 | Do these records refer to the same thing? 149 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 150 | y 151 | name : dining room ritz carlton buckhead 152 | addr : 3434 peachtree rd. 153 | city : atlanta 154 | postal : 30326 155 | latlng : (33.8508073, -84.364227) 156 | addr_variations : frozenset({'3434 peachtree road'}) 157 | 158 | name : ritz carlton dining room buckhead 159 | addr : 3434 peachtree rd. ne 160 | city : atlanta 161 | postal : 30326 162 | latlng : (33.8508073, -84.364227) 163 | addr_variations : frozenset({'3434 peachtree road northeast', '3434 peachtree road ne', '3434 peachtree road nebraska'}) 164 | 165 | 4/10 positive, 4/10 negative 166 | Do these records refer to the same thing? 167 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 168 | y 169 | INFO:dedupe.training:Final predicate set: 170 | INFO:dedupe.training:(SimplePredicate: (suffixArray, addr), TfidfNGramCanopyPredicate: (0.6, name)) 171 | name : coyote cafe 172 | addr : 3799 las vegas blvd. s 173 | city : las vegas 174 | postal : 89109 175 | latlng : (36.1022507, -115.1699679) 176 | addr_variations : frozenset({'3799 las vegas boulevard sur', '3799 las vegas boulevard san', '3799 las vegas boulevard south', '3799 las vegas boulevard s'}) 177 | 178 | name : tre visi 179 | addr : 3799 las vegas blvd. s. 
180 | city : las vegas 181 | postal : 89109 182 | latlng : (36.1022507, -115.1699679) 183 | addr_variations : frozenset({'3799 las vegas boulevard sur', '3799 las vegas boulevard san', '3799 las vegas boulevard south', '3799 las vegas boulevard s'}) 184 | 185 | 5/10 positive, 4/10 negative 186 | Do these records refer to the same thing? 187 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 188 | n 189 | INFO:dedupe.training:Final predicate set: 190 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, addr), TfidfNGramCanopyPredicate: (0.6, name)) 191 | name : osteria del forno 192 | addr : 519 columbus ave. 193 | city : san francisco 194 | postal : 94133 195 | latlng : (37.799736, -122.4096355) 196 | addr_variations : frozenset({'519 columbus avenue'}) 197 | 198 | name : caffe greco 199 | addr : 423 columbus ave. 200 | city : san francisco 201 | postal : 94133 202 | latlng : (37.7989568, -122.4086733) 203 | addr_variations : frozenset({'423 columbus avenue'}) 204 | 205 | 5/10 positive, 5/10 negative 206 | Do these records refer to the same thing? 207 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 208 | n 209 | name : orangerie 210 | addr : 903 n. la cienega blvd. 211 | city : los angeles 212 | postal : 90069 213 | latlng : (34.0870981, -118.376626) 214 | addr_variations : frozenset({'903 norte louisiana cienega boulevard', '903 north louisiana cienega boulevard', '903 norte lane cienega boulevard', '903 north lane cienega boulevard', '903 north la cienega boulevard', '903 n la cienega boulevard', '903 n lane cienega boulevard', '903 n louisiana cienega boulevard', '903 norte la cienega boulevard'}) 215 | 216 | name : drai 217 | addr : 730 n. la cienega blvd. 
218 | city : los angeles 219 | postal : 90069 220 | latlng : (34.0845064, -118.3761899) 221 | addr_variations : frozenset({'730 north louisiana cienega boulevard', '730 n lane cienega boulevard', '730 norte la cienega boulevard', '730 n la cienega boulevard', '730 norte louisiana cienega boulevard', '730 n louisiana cienega boulevard', '730 north lane cienega boulevard', '730 north la cienega boulevard', '730 norte lane cienega boulevard'}) 222 | 223 | 5/10 positive, 6/10 negative 224 | Do these records refer to the same thing? 225 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 226 | n 227 | name : philippe original 228 | addr : 1001 n. alameda st. 229 | city : los angeles 230 | postal : 90012 231 | latlng : (34.059721, -118.237025) 232 | addr_variations : frozenset({'1001 n alameda saint', '1001 n alameda santo', '1001 north alameda street', '1001 north alameda santo', '1001 norte alameda saint', '1001 nosso alameda saint', '1001 norte alameda santo', '1001 nosso alameda santo', '1001 n alameda street', '1001 nosso alameda street', '1001 north alameda saint', '1001 norte alameda street'}) 233 | 234 | name : philippe original 235 | addr : 1001 north alameda 236 | city : los angeles 237 | postal : 90012 238 | latlng : (34.059721, -118.237025) 239 | addr_variations : frozenset({'1001 north alameda'}) 240 | 241 | 5/10 positive, 7/10 negative 242 | Do these records refer to the same thing? 243 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 244 | y 245 | name : chin 246 | addr : 3200 las vegas blvd. s 247 | city : las vegas 248 | postal : 89109 249 | latlng : (36.1275236, -115.1715003) 250 | addr_variations : frozenset({'3200 las vegas boulevard s', '3200 las vegas boulevard sur', '3200 las vegas boulevard south', '3200 las vegas boulevard san'}) 251 | 252 | name : morton chicago las vegas 253 | addr : 3200 las vegas blvd. s. 
254 | city : las vegas 255 | postal : 89109 256 | latlng : (36.1275236, -115.1715003) 257 | addr_variations : frozenset({'3200 las vegas boulevard s', '3200 las vegas boulevard sur', '3200 las vegas boulevard south', '3200 las vegas boulevard san'}) 258 | 259 | 6/10 positive, 7/10 negative 260 | Do these records refer to the same thing? 261 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 262 | n 263 | INFO:dedupe.training:Final predicate set: 264 | INFO:dedupe.training:(TfidfNGramCanopyPredicate: (0.6, addr), TfidfNGramCanopyPredicate: (0.6, name)) 265 | name : locanda veneta 266 | addr : 3rd st. 267 | city : los angeles 268 | postal : None 269 | latlng : (33.4947903, -112.069374) 270 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 271 | 272 | name : sofi 273 | addr : 3rd st. 274 | city : los angeles 275 | postal : None 276 | latlng : (33.4947903, -112.069374) 277 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 278 | 279 | 6/10 positive, 8/10 negative 280 | Do these records refer to the same thing? 281 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 282 | n 283 | name : hotel bel air 284 | addr : 701 stone canyon rd. 285 | city : bel air 286 | postal : 90077 287 | latlng : (34.0865944, -118.4463507) 288 | addr_variations : frozenset({'701 stone canyon road'}) 289 | 290 | name : bel air hotel 291 | addr : 701 stone canyon rd. 292 | city : bel air 293 | postal : 90077 294 | latlng : (34.0865944, -118.4463507) 295 | addr_variations : frozenset({'701 stone canyon road'}) 296 | 297 | 6/10 positive, 9/10 negative 298 | Do these records refer to the same thing? 299 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 300 | y 301 | name : fenix 302 | addr : 8358 sunset blvd. west 303 | city : hollywood 304 | postal : 90069 305 | latlng : (34.0950968, -118.3719666) 306 | addr_variations : frozenset({'8358 sunset boulevard west'}) 307 | 308 | name : fenix at argyle 309 | addr : 8358 sunset blvd. 
west 310 | city : hollywood 311 | postal : 90069 312 | latlng : (34.0950968, -118.3719666) 313 | addr_variations : frozenset({'8358 sunset boulevard west'}) 314 | 315 | 7/10 positive, 9/10 negative 316 | Do these records refer to the same thing? 317 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 318 | y 319 | name : lulu 320 | addr : 816 folsom st. 321 | city : san francisco 322 | postal : 94107 323 | latlng : (37.7817926, -122.4018175) 324 | addr_variations : frozenset({'816 folsom saint', '816 folsom street'}) 325 | 326 | name : lulu restaurant bis cafe 327 | addr : 816 folsom st. 328 | city : san francisco 329 | postal : 94107 330 | latlng : (37.7817926, -122.4018175) 331 | addr_variations : frozenset({'816 folsom saint', '816 folsom street'}) 332 | 333 | 8/10 positive, 9/10 negative 334 | Do these records refer to the same thing? 335 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 336 | y 337 | INFO:dedupe.training:Final predicate set: 338 | INFO:dedupe.training:(TfidfNGramCanopyPredicate: (0.4, name), TfidfNGramCanopyPredicate: (0.6, addr)) 339 | name : pinot bistro 340 | addr : 12969 ventura blvd. 341 | city : los angeles 342 | postal : 91604 343 | latlng : (34.14571950000001, -118.4160795) 344 | addr_variations : frozenset({'12969 ventura boulevard'}) 345 | 346 | name : pinot bistro 347 | addr : 12969 ventura boulevard 348 | city : studio city 349 | postal : 91604 350 | latlng : (34.14571950000001, -118.4160795) 351 | addr_variations : frozenset({'12969 ventura boulevard'}) 352 | 353 | 9/10 positive, 9/10 negative 354 | Do these records refer to the same thing? 355 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 356 | y 357 | INFO:dedupe.training:Final predicate set: 358 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (sameThreeCharStartPredicate, name)) 359 | INFO:dedupe.training:(SimplePredicate: (fingerprint, city), SimplePredicate: (fingerprint, name)) 360 | name : gramercy tavern 361 | addr : 42 e. 20th st. 
between park ave. s and broadway 362 | city : new york 363 | postal : 10003 364 | latlng : (40.7384555, -73.98850639999999) 365 | addr_variations : frozenset({'42 east 20th saint between park avenue s and broadway', '42 e 20 saint between park avenue south and broadway', '42 east 20 street between park avenue san and broadway', '42 e 20th saint between park avenue san and broadway', '42 east 20 street between park avenue s and broadway', '42 east 20 saint between park avenue s and broadway', '42 east 20th street between park avenue south and broadway', '42 east 20th saint between park avenue san and broadway', '42 e 20th street between park avenue san and broadway', '42 east 20th saint between park avenue south and broadway', '42 east 20th street between park avenue s and broadway', '42 east 20 saint between park avenue san and broadway', '42 e 20th saint between park avenue south and broadway', '42 e 20th saint between park avenue s and broadway', '42 e 20th street between park avenue s and broadway', '42 e 20 street between park avenue s and broadway', '42 east 20 street between park avenue south and broadway', '42 e 20 saint between park avenue s and broadway', '42 e 20 street between park avenue san and broadway', '42 e 20 saint between park avenue san and broadway', '42 e 20th street between park avenue south and broadway', '42 east 20 saint between park avenue south and broadway', '42 e 20 street between park avenue south and broadway', '42 east 20th street between park avenue san and broadway'}) 366 | 367 | name : gramercy tavern 368 | addr : 42 e. 20th st. 369 | city : new york city 370 | postal : 10003 371 | latlng : (40.7384647, -73.9884665) 372 | addr_variations : frozenset({'42 e 20 street', '42 e 20th saint', '42 east 20 saint', '42 east 20th street', '42 e 20 saint', '42 east 20th saint', '42 east 20 street', '42 e 20th street'}) 373 | 374 | 10/10 positive, 9/10 negative 375 | Do these records refer to the same thing? 
376 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 377 | y 378 | INFO:dedupe.training:Final predicate set: 379 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 380 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (sameThreeCharStartPredicate, name)) 381 | name : montrachet 382 | addr : 239 w. broadway between walker and white sts. 383 | city : new york 384 | postal : 10013 385 | latlng : (40.7195598, -74.0057489) 386 | addr_variations : frozenset({'239 west broadway between walker and white streets', '239 w broadway between walker and white streets'}) 387 | 388 | name : montrachet 389 | addr : 239 w. broadway 390 | city : new york city 391 | postal : 10013 392 | latlng : (40.7194666, -74.0057516) 393 | addr_variations : frozenset({'239 west broadway', '239 w broadway'}) 394 | 395 | 11/10 positive, 9/10 negative 396 | Do these records refer to the same thing? 397 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 398 | y 399 | name : smith wollensky 400 | addr : 201 e. 49th st. 401 | city : new york 402 | postal : 10017 403 | latlng : (40.755156, -73.9707177) 404 | addr_variations : frozenset({'201 e 49 street', '201 e 49 saint', '201 east 49 street', '201 east 49 saint', '201 east 49th street', '201 e 49th saint', '201 e 49th street', '201 east 49th saint'}) 405 | 406 | name : smith wollensky 407 | addr : 797 third ave. 408 | city : new york city 409 | postal : 10022 410 | latlng : (40.7551704, -73.9707437) 411 | addr_variations : frozenset({'797 3 avenue', '797 3rd avenue'}) 412 | 413 | 12/10 positive, 9/10 negative 414 | Do these records refer to the same thing? 
415 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 416 | y 417 | INFO:dedupe.training:Final predicate set: 418 | INFO:dedupe.training:(SimplePredicate: (sameSevenCharStartPredicate, addr), SimplePredicate: (sameThreeCharStartPredicate, name)) 419 | INFO:dedupe.training:(SimplePredicate: (fingerprint, city), SimplePredicate: (fingerprint, name)) 420 | name : postrio 421 | addr : 545 post st. 422 | city : san francisco 423 | postal : 94102 424 | latlng : (37.78782959999999, -122.4107561) 425 | addr_variations : frozenset({'545 post saint', '545 post street'}) 426 | 427 | name : pacific pan pacific hotel 428 | addr : 500 post st. 429 | city : san francisco 430 | postal : 94102 431 | latlng : (37.7883396, -122.4103029) 432 | addr_variations : frozenset({'500 post saint', '500 post street'}) 433 | 434 | 13/10 positive, 9/10 negative 435 | Do these records refer to the same thing? 436 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 437 | n 438 | INFO:dedupe.training:Final predicate set: 439 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 440 | INFO:dedupe.training:(SimplePredicate: (commonThreeTokens, addr), SimplePredicate: (firstTokenPredicate, name)) 441 | name : teresa 442 | addr : 103 1st ave. between 6th and 7th sts. 443 | city : new york 444 | postal : 10003 445 | latlng : (40.7266961, -73.9861943) 446 | addr_variations : frozenset({'103 1 avenue between 6th and 7 streets', '103 1 avenue between 6 and 7th streets', '103 1 avenue between 6 and 7 streets', '103 1st avenue between 6th and 7th streets', '103 1 avenue between 6th and 7th streets', '103 1st avenue between 6 and 7th streets', '103 1st avenue between 6 and 7 streets', '103 1st avenue between 6th and 7 streets'}) 447 | 448 | name : teresa 449 | addr : 80 montague st. 
450 | city : queens 451 | postal : 11201 452 | latlng : (40.6951748, -73.9962484) 453 | addr_variations : frozenset({'80 montague street', '80 montague saint'}) 454 | 455 | 13/10 positive, 10/10 negative 456 | Do these records refer to the same thing? 457 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 458 | n 459 | name : cafe ritz carlton buckhead 460 | addr : 3434 peachtree rd. 461 | city : atlanta 462 | postal : 30326 463 | latlng : (33.8508073, -84.364227) 464 | addr_variations : frozenset({'3434 peachtree road'}) 465 | 466 | name : dining room ritz carlton buckhead 467 | addr : 3434 peachtree rd. 468 | city : atlanta 469 | postal : 30326 470 | latlng : (33.8508073, -84.364227) 471 | addr_variations : frozenset({'3434 peachtree road'}) 472 | 473 | 13/10 positive, 11/10 negative 474 | Do these records refer to the same thing? 475 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 476 | n 477 | name : tavern green 478 | addr : in central park at 67th st. 479 | city : new york 480 | postal : 10023 481 | latlng : (40.7730403, -73.97829449999999) 482 | addr_variations : frozenset({'in central park at 67th saint', 'in central park at 67th street', 'indiana central park at 67th saint', 'in central park at 67 street', 'indiana central park at 67 saint', 'indiana central park at 67 street', 'in central park at 67 saint', 'indiana central park at 67th street'}) 483 | 484 | name : tavern green 485 | addr : central park west 486 | city : new york city 487 | postal : None 488 | latlng : (40.7848582, -73.9696519) 489 | addr_variations : frozenset({'central park west'}) 490 | 491 | 13/10 positive, 12/10 negative 492 | Do these records refer to the same thing? 493 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 494 | y 495 | name : montrachet 496 | addr : 3000 w. paradise rd. 
497 | city : las vegas 498 | postal : 89109 499 | latlng : (36.1362611, -115.1512539) 500 | addr_variations : frozenset({'3000 w paradise road', '3000 west paradise road'}) 501 | 502 | name : montrachet bistro 503 | addr : 3000 paradise rd. 504 | city : las vegas 505 | postal : 89109 506 | latlng : (36.1362611, -115.1512539) 507 | addr_variations : frozenset({'3000 paradise road'}) 508 | 509 | 14/10 positive, 12/10 negative 510 | Do these records refer to the same thing? 511 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 512 | y 513 | name : c3 514 | addr : 103 waverly pl. near washington sq. 515 | city : new york 516 | postal : 10011 517 | latlng : (40.7324496, -73.9987276) 518 | addr_variations : frozenset({'103 waverly place near washington square', '103 waverly plain near washington square'}) 519 | 520 | name : caffe dell artista 521 | addr : 46 greenwich ave. 522 | city : new york 523 | postal : 10011 524 | latlng : (40.735596, -74.000357) 525 | addr_variations : frozenset({'46 greenwich avenue'}) 526 | 527 | 15/10 positive, 12/10 negative 528 | Do these records refer to the same thing? 529 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 530 | n 531 | INFO:dedupe.training:Final predicate set: 532 | INFO:dedupe.training:(SimplePredicate: (fingerprint, name), SimplePredicate: (latLongGridPredicate, latlng)) 533 | INFO:dedupe.training:(LevenshteinCanopyPredicate: (2, addr), SimplePredicate: (firstTokenPredicate, name)) 534 | name : main street 535 | addr : 446 columbus ave. between 81st and 82nd sts. 536 | city : new york 537 | postal : 10024 538 | latlng : (40.784841, -73.97746599999999) 539 | addr_variations : frozenset({'446 columbus avenue between 81st and 82nd streets', '446 columbus avenue between 81 and 82 streets', '446 columbus avenue between 81 and 82nd streets', '446 columbus avenue between 81st and 82 streets'}) 540 | 541 | name : rain 542 | addr : 100 w. 82nd st. 
543 | city : new york 544 | postal : 10024 545 | latlng : (40.7839758, -73.9745045) 546 | addr_variations : frozenset({'100 west 82nd saint', '100 w 82nd street', '100 west 82 street', '100 west 82 saint', '100 w 82nd saint', '100 w 82 street', '100 west 82nd street', '100 w 82 saint'}) 547 | 548 | 15/10 positive, 13/10 negative 549 | Do these records refer to the same thing? 550 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 551 | n 552 | name : folie 553 | addr : 2316 polk st. 554 | city : san francisco 555 | postal : 94109 556 | latlng : (37.7981417, -122.4220609) 557 | addr_variations : frozenset({'2316 polk saint', '2316 polk street'}) 558 | 559 | name : mario bohemian cigar store cafe 560 | addr : 2209 polk st. 561 | city : san francisco 562 | postal : 94109 563 | latlng : (37.7971197, -122.4222379) 564 | addr_variations : frozenset({'2209 polk saint', '2209 polk street'}) 565 | 566 | 15/10 positive, 14/10 negative 567 | Do these records refer to the same thing? 568 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 569 | n 570 | name : locanda veneta 571 | addr : 3rd st. 572 | city : los angeles 573 | postal : None 574 | latlng : (33.4947903, -112.069374) 575 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 576 | 577 | name : cava 578 | addr : 3rd st. 579 | city : los angeles 580 | postal : None 581 | latlng : (33.4947903, -112.069374) 582 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 583 | 584 | 15/10 positive, 15/10 negative 585 | Do these records refer to the same thing? 586 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 587 | n 588 | name : lanza restaurant 589 | addr : 168 1st ave. between 10th and 11th sts. 
590 | city : new york 591 | postal : 10009 592 | latlng : (40.728755, -73.98406500000002) 593 | addr_variations : frozenset({'168 1st avenue between 10 and 11th streets', '168 1 avenue between 10th and 11th streets', '168 1st avenue between 10th and 11th streets', '168 1st avenue between 10th and 11 streets', '168 1 avenue between 10 and 11 streets', '168 1 avenue between 10 and 11th streets', '168 1 avenue between 10th and 11 streets', '168 1st avenue between 10 and 11 streets'}) 594 | 595 | name : xunta 596 | addr : 174 1st ave. between 10th and 11th sts. 597 | city : new york 598 | postal : 10009 599 | latlng : (40.72907, -73.983948) 600 | addr_variations : frozenset({'174 1st avenue between 10 and 11th streets', '174 1 avenue between 10 and 11 streets', '174 1st avenue between 10th and 11 streets', '174 1 avenue between 10th and 11th streets', '174 1st avenue between 10th and 11th streets', '174 1st avenue between 10 and 11 streets', '174 1 avenue between 10th and 11 streets', '174 1 avenue between 10 and 11th streets'}) 601 | 602 | 15/10 positive, 16/10 negative 603 | Do these records refer to the same thing? 604 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 605 | n 606 | name : ritz carlton restaurant dining room 607 | addr : 600 stockton st. 608 | city : san francisco 609 | postal : 94108 610 | latlng : (37.7918754, -122.4070392) 611 | addr_variations : frozenset({'600 stockton street', '600 stockton saint'}) 612 | 613 | name : ritz carlton dining room san francisco 614 | addr : 600 stockton st. 615 | city : san francisco 616 | postal : 94108 617 | latlng : (37.7918754, -122.4070392) 618 | addr_variations : frozenset({'600 stockton street', '600 stockton saint'}) 619 | 620 | 15/10 positive, 17/10 negative 621 | Do these records refer to the same thing? 622 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 623 | y 624 | name : arnie morton chicago 625 | addr : 435 s. la cienega blv. 
626 | city : los angeles 627 | postal : 90048 628 | latlng : (34.070609, -118.376722) 629 | addr_variations : frozenset({'435 san louisiana cienega boulevard', '435 south lane cienega bulevar', '435 sur la cienega boulevard', '435 south la cienega boulevard', '435 san la cienega bulevar', '435 sur la cienega bulevar', '435 south louisiana cienega bulevar', '435 sur lane cienega boulevard', '435 south louisiana cienega boulevard', '435 sur louisiana cienega bulevar', '435 s la cienega bulevar', '435 sur lane cienega bulevar', '435 s louisiana cienega bulevar', '435 south la cienega bulevar', '435 san lane cienega bulevar', '435 s lane cienega boulevard', '435 san lane cienega boulevard', '435 s louisiana cienega boulevard', '435 s lane cienega bulevar', '435 san la cienega boulevard', '435 s la cienega boulevard', '435 san louisiana cienega bulevar', '435 south lane cienega boulevard', '435 sur louisiana cienega boulevard'}) 630 | 631 | name : arnie morton 632 | addr : 435 s. la cienega boulevard 633 | city : los angeles 634 | postal : 90048 635 | latlng : (34.070609, -118.376722) 636 | addr_variations : frozenset({'435 south la cienega boulevard', '435 san la cienega boulevard', '435 s la cienega boulevard', '435 san louisiana cienega boulevard', '435 south lane cienega boulevard', '435 s lane cienega boulevard', '435 san lane cienega boulevard', '435 s louisiana cienega boulevard', '435 south louisiana cienega boulevard'}) 637 | 638 | 16/10 positive, 17/10 negative 639 | Do these records refer to the same thing? 640 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 641 | y 642 | name : palm 643 | addr : 9001 santa monica blvd. 
644 | city : los angeles 645 | postal : 90069 646 | latlng : (34.083064, -118.387282) 647 | addr_variations : frozenset({'9001 santa monica boulevard'}) 648 | 649 | name : palm los angeles 650 | addr : 9001 sta monica boulevard 651 | city : hollywood 652 | postal : 90069 653 | latlng : (34.083064, -118.387282) 654 | addr_variations : frozenset({'9001 station monica boulevard', '9001 santa monica boulevard'}) 655 | 656 | 17/10 positive, 17/10 negative 657 | Do these records refer to the same thing? 658 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 659 | y 660 | INFO:dedupe.training:Final predicate set: 661 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, addr), SimplePredicate: (firstTokenPredicate, name)) 662 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 663 | name : caffe lure 664 | addr : 169 sullivan st. between houston and bleecker sts. 665 | city : new york 666 | postal : 10012 667 | latlng : (40.7279278, -74.0009847) 668 | addr_variations : frozenset({'169 sullivan street between houston and bleecker streets', '169 sullivan saint between houston and bleecker streets'}) 669 | 670 | name : caffe reggio 671 | addr : 119 macdougal st. between 3rd and bleecker sts. 672 | city : new york 673 | postal : 10012 674 | latlng : (40.73030790000001, -74.0003706) 675 | addr_variations : frozenset({'119 macdougal saint between 3 and bleecker streets', '119 macdougal saint between 3rd and bleecker streets', '119 macdougal street between 3rd and bleecker streets', '119 macdougal street between 3 and bleecker streets'}) 676 | 677 | 18/10 positive, 17/10 negative 678 | Do these records refer to the same thing? 
679 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 680 | n 681 | INFO:dedupe.training:Final predicate set: 682 | INFO:dedupe.training:(SimplePredicate: (firstTokenPredicate, name), TfidfNGramCanopyPredicate: (0.6, addr)) 683 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 684 | name : fringale 685 | addr : 570 4th st. 686 | city : san francisco 687 | postal : 94107 688 | latlng : (37.7785416, -122.3971931) 689 | addr_variations : frozenset({'570 4th street', '570 4 street', '570 4 saint', '570 4th saint'}) 690 | 691 | name : fringale 692 | addr : 570 fourth st. 693 | city : san francisco 694 | postal : 94107 695 | latlng : (37.7785416, -122.3971931) 696 | addr_variations : frozenset({'570 4th street', '570 4 street', '570 4 saint', '570 4th saint'}) 697 | 698 | 18/10 positive, 18/10 negative 699 | Do these records refer to the same thing? 700 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 701 | y 702 | name : locanda veneta 703 | addr : 8638 w 3rd 704 | city : st los angeles 705 | postal : 90048 706 | latlng : (34.0734172, -118.3810964) 707 | addr_variations : frozenset({'8638 wohnung 3rd', '8638 w 3', '8638 wohnung 3', '8638 weg 3', '8638 west 3rd', '8638 w 3rd', '8638 west 3', '8638 weg 3rd'}) 708 | 709 | name : locanda 710 | addr : w. third st. 711 | city : st los angeles 712 | postal : None 713 | latlng : (34.0689584, -118.3209281) 714 | addr_variations : frozenset({'west 3 street', 'w 3 saint', 'west 3rd street', 'w 3rd street', 'w 3 street', 'west 3 saint', 'w 3rd saint', 'west 3rd saint'}) 715 | 716 | 19/10 positive, 18/10 negative 717 | Do these records refer to the same thing? 
718 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 719 | y 720 | INFO:dedupe.training:Final predicate set: 721 | INFO:dedupe.training:(SimplePredicate: (firstTokenPredicate, name), SimplePredicate: (sameThreeCharStartPredicate, addr)) 722 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 723 | name : locanda veneta 724 | addr : 3rd st. 725 | city : los angeles 726 | postal : None 727 | latlng : (33.4947903, -112.069374) 728 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 729 | 730 | name : locanda veneta 731 | addr : 8638 w. third st. 732 | city : los angeles 733 | postal : 90048 734 | latlng : (34.0734172, -118.3810964) 735 | addr_variations : frozenset({'8638 west 3rd street', '8638 west 3 street', '8638 w 3 saint', '8638 west 3 saint', '8638 w 3rd street', '8638 west 3rd saint', '8638 w 3rd saint', '8638 w 3 street'}) 736 | 737 | 20/10 positive, 18/10 negative 738 | Do these records refer to the same thing? 739 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 740 | y 741 | INFO:dedupe.training:Final predicate set: 742 | INFO:dedupe.training:(SimplePredicate: (firstTokenPredicate, name), SimplePredicate: (sameThreeCharStartPredicate, addr)) 743 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) 744 | INFO:dedupe.training:(SimplePredicate: (commonThreeTokens, city), SimplePredicate: (sameSevenCharStartPredicate, name)) 745 | name : locanda veneta 746 | addr : 3rd st. 
747 | city : los angeles 748 | postal : None 749 | latlng : (33.4947903, -112.069374) 750 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 751 | 752 | name : locanda veneta 753 | addr : 8638 w 3rd 754 | city : st los angeles 755 | postal : 90048 756 | latlng : (34.0734172, -118.3810964) 757 | addr_variations : frozenset({'8638 wohnung 3rd', '8638 w 3', '8638 wohnung 3', '8638 weg 3', '8638 west 3rd', '8638 w 3rd', '8638 west 3', '8638 weg 3rd'}) 758 | 759 | 21/10 positive, 18/10 negative 760 | Do these records refer to the same thing? 761 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 762 | y 763 | name : ritz carlton dining room buckhead 764 | addr : 3434 peachtree rd. ne 765 | city : atlanta 766 | postal : 30326 767 | latlng : (33.8508073, -84.364227) 768 | addr_variations : frozenset({'3434 peachtree road northeast', '3434 peachtree road ne', '3434 peachtree road nebraska'}) 769 | 770 | name : monte carlo 771 | addr : 3145 las vegas blvd. s. 772 | city : las vegas 773 | postal : 89109 774 | latlng : (36.127675, -115.1664725) 775 | addr_variations : frozenset({'3145 las vegas boulevard san', '3145 las vegas boulevard s', '3145 las vegas boulevard south', '3145 las vegas boulevard sur'}) 776 | 777 | 22/10 positive, 18/10 negative 778 | Do these records refer to the same thing? 779 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 780 | n 781 | name : park avenue cafe 782 | addr : 100 e. 63rd st. 783 | city : new york 784 | postal : 10065 785 | latlng : (40.7650225, -73.9676044) 786 | addr_variations : frozenset({'100 east 63rd saint', '100 east 63 street', '100 east 63 saint', '100 e 63rd saint', '100 east 63rd street', '100 e 63 saint', '100 e 63 street', '100 e 63rd street'}) 787 | 788 | name : west beach cafe 789 | addr : 60 n. venice blvd. 
790 | city : los angeles 791 | postal : 90291 792 | latlng : (33.984674, -118.4703147) 793 | addr_variations : frozenset({'60 n venice boulevard', '60 north venice boulevard'}) 794 | 795 | 22/10 positive, 19/10 negative 796 | Do these records refer to the same thing? 797 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 798 | n 799 | name : rain 800 | addr : 100 w. 82nd st. 801 | city : new york 802 | postal : 10024 803 | latlng : (40.7839758, -73.9745045) 804 | addr_variations : frozenset({'100 west 82nd saint', '100 w 82nd street', '100 west 82 street', '100 west 82 saint', '100 w 82nd saint', '100 w 82 street', '100 west 82nd street', '100 w 82 saint'}) 805 | 806 | name : splendido embarcadero 807 | addr : 4 808 | city : san francisco 809 | postal : None 810 | latlng : (37.7773755, -122.395447) 811 | addr_variations : frozenset({'4'}) 812 | 813 | 22/10 positive, 20/10 negative 814 | Do these records refer to the same thing? 815 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 816 | n 817 | name : cava 818 | addr : 3rd st. 819 | city : los angeles 820 | postal : None 821 | latlng : (33.4947903, -112.069374) 822 | addr_variations : frozenset({'3 street', '3rd saint', '3 saint', '3rd street'}) 823 | 824 | name : veniero pasticceria 825 | addr : 342 e. 11th st. near 1st ave. 
826 | city : new york 827 | postal : 10003 828 | latlng : (40.7294893, -73.98452019999999) 829 | addr_variations : frozenset({'342 e 11 street near 1st avenue', '342 east 11th street near 1 avenue', '342 east 11 street near 1st avenue', '342 east 11 saint near 1 avenue', '342 east 11th saint near 1 avenue', '342 east 11 street near 1 avenue', '342 e 11 saint near 1st avenue', '342 e 11th street near 1st avenue', '342 east 11th street near 1st avenue', '342 e 11th saint near 1st avenue', '342 e 11th street near 1 avenue', '342 east 11 saint near 1st avenue', '342 e 11 saint near 1 avenue', '342 e 11 street near 1 avenue', '342 e 11th saint near 1 avenue', '342 east 11th saint near 1st avenue'}) 830 | 831 | 22/10 positive, 21/10 negative 832 | Do these records refer to the same thing? 833 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 834 | n 835 | name : szechuan hunan cottage 836 | addr : 1588 york ave. 837 | city : new york city 838 | postal : 10028 839 | latlng : (40.7743013, -73.94803689999999) 840 | addr_variations : frozenset({'1588 york avenue'}) 841 | 842 | name : szechuan kitchen 843 | addr : 1460 first ave. 844 | city : new york city 845 | postal : 10021 846 | latlng : (40.7700976, -73.95371999999999) 847 | addr_variations : frozenset({'1460 1 avenue', '1460 1st avenue'}) 848 | 849 | 22/10 positive, 22/10 negative 850 | Do these records refer to the same thing? 851 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 852 | n 853 | name : jody maroni sausage kingdom 854 | addr : 2011 ocean front walk 855 | city : venice 856 | postal : 90291 857 | latlng : (33.9846332, -118.471432) 858 | addr_variations : frozenset({'2011 ocean front walk'}) 859 | 860 | name : joe 861 | addr : 1023 abbot kinney blvd. 862 | city : venice 863 | postal : 90291 864 | latlng : (33.9922429, -118.4718658) 865 | addr_variations : frozenset({'1023 abbot kinney boulevard'}) 866 | 867 | 22/10 positive, 23/10 negative 868 | Do these records refer to the same thing? 
869 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 870 | n 871 | name : auberge 872 | addr : 1191 1st ave. between 64th and 65th sts. 873 | city : new york 874 | postal : 10065 875 | latlng : (40.763164, -73.959661) 876 | addr_variations : frozenset({'1191 1st avenue between 64th and 65 streets', '1191 1st avenue between 64 and 65 streets', '1191 1 avenue between 64 and 65th streets', '1191 1 avenue between 64 and 65 streets', '1191 1 avenue between 64th and 65 streets', '1191 1st avenue between 64th and 65th streets', '1191 1 avenue between 64th and 65th streets', '1191 1st avenue between 64 and 65th streets'}) 877 | 878 | name : auberge du midi 879 | addr : 310 w. 4th st. between w. 12th and bank sts. 880 | city : new york 881 | postal : 10014 882 | latlng : (40.7372617, -74.0039956) 883 | addr_variations : frozenset({'310 west 4 saint between west 12 and bank streets', '310 w 4 street between w 12 and bank streets', '310 west 4 saint between w 12th and bank streets', '310 w 4th saint between w 12th and bank streets', '310 west 4 street between west 12 and bank streets', '310 w 4th street between w 12 and bank streets', '310 w 4 street between w 12th and bank streets', '310 west 4th saint between west 12th and bank streets', '310 w 4 saint between west 12 and bank streets', '310 w 4th saint between w 12 and bank streets', '310 w 4th street between west 12 and bank streets', '310 w 4th saint between west 12th and bank streets', '310 west 4 street between w 12 and bank streets', '310 west 4th saint between w 12 and bank streets', '310 w 4 saint between west 12th and bank streets', '310 w 4 street between west 12th and bank streets', '310 west 4 saint between west 12th and bank streets', '310 west 4th street between w 12 and bank streets', '310 w 4th saint between west 12 and bank streets', '310 west 4th street between west 12 and bank streets', '310 w 4 saint between w 12 and bank streets', '310 w 4th street between west 12th and bank streets', '310 west 4 
street between west 12th and bank streets', '310 west 4 saint between w 12 and bank streets', '310 west 4 street between w 12th and bank streets', '310 west 4th saint between w 12th and bank streets', '310 w 4 street between west 12 and bank streets', '310 w 4 saint between w 12th and bank streets', '310 west 4th street between west 12th and bank streets', '310 west 4th saint between west 12 and bank streets', '310 west 4th street between w 12th and bank streets', '310 w 4th street between w 12th and bank streets'}) 884 | 885 | 22/10 positive, 24/10 negative 886 | Do these records refer to the same thing? 887 | (y)es / (n)o / (u)nsure / (f)inished / (p)revious 888 | f 889 | Finished labeling 890 | INFO:rlr.crossvalidation:using cross validation to find optimum alpha... 891 | INFO:rlr.crossvalidation:optimum alpha: 1.000000, score 0.4295512458697395 892 | INFO:dedupe.training:Final predicate set: 893 | INFO:dedupe.training:(SimplePredicate: (firstTokenPredicate, name), TfidfNGramCanopyPredicate: (0.6, addr)) 894 | INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, name), SimplePredicate: (fingerprint, name)) -------------------------------------------------------------------------------- /training-simple-input-output.txt: -------------------------------------------------------------------------------- 1 | name : philippe the original 2 | addr : 1001 north alameda 3 | postal : 90012 4 | latlng : (34.059721, -118.237025) 5 | 6 | name : pisces 7 | addr : 95 ave. a 8 | postal : 10009 9 | latlng : (40.7256332, -73.984031) 10 | 11 | 0/10 positive, 0/10 negative 12 | Do these records refer to the same thing? 13 | (y)es / (n)o / (u)nsure / (f)inished 14 | n 15 | 16 | name : philippe the original 17 | addr : 1001 n. alameda st. 18 | postal : 90012 19 | latlng : (34.059721, -118.237025) 20 | 21 | name : mon kee seafood restaurant 22 | addr : 679 n. spring st. 
23 | postal : 90012 24 | latlng : (34.0595568, -118.2382488) 25 | 26 | 0/10 positive, 1/10 negative 27 | Do these records refer to the same thing? 28 | (y)es / (n)o / (u)nsure / (f)inished 29 | n 30 | 31 | name : caffe vivaldi 32 | addr : 32 jones st. at bleecker st. 33 | postal : 10014 34 | latlng : (40.7317316, -74.00298049999999) 35 | 36 | name : patria 37 | addr : 250 park ave. s at 20th st. 38 | postal : 10003 39 | latlng : (40.7382552, -73.988214) 40 | 41 | 0/10 positive, 2/10 negative 42 | Do these records refer to the same thing? 43 | (y)es / (n)o / (u)nsure / (f)inished 44 | n 45 | 46 | name : i trulli 47 | addr : 122 e. 27th st. between lexington and park aves. 48 | postal : 10028 49 | latlng : (40.77961699999999, -73.95631999999999) 50 | 51 | name : otabe 52 | addr : 68 e. 56th st. 53 | postal : 10022 54 | latlng : (40.7611775, -73.9720541) 55 | 56 | 0/10 positive, 3/10 negative 57 | Do these records refer to the same thing? 58 | (y)es / (n)o / (u)nsure / (f)inished 59 | n 60 | 61 | name : viva mercado s 62 | addr : 6182 w. flamingo rd. 63 | postal : 89103 64 | latlng : (36.1149027, -115.2269398) 65 | 66 | name : cafe con leche 67 | addr : 424 amsterdam ave. 68 | postal : 10024 69 | latlng : (40.7841454, -73.9778061) 70 | 71 | 0/10 positive, 4/10 negative 72 | Do these records refer to the same thing? 73 | (y)es / (n)o / (u)nsure / (f)inished 74 | n 75 | 76 | name : faz 77 | addr : 161 sutter st. 78 | postal : 94104 79 | latlng : (37.78973999999999, -122.4032937) 80 | 81 | name : splendido embarcadero 82 | addr : 4 83 | postal : None 84 | latlng : (37.7779649, -122.3962019) 85 | 86 | 0/10 positive, 5/10 negative 87 | Do these records refer to the same thing? 88 | (y)es / (n)o / (u)nsure / (f)inished 89 | n 90 | 91 | name : le bernardin 92 | addr : 155 w. 51st st. 93 | postal : 10019 94 | latlng : (40.7615691, -73.98180479999999) 95 | 96 | name : le bernardin 97 | addr : 155 w. 51st st. 
98 | postal : 10019 99 | latlng : (40.7615691, -73.98180479999999) 100 | 101 | 0/10 positive, 6/10 negative 102 | Do these records refer to the same thing? 103 | (y)es / (n)o / (u)nsure / (f)inished 104 | y 105 | 106 | name : cafe lalo 107 | addr : 201 w. 83rd st. 108 | postal : 10024 109 | latlng : (40.78598119999999, -73.97672659999999) 110 | 111 | name : cafe lalo 112 | addr : 201 w. 83rd st. 113 | postal : 10024 114 | latlng : (40.78598119999999, -73.97672659999999) 115 | 116 | 1/10 positive, 6/10 negative 117 | Do these records refer to the same thing? 118 | (y)es / (n)o / (u)nsure / (f)inished 119 | y 120 | 121 | INFO:dedupe.training:Final predicate set: 122 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (wholeFieldPredicate, name)) 123 | name : hi life restaurant and lounge 124 | addr : 1340 1st ave. at 72nd st. 125 | postal : 10021 126 | latlng : (40.7675807, -73.95580029999999) 127 | 128 | name : trattoria dell arte 129 | addr : 900 7th ave. between 56th and 57th sts. 130 | postal : 10106 131 | latlng : (40.7654454, -73.9805358) 132 | 133 | 2/10 positive, 6/10 negative 134 | Do these records refer to the same thing? 135 | (y)es / (n)o / (u)nsure / (f)inished 136 | n 137 | 138 | name : lattanzi ristorante 139 | addr : 361 w. 46th st. 140 | postal : 10036 141 | latlng : (40.7608494, -73.990016) 142 | 143 | name : pomaire 144 | addr : 371 w. 46th st. off 9th ave. 145 | postal : 10036 146 | latlng : (40.7609632, -73.9902681) 147 | 148 | 2/10 positive, 7/10 negative 149 | Do these records refer to the same thing? 150 | (y)es / (n)o / (u)nsure / (f)inished 151 | n 152 | 153 | name : empire korea 154 | addr : 6 e. 32nd st. 155 | postal : 10016 156 | latlng : (40.7465392, -73.9849772) 157 | 158 | name : tu lan 159 | addr : 8 sixth st. 160 | postal : 94103 161 | latlng : (37.7818759, -122.41013) 162 | 163 | 2/10 positive, 8/10 negative 164 | Do these records refer to the same thing? 
165 | (y)es / (n)o / (u)nsure / (f)inished 166 | n 167 | 168 | name : aquavit 169 | addr : 13 w. 54th st. 170 | postal : 10019 171 | latlng : (40.7616767, -73.976345) 172 | 173 | name : aquavit 174 | addr : 13 w. 54th st. 175 | postal : 10019 176 | latlng : (40.7616767, -73.976345) 177 | 178 | 2/10 positive, 9/10 negative 179 | Do these records refer to the same thing? 180 | (y)es / (n)o / (u)nsure / (f)inished 181 | y 182 | 183 | name : mesa grill 184 | addr : 102 5th ave. between 15th and 16th sts. 185 | postal : 10011 186 | latlng : (40.7370445, -73.9931189) 187 | 188 | name : fiorello s roman cafe 189 | addr : 1900 broadway between 63rd and 64th sts. 190 | postal : 10023 191 | latlng : (40.7715867, -73.98138519999999) 192 | 193 | 3/10 positive, 9/10 negative 194 | Do these records refer to the same thing? 195 | (y)es / (n)o / (u)nsure / (f)inished 196 | n 197 | 198 | name : second avenue deli 199 | addr : 156 2nd ave. at 10th st. 200 | postal : 10003 201 | latlng : (40.7296096, -73.9867012) 202 | 203 | name : second avenue deli 204 | addr : 156 second ave. 205 | postal : 10003 206 | latlng : (40.7296096, -73.9867012) 207 | 208 | 3/10 positive, 10/10 negative 209 | Do these records refer to the same thing? 210 | (y)es / (n)o / (u)nsure / (f)inished 211 | y 212 | 213 | name : second street grill 214 | addr : 200 e. fremont st. 215 | postal : 89101 216 | latlng : (36.1709271, -115.1431516) 217 | 218 | name : paty s 219 | addr : 10001 riverside dr. 220 | postal : 91602 221 | latlng : (34.1524376, -118.3496191) 222 | 223 | 4/10 positive, 10/10 negative 224 | Do these records refer to the same thing? 225 | (y)es / (n)o / (u)nsure / (f)inished 226 | n 227 | 228 | INFO:dedupe.training:Final predicate set: 229 | INFO:dedupe.training:(SimplePredicate: (firstIntegerPredicate, addr), SimplePredicate: (wholeFieldPredicate, name)) 230 | name : cafe ritz carlton buckhead 231 | addr : 3434 peachtree rd. 
232 | postal : 30326 233 | latlng : (33.8508073, -84.364227) 234 | 235 | name : ritz carlton cafe buckhead 236 | addr : 3434 peachtree rd. ne 237 | postal : 30326 238 | latlng : (33.8508073, -84.364227) 239 | 240 | 4/10 positive, 11/10 negative 241 | Do these records refer to the same thing? 242 | (y)es / (n)o / (u)nsure / (f)inished 243 | y 244 | 245 | INFO:dedupe.training:Final predicate set: 246 | INFO:dedupe.training:(SimplePredicate: (fingerprint, name), SimplePredicate: (firstIntegerPredicate, addr)) 247 | name : le colonial 248 | addr : 149 e. 57th st. 249 | postal : 10022 250 | latlng : (40.7608569, -73.9683494) 251 | 252 | name : cassell s 253 | addr : 3266 w. sixth st. 254 | postal : 90020 255 | latlng : (34.0634809, -118.2936285) 256 | 257 | 5/10 positive, 11/10 negative 258 | Do these records refer to the same thing? 259 | (y)es / (n)o / (u)nsure / (f)inished 260 | n 261 | 262 | name : cafe ritz carlton buckhead 263 | addr : 3434 peachtree rd. 264 | postal : 30326 265 | latlng : (33.8508073, -84.364227) 266 | 267 | name : ritz carlton dining room buckhead 268 | addr : 3434 peachtree rd. ne 269 | postal : 30326 270 | latlng : (33.8508073, -84.364227) 271 | 272 | 5/10 positive, 12/10 negative 273 | Do these records refer to the same thing? 274 | (y)es / (n)o / (u)nsure / (f)inished 275 | n 276 | 277 | name : pamir 278 | addr : 1065 1st ave. at 58th st. 279 | postal : 10022 280 | latlng : (40.7591914, -73.9626112) 281 | 282 | name : rosa mexicano 283 | addr : 1063 1st ave. at 58th st. 284 | postal : 10022 285 | latlng : (40.75896580000001, -73.9627039) 286 | 287 | 5/10 positive, 13/10 negative 288 | Do these records refer to the same thing? 289 | (y)es / (n)o / (u)nsure / (f)inished 290 | n 291 | 292 | name : smith wollensky 293 | addr : 201 e. 49th st. 294 | postal : 10017 295 | latlng : (40.755156, -73.9707177) 296 | 297 | name : smith wollensky 298 | addr : 797 third ave. 
299 | postal : 10022 300 | latlng : (40.7551704, -73.9707437) 301 | 302 | 5/10 positive, 14/10 negative 303 | Do these records refer to the same thing? 304 | (y)es / (n)o / (u)nsure / (f)inished 305 | y 306 | 307 | name : le gamin 308 | addr : 50 macdougal st. between houston and prince sts. 309 | postal : 10012 310 | latlng : (40.7273246, -74.0024635) 311 | 312 | name : le marais 313 | addr : 150 w. 46th st. 314 | postal : 10036 315 | latlng : (40.7580041, -73.98431149999999) 316 | 317 | 6/10 positive, 14/10 negative 318 | Do these records refer to the same thing? 319 | (y)es / (n)o / (u)nsure / (f)inished 320 | n 321 | 322 | INFO:dedupe.training:Final predicate set: 323 | INFO:dedupe.training:(SimplePredicate: (fingerprint, name), SimplePredicate: (sameThreeCharStartPredicate, postal)) 324 | name : lespinasse 325 | addr : 2 e. 55th st. 326 | postal : 10022 327 | latlng : (40.7613979, -73.9746128) 328 | 329 | name : lespinasse new york city 330 | addr : 2 e. 55th st. 331 | postal : 10022 332 | latlng : (40.7613979, -73.9746128) 333 | 334 | 6/10 positive, 15/10 negative 335 | Do these records refer to the same thing? 336 | (y)es / (n)o / (u)nsure / (f)inished 337 | y 338 | 339 | name : philippe s the original 340 | addr : 1001 n. alameda st. 341 | postal : 90012 342 | latlng : (34.059721, -118.237025) 343 | 344 | name : philippe the original 345 | addr : 1001 north alameda 346 | postal : 90012 347 | latlng : (34.059721, -118.237025) 348 | 349 | 7/10 positive, 15/10 negative 350 | Do these records refer to the same thing? 351 | (y)es / (n)o / (u)nsure / (f)inished 352 | y 353 | 354 | INFO:dedupe.training:Final predicate set: 355 | INFO:dedupe.training:(SimplePredicate: (latLongGridPredicate, latlng), TfidfNGramCanopyPredicate: (0.6, name)) 356 | name : restaurant ritz carlton atlanta 357 | addr : 181 peachtree st. 358 | postal : 30303 359 | latlng : (33.7585793, -84.3870657) 360 | 361 | name : ritz carlton cafe atlanta 362 | addr : 181 peachtree st. 
363 | postal : 30303 364 | latlng : (33.7585793, -84.3870657) 365 | 366 | 8/10 positive, 15/10 negative 367 | Do these records refer to the same thing? 368 | (y)es / (n)o / (u)nsure / (f)inished 369 | n 370 | 371 | name : tillerman 372 | addr : 2245 e. flamingo rd. 373 | postal : 89119 374 | latlng : (36.114384, -115.1218936) 375 | 376 | name : original pantry bakery 377 | addr : 875 s. figueroa st. downtown 378 | postal : 90017 379 | latlng : (34.0464451, -118.2628321) 380 | 381 | 8/10 positive, 16/10 negative 382 | Do these records refer to the same thing? 383 | (y)es / (n)o / (u)nsure / (f)inished 384 | n 385 | 386 | name : georgia grille 387 | addr : 2290 peachtree rd. peachtree square shopping center 388 | postal : 30309 389 | latlng : (33.8168771, -84.3905065) 390 | 391 | name : georgia grille 392 | addr : 2290 peachtree rd. 393 | postal : 30309 394 | latlng : (33.8171632, -84.3900366) 395 | 396 | 8/10 positive, 17/10 negative 397 | Do these records refer to the same thing? 398 | (y)es / (n)o / (u)nsure / (f)inished 399 | y 400 | 401 | name : paty s 402 | addr : 10001 riverside dr. 403 | postal : 91602 404 | latlng : (34.1524376, -118.3496191) 405 | 406 | name : restaurant horikawa 407 | addr : 111 s. san pedro st. 408 | postal : 90012 409 | latlng : (34.0500968, -118.2413802) 410 | 411 | 9/10 positive, 17/10 negative 412 | Do these records refer to the same thing? 413 | (y)es / (n)o / (u)nsure / (f)inished 414 | n 415 | 416 | name : cafe ritz carlton buckhead 417 | addr : 3434 peachtree rd. 418 | postal : 30326 419 | latlng : (33.8508073, -84.364227) 420 | 421 | name : dining room ritz carlton buckhead 422 | addr : 3434 peachtree rd. 423 | postal : 30326 424 | latlng : (33.8508073, -84.364227) 425 | 426 | 9/10 positive, 18/10 negative 427 | Do these records refer to the same thing? 428 | (y)es / (n)o / (u)nsure / (f)inished 429 | n 430 | 431 | name : arnie morton s of chicago 432 | addr : 435 s. la cienega blvd. 
433 | postal : 90048 434 | latlng : (34.070609, -118.376722) 435 | 436 | name : sarabeth s kitchen 437 | addr : 423 amsterdam ave. between 80th and 81st sts. 438 | postal : 10024 439 | latlng : (40.7838797, -73.97742439999999) 440 | 441 | 9/10 positive, 19/10 negative 442 | Do these records refer to the same thing? 443 | (y)es / (n)o / (u)nsure / (f)inished 444 | n 445 | 446 | name : mifune japan center kintetsu building 447 | addr : 1737 post st. 448 | postal : 94115 449 | latlng : (37.785329, -122.430369) 450 | 451 | name : mifune 452 | addr : 1737 post st. 453 | postal : 94115 454 | latlng : (37.785329, -122.430369) 455 | 456 | 9/10 positive, 20/10 negative 457 | Do these records refer to the same thing? 458 | (y)es / (n)o / (u)nsure / (f)inished 459 | y 460 | 461 | name : le colonial 462 | addr : 8783 beverly blvd. 463 | postal : 90048 464 | latlng : (34.0773657, -118.3833566) 465 | 466 | name : cendrillon asian grill marimba bar 467 | addr : 45 mercer st. between broome and grand sts. 468 | postal : 10013 469 | latlng : (40.721674, -74.001407) 470 | 471 | 10/10 positive, 20/10 negative 472 | Do these records refer to the same thing? 473 | (y)es / (n)o / (u)nsure / (f)inished 474 | n 475 | 476 | INFO:dedupe.training:Final predicate set: 477 | INFO:dedupe.training:(TfidfNGramCanopyPredicate: (0.2, postal), TfidfNGramCanopyPredicate: (0.8, name)) 478 | INFO:dedupe.training:(SimplePredicate: (fingerprint, addr), SimplePredicate: (sameFiveCharStartPredicate, name)) 479 | name : restaurant ritz carlton atlanta 480 | addr : 181 peachtree st. 481 | postal : 30303 482 | latlng : (33.7585793, -84.3870657) 483 | 484 | name : ritz carlton restaurant 485 | addr : 181 peachtree st. 486 | postal : 30303 487 | latlng : (33.7585793, -84.3870657) 488 | 489 | 10/10 positive, 21/10 negative 490 | Do these records refer to the same thing? 491 | (y)es / (n)o / (u)nsure / (f)inished 492 | y 493 | 494 | name : le montrachet 495 | addr : 3000 w. paradise rd. 
496 | postal : 89109 497 | latlng : (36.1362611, -115.1512539) 498 | 499 | name : le montrachet bistro 500 | addr : 3000 paradise rd. 501 | postal : 89109 502 | latlng : (36.1362611, -115.1512539) 503 | 504 | 11/10 positive, 21/10 negative 505 | Do these records refer to the same thing? 506 | (y)es / (n)o / (u)nsure / (f)inished 507 | y 508 | 509 | name : ritz carlton cafe buckhead 510 | addr : 3434 peachtree rd. ne 511 | postal : 30326 512 | latlng : (33.8508073, -84.364227) 513 | 514 | name : dining room ritz carlton buckhead 515 | addr : 3434 peachtree rd. 516 | postal : 30326 517 | latlng : (33.8508073, -84.364227) 518 | 519 | 12/10 positive, 21/10 negative 520 | Do these records refer to the same thing? 521 | (y)es / (n)o / (u)nsure / (f)inished 522 | n 523 | 524 | INFO:dedupe.training:Final predicate set: 525 | INFO:dedupe.training:(TfidfNGramCanopyPredicate: (0.2, postal), TfidfNGramCanopyPredicate: (0.8, name)) 526 | INFO:dedupe.training:(LevenshteinCanopyPredicate: (2, addr), SimplePredicate: (sameFiveCharStartPredicate, name)) 527 | INFO:dedupe.training:Final predicate set: 528 | INFO:dedupe.training:(SimplePredicate: (firstIntegerPredicate, addr), SimplePredicate: (sameFiveCharStartPredicate, name)) 529 | INFO:dedupe.training:(SimplePredicate: (latLongGridPredicate, latlng), SimplePredicate: (oneGramFingerprint, name)) -------------------------------------------------------------------------------- /vinta.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vintasoftware/deduplication-slides/631389413a558ea83a407a47870253325b7b068e/vinta.png --------------------------------------------------------------------------------