├── .gitattributes
├── .gitignore
├── Pipfile
├── README.md
├── calc_and_plot_d2v_full.py
├── calc_dataset_stats.py
├── cat_stats.py
├── compare_all_pairs_scores.py
├── compare_cats.py
├── corpus_utils.py
├── database
│   ├── __init__.py
│   ├── make_patent_db.py
│   ├── pub_date_extractor.py
│   └── pubdate_updater.py
├── db_patent_stats.py
├── doc2vec.py
├── evaluate_simcoefs.py
├── evaluate_simcoefs_humanscores.py
├── follow_up_analyses.ipynb
├── get_baseline_auc.py
├── idf_regression.py
├── idf_regression_entire_corpus.py
├── kpca.py
├── lat_sem_ana.py
├── make_section_corpus.py
├── patentcrawler
│   ├── get_duplicates.py
│   ├── patentcollector.py
│   └── scrape_patents.py
├── plot_simcoef_distr.py
├── plot_utils.py
├── train_and_calc_w2v.py
├── wmd_pats.py
├── word2vec.py
└── word2vec_app.py

/.gitattributes:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/.gitattributes
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/.gitignore
--------------------------------------------------------------------------------
/Pipfile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/Pipfile
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/README.md
--------------------------------------------------------------------------------
/calc_and_plot_d2v_full.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/calc_and_plot_d2v_full.py
--------------------------------------------------------------------------------
/calc_dataset_stats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/calc_dataset_stats.py
--------------------------------------------------------------------------------
/cat_stats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/cat_stats.py
--------------------------------------------------------------------------------
/compare_all_pairs_scores.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/compare_all_pairs_scores.py
--------------------------------------------------------------------------------
/compare_cats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/compare_cats.py
--------------------------------------------------------------------------------
/corpus_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/corpus_utils.py
--------------------------------------------------------------------------------
/database/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/database/make_patent_db.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/database/make_patent_db.py
--------------------------------------------------------------------------------
/database/pub_date_extractor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/database/pub_date_extractor.py
--------------------------------------------------------------------------------
/database/pubdate_updater.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/database/pubdate_updater.py
--------------------------------------------------------------------------------
/db_patent_stats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/db_patent_stats.py
--------------------------------------------------------------------------------
/doc2vec.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/doc2vec.py
--------------------------------------------------------------------------------
/evaluate_simcoefs.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/evaluate_simcoefs.py
--------------------------------------------------------------------------------
/evaluate_simcoefs_humanscores.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/evaluate_simcoefs_humanscores.py
--------------------------------------------------------------------------------
/follow_up_analyses.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/follow_up_analyses.ipynb
--------------------------------------------------------------------------------
/get_baseline_auc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/get_baseline_auc.py
--------------------------------------------------------------------------------
/idf_regression.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/idf_regression.py
--------------------------------------------------------------------------------
/idf_regression_entire_corpus.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/idf_regression_entire_corpus.py
--------------------------------------------------------------------------------
/kpca.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/kpca.py
--------------------------------------------------------------------------------
/lat_sem_ana.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/lat_sem_ana.py
--------------------------------------------------------------------------------
/make_section_corpus.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/make_section_corpus.py
--------------------------------------------------------------------------------
/patentcrawler/get_duplicates.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/patentcrawler/get_duplicates.py
--------------------------------------------------------------------------------
/patentcrawler/patentcollector.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/patentcrawler/patentcollector.py
--------------------------------------------------------------------------------
/patentcrawler/scrape_patents.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/patentcrawler/scrape_patents.py
--------------------------------------------------------------------------------
/plot_simcoef_distr.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/plot_simcoef_distr.py
--------------------------------------------------------------------------------
/plot_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/plot_utils.py
--------------------------------------------------------------------------------
/train_and_calc_w2v.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/train_and_calc_w2v.py
--------------------------------------------------------------------------------
/wmd_pats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/wmd_pats.py
--------------------------------------------------------------------------------
/word2vec.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/word2vec.py
--------------------------------------------------------------------------------
/word2vec_app.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/helmersl/patent_similarity_search/HEAD/word2vec_app.py
--------------------------------------------------------------------------------