├── .github
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── PULL_REQUEST_TEMPLATE
│   │   └── pull_request_template.md
│   ├── labeler.yml
│   └── workflows
│       ├── greetings.yml
│       ├── labeler.yml
│       ├── sync.yml
│       └── urlchecker.yml
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── README_French.md
├── README_Hindi.md
├── README_Portuguesse.md
├── README_Spanish.md
├── _config.yml
└── logo.svg
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: [ agrover112 ]
4 | patreon: # Replace with a single Patreon username
5 | open_collective: # Replace with a single Open Collective username
6 | ko_fi: # Replace with a single Ko-fi username
7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
9 | liberapay: # Replace with a single Liberapay username
10 | issuehunt: # Replace with a single IssueHunt username
11 | otechie: # Replace with a single Otechie username
12 | lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
13 | custom: [ 'paypal.me/agrover112' ]
14 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: "[BUG]"
5 | labels: bug
6 | assignees: ''
7 |
8 | ---
9 |
10 | Your issue may already have been reported!
11 | Please search the [issue tracker](https://github.com/Agrover112/awesome-semantic-search/issues) before creating a new one.
12 |
13 |
14 | **Describe the bug**
15 | A clear and concise description of what the bug is.
16 |
17 | **To Reproduce**
18 | Steps to reproduce the behavior:
19 | 1. Go to '...'
20 | 2. Click on '....'
21 | 3. Scroll down to '....'
22 | 4. See error
23 |
24 | **Expected behavior**
25 | A clear and concise description of what you expected to happen.
26 |
27 | **Screenshots**
28 | If applicable, add screenshots to help explain your problem.
29 |
30 | **Additional context**
31 | Add any other context about the problem here.
32 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE/pull_request_template.md:
--------------------------------------------------------------------------------
1 | A similar PR may already have been submitted!
2 | Please search among the existing [pull requests](https://github.com/Agrover112/awesome-semantic-search/pulls) before creating one.
3 |
4 | Thanks for submitting a pull request! Please provide enough information so that others can review your pull request:
5 |
6 | For more information, see the `CONTRIBUTING` guide.
7 |
8 |
9 | **Summary**
10 |
11 |
12 |
13 | This PR fixes/implements the following **bugs/features**
14 |
15 | * [ ] Bug 1
16 | * [ ] Bug 2
17 | * [ ] Feature 1
18 | * [ ] Feature 2
19 | * [ ] Breaking changes
20 |
21 |
22 |
23 | Explain the **motivation** for making this change. What existing problem does the pull request solve?
24 |
25 |
26 |
27 | **Test plan (required)**
28 |
29 | Demonstrate the code is solid. Example: The exact commands you ran and their output, screenshots / videos if the pull request changes UI.
30 |
31 |
32 |
33 | **Code formatting**
34 |
35 |
36 |
37 | **Closing issues**
38 |
39 |
40 | Fixes #
41 |
--------------------------------------------------------------------------------
/.github/labeler.yml:
--------------------------------------------------------------------------------
1 | # Add labels based on what README file language is being contributed to
2 | English: README.md
3 | Hindi: README_Hindi.md
4 | Spanish: README_Spanish.md
5 |
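
The mapping above covers only three of the five README translations shown in the tree (README_French.md and README_Portuguesse.md have no label). A possible extension, assuming the maintainers keep the same one-label-per-file convention, might look like the sketch below; the `French` and `Portuguese` label names are assumptions, not part of the original config:

```yaml
# Hypothetical labeler.yml sketch: the existing mappings plus the two remaining translations
English: README.md
Hindi: README_Hindi.md
Spanish: README_Spanish.md
French: README_French.md           # assumed label name
Portuguese: README_Portuguesse.md  # assumed label name; matches the file's actual spelling
```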
--------------------------------------------------------------------------------
/.github/workflows/greetings.yml:
--------------------------------------------------------------------------------
1 | name: Greetings
2 |
3 | on: [pull_request, issues]
4 |
5 | jobs:
6 |   greeting:
7 |     runs-on: ubuntu-latest
8 |     permissions:
9 |       issues: write
10 |       pull-requests: write
11 |     steps:
12 |       - uses: actions/first-interaction@v1
13 |         with:
14 |           repo-token: ${{ secrets.GITHUB_TOKEN }}
15 |           issue-message: "Message that will be displayed on users' first issue"
16 |           pr-message: "Message that will be displayed on users' first pull request"
17 |
--------------------------------------------------------------------------------
/.github/workflows/labeler.yml:
--------------------------------------------------------------------------------
1 | # This workflow will triage pull requests and apply a label based on the
2 | # paths that are modified in the pull request.
3 | #
4 | # To use this workflow, you will need to set up a .github/labeler.yml
5 | # file with configuration. For more information, see:
6 | # https://github.com/actions/labeler
7 |
8 | name: Labeler
9 | on: [pull_request_target]
10 |
11 | jobs:
12 |   label:
13 |     runs-on: ubuntu-latest
14 |
15 |     steps:
16 |       - uses: actions/labeler@v3
17 |         with:
18 |           repo-token: "${{ secrets.GITHUB_TOKEN }}"
19 |
--------------------------------------------------------------------------------
/.github/workflows/sync.yml:
--------------------------------------------------------------------------------
1 | name: Sync Fork
2 |
3 | on:
4 |   schedule:
5 |     - cron: '*/30 * * * *' # every 30 minutes
6 |   workflow_dispatch: # on button click
7 |
8 | jobs:
9 |   sync:
10 |
11 |     runs-on: ubuntu-latest
12 |
13 |     steps:
14 |       - uses: tgymnich/fork-sync@v1.4
15 |         with:
16 |           token: ${{ secrets.PERSONAL_TOKEN }}
17 |           owner: llvm
18 |           base: master
19 |           head: master
20 |
--------------------------------------------------------------------------------
/.github/workflows/urlchecker.yml:
--------------------------------------------------------------------------------
1 |
2 | name: Check URLs
3 |
4 | on: [push, pull_request]
5 |
6 | jobs:
7 |   urlcheck:
8 |     runs-on: ubuntu-latest
9 |
10 |     steps:
11 |       - uses: actions/checkout@v2
12 |       - name: URLs-checker
13 |         uses: urlstechie/urlchecker-action@0.0.27
14 |         with:
15 |           file_types: .md
16 |           print_all: false
17 |           timeout: 10
18 |           retry_count: 3
19 |           exclude_patterns: https://github.com/Agrover112/awesome-semantic-search/issues
20 |           force_pass: true
21 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the
26 | overall community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards of
42 | acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies when
54 | an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail address,
56 | posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement at
63 | .
64 | All complaints will be reviewed and investigated promptly and fairly.
65 |
66 | All community leaders are obligated to respect the privacy and security of the
67 | reporter of any incident.
68 |
69 | ## Enforcement Guidelines
70 |
71 | Community leaders will follow these Community Impact Guidelines in determining
72 | the consequences for any action they deem in violation of this Code of Conduct:
73 |
74 | ### 1. Correction
75 |
76 | **Community Impact**: Use of inappropriate language or other behavior deemed
77 | unprofessional or unwelcome in the community.
78 |
79 | **Consequence**: A private, written warning from community leaders, providing
80 | clarity around the nature of the violation and an explanation of why the
81 | behavior was inappropriate. A public apology may be requested.
82 |
83 | ### 2. Warning
84 |
85 | **Community Impact**: A violation through a single incident or series
86 | of actions.
87 |
88 | **Consequence**: A warning with consequences for continued behavior. No
89 | interaction with the people involved, including unsolicited interaction with
90 | those enforcing the Code of Conduct, for a specified period of time. This
91 | includes avoiding interactions in community spaces as well as external channels
92 | like social media. Violating these terms may lead to a temporary or
93 | permanent ban.
94 |
95 | ### 3. Temporary Ban
96 |
97 | **Community Impact**: A serious violation of community standards, including
98 | sustained inappropriate behavior.
99 |
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 |
106 | ### 4. Permanent Ban
107 |
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 |
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 |
115 | ## Attribution
116 |
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # CONTRIBUTING GUIDELINES
2 | Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved.
3 |
4 | Following these guidelines helps to communicate that you respect the time of the developers managing and developing this open-source project. In return, they should reciprocate that respect when addressing your issue or assessing your patches and features.
5 | ## Some contributing rules you should follow
6 |
7 | - Be critical: is the proposed library, paper, or conference really awesome? If it is, add it at the last position of the relevant section. Bear in mind that in many cases one resource may fit multiple categories; choose exactly one.
8 | - Make proper use of the [discussions](https://github.com/Agrover112/awesome-semantic-search/discussions) (use appropriate language).
9 | - Check if the resource you are adding already exists in the [list](https://github.com/Agrover112/awesome-semantic-search#papers)
10 | - Check for broken or re-located links.
11 | - If this is your first contribution, you might also want to take up issues with the `good first issue` or `help wanted` label.
12 |
13 | - Discuss the changes you wish to make by creating an [issue](https://github.com/Agrover112/awesome-semantic-search/issues/new) or commenting on an [existing issue](https://github.com/Agrover112/awesome-semantic-search/issues).
14 | - Descriptions should start with a capital letter and end with proper punctuation.
15 | - Once the maintainer has assigned the issue to you, go ahead and fork the repo, clone it, and make the changes needed to fix the issue.
16 | - Please follow [**conventional commits**](https://www.conventionalcommits.org/en/v1.0.0-beta.2/) for your commit messages (see the example below).
17 |
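For instance, a commit that adds a paper link to the README could be written as follows. This is only a minimal sketch of the conventional-commits style (type, colon, short description); the exact type and wording shown here are illustrative, not a requirement stated by the maintainers:

```
docs: add SPLADE paper to the 2021 section
```
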
18 | ## Making your Pull Request
19 |
20 | - Good pull requests (patches, improvements, new features) are a fantastic help. They should remain focused in scope and avoid containing unrelated commits.
21 |
22 | - You can create a pull request referencing the number of the issue you fixed.
23 |
24 | - Once you have completed this, your pull request will be reviewed by a maintainer; if it satisfies the requirements of the corresponding issue, it will be merged.
25 |
26 | Kudos to you :balloon:
27 |
28 | ---
29 |
30 | Thank you for contributing to [awesome-semantic-search](https://github.com/Agrover112/awesome-semantic-search).
31 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | Creative Commons Legal Code
2 |
3 | CC0 1.0 Universal
4 |
5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
12 | HEREUNDER.
13 |
14 | Statement of Purpose
15 |
16 | The laws of most jurisdictions throughout the world automatically confer
17 | exclusive Copyright and Related Rights (defined below) upon the creator
18 | and subsequent owner(s) (each and all, an "owner") of an original work of
19 | authorship and/or a database (each, a "Work").
20 |
21 | Certain owners wish to permanently relinquish those rights to a Work for
22 | the purpose of contributing to a commons of creative, cultural and
23 | scientific works ("Commons") that the public can reliably and without fear
24 | of later claims of infringement build upon, modify, incorporate in other
25 | works, reuse and redistribute as freely as possible in any form whatsoever
26 | and for any purposes, including without limitation commercial purposes.
27 | These owners may contribute to the Commons to promote the ideal of a free
28 | culture and the further production of creative, cultural and scientific
29 | works, or to gain reputation or greater distribution for their Work in
30 | part through the use and efforts of others.
31 |
32 | For these and/or other purposes and motivations, and without any
33 | expectation of additional consideration or compensation, the person
34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she
35 | is an owner of Copyright and Related Rights in the Work, voluntarily
36 | elects to apply CC0 to the Work and publicly distribute the Work under its
37 | terms, with knowledge of his or her Copyright and Related Rights in the
38 | Work and the meaning and intended legal effect of CC0 on those rights.
39 |
40 | 1. Copyright and Related Rights. A Work made available under CC0 may be
41 | protected by copyright and related or neighboring rights ("Copyright and
42 | Related Rights"). Copyright and Related Rights include, but are not
43 | limited to, the following:
44 |
45 | i. the right to reproduce, adapt, distribute, perform, display,
46 | communicate, and translate a Work;
47 | ii. moral rights retained by the original author(s) and/or performer(s);
48 | iii. publicity and privacy rights pertaining to a person's image or
49 | likeness depicted in a Work;
50 | iv. rights protecting against unfair competition in regards to a Work,
51 | subject to the limitations in paragraph 4(a), below;
52 | v. rights protecting the extraction, dissemination, use and reuse of data
53 | in a Work;
54 | vi. database rights (such as those arising under Directive 96/9/EC of the
55 | European Parliament and of the Council of 11 March 1996 on the legal
56 | protection of databases, and under any national implementation
57 | thereof, including any amended or successor version of such
58 | directive); and
59 | vii. other similar, equivalent or corresponding rights throughout the
60 | world based on applicable law or treaty, and any national
61 | implementations thereof.
62 |
63 | 2. Waiver. To the greatest extent permitted by, but not in contravention
64 | of, applicable law, Affirmer hereby overtly, fully, permanently,
65 | irrevocably and unconditionally waives, abandons, and surrenders all of
66 | Affirmer's Copyright and Related Rights and associated claims and causes
67 | of action, whether now known or unknown (including existing as well as
68 | future claims and causes of action), in the Work (i) in all territories
69 | worldwide, (ii) for the maximum duration provided by applicable law or
70 | treaty (including future time extensions), (iii) in any current or future
71 | medium and for any number of copies, and (iv) for any purpose whatsoever,
72 | including without limitation commercial, advertising or promotional
73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
74 | member of the public at large and to the detriment of Affirmer's heirs and
75 | successors, fully intending that such Waiver shall not be subject to
76 | revocation, rescission, cancellation, termination, or any other legal or
77 | equitable action to disrupt the quiet enjoyment of the Work by the public
78 | as contemplated by Affirmer's express Statement of Purpose.
79 |
80 | 3. Public License Fallback. Should any part of the Waiver for any reason
81 | be judged legally invalid or ineffective under applicable law, then the
82 | Waiver shall be preserved to the maximum extent permitted taking into
83 | account Affirmer's express Statement of Purpose. In addition, to the
84 | extent the Waiver is so judged Affirmer hereby grants to each affected
85 | person a royalty-free, non transferable, non sublicensable, non exclusive,
86 | irrevocable and unconditional license to exercise Affirmer's Copyright and
87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the
88 | maximum duration provided by applicable law or treaty (including future
89 | time extensions), (iii) in any current or future medium and for any number
90 | of copies, and (iv) for any purpose whatsoever, including without
91 | limitation commercial, advertising or promotional purposes (the
92 | "License"). The License shall be deemed effective as of the date CC0 was
93 | applied by Affirmer to the Work. Should any part of the License for any
94 | reason be judged legally invalid or ineffective under applicable law, such
95 | partial invalidity or ineffectiveness shall not invalidate the remainder
96 | of the License, and in such case Affirmer hereby affirms that he or she
97 | will not (i) exercise any of his or her remaining Copyright and Related
98 | Rights in the Work or (ii) assert any associated claims and causes of
99 | action with respect to the Work, in either case contrary to Affirmer's
100 | express Statement of Purpose.
101 |
102 | 4. Limitations and Disclaimers.
103 |
104 | a. No trademark or patent rights held by Affirmer are waived, abandoned,
105 | surrendered, licensed or otherwise affected by this document.
106 | b. Affirmer offers the Work as-is and makes no representations or
107 | warranties of any kind concerning the Work, express, implied,
108 | statutory or otherwise, including without limitation warranties of
109 | title, merchantability, fitness for a particular purpose, non
110 | infringement, or the absence of latent or other defects, accuracy, or
111 | the present or absence of errors, whether or not discoverable, all to
112 | the greatest extent permissible under applicable law.
113 | c. Affirmer disclaims responsibility for clearing rights of other persons
114 | that may apply to the Work or any use thereof, including without
115 | limitation any person's Copyright and Related Rights in the Work.
116 | Further, Affirmer disclaims responsibility for obtaining any necessary
117 | consents, permissions or other rights required for any use of the
118 | Work.
119 | d. Affirmer understands and acknowledges that Creative Commons is not a
120 | party to this document and has no duty or obligation with respect to
121 | this CC0 or use of the Work.
122 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome Semantic-Search [](https://awesome.re) [](https://conventionalcommits.org)
2 |
3 |
4 |
5 |
6 |
7 |
8 | Logo made by [@createdbytango](https://instagram.com/createdbytango).
9 |
10 | **Looking for More Paper Additions.
11 | PS: Raise a PR**
12 |
13 | The following repository aims to serve as a meta-repository for [Semantic Search](https://en.wikipedia.org/wiki/Semantic_search) and [Semantic Similarity](http://nlpprogress.com/english/semantic_textual_similarity.html) related tasks.
14 |
15 | Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous different use-cases and applications of semantic search.
16 |
17 | Feel free to raise a PR on this repo!
18 |
19 | ## Contents
20 |
21 | - [Papers](#papers)
22 | - [2014](#2014)
23 | - [2015](#2015)
24 | - [2016](#2016)
25 | - [2017](#2017)
26 | - [2018](#2018)
27 | - [2019](#2019)
28 | - [2020](#2020)
29 | - [2021](#2021)
30 | - [2022](#2022)
31 | - [2023](#2023)
32 | - [Articles](#articles)
33 | - [Libraries and Tools](#libraries-and-tools)
34 | - [Datasets](#datasets)
35 | - [Milestones](#milestones)
36 |
37 | ## Papers
38 |
39 | ### 2010
40 | - [Priority Range Trees](https://arxiv.org/abs/1009.3527)
41 | - [Information Retrieval and the semantic web](https://ieeexplore.ieee.org/document/5607549) 📄
42 |
43 | ### 2014
44 | - [A Latent Semantic Model with Convolutional-Pooling
45 | Structure for Information Retrieval](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2014_cdssm_final.pdf) 📄
46 |
47 | ### 2015
48 | - [Skip-Thought Vectors](https://arxiv.org/pdf/1506.06726.pdf) 📄
49 | - [Practical and Optimal LSH for Angular Distance](https://proceedings.neurips.cc/paper/2015/hash/2823f4797102ce1a1aec05359cc16dd9-Abstract.html)
50 |
51 | ### 2016
52 | - [Bag of Tricks for Efficient Text Classification](https://arxiv.org/abs/1607.01759) 📄
53 | - [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606) 📄
54 | - [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320)
55 | - [On Approximately Searching for Similar Word Embeddings](https://www.aclweb.org/anthology/P16-1214.pdf)
56 | - [Learning Distributed Representations of Sentences from Unlabelled Data](https://arxiv.org/abs/1602.03483)📄
57 | - [Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement](https://arxiv.org/abs/1610.02455)
58 |
59 | ### 2017
60 | - [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf) 📄
61 | - [Semantic Textual Similarity For Hindi](https://www.semanticscholar.org/paper/Semantic-Textual-Similarity-For-Hindi-Mujadia-Mamidi/372f615ce36d7543512b8e40d6de51d17f316e0b)📄
62 | - [Efficient Natural Language Response Suggestion for Smart Reply](https://arxiv.org/abs/1705.00652)📃
63 |
64 | ### 2018
65 | - [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf) 📄
66 | - [Learning Semantic Textual Similarity from Conversations](https://arxiv.org/pdf/1804.07754.pdf) 📄
67 | - [Google AI Blog: Advances in Semantic Textual Similarity](https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html) 📄
68 | - [Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech](https://arxiv.org/abs/1803.08976)🔊
69 | - [Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data](https://arxiv.org/abs/1810.07355) 🔊
70 | - [Fast Approximate Nearest Neighbor Search With The
71 | Navigating Spreading-out Graph](http://www.vldb.org/pvldb/vol12/p461-fu.pdf)
72 | - [The Case for Learned Index Structures](https://dl.acm.org/doi/10.1145/3183713.3196909)
73 |
74 | ### 2019
75 | - [LASER: Language Agnostic Sentence Representations](https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/) 📄
76 | - [Document Expansion by Query Prediction](https://arxiv.org/abs/1904.08375) 📄
77 | - [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/pdf/1908.10084.pdf) 📄
78 | - [Multi-Stage Document Ranking with BERT](https://arxiv.org/abs/1910.14424) 📄
79 | - [Latent Retrieval for Weakly Supervised Open Domain Question Answering](https://arxiv.org/abs/1906.00300)
80 | - [End-to-End Open-Domain Question Answering with BERTserini](https://www.aclweb.org/anthology/N19-4013/)
81 | - [BioBERT: a pre-trained biomedical language representation model for biomedical text mining](https://arxiv.org/abs/1901.08746)📄
82 | - [Analyzing and Improving Representations with the Soft Nearest Neighbor Loss](https://arxiv.org/pdf/1902.01889.pdf)📷
83 | - [DiskANN: Fast Accurate Billion-point Nearest
84 | Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
85 |
86 | ### 2020
87 | - [Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned](https://arxiv.org/abs/2004.05125) 📄
88 | - [PASSAGE RE-RANKING WITH BERT](https://arxiv.org/pdf/1901.04085.pdf) 📄
89 | - [CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization](https://arxiv.org/pdf/2006.09595.pdf) 📄
90 | - [LaBSE: Language-agnostic BERT Sentence Embedding](https://arxiv.org/abs/2007.01852) 📄
91 | - [Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset](https://arxiv.org/abs/2007.07846) 📄
92 | - [DeText: A deep NLP framework for intelligent text understanding](https://engineering.linkedin.com/blog/2020/open-sourcing-detext) 📄
93 | - [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/pdf/2004.09813.pdf) 📄
94 | - [Pretrained Transformers for Text Ranking: BERT and Beyond](https://arxiv.org/abs/2010.06467) 📄
95 | - [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909)
96 | - [ELECTRA: PRE-TRAINING TEXT ENCODERS AS DISCRIMINATORS RATHER THAN GENERATORS](https://openreview.net/pdf?id=r1xMH1BtvB)📄
97 | - [Improving Deep Learning For Airbnb Search](https://arxiv.org/pdf/2002.05515)
98 | - [Managing Diversity in Airbnb Search](https://arxiv.org/abs/2004.02621)📄
99 | - [Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval](https://arxiv.org/abs/2007.00808v1)📄
100 | - [Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks](https://openaccess.thecvf.com/content_WACV_2020/papers/Gairola_Unsupervised_Image_Style_Embeddings_for_Retrieval_and_Recognition_Tasks_WACV_2020_paper.pdf)📷
101 | - [DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations](https://arxiv.org/abs/2006.03659)📄
102 |
103 | ### 2021
104 | - [Hybrid approach for semantic similarity calculation between Tamil words](https://www.researchgate.net/publication/350112163_Hybrid_approach_for_semantic_similarity_calculation_between_Tamil_words) 📄
105 | - [Augmented SBERT](https://arxiv.org/pdf/2010.08240.pdf) 📄
106 | - [BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models](https://arxiv.org/abs/2104.08663) 📄
107 | - [Compatibility-aware Heterogeneous Visual Search](https://arxiv.org/abs/2105.06047) 📷
108 | - [Learning Personal Style from Few Examples](https://chuanenlin.com/personalstyle)📷
109 | - [TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning](https://arxiv.org/abs/2104.06979)📄
110 | - [A Survey of Transformers](https://arxiv.org/abs/2106.04554)📄📷
111 | - [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://dl.acm.org/doi/10.1145/3404835.3463098)📄
112 | - [High Quality Related Search Query Suggestions using Deep Reinforcement Learning](https://arxiv.org/abs/2108.04452v1)
113 | - [Embedding-based Product Retrieval in Taobao Search](https://arxiv.org/pdf/2106.09297.pdf)📄📷
114 | - [TPRM: A Topic-based Personalized Ranking Model for Web Search](https://arxiv.org/abs/2108.06014)📄
115 | - [mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset](https://arxiv.org/abs/2108.13897)📄
116 | - [Database Reasoning Over Text](https://aclanthology.org/2021.acl-long.241.pdf)📄
117 | - [How Does Adversarial Fine-Tuning Benefit BERT?](https://arxiv.org/abs/2108.13602)📄
118 | - [Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation](https://arxiv.org/abs/2108.12409)📄
119 | - [Primer: Searching for Efficient Transformers for Language Modeling](https://arxiv.org/abs/2109.08668)📄
120 | - [How Familiar Does That Sound? Cross-Lingual Representational
121 | Similarity Analysis of Acoustic Word Embeddings](https://arxiv.org/pdf/2109.10179.pdf)🔊
122 | - [SimCSE: Simple Contrastive Learning of Sentence Embeddings](https://arxiv.org/abs/2104.08821#)📄
123 | - [Compositional Attention: Disentangling Search and Retrieval](https://arxiv.org/abs/2110.09419)📄📷
124 | - [SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search](https://arxiv.org/abs/2111.08566)
125 | - [GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval](https://arxiv.org/abs/2112.07577) 📄
126 | - [Generative Search Engines: Initial Experiments](https://computationalcreativity.net/iccc21/wp-content/uploads/2021/09/ICCC_2021_paper_50.pdf) 📷
127 | - [Rethinking Search: Making Domain Experts out of Dilettantes](https://dl.acm.org/doi/10.1145/3476415.3476428)
128 | - [WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach](https://arxiv.org/abs/2104.01767)
129 |
130 | ### 2022
131 | - [Text and Code Embeddings by Contrastive Pre-Training](https://arxiv.org/abs/2201.10005)📄
132 | - [RELIC: Retrieving Evidence for Literary Claims](https://arxiv.org/abs/2203.10053)📄
133 | - [Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations](https://arxiv.org/abs/2109.13059)📄
134 | - [SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation](https://arxiv.org/abs/2205.08180)🔊
135 | - [An Analysis of Fusion Functions for Hybrid Retrieval](https://arxiv.org/abs/2210.11934)📄
136 | - [Out-of-distribution Detection with Deep Nearest Neighbors](https://arxiv.org/abs/2204.06507)
137 | - [ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition](https://arxiv.org/abs/2210.13352)🔊
138 | - [Analyzing Acoustic Word Embeddings From Pre-Trained Self-Supervised Speech Models](https://arxiv.org/pdf/2210.16043.pdf)🔊
139 | - [Rethinking with Retrieval: Faithful Large Language Model Inference](https://arxiv.org/abs/2301.00303)📄
140 | - [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/pdf/2212.10496.pdf)📄
141 | - [Transformer Memory as a Differentiable Search Index](https://arxiv.org/abs/2202.06991)📄
142 |
143 | ### 2023
144 | - [FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search](https://dl.acm.org/doi/10.1145/3543507.3583318)📄
145 | - [“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors](https://aclanthology.org/2023.findings-acl.426/)📄
146 | - [SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval](https://dl.acm.org/doi/pdf/10.1145/3539618.3592065) 📄
147 |
148 | ## Articles
149 | - [Tackling Semantic Search](https://adityamalte.substack.com/p/tackle-semantic-search/)
150 | - [Semantic search in Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/semantic-search-overview)
151 | - [How we used semantic search to make our search 10x smarter](https://zilliz.com/blog/How-we-used-semantic-search-to-make-our-search-10-x-smarter/)
152 | - [Stanford AI Blog : Building Scalable, Explainable, and Adaptive NLP Models with Retrieval](https://ai.stanford.edu/blog/retrieval-based-NLP/)
153 | - [Building a semantic search engine with dual space word embeddings](https://m.mage.ai/building-a-semantic-search-engine-with-dual-space-word-embeddings-f5a596eb6d90)
154 | - [Billion-scale semantic similarity search with FAISS+SBERT](https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2)
155 | - [Some observations about similarity search thresholds](https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
156 | - [Near Duplicate Image Search using Locality Sensitive Hashing](https://keras.io/examples/vision/near_dup_search/)
157 | - [Free Course on Vector Similarity Search and Faiss]( https://link.medium.com/HtFoFKlKvkb)
158 | - [Comprehensive Guide To Approximate Nearest Neighbors Algorithms](https://link.medium.com/V62Z8drvEkb)
159 | - [Introducing the hybrid index to enable keyword-aware semantic search](https://www.pinecone.io/learn/hybrid-search/?utm_medium=email&_hsmi=0&_hsenc=p2ANqtz--zLu9hiyh-y_XTa7FCEpi8JESJKmif5dhpYtAxTWka8PIttaTOGE21LMZlg9EOZyPYpCm6GDvYy57tlGRwH6TjgLCsJg&utm_content=231741722&utm_source=hs_email)
160 | - [Argilla Semantic Search](https://docs.argilla.io/en/latest/guides/features/semantic-search.html)
161 | - [Co:here's Multilingual Text Understanding Model](https://txt.cohere.ai/multilingual/)
162 | - [Simplify Search with Multilingual Embedding Models](https://blog.vespa.ai/simplify-search-with-multilingual-embeddings/)
163 |
164 | ## Libraries and Tools
165 | - [fastText](https://fasttext.cc/)
166 | - [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)
167 | - [SBERT](https://www.sbert.net/)
168 | - [ELECTRA](https://github.com/google-research/electra)
169 | - [LaBSE](https://tfhub.dev/google/LaBSE/2)
170 | - [LASER](https://github.com/facebookresearch/LASER)
171 | - [Relevance AI - Vector Platform From Experimentation To Deployment](https://relevance.ai)
172 | - [Haystack](https://github.com/deepset-ai/haystack/)
173 | - [Jina.AI](https://jina.ai/)
174 | - [pinecone](https://www.pinecone.io/)
175 | - [SentEval Toolkit](https://github.com/facebookresearch/SentEval?utm_source=catalyzex.com)
176 | - [ranx](https://github.com/AmenRa/ranx)
177 | - [BEIR: Benchmarking IR](https://github.com/UKPLab/beir)
178 | - [RELiC: Retrieving Evidence for Literary Claims Dataset](https://relic.cs.umass.edu/)
179 | - [matchzoo-py](https://github.com/NTMC-Community/MatchZoo-py)
180 | - [deep_text_matching](https://github.com/wangle1218/deep_text_matching)
181 | - [Which Frame?](http://whichframe.com/)
182 | - [lexica.art](https://lexica.art/)
183 | - [emoji semantic search](https://github.com/lilianweng/emoji-semantic-search)
184 | - [PySerini](https://github.com/castorini/pyserini)
185 | - [BERTSerini](https://github.com/rsvp-ai/bertserini)
186 | - [BERTSimilarity](https://github.com/Brokenwind/BertSimilarity)
187 | - [milvus](https://www.milvus.io/)
188 | - [NeuroNLP++](https://plusplus.neuronlp.fruitflybrain.org/)
189 | - [weaviate](https://github.com/semi-technologies/weaviate)
190 | - [semantic-search-through-wikipedia-with-weaviate](https://github.com/semi-technologies/semantic-search-through-wikipedia-with-weaviate)
191 | - [natural-language-youtube-search](https://github.com/haltakov/natural-language-youtube-search)
192 | - [same.energy](https://www.same.energy/about)
193 | - [ann benchmarks](http://ann-benchmarks.com/)
194 | - [scaNN](https://github.com/google-research/google-research/tree/master/scann)
195 | - [REALM](https://github.com/google-research/language/tree/master/language/realm)
196 | - [annoy](https://github.com/spotify/annoy)
197 | - [pynndescent](https://github.com/lmcinnes/pynndescent)
198 | - [nsg](https://github.com/ZJULearning/nsg)
199 | - [FALCONN](https://github.com/FALCONN-LIB/FALCONN)
200 | - [redis HNSW](https://github.com/zhao-lang/redis_hnsw)
201 | - [autofaiss](https://github.com/criteo/autofaiss)
202 | - [DPR](https://github.com/facebookresearch/DPR)
203 | - [rank_BM25](https://github.com/dorianbrown/rank_bm25)
204 | - [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank)
205 | - [nearPy](http://pixelogik.github.io/NearPy/)
206 | - [vearch](https://github.com/vearch/vearch)
207 | - [vespa](https://github.com/vespa-engine/vespa)
208 | - [PyNNDescent](https://github.com/lmcinnes/pynndescent)
209 | - [pgANN](https://github.com/netrasys/pgANN)
210 | - [Tensorflow Similarity](https://github.com/tensorflow/similarity)
211 | - [opensemanticsearch.org](https://www.opensemanticsearch.org/)
212 | - [GPT3 Semantic Search](https://gpt3demo.com/category/semantic-search)
213 | - [searchy](https://github.com/lubianat/searchy)
214 | - [txtai](https://github.com/neuml/txtai)
215 | - [HyperTag](https://github.com/Ravn-Tech/HyperTag)
216 | - [vectorai](https://github.com/vector-ai/vectorai)
217 | - [embeddinghub](https://github.com/featureform/embeddinghub)
218 | - [AquilaDb](https://github.com/Aquila-Network/AquilaDB)
219 | - [STripNet](https://github.com/stephenleo/stripnet)
220 |
221 | ## Datasets
222 | - [Semantic Text Similarity Dataset Hub](https://github.com/brmson/dataset-sts)
223 | - [Facebook AI Image Similarity Challenge](https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/?fbclid=IwAR31vRV0EdxRdrxtPy12neZtBJQ0H9qdLHm8Wl2DjHY09PtQdn1nEEIJVUo)
224 | - [WIT : Wikipedia-based Image Text Dataset](https://github.com/google-research-datasets/wit)
225 | - [BEIR](https://github.com/beir-cellar/beir)
226 | - MTEB
227 |
228 | ## Milestones
229 |
230 | Have a look at the [project board](https://github.com/Agrover112/awesome-semantic-search/projects/1) for the task list to contribute to any of the open issues.
231 |
--------------------------------------------------------------------------------
/README_French.md:
--------------------------------------------------------------------------------
1 | # Impressionnant Recherche-Sémantique [](https://awesome.re) [](https://conventionalcommits.org)
2 |
3 |
4 |
5 | Logo réalisé par [@createdbytango](https://instagram.com/createdbytango).
6 |
7 | **À la recherche d'ajouts de papiers supplémentaires.
8 | PS : Soumettez une Pull Request**
9 |
10 | Le référentiel suivant vise à servir de méta-référentiel pour les tâches liées à la [recherche sémantique](https://en.wikipedia.org/wiki/Semantic_search) et à la [similarité sémantique](http://nlpprogress.com/english/semantic_textual_similarity.html).
11 |
12 | La recherche sémantique n'est pas limitée au texte ! Elle peut être réalisée avec des images, de la parole, etc. Il existe de nombreux cas d'utilisation et applications différents de la recherche sémantique.
13 |
14 | N'hésitez pas à soumettre une [Pull Request](https://github.com/Agrover112/awesome-semantic-search/projects/1) sur ce référentiel !
15 |
16 | ## Contenu
17 |
18 | - [Papiers](#papiers)
19 | - [2014](#2014)
20 | - [2015](#2015)
21 | - [2016](#2016)
22 | - [2017](#2017)
23 | - [2018](#2018)
24 | - [2019](#2019)
25 | - [2020](#2020)
26 | - [2021](#2021)
27 | - [2022](#2022)
28 | - [2023](#2023)
29 | - [Articles](#articles)
30 | - [Bibliothèques et Outils](#bibliothèques-et-outils)
31 | - [Ensembles de données](#ensembles-de-données)
32 | - [Étapes Importantes](#milestones)
33 |
34 | ## Papiers
35 |
36 | ### 2010
37 | - [Priority Range Trees](https://arxiv.org/abs/1009.3527)
38 |
39 | ### 2014
40 | - [Un Modèle Sémantique Latent avec une Structure de Convolutions-Pooling pour la Récupération d'Informations](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2014_cdssm_final.pdf) 📄
41 |
42 | ### 2015
43 | - [Vecteurs Skip-Thought](https://arxiv.org/pdf/1506.06726.pdf) 📄
44 | - [LSH Pratique et Optimal pour la Distance Angulaire](https://proceedings.neurips.cc/paper/2015/hash/2823f4797102ce1a1aec05359cc16dd9-Abstract.html)
45 |
46 | ### 2016
47 | - [Sac de Trucs pour la Classification Efficace du Texte](https://arxiv.org/abs/1607.01759) 📄
48 | - [Enrichissement des Vecteurs de Mots avec des Informations Subword](https://arxiv.org/abs/1607.04606) 📄
49 | - [Recherche de Voisin le Plus Proche Approximatif Efficace et Robuste en Utilisant des Graphes Mondiaux Navigables Hiérarchiques](https://arxiv.org/abs/1603.09320)
50 | - [Recherche Approximative du Voisin le Plus Proche pour les Vecteurs de Mots Similaires - Expériences, Analyses et Amélioration](https://www.aclweb.org/anthology/P16-1214.pdf)
51 | - [Apprentissage de Représentations Distribuées de Phrases à Partir de Données Non Étiquetées](https://arxiv.org/abs/1602.03483) 📄
52 | - [Recherche Approximative du Voisin le Plus Proche sur des Données de Grande Dimension --- Expériences, Analyses et Amélioration](https://arxiv.org/abs/1610.02455)
53 |
54 | ### 2017
55 | - [Apprentissage Supervisé de Représentations Universelles de Phrases à Partir de Données d'Inférence en Langage Naturel](https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf) 📄
56 | - [Similarité Textuelle Sémantique pour le Hindi](https://www.semanticscholar.org/paper/Semantic-Textual-Similarity-For-Hindi-Mujadia-Mamidi/372f615ce36d7543512b8e40d6de51d17f316e0b) 📄
57 | - [Suggestion Efficace de Réponses en Langage Naturel pour Smart Reply](https://arxiv.org/abs/1705.00652) 📃
58 |
59 | ### 2018
60 | - [Encodeur Universel de Phrases](https://arxiv.org/pdf/1803.11175.pdf) 📄
61 | - [Apprentissage de la Similarité Textuelle Sémantique à Partir de Conversations](https://arxiv.org/pdf/1804.07754.pdf) 📄
62 | - [Blog Google AI : Avancées dans la Similarité Textuelle Sémantique](https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html) 📄
63 | - [Speech2Vec : Un Cadre Séquence à Séquence pour Apprendre des Plongements de Mots à Partir de la Parole](https://arxiv.org/abs/1803.08976) 🔊
64 | - [Optimisation de l'Indexation Basée sur le Graphique du Voisin le Plus Proche k pour la Recherche de Proximité dans des Données de Grande Dimension](https://arxiv.org/abs/1810.07355) 🔊
65 | - [Recherche Efficace du Voisin le Plus Proche Approximatif avec le Graphique de Dissémination](http://www.vldb.org/pvldb/vol12/p461-fu.pdf)
66 | - [Plaidoyer pour des Structures d'Indexation Apprises](https://dl.acm.org/doi/10.1145/3183713.3196909)
67 |
68 | ### 2019
69 | - [LASER : Représentations de phrases indépendantes du langage](https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/) 📄
70 | - [Expansion de document par prédiction de requête](https://arxiv.org/abs/1904.08375) 📄
71 | - [Sentence-BERT : Intégration de phrases à l'aide de réseaux Siamese BERT](https://arxiv.org/pdf/1908.10084.pdf) 📄
72 | - [Classement de documents à plusieurs étapes avec BERT](https://arxiv.org/abs/1910.14424) 📄
73 | - [Récupération latente pour le questionnement faiblement supervisé en domaine ouvert](https://arxiv.org/abs/1906.00300)
74 | - [Question-réponse de bout en bout avec BERTserini](https://www.aclweb.org/anthology/N19-4013/)
75 | - [BioBERT : un modèle de représentation linguistique biomédicale pré-entraîné pour l'extraction de texte biomédical](https://arxiv.org/abs/1901.08746)📄
76 | - [Analyse et amélioration des représentations avec la perte douce du voisin le plus proche](https://arxiv.org/pdf/1902.01889.pdf)📷
77 | - [DiskANN : Recherche rapide et précise du voisin le plus proche pour un milliard de points sur un seul nœud](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
78 |
79 | ### 2020
80 | - [Déploiement rapide d'un moteur de recherche neuronal pour le COVID-19 Open Research Dataset : Réflexions préliminaires et leçons apprises](https://arxiv.org/abs/2004.05125) 📄
81 | - [RE-CLASSEMENT DE PASSAGE AVEC BERT](https://arxiv.org/pdf/1901.04085.pdf) 📄
82 | - [CO-Search : Recherche d'informations sur le COVID-19 avec recherche sémantique, question-réponse et résumé abstrait](https://arxiv.org/pdf/2006.09595.pdf) 📄
83 | - [LaBSE : Plongement de phrases BERT indépendant de la langue](https://arxiv.org/abs/2007.01852) 📄
84 | - [Covidex : Modèles de classement neuronal et infrastructure de recherche par mot-clé pour le COVID-19 Open Research Dataset](https://arxiv.org/abs/2007.07846) 📄
85 | - [DeText : Un cadre d'IA profonde pour la compréhension intelligente du texte](https://engineering.linkedin.com/blog/2020/open-sourcing-detext) 📄
86 | - [Rendre les plongements de phrases monolingues multilingues en utilisant la distillation des connaissances](https://arxiv.org/pdf/2004.09813.pdf) 📄
87 | - [Transformateurs pré-entraînés pour le classement de texte : BERT et au-delà](https://arxiv.org/abs/2010.06467) 📄
88 | - [REALM : Pré-entraînement d'un modèle linguistique augmenté par récupération](https://arxiv.org/abs/2002.08909)
89 | - [ELECTRA : PRÉ-ENTRAÎNEMENT DES ENCODEURS DE TEXTE EN TANT QUE DISCRIMINATEURS PLUTÔT QUE DES GÉNÉRATEURS](https://openreview.net/pdf?id=r1xMH1BtvB)📄
90 | - [Amélioration de l'apprentissage profond pour la recherche Airbnb](https://arxiv.org/pdf/2002.05515)
91 | - [Gestion de la diversité dans la recherche Airbnb](https://arxiv.org/abs/2004.02621)📄
92 | - [Apprentissage négatif de contraste approximatif du voisin le plus proche pour la recherche dense de texte](https://arxiv.org/abs/2007.00808v1)📄
93 | - [Plongements d'images sans supervision pour les tâches de recherche et de reconnaissance](https://openaccess.thecvf.com/content_WACV_2020/papers/Gairola_Unsupervised_Image_Style_Embeddings_for_Retrieval_and_Recognition_Tasks_WACV_2020_paper.pdf)📷
94 | - [DeCLUTR : Apprentissage en profondeur contrastif pour les représentations textuelles non supervisées](https://arxiv.org/abs/2006.03659)📄
95 |
96 |
97 | ### 2021
98 | - [Approche hybride pour le calcul de similarité sémantique entre les mots tamouls](https://www.researchgate.net/publication/350112163_Hybrid_approach_for_semantic_similarity_calculation_between_Tamil_words) 📄
99 | - [SBERT augmenté](https://arxiv.org/pdf/2010.08240.pdf) 📄
100 | - [BEIR : un banc d'essai hétérogène pour l'évaluation sans tir préalable des modèles de recherche d'informations](https://arxiv.org/abs/2104.08663) 📄
101 | - [Recherche visuelle hétérogène compatible](https://arxiv.org/abs/2105.06047) 📷
102 | - [Apprentissage du style personnel à partir de quelques exemples](https://chuanenlin.com/personalstyle)📷
103 | - [TSDAE : Utilisation d'un auto-encodeur de débruitage séquentiel basé sur un transformateur pour l'apprentissage non supervisé de l'intégration de phrases](https://arxiv.org/abs/2104.06979)📄
104 | - [Une enquête sur les transformateurs](https://arxiv.org/abs/2106.04554)📄📷
105 | - [SPLADE : Modèle lexical et d'expansion parcimonieux pour le classement de la première étape](https://dl.acm.org/doi/10.1145/3404835.3463098)📄
106 | - [Suggestions de requêtes de recherche liées de haute qualité à l'aide de l'apprentissage en profondeur par renforcement](https://arxiv.org/abs/2108.04452v1)
107 | - [Récupération de produits basée sur l'intégration dans la recherche Taobao](https://arxiv.org/pdf/2106.09297.pdf)📄📷
108 | - [TPRM : Un modèle de classement personnalisé basé sur les sujets pour la recherche Web](https://arxiv.org/abs/2108.06014)📄
109 | - [mMARCO : Une version multilingue de l'ensemble de données de classement de passages MS MARCO](https://arxiv.org/abs/2108.13897)📄
110 | - [Raisonnement sur la base de données à partir du texte](https://aclanthology.org/2021.acl-long.241.pdf)📄
111 | - [En quoi l'affinage adversarial profite-t-il à BERT ?](https://arxiv.org/abs/2108.13602)📄
112 | - [Entraînement court, test long : l'attention avec des biais linéaires permet l'extrapolation de la longueur d'entrée](https://arxiv.org/abs/2108.12409)📄
113 | - [Primer : Recherche d'architectures de transformateurs efficaces pour la modélisation linguistique](https://arxiv.org/abs/2109.08668)📄
114 | - [À quel point cela semble-t-il familier ? Analyse de similarité représentationnelle interlingue des plongements acoustiques de mots](https://arxiv.org/pdf/2109.10179.pdf)🔊
115 | - [SimCSE : Apprentissage contrastif simple des plongements de phrases](https://arxiv.org/abs/2104.08821#)📄
116 | - [Attention compositionnelle : Désentrelacement de la recherche et de la récupération](https://arxiv.org/abs/2110.09419)📄📷
117 | - [SPANN : Recherche de voisin le plus proche efficace à l'échelle du milliard](https://arxiv.org/abs/2111.08566)
118 | - [GPL : Étiquetage pseudo-génératif pour l'adaptation de domaine non supervisée de la récupération dense](https://arxiv.org/abs/2112.07577) 📄
119 | - [Moteurs de recherche génératifs : expériences initiales](https://computationalcreativity.net/iccc21/wp-content/uploads/2021/09/ICCC_2021_paper_50.pdf) 📷
120 | - [Repenser la recherche : faire des experts de domaine à partir de dilettantes](https://dl.acm.org/doi/10.1145/3476415.3476428)
121 | - [WhiteningBERT : Une approche facile d'intégration de phrases non supervisée](https://arxiv.org/abs/2104.01767)
122 |
123 | ### 2022
124 | - [Intégration de textes et de codes par pré-entraînement contrastif](https://arxiv.org/abs/2201.10005)📄
125 | - [RELIC : Récupération de preuves pour les revendications littéraires](https://arxiv.org/abs/2203.10053)📄
126 | - [Trans-Encoder : Modélisation non supervisée de paires de phrases par auto-distillation et distillation mutuelle](https://arxiv.org/abs/2109.13059)📄
127 | - [SAMU-XLSR : Représentation multimodale de l'énoncé interlingue alignée sémantiquement](https://arxiv.org/abs/2205.08180)🔊
128 | - [Analyse des fonctions de fusion pour la recherche hybride](https://arxiv.org/abs/2210.11934)📄
129 | - [Détection hors distribution avec des voisins les plus proches profonds](https://arxiv.org/abs/2204.06507)
130 | - [ESB : Un banc d'essai pour la reconnaissance de la parole de bout en bout multi-domaines](https://arxiv.org/abs/2210.13352)🔊
131 | - [Analyse des plongements acoustiques de mots à partir de modèles de parole auto-supervisés pré-entraînés](https://arxiv.org/pdf/2210.16043.pdf)🔊
132 | - [Repenser avec la récupération : Inférence fidèle de grands modèles linguistiques](https://arxiv.org/abs/2301.00303)📄
133 | - [Récupération dense précise sans étiquettes de pertinence](https://arxiv.org/pdf/2212.10496.pdf)📄
134 | - [Mémoire du transformateur en tant qu'index de recherche différenciable](https://arxiv.org/abs/2202.06991)📄
135 |
136 | ### 2023
137 | - [FINGER : Inférence rapide pour la recherche du voisin le plus proche approximatif basée sur un graphe](https://dl.acm.org/doi/10.1145/3543507.3583318)📄
138 | - [Classification de texte "faible ressource" : une méthode de classification sans paramètre avec des compresseurs](https://aclanthology.org/2023.findings-acl.426/)📄
139 | - [SparseEmbed : Apprentissage de représentations lexicales clairsemées avec des plongements contextuels pour la récupération](https://dl.acm.org/doi/pdf/10.1145/3539618.3592065) 📄
140 |
141 | ## Articles
142 | - [Aborder la recherche sémantique](https://adityamalte.substack.com/p/tackle-semantic-search/)
143 | - [Recherche sémantique dans Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/semantic-search-overview)
144 | - [Comment nous avons utilisé la recherche sémantique pour rendre notre recherche 10 fois plus intelligente](https://zilliz.com/blog/How-we-used-semantic-search-to-make-our-search-10-x-smarter/)
145 | - [Stanford AI Blog : Construction de modèles NLP évolutifs, explicables et adaptatifs avec la récupération](https://ai.stanford.edu/blog/retrieval-based-NLP/)
146 | - [Construction d'un moteur de recherche sémantique avec des plongements de mots à double espace](https://m.mage.ai/building-a-semantic-search-engine-with-dual-space-word-embeddings-f5a596eb6d90)
147 | - [Recherche de similarité sémantique à l'échelle du milliard avec FAISS+SBERT](https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2)
148 | - [Quelques observations sur les seuils de recherche de similarité](https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
149 | - [Recherche d'images quasi identiques avec Locality Sensitive Hashing](https://keras.io/examples/vision/near_dup_search/)
150 | - [Cours gratuit sur la recherche de similarité vectorielle et Faiss](https://link.medium.com/HtFoFKlKvkb)
151 | - [Guide complet des algorithmes de recherche des voisins les plus proches approximatifs](https://link.medium.com/V62Z8drvEkb)
152 | - [Introduction de l'index hybride pour permettre la recherche sémantique consciente des mots-clés](https://www.pinecone.io/learn/hybrid-search/?utm_medium=email&_hsmi=0&_hsenc=p2ANqtz--zLu9hiyh-y_XTa7FCEpi8JESJKmif5dhpYtAxTWka8PIttaTOGE21LMZlg9EOZyPYpCm6GDvYy57tlGRwH6TjgLCsJg&utm_content=231741722&utm_source=hs_email)
153 | - [Recherche sémantique Argilla](https://docs.argilla.io/en/latest/guides/features/semantic-search.html)
154 | - [Modèle de compréhension textuelle multilingue de Co:here](https://txt.cohere.ai/multilingual/)
155 | - [Simplifiez la recherche avec des modèles d'embedding multilingues](https://blog.vespa.ai/simplify-search-with-multilingual-embeddings/)
156 |
157 | ## Bibliothèques et Outils
158 | - [fastText](https://fasttext.cc/)
159 | - [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)
160 | - [SBERT](https://www.sbert.net/)
161 | - [ELECTRA](https://github.com/google-research/electra)
162 | - [LaBSE](https://tfhub.dev/google/LaBSE/2)
163 | - [LASER](https://github.com/facebookresearch/LASER)
164 | - [Relevance AI - Plateforme vectorielle de l'expérimentation au déploiement](https://relevance.ai)
165 | - [Haystack](https://github.com/deepset-ai/haystack/)
166 | - [Jina.AI](https://jina.ai/)
167 | - [Pinecone](https://www.pinecone.io/)
168 | - [SentEval Toolkit](https://github.com/facebookresearch/SentEval?utm_source=catalyzex.com)
169 | - [ranx](https://github.com/AmenRa/ranx)
170 | - [BEIR : Évaluation des IR](https://github.com/UKPLab/beir)
171 | - [RELiC: Jeu de données de récupération de preuves pour les revendications littéraires](https://relic.cs.umass.edu/)
172 | - [matchzoo-py](https://github.com/NTMC-Community/MatchZoo-py)
173 | - [deep_text_matching](https://github.com/wangle1218/deep_text_matching)
174 | - [Which Frame?](http://whichframe.com/)
175 | - [lexica.art](https://lexica.art/)
176 | - [Recherche sémantique emoji](https://github.com/lilianweng/emoji-semantic-search)
177 | - [PySerini](https://github.com/castorini/pyserini)
178 | - [BERTSerini](https://github.com/rsvp-ai/bertserini)
179 | - [BERTSimilarity](https://github.com/Brokenwind/BertSimilarity)
180 | - [milvus](https://www.milvus.io/)
181 | - [NeuroNLP++](https://plusplus.neuronlp.fruitflybrain.org/)
182 | - [weaviate](https://github.com/semi-technologies/weaviate)
183 | - [Recherche sémantique à travers Wikipedia avec Weaviate](https://github.com/semi-technologies/semantic-search-through-wikipedia-with-weaviate)
184 | - [Recherche YouTube en langage naturel](https://github.com/haltakov/natural-language-youtube-search)
185 | - [same.energy](https://www.same.energy/about)
186 | - [Benchmarks ANN](http://ann-benchmarks.com/)
187 | - [scaNN](https://github.com/google-research/google-research/tree/master/scann)
188 | - [REALM](https://github.com/google-research/language/tree/master/language/realm)
189 | - [annoy](https://github.com/spotify/annoy)
190 | - [pynndescent](https://github.com/lmcinnes/pynndescent)
191 | - [nsg](https://github.com/ZJULearning/nsg)
192 | - [FALCONN](https://github.com/FALCONN-LIB/FALCONN)
193 | - [redis HNSW](https://github.com/zhao-lang/redis_hnsw)
194 | - [autofaiss](https://github.com/criteo/autofaiss)
195 | - [DPR](https://github.com/facebookresearch/DPR)
196 | - [rank_BM25](https://github.com/dorianbrown/rank_bm25)
197 | - [nearPy](http://pixelogik.github.io/NearPy/)
198 | - [vearch](https://github.com/vearch/vearch)
199 | - [vespa](https://github.com/vespa-engine/vespa)
200 | - [PyNNDescent](https://github.com/lmcinnes/pynndescent)
201 | - [pgANN](https://github.com/netrasys/pgANN)
202 | - [Tensorflow Similarity](https://github.com/tensorflow/similarity)
203 | - [opensemanticsearch.org](https://www.opensemanticsearch.org/)
204 | - [GPT3 Semantic Search](https://gpt3demo.com/category/semantic-search)
205 | - [searchy](https://github.com/lubianat/searchy)
206 | - [txtai](https://github.com/neuml/txtai)
207 | - [HyperTag](https://github.com/Ravn-Tech/HyperTag)
208 | - [vectorai](https://github.com/vector-ai/vectorai)
209 | - [embeddinghub](https://github.com/featureform/embeddinghub)
210 | - [AquilaDb](https://github.com/Aquila-Network/AquilaDB)
211 | - [STripNet](https://github.com/stephenleo/stripnet)
212 |
213 | ## Ensembles de données
214 | - [Semantic Text Similarity Dataset Hub](https://github.com/brmson/dataset-sts)
215 | - [Facebook AI Image Similarity Challenge](https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/?fbclid=IwAR31vRV0EdxRdrxtPy12neZtBJQ0H9qdLHm8Wl2DjHY09PtQdn1nEEIJVUo)
216 | - [WIT : Wikipedia-based Image Text Dataset](https://github.com/google-research-datasets/wit)
217 | - [BEIR](https://github.com/beir-cellar/beir)
218 | - MTEB
219 |
220 | ## Étapes Importantes
221 |
222 | Consultez le [tableau du projet](https://github.com/Agrover112/awesome-semantic-search/projects/1) pour la liste des tâches afin de contribuer à l'une des issues ouvertes.
223 |
224 |
--------------------------------------------------------------------------------
/README_Hindi.md:
--------------------------------------------------------------------------------
1 | Awesome Semantic-Search [](https://awesome.re)
2 | ======================================================================================
3 |
4 |
5 |
6 | logo इनके द्वारा निर्मित [@createdbytango](https://instagram.com/createdbytango).
7 |
8 | निम्नलिखित रिपॉजिटरी का उद्देश्य [सिमेंटिक
9 | सर्च](https://en.wikipedia.org/wiki/Semantic_search) और [सिमेंटिक
10 | समानता](http://nlpprogress.com/english/semantic_textual_similarity.html)
11 | से संबंधित कार्यों के लिए मेटा-रिपॉजिटरी के रूप में कार्य करना है।
12 |
13 | सिमेंटिक सर्च टेक्स्ट तक ही सीमित नहीं है! यह छवियों, भाषण, आदि के साथ
14 | किया जा सकता है। इसलिए अर्थपूर्ण खोज के कई अलग-अलग उपयोग-मामले और
15 | अनुप्रयोग हैं।
16 |
17 | Contributions / Milestones
18 | --------------------------
19 |
20 | [कार्य
21 | सूची](https://github.com/Agrover112/awesome-semantic-search/projects/1)
22 | के लिए प्रोजेक्ट बोर्ड पर एक नज़र डालें
23 |
24 | विषय-सूची
25 | ---------
26 |
27 | - [दस्तावेज़](#दस्तावेज़)
28 | - [2014](#2014)
29 | - [2015](#2015)
30 | - [2016](#2016)
31 | - [2017](#2017)
32 | - [2018](#2018)
33 | - [2019](#2019)
34 | - [2020](#2020)
35 | - [2021](#2021)
36 |
37 | - [लेख](#लेख)
38 | - [Libraries तथा Tools](#libraries-तथा-tools)
39 | - [डेटासेट](#डेटासेट)
40 | - [माइलस्टोन्स](#माइलस्टोन्स)
41 |
42 |
43 | दस्तावेज़
44 | ---------
45 |
46 | ### 2010
47 |
48 | - [प्राथमिकता रेंज पेड़ ](https://arxiv.org/abs/1009.3527)
49 | 📄
50 |
51 | ### 2014
52 |
53 | - [सूचना पुनर्प्राप्ति के लिए कनवल्शनल-पूलिंग स्ट्रक्चर के साथ एक
54 | अव्यक्त सिमेंटिक
55 | मॉडल](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2014_cdssm_final.pdf)
56 | 📄
57 |
58 | ### 2015
59 |
60 | - [स्किप-थॉट वैक्टर](https://arxiv.org/pdf/1506.06726.pdf) 📄
61 | - [कोणीय दूरी के लिए व्यावहारिक और इष्टतम एलएसएच](https://proceedings.neurips.cc/paper/2015/hash/2823f4797102ce1a1aec05359cc16dd9-Abstract.html) 📄
62 |
63 | ### 2016
64 |
65 | - [कुशल पाठ वर्गीकरण के लिए ट्रिक्स का
66 | बैग](https://arxiv.org/abs/1607.01759) 📄
67 | - [सबवर्ड जानकारी के साथ वर्ड वैक्टर को समृद्ध
68 | करना](https://arxiv.org/abs/1607.04606) 📄
69 | - [पदानुक्रमित नेविगेट करने योग्य लघु विश्व ग्राफ़ का उपयोग करके कुशल
70 | और मजबूत अनुमानित निकटतम पड़ोसी
71 | खोज](https://arxiv.org/abs/1603.09320)
72 | - [लगभग समान शब्द एंबेडिंग की खोज
73 | पर](https://www.aclweb.org/anthology/P16-1214.pdf)
74 | - [बिना लेबल वाले डेटा से वाक्यों के वितरित अभ्यावेदन सीखना](https://arxiv.org/abs/1602.03483) 📄
75 | - [उच्च आयामी डेटा पर अनुमानित निकटतम पड़ोसी खोज --- प्रयोग, विश्लेषण और सुधार](https://arxiv.org/abs/1610.02455)
76 |
77 | ### 2017
78 |
79 | - [प्राकृतिक भाषा अनुमान डेटा से सार्वभौमिक वाक्य अभ्यावेदन की
80 | पर्यवेक्षित
81 | शिक्षा](https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf)
82 | 📄
83 |
84 | ### 2018
85 |
86 | - [यूनिवर्सल सेंटेंस एनकोडर](https://arxiv.org/pdf/1803.11175.pdf) 📄
87 | - [बातचीत से सिमेंटिक टेक्स्टुअल समानता
88 | सीखना](https://arxiv.org/pdf/1804.07754.pdf) 📄
89 | - [Google AI ब्लॉग: सिमेंटिक टेक्स्टुअल समानता में
90 | प्रगति](https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html)
91 | 📄
92 | - [उच्च-आयामी डेटा में निकटता खोज के लिए k-निकटतम पड़ोसी ग्राफ़ के
93 | आधार पर अनुक्रमण का अनुकूलन](https://arxiv.org/abs/1810.07355)
94 | - [नेविगेटिंग स्प्रेडिंग-आउट ग्राफ के साथ तेजी से अनुमानित निकटतम पड़ोसी खोज](http://www.vldb.org/pvldb/vol12/p461-fu.pdf)
95 | - [सीखा सूचकांक संरचनाओं के लिए मामला](https://dl.acm.org/doi/10.1145/3183713.3196909)
96 |
97 | ### 2019
98 |
99 | - [लेजर: भाषा अज्ञेय वाक्य
100 | प्रतिनिधित्व](https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/)
101 | 📄
102 | - [प्रश्न भविष्यवाणी द्वारा दस्तावेज़
103 | विस्तार](https://arxiv.org/abs/1904.08375) 📄
104 | - [सेंटेंस-बर्ट: सियामीज़ बर्ट-नेटवर्क का इस्तेमाल करते हुए वाक्य
105 | एम्बेडिंग](https://arxiv.org/pdf/1908.10084.pdf) 📄
106 | - [बर्ट के साथ बहु-स्तरीय दस्तावेज़
107 | रैंकिंग](https://arxiv.org/abs/1910.14424) 📄
108 | - [कमजोर पर्यवेक्षित खुले डोमेन प्रश्न उत्तर के लिए गुप्त पुनर्प्राप्ति](https://arxiv.org/abs/1906.00300)
109 | - [BERTserini के साथ एंड-टू-एंड ओपन-डोमेन प्रश्न उत्तर](https://www.aclweb.org/anthology/N19-4013/)
110 | - [बायोबर्ट: बायोमेडिकल टेक्स्ट माइनिंग के लिए एक पूर्व-प्रशिक्षित बायोमेडिकल भाषा प्रतिनिधित्व मॉडल](https://arxiv.org/abs/1901.08746)📄
111 | - [नरम निकटतम पड़ोसी नुकसान के साथ प्रतिनिधित्व का विश्लेषण और सुधार](https://arxiv.org/pdf/1902.01889.pdf):camera_flash:
112 | - [DiskANN: एक ही नोड पर तेजी से सटीक अरब-बिंदु निकटतम पड़ोसी खोजें](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
113 |
114 | ### 2020
115 |
116 | - [COVID-19 ओपन रिसर्च डेटासेट के लिए एक तंत्रिका खोज इंजन को तेजी से
117 | तैनात करना: प्रारंभिक विचार और सीखे गए
118 | सबक](https://arxiv.org/abs/2004.05125) 📄
119 | - [बर्ट के साथ पैसेज री-रैंकिंग](https://arxiv.org/pdf/1901.04085.pdf)
120 | 📄
121 | - [सह-खोज: अर्थपूर्ण खोज के साथ COVID-19 सूचना पुनर्प्राप्ति, प्रश्न
122 | उत्तर, और सार संक्षेप](https://arxiv.org/pdf/2006.09595.pdf) 📄
123 | - [LaBSE:Language-agnostic BERT Sentence
124 | Embedding](https://arxiv.org/abs/2007.01852) 📄
125 | - [Covidex: COVID-19 ओपन रिसर्च डेटासेट के लिए न्यूरल रैंकिंग मॉडल और
126 | कीवर्ड सर्च इंफ्रास्ट्रक्चर](https://arxiv.org/abs/2007.07846) 📄
127 | - [DeText: बुद्धिमान पाठ समझ के लिए एक गहन एनएलपी
128 | ढांचा](https://engineering.linkedin.com/blog/2020/open-sourcing-detext)
129 | 📄
130 | - [ज्ञान आसवन का उपयोग करके एकभाषी वाक्य एम्बेडिंग बहुभाषी
131 | बनाना](https://arxiv.org/pdf/2004.09813.pdf) 📄
132 | - [टेक्स्ट रैंकिंग के लिए पूर्व प्रशिक्षित ट्रांसफॉर्मर: बीईआरटी और
133 | परे](https://arxiv.org/abs/2010.06467) 📄
134 | - [REALM: पुनर्प्राप्ति-संवर्धित भाषा मॉडल पूर्व-प्रशिक्षण](https://arxiv.org/abs/2002.08909)
135 | - [इलेक्ट्रा: प्री-ट्रेनिंग टेक्स्ट एनकोडर जेनरेटर के बजाय डिस्क्रिमिनेटर के रूप में होते हैं](https://openreview.net/pdf?id=r1xMH1BtvB)📄
136 | - [एयरबीएनबी खोज के लिए डीप लर्निंग में सुधार](https://arxiv.org/pdf/2002.05515)
137 | - [Airbnb खोज में विविधता का प्रबंधन](https://arxiv.org/abs/2004.02621)📄
138 | - [सघन पाठ पुनर्प्राप्ति के लिए लगभग निकटतम पड़ोसी नकारात्मक विपरीत शिक्षा](https://arxiv.org/abs/2007.00808v1)📄
139 |
140 | ### 2021
141 |
142 | - [तमिल शब्दों के बीच अर्थ समानता गणना के लिए हाइब्रिड दृष्टिकोण](https://www.researchgate.net/publication/350112163_Hybrid_approach_for_semantic_similarity_calculation_between_Tamil_words):page_facing_up:
143 | - [संवर्धित SBERT](https://arxiv.org/pdf/2010.08240.pdf) 📄
144 | - [BEIR: सूचना पुनर्प्राप्ति मॉडल के शून्य-शॉट मूल्यांकन के लिए एक
145 | विषम बेंचमार्क](https://arxiv.org/abs/2104.08663) 📄
146 | - [संगतता-जागरूक विषम दृश्य खोज](https://arxiv.org/abs/2105.06047) 📷
147 | - [कुछ उदाहरणों से व्यक्तिगत शैली सीखना](https://chuanenlin.com/personalstyle/)📷
148 | - [TSDAE: अनसुपरवाइज्ड सेंटेंस एंबेडिंग लर्निंग के लिए ट्रांसफॉर्मर-आधारित अनुक्रमिक डीनोइज़िंग ऑटो-एनकोडर का उपयोग करना](https://arxiv.org/abs/2104.06979)📄
149 | - [ट्रांसफॉर्मर का एक सर्वेक्षण](https://arxiv.org/abs/2106.04554)📄📷
150 | - [डीप सुदृढीकरण लर्निंग का उपयोग करके उच्च गुणवत्ता से संबंधित खोज क्वेरी सुझाव](https://arxiv.org/abs/2108.04452v1)
151 | - [Taobao खोज में एम्बेडिंग-आधारित उत्पाद पुनर्प्राप्ति](https://arxiv.org/pdf/2106.09297.pdf)📄📷
152 | - [टीपीआरएम: वेब खोज के लिए एक विषय-आधारित निजीकृत रैंकिंग मॉडल](https://arxiv.org/abs/2108.06014)📄
153 | - [mMARCO: एमएस मार्को पैसेज रैंकिंग डेटासेट का एक बहुभाषी संस्करण](https://arxiv.org/abs/2108.13897)📄
154 | - [टेक्स्ट पर डेटाबेस रीजनिंग](https://aclanthology.org/2021.acl-long.241.pdf)
155 | - [एडवरसैरियल फाइन-ट्यूनिंग BERT को कैसे लाभ पहुंचाता है?](https://arxiv.org/abs/2108.13602):page_facing_up:
156 | - [ट्रेन शॉर्ट, टेस्ट लांग: रैखिक पूर्वाग्रहों के साथ ध्यान इनपुट लेंथ एक्सट्रपलेशन को सक्षम बनाता है](https://arxiv.org/abs/2108.12409):page_facing_up:
157 | - [प्राइमर: भाषा मॉडलिंग के लिए कुशल ट्रांसफॉर्मर की खोज](https://arxiv.org/abs/2109.08668)📄
158 | - [वह ध्वनि कितनी परिचित है? ध्वनिक शब्द एम्बेडिंग का क्रॉस-लिंगुअल रिप्रेजेंटेशनल समानता विश्लेषण](https://arxiv.org/pdf/2109.10179.pdf):loud_sound:
159 | - [SimCSE: वाक्य एम्बेडिंग की सरल विरोधाभासी शिक्षा](https://arxiv.org/abs/2104.08821#):page_facing_up:
160 | - [रचनात्मक ध्यान:खोज और पुनर्प्राप्ति को अलग करना](https://arxiv.org/abs/2110.09419)📄📷
161 | - [स्पैन: अत्यधिक कुशल अरब पैमाने पर लगभग निकटतम पड़ोसी खोज](https://arxiv.org/abs/2111.08566)
162 |
163 | लेख
164 | -------
165 | - [अर्थपूर्ण खोज से निपटना](https://adityamalte.substack.com/p/tackle-semantic-search/)
166 | - [Azure Cognitive Search में सिमेंटिक सर्च](https://docs.microsoft.com/en-us/azure/search/semantic-search-overview)
167 | - [हमने अपनी खोज को 10x स्मार्ट बनाने के लिए सिमेंटिक खोज का उपयोग कैसे किया](https://zilliz.com/blog/How-we-used-semantic-search-to-make-our-search-10-x-smarter/)
168 | - [दोहरे स्थान वाले शब्द एम्बेडिंग के साथ सिमेंटिक सर्च इंजन का निर्माण](https://m.mage.ai/building-a-semantic-search-engine-with-dual-space-word-embeddings-f5a596eb6d90)
169 | - [FAISS+SBERT के साथ अरब-पैमाने की सिमेंटिक समानता खोज](https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2)
170 | - [समानता खोज थ्रेसहोल्ड के बारे में कुछ टिप्पणियां](https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
171 | - [स्थानीयता संवेदनशील हैशिंग का उपयोग करके निकट-डुप्लिकेट छवि खोज](https://keras.io/examples/vision/near_dup_search/)
172 | - [वेक्टर समानता खोज और Faiss पर निःशुल्क पाठ्यक्रम](https://link.medium.com/HtFoFKlKvkb)
173 | - [निकटतम पड़ोसियों के एल्गोरिदम के लिए व्यापक गाइड](https://link.medium.com/V62Z8drvEkb)
174 |
175 | Libraries तथा Tools
176 | -------------------
177 |
178 | - [fastText](https://fasttext.cc/)
179 | - [Universal Sentence
180 | Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)
181 | - [SBERT](https://www.sbert.net/)
182 | - [LaBSE](https://tfhub.dev/google/LaBSE/2)
183 | - [LASER](https://github.com/facebookresearch/LASER)
184 | - [Haystack](https://github.com/deepset-ai/haystack/)
185 | - [Jina.AI](https://jina.ai/)
186 | - [SentEval
187 | Toolkit](https://github.com/facebookresearch/SentEval?utm_source=catalyzex.com)
188 | - [BEIR: Benchmarking IR](https://github.com/UKPLab/beir)
189 | - [Which Frame?](http://whichframe.com/)
190 | - [PySerini](https://github.com/castorini/pyserini)
191 | - [milvus](https://www.milvus.io/)
192 | - [weaviate](https://github.com/semi-technologies/weaviate)
193 | - [natural-language-youtube-search](https://github.com/haltakov/natural-language-youtube-search)
194 | - [same.energy](https://www.same.energy/about)
195 | - [scaNN](https://github.com/google-research/google-research/tree/master/scann)
196 | - [annoy](https://github.com/spotify/annoy)
197 | - [faiss](https://github.com/facebookresearch/faiss)
198 | - [DPR](https://github.com/facebookresearch/DPR)
199 | - [rank\_BM25](https://github.com/dorianbrown/rank_bm25)
200 | - [nearPy](http://pixelogik.github.io/NearPy/)
201 | - [vearch](https://github.com/vearch/vearch)
202 | - [PyNNDescent](https://github.com/lmcinnes/pynndescent)
203 | - [pgANN](https://github.com/netrasys/pgANN)
204 | - [Tensorflow Similarity](https://github.com/tensorflow/similarity)
205 | - [opensemanticsearch.org](https://www.opensemanticsearch.org/)
206 | - [GPT3 Semantic Search](https://gpt3demo.com/category/semantic-search)
207 | - [searchy](https://github.com/lubianat/searchy)
208 | - [txtai](https://github.com/neuml/txtai)
209 | - [HyperTag](https://github.com/Ravn-Tech/HyperTag)
210 | - [vectorai](https://github.com/vector-ai/vectorai)
211 | - [embeddinghub](https://github.com/featureform/embeddinghub)
212 | - [AquilaDb](https://github.com/Aquila-Network/AquilaDB)
213 |
214 | डेटासेट
215 | -------
216 |
217 | - [सिमेंटिक टेक्स्ट समानता डेटासेट
218 | हब](https://github.com/brmson/dataset-sts)
219 | - [फेसबुक एआई छवि समानता चुनौती](https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/?fbclid=IwAR31vRV0EdxRdrxtPy12neZtBJQ0H9qdLHm8Wl2DjHY09PtQdn1nEEIJVUo)
220 | - [WIT: विकिपीडिया-आधारित छवि पाठ डेटासेट](https://github.com/google-research-datasets/wit)
221 |
222 |
223 | माइलस्टोन्स
224 | -------
225 |
226 | - कार्य सूची के लिए [परियोजना बोर्ड](https://github.com/Agrover112/awesome-semantic-search/projects/1) पर एक नज़र डालें ताकि किसी भी खुले मुद्दे में योगदान किया जा सके।
227 |
--------------------------------------------------------------------------------
/README_Portuguesse.md:
--------------------------------------------------------------------------------
1 | # Busca Semântica Incrível [](https://awesome.re) [](https://conventionalcommits.org)
2 |
3 |
4 |
5 | Logo feito por [@createdbytango](https://instagram.com/createdbytango).
6 |
7 | **À procura de mais adições de artigos.
8 | PS: Abra um PR (Pedido de Pull)**
9 |
10 | Este repositório visa servir como um meta-repositório para tarefas relacionadas com [Busca Semântica](https://pt.wikipedia.org/wiki/Busca_semântica) e [Similaridade Semântica](http://nlpprogress.com/english/semantic_textual_similarity.html).
11 |
12 | A busca semântica não se limita a texto! Pode ser feita com imagens, voz, etc. Existem inúmeros casos de uso e diferentes aplicações de busca semântica.
13 |
14 | Sinta-se à vontade para abrir um PR neste repositório!
15 |
16 | ## Conteúdo
17 |
18 | - [Artigos Científicos](#artigos-científicos)
19 | - [2014](#2014)
20 | - [2015](#2015)
21 | - [2016](#2016)
22 | - [2017](#2017)
23 | - [2018](#2018)
24 | - [2019](#2019)
25 | - [2020](#2020)
26 | - [2021](#2021)
27 | - [2022](#2022)
28 | - [2023](#2023)
29 | - [Artigos](#artigos)
30 | - [Bibliotecas e Ferramentas](#bibliotecas-e-ferramentas)
31 | - [Conjuntos de Dados](#conjuntos-de-dados)
32 | - [Marcos](#marcos)
33 |
34 | ## Artigos Científicos
35 |
36 | ### 2010
37 |
38 | - [Priority Range Trees](https://arxiv.org/abs/1009.3527)
39 |
40 | ### 2014
41 |
42 | - [Um Modelo Semântico Latente com Estrutura de Convolutional-Pooling para Recuperação de Informação](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2014_cdssm_final.pdf) 📄
43 |
44 | ### 2015
45 |
46 | - [Vetores de Skip-Thought](https://arxiv.org/pdf/1506.06726.pdf) 📄
47 | - [LSH Prático e Ótimo para Distância Angular](https://proceedings.neurips.cc/paper/2015/hash/2823f4797102ce1a1aec05359cc16dd9-Abstract.html)
48 |
49 | ### 2016
50 |
51 | - [Saco de truques para classificação eficiente de texto](https://arxiv.org/abs/1607.01759) 📄
52 | - [Enriquecendo vetores de palavras com informações de subpalavras](https://arxiv.org/abs/1607.04606) 📄
53 | - [Pesquisa aproximada de vizinho mais próximo eficiente e robusta usando gráficos hierárquicos navegáveis de pequenos mundos](https://arxiv.org/abs/1603.09320)
54 | - [Sobre a pesquisa aproximada de incorporações de palavras semelhantes](https://www.aclweb.org/anthology/P16-1214.pdf)
55 | - [Aprendendo representações distribuídas de sentenças a partir de dados não rotulados](https://arxiv.org/abs/1602.03483)📄
56 | - [Pesquisa aproximada do vizinho mais próximo em dados de alta dimensão --- Experimentos, análises e melhorias](https://arxiv.org/abs/1610.02455)
57 |
58 | ### 2017
59 |
60 | - [Aprendizagem supervisionada de representações de frases universais a partir de dados de inferência de linguagem natural](https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf) 📄
61 | - [Semelhança textual semântica para hindi](https://www.semanticscholar.org/paper/Semantic-Textual-Similarity-For-Hindi-Mujadia-Mamidi/372f615ce36d7543512b8e40d6de51d17f316e0b)📄
62 | - [Sugestão eficiente de resposta em linguagem natural para resposta inteligente](https://arxiv.org/abs/1705.00652)📃
63 |
64 | ### 2018
65 |
66 | - [Codificador de frases universais](https://arxiv.org/pdf/1803.11175.pdf) 📄
67 | - [Aprendendo similaridade textual semântica em conversas](https://arxiv.org/pdf/1804.07754.pdf) 📄
68 | - [Blog de IA do Google: avanços na similaridade textual semântica](https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html) 📄
69 | - [Speech2Vec: uma estrutura de sequência a sequência para aprender incorporações de palavras a partir da fala](https://arxiv.org/abs/1803.08976)🔊
70 | - [Otimização da indexação com base no gráfico k-vizinho mais próximo para pesquisa de proximidade em dados de alta dimensão](https://arxiv.org/abs/1810.07355)
71 | - [Pesquisa rápida aproximada do vizinho mais próximo com o
72 | grafo de propagação navegável (Navigating Spreading-out Graph)](http://www.vldb.org/pvldb/vol12/p461-fu.pdf)
73 | - [O caso das estruturas de índice aprendidas](https://dl.acm.org/doi/10.1145/3183713.3196909)
74 |
75 | ### 2019
76 |
77 | - [LASER: representações de frases agnósticas de linguagem](https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/) 📄
78 | - [Expansão de documentos por previsão de consulta](https://arxiv.org/abs/1904.08375) 📄
79 | - [Sentence-BERT: Embeddings de frases usando redes BERT siamesas](https://arxiv.org/pdf/1908.10084.pdf) 📄
80 | - [Classificação de documentos em vários estágios com BERT](https://arxiv.org/abs/1910.14424) 📄
81 | - [Recuperação latente para resposta a perguntas de domínio aberto com supervisão fraca](https://arxiv.org/abs/1906.00300)
82 | - [Resposta completa de perguntas de domínio aberto com BERTserini](https://www.aclweb.org/anthology/N19-4013/)
83 | - [BioBERT: um modelo de representação de linguagem biomédica pré-treinado para mineração de texto biomédico](https://arxiv.org/abs/1901.08746)📄
84 | - [Analisando e melhorando representações com a perda suave do vizinho mais próximo](https://arxiv.org/pdf/1902.01889.pdf)📷
85 | - [DiskANN: pesquisa rápida e precisa de vizinhos mais próximos em
86 | escala de bilhões de pontos em um único nó](https://proceedings.neurips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
87 |
88 | ### 2020
89 |
90 | - [Implantando rapidamente um mecanismo de pesquisa neural para o conjunto de dados de pesquisa aberta COVID-19: reflexões preliminares e lições aprendidas](https://arxiv.org/abs/2004.05125) 📄
91 | - [RE-RANKING DA PASSAGEM COM BERT](https://arxiv.org/pdf/1901.04085.pdf) 📄
92 | - [CO-Search: recuperação de informações sobre COVID-19 com pesquisa semântica, resposta a perguntas e resumo abstrativo](https://arxiv.org/pdf/2006.09595.pdf) 📄
93 | - [LaBSE: Incorporação de frase BERT independente de idioma](https://arxiv.org/abs/2007.01852) 📄
94 | - [Covidex: Modelos de classificação neural e infraestrutura de pesquisa de palavras-chave para o conjunto de dados de pesquisa aberta COVID-19](https://arxiv.org/abs/2007.07846) 📄
95 | - [DeText: uma estrutura profunda de PNL para compreensão inteligente de texto](https://engineering.linkedin.com/blog/2020/open-sourcing-detext) 📄
96 | - [Fazendo incorporações de frases monolíngues multilíngues usando destilação de conhecimento](https://arxiv.org/pdf/2004.09813.pdf) 📄
97 | - [Transformadores pré-treinados para classificação de texto: BERT e além](https://arxiv.org/abs/2010.06467) 📄
98 | - [REALM: Pré-treinamento de modelo de linguagem aumentada de recuperação](https://arxiv.org/abs/2002.08909)
99 | - [ELECTRA: PRÉ-TREINAMENTO DE CODIFICADORES DE TEXTO COMO DISCRIMINADORES EM VEZ DE GERADORES](https://openreview.net/pdf?id=r1xMH1BtvB)📄
100 | - [Melhorando o aprendizado profundo para pesquisa no Airbnb](https://arxiv.org/pdf/2002.05515)
101 | - [Gerenciando a Diversidade na Pesquisa Airbnb](https://arxiv.org/abs/2004.02621)📄
102 | - [Aprendizagem contrastiva negativa aproximada do vizinho mais próximo para recuperação de texto denso](https://arxiv.org/abs/2007.00808v1)📄
103 | - [Incorporações de estilo de imagem não supervisionado para tarefas de recuperação e reconhecimento](https://openaccess.thecvf.com/content_WACV_2020/papers/Gairola_Unsupervised_Image_Style_Embeddings_for_Retrieval_and_Recognition_Tasks_WACV_2020_paper.pdf)📷
104 | - [DeCLUTR: Aprendizagem Contrastiva Profunda para Representações Textuais Não Supervisionadas](https://arxiv.org/abs/2006.03659)📄
105 |
106 | ### 2021
107 |
108 | - [Abordagem híbrida para cálculo de similaridade semântica entre palavras Tamil](https://www.researchgate.net/publication/350112163_Hybrid_approach_for_semantic_similarity_calculation_between_Tamil_words) 📄
109 | - [SBERT aumentado](https://arxiv.org/pdf/2010.08240.pdf) 📄
110 | - [BEIR: um benchmark heterogêneo para avaliação zero-shot de modelos de recuperação de informações](https://arxiv.org/abs/2104.08663) 📄
111 | - [Pesquisa visual heterogênea com reconhecimento de compatibilidade](https://arxiv.org/abs/2105.06047) 📷
112 | - [Aprendendo estilo pessoal com alguns exemplos](https://chuanenlin.com/personalstyle)📷
113 | - [TSDAE: Usando codificador automático de eliminação de ruído sequencial baseado em transformador para aprendizagem não supervisionada de incorporação de frases](https://arxiv.org/abs/2104.06979)📄
114 | - [Uma Pesquisa de Transformadores](https://arxiv.org/abs/2106.04554)📄📷
115 | - [SPLADE: modelo lexical esparso e de expansão para classificação de primeiro estágio](https://dl.acm.org/doi/10.1145/3404835.3463098)📄
116 | - [Sugestões de consulta de pesquisa relacionada de alta qualidade usando Deep Reinforcement Learning](https://arxiv.org/abs/2108.04452v1)
117 | - [Recuperação de produto baseada em incorporação na pesquisa Taobao](https://arxiv.org/pdf/2106.09297.pdf)📄📷
118 | - [TPRM: um modelo de classificação personalizado baseado em tópicos para pesquisa na Web](https://arxiv.org/abs/2108.06014)📄
119 | - [mMARCO: uma versão multilíngue do conjunto de dados de classificação de passagens MS MARCO](https://arxiv.org/abs/2108.13897)📄
120 | - [Raciocínio de banco de dados sobre texto](https://aclanthology.org/2021.acl-long.241.pdf)📄
121 | - [Como o ajuste fino adversário beneficia o BERT?](https://arxiv.org/abs/2108.13602)📄
122 | - [Treinar curto, testar longo: atenção com polarizações lineares permite extrapolação de comprimento de entrada](https://arxiv.org/abs/2108.12409)📄
123 | - [Primer: Procurando Transformadores Eficientes para Modelagem de Linguagem](https://arxiv.org/abs/2109.08668)📄
124 | - [Quão familiar isso soa? Análise de similaridade representacional
125 | interlinguística de incorporações acústicas de palavras](https://arxiv.org/pdf/2109.10179.pdf)🔊
126 | - [SimCSE: Aprendizagem contrastiva simples de incorporações de frases](https://arxiv.org/abs/2104.08821#)📄
127 | - [Atenção Composicional: Desembaraçando Pesquisa e Recuperação](https://arxiv.org/abs/2110.09419)📄📷
128 | - [SPANN: pesquisa aproximada de vizinho mais próximo altamente eficiente em escala de bilhões](https://arxiv.org/abs/2111.08566)
129 | - [GPL: Pseudo-rotulagem generativa para adaptação de domínio não supervisionado de recuperação densa](https://arxiv.org/abs/2112.07577) 📄
130 | - [Mecanismos de pesquisa generativos: experimentos iniciais](https://computationalcreativity.net/iccc21/wp-content/uploads/2021/09/ICCC_2021_paper_50.pdf) 📷
131 | - [Repensando a pesquisa: transformando diletantes em especialistas em domínio](https://dl.acm.org/doi/10.1145/3476415.3476428)
132 | - [WhiteningBERT: uma abordagem fácil de incorporação de frases não supervisionadas](https://arxiv.org/abs/2104.01767)
133 | ### 2022
134 |
135 | - [Incorporações de texto e código por pré-treinamento contrastivo](https://arxiv.org/abs/2201.10005)📄
136 | - [RELIC: Recuperando evidências para reivindicações literárias](https://arxiv.org/abs/2203.10053)📄
137 | - [Trans-Encoder: modelagem não supervisionada de pares de frases por meio de destilações próprias e mútuas](https://arxiv.org/abs/2109.13059)📄
138 | - [SAMU-XLSR: Representação de fala interlingual em nível de expressão multimodal semanticamente alinhada](https://arxiv.org/abs/2205.08180)🔊
139 | - [Uma análise de funções de fusão para recuperação híbrida](https://arxiv.org/abs/2210.11934)📄
140 | - [Detecção fora de distribuição com vizinhos mais próximos](https://arxiv.org/abs/2204.06507)
141 | - [ESB: uma referência para reconhecimento de fala ponta a ponta em vários domínios](https://arxiv.org/abs/2210.13352)🔊
142 | - [Analisando incorporações de palavras acústicas a partir de modelos de fala auto-supervisionados pré-treinados](https://arxiv.org/pdf/2210.16043.pdf))🔊
143 | - [Repensando com recuperação: inferência fiel do modelo de linguagem grande](https://arxiv.org/abs/2301.00303)📄
144 | - [Recuperação densa precisa de tiro zero sem rótulos de relevância](https://arxiv.org/pdf/2212.10496.pdf)📄
145 | - [Memória do transformador como índice de pesquisa diferenciável](https://arxiv.org/abs/2202.06991)📄
146 |
147 | ### 2023
148 |
149 | - [FINGER: Inferência rápida para pesquisa aproximada de vizinho mais próximo baseada em gráfico](https://dl.acm.org/doi/10.1145/3543507.3583318)📄
150 | - [Classificação de texto de “baixos recursos”: um método de classificação sem parâmetros com compressores](https://aclanthology.org/2023.findings-acl.426/)📄
151 | - [SparseEmbed: aprendendo representações lexicais esparsas com incorporações contextuais para recuperação](https://dl.acm.org/doi/pdf/10.1145/3539618.3592065) 📄
152 |
153 | ## Artigos
154 |
155 | - [Abordando a pesquisa semântica](https://adityamalte.substack.com/p/tackle-semantic-search/)
156 | - [Pesquisa semântica no Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/semantic-search-overview)
157 | - [Como usamos a pesquisa semântica para tornar nossa pesquisa 10 vezes mais inteligente](https://zilliz.com/blog/How-we-used-semantic-search-to-make-our-search-10-x-smarter/)
158 | - [Stanford AI Blog: Construindo modelos de PNL escaláveis, explicáveis e adaptativos com recuperação](https://ai.stanford.edu/blog/retrieval-based-NLP/)
159 | - [Construindo um mecanismo de pesquisa semântico com embeddings de palavras de espaço duplo](https://m.mage.ai/building-a-semantic-search-engine-with-dual-space-word-embeddings-f5a596eb6d90)
160 | - [Pesquisa de similaridade semântica em escala de bilhões com FAISS+SBERT](https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2)
161 | - [Algumas observações sobre limites de pesquisa de similaridade](https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
162 | - [Pesquisa de imagens quase duplicadas usando hash sensível à localidade](https://keras.io/examples/vision/near_dup_search/)
163 | - [Curso gratuito sobre pesquisa de similaridade vetorial e Faiss](https://link.medium.com/HtFoFKlKvkb)
164 | - [Guia abrangente para algoritmos aproximados de vizinhos mais próximos](https://link.medium.com/V62Z8drvEkb)
165 | - [Apresentando o índice híbrido para permitir a pesquisa semântica com reconhecimento de palavras-chave](https://www.pinecone.io/learn/hybrid-search/?utm_medium=email&_hsmi=0&_hsenc=p2ANqtz--zLu9hiyh-y_XTa7FCEpi8JESJKmif5dhpYtAxTWka8PIttaTOGE21LMZlg9EOZyPYpCm6GDvYy57tlGRwH6TjgLCsJg&utm_content=231741722&utm_source=hs_email)
166 | - [Pesquisa Semântica Argilla](https://docs.argilla.io/en/latest/guides/features/semantic-search.html)
167 | - [Modelo de compreensão de texto multilíngue da Co:here](https://txt.cohere.ai/multilingual/)
168 | - [Simplifique a pesquisa com modelos de incorporação multilíngue](https://blog.vespa.ai/simplify-search-with-multilingual-embeddings/)
169 |
170 | ## Bibliotecas e ferramentas
171 |
172 | - [fastText](https://fasttext.cc/)
173 | - [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)
174 | - [SBERT](https://www.sbert.net/)
175 | - [ELECTRA](https://github.com/google-research/electra)
176 | - [LaBSE](https://tfhub.dev/google/LaBSE/2)
177 | - [LASER](https://github.com/facebookresearch/LASER)
178 | - [Relevance AI - Plataforma vetorial da experimentação à implantação](https://relevance.ai)
179 | - [Haystack](https://github.com/deepset-ai/haystack/)
180 | - [Jina.AI](https://jina.ai/)
181 | - [Pinecone](https://www.pinecone.io/)
182 | - [Kit de ferramentas SentEval](https://github.com/facebookresearch/SentEval?utm_source=catalyzex.com)
183 | - [ranx](https://github.com/AmenRa/ranx)
184 | - [BEIR: Comparativo de RI](https://github.com/UKPLab/beir)
185 | - [RELiC: recuperando evidências para conjunto de dados de reivindicações literárias](https://relic.cs.umass.edu/)
186 | - [matchzoo-py](https://github.com/NTMC-Community/MatchZoo-py)
187 | - [deep_text_matching](https://github.com/wangle1218/deep_text_matching)
188 | - [Which Frame?](http://whichframe.com/)
189 | - [lexica.art](https://lexica.art/)
190 | - [pesquisa semântica de emoji](https://github.com/lilianweng/emoji-semantic-search)
191 | - [PySerini](https://github.com/castorini/pyserini)
192 | - [BERTSerini](https://github.com/rsvp-ai/bertserini)
193 | - [BERTSimilarity](https://github.com/Brokenwind/BertSimilarity)
194 | - [milvus](https://www.milvus.io/)
195 | - [NeuroNLP++](https://plusplus.neuronlp.fruitflybrain.org/)
196 | - [weaviate](https://github.com/semi-technologies/weaviate)
197 | - [Pesquisa semântica através da Wikipedia com Weaviate](https://github.com/semi-technologies/semantic-search-through-wikipedia-with-weaviate)
198 | - [Pesquisa em linguagem natural no YouTube](https://github.com/haltakov/natural-language-youtube-search)
199 | - [same.energy](https://www.same.energy/about)
200 | - [ann benchmarks](http://ann-benchmarks.com/)
201 | - [scaNN](https://github.com/google-research/google-research/tree/master/scann)
202 | - [REALM](https://github.com/google-research/language/tree/master/language/realm)
203 | - [annoy](https://github.com/spotify/annoy)
204 | - [pynndescent](https://github.com/lmcinnes/pynndescent)
205 | - [nsg](https://github.com/ZJULearning/nsg)
206 | - [FALCONN](https://github.com/FALCONN-LIB/FALCONN)
207 | - [redis HNSW](https://github.com/zhao-lang/redis_hnsw)
208 | - [autofaiss](https://github.com/criteo/autofaiss)
209 | - [DPR](https://github.com/facebookresearch/DPR)
210 | - [rank_BM25](https://github.com/dorianbrown/rank_bm25)
211 | - [nearPy](http://pixelogik.github.io/NearPy/)
212 | - [vearch](https://github.com/vearch/vearch)
213 | - [vespa](https://github.com/vespa-engine/vespa)
214 | - [PyNNDescent](https://github.com/lmcinnes/pynndescent)
215 | - [pgANN](https://github.com/netrasys/pgANN)
216 | - [Tensorflow Similarity](https://github.com/tensorflow/similarity)
217 | - [opensemanticsearch.org](https://www.opensemanticsearch.org/)
218 | - [Pesquisa Semântica GPT3](https://gpt3demo.com/category/semantic-search)
219 | - [searchy](https://github.com/lubianat/searchy)
220 | - [txtai](https://github.com/neuml/txtai)
221 | - [HyperTag](https://github.com/Ravn-Tech/HyperTag)
222 | - [vectorai](https://github.com/vector-ai/vectorai)
223 | - [embeddinghub](https://github.com/featureform/embeddinghub)
224 | - [AquilaDb](https://github.com/Aquila-Network/AquilaDB)
225 | - [STripNet](https://github.com/stephenleo/stripnet)
226 |
227 | ## Conjuntos de dados
228 |
229 | - [Hub de conjunto de dados de similaridade de texto semântico](https://github.com/brmson/dataset-sts)
230 | - [Desafio de similaridade de imagens de IA do Facebook](https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/?fbclid=IwAR31vRV0EdxRdrxtPy12neZtBJQ0H9qdLHm8Wl2DjHY09PtQdn1nEEIJVUo)
231 | - [WIT: conjunto de dados de texto de imagem baseado na Wikipédia](https://github.com/google-research-datasets/wit)
232 | - [BEIR](https://github.com/beir-cellar/beir)
233 | - MTEB
234 |
235 | ## Marcos
236 |
237 | Dê uma olhada no [quadro do projeto](https://github.com/Agrover112/awesome-semantic-search/projects/1) para ver a lista de tarefas para contribuir com qualquer uma das questões em aberto.
238 |
--------------------------------------------------------------------------------
/README_Spanish.md:
--------------------------------------------------------------------------------
1 | # Awesome Semantic-Search [](https://awesome.re) [](https://conventionalcommits.org)
2 |
3 |
4 |
5 |
6 |
7 |
8 | Logo hecho por [@createdbytango](https://instagram.com/createdbytango).
9 |
10 | Este repositorio intenta ser un meta-repositorio para los temas relacionados con [Búsqueda Semántica](https://en.wikipedia.org/wiki/Semantic_search) y [Similaridad Semántica](http://nlpprogress.com/english/semantic_textual_similarity.html).
11 |
12 | ¡La búsqueda semántica no está limitada solamente a texto! Puede hacerse con imágenes, voz, etcétera. Es por eso que hay muchos casos en los que la búsqueda semántica se puede aplicar.
13 |
14 | ## Índice
15 |
16 | - [Papers](#papers)
17 | - [2014](#2014)
18 | - [2015](#2015)
19 | - [2016](#2016)
20 | - [2017](#2017)
21 | - [2018](#2018)
22 | - [2019](#2019)
23 | - [2020](#2020)
24 | - [2021](#2021)
25 | - [Artículos](#artículos)
26 | - [Librerías y Herramientas](#librerías-y-herramientas)
27 | - [Conjuntos de datos](#conjuntos-de-datos)
28 | - [Hitos](#hitos)
29 |
30 | ## Papers
31 | ### 2014
32 | - [Un modelo semántico latente con estructura Pooling Convolucional para la recopilación de información](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2014_cdssm_final.pdf) 📄
33 |
34 | ### 2015
35 | - [Vectores Skip-Thought](https://arxiv.org/pdf/1506.06726.pdf) 📄
36 |
37 | ### 2016
38 | - [Bolsa de trucos para la clasificación eficiente de textos](https://arxiv.org/abs/1607.01759) 📄
39 | - [Enriquecimiento de vectores de palabras con información de subpalabras](https://arxiv.org/abs/1607.04606) 📄
40 | - [Aproximaciones robustas y eficientes para la búsqueda del vecino más cercano usando grafos Jerárquicos Navegables de Mundos Pequeños](https://arxiv.org/abs/1603.09320)
41 | - [Sobre la aproximación al buscar Embeddings de Palabras Similares](https://www.aclweb.org/anthology/P16-1214.pdf)
42 | - [Aprendiendo las Distribuciones de Representaciones de Oraciones a partir de Información Sin Clasificar](https://arxiv.org/abs/1602.03483)📄
43 |
44 | ### 2017
45 | - [Aprendizaje supervisado de Representaciones de Oraciones Universales a partir de datos de Inferencia de Lenguaje Natural](https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf) 📄
46 |
47 | ### 2018
48 | - [Codificador de Oraciones Universal](https://arxiv.org/pdf/1803.11175.pdf) 📄
49 | - [Aprendiendo la Similaridad Semántica Textual a partir de conversaciones](https://arxiv.org/pdf/1804.07754.pdf) 📄
50 | - [Blog de IA de Google: Avances en la Similaridad Textual Semántica](https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html) 📄
51 | - [Optimización de la Indexación basada en los k Vecinos más Cercanos por Proximidad en Búsqueda en Datos de Varias Dimensiones](https://arxiv.org/abs/1810.07355)
52 |
53 | ### 2019
54 | - [LASER: Representaciones de Oraciones Agnósticas del Lenguaje](https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/) 📄
55 | - [Expansión de Documentos por Predicción de Consultas](https://arxiv.org/abs/1904.08375) 📄
56 | - [Oraciones-BERT: Embeddings de Oraciones usando Redes Siamesas BERT](https://arxiv.org/pdf/1908.10084.pdf) 📄
57 | - [Ranking de Documentos Multi-Fase con BERT](https://arxiv.org/abs/1910.14424) 📄
58 | - [Recuperación Latente para Respuestas a preguntas de dominio abierto Débilmente Supervisadas](https://arxiv.org/abs/1906.00300)
59 | - [End-to-End Respuestas a Preguntas de Dominio Abierto con BERTserini](https://www.aclweb.org/anthology/N19-4013/)
60 |
61 | ### 2020
62 | - [Desplegando Rápidamente un Mecanismo de Búsqueda Neuronal para los Conjuntos de Datos de Investigación Abiertos de COVID-19: Pensamientos Preliminares y Lecciones Aprendidas](https://arxiv.org/abs/2004.05125) 📄
63 | - [RE-CLASIFICACIÓN DE PASAJE CON BERT](https://arxiv.org/pdf/1901.04085.pdf) 📄
64 | - [CO-Búsqueda: Recuperación de información de COVID-19 con Búsqueda Semántica, Respondiendo Preguntas y Resumen Abstractivo](https://arxiv.org/pdf/2006.09595.pdf) 📄
65 | - [LaBSE: Embedding de Oraciones Agnóstico del Lenguaje con BERT (Language-agnostic BERT Sentence Embedding)](https://arxiv.org/abs/2007.01852) 📄
66 | - [Covidex: Modelos de Ranking Neural e Infraestructura de Búsqueda de Palabras Clave para los Conjuntos de Datos abiertos de COVID-19](https://arxiv.org/abs/2007.07846) 📄
67 | - [DeText: Un framework profundo de NLP para entender textos inteligentes](https://engineering.linkedin.com/blog/2020/open-sourcing-detext) 📄
68 | - [Haciendo Embeddings de Oraciones Monolinguales Multilinguales usando Destilación de Conocimiento](https://arxiv.org/pdf/2004.09813.pdf) 📄
69 | - [Transformadores Preentrenados para Ranking de textos: BERT y más allá](https://arxiv.org/abs/2010.06467) 📄
70 | - [REALM: Preentrenamiento de Modelos de Lenguaje Aumentados con Recuperación](https://arxiv.org/abs/2002.08909)
71 | - [ELECTRA: PREENTRENANDO CODIFICADORES DE TEXTOS COMO DISCRIMINADORES EN VEZ DE COMO GENERADORES](https://openreview.net/pdf?id=r1xMH1BtvB)📄
72 | ### 2021
73 | - [SBERT Aumentado](https://arxiv.org/pdf/2010.08240.pdf) 📄
74 | - [BEIR: Un Punto de Referencia Homogéneo para Evaluaciones Zero-shot de Modelos de Recuperación de Información](https://arxiv.org/abs/2104.08663) 📄
75 | - [Búsquedas Visuales Conscientes de Compatibilidad Heterogénea](https://arxiv.org/abs/2105.06047) 📷
76 | - [Aprendiendo el Estilo Personal a partir de Pocos Ejemplos](https://chuanenlin.com/personalstyle)📷
77 | - [TSDAE: Usando Codificadores Automáticos de Eliminación del Ruido Basados en Transformadores para Aprendizaje Sin Supervisión de Embedding de Oraciones](https://arxiv.org/abs/2104.06979)📄
78 | - [Una Encuesta sobre Transformadores](https://arxiv.org/abs/2106.04554)📄📷
79 |
80 | ## Artículos
81 |
82 | - [Abordando la Búsqueda Semántica](https://adityamalte.substack.com/p/tackle-semantic-search/)
83 | - [Búsqueda Semántica en Azure Cognitive Search](https://docs.microsoft.com/en-us/azure/search/semantic-search-overview)
84 | - [Cómo usamos la búsqueda semántica para hacer nuestras búsquedas 10 veces más inteligentes](https://zilliz.com/blog/How-we-used-semantic-search-to-make-our-search-10-x-smarter)
85 | - [Construyendo un sistema de búsqueda semántico con doble embedding de palabras](https://m.mage.ai/building-a-semantic-search-engine-with-dual-space-word-embeddings-f5a596eb6d90)
86 | - [Búsqueda de similaridad semántica a escala de miles de millones con FAISS+SBERT](https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2)
87 | - [Algunas observaciones sobre umbrales de búsqueda de similaridades](https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
88 | ## Librerías y Herramientas
89 | - [fastText](https://fasttext.cc/)
90 | - [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)
91 | - [SBERT](https://www.sbert.net/)
92 | - [ELECTRA](https://github.com/google-research/electra)
93 | - [LaBSE](https://tfhub.dev/google/LaBSE/2)
94 | - [LASER](https://github.com/facebookresearch/LASER)
95 | - [Haystack](https://github.com/deepset-ai/haystack/)
96 | - [Jina.AI](https://jina.ai/)
97 | - [SentEval Toolkit](https://github.com/facebookresearch/SentEval?utm_source=catalyzex.com)
98 | - [BEIR: Benchmarking IR](https://github.com/UKPLab/beir)
99 | - [matchzoo-py](https://github.com/NTMC-Community/MatchZoo-py)
100 | - [Which Frame?](http://whichframe.com/)
101 | - [PySerini](https://github.com/castorini/pyserini)
102 | - [BERTSerini](https://github.com/rsvp-ai/bertserini)
103 | - [BERTSimilarity](https://github.com/Brokenwind/BertSimilarity)
104 | - [milvus](https://www.milvus.io/)
105 | - [weaviate](https://github.com/semi-technologies/weaviate)
106 | - [natural-language-youtube-search](https://github.com/haltakov/natural-language-youtube-search)
107 | - [same.energy](https://www.same.energy/about)
108 | - [scaNN](https://github.com/google-research/google-research/tree/master/scann)
109 | - [REALM](https://github.com/google-research/language/tree/master/language/realm)
110 | - [annoy](https://github.com/spotify/annoy)
111 | - [faiss](https://github.com/facebookresearch/faiss)
112 | - [DPR](https://github.com/facebookresearch/DPR)
113 | - [rank_BM25](https://github.com/dorianbrown/rank_bm25)
114 | - [nearPy](http://pixelogik.github.io/NearPy/)
115 | - [vearch](https://github.com/vearch/vearch)
116 | - [PyNNDescent](https://github.com/lmcinnes/pynndescent)
117 | - [pgANN](https://github.com/netrasys/pgANN)
118 | - [opensemanticsearch.org](https://www.opensemanticsearch.org/)
119 | - [GPT3 Semantic Search](https://gpt3demo.com/category/semantic-search)
120 | - [searchy](https://github.com/lubianat/searchy)
121 | ## Conjuntos de Datos
122 | - [Conjunto de Datos de Textos de Similaridad Semántica](https://github.com/brmson/dataset-sts)
123 |
124 | ## Hitos
125 |
126 | Mira el [proyecto](https://github.com/Agrover112/awesome-semantic-search/projects/1) para ver la lista de tareas y contribuir a cualquiera de los issues abiertos.
127 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-minimal
--------------------------------------------------------------------------------
/logo.svg:
--------------------------------------------------------------------------------
1 |
75 |
--------------------------------------------------------------------------------