├── POLICY.md
├── LICENSE
└── README.md

/POLICY.md:
--------------------------------------------------------------------------------
# Submission policy

In the spirit of promoting open source software, please link to your GitHub repository first. Submissions that link to non-open-source sites (e.g., your company website) will be deprioritized and categorized under [cloud service](https://github.com/currentsapi/awesome-vector-search#cloud-service).

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 Pheme Pte Ltd

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Awesome Vector Search Engine

> A curated list of awesome vector search frameworks/engines, libraries, cloud services, and research papers on vector similarity search

### Standalone Service
- [Apache Cassandra 5.0 – Vector search (cep-30), Strict Serialisable ACID (cep-15), horizontally scaling database](https://cassandra.apache.org)
- [Qdrant - Vector Similarity Search Engine with extended filtering support](https://github.com/qdrant/qdrant)
- [Vald - A Highly Scalable Distributed Vector Search Engine](https://github.com/vdaas/vald)
- [Milvus - A cloud-native vector database with high-performance and high scalability.](https://github.com/milvus-io/milvus)
- [Weaviate - A cloud-native, real-time vector search engine](https://github.com/semi-technologies/weaviate)
- [OpenDistro Elasticsearch KNN - A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro for Elasticsearch](https://github.com/opendistro-for-elasticsearch/k-NN)
- [Elastiknn - Elasticsearch plugin for nearest neighbor search](https://github.com/alexklibisz/elastiknn)
- [Epsilla - A High Performance Vector Database Management System, Hippocampus For AI](https://github.com/epsilla-cloud/vectordb)
- [Vearch - A scalable distributed system for efficient similarity search of deep learning vectors](https://github.com/vearch/vearch)
- [pgANN - Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database](https://github.com/netrasys/pgANN)
- [Jina - Jina allows you to build deep learning-powered search-as-a-service.](https://github.com/jina-ai/jina)
- [Infinity - The AI-native database built for LLM applications, providing incredibly fast vector and full-text search](https://github.com/infiniflow/infinity)
- [Aquila DB - Distribution focused k-NN search algorithm](https://github.com/Aquila-Network/AquilaDB)
- [Redis HNSW - A Redis module for similarity search based on HNSW](https://github.com/zhao-lang/redis_hnsw)
- [Solr - Apache Solr](https://github.com/apache/solr) - [has a Dense Vector Search feature as of Solr 9.0](https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html)
- [Marqo - A semantic search engine which supports tensor search (sequence of vectors)](https://github.com/marqo-ai/marqo)
- [txtai - Build semantic search applications and workflows](https://github.com/neuml/txtai)
- [Semantra - A multipurpose tool for semantically searching documents.](https://github.com/freedmand/semantra)
- [SuperDuperDB - Bring AI to your favorite database](https://github.com/SuperDuperDB/superduperdb)
- [TensorDB - High Performance Vector Database Supporting Heterogeneous Computing](https://www.actionsky.com/tensorDB)
- [JVector - A pure Java, zero dependency, embedded vector search engine, used by DataStax Astra DB and Apache Cassandra.](https://github.com/jbellis/jvector/)
- [VQLite - Simple and Lightweight Vector Search Engine](https://github.com/VQLite/VQLite)
- [Vexvault - 100% browser based, open source, scalable, simple, zero-cost vector search](https://github.com/Xyntopia/vexvault)
- [Vespa.ai - Text search engine and ... fast approximate vector search (ANN)](https://github.com/vespa-engine)
- [Vespa's large-scale ANN search using HNSW-IF indexes is described here](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/)

### Library
- [LangStream - LangStream is an open-source project that combines the best of event-based architectures with the latest Gen AI technologies.](https://langstream.ai)
- [CassIO - CassIO is the ultimate solution for seamlessly integrating Apache Cassandra® with generative artificial intelligence and other machine learning workloads](https://cassio.org)
- [JVector - A pure Java, zero dependency, embedded vector search engine used by some of the advanced distributed databases such as DataStax Astra DB & Apache Cassandra™](https://github.com/jbellis/jvector)
- [Faiss - A library for efficient similarity search and clustering of dense vectors](https://github.com/facebookresearch/faiss)
- [Distributed Faiss - Work with FAISS indexes which don't fit into a single server's memory](https://github.com/facebookresearch/distributed-faiss)
- [Autofaiss - Automatically create Faiss knn indices](https://github.com/criteo/autofaiss)
- [ScaNN - A library for efficient vector similarity search at scale.](https://github.com/google-research/google-research/tree/master/scann)
- [NMSLIB - Non-Metric Space Library, an efficient similarity search library for generic non-metric spaces](https://github.com/nmslib/nmslib)
- [Annoy - C++ library with Python bindings to search for points](https://github.com/spotify/annoy)
- [FLANN - Library written in C++ with bindings for the following languages: C, MATLAB, Python, and Ruby](http://www.cs.ubc.ca/research/flann/)
- [LLM App - Open-source Python library for real-time data KNN (K-Nearest Neighbors) indexing](https://github.com/pathwaycom/llm-app)
- [MRPT - Fast nearest neighbor search with random projection](https://github.com/teemupitkanen/mrpt)
- [RPForest - Python library for approximate nearest neighbours search](https://github.com/lyst/rpforest)
- [pgvector - Open-source vector similarity search extension for Postgres](https://github.com/pgvector/pgvector)
- [PASE - Ultra-High-Dimensional approximate nearest neighbor search extension for Postgres](https://github.com/alipay/PASE)
- [Pyserini - Toolkit for reproducible information retrieval research with sparse and dense representations](https://github.com/castorini/pyserini)
- [NGT - Provides commands and a library for performing high-speed approximate nearest neighbor searches](https://github.com/yahoojapan/NGT)
- [NearPy - Approximate search using different locality-sensitive hashing methods](http://pixelogik.github.io/NearPy/)
- [TOROS N2 - Lightweight approximate Nearest Neighbor library](https://github.com/kakao/n2)
- [PUFFINN - Parameterless and Universal Fast FInding of Nearest Neighbors](https://github.com/puffinn/puffinn)
- [SPTAG - A distributed approximate nearest neighborhood search (ANN) library](https://github.com/microsoft/SPTAG)
- [PyNNDescent - A Python nearest neighbor descent for approximate k nearest neighbors](https://github.com/lmcinnes/pynndescent)
- [TarsosLSH - A Java library implementing practical nearest neighbour search algorithm for multidimensional vectors](https://github.com/JorenSix/TarsosLSH)
- [TorchPQ - Efficient implementations of Product Quantization and its variants using PyTorch and CUDA](https://github.com/DeMoriarty/TorchPQ)
- [Granne - Graph-based retrieval of approximate nearest neighbors written in Rust](https://github.com/granne/granne)
- [Embeddinghub - A database built for machine learning embeddings](https://github.com/featureform/embeddinghub)
- [Hora - Efficient approximate nearest neighbor search algorithm collections library written in Rust](https://github.com/hora-search/hora)
- [Voy - A WASM vector similarity search engine written in Rust](https://github.com/tantaraio/voy)
- [Chroma - The open-source embedding database for building LLM apps in Python or JavaScript with memory](https://github.com/chroma-core/chroma)
- [USearch - Smaller & Faster Vector Search Engine for C++, Python, JavaScript, Rust, Java, GoLang, Wolfram](https://github.com/unum-cloud/usearch)
- [Golang vector stores collection - Chroma, PGVector interfaces](https://github.com/urjitbhatia/vectorstores)
- [Scalable Vector Search (SVS) - A performance library for vector similarity search](https://github.com/IntelLabs/ScalableVectorSearch)

### Cloud Service

- [Epsilla Cloud - The fully managed serverless vector database with 10X faster, cheaper and better.](https://cloud.epsilla.com)
- [DataStax Astra Vector - Multi-cloud, serverless vector DBaaS](https://www.datastax.com/products/vector-search)
- [Relevance AI - Vector Platform From Experimentation To Deployment](https://relevance.ai/vectors/)
- [Pinecone - Managed vector search with filtering, live index updates, horizontal scaling, and a lot more](https://www.pinecone.io)
- [MyScale - A managed vector database based on ClickHouse](https://myscale.com)
- [Redis Cloud - Managed vector database in Redis](https://redis.com/cloud)
- [Zilliz Cloud - Cloud-native service for Milvus](https://zilliz.com/cloud)

### Research Papers

Methods for implementing approximate vector search algorithms more efficiently.

- [SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search - NeurIPS 2021](https://proceedings.neurips.cc/paper/2021/hash/299dc35e747eb77177d9cea10a802da2-Abstract.html)
- [Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors - ECCV 2018](https://openaccess.thecvf.com/content_ECCV_2018/html/Dmitry_Baranchuk_Revisiting_the_Inverted_ECCV_2018_paper.html)
- [Accelerating Large-Scale Inference with Anisotropic Vector Quantization](https://arxiv.org/abs/1908.10396)
- [Billion-scale similarity search with GPUs](https://arxiv.org/abs/1702.08734)
- [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320)
- [Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data](https://arxiv.org/abs/1810.07355)
- [On Approximately Searching for Similar Word Embeddings - ACL 2016](https://www.aclweb.org/anthology/P16-1214.pdf)

[![CC0](https://i.creativecommons.org/p/zero/1.0/88x31.png)](https://creativecommons.org/publicdomain/zero/1.0/)

--------------------------------------------------------------------------------