237 |
238 |
239 |
244 |
245 |
246 |
247 |
248 |
249 |
252 |
253 |
254 |
255 |
256 |
257 |
258 |
--------------------------------------------------------------------------------
/docsrc/FAQ.rst:
--------------------------------------------------------------------------------
1 |
2 | Frequently Asked Questions
3 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4 |
5 | Frequently Asked Questions
6 |
7 | I got a JSONDecodeError. How do I resolve this?
8 | #################################################
9 |
10 | A JSONDecodeError can be diagnosed in the following steps:
11 |
12 | * Check that each field value has the type stated in the examples and the type hints.
13 | * If the inputs are correct, ask for help on the Discord chat (the link is in the README).
14 |
15 | Does the engine run on a nearest neighbor implementation?
16 | ##################################################################
17 |
18 | The engine runs an exact nearest neighbor implementation by default. When the number of documents
19 | exceeds 100k, the engine switches to approximate nearest neighbor (ANN) search instead.
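
For intuition, exact nearest neighbor search compares the query against every stored vector. The sketch below is illustrative only: it assumes cosine similarity as the metric, and the function names are hypothetical, not part of the Vector AI engine.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_nearest_neighbors(query, vectors, k=3):
    """Score every stored vector against the query and return the ids of the k best."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy data: three 2-dimensional vectors keyed by document id.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
top = exact_nearest_neighbors([1.0, 0.0], vectors, k=2)
```

Because every vector is scored, the cost grows linearly with the number of documents, which is the usual reason for moving to an approximate method on large collections.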
20 |
21 | When I insert a document with the same ID what happens to the document?
22 | ###################################################################################
23 |
24 | The document is overwritten. To edit the document instead (i.e. change a field or add a new field),
25 | use the `edit_document` function.
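
The difference between overwriting and editing can be pictured with a toy in-memory collection. This is a sketch of the semantics described above, not the client API: `toy_insert` and `toy_edit` are hypothetical helpers and do not reflect the real function signatures.

```python
def toy_insert(collection, document):
    """Insert a document; an existing document with the same _id is replaced entirely."""
    collection[document["_id"]] = dict(document)

def toy_edit(collection, _id, edits):
    """Edit a document in place; fields not mentioned in `edits` are kept."""
    collection[_id].update(edits)

collection = {}
toy_insert(collection, {"_id": "1", "title": "first", "price": 10})
toy_insert(collection, {"_id": "1", "title": "second"})  # overwrites: "price" is gone
after_overwrite = dict(collection["1"])

toy_edit(collection, "1", {"price": 12})  # adds a field, keeps "title"
after_edit = dict(collection["1"])
```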
26 |
27 |
28 | How do I add a new field in a collection?
29 | #################################################
30 |
31 | Currently, the only way is to use the `edit_document` function.
32 |
33 | How do I get more search results?
34 | #################################################
35 |
36 | To get more search results, increase the `page_size` parameter. To view the next
37 | page of results, use the cursor.
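
As a rough picture of how a page size and a cursor work together, the sketch below pages through a plain list. It is an assumption-laden illustration: `fetch_page` is a hypothetical helper, and the cursor is modelled as an integer offset, which may differ from the value the real API returns.

```python
def fetch_page(results, page_size=2, cursor=0):
    """Return one page of results plus the cursor for the next page (None when done)."""
    page = results[cursor:cursor + page_size]
    next_cursor = cursor + page_size
    if next_cursor >= len(results):
        next_cursor = None
    return page, next_cursor

# Hypothetical result set standing in for a search response.
all_results = ["doc-1", "doc-2", "doc-3", "doc-4", "doc-5"]

pages = []
cursor = 0
while cursor is not None:
    page, cursor = fetch_page(all_results, page_size=2, cursor=cursor)
    pages.append(page)
```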
38 |
39 | Are there any limits to the API request calls?
40 | #################################################
41 |
42 | API requests time out after 400 seconds. There is no limit on the size of a request.
43 |
44 | Is Vector AI able to store images and videos?
45 | #################################################
46 |
47 | We do not currently support storing images and videos directly. As a workaround, store a link to the
48 | video or image in the document instead.
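
A document that links to an image rather than embedding it might look like the sketch below. The field names `image_url` and `image_vector_`, the URL and the vector values are all hypothetical; the round trip through `json` simply confirms the document stays JSON-serializable.

```python
import json

# Hypothetical document: the image lives elsewhere, only its link and vector are stored.
document = {
    "_id": "product-1",
    "image_url": "https://example.com/images/product-1.png",
    "image_vector_": [0.1, 0.2, 0.3],
}

payload = json.dumps(document)   # serialise as it would be sent in a request body
restored = json.loads(payload)   # decoding gives back the same document
```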
49 |
50 |
--------------------------------------------------------------------------------
/docsrc/Makefile:
--------------------------------------------------------------------------------
1 | # Minimal makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line, and also
5 | # from the environment for the first two.
6 | SPHINXOPTS ?=
7 | SPHINXBUILD ?= sphinx-build
8 | SOURCEDIR = .
9 | BUILDDIR = _build
10 |
11 | # Put it first so that "make" without argument is like "make help".
12 | help:
13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14 |
15 | .PHONY: help Makefile
16 |
17 | # Catch-all target: route all unknown targets to Sphinx using the new
18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19 | %: Makefile
20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
21 |
22 | docs:
23 | cp ../examples/*.ipynb .
24 | rm -rf _build
25 | python3 -m sphinx . _build -j3
26 | rm *.ipynb
27 |
28 | docs-migrate:
29 | rm -rf _build
30 | cp ../examples/*.ipynb .
31 | rm -rf ../docs
32 | python3 -m sphinx . ../docs -j3
33 | rm *.ipynb
34 | touch ../docs/.nojekyll
35 |
--------------------------------------------------------------------------------
/docsrc/README.md:
--------------------------------------------------------------------------------
1 | ## Documentation
2 |
3 | To build the documentation on Linux/Unix systems (on Windows, install WSL first), run:
4 |
5 | ```
6 | make docs-migrate
7 | ```
8 |
9 | or
10 |
11 | ```
12 | cp -r ../examples/*.ipynb .
13 | python3 -m sphinx . _build -j3
14 | # You can alter j above to the number of processes you want running in parallel. Afterwards, you can remove all notebooks from directory using:
15 | rm -f *.ipynb
16 | ```
17 |
18 | or if you only want to make them and store them in the docsrc subdirectory, run:
19 |
20 | ```
21 | make docs
22 | ```
23 |
24 | To generate the documentation, the notebooks are first copied into this folder. nbsphinx then builds the documentation into the ../docs/ folder, which hosts all the HTML files.
25 |
--------------------------------------------------------------------------------
/docsrc/analytics.rst:
--------------------------------------------------------------------------------
1 | Visualisations
2 | ^^^^^^^^^^^^^^^^^^
3 |
4 | Visualisations
5 | =======================================================
6 | Visualisations
7 |
8 | .. automodule:: vectorai.analytics.viz
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/array_dict_vectorizer.rst:
--------------------------------------------------------------------------------
1 | Array & Dictionary
2 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 |
4 | Array & Dictionary
5 | =======================================================
6 | Array & Dictionary
7 |
8 | .. automodule:: vectorai.api.array_dict_vectorizer
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/audio.rst:
--------------------------------------------------------------------------------
1 | Audios
2 | ^^^^^^^^^
3 |
4 | Audios
5 | =======================================================
6 | Audios
7 |
8 | .. automodule:: vectorai.api.audio
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/client.rst:
--------------------------------------------------------------------------------
1 | Client
2 | ^^^^^^^
3 |
4 | Client
5 | =======================================================
6 |
7 | Documentation for Vector AI client goes here.
8 |
9 | .. automodule:: vectorai.client
10 | :members:
11 |
12 |
13 |
--------------------------------------------------------------------------------
/docsrc/cluster.rst:
--------------------------------------------------------------------------------
1 | Cluster
2 | ^^^^^^^
3 |
4 | Cluster
5 | =======================================================
6 |
7 | Documentation for vector clustering goes here.
8 |
9 |
10 | .. automodule:: vectorai.api.cluster
11 | :members:
12 |
--------------------------------------------------------------------------------
/docsrc/conf.py:
--------------------------------------------------------------------------------
1 | # Configuration file for the Sphinx documentation builder.
2 | #
3 | # This file only contains a selection of the most common options. For a full
4 | # list see the documentation:
5 | # https://www.sphinx-doc.org/en/master/usage/configuration.html
6 |
7 | # -- Path setup --------------------------------------------------------------
8 |
9 | # If extensions (or modules to document with autodoc) are in another directory,
10 | # add these directories to sys.path here. If the directory is relative to the
11 | # documentation root, use os.path.abspath to make it absolute, like shown here.
12 | #
13 | # import os
14 | # import sys
15 | # sys.path.insert(0, os.path.abspath('.'))
16 |
17 |
18 | # -- Project information -----------------------------------------------------
19 |
20 | project = 'vectorai'
21 | copyright = '2020, OnSearch Pty Ltd'
22 | author = 'OnSearch Pty Ltd'
23 |
24 | # The full version, including alpha/beta/rc tags
25 | release = '0.1.0'
26 |
27 |
28 | # -- General configuration ---------------------------------------------------
29 |
30 | # Add any Sphinx extension module names here, as strings. They can be
31 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
32 | # ones.
33 | extensions = [
34 | 'nbsphinx',
35 | "sphinx.ext.autodoc",
36 | "sphinx.ext.coverage",
37 | "sphinx.ext.napoleon",
38 | "sphinx_rtd_theme"
39 | ]
40 |
41 | # Add any paths that contain templates here, relative to this directory.
42 | templates_path = ['_templates']
43 |
44 | # List of patterns, relative to source directory, that match files and
45 | # directories to ignore when looking for source files.
46 | # This pattern also affects html_static_path and html_extra_path.
47 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
48 |
49 |
50 | # -- Options for HTML output -------------------------------------------------
51 |
52 | # The theme to use for HTML and HTML Help pages. See the documentation for
53 | # a list of builtin themes.
54 | #
55 | html_theme = "sphinx_rtd_theme"
56 | nbsphinx_execute = 'never'
57 |
58 | # Add any paths that contain custom static files (such as style sheets) here,
59 | # relative to this directory. They are copied after the builtin static files,
60 | # so a file named "default.css" will overwrite the builtin "default.css".
61 | html_static_path = ['_static']
62 | autodoc_member_order = 'bysource'
63 |
--------------------------------------------------------------------------------
/docsrc/dimensionality_reduction.rst:
--------------------------------------------------------------------------------
1 | Dimensionality Reduction
2 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
3 |
4 | Dimensionality Reduction
5 | =======================================================
6 |
7 | .. automodule:: vectorai.api.dimensionality_reduction
8 | :members:
9 |
10 |
--------------------------------------------------------------------------------
/docsrc/image.rst:
--------------------------------------------------------------------------------
1 | Images
2 | ^^^^^^^^^
3 |
4 | Images
5 | =======================================================
6 | Images
7 |
8 | .. automodule:: vectorai.api.image
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/images/2d-cosine-similarity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/docsrc/images/2d-cosine-similarity.png
--------------------------------------------------------------------------------
/docsrc/images/dimensionality_reduced_vector_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/docsrc/images/dimensionality_reduced_vector_plot.png
--------------------------------------------------------------------------------
/docsrc/images/vectordb-1d-plot-example-readme.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/docsrc/images/vectordb-1d-plot-example-readme.png
--------------------------------------------------------------------------------
/docsrc/images/vectordb-plot-1d-cosine-similarity-comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/docsrc/images/vectordb-plot-1d-cosine-similarity-comparison.png
--------------------------------------------------------------------------------
/docsrc/index.rst:
--------------------------------------------------------------------------------
1 | .. vectorai documentation master file, created by
2 | sphinx-quickstart on Sat Sep 12 14:33:11 2020.
3 | You can adapt this file completely to your liking, but it should at least
4 | contain the root `toctree` directive.
5 |
6 | Welcome to Vector AI's documentation!
7 | =====================================
8 |
9 | .. image:: https://getvectorai.com/assets/logo-with-text.png
10 | :width: 600
11 | :alt: Vector AI
12 |
13 | Vector AI aims to store vectors alongside documents (text/audio/images/videos).
14 | It is designed to be a light-weight library to create/manipulate/search and analyse the
15 | underlying vectors to power machine learning applications such as semantic
16 | search, recommendations, etc.
17 |
18 | - Our REST API documentation can be found here: https://api.vctr.ai/documentation
19 | - Our discord can be found here: https://discord.gg/CbwUxyD
20 |
21 | Features:
22 |
23 | - **Multimedia Data Vectorisation**: Image2Vec, Audio2Vec, etc (Any data can be turned into vectors through machine learning)
24 | - **Vector Similarity Search**: Enable searching of vectors and rich multimedia with vector similarity search. The backbone of many popular A.I use cases like reverse image search, recommendations, personalisation, etc.
25 | - **Vector Operations**: Flexible search with out of the box operations on vectors. e.g. mean, median, sum, etc.
26 | - **Aggregation**: All the traditional aggregation you'd expect. e.g. group by mean, pivot tables, etc
27 | - **Clustering**: Interpret your vectors and data by allocating them to buckets and get statistics about these different buckets based on data you provide.
28 | - **Vector Analytics**: Understand the quality of your vectors with practical, out-of-the-box vector analytics.
29 |
30 | Why Vector AI compared to other Nearest Neighbor implementations?
31 | -------------------------------------------------------------------
32 |
33 | - **Production Ready**: Our API is fully managed and can scale to power
34 | hundreds of millions of searches a day. Even at millions of searches
35 | it is blazing fast through edge caching, GPUs and software
36 | optimisation, so you never have to worry about scaling your
37 | infrastructure as your use case scales.
38 | - **Richer understanding of your vectors and their properties**: Our
39 | library is designed not just to obtain nearest neighbors but to be
40 | used in production-ready search systems, allowing users to analyse,
41 | iterate, improve and productionise their vectors the moment they
42 | are added to the index.
43 | - **Simple to use. Quick to get started.**: One of our core design
44 | principles is that we focus a lot on how people can get started on
45 | using Vector AI as quickly as possible, while having a tonne of
46 | functionality and customisability options.
47 | - **Framework agnostic**: We are never going to force a specific
48 | framework on Vector AI. If you have a framework of choice, you can use
49 | it - as long as your documents are JSON-serializable!
50 | - **Store vector data with ease**: The document-orientated nature of
51 | Vector AI allows users to label, filter, search and understand their
52 | vectors as much as possible. We think that other libraries that
53 | simply provide a nearest-neighbor implementation do not have as rich
54 | functionality.
55 |
56 |
57 | How to install
58 | ###############
59 |
60 | To install vectorai, run the following
61 |
62 | .. code-block:: RST
63 |
64 | pip install vectorai
65 |
66 |
67 | To install from source, clone the repository and then run
68 |
69 | .. code-block:: RST
70 |
71 | cd vectorai
72 | pip install -e .
73 |
74 | Schema
75 | ########
76 |
77 | We have a very simple schema to follow to allow you to optimise functionality with vector search:
78 |
79 | .. list-table:: Schema Rules
80 | :widths: 25 75
81 | :header-rows: 1
82 |
83 | * - Field
84 | - Purpose
85 |
86 | * - _id
87 | - ID of the document. Each document's ID must be unique.
88 |
89 | * - _vector_
90 | - Suffix required on the names of fields that hold vectors, so they can be used for vector search.
91 |
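
Before inserting a batch, it can help to check the `_id` rule locally. The helper below is a hypothetical sketch, not part of the vectorai package: it reports which documents lack an `_id` and which IDs appear more than once.

```python
from collections import Counter

def check_ids(documents):
    """Return (positions of documents without _id, sorted list of duplicated ids)."""
    missing = [i for i, doc in enumerate(documents) if "_id" not in doc]
    counts = Counter(doc["_id"] for doc in documents if "_id" in doc)
    duplicates = sorted(_id for _id, n in counts.items() if n > 1)
    return missing, duplicates

# Hypothetical batch: one repeated id and one document without an id.
batch = [
    {"_id": "1", "color_vector_": [0.1, 0.2]},
    {"_id": "2", "color_vector_": [0.3, 0.4]},
    {"_id": "1", "color_vector_": [0.5, 0.6]},
    {"color_vector_": [0.7, 0.8]},
]
missing, duplicates = check_ids(batch)
```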
92 | .. toctree::
93 | :maxdepth: 2
94 | :caption: Contents
95 |
96 | intro
97 | quickstart
98 |
99 | .. toctree::
100 | :caption: Guides
101 |
102 | industry_ecommerce
103 | vector_analytics_example
104 | custom_encodings_example
105 |
106 | .. toctree::
107 | :caption: Case Studies
108 |
109 | industry_nba_players
110 |
111 | .. toctree::
112 | :caption: Frequently Asked Questions
113 |
114 | FAQ
115 |
116 |
117 | .. toctree::
118 | :maxdepth: 2
119 | :caption: Documentation
120 |
121 | client
122 | read
123 | write
124 | cluster
125 | array_dict_vectorizer
126 | dimensionality_reduction
127 | vector_search
128 | image
129 | text
130 | audio
131 | analytics
132 |
133 |
134 | Indices and tables
135 | ==================
136 |
137 | * :ref:`genindex`
138 | * :ref:`modindex`
139 | * :ref:`search`
140 |
--------------------------------------------------------------------------------
/docsrc/intro.rst:
--------------------------------------------------------------------------------
1 |
2 | Vector AI - Essentials
3 | ^^^^^^^^^^^^^^^^^^^^^^
4 |
5 | Vector AI is built to store vectors alongside documents (text/audio/images/videos).
6 | It is designed to be a light-weight library to create, manipulate, search and analyse vectors to power machine
7 | learning applications such as semantic search, recommendations, etc.
8 |
9 | Important Terminologies
10 | =======================
11 | - **Vectors** (aka. embeddings, 1D arrays)
12 |
13 | - **Models/Encoders** (aka. Embedders) Turns data into vectors e.g. Word2Vec turns words into vectors
14 |
15 | - **Vector Similarity Search** (aka. Nearest Neighbor Search, Distance Search)
16 |
17 | - **Collection** (aka. Index, Table) ~ a collection is made up of multiple documents
18 |
19 | - **Documents** (aka. Json, Item, Dictionary, Row) ~ a document can contain vector + other important information
20 |
21 |
22 | .. code-block:: RST
23 | e.g.
24 | {
25 | "_id" : "1",
26 | "description_vector_": [...],
27 | "description" : "This is a great idea"
28 | }
29 |
30 | Two naming conventions matter: fields holding predefined vectors must end with the suffix "_vector_" (e.g. "description_vector_"), and the ID used for quick key-value lookup must be named "_id".
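
The suffix convention makes vector fields easy to pick out programmatically. The following is a minimal sketch with a hypothetical helper name, assuming only that vector fields end in `_vector_` as described above.

```python
VECTOR_SUFFIX = "_vector_"

def list_vector_fields(document):
    """Return the names of top-level fields that follow the vector naming convention."""
    return [name for name in document if name.endswith(VECTOR_SUFFIX)]

# Hypothetical document with one vector field.
document = {
    "_id": "1",
    "description_vector_": [0.12, 0.53, 0.98],
    "description": "This is a great idea",
}
vector_fields = list_vector_fields(document)
```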
31 |
32 | Documents in Vector AI
33 | ========================
34 |
35 | Documents (dictionaries) consist of fields (dictionary keys) and values.
36 |
37 | 1. Vector AI is document orientated (dictionaries/JSON), which means you can have nested fields. For example, given a document such as:
38 |
39 | .. code-block:: RST
40 |
41 | document_example = {
42 | "car": {
43 | "wheels":
44 | {
45 | "number": 4
46 | }
47 | }
48 | }
49 |
50 | then running vi_client.get_field("car.wheels.number", document_example) will return 4.
51 |
52 | 2. When uploading documents into Vector AI, the schema is inferred from the first document inserted.
53 |
54 | You can navigate the fields within documents using the functions below. Nested fields are addressed by
55 | separating the keys with dots (.).
56 |
57 | .. code-block:: python
58 |
59 | vi_client.set_field(field, doc, value)
60 | vi_client.get_field(field, doc)
61 | vi_client.set_field_across_documents(field, values, docs)
62 | vi_client.get_field_across_documents(field, docs)
63 |
64 | Models With Vector AI
65 | ========================
66 |
67 | Vector AI has deployed models that we've handpicked and tuned to work nicely out of the box on most problems.
68 | These models, however, may change over time. When they do, we make sure that
69 | previous models remain deployed and usable.
70 | To prototype something quickly, we highly recommend using these deployed models.
71 |
72 |
73 | **If you are working on a problem that requires highly customised or finetuned models, reach out to us
74 | for enterprise services where we can fine tune these models for your use case or feel free to build your own.**
75 |
76 | Currently, our deployed models are:
77 | * ViText2Vec - our text to vector model
78 | * ViImage2Vec - our image to vector model
79 | * ViAudio2Vec - our audio to vector model
80 | * dimensionality_reduction_job - perform dimensionality reduction on your vectors
81 | * clustering_job - perform clustering on your vectors
82 | * advanced_cluster_job - perform clustering with advanced options on your vectors
83 |
--------------------------------------------------------------------------------
/docsrc/make.bat:
--------------------------------------------------------------------------------
1 | @ECHO OFF
2 |
3 | pushd %~dp0
4 |
5 | REM Command file for Sphinx documentation
6 |
7 | if "%SPHINXBUILD%" == "" (
8 | set SPHINXBUILD=sphinx-build
9 | )
10 | set SOURCEDIR=.
11 | set BUILDDIR=_build
12 |
13 | if "%1" == "" goto help
14 |
15 | xcopy ..\examples\*.ipynb .
16 |
17 | %SPHINXBUILD% >NUL 2>NUL
18 | if errorlevel 9009 (
19 | echo.
20 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
21 | echo.installed, then set the SPHINXBUILD environment variable to point
22 | echo.to the full path of the 'sphinx-build' executable. Alternatively you
23 | echo.may add the Sphinx directory to PATH.
24 | echo.
25 | echo.If you don't have Sphinx installed, grab it from
26 | echo.http://sphinx-doc.org/
27 | exit /b 1
28 | )
29 |
30 | %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
31 |
32 | del /q *.ipynb
33 |
34 | goto end
35 |
36 | :help
37 | %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
38 |
39 | :end
40 | popd
41 |
--------------------------------------------------------------------------------
/docsrc/read.rst:
--------------------------------------------------------------------------------
1 | Read
2 | ^^^^^^
3 |
4 | Read
5 | =======================================================
6 | Read
7 |
8 | .. automodule:: vectorai.api.read
9 | :members:
10 |
11 | .. automodule:: vectorai.read
12 | :members:
13 |
--------------------------------------------------------------------------------
/docsrc/text.rst:
--------------------------------------------------------------------------------
1 | Texts
2 | ^^^^^^^^
3 |
4 | Texts
5 | =======================================================
6 | Texts
7 |
8 | .. automodule:: vectorai.api.text
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/vector_search.rst:
--------------------------------------------------------------------------------
1 | Search
2 | ^^^^^^
3 |
4 | Search
5 | =======================================================
6 | Search
7 |
8 | .. automodule:: vectorai.api.search
9 | :members:
10 |
--------------------------------------------------------------------------------
/docsrc/write.rst:
--------------------------------------------------------------------------------
1 | Write
2 | ^^^^^
3 |
4 | Write
5 | =======================================================
6 | Write
7 |
8 | This is documentation for the Write API for Vector AI.
9 |
10 | .. automodule:: vectorai.write
11 | :members:
12 | :show-inheritance:
13 |
14 | .. automodule:: vectorai.api.write
15 | :members:
16 | :show-inheritance:
17 |
--------------------------------------------------------------------------------
/examples/data/Corona_NLP_train.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/data/Corona_NLP_train.csv
--------------------------------------------------------------------------------
/examples/data/nba_per_36.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/data/nba_per_36.xlsx
--------------------------------------------------------------------------------
/examples/data/nba_per_game.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/data/nba_per_game.xlsx
--------------------------------------------------------------------------------
/examples/images/2d-cosine-similarity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/images/2d-cosine-similarity.png
--------------------------------------------------------------------------------
/examples/images/dimensionality_reduced_vector_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/images/dimensionality_reduced_vector_plot.png
--------------------------------------------------------------------------------
/examples/images/vectordb-1d-plot-example-readme.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/images/vectordb-1d-plot-example-readme.png
--------------------------------------------------------------------------------
/examples/images/vectordb-plot-1d-cosine-similarity-comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/examples/images/vectordb-plot-1d-cosine-similarity-comparison.png
--------------------------------------------------------------------------------
/pytest.ini:
--------------------------------------------------------------------------------
1 | [pytest]
2 | markers =
3 | use_client: marks tests as those that use the client
4 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests
2 | numpy
3 | pandas
4 | appdirs>=1.4.4
5 | plotly>=4.0.0
6 | tqdm>=4.27.0
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | from setuptools import setup, find_packages
5 | import os
6 |
7 | core_req = ["requests", "numpy", "pandas", "appdirs>=1.4.4", "tqdm>=4.27.0", "plotly>=4.0.0"]
8 | extras_req = {
9 | "dev" : ["twine", "black", "pytest", "pytest-cov", "vectorai", "openapi-to-sdk"],
10 | "test" : ["pytest", "pytest-cov", "pytest-rerunfailures"],
11 | "docs" : ["sphinx-rtd-theme>=0.5.0", "nbsphinx>=0.7.1"]
12 | }
13 | extras_req["all"] = [p for r in extras_req.values() for p in r]
14 |
15 | version = '0.2.5'
16 | if 'IS_VECTORAI_NIGHTLY' in os.environ.keys():
17 | from datetime import datetime
18 | name = 'vectorai-nightly'
19 | version = version + '.' + str(datetime.today().date()).replace('-', '.')
20 | else:
21 | name = 'vectorai'
22 |
23 | setup(
24 | name=name,
25 | version=version,
26 | author="OnSearch Pty Ltd",
27 | author_email="dev@vctr.ai",
28 | description="A Python framework for building vector based applications. Encode, query and analyse data using vectors.",
29 | long_description=open("README.md", "r", encoding="utf-8").read(),
30 | long_description_content_type="text/markdown",
31 | keywords="vector, embeddings, machinelearning, ai, artificialintelligence, nlp, tensorflow, pytorch, nearestneighbors, search, analytics, clustering, dimensionalityreduction",
32 | url="https://github.com/vector-ai/vectorai",
33 | license="Apache",
34 | packages=find_packages(exclude=["tests*"]),
35 | python_requires=">=3",
36 | install_requires=core_req,
37 | extras_require=extras_req,
38 | classifiers=[
39 | "Development Status :: 5 - Production/Stable",
40 | "Intended Audience :: Developers",
41 | "Intended Audience :: Education",
42 | "Intended Audience :: Science/Research",
43 | "Intended Audience :: Information Technology",
44 | "Intended Audience :: Financial and Insurance Industry",
45 | "Intended Audience :: Healthcare Industry",
46 | "Intended Audience :: Manufacturing",
47 | "License :: OSI Approved :: Apache Software License",
48 | "Operating System :: OS Independent",
49 | "Programming Language :: Python",
50 | "Programming Language :: Python :: 3",
51 | "Programming Language :: Python :: 3.4",
52 | "Programming Language :: Python :: 3.5",
53 | "Programming Language :: Python :: 3.6",
54 | "Programming Language :: Python :: 3.7",
55 | "Programming Language :: Python :: Implementation :: PyPy",
56 | "Topic :: Database",
57 | "Topic :: Internet :: WWW/HTTP :: Indexing/Search",
58 | "Topic :: Multimedia :: Sound/Audio :: Conversion",
59 | "Topic :: Multimedia :: Video :: Conversion",
60 | "Topic :: Scientific/Engineering :: Artificial Intelligence",
61 | "Topic :: Scientific/Engineering :: Image Recognition",
62 | "Topic :: Scientific/Engineering :: Information Analysis",
63 | "Topic :: Scientific/Engineering :: Visualization",
64 | "Topic :: Software Development :: Libraries :: Application Frameworks",
65 | ],
66 | )
67 |
--------------------------------------------------------------------------------
/tests/README.md:
--------------------------------------------------------------------------------
1 |
2 | ## Writing Tests for the Package
3 |
4 | ```
5 | pytest --cov=vectorai tests/*
6 | ```
7 |
8 | If you would like to contribute code or tests, we suggest adhering to the pylint style guide and adding type hints as much as possible.
9 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
1 | """Testing suite for the library.
2 | """
3 |
--------------------------------------------------------------------------------
/tests/analytics/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/tests/analytics/__init__.py
--------------------------------------------------------------------------------
/tests/analytics/api/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vector-ai/vectorai/7b1a0eb2bb06a82d85ac3633eea984604baf2ea6/tests/analytics/api/__init__.py
--------------------------------------------------------------------------------
/tests/analytics/api/test_comparator.py:
--------------------------------------------------------------------------------
1 | """Smoke tests for the API - Ensure that these do not error out!
2 | """
3 | import pytest
4 | import time
5 | from ...utils import TempClientWithDocs
6 |
7 | @pytest.mark.use_client
8 | def test_smoke_compare_ranks(test_analytics_client, document_vector_fields):
9 | with TempClientWithDocs(test_analytics_client) as client:
10 | time.sleep(2)
11 | results = test_analytics_client.random_compare_search_by_id(
12 | collection_name=client.collection_name,
13 | vector_fields=[document_vector_fields[0], document_vector_fields[1]]
14 | )
15 |
16 | # @pytest.mark.use_client
17 | # def test_smoke_compare_ranks_vector(test_analytics_client, document_vector_fields):
18 | # with TempClientWithDocs(test_analytics_client) as client:
19 | # time.sleep(2)
20 | # results = test_analytics_client.random_compare_search(
21 | # collection_name=client.collection_name,
22 | # vector_fields=document_vector_fields
23 | # )
24 |
--------------------------------------------------------------------------------
/tests/analytics/scorer/test_base_scorer.py:
--------------------------------------------------------------------------------
1 | # """
2 | # Test Base Scorer In Analytics
3 | # """
4 | # from vectorai.analytics.scorer
5 |
6 | # class test_scorer():
7 |
8 |
--------------------------------------------------------------------------------
/tests/analytics/test_relational_documents.py:
--------------------------------------------------------------------------------
1 | """
2 | Tests for relational documents.
3 | """
4 | from vectorai.analytics.relational_documents import *
5 | from vectorai.utils import UtilsMixin
6 | import pytest
7 |
8 | def test_vector_operation():
9 | assert (vector_operation([1, 2, 3], [3, 2, 1]) == [2, 2, 2])
10 |
11 | @pytest.mark.parametrize("test_operation, expected_output", [("minus", [0, 0, 0])])
12 | def test_relational_document_creation(test_operation, expected_output):
13 | mixin_utils = UtilsMixin()
14 | doc_1 = {'_vector_': [1, 2, 3], 'country': 'Australia'}
15 | doc_2 = {'_vector_': [1, 2, 3], 'country': 'New Zealand'}
16 | relational_doc = create_relational_document(doc_1, doc_2, vector_fields=['_vector_'],
17 | label_field='country', operation=test_operation)
18 | assert relational_doc['_vector_'] == expected_output
19 |
--------------------------------------------------------------------------------
/tests/analytics/test_score.py:
--------------------------------------------------------------------------------
1 | """
2 | Testing module for analytics scoring.
3 | """
4 | def test_cosine_similarity(test_client):
5 | """
6 | Testing cosine similarity function works.
7 | """
8 | test_client.calculate_cosine_similarity(test_client.generate_vector(10),
9 | test_client.generate_vector(10))
10 | assert True
11 |
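The smoke test above only checks that `calculate_cosine_similarity` runs without raising. For reference, here is a minimal sketch of what a cosine similarity computation usually looks like. The function `cosine_similarity` below is a hypothetical stand-in written for illustration; it is not the vectorai implementation, and the zero-vector handling is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


orthogonal = cosine_similarity([1, 0], [0, 1])
parallel = cosine_similarity([1, 2, 3], [2, 4, 6])
opposite = cosine_similarity([1, 0], [-1, 0])
```

A stronger version of the smoke test could assert that the returned score lies in the range [-1, 1] instead of ending with `assert True`.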
--------------------------------------------------------------------------------
/tests/analytics/test_tables.py:
--------------------------------------------------------------------------------
1 | import time
2 | import pytest
3 |
4 | class TestCompare:
5 | @pytest.mark.use_client
6 | def test_setup(self, test_client, test_collection_name):
7 | """
8 | Test Setup.
9 | """
10 | num_of_docs = 50
11 | if test_collection_name in test_client.list_collections():
12 | test_client.delete_collection(test_collection_name)
13 | documents = test_client.create_sample_documents(num_of_docs)
14 | test_client.set_field_across_documents('color_vector_',
15 | [test_client.generate_vector(50, num_of_constant_values=49) for x in range(num_of_docs)], documents)
16 | test_client.set_field_across_documents('color_2_vector_',
17 | [test_client.generate_vector(50, num_of_constant_values=49) for x in range(num_of_docs)], documents)
18 | results = test_client.insert_documents(test_collection_name, documents)
19 | time.sleep(10)
20 | assert results['inserted_successfully'] == num_of_docs
21 |
22 | @pytest.mark.use_client
23 | @pytest.mark.parametrize("test_vector_fields", ["color_vector_", "color_2_vector_"])
24 | def test_compare_tables_simple(self, test_client, test_collection_name, test_vector_fields):
25 | """
26 | Test compare a simple table.
27 | """
28 | time.sleep(10)
29 | id_document = test_client.random_documents(test_collection_name, 1)['documents'][0]
30 | print(id_document)
31 | df = test_client.compare_vector_search_results(test_collection_name,
32 | vector_fields=[test_vector_fields], id_document=id_document, label='color')
33 | assert df.shape[0] > 0
34 |
35 | @pytest.mark.use_client
36 | def test_compare_tables_2_columns(self, test_client, test_collection_name):
37 | """
38 | Test comparing a table across 2 vector fields.
39 | """
40 | id_document = test_client.random_documents(test_collection_name, 1)['documents'][0]
41 | df = test_client.compare_vector_search_results(test_collection_name,
42 | vector_fields=["color_vector_", "color_2_vector_"], id_document=id_document, label='color')
43 | assert df.shape[0] > 0
44 | assert df.shape[1] == 2
45 | @pytest.mark.use_client
46 | def test_teardown(self, test_client, test_collection_name):
47 | """
48 | Teardown.
49 | """
50 | test_client.delete_collection(test_collection_name)
51 | time.sleep(5)
52 | assert test_collection_name not in test_client.list_collections()
53 |
--------------------------------------------------------------------------------
/tests/analytics/test_viz.py:
--------------------------------------------------------------------------------
1 | """
2 | Test visualisations
3 | """
4 | import plotly.graph_objects as go
5 |
6 | def test_radar_plot_across_documents(test_client):
7 | """
8 | Test radar plots across documents
9 | """
10 | docs = test_client.create_sample_documents(5)
11 | fig = test_client.plot_radar_across_documents(docs, anchor_documents=docs[0:2],
12 | vector_field='color_vector_', label_field='color')
13 | assert isinstance(fig, go.Figure)
14 |
15 | def test_radar_plot_across_vector_fields(test_client):
16 | """
17 | Test radar plots across vector fields.
18 | """
19 | docs = test_client.create_sample_documents(5)
20 | fig = test_client.plot_radar_across_vector_fields(docs, anchor_document=docs[0],
21 | vector_fields=['color_vector_', 'color_2_vector_'], label_field='country')
22 | assert isinstance(fig, go.Figure)
23 |
--------------------------------------------------------------------------------
/tests/conftest.py:
--------------------------------------------------------------------------------
1 | """
2 | Global testing variables.
3 | """
4 | import pytest
5 | import os
6 | from vectorai.client import ViClient
7 | from vectorai.analytics.client import ViAnalyticsClient
8 | from vectorai.models.deployed import ViText2Vec
9 | import random
10 | import string
11 |
12 | def pytest_addoption(parser):
13 | parser.addoption(
14 | "--use_client", action="store_true", default=False, help="run tests that need a live API client"
15 | )
16 |
17 |
18 | def pytest_configure(config):
19 | config.addinivalue_line("markers", "use_client: mark test as needing a live API client")
20 |
21 |
22 | def pytest_collection_modifyitems(config, items):
23 | if config.getoption("--use_client"):
24 | # --use_client given in cli: do not skip tests that need the API client
25 | return
26 | skip_slow = pytest.mark.skip(reason="need --use_client option to run")
27 | for item in items:
28 | if "use_client" in item.keywords:
29 | item.add_marker(skip_slow)
30 |
31 | def get_random_string(length):
32 | # Random string of lowercase ASCII letters
33 | letters = 'abcdefghijklmnopqrstuvwxyz'
34 | return ''.join(random.choice(letters) for i in range(length))
35 |
36 | @pytest.fixture
37 | def test_username():
38 | return os.environ['VI_USERNAME']
39 |
40 |
41 | @pytest.fixture
42 | def test_api_key():
43 | return os.environ['VI_API_KEY']
44 |
45 |
46 | @pytest.fixture
47 | def test_client(test_username, test_api_key):
48 | """Testing for the client login.
49 | """
50 | return ViClient(username=test_username, api_key=test_api_key,
51 | url="https://vectorai-development-api-vectorai-test-api.azurewebsites.net/")
52 |
53 | @pytest.fixture(scope='class')
54 | def test_collection_name():
55 | return "test_colour_col_" + str(get_random_string(3))
56 |
57 | # @pytest.fixture
58 | # def test_collection_client(test_username, test_api_key, test_collection_name):
59 | # """Testing for the client login.
60 | # """
61 | # client = ViCollectionClient(username=test_username, api_key=test_api_key, collection_name=test_collection_name)
62 | # return client
63 |
64 | @pytest.fixture
65 | def test_analytics_client(test_username, test_api_key):
66 | return ViClient(username=test_username, api_key=test_api_key)
67 |
68 | @pytest.fixture
69 | def test_vector_field():
70 | return "item_vector_"
71 |
72 | @pytest.fixture
73 | def document_vector_fields():
74 | return ['color_vector_', 'color_2_vector_']
75 |
76 | @pytest.fixture
77 | def test_id_field():
78 | return "_id"
79 |
80 | @pytest.fixture
81 | def sample_documents():
82 | sample_documents = [
83 | {
84 | "name": "Bob",
85 | "color": "Orange",
86 | "team": "los angeles lakers"
87 | },
88 | {
89 | "name": "William",
90 | "color": "Yellow",
91 | "team": "miami heat"
92 | },
93 | {
94 | "name": "James Patterson",
95 | "color": "Blue",
96 | "team": "Charlotte Bobcats"
97 | }
98 | ]
99 | return sample_documents
100 |
101 |
102 | @pytest.fixture
103 | def test_text_encoder():
104 | """
105 | Text Encoder
106 | """
107 | model = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
108 | return model
109 |
--------------------------------------------------------------------------------
/tests/test_client.py:
--------------------------------------------------------------------------------
1 | """Testing the client.
2 | """
3 |
4 | from vectorai import *
5 |
6 |
7 | def test_client_login_works(test_username, test_api_key):
8 | """Testing for the client login.
9 | """
10 | client = ViClient(username=test_username, api_key=test_api_key)
11 | assert True
12 |
--------------------------------------------------------------------------------
/tests/test_doc_utils.py:
--------------------------------------------------------------------------------
1 | """Testing for document utilities.
2 | """
3 | import pytest
4 | from vectorai.errors import MissingFieldError
5 |
6 | def test_set_field(test_client):
7 | sample = {}
8 | test_client.set_field("simple", doc=sample, value=[0, 2])
9 | assert test_client.get_field("simple", sample) == [0, 2]
10 |
11 | def test_set_field_nested(test_client):
12 | sample = {}
13 | test_client.set_field('simple.weird.strange', sample, value=3)
14 | assert test_client.get_field('simple.weird.strange', sample) == 3
15 | assert sample['simple']['weird']['strange'] == 3
16 |
17 | def test_get_field_chunk(test_client):
18 | sample = {
19 | 'kfc': [{'food': 'chicken'}, {'food': 'prawns'}]}
20 | assert test_client.get_field('kfc.0.food', sample) == 'chicken'
21 | assert test_client.get_field('kfc.1.food', sample) == 'prawns'
22 |
23 | def test_get_field_chunk_error(test_client):
24 | sample = {
25 | 'kfc': [{'food': 'chicken'}, {'food': 'prawns'}]}
26 | with pytest.raises(MissingFieldError):
27 | test_client.get_field('kfc.food', sample, missing_treatment='raise_error')
28 |
29 | def test_get_fields(test_client):
30 | doc = test_client.create_sample_documents(1)[0]
31 | assert len(test_client.get_fields(['size.cm', 'size.feet'], doc)) == 2
32 |
33 | def test_get_field_across_documents(test_client):
34 | docs = test_client.create_sample_documents(2)
35 | values = test_client.get_field_across_documents('color', docs)
36 | assert len(values) == 2
37 |
38 | def test_set_and_get_field_across_documents(test_client):
39 | docs = test_client.create_sample_documents(5)
40 | test_client.set_field_across_documents('size.inches', list(range(5)), docs)
41 | for i, doc in enumerate(docs):
42 | assert test_client.get_field('size.inches', doc) == i
43 |
44 | def test_is_field(test_client):
45 | """
46 | Test if it is a field
47 | """
48 | docs = test_client.create_sample_documents(10)
49 | assert test_client.is_field("size", docs[0])
50 | assert not test_client.is_field("hfueishfuie", docs[0])
51 | assert test_client.is_field("size.cm", docs[0])
52 | assert not test_client.is_field("size.bafehui", docs[0])
53 |
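The tests above exercise dot-separated field paths such as `simple.weird.strange` and `kfc.0.food`. As a rough illustration of that addressing scheme, the sketch below implements it on plain dictionaries. The helpers `set_nested` and `get_nested` are hypothetical names introduced here; they are not the vectorai `set_field` and `get_field` methods, and they raise the built-in `KeyError`, `IndexError`, or `ValueError` on a bad path instead of `MissingFieldError`.

```python
from typing import Any


def set_nested(doc: dict, path: str, value: Any) -> None:
    """Hypothetical helper: set a value at a dot-separated path, creating dicts as needed."""
    keys = path.split(".")
    current = doc
    for key in keys[:-1]:
        # Create the intermediate dictionary if it does not exist yet.
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def get_nested(doc: Any, path: str) -> Any:
    """Hypothetical helper: read a value at a dot-separated path.

    Path parts index into lists when the current value is a list,
    so "kfc.0.food" reads the "food" key of the first list element.
    """
    current = doc
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]
        else:
            current = current[key]
    return current


sample = {}
set_nested(sample, "simple.weird.strange", 3)
chunked = {"kfc": [{"food": "chicken"}, {"food": "prawns"}]}
second_food = get_nested(chunked, "kfc.1.food")
```

In this sketch, a path such as `kfc.food` fails because `food` cannot be used as a list index, which mirrors the error case covered by `test_get_field_chunk_error`.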
--------------------------------------------------------------------------------
/tests/test_error.py:
--------------------------------------------------------------------------------
1 | """
2 | Testing for Errors
3 | """
4 | import pytest
5 | from vectorai.errors import APIError
6 |
7 | def test_api_error(test_client):
8 | response = {'status': "error", "message": "This is a test error."}
9 | with pytest.raises(APIError):
10 | test_client._raise_error(response)
11 |
--------------------------------------------------------------------------------
/tests/test_models.py:
--------------------------------------------------------------------------------
1 | """
2 | Test for models.
3 | """
4 |
5 | import pytest
6 | from vectorai.models import ViDeployedModel
7 |
8 | def test_operations_sum(test_text_encoder):
9 | vectors = [[1, 2], [2, 3]]
10 | assert [3, 5] == test_text_encoder._vector_operation(vectors, vector_operation="sum")
11 |
12 | def test_operations_minus(test_text_encoder):
13 | vectors = [[1, 2], [2, 3]]
14 | assert [-1, -1] == test_text_encoder._vector_operation(vectors, vector_operation="minus")
15 |
16 | def test_operations_mean(test_text_encoder):
17 | vectors = [[1, 2], [2, 3]]
18 | assert [1.5, 2.5] == test_text_encoder._vector_operation(vectors, vector_operation="mean")
19 |
20 | def test_operations_max(test_text_encoder):
21 | vectors = [[1, 2], [2, 3]]
22 | assert [2, 3] == test_text_encoder._vector_operation(vectors, vector_operation="max")
23 |
24 | def test_operations_min(test_text_encoder):
25 | vectors = [[1, 2], [2, 3]]
26 | assert [1, 2] == test_text_encoder._vector_operation(vectors, vector_operation="min")
27 |
28 | def test_operations_min_with_error(test_text_encoder):
29 | with pytest.raises(ValueError):
30 | vectors = [[1, 2], [2, 3], [2, 4]]
31 | test_text_encoder._vector_operation(vectors, vector_operation='minus')
32 |
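The expected values in the tests above correspond to element-wise operations over a list of vectors. The sketch below reproduces those results with a hypothetical function `combine_vectors`; it is not the `_vector_operation` method itself. The rule that `minus` accepts exactly two vectors is inferred from `test_operations_min_with_error` and should be read as an assumption.

```python
from typing import List, Sequence


def combine_vectors(vectors: Sequence[Sequence[float]], operation: str = "mean") -> List[float]:
    """Hypothetical helper: combine equal-length vectors element by element."""
    # zip(*vectors) groups the i-th component of every vector together.
    columns = list(zip(*vectors))
    if operation == "sum":
        return [sum(col) for col in columns]
    if operation == "minus":
        # Assumption: subtraction is only defined for exactly two vectors.
        if len(vectors) != 2:
            raise ValueError("'minus' needs exactly two vectors")
        return [a - b for a, b in zip(vectors[0], vectors[1])]
    if operation == "mean":
        return [sum(col) / len(col) for col in columns]
    if operation == "max":
        return [max(col) for col in columns]
    if operation == "min":
        return [min(col) for col in columns]
    raise ValueError(f"unknown operation: {operation}")


vectors = [[1, 2], [2, 3]]
summed = combine_vectors(vectors, "sum")
difference = combine_vectors(vectors, "minus")
averaged = combine_vectors(vectors, "mean")
```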
--------------------------------------------------------------------------------
/tests/test_read.py:
--------------------------------------------------------------------------------
1 | """Testing the various read functions for Vi
2 | """
3 | import pytest
4 | import time
5 | from vectorai.errors import MissingFieldWarning, MissingFieldError
6 | from .utils import TempClientWithDocs
7 |
8 | class TestRead:
9 | @pytest.mark.use_client
10 | def test_setup_for_read(self, test_client, test_collection_name):
11 | """Test Setup for Read Operations"""
12 | if test_collection_name in test_client.list_collections():
13 | test_client.delete_collection(test_collection_name)
14 | documents = test_client.create_sample_documents(5)
15 | test_client.insert_documents(
16 | collection_name=test_collection_name, documents=documents
17 | )
18 | time.sleep(10)
19 | assert True
20 |
21 | @pytest.mark.use_client
22 | def test_get_item_by_id(self, test_client, test_collection_name):
23 | return_item = test_client.id(collection_name=test_collection_name, document_id="0")
24 | for var in ['color', 'number', 'color_vector_', 'insert_date_']:
25 | assert var in return_item
26 |
27 | @pytest.mark.use_client
28 | def test_advanced_search_by_id(self, test_client, test_collection_name):
29 | filter_query = [
30 | {'field': 'color',
31 | 'filter_type': 'text',
32 | 'condition_value': 'red',
33 | 'condition': '=='}
34 | ]
35 | results = test_client.advanced_search_by_id(test_collection_name,
36 | document_id=test_client.random_documents(test_collection_name)['documents'][0]['_id'],
37 | search_fields={'color_vector_':1}, filters=filter_query)
38 | assert len(results) > 0
39 |
40 | @pytest.mark.use_client
41 | def test_get_document_by_bulk_id(self, test_client, test_collection_name):
42 | return_documents = test_client.bulk_id(
43 | collection_name=test_collection_name, document_ids=["0", "1"]
44 | )
45 | assert len(return_documents) == 2
46 |
47 |
48 | @pytest.mark.use_client
49 | def test_cleanup_for_read(self, test_client, test_collection_name):
50 | """Test Cleanup for Read Operations"""
51 | test_client.delete_collection(collection_name=test_collection_name)
52 | assert True
53 |
54 | def test_get_field(test_client):
55 | """Test for accessing the document field.
56 | """
57 | test_dict = {"kfc": {"item": "chickens"}}
58 | assert test_client.get_field("kfc.item", doc=test_dict) == "chickens"
59 |
60 | def test_get_empty_field(test_client):
61 | with pytest.raises(MissingFieldError):
62 | docs = test_client.create_sample_documents(10)
63 | test_client.get_field_across_documents('_id_', docs)
64 |
65 | def test_check_schema(test_client):
66 | """Testing an empty schema to ensure both warnings are raised
67 | """
68 | with pytest.warns(None) as record:
69 | nested_schema = {}
70 | test_client._check_schema(nested_schema)
71 | # nested_schema = {'chk': {'chk_vector_': [0, 2, 3]}}
72 | assert len(record) == 2
73 | assert test_client._check_schema(nested_schema) == (True, True)
74 |
75 | def test_check_schema_with_vector_field(test_client):
76 | """Testing a nested dictionary to ensure it can detect a nested vector field
77 | """
78 | with pytest.warns(None) as record:
79 | nested_schema = {'chk': {'chk_vector_': [0, 2, 3]}}
80 | test_client._check_schema(nested_schema)
81 | assert len(record) == 1
82 | assert test_client._check_schema(nested_schema) == (True, False)
83 |
84 | def test_check_schema_id_field(test_client):
85 | with pytest.warns(None) as record:
86 | nested_schema = {'_id': "text"}
87 | test_client._check_schema(nested_schema)
88 | assert len(record) == 1
89 | assert test_client._check_schema(nested_schema) == (False, True)
90 |
91 | def test_check_schema_both(test_client):
92 | with pytest.warns(None) as record:
93 | nested_schema = {'_id': "text", "chk_vector_":[0, 1, 2]}
94 | assert len(record) == 0
95 | assert test_client._check_schema(nested_schema) == (False, False)
96 |
97 | @pytest.mark.use_client
98 | def test_search_collections(test_client):
99 | """
100 | Smoke test for searching collections
101 | """
102 | cn = 'example_collection_123y8io'
103 | if cn not in test_client.list_collections():
104 | test_client.create_collection(cn)
105 | time.sleep(2)
106 | assert len(test_client.search_collections('123y8io')) > 0, "Not searching collections properly."
107 | test_client.delete_collection(cn)
108 |
109 | @pytest.mark.use_client
110 | def test_random_recommendation_smoke_test(test_client, test_collection_name):
111 | """
112 | Smoke test for recommending random ID.
113 | """
114 | with TempClientWithDocs(test_client, test_collection_name):
115 | time.sleep(2)
116 | results = test_client.random_recommendation(
117 | test_collection_name,
118 | search_field='color_vector_')
119 | assert len(results['results']) > 0, "Random recommendation fails."
120 |
121 | @pytest.mark.use_client
122 | def test_random_documents_with_filters(test_client, test_collection_name):
123 | """
124 | Random documents with filters.
125 | """
126 | with TempClientWithDocs(test_client, test_collection_name, num_of_docs=20):
127 | time.sleep(2)
128 | filter_query = [{'field': 'country',
129 | 'filter_type': 'category',
130 | 'condition_value': 'Italy',
131 | 'condition': '=='}]
132 | docs = test_client.random_documents_with_filters(
133 | test_collection_name, filters=filter_query, page_size=20)
134 | print(filter_query)
135 | for doc in docs['documents']:
136 | assert doc['country'] == 'Italy'
137 |
138 | @pytest.mark.use_client
139 | def test_search_with_filters(test_client, test_collection_name):
140 | with TempClientWithDocs(test_client, test_collection_name, num_of_docs=100):
141 | time.sleep(2)
142 | filter_query = [{'field': 'country',
143 | 'filter_type': 'category',
144 | 'condition_value': 'Italy',
145 | 'condition': '=='}]
146 | docs = test_client.search_with_filters(
147 | test_collection_name, vector=test_client.generate_vector(30),
148 | field=['color_vector_'],
149 | filters=filter_query, page_size=20)
150 | for doc in docs['results']:
151 | assert doc['country'] == 'Italy'
152 |
153 | @pytest.mark.use_client
154 | def test_hybrid_search_with_filters(test_client, test_collection_name):
155 | with TempClientWithDocs(test_client, test_collection_name, num_of_docs=100):
156 | time.sleep(2)
157 | filter_query = [{'field': 'country',
158 | 'filter_type': 'category',
159 | 'condition_value': 'Italy',
160 | 'condition': '=='}]
161 | docs = test_client.hybrid_search_with_filters(
162 | test_collection_name,
163 | vector=test_client.generate_vector(30),
164 | text="red",
165 | text_fields=['color'],
166 | fields=['color_vector_'],
167 | filters=filter_query, page_size=20)
168 | for doc in docs['results']:
169 | assert doc['country'] == 'Italy'
170 |
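The filter tests above all pass a list of dictionaries with the keys `field`, `filter_type`, `condition_value`, and `condition`. To show how such a filter list can be read, the sketch below applies it to documents held in memory. The function `matches_filters` is hypothetical and only handles the `==` condition used in these tests; how the server actually evaluates filters is not shown in this repository, so this is an illustration of the data shape rather than of the API's behavior.

```python
from typing import Any, Dict, List


def matches_filters(doc: Dict[str, Any], filters: List[Dict[str, Any]]) -> bool:
    """Hypothetical helper: True if the document satisfies every filter in the list."""
    for flt in filters:
        if flt["condition"] != "==":
            # Assumption: other conditions are out of scope for this sketch.
            raise ValueError(f"unsupported condition: {flt['condition']}")
        if doc.get(flt["field"]) != flt["condition_value"]:
            return False
    return True


filter_query = [{'field': 'country',
                 'filter_type': 'category',
                 'condition_value': 'Italy',
                 'condition': '=='}]
documents = [
    {"_id": "0", "country": "Italy"},
    {"_id": "1", "country": "Australia"},
    {"_id": "2", "country": "Italy"},
]
filtered = [doc for doc in documents if matches_filters(doc, filter_query)]
```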
--------------------------------------------------------------------------------
/tests/test_search.py:
--------------------------------------------------------------------------------
1 | """
2 | Testing for search functions
3 | """
4 | import numpy as np
5 | import pytest
6 | import time
7 |
8 | @pytest.mark.skip(reason="Chunk Search being altered.")
9 | @pytest.mark.use_client
10 | def test_chunk_search(test_client, test_collection_name):
11 | if test_collection_name in test_client.list_collections():
12 | test_client.delete_collection(test_collection_name)
13 | test_client.insert_documents(test_collection_name,
14 | test_client.create_sample_documents(10, include_chunks=True))
15 | time.sleep(5)
16 | vec = np.random.rand(1, 30).tolist()[0]
17 | results = test_client.chunk_search(
18 | test_collection_name,
19 | vector=vec,
20 | search_fields=['chunk.color_chunkvector_'],
21 | )
22 | assert 'error' not in results.keys()
23 |
--------------------------------------------------------------------------------
/tests/test_write_collection_basics.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | class TestCollectionBasics:
15 | @pytest.mark.use_client
16 | def test_create_collection(self, test_client, test_collection_name, test_vector_field):
17 | collection_name = test_collection_name
18 | if collection_name in test_client.list_collections():
19 | test_client.delete_collection(collection_name)
20 | response = test_client.create_collection(
21 | collection_name=collection_name, collection_schema={test_vector_field: 512}
22 | )
23 | assert response is None
24 |
25 | @pytest.mark.use_client
26 | def test_prevent_collection_overwrite(self, test_client, test_collection_name):
27 | """
28 | Test prevention of the overwriting of the collections.
29 | """
30 | if test_collection_name not in test_client.list_collections():
31 | test_client.create_collection(test_collection_name)
32 | with pytest.raises(APIError):
33 | response = test_client.create_collection(collection_name=test_collection_name)
34 |
35 | @pytest.mark.use_client
36 | def test_list_collections(self, test_collection_name, test_client):
37 | response = test_client.list_collections()
38 | assert response.count(test_collection_name) == 1
39 |
40 | @pytest.mark.use_client
41 | def test_delete_collection(self, test_client, test_collection_name):
42 | response = test_client.delete_collection(collection_name=test_collection_name)
43 | assert response['status'] == 'complete'
44 |
45 | def test_dummy_vector(test_client):
46 | """
47 | Test the dummy vector
48 | """
49 | assert len(test_client.dummy_vector(512)) == 512
50 |
51 | def test_set_field_on_new_field(test_client):
52 | """
53 | Assert when set on new field.
54 | """
55 | doc = {}
56 | test_client.set_field('balls', doc, 3)
57 | assert doc['balls'] == 3
58 |
59 | def test_set_field_on_new_dict(test_client):
60 | doc = {}
61 | test_client.set_field('check.balls', doc, 3)
62 | assert test_client.get_field('check.balls', doc) == 3
63 |
64 | def test_vector_name(test_client):
65 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
66 | test_client.set_name(text_encoder, 'vectorai_text')
67 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder, model_list=[text_encoder])
68 | assert vector_name == "color_vectorai_text_vector_"
69 |
70 | def test_vector_name_2(test_client):
71 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
72 | text_encoder_2 = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
73 | test_client.set_name(text_encoder, "vectorai")
74 | test_client.set_name(text_encoder_2, "vectorai_2")
75 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder, model_list=[text_encoder, text_encoder_2])
76 | assert vector_name == "color_vectorai_vector_"
77 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder_2, model_list=[text_encoder, text_encoder_2])
78 | assert vector_name == 'color_vectorai_2_vector_'
79 |
80 | def test_vector_name_same_name(test_client):
81 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
82 | with pytest.raises(ValueError):
83 | vector_name = test_client._check_if_multiple_models_have_same_name(models={'color':[text_encoder, text_encoder]})
84 |
85 | def test_encode_documents_with_models_using_encode(test_client):
86 | docs = test_client.create_sample_documents(5)
87 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
88 | test_client.set_name(text_encoder, "vectorai_text")
89 | test_client.encode_documents_with_models_using_encode(docs, models={'color': [text_encoder]})
90 | assert 'color_vectorai_text_vector_' in docs[0].keys()
91 |
92 | @pytest.mark.use_client
93 | def test_raises_warning_if_no_id(test_client, test_collection_name):
94 | docs = test_client.create_sample_documents(10)
95 | {x.pop('_id') for x in docs}
96 | with pytest.warns(MissingFieldWarning) as record:
97 | test_client.insert_documents(test_collection_name, docs)
98 | assert len(record) > 1
99 | assert record[1].message.args[0] == test_client.NO_ID_WARNING_MESSAGE
100 |
101 | @pytest.mark.use_client
102 | def test_raises_warning_if_only_one_id_is_present(test_client, test_collection_name):
103 | docs = test_client.create_sample_documents(10)
104 | {x.pop('_id') for x in docs[1:]}
105 | with pytest.warns(MissingFieldWarning) as record:
106 | test_client.insert_documents(test_collection_name, docs)
107 | assert record[0].message.args[0] == test_client.NO_ID_WARNING_MESSAGE
108 |
109 | @pytest.mark.use_client
110 | def test_retrieve_and_encode_simple(test_client, test_collection_name):
111 | """Test retrieving documents and encoding them with vectors.
112 | """
113 | VECTOR_LENGTH = 100
114 | def fake_encode(x):
115 | return test_client.generate_vector(VECTOR_LENGTH)
116 | with TempClientWithDocs(test_client, test_collection_name, 100) as client:
117 | results = client.retrieve_and_encode(test_collection_name,
118 | models={'country': fake_encode})
119 | assert list(client.collection_schema(test_collection_name)['country_vector_'].keys())[0] == 'vector'
120 | assert len(results['failed_document_ids']) == 0
121 | assert 'country_vector_' in client.collection_schema(test_collection_name)
122 | docs = client.retrieve_documents(test_collection_name)['documents']
123 | assert len(docs[0]['country_vector_']) == VECTOR_LENGTH
124 |
125 | @pytest.mark.parametrize('collection_name',['HIUFE', 'HUIF_;', 'fheuwiHF'])
126 | def test_collection_name_error(test_client, collection_name):
127 | with pytest.raises(CollectionNameError):
128 | test_client._typecheck_collection_name(collection_name)
129 |
130 | @pytest.mark.parametrize('collection_name', ['fehwu'])
131 | def test_collection_name_not_error(test_client, collection_name):
132 | test_client._typecheck_collection_name(collection_name)
133 | assert True
134 |
--------------------------------------------------------------------------------
/tests/test_write_deployed_models.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | @pytest.mark.use_client
15 | def test_encode_documents_with_deployed_model(test_client, test_text_encoder):
16 | """
17 | Test single encoding method for models.
18 | """
19 | documents = test_client.create_sample_documents(10)
20 | test_client.encode_documents_with_models(documents, models={'color': [test_text_encoder]}, use_bulk_encode=False)
21 | assert 'color_vector_' in documents[0].keys()
22 | assert len(documents[0]['color_vector_']) > 0
23 |
24 | @pytest.mark.use_client
25 | def test_bulk_encode_documents_with_deployed_model(test_client, test_text_encoder):
26 | """
27 | Test bulk encoding method for models.
28 | """
29 | # Test when model key input is a list
30 | documents = test_client.create_sample_documents(10)
31 | test_client.encode_documents_with_models(documents, models={'color': [test_text_encoder]}, use_bulk_encode=True)
32 | assert 'color_vector_' in documents[0].keys()
33 | assert len(documents[0]['color_vector_']) > 0
34 | del documents
35 | documents = test_client.create_sample_documents(10)
36 | test_client.encode_documents_with_models(documents, models={'color': test_text_encoder}, use_bulk_encode=True)
37 | assert 'color_vector_' in documents[0].keys()
38 | assert len(documents[0]['color_vector_']) > 0
39 |
--------------------------------------------------------------------------------
/tests/test_write_documents.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | def test__write_document_nested_field():
15 | sample = {"this": {}}
16 | ViWriteClient.set_field("this.is", doc=sample, value=[0, 2])
17 | assert sample["this"]["is"] == [0, 2]
18 |
19 | def test__write_document_nested_field_2():
20 | sample = {"this": {"is": {}}}
21 | ViWriteClient.set_field("this.is", doc=sample, value=[0, 2])
22 | assert sample["this"]["is"] == [0, 2]
23 |
24 | @pytest.mark.use_client
25 | def test_encode_documents_with_deployed_model(test_client, test_text_encoder):
26 | """
27 | Test single encoding method for models.
28 | """
29 | documents = test_client.create_sample_documents(10)
30 | test_client.encode_documents_with_models(documents, models={'color': [test_text_encoder]}, use_bulk_encode=False)
31 | assert 'color_vector_' in documents[0].keys()
32 | assert len(documents[0]['color_vector_']) > 0
33 |
34 | @pytest.mark.use_client
35 | def test_bulk_encode_documents_with_deployed_model(test_client, test_text_encoder):
36 | """
37 | Test bulk encoding method for models.
38 | """
39 | # Test when model key input is a list
40 | documents = test_client.create_sample_documents(10)
41 | test_client.encode_documents_with_models(documents, models={'color': [test_text_encoder]}, use_bulk_encode=True)
42 | assert 'color_vector_' in documents[0].keys()
43 | assert len(documents[0]['color_vector_']) > 0
44 | del documents
45 | documents = test_client.create_sample_documents(10)
46 | test_client.encode_documents_with_models(documents, models={'color': test_text_encoder}, use_bulk_encode=True)
47 | assert 'color_vector_' in documents[0].keys()
48 | assert len(documents[0]['color_vector_']) > 0
49 |
50 | def test_dummy_vector(test_client):
51 | """
52 | Test the dummy vector
53 | """
54 | assert len(test_client.dummy_vector(512)) == 512
55 |
56 | def test_set_field_on_new_field(test_client):
57 | """
58 | Assert when set on new field.
59 | """
60 | doc = {}
61 | test_client.set_field('balls', doc, 3)
62 | assert doc['balls'] == 3
63 |
64 | def test_set_field_on_new_dict(test_client):
65 | doc = {}
66 | test_client.set_field('check.balls', doc, 3)
67 | assert test_client.get_field('check.balls', doc) == 3
68 |
69 | def test_vector_name(test_client):
70 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
71 | test_client.set_name(text_encoder, 'vectorai_text')
72 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder, model_list=[text_encoder])
73 | assert vector_name == "color_vectorai_text_vector_"
74 |
75 | def test_vector_name_2(test_client):
76 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
77 | text_encoder_2 = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
78 | test_client.set_name(text_encoder, "vectorai")
79 | test_client.set_name(text_encoder_2, "vectorai_2")
80 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder, model_list=[text_encoder, text_encoder_2])
81 | assert vector_name == "color_vectorai_vector_"
82 | vector_name = test_client._get_vector_name_for_encoding("color", text_encoder_2, model_list=[text_encoder, text_encoder_2])
83 | assert vector_name == 'color_vectorai_2_vector_'
84 |
85 | def test_vector_name_same_name(test_client):
86 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
87 | with pytest.raises(ValueError):
88 | vector_name = test_client._check_if_multiple_models_have_same_name(models={'color':[text_encoder, text_encoder]})
89 |
90 | def test_encode_documents_With_models_using_encode(test_client):
91 | docs = test_client.create_sample_documents(5)
92 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
93 | test_client.set_name(text_encoder, "vectorai_text")
94 | test_client.encode_documents_with_models_using_encode(docs, models={'color': [text_encoder]})
95 | assert 'color_vectorai_text_vector_' in docs[0].keys()
96 |
--------------------------------------------------------------------------------
/tests/test_write_edit.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | class TestEdit:
15 | @pytest.mark.use_client
16 | def test_setup_for_read(self, test_client, test_collection_name):
17 | """Test Setup for Read Operations"""
18 | if test_collection_name in test_client.list_collections():
19 | test_client.delete_collection(collection_name=test_collection_name)
20 | documents = [
21 | {
22 | "_id": "2",
23 | "document_vector_": test_client.generate_vector(vector_length=512),
24 | "attribute": "red",
25 | },
26 | {
27 | "_id": "1",
28 | "document_vector_": test_client.generate_vector(vector_length=512),
29 | "attribute": "blue",
30 | },
31 | ]
32 |
33 | test_client.insert_documents(
34 | collection_name=test_collection_name, documents=documents
35 | )
36 | time.sleep(10)
37 | assert True
38 |
39 |
40 | @pytest.mark.use_client
41 | def test_edit_document(self, test_client, test_collection_name):
42 | with TempClientWithDocs(test_client, test_collection_name) as client:
43 | edits = {
44 | "_id": "1",
45 | "location": "Paris"
46 | }
47 | client.edit_document(
48 | collection_name=test_collection_name, edits=edits, document_id=edits['_id']
49 | )
50 | time.sleep(2)
51 | doc = client.id(collection_name=test_collection_name, document_id="1")
52 | assert doc["location"] == "Paris"
53 |
54 | @pytest.mark.use_client
55 | def test_create_filter(self, test_client, test_collection_name):
56 | with TempClientWithDocs(test_client, test_collection_name) as client:
57 | doc = {
58 | 'location': "Paris"
59 | }
60 | client.insert(test_collection_name, doc)
61 | results = test_client.filters(
62 | test_collection_name,
63 | test_client.create_filter_query(test_collection_name, 'location', 'contains', 'Paris'))
64 | assert len(results) > 0
65 |
66 | @pytest.mark.use_client
67 | def test_create_filter_2(self, test_client, test_collection_name):
68 | with TempClientWithDocs(test_client, test_collection_name) as client:
69 | doc = {
70 | 'location': "Paris"
71 | }
72 | client.insert(test_collection_name, doc)
73 | results = test_client.filters(
74 | test_collection_name,
75 | test_client.create_filter_query(
76 | test_collection_name, 'location', 'exact_match', 'Paris'))
77 | assert len(results) > 0
78 |
79 | @pytest.mark.use_client
80 | def test_create_filter_3(self, test_client, test_collection_name):
81 | with TempClientWithDocs(test_client, test_collection_name) as client:
82 | results = test_client.filters(test_collection_name,
83 | test_client.create_filter_query(test_collection_name, 'size.feet', '<=', '31'))
84 | assert len(results) > 0
85 |
86 | @pytest.mark.use_client
87 | def test_create_filter_4(self, test_client, test_collection_name):
88 | with TempClientWithDocs(test_client, test_collection_name):
89 | results = test_client.filters(test_collection_name,
90 | test_client.create_filter_query(test_collection_name, 'insert_date_', '>=', '2020-01-01'))
91 | assert len(results) > 0
92 |
93 | @pytest.mark.use_client
94 | def test_edit_documents(self, test_client, test_collection_name):
95 | """Test adding of an attribute
96 | """
97 | with TempClientWithDocs(test_client, test_collection_name):
98 | edits = [
99 | {"_id": "2", "location": "Sydney",},
100 | {"_id": "1", "location": "New York",},
101 | ]
102 | test_client.edit_documents(test_collection_name, edits)
103 | doc = test_client.id(collection_name=test_collection_name, document_id="2")
104 | assert doc["location"] == "Sydney"
105 | doc = test_client.id(collection_name=test_collection_name, document_id="1")
106 | assert doc['location'] == 'New York'
107 |
108 | @pytest.mark.use_client
109 | def test_edit_documents(test_client, test_collection_name):
110 | with TempClientWithDocs(test_client, test_collection_name, 100) as client:
111 | edits = test_client.create_sample_documents(100)
112 | {x.update({'favorite_singer': 'billie eilish'}) for x in edits}
113 | response = client.edit_documents(test_collection_name, edits)
114 | assert response['edited_successfully'] == len(edits)
115 | # Retrieve the documents
116 | docs = client.retrieve_documents(test_collection_name,
117 | include_fields=['favorite_singer'], page_size=1)['documents']
118 | for doc in docs:
119 | assert doc['favorite_singer'] == 'billie eilish'
120 |
--------------------------------------------------------------------------------
/tests/test_write_insert.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | class TestInsert:
15 | @pytest.mark.use_client
16 | def test_insert_documents_simple_and_collection_stats_match(self, test_client,
17 | test_collection_name):
18 | """
19 | Testing for simple document insertion
20 | """
21 | if test_collection_name in test_client.list_collections():
22 | test_client.delete_collection(test_collection_name)
23 | sample_documents = test_client.create_sample_documents(10)
24 | test_client.insert_documents(test_collection_name, sample_documents)
25 | time.sleep(10)
26 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == 10
27 | test_client.delete_collection(test_collection_name)
28 | time.sleep(3)
29 |
30 | @pytest.mark.use_client
31 | def test_inserting_documents_without_id_fields(self, test_client, test_collection_name):
32 | """
33 | Test inserting documents if they do not have an ID field.
34 | """
35 | if test_collection_name in test_client.list_collections():
36 | test_client.delete_collection(test_collection_name)
37 | sample_documents = test_client.create_sample_documents(10)
38 | # Remove the ID fields
39 | {x.pop('_id') for x in sample_documents}
40 | test_client.insert_documents(test_collection_name, sample_documents)
41 | time.sleep(10)
42 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == 10
43 | test_client.delete_collection(test_collection_name)
44 | time.sleep(3)
45 |
46 | @pytest.mark.use_client
47 | def test_inserting_documents_without_id_fields_with_overwrite(self, test_client,
48 | test_collection_name):
49 | """
50 | Test inserting documents without an ID field when overwrite=True.
51 | """
52 | if test_collection_name in test_client.list_collections():
53 | test_client.delete_collection(test_collection_name)
54 | sample_documents = test_client.create_sample_documents(10)
55 | # Remove the ID fields
56 | {x.pop('_id') for x in sample_documents}
57 | with pytest.warns(MissingFieldWarning):
58 | test_client.insert_documents(test_collection_name, sample_documents, overwrite=True)
59 | time.sleep(10)
60 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == 10
61 | test_client.delete_collection(test_collection_name)
62 | time.sleep(3)
63 |
64 | @pytest.mark.use_client
65 | def test_inserting_documents_when_id_is_not_a_string(self, test_client, test_collection_name):
66 | """
67 | Test inserting documents when ID is not a string
68 | """
69 | if test_collection_name in test_client.list_collections():
70 | test_client.delete_collection(test_collection_name)
71 | sample_documents = test_client.create_sample_documents(10)
72 | # Create integer IDs strings
73 | {x.update({'_id': int(x['_id'])}) for x in sample_documents}
74 | test_client.insert_documents(test_collection_name, sample_documents, overwrite=False)
75 | time.sleep(10)
76 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == 10
77 | test_client.delete_collection(test_collection_name)
78 | time.sleep(3)
79 |
80 | @pytest.mark.use_client
81 | def test_inserting_documents_when_id_is_not_a_string_with_overwrite(self, test_client,
82 | test_collection_name):
83 | """
84 | Test inserting documents when ID is not a string and overwrite=True.
85 | """
86 | if test_collection_name in test_client.list_collections():
87 | test_client.delete_collection(test_collection_name)
88 | sample_documents = test_client.create_sample_documents(10)
89 | # Create integer IDs strings
90 | {x.update({'_id': int(x['_id'])}) for x in sample_documents}
91 | test_client.insert_documents(test_collection_name, sample_documents, overwrite=True)
92 | time.sleep(10)
93 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == 10
94 | test_client.delete_collection(test_collection_name)
95 | time.sleep(3)
96 |
97 | @pytest.mark.use_client
98 | def test_insert_single_document(self, test_client, test_collection_name):
99 | if test_collection_name not in test_client.list_collections():
100 | test_client.create_collection(test_collection_name)
101 | # document = {"sample_vector_": test_client.generate_vector(20), "sample_name": "hi"}
102 | document = test_client.create_sample_document(1)
103 | response = test_client.insert(
104 | collection_name=test_collection_name, document=document
105 | )
106 | assert response['status'] == 'success'
107 |
108 | @pytest.mark.use_client
109 | def test_insert_single_document_error(self, test_client, test_collection_name):
110 | """Trigger an insert fail error
111 | """
112 | with pytest.raises(APIError):
113 | if test_collection_name not in test_client.list_collections():
114 | test_client.create_collection(test_collection_name)
115 | document = {
116 | "sample_vectors_": [test_client.generate_vector(20)] + [np.nan],
117 | "samplename": [["hi"]],
118 | }
119 | response = test_client.insert(
120 | collection_name=test_collection_name, document=document
121 | )
122 |
123 |
124 | @pytest.mark.use_client
125 | def test_clean_up(self, test_client, test_collection_name):
126 | """Remove a collection if it is there.
127 | """
128 | if test_collection_name in test_client.list_collections():
129 | test_client.delete_collection(test_collection_name)
130 | assert test_collection_name not in test_client.list_collections()
131 |
--------------------------------------------------------------------------------
/tests/test_write_misc.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | def test_vector_name_same_name(test_client):
15 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
16 | with pytest.raises(ValueError):
17 | vector_name = test_client._check_if_multiple_models_have_same_name(models={'color':[text_encoder, text_encoder]})
18 |
19 | def test_encode_documents_With_models_using_encode(test_client):
20 | docs = test_client.create_sample_documents(5)
21 | text_encoder = ViText2Vec(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'])
22 | test_client.set_name(text_encoder, "vectorai_text")
23 | test_client.encode_documents_with_models_using_encode(docs, models={'color': [text_encoder]})
24 | assert 'color_vectorai_text_vector_' in docs[0].keys()
25 |
26 | @pytest.mark.use_client
27 | def test_raises_warning_if_no_id(test_client, test_collection_name):
28 | docs = test_client.create_sample_documents(10)
29 | {x.pop('_id') for x in docs}
30 | with pytest.warns(MissingFieldWarning) as record:
31 | test_client.insert_documents(test_collection_name, docs)
32 | assert len(record) > 1
33 | assert record[1].message.args[0] == test_client.NO_ID_WARNING_MESSAGE
34 |
35 | @pytest.mark.use_client
36 | def test_raises_warning_if_only_one_id_is_present(test_client, test_collection_name):
37 | docs = test_client.create_sample_documents(10)
38 | {x.pop('_id') for x in docs[1:]}
39 | with pytest.warns(MissingFieldWarning) as record:
40 | test_client.insert_documents(test_collection_name, docs)
41 | assert record[0].message.args[0] == test_client.NO_ID_WARNING_MESSAGE
42 |
--------------------------------------------------------------------------------
/tests/test_write_multiprocessing.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | @pytest.mark.use_client
15 | def test_multiprocess_insert(test_client, test_collection_name):
16 | NUM_OF_DOCUMENTS_INSERTED = 10
17 | if test_collection_name in test_client.list_collections():
18 | test_client.delete_collection(test_collection_name)
19 | time.sleep(10)
20 | documents = test_client.create_sample_documents(NUM_OF_DOCUMENTS_INSERTED)
21 | results = test_client.insert_documents(test_collection_name, documents, workers=5, overwrite=False)
22 | time.sleep(10)
23 | assert len(results['failed_document_ids']) == 0
24 | assert test_collection_name in test_client.list_collections()
25 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] == NUM_OF_DOCUMENTS_INSERTED
26 | test_client.delete_collection(test_collection_name)
27 |
28 | @pytest.mark.use_client
29 | def test_multiprocess_insert_with_error(test_client, test_collection_name):
30 | NUM_OF_DOCUMENTS_INSERTED = 100
31 | if test_collection_name in test_client.list_collections():
32 | test_client.delete_collection(test_collection_name)
33 | documents = test_client.create_sample_documents(NUM_OF_DOCUMENTS_INSERTED)
34 | documents.append({
35 | '_id': '9993',
36 | 'color': np.nan
37 | })
38 |
39 | # This should result in 1 failure
40 | results = test_client.insert_documents(test_collection_name, documents, workers=5, overwrite=False)
41 | time.sleep(10)
42 | assert len(results['failed_document_ids']) == 1
43 | assert test_collection_name in test_client.list_collections()
44 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] > 0
45 | test_client.delete_collection(test_collection_name)
46 |
47 | @pytest.mark.use_client
48 | def test_multiprocess_insert_with_error_with_overwrite(test_client, test_collection_name):
49 | NUM_OF_DOCUMENTS_INSERTED = 100
50 | if test_collection_name in test_client.list_collections():
51 | test_client.delete_collection(test_collection_name)
52 | time.sleep(5)
53 | documents = test_client.create_sample_documents(NUM_OF_DOCUMENTS_INSERTED)
54 | documents.append({
55 | '_id': '9993',
56 | 'color': np.nan
57 | })
58 |
59 | # This should result in 1 failure
60 | results = test_client.insert_documents(test_collection_name, documents, workers=5, overwrite=True)
61 | time.sleep(10)
62 | assert len(results['failed_document_ids']) == 1
63 | assert test_collection_name in test_client.list_collections()
64 | assert test_client.collection_stats(test_collection_name)['number_of_documents'] > 0
65 | test_client.delete_collection(test_collection_name)
66 |
67 | @pytest.mark.use_client
68 | def test_multiprocess_with_overwrite(test_client, test_collection_name):
69 | if test_collection_name in test_client.list_collections():
70 | test_client.delete_collection(test_collection_name)
71 | time.sleep(5)
72 | NUM_OF_DOCS = 10
73 | docs = test_client.create_sample_documents(NUM_OF_DOCS)
74 | test_client.insert_documents(test_collection_name, docs[0:5], workers=1, overwrite=False)
75 | response = test_client.insert_documents(test_collection_name, docs[3:5], workers=1,
76 | overwrite=True)
77 | assert response['inserted_successfully'] == 2
78 |
79 | @pytest.mark.use_client
80 | def test_multiprocess_with_overwrite_insert(test_client, test_collection_name):
81 | if test_collection_name in test_client.list_collections():
82 | test_client.delete_collection(test_collection_name)
83 | time.sleep(5)
84 | NUM_OF_DOCS = 10
85 | docs = test_client.create_sample_documents(NUM_OF_DOCS)
86 | test_client.insert_documents(test_collection_name, docs[0:5], workers=1, overwrite=False)
87 | response = test_client.insert_documents(test_collection_name, docs[3:5], workers=1,
88 | overwrite=True)
89 | assert response['inserted_successfully'] == 2
90 |
91 | @pytest.mark.use_client
92 | def test_multiprocess_overwrite(test_client, test_collection_name):
93 | if test_collection_name in test_client.list_collections():
94 | test_client.delete_collection(test_collection_name)
95 | time.sleep(5)
96 | NUM_OF_DOCS = 100
97 | docs = test_client.create_sample_documents(NUM_OF_DOCS)
98 | test_client.insert_documents(test_collection_name, docs[0:5], workers=1, overwrite=False)
99 | # For document with id '3'
100 | TEST_ID = '3'
101 | id_document = test_client.id(collection_name=test_collection_name, document_id=TEST_ID)
102 | test_client.set_field('test.field', id_document, 'stranger')
103 | docs[3] = id_document
104 | print(docs[3])
105 | docs[3].update({'_id': '3'})
106 | response = test_client.insert_documents(test_collection_name, docs[3:5], workers=1,
107 | overwrite=True)
108 | id_document = test_client.id(collection_name=test_collection_name, document_id=TEST_ID)
109 | assert test_client.get_field('test.field', id_document) == 'stranger'
110 | time.sleep(5)
111 | test_client.delete_collection(test_collection_name)
112 |
113 | @pytest.mark.use_client
114 | def test_multiprocess_not_overwrite(test_client, test_collection_name):
115 | if test_collection_name in test_client.list_collections():
116 | test_client.delete_collection(test_collection_name)
117 | time.sleep(5)
118 | NUM_OF_DOCS = 100
119 | docs = test_client.create_sample_documents(NUM_OF_DOCS)
120 | test_client.insert_documents(test_collection_name, docs[0:5], workers=1, overwrite=False)
121 | # For document with id '3'
122 | TEST_ID = '3'
123 | id_document = test_client.id(collection_name=test_collection_name, document_id=TEST_ID)
124 | test_client.set_field('test.field', id_document, 'stranger')
125 | docs[3] = id_document
126 | docs[3].update({'_id': '3'})
127 | response = test_client.insert_documents(test_collection_name, docs[3:5], workers=1,
128 | overwrite=False)
129 | id_document = test_client.id(collection_name=test_collection_name, document_id=TEST_ID)
130 | with pytest.raises(MissingFieldError):
131 | test_client.get_field('test.field', id_document)
132 | time.sleep(5)
133 | test_client.delete_collection(test_collection_name)
134 |
--------------------------------------------------------------------------------
/tests/test_write_retrieve_and_encode.py:
--------------------------------------------------------------------------------
1 | """Test the write database.
2 | """
3 | import json
4 | import pytest
5 | import os
6 | import time
7 | import numpy as np
8 | from vectorai.models.deployed import ViText2Vec
9 | from vectorai.write import ViWriteClient
10 | from vectorai.errors import APIError, MissingFieldError, MissingFieldWarning, CollectionNameError
11 | from vectorai.client import ViClient
12 | from .utils import TempClientWithDocs
13 |
14 | @pytest.mark.use_client
15 | def test_retrieve_and_encode_simple(test_client, test_collection_name):
16 | """Test retrieving documents and encoding them with vectors.
17 | """
18 | VECTOR_LENGTH = 100
19 | def fake_encode(x):
20 | return test_client.generate_vector(VECTOR_LENGTH)
21 | # with TempClientWithDocs(test_client, test_collection_name, 100) as client:
22 | test_client.insert_documents(test_collection_name, test_client.create_sample_documents(100))
23 | results = test_client.retrieve_and_encode(test_collection_name,
24 | models={'country': fake_encode})
25 | assert list(test_client.collection_schema(test_collection_name)['country_vector_'].keys())[0] == 'vector'
26 | assert len(results['failed_document_ids']) == 0
27 | assert 'country_vector_' in test_client.collection_schema(test_collection_name)
28 | docs = test_client.retrieve_documents(test_collection_name)['documents']
29 | assert len(docs[0]['country_vector_']) == VECTOR_LENGTH
30 |
--------------------------------------------------------------------------------
/tests/utils.py:
--------------------------------------------------------------------------------
1 | import time
2 | import random
3 | import string
4 | from vectorai import ViClient, ViCollectionClient
5 |
6 | class TempClient:
7 | def __init__(self, client, collection_name: str=None):
8 | self.client = client
9 | if isinstance(client, ViClient):
10 | self.collection_name = collection_name
11 | # elif isinstance(client, ViCollectionClient):
12 | # self.collection_name = self.client.collection_name
13 |
14 | def teardown_collection(self):
15 | if self.collection_name in self.client.list_collections():
16 | time.sleep(2)
17 | if isinstance(self.client, ViClient):
18 | self.client.delete_collection(self.collection_name)
19 | elif isinstance(self.client, ViCollectionClient):
20 | self.client.delete_collection()
21 |
22 | def __enter__(self):
23 | self.teardown_collection()
24 | return self.client
25 |
26 | def __exit__(self, *exc):
27 | self.teardown_collection()
28 |
29 | class TempClientWithDocs(TempClient):
30 | """
31 | Temporary Client With Documents already inserted.
32 | """
33 | def __init__(self, client, collection_name: str=None, num_of_docs: int=10):
34 | self.client = client
35 | if hasattr(self.client, 'collection_name'):
36 | self.collection_name = collection_name
37 | else:
38 | if collection_name is None:
39 | collection_name = self.generate_random_collection_name()
40 | self.collection_name = collection_name
41 | self.client.collection_name = collection_name
42 | self.num_of_docs = num_of_docs
43 | self.teardown_collection()
44 | self.client.insert_documents(self.collection_name,
45 | self.client.create_sample_documents(self.num_of_docs))
46 |
47 | def generate_random_collection_name(self):
48 | return self.generate_random_string(20)
49 |
50 | def generate_random_string(self, num_of_letters):
51 | letters = string.ascii_lowercase
52 | return '_delete_'.join(random.choice(letters) for i in range(num_of_letters))
53 |
54 | def __enter__(self):
55 | # self.teardown_collection()
56 | # self.client.insert_documents(self.collection_name,
57 | # self.client.create_sample_documents(self.num_of_docs))
58 | return self.client
59 |
--------------------------------------------------------------------------------
/utils/automate_api.py:
--------------------------------------------------------------------------------
1 | if __name__=="__main__":
2 | import os
3 | from openapi_to_sdk.sdk_automation import PythonSDKBuilder
4 |
5 | url="https://vectorai-development-api.azurewebsites.net"
7 | # url = "https://api.vctr.ai"
8 | sdk = PythonSDKBuilder(
9 | url=url,
10 | # url="https://vectorai-development-api.azurewebsites.net",
11 | # url='https://vecdb-aueast-api.azurewebsites.net',
12 | inherited_properties=['username', 'api_key', 'url'],
13 | decorators=[
14 | 'retry()',
15 | "return_curl_or_response('json')"],
16 | override_param_defaults=dict(
17 | min_score=None,
18 | cursor=None,
19 | # url='https://vecdb-aueast-api.azurewebsites.net',
20 | url=url,
21 | # sort=False,
22 | sort_by_created_at_date=False,
23 | ),
24 | internal_functions=[
25 | "list_collections",
26 | "create_collection",
27 | "search",
28 | "delete_collection",
29 | "create_collection_from_document"
30 | ],
31 | )
32 | sdk.to_python_file(
33 | class_name="_ViAPIClient",
34 | filename='vectorai/api/api.py',
35 | import_strings=['import requests', 'from vectorai.api.utils import retry, return_curl_or_response'],
36 | include_response_parsing=False,
37 | )
38 |
39 | from vectorai.api.api import _ViAPIClient
40 | vi = _ViAPIClient(os.environ['VI_USERNAME'], os.environ['VI_API_KEY'], url=url)
41 | print(vi._list_collections())
42 |
43 |
--------------------------------------------------------------------------------
/utils/download_badges.py:
--------------------------------------------------------------------------------
1 | import requests
2 | def download_image(url, output_image_file):
3 | r = requests.get(url)
4 | with open(output_image_file, 'w') as f:
5 | if isinstance(r.content, bytes):
6 | content = r.content.decode()
7 | else:
8 | content = r.content
9 | f.write(content)
10 |
11 | if __name__=="__main__":
12 |
13 | download_image("https://static.pepy.tech/personalized-badge/vectorai-nightly?period=total&units=none&left_color=black&right_color=purple&left_text=Total%20Downloads",
14 | "assets/total_downloads.svg")
15 | download_image("https://static.pepy.tech/personalized-badge/vectorai-nightly?period=week&units=none&left_color=black&right_color=purple&left_text=Weekly%20Downloads",
16 | "assets/weekly_downloads.svg")
17 | download_image("https://static.pepy.tech/personalized-badge/vectorai-nightly?period=month&units=none&left_color=black&right_color=purple&left_text=Monthly%20Downloads",
18 | "assets/monthly_downloads.svg")
19 |
--------------------------------------------------------------------------------
/vectorai/__init__.py:
--------------------------------------------------------------------------------
1 | """Vecdb Client
2 | """
3 | __version__ = "0.2.2"
4 |
5 | from .api import *
6 | from .client import *
7 | from .read import *
8 | from .write import *
9 |
--------------------------------------------------------------------------------
/vectorai/analytics/__init__.py:
--------------------------------------------------------------------------------
1 | """Submodule for Vector Analytics.
2 | """
3 |
4 | from .analytics import ViAnalyticsMixin
5 | from .client import ViAnalyticsClient
6 |
--------------------------------------------------------------------------------
/vectorai/analytics/analytics.py:
--------------------------------------------------------------------------------
1 | """Mixin class for analytics submodule containing vector analytics tools.
2 | """
3 |
4 | from .dimensionality_reduction import *
5 | from .viz import *
6 | from .tables import *
7 |
8 | class ViAnalyticsMixin(VizMixin, TableMixin):
9 | """
10 | Vi Analytics Mixin.
11 | Currently includes the visualisation and table mixins.
12 | """
13 |
14 | pass
15 |
--------------------------------------------------------------------------------
/vectorai/analytics/api/__init__.py:
--------------------------------------------------------------------------------
1 | from .comparator import ComparatorAPI
--------------------------------------------------------------------------------
/vectorai/analytics/api/comparator.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from typing import List, Dict, Optional
3 | from ...api.utils import retry, return_curl_or_response
4 |
5 | class ComparatorAPI:
6 | def __init__(self, username: str=None, api_key: str=None,
7 | url: str = "https://api.vctr.ai", analytics_url="https://vector-analytics.vctr.ai"):
8 | self.username = username
9 | self.api_key = api_key
10 | self.url = url
11 | self.analytics_url = analytics_url
12 |
13 |
14 | @return_curl_or_response('content')
15 | @retry()
16 | def _compare_ranks(
17 | self,
18 | ranked_list_1: List[Dict],
19 | ranked_list_2: List[Dict],
20 | fields_to_display: List[str]=None,
21 | image_fields: List[str]=[],
22 | audio_fields: List[str]=[],
23 | column_titles: Optional[List[str]] = None,
24 | x_axis_title: str = 'Fields',
25 | y_axis_title: str = 'Comparing: ',
26 | header: str = "Top-K Ranking Comparator",
27 | subheader: str = "Compare ranks in the different lists.",
28 | colors: List[str]=['#ccff99', 'powderblue', '#ffc2b3'],
29 | return_curl: bool=False,
30 | **kwargs
31 | ):
32 | """
33 | Compare Top-K Lists.
34 | Args:
35 | ranked_list_1: A list of results as a dictionary containing the required fields.
36 | ranked_list_2: Another list of results
37 | fields_to_display: The fields required for displaying the object
38 | image_fields: The fields which are images
39 | audio_fields: The fields which are audio
40 | column_titles: The name of the columns for the different rank fields
41 | x_axis_title: The title of the x axis
42 | y_axis_title: The title of the y axis
43 | header: The name of the graph
44 | subheader: The sub-header of the graph
45 | """
46 |
47 | params={
48 | "username": self.username,
49 | "api_key": self.api_key,
50 | "ranked_list_1": ranked_list_1,
51 | "ranked_list_2": ranked_list_2,
52 | "fields_to_display": fields_to_display,
53 | "image_fields": image_fields,
54 | "audio_fields": audio_fields,
55 | "column_titles": column_titles,
56 | "x_axis_title": x_axis_title,
57 | "y_axis_title": y_axis_title,
58 | "header": header,
59 | "subheader": subheader,
60 | "colors": colors,
61 | }
62 | params.update(kwargs)
63 | return requests.post(
64 | url= f"{self.analytics_url}/comparator/compare_ranks/",
65 | json=params)
66 |
--------------------------------------------------------------------------------
/vectorai/analytics/client.py:
--------------------------------------------------------------------------------
1 | import os
2 | from .comparator import ComparatorClient
3 | from .viz import VizMixin
4 |
5 | class ViAnalyticsClient(ComparatorClient, VizMixin):
6 | def __init__(self, username: str=None, api_key: str=None,
7 | url: str = "https://api.vctr.ai", analytics_url="https://vector-analytics.vctr.ai"):
8 | self.username = username if username is not None else os.environ['VI_USERNAME']
9 | self.api_key = api_key if api_key is not None else os.environ['VI_API_KEY']
10 | self.url = url
11 | self.analytics_url = analytics_url
12 |
--------------------------------------------------------------------------------
/vectorai/analytics/comparator.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Optional
2 | # from ..read import ViReadClient
3 | # from ..client import ViClient
4 | from .api.comparator import ComparatorAPI
5 |
6 | class ComparatorClient(ComparatorAPI):
7 | def __init__(self, username: str=None, api_key: str=None,
8 | url: str = "https://api.vctr.ai",
9 | analytics_url="https://vector-analytics.vctr.ai"):
10 | self.username = username
11 | self.api_key = api_key
12 | self.url = url
13 | self.analytics_url = analytics_url
14 |
15 | def write_to_html(self, content, file_name: str):
16 | with open(file_name, 'w') as f:
17 | f.write(content)
18 |
19 | def output(self, content, html_file: str=None):
20 | if html_file is None:
21 | if self.is_in_notebook():
22 | from IPython.display import HTML
23 | return HTML(content.decode())
24 | return content
25 | self.write_to_html(content, html_file)
26 | print(f"Written to {html_file}.")
27 | return content
28 |
29 | def compare_ranks(
30 | self,
31 | ranked_list_1: List[Dict],
32 | ranked_list_2: List[Dict],
33 | fields_to_display: List[str]=None,
34 | image_fields: List[str]=[],
35 | audio_fields: List[str]=[],
36 | column_titles: Optional[List[str]] = None,
37 | x_axis_title: str = 'Fields',
38 | y_axis_title: str = 'Comparing: ',
39 | header: str = "Top-K Ranking Comparator",
40 | subheader: str = "Compare ranks in the different lists.",
41 | colors: List[str]=['#ccff99', 'powderblue', '#ffc2b3'],
42 | html_file: str=None
43 | ):
44 | """
45 | Compare Top-K Lists.
46 | Args:
47 | ranked_list_1: A list of results as a dictionary containing the required fields.
48 | ranked_list_2: Another list of results
49 | fields_to_display: The fields required for displaying the object
50 | image_fields: The fields which are images
51 | audio_fields: The fields which are audio
52 | column_titles: The names of the columns for the different rank fields
53 | x_axis_title: The title of the x axis
54 | y_axis_title: The title of the y axis
55 | header: The name of the graph
56 | subheader: The sub-header of the graph
57 | """
58 | content = self._compare_ranks(ranked_list_1, ranked_list_2,
59 | column_titles=column_titles, fields_to_display=fields_to_display,
60 | image_fields=image_fields, audio_fields=audio_fields,
61 | x_axis_title=x_axis_title, y_axis_title=y_axis_title,
62 | header=header, subheader=subheader, colors=colors)
63 | return self.output(content, html_file=html_file)
64 |
65 | def compare_search(
66 | self,
67 | collection_name: str,
68 | vector_fields: List[str],
69 | vector : List[float],
70 | fields_to_display: List[str]=None,
71 | image_fields: List[str]=[],
72 | audio_fields: List[str]=[],
73 | x_axis_title: str = 'Fields',
74 | y_axis_title: str = 'Vector fields',
75 | header: str = "