├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── pom.xml
└── src
    ├── assembly
    │   └── benchmark.xml
    ├── benchmark
    │   └── java
    │       └── com
    │           └── eatthepath
    │               └── jvptree
    │                   ├── CartesianDistanceFunction.java
    │                   ├── CartesianPoint.java
    │                   ├── ThresholdSelectionBenchmark.java
    │                   ├── VPTreeConstructionBenchmark.java
    │                   └── VPTreeQueryBenchmark.java
    ├── main
    │   └── java
    │       ├── com
    │       │   └── eatthepath
    │       │       └── jvptree
    │       │           ├── DistanceComparator.java
    │       │           ├── DistanceFunction.java
    │       │           ├── MetaIterator.java
    │       │           ├── NearestNeighborCollector.java
    │       │           ├── PartitionException.java
    │       │           ├── PointFilter.java
    │       │           ├── SpatialIndex.java
    │       │           ├── ThresholdSelectionStrategy.java
    │       │           ├── VPTree.java
    │       │           ├── VPTreeNode.java
    │       │           ├── package-info.java
    │       │           └── util
    │       │               ├── MedianDistanceThresholdSelectionStrategy.java
    │       │               ├── SamplingMedianDistanceThresholdSelectionStrategy.java
    │       │               └── package-info.java
    │       └── overview.html
    └── test
        └── java
            └── com
                └── eatthepath
                    └── jvptree
                        ├── IntegerDistanceFunction.java
                        ├── MetaIteratorTest.java
                        ├── NearestNeighborCollectorTest.java
                        ├── VPTreeNodeTest.java
                        ├── VPTreeTest.java
                        ├── example
                        │   ├── CartesianDistanceFunction.java
                        │   ├── CartesianPoint.java
                        │   ├── ExampleApp.java
                        │   └── SpaceInvader.java
                        └── util
                            ├── MedianDistanceThresholdSelectionStrategyTest.java
                            └── SamplingMedianDistanceThresholdSelectionStrategyTest.java
/.gitignore:
--------------------------------------------------------------------------------
1 | *.class
2 |
3 | # Mobile Tools for Java (J2ME)
4 | .mtj.tmp/
5 |
6 | # Package Files #
7 | *.jar
8 | *.war
9 | *.ear
10 |
11 | # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
12 | hs_err_pid*
13 | /target
14 |
15 | # Eclipse project files
16 | .classpath
17 | .project
18 | .settings/
19 |
20 | # IntelliJ project files
21 | .idea/
22 | *.iml
23 |
24 | # Generated output
25 | doc/
26 |
27 | # OS detritus
28 | .DS_Store
29 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: java
2 | jdk:
3 | - openjdk8
4 | - openjdk10
5 | - openjdk11
6 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2014 Jon Chambers
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [Build status](https://travis-ci.org/jchambers/jvptree)
2 |
3 | # jvptree
4 |
5 | Jvptree is a generic [vantage-point tree](https://en.wikipedia.org/wiki/Vantage-point_tree) implementation written in Java that allows for quick (*O(log(n))*) searches for the nearest neighbors to a given point. Vantage-point trees are binary space partitioning trees that partition points according to their distance from each node's "vantage point." Points that are closer than a chosen threshold go into one child node, while points that are farther away go into the other. Vantage point trees operate on any [metric space](https://en.wikipedia.org/wiki/Metric_space).
6 |
7 | Steve Hanov has written a great and accessible [introduction to vp-trees](http://stevehanov.ca/blog/index.php?id=130).
8 |
9 | ## Getting jvptree
10 |
11 | If you use [Maven](http://maven.apache.org/), you can add jvptree to your project by adding the following dependency declaration to your POM:
12 |
13 | ```xml
14 | <dependency>
15 |     <groupId>com.eatthepath</groupId>
16 |     <artifactId>jvptree</artifactId>
17 |     <version>0.2</version>
18 | </dependency>
19 | ```
20 |
21 | If you don't use Maven, you can download jvptree as a `.jar` file and add it to your project directly. Jvptree has no external dependencies, and works with Java 1.7 and newer.
22 |
23 | ## Major concepts
24 |
25 | The main thing vantage-point trees do is partition points into groups that are closer or farther than a given distance threshold. To do that, a vp-tree needs to be able to figure out how far apart any two points are and also decide what to use as a distance threshold. At a minimum, you'll need to provide a distance function that can calculate the distance between points. You may optionally specify a threshold selection strategy; if you don't, a reasonable default will be used.
26 |
27 | ### Distance functions
28 |
29 | You must always specify a [distance function](http://jchambers.github.io/jvptree/apidocs/0.2/com/eatthepath/jvptree/DistanceFunction.html) when creating a vp-tree. Distance functions take two points as arguments and must satisfy the requirements of a metric space, namely:
30 |
31 | - d(x, y) >= 0
32 | - d(x, y) = 0 if and only if x == y
33 | - d(x, y) == d(y, x)
34 | - d(x, z) <= d(x, y) + d(y, z)
35 |
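As a rough illustration of the four rules above, the following sketch spot-checks them for a plain Euclidean distance over a handful of sample points. The class `MetricAxiomsSketch`, its methods, and the `{x, y}` array representation are hypothetical stand-ins chosen to keep the example self-contained; they are not part of jvptree. A check like this can only catch violations on the samples it is given and does not prove that a function is a metric.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jvptree: spot-checks the metric-space rules on sample points.
final class MetricAxiomsSketch {

    // Stand-in for a jvptree distance function; points are two-element {x, y} arrays (an assumption).
    static double euclidean(final double[] a, final double[] b) {
        final double dx = a[0] - b[0];
        final double dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Returns false as soon as any sampled pair or triple violates one of the four rules.
    static boolean satisfiesMetricAxioms(final List<double[]> points, final double epsilon) {
        for (final double[] x : points) {
            for (final double[] y : points) {
                final double dxy = euclidean(x, y);

                if (dxy < 0) return false;                                    // d(x, y) >= 0
                if (Arrays.equals(x, y) != (dxy == 0)) return false;          // d(x, y) = 0 iff x == y
                if (Math.abs(dxy - euclidean(y, x)) > epsilon) return false;  // d(x, y) == d(y, x)

                for (final double[] z : points) {
                    // d(x, z) <= d(x, y) + d(y, z), with a small tolerance for rounding
                    if (euclidean(x, z) > dxy + euclidean(y, z) + epsilon) return false;
                }
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final List<double[]> samples = Arrays.asList(
                new double[] {0, 0}, new double[] {3, 4}, new double[] {-1, 2.5});

        System.out.println(satisfiesMetricAxioms(samples, 1e-9));
    }
}
```

A check along these lines might be useful in a unit test for a custom distance function, since a function that breaks the triangle inequality can cause a vp-tree search to skip the wrong branch.
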
36 | ### Threshold selection strategies
37 |
38 | You may optionally specify a [strategy for choosing a distance threshold](http://jchambers.github.io/jvptree/apidocs/0.2/com/eatthepath/jvptree/ThresholdSelectionStrategy.html) for partitioning. By default, jvptree will use a [sampling median strategy](http://jchambers.github.io/jvptree/apidocs/0.2/com/eatthepath/jvptree/util/SamplingMedianDistanceThresholdSelectionStrategy.html), where it will take the median distance from a small subset of the points to partition. Jvptree also includes a [threshold selection strategy that takes the median of *all* points](http://jchambers.github.io/jvptree/apidocs/0.2/com/eatthepath/jvptree/util/MedianDistanceThresholdSelectionStrategy.html) to be partitioned; this is slower, but may result in a more balanced tree. Most users will not need to specify a threshold selection strategy.
39 |
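To make the difference between the two ideas concrete, here is a minimal sketch of median-based threshold selection, once over all points and once over an evenly spaced sample. The class `MedianThresholdSketch` and its methods are hypothetical and exist only for illustration; jvptree's own strategies may choose samples and compute medians differently. The sketch assumes a non-empty list of two-element `{x, y}` arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of median-based threshold selection; not jvptree's implementation.
final class MedianThresholdSketch {

    static double distance(final double[] a, final double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Median of the distances from the vantage point to every point (upper median for even counts).
    // Assumes the list is not empty.
    static double medianThreshold(final List<double[]> points, final double[] vantagePoint) {
        final double[] distances = new double[points.size()];

        for (int i = 0; i < distances.length; i++) {
            distances[i] = distance(vantagePoint, points.get(i));
        }

        Arrays.sort(distances);
        return distances[distances.length / 2];
    }

    // Same idea, but only an evenly spaced sample of at most sampleSize points is measured.
    static double samplingMedianThreshold(final List<double[]> points, final double[] vantagePoint,
                                          final int sampleSize) {
        if (points.size() <= sampleSize) {
            return medianThreshold(points, vantagePoint);
        }

        final List<double[]> sample = new ArrayList<>(sampleSize);
        final int step = points.size() / sampleSize;

        for (int i = 0; i < sampleSize; i++) {
            sample.add(points.get(i * step));
        }

        return medianThreshold(sample, vantagePoint);
    }
}
```

The full median sorts every distance, so its cost grows with the number of points being partitioned, while the sampled version does a fixed amount of work at the price of a less exact split.
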
40 | ### Node capacity
41 |
42 | Additionally, you may specify a desired capacity for the tree's leaf nodes. It's worth mentioning early that you almost certainly do not need to worry about this; a reasonable default (32 points) will be used, and most users won't realize significant performance gains by tuning it.
43 |
44 | Still, for those in need, you may choose a desired capacity for leaf nodes in a vp-tree. At one extreme, leaf nodes may contain only a single point. This means that searches will have to traverse more nodes, but once a leaf node is reached, fewer points will need to be searched to find nearest neighbors.
45 |
46 | Using a larger node capacity will result in a "flatter" tree, and fewer nodes will need to be traversed when searching, but more points will need to be tested once a search reaches a leaf node. Larger node capacities also lead to less memory overhead because there are fewer nodes in the tree.
47 |
48 | As a general rule of thumb, node capacities should be on the same order of magnitude as your typical search result size. The idea is that if a search reaches a leaf node, most of the points in the node will wind up in the collection of nearest neighbors (i.e. they all would have had to be checked anyhow) and few other nodes will have to be visited to gather any remaining neighbors.
49 |
50 | ## Using jvptree
51 |
52 | As discussed above, you must provide a distance function when creating a vp-tree and may optionally specify a distance threshold selection strategy and leaf node capacity. As a simple example, let's say you're writing a version of [Space Invaders](https://en.wikipedia.org/wiki/Space_Invaders), and you know you'll need to find the closest enemies to the player's position. To start, everything on the playing field will exist at a specific point:
53 |
54 | ```java
55 | public interface CartesianPoint {
56 | double getX();
57 | double getY();
58 | }
59 | ```
60 |
61 | To create a vp-tree, you must provide a distance function that will return the distance between any two given points. In this example, you might create a `CartesianDistanceFunction` class:
62 |
63 | ```java
64 | public class CartesianDistanceFunction implements DistanceFunction<CartesianPoint> {
65 |
66 | public double getDistance(final CartesianPoint firstPoint, final CartesianPoint secondPoint) {
67 | final double deltaX = firstPoint.getX() - secondPoint.getX();
68 | final double deltaY = firstPoint.getY() - secondPoint.getY();
69 |
70 | return Math.sqrt((deltaX * deltaX) + (deltaY * deltaY));
71 | }
72 | }
73 | ```
74 |
75 | Once you have your distance function, you can create a vp-tree that stores the locations of all of the space invaders on the playing field:
76 |
77 | ```java
78 | final VPTree<CartesianPoint, SpaceInvader> vpTree =
79 | new VPTree<>(new CartesianDistanceFunction(), enemies);
80 | ```
81 |
82 | In this case, we provide all of our points at construction time, but you may also create an empty tree and add points later. The `VPTree` class implements Java's [`Collection`](http://docs.oracle.com/javase/7/docs/api/java/util/Collection.html) interface and supports all optional operations.
83 |
84 | Note that a `VPTree` has two generic types: a general "base" point type and a more specific type for the elements actually stored in the tree. You can query the tree using any instance of the base type, but still know that you'll be receiving a list of the more specific type as a result of the query. In our example, this is helpful because the player's location is a Cartesian point, but the player is not a space invader. It wouldn't make much sense to create a new space invader at the player's location just to query the vp-tree, and so this construct allows us to query the tree with the player's location instead.
85 |
86 | With your tree created, you can find the closest enemies to the player's position. For example, to find (up to) the ten closest space invaders:
87 |
88 | ```java
89 | final List<SpaceInvader> nearestEnemies =
90 | vpTree.getNearestNeighbors(playerPosition, 10);
91 | ```
92 |
93 | You could also find all of the enemies that are within firing range of the player:
94 |
95 | ```java
96 | final List<SpaceInvader> enemiesWithinFiringRange =
97 | vpTree.getAllWithinDistance(playerPosition, 4.5);
98 | ```
99 |
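For comparison, the same nearest-neighbor question can be answered without a tree by scanning every point and keeping only the closest few in a bounded priority queue. The sketch below is a hypothetical brute-force baseline, not jvptree code; the class `BruteForceNeighborsSketch` and the `{x, y}` array representation are assumptions made to keep it self-contained. It visits every point on every query, which is the work a vp-tree is meant to avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical brute-force baseline for a k-nearest-neighbors query; not part of jvptree.
final class BruteForceNeighborsSketch {

    static double distance(final double[] a, final double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Returns up to k points closest to the query, sorted from nearest to farthest.
    static List<double[]> nearestNeighbors(final List<double[]> points, final double[] query, final int k) {
        final Comparator<double[]> byDistance = Comparator.comparingDouble(p -> distance(query, p));

        // Reversed ordering puts the farthest retained point at the head, so it is cheap to evict.
        final PriorityQueue<double[]> farthestFirst =
                new PriorityQueue<>(Math.max(1, k), byDistance.reversed());

        for (final double[] point : points) {
            farthestFirst.add(point);

            if (farthestFirst.size() > k) {
                farthestFirst.poll();
            }
        }

        final List<double[]> result = new ArrayList<>(farthestFirst);
        result.sort(byDistance);

        return result;
    }
}
```

A baseline like this can also serve as a reference in tests: for the same points and query, a tree-based search should return neighbors at the same distances.
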
100 | ## License
101 |
102 | Jvptree is available to the public under the [MIT License](http://opensource.org/licenses/MIT).
103 |
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 4.0.0
4 |
5 | com.eatthepath
6 | jvptree
7 | jar
8 | 0.4.0-SNAPSHOT
9 | jvptree
10 | A generic vp-tree implementation in Java
11 |
12 |
13 |
14 | The MIT License (MIT)
15 | http://opensource.org/licenses/MIT
16 | repo
17 |
18 |
19 |
20 |
21 | org.sonatype.oss
22 | oss-parent
23 | 7
24 |
25 |
26 |
27 | UTF-8
28 |
29 |
30 |
31 |
32 | org.junit.jupiter
33 | junit-jupiter-engine
34 | 5.7.1
35 | test
36 |
37 |
38 |
39 | org.junit.jupiter
40 | junit-jupiter-params
41 | 5.7.1
42 | test
43 |
44 |
45 |
46 | org.openjdk.jmh
47 | jmh-core
48 | 1.21
49 | test
50 |
51 |
52 |
53 | org.openjdk.jmh
54 | jmh-generator-annprocess
55 | 1.21
56 | test
57 |
58 |
59 |
60 |
61 |
62 |
63 | org.apache.maven.plugins
64 | maven-surefire-plugin
65 | 3.0.0-M4
66 |
67 |
68 |
69 | org.apache.maven.plugins
70 | maven-jar-plugin
71 | 3.0.2
72 |
73 |
74 | **/.gitignore
75 |
76 |
77 |
78 |
79 |
80 |
81 | org.codehaus.mojo
82 | build-helper-maven-plugin
83 | 3.0.0
84 |
85 |
86 | add-test-source
87 | generate-test-sources
88 |
89 | add-test-source
90 |
91 |
92 |
93 | src/benchmark/java
94 |
95 |
96 |
97 |
98 |
99 |
100 |
101 | org.apache.maven.plugins
102 | maven-compiler-plugin
103 | 3.8.1
104 |
105 |
106 | 1.8
107 | 1.8
108 |
109 |
110 |
111 |
112 |
113 | testCompile
114 |
115 |
116 |
117 |
118 |
119 | org.openjdk.jmh
120 | jmh-generator-annprocess
121 | 1.21
122 |
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 | org.apache.maven.plugins
131 | maven-source-plugin
132 | 2.2.1
133 |
134 |
135 |
136 | attach-sources
137 |
138 | jar
139 |
140 |
141 |
142 |
143 |
144 |
145 | org.apache.maven.plugins
146 | maven-assembly-plugin
147 | 3.1.1
148 |
149 |
150 |
151 | src/assembly/benchmark.xml
152 |
153 |
154 |
155 |
156 |
157 | make-assembly
158 | package
159 |
160 | single
161 |
162 |
163 | true
164 |
165 |
166 | org.openjdk.jmh.Main
167 |
168 |
169 |
170 |
171 |
172 |
173 |
174 |
175 |
176 |
177 |
178 | release-sign-artifacts
179 |
180 |
181 | performRelease
182 | true
183 |
184 |
185 |
186 |
187 |
188 | org.apache.maven.plugins
189 | maven-javadoc-plugin
190 | 2.9.1
191 |
192 |
193 | attach-javadocs
194 |
195 | jar
196 |
197 |
198 |
199 |
200 |
201 | org.apache.maven.plugins
202 | maven-source-plugin
203 | 2.2.1
204 |
205 |
206 | attach-sources
207 |
208 | jar
209 |
210 |
211 |
212 |
213 |
214 | org.apache.maven.plugins
215 | maven-gpg-plugin
216 | 1.1
217 |
218 |
219 | sign-artifacts
220 | verify
221 |
222 | sign
223 |
224 |
225 |
226 |
227 |
228 |
229 |
230 |
231 |
232 |
233 |
234 | jon
235 | Jon Chambers
236 | jon.chambers@gmail.com
237 | https://github.com/jchambers
238 |
239 | developer
240 |
241 | -5
242 |
243 |
244 | 2015
245 | https://github.com/jchambers/jvptree
246 |
247 | scm:git:https://github.com/jchambers/jvptree.git
248 | scm:git:git@github.com:jchambers/jvptree.git
249 | https://github.com/jchambers/jvptree
250 |
251 |
252 |
--------------------------------------------------------------------------------
/src/assembly/benchmark.xml:
--------------------------------------------------------------------------------
1 |
3 | benchmark
4 |
5 | jar
6 |
7 | false
8 |
9 |
10 | /
11 | true
12 | true
13 | test
14 |
15 |
16 |
17 |
18 | ${project.build.directory}/test-classes
19 | /
20 |
21 | **/*
22 |
23 | true
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/src/benchmark/java/com/eatthepath/jvptree/CartesianDistanceFunction.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import com.eatthepath.jvptree.DistanceFunction;
4 |
5 | public class CartesianDistanceFunction implements DistanceFunction<CartesianPoint> {
6 |
7 | @Override
8 | public double getDistance(final CartesianPoint firstPoint, final CartesianPoint secondPoint) {
9 | final double deltaX = firstPoint.getX() - secondPoint.getX();
10 | final double deltaY = firstPoint.getY() - secondPoint.getY();
11 |
12 | return Math.sqrt((deltaX * deltaX) + (deltaY * deltaY));
13 | }
14 | }
15 |
--------------------------------------------------------------------------------
/src/benchmark/java/com/eatthepath/jvptree/CartesianPoint.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | public class CartesianPoint {
4 | private final double x;
5 | private final double y;
6 |
7 | public CartesianPoint(final double x, final double y) {
8 | this.x = x;
9 | this.y = y;
10 | }
11 |
12 | public double getX() {
13 | return this.x;
14 | }
15 |
16 | public double getY() {
17 | return this.y;
18 | }
19 | }
20 |
--------------------------------------------------------------------------------
/src/benchmark/java/com/eatthepath/jvptree/ThresholdSelectionBenchmark.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2014, Oracle America, Inc.
3 | * All rights reserved.
4 | *
5 | * Redistribution and use in source and binary forms, with or without
6 | * modification, are permitted provided that the following conditions are met:
7 | *
8 | * * Redistributions of source code must retain the above copyright notice,
9 | * this list of conditions and the following disclaimer.
10 | *
11 | * * Redistributions in binary form must reproduce the above copyright
12 | * notice, this list of conditions and the following disclaimer in the
13 | * documentation and/or other materials provided with the distribution.
14 | *
15 | * * Neither the name of Oracle nor the names of its contributors may be used
16 | * to endorse or promote products derived from this software without
17 | * specific prior written permission.
18 | *
19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
22 | * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
23 | * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
24 | * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
25 | * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
26 | * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
27 | * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
28 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
29 | * THE POSSIBILITY OF SUCH DAMAGE.
30 | */
31 |
32 | package com.eatthepath.jvptree;
33 |
34 | import java.util.ArrayList;
35 | import java.util.Collections;
36 | import java.util.List;
37 | import java.util.Random;
38 |
39 | import org.openjdk.jmh.annotations.Benchmark;
40 | import org.openjdk.jmh.annotations.Param;
41 | import org.openjdk.jmh.annotations.Scope;
42 | import org.openjdk.jmh.annotations.Setup;
43 | import org.openjdk.jmh.annotations.State;
44 |
45 | import com.eatthepath.jvptree.util.MedianDistanceThresholdSelectionStrategy;
46 | import com.eatthepath.jvptree.util.SamplingMedianDistanceThresholdSelectionStrategy;
47 |
48 | @State(Scope.Thread)
49 | public class ThresholdSelectionBenchmark {
50 |
51 | @Param({"100000"})
52 | public int pointCount;
53 |
54 | private List<CartesianPoint> points;
55 |
56 | private final Random random = new Random();
57 | private final CartesianDistanceFunction distanceFunction = new CartesianDistanceFunction();
58 |
59 | private final MedianDistanceThresholdSelectionStrategy<CartesianPoint, CartesianPoint> medianSelectionStrategy =
60 | new MedianDistanceThresholdSelectionStrategy<>();
61 |
62 | private final SamplingMedianDistanceThresholdSelectionStrategy<CartesianPoint, CartesianPoint> samplingMedianSelectionStrategy =
63 | new SamplingMedianDistanceThresholdSelectionStrategy<>(100);
64 |
65 | @Setup
66 | public void setUp() {
67 | this.points = new ArrayList<>(this.pointCount);
68 |
69 | for (int i = 0; i < this.pointCount; i++) {
70 | this.points.add(this.createRandomPoint());
71 | }
72 | }
73 |
74 | @Benchmark
75 | public double benchmarkRandomThresholdSelection() {
76 | final CartesianPoint origin = this.createRandomPoint();
77 |
78 | return this.distanceFunction.getDistance(origin, this.points.get(this.random.nextInt(this.pointCount)));
79 | }
80 |
81 | @Benchmark
82 | public double benchmarkMedianThresholdSelection() {
83 | final CartesianPoint origin = this.createRandomPoint();
84 |
85 | return this.medianSelectionStrategy.selectThreshold(this.points, origin, this.distanceFunction);
86 | }
87 |
88 | @Benchmark
89 | public double benchmarkSamplingMedianThresholdSelection() {
90 | final CartesianPoint origin = this.createRandomPoint();
91 |
92 | return this.samplingMedianSelectionStrategy.selectThreshold(this.points, origin, this.distanceFunction);
93 | }
94 |
95 | @Benchmark
96 | public double benchmarkNaiveMedianThresholdSelection() {
97 | final CartesianPoint origin = this.createRandomPoint();
98 |
99 | Collections.sort(this.points, new DistanceComparator<>(origin, this.distanceFunction));
100 |
101 | return this.distanceFunction.getDistance(origin, this.points.get(this.points.size() / 2));
102 | }
103 |
104 | private CartesianPoint createRandomPoint() {
105 | return new CartesianPoint(this.random.nextDouble(), this.random.nextDouble());
106 | }
107 | }
108 |
--------------------------------------------------------------------------------
/src/benchmark/java/com/eatthepath/jvptree/VPTreeConstructionBenchmark.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.ArrayList;
4 | import java.util.List;
5 | import java.util.Random;
6 |
7 | import org.openjdk.jmh.annotations.Benchmark;
8 | import org.openjdk.jmh.annotations.Param;
9 | import org.openjdk.jmh.annotations.Scope;
10 | import org.openjdk.jmh.annotations.Setup;
11 | import org.openjdk.jmh.annotations.State;
12 |
13 | @State(Scope.Thread)
14 | public class VPTreeConstructionBenchmark {
15 |
16 | @Param({"100000"})
17 | public int pointCount;
18 |
19 | private List<CartesianPoint> points;
20 |
21 | private final Random random = new Random();
22 | private final CartesianDistanceFunction distanceFunction = new CartesianDistanceFunction();
23 |
24 | @Setup
25 | public void setUp() {
26 | this.points = new ArrayList<>(this.pointCount);
27 |
28 | for (int i = 0; i < this.pointCount; i++) {
29 | this.points.add(this.createRandomPoint());
30 | }
31 | }
32 |
33 | @Benchmark
34 | public VPTree<CartesianPoint, CartesianPoint> benchmarkConstructTreeWithPoints() {
35 | return new VPTree<>(this.distanceFunction, this.points);
36 | }
37 |
38 | @Benchmark
39 | public VPTree<CartesianPoint, CartesianPoint> benchmarkConstructAndAddPoints() {
40 | final VPTree<CartesianPoint, CartesianPoint> vptree = new VPTree<>(this.distanceFunction);
41 | vptree.addAll(this.points);
42 |
43 | return vptree;
44 | }
45 |
46 | private CartesianPoint createRandomPoint() {
47 | return new CartesianPoint(this.random.nextDouble(), this.random.nextDouble());
48 | }
49 | }
50 |
--------------------------------------------------------------------------------
/src/benchmark/java/com/eatthepath/jvptree/VPTreeQueryBenchmark.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.ArrayList;
4 | import java.util.Collections;
5 | import java.util.List;
6 | import java.util.Random;
7 |
8 | import org.openjdk.jmh.annotations.Benchmark;
9 | import org.openjdk.jmh.annotations.Param;
10 | import org.openjdk.jmh.annotations.Scope;
11 | import org.openjdk.jmh.annotations.Setup;
12 | import org.openjdk.jmh.annotations.State;
13 |
14 | import com.eatthepath.jvptree.util.SamplingMedianDistanceThresholdSelectionStrategy;
15 |
16 | @State(Scope.Thread)
17 | public class VPTreeQueryBenchmark {
18 |
19 | @Param({"100000"})
20 | public int pointCount;
21 |
22 | @Param({"2", "16", "128"})
23 | public int nodeSize;
24 |
25 | @Param({"2", "16", "128"})
26 | public int resultSetSize;
27 |
28 | private List<CartesianPoint> points;
29 | private VPTree<CartesianPoint, CartesianPoint> vptree;
30 |
31 | private final Random random = new Random();
32 | private final CartesianDistanceFunction distanceFunction = new CartesianDistanceFunction();
33 |
34 | @Setup
35 | public void setUp() {
36 | this.points = new ArrayList<>(this.pointCount);
37 |
38 | for (int i = 0; i < this.pointCount; i++) {
39 | this.points.add(this.createRandomPoint());
40 | }
41 |
42 | this.vptree = new VPTree<>(this.distanceFunction,
43 | new SamplingMedianDistanceThresholdSelectionStrategy<CartesianPoint, CartesianPoint>(32),
44 | this.nodeSize, this.points);
45 | }
46 |
47 | @Benchmark
48 | public List<CartesianPoint> benchmarkNaiveSearch() {
49 | Collections.sort(this.points, new DistanceComparator<>(this.createRandomPoint(), this.distanceFunction));
50 | return this.points.subList(0, this.resultSetSize);
51 | }
52 |
53 | @Benchmark
54 | public List<CartesianPoint> benchmarkQueryTree() {
55 | return this.vptree.getNearestNeighbors(this.createRandomPoint(), this.resultSetSize);
56 | }
57 |
58 | private CartesianPoint createRandomPoint() {
59 | return new CartesianPoint(this.random.nextDouble(), this.random.nextDouble());
60 | }
61 | }
62 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/DistanceComparator.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.Comparator;
4 |
5 | /**
6 | * A {@code Comparator} that orders points by their distance (as determined by a given distance function) from a given
7 | * origin point.
8 | *
9 | * @author Jon Chambers
10 | */
11 | public class DistanceComparator<T> implements Comparator<T> {
12 | private final T origin;
13 | private final DistanceFunction<? super T> distanceFunction;
14 |
15 | /**
16 | * Constructs a new distance comparator with the given origin point and distance function.
17 | *
18 | * @param origin the point from which distances to other points will be calculated
19 | * @param distanceFunction the function that calculates the distance between the origin and the given points
20 | */
21 | public DistanceComparator(final T origin, final DistanceFunction<? super T> distanceFunction) {
22 | this.origin = origin;
23 | this.distanceFunction = distanceFunction;
24 | }
25 |
26 | /**
27 | * Compares two points by their distance from this distance comparator's origin point.
28 | *
29 | * @param o1 the first point to be compared
30 | * @param o2 the second point to be compared
31 | *
32 | * @return a negative integer if o1 is closer to the origin than o2, a positive integer if o2 is closer to the
33 | * origin than o1, or zero if o1 and o2 are equidistant from the origin
34 | */
35 | public int compare(final T o1, final T o2) {
36 | return Double.compare(
37 | this.distanceFunction.getDistance(this.origin, o1),
38 | this.distanceFunction.getDistance(this.origin, o2));
39 | }
40 | }
41 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/DistanceFunction.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | /**
4 | * <p>A function that calculates the distance between two points. For the purposes of vp-trees, distance functions must
5 | * conform to the rules of a metric space, namely:</p>
6 | *
7 | * <ol>
8 | * <li>d(x, y) ≥ 0</li>
9 | * <li>d(x, y) = 0 if and only if x = y</li>
10 | * <li>d(x, y) = d(y, x)</li>
11 | * <li>d(x, z) ≤ d(x, y) + d(y, z)</li>
12 | * </ol>
13 | *
14 | * @author Jon Chambers
15 | */
16 | public interface DistanceFunction<T> {
17 |
18 | /**
19 | * Returns the distance between two points.
20 | *
21 | * @param firstPoint the first point
22 | * @param secondPoint the second point
23 | *
24 | * @return the distance between the two points
25 | */
26 | double getDistance(T firstPoint, T secondPoint);
27 | }
28 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/MetaIterator.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.ArrayDeque;
4 | import java.util.Collection;
5 | import java.util.Deque;
6 | import java.util.Iterator;
7 | import java.util.NoSuchElementException;
8 |
9 | /**
10 | * An iterator that concatenates a number of sub-iterators.
11 | *
12 | * @author Jon Chambers
13 | */
14 | class MetaIterator<E> implements Iterator<E> {
15 |
16 | private final Deque<Iterator<E>> iterators;
17 |
18 | /**
19 | * Constructs an iterator that concatenates the contents of the given collection of iterators.
20 | *
21 | * @param iterators the iterators to concatenate
22 | */
23 | public MetaIterator(final Collection<Iterator<E>> iterators) {
24 | this.iterators = new ArrayDeque<>(iterators);
25 | }
26 |
27 | /*
28 | * (non-Javadoc)
29 | * @see java.util.Iterator#hasNext()
30 | */
31 | public boolean hasNext() {
32 | while (!this.iterators.isEmpty()) {
33 | if (this.iterators.peek().hasNext()) {
34 | return true;
35 | }
36 |
37 | this.iterators.pop();
38 | }
39 |
40 | return false;
41 | }
42 |
43 | /*
44 | * (non-Javadoc)
45 | * @see java.util.Iterator#next()
46 | */
47 | public E next() {
48 | if (!this.hasNext()) {
49 | throw new NoSuchElementException();
50 | }
51 |
52 | return this.iterators.peek().next();
53 | }
54 |
55 | /*
56 | * (non-Javadoc)
57 | * @see java.util.Iterator#remove()
58 | */
59 | public void remove() {
60 | throw new UnsupportedOperationException();
61 | }
62 | }
63 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/NearestNeighborCollector.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.ArrayList;
4 | import java.util.List;
5 | import java.util.PriorityQueue;
6 |
7 | /**
8 | * A utility class that uses a priority queue to efficiently collect results for a k-nearest-neighbors query in a
9 | * vp-tree.
10 | *
11 | * @author Jon Chambers
12 | */
13 | class NearestNeighborCollector<P, E extends P> {
14 | private final P queryPoint;
15 | private final int capacity;
16 |
17 | private final DistanceFunction<P> distanceFunction;
18 | private final DistanceComparator<P> distanceComparator;
19 | private final PriorityQueue<E> priorityQueue;
20 |
21 | private double distanceToFarthestPoint;
22 |
23 | /**
24 | * Constructs a new nearest neighbor collector that selectively accepts points that are close to the given query
25 | * point as determined by the given distance function. Up to the given number of nearest neighbors are collected,
26 | * and if neighbors are found that are closer than points in the current set, the most distant previously collected
27 | * point is replaced with the closer candidate.
28 | *
29 | * @param queryPoint the point for which nearest neighbors are to be collected
30 | * @param distanceFunction the distance function to be used to determine the distance between the query point and
31 | * potential neighbors
32 | * @param capacity the maximum number of nearest neighbors to collect
33 | */
34 | public NearestNeighborCollector(final P queryPoint, final DistanceFunction<P> distanceFunction, final int capacity) {
35 | if (capacity < 1) {
36 | throw new IllegalArgumentException("Capacity must be positive.");
37 | }
38 |
39 | this.queryPoint = queryPoint;
40 | this.distanceFunction = distanceFunction;
41 | this.capacity = capacity;
42 |
43 | this.distanceComparator = new DistanceComparator<>(queryPoint, distanceFunction);
44 |
45 | this.priorityQueue =
46 | new PriorityQueue<>(this.capacity, java.util.Collections.reverseOrder(this.distanceComparator));
47 | }
48 |
49 | /**
50 | * Returns the query point for this collector.
51 | *
52 | * @return the query point for this collector
53 | */
54 | public P getQueryPoint() {
55 | return this.queryPoint;
56 | }
57 |
58 | /**
59 | * Offers a point to this collector. The point may or may not be added to the collection; points will only be added
60 | * if the collector is not already full, or if the collector is full, but the offered point is closer to the query
61 | * point than the most distant point already in the collection.
62 | *
63 | * @param point the point to offer to this collector
64 | */
65 | public void offerPoint(final E point) {
66 | final boolean pointAdded;
67 |
68 | if (this.priorityQueue.size() < this.capacity) {
69 | this.priorityQueue.add(point);
70 | pointAdded = true;
71 | } else {
72 | assert this.priorityQueue.size() > 0;
73 |
74 | final double distanceToNewPoint = this.distanceFunction.getDistance(this.queryPoint, point);
75 |
76 | if (distanceToNewPoint < this.distanceToFarthestPoint) {
77 | this.priorityQueue.poll();
78 | this.priorityQueue.add(point);
79 | pointAdded = true;
80 | } else {
81 | pointAdded = false;
82 | }
83 | }
84 |
85 | if (pointAdded) {
86 | this.distanceToFarthestPoint = this.distanceFunction.getDistance(this.queryPoint, this.priorityQueue.peek());
87 | }
88 | }
89 |
90 | /**
91 | * Returns the point retained by this collector that is the farthest from the query point.
92 | *
93 | * @return the point retained by this collector that is the farthest from the query point
94 | */
95 | public E getFarthestPoint() {
96 | return this.priorityQueue.peek();
97 | }
98 |
99 | /**
100 | * Returns a list of points retained by this collector, sorted by distance from the query point.
101 | *
102 | * @return a list of points retained by this collector, sorted by distance from the query point
103 | */
104 | public List<E> toSortedList() {
105 | final ArrayList<E> sortedList = new ArrayList<>(this.priorityQueue);
106 | java.util.Collections.sort(sortedList, this.distanceComparator);
107 |
108 | return sortedList;
109 | }
110 | }
111 |
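The collector above keeps at most `capacity` points by ordering its priority queue farthest-first, so the head of the queue is always the point to evict when a closer one is offered. The following is a minimal stand-alone sketch of that idea, not the library's code. It works on plain `double` values with absolute difference as the distance, and the names `BoundedNearestSketch` and `nearest` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-alone sketch; not part of jvptree.
final class BoundedNearestSketch {

    // Returns the (at most) k candidates closest to the query, nearest first.
    static List<Double> nearest(final double query, final List<Double> candidates, final int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive.");
        }

        // Orders values by increasing distance from the query.
        final Comparator<Double> byDistance =
                Comparator.comparingDouble(value -> Math.abs(value - query));

        // Reversing the order keeps the farthest retained value at the head of the queue.
        final PriorityQueue<Double> farthestFirst = new PriorityQueue<>(k, byDistance.reversed());

        for (final Double candidate : candidates) {
            if (farthestFirst.size() < k) {
                // Not yet full: accept unconditionally.
                farthestFirst.add(candidate);
            } else if (byDistance.compare(candidate, farthestFirst.peek()) < 0) {
                // Full: replace the farthest value only if the candidate is strictly closer.
                farthestFirst.poll();
                farthestFirst.add(candidate);
            }
        }

        // The queue's iteration order is unspecified, so sort before returning.
        final List<Double> sorted = new ArrayList<>(farthestFirst);
        sorted.sort(byDistance);
        return sorted;
    }
}
```

With a query of 5.0, the candidates 1.0, 4.0, 9.0, 5.5 and 7.0, and k = 2, the method returns [5.5, 4.0]. The candidate 9.0 ties with 1.0 at the time it is offered and is rejected because the comparison is strict, which matches the `<` test in `offerPoint`.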
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/PartitionException.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | /**
4 | * Indicates that a list of points could not be partitioned by distance because either all points are on one side of
5 | * the distance threshold or all points are of equal distance from the pivot point.
6 | *
7 | * @author Jon Chambers
8 | */
9 | class PartitionException extends Exception {
10 | private static final long serialVersionUID = 1L;
11 | }
12 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/PointFilter.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | /**
4 | * A stateless filter that can determine whether points should be included in a spatial index's result set when
5 | * searching for nearby neighbors.
6 | *
7 | * @param <T> the type of point to which this filter applies
8 | *
9 | * @see SpatialIndex#getNearestNeighbors(Object, int, PointFilter)
10 | * @see SpatialIndex#getAllWithinDistance(Object, double, PointFilter)
11 | */
12 | public interface PointFilter<T> {
13 |
14 | /**
15 | * Tests whether a point should be included in a spatial index's result set when searching for nearby neighbors.
16 | *
17 | * @param point the point to test
18 | *
19 | * @return {@code true} if the point may be included in the result set or {@code false} if it should be excluded
20 | */
21 | boolean allowPoint(T point);
22 | }
23 |
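Because the filter interface declares a single method, it can be written as a lambda on Java 8 or later. The sketch below redeclares the interface locally so that it compiles by itself. `PointFilterSketch` and `keepAllowed` are hypothetical names used only for illustration, and the sketch says nothing about how an index applies a filter internally. It also shows why a parameter typed `PointFilter<? super E>` is convenient: a filter written for a base type can be applied to a list of a more specific type.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the interface so that the sketch is self-contained.
interface PointFilter<T> {
    boolean allowPoint(T point);
}

// Hypothetical helper for illustration; not part of jvptree.
final class PointFilterSketch {

    // Returns the points accepted by the filter, preserving their original order.
    static <E> List<E> keepAllowed(final List<E> points, final PointFilter<? super E> filter) {
        final List<E> allowed = new ArrayList<>();

        for (final E point : points) {
            if (filter.allowPoint(point)) {
                allowed.add(point);
            }
        }

        return allowed;
    }
}
```

A filter declared as `PointFilter<Number> positive = n -> n.doubleValue() > 0;` can then be passed together with a `List<Integer>`; for the list [-1, 2, 3] the helper returns [2, 3].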
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/SpatialIndex.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.Collection;
4 | import java.util.List;
5 |
6 | /**
7 | * A collection of points that can be searched efficiently to find points near a given query point. A spatial index
8 | * takes two generic types. The first, {@code P}, is the base type of point for which distances can be measured. The
9 | * second, {@code E}, is the specific type of point contained within the index. The two ideas are separated because
10 | * callers may want to use an instance of {@code E} when querying the index. For example, an index that is used to
11 | * search for local businesses might have a base type of {@code GeospatialPoint}, but a specific type of
12 | * {@code HardwareStore}, which implements {@code GeospatialPoint} but has a number of additional required properties.
13 | * By separating the types, callers may realize the benefits of using a specific type when working with elements in the
14 | * index without the need to construct a new {@code HardwareStore} instance when querying points. Instead, they might
15 | * call {@link SpatialIndex#getNearestNeighbors(Object, int)} with a {@code new GeospatialPoint} instead of a much
16 | * heavier {@code HardwareStore}.
17 | *
18 | * @author Jon Chambers
19 | *
20 | * @param <P> the base type of points between which distances can be measured
21 | * @param <E> the specific type of point contained in this vantage point tree
22 | */
23 | public interface SpatialIndex<P, E extends P> extends Collection<E> {
24 | /**
25 | * <p>Returns a list of the nearest neighbors to a given query point. The returned list is sorted by increasing
26 | * distance from the query point.</p>
27 | *
28 | * <p>This returned list will contain at most {@code maxResults} elements (and may contain fewer if
29 | * {@code maxResults} is larger than the number of points in the index). If multiple points have the same distance
30 | * from the query point, the order in which they appear in the returned list is undefined. By extension, if multiple
31 | * points have the same distance from the query point and those points would "straddle" the end of the
32 | * returned list, which points are included in the list and which are cut off is not prescribed.</p>
33 | *
34 | * @param queryPoint the point for which to find neighbors
35 | * @param maxResults the maximum length of the returned list
36 | *
37 | * @return a list of the nearest neighbors to the given query point sorted by increasing distance from the query
38 | * point
39 | */
40 | List<E> getNearestNeighbors(P queryPoint, int maxResults);
41 |
42 | /**
43 | * <p>Returns a list of the nearest neighbors accepted by the given filter to a given query point. The returned list
44 | * is sorted by increasing distance from the query point.</p>
45 | *
46 | * <p>This returned list will contain at most {@code maxResults} elements (and may contain fewer if
47 | * {@code maxResults} is larger than the number of points in the index). If multiple points have the same distance
48 | * from the query point, the order in which they appear in the returned list is undefined. By extension, if multiple
49 | * points have the same distance from the query point and those points would "straddle" the end of the
50 | * returned list, which points are included in the list and which are cut off is not prescribed.</p>
51 | *
52 | * @param queryPoint the point for which to find neighbors
53 | * @param maxResults the maximum length of the returned list
54 | * @param filter a filter to apply to each element to determine if it should be included in the list of neighbors
55 | *
56 | * @return a list of the nearest neighbors to the given query point sorted by increasing distance from the query
57 | * point
58 | */
59 | List<E> getNearestNeighbors(P queryPoint, int maxResults, PointFilter<? super E> filter);
60 |
61 | /**
62 | * Returns a list of all points within a given distance to a query point.
63 | *
64 | * @param queryPoint the point for which to find neighbors
65 | * @param maxDistance the maximum allowable distance from the query point; points farther away than
66 | * {@code maxDistance} will not be included in the returned list
67 | *
68 | * @return a list of all points within the given distance to the query point; the returned list is sorted in order
69 | * of increasing distance from the query point
70 | */
71 | List<E> getAllWithinDistance(P queryPoint, double maxDistance);
72 |
73 | /**
74 | * Returns a list of all points within a given distance to a query point that match the given filter.
75 | *
76 | * @param queryPoint the point for which to find neighbors
77 | * @param maxDistance the maximum allowable distance from the query point; points farther away than
78 | * {@code maxDistance} will not be included in the returned list
79 | * @param filter a filter to apply to each element to determine if it should be included in the list of neighbors
80 | *
81 | * @return a list of all points within the given distance to the query point; the returned list is sorted in order
82 | * of increasing distance from the query point
83 | */
84 | List<E> getAllWithinDistance(P queryPoint, double maxDistance, PointFilter<? super E> filter);
85 | }
86 |
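The separation between a base type and an element type described in the interface documentation can be illustrated with a brute-force search. This is only a sketch under stated assumptions: `PlanarPoint`, `Coordinates`, `Store`, and `TypeSeparationSketch` are hypothetical, the `DistanceFunction` declaration is a local stand-in with the `getDistance(a, b)` shape used by the collector, and sorting every element is a deliberately naive substitute for a real index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Local stand-in so that the sketch is self-contained.
interface DistanceFunction<T> {
    double getDistance(T firstPoint, T secondPoint);
}

// Hypothetical base type (plays the role of P).
interface PlanarPoint {
    double getX();
    double getY();
}

// Lightweight point used only for querying.
final class Coordinates implements PlanarPoint {
    private final double x;
    private final double y;

    Coordinates(final double x, final double y) {
        this.x = x;
        this.y = y;
    }

    public double getX() { return this.x; }
    public double getY() { return this.y; }
}

// Heavier element type stored in the collection (plays the role of E).
final class Store implements PlanarPoint {
    private final String name;
    private final double x;
    private final double y;

    Store(final String name, final double x, final double y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }

    String getName() { return this.name; }
    public double getX() { return this.x; }
    public double getY() { return this.y; }
}

final class TypeSeparationSketch {

    // The query is any P; the results keep their specific type E.
    static <P, E extends P> List<E> nearest(final P queryPoint, final List<E> elements,
            final DistanceFunction<P> distanceFunction, final int maxResults) {

        final Comparator<E> byDistance =
                Comparator.comparingDouble(element -> distanceFunction.getDistance(queryPoint, element));

        // Naive approach: sort a copy of all elements, then keep the closest ones.
        final List<E> sorted = new ArrayList<>(elements);
        sorted.sort(byDistance);

        return new ArrayList<>(sorted.subList(0, Math.min(maxResults, sorted.size())));
    }
}
```

Querying with `new Coordinates(0, 0)` against a `List<Store>` returns a `List<Store>`, so callers get the specific element type back without having to build a `Store` just to ask the question.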
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/ThresholdSelectionStrategy.java:
--------------------------------------------------------------------------------
1 | package com.eatthepath.jvptree;
2 |
3 | import java.util.List;
4 |
5 | /**
6 | * A strategy for choosing a distance threshold for vp-tree nodes. The main feature of vp-trees is that they partition
7 | * collections of points into collections of points that are closer to a given point (the vantage point) than a certain
8 | * threshold or farther away from the vantage point than the threshold. Given a list of points, a
9 | * {@code ThresholdSelectionStrategy} chooses the distance that will be used by a vp-tree node to partition its points.
10 | *
11 | * @author Jon Chambers
12 | */
13 | public interface ThresholdSelectionStrategy<P, E extends P> {
14 |
15 | /**
16 | * Chooses a partitioning distance threshold appropriate for the given list of points. Implementations are allowed to
17 | * reorder the list of points, but must not add or remove points from the list.
18 | *
19 | * @param points the points for which to choose a partitioning distance threshold
20 | * @param origin the point from which the threshold distances should be calculated
21 | * @param distanceFunction the function to be used to calculate distances between points
22 | *
23 | * @return a partitioning threshold distance appropriate for the given list of points; ideally, some points should
24 | * be closer to the origin than the returned threshold, and some should be farther
25 | */
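26 | double selectThreshold(List<E> points, P origin, DistanceFunction<P> distanceFunction);
27 | }
28 |
--------------------------------------------------------------------------------
/src/main/java/com/eatthepath/jvptree/VPTree.java:
--------------------------------------------------------------------------------
11 | /**
12 | * <p>A vantage-point tree (or vp-tree) is a binary space partitioning collection of points in a metric space. The main
13 | * feature of vantage point trees is that they allow for k-nearest-neighbor searches in any metric space in
14 | * O(log(n)) time.</p>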
15 | *
16 | * <p>Vantage point trees recursively partition points by choosing a "vantage point" and a distance threshold;
17 | * points are then partitioned into one collection that contains all of the points closer to the vantage point than the
18 | * chosen threshold and one collection that contains all of the points farther away than the chosen threshold.</p>
19 | *
20 | * <p>A {@linkplain DistanceFunction distance function} that satisfies the properties of a metric space must be provided
21 | * when constructing a vantage point tree. Callers may also specify a threshold selection strategy (a sampling median
22 | * strategy is used by default) and a node size to tune the ratio of nodes searched to points inspected per node.
23 | * Vantage point trees may be constructed with or without an initial collection of points, though specifying a
24 | * collection of points at construction time is the most efficient approach.</p>
28 | * @param <P> the base type of points between which distances can be measured
29 | * @param <E> the specific type of point contained in this vantage point tree
30 | */
31 | public class VPTree<P, E extends P> implements SpatialIndex<P, E> {
32 |
33 | private final DistanceFunction<P> distanceFunction;
34 | private final ThresholdSelectionStrategy<P, E> thresholdSelectionStrategy;
35 | private final int nodeCapacity;
36 |
37 | private VPTreeNode<P, E> rootNode;
38 |
39 | public static final int DEFAULT_NODE_CAPACITY = 32;
40 |
41 | private static final PointFilter