├── index.html ├── doc ├── Jamfile.v2 ├── index.html ├── img │ ├── fpr_c.png │ ├── fpr_n_k.png │ ├── natvis.png │ ├── stride.png │ ├── bloom_lookup.png │ ├── db_speedup.png │ ├── fpr_n_k_bk.png │ ├── block_insertion.png │ ├── bloom_insertion.png │ ├── block_multi_insertion.png │ └── multiblock_insertion.png ├── bloom │ ├── reference │ │ ├── header_bloom.adoc │ │ ├── header_block.adoc │ │ ├── header_multiblock.adoc │ │ ├── header_fast_multiblock32.adoc │ │ ├── header_fast_multiblock64.adoc │ │ ├── block.adoc │ │ ├── multiblock.adoc │ │ ├── header_filter.adoc │ │ ├── fast_multiblock64.adoc │ │ ├── fast_multiblock32.adoc │ │ └── subfilters.adoc │ ├── copyright.adoc │ ├── release_notes.adoc │ ├── reference.adoc │ ├── acknowledgements.adoc │ ├── future_work.adoc │ ├── intro.adoc │ ├── fpr_estimation.adoc │ ├── implementation_notes.adoc │ ├── configuration.adoc │ ├── primer.adoc │ └── tutorial.adoc └── bloom.adoc ├── example ├── Jamfile.v2 ├── basic.cpp ├── serialization.cpp ├── rolling_filter.cpp └── genome.cpp ├── benchmark ├── Jamfile.v2 ├── fpr_c.cpp ├── comparison_table.cpp └── bulk_comparison_table.cpp ├── meta └── libraries.json ├── test ├── CMakeLists.txt ├── test_boost_bloom_hpp.cpp ├── Jamfile.v2 ├── test_visualization.cpp ├── test_types.hpp ├── test_bulk_operations.cpp ├── test_array.cpp ├── test_comparison.cpp ├── test_insertion.cpp ├── test_combination.cpp ├── test_fpr.cpp ├── test_capacity.cpp └── test_utilities.hpp ├── include └── boost │ ├── bloom │ ├── detail │ │ ├── avx2.hpp │ │ ├── neon.hpp │ │ ├── sse2.hpp │ │ ├── constexpr_bit_width.hpp │ │ ├── block_fpr_base.hpp │ │ ├── multiblock_fpr_base.hpp │ │ ├── block_ops.hpp │ │ ├── mulx64.hpp │ │ ├── block_base.hpp │ │ ├── fast_multiblock32_avx2.hpp │ │ ├── fast_multiblock64_avx2.hpp │ │ ├── fast_multiblock32_sse2.hpp │ │ ├── fast_multiblock32_neon.hpp │ │ ├── type_traits.hpp │ │ └── bloom_printers.hpp │ ├── fast_multiblock64.hpp │ ├── fast_multiblock32.hpp │ ├── multiblock.hpp │ ├── block.hpp │ └── 
filter.hpp │ └── bloom.hpp ├── .codecov.yml ├── .drone ├── drone.bat └── drone.sh ├── CMakeLists.txt ├── extra ├── boost_bloom.natvis └── boost_bloom_printers.py ├── .gitattributes └── README.md /index.html: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/index.html -------------------------------------------------------------------------------- /doc/Jamfile.v2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/Jamfile.v2 -------------------------------------------------------------------------------- /doc/index.html: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/index.html -------------------------------------------------------------------------------- /doc/img/fpr_c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/fpr_c.png -------------------------------------------------------------------------------- /doc/img/fpr_n_k.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/fpr_n_k.png -------------------------------------------------------------------------------- /doc/img/natvis.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/natvis.png -------------------------------------------------------------------------------- /doc/img/stride.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/stride.png -------------------------------------------------------------------------------- /example/Jamfile.v2: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/example/Jamfile.v2 -------------------------------------------------------------------------------- /benchmark/Jamfile.v2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/benchmark/Jamfile.v2 -------------------------------------------------------------------------------- /doc/img/bloom_lookup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/bloom_lookup.png -------------------------------------------------------------------------------- /doc/img/db_speedup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/db_speedup.png -------------------------------------------------------------------------------- /doc/img/fpr_n_k_bk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/fpr_n_k_bk.png -------------------------------------------------------------------------------- /doc/img/block_insertion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/block_insertion.png -------------------------------------------------------------------------------- /doc/img/bloom_insertion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/bloom_insertion.png -------------------------------------------------------------------------------- /doc/img/block_multi_insertion.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/block_multi_insertion.png -------------------------------------------------------------------------------- /doc/img/multiblock_insertion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/boostorg/bloom/HEAD/doc/img/multiblock_insertion.png -------------------------------------------------------------------------------- /doc/bloom/reference/header_bloom.adoc: -------------------------------------------------------------------------------- 1 | [#header_bloom] 2 | == `` 3 | 4 | :idprefix: header_bloom_ 5 | 6 | Convenience header including all the other headers listed in this 7 | reference. 8 | 9 | ''' -------------------------------------------------------------------------------- /doc/bloom/copyright.adoc: -------------------------------------------------------------------------------- 1 | [#copyright] 2 | = Copyright and License 3 | 4 | :idprefix: copyright_ 5 | 6 | Of this documentation: 7 | 8 | * Copyright © 2025 Joaquín M López Muñoz 9 | 10 | Distributed under the http://www.boost.org/LICENSE_1_0.txt[Boost Software License, Version 1.0^]. 
11 | -------------------------------------------------------------------------------- /doc/bloom/reference/header_block.adoc: -------------------------------------------------------------------------------- 1 | [#header_block] 2 | == `` 3 | 4 | :idprefix: header_block_ 5 | 6 | [listing,subs="+macros,+quotes"] 7 | ----- 8 | namespace boost{ 9 | namespace bloom{ 10 | 11 | template 12 | struct xref:block[block]; 13 | 14 | } // namespace bloom 15 | } // namespace boost 16 | ----- 17 | 18 | -------------------------------------------------------------------------------- /meta/libraries.json: -------------------------------------------------------------------------------- 1 | { 2 | "key": "bloom", 3 | "name": "Bloom", 4 | "authors": [ 5 | "Joaqu\u00edn M L\u00f3pez Mu\u00f1oz" 6 | ], 7 | "description": "Bloom filters.", 8 | "cxxstd": "11", 9 | "category": [ 10 | "Containers" 11 | ], 12 | "maintainers": [ 13 | "Joaquin M Lopez Munoz " 14 | ] 15 | } 16 | -------------------------------------------------------------------------------- /doc/bloom/reference/header_multiblock.adoc: -------------------------------------------------------------------------------- 1 | [#header_multiblock] 2 | == `` 3 | 4 | :idprefix: header_multiblock_ 5 | 6 | [listing,subs="+macros,+quotes"] 7 | ----- 8 | namespace boost{ 9 | namespace bloom{ 10 | 11 | template 12 | struct xref:multiblock[multiblock]; 13 | 14 | } // namespace bloom 15 | } // namespace boost 16 | ----- 17 | 18 | -------------------------------------------------------------------------------- /doc/bloom/reference/header_fast_multiblock32.adoc: -------------------------------------------------------------------------------- 1 | [#header_fast_multiblock32] 2 | == `` 3 | 4 | :idprefix: header_fast_multiblock32_ 5 | 6 | [listing,subs="+macros,+quotes"] 7 | ----- 8 | namespace boost{ 9 | namespace bloom{ 10 | 11 | template 12 | struct xref:fast_multiblock32[fast_multiblock32]; 13 | 14 | } // namespace bloom 15 | } // namespace boost 16 
| ----- 17 | 18 | -------------------------------------------------------------------------------- /doc/bloom/reference/header_fast_multiblock64.adoc: -------------------------------------------------------------------------------- 1 | [#header_fast_multiblock64] 2 | == `` 3 | 4 | :idprefix: header_fast_multiblock64_ 5 | 6 | [listing,subs="+macros,+quotes"] 7 | ----- 8 | namespace boost{ 9 | namespace bloom{ 10 | 11 | template 12 | struct xref:fast_multiblock64[fast_multiblock64]; 13 | 14 | } // namespace bloom 15 | } // namespace boost 16 | ----- 17 | 18 | -------------------------------------------------------------------------------- /doc/bloom/release_notes.adoc: -------------------------------------------------------------------------------- 1 | [#release_notes] 2 | = Release Notes 3 | 4 | :idprefix: release_notes_ 5 | 6 | == Boost 1.90 7 | 8 | * Added bulk-mode insertion and lookup for increased performance. 9 | * Made lookup implementation branchless for `block`, `fast_multiblock32` 10 | and `fast_multiblock64`, which results in some performance gains, particularly 11 | for mixed successful/unsuccessful queries. 12 | 13 | == Boost 1.89 14 | 15 | * Initial release. 16 | 17 | -------------------------------------------------------------------------------- /test/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # Copyright 2018, 2019, 2021, 2022 Peter Dimov 2 | # Copyright 2025 Joaquin M Lopez Munoz 3 | # Distributed under the Boost Software License, Version 1.0. 
4 | # See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt 5 | 6 | include(BoostTestJamfile OPTIONAL RESULT_VARIABLE HAVE_BOOST_TEST) 7 | 8 | if(HAVE_BOOST_TEST) 9 | 10 | boost_test_jamfile(FILE Jamfile.v2 11 | LINK_LIBRARIES Boost::bloom Boost::config Boost::core Boost::mp11) 12 | 13 | endif() -------------------------------------------------------------------------------- /include/boost/bloom/detail/avx2.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_AVX2_HPP 10 | #define BOOST_BLOOM_DETAIL_AVX2_HPP 11 | 12 | #if defined(__AVX2__) 13 | #define BOOST_BLOOM_AVX2 14 | #endif 15 | 16 | #if defined(BOOST_BLOOM_AVX2) 17 | #include 18 | #endif 19 | 20 | #endif 21 | -------------------------------------------------------------------------------- /doc/bloom/reference.adoc: -------------------------------------------------------------------------------- 1 | [#reference] 2 | = Reference 3 | 4 | include::reference/header_bloom.adoc[] 5 | include::reference/header_filter.adoc[] 6 | include::reference/filter.adoc[] 7 | include::reference/subfilters.adoc[] 8 | include::reference/header_block.adoc[] 9 | include::reference/block.adoc[] 10 | include::reference/header_multiblock.adoc[] 11 | include::reference/multiblock.adoc[] 12 | include::reference/header_fast_multiblock32.adoc[] 13 | include::reference/fast_multiblock32.adoc[] 14 | include::reference/header_fast_multiblock64.adoc[] 15 | include::reference/fast_multiblock64.adoc[] 16 | -------------------------------------------------------------------------------- /include/boost/bloom.hpp: 
-------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_HPP 10 | #define BOOST_BLOOM_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | #endif 19 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/neon.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_NEON_HPP 10 | #define BOOST_BLOOM_DETAIL_NEON_HPP 11 | 12 | #if defined(__ARM_NEON)&&!defined(__ARM_BIG_ENDIAN) 13 | #define BOOST_BLOOM_LITTLE_ENDIAN_NEON 14 | #endif 15 | 16 | #if defined(BOOST_BLOOM_LITTLE_ENDIAN_NEON) 17 | #include 18 | #endif 19 | 20 | #endif 21 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/sse2.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_SSE2_HPP 10 | #define BOOST_BLOOM_DETAIL_SSE2_HPP 11 | 12 | #if defined(__SSE2__)|| \ 13 | defined(_M_X64)||(defined(_M_IX86_FP)&&_M_IX86_FP>=2) 14 | #define BOOST_BLOOM_SSE2 15 | #endif 16 | 17 | #if defined(BOOST_BLOOM_SSE2) 18 | #include 19 | #endif 20 | 21 | #endif 22 | -------------------------------------------------------------------------------- /.codecov.yml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 - 2021 Alexander Grund 2 | # Distributed under the Boost Software License, Version 1.0. 3 | # (See accompanying file LICENSE_1_0.txt or copy at http://boost.org/LICENSE_1_0.txt) 4 | # 5 | # Sample codecov configuration file. Edit as required 6 | 7 | codecov: 8 | max_report_age: off 9 | require_ci_to_pass: yes 10 | notify: 11 | # Increase this if you have multiple coverage collection jobs 12 | after_n_builds: 1 13 | wait_for_ci: yes 14 | 15 | # Change how pull request comments look 16 | comment: 17 | layout: "reach,diff,flags,files,footer" 18 | 19 | # Ignore specific files or folders. Glob patterns are supported. 20 | # See https://docs.codecov.com/docs/ignoring-paths 21 | ignore: 22 | - extra/**/* 23 | # - test/**/* -------------------------------------------------------------------------------- /test/test_boost_bloom_hpp.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #include 10 | #include 11 | 12 | struct use_types 13 | { 14 | using type1=boost::bloom::filter; 15 | using type2=boost::bloom::block; 16 | using type3=boost::bloom::multiblock; 17 | using type4=boost::bloom::fast_multiblock32<1>; 18 | using type5=boost::bloom::fast_multiblock64<1>; 19 | }; 20 | 21 | int main() 22 | { 23 | (void)use_types{}; 24 | return boost::report_errors(); 25 | } 26 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/constexpr_bit_width.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_CONSTEXPR_BIT_WIDTH_HPP 10 | #define BOOST_BLOOM_DETAIL_CONSTEXPR_BIT_WIDTH_HPP 11 | 12 | #include 13 | 14 | namespace boost{ 15 | namespace bloom{ 16 | namespace detail{ 17 | 18 | /* boost::core::bit_width is not always C++11 constexpr */ 19 | 20 | constexpr std::size_t constexpr_bit_width(std::size_t x) 21 | { 22 | return x?1+constexpr_bit_width(x>>1):0; 23 | } 24 | 25 | } /* namespace detail */ 26 | } /* namespace bloom */ 27 | } /* namespace boost */ 28 | #endif 29 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/block_fpr_base.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_BLOCK_FPR_BASE_HPP 10 | #define BOOST_BLOOM_DETAIL_BLOCK_FPR_BASE_HPP 11 | 12 | #include <cmath> 13 | #include <cstddef> 14 | 15 | namespace boost{ 16 | namespace bloom{ 17 | namespace detail{ 18 | 19 | template<std::size_t K> 20 | struct block_fpr_base 21 | { 22 | static double fpr(std::size_t i,std::size_t w) 23 | { 24 | return std::pow(1.0-std::pow(1.0-1.0/w,(double)K*i),(double)K); 25 | } 26 | }; 27 | 28 | } /* namespace detail */ 29 | } /* namespace bloom */ 30 | } /* namespace boost */ 31 | #endif 32 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/multiblock_fpr_base.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_MULTIBLOCK_FPR_BASE_HPP 10 | #define BOOST_BLOOM_DETAIL_MULTIBLOCK_FPR_BASE_HPP 11 | 12 | #include <cmath> 13 | #include <cstddef> 14 | 15 | namespace boost{ 16 | namespace bloom{ 17 | namespace detail{ 18 | 19 | template<std::size_t K> 20 | struct multiblock_fpr_base 21 | { 22 | static double fpr(std::size_t i,std::size_t w) 23 | { 24 | return std::pow(1.0-std::pow(1.0-(double)K/w,(double)i),(double)K); 25 | } 26 | }; 27 | 28 | } /* namespace detail */ 29 | } /* namespace bloom */ 30 | } /* namespace boost */ 31 | #endif 32 | -------------------------------------------------------------------------------- /include/boost/bloom/fast_multiblock64.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_FAST_MULTIBLOCK64_HPP 10 | #define BOOST_BLOOM_FAST_MULTIBLOCK64_HPP 11 | 12 | #include <boost/bloom/detail/avx2.hpp> 13 | 14 | #if defined(BOOST_BLOOM_AVX2) 15 | #include <boost/bloom/detail/fast_multiblock64_avx2.hpp> 16 | #else /* fallback */ 17 | #include <boost/bloom/multiblock.hpp> 18 | #include <cstddef> 19 | #include <cstdint> 20 | 21 | namespace boost{ 22 | namespace bloom{ 23 | 24 | template<std::size_t K> 25 | using fast_multiblock64=multiblock<std::uint64_t,K>; 26 | 27 | } /* namespace bloom */ 28 | } /* namespace boost */ 29 | #endif 30 | 31 | #endif 32 | -------------------------------------------------------------------------------- /doc/bloom/reference/block.adoc: -------------------------------------------------------------------------------- 1 | [#block] 2 | == Class Template `block` 3 | 4 | :idprefix: block_ 5 | 6 | `boost::bloom::block` -- A xref:subfilter[subfilter] over an integral type. 7 | 8 | === Synopsis 9 | 10 | [listing,subs="+macros,+quotes"] 11 | ----- 12 | // #include <boost/bloom/block.hpp> 13 | 14 | namespace boost{ 15 | namespace bloom{ 16 | 17 | template<typename Block, std::size_t K> 18 | struct block 19 | { 20 | static constexpr std::size_t k = K; 21 | using value_type = Block; 22 | 23 | // the rest of the interface is not public 24 | }; 25 | } // namespace bloom 26 | } // namespace boost 27 | ----- 28 | 29 | === Description 30 | 31 | *Template Parameters* 32 | 33 | [cols="1,4"] 34 | |=== 35 | 36 | |`Block` 37 | |An unsigned integral type or an array of 2^`N`^ elements of unsigned integral type. 38 | 39 | |`K` 40 | | Number of bits set/checked per operation. Must be greater than zero. 41 | 42 | |=== 43 | 44 | ''' -------------------------------------------------------------------------------- /.drone/drone.bat: -------------------------------------------------------------------------------- 1 | @REM Copyright 2022 Peter Dimov 2 | @REM Distributed under the Boost Software License, Version 1.0. 
3 | @REM https://www.boost.org/LICENSE_1_0.txt 4 | 5 | @ECHO ON 6 | 7 | set LIBRARY=%1 8 | set DRONE_BUILD_DIR=%CD% 9 | 10 | set BOOST_BRANCH=develop 11 | if "%DRONE_BRANCH%" == "master" set BOOST_BRANCH=master 12 | cd .. 13 | git clone -b %BOOST_BRANCH% --depth 1 https://github.com/boostorg/boost.git boost-root 14 | cd boost-root 15 | git submodule update --init tools/boostdep 16 | mkdir -p libs\%LIBRARY% & REM remove when/if the library makes it into Boost 17 | xcopy /s /e /q %DRONE_BUILD_DIR% libs\%LIBRARY%\ 18 | python tools/boostdep/depinst/depinst.py -I examples %LIBRARY% 19 | cmd /c bootstrap 20 | b2 -d0 headers 21 | 22 | if not "%CXXSTD%" == "" set CXXSTD=cxxstd=%CXXSTD% 23 | if not "%ADDRMD%" == "" set ADDRMD=address-model=%ADDRMD% 24 | b2 -j3 libs/%LIBRARY%/test toolset=%TOOLSET% %CXXSTD% %ADDRMD% variant=debug,release embed-manifest-via=linker 25 | -------------------------------------------------------------------------------- /doc/bloom/reference/multiblock.adoc: -------------------------------------------------------------------------------- 1 | [#multiblock] 2 | == Class Template `multiblock` 3 | 4 | :idprefix: multiblock_ 5 | 6 | `boost::bloom::multiblock` -- A xref:subfilter[subfilter] over an array of an integral type. 7 | 8 | === Synopsis 9 | 10 | [listing,subs="+macros,+quotes"] 11 | ----- 12 | // #include <boost/bloom/multiblock.hpp> 13 | 14 | namespace boost{ 15 | namespace bloom{ 16 | 17 | template<typename Block, std::size_t K> 18 | struct multiblock 19 | { 20 | static constexpr std::size_t k = K; 21 | using value_type = Block[k]; 22 | 23 | // the rest of the interface is not public 24 | }; 25 | } // namespace bloom 26 | } // namespace boost 27 | ----- 28 | 29 | === Description 30 | 31 | *Template Parameters* 32 | 33 | [cols="1,4"] 34 | |=== 35 | 36 | |`Block` 37 | |An unsigned integral type or an array of 2^`N`^ elements of unsigned integral type. 38 | 39 | |`K` 40 | | Number of bits set/checked per operation. Must be greater than zero. 
41 | 42 | |=== 43 | 44 | Each of the `K` bits set/checked is located in a different element of the 45 | `Block[K]` array. 46 | 47 | ''' -------------------------------------------------------------------------------- /test/Jamfile.v2: -------------------------------------------------------------------------------- 1 | # Copyright 2025 Joaquín M López Muñoz. 2 | # Distributed under the Boost Software License, Version 1.0. 3 | # (See accompanying file LICENSE_1_0.txt or copy at 4 | # http://www.boost.org/LICENSE_1_0.txt) 5 | # 6 | # See http://www.boost.org/libs/bloom for library home page. 7 | 8 | require-b2 5.0.1 ; 9 | import-search /boost/config/checks ; 10 | 11 | import testing ; 12 | import config : requires ; 13 | 14 | project 15 | : requirements 16 | /boost/bloom//boost_bloom 17 | /boost/config//boost_config 18 | /boost/core//boost_core 19 | /boost/mp11//boost_mp11 20 | [ requires cxx11_noexcept ] # used as a proxy for C++11 support 21 | msvc:-D_SCL_SECURE_NO_WARNINGS 22 | ; 23 | 24 | run test_array.cpp ; 25 | run test_boost_bloom_hpp.cpp ; 26 | run test_bulk_operations.cpp ; 27 | run test_capacity.cpp ; 28 | run test_combination.cpp ; 29 | run test_comparison.cpp ; 30 | run test_construction.cpp ; 31 | run test_fpr.cpp ; 32 | run test_insertion.cpp ; 33 | 34 | compile test_visualization.cpp ; 35 | -------------------------------------------------------------------------------- /test/test_visualization.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Braden Ganetsky. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | /* This is a file for testing of visualizations, 10 | * such as Visual Studio Natvis or GDB pretty printers. 11 | * Run this test and break at the label called `break_here`. 
12 | * Inspect the variables to test correctness. 13 | */ 14 | 15 | #include 16 | 17 | #include 18 | 19 | void break_here() {} 20 | 21 | // Prevent any "unused" errors 22 | template void use(Args&&...) {} 23 | 24 | int main() 25 | { 26 | boost::bloom::filter filter1{}; 27 | boost::bloom::filter filter2{1}; 28 | boost::bloom::filter filter3{{1,2,3,4,5}, 2000}; 29 | 30 | use(filter1); 31 | use(filter2); 32 | use(filter3); 33 | 34 | break_here(); 35 | 36 | return boost::report_errors(); 37 | } 38 | -------------------------------------------------------------------------------- /.drone/drone.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2022 Peter Dimov 4 | # Distributed under the Boost Software License, Version 1.0. 5 | # https://www.boost.org/LICENSE_1_0.txt 6 | 7 | set -ex 8 | export PATH=~/.local/bin:/usr/local/bin:$PATH 9 | 10 | DRONE_BUILD_DIR=$(pwd) 11 | 12 | BOOST_BRANCH=develop 13 | if [ "$DRONE_BRANCH" = "master" ]; then BOOST_BRANCH=master; fi 14 | 15 | cd .. 
16 | git clone -b $BOOST_BRANCH --depth 1 https://github.com/boostorg/boost.git boost-root 17 | cd boost-root 18 | mkdir -p libs/$LIBRARY # remove when/if the library makes it into Boost 19 | git submodule update --init tools/boostdep 20 | cp -r $DRONE_BUILD_DIR/* libs/$LIBRARY 21 | python tools/boostdep/depinst/depinst.py -I examples $LIBRARY 22 | ./bootstrap.sh 23 | ./b2 -d0 headers 24 | 25 | echo "using $TOOLSET : : $COMPILER ;" > ~/user-config.jam 26 | ./b2 -j3 libs/$LIBRARY/test toolset=$TOOLSET cxxstd=$CXXSTD variant=debug,release ${ADDRMD:+address-model=$ADDRMD} ${UBSAN:+undefined-sanitizer=norecover debug-symbols=on} ${ASAN:+address-sanitizer=norecover debug-symbols=on} ${LINKFLAGS:+linkflags=$LINKFLAGS} ${LINK:+link=$LINK} 27 | -------------------------------------------------------------------------------- /doc/bloom/acknowledgements.adoc: -------------------------------------------------------------------------------- 1 | [#acknowledgements] 2 | = Acknowledgements 3 | 4 | :idprefix: acknowledgements_ 5 | 6 | Peter Dimov and Christian Mazakas reviewed significant portions of the code 7 | and documentation during the development phase. Sam Darwin provided support 8 | for CI setup and documentation building. Braden Ganetsky contributed the 9 | GDB pretty-printer for `boost::bloom::filter`. 10 | 11 | The Boost acceptance review took place between the 13th and 22nd of May, 12 | 2025. Big thanks to Arnaud Becheler for his expert managing. The 13 | following people participated in the review: 14 | Dmitry Arkhipov, 15 | David Bien, 16 | Claudio DeSouza, 17 | Peter Dimov, 18 | Vinnie Falco, 19 | Alexander Grund, 20 | Seth Heeren, 21 | Andrzej Krzemieński, 22 | Ivan Matek, 23 | Christian Mazakas, 24 | Rubén Pérez, 25 | Kostas Savvidis, 26 | Peter Turcan, 27 | Tomer Vromen. Many thanks to all of them for their very helpful feedback. 
28 | 29 | Boost.Bloom was designed and written in 30 | https://en.wikipedia.org/wiki/C%C3%A1ceres%2c_Spain[Cáceres^] and 31 | https://en.wikipedia.org/wiki/Oropesa,_Spain[Oropesa^], 32 | January-June 2025. -------------------------------------------------------------------------------- /doc/bloom.adoc: -------------------------------------------------------------------------------- 1 | = Boost.Bloom 2 | :toc: left 3 | :toclevels: 3 4 | :idprefix: 5 | :docinfo: private-footer 6 | :source-highlighter: rouge 7 | :source-language: c++ 8 | :nofooter: 9 | :sectlinks: 10 | :leveloffset: +1 11 | :imagesdir: ../img 12 | :stem: latexmath 13 | :small: pass:[] 14 | :small-end: pass:[] 15 | :cpp: C++ 16 | 17 | ++++ 18 | 39 | ++++ 40 | 41 | include::bloom/intro.adoc[] 42 | include::bloom/primer.adoc[] 43 | include::bloom/tutorial.adoc[] 44 | include::bloom/configuration.adoc[] 45 | include::bloom/benchmarks.adoc[] 46 | include::bloom/reference.adoc[] 47 | include::bloom/future_work.adoc[] 48 | include::bloom/fpr_estimation.adoc[] 49 | include::bloom/implementation_notes.adoc[] 50 | include::bloom/release_notes.adoc[] 51 | include::bloom/acknowledgements.adoc[] 52 | include::bloom/copyright.adoc[] 53 | -------------------------------------------------------------------------------- /include/boost/bloom/fast_multiblock32.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_FAST_MULTIBLOCK32_HPP 10 | #define BOOST_BLOOM_FAST_MULTIBLOCK32_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | #if defined(BOOST_BLOOM_AVX2) 17 | #include 18 | #elif defined(BOOST_BLOOM_SSE2) /* important that this comes after AVX2 */ 19 | #include 20 | #elif defined(BOOST_BLOOM_LITTLE_ENDIAN_NEON) 21 | #include 22 | #else /* fallback */ 23 | #include 24 | #include 25 | #include 26 | 27 | namespace boost{ 28 | namespace bloom{ 29 | 30 | template 31 | using fast_multiblock32=multiblock; 32 | 33 | } /* namespace bloom */ 34 | } /* namespace boost */ 35 | #endif 36 | 37 | #endif 38 | -------------------------------------------------------------------------------- /doc/bloom/reference/header_filter.adoc: -------------------------------------------------------------------------------- 1 | [#header_filter] 2 | == `` 3 | 4 | :idprefix: header_filter_ 5 | 6 | Defines `xref:filter[boost::bloom::filter]` 7 | and associated functions. 8 | 9 | [listing,subs="+macros,+quotes"] 10 | ----- 11 | namespace boost{ 12 | namespace bloom{ 13 | 14 | template< 15 | typename T, std::size_t K, 16 | typename Subfilter = block, std::size_t Stride = 0, 17 | typename Hash = boost::hash, 18 | typename Allocator = std::allocator 19 | > 20 | class xref:filter[filter]; 21 | 22 | template< 23 | typename T, std::size_t K, typename SF, std::size_t S, typename H, typename A 24 | > 25 | bool xref:filter_operator[operator+++==+++]( 26 | const filter& x, const filter& y); 27 | 28 | template< 29 | typename T, std::size_t K, typename SF, std::size_t S, typename H, typename A 30 | > 31 | bool xref:filter_operator_2[operator!=]( 32 | const filter& x, const filter& y); 33 | 34 | template< 35 | typename T, std::size_t K, typename SF, std::size_t S, typename H, typename A 36 | > 37 | void xref:filter_swap_2[swap](filter& x, filter& y) 38 | noexcept(noexcept(x.swap(y))); 39 | 40 | } // namespace bloom 41 | } // namespace boost 42 | ----- 43 | 44 | 
-------------------------------------------------------------------------------- /CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # Generated by `boostdep --cmake bloom` 2 | # Copyright 2020, 2021 Peter Dimov 3 | # Distributed under the Boost Software License, Version 1.0. 4 | # https://www.boost.org/LICENSE_1_0.txt 5 | 6 | cmake_minimum_required(VERSION 3.8...3.20) 7 | 8 | project(boost_bloom VERSION "${BOOST_SUPERPROJECT_VERSION}" LANGUAGES CXX) 9 | 10 | add_library(boost_bloom INTERFACE) 11 | add_library(Boost::bloom ALIAS boost_bloom) 12 | 13 | target_include_directories(boost_bloom INTERFACE include) 14 | 15 | target_link_libraries(boost_bloom 16 | INTERFACE 17 | Boost::assert 18 | Boost::config 19 | Boost::container_hash 20 | Boost::core 21 | Boost::throw_exception 22 | Boost::type_traits 23 | ) 24 | 25 | target_compile_features(boost_bloom INTERFACE cxx_std_11) 26 | 27 | if(NOT CMAKE_VERSION VERSION_LESS 3.19 AND CMAKE_GENERATOR MATCHES "Visual Studio") 28 | 29 | file(GLOB_RECURSE boost_bloom_IDEFILES CONFIGURE_DEPENDS "include/*.hpp") 30 | source_group(TREE ${PROJECT_SOURCE_DIR}/include FILES ${boost_bloom_IDEFILES} PREFIX "Header Files") 31 | list(APPEND boost_bloom_IDEFILES extra/boost_bloom.natvis) 32 | target_sources(boost_bloom PRIVATE ${boost_bloom_IDEFILES}) 33 | 34 | endif() 35 | 36 | if(BUILD_TESTING AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/test/CMakeLists.txt") 37 | 38 | add_subdirectory(test) 39 | 40 | endif() 41 | -------------------------------------------------------------------------------- /example/basic.cpp: -------------------------------------------------------------------------------- 1 | /* Basic example of use of Boost.Bloom. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 
5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #include <boost/bloom/filter.hpp> 12 | #include <cassert> 13 | #include <iostream> 14 | #include <string> 15 | 16 | int main() 17 | { 18 | /* Bloom filter of strings with 5 bits set per insertion */ 19 | 20 | using filter = boost::bloom::filter<std::string, 5>; 21 | 22 | /* create filter with a capacity of 1,000,000 bits */ 23 | 24 | filter f(1000000); 25 | 26 | /* insert elements in the set */ 27 | 28 | f.insert("hello"); 29 | f.insert("Boost"); 30 | 31 | /* elements inserted are always correctly checked as such */ 32 | 33 | assert(f.may_contain("hello") == true); 34 | 35 | /* Elements not inserted may incorrectly be identified as such with a 36 | * false positive rate (FPR) which is a function of the array capacity, 37 | * the number of bits set per element and generally how the 38 | * boost::bloom::filter was configured. 39 | */ 40 | 41 | if(f.may_contain("bye")) { /* likely false */ 42 | std::cout << "false positive\n"; 43 | } 44 | else { 45 | std::cout << "everything worked as expected\n"; 46 | } 47 | } 48 | -------------------------------------------------------------------------------- /doc/bloom/reference/fast_multiblock64.adoc: -------------------------------------------------------------------------------- 1 | [#fast_multiblock64] 2 | == Class Template `fast_multiblock64` 3 | 4 | :idprefix: fast_multiblock64_ 5 | 6 | `boost::bloom::fast_multiblock64` -- A faster replacement of 7 | `xref:multiblock[multiblock]`.
8 | 9 | === Synopsis 10 | 11 | [listing,subs="+macros,+quotes"] 12 | ----- 13 | // #include <boost/bloom/fast_multiblock64.hpp> 14 | 15 | namespace boost{ 16 | namespace bloom{ 17 | 18 | template<std::size_t K> 19 | struct fast_multiblock64 20 | { 21 | static constexpr std::size_t k = K; 22 | using value_type = _implementation-defined_; 23 | 24 | // might not be present 25 | static constexpr std::size_t used_value_size = _implementation-defined_; 26 | 27 | // the rest of the interface is not public 28 | }; 29 | } // namespace bloom 30 | } // namespace boost 31 | ----- 32 | 33 | === Description 34 | 35 | *Template Parameters* 36 | 37 | [cols="1,4"] 38 | |=== 39 | 40 | |`K` 41 | | Number of bits set/checked per operation. Must be greater than zero. 42 | 43 | |=== 44 | 45 | `fast_multiblock64` is statistically equivalent to 46 | `xref:multiblock[multiblock]`, but takes advantage 47 | of selected SIMD technologies, when available at compile time, to perform faster. 48 | Currently supported: AVX2. 49 | The non-SIMD case falls back to regular `multiblock`. 50 | 51 | `xref:subfilters_used_value_size[_used-value-size_]<fast_multiblock64<K>>` is 52 | `8 * K`. 53 | -------------------------------------------------------------------------------- /doc/bloom/reference/fast_multiblock32.adoc: -------------------------------------------------------------------------------- 1 | [#fast_multiblock32] 2 | == Class Template `fast_multiblock32` 3 | 4 | :idprefix: fast_multiblock32_ 5 | 6 | `boost::bloom::fast_multiblock32` -- A faster replacement of 7 | `xref:multiblock[multiblock]`.
8 | 9 | === Synopsis 10 | 11 | [listing,subs="+macros,+quotes"] 12 | ----- 13 | // #include <boost/bloom/fast_multiblock32.hpp> 14 | 15 | namespace boost{ 16 | namespace bloom{ 17 | 18 | template<std::size_t K> 19 | struct fast_multiblock32 20 | { 21 | static constexpr std::size_t k = K; 22 | using value_type = _implementation-defined_; 23 | 24 | // might not be present 25 | static constexpr std::size_t used_value_size = _implementation-defined_; 26 | 27 | // the rest of the interface is not public 28 | }; 29 | } // namespace bloom 30 | } // namespace boost 31 | ----- 32 | 33 | === Description 34 | 35 | *Template Parameters* 36 | 37 | [cols="1,4"] 38 | |=== 39 | 40 | |`K` 41 | | Number of bits set/checked per operation. Must be greater than zero. 42 | 43 | |=== 44 | 45 | `fast_multiblock32` is statistically equivalent to 46 | `xref:multiblock[multiblock]`, but takes advantage 47 | of selected SIMD technologies, when available at compile time, to perform faster. 48 | Currently supported: AVX2, little-endian Neon, SSE2. 49 | The non-SIMD case falls back to regular `multiblock`. 50 | 51 | `xref:subfilters_used_value_size[_used-value-size_]<fast_multiblock32<K>>` is 52 | `4 * K`. 53 | 54 | ''' -------------------------------------------------------------------------------- /test/test_types.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page.
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_TEST_TEST_TYPES_HPP 10 | #define BOOST_BLOOM_TEST_TEST_TYPES_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | 23 | using test_types=boost::mp11::mp_list< 24 | boost::bloom::filter< 25 | int,2 26 | >, 27 | boost::bloom::filter< 28 | std::string,1,boost::bloom::block,1 29 | >, 30 | boost::bloom::filter< 31 | int,1,boost::bloom::block 32 | >, 33 | boost::bloom::filter< 34 | std::size_t,1,boost::bloom::multiblock 35 | >, 36 | boost::bloom::filter< 37 | std::size_t,1,boost::bloom::multiblock,1 38 | >, 39 | boost::bloom::filter< 40 | unsigned char,1,boost::bloom::fast_multiblock32<5>,2 41 | >, 42 | boost::bloom::filter< 43 | int,1,boost::bloom::fast_multiblock64<11> 44 | > 45 | >; 46 | 47 | using identity_test_types= 48 | boost::mp11::mp_transform; 49 | 50 | #endif 51 | -------------------------------------------------------------------------------- /extra/boost_bloom.natvis: -------------------------------------------------------------------------------- 1 | 2 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 23 | 24 | 25 | {{ capacity={capacity()} }} 26 | 27 | 28 | *reinterpret_cast<hasher*>(static_cast<hash_base*>(this)) 29 | 30 | 31 | *reinterpret_cast<super::allocator_type*>(static_cast<super::allocator_base*>(this)) 32 | 33 | 34 | capacity() 35 | 36 | 37 | {{ data={(void*)data()} size={array_size()} }} 38 | 39 | 40 | array_size() 41 | data() 42 | 43 | 44 | 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /test/test_bulk_operations.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include "test_types.hpp" 14 | #include "test_utilities.hpp" 15 | 16 | using namespace test_utilities; 17 | 18 | template 19 | void test_bulk_operations() 20 | { 21 | using filter=Filter; 22 | using value_type=typename filter::value_type; 23 | 24 | ValueFactory fac; 25 | { 26 | filter f1(10000),f2(f1); 27 | std::array input; 28 | for(auto& x:input)x=fac(); 29 | f1.insert(input.begin(),input.end()); 30 | f2.insert( 31 | make_input_iterator(input.begin()),make_input_iterator(input.end())); 32 | BOOST_TEST(f1==f2); 33 | } 34 | { 35 | Filter f(10000); 36 | std::array input; 37 | for(auto& x:input)x=fac(); 38 | for(std::size_t i=0;i 48 | void operator()(T) 49 | { 50 | using filter=typename T::type; 51 | using value_type=typename filter::value_type; 52 | 53 | test_bulk_operations>(); 54 | } 55 | }; 56 | 57 | int main() 58 | { 59 | boost::mp11::mp_for_each(lambda{}); 60 | return boost::report_errors(); 61 | } 62 | -------------------------------------------------------------------------------- /test/test_array.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include "test_types.hpp" 14 | #include "test_utilities.hpp" 15 | 16 | using namespace test_utilities; 17 | 18 | template 19 | void test_array() 20 | { 21 | using filter=Filter; 22 | 23 | ValueFactory fac; 24 | 25 | { 26 | filter f; 27 | const filter& cf=f; 28 | BOOST_TEST_EQ(f.array().size(),0); 29 | BOOST_TEST_EQ(f.array().data(),cf.array().data()); 30 | BOOST_TEST_EQ(f.array().size(),cf.array().size()); 31 | } 32 | { 33 | filter f(1000); 34 | const filter& cf=f; 35 | BOOST_TEST_NE(f.array().data(),nullptr); 36 | BOOST_TEST_EQ(f.array().size(),f.capacity()/CHAR_BIT); 37 | BOOST_TEST_EQ(f.array().data(),cf.array().data()); 38 | BOOST_TEST_EQ(f.array().size(),cf.array().size()); 39 | } 40 | { 41 | filter f1(1000),f2(1000); 42 | for(int i=0;i<10;++i)f1.insert(fac()); 43 | std::memcpy(f2.array().data(),f1.array().data(),f1.array().size()); 44 | BOOST_TEST_NE(f1.array().data(),f2.array().data()); 45 | BOOST_TEST(f1==f2); 46 | } 47 | } 48 | 49 | struct lambda 50 | { 51 | template 52 | void operator()(T) 53 | { 54 | using filter=typename T::type; 55 | using value_type=typename filter::value_type; 56 | 57 | test_array>(); 58 | } 59 | }; 60 | 61 | int main() 62 | { 63 | boost::mp11::mp_for_each(lambda{}); 64 | return boost::report_errors(); 65 | } 66 | -------------------------------------------------------------------------------- /doc/bloom/future_work.adoc: -------------------------------------------------------------------------------- 1 | [#future_work] 2 | = Future Work 3 | 4 | :idprefix: future_work_ 5 | 6 | A number of features asked by reviewers and users of Boost.Bloom are 7 | considered for inclusion into future versions of the library. 
8 | 9 | == `try_insert` 10 | 11 | To avoid inserting an already present element, we now have to do: 12 | 13 | [source] 14 | ----- 15 | if(!f.may_contain(x)) f.insert(x); 16 | ----- 17 | 18 | These two calls can be combined in a potentially faster, 19 | single operation: 20 | 21 | [source] 22 | ----- 23 | bool res = f.try_insert(x); // returns true if x was not present 24 | ----- 25 | 26 | == Estimation of number of elements inserted 27 | 28 | For a classical Bloom filter, the number of elements actually inserted 29 | can be estimated from the number {small}stem:[B]{small-end} of bits set 30 | to one in the array as 31 | 32 | [.formula-center] 33 | {small}stem:[n\approx-\displaystyle\frac{m}{k}\ln\left(1-\displaystyle\frac{B}{m}\right),]{small-end} 34 | 35 | which can be used for the implementation of a member function 36 | `estimated_size`. As of this writing, we don't know how to extend the 37 | formula to the case of block and multiblock filters. Any help on this 38 | problem is much appreciated. 39 | 40 | == Run-time specification of _k_ 41 | 42 | Currently, the number _k_ of bits set per operation is configured at compile time. 43 | A variation of (or extension to) `boost::bloom::filter` can be provided 44 | where the value of _k_ is specified at run-time, the tradeoff being that 45 | its performance will be worse than the static case (preliminary experiments 46 | show an increase in execution time of around 10-20%). 47 | 48 | == Alternative filters 49 | 50 | We can consider adding additional data structures such as 51 | https://en.wikipedia.org/wiki/Cuckoo_filter[cuckoo^] and 52 | https://arxiv.org/pdf/1912.08258[xor^] filters, which are more 53 | space efficient and potentially faster. -------------------------------------------------------------------------------- /test/test_comparison.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 
2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #include 10 | #include 11 | #include "test_types.hpp" 12 | #include "test_utilities.hpp" 13 | 14 | using namespace test_utilities; 15 | 16 | template 17 | void test_comparison() 18 | { 19 | using filter=Filter; 20 | 21 | ValueFactory fac; 22 | 23 | { 24 | BOOST_TEST(filter{}==filter{}); 25 | BOOST_TEST(!(filter{}!=filter{})); 26 | BOOST_TEST(!(filter{}==filter{1000})); 27 | BOOST_TEST(filter{1000}!=filter{}); 28 | BOOST_TEST(!(filter{1000}==filter{{fac(),fac()},1000})); 29 | BOOST_TEST((filter{{fac(),fac()},1000}!=filter{1000})); 30 | } 31 | { 32 | filter f1{1000},f2{1000}; 33 | for(int i=0;i<10;++i){ 34 | auto x=fac(); 35 | f1.insert(x); 36 | f2.insert(x); 37 | } 38 | BOOST_TEST(f1==f2); 39 | BOOST_TEST(!(f2!=f1)); 40 | 41 | for(int i=0;i<10;++i){ 42 | auto x=fac(); 43 | f2.insert(x); 44 | } 45 | BOOST_TEST(!(f1==f2)); /* with high prob. */ 46 | BOOST_TEST(f2!=f1); 47 | 48 | const filter f3=f2; 49 | BOOST_TEST(f2==f3); 50 | BOOST_TEST(!(f3!=f2)); 51 | 52 | f2.clear(); 53 | BOOST_TEST(!(f2==f3)); 54 | BOOST_TEST(f3!=f2); 55 | } 56 | } 57 | 58 | struct lambda 59 | { 60 | template 61 | void operator()(T) 62 | { 63 | using filter=typename T::type; 64 | using value_type=typename filter::value_type; 65 | 66 | test_comparison>(); 67 | } 68 | }; 69 | 70 | int main() 71 | { 72 | boost::mp11::mp_for_each(lambda{}); 73 | return boost::report_errors(); 74 | } 75 | -------------------------------------------------------------------------------- /include/boost/bloom/multiblock.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_MULTIBLOCK_HPP 10 | #define BOOST_BLOOM_MULTIBLOCK_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | namespace boost{ 21 | namespace bloom{ 22 | 23 | template 24 | struct multiblock: 25 | public detail::multiblock_fpr_base, 26 | private detail::block_base 27 | { 28 | static constexpr std::size_t k=K; 29 | using value_type=Block[k]; 30 | 31 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 32 | static inline void mark(value_type& x,std::uint64_t hash) 33 | { 34 | std::size_t i=0; 35 | loop(hash,[&](std::uint64_t h){block_ops::set(x[i++],h&mask);}); 36 | } 37 | 38 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 39 | /* 'int': forcing value to bool 'true' or 'false' */ 40 | #pragma warning(push) 41 | #pragma warning(disable:4800) 42 | #endif 43 | 44 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 45 | static inline bool check(const value_type& x,std::uint64_t hash) 46 | { 47 | int res=1; 48 | std::size_t i=0; 49 | loop(hash,[&](std::uint64_t h){block_ops::reduce(res,x[i++],h&mask);}); 50 | return res; 51 | } 52 | 53 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 54 | #pragma warning(pop) /* C4800 */ 55 | #endif 56 | 57 | private: 58 | using super=detail::block_base; 59 | using super::mask; 60 | using super::loop; 61 | using block_ops=detail::block_ops; 62 | }; 63 | 64 | } /* namespace bloom */ 65 | } /* namespace boost */ 66 | #endif 67 | -------------------------------------------------------------------------------- /doc/bloom/reference/subfilters.adoc: -------------------------------------------------------------------------------- 1 | [#subfilter] 2 | == Subfilters 3 | 4 | :idprefix: subfilters_ 5 | 6 | A _subfilter_ implements a specific algorithm for bit setting (insertion) 
and 7 | bit checking (lookup) for `boost::bloom::filter`. Subfilters operate 8 | on portions of the filter's internal array called _subarrays_. The 9 | exact width of these subarrays is fixed at compile time by the subfilter type. 10 | 11 | The full interface of a conforming subfilter is not exposed publicly, hence 12 | users can't provide their own subfilters and may only use those natively 13 | provided by the library. What follows is the publicly available interface. 14 | 15 | [listing,subs="+macros,+quotes"] 16 | ----- 17 | Subfilter::k 18 | ----- 19 | 20 | [horizontal] 21 | Result:;; A compile-time `std::size_t` value indicating 22 | the number of (not necessarily distinct) bits set/checked per operation. 23 | 24 | [listing,subs="+macros,+quotes"] 25 | ----- 26 | typename Subfilter::value_type 27 | ----- 28 | 29 | [horizontal] 30 | Result:;; A cv-unqualified, 31 | https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable[TriviallyCopyable^] 32 | type to which the subfilter projects assigned subarrays. 33 | 34 | [listing,subs="+macros,+quotes"] 35 | ----- 36 | Subfilter::used_value_size 37 | ----- 38 | 39 | [horizontal] 40 | Result:;; A compile-time `std::size_t` value indicating 41 | the size in bytes of the effective portion of `Subfilter::value_type` used 42 | for bit setting/checking (assumed to begin at the lowest address in memory). 43 | Postconditions:;; Greater than zero and not greater than `sizeof(Subfilter::value_type)`. 44 | Notes:;; Optional. 45 | 46 | === _used-value-size_ 47 | 48 | [listing,subs="+macros,+quotes"] 49 | ----- 50 | template<typename Subfilter> 51 | constexpr std::size_t _used-value-size_; // exposition only 52 | ----- 53 | 54 | `_used-value-size_<Subfilter>` is `Subfilter::used_value_size` if this nested 55 | constant exists, or `sizeof(Subfilter::value_type)` otherwise. 56 | The value is the effective size in bytes of the subarrays upon which a 57 | given subfilter operates.
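The "nested constant if present, `sizeof` otherwise" rule described above can be expressed with a standard detection idiom. The following is a sketch under stated assumptions, not the library's internal code: the trait name `used_value_size_of`, the helper `void_t_` and the two example subfilter types are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// C++11 substitute for std::void_t (hypothetical helper name).
template<typename...> struct make_void_ { using type = void; };
template<typename... Ts> using void_t_ = typename make_void_<Ts...>::type;

// Primary template: no nested used_value_size, so fall back to
// sizeof(value_type).
template<typename Subfilter, typename = void>
struct used_value_size_of
{
  static constexpr std::size_t value = sizeof(typename Subfilter::value_type);
};

// Specialization chosen only when Subfilter::used_value_size is well-formed.
template<typename Subfilter>
struct used_value_size_of<
  Subfilter, void_t_<decltype(Subfilter::used_value_size)>>
{
  static constexpr std::size_t value = Subfilter::used_value_size;
};

// Hypothetical subfilter without the optional constant.
struct plain_subfilter
{
  static constexpr std::size_t k = 4;
  using value_type = std::uint64_t;
};

// Hypothetical subfilter whose value_type is wider than the bytes it uses,
// in the spirit of a SIMD register holding only 4 * k meaningful bytes.
struct padded_subfilter
{
  static constexpr std::size_t k = 5;
  using value_type = std::uint32_t[8];
  static constexpr std::size_t used_value_size = 4 * k;
};

static_assert(
  used_value_size_of<plain_subfilter>::value == sizeof(std::uint64_t),
  "falls back to sizeof(value_type)");
static_assert(
  used_value_size_of<padded_subfilter>::value == 20,
  "uses the nested constant");
```

Keeping the constant optional means that simple subfilters need not declare it, while a subfilter whose `value_type` is padded can report the smaller number of bytes it actually touches.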
58 | 59 | ''' -------------------------------------------------------------------------------- /include/boost/bloom/block.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_BLOCK_HPP 10 | #define BOOST_BLOOM_BLOCK_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | namespace boost{ 19 | namespace bloom{ 20 | 21 | template 22 | struct block: 23 | public detail::block_fpr_base, 24 | private detail::block_base 25 | { 26 | static constexpr std::size_t k=K; 27 | using value_type=Block; 28 | 29 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 30 | static inline void mark(value_type& x,std::uint64_t hash) 31 | { 32 | loop(hash,[&](std::uint64_t h){block_ops::set(x,h&mask);}); 33 | } 34 | 35 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 36 | static inline bool check(const value_type& x,std::uint64_t hash) 37 | { 38 | return check(x,hash,typename block_ops::is_extended_block{}); 39 | } 40 | 41 | private: 42 | using super=detail::block_base; 43 | using super::mask; 44 | using super::loop; 45 | using super::loop_while; 46 | using block_ops=detail::block_ops; 47 | 48 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 49 | static inline bool check( 50 | const value_type& x,std::uint64_t hash, 51 | std::false_type /* non-extended block */) 52 | { 53 | Block fp; 54 | block_ops::zero(fp); 55 | mark(fp,hash); 56 | return block_ops::testc(x,fp); 57 | } 58 | 59 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 60 | static inline bool check( 61 | const value_type& x,std::uint64_t hash, 62 | std::true_type /* extended block */) 63 | { 64 | int 
res=1; 65 | loop(hash,[&](std::uint64_t h){ 66 | res&=block_ops::get_at_lsb(x,h&mask); 67 | }); 68 | return res; 69 | } 70 | }; 71 | 72 | } /* namespace bloom */ 73 | } /* namespace boost */ 74 | #endif 75 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/block_ops.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_BLOCK_OPS_HPP 10 | #define BOOST_BLOOM_DETAIL_BLOCK_OPS_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | namespace boost{ 17 | namespace bloom{ 18 | namespace detail{ 19 | 20 | #if defined(BOOST_MSVC) 21 | #pragma warning(push) 22 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 23 | #endif 24 | 25 | template 26 | struct block_ops 27 | { 28 | using is_extended_block=std::false_type; 29 | using value_type=Block; 30 | 31 | static BOOST_FORCEINLINE void zero(Block& x) 32 | { 33 | x=0; 34 | } 35 | 36 | static BOOST_FORCEINLINE void set(value_type& x,std::uint64_t n) 37 | { 38 | x|=Block(1)<(x>>n); 44 | } 45 | 46 | static BOOST_FORCEINLINE void reduce( 47 | int& res,const value_type& x,std::uint64_t n) 48 | { 49 | res&=get_at_lsb(x,n); 50 | } 51 | 52 | static BOOST_FORCEINLINE bool testc(const value_type& x,const value_type& y) 53 | { 54 | return (x&y)==y; 55 | } 56 | }; 57 | 58 | template 59 | struct block_ops 60 | { 61 | using is_extended_block=std::true_type; 62 | using value_type=Block[N]; 63 | 64 | static BOOST_FORCEINLINE void zero(value_type& x) 65 | { 66 | for(std::size_t i=0;i(x[n%N]>>(n/N)); 77 | } 78 | 79 | static BOOST_FORCEINLINE void reduce( 80 | int& res,const value_type& x,std::uint64_t n) 81 | 
{ 82 | res&=get_at_lsb(x,n); 83 | } 84 | }; 85 | 86 | #if defined(BOOST_MSVC) 87 | #pragma warning(pop) /* C4714 */ 88 | #endif 89 | 90 | 91 | } /* namespace detail */ 92 | } /* namespace bloom */ 93 | } /* namespace boost */ 94 | 95 | #endif 96 | -------------------------------------------------------------------------------- /doc/bloom/intro.adoc: -------------------------------------------------------------------------------- 1 | [#intro] 2 | = Introduction 3 | 4 | :idprefix: intro_ 5 | 6 | Boost.Bloom provides the class template `xref:tutorial[boost::bloom::filter]` 7 | that can be configured to implement a classical Bloom filter as well as 8 | variations discussed in the literature such as block filters, multiblock filters, 9 | and more. 10 | 11 | [source,subs="+macros,+quotes"] 12 | ----- 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | int main() 19 | { 20 | // Bloom filter of strings with 5 bits set per insertion 21 | using filter = boost::bloom::filter; 22 | 23 | // create filter with a capacity of 1'000'000 **bits** 24 | filter f(1'000'000); 25 | 26 | // insert elements (they can't be erased, Bloom filters are insert-only) 27 | f.insert("hello"); 28 | f.insert("Boost"); 29 | 30 | // elements inserted are always correctly checked as such 31 | assert(f.may_contain("hello") == true); 32 | 33 | // elements not inserted may incorrectly be identified as such with a 34 | // false positive rate (FPR) which is a function of the array capacity, 35 | // the number of bits set per element and generally how the boost::bloom::filter 36 | // was specified 37 | if(f.may_contain("bye")) { // likely false 38 | std::cout << "false positive\n"; 39 | } 40 | else { 41 | std::cout << "everything worked as expected\n"; 42 | } 43 | } 44 | ----- 45 | 46 | The different filter variations supported are specified at compile time 47 | as part of the `boost::bloom::filter` instantiation definition. 
48 | Boost.Bloom has been implemented with a focus on performance; 49 | SIMD technologies such as AVX2, Neon and SSE2 can be leveraged to speed up 50 | operations. 51 | 52 | == Getting Started 53 | 54 | Consult the website 55 | https://www.boost.org/doc/user-guide/getting-started.html[section^] 56 | on how to install the entire Boost project or only Boost.Bloom 57 | and its dependencies. 58 | 59 | Boost.Bloom is a header-only library, so no additional build phase is 60 | needed. C++11 or later required. The library has been verified to 61 | work with GCC 4.8, Clang 3.9 and Visual Studio 2015 (and later versions 62 | of those). You can check that your environment is correctly set up 63 | by compiling the 64 | link:../../example/basic.cpp[example program] shown above. 65 | 66 | If you are not familiar with Bloom filters in general, see the 67 | xref:primer[primer]; otherwise, you can jump directly to the 68 | xref:tutorial[tutorial]. -------------------------------------------------------------------------------- /example/serialization.cpp: -------------------------------------------------------------------------------- 1 | /* Serialization of boost::bloom::filter. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 
9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | /* emits a deterministic pseudorandom sequence of UUIDs */ 21 | 22 | struct uuid_generator 23 | { 24 | boost::uuids::uuid operator()() 25 | { 26 | std::uint8_t data[16]; 27 | std::uint64_t x = rng(); 28 | std::memcpy(&data[0], &x, sizeof(x)); 29 | x = rng(); 30 | std::memcpy(&data[8], &x, sizeof(x)); 31 | 32 | return {data}; 33 | } 34 | 35 | boost::detail::splitmix64 rng; 36 | }; 37 | 38 | using filter = boost::bloom::filter< 39 | boost::uuids::uuid, 1, boost::bloom::multiblock >; 40 | 41 | static constexpr std::size_t num_elements = 10000; 42 | 43 | /* creates a filter with num_elements UUIDs */ 44 | 45 | filter create_filter() 46 | { 47 | uuid_generator gen; 48 | filter f(num_elements, 0.005); 49 | for(std::size_t i = 0; i < num_elements; ++i) f.insert(gen()); 50 | return f; 51 | } 52 | 53 | void save_filter(const filter& f, const char* filename) 54 | { 55 | std::ofstream out(filename, std::ios::binary | std::ios::trunc); 56 | std::size_t c=f.capacity(); 57 | out.write(reinterpret_cast(&c), sizeof(c)); /* save capacity (bits) */ 58 | auto s = f.array(); 59 | out.write(reinterpret_cast(s.data()), s.size()); /* save array */ 60 | } 61 | 62 | filter load_filter(const char* filename) 63 | { 64 | std::ifstream in(filename, std::ios::binary); 65 | std::size_t c; 66 | in.read(reinterpret_cast(&c), sizeof(c)); 67 | filter f(c); 68 | auto s = f.array(); 69 | in.read(reinterpret_cast(s.data()), s.size()); /* load array */ 70 | return f; 71 | } 72 | 73 | int main() 74 | { 75 | static constexpr const char* filename = "filter.bin"; 76 | 77 | auto f1 = create_filter(); 78 | save_filter(f1, filename); 79 | auto f2 = load_filter(filename); 80 | 81 | if (f1 == f2) std::cout << "serialization correct\n"; 82 | else std::cout << "something went wrong\n"; 83 | } 84 | -------------------------------------------------------------------------------- 
/include/boost/bloom/detail/mulx64.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2022 Peter Dimov. 2 | * Copyright 2025 Joaquin M Lopez Munoz. 3 | * Distributed under the Boost Software License, Version 1.0. 4 | * (See accompanying file LICENSE_1_0.txt or copy at 5 | * http://www.boost.org/LICENSE_1_0.txt) 6 | * 7 | * See https://www.boost.org/libs/bloom for library home page. 8 | */ 9 | 10 | #ifndef BOOST_BLOOM_DETAIL_MULX64_HPP 11 | #define BOOST_BLOOM_DETAIL_MULX64_HPP 12 | 13 | #include 14 | #include 15 | #include 16 | 17 | #if defined(_MSC_VER)&&!defined(__clang__) 18 | #include 19 | #endif 20 | 21 | namespace boost{ 22 | namespace bloom{ 23 | namespace detail{ 24 | 25 | #if defined(_MSC_VER)&&defined(_M_X64)&&!defined(__clang__) 26 | 27 | __forceinline std::uint64_t umul128( 28 | std::uint64_t x,std::uint64_t y,std::uint64_t& hi) 29 | { 30 | return _umul128(x,y,&hi); 31 | } 32 | 33 | #elif defined(_MSC_VER)&&defined(_M_ARM64)&&!defined(__clang__) 34 | 35 | __forceinline std::uint64_t umul128( 36 | std::uint64_t x,std::uint64_t y,std::uint64_t& hi) 37 | { 38 | hi=__umulh(x,y); 39 | return x*y; 40 | } 41 | 42 | #elif defined(__SIZEOF_INT128__) 43 | 44 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 45 | inline std::uint64_t umul128( 46 | std::uint64_t x,std::uint64_t y,std::uint64_t& hi) 47 | { 48 | __uint128_t r=(__uint128_t)x*y; 49 | hi=(std::uint64_t)(r>>64); 50 | return (std::uint64_t)r; 51 | } 52 | 53 | #else 54 | 55 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 56 | inline std::uint64_t umul128( 57 | std::uint64_t x,std::uint64_t y,std::uint64_t& hi) 58 | { 59 | std::uint64_t x1=(std::uint32_t)x; 60 | std::uint64_t x2=x >> 32; 61 | 62 | std::uint64_t y1=(std::uint32_t)y; 63 | std::uint64_t y2=y >> 32; 64 | 65 | std::uint64_t r3=x2*y2; 66 | 67 | std::uint64_t r2a=x1*y2; 68 | 69 | r3+=r2a>>32; 70 | 71 | std::uint64_t r2b=x2*y1; 72 | 73 | r3+=r2b>>32; 74 | 75 | 
std::uint64_t r1=x1*y1; 76 | 77 | std::uint64_t r2=(r1>>32)+(std::uint32_t)r2a+(std::uint32_t)r2b; 78 | 79 | r1=(r2<<32)+(std::uint32_t)r1; 80 | r3+=r2>>32; 81 | 82 | hi=r3; 83 | return r1; 84 | } 85 | 86 | #endif 87 | 88 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 89 | inline std::uint64_t mulx64(std::uint64_t x)noexcept 90 | { 91 | /* multiplier is 2^64/phi */ 92 | std::uint64_t hi; 93 | std::uint64_t lo=umul128(x,0x9E3779B97F4A7C15ull,hi); 94 | return hi^lo; 95 | } 96 | 97 | } /* namespace detail */ 98 | } /* namespace bloom */ 99 | } /* namespace boost */ 100 | #endif 101 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/block_base.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_BLOCK_BASE_HPP 10 | #define BOOST_BLOOM_DETAIL_BLOCK_BASE_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | 19 | namespace boost{ 20 | namespace bloom{ 21 | namespace detail{ 22 | 23 | #if defined(BOOST_MSVC) 24 | #pragma warning(push) 25 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 26 | #endif 27 | 28 | /* Validates type Block and provides common looping facilities for block 29 | * and multiblock. 
30 | */ 31 | 32 | template 33 | struct block_base 34 | { 35 | static_assert( 36 | is_unsigned_integral_or_extended_unsigned_integral::value|| 37 | ( 38 | is_array_of< 39 | Block,is_unsigned_integral_or_extended_unsigned_integral>::value&& 40 | is_power_of_two::value>::value 41 | ), 42 | "Block must be an (extended) unsigned integral type or an array T[N] " 43 | "with T an (extended) unsigned integral type and N a power of two"); 44 | static constexpr std::size_t k=K; 45 | static constexpr std::size_t hash_width=sizeof(std::uint64_t)*CHAR_BIT; 46 | static constexpr std::size_t block_width=sizeof(Block)*CHAR_BIT; 47 | static constexpr std::size_t mask=block_width-1; 48 | static constexpr std::size_t shift=constexpr_bit_width(mask); 49 | static constexpr std::size_t rehash_k=(hash_width-shift)/shift; 50 | 51 | template 52 | static BOOST_FORCEINLINE void loop(std::uint64_t hash,F f) 53 | { 54 | for(std::size_t i=0;i>=shift; 58 | f(h); 59 | } 60 | hash=detail::mulx64(hash); 61 | } 62 | auto h=hash; 63 | for(std::size_t i=0;i>=shift; 65 | f(h); 66 | } 67 | } 68 | 69 | template 70 | static BOOST_FORCEINLINE bool loop_while(std::uint64_t hash,F f) 71 | { 72 | for(std::size_t i=0;i>=shift; 76 | if(!f(h))return false; 77 | } 78 | hash=detail::mulx64(hash); 79 | } 80 | auto h=hash; 81 | for(std::size_t i=0;i>=shift; 83 | if(!f(h))return false; 84 | } 85 | return true; 86 | } 87 | }; 88 | 89 | #if defined(BOOST_MSVC) 90 | #pragma warning(pop) /* C4714 */ 91 | #endif 92 | 93 | } /* namespace detail */ 94 | } /* namespace bloom */ 95 | } /* namespace boost */ 96 | #endif 97 | -------------------------------------------------------------------------------- /test/test_insertion.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include "test_types.hpp" 14 | #include "test_utilities.hpp" 15 | 16 | using namespace test_utilities; 17 | 18 | template 19 | struct multiarg_constructed 20 | { 21 | multiarg_constructed()=default; 22 | template 23 | multiarg_constructed(Arg1&& arg,Arg2&&,Args&&...): 24 | x{std::forward(arg)}{} 25 | multiarg_constructed(const multiarg_constructed&)=delete; 26 | multiarg_constructed(multiarg_constructed&&)=delete; 27 | multiarg_constructed& operator=(const multiarg_constructed&)=default; 28 | 29 | operator const T&()const{return x;} 30 | 31 | T x; 32 | }; 33 | 34 | template 35 | struct transparent:Functor 36 | { 37 | using is_transparent=void; 38 | using Functor::Functor; 39 | }; 40 | 41 | template 42 | void test_insertion() 43 | { 44 | using filter=rehash_filter< 45 | revalue_filter>, 46 | transparent 47 | >; 48 | using value_type=typename filter::value_type; 49 | 50 | ValueFactory fac; 51 | 52 | { 53 | filter f(10000); 54 | value_type x{fac(),0}; 55 | f.insert(const_cast(x)); 56 | BOOST_TEST(f.may_contain(x)); 57 | } 58 | { 59 | filter f(10000); 60 | value_type x{fac(),0}; 61 | f.insert(std::move(x)); 62 | BOOST_TEST(f.may_contain(x)); 63 | } 64 | { 65 | filter f(10000); 66 | auto x=fac(); 67 | f.insert(x); /* transparent insert */ 68 | BOOST_TEST(f.may_contain(x)); 69 | } 70 | { 71 | filter f(10000); 72 | std::array input; 73 | for(auto& x:input)x={fac(),0}; 74 | f.insert(input.begin(),input.end()); 75 | BOOST_TEST(may_contain(f,input)); 76 | } 77 | { 78 | filter f(10000); 79 | std::array input; 80 | for(auto& x:input)x=fac(); 81 | f.insert(input.begin(),input.end()); /* transparent insert */ 82 | BOOST_TEST(may_contain(f,input)); 83 | } 84 | { 85 | filter f(10000); 86 | std::initializer_list 
il={{fac(),0},{fac(),0},{fac(),0}}; 87 | f.insert(il); 88 | BOOST_TEST(may_contain(f,il)); 89 | } 90 | } 91 | 92 | struct lambda 93 | { 94 | template 95 | void operator()(T) 96 | { 97 | using filter=typename T::type; 98 | using value_type=typename filter::value_type; 99 | 100 | test_insertion>(); 101 | } 102 | }; 103 | 104 | int main() 105 | { 106 | boost::mp11::mp_for_each(lambda{}); 107 | return boost::report_errors(); 108 | } 109 | -------------------------------------------------------------------------------- /test/test_combination.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include "test_types.hpp" 13 | #include "test_utilities.hpp" 14 | 15 | using namespace test_utilities; 16 | 17 | template 18 | void test_combination() 19 | { 20 | using filter=Filter; 21 | using value_type=typename filter::value_type; 22 | 23 | std::vector input1,input2; 24 | ValueFactory fac; 25 | for(int i=0;i<10;++i){ 26 | input1.push_back(fac()); 27 | input2.push_back(fac()); 28 | } 29 | 30 | #if defined(__clang__)&&defined(__has_warning) 31 | #if __has_warning("-Wself-assign-overloaded") 32 | #pragma clang diagnostic push 33 | #pragma clang diagnostic ignored "-Wself-assign-overloaded" 34 | #endif 35 | #endif 36 | { 37 | filter f{0}; 38 | f&=f; 39 | f|=f; 40 | } 41 | { 42 | filter f{input1.begin(),input1.end(),1000}, 43 | f_copy{f}; 44 | 45 | f&=f; 46 | BOOST_TEST(f==f_copy); 47 | f|=f; 48 | BOOST_TEST(f==f_copy); 49 | } 50 | #if defined(__clang__)&&defined(__has_warning) 51 | #if __has_warning("-Wself-assign-overloaded") 52 | #pragma clang diagnostic pop 53 | #endif 54 | #endif 55 | 56 | { 57 | filter 
f1{input1.begin(),input1.end(),1000}, 58 | f1_copy{f1}, 59 | f2{input2.begin(),input2.end(),f1.capacity()+1}; 60 | 61 | BOOST_TEST_THROWS(f1&=filter{},std::invalid_argument); 62 | BOOST_TEST(f1==f1_copy); 63 | BOOST_TEST_THROWS(f1&=f2,std::invalid_argument); 64 | BOOST_TEST(f1==f1_copy); 65 | BOOST_TEST_THROWS(f1|=filter{},std::invalid_argument); 66 | BOOST_TEST(f1==f1_copy); 67 | BOOST_TEST_THROWS(f1|=f2;,std::invalid_argument); 68 | BOOST_TEST(f1==f1_copy); 69 | } 70 | { 71 | filter f1{input1.begin(),input1.end(),1000}, 72 | f1_copy{f1}, 73 | empty{f1.capacity()}; 74 | 75 | filter& rf1=(f1|=empty); 76 | BOOST_TEST_EQ(&rf1,&f1); 77 | BOOST_TEST(f1==f1_copy); 78 | filter& rf2=(f1&=empty); 79 | BOOST_TEST_EQ(&rf2,&f1); 80 | BOOST_TEST(f1==empty); 81 | } 82 | { 83 | filter f1{input1.begin(),input1.end(),1000}; 84 | const filter f2{input2.begin(),input2.end(),f1.capacity()}; 85 | 86 | f1.insert(input2.begin(),input2.end()); 87 | f1&=f2; 88 | BOOST_TEST(may_contain(f1,input2)); 89 | BOOST_TEST(may_not_contain(f1,input1)); 90 | } 91 | { 92 | filter f1{input1.begin(),input1.end(),1000}; 93 | const filter f2{input2.begin(),input2.end(),f1.capacity()}; 94 | 95 | f1|=f2; 96 | BOOST_TEST(may_contain(f1,input1)); 97 | BOOST_TEST(may_contain(f1,input2)); 98 | } 99 | } 100 | 101 | struct lambda 102 | { 103 | template 104 | void operator()(T) 105 | { 106 | using filter=typename T::type; 107 | using value_type=typename filter::value_type; 108 | 109 | test_combination>(); 110 | } 111 | }; 112 | 113 | int main() 114 | { 115 | boost::mp11::mp_for_each(lambda{}); 116 | return boost::report_errors(); 117 | } 118 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/fast_multiblock32_avx2.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_AVX2_HPP 10 | #define BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_AVX2_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | namespace boost{ 21 | namespace bloom{ 22 | 23 | #if defined(BOOST_MSVC) 24 | #pragma warning(push) 25 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 26 | #endif 27 | 28 | template 29 | struct fast_multiblock32:detail::multiblock_fpr_base 30 | { 31 | static constexpr std::size_t k=K; 32 | using value_type=__m256i[(k+7)/8]; 33 | static constexpr std::size_t used_value_size=sizeof(std::uint32_t)*k; 34 | 35 | static BOOST_FORCEINLINE void mark(value_type& x,std::uint64_t hash) 36 | { 37 | for(std::size_t i=0;i 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include "test_types.hpp" 16 | #include "test_utilities.hpp" 17 | 18 | using namespace test_utilities; 19 | 20 | template 21 | struct throwing_allocator 22 | { 23 | using value_type=T; 24 | 25 | throwing_allocator()=default; 26 | template 27 | throwing_allocator(const throwing_allocator&){} 28 | 29 | T* allocate(std::size_t n) 30 | { 31 | return static_cast(capped_new(n*sizeof(T))); 32 | } 33 | 34 | void deallocate(T* p,std::size_t){::operator delete(p);} 35 | 36 | bool operator==(const throwing_allocator& x)const{return true;} 37 | bool operator!=(const throwing_allocator& x)const{return false;} 38 | }; 39 | 40 | template 41 | double measure_fpr(Filter&& f,std::size_t n) 42 | { 43 | using value_type=typename std::remove_reference::type::value_type; 44 | 45 | value_factory fac; 46 | std::size_t res=0; 47 | for(std::size_t i=0;i 54 | void test_fpr() 55 | { 56 | using filter=rehash_filter< 57 | revalue_filter< 58 | realloc_filter>, 59 | std::string 60 | >, 61 
| boost::hash 62 | >; 63 | 64 | BOOST_TEST_GT(filter(0,0.0).capacity(),0u); 65 | BOOST_TEST_GT(filter(0,0.5).capacity(),0u); 66 | BOOST_TEST_EQ(filter(0,1.0).capacity(),0u); 67 | BOOST_TEST_THROWS((void)filter(1,0.0),std::bad_alloc); 68 | BOOST_TEST_EQ(filter(100,1.0).capacity(),0u); 69 | 70 | { 71 | static constexpr int max_fpr_exp= 72 | std::numeric_limits::digits>=64?5:3; 73 | 74 | for(int i=1;i<=max_fpr_exp;++i){ 75 | std::size_t n=(std::size_t)std::pow(10.0,(double)(i+1)); 76 | double target_fpr=std::pow(10,(double)-i); 77 | double measured_fpr=measure_fpr(filter(n,target_fpr),n); 78 | double err=measured_fpr/target_fpr; 79 | BOOST_TEST_LE(err,2.5); 80 | } 81 | } 82 | 83 | BOOST_TEST_EQ(filter::fpr_for(0,1),0.0); 84 | BOOST_TEST_EQ(filter::fpr_for(0,0),1.0); 85 | BOOST_TEST_EQ(filter::fpr_for(1,0),1.0); 86 | 87 | { 88 | for(int i=1;i<=5;++i){ 89 | double fpr1=std::pow(10.0,(double)-i); 90 | double fpr2=filter::fpr_for(10000,filter::capacity_for(10000,fpr1)); 91 | BOOST_TEST_LE(std::abs((double)fpr2-fpr1)/fpr1,0.2); 92 | } 93 | } 94 | { 95 | for(int i=1;i<=5;++i){ 96 | std::size_t m1=(std::size_t)std::pow(10.0,(double)(i+4)); 97 | std::size_t m2=filter::capacity_for(10000,filter::fpr_for(10000,m1)); 98 | BOOST_TEST_LE(std::abs((double)m2-m1)/m1,0.05); 99 | } 100 | } 101 | } 102 | 103 | struct lambda 104 | { 105 | template 106 | void operator()(T) 107 | { 108 | using filter=typename T::type; 109 | 110 | test_fpr(); 111 | } 112 | }; 113 | 114 | int main() 115 | { 116 | boost::mp11::mp_for_each(lambda{}); 117 | return boost::report_errors(); 118 | } 119 | -------------------------------------------------------------------------------- /benchmark/fpr_c.cpp: -------------------------------------------------------------------------------- 1 | /* For a given filter type, outputs FPR vs. c = m/n with optimum k. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 
5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | template 25 | double fpr(std::size_t c) 26 | { 27 | using value_type=typename Filter::value_type; 28 | 29 | std::size_t num_elements=(std::size_t)(1000/Filter::fpr_for(1,c)); 30 | std::vector data_in,data_out; 31 | { 32 | boost::detail::splitmix64 rng; 33 | boost::unordered_flat_set unique; 34 | for(std::size_t i=0;i 69 | using filter=boost::bloom::filter,1>; 70 | 71 | /* change this to your desired c range */ 72 | std::size_t c_min=4, 73 | c_max=24; 74 | 75 | /* you may need to change this if optimum k >= k_max */ 76 | constexpr std::size_t k_max=20; 77 | 78 | using fpr_function=std::function; 79 | static std::vector fprs=[] 80 | { 81 | std::vector fprs; 82 | using ks=boost::mp11::mp_iota_c; 83 | boost::mp11::mp_for_each([&](auto K){ 84 | fprs.emplace_back(fpr< ::filter >); 85 | }); 86 | return fprs; 87 | }(); 88 | 89 | int main() 90 | { 91 | std::string filter_name= 92 | boost::typeindex::type_id< ::filter<666> >().pretty_name(); 93 | boost::replace_all(filter_name,"boost::bloom::",""); 94 | boost::replace_all(filter_name,"class ",""); 95 | boost::replace_all(filter_name,"struct ",""); 96 | boost::replace_all(filter_name,"666","K"); 97 | 98 | std::cout 99 | <=k_max){ 107 | std::cerr<<"k_max hit, raise it and rerun\n"; 108 | return EXIT_FAILURE; 109 | } 110 | double rn=fprs[ik+1](c); 111 | if(rn>=r)break; 112 | r=rn; 113 | ++ik; 114 | } 115 | std::cout<$", printer) 44 | 45 | add_template_printer("boost::bloom::filter", BoostBloomFilterPrinter) 46 | 47 | return pp 48 | 49 | gdb.printing.register_pretty_printer(gdb.current_objfile(), boost_bloom_build_pretty_printer()) 50 | 51 | # 
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-an-Xmethod.html 52 | class BoostBloomFilterSubscriptMethod(gdb.xmethod.XMethod): 53 | def __init__(self): 54 | gdb.xmethod.XMethod.__init__(self, 'subscript') 55 | 56 | def get_worker(self, method_name): 57 | if method_name == 'operator[]': 58 | return BoostBloomFilterSubscriptWorker() 59 | 60 | class BoostBloomFilterSubscriptWorker(gdb.xmethod.XMethodWorker): 61 | def get_arg_types(self): 62 | return [gdb.lookup_type('std::size_t')] 63 | 64 | def get_result_type(self, obj): 65 | return gdb.lookup_type('unsigned char') 66 | 67 | def __call__(self, obj, index): 68 | fp = BoostBloomFilterPrinter(obj) 69 | if fp.array_size == 0: 70 | print('Error: Filter is null') 71 | return 72 | elif index < 0 or index >= fp.array_size: 73 | print('Error: Out of bounds') 74 | return 75 | else: 76 | data = fp.data 77 | return (data + index).dereference() 78 | 79 | class BoostBloomFilterMatcher(gdb.xmethod.XMethodMatcher): 80 | def __init__(self): 81 | gdb.xmethod.XMethodMatcher.__init__(self, 'BoostBloomFilterMatcher') 82 | self.methods = [BoostBloomFilterSubscriptMethod()] 83 | 84 | def match(self, class_type, method_name): 85 | if not class_type.tag.startswith('boost::bloom::filter<'): 86 | return None 87 | 88 | workers = [] 89 | for method in self.methods: 90 | if method.enabled: 91 | worker = method.get_worker(method_name) 92 | if worker: 93 | workers.append(worker) 94 | return workers 95 | 96 | gdb.xmethod.register_xmethod_matcher(None, BoostBloomFilterMatcher()) 97 | -------------------------------------------------------------------------------- /test/test_capacity.cpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 
3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include "test_types.hpp" 16 | #include "test_utilities.hpp" 17 | 18 | using namespace test_utilities; 19 | 20 | static std::size_t num_allocations=0; 21 | 22 | template 23 | struct counting_allocator 24 | { 25 | using value_type=T; 26 | 27 | counting_allocator()=default; 28 | template 29 | counting_allocator(const counting_allocator&){} 30 | 31 | T* allocate(std::size_t n) 32 | { 33 | ++num_allocations; 34 | return static_cast(capped_new(n*sizeof(T))); 35 | } 36 | 37 | void deallocate(T* p,std::size_t){::operator delete(p);} 38 | 39 | bool operator==(const counting_allocator& x)const{return true;} 40 | bool operator!=(const counting_allocator& x)const{return false;} 41 | }; 42 | 43 | template 44 | void test_capacity() 45 | { 46 | using filter=realloc_filter>; 47 | 48 | ValueFactory fac; 49 | 50 | { 51 | for(std::size_t n=0;n<10000;++n){ 52 | const filter f{n}; 53 | std::size_t c=f.capacity(); 54 | BOOST_TEST_EQ(c%CHAR_BIT,0); 55 | if(n==0)BOOST_TEST_EQ(c,0); 56 | else BOOST_TEST_GE(c,n); 57 | BOOST_TEST_EQ(filter{c}.capacity(),c); 58 | } 59 | } 60 | { 61 | num_allocations=0; 62 | filter f; 63 | BOOST_TEST_EQ(f.capacity(),0); 64 | BOOST_TEST_EQ(num_allocations,0); 65 | } 66 | { 67 | BOOST_TEST_THROWS( 68 | (void)filter((std::numeric_limits::max)()), 69 | std::bad_alloc); 70 | } 71 | { 72 | filter f{{fac(),fac()},1000}; 73 | std::size_t c=f.capacity(); 74 | num_allocations=0; 75 | f.reset(f.capacity()); 76 | BOOST_TEST_EQ(num_allocations,0); 77 | BOOST_TEST_EQ(f.capacity(),c); 78 | BOOST_TEST(f==filter{f.capacity()}); 79 | } 80 | { 81 | filter f{{fac(),fac()},1000}; 82 | num_allocations=0; 83 | f.reset(); 84 | BOOST_TEST_EQ(num_allocations,0); 85 | BOOST_TEST_EQ(f.capacity(),0); 86 | 
BOOST_TEST(f==filter{}); 87 | } 88 | { 89 | filter f{{fac(),fac()},1000}; 90 | num_allocations=0; 91 | f.reset(0,1.0); 92 | BOOST_TEST_EQ(num_allocations,0); 93 | BOOST_TEST_EQ(f.capacity(),0); 94 | BOOST_TEST(f==filter{}); 95 | } 96 | { 97 | filter f{{fac(),fac()},1000}; 98 | std::size_t c=f.capacity(); 99 | num_allocations=0; 100 | f.reset(c+1); 101 | BOOST_TEST_EQ(num_allocations,1); 102 | BOOST_TEST_GE(f.capacity(),c+1); 103 | BOOST_TEST(f==filter{f.capacity()}); 104 | } 105 | { 106 | filter f; 107 | std::size_t c=filter::capacity_for(100,0.1); 108 | num_allocations=0; 109 | f.reset(100,0.1); 110 | BOOST_TEST_EQ(num_allocations,1); 111 | BOOST_TEST_EQ(f.capacity(),c); 112 | } 113 | { 114 | filter f1{{fac(),fac()},1000},f2; 115 | std::size_t c=f1.capacity(); 116 | num_allocations=0; 117 | f2=f1; 118 | BOOST_TEST_EQ(num_allocations,1); 119 | BOOST_TEST_GE(f2.capacity(),c); 120 | BOOST_TEST(f1==f2); 121 | } 122 | { 123 | for(int i=0;i<=5;++i){ 124 | double fpr=std::pow(10,(double)-i); 125 | BOOST_TEST_EQ( 126 | filter::capacity_for(100,fpr), 127 | filter(100,fpr).capacity()); 128 | } 129 | } 130 | } 131 | 132 | struct lambda 133 | { 134 | template 135 | void operator()(T) 136 | { 137 | using filter=typename T::type; 138 | using value_type=typename filter::value_type; 139 | 140 | test_capacity>(); 141 | } 142 | }; 143 | 144 | int main() 145 | { 146 | boost::mp11::mp_for_each(lambda{}); 147 | return boost::report_errors(); 148 | } 149 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/fast_multiblock64_avx2.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK64_AVX2_HPP 10 | #define BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK64_AVX2_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | namespace boost{ 21 | namespace bloom{ 22 | 23 | #if defined(BOOST_MSVC) 24 | #pragma warning(push) 25 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 26 | #endif 27 | 28 | namespace detail{ 29 | 30 | struct m256ix2 31 | { 32 | __m256i lo,hi; 33 | }; 34 | 35 | } /* namespace detail */ 36 | 37 | template 38 | struct fast_multiblock64:detail::multiblock_fpr_base 39 | { 40 | static constexpr std::size_t k=K; 41 | using value_type=detail::m256ix2[(k+7)/8]; 42 | static constexpr std::size_t used_value_size=sizeof(std::uint64_t)*k; 43 | 44 | static BOOST_FORCEINLINE void mark(value_type& x,std::uint64_t hash) 45 | { 46 | for(int i=0;i4)x.hi=_mm256_or_si256(x.hi,h.hi); 102 | } 103 | 104 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 105 | /* 'int': forcing value to bool 'true' or 'false' */ 106 | #pragma warning(push) 107 | #pragma warning(disable:4800) 108 | #endif 109 | 110 | static BOOST_FORCEINLINE bool check_m256ix2( 111 | const detail::m256ix2& x,std::uint64_t hash,std::size_t kp) 112 | { 113 | detail::m256ix2 h=make_m256ix2(hash,kp); 114 | auto res=_mm256_testc_si256(x.lo,h.lo); 115 | if(kp>4)res&=_mm256_testc_si256(x.hi,h.hi); 116 | return res; 117 | } 118 | 119 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 120 | #pragma warning(pop) /* C4800 */ 121 | #endif 122 | }; 123 | 124 | #if defined(BOOST_MSVC) 125 | #pragma warning(pop) /* C4714 */ 126 | #endif 127 | 128 | } /* namespace bloom */ 129 | } /* namespace boost */ 130 | 131 | #endif 132 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | * text=auto !eol svneol=native#text/plain 2 | *.gitattributes text 
svneol=native#text/plain 3 | 4 | # Scriptish formats 5 | *.bat text svneol=native#text/plain 6 | *.bsh text svneol=native#text/x-beanshell 7 | *.cgi text svneol=native#text/plain 8 | *.cmd text svneol=native#text/plain 9 | *.js text svneol=native#text/javascript 10 | *.php text svneol=native#text/x-php 11 | *.pl text svneol=native#text/x-perl 12 | *.pm text svneol=native#text/x-perl 13 | *.py text svneol=native#text/x-python 14 | *.sh eol=lf svneol=LF#text/x-sh 15 | configure eol=lf svneol=LF#text/x-sh 16 | 17 | # Image formats 18 | *.bmp binary svneol=unset#image/bmp 19 | *.gif binary svneol=unset#image/gif 20 | *.ico binary svneol=unset#image/ico 21 | *.jpeg binary svneol=unset#image/jpeg 22 | *.jpg binary svneol=unset#image/jpeg 23 | *.png binary svneol=unset#image/png 24 | *.tif binary svneol=unset#image/tiff 25 | *.tiff binary svneol=unset#image/tiff 26 | *.svg text svneol=native#image/svg%2Bxml 27 | 28 | # Data formats 29 | *.pdf binary svneol=unset#application/pdf 30 | *.avi binary svneol=unset#video/avi 31 | *.doc binary svneol=unset#application/msword 32 | *.dsp text svneol=crlf#text/plain 33 | *.dsw text svneol=crlf#text/plain 34 | *.eps binary svneol=unset#application/postscript 35 | *.gz binary svneol=unset#application/gzip 36 | *.mov binary svneol=unset#video/quicktime 37 | *.mp3 binary svneol=unset#audio/mpeg 38 | *.ppt binary svneol=unset#application/vnd.ms-powerpoint 39 | *.ps binary svneol=unset#application/postscript 40 | *.psd binary svneol=unset#application/photoshop 41 | *.rdf binary svneol=unset#text/rdf 42 | *.rss text svneol=unset#text/xml 43 | *.rtf binary svneol=unset#text/rtf 44 | *.sln text svneol=native#text/plain 45 | *.swf binary svneol=unset#application/x-shockwave-flash 46 | *.tgz binary svneol=unset#application/gzip 47 | *.vcproj text svneol=native#text/xml 48 | *.vcxproj text svneol=native#text/xml 49 | *.vsprops text svneol=native#text/xml 50 | *.wav binary svneol=unset#audio/wav 51 | *.xls binary 
svneol=unset#application/vnd.ms-excel 52 | *.zip binary svneol=unset#application/zip 53 | 54 | # Text formats 55 | .htaccess text svneol=native#text/plain 56 | *.bbk text svneol=native#text/xml 57 | *.cmake text svneol=native#text/plain 58 | *.css text svneol=native#text/css 59 | *.dtd text svneol=native#text/xml 60 | *.htm text svneol=native#text/html 61 | *.html text svneol=native#text/html 62 | *.ini text svneol=native#text/plain 63 | *.log text svneol=native#text/plain 64 | *.mak text svneol=native#text/plain 65 | *.qbk text svneol=native#text/plain 66 | *.rst text svneol=native#text/plain 67 | *.sql text svneol=native#text/x-sql 68 | *.txt text svneol=native#text/plain 69 | *.xhtml text svneol=native#text/xhtml%2Bxml 70 | *.xml text svneol=native#text/xml 71 | *.xsd text svneol=native#text/xml 72 | *.xsl text svneol=native#text/xml 73 | *.xslt text svneol=native#text/xml 74 | *.xul text svneol=native#text/xul 75 | *.yml text svneol=native#text/plain 76 | boost-no-inspect text svneol=native#text/plain 77 | CHANGES text svneol=native#text/plain 78 | COPYING text svneol=native#text/plain 79 | INSTALL text svneol=native#text/plain 80 | Jamfile text svneol=native#text/plain 81 | Jamroot text svneol=native#text/plain 82 | Jamfile.v2 text svneol=native#text/plain 83 | Jamrules text svneol=native#text/plain 84 | Makefile* text svneol=native#text/plain 85 | README text svneol=native#text/plain 86 | TODO text svneol=native#text/plain 87 | 88 | # Code formats 89 | *.c text svneol=native#text/plain 90 | *.cpp text svneol=native#text/plain 91 | *.h text svneol=native#text/plain 92 | *.hpp text svneol=native#text/plain 93 | *.ipp text svneol=native#text/plain 94 | *.tpp text svneol=native#text/plain 95 | *.jam text svneol=native#text/plain 96 | *.java text svneol=native#text/plain 97 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Boost Bloom 
Library

[![Branch](https://img.shields.io/badge/branch-master-brightgreen.svg)](https://github.com/boostorg/bloom/tree/master) [![CI](https://github.com/boostorg/bloom/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/boostorg/bloom/actions/workflows/ci.yml) [![Drone status](https://img.shields.io/drone/build/boostorg/bloom/master?server=https%3A%2F%2Fdrone.cpp.al&logo=drone&logoColor=%23CCCCCC&label=CI)](https://drone.cpp.al/boostorg/bloom) [![codecov](https://codecov.io/gh/joaquintides/bloom/branch/master/graph/badge.svg)](https://app.codecov.io/gh/joaquintides/bloom/tree/master) [![Documentation](https://img.shields.io/badge/docs-master-brightgreen.svg)](https://boost.org/doc/libs/master/libs/bloom)
[![Branch](https://img.shields.io/badge/branch-develop-brightgreen.svg)](https://github.com/boostorg/bloom/tree/develop) [![CI](https://github.com/boostorg/bloom/actions/workflows/ci.yml/badge.svg?branch=develop)](https://github.com/boostorg/bloom/actions/workflows/ci.yml) [![Drone status](https://img.shields.io/drone/build/boostorg/bloom/develop?server=https%3A%2F%2Fdrone.cpp.al&logo=drone&logoColor=%23CCCCCC&label=CI)](https://drone.cpp.al/boostorg/bloom) [![codecov](https://codecov.io/gh/joaquintides/bloom/branch/develop/graph/badge.svg)](https://app.codecov.io/gh/joaquintides/bloom/tree/develop) [![Documentation](https://img.shields.io/badge/docs-develop-brightgreen.svg)](https://boost.org/doc/libs/develop/libs/bloom)
[![BSL 1.0](https://img.shields.io/badge/license-BSL_1.0-blue.svg)](https://www.boost.org/users/license.html) C++11 required Header-only library

Boost.Bloom provides the class template `boost::bloom::filter` that
can be configured to implement a classical [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter)
as well as variations discussed in the literature such as block filters, multiblock filters, and more.

```cpp
#include <boost/bloom.hpp>
#include <cassert>
#include <string>

int main()
{
  // Bloom filter of strings with 5 bits set per insertion
  using filter = boost::bloom::filter<std::string, 5>;

  // create filter with a capacity of 1'000'000 **bits**
  filter f(1'000'000);

  // insert elements (they can't be erased, Bloom filters are insert-only)
  f.insert("hello");
  f.insert("Boost");
  //...

  // elements inserted are always correctly checked as such
  assert(f.may_contain("hello") == true);

  // elements not inserted may incorrectly be identified as such with a
  // false positive rate (FPR) which is a function of the array capacity,
  // the number of bits set per element and generally how the boost::bloom::filter
  // was specified
  if(f.may_contain("bye")) { // likely false
    //...
  }
}
```

## Learn about Boost.Bloom

* [Online documentation](https://boost.org/libs/bloom)
* [Some benchmarks](https://github.com/boostorg/boost_bloom_benchmarks)

## Install Boost.Bloom

* [Download Boost](https://www.boost.org/users/download/) and you're ready to go (this is a header-only library requiring no building).
* Using Conan 2: In case you don't have it yet, add an entry for Boost in your `conanfile.txt` (the example requires at least Boost 1.89):
    ```
    [requires]
    boost/[>=1.89.0]
    ```
    If you're not using any compiled Boost library, the following will skip building altogether:
    ```
    [options]
    boost:header_only=True
    ```
* Using vcpkg: Execute the command
    ```
    vcpkg install boost-bloom
    ```
* Using CMake: [Boost CMake support infrastructure](https://github.com/boostorg/cmake)
  allows you to use CMake directly to download, build and consume all of Boost or
  some specific libraries.

## Support

* Join the **#boost** discussion group at [cpplang.slack.com](https://cpplang.slack.com/)
  ([ask for an invite](https://cppalliance.org/slack/) if you’re not a member of this workspace yet)
* [File an issue](https://github.com/boostorg/bloom/issues)

## Contribute

* [Pull requests](https://github.com/boostorg/bloom/pulls) against **develop** branch are most welcome.
  Note that by submitting patches you agree to license your modifications under the [Boost Software License, Version 1.0](http://www.boost.org/LICENSE_1_0.txt).
--------------------------------------------------------------------------------
/test/test_utilities.hpp:
--------------------------------------------------------------------------------
1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page.
7 | */ 8 | 9 | #ifndef BOOST_BLOOM_TEST_TEST_UTILITIES_HPP 10 | #define BOOST_BLOOM_TEST_TEST_UTILITIES_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | namespace test_utilities{ 19 | 20 | template 21 | struct value_factory 22 | { 23 | T operator()(){return n++;} 24 | T n=0; 25 | }; 26 | 27 | template<> 28 | struct value_factory 29 | { 30 | std::string operator()() 31 | { 32 | return std::to_string(n++); 33 | } 34 | 35 | int n=0; 36 | }; 37 | 38 | template 39 | struct revalue_filter_impl; 40 | 41 | template< 42 | typename T,std::size_t K,typename S,std::size_t B,typename H,typename A, 43 | typename U 44 | > 45 | struct revalue_filter_impl,U> 46 | { 47 | using type=boost::bloom::filter; 48 | }; 49 | 50 | template 51 | using revalue_filter=typename revalue_filter_impl::type; 52 | 53 | template 54 | struct rehash_filter_impl; 55 | 56 | template< 57 | typename T,std::size_t K,typename S,std::size_t B,typename H,typename A, 58 | typename Hash 59 | > 60 | struct rehash_filter_impl,Hash> 61 | { 62 | using type=boost::bloom::filter; 63 | }; 64 | 65 | template 66 | using rehash_filter=typename rehash_filter_impl::type; 67 | 68 | template 69 | struct realloc_filter_impl; 70 | 71 | template< 72 | typename T,std::size_t K,typename S,std::size_t B,typename H,typename A, 73 | typename Allocator 74 | > 75 | struct realloc_filter_impl,Allocator> 76 | { 77 | using type=boost::bloom::filter; 78 | }; 79 | 80 | template 81 | using realloc_filter=typename realloc_filter_impl::type; 82 | 83 | void* capped_new(std::size_t n) 84 | { 85 | using limits=std::numeric_limits; 86 | static constexpr std::size_t alloc_limit= 87 | limits::digits>=64? 
88 | /* avoid AddressSanitizer: allocation-size-too-big */ 89 | (std::size_t)0x10000000000ull: 90 | /* avoid big allocations that might succeed in 32-bit */ 91 | (limits::max)()/256; 92 | 93 | if(n>alloc_limit)throw std::bad_alloc{}; 94 | 95 | return ::operator new(n); 96 | } 97 | 98 | template 99 | std::size_t may_contain_count(const Filter& f,const Input& input) 100 | { 101 | using input_value_type=typename Input::value_type; 102 | std::size_t res=0; 103 | f.may_contain( 104 | input.begin(),input.end(), 105 | [&](const input_value_type&,bool b){res+=b;}); 106 | return res; 107 | } 108 | 109 | template 110 | bool may_contain(const Filter& f,const Input& input) 111 | { 112 | return may_contain_count(f,input)==input.size(); 113 | } 114 | 115 | template 116 | bool may_not_contain(const Filter& f,const Input& input) 117 | { 118 | /* may_contain_count should be 0 with high probability */ 119 | return may_contain_count(f,input) 123 | class input_iterator 124 | { 125 | using traits=std::iterator_traits; 126 | Iterator it; 127 | 128 | public: 129 | using iterator_category=std::input_iterator_tag; 130 | using value_type=typename traits::value_type; 131 | using difference_type=typename traits::difference_type; 132 | using pointer=Iterator; 133 | using reference=typename traits::reference; 134 | 135 | input_iterator(Iterator it_):it{it_}{} 136 | reference operator*()const{return *it;} 137 | pointer operator->()const{return it;} 138 | input_iterator& operator++(){++it;return *this;} 139 | input_iterator operator++(int){auto res=*this;++it;return res;} 140 | bool operator==(const input_iterator& x)const{return it==x.it;} 141 | bool operator!=(const input_iterator& x)const{return !(*this==x);} 142 | }; 143 | 144 | template 145 | input_iterator make_input_iterator(Iterator it){return {it};} 146 | 147 | } /* namespace test_utilities */ 148 | #endif 149 | -------------------------------------------------------------------------------- 
/include/boost/bloom/detail/fast_multiblock32_sse2.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_SSE2_HPP 10 | #define BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_SSE2_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | #ifdef __SSE4_1__ 21 | #include 22 | #endif 23 | 24 | namespace boost{ 25 | namespace bloom{ 26 | 27 | #if defined(BOOST_MSVC) 28 | #pragma warning(push) 29 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 30 | #endif 31 | 32 | namespace detail{ 33 | 34 | struct m128ix2 35 | { 36 | __m128i lo,hi; 37 | }; 38 | 39 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 40 | static inline int mm_testc_si128(__m128i x,__m128i y) 41 | { 42 | #ifdef __SSE4_1__ 43 | return _mm_testc_si128(x,y); 44 | #else 45 | return _mm_movemask_epi8(_mm_cmpeq_epi32(_mm_and_si128(x,y),y))==0xFFFF; 46 | #endif 47 | } 48 | 49 | } /* namespace detail */ 50 | 51 | template 52 | struct fast_multiblock32:detail::multiblock_fpr_base 53 | { 54 | static constexpr std::size_t k=K; 55 | using value_type=detail::m128ix2[(k+7)/8]; 56 | static constexpr std::size_t used_value_size=sizeof(std::uint32_t)*k; 57 | 58 | static BOOST_FORCEINLINE void mark(value_type& x,std::uint64_t hash) 59 | { 60 | for(std::size_t i=0;i4)x.hi=_mm_or_si128(x.hi,h.hi); 124 | } 125 | 126 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 127 | /* 'int': forcing value to bool 'true' or 'false' */ 128 | #pragma warning(push) 129 | #pragma warning(disable:4800) 130 | #endif 131 | 132 | static BOOST_FORCEINLINE bool check_m128ix2( 133 | const detail::m128ix2& 
x,std::uint64_t hash,std::size_t kp) 134 | { 135 | detail::m128ix2 h=make_m128ix2(hash,kp); 136 | auto res=detail::mm_testc_si128(x.lo,h.lo); 137 | if(kp>4)res&=detail::mm_testc_si128(x.hi,h.hi); 138 | return res; 139 | } 140 | 141 | #if BOOST_WORKAROUND(BOOST_MSVC,<=1900) 142 | #pragma warning(pop) /* C4800 */ 143 | #endif 144 | }; 145 | 146 | #if defined(BOOST_MSVC) 147 | #pragma warning(pop) /* C4714 */ 148 | #endif 149 | 150 | } /* namespace bloom */ 151 | } /* namespace boost */ 152 | 153 | #endif 154 | -------------------------------------------------------------------------------- /doc/bloom/fpr_estimation.adoc: -------------------------------------------------------------------------------- 1 | [#fpr_estimation] 2 | = Appendix A: FPR Estimation 3 | 4 | :idprefix: fpr_estimation_ 5 | 6 | For a classical Bloom filter, the theoretical false positive rate, under some simplifying assumptions, 7 | is given by 8 | 9 | [.formula-center] 10 | {small}stem:[\text{FPR}(n,m,k)=\left(1 - \left(1 - \displaystyle\frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k]{small-end} for large {small}stem:[m]{small-end}, 11 | 12 | where {small}stem:[n]{small-end} is the number of elements inserted in the filter, {small}stem:[m]{small-end} its capacity in bits and {small}stem:[k]{small-end} the 13 | number of bits set per insertion (see a https://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives[derivation^] 14 | of this formula). For a fixed inverse load factor {small}stem:[c=m/n]{small-end}, 15 | the expression reaches at 16 | 17 | [.formula-center] 18 | {small}stem:[k_{\text{opt}}=c\cdot\ln2]{small-end} 19 | 20 | its minimum value 21 | {small}stem:[1/2^{k_{\text{opt}}} \approx 0.6185^{c}]{small-end}. 22 | The optimum {small}stem:[k]{small-end}, which must be an integer, 23 | is either 24 | {small}stem:[\lfloor k_{\text{opt}}\rfloor]{small-end} or 25 | {small}stem:[\lceil k_{\text{opt}}\rceil]{small-end}. 
26 | 27 | In the case of a filter of the form `boost::bloom::filter<T,K,block<Block,K'>>`, we can extend 28 | the approach from https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=f376ff09a64b388bfcde2f5353e9ddb44033aac8[Putze et al.^] 29 | to derive the (approximate but very accurate) formula: 30 | 31 | [.formula-center] 32 | {small}stem:[\text{FPR}_{\text{block}}(n,m,b,k,k')=\left(\displaystyle\sum_{i=0}^{\infty} \text{Pois}(i,nbk/m) \cdot \text{FPR}(i,b,k')\right)^{k},]{small-end} 33 | 34 | where 35 | 36 | [.formula-center] 37 | {small}stem:[\text{Pois}(i,\lambda)=\displaystyle\frac{\lambda^i e^{-\lambda}}{i!}]{small-end} 38 | 39 | is the probability mass function of a https://en.wikipedia.org/wiki/Poisson_distribution[Poisson distribution^] 40 | with mean {small}stem:[\lambda]{small-end}, and {small}stem:[b]{small-end} is the size of `Block` in bits. If we're using `multiblock<Block,K'>`, we have 41 | 42 | [.formula-center] 43 | {small}stem:[\text{FPR}_\text{multiblock}(n,m,b,k,k')=\left(\displaystyle\sum_{i=0}^{\infty} \text{Pois}(i,nbkk'/m) \cdot \text{FPR}(i,b,1)^{k'}\right)^{k}.]{small-end} 44 | 45 | As we have commented xref:primer_multiblock_filters[before], in general 46 | 47 | [.formula-center] 48 | {small}stem:[\text{FPR}_\text{block}(n,m,b,k,k') \geq \text{FPR}_\text{multiblock}(n,m,b,k,k') \geq \text{FPR}(n,m,kk'),]{small-end} 49 | 50 | that is, block and multiblock filters have worse FPR than the classical filter for the same number of bits 51 | set per insertion, but they will be faster. We have the particular case 52 | 53 | [.formula-center] 54 | {small}stem:[\text{FPR}_{\text{block}}(n,m,b,k,1)=\text{FPR}_{\text{multiblock}}(n,m,b,k,1)=\text{FPR}(n,m,k),]{small-end} 55 | 56 | which follows simply from the observation that using `{block|multiblock}<Block,1>` behaves exactly as 57 | a classical Bloom filter.
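
The block and multiblock formulas above can be evaluated numerically by truncating the Poisson series. What follows is a minimal sketch under stated assumptions, not the library's implementation: the names `fpr_classic`, `poisson_mean`, `fpr_block` and `fpr_multiblock` are hypothetical, all arguments are passed as `double`, and the series is cut off well into the right tail of the distribution.

```cpp
#include <cmath>
#include <cstddef>

// Classical FPR for n elements, m bits and k bits set per insertion:
// (1 - (1 - 1/m)^(k n))^k.
inline double fpr_classic(double n, double m, double k)
{
  return std::pow(1.0 - std::pow(1.0 - 1.0 / m, k * n), k);
}

// Expected value of g(i) for i ~ Pois(lambda). The infinite series is
// truncated far into the right tail; the pmf is evaluated in log space
// so that exp(-lambda) does not underflow for large lambda.
// (Hypothetical helper, introduced only for this sketch.)
template<typename G>
double poisson_mean(double lambda, G g)
{
  if(lambda <= 0.0) return g(0.0);
  const std::size_t last =
    static_cast<std::size_t>(lambda + 12.0 * std::sqrt(lambda) + 50.0);
  double res = 0.0;
  for(std::size_t i = 0; i <= last; ++i) {
    const double x = static_cast<double>(i);
    const double pmf =
      std::exp(x * std::log(lambda) - lambda - std::lgamma(x + 1.0));
    res += pmf * g(x);
  }
  return res;
}

// FPR_block(n,m,b,k,k'): k blocks of b bits, k' bits set in each block.
inline double fpr_block(double n, double m, double b, double k, double kp)
{
  const double inner = poisson_mean(
    n * b * k / m, [=](double i) { return fpr_classic(i, b, kp); });
  return std::pow(inner, k);
}

// FPR_multiblock(n,m,b,k,k'): k groups of k' blocks, one bit per block.
inline double fpr_multiblock(double n, double m, double b, double k, double kp)
{
  const double inner = poisson_mean(
    n * b * k * kp / m,
    [=](double i) { return std::pow(fpr_classic(i, b, 1.0), kp); });
  return std::pow(inner, k);
}
```

With `kp` equal to 1, both functions collapse to the large-`m` approximation of the classical formula, since the Poisson average of `1 - (1 - 1/b)^i` is `1 - exp(-kn/m)`.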
58 | 59 | We don't know of any closed, simple formula for the FPR of block and multiblock filters when 60 | `Stride` is not its "natural" size `xref:subfilters_used_value_size[_used-value-size_]`, 61 | that is, when subfilter subarrays overlap. 62 | We can use the following approximations ({small}stem:[s]{small-end} = `Stride` in bits): 63 | 64 | [.formula-center] 65 | {small}stem:[\text{FPR}_{\text{block}}(n,m,b,s,k,k')=\left(\displaystyle\sum_{i=0}^{\infty} \text{Pois}\left(i,\frac{n(2b-s)k}{m}\right) \cdot \text{FPR}(i,2b-s,k')\right)^{k},]{small-end} + 66 | {small}stem:[\text{FPR}_\text{multiblock}(n,m,b,s,k,k')=\left(\displaystyle\sum_{i=0}^{\infty} \text{Pois}\left(i,\frac{n(2bk'-s)k}{m}\right) \cdot \text{FPR}\left(i,\frac{2bk'-s}{k'},1\right)^{k'}\right)^{k},]{small-end} 67 | 68 | where the replacement of {small}stem:[b]{small-end} with {small}stem:[2b-s]{small-end} 69 | (or {small}stem:[bk']{small-end} with {small}stem:[2bk'-s]{small-end} for multiblock filters) accounts 70 | for the fact that the window of hashing positions affecting a particular bit spreads due to 71 | overlapping. Note that the formulas reduce to the non-overlapping case when {small}stem:[s]{small-end} takes its 72 | default value (stem:[b] for block, stem:[bk'] for multiblock). These approximations are acceptable for 73 | low values of {small}stem:[k']{small-end} but tend to underestimate the actual FPR as {small}stem:[k']{small-end} grows. 74 | In general, the use of overlapping improves (decreases) FPR by a factor ranging from 75 | 0.6 to 0.9 for typical filter configurations. 76 | 77 | {small}stem:[\text{FPR}_{\text{block}}(n,m,b,s,k,k')]{small-end} and {small}stem:[\text{FPR}_\text{multiblock}(n,m,b,s,k,k')]{small-end} 78 | are the formulas used by the implementation of 79 | `xref:filter_fpr_estimation[boost::filter::fpr_for]`. 
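
As a rough illustration of the overlapping-case approximations, the sketch below evaluates both formulas with a truncated Poisson series. It is a sketch under stated assumptions rather than the code behind `fpr_for`: the function names are hypothetical, `s` is the stride in bits, and the two small helpers are included so that the block stands on its own.

```cpp
#include <cmath>
#include <cstddef>

// Classical FPR: (1 - (1 - 1/m)^(k n))^k.
inline double fpr_classic(double n, double m, double k)
{
  return std::pow(1.0 - std::pow(1.0 - 1.0 / m, k * n), k);
}

// Expected value of g(i) for i ~ Pois(lambda), with the series truncated
// far into the right tail (hypothetical helper for this sketch).
template<typename G>
double poisson_mean(double lambda, G g)
{
  if(lambda <= 0.0) return g(0.0);
  const std::size_t last =
    static_cast<std::size_t>(lambda + 12.0 * std::sqrt(lambda) + 50.0);
  double res = 0.0;
  for(std::size_t i = 0; i <= last; ++i) {
    const double x = static_cast<double>(i);
    res += std::exp(x * std::log(lambda) - lambda - std::lgamma(x + 1.0)) *
           g(x);
  }
  return res;
}

// FPR_block(n,m,b,s,k,k'): the effective window is 2b - s bits wide.
inline double fpr_block_stride(
  double n, double m, double b, double s, double k, double kp)
{
  const double w = 2.0 * b - s;
  const double inner = poisson_mean(
    n * w * k / m, [=](double i) { return fpr_classic(i, w, kp); });
  return std::pow(inner, k);
}

// FPR_multiblock(n,m,b,s,k,k'): the effective window is 2bk' - s bits,
// split evenly among the k' blocks.
inline double fpr_multiblock_stride(
  double n, double m, double b, double s, double k, double kp)
{
  const double w = 2.0 * b * kp - s;
  const double inner = poisson_mean(
    n * w * k / m,
    [=](double i) { return std::pow(fpr_classic(i, w / kp, 1.0), kp); });
  return std::pow(inner, k);
}
```

Passing `s = b` (block) or `s = b * kp` (multiblock) makes the window equal to its non-overlapping size, so the results match the formulas without stride; smaller strides widen the window and lower the estimate.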
80 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/fast_multiblock32_neon.hpp: -------------------------------------------------------------------------------- 1 | /* Copyright 2025 Joaquin M Lopez Munoz. 2 | * Distributed under the Boost Software License, Version 1.0. 3 | * (See accompanying file LICENSE_1_0.txt or copy at 4 | * http://www.boost.org/LICENSE_1_0.txt) 5 | * 6 | * See https://www.boost.org/libs/bloom for library home page. 7 | */ 8 | 9 | #ifndef BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_NEON_HPP 10 | #define BOOST_BLOOM_DETAIL_FAST_MULTIBLOCK32_NEON_HPP 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | 19 | namespace boost{ 20 | namespace bloom{ 21 | 22 | #if defined(BOOST_MSVC) 23 | #pragma warning(push) 24 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 25 | #endif 26 | 27 | /* https://stackoverflow.com/a/54018882/213114 */ 28 | 29 | #ifdef _MSC_VER 30 | #define BOOST_BLOOM_INIT_U32X4(w,x,y,z) \ 31 | {(std::uint32_t(w)+(unsigned long long(x)<<32)), \ 32 | (std::uint32_t(y)+(unsigned long long(z)<<32))} 33 | #else 34 | #define BOOST_BLOOM_INIT_U32X4(w,x,y,z) \ 35 | {std::uint32_t(w),std::uint32_t(x),std::uint32_t(y),std::uint32_t(z)} 36 | #endif 37 | 38 | #define BOOST_BLOOM_INIT_U32X4X2(w0,x0,y0,z0,w1,x1,y1,z1) \ 39 | {{BOOST_BLOOM_INIT_U32X4(w0,x0,y0,z0),BOOST_BLOOM_INIT_U32X4(w1,x1,y1,z1)}} 40 | 41 | template 42 | struct fast_multiblock32:detail::multiblock_fpr_base 43 | { 44 | static constexpr std::size_t k=K; 45 | using value_type=uint32x4x2_t[(k+7)/8]; 46 | static constexpr std::size_t used_value_size=sizeof(std::uint32_t)*k; 47 | 48 | static BOOST_FORCEINLINE void mark(value_type& x,std::uint64_t hash) 49 | { 50 | for(std::size_t i=0;i 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | 21 | namespace boost{ 22 | namespace bloom{ 23 | namespace detail{ 24 | namespace 
is_nothrow_swappable_helper_detail{ 25 | 26 | using std::swap; 27 | 28 | template 29 | struct is_nothrow_swappable_helper 30 | { 31 | constexpr static bool value=false; 32 | }; 33 | 34 | template 35 | struct is_nothrow_swappable_helper< 36 | T, 37 | boost::void_t(),std::declval()))> 38 | > 39 | { 40 | constexpr static bool value= 41 | noexcept(swap(std::declval(),std::declval())); 42 | }; 43 | 44 | } /* namespace is_nothrow_swappable_helper_detail */ 45 | 46 | template 47 | struct is_nothrow_swappable:std::integral_constant< 48 | bool, 49 | is_nothrow_swappable_helper_detail::is_nothrow_swappable_helper::value 50 | >{}; 51 | 52 | #define BOOST_BLOOM_STATIC_ASSERT_IS_NOTHROW_SWAPPABLE(T) \ 53 | static_assert( \ 54 | boost::bloom::detail::is_nothrow_swappable< T >::value, \ 55 | #T " must be nothrow swappable") 56 | 57 | template 58 | struct is_cv_unqualified_object:std::integral_constant< 59 | bool, 60 | !std::is_const::value&& 61 | !std::is_volatile::value&& 62 | !std::is_function::value&& 63 | !std::is_reference::value&& 64 | !std::is_void::value 65 | >{}; 66 | 67 | #define BOOST_BLOOM_STATIC_ASSERT_IS_CV_UNQUALIFIED_OBJECT(T) \ 68 | static_assert( \ 69 | boost::bloom::detail::is_cv_unqualified_object< T >::value, \ 70 | #T " must be a cv-unqualified object type") 71 | 72 | template 73 | struct remove_cvref 74 | { 75 | using type= 76 | typename std::remove_cv::type>::type; 77 | }; 78 | 79 | template 80 | using remove_cvref_t=typename remove_cvref::type; 81 | 82 | template 83 | struct is_transparent:std::false_type{}; 84 | 85 | template 86 | struct is_transparent>:std::true_type{}; 87 | 88 | template 89 | using enable_if_transparent_t= 90 | typename std::enable_if::value,Q>::type; 91 | 92 | template 93 | struct is_integral_or_extended_integral:std::is_integral{}; 94 | template 95 | struct is_unsigned_or_extended_unsigned:std::is_unsigned{}; 96 | 97 | #if defined(__SIZEOF_INT128__) 98 | 99 | #if defined(BOOST_GCC) 100 | #pragma GCC diagnostic push 101 | #pragma GCC 
diagnostic ignored "-Wpedantic" 102 | #endif 103 | 104 | template<> 105 | struct is_integral_or_extended_integral<__int128>:std::true_type{}; 106 | template<> 107 | struct is_integral_or_extended_integral:std::true_type{}; 108 | template<> 109 | struct is_unsigned_or_extended_unsigned:std::true_type{}; 110 | 111 | #if defined(BOOST_GCC) 112 | #pragma GCC diagnostic pop 113 | #endif 114 | 115 | #endif 116 | 117 | template 118 | struct is_unsigned_integral_or_extended_unsigned_integral: 119 | std::integral_constant< 120 | bool, 121 | is_integral_or_extended_integral::value&& 122 | is_unsigned_or_extended_unsigned::value 123 | > 124 | {}; 125 | 126 | template class Trait> 127 | struct is_array_of:std::false_type{}; 128 | 129 | template class Trait> 130 | struct is_array_of:Trait{}; 131 | 132 | template struct array_size: 133 | std::integral_constant{}; 134 | template struct array_size: 135 | std::integral_constant{}; 136 | 137 | template 138 | struct is_power_of_two:std::integral_constant{}; 139 | 140 | #if defined(BOOST_NO_CXX20_HDR_CONCEPTS) 141 | template 142 | using is_forward_iterator=std::is_base_of< 143 | std::forward_iterator_tag, 144 | typename std::iterator_traits::iterator_category 145 | >; 146 | #else 147 | template 148 | using is_forward_iterator=std::integral_constant< 149 | bool, 150 | std::forward_iterator 151 | >; 152 | #endif 153 | 154 | #define BOOST_BLOOM_STATIC_ASSERT_IS_FORWARD_ITERATOR(Iterator) \ 155 | static_assert( \ 156 | boost::bloom::detail::is_forward_iterator< Iterator >::value, \ 157 | #Iterator " must be a forward iterator") 158 | 159 | } /* namespace detail */ 160 | } /* namespace bloom */ 161 | } /* namespace boost */ 162 | 163 | #endif 164 | -------------------------------------------------------------------------------- /example/rolling_filter.cpp: -------------------------------------------------------------------------------- 1 | /* Proof-of-concept implementation of a rolling Bloom filter. 
2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | /* Regular Bloom filters don't "forget", that is, the number of elements 19 | * in the filter keeps growing (and the FPR increasing) until the entire bit 20 | * array is reset. This is how a *rolling filter* can be devised that keeps 21 | * track of the last elements inserted only: 22 | * 23 | * - A so-called *window size* w and a number of windows n are specified. 24 | * - n regular Bloom filters are kept simultaneously, one for each window. 25 | * - We keep an index i to designate the i-th filter as the *active filter*. 26 | * - Insertion of a new element is done in the active filter only. 27 | * - Lookup queries all the filters. 28 | * - Every w insertions, we bump i (mod n) to switch to a new active filter 29 | * and clear it before continuing with the insertions. 30 | * 31 | * It's not hard to see that this algorithm implements a filter that remembers 32 | * only the last s elements inserted, with w * (n-1) < s <= w * n. The 33 | * resulting FPR oscillates between 1 - (1 - sub_fpr)^(n-1) and 34 | * 1 - (1 - sub_fpr)^n, where sub_fpr is the FPR of an individual filter 35 | * after w insertions. 
36 | */ 37 | 38 | template< 39 | typename T, std::size_t K, 40 | typename Subfilter = boost::bloom::block, 41 | std::size_t Stride = 0, 42 | typename Hash = boost::hash, 43 | typename Allocator=std::allocator 44 | > 45 | class rolling_filter 46 | { 47 | public: 48 | rolling_filter(std::size_t w_, std::size_t n, double max_fpr): 49 | w(w_), fs(n) 50 | { 51 | assert(w > 0); 52 | assert(n >= 2); 53 | assert(max_fpr >= 0.0 && max_fpr <= 1.0); 54 | 55 | /* Adjust the capacity of each individual filter so that 56 | * this->max_fpr() ~ max_fpr. 57 | */ 58 | 59 | auto sub_fpr = 1.0 - std::pow(1.0 - max_fpr, 1.0 / n); 60 | auto m = filter_type::capacity_for(w, sub_fpr); 61 | for(auto& f: fs) f.reset(m); 62 | } 63 | 64 | std::size_t min_size() const 65 | { 66 | /* Minimum size under *stationary* conditions: the actual size can be 67 | * smaller than this if we haven't yet inserted that many elements. 68 | */ 69 | 70 | return w * (fs.size() - 1); 71 | } 72 | 73 | std::size_t max_size() const 74 | { 75 | return w * fs.size(); 76 | } 77 | 78 | double min_fpr() const 79 | { 80 | double sub_fpr = filter_type::fpr_for(w, fs[0].capacity()); 81 | return 1.0 - std::pow(1.0 - sub_fpr, (double)fs.size() - 1.0); 82 | } 83 | 84 | double max_fpr() const 85 | { 86 | double sub_fpr = filter_type::fpr_for(w, fs[0].capacity()); 87 | return 1.0 - std::pow(1.0 - sub_fpr, (double)fs.size()); 88 | } 89 | 90 | std::size_t capacity() const 91 | { 92 | return fs[0].capacity() * fs.size(); 93 | } 94 | 95 | void insert(const T& x) 96 | { 97 | if(++count > w) { 98 | count = 1; 99 | if(++i >= fs.size()) i = 0; 100 | fs[i].clear(); 101 | } 102 | fs[i].insert(x); 103 | } 104 | 105 | bool may_contain(const T& x) const 106 | { 107 | for(const auto& f: fs) { 108 | if(f.may_contain(x)) return true; 109 | } 110 | return false; 111 | } 112 | 113 | private: 114 | using filter_type = boost::bloom::filter< 115 | T, K, Subfilter, Stride, Hash, Allocator 116 | >; 117 | using vector_type = std::vector< 118 | 
filter_type, 119 | typename std::allocator_traits:: 120 | template rebind_alloc 121 | >; 122 | 123 | std::size_t w; 124 | vector_type fs; 125 | std::size_t count = 0, 126 | i = 0; 127 | }; 128 | 129 | int main() 130 | { 131 | /* Construct a rolling filter with a size between 132 | * 9,000 and 10,000 elements. 133 | */ 134 | 135 | const std::size_t window_size = 1000; 136 | const std::size_t num_windows = 10; 137 | const double max_fpr = 0.01; 138 | 139 | rolling_filter rf(window_size, num_windows, max_fpr); 140 | std::cout << "rolling filter capacity: " << rf.capacity() << " bits\n"; 141 | 142 | /* Run the filter through more than 10x the elements it can hold. */ 143 | 144 | const std::size_t num_elements = rf.max_size() * 10 + window_size / 2; 145 | for(std::size_t i = 0 ; i < num_elements; ++i) rf.insert(i); 146 | 147 | /* Check the filter has actually forgotten the first 148 | * num_elements - rf.max_size() elements. 149 | */ 150 | 151 | std::size_t count = 0; 152 | for(std::size_t i = 0 ; i < num_elements - rf.max_size(); ++i) { 153 | count += rf.may_contain(i); 154 | } 155 | std::cout << "measured fpr: " 156 | << (double)count / (num_elements - rf.max_size()) 157 | << " (should be between " << rf.min_fpr() 158 | << " and " << rf.max_fpr() << ")\n"; 159 | 160 | /* The remaining elements must be mostly in the filter. */ 161 | 162 | count = 0; 163 | for(std::size_t i = num_elements - rf.max_size() ; i < num_elements; ++i) { 164 | count += rf.may_contain(i); 165 | } 166 | std::cout << "elements found: " << count 167 | << " (must be between " << rf.min_size() 168 | << " and " << rf.max_size() << ")\n"; 169 | } 170 | -------------------------------------------------------------------------------- /example/genome.cpp: -------------------------------------------------------------------------------- 1 | /* Using Boost.Bloom to check occurrence of DNA sequences in a genome. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 
4 | * Distributed under the Boost Software License, Version 1.0. 5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | 22 | /* A k-mer is a sliding segment of size k over a sequence of DNA nucleotides 23 | * (A, C, G, T). k_mer encodes the segment in a 64-bit word using 2 bits per 24 | * nucleotide. 25 | */ 26 | 27 | template 28 | struct k_mer 29 | { 30 | static_assert( 31 | K >= 0 && 32 | 2 * K <= sizeof(std::uint64_t) * CHAR_BIT); 33 | 34 | static constexpr std::size_t size() 35 | { 36 | return K; 37 | } 38 | 39 | void reset() 40 | { 41 | data = 0; 42 | } 43 | 44 | /* shift the k-mer and append a new nucleotide */ 45 | 46 | k_mer& operator+=(char n) 47 | { 48 | static constexpr std::uint64_t mask= 49 | (((std::uint64_t)1) << (2 * size())) - 1; 50 | 51 | data <<= 2; 52 | data &= mask; 53 | data |= table[(unsigned char)n]; 54 | return *this; 55 | } 56 | 57 | std::uint64_t data = 0; 58 | 59 | using table_type=std::array; 60 | 61 | static constexpr table_type table = [] { 62 | table_type table{}; 63 | table['A'] = table['a'] = 0; 64 | table['C'] = table['c'] = 1; 65 | table['G'] = table['g'] = 2; 66 | table['T'] = table['t'] = 3; 67 | return table; 68 | }(); 69 | }; 70 | 71 | template 72 | std::size_t hash_value(const k_mer& km) 73 | { 74 | /* k_mer::data is 8 bytes wide. We use it directly as the associated 75 | * hash value in 64-bit mode, as std::size_t is the same size; in 32-bit 76 | * mode, we XOR the high and low portions of data to make it fit into 77 | * a std::size_t.
78 | */ 79 | 80 | if constexpr (sizeof(std::size_t) >= sizeof(std::uint64_t)) { 81 | return (std::size_t)km.data; 82 | } 83 | else{ /* 32-bit mode */ 84 | return (std::size_t)(km.data ^ (km.data >> 32)); 85 | } 86 | } 87 | 88 | /* Insert all the k-mers of a given genome in a boost::bloom::filter. 89 | * Assumed format is FASTA with A, C, G, T. 90 | * https://en.wikipedia.org/wiki/FASTA_format 91 | */ 92 | 93 | using genome_filter = boost::bloom::filter< 94 | k_mer<20>, /* using k-mers of length 20 */ 95 | 1, boost::bloom::fast_multiblock32<8> >; 96 | 97 | genome_filter make_genome_filter(const char* filename) 98 | { 99 | using k_mer = genome_filter::value_type; 100 | 101 | std::ifstream in(filename, std::ios::ate); /* open at end to tell size */ 102 | if(!in) throw std::runtime_error("can't open file"); 103 | 104 | /* As a rough estimation, we assume that the number of k-mers 105 | * is approximately equal to the length of the genome --this is 106 | * overpessimistic due to the likely presence of duplicate k-mers. 107 | * We set FPR = 1%. 108 | */ 109 | 110 | genome_filter f((std::size_t)in.tellg(), 0.01); 111 | in.seekg(0); 112 | 113 | std::string line; 114 | std::size_t width = 0; 115 | k_mer km; 116 | while(std::getline(in, line)) { 117 | if(line.empty()) continue; 118 | if(line[0] == '>') { /* annotation lines in the FASTA format */ 119 | width = 0; 120 | km.reset(); 121 | continue; 122 | } 123 | std::size_t i = 0; 124 | 125 | /* don't insert km till it has km.size() nucleotides */ 126 | 127 | for(; width< km.size() - 1 && i < line.size(); ++i) { 128 | km += line[i]; 129 | ++width; 130 | } 131 | 132 | for(; i < line.size(); ++i) { 133 | km += line[i]; 134 | f.insert(km); 135 | } 136 | } 137 | return f; 138 | } 139 | 140 | /* We estimate a DNA sequence seq to be contained in a genome if all the k-mers 141 | * of seq are contained. The calculation of the resulting false positive rate 142 | * is left as an exercise for the reader. 
143 | */ 144 | 145 | bool may_contain(const genome_filter& f, std::string_view seq) 146 | { 147 | using k_mer = genome_filter::value_type; 148 | 149 | assert(seq.size() >= k_mer::size()); 150 | 151 | k_mer km; 152 | auto first = seq.begin(), last = seq.end(); 153 | 154 | /* feed first km.size() -1 nucleotides */ 155 | 156 | for(std::size_t i = 0; i < km.size() - 1; ++i) km += *first++; 157 | 158 | do{ 159 | km += *first++; 160 | if(!f.may_contain(km)) return false; 161 | }while(first != last); 162 | return true; 163 | } 164 | 165 | int main() 166 | { 167 | try{ 168 | /* Fruit fly genome (Drosophila melanogaster), available at 169 | * https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001215.4/ 170 | */ 171 | 172 | auto f=make_genome_filter( 173 | "GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna"); 174 | 175 | /* Some DNA sequences */ 176 | 177 | const char* seqs[] = { 178 | "ataaataagattgCGACTCAAAATTAAgcaataacac", /* chr. X */ 179 | "attatagggagaaatatgatcgcgtatgcgagagtagtgccaacatattgtgctc", /* chr. 3L */ 180 | "agaATTTACTAAGTACTTCTATGAATGGAATTATTATTGGAAACTCTACAA", /* chr. 4 */ 181 | "ATTTACTAAGTACTTCTATCTGCAAATTAACAATTTATCAAACAACTG", /* not present */ 182 | "ataaataagattgCGACTCAAAAGTAAgcaat" /* mutation in chr. X, not present */ 183 | }; 184 | 185 | int i = 0; 186 | for(auto seq: seqs){ 187 | std::cout << "check sequence " << i++ << ": " 188 | << may_contain(f, seq) << "\n"; 189 | } 190 | } 191 | catch(const std::exception& e) { 192 | std::cerr << e.what() << "\n"; 193 | return EXIT_FAILURE; 194 | } 195 | } 196 | -------------------------------------------------------------------------------- /include/boost/bloom/detail/bloom_printers.hpp: -------------------------------------------------------------------------------- 1 | // Copyright 2025 Braden Ganetsky 2 | // Distributed under the Boost Software License, Version 1.0. 
3 | // https://www.boost.org/LICENSE_1_0.txt 4 | 5 | // Generated on 2025-06-29T10:15:10 6 | 7 | #ifndef BOOST_BLOOM_DETAIL_BLOOM_PRINTERS_HPP 8 | #define BOOST_BLOOM_DETAIL_BLOOM_PRINTERS_HPP 9 | 10 | #ifndef BOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS 11 | #if defined(__ELF__) 12 | #ifdef __clang__ 13 | #pragma clang diagnostic push 14 | #pragma clang diagnostic ignored "-Woverlength-strings" 15 | #endif 16 | __asm__(".pushsection \".debug_gdb_scripts\", \"MS\",%progbits,1\n" 17 | ".ascii \"\\4gdb.inlined-script.BOOST_BLOOM_DETAIL_BLOOM_PRINTERS_HPP\\n\"\n" 18 | ".ascii \"import gdb.printing\\n\"\n" 19 | ".ascii \"import gdb.xmethod\\n\"\n" 20 | 21 | ".ascii \"class BoostBloomFilterPrinter:\\n\"\n" 22 | ".ascii \" def __init__(self, val):\\n\"\n" 23 | ".ascii \" self.void_pointer = gdb.lookup_type(\\\"void\\\").pointer()\\n\"\n" 24 | ".ascii \" nullptr = gdb.Value(0).cast(self.void_pointer)\\n\"\n" 25 | 26 | ".ascii \" has_array = val[\\\"ar\\\"][\\\"data\\\"] != nullptr\\n\"\n" 27 | 28 | ".ascii \" if has_array:\\n\"\n" 29 | ".ascii \" stride = int(val[\\\"stride\\\"])\\n\"\n" 30 | ".ascii \" used_value_size = int(val[\\\"used_value_size\\\"])\\n\"\n" 31 | ".ascii \" self.array_size = int(val[\\\"hs\\\"][\\\"rng\\\"]) * stride + (used_value_size - stride)\\n\"\n" 32 | ".ascii \" else:\\n\"\n" 33 | ".ascii \" self.array_size = 0\\n\"\n" 34 | ".ascii \" self.capacity = self.array_size * 8\\n\"\n" 35 | ".ascii \" if has_array:\\n\"\n" 36 | ".ascii \" self.data = val[\\\"ar\\\"][\\\"array\\\"]\\n\"\n" 37 | ".ascii \" else:\\n\"\n" 38 | ".ascii \" self.data = nullptr\\n\"\n" 39 | 40 | ".ascii \" def to_string(self):\\n\"\n" 41 | ".ascii \" return f\\\"boost::bloom::filter with {{capacity = {self.capacity}, data = {self.data.cast(self.void_pointer)}, size = {self.array_size}}}\\\"\\n\"\n" 42 | 43 | ".ascii \" def display_hint(self):\\n\"\n" 44 | ".ascii \" return \\\"map\\\"\\n\"\n" 45 | ".ascii \" def children(self):\\n\"\n" 46 | ".ascii \" def generator():\\n\"\n" 47 | 
".ascii \" data = self.data\\n\"\n" 48 | ".ascii \" for i in range(self.array_size):\\n\"\n" 49 | ".ascii \" yield \\\"\\\", f\\\"{i}\\\"\\n\"\n" 50 | ".ascii \" yield \\\"\\\", data.dereference()\\n\"\n" 51 | ".ascii \" data = data + 1\\n\"\n" 52 | ".ascii \" return generator()\\n\"\n" 53 | 54 | ".ascii \"def boost_bloom_build_pretty_printer():\\n\"\n" 55 | ".ascii \" pp = gdb.printing.RegexpCollectionPrettyPrinter(\\\"boost_bloom\\\")\\n\"\n" 56 | ".ascii \" add_template_printer = lambda name, printer: pp.add_printer(name, f\\\"^{name}<.*>$\\\", printer)\\n\"\n" 57 | 58 | ".ascii \" add_template_printer(\\\"boost::bloom::filter\\\", BoostBloomFilterPrinter)\\n\"\n" 59 | 60 | ".ascii \" return pp\\n\"\n" 61 | 62 | ".ascii \"gdb.printing.register_pretty_printer(gdb.current_objfile(), boost_bloom_build_pretty_printer())\\n\"\n" 63 | 64 | ".ascii \"# https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-an-Xmethod.html\\n\"\n" 65 | ".ascii \"class BoostBloomFilterSubscriptMethod(gdb.xmethod.XMethod):\\n\"\n" 66 | ".ascii \" def __init__(self):\\n\"\n" 67 | ".ascii \" gdb.xmethod.XMethod.__init__(self, 'subscript')\\n\"\n" 68 | 69 | ".ascii \" def get_worker(self, method_name):\\n\"\n" 70 | ".ascii \" if method_name == 'operator[]':\\n\"\n" 71 | ".ascii \" return BoostBloomFilterSubscriptWorker()\\n\"\n" 72 | 73 | ".ascii \"class BoostBloomFilterSubscriptWorker(gdb.xmethod.XMethodWorker):\\n\"\n" 74 | ".ascii \" def get_arg_types(self):\\n\"\n" 75 | ".ascii \" return [gdb.lookup_type('std::size_t')]\\n\"\n" 76 | 77 | ".ascii \" def get_result_type(self, obj):\\n\"\n" 78 | ".ascii \" return gdb.lookup_type('unsigned char')\\n\"\n" 79 | 80 | ".ascii \" def __call__(self, obj, index):\\n\"\n" 81 | ".ascii \" fp = BoostBloomFilterPrinter(obj)\\n\"\n" 82 | ".ascii \" if fp.array_size == 0:\\n\"\n" 83 | ".ascii \" print('Error: Filter is null')\\n\"\n" 84 | ".ascii \" return\\n\"\n" 85 | ".ascii \" elif index < 0 or index >= fp.array_size:\\n\"\n" 86 | ".ascii \" 
print('Error: Out of bounds')\\n\"\n" 87 | ".ascii \" return\\n\"\n" 88 | ".ascii \" else:\\n\"\n" 89 | ".ascii \" data = fp.data\\n\"\n" 90 | ".ascii \" return (data + index).dereference()\\n\"\n" 91 | 92 | ".ascii \"class BoostBloomFilterMatcher(gdb.xmethod.XMethodMatcher):\\n\"\n" 93 | ".ascii \" def __init__(self):\\n\"\n" 94 | ".ascii \" gdb.xmethod.XMethodMatcher.__init__(self, 'BoostBloomFilterMatcher')\\n\"\n" 95 | ".ascii \" self.methods = [BoostBloomFilterSubscriptMethod()]\\n\"\n" 96 | 97 | ".ascii \" def match(self, class_type, method_name):\\n\"\n" 98 | ".ascii \" if not class_type.tag.startswith('boost::bloom::filter<'):\\n\"\n" 99 | ".ascii \" return None\\n\"\n" 100 | 101 | ".ascii \" workers = []\\n\"\n" 102 | ".ascii \" for method in self.methods:\\n\"\n" 103 | ".ascii \" if method.enabled:\\n\"\n" 104 | ".ascii \" worker = method.get_worker(method_name)\\n\"\n" 105 | ".ascii \" if worker:\\n\"\n" 106 | ".ascii \" workers.append(worker)\\n\"\n" 107 | ".ascii \" return workers\\n\"\n" 108 | 109 | ".ascii \"gdb.xmethod.register_xmethod_matcher(None, BoostBloomFilterMatcher())\\n\"\n" 110 | 111 | ".byte 0\n" 112 | ".popsection\n"); 113 | #ifdef __clang__ 114 | #pragma clang diagnostic pop 115 | #endif 116 | #endif // defined(__ELF__) 117 | #endif // !defined(BOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS) 118 | 119 | #endif // !defined(BOOST_BLOOM_DETAIL_BLOOM_PRINTERS_HPP) 120 | -------------------------------------------------------------------------------- /doc/bloom/implementation_notes.adoc: -------------------------------------------------------------------------------- 1 | [#implementation_notes] 2 | = Appendix B: Implementation Notes 3 | 4 | :idprefix: implementation_notes_ 5 | 6 | == Hash Mixing 7 | 8 | This is the bit-mixing post-process we use to improve the statistical properties 9 | of the hash function when it doesn't have the avalanching property: 10 | 11 | [.formula-center] 12 | {small}stem:[m\leftarrow\text{mul}(h,C)]{small-end}, + 13 | 
{small}stem:[h'\leftarrow\text{high}(m)\text{ xor }\text{low}(m)]{small-end},

where {small}stem:[\text{mul}]{small-end} denotes 128-bit multiplication of two 64-bit factors,
{small}stem:[\text{high}(m)]{small-end} and {small}stem:[\text{low}(m)]{small-end}
are the high and low 64-bit words of {small}stem:[m]{small-end}, respectively,
{small}stem:[C=\lfloor 2^{64}/\varphi \rfloor]{small-end} and
{small}stem:[\varphi]{small-end} is the https://en.wikipedia.org/wiki/Golden_ratio[golden ratio^].

== 32-bit mode

Internally, we always use 64-bit hash values, even in 32-bit mode, where
the user-provided hash function produces 32-bit outputs. To expand
a 32-bit hash value to 64 bits, we use the same mixing procedure
described
xref:implementation_notes_hash_mixing[above].

== Dispensing with Multiple Hash Functions

Direct implementations of a Bloom filter with {small}stem:[k]{small-end}
bits per operation require {small}stem:[k]{small-end} different and independent
hash functions {small}stem:[h_i(x)]{small-end}, which incurs a significant
performance penalty, particularly if the objects are expensive to hash
(e.g. strings). https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf[Kirsch and Mitzenmacher^]
show how to relax this requirement down to two different hash functions
{small}stem:[h_1(x)]{small-end} and {small}stem:[h_2(x)]{small-end} linearly
combined as

[.formula-center]
{small}stem:[g_i(x)=h_1(x)+ih_2(x).]{small-end}

Without formal justification, we have relaxed this even further to just one
initial hash value {small}stem:[h_0=h_0(x)]{small-end}, where new values
{small}stem:[h_i]{small-end} are computed from {small}stem:[h_{i-1}]{small-end}
by means of very cheap mixing schemes.
In what follows 47 | {small}stem:[k]{small-end}, {small}stem:[k']{small-end} are the homonym values 48 | in a filter of the form `boost::bloom::filter>`, 49 | {small}stem:[b]{small-end} is `sizeof(Block) * CHAR_BIT`, 50 | and {small}stem:[r]{small-end} is the number of subarrays in the filter. 51 | 52 | === Subarray Location 53 | 54 | To produce a location (i.e. a number {small}stem:[p]{small-end} in {small}stem:[[0,r)]{small-end}) from 55 | {small}stem:[h_{i-1}]{small-end}, instead of the straightforward but costly 56 | procedure {small}stem:[p\leftarrow h_{i-1}\bmod r]{small-end} we resort to 57 | Lemire's https://arxiv.org/pdf/1805.10941[fastrange technique^]: 58 | 59 | [.formula-center] 60 | {small}stem:[m\leftarrow\text{mul}(h_{i-1},r),]{small-end} + 61 | {small}stem:[p\leftarrow\lfloor m/2^{64} \rfloor=\text{high}(m).]{small-end} 62 | 63 | To decorrelate {small}stem:[p]{small-end} from further uses of the hash value, 64 | we produce {small}stem:[h_{i}]{small-end} from {small}stem:[h_{i-1}]{small-end} as 65 | 66 | [.formula-center] 67 | {small}stem:[h_i\leftarrow c \cdot h_{i-1} \bmod 2^{64}=\text{low}(c \cdot h_{i-1}),]{small-end} 68 | 69 | with {small}stem:[c=\text{0xf1357aea2e62a9c5}]{small-end} (64-bit mode), 70 | {small}stem:[c=\text{0xe817fb2d}]{small-end} (32-bit mode) obtained 71 | from https://arxiv.org/pdf/2001.05304[Steele and Vigna^]. 72 | The transformation {small}stem:[h_{i-1} \rightarrow h_i]{small-end} is 73 | a simple https://en.wikipedia.org/wiki/Linear_congruential_generator[multiplicative congruential generator^] 74 | over {small}stem:[2^{64}]{small-end}. For this MCG to produce long 75 | cycles {small}stem:[h_0]{small-end} must be odd, so the implementation adjusts 76 | {small}stem:[h_0]{small-end} to {small}stem:[h_0'= (h_0\text{ or }1)]{small-end}, 77 | which renders the least significant bit of {small}stem:[h_i]{small-end} 78 | unsuitable for pseudorandomization (it is always one). 
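
The two steps of this section (fastrange location, then the multiplicative update of the hash value) can be sketched as follows. This is a minimal illustration, not the library's code: the names `mul_high`, `fastrange`, `make_odd` and `next_hash` are hypothetical, and the 128-bit product is computed portably with 32-bit limbs rather than with compiler intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the 128-bit product a*b, computed with 32-bit limbs.
std::uint64_t mul_high(std::uint64_t a, std::uint64_t b)
{
  const std::uint64_t mask = 0xffffffffu;
  std::uint64_t a_lo = a & mask, a_hi = a >> 32;
  std::uint64_t b_lo = b & mask, b_hi = b >> 32;
  std::uint64_t p0 = a_lo * b_lo;
  std::uint64_t p1 = a_lo * b_hi;
  std::uint64_t p2 = a_hi * b_lo;
  std::uint64_t p3 = a_hi * b_hi;
  // Carry out of the middle 32-bit column into the high word.
  std::uint64_t carry = ((p0 >> 32) + (p1 & mask) + (p2 & mask)) >> 32;
  return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}

// p = high(mul(h, r)): maps h to a location in [0, r) without a modulo.
std::uint64_t fastrange(std::uint64_t h, std::uint64_t r)
{
  return mul_high(h, r);
}

// h_0' = h_0 or 1, so that the multiplicative generator has long cycles.
std::uint64_t make_odd(std::uint64_t h)
{
  return h | 1u;
}

// h_i = low(c * h_{i-1}), with the 64-bit-mode constant quoted in the text.
std::uint64_t next_hash(std::uint64_t h)
{
  assert(h & 1u); // the text requires an odd starting value
  return h * 0xf1357aea2e62a9c5ull; // unsigned overflow wraps modulo 2^64
}
```

A caller would compute `p = fastrange(h, r)` for the current subarray and then replace `h` with `next_hash(h)` before the next location is needed. Since an odd value times an odd constant stays odd, the least significant bit of every subsequent value remains one, as noted above.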
79 | 80 | === Bit selection 81 | 82 | Inside a subfilter, we must produce {small}stem:[k']{small-end} 83 | values from {small}stem:[h_i]{small-end} in the range 84 | {small}stem:[[0,b)]{small-end} (the positions of the {small}stem:[k']{small-end} 85 | bits). We do this by successively taking {small}stem:[\log_2b]{small-end} bits 86 | from {small}stem:[h_i]{small-end} without utilizing the portion containing 87 | its least significant bit (which is always one as we have discussed). 88 | If we run out of bits (which happens when 89 | {small}stem:[k'> 63/\log_2b]{small-end}), we produce a new hash value 90 | {small}stem:[h_{i+1}]{small-end} from {small}stem:[h_{i}]{small-end} 91 | using the mixing procedure 92 | xref:implementation_notes_hash_mixing[already described]. 93 | 94 | == SIMD algorithms 95 | 96 | === `fast_multiblock32` 97 | 98 | When using AVX2, we select up to 8 bits at a time by creating 99 | a `+++__+++m256i` of 32-bit values {small}stem:[(x_0,x_1,...,x_7)]{small-end} 100 | where each {small}stem:[x_i]{small-end} is constructed from 101 | a different 5-bit portion of the hash value, and calculating from this 102 | the `+++__+++m256i` {small}stem:[(2^{x_0},2^{x_1},...,2^{x_7})]{small-end} 103 | with https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/mm256-sllv-epi32-64.html[`+++_+++mm256_sllv_epi32`^]. 104 | If more bits are needed, we generate a new hash value as 105 | xref:implementation_notes_hash_mixing[described before] and repeat. 106 | 107 | For little-endian Neon, the algorithm is similar but the computations 108 | are carried out with two `uint32x4_t`+++s+++ in parallel as Neon does not have 109 | 256-bit registers. 
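
As a scalar model of what one vectorized step computes, the sketch below builds eight 32-bit lanes, each holding a single shifted 1. It is only an illustration under stated assumptions: the function name `shifted_ones` is hypothetical, and the choice of consecutive 5-bit slices starting at bit 1 is an assumption made here for simplicity (the library's actual slicing of the hash value may differ).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of one 8-lane step: lane i receives 1 << x_i, where x_i is a
// 5-bit slice of h. Assumption: slices are consecutive and start at bit 1,
// because bit 0 of the hash value is always one and carries no information.
std::array<std::uint32_t, 8> shifted_ones(std::uint64_t h)
{
  std::array<std::uint32_t, 8> lanes{};
  h >>= 1; // skip the least significant bit
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    std::uint32_t x = static_cast<std::uint32_t>(h & 31u);
    assert(x < 32); // the shift stays within a 32-bit lane
    lanes[i] = std::uint32_t(1) << x;
    h >>= 5; // move on to the next 5-bit slice
  }
  return lanes;
}
```

With AVX2, the loop body corresponds to a single variable-shift instruction applied to all eight lanes at once; the scalar version is only meant to make the data flow explicit.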

In the case of SSE2, we don't have the 128-bit equivalent of
`+++_+++mm256_sllv_epi32`, so we use the following, mildly interesting
technique: a `+++__+++m128i` of the form

[.formula-center]
{small}stem:[((x_0+127)\cdot 2^{23},(x_1+127)\cdot 2^{23},(x_2+127)\cdot 2^{23},(x_3+127)\cdot 2^{23}),]{small-end}

where each {small}stem:[x_i]{small-end} is in {small}stem:[[0,32)]{small-end},
can be `reinterpret_cast`+++ed+++ to (i.e., has the same binary representation as)
the `+++__+++m128` (register of `float`+++s+++)

[.formula-center]
{small}stem:[(2^{x_0},2^{x_1},2^{x_2},2^{x_3}),]{small-end}

from which our desired `+++__+++m128i` of shifted 1s can be obtained
with https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/conversion-intrinsics-003.html#GUID-B1CFE576-21E9-4E70-BE5E-B9B18D598C12[`+++_+++mm_cvttps_epi32`^].

=== `fast_multiblock64`

We only provide a SIMD implementation for AVX2 that relies on two
parallel `+++__+++m256i`+++s+++ for the generation of up
to 8 64-bit values with shifted 1s. For Neon and SSE2, emulation
through 4 128-bit registers proved slower than non-SIMD `multiblock`.

--------------------------------------------------------------------------------
/doc/bloom/configuration.adoc:
--------------------------------------------------------------------------------
[#configuration]
= Choosing a Filter Configuration

:idprefix: configuration_

Boost.Bloom offers a plethora of compile-time and run-time configuration options,
so it may be difficult to make a choice.
If you're aiming for a given FPR or have a particular capacity in mind and
you'd like to choose the most appropriate filter type, the following chart
may come in handy.

image::fpr_c.png[align=center, title="FPR vs. _c_ for different filter types."]

The chart plots FPR vs. _c_ (capacity / number of elements inserted) for several
`boost::bloom::filter`+++s+++ where `K` has been set to its optimum value (minimum FPR)
as shown in the table below.

+++
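<table>
  <tr><th></th><th colspan="21"><i>c</i> = capacity / number of elements inserted</th></tr>
  <tr><th></th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th><th>11</th><th>12</th><th>13</th><th>14</th><th>15</th><th>16</th><th>17</th><th>18</th><th>19</th><th>20</th><th>21</th><th>22</th><th>23</th><th>24</th></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint32_t,K&gt;&gt;</code></td><td>3</td><td>3</td><td>3</td><td>4</td><td>4</td><td>5</td><td>5</td><td>5</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td><td>6</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint32_t,K&gt;,1&gt;</code></td><td>2</td><td>3</td><td>4</td><td>4</td><td>4</td><td>4</td><td>5</td><td>5</td><td>5</td><td>6</td><td>6</td><td>6</td><td>6</td><td>6</td><td>6</td><td>6</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint64_t,K&gt;&gt;</code></td><td>2</td><td>3</td><td>4</td><td>4</td><td>5</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td><td>6</td><td>6</td><td>6</td><td>6</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint64_t,K&gt;,1&gt;</code></td><td>2</td><td>3</td><td>4</td><td>4</td><td>4</td><td>5</td><td>6</td><td>6</td><td>6</td><td>7</td><td>7</td><td>7</td><td>7</td><td>7</td><td>8</td><td>8</td><td>8</td><td>8</td><td>8</td><td>9</td><td>9</td></tr>
  <tr><td><code>filter&lt;T,1,multiblock&lt;uint32_t,K&gt;&gt;</code></td><td>3</td><td>3</td><td>4</td><td>5</td><td>6</td><td>6</td><td>8</td><td>8</td><td>8</td><td>8</td><td>9</td><td>9</td><td>9</td><td>10</td><td>13</td><td>13</td><td>15</td><td>15</td><td>15</td><td>16</td><td>16</td></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint64_t[8],K&gt;&gt;</code></td><td>4</td><td>4</td><td>4</td><td>5</td><td>5</td><td>6</td><td>7</td><td>7</td><td>7</td><td>8</td><td>8</td><td>9</td><td>9</td><td>10</td><td>10</td><td>11</td><td>12</td><td>12</td><td>12</td><td>12</td><td>12</td></tr>
  <tr><td><code>filter&lt;T,1,multiblock&lt;uint32_t,K&gt;,1&gt;</code></td><td>3</td><td>3</td><td>4</td><td>5</td><td>6</td><td>6</td><td>7</td><td>7</td><td>8</td><td>8</td><td>9</td><td>9</td><td>10</td><td>10</td><td>12</td><td>12</td><td>14</td><td>14</td><td>14</td><td>14</td><td>15</td></tr>
  <tr><td><code>filter&lt;T,1,block&lt;uint64_t[8],K&gt;,1&gt;</code></td><td>3</td><td>3</td><td>4</td><td>5</td><td>6</td><td>6</td><td>7</td><td>7</td><td>7</td><td>8</td><td>8</td><td>8</td><td>10</td><td>11</td><td>11</td><td>12</td><td>12</td><td>12</td><td>12</td><td>12</td><td>13</td></tr>
  <tr><td><code>filter&lt;T,1,multiblock&lt;uint64_t,K&gt;&gt;</code></td><td>4</td><td>4</td><td>5</td><td>5</td><td>6</td><td>6</td><td>6</td><td>7</td><td>8</td><td>8</td><td>10</td><td>10</td><td>12</td><td>13</td><td>14</td><td>15</td><td>15</td><td>15</td><td>15</td><td>16</td><td>17</td></tr>
  <tr><td><code>filter&lt;T,1,multiblock&lt;uint64_t,K&gt;,1&gt;</code></td><td>3</td><td>3</td><td>4</td><td>5</td><td>5</td><td>6</td><td>6</td><td>7</td><td>9</td><td>10</td><td>10</td><td>11</td><td>11</td><td>12</td><td>12</td><td>13</td><td>13</td><td>13</td><td>15</td><td>16</td><td>16</td></tr>
  <tr><td><code>filter&lt;T,K&gt;</code></td><td>3</td><td>4</td><td>4</td><td>5</td><td>5</td><td>6</td><td>6</td><td>8</td><td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>13</td><td>13</td><td>14</td><td>16</td><td>16</td><td>16</td><td>17</td></tr>
</table>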
+++

Let's see how this can be used by way of an example. Suppose we plan to insert 10M elements
and want to keep the FPR at 10^−4^. The chart gives us seven different
values of _c_ (the array capacity divided by the number of elements, in our case 10M):

* `filter` -> _c_ ≅ 19 bits per element
* `filter, 1>` -> _c_ ≅ 20 bits per element
* `filter>` -> _c_ ≅ 21 bits per element
* `filter, 1>` -> _c_ ≅ 21 bits per element
* `filter, 1>` -> _c_ ≅ 21.5 bits per element
* `filter>` -> _c_ ≅ 22 bits per element
* `filter>` -> _c_ ≅ 23 bits per element

These options have different tradeoffs in terms of space used and performance. If
we choose `filter, 1>` as a compromise (or better yet,
`filter, 1>`), the only remaining step is to consult the
value of `K` in the table for _c_ = 21 or 22, and we get our final configuration:

[source,subs="+macros,+quotes"]
-----
using my_filter=filter, 1>;
-----

The resulting filter can be constructed in any of the following ways:

[source]
-----
// 1) calculate the capacity from the value of c we got from the chart
my_filter f((std::size_t)(10'000'000 * 21.5));

// 2) let the library calculate the capacity from n and target fpr
//    expect some deviation from the capacity in 1)
my_filter f(10'000'000, 1E-4);

// 3) equivalent to 2)
my_filter f(my_filter::capacity_for(10'000'000, 1E-4));
-----

--------------------------------------------------------------------------------
/doc/bloom/primer.adoc:
--------------------------------------------------------------------------------
[#primer]
= Bloom Filter Primer

:idprefix: primer_

A Bloom filter (named after its inventor Burton Howard Bloom) is a probabilistic data
structure where inserted elements can be looked up with 100% accuracy, whereas looking
up a
non-inserted element may fail with some probability called the filter's 9 | _false positive rate_ or FPR. The tradeoff here is that Bloom filters occupy much less 10 | space than traditional non-probabilistic containers (typically, around 8-20 bits per 11 | element) for an acceptably low FPR. The greater the filter's _capacity_ (its size in bits), 12 | the lower the resulting FPR. 13 | 14 | In general, Bloom filters are useful to prevent/mitigate queries against large data sets 15 | when exact retrieval is costly and/or can't be made in main memory. 16 | 17 | [.boxed] 18 | ==== 19 | *Example: Speeding up unsuccessful requests to a database* 20 | 21 | One prime application of Bloom filters and similar data structures is for the prevention 22 | of expensive disk/network accesses when these would fail to retrieve a given piece of 23 | information. 24 | For instance, suppose we are developing a frontend for a database with access time 25 | 10 ms and we know 50% of the requests will not succeed (the record does not exist). 26 | Inserting a Bloom filter with a lookup time of 200 ns and a FPR of 0.5% will reduce the 27 | average response time of the system from 10 ms to 28 | 29 | [.formula-center] 30 | (10 + 0.0002) × 50.25% + 0.0002 × 49.75% ≅ 5.03 ms, 31 | 32 | that is, we get a ×1.99 overall speedup. If the database holds 1 billion records, 33 | an in-memory filter with say 8 bits per element will occupy 0.93 GB, 34 | which is perfectly realizable. 35 | 36 | image::db_speedup.png[align=center, title="Improving DB negative access time with a Bloom filter."] 37 | 38 | ==== 39 | 40 | Applications have been described in the areas of web caching, 41 | dictionary compression, network routing and genomics, among others. 42 | https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf[Broder and Mitzenmacher^] 43 | provide a rather extensive review of use cases with a focus on networking. 
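
The arithmetic of the database example above can be reproduced with a few lines of code. This is only a sketch of the calculation; the function name `average_response_ms` and its parameter names are hypothetical and chosen for this illustration.

```cpp
#include <cassert>

// Average response time (ms) of a database fronted by a Bloom filter.
//   db_ms         : time of a database access
//   filter_ms     : time of a filter lookup (performed on every request)
//   miss_fraction : fraction of requests whose record does not exist
//   fpr           : false positive rate of the filter
double average_response_ms(
  double db_ms, double filter_ms, double miss_fraction, double fpr)
{
  assert(miss_fraction >= 0.0 && miss_fraction <= 1.0);
  assert(fpr >= 0.0 && fpr <= 1.0);
  // Requests reaching the database: every existing record, plus the
  // non-existing ones that the filter wrongly lets through.
  double to_db = (1.0 - miss_fraction) + miss_fraction * fpr;
  return (db_ms + filter_ms) * to_db + filter_ms * (1.0 - to_db);
}
```

Calling `average_response_ms(10.0, 0.0002, 0.5, 0.005)` evaluates the same expression as in the example, with 50.25% of the requests reaching the database. Varying `fpr` shows that, in this scenario, most of the benefit is already obtained at moderate false positive rates, since the database access time dominates.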
44 | 45 | == Implementation 46 | 47 | The implementation of a classical Bloom filter consists of an array of _m_ bits, initially set to zero. 48 | Inserting an element _x_ reduces to selecting _k_ positions pseudorandomly (with the help 49 | of _k_ independent hash functions) and setting them to one. 50 | 51 | image::bloom_insertion.png[align=center, title="Insertion in a classical Bloom filter with _k_ = 6 different hash functions. Inserting _x_ reduces to setting to one the bits at positions 10, 14, 43, 58, 1, and 39 as indicated by _h_~1~(_x_), ..., _h_~6~(_x_)."] 52 | 53 | To check if an element _y_ is in the filter, we follow the same procedure and see if 54 | the selected bits are all set to one. In the example figure there are two unset bits, which 55 | definitely indicates _y_ was not inserted in the filter. 56 | 57 | image::bloom_lookup.png[align=center, title="Lookup in a classical Bloom filter. Value _y_ is not in the filter because bits at positions 20 and 61 are not set to one."] 58 | 59 | A false positive occurs when the bits checked happen to be all set to one due to 60 | other, unrelated insertions. The probability of having a false positive increases as we 61 | add more elements to the filter, whereas for a given number _n_ of inserted elements, a filter 62 | with greater capacity (larger bit array) will have a lower FPR. 63 | The number _k_ of bits set per operation also affects the FPR, albeit in a more complicated way: 64 | when the array is sparsely populated, a higher value of _k_ improves (decreases) the FPR, 65 | as there are more chances that we hit a non-set bit; however, if _k_ is very high 66 | the array will have more and more bits set to one as new elements are inserted, which 67 | eventually will reach a point where we lose out to a filter with a lower _k_ and 68 | thus a smaller proportion of set bits. 
For given values of _n_ and _m_, the optimum _k_ is 69 | {small}stem:[\lfloor k_{\text{opt}}\rfloor]{small-end} or 70 | {small}stem:[\lceil k_{\text{opt}}\rceil]{small-end}, with 71 | 72 | [.formula-center] 73 | {small}stem:[k_{\text{opt}}=\displaystyle\frac{m\cdot\ln2}{n},]{small-end} 74 | 75 | for a minimum FPR close to 76 | {small}stem:[1/2^{k_{\text{opt}}} \approx 0.6185^{m/n}]{small-end}. See the appendix 77 | on xref:fpr_estimation[FPR estimation] for more details. 78 | 79 | image::fpr_n_k.png[align=center, title="FPR vs. number of inserted elements for two filters with _m_ = 10^5^ bits. _k_ = 6 (red) has a better (lower) FPR than _k_ = 2 (blue) for small values of _n_, but eventually degrades more as _n_ grows. The dotted line shows the minimum attainable FPR resulting from selecting the optimum value of _k_ for each value of _n_."] 80 | 81 | == Variations on the Classical Filter 82 | 83 | === Block Filters 84 | 85 | An operation on a Bloom filter involves accessing _k_ different positions in memory, 86 | which, for large arrays, results in _k_ CPU cache misses and affects the 87 | operation's performance. A variation on the classical approach called a 88 | _block filter_ seeks to minimize cache misses by concentrating all bit 89 | setting/checking in a small block of _b_ bits pseudorandomly selected from the 90 | entire array. If the block is small enough, it will fit in a CPU cacheline, 91 | thus drastically reducing the number of cache misses. 92 | 93 | image::block_insertion.png[align=center, title="Block filter. A block of _b_ bits is selected based on _h_~0~(x), and all subsequent bit setting is constrained there."] 94 | 95 | The downside is that the resulting FPR is worse than that of a classical filter for 96 | the same values of _n_, _m_ and _k_. Intuitively, block filters reduce the 97 | uniformity of the distribution of bits in the array, which ultimately hurts their 98 | probabilistic performance. 
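
To make the block idea concrete, here is a toy block filter over 64-bit blocks that works directly on precomputed hash values. It is a simplified sketch and not how the library's `block` subfilter is implemented: the class name `toy_block_filter`, the modulo-based block selection and the remixing constant are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy block filter: every operation touches exactly one 64-bit block.
class toy_block_filter
{
public:
  toy_block_filter(std::size_t num_blocks, unsigned k)
    : blocks_(num_blocks, 0), k_(k)
  {
    assert(num_blocks > 0);
  }

  void insert(std::uint64_t h) { blocks_[block_index(h)] |= mask(h); }

  bool may_contain(std::uint64_t h) const
  {
    std::uint64_t m = mask(h);
    return (blocks_[block_index(h)] & m) == m;
  }

private:
  // Selects the block; a plain modulo keeps the sketch short.
  std::size_t block_index(std::uint64_t h) const
  {
    return static_cast<std::size_t>(h % blocks_.size());
  }

  // Builds a mask with up to k bits set, all inside the same block.
  std::uint64_t mask(std::uint64_t h) const
  {
    std::uint64_t m = 0;
    for (unsigned i = 0; i < k_; ++i) {
      h = h * 0x9e3779b97f4a7c15ull + 1; // arbitrary cheap remix (assumption)
      m |= std::uint64_t(1) << (h >> 58); // top 6 bits give a position in [0,64)
    }
    return m;
  }

  std::vector<std::uint64_t> blocks_;
  unsigned k_;
};
```

All `k` bits of an element land in one machine word, so a lookup costs at most one cache miss, but elements that share a block compete for only 64 positions, which is the loss of uniformity described above.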
99 | 100 | image::fpr_n_k_bk.png[align=center, title="FPR (logarithmic scale) vs. number of inserted elements for a classical and a block filter with the same _k_ = 4, _m_ = 10^5^ bits."] 101 | 102 | A further variation in this idea is to have operations select _k_ blocks 103 | with _k'_ bits set on each. This, again, will have a worse FPR than a classical 104 | filter with _k·k'_ bits per operation, but improves on a plain 105 | _k·k'_ block filter. 106 | 107 | image::block_multi_insertion.png[align=center, title="Block filter with multi-insertion. _k_ = 2 blocks are selected, and _k_' = 3 bits are set in each."] 108 | 109 | === Multiblock Filters 110 | 111 | _Multiblock filters_ take block filters' approach further by having 112 | bit setting/checking done on a sequence of consecutive blocks of size _b_, 113 | so that each block takes exactly one bit. This still maintains a good cache 114 | locality but improves FPR with respect to block filters because bits set to one 115 | are more spread out across the array. 116 | 117 | image::multiblock_insertion.png[align=center, title="Multiblock filter. A range of _k_' = 4 consecutive blocks is selected based on _h_~0~(x), and a bit is set to one in each of the blocks."] 118 | 119 | Multiblock filters can also be combined with multi-insertion. In general, 120 | for the same number of bits per operation and equal values of _n_ and _m_, 121 | a classical Bloom filter will have the better (lower) FPR, followed by 122 | multiblock filters and then block filters. Execution speed will roughly go 123 | in the reverse order. When considering block/multiblock filters with 124 | multi-insertion, the number of available configurations grows quickly and 125 | you will need to do some experimenting to locate your preferred point in the 126 | (FPR, capacity, speed) tradeoff space. 
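
As a starting point for that experimentation, the classical-filter formulas from the Implementation section can be evaluated directly. The sketch below uses the standard textbook approximation FPR ≈ (1 − e^−kn/m^)^k^ for a classical filter, which is an assumption of this example rather than something the library computes this way; the function names `optimum_k` and `classical_fpr` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// k_opt = (m / n) * ln 2 for a classical filter of m bits holding n elements.
double optimum_k(double m, double n)
{
  assert(m > 0.0 && n > 0.0);
  return m / n * std::log(2.0);
}

// Textbook approximation of the FPR of a classical Bloom filter:
// (1 - e^{-k n / m})^k.
double classical_fpr(double m, double n, double k)
{
  assert(m > 0.0 && n >= 0.0 && k > 0.0);
  return std::pow(1.0 - std::exp(-k * n / m), k);
}
```

For instance, with 10 bits per element `optimum_k(10, 1)` is about 6.93, so _k_ = 7 is the natural choice, and the resulting FPR is close to 0.6185^10^. Comparing `classical_fpr(1e5, n, 6)` against `classical_fpr(1e5, n, 2)` for small and large `n` reproduces the crossover between _k_ = 6 and _k_ = 2 described earlier in this primer. Block and multiblock filters will have a higher FPR than these figures for the same parameters.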
127 | -------------------------------------------------------------------------------- /benchmark/comparison_table.cpp: -------------------------------------------------------------------------------- 1 | /* Comparison table for several configurations of boost::bloom::filter. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | 16 | std::chrono::high_resolution_clock::time_point measure_start,measure_pause; 17 | 18 | template 19 | double measure(F f) 20 | { 21 | using namespace std::chrono; 22 | 23 | static const int num_trials=10; 24 | static const milliseconds min_time_per_trial(10); 25 | std::array trials; 26 | 27 | for(int i=0;i>(t2-measure_start).count()/runs; 39 | } 40 | 41 | std::sort(trials.begin(),trials.end()); 42 | return std::accumulate( 43 | trials.begin()+2,trials.end()-2,0.0)/(trials.size()-4); 44 | } 45 | 46 | void pause_timing() 47 | { 48 | measure_pause=std::chrono::high_resolution_clock::now(); 49 | } 50 | 51 | void resume_timing() 52 | { 53 | measure_start+=std::chrono::high_resolution_clock::now()-measure_pause; 54 | } 55 | 56 | #include 57 | #include 58 | #include 59 | #include 60 | #include 61 | #include 62 | #include 63 | #include 64 | #include 65 | #include 66 | #include 67 | #include 68 | 69 | template 70 | struct unordered_flat_set_filter 71 | { 72 | using value_type=T; 73 | 74 | unordered_flat_set_filter(std::size_t){} 75 | void insert(const T& x){s.insert(x);} 76 | bool may_contain(const T& x){return s.contains(x);} 77 | 78 | boost::unordered_flat_set s; 79 | }; 80 | 81 | static std::size_t num_elements; 82 | 83 | struct test_results 84 | { 85 | double fpr; /* % */ 86 | double insertion_time; /* ns per element */ 87 | double 
successful_lookup_time; /* ns per element */ 88 | double unsuccessful_lookup_time; /* ns per element */ 89 | double mixed_lookup_time; /* ns per element */ 90 | }; 91 | 92 | template 93 | test_results test(std::size_t c) 94 | { 95 | using value_type=typename Filter::value_type; 96 | 97 | static constexpr double lookup_mix=0.1; /* successful pr. */ 98 | static constexpr std::uint64_t mixed_lookup_cut= 99 | (std::uint64_t)( 100 | lookup_mix*(double)(std::numeric_limits::max)()); 101 | 102 | std::vector data_in,data_out,data_mixed; 103 | { 104 | boost::detail::splitmix64 rng; 105 | boost::unordered_flat_set unique; 106 | for(std::size_t i=0;i void row(std::size_t c) 203 | { 204 | std::cout<< 205 | " \n" 206 | " "<\n"; 207 | 208 | boost::mp11::mp_for_each< 209 | boost::mp11::mp_transform 210 | >([&](auto i){ 211 | using filter=typename decltype(i)::type; 212 | auto res=test(c); 213 | std::cout<< 214 | " "<\n" 215 | " "<\n" 216 | " "<\n" 217 | " "<\n" 218 | " "<\n" 219 | " "<\n"; 220 | }); 221 | 222 | std::cout<< 223 | " \n"; 224 | } 225 | 226 | using namespace boost::bloom; 227 | 228 | template 229 | using filters1=boost::mp11::mp_list< 230 | filter, 231 | filter>, 232 | filter,1> 233 | >; 234 | 235 | template 236 | using filters2=boost::mp11::mp_list< 237 | filter>, 238 | filter,1>, 239 | filter> 240 | >; 241 | 242 | template 243 | using filters3=boost::mp11::mp_list< 244 | filter,1>, 245 | filter>, 246 | filter,1> 247 | >; 248 | 249 | template 250 | using filters4=boost::mp11::mp_list< 251 | filter>, 252 | filter,1>, 253 | filter> 254 | >; 255 | 256 | int main(int argc,char* argv[]) 257 | { 258 | if(argc<2){ 259 | std::cerr<<"provide the number of elements\n"; 260 | return EXIT_FAILURE; 261 | } 262 | try{ 263 | num_elements=std::stoul(argv[1]); 264 | } 265 | catch(...){ 266 | std::cerr<<"wrong arg\n"; 267 | return EXIT_FAILURE; 268 | } 269 | 270 | /* reference table: boost::unordered_flat_set */ 271 | 272 | auto res=test>(0); 273 | std::cout<< 274 | "\n" 275 | " \n" 
276 | " \n" 277 | " \n" 278 | " \n" 279 | " \n" 280 | " \n" 281 | " \n" 282 | " \n" 283 | " \n" 288 | "
boost::unordered_flat_set
insertionsuccessful
lookup
unsuccessful
lookup
mixed
lookup
"<\n" 284 | " "<\n" 285 | " "<\n" 286 | " "<\n" 287 | "
\n"; 289 | 290 | /* filter table */ 291 | 292 | auto subheader= 293 | " K\n" 294 | " FPR
[%]\n" 295 | " ins.\n" 296 | " succ.
lkp.\n" 297 | " uns.
lkp.\n" 298 | " mixed
lkp.\n"; 299 | 300 | std::cout<< 301 | "\n" 302 | " \n" 303 | " \n" 304 | " \n" 305 | " \n" 306 | " \n" 307 | " \n" 308 | " \n" 309 | " \n"<< 310 | subheader<< 311 | subheader<< 312 | subheader<< 313 | " \n"; 314 | 315 | row>( 8); 316 | row>(12); 317 | row>(16); 318 | row>(20); 319 | 320 | std::cout<< 321 | " \n" 322 | " \n" 323 | " \n" 324 | " \n" 325 | " \n" 326 | " \n" 327 | " \n" 328 | " \n"<< 329 | subheader<< 330 | subheader<< 331 | subheader<< 332 | " \n"; 333 | 334 | row>( 8); 335 | row>(12); 336 | row>(16); 337 | row>(20); 338 | 339 | std::cout<< 340 | " \n" 341 | " \n" 342 | " \n" 343 | " \n" 344 | " \n" 345 | " \n" 346 | " \n" 347 | " \n"<< 348 | subheader<< 349 | subheader<< 350 | subheader<< 351 | " \n"; 352 | 353 | row>( 8); 354 | row>(12); 355 | row>(16); 356 | row>(20); 357 | 358 | std::cout<< 359 | " \n" 360 | " \n" 361 | " \n" 362 | " \n" 363 | " \n" 364 | " \n" 365 | " \n" 366 | " \n"<< 367 | subheader<< 368 | subheader<< 369 | subheader<< 370 | " \n"; 371 | 372 | row>( 8); 373 | row>(12); 374 | row>(16); 375 | row>(20); 376 | 377 | std::cout<<"
filter<int,K>filter<int,1,block<uint64_t,K>>filter<int,1,block<uint64_t,K>,1>
c
filter<int,1,multiblock<uint64_t,K>>filter<int,1,multiblock<uint64_t,K>,1>filter<int,1,fast_multiblock32<K>>
c
filter<int,1,fast_multiblock32<K>,1>filter<int,1,fast_multiblock64<K>>filter<int,1,fast_multiblock64<K>,1>
c
filter<int,1,block<uint64_t[8],K>>filter<int,1,block<uint64_t[8],K>,1>filter<int,1,multiblock<uint64_t[8],K>>
c
\n"; 378 | } 379 | -------------------------------------------------------------------------------- /include/boost/bloom/filter.hpp: -------------------------------------------------------------------------------- 1 | /* Configurable Bloom filter. 2 | * 3 | * Copyright 2025 Joaquin M Lopez Munoz. 4 | * Distributed under the Boost Software License, Version 1.0. 5 | * (See accompanying file LICENSE_1_0.txt or copy at 6 | * http://www.boost.org/LICENSE_1_0.txt) 7 | * 8 | * See https://www.boost.org/libs/bloom for library home page. 9 | */ 10 | 11 | #ifndef BOOST_BLOOM_FILTER_HPP 12 | #define BOOST_BLOOM_FILTER_HPP 13 | 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | 31 | namespace boost{ 32 | namespace bloom{ 33 | namespace detail{ 34 | 35 | /* Mixing policies: no_mix_policy is the identity function, and 36 | * mulx64_mix_policy uses the mulx64 function from 37 | * . 38 | * 39 | * filter mixes hash results with mulx64 if the hash is not marked as 40 | * avalanching, i.e. it's not of good quality (see 41 | * ), or if std::size_t is less than 64 bits 42 | * (mixing policies promote to std::uint64_t). 
43 | */ 44 | 45 | struct no_mix_policy 46 | { 47 | template 48 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 49 | static inline std::uint64_t mix(const Hash& h,const T& x) 50 | { 51 | return (std::uint64_t)h(x); 52 | } 53 | }; 54 | 55 | struct mulx64_mix_policy 56 | { 57 | template 58 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 59 | static inline std::uint64_t mix(const Hash& h,const T& x) 60 | { 61 | return mulx64((std::uint64_t)h(x)); 62 | } 63 | }; 64 | 65 | } /* namespace detail */ 66 | 67 | #if defined(BOOST_MSVC) 68 | #pragma warning(push) 69 | #pragma warning(disable:4714) /* marked as __forceinline not inlined */ 70 | #endif 71 | 72 | template< 73 | typename T,std::size_t K, 74 | typename Subfilter=block,std::size_t Stride=0, 75 | typename Hash=boost::hash,typename Allocator=std::allocator 76 | > 77 | class 78 | 79 | #if defined(_MSC_VER)&&_MSC_FULL_VER>=190023918 80 | __declspec(empty_bases) /* activate EBO with multiple inheritance */ 81 | #endif 82 | 83 | filter: 84 | detail::filter_core< 85 | K,Subfilter,Stride,allocator_rebind_t 86 | >, 87 | empty_value 88 | { 89 | BOOST_BLOOM_STATIC_ASSERT_IS_CV_UNQUALIFIED_OBJECT(T); 90 | static_assert( 91 | std::is_same>::value, 92 | "Allocator's value_type must be unsigned char"); 93 | using super=detail::filter_core; 94 | using mix_policy=typename std::conditional< 95 | boost::hash_is_avalanching::value&& 96 | sizeof(std::size_t)>=sizeof(std::uint64_t), 97 | detail::no_mix_policy, 98 | detail::mulx64_mix_policy 99 | >::type; 100 | 101 | public: 102 | using value_type=T; 103 | using super::k; 104 | using subfilter=typename super::subfilter; 105 | using super::stride; 106 | using hasher=Hash; 107 | using allocator_type=Allocator; 108 | using size_type=typename super::size_type; 109 | using difference_type=typename super::difference_type; 110 | using reference=value_type&; 111 | using const_reference=const value_type&; 112 | using pointer=value_type*; 113 | using 
const_pointer=const value_type*; 114 | static constexpr std::size_t bulk_insert_size=super::bulk_insert_size; 115 | static constexpr std::size_t bulk_may_contain_size= 116 | super::bulk_may_contain_size; 117 | 118 | filter()=default; 119 | 120 | explicit filter( 121 | std::size_t m,const hasher& h=hasher(), 122 | const allocator_type& al=allocator_type()): 123 | super{m,al},hash_base{empty_init,h}{} 124 | 125 | filter( 126 | std::size_t n,double fpr,const hasher& h=hasher(), 127 | const allocator_type& al=allocator_type()): 128 | super{n,fpr,al},hash_base{empty_init,h}{} 129 | 130 | template 131 | filter( 132 | InputIterator first,InputIterator last, 133 | std::size_t m,const hasher& h=hasher(), 134 | const allocator_type& al=allocator_type()): 135 | filter{m,h,al} 136 | { 137 | insert(first,last); 138 | } 139 | 140 | template 141 | filter( 142 | InputIterator first,InputIterator last, 143 | std::size_t n,double fpr,const hasher& h=hasher(), 144 | const allocator_type& al=allocator_type()): 145 | filter{n,fpr,h,al} 146 | { 147 | insert(first,last); 148 | } 149 | 150 | filter(const filter&)=default; 151 | filter(filter&&)=default; 152 | 153 | template 154 | filter( 155 | InputIterator first,InputIterator last, 156 | std::size_t m,const allocator_type& al): 157 | filter{first,last,m,hasher(),al}{} 158 | 159 | template 160 | filter( 161 | InputIterator first,InputIterator last, 162 | std::size_t n,double fpr,const allocator_type& al): 163 | filter{first,last,n,fpr,hasher(),al}{} 164 | 165 | explicit filter(const allocator_type& al):filter{0,al}{} 166 | 167 | filter(const filter& x,const allocator_type& al): 168 | super{x,al},hash_base{empty_init,x.h()}{} 169 | 170 | filter(filter&& x,const allocator_type& al): 171 | super{std::move(x),al},hash_base{empty_init,std::move(x.h())}{} 172 | 173 | filter( 174 | std::initializer_list il, 175 | std::size_t m,const hasher& h=hasher(), 176 | const allocator_type& al=allocator_type()): 177 | filter{il.begin(),il.end(),m,h,al}{} 
178 | 179 | filter( 180 | std::initializer_list il, 181 | std::size_t n,double fpr,const hasher& h=hasher(), 182 | const allocator_type& al=allocator_type()): 183 | filter{il.begin(),il.end(),n,fpr,h,al}{} 184 | 185 | filter(std::size_t m,const allocator_type& al): 186 | filter{m,hasher(),al}{} 187 | 188 | filter(std::size_t n,double fpr,const allocator_type& al): 189 | filter{n,fpr,hasher(),al}{} 190 | 191 | filter( 192 | std::initializer_list il, 193 | std::size_t m,const allocator_type& al): 194 | filter{il.begin(),il.end(),m,hasher(),al}{} 195 | 196 | filter( 197 | std::initializer_list il, 198 | std::size_t n,double fpr,const allocator_type& al): 199 | filter{il.begin(),il.end(),n,fpr,hasher(),al}{} 200 | 201 | filter& operator=(const filter& x) 202 | { 203 | BOOST_BLOOM_STATIC_ASSERT_IS_NOTHROW_SWAPPABLE(Hash); 204 | using std::swap; 205 | 206 | auto x_h=x.h(); 207 | super::operator=(x); 208 | swap(h(),x_h); 209 | return *this; 210 | } 211 | 212 | filter& operator=(filter&& x) 213 | noexcept(noexcept(std::declval()=(std::declval()))) 214 | { 215 | BOOST_BLOOM_STATIC_ASSERT_IS_NOTHROW_SWAPPABLE(Hash); 216 | using std::swap; 217 | 218 | super::operator=(std::move(x)); 219 | swap(h(),x.h()); 220 | return *this; 221 | } 222 | 223 | filter& operator=(std::initializer_list il) 224 | { 225 | clear(); 226 | insert(il); 227 | return *this; 228 | } 229 | 230 | using super::get_allocator; 231 | using super::capacity; 232 | using super::capacity_for; 233 | using super::fpr_for; 234 | using super::array; 235 | 236 | BOOST_FORCEINLINE void insert(const T& x) 237 | { 238 | super::insert(hash_for(x)); 239 | } 240 | 241 | template< 242 | typename U, 243 | typename H=hasher,detail::enable_if_transparent_t* =nullptr 244 | > 245 | BOOST_FORCEINLINE void insert(const U& x) 246 | { 247 | super::insert(hash_for(x)); 248 | } 249 | 250 | template 251 | void insert(InputIterator first,InputIterator last) 252 | { 253 | insert_impl( 254 | first,last, 255 | std::integral_constant< 256 | 
bool,detail::is_forward_iterator::value>{}); 257 | } 258 | 259 | void insert(std::initializer_list il) 260 | { 261 | insert(il.begin(),il.end()); 262 | } 263 | 264 | void swap(filter& x) 265 | noexcept(noexcept(std::declval().swap(std::declval()))) 266 | { 267 | BOOST_BLOOM_STATIC_ASSERT_IS_NOTHROW_SWAPPABLE(Hash); 268 | using std::swap; 269 | 270 | super::swap(x); 271 | swap(h(),x.h()); 272 | } 273 | 274 | using super::clear; 275 | using super::reset; 276 | 277 | filter& operator&=(const filter& x) 278 | { 279 | super::operator&=(x); 280 | return *this; 281 | } 282 | 283 | filter& operator|=(const filter& x) 284 | { 285 | super::operator|=(x); 286 | return *this; 287 | } 288 | 289 | hasher hash_function()const 290 | { 291 | return h(); 292 | } 293 | 294 | BOOST_FORCEINLINE bool may_contain(const T& x)const 295 | { 296 | return super::may_contain(hash_for(x)); 297 | } 298 | 299 | template< 300 | typename U, 301 | typename H=hasher,detail::enable_if_transparent_t* =nullptr 302 | > 303 | BOOST_FORCEINLINE bool may_contain(const U& x)const 304 | { 305 | return super::may_contain(hash_for(x)); 306 | } 307 | 308 | template 309 | void may_contain( 310 | ForwardIterator first,ForwardIterator last,F f)const 311 | { 312 | BOOST_BLOOM_STATIC_ASSERT_IS_FORWARD_ITERATOR(ForwardIterator); 313 | 314 | super::bulk_may_contain( 315 | [this,first]()mutable{return promoting_hash_for(*first++);}, 316 | static_cast(std::distance(first,last)), 317 | [&f,first](bool res)mutable{f(*first++,res);}); 318 | } 319 | 320 | private: 321 | template< 322 | typename T1,std::size_t K1,typename SF,std::size_t S,typename H,typename A 323 | > 324 | bool friend operator==( 325 | const filter& x,const filter& y); 326 | 327 | using hash_base=empty_value; 328 | 329 | const Hash& h()const{return hash_base::get();} 330 | Hash& h(){return hash_base::get();} 331 | 332 | template 333 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 334 | inline std::uint64_t hash_for(const U& x)const 335 | { 336 
| return mix_policy::mix(h(),x); 337 | } 338 | 339 | /* promoting_hash_for forces conversion to value_type unless Hash 340 | * is transparent. 341 | */ 342 | 343 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 344 | inline std::uint64_t promoting_hash_for(const T& x)const 345 | { 346 | return hash_for(x); 347 | } 348 | 349 | template< 350 | typename U, 351 | typename H=hasher,detail::enable_if_transparent_t* =nullptr 352 | > 353 | /* NOLINTNEXTLINE(readability-redundant-inline-specifier) */ 354 | inline std::uint64_t promoting_hash_for(const U& x)const 355 | { 356 | return hash_for(x); 357 | } 358 | 359 | template 360 | void insert_impl( 361 | Iterator first,Iterator last,std::false_type /* input iterator */) 362 | { 363 | while(first!=last)insert(*first++); 364 | } 365 | 366 | template 367 | void insert_impl( 368 | Iterator first,Iterator last,std::true_type /* forward iterator */) 369 | { 370 | super::bulk_insert( 371 | [this,first]()mutable{return promoting_hash_for(*first++);}, 372 | static_cast(std::distance(first,last))); 373 | } 374 | }; 375 | 376 | template< 377 | typename T,std::size_t K,typename SF,std::size_t S,typename H,typename A 378 | > 379 | bool operator==(const filter& x,const filter& y) 380 | { 381 | using super=typename filter::super; 382 | return static_cast(x)==static_cast(y); 383 | } 384 | 385 | template< 386 | typename T,std::size_t K,typename SF,std::size_t S,typename H,typename A 387 | > 388 | bool operator!=(const filter& x,const filter& y) 389 | { 390 | return !(x==y); 391 | } 392 | 393 | template< 394 | typename T,std::size_t K,typename SF,std::size_t S,typename H,typename A 395 | > 396 | void swap(filter& x,filter& y) 397 | noexcept(noexcept(x.swap(y))) 398 | { 399 | x.swap(y); 400 | } 401 | 402 | #if defined(BOOST_MSVC) 403 | #pragma warning(pop) /* C4714 */ 404 | #endif 405 | 406 | } /* namespace bloom */ 407 | } /* namespace boost */ 408 | #endif 409 | 
-------------------------------------------------------------------------------- /benchmark/bulk_comparison_table.cpp: -------------------------------------------------------------------------------- 1 | /* Comparison table of regular vs. bulk operations for several configurations 2 | * of boost::bloom::filter. 3 | * 4 | * Copyright 2025 Joaquin M Lopez Munoz. 5 | * Distributed under the Boost Software License, Version 1.0. 6 | * (See accompanying file LICENSE_1_0.txt or copy at 7 | * http://www.boost.org/LICENSE_1_0.txt) 8 | * 9 | * See https://www.boost.org/libs/bloom for library home page. 10 | */ 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | std::chrono::high_resolution_clock::time_point measure_start,measure_pause; 18 | 19 | template 20 | double measure(F f) 21 | { 22 | using namespace std::chrono; 23 | 24 | static const int num_trials=10; 25 | static const milliseconds min_time_per_trial(10); 26 | std::array trials; 27 | 28 | for(int i=0;i>(t2-measure_start).count()/runs; 40 | } 41 | 42 | std::sort(trials.begin(),trials.end()); 43 | return std::accumulate( 44 | trials.begin()+2,trials.end()-2,0.0)/(trials.size()-4); 45 | } 46 | 47 | void pause_timing() 48 | { 49 | measure_pause=std::chrono::high_resolution_clock::now(); 50 | } 51 | 52 | void resume_timing() 53 | { 54 | measure_start+=std::chrono::high_resolution_clock::now()-measure_pause; 55 | } 56 | 57 | #include 58 | #include 59 | #include 60 | #include 61 | #include 62 | #include 63 | #include 64 | #include 65 | #include 66 | #include 67 | #include 68 | #include 69 | 70 | static std::size_t num_elements; 71 | 72 | struct test_results 73 | { 74 | double insertion_time; /* ns per element */ 75 | double successful_lookup_time; /* ns per element */ 76 | double unsuccessful_lookup_time; /* ns per element */ 77 | double mixed_lookup_time; /* ns per element */ 78 | }; 79 | 80 | template 81 | test_results test(std::size_t c) 82 | { 83 | using value_type=typename Filter::value_type; 84 | 
85 | static constexpr double lookup_mix=0.1; /* successful pr. */ 86 | static constexpr std::uint64_t mixed_lookup_cut= 87 | (std::uint64_t)( 88 | lookup_mix*(double)(std::numeric_limits::max)()); 89 | 90 | std::vector data_in,data_out,data_mixed; 91 | { 92 | boost::detail::splitmix64 rng; 93 | boost::unordered_flat_set unique; 94 | for(std::size_t i=0;i 171 | bulk_test_results bulk_test(std::size_t c) 172 | { 173 | using value_type=typename Filter::value_type; 174 | 175 | static constexpr double lookup_mix=0.5; 176 | static constexpr std::uint64_t mixed_lookup_cut= 177 | (std::uint64_t)( 178 | lookup_mix*(double)(std::numeric_limits::max)()); 179 | std::vector data_in,data_out,data_mixed; 180 | { 181 | boost::detail::splitmix64 rng; 182 | boost::unordered_flat_set unique; 183 | for(std::size_t i=0;i void row(std::size_t c) 269 | { 270 | std::cout<< 271 | " \n" 272 | " "<\n"; 273 | 274 | boost::mp11::mp_for_each< 275 | boost::mp11::mp_transform 276 | >([&](auto i){ 277 | using filter=typename decltype(i)::type; 278 | auto res=test(c); 279 | auto bulk_res=bulk_test(c); 280 | std::cout<< 281 | " "<\n" 282 | " "<\n" 283 | " "<\n" 284 | " "<\n" 285 | " "<\n"; 286 | }); 287 | 288 | std::cout<< 289 | " \n"; 290 | } 291 | 292 | using namespace boost::bloom; 293 | 294 | template 295 | using filters1=boost::mp11::mp_list< 296 | filter, 297 | filter>, 298 | filter,1> 299 | >; 300 | 301 | template 302 | using filters2=boost::mp11::mp_list< 303 | filter>, 304 | filter,1>, 305 | filter> 306 | >; 307 | 308 | template 309 | using filters3=boost::mp11::mp_list< 310 | filter,1>, 311 | filter>, 312 | filter,1> 313 | >; 314 | 315 | template 316 | using filters4=boost::mp11::mp_list< 317 | filter>, 318 | filter,1>, 319 | filter> 320 | >; 321 | 322 | int main(int argc,char* argv[]) 323 | { 324 | if(argc<2){ 325 | std::cerr<<"provide the number of elements\n"; 326 | return EXIT_FAILURE; 327 | } 328 | try{ 329 | num_elements=std::stoul(argv[1]); 330 | } 331 | catch(...){ 332 | 
std::cerr<<"wrong arg\n"; 333 | return EXIT_FAILURE; 334 | } 335 | 336 | /* table */ 337 | 338 | auto subheader= 339 | " K\n" 340 | " ins.\n" 341 | " succ.
lkp.\n" 342 | " uns.
lkp.\n" 343 | " mixed
lkp.\n"; 344 | 345 | std::cout<< 346 | "\n" 347 | " \n" 348 | " \n" 349 | " \n" 350 | " \n" 351 | " \n" 352 | " \n" 353 | " \n" 354 | " \n"<< 355 | subheader<< 356 | subheader<< 357 | subheader<< 358 | " \n"; 359 | 360 | row>( 8); 361 | row>(12); 362 | row>(16); 363 | row>(20); 364 | 365 | std::cout<< 366 | " \n" 367 | " \n" 368 | " \n" 369 | " \n" 370 | " \n" 371 | " \n" 372 | " \n" 373 | " \n"<< 374 | subheader<< 375 | subheader<< 376 | subheader<< 377 | " \n"; 378 | 379 | row>( 8); 380 | row>(12); 381 | row>(16); 382 | row>(20); 383 | 384 | std::cout<< 385 | " \n" 386 | " \n" 387 | " \n" 388 | " \n" 389 | " \n" 390 | " \n" 391 | " \n" 392 | " \n"<< 393 | subheader<< 394 | subheader<< 395 | subheader<< 396 | " \n"; 397 | 398 | row>( 8); 399 | row>(12); 400 | row>(16); 401 | row>(20); 402 | 403 | std::cout<< 404 | " \n" 405 | " \n" 406 | " \n" 407 | " \n" 408 | " \n" 409 | " \n" 410 | " \n" 411 | " \n"<< 412 | subheader<< 413 | subheader<< 414 | subheader<< 415 | " \n"; 416 | 417 | row>( 8); 418 | row>(12); 419 | row>(16); 420 | row>(20); 421 | 422 | std::cout<<"
filter<int,K>filter<int,1,block<uint64_t,K>>filter<int,1,block<uint64_t,K>,1>
c
filter<int,1,multiblock<uint64_t,K>>filter<int,1,multiblock<uint64_t,K>,1>filter<int,1,fast_multiblock32<K>>
c
filter<int,1,fast_multiblock32<K>,1>filter<int,1,fast_multiblock64<K>>filter<int,1,fast_multiblock64<K>,1>
c
filter<int,1,block<uint64_t[8],K>>filter<int,1,block<uint64_t[8],K>,1>filter<int,1,multiblock<uint64_t[8],K>>
c
\n"; 423 | } 424 | -------------------------------------------------------------------------------- /doc/bloom/tutorial.adoc: -------------------------------------------------------------------------------- 1 | [#tutorial] 2 | = Tutorial 3 | 4 | :idprefix: tutorial_ 5 | 6 | A `boost::bloom::filter` can be regarded as a bit array divided into _subarrays_ that 7 | are selected pseudo-randomly (based on a hash function) upon insertion: 8 | each of the subarrays is passed to a _subfilter_ that marks several of its bits according 9 | to some associated strategy. 10 | 11 | Note that although `boost::bloom::filter` mimics the interface of a container 12 | and provides operations such as `insert`, it is actually _not_ a 13 | container: for instance, insertion does not involve the actual storage 14 | of the element in the data structure, but merely sets some bits in the internal 15 | array based on the hash value of the element. 16 | 17 | == Filter Definition 18 | 19 | [listing,subs="+macros,+quotes"] 20 | ----- 21 | template< 22 | typename T, std::size_t K, 23 | typename Subfilter = block<unsigned char, 1>, std::size_t Stride = 0, 24 | typename Hash = boost::hash<T>, 25 | typename Allocator = std::allocator<unsigned char> 26 | > 27 | class filter; 28 | ----- 29 | 30 | * `T`: Type of the elements inserted. 31 | * `K`: Number of subarrays marked per insertion. 32 | * `xref:tutorial_subfilter[Subfilter]`: Type of subfilter used. 33 | * `xref:tutorial_stride[Stride]`: Distance in bytes between the initial positions of consecutive subarrays. 34 | * `xref:tutorial_hash[Hash]`: A hash function for `T`. 35 | * `Allocator`: An allocator for `unsigned char`. 36 | 37 | === `Subfilter` 38 | 39 | A subfilter defines the local strategy for setting or checking bits within 40 | a selected subarray of the bit array. It determines how many bits are 41 | modified per operation, how they are arranged in memory, and how memory is accessed. 42 | The following subfilters are provided: 43 | 44 | ++++ 45 |
46 | ++++ 47 | [options="header"] 48 | |=== 49 | | Subfilter | Description | Pros | Cons 50 | 51 | | `block<Block, K'>` 52 | | Sets `K'` bits in a subarray of type `Block` 53 | | Very fast access time 54 | | FPR is worse (higher) the smaller `Block` is 55 | 56 | | `multiblock<Block, K'>` 57 | | Sets one bit in each of the elements of a `Block[K']` subarray 58 | | Better (lower) FPR than `block<Block, K'>` for the same `Block` type 59 | | Performance may worsen if cacheline boundaries are crossed when accessing the subarray 60 | 61 | | `fast_multiblock32<K'>` 62 | | Statistically equivalent to `multiblock<uint32_t, K'>`, but uses 63 | faster SIMD-based algorithms when SSE2, AVX2 or Neon are enabled at 64 | compile time 65 | | Always prefer it to `multiblock<uint32_t, K'>` when SSE2/AVX2/Neon is available 66 | | FPR is worse (higher) than `fast_multiblock64<K'>` for the same `K'` 67 | 68 | | `fast_multiblock64<K'>` 69 | | Statistically equivalent to `multiblock<uint64_t, K'>`, but uses a 70 | faster SIMD-based algorithm when AVX2 is enabled at compile time 71 | | Always prefer it to `multiblock<uint64_t, K'>` when AVX2 is available 72 | | Slower than `fast_multiblock32<K'>` for the same `K'` 73 | |=== 74 | ++++ 75 |
76 | ++++ 77 | 78 | In the table above, `Block` can be an unsigned integral type 79 | (e.g. `unsigned char`, `uint32_t`, `uint64_t`), or 80 | an array of 2^`N`^ unsigned integrals (e.g. `uint64_t[8]`). In general, 81 | the wider `Block` is, the better (lower) the resulting FPR, but 82 | cache locality worsens and performance may suffer as a result. 83 | 84 | Note that the total number of bits set/checked for a 85 | `boost::bloom::filter<T, K, Subfilter<..., K'>>` is `K * K'`. The 86 | default configuration `boost::bloom::filter<T, K>` = 87 | `boost::bloom::filter<T, K, block<unsigned char, 1>>`, which corresponds to a 88 | xref:primer_implementation[classical Bloom filter], has the best (lowest) FPR among all filters 89 | with the same number of bits per operation, but is also the slowest: a new 90 | subarray is accessed for each bit set/checked. Consult the 91 | xref:benchmarks[benchmarks section] to see different tradeoffs between FPR and 92 | performance. 93 | 94 | Once a subfilter has been selected, the parameter `K'` can be tuned 95 | to its optimum value (minimum FPR) if the number of elements that will be inserted is 96 | known in advance, as explained in a xref:configuration[dedicated section]; 97 | otherwise, low values of `K'` will generally be faster and preferred to 98 | higher values as long as the resulting FPR is at acceptable levels. 99 | 100 | === `Stride` 101 | 102 | As we have seen, `Subfilter` defines the subarray (`Block` in the case of 103 | `block<Block, K'>`, `Block[K']` for `multiblock<Block, K'>`) used by 104 | `boost::bloom::filter`: contiguous portions of the underlying bit array 105 | are then accessed and treated as those subarrays. The `Stride` parameter 106 | controls the distance in bytes between the initial positions of 107 | consecutive subarrays. 108 | 109 | When the default value 0 is used, the stride is automatically set 110 | to the size of the subarrays, and so there's no overlapping between them.
111 | If `Stride` is set to a smaller value than that size, contiguous 112 | subarrays superimpose on one another: the level of overlap is larger 113 | for smaller values of `Stride`, with maximum overlap happening when 114 | `Stride` is 1 byte. 115 | 116 | image::stride.png[align=center, title="Two different configurations of `Stride`: (a) non-overlapping subarrays, (b) overlapping subarrays.+++
+++Each subarray is associated to the stride of the same color."] 117 | 118 | As it happens, overlapping improves (decreases) the resulting FPR 119 | with respect to the non-overlapping case, the tradeoff being that 120 | subarrays may not be aligned in memory, which can impact performance 121 | negatively. 122 | 123 | === `Hash` 124 | 125 | Unlike other Bloom filter implementations requiring several hash functions per operation, 126 | `boost::bloom::filter` uses only one. 127 | By default, link:../../../container_hash/index.html[Boost.ContainerHash] is used. 128 | Consult this library's link:../../../container_hash/doc/html/hash.html#user[dedicated section] 129 | if you need to extend `boost::hash` for your own types. 130 | 131 | When the provided hash function is of sufficient quality, it is used 132 | as is; otherwise, a bit-mixing post-process is applied to hash values that improves 133 | their statistical properties so that the resulting FPR approaches its 134 | theoretical limit. The hash function is determined to be of high quality 135 | (more precisely, to have the so-called _avalanching_ property) via the 136 | `link:../../../container_hash/doc/html/hash.html#ref_hash_is_avalanchinghash[boost::hash_is_avalanching]` 137 | trait. 138 | 139 | == Capacity 140 | 141 | The size of the filter's internal array is specified at construction time: 142 | 143 | [source,subs="+macros,+quotes"] 144 | ----- 145 | using filter = boost::bloom::filter; 146 | filter f(1'000'000); // array of 1'000'000 **bits** 147 | std::cout << f.capacity(); // >= 1'000'000 148 | ----- 149 | 150 | Note that `boost::bloom::filter` default constructor specifies a capacity 151 | of zero, which in general won't be of much use -- the assigned array 152 | is null. 
153 | 154 | Instead of specifying the array's capacity directly, we can let the library 155 | figure it out based on the number of elements we plan to insert and the 156 | desired FPR: 157 | 158 | [source] 159 | ----- 160 | // we'll insert 100'000 elements and want a FPR ~ 1% 161 | filter f(100'000, 0.01); 162 | 163 | // this is equivalent 164 | filter f2(filter::capacity_for(100'000, 0.01)); 165 | ----- 166 | 167 | Be careful when the FPR specified is very small, as the resulting capacity 168 | may be too large to fit in memory: 169 | 170 | [source] 171 | ----- 172 | // resulting capacity ~ 1.4E12, out of memory std::bad_alloc is thrown 173 | filter f3(100'000, 1E-50); 174 | ----- 175 | 176 | Once a filter is constructed, its array is fixed (for instance, it won't 177 | grow dynamically as elements are inserted). The only way to change it is 178 | by assignment/swapping from a different filter, or using `reset`: 179 | 180 | [source,subs="+macros,+quotes"] 181 | ----- 182 | f.reset(2'000'000); // change to 2'000'000 bits **and clears the filter** 183 | f.reset(100'000, 0.005); // equivalent to reset(filter::capacity_for(100'000, 0.005)); 184 | f.reset(); // null array (capacity == 0) 185 | ----- 186 | 187 | == Insertion and Lookup 188 | 189 | Insertion is done in much the same way as with a traditional container: 190 | 191 | [source] 192 | ----- 193 | f.insert("hello"); 194 | f.insert(data.begin(), data.end()); 195 | ----- 196 | 197 | Of course, in this context "insertion" does not involve any actual 198 | storage of elements into the filter, but rather the setting of bits in the 199 | internal array based on the hash values of those elements. 
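As a rough illustration of the point above, the following toy class sets a few bits per inserted string and keeps no copy of the string itself. It is a minimal sketch, not the library's implementation: `toy_filter`, its double-hashing scheme and the default of 4 bits per element are all hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy filter for illustration only (hypothetical names and hashing scheme,
// unrelated to the internals of boost::bloom::filter).
class toy_filter
{
public:
  // bits must be greater than zero
  explicit toy_filter(std::size_t bits, std::size_t k = 4)
    : bits_(bits, false), k_(k) {}

  // "insertion" only sets k bits; the string itself is never stored
  void insert(const std::string& x)
  {
    std::uint64_t h = std::hash<std::string>{}(x);
    for(std::size_t i = 0; i < k_; ++i) bits_[position(h, i)] = true;
  }

  bool may_contain(const std::string& x) const
  {
    std::uint64_t h = std::hash<std::string>{}(x);
    for(std::size_t i = 0; i < k_; ++i) {
      if(!bits_[position(h, i)]) return false; // definitely never inserted
    }
    return true; // inserted, or a false positive
  }

  std::size_t bits_set() const
  {
    return static_cast<std::size_t>(
      std::count(bits_.begin(), bits_.end(), true));
  }

private:
  // derive the i-th bit position from a single hash value
  std::size_t position(std::uint64_t h, std::size_t i) const
  {
    std::uint64_t step = (h >> 32) | 1u; // odd step derived from the hash
    return static_cast<std::size_t>((h + i * step) % bits_.size());
  }

  std::vector<bool> bits_;
  std::size_t       k_;
};
```

After `toy_filter f(1000); f.insert("hello");`, at most four bits of the array are set no matter how long the string is, and `f.may_contain("hello")` returns `true`.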
200 | Lookup goes as follows: 201 | 202 | [source] 203 | ----- 204 | bool b1 = f.may_contain("hello"); // b1 is true since we actually inserted "hello" 205 | bool b2 = f.may_contain("bye"); // b2 is most likely false 206 | ----- 207 | 208 | As its name suggests, `may_contain` can return `true` even if the 209 | element has not been previously inserted, that is, it may yield false 210 | positives -- this is the essence of probabilistic data structures. 211 | `fpr_for` provides an estimation of the false positive rate: 212 | 213 | [source] 214 | ----- 215 | // we have inserted 100 elements so far, what's our FPR? 216 | std::cout<< filter::fpr_for(100, f.capacity()); 217 | ----- 218 | 219 | Note that in the example we provided the number 100 externally: 220 | `boost::bloom::filter` does not keep track of the number of elements 221 | that have been inserted -- in other words, it does not have a `size` 222 | operation. 223 | 224 | Once inserted, there is no way to remove a specific element from the filter. 225 | We can only clear up the filter entirely: 226 | 227 | [source] 228 | ----- 229 | f.clear(); // sets all the bits in the array to zero 230 | ----- 231 | 232 | == Bulk Operations 233 | 234 | In general, the following code: 235 | 236 | [source] 237 | ----- 238 | f.insert(data.begin(), data.end()); 239 | ----- 240 | 241 | is faster than: 242 | 243 | [source] 244 | ----- 245 | for(const auto& x: data) f.insert(x); 246 | ----- 247 | 248 | This is so because the former processes the range in 249 | chunks of size xref:filter_bulk_insert_size[`bulk_insert_size`] 250 | using some internal streamlining techniques in order to reduce execution 251 | time. 
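The text does not detail which streamlining techniques are used internally. One plausible ingredient, assumed here purely for illustration, is to split the work on each chunk into two phases: first compute all positions for the chunk, then touch memory. The sketch below shows that structure on a plain `std::vector<bool>`; `toy_bulk_size`, `toy_position` and both insert functions are hypothetical names, and the sketch makes no claim about the library's actual algorithm or its `bulk_insert_size` value.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical chunk size for this sketch only.
constexpr std::size_t toy_bulk_size = 16;

// Map an element to a single bit position (toy hashing, one bit per element).
inline std::size_t toy_position(int x, std::size_t nbits)
{
  std::uint64_t h = static_cast<std::uint64_t>(std::hash<int>{}(x));
  h *= 0x9E3779B97F4A7C15ull; // cheap bit mixing
  return static_cast<std::size_t>((h >> 32) % nbits);
}

// Element-by-element insertion: hash, then set the bit, one at a time.
inline void insert_one_by_one(
  std::vector<bool>& bits, const std::vector<int>& data)
{
  for(int x: data) bits[toy_position(x, bits.size())] = true;
}

// Chunked insertion: phase 1 computes the positions for a whole chunk,
// phase 2 sets the corresponding bits.
inline void insert_in_chunks(
  std::vector<bool>& bits, const std::vector<int>& data)
{
  std::array<std::size_t, toy_bulk_size> pos;
  for(std::size_t i = 0; i < data.size(); i += toy_bulk_size) {
    std::size_t n = std::min(toy_bulk_size, data.size() - i);
    for(std::size_t j = 0; j < n; ++j) {
      pos[j] = toy_position(data[i + j], bits.size());
    }
    for(std::size_t j = 0; j < n; ++j) bits[pos[j]] = true;
  }
}
```

Both functions leave the bit array in the same state; the chunked form only reorders the work, which is what gives an implementation room to overlap memory accesses.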
 Similarly, `may_contain` can be executed 252 | in bulk mode as follows: 253 | 254 | [source] 255 | ----- 256 | f.may_contain( 257 | input.begin(), input.end(), // range of elements to do lookup on 258 | [](const value_type& x, bool b) { // called for each of the elements with their lookup result 259 | if(b) std::cout << x << " likely in the filter\n"; 260 | else std::cout << x << " not in the filter\n"; 261 | }); 262 | ----- 263 | 264 | Bulk `may_contain` processes the range in chunks of 265 | xref:filter_bulk_may_contain_size[`bulk_may_contain_size`] elements. 266 | 267 | Bulk mode can increase performance by a factor of 2x or more, but this is 268 | very dependent on the filter configuration, the compiler used and the 269 | environment, and in some cases it results in a net performance loss. 270 | In general, the speedup is higher for larger array sizes. 271 | For more information, consult the dedicated 272 | xref:benchmarks_bulk_operations[benchmark section] and 273 | https://github.com/boostorg/boost_bloom_benchmarks/tree/bulk-operations[associated repo^]. 274 | 275 | == Filter Combination 276 | 277 | `boost::bloom::filter`+++s+++ can be combined by applying a bitwise OR 278 | to the bits of their arrays: 279 | 280 | [source] 281 | ----- 282 | filter f2 = ...; 283 | ... 284 | f |= f2; // f and f2 must have exactly the same capacity 285 | ----- 286 | 287 | The result is equivalent to a filter "containing" the set union of the elements 288 | of `f` and `f2`. AND combination, on the other hand, results in a filter 289 | holding the _intersection_ of the elements: 290 | 291 | [source] 292 | ----- 293 | filter f3 = ...; 294 | ... 295 | f &= f3; // f and f3 must have exactly the same capacity 296 | ----- 297 | 298 | For AND combination, be aware that the resulting FPR will in general be 299 | worse (higher) than if the filter had been constructed from scratch 300 | by inserting only the common elements -- don't trust `fpr_for` in this 301 | case.
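At the level of the underlying arrays, the two combinations amount to bytewise OR and AND. The following is a minimal sketch on plain byte vectors, assuming arrays of equal size; `toy_array`, `combine_or` and `combine_and` are hypothetical names, and throwing on mismatched sizes is a choice made for this sketch rather than a statement about how the library reacts.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a filter's bit array (hypothetical type for this sketch).
using toy_array = std::vector<unsigned char>;

// Union-like combination: a bit is set if it is set in either array.
inline void combine_or(toy_array& x, const toy_array& y)
{
  if(x.size() != y.size()) throw std::invalid_argument("sizes differ");
  for(std::size_t i = 0; i < x.size(); ++i) {
    x[i] = static_cast<unsigned char>(x[i] | y[i]);
  }
}

// Intersection-like combination: a bit survives only if set in both arrays.
inline void combine_and(toy_array& x, const toy_array& y)
{
  if(x.size() != y.size()) throw std::invalid_argument("sizes differ");
  for(std::size_t i = 0; i < x.size(); ++i) {
    x[i] = static_cast<unsigned char>(x[i] & y[i]);
  }
}
```

With OR, every bit set by an element of either filter remains set, so all of those elements are still reported as possibly present. With AND, a surviving bit may have been set by different elements in each filter, which is one way to see why the FPR after intersection is higher than for a filter built from the common elements alone.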
302 | 303 | == Direct Access to the Array 304 | 305 | The contents of the bit array can be accessed directly with the `array` 306 | member function, which can be leveraged for filter serialization: 307 | 308 | [source] 309 | ----- 310 | filter f1 = ...; 311 | ... 312 | 313 | // save filter 314 | std::ofstream out("filter.bin", std::ios::binary); 315 | std::size_t c1=f1.capacity(); 316 | out.write(reinterpret_cast<const char*>(&c1), sizeof(c1)); // save capacity (bits) 317 | boost::span<const unsigned char> s1 = f1.array(); 318 | out.write(reinterpret_cast<const char*>(s1.data()), s1.size()); // save array 319 | out.close(); 320 | 321 | // load filter 322 | filter f2; 323 | std::ifstream in("filter.bin", std::ios::binary); 324 | std::size_t c2; 325 | in.read(reinterpret_cast<char*>(&c2), sizeof(c2)); 326 | f2.reset(c2); // restore capacity 327 | boost::span<unsigned char> s2 = f2.array(); 328 | in.read(reinterpret_cast<char*>(s2.data()), s2.size()); // load array 329 | in.close(); 330 | ----- 331 | 332 | Note that `array()` is a span over `unsigned char`+++s+++ whereas 333 | capacities are measured in bits, so `array().size()` is 334 | `capacity() / CHAR_BIT`. If you load a serialized filter on a computer 335 | other than the one where it was saved, take into account that 336 | the CPU architectures at each end must have the same 337 | https://es.wikipedia.org/wiki/Endianness[endianness^] for the 338 | reconstruction to work. 339 | 340 | == Debugging 341 | 342 | === Visual Studio Natvis 343 | 344 | Add the link:../../extra/boost_bloom.natvis[`boost_bloom.natvis`^] visualizer 345 | to your project to allow for user-friendly inspection of `boost::bloom::filter`+++s+++.
346 | 347 | image::natvis.png[align=center, title="View of a `boost::bloom::filter` with `boost_bloom.natvis`."] 348 | 349 | === GDB Pretty-Printer 350 | 351 | `boost::bloom::filter` comes with a dedicated 352 | https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pretty-Printing.html#Pretty-Printing[pretty-printer^] 353 | for visual inspection when debugging with GDB: 354 | 355 | [source,plaintext] 356 | ----- 357 | (gdb) print f 358 | $1 = boost::bloom::filter with {capacity = 2000, data = 0x6da840, size = 250} = {[0] = 0 '\000', 359 | [1] = 0 '\000', [2] = 0 '\000', [3] = 0 '\000', [4] = 0 '\000', [5] = 1 '\001'...} 360 | 361 | (gdb) # boost::bloom::filter does not have an operator[]. The following expression 362 | (gdb) # is used in place of print f.array()[30] 363 | (gdb) print f[30] 364 | $2 = 128 '\200' 365 | ----- 366 | 367 | Remember to enable pretty-printing in GDB (typically a one-time setup): 368 | 369 | [source,plaintext] 370 | ----- 371 | (gdb) set print pretty on 372 | ----- 373 | 374 | The pretty-printer is automatically embedded in the program if your compiled binary 375 | format is ELF and the macro `BOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS` is _not_ defined; 376 | embedded pretty-printers are enabled for a particular GDB session 377 | with this command (or by default by adding it to your `.gdbinit` configuration 378 | file): 379 | 380 | [source,plaintext,subs="+quotes"] 381 | ----- 382 | (gdb) add-auto-load-safe-path __ 383 | ----- 384 | 385 | As an alternative to using the embedded pretty-printer, you can explicitly 386 | load the link:../../extra/boost_bloom_printers.py[`boost_bloom_printers.py`^] 387 | script: 388 | 389 | [source,plaintext,subs="+quotes"] 390 | ----- 391 | (gdb) source __/libs/bloom/extra/boost_bloom_printers.py 392 | ----- 393 | --------------------------------------------------------------------------------