├── AUTHORS
├── CHANGES
├── LICENSE
├── README.md
├── THIRD-PARTY-NOTICES
├── VERSION
├── esa_matchfinder.c
├── esa_matchfinder.h
└── libsais
    ├── CHANGES
    ├── LICENSE
    ├── VERSION
    ├── libsais.c
    └── libsais.h


/AUTHORS:
--------------------------------------------------------------------------------
-- Authors of esa-matchfinder

  Ilya Grebnov

-- This program is based on (at least) the work of

  Eric Biggers, Charles Bloom, Piotr Tarsa, Yann Collet,
  Bulat Ziganshin, Conor McCarthy, Lucas Marsh, Emmanuel Marty,
  Aki Utoslahti, Mohamed Ibrahim Abouelhoda, Enno Ohlebusch.

--------------------------------------------------------------------------------
/CHANGES:
--------------------------------------------------------------------------------
Changes in 1.2.1 (February 6, 2025)
- Resolved a strict aliasing violation that resulted in invalid code generation by the Intel compiler.

Changes in 1.2.0 (December 2, 2023)
- Small performance optimization for esa_matchfinder_advance API.

Changes in 1.1.0 (November 30, 2023)
- New API to find matches within specified sliding window.

Changes in 1.0.1 (June 19, 2022)
- Improved cache coherence for ARMv8 architecture.

Changes in 1.0.0 (June 12, 2022)
- Initial public release of the esa-matchfinder.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format.
      We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# The esa-matchfinder

The esa-matchfinder is a C99 library for efficient Lempel-Ziv factorization using an enhanced suffix array (ESA).

Copyright (c) 2022-2025 Ilya Grebnov

> * The esa-matchfinder is a block-based algorithm with a maximum supported block size of 512 megabytes. It finds matches in the range of 2..64 bytes using 12x bytes of extra memory. The ESA_MATCHFINDER_MATCH_BITS definition can be changed to support a larger match-finding range, at the cost of a smaller maximum supported block size.
> * The esa-matchfinder does not employ any heuristics or search depth limitations and always finds distance-optimal matches, even on highly repetitive sources. The only exception is matches at the beginning of the block: due to implementation details, the esa-matchfinder cannot find any matches with offset 0.
> * The esa-matchfinder is fast in the best, average and worst cases (see [Benchmarks](#benchmarks) below). However, the esa-matchfinder is sensitive to memory speed and software prefetching and might not be suitable for some CPU architectures. It might also be suboptimal on specific data types (some DNA sequences in particular), so please run your own benchmarks.
> * The esa-matchfinder works with compilers from Microsoft and GNU, but I recommend Clang for best performance. Additionally, the esa-matchfinder is designed for 64-bit systems and will work suboptimally on 32-bit systems.

## Algorithm
> The esa-matchfinder uses the methodology of bottom-up traversal of the Longest Common Prefix (LCP) interval tree, written in 2014-2015 by Eric Biggers and dedicated to the public domain worldwide.

The esa-matchfinder finds all distance-optimal matches (between min_match_length and max_match_length inclusive) for every position of the input block using the following algorithm:

1. Suffix (SA) and longest common prefix (LCP) arrays are constructed for the input block. Next, an interval tree is constructed on top of the SA and LCP arrays.
   > * The data structure consisting of SA and LCP is often referred to as an enhanced suffix array (ESA), hence the name of the match finder.

2. Each interval is a maximal range (one that cannot be extended further to the left or right) of suffixes in the SA that share a common prefix of a certain length (LCP). These intervals represent the internal nodes of the suffix tree.
3. Using the interval tree, we can traverse up to a wider interval with a shorter common prefix, or down to narrower intervals with longer common prefixes.
4. For the purpose of Lempel-Ziv factorization we only need to support bottom-up traversal, so during interval tree construction we only need to capture the link to the parent interval and the length of the interval's common prefix.
5. The LCP array is also pruned by min_match_length and max_match_length to reduce the size and depth of the interval tree.
6. Additionally, for each position of the input block we capture a link to the leaf interval corresponding to that position, so we can start the bottom-up traversal during the factorization phase.
7. SA, LCP and interval tree construction is done during the input block parsing phase in linear time, with optional multi-threaded optimization using OpenMP.
8. The LZ factorization phase proceeds from left to right: for each position of the input block, the interval tree is traversed bottom-up, reading and updating each visited interval with the latest offset.

## License
The esa-matchfinder is released under the [Apache License Version 2.0](LICENSE "Apache license") and is considered suitable for production use. However, no warranty or fitness for a particular purpose is expressed or implied.

## Changes
* February 6, 2025 (1.2.1)
  * Resolved a strict aliasing violation that resulted in invalid code generation by the Intel compiler.
* December 2, 2023 (1.2.0)
  * Small performance optimization for esa_matchfinder_advance API.
* November 30, 2023 (1.1.0)
  * New API to find matches within specified sliding window.
* June 19, 2022 (1.0.1)
  * Improved cache coherence for ARMv8 architecture.
* June 12, 2022 (1.0.0)
  * Initial public release of the esa-matchfinder.

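
To make the steps in the Algorithm section more concrete, here is a toy sketch of the same idea: sort the suffixes, compute a pruned LCP array, build the interval tree bottom-up with a stack, then factorize by walking from each position's leaf interval towards the root. It is an illustration under simplifying assumptions, not the library's implementation: the suffix array is built with a naive `qsort`, the input is limited to a small fixed size, and every name here (`sketch_match`, `sketch_find_all_matches`, `SKETCH_MAX_N`) is hypothetical. The `offset` field is assumed to mean the absolute position of the earlier occurrence.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_N 256 /* hypothetical input limit for this toy example */

typedef struct { int position, length, offset; } sketch_match;

static const unsigned char * g_text; /* input shared with the qsort comparator */
static int g_n;

/* Length of the common prefix of the suffixes starting at a and b. */
static int common_prefix(int a, int b)
{
    int l = 0;
    while (a + l < g_n && b + l < g_n && g_text[a + l] == g_text[b + l]) { l += 1; }
    return l;
}

/* Lexicographic suffix order; a suffix that is a prefix of another sorts first. */
static int compare_suffixes(const void * pa, const void * pb)
{
    int a = *(const int *)pa, b = *(const int *)pb, l;
    if (a == b) { return 0; }
    l = common_prefix(a, b);
    if (a + l == g_n) { return -1; }
    if (b + l == g_n) { return 1; }
    return g_text[a + l] < g_text[b + l] ? -1 : 1;
}

/* Writes matches to out (caller provides n * (max_len - min_len + 1) slots) and returns their count. */
int sketch_find_all_matches(const unsigned char * text, int n, int min_len, int max_len, sketch_match * out)
{
    static int sa[SKETCH_MAX_N], lcp[SKETCH_MAX_N + 1], leaf[SKETCH_MAX_N], stack[SKETCH_MAX_N + 1];
    static int node_lcp[2 * SKETCH_MAX_N + 1], node_parent[2 * SKETCH_MAX_N + 1], node_last[2 * SKETCH_MAX_N + 1];
    int nodes = 1, top = 0, total = 0;

    assert(n >= 0 && n <= SKETCH_MAX_N && min_len >= 1 && min_len <= max_len);
    g_text = text; g_n = n;

    /* Step 1: suffix array (naive sort here, not linear time). */
    for (int i = 0; i < n; i += 1) { sa[i] = i; }
    qsort(sa, (size_t)n, sizeof(int), compare_suffixes);

    /* Steps 1 and 5: LCP array, pruned to [min_len, max_len]. */
    lcp[0] = 0; lcp[n] = 0;
    for (int r = 1; r < n; r += 1)
    {
        int l = common_prefix(sa[r - 1], sa[r]);
        lcp[r] = l < min_len ? 0 : (l > max_len ? max_len : l);
    }

    /* Steps 2-4 and 6: intervals keep only their LCP and a parent link; node 0 is the root. */
    node_lcp[0] = 0; node_parent[0] = 0; stack[0] = 0;
    for (int r = 0; r < n; r += 1)
    {
        while (node_lcp[stack[top]] > lcp[r]) /* close intervals that end before rank r */
        {
            int child = stack[top--];
            if (node_lcp[stack[top]] < lcp[r]) { node_lcp[nodes] = lcp[r]; stack[++top] = nodes++; }
            node_parent[child] = stack[top];
        }
        if (lcp[r + 1] > node_lcp[stack[top]]) { node_lcp[nodes] = lcp[r + 1]; stack[++top] = nodes++; }
        leaf[sa[r]] = stack[top]; /* deepest interval containing this suffix */
    }
    while (top > 0) { int child = stack[top--]; node_parent[child] = stack[top]; }

    /* Step 8: left-to-right factorization by bottom-up traversal. */
    for (int i = 0; i < nodes; i += 1) { node_last[i] = -1; }
    for (int pos = 0; pos < n; pos += 1)
    {
        int best = -1; /* closest earlier position reported so far */
        for (int node = leaf[pos]; node != 0; node = node_parent[node])
        {
            if (node_last[node] > best) /* shorter match, but strictly closer */
            {
                best = node_last[node];
                out[total].position = pos; out[total].length = node_lcp[node]; out[total].offset = best;
                total += 1;
            }
            node_last[node] = pos; /* update the interval with the latest offset */
        }
    }
    return total;
}
```

For the 9-byte input `"abcabcabc"` with match lengths 2..64, this sketch reports five matches, the first being a match of length 6 at position 3 that points back to offset 0. Unlike the library, which per the notes above cannot report matches with offset 0, the sketch uses `-1` as its "no previous occurrence" marker and so can.
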

## Example of usage (see [esa_matchfinder.h](esa_matchfinder.h) for the complete API list)
```c
#include <stddef.h> /* NULL */

#include "esa_matchfinder.h"

long long multi_pass_optimal_parse(const unsigned char * buffer, int size)
{
    long long total_matches = 0;

    /* Create a match finder for blocks of up to `size` bytes and parse the block once. */
    void * mf = esa_matchfinder_create(size, /*min_match_length*/ 2, /*max_match_length*/ 64);
    if (mf != NULL && esa_matchfinder_parse(mf, buffer, size) == ESA_MATCHFINDER_NO_ERROR)
    {
        /* The parsed block can be scanned several times; each pass starts with a rewind. */
        for (int pass = 0; pass < 2; pass += 1)
        {
            ESA_MATCHFINDER_MATCH matches[ESA_MATCHFINDER_MAX_MATCH_LENGTH];

            esa_matchfinder_rewind(mf, /*position*/ 0);

            for (int position = 0; position < size; position += 1)
            {
                /* The returned pointer minus `matches` is the number of matches at this position. */
                total_matches += esa_matchfinder_find_all_matches(mf, matches) - matches;
            }
        }
    }

    esa_matchfinder_destroy(mf);

    return total_matches;
}
```

---

# Benchmarks #

## Methodology ##
  * Input files were capped at 510MB for tests on the x86-64 architecture and 128MB for tests on the ARMv8 architecture
  * For all match finders the maximum match length was set to 64 bytes (other parameters were not changed)
  * Optimality is defined as the percentage of optimal matches (longest possible length) found across all possible matches
  * The timings are the minimum of five runs measuring single-threaded performance in *optimal* parsing mode

## Specification (x86-64 architecture) ##
  * OS: Microsoft Windows 10 Pro 64-Bit
  * CPU: Intel Core i7-9700K Processor (12M Cache, 5GHz)
  * RAM: 2x8 GB dual-channel DDR4 (4133 MHz, 17-17-17-37)
  * Compiler: Microsoft Visual C++ compiler v14.32
  * Optimizations: /MD /DNDEBUG /O2 /GL /arch:AVX2

### Silesia Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| dickens | 10192446 | **0.715 sec (14.26 MB/s)** | 100.00% | 3.632 sec (2.81 MB/s) | 40.43% | 2.880 sec (3.54 MB/s) | 99.84% |
| mozilla | 51220480 | **3.533 sec (14.50 MB/s)** | 100.00% | 12.364 sec (4.14 MB/s) | 55.98% | 8.546 sec (5.99 MB/s) | 80.08% |
| mr | 9970564 | **0.865 sec (11.53 MB/s)** | 100.00% | 1.680 sec (5.93 MB/s) | 56.13% | 1.688 sec (5.91 MB/s) | 94.35% |
| nci | 33553445 | **3.626 sec (9.25 MB/s)** | 100.00% | 6.155 sec (5.45 MB/s) | 16.25% | 5.957 sec (5.63 MB/s) | 61.62% |
| ooffice | 6152192 | **0.324 sec (18.99 MB/s)** | 100.00% | 1.120 sec (5.49 MB/s) | 70.35% | 0.780 sec (7.89 MB/s) | 84.30% |
| osdb | 10085684 | **0.599 sec (16.84 MB/s)** | 100.00% | 2.191 sec (4.60 MB/s) | 35.58% | 1.716 sec (5.88 MB/s) | 85.83% |
| reymont | 6627202 | **0.478 sec (13.86 MB/s)** | 100.00% | 1.644 sec (4.03 MB/s) | 27.64% | 1.586 sec (4.18 MB/s) | 99.43% |
| samba | 21606400 | **1.484 sec (14.56 MB/s)** | 100.00% | 4.093 sec (5.28 MB/s) | 63.92% | 2.932 sec (7.37 MB/s) | 85.49% |
| sao | 7251944 | **0.405 sec (17.91 MB/s)** | 100.00% | 1.345 sec (5.39 MB/s) | 34.99% | 0.997 sec (7.27 MB/s) | 51.66% |
| webster | 41458703 | **4.388 sec (9.45 MB/s)** | 100.00% | 15.712 sec (2.64 MB/s) | 38.14% | 12.630 sec (3.28 MB/s) | 95.90% |
| x-ray | 8474240 | **0.412 sec (20.57 MB/s)** | 100.00% | 1.751 sec (4.84 MB/s) | 90.99% | 1.081 sec (7.84 MB/s) | 95.25% |
| xml | 5345280 | **0.278 sec (19.23 MB/s)** | 100.00% | 0.777 sec (6.88 MB/s) | 57.68% | 0.667 sec (8.01 MB/s) | 98.50% |

### Large Canterbury Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| bible.txt | 4047392 | **0.256 sec (15.81 MB/s)** | 100.00% | 1.021 sec (3.96 MB/s) | 50.12% | 0.806 sec (5.02 MB/s) | 99.83% |
| E.coli | 4638690 | **0.327 sec (14.19 MB/s)** | 100.00% | 0.817 sec (5.68 MB/s) | 2.70% | 1.603 sec (2.89 MB/s) | 99.95% |
| world192.txt | 2473400 | **0.138 sec (17.92 MB/s)** | 100.00% | 0.525 sec (4.71 MB/s) | 63.90% | 0.363 sec (6.81 MB/s) | 99.10% |

### Manzini Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| chr22.dna | 34553758 | **4.651 sec (7.43 MB/s)** | 100.00% | 6.019 sec (5.74 MB/s) | 4.67% | 18.142 sec (1.90 MB/s) | 96.22% |
| etext99 | 105277340 | **12.065 sec (8.73 MB/s)** | 100.00% | 46.457 sec (2.27 MB/s) | 24.64% | 46.607 sec (2.26 MB/s) | 98.62% |
| gcc-3.0.tar | 86630400 | **8.230 sec (10.53 MB/s)** | 100.00% | 23.390 sec (3.70 MB/s) | 59.21% | 17.811 sec (4.86 MB/s) | 94.33% |
| howto | 39422105 | **3.259 sec (12.10 MB/s)** | 100.00% | 14.254 sec (2.77 MB/s) | 45.33% | 10.613 sec (3.71 MB/s) | 95.05% |
| jdk13c | 69728899 | **6.522 sec (10.69 MB/s)** | 100.00% | 11.797 sec (5.91 MB/s) | 61.10% | 10.246 sec (6.81 MB/s) | 92.01% |
| linux-2.4.5.tar | 116254720 | **11.075 sec (10.50 MB/s)** | 100.00% | 36.868 sec (3.15 MB/s) | 57.33% | 27.014 sec (4.30 MB/s) | 93.13% |
| rctail96 | 114711151 | **10.375 sec (11.06 MB/s)** | 100.00% | 31.860 sec (3.60 MB/s) | 54.54% | 26.365 sec (4.35 MB/s) | 98.46% |
| rfc | 116421901 | **14.677 sec (7.93 MB/s)** | 100.00% | 37.246 sec (3.13 MB/s) | 35.41% | 33.363 sec (3.49 MB/s) | 87.53% |
| sprot34.dat | 109617186 | **11.628 sec (9.43 MB/s)** | 100.00% | 53.546 sec (2.05 MB/s) | 59.60% | 28.559 sec (3.84 MB/s) | 93.73% |
| w3c2 | 104201579 | **9.480 sec (10.99 MB/s)** | 100.00% | 19.914 sec (5.23 MB/s) | 64.99% | 15.080 sec (6.91 MB/s) | 93.09% |

### Large Text Compression Benchmark Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| enwik8 | 100000000 | **10.241 sec (9.76 MB/s)** | 100.00% | 48.950 sec (2.04 MB/s) | 33.41% | 40.285 sec (2.48 MB/s) | 96.56% |
| enwik9 | 534773760 | **81.713 sec (6.54 MB/s)** | 100.00% | 277.775 sec (1.93 MB/s) | 24.76% | 257.795 sec (2.07 MB/s) | 91.65% |

### The Gauntlet Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| abac | 200000 | 0.013 sec (15.38 MB/s) | 100.00% | **0.006 sec (33.33 MB/s)** | 100.00% | 0.008 sec (25.00 MB/s) | 100.00% |
| abba | 10500596 | **1.045 sec (10.05 MB/s)** | 100.00% | 2.926 sec (3.59 MB/s) | 0.03% | 2.789 sec (3.77 MB/s) | 97.64% |
| book1x20 | 15375420 | **0.952 sec (16.15 MB/s)** | 100.00% | 4.008 sec (3.84 MB/s) | 30.82% | 2.912 sec (5.28 MB/s) | 99.87% |
| fib_s14930352 | 14930352 | **0.861 sec (17.34 MB/s)** | 100.00% | 1.844 sec (8.10 MB/s) | 96.99% | 1.057 sec (14.13 MB/s) | 100.00% |
| fss10 | 12078908 | **0.660 sec (18.30 MB/s)** | 100.00% | 1.682 sec (7.18 MB/s) | 95.65% | 0.989 sec (12.21 MB/s) | 100.00% |
| fss9 | 2851443 | **0.128 sec (22.28 MB/s)** | 100.00% | 0.397 sec (7.18 MB/s) | 95.65% | 0.233 sec (12.24 MB/s) | 100.00% |
| houston | 3839141 | 0.257 sec (14.94 MB/s) | 100.00% | **0.141 sec (27.23 MB/s)** | 97.64% | 0.173 sec (22.19 MB/s) | 100.00% |
| paper5x80 | 956322 | **0.036 sec (26.56 MB/s)** | 100.00% | 0.073 sec (13.10 MB/s) | 94.59% | 0.071 sec (13.47 MB/s) | 99.94% |
| test1 | 2097152 | 0.080 sec (26.21 MB/s) | 100.00% | **0.070 sec (29.96 MB/s)** | 100.00% | 0.101 sec (20.76 MB/s) | 100.00% |
| test2 | 2097152 | 0.080 sec (26.21 MB/s) | 100.00% | **0.070 sec (29.96 MB/s)** | 100.00% | 0.101 sec (20.76 MB/s) | 100.00% |
| test3 | 2097088 | 0.077 sec (27.23 MB/s) | 100.00% | **0.071 sec (29.54 MB/s)** | 100.00% | 0.086 sec (24.38 MB/s) | 100.00% |

### Pizza & Chilli Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| dblp.xml | 296135874 | **39.566 sec (7.48 MB/s)** | 100.00% | 100.764 sec (2.94 MB/s) | 46.52% | 74.943 sec (3.95 MB/s) | 93.38% |
| dna | 403927746 | 98.964 sec (4.08 MB/s) | 100.00% | **72.058 sec (5.61 MB/s)** | 0.51% | 314.605 sec (1.28 MB/s) | 63.97% |
| english.1024MB | 534773760 | **94.228 sec (5.68 MB/s)** | 100.00% | 266.114 sec (2.01 MB/s) | 15.29% | 309.063 sec (1.73 MB/s) | 95.59% |
| pitches | 55832855 | **3.698 sec (15.10 MB/s)** | 100.00% | 16.901 sec (3.30 MB/s) | 79.95% | 10.341 sec (5.40 MB/s) | 95.11% |
| proteins | 534773760 | **69.993 sec (7.64 MB/s)** | 100.00% | 463.257 sec (1.15 MB/s) | 43.66% | 320.663 sec (1.67 MB/s) | 99.85% |
| sources | 210866607 | **22.116 sec (9.53 MB/s)** | 100.00% | 73.429 sec (2.87 MB/s) | 54.69% | 56.835 sec (3.71 MB/s) | 93.25% |

### Pizza & Chilli Repetitive Corpus (x86-64 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| cere | 461286644 | **68.413 sec (6.74 MB/s)** | 100.00% | 77.037 sec (5.99 MB/s) | 7.30% | 266.363 sec (1.73 MB/s) | 94.04% |
| coreutils | 205281778 | **18.967 sec (10.82 MB/s)** | 100.00% | 54.280 sec (3.78 MB/s) | 34.14% | 40.171 sec (5.11 MB/s) | 87.59% |
| einstein.de.txt | 92758441 | **6.620 sec (14.01 MB/s)** | 100.00% | 10.236 sec (9.06 MB/s) | 80.28% | 9.088 sec (10.21 MB/s) | 96.82% |
| einstein.en.txt | 467626544 | **39.117 sec (11.95 MB/s)** | 100.00% | 56.046 sec (8.34 MB/s) | 77.54% | 48.619 sec (9.62 MB/s) | 97.64% |
| Escherichia_Coli | 112689515 | **16.191 sec (6.96 MB/s)** | 100.00% | 19.889 sec (5.67 MB/s) | 0.26% | 63.204 sec (1.78 MB/s) | 96.93% |
| influenza | 154808555 | 44.042 sec (3.52 MB/s) | 100.00% | **25.125 sec (6.16 MB/s)** | 28.19% | 27.703 sec (5.59 MB/s) | 98.49% |
| kernel | 257961616 | **21.931 sec (11.76 MB/s)** | 100.00% | 81.326 sec (3.17 MB/s) | 24.72% | 59.968 sec (4.30 MB/s) | 93.18% |
| para | 429265758 | **69.888 sec (6.14 MB/s)** | 100.00% | 73.130 sec (5.87 MB/s) | 3.94% | 251.836 sec (1.70 MB/s) | 93.09% |
| world_leaders | 46968181 | **5.758 sec (8.16 MB/s)** | 100.00% | 6.124 sec (7.67 MB/s) | 40.96% | 6.126 sec (7.67 MB/s) | 62.59% |
| dblp.xml.00001.1 | 104857600 | **12.065 sec (8.69 MB/s)** | 100.00% | 18.851 sec (5.56 MB/s) | 32.29% | 16.655 sec (6.30 MB/s) | 96.79% |
| dblp.xml.00001.2 | 104857600 | **12.765 sec (8.21 MB/s)** | 100.00% | 18.972 sec (5.53 MB/s) | 32.01% | 16.767 sec (6.25 MB/s) | 96.99% |
| dblp.xml.0001.1 | 104857600 | **13.064 sec (8.03 MB/s)** | 100.00% | 18.952 sec (5.53 MB/s) | 32.36% | 16.712 sec (6.27 MB/s) | 96.76% |
| dblp.xml.0001.2 | 104857600 | **15.651 sec (6.70 MB/s)** | 100.00% | 19.258 sec (5.44 MB/s) | 30.31% | 17.409 sec (6.02 MB/s) | 96.99% |
| dna.001.1 | 104857600 | **16.519 sec (6.35 MB/s)** | 100.00% | 18.601 sec (5.64 MB/s) | 0.23% | 33.080 sec (3.17 MB/s) | 99.49% |
| english.001.2 | 104857600 | **11.087 sec (9.46 MB/s)** | 100.00% | 29.621 sec (3.54 MB/s) | 36.64% | 21.313 sec (4.92 MB/s) | 99.32% |
| proteins.001.1 | 104857600 | 12.562 sec (8.35 MB/s) | 100.00% | 24.271 sec (4.32 MB/s) | 89.70% | **12.411 sec (8.45 MB/s)** | 99.99% |
| sources.001.2 | 104857600 | **10.484 sec (10.00 MB/s)** | 100.00% | 20.817 sec (5.04 MB/s) | 43.87% | 16.775 sec (6.25 MB/s) | 97.24% |
| fib41 | 267914296 | 19.852 sec (13.50 MB/s) | 100.00% | 33.175 sec (8.08 MB/s) | 96.99% | **19.070 sec (14.05 MB/s)** | 100.00% |
| rs.13 | 216747218 | **16.024 sec (13.53 MB/s)** | 100.00% | 30.276 sec (7.16 MB/s) | 95.65% | 17.824 sec (12.16 MB/s) | 100.00% |
| tm29 | 268435456 | **19.926 sec (13.47 MB/s)** | 100.00% | 51.541 sec (5.21 MB/s) | 89.58% | 26.480 sec (10.14 MB/s) | 100.00% |

## Specification (ARMv8 architecture) ##
  * OS: Ubuntu 20.04 LTS 64-Bit
  * CPU: ODROID-N2+ Amlogic S922X (Cortex-A73 2.4Ghz)
  * RAM: 4GB LPDDR4 (2666 MHz)
  * Compiler: Clang v10.0.0
  * Optimizations: -DNDEBUG -O3 -flto=thin -mcpu=native

### Silesia Corpus (ARMv8 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| dickens | 10192446 | **4.595 sec (2.22 MB/s)** | 100.00% | 18.955 sec (0.54 MB/s) | 40.43% | 9.678 sec (1.05 MB/s) | 99.84% |
| mozilla | 51220480 | **18.435 sec (2.78 MB/s)** | 100.00% | 43.410 sec (1.18 MB/s) | 55.98% | 24.890 sec (2.06 MB/s) | 80.08% |
| mr | 9970564 | **4.070 sec (2.45 MB/s)** | 100.00% | 7.299 sec (1.37 MB/s) | 56.13% | 5.935 sec (1.68 MB/s) | 94.35% |
| nci | 33553445 | 18.982 sec (1.77 MB/s) | 100.00% | 18.183 sec (1.85 MB/s) | 16.25% | **16.916 sec (1.98 MB/s)** | 61.62% |
| ooffice | 6152192 | **1.884 sec (3.27 MB/s)** | 100.00% | 5.239 sec (1.17 MB/s) | 70.35% | 2.779 sec (2.21 MB/s) | 84.30% |
| osdb | 10085684 | **4.090 sec (2.47 MB/s)** | 100.00% | 9.079 sec (1.11 MB/s) | 35.58% | 5.522 sec (1.83 MB/s) | 85.83% |
| reymont | 6627202 | **3.094 sec (2.14 MB/s)** | 100.00% | 6.582 sec (1.01 MB/s) | 27.64% | 5.023 sec (1.32 MB/s) | 99.43% |
| samba | 21606400 | **8.063 sec (2.68 MB/s)** | 100.00% | 16.016 sec (1.35 MB/s) | 63.92% | 8.947 sec (2.42 MB/s) | 85.49% |
| sao | 7251944 | **2.553 sec (2.84 MB/s)** | 100.00% | 6.461 sec (1.12 MB/s) | 34.99% | 3.753 sec (1.93 MB/s) | 51.66% |
| webster | 41458703 | **23.900 sec (1.73 MB/s)** | 100.00% | 67.306 sec (0.62 MB/s) | 38.14% | 36.756 sec (1.13 MB/s) | 95.90% |
| x-ray | 8474240 | **2.221 sec (3.82 MB/s)** | 100.00% | 7.558 sec (1.12 MB/s) | 90.99% | 3.921 sec (2.16 MB/s) | 95.25% |
| xml | 5345280 | **1.590 sec (3.36 MB/s)** | 100.00% | 2.380 sec (2.25 MB/s) | 57.68% | 1.910 sec (2.80 MB/s) | 98.50% |

### Large Canterbury Corpus (ARMv8 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| bible.txt | 4047392 | **1.619 sec (2.50 MB/s)** | 100.00% | 4.741 sec (0.85 MB/s) | 50.12% | 2.688 sec (1.51 MB/s) | 99.83% |
| E.coli | 4638690 | 2.068 sec (2.24 MB/s) | 100.00% | **1.988 sec (2.33 MB/s)** | 2.70% | 5.537 sec (0.84 MB/s) | 99.95% |
| world192.txt | 2473400 | **0.914 sec (2.71 MB/s)** | 100.00% | 2.583 sec (0.96 MB/s) | 63.90% | 1.243 sec (1.99 MB/s) | 99.10% |

### Manzini Corpus (ARMv8 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| chr22.dna | 34553758 | 24.388 sec (1.42 MB/s) | 100.00% | **14.290 sec (2.42 MB/s)** | 4.67% | 54.272 sec (0.64 MB/s) | 96.22% |
| etext99 | 105277340 | **66.408 sec (1.59 MB/s)** | 100.00% | 216.948 sec (0.49 MB/s) | 24.64% | 133.282 sec (0.79 MB/s) | 98.62% |
| gcc-3.0.tar | 86630400 | **41.198 sec (2.10 MB/s)** | 100.00% | 89.721 sec (0.97 MB/s) | 59.21% | 51.345 sec (1.69 MB/s) | 94.33% |
| howto | 39422105 | **18.222 sec (2.16 MB/s)** | 100.00% | 61.739 sec (0.64 MB/s) | 45.33% | 31.992 sec (1.23 MB/s) | 95.05% |
| jdk13c | 69728899 | 32.194 sec (2.17 MB/s) | 100.00% | 39.770 sec (1.75 MB/s) | 61.10% | **30.121 sec (2.31 MB/s)** | 92.01% |
| linux-2.4.5.tar | 116254720 | **55.748 sec (2.09 MB/s)** | 100.00% | 135.994 sec (0.85 MB/s) | 57.33% | 75.110 sec (1.55 MB/s) | 93.13% |
| rctail96 | 114711151 | **55.181 sec (2.08 MB/s)** | 100.00% | 133.541 sec (0.86 MB/s) | 54.54% | 80.031 sec (1.43 MB/s) | 98.46% |
| rfc | 116421901 | **73.199 sec (1.59 MB/s)** | 100.00% | 149.776 sec (0.78 MB/s) | 35.41% | 96.395 sec (1.21 MB/s) | 87.53% |
| sprot34.dat | 109617186 | **58.204 sec (1.88 MB/s)** | 100.00% | 179.813 sec (0.61 MB/s) | 59.60% | 80.105 sec (1.37 MB/s) | 93.73% |
| w3c2 | 104201579 | 49.571 sec (2.10 MB/s) | 100.00% | 70.244 sec (1.48 MB/s) | 64.99% | **45.104 sec (2.31 MB/s)** | 93.09% |

### Large Text Compression Benchmark Corpus (ARMv8 architecture) ###

| file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality |
|:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|
| enwik8 | 100000000 | **53.981 sec (1.85 MB/s)** | 100.00% | 206.851 sec (0.48 MB/s) | 33.41% | 112.780 sec (0.89 MB/s) | 96.56% |

###
The Gauntlet Corpus (ARMv8 architecture) ### 243 | 244 | | file | size | esa-matchfinder v1.0.0 | optimality | LZMA HC4 v21.07 | optimality | LZMA BT4 v21.07 | optimality | 245 | |:----:|:----:|:----------------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:| 246 | | abac | 200000 | 0.045 sec (4.40 MB/s) | 100.00% | **0.023 sec (8.78 MB/s)** | 100.00% | 0.046 sec (4.31 MB/s) | 100.00% | 247 | | abba | 10500596 | 7.005 sec (1.50 MB/s) | 100.00% | **6.204 sec (1.69 MB/s)** | 0.03% | 7.848 sec (1.34 MB/s) | 97.64% | 248 | | book1x20 | 15375420 | **6.223 sec (2.47 MB/s)** | 100.00% | 25.273 sec (0.61 MB/s) | 30.82% | 12.392 sec (1.24 MB/s) | 99.87% | 249 | | fib_s14930352 | 14930352 | **3.682 sec (4.06 MB/s)** | 100.00% | 6.112 sec (2.44 MB/s) | 96.99% | 4.503 sec (3.32 MB/s) | 100.00% | 250 | | fss10 | 12078908 | **3.123 sec (3.87 MB/s)** | 100.00% | 5.428 sec (2.23 MB/s) | 95.65% | 3.771 sec (3.20 MB/s) | 100.00% | 251 | | fss9 | 2851443 | **0.757 sec (3.77 MB/s)** | 100.00% | 1.277 sec (2.23 MB/s) | 95.65% | 0.891 sec (3.20 MB/s) | 100.00% | 252 | | houston | 3839141 | 0.788 sec (4.87 MB/s) | 100.00% | **0.502 sec (7.65 MB/s)** | 97.64% | 0.815 sec (4.71 MB/s) | 100.00% | 253 | | paper5x80 | 956322 | **0.179 sec (5.35 MB/s)** | 100.00% | 0.192 sec (4.97 MB/s) | 94.59% | 0.217 sec (4.41 MB/s) | 99.94% | 254 | | test1 | 2097152 | 0.481 sec (4.36 MB/s) | 100.00% | **0.245 sec (8.55 MB/s)** | 100.00% | 0.494 sec (4.24 MB/s) | 100.00% | 255 | | test2 | 2097152 | 0.479 sec (4.38 MB/s) | 100.00% | **0.246 sec (8.54 MB/s)** | 100.00% | 0.494 sec (4.24 MB/s) | 100.00% | 256 | | test3 | 2097088 | **0.300 sec (6.98 MB/s)** | 100.00% | 0.461 sec (4.55 MB/s) | 100.00% | 0.506 sec (4.15 MB/s) | 100.00% | 257 | -------------------------------------------------------------------------------- /THIRD-PARTY-NOTICES: -------------------------------------------------------------------------------- 1 | The esa-matchfinder uses third-party libraries or 
other resources that may 2 | be distributed under licenses different than the esa-matchfinder software. 3 | 4 | The attached notices are provided for information only. 5 | 6 | License notice for 'libsais' library 7 | ------------------------------------ 8 | 9 | Copyright (c) 2021-2022 Ilya Grebnov 10 | 11 | Licensed under the Apache License, Version 2.0 (the "License"); 12 | you may not use this file except in compliance with the License. 13 | You may obtain a copy of the License at 14 | 15 | http://www.apache.org/licenses/LICENSE-2.0 16 | 17 | Unless required by applicable law or agreed to in writing, software 18 | distributed under the License is distributed on an "AS IS" BASIS, 19 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 20 | See the License for the specific language governing permissions and 21 | limitations under the License. 22 | -------------------------------------------------------------------------------- /VERSION: -------------------------------------------------------------------------------- 1 | 1.2.1 -------------------------------------------------------------------------------- /esa_matchfinder.c: -------------------------------------------------------------------------------- 1 | /*-- 2 | 3 | This file is a part of esa-matchfinder, a library for efficient 4 | Lempel-Ziv factorization using enhanced suffix array (ESA). 5 | 6 | Copyright (c) 2022-2025 Ilya Grebnov 7 | 8 | Licensed under the Apache License, Version 2.0 (the "License"); 9 | you may not use this file except in compliance with the License. 10 | You may obtain a copy of the License at 11 | 12 | http://www.apache.org/licenses/LICENSE-2.0 13 | 14 | Unless required by applicable law or agreed to in writing, software 15 | distributed under the License is distributed on an "AS IS" BASIS, 16 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and
limitations under the License.

Please see the file LICENSE for full copyright and license details.

--*/

// This file uses the libsais library for linear time suffix array (SA)
// and permuted longest common prefix array (PLCP) construction.
//
// See https://github.com/IlyaGrebnov/libsais for more information.
//
// The libsais library is released under Apache License 2.0.
// Copyright (c) 2021-2022 Ilya Grebnov
//

#include "esa_matchfinder.h"
#include "libsais/libsais.h"

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#if defined(ESA_MATCHFINDER_OPENMP)
    #include <omp.h>

    #define ESA_MF_NUM_THREADS_MAX      (256)
#else
    #define ESA_MF_UNUSED(_x)           (void)(_x)
    #define ESA_MF_NUM_THREADS_MAX      (1)
#endif

#define ESA_MF_TOTAL_BITS               (64)

#define ESA_MF_LCP_BITS                 (ESA_MATCHFINDER_MATCH_BITS)
#define ESA_MF_LCP_MAX                  (((uint64_t)1 << ESA_MF_LCP_BITS) - 1)
#define ESA_MF_LCP_SHIFT                (ESA_MF_TOTAL_BITS - ESA_MF_LCP_BITS)
#define ESA_MF_LCP_MASK                 (ESA_MF_LCP_MAX << ESA_MF_LCP_SHIFT)

#define ESA_MF_OFFSET_BITS              (ESA_MF_LCP_SHIFT / 2)
#define ESA_MF_OFFSET_MAX               (((uint64_t)1 << ESA_MF_OFFSET_BITS) - 1)
#define ESA_MF_OFFSET_SHIFT             (ESA_MF_TOTAL_BITS - ESA_MF_LCP_BITS - ESA_MF_OFFSET_BITS)
#define ESA_MF_OFFSET_MASK              (ESA_MF_OFFSET_MAX << ESA_MF_OFFSET_SHIFT)

#define ESA_MF_PARENT_BITS              (ESA_MF_OFFSET_SHIFT)
#define ESA_MF_PARENT_MAX               (((uint64_t)1 << ESA_MF_PARENT_BITS) - 1)
#define ESA_MF_PARENT_SHIFT             (ESA_MF_TOTAL_BITS - ESA_MF_LCP_BITS - ESA_MF_OFFSET_BITS - ESA_MF_PARENT_BITS)
#define ESA_MF_PARENT_MASK              (ESA_MF_PARENT_MAX << ESA_MF_PARENT_SHIFT)

#define ESA_MF_STORAGE_PADDING          (64)

#if defined(__clang__)
    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wunreachable-code"
    #pragma clang diagnostic ignored "-Wstrict-aliasing"
    #pragma clang diagnostic ignored "-Wuninitialized"
#elif defined(__GNUC__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunreachable-code"
    #pragma GCC diagnostic ignored "-Wstrict-aliasing"
    #pragma GCC diagnostic ignored "-Wuninitialized"
#elif defined(_MSC_VER)
    #pragma warning(push)
    #pragma warning(disable: 4127)
    #pragma warning(disable: 4820)
#endif

typedef struct ESA_MF_THREAD_STATE
{
    ptrdiff_t               interval_tree_start;
    ptrdiff_t               interval_tree_end;
} ESA_MF_THREAD_STATE;

typedef struct ESA_MF_CONTEXT
{
    uint64_t                prefetch[4][8];
    uint64_t                position;

    uint64_t *              sa_parent_link;
    uint32_t *              plcp_leaf_link;
    uint64_t                min_match_length_minus_1;

    int32_t *               esa_storage;
    void *                  libsais_ctx;

    int32_t                 block_size;
    int32_t                 max_block_size;
    int32_t                 min_match_length;
    int32_t                 max_match_length;
    int32_t                 num_threads;

    ESA_MF_THREAD_STATE     threads[ESA_MF_NUM_THREADS_MAX];
} ESA_MF_CONTEXT;

#if defined(__GNUC__) || defined(__clang__)
    #define ESA_MF_RESTRICT __restrict__
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
    #define ESA_MF_RESTRICT __restrict
#else
    #error Your compiler, configuration or platform is not supported.
#endif

#if defined(__has_builtin)
    #if __has_builtin(__builtin_prefetch)
        #define ESA_MF_HAS_BUILTIN_PREFECTCH
    #endif
#elif defined(__GNUC__) && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 2)) || (__GNUC__ >= 4))
    #define ESA_MF_HAS_BUILTIN_PREFECTCH
#endif

#if defined(ESA_MF_HAS_BUILTIN_PREFECTCH)
    #define esa_matchfinder_prefetchr(address) __builtin_prefetch((const void *)(address), 0, 3)
    #define esa_matchfinder_prefetchw(address) __builtin_prefetch((const void *)(address), 1, 3)
#elif defined (_M_IX86) || defined (_M_AMD64)
    #include <intrin.h>
    #define esa_matchfinder_prefetchr(address) _mm_prefetch((const void *)(address), _MM_HINT_T0)
    #define esa_matchfinder_prefetchw(address) _m_prefetchw((const void *)(address))
#elif defined (_M_ARM)
    #include <intrin.h>
    #define esa_matchfinder_prefetchr(address) __prefetch((const void *)(address))
    #define esa_matchfinder_prefetchw(address) __prefetchw((const void *)(address))
#elif defined (_M_ARM64)
    #include <intrin.h>
    #define esa_matchfinder_prefetchr(address) __prefetch2((const void *)(address), 0)
    #define esa_matchfinder_prefetchw(address) __prefetch2((const void *)(address), 16)
#else
    #error Your compiler, configuration or platform is not supported.
#endif

#if !defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
    #if defined(_LITTLE_ENDIAN) \
            || (defined(BYTE_ORDER) && defined(LITTLE_ENDIAN) && BYTE_ORDER == LITTLE_ENDIAN) \
            || (defined(_BYTE_ORDER) && defined(_LITTLE_ENDIAN) && _BYTE_ORDER == _LITTLE_ENDIAN) \
            || (defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN) \
            || (defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        #define __LITTLE_ENDIAN__
    #elif defined(_BIG_ENDIAN) \
            || (defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) \
            || (defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && _BYTE_ORDER == _BIG_ENDIAN) \
            || (defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) \
            || (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        #define __BIG_ENDIAN__
    #elif defined(_WIN32)
        #define __LITTLE_ENDIAN__
    #else
        #error Your compiler, configuration or platform is not supported.
    #endif
#endif

static void * esa_matchfinder_align_up(const void * address, size_t alignment)
{
    return (void *)((((ptrdiff_t)address) + ((ptrdiff_t)alignment) - 1) & (-((ptrdiff_t)alignment)));
}

static void * esa_matchfinder_alloc_aligned(size_t size, size_t alignment)
{
    void * address = malloc(size + sizeof(short) + alignment - 1);
    if (address != NULL)
    {
        void * aligned_address = esa_matchfinder_align_up((void *)((ptrdiff_t)address + (ptrdiff_t)(sizeof(short))), alignment);
        ((short *)aligned_address)[-1] = (short)((ptrdiff_t)aligned_address - (ptrdiff_t)address);

        return aligned_address;
    }

    return NULL;
}

static void esa_matchfinder_free_aligned(void * aligned_address)
{
    if (aligned_address != NULL)
    {
        free((void *)((ptrdiff_t)aligned_address - ((short *)aligned_address)[-1]));
    }
}

static void esa_matchfinder_set_position(ESA_MF_CONTEXT * matchfinder_ctx, uint64_t position)
{
    matchfinder_ctx->position = position;
    memset(matchfinder_ctx->prefetch, 0, sizeof(matchfinder_ctx->prefetch));
}

static ESA_MF_CONTEXT * esa_matchfinder_alloc_ctx(int32_t max_block_size, int32_t min_match_length, int32_t max_match_length, int32_t num_threads)
{
    num_threads     = num_threads < ESA_MF_NUM_THREADS_MAX ? num_threads : ESA_MF_NUM_THREADS_MAX;
    max_block_size  = (max_block_size + ESA_MF_STORAGE_PADDING - 1) & (-ESA_MF_STORAGE_PADDING);

    ESA_MF_CONTEXT * matchfinder_ctx = (ESA_MF_CONTEXT *)esa_matchfinder_alloc_aligned(sizeof(ESA_MF_CONTEXT), ESA_MF_STORAGE_PADDING);
    int32_t *        esa_storage     = (int32_t *)esa_matchfinder_alloc_aligned((2 * ESA_MF_STORAGE_PADDING + 3 * (size_t)max_block_size) * sizeof(int32_t), ESA_MF_STORAGE_PADDING);

#if defined(ESA_MATCHFINDER_OPENMP) && defined(LIBSAIS_OPENMP)
    void *           libsais_ctx     = libsais_create_ctx_omp(num_threads);
#else
    void *           libsais_ctx     = libsais_create_ctx();
#endif

    if (matchfinder_ctx != NULL && esa_storage != NULL && libsais_ctx != NULL)
    {
        matchfinder_ctx->esa_storage                = esa_storage;
        matchfinder_ctx->libsais_ctx                = libsais_ctx;

        matchfinder_ctx->block_size                 = -1;
        matchfinder_ctx->max_block_size             = max_block_size;
        matchfinder_ctx->min_match_length           = min_match_length;
        matchfinder_ctx->max_match_length           = max_match_length;
        matchfinder_ctx->num_threads                = num_threads;

        matchfinder_ctx->sa_parent_link             = (uint64_t *)(void *)(matchfinder_ctx->esa_storage + ESA_MF_STORAGE_PADDING) + 0 * matchfinder_ctx->max_block_size;
        matchfinder_ctx->plcp_leaf_link             = (uint32_t *)(void *)(matchfinder_ctx->esa_storage + ESA_MF_STORAGE_PADDING) + 2 * matchfinder_ctx->max_block_size;
        matchfinder_ctx->min_match_length_minus_1   = (uint64_t)matchfinder_ctx->min_match_length - 1;

        esa_matchfinder_set_position(matchfinder_ctx, (uint64_t)-1);

        return matchfinder_ctx;
    }

    libsais_free_ctx(libsais_ctx);

    esa_matchfinder_free_aligned(esa_storage);
    esa_matchfinder_free_aligned(matchfinder_ctx);

    return NULL;
}

static void esa_matchfinder_free_ctx(ESA_MF_CONTEXT * matchfinder_ctx)
{
    if (matchfinder_ctx != NULL)
    {
        libsais_free_ctx(matchfinder_ctx->libsais_ctx);

        esa_matchfinder_free_aligned(matchfinder_ctx->esa_storage);
        esa_matchfinder_free_aligned(matchfinder_ctx);
    }
}

static void esa_matchfinder_convert_32u_to_64u(uint32_t * ESA_MF_RESTRICT S, uint64_t * ESA_MF_RESTRICT D, ptrdiff_t omp_block_start, ptrdiff_t omp_block_size)
{
    ptrdiff_t i, j;
    for (i = omp_block_start, j = omp_block_start + omp_block_size; i < j; i += 1)
    {
        D[i] = (uint64_t)S[i];
    }
}

static void esa_matchfinder_convert_inplace_32u_to_64u(uint32_t * ESA_MF_RESTRICT V, ptrdiff_t omp_block_start, ptrdiff_t omp_block_size)
{
    ptrdiff_t i, j;
    for (i = omp_block_start + omp_block_size - 1, j = omp_block_start; i >= j; i -= 1)
    {
#if defined(__LITTLE_ENDIAN__)
        V[i + i + 0] = V[i]; V[i + i + 1] = 0;
#else
        V[i + i + 0] = 0; V[i + i + 1] = V[i];
#endif
    }
}

static void esa_matchfinder_convert_inplace_32u_to_64u_omp(uint32_t * ESA_MF_RESTRICT V, ptrdiff_t n, ptrdiff_t num_threads)
{
    while (n >= 65536)
    {
        ptrdiff_t block_size = n >> 1; n -= block_size;

#if defined(ESA_MATCHFINDER_OPENMP)
        #pragma omp parallel num_threads(num_threads) if(num_threads > 1)
#endif
        {
#if defined(ESA_MATCHFINDER_OPENMP)
            ptrdiff_t omp_thread_num    = omp_get_thread_num();
            ptrdiff_t omp_num_threads   = omp_get_num_threads();
#else
            ESA_MF_UNUSED(num_threads);

            ptrdiff_t omp_thread_num    = 0;
            ptrdiff_t omp_num_threads   = 1;
#endif
            ptrdiff_t omp_block_stride  = (block_size / omp_num_threads) & (-16);
            ptrdiff_t omp_block_start   = omp_thread_num * omp_block_stride;
            ptrdiff_t omp_block_size    = omp_thread_num < omp_num_threads - 1 ? omp_block_stride : block_size - omp_block_start;

            esa_matchfinder_convert_32u_to_64u(((uint32_t *)(void *)V) + n, ((uint64_t *)(void *)V) + n, omp_block_start, omp_block_size);
        }
    }

    esa_matchfinder_convert_inplace_32u_to_64u(V, 0, n);
}

static void esa_matchfinder_reset_interval_tree(uint64_t * ESA_MF_RESTRICT sa_parent_link, ptrdiff_t omp_block_start, ptrdiff_t omp_block_size)
{
    ptrdiff_t i, j; for (i = omp_block_start, j = omp_block_start + omp_block_size; i < j; i += 1) { sa_parent_link[i] &= (~ESA_MF_OFFSET_MASK); }
}

static void esa_matchfinder_reset_interval_tree_omp(uint64_t * ESA_MF_RESTRICT sa_parent_link, ptrdiff_t n, ptrdiff_t num_threads)
{
#if defined(ESA_MATCHFINDER_OPENMP)
    #pragma omp parallel num_threads(num_threads) if(num_threads > 1 && n >= 65536)
#endif
    {
#if defined(ESA_MATCHFINDER_OPENMP)
        ptrdiff_t omp_thread_num    = omp_get_thread_num();
        ptrdiff_t omp_num_threads   = omp_get_num_threads();
#else
        ESA_MF_UNUSED(num_threads);

        ptrdiff_t omp_thread_num    = 0;
        ptrdiff_t omp_num_threads   = 1;
#endif
        ptrdiff_t omp_block_stride  = (n / omp_num_threads) & (-16);
        ptrdiff_t omp_block_start   = omp_thread_num * omp_block_stride;
        ptrdiff_t omp_block_size    = omp_thread_num < omp_num_threads - 1 ? omp_block_stride : n - omp_block_start;

        esa_matchfinder_reset_interval_tree(sa_parent_link, omp_block_start, omp_block_size);
    }
}

static void esa_matchfinder_fast_forward
(
    uint64_t * ESA_MF_RESTRICT  sa_parent_link,
    uint32_t * ESA_MF_RESTRICT  plcp_leaf_link,
    uint64_t                    target_position
)
{
    const uint64_t prefetch_distance = 32;
    uint64_t position = target_position - 1;

    for (; position >= prefetch_distance; position -= 1)
    {
        esa_matchfinder_prefetchr(&plcp_leaf_link[position - 2 * prefetch_distance]);
        esa_matchfinder_prefetchw(&sa_parent_link[plcp_leaf_link[position - prefetch_distance]]);

        const uint64_t offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
        uint64_t reference = plcp_leaf_link[position];
        uint64_t interval = sa_parent_link[reference];

        while ((interval & ESA_MF_OFFSET_MASK) == 0)
        {
            sa_parent_link[reference] = interval + offset;
            reference = (uint32_t)interval;
            interval = sa_parent_link[reference];
        }
    }

    for (; position > 0; position -= 1)
    {
        const uint64_t offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
        uint64_t reference = plcp_leaf_link[position];
        uint64_t interval = sa_parent_link[reference];

        while ((interval & ESA_MF_OFFSET_MASK) == 0)
        {
            sa_parent_link[reference] = interval + offset;
            reference = (uint32_t)interval;
            interval = sa_parent_link[reference];
        }
    }
}

static ptrdiff_t esa_matchfinder_build_interval_tree
(
    uint64_t * ESA_MF_RESTRICT  sa_parent_link,
    uint32_t * ESA_MF_RESTRICT  plcp_leaf_link,
    uint64_t                    min_match_length,
    uint64_t                    max_match_length,
    ptrdiff_t                   omp_block_start,
    ptrdiff_t                   omp_block_size
)
{
    uint64_t intervals[2 * ESA_MATCHFINDER_MAX_MATCH_LENGTH];

    const ptrdiff_t prefetch_distance = 32;
    uint64_t * ESA_MF_RESTRICT stack = intervals;
    uint64_t top_interval = stack[0] = 0;
    uint64_t next_interval_index = (uint64_t)(omp_block_start + omp_block_size - 1);

    min_match_length -= 1;
    max_match_length -= min_match_length;

    for (ptrdiff_t i = omp_block_start + omp_block_size - 1; i >= omp_block_start; i -= 1)
    {
        esa_matchfinder_prefetchr(&sa_parent_link[i - 2 * prefetch_distance]);

        esa_matchfinder_prefetchw(&plcp_leaf_link[sa_parent_link[i - prefetch_distance]]);
        esa_matchfinder_prefetchw(&sa_parent_link[next_interval_index - prefetch_distance]);

        uint64_t next_pos = sa_parent_link[i];
        uint64_t next_lcp = (uint64_t)plcp_leaf_link[next_pos] - min_match_length;

        if ((int64_t)next_lcp < 0) { next_lcp = 0; }
        if (next_lcp > max_match_length) { next_lcp = max_match_length; }

        uint64_t next_interval = (next_lcp << ESA_MF_LCP_SHIFT) + next_interval_index;
        uint64_t top_interval_lcp = top_interval >> ESA_MF_LCP_SHIFT;

        stack[1] = next_interval;
        top_interval = next_lcp > top_interval_lcp ? next_interval : top_interval;
        next_interval_index -= next_lcp > top_interval_lcp;
        stack += next_lcp > top_interval_lcp;

        plcp_leaf_link[next_pos] = (uint32_t)top_interval;

        while (next_lcp < top_interval_lcp)
        {
            uint64_t closed_interval = top_interval;

            stack = stack - 1;
            top_interval = stack[0];
            top_interval_lcp = top_interval >> ESA_MF_LCP_SHIFT;

            stack[1] = next_interval;
            top_interval = next_lcp > top_interval_lcp ? next_interval : top_interval;
            next_interval_index -= next_lcp > top_interval_lcp;
            stack += next_lcp > top_interval_lcp;

            sa_parent_link[(uint32_t)closed_interval] = (uint32_t)top_interval + (closed_interval & ESA_MF_LCP_MASK);
        }
    }

    return (ptrdiff_t)(next_interval_index + 1);
}

#if defined(ESA_MATCHFINDER_OPENMP)

static ptrdiff_t esa_matchfinder_find_breakpoint
(
    uint64_t * ESA_MF_RESTRICT  sa_parent_link,
    uint32_t * ESA_MF_RESTRICT  plcp_leaf_link,
    uint32_t                    min_match_length,
    ptrdiff_t                   omp_block_start,
    ptrdiff_t                   omp_block_size
)
{
    const ptrdiff_t prefetch_distance = 32;

    for (ptrdiff_t i = omp_block_start + omp_block_size - 1; i >= omp_block_start; i -= 1)
    {
        esa_matchfinder_prefetchr(&sa_parent_link[i - 2 * prefetch_distance]);
        esa_matchfinder_prefetchr(&plcp_leaf_link[sa_parent_link[i - prefetch_distance]]);

        if (plcp_leaf_link[sa_parent_link[i]] < min_match_length)
        {
            return i;
        }
    }

    return -1;
}

#endif

static void esa_matchfinder_build_interval_tree_omp
(
    uint64_t * ESA_MF_RESTRICT  sa_parent_link,
    uint32_t * ESA_MF_RESTRICT  plcp_leaf_link,
    uint64_t                    min_match_length,
    uint64_t                    max_match_length,
    ptrdiff_t                   n,
    ptrdiff_t                   num_threads,
    ESA_MF_THREAD_STATE *       threads
)
{
#if defined(ESA_MATCHFINDER_OPENMP)
    ptrdiff_t breakpoints[ESA_MF_NUM_THREADS_MAX];

    for (ptrdiff_t thread = 0; thread < num_threads; thread += 1)
    {
        threads[thread].interval_tree_start = 0;
        threads[thread].interval_tree_end = 0;
    }

    #pragma omp parallel num_threads(num_threads) if(num_threads > 1 && n >= 65536)
#endif
    {
#if defined(ESA_MATCHFINDER_OPENMP)
        ptrdiff_t omp_thread_num    = omp_get_thread_num();
        ptrdiff_t omp_num_threads   = omp_get_num_threads();
#else
        ESA_MF_UNUSED(num_threads);

        ptrdiff_t omp_thread_num    = 0;
        ptrdiff_t omp_num_threads   = 1;
#endif
        ptrdiff_t omp_block_stride  = (n / omp_num_threads) & (-16);
        ptrdiff_t omp_block_start   = omp_thread_num * omp_block_stride;
        ptrdiff_t omp_block_size    = omp_thread_num < omp_num_threads - 1 ? omp_block_stride : n - omp_block_start;
        ptrdiff_t omp_block_end     = omp_block_start + omp_block_size;

        if (omp_num_threads == 1)
        {
            threads[omp_thread_num].interval_tree_end = omp_block_end;
            threads[omp_thread_num].interval_tree_start = esa_matchfinder_build_interval_tree(
                sa_parent_link,
                plcp_leaf_link,
                min_match_length,
                max_match_length,
                omp_block_start,
                omp_block_end - omp_block_start);
        }
#if defined(ESA_MATCHFINDER_OPENMP)
        else
        {
            {
                breakpoints[omp_thread_num] = omp_thread_num < omp_num_threads - 1
                    ? esa_matchfinder_find_breakpoint(sa_parent_link, plcp_leaf_link, (uint32_t)min_match_length, omp_block_start, omp_block_end - omp_block_start)
                    : n;
            }

            #pragma omp barrier

            {
                if (breakpoints[omp_thread_num] != -1)
                {
                    omp_block_end = breakpoints[omp_thread_num];
                    omp_block_start = 0;

                    for (ptrdiff_t thread = omp_thread_num - 1; thread >= 0; thread -= 1)
                    {
                        if (breakpoints[thread] != -1) { omp_block_start = breakpoints[thread]; break; }
                    }

                    if (omp_block_start < omp_block_end)
                    {
                        threads[omp_thread_num].interval_tree_end = omp_block_end;
                        threads[omp_thread_num].interval_tree_start = esa_matchfinder_build_interval_tree(
                            sa_parent_link,
                            plcp_leaf_link,
                            min_match_length,
                            max_match_length,
                            omp_block_start,
                            omp_block_end - omp_block_start);
                    }
                }
            }
        }
#endif
    }

    {
        sa_parent_link[0] = ESA_MF_OFFSET_MASK;
    }
}

void * esa_matchfinder_create(int32_t max_block_size, int32_t min_match_length, int32_t max_match_length)
{
    if ((max_block_size   < 0) ||
        (max_block_size   > ESA_MATCHFINDER_MAX_BLOCK_SIZE) ||
        (min_match_length < ESA_MATCHFINDER_MIN_MATCH_LENGTH) ||
        (max_match_length > (int32_t)ESA_MF_LCP_MAX + min_match_length - 1) ||
        (max_match_length < min_match_length))
    {
        return NULL;
    }

    return (void *)esa_matchfinder_alloc_ctx(max_block_size, min_match_length, max_match_length, 1);
}

#if defined(ESA_MATCHFINDER_OPENMP)

void * esa_matchfinder_create_omp(int32_t max_block_size, int32_t min_match_length, int32_t max_match_length, int32_t num_threads)
{
    if ((max_block_size   < 0) ||
        (max_block_size   > ESA_MATCHFINDER_MAX_BLOCK_SIZE) ||
        (min_match_length < ESA_MATCHFINDER_MIN_MATCH_LENGTH) ||
        (max_match_length > (int32_t)ESA_MF_LCP_MAX + min_match_length - 1) ||
        (max_match_length < min_match_length) ||
        (num_threads      < 0))
    {
        return NULL;
    }

    num_threads = num_threads > 0 ? num_threads : omp_get_max_threads();
    return (void *)esa_matchfinder_alloc_ctx(max_block_size, min_match_length, max_match_length, num_threads);
}

#endif

void esa_matchfinder_destroy(void * mf)
{
    esa_matchfinder_free_ctx((ESA_MF_CONTEXT *)mf);
}

int32_t esa_matchfinder_parse(void * mf, const uint8_t * block, int32_t block_size)
{
    ESA_MF_CONTEXT * matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    if ((matchfinder_ctx == NULL) || (block == NULL) || (block_size < 0) || (block_size > matchfinder_ctx->max_block_size))
    {
        return ESA_MATCHFINDER_BAD_PARAMETER;
    }

    matchfinder_ctx->block_size = block_size;
    memset(matchfinder_ctx->esa_storage + 0 * ESA_MF_STORAGE_PADDING + 0 * matchfinder_ctx->max_block_size + 0 * matchfinder_ctx->block_size, 0, ESA_MF_STORAGE_PADDING * sizeof(int32_t));
    memset(matchfinder_ctx->esa_storage + 1 * ESA_MF_STORAGE_PADDING + 2 * matchfinder_ctx->max_block_size + 1 * matchfinder_ctx->block_size, 0, ESA_MF_STORAGE_PADDING * sizeof(int32_t));

    int32_t result = libsais_ctx(
        matchfinder_ctx->libsais_ctx,
        block,
        (int32_t *)(void *)matchfinder_ctx->sa_parent_link,
        matchfinder_ctx->block_size,
        (2 * matchfinder_ctx->max_block_size) - matchfinder_ctx->block_size,
        NULL);

    if (result == ESA_MATCHFINDER_NO_ERROR)
    {
#if defined(ESA_MATCHFINDER_OPENMP) && defined(LIBSAIS_OPENMP)
        result = libsais_plcp_omp(
            block,
            (int32_t *)(void *)matchfinder_ctx->sa_parent_link,
            (int32_t *)(void *)matchfinder_ctx->plcp_leaf_link,
            matchfinder_ctx->block_size,
            matchfinder_ctx->num_threads);
#else
        result = libsais_plcp(
            block,
            (int32_t *)(void *)matchfinder_ctx->sa_parent_link,
            (int32_t *)(void *)matchfinder_ctx->plcp_leaf_link,
            block_size);
#endif

        if (result == ESA_MATCHFINDER_NO_ERROR)
        {
            esa_matchfinder_convert_inplace_32u_to_64u_omp(
                (uint32_t *)(void *)matchfinder_ctx->sa_parent_link,
                matchfinder_ctx->block_size,
                matchfinder_ctx->num_threads);

            esa_matchfinder_build_interval_tree_omp(
                matchfinder_ctx->sa_parent_link,
                matchfinder_ctx->plcp_leaf_link,
                (uint64_t)matchfinder_ctx->min_match_length,
                (uint64_t)matchfinder_ctx->max_match_length,
                matchfinder_ctx->block_size,
                matchfinder_ctx->num_threads,
                matchfinder_ctx->threads);

            esa_matchfinder_set_position(matchfinder_ctx, 0);
        }
    }

    return result;
}

int32_t esa_matchfinder_get_position(void * mf)
{
    return (int32_t)((ESA_MF_CONTEXT *)mf)->position;
}

int32_t esa_matchfinder_rewind(void * mf, int32_t position)
{
    ESA_MF_CONTEXT * matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    if ((matchfinder_ctx == NULL) || (position < 0) || (position >= matchfinder_ctx->block_size))
    {
        return ESA_MATCHFINDER_BAD_PARAMETER;
    }

    if (matchfinder_ctx->position != (uint64_t)position)
    {
        if (matchfinder_ctx->position != 0)
        {
            for (ptrdiff_t thread = 0; thread < matchfinder_ctx->num_threads; thread += 1)
            {
                ptrdiff_t interval_tree_start = matchfinder_ctx->threads[thread].interval_tree_start;
                ptrdiff_t interval_tree_end = matchfinder_ctx->threads[thread].interval_tree_end;

                if (interval_tree_start < interval_tree_end)
                {
                    esa_matchfinder_reset_interval_tree_omp(
                        matchfinder_ctx->sa_parent_link + interval_tree_start,
                        interval_tree_end - interval_tree_start,
                        matchfinder_ctx->num_threads);
                }
            }
        }

        if (position > 0)
        {
            esa_matchfinder_fast_forward(matchfinder_ctx->sa_parent_link, matchfinder_ctx->plcp_leaf_link, (uint64_t)position);
        }

        esa_matchfinder_set_position(matchfinder_ctx, (uint64_t)position);
} 700 | 701 | return ESA_MATCHFINDER_NO_ERROR; 702 | } 703 | 704 | ESA_MATCHFINDER_MATCH * esa_matchfinder_find_all_matches(void * mf, ESA_MATCHFINDER_MATCH * matches) 705 | { 706 | ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf; 707 | 708 | const ptrdiff_t prefetch_distance = 4; 709 | const uint64_t position = matchfinder_ctx->position++; 710 | 711 | uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link; 712 | uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link; 713 | uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0]; 714 | ESA_MATCHFINDER_MATCH * ESA_MF_RESTRICT next_match = matches; 715 | 716 | esa_matchfinder_prefetchw(&sa_parent_link[ (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]); 717 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]); 718 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]); 719 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]); 720 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]); 721 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]); 722 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]); 723 | esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]); 724 | esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]); 725 | 726 | const uint64_t min_match_length = (uint64_t)matchfinder_ctx->min_match_length_minus_1; 727 | const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT; 728 | uint64_t best_match = (uint64_t)(uint32_t)-1; 729 | uint64_t reference 
        = plcp_leaf_link[position];

    while (reference != 0)
    {
        const uint64_t interval = sa_parent_link[reference];
        const uint64_t match = min_match_length + (interval >> ESA_MF_LCP_SHIFT) + ((interval & ESA_MF_OFFSET_MASK) << (32 - ESA_MF_OFFSET_SHIFT));

#if defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
        if (offsetof(ESA_MATCHFINDER_MATCH, length) == 0 && offsetof(ESA_MATCHFINDER_MATCH, offset) == 4)
        {
            *(uint64_t *)(void *)next_match = match;
        }
        else
#endif
        {
            next_match->length = (int32_t)(match      );
            next_match->offset = (int32_t)(match >> 32);
        }

        next_match += match > best_match;
        best_match = match;

        sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
        reference = interval & ESA_MF_PARENT_MASK;
    }

    return next_match;
}

ESA_MATCHFINDER_MATCH * esa_matchfinder_find_all_matches_in_window(void * mf, ESA_MATCHFINDER_MATCH * matches, int32_t window_size)
{
    ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    const ptrdiff_t prefetch_distance = 4;
    const uint64_t position = matchfinder_ctx->position++;

    uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link;
    uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link;
    uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];
    ESA_MATCHFINDER_MATCH * ESA_MF_RESTRICT next_match = matches;

    esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]);
    esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]);

    const uint64_t min_match_length = (uint64_t)matchfinder_ctx->min_match_length_minus_1;
    const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
    uint64_t best_match = (position > (uint64_t)window_size ? (position - (uint64_t)window_size) << 32 : 0) + (uint64_t)(uint32_t)-1;
    uint64_t reference = plcp_leaf_link[position];

    while (reference != 0)
    {
        const uint64_t interval = sa_parent_link[reference];
        const uint64_t match = min_match_length + (interval >> ESA_MF_LCP_SHIFT) + ((interval & ESA_MF_OFFSET_MASK) << (32 - ESA_MF_OFFSET_SHIFT));

#if defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
        if (offsetof(ESA_MATCHFINDER_MATCH, length) == 0 && offsetof(ESA_MATCHFINDER_MATCH, offset) == 4)
        {
            *(uint64_t *)(void *)next_match = match;
        }
        else
#endif
        {
            next_match->length = (int32_t)(match      );
            next_match->offset = (int32_t)(match >> 32);
        }

        next_match += match > best_match;
        best_match = match;

        sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
        reference = interval & ESA_MF_PARENT_MASK;
    }

    return next_match;
}

ESA_MATCHFINDER_MATCH esa_matchfinder_find_best_match(void * mf)
{
    ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    const ptrdiff_t prefetch_distance = 4;
    const uint64_t position = matchfinder_ctx->position++;

    uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link;
    uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link;
    uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

    esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]);
    esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]);

    const uint64_t min_match_length = (uint64_t)matchfinder_ctx->min_match_length_minus_1;
    const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
    uint64_t best_match = 0;
    uint64_t reference = plcp_leaf_link[position];

    while (reference != 0)
    {
        const uint64_t interval = sa_parent_link[reference];
        uint64_t match = min_match_length + (interval >> ESA_MF_LCP_SHIFT) + ((interval & ESA_MF_OFFSET_MASK) << (32 - ESA_MF_OFFSET_SHIFT));

        match = interval & ESA_MF_OFFSET_MASK ? match : best_match;
        best_match = best_match == 0 ? match : best_match;

        sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
        reference = interval & ESA_MF_PARENT_MASK;
    }

    {
        ESA_MATCHFINDER_MATCH match;

#if defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
        if (offsetof(ESA_MATCHFINDER_MATCH, length) == 0 && offsetof(ESA_MATCHFINDER_MATCH, offset) == 4)
        {
            *(uint64_t *)(void *)&match = best_match;
        }
        else
#endif
        {
            match.length = (int32_t)(best_match      );
            match.offset = (int32_t)(best_match >> 32);
        }

        return match;
    }
}

ESA_MATCHFINDER_MATCH esa_matchfinder_find_best_match_in_window(void * mf, int32_t window_size)
{
    ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    const ptrdiff_t prefetch_distance = 4;
    const uint64_t position = matchfinder_ctx->position++;

    uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link;
    uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link;
    uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

    esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] =
        (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
    esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]);
    esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]);

    const uint64_t min_match_length = (uint64_t)matchfinder_ctx->min_match_length_minus_1;
    const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
    const uint64_t match_cutoff = (position > (uint64_t)window_size ? (position - (uint64_t)window_size) << 32 : 0) + (uint64_t)(uint32_t)-1;

    uint64_t best_match = 0;
    uint64_t reference = plcp_leaf_link[position];

    while (reference != 0)
    {
        const uint64_t interval = sa_parent_link[reference];
        uint64_t match = min_match_length + (interval >> ESA_MF_LCP_SHIFT) + ((interval & ESA_MF_OFFSET_MASK) << (32 - ESA_MF_OFFSET_SHIFT));

        match = match > match_cutoff ? match : best_match;
        best_match = best_match == 0 ?
            match : best_match;

        sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
        reference = interval & ESA_MF_PARENT_MASK;
    }

    {
        ESA_MATCHFINDER_MATCH match;

#if defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
        if (offsetof(ESA_MATCHFINDER_MATCH, length) == 0 && offsetof(ESA_MATCHFINDER_MATCH, offset) == 4)
        {
            *(uint64_t *)(void *)&match = best_match;
        }
        else
#endif
        {
            match.length = (int32_t)(best_match      );
            match.offset = (int32_t)(best_match >> 32);
        }

        return match;
    }
}

static void esa_matchfinder_advance_backwards(void * mf, int32_t count)
{
    ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    const ptrdiff_t prefetch_distance = 4;
    const uint64_t current_position = matchfinder_ctx->position;
    const uint64_t target_position = matchfinder_ctx->position += (uint64_t)count;

    uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link;
    uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link;

    memset(matchfinder_ctx->prefetch, 0, sizeof(matchfinder_ctx->prefetch));

    for (uint64_t position = target_position + prefetch_distance * 8; position-- != target_position; )
    {
        uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

        esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position - 8 * prefetch_distance])]);
        esa_matchfinder_prefetchr(&plcp_leaf_link[position - 9 * prefetch_distance]);
    }

    for (uint64_t position = target_position; position-- != current_position; )
    {
        if (position >= (uint64_t)(8 * prefetch_distance))
        {
            uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

            esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
            esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position - 8 * prefetch_distance])]);
            esa_matchfinder_prefetchr(&plcp_leaf_link[position - 9 * prefetch_distance]);
        }

        const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
        uint64_t reference = plcp_leaf_link[position];
        uint64_t
        interval = sa_parent_link[reference];

        while ((interval & ESA_MF_OFFSET_MASK) < new_offset)
        {
            sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
            reference = interval & ESA_MF_PARENT_MASK;
            interval = sa_parent_link[reference];
        }
    }

    memset(matchfinder_ctx->prefetch, 0, sizeof(matchfinder_ctx->prefetch));

    for (uint64_t position = target_position - prefetch_distance * 8; position != target_position; position += 1)
    {
        uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

        esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]);
        esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]);
    }
}

void esa_matchfinder_advance(void * mf, int32_t count)
{
    if (count >= /*ESA_MF_ADVANCE_BACKWARDS_THRESHOLD*/ 64)
    {
        esa_matchfinder_advance_backwards(mf, count);
        return;
    }

    ESA_MF_CONTEXT * ESA_MF_RESTRICT const matchfinder_ctx = (ESA_MF_CONTEXT *)mf;

    const
    ptrdiff_t prefetch_distance = 4;
    const uint64_t current_position = matchfinder_ctx->position;
    const uint64_t target_position = matchfinder_ctx->position += (uint64_t)count;

    uint64_t * ESA_MF_RESTRICT const sa_parent_link = matchfinder_ctx->sa_parent_link;
    uint32_t * ESA_MF_RESTRICT const plcp_leaf_link = matchfinder_ctx->plcp_leaf_link;

    for (uint64_t position = current_position; position < target_position; position += 1)
    {
        uint64_t * ESA_MF_RESTRICT const prefetch = &matchfinder_ctx->prefetch[position & (prefetch_distance - 1)][0];

        esa_matchfinder_prefetchw(&sa_parent_link[              (sa_parent_link[prefetch[0]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[0] = (sa_parent_link[prefetch[1]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[1] = (sa_parent_link[prefetch[2]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[2] = (sa_parent_link[prefetch[3]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[3] = (sa_parent_link[prefetch[4]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[4] = (sa_parent_link[prefetch[5]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[5] = (sa_parent_link[prefetch[6]] & ESA_MF_PARENT_MASK)]);
        esa_matchfinder_prefetchw(&sa_parent_link[prefetch[6] = (plcp_leaf_link[position + 8 * prefetch_distance])]);
        esa_matchfinder_prefetchr(&plcp_leaf_link[position + 9 * prefetch_distance]);

        const uint64_t new_offset = (uint64_t)position << ESA_MF_OFFSET_SHIFT;
        uint64_t reference = plcp_leaf_link[position];

        while (reference != 0)
        {
            uint64_t interval = sa_parent_link[reference];

            sa_parent_link[reference] = (interval & (~ESA_MF_OFFSET_MASK)) + new_offset;
            reference = interval & ESA_MF_PARENT_MASK;
        }
    }
}

#if defined(__clang__)
    #pragma clang diagnostic pop
#elif defined(__GNUC__)
    #pragma GCC diagnostic pop
#elif defined(_MSC_VER)
    #pragma warning(pop)
#endif

--------------------------------------------------------------------------------
/esa_matchfinder.h:
--------------------------------------------------------------------------------
/*--

This file is a part of esa-matchfinder, a library for efficient
Lempel-Ziv factorization using enhanced suffix array (ESA).

   Copyright (c) 2022-2025 Ilya Grebnov

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Please see the file LICENSE for full copyright and license details.

--*/

#ifndef ESA_MATCHFINDER_H
#define ESA_MATCHFINDER_H 1

#define ESA_MATCHFINDER_MATCH_BITS          (6)
#define ESA_MATCHFINDER_MAX_BLOCK_SIZE      (1 << ((64 - ESA_MATCHFINDER_MATCH_BITS) / 2))
#define ESA_MATCHFINDER_MIN_MATCH_LENGTH    (2)
#define ESA_MATCHFINDER_MAX_MATCH_LENGTH    (1 << ESA_MATCHFINDER_MATCH_BITS)

#define ESA_MATCHFINDER_NO_ERROR            (0)
#define ESA_MATCHFINDER_BAD_PARAMETER       (-1)

#define ESA_MATCHFINDER_VERSION_MAJOR       1
#define ESA_MATCHFINDER_VERSION_MINOR       2
#define ESA_MATCHFINDER_VERSION_PATCH       1
#define ESA_MATCHFINDER_VERSION_STRING      "1.2.1"

#if defined(ESA_MATCHFINDER_OPENMP) && !defined(LIBSAIS_OPENMP)
    #error "ESA_MATCHFINDER_OPENMP requires LIBSAIS_OPENMP to be defined. Please define LIBSAIS_OPENMP and enable OpenMP support for libsais."
#endif

#ifdef __cplusplus
extern "C" {
#endif

    #include <stdint.h>

    typedef struct ESA_MATCHFINDER_MATCH
    {
        int32_t length;
        int32_t offset;
    } ESA_MATCHFINDER_MATCH;

    /**
    * Creates the enhanced suffix array (ESA) based match-finder for Lempel-Ziv factorization.
    * @param max_block_size The maximum block size to support (must be less than or equal to ESA_MATCHFINDER_MAX_BLOCK_SIZE).
    * @param min_match_length The minimum match length to find (must be greater than or equal to ESA_MATCHFINDER_MIN_MATCH_LENGTH).
    * @param max_match_length The maximum match length to find (must be less than or equal to ESA_MATCHFINDER_MAX_MATCH_LENGTH).
    * @return The enhanced suffix array (ESA) based match-finder, NULL otherwise.
    */
    void * esa_matchfinder_create(int32_t max_block_size, int32_t min_match_length, int32_t max_match_length);

#if defined(ESA_MATCHFINDER_OPENMP)
    /**
    * Creates the enhanced suffix array (ESA) based match-finder for Lempel-Ziv factorization with multi-threaded optimization using OpenMP.
    * @param max_block_size The maximum block size to support (must be less than or equal to ESA_MATCHFINDER_MAX_BLOCK_SIZE).
    * @param min_match_length The minimum match length to find (must be greater than or equal to ESA_MATCHFINDER_MIN_MATCH_LENGTH).
    * @param max_match_length The maximum match length to find (must be less than or equal to ESA_MATCHFINDER_MAX_MATCH_LENGTH).
    * @param num_threads The number of OpenMP threads to use (can be 0 for default number of OpenMP threads).
    * @return The enhanced suffix array (ESA) based match-finder, NULL otherwise.
    */
    void * esa_matchfinder_create_omp(int32_t max_block_size, int32_t min_match_length, int32_t max_match_length, int32_t num_threads);
#endif

    /**
    * Destroys the match-finder and frees previously allocated memory.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    */
    void esa_matchfinder_destroy(void * mf);

    /**
    * Parses the input block by building enhanced suffix array (ESA) to speed up subsequent match-finding operations.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param block The input block to parse.
    * @param block_size The size of input block to parse.
    * @return 0 if no error occurred, -1 otherwise.
    */
    int32_t esa_matchfinder_parse(void * mf, const uint8_t * block, int32_t block_size);

    /**
    * Gets the current match-finder position.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @return The current match-finder position.
    */
    int32_t esa_matchfinder_get_position(void * mf);

    /**
    * Rewinds the match-finder forward or backward to the specified position.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param position The match-finder position to rewind to.
    * @return 0 if no error occurred, -1 otherwise.
    */
    int32_t esa_matchfinder_rewind(void * mf, int32_t position);

    /**
    * Finds all distance-optimal matches at the current position of the match-finder, and then advances the position by one byte.
    * The recorded matches will be sorted by strictly decreasing length and strictly increasing offset from the beginning of the block.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param matches The output array to record the matches (array must be of ESA_MATCHFINDER_MAX_MATCH_LENGTH size).
    * @return The pointer to the end of recorded matches array (if no matches were found, this will be the same as matches).
    */
    ESA_MATCHFINDER_MATCH * esa_matchfinder_find_all_matches(void * mf, ESA_MATCHFINDER_MATCH * matches);

    /**
    * Finds all distance-optimal matches within a specified sliding window at the current position of the match-finder, and then advances the position by one byte.
    * The recorded matches will be sorted by strictly decreasing length and strictly increasing offset from the beginning of the block.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param matches The output array to record the matches (array must be of ESA_MATCHFINDER_MAX_MATCH_LENGTH size).
    * @param window_size The maximum allowed distance between the current position and found matches.
    * @return The pointer to the end of recorded matches array (if no matches were found, this will be the same as matches).
    */
    ESA_MATCHFINDER_MATCH * esa_matchfinder_find_all_matches_in_window(void * mf, ESA_MATCHFINDER_MATCH * matches, int32_t window_size);

    /**
    * Finds the best match at the current position of the match-finder, and then advances the position by one byte.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @return The best match found (match of zero length and zero offset is returned if no matches were found).
    */
    ESA_MATCHFINDER_MATCH esa_matchfinder_find_best_match(void * mf);

    /**
    * Finds the best match within a specified sliding window at the current position of the match-finder, and then advances the position by one byte.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param window_size The maximum allowed distance between the current position and found match.
    * @return The best match found (match of zero length and zero offset is returned if no matches were found).
    */
    ESA_MATCHFINDER_MATCH esa_matchfinder_find_best_match_in_window(void * mf, int32_t window_size);

    /**
    * Advances the match-finder position forward by the specified number of bytes without recording matches.
    * @param mf The enhanced suffix array (ESA) based match-finder.
    * @param count The number of bytes to advance.
    */
    void esa_matchfinder_advance(void * mf, int32_t count);

#ifdef __cplusplus
}
#endif

#endif

--------------------------------------------------------------------------------
/libsais/CHANGES:
--------------------------------------------------------------------------------
Changes in 2.10.0 (April 12, 2025)
- Improved performance, with noticeable gains on ARM architecture.
- Fixed compiler warnings and addressed undefined behavior.

Changes in 2.9.1 (March 19, 2025)
- No functional changes, resolved compiler warnings & undefined behavior.

Changes in 2.9.0 (March 16, 2025)
- Support for generalized suffix array (GSA) construction.
- Support for longest common prefix array (LCP) construction for generalized suffix array (GSA).

Changes in 2.8.7 (January 16, 2025)
- Restore the input array after suffix array construction (libsais64 & libsais16x64).

Changes in 2.8.6 (November 18, 2024)
- Fixed out-of-bound memory access issue for large inputs.

Changes in 2.8.5 (July 31, 2024)
- Miscellaneous changes to reduce compiler warnings about implicit functions.

Changes in 2.8.4 (June 13, 2024)
- Additional OpenMP acceleration (libsais16 & libsais16x64).

Changes in 2.8.3 (June 11, 2024)
- Implemented suffix array construction of a long 16-bit array (libsais16x64).

Changes in 2.8.2 (May 27, 2024)
- Implemented suffix array construction of a long 64-bit array (libsais64).

Changes in 2.8.1 (April 5, 2024)
- Fixed out-of-bound memory access issue for large inputs (libsais64).

Changes in 2.8.0 (March 3, 2024)
- Implemented permuted longest common prefix array (PLCP) construction of an integer array.
- Fixed compilation error when compiling the library with OpenMP enabled.

Changes in 2.7.5 (February 26, 2024)
- Improved performance of suffix array and Burrows-Wheeler transform construction on degenerate inputs.

Changes in 2.7.4 (February 23, 2024)
- Resolved strict aliasing violation that resulted in invalid code generation by Intel compiler.

Changes in 2.7.3 (April 21, 2023)
- CMake script for library build and integration with other projects.

Changes in 2.7.2 (April 18, 2023)
- Fixed out-of-bound memory access issue for large inputs (libsais64).

Changes in 2.7.1 (June 19, 2022)
- Improved cache coherence for ARMv8 architecture.

Changes in 2.7.0 (April 12, 2022)
- Support for longest common prefix array (LCP) construction.

Changes in 2.6.5 (January 1, 2022)
- Exposed functions to construct suffix array of a given integer array.
- Improved detection of various compiler intrinsics.
- Capped free space parameter to avoid crashing due to 32-bit integer overflow.

Changes in 2.6.0 (October 21, 2021)
- libsais16 for 16-bit inputs.

Changes in 2.5.0 (October 15, 2021)
- Support for optional symbol frequency tables.

Changes in 2.4.0 (July 14, 2021)
- Reverse Burrows-Wheeler transform.

Changes in 2.3.0 (June 23, 2021)
- Burrows-Wheeler transform with auxiliary indexes.

Changes in 2.2.0 (April 27, 2021)
- libsais64 for inputs larger than 2GB.

Changes in 2.1.0 (April 19, 2021)
- Additional OpenMP acceleration.

Changes in 2.0.0 (April 4, 2021)
- OpenMP acceleration.

Changes in 1.0.0 (February 23, 2021)
- Initial Release.

--------------------------------------------------------------------------------
/libsais/LICENSE:
--------------------------------------------------------------------------------

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner.
      For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted.
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. 
You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. 
(Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /libsais/VERSION: -------------------------------------------------------------------------------- 1 | 2.10.0 2 | -------------------------------------------------------------------------------- /libsais/libsais.h: -------------------------------------------------------------------------------- 1 | /*-- 2 | 3 | This file is a part of libsais, a library for linear time suffix array, 4 | longest common prefix array and burrows wheeler transform construction. 5 | 6 | Copyright (c) 2021-2025 Ilya Grebnov 7 | 8 | Licensed under the Apache License, Version 2.0 (the "License"); 9 | you may not use this file except in compliance with the License. 
10 | You may obtain a copy of the License at 11 | 12 | http://www.apache.org/licenses/LICENSE-2.0 13 | 14 | Unless required by applicable law or agreed to in writing, software 15 | distributed under the License is distributed on an "AS IS" BASIS, 16 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 17 | See the License for the specific language governing permissions and 18 | limitations under the License. 19 | 20 | Please see the file LICENSE for full copyright information. 21 | 22 | --*/ 23 | 24 | #ifndef LIBSAIS_H 25 | #define LIBSAIS_H 1 26 | 27 | #define LIBSAIS_VERSION_MAJOR 2 28 | #define LIBSAIS_VERSION_MINOR 10 29 | #define LIBSAIS_VERSION_PATCH 0 30 | #define LIBSAIS_VERSION_STRING "2.10.0" 31 | 32 | #ifdef _WIN32 33 | #ifdef LIBSAIS_SHARED 34 | #ifdef LIBSAIS_EXPORTS 35 | #define LIBSAIS_API __declspec(dllexport) 36 | #else 37 | #define LIBSAIS_API __declspec(dllimport) 38 | #endif 39 | #else 40 | #define LIBSAIS_API 41 | #endif 42 | #else 43 | #define LIBSAIS_API 44 | #endif 45 | 46 | #ifdef __cplusplus 47 | extern "C" { 48 | #endif 49 | 50 | #include <stdint.h> 51 | 52 | /** 53 | * Creates the libsais context that allows reusing allocated memory with each libsais operation. 54 | * In multi-threaded environments, use one context per thread for parallel executions. 55 | * @return the libsais context, NULL otherwise. 56 | */ 57 | LIBSAIS_API void * libsais_create_ctx(void); 58 | 59 | #if defined(LIBSAIS_OPENMP) 60 | /** 61 | * Creates the libsais context that allows reusing allocated memory with each parallel libsais operation using OpenMP. 62 | * In multi-threaded environments, use one context per thread for parallel executions. 63 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 64 | * @return the libsais context, NULL otherwise. 65 | */ 66 | LIBSAIS_API void * libsais_create_ctx_omp(int32_t threads); 67 | #endif 68 | 69 | /** 70 | * Destroys the libsais context and frees previously allocated memory. 
71 | * @param ctx The libsais context (can be NULL). 72 | */ 73 | LIBSAIS_API void libsais_free_ctx(void * ctx); 74 | 75 | /** 76 | * Constructs the suffix array of a given string. 77 | * @param T [0..n-1] The input string. 78 | * @param SA [0..n-1+fs] The output array of suffixes. 79 | * @param n The length of the given string. 80 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 81 | * @param freq [0..255] The output symbol frequency table (can be NULL). 82 | * @return 0 if no error occurred, -1 or -2 otherwise. 83 | */ 84 | LIBSAIS_API int32_t libsais(const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq); 85 | 86 | /** 87 | * Constructs the generalized suffix array (GSA) of given string set. 88 | * @param T [0..n-1] The input string set using 0 as separators (T[n-1] must be 0). 89 | * @param SA [0..n-1+fs] The output array of suffixes. 90 | * @param n The length of the given string set. 91 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 92 | * @param freq [0..255] The output symbol frequency table (can be NULL). 93 | * @return 0 if no error occurred, -1 or -2 otherwise. 94 | */ 95 | LIBSAIS_API int32_t libsais_gsa(const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq); 96 | 97 | /** 98 | * Constructs the suffix array of a given integer array. 99 | * Note, during construction input array will be modified, but restored at the end if no errors occurred. 100 | * @param T [0..n-1] The input integer array. 101 | * @param SA [0..n-1+fs] The output array of suffixes. 102 | * @param n The length of the integer array. 103 | * @param k The alphabet size of the input integer array. 104 | * @param fs Extra space available at the end of SA array (can be 0, but 4k or better 6k is recommended for optimal performance). 105 | * @return 0 if no error occurred, -1 or -2 otherwise. 
106 | */ 107 | LIBSAIS_API int32_t libsais_int(int32_t * T, int32_t * SA, int32_t n, int32_t k, int32_t fs); 108 | 109 | /** 110 | * Constructs the suffix array of a given string using libsais context. 111 | * @param ctx The libsais context. 112 | * @param T [0..n-1] The input string. 113 | * @param SA [0..n-1+fs] The output array of suffixes. 114 | * @param n The length of the given string. 115 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 116 | * @param freq [0..255] The output symbol frequency table (can be NULL). 117 | * @return 0 if no error occurred, -1 or -2 otherwise. 118 | */ 119 | LIBSAIS_API int32_t libsais_ctx(const void * ctx, const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq); 120 | 121 | /** 122 | * Constructs the generalized suffix array (GSA) of given string set using libsais context. 123 | * @param ctx The libsais context. 124 | * @param T [0..n-1] The input string set using 0 as separators (T[n-1] must be 0). 125 | * @param SA [0..n-1+fs] The output array of suffixes. 126 | * @param n The length of the given string set. 127 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 128 | * @param freq [0..255] The output symbol frequency table (can be NULL). 129 | * @return 0 if no error occurred, -1 or -2 otherwise. 130 | */ 131 | LIBSAIS_API int32_t libsais_gsa_ctx(const void * ctx, const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq); 132 | 133 | #if defined(LIBSAIS_OPENMP) 134 | /** 135 | * Constructs the suffix array of a given string in parallel using OpenMP. 136 | * @param T [0..n-1] The input string. 137 | * @param SA [0..n-1+fs] The output array of suffixes. 138 | * @param n The length of the given string. 139 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 140 | * @param freq [0..255] The output symbol frequency table (can be NULL). 
141 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 142 | * @return 0 if no error occurred, -1 or -2 otherwise. 143 | */ 144 | LIBSAIS_API int32_t libsais_omp(const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq, int32_t threads); 145 | 146 | /** 147 | * Constructs the generalized suffix array (GSA) of given string set in parallel using OpenMP. 148 | * @param T [0..n-1] The input string set using 0 as separators (T[n-1] must be 0). 149 | * @param SA [0..n-1+fs] The output array of suffixes. 150 | * @param n The length of the given string set. 151 | * @param fs The extra space available at the end of SA array (0 should be enough for most cases). 152 | * @param freq [0..255] The output symbol frequency table (can be NULL). 153 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 154 | * @return 0 if no error occurred, -1 or -2 otherwise. 155 | */ 156 | LIBSAIS_API int32_t libsais_gsa_omp(const uint8_t * T, int32_t * SA, int32_t n, int32_t fs, int32_t * freq, int32_t threads); 157 | 158 | /** 159 | * Constructs the suffix array of a given integer array in parallel using OpenMP. 160 | * Note, during construction input array will be modified, but restored at the end if no errors occurred. 161 | * @param T [0..n-1] The input integer array. 162 | * @param SA [0..n-1+fs] The output array of suffixes. 163 | * @param n The length of the integer array. 164 | * @param k The alphabet size of the input integer array. 165 | * @param fs Extra space available at the end of SA array (can be 0, but 4k or better 6k is recommended for optimal performance). 166 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 167 | * @return 0 if no error occurred, -1 or -2 otherwise. 
168 | */ 169 | LIBSAIS_API int32_t libsais_int_omp(int32_t * T, int32_t * SA, int32_t n, int32_t k, int32_t fs, int32_t threads); 170 | #endif 171 | 172 | /** 173 | * Constructs the burrows-wheeler transformed string (BWT) of a given string. 174 | * @param T [0..n-1] The input string. 175 | * @param U [0..n-1] The output string (can be T). 176 | * @param A [0..n-1+fs] The temporary array. 177 | * @param n The length of the given string. 178 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 179 | * @param freq [0..255] The output symbol frequency table (can be NULL). 180 | * @return The primary index if no error occurred, -1 or -2 otherwise. 181 | */ 182 | LIBSAIS_API int32_t libsais_bwt(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq); 183 | 184 | /** 185 | * Constructs the burrows-wheeler transformed string (BWT) of a given string with auxiliary indexes. 186 | * @param T [0..n-1] The input string. 187 | * @param U [0..n-1] The output string (can be T). 188 | * @param A [0..n-1+fs] The temporary array. 189 | * @param n The length of the given string. 190 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 191 | * @param freq [0..255] The output symbol frequency table (can be NULL). 192 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 193 | * @param I [0..(n-1)/r] The output auxiliary indexes. 194 | * @return 0 if no error occurred, -1 or -2 otherwise. 195 | */ 196 | LIBSAIS_API int32_t libsais_bwt_aux(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq, int32_t r, int32_t * I); 197 | 198 | /** 199 | * Constructs the burrows-wheeler transformed string (BWT) of a given string using libsais context. 200 | * @param ctx The libsais context. 201 | * @param T [0..n-1] The input string. 202 | * @param U [0..n-1] The output string (can be T). 
203 | * @param A [0..n-1+fs] The temporary array. 204 | * @param n The length of the given string. 205 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 206 | * @param freq [0..255] The output symbol frequency table (can be NULL). 207 | * @return The primary index if no error occurred, -1 or -2 otherwise. 208 | */ 209 | LIBSAIS_API int32_t libsais_bwt_ctx(const void * ctx, const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq); 210 | 211 | /** 212 | * Constructs the burrows-wheeler transformed string (BWT) of a given string with auxiliary indexes using libsais context. 213 | * @param ctx The libsais context. 214 | * @param T [0..n-1] The input string. 215 | * @param U [0..n-1] The output string (can be T). 216 | * @param A [0..n-1+fs] The temporary array. 217 | * @param n The length of the given string. 218 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 219 | * @param freq [0..255] The output symbol frequency table (can be NULL). 220 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 221 | * @param I [0..(n-1)/r] The output auxiliary indexes. 222 | * @return 0 if no error occurred, -1 or -2 otherwise. 223 | */ 224 | LIBSAIS_API int32_t libsais_bwt_aux_ctx(const void * ctx, const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq, int32_t r, int32_t * I); 225 | 226 | #if defined(LIBSAIS_OPENMP) 227 | /** 228 | * Constructs the burrows-wheeler transformed string (BWT) of a given string in parallel using OpenMP. 229 | * @param T [0..n-1] The input string. 230 | * @param U [0..n-1] The output string (can be T). 231 | * @param A [0..n-1+fs] The temporary array. 232 | * @param n The length of the given string. 233 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 234 | * @param freq [0..255] The output symbol frequency table (can be NULL). 
235 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 236 | * @return The primary index if no error occurred, -1 or -2 otherwise. 237 | */ 238 | LIBSAIS_API int32_t libsais_bwt_omp(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq, int32_t threads); 239 | 240 | /** 241 | * Constructs the burrows-wheeler transformed string (BWT) of a given string with auxiliary indexes in parallel using OpenMP. 242 | * @param T [0..n-1] The input string. 243 | * @param U [0..n-1] The output string (can be T). 244 | * @param A [0..n-1+fs] The temporary array. 245 | * @param n The length of the given string. 246 | * @param fs The extra space available at the end of A array (0 should be enough for most cases). 247 | * @param freq [0..255] The output symbol frequency table (can be NULL). 248 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 249 | * @param I [0..(n-1)/r] The output auxiliary indexes. 250 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 251 | * @return 0 if no error occurred, -1 or -2 otherwise. 252 | */ 253 | LIBSAIS_API int32_t libsais_bwt_aux_omp(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, int32_t fs, int32_t * freq, int32_t r, int32_t * I, int32_t threads); 254 | #endif 255 | 256 | /** 257 | * Creates the libsais reverse BWT context that allows reusing allocated memory with each libsais_unbwt_* operation. 258 | * In multi-threaded environments, use one context per thread for parallel executions. 259 | * @return the libsais context, NULL otherwise. 260 | */ 261 | LIBSAIS_API void * libsais_unbwt_create_ctx(void); 262 | 263 | #if defined(LIBSAIS_OPENMP) 264 | /** 265 | * Creates the libsais reverse BWT context that allows reusing allocated memory with each parallel libsais_unbwt_* operation using OpenMP. 266 | * In multi-threaded environments, use one context per thread for parallel executions. 
267 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 268 | * @return the libsais context, NULL otherwise. 269 | */ 270 | LIBSAIS_API void * libsais_unbwt_create_ctx_omp(int32_t threads); 271 | #endif 272 | 273 | /** 274 | * Destroys the libsais reverse BWT context and frees previously allocated memory. 275 | * @param ctx The libsais context (can be NULL). 276 | */ 277 | LIBSAIS_API void libsais_unbwt_free_ctx(void * ctx); 278 | 279 | /** 280 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with primary index. 281 | * @param T [0..n-1] The input string. 282 | * @param U [0..n-1] The output string (can be T). 283 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 284 | * @param n The length of the given string. 285 | * @param freq [0..255] The input symbol frequency table (can be NULL). 286 | * @param i The primary index. 287 | * @return 0 if no error occurred, -1 or -2 otherwise. 288 | */ 289 | LIBSAIS_API int32_t libsais_unbwt(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t i); 290 | 291 | /** 292 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with primary index using libsais reverse BWT context. 293 | * @param ctx The libsais reverse BWT context. 294 | * @param T [0..n-1] The input string. 295 | * @param U [0..n-1] The output string (can be T). 296 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 297 | * @param n The length of the given string. 298 | * @param freq [0..255] The input symbol frequency table (can be NULL). 299 | * @param i The primary index. 300 | * @return 0 if no error occurred, -1 or -2 otherwise. 
301 | */ 302 | LIBSAIS_API int32_t libsais_unbwt_ctx(const void * ctx, const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t i); 303 | 304 | /** 305 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with auxiliary indexes. 306 | * @param T [0..n-1] The input string. 307 | * @param U [0..n-1] The output string (can be T). 308 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 309 | * @param n The length of the given string. 310 | * @param freq [0..255] The input symbol frequency table (can be NULL). 311 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 312 | * @param I [0..(n-1)/r] The input auxiliary indexes. 313 | * @return 0 if no error occurred, -1 or -2 otherwise. 314 | */ 315 | LIBSAIS_API int32_t libsais_unbwt_aux(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t r, const int32_t * I); 316 | 317 | /** 318 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with auxiliary indexes using libsais reverse BWT context. 319 | * @param ctx The libsais reverse BWT context. 320 | * @param T [0..n-1] The input string. 321 | * @param U [0..n-1] The output string (can be T). 322 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 323 | * @param n The length of the given string. 324 | * @param freq [0..255] The input symbol frequency table (can be NULL). 325 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 326 | * @param I [0..(n-1)/r] The input auxiliary indexes. 327 | * @return 0 if no error occurred, -1 or -2 otherwise. 
328 | */ 329 | LIBSAIS_API int32_t libsais_unbwt_aux_ctx(const void * ctx, const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t r, const int32_t * I); 330 | 331 | #if defined(LIBSAIS_OPENMP) 332 | /** 333 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with primary index in parallel using OpenMP. 334 | * @param T [0..n-1] The input string. 335 | * @param U [0..n-1] The output string (can be T). 336 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 337 | * @param n The length of the given string. 338 | * @param freq [0..255] The input symbol frequency table (can be NULL). 339 | * @param i The primary index. 340 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 341 | * @return 0 if no error occurred, -1 or -2 otherwise. 342 | */ 343 | LIBSAIS_API int32_t libsais_unbwt_omp(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t i, int32_t threads); 344 | 345 | /** 346 | * Constructs the original string from a given burrows-wheeler transformed string (BWT) with auxiliary indexes in parallel using OpenMP. 347 | * @param T [0..n-1] The input string. 348 | * @param U [0..n-1] The output string (can be T). 349 | * @param A [0..n] The temporary array (NOTE, temporary array must be n + 1 size). 350 | * @param n The length of the given string. 351 | * @param freq [0..255] The input symbol frequency table (can be NULL). 352 | * @param r The sampling rate for auxiliary indexes (must be power of 2). 353 | * @param I [0..(n-1)/r] The input auxiliary indexes. 354 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 355 | * @return 0 if no error occurred, -1 or -2 otherwise. 
356 | */ 357 | LIBSAIS_API int32_t libsais_unbwt_aux_omp(const uint8_t * T, uint8_t * U, int32_t * A, int32_t n, const int32_t * freq, int32_t r, const int32_t * I, int32_t threads); 358 | #endif 359 | 360 | /** 361 | * Constructs the permuted longest common prefix array (PLCP) of a given string and a suffix array. 362 | * @param T [0..n-1] The input string. 363 | * @param SA [0..n-1] The input suffix array. 364 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 365 | * @param n The length of the string and the suffix array. 366 | * @return 0 if no error occurred, -1 otherwise. 367 | */ 368 | LIBSAIS_API int32_t libsais_plcp(const uint8_t * T, const int32_t * SA, int32_t * PLCP, int32_t n); 369 | 370 | /** 371 | * Constructs the permuted longest common prefix array (PLCP) of a given string set and a generalized suffix array (GSA). 372 | * @param T [0..n-1] The input string set using 0 as separators (T[n-1] must be 0). 373 | * @param SA [0..n-1] The input generalized suffix array. 374 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 375 | * @param n The length of the string set and the generalized suffix array. 376 | * @return 0 if no error occurred, -1 otherwise. 377 | */ 378 | LIBSAIS_API int32_t libsais_plcp_gsa(const uint8_t * T, const int32_t * SA, int32_t * PLCP, int32_t n); 379 | 380 | /** 381 | * Constructs the permuted longest common prefix array (PLCP) of a given integer array and a suffix array. 382 | * @param T [0..n-1] The input integer array. 383 | * @param SA [0..n-1] The input suffix array. 384 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 385 | * @param n The length of the integer array and the suffix array. 386 | * @return 0 if no error occurred, -1 otherwise. 
387 | */ 388 | LIBSAIS_API int32_t libsais_plcp_int(const int32_t * T, const int32_t * SA, int32_t * PLCP, int32_t n); 389 | 390 | /** 391 | * Constructs the longest common prefix array (LCP) of a given permuted longest common prefix array (PLCP) and a suffix array. 392 | * @param PLCP [0..n-1] The input permuted longest common prefix array. 393 | * @param SA [0..n-1] The input suffix array or generalized suffix array (GSA). 394 | * @param LCP [0..n-1] The output longest common prefix array (can be SA). 395 | * @param n The length of the permuted longest common prefix array and the suffix array. 396 | * @return 0 if no error occurred, -1 otherwise. 397 | */ 398 | LIBSAIS_API int32_t libsais_lcp(const int32_t * PLCP, const int32_t * SA, int32_t * LCP, int32_t n); 399 | 400 | #if defined(LIBSAIS_OPENMP) 401 | /** 402 | * Constructs the permuted longest common prefix array (PLCP) of a given string and a suffix array in parallel using OpenMP. 403 | * @param T [0..n-1] The input string. 404 | * @param SA [0..n-1] The input suffix array. 405 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 406 | * @param n The length of the string and the suffix array. 407 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 408 | * @return 0 if no error occurred, -1 otherwise. 409 | */ 410 | LIBSAIS_API int32_t libsais_plcp_omp(const uint8_t * T, const int32_t * SA, int32_t * PLCP, int32_t n, int32_t threads); 411 | 412 | /** 413 | * Constructs the permuted longest common prefix array (PLCP) of a given string set and a generalized suffix array (GSA) in parallel using OpenMP. 414 | * @param T [0..n-1] The input string set using 0 as separators (T[n-1] must be 0). 415 | * @param SA [0..n-1] The input generalized suffix array. 416 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 417 | * @param n The length of the string set and the generalized suffix array. 
418 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 419 | * @return 0 if no error occurred, -1 otherwise. 420 | */ 421 | LIBSAIS_API int32_t libsais_plcp_gsa_omp(const uint8_t * T, const int32_t * SA, int32_t * PLCP, int32_t n, int32_t threads); 422 | 423 | /** 424 | * Constructs the permuted longest common prefix array (PLCP) of a given integer array and a suffix array in parallel using OpenMP. 425 | * @param T [0..n-1] The input integer array. 426 | * @param SA [0..n-1] The input suffix array. 427 | * @param PLCP [0..n-1] The output permuted longest common prefix array. 428 | * @param n The length of the integer array and the suffix array. 429 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 430 | * @return 0 if no error occurred, -1 otherwise. 431 | */ 432 | LIBSAIS_API int32_t libsais_plcp_int_omp(const int32_t * T, const int32_t * SA, int32_t * PLCP, int32_t n, int32_t threads); 433 | 434 | /** 435 | * Constructs the longest common prefix array (LCP) of a given permuted longest common prefix array (PLCP) and a suffix array in parallel using OpenMP. 436 | * @param PLCP [0..n-1] The input permuted longest common prefix array. 437 | * @param SA [0..n-1] The input suffix array or generalized suffix array (GSA). 438 | * @param LCP [0..n-1] The output longest common prefix array (can be SA). 439 | * @param n The length of the permuted longest common prefix array and the suffix array. 440 | * @param threads The number of OpenMP threads to use (can be 0 for OpenMP default). 441 | * @return 0 if no error occurred, -1 otherwise. 442 | */ 443 | LIBSAIS_API int32_t libsais_lcp_omp(const int32_t * PLCP, const int32_t * SA, int32_t * LCP, int32_t n, int32_t threads); 444 | #endif 445 | 446 | #ifdef __cplusplus 447 | } 448 | #endif 449 | 450 | #endif 451 | --------------------------------------------------------------------------------
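The header above documents three related arrays: the suffix array (SA), the permuted LCP array (PLCP), and the LCP array. As a rough illustration of how they relate, the sketch below builds all three for a short string without calling libsais at all. Every function name here (`suffix_less`, `naive_suffix_array`, `naive_plcp`, `naive_lcp`) is hypothetical and exists only for this example. It is not the library's implementation, it is far slower than linear time, and it assumes the common convention that `PLCP[i]` is the length of the common prefix between suffix `i` and the suffix preceding it in SA order (0 for the first suffix in SA order).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libsais): returns 1 if suffix T[a..n)
   sorts before suffix T[b..n). Assumes a != b. */
static int suffix_less(const uint8_t *T, int32_t n, int32_t a, int32_t b)
{
    while (a < n && b < n) {
        if (T[a] != T[b]) { return T[a] < T[b]; }
        a++; b++;
    }
    return a == n && b < n; /* a proper prefix sorts before the longer suffix */
}

/* Naive suffix array by insertion sort. Only suitable for tiny inputs. */
static void naive_suffix_array(const uint8_t *T, int32_t *SA, int32_t n)
{
    assert(n >= 0);
    for (int32_t i = 0; i < n; i++) {
        int32_t j = i;
        /* Shift larger suffixes right until suffix i finds its slot. */
        while (j > 0 && suffix_less(T, n, i, SA[j - 1])) { SA[j] = SA[j - 1]; j--; }
        SA[j] = i;
    }
}

/* PLCP in text order, assuming PLCP[i] = lcp(suffix i, its predecessor in SA). */
static void naive_plcp(const uint8_t *T, const int32_t *SA, int32_t *PLCP, int32_t n)
{
    if (n <= 0) { return; }

    /* Pass 1: temporarily store each suffix's predecessor in SA order. */
    PLCP[SA[0]] = -1;
    for (int32_t i = 1; i < n; i++) { PLCP[SA[i]] = SA[i - 1]; }

    /* Pass 2: walk the text left to right; the match length drops by at
       most one between consecutive positions, so l can be carried over. */
    int32_t l = 0;
    for (int32_t i = 0; i < n; i++) {
        int32_t p = PLCP[i];
        if (p < 0) { PLCP[i] = 0; l = 0; continue; }
        while (i + l < n && p + l < n && T[i + l] == T[p + l]) { l++; }
        PLCP[i] = l;
        if (l > 0) { l--; }
    }
}

/* LCP in suffix-array order is just PLCP permuted by SA. */
static void naive_lcp(const int32_t *PLCP, const int32_t *SA, int32_t *LCP, int32_t n)
{
    for (int32_t i = 0; i < n; i++) { LCP[i] = PLCP[SA[i]]; }
}
```

For the six-byte input `"banana"`, calling the three functions in order gives `SA = {5, 3, 1, 0, 4, 2}`, `PLCP = {0, 3, 2, 1, 0, 0}` and `LCP = {0, 1, 3, 0, 0, 2}`. With the real library, the corresponding calls would presumably be `libsais`, `libsais_plcp` and `libsais_lcp` as declared in the header, each returning 0 on success.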