├── .gitignore ├── LICENSE ├── LICENSE.cc0.md ├── LICENSE.zlib.md ├── Makefile ├── README.md ├── VS2017 ├── .gitignore ├── lz4ultra.sln ├── lz4ultra.vcxproj ├── lz4ultra.vcxproj.filters └── lz4ultra.vcxproj.user ├── Xcode └── lz4ultra.xcodeproj │ └── project.pbxproj └── src ├── dictionary.c ├── dictionary.h ├── expand_block.c ├── expand_block.h ├── expand_inmem.c ├── expand_inmem.h ├── expand_streaming.c ├── expand_streaming.h ├── format.h ├── frame.c ├── frame.h ├── lib.c ├── lib.h ├── libdivsufsort ├── .gitignore ├── CHANGELOG.md ├── CMakeLists.txt ├── CMakeModules │ ├── AppendCompilerFlags.cmake │ ├── CheckFunctionKeywords.cmake │ ├── CheckLFS.cmake │ ├── ProjectCPack.cmake │ └── cmake_uninstall.cmake.in ├── LICENSE ├── README.md ├── VERSION.cmake ├── examples │ ├── CMakeLists.txt │ ├── bwt.c │ ├── mksary.c │ ├── sasearch.c │ ├── suftest.c │ └── unbwt.c ├── include │ ├── CMakeLists.txt │ ├── config.h.cmake │ ├── divsufsort.h │ ├── divsufsort.h.cmake │ ├── divsufsort_config.h │ ├── divsufsort_private.h │ └── lfs.h.cmake ├── lib │ ├── CMakeLists.txt │ ├── divsufsort.c │ ├── divsufsort_utils.c │ ├── sssort.c │ └── trsort.c └── pkgconfig │ ├── CMakeLists.txt │ └── libdivsufsort.pc.cmake ├── lz4ultra.c ├── matchfinder.c ├── matchfinder.h ├── shrink_block.c ├── shrink_block.h ├── shrink_context.c ├── shrink_context.h ├── shrink_inmem.c ├── shrink_inmem.h ├── shrink_streaming.c ├── shrink_streaming.h ├── stream.c ├── stream.h └── xxhash ├── LICENSE.txt ├── xxhash.c └── xxhash.h /.gitignore: -------------------------------------------------------------------------------- 1 | obj 2 | lz4ultra 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The lz4ultra code is available under the Zlib license, except for src/matchfinder.c which is placed under the Creative Commons CC0 license. 
2 |
3 | Please consult LICENSE.zlib.md and LICENSE.cc0.md for more information.
4 |
--------------------------------------------------------------------------------
/LICENSE.cc0.md:
--------------------------------------------------------------------------------
1 | ## creative commons
2 |
3 | # CC0 1.0 Universal
4 |
5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.
6 |
7 | ### Statement of Purpose
8 |
9 | The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
10 |
11 | Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
12 | 13 | For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights. 14 | 15 | 1. __Copyright and Related Rights.__ A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following: 16 | 17 | i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work; 18 | 19 | ii. moral rights retained by the original author(s) and/or performer(s); 20 | 21 | iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work; 22 | 23 | iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below; 24 | 25 | v. rights protecting the extraction, dissemination, use and reuse of data in a Work; 26 | 27 | vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and 28 | 29 | vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof. 30 | 31 | 2. 
__Waiver.__ To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose. 32 | 33 | 3. __Public License Fallback.__ Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. 
In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose. 34 | 35 | 4. __Limitations and Disclaimers.__ 36 | 37 | a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document. 38 | 39 | b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law. 40 | 41 | c. 
Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work. 42 | 43 | d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work. 44 | -------------------------------------------------------------------------------- /LICENSE.zlib.md: -------------------------------------------------------------------------------- 1 | Copyright (c) 2019 Emmanuel Marty 2 | 3 | This software is provided 'as-is', without any express or implied warranty. In 4 | no event will the authors be held liable for any damages arising from the use of 5 | this software. 6 | 7 | Permission is granted to anyone to use this software for any purpose, including 8 | commercial applications, and to alter it and redistribute it freely, subject to 9 | the following restrictions: 10 | 11 | 1. The origin of this software must not be misrepresented; you must not claim 12 | that you wrote the original software. If you use this software in a product, 13 | an acknowledgment in the product documentation would be appreciated but is 14 | not required. 15 | 16 | 2. Altered source versions must be plainly marked as such, and must not be 17 | misrepresented as being the original software. 18 | 19 | 3. This notice may not be removed or altered from any source distribution. 
20 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | CC=clang 2 | CFLAGS=-O3 -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc/xxhash -Isrc 3 | OBJDIR=obj 4 | LDFLAGS= 5 | STRIP=strip 6 | 7 | $(OBJDIR)/%.o: src/../%.c 8 | @mkdir -p '$(@D)' 9 | $(CC) $(CFLAGS) -c $< -o $@ 10 | 11 | APP := lz4ultra 12 | 13 | OBJS := $(OBJDIR)/src/lz4ultra.o 14 | OBJS += $(OBJDIR)/src/dictionary.o 15 | OBJS += $(OBJDIR)/src/expand_block.o 16 | OBJS += $(OBJDIR)/src/expand_inmem.o 17 | OBJS += $(OBJDIR)/src/expand_streaming.o 18 | OBJS += $(OBJDIR)/src/frame.o 19 | OBJS += $(OBJDIR)/src/lib.o 20 | OBJS += $(OBJDIR)/src/matchfinder.o 21 | OBJS += $(OBJDIR)/src/shrink_block.o 22 | OBJS += $(OBJDIR)/src/shrink_context.o 23 | OBJS += $(OBJDIR)/src/shrink_inmem.o 24 | OBJS += $(OBJDIR)/src/shrink_streaming.o 25 | OBJS += $(OBJDIR)/src/stream.o 26 | OBJS += $(OBJDIR)/src/libdivsufsort/lib/divsufsort.o 27 | OBJS += $(OBJDIR)/src/libdivsufsort/lib/divsufsort_utils.o 28 | OBJS += $(OBJDIR)/src/libdivsufsort/lib/sssort.o 29 | OBJS += $(OBJDIR)/src/libdivsufsort/lib/trsort.o 30 | OBJS += $(OBJDIR)/src/xxhash/xxhash.o 31 | 32 | all: $(APP) 33 | 34 | $(APP): $(OBJS) 35 | @mkdir -p ../../bin/posix 36 | $(CC) $^ $(LDFLAGS) -o $(APP) 37 | $(STRIP) $(APP) 38 | 39 | clean: 40 | @rm -rf $(APP) $(OBJDIR) 41 | 42 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | lz4ultra -- Optimal LZ4 packer with faster decompression 2 | ======================================================== 3 | 4 | lz4ultra is a command-line optimal compression utility that produces compressed files in the [lz4](https://github.com/lz4/lz4) format created by Yann Collet. 
5 |
6 | The tool creates optimally compressed files, like lz4 in optimal compression mode ("lz4hc"), smallz4, blz4 and lz4x.
7 |
8 | With enwik9 (1,000,000,000 bytes):
9 |
10 |                                         Compr.size    Tokens        Decomp.time (μs, Core i7-6700)
11 |     lz4 1.9.2 -12 (favor ratio)         372,443,347   95,698,349    505,804
12 |     smallz4 1.5 -9                      371,680,328   93,172,985    348,018
13 |     lz4ultra 1.3.0 (favor ratio)        371,680,323   93,165,899    347,936
14 |     lz4 1.9.2 -12 --favor-decSpeed      377,175,400   92,080,802    457,141
15 |     lz4ultra 1.3.0 --favor-decSpeed     376,118,079   88,521,993    296,972
16 |
17 | The produced files are meant to be decompressed with the lz4 tool and library. While lz4ultra includes a decompressor, it is mostly meant to verify the output of the compressor and isn't as optimized as Yann Collet's lz4 proper.
18 |
19 | The tool defaults to 4 MB blocks with inter-block dependencies but can be configured to output all of the LZ4 block sizes (64 KB to 4 MB), to use the LZ4 8 MB blocks legacy encoding, and to compress independent blocks, using command-line switches.
20 |
21 | lz4ultra is developed by Emmanuel Marty with the help of spke.
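The block sizes mentioned above can be related to the block-size code stored in the LZ4 frame header. The following is a minimal sketch, assuming the mapping used in `src/expand_inmem.c` (`nBlockMaxBits = 8 + (nBlockMaxCode << 1)`, and 23 bits for legacy frames) and the code range 4 to 7 from the LZ4 frame format; `sketch_block_max_size` is a hypothetical helper written for illustration and is not part of lz4ultra.

```c
#include <assert.h>

/* Hypothetical helper (not in lz4ultra): maximum block size in bytes for a
 * given LZ4 frame block-size code. Assumes codes 4..7 and the formula
 * nBlockMaxBits = 8 + (nBlockMaxCode << 1); legacy frames use 23 bits. */
static unsigned int sketch_block_max_size(int nBlockMaxCode, int bLegacyFrames) {
   int nBlockMaxBits;

   /* Outside of legacy mode, only codes 4..7 are considered valid here */
   assert(bLegacyFrames || (nBlockMaxCode >= 4 && nBlockMaxCode <= 7));

   if (bLegacyFrames)
      nBlockMaxBits = 23;                        /* 8 MB legacy blocks */
   else
      nBlockMaxBits = 8 + (nBlockMaxCode << 1);  /* 16, 18, 20 or 22 bits */

   return 1u << nBlockMaxBits;
}
```

Under these assumptions, code 4 gives 64 KB, code 5 gives 256 KB, code 6 gives 1 MB and code 7 gives the 4 MB default, while legacy frames always use 8 MB.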
22 | -------------------------------------------------------------------------------- /VS2017/.gitignore: -------------------------------------------------------------------------------- 1 | .vs 2 | Debug 3 | Release 4 | bin 5 | -------------------------------------------------------------------------------- /VS2017/lz4ultra.sln: -------------------------------------------------------------------------------- 1 |  2 | Microsoft Visual Studio Solution File, Format Version 12.00 3 | # Visual Studio 15 4 | VisualStudioVersion = 15.0.28307.489 5 | MinimumVisualStudioVersion = 10.0.40219.1 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "lz4ultra", "lz4ultra.vcxproj", "{3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}" 7 | EndProject 8 | Global 9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution 10 | Debug|x64 = Debug|x64 11 | Debug|x86 = Debug|x86 12 | Release|x64 = Release|x64 13 | Release|x86 = Release|x86 14 | EndGlobalSection 15 | GlobalSection(ProjectConfigurationPlatforms) = postSolution 16 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Debug|x64.ActiveCfg = Debug|x64 17 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Debug|x64.Build.0 = Debug|x64 18 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Debug|x86.ActiveCfg = Debug|Win32 19 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Debug|x86.Build.0 = Debug|Win32 20 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Release|x64.ActiveCfg = Release|x64 21 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Release|x64.Build.0 = Release|x64 22 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Release|x86.ActiveCfg = Release|Win32 23 | {3F30FEE8-63C5-4D39-A175-EDD7EA93E9B8}.Release|x86.Build.0 = Release|Win32 24 | EndGlobalSection 25 | GlobalSection(SolutionProperties) = preSolution 26 | HideSolutionNode = FALSE 27 | EndGlobalSection 28 | GlobalSection(ExtensibilityGlobals) = postSolution 29 | SolutionGuid = {A1E1655C-AA9F-41F0-80C9-18DD0B859D7C} 30 | EndGlobalSection 31 | EndGlobal 32 | 
-------------------------------------------------------------------------------- /VS2017/lz4ultra.vcxproj.filters: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | {4FC737F1-C7A5-4376-A066-2A32D752A2FF} 6 | cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx 7 | 8 | 9 | {93995380-89BD-4b04-88EB-625FBE52EBFB} 10 | h;hh;hpp;hxx;hm;inl;inc;ipp;xsd 11 | 12 | 13 | {67DA6AB6-F800-4c08-8B7A-83BB121AAD01} 14 | rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms 15 | 16 | 17 | {a858de66-bef8-44b2-aaba-99ab69a3a806} 18 | 19 | 20 | {8ffd119e-b205-4e17-8c23-b945711b5e16} 21 | 22 | 23 | {7b58e9ea-8419-4a92-b23b-52a66da2bca3} 24 | 25 | 26 | {178a6577-0784-4aa4-8a35-9c443a088e23} 27 | 28 | 29 | 30 | 31 | Fichiers d%27en-tête 32 | 33 | 34 | Fichiers sources\libdivsufsort\include 35 | 36 | 37 | Fichiers sources\libdivsufsort\include 38 | 39 | 40 | Fichiers sources 41 | 42 | 43 | Fichiers sources\xxhash 44 | 45 | 46 | Fichiers sources 47 | 48 | 49 | Fichiers sources 50 | 51 | 52 | Fichiers sources 53 | 54 | 55 | Fichiers sources 56 | 57 | 58 | Fichiers sources 59 | 60 | 61 | Fichiers sources 62 | 63 | 64 | Fichiers sources 65 | 66 | 67 | Fichiers sources 68 | 69 | 70 | Fichiers sources 71 | 72 | 73 | Fichiers sources 74 | 75 | 76 | Fichiers sources 77 | 78 | 79 | Fichiers sources\libdivsufsort\include 80 | 81 | 82 | Fichiers sources 83 | 84 | 85 | 86 | 87 | Fichiers sources\libdivsufsort\lib 88 | 89 | 90 | Fichiers sources\libdivsufsort\lib 91 | 92 | 93 | Fichiers sources\libdivsufsort\lib 94 | 95 | 96 | Fichiers sources 97 | 98 | 99 | Fichiers sources\xxhash 100 | 101 | 102 | Fichiers sources 103 | 104 | 105 | Fichiers sources 106 | 107 | 108 | Fichiers sources 109 | 110 | 111 | Fichiers sources 112 | 113 | 114 | Fichiers sources 115 | 116 | 117 | Fichiers sources 118 | 119 | 120 | Fichiers sources 121 | 122 | 123 | Fichiers sources 124 | 125 | 126 | Fichiers sources 127 | 128 | 129 | Fichiers sources 130 
| 131 | 132 | Fichiers sources\libdivsufsort\lib 133 | 134 | 135 | Fichiers sources 136 | 137 | 138 | -------------------------------------------------------------------------------- /VS2017/lz4ultra.vcxproj.user: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | $(TargetPath) 5 | -c -v corpus/zxspectrum/graphics/bfox-dont_go_away_(2010).mg1 packed_lz4ultra/zxspectrum/graphics/bfox-dont_go_away_(2010).mg.lzs 6 | WindowsLocalDebugger 7 | $(ProjectDir)..\ 8 | 9 | 10 | $(TargetPath) 11 | -c -v corpus/zxspectrum/graphics/bfox-dont_go_away_(2010).mg1 packed_lz4ultra/zxspectrum/graphics/bfox-dont_go_away_(2010).mg.lzs 12 | WindowsLocalDebugger 13 | $(ProjectDir)..\ 14 | 15 | 16 | $(TargetPath) 17 | -c -v corpus/zxspectrum/graphics/bfox-dont_go_away_(2010).mg1 packed_lz4ultra/zxspectrum/graphics/bfox-dont_go_away_(2010).mg.lzs 18 | WindowsLocalDebugger 19 | $(ProjectDir)..\ 20 | 21 | 22 | $(TargetPath) 23 | -c -v corpus/zxspectrum/graphics/bfox-dont_go_away_(2010).mg1 packed_lz4ultra/zxspectrum/graphics/bfox-dont_go_away_(2010).mg.lzs 24 | WindowsLocalDebugger 25 | $(ProjectDir)..\ 26 | 27 | -------------------------------------------------------------------------------- /src/dictionary.c: -------------------------------------------------------------------------------- 1 | /* 2 | * dictionary.c - dictionary implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. 
If you use this software
16 |  *    in a product, an acknowledgment in the product documentation would be
17 |  *    appreciated but is not required.
18 |  * 2. Altered source versions must be plainly marked as such, and must not be
19 |  *    misrepresented as being the original software.
20 |  * 3. This notice may not be removed or altered from any source distribution.
21 |  */
22 |
23 | /*
24 |  * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
25 |  *
26 |  * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
27 |  * With help, ideas, optimizations and speed measurements by spke
28 |  * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
29 |  * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/
30 |  *
31 |  */
32 |
33 | #include <stdio.h>
34 | #include "dictionary.h"
35 | #include "format.h"
36 | #include "lib.h"
37 |
38 | /**
39 |  * Load dictionary contents
40 |  *
41 |  * @param pszDictionaryFilename name of dictionary file, or NULL for none
42 |  * @param ppDictionaryData pointer to returned dictionary contents, or NULL for none
43 |  * @param pDictionaryDataSize pointer to returned size of dictionary contents, or 0
44 |  *
45 |  * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t
46 |  */
47 | int lz4ultra_dictionary_load(const char *pszDictionaryFilename, void **ppDictionaryData, int *pDictionaryDataSize) {
48 |    unsigned char *pDictionaryData = NULL;
49 |    int nDictionaryDataSize = 0;
50 |
51 |    if (pszDictionaryFilename) {
52 |       pDictionaryData = (unsigned char *)malloc(HISTORY_SIZE);
53 |       if (!pDictionaryData) {
54 |          return LZ4ULTRA_ERROR_MEMORY;
55 |       }
56 |
57 |       FILE *f_dictionary = fopen(pszDictionaryFilename, "rb");
58 |       if (!f_dictionary) {
59 |          free(pDictionaryData);
60 |          pDictionaryData = NULL;
61 |
62 |          return LZ4ULTRA_ERROR_DICTIONARY;
63 |       }
64 |
65 |       fseek(f_dictionary, 0, SEEK_END);
66 | #ifdef _WIN32
67 |       __int64 nDictionaryFileSize =
_ftelli64(f_dictionary);
68 | #else
69 |       off_t nDictionaryFileSize = ftello(f_dictionary);
70 | #endif
71 |       if (nDictionaryFileSize > HISTORY_SIZE) {
72 |          /* Use the last HISTORY_SIZE bytes of the dictionary */
73 |          fseek(f_dictionary, -HISTORY_SIZE, SEEK_END);
74 |       }
75 |       else {
76 |          fseek(f_dictionary, 0, SEEK_SET);
77 |       }
78 |
79 |       nDictionaryDataSize = (int)fread(pDictionaryData, 1, HISTORY_SIZE, f_dictionary);
80 |       if (nDictionaryDataSize < 0)
81 |          nDictionaryDataSize = 0;
82 |
83 |       fclose(f_dictionary);
84 |       f_dictionary = NULL;
85 |    }
86 |
87 |    *ppDictionaryData = pDictionaryData;
88 |    *pDictionaryDataSize = nDictionaryDataSize;
89 |    return LZ4ULTRA_OK;
90 | }
91 |
92 | /**
93 |  * Free dictionary contents
94 |  *
95 |  * @param ppDictionaryData pointer to pointer to dictionary contents
96 |  */
97 | void lz4ultra_dictionary_free(void **ppDictionaryData) {
98 |    if (*ppDictionaryData) {
99 |       free(*ppDictionaryData);
100 |       *ppDictionaryData = NULL; /* clear the caller's pointer, not the local parameter */
101 |    }
102 | }
103 |
--------------------------------------------------------------------------------
/src/dictionary.h:
--------------------------------------------------------------------------------
1 | /*
2 |  * dictionary.h - dictionary definitions
3 |  *
4 |  * Copyright (C) 2019 Emmanuel Marty
5 |  *
6 |  * This software is provided 'as-is', without any express or implied
7 |  * warranty. In no event will the authors be held liable for any damages
8 |  * arising from the use of this software.
9 |  *
10 |  * Permission is granted to anyone to use this software for any purpose,
11 |  * including commercial applications, and to alter it and redistribute it
12 |  * freely, subject to the following restrictions:
13 |  *
14 |  * 1. The origin of this software must not be misrepresented; you must not
15 |  *    claim that you wrote the original software. If you use this software
16 |  *    in a product, an acknowledgment in the product documentation would be
17 |  *    appreciated but is not required.
18 |  * 2.
Altered source versions must be plainly marked as such, and must not be
19 |  *    misrepresented as being the original software.
20 |  * 3. This notice may not be removed or altered from any source distribution.
21 |  */
22 |
23 | /*
24 |  * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
25 |  *
26 |  * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
27 |  * With help, ideas, optimizations and speed measurements by spke
28 |  * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
29 |  * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/
30 |  *
31 |  */
32 |
33 | #ifndef _DICTIONARY_H
34 | #define _DICTIONARY_H
35 |
36 | /**
37 |  * Load dictionary contents
38 |  *
39 |  * @param pszDictionaryFilename name of dictionary file, or NULL for none
40 |  * @param ppDictionaryData pointer to returned dictionary contents, or NULL for none
41 |  * @param pDictionaryDataSize pointer to returned size of dictionary contents, or 0
42 |  *
43 |  * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t
44 |  */
45 | int lz4ultra_dictionary_load(const char *pszDictionaryFilename, void **ppDictionaryData, int *pDictionaryDataSize);
46 |
47 | /**
48 |  * Free dictionary contents
49 |  *
50 |  * @param ppDictionaryData pointer to pointer to dictionary contents
51 |  */
52 | void lz4ultra_dictionary_free(void **ppDictionaryData);
53 |
54 | #endif /* _DICTIONARY_H */
55 |
--------------------------------------------------------------------------------
/src/expand_block.c:
--------------------------------------------------------------------------------
1 | /*
2 |  * expand_block.c - block decompressor implementation
3 |  *
4 |  * Copyright (C) 2019 Emmanuel Marty
5 |  *
6 |  * This software is provided 'as-is', without any express or implied
7 |  * warranty. In no event will the authors be held liable for any damages
8 |  * arising from the use of this software.
9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | /* This code is mostly here to verify the compressor's output. You should use the real, optimized lz4 decompressor to decompress your data. 
*/
34 |
35 | #include <stdlib.h>
36 | #include <string.h>
37 | #include "format.h"
38 | #include "expand_block.h"
39 |
40 | #if defined(__GNUC__) || defined(__clang__)
41 | #define likely(x) __builtin_expect(!!(x), 1)
42 | #define unlikely(x) __builtin_expect(!!(x), 0)
43 | #else
44 | #define likely(x) (x)
45 | #define unlikely(x) (x)
46 | #endif
47 |
48 | #define LZ4ULTRA_DECOMPRESSOR_BUILD_LEN(__len) { \
49 |    unsigned int byte; \
50 |    do { \
51 |       if (unlikely(pInBlock >= pInBlockEnd)) return -1; \
52 |       byte = (unsigned int)*pInBlock++; \
53 |       __len += byte; \
54 |    } while (unlikely(byte == 255)); \
55 | }
56 |
57 | /**
58 |  * Decompress one data block
59 |  *
60 |  * @param pInBlock pointer to compressed data
61 |  * @param nBlockSize size of compressed data, in bytes
62 |  * @param pOutData pointer to output decompression buffer (previously decompressed bytes + room for decompressing this block)
63 |  * @param nOutDataOffset starting index of where to store decompressed bytes in output buffer (and size of previously decompressed bytes)
64 |  * @param nBlockMaxSize total size of output decompression buffer, in bytes
65 |  *
66 |  * @return size of decompressed data in bytes, or -1 for error
67 |  */
68 | int lz4ultra_decompressor_expand_block(const unsigned char *pInBlock, int nBlockSize, unsigned char *pOutData, int nOutDataOffset, int nBlockMaxSize) {
69 |    const unsigned char *pInBlockEnd = pInBlock + nBlockSize;
70 |    unsigned char *pCurOutData = pOutData + nOutDataOffset;
71 |    const unsigned char *pOutDataEnd = pCurOutData + nBlockMaxSize;
72 |    const unsigned char *pOutDataFastEnd = pOutDataEnd - 18;
73 |
74 |    while (likely(pInBlock < pInBlockEnd)) {
75 |       const unsigned int token = (unsigned int)*pInBlock++;
76 |       unsigned int nLiterals = ((token & 0xf0) >> 4);
77 |
78 |       if (nLiterals != LITERALS_RUN_LEN && pCurOutData <= pOutDataFastEnd && (pInBlock + 16) <= pInBlockEnd) {
79 |          memcpy(pCurOutData, pInBlock, 16);
80 |       }
81 |       else {
82 |          if (likely(nLiterals == LITERALS_RUN_LEN))
83 |
LZ4ULTRA_DECOMPRESSOR_BUILD_LEN(nLiterals); 84 | 85 | if (unlikely((pInBlock + nLiterals) > pInBlockEnd)) return -1; 86 | if (unlikely((pCurOutData + nLiterals) > pOutDataEnd)) return -1; 87 | 88 | memcpy(pCurOutData, pInBlock, nLiterals); 89 | } 90 | 91 | pInBlock += nLiterals; 92 | pCurOutData += nLiterals; 93 | 94 | if (likely((pInBlock + 2) <= pInBlockEnd)) { 95 | unsigned int nMatchOffset; 96 | 97 | nMatchOffset = (unsigned int)*pInBlock++; 98 | nMatchOffset |= ((unsigned int)*pInBlock++) << 8; 99 | 100 | unsigned int nMatchLen = (token & 0x0f); 101 | 102 | nMatchLen += MIN_MATCH_SIZE; 103 | if (nMatchLen != (MATCH_RUN_LEN + MIN_MATCH_SIZE) && nMatchOffset >= 8 && pCurOutData <= pOutDataFastEnd) { 104 | const unsigned char *pSrc = pCurOutData - nMatchOffset; 105 | 106 | if (unlikely(pSrc < pOutData)) return -1; 107 | 108 | memcpy(pCurOutData, pSrc, 8); 109 | memcpy(pCurOutData + 8, pSrc + 8, 8); 110 | memcpy(pCurOutData + 16, pSrc + 16, 2); 111 | 112 | pCurOutData += nMatchLen; 113 | } 114 | else { 115 | if (likely(nMatchLen == (MATCH_RUN_LEN + MIN_MATCH_SIZE))) 116 | LZ4ULTRA_DECOMPRESSOR_BUILD_LEN(nMatchLen); 117 | 118 | if (unlikely((pCurOutData + nMatchLen) > pOutDataEnd)) return -1; 119 | 120 | const unsigned char *pSrc = pCurOutData - nMatchOffset; 121 | if (unlikely(pSrc < pOutData)) return -1; 122 | 123 | if (nMatchOffset >= 16 && (pCurOutData + nMatchLen) <= pOutDataFastEnd) { 124 | const unsigned char *pCopySrc = pSrc; 125 | unsigned char *pCopyDst = pCurOutData; 126 | const unsigned char *pCopyEndDst = pCurOutData + nMatchLen; 127 | 128 | do { 129 | memcpy(pCopyDst, pCopySrc, 16); 130 | pCopySrc += 16; 131 | pCopyDst += 16; 132 | } while (pCopyDst < pCopyEndDst); 133 | 134 | pCurOutData += nMatchLen; 135 | } 136 | else { 137 | while (nMatchLen--) { 138 | *pCurOutData++ = *pSrc++; 139 | } 140 | } 141 | } 142 | } 143 | } 144 | 145 | return (int)(pCurOutData - (pOutData + nOutDataOffset)); 146 | } 147 | 
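The decompressor in `src/expand_block.c` splits each token into a literals count (high nibble) and a match length (low nibble), and extends either value with extra bytes when the nibble is saturated. The sketch below isolates those two steps under stated assumptions: the `SKETCH_*` constants stand in for `LITERALS_RUN_LEN`, `MATCH_RUN_LEN` and `MIN_MATCH_SIZE` from `format.h` (not shown here; the values 15 and 4 are taken from the LZ4 block format), and both functions are hypothetical helpers, not part of lz4ultra.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values from the LZ4 block format; format.h is expected to define
 * equivalents, but it is not shown in this listing. */
#define SKETCH_RUN_LEN   15   /* nibble value that signals extra length bytes */
#define SKETCH_MIN_MATCH 4    /* shortest encodable match */

/* Hypothetical helper: split a token into its literals count (high nibble)
 * and its match length (low nibble plus the minimum match size). */
static void sketch_split_token(unsigned int token, unsigned int *pLiterals, unsigned int *pMatchLen) {
   assert(pLiterals && pMatchLen);
   *pLiterals = (token & 0xf0) >> 4;
   *pMatchLen = (token & 0x0f) + SKETCH_MIN_MATCH;
}

/* Hypothetical helper: add extra length bytes to *pLen until a byte other
 * than 255 is read, as LZ4ULTRA_DECOMPRESSOR_BUILD_LEN does.
 * Returns the number of bytes consumed, or -1 if the input ends early. */
static int sketch_read_extra_len(const unsigned char *pIn, size_t nAvailable, unsigned int *pLen) {
   size_t nConsumed = 0;
   unsigned int byte;

   assert(pIn && pLen);
   do {
      if (nConsumed >= nAvailable) return -1;
      byte = pIn[nConsumed++];
      *pLen += byte;
   } while (byte == 255);

   return (int)nConsumed;
}
```

A caller would invoke `sketch_read_extra_len` only when the literals nibble equals `SKETCH_RUN_LEN`, or when the match length equals `SKETCH_RUN_LEN + SKETCH_MIN_MATCH`, which mirrors the conditions tested in the decompressor above.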
-------------------------------------------------------------------------------- /src/expand_block.h: -------------------------------------------------------------------------------- 1 | /* 2 | * expand_block.h - block decompressor definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _EXPAND_BLOCK_H 34 | #define _EXPAND_BLOCK_H 35 | 36 | /** 37 | * Decompress one data block 38 | * 39 | * @param pInBlock pointer to compressed data 40 | * @param nBlockSize size of compressed data, in bytes 41 | * @param pOutData pointer to output decompression buffer (previously decompressed bytes + room for decompressing this block) 42 | * @param nOutDataOffset starting index of where to store decompressed bytes in output buffer (and size of previously decompressed bytes) 43 | * @param nBlockMaxSize total size of output decompression buffer, in bytes 44 | * 45 | * @return size of decompressed data in bytes, or -1 for error 46 | */ 47 | int lz4ultra_decompressor_expand_block(const unsigned char *pInBlock, int nBlockSize, unsigned char *pOutData, int nOutDataOffset, int nBlockMaxSize); 48 | 49 | #endif /* _EXPAND_BLOCK_H */ 50 | -------------------------------------------------------------------------------- /src/expand_inmem.c: -------------------------------------------------------------------------------- 1 | /* 2 | * expand_inmem.c - in-memory decompression implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. 
Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include <stdlib.h> 34 | #include <string.h> 35 | #include "expand_inmem.h" 36 | #include "lib.h" 37 | #include "frame.h" 38 | 39 | /** 40 | * Get maximum decompressed size of compressed data 41 | * 42 | * @param pFileData compressed data 43 | * @param nFileSize compressed size in bytes 44 | * 45 | * @return maximum decompressed size, or -1 for error 46 | */ 47 | size_t lz4ultra_inmem_get_max_decompressed_size(const unsigned char *pFileData, size_t nFileSize) { 48 | const unsigned char *pCurFileData = pFileData; 49 | const unsigned char *pEndFileData = pCurFileData + nFileSize; 50 | int nBlockMaxCode = 0; 51 | unsigned int nFlags = 0; 52 | int nBlockMaxBits, nBlockMaxSize; 53 | size_t nMaxDecompressedSize = 0; 54 | 55 | /* Check header */ 56 | if ((pCurFileData + LZ4ULTRA_HEADER_SIZE) > pEndFileData) 57 | return -1; 58 | 59 | int nExtraHeaderSize = lz4ultra_check_header(pCurFileData, LZ4ULTRA_HEADER_SIZE); 60 | if (nExtraHeaderSize < 0) 61 | return -1; 62 | 63 | if (((pCurFileData + LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize) > pEndFileData) || 64 | lz4ultra_decode_header(pCurFileData, LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize, &nBlockMaxCode, &nFlags) != LZ4ULTRA_DECODE_OK) 65 | return -1; 66 | 67 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) 68 | nBlockMaxBits = 23; 69 | else 70 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 71 | nBlockMaxSize = 1 <<
nBlockMaxBits; 72 | 73 | pCurFileData += (LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize); 74 | 75 | while (pCurFileData < pEndFileData) { 76 | unsigned int nBlockDataSize = 0; 77 | int nIsUncompressed = 0; 78 | 79 | /* Decode frame header */ 80 | if ((pCurFileData + LZ4ULTRA_FRAME_SIZE) > pEndFileData || 81 | lz4ultra_decode_frame(pCurFileData, LZ4ULTRA_FRAME_SIZE, nFlags, &nBlockDataSize, &nIsUncompressed) != LZ4ULTRA_DECODE_OK) 82 | return -1; 83 | pCurFileData += LZ4ULTRA_FRAME_SIZE; 84 | 85 | if (!nBlockDataSize) 86 | break; 87 | 88 | /* Add one potentially full block to the decompressed size */ 89 | nMaxDecompressedSize += nBlockMaxSize; 90 | 91 | if ((pCurFileData + nBlockDataSize) > pEndFileData) 92 | return -1; 93 | 94 | pCurFileData += nBlockDataSize; 95 | } 96 | 97 | return nMaxDecompressedSize; 98 | } 99 | 100 | /** 101 | * Decompress data in memory 102 | * 103 | * @param pFileData compressed data 104 | * @param pOutBuffer buffer for decompressed data 105 | * @param nFileSize compressed size in bytes 106 | * @param nMaxOutBufferSize maximum capacity of decompression buffer 107 | * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx) 108 | * 109 | * @return actual decompressed size, or -1 for error 110 | */ 111 | size_t lz4ultra_decompress_inmem(const unsigned char *pFileData, unsigned char *pOutBuffer, size_t nFileSize, size_t nMaxOutBufferSize, unsigned int nFlags) { 112 | const unsigned char *pCurFileData = pFileData; 113 | const unsigned char *pEndFileData = pCurFileData + nFileSize; 114 | unsigned char *pCurOutBuffer = pOutBuffer; 115 | const unsigned char *pEndOutBuffer = pCurOutBuffer + nMaxOutBufferSize; 116 | int nBlockMaxCode = 0; 117 | int nBlockMaxBits, nBlockMaxSize, nPreviousBlockSize; 118 | 119 | if (nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) { 120 | return (size_t)lz4ultra_decompressor_expand_block(pFileData, (int)nFileSize - 2 /* EOD marker */, pOutBuffer, 0, (int)nMaxOutBufferSize); 121 | } 122 | 123 | /* Check header */ 124 | if ((pCurFileData + 
LZ4ULTRA_HEADER_SIZE) > pEndFileData) 125 | return -1; 126 | 127 | int nExtraHeaderSize = lz4ultra_check_header(pCurFileData, LZ4ULTRA_HEADER_SIZE); 128 | if (nExtraHeaderSize < 0) 129 | return -1; 130 | 131 | if (((pCurFileData + LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize) > pEndFileData) || 132 | lz4ultra_decode_header(pCurFileData, LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize, &nBlockMaxCode, &nFlags) != LZ4ULTRA_DECODE_OK) 133 | return -1; 134 | 135 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) 136 | nBlockMaxBits = 23; 137 | else 138 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 139 | nBlockMaxSize = 1 << nBlockMaxBits; 140 | 141 | pCurFileData += (LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize); 142 | nPreviousBlockSize = 0; 143 | 144 | while (pCurFileData < pEndFileData) { 145 | unsigned int nBlockDataSize = 0; 146 | int nIsUncompressed = 0; 147 | 148 | /* Decode frame header */ 149 | if ((pCurFileData + LZ4ULTRA_FRAME_SIZE) > pEndFileData || 150 | lz4ultra_decode_frame(pCurFileData, LZ4ULTRA_FRAME_SIZE, nFlags, &nBlockDataSize, &nIsUncompressed) != LZ4ULTRA_DECODE_OK) 151 | return -1; 152 | pCurFileData += LZ4ULTRA_FRAME_SIZE; 153 | 154 | if (!nBlockDataSize) 155 | break; 156 | 157 | if (!nIsUncompressed) { 158 | int nDecompressedSize; 159 | 160 | /* Decompress block */ 161 | if ((pCurFileData + nBlockDataSize) > pEndFileData) 162 | return -1; 163 | 164 | if ((nFlags & LZ4ULTRA_FLAG_INDEP_BLOCKS) || (nPreviousBlockSize == 0)) 165 | nDecompressedSize = lz4ultra_decompressor_expand_block(pCurFileData, nBlockDataSize, pCurOutBuffer, 0, (int)(pEndOutBuffer - pCurOutBuffer)); 166 | else 167 | nDecompressedSize = lz4ultra_decompressor_expand_block(pCurFileData, nBlockDataSize, pCurOutBuffer - nPreviousBlockSize, nPreviousBlockSize, (int)(pEndOutBuffer - pCurOutBuffer + nPreviousBlockSize)); 168 | if (nDecompressedSize < 0) 169 | return -1; 170 | 171 | pCurOutBuffer += nDecompressedSize; 172 | nPreviousBlockSize = nDecompressedSize; 173 | } 174 | else { 175 | /* Copy uncompressed 
block */ 176 | if ((pCurFileData + nBlockDataSize) > pEndFileData) 177 | return -1; 178 | if ((pCurOutBuffer + nBlockDataSize) > pEndOutBuffer) 179 | return -1; 180 | memcpy(pCurOutBuffer, pCurFileData, nBlockDataSize); 181 | pCurOutBuffer += nBlockDataSize; 182 | } 183 | 184 | pCurFileData += nBlockDataSize; 185 | } 186 | 187 | return (int)(pCurOutBuffer - pOutBuffer); 188 | } 189 | -------------------------------------------------------------------------------- /src/expand_inmem.h: -------------------------------------------------------------------------------- 1 | /* 2 | * expand_inmem.h - in-memory decompression definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _EXPAND_INMEM_H 34 | #define _EXPAND_INMEM_H 35 | 36 | #include <stdlib.h> 37 | 38 | /** 39 | * Get maximum decompressed size of compressed data 40 | * 41 | * @param pFileData compressed data 42 | * @param nFileSize compressed size in bytes 43 | * 44 | * @return maximum decompressed size, or -1 for error 45 | */ 46 | size_t lz4ultra_inmem_get_max_decompressed_size(const unsigned char *pFileData, size_t nFileSize); 47 | 48 | /** 49 | * Decompress data in memory 50 | * 51 | * @param pFileData compressed data 52 | * @param pOutBuffer buffer for decompressed data 53 | * @param nFileSize compressed size in bytes 54 | * @param nMaxOutBufferSize maximum capacity of decompression buffer 55 | * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx) 56 | * 57 | * @return actual decompressed size, or -1 for error 58 | */ 59 | size_t lz4ultra_decompress_inmem(const unsigned char *pFileData, unsigned char *pOutBuffer, size_t nFileSize, size_t nMaxOutBufferSize, unsigned int nFlags); 60 | 61 | #endif /* _EXPAND_INMEM_H */ 62 | -------------------------------------------------------------------------------- /src/expand_streaming.c: -------------------------------------------------------------------------------- 1 | /* 2 | * expand_streaming.c - streaming decompression implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software.
If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include <stdlib.h> 34 | #include <string.h> 35 | #include "expand_streaming.h" 36 | #include "format.h" 37 | #include "frame.h" 38 | #include "lib.h" 39 | 40 | /*-------------- File API -------------- */ 41 | 42 | /** 43 | * Decompress file 44 | * 45 | * @param pszInFilename name of input(compressed) file to decompress 46 | * @param pszOutFilename name of output(decompressed) file to generate 47 | * @param pszDictionaryFilename name of dictionary file, or NULL for none 48 | * @param nFlags compression flags (LZ4ULTRA_FLAG_RAW_BLOCK to decompress a raw block, or 0) 49 | * @param pOriginalSize pointer to returned output(decompressed) size, updated when this function is successful 50 | * @param pCompressedSize pointer to returned input(compressed) size, updated when this function is successful 51 | * 52 | * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t 53 | */ 54 | lz4ultra_status_t lz4ultra_decompress_file(const char *pszInFilename, const char *pszOutFilename, const char *pszDictionaryFilename, const unsigned int nFlags, 55 | long long *pOriginalSize, long long *pCompressedSize) { 56 | lz4ultra_stream_t inStream, outStream; 57 | void *pDictionaryData =
NULL; 58 | int nDictionaryDataSize = 0; 59 | lz4ultra_status_t nStatus; 60 | 61 | if (lz4ultra_filestream_open(&inStream, pszInFilename, "rb") < 0) { 62 | return LZ4ULTRA_ERROR_SRC; 63 | } 64 | 65 | if (lz4ultra_filestream_open(&outStream, pszOutFilename, "wb") < 0) { 66 | inStream.close(&inStream); 67 | return LZ4ULTRA_ERROR_DST; 68 | } 69 | 70 | nStatus = lz4ultra_dictionary_load(pszDictionaryFilename, &pDictionaryData, &nDictionaryDataSize); 71 | if (nStatus) { 72 | outStream.close(&outStream); 73 | inStream.close(&inStream); 74 | 75 | return nStatus; 76 | } 77 | 78 | nStatus = lz4ultra_decompress_stream(&inStream, &outStream, pDictionaryData, nDictionaryDataSize, nFlags, pOriginalSize, pCompressedSize); 79 | 80 | lz4ultra_dictionary_free(&pDictionaryData); 81 | outStream.close(&outStream); 82 | inStream.close(&inStream); 83 | 84 | return nStatus; 85 | } 86 | 87 | /*-------------- Streaming API -------------- */ 88 | 89 | /** 90 | * Decompress stream 91 | * 92 | * @param pInStream input(compressed) stream to decompress 93 | * @param pOutStream output(decompressed) stream to write to 94 | * @param pDictionaryData dictionary contents, or NULL for none 95 | * @param nDictionaryDataSize size of dictionary contents, or 0 96 | * @param pOriginalSize pointer to returned output(decompressed) size, updated when this function is successful 97 | * @param pCompressedSize pointer to returned input(compressed) size, updated when this function is successful 98 | * 99 | * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t 100 | */ 101 | lz4ultra_status_t lz4ultra_decompress_stream(lz4ultra_stream_t *pInStream, lz4ultra_stream_t *pOutStream, const void *pDictionaryData, int nDictionaryDataSize, unsigned int nFlags, 102 | long long *pOriginalSize, long long *pCompressedSize) { 103 | long long nOriginalSize = 0LL; 104 | long long nCompressedSize = 0LL; 105 | int nBlockMaxCode = 7; 106 | unsigned char cFrameData[16]; 107 | unsigned char *pInBlock; 108 | 
unsigned char *pOutData; 109 | 110 | if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) == 0) { 111 | memset(cFrameData, 0, 16); 112 | 113 | if (pInStream->read(pInStream, cFrameData, LZ4ULTRA_HEADER_SIZE) != LZ4ULTRA_HEADER_SIZE) { 114 | return LZ4ULTRA_ERROR_SRC; 115 | } 116 | 117 | int nExtraHeaderSize = lz4ultra_check_header(cFrameData, LZ4ULTRA_HEADER_SIZE); 118 | if (nExtraHeaderSize < 0) 119 | return LZ4ULTRA_ERROR_FORMAT; 120 | 121 | if (pInStream->read(pInStream, cFrameData + LZ4ULTRA_HEADER_SIZE, nExtraHeaderSize) != nExtraHeaderSize) { 122 | return LZ4ULTRA_ERROR_SRC; 123 | } 124 | 125 | int nSuccess = lz4ultra_decode_header(cFrameData, LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize, &nBlockMaxCode, &nFlags); 126 | if (nSuccess < 0) { 127 | if (nSuccess == LZ4ULTRA_DECODE_ERR_SUM) 128 | return LZ4ULTRA_ERROR_CHECKSUM; 129 | else 130 | return LZ4ULTRA_ERROR_FORMAT; 131 | } 132 | 133 | nCompressedSize += (long long)(LZ4ULTRA_HEADER_SIZE + nExtraHeaderSize); 134 | } 135 | 136 | int nBlockMaxBits; 137 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) 138 | nBlockMaxBits = 23; 139 | else 140 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 141 | int nBlockMaxSize = 1 << nBlockMaxBits; 142 | 143 | pInBlock = (unsigned char*)malloc(nBlockMaxSize); 144 | if (!pInBlock) { 145 | return LZ4ULTRA_ERROR_MEMORY; 146 | } 147 | 148 | pOutData = (unsigned char*)malloc(nBlockMaxSize + HISTORY_SIZE); 149 | if (!pOutData) { 150 | free(pInBlock); 151 | pInBlock = NULL; 152 | 153 | return LZ4ULTRA_ERROR_MEMORY; 154 | } 155 | 156 | int nDecompressionError = 0; 157 | int nPrevDecompressedSize = 0; 158 | int nNumBlocks = 0; 159 | 160 | while (!pInStream->eof(pInStream) && !nDecompressionError) { 161 | unsigned int nBlockSize = 0; 162 | int nIsUncompressed = 0; 163 | 164 | if (nPrevDecompressedSize != 0) { 165 | memcpy(pOutData + HISTORY_SIZE - nPrevDecompressedSize, pOutData + HISTORY_SIZE + (nBlockMaxSize - nPrevDecompressedSize), nPrevDecompressedSize); 166 | } 167 | else if (nDictionaryDataSize != 0) { 
168 | memcpy(pOutData + HISTORY_SIZE - nDictionaryDataSize, pDictionaryData, nDictionaryDataSize); 169 | nPrevDecompressedSize = nDictionaryDataSize; 170 | 171 | if (!(nFlags & LZ4ULTRA_FLAG_INDEP_BLOCKS)) 172 | nDictionaryDataSize = 0; 173 | } 174 | 175 | if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) == 0) { 176 | memset(cFrameData, 0, 16); 177 | if (pInStream->read(pInStream, cFrameData, LZ4ULTRA_FRAME_SIZE) == LZ4ULTRA_FRAME_SIZE) { 178 | int nSuccess = lz4ultra_decode_frame(cFrameData, LZ4ULTRA_FRAME_SIZE, nFlags, &nBlockSize, &nIsUncompressed); 179 | if (nSuccess < 0) 180 | nBlockSize = 0; 181 | 182 | nCompressedSize += (long long)LZ4ULTRA_FRAME_SIZE; 183 | } 184 | else { 185 | nBlockSize = 0; 186 | } 187 | } 188 | else { 189 | if (!nNumBlocks) 190 | nBlockSize = nBlockMaxSize; 191 | else 192 | nBlockSize = 0; 193 | } 194 | 195 | if (nBlockSize != 0) { 196 | int nDecompressedSize = 0; 197 | 198 | if ((int)nBlockSize > nBlockMaxSize) { 199 | nDecompressionError = LZ4ULTRA_ERROR_FORMAT; 200 | break; 201 | } 202 | size_t nReadBytes = pInStream->read(pInStream, pInBlock, nBlockSize); 203 | if (nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) { 204 | if (nReadBytes > 2) 205 | nReadBytes -= 2; 206 | else 207 | nReadBytes = 0; 208 | nBlockSize = (unsigned int)nReadBytes; 209 | } 210 | 211 | if (nReadBytes == nBlockSize) { 212 | nCompressedSize += (long long)nReadBytes; 213 | 214 | if (nIsUncompressed) { 215 | memcpy(pOutData + HISTORY_SIZE, pInBlock, nBlockSize); 216 | nDecompressedSize = nBlockSize; 217 | } 218 | else { 219 | nDecompressedSize = lz4ultra_decompressor_expand_block(pInBlock, nBlockSize, pOutData, HISTORY_SIZE, nBlockMaxSize); 220 | if (nDecompressedSize < 0) { 221 | nDecompressionError = LZ4ULTRA_ERROR_DECOMPRESSION; 222 | break; 223 | } 224 | } 225 | 226 | if (nDecompressedSize != 0) { 227 | nOriginalSize += (long long)nDecompressedSize; 228 | 229 | if (pOutStream->write(pOutStream, pOutData + HISTORY_SIZE, nDecompressedSize) != nDecompressedSize) 230 | 
nDecompressionError = LZ4ULTRA_ERROR_DST; 231 | 232 | if (!(nFlags & LZ4ULTRA_FLAG_INDEP_BLOCKS)) { 233 | nPrevDecompressedSize = nDecompressedSize; 234 | if (nPrevDecompressedSize > HISTORY_SIZE) 235 | nPrevDecompressedSize = HISTORY_SIZE; 236 | } 237 | else { 238 | nPrevDecompressedSize = 0; 239 | } 240 | nDecompressedSize = 0; 241 | } 242 | } 243 | else { 244 | break; 245 | } 246 | 247 | nNumBlocks++; 248 | } 249 | else { 250 | break; 251 | } 252 | } 253 | 254 | free(pOutData); 255 | pOutData = NULL; 256 | 257 | free(pInBlock); 258 | pInBlock = NULL; 259 | 260 | *pOriginalSize = nOriginalSize; 261 | *pCompressedSize = nCompressedSize; 262 | return nDecompressionError; 263 | } 264 | -------------------------------------------------------------------------------- /src/expand_streaming.h: -------------------------------------------------------------------------------- 1 | /* 2 | * expand_streaming.h - streaming decompression definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 
21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _EXPAND_STREAMING_H 34 | #define _EXPAND_STREAMING_H 35 | 36 | #include "stream.h" 37 | 38 | /* Forward declaration */ 39 | typedef enum _lz4ultra_status_t lz4ultra_status_t; 40 | 41 | /*-------------- File API -------------- */ 42 | 43 | /** 44 | * Decompress file 45 | * 46 | * @param pszInFilename name of input(compressed) file to decompress 47 | * @param pszOutFilename name of output(decompressed) file to generate 48 | * @param pszDictionaryFilename name of dictionary file, or NULL for none 49 | * @param nFlags compression flags (LZ4ULTRA_FLAG_RAW_BLOCK to decompress a raw block, or 0) 50 | * @param pOriginalSize pointer to returned output(decompressed) size, updated when this function is successful 51 | * @param pCompressedSize pointer to returned input(compressed) size, updated when this function is successful 52 | * 53 | * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t 54 | */ 55 | lz4ultra_status_t lz4ultra_decompress_file(const char *pszInFilename, const char *pszOutFilename, const char *pszDictionaryFilename, const unsigned int nFlags, 56 | long long *pOriginalSize, long long *pCompressedSize); 57 | 58 | /*-------------- Streaming API -------------- */ 59 | 60 | /** 61 | * Decompress stream 62 | * 63 | * @param pInStream input(compressed) stream to decompress 64 | * @param pOutStream output(decompressed) stream to write to 65 | * @param pDictionaryData dictionary contents, or NULL for none 66 | * @param nDictionaryDataSize size of dictionary contents, or 0 67 | * 
@param nFlags compression flags (LZ4ULTRA_FLAG_RAW_BLOCK to decompress a raw block, or 0) 68 | * @param pOriginalSize pointer to returned output(decompressed) size, updated when this function is successful 69 | * @param pCompressedSize pointer to returned input(compressed) size, updated when this function is successful 70 | * 71 | * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t 72 | */ 73 | lz4ultra_status_t lz4ultra_decompress_stream(lz4ultra_stream_t *pInStream, lz4ultra_stream_t *pOutStream, const void *pDictionaryData, int nDictionaryDataSize, unsigned int nFlags, 74 | long long *pOriginalSize, long long *pCompressedSize); 75 | 76 | #endif /* _EXPAND_STREAMING_H */ 77 | -------------------------------------------------------------------------------- /src/format.h: -------------------------------------------------------------------------------- 1 | /* 2 | * format.h - byte stream format definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 
21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _FORMAT_H 34 | #define _FORMAT_H 35 | 36 | #define MIN_MATCH_SIZE 4 37 | #define MIN_OFFSET 1 38 | #define MAX_OFFSET 0xffff 39 | #define HISTORY_SIZE 65536 40 | #define LITERALS_RUN_LEN 15 41 | #define MATCH_RUN_LEN 15 42 | 43 | #endif /* _FORMAT_H */ 44 | -------------------------------------------------------------------------------- /src/frame.c: -------------------------------------------------------------------------------- 1 | /* 2 | * frame.c - lz4 frame implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 
21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include "frame.h" 34 | #include "lib.h" 35 | #include "xxhash.h" 36 | 37 | /** 38 | * Encode compressed stream header 39 | * 40 | * @param pFrameData encoding buffer 41 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 42 | * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx) 43 | * @param nBlockMaxCode max block size code (4-7) 44 | * 45 | * @return number of encoded bytes, or -1 for failure 46 | */ 47 | int lz4ultra_encode_header(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, int nBlockMaxCode) { 48 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) { 49 | if (nMaxFrameDataSize >= 4) { 50 | pFrameData[0] = 0x02; /* Legacy magic number: 0x184C2102 */ 51 | pFrameData[1] = 0x21; 52 | pFrameData[2] = 0x4C; 53 | pFrameData[3] = 0x18; 54 | 55 | return 4; 56 | } 57 | else { 58 | return LZ4ULTRA_ENCODE_ERR; 59 | } 60 | } 61 | else { 62 | if (nMaxFrameDataSize >= 7) { 63 | pFrameData[0] = 0x04; /* Magic number: 0x184D2204 */ 64 | pFrameData[1] = 0x22; 65 | pFrameData[2] = 0x4D; 66 | pFrameData[3] = 0x18; 67 | 68 | pFrameData[4] = 0b01000000; /* Version.Hi Version.Lo !B.Indep B.Checksum Content.Size Content.Checksum Reserved.Hi Reserved.Lo */ 69 | if (nFlags & LZ4ULTRA_FLAG_INDEP_BLOCKS) 70 | pFrameData[4] |= 0b00100000; /* B.Indep */ 71 | pFrameData[5] = nBlockMaxCode << 4; /* Block MaxSize */ 72 | 73 | XXH32_hash_t headerSum = XXH32(pFrameData + 4, 2, 0); 74 | pFrameData[6] = (headerSum >> 8) & 0xff; /* Header checksum */ 75 | 76 | return 7; 77 | } 78 | else { 79 | return
LZ4ULTRA_ENCODE_ERR; 80 | } 81 | } 82 | } 83 | 84 | /** 85 | * Encode compressed block frame header 86 | * 87 | * @param pFrameData encoding buffer 88 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 89 | * @param nFlags compression flags 90 | * @param nBlockDataSize compressed block's data size, in bytes 91 | * 92 | * @return number of encoded bytes, or -1 for failure 93 | */ 94 | int lz4ultra_encode_compressed_block_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, const int nBlockDataSize) { 95 | if (nMaxFrameDataSize >= 4 && (nBlockDataSize & 0x80000000) == 0) { 96 | pFrameData[0] = nBlockDataSize & 0xff; 97 | pFrameData[1] = (nBlockDataSize >> 8) & 0xff; 98 | pFrameData[2] = (nBlockDataSize >> 16) & 0xff; 99 | pFrameData[3] = (nBlockDataSize >> 24) & 0x7f; /* Compressed block */ 100 | return 4; 101 | } 102 | else { 103 | return LZ4ULTRA_ENCODE_ERR; 104 | } 105 | } 106 | 107 | /** 108 | * Encode uncompressed block frame header 109 | * 110 | * @param pFrameData encoding buffer 111 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 112 | * @param nFlags compression flags 113 | * @param nBlockDataSize uncompressed block's data size, in bytes 114 | * 115 | * @return number of encoded bytes, or -1 for failure 116 | */ 117 | int lz4ultra_encode_uncompressed_block_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, const int nBlockDataSize) { 118 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) 119 | return LZ4ULTRA_ERROR_RAW_UNCOMPRESSED; 120 | 121 | if (nMaxFrameDataSize >= 4 && (nBlockDataSize & 0x80000000) == 0) { 122 | pFrameData[0] = nBlockDataSize & 0xff; 123 | pFrameData[1] = (nBlockDataSize >> 8) & 0xff; 124 | pFrameData[2] = (nBlockDataSize >> 16) & 0xff; 125 | pFrameData[3] = ((nBlockDataSize >> 24) & 0x7f) | 0x80; /* Uncompressed block */ 126 | return 4; 127 | } 128 | else { 129 | return LZ4ULTRA_ENCODE_ERR; 130 | } 131 | } 132 | 133 | /** 134 | * Encode 
terminal frame header 135 | * 136 | * @param pFrameData encoding buffer 137 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 138 | * @param nFlags compression flags 139 | * 140 | * @return number of encoded bytes, or -1 for failure 141 | */ 142 | int lz4ultra_encode_footer_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags) { 143 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) 144 | return 0; 145 | 146 | if (nMaxFrameDataSize >= 4) { 147 | pFrameData[0] = 0x00; /* EOD frame */ 148 | pFrameData[1] = 0x00; 149 | pFrameData[2] = 0x00; 150 | pFrameData[3] = 0x00; 151 | return 4; 152 | } 153 | else { 154 | return LZ4ULTRA_ENCODE_ERR; 155 | } 156 | } 157 | 158 | /** 159 | * Check compressed stream header 160 | * 161 | * @param pFrameData data bytes 162 | * @param nFrameDataSize number of bytes to check 163 | * 164 | * @return the number of extra header bytes to read for decoding, or LZ4ULTRA_DECODE_ERR_xxx for failure 165 | */ 166 | int lz4ultra_check_header(const unsigned char *pFrameData, const int nFrameDataSize) { 167 | if (nFrameDataSize == 4) { 168 | if (pFrameData[0] == 0x04 && 169 | pFrameData[1] == 0x22 && 170 | pFrameData[2] == 0x4D && 171 | pFrameData[3] == 0x18) { 172 | /* LZ4 magic number */ 173 | return 3; 174 | } 175 | 176 | if (pFrameData[0] == 0x02 && 177 | pFrameData[1] == 0x21 && 178 | pFrameData[2] == 0x4C && 179 | pFrameData[3] == 0x18) { 180 | /* Legacy magic number */ 181 | return 0; 182 | } 183 | } 184 | 185 | return LZ4ULTRA_DECODE_ERR_FORMAT; 186 | } 187 | 188 | /** 189 | * Decode compressed stream header 190 | * 191 | * @param pFrameData data bytes 192 | * @param nFrameDataSize number of bytes to decode 193 | * @param nBlockMaxCode pointer to max block size code (4-7), updated if this function succeeds 194 | * @param nFlags returned compression flags 195 | * 196 | * @return LZ4ULTRA_DECODE_OK for success, or LZ4ULTRA_DECODE_ERR_xxx for failure 197 | */ 198 | int lz4ultra_decode_header(const 
unsigned char *pFrameData, const int nFrameDataSize, int *nBlockMaxCode, unsigned int *nFlags) { 199 | if (nFrameDataSize == 7) { 200 | if (pFrameData[0] != 0x04 || 201 | pFrameData[1] != 0x22 || 202 | pFrameData[2] != 0x4D || 203 | pFrameData[3] != 0x18 || 204 | (pFrameData[4] & 0xc0) != 0x40 || 205 | (pFrameData[5] & 0x0f) != 0) { 206 | return LZ4ULTRA_DECODE_ERR_FORMAT; 207 | } 208 | 209 | XXH32_hash_t headerSum = XXH32(pFrameData + 4, 2, 0); 210 | if (((headerSum >> 8) & 0xff) != pFrameData[6]) { 211 | return LZ4ULTRA_DECODE_ERR_SUM; 212 | } 213 | 214 | *nFlags = (pFrameData[4] & 0x20) ? LZ4ULTRA_FLAG_INDEP_BLOCKS : 0; 215 | *nBlockMaxCode = (pFrameData[5] >> 4); 216 | 217 | return LZ4ULTRA_DECODE_OK; 218 | } 219 | else if (nFrameDataSize == 4) { 220 | if (pFrameData[0] != 0x02 || 221 | pFrameData[1] != 0x21 || 222 | pFrameData[2] != 0x4C || 223 | pFrameData[3] != 0x18) { 224 | return LZ4ULTRA_DECODE_ERR_FORMAT; 225 | } 226 | 227 | *nFlags = LZ4ULTRA_FLAG_LEGACY_FRAMES; 228 | *nBlockMaxCode = 0; 229 | 230 | return LZ4ULTRA_DECODE_OK; 231 | } 232 | else { 233 | return LZ4ULTRA_DECODE_ERR_FORMAT; 234 | } 235 | } 236 | 237 | /** 238 | * Decode frame header 239 | * 240 | * @param pFrameData data bytes 241 | * @param nFrameDataSize number of bytes to decode 242 | * @param nFlags compression flags 243 | * @param nBlockSize pointer to block size, updated if this function succeeds (set to 0 if this is the terminal frame) 244 | * @param nIsUncompressed pointer to compressed block flag, updated if this function succeeds 245 | * 246 | * @return LZ4ULTRA_DECODE_OK for success, or LZ4ULTRA_DECODE_ERR_FORMAT for failure 247 | */ 248 | int lz4ultra_decode_frame(const unsigned char *pFrameData, const int nFrameDataSize, const unsigned int nFlags, unsigned int *nBlockSize, int *nIsUncompressed) { 249 | if (nFrameDataSize == 4) { 250 | *nBlockSize = ((unsigned int)pFrameData[0]) | 251 | (((unsigned int)pFrameData[1]) << 8) | 252 | (((unsigned int)pFrameData[2]) << 16) | 
253 | (((unsigned int)pFrameData[3]) << 24); 254 | 255 | *nIsUncompressed = ((*nBlockSize) & 0x80000000) ? 1 : 0; 256 | (*nBlockSize) &= 0x7fffffff; 257 | return 0; 258 | } 259 | else { 260 | return LZ4ULTRA_DECODE_ERR_FORMAT; 261 | } 262 | } 263 | -------------------------------------------------------------------------------- /src/frame.h: -------------------------------------------------------------------------------- 1 | /* 2 | * frame.h - lz4 frame definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _FRAME_H 34 | #define _FRAME_H 35 | 36 | #include 37 | 38 | #define LZ4ULTRA_HEADER_SIZE 4 39 | #define LZ4ULTRA_MAX_HEADER_SIZE 7 40 | #define LZ4ULTRA_FRAME_SIZE 4 41 | 42 | #define LZ4ULTRA_ENCODE_ERR (-1) 43 | 44 | #define LZ4ULTRA_DECODE_OK 0 45 | #define LZ4ULTRA_DECODE_ERR_FORMAT (-1) 46 | #define LZ4ULTRA_DECODE_ERR_SUM (-2) 47 | 48 | /** 49 | * Encode compressed stream header 50 | * 51 | * @param pFrameData encoding buffer 52 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 53 | * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx) 54 | * @param nBlockMaxCode max block size code (4-7) 55 | * 56 | * @return number of encoded bytes, or -1 for failure 57 | */ 58 | int lz4ultra_encode_header(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, int nBlockMaxCode); 59 | 60 | /** 61 | * Encode compressed block frame header 62 | * 63 | * @param pFrameData encoding buffer 64 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 65 | * @param nFlags compression flags 66 | * @param nBlockDataSize compressed block's data size, in bytes 67 | * 68 | * @return number of encoded bytes, or -1 for failure 69 | */ 70 | int lz4ultra_encode_compressed_block_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, const int nBlockDataSize); 71 | 72 | /** 73 | * Encode uncompressed block frame header 74 | * 75 | * @param pFrameData encoding buffer 76 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 77 | * @param nFlags compression flags 78 | * @param nBlockDataSize uncompressed block's data size, in bytes 79 | * 80 | * @return number of encoded bytes, or -1 for failure 81 | */ 82 | int lz4ultra_encode_uncompressed_block_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags, const int nBlockDataSize); 83 | 84 | /** 85 | * Encode terminal frame header 86 | * 87 | * @param 
pFrameData encoding buffer 88 | * @param nMaxFrameDataSize max encoding buffer size, in bytes 89 | * @param nFlags compression flags 90 | * 91 | * @return number of encoded bytes, or -1 for failure 92 | */ 93 | int lz4ultra_encode_footer_frame(unsigned char *pFrameData, const int nMaxFrameDataSize, const unsigned int nFlags); 94 | 95 | /** 96 | * Check compressed stream header 97 | * 98 | * @param pFrameData data bytes 99 | * @param nFrameDataSize number of bytes to check 100 | * 101 | * @return the number of extra header bytes to read for decoding, or LZ4ULTRA_DECODE_ERR_xxx for failure 102 | */ 103 | int lz4ultra_check_header(const unsigned char *pFrameData, const int nFrameDataSize); 104 | 105 | /** 106 | * Decode compressed stream header 107 | * 108 | * @param pFrameData data bytes 109 | * @param nFrameDataSize number of bytes to decode 110 | * @param nBlockMaxCode pointer to max block size code (4-7), updated if this function succeeds 111 | * @param nFlags returned compression flags 112 | * 113 | * @return LZ4ULTRA_DECODE_OK for success, or LZ4ULTRA_DECODE_ERR_xxx for failure 114 | */ 115 | int lz4ultra_decode_header(const unsigned char *pFrameData, const int nFrameDataSize, int *nBlockMaxCode, unsigned int *nFlags); 116 | 117 | /** 118 | * Decode frame header 119 | * 120 | * @param pFrameData data bytes 121 | * @param nFrameDataSize number of bytes to decode 122 | * @param nFlags compression flags 123 | * @param nBlockSize pointer to block size, updated if this function succeeds (set to 0 if this is the terminal frame) 124 | * @param nIsUncompressed pointer to compressed block flag, updated if this function succeeds 125 | * 126 | * @return LZ4ULTRA_DECODE_OK for success, or LZ4ULTRA_DECODE_ERR_FORMAT for failure 127 | */ 128 | int lz4ultra_decode_frame(const unsigned char *pFrameData, const int nFrameDataSize, const unsigned int nFlags, unsigned int *nBlockSize, int *nIsUncompressed); 129 | 130 | #endif /* _FRAME_H */ 131 | 
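The 4-byte block frame header declared above can be illustrated with a small round-trip sketch. It is not part of lz4ultra: `sketch_encode_block_frame` and `sketch_decode_block_frame` are hypothetical names that only mirror the bit layout used in frame.c (a little-endian size in the low 31 bits, with bit 31 set for an uncompressed block), and the sketch assumes the modern, non-legacy frame format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; they mirror the layout in
 * frame.c but are not the lz4ultra API. */

/* Write a 4-byte little-endian block frame header.
 * Returns 4 on success, or -1 if the buffer is too small or the size
 * does not fit in 31 bits. */
int sketch_encode_block_frame(unsigned char *out, size_t out_size,
                              uint32_t block_size, int is_uncompressed) {
    if (out_size < 4 || (block_size & 0x80000000u) != 0)
        return -1;

    out[0] = (unsigned char)(block_size & 0xff);
    out[1] = (unsigned char)((block_size >> 8) & 0xff);
    out[2] = (unsigned char)((block_size >> 16) & 0xff);
    out[3] = (unsigned char)((block_size >> 24) & 0x7f);
    if (is_uncompressed)
        out[3] |= 0x80; /* bit 31 of the 32-bit word marks an uncompressed block */
    return 4;
}

/* Read a 4-byte block frame header.
 * Returns 0 on success, or -1 if the input is not exactly 4 bytes.
 * A decoded size of 0 corresponds to the terminal (end-of-data) frame. */
int sketch_decode_block_frame(const unsigned char *in, size_t in_size,
                              uint32_t *block_size, int *is_uncompressed) {
    uint32_t word;

    if (in_size != 4)
        return -1;

    word = (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);

    *is_uncompressed = (word & 0x80000000u) ? 1 : 0;
    *block_size = word & 0x7fffffffu;
    return 0;
}
```

With this sketch, encoding a 0x12345-byte uncompressed block yields the bytes 45 23 01 80, and decoding those bytes returns the same size with the uncompressed flag set.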
-------------------------------------------------------------------------------- /src/lib.c: -------------------------------------------------------------------------------- 1 | /* 2 | * lib.c - lz4ultra library implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include 34 | #include 35 | #include 36 | #include "lib.h" 37 | #include "frame.h" 38 | #include "format.h" 39 | -------------------------------------------------------------------------------- /src/lib.h: -------------------------------------------------------------------------------- 1 | /* 2 | * lib.h - lz4ultra library definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _LIB_H 34 | #define _LIB_H 35 | 36 | #include "stream.h" 37 | #include "dictionary.h" 38 | #include "shrink_context.h" 39 | #include "shrink_streaming.h" 40 | #include "shrink_inmem.h" 41 | #include "expand_block.h" 42 | #include "expand_streaming.h" 43 | #include "expand_inmem.h" 44 | 45 | /** High level status for compression and decompression */ 46 | typedef enum _lz4ultra_status_t { 47 | LZ4ULTRA_OK = 0, /**< Success */ 48 | LZ4ULTRA_ERROR_SRC, /**< Error reading input */ 49 | LZ4ULTRA_ERROR_DST, /**< Error writing output */ 50 | LZ4ULTRA_ERROR_DICTIONARY, /**< Error reading dictionary */ 51 | LZ4ULTRA_ERROR_MEMORY, /**< Out of memory */ 52 | 53 | /* Compression-specific status codes */ 54 | LZ4ULTRA_ERROR_COMPRESSION, /**< Internal compression error */ 55 | LZ4ULTRA_ERROR_RAW_TOOLARGE, /**< Input is too large to be compressed to a raw block */ 56 | LZ4ULTRA_ERROR_RAW_UNCOMPRESSED, /**< Input is incompressible and raw blocks don't support uncompressed data */ 57 | 58 | /* Decompression-specific status codes */ 59 | LZ4ULTRA_ERROR_FORMAT, /**< Invalid input format or magic number when decompressing */ 60 | LZ4ULTRA_ERROR_CHECKSUM, /**< Invalid checksum when decompressing */ 61 | LZ4ULTRA_ERROR_DECOMPRESSION, /**< Internal decompression error */ 62 | } lz4ultra_status_t; 63 | 64 | /* Compression flags */ 65 | #define LZ4ULTRA_FLAG_FAVOR_RATIO (1<<0) /**< 1 to compress with the best ratio, 0 to trade some compression ratio for extra decompression speed */ 66 | #define LZ4ULTRA_FLAG_RAW_BLOCK (1<<1) /**< 1 to emit a raw block */ 67 | #define LZ4ULTRA_FLAG_INDEP_BLOCKS (1<<2) /**< 1 if blocks are independent, 0 if using inter-block back references */ 68 | #define LZ4ULTRA_FLAG_LEGACY_FRAMES (1<<3) /**< 1 if using the legacy frames format, 0 if using the modern lz4 frame format */ 69 | 70 | #endif /* _LIB_H */ 71 | -------------------------------------------------------------------------------- 
/src/libdivsufsort/.gitignore: -------------------------------------------------------------------------------- 1 | # Object files 2 | *.o 3 | *.ko 4 | *.obj 5 | *.elf 6 | 7 | # Precompiled Headers 8 | *.gch 9 | *.pch 10 | 11 | # Libraries 12 | *.lib 13 | *.a 14 | *.la 15 | *.lo 16 | 17 | # Shared objects (inc. Windows DLLs) 18 | *.dll 19 | *.so 20 | *.so.* 21 | *.dylib 22 | 23 | # Executables 24 | *.exe 25 | *.out 26 | *.app 27 | *.i*86 28 | *.x86_64 29 | *.hex 30 | 31 | # CMake files/directories 32 | build/ 33 | -------------------------------------------------------------------------------- /src/libdivsufsort/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # libdivsufsort Change Log 2 | 3 | See full changelog at: https://github.com/y-256/libdivsufsort/commits 4 | 5 | ## [2.0.1] - 2010-11-11 6 | ### Fixed 7 | * Wrong variable used in `divbwt` function 8 | * Enclose some string variables with double quotation marks in include/CMakeLists.txt 9 | * Fix typo in include/CMakeLists.txt 10 | 11 | ## 2.0.0 - 2008-08-23 12 | ### Changed 13 | * Switch the build system to [CMake](http://www.cmake.org/) 14 | * Improve the performance of the suffix-sorting algorithm 15 | 16 | ### Added 17 | * OpenMP support 18 | * 64-bit version of divsufsort 19 | 20 | [Unreleased]: https://github.com/y-256/libdivsufsort/compare/2.0.1...HEAD 21 | [2.0.1]: https://github.com/y-256/libdivsufsort/compare/2.0.0...2.0.1 22 | -------------------------------------------------------------------------------- /src/libdivsufsort/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ### cmake file for building libdivsufsort Package ### 2 | cmake_minimum_required(VERSION 2.4.4) 3 | set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/CMakeModules") 4 | include(AppendCompilerFlags) 5 | 6 | ## Project information ## 7 | project(libdivsufsort C) 8 | set(PROJECT_VENDOR "Yuta Mori") 9 | set(PROJECT_CONTACT 
"yuta.256@gmail.com") 10 | set(PROJECT_URL "https://github.com/y-256/libdivsufsort") 11 | set(PROJECT_DESCRIPTION "A lightweight suffix sorting library") 12 | include(VERSION.cmake) 13 | 14 | ## CPack configuration ## 15 | set(CPACK_GENERATOR "TGZ;TBZ2;ZIP") 16 | set(CPACK_SOURCE_GENERATOR "TGZ;TBZ2;ZIP") 17 | include(ProjectCPack) 18 | 19 | ## Project options ## 20 | option(BUILD_SHARED_LIBS "Set to OFF to build static libraries" ON) 21 | option(BUILD_EXAMPLES "Build examples" ON) 22 | option(BUILD_DIVSUFSORT64 "Build libdivsufsort64" OFF) 23 | option(USE_OPENMP "Use OpenMP for parallelization" OFF) 24 | option(WITH_LFS "Enable Large File Support" ON) 25 | 26 | ## Installation directories ## 27 | set(LIB_SUFFIX "" CACHE STRING "Define suffix of directory name (32 or 64)") 28 | 29 | set(CMAKE_INSTALL_RUNTIMEDIR "" CACHE PATH "Specify the output directory for dll runtimes (default is bin)") 30 | if(NOT CMAKE_INSTALL_RUNTIMEDIR) 31 | set(CMAKE_INSTALL_RUNTIMEDIR "${CMAKE_INSTALL_PREFIX}/bin") 32 | endif(NOT CMAKE_INSTALL_RUNTIMEDIR) 33 | 34 | set(CMAKE_INSTALL_LIBDIR "" CACHE PATH "Specify the output directory for libraries (default is lib)") 35 | if(NOT CMAKE_INSTALL_LIBDIR) 36 | set(CMAKE_INSTALL_LIBDIR "${CMAKE_INSTALL_PREFIX}/lib${LIB_SUFFIX}") 37 | endif(NOT CMAKE_INSTALL_LIBDIR) 38 | 39 | set(CMAKE_INSTALL_INCLUDEDIR "" CACHE PATH "Specify the output directory for header files (default is include)") 40 | if(NOT CMAKE_INSTALL_INCLUDEDIR) 41 | set(CMAKE_INSTALL_INCLUDEDIR "${CMAKE_INSTALL_PREFIX}/include") 42 | endif(NOT CMAKE_INSTALL_INCLUDEDIR) 43 | 44 | set(CMAKE_INSTALL_PKGCONFIGDIR "" CACHE PATH "Specify the output directory for pkgconfig files (default is lib/pkgconfig)") 45 | if(NOT CMAKE_INSTALL_PKGCONFIGDIR) 46 | set(CMAKE_INSTALL_PKGCONFIGDIR "${CMAKE_INSTALL_LIBDIR}/pkgconfig") 47 | endif(NOT CMAKE_INSTALL_PKGCONFIGDIR) 48 | 49 | ## Build type ## 50 | if(NOT CMAKE_BUILD_TYPE) 51 | set(CMAKE_BUILD_TYPE "Release") 52 | elseif(CMAKE_BUILD_TYPE STREQUAL 
"Debug") 53 | set(CMAKE_VERBOSE_MAKEFILE ON) 54 | endif(NOT CMAKE_BUILD_TYPE) 55 | 56 | ## Compiler options ## 57 | if(MSVC) 58 | append_c_compiler_flags("/W4" "VC" CMAKE_C_FLAGS) 59 | append_c_compiler_flags("/Oi;/Ot;/Ox;/Oy" "VC" CMAKE_C_FLAGS_RELEASE) 60 | if(USE_OPENMP) 61 | append_c_compiler_flags("/openmp" "VC" CMAKE_C_FLAGS) 62 | endif(USE_OPENMP) 63 | elseif(BORLAND) 64 | append_c_compiler_flags("-w" "BCC" CMAKE_C_FLAGS) 65 | append_c_compiler_flags("-Oi;-Og;-Os;-Ov;-Ox" "BCC" CMAKE_C_FLAGS_RELEASE) 66 | else(MSVC) 67 | if(CMAKE_COMPILER_IS_GNUCC) 68 | append_c_compiler_flags("-Wall" "GCC" CMAKE_C_FLAGS) 69 | append_c_compiler_flags("-fomit-frame-pointer" "GCC" CMAKE_C_FLAGS_RELEASE) 70 | if(USE_OPENMP) 71 | append_c_compiler_flags("-fopenmp" "GCC" CMAKE_C_FLAGS) 72 | endif(USE_OPENMP) 73 | else(CMAKE_COMPILER_IS_GNUCC) 74 | append_c_compiler_flags("-Wall" "UNKNOWN" CMAKE_C_FLAGS) 75 | append_c_compiler_flags("-fomit-frame-pointer" "UNKNOWN" CMAKE_C_FLAGS_RELEASE) 76 | if(USE_OPENMP) 77 | append_c_compiler_flags("-fopenmp;-openmp;-omp" "UNKNOWN" CMAKE_C_FLAGS) 78 | endif(USE_OPENMP) 79 | endif(CMAKE_COMPILER_IS_GNUCC) 80 | endif(MSVC) 81 | 82 | ## Add definitions ## 83 | add_definitions(-DHAVE_CONFIG_H=1 -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS) 84 | 85 | ## Add subdirectories ## 86 | add_subdirectory(pkgconfig) 87 | add_subdirectory(include) 88 | add_subdirectory(lib) 89 | if(BUILD_EXAMPLES) 90 | add_subdirectory(examples) 91 | endif(BUILD_EXAMPLES) 92 | 93 | ## Add 'uninstall' target ## 94 | CONFIGURE_FILE( 95 | "${CMAKE_CURRENT_SOURCE_DIR}/CMakeModules/cmake_uninstall.cmake.in" 96 | "${CMAKE_CURRENT_BINARY_DIR}/CMakeModules/cmake_uninstall.cmake" 97 | IMMEDIATE @ONLY) 98 | ADD_CUSTOM_TARGET(uninstall 99 | "${CMAKE_COMMAND}" -P "${CMAKE_CURRENT_BINARY_DIR}/CMakeModules/cmake_uninstall.cmake") 100 | -------------------------------------------------------------------------------- 
/src/libdivsufsort/CMakeModules/AppendCompilerFlags.cmake: -------------------------------------------------------------------------------- 1 | include(CheckCSourceCompiles) 2 | include(CheckCXXSourceCompiles) 3 | 4 | macro(append_c_compiler_flags _flags _name _result) 5 | set(SAFE_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS}) 6 | string(REGEX REPLACE "[-+/ ]" "_" cname "${_name}") 7 | string(TOUPPER "${cname}" cname) 8 | foreach(flag ${_flags}) 9 | string(REGEX REPLACE "^[-+/ ]+(.*)[-+/ ]*$" "\\1" flagname "${flag}") 10 | string(REGEX REPLACE "[-+/ ]" "_" flagname "${flagname}") 11 | string(TOUPPER "${flagname}" flagname) 12 | set(have_flag "HAVE_${cname}_${flagname}") 13 | set(CMAKE_REQUIRED_FLAGS "${flag}") 14 | check_c_source_compiles("int main() { return 0; }" ${have_flag}) 15 | if(${have_flag}) 16 | set(${_result} "${${_result}} ${flag}") 17 | endif(${have_flag}) 18 | endforeach(flag) 19 | set(CMAKE_REQUIRED_FLAGS ${SAFE_CMAKE_REQUIRED_FLAGS}) 20 | endmacro(append_c_compiler_flags) 21 | 22 | macro(append_cxx_compiler_flags _flags _name _result) 23 | set(SAFE_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS}) 24 | string(REGEX REPLACE "[-+/ ]" "_" cname "${_name}") 25 | string(TOUPPER "${cname}" cname) 26 | foreach(flag ${_flags}) 27 | string(REGEX REPLACE "^[-+/ ]+(.*)[-+/ ]*$" "\\1" flagname "${flag}") 28 | string(REGEX REPLACE "[-+/ ]" "_" flagname "${flagname}") 29 | string(TOUPPER "${flagname}" flagname) 30 | set(have_flag "HAVE_${cname}_${flagname}") 31 | set(CMAKE_REQUIRED_FLAGS "${flag}") 32 | check_cxx_source_compiles("int main() { return 0; }" ${have_flag}) 33 | if(${have_flag}) 34 | set(${_result} "${${_result}} ${flag}") 35 | endif(${have_flag}) 36 | endforeach(flag) 37 | set(CMAKE_REQUIRED_FLAGS ${SAFE_CMAKE_REQUIRED_FLAGS}) 38 | endmacro(append_cxx_compiler_flags) 39 | -------------------------------------------------------------------------------- /src/libdivsufsort/CMakeModules/CheckFunctionKeywords.cmake: 
-------------------------------------------------------------------------------- 1 | include(CheckCSourceCompiles) 2 | 3 | macro(check_function_keywords _wordlist) 4 | set(${_result} "") 5 | foreach(flag ${_wordlist}) 6 | string(REGEX REPLACE "[-+/ ()]" "_" flagname "${flag}") 7 | string(TOUPPER "${flagname}" flagname) 8 | set(have_flag "HAVE_${flagname}") 9 | check_c_source_compiles("${flag} void func(); void func() { } int main() { func(); return 0; }" ${have_flag}) 10 | if(${have_flag} AND NOT ${_result}) 11 | set(${_result} "${flag}") 12 | # break() 13 | endif(${have_flag} AND NOT ${_result}) 14 | endforeach(flag) 15 | endmacro(check_function_keywords) 16 | -------------------------------------------------------------------------------- /src/libdivsufsort/CMakeModules/CheckLFS.cmake: -------------------------------------------------------------------------------- 1 | ## Checks for large file support ## 2 | include(CheckIncludeFile) 3 | include(CheckSymbolExists) 4 | include(CheckTypeSize) 5 | 6 | macro(check_lfs _isenable) 7 | set(LFS_OFF_T "") 8 | set(LFS_FOPEN "") 9 | set(LFS_FSEEK "") 10 | set(LFS_FTELL "") 11 | set(LFS_PRID "") 12 | 13 | if(${_isenable}) 14 | set(SAFE_CMAKE_REQUIRED_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}") 15 | set(CMAKE_REQUIRED_DEFINITIONS ${CMAKE_REQUIRED_DEFINITIONS} 16 | -D_LARGEFILE_SOURCE -D_LARGE_FILES -D_FILE_OFFSET_BITS=64 17 | -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS) 18 | 19 | check_include_file("sys/types.h" HAVE_SYS_TYPES_H) 20 | check_include_file("inttypes.h" HAVE_INTTYPES_H) 21 | check_include_file("stddef.h" HAVE_STDDEF_H) 22 | check_include_file("stdint.h" HAVE_STDINT_H) 23 | 24 | # LFS type1: 8 <= sizeof(off_t), fseeko, ftello 25 | check_type_size("off_t" SIZEOF_OFF_T) 26 | if(SIZEOF_OFF_T GREATER 7) 27 | check_symbol_exists("fseeko" "stdio.h" HAVE_FSEEKO) 28 | check_symbol_exists("ftello" "stdio.h" HAVE_FTELLO) 29 | if(HAVE_FSEEKO AND HAVE_FTELLO) 30 | set(LFS_OFF_T "off_t") 31 | 
set(LFS_FOPEN "fopen") 32 | set(LFS_FSEEK "fseeko") 33 | set(LFS_FTELL "ftello") 34 | check_symbol_exists("PRIdMAX" "inttypes.h" HAVE_PRIDMAX) 35 | if(HAVE_PRIDMAX) 36 | set(LFS_PRID "PRIdMAX") 37 | else(HAVE_PRIDMAX) 38 | check_type_size("long" SIZEOF_LONG) 39 | check_type_size("int" SIZEOF_INT) 40 | if(SIZEOF_OFF_T GREATER SIZEOF_LONG) 41 | set(LFS_PRID "\"lld\"") 42 | elseif(SIZEOF_LONG GREATER SIZEOF_INT) 43 | set(LFS_PRID "\"ld\"") 44 | else(SIZEOF_OFF_T GREATER SIZEOF_LONG) 45 | set(LFS_PRID "\"d\"") 46 | endif(SIZEOF_OFF_T GREATER SIZEOF_LONG) 47 | endif(HAVE_PRIDMAX) 48 | endif(HAVE_FSEEKO AND HAVE_FTELLO) 49 | endif(SIZEOF_OFF_T GREATER 7) 50 | 51 | # LFS type2: 8 <= sizeof(off64_t), fopen64, fseeko64, ftello64 52 | if(NOT LFS_OFF_T) 53 | check_type_size("off64_t" SIZEOF_OFF64_T) 54 | if(SIZEOF_OFF64_T GREATER 7) 55 | check_symbol_exists("fopen64" "stdio.h" HAVE_FOPEN64) 56 | check_symbol_exists("fseeko64" "stdio.h" HAVE_FSEEKO64) 57 | check_symbol_exists("ftello64" "stdio.h" HAVE_FTELLO64) 58 | if(HAVE_FOPEN64 AND HAVE_FSEEKO64 AND HAVE_FTELLO64) 59 | set(LFS_OFF_T "off64_t") 60 | set(LFS_FOPEN "fopen64") 61 | set(LFS_FSEEK "fseeko64") 62 | set(LFS_FTELL "ftello64") 63 | check_symbol_exists("PRIdMAX" "inttypes.h" HAVE_PRIDMAX) 64 | if(HAVE_PRIDMAX) 65 | set(LFS_PRID "PRIdMAX") 66 | else(HAVE_PRIDMAX) 67 | check_type_size("long" SIZEOF_LONG) 68 | check_type_size("int" SIZEOF_INT) 69 | if(SIZEOF_OFF64_T GREATER SIZEOF_LONG) 70 | set(LFS_PRID "\"lld\"") 71 | elseif(SIZEOF_LONG GREATER SIZEOF_INT) 72 | set(LFS_PRID "\"ld\"") 73 | else(SIZEOF_OFF64_T GREATER SIZEOF_LONG) 74 | set(LFS_PRID "\"d\"") 75 | endif(SIZEOF_OFF64_T GREATER SIZEOF_LONG) 76 | endif(HAVE_PRIDMAX) 77 | endif(HAVE_FOPEN64 AND HAVE_FSEEKO64 AND HAVE_FTELLO64) 78 | endif(SIZEOF_OFF64_T GREATER 7) 79 | endif(NOT LFS_OFF_T) 80 | 81 | # LFS type3: 8 <= sizeof(__int64), _fseeki64, _ftelli64 82 | if(NOT LFS_OFF_T) 83 | check_type_size("__int64" SIZEOF___INT64) 84 | if(SIZEOF___INT64 GREATER 7) 85 
| check_symbol_exists("_fseeki64" "stdio.h" HAVE__FSEEKI64) 86 | check_symbol_exists("_ftelli64" "stdio.h" HAVE__FTELLI64) 87 | if(HAVE__FSEEKI64 AND HAVE__FTELLI64) 88 | set(LFS_OFF_T "__int64") 89 | set(LFS_FOPEN "fopen") 90 | set(LFS_FSEEK "_fseeki64") 91 | set(LFS_FTELL "_ftelli64") 92 | set(LFS_PRID "\"I64d\"") 93 | endif(HAVE__FSEEKI64 AND HAVE__FTELLI64) 94 | endif(SIZEOF___INT64 GREATER 7) 95 | endif(NOT LFS_OFF_T) 96 | 97 | set(CMAKE_REQUIRED_DEFINITIONS "${SAFE_CMAKE_REQUIRED_DEFINITIONS}") 98 | endif(${_isenable}) 99 | 100 | if(NOT LFS_OFF_T) 101 | ## not found 102 | set(LFS_OFF_T "long") 103 | set(LFS_FOPEN "fopen") 104 | set(LFS_FSEEK "fseek") 105 | set(LFS_FTELL "ftell") 106 | set(LFS_PRID "\"ld\"") 107 | endif(NOT LFS_OFF_T) 108 | 109 | endmacro(check_lfs) 110 | -------------------------------------------------------------------------------- /src/libdivsufsort/CMakeModules/ProjectCPack.cmake: -------------------------------------------------------------------------------- 1 | # If the cmake version includes cpack, use it 2 | IF(EXISTS "${CMAKE_ROOT}/Modules/CPack.cmake") 3 | SET(CPACK_PACKAGE_DESCRIPTION_SUMMARY "${PROJECT_DESCRIPTION}") 4 | SET(CPACK_PACKAGE_VENDOR "${PROJECT_VENDOR}") 5 | SET(CPACK_PACKAGE_DESCRIPTION_FILE "${CMAKE_CURRENT_SOURCE_DIR}/README.md") 6 | SET(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE") 7 | SET(CPACK_PACKAGE_VERSION_MAJOR "${PROJECT_VERSION_MAJOR}") 8 | SET(CPACK_PACKAGE_VERSION_MINOR "${PROJECT_VERSION_MINOR}") 9 | SET(CPACK_PACKAGE_VERSION_PATCH "${PROJECT_VERSION_PATCH}") 10 | # SET(CPACK_PACKAGE_INSTALL_DIRECTORY "${PROJECT_NAME} ${PROJECT_VERSION}") 11 | SET(CPACK_SOURCE_PACKAGE_FILE_NAME "${PROJECT_NAME}-${PROJECT_VERSION_FULL}") 12 | 13 | IF(NOT DEFINED CPACK_SYSTEM_NAME) 14 | SET(CPACK_SYSTEM_NAME "${CMAKE_SYSTEM_NAME}-${CMAKE_SYSTEM_PROCESSOR}") 15 | ENDIF(NOT DEFINED CPACK_SYSTEM_NAME) 16 | 17 | IF(${CPACK_SYSTEM_NAME} MATCHES Windows) 18 | IF(CMAKE_CL_64) 19 | SET(CPACK_SYSTEM_NAME 
win64-${CMAKE_SYSTEM_PROCESSOR}) 20 | ELSE(CMAKE_CL_64) 21 | SET(CPACK_SYSTEM_NAME win32-${CMAKE_SYSTEM_PROCESSOR}) 22 | ENDIF(CMAKE_CL_64) 23 | ENDIF(${CPACK_SYSTEM_NAME} MATCHES Windows) 24 | 25 | IF(NOT DEFINED CPACK_PACKAGE_FILE_NAME) 26 | SET(CPACK_PACKAGE_FILE_NAME "${CPACK_SOURCE_PACKAGE_FILE_NAME}-${CPACK_SYSTEM_NAME}") 27 | ENDIF(NOT DEFINED CPACK_PACKAGE_FILE_NAME) 28 | 29 | SET(CPACK_PACKAGE_CONTACT "${PROJECT_CONTACT}") 30 | IF(UNIX) 31 | SET(CPACK_STRIP_FILES "") 32 | SET(CPACK_SOURCE_STRIP_FILES "") 33 | # SET(CPACK_PACKAGE_EXECUTABLES "ccmake" "CMake") 34 | ENDIF(UNIX) 35 | SET(CPACK_SOURCE_IGNORE_FILES "/CVS/" "/build/" "/\\\\.build/" "/\\\\.svn/" "~$") 36 | # include CPack model once all variables are set 37 | INCLUDE(CPack) 38 | ENDIF(EXISTS "${CMAKE_ROOT}/Modules/CPack.cmake") 39 | -------------------------------------------------------------------------------- /src/libdivsufsort/CMakeModules/cmake_uninstall.cmake.in: -------------------------------------------------------------------------------- 1 | IF(NOT EXISTS "@CMAKE_CURRENT_BINARY_DIR@/install_manifest.txt") 2 | MESSAGE(FATAL_ERROR "Cannot find install manifest: \"@CMAKE_CURRENT_BINARY_DIR@/install_manifest.txt\"") 3 | ENDIF(NOT EXISTS "@CMAKE_CURRENT_BINARY_DIR@/install_manifest.txt") 4 | 5 | FILE(READ "@CMAKE_CURRENT_BINARY_DIR@/install_manifest.txt" files) 6 | STRING(REGEX REPLACE "\n" ";" files "${files}") 7 | 8 | SET(NUM 0) 9 | FOREACH(file ${files}) 10 | IF(EXISTS "$ENV{DESTDIR}${file}") 11 | MESSAGE(STATUS "Looking for \"$ENV{DESTDIR}${file}\" - found") 12 | SET(UNINSTALL_CHECK_${NUM} 1) 13 | ELSE(EXISTS "$ENV{DESTDIR}${file}") 14 | MESSAGE(STATUS "Looking for \"$ENV{DESTDIR}${file}\" - not found") 15 | SET(UNINSTALL_CHECK_${NUM} 0) 16 | ENDIF(EXISTS "$ENV{DESTDIR}${file}") 17 | MATH(EXPR NUM "1 + ${NUM}") 18 | ENDFOREACH(file) 19 | 20 | SET(NUM 0) 21 | FOREACH(file ${files}) 22 | IF(${UNINSTALL_CHECK_${NUM}}) 23 | MESSAGE(STATUS "Uninstalling \"$ENV{DESTDIR}${file}\"") 24 | 
EXEC_PROGRAM( 25 | "@CMAKE_COMMAND@" ARGS "-E remove \"$ENV{DESTDIR}${file}\"" 26 | OUTPUT_VARIABLE rm_out 27 | RETURN_VALUE rm_retval 28 | ) 29 | IF(NOT "${rm_retval}" STREQUAL 0) 30 | MESSAGE(FATAL_ERROR "Problem when removing \"$ENV{DESTDIR}${file}\"") 31 | ENDIF(NOT "${rm_retval}" STREQUAL 0) 32 | ENDIF(${UNINSTALL_CHECK_${NUM}}) 33 | MATH(EXPR NUM "1 + ${NUM}") 34 | ENDFOREACH(file) 35 | 36 | FILE(REMOVE "@CMAKE_CURRENT_BINARY_DIR@/install_manifest.txt") 37 | -------------------------------------------------------------------------------- /src/libdivsufsort/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2003 Yuta Mori All rights reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /src/libdivsufsort/README.md: -------------------------------------------------------------------------------- 1 | # libdivsufsort 2 | 3 | libdivsufsort is a software library that implements a lightweight suffix array construction algorithm. 4 | 5 | ## News 6 | * 2015-03-21: The project has moved from [Google Code](http://code.google.com/p/libdivsufsort/) to [GitHub](https://github.com/y-256/libdivsufsort) 7 | 8 | ## Introduction 9 | This library provides a simple and efficient C API to construct a suffix array and a Burrows-Wheeler transformed string from a given string over a constant-size alphabet. 10 | The algorithm runs in O(n log n) worst-case time using only 5n+O(1) bytes of memory space, where n is the length of 11 | the string. 12 | 13 | ## Build requirements 14 | * An ANSI C Compiler (e.g. GNU GCC) 15 | * [CMake](http://www.cmake.org/ "CMake") version 2.4.4 or newer 16 | * CMake-supported build tool 17 | 18 | ## Building on GNU/Linux 19 | 1. Get the source code from GitHub. You can either 20 | * use git to clone the repository 21 | ``` 22 | git clone https://github.com/y-256/libdivsufsort.git 23 | ``` 24 | * or download a [zip file](../../archive/master.zip) directly 25 | 2. Create a `build` directory in the package source directory. 26 | ```shell 27 | $ cd libdivsufsort 28 | $ mkdir build 29 | $ cd build 30 | ``` 31 | 3. Configure the package for your system. 32 | If you want to install to a different location, change the -DCMAKE_INSTALL_PREFIX option. 33 | ```shell 34 | $ cmake -DCMAKE_BUILD_TYPE="Release" \ 35 | -DCMAKE_INSTALL_PREFIX="/usr/local" .. 36 | ``` 37 | 4. Compile the package. 38 | ```shell 39 | $ make 40 | ``` 41 | 5. (Optional) Install the library and header files. 
42 | ```shell 43 | $ sudo make install 44 | ``` 45 | 46 | ## API 47 | ```c 48 | /* Data types */ 49 | typedef int32_t saint_t; 50 | typedef int32_t saidx_t; 51 | typedef uint8_t sauchar_t; 52 | 53 | /* 54 | * Constructs the suffix array of a given string. 55 | * @param T[0..n-1] The input string. 56 | * @param SA[0..n-1] The output array of suffixes. 57 | * @param n The length of the given string. 58 | * @return 0 if no error occurred, -1 or -2 otherwise. 59 | */ 60 | saint_t 61 | divsufsort(const sauchar_t *T, saidx_t *SA, saidx_t n); 62 | 63 | /* 64 | * Constructs the Burrows-Wheeler transformed string of a given string. 65 | * @param T[0..n-1] The input string. 66 | * @param U[0..n-1] The output string. (can be T) 67 | * @param A[0..n-1] The temporary array. (can be NULL) 68 | * @param n The length of the given string. 69 | * @return The primary index if no error occurred, -1 or -2 otherwise. 70 | */ 71 | saidx_t 72 | divbwt(const sauchar_t *T, sauchar_t *U, saidx_t *A, saidx_t n); 73 | ``` 74 | 75 | ## Example Usage 76 | ```c 77 | #include <stdio.h> 78 | #include <stdlib.h> 79 | #include <string.h> 80 | 81 | #include <divsufsort.h> 82 | 83 | int main() { 84 | // input data 85 | char *Text = "abracadabra"; 86 | int n = strlen(Text); 87 | int i, j; 88 | 89 | // allocate 90 | int *SA = (int *)malloc(n * sizeof(int)); 91 | 92 | // sort 93 | divsufsort((unsigned char *)Text, SA, n); 94 | 95 | // output 96 | for(i = 0; i < n; ++i) { 97 | printf("SA[%2d] = %2d: ", i, SA[i]); 98 | for(j = SA[i]; j < n; ++j) { 99 | printf("%c", Text[j]); 100 | } 101 | printf("$\n"); 102 | } 103 | 104 | // deallocate 105 | free(SA); 106 | 107 | return 0; 108 | } 109 | ``` 110 | See the [examples](examples) directory for a few other examples. 111 | 112 | ## Benchmarks 113 | See the [Benchmarks](https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchmarks.md) page for details. 114 | 115 | ## License 116 | libdivsufsort is released under the [MIT license](LICENSE "MIT license"). 
117 | > The MIT License (MIT) 118 | > 119 | > Copyright (c) 2003 Yuta Mori All rights reserved. 120 | > 121 | > Permission is hereby granted, free of charge, to any person obtaining a copy 122 | > of this software and associated documentation files (the "Software"), to deal 123 | > in the Software without restriction, including without limitation the rights 124 | > to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 125 | > copies of the Software, and to permit persons to whom the Software is 126 | > furnished to do so, subject to the following conditions: 127 | > 128 | > The above copyright notice and this permission notice shall be included in all 129 | > copies or substantial portions of the Software. 130 | > 131 | > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 132 | > IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 133 | > FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 134 | > AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 135 | > LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 136 | > OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 137 | > SOFTWARE. 
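A naive reference builder can be useful for cross-checking `divsufsort` output on small inputs, for instance in a unit test. The sketch below is not part of libdivsufsort: `naive_suffix_array` and `compare_suffixes` are hypothetical names, it sorts suffixes with the standard `qsort` (far slower than the library's algorithm), and it assumes plain `int` indices as in the snippet under "Example Usage".

```c
#include <stdlib.h>
#include <string.h>

/* Text currently being sorted. qsort() has no user-data argument, so the
 * comparator reads these file-scope variables. Not thread-safe. */
static const unsigned char *g_text;
static int g_len;

/* Compares two suffixes of g_text lexicographically. A suffix that is a
 * proper prefix of another one sorts first. */
static int compare_suffixes(const void *a, const void *b) {
  int i = *(const int *)a;
  int j = *(const int *)b;
  int li = g_len - i;
  int lj = g_len - j;
  int r = memcmp(g_text + i, g_text + j, (size_t)(li < lj ? li : lj));
  if (r != 0) { return r; }
  return li - lj;
}

/* Hypothetical helper (not a libdivsufsort function): fills SA[0..n-1] with
 * the start positions of the suffixes of T[0..n-1] in sorted order.
 * Returns 0 on success, -1 on invalid arguments. */
int naive_suffix_array(const unsigned char *T, int *SA, int n) {
  int i;
  if (T == NULL || SA == NULL || n < 0) { return -1; }
  for (i = 0; i < n; ++i) { SA[i] = i; }
  g_text = T;
  g_len = n;
  qsort(SA, (size_t)n, sizeof(int), compare_suffixes);
  return 0;
}
```

For `"abracadabra"` (`n = 11`) this helper fills `SA` with `10 7 0 3 5 8 1 4 6 9 2`. Because the suffix array of a string is unique, the array written by `divsufsort` for the same input can be compared against it element by element.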
138 | 139 | ## Author 140 | * Yuta Mori 141 | -------------------------------------------------------------------------------- /src/libdivsufsort/VERSION.cmake: -------------------------------------------------------------------------------- 1 | set(PROJECT_VERSION_MAJOR "2") 2 | set(PROJECT_VERSION_MINOR "0") 3 | set(PROJECT_VERSION_PATCH "2") 4 | set(PROJECT_VERSION_EXTRA "-1") 5 | set(PROJECT_VERSION "${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}") 6 | set(PROJECT_VERSION_FULL "${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}.${PROJECT_VERSION_PATCH}${PROJECT_VERSION_EXTRA}") 7 | 8 | set(LIBRARY_VERSION "3.0.1") 9 | set(LIBRARY_SOVERSION "3") 10 | 11 | ## Git revision number ## 12 | if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git") 13 | execute_process(COMMAND git describe --tags HEAD 14 | WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" 15 | OUTPUT_VARIABLE GIT_DESCRIBE_TAGS ERROR_QUIET) 16 | if(GIT_DESCRIBE_TAGS) 17 | string(REGEX REPLACE "^v(.*)" "\\1" GIT_REVISION "${GIT_DESCRIBE_TAGS}") 18 | string(STRIP "${GIT_REVISION}" GIT_REVISION) 19 | if(GIT_REVISION) 20 | set(PROJECT_VERSION_FULL "${GIT_REVISION}") 21 | endif(GIT_REVISION) 22 | endif(GIT_DESCRIBE_TAGS) 23 | endif(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git") 24 | -------------------------------------------------------------------------------- /src/libdivsufsort/examples/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ## Add definitions ## 2 | add_definitions(-D_LARGEFILE_SOURCE -D_LARGE_FILES -D_FILE_OFFSET_BITS=64) 3 | 4 | ## Targets ## 5 | include_directories("${CMAKE_CURRENT_SOURCE_DIR}/../include" 6 | "${CMAKE_CURRENT_BINARY_DIR}/../include") 7 | link_directories("${CMAKE_CURRENT_BINARY_DIR}/../lib") 8 | foreach(src suftest mksary sasearch bwt unbwt) 9 | add_executable(${src} ${src}.c) 10 | target_link_libraries(${src} divsufsort) 11 | endforeach(src) 12 | -------------------------------------------------------------------------------- 
/src/libdivsufsort/examples/bwt.c: -------------------------------------------------------------------------------- 1 | /* 2 | * bwt.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #if HAVE_CONFIG_H 28 | # include "config.h" 29 | #endif 30 | #include <stdio.h> 31 | #if HAVE_STRING_H 32 | # include <string.h> 33 | #endif 34 | #if HAVE_STDLIB_H 35 | # include <stdlib.h> 36 | #endif 37 | #if HAVE_MEMORY_H 38 | # include <memory.h> 39 | #endif 40 | #if HAVE_STDDEF_H 41 | # include <stddef.h> 42 | #endif 43 | #if HAVE_STRINGS_H 44 | # include <strings.h> 45 | #endif 46 | #if HAVE_SYS_TYPES_H 47 | # include <sys/types.h> 48 | #endif 49 | #if HAVE_IO_H && HAVE_FCNTL_H 50 | # include <io.h> 51 | # include <fcntl.h> 52 | #endif 53 | #include <time.h> 54 | #include <divsufsort.h> 55 | #include "lfs.h" 56 | 57 | 58 | static 59 | size_t 60 | write_int(FILE *fp, saidx_t n) { 61 | unsigned char c[4]; 62 | c[0] = (unsigned char)((n >> 0) & 0xff), c[1] = (unsigned char)((n >> 8) & 0xff), 63 | c[2] = (unsigned char)((n >> 16) & 0xff), c[3] = (unsigned char)((n >> 24) & 0xff); 64 | return fwrite(c, sizeof(unsigned char), 4, fp); 65 | } 66 | 67 | static 68 | void 69 | print_help(const char *progname, int status) { 70 | fprintf(stderr, 71 | "bwt, a burrows-wheeler transform program, version %s.\n", 72 | divsufsort_version()); 73 | fprintf(stderr, "usage: %s [-b num] INFILE OUTFILE\n", progname); 74 | fprintf(stderr, " -b num set block size to num MiB [1..512] (default: 32)\n\n"); 75 | exit(status); 76 | } 77 | 78 | int 79 | main(int argc, const char *argv[]) { 80 | FILE *fp, *ofp; 81 | const char *fname, *ofname; 82 | sauchar_t *T; 83 | saidx_t *SA; 84 | LFS_OFF_T n; 85 | size_t m; 86 | saidx_t pidx; 87 | clock_t start,finish; 88 | saint_t i, blocksize = 32, needclose = 3; 89 | 90 | /* Check arguments. 
*/ 91 | if((argc == 1) || 92 | (strcmp(argv[1], "-h") == 0) || 93 | (strcmp(argv[1], "--help") == 0)) { print_help(argv[0], EXIT_SUCCESS); } 94 | if((argc != 3) && (argc != 5)) { print_help(argv[0], EXIT_FAILURE); } 95 | i = 1; 96 | if(argc == 5) { 97 | if(strcmp(argv[i], "-b") != 0) { print_help(argv[0], EXIT_FAILURE); } 98 | blocksize = atoi(argv[i + 1]); 99 | if(blocksize < 0) { blocksize = 1; } 100 | else if(512 < blocksize) { blocksize = 512; } 101 | i += 2; 102 | } 103 | blocksize <<= 20; 104 | 105 | /* Open a file for reading. */ 106 | if(strcmp(argv[i], "-") != 0) { 107 | #if HAVE_FOPEN_S 108 | if(fopen_s(&fp, fname = argv[i], "rb") != 0) { 109 | #else 110 | if((fp = LFS_FOPEN(fname = argv[i], "rb")) == NULL) { 111 | #endif 112 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], fname); 113 | perror(NULL); 114 | exit(EXIT_FAILURE); 115 | } 116 | } else { 117 | #if HAVE__SETMODE && HAVE__FILENO 118 | if(_setmode(_fileno(stdin), _O_BINARY) == -1) { 119 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 120 | perror(NULL); 121 | exit(EXIT_FAILURE); 122 | } 123 | #endif 124 | fp = stdin; 125 | fname = "stdin"; 126 | needclose ^= 1; 127 | } 128 | i += 1; 129 | 130 | /* Open a file for writing. */ 131 | if(strcmp(argv[i], "-") != 0) { 132 | #if HAVE_FOPEN_S 133 | if(fopen_s(&ofp, ofname = argv[i], "wb") != 0) { 134 | #else 135 | if((ofp = LFS_FOPEN(ofname = argv[i], "wb")) == NULL) { 136 | #endif 137 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], ofname); 138 | perror(NULL); 139 | exit(EXIT_FAILURE); 140 | } 141 | } else { 142 | #if HAVE__SETMODE && HAVE__FILENO 143 | if(_setmode(_fileno(stdout), _O_BINARY) == -1) { 144 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 145 | perror(NULL); 146 | exit(EXIT_FAILURE); 147 | } 148 | #endif 149 | ofp = stdout; 150 | ofname = "stdout"; 151 | needclose ^= 2; 152 | } 153 | 154 | /* Get the file size. 
*/ 155 | if(LFS_FSEEK(fp, 0, SEEK_END) == 0) { 156 | n = LFS_FTELL(fp); 157 | rewind(fp); 158 | if(n < 0) { 159 | fprintf(stderr, "%s: Cannot ftell `%s': ", argv[0], fname); 160 | perror(NULL); 161 | exit(EXIT_FAILURE); 162 | } 163 | if(0x20000000L < n) { n = 0x20000000L; } 164 | if((blocksize == 0) || (n < blocksize)) { blocksize = (saidx_t)n; } 165 | } else if(blocksize == 0) { blocksize = 32 << 20; } 166 | 167 | /* Allocate 5blocksize bytes of memory. */ 168 | T = (sauchar_t *)malloc(blocksize * sizeof(sauchar_t)); 169 | SA = (saidx_t *)malloc(blocksize * sizeof(saidx_t)); 170 | if((T == NULL) || (SA == NULL)) { 171 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 172 | exit(EXIT_FAILURE); 173 | } 174 | 175 | /* Write the blocksize. */ 176 | if(write_int(ofp, blocksize) != 4) { 177 | fprintf(stderr, "%s: Cannot write to `%s': ", argv[0], ofname); 178 | perror(NULL); 179 | exit(EXIT_FAILURE); 180 | } 181 | 182 | fprintf(stderr, " BWT (blocksize %" PRIdSAINT_T ") ... ", blocksize); 183 | start = clock(); 184 | for(n = 0; 0 < (m = fread(T, sizeof(sauchar_t), blocksize, fp)); n += m) { 185 | /* Burrows-Wheeler Transform. */ 186 | pidx = divbwt(T, T, SA, m); 187 | if(pidx < 0) { 188 | fprintf(stderr, "%s (bw_transform): %s.\n", 189 | argv[0], 190 | (pidx == -1) ? "Invalid arguments" : "Cannot allocate memory"); 191 | exit(EXIT_FAILURE); 192 | } 193 | 194 | /* Write the bwted data. 
*/ 195 | if((write_int(ofp, pidx) != 4) || 196 | (fwrite(T, sizeof(sauchar_t), m, ofp) != m)) { 197 | fprintf(stderr, "%s: Cannot write to `%s': ", argv[0], ofname); 198 | perror(NULL); 199 | exit(EXIT_FAILURE); 200 | } 201 | } 202 | if(ferror(fp)) { 203 | fprintf(stderr, "%s: Cannot read from `%s': ", argv[0], fname); 204 | perror(NULL); 205 | exit(EXIT_FAILURE); 206 | } 207 | finish = clock(); 208 | fprintf(stderr, "%" PRIdOFF_T " bytes: %.4f sec\n", 209 | n, (double)(finish - start) / (double)CLOCKS_PER_SEC); 210 | 211 | /* Close files */ 212 | if(needclose & 1) { fclose(fp); } 213 | if(needclose & 2) { fclose(ofp); } 214 | 215 | /* Deallocate memory. */ 216 | free(SA); 217 | free(T); 218 | 219 | return 0; 220 | } 221 | -------------------------------------------------------------------------------- /src/libdivsufsort/examples/mksary.c: -------------------------------------------------------------------------------- 1 | /* 2 | * mksary.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 25 | */ 26 | 27 | #if HAVE_CONFIG_H 28 | # include "config.h" 29 | #endif 30 | #include <stdio.h> 31 | #if HAVE_STRING_H 32 | # include <string.h> 33 | #endif 34 | #if HAVE_STDLIB_H 35 | # include <stdlib.h> 36 | #endif 37 | #if HAVE_MEMORY_H 38 | # include <memory.h> 39 | #endif 40 | #if HAVE_STDDEF_H 41 | # include <stddef.h> 42 | #endif 43 | #if HAVE_STRINGS_H 44 | # include <strings.h> 45 | #endif 46 | #if HAVE_SYS_TYPES_H 47 | # include <sys/types.h> 48 | #endif 49 | #if HAVE_IO_H && HAVE_FCNTL_H 50 | # include <io.h> 51 | # include <fcntl.h> 52 | #endif 53 | #include <time.h> 54 | #include <divsufsort.h> 55 | #include "lfs.h" 56 | 57 | 58 | static 59 | void 60 | print_help(const char *progname, int status) { 61 | fprintf(stderr, 62 | "mksary, a simple suffix array builder, version %s.\n", 63 | divsufsort_version()); 64 | fprintf(stderr, "usage: %s INFILE OUTFILE\n\n", progname); 65 | exit(status); 66 | } 67 | 68 | int 69 | main(int argc, const char *argv[]) { 70 | FILE *fp, *ofp; 71 | const char *fname, *ofname; 72 | sauchar_t *T; 73 | saidx_t *SA; 74 | LFS_OFF_T n; 75 | clock_t start, finish; 76 | saint_t needclose = 3; 77 | 78 | /* Check arguments. */ 79 | if((argc == 1) || 80 | (strcmp(argv[1], "-h") == 0) || 81 | (strcmp(argv[1], "--help") == 0)) { print_help(argv[0], EXIT_SUCCESS); } 82 | if(argc != 3) { print_help(argv[0], EXIT_FAILURE); } 83 | 84 | /* Open a file for reading. 
*/ 85 | if(strcmp(argv[1], "-") != 0) { 86 | #if HAVE_FOPEN_S 87 | if(fopen_s(&fp, fname = argv[1], "rb") != 0) { 88 | #else 89 | if((fp = LFS_FOPEN(fname = argv[1], "rb")) == NULL) { 90 | #endif 91 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], fname); 92 | perror(NULL); 93 | exit(EXIT_FAILURE); 94 | } 95 | } else { 96 | #if HAVE__SETMODE && HAVE__FILENO 97 | if(_setmode(_fileno(stdin), _O_BINARY) == -1) { 98 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 99 | perror(NULL); 100 | exit(EXIT_FAILURE); 101 | } 102 | #endif 103 | fp = stdin; 104 | fname = "stdin"; 105 | needclose ^= 1; 106 | } 107 | 108 | /* Open a file for writing. */ 109 | if(strcmp(argv[2], "-") != 0) { 110 | #if HAVE_FOPEN_S 111 | if(fopen_s(&ofp, ofname = argv[2], "wb") != 0) { 112 | #else 113 | if((ofp = LFS_FOPEN(ofname = argv[2], "wb")) == NULL) { 114 | #endif 115 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], ofname); 116 | perror(NULL); 117 | exit(EXIT_FAILURE); 118 | } 119 | } else { 120 | #if HAVE__SETMODE && HAVE__FILENO 121 | if(_setmode(_fileno(stdout), _O_BINARY) == -1) { 122 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 123 | perror(NULL); 124 | exit(EXIT_FAILURE); 125 | } 126 | #endif 127 | ofp = stdout; 128 | ofname = "stdout"; 129 | needclose ^= 2; 130 | } 131 | 132 | /* Get the file size. */ 133 | if(LFS_FSEEK(fp, 0, SEEK_END) == 0) { 134 | n = LFS_FTELL(fp); 135 | rewind(fp); 136 | if(n < 0) { 137 | fprintf(stderr, "%s: Cannot ftell `%s': ", argv[0], fname); 138 | perror(NULL); 139 | exit(EXIT_FAILURE); 140 | } 141 | if(0x7fffffff <= n) { 142 | fprintf(stderr, "%s: Input file `%s' is too big.\n", argv[0], fname); 143 | exit(EXIT_FAILURE); 144 | } 145 | } else { 146 | fprintf(stderr, "%s: Cannot fseek `%s': ", argv[0], fname); 147 | perror(NULL); 148 | exit(EXIT_FAILURE); 149 | } 150 | 151 | /* Allocate 5blocksize bytes of memory. 
*/ 152 | T = (sauchar_t *)malloc((size_t)n * sizeof(sauchar_t)); 153 | SA = (saidx_t *)malloc((size_t)n * sizeof(saidx_t)); 154 | if((T == NULL) || (SA == NULL)) { 155 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 156 | exit(EXIT_FAILURE); 157 | } 158 | 159 | /* Read n bytes of data. */ 160 | if(fread(T, sizeof(sauchar_t), (size_t)n, fp) != (size_t)n) { 161 | fprintf(stderr, "%s: %s `%s': ", 162 | argv[0], 163 | (ferror(fp) || !feof(fp)) ? "Cannot read from" : "Unexpected EOF in", 164 | fname); 165 | perror(NULL); 166 | exit(EXIT_FAILURE); 167 | } 168 | if(needclose & 1) { fclose(fp); } 169 | 170 | /* Construct the suffix array. */ 171 | fprintf(stderr, "%s: %" PRIdOFF_T " bytes ... ", fname, n); 172 | start = clock(); 173 | if(divsufsort(T, SA, (saidx_t)n) != 0) { 174 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 175 | exit(EXIT_FAILURE); 176 | } 177 | finish = clock(); 178 | fprintf(stderr, "%.4f sec\n", (double)(finish - start) / (double)CLOCKS_PER_SEC); 179 | 180 | /* Write the suffix array. */ 181 | if(fwrite(SA, sizeof(saidx_t), (size_t)n, ofp) != (size_t)n) { 182 | fprintf(stderr, "%s: Cannot write to `%s': ", argv[0], ofname); 183 | perror(NULL); 184 | exit(EXIT_FAILURE); 185 | } 186 | if(needclose & 2) { fclose(ofp); } 187 | 188 | /* Deallocate memory. */ 189 | free(SA); 190 | free(T); 191 | 192 | return 0; 193 | } 194 | -------------------------------------------------------------------------------- /src/libdivsufsort/examples/sasearch.c: -------------------------------------------------------------------------------- 1 | /* 2 | * sasearch.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 
4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #if HAVE_CONFIG_H 28 | # include "config.h" 29 | #endif 30 | #include <stdio.h> 31 | #if HAVE_STRING_H 32 | # include <string.h> 33 | #endif 34 | #if HAVE_STDLIB_H 35 | # include <stdlib.h> 36 | #endif 37 | #if HAVE_MEMORY_H 38 | # include <memory.h> 39 | #endif 40 | #if HAVE_STDDEF_H 41 | # include <stddef.h> 42 | #endif 43 | #if HAVE_STRINGS_H 44 | # include <strings.h> 45 | #endif 46 | #if HAVE_SYS_TYPES_H 47 | # include <sys/types.h> 48 | #endif 49 | #if HAVE_IO_H && HAVE_FCNTL_H 50 | # include <io.h> 51 | # include <fcntl.h> 52 | #endif 53 | #include <divsufsort.h> 54 | #include "lfs.h" 55 | 56 | 57 | static 58 | void 59 | print_help(const char *progname, int status) { 60 | fprintf(stderr, 61 | "sasearch, a simple SA-based full-text search tool, version %s\n", 62 | divsufsort_version()); 63 | fprintf(stderr, "usage: %s PATTERN FILE SAFILE\n\n", progname); 64 | exit(status); 65 | } 66 | 67 | int 68 | main(int argc, const char *argv[]) { 69 | FILE *fp; 70 | const char *P; 71 | sauchar_t *T; 72 | saidx_t *SA; 73 | LFS_OFF_T n; 74 | size_t Psize; 75 | saidx_t i, size, left; 76 | 77 | if((argc == 1) || 78 | (strcmp(argv[1], "-h") == 0) || 79 | (strcmp(argv[1], "--help") == 0)) { print_help(argv[0], EXIT_SUCCESS); } 80 | if(argc != 4) { print_help(argv[0], EXIT_FAILURE); } 81 | 82 | P = argv[1]; 83 | Psize = strlen(P); 84 | 85 | /* Open a file for reading. */ 86 | #if HAVE_FOPEN_S 87 | if(fopen_s(&fp, argv[2], "rb") != 0) { 88 | #else 89 | if((fp = LFS_FOPEN(argv[2], "rb")) == NULL) { 90 | #endif 91 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], argv[2]); 92 | perror(NULL); 93 | exit(EXIT_FAILURE); 94 | } 95 | 96 | /* Get the file size. */ 97 | if(LFS_FSEEK(fp, 0, SEEK_END) == 0) { 98 | n = LFS_FTELL(fp); 99 | rewind(fp); 100 | if(n < 0) { 101 | fprintf(stderr, "%s: Cannot ftell `%s': ", argv[0], argv[2]); 102 | perror(NULL); 103 | exit(EXIT_FAILURE); 104 | } 105 | } else { 106 | fprintf(stderr, "%s: Cannot fseek `%s': ", argv[0], argv[2]); 107 | perror(NULL); 108 | exit(EXIT_FAILURE); 109 | } 110 | 111 | /* Allocate 5n bytes of memory. 
*/ 112 | T = (sauchar_t *)malloc((size_t)n * sizeof(sauchar_t)); 113 | SA = (saidx_t *)malloc((size_t)n * sizeof(saidx_t)); 114 | if((T == NULL) || (SA == NULL)) { 115 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 116 | exit(EXIT_FAILURE); 117 | } 118 | 119 | /* Read n bytes of data. */ 120 | if(fread(T, sizeof(sauchar_t), (size_t)n, fp) != (size_t)n) { 121 | fprintf(stderr, "%s: %s `%s': ", 122 | argv[0], 123 | (ferror(fp) || !feof(fp)) ? "Cannot read from" : "Unexpected EOF in", 124 | argv[2]); 125 | perror(NULL); 126 | exit(EXIT_FAILURE); 127 | } 128 | fclose(fp); 129 | 130 | /* Open the SA file for reading. */ 131 | #if HAVE_FOPEN_S 132 | if(fopen_s(&fp, argv[3], "rb") != 0) { 133 | #else 134 | if((fp = LFS_FOPEN(argv[3], "rb")) == NULL) { 135 | #endif 136 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], argv[3]); 137 | perror(NULL); 138 | exit(EXIT_FAILURE); 139 | } 140 | 141 | /* Read n * sizeof(saidx_t) bytes of data. */ 142 | if(fread(SA, sizeof(saidx_t), (size_t)n, fp) != (size_t)n) { 143 | fprintf(stderr, "%s: %s `%s': ", 144 | argv[0], 145 | (ferror(fp) || !feof(fp)) ? "Cannot read from" : "Unexpected EOF in", 146 | argv[3]); 147 | perror(NULL); 148 | exit(EXIT_FAILURE); 149 | } 150 | fclose(fp); 151 | 152 | /* Search and print */ 153 | size = sa_search(T, (saidx_t)n, 154 | (const sauchar_t *)P, (saidx_t)Psize, 155 | SA, (saidx_t)n, &left); 156 | for(i = 0; i < size; ++i) { 157 | fprintf(stdout, "%" PRIdSAIDX_T "\n", SA[left + i]); 158 | } 159 | 160 | /* Deallocate memory. */ 161 | free(SA); 162 | free(T); 163 | 164 | return 0; 165 | } 166 | -------------------------------------------------------------------------------- /src/libdivsufsort/examples/suftest.c: -------------------------------------------------------------------------------- 1 | /* 2 | * suftest.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 
4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #if HAVE_CONFIG_H 28 | # include "config.h" 29 | #endif 30 | #include <stdio.h> 31 | #if HAVE_STRING_H 32 | # include <string.h> 33 | #endif 34 | #if HAVE_STDLIB_H 35 | # include <stdlib.h> 36 | #endif 37 | #if HAVE_MEMORY_H 38 | # include <memory.h> 39 | #endif 40 | #if HAVE_STDDEF_H 41 | # include <stddef.h> 42 | #endif 43 | #if HAVE_STRINGS_H 44 | # include <strings.h> 45 | #endif 46 | #if HAVE_SYS_TYPES_H 47 | # include <sys/types.h> 48 | #endif 49 | #if HAVE_IO_H && HAVE_FCNTL_H 50 | # include <io.h> 51 | # include <fcntl.h> 52 | #endif 53 | #include <time.h> 54 | #include <divsufsort.h> 55 | #include "lfs.h" 56 | 57 | 58 | static 59 | void 60 | print_help(const char *progname, int status) { 61 | fprintf(stderr, 62 | "suftest, a suffixsort tester, version %s.\n", 63 | divsufsort_version()); 64 | fprintf(stderr, "usage: %s FILE\n\n", progname); 65 | exit(status); 66 | } 67 | 68 | int 69 | main(int argc, const char *argv[]) { 70 | FILE *fp; 71 | const char *fname; 72 | sauchar_t *T; 73 | saidx_t *SA; 74 | LFS_OFF_T n; 75 | clock_t start, finish; 76 | saint_t needclose = 1; 77 | 78 | /* Check arguments. */ 79 | if((argc == 1) || 80 | (strcmp(argv[1], "-h") == 0) || 81 | (strcmp(argv[1], "--help") == 0)) { print_help(argv[0], EXIT_SUCCESS); } 82 | if(argc != 2) { print_help(argv[0], EXIT_FAILURE); } 83 | 84 | /* Open a file for reading. */ 85 | if(strcmp(argv[1], "-") != 0) { 86 | #if HAVE_FOPEN_S 87 | if(fopen_s(&fp, fname = argv[1], "rb") != 0) { 88 | #else 89 | if((fp = LFS_FOPEN(fname = argv[1], "rb")) == NULL) { 90 | #endif 91 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], fname); 92 | perror(NULL); 93 | exit(EXIT_FAILURE); 94 | } 95 | } else { 96 | #if HAVE__SETMODE && HAVE__FILENO 97 | if(_setmode(_fileno(stdin), _O_BINARY) == -1) { 98 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 99 | perror(NULL); 100 | exit(EXIT_FAILURE); 101 | } 102 | #endif 103 | fp = stdin; 104 | fname = "stdin"; 105 | needclose = 0; 106 | } 107 | 108 | /* Get the file size. 
*/ 109 | if(LFS_FSEEK(fp, 0, SEEK_END) == 0) { 110 | n = LFS_FTELL(fp); 111 | rewind(fp); 112 | if(n < 0) { 113 | fprintf(stderr, "%s: Cannot ftell `%s': ", argv[0], fname); 114 | perror(NULL); 115 | exit(EXIT_FAILURE); 116 | } 117 | if(0x7fffffff <= n) { 118 | fprintf(stderr, "%s: Input file `%s' is too big.\n", argv[0], fname); 119 | exit(EXIT_FAILURE); 120 | } 121 | } else { 122 | fprintf(stderr, "%s: Cannot fseek `%s': ", argv[0], fname); 123 | perror(NULL); 124 | exit(EXIT_FAILURE); 125 | } 126 | 127 | /* Allocate 5n bytes of memory. */ 128 | T = (sauchar_t *)malloc((size_t)n * sizeof(sauchar_t)); 129 | SA = (saidx_t *)malloc((size_t)n * sizeof(saidx_t)); 130 | if((T == NULL) || (SA == NULL)) { 131 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 132 | exit(EXIT_FAILURE); 133 | } 134 | 135 | /* Read n bytes of data. */ 136 | if(fread(T, sizeof(sauchar_t), (size_t)n, fp) != (size_t)n) { 137 | fprintf(stderr, "%s: %s `%s': ", 138 | argv[0], 139 | (ferror(fp) || !feof(fp)) ? "Cannot read from" : "Unexpected EOF in", 140 | argv[1]); 141 | perror(NULL); 142 | exit(EXIT_FAILURE); 143 | } 144 | if(needclose & 1) { fclose(fp); } 145 | 146 | /* Construct the suffix array. */ 147 | fprintf(stderr, "%s: %" PRIdOFF_T " bytes ... ", fname, n); 148 | start = clock(); 149 | if(divsufsort(T, SA, (saidx_t)n) != 0) { 150 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 151 | exit(EXIT_FAILURE); 152 | } 153 | finish = clock(); 154 | fprintf(stderr, "%.4f sec\n", (double)(finish - start) / (double)CLOCKS_PER_SEC); 155 | 156 | /* Check the suffix array. */ 157 | if(sufcheck(T, SA, (saidx_t)n, 1) != 0) { exit(EXIT_FAILURE); } 158 | 159 | /* Deallocate memory. 
*/ 160 | free(SA); 161 | free(T); 162 | 163 | return 0; 164 | } 165 | -------------------------------------------------------------------------------- /src/libdivsufsort/examples/unbwt.c: -------------------------------------------------------------------------------- 1 | /* 2 | * unbwt.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #if HAVE_CONFIG_H 28 | # include "config.h" 29 | #endif 30 | #include <stdio.h> 31 | #if HAVE_STRING_H 32 | # include <string.h> 33 | #endif 34 | #if HAVE_STDLIB_H 35 | # include <stdlib.h> 36 | #endif 37 | #if HAVE_MEMORY_H 38 | # include <memory.h> 39 | #endif 40 | #if HAVE_STDDEF_H 41 | # include <stddef.h> 42 | #endif 43 | #if HAVE_STRINGS_H 44 | # include <strings.h> 45 | #endif 46 | #if HAVE_SYS_TYPES_H 47 | # include <sys/types.h> 48 | #endif 49 | #if HAVE_IO_H && HAVE_FCNTL_H 50 | # include <io.h> 51 | # include <fcntl.h> 52 | #endif 53 | #include <time.h> 54 | #include <divsufsort.h> 55 | #include "lfs.h" 56 | 57 | 58 | static 59 | size_t 60 | read_int(FILE *fp, saidx_t *n) { 61 | unsigned char c[4]; 62 | size_t m = fread(c, sizeof(unsigned char), 4, fp); 63 | if(m == 4) { 64 | *n = (c[0] << 0) | (c[1] << 8) | 65 | (c[2] << 16) | (c[3] << 24); 66 | } 67 | return m; 68 | } 69 | 70 | static 71 | void 72 | print_help(const char *progname, int status) { 73 | fprintf(stderr, 74 | "unbwt, an inverse burrows-wheeler transform program, version %s.\n", 75 | divsufsort_version()); 76 | fprintf(stderr, "usage: %s INFILE OUTFILE\n\n", progname); 77 | exit(status); 78 | } 79 | 80 | int 81 | main(int argc, const char *argv[]) { 82 | FILE *fp, *ofp; 83 | const char *fname, *ofname; 84 | sauchar_t *T; 85 | saidx_t *A; 86 | LFS_OFF_T n; 87 | size_t m; 88 | saidx_t pidx; 89 | clock_t start, finish; 90 | saint_t err, blocksize, needclose = 3; 91 | 92 | /* Check arguments. */ 93 | if((argc == 1) || 94 | (strcmp(argv[1], "-h") == 0) || 95 | (strcmp(argv[1], "--help") == 0)) { print_help(argv[0], EXIT_SUCCESS); } 96 | if(argc != 3) { print_help(argv[0], EXIT_FAILURE); } 97 | 98 | /* Open a file for reading. 
*/ 99 | if(strcmp(argv[1], "-") != 0) { 100 | #if HAVE_FOPEN_S 101 | if(fopen_s(&fp, fname = argv[1], "rb") != 0) { 102 | #else 103 | if((fp = LFS_FOPEN(fname = argv[1], "rb")) == NULL) { 104 | #endif 105 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], fname); 106 | perror(NULL); 107 | exit(EXIT_FAILURE); 108 | } 109 | } else { 110 | #if HAVE__SETMODE && HAVE__FILENO 111 | if(_setmode(_fileno(stdin), _O_BINARY) == -1) { 112 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 113 | perror(NULL); 114 | exit(EXIT_FAILURE); 115 | } 116 | #endif 117 | fp = stdin; 118 | fname = "stdin"; 119 | needclose ^= 1; 120 | } 121 | 122 | /* Open a file for writing. */ 123 | if(strcmp(argv[2], "-") != 0) { 124 | #if HAVE_FOPEN_S 125 | if(fopen_s(&ofp, ofname = argv[2], "wb") != 0) { 126 | #else 127 | if((ofp = LFS_FOPEN(ofname = argv[2], "wb")) == NULL) { 128 | #endif 129 | fprintf(stderr, "%s: Cannot open file `%s': ", argv[0], ofname); 130 | perror(NULL); 131 | exit(EXIT_FAILURE); 132 | } 133 | } else { 134 | #if HAVE__SETMODE && HAVE__FILENO 135 | if(_setmode(_fileno(stdout), _O_BINARY) == -1) { 136 | fprintf(stderr, "%s: Cannot set mode: ", argv[0]); 137 | perror(NULL); 138 | exit(EXIT_FAILURE); 139 | } 140 | #endif 141 | ofp = stdout; 142 | ofname = "stdout"; 143 | needclose ^= 2; 144 | } 145 | 146 | /* Read the blocksize. */ 147 | if(read_int(fp, &blocksize) != 4) { 148 | fprintf(stderr, "%s: Cannot read from `%s': ", argv[0], fname); 149 | perror(NULL); 150 | exit(EXIT_FAILURE); 151 | } 152 | 153 | /* Allocate 5blocksize bytes of memory. */ 154 | T = (sauchar_t *)malloc(blocksize * sizeof(sauchar_t)); 155 | A = (saidx_t *)malloc(blocksize * sizeof(saidx_t)); 156 | if((T == NULL) || (A == NULL)) { 157 | fprintf(stderr, "%s: Cannot allocate memory.\n", argv[0]); 158 | exit(EXIT_FAILURE); 159 | } 160 | 161 | fprintf(stderr, "UnBWT (blocksize %" PRIdSAINT_T ") ... 
", blocksize); 162 | start = clock(); 163 | for(n = 0; (m = read_int(fp, &pidx)) != 0; n += m) { 164 | /* Read blocksize bytes of data. */ 165 | if((m != 4) || ((m = fread(T, sizeof(sauchar_t), blocksize, fp)) == 0)) { 166 | fprintf(stderr, "%s: %s `%s': ", 167 | argv[0], 168 | (ferror(fp) || !feof(fp)) ? "Cannot read from" : "Unexpected EOF in", 169 | fname); 170 | perror(NULL); 171 | exit(EXIT_FAILURE); 172 | } 173 | 174 | /* Inverse Burrows-Wheeler Transform. */ 175 | if((err = inverse_bw_transform(T, T, A, m, pidx)) != 0) { 176 | fprintf(stderr, "%s (reverseBWT): %s.\n", 177 | argv[0], 178 | (err == -1) ? "Invalid data" : "Cannot allocate memory"); 179 | exit(EXIT_FAILURE); 180 | } 181 | 182 | /* Write m bytes of data. */ 183 | if(fwrite(T, sizeof(sauchar_t), m, ofp) != m) { 184 | fprintf(stderr, "%s: Cannot write to `%s': ", argv[0], ofname); 185 | perror(NULL); 186 | exit(EXIT_FAILURE); 187 | } 188 | } 189 | if(ferror(fp)) { 190 | fprintf(stderr, "%s: Cannot read from `%s': ", argv[0], fname); 191 | perror(NULL); 192 | exit(EXIT_FAILURE); 193 | } 194 | finish = clock(); 195 | fprintf(stderr, "%" PRIdOFF_T " bytes: %.4f sec\n", 196 | n, (double)(finish - start) / (double)CLOCKS_PER_SEC); 197 | 198 | /* Close files */ 199 | if(needclose & 1) { fclose(fp); } 200 | if(needclose & 2) { fclose(ofp); } 201 | 202 | /* Deallocate memory. 
*/ 203 | free(A); 204 | free(T); 205 | 206 | return 0; 207 | } 208 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | include(CheckIncludeFiles) 2 | include(CheckIncludeFile) 3 | include(CheckSymbolExists) 4 | include(CheckTypeSize) 5 | include(CheckFunctionKeywords) 6 | include(CheckLFS) 7 | 8 | ## Checks for header files ## 9 | check_include_file("inttypes.h" HAVE_INTTYPES_H) 10 | check_include_file("memory.h" HAVE_MEMORY_H) 11 | check_include_file("stddef.h" HAVE_STDDEF_H) 12 | check_include_file("stdint.h" HAVE_STDINT_H) 13 | check_include_file("stdlib.h" HAVE_STDLIB_H) 14 | check_include_file("string.h" HAVE_STRING_H) 15 | check_include_file("strings.h" HAVE_STRINGS_H) 16 | check_include_file("sys/types.h" HAVE_SYS_TYPES_H) 17 | if(HAVE_INTTYPES_H) 18 | set(INCFILE "#include <inttypes.h>") 19 | elseif(HAVE_STDINT_H) 20 | set(INCFILE "#include <stdint.h>") 21 | else(HAVE_INTTYPES_H) 22 | set(INCFILE "") 23 | endif(HAVE_INTTYPES_H) 24 | 25 | ## create configuration files from .cmake file ## 26 | if(BUILD_EXAMPLES) 27 | ## Checks for WinIO ## 28 | if(WIN32) 29 | check_include_file("io.h" HAVE_IO_H) 30 | check_include_file("fcntl.h" HAVE_FCNTL_H) 31 | check_symbol_exists("_setmode" "io.h;fcntl.h" HAVE__SETMODE) 32 | if(NOT HAVE__SETMODE) 33 | check_symbol_exists("setmode" "io.h;fcntl.h" HAVE_SETMODE) 34 | endif(NOT HAVE__SETMODE) 35 | check_symbol_exists("_fileno" "stdio.h" HAVE__FILENO) 36 | check_symbol_exists("fopen_s" "stdio.h" HAVE_FOPEN_S) 37 | check_symbol_exists("_O_BINARY" "fcntl.h" HAVE__O_BINARY) 38 | endif(WIN32) 39 | 40 | ## Checks for large file support ## 41 | check_lfs(WITH_LFS) 42 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/lfs.h.cmake" "${CMAKE_CURRENT_BINARY_DIR}/lfs.h" @ONLY) 43 | endif(BUILD_EXAMPLES) 44 | 45 | ## generate config.h ## 46 | 
check_function_keywords("inline;__inline;__inline__;__declspec(dllexport);__declspec(dllimport)") 47 | if(HAVE_INLINE) 48 | set(INLINE "inline") 49 | elseif(HAVE___INLINE) 50 | set(INLINE "__inline") 51 | elseif(HAVE___INLINE__) 52 | set(INLINE "__inline__") 53 | else(HAVE_INLINE) 54 | set(INLINE "") 55 | endif(HAVE_INLINE) 56 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/config.h.cmake" "${CMAKE_CURRENT_BINARY_DIR}/config.h") 57 | 58 | ## Checks for types ## 59 | # sauchar_t (8bit) 60 | check_type_size("uint8_t" UINT8_T) 61 | if(HAVE_UINT8_T) 62 | set(SAUCHAR_TYPE "uint8_t") 63 | else(HAVE_UINT8_T) 64 | check_type_size("unsigned char" SIZEOF_UNSIGNED_CHAR) 65 | if("${SIZEOF_UNSIGNED_CHAR}" STREQUAL "1") 66 | set(SAUCHAR_TYPE "unsigned char") 67 | else("${SIZEOF_UNSIGNED_CHAR}" STREQUAL "1") 68 | message(FATAL_ERROR "Cannot find unsigned 8-bit integer type") 69 | endif("${SIZEOF_UNSIGNED_CHAR}" STREQUAL "1") 70 | endif(HAVE_UINT8_T) 71 | # saint_t (32bit) 72 | check_type_size("int32_t" INT32_T) 73 | if(HAVE_INT32_T) 74 | set(SAINT32_TYPE "int32_t") 75 | check_symbol_exists("PRId32" "inttypes.h" HAVE_PRID32) 76 | if(HAVE_PRID32) 77 | set(SAINT32_PRId "PRId32") 78 | else(HAVE_PRID32) 79 | set(SAINT32_PRId "\"d\"") 80 | endif(HAVE_PRID32) 81 | else(HAVE_INT32_T) 82 | check_type_size("int" SIZEOF_INT) 83 | check_type_size("long" SIZEOF_LONG) 84 | check_type_size("short" SIZEOF_SHORT) 85 | check_type_size("__int32" SIZEOF___INT32) 86 | if("${SIZEOF_INT}" STREQUAL "4") 87 | set(SAINT32_TYPE "int") 88 | set(SAINT32_PRId "\"d\"") 89 | elseif("${SIZEOF_LONG}" STREQUAL "4") 90 | set(SAINT32_TYPE "long") 91 | set(SAINT32_PRId "\"ld\"") 92 | elseif("${SIZEOF_SHORT}" STREQUAL "4") 93 | set(SAINT32_TYPE "short") 94 | set(SAINT32_PRId "\"d\"") 95 | elseif("${SIZEOF___INT32}" STREQUAL "4") 96 | set(SAINT32_TYPE "__int32") 97 | set(SAINT32_PRId "\"d\"") 98 | else("${SIZEOF_INT}" STREQUAL "4") 99 | message(FATAL_ERROR "Cannot find 32-bit integer type") 100 | endif("${SIZEOF_INT}" 
STREQUAL "4") 101 | endif(HAVE_INT32_T) 102 | # saint64_t (64bit) 103 | if(BUILD_DIVSUFSORT64) 104 | check_type_size("int64_t" INT64_T) 105 | if(HAVE_INT64_T) 106 | set(SAINT64_TYPE "int64_t") 107 | check_symbol_exists("PRId64" "inttypes.h" HAVE_PRID64) 108 | if(HAVE_PRID64) 109 | set(SAINT64_PRId "PRId64") 110 | else(HAVE_PRID64) 111 | set(SAINT64_PRId "\"lld\"") 112 | endif(HAVE_PRID64) 113 | else(HAVE_INT64_T) 114 | check_type_size("int" SIZEOF_INT) 115 | check_type_size("long" SIZEOF_LONG) 116 | check_type_size("long long" SIZEOF_LONG_LONG) 117 | check_type_size("__int64" SIZEOF___INT64) 118 | if("${SIZEOF_INT}" STREQUAL "8") 119 | set(SAINT64_TYPE "int") 120 | set(SAINT64_PRId "\"d\"") 121 | elseif("${SIZEOF_LONG}" STREQUAL "8") 122 | set(SAINT64_TYPE "long") 123 | set(SAINT64_PRId "\"ld\"") 124 | elseif("${SIZEOF_LONG_LONG}" STREQUAL "8") 125 | set(SAINT64_TYPE "long long") 126 | set(SAINT64_PRId "\"lld\"") 127 | elseif("${SIZEOF___INT64}" STREQUAL "8") 128 | set(SAINT64_TYPE "__int64") 129 | set(SAINT64_PRId "\"I64d\"") 130 | else("${SIZEOF_INT}" STREQUAL "8") 131 | message(SEND_ERROR "Cannot find 64-bit integer type") 132 | set(BUILD_DIVSUFSORT64 OFF) 133 | endif("${SIZEOF_INT}" STREQUAL "8") 134 | endif(HAVE_INT64_T) 135 | endif(BUILD_DIVSUFSORT64) 136 | 137 | ## generate divsufsort.h ## 138 | set(DIVSUFSORT_IMPORT "") 139 | set(DIVSUFSORT_EXPORT "") 140 | if(BUILD_SHARED_LIBS) 141 | if(HAVE___DECLSPEC_DLLIMPORT_) 142 | set(DIVSUFSORT_IMPORT "__declspec(dllimport)") 143 | endif(HAVE___DECLSPEC_DLLIMPORT_) 144 | if(HAVE___DECLSPEC_DLLEXPORT_) 145 | set(DIVSUFSORT_EXPORT "__declspec(dllexport)") 146 | endif(HAVE___DECLSPEC_DLLEXPORT_) 147 | endif(BUILD_SHARED_LIBS) 148 | set(W64BIT "") 149 | set(SAINDEX_TYPE "${SAINT32_TYPE}") 150 | set(SAINDEX_PRId "${SAINT32_PRId}") 151 | set(SAINT_PRId "${SAINT32_PRId}") 152 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/divsufsort.h.cmake" 153 | "${CMAKE_CURRENT_BINARY_DIR}/divsufsort.h" @ONLY) 154 | install(FILES 
"${CMAKE_CURRENT_BINARY_DIR}/divsufsort.h" DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) 155 | if(BUILD_DIVSUFSORT64) 156 | set(W64BIT "64") 157 | set(SAINDEX_TYPE "${SAINT64_TYPE}") 158 | set(SAINDEX_PRId "${SAINT64_PRId}") 159 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/divsufsort.h.cmake" 160 | "${CMAKE_CURRENT_BINARY_DIR}/divsufsort64.h" @ONLY) 161 | install(FILES "${CMAKE_CURRENT_BINARY_DIR}/divsufsort64.h" DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) 162 | endif(BUILD_DIVSUFSORT64) 163 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/config.h.cmake: -------------------------------------------------------------------------------- 1 | /* 2 | * config.h for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #ifndef _CONFIG_H 28 | #define _CONFIG_H 1 29 | 30 | #ifdef __cplusplus 31 | extern "C" { 32 | #endif /* __cplusplus */ 33 | 34 | /** Define to the version of this package. **/ 35 | #cmakedefine PROJECT_VERSION_FULL "${PROJECT_VERSION_FULL}" 36 | 37 | /** Define to 1 if you have the header files. **/ 38 | #cmakedefine HAVE_INTTYPES_H 1 39 | #cmakedefine HAVE_STDDEF_H 1 40 | #cmakedefine HAVE_STDINT_H 1 41 | #cmakedefine HAVE_STDLIB_H 1 42 | #cmakedefine HAVE_STRING_H 1 43 | #cmakedefine HAVE_STRINGS_H 1 44 | #cmakedefine HAVE_MEMORY_H 1 45 | #cmakedefine HAVE_SYS_TYPES_H 1 46 | 47 | /** for WinIO **/ 48 | #cmakedefine HAVE_IO_H 1 49 | #cmakedefine HAVE_FCNTL_H 1 50 | #cmakedefine HAVE__SETMODE 1 51 | #cmakedefine HAVE_SETMODE 1 52 | #cmakedefine HAVE__FILENO 1 53 | #cmakedefine HAVE_FOPEN_S 1 54 | #cmakedefine HAVE__O_BINARY 1 55 | #ifndef HAVE__SETMODE 56 | # if HAVE_SETMODE 57 | # define _setmode setmode 58 | # define HAVE__SETMODE 1 59 | # endif 60 | # if HAVE__SETMODE && !HAVE__O_BINARY 61 | # define _O_BINARY 0 62 | # define HAVE__O_BINARY 1 63 | # endif 64 | #endif 65 | 66 | /** for inline **/ 67 | #ifndef INLINE 68 | # define INLINE @INLINE@ 69 | #endif 70 | 71 | /** for VC++ warning **/ 72 | #ifdef _MSC_VER 73 | #pragma warning(disable: 4127) 74 | #endif 75 | 76 | 77 | #ifdef __cplusplus 78 | } /* extern "C" */ 79 | #endif /* __cplusplus */ 80 | 81 | #endif /* _CONFIG_H */ 82 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/divsufsort.h: -------------------------------------------------------------------------------- 1 | /* 2 | * divsufsort.h for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 
4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #ifndef _DIVSUFSORT_H 28 | #define _DIVSUFSORT_H 1 29 | 30 | #ifdef __cplusplus 31 | extern "C" { 32 | #endif /* __cplusplus */ 33 | 34 | #define DIVSUFSORT_API 35 | 36 | /*- Datatypes -*/ 37 | #ifndef SAUCHAR_T 38 | #define SAUCHAR_T 39 | typedef unsigned char sauchar_t; 40 | #endif /* SAUCHAR_T */ 41 | #ifndef SAINT_T 42 | #define SAINT_T 43 | typedef int saint_t; 44 | #endif /* SAINT_T */ 45 | #ifndef SAIDX_T 46 | #define SAIDX_T 47 | typedef int saidx_t; 48 | #endif /* SAIDX_T */ 49 | #ifndef PRIdSAIDX_T 50 | #define PRIdSAIDX_T "d" 51 | #endif 52 | 53 | /*- divsufsort context */ 54 | typedef struct _divsufsort_ctx_t { 55 | saidx_t *bucket_A; 56 | saidx_t *bucket_B; 57 | } divsufsort_ctx_t; 58 | 59 | /*- Prototypes -*/ 60 | 61 | /** 62 | * Initialize suffix array context 63 | * 64 | * @return 0 for success, or non-zero in case of an error 65 | */ 66 | int divsufsort_init(divsufsort_ctx_t *ctx); 67 | 68 | /** 69 | * Destroy suffix array context 70 | * 71 | * @param ctx suffix array context to destroy 72 | */ 73 | void divsufsort_destroy(divsufsort_ctx_t *ctx); 74 | 75 | /** 76 | * Constructs the suffix array of a given string. 77 | * @param ctx suffix array context 78 | * @param T[0..n-1] The input string. 79 | * @param SA[0..n-1] The output array of suffixes. 80 | * @param n The length of the given string. 81 | * @return 0 if no error occurred, -1 or -2 otherwise. 82 | */ 83 | DIVSUFSORT_API 84 | saint_t divsufsort_build_array(divsufsort_ctx_t *ctx, const sauchar_t *T, saidx_t *SA, saidx_t n); 85 | 86 | #if 0 87 | /** 88 | * Constructs the burrows-wheeler transformed string of a given string. 89 | * @param T[0..n-1] The input string. 90 | * @param U[0..n-1] The output string. (can be T) 91 | * @param A[0..n-1] The temporary array. (can be NULL) 92 | * @param n The length of the given string. 93 | * @return The primary index if no error occurred, -1 or -2 otherwise. 
94 | */ 95 | DIVSUFSORT_API 96 | saidx_t 97 | divbwt(const sauchar_t *T, sauchar_t *U, saidx_t *A, saidx_t n); 98 | 99 | /** 100 | * Returns the version of the divsufsort library. 101 | * @return The version number string. 102 | */ 103 | DIVSUFSORT_API 104 | const char * 105 | divsufsort_version(void); 106 | 107 | 108 | /** 109 | * Constructs the burrows-wheeler transformed string of a given string and suffix array. 110 | * @param T[0..n-1] The input string. 111 | * @param U[0..n-1] The output string. (can be T) 112 | * @param SA[0..n-1] The suffix array. (can be NULL) 113 | * @param n The length of the given string. 114 | * @param idx The output primary index. 115 | * @return 0 if no error occurred, -1 or -2 otherwise. 116 | */ 117 | DIVSUFSORT_API 118 | saint_t 119 | bw_transform(const sauchar_t *T, sauchar_t *U, 120 | saidx_t *SA /* can NULL */, 121 | saidx_t n, saidx_t *idx); 122 | 123 | /** 124 | * Inverse BW-transforms a given BWTed string. 125 | * @param T[0..n-1] The input string. 126 | * @param U[0..n-1] The output string. (can be T) 127 | * @param A[0..n-1] The temporary array. (can be NULL) 128 | * @param n The length of the given string. 129 | * @param idx The primary index. 130 | * @return 0 if no error occurred, -1 or -2 otherwise. 131 | */ 132 | DIVSUFSORT_API 133 | saint_t 134 | inverse_bw_transform(const sauchar_t *T, sauchar_t *U, 135 | saidx_t *A /* can NULL */, 136 | saidx_t n, saidx_t idx); 137 | 138 | /** 139 | * Checks the correctness of a given suffix array. 140 | * @param T[0..n-1] The input string. 141 | * @param SA[0..n-1] The input suffix array. 142 | * @param n The length of the given string. 143 | * @param verbose The verbose mode. 144 | * @return 0 if no error occurred. 145 | */ 146 | DIVSUFSORT_API 147 | saint_t 148 | sufcheck(const sauchar_t *T, const saidx_t *SA, saidx_t n, saint_t verbose); 149 | 150 | /** 151 | * Search for the pattern P in the string T. 152 | * @param T[0..Tsize-1] The input string. 
153 | * @param Tsize The length of the given string. 154 | * @param P[0..Psize-1] The input pattern string. 155 | * @param Psize The length of the given pattern string. 156 | * @param SA[0..SAsize-1] The input suffix array. 157 | * @param SAsize The length of the given suffix array. 158 | * @param idx The output index. 159 | * @return The count of matches if no error occurred, -1 otherwise. 160 | */ 161 | DIVSUFSORT_API 162 | saidx_t 163 | sa_search(const sauchar_t *T, saidx_t Tsize, 164 | const sauchar_t *P, saidx_t Psize, 165 | const saidx_t *SA, saidx_t SAsize, 166 | saidx_t *left); 167 | 168 | /** 169 | * Search for the character c in the string T. 170 | * @param T[0..Tsize-1] The input string. 171 | * @param Tsize The length of the given string. 172 | * @param SA[0..SAsize-1] The input suffix array. 173 | * @param SAsize The length of the given suffix array. 174 | * @param c The input character. 175 | * @param idx The output index. 176 | * @return The count of matches if no error occurred, -1 otherwise. 177 | */ 178 | DIVSUFSORT_API 179 | saidx_t 180 | sa_simplesearch(const sauchar_t *T, saidx_t Tsize, 181 | const saidx_t *SA, saidx_t SAsize, 182 | saint_t c, saidx_t *left); 183 | #endif 184 | 185 | #ifdef __cplusplus 186 | } /* extern "C" */ 187 | #endif /* __cplusplus */ 188 | 189 | #endif /* _DIVSUFSORT_H */ 190 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/divsufsort.h.cmake: -------------------------------------------------------------------------------- 1 | /* 2 | * divsufsort@W64BIT@.h for libdivsufsort@W64BIT@ 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 
4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #ifndef _DIVSUFSORT@W64BIT@_H 28 | #define _DIVSUFSORT@W64BIT@_H 1 29 | 30 | #ifdef __cplusplus 31 | extern "C" { 32 | #endif /* __cplusplus */ 33 | 34 | @INCFILE@ 35 | 36 | #ifndef DIVSUFSORT_API 37 | # ifdef DIVSUFSORT_BUILD_DLL 38 | # define DIVSUFSORT_API @DIVSUFSORT_EXPORT@ 39 | # else 40 | # define DIVSUFSORT_API @DIVSUFSORT_IMPORT@ 41 | # endif 42 | #endif 43 | 44 | /*- Datatypes -*/ 45 | #ifndef SAUCHAR_T 46 | #define SAUCHAR_T 47 | typedef @SAUCHAR_TYPE@ sauchar_t; 48 | #endif /* SAUCHAR_T */ 49 | #ifndef SAINT_T 50 | #define SAINT_T 51 | typedef @SAINT32_TYPE@ saint_t; 52 | #endif /* SAINT_T */ 53 | #ifndef SAIDX@W64BIT@_T 54 | #define SAIDX@W64BIT@_T 55 | typedef @SAINDEX_TYPE@ saidx@W64BIT@_t; 56 | #endif /* SAIDX@W64BIT@_T */ 57 | #ifndef PRIdSAINT_T 58 | #define PRIdSAINT_T @SAINT_PRId@ 59 | #endif /* PRIdSAINT_T */ 60 | #ifndef PRIdSAIDX@W64BIT@_T 61 | #define PRIdSAIDX@W64BIT@_T @SAINDEX_PRId@ 62 | #endif /* PRIdSAIDX@W64BIT@_T */ 63 | 64 | 65 | /*- Prototypes -*/ 66 | 67 | /** 68 | * Constructs the suffix array of a given string. 69 | * @param T[0..n-1] The input string. 70 | * @param SA[0..n-1] The output array of suffixes. 71 | * @param n The length of the given string. 72 | * @return 0 if no error occurred, -1 or -2 otherwise. 73 | */ 74 | DIVSUFSORT_API 75 | saint_t 76 | divsufsort@W64BIT@(const sauchar_t *T, saidx@W64BIT@_t *SA, saidx@W64BIT@_t n); 77 | 78 | /** 79 | * Constructs the burrows-wheeler transformed string of a given string. 80 | * @param T[0..n-1] The input string. 81 | * @param U[0..n-1] The output string. (can be T) 82 | * @param A[0..n-1] The temporary array. (can be NULL) 83 | * @param n The length of the given string. 84 | * @return The primary index if no error occurred, -1 or -2 otherwise. 85 | */ 86 | DIVSUFSORT_API 87 | saidx@W64BIT@_t 88 | divbwt@W64BIT@(const sauchar_t *T, sauchar_t *U, saidx@W64BIT@_t *A, saidx@W64BIT@_t n); 89 | 90 | /** 91 | * Returns the version of the divsufsort library. 
92 | * @return The version number string. 93 | */ 94 | DIVSUFSORT_API 95 | const char * 96 | divsufsort@W64BIT@_version(void); 97 | 98 | 99 | /** 100 | * Constructs the burrows-wheeler transformed string of a given string and suffix array. 101 | * @param T[0..n-1] The input string. 102 | * @param U[0..n-1] The output string. (can be T) 103 | * @param SA[0..n-1] The suffix array. (can be NULL) 104 | * @param n The length of the given string. 105 | * @param idx The output primary index. 106 | * @return 0 if no error occurred, -1 or -2 otherwise. 107 | */ 108 | DIVSUFSORT_API 109 | saint_t 110 | bw_transform@W64BIT@(const sauchar_t *T, sauchar_t *U, 111 | saidx@W64BIT@_t *SA /* can NULL */, 112 | saidx@W64BIT@_t n, saidx@W64BIT@_t *idx); 113 | 114 | /** 115 | * Inverse BW-transforms a given BWTed string. 116 | * @param T[0..n-1] The input string. 117 | * @param U[0..n-1] The output string. (can be T) 118 | * @param A[0..n-1] The temporary array. (can be NULL) 119 | * @param n The length of the given string. 120 | * @param idx The primary index. 121 | * @return 0 if no error occurred, -1 or -2 otherwise. 122 | */ 123 | DIVSUFSORT_API 124 | saint_t 125 | inverse_bw_transform@W64BIT@(const sauchar_t *T, sauchar_t *U, 126 | saidx@W64BIT@_t *A /* can NULL */, 127 | saidx@W64BIT@_t n, saidx@W64BIT@_t idx); 128 | 129 | /** 130 | * Checks the correctness of a given suffix array. 131 | * @param T[0..n-1] The input string. 132 | * @param SA[0..n-1] The input suffix array. 133 | * @param n The length of the given string. 134 | * @param verbose The verbose mode. 135 | * @return 0 if no error occurred. 136 | */ 137 | DIVSUFSORT_API 138 | saint_t 139 | sufcheck@W64BIT@(const sauchar_t *T, const saidx@W64BIT@_t *SA, saidx@W64BIT@_t n, saint_t verbose); 140 | 141 | /** 142 | * Search for the pattern P in the string T. 143 | * @param T[0..Tsize-1] The input string. 144 | * @param Tsize The length of the given string. 145 | * @param P[0..Psize-1] The input pattern string. 
146 | * @param Psize The length of the given pattern string. 147 | * @param SA[0..SAsize-1] The input suffix array. 148 | * @param SAsize The length of the given suffix array. 149 | * @param idx The output index. 150 | * @return The count of matches if no error occurred, -1 otherwise. 151 | */ 152 | DIVSUFSORT_API 153 | saidx@W64BIT@_t 154 | sa_search@W64BIT@(const sauchar_t *T, saidx@W64BIT@_t Tsize, 155 | const sauchar_t *P, saidx@W64BIT@_t Psize, 156 | const saidx@W64BIT@_t *SA, saidx@W64BIT@_t SAsize, 157 | saidx@W64BIT@_t *left); 158 | 159 | /** 160 | * Search for the character c in the string T. 161 | * @param T[0..Tsize-1] The input string. 162 | * @param Tsize The length of the given string. 163 | * @param SA[0..SAsize-1] The input suffix array. 164 | * @param SAsize The length of the given suffix array. 165 | * @param c The input character. 166 | * @param idx The output index. 167 | * @return The count of matches if no error occurred, -1 otherwise. 168 | */ 169 | DIVSUFSORT_API 170 | saidx@W64BIT@_t 171 | sa_simplesearch@W64BIT@(const sauchar_t *T, saidx@W64BIT@_t Tsize, 172 | const saidx@W64BIT@_t *SA, saidx@W64BIT@_t SAsize, 173 | saint_t c, saidx@W64BIT@_t *left); 174 | 175 | 176 | #ifdef __cplusplus 177 | } /* extern "C" */ 178 | #endif /* __cplusplus */ 179 | 180 | #endif /* _DIVSUFSORT@W64BIT@_H */ 181 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/divsufsort_config.h: -------------------------------------------------------------------------------- 1 | #define HAVE_STRING_H 1 2 | #define HAVE_STDLIB_H 1 3 | #define HAVE_MEMORY_H 1 4 | #define HAVE_STDINT_H 1 5 | #define INLINE inline 6 | 7 | #ifdef _MSC_VER 8 | #pragma warning( disable : 4244 ) 9 | #endif /* _MSC_VER */ 10 | -------------------------------------------------------------------------------- /src/libdivsufsort/include/divsufsort_private.h: -------------------------------------------------------------------------------- 1 | /* 
2 | * divsufsort_private.h for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #ifndef _DIVSUFSORT_PRIVATE_H 28 | #define _DIVSUFSORT_PRIVATE_H 1 29 | 30 | #ifdef __cplusplus 31 | extern "C" { 32 | #endif /* __cplusplus */ 33 | 34 | #include "divsufsort_config.h" 35 | #include <assert.h> 36 | #include <stdio.h> 37 | #if HAVE_STRING_H 38 | # include <string.h> 39 | #endif 40 | #if HAVE_STDLIB_H 41 | # include <stdlib.h> 42 | #endif 43 | #if HAVE_MEMORY_H 44 | # include <memory.h> 45 | #endif 46 | #if HAVE_STDDEF_H 47 | # include <stddef.h> 48 | #endif 49 | #if HAVE_STRINGS_H 50 | # include <strings.h> 51 | #endif 52 | #if HAVE_INTTYPES_H 53 | # include <inttypes.h> 54 | #else 55 | # if HAVE_STDINT_H 56 | # include <stdint.h> 57 | # endif 58 | #endif 59 | #if defined(BUILD_DIVSUFSORT64) 60 | # include "divsufsort64.h" 61 | # ifndef SAIDX_T 62 | # define SAIDX_T 63 | # define saidx_t saidx64_t 64 | # endif /* SAIDX_T */ 65 | # ifndef PRIdSAIDX_T 66 | # define PRIdSAIDX_T PRIdSAIDX64_T 67 | # endif /* PRIdSAIDX_T */ 68 | # define divsufsort divsufsort64 69 | # define divbwt divbwt64 70 | # define divsufsort_version divsufsort64_version 71 | # define bw_transform bw_transform64 72 | # define inverse_bw_transform inverse_bw_transform64 73 | # define sufcheck sufcheck64 74 | # define sa_search sa_search64 75 | # define sa_simplesearch sa_simplesearch64 76 | # define sssort sssort64 77 | # define trsort trsort64 78 | #else 79 | # include "divsufsort.h" 80 | #endif 81 | 82 | 83 | /*- Constants -*/ 84 | #if !defined(UINT8_MAX) 85 | # define UINT8_MAX (255) 86 | #endif /* UINT8_MAX */ 87 | #if defined(ALPHABET_SIZE) && (ALPHABET_SIZE < 1) 88 | # undef ALPHABET_SIZE 89 | #endif 90 | #if !defined(ALPHABET_SIZE) 91 | # define ALPHABET_SIZE (UINT8_MAX + 1) 92 | #endif 93 | /* for divsufsort.c */ 94 | #define BUCKET_A_SIZE (ALPHABET_SIZE) 95 | #define BUCKET_B_SIZE (ALPHABET_SIZE * ALPHABET_SIZE) 96 | /* for sssort.c */ 97 | #if defined(SS_INSERTIONSORT_THRESHOLD) 98 | # if SS_INSERTIONSORT_THRESHOLD < 1 99 | # undef SS_INSERTIONSORT_THRESHOLD 100 | # define SS_INSERTIONSORT_THRESHOLD (1) 101 | # endif 102 | #else 103 | # define 
SS_INSERTIONSORT_THRESHOLD (8) 104 | #endif 105 | #if defined(SS_BLOCKSIZE) 106 | # if SS_BLOCKSIZE < 0 107 | # undef SS_BLOCKSIZE 108 | # define SS_BLOCKSIZE (0) 109 | # elif 32768 <= SS_BLOCKSIZE 110 | # undef SS_BLOCKSIZE 111 | # define SS_BLOCKSIZE (32767) 112 | # endif 113 | #else 114 | # define SS_BLOCKSIZE (1024) 115 | #endif 116 | /* minstacksize = log(SS_BLOCKSIZE) / log(3) * 2 */ 117 | #if SS_BLOCKSIZE == 0 118 | # if defined(BUILD_DIVSUFSORT64) 119 | # define SS_MISORT_STACKSIZE (96) 120 | # else 121 | # define SS_MISORT_STACKSIZE (64) 122 | # endif 123 | #elif SS_BLOCKSIZE <= 4096 124 | # define SS_MISORT_STACKSIZE (16) 125 | #else 126 | # define SS_MISORT_STACKSIZE (24) 127 | #endif 128 | #if defined(BUILD_DIVSUFSORT64) 129 | # define SS_SMERGE_STACKSIZE (64) 130 | #else 131 | # define SS_SMERGE_STACKSIZE (32) 132 | #endif 133 | /* for trsort.c */ 134 | #define TR_INSERTIONSORT_THRESHOLD (8) 135 | #if defined(BUILD_DIVSUFSORT64) 136 | # define TR_STACKSIZE (96) 137 | #else 138 | # define TR_STACKSIZE (64) 139 | #endif 140 | 141 | 142 | /*- Macros -*/ 143 | #ifndef SWAP 144 | # define SWAP(_a, _b) do { t = (_a); (_a) = (_b); (_b) = t; } while(0) 145 | #endif /* SWAP */ 146 | #ifndef MIN 147 | # define MIN(_a, _b) (((_a) < (_b)) ? (_a) : (_b)) 148 | #endif /* MIN */ 149 | #ifndef MAX 150 | # define MAX(_a, _b) (((_a) > (_b)) ? 
(_a) : (_b)) 151 | #endif /* MAX */ 152 | #define STACK_PUSH(_a, _b, _c, _d)\ 153 | do {\ 154 | assert(ssize < STACK_SIZE);\ 155 | stack[ssize].a = (_a), stack[ssize].b = (_b),\ 156 | stack[ssize].c = (_c), stack[ssize++].d = (_d);\ 157 | } while(0) 158 | #define STACK_PUSH5(_a, _b, _c, _d, _e)\ 159 | do {\ 160 | assert(ssize < STACK_SIZE);\ 161 | stack[ssize].a = (_a), stack[ssize].b = (_b),\ 162 | stack[ssize].c = (_c), stack[ssize].d = (_d), stack[ssize++].e = (_e);\ 163 | } while(0) 164 | #define STACK_POP(_a, _b, _c, _d)\ 165 | do {\ 166 | assert(0 <= ssize);\ 167 | if(ssize == 0) { return; }\ 168 | (_a) = stack[--ssize].a, (_b) = stack[ssize].b,\ 169 | (_c) = stack[ssize].c, (_d) = stack[ssize].d;\ 170 | } while(0) 171 | #define STACK_POP5(_a, _b, _c, _d, _e)\ 172 | do {\ 173 | assert(0 <= ssize);\ 174 | if(ssize == 0) { return; }\ 175 | (_a) = stack[--ssize].a, (_b) = stack[ssize].b,\ 176 | (_c) = stack[ssize].c, (_d) = stack[ssize].d, (_e) = stack[ssize].e;\ 177 | } while(0) 178 | /* for divsufsort.c */ 179 | #define BUCKET_A(_c0) bucket_A[(_c0)] 180 | #if ALPHABET_SIZE == 256 181 | #define BUCKET_B(_c0, _c1) (bucket_B[((_c1) << 8) | (_c0)]) 182 | #define BUCKET_BSTAR(_c0, _c1) (bucket_B[((_c0) << 8) | (_c1)]) 183 | #else 184 | #define BUCKET_B(_c0, _c1) (bucket_B[(_c1) * ALPHABET_SIZE + (_c0)]) 185 | #define BUCKET_BSTAR(_c0, _c1) (bucket_B[(_c0) * ALPHABET_SIZE + (_c1)]) 186 | #endif 187 | 188 | 189 | /*- Private Prototypes -*/ 190 | /* sssort.c */ 191 | void 192 | sssort(const sauchar_t *Td, const saidx_t *PA, 193 | saidx_t *first, saidx_t *last, 194 | saidx_t *buf, saidx_t bufsize, 195 | saidx_t depth, saidx_t n, saint_t lastsuffix); 196 | /* trsort.c */ 197 | void 198 | trsort(saidx_t *ISA, saidx_t *SA, saidx_t n, saidx_t depth); 199 | 200 | 201 | #ifdef __cplusplus 202 | } /* extern "C" */ 203 | #endif /* __cplusplus */ 204 | 205 | #endif /* _DIVSUFSORT_PRIVATE_H */ 206 | -------------------------------------------------------------------------------- 
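Note on the `BUCKET_B` / `BUCKET_BSTAR` macros in divsufsort_private.h above: both index the same `bucket_B` array of `ALPHABET_SIZE * ALPHABET_SIZE` entries, with the two character arguments swapped. The sketch below reproduces only that index arithmetic for the `ALPHABET_SIZE == 256` case. The function names `bucket_b_index`, `bucket_bstar_index`, and `buckets_alias` are hypothetical helpers introduced for illustration and are not part of libdivsufsort.

```c
#include <assert.h>
#include <stddef.h>

#define ALPHABET_SIZE 256

/* Same arithmetic as BUCKET_B(_c0, _c1) when ALPHABET_SIZE == 256:
 * the second character selects the row, the first the column. */
static size_t bucket_b_index(unsigned c0, unsigned c1) {
  return ((size_t)c1 << 8) | c0;
}

/* Same arithmetic as BUCKET_BSTAR(_c0, _c1): the transposed layout,
 * where the first character selects the row. */
static size_t bucket_bstar_index(unsigned c0, unsigned c1) {
  return ((size_t)c0 << 8) | c1;
}

/* Hypothetical helper: returns 1 when both macros would address the
 * same slot of the shared array, 0 otherwise. For characters below
 * ALPHABET_SIZE this happens only when c0 == c1. */
static int buckets_alias(unsigned c0, unsigned c1) {
  assert(c0 < ALPHABET_SIZE && c1 < ALPHABET_SIZE);
  return bucket_b_index(c0, c1) == bucket_bstar_index(c0, c1);
}
```

Because the two layouts are transposes of each other, a pair such as (1, 2) maps to different slots under each macro, and only diagonal pairs such as (7, 7) coincide. How the library partitions pairs between the two macros is not shown in this header, so the sketch makes no claim about it.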
/src/libdivsufsort/include/lfs.h.cmake: -------------------------------------------------------------------------------- 1 | /* 2 | * lfs.h for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 
25 | */ 26 | 27 | #ifndef _LFS_H 28 | #define _LFS_H 1 29 | 30 | #ifdef __cplusplus 31 | extern "C" { 32 | #endif /* __cplusplus */ 33 | 34 | #ifndef __STRICT_ANSI__ 35 | # define LFS_OFF_T @LFS_OFF_T@ 36 | # define LFS_FOPEN @LFS_FOPEN@ 37 | # define LFS_FTELL @LFS_FTELL@ 38 | # define LFS_FSEEK @LFS_FSEEK@ 39 | # define LFS_PRId @LFS_PRID@ 40 | #else 41 | # define LFS_OFF_T long 42 | # define LFS_FOPEN fopen 43 | # define LFS_FTELL ftell 44 | # define LFS_FSEEK fseek 45 | # define LFS_PRId "ld" 46 | #endif 47 | #ifndef PRIdOFF_T 48 | # define PRIdOFF_T LFS_PRId 49 | #endif 50 | 51 | 52 | #ifdef __cplusplus 53 | } /* extern "C" */ 54 | #endif /* __cplusplus */ 55 | 56 | #endif /* _LFS_H */ 57 | -------------------------------------------------------------------------------- /src/libdivsufsort/lib/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | include_directories("${CMAKE_CURRENT_SOURCE_DIR}/../include" 2 | "${CMAKE_CURRENT_BINARY_DIR}/../include") 3 | 4 | set(divsufsort_SRCS divsufsort.c sssort.c trsort.c utils.c) 5 | 6 | ## libdivsufsort ## 7 | add_library(divsufsort ${divsufsort_SRCS}) 8 | install(TARGETS divsufsort 9 | RUNTIME DESTINATION ${CMAKE_INSTALL_RUNTIMEDIR} 10 | LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} 11 | ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) 12 | set_target_properties(divsufsort PROPERTIES 13 | VERSION "${LIBRARY_VERSION}" 14 | SOVERSION "${LIBRARY_SOVERSION}" 15 | DEFINE_SYMBOL DIVSUFSORT_BUILD_DLL 16 | RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/../examples") 17 | 18 | ## libdivsufsort64 ## 19 | if(BUILD_DIVSUFSORT64) 20 | add_library(divsufsort64 ${divsufsort_SRCS}) 21 | install(TARGETS divsufsort64 22 | RUNTIME DESTINATION ${CMAKE_INSTALL_RUNTIMEDIR} 23 | LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} 24 | ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) 25 | set_target_properties(divsufsort64 PROPERTIES 26 | VERSION "${LIBRARY_VERSION}" 27 | SOVERSION "${LIBRARY_SOVERSION}" 28 | 
DEFINE_SYMBOL DIVSUFSORT_BUILD_DLL 29 | COMPILE_FLAGS "-DBUILD_DIVSUFSORT64" 30 | RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/../examples") 31 | endif(BUILD_DIVSUFSORT64) 32 | -------------------------------------------------------------------------------- /src/libdivsufsort/lib/divsufsort_utils.c: -------------------------------------------------------------------------------- 1 | /* 2 | * utils.c for libdivsufsort 3 | * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved. 4 | * 5 | * Permission is hereby granted, free of charge, to any person 6 | * obtaining a copy of this software and associated documentation 7 | * files (the "Software"), to deal in the Software without 8 | * restriction, including without limitation the rights to use, 9 | * copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | * copies of the Software, and to permit persons to whom the 11 | * Software is furnished to do so, subject to the following 12 | * conditions: 13 | * 14 | * The above copyright notice and this permission notice shall be 15 | * included in all copies or substantial portions of the Software. 16 | * 17 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 18 | * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 19 | * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 20 | * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 21 | * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 22 | * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 23 | * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 24 | * OTHER DEALINGS IN THE SOFTWARE. 25 | */ 26 | 27 | #include "divsufsort_private.h" 28 | 29 | 30 | /*- Private Function -*/ 31 | 32 | #if 0 33 | /* Binary search for inverse bwt. 
*/ 34 | static 35 | saidx_t 36 | binarysearch_lower(const saidx_t *A, saidx_t size, saidx_t value) { 37 | saidx_t half, i; 38 | for(i = 0, half = size >> 1; 39 | 0 < size; 40 | size = half, half >>= 1) { 41 | if(A[i + half] < value) { 42 | i += half + 1; 43 | half -= (size & 1) ^ 1; 44 | } 45 | } 46 | return i; 47 | } 48 | 49 | 50 | /*- Functions -*/ 51 | 52 | /* Burrows-Wheeler transform. */ 53 | saint_t 54 | bw_transform(const sauchar_t *T, sauchar_t *U, saidx_t *SA, 55 | saidx_t n, saidx_t *idx) { 56 | saidx_t *A, i, j, p, t; 57 | saint_t c; 58 | 59 | /* Check arguments. */ 60 | if((T == NULL) || (U == NULL) || (n < 0) || (idx == NULL)) { return -1; } 61 | if(n <= 1) { 62 | if(n == 1) { U[0] = T[0]; } 63 | *idx = n; 64 | return 0; 65 | } 66 | 67 | if((A = SA) == NULL) { 68 | i = divbwt(T, U, NULL, n); 69 | if(0 <= i) { *idx = i; i = 0; } 70 | return (saint_t)i; 71 | } 72 | 73 | /* BW transform. */ 74 | if(T == U) { 75 | t = n; 76 | for(i = 0, j = 0; i < n; ++i) { 77 | p = t - 1; 78 | t = A[i]; 79 | if(0 <= p) { 80 | c = T[j]; 81 | U[j] = (j <= p) ? T[p] : (sauchar_t)A[p]; 82 | A[j] = c; 83 | j++; 84 | } else { 85 | *idx = i; 86 | } 87 | } 88 | p = t - 1; 89 | if(0 <= p) { 90 | c = T[j]; 91 | U[j] = (j <= p) ? T[p] : (sauchar_t)A[p]; 92 | A[j] = c; 93 | } else { 94 | *idx = i; 95 | } 96 | } else { 97 | U[0] = T[n - 1]; 98 | for(i = 0; A[i] != 0; ++i) { U[i + 1] = T[A[i] - 1]; } 99 | *idx = i + 1; 100 | for(++i; i < n; ++i) { U[i] = T[A[i] - 1]; } 101 | } 102 | 103 | if(SA == NULL) { 104 | /* Deallocate memory. */ 105 | free(A); 106 | } 107 | 108 | return 0; 109 | } 110 | 111 | /* Inverse Burrows-Wheeler transform. */ 112 | saint_t 113 | inverse_bw_transform(const sauchar_t *T, sauchar_t *U, saidx_t *A, 114 | saidx_t n, saidx_t idx) { 115 | saidx_t C[ALPHABET_SIZE]; 116 | sauchar_t D[ALPHABET_SIZE]; 117 | saidx_t *B; 118 | saidx_t i, p; 119 | saint_t c, d; 120 | 121 | /* Check arguments. 
*/ 122 | if((T == NULL) || (U == NULL) || (n < 0) || (idx < 0) || 123 | (n < idx) || ((0 < n) && (idx == 0))) { 124 | return -1; 125 | } 126 | if(n <= 1) { return 0; } 127 | 128 | if((B = A) == NULL) { 129 | /* Allocate n*sizeof(saidx_t) bytes of memory. */ 130 | if((B = (saidx_t *)malloc((size_t)n * sizeof(saidx_t))) == NULL) { return -2; } 131 | } 132 | 133 | /* Inverse BW transform. */ 134 | for(c = 0; c < ALPHABET_SIZE; ++c) { C[c] = 0; } 135 | for(i = 0; i < n; ++i) { ++C[T[i]]; } 136 | for(c = 0, d = 0, i = 0; c < ALPHABET_SIZE; ++c) { 137 | p = C[c]; 138 | if(0 < p) { 139 | C[c] = i; 140 | D[d++] = (sauchar_t)c; 141 | i += p; 142 | } 143 | } 144 | for(i = 0; i < idx; ++i) { B[C[T[i]]++] = i; } 145 | for( ; i < n; ++i) { B[C[T[i]]++] = i + 1; } 146 | for(c = 0; c < d; ++c) { C[c] = C[D[c]]; } 147 | for(i = 0, p = idx; i < n; ++i) { 148 | U[i] = D[binarysearch_lower(C, d, p)]; 149 | p = B[p - 1]; 150 | } 151 | 152 | if(A == NULL) { 153 | /* Deallocate memory. */ 154 | free(B); 155 | } 156 | 157 | return 0; 158 | } 159 | 160 | /* Checks the suffix array SA of the string T. */ 161 | saint_t 162 | sufcheck(const sauchar_t *T, const saidx_t *SA, 163 | saidx_t n, saint_t verbose) { 164 | saidx_t C[ALPHABET_SIZE]; 165 | saidx_t i, p, q, t; 166 | saint_t c; 167 | 168 | if(verbose) { fprintf(stderr, "sufcheck: "); } 169 | 170 | /* Check arguments. */ 171 | if((T == NULL) || (SA == NULL) || (n < 0)) { 172 | if(verbose) { fprintf(stderr, "Invalid arguments.\n"); } 173 | return -1; 174 | } 175 | if(n == 0) { 176 | if(verbose) { fprintf(stderr, "Done.\n"); } 177 | return 0; 178 | } 179 | 180 | /* check range: [0..n-1] */ 181 | for(i = 0; i < n; ++i) { 182 | if((SA[i] < 0) || (n <= SA[i])) { 183 | if(verbose) { 184 | fprintf(stderr, "Out of the range [0,%" PRIdSAIDX_T "].\n" 185 | " SA[%" PRIdSAIDX_T "]=%" PRIdSAIDX_T "\n", 186 | n - 1, i, SA[i]); 187 | } 188 | return -2; 189 | } 190 | } 191 | 192 | /* check first characters. 
*/ 193 | for(i = 1; i < n; ++i) { 194 | if(T[SA[i - 1]] > T[SA[i]]) { 195 | if(verbose) { 196 | fprintf(stderr, "Suffixes in wrong order.\n" 197 | " T[SA[%" PRIdSAIDX_T "]=%" PRIdSAIDX_T "]=%d" 198 | " > T[SA[%" PRIdSAIDX_T "]=%" PRIdSAIDX_T "]=%d\n", 199 | i - 1, SA[i - 1], T[SA[i - 1]], i, SA[i], T[SA[i]]); 200 | } 201 | return -3; 202 | } 203 | } 204 | 205 | /* check suffixes. */ 206 | for(i = 0; i < ALPHABET_SIZE; ++i) { C[i] = 0; } 207 | for(i = 0; i < n; ++i) { ++C[T[i]]; } 208 | for(i = 0, p = 0; i < ALPHABET_SIZE; ++i) { 209 | t = C[i]; 210 | C[i] = p; 211 | p += t; 212 | } 213 | 214 | q = C[T[n - 1]]; 215 | C[T[n - 1]] += 1; 216 | for(i = 0; i < n; ++i) { 217 | p = SA[i]; 218 | if(0 < p) { 219 | c = T[--p]; 220 | t = C[c]; 221 | } else { 222 | c = T[p = n - 1]; 223 | t = q; 224 | } 225 | if((t < 0) || (p != SA[t])) { 226 | if(verbose) { 227 | fprintf(stderr, "Suffix in wrong position.\n" 228 | " SA[%" PRIdSAIDX_T "]=%" PRIdSAIDX_T " or\n" 229 | " SA[%" PRIdSAIDX_T "]=%" PRIdSAIDX_T "\n", 230 | t, (0 <= t) ? SA[t] : -1, i, SA[i]); 231 | } 232 | return -4; 233 | } 234 | if(t != q) { 235 | ++C[c]; 236 | if((n <= C[c]) || (T[SA[C[c]]] != c)) { C[c] = -1; } 237 | } 238 | } 239 | 240 | if(1 <= verbose) { fprintf(stderr, "Done.\n"); } 241 | return 0; 242 | } 243 | 244 | 245 | static 246 | int 247 | _compare(const sauchar_t *T, saidx_t Tsize, 248 | const sauchar_t *P, saidx_t Psize, 249 | saidx_t suf, saidx_t *match) { 250 | saidx_t i, j; 251 | saint_t r; 252 | for(i = suf + *match, j = *match, r = 0; 253 | (i < Tsize) && (j < Psize) && ((r = T[i] - P[j]) == 0); ++i, ++j) { } 254 | *match = j; 255 | return (r == 0) ? -(j != Psize) : r; 256 | } 257 | 258 | /* Search for the pattern P in the string T. 
*/ 259 | saidx_t 260 | sa_search(const sauchar_t *T, saidx_t Tsize, 261 | const sauchar_t *P, saidx_t Psize, 262 | const saidx_t *SA, saidx_t SAsize, 263 | saidx_t *idx) { 264 | saidx_t size, lsize, rsize, half; 265 | saidx_t match, lmatch, rmatch; 266 | saidx_t llmatch, lrmatch, rlmatch, rrmatch; 267 | saidx_t i, j, k; 268 | saint_t r; 269 | 270 | if(idx != NULL) { *idx = -1; } 271 | if((T == NULL) || (P == NULL) || (SA == NULL) || 272 | (Tsize < 0) || (Psize < 0) || (SAsize < 0)) { return -1; } 273 | if((Tsize == 0) || (SAsize == 0)) { return 0; } 274 | if(Psize == 0) { if(idx != NULL) { *idx = 0; } return SAsize; } 275 | 276 | for(i = j = k = 0, lmatch = rmatch = 0, size = SAsize, half = size >> 1; 277 | 0 < size; 278 | size = half, half >>= 1) { 279 | match = MIN(lmatch, rmatch); 280 | r = _compare(T, Tsize, P, Psize, SA[i + half], &match); 281 | if(r < 0) { 282 | i += half + 1; 283 | half -= (size & 1) ^ 1; 284 | lmatch = match; 285 | } else if(r > 0) { 286 | rmatch = match; 287 | } else { 288 | lsize = half, j = i, rsize = size - half - 1, k = i + half + 1; 289 | 290 | /* left part */ 291 | for(llmatch = lmatch, lrmatch = match, half = lsize >> 1; 292 | 0 < lsize; 293 | lsize = half, half >>= 1) { 294 | lmatch = MIN(llmatch, lrmatch); 295 | r = _compare(T, Tsize, P, Psize, SA[j + half], &lmatch); 296 | if(r < 0) { 297 | j += half + 1; 298 | half -= (lsize & 1) ^ 1; 299 | llmatch = lmatch; 300 | } else { 301 | lrmatch = lmatch; 302 | } 303 | } 304 | 305 | /* right part */ 306 | for(rlmatch = match, rrmatch = rmatch, half = rsize >> 1; 307 | 0 < rsize; 308 | rsize = half, half >>= 1) { 309 | rmatch = MIN(rlmatch, rrmatch); 310 | r = _compare(T, Tsize, P, Psize, SA[k + half], &rmatch); 311 | if(r <= 0) { 312 | k += half + 1; 313 | half -= (rsize & 1) ^ 1; 314 | rlmatch = rmatch; 315 | } else { 316 | rrmatch = rmatch; 317 | } 318 | } 319 | 320 | break; 321 | } 322 | } 323 | 324 | if(idx != NULL) { *idx = (0 < (k - j)) ? 
j : i; } 325 | return k - j; 326 | } 327 | 328 | /* Search for the character c in the string T. */ 329 | saidx_t 330 | sa_simplesearch(const sauchar_t *T, saidx_t Tsize, 331 | const saidx_t *SA, saidx_t SAsize, 332 | saint_t c, saidx_t *idx) { 333 | saidx_t size, lsize, rsize, half; 334 | saidx_t i, j, k, p; 335 | saint_t r; 336 | 337 | if(idx != NULL) { *idx = -1; } 338 | if((T == NULL) || (SA == NULL) || (Tsize < 0) || (SAsize < 0)) { return -1; } 339 | if((Tsize == 0) || (SAsize == 0)) { return 0; } 340 | 341 | for(i = j = k = 0, size = SAsize, half = size >> 1; 342 | 0 < size; 343 | size = half, half >>= 1) { 344 | p = SA[i + half]; 345 | r = (p < Tsize) ? T[p] - c : -1; 346 | if(r < 0) { 347 | i += half + 1; 348 | half -= (size & 1) ^ 1; 349 | } else if(r == 0) { 350 | lsize = half, j = i, rsize = size - half - 1, k = i + half + 1; 351 | 352 | /* left part */ 353 | for(half = lsize >> 1; 354 | 0 < lsize; 355 | lsize = half, half >>= 1) { 356 | p = SA[j + half]; 357 | r = (p < Tsize) ? T[p] - c : -1; 358 | if(r < 0) { 359 | j += half + 1; 360 | half -= (lsize & 1) ^ 1; 361 | } 362 | } 363 | 364 | /* right part */ 365 | for(half = rsize >> 1; 366 | 0 < rsize; 367 | rsize = half, half >>= 1) { 368 | p = SA[k + half]; 369 | r = (p < Tsize) ? T[p] - c : -1; 370 | if(r <= 0) { 371 | k += half + 1; 372 | half -= (rsize & 1) ^ 1; 373 | } 374 | } 375 | 376 | break; 377 | } 378 | } 379 | 380 | if(idx != NULL) { *idx = (0 < (k - j)) ? 
j : i; } 381 | return k - j; 382 | } 383 | #endif 384 | -------------------------------------------------------------------------------- /src/libdivsufsort/pkgconfig/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ## generate libdivsufsort.pc ## 2 | set(W64BIT "") 3 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/libdivsufsort.pc.cmake" "${CMAKE_CURRENT_BINARY_DIR}/libdivsufsort.pc" @ONLY) 4 | install(FILES "${CMAKE_CURRENT_BINARY_DIR}/libdivsufsort.pc" DESTINATION ${CMAKE_INSTALL_PKGCONFIGDIR}) 5 | if(BUILD_DIVSUFSORT64) 6 | set(W64BIT "64") 7 | configure_file("${CMAKE_CURRENT_SOURCE_DIR}/libdivsufsort.pc.cmake" "${CMAKE_CURRENT_BINARY_DIR}/libdivsufsort64.pc" @ONLY) 8 | install(FILES "${CMAKE_CURRENT_BINARY_DIR}/libdivsufsort64.pc" DESTINATION ${CMAKE_INSTALL_PKGCONFIGDIR}) 9 | endif(BUILD_DIVSUFSORT64) 10 | -------------------------------------------------------------------------------- /src/libdivsufsort/pkgconfig/libdivsufsort.pc.cmake: -------------------------------------------------------------------------------- 1 | prefix=@CMAKE_INSTALL_PREFIX@ 2 | exec_prefix=${prefix} 3 | libdir=@CMAKE_INSTALL_LIBDIR@ 4 | includedir=@CMAKE_INSTALL_INCLUDEDIR@ 5 | 6 | Name: @PROJECT_NAME@@W64BIT@ 7 | Description: @PROJECT_DESCRIPTION@ 8 | Version: @PROJECT_VERSION_FULL@ 9 | URL: @PROJECT_URL@ 10 | Libs: -L${libdir} -ldivsufsort@W64BIT@ 11 | Cflags: -I${includedir} 12 | -------------------------------------------------------------------------------- /src/matchfinder.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/emmanuel-marty/lz4ultra/f80965cb37a4586cdbe109f02d1ee676f8408889/src/matchfinder.c -------------------------------------------------------------------------------- /src/matchfinder.h: -------------------------------------------------------------------------------- 1 | /* 2 | * matchfinder.h - LZ match finder implementation 3 | * 4 | * Copyright 
(C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _MATCHFINDER_H 34 | #define _MATCHFINDER_H 35 | 36 | /* Forward declarations */ 37 | typedef struct _lz4ultra_match lz4ultra_match; 38 | typedef struct _lz4ultra_compressor lz4ultra_compressor; 39 | 40 | /** 41 | * Parse input data, build suffix array and overlaid data structures to speed up match finding 42 | * 43 | * @param pCompressor compression context 44 | * @param pInWindow pointer to input data window (previously compressed bytes + bytes to compress) 45 | * @param nInWindowSize total input size in bytes (previously compressed bytes + bytes to compress) 46 | * 47 | * @return 0 for success, non-zero for failure 48 | */ 49 | int lz4ultra_build_suffix_array(lz4ultra_compressor *pCompressor, const unsigned char *pInWindow, const int nInWindowSize); 50 | 51 | /** 52 | * Skip previously compressed bytes 53 | * 54 | * @param pCompressor compression context 55 | * @param nStartOffset current offset in input window (typically 0) 56 | * @param nEndOffset offset to skip to in input window (typically the number of previously compressed bytes) 57 | */ 58 | void lz4ultra_skip_matches(lz4ultra_compressor *pCompressor, const int nStartOffset, const int nEndOffset); 59 | 60 | /** 61 | * Find all matches for the data to be compressed. 
62 | * 63 | * @param pCompressor compression context 64 | * @param nStartOffset current offset in input window (typically the number of previously compressed bytes) 65 | * @param nEndOffset offset to end finding matches at (typically the size of the total input window in bytes) 66 | */ 67 | void lz4ultra_find_all_matches(lz4ultra_compressor *pCompressor, const int nStartOffset, const int nEndOffset); 68 | 69 | #endif /* _MATCHFINDER_H */ 70 | -------------------------------------------------------------------------------- /src/shrink_block.h: -------------------------------------------------------------------------------- 1 | /* 2 | * shrink_block.h - optimal LZ4 block compressor definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet.
https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _SHRINK_BLOCK_H 34 | #define _SHRINK_BLOCK_H 35 | 36 | /* Forward declarations */ 37 | typedef struct _lz4ultra_compressor lz4ultra_compressor; 38 | 39 | /** 40 | * Select the most optimal matches, reduce the token count if possible, and then emit a block of compressed LZ4 data 41 | * 42 | * @param pCompressor compression context 43 | * @param pInWindow pointer to input data window (previously compressed bytes + bytes to compress) 44 | * @param nPreviousBlockSize number of previously compressed bytes (or 0 for none) 45 | * @param nInDataSize number of input bytes to compress 46 | * @param pOutData pointer to output buffer 47 | * @param nMaxOutDataSize maximum size of output buffer, in bytes 48 | * 49 | * @return size of compressed data in output buffer, or -1 if the data is uncompressible 50 | */ 51 | int lz4ultra_optimize_and_write_block(lz4ultra_compressor *pCompressor, const unsigned char *pInWindow, const int nPreviousBlockSize, const int nInDataSize, unsigned char *pOutData, const int nMaxOutDataSize); 52 | 53 | #endif /* _SHRINK_BLOCK_H */ 54 | -------------------------------------------------------------------------------- /src/shrink_context.c: -------------------------------------------------------------------------------- 1 | /* 2 | * shrink_context.c - compression context implementation 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. 
The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include <stdlib.h> 34 | #include <string.h> 35 | #include "shrink_context.h" 36 | #include "shrink_block.h" 37 | #include "matchfinder.h" 38 | 39 | /** 40 | * Initialize compression context 41 | * 42 | * @param pCompressor compression context to initialize 43 | * @param nMaxWindowSize maximum size of input data window (previously compressed bytes + bytes to compress) 44 | * @param nFlags compression flags 45 | * 46 | * @return 0 for success, non-zero for failure 47 | */ 48 | int lz4ultra_compressor_init(lz4ultra_compressor *pCompressor, const int nMaxWindowSize, const int nFlags) { 49 | int nResult; 50 | 51 | nResult = divsufsort_init(&pCompressor->divsufsort_context); 52 | pCompressor->intervals = NULL; 53 | pCompressor->pos_data = NULL; 54 | pCompressor->open_intervals = NULL; 55 | pCompressor->match = NULL; 56 | pCompressor->flags = nFlags; 57 | pCompressor->num_commands = 0; 58 | 59 | if (!nResult) { 60 | pCompressor->intervals = (unsigned long long *)malloc(nMaxWindowSize * sizeof(unsigned long long)); 61 | 62 | if (pCompressor->intervals) { 63 | 
pCompressor->pos_data = (unsigned long long *)malloc(nMaxWindowSize * sizeof(unsigned long long)); 64 | 65 | if (pCompressor->pos_data) { 66 | pCompressor->open_intervals = (unsigned long long *)malloc((LCP_MAX + 1) * sizeof(unsigned long long)); 67 | 68 | if (pCompressor->open_intervals) { 69 | pCompressor->match = (lz4ultra_match *)malloc(nMaxWindowSize * sizeof(lz4ultra_match)); 70 | 71 | if (pCompressor->match) 72 | return 0; 73 | } 74 | } 75 | } 76 | } 77 | 78 | lz4ultra_compressor_destroy(pCompressor); 79 | return 100; 80 | } 81 | 82 | /** 83 | * Clean up compression context and free up any associated resources 84 | * 85 | * @param pCompressor compression context to clean up 86 | */ 87 | void lz4ultra_compressor_destroy(lz4ultra_compressor *pCompressor) { 88 | divsufsort_destroy(&pCompressor->divsufsort_context); 89 | 90 | if (pCompressor->match) { 91 | free(pCompressor->match); 92 | pCompressor->match = NULL; 93 | } 94 | 95 | if (pCompressor->open_intervals) { 96 | free(pCompressor->open_intervals); 97 | pCompressor->open_intervals = NULL; 98 | } 99 | 100 | if (pCompressor->pos_data) { 101 | free(pCompressor->pos_data); 102 | pCompressor->pos_data = NULL; 103 | } 104 | 105 | if (pCompressor->intervals) { 106 | free(pCompressor->intervals); 107 | pCompressor->intervals = NULL; 108 | } 109 | } 110 | 111 | /** 112 | * Compress one block of data 113 | * 114 | * @param pCompressor compression context 115 | * @param pInWindow pointer to input data window (previously compressed bytes + bytes to compress) 116 | * @param nPreviousBlockSize number of previously compressed bytes (or 0 for none) 117 | * @param nInDataSize number of input bytes to compress 118 | * @param pOutData pointer to output buffer 119 | * @param nMaxOutDataSize maximum size of output buffer, in bytes 120 | * 121 | * @return size of compressed data in output buffer, or -1 if the data is uncompressible 122 | */ 123 | int lz4ultra_compressor_shrink_block(lz4ultra_compressor *pCompressor, const 
unsigned char *pInWindow, const int nPreviousBlockSize, const int nInDataSize, unsigned char *pOutData, const int nMaxOutDataSize) { 124 | if (lz4ultra_build_suffix_array(pCompressor, pInWindow, nPreviousBlockSize + nInDataSize)) 125 | return -1; 126 | if (nPreviousBlockSize) { 127 | lz4ultra_skip_matches(pCompressor, 0, nPreviousBlockSize); 128 | } 129 | lz4ultra_find_all_matches(pCompressor, nPreviousBlockSize, nPreviousBlockSize + nInDataSize); 130 | return lz4ultra_optimize_and_write_block(pCompressor, pInWindow, nPreviousBlockSize, nInDataSize, pOutData, nMaxOutDataSize); 131 | } 132 | 133 | /** 134 | * Get the number of compression commands issued in compressed data blocks 135 | * 136 | * @return number of commands 137 | */ 138 | int lz4ultra_compressor_get_command_count(lz4ultra_compressor *pCompressor) { 139 | return pCompressor->num_commands; 140 | } 141 | -------------------------------------------------------------------------------- /src/shrink_context.h: -------------------------------------------------------------------------------- 1 | /* 2 | * shrink_context.h - compression context definitions 3 | * 4 | * Copyright (C) 2019 Emmanuel Marty 5 | * 6 | * This software is provided 'as-is', without any express or implied 7 | * warranty. In no event will the authors be held liable for any damages 8 | * arising from the use of this software. 9 | * 10 | * Permission is granted to anyone to use this software for any purpose, 11 | * including commercial applications, and to alter it and redistribute it 12 | * freely, subject to the following restrictions: 13 | * 14 | * 1. The origin of this software must not be misrepresented; you must not 15 | * claim that you wrote the original software. If you use this software 16 | * in a product, an acknowledgment in the product documentation would be 17 | * appreciated but is not required. 18 | * 2. 
Altered source versions must be plainly marked as such, and must not be 19 | * misrepresented as being the original software. 20 | * 3. This notice may not be removed or altered from any source distribution. 21 | */ 22 | 23 | /* 24 | * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori 25 | * 26 | * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4 27 | * With help, ideas, optimizations and speed measurements by spke 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #ifndef _SHRINK_CONTEXT_H 34 | #define _SHRINK_CONTEXT_H 35 | 36 | #include "divsufsort.h" 37 | 38 | #define LCP_BITS 15 39 | #define LCP_MAX (1LL<<(LCP_BITS - 1)) 40 | #define LCP_SHIFT (39-LCP_BITS) 41 | #define LCP_MASK (((1ULL< 28 | * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard 29 | * Also with ideas from smallz4 by Stephan Brumme. 
https://create.stephan-brumme.com/smallz4/ 30 | * 31 | */ 32 | 33 | #include <stdlib.h> 34 | #include <string.h> 35 | #include "shrink_inmem.h" 36 | #include "frame.h" 37 | #include "format.h" 38 | #include "lib.h" 39 | 40 | /** 41 | * Get maximum compressed size of input(source) data 42 | * 43 | * @param nInputSize input(source) size in bytes 44 | * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx) 45 | * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb) 46 | * 47 | * @return maximum compressed size 48 | */ 49 | size_t lz4ultra_get_max_compressed_size_inmem(size_t nInputSize, unsigned int nFlags, int nBlockMaxCode) { 50 | int nBlockMaxBits; 51 | int nBlockMaxSize; 52 | 53 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) { 54 | nBlockMaxBits = 23; 55 | } 56 | else { 57 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 58 | } 59 | nBlockMaxSize = 1 << nBlockMaxBits; 60 | 61 | if (nInputSize < nBlockMaxSize && (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) == 0) { 62 | /* If the entire input data is shorter than the specified block size, try to reduce the 63 | * block size until it is the smallest one that can fit the data */ 64 | 65 | do { 66 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 67 | nBlockMaxSize = 1 << nBlockMaxBits; 68 | 69 | int nPrevBlockMaxBits = 8 + ((nBlockMaxCode - 1) << 1); 70 | int nPrevBlockMaxSize = 1 << nPrevBlockMaxBits; 71 | if (nBlockMaxCode > 4 && nPrevBlockMaxSize > nInputSize) { 72 | nBlockMaxCode--; 73 | } 74 | else 75 | break; 76 | } while (1); 77 | } 78 | 79 | return LZ4ULTRA_MAX_HEADER_SIZE + ((nInputSize + (nBlockMaxSize - 1)) >> nBlockMaxBits) * LZ4ULTRA_FRAME_SIZE + nInputSize + LZ4ULTRA_FRAME_SIZE /* footer */; 80 | } 81 | 82 | /** 83 | * Compress memory 84 | * 85 | * @param pInputData pointer to input(source) data to compress 86 | * @param pOutBuffer buffer for compressed data 87 | * @param nInputSize input(source) size in bytes 88 | * @param nMaxOutBufferSize maximum capacity of compression buffer 89 | * @param nFlags compression flags
(LZ4ULTRA_FLAG_xxx) 90 | * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb) 91 | * 92 | * @return actual compressed size, or -1 for error 93 | */ 94 | size_t lz4ultra_compress_inmem(const unsigned char *pInputData, unsigned char *pOutBuffer, size_t nInputSize, size_t nMaxOutBufferSize, unsigned int nFlags, int nBlockMaxCode) { 95 | lz4ultra_compressor compressor; 96 | size_t nOriginalSize = 0L; 97 | size_t nCompressedSize = 0L; 98 | int nBlockMaxBits; 99 | int nBlockMaxSize; 100 | int nResult; 101 | int nError = 0; 102 | 103 | if (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) { 104 | nBlockMaxBits = 23; 105 | nFlags |= LZ4ULTRA_FLAG_INDEP_BLOCKS; 106 | } 107 | else { 108 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 109 | } 110 | nBlockMaxSize = 1 << nBlockMaxBits; 111 | 112 | if (nInputSize < nBlockMaxSize && (nFlags & LZ4ULTRA_FLAG_LEGACY_FRAMES) == 0) { 113 | /* If the entire input data is shorter than the specified block size, try to reduce the 114 | * block size until is the smallest one that can fit the data */ 115 | 116 | do { 117 | nBlockMaxBits = 8 + (nBlockMaxCode << 1); 118 | nBlockMaxSize = 1 << nBlockMaxBits; 119 | 120 | int nPrevBlockMaxBits = 8 + ((nBlockMaxCode - 1) << 1); 121 | int nPrevBlockMaxSize = 1 << nPrevBlockMaxBits; 122 | if (nBlockMaxCode > 4 && nPrevBlockMaxSize > nInputSize) { 123 | nBlockMaxCode--; 124 | } 125 | else 126 | break; 127 | } while (1); 128 | } 129 | 130 | nResult = lz4ultra_compressor_init(&compressor, nBlockMaxSize + HISTORY_SIZE, nFlags); 131 | if (nResult != 0) { 132 | return LZ4ULTRA_ERROR_MEMORY; 133 | } 134 | 135 | if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) == 0) { 136 | int nHeaderSize = lz4ultra_encode_header(pOutBuffer + nCompressedSize, (int)(nMaxOutBufferSize - nCompressedSize), nFlags, nBlockMaxCode); 137 | if (nHeaderSize < 0) 138 | nError = LZ4ULTRA_ERROR_COMPRESSION; 139 | else { 140 | nCompressedSize += nHeaderSize; 141 | } 142 | } 143 | 144 | int nPreviousBlockSize = 0; 145 | int nNumBlocks = 0; 146 | 
   while (nOriginalSize < nInputSize && !nError) {
      int nInDataSize;

      nInDataSize = (int)(nInputSize - nOriginalSize);
      if (nInDataSize > nBlockMaxSize)
         nInDataSize = nBlockMaxSize;

      if (nInDataSize > 0) {
         if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) != 0 && (nNumBlocks || nInDataSize > 0x400000)) {
            nError = LZ4ULTRA_ERROR_RAW_TOOLARGE;
            break;
         }

         int nOutDataSize;
         int nOutDataEnd = (int)(nMaxOutBufferSize - LZ4ULTRA_FRAME_SIZE - LZ4ULTRA_FRAME_SIZE /* footer */ - nCompressedSize);
         int nHeaderOffset = LZ4ULTRA_FRAME_SIZE;

         if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) != 0) {
            nHeaderOffset = 0;
            nOutDataEnd = (int)(nMaxOutBufferSize - nCompressedSize);
         }

         if (nOutDataEnd > nBlockMaxSize)
            nOutDataEnd = nBlockMaxSize;

         nOutDataSize = lz4ultra_compressor_shrink_block(&compressor, pInputData + nOriginalSize - nPreviousBlockSize, nPreviousBlockSize, nInDataSize, pOutBuffer + nHeaderOffset + nCompressedSize, nOutDataEnd);
         if (nOutDataSize >= 0) {
            int nFrameHeaderSize = 0;

            /* Compressed block */

            if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) == 0) {
               nFrameHeaderSize = lz4ultra_encode_compressed_block_frame(pOutBuffer + nCompressedSize, (int)(nMaxOutBufferSize - nCompressedSize), nFlags, nOutDataSize);
               if (nFrameHeaderSize < 0)
                  nError = LZ4ULTRA_ERROR_COMPRESSION;
            }

            if (!nError) {
               nOriginalSize += nInDataSize;
               nCompressedSize += nFrameHeaderSize + nOutDataSize;
            }
         }
         else {
            /* Write uncompressible, literal block */

            if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) != 0) {
               /* Uncompressible data isn't supported by raw blocks */
               nError = LZ4ULTRA_ERROR_RAW_UNCOMPRESSED;
               break;
            }

            int nFrameHeaderSize;

            nFrameHeaderSize = lz4ultra_encode_uncompressed_block_frame(pOutBuffer + nCompressedSize,
               (int)(nMaxOutBufferSize - nCompressedSize), nFlags, nInDataSize);
            if (nFrameHeaderSize < 0)
               nError = LZ4ULTRA_ERROR_COMPRESSION;
            else {
               if (nInDataSize > (nMaxOutBufferSize - (nCompressedSize + nFrameHeaderSize)))
                  nError = LZ4ULTRA_ERROR_DST;
               else {
                  memcpy(pOutBuffer + nFrameHeaderSize + nCompressedSize, pInputData + nOriginalSize, nInDataSize);
                  nOriginalSize += nInDataSize;
                  nCompressedSize += nFrameHeaderSize + (long long)nInDataSize;
               }
            }
         }

         if (!(nFlags & LZ4ULTRA_FLAG_INDEP_BLOCKS)) {
            nPreviousBlockSize = nInDataSize;
            if (nPreviousBlockSize > HISTORY_SIZE)
               nPreviousBlockSize = HISTORY_SIZE;
         }
         else {
            nPreviousBlockSize = 0;
         }

         nNumBlocks++;
      }
   }

   int nFooterSize;

   if ((nFlags & LZ4ULTRA_FLAG_RAW_BLOCK) != 0) {
      nFooterSize = 0;
   }
   else {
      nFooterSize = lz4ultra_encode_footer_frame(pOutBuffer + nCompressedSize, (int)(nMaxOutBufferSize - nCompressedSize), nFlags);
      if (nFooterSize < 0)
         nError = LZ4ULTRA_ERROR_COMPRESSION;
   }

   if (!nError) {
      nCompressedSize += nFooterSize;
   }


   lz4ultra_compressor_destroy(&compressor);

   if (nError) {
      return -1;
   }
   else {
      return nCompressedSize;
   }
}

--------------------------------------------------------------------------------
/src/shrink_inmem.h:
--------------------------------------------------------------------------------
/*
 * shrink_inmem.h - in-memory compression definitions
 *
 * Copyright (C) 2019 Emmanuel Marty
 *
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */

/*
 * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
 *
 * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
 * With help, ideas, optimizations and speed measurements by spke
 * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
 * Also with ideas from smallz4 by Stephan Brumme.
https://create.stephan-brumme.com/smallz4/
 *
 */

#ifndef _SHRINK_INMEM_H
#define _SHRINK_INMEM_H

#include <stdlib.h>

/**
 * Get maximum compressed size of input(source) data
 *
 * @param nInputSize input(source) size in bytes
 * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx)
 * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb)
 *
 * @return maximum compressed size
 */
size_t lz4ultra_get_max_compressed_size_inmem(size_t nInputSize, unsigned int nFlags,
   int nBlockMaxCode);

/**
 * Compress memory
 *
 * @param pInputData pointer to input(source) data to compress
 * @param pOutBuffer buffer for compressed data
 * @param nInputSize input(source) size in bytes
 * @param nMaxOutBufferSize maximum capacity of compression buffer
 * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx)
 * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb)
 *
 * @return actual compressed size, or -1 for error
 */
size_t lz4ultra_compress_inmem(const unsigned char *pInputData, unsigned char *pOutBuffer, size_t nInputSize, size_t nMaxOutBufferSize, unsigned int nFlags,
   int nBlockMaxCode);

#endif /* _SHRINK_INMEM_H */

--------------------------------------------------------------------------------
/src/shrink_streaming.h:
--------------------------------------------------------------------------------
/*
 * shrink_streaming.h - streaming compression definitions
 *
 * Copyright (C) 2019 Emmanuel Marty
 *
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */

/*
 * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
 *
 * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
 * With help, ideas, optimizations and speed measurements by spke
 * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
 * Also with ideas from smallz4 by Stephan Brumme.
https://create.stephan-brumme.com/smallz4/
 *
 */

#ifndef _SHRINK_STREAMING_H
#define _SHRINK_STREAMING_H

#include "stream.h"

/* Forward declaration */
typedef enum _lz4ultra_status_t lz4ultra_status_t;

/*-------------- File API -------------- */

/**
 * Compress file
 *
 * @param pszInFilename name of input(source) file to compress
 * @param pszOutFilename name of output(compressed) file to generate
 * @param pszDictionaryFilename name of dictionary file, or NULL for none
 * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx)
 * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb)
 * @param start start function, called when the max block size is finalized and compression is about to start, or NULL for none
 * @param progress progress function, called after compressing each block, or NULL for none
 * @param pOriginalSize pointer to returned input(source) size, updated when this function is successful
 * @param pCompressedSize pointer to returned output(compressed) size, updated when this function is successful
 * @param pCommandCount pointer to returned token(compression commands) count, updated when this function is successful
 *
 * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t
 */
lz4ultra_status_t lz4ultra_compress_file(const char *pszInFilename, const char *pszOutFilename, const char *pszDictionaryFilename, const unsigned int nFlags,
   int nBlockMaxCode,
   void(*start)(int nBlockMaxCode, const unsigned int nFlags),
   void(*progress)(long long nOriginalSize, long long nCompressedSize), long long *pOriginalSize, long long *pCompressedSize, int *pCommandCount);

/*-------------- Streaming API -------------- */

/**
 * Compress stream
 *
 * @param pInStream input(source) stream to compress
 * @param pOutStream output(compressed) stream to write to
 * @param pDictionaryData dictionary contents, or NULL for none
 * @param nDictionaryDataSize size of dictionary contents, or 0
 * @param nFlags compression flags (LZ4ULTRA_FLAG_xxx)
 * @param nBlockMaxCode maximum block size code (4..7 for 64 Kb..4 Mb)
 * @param start start function, called when the max block size is finalized and compression is about to start, or NULL for none
 * @param progress progress function, called after compressing each block, or NULL for none
 * @param pOriginalSize pointer to returned input(source) size, updated when this function is successful
 * @param pCompressedSize pointer to returned output(compressed) size, updated when this function is successful
 * @param pCommandCount pointer to returned token(compression commands) count, updated when this function is successful
 *
 * @return LZ4ULTRA_OK for success, or an error value from lz4ultra_status_t
 */
lz4ultra_status_t lz4ultra_compress_stream(lz4ultra_stream_t *pInStream, lz4ultra_stream_t *pOutStream, const void *pDictionaryData, int nDictionaryDataSize, unsigned int nFlags,
   int nBlockMaxCode,
   void(*start)(int nBlockMaxCode, const unsigned int nFlags),
   void(*progress)(long long nOriginalSize, long long nCompressedSize), long long *pOriginalSize, long long *pCompressedSize, int *pCommandCount);

#endif /* _SHRINK_STREAMING_H */

--------------------------------------------------------------------------------
/src/stream.c:
--------------------------------------------------------------------------------
/*
 * stream.c - streaming I/O implementation
 *
 * Copyright (C) 2019 Emmanuel Marty
 *
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */

/*
 * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
 *
 * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
 * With help, ideas, optimizations and speed measurements by spke
 * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
 * Also with ideas from smallz4 by Stephan Brumme.
https://create.stephan-brumme.com/smallz4/
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "stream.h"

/**
 * Close file stream
 *
 * @param stream stream
 */
static void lz4ultra_filestream_close(lz4ultra_stream_t *stream) {
   if (stream->obj) {
      fclose((FILE*)stream->obj);
      stream->obj = NULL;
      stream->read = NULL;
      stream->write = NULL;
      stream->eof = NULL;
      stream->close = NULL;
   }
}

/**
 * Read from file stream
 *
 * @param stream stream
 * @param ptr buffer to read into
 * @param size number of bytes to read
 *
 * @return number of bytes read
 */
static size_t lz4ultra_filestream_read(lz4ultra_stream_t *stream, void *ptr, size_t size) {
   return fread(ptr, 1, size, (FILE*)stream->obj);
}

/**
 * Write to file stream
 *
 * @param stream stream
 * @param ptr buffer to write from
 * @param size number of bytes to write
 *
 * @return number of bytes written
 */
static size_t lz4ultra_filestream_write(lz4ultra_stream_t *stream, void *ptr, size_t size) {
   return fwrite(ptr, 1, size, (FILE*)stream->obj);
}

/**
 * Check if file stream has reached the end of the data
 *
 * @param stream stream
 *
 * @return nonzero if the end of the data has been reached, 0 if there is more data
 */
static int lz4ultra_filestream_eof(lz4ultra_stream_t *stream) {
   return feof((FILE*)stream->obj);
}

/**
 * Open file and create an I/O stream from it
 *
 * @param stream stream to fill out
 * @param pszInFilename filename
 * @param pszMode open mode, as with fopen()
 *
 * @return 0 for success, nonzero for failure
 */
int lz4ultra_filestream_open(lz4ultra_stream_t *stream, const char *pszInFilename, const char *pszMode) {
   stream->obj = (void*)fopen(pszInFilename, pszMode);
   if
      (stream->obj) {
      stream->read = lz4ultra_filestream_read;
      stream->write = lz4ultra_filestream_write;
      stream->eof = lz4ultra_filestream_eof;
      stream->close = lz4ultra_filestream_close;
      return 0;
   }
   else
      return -1;
}

--------------------------------------------------------------------------------
/src/stream.h:
--------------------------------------------------------------------------------
/*
 * stream.h - streaming I/O definitions
 *
 * Copyright (C) 2019 Emmanuel Marty
 *
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the authors be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must not
 *    claim that you wrote the original software. If you use this software
 *    in a product, an acknowledgment in the product documentation would be
 *    appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must not be
 *    misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source distribution.
 */

/*
 * Uses the libdivsufsort library Copyright (c) 2003-2008 Yuta Mori
 *
 * Inspired by LZ4 by Yann Collet. https://github.com/lz4/lz4
 * With help, ideas, optimizations and speed measurements by spke
 * With ideas from Lizard by Przemyslaw Skibinski and Yann Collet. https://github.com/inikep/lizard
 * Also with ideas from smallz4 by Stephan Brumme.
https://create.stephan-brumme.com/smallz4/
 *
 */

#ifndef _STREAM_H
#define _STREAM_H

/* Forward declaration */
typedef struct _lz4ultra_stream_t lz4ultra_stream_t;

/* I/O stream */
typedef struct _lz4ultra_stream_t {
   /** Opaque stream-specific pointer */
   void *obj;

   /**
    * Read from stream
    *
    * @param stream stream
    * @param ptr buffer to read into
    * @param size number of bytes to read
    *
    * @return number of bytes read
    */
   size_t(*read)(lz4ultra_stream_t *stream, void *ptr, size_t size);

   /**
    * Write to stream
    *
    * @param stream stream
    * @param ptr buffer to write from
    * @param size number of bytes to write
    *
    * @return number of bytes written
    */
   size_t(*write)(lz4ultra_stream_t *stream, void *ptr, size_t size);


   /**
    * Check if stream has reached the end of the data
    *
    * @param stream stream
    *
    * @return nonzero if the end of the data has been reached, 0 if there is more data
    */
   int(*eof)(lz4ultra_stream_t *stream);

   /**
    * Close stream
    *
    * @param stream stream
    */
   void(*close)(lz4ultra_stream_t *stream);
} lz4ultra_stream_t;

/**
 * Open file and create an I/O stream from it
 *
 * @param stream stream to fill out
 * @param pszInFilename filename
 * @param pszMode open mode, as with fopen()
 *
 * @return 0 for success, nonzero for failure
 */
int lz4ultra_filestream_open(lz4ultra_stream_t *stream, const char *pszInFilename, const char *pszMode);

#endif /* _STREAM_H */

--------------------------------------------------------------------------------
/src/xxhash/LICENSE.txt:
--------------------------------------------------------------------------------
xxHash Library
Copyright (c) 2012-2014, Yann Collet
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this
  list of conditions and the following disclaimer in the documentation and/or
  other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
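Editorial sketch (not a file in the repository): the `lz4ultra_stream_t` interface declared in src/stream.h is a plain table of function pointers, so it can presumably be backed by something other than a `FILE*`. The block below is a minimal illustration of a read-only, memory-backed stream. The names `memstream_state_t`, `memstream_open`, `memstream_read`, `memstream_write`, `memstream_eof`, `memstream_close` and `drain_stream` are hypothetical and do not exist in lz4ultra. The struct is repeated locally only so the sketch compiles on its own; real code would include "stream.h" instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the interface from src/stream.h, repeated so this sketch is self-contained */
typedef struct _lz4ultra_stream_t lz4ultra_stream_t;
struct _lz4ultra_stream_t {
   void *obj;
   size_t(*read)(lz4ultra_stream_t *stream, void *ptr, size_t size);
   size_t(*write)(lz4ultra_stream_t *stream, void *ptr, size_t size);
   int(*eof)(lz4ultra_stream_t *stream);
   void(*close)(lz4ultra_stream_t *stream);
};

/* Hypothetical backing state for a read-only memory stream (an assumption, not part of lz4ultra) */
typedef struct {
   const unsigned char *pData;   /* start of the source buffer */
   size_t nSize;                 /* total number of bytes available */
   size_t nOffset;               /* number of bytes already consumed */
} memstream_state_t;

/* Copy up to 'size' bytes from the buffer, returning the number actually copied */
static size_t memstream_read(lz4ultra_stream_t *stream, void *ptr, size_t size) {
   memstream_state_t *state = (memstream_state_t *)stream->obj;
   size_t nAvailable = state->nSize - state->nOffset;

   if (size > nAvailable)
      size = nAvailable;
   memcpy(ptr, state->pData + state->nOffset, size);
   state->nOffset += size;
   return size;
}

/* Read-only stream: writes are rejected by reporting 0 bytes written */
static size_t memstream_write(lz4ultra_stream_t *stream, void *ptr, size_t size) {
   (void)stream; (void)ptr; (void)size;
   return 0;
}

/* Unlike feof(), this reports the end as soon as every byte has been consumed */
static int memstream_eof(lz4ultra_stream_t *stream) {
   memstream_state_t *state = (memstream_state_t *)stream->obj;
   return state->nOffset >= state->nSize;
}

/* Detach the callbacks; the buffer itself is owned by the caller and is not freed */
static void memstream_close(lz4ultra_stream_t *stream) {
   stream->obj = NULL;
   stream->read = NULL;
   stream->write = NULL;
   stream->eof = NULL;
   stream->close = NULL;
}

/* Fill out 'stream' so that it reads from pData[0..nSize) */
static void memstream_open(lz4ultra_stream_t *stream, memstream_state_t *state, const void *pData, size_t nSize) {
   assert(stream != NULL && state != NULL);
   state->pData = (const unsigned char *)pData;
   state->nSize = nSize;
   state->nOffset = 0;
   stream->obj = state;
   stream->read = memstream_read;
   stream->write = memstream_write;
   stream->eof = memstream_eof;
   stream->close = memstream_close;
}

/* Example consumer: pull data through the interface in small chunks until the end is reached */
static size_t drain_stream(lz4ultra_stream_t *stream, unsigned char *pOut, size_t nMaxOut) {
   size_t nTotal = 0;

   while (nTotal < nMaxOut && !stream->eof(stream)) {
      size_t nChunk = nMaxOut - nTotal;
      if (nChunk > 4)
         nChunk = 4;   /* deliberately small, to exercise repeated reads */
      size_t nRead = stream->read(stream, pOut + nTotal, nChunk);
      if (nRead == 0)
         break;
      nTotal += nRead;
   }
   return nTotal;
}
```

A stream set up this way could in principle be passed as `pInStream` to `lz4ultra_compress_stream`, with a matching write-capable stream as `pOutStream`. Whether the compressor relies on `feof()`-style semantics (end reported only after a short read) is not shown in this section and would need checking against src/shrink_streaming.c.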