├── LICENSE
├── LzmaSpec.cpp
├── LzmaSpec.exe
├── README.md
├── examples
    ├── a.lzma
    ├── a.txt
    ├── a_eos.lzma
    ├── a_eos_and_size.lzma
    ├── a_lp1_lc2_pb1.lzma
    ├── bad_corrupted.lzma
    ├── bad_eos_incorrect_size.lzma
    ├── bad_incorrect_size.lzma
    └── info.txt
├── geo_lzma
├── lzma-specification.txt
├── lzmaSh2.cpp
├── lzmaSh2.exe
├── lzmaSh2a.cpp
├── lzmaSh2a.exe
├── lzmaspec-readme.txt
└── test.bat


/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/LzmaSpec.cpp:
--------------------------------------------------------------------------------
  1 | /* LzmaSpec.cpp -- LZMA Reference Decoder
  2 | 2015-06-14 : Igor Pavlov : Public domain */
  3 | 
  4 | // This code implements LZMA file decoding according to LZMA specification.
  5 | // This code is not optimized for speed.
  6 | 
  7 | #include <stdio.h>
  8 | 
  9 | #ifdef _MSC_VER
 10 |   #pragma warning(disable : 4710) // function not inlined
 11 |   #pragma warning(disable : 4996) // This function or variable may be unsafe
 12 | #endif
 13 | 
 14 | typedef unsigned char Byte;
 15 | typedef unsigned short UInt16;
 16 | 
 17 | #ifdef _LZMA_UINT32_IS_ULONG
 18 |   typedef unsigned long UInt32;
 19 | #else
 20 |   typedef unsigned int UInt32;
 21 | #endif
 22 | 
 23 | #if defined(_MSC_VER) || defined(__BORLANDC__)
 24 |   typedef unsigned __int64 UInt64;
 25 | #else
 26 |   typedef unsigned long long int UInt64;
 27 | #endif
 28 | 
 29 | 
 30 | struct CInputStream
 31 | {
 32 |   FILE *File;
 33 |   UInt64 Processed;
 34 |   
 35 |   void Init() { Processed = 0; }
 36 | 
 37 |   Byte ReadByte()
 38 |   {
 39 |     int c = getc(File);
 40 |     if (c < 0)
 41 |       throw "Unexpected end of file";
 42 |     Processed++;
 43 |     return (Byte)c;
 44 |   }
 45 | };
 46 | 
 47 | 
 48 | struct COutStream
 49 | {
 50 |   FILE *File;
 51 |   UInt64 Processed;
 52 | 
 53 |   void Init() { Processed = 0; }
 54 | 
 55 |   void WriteByte(Byte b)
 56 |   {
 57 |     if (putc(b, File) == EOF)
 58 |       throw "File writing error";
 59 |     Processed++;
 60 |   }
 61 | };
 62 | 
 63 | 
 64 | class COutWindow
 65 | {
 66 |   Byte *Buf;
 67 |   UInt32 Pos;
 68 |   UInt32 Size;
 69 |   bool IsFull;
 70 | 
 71 | public:
 72 |   unsigned TotalPos;
 73 |   COutStream OutStream;
 74 | 
 75 |   COutWindow(): Buf(NULL) {}
 76 |   ~COutWindow() { delete []Buf; }
 77 |  
 78 |   void Create(UInt32 dictSize)
 79 |   {
 80 |     Buf = new Byte[dictSize];
 81 |     Pos = 0;
 82 |     Size = dictSize;
 83 |     IsFull = false;
 84 |     TotalPos = 0;
 85 |   }
 86 | 
 87 |   void PutByte(Byte b)
 88 |   {
 89 |     TotalPos++;
 90 |     Buf[Pos++] = b;
 91 |     if (Pos == Size)
 92 |     {
 93 |       Pos = 0;
 94 |       IsFull = true;
 95 |     }
 96 |     OutStream.WriteByte(b);
 97 |   }
 98 | 
 99 |   Byte GetByte(UInt32 dist) const
100 |   {
101 |     return Buf[dist <= Pos ? Pos - dist : Size - dist + Pos];
102 |   }
103 | 
104 |   void CopyMatch(UInt32 dist, unsigned len)
105 |   {
106 |     for (; len > 0; len--)
107 |       PutByte(GetByte(dist));
108 |   }
109 | 
110 |   bool CheckDistance(UInt32 dist) const
111 |   {
112 |     return dist <= Pos || IsFull;
113 |   }
114 | 
115 |   bool IsEmpty() const
116 |   {
117 |     return Pos == 0 && !IsFull;
118 |   }
119 | };
120 | 
121 | 
122 | #define kNumBitModelTotalBits 11
123 | #define kNumMoveBits 5
124 | 
125 | typedef UInt16 CProb;
126 | 
127 | #define PROB_INIT_VAL ((1 << kNumBitModelTotalBits) / 2)
128 | 
129 | #define INIT_PROBS(p) \
130 |  { for (unsigned i = 0; i < sizeof(p) / sizeof(p[0]); i++) p[i] = PROB_INIT_VAL; }
131 | 
132 | class CRangeDecoder
133 | {
134 |   UInt32 Range;
135 |   UInt32 Code;
136 | 
137 |   void Normalize();
138 | 
139 | public:
140 | 
141 |   CInputStream *InStream;
142 |   bool Corrupted;
143 | 
144 |   bool Init();
145 |   bool IsFinishedOK() const { return Code == 0; }
146 | 
147 |   UInt32 DecodeDirectBits(unsigned numBits);
148 |   unsigned DecodeBit(CProb *prob);
149 | };
150 | 
151 | bool CRangeDecoder::Init()
152 | {
153 |   Corrupted = false;
154 |   Range = 0xFFFFFFFF;
155 |   Code = 0;
156 | 
157 |   Byte b = InStream->ReadByte();
158 |   
159 |   for (int i = 0; i < 4; i++)
160 |     Code = (Code << 8) | InStream->ReadByte();
161 |   
162 |   if (b != 0 || Code == Range)
163 |     Corrupted = true;
164 |   return b == 0;
165 | }
166 | 
167 | #define kTopValue ((UInt32)1 << 24)
168 | 
169 | void CRangeDecoder::Normalize()
170 | {
171 |   if (Range < kTopValue)
172 |   {
173 |     Range <<= 8;
174 |     Code = (Code << 8) | InStream->ReadByte();
175 |   }
176 | }
177 | 
178 | UInt32 CRangeDecoder::DecodeDirectBits(unsigned numBits)
179 | {
180 |   UInt32 res = 0;
181 |   do
182 |   {
183 |     Range >>= 1;
184 |     Code -= Range;
185 |     UInt32 t = 0 - ((UInt32)Code >> 31);
186 |     Code += Range & t;
187 |     
188 |     if (Code == Range)
189 |       Corrupted = true;
190 |     
191 |     Normalize();
192 |     res <<= 1;
193 |     res += t + 1;
194 |   }
195 |   while (--numBits);
196 |   return res;
197 | }
198 | 
199 | unsigned CRangeDecoder::DecodeBit(CProb *prob)
200 | {
201 |   unsigned v = *prob;
202 |   UInt32 bound = (Range >> kNumBitModelTotalBits) * v;
203 |   unsigned symbol;
204 |   if (Code < bound)
205 |   {
206 |     v += ((1 << kNumBitModelTotalBits) - v) >> kNumMoveBits;
207 |     Range = bound;
208 |     symbol = 0;
209 |   }
210 |   else
211 |   {
212 |     v -= v >> kNumMoveBits;
213 |     Code -= bound;
214 |     Range -= bound;
215 |     symbol = 1;
216 |   }
217 |   *prob = (CProb)v;
218 |   Normalize();
219 |   return symbol;
220 | }
221 | 
222 | 
223 | unsigned BitTreeReverseDecode(CProb *probs, unsigned numBits, CRangeDecoder *rc)
224 | {
225 |   unsigned m = 1;
226 |   unsigned symbol = 0;
227 |   for (unsigned i = 0; i < numBits; i++)
228 |   {
229 |     unsigned bit = rc->DecodeBit(&probs[m]);
230 |     m <<= 1;
231 |     m += bit;
232 |     symbol |= (bit << i);
233 |   }
234 |   return symbol;
235 | }
236 | 
237 | template <unsigned NumBits>
238 | class CBitTreeDecoder
239 | {
240 |   CProb Probs[(unsigned)1 << NumBits];
241 | 
242 | public:
243 | 
244 |   void Init()
245 |   {
246 |     INIT_PROBS(Probs);
247 |   }
248 | 
249 |   unsigned Decode(CRangeDecoder *rc)
250 |   {
251 |     unsigned m = 1;
252 |     for (unsigned i = 0; i < NumBits; i++)
253 |       m = (m << 1) + rc->DecodeBit(&Probs[m]);
254 |     return m - ((unsigned)1 << NumBits);
255 |   }
256 | 
257 |   unsigned ReverseDecode(CRangeDecoder *rc)
258 |   {
259 |     return BitTreeReverseDecode(Probs, NumBits, rc);
260 |   }
261 | };
262 | 
263 | #define kNumPosBitsMax 4
264 | 
265 | #define kNumStates 12
266 | #define kNumLenToPosStates 4
267 | #define kNumAlignBits 4
268 | #define kStartPosModelIndex 4
269 | #define kEndPosModelIndex 14
270 | #define kNumFullDistances (1 << (kEndPosModelIndex >> 1))
271 | #define kMatchMinLen 2
272 | 
273 | class CLenDecoder
274 | {
275 |   CProb Choice;
276 |   CProb Choice2;
277 |   CBitTreeDecoder<3> LowCoder[1 << kNumPosBitsMax];
278 |   CBitTreeDecoder<3> MidCoder[1 << kNumPosBitsMax];
279 |   CBitTreeDecoder<8> HighCoder;
280 | 
281 | public:
282 | 
283 |   void Init()
284 |   {
285 |     Choice = PROB_INIT_VAL;
286 |     Choice2 = PROB_INIT_VAL;
287 |     HighCoder.Init();
288 |     for (unsigned i = 0; i < (1 << kNumPosBitsMax); i++)
289 |     {
290 |       LowCoder[i].Init();
291 |       MidCoder[i].Init();
292 |     }
293 |   }
294 | 
295 |   unsigned Decode(CRangeDecoder *rc, unsigned posState)
296 |   {
297 |     if (rc->DecodeBit(&Choice) == 0)
298 |       return LowCoder[posState].Decode(rc);
299 |     if (rc->DecodeBit(&Choice2) == 0)
300 |       return 8 + MidCoder[posState].Decode(rc);
301 |     return 16 + HighCoder.Decode(rc);
302 |   }
303 | };
304 | 
305 | unsigned UpdateState_Literal(unsigned state)
306 | {
307 |   if (state < 4) return 0;
308 |   else if (state < 10) return state - 3;
309 |   else return state - 6;
310 | }
311 | unsigned UpdateState_Match   (unsigned state) { return state < 7 ? 7 : 10; }
312 | unsigned UpdateState_Rep     (unsigned state) { return state < 7 ? 8 : 11; }
313 | unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }
314 | 
315 | #define LZMA_DIC_MIN (1 << 12)
316 | 
317 | class CLzmaDecoder
318 | {
319 | public:
320 |   CRangeDecoder RangeDec;
321 |   COutWindow OutWindow;
322 | 
323 |   bool markerIsMandatory;
324 |   unsigned lc, pb, lp;
325 |   UInt32 dictSize;
326 |   UInt32 dictSizeInProperties;
327 | 
328 |   void DecodeProperties(const Byte *properties)
329 |   {
330 |     unsigned d = properties[0];
331 |     if (d >= (9 * 5 * 5))
332 |       throw "Incorrect LZMA properties";
333 |     lc = d % 9;
334 |     d /= 9;
335 |     pb = d / 5;
336 |     lp = d % 5;
337 |     dictSizeInProperties = 0;
338 |     for (int i = 0; i < 4; i++)
339 |       dictSizeInProperties |= (UInt32)properties[i + 1] << (8 * i);
340 |     dictSize = dictSizeInProperties;
341 |     if (dictSize < LZMA_DIC_MIN)
342 |       dictSize = LZMA_DIC_MIN;
343 |   }
344 | 
345 |   CLzmaDecoder(): LitProbs(NULL) {}
346 |   ~CLzmaDecoder() { delete []LitProbs; }
347 | 
348 |   void Create()
349 |   {
350 |     OutWindow.Create(dictSize);
351 |     CreateLiterals();
352 |   }
353 | 
354 |   int Decode(bool unpackSizeDefined, UInt64 unpackSize);
355 |   
356 | private:
357 | 
358 |   CProb *LitProbs;
359 | 
360 |   void CreateLiterals()
361 |   {
362 |     LitProbs = new CProb[(UInt32)0x300 << (lc + lp)];
363 |   }
364 |   
365 |   void InitLiterals()
366 |   {
367 |     UInt32 num = (UInt32)0x300 << (lc + lp);
368 |     for (UInt32 i = 0; i < num; i++)
369 |       LitProbs[i] = PROB_INIT_VAL;
370 |   }
371 |   
372 |   void DecodeLiteral(unsigned state, UInt32 rep0)
373 |   {
374 |     unsigned prevByte = 0;
375 |     if (!OutWindow.IsEmpty())
376 |       prevByte = OutWindow.GetByte(1);
377 |     
378 |     unsigned symbol = 1;
379 |     unsigned litState = ((OutWindow.TotalPos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc));
380 |     CProb *probs = &LitProbs[(UInt32)0x300 * litState];
381 |     
382 |     if (state >= 7)
383 |     {
384 |       unsigned matchByte = OutWindow.GetByte(rep0 + 1);
385 |       do
386 |       {
387 |         unsigned matchBit = (matchByte >> 7) & 1;
388 |         matchByte <<= 1;
389 |         unsigned bit = RangeDec.DecodeBit(&probs[((1 + matchBit) << 8) + symbol]);
390 |         symbol = (symbol << 1) | bit;
391 |         if (matchBit != bit)
392 |           break;
393 |       }
394 |       while (symbol < 0x100);
395 |     }
396 |     while (symbol < 0x100)
397 |       symbol = (symbol << 1) | RangeDec.DecodeBit(&probs[symbol]);
398 |     OutWindow.PutByte((Byte)(symbol - 0x100));
399 |   }
400 | 
401 |   CBitTreeDecoder<6> PosSlotDecoder[kNumLenToPosStates];
402 |   CBitTreeDecoder<kNumAlignBits> AlignDecoder;
403 |   CProb PosDecoders[1 + kNumFullDistances - kEndPosModelIndex];
404 |   
405 |   void InitDist()
406 |   {
407 |     for (unsigned i = 0; i < kNumLenToPosStates; i++)
408 |       PosSlotDecoder[i].Init();
409 |     AlignDecoder.Init();
410 |     INIT_PROBS(PosDecoders);
411 |   }
412 |   
413 |   unsigned DecodeDistance(unsigned len)
414 |   {
415 |     unsigned lenState = len;
416 |     if (lenState > kNumLenToPosStates - 1)
417 |       lenState = kNumLenToPosStates - 1;
418 |     
419 |     unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
420 |     if (posSlot < 4)
421 |       return posSlot;
422 |     
423 |     unsigned numDirectBits = (unsigned)((posSlot >> 1) - 1);
424 |     UInt32 dist = ((2 | (posSlot & 1)) << numDirectBits);
425 |     if (posSlot < kEndPosModelIndex)
426 |       dist += BitTreeReverseDecode(PosDecoders + dist - posSlot, numDirectBits, &RangeDec);
427 |     else
428 |     {
429 |       dist += RangeDec.DecodeDirectBits(numDirectBits - kNumAlignBits) << kNumAlignBits;
430 |       dist += AlignDecoder.ReverseDecode(&RangeDec);
431 |     }
432 |     return dist;
433 |   }
434 | 
435 |   CProb IsMatch[kNumStates << kNumPosBitsMax];
436 |   CProb IsRep[kNumStates];
437 |   CProb IsRepG0[kNumStates];
438 |   CProb IsRepG1[kNumStates];
439 |   CProb IsRepG2[kNumStates];
440 |   CProb IsRep0Long[kNumStates << kNumPosBitsMax];
441 | 
442 |   CLenDecoder LenDecoder;
443 |   CLenDecoder RepLenDecoder;
444 | 
445 |   void Init()
446 |   {
447 |     InitLiterals();
448 |     InitDist();
449 | 
450 |     INIT_PROBS(IsMatch);
451 |     INIT_PROBS(IsRep);
452 |     INIT_PROBS(IsRepG0);
453 |     INIT_PROBS(IsRepG1);
454 |     INIT_PROBS(IsRepG2);
455 |     INIT_PROBS(IsRep0Long);
456 | 
457 |     LenDecoder.Init();
458 |     RepLenDecoder.Init();
459 |   }
460 | };
461 |     
462 | 
463 | #define LZMA_RES_ERROR                   0
464 | #define LZMA_RES_FINISHED_WITH_MARKER    1
465 | #define LZMA_RES_FINISHED_WITHOUT_MARKER 2
466 | 
467 | int CLzmaDecoder::Decode(bool unpackSizeDefined, UInt64 unpackSize)
468 | {
469 |   if (!RangeDec.Init())
470 |     return LZMA_RES_ERROR;
471 | 
472 |   Init();
473 | 
474 |   UInt32 rep0 = 0, rep1 = 0, rep2 = 0, rep3 = 0;
475 |   unsigned state = 0;
476 |   
477 |   for (;;)
478 |   {
479 |     if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory)
480 |       if (RangeDec.IsFinishedOK())
481 |         return LZMA_RES_FINISHED_WITHOUT_MARKER;
482 | 
483 |     unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
484 | 
485 |     if (RangeDec.DecodeBit(&IsMatch[(state << kNumPosBitsMax) + posState]) == 0)
486 |     {
487 |       if (unpackSizeDefined && unpackSize == 0)
488 |         return LZMA_RES_ERROR;
489 |       DecodeLiteral(state, rep0);
490 |       state = UpdateState_Literal(state);
491 |       unpackSize--;
492 |       continue;
493 |     }
494 |     
495 |     unsigned len;
496 |     
497 |     if (RangeDec.DecodeBit(&IsRep[state]) != 0)
498 |     {
499 |       if (unpackSizeDefined && unpackSize == 0)
500 |         return LZMA_RES_ERROR;
501 |       if (OutWindow.IsEmpty())
502 |         return LZMA_RES_ERROR;
503 |       if (RangeDec.DecodeBit(&IsRepG0[state]) == 0)
504 |       {
505 |         if (RangeDec.DecodeBit(&IsRep0Long[(state << kNumPosBitsMax) + posState]) == 0)
506 |         {
507 |           state = UpdateState_ShortRep(state);
508 |           OutWindow.PutByte(OutWindow.GetByte(rep0 + 1));
509 |           unpackSize--;
510 |           continue;
511 |         }
512 |       }
513 |       else
514 |       {
515 |         UInt32 dist;
516 |         if (RangeDec.DecodeBit(&IsRepG1[state]) == 0)
517 |           dist = rep1;
518 |         else
519 |         {
520 |           if (RangeDec.DecodeBit(&IsRepG2[state]) == 0)
521 |             dist = rep2;
522 |           else
523 |           {
524 |             dist = rep3;
525 |             rep3 = rep2;
526 |           }
527 |           rep2 = rep1;
528 |         }
529 |         rep1 = rep0;
530 |         rep0 = dist;
531 |       }
532 |       len = RepLenDecoder.Decode(&RangeDec, posState);
533 |       state = UpdateState_Rep(state);
534 |     }
535 |     else
536 |     {
537 |       rep3 = rep2;
538 |       rep2 = rep1;
539 |       rep1 = rep0;
540 |       len = LenDecoder.Decode(&RangeDec, posState);
541 |       state = UpdateState_Match(state);
542 |       rep0 = DecodeDistance(len);
543 |       if (rep0 == 0xFFFFFFFF)
544 |         return RangeDec.IsFinishedOK() ?
545 |             LZMA_RES_FINISHED_WITH_MARKER :
546 |             LZMA_RES_ERROR;
547 | 
548 |       if (unpackSizeDefined && unpackSize == 0)
549 |         return LZMA_RES_ERROR;
550 |       if (rep0 >= dictSize || !OutWindow.CheckDistance(rep0))
551 |         return LZMA_RES_ERROR;
552 |     }
553 |     len += kMatchMinLen;
554 |     bool isError = false;
555 |     if (unpackSizeDefined && unpackSize < len)
556 |     {
557 |       len = (unsigned)unpackSize;
558 |       isError = true;
559 |     }
560 |     OutWindow.CopyMatch(rep0 + 1, len);
561 |     unpackSize -= len;
562 |     if (isError)
563 |       return LZMA_RES_ERROR;
564 |   }
565 | }
566 | 
567 | static void Print(const char *s)
568 | {
569 |   fputs(s, stdout);
570 | }
571 | 
572 | static void PrintError(const char *s)
573 | {
574 |   fputs(s, stderr);
575 | }
576 | 
577 | 
 578 | 
579 | 
580 | void ConvertUInt64ToString(UInt64 val, char *s)
581 | {
582 |   char temp[32];
583 |   unsigned i = 0;
584 |   while (val >= 10)
585 |   {
586 |     temp[i++] = (char)('0' + (unsigned)(val % 10));
587 |     val /= 10;
588 |   }
589 |   *s++ = (char)('0' + (unsigned)val);
590 |   while (i != 0)
591 |   {
592 |     i--;
593 |     *s++ = temp[i];
594 |   }
595 |   *s = 0;
596 | }
597 | 
598 | void PrintUInt64(const char *title, UInt64 v)
599 | {
600 |   Print(title);
601 |   Print(" : ");
602 |   char s[32];
603 |   ConvertUInt64ToString(v, s);
604 |   Print(s);
605 |   Print(" bytes \n");
606 | }
607 | 
608 | int main2(int numArgs, const char *args[])
609 | {
610 |   Print("\nLZMA Reference Decoder 15.00 : Igor Pavlov : Public domain : 2015-04-16\n");
611 |   if (numArgs == 1)
612 |     Print("\nUse: lzmaSpec a.lzma outFile");
613 | 
614 |   if (numArgs != 3)
615 |     throw "you must specify two parameters";
616 | 
617 |   CInputStream inStream;
618 |   inStream.File = fopen(args[1], "rb");
619 |   inStream.Init();
620 |   if (inStream.File == 0)
621 |     throw "Can't open input file";
622 | 
623 |   CLzmaDecoder lzmaDecoder;
624 |   lzmaDecoder.OutWindow.OutStream.File = fopen(args[2], "wb+");
625 |   lzmaDecoder.OutWindow.OutStream.Init();
 626 |   if (lzmaDecoder.OutWindow.OutStream.File == 0)
627 |     throw "Can't open output file";
628 | 
629 |   Byte header[13];
630 |   int i;
631 |   for (i = 0; i < 13; i++)
632 |     header[i] = inStream.ReadByte();
633 | 
634 |   lzmaDecoder.DecodeProperties(header);
635 | 
 636 |   printf("\nlc=%u, lp=%u, pb=%u", lzmaDecoder.lc, lzmaDecoder.lp, lzmaDecoder.pb);
637 |   printf("\nDictionary Size in properties = %u", lzmaDecoder.dictSizeInProperties);
638 |   printf("\nDictionary Size for decoding  = %u", lzmaDecoder.dictSize);
639 | 
640 |   UInt64 unpackSize = 0;
641 |   bool unpackSizeDefined = false;
642 |   for (i = 0; i < 8; i++)
643 |   {
644 |     Byte b = header[5 + i];
645 |     if (b != 0xFF)
646 |       unpackSizeDefined = true;
647 |     unpackSize |= (UInt64)b << (8 * i);
648 |   }
649 | 
650 |   lzmaDecoder.markerIsMandatory = !unpackSizeDefined;
651 | 
652 |   Print("\n");
653 |   if (unpackSizeDefined)
654 |     PrintUInt64("Uncompressed Size", unpackSize);
655 |   else
656 |     Print("End marker is expected\n");
657 |   lzmaDecoder.RangeDec.InStream = &inStream;
658 | 
659 |   Print("\n");
660 | 
661 |   lzmaDecoder.Create();
662 |   
663 |   int res = lzmaDecoder.Decode(unpackSizeDefined, unpackSize);
664 | 
665 |   PrintUInt64("Read    ", inStream.Processed);
666 |   PrintUInt64("Written ", lzmaDecoder.OutWindow.OutStream.Processed);
667 | 
668 |   if (res == LZMA_RES_ERROR)
669 |     throw "LZMA decoding error";
670 |   else if (res == LZMA_RES_FINISHED_WITHOUT_MARKER)
671 |     Print("Finished without end marker");
672 |   else if (res == LZMA_RES_FINISHED_WITH_MARKER)
673 |   {
674 |     if (unpackSizeDefined)
675 |     {
676 |       if (lzmaDecoder.OutWindow.OutStream.Processed != unpackSize)
 677 |         throw "Finished with end marker before the specified size was reached";
678 |       Print("Warning: ");
679 |     }
680 |     Print("Finished with end marker");
681 |   }
682 |   else
683 |     throw "Internal Error";
684 | 
685 |   Print("\n");
686 |   
687 |   if (lzmaDecoder.RangeDec.Corrupted)
688 |   {
689 |     Print("\nWarning: LZMA stream is corrupted\n");
690 |   }
691 | 
692 |   return 0;
693 | }
694 | 
695 | 
696 | int
697 |   #ifdef _MSC_VER
698 |     __cdecl
699 |   #endif
700 | main(int numArgs, const char *args[])
701 | {
702 |   try { return main2(numArgs, args); }
703 |   catch (const char *s)
704 |   {
705 |     PrintError("\nError:\n");
706 |     PrintError(s);
707 |     PrintError("\n");
708 |     return 1;
709 |   }
710 |   catch(...)
711 |   {
712 |     PrintError("\nError\n");
713 |     return 1;
714 |   }
715 | }
716 | 
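
CRangeDecoder::DecodeBit above splits the current range at bound = (Range >> 11) * prob and then moves prob toward the bit it has just decoded. The following is a minimal sketch that isolates only that arithmetic, with the constants copied from LzmaSpec.cpp. The helper names SplitBound and UpdateProb are hypothetical, and no other range-decoder state (Code, normalization) is modeled.

```cpp
#include <cassert>
#include <cstdint>

// Constants copied from LzmaSpec.cpp.
const unsigned kNumBitModelTotalBits = 11;
const unsigned kNumMoveBits = 5;
const unsigned kProbInitVal = (1u << kNumBitModelTotalBits) / 2; // 1024: P(bit == 0) = 0.5

// Hypothetical helper: the split point DecodeBit computes before reading a bit.
// A 0 bit selects [0, bound); a 1 bit selects [bound, range).
uint32_t SplitBound(uint32_t range, unsigned prob)
{
  return (range >> kNumBitModelTotalBits) * prob;
}

// Hypothetical helper: the model update DecodeBit applies once the bit is known.
// prob approximates P(bit == 0) scaled to 2^11 and moves 1/32 of the remaining
// distance toward 2048 after a 0, or toward 0 after a 1.
unsigned UpdateProb(unsigned prob, unsigned bit)
{
  assert(bit <= 1 && prob < (1u << kNumBitModelTotalBits));
  if (bit == 0)
    prob += ((1u << kNumBitModelTotalBits) - prob) >> kNumMoveBits;
  else
    prob -= prob >> kNumMoveBits;
  return prob;
}
```

Starting from 1024, one 0 bit gives 1056 and one 1 bit gives 992. Because the step is a right shift, a long run of identical bits saturates at 2017 (zeros) or 31 (ones), so neither outcome's estimated probability ever reaches zero.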

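main2() reads a 13-byte header before decoding: byte 0 holds the encoded lc/lp/pb, bytes 1..4 hold the little-endian dictionary size (raised to LZMA_DIC_MIN, 4 KiB, for decoding), and bytes 5..12 hold the little-endian uncompressed size, where eight 0xFF bytes mean the size is unknown. The following is a minimal sketch that pulls this logic into one standalone function; LzmaFileHeader and ParseLzmaFileHeader are hypothetical names, not part of LzmaSpec.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct; fields follow the header layout parsed in main2().
struct LzmaFileHeader
{
  uint8_t  props;                // byte 0: encoded lc/lp/pb
  uint32_t dictSizeInProperties; // bytes 1..4, little-endian
  uint32_t dictSize;             // size used for decoding (at least 4 KiB)
  uint64_t unpackSize;           // bytes 5..12, little-endian
  bool     unpackSizeDefined;    // false only when all 8 size bytes are 0xFF
};

LzmaFileHeader ParseLzmaFileHeader(const uint8_t header[13])
{
  assert(header != nullptr);
  LzmaFileHeader h;
  h.props = header[0];

  h.dictSizeInProperties = 0;
  for (int i = 0; i < 4; i++)
    h.dictSizeInProperties |= (uint32_t)header[1 + i] << (8 * i);
  const uint32_t kDicMin = (uint32_t)1 << 12; // LZMA_DIC_MIN in LzmaSpec.cpp
  h.dictSize = h.dictSizeInProperties < kDicMin ? kDicMin : h.dictSizeInProperties;

  h.unpackSize = 0;
  h.unpackSizeDefined = false;
  for (int i = 0; i < 8; i++)
  {
    uint8_t b = header[5 + i];
    if (b != 0xFF)
      h.unpackSizeDefined = true; // any non-0xFF byte means the size is known
    h.unpackSize |= (uint64_t)b << (8 * i);
  }
  return h;
}
```

When unpackSizeDefined is false, main2() sets markerIsMandatory, so the stream must then terminate with an end marker.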

--------------------------------------------------------------------------------
/LzmaSpec.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/LzmaSpec.exe


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # lzma_sh
 2 | Compact LZMA decoder.
 3 | 
 4 | lzmaSh2a.cpp is slightly longer than lzmaSh2.cpp (247 LoC vs. 228), but it has an explicit
 5 | state table and integrated id decoding.
 6 | 
 7 | LzmaSpec.cpp is the "official" LZMA reference decoder by Igor Pavlov (716 LoC).
 8 | 
 9 | 
10 | 
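
The README notes that lzmaSh2a.cpp uses an explicit state table. As an illustration only (lzmaSh2a.cpp's own table is not shown here and may be laid out differently), the four UpdateState_* functions in LzmaSpec.cpp can be tabulated as a 12 x 4 array indexed by current state and packet kind. The LzmaPacket enum and NextState are hypothetical names; the original functions are copied in so the table can be cross-checked.

```cpp
#include <cassert>

// Hypothetical names for the four packet kinds handled by CLzmaDecoder::Decode.
enum LzmaPacket { kLiteral = 0, kMatch = 1, kRep = 2, kShortRep = 3 };

// kNextState[state][packet]: the state after decoding that packet kind.
static const unsigned char kNextState[12][4] = {
  // LIT MATCH REP SHORTREP
  { 0,  7,  8,  9 }, // state 0
  { 0,  7,  8,  9 }, // state 1
  { 0,  7,  8,  9 }, // state 2
  { 0,  7,  8,  9 }, // state 3
  { 1,  7,  8,  9 }, // state 4
  { 2,  7,  8,  9 }, // state 5
  { 3,  7,  8,  9 }, // state 6
  { 4, 10, 11, 11 }, // state 7
  { 5, 10, 11, 11 }, // state 8
  { 6, 10, 11, 11 }, // state 9
  { 4, 10, 11, 11 }, // state 10
  { 5, 10, 11, 11 }, // state 11
};

unsigned NextState(unsigned state, LzmaPacket packet)
{
  assert(state < 12);
  return kNextState[state][packet];
}

// Copied from LzmaSpec.cpp, used only to cross-check the table.
unsigned UpdateState_Literal(unsigned state)
{
  if (state < 4) return 0;
  else if (state < 10) return state - 3;
  else return state - 6;
}
unsigned UpdateState_Match   (unsigned state) { return state < 7 ? 7 : 10; }
unsigned UpdateState_Rep     (unsigned state) { return state < 7 ? 8 : 11; }
unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }
```

Every literal transition lands in states 0..6 and every other transition lands in 7..11, which is why DecodeLiteral only takes the matched-literal path when state >= 7.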


--------------------------------------------------------------------------------
/examples/a.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a.lzma


--------------------------------------------------------------------------------
/examples/a.txt:
--------------------------------------------------------------------------------
 1 | LZMA decoder test example
 2 | =========================
 3 | ! LZMA ! Decoder ! TEST !
 4 | =========================
 5 | ! TEST ! LZMA ! Decoder !
 6 | =========================
 7 | ---- Test Line 1 -------- 
 8 | =========================
 9 | ---- Test Line 2 -------- 
10 | =========================
11 | === End of test file ==== 
12 | =========================
13 | 


--------------------------------------------------------------------------------
/examples/a_eos.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_eos.lzma


--------------------------------------------------------------------------------
/examples/a_eos_and_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_eos_and_size.lzma


--------------------------------------------------------------------------------
/examples/a_lp1_lc2_pb1.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_lp1_lc2_pb1.lzma


--------------------------------------------------------------------------------
/examples/bad_corrupted.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_corrupted.lzma


--------------------------------------------------------------------------------
/examples/bad_eos_incorrect_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_eos_incorrect_size.lzma


--------------------------------------------------------------------------------
/examples/bad_incorrect_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_incorrect_size.lzma


--------------------------------------------------------------------------------
/examples/info.txt:
--------------------------------------------------------------------------------
 1 | GOOD ARCHIVES:
 2 | 
 3 | a.lzma
 4 |   the stream was compressed with the default properties lp=0 lc=3 pb=2 and a 64 KiB dictionary
 5 | a_eos.lzma
 6 |   the stream has an EOS (end of stream) marker
 7 | a_eos_and_size.lzma
 8 |   the stream has an EOS marker, and the unpack size is also defined in the header
 9 | a_lp1_lc2_pb1.lzma
10 |   the stream was compressed with lp=1 lc=2 pb=1 properties
11 | 
12 | 
13 | BAD ARCHIVES:
14 | 
15 | bad_corrupted.lzma
16 |   some bytes in the compressed stream were changed
17 | bad_eos_incorrect_size.lzma
18 |   the stream has an EOS marker, and the unpack size in the header is larger than the real uncompressed size
19 | bad_incorrect_size.lzma
20 |   the header contains an incorrect size (290); the correct size is 327
21 | 
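
The properties listed above (lc, lp, pb) share byte 0 of the file header. lzma-specification.txt encodes them as (pb * 5 + lp) * 9 + lc, and CLzmaDecoder::DecodeProperties in LzmaSpec.cpp reverses that with % 9, / 9, / 5 and % 5. The following is a minimal sketch of both directions; EncodeLzmaProps and DecodeLzmaProps are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper following EncodeProperties in lzma-specification.txt.
uint8_t EncodeLzmaProps(unsigned lc, unsigned lp, unsigned pb)
{
  assert(lc <= 8 && lp <= 4 && pb <= 4); // ranges from the specification
  return (uint8_t)((pb * 5 + lp) * 9 + lc);
}

// Hypothetical helper following CLzmaDecoder::DecodeProperties; returns false
// instead of throwing when the byte is out of range (>= 9 * 5 * 5 = 225).
bool DecodeLzmaProps(uint8_t props, unsigned &lc, unsigned &lp, unsigned &pb)
{
  unsigned d = props;
  if (d >= 9 * 5 * 5)
    return false;
  lc = d % 9;
  d /= 9;
  pb = d / 5;
  lp = d % 5;
  return true;
}
```

By this formula, the default properties of a.lzma (lc=3, lp=0, pb=2) encode to 0x5D and those of a_lp1_lc2_pb1.lzma encode to 0x38. These values are computed from the formula, not read from the example files.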


--------------------------------------------------------------------------------
/geo_lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/geo_lzma


--------------------------------------------------------------------------------
/lzma-specification.txt:
--------------------------------------------------------------------------------
   1 | LZMA specification (DRAFT version)
   2 | ----------------------------------
   3 | 
   4 | Author: Igor Pavlov
   5 | Date: 2015-06-14
   6 | 
   7 | This specification defines the format of LZMA compressed data and the lzma file format.
   8 | 
   9 | Notation 
  10 | --------
  11 | 
  12 | We use the syntax of C++ programming language.
  13 | We use the following types in C++ code:
  14 |   unsigned - unsigned integer, at least 16 bits in size
  15 |   int      - signed integer, at least 16 bits in size
  16 |   UInt64   - 64-bit unsigned integer
  17 |   UInt32   - 32-bit unsigned integer
  18 |   UInt16   - 16-bit unsigned integer
  19 |   Byte     - 8-bit unsigned integer
  20 |   bool     - boolean type with two possible values: false, true
  21 | 
  22 | 
  23 | lzma file format
  24 | ================
  25 | 
  26 | The lzma file contains the raw LZMA stream and the header with related properties.
  27 | 
  28 | Files in this format use the ".lzma" extension.
  29 | 
  30 | The lzma file format layout:
  31 | 
  32 | Offset Size Description
  33 | 
  34 |   0     1   LZMA model properties (lc, lp, pb) in encoded form
  35 |   1     4   Dictionary size (32-bit unsigned integer, little-endian)
  36 |   5     8   Uncompressed size (64-bit unsigned integer, little-endian)
  37 |  13         Compressed data (LZMA stream)
  38 | 
  39 | LZMA properties:
  40 | 
  41 |     name  Range          Description
  42 | 
  43 |       lc  [0, 8]         the number of "literal context" bits
  44 |       lp  [0, 4]         the number of "literal pos" bits
  45 |       pb  [0, 4]         the number of "pos" bits
  46 | dictSize  [0, 2^32 - 1]  the dictionary size 
  47 | 
  48 | The following code encodes LZMA properties:
  49 | 
  50 | void EncodeProperties(Byte *properties)
  51 | {
  52 |   properties[0] = (Byte)((pb * 5 + lp) * 9 + lc);
  53 |   Set_UInt32_LittleEndian(properties + 1, dictSize);
  54 | }
  55 | 
  56 | If the value of dictionary size in properties is smaller than (1 << 12),
  57 | the LZMA decoder must set the dictionary size variable to (1 << 12).
  58 | 
  59 | #define LZMA_DIC_MIN (1 << 12)
  60 | 
  61 |   unsigned lc, pb, lp;
  62 |   UInt32 dictSize;
  63 |   UInt32 dictSizeInProperties;
  64 | 
  65 |   void DecodeProperties(const Byte *properties)
  66 |   {
  67 |     unsigned d = properties[0];
  68 |     if (d >= (9 * 5 * 5))
  69 |       throw "Incorrect LZMA properties";
  70 |     lc = d % 9;
  71 |     d /= 9;
  72 |     pb = d / 5;
  73 |     lp = d % 5;
  74 |     dictSizeInProperties = 0;
  75 |     for (int i = 0; i < 4; i++)
  76 |       dictSizeInProperties |= (UInt32)properties[i + 1] << (8 * i);
  77 |     dictSize = dictSizeInProperties;
  78 |     if (dictSize < LZMA_DIC_MIN)
  79 |       dictSize = LZMA_DIC_MIN;
  80 |   }
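A minimal C++ sketch of the properties round trip, assuming <cstdint> fixed-width types in place of Byte and UInt32. The struct LzmaProps and the functions EncodeProps and DecodeProps are hypothetical names, and DecodeProps reports an invalid properties byte by returning false where the code above throws. For the default properties lc = 3, lp = 0, pb = 2 the encoded byte is (2 * 5 + 0) * 9 + 3 = 93 = 0x5D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the decoded properties.
struct LzmaProps {
  unsigned lc, lp, pb;
  std::uint32_t dictSize;         // value the decoder actually uses
  std::uint32_t dictSizeInProps;  // raw value stored in the header
};

const std::uint32_t kLzmaDicMin = 1u << 12;

// Packs lc/lp/pb and the dictionary size into the 5-byte properties block.
void EncodeProps(const LzmaProps &p, std::uint8_t out[5]) {
  assert(p.lc <= 8 && p.lp <= 4 && p.pb <= 4);
  out[0] = (std::uint8_t)((p.pb * 5 + p.lp) * 9 + p.lc);
  for (int i = 0; i < 4; i++)
    out[1 + i] = (std::uint8_t)(p.dictSizeInProps >> (8 * i));  // little-endian
}

// Returns false for a properties byte >= 9 * 5 * 5 (the spec throws instead).
bool DecodeProps(const std::uint8_t in[5], LzmaProps &p) {
  unsigned d = in[0];
  if (d >= 9 * 5 * 5)
    return false;
  p.lc = d % 9;
  d /= 9;
  p.pb = d / 5;
  p.lp = d % 5;
  p.dictSizeInProps = 0;
  for (int i = 0; i < 4; i++)
    p.dictSizeInProps |= (std::uint32_t)in[1 + i] << (8 * i);
  // Sizes below 4 KiB are raised to LZMA_DIC_MIN, as required above.
  p.dictSize = p.dictSizeInProps < kLzmaDicMin ? kLzmaDicMin : p.dictSizeInProps;
  return true;
}
```

With a 64 KiB dictionary the five properties bytes are 5D 00 00 01 00, and a stored dictionary size of 16 is raised to 4096 on decoding.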
  81 | 
  82 | If the "Uncompressed size" field contains ones in all 64 bits, the
  83 | uncompressed size is unknown and the stream contains the "end marker",
  84 | which indicates where decoding ends.
  85 | Otherwise, if the value of the "Uncompressed size" field is not
  86 | equal to ((2^64) - 1), the LZMA stream decoding must finish after the
  87 | specified number of bytes (Uncompressed size) has been decoded. If the stream
  88 | also contains the "end marker", the LZMA decoder must read that marker as well.
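The header layout and the "all ones" rule can be illustrated with a small parser. This is a sketch under our own assumptions: LzmaHeader, ReadLE and ParseLzmaHeader are hypothetical names, the caller is assumed to hold the first 13 bytes of the file in memory, and the stored dictionary size is returned unmodified (the LZMA_DIC_MIN adjustment belongs to DecodeProperties()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result of parsing the 13-byte .lzma header.
struct LzmaHeader {
  std::uint8_t propsByte;    // encoded lc/lp/pb
  std::uint32_t dictSize;    // as stored, before the 4 KiB minimum is applied
  std::uint64_t unpackSize;  // meaningful only if unpackSizeDefined
  bool unpackSizeDefined;    // false when all 64 bits are ones
};

// Reads an unsigned little-endian integer of n bytes.
static std::uint64_t ReadLE(const std::uint8_t *p, int n) {
  std::uint64_t v = 0;
  for (int i = 0; i < n; i++)
    v |= (std::uint64_t)p[i] << (8 * i);
  return v;
}

// Returns false if fewer than 13 bytes are available.
bool ParseLzmaHeader(const std::uint8_t *buf, std::size_t len, LzmaHeader &h) {
  if (len < 13)
    return false;
  h.propsByte = buf[0];
  h.dictSize = (std::uint32_t)ReadLE(buf + 1, 4);
  h.unpackSize = ReadLE(buf + 5, 8);
  h.unpackSizeDefined = (h.unpackSize != ~(std::uint64_t)0);
  return true;
}
```

For example, a size field holding 47 01 00 00 00 00 00 00 gives unpackSize = 327, while eight FF bytes mark the size as unknown, so the stream must end with the end marker.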
  89 | 
  90 | 
  91 | The new scheme to encode LZMA properties
  92 | ----------------------------------------
  93 | 
  94 | If LZMA compression is used in some other format, it's recommended to
  95 | use a new, improved scheme to encode LZMA properties. That new scheme is
  96 | used in the xz format, which uses the LZMA2 compression algorithm.
  97 | LZMA2 is a newer compression algorithm that is based on the LZMA algorithm.
  98 | 
  99 | The dictionary size in LZMA2 is encoded with just one byte, and LZMA2 supports
 100 | only a reduced set of dictionary sizes:
 101 |   (2 << 11), (3 << 11),
 102 |   (2 << 12), (3 << 12),
 103 |   ...
 104 |   (2 << 30), (3 << 30),
 105 |   (2 << 31) - 1
 106 | 
 107 | The dictionary size can be extracted from the encoded value "p" with the following code:
 108 | 
 109 |   dictSize = (p == 40) ? 0xFFFFFFFF : (((UInt32)2 | ((p) & 1)) << ((p) / 2 + 11));
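A small wrapper around that expression, useful for checking the list of sizes above. The function name Lzma2DictSize is hypothetical, and rejecting p > 40 reflects our assumption that 40 is the largest valid code (it is the last entry in the list).

```cpp
#include <cassert>
#include <cstdint>

// Decodes the one-byte LZMA2 dictionary size code p.
// Returns false for p > 40 (error handling is an assumption of this sketch).
bool Lzma2DictSize(unsigned p, std::uint32_t &dictSize) {
  if (p > 40)
    return false;
  // Even codes give 2 << n, odd codes give 3 << n, with n = p / 2 + 11.
  dictSize = (p == 40) ? 0xFFFFFFFF
                       : (((std::uint32_t)2 | (p & 1)) << (p / 2 + 11));
  return true;
}
```

So p = 0 gives 4 KiB, p = 1 gives 6 KiB, p = 18 gives 2 MiB, and p = 40 gives 0xFFFFFFFF.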
 110 | 
 111 | There is also an additional limitation (lc + lp <= 4) in LZMA2 on the values
 112 | of the "lc" and "lp" properties:
 113 | 
 114 |   if (lc + lp > 4)
 115 |     throw "Unsupported properties: (lc + lp) > 4";
 116 | 
 117 | This limitation on the (lc + lp) value has some advantages for the LZMA decoder.
 118 | It reduces the maximum size of the tables allocated by the decoder,
 119 | and it reduces the complexity of the initialization procedure, which can be
 120 | important for keeping decoding fast when decoding a large number of small LZMA streams.
 121 | 
 122 | It's recommended to use this limitation (lc + lp <= 4) for any new format
 123 | that uses LZMA compression. Note that the combinations of "lc" and "lp"
 124 | parameters where (lc + lp > 4) provide a significant improvement in
 125 | compression ratio only in some rare cases.
 126 | 
 127 | The LZMA properties can be encoded into two bytes in the new scheme:
 128 | 
 129 | Offset Size Description
 130 | 
 131 |   0     1   The dictionary size encoded with LZMA2 scheme
 132 |   1     1   LZMA model properties (lc, lp, pb) in encoded form
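Putting the two bytes together, a parser for the new scheme could look like the following sketch. Lzma2StyleProps and ParseTwoByteProps are hypothetical names; the checks simply combine the dictionary-size expression, the (lc, lp, pb) decoding from DecodeProperties(), and the (lc + lp <= 4) limit described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of the two-byte property scheme.
struct Lzma2StyleProps {
  std::uint32_t dictSize;
  unsigned lc, lp, pb;
};

// Parses {dictionary byte, lc/lp/pb byte}. Rejects dictionary codes above 40,
// properties bytes >= 9 * 5 * 5, and combinations with lc + lp > 4.
bool ParseTwoByteProps(const std::uint8_t props[2], Lzma2StyleProps &out) {
  unsigned p = props[0];
  if (p > 40)
    return false;
  out.dictSize = (p == 40) ? 0xFFFFFFFF
                           : (((std::uint32_t)2 | (p & 1)) << (p / 2 + 11));
  unsigned d = props[1];
  if (d >= 9 * 5 * 5)
    return false;
  out.lc = d % 9;
  d /= 9;
  out.pb = d / 5;
  out.lp = d % 5;
  return out.lc + out.lp <= 4;
}
```

The pair {18, 0x5D} decodes to a 2 MiB dictionary with lc = 3, lp = 0, pb = 2, while a properties byte that encodes lc = 4 and lp = 1 is rejected by the LZMA2 limit.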
 133 | 
 134 | 
 135 | The RAM usage 
 136 | =============
 137 | 
 138 | The RAM usage for LZMA decoder is determined by the following parts:
 139 | 
 140 | 1) The Sliding Window (from 4 KiB to 4 GiB).
 141 | 2) The probability model counter arrays (arrays of 16-bit variables).
 142 | 3) Some additional state variables (about ten 32-bit integer variables).
 143 | 
 144 | 
 145 | The RAM usage for Sliding Window
 146 | --------------------------------
 147 | 
 148 | There are two main scenarios of decoding:
 149 | 
 150 | 1) The decoding of full stream to one RAM buffer.
 151 | 
 152 |   If we decode the full LZMA stream to one output buffer in RAM, the decoder
 153 |   can use that output buffer as the sliding window. So the decoder doesn't
 154 |   need an additional buffer allocated for the sliding window.
 155 | 
 156 | 2) The decoding to some external storage.
 157 | 
 158 |   If we decode the LZMA stream to external storage, the decoder must allocate
 159 |   a buffer for the sliding window. The size of that buffer must be equal to
 160 |   or larger than the dictionary size from the properties of the LZMA stream.
 161 | 
 162 | In this specification we describe the code for decoding to some external
 163 | storage. An optimized version of the code for decoding the full stream to one
 164 | output RAM buffer can require some minor code changes.
 165 | 
 166 | 
 167 | The RAM usage for the probability model counters
 168 | ------------------------------------------------
 169 | 
 170 | The size of the probability model counter arrays is calculated with the 
 171 | following formula:
 172 | 
 173 | size_of_prob_arrays = 1846 + 768 * (1 << (lp + lc))
 174 | 
 175 | Each probability model counter is an 11-bit unsigned integer.
 176 | If we use 16-bit integer variables (2-byte integers) for these probability 
 177 | model counters, the RAM usage required by probability model counter arrays 
 178 | can be estimated with the following formula:
 179 | 
 180 |   RAM = 4 KiB + 1.5 KiB * (1 << (lp + lc))
 181 | 
 182 | For example, for default LZMA parameters (lp = 0 and lc = 3), the RAM usage is
 183 | 
 184 |   RAM_lc3_lp0 = 4 KiB + 1.5 KiB * 8 = 16 KiB
 185 | 
 186 | The maximum RAM usage is required for decoding a stream with lp = 4
 187 | and lc = 8:
 188 | 
 189 |   RAM_lc8_lp4 = 4 KiB + 1.5 KiB * 4096 = 6148 KiB
 190 | 
 191 | If the decoder uses LZMA2's limited property condition 
 192 | (lc + lp <= 4), the RAM usage will not be larger than
 193 | 
 194 |   RAM_lc_lp_4 = 4 KiB + 1.5 KiB * 16 = 28 KiB
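The formulas above can be checked with a few lines of code. NumProbs and ProbBytes are hypothetical helper names; like the text, the code assumes each counter is stored in a 16-bit variable.

```cpp
#include <cassert>
#include <cstdint>

// Number of probability counters for the given lc/lp (size_of_prob_arrays).
std::uint32_t NumProbs(unsigned lc, unsigned lp) {
  assert(lc <= 8 && lp <= 4);
  return 1846 + 768 * ((std::uint32_t)1 << (lc + lp));
}

// Bytes used when each counter is stored in a 16-bit variable.
std::uint32_t ProbBytes(unsigned lc, unsigned lp) {
  return NumProbs(lc, lp) * 2;
}
```

For lc = 3 and lp = 0 this gives 7990 counters, or 15980 bytes, which the text rounds to 16 KiB. The fixed part is 1846 * 2 = 3692 bytes, so the "4 KiB" term slightly overestimates it.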
 195 | 
 196 | 
 197 | The RAM usage for encoder
 198 | -------------------------
 199 | 
 200 | There are many variants of LZMA encoder code,
 201 | and these variants have different memory consumption.
 202 | Note that the memory consumption of an LZMA Encoder can not be
 203 | smaller than the memory consumption of an LZMA Decoder for the same stream.
 204 | 
 205 | The RAM usage required by a modern, efficient implementation of the
 206 | LZMA Encoder can be estimated with the following formula:
 207 | 
 208 |   Encoder_RAM_Usage = 4 MiB + 11 * dictionarySize.
 209 | 
 210 | But there are some modes of the encoder that require less memory.
 211 | 
 212 | 
 213 | LZMA Decoding
 214 | =============
 215 | 
 216 | The LZMA compression algorithm uses LZ-based compression with Sliding Window
 217 | and Range Encoding as the entropy coding method.
 218 | 
 219 | 
 220 | Sliding Window
 221 | --------------
 222 | 
 223 | LZMA uses Sliding Window compression similar to LZ77 algorithm.
 224 | 
 225 | An LZMA stream is decoded to a sequence that consists
 226 | of MATCHES and LITERALS:
 227 |   
 228 |   - a LITERAL is an 8-bit character (one byte).
 229 |     The decoder just puts that LITERAL into the uncompressed stream.
 230 | 
 231 |   - a MATCH is a pair of two numbers (DISTANCE-LENGTH pair).
 232 |     The decoder takes the byte exactly "DISTANCE" bytes behind the
 233 |     current position in the uncompressed stream and appends it to the
 234 |     uncompressed stream. The decoder must repeat this "LENGTH" times.
 235 | 
 236 | The "DISTANCE" can not be larger than dictionary size.
 237 | And the "DISTANCE" can not be larger than the number of bytes in
 238 | the uncompressed stream that were decoded before that match.
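Copying a MATCH one byte at a time matters when DISTANCE is smaller than LENGTH: the match then overlaps the bytes it is producing, and a short pattern is repeated. The sketch below shows this on a std::string used as a flat output buffer; ApplyMatch is a hypothetical name, and the check against the dictionary size is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends LENGTH bytes, each taken DISTANCE bytes behind the current end.
// Copying byte by byte makes overlapping matches (distance < length) work.
// Returns false if the distance reaches before the start of the output.
bool ApplyMatch(std::string &out, std::size_t distance, std::size_t length) {
  if (distance == 0 || distance > out.size())
    return false;
  for (; length > 0; length--)
    out.push_back(out[out.size() - distance]);
  return true;
}
```

Starting from "ab", a match with DISTANCE = 2 and LENGTH = 4 produces "ababab".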
 239 | 
 240 | In this specification we use a cyclic buffer to implement the Sliding Window
 241 | for the LZMA decoder:
 242 | 
 243 | class COutWindow
 244 | {
 245 |   Byte *Buf;
 246 |   UInt32 Pos;
 247 |   UInt32 Size;
 248 |   bool IsFull;
 249 | 
 250 | public:
 251 |   unsigned TotalPos;
 252 |   COutStream OutStream;
 253 | 
 254 |   COutWindow(): Buf(NULL) {}
 255 |   ~COutWindow() { delete []Buf; }
 256 |  
 257 |   void Create(UInt32 dictSize)
 258 |   {
 259 |     Buf = new Byte[dictSize];
 260 |     Pos = 0;
 261 |     Size = dictSize;
 262 |     IsFull = false;
 263 |     TotalPos = 0;
 264 |   }
 265 | 
 266 |   void PutByte(Byte b)
 267 |   {
 268 |     TotalPos++;
 269 |     Buf[Pos++] = b;
 270 |     if (Pos == Size)
 271 |     {
 272 |       Pos = 0;
 273 |       IsFull = true;
 274 |     }
 275 |     OutStream.WriteByte(b);
 276 |   }
 277 | 
 278 |   Byte GetByte(UInt32 dist) const
 279 |   {
 280 |     return Buf[dist <= Pos ? Pos - dist : Size - dist + Pos];
 281 |   }
 282 | 
 283 |   void CopyMatch(UInt32 dist, unsigned len)
 284 |   {
 285 |     for (; len > 0; len--)
 286 |       PutByte(GetByte(dist));
 287 |   }
 288 | 
 289 |   bool CheckDistance(UInt32 dist) const
 290 |   {
 291 |     return dist <= Pos || IsFull;
 292 |   }
 293 | 
 294 |   bool IsEmpty() const
 295 |   {
 296 |     return Pos == 0 && !IsFull;
 297 |   }
 298 | };
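To see the wrap-around of the cyclic buffer, here is a self-contained copy of the COutWindow logic in which the output stream is replaced by a std::vector. The class name MiniWindow and the vector-based output are assumptions of this sketch; PutByte, GetByte, CopyMatch and CheckDistance follow the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Cyclic window as in COutWindow, with a std::vector standing in for OutStream.
class MiniWindow {
  std::vector<std::uint8_t> buf;
  std::uint32_t pos = 0;
  bool isFull = false;

public:
  std::uint32_t totalPos = 0;
  std::vector<std::uint8_t> out;

  explicit MiniWindow(std::uint32_t dictSize) : buf(dictSize) {
    assert(dictSize > 0);
  }

  void PutByte(std::uint8_t b) {
    totalPos++;
    buf[pos++] = b;
    if (pos == buf.size()) {  // wrap around and remember that data is old enough
      pos = 0;
      isFull = true;
    }
    out.push_back(b);
  }

  // dist = 1 is the most recently written byte.
  std::uint8_t GetByte(std::uint32_t dist) const {
    std::uint32_t size = (std::uint32_t)buf.size();
    return buf[dist <= pos ? pos - dist : size - dist + pos];
  }

  void CopyMatch(std::uint32_t dist, unsigned len) {
    for (; len > 0; len--)
      PutByte(GetByte(dist));
  }

  bool CheckDistance(std::uint32_t dist) const { return dist <= pos || isFull; }
};
```

With a 4-byte window and the input "abcde", the byte "a" has been overwritten by "e", yet GetByte(1) returns 'e' and GetByte(4) returns 'b', and a following CopyMatch(3, 2) appends "cd".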
 299 | 
 300 | 
 301 | Another implementation can use one buffer that contains both the
 302 | Sliding Window and the whole uncompressed data stream.
 303 | 
 304 | 
 305 | Range Decoder
 306 | -------------
 307 | 
 308 | The LZMA algorithm uses Range Encoding (1) as the entropy coding method.
 309 | 
 310 | An LZMA stream contains just one very big number in big-endian encoding.
 311 | LZMA decoder uses the Range Decoder to extract a sequence of binary
 312 | symbols from that big number.
 313 | 
 314 | The state of the Range Decoder:
 315 | 
 316 | struct CRangeDecoder
 317 | {
 318 |   UInt32 Range; 
 319 |   UInt32 Code;
 320 |   InputStream *InStream;
 321 | 
 322 |   bool Corrupted;
 323 | };
 324 | 
 325 | Notes about the UInt32 type of the "Range" and "Code" variables:
 326 | 
 327 |   It's possible to use a 64-bit (unsigned or signed) integer type
 328 |   for the "Range" and "Code" variables instead of 32-bit unsigned,
 329 |   but then some additional code must be used to truncate the values to
 330 |   the low 32 bits after some operations.
 331 | 
 332 |   If the programming language does not support a 32-bit unsigned integer type
 333 |   (as in the case of Java), it's possible to use a 32-bit signed integer,
 334 |   but some code must be changed. For example, the code that uses comparison
 335 |   operations on UInt32 variables in this specification must be changed.
 336 | 
 337 | The Range Decoder can reach some states that can be treated as
 338 | "Corruption" in the LZMA stream. The Range Decoder uses the variable "Corrupted":
 339 | 
 340 |   (Corrupted == false), if the Range Decoder has not detected any corruption.
 341 |   (Corrupted == true), if the Range Decoder has detected some corruption.
 342 | 
 343 | The reference LZMA Decoder ignores the value of the "Corrupted" variable,
 344 | so it continues to decode the stream even if the Range Decoder can detect
 345 | corruption. To provide full compatibility with the output of the
 346 | reference LZMA Decoder, other LZMA Decoder implementations must also
 347 | ignore the value of the "Corrupted" variable.
 348 | 
 349 | The LZMA Encoder is required to create only LZMA streams that do not
 350 | lead the Range Decoder to states where the "Corrupted" variable is set to true.
 351 | 
 352 | The Range Decoder reads the first 5 bytes from the input stream to initialize
 353 | the state:
 354 | 
 355 | bool CRangeDecoder::Init()
 356 | {
 357 |   Corrupted = false;
 358 |   Range = 0xFFFFFFFF;
 359 |   Code = 0;
 360 | 
 361 |   Byte b = InStream->ReadByte();
 362 |   
 363 |   for (int i = 0; i < 4; i++)
 364 |     Code = (Code << 8) | InStream->ReadByte();
 365 |   
 366 |   if (b != 0 || Code == Range)
 367 |     Corrupted = true;
 368 |   return b == 0;
 369 | }
 370 | 
 371 | The LZMA Encoder always writes ZERO in the initial byte of the compressed stream.
 372 | That scheme simplifies the code of the Range Encoder in the
 373 | LZMA Encoder. If the initial byte is not equal to ZERO, the LZMA Decoder must
 374 | stop decoding and report an error.
 375 | 
 376 | After the last bit of data has been decoded by the Range Decoder, the value of the
 377 | "Code" variable must be equal to 0. The LZMA Decoder must check it by
 378 | calling the IsFinishedOK() function:
 379 | 
 380 |   bool IsFinishedOK() const { return Code == 0; }
 381 | 
 382 | If there is corruption in the data stream, there is a high probability that
 383 | the "Code" value will not be equal to 0 when decoding finishes. So the
 384 | check in the IsFinishedOK() function is a very good way to
 385 | detect corruption.
 386 | 
 387 | The value of the "Range" variable before each bit decoding can not be smaller
 388 | than ((UInt32)1 << 24). The Normalize() function keeps the "Range" value in
 389 | that range.
 390 | 
 391 | #define kTopValue ((UInt32)1 << 24)
 392 | 
 393 | void CRangeDecoder::Normalize()
 394 | {
 395 |   if (Range < kTopValue)
 396 |   {
 397 |     Range <<= 8;
 398 |     Code = (Code << 8) | InStream->ReadByte();
 399 |   }
 400 | }
 401 | 
 402 | Notes: if the size of the "Code" variable is larger than 32 bits, it's
 403 | required to keep only low 32 bits of the "Code" variable after the change
 404 | in Normalize() function.
 405 | 
 406 | If the LZMA Stream is not corrupted, the value of the "Code" variable is
 407 | always smaller than the value of the "Range" variable.
 408 | But the Range Decoder ignores some types of corruption, so the value of
 409 | the "Code" variable can be equal to or larger than the value of the "Range" variable
 410 | for some "Corrupted" archives.
 411 | 
 412 | 
 413 | LZMA uses Range Encoding only with binary symbols of two types:
 414 |   1) binary symbols with fixed and equal probabilities (direct bits)
 415 |   2) binary symbols with predicted probabilities
 416 | 
 417 | The DecodeDirectBits() function decodes the sequence of direct bits:
 418 | 
 419 | UInt32 CRangeDecoder::DecodeDirectBits(unsigned numBits)
 420 | {
 421 |   UInt32 res = 0;
 422 |   do
 423 |   {
 424 |     Range >>= 1;
 425 |     Code -= Range;
 426 |     UInt32 t = 0 - ((UInt32)Code >> 31);
 427 |     Code += Range & t;
 428 |     
 429 |     if (Code == Range)
 430 |       Corrupted = true;
 431 |     
 432 |     Normalize();
 433 |     res <<= 1;
 434 |     res += t + 1;
 435 |   }
 436 |   while (--numBits);
 437 |   return res;
 438 | }
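DecodeDirectBits() effectively reads Code as a binary fraction of Range, one bit at a time. The sketch below packages Init(), Normalize() and DecodeDirectBits() over an in-memory buffer so the behavior can be traced by hand. MiniRangeDecoder is a hypothetical name, and returning 0 when reading past the end of the buffer is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Range decoder subset over a byte vector (reads past the end return 0).
struct MiniRangeDecoder {
  std::uint32_t Range = 0, Code = 0;
  bool Corrupted = false;
  const std::vector<std::uint8_t> *in = nullptr;
  std::size_t inPos = 0;

  std::uint8_t ReadByte() { return inPos < in->size() ? (*in)[inPos++] : 0; }

  bool Init(const std::vector<std::uint8_t> &data) {
    in = &data;
    inPos = 0;
    Corrupted = false;
    Range = 0xFFFFFFFF;
    Code = 0;
    std::uint8_t b = ReadByte();
    for (int i = 0; i < 4; i++)
      Code = (Code << 8) | ReadByte();
    if (b != 0 || Code == Range)
      Corrupted = true;
    return b == 0;
  }

  void Normalize() {
    if (Range < ((std::uint32_t)1 << 24)) {
      Range <<= 8;
      Code = (Code << 8) | ReadByte();
    }
  }

  std::uint32_t DecodeDirectBits(unsigned numBits) {
    std::uint32_t res = 0;
    do {
      Range >>= 1;
      Code -= Range;
      std::uint32_t t = 0 - (Code >> 31);  // all ones if Code wrapped below 0
      Code += Range & t;                    // undo the subtraction in that case
      if (Code == Range)
        Corrupted = true;
      Normalize();
      res <<= 1;
      res += t + 1;                         // bit is 1 only if no undo was needed
    } while (--numBits);
    return res;
  }
};
```

For the input 00 A0 00 00 00, Init() sets Code = 0xA0000000 (binary 1010 followed by zeros), and decoding 4 direct bits yields 10 (binary 1010), leaving Code = 2. No normalization happens here, because Range stays at or above (1 << 24).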
 439 | 
 440 | 
 441 | The Bit Decoding with Probability Model
 442 | ---------------------------------------
 443 | 
 444 | The task of the Bit Probability Model is to estimate the probabilities of binary
 445 | symbols and to provide the Range Decoder with that information.
 446 | Better prediction gives a better compression ratio.
 447 | The Bit Probability Model uses statistical data of previously decoded
 448 | symbols.
 449 | 
 450 | That estimated probability is stored as an 11-bit unsigned integer value
 451 | that represents the probability of symbol "0".
 452 | 
 453 | #define kNumBitModelTotalBits 11
 454 | 
 455 | The mathematical probabilities can be expressed with the following formulas:
 456 |      probability(symbol_0) = prob / 2048.
 457 |      probability(symbol_1) =  1 - probability(symbol_0) =
 458 |                            =  1 - prob / 2048 =  
 459 |                            =  (2048 - prob) / 2048
 460 | where the "prob" variable contains 11-bit integer probability counter.
 461 | 
 462 | It's recommended to use a 16-bit unsigned integer type to store these 11-bit
 463 | probability values:
 464 | 
 465 | typedef UInt16 CProb;
 466 | 
 467 | Each probability value must be initialized with value ((1 << 11) / 2),
 468 | that represents the state, where probabilities of symbols 0 and 1 
 469 | are equal to 0.5:
 470 | 
 471 | #define PROB_INIT_VAL ((1 << kNumBitModelTotalBits) / 2)
 472 | 
 473 | The INIT_PROBS macro is used to initialize the array of CProb variables:
 474 | 
 475 | #define INIT_PROBS(p) \
 476 |  { for (unsigned i = 0; i < sizeof(p) / sizeof(p[0]); i++) p[i] = PROB_INIT_VAL; }
 477 | 
 478 | 
 479 | The DecodeBit() function decodes one bit.
 480 | The LZMA decoder provides a pointer to the CProb variable that contains
 481 | information about the estimated probability of symbol 0, and the Range Decoder
 482 | updates that CProb variable after decoding. The Range Decoder increases the
 483 | estimated probability of the symbol that was decoded:
 484 | 
 485 | #define kNumMoveBits 5
 486 | 
 487 | unsigned CRangeDecoder::DecodeBit(CProb *prob)
 488 | {
 489 |   unsigned v = *prob;
 490 |   UInt32 bound = (Range >> kNumBitModelTotalBits) * v;
 491 |   unsigned symbol;
 492 |   if (Code < bound)
 493 |   {
 494 |     v += ((1 << kNumBitModelTotalBits) - v) >> kNumMoveBits;
 495 |     Range = bound;
 496 |     symbol = 0;
 497 |   }
 498 |   else
 499 |   {
 500 |     v -= v >> kNumMoveBits;
 501 |     Code -= bound;
 502 |     Range -= bound;
 503 |     symbol = 1;
 504 |   }
 505 |   *prob = (CProb)v;
 506 |   Normalize();
 507 |   return symbol;
 508 | }
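The update rule and the symmetry between encoder and decoder can be checked with a round trip. The text defines only the decoder, so the encoder below is an assumption: it is modelled on the usual LZMA range encoder, which keeps a "Low" value with one extra carry bit and a cache of pending bytes to propagate carries. SketchRangeEncoder and SketchRangeDecoder are hypothetical names; the decoder part follows Init(), Normalize(), DecodeBit() and IsFinishedOK() above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint16_t CProb;
const unsigned kNumBitModelTotalBits = 11, kNumMoveBits = 5;
const std::uint32_t kTopValue = (std::uint32_t)1 << 24;

// Encoder side: an assumption modelled on the usual LZMA range encoder with a
// carry cache. The text above defines only the decoder.
struct SketchRangeEncoder {
  std::uint64_t Low = 0;            // may briefly exceed 32 bits (carry)
  std::uint32_t Range = 0xFFFFFFFF;
  std::uint8_t Cache = 0;           // pending byte; starts as the leading ZERO
  std::uint64_t CacheSize = 1;      // pending bytes: Cache plus a run of 0xFF
  std::vector<std::uint8_t> Out;

  void ShiftLow() {
    if ((std::uint32_t)Low < 0xFF000000 || (Low >> 32) != 0) {
      std::uint8_t carry = (std::uint8_t)(Low >> 32), temp = Cache;
      do {
        Out.push_back((std::uint8_t)(temp + carry));
        temp = 0xFF;
      } while (--CacheSize != 0);
      Cache = (std::uint8_t)((std::uint32_t)Low >> 24);
    }
    CacheSize++;
    Low = (std::uint32_t)((std::uint32_t)Low << 8);
  }

  // Mirror image of DecodeBit(), with the same counter update.
  void EncodeBit(CProb *prob, unsigned symbol) {
    std::uint32_t bound = (Range >> kNumBitModelTotalBits) * *prob;
    if (symbol == 0) {
      Range = bound;
      *prob = (CProb)(*prob + (((1u << kNumBitModelTotalBits) - *prob) >> kNumMoveBits));
    } else {
      Low += bound;
      Range -= bound;
      *prob = (CProb)(*prob - (*prob >> kNumMoveBits));
    }
    if (Range < kTopValue) { Range <<= 8; ShiftLow(); }
  }

  void Flush() { for (int i = 0; i < 5; i++) ShiftLow(); }
};

// Decoder side: Init(), Normalize(), DecodeBit() and IsFinishedOK() from the text.
struct SketchRangeDecoder {
  std::uint32_t Range = 0xFFFFFFFF, Code = 0;
  std::vector<std::uint8_t> In;
  std::size_t Pos = 0;

  std::uint8_t ReadByte() { return Pos < In.size() ? In[Pos++] : 0; }

  bool Init(const std::vector<std::uint8_t> &data) {
    In = data; Pos = 0; Range = 0xFFFFFFFF; Code = 0;
    std::uint8_t b = ReadByte();
    for (int i = 0; i < 4; i++) Code = (Code << 8) | ReadByte();
    return b == 0;
  }

  void Normalize() {
    if (Range < kTopValue) { Range <<= 8; Code = (Code << 8) | ReadByte(); }
  }

  unsigned DecodeBit(CProb *prob) {
    unsigned v = *prob, symbol;
    std::uint32_t bound = (Range >> kNumBitModelTotalBits) * v;
    if (Code < bound) {
      v += ((1u << kNumBitModelTotalBits) - v) >> kNumMoveBits;
      Range = bound;
      symbol = 0;
    } else {
      v -= v >> kNumMoveBits;
      Code -= bound;
      Range -= bound;
      symbol = 1;
    }
    *prob = (CProb)v;
    Normalize();
    return symbol;
  }

  bool IsFinishedOK() const { return Code == 0; }
};
```

Starting from PROB_INIT_VAL (1024), coding a 0 moves the counter to 1024 + (1024 >> 5) = 1056, and a following 1 moves it to 1056 - (1056 >> 5) = 1023. Because encoder and decoder apply the same update, their counters stay in step, and after Flush() the decoder ends with Code == 0, which is what IsFinishedOK() checks.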
 509 | 
 510 | 
 511 | The Binary Tree of bit model counters
 512 | -------------------------------------
 513 | 
 514 | LZMA uses a tree of Bit model variables to decode a symbol that needs
 515 | several bits for storing. There are two versions of such trees in LZMA:
 516 |   1) the tree that decodes bits from high bit to low bit (the normal scheme).
 517 |   2) the tree that decodes bits from low bit to high bit (the reverse scheme).
 518 | 
 519 | Each binary tree structure supports a different size of decoded symbol
 520 | (the size of the binary sequence that contains the value of the symbol).
 521 | If the size of the decoded symbol is "NumBits" bits, the tree structure
 522 | uses an array of (1 << NumBits) counters of CProb type,
 523 | but only ((1 << NumBits) - 1) items are used by the encoder and decoder.
 524 | The first item (the item with index equal to 0) in the array is unused.
 525 | That scheme with an unused array item simplifies the code.
 526 | 
 527 | unsigned BitTreeReverseDecode(CProb *probs, unsigned numBits, CRangeDecoder *rc)
 528 | {
 529 |   unsigned m = 1;
 530 |   unsigned symbol = 0;
 531 |   for (unsigned i = 0; i < numBits; i++)
 532 |   {
 533 |     unsigned bit = rc->DecodeBit(&probs[m]);
 534 |     m <<= 1;
 535 |     m += bit;
 536 |     symbol |= (bit << i);
 537 |   }
 538 |   return symbol;
 539 | }
 540 | 
 541 | template <unsigned NumBits>
 542 | class CBitTreeDecoder
 543 | {
 544 |   CProb Probs[(unsigned)1 << NumBits];
 545 | 
 546 | public:
 547 | 
 548 |   void Init()
 549 |   {
 550 |     INIT_PROBS(Probs);
 551 |   }
 552 | 
 553 |   unsigned Decode(CRangeDecoder *rc)
 554 |   {
 555 |     unsigned m = 1;
 556 |     for (unsigned i = 0; i < NumBits; i++)
 557 |       m = (m << 1) + rc->DecodeBit(&Probs[m]);
 558 |     return m - ((unsigned)1 << NumBits);
 559 |   }
 560 | 
 561 |   unsigned ReverseDecode(CRangeDecoder *rc)
 562 |   {
 563 |     return BitTreeReverseDecode(Probs, NumBits, rc);
 564 |   }
 565 | };
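The two traversal orders can be compared without a Range Decoder by feeding pre-set bits and recording which counter index each bit would use. ScriptedBits, TreeDecode and TreeReverseDecode are hypothetical names; the index arithmetic is copied from CBitTreeDecoder::Decode() and BitTreeReverseDecode().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Range Decoder: returns pre-set bits and
// records the probability index each bit would have used.
struct ScriptedBits {
  std::vector<unsigned> bits;
  std::size_t next = 0;
  std::vector<unsigned> visited;
  unsigned DecodeBit(unsigned probIndex) {
    visited.push_back(probIndex);
    return bits[next++];
  }
};

// Normal scheme: high bit first, as in CBitTreeDecoder::Decode().
unsigned TreeDecode(ScriptedBits &src, unsigned numBits) {
  unsigned m = 1;
  for (unsigned i = 0; i < numBits; i++)
    m = (m << 1) + src.DecodeBit(m);
  return m - (1u << numBits);
}

// Reverse scheme: low bit first, as in BitTreeReverseDecode().
unsigned TreeReverseDecode(ScriptedBits &src, unsigned numBits) {
  unsigned m = 1, symbol = 0;
  for (unsigned i = 0; i < numBits; i++) {
    unsigned bit = src.DecodeBit(m);
    m = (m << 1) + bit;
    symbol |= bit << i;
  }
  return symbol;
}
```

For NumBits = 3 and the symbol 6 (binary 110), the normal scheme reads the bits 1, 1, 0 through counters 1, 3, 7, while the reverse scheme reads 0, 1, 1 through counters 1, 2, 5. All indices stay below (1 << 3) = 8, and index 0 is never touched.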
 566 | 
 567 | 
 568 | LZ part of LZMA 
 569 | ---------------
 570 | 
 571 | LZ part of LZMA describes details about the decoding of MATCHES and LITERALS.
 572 | 
 573 | 
 574 | The Literal Decoding
 575 | --------------------
 576 | 
 577 | The LZMA Decoder uses (1 << (lc + lp)) tables with CProb values, where 
 578 | each table contains 0x300 CProb values:
 579 | 
 580 |   CProb *LitProbs;
 581 | 
 582 |   void CreateLiterals()
 583 |   {
 584 |     LitProbs = new CProb[(UInt32)0x300 << (lc + lp)];
 585 |   }
 586 |   
 587 |   void InitLiterals()
 588 |   {
 589 |     UInt32 num = (UInt32)0x300 << (lc + lp);
 590 |     for (UInt32 i = 0; i < num; i++)
 591 |       LitProbs[i] = PROB_INIT_VAL;
 592 |   }
 593 | 
 594 | To select the table for decoding, it uses a context that consists of
 595 | the (lc) high bits of the previous byte and the (lp) low bits of the value that
 596 | represents the current position in the output stream.
 597 | 
 598 | If (State >= 7), the Literal Decoder also uses "matchByte", which is
 599 | the byte in the output stream at the position that is DISTANCE bytes before the
 600 | current position, where DISTANCE is the distance in the DISTANCE-LENGTH pair
 601 | of the latest decoded match.
 602 | 
 603 | The following code decodes one literal and puts it to Sliding Window buffer:
 604 | 
 605 |   void DecodeLiteral(unsigned state, UInt32 rep0)
 606 |   {
 607 |     unsigned prevByte = 0;
 608 |     if (!OutWindow.IsEmpty())
 609 |       prevByte = OutWindow.GetByte(1);
 610 |     
 611 |     unsigned symbol = 1;
 612 |     unsigned litState = ((OutWindow.TotalPos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc));
 613 |     CProb *probs = &LitProbs[(UInt32)0x300 * litState];
 614 |     
 615 |     if (state >= 7)
 616 |     {
 617 |       unsigned matchByte = OutWindow.GetByte(rep0 + 1);
 618 |       do
 619 |       {
 620 |         unsigned matchBit = (matchByte >> 7) & 1;
 621 |         matchByte <<= 1;
 622 |         unsigned bit = RangeDec.DecodeBit(&probs[((1 + matchBit) << 8) + symbol]);
 623 |         symbol = (symbol << 1) | bit;
 624 |         if (matchBit != bit)
 625 |           break;
 626 |       }
 627 |       while (symbol < 0x100);
 628 |     }
 629 |     while (symbol < 0x100)
 630 |       symbol = (symbol << 1) | RangeDec.DecodeBit(&probs[symbol]);
 631 |     OutWindow.PutByte((Byte)(symbol - 0x100));
 632 |   }
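The table selection can be isolated into a small helper to check a few values by hand. LiteralState and LiteralTableOffset are hypothetical names; the expression is the one used in DecodeLiteral().

```cpp
#include <cassert>
#include <cstdint>

// Literal table index used by DecodeLiteral().
// totalPos: bytes already output; prevByte: last output byte (0 at start).
unsigned LiteralState(std::uint32_t totalPos, unsigned prevByte,
                      unsigned lc, unsigned lp) {
  assert(lc <= 8 && lp <= 4 && prevByte < 256);
  return ((totalPos & ((1u << lp) - 1)) << lc) + (prevByte >> (8 - lc));
}

// Offset of that table inside LitProbs (0x300 counters per table).
std::uint32_t LiteralTableOffset(unsigned litState) {
  return (std::uint32_t)0x300 * litState;
}
```

With lc = 3 and lp = 0, a previous byte 0x61 ('a') selects table 0x61 >> 5 = 3. With lc = 2 and lp = 1, position 5 and previous byte 0xC0 select table (1 << 2) + 3 = 7, which starts at counter 0x1500. Inside each table, the plain loop uses indices 1 to 0xFF, and the matched-literal loop uses indices 0x101 to 0x2FF, picking the 0x100 or 0x200 half by matchBit.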
 633 | 
 634 | 
 635 | The match length decoding
 636 | -------------------------
 637 | 
 638 | The match length decoder returns the normalized (zero-based) length
 639 | of the match. That value can be converted to the real length of the match
 640 | with the following code:
 641 | 
 642 | #define kMatchMinLen 2
 643 | 
 644 |     matchLen = len + kMatchMinLen;
 645 | 
 646 | The match length decoder can return values from 0 to 271,
 647 | and the corresponding real match length values are in the range
 648 | from 2 to 273.
 649 | 
 650 | The following scheme is used for the match length encoding:
 651 | 
 652 |   Binary encoding    Binary Tree structure    Zero-based match length 
 653 |   sequence                                    (binary + decimal):
 654 | 
 655 |   0 xxx              LowCoder[posState]       xxx
 656 |   1 0 yyy            MidCoder[posState]       yyy + 8
 657 |   1 1 zzzzzzzz       HighCoder                zzzzzzzz + 16
 658 | 
 659 | LZMA uses bit model variable "Choice" to decode the first selection bit.
 660 | 
 661 | If the first selection bit is equal to 0, the decoder uses binary tree 
 662 |   LowCoder[posState] to decode 3-bit zero-based match length (xxx).
 663 | 
 664 | If the first selection bit is equal to 1, the decoder uses bit model 
 665 |   variable "Choice2" to decode the second selection bit.
 666 | 
 667 |   If the second selection bit is equal to 0, the decoder uses binary tree 
 668 |     MidCoder[posState] to decode 3-bit "yyy" value, and zero-based match
 669 |     length is equal to (yyy + 8).
 670 | 
 671 |   If the second selection bit is equal to 1, the decoder uses binary tree 
 672 |     HighCoder to decode 8-bit "zzzzzzzz" value, and zero-based 
 673 |     match length is equal to (zzzzzzzz + 16).
 674 | 
 675 | LZMA uses "posState" value as context to select the binary tree 
 676 | from LowCoder and MidCoder binary tree arrays:
 677 | 
 678 |     unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
 679 | 
 680 | The full code of the length decoder, where kNumPosBitsMax is 4 (the maximum value of "pb"):
 681 | 
 682 | class CLenDecoder
 683 | {
 684 |   CProb Choice;
 685 |   CProb Choice2;
 686 |   CBitTreeDecoder<3> LowCoder[1 << kNumPosBitsMax];
 687 |   CBitTreeDecoder<3> MidCoder[1 << kNumPosBitsMax];
 688 |   CBitTreeDecoder<8> HighCoder;
 689 | 
 690 | public:
 691 | 
 692 |   void Init()
 693 |   {
 694 |     Choice = PROB_INIT_VAL;
 695 |     Choice2 = PROB_INIT_VAL;
 696 |     HighCoder.Init();
 697 |     for (unsigned i = 0; i < (1 << kNumPosBitsMax); i++)
 698 |     {
 699 |       LowCoder[i].Init();
 700 |       MidCoder[i].Init();
 701 |     }
 702 |   }
 703 | 
 704 |   unsigned Decode(CRangeDecoder *rc, unsigned posState)
 705 |   {
 706 |     if (rc->DecodeBit(&Choice) == 0)
 707 |       return LowCoder[posState].Decode(rc);
 708 |     if (rc->DecodeBit(&Choice2) == 0)
 709 |       return 8 + MidCoder[posState].Decode(rc);
 710 |     return 16 + HighCoder.Decode(rc);
 711 |   }
 712 | };
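The mapping from the selection bits to the real match length can be written out directly. RealMatchLength is a hypothetical helper that takes the already decoded selection bits and tree value instead of a Range Decoder.

```cpp
#include <cassert>

const unsigned kMatchMinLen = 2;

// Maps the selection bits and the bit-tree value to the real match length.
// choice2 is ignored when choice == 0.
unsigned RealMatchLength(unsigned choice, unsigned choice2, unsigned treeValue) {
  unsigned len;
  if (choice == 0) {
    assert(treeValue < 8);    // LowCoder: 3 bits
    len = treeValue;
  } else if (choice2 == 0) {
    assert(treeValue < 8);    // MidCoder: 3 bits
    len = 8 + treeValue;
  } else {
    assert(treeValue < 256);  // HighCoder: 8 bits
    len = 16 + treeValue;
  }
  return len + kMatchMinLen;
}
```

The three branches cover the real lengths 2 to 9, 10 to 17, and 18 to 273.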
 713 | 
 714 | The LZMA decoder uses two instances of the CLenDecoder class.
 715 | The first instance is for the matches of "Simple Match" type,
 716 | and the second instance is for the matches of "Rep Match" type:
 717 | 
 718 |   CLenDecoder LenDecoder;
 719 |   CLenDecoder RepLenDecoder;
 720 | 
 721 | 
 722 | The match distance decoding
 723 | ---------------------------
 724 | 
 725 | LZMA supports dictionary sizes up to 4 GiB minus 1.
 726 | The value of match distance (decoded by distance decoder) can be 
 727 | from 1 to 2^32. But the distance value that is equal to 2^32 is used to
 728 | indicate the "End of stream" marker. So real largest match distance 
 729 | that is used for LZ-window match is (2^32 - 1).
 730 | 
 731 | LZMA uses the normalized match length (zero-based length)
 732 | to calculate the context state "lenState" used to decode the distance value:
 733 | 
 734 | #define kNumLenToPosStates 4
 735 | 
 736 |     unsigned lenState = len;
 737 |     if (lenState > kNumLenToPosStates - 1)
 738 |       lenState = kNumLenToPosStates - 1;
 739 | 
 740 | The distance decoder returns the "dist" value, which is the zero-based value
 741 | of the match distance. The real match distance can be calculated with the
 742 | following code:
 743 |   
 744 |   matchDistance = dist + 1; 
 745 | 
 746 | The state of the distance decoder and the initialization code: 
 747 | 
 748 |   #define kEndPosModelIndex 14
 749 |   #define kNumFullDistances (1 << (kEndPosModelIndex >> 1))
 750 |   #define kNumAlignBits 4
 751 | 
 752 |   CBitTreeDecoder<6> PosSlotDecoder[kNumLenToPosStates];
 753 |   CProb PosDecoders[1 + kNumFullDistances - kEndPosModelIndex];
 754 |   CBitTreeDecoder<kNumAlignBits> AlignDecoder;
 755 | 
 756 |   void InitDist()
 757 |   {
 758 |     for (unsigned i = 0; i < kNumLenToPosStates; i++)
 759 |       PosSlotDecoder[i].Init();
 760 |     AlignDecoder.Init();
 761 |     INIT_PROBS(PosDecoders);
 762 |   }
 763 | 
 764 | In the first stage, the distance decoder decodes the 6-bit "posSlot" value with a bit
 765 | tree decoder from the PosSlotDecoder array. There are 2^6 = 64 possible
 766 | "posSlot" values.
 767 | 
 768 |     unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
 769 | 
 770 | The encoding scheme for distance value is shown in the following table:
 771 | 
 772 | posSlot (decimal) /
 773 |       zero-based distance (binary)
 774 |  0    0
 775 |  1    1
 776 |  2    10
 777 |  3    11
 778 | 
 779 |  4    10 x
 780 |  5    11 x
 781 |  6    10 xx
 782 |  7    11 xx
 783 |  8    10 xxx
 784 |  9    11 xxx
 785 | 10    10 xxxx
 786 | 11    11 xxxx
 787 | 12    10 xxxxx
 788 | 13    11 xxxxx
 789 | 
 790 | 14    10 yy zzzz
 791 | 15    11 yy zzzz
 792 | 16    10 yyy zzzz
 793 | 17    11 yyy zzzz
 794 | ...
 795 | 62    10 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz
 796 | 63    11 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz
 797 | 
 798 | where
 799 |   "x ... x" means a sequence of binary symbols encoded with a binary tree and the
 800 |       "Reverse" scheme. It uses a separate binary tree for each posSlot from 4 to 13.
 801 |   "y" means a direct bit encoded with the range coder.
 802 |   "zzzz" means a sequence of four binary symbols encoded with a binary
 803 |       tree with the "Reverse" scheme, where one common binary tree "AlignDecoder"
 804 |       is used for all posSlot values.
 805 | 
 806 | If (posSlot < 4), the "dist" value is equal to the posSlot value.
 807 | 
 808 | If (posSlot >= 4), the decoder uses the "posSlot" value to calculate the value of
 809 |   the high bits of the "dist" value and the number of the low bits.
 810 | 
 811 |   If (4 <= posSlot < kEndPosModelIndex), the decoder uses bit tree decoders
 812 |     (one separate bit tree decoder per posSlot value) and the "Reverse" scheme.
 813 |     In this implementation we use one CProb array "PosDecoders" that contains
 814 |     all CProb variables for all these bit decoders.
 815 | 
 816 |   If (posSlot >= kEndPosModelIndex), the middle bits are decoded as direct
 817 |     bits from the RangeDecoder and the low 4 bits are decoded with the bit tree
 818 |     decoder "AlignDecoder" with the "Reverse" scheme.
 819 | 
 820 | The code to decode zero-based match distance:
 821 |   
 822 |   UInt32 DecodeDistance(unsigned len)
 823 |   {
 824 |     unsigned lenState = len;
 825 |     if (lenState > kNumLenToPosStates - 1)
 826 |       lenState = kNumLenToPosStates - 1;
 827 |     
 828 |     unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
 829 |     if (posSlot < 4)
 830 |       return posSlot;
 831 |     
 832 |     unsigned numDirectBits = (unsigned)((posSlot >> 1) - 1);
 833 |     UInt32 dist = ((2 | (posSlot & 1)) << numDirectBits);
 834 |     if (posSlot < kEndPosModelIndex)
 835 |       dist += BitTreeReverseDecode(PosDecoders + dist - posSlot, numDirectBits, &RangeDec);
 836 |     else
 837 |     {
 838 |       dist += RangeDec.DecodeDirectBits(numDirectBits - kNumAlignBits) << kNumAlignBits;
 839 |       dist += AlignDecoder.ReverseDecode(&RangeDec);
 840 |     }
 841 |     return dist;
 842 |   }
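The expression PosDecoders + dist - posSlot packs the ten reverse bit trees for posSlot 4 to 13 into one array. The sketch below computes, for each of those slots, the base distance, the number of tree bits, and the range of PosDecoders indices that the tree reads. SlotLayout and DescribeSlot are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

const unsigned kEndPosModelIndex = 14;
const unsigned kNumFullDistances = 1u << (kEndPosModelIndex >> 1);  // 128

// Layout of one reverse bit tree inside PosDecoders.
struct SlotLayout {
  std::uint32_t base;       // high bits of "dist" for this slot
  unsigned numBits;         // number of reverse-tree bits
  unsigned firstIndex;      // first PosDecoders index the tree reads
  unsigned lastIndex;       // last PosDecoders index the tree reads
};

SlotLayout DescribeSlot(unsigned posSlot) {
  assert(posSlot >= 4 && posSlot < kEndPosModelIndex);
  SlotLayout s;
  s.numBits = (posSlot >> 1) - 1;
  s.base = (2u | (posSlot & 1)) << s.numBits;
  unsigned offset = s.base - posSlot;            // tree base passed to the decoder
  s.firstIndex = offset + 1;                     // index 0 of each tree is unused
  s.lastIndex = offset + (1u << s.numBits) - 1;
  return s;
}
```

Slot 4 uses only index 1 and slot 13 uses indices 84 to 114. The ranges are contiguous, so the highest index is kNumFullDistances - kEndPosModelIndex = 114, and the array needs 1 + 114 = 115 entries, which matches the declaration of PosDecoders.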
 843 | 
 844 | 
 845 | 
 846 | LZMA Decoding modes
 847 | -------------------
 848 | 
 849 | There are 2 types of LZMA streams:
 850 | 
 851 | 1) The stream with "End of stream" marker.
 852 | 2) The stream without "End of stream" marker.
 853 | 
 854 | And the LZMA Decoder supports 3 modes of decoding:
 855 | 
 856 | 1) The unpack size is undefined. The LZMA decoder stops decoding after it
 857 |    decodes the "End of stream" marker.
 858 |    The input variables for that case:
 859 |     
 860 |       markerIsMandatory = true
 861 |       unpackSizeDefined = false
 862 |       unpackSize contains any value
 863 | 
 864 | 2) The unpack size is defined, and the LZMA decoder accepts both variants:
 865 |    the stream may end with the "End of stream" marker, or it may finish
 866 |    without the marker. The LZMA decoder must detect either of these
 867 |    situations.
 868 |    The input variables for that case:
 869 |     
 870 |       markerIsMandatory = false
 871 |       unpackSizeDefined = true
 872 |       unpackSize contains unpack size
 873 | 
 874 | 3) The unpack size is defined, and the LZMA stream must contain the
 875 |    "End of stream" marker.
 876 |    The input variables for that case:
 877 |     
 878 |       markerIsMandatory = true
 879 |       unpackSizeDefined = true
 880 |       unpackSize contains unpack size
 881 | 
 882 | 
 883 | The main loop of decoder
 884 | ------------------------
 885 | 
 886 | The main loop of the LZMA decoder:
 887 | 
 888 | Initialize the LZMA state.
 889 | loop
 890 | {
 891 |   // beginning of loop
 892 |   Check "end of stream" conditions.
 893 |   Decode the type of symbol: MATCH or LITERAL.
 894 |     If it's a LITERAL, decode the LITERAL value and put it into the Window.
 895 |     If it's a MATCH, decode the length of the match and the match distance.
 896 |         Check error conditions, check "end of stream" conditions and copy
 897 |         the sequence of match bytes from the sliding window to the current
 898 |         position in the window.
 899 |   Go to the beginning of the loop.
 900 | }
 901 | 
 902 | The reference implementation of the LZMA decoder uses the "unpackSize" variable
 903 | to keep the number of bytes remaining in the output stream. It decreases
 904 | "unpackSize" after each decoded LITERAL or MATCH.
 905 | 
 906 | The following code performs the "end of stream" condition check at the start
 907 | of the loop:
 908 | 
 909 |     if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory)
 910 |       if (RangeDec.IsFinishedOK())
 911 |         return LZMA_RES_FINISHED_WITHOUT_MARKER;
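The check above can be read as a small decision function over the three decoding modes. The sketch below isolates it; CheckLoopStart and LoopStart are hypothetical names, and the boolean rangeDecoderFinishedOK is an assumption standing in for the result of RangeDec.IsFinishedOK().

```cpp
#include <cassert>
#include <cstdint>

enum class LoopStart { Continue, FinishedWithoutMarker };

// Hypothetical helper mirroring the "end of stream" check at the top of the
// main loop. rangeDecoderFinishedOK stands in for RangeDec.IsFinishedOK().
LoopStart CheckLoopStart(bool unpackSizeDefined, uint64_t unpackSize,
                         bool markerIsMandatory, bool rangeDecoderFinishedOK)
{
  if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory &&
      rangeDecoderFinishedOK)
    return LoopStart::FinishedWithoutMarker;
  // Otherwise keep decoding: an "End of stream" marker may still follow,
  // or the LITERAL / MATCH checks will report that the size was exceeded.
  return LoopStart::Continue;
}
```

In mode 2 (size defined, marker optional) a stream that ends exactly at unpackSize finishes here. In mode 3 the same situation continues into the loop, where the mandatory marker must then be decoded as a Simple Match.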
 912 | 
 913 | LZMA uses three types of matches:
 914 | 
 915 | 1) "Simple Match" -     a match whose distance value is encoded with bit models.
 916 | 
 917 | 2) "Rep Match" -        a match that uses one of the distances from the
 918 |                         distance history table.
 919 | 
 920 | 3) "Short Rep Match" -  a match of length one (a single byte) that uses the
 921 |                         latest distance (rep0) from the distance history table.
 922 | 
 923 | The LZMA decoder keeps a history of the 4 most recent match distances used
 924 | by the decoder. These 4 variables contain zero-based match distances and
 925 | are initialized with zero values:
 926 | 
 927 |   UInt32 rep0 = 0, rep1 = 0, rep2 = 0, rep3 = 0;
 928 | 
 929 | The LZMA decoder uses binary model variables to select the type of symbol (MATCH or LITERAL):
 930 | 
 931 | #define kNumStates 12
 932 | #define kNumPosBitsMax 4
 933 | 
 934 |   CProb IsMatch[kNumStates << kNumPosBitsMax];
 935 |   CProb IsRep[kNumStates];
 936 |   CProb IsRepG0[kNumStates];
 937 |   CProb IsRepG1[kNumStates];
 938 |   CProb IsRepG2[kNumStates];
 939 |   CProb IsRep0Long[kNumStates << kNumPosBitsMax];
 940 | 
 941 | The decoder uses the "state" variable value to select the exact variable
 942 | from the "IsRep", "IsRepG0", "IsRepG1" and "IsRepG2" arrays.
 943 | The "state" variable can take values from 0 to 11.
 944 | The initial value of the "state" variable is zero:
 945 | 
 946 |   unsigned state = 0;
 947 | 
 948 | The "state" variable is updated after each LITERAL or MATCH with one of the
 949 | following functions:
 950 | 
 951 | unsigned UpdateState_Literal(unsigned state)
 952 | {
 953 |   if (state < 4) return 0;
 954 |   else if (state < 10) return state - 3;
 955 |   else return state - 6;
 956 | }
 957 | unsigned UpdateState_Match   (unsigned state) { return state < 7 ? 7 : 10; }
 958 | unsigned UpdateState_Rep     (unsigned state) { return state < 7 ? 8 : 11; }
 959 | unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }
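One way to sanity-check these update rules is to tabulate them. The sketch below copies the four functions from above, builds a 12 x 4 transition table and lets a few entries be compared; the column order (LITERAL, Simple Match, Rep Match, Short Rep Match) and the names nextState and BuildStateTable are choices made here for illustration. The same transitions appear, precomputed, in the statemap table of lzmaSh2a.cpp.

```cpp
#include <cassert>

unsigned UpdateState_Literal(unsigned state)
{
  if (state < 4) return 0;
  else if (state < 10) return state - 3;
  else return state - 6;
}
unsigned UpdateState_Match   (unsigned state) { return state < 7 ? 7 : 10; }
unsigned UpdateState_Rep     (unsigned state) { return state < 7 ? 8 : 11; }
unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }

// nextState[state][kind]; kind: 0 = LITERAL, 1 = Simple Match,
// 2 = Rep Match 0/1/2/3, 3 = Short Rep Match (column order chosen here).
unsigned nextState[12][4];

void BuildStateTable()
{
  for (unsigned s = 0; s < 12; s++) {
    nextState[s][0] = UpdateState_Literal(s);
    nextState[s][1] = UpdateState_Match(s);
    nextState[s][2] = UpdateState_Rep(s);
    nextState[s][3] = UpdateState_ShortRep(s);
  }
}
```

The table shows the structure behind the rules: states 0..6 are reached at the start or right after a LITERAL, and states 7..11 right after any kind of match, which is why every match-type update only tests state < 7.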
 960 | 
 961 | The decoder calculates the "state2" value to select the exact variable from
 962 | the "IsMatch" and "IsRep0Long" arrays:
 963 | 
 964 | unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
 965 | unsigned state2 = (state << kNumPosBitsMax) + posState;
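A short worked example: with pb = 2 and 13 bytes already decoded, posState = 13 & 3 = 1, so in state 7 the decoder reads IsMatch[7 * 16 + 1] = IsMatch[113]. The helper below (State2 is a hypothetical name; totalPos stands in for OutWindow.TotalPos) reproduces that computation.

```cpp
#include <cassert>
#include <cstdint>

const unsigned kNumPosBitsMax = 4;

// Index into IsMatch / IsRep0Long, computed as in the two lines above.
// totalPos = number of bytes decoded so far, pb = 0..4.
unsigned State2(unsigned state, uint64_t totalPos, unsigned pb)
{
  unsigned posState = (unsigned)(totalPos & ((1u << pb) - 1));
  return (state << kNumPosBitsMax) + posState;
}
```

Because the stride is always 1 << kNumPosBitsMax (16) rather than 1 << pb, the arrays keep the same layout for every pb value; with pb < 4 some entries are simply never used.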
 966 | 
 967 | The decoder uses the following code flow scheme to select the exact
 968 | type of LITERAL or MATCH:
 969 | 
 970 | IsMatch[state2] decode
 971 |   0 - the Literal
 972 |   1 - the Match
 973 |     IsRep[state] decode
 974 |       0 - Simple Match
 975 |       1 - Rep Match
 976 |         IsRepG0[state] decode
 977 |           0 - the distance is rep0
 978 |             IsRep0Long[state2] decode
 979 |               0 - Short Rep Match
 980 |               1 - Rep Match 0
 981 |           1 - the distance is rep1, rep2 or rep3
 982 |             IsRepG1[state] decode
 983 |               0 - Rep Match 1
 984 |               1 - the distance is rep2 or rep3
 985 |                 IsRepG2[state] decode
 986 |                   0 - Rep Match 2
 987 |                   1 - Rep Match 3
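The scheme above is a binary decision tree, which the following sketch walks. It assumes a hypothetical callback decodeBit that returns the next bit decoded with the appropriate model variable; FromBits is a test helper that replays a fixed bit sequence instead of a real range decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class SymbolType { Literal, SimpleMatch, ShortRep, Rep0, Rep1, Rep2, Rep3 };

// Walks IsMatch -> IsRep -> IsRepG0 -> IsRep0Long / IsRepG1 -> IsRepG2.
// decodeBit is assumed to decode one bit with the model variable the
// scheme names at that step (IsMatch[state2], IsRep[state], ...).
SymbolType DecodeSymbolType(const std::function<unsigned()>& decodeBit)
{
  if (decodeBit() == 0) return SymbolType::Literal;        // IsMatch[state2]
  if (decodeBit() == 0) return SymbolType::SimpleMatch;    // IsRep[state]
  if (decodeBit() == 0)                                    // IsRepG0[state]
    return decodeBit() == 0 ? SymbolType::ShortRep         // IsRep0Long[state2]
                            : SymbolType::Rep0;
  if (decodeBit() == 0) return SymbolType::Rep1;           // IsRepG1[state]
  return decodeBit() == 0 ? SymbolType::Rep2               // IsRepG2[state]
                          : SymbolType::Rep3;
}

// Test helper: replays a fixed sequence of bits.
SymbolType FromBits(std::vector<unsigned> bits)
{
  std::size_t i = 0;
  return DecodeSymbolType([&]() { return bits.at(i++); });
}
```

A LITERAL costs one decoded bit, a Simple Match two, and Rep Match 2/3 the most (five), which matches their expected frequencies.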
 988 | 
 989 | 
 990 | LITERAL symbol
 991 | --------------
 992 | If IsMatch[state2] decodes the value "0", the symbol is a LITERAL.
 993 | 
 994 | First, the LZMA decoder must check that it doesn't exceed the
 995 | specified uncompressed size:
 996 | 
 997 |       if (unpackSizeDefined && unpackSize == 0)
 998 |         return LZMA_RES_ERROR;
 999 | 
1000 | Then it decodes the literal value and puts it into the sliding window:
1001 | 
1002 |       DecodeLiteral(state, rep0);
1003 | 
1004 | Then the decoder must update the "state" and "unpackSize" values:
1005 | 
1006 |       state = UpdateState_Literal(state);
1007 |       unpackSize--;
1008 | 
1009 | Then the decoder returns to the beginning of the main loop to decode the next MATCH or LITERAL.
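DecodeLiteral is described earlier in this specification. As a reminder of how its probability set is selected, the sketch below mirrors the litState computation of the reference decoder: the low lp bits of the position are combined with the high lc bits of the previous byte, and each literal state owns 0x300 probabilities. lzmaSh2.cpp makes the same selection with c_Literal[filepos&lpMask][psym>>lc8]. LiteralState is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Index of the literal probability set. totalPos = bytes decoded so far,
// prevByte = last decoded byte (0 if the window is empty),
// lc = 0..8, lp = 0..4 (from the properties byte).
unsigned LiteralState(uint64_t totalPos, unsigned prevByte, unsigned lc, unsigned lp)
{
  unsigned posBits = (unsigned)(totalPos & ((1u << lp) - 1));
  return (posBits << lc) + (prevByte >> (8 - lc));
}
```

With the common defaults lc = 3, lp = 0 there are 1 << (lc + lp) = 8 literal states; after the byte 0xE5, for instance, the decoder uses set 0xE5 >> 5 = 7.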
1010 | 
1011 | 
1012 | Simple Match
1013 | ------------
1014 | 
1015 | If IsMatch[state2] decodes the value "1" and IsRep[state] decodes the value "0",
1016 | we have the "Simple Match" type.
1017 | 
1018 | The distance history table is updated with the following scheme:
1019 |     
1020 |       rep3 = rep2;
1021 |       rep2 = rep1;
1022 |       rep1 = rep0;
1023 | 
1024 | The zero-based length is decoded with "LenDecoder":
1025 | 
1026 |       len = LenDecoder.Decode(&RangeDec, posState);
1027 | 
1028 | The state is updated with the UpdateState_Match function:
1029 | 
1030 |       state = UpdateState_Match(state);
1031 | 
1032 | and the new "rep0" value is decoded with DecodeDistance:
1033 | 
1034 |       rep0 = DecodeDistance(len);
1035 | 
1036 | That "rep0" is used as the zero-based distance for the current match.
1037 | 
1038 | If "rep0" is equal to 0xFFFFFFFF, the decoder has reached the
1039 | "End of stream" marker, so it stops decoding and checks the finishing
1040 | condition in the Range Decoder:
1041 | 
1042 |       if (rep0 == 0xFFFFFFFF)
1043 |         return RangeDec.IsFinishedOK() ?
1044 |             LZMA_RES_FINISHED_WITH_MARKER :
1045 |             LZMA_RES_ERROR;
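The marker value is not arbitrary: 0xFFFFFFFF is the largest distance that DecodeDistance can produce, reached with the largest 6-bit posSlot (63), all 26 direct bits set and all 4 align bits set. The sketch below only redoes that arithmetic; MaxDistance is a hypothetical name and nothing is decoded.

```cpp
#include <cassert>
#include <cstdint>

// Recomputes the largest zero-based distance DecodeDistance can return.
uint32_t MaxDistance()
{
  const unsigned kNumAlignBits = 4;
  unsigned posSlot = 63;                                        // largest posSlot
  unsigned numDirectBits = (posSlot >> 1) - 1;                  // 30
  uint32_t dist = (uint32_t)(2 | (posSlot & 1)) << numDirectBits;  // 0xC0000000
  uint32_t direct = (1u << (numDirectBits - kNumAlignBits)) - 1;   // 26 ones
  dist += direct << kNumAlignBits;                              // + 0x3FFFFFF0
  dist += (1u << kNumAlignBits) - 1;                            // + 0xF
  return dist;
}
```

Since no dictionary can be that large, the value can never be a real distance, so reserving it as the marker costs nothing.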
1046 | 
1047 | If the uncompressed size is defined, the LZMA decoder must check that it doesn't
1048 | exceed the specified uncompressed size:
1049 | 
1050 |       if (unpackSizeDefined && unpackSize == 0)
1051 |         return LZMA_RES_ERROR;
1052 | 
1053 | The decoder must also check that the "rep0" value is smaller than the dictionary
1054 | size and is not larger than the number of already decoded bytes:
1055 | 
1056 |       if (rep0 >= dictSize || !OutWindow.CheckDistance(rep0))
1057 |         return LZMA_RES_ERROR;
1058 | 
1059 | Then the decoder must copy the match bytes as described in the section
1060 | "The match symbols copying".
1061 | 
1062 | 
1063 | Rep Match
1064 | ---------
1065 | 
1066 | If the LZMA decoder has decoded the value "1" with the IsRep[state] variable,
1067 | we have the "Rep Match" type.
1068 | 
1069 | First, the LZMA decoder must check that it doesn't exceed the
1070 | specified uncompressed size:
1071 | 
1072 |       if (unpackSizeDefined && unpackSize == 0)
1073 |         return LZMA_RES_ERROR;
1074 | 
1075 | The decoder must also return an error if the LZ window is empty:
1076 | 
1077 |       if (OutWindow.IsEmpty())
1078 |         return LZMA_RES_ERROR;
1079 | 
1080 | If the match type is "Rep Match", the decoder uses one of the 4 variables of
1081 | the distance history table as the distance for the current match,
1082 | so there are 4 corresponding decoding paths.
1083 | 
1084 | The decoder updates the distance history with the following scheme,
1085 | depending on the type of match:
1086 | 
1087 | - "Rep Match 0" or "Short Rep Match":
1088 |       ; LZMA doesn't update the distance history    
1089 | 
1090 | - "Rep Match 1":
1091 |       UInt32 dist = rep1;
1092 |       rep1 = rep0;
1093 |       rep0 = dist;
1094 | 
1095 | - "Rep Match 2":
1096 |       UInt32 dist = rep2;
1097 |       rep2 = rep1;
1098 |       rep1 = rep0;
1099 |       rep0 = dist;
1100 | 
1101 | - "Rep Match 3":
1102 |       UInt32 dist = rep3;
1103 |       rep3 = rep2;
1104 |       rep2 = rep1;
1105 |       rep1 = rep0;
1106 |       rep0 = dist;
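The four update rules share one pattern: the selected distance moves to the front and the distances before it shift back by one slot, while a Simple Match pushes a new distance and drops rep3. The sketch below expresses both with a hypothetical RepHistory struct (not part of the reference code).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the distance history rep0..rep3 (zero-based).
struct RepHistory {
  uint32_t rep[4] = {0, 0, 0, 0};   // initialized with zero values

  // Simple Match: shift the history and store the newly decoded distance.
  void PushNew(uint32_t dist)
  {
    rep[3] = rep[2]; rep[2] = rep[1]; rep[1] = rep[0];
    rep[0] = dist;
  }

  // Rep Match i (i = 0..3): move rep[i] to the front, keeping the order of
  // the others. i == 0 (also Short Rep Match) leaves the history unchanged.
  void UseRep(unsigned i)
  {
    uint32_t dist = rep[i];
    for (unsigned k = i; k > 0; k--)
      rep[k] = rep[k - 1];
    rep[0] = dist;
  }
};
```

UseRep(2), for example, performs exactly the "Rep Match 2" steps listed above: dist = rep2; rep2 = rep1; rep1 = rep0; rep0 = dist.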
1107 | 
1108 | The exact subtype of "Rep Match", which selects one of the updates above, is
1109 | decoded with "IsRepG0", "IsRep0Long", "IsRepG1" and "IsRepG2".
1110 | 
1111 | If the subtype is "Short Rep Match", the decoder updates the state, copies
1112 | the byte at distance rep0 + 1 to the current position in the window and goes
1113 | on to the next MATCH/LITERAL symbol (the beginning of the main loop):
1114 | 
1115 |           state = UpdateState_ShortRep(state);
1116 |           OutWindow.PutByte(OutWindow.GetByte(rep0 + 1));
1117 |           unpackSize--;
1118 |           continue;
1119 | 
1120 | In the other cases (Rep Match 0/1/2/3), it decodes the zero-based
1121 | length of the match with the "RepLenDecoder" decoder:
1122 | 
1123 |       len = RepLenDecoder.Decode(&RangeDec, posState);
1124 | 
1125 | Then it updates the state:
1126 | 
1127 |       state = UpdateState_Rep(state);
1128 | 
1129 | Then the decoder must copy the match bytes as described in the section
1130 | "The match symbols copying".
1131 | 
1132 | 
1133 | The match symbols copying
1134 | -------------------------
1135 | 
1136 | For a match (Simple Match or Rep Match 0/1/2/3), the decoder must copy the
1137 | sequence of bytes with the calculated match distance and match length.
1138 | If the uncompressed size is defined, the decoder must not exceed it: a match
1139 | longer than the remaining size is truncated, copied, and reported as an error:
1140 | 
1141 |     len += kMatchMinLen;
1142 |     bool isError = false;
1143 |     if (unpackSizeDefined && unpackSize < len)
1144 |     {
1145 |       len = (unsigned)unpackSize;
1146 |       isError = true;
1147 |     }
1148 |     OutWindow.CopyMatch(rep0 + 1, len);
1149 |     unpackSize -= len;
1150 |     if (isError)
1151 |       return LZMA_RES_ERROR;
1152 | 
1153 | Then the decoder returns to the beginning of the main loop to decode the next MATCH or LITERAL.
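OutWindow.CopyMatch is defined earlier in this specification. The sketch below is an independent, simplified stand-in (a hypothetical Window class, not the reference code) showing why the copy goes byte by byte: when the distance is smaller than the length, the source overlaps the bytes being written, and the copy naturally repeats the last dist bytes. The distance is one-based here, like the rep0 + 1 passed in the call above; the unpackSize clamping is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified circular LZ window that also records all output for inspection.
class Window {
 public:
  explicit Window(std::size_t size) : buf_(size), pos_(0) {}

  void PutByte(uint8_t b)
  {
    out_.push_back(static_cast<char>(b));
    buf_[pos_] = b;
    if (++pos_ == buf_.size()) pos_ = 0;   // wrap around
  }

  // dist is one-based: 1 means the most recently written byte.
  // Assumes 1 <= dist <= bytes written and dist <= buffer size.
  uint8_t GetByte(std::size_t dist) const
  {
    return buf_[dist <= pos_ ? pos_ - dist : buf_.size() - dist + pos_];
  }

  // Byte-by-byte copy: with dist < len the source overlaps the destination.
  void CopyMatch(std::size_t dist, unsigned len)
  {
    for (; len > 0; len--)
      PutByte(GetByte(dist));
  }

  const std::string& Output() const { return out_; }

 private:
  std::vector<uint8_t> buf_;
  std::size_t pos_;
  std::string out_;
};
```

For example, after writing "ab", CopyMatch(2, 5) appends "ababa"; a block copy such as memcpy would read bytes that have not been written yet.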
1154 | 
1155 | 
1156 | 
1157 | NOTES
1158 | -----
1159 | 
1160 | This specification doesn't describe a decoder implementation variant
1161 | that supports partial decoding. Partial decoding can require some
1162 | changes in the "end of stream" condition checks. Such code can also
1163 | use additional status codes returned by the decoder.
1164 | 
1165 | This specification uses C++ code with templates to simplify the description.
1166 | An optimized version of the LZMA decoder doesn't need templates.
1167 | Such an optimized version can use just two arrays of CProb variables:
1168 |   1) The dynamic array of CProb variables allocated for the Literal Decoder.
1169 |   2) One common array that contains all other CProb variables.
1170 | 
1171 | 
1172 | References:      
1173 | 
1174 | 1. G. N. N. Martin, Range encoding: an algorithm for removing redundancy 
1175 |    from a digitized message, Video & Data Recording Conference, 
1176 |    Southampton, UK, July 24-27, 1979.
1177 | 


--------------------------------------------------------------------------------
/lzmaSh2.cpp:
--------------------------------------------------------------------------------
  1 | #include <stdio.h>
  2 | 
  3 | typedef unsigned int  uint;
  4 | typedef unsigned char byte;
  5 | typedef unsigned long long qword;
  6 | 
  7 | struct lzma_decode {
  8 | 
  9 |   enum { SCALElog=11, SCALE=1<<SCALElog, hSCALE=SCALE/2 };
 10 | 
 11 |   struct Counter {
 12 |     short P;
 13 |     Counter() { P = hSCALE; }
 14 |     void Update( uint bit ) { P += bit ? -(P>>5) : ((SCALE-P)>>5); }
 15 |   };
 16 | 
 17 |   FILE *f, *g;
 18 |   byte get( void ) { return getc(f); }
 19 |   void put( uint c ) { putc(c,g); }
 20 | 
 21 |   uint range, code;
 22 | 
 23 |   void rc_Init( void ) {
 24 |     get(); code = 0; for( int k=0; k<4; k++ ) code = (code<<8) | get(); // 1st byte is always 0; reads must be sequenced
 25 |     range = 0xFFFFFFFF;
 26 |   }
 27 | 
 28 |   uint rc_Bits( uint l ) {
 29 |     uint x=0; do {
 30 |       if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
 31 |       range &= ~1;
 32 |       uint rnew = (range>>1) * 1;
 33 |       uint bit = code >= rnew;
 34 |       range = bit ? code-=rnew,range-rnew : rnew;
 35 |       x += x + bit; 
 36 |     } while( --l!=0 );
 37 |     return x;
 38 |   }
 39 | 
 40 |   uint rc_Decode( uint P ) {
 41 |     if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
 42 |     uint rnew = (range >> SCALElog) * P; 
 43 |     uint bit = code >= rnew;
 44 |     range = bit ? code-=rnew,range-rnew : rnew;
 45 |     return bit;
 46 |   }
 47 | 
 48 |   uint BIT( Counter& cc ) { 
 49 |     uint bit = rc_Decode(cc.P);
 50 |     cc.Update( bit );
 51 |     return bit; 
 52 |   }
 53 | 
 54 |   enum {
 55 |     kNumLPosBitsMax = 4,
 56 |     kNumPosBitsMax  = 4, kNumPosStatesMax = (1<<kNumPosBitsMax),
 57 |     kLenNumLowBits  = 3, kLenNumLowSymbols = (1<<kLenNumLowBits),
 58 |     kLenNumMidBits  = 3, kLenNumMidSymbols = (1<<kLenNumMidBits),
 59 |     kLenNumHighBits = 8, kLenNumHighSymbols = (1<<kLenNumHighBits),
 60 |     kStartPosModelIndex = 4, kEndPosModelIndex = 14, 
 61 |     kNumFullDistances = (1<<(kEndPosModelIndex>>1)),
 62 |     kNumPosSlotBits = 6, kNumLenToPosStates = 4,
 63 |     kNumAlignBits   = 4, kAlignTableSize = (1<<kNumAlignBits),
 64 |     kMatchMinLen    = 2, kNumLitStates = 7, kNumStates = 12
 65 |   };
 66 | 
 67 |   byte* dic;
 68 |   uint lc, lp, pb, dictSize;
 69 |   qword f_len;
 70 |   byte rbit5[32];
 71 | 
 72 |   Counter c_IsMatch[kNumStates][1<<kNumPosBitsMax];
 73 |   Counter c_IsRep[kNumStates];
 74 |   Counter c_IsRepG0[kNumStates];
 75 |   Counter c_IsRepG1[kNumStates];
 76 |   Counter c_IsRepG2[kNumStates];
 77 |   Counter c_IsRep0Long[kNumStates][1<<kNumPosBitsMax];
 78 |   Counter c_LenChoice[2];
 79 |   Counter c_LenChoice2[2];
 80 |   Counter c_Literal[1<<kNumLPosBitsMax][256][3][256];
 81 |   Counter c_LenLow[2][kNumPosStatesMax][1<<kLenNumLowBits];
 82 |   Counter c_LenMid[2][kNumPosStatesMax][1<<kLenNumMidBits];
 83 |   Counter c_LenHigh[2][1<<kLenNumHighBits];
 84 |   Counter c_PosSlot[kNumLenToPosStates][1<<kNumPosSlotBits];
 85 |   Counter c_SpecPos[kNumFullDistances-kEndPosModelIndex];
 86 |   Counter c_Align[1<<kNumAlignBits];
 87 | 
 88 |   lzma_decode( FILE* _f, FILE* _g ) {
 89 | 
 90 |     f=_f; g=_g;
 91 |     byte d = get(); // lc/pb/lp byte first
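 92 |     dictSize = 0; for( int k=0; k<4; k++ ) dictSize |= uint(get()) << (8*k); // little-endian, reads in order
 93 |     if( dictSize<4096 ) dictSize=4096; dic = new byte[dictSize]; // spec: minimum dictionary size is 1<<12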
 94 |     f_len = 0;
 95 |     uint i; for( i=0; i<8; i++ ) f_len = (f_len>>8) | (qword(get())<<56);
 96 |     rc_Init();
 97 |     lc = d % 9; d /= 9;
 98 |     pb = d / 5; lp = d % 5;
 99 | 
100 |     for( i=0; i<32; i++ ) rbit5[i] = ((i*0x0802&0x22110)|(i*0x8020&0x88440))*0x10101 >> (16+3); // rbit5[i] = i with its 5 bits reversed
101 | 
102 |     uint state=0,rep0=1,rep1=1,rep2=1,rep3=1; // reps stored 1-based (spec's zero-based rep + 1)
103 |     uint dicPos = 0, dicBufSize = dictSize;
104 |     uint pbMask = (1<<pb)-1, lpMask = (1<<lp)-1, lc8 = 8-lc;
105 |     uint sym=0, dist, pos, len;
106 | 
107 |     for( qword filepos=0; filepos<f_len; ) {
108 | 
109 |       uint posState = filepos & pbMask;
110 |       uint psym = byte(sym);
111 |       #define rep0pos() (dicPos-rep0) + ((dicPos<rep0) ? dicBufSize : 0)
112 |       #define symstore(sym) { dic[dicPos]=sym; if( ++dicPos==dicBufSize ) dicPos=0; put(sym); filepos++; }
113 | 
114 |       if( BIT(c_IsMatch[state][posState])==0 ) { // decode a literal?
115 | 
116 |         Counter (&cc)[3][256] = c_Literal[filepos&lpMask][psym>>lc8];
117 |         if( state>=kNumLitStates ) {
118 |           uint matchbyte = 0x100 + dic[rep0pos()];
119 |           for( sym=1; sym<0x100; ) {
120 |             uint mbprefix = (matchbyte<<=1) >> 8;
121 |             sym += sym + BIT(cc[1+(mbprefix&1)][sym]);
122 |             if( mbprefix!=sym ) break;
123 |           }
124 |         } else sym=1;
125 |         for(; sym<0x100; sym+=sym+BIT(cc[0][sym]) );
126 | 
127 |         symstore(sym);
128 |         state = (state<4) ? 0 : (state<10) ? state-3 : state-6;
129 | 
130 |       } else {
131 | 
132 |         uint f_rep = BIT(c_IsRep[state]);
133 | 
134 |         if( f_rep==0 ) state += kNumStates; else { // simple match: state>=kNumStates flags "distance follows"
135 | 
136 |           if( BIT(c_IsRepG0[state])==0 ) {
137 | 
138 |             if( BIT(c_IsRep0Long[state][posState])==0 ) {
139 |               sym = dic[rep0pos()]; symstore(sym);
140 |               state = state < kNumLitStates ? 9 : 11;
141 |               continue;
142 |             }
143 | 
144 |           } else {
145 | 
146 |             dist = rep1;
147 |             if( BIT(c_IsRepG1[state]) ) {
148 |               dist = rep2;
149 |               if( BIT(c_IsRepG2[state]) ) dist = rep3, rep3 = rep2;
150 |               rep2 = rep1;
151 |             }
152 |             rep1 = rep0; rep0 = dist;
153 |           }
154 | 
155 |           state = state < kNumLitStates ? 8 : 11;
156 |         }
157 | 
158 |         uint limit, offset;
159 |         Counter* clen = 0;
160 |         if( BIT(c_LenChoice[f_rep])==0 ) {
161 |           clen = &c_LenLow[f_rep][posState][0];
162 |           offset = 0; limit = (1 << kLenNumLowBits);
163 |         } else {
164 |           if( BIT(c_LenChoice2[f_rep])==0 ) {
165 |             clen = &c_LenMid[f_rep][posState][0];
166 |             offset = kLenNumLowSymbols; limit = (1<<kLenNumMidBits);
167 |           } else {
168 |             clen = &c_LenHigh[f_rep][0];
169 |             offset = kLenNumLowSymbols + kLenNumMidSymbols; limit = (1<<kLenNumHighBits);
170 |           }
171 |         }
172 |         for( len=1; len<limit; len+=len + BIT(clen[len]) );
173 |         len -= limit; len += offset;
174 | 
175 |         if( state>=kNumStates ) {
176 |           Counter (&cpos)[1<<kNumPosSlotBits] = c_PosSlot[len<kNumLenToPosStates?len:kNumLenToPosStates-1];
177 |           for( dist=1; dist<64; dist+=dist + BIT(cpos[dist]) ); dist -= 64;
178 | 
179 |           if( dist>=kStartPosModelIndex ) {
180 |             uint posSlot = dist;
181 |             int numDirectBits = (dist>>1) - 1; // 13/2-1=5 max
182 |             dist = (2 | (dist & 1));
183 | 
184 |             if( posSlot<kEndPosModelIndex ) {
185 | 
186 |               dist <<= numDirectBits;
187 |               uint limit = 1<<numDirectBits;
188 |               for( i=1; i<limit; i+=i + BIT(c_SpecPos[dist-posSlot-1+i]) );
189 |               dist += rbit5[ (i-limit)<<(5-numDirectBits) ];
190 | 
191 |             } else {
192 | 
193 |               numDirectBits -= kNumAlignBits;
194 |               (dist<<=numDirectBits) += rc_Bits(numDirectBits);
195 |               for( i=1; i<16; i+=i + BIT(c_Align[i]) );
196 |               (dist <<= kNumAlignBits) += rbit5[(i-16)<<1];
197 |               if( dist==0xFFFFFFFFU ) break; // EOF?
198 |             }
199 |           }
200 |           rep3=rep2; rep2=rep1; rep1=rep0; rep0=dist+1;
201 | 
202 |           state = (state < kNumStates + kNumLitStates) ? kNumLitStates : kNumLitStates + 3;
203 |         } // dist
204 | 
205 |         for( pos=rep0pos(),len+=kMatchMinLen; len>0; len-- ) {
206 |           sym = dic[pos]; symstore(sym);
207 |           if( ++pos == dicBufSize ) pos=0;
208 |         }
209 | 
210 |       } // match
211 | 
212 |     } // for
213 | 
214 |   }
215 | 
216 | };
217 | 
218 | int main( int argc, char** argv ) {
219 |   if( argc<3 ) return 1;
220 |   FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
221 |   FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
222 | 
223 |   static lzma_decode D( f, g );
224 | 
225 |   fclose( f );
226 |   fclose( g );
227 |   return 0;
228 | }
229 | 


--------------------------------------------------------------------------------
/lzmaSh2.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/lzmaSh2.exe


--------------------------------------------------------------------------------
/lzmaSh2a.cpp:
--------------------------------------------------------------------------------
  1 | #include <stdio.h>
  2 | 
  3 | typedef unsigned int  uint;
  4 | typedef unsigned char byte;
  5 | typedef unsigned long long qword;
  6 | 
  7 | const byte statemap[][7] = { // [state][id]: columns id_match, id_lit, id_r0, id_litr0, id_r1, id_r2, id_r3
  8 |   {  7,  0,  8,  9,  8,  8,  8, },
  9 |   {  7,  0,  8,  9,  8,  8,  8, },
 10 |   {  7,  0,  8,  9,  8,  8,  8, },
 11 |   {  7,  0,  8,  9,  8,  8,  8, },
 12 |   {  7,  1,  8,  9,  8,  8,  8, },
 13 |   {  7,  2,  8,  9,  8,  8,  8, },
 14 |   {  7,  3,  8,  9,  8,  8,  8, },
 15 |   { 10,  4, 11, 11, 11, 11, 11, },
 16 |   { 10,  5, 11, 11, 11, 11, 11, },
 17 |   { 10,  6, 11, 11, 11, 11, 11, },
 18 |   { 10,  4, 11, 11, 11, 11, 11, },
 19 |   { 10,  5, 11, 11, 11, 11, 11, },
 20 | };
 21 | 
 22 | struct lzma_decode {
 23 | 
 24 |   enum { SCALElog=11, SCALE=1<<SCALElog, hSCALE=SCALE/2 };
 25 | 
 26 |   struct Counter {
 27 |     short P;
 28 |     Counter() { P = hSCALE; }
 29 |     void Update( uint bit ) { P += bit ? -(P>>5) : ((SCALE-P)>>5); }
 30 |   };
 31 | 
 32 |   FILE *f, *g;
 33 |   byte get( void ) { return getc(f); }
 34 |   void put( uint c ) { putc(c,g); }
 35 | 
 36 |   uint range, code;
 37 | 
 38 |   void rc_Init( void ) {
 39 |     get(); code = 0; for( int k=0; k<4; k++ ) code = (code<<8) | get(); // 1st byte is always 0; reads must be sequenced
 40 |     range = 0xFFFFFFFF;
 41 |   }
 42 | 
 43 |   uint rc_Bits( uint l ) {
 44 |     uint x=0; do {
 45 |       if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
 46 |       range &= ~1;
 47 |       uint rnew = (range>>1) * 1;
 48 |       uint bit = code >= rnew;
 49 |       range = bit ? code-=rnew,range-rnew : rnew;
 50 |       x += x + bit; 
 51 |     } while( --l!=0 );
 52 |     return x;
 53 |   }
 54 | 
 55 |   uint rc_Decode( uint P ) {
 56 |     if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
 57 |     uint rnew = (range >> SCALElog) * P; 
 58 |     uint bit = code >= rnew;
 59 |     range = bit ? code-=rnew,range-rnew : rnew;
 60 |     return bit;
 61 |   }
 62 | 
 63 |   uint BIT( Counter& cc ) { 
 64 |     uint bit = rc_Decode(cc.P);
 65 |     cc.Update( bit );
 66 |     return bit; 
 67 |   }
 68 | 
 69 |   enum {
 70 |     kNumLPosBitsMax = 4,
 71 |     kNumPosBitsMax  = 4, kNumPosStatesMax = (1<<kNumPosBitsMax),
 72 |     kLenNumLowBits  = 3, kLenNumLowSymbols = (1<<kLenNumLowBits),
 73 |     kLenNumMidBits  = 3, kLenNumMidSymbols = (1<<kLenNumMidBits),
 74 |     kLenNumHighBits = 8, kLenNumHighSymbols = (1<<kLenNumHighBits),
 75 |     kStartPosModelIndex = 4, kEndPosModelIndex = 14, 
 76 |     kNumFullDistances = (1<<(kEndPosModelIndex>>1)),
 77 |     kNumPosSlotBits = 6, kNumLenToPosStates = 4,
 78 |     kNumAlignBits   = 4, kAlignTableSize = (1<<kNumAlignBits),
 79 |     kMatchMinLen    = 2, kNumLitStates = 7, kNumStates = 12,
 80 |     id_match=0, id_lit,id_r0,id_litr0, id_r1,id_r2,id_r3
 81 |   };
 82 | 
 83 |   byte* dic;
 84 |   uint lc, lp, pb, dictSize;
 85 |   qword f_len;
 86 |   byte rbit5[32];
 87 | 
 88 |   Counter c_IsMatch[kNumStates][1<<kNumPosBitsMax];
 89 |   Counter c_IsRep[kNumStates];
 90 |   Counter c_IsRepG0[kNumStates];
 91 |   Counter c_IsRepG1[kNumStates];
 92 |   Counter c_IsRepG2[kNumStates];
 93 |   Counter c_IsRep0Long[kNumStates][1<<kNumPosBitsMax];
 94 |   Counter c_LenChoice[2];
 95 |   Counter c_LenChoice2[2];
 96 |   Counter c_Literal[1<<kNumLPosBitsMax][256][3][256];
 97 |   Counter c_LenLow[2][kNumPosStatesMax][1<<kLenNumLowBits];
 98 |   Counter c_LenMid[2][kNumPosStatesMax][1<<kLenNumMidBits];
 99 |   Counter c_LenHigh[2][1<<kLenNumHighBits];
100 |   Counter c_PosSlot[kNumLenToPosStates][1<<kNumPosSlotBits];
101 |   Counter c_SpecPos[kNumFullDistances-kEndPosModelIndex];
102 |   Counter c_Align[1<<kNumAlignBits];
103 | 
104 |   lzma_decode( FILE* _f, FILE* _g ) {
105 | 
106 |     f=_f; g=_g;
107 |     byte d = get(); // lc/pb/lp byte first
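108 |     dictSize = 0; for( int k=0; k<4; k++ ) dictSize |= uint(get()) << (8*k); // little-endian, reads in order
109 |     if( dictSize<4096 ) dictSize=4096; dic = new byte[dictSize]; // spec: minimum dictionary size is 1<<12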
110 |     f_len = 0;
111 |     uint i; for( i=0; i<8; i++ ) f_len = (f_len>>8) | (qword(get())<<56);
112 |     rc_Init();
113 |     lc = d % 9; d /= 9;
114 |     pb = d / 5; lp = d % 5;
115 | 
116 |     for( i=0; i<32; i++ ) rbit5[i] = ((i*0x0802&0x22110)|(i*0x8020&0x88440))*0x10101 >> (16+3); // rbit5[i] = i with its 5 bits reversed
117 | 
118 |     uint state=0,rep0=1,rep1=1,rep2=1,rep3=1; // reps stored 1-based (spec's zero-based rep + 1)
119 |     uint dicPos = 0, dicBufSize = dictSize;
120 |     uint pbMask = (1<<pb)-1, lpMask = (1<<lp)-1, lc8 = 8-lc;
121 |     uint id, val, sym=0, i_len, dist, pos, len, cps;
122 |     Counter* clen = 0;
123 | 
124 |     for( qword filepos=0; filepos<f_len; state=statemap[state][id] ) {
125 | 
126 |       uint posState = filepos & pbMask;
127 |       uint psym = byte(sym);
128 |       #define rep0pos() (dicPos-rep0) + ((dicPos<rep0) ? dicBufSize : 0)
129 |       #define symstore(sym) { dic[dicPos]=sym; if( ++dicPos==dicBufSize ) dicPos=0; put(sym); filepos++; }
130 | 
131 |       if( BIT(c_IsMatch[state][posState])==0 ) id=id_lit; else
132 |         if( BIT(c_IsRep[state])==0 ) id=id_match; else
133 |           if( BIT(c_IsRepG0[state])==0 )
134 |             if( BIT(c_IsRep0Long[state][posState])==0 ) id=id_litr0; else id=id_r0;
135 |           else
136 |             if( BIT(c_IsRepG1[state])==0 ) id=id_r1; else
137 |               if( BIT(c_IsRepG2[state]) ) id=id_r3; else id=id_r2;
138 | 
139 |       if( (id==id_lit) || (id==id_litr0) ) { // decode a literal?
140 | 
141 |         if( id==id_litr0 ) sym = dic[rep0pos()]; else {
142 |           Counter (&cc)[3][256] = c_Literal[filepos&lpMask][psym>>lc8];
143 | 
144 |           if( state>=kNumLitStates ) {
145 |             uint matchbyte = 0x100 + dic[rep0pos()];
146 |             for( sym=1; sym<0x100; ) {
147 |               uint mbprefix = (matchbyte<<=1) >> 8;
148 |               sym += sym + BIT(cc[1+(mbprefix&1)][sym]);
149 |               if( mbprefix!=sym ) break;
150 |             }
151 |           } else sym=1;
152 |           for(; sym<0x100; sym+=sym+BIT(cc[0][sym]) );
153 |         }
154 | 
155 |         symstore(sym);
156 | 
157 |       } else {
158 | 
159 |         uint f_rep = (id!=id_match);
160 | 
161 |         if( f_rep ) {
162 |           if( id!=id_r0 ) {
163 |             dist = rep1;
164 |             if( id!=id_r1 ) {
165 |               dist = rep2;
166 |               if( id==id_r3 ) dist = rep3, rep3 = rep2;
167 |               rep2 = rep1;
168 |             }
169 |             rep1 = rep0; rep0 = dist;
170 |           }
171 |         }
172 | 
173 |         if( BIT(c_LenChoice[f_rep])==0 ) i_len=0; else
174 |           if( BIT(c_LenChoice2[f_rep])==0 ) i_len=1; else i_len=2;
175 | 
176 |         uint limit, offset;
177 |         if( i_len==0 ) {
178 |           clen = &c_LenLow[f_rep][posState][0];
179 |           offset = 0; limit = (1 << kLenNumLowBits);
180 |         } else {
181 |           if( i_len==1 ) {
182 |             clen = &c_LenMid[f_rep][posState][0];
183 |             offset = kLenNumLowSymbols; limit = (1<<kLenNumMidBits);
184 |           } else {
185 |             clen = &c_LenHigh[f_rep][0];
186 |             offset = kLenNumLowSymbols + kLenNumMidSymbols; limit = (1<<kLenNumHighBits);
187 |           }
188 |         }
189 | 
190 |         for( len=1; len<limit; len+=len + BIT(clen[len]) );
191 |         len -= limit; 
192 |         len += offset;
193 | 
194 |         if( f_rep==0 ) {
195 | 
196 |           Counter (&cpos)[1<<kNumPosSlotBits] = c_PosSlot[len<kNumLenToPosStates?len:kNumLenToPosStates-1];
197 | 
198 |           for( dist=1; dist<64; dist+=dist + BIT(cpos[dist]) ); dist -= 64;
199 | 
200 |           if( dist>=kStartPosModelIndex ) {
201 |             uint posSlot = dist;
202 |             int numDirectBits = (dist>>1) - 1; // 13/2-1=5 max
203 |             dist = (2 | (dist & 1));
204 | 
205 |             if( posSlot<kEndPosModelIndex ) {
206 | 
207 |               dist <<= numDirectBits;
208 |               uint limit = 1<<numDirectBits;
209 |               for( i=1; i<limit; i+=i + BIT(c_SpecPos[dist-posSlot-1+i]) );
210 |               dist += rbit5[ (i-limit)<<(5-numDirectBits) ];
211 | 
212 |             } else {
213 | 
214 |               numDirectBits -= kNumAlignBits;
215 |               (dist<<=numDirectBits) += rc_Bits(numDirectBits);
216 |               for( i=1; i<16; i+=i + BIT(c_Align[i]) );
217 |               (dist <<= kNumAlignBits) += rbit5[(i-16)<<1];
218 |               if( dist==0xFFFFFFFFU ) break; // EOF?
219 |             }
220 |           }
221 |           rep3=rep2; rep2=rep1; rep1=rep0; rep0=dist+1;
222 |         } // dist
223 | 
224 |         for( pos=rep0pos(),len+=kMatchMinLen; len>0; len-- ) {
225 |           sym = dic[pos]; symstore(sym);
226 |           if( ++pos == dicBufSize ) pos=0;
227 |         }
228 | 
229 |       } // match
230 | 
231 |     } // for
232 | 
233 |   }
234 | 
235 | };
236 | 
237 | int main( int argc, char** argv ) {
238 |   if( argc<3 ) return 1;
239 |   FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
240 |   FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
241 | 
242 |   static lzma_decode D( f, g );
243 | 
244 |   fclose( f );
245 |   fclose( g );
246 |   return 0;
247 | }
248 | 


--------------------------------------------------------------------------------
/lzmaSh2a.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/lzmaSh2a.exe


--------------------------------------------------------------------------------
/lzmaspec-readme.txt:
--------------------------------------------------------------------------------
 1 | LZMA Specification
 2 | ------------------
 3 | 
 4 | This package contains:
 5 | 
 6 |   - LZMA Specification
 7 |   - LZMA Reference Decoder in C++
 8 |   - The folder with examples of lzma archives
 9 | 
10 | Note that the LZMA Reference Decoder is not optimized for speed.
11 | If you need code optimized for speed, use the LZMA Decoder from the LZMA SDK.
12 | 
13 | If you find a bug in the code or an error in the text of the specification,
14 | you can send a message to Igor Pavlov via the support forum
15 | or via the SourceForge email message system:
16 | 
17 | http://www.7-zip.org/support.html
18 | 
19 | 
20 | ---
21 | Igor Pavlov
22 | http://www.7-zip.org
23 | 


--------------------------------------------------------------------------------
/test.bat:
--------------------------------------------------------------------------------
1 | @echo off
2 | 
3 | del 1 2
4 | lzmaSh2.exe geo_lzma 1
5 | lzmaSh2a.exe geo_lzma 2
6 | fc /b 1 2
7 | 
8 | 


--------------------------------------------------------------------------------