├── LICENSE
├── LzmaSpec.cpp
├── LzmaSpec.exe
├── README.md
├── examples
│   ├── a.lzma
│   ├── a.txt
│   ├── a_eos.lzma
│   ├── a_eos_and_size.lzma
│   ├── a_lp1_lc2_pb1.lzma
│   ├── bad_corrupted.lzma
│   ├── bad_eos_incorrect_size.lzma
│   ├── bad_incorrect_size.lzma
│   └── info.txt
├── geo_lzma
├── lzma-specification.txt
├── lzmaSh2.cpp
├── lzmaSh2.exe
├── lzmaSh2a.cpp
├── lzmaSh2a.exe
├── lzmaspec-readme.txt
└── test.bat
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 |
614 | If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 |
621 | END OF TERMS AND CONDITIONS
622 |
623 | How to Apply These Terms to Your New Programs
624 |
625 | If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 |
629 | To do so, attach the following notices to the program. It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 |
634 | <one line to give the program's name and a brief idea of what it does.>
635 | Copyright (C) <year> <name of author>
636 |
637 | This program is free software: you can redistribute it and/or modify
638 | it under the terms of the GNU General Public License as published by
639 | the Free Software Foundation, either version 3 of the License, or
640 | (at your option) any later version.
641 |
642 | This program is distributed in the hope that it will be useful,
643 | but WITHOUT ANY WARRANTY; without even the implied warranty of
644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645 | GNU General Public License for more details.
646 |
647 | You should have received a copy of the GNU General Public License
648 | along with this program. If not, see <https://www.gnu.org/licenses/>.
649 |
650 | Also add information on how to contact you by electronic and paper mail.
651 |
652 | If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 |
655 | <program> Copyright (C) <year> <name of author>
656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 | This is free software, and you are welcome to redistribute it
658 | under certain conditions; type `show c' for details.
659 |
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License. Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 |
664 | You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 |
669 | The GNU General Public License does not permit incorporating your program
670 | into proprietary programs. If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library. If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License. But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 |
--------------------------------------------------------------------------------
/LzmaSpec.cpp:
--------------------------------------------------------------------------------
1 | /* LzmaSpec.c -- LZMA Reference Decoder
2 | 2015-06-14 : Igor Pavlov : Public domain */
3 |
4 | // This code implements LZMA file decoding according to LZMA specification.
5 | // This code is not optimized for speed.
6 |
7 | #include <stdio.h>
8 |
9 | #ifdef _MSC_VER
10 | #pragma warning(disable : 4710) // function not inlined
11 | #pragma warning(disable : 4996) // This function or variable may be unsafe
12 | #endif
13 |
14 | typedef unsigned char Byte;
15 | typedef unsigned short UInt16;
16 |
17 | #ifdef _LZMA_UINT32_IS_ULONG
18 | typedef unsigned long UInt32;
19 | #else
20 | typedef unsigned int UInt32;
21 | #endif
22 |
23 | #if defined(_MSC_VER) || defined(__BORLANDC__)
24 | typedef unsigned __int64 UInt64;
25 | #else
26 | typedef unsigned long long int UInt64;
27 | #endif
28 |
29 |
30 | struct CInputStream
31 | {
32 | FILE *File;
33 | UInt64 Processed;
34 |
35 | void Init() { Processed = 0; }
36 |
37 | Byte ReadByte()
38 | {
39 | int c = getc(File);
40 | if (c < 0)
41 | throw "Unexpected end of file";
42 | Processed++;
43 | return (Byte)c;
44 | }
45 | };
46 |
47 |
48 | struct COutStream
49 | {
50 | FILE *File;
51 | UInt64 Processed;
52 |
53 | void Init() { Processed = 0; }
54 |
55 | void WriteByte(Byte b)
56 | {
57 | if (putc(b, File) == EOF)
58 | throw "File writing error";
59 | Processed++;
60 | }
61 | };
62 |
63 |
64 | class COutWindow
65 | {
66 | Byte *Buf;
67 | UInt32 Pos;
68 | UInt32 Size;
69 | bool IsFull;
70 |
71 | public:
72 | unsigned TotalPos;
73 | COutStream OutStream;
74 |
75 | COutWindow(): Buf(NULL) {}
76 | ~COutWindow() { delete []Buf; }
77 |
78 | void Create(UInt32 dictSize)
79 | {
80 | Buf = new Byte[dictSize];
81 | Pos = 0;
82 | Size = dictSize;
83 | IsFull = false;
84 | TotalPos = 0;
85 | }
86 |
87 | void PutByte(Byte b)
88 | {
89 | TotalPos++;
90 | Buf[Pos++] = b;
91 | if (Pos == Size)
92 | {
93 | Pos = 0;
94 | IsFull = true;
95 | }
96 | OutStream.WriteByte(b);
97 | }
98 |
99 | Byte GetByte(UInt32 dist) const
100 | {
101 | return Buf[dist <= Pos ? Pos - dist : Size - dist + Pos];
102 | }
103 |
104 | void CopyMatch(UInt32 dist, unsigned len)
105 | {
106 | for (; len > 0; len--)
107 | PutByte(GetByte(dist));
108 | }
109 |
110 | bool CheckDistance(UInt32 dist) const
111 | {
112 | return dist <= Pos || IsFull;
113 | }
114 |
115 | bool IsEmpty() const
116 | {
117 | return Pos == 0 && !IsFull;
118 | }
119 | };
120 |
121 |
122 | #define kNumBitModelTotalBits 11
123 | #define kNumMoveBits 5
124 |
125 | typedef UInt16 CProb;
126 |
127 | #define PROB_INIT_VAL ((1 << kNumBitModelTotalBits) / 2)
128 |
129 | #define INIT_PROBS(p) \
130 | { for (unsigned i = 0; i < sizeof(p) / sizeof(p[0]); i++) p[i] = PROB_INIT_VAL; }
131 |
132 | class CRangeDecoder
133 | {
134 | UInt32 Range;
135 | UInt32 Code;
136 |
137 | void Normalize();
138 |
139 | public:
140 |
141 | CInputStream *InStream;
142 | bool Corrupted;
143 |
144 | bool Init();
145 | bool IsFinishedOK() const { return Code == 0; }
146 |
147 | UInt32 DecodeDirectBits(unsigned numBits);
148 | unsigned DecodeBit(CProb *prob);
149 | };
150 |
151 | bool CRangeDecoder::Init()
152 | {
153 | Corrupted = false;
154 | Range = 0xFFFFFFFF;
155 | Code = 0;
156 |
157 | Byte b = InStream->ReadByte();
158 |
159 | for (int i = 0; i < 4; i++)
160 | Code = (Code << 8) | InStream->ReadByte();
161 |
162 | if (b != 0 || Code == Range)
163 | Corrupted = true;
164 | return b == 0;
165 | }
166 |
167 | #define kTopValue ((UInt32)1 << 24)
168 |
169 | void CRangeDecoder::Normalize()
170 | {
171 | if (Range < kTopValue)
172 | {
173 | Range <<= 8;
174 | Code = (Code << 8) | InStream->ReadByte();
175 | }
176 | }
177 |
178 | UInt32 CRangeDecoder::DecodeDirectBits(unsigned numBits)
179 | {
180 | UInt32 res = 0;
181 | do
182 | {
183 | Range >>= 1;
184 | Code -= Range;
185 | UInt32 t = 0 - ((UInt32)Code >> 31);
186 | Code += Range & t;
187 |
188 | if (Code == Range)
189 | Corrupted = true;
190 |
191 | Normalize();
192 | res <<= 1;
193 | res += t + 1;
194 | }
195 | while (--numBits);
196 | return res;
197 | }
198 |
199 | unsigned CRangeDecoder::DecodeBit(CProb *prob)
200 | {
201 | unsigned v = *prob;
202 | UInt32 bound = (Range >> kNumBitModelTotalBits) * v;
203 | unsigned symbol;
204 | if (Code < bound)
205 | {
206 | v += ((1 << kNumBitModelTotalBits) - v) >> kNumMoveBits;
207 | Range = bound;
208 | symbol = 0;
209 | }
210 | else
211 | {
212 | v -= v >> kNumMoveBits;
213 | Code -= bound;
214 | Range -= bound;
215 | symbol = 1;
216 | }
217 | *prob = (CProb)v;
218 | Normalize();
219 | return symbol;
220 | }
221 |
222 |
223 | unsigned BitTreeReverseDecode(CProb *probs, unsigned numBits, CRangeDecoder *rc)
224 | {
225 | unsigned m = 1;
226 | unsigned symbol = 0;
227 | for (unsigned i = 0; i < numBits; i++)
228 | {
229 | unsigned bit = rc->DecodeBit(&probs[m]);
230 | m <<= 1;
231 | m += bit;
232 | symbol |= (bit << i);
233 | }
234 | return symbol;
235 | }
236 |
237 | template <unsigned NumBits>
238 | class CBitTreeDecoder
239 | {
240 | CProb Probs[(unsigned)1 << NumBits];
241 |
242 | public:
243 |
244 | void Init()
245 | {
246 | INIT_PROBS(Probs);
247 | }
248 |
249 | unsigned Decode(CRangeDecoder *rc)
250 | {
251 | unsigned m = 1;
252 | for (unsigned i = 0; i < NumBits; i++)
253 | m = (m << 1) + rc->DecodeBit(&Probs[m]);
254 | return m - ((unsigned)1 << NumBits);
255 | }
256 |
257 | unsigned ReverseDecode(CRangeDecoder *rc)
258 | {
259 | return BitTreeReverseDecode(Probs, NumBits, rc);
260 | }
261 | };
262 |
263 | #define kNumPosBitsMax 4
264 |
265 | #define kNumStates 12
266 | #define kNumLenToPosStates 4
267 | #define kNumAlignBits 4
268 | #define kStartPosModelIndex 4
269 | #define kEndPosModelIndex 14
270 | #define kNumFullDistances (1 << (kEndPosModelIndex >> 1))
271 | #define kMatchMinLen 2
272 |
273 | class CLenDecoder
274 | {
275 | CProb Choice;
276 | CProb Choice2;
277 | CBitTreeDecoder<3> LowCoder[1 << kNumPosBitsMax];
278 | CBitTreeDecoder<3> MidCoder[1 << kNumPosBitsMax];
279 | CBitTreeDecoder<8> HighCoder;
280 |
281 | public:
282 |
283 | void Init()
284 | {
285 | Choice = PROB_INIT_VAL;
286 | Choice2 = PROB_INIT_VAL;
287 | HighCoder.Init();
288 | for (unsigned i = 0; i < (1 << kNumPosBitsMax); i++)
289 | {
290 | LowCoder[i].Init();
291 | MidCoder[i].Init();
292 | }
293 | }
294 |
295 | unsigned Decode(CRangeDecoder *rc, unsigned posState)
296 | {
297 | if (rc->DecodeBit(&Choice) == 0)
298 | return LowCoder[posState].Decode(rc);
299 | if (rc->DecodeBit(&Choice2) == 0)
300 | return 8 + MidCoder[posState].Decode(rc);
301 | return 16 + HighCoder.Decode(rc);
302 | }
303 | };
304 |
305 | unsigned UpdateState_Literal(unsigned state)
306 | {
307 | if (state < 4) return 0;
308 | else if (state < 10) return state - 3;
309 | else return state - 6;
310 | }
311 | unsigned UpdateState_Match (unsigned state) { return state < 7 ? 7 : 10; }
312 | unsigned UpdateState_Rep (unsigned state) { return state < 7 ? 8 : 11; }
313 | unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }
314 |
315 | #define LZMA_DIC_MIN (1 << 12)
316 |
317 | class CLzmaDecoder
318 | {
319 | public:
320 | CRangeDecoder RangeDec;
321 | COutWindow OutWindow;
322 |
323 | bool markerIsMandatory;
324 | unsigned lc, pb, lp;
325 | UInt32 dictSize;
326 | UInt32 dictSizeInProperties;
327 |
328 | void DecodeProperties(const Byte *properties)
329 | {
330 | unsigned d = properties[0];
331 | if (d >= (9 * 5 * 5))
332 | throw "Incorrect LZMA properties";
333 | lc = d % 9;
334 | d /= 9;
335 | pb = d / 5;
336 | lp = d % 5;
337 | dictSizeInProperties = 0;
338 | for (int i = 0; i < 4; i++)
339 | dictSizeInProperties |= (UInt32)properties[i + 1] << (8 * i);
340 | dictSize = dictSizeInProperties;
341 | if (dictSize < LZMA_DIC_MIN)
342 | dictSize = LZMA_DIC_MIN;
343 | }
344 |
345 | CLzmaDecoder(): LitProbs(NULL) {}
346 | ~CLzmaDecoder() { delete []LitProbs; }
347 |
348 | void Create()
349 | {
350 | OutWindow.Create(dictSize);
351 | CreateLiterals();
352 | }
353 |
354 | int Decode(bool unpackSizeDefined, UInt64 unpackSize);
355 |
356 | private:
357 |
358 | CProb *LitProbs;
359 |
360 | void CreateLiterals()
361 | {
362 | LitProbs = new CProb[(UInt32)0x300 << (lc + lp)];
363 | }
364 |
365 | void InitLiterals()
366 | {
367 | UInt32 num = (UInt32)0x300 << (lc + lp);
368 | for (UInt32 i = 0; i < num; i++)
369 | LitProbs[i] = PROB_INIT_VAL;
370 | }
371 |
372 | void DecodeLiteral(unsigned state, UInt32 rep0)
373 | {
374 | unsigned prevByte = 0;
375 | if (!OutWindow.IsEmpty())
376 | prevByte = OutWindow.GetByte(1);
377 |
378 | unsigned symbol = 1;
379 | unsigned litState = ((OutWindow.TotalPos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc));
380 | CProb *probs = &LitProbs[(UInt32)0x300 * litState];
381 |
382 | if (state >= 7)
383 | {
384 | unsigned matchByte = OutWindow.GetByte(rep0 + 1);
385 | do
386 | {
387 | unsigned matchBit = (matchByte >> 7) & 1;
388 | matchByte <<= 1;
389 | unsigned bit = RangeDec.DecodeBit(&probs[((1 + matchBit) << 8) + symbol]);
390 | symbol = (symbol << 1) | bit;
391 | if (matchBit != bit)
392 | break;
393 | }
394 | while (symbol < 0x100);
395 | }
396 | while (symbol < 0x100)
397 | symbol = (symbol << 1) | RangeDec.DecodeBit(&probs[symbol]);
398 | OutWindow.PutByte((Byte)(symbol - 0x100));
399 | }
400 |
401 | CBitTreeDecoder<6> PosSlotDecoder[kNumLenToPosStates];
402 | CBitTreeDecoder<kNumAlignBits> AlignDecoder;
403 | CProb PosDecoders[1 + kNumFullDistances - kEndPosModelIndex];
404 |
405 | void InitDist()
406 | {
407 | for (unsigned i = 0; i < kNumLenToPosStates; i++)
408 | PosSlotDecoder[i].Init();
409 | AlignDecoder.Init();
410 | INIT_PROBS(PosDecoders);
411 | }
412 |
413 | unsigned DecodeDistance(unsigned len)
414 | {
415 | unsigned lenState = len;
416 | if (lenState > kNumLenToPosStates - 1)
417 | lenState = kNumLenToPosStates - 1;
418 |
419 | unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
420 | if (posSlot < 4)
421 | return posSlot;
422 |
423 | unsigned numDirectBits = (unsigned)((posSlot >> 1) - 1);
424 | UInt32 dist = ((2 | (posSlot & 1)) << numDirectBits);
425 | if (posSlot < kEndPosModelIndex)
426 | dist += BitTreeReverseDecode(PosDecoders + dist - posSlot, numDirectBits, &RangeDec);
427 | else
428 | {
429 | dist += RangeDec.DecodeDirectBits(numDirectBits - kNumAlignBits) << kNumAlignBits;
430 | dist += AlignDecoder.ReverseDecode(&RangeDec);
431 | }
432 | return dist;
433 | }
434 |
435 | CProb IsMatch[kNumStates << kNumPosBitsMax];
436 | CProb IsRep[kNumStates];
437 | CProb IsRepG0[kNumStates];
438 | CProb IsRepG1[kNumStates];
439 | CProb IsRepG2[kNumStates];
440 | CProb IsRep0Long[kNumStates << kNumPosBitsMax];
441 |
442 | CLenDecoder LenDecoder;
443 | CLenDecoder RepLenDecoder;
444 |
445 | void Init()
446 | {
447 | InitLiterals();
448 | InitDist();
449 |
450 | INIT_PROBS(IsMatch);
451 | INIT_PROBS(IsRep);
452 | INIT_PROBS(IsRepG0);
453 | INIT_PROBS(IsRepG1);
454 | INIT_PROBS(IsRepG2);
455 | INIT_PROBS(IsRep0Long);
456 |
457 | LenDecoder.Init();
458 | RepLenDecoder.Init();
459 | }
460 | };
461 |
462 |
463 | #define LZMA_RES_ERROR 0
464 | #define LZMA_RES_FINISHED_WITH_MARKER 1
465 | #define LZMA_RES_FINISHED_WITHOUT_MARKER 2
466 |
467 | int CLzmaDecoder::Decode(bool unpackSizeDefined, UInt64 unpackSize)
468 | {
469 | if (!RangeDec.Init())
470 | return LZMA_RES_ERROR;
471 |
472 | Init();
473 |
474 | UInt32 rep0 = 0, rep1 = 0, rep2 = 0, rep3 = 0;
475 | unsigned state = 0;
476 |
477 | for (;;)
478 | {
479 | if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory)
480 | if (RangeDec.IsFinishedOK())
481 | return LZMA_RES_FINISHED_WITHOUT_MARKER;
482 |
483 | unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
484 |
485 | if (RangeDec.DecodeBit(&IsMatch[(state << kNumPosBitsMax) + posState]) == 0)
486 | {
487 | if (unpackSizeDefined && unpackSize == 0)
488 | return LZMA_RES_ERROR;
489 | DecodeLiteral(state, rep0);
490 | state = UpdateState_Literal(state);
491 | unpackSize--;
492 | continue;
493 | }
494 |
495 | unsigned len;
496 |
497 | if (RangeDec.DecodeBit(&IsRep[state]) != 0)
498 | {
499 | if (unpackSizeDefined && unpackSize == 0)
500 | return LZMA_RES_ERROR;
501 | if (OutWindow.IsEmpty())
502 | return LZMA_RES_ERROR;
503 | if (RangeDec.DecodeBit(&IsRepG0[state]) == 0)
504 | {
505 | if (RangeDec.DecodeBit(&IsRep0Long[(state << kNumPosBitsMax) + posState]) == 0)
506 | {
507 | state = UpdateState_ShortRep(state);
508 | OutWindow.PutByte(OutWindow.GetByte(rep0 + 1));
509 | unpackSize--;
510 | continue;
511 | }
512 | }
513 | else
514 | {
515 | UInt32 dist;
516 | if (RangeDec.DecodeBit(&IsRepG1[state]) == 0)
517 | dist = rep1;
518 | else
519 | {
520 | if (RangeDec.DecodeBit(&IsRepG2[state]) == 0)
521 | dist = rep2;
522 | else
523 | {
524 | dist = rep3;
525 | rep3 = rep2;
526 | }
527 | rep2 = rep1;
528 | }
529 | rep1 = rep0;
530 | rep0 = dist;
531 | }
532 | len = RepLenDecoder.Decode(&RangeDec, posState);
533 | state = UpdateState_Rep(state);
534 | }
535 | else
536 | {
537 | rep3 = rep2;
538 | rep2 = rep1;
539 | rep1 = rep0;
540 | len = LenDecoder.Decode(&RangeDec, posState);
541 | state = UpdateState_Match(state);
542 | rep0 = DecodeDistance(len);
543 | if (rep0 == 0xFFFFFFFF)
544 | return RangeDec.IsFinishedOK() ?
545 | LZMA_RES_FINISHED_WITH_MARKER :
546 | LZMA_RES_ERROR;
547 |
548 | if (unpackSizeDefined && unpackSize == 0)
549 | return LZMA_RES_ERROR;
550 | if (rep0 >= dictSize || !OutWindow.CheckDistance(rep0))
551 | return LZMA_RES_ERROR;
552 | }
553 | len += kMatchMinLen;
554 | bool isError = false;
555 | if (unpackSizeDefined && unpackSize < len)
556 | {
557 | len = (unsigned)unpackSize;
558 | isError = true;
559 | }
560 | OutWindow.CopyMatch(rep0 + 1, len);
561 | unpackSize -= len;
562 | if (isError)
563 | return LZMA_RES_ERROR;
564 | }
565 | }
566 |
567 | static void Print(const char *s)
568 | {
569 | fputs(s, stdout);
570 | }
571 |
572 | static void PrintError(const char *s)
573 | {
574 | fputs(s, stderr);
575 | }
576 |
577 |
578 | #define CONVERT_INT_TO_STR(charType, tempSize) \
579 |
580 | void ConvertUInt64ToString(UInt64 val, char *s)
581 | {
582 | char temp[32];
583 | unsigned i = 0;
584 | while (val >= 10)
585 | {
586 | temp[i++] = (char)('0' + (unsigned)(val % 10));
587 | val /= 10;
588 | }
589 | *s++ = (char)('0' + (unsigned)val);
590 | while (i != 0)
591 | {
592 | i--;
593 | *s++ = temp[i];
594 | }
595 | *s = 0;
596 | }
597 |
598 | void PrintUInt64(const char *title, UInt64 v)
599 | {
600 | Print(title);
601 | Print(" : ");
602 | char s[32];
603 | ConvertUInt64ToString(v, s);
604 | Print(s);
605 | Print(" bytes \n");
606 | }
607 |
608 | int main2(int numArgs, const char *args[])
609 | {
610 | Print("\nLZMA Reference Decoder 15.00 : Igor Pavlov : Public domain : 2015-04-16\n");
611 | if (numArgs == 1)
612 | Print("\nUse: lzmaSpec a.lzma outFile");
613 |
614 | if (numArgs != 3)
615 | throw "you must specify two parameters";
616 |
617 | CInputStream inStream;
618 | inStream.File = fopen(args[1], "rb");
619 | inStream.Init();
620 | if (inStream.File == 0)
621 | throw "Can't open input file";
622 |
623 | CLzmaDecoder lzmaDecoder;
624 | lzmaDecoder.OutWindow.OutStream.File = fopen(args[2], "wb+");
625 | lzmaDecoder.OutWindow.OutStream.Init();
626 | if (lzmaDecoder.OutWindow.OutStream.File == 0)
627 | throw "Can't open output file";
628 |
629 | Byte header[13];
630 | int i;
631 | for (i = 0; i < 13; i++)
632 | header[i] = inStream.ReadByte();
633 |
634 | lzmaDecoder.DecodeProperties(header);
635 |
636 | printf("\nlc=%u, lp=%u, pb=%u", lzmaDecoder.lc, lzmaDecoder.lp, lzmaDecoder.pb);
637 | printf("\nDictionary Size in properties = %u", lzmaDecoder.dictSizeInProperties);
638 | printf("\nDictionary Size for decoding = %u", lzmaDecoder.dictSize);
639 |
640 | UInt64 unpackSize = 0;
641 | bool unpackSizeDefined = false;
642 | for (i = 0; i < 8; i++)
643 | {
644 | Byte b = header[5 + i];
645 | if (b != 0xFF)
646 | unpackSizeDefined = true;
647 | unpackSize |= (UInt64)b << (8 * i);
648 | }
649 |
650 | lzmaDecoder.markerIsMandatory = !unpackSizeDefined;
651 |
652 | Print("\n");
653 | if (unpackSizeDefined)
654 | PrintUInt64("Uncompressed Size", unpackSize);
655 | else
656 | Print("End marker is expected\n");
657 | lzmaDecoder.RangeDec.InStream = &inStream;
658 |
659 | Print("\n");
660 |
661 | lzmaDecoder.Create();
662 |
663 | int res = lzmaDecoder.Decode(unpackSizeDefined, unpackSize);
664 |
665 | PrintUInt64("Read ", inStream.Processed);
666 | PrintUInt64("Written ", lzmaDecoder.OutWindow.OutStream.Processed);
667 |
668 | if (res == LZMA_RES_ERROR)
669 | throw "LZMA decoding error";
670 | else if (res == LZMA_RES_FINISHED_WITHOUT_MARKER)
671 | Print("Finished without end marker");
672 | else if (res == LZMA_RES_FINISHED_WITH_MARKER)
673 | {
674 | if (unpackSizeDefined)
675 | {
676 | if (lzmaDecoder.OutWindow.OutStream.Processed != unpackSize)
677 | throw "Finished with end marker before than specified size";
678 | Print("Warning: ");
679 | }
680 | Print("Finished with end marker");
681 | }
682 | else
683 | throw "Internal Error";
684 |
685 | Print("\n");
686 |
687 | if (lzmaDecoder.RangeDec.Corrupted)
688 | {
689 | Print("\nWarning: LZMA stream is corrupted\n");
690 | }
691 |
692 | return 0;
693 | }
694 |
695 |
696 | int
697 | #ifdef _MSC_VER
698 | __cdecl
699 | #endif
700 | main(int numArgs, const char *args[])
701 | {
702 | try { return main2(numArgs, args); }
703 | catch (const char *s)
704 | {
705 | PrintError("\nError:\n");
706 | PrintError(s);
707 | PrintError("\n");
708 | return 1;
709 | }
710 | catch(...)
711 | {
712 | PrintError("\nError\n");
713 | return 1;
714 | }
715 | }
716 |
--------------------------------------------------------------------------------
/LzmaSpec.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/LzmaSpec.exe
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # lzma_sh
2 | compact lzma decoder
3 |
4 | lzmaSh2a.cpp is a little longer (247 LoC vs 228), but has an explicit state table
5 | and integrated id decoding.
6 |
7 | LzmaSpec.cpp is the "official" demo decoder, 716 LoC
8 |
9 |
10 |
--------------------------------------------------------------------------------
/examples/a.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a.lzma
--------------------------------------------------------------------------------
/examples/a.txt:
--------------------------------------------------------------------------------
1 | LZMA decoder test example
2 | =========================
3 | ! LZMA ! Decoder ! TEST !
4 | =========================
5 | ! TEST ! LZMA ! Decoder !
6 | =========================
7 | ---- Test Line 1 --------
8 | =========================
9 | ---- Test Line 2 --------
10 | =========================
11 | === End of test file ====
12 | =========================
13 |
--------------------------------------------------------------------------------
/examples/a_eos.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_eos.lzma
--------------------------------------------------------------------------------
/examples/a_eos_and_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_eos_and_size.lzma
--------------------------------------------------------------------------------
/examples/a_lp1_lc2_pb1.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/a_lp1_lc2_pb1.lzma
--------------------------------------------------------------------------------
/examples/bad_corrupted.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_corrupted.lzma
--------------------------------------------------------------------------------
/examples/bad_eos_incorrect_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_eos_incorrect_size.lzma
--------------------------------------------------------------------------------
/examples/bad_incorrect_size.lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/examples/bad_incorrect_size.lzma
--------------------------------------------------------------------------------
/examples/info.txt:
--------------------------------------------------------------------------------
1 | GOOD archives:
2 |
3 | a.lzma
4 | the stream was compressed with default properties lp=0 lc=3 pb=2 and 64 KiB dictionary
5 | a_eos.lzma
6 | the stream has EOS marker
7 | a_eos_and_size.lzma
8 | the stream has EOS marker and unpack size is defined
9 | a_lp1_lc2_pb1.lzma
10 | the stream was compressed with lp=1 lc=2 pb=1 properties
11 |
12 |
13 | BAD ARCHIVES:
14 |
15 | bad_corrupted.lzma
16 | some bytes in compressed stream were changed
17 | bad_eos_incorrect_size.lzma
18 | the stream has EOS marker and unpack size in header is larger than real uncompressed size
19 | bad_incorrect_size.lzma
20 | the header contains incorrect size (290). The correct size is 327
21 |
--------------------------------------------------------------------------------
/geo_lzma:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/geo_lzma
--------------------------------------------------------------------------------
/lzma-specification.txt:
--------------------------------------------------------------------------------
1 | LZMA specification (DRAFT version)
2 | ----------------------------------
3 |
4 | Author: Igor Pavlov
5 | Date: 2015-06-14
6 |
7 | This specification defines the format of LZMA compressed data and lzma file format.
8 |
9 | Notation
10 | --------
11 |
12 | We use the syntax of C++ programming language.
13 | We use the following types in C++ code:
14 | unsigned - unsigned integer, at least 16 bits in size
15 | int - signed integer, at least 16 bits in size
16 | UInt64 - 64-bit unsigned integer
17 | UInt32 - 32-bit unsigned integer
18 | UInt16 - 16-bit unsigned integer
19 | Byte - 8-bit unsigned integer
20 | bool - boolean type with two possible values: false, true
21 |
22 |
23 | lzma file format
24 | ================
25 |
26 | The lzma file contains the raw LZMA stream and the header with related properties.
27 |
28 | The files in that format use ".lzma" extension.
29 |
30 | The lzma file format layout:
31 |
32 | Offset Size Description
33 |
34 | 0 1 LZMA model properties (lc, lp, pb) in encoded form
35 | 1 4 Dictionary size (32-bit unsigned integer, little-endian)
36 | 5 8 Uncompressed size (64-bit unsigned integer, little-endian)
37 | 13 Compressed data (LZMA stream)
38 |
39 | LZMA properties:
40 |
41 | name Range Description
42 |
43 | lc [0, 8] the number of "literal context" bits
44 | lp [0, 4] the number of "literal pos" bits
45 | pb [0, 4] the number of "pos" bits
46 | dictSize [0, 2^32 - 1] the dictionary size
47 |
48 | The following code encodes LZMA properties:
49 |
50 | void EncodeProperties(Byte *properties)
51 | {
52 | properties[0] = (Byte)((pb * 5 + lp) * 9 + lc);
53 | Set_UInt32_LittleEndian(properties + 1, dictSize);
54 | }
55 |
56 | If the value of dictionary size in properties is smaller than (1 << 12),
57 | the LZMA decoder must set the dictionary size variable to (1 << 12).
58 |
59 | #define LZMA_DIC_MIN (1 << 12)
60 |
61 | unsigned lc, pb, lp;
62 | UInt32 dictSize;
63 | UInt32 dictSizeInProperties;
64 |
65 | void DecodeProperties(const Byte *properties)
66 | {
67 | unsigned d = properties[0];
68 | if (d >= (9 * 5 * 5))
69 | throw "Incorrect LZMA properties";
70 | lc = d % 9;
71 | d /= 9;
72 | pb = d / 5;
73 | lp = d % 5;
74 | dictSizeInProperties = 0;
75 | for (int i = 0; i < 4; i++)
76 | dictSizeInProperties |= (UInt32)properties[i + 1] << (8 * i);
77 | dictSize = dictSizeInProperties;
78 | if (dictSize < LZMA_DIC_MIN)
79 | dictSize = LZMA_DIC_MIN;
80 | }
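
As a worked example of the packing used by EncodeProperties() and
DecodeProperties(), the sketch below round-trips the model properties byte.
The names LzmaModelProps, PackModelProps and UnpackModelProps are hypothetical
helpers introduced only for illustration; they are not part of the
specification. With the default lc=3, lp=0, pb=2 the byte works out to
(2 * 5 + 0) * 9 + 3 = 93 = 0x5D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; the specification keeps
// lc, lp and pb in separate variables.
struct LzmaModelProps
{
  unsigned lc, lp, pb;
};

// Packs (lc, lp, pb) with the same arithmetic as EncodeProperties().
std::uint8_t PackModelProps(unsigned lc, unsigned lp, unsigned pb)
{
  assert(lc <= 8 && lp <= 4 && pb <= 4);
  return static_cast<std::uint8_t>((pb * 5 + lp) * 9 + lc);
}

// Unpacks the byte with the same arithmetic as DecodeProperties().
// Values of 9 * 5 * 5 = 225 and above are invalid; the caller is
// expected to reject them before calling this function.
LzmaModelProps UnpackModelProps(std::uint8_t d)
{
  assert(d < 9 * 5 * 5);
  LzmaModelProps p;
  p.lc = d % 9;
  p.lp = (d / 9) % 5;
  p.pb = (d / 9) / 5;
  return p;
}
```

For instance, UnpackModelProps(0x38) gives lc=2, lp=1, pb=1, and the largest
valid byte is PackModelProps(8, 4, 4) = 224.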
81 |
82 | If the "Uncompressed size" field contains ones in all 64 bits, the
83 | uncompressed size is unknown and the stream contains an "end marker"
84 | that indicates the point where decoding ends.
85 | Otherwise, if the value of the "Uncompressed size" field is not
86 | equal to ((2^64) - 1), the LZMA stream decoding must finish after the
87 | specified number of bytes (Uncompressed size) is decoded. If the stream
88 | also contains an "end marker", the LZMA decoder must read that marker too.
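
The rule for the "Uncompressed size" field can be sketched as follows. This
is a minimal illustration, not the reference implementation: ReadUnpackSize
and IsUnpackSizeDefined are hypothetical names, and the code assumes the
caller has already read the full 13-byte header into memory.

```cpp
#include <cassert>
#include <cstdint>

// Reads the 8-byte little-endian "Uncompressed size" field, which starts
// at offset 5 of the 13-byte .lzma header.
std::uint64_t ReadUnpackSize(const std::uint8_t *header)
{
  assert(header != nullptr);
  std::uint64_t size = 0;
  for (int i = 0; i < 8; i++)
    size |= static_cast<std::uint64_t>(header[5 + i]) << (8 * i);
  return size;
}

// All 64 bits set means the size is unknown and an end marker is required.
bool IsUnpackSizeDefined(std::uint64_t size)
{
  return size != UINT64_MAX;
}
```

A header whose size bytes are 47 01 00 00 00 00 00 00 yields 327, while eight
FF bytes yield an undefined size, in which case the decoder has to rely on the
end marker.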
89 |
90 |
91 | The new scheme to encode LZMA properties
92 | ----------------------------------------
93 |
94 | If LZMA compression is used in another format, it's recommended to
95 | use a new, improved scheme to encode LZMA properties. That scheme is
96 | used in the xz format, which uses the LZMA2 compression algorithm.
97 | LZMA2 is a newer compression algorithm that is based on the LZMA algorithm.
98 |
99 | The dictionary size in LZMA2 is encoded with just one byte, and LZMA2 supports
100 | only a reduced set of dictionary sizes:
101 | (2 << 11), (3 << 11),
102 | (2 << 12), (3 << 12),
103 | ...
104 | (2 << 30), (3 << 30),
105 | (2 << 31) - 1
106 |
107 | The dictionary size can be extracted from encoded value with the following code:
108 |
109 | dictSize = (p == 40) ? 0xFFFFFFFF : (((UInt32)2 | ((p) & 1)) << ((p) / 2 + 11));
110 |
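As a worked illustration of the expression above, the following sketch wraps it in a function. The function name is hypothetical, and rejecting codes above 40 is an assumption of this sketch (the text above only defines the mapping for the listed sizes).

```cpp
#include <cstdint>
#include <stdexcept>

// Decodes the one-byte LZMA2 dictionary size code (0..40) to a size in bytes.
inline uint32_t DecodeLzma2DictSize(unsigned p) {
    if (p > 40)
        throw std::invalid_argument("Unsupported LZMA2 dictionary size code");
    if (p == 40)
        return 0xFFFFFFFFu;   // (2 << 31) - 1 does not fit the shift formula
    // Even codes give 2 << n, odd codes give 3 << n, with n = p / 2 + 11.
    return ((uint32_t)2 | (p & 1)) << (p / 2 + 11);
}
```

Code 0 gives 4 KiB (2 << 11), code 1 gives 6 KiB (3 << 11), and code 39 gives 3 GiB (3 << 30).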
111 | Also there is additional limitation (lc + lp <= 4) in LZMA2 for values of
112 | "lc" and "lp" properties:
113 |
114 | if (lc + lp > 4)
115 | throw "Unsupported properties: (lc + lp) > 4";
116 |
117 | This (lc + lp) limitation has some advantages for an LZMA decoder.
118 | It reduces the maximum size of the tables allocated by the decoder,
119 | and it reduces the complexity of the initialization procedure, which can be
120 | important for keeping decoding fast for a large number of small LZMA streams.
121 |
122 | It's recommended to use that limitation (lc + lp <= 4) for any new format
123 | that uses LZMA compression. Note that the combinations of "lc" and "lp"
124 | parameters, where (lc + lp > 4), can provide significant improvement in
125 | compression ratio only in some rare cases.
126 |
127 | The LZMA properties can be encoded into two bytes in new scheme:
128 |
129 | Offset Size Description
130 |
131 | 0 1 The dictionary size encoded with LZMA2 scheme
132 | 1 1 LZMA model properties (lc, lp, pb) in encoded form
133 |
134 |
135 | The RAM usage
136 | =============
137 |
138 | The RAM usage for LZMA decoder is determined by the following parts:
139 |
140 | 1) The Sliding Window (from 4 KiB to 4 GiB).
141 | 2) The probability model counter arrays (arrays of 16-bit variables).
142 | 3) Some additional state variables (about 10 variables of 32-bit integers).
143 |
144 |
145 | The RAM usage for Sliding Window
146 | --------------------------------
147 |
148 | There are two main scenarios of decoding:
149 |
150 | 1) The decoding of full stream to one RAM buffer.
151 |
152 | If we decode full LZMA stream to one output buffer in RAM, the decoder
153 | can use that output buffer as sliding window. So the decoder doesn't
154 | need additional buffer allocated for sliding window.
155 |
156 | 2) The decoding to some external storage.
157 |
158 | If we decode LZMA stream to external storage, the decoder must allocate
159 | the buffer for sliding window. The size of that buffer must be equal
160 | or larger than the value of dictionary size from properties of LZMA stream.
161 |
162 | In this specification we describe the code for decoding to some external
163 | storage. The optimized version of code for decoding of full stream to one
164 | output RAM buffer can require some minor changes in code.
165 |
166 |
167 | The RAM usage for the probability model counters
168 | ------------------------------------------------
169 |
170 | The size of the probability model counter arrays is calculated with the
171 | following formula:
172 |
173 | size_of_prob_arrays = 1846 + 768 * (1 << (lp + lc))
174 |
175 | Each probability model counter is 11-bit unsigned integer.
176 | If we use 16-bit integer variables (2-byte integers) for these probability
177 | model counters, the RAM usage required by probability model counter arrays
178 | can be estimated with the following formula:
179 |
180 | RAM = 4 KiB + 1.5 KiB * (1 << (lp + lc))
181 |
182 | For example, for default LZMA parameters (lp = 0 and lc = 3), the RAM usage is
183 |
184 | RAM_lc3_lp0 = 4 KiB + 1.5 KiB * 8 = 16 KiB
185 |
186 | The maximum RAM state usage is required for decoding the stream with lp = 4
187 | and lc = 8:
188 |
189 | RAM_lc8_lp4 = 4 KiB + 1.5 KiB * 4096 = 6148 KiB
190 |
191 | If the decoder uses LZMA2's limited property condition
192 | (lc + lp <= 4), the RAM usage will be not larger than
193 |
194 | RAM_lc_lp_4 = 4 KiB + 1.5 KiB * 16 = 28 KiB
195 |
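The counter formula above can be checked with a small sketch. The function names are hypothetical. The byte counts are exact, so they differ slightly from the rounded KiB estimates given above.

```cpp
#include <cstddef>

// Number of CProb counters used by the decoder for the given lc and lp.
inline std::size_t LzmaNumProbs(unsigned lc, unsigned lp) {
    return 1846 + 768 * ((std::size_t)1 << (lc + lp));
}

// Bytes needed when each counter is stored in a 16-bit variable.
inline std::size_t LzmaProbsBytes(unsigned lc, unsigned lp) {
    return LzmaNumProbs(lc, lp) * 2;
}
```

For lc = 3 and lp = 0 this gives 7990 counters, or 15980 bytes, which is just under the 16 KiB estimate.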
196 |
197 | The RAM usage for encoder
198 | -------------------------
199 |
200 | There are many variants for LZMA encoding code.
201 | These variants have different values for memory consumption.
202 | Note that memory consumption for LZMA Encoder can not be
203 | smaller than memory consumption of LZMA Decoder for same stream.
204 |
205 | The RAM usage required by a modern efficient implementation of the
206 | LZMA Encoder can be estimated with the following formula:
207 |
208 | Encoder_RAM_Usage = 4 MiB + 11 * dictionarySize.
209 |
210 | But there are some modes of the encoder that require less memory.
211 |
212 |
213 | LZMA Decoding
214 | =============
215 |
216 | The LZMA compression algorithm uses LZ-based compression with Sliding Window
217 | and Range Encoding as entropy coding method.
218 |
219 |
220 | Sliding Window
221 | --------------
222 |
223 | LZMA uses Sliding Window compression similar to LZ77 algorithm.
224 |
225 | LZMA stream must be decoded to the sequence that consists
226 | of MATCHES and LITERALS:
227 |
228 | - a LITERAL is an 8-bit character (one byte).
229 | The decoder just puts that LITERAL to the uncompressed stream.
230 |
231 | - a MATCH is a pair of two numbers (DISTANCE-LENGTH pair).
232 | The decoder takes one byte exactly "DISTANCE" characters behind
233 | current position in the uncompressed stream and puts it to
234 | uncompressed stream. The decoder must repeat it "LENGTH" times.
235 |
236 | The "DISTANCE" can not be larger than dictionary size.
237 | And the "DISTANCE" can not be larger than the number of bytes in
238 | the uncompressed stream that were decoded before that match.
239 |
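The byte-by-byte copy rule matters when LENGTH is larger than DISTANCE, because the match then reads bytes that the same match has just written. A minimal sketch, assuming the whole output is kept in one std::vector (the AppendMatch name is hypothetical and dictionary size checks are omitted):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends a match to `out` one byte at a time, so the source region
// may overlap the bytes that are being written.
inline void AppendMatch(std::vector<uint8_t> &out, std::size_t distance, std::size_t length) {
    if (distance == 0 || distance > out.size())
        throw std::runtime_error("match distance exceeds decoded data");
    for (std::size_t i = 0; i < length; i++) {
        uint8_t b = out[out.size() - distance];   // "distance" bytes behind the end
        out.push_back(b);
    }
}
```

Starting from the two bytes "ab", a match with DISTANCE 2 and LENGTH 4 produces "ababab".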
240 | In this specification we use cyclic buffer to implement Sliding Window
241 | for LZMA decoder:
242 |
243 | class COutWindow
244 | {
245 | Byte *Buf;
246 | UInt32 Pos;
247 | UInt32 Size;
248 | bool IsFull;
249 |
250 | public:
251 | unsigned TotalPos;
252 | COutStream OutStream;
253 |
254 | COutWindow(): Buf(NULL) {}
255 | ~COutWindow() { delete []Buf; }
256 |
257 | void Create(UInt32 dictSize)
258 | {
259 | Buf = new Byte[dictSize];
260 | Pos = 0;
261 | Size = dictSize;
262 | IsFull = false;
263 | TotalPos = 0;
264 | }
265 |
266 | void PutByte(Byte b)
267 | {
268 | TotalPos++;
269 | Buf[Pos++] = b;
270 | if (Pos == Size)
271 | {
272 | Pos = 0;
273 | IsFull = true;
274 | }
275 | OutStream.WriteByte(b);
276 | }
277 |
278 | Byte GetByte(UInt32 dist) const
279 | {
280 | return Buf[dist <= Pos ? Pos - dist : Size - dist + Pos];
281 | }
282 |
283 | void CopyMatch(UInt32 dist, unsigned len)
284 | {
285 | for (; len > 0; len--)
286 | PutByte(GetByte(dist));
287 | }
288 |
289 | bool CheckDistance(UInt32 dist) const
290 | {
291 | return dist <= Pos || IsFull;
292 | }
293 |
294 | bool IsEmpty() const
295 | {
296 | return Pos == 0 && !IsFull;
297 | }
298 | };
299 |
300 |
301 | In another implementation it's possible to use one buffer that contains
302 | Sliding Window and the whole data stream after uncompressing.
303 |
304 |
305 | Range Decoder
306 | -------------
307 |
308 | LZMA algorithm uses Range Encoding (1) as entropy coding method.
309 |
310 | LZMA stream contains just one very big number in big-endian encoding.
311 | LZMA decoder uses the Range Decoder to extract a sequence of binary
312 | symbols from that big number.
313 |
314 | The state of the Range Decoder:
315 |
316 | struct CRangeDecoder
317 | {
318 | UInt32 Range;
319 | UInt32 Code;
320 | InputStream *InStream;
321 |
322 | bool Corrupted;
323 | };
324 |
325 | The notes about UInt32 type for the "Range" and "Code" variables:
326 |
327 | It's possible to use 64-bit (unsigned or signed) integer type
328 | for the "Range" and the "Code" variables instead of 32-bit unsigned,
329 | but some additional code must be used to truncate the values to
330 | low 32-bits after some operations.
331 |
332 | If the programming language does not support 32-bit unsigned integer type
333 | (like in case of JAVA language), it's possible to use 32-bit signed integer,
334 | but some code must be changed. For example, it's required to change the code
335 | that uses comparison operations for UInt32 variables in this specification.
336 |
337 | The Range Decoder can be in some states that can be treated as
338 | "Corruption" in LZMA stream. The Range Decoder uses the variable "Corrupted":
339 |
340 | (Corrupted == false), if the Range Decoder has not detected any corruption.
341 | (Corrupted == true), if the Range Decoder has detected some corruption.
342 |
343 | The reference LZMA Decoder ignores the value of the "Corrupted" variable.
344 | So it continues to decode the stream, even if corruption is detected
345 | in the Range Decoder. To remain fully compatible with the output of the
346 | reference LZMA Decoder, other LZMA Decoder implementations must also
347 | ignore the value of the "Corrupted" variable.
348 |
349 | The LZMA Encoder is required to create only such LZMA streams, that will not
350 | lead the Range Decoder to states, where the "Corrupted" variable is set to true.
351 |
352 | The Range Decoder reads first 5 bytes from input stream to initialize
353 | the state:
354 |
355 | bool CRangeDecoder::Init()
356 | {
357 | Corrupted = false;
358 | Range = 0xFFFFFFFF;
359 | Code = 0;
360 |
361 | Byte b = InStream->ReadByte();
362 |
363 | for (int i = 0; i < 4; i++)
364 | Code = (Code << 8) | InStream->ReadByte();
365 |
366 | if (b != 0 || Code == Range)
367 | Corrupted = true;
368 | return b == 0;
369 | }
370 |
371 | The LZMA Encoder always writes ZERO in the initial byte of the compressed stream.
372 | That scheme simplifies the code of the Range Encoder in the
373 | LZMA Encoder. If the initial byte is not equal to ZERO, the LZMA Decoder must
374 | stop decoding and report an error.
375 |
376 | After the last bit of data was decoded by Range Decoder, the value of the
377 | "Code" variable must be equal to 0. The LZMA Decoder must check it by
378 | calling the IsFinishedOK() function:
379 |
380 | bool IsFinishedOK() const { return Code == 0; }
381 |
382 | If there is corruption in the data stream, there is a high probability that
383 | the "Code" value will not be equal to 0 when IsFinishedOK() is called. So the
384 | check in the IsFinishedOK() function is an effective means of
385 | corruption detection.
386 |
387 | The value of the "Range" variable before each bit decoding can not be smaller
388 | than ((UInt32)1 << 24). The Normalize() function keeps the "Range" value in
389 | described range.
390 |
391 | #define kTopValue ((UInt32)1 << 24)
392 |
393 | void CRangeDecoder::Normalize()
394 | {
395 | if (Range < kTopValue)
396 | {
397 | Range <<= 8;
398 | Code = (Code << 8) | InStream->ReadByte();
399 | }
400 | }
401 |
402 | Notes: if the size of the "Code" variable is larger than 32 bits, it's
403 | required to keep only low 32 bits of the "Code" variable after the change
404 | in Normalize() function.
405 |
406 | If the LZMA Stream is not corrupted, the value of the "Code" variable is
407 | always smaller than value of the "Range" variable.
408 | But the Range Decoder ignores some types of corruptions, so the value of
409 | the "Code" variable can be equal or larger than value of the "Range" variable
410 | for some "Corrupted" archives.
411 |
412 |
413 | LZMA uses Range Encoding only with binary symbols of two types:
414 | 1) binary symbols with fixed and equal probabilities (direct bits)
415 | 2) binary symbols with predicted probabilities
416 |
417 | The DecodeDirectBits() function decodes the sequence of direct bits:
418 |
419 | UInt32 CRangeDecoder::DecodeDirectBits(unsigned numBits)
420 | {
421 | UInt32 res = 0;
422 | do
423 | {
424 | Range >>= 1;
425 | Code -= Range;
426 | UInt32 t = 0 - ((UInt32)Code >> 31);
427 | Code += Range & t;
428 |
429 | if (Code == Range)
430 | Corrupted = true;
431 |
432 | Normalize();
433 | res <<= 1;
434 | res += t + 1;
435 | }
436 | while (--numBits);
437 | return res;
438 | }
439 |
440 |
441 | The Bit Decoding with Probability Model
442 | ---------------------------------------
443 |
444 | The task of Bit Probability Model is to estimate probabilities of binary
445 | symbols. And then it provides the Range Decoder with that information.
446 | The better prediction provides better compression ratio.
447 | The Bit Probability Model uses statistical data of previous decoded
448 | symbols.
449 |
450 | That estimated probability is presented as 11-bit unsigned integer value
451 | that represents the probability of symbol "0".
452 |
453 | #define kNumBitModelTotalBits 11
454 |
455 | Mathematical probabilities can be presented with the following formulas:
456 | probability(symbol_0) = prob / 2048.
457 | probability(symbol_1) = 1 - Probability(symbol_0) =
458 | = 1 - prob / 2048 =
459 | = (2048 - prob) / 2048
460 | where the "prob" variable contains 11-bit integer probability counter.
461 |
462 | It's recommended to use 16-bit unsigned integer type, to store these 11-bit
463 | probability values:
464 |
465 | typedef UInt16 CProb;
466 |
467 | Each probability value must be initialized with value ((1 << 11) / 2),
468 | that represents the state, where probabilities of symbols 0 and 1
469 | are equal to 0.5:
470 |
471 | #define PROB_INIT_VAL ((1 << kNumBitModelTotalBits) / 2)
472 |
473 | The INIT_PROBS macro is used to initialize the array of CProb variables:
474 |
475 | #define INIT_PROBS(p) \
476 | { for (unsigned i = 0; i < sizeof(p) / sizeof(p[0]); i++) p[i] = PROB_INIT_VAL; }
477 |
478 |
479 | The DecodeBit() function decodes one bit.
480 | The LZMA decoder provides the pointer to CProb variable that contains
481 | information about estimated probability for symbol 0 and the Range Decoder
482 | updates that CProb variable after decoding. The Range Decoder increases
483 | estimated probability of the symbol that was decoded:
484 |
485 | #define kNumMoveBits 5
486 |
487 | unsigned CRangeDecoder::DecodeBit(CProb *prob)
488 | {
489 | unsigned v = *prob;
490 | UInt32 bound = (Range >> kNumBitModelTotalBits) * v;
491 | unsigned symbol;
492 | if (Code < bound)
493 | {
494 | v += ((1 << kNumBitModelTotalBits) - v) >> kNumMoveBits;
495 | Range = bound;
496 | symbol = 0;
497 | }
498 | else
499 | {
500 | v -= v >> kNumMoveBits;
501 | Code -= bound;
502 | Range -= bound;
503 | symbol = 1;
504 | }
505 | *prob = (CProb)v;
506 | Normalize();
507 | return symbol;
508 | }
509 |
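The counter update inside DecodeBit() can be looked at in isolation. The following sketch repeats only the update arithmetic; the UpdateProb name is hypothetical and no range decoding is performed.

```cpp
#include <cstdint>

typedef uint16_t CProb;

const unsigned kNumBitModelTotalBits = 11;
const unsigned kNumMoveBits = 5;

// Returns the counter value after a bit has been decoded with it.
inline CProb UpdateProb(CProb prob, unsigned bit) {
    unsigned v = prob;
    if (bit == 0)
        v += ((1u << kNumBitModelTotalBits) - v) >> kNumMoveBits;  // move toward 2048
    else
        v -= v >> kNumMoveBits;                                    // move toward 0
    return (CProb)v;
}
```

From the initial value 1024, a decoded 0 moves the counter to 1056 and a decoded 1 moves it to 992. Because of the shift by 5, a counter that starts at 1024 stays between 31 and 2017, so neither symbol is ever assigned a probability of zero.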
510 |
511 | The Binary Tree of bit model counters
512 | -------------------------------------
513 |
514 | LZMA uses a tree of Bit model variables to decode symbol that needs
515 | several bits for storing. There are two versions of such trees in LZMA:
516 | 1) the tree that decodes bits from high bit to low bit (the normal scheme).
517 | 2) the tree that decodes bits from low bit to high bit (the reverse scheme).
518 |
519 | Each binary tree structure supports different size of decoded symbol
520 | (the size of binary sequence that contains value of symbol).
521 | If that size of decoded symbol is "NumBits" bits, the tree structure
522 | uses the array of (1 << NumBits) counters of CProb type.
523 | But only ((1 << NumBits) - 1) items are used by encoder and decoder.
524 | The first item (the item with index equal to 0) in array is unused.
525 | That scheme with an unused array item simplifies the code.
526 |
527 | unsigned BitTreeReverseDecode(CProb *probs, unsigned numBits, CRangeDecoder *rc)
528 | {
529 | unsigned m = 1;
530 | unsigned symbol = 0;
531 | for (unsigned i = 0; i < numBits; i++)
532 | {
533 | unsigned bit = rc->DecodeBit(&probs[m]);
534 | m <<= 1;
535 | m += bit;
536 | symbol |= (bit << i);
537 | }
538 | return symbol;
539 | }
540 |
541 | template <unsigned NumBits>
542 | class CBitTreeDecoder
543 | {
544 | CProb Probs[(unsigned)1 << NumBits];
545 |
546 | public:
547 |
548 | void Init()
549 | {
550 | INIT_PROBS(Probs);
551 | }
552 |
553 | unsigned Decode(CRangeDecoder *rc)
554 | {
555 | unsigned m = 1;
556 | for (unsigned i = 0; i < NumBits; i++)
557 | m = (m << 1) + rc->DecodeBit(&Probs[m]);
558 | return m - ((unsigned)1 << NumBits);
559 | }
560 |
561 | unsigned ReverseDecode(CRangeDecoder *rc)
562 | {
563 | return BitTreeReverseDecode(Probs, NumBits, rc);
564 | }
565 | };
566 |
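To show how the two schemes differ, the following sketch replaces the Range Decoder with an already decoded list of bits. All three function names are hypothetical. Both schemes walk the counter array in the same way; only the way the bits are assembled into the symbol differs.

```cpp
#include <vector>

// Indices of the counters that are consulted while the bits are decoded.
inline std::vector<unsigned> TreeCounterIndices(const std::vector<unsigned> &bits) {
    std::vector<unsigned> indices;
    unsigned m = 1;                 // index 0 is never used
    for (unsigned bit : bits) {
        indices.push_back(m);
        m = (m << 1) + bit;
    }
    return indices;
}

// Normal scheme: the first decoded bit is the high bit of the symbol.
inline unsigned TreeSymbolNormal(const std::vector<unsigned> &bits) {
    unsigned m = 1;
    for (unsigned bit : bits)
        m = (m << 1) + bit;
    return m - (1u << bits.size());
}

// Reverse scheme: the first decoded bit is the low bit of the symbol.
inline unsigned TreeSymbolReverse(const std::vector<unsigned> &bits) {
    unsigned symbol = 0;
    for (unsigned i = 0; i < bits.size(); i++)
        symbol |= bits[i] << i;
    return symbol;
}
```

For the decoded bits 1, 1, 0 the counters with indices 1, 3 and 7 are used. The normal scheme gives the symbol 6 (binary 110) and the reverse scheme gives 3 (binary 011).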
567 |
568 | LZ part of LZMA
569 | ---------------
570 |
571 | LZ part of LZMA describes details about the decoding of MATCHES and LITERALS.
572 |
573 |
574 | The Literal Decoding
575 | --------------------
576 |
577 | The LZMA Decoder uses (1 << (lc + lp)) tables with CProb values, where
578 | each table contains 0x300 CProb values:
579 |
580 | CProb *LitProbs;
581 |
582 | void CreateLiterals()
583 | {
584 | LitProbs = new CProb[(UInt32)0x300 << (lc + lp)];
585 | }
586 |
587 | void InitLiterals()
588 | {
589 | UInt32 num = (UInt32)0x300 << (lc + lp);
590 | for (UInt32 i = 0; i < num; i++)
591 | LitProbs[i] = PROB_INIT_VAL;
592 | }
593 |
594 | To select the table for decoding it uses the context that consists of
595 | (lc) high bits from previous literal and (lp) low bits from value that
596 | represents current position in outputStream.
597 |
598 | If (state >= 7), the Literal Decoder also uses "matchByte" that represents
599 | the byte in OutputStream at the position that is DISTANCE bytes before the
600 | current position, where DISTANCE is the distance in the DISTANCE-LENGTH pair
601 | of the latest decoded match.
602 |
603 | The following code decodes one literal and puts it to Sliding Window buffer:
604 |
605 | void DecodeLiteral(unsigned state, UInt32 rep0)
606 | {
607 | unsigned prevByte = 0;
608 | if (!OutWindow.IsEmpty())
609 | prevByte = OutWindow.GetByte(1);
610 |
611 | unsigned symbol = 1;
612 | unsigned litState = ((OutWindow.TotalPos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc));
613 | CProb *probs = &LitProbs[(UInt32)0x300 * litState];
614 |
615 | if (state >= 7)
616 | {
617 | unsigned matchByte = OutWindow.GetByte(rep0 + 1);
618 | do
619 | {
620 | unsigned matchBit = (matchByte >> 7) & 1;
621 | matchByte <<= 1;
622 | unsigned bit = RangeDec.DecodeBit(&probs[((1 + matchBit) << 8) + symbol]);
623 | symbol = (symbol << 1) | bit;
624 | if (matchBit != bit)
625 | break;
626 | }
627 | while (symbol < 0x100);
628 | }
629 | while (symbol < 0x100)
630 | symbol = (symbol << 1) | RangeDec.DecodeBit(&probs[symbol]);
631 | OutWindow.PutByte((Byte)(symbol - 0x100));
632 | }
633 |
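The table selection used in DecodeLiteral() can be tried separately. The sketch below repeats the "litState" expression as a function with a hypothetical name; the selected table starts at offset (0x300 * litState) in LitProbs.

```cpp
#include <cstdint>

// Index of the literal probability table selected by the context:
// (lp) low bits of the position, then (lc) high bits of the previous byte.
inline unsigned LiteralState(unsigned lc, unsigned lp, uint64_t totalPos, uint8_t prevByte) {
    unsigned posPart = (unsigned)(totalPos & ((1u << lp) - 1));
    unsigned bytePart = (unsigned)prevByte >> (8 - lc);
    return (posPart << lc) + bytePart;
}
```

With the default lc = 3 and lp = 0, only the 3 high bits of the previous byte matter: the previous byte 0xE5 selects table 7 at any position. With lc = 0 and lp = 0 there is a single table.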
634 |
635 | The match length decoding
636 | -------------------------
637 |
638 | The match length decoder returns normalized (zero-based value)
639 | length of match. That value can be converted to real length of the match
640 | with the following code:
641 |
642 | #define kMatchMinLen 2
643 |
644 | matchLen = len + kMatchMinLen;
645 |
646 | The match length decoder can return values from 0 to 271,
647 | and the corresponding real match length values are in the range
648 | from 2 to 273.
649 |
650 | The following scheme is used for the match length encoding:
651 |
652 | Binary encoding Binary Tree structure Zero-based match length
653 | sequence (binary + decimal):
654 |
655 | 0 xxx LowCoder[posState] xxx
656 | 1 0 yyy MidCoder[posState] yyy + 8
657 | 1 1 zzzzzzzz HighCoder zzzzzzzz + 16
658 |
659 | LZMA uses bit model variable "Choice" to decode the first selection bit.
660 |
661 | If the first selection bit is equal to 0, the decoder uses binary tree
662 | LowCoder[posState] to decode 3-bit zero-based match length (xxx).
663 |
664 | If the first selection bit is equal to 1, the decoder uses bit model
665 | variable "Choice2" to decode the second selection bit.
666 |
667 | If the second selection bit is equal to 0, the decoder uses binary tree
668 | MidCoder[posState] to decode 3-bit "yyy" value, and zero-based match
669 | length is equal to (yyy + 8).
670 |
671 | If the second selection bit is equal to 1, the decoder uses binary tree
672 | HighCoder to decode 8-bit "zzzzzzzz" value, and zero-based
673 | match length is equal to (zzzzzzzz + 16).
674 |
675 | LZMA uses "posState" value as context to select the binary tree
676 | from LowCoder and MidCoder binary tree arrays:
677 |
678 | unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
679 |
680 | The full code of the length decoder:
681 |
682 | class CLenDecoder
683 | {
684 | CProb Choice;
685 | CProb Choice2;
686 | CBitTreeDecoder<3> LowCoder[1 << kNumPosBitsMax];
687 | CBitTreeDecoder<3> MidCoder[1 << kNumPosBitsMax];
688 | CBitTreeDecoder<8> HighCoder;
689 |
690 | public:
691 |
692 | void Init()
693 | {
694 | Choice = PROB_INIT_VAL;
695 | Choice2 = PROB_INIT_VAL;
696 | HighCoder.Init();
697 | for (unsigned i = 0; i < (1 << kNumPosBitsMax); i++)
698 | {
699 | LowCoder[i].Init();
700 | MidCoder[i].Init();
701 | }
702 | }
703 |
704 | unsigned Decode(CRangeDecoder *rc, unsigned posState)
705 | {
706 | if (rc->DecodeBit(&Choice) == 0)
707 | return LowCoder[posState].Decode(rc);
708 | if (rc->DecodeBit(&Choice2) == 0)
709 | return 8 + MidCoder[posState].Decode(rc);
710 | return 16 + HighCoder.Decode(rc);
711 | }
712 | };
713 |
714 | The LZMA decoder uses two instances of CLenDecoder class.
715 | The first instance is for the matches of "Simple Match" type,
716 | and the second instance is for the matches of "Rep Match" type:
717 |
718 | CLenDecoder LenDecoder;
719 | CLenDecoder RepLenDecoder;
720 |
721 |
722 | The match distance decoding
723 | ---------------------------
724 |
725 | LZMA supports dictionary sizes up to 4 GiB minus 1.
726 | The value of match distance (decoded by distance decoder) can be
727 | from 1 to 2^32. But the distance value that is equal to 2^32 is used to
728 | indicate the "End of stream" marker. So real largest match distance
729 | that is used for LZ-window match is (2^32 - 1).
730 |
731 | LZMA uses normalized match length (zero-based length)
732 | to calculate the context state "lenState" to decode the distance value:
733 |
734 | #define kNumLenToPosStates 4
735 |
736 | unsigned lenState = len;
737 | if (lenState > kNumLenToPosStates - 1)
738 | lenState = kNumLenToPosStates - 1;
739 |
740 | The distance decoder returns the "dist" value that is zero-based value
741 | of match distance. The real match distance can be calculated with the
742 | following code:
743 |
744 | matchDistance = dist + 1;
745 |
746 | The state of the distance decoder and the initialization code:
747 |
748 | #define kEndPosModelIndex 14
749 | #define kNumFullDistances (1 << (kEndPosModelIndex >> 1))
750 | #define kNumAlignBits 4
751 |
752 | CBitTreeDecoder<6> PosSlotDecoder[kNumLenToPosStates];
753 | CProb PosDecoders[1 + kNumFullDistances - kEndPosModelIndex];
754 | CBitTreeDecoder<kNumAlignBits> AlignDecoder;
755 |
756 | void InitDist()
757 | {
758 | for (unsigned i = 0; i < kNumLenToPosStates; i++)
759 | PosSlotDecoder[i].Init();
760 | AlignDecoder.Init();
761 | INIT_PROBS(PosDecoders);
762 | }
763 |
764 | At first stage the distance decoder decodes 6-bit "posSlot" value with bit
765 | tree decoder from PosSlotDecoder array. It's possible to get 2^6=64 different
766 | "posSlot" values.
767 |
768 | unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
769 |
770 | The encoding scheme for distance value is shown in the following table:
771 |
772 | posSlot (decimal) /
773 | zero-based distance (binary)
774 | 0 0
775 | 1 1
776 | 2 10
777 | 3 11
778 |
779 | 4 10 x
780 | 5 11 x
781 | 6 10 xx
782 | 7 11 xx
783 | 8 10 xxx
784 | 9 11 xxx
785 | 10 10 xxxx
786 | 11 11 xxxx
787 | 12 10 xxxxx
788 | 13 11 xxxxx
789 |
790 | 14 10 yy zzzz
791 | 15 11 yy zzzz
792 | 16 10 yyy zzzz
793 | 17 11 yyy zzzz
794 | ...
795 | 62 10 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz
796 | 63 11 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz
797 |
798 | where
799 | "x ... x" means the sequence of binary symbols encoded with binary tree and
800 | "Reverse" scheme. It uses separated binary tree for each posSlot from 4 to 13.
801 | "y" means direct bit encoded with range coder.
802 | "zzzz" means the sequence of four binary symbols encoded with binary
803 | tree with "Reverse" scheme, where one common binary tree "AlignDecoder"
804 | is used for all posSlot values.
805 |
806 | If (posSlot < 4), the "dist" value is equal to posSlot value.
807 |
808 | If (posSlot >= 4), the decoder uses "posSlot" value to calculate the value of
809 | the high bits of "dist" value and the number of the low bits.
810 |
811 | If (4 <= posSlot < kEndPosModelIndex), the decoder uses bit tree decoders
812 | (one separate bit tree decoder per posSlot value) and the "Reverse" scheme.
813 | In this implementation we use one CProb array "PosDecoders" that contains
814 | all CProb variables for all these bit decoders.
815 |
816 | If (posSlot >= kEndPosModelIndex), the middle bits are decoded as direct
817 | bits from RangeDecoder and the low 4 bits are decoded with a bit tree
818 | decoder "AlignDecoder" with "Reverse" scheme.
819 |
820 | The code to decode zero-based match distance:
821 |
822 | unsigned DecodeDistance(unsigned len)
823 | {
824 | unsigned lenState = len;
825 | if (lenState > kNumLenToPosStates - 1)
826 | lenState = kNumLenToPosStates - 1;
827 |
828 | unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);
829 | if (posSlot < 4)
830 | return posSlot;
831 |
832 | unsigned numDirectBits = (unsigned)((posSlot >> 1) - 1);
833 | UInt32 dist = ((2 | (posSlot & 1)) << numDirectBits);
834 | if (posSlot < kEndPosModelIndex)
835 | dist += BitTreeReverseDecode(PosDecoders + dist - posSlot, numDirectBits, &RangeDec);
836 | else
837 | {
838 | dist += RangeDec.DecodeDirectBits(numDirectBits - kNumAlignBits) << kNumAlignBits;
839 | dist += AlignDecoder.ReverseDecode(&RangeDec);
840 | }
841 | return dist;
842 | }
843 |
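The table of posSlot values above can be reproduced with a small sketch that reports, for one posSlot, the fixed high bits of the distance and how the remaining bits are split. The PosSlotLayout struct and the DescribePosSlot name are hypothetical, and the helper is only meaningful for posSlot values from 4 to 63.

```cpp
#include <cstdint>

const unsigned kEndPosModelIndex = 14;
const unsigned kNumAlignBits = 4;

// Hypothetical description of how one posSlot value (4..63) is laid out.
struct PosSlotLayout {
    uint32_t base;            // zero-based distance with all low bits equal to 0
    unsigned numTreeBits;     // low bits decoded with a reverse bit tree
    unsigned numDirectBits;   // middle bits decoded as direct bits
};

// Precondition: 4 <= posSlot <= 63.
inline PosSlotLayout DescribePosSlot(unsigned posSlot) {
    unsigned numLowBits = (posSlot >> 1) - 1;
    PosSlotLayout r;
    r.base = (uint32_t)(2 | (posSlot & 1)) << numLowBits;
    if (posSlot < kEndPosModelIndex) {
        r.numTreeBits = numLowBits;       // "x ... x" in the table
        r.numDirectBits = 0;
    } else {
        r.numTreeBits = kNumAlignBits;    // "zzzz" in the table
        r.numDirectBits = numLowBits - kNumAlignBits;   // "y ... y" in the table
    }
    return r;
}
```

For posSlot = 14 the base is 128, with 2 direct bits and 4 align bits. For posSlot = 63 the base is (3 << 30) with 26 direct bits, so the largest zero-based distance is 0xFFFFFFFF, the value reserved for the "End of stream" marker.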
844 |
845 |
846 | LZMA Decoding modes
847 | -------------------
848 |
849 | There are 2 types of LZMA streams:
850 |
851 | 1) The stream with "End of stream" marker.
852 | 2) The stream without "End of stream" marker.
853 |
854 | And the LZMA Decoder supports 3 modes of decoding:
855 |
856 | 1) The unpack size is undefined. The LZMA decoder stops decoding after
857 | getting "End of stream" marker.
858 | The input variables for that case:
859 |
860 | markerIsMandatory = true
861 | unpackSizeDefined = false
862 | unpackSize contains any value
863 |
864 | 2) The unpack size is defined and LZMA decoder supports both variants,
865 | where the stream can contain "End of stream" marker or the stream is
866 | finished without "End of stream" marker. The LZMA decoder must detect
867 | any of these situations.
868 | The input variables for that case:
869 |
870 | markerIsMandatory = false
871 | unpackSizeDefined = true
872 | unpackSize contains unpack size
873 |
874 | 3) The unpack size is defined and the LZMA stream must contain
875 | "End of stream" marker
876 | The input variables for that case:
877 |
878 | markerIsMandatory = true
879 | unpackSizeDefined = true
880 | unpackSize contains unpack size
881 |
882 |
883 | The main loop of decoder
884 | ------------------------
885 |
886 | The main loop of LZMA decoder:
887 |
888 | Initialize the LZMA state.
889 | loop
890 | {
891 | // begin of loop
892 | Check "end of stream" conditions.
893 | Decode Type of MATCH / LITERAL.
894 | If it's LITERAL, decode LITERAL value and put the LITERAL to Window.
895 | If it's MATCH, decode the length of match and the match distance.
896 | Check error conditions, check end of stream conditions and copy
897 | the sequence of match bytes from sliding window to current position
898 | in window.
899 | Go to begin of loop
900 | }
901 |
902 | The reference implementation of LZMA decoder uses "unpackSize" variable
903 | to keep the number of remaining bytes in output stream. So it reduces
904 | "unpackSize" value after each decoded LITERAL or MATCH.
905 |
906 | The following code contains the "end of stream" condition check at the start
907 | of the loop:
908 |
909 | if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory)
910 | if (RangeDec.IsFinishedOK())
911 | return LZMA_RES_FINISHED_WITHOUT_MARKER;
912 |
913 | LZMA uses three types of matches:
914 |
915 | 1) "Simple Match" - the match with distance value encoded with bit models.
916 |
917 | 2) "Rep Match" - the match that uses the distance from distance
918 | history table.
919 |
920 | 3) "Short Rep Match" - the match of single byte length, that uses the latest
921 | distance from distance history table.
922 |
923 | The LZMA decoder keeps the history of latest 4 match distances that were used
924 | by decoder. That set of 4 variables contains zero-based match distances and
925 | these variables are initialized with zero values:
926 |
927 | UInt32 rep0 = 0, rep1 = 0, rep2 = 0, rep3 = 0;
928 |
929 | The LZMA decoder uses binary model variables to select type of MATCH or LITERAL:
930 |
931 | #define kNumStates 12
932 | #define kNumPosBitsMax 4
933 |
934 | CProb IsMatch[kNumStates << kNumPosBitsMax];
935 | CProb IsRep[kNumStates];
936 | CProb IsRepG0[kNumStates];
937 | CProb IsRepG1[kNumStates];
938 | CProb IsRepG2[kNumStates];
939 | CProb IsRep0Long[kNumStates << kNumPosBitsMax];
940 |
941 | The decoder uses "state" variable value to select exact variable
942 | from "IsRep", "IsRepG0", "IsRepG1" and "IsRepG2" arrays.
943 | The "state" variable can get the value from 0 to 11.
944 | Initial value for "state" variable is zero:
945 |
946 | unsigned state = 0;
947 |
948 | The "state" variable is updated after each LITERAL or MATCH with one of the
949 | following functions:
950 |
951 | unsigned UpdateState_Literal(unsigned state)
952 | {
953 | if (state < 4) return 0;
954 | else if (state < 10) return state - 3;
955 | else return state - 6;
956 | }
957 | unsigned UpdateState_Match (unsigned state) { return state < 7 ? 7 : 10; }
958 | unsigned UpdateState_Rep (unsigned state) { return state < 7 ? 8 : 11; }
959 | unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }
960 |
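The effect of these functions is easier to see on a sequence of events. The sketch below restates the four update rules so that it is self-contained and adds a hypothetical TraceState() helper that applies a string of events: 'L' for LITERAL, 'M' for Simple Match, 'R' for Rep Match and 'S' for Short Rep Match.

```cpp
#include <stdexcept>
#include <string>

// The four update rules from the text above.
inline unsigned UpdateState_Literal(unsigned state) {
    if (state < 4) return 0;
    else if (state < 10) return state - 3;
    else return state - 6;
}
inline unsigned UpdateState_Match(unsigned state)    { return state < 7 ? 7 : 10; }
inline unsigned UpdateState_Rep(unsigned state)      { return state < 7 ? 8 : 11; }
inline unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }

// Applies a sequence of events to "state" and returns the final state.
inline unsigned TraceState(unsigned state, const std::string &events) {
    for (char e : events) {
        switch (e) {
            case 'L': state = UpdateState_Literal(state); break;
            case 'M': state = UpdateState_Match(state); break;
            case 'R': state = UpdateState_Rep(state); break;
            case 'S': state = UpdateState_ShortRep(state); break;
            default: throw std::invalid_argument("unknown event");
        }
    }
    return state;
}
```

Starting from state 0, the events "M", "ML", "MLL" and "MLLL" give the states 7, 4, 1 and 0. After any match the state is 7 or larger, and after any literal it is smaller than 7.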
961 | The decoder calculates "state2" variable value to select exact variable from
962 | "IsMatch" and "IsRep0Long" arrays:
963 |
964 | unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);
965 | unsigned state2 = (state << kNumPosBitsMax) + posState;
966 |
967 | The decoder uses the following code flow scheme to select exact
968 | type of LITERAL or MATCH:
969 |
970 | IsMatch[state2] decode
971 | 0 - the Literal
972 | 1 - the Match
973 | IsRep[state] decode
974 | 0 - Simple Match
975 | 1 - Rep Match
976 | IsRepG0[state] decode
977 | 0 - the distance is rep0
978 | IsRep0Long[state2] decode
979 | 0 - Short Rep Match
980 | 1 - Rep Match 0
981 | 1 -
982 | IsRepG1[state] decode
983 | 0 - Rep Match 1
984 | 1 -
985 | IsRepG2[state] decode
986 | 0 - Rep Match 2
987 | 1 - Rep Match 3
988 |
989 |
990 | LITERAL symbol
991 | --------------
992 | If the value "0" was decoded with IsMatch[state2] decoding, we have "LITERAL" type.
993 |
994 | At first the LZMA decoder must check that it doesn't exceed
995 | specified uncompressed size:
996 |
997 | if (unpackSizeDefined && unpackSize == 0)
998 | return LZMA_RES_ERROR;
999 |
1000 | Then it decodes literal value and puts it to sliding window:
1001 |
1002 | DecodeLiteral(state, rep0);
1003 |
1004 | Then the decoder must update the "state" value and "unpackSize" value:
1005 |
1006 | state = UpdateState_Literal(state);
1007 | unpackSize--;
1008 |
1009 | Then the decoder must return to the beginning of the main loop to decode the next Match or Literal.
1010 |
1011 |
1012 | Simple Match
1013 | ------------
1014 |
1015 | If the value "1" was decoded with IsMatch[state2] decoding and then the
1016 | value "0" was decoded with IsRep[state] decoding, we have the "Simple Match" type.
1017 |
1018 | The distance history table is updated with the following scheme:
1019 |
1020 | rep3 = rep2;
1021 | rep2 = rep1;
1022 | rep1 = rep0;
1023 |
1024 | The zero-based length is decoded with "LenDecoder":
1025 |
1026 | len = LenDecoder.Decode(&RangeDec, posState);
1027 |
1028 | The state is updated with the UpdateState_Match function:
1029 |
1030 | state = UpdateState_Match(state);
1031 |
1032 | and the new "rep0" value is decoded with DecodeDistance:
1033 |
1034 | rep0 = DecodeDistance(len);
1035 |
1036 | That "rep0" will be used as zero-based distance for current match.
1037 |
1038 | If the value of "rep0" is equal to 0xFFFFFFFF, it means that we have
1039 | "End of stream" marker, so we can stop decoding and check finishing
1040 | condition in Range Decoder:
1041 |
1042 | if (rep0 == 0xFFFFFFFF)
1043 | return RangeDec.IsFinishedOK() ?
1044 | LZMA_RES_FINISHED_WITH_MARKER :
1045 | LZMA_RES_ERROR;
1046 |
1047 | If uncompressed size is defined, LZMA decoder must check that it doesn't
1048 | exceed that specified uncompressed size:
1049 |
1050 | if (unpackSizeDefined && unpackSize == 0)
1051 | return LZMA_RES_ERROR;
1052 |
1053 | Also the decoder must check that "rep0" value is not larger than dictionary size
1054 | and is not larger than the number of already decoded bytes:
1055 |
1056 | if (rep0 >= dictSize || !OutWindow.CheckDistance(rep0))
1057 | return LZMA_RES_ERROR;
1058 |
1059 | Then the decoder must copy match bytes as described in
1060 | "The match symbols copying" section.
1061 |
1062 |
1063 | Rep Match
1064 | ---------
1065 |
1066 | If the LZMA decoder has decoded the value "1" with the IsRep[state] variable,
1067 | we have the "Rep Match" type.
1068 |
1069 | First, the LZMA decoder must check that it does not exceed the
1070 | specified uncompressed size:
1071 |
1072 |   if (unpackSizeDefined && unpackSize == 0)
1073 |     return LZMA_RES_ERROR;
1074 |
1075 | Also the decoder must return an error if the LZ window is empty:
1076 |
1077 |   if (OutWindow.IsEmpty())
1078 |     return LZMA_RES_ERROR;
1079 |
1080 | If the match type is "Rep Match", the decoder uses one of the 4 variables of the
1081 | distance history table to get the distance for the current match.
1082 | There are 4 corresponding decoding paths.
1083 |
1084 | The decoder updates the distance history with the following scheme,
1085 | depending on the type of match:
1086 |
1087 | - "Rep Match 0" or "Short Rep Match":
1088 |     ; LZMA doesn't update the distance history
1089 |
1090 | - "Rep Match 1":
1091 |     UInt32 dist = rep1;
1092 |     rep1 = rep0;
1093 |     rep0 = dist;
1094 |
1095 | - "Rep Match 2":
1096 |     UInt32 dist = rep2;
1097 |     rep2 = rep1;
1098 |     rep1 = rep0;
1099 |     rep0 = dist;
1100 |
1101 | - "Rep Match 3":
1102 |     UInt32 dist = rep3;
1103 |     rep3 = rep2;
1104 |     rep2 = rep1;
1105 |     rep1 = rep0;
1106 |     rep0 = dist;
1107 |
1108 | The decoder determines the exact subtype of "Rep Match" using "IsRepG0", "IsRep0Long",
1109 | "IsRepG1" and "IsRepG2".
1110 |
1111 | If the subtype is "Short Rep Match", the decoder updates the state, copies
1112 | one byte from the window at distance "rep0" to the current position in the window
1113 | and goes to the next MATCH/LITERAL symbol (the beginning of the main loop):
1114 |
1115 |   state = UpdateState_ShortRep(state);
1116 |   OutWindow.PutByte(OutWindow.GetByte(rep0 + 1));
1117 |   unpackSize--;
1118 |   continue;
1119 |
1120 | In the other cases (Rep Match 0/1/2/3), it decodes the zero-based
1121 | length of the match with the "RepLenDecoder" decoder:
1122 |
1123 |   len = RepLenDecoder.Decode(&RangeDec, posState);
1124 |
1125 | Then it updates the state:
1126 |
1127 |   state = UpdateState_Rep(state);
1128 |
1129 | Then the decoder must copy the match bytes as described in
1130 | "The match symbols copying" section.
1131 |
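The four history update schemes above share one pattern: the selected distance moves to the front and the more recent entries shift back by one place. A minimal sketch of that pattern follows. The function name "PromoteRep" and the array representation are assumptions made for illustration. The specification itself keeps four separate variables rep0..rep3.

```cpp
#include <cassert>
#include <cstdint>

// rep[0]..rep[3] stand for rep0..rep3 (assumed array form of the history).
// index is the number in "Rep Match N"; index 0 leaves the history unchanged.
void PromoteRep(std::uint32_t (&rep)[4], unsigned index) {
    assert(index < 4);
    const std::uint32_t dist = rep[index];
    // Shift the more recent distances back by one slot.
    for (unsigned i = index; i > 0; --i)
        rep[i] = rep[i - 1];
    rep[0] = dist;
}
```

With a history of {10, 20, 30, 40}, "Rep Match 2" gives {30, 10, 20, 40}, which matches the explicit assignments listed for that subtype.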
1132 |
1133 | The match symbols copying
1134 | -------------------------
1135 |
1136 | If we have a match (Simple Match or Rep Match 0/1/2/3), the decoder must
1137 | copy the sequence of bytes with the calculated match distance and match length.
1138 | If the uncompressed size is defined, the LZMA decoder must check that it
1139 | does not exceed that specified uncompressed size:
1140 |
1141 |   len += kMatchMinLen;
1142 |   bool isError = false;
1143 |   if (unpackSizeDefined && unpackSize < len)
1144 |   {
1145 |     len = (unsigned)unpackSize;
1146 |     isError = true;
1147 |   }
1148 |   OutWindow.CopyMatch(rep0 + 1, len);
1149 |   unpackSize -= len;
1150 |   if (isError)
1151 |     return LZMA_RES_ERROR;
1152 |
1153 | Then the decoder must return to the beginning of the main loop to decode the next MATCH or LITERAL.
1154 |
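The copying step can be sketched with a simplified window. This is an illustration under stated assumptions, not the reference implementation: "SimpleWindow" is a hypothetical growing buffer with no circular wrap and no dictionary size limit, and "CopyMatchChecked" is a hypothetical helper that expects "len" to already include kMatchMinLen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified output window: keeps every decoded byte.
struct SimpleWindow {
    std::vector<std::uint8_t> data;

    void PutByte(std::uint8_t b) { data.push_back(b); }

    // dist is 1-based here, like the "rep0 + 1" argument in the text.
    void CopyMatch(std::uint32_t dist, unsigned len) {
        assert(dist >= 1 && dist <= data.size());
        // Copy byte by byte so that an overlapping match (len > dist)
        // repeats the bytes it has just written.
        for (; len > 0; --len) {
            const std::uint8_t b = data[data.size() - dist];
            data.push_back(b);
        }
    }
};

// Copies the match, truncating it at the declared uncompressed size.
// Returns false when the match had to be truncated (the error case).
bool CopyMatchChecked(SimpleWindow& w, std::uint32_t rep0, unsigned len,
                      bool unpackSizeDefined, std::uint64_t& unpackSize) {
    bool isError = false;
    if (unpackSizeDefined && unpackSize < len) {
        len = static_cast<unsigned>(unpackSize);
        isError = true;
    }
    w.CopyMatch(rep0 + 1, len);
    unpackSize -= len;
    return !isError;
}
```

If the window holds "ab" and a match with rep0 = 1 and a length of 5 is copied, the window becomes "abababa". The copy is still performed in the error case, so the output contains exactly the declared number of bytes before the decoder reports the error.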
1155 |
1156 |
1157 | NOTES
1158 | -----
1159 |
1160 | This specification doesn't describe the variant of decoder implementation
1161 | that supports partial decoding. Partial decoding can require some
1162 | changes in the "end of stream" condition checks. Such code can also
1163 | use additional status codes returned by the decoder.
1164 |
1165 | This specification uses C++ code with templates to simplify the description.
1166 | An optimized version of the LZMA decoder doesn't need templates.
1167 | Such an optimized version can use just two arrays of CProb variables:
1168 |   1) The dynamic array of CProb variables allocated for the Literal Decoder.
1169 |   2) The one common array that contains all other CProb variables.
1170 |
1171 |
1172 | References:
1173 |
1174 | 1. G. N. N. Martin, Range encoding: an algorithm for removing redundancy
1175 | from a digitized message, Video & Data Recording Conference,
1176 | Southampton, UK, July 24-27, 1979.
1177 |
--------------------------------------------------------------------------------
/lzmaSh2.cpp:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 |
3 | typedef unsigned int uint;
4 | typedef unsigned char byte;
5 | typedef unsigned long long qword;
6 |
7 | struct lzma_decode {
8 |
9 | enum { SCALElog=11, SCALE=1<>5) : ((SCALE-P)>>5); }
15 | };
16 |
17 | FILE *f, *g;
18 | byte get( void ) { return getc(f); }
19 | void put( uint c ) { putc(c,g); }
20 |
21 | uint range, code;
22 |
23 | void rc_Init( void ) {
24 | code = get(); for( uint k=0; k<4; k++ ) code = (code<<8) | get(); // sequenced reads; byte 0 (zero in a valid stream) is shifted out
25 | range = 0xFFFFFFFF;
26 | }
27 |
28 | uint rc_Bits( uint l ) {
29 | uint x=0; do {
30 | if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
31 | range &= ~1;
32 | uint rnew = (range>>1) * 1;
33 | uint bit = code >= rnew;
34 | range = bit ? code-=rnew,range-rnew : rnew;
35 | x += x + bit;
36 | } while( --l!=0 );
37 | return x;
38 | }
39 |
40 | uint rc_Decode( uint P ) {
41 | if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
42 | uint rnew = (range >> SCALElog) * P;
43 | uint bit = code >= rnew;
44 | range = bit ? code-=rnew,range-rnew : rnew;
45 | return bit;
46 | }
47 |
48 | uint BIT( Counter& cc ) {
49 | uint bit = rc_Decode(cc.P);
50 | cc.Update( bit );
51 | return bit;
52 | }
53 |
54 | enum {
55 | kNumLPosBitsMax = 4,
56 | kNumPosBitsMax = 4, kNumPosStatesMax = (1<>1)),
62 | kNumPosSlotBits = 6, kNumLenToPosStates = 4,
63 | kNumAlignBits = 4, kAlignTableSize = (1<>8) | (qword(get())<<56);
96 | rc_Init();
97 | lc = d % 9; d /= 9;
98 | pb = d / 5; lp = d % 5;
99 |
100 | for( i=0; i<32; i++ ) rbit5[i] = ((i*0x0802&0x22110)|(i*0x8020&0x88440))*0x10101 >> 16+3;
101 |
102 | uint state=0,rep0=1,rep1=1,rep2=1,rep3=1;
103 | uint dicPos = 0, dicBufSize = dictSize;
104 | uint pbMask = (1<>lc8];
117 | if( state>=kNumLitStates ) {
118 | uint matchbyte = 0x100 + dic[rep0pos()];
119 | for( sym=1; sym<0x100; ) {
120 | uint mbprefix = (matchbyte<<=1) >> 8;
121 | sym += sym + BIT(cc[1+(mbprefix&1)][sym]);
122 | if( mbprefix!=sym ) break;
123 | }
124 | } else sym=1;
125 | for(; sym<0x100; sym+=sym+BIT(cc[0][sym]) );
126 |
127 | symstore(sym);
128 | state = (state<4) ? 0 : (state<10) ? state-3 : state-6;
129 |
130 | } else {
131 |
132 | uint f_rep = BIT(c_IsRep[state]);
133 |
134 | if( f_rep==0 ) state += kNumStates; else {
135 |
136 | if( BIT(c_IsRepG0[state])==0 ) {
137 |
138 | if( BIT(c_IsRep0Long[state][posState])==0 ) {
139 | sym = dic[rep0pos()]; symstore(sym);
140 | state = state < kNumLitStates ? 9 : 11;
141 | continue;
142 | }
143 |
144 | } else {
145 |
146 | dist = rep1;
147 | if( BIT(c_IsRepG1[state]) ) {
148 | dist = rep2;
149 | if( BIT(c_IsRepG2[state]) ) dist = rep3, rep3 = rep2;
150 | rep2 = rep1;
151 | }
152 | rep1 = rep0; rep0 = dist;
153 | }
154 |
155 | state = state < kNumLitStates ? 8 : 11;
156 | }
157 |
158 | uint limit, offset;
159 | Counter* clen = 0;
160 | if( BIT(c_LenChoice[f_rep])==0 ) {
161 | clen = &c_LenLow[f_rep][posState][0];
162 | offset = 0; limit = (1 << kLenNumLowBits);
163 | } else {
164 | if( BIT(c_LenChoice2[f_rep])==0 ) {
165 | clen = &c_LenMid[f_rep][posState][0];
166 | offset = kLenNumLowSymbols; limit = (1<=kNumStates ) {
176 | Counter (&cpos)[1<=kStartPosModelIndex ) {
180 | uint posSlot = dist;
181 | int numDirectBits = (dist>>1) - 1; // 13/2-1=5 max
182 | dist = (2 | (dist & 1));
183 |
184 | if( posSlot0; len-- ) {
206 | sym = dic[pos]; symstore(sym);
207 | if( ++pos == dicBufSize ) pos=0;
208 | }
209 |
210 | } // match
211 |
212 | } // for
213 |
214 | }
215 |
216 | };
217 |
218 | int main( int argc, char** argv ) {
219 | if( argc<3 ) return 1;
220 | FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
221 | FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
222 |
223 | static lzma_decode D( f, g );
224 |
225 | fclose( f );
226 | fclose( g );
227 | return 0;
228 | }
229 |
--------------------------------------------------------------------------------
/lzmaSh2.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/lzmaSh2.exe
--------------------------------------------------------------------------------
/lzmaSh2a.cpp:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 |
3 | typedef unsigned int uint;
4 | typedef unsigned char byte;
5 | typedef unsigned long long qword;
6 |
7 | const byte statemap[][7] = {
8 | { 7, 0, 8, 9, 8, 8, 8, },
9 | { 7, 0, 8, 9, 8, 8, 8, },
10 | { 7, 0, 8, 9, 8, 8, 8, },
11 | { 7, 0, 8, 9, 8, 8, 8, },
12 | { 7, 1, 8, 9, 8, 8, 8, },
13 | { 7, 2, 8, 9, 8, 8, 8, },
14 | { 7, 3, 8, 9, 8, 8, 8, },
15 | { 10, 4, 11, 11, 11, 11, 11, },
16 | { 10, 5, 11, 11, 11, 11, 11, },
17 | { 10, 6, 11, 11, 11, 11, 11, },
18 | { 10, 4, 11, 11, 11, 11, 11, },
19 | { 10, 5, 11, 11, 11, 11, 11, },
20 | };
21 |
22 | struct lzma_decode {
23 |
24 | enum { SCALElog=11, SCALE=1<>5) : ((SCALE-P)>>5); }
30 | };
31 |
32 | FILE *f, *g;
33 | byte get( void ) { return getc(f); }
34 | void put( uint c ) { putc(c,g); }
35 |
36 | uint range, code;
37 |
38 | void rc_Init( void ) {
39 | code = get(); for( uint k=0; k<4; k++ ) code = (code<<8) | get(); // sequenced reads; byte 0 (zero in a valid stream) is shifted out
40 | range = 0xFFFFFFFF;
41 | }
42 |
43 | uint rc_Bits( uint l ) {
44 | uint x=0; do {
45 | if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
46 | range &= ~1;
47 | uint rnew = (range>>1) * 1;
48 | uint bit = code >= rnew;
49 | range = bit ? code-=rnew,range-rnew : rnew;
50 | x += x + bit;
51 | } while( --l!=0 );
52 | return x;
53 | }
54 |
55 | uint rc_Decode( uint P ) {
56 | if( range<0x01000000 ) range<<=8, code=(code<<8) | get();
57 | uint rnew = (range >> SCALElog) * P;
58 | uint bit = code >= rnew;
59 | range = bit ? code-=rnew,range-rnew : rnew;
60 | return bit;
61 | }
62 |
63 | uint BIT( Counter& cc ) {
64 | uint bit = rc_Decode(cc.P);
65 | cc.Update( bit );
66 | return bit;
67 | }
68 |
69 | enum {
70 | kNumLPosBitsMax = 4,
71 | kNumPosBitsMax = 4, kNumPosStatesMax = (1<>1)),
77 | kNumPosSlotBits = 6, kNumLenToPosStates = 4,
78 | kNumAlignBits = 4, kAlignTableSize = (1<>8) | (qword(get())<<56);
112 | rc_Init();
113 | lc = d % 9; d /= 9;
114 | pb = d / 5; lp = d % 5;
115 |
116 | for( i=0; i<32; i++ ) rbit5[i] = ((i*0x0802&0x22110)|(i*0x8020&0x88440))*0x10101 >> 16+3;
117 |
118 | uint state=0,rep0=1,rep1=1,rep2=1,rep3=1;
119 | uint dicPos = 0, dicBufSize = dictSize;
120 | uint pbMask = (1<>lc8];
143 |
144 | if( state>=kNumLitStates ) {
145 | uint matchbyte = 0x100 + dic[rep0pos()];
146 | for( sym=1; sym<0x100; ) {
147 | uint mbprefix = (matchbyte<<=1) >> 8;
148 | sym += sym + BIT(cc[1+(mbprefix&1)][sym]);
149 | if( mbprefix!=sym ) break;
150 | }
151 | } else sym=1;
152 | for(; sym<0x100; sym+=sym+BIT(cc[0][sym]) );
153 | }
154 |
155 | symstore(sym);
156 |
157 | } else {
158 |
159 | uint f_rep = (id!=id_match);
160 |
161 | if( f_rep ) {
162 | if( id!=id_r0 ) {
163 | dist = rep1;
164 | if( id!=id_r1 ) {
165 | dist = rep2;
166 | if( id==id_r3 ) dist = rep3, rep3 = rep2;
167 | rep2 = rep1;
168 | }
169 | rep1 = rep0; rep0 = dist;
170 | }
171 | }
172 |
173 | if( BIT(c_LenChoice[f_rep])==0 ) i_len=0; else
174 | if( BIT(c_LenChoice2[f_rep])==0 ) i_len=1; else i_len=2;
175 |
176 | uint limit, offset;
177 | if( i_len==0 ) {
178 | clen = &c_LenLow[f_rep][posState][0];
179 | offset = 0; limit = (1 << kLenNumLowBits);
180 | } else {
181 | if( i_len==1 ) {
182 | clen = &c_LenMid[f_rep][posState][0];
183 | offset = kLenNumLowSymbols; limit = (1<=kStartPosModelIndex ) {
201 | uint posSlot = dist;
202 | int numDirectBits = (dist>>1) - 1; // 13/2-1=5 max
203 | dist = (2 | (dist & 1));
204 |
205 | if( posSlot0; len-- ) {
225 | sym = dic[pos]; symstore(sym);
226 | if( ++pos == dicBufSize ) pos=0;
227 | }
228 |
229 | } // match
230 |
231 | } // for
232 |
233 | }
234 |
235 | };
236 |
237 | int main( int argc, char** argv ) {
238 | if( argc<3 ) return 1;
239 | FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
240 | FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
241 |
242 | static lzma_decode D( f, g );
243 |
244 | fclose( f );
245 | fclose( g );
246 | return 0;
247 | }
248 |
--------------------------------------------------------------------------------
/lzmaSh2a.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shelwien/lzma_sh/0e916adc087e029ad5afd1567bf0ca03a5574a1b/lzmaSh2a.exe
--------------------------------------------------------------------------------
/lzmaspec-readme.txt:
--------------------------------------------------------------------------------
1 | LZMA Specification
2 | ------------------
3 |
4 | This package contains:
5 |
6 | - LZMA Specification
7 | - LZMA Reference Decoder in C++
8 | - The folder with examples of lzma archives
9 |
10 | Note that the LZMA Reference Decoder is not optimized for speed.
11 | If you need code optimized for speed, use the LZMA Decoder from the LZMA SDK.
12 |
13 | If you find a bug in the code or an error in the text of the specification,
14 | you can send a message to Igor Pavlov in the support forum
15 | or via the SourceForge email message system:
16 |
17 | http://www.7-zip.org/support.html
18 |
19 |
20 | ---
21 | Igor Pavlov
22 | http://www.7-zip.org
23 |
--------------------------------------------------------------------------------
/test.bat:
--------------------------------------------------------------------------------
1 | @echo off
2 |
3 | del 1 2
4 | lzmaSh2.exe geo_lzma 1
5 | lzmaSh2a.exe geo_lzma 2
6 | fc /b 1 2
7 |
8 |
--------------------------------------------------------------------------------