├── .gitattributes ├── .gitignore ├── Compress.c ├── Custom.h ├── Decompression.txt ├── Huffman.c ├── LICENSE ├── Main.c ├── Node.c ├── README.md ├── Uncompress.c └── 截图 ├── clip_image001.png ├── clip_image003.png ├── clip_image005.png ├── clip_image007.png ├── clip_image009.png ├── clip_image011.png ├── clip_image013.png ├── clip_image015.png ├── clip_image016.jpg ├── clip_image017.jpg ├── clip_image018.jpg ├── clip_image019.jpg ├── clip_image020.jpg ├── clip_image021.jpg ├── clip_image022.jpg ├── clip_image023.jpg ├── clip_image024.jpg ├── clip_image025.jpg ├── clip_image026.jpg ├── clip_image027.jpg ├── clip_image028.jpg └── clip_image029.jpg /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Prerequisites 2 | *.d 3 | 4 | # Object files 5 | *.o 6 | *.ko 7 | *.obj 8 | *.elf 9 | 10 | # Linker output 11 | *.ilk 12 | *.map 13 | *.exp 14 | 15 | # Precompiled Headers 16 | *.gch 17 | *.pch 18 | 19 | # Libraries 20 | *.lib 21 | *.a 22 | *.la 23 | *.lo 24 | 25 | # Shared objects (inc. 
Windows DLLs) 26 | *.dll 27 | *.so 28 | *.so.* 29 | *.dylib 30 | 31 | # Executables 32 | *.exe 33 | *.out 34 | *.app 35 | *.i*86 36 | *.x86_64 37 | *.hex 38 | 39 | # Debug files 40 | *.dSYM/ 41 | *.su 42 | *.idb 43 | *.pdb 44 | 45 | # Kernel Module Compile Results 46 | *.mod* 47 | *.cmd 48 | .tmp_versions/ 49 | modules.order 50 | Module.symvers 51 | Mkfile.old 52 | dkms.conf 53 | -------------------------------------------------------------------------------- /Compress.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Compress.c -------------------------------------------------------------------------------- /Custom.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Custom.h -------------------------------------------------------------------------------- /Decompression.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Decompression.txt -------------------------------------------------------------------------------- /Huffman.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Huffman.c -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Mozilla Public License Version 2.0 2 | ================================== 3 | 4 | 1. Definitions 5 | -------------- 6 | 7 | 1.1. 
"Contributor" 8 | means each individual or legal entity that creates, contributes to 9 | the creation of, or owns Covered Software. 10 | 11 | 1.2. "Contributor Version" 12 | means the combination of the Contributions of others (if any) used 13 | by a Contributor and that particular Contributor's Contribution. 14 | 15 | 1.3. "Contribution" 16 | means Covered Software of a particular Contributor. 17 | 18 | 1.4. "Covered Software" 19 | means Source Code Form to which the initial Contributor has attached 20 | the notice in Exhibit A, the Executable Form of such Source Code 21 | Form, and Modifications of such Source Code Form, in each case 22 | including portions thereof. 23 | 24 | 1.5. "Incompatible With Secondary Licenses" 25 | means 26 | 27 | (a) that the initial Contributor has attached the notice described 28 | in Exhibit B to the Covered Software; or 29 | 30 | (b) that the Covered Software was made available under the terms of 31 | version 1.1 or earlier of the License, but not also under the 32 | terms of a Secondary License. 33 | 34 | 1.6. "Executable Form" 35 | means any form of the work other than Source Code Form. 36 | 37 | 1.7. "Larger Work" 38 | means a work that combines Covered Software with other material, in 39 | a separate file or files, that is not Covered Software. 40 | 41 | 1.8. "License" 42 | means this document. 43 | 44 | 1.9. "Licensable" 45 | means having the right to grant, to the maximum extent possible, 46 | whether at the time of the initial grant or subsequently, any and 47 | all of the rights conveyed by this License. 48 | 49 | 1.10. "Modifications" 50 | means any of the following: 51 | 52 | (a) any file in Source Code Form that results from an addition to, 53 | deletion from, or modification of the contents of Covered 54 | Software; or 55 | 56 | (b) any new file in Source Code Form that contains any Covered 57 | Software. 58 | 59 | 1.11. 
"Patent Claims" of a Contributor 60 | means any patent claim(s), including without limitation, method, 61 | process, and apparatus claims, in any patent Licensable by such 62 | Contributor that would be infringed, but for the grant of the 63 | License, by the making, using, selling, offering for sale, having 64 | made, import, or transfer of either its Contributions or its 65 | Contributor Version. 66 | 67 | 1.12. "Secondary License" 68 | means either the GNU General Public License, Version 2.0, the GNU 69 | Lesser General Public License, Version 2.1, the GNU Affero General 70 | Public License, Version 3.0, or any later versions of those 71 | licenses. 72 | 73 | 1.13. "Source Code Form" 74 | means the form of the work preferred for making modifications. 75 | 76 | 1.14. "You" (or "Your") 77 | means an individual or a legal entity exercising rights under this 78 | License. For legal entities, "You" includes any entity that 79 | controls, is controlled by, or is under common control with You. For 80 | purposes of this definition, "control" means (a) the power, direct 81 | or indirect, to cause the direction or management of such entity, 82 | whether by contract or otherwise, or (b) ownership of more than 83 | fifty percent (50%) of the outstanding shares or beneficial 84 | ownership of such entity. 85 | 86 | 2. License Grants and Conditions 87 | -------------------------------- 88 | 89 | 2.1. 
Grants 90 | 91 | Each Contributor hereby grants You a world-wide, royalty-free, 92 | non-exclusive license: 93 | 94 | (a) under intellectual property rights (other than patent or trademark) 95 | Licensable by such Contributor to use, reproduce, make available, 96 | modify, display, perform, distribute, and otherwise exploit its 97 | Contributions, either on an unmodified basis, with Modifications, or 98 | as part of a Larger Work; and 99 | 100 | (b) under Patent Claims of such Contributor to make, use, sell, offer 101 | for sale, have made, import, and otherwise transfer either its 102 | Contributions or its Contributor Version. 103 | 104 | 2.2. Effective Date 105 | 106 | The licenses granted in Section 2.1 with respect to any Contribution 107 | become effective for each Contribution on the date the Contributor first 108 | distributes such Contribution. 109 | 110 | 2.3. Limitations on Grant Scope 111 | 112 | The licenses granted in this Section 2 are the only rights granted under 113 | this License. No additional rights or licenses will be implied from the 114 | distribution or licensing of Covered Software under this License. 115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a 116 | Contributor: 117 | 118 | (a) for any code that a Contributor has removed from Covered Software; 119 | or 120 | 121 | (b) for infringements caused by: (i) Your and any other third party's 122 | modifications of Covered Software, or (ii) the combination of its 123 | Contributions with other software (except as part of its Contributor 124 | Version); or 125 | 126 | (c) under Patent Claims infringed by Covered Software in the absence of 127 | its Contributions. 128 | 129 | This License does not grant any rights in the trademarks, service marks, 130 | or logos of any Contributor (except as may be necessary to comply with 131 | the notice requirements in Section 3.4). 132 | 133 | 2.4. 
Subsequent Licenses 134 | 135 | No Contributor makes additional grants as a result of Your choice to 136 | distribute the Covered Software under a subsequent version of this 137 | License (see Section 10.2) or under the terms of a Secondary License (if 138 | permitted under the terms of Section 3.3). 139 | 140 | 2.5. Representation 141 | 142 | Each Contributor represents that the Contributor believes its 143 | Contributions are its original creation(s) or it has sufficient rights 144 | to grant the rights to its Contributions conveyed by this License. 145 | 146 | 2.6. Fair Use 147 | 148 | This License is not intended to limit any rights You have under 149 | applicable copyright doctrines of fair use, fair dealing, or other 150 | equivalents. 151 | 152 | 2.7. Conditions 153 | 154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted 155 | in Section 2.1. 156 | 157 | 3. Responsibilities 158 | ------------------- 159 | 160 | 3.1. Distribution of Source Form 161 | 162 | All distribution of Covered Software in Source Code Form, including any 163 | Modifications that You create or to which You contribute, must be under 164 | the terms of this License. You must inform recipients that the Source 165 | Code Form of the Covered Software is governed by the terms of this 166 | License, and how they can obtain a copy of this License. You may not 167 | attempt to alter or restrict the recipients' rights in the Source Code 168 | Form. 169 | 170 | 3.2. 
Distribution of Executable Form 171 | 172 | If You distribute Covered Software in Executable Form then: 173 | 174 | (a) such Covered Software must also be made available in Source Code 175 | Form, as described in Section 3.1, and You must inform recipients of 176 | the Executable Form how they can obtain a copy of such Source Code 177 | Form by reasonable means in a timely manner, at a charge no more 178 | than the cost of distribution to the recipient; and 179 | 180 | (b) You may distribute such Executable Form under the terms of this 181 | License, or sublicense it under different terms, provided that the 182 | license for the Executable Form does not attempt to limit or alter 183 | the recipients' rights in the Source Code Form under this License. 184 | 185 | 3.3. Distribution of a Larger Work 186 | 187 | You may create and distribute a Larger Work under terms of Your choice, 188 | provided that You also comply with the requirements of this License for 189 | the Covered Software. If the Larger Work is a combination of Covered 190 | Software with a work governed by one or more Secondary Licenses, and the 191 | Covered Software is not Incompatible With Secondary Licenses, this 192 | License permits You to additionally distribute such Covered Software 193 | under the terms of such Secondary License(s), so that the recipient of 194 | the Larger Work may, at their option, further distribute the Covered 195 | Software under the terms of either this License or such Secondary 196 | License(s). 197 | 198 | 3.4. Notices 199 | 200 | You may not remove or alter the substance of any license notices 201 | (including copyright notices, patent notices, disclaimers of warranty, 202 | or limitations of liability) contained within the Source Code Form of 203 | the Covered Software, except that You may alter any license notices to 204 | the extent required to remedy known factual inaccuracies. 205 | 206 | 3.5. 
Application of Additional Terms 207 | 208 | You may choose to offer, and to charge a fee for, warranty, support, 209 | indemnity or liability obligations to one or more recipients of Covered 210 | Software. However, You may do so only on Your own behalf, and not on 211 | behalf of any Contributor. You must make it absolutely clear that any 212 | such warranty, support, indemnity, or liability obligation is offered by 213 | You alone, and You hereby agree to indemnify every Contributor for any 214 | liability incurred by such Contributor as a result of warranty, support, 215 | indemnity or liability terms You offer. You may include additional 216 | disclaimers of warranty and limitations of liability specific to any 217 | jurisdiction. 218 | 219 | 4. Inability to Comply Due to Statute or Regulation 220 | --------------------------------------------------- 221 | 222 | If it is impossible for You to comply with any of the terms of this 223 | License with respect to some or all of the Covered Software due to 224 | statute, judicial order, or regulation then You must: (a) comply with 225 | the terms of this License to the maximum extent possible; and (b) 226 | describe the limitations and the code they affect. Such description must 227 | be placed in a text file included with all distributions of the Covered 228 | Software under this License. Except to the extent prohibited by statute 229 | or regulation, such description must be sufficiently detailed for a 230 | recipient of ordinary skill to be able to understand it. 231 | 232 | 5. Termination 233 | -------------- 234 | 235 | 5.1. The rights granted under this License will terminate automatically 236 | if You fail to comply with any of its terms. 
However, if You become 237 | compliant, then the rights granted under this License from a particular 238 | Contributor are reinstated (a) provisionally, unless and until such 239 | Contributor explicitly and finally terminates Your grants, and (b) on an 240 | ongoing basis, if such Contributor fails to notify You of the 241 | non-compliance by some reasonable means prior to 60 days after You have 242 | come back into compliance. Moreover, Your grants from a particular 243 | Contributor are reinstated on an ongoing basis if such Contributor 244 | notifies You of the non-compliance by some reasonable means, this is the 245 | first time You have received notice of non-compliance with this License 246 | from such Contributor, and You become compliant prior to 30 days after 247 | Your receipt of the notice. 248 | 249 | 5.2. If You initiate litigation against any entity by asserting a patent 250 | infringement claim (excluding declaratory judgment actions, 251 | counter-claims, and cross-claims) alleging that a Contributor Version 252 | directly or indirectly infringes any patent, then the rights granted to 253 | You by any and all Contributors for the Covered Software under Section 254 | 2.1 of this License shall terminate. 255 | 256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all 257 | end user license agreements (excluding distributors and resellers) which 258 | have been validly granted by You or Your distributors under this License 259 | prior to termination shall survive termination. 260 | 261 | ************************************************************************ 262 | * * 263 | * 6. 
Disclaimer of Warranty * 264 | * ------------------------- * 265 | * * 266 | * Covered Software is provided under this License on an "as is" * 267 | * basis, without warranty of any kind, either expressed, implied, or * 268 | * statutory, including, without limitation, warranties that the * 269 | * Covered Software is free of defects, merchantable, fit for a * 270 | * particular purpose or non-infringing. The entire risk as to the * 271 | * quality and performance of the Covered Software is with You. * 272 | * Should any Covered Software prove defective in any respect, You * 273 | * (not any Contributor) assume the cost of any necessary servicing, * 274 | * repair, or correction. This disclaimer of warranty constitutes an * 275 | * essential part of this License. No use of any Covered Software is * 276 | * authorized under this License except under this disclaimer. * 277 | * * 278 | ************************************************************************ 279 | 280 | ************************************************************************ 281 | * * 282 | * 7. Limitation of Liability * 283 | * -------------------------- * 284 | * * 285 | * Under no circumstances and under no legal theory, whether tort * 286 | * (including negligence), contract, or otherwise, shall any * 287 | * Contributor, or anyone who distributes Covered Software as * 288 | * permitted above, be liable to You for any direct, indirect, * 289 | * special, incidental, or consequential damages of any character * 290 | * including, without limitation, damages for lost profits, loss of * 291 | * goodwill, work stoppage, computer failure or malfunction, or any * 292 | * and all other commercial damages or losses, even if such party * 293 | * shall have been informed of the possibility of such damages. 
This * 294 | * limitation of liability shall not apply to liability for death or * 295 | * personal injury resulting from such party's negligence to the * 296 | * extent applicable law prohibits such limitation. Some * 297 | * jurisdictions do not allow the exclusion or limitation of * 298 | * incidental or consequential damages, so this exclusion and * 299 | * limitation may not apply to You. * 300 | * * 301 | ************************************************************************ 302 | 303 | 8. Litigation 304 | ------------- 305 | 306 | Any litigation relating to this License may be brought only in the 307 | courts of a jurisdiction where the defendant maintains its principal 308 | place of business and such litigation shall be governed by laws of that 309 | jurisdiction, without reference to its conflict-of-law provisions. 310 | Nothing in this Section shall prevent a party's ability to bring 311 | cross-claims or counter-claims. 312 | 313 | 9. Miscellaneous 314 | ---------------- 315 | 316 | This License represents the complete agreement concerning the subject 317 | matter hereof. If any provision of this License is held to be 318 | unenforceable, such provision shall be reformed only to the extent 319 | necessary to make it enforceable. Any law or regulation which provides 320 | that the language of a contract shall be construed against the drafter 321 | shall not be used to construe this License against a Contributor. 322 | 323 | 10. Versions of the License 324 | --------------------------- 325 | 326 | 10.1. New Versions 327 | 328 | Mozilla Foundation is the license steward. Except as provided in Section 329 | 10.3, no one other than the license steward has the right to modify or 330 | publish new versions of this License. Each version will be given a 331 | distinguishing version number. 332 | 333 | 10.2. 
Effect of New Versions 334 | 335 | You may distribute the Covered Software under the terms of the version 336 | of the License under which You originally received the Covered Software, 337 | or under the terms of any subsequent version published by the license 338 | steward. 339 | 340 | 10.3. Modified Versions 341 | 342 | If you create software not governed by this License, and you want to 343 | create a new license for such software, you may create and use a 344 | modified version of this License if you rename the license and remove 345 | any references to the name of the license steward (except to note that 346 | such modified license differs from this License). 347 | 348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary 349 | Licenses 350 | 351 | If You choose to distribute Source Code Form that is Incompatible With 352 | Secondary Licenses under the terms of this version of the License, the 353 | notice described in Exhibit B of this License must be attached. 354 | 355 | Exhibit A - Source Code Form License Notice 356 | ------------------------------------------- 357 | 358 | This Source Code Form is subject to the terms of the Mozilla Public 359 | License, v. 2.0. If a copy of the MPL was not distributed with this 360 | file, You can obtain one at http://mozilla.org/MPL/2.0/. 361 | 362 | If it is not possible or desirable to put the notice in a particular 363 | file, then You may include the notice in a location (such as a LICENSE 364 | file in a relevant directory) where a recipient would be likely to look 365 | for such a notice. 366 | 367 | You may add additional accurate notices of copyright ownership. 368 | 369 | Exhibit B - "Incompatible With Secondary Licenses" Notice 370 | --------------------------------------------------------- 371 | 372 | This Source Code Form is "Incompatible With Secondary Licenses", as 373 | defined by the Mozilla Public License, v. 2.0. 
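-------------------------------------------------------------------------------- /Main.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Main.c -------------------------------------------------------------------------------- /Node.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Node.c -------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------

# Project Overview

Create a text file A, count the frequency of each character in it, and assign each character a Huffman code. Translate the file into a Huffman-encoded file B, decode B into a file C, and compare A with C.

## Topics and Requirements

1) Read and write files correctly.

2) Count the frequency of each character in the text file.

3) Encoding: build the Huffman tree and obtain the Huffman code of each character.

4) Translate file A into the Huffman-encoded file B using those codes.

5) Decoding: decode file B to obtain file C.

6) Compare file C with file A.

## Extensions

1) Further improve the efficiency of the compression algorithm.

2) File A contains Chinese characters. Hint: English characters are ASCII-encoded in the range 0-127; Chinese characters use the GB2312 internal code, two bytes each, with every byte in the range 160~254.

# Analysis

To build the Huffman tree, first scan the source file and count how many times each character occurs, then build the Huffman tree from those frequencies and derive the Huffman codes from the tree. Scan the file a second time, reading 8 bits at a time, look up each character in the character-to-code table, and write the matching code to the compressed file, storing the code table along with it. To decompress, read the code table, then read the encoded data, match it against the table to find each character, and write that character to the output file.

## Compression

First count the frequency of each character in the text file, along with the file length. Discard the characters whose frequency is 0. From the remaining characters, compute the number of nodes the Huffman tree needs, allocate space, and store each character and its frequency in a node. Then build the Huffman tree and obtain the Huffman codes. Write the number of distinct characters, each character with its weight, and the file length to the encoded file. Finally, read the source file and write the Huffman code of each character to the encoded file.

## Decompression

Read the number of distinct characters from the encoded file, compute the number of nodes the Huffman tree needs, allocate space, read each character and its weight into the tree nodes, and build the Huffman tree. Then read the file length and the encoded data, and decode.

# Implementation

## File Structure

![img](/截图/clip_image001.png)

Figure 8. File structure for Problem B

As Figure 8 shows, Custom.h in the header folder is a custom header file that pulls in the commonly used headers.

In the source folder, Compress.c contains the function Compress, which implements file compression. Huffman.c contains Select, which finds the two nodes with the smallest weights, CreateTree, which builds the Huffman tree, and HuffmanCode, which generates the Huffman codes. Main.c contains the main function. Node.c contains the definitions of the character-frequency node and the Huffman tree node. Uncompress.c contains the function Uncompress, which implements decompression.
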
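The resource folder contains Decompression.txt, a text document that provides the sample content.

## Main.c

Main.c contains the main function. It defines the character arrays name, names, input_file_name, and output_file_name, which hold file paths and are passed as arguments to the other functions, and it implements a menu with a while loop to call each function.

## Node.c

Node.c defines CharactersFrequency, a temporary node used for counting character frequencies. It has a field uchar that holds the character and a field frequency that holds its count. uchar is unsigned so that it can hold the wider range 0~255, and since unsigned long and int occupy the same space on a 32-bit platform, both fields are declared as unsigned types.

It also defines the Huffman tree node Huffman, which has uchar for the character, frequency for the count, code for the character's Huffman code (dynamically allocated), and the parent and the left and right children.

## Huffman.c

Huffman.c contains Select, which finds the two nodes with the smallest weights, CreateTree, which builds the Huffman tree, and HuffmanCode, which generates the Huffman codes.

Select takes the base address of the dynamically allocated Huffman tree, the number of nodes, and the addresses a and b of two variables that receive the smallest nodes. It has no return value.

It defines a minimum min, initialized to ULONG_MAX, the largest unsigned long value. It visits every node whose parent is 0, passes the node with the lowest frequency to a, and sets that node's parent to 1 to mark it as already chosen. It then looks for the next smallest value and assigns it to b; this second node does not need its parent marked.

CreateTree takes the base address of the dynamically allocated Huffman tree, the number of distinct characters, and the number of nodes the tree needs. It has no return value.

A Huffman tree is a binary tree whose nodes hold a weight (here the character frequency; the character associated with the frequency is stored in the node as well), the left and right children, the parent, and so on. Because building the tree can need many fairly large nodes, static allocation would waste a lot of space, so we use dynamic allocation. To keep the random access of an array, all the tree nodes are allocated in one call so that they are contiguous in memory. The space for the code stored in each node has no fixed length, so it also has to be allocated dynamically. For this reason Node.c defines the temporary node CharactersFrequency, which holds only a character and its frequency. It is also allocated dynamically, but as 256 entries at once. After counting is finished and the information has been moved into the tree nodes, these 256 entries are freed. This keeps array random access and avoids wasting space.

CreateTree defines two variables a and b and iterates over the non-character nodes (the nodes that do not yet hold a character and frequency, that is, for (i = char_kind; i < number_node; ++i)). It calls Select to find the nodes whose parent is 0 with the smallest and second-smallest frequencies and stores them in a and b, sets the parent of both a and b to i, and sets the frequency of node i to the sum of the frequencies of a and b. The loop continues until the Huffman tree is complete, as Figure 9 shows.

![img](/截图/clip_image003.png)

Figure 9. The CreateTree function

HuffmanCode takes the base address of the dynamically allocated Huffman tree and the number of distinct characters. It has no return value.

Each character corresponds to one code string, so the code of each character is generated from the bottom up, starting at the leaf (the node that holds the character), with 0 for a left branch and 1 for a right branch. To obtain the code in forward order, a temporary code buffer is filled from back to front and then copied from front to back into the code field of the leaf, with the space for code allocated according to the resulting code length. As for the size of the buffer: there are at most 256 distinct characters, so the tree has at most 256 leaves and a maximum depth of 255, which makes the longest code 255 bits. The buffer therefore has 256 entries, the last of which holds the terminator.

First the character array code_temporarily is dynamically allocated with 256 entries and code_temporarily[255] is set to '\0'. Then, walking from leaf to root, the code is computed with 0 for left and 1 for right and stored in reverse order in code_temporarily. Next, code in the Huffman tree node is dynamically allocated (256 minus the array index) bytes, and the code held in code_temporarily is copied into it. Figure 10 shows the details.

![img](/截图/clip_image005.png)

![img](/截图/clip_image007.png)

Figure 10. The HuffmanCode function

## Compress.c
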
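Before turning to Compress itself, here is a minimal sketch of the three Huffman.c routines it relies on, written from the description above rather than taken from the repository. The struct layout, the exact signatures, the use of a stack buffer instead of a dynamically allocated code_temporarily, and the helper BuildFromCounts are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout, assumed from the README's description of Node.c. */
typedef struct {
    unsigned char uchar;        /* byte value held by a leaf */
    unsigned long frequency;    /* weight of the node */
    char *code;                 /* '0'/'1' string, allocated for leaves only */
    int parent, lchild, rchild; /* array indices; parent == 0 means "no parent yet" */
} Huffman;

/* Find the two parentless nodes with the lowest frequency in tree[0..n-1]. */
static void Select(Huffman *tree, int n, int *a, int *b)
{
    unsigned long min = ULONG_MAX;
    int i;
    *a = *b = -1;
    for (i = 0; i < n; ++i)
        if (tree[i].parent == 0 && tree[i].frequency < min) {
            min = tree[i].frequency;
            *a = i;
        }
    assert(*a >= 0);
    tree[*a].parent = 1;        /* temporary mark so the second pass skips it */
    min = ULONG_MAX;
    for (i = 0; i < n; ++i)
        if (tree[i].parent == 0 && tree[i].frequency < min) {
            min = tree[i].frequency;
            *b = i;
        }
    assert(*b >= 0);
}

/* Fill the internal nodes tree[char_kind .. number_node-1]. */
static void CreateTree(Huffman *tree, int char_kind, int number_node)
{
    int i, a, b;
    assert(char_kind >= 2 && number_node == 2 * char_kind - 1);
    for (i = char_kind; i < number_node; ++i) {
        Select(tree, i, &a, &b);
        tree[a].parent = tree[b].parent = i;
        tree[i].lchild = a;
        tree[i].rchild = b;
        tree[i].frequency = tree[a].frequency + tree[b].frequency;
    }
}

/* Generate each leaf's code by walking up to the root: left edge 0, right edge 1. */
static int HuffmanCode(Huffman *tree, int char_kind)
{
    char buf[256];              /* longest code is 255 bits plus the terminator */
    int i;
    buf[255] = '\0';
    for (i = 0; i < char_kind; ++i) {
        int start = 255, cur = i, up;
        for (up = tree[i].parent; up != 0; cur = up, up = tree[up].parent)
            buf[--start] = (tree[up].lchild == cur) ? '0' : '1';
        tree[i].code = malloc((size_t)(256 - start));
        if (tree[i].code == NULL)
            return -1;
        strcpy(tree[i].code, &buf[start]);
    }
    return 0;
}

/* Hypothetical helper: allocate 2*char_kind-1 zeroed nodes, load the leaves, build. */
static Huffman *BuildFromCounts(const unsigned char *sym,
                                const unsigned long *freq, int char_kind)
{
    int i, number_node = 2 * char_kind - 1;
    Huffman *tree = calloc((size_t)number_node, sizeof *tree);
    if (tree == NULL)
        return NULL;
    for (i = 0; i < char_kind; ++i) {
        tree[i].uchar = sym[i];
        tree[i].frequency = freq[i];
    }
    CreateTree(tree, char_kind, number_node);
    if (HuffmanCode(tree, char_kind) != 0) {
        for (i = 0; i < char_kind; ++i)
            free(tree[i].code);
        free(tree);
        return NULL;
    }
    return tree;
}
```

Using 0 as the "no parent" value works here because internal nodes always have an index of at least char_kind, so a real parent index is never 0. For the counts a:5, b:2, c:1, d:1, the resulting code lengths are 1, 2, 3, and 3 bits, and the root at index 6 carries the total weight 9.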
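Compress.c contains the Compress function. Its parameters are a file pointer that refers to the location of the file to read and a file pointer that refers to the location of the file to generate, and it returns an int.

To build the Huffman tree we first need the frequency of each character. I considered two scanning schemes. One stores the counts in a linked list and allocates memory whenever a new character is seen. The other uses an array of 256 statically allocated entries, one per possible character, indexed directly by the character value. A linked list allocates space only when needed and so saves memory, but every new character requires a scan of the list, which is slow. Since there are at most 256 distinct characters, a static array wastes little space, and using the array index to match the character locates each entry without any scan, which is much more efficient. Not every character necessarily occurs, so after counting, the entries are sorted and those with a frequency of zero are removed.

After counting, open the file to be generated and write the number of distinct characters. When there is only one distinct character, there is only one Huffman node and no Huffman code can be constructed, but the case can be handled directly: store the number of distinct characters, the character, and its frequency (which equals the file length). Decompression still reads the number of distinct characters first. If it is 1, it takes the special path, reads the character and the frequency (the file length), and uses the frequency as a loop count to write the character to the output file. When there is more than one distinct character, compute the number of nodes the Huffman tree needs (2 * number of distinct characters - 1), dynamically allocate those nodes, copy the characters and frequencies held in CharactersFrequency (the temporary character nodes) into the tree, and initialize the parent of every node to 0.

Then call CreateTree to build the tree and HuffmanCode to compute the Huffman codes. Next, write the characters and their weights to the encoded file, followed immediately by the file length and the encoded characters. When reading the source file, feof is used to detect the end of the file, because a test against EOF works only for a limited range of file types. feof sets the end-of-file indicator only when a read is attempted after the last byte has been read, so one read is needed before the while loop and the next read happens at the end of each iteration; this detects the end of the file correctly. In the tree nodes each bit of a code is stored as a character, which takes a lot of space and cannot be written to the compressed file directly, so it has to be converted to binary form first. One could write a function that converts the code character array to binary, but that is cumbersome and inefficient. The C bit operations (AND, OR, shift) do the job instead: each matched bit is stored in the lowest position with an OR, and the buffer is shifted left by one to make room for the next bit. This repeats, writing one byte every time 8 bits have accumulated, until all characters have been processed. Figures 11 and 12 show the details.

![img](/截图/clip_image009.png)

Figure 11. Writing each character's Huffman code to the encoded file

![img](/截图/clip_image011.png)

Figure 12. Handling a final group of fewer than 8 code bits

## Uncompress.c

Uncompress.c contains the Uncompress function. Its parameters are a file pointer that refers to the location of the file to read and a file pointer that refers to the location of the file to generate, and it returns an int.

Open the compressed file in binary mode. First read the number of distinct characters from the start of the file and allocate space dynamically according to it, then read the character data that follows into the allocated nodes, and begin decoding.

Working in units of 8 bits, read the encoded data that follows and match it to the corresponding characters. The comparison uses the same method as compression, namely C bit operations: AND the byte with 0x80 to test whether its highest bit is 1, then shift left by one so that the highest bit is dropped and the next bit moves into the highest position, and repeat. This time the matching goes in reverse, from code to character, which differs a little from compression. The bits read have to be compared against the codes in the code table one bit at a time, adding one more bit after each comparison, and every comparison is itself a loop over the codes of all characters, which is very inefficient.

So I looked for another approach: store the Huffman tree in the file and, when decoding, compare the code bits while walking from the root to a leaf. A single traversal then finds the character stored in the leaf that corresponds to the code, which greatly improves efficiency.

However, we found that a tree node holds the character, the code, the left and right children, and the parent, and the child and parent fields have to be integers (there are at most 256*2-1=511 tree nodes), so the nodes take a lot of space and would make the compressed file larger.

Going a step further, we can store only the characters and their frequencies (a frequency is an unsigned long, which usually occupies the same space as an int, 4 bytes) and rebuild the Huffman tree from that data when decoding. This solves the space problem.
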
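The bit packing used during compression and the root-to-leaf decoding described above can be sketched together as follows. This is an illustrative sketch, not the repository's code: the Node layout (with -1 marking a missing child), the function names PackCodes and Decode, and the hard-coded demo_tree are assumptions. demo_tree encodes a=1, b=00, c=010, d=011.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decode-side node: only what the traversal needs. */
typedef struct {
    unsigned char uchar;   /* byte value, meaningful for leaves */
    int lchild, rchild;    /* array indices; -1 means "no child" (a leaf) */
} Node;

/* Assumed example tree: leaves 0..3 hold a, b, c, d; node 6 is the root. */
static const Node demo_tree[7] = {
    {'a', -1, -1}, {'b', -1, -1}, {'c', -1, -1}, {'d', -1, -1},
    {0, 2, 3},     /* node 4: c on the left, d on the right */
    {0, 1, 4},     /* node 5: b on the left, node 4 on the right */
    {0, 5, 0}      /* node 6: node 5 on the left, a on the right */
};

/* Pack the '0'/'1' code strings of text[0..len-1] into bytes, 8 bits per byte.
   codes[] is indexed by byte value. Returns the number of bytes written. */
static size_t PackCodes(const char *const *codes, const unsigned char *text,
                        size_t len, unsigned char *out)
{
    unsigned char acc = 0;   /* bits collected so far */
    int nbits = 0;
    size_t i, written = 0;
    for (i = 0; i < len; ++i) {
        const char *p;
        for (p = codes[text[i]]; *p != '\0'; ++p) {
            acc = (unsigned char)(acc << 1);   /* make room in the low bit */
            if (*p == '1')
                acc |= 1;
            if (++nbits == 8) {                /* a full byte is ready */
                out[written++] = acc;
                acc = 0;
                nbits = 0;
            }
        }
    }
    if (nbits > 0)                             /* left-align the last partial byte */
        out[written++] = (unsigned char)(acc << (8 - nbits));
    return written;
}

/* Walk the tree one bit at a time, testing the top bit with 0x80.
   Stops after file_length symbols so padding bits are never decoded. */
static size_t Decode(const Node *tree, int root, const unsigned char *in,
                     size_t in_len, size_t file_length, unsigned char *out)
{
    size_t i, produced = 0;
    int cur = root, bit;
    for (i = 0; i < in_len && produced < file_length; ++i) {
        unsigned char byte = in[i];
        for (bit = 0; bit < 8 && produced < file_length; ++bit) {
            cur = (byte & 0x80) ? tree[cur].rchild : tree[cur].lchild;
            assert(cur >= 0);                  /* internal nodes have two children */
            byte = (unsigned char)(byte << 1);
            if (tree[cur].lchild < 0) {        /* reached a leaf */
                out[produced++] = tree[cur].uchar;
                cur = root;
            }
        }
    }
    return produced;
}
```

With these codes the six bytes "abacad" pack into two bytes, 0x95 and 0x60, where the last five bits of the second byte are padding. Passing the original length 6 to Decode is what keeps that padding from being read as extra b characters, which is the reason the file length is stored in the header.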
Rebuilding the Huffman tree (a double loop, each with at most 511 iterations) also takes some time, but less than matching against the code table as described above, where every code bit requires a loop over all characters (up to 256), and the total number of code bits is usually large and grows with the file. Figures 13 and 14 show the details.

![img](/截图/clip_image013.png)

Figure 13. Example

![img](/截图/clip_image015.png)

Figure 14. Decoding process

Later, during the initial coding, I found a problem: decoding did not reproduce the source file exactly. After investigation it turned out that using EOF to detect the end of the compressed file is unreliable, because the compressed file is a binary file and EOF is generally used to detect the end of non-binary files. So we use the file length to decide when decoding ends.

# Screenshots

![img](/截图/clip_image017.jpg)

Figure 15. Main menu

![img](/截图/clip_image019.jpg)

Figure 16. Compression

![img](/截图/clip_image021.jpg)

Figure 17. Contents of the compressed file

![img](/截图/clip_image023.jpg)

Figure 18. Comparison of the original file and the compressed file

![img](/截图/clip_image025.jpg)

Figure 19. Decompression

![img](/截图/clip_image027.jpg)

Figure 20. Comparison of the original file and the decompressed file (1)

![img](/截图/clip_image029.jpg)

Figure 21. Comparison of the decompressed file and the original file (2)

-------------------------------------------------------------------------------- /Uncompress.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/Uncompress.c -------------------------------------------------------------------------------- /截图/clip_image001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image001.png -------------------------------------------------------------------------------- /截图/clip_image003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image003.png -------------------------------------------------------------------------------- /截图/clip_image005.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image005.png -------------------------------------------------------------------------------- /截图/clip_image007.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image007.png -------------------------------------------------------------------------------- /截图/clip_image009.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image009.png -------------------------------------------------------------------------------- /截图/clip_image011.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image011.png -------------------------------------------------------------------------------- /截图/clip_image013.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image013.png -------------------------------------------------------------------------------- /截图/clip_image015.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image015.png -------------------------------------------------------------------------------- /截图/clip_image016.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image016.jpg -------------------------------------------------------------------------------- /截图/clip_image017.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image017.jpg -------------------------------------------------------------------------------- /截图/clip_image018.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image018.jpg -------------------------------------------------------------------------------- /截图/clip_image019.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image019.jpg -------------------------------------------------------------------------------- /截图/clip_image020.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image020.jpg -------------------------------------------------------------------------------- /截图/clip_image021.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image021.jpg -------------------------------------------------------------------------------- /截图/clip_image022.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image022.jpg -------------------------------------------------------------------------------- /截图/clip_image023.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image023.jpg -------------------------------------------------------------------------------- /截图/clip_image024.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image024.jpg -------------------------------------------------------------------------------- /截图/clip_image025.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image025.jpg -------------------------------------------------------------------------------- /截图/clip_image026.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image026.jpg -------------------------------------------------------------------------------- /截图/clip_image027.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image027.jpg -------------------------------------------------------------------------------- /截图/clip_image028.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image028.jpg -------------------------------------------------------------------------------- /截图/clip_image029.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dormirr/huffman-coding-compression/43c63f8d5a33f0960eead1af4a555d48b247950f/截图/clip_image029.jpg --------------------------------------------------------------------------------