├── .gitignore ├── .travis.yml ├── LICENSE.txt ├── README.rst ├── samples └── zar1-sample-text-01.zar ├── setup.py └── src ├── zarnegar-converter.py └── zarnegar_converter ├── __init__.py ├── tests └── test_zar1.py ├── unicode_arabic.py ├── unicode_bidi.py ├── unicode_joining.py ├── zar1_encoding.py ├── zar1_file.py └── zar_file.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | *.pyc 3 | /.eggs/ 4 | /build/ 5 | /dist/ 6 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - "2.7" 4 | script: ./setup.py test 5 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. 
Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. 
Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. 
Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. 
A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 
163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 
196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 
229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 
256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 
287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 
317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 
386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 
421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. 
If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 
486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 
512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. 
If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | =============================================================== 2 | Converter for Zarnegar Encoding and File Format to Unicode Text 3 | =============================================================== 4 | 5 | Homepage: https://github.com/behnam/python-zarnegar-converter 6 | 7 | .. image:: https://img.shields.io/travis/behnam/python-zarnegar-converter.svg 8 | :target: https://travis-ci.org/behnam/python-zarnegar-converter 9 | 10 | `Zarnegar`_ (Persian: *زرنگار*, zarnegār, meaning gold-depicting) is a 11 | commercial, stand-alone Persian/Arabic word processor program developed for 12 | MS-DOS and Windows. The first version of Zarnegar (for DOS), was released in 13 | April-May 1991, and Windows versions have been available since 2000. 14 | 15 | Zarnegar has employed two different character sets and file formats. 16 | 17 | ----------------------- 18 | Zarnegar1 Character Set 19 | ----------------------- 20 | 21 | Zarnegar used an `Iran System`_-based character encoding system, named 22 | *Zarnegar1*, with text file formats for its early versions, up to its "Zarnegar 23 | 75" version. 
The *Zarnegar1* character set is a *2-form left-to-right visual 24 | encoding*, meaning that every `Perso-Arabic`_ letter receives different 25 | character codes based on its cursive joining form, but most letters receive 26 | only 2 forms, because of 27 | the limited code-points available. 28 | 29 | This project has a partial implementation of `Zarnegar1`_ encoding 30 | (`zarnegar_converter/zar1_encoding.py`) and a full implementation of its binary 31 | and text file formats (`zarnegar_converter/zar1_file.py`). 32 | 33 | ------------------------ 34 | Zarnegar75 Character Set 35 | ------------------------ 36 | 37 | With the "Zarnegar 75" version of the program, a new character encoding system was 38 | introduced, and the file format was changed to another binary format. 39 | The *Zarnegar75* character set is a 4-form bidirectional encoding, meaning that 40 | every `Perso-Arabic`_ letter receives one, two, or four character codes, 41 | depending on its cursive joining form, and these letters are stored in 42 | memory in semantic order. 43 | 44 | Support for the *Zarnegar75* file format and encoding is still in progress. 45 | 46 | ---------- 47 | How to Use 48 | ---------- 49 | 50 | .. code:: bash 51 | 52 | $ ./src/zarnegar-converter.py unicode_legacy_lro samples/zar1-sample-text-01.zar 53 | ‭ ﻡﺎﯾﺧ ﺕﺎﯾﻋﺎﺑﺭ ﻩﺭﺎﺑﺭﺩ | 54 | ‭ ﯽﻧﭘﺍﮊ ﺭﻌﺷ ﺭﺩ ﻭﮐﯾﺎﻫ | 55 | 56 | ----------------- 57 | How to Contribute 58 | ----------------- 59 | 60 | Please report any issues at 61 | or submit GitHub 62 | pull requests. 63 | 64 | The encoding mappings (both Zarnegar1 and Zarnegar75) can be improved with 65 | access to more sample files. Please write to if you would like to 66 | contribute (private or public) Zarnegar source files to improve this project. 67 | 68 | ---------------- 69 | Acknowledgements 70 | ---------------- 71 | 72 | Thanks to `Cecil H.
Green Library`_ of Stanford University, especially John A 73 | Eilts and Behzad Allahyar, for sharing their collection of Zarnegar documents. 74 | 75 | Also thanks to `The Official Website of Ahmad Shamlou`_ for sharing their 76 | collection of documents. 77 | 78 | ------------ 79 | Legal Notice 80 | ------------ 81 | 82 | *Zarnegar* is a trademark of *SinaSoft Corporation*. This project is NOT 83 | affiliated with SinaSoft Corporation. 84 | 85 | Copyright (C) 2017 Behnam Esfahbod 86 | 87 | This program is free software: you can redistribute it and/or modify it under 88 | the terms of the GNU General Public License as published by the Free Software 89 | Foundation, either version 3 of the License, or (at your option) any later 90 | version. 91 | 92 | This program is distributed in the hope that it will be useful, but WITHOUT ANY 93 | WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A 94 | PARTICULAR PURPOSE. See the GNU General Public License for more details. 95 | 96 | .. _Zarnegar: https://en.wikipedia.org/wiki/Zarnegar_(word_processor) 97 | .. _Zarnegar1: https://en.wikipedia.org/wiki/Zarnegar1 98 | .. _Iran System: https://en.wikipedia.org/wiki/Iran_System_encoding 99 | .. _Perso-Arabic: https://en.wikipedia.org/wiki/Perso-Arabic 100 | .. _Cecil H. Green Library: https://library.stanford.edu/green 101 | ..
_The Official Website of Ahmad Shamlou: http://shamlou.org/ 102 | -------------------------------------------------------------------------------- /samples/zar1-sample-text-01.zar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/persian-computing/python-zarnegar-converter/e3482740c34cba14e7c372a675c9166d213629be/samples/zar1-sample-text-01.zar -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # Copyright (C) 2017 Behnam Esfahbod 5 | # 6 | # This program is free software: you can redistribute it and/or modify 7 | # it under the terms of the GNU General Public License as published by 8 | # the Free Software Foundation, either version 3 of the License, or 9 | # (at your option) any later version. 10 | # 11 | # This program is distributed in the hope that it will be useful, 12 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 13 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 14 | # GNU General Public License for more details. 15 | # 16 | # You should have received a copy of the GNU General Public License 17 | # along with this program. If not, see . 
18 | # 19 | # Author(s): Behnam Esfahbod 20 | 21 | 22 | import os.path 23 | from setuptools import setup, find_packages 24 | 25 | 26 | with open(os.path.join(os.path.dirname(__file__), 'README.rst')) as f: 27 | readme = f.read() 28 | 29 | setup( 30 | name='zarnegar-converter', 31 | version='0.1.3', 32 | description='Converter for Zarnegar Encoding and File Format to Unicode Text Files', 33 | author='Behnam Esfahbod', 34 | author_email='behnam@zwnj.org', 35 | maintainer='Behnam Esfahbod', 36 | maintainer_email='behnam@zwnj.org', 37 | url='https://github.com/behnam/python-zarnegar-converter', 38 | long_description=readme, 39 | license="GNU General Public License, Version 3", 40 | 41 | classifiers=[ 42 | "Development Status :: 4 - Beta", 43 | "Environment :: Console", 44 | "License :: OSI Approved :: GNU General Public License (GPL)", 45 | "Natural Language :: Persian", 46 | "Topic :: Software Development :: Internationalization", 47 | "Topic :: Text Editors", 48 | ], 49 | 50 | include_package_data=True, 51 | package_data={ 52 | '': ['*.txt', '*.rst'], 53 | }, 54 | packages=find_packages('src'), 55 | package_dir={ 56 | '':'src', 57 | }, 58 | scripts=[ 59 | "src/zarnegar-converter.py", 60 | ], 61 | test_suite='nose.collector', 62 | tests_require=['nose'], 63 | ) 64 | -------------------------------------------------------------------------------- /src/zarnegar-converter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # Copyright (C) 2017 Behnam Esfahbod 5 | # 6 | # This program is free software: you can redistribute it and/or modify 7 | # it under the terms of the GNU General Public License as published by 8 | # the Free Software Foundation, either version 3 of the License, or 9 | # (at your option) any later version. 
10 | # 11 | # This program is distributed in the hope that it will be useful, 12 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 13 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 14 | # GNU General Public License for more details. 15 | # 16 | # You should have received a copy of the GNU General Public License 17 | # along with this program. If not, see . 18 | # 19 | # Author(s): Behnam Esfahbod 20 | 21 | 22 | from __future__ import absolute_import 23 | from __future__ import division 24 | from __future__ import print_function 25 | from __future__ import unicode_literals 26 | 27 | import sys 28 | import os 29 | import logging 30 | 31 | from zarnegar_converter.zar_file import ZarFile 32 | 33 | 34 | """ 35 | Converter for Zarnegar Encoding and File Format to Unicode Text 36 | """ 37 | 38 | 39 | _USAGE = '''\ 40 | Converter for Zarnegar Encoding and File Format to Unicode Text 41 | 42 | Usage: %s [ [ []]] 43 | 44 | Arguments: 45 | output-format desired output format (see list below) 46 | input-file path to input file (default: stdin) 47 | output-file path to output file (default: stdout) 48 | log-file path to log file (default: stderr) 49 | 50 | Output Formats: 51 | * unicode_rlo Unicode Arabic semantic (standard) encoding, in Right-to-Left Override order 52 | * unicode_lro Unicode Arabic semantic (standard) encoding, in Left-to-Right Override order 53 | * unicode_legacy_lro Legacy Unicode Arabic Presentation Form encoding, in Left-to-Right Override order 54 | * unicode_legacy_rlo Legacy Unicode Arabic Presentation Form encoding, in Right-to-Left Override order 55 | * zar1_text Zar1 encoded (text file) 56 | ''' 57 | 58 | 59 | 60 | def get_output_bytes( 61 | output_format, 62 | zar_file, 63 | ): 64 | # Zar1 65 | if output_format == 'zar1_text': 66 | return zar_file.get_zar1_text_output() 67 | 68 | # Unicode Legacy 69 | if output_format == 'unicode_legacy_lro': 70 | return zar_file.get_unicode_legacy_lro_output().encode('utf8') 71 | if
output_format == 'unicode_legacy_rlo': 72 | return zar_file.get_unicode_legacy_rlo_output().encode('utf8') 73 | 74 | # Unicode Semantic 75 | if output_format == 'unicode_lro': 76 | return zar_file.get_unicode_lro_output().encode('utf8') 77 | if output_format == 'unicode_rlo': 78 | return zar_file.get_unicode_rlo_output().encode('utf8') 79 | 80 | raise UsageError("invalid output format: %s" % output_format) 81 | 82 | 83 | def convert_and_write( 84 | output_format, 85 | in_file, 86 | out_file, 87 | ): 88 | zar_file = ZarFile.get(in_file) 89 | out_file.write(get_output_bytes(output_format, zar_file)) 90 | 91 | 92 | def main( 93 | output_format, 94 | in_filename=None, 95 | out_filename=None, 96 | log_filename=None, 97 | ): 98 | # logging.basicConfig() is a no-op once the root logger has handlers, 99 | # so choose the configuration before the first call. 100 | if log_filename: 101 | logging.basicConfig(filename=log_filename, level=logging.DEBUG, filemode='w') 102 | else: 103 | logging.basicConfig(level=logging.WARNING) 104 | 105 | 106 | in_file = None 107 | out_file = None 108 | try: 109 | in_file = open(in_filename, 'r') if in_filename else sys.stdin 110 | out_file = open(out_filename, 'w') if out_filename else sys.stdout 111 | convert_and_write(output_format, in_file, out_file) 112 | except IOError: 113 | if not in_file: 114 | raise IOError("cannot read from input file: %s" % in_filename) 115 | if not out_file: 116 | raise IOError("cannot write to output file: %s" % out_filename) 117 | finally: 118 | if in_filename and in_file: 119 | in_file.close() 120 | if out_filename and out_file: 121 | out_file.close() 122 | 123 | 124 | class UsageError(Exception): 125 | pass 126 | 127 | 128 | def error(err_file, err): 129 | err_file.write("Error: %s%s" % (err, os.linesep)) 130 | err_file.write(os.linesep) 131 | 132 | def usage(err_file, script_name): 133 | err_file.write(_USAGE % script_name) 134 | 135 | if __name__=='__main__': 136 | try: 137 | if len(sys.argv) < 2 or len(sys.argv) > 5: 138 | raise UsageError("invalid arguments") 139 | main(*sys.argv[1:]) 140 | 141 | except UsageError as err: 142 |
error(sys.stderr, err) 143 | usage(sys.stderr, os.path.basename(sys.argv[0])) 144 | sys.exit(1) 145 | 146 | except IOError as err: 147 | error(sys.stderr, err) 148 | sys.exit(2) 149 | -------------------------------------------------------------------------------- /src/zarnegar_converter/__init__.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | -------------------------------------------------------------------------------- /src/zarnegar_converter/tests/test_zar1.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version.
9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from unittest import TestCase 21 | 22 | from zarnegar_converter.zar1_file import Zar1File 23 | 24 | class TestZar1(TestCase): 25 | def test_zar1_text(self): 26 | sample = Zar1File.get(open('samples/zar1-sample-text-01.zar', 'r')) 27 | 28 | input_lines = sample.get_zar1_text_lines() 29 | self.assertEqual(input_lines, [ 30 | ' \xf4\x91\xfe\xa1 \x96\x91\xfe\xe4\x91\x93\xa4 \xb4\xf9\xa4\x91\x93\xa4\xa2 |', 31 | ' \xfc\xf7\x95\x90\xa6 \xa4\xe3\xaa \xa4\xa2 \xf8\xee\xfe\x91\xfb |', 32 | ]) 33 | 34 | zar1_text_lines = sample.get_zar1_text_lines() 35 | self.assertEqual(zar1_text_lines, [ 36 | ' \xf4\x91\xfe\xa1 \x96\x91\xfe\xe4\x91\x93\xa4 \xb4\xf9\xa4\x91\x93\xa4\xa2 |', 37 | ' \xfc\xf7\x95\x90\xa6 \xa4\xe3\xaa \xa4\xa2 \xf8\xee\xfe\x91\xfb |', 38 | ]) 39 | 40 | unicode_legacy_lro_lines = sample.get_unicode_legacy_lro_lines() 41 | self.assertEqual(unicode_legacy_lro_lines, [ 42 | u'‭ ﻡﺎﯾﺧ ﺕﺎﯾﻋﺎﺑﺭ ﻩﺭﺎﺑﺭﺩ |', 43 | u'‭ ﯽﻧﭘﺍﮊ ﺭﻌﺷ ﺭﺩ ﻭﮐﯾﺎﻫ |', 44 | ]) 45 | -------------------------------------------------------------------------------- /src/zarnegar_converter/unicode_arabic.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 
9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | from zarnegar_converter import unicode_bidi 26 | from zarnegar_converter.unicode_joining import remove_useless_joining_control_chars, ZWNJ, ZWJ as ZWJ_ 27 | 28 | 29 | """ 30 | Convert Unicode Arabic Presentation Form to semantic Unicode Arabic 31 | """ 32 | 33 | # ARABIC HAMZA ABOVE (U+0654) does not have any presentation form encoded in 34 | # Unicode, therefore we use a PUA code point here.
35 | # 36 | # See also: http://www.unicode.org/L2/L2017/17149-hamza-above-isolated.pdf 37 | ARABIC_HAMZA_ABOVE_ISOLATED_FORM_PUA = 0xF8FD 38 | _AHAIF = ARABIC_HAMZA_ABOVE_ISOLATED_FORM_PUA 39 | 40 | _LEGACY_TO_SEMANTIC_MAP = { 41 | # 1-Shape Letters 42 | 0xFB8A: 0x0698, # ARABIC LETTER JEH 43 | 0xFE80: 0x0621, # ARABIC LETTER HAMZA 44 | 0xFEA9: 0x062F, # ARABIC LETTER DAL 45 | 0xFEAB: 0x0630, # ARABIC LETTER THAL 46 | 0xFEAD: 0x0631, # ARABIC LETTER REH 47 | 0xFEAF: 0x0632, # ARABIC LETTER ZAIN 48 | 0xFEC1: 0x0637, # ARABIC LETTER TAH 49 | 0xFEC5: 0x0638, # ARABIC LETTER ZAH 50 | 0xFEED: 0x0648, # ARABIC LETTER WAW 51 | 52 | # 2-Shape Letters: ALEF 53 | 0xFE8D: [0x0627, ZWNJ], # ARABIC LETTER ALEF (isolated form) 54 | 0xFE8E: [0x0627, ZWJ_], # ARABIC LETTER ALEF (final form) 55 | 56 | # 2-Shape Letters: Others 57 | 0xFE8F: [ZWNJ, 0x0628], # ARABIC LETTER BEH (final-isolated form) 58 | 0xFE91: [ZWJ_, 0x0628], # ARABIC LETTER BEH (initial-medial form) 59 | 60 | 0xFB56: [ZWNJ, 0x067E], # ARABIC LETTER PEH (final-isolated form) 61 | 0xFB58: [ZWJ_, 0x067E], # ARABIC LETTER PEH (initial-medial form) 62 | 63 | 0xFE95: [ZWNJ, 0x062A], # ARABIC LETTER TEH (final-isolated form) 64 | 0xFE97: [ZWJ_, 0x062A], # ARABIC LETTER TEH (initial-medial form) 65 | 66 | 0xFE99: [ZWNJ, 0x062B], # ARABIC LETTER THEH (final-isolated form) 67 | 0xFE9B: [ZWJ_, 0x062B], # ARABIC LETTER THEH (initial-medial form) 68 | 69 | 0xFE9D: [ZWNJ, 0x062C], # ARABIC LETTER JEEM (final-isolated form) 70 | 0xFE9F: [ZWJ_, 0x062C], # ARABIC LETTER JEEM (initial-medial form) 71 | 72 | 0xFB7A: [ZWNJ, 0x0686], # ARABIC LETTER TCHEH (final-isolated form) 73 | 0xFB7C: [ZWJ_, 0x0686], # ARABIC LETTER TCHEH (initial-medial form) 74 | 75 | 0xFEA1: [ZWNJ, 0x062D], # ARABIC LETTER HAH (final-isolated form) 76 | 0xFEA3: [ZWJ_, 0x062D], # ARABIC LETTER HAH (initial-medial form) 77 | 78 | 0xFEA5: [ZWNJ, 0x062E], # ARABIC LETTER KHAH (final-isolated form) 79 | 0xFEA7: [ZWJ_, 0x062E], # ARABIC LETTER KHAH (initial-medial 
form) 80 | 81 | 0xFEB1: [ZWNJ, 0x0633], # ARABIC LETTER SEEN (final-isolated form) 82 | 0xFEB3: [ZWJ_, 0x0633], # ARABIC LETTER SEEN (initial-medial form) 83 | 84 | 0xFEB5: [ZWNJ, 0x0634], # ARABIC LETTER SHEEN (final-isolated form) 85 | 0xFEB7: [ZWJ_, 0x0634], # ARABIC LETTER SHEEN (initial-medial form) 86 | 87 | 0xFEB9: [ZWNJ, 0x0635], # ARABIC LETTER SAD (final-isolated form) 88 | 0xFEBB: [ZWJ_, 0x0635], # ARABIC LETTER SAD (initial-medial form) 89 | 90 | 0xFEBD: [ZWNJ, 0x0636], # ARABIC LETTER DAD (final-isolated form) 91 | 0xFEBF: [ZWJ_, 0x0636], # ARABIC LETTER DAD (initial-medial form) 92 | 93 | 0xFED1: [ZWNJ, 0x0641], # ARABIC LETTER FEH (final-isolated form) 94 | 0xFED3: [ZWJ_, 0x0641], # ARABIC LETTER FEH (initial-medial form) 95 | 96 | 0xFED5: [ZWNJ, 0x0642], # ARABIC LETTER QAF (final-isolated form) 97 | 0xFED7: [ZWJ_, 0x0642], # ARABIC LETTER QAF (initial-medial form) 98 | 99 | 0xFB8E: [ZWNJ, 0x06A9], # ARABIC LETTER KEHEH (final-isolated form) 100 | 0xFB90: [ZWJ_, 0x06A9], # ARABIC LETTER KEHEH (initial-medial form) 101 | 102 | 0xFB92: [ZWNJ, 0x06AF], # ARABIC LETTER GAF (final-isolated form) 103 | 0xFB94: [ZWJ_, 0x06AF], # ARABIC LETTER GAF (initial-medial form) 104 | 105 | 0xFEDD: [ZWNJ, 0x0644], # ARABIC LETTER LAM (final-isolated form) 106 | 0xFEDF: [ZWJ_, 0x0644], # ARABIC LETTER LAM (initial-medial form) 107 | 108 | 0xFEE1: [ZWNJ, 0x0645], # ARABIC LETTER MEEM (final-isolated form) 109 | 0xFEE3: [ZWJ_, 0x0645], # ARABIC LETTER MEEM (initial-medial form) 110 | 111 | 0xFEE5: [ZWNJ, 0x0646], # ARABIC LETTER NOON (final-isolated form) 112 | 0xFEE7: [ZWJ_, 0x0646], # ARABIC LETTER NOON (initial-medial form) 113 | 114 | # 3-Shape Letters 115 | 0xFEE9: [ZWNJ, 0x0647], # ARABIC LETTER HEH (final-isolated form) 116 | 0xFEEB: [ZWJ_, 0x0647, ZWNJ], # ARABIC LETTER HEH (initial form) 117 | 0xFEEC: [ZWJ_, 0x0647, ZWJ_], # ARABIC LETTER HEH (medial form) 118 | 119 | 0xFBFC: [ZWNJ, 0x06CC, ZWNJ], # ARABIC LETTER FARSI YEH (isolated form) 120 | 0xFBFD: [ZWNJ, 
0x06CC, ZWJ_], # ARABIC LETTER FARSI YEH (final form) 121 | 0xFBFE: [ZWJ_, 0x06CC], # ARABIC LETTER FARSI YEH (initial-medial form) 122 | 123 | # 4-Shape Letters 124 | 0xFEC9: [ZWNJ, 0x0639, ZWNJ], # ARABIC LETTER AIN (isolated form) 125 | 0xFECA: [ZWNJ, 0x0639, ZWJ_], # ARABIC LETTER AIN (final form) 126 | 0xFECB: [ZWJ_, 0x0639, ZWNJ], # ARABIC LETTER AIN (initial form) 127 | 0xFECC: [ZWJ_, 0x0639, ZWJ_], # ARABIC LETTER AIN (medial form) 128 | 129 | 0xFECD: [ZWNJ, 0x063A, ZWNJ], # ARABIC LETTER GHAIN (isolated form) 130 | 0xFECE: [ZWNJ, 0x063A, ZWJ_], # ARABIC LETTER GHAIN (final form) 131 | 0xFECF: [ZWJ_, 0x063A, ZWNJ], # ARABIC LETTER GHAIN (initial form) 132 | 0xFED0: [ZWJ_, 0x063A, ZWJ_], # ARABIC LETTER GHAIN (medial form) 133 | 134 | # Other Letters 135 | 0xFE81: [0x0622, ZWNJ], # ARABIC LETTER ALEF WITH MADDA ABOVE (isolated form) 136 | 0xFE8B: [ZWJ_, 0x0626], # ARABIC LETTER YEH WITH HAMZA ABOVE (initial-medial form) 137 | 0xFEFB: [0x0627, 0x0644], # ARABIC LIGATURE LAM WITH ALEF 138 | 139 | # Diacritics 140 | 0xFE70: 0x064B, # ARABIC FATHATAN (mark) 141 | 0xFE72: 0x064C, # ARABIC DAMMATAN (mark) 142 | 0xFE76: 0x064E, # ARABIC FATHA (mark) 143 | 0xFE78: 0x064F, # ARABIC DAMMA (mark) 144 | 0xFE7A: 0x0650, # ARABIC KASRA (mark) 145 | 0xFE7C: 0x0651, # ARABIC SHADDA (mark) 146 | 0xFE7E: 0x0652, # ARABIC SUKUN (mark) 147 | _AHAIF: 0x0654, # ARABIC HAMZA ABOVE (mark) 148 | } 149 | 150 | 151 | def convert_legacy_char_to_semantic_lro(legacy_char, line_no): 152 | codepoints = _LEGACY_TO_SEMANTIC_MAP.get(ord(legacy_char), ord(legacy_char)) 153 | if type(codepoints) is int: 154 | return unichr(codepoints) 155 | if type(codepoints) is list: 156 | return ''.join(unichr(cp) for cp in codepoints) 157 | raise ValueError("invalid map value") 158 | 159 | def convert_legacy_line_to_semantic_lro(legacy_text, line_no): 160 | semantic_text = ''.join([ 161 | convert_legacy_char_to_semantic_lro(legacy_char, line_no) 162 | for legacy_char in legacy_text 163 | ]) 164 | return
remove_useless_joining_control_chars(semantic_text) 165 | -------------------------------------------------------------------------------- /src/zarnegar_converter/unicode_bidi.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | 26 | """ 27 | Unicode Bidirectional helpers for Zarnegar Encoding 28 | """ 29 | 30 | 31 | LRO = 0x202D # LEFT-TO-RIGHT OVERRIDE 32 | 33 | LRO_CHAR = "\u202D" # LEFT-TO-RIGHT OVERRIDE 34 | RLO_CHAR = "\u202E" # RIGHT-TO-LEFT OVERRIDE 35 | 36 | MIRROR_MAP = { 37 | 0x0028: 0x0029, # LEFT PARENTHESIS 38 | 0x0029: 0x0028, # RIGHT PARENTHESIS 39 | 40 | 0x003C: 0x003E, # LESS-THAN SIGN 41 | 0x003E: 0x003C, # GREATER-THAN SIGN 42 | 43 | 0x005B: 0x005D, # LEFT SQUARE BRACKET 44 | 0x005D: 0x005B, # RIGHT SQUARE BRACKET 45 | 46 | 0x007B: 0x007D, # LEFT CURLY BRACKET 47 | 0x007D: 0x007B, # RIGHT CURLY BRACKET 48 | 49 | 0x00AB: 0x00BB, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 50 | 0x00BB: 0x00AB, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 51 | } 52 | 53 | 54 | def get_mirrored(text):
55 | return ''.join([ 56 | unichr(MIRROR_MAP.get(ord(char), ord(char))) for char in text 57 | ]) 58 | 59 | def get_reversed(text): 60 | return get_mirrored(reversed(text)) 61 | -------------------------------------------------------------------------------- /src/zarnegar_converter/unicode_joining.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 
17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | 26 | """ 27 | Unicode Arabic Joining helpers for Zarnegar Encoding 28 | """ 29 | 30 | 31 | ZWNJ = 0x200C # ZERO-WIDTH NON-JOINER 32 | ZWJ = 0x200D # ZERO-WIDTH JOINER 33 | 34 | ZWNJ_CHAR = "\u200C" # ZERO-WIDTH NON-JOINER 35 | ZWJ_CHAR = "\u200D" # ZERO-WIDTH JOINER 36 | 37 | LEFT_JOINER = [ 38 | 0x0626, # ARABIC LETTER YEH WITH HAMZA ABOVE 39 | 0x0628, # ARABIC LETTER BEH 40 | 0x062A, # ARABIC LETTER TEH 41 | 0x062B, # ARABIC LETTER THEH 42 | 0x062C, # ARABIC LETTER JEEM 43 | 0x062D, # ARABIC LETTER HAH 44 | 0x062E, # ARABIC LETTER KHAH 45 | 0x0633, # ARABIC LETTER SEEN 46 | 0x0634, # ARABIC LETTER SHEEN 47 | 0x0635, # ARABIC LETTER SAD 48 | 0x0636, # ARABIC LETTER DAD 49 | 0x0637, # ARABIC LETTER TAH 50 | 0x0638, # ARABIC LETTER ZAH 51 | 0x0639, # ARABIC LETTER AIN 52 | 0x063A, # ARABIC LETTER GHAIN 53 | 0x0640, # ARABIC TATWEEL 54 | 0x0641, # ARABIC LETTER FEH 55 | 0x0642, # ARABIC LETTER QAF 56 | 0x0644, # ARABIC LETTER LAM 57 | 0x0645, # ARABIC LETTER MEEM 58 | 0x0646, # ARABIC LETTER NOON 59 | 0x0647, # ARABIC LETTER HEH 60 | 0x067E, # ARABIC LETTER PEH 61 | 0x0686, # ARABIC LETTER TCHEH 62 | 0x06A9, # ARABIC LETTER KEHEH 63 | 0x06AF, # ARABIC LETTER GAF 64 | 0x06CC, # ARABIC LETTER FARSI YEH 65 | ZWJ, 66 | ] 67 | 68 | RIGHT_JOINER = [ 69 | 0x0622, # ARABIC LETTER ALEF WITH MADDA ABOVE 70 | 0x0626, # ARABIC LETTER YEH WITH HAMZA ABOVE 71 | 0x0627, # ARABIC LETTER ALEF 72 | 0x0628, # ARABIC LETTER BEH 73 | 0x062A, # ARABIC LETTER TEH 74 | 0x062B, # ARABIC LETTER THEH 75 | 0x062C, # ARABIC LETTER JEEM 76 | 0x062D, # ARABIC LETTER HAH 77 | 0x062E, # ARABIC LETTER KHAH 78 | 0x062F, # ARABIC LETTER DAL 79 | 0x0630, # ARABIC LETTER THAL 80 | 0x0631, # ARABIC LETTER REH 81 | 0x0632, # ARABIC LETTER ZAIN 82 | 0x0633, # ARABIC LETTER SEEN 83 | 
0x0634, # ARABIC LETTER SHEEN 84 | 0x0635, # ARABIC LETTER SAD 85 | 0x0636, # ARABIC LETTER DAD 86 | 0x0637, # ARABIC LETTER TAH 87 | 0x0638, # ARABIC LETTER ZAH 88 | 0x0639, # ARABIC LETTER AIN 89 | 0x063A, # ARABIC LETTER GHAIN 90 | 0x0640, # ARABIC TATWEEL 91 | 0x0641, # ARABIC LETTER FEH 92 | 0x0642, # ARABIC LETTER QAF 93 | 0x0644, # ARABIC LETTER LAM 94 | 0x0645, # ARABIC LETTER MEEM 95 | 0x0646, # ARABIC LETTER NOON 96 | 0x0647, # ARABIC LETTER HEH 97 | 0x0648, # ARABIC LETTER WAW 98 | 0x067E, # ARABIC LETTER PEH 99 | 0x0686, # ARABIC LETTER TCHEH 100 | 0x0698, # ARABIC LETTER JEH 101 | 0x06A9, # ARABIC LETTER KEHEH 102 | 0x06AF, # ARABIC LETTER GAF 103 | 0x06CC, # ARABIC LETTER FARSI YEH 104 | ZWJ, 105 | ] 106 | 107 | 108 | def is_zwnj(char): 109 | return ord(char) == ZWNJ if char is not None else False 110 | 111 | def is_zwj(char): 112 | return ord(char) == ZWJ if char is not None else False 113 | 114 | def is_left_joiner(char): 115 | return ord(char) in LEFT_JOINER if char is not None else False 116 | 117 | def is_right_joiner(char): 118 | return ord(char) in RIGHT_JOINER if char is not None else False 119 | 120 | # Applies to a Left-to-Right text 121 | def remove_useless_joining_control_chars(text): 122 | result = '' 123 | text = text.replace(ZWNJ_CHAR + ZWNJ_CHAR, ZWNJ_CHAR) 124 | text = text.replace(ZWJ_CHAR + ZWJ_CHAR, ZWJ_CHAR) 125 | text_len = len(text) 126 | for idx in range(text_len): 127 | chr_on_left = text[idx - 1] if idx > 0 else None 128 | chr_current = text[idx] 129 | chr_on_right = text[idx + 1] if idx < text_len - 1 else None 130 | if is_zwnj(chr_current): 131 | if not (is_right_joiner(chr_on_left) and is_left_joiner(chr_on_right)): 132 | continue 133 | if is_zwj(chr_current): 134 | if is_right_joiner(chr_on_left) and is_left_joiner(chr_on_right): 135 | continue 136 | result += chr_current 137 | return result 138 | -------------------------------------------------------------------------------- /src/zarnegar_converter/zar1_encoding.py: 
-------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | import logging 26 | 27 | from zarnegar_converter import unicode_arabic 28 | from zarnegar_converter import unicode_bidi 29 | 30 | """ 31 | Convert Zarnegar Encoding to Unicode Arabic Presentation Form 32 | """ 33 | 34 | _AHAIF = unicode_arabic.ARABIC_HAMZA_ABOVE_ISOLATED_FORM_PUA 35 | 36 | _IRAN_SYSTEM_MAP = { 37 | # Numerals 38 | 0x80: 0x06F0, # EXTENDED ARABIC-INDIC DIGIT ZERO 39 | 0x81: 0x06F1, # EXTENDED ARABIC-INDIC DIGIT ONE 40 | 0x82: 0x06F2, # EXTENDED ARABIC-INDIC DIGIT TWO 41 | 0x83: 0x06F3, # EXTENDED ARABIC-INDIC DIGIT THREE 42 | 0x84: 0x06F4, # EXTENDED ARABIC-INDIC DIGIT FOUR 43 | 0x85: 0x06F5, # EXTENDED ARABIC-INDIC DIGIT FIVE 44 | 0x86: 0x06F6, # EXTENDED ARABIC-INDIC DIGIT SIX 45 | 0x87: 0x06F7, # EXTENDED ARABIC-INDIC DIGIT SEVEN 46 | 0x88: 0x06F8, # EXTENDED ARABIC-INDIC DIGIT EIGHT 47 | 0x89: 0x06F9, # EXTENDED ARABIC-INDIC DIGIT NINE 48 | 49 | # Punctuations 50 | 0x8A: 0x060C, # ARABIC COMMA 51 | 0x8B: 0x0640, # ARABIC TATWEEL 52 
| 0x8C: 0x061F, # ARABIC QUESTION MARK 53 | 54 | # Letters 55 | 0x8D: 0xFE81, # ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM 56 | 0x8E: 0xFE8B, # ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM 57 | 0x8F: 0xFE80, # ARABIC LETTER HAMZA ISOLATED FORM 58 | 59 | 0x90: 0xFE8D, # ARABIC LETTER ALEF ISOLATED FORM 60 | 0x91: 0xFE8E, # ARABIC LETTER ALEF FINAL FORM 61 | 0x92: 0xFE8F, # ARABIC LETTER BEH ISOLATED FORM 62 | 0x93: 0xFE91, # ARABIC LETTER BEH INITIAL FORM 63 | 0x94: 0xFB56, # ARABIC LETTER PEH ISOLATED FORM 64 | 0x95: 0xFB58, # ARABIC LETTER PEH INITIAL FORM 65 | 0x96: 0xFE95, # ARABIC LETTER TEH ISOLATED FORM 66 | 0x97: 0xFE97, # ARABIC LETTER TEH INITIAL FORM 67 | 0x98: 0xFE99, # ARABIC LETTER THEH ISOLATED FORM 68 | 0x99: 0xFE9B, # ARABIC LETTER THEH INITIAL FORM 69 | 0x9A: 0xFE9D, # ARABIC LETTER JEEM ISOLATED FORM 70 | 0x9B: 0xFE9F, # ARABIC LETTER JEEM INITIAL FORM 71 | 0x9C: 0xFB7A, # ARABIC LETTER TCHEH ISOLATED FORM 72 | 0x9D: 0xFB7C, # ARABIC LETTER TCHEH INITIAL FORM 73 | 0x9E: 0xFEA1, # ARABIC LETTER HAH ISOLATED FORM 74 | 0x9F: 0xFEA3, # ARABIC LETTER HAH INITIAL FORM 75 | 76 | 0xA0: 0xFEA5, # ARABIC LETTER KHAH ISOLATED FORM 77 | 0xA1: 0xFEA7, # ARABIC LETTER KHAH INITIAL FORM 78 | 0xA2: 0xFEA9, # ARABIC LETTER DAL ISOLATED FORM 79 | 0xA3: 0xFEAB, # ARABIC LETTER THAL ISOLATED FORM 80 | 0xA4: 0xFEAD, # ARABIC LETTER REH ISOLATED FORM 81 | 0xA5: 0xFEAF, # ARABIC LETTER ZAIN ISOLATED FORM 82 | 0xA6: 0xFB8A, # ARABIC LETTER JEH ISOLATED FORM 83 | 0xA7: 0xFEB1, # ARABIC LETTER SEEN ISOLATED FORM 84 | 0xA8: 0xFEB3, # ARABIC LETTER SEEN INITIAL FORM 85 | 0xA9: 0xFEB5, # ARABIC LETTER SHEEN ISOLATED FORM 86 | 0xAA: 0xFEB7, # ARABIC LETTER SHEEN INITIAL FORM 87 | 0xAB: 0xFEB9, # ARABIC LETTER SAD ISOLATED FORM 88 | 0xAC: 0xFEBB, # ARABIC LETTER SAD INITIAL FORM 89 | 0xAD: 0xFEBD, # ARABIC LETTER DAD ISOLATED FORM 90 | 0xAE: 0xFEBF, # ARABIC LETTER DAD INITIAL FORM 91 | 0xAF: 0xFEC1, # ARABIC LETTER TAH ISOLATED FORM 92 | 93 | # Shadows 94 | 0xB0: 
0x2591, # LIGHT SHADE 95 | 0xB1: 0x2592, # MEDIUM SHADE 96 | 0xB2: 0x2593, # DARK SHADE 97 | 98 | # Box Drawings 99 | 0xB3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL 100 | 0xB4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT 101 | 0xB5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE 102 | 0xB6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE 103 | 0xB7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE 104 | 0xB8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE 105 | 0xB9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT 106 | 0xBA: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL 107 | 0xBB: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT 108 | 0xBC: 0x255D, # BOX DRAWINGS DOUBLE UP AND LEFT 109 | 0xBD: 0x255C, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE 110 | 0xBE: 0x255B, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE 111 | 0xBF: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT 112 | 113 | 0xC0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT 114 | 0xC1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL 115 | 0xC2: 0x252C, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL 116 | 0xC3: 0x251C, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT 117 | 0xC4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL 118 | 0xC5: 0x253C, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL 119 | 0xC6: 0x255E, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE 120 | 0xC7: 0x255F, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE 121 | 0xC8: 0x255A, # BOX DRAWINGS DOUBLE UP AND RIGHT 122 | 0xC9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT 123 | 0xCA: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL 124 | 0xCB: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL 125 | 0xCC: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT 126 | 0xCD: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL 127 | 0xCE: 0x256C, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL 128 | 0xCF: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE 129 | 130 | 0xD0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE 131 | 0xD1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE 132 
| 0xD2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE 133 | 0xD3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE 134 | 0xD4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE 135 | 0xD5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE 136 | 0xD6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE 137 | 0xD7: 0x256B, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE 138 | 0xD8: 0x256A, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE 139 | 0xD9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT 140 | 0xDA: 0x250C, # BOX DRAWINGS LIGHT DOWN AND RIGHT 141 | 142 | # Shadows 143 | 0xDB: 0x2588, # FULL BLOCK 144 | 0xDC: 0x2584, # LOWER HALF BLOCK 145 | 0xDD: 0x258C, # LEFT HALF BLOCK 146 | 0xDE: 0x2590, # RIGHT HALF BLOCK 147 | 0xDF: 0x2580, # UPPER HALF BLOCK 148 | 149 | # Letters 150 | 0xE0: 0xFEC5, # ARABIC LETTER ZAH ISOLATED FORM 151 | 0xE1: 0xFEC9, # ARABIC LETTER AIN ISOLATED FORM 152 | 0xE2: 0xFECA, # ARABIC LETTER AIN FINAL FORM 153 | 0xE3: 0xFECC, # ARABIC LETTER AIN MEDIAL FORM 154 | 0xE4: 0xFECB, # ARABIC LETTER AIN INITIAL FORM 155 | 0xE5: 0xFECD, # ARABIC LETTER GHAIN ISOLATED FORM 156 | 0xE6: 0xFECE, # ARABIC LETTER GHAIN FINAL FORM 157 | 0xE7: 0xFED0, # ARABIC LETTER GHAIN MEDIAL FORM 158 | 0xE8: 0xFECF, # ARABIC LETTER GHAIN INITIAL FORM 159 | 0xE9: 0xFED1, # ARABIC LETTER FEH ISOLATED FORM 160 | 0xEA: 0xFED3, # ARABIC LETTER FEH INITIAL FORM 161 | 0xEB: 0xFED5, # ARABIC LETTER QAF ISOLATED FORM 162 | 0xEC: 0xFED7, # ARABIC LETTER QAF INITIAL FORM 163 | 0xED: 0xFB8E, # ARABIC LETTER KEHEH ISOLATED FORM 164 | 0xEE: 0xFB90, # ARABIC LETTER KEHEH INITIAL FORM 165 | 0xEF: 0xFB92, # ARABIC LETTER GAF ISOLATED FORM 166 | 167 | # Letters 168 | 0xF0: 0xFB94, # ARABIC LETTER GAF INITIAL FORM 169 | 0xF1: 0xFEDD, # ARABIC LETTER LAM ISOLATED FORM 170 | 0xF2: 0xFEFB, # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM 171 | 0xF3: 0xFEDF, # ARABIC LETTER LAM INITIAL FORM 172 | 0xF4: 0xFEE1, # ARABIC LETTER MEEM ISOLATED FORM 173 | 0xF5: 0xFEE3, # ARABIC 
LETTER MEEM INITIAL FORM 174 | 0xF6: 0xFEE5, # ARABIC LETTER NOON ISOLATED FORM 175 | 0xF7: 0xFEE7, # ARABIC LETTER NOON INITIAL FORM 176 | 0xF8: 0xFEED, # ARABIC LETTER WAW ISOLATED FORM 177 | 0xF9: 0xFEE9, # ARABIC LETTER HEH ISOLATED FORM 178 | 0xFA: 0xFEEC, # ARABIC LETTER HEH MEDIAL FORM 179 | 0xFB: 0xFEEB, # ARABIC LETTER HEH INITIAL FORM 180 | 0xFC: 0xFBFD, # ARABIC LETTER FARSI YEH FINAL FORM 181 | 0xFD: 0xFBFC, # ARABIC LETTER FARSI YEH ISOLATED FORM 182 | 0xFE: 0xFBFE, # ARABIC LETTER FARSI YEH INITIAL FORM 183 | 184 | 0xFF: 0x00A0, # NO-BREAK SPACE 185 | } 186 | 187 | _ZARNEGAR_OVERRIDES_MAP = { 188 | 0x00: 0x0000, 189 | 0x01: 0x0001, 190 | 191 | 0x03: 0xFD3E, # ORNATE LEFT PARENTHESIS 192 | 0x04: 0xFD3F, # ORNATE RIGHT PARENTHESIS 193 | 194 | 0x1D: 0x00A0, # NO-BREAK SPACE 195 | 196 | 0xB0: 0xFE7C, # ARABIC SHADDA ISOLATED FORM 197 | 0xB1: 0xFE76, # ARABIC FATHA ISOLATED FORM 198 | 0xB2: 0xFE70, # ARABIC FATHATAN ISOLATED FORM 199 | # 0xB3: TODO 200 | 0xB4: _AHAIF, # ARABIC HAMZA ABOVE ISOLATED FORM 201 | 0xB5: 0xFE78, # ARABIC DAMMA ISOLATED FORM 202 | 0xB6: 0xFE72, # ARABIC DAMMATAN ISOLATED FORM 203 | # 0xB7: TODO 204 | # 0xB8: TODO 205 | # 0xB9: TODO 206 | # 0xBA: TODO 207 | # 0xBB: TODO 208 | # 0xBC: TODO 209 | # 0xBD: TODO 210 | 0xBE: 0xFE7A, # ARABIC KASRA ISOLATED FORM 211 | # 0xBF: TODO 212 | 213 | # 0xC0: TODO 214 | # 0xC1: TODO 215 | # 0xC2: TODO 216 | 0xC3: 0x00AB, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 217 | 0xC4: 0x00BB, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 218 | # 0xC5: TODO 219 | # 0xC6: TODO 220 | 0xC7: 0x061B, # ARABIC SEMICOLON 221 | # 0xC8: TODO 222 | # 0xC9: TODO 223 | # 0xCA: TODO 224 | # 0xCB: TODO 225 | # 0xCC: TODO 226 | # 0xCD: TODO 227 | # 0xCE: TODO 228 | # 0xCF: TODO 229 | } 230 | 231 | _ZARNEGAR_MAP = dict(enumerate(range(0x80))) 232 | _ZARNEGAR_MAP.update(_IRAN_SYSTEM_MAP) 233 | _ZARNEGAR_MAP.update(_ZARNEGAR_OVERRIDES_MAP) 234 | 235 | 236 | def _in_zar_override(char_byte): 237 | return ord(char_byte) in 
_ZARNEGAR_OVERRIDES_MAP 238 | 239 | def convert_zar_byte_to_legacy_char(char_byte, line_no): 240 | codepoints = _ZARNEGAR_MAP[ord(char_byte)] 241 | 242 | if type(codepoints) is int: 243 | # "U+%04X" % ord(char) if char is not None else "NONE" 244 | #if ord(char_byte) in range(0x00, 0x20): 245 | if ord(char_byte) in range(0x00, 0x20) and not _in_zar_override(char_byte): 246 | logging.error('zar_legacy: ERROR1: Line %4d: 0x%02X', line_no, ord(char_byte)) 247 | #if ord(char_byte) in range(0xB0, 0xE0): 248 | if ord(char_byte) in range(0xB0, 0xE0) and not _in_zar_override(char_byte): 249 | logging.error('zar_legacy: ERROR2: Line %4d: 0x%02X', line_no, ord(char_byte)) 250 | return unichr(codepoints) 251 | 252 | if type(codepoints) is list: 253 | return ''.join(map(lambda cp: unichr(cp), codepoints)) 254 | 255 | raise ValueError("invalid map value") 256 | 257 | def convert_zar1_line_to_unicode_legacy_lro(zar1_line, line_no): 258 | legacy_text = ''.join([ 259 | convert_zar_byte_to_legacy_char(zar_byte, line_no) 260 | for zar_byte in zar1_line 261 | ]) 262 | return unicode_bidi.LRO_CHAR + legacy_text 263 | 264 | def convert_zar1_line_to_semantic_lro(zar_text, line_no): 265 | legacy_text = ''.join([ 266 | convert_zar_byte_to_legacy_char(zar_byte, line_no) 267 | for zar_byte in zar_text 268 | ]) 269 | return unicode_arabic.convert_legacy_line_to_semantic_lro(legacy_text, line_no) 270 | 271 | def convert_zar1_line_to_unicode_lro(zar_text, line_no): 272 | lro_text = convert_zar1_line_to_semantic_lro(zar_text, line_no) 273 | return unicode_bidi.LRO_CHAR + lro_text 274 | 275 | def convert_zar1_line_to_unicode_rlo(zar_text, line_no): 276 | lro_text = convert_zar1_line_to_semantic_lro(zar_text, line_no) 277 | rlo_text = unicode_bidi.get_reversed(lro_text) 278 | return unicode_bidi.RLO_CHAR + rlo_text 279 | -------------------------------------------------------------------------------- /src/zarnegar_converter/zar1_file.py: 
-------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | import sys 26 | import struct 27 | import logging 28 | 29 | from zarnegar_converter import zar1_encoding 30 | from zarnegar_converter.zar_file import ZarFile, ZarFileTypeError, OUTPUT_NEW_LINE 31 | 32 | 33 | """ 34 | Read-only view on a Zarnegar File 35 | 36 | Generates a list of 80-byte-wide lines from a Zarnegar text or binary file. 
37 | """ 38 | 39 | 40 | _LINE_WIDTH = 80 41 | 42 | _BINARY_MAGIC = b'\x03\xCA\xB1\xF2' 43 | 44 | _BINARY_HEADER_FMT = ( 45 | '<' + # Little-Endian 46 | 'H' + # Total Lines Count 47 | 'H' + # Total Text Length 48 | '10s' # Installation/User Data 49 | ) 50 | _binary_header_struct = struct.Struct(_BINARY_HEADER_FMT) 51 | 52 | _BINARY_LINE_INFO_FMT = ( 53 | '<' + # Little-Endian 54 | 'B' + # Line Text Start 55 | 'H' + # Cumulative Text Length 56 | 'B' # Line Text Length 57 | ) 58 | _binary_line_info_struct = struct.Struct(_BINARY_LINE_INFO_FMT) 59 | 60 | 61 | class Zar1File(ZarFile): 62 | 63 | @staticmethod 64 | def get(in_file): 65 | try: 66 | return Zar1BinaryFile(in_file) 67 | except ZarFileTypeError: 68 | return Zar1TextFile(in_file) 69 | 70 | def _append_line(self, text): 71 | rest = b' ' * (_LINE_WIDTH - len(text)) 72 | self._lines.append(text + rest) 73 | 74 | # == Zar1, Text == 75 | 76 | def get_zar1_text_output(self): 77 | return b''.join([ 78 | line.rstrip() + OUTPUT_NEW_LINE 79 | for line in self.get_zar1_text_lines() 80 | ]) 81 | 82 | def get_zar1_text_lines(self): 83 | return self._lines 84 | 85 | # == Unicode, Legacy == 86 | 87 | def get_unicode_legacy_lro_output(self): 88 | return ''.join([ 89 | line.rstrip() + OUTPUT_NEW_LINE 90 | for line in self.get_unicode_legacy_lro_lines() 91 | ]) 92 | 93 | def get_unicode_legacy_lro_lines(self): 94 | return [ 95 | zar1_encoding.convert_zar1_line_to_unicode_legacy_lro(zar1_line, line_no) 96 | for line_no, zar1_line in enumerate(self._lines, start=1) 97 | ] 98 | 99 | # == Unicode, Semantic, Left-to-Right Override == 100 | 101 | def get_unicode_lro_output(self): 102 | return ''.join([ 103 | line.rstrip() + OUTPUT_NEW_LINE 104 | for line in self.get_unicode_lro_lines() 105 | ]) 106 | 107 | def get_unicode_lro_lines(self): 108 | return [ 109 | zar1_encoding.convert_zar1_line_to_unicode_lro(zar1_line, line_no) 110 | for line_no, zar1_line in enumerate(self._lines, start=1) 111 | ] 112 | 113 | # == Unicode, Semantic, 
Right-to-Left Override == 114 | 115 | def get_unicode_rlo_output(self): 116 | return ''.join([ 117 | line.rstrip() + OUTPUT_NEW_LINE 118 | for line in self.get_unicode_rlo_lines() 119 | ]) 120 | 121 | def get_unicode_rlo_lines(self): 122 | return [ 123 | zar1_encoding.convert_zar1_line_to_unicode_rlo(zar1_line, line_no) 124 | for line_no, zar1_line in enumerate(self._lines, start=1) 125 | ] 126 | 127 | 128 | class Zar1TextFile(Zar1File): 129 | 130 | def __init__(self, in_file): 131 | self._file = in_file 132 | self._lines = [] 133 | self._read() 134 | 135 | def _read(self): 136 | logging.info(b'Reading Zar1 Text file...') 137 | self._file.seek(0) 138 | for line in self._file.readlines(): 139 | text = line.rstrip() # Drop CRLF 140 | self._append_line(text) 141 | 142 | 143 | class Zar1BinaryFile(Zar1File): 144 | 145 | def __init__(self, in_file): 146 | self._file = in_file 147 | self._verify_magic_number() 148 | self._lines = [] 149 | self._read() 150 | 151 | def _verify_magic_number(self): 152 | self._file.seek(0) 153 | magic = self._file.read(len(_BINARY_MAGIC)) 154 | if magic != _BINARY_MAGIC: 155 | raise ZarFileTypeError("Not a Zar1 Binary File") 156 | 157 | def _read(self): 158 | logging.info(b'Reading Zar1 Binary file...') 159 | self._file.seek(len(_BINARY_MAGIC)) 160 | 161 | header = _binary_header_struct.unpack( 162 | self._file.read(_binary_header_struct.size), 163 | ) 164 | lines_count = header[0] 165 | 166 | line_infos = [] 167 | for line_idx in range(lines_count): 168 | read_bytes = self._file.read(_binary_line_info_struct.size) 169 | line_info = _binary_line_info_struct.unpack(read_bytes) 170 | line_infos.append(line_info) 171 | 172 | for line_info in line_infos: 173 | left_indent = line_info[0] 174 | text_len = line_info[2] 175 | text = b' ' * left_indent + self._file.read(text_len) 176 | self._append_line(text) 177 | -------------------------------------------------------------------------------- /src/zarnegar_converter/zar_file.py: 
-------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | # Copyright (C) 2017 Behnam Esfahbod 4 | # 5 | # This program is free software: you can redistribute it and/or modify 6 | # it under the terms of the GNU General Public License as published by 7 | # the Free Software Foundation, either version 3 of the License, or 8 | # (at your option) any later version. 9 | # 10 | # This program is distributed in the hope that it will be useful, 11 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 12 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 | # GNU General Public License for more details. 14 | # 15 | # You should have received a copy of the GNU General Public License 16 | # along with this program. If not, see . 17 | # 18 | # Author(s): Behnam Esfahbod 19 | 20 | from __future__ import absolute_import 21 | from __future__ import division 22 | from __future__ import print_function 23 | from __future__ import unicode_literals 24 | 25 | import sys 26 | 27 | 28 | OUTPUT_NEW_LINE = b'\r\n' 29 | 30 | 31 | class ZarFile(object): 32 | 33 | @staticmethod 34 | def get(in_file): 35 | from zarnegar_converter.zar1_file import Zar1File 36 | return Zar1File.get(in_file) 37 | 38 | # == DEBUG == 39 | 40 | def get_debug(self): 41 | raise NotImplementedError 42 | 43 | # == Zar1, Text == 44 | 45 | def get_zar1_text_output(self): 46 | raise NotImplementedError 47 | 48 | def get_zar1_text_lines(self): 49 | raise NotImplementedError 50 | 51 | # == Unicode, Legacy == 52 | 53 | def get_unicode_legacy_lro_output(self): 54 | raise NotImplementedError 55 | 56 | def get_unicode_legacy_lro_lines(self): 57 | raise NotImplementedError 58 | 59 | def get_unicode_legacy_rlo_output(self): 60 | raise NotImplementedError 61 | 62 | def get_unicode_legacy_rlo_lines(self): 63 | raise NotImplementedError 64 | 65 | # == Unicode, Semantic, Left-to-Right Override == 66 | 67 | def get_unicode_lro_output(self): 68 | raise 
NotImplementedError 69 | 70 | def get_unicode_lro_lines(self): 71 | raise NotImplementedError 72 | 73 | # == Unicode, Semantic, Right-to-Left Override == 74 | 75 | def get_unicode_rlo_output(self): 76 | raise NotImplementedError 77 | 78 | def get_unicode_rlo_lines(self): 79 | raise NotImplementedError 80 | 81 | 82 | class ZarFileTypeError(Exception): 83 | pass 84 | --------------------------------------------------------------------------------
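The binary layout read by `Zar1BinaryFile` in `zar1_file.py` (a 4-byte magic number, a `<HH10s` header, one `<BHB` record per line, then the concatenated line text) can be illustrated with a small standalone sketch. This is not part of the repository: `read_zar1_binary_lines` and `build_sample` are hypothetical names, the sample payload is made up, and the code targets Python 3 whereas the package itself targets Python 2.7. How the "Cumulative Text Length" field is counted is an assumption here; the reader ignores it, as the original does.

```python
import io
import struct

MAGIC = b'\x03\xCA\xB1\xF2'
HEADER = struct.Struct('<HH10s')   # lines count, total text length, user data
LINE_INFO = struct.Struct('<BHB')  # text start, cumulative length, text length
LINE_WIDTH = 80


def read_zar1_binary_lines(stream):
    """Return the 80-byte-wide lines stored in a Zar1 binary stream."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError('not a Zar1 binary stream')
    lines_count, _total_len, _user_data = HEADER.unpack(stream.read(HEADER.size))
    # All line-info records come first, followed by the text of every line.
    infos = [
        LINE_INFO.unpack(stream.read(LINE_INFO.size))
        for _ in range(lines_count)
    ]
    lines = []
    for start, _cumulative, length in infos:
        text = b' ' * start + stream.read(length)
        lines.append(text.ljust(LINE_WIDTH, b' '))  # pad to the fixed width
    return lines


def build_sample():
    """Build a made-up two-line stream for demonstration (hypothetical data)."""
    texts = [(0, b'AB'), (3, b'C')]  # (left indent, line text)
    total_len = sum(len(text) for _, text in texts)
    body = MAGIC + HEADER.pack(len(texts), total_len, b'\x00' * 10)
    running = 0
    for start, text in texts:
        running += len(text)  # assumed meaning of the cumulative field
        body += LINE_INFO.pack(start, running, len(text))
    body += b''.join(text for _, text in texts)
    return io.BytesIO(body)


if __name__ == '__main__':
    for line in read_zar1_binary_lines(build_sample()):
        print(repr(line.rstrip()))
```

Padding each line to a fixed width mirrors `_append_line`, so later per-byte conversion can treat text and binary inputs the same way.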