├── LICENSE
├── README.md
├── __init__.py
├── evaluate.py
├── log
    └── __init__.py
├── models.py
├── quantization
    ├── __init__.py
    ├── lsqplus_quantize_V1.py
    ├── lsqplus_quantize_V2.py
    ├── lsqquantize_V1.py
    └── lsqquantize_V2.py
├── seebnparam.py
└── trains.py


/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # LSQ and LSQ+<br>
 2 | LSQ+ net or LSQplus net and LSQ net <br>
 3 | 
 4 | ## commit log<br>
 5 | `
 6 | 2023-01-08
 7 | `
 8 | Dorefa and Pact, [https://github.com/ZouJiu1/Dorefa_Pact](https://github.com/ZouJiu1/Dorefa_Pact)<br>
 9 | --------------------------------------------------------------------------------------------------------------<br>
10 | add torch.nn.Parameter .data, retrain models 18-01-2022<br>
11 | 
12 | This is an unofficial implementation of LSQ and LSQ+ (LSQplus); I am not an author of either paper. The original papers are LSQ+ [arxiv.org/abs/2004.09576](https://arxiv.org/abs/2004.09576) and LSQ [arxiv.org/abs/1902.08153](https://arxiv.org/abs/1902.08153).<br>
13 | 
14 | pytorch==1.8.1<br>
15 | 
16 | Train a 32-bit float model first, then fine-tune a low bit-width QAT (quantization-aware training) model initialized from the trained float weights.<br>
17 | 
18 | The training dataset is CIFAR10 and the model is a modified ResNet18.<br>
19 | 
20 | ## Version introduction
21 | lsqplus_quantize_V1.py: initializes s and beta of the activation quantizer according to LSQ+ [LSQ+: Improving low-bit quantization through learnable offsets and better initialization](https://arxiv.org/abs/2004.09576)<br><br>
22 | lsqplus_quantize_V2.py: initializes s and beta of the activation quantizer from min/max values<br><br>
23 | lsqquantize_V1.py: initializes s of the activation quantizer according to LSQ [Learned Step Size Quantization](https://arxiv.org/abs/1902.08153)<br><br>
24 | lsqquantize_V2.py: initializes s of the activation quantizer to 1<br><br>
25 | lsqplus_quantize_V2.py gives the best results on the CIFAR10 dataset.<br>
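
As a rough illustration of the initialization choices listed above, the sketch below computes the LSQ step size `s = 2 * mean(|v|) / sqrt(Qp)` from the LSQ paper and a min/max-based `s` and `beta` in the spirit of `lsqplus_quantize_V2.py`. It uses plain Python lists rather than tensors, and the helper names (`quant_range`, `lsq_init_step`, `minmax_init`, `fake_quantize`) are hypothetical; the repository's own code may differ in detail.

```python
import math


def quant_range(bits, all_positive=False):
    """Return the (lowest, highest) integer level of a `bits`-wide quantizer."""
    if all_positive:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def lsq_init_step(values, bits, all_positive=False):
    """LSQ-style step size: s = 2 * mean(|v|) / sqrt(Qp)."""
    _, hi = quant_range(bits, all_positive)
    mean_abs = sum(abs(v) for v in values) / len(values)
    return 2.0 * mean_abs / math.sqrt(hi)


def minmax_init(values, bits, all_positive=False):
    """Assumed min/max rule: map [min, max] onto the full integer range."""
    lo, hi = quant_range(bits, all_positive)
    vmin, vmax = min(values), max(values)
    s = max((vmax - vmin) / (hi - lo), 1e-8)  # avoid a zero step size
    beta = vmin - lo * s                      # offset so that vmin maps to `lo`
    return s, beta


def fake_quantize(v, s, beta, bits, all_positive=False):
    """Quantize then dequantize: round(clamp((v - beta) / s)) * s + beta."""
    lo, hi = quant_range(bits, all_positive)
    q = round(min(max((v - beta) / s, lo), hi))
    return q * s + beta
```

With `beta = 0` the last function reduces to the plain LSQ quantizer; the learnable offset is what LSQ+ adds for activations that are not symmetric around zero.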
26 | 
27 | ## Training Results
28 | ### All rows in the table below use a_bit=8, w_bit=8
29 | | version | weight per_channel | learning rate (by epoch) | A s initial | A beta initial | best epoch | Accuracy (%) | models |
30 | | ------ | --------- | ------ | ------ | ------ | ------ | ------ | ------ |
31 | | Float 32bit | - | <=66 0.1<br><=86 0.01<br><=99 0.001<br><=112 0.0001 | - | - | 112 | 92.6 | [https://www.aliyundrive.com/s/6B2AZ45fFjx](https://www.aliyundrive.com/s/6B2AZ45fFjx) |
32 | | lsqplus_quantize_V1 | × | <=31 0.1<br><=61 0.01<br><=81 0.001<br><112 0.0001 | 1 | -1e-9 | 90 | 90.3 | [https://www.aliyundrive.com/s/FNZRhoTe8uW](https://www.aliyundrive.com/s/FNZRhoTe8uW) |
33 | | lsqplus_quantize_V2 | × | as before | - | - | 87 | 92.8 | [https://www.aliyundrive.com/s/WDH3ZnEa7vy](https://www.aliyundrive.com/s/WDH3ZnEa7vy) |
34 | | lsqplus_quantize_V1 | ✔ | as before | - | - | 96 | 91.19  | [https://www.aliyundrive.com/s/JATsi4vdurp](https://www.aliyundrive.com/s/JATsi4vdurp) |
35 | | lsqplus_quantize_V2 | ✔ | as before | - | - | 69 | 92.8 | [https://www.aliyundrive.com/s/LRWHaBLQGWc](https://www.aliyundrive.com/s/LRWHaBLQGWc) |
36 | | lsqquantize_V1 | × | as before | - | - | 102 | 91.89 | [https://www.aliyundrive.com/s/nR1KZZRuB23](https://www.aliyundrive.com/s/nR1KZZRuB23) |
37 | | lsqquantize_V2 | × | as before | - | - | 69 | 91.82 | [https://www.aliyundrive.com/s/7fjmViqUvh4](https://www.aliyundrive.com/s/7fjmViqUvh4) |
38 | | lsqquantize_V1 | ✔ | as before | - | - | 108 | 91.29 | [https://www.aliyundrive.com/s/PX84qGorVxY](https://www.aliyundrive.com/s/PX84qGorVxY) |
39 | | lsqquantize_V2 | ✔ | as before | - | - | 72 | 91.72 | [https://www.aliyundrive.com/s/7nGvMVZcKp7](https://www.aliyundrive.com/s/7nGvMVZcKp7) |
40 | <br>
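
The learning-rate column in the table above packs an epoch bound and a rate into each entry. Read as a piecewise-constant schedule, the QAT rows could be sketched as follows. The names `QAT_SCHEDULE`, `FINAL_LR` and `lr_for_epoch` are hypothetical, and treating every epoch after 81 as 0.0001 is an assumption based on the `<112 0.0001` entry.

```python
# (last epoch using this rate, learning rate), taken from the QAT rows of the table
QAT_SCHEDULE = [(31, 0.1), (61, 0.01), (81, 0.001)]
FINAL_LR = 0.0001  # assumed for all remaining epochs


def lr_for_epoch(epoch, schedule=QAT_SCHEDULE, final_lr=FINAL_LR):
    """Return the learning rate for a given epoch index."""
    for last_epoch, lr in schedule:
        if epoch <= last_epoch:
            return lr
    return final_lr
```

The returned value can be written into each entry of `optimizer.param_groups`, which is what `adjust_lr` in `evaluate.py` does with its own epoch bounds.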
41 | All models:
42 | 
43 | [https://www.aliyundrive.com/s/hng9XsvhYru](https://www.aliyundrive.com/s/hng9XsvhYru)  
44 | 
45 | <br>
46 | "A" denotes activation. s and beta are initialized with a moving average.<br><br>
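
A minimal sketch of moving-average initialization, assuming an exponential moving average over per-batch min/max estimates for the first few batches (the `batch_init` setting in `evaluate.py` suggests a count such as 20). The momentum of 0.9 and the names `ema_update` and `init_from_batches` are assumptions for illustration, not the repository's actual code.

```python
def ema_update(running, new_value, momentum=0.9):
    """Exponential moving average; the first observation is taken as-is."""
    if running is None:
        return new_value
    return momentum * running + (1.0 - momentum) * new_value


def init_from_batches(batches, bits=8, momentum=0.9):
    """Estimate s and beta from the min/max of each calibration batch."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # signed integer range
    s = beta = None
    for batch in batches:
        bmin, bmax = min(batch), max(batch)
        batch_s = (bmax - bmin) / (hi - lo)   # step size covering this batch
        batch_beta = bmin - lo * batch_s      # offset so that bmin maps to `lo`
        s = ema_update(s, batch_s, momentum)
        beta = ema_update(beta, batch_beta, momentum)
    return s, beta
```

Averaging over several batches makes the starting values less sensitive to outliers in any single batch than a one-shot estimate would be.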
47 | 
48 | LEARNED STEP SIZE QUANTIZATION<br>
49 | LSQ+: Improving low-bit quantization through learnable offsets and better initialization<br>
50 | 
51 | ### References<br>
52 | https://github.com/666DZY666/micronet<br>
53 | https://github.com/hustzxd/LSQuantization<br>
54 | https://github.com/zhutmost/lsq-net<br>
55 | https://github.com/Zhen-Dong/HAWQ<br>
56 | https://github.com/KwangHoonAn/PACT<br>
57 | https://github.com/Jermmy/pytorch-quantization-demo<br>
58 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 |       
2 |       
3 | 


--------------------------------------------------------------------------------
/evaluate.py:
--------------------------------------------------------------------------------
  1 | #encoding=utf-8
  2 | #Author: ZouJiu
  3 | #Time: 2021-11-13
  4 | 
  5 | import numpy as np
  6 | import torch
  7 | import os
  8 | import time
  9 | import torch
 10 | import torchvision
 11 | import torchvision.transforms as transforms
 12 | from torch.utils.data import Dataset, DataLoader
 13 | # from load_datas import TF, trainDataset, collate_fn
 14 | import models #, resnet50
 15 | from quantization.lsqquantize_V1 import prepare as lsqprepareV1
 16 | from quantization.lsqquantize_V2 import prepare as lsqprepareV2
 17 | from quantization.lsqplus_quantize_V1 import prepare as lsqplusprepareV1
 18 | from quantization.lsqplus_quantize_V2 import prepare as lsqplusprepareV2
 19 | from quantization.lsqplus_quantize_V1 import update_LSQplus_activation_Scalebeta
 20 | import torch.optim as optim
 21 | import datetime
 22 | os.environ["CUDA_VISIBLE_DEVICES"] = '1'
 23 | 
 24 | def adjust_lr(optimizer, stepiters, epoch):
 25 |     if epoch < 135:
 26 |         lr = 0.1
 27 |     elif epoch < 185:
 28 |         lr = 0.01
 29 |     elif epoch < 290:
 30 |         lr = 0.001
 31 |     else:
 32 |         import sys
 33 |         sys.exit(0)
 34 |     for param_group in optimizer.param_groups:
 35 |         param_group['lr'] = lr
 36 | 
 37 | def evaluate():
 38 |     config = {'a_bit':8, 'w_bit':8, "all_positive":False, "per_channel":True, 
 39 |               "num_classes":10,"batch_init":20}
 40 |     pretrainedmodel = r'C:\Users\10696\Desktop\QAT\lsq+\log\model_108_42510_0.003_92.528_2021-11-27_17-49-47.pth'
 41 |     Resnet_pretrain = False #test
 42 |     batch_size = 128
 43 |     num_epochs = 290
 44 |     Floatmodel = True    #QAT or float-32 train
 45 |     LSQplus = True     #LSQ+ or LSQ
 46 |     scratch = True # train from scratch rather than fine-tune; set to False to fine-tune
 47 |     tim = datetime.datetime.strftime(datetime.datetime.now(),"%Y-%m-%d %H-%M-%S").replace(' ', '_')
 48 | 
 49 |     test_transform = transforms.Compose([
 50 |         # transforms.Resize((32, 32)),
 51 |         transforms.ToTensor(),
 52 |         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.201))])
 53 | 
 54 |     batch_size = 128 #Accuracy all is: 73.4
 55 | 
 56 |     testset = torchvision.datasets.CIFAR10(root='datas', train=False,
 57 |                                         download=True, transform=test_transform)
 58 |     testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
 59 |                                             shuffle=False, num_workers=2, drop_last=True)
 60 | 
 61 |     classes = ('plane', 'car', 'bird', 'cat',
 62 |             'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
 63 |     device = "cuda" if torch.cuda.is_available() else "cpu"
 64 | 
 65 |     model = models.resnet18(pretrained = Resnet_pretrain, num_classes=config['num_classes'])
 66 | 
 67 |     #LSQ+ (use the prepare variant, V1 or V2, that the checkpoint was trained with)
 68 |     if LSQplus and not Floatmodel:
 69 |         lsqplusprepareV2(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
 70 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
 71 |                 batch_init = config["batch_init"])
 72 |     elif not LSQplus and not Floatmodel:
 73 |         #LSQ (likewise, pick V1 or V2 to match the checkpoint)
 74 |         lsqprepareV1(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
 75 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
 76 |                 batch_init = config["batch_init"])
 77 |     elif Floatmodel:
 78 |         pass
 79 | 
 80 |     if not Floatmodel:
 81 |         print(model)
 82 |     if not os.path.exists(pretrainedmodel):
 83 |         print('the pretrained model does not exist: %s'%pretrainedmodel)
 84 |     if pretrainedmodel and os.path.exists(pretrainedmodel):
 85 |         print('loading pretrained model: ', pretrainedmodel)
 86 |         if torch.cuda.is_available():
 87 |             state_dict = torch.load(pretrainedmodel, map_location='cuda')
 88 |         else:
 89 |             state_dict = torch.load(pretrainedmodel, map_location='cpu')
 90 |         model.load_state_dict(state_dict['state_dict'])
 91 |         if not scratch:
 92 |             iteration = state_dict['iteration']
 93 |             alliters = state_dict['alliters']
 94 |             nowepoch = state_dict['nowepoch']
 95 |         else:
 96 |             iteration = 0
 97 |             alliters = 0
 98 |             nowepoch = 0
 99 |         print('loading complete')
100 |     else:
101 |         print('no pretrained model')
102 |         iteration = 0
103 |         alliters = 0
104 |         nowepoch = 0
105 |     model = model.to(device)
106 | 
107 |     print('validation on the test set')
108 |     # prepare to count predictions for each class
109 |     correct_pred = {classname: 0 for classname in classes}
110 |     total_pred = {classname: 0 for classname in classes}
111 | 
112 |     model.eval()
113 |     # again no gradients needed
114 |     with torch.no_grad():
115 |         for data in testloader:
116 |             images, labels = data
117 |             images = images.to(device)
118 |             labels = labels.to(device)
119 |             outputs = model(images)
120 |             _, predictions = torch.max(outputs, 1)
121 |             # collect the correct predictions for each class
122 |             for label, prediction in zip(labels, predictions):
123 |                 if label == prediction:
124 |                     correct_pred[classes[label]] += 1
125 |                 total_pred[classes[label]] += 1
126 | 
127 | 
128 |     # print accuracy for each class
129 |     correctall = 0
130 |     alltest = 0
131 |     for classname, correct_count in correct_pred.items():
132 |         accuracy = 100 * float(correct_count) / total_pred[classname]
133 |         print("Accuracy for class {:5s} is: {:.1f} %".format(classname,
134 |                                                     accuracy))
135 |         correctall += correct_count
136 |         alltest += total_pred[classname]
137 |     print("Accuracy all is: {:.1f}".format(100 * float(correctall)/alltest))
138 |     
139 | 
140 | if __name__ == '__main__':
141 |     evaluate()
142 | 


--------------------------------------------------------------------------------
/log/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZouJiu1/LSQplus/2076e86479491f0e68ada31d36948596a1ee24f9/log/__init__.py


--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | from torch import Tensor
  3 | import torch.nn as nn
  4 | from torch.nn import functional as F
  5 | from torch.hub import load_state_dict_from_url
  6 | from typing import Type, Any, Callable, Union, List, Optional
  7 | 
  8 | 
  9 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
 10 |            'resnet152', 'resnext50_32x4d', 'resnext101_32x8d',
 11 |            'wide_resnet50_2', 'wide_resnet101_2']
 12 | 
 13 | 
 14 | model_urls = {
 15 |     'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
 16 |     'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
 17 |     'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
 18 |     'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
 19 |     'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
 20 |     'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
 21 |     'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
 22 |     'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
 23 |     'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
 24 | }
 25 | 
 26 | 
 27 | def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1) -> nn.Conv2d:
 28 |     """3x3 convolution with padding"""
 29 |     return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
 30 |                      padding=dilation, groups=groups, bias=False, dilation=dilation)
 31 | 
 32 | 
 33 | def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2d:
 34 |     """1x1 convolution"""
 35 |     return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
 36 | 
 37 | 
 38 | class BasicBlock(nn.Module):
 39 |     expansion: int = 1
 40 | 
 41 |     def __init__(
 42 |         self,
 43 |         inplanes: int,
 44 |         planes: int,
 45 |         stride: int = 1,
 46 |         downsample: Optional[nn.Module] = None,
 47 |         groups: int = 1,
 48 |         base_width: int = 64,
 49 |         dilation: int = 1,
 50 |         norm_layer: Optional[Callable[..., nn.Module]] = None
 51 |     ) -> None:
 52 |         super(BasicBlock, self).__init__()
 53 |         if norm_layer is None:
 54 |             norm_layer = nn.BatchNorm2d
 55 |         if groups != 1 or base_width != 64:
 56 |             raise ValueError('BasicBlock only supports groups=1 and base_width=64')
 57 |         if dilation > 1:
 58 |             raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
 59 |         # Both self.conv1 and self.downsample layers downsample the input when stride != 1
 60 |         self.conv1 = conv3x3(inplanes, planes, stride)
 61 |         self.bn1 = norm_layer(planes)
 62 |         self.relu = nn.ReLU(inplace=True)
 63 |         self.conv2 = conv3x3(planes, planes)
 64 |         self.bn2 = norm_layer(planes)
 65 |         self.downsample = downsample
 66 |         self.stride = stride
 67 | 
 68 |     def forward(self, x: Tensor) -> Tensor:
 69 |         identity = x
 70 |         out = self.conv1(x)
 71 |         out = self.bn1(out)
 72 |         out = self.relu(out)
 73 | 
 74 |         out = self.conv2(out)
 75 |         out = self.bn2(out)
 76 | 
 77 |         if self.downsample is not None:
 78 |             identity = self.downsample(x)
 79 | 
 80 |         out += identity
 81 |         out = self.relu(out)
 82 | 
 83 |         return out
 84 | 
 85 | 
 86 | class Bottleneck(nn.Module):
 87 |     # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
 88 |     # while original implementation places the stride at the first 1x1 convolution(self.conv1)
 89 |     # according to "Deep residual learning for image recognition"https://arxiv.org/abs/1512.03385.
 90 |     # This variant is also known as ResNet V1.5 and improves accuracy according to
 91 |     # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
 92 | 
 93 |     expansion: int = 4
 94 | 
 95 |     def __init__(
 96 |         self,
 97 |         inplanes: int,
 98 |         planes: int,
 99 |         stride: int = 1,
100 |         downsample: Optional[nn.Module] = None,
101 |         groups: int = 1,
102 |         base_width: int = 64,
103 |         dilation: int = 1,
104 |         norm_layer: Optional[Callable[..., nn.Module]] = None
105 |     ) -> None:
106 |         super(Bottleneck, self).__init__()
107 |         if norm_layer is None:
108 |             norm_layer = nn.BatchNorm2d
109 |         width = int(planes * (base_width / 64.)) * groups
110 |         # Both self.conv2 and self.downsample layers downsample the input when stride != 1
111 |         self.conv1 = conv1x1(inplanes, width)
112 |         self.bn1 = norm_layer(width)
113 |         self.conv2 = conv3x3(width, width, stride, groups, dilation)
114 |         self.bn2 = norm_layer(width)
115 |         self.conv3 = conv1x1(width, planes * self.expansion)
116 |         self.bn3 = norm_layer(planes * self.expansion)
117 |         self.relu = nn.ReLU(inplace=True)
118 |         self.downsample = downsample
119 |         self.stride = stride
120 | 
121 |     def forward(self, x: Tensor) -> Tensor:
122 |         identity = x
123 | 
124 |         out = self.conv1(x)
125 |         out = self.bn1(out)
126 |         out = self.relu(out)
127 | 
128 |         out = self.conv2(out)
129 |         out = self.bn2(out)
130 |         out = self.relu(out)
131 | 
132 |         out = self.conv3(out)
133 |         out = self.bn3(out)
134 | 
135 |         if self.downsample is not None:
136 |             identity = self.downsample(x)
137 | 
138 |         out += identity
139 |         out = self.relu(out)
140 | 
141 |         return out
142 | 
143 | 
144 | class ResNet(nn.Module):
145 | 
146 |     def __init__(
147 |         self,
148 |         block: Type[Union[BasicBlock, Bottleneck]],
149 |         layers: List[int],
150 |         num_classes: int = 1000,
151 |         zero_init_residual: bool = False,
152 |         groups: int = 1,
153 |         width_per_group: int = 64,
154 |         replace_stride_with_dilation: Optional[List[bool]] = None,
155 |         norm_layer: Optional[Callable[..., nn.Module]] = None
156 |     ) -> None:
157 |         super(ResNet, self).__init__()
158 |         if norm_layer is None:
159 |             norm_layer = nn.BatchNorm2d
160 |         self._norm_layer = norm_layer
161 | 
162 |         self.inplanes = 64
163 |         self.dilation = 1
164 |         if replace_stride_with_dilation is None:
165 |             # each element in the tuple indicates if we should replace
166 |             # the 2x2 stride with a dilated convolution instead
167 |             replace_stride_with_dilation = [False, False, False]
168 |         if len(replace_stride_with_dilation) != 3:
169 |             raise ValueError("replace_stride_with_dilation should be None "
170 |                              "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
171 |         self.groups = groups
172 |         self.base_width = width_per_group
173 |         self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1,
174 |                                bias=False)
175 |         self.bn1 = norm_layer(self.inplanes)
176 |         self.relu = nn.ReLU(inplace=True)
177 |         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
178 |         self.layer1 = self._make_layer(block, 64, layers[0])
179 |         self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
180 |                                        dilate=replace_stride_with_dilation[0])
181 |         self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
182 |                                        dilate=replace_stride_with_dilation[1])
183 |         self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
184 |                                        dilate=replace_stride_with_dilation[2])
185 |         # self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
186 |         self.poolavg = nn.AvgPool2d(4)
187 |         self.fc = nn.Linear(512 * block.expansion, num_classes)
188 | 
189 |         for m in self.modules():
190 |             if isinstance(m, nn.Conv2d):
191 |                 nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
192 |             elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
193 |                 nn.init.constant_(m.weight, 1)
194 |                 nn.init.constant_(m.bias, 0)
195 | 
196 |         # Zero-initialize the last BN in each residual branch,
197 |         # so that the residual branch starts with zeros, and each residual block behaves like an identity.
198 |         # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
199 |         if zero_init_residual:
200 |             for m in self.modules():
201 |                 if isinstance(m, Bottleneck):
202 |                     nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
203 |                 elif isinstance(m, BasicBlock):
204 |                     nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]
205 | 
206 |     def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int, blocks: int,
207 |                     stride: int = 1, dilate: bool = False) -> nn.Sequential:
208 |         norm_layer = self._norm_layer
209 |         downsample = None
210 |         previous_dilation = self.dilation
211 |         if dilate:
212 |             self.dilation *= stride
213 |             stride = 1
214 |         if stride != 1 or self.inplanes != planes * block.expansion:
215 |             downsample = nn.Sequential(
216 |                 conv1x1(self.inplanes, planes * block.expansion, stride),
217 |                 norm_layer(planes * block.expansion),
218 |             )
219 | 
220 |         layers = []
221 |         layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
222 |                             self.base_width, previous_dilation, norm_layer))
223 |         self.inplanes = planes * block.expansion
224 |         for _ in range(1, blocks):
225 |             layers.append(block(self.inplanes, planes, groups=self.groups,
226 |                                 base_width=self.base_width, dilation=self.dilation,
227 |                                 norm_layer=norm_layer))
228 | 
229 |         return nn.Sequential(*layers)
230 | 
231 |     def _forward_impl(self, x: Tensor) -> Tensor:
232 |         # See note [TorchScript super()]
233 |         x = self.conv1(x)
234 |         x = self.bn1(x)
235 |         x = self.relu(x)
236 |         # x = self.maxpool(x)
237 | 
238 |         x = self.layer1(x)
239 |         x = self.layer2(x)
240 |         x = self.layer3(x)
241 |         x = self.layer4(x)
242 | 
243 |         # x = self.avgpool(x)
244 |         # x = torch.flatten(x, 1)
245 | 
246 |         x = self.poolavg(x)
247 |         x = x.view(x.size()[0], -1)
248 |         x = self.fc(x)
249 | 
250 |         return x
251 | 
252 |     def forward(self, x: Tensor) -> Tensor:
253 |         return self._forward_impl(x)
254 | 
255 | 
256 | def _resnet(
257 |     arch: str,
258 |     block: Type[Union[BasicBlock, Bottleneck]],
259 |     layers: List[int],
260 |     pretrained: bool,
261 |     progress: bool,
262 |     **kwargs: Any
263 | ) -> ResNet:
264 |     model = ResNet(block, layers, **kwargs)
265 |     # if pretrained:
266 |     #     state_dict = load_state_dict_from_url(model_urls[arch],
267 |     #                                           progress=progress)
268 |     #     delete = []
269 |     #     for key, value in state_dict.items():
270 |     #         if 'fc' in key:
271 |     #             delete.append(key)
272 |     #         print('delete: ', delete)
273 |     #     for key in delete:
274 |     #         state_dict.pop(key)
275 |     #     model.load_state_dict(state_dict, strict=False)
276 |     return model
277 | 
278 | 
279 | def resnet18(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
280 |     r"""ResNet-18 model from
281 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
282 | 
283 |     Args:
284 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
285 |         progress (bool): If True, displays a progress bar of the download to stderr
286 |     """
287 |     return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,
288 |                    **kwargs)
289 | 
290 | 
291 | def resnet34(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
292 |     r"""ResNet-34 model from
293 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
294 | 
295 |     Args:
296 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
297 |         progress (bool): If True, displays a progress bar of the download to stderr
298 |     """
299 |     return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress,
300 |                    **kwargs)
301 | 
302 | 
303 | def resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
304 |     r"""ResNet-50 model from
305 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
306 | 
307 |     Args:
308 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
309 |         progress (bool): If True, displays a progress bar of the download to stderr
310 |     """
311 |     return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress,
312 |                    **kwargs)
313 | 
314 | 
315 | def resnet101(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
316 |     r"""ResNet-101 model from
317 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
318 | 
319 |     Args:
320 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
321 |         progress (bool): If True, displays a progress bar of the download to stderr
322 |     """
323 |     return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress,
324 |                    **kwargs)
325 | 
326 | 
327 | def resnet152(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
328 |     r"""ResNet-152 model from
329 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
330 | 
331 |     Args:
332 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
333 |         progress (bool): If True, displays a progress bar of the download to stderr
334 |     """
335 |     return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress,
336 |                    **kwargs)
337 | 
338 | 
339 | def resnext50_32x4d(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
340 |     r"""ResNeXt-50 32x4d model from
341 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.
342 | 
343 |     Args:
344 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
345 |         progress (bool): If True, displays a progress bar of the download to stderr
346 |     """
347 |     kwargs['groups'] = 32
348 |     kwargs['width_per_group'] = 4
349 |     return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],
350 |                    pretrained, progress, **kwargs)
351 | 
352 | 
353 | def resnext101_32x8d(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
354 |     r"""ResNeXt-101 32x8d model from
355 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.
356 | 
357 |     Args:
358 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
359 |         progress (bool): If True, displays a progress bar of the download to stderr
360 |     """
361 |     kwargs['groups'] = 32
362 |     kwargs['width_per_group'] = 8
363 |     return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
364 |                    pretrained, progress, **kwargs)
365 | 
366 | 
367 | def wide_resnet50_2(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
368 |     r"""Wide ResNet-50-2 model from
369 |     `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.
370 | 
371 |     The model is the same as ResNet except for the bottleneck number of channels
372 |     which is twice larger in every block. The number of channels in outer 1x1
373 |     convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
374 |     channels, and in Wide ResNet-50-2 has 2048-1024-2048.
375 | 
376 |     Args:
377 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
378 |         progress (bool): If True, displays a progress bar of the download to stderr
379 |     """
380 |     kwargs['width_per_group'] = 64 * 2
381 |     return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3],
382 |                    pretrained, progress, **kwargs)
383 | 
384 | 
385 | def wide_resnet101_2(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
386 |     r"""Wide ResNet-101-2 model from
387 |     `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.
388 | 
389 |     The model is the same as ResNet except for the bottleneck number of channels
390 |     which is twice larger in every block. The number of channels in outer 1x1
391 |     convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
392 |     channels, and in Wide ResNet-50-2 has 2048-1024-2048.
393 | 
394 |     Args:
395 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
396 |         progress (bool): If True, displays a progress bar of the download to stderr
397 |     """
398 |     kwargs['width_per_group'] = 64 * 2
399 |     return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3],
400 |                    pretrained, progress, **kwargs)
401 | 
402 | def wide_resnet1202(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
403 |     r"""Very deep ResNet built from ``BasicBlock`` with ``[200, 200, 200, 200]``
404 |     blocks per stage, constructed under the arch name ``'resnet1202'``.
405 | 
406 |     Despite the ``wide_`` prefix, this function does not set ``width_per_group``,
407 |     so no widening is applied here.
408 | 
409 |     Args:
410 |         pretrained (bool): Ignored; ``pretrained=False`` is always passed to ``_resnet``
411 |         progress (bool): If True, displays a progress bar of the download to stderr
412 |     """
413 |     return _resnet('resnet1202', BasicBlock, [200, 200, 200, 200],
414 |                    pretrained=False, progress=progress, **kwargs)
415 | 
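The `layers` lists passed to `_resnet` above determine the nominal depth in each model name. The following is a minimal sketch of that arithmetic, assuming the usual counting convention (one stem convolution, two convolutions per `BasicBlock`, three per `Bottleneck`, one final fully connected layer, downsample convolutions not counted). The helper `nominal_depth` and the `CONVS_PER_BLOCK` table are hypothetical names introduced only for illustration; they are not part of models.py.

```python
from typing import Dict, List, Tuple

# Assumption: weighted layers per residual block under the usual naming convention.
CONVS_PER_BLOCK: Dict[str, int] = {"BasicBlock": 2, "Bottleneck": 3}


def nominal_depth(block: str, layers: List[int]) -> int:
    """Stem conv + convs inside all residual blocks + final fc layer."""
    return 1 + CONVS_PER_BLOCK[block] * sum(layers) + 1


# Block type and per-stage block counts copied from the constructors above.
CONFIGS: Dict[str, Tuple[str, List[int]]] = {
    "resnet18": ("BasicBlock", [2, 2, 2, 2]),
    "resnet34": ("BasicBlock", [3, 4, 6, 3]),
    "resnet50": ("Bottleneck", [3, 4, 6, 3]),
    "resnet101": ("Bottleneck", [3, 4, 23, 3]),
    "resnet152": ("Bottleneck", [3, 8, 36, 3]),
    "resnet1202": ("BasicBlock", [200, 200, 200, 200]),
}

if __name__ == "__main__":
    for name, (block, layers) in CONFIGS.items():
        print(name, nominal_depth(block, layers))
```

Under this convention the first five configurations reproduce the number in their names, while the `[200, 200, 200, 200]` configuration counts 1602 layers, so the `'resnet1202'` label does not follow the same rule as the others.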


--------------------------------------------------------------------------------
/quantization/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ZouJiu1/LSQplus/2076e86479491f0e68ada31d36948596a1ee24f9/quantization/__init__.py


--------------------------------------------------------------------------------
/quantization/lsqplus_quantize_V1.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import math
  3 | import torch
  4 | import torch.nn as nn
  5 | import torch.nn.functional as F
  6 | from torch.autograd import Function
  7 | from quantization.lsqquantize_V1 import Round
  8 | 
  9 | class ALSQPlus(Function):
 10 |     @staticmethod
 11 |     def forward(ctx, weight, alpha, g, Qn, Qp, beta):
 12 |         # assert alpha > 0, "alpha={}".format(alpha)
 13 |         ctx.save_for_backward(weight, alpha, beta)
 14 |         ctx.other = g, Qn, Qp
 15 |         w_q = Round.apply(torch.div((weight - beta), alpha).clamp(Qn, Qp))
 16 |         w_q = w_q * alpha + beta
 17 |         return w_q
 18 | 
 19 |     @staticmethod
 20 |     def backward(ctx, grad_weight):
 21 |         weight, alpha, beta = ctx.saved_tensors
 22 |         g, Qn, Qp = ctx.other
 23 |         q_w = (weight - beta) / alpha
 24 |         smaller = (q_w < Qn).float() # bool to float: 1.0 or 0.0
 25 |         bigger = (q_w > Qp).float() # bool to float: 1.0 or 0.0
 26 |         between = 1.0 - smaller -bigger # mask of elements inside the quantization range
 27 |         grad_alpha = ((smaller * Qn + bigger * Qp + 
 28 |             between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0)
 29 |         grad_beta = ((smaller + bigger) * grad_weight * g).sum().unsqueeze(dim=0)
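 30 |         # values outside the quantization range are clamped to constants, so their gradient is 0
 31 |         grad_weight = between * grad_weight
 32 |         # the returned gradients must line up with the arguments of forward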
 33 |         return grad_weight, grad_alpha,  None, None, None, grad_beta
 34 | 
 35 | class WLSQPlus(Function):
 36 |     @staticmethod
 37 |     def forward(ctx, weight, alpha, g, Qn, Qp, per_channel):
 38 |         # assert alpha > 0, "alpha={}".format(alpha)
 39 |         ctx.save_for_backward(weight, alpha)
 40 |         ctx.other = g, Qn, Qp, per_channel
 41 |         if per_channel:
 42 |             sizes = weight.size()
 43 |             weight = weight.contiguous().view(weight.size()[0], -1)
 44 |             weight = torch.transpose(weight, 0, 1)
 45 |             alpha = torch.broadcast_to(alpha, weight.size())
 46 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 47 |             w_q = w_q * alpha
 48 |             w_q = torch.transpose(w_q, 0, 1)
 49 |             w_q = w_q.contiguous().view(sizes)
 50 |         else:
 51 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 52 |             w_q = w_q * alpha 
 53 |         return w_q
 54 | 
 55 |     @staticmethod
 56 |     def backward(ctx, grad_weight):
 57 |         weight, alpha = ctx.saved_tensors
 58 |         g, Qn, Qp, per_channel = ctx.other
 59 |         if per_channel:
 60 |             sizes = weight.size()
 61 |             weight = weight.contiguous().view(weight.size()[0], -1)
 62 |             weight = torch.transpose(weight, 0, 1)
 63 |             alpha = torch.broadcast_to(alpha, weight.size())
 64 |             q_w = weight / alpha
 65 |             q_w = torch.transpose(q_w, 0, 1)
 66 |             q_w = q_w.contiguous().view(sizes)
 67 |         else:
 68 |             q_w = weight / alpha
 69 |         smaller = (q_w < Qn).float() # bool to float: 1.0 or 0.0
 70 |         bigger = (q_w > Qp).float() # bool to float: 1.0 or 0.0
 71 |         between = 1.0 - smaller -bigger # mask of elements inside the quantization range
 72 |         if per_channel:
 73 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 74 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g)
 75 |             grad_alpha = grad_alpha.contiguous().view(grad_alpha.size()[0], -1).sum(dim=1)
 76 |         else:
 77 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 78 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0)
 79 |         # values outside the quantization range are clamped to constants, so their gradient is 0
 80 |         grad_weight = between * grad_weight
 81 |         return grad_weight, grad_alpha, None, None, None, None
 82 | 
 83 | def grad_scale(x, scale):
 84 |     y = x
 85 |     y_grad = x * scale
 86 |     return (y - y_grad).detach() + y_grad
 87 | 
 88 | def round_pass(x):
 89 |     y = x.round()
 90 |     y_grad = x
 91 |     return (y - y_grad).detach() + y_grad
 92 | 
 93 | def get_percentile_min_max(input, lower_percentile, uppper_percentile, output_tensor):
 94 |     batch_size = input.shape[0]
 95 |     lower_index = round(batch_size * (1 - lower_percentile*0.01))
 96 |     upper_index = round(batch_size * (1 - uppper_percentile*0.01))
 97 | 
 98 |     upper_bound = torch.kthvalue(input, k=upper_index).values
 99 | 
100 |     if lower_percentile==0:
101 |         lower_bound = upper_bound * 0
102 |     else:
103 |         lower_bound = -torch.kthvalue(-input, k=lower_index).values
104 |     return lower_bound, upper_bound
105 | 
106 | # Activation (feature) quantization
107 | class LSQPlusActivationQuantizer(nn.Module):
108 |     def __init__(self, a_bits, all_positive=False,batch_init = 20):
109 |         # activations have no per-channel option
110 |         super(LSQPlusActivationQuantizer, self).__init__()
111 |         self.a_bits = a_bits
112 |         self.all_positive = all_positive
113 |         self.batch_init = batch_init
114 |         if self.all_positive:
115 |             # unsigned activation is quantized to [0, 2^b-1]
116 |             self.Qn = 0
117 |             self.Qp = 2 ** self.a_bits - 1
118 |         else:
119 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
120 |             self.Qn = - 2 ** (self.a_bits - 1)
121 |             self.Qp = 2 ** (self.a_bits - 1) - 1
122 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
123 |         # self.beta = torch.nn.Parameter(torch.tensor([float(0)]))
124 |         self.beta = torch.nn.Parameter(torch.tensor([float(-1e-9)]), requires_grad=True)
125 |         self.init_state = 0
126 | 
127 |     # quantize / dequantize
128 |     def forward(self, activation):
129 |         #V1
130 |         # print(self.a_bits, self.batch_init)
131 |         if self.a_bits == 32:
132 |             q_a = activation
133 |         elif self.a_bits == 1:
134 |             print('！Binary quantization is not supported ！')
135 |             assert self.a_bits != 1
136 |         else:
137 |             if self.init_state==0:
138 |                 self.g = 1.0/math.sqrt(activation.numel() * self.Qp)
139 |                 self.init_state += 1
140 |             q_a = ALSQPlus.apply(activation, self.s, self.g, self.Qn, self.Qp, self.beta)
141 |             # print(self.s, self.beta)
142 |         return q_a
143 | 
144 | # Weight quantization
145 | class LSQPlusWeightQuantizer(nn.Module):
146 |     def __init__(self, w_bits, all_positive=False, per_channel=False,batch_init = 20):
147 |         super(LSQPlusWeightQuantizer, self).__init__()
148 |         self.w_bits = w_bits
149 |         self.all_positive = all_positive
150 |         self.batch_init = batch_init
151 |         if self.all_positive:
152 |             # unsigned activation is quantized to [0, 2^b-1]
153 |             self.Qn = 0
154 |             self.Qp = 2 ** w_bits - 1
155 |         else:
156 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
157 |             self.Qn = - 2 ** (w_bits - 1)
158 |             self.Qp = 2 ** (w_bits - 1) - 1
159 |         self.per_channel = per_channel
160 |         self.init_state = 0
161 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
162 |         # self.beta = torch.nn.Parameter(torch.ones(0), requires_grad=True)
163 | 
164 |     # quantize / dequantize
165 |     def forward(self, weight):
166 |         if self.init_state==0:
167 |             self.g = 1.0/math.sqrt(weight.numel() * self.Qp)
168 |             self.div = 2**self.w_bits - 1
169 |             if self.per_channel:
170 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
171 |                 mean = torch.mean(weight_tmp, dim=1)
172 |                 std = torch.std(weight_tmp, dim=1)
173 |                 self.s.data, _ = torch.max(torch.stack([torch.abs(mean-3*std), torch.abs(mean + 3*std)]), dim=0)
174 |                 self.s.data = self.s.data/self.div
175 |             else:
176 |                 mean = torch.mean(weight.detach())
177 |                 std = torch.std(weight.detach())
178 |                 self.s.data = max([torch.abs(mean-3*std), torch.abs(mean + 3*std)])/self.div
179 |             self.init_state += 1
180 |         elif self.init_state<self.batch_init:
181 |             self.div = 2**self.w_bits-1
182 |             if self.per_channel:
183 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
184 |                 mean = torch.mean(weight_tmp, dim=1)
185 |                 std = torch.std(weight_tmp, dim=1)
186 |                 s_new, _ = torch.max(torch.stack([torch.abs(mean-3*std), torch.abs(mean + 3*std)]), dim=0)
187 |                 self.s.data = self.s.data*0.9 + 0.1*s_new/self.div  # moving average, as in the per-tensor branch
188 |             else:
189 |                 mean = torch.mean(weight.detach())
190 |                 std = torch.std(weight.detach())
191 |                 self.s.data = self.s.data*0.9 + 0.1*max([torch.abs(mean-3*std), torch.abs(mean + 3*std)])/self.div
192 |             self.init_state += 1
193 |         elif self.init_state==self.batch_init:
194 |             # self.s = torch.nn.Parameter(self.s)
195 |             self.init_state += 1
196 | 
197 |         if self.w_bits == 32:
198 |             w_q = weight
199 |         elif self.w_bits == 1:
200 |             print('！Binary quantization is not supported ！')
201 |             assert self.w_bits != 1
202 |         else:
203 |             w_q = WLSQPlus.apply(weight, self.s, self.g, self.Qn, self.Qp, self.per_channel)
204 | 
205 |             # alpha = grad_scale(self.s, g)
206 |             # w_q = Round.apply((weight/alpha).clamp(Qn, Qp)) * alpha
207 |         return w_q
208 | 
209 | def update_LSQplus_activation_Scalebeta(model):
210 |     for name, child in model.named_children():
211 |         if isinstance(child, (QuantConv2d, QuantConvTranspose2d, QuantLinear)):
212 |             #weight = child.weight.data
213 |             s = child.activation_quantizer.s.data
214 |             beta = child.activation_quantizer.beta.data
215 |             Qn = child.activation_quantizer.Qn
216 |             Qp = child.activation_quantizer.Qp
217 |             g = child.activation_quantizer.g
218 |             # print('before: ', name, child.activation_quantizer.s.grad.data, child.activation_quantizer.beta.grad.data, s, beta)
219 |             q_input = (child.input - beta) / s   # Eq. (3) on page 3 of the paper
220 |             # print(q_input)
221 |             smaller = (q_input < Qn).float() # bool to float: 1.0 or 0.0
222 |             bigger = (q_input > Qp).float() # bool to float: 1.0 or 0.0
223 |             between = 1.0 - smaller -bigger # mask of elements inside the quantization range
224 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
225 |                            between * Round.apply(q_input) - between * q_input) * g).sum().unsqueeze(dim=0)
226 |             grad_beta = ((smaller + bigger) * g).sum().unsqueeze(dim=0)
227 |             # print('grad_beta: ',grad_beta,g, smaller.sum(), bigger.sum(), between.sum(),Qn, Qp)
228 |             child.activation_quantizer.s.grad.data.add_(g*(2*(child.quant_input-child.input)*grad_alpha).sum().unsqueeze(dim=0))
229 |             child.activation_quantizer.beta.grad.data.add_(g*(2*(child.quant_input-child.input)*grad_beta).sum().unsqueeze(dim=0))
230 | 
231 |             model._modules[name] = child
232 |             # print('after: ', model._modules[name].activation_quantizer.s.grad.data, model._modules[name].activation_quantizer.beta.grad.data, s, beta,
233 |                 # torch.square(child.quant_input-child.input).sum())
234 |         else:
235 |             child = update_LSQplus_activation_Scalebeta(child)
236 |             model._modules[name] = child
237 |     return model
238 |     
239 | 
240 | 
241 | class QuantConv2d(nn.Conv2d):
242 |     def __init__(self,
243 |                  in_channels,
244 |                  out_channels,
245 |                  kernel_size,
246 |                  stride=1,
247 |                  padding=0,
248 |                  dilation=1,
249 |                  groups=1,
250 |                  bias=True,
251 |                  padding_mode='zeros',
252 |                  a_bits=8,
253 |                  w_bits=8,
254 |                  quant_inference=False,
255 |                  all_positive=False, 
256 |                  per_channel=False,
257 |                  batch_init = 20):
258 |         super(QuantConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, groups,
259 |                                           bias, padding_mode)
260 |         self.quant_inference = quant_inference
261 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
262 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
263 | 
264 |     def forward(self, input):
265 |         self.input = input
266 |         self.quant_input = self.activation_quantizer(self.input)
267 |         if not self.quant_inference:
268 |             self.quant_weight = self.weight_quantizer(self.weight)
269 |         else:
270 |             self.quant_weight = self.weight
271 | 
272 |         output = F.conv2d(self.quant_input, self.quant_weight, self.bias, self.stride, self.padding, self.dilation,
273 |                           self.groups)
274 |         return output
275 | 
276 | 
277 | class QuantConvTranspose2d(nn.ConvTranspose2d):
278 |     def __init__(self,
279 |                  in_channels,
280 |                  out_channels,
281 |                  kernel_size,
282 |                  stride=1,
283 |                  padding=0,
284 |                  output_padding=0,
285 |                  dilation=1,
286 |                  groups=1,
287 |                  bias=True,
288 |                  padding_mode='zeros',
289 |                  a_bits=8,
290 |                  w_bits=8,
291 |                  quant_inference=False, 
292 |                  all_positive=False, 
293 |                  per_channel=False,
294 |                  batch_init = 20):
295 |         super(QuantConvTranspose2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, output_padding,
296 |                                                    dilation, groups, bias, padding_mode)
297 |         self.quant_inference = quant_inference
298 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
299 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
300 | 
301 |     def forward(self, input):
302 |         self.input = input
303 |         self.quant_input = self.activation_quantizer(self.input)
304 |         if not self.quant_inference:
305 |             self.quant_weight = self.weight_quantizer(self.weight)
306 |         else:
307 |             self.quant_weight = self.weight
308 |         output = F.conv_transpose2d(self.quant_input, self.quant_weight, self.bias, self.stride, self.padding, self.output_padding,
309 |                                     self.groups, self.dilation)
310 |         return output
311 | 
312 | 
313 | class QuantLinear(nn.Linear):
314 |     def __init__(self,
315 |                  in_features,
316 |                  out_features,
317 |                  bias=True,
318 |                  a_bits=8,
319 |                  w_bits=8,
320 |                  quant_inference=False, 
321 |                  all_positive=False, 
322 |                  per_channel=False,
323 |                  batch_init = 20):
324 |         super(QuantLinear, self).__init__(in_features, out_features, bias)
325 |         self.quant_inference = quant_inference
326 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
327 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
328 | 
329 |     def forward(self, input):
330 |         self.input = input
331 |         self.quant_input = self.activation_quantizer(self.input)
332 |         if not self.quant_inference:
333 |             self.quant_weight = self.weight_quantizer(self.weight)
334 |         else:
335 |             self.quant_weight = self.weight
336 |         output = F.linear(self.quant_input, self.quant_weight, self.bias)
337 |         return output
338 | 
339 | 
340 | def add_quant_op(module, layer_counter, a_bits=8, w_bits=8,
341 |                  quant_inference=False, all_positive=False, per_channel=False, batch_init = 20):
342 |     for name, child in module.named_children():
343 |         if isinstance(child, nn.Conv2d):
344 |             layer_counter[0] += 1
345 |             if layer_counter[0] >= 1: # the first layer is quantized as well
346 |                 if child.bias is not None:
347 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
348 |                                              child.kernel_size, stride=child.stride,
349 |                                              padding=child.padding, dilation=child.dilation,
350 |                                              groups=child.groups, bias=True, padding_mode=child.padding_mode,
351 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
352 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
353 |                     quant_conv.bias.data = child.bias
354 |                 else:
355 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
356 |                                              child.kernel_size, stride=child.stride,
357 |                                              padding=child.padding, dilation=child.dilation,
358 |                                              groups=child.groups, bias=False, padding_mode=child.padding_mode,
359 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
360 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
361 |                 quant_conv.weight.data = child.weight
362 |                 module._modules[name] = quant_conv
363 |         elif isinstance(child, nn.ConvTranspose2d):
364 |             layer_counter[0] += 1
365 |             if layer_counter[0] >= 1: # the first layer is quantized as well
366 |                 if child.bias is not None:
367 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
368 |                                                                 child.out_channels,
369 |                                                                 child.kernel_size,
370 |                                                                 stride=child.stride,
371 |                                                                 padding=child.padding,
372 |                                                                 output_padding=child.output_padding,
373 |                                                                 dilation=child.dilation,
374 |                                                                 groups=child.groups,
375 |                                                                 bias=True,
376 |                                                                 padding_mode=child.padding_mode,
377 |                                                                 a_bits=a_bits,
378 |                                                                 w_bits=w_bits,
379 |                                                                 quant_inference=quant_inference,
380 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
381 |                     quant_conv_transpose.bias.data = child.bias
382 |                 else:
383 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
384 |                                                                 child.out_channels,
385 |                                                                 child.kernel_size,
386 |                                                                 stride=child.stride,
387 |                                                                 padding=child.padding,
388 |                                                                 output_padding=child.output_padding,
389 |                                                                 dilation=child.dilation,
390 |                                                                 groups=child.groups, bias=False,
391 |                                                                 padding_mode=child.padding_mode,
392 |                                                                 a_bits=a_bits,
393 |                                                                 w_bits=w_bits,
394 |                                                                 quant_inference=quant_inference,
395 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
396 |                 quant_conv_transpose.weight.data = child.weight
397 |                 module._modules[name] = quant_conv_transpose
398 |         elif isinstance(child, nn.Linear):
399 |             layer_counter[0] += 1
400 |             if layer_counter[0] >= 1: # the first layer is quantized as well
401 |                 if child.bias is not None:
402 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
403 |                                                bias=True, a_bits=a_bits, w_bits=w_bits,
404 |                                                quant_inference=quant_inference,
405 |                                              all_positive=all_positive, per_channel=per_channel, 
406 |                                              batch_init = batch_init)
407 |                     quant_linear.bias.data = child.bias
408 |                 else:
409 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
410 |                                                bias=False, a_bits=a_bits, w_bits=w_bits,
411 |                                                quant_inference=quant_inference,
412 |                                              all_positive=all_positive, per_channel=per_channel, 
413 |                                              batch_init = batch_init)
414 |                 quant_linear.weight.data = child.weight
415 |                 module._modules[name] = quant_linear
416 |         else:
417 |             add_quant_op(child, layer_counter, a_bits=a_bits, w_bits=w_bits,
418 |                          quant_inference=quant_inference, all_positive=all_positive, 
419 |                          per_channel=per_channel, batch_init = batch_init)
420 | 
421 | 
422 | def prepare(model, inplace=False, a_bits=8, w_bits=8, quant_inference=False,
423 |             all_positive=False, per_channel=False, batch_init = 20):
424 |     if not inplace:
425 |         model = copy.deepcopy(model)
426 |     layer_counter = [0]
427 |     add_quant_op(model, layer_counter, a_bits=a_bits, w_bits=w_bits,
428 |                  quant_inference=quant_inference, 
429 |                  all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
430 |     return model
431 | 
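The quantizers in this file combine three pieces of arithmetic: the integer range `[Qn, Qp]`, the quantize/dequantize step `round(clamp((x - beta) / s, Qn, Qp)) * s + beta`, and the per-element term used for the step-size gradient. The following is a minimal scalar sketch of those formulas in plain Python, without autograd or tensors. All function names below (`quant_range`, `fake_quantize`, `step_size_grad_term`, `grad_scale_factor`, `init_step_size`) are hypothetical and introduced only for illustration; they are not part of this module.

```python
import math
import statistics
from typing import List, Tuple


def quant_range(bits: int, all_positive: bool = False) -> Tuple[int, int]:
    """Integer range: [0, 2^b - 1] if unsigned, else [-2^(b-1), 2^(b-1) - 1]."""
    if all_positive:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fake_quantize(x: float, s: float, beta: float, qn: int, qp: int) -> float:
    """Shift by beta, scale by s, clamp, round, then map back to real values."""
    q = round(min(max((x - beta) / s, qn), qp))
    return q * s + beta


def step_size_grad_term(x: float, s: float, beta: float, qn: int, qp: int) -> float:
    """Per-element term of the step-size gradient, before multiplying by the
    upstream gradient and the scale g: Qn below the range, Qp above it,
    and round(v) - v inside it."""
    v = (x - beta) / s
    if v < qn:
        return float(qn)
    if v > qp:
        return float(qp)
    return round(v) - v


def grad_scale_factor(numel: int, qp: int) -> float:
    """Gradient scale g = 1 / sqrt(numel * Qp)."""
    return 1.0 / math.sqrt(numel * qp)


def init_step_size(weights: List[float], bits: int) -> float:
    """Three-sigma initialisation: max(|mean - 3 std|, |mean + 3 std|) / (2^b - 1)."""
    mean = statistics.mean(weights)
    std = statistics.stdev(weights)  # sample standard deviation (n - 1 denominator)
    return max(abs(mean - 3 * std), abs(mean + 3 * std)) / (2 ** bits - 1)


if __name__ == "__main__":
    qn, qp = quant_range(8)
    print(qn, qp)
    print(fake_quantize(1.3, 0.5, 0.0, qn, qp))    # inside the range
    print(fake_quantize(100.0, 0.5, 0.0, qn, qp))  # clamped at Qp
    print(init_step_size([-1.0, 0.0, 1.0], 8))
```

Python's built-in `round` rounds halves to the nearest even integer, which matches the behaviour of `torch.round`, so the scalar sketch agrees with the tensor code on ties as well.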


--------------------------------------------------------------------------------
/quantization/lsqplus_quantize_V2.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import math
  3 | import torch
  4 | import torch.nn as nn
  5 | import torch.nn.functional as F
  6 | from torch.autograd import Function
  7 | from quantization.lsqquantize_V1 import Round
  8 | 
  9 | class ALSQPlus(Function):
 10 |     @staticmethod
 11 |     def forward(ctx, weight, alpha, g, Qn, Qp, beta):
 12 |         # assert alpha > 0, "alpha={}".format(alpha)
 13 |         ctx.save_for_backward(weight, alpha, beta)
 14 |         ctx.other = g, Qn, Qp
 15 |         w_q = Round.apply(torch.div((weight - beta), alpha).clamp(Qn, Qp))
 16 |         w_q = w_q * alpha + beta
 17 |         return w_q
 18 | 
 19 |     @staticmethod
 20 |     def backward(ctx, grad_weight):
 21 |         weight, alpha, beta = ctx.saved_tensors
 22 |         g, Qn, Qp = ctx.other
 23 |         q_w = (weight - beta) / alpha
 24 |         smaller = (q_w < Qn).float() # bool to float: 1.0 or 0.0
 25 |         bigger = (q_w > Qp).float() # bool to float: 1.0 or 0.0
 26 |         between = 1.0 - smaller -bigger # mask of elements inside the quantization range
 27 |         grad_alpha = ((smaller * Qn + bigger * Qp + 
 28 |             between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0)
 29 |         grad_beta = ((smaller + bigger) * grad_weight * g).sum().unsqueeze(dim=0)
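 30 |         # values outside the quantization range are clamped to constants, so their gradient is 0
 31 |         grad_weight = between * grad_weight
 32 |         # the returned gradients must line up with the arguments of forward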
 33 |         return grad_weight, grad_alpha,  None, None, None, grad_beta
 34 | 
 35 | class WLSQPlus(Function):
 36 |     @staticmethod
 37 |     def forward(ctx, weight, alpha, g, Qn, Qp, per_channel):
 38 |         # assert alpha > 0, "alpha={}".format(alpha)
 39 |         ctx.save_for_backward(weight, alpha)
 40 |         ctx.other = g, Qn, Qp, per_channel
 41 |         if per_channel:
 42 |             sizes = weight.size()
 43 |             weight = weight.contiguous().view(weight.size()[0], -1)
 44 |             weight = torch.transpose(weight, 0, 1)
 45 |             alpha = torch.broadcast_to(alpha, weight.size())
 46 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 47 |             w_q = w_q * alpha
 48 |             w_q = torch.transpose(w_q, 0, 1)
 49 |             w_q = w_q.contiguous().view(sizes)
 50 |         else:
 51 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 52 |             w_q = w_q * alpha 
 53 |         return w_q
 54 | 
 55 |     @staticmethod
 56 |     def backward(ctx, grad_weight):
 57 |         weight, alpha = ctx.saved_tensors
 58 |         g, Qn, Qp, per_channel = ctx.other
 59 |         if per_channel:
 60 |             sizes = weight.size()
 61 |             weight = weight.contiguous().view(weight.size()[0], -1)
 62 |             weight = torch.transpose(weight, 0, 1)
 63 |             alpha = torch.broadcast_to(alpha, weight.size())
 64 |             q_w = weight / alpha
 65 |             q_w = torch.transpose(q_w, 0, 1)
 66 |             q_w = q_w.contiguous().view(sizes)
 67 |         else:
 68 |             q_w = weight / alpha
 69 |         smaller = (q_w < Qn).float() # bool mask cast to float: 1.0 or 0.0
 70 |         bigger = (q_w > Qp).float() # bool mask cast to float: 1.0 or 0.0
 71 |         between = 1.0 - smaller - bigger # mask of the elements inside the quantization range
 72 |         if per_channel:
 73 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 74 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g)
 75 |             grad_alpha = grad_alpha.contiguous().view(grad_alpha.size()[0], -1).sum(dim=1)
 76 |         else:
 77 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 78 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0)
 79 |         # Values outside the quantization range are clamped to constants, so their gradient is 0
 80 |         grad_weight = between * grad_weight
 81 |         return grad_weight, grad_alpha, None, None, None, None
 82 | 
 83 | def grad_scale(x, scale):
 84 |     y = x
 85 |     y_grad = x * scale
 86 |     return (y - y_grad).detach() + y_grad
 87 | 
 88 | def round_pass(x):
 89 |     y = x.round()
 90 |     y_grad = x
 91 |     return (y - y_grad).detach() + y_grad
 92 | 
 93 | def get_percentile_min_max(input, lower_percentile, upper_percentile, output_tensor):
 94 |     batch_size = input.shape[0]
 95 |     lower_index = round(batch_size * (1 - lower_percentile*0.01))
 96 |     upper_index = round(batch_size * (1 - upper_percentile*0.01))
 97 | 
 98 |     upper_bound = torch.kthvalue(input, k=upper_index).values
 99 | 
100 |     if lower_percentile==0:
101 |         lower_bound = upper_bound * 0
102 |     else:
103 |         lower_bound = -torch.kthvalue(-input, k=lower_index).values
104 |     return lower_bound, upper_bound
105 | # def update_scale_betas():
106 | #     for m in model.modules():
107 | #         if isinstance(m, nn.)
108 | 
109 | # Activation (feature) quantization
110 | class LSQPlusActivationQuantizer(nn.Module):
111 |     def __init__(self, a_bits, all_positive=False,batch_init = 20):
112 |         # activations have no per-channel option
113 |         super(LSQPlusActivationQuantizer, self).__init__()
114 |         self.a_bits = a_bits
115 |         self.all_positive = all_positive
116 |         self.batch_init = batch_init
117 |         if self.all_positive:
118 |             # unsigned activation is quantized to [0, 2^b-1]
119 |             self.Qn = 0
120 |             self.Qp = 2 ** self.a_bits - 1
121 |         else:
122 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
123 |             self.Qn = - 2 ** (self.a_bits - 1)
124 |             self.Qp = 2 ** (self.a_bits - 1) - 1
125 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
126 |         self.beta = torch.nn.Parameter(torch.zeros(1), requires_grad=True)
127 |         self.init_state = 0
128 | 
129 |     # Quantize / dequantize
130 |     def forward(self, activation):
131 |         if self.init_state==0:
132 |             self.g = 1.0/math.sqrt(activation.numel() * self.Qp)
133 |             mina = torch.min(activation.detach())
134 |             self.s.data = (torch.max(activation.detach()) - mina)/(self.Qp-self.Qn)
135 |             self.beta.data = mina - self.s.data *self.Qn
136 |             self.init_state += 1
137 |         elif self.init_state<self.batch_init:
138 |             mina = torch.min(activation.detach())
139 |             self.s.data = self.s.data*0.9 + 0.1*(torch.max(activation.detach()) - mina)/(self.Qp-self.Qn)
140 |             self.beta.data = self.beta.data*0.9 + 0.1* (mina - self.s.data * self.Qn)
141 |             self.init_state += 1
142 |         elif self.init_state==self.batch_init:
143 |             # self.s = torch.nn.Parameter(self.s)
144 |             # self.beta = torch.nn.Parameter(self.beta)
145 |             self.init_state += 1
146 | 
147 | 
148 |         if self.a_bits == 32:
149 |             q_a = activation
150 |         elif self.a_bits == 1:
151 |             print('Binary quantization is not supported!')
152 |             assert self.a_bits != 1
153 |         else:
154 |             q_a = ALSQPlus.apply(activation, self.s, self.g, self.Qn, self.Qp, self.beta)
155 |         return q_a
156 | 
157 | # Weight quantization
158 | class LSQPlusWeightQuantizer(nn.Module):
159 |     def __init__(self, w_bits, all_positive=False, per_channel=False,batch_init = 20):
160 |         super(LSQPlusWeightQuantizer, self).__init__()
161 |         self.w_bits = w_bits
162 |         self.all_positive = all_positive
163 |         self.batch_init = batch_init
164 |         if self.all_positive:
165 |             # unsigned activation is quantized to [0, 2^b-1]
166 |             self.Qn = 0
167 |             self.Qp = 2 ** w_bits - 1
168 |         else:
169 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
170 |             self.Qn = - 2 ** (w_bits - 1)
171 |             self.Qp = 2 ** (w_bits - 1) - 1
172 |         self.per_channel = per_channel
173 |         self.init_state = 0
174 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
175 | 
176 |     # Quantize / dequantize
177 |     def forward(self, weight):
178 |         '''
179 |         For this work, each layer of weights and each layer of activations has a distinct step size, represented
180 | as an fp32 value, initialized to 2⟨|v|⟩/√Q_P, computed on either the initial weights values or the first
181 | batch of activations, respectively
182 |         '''
183 |         if self.init_state==0:
184 |             self.g = 1.0/math.sqrt(weight.numel() * self.Qp)
185 |             self.div = 2**self.w_bits-1
186 |             if self.per_channel:
187 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
188 |                 mean = torch.mean(weight_tmp, dim=1)
189 |                 std = torch.std(weight_tmp, dim=1)
190 |                 self.s.data, _ = torch.max(torch.stack([torch.abs(mean-3*std), torch.abs(mean + 3*std)]), dim=0)
191 |                 self.s.data = self.s.data/self.div
192 |             else:
193 |                 mean = torch.mean(weight.detach())
194 |                 std = torch.std(weight.detach())
195 |                 self.s.data = max([torch.abs(mean-3*std), torch.abs(mean + 3*std)])/self.div
196 |             self.init_state += 1
197 |         elif self.init_state<self.batch_init:
198 |             self.div = 2**self.w_bits-1
199 |             if self.per_channel:
200 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
201 |                 mean = torch.mean(weight_tmp, dim=1)
202 |                 std = torch.std(weight_tmp, dim=1)
203 |                 new_s, _ = torch.max(torch.stack([torch.abs(mean-3*std), torch.abs(mean + 3*std)]), dim=0)
204 |                 self.s.data = self.s.data*0.9 + 0.1*new_s/self.div
205 |             else:
206 |                 mean = torch.mean(weight.detach())
207 |                 std = torch.std(weight.detach())
208 |                 self.s.data = self.s.data*0.9 + 0.1*max([torch.abs(mean-3*std), torch.abs(mean + 3*std)])/self.div
209 |             self.init_state += 1
210 |         elif self.init_state==self.batch_init:
211 |             # self.s = torch.nn.Parameter(self.s)
212 |             self.init_state += 1
213 | 
214 | 
215 |         if self.w_bits == 32:
216 |             w_q = weight
217 |         elif self.w_bits == 1:
218 |             print('Binary quantization is not supported!')
219 |             assert self.w_bits != 1
220 |         else:
221 |             w_q = WLSQPlus.apply(weight, self.s, self.g, self.Qn, self.Qp, self.per_channel)
222 | 
223 |             # alpha = grad_scale(self.s, g)
224 |             # w_q = Round.apply((weight/alpha).clamp(Qn, Qp)) * alpha
225 |         return w_q
226 | 
227 | class QuantConv2d(nn.Conv2d):
228 |     def __init__(self,
229 |                  in_channels,
230 |                  out_channels,
231 |                  kernel_size,
232 |                  stride=1,
233 |                  padding=0,
234 |                  dilation=1,
235 |                  groups=1,
236 |                  bias=True,
237 |                  padding_mode='zeros',
238 |                  a_bits=8,
239 |                  w_bits=8,
240 |                  quant_inference=False,
241 |                  all_positive=False, 
242 |                  per_channel=False,
243 |                  batch_init = 20):
244 |         super(QuantConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, groups,
245 |                                           bias, padding_mode)
246 |         self.quant_inference = quant_inference
247 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
248 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
249 | 
250 |     def forward(self, input):
251 |         quant_input = self.activation_quantizer(input)
252 |         if not self.quant_inference:
253 |             quant_weight = self.weight_quantizer(self.weight)
254 |         else:
255 |             quant_weight = self.weight
256 | 
257 |         output = F.conv2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.dilation,
258 |                           self.groups)
259 |         return output
260 | 
261 | 
262 | class QuantConvTranspose2d(nn.ConvTranspose2d):
263 |     def __init__(self,
264 |                  in_channels,
265 |                  out_channels,
266 |                  kernel_size,
267 |                  stride=1,
268 |                  padding=0,
269 |                  output_padding=0,
270 |                  dilation=1,
271 |                  groups=1,
272 |                  bias=True,
273 |                  padding_mode='zeros',
274 |                  a_bits=8,
275 |                  w_bits=8,
276 |                  quant_inference=False, 
277 |                  all_positive=False, 
278 |                  per_channel=False,
279 |                  batch_init = 20):
280 |         super(QuantConvTranspose2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, output_padding,
281 |                                                    dilation, groups, bias, padding_mode)
282 |         self.quant_inference = quant_inference
283 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
284 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
285 | 
286 |     def forward(self, input):
287 |         quant_input = self.activation_quantizer(input)
288 |         if not self.quant_inference:
289 |             quant_weight = self.weight_quantizer(self.weight)
290 |         else:
291 |             quant_weight = self.weight
292 |         output = F.conv_transpose2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.output_padding,
293 |                                     self.groups, self.dilation)
294 |         return output
295 | 
296 | 
297 | class QuantLinear(nn.Linear):
298 |     def __init__(self,
299 |                  in_features,
300 |                  out_features,
301 |                  bias=True,
302 |                  a_bits=8,
303 |                  w_bits=8,
304 |                  quant_inference=False, 
305 |                  all_positive=False, 
306 |                  per_channel=False,
307 |                  batch_init = 20):
308 |         super(QuantLinear, self).__init__(in_features, out_features, bias)
309 |         self.quant_inference = quant_inference
310 |         self.activation_quantizer = LSQPlusActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
311 |         self.weight_quantizer = LSQPlusWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
312 | 
313 |     def forward(self, input):
314 |         quant_input = self.activation_quantizer(input)
315 |         if not self.quant_inference:
316 |             quant_weight = self.weight_quantizer(self.weight)
317 |         else:
318 |             quant_weight = self.weight
319 |         output = F.linear(quant_input, quant_weight, self.bias)
320 |         return output
321 | 
322 | 
323 | def add_quant_op(module, layer_counter, a_bits=8, w_bits=8,
324 |                  quant_inference=False, all_positive=False, per_channel=False, batch_init = 20):
325 |     for name, child in module.named_children():
326 |         if isinstance(child, nn.Conv2d):
327 |             layer_counter[0] += 1
328 |             if layer_counter[0] >= 1: # the first layer is quantized as well
329 |                 if child.bias is not None:
330 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
331 |                                              child.kernel_size, stride=child.stride,
332 |                                              padding=child.padding, dilation=child.dilation,
333 |                                              groups=child.groups, bias=True, padding_mode=child.padding_mode,
334 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
335 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
336 |                     quant_conv.bias.data = child.bias
337 |                 else:
338 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
339 |                                              child.kernel_size, stride=child.stride,
340 |                                              padding=child.padding, dilation=child.dilation,
341 |                                              groups=child.groups, bias=False, padding_mode=child.padding_mode,
342 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
343 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
344 |                 quant_conv.weight.data = child.weight
345 |                 module._modules[name] = quant_conv
346 |         elif isinstance(child, nn.ConvTranspose2d):
347 |             layer_counter[0] += 1
348 |             if layer_counter[0] >= 1: # the first layer is quantized as well
349 |                 if child.bias is not None:
350 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
351 |                                                                 child.out_channels,
352 |                                                                 child.kernel_size,
353 |                                                                 stride=child.stride,
354 |                                                                 padding=child.padding,
355 |                                                                 output_padding=child.output_padding,
356 |                                                                 dilation=child.dilation,
357 |                                                                 groups=child.groups,
358 |                                                                 bias=True,
359 |                                                                 padding_mode=child.padding_mode,
360 |                                                                 a_bits=a_bits,
361 |                                                                 w_bits=w_bits,
362 |                                                                 quant_inference=quant_inference,
363 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
364 |                     quant_conv_transpose.bias.data = child.bias
365 |                 else:
366 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
367 |                                                                 child.out_channels,
368 |                                                                 child.kernel_size,
369 |                                                                 stride=child.stride,
370 |                                                                 padding=child.padding,
371 |                                                                 output_padding=child.output_padding,
372 |                                                                 dilation=child.dilation,
373 |                                                                 groups=child.groups, bias=False,
374 |                                                                 padding_mode=child.padding_mode,
375 |                                                                 a_bits=a_bits,
376 |                                                                 w_bits=w_bits,
377 |                                                                 quant_inference=quant_inference,
378 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
379 |                 quant_conv_transpose.weight.data = child.weight
380 |                 module._modules[name] = quant_conv_transpose
381 |         elif isinstance(child, nn.Linear):
382 |             layer_counter[0] += 1
383 |             if layer_counter[0] >= 1: # the first layer is quantized as well
384 |                 if child.bias is not None:
385 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
386 |                                                bias=True, a_bits=a_bits, w_bits=w_bits,
387 |                                                quant_inference=quant_inference,
388 |                                              all_positive=all_positive, per_channel=per_channel, 
389 |                                              batch_init = batch_init)
390 |                     quant_linear.bias.data = child.bias
391 |                 else:
392 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
393 |                                                bias=False, a_bits=a_bits, w_bits=w_bits,
394 |                                                quant_inference=quant_inference,
395 |                                              all_positive=all_positive, per_channel=per_channel, 
396 |                                              batch_init = batch_init)
397 |                 quant_linear.weight.data = child.weight
398 |                 module._modules[name] = quant_linear
399 |         else:
400 |             add_quant_op(child, layer_counter, a_bits=a_bits, w_bits=w_bits,
401 |                          quant_inference=quant_inference, all_positive=all_positive, 
402 |                          per_channel=per_channel, batch_init = batch_init)
403 | 
404 | 
405 | def prepare(model, inplace=False, a_bits=8, w_bits=8, quant_inference=False,
406 |             all_positive=False, per_channel=False, batch_init = 20):
407 |     if not inplace:
408 |         model = copy.deepcopy(model)
409 |     layer_counter = [0]
410 |     add_quant_op(model, layer_counter, a_bits=a_bits, w_bits=w_bits,
411 |                  quant_inference=quant_inference, 
412 |                  all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
413 |     return model
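
Note: `ALSQPlus.forward` above computes `round(clamp((x - beta) / alpha, Qn, Qp)) * alpha + beta`, and the first-batch branch of `LSQPlusActivationQuantizer.forward` initializes the scale and offset from the observed min and max. The sketch below reproduces both steps for plain Python floats so the arithmetic can be checked without PyTorch. The function names `round_half_away`, `minmax_init` and `fake_quantize` are hypothetical helpers introduced for illustration only.

```python
import math


def round_half_away(q):
    """Round to the nearest integer, ties away from zero (same rule as Round.forward)."""
    return math.copysign(math.floor(abs(q) + 0.5), q)


def minmax_init(values, qn, qp):
    """Scale and offset derived from the observed min/max of the first batch."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qp - qn)
    offset = lo - scale * qn
    return scale, offset


def fake_quantize(x, scale, offset, qn, qp):
    """Quantize then dequantize one value with an asymmetric (scale, offset) pair."""
    q = (x - offset) / scale
    q = max(qn, min(qp, q))          # clamp to the integer range [qn, qp]
    return round_half_away(q) * scale + offset


# Signed 4-bit range, matching the all_positive=False branch: [-8, 7]
a_bits = 4
qn, qp = -2 ** (a_bits - 1), 2 ** (a_bits - 1) - 1
activations = [-1.0, 0.0, 0.5, 3.0]
scale, offset = minmax_init(activations, qn, qp)
dequantized = [fake_quantize(x, scale, offset, qn, qp) for x in activations]
```

With this initialization the observed minimum maps to `Qn` and the maximum to `Qp`, so in-range values are reproduced to within half a step and out-of-range values saturate at the observed extremes.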


--------------------------------------------------------------------------------
/quantization/lsqquantize_V1.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import math
  3 | import torch
  4 | import torch.nn as nn
  5 | import torch.nn.functional as F
  6 | from torch.autograd import Function
  7 | 
  8 | 
  9 | # ********************* quantizers *********************
 10 | # Rounding with a straight-through estimator (STE)
 11 | class Round(Function):
 12 |     @staticmethod
 13 |     def forward(self, input):
 14 |         sign = torch.sign(input)
 15 |         output = sign * torch.floor(torch.abs(input) + 0.5)
 16 |         return output
 17 | 
 18 |     @staticmethod
 19 |     def backward(self, grad_output):
 20 |         grad_input = grad_output.clone()
 21 |         return grad_input
 22 | 
 23 | class FunLSQ(Function):
 24 |     @staticmethod
 25 |     def forward(ctx, weight, alpha, g, Qn, Qp, per_channel=False):
 26 |         # Follows the formula in Section 2 of the paper "Learned Step Size Quantization"
 27 |         # assert alpha > 0, "alpha={}".format(alpha)
 28 |         ctx.save_for_backward(weight, alpha)
 29 |         ctx.other = g, Qn, Qp, per_channel
 30 |         if per_channel:
 31 |             sizes = weight.size()
 32 |             weight = weight.contiguous().view(weight.size()[0], -1)
 33 |             weight = torch.transpose(weight, 0, 1)
 34 |             alpha = torch.broadcast_to(alpha, weight.size())
 35 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 36 |             w_q = w_q * alpha
 37 |             w_q = torch.transpose(w_q, 0, 1)
 38 |             w_q = w_q.contiguous().view(sizes)
 39 |         else:
 40 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 41 |             w_q = w_q * alpha
 42 |         return w_q
 43 | 
 44 |     @staticmethod
 45 |     def backward(ctx, grad_weight):
 46 |         # Follows Section 2.1 of the paper "Learned Step Size Quantization"
 47 |         # Three cases: inside the quantization range, below the lower bound, above the upper bound
 48 |         weight, alpha = ctx.saved_tensors
 49 |         g, Qn, Qp, per_channel = ctx.other
 50 |         if per_channel:
 51 |             sizes = weight.size()
 52 |             weight = weight.contiguous().view(weight.size()[0], -1)
 53 |             weight = torch.transpose(weight, 0, 1)
 54 |             alpha = torch.broadcast_to(alpha, weight.size())
 55 |             q_w = weight / alpha
 56 |             q_w = torch.transpose(q_w, 0, 1)
 57 |             q_w = q_w.contiguous().view(sizes)
 58 |         else:
 59 |             q_w = weight / alpha
 60 |         smaller = (q_w < Qn).float() # bool mask cast to float: 1.0 or 0.0
 61 |         bigger = (q_w > Qp).float() # bool mask cast to float: 1.0 or 0.0
 62 |         between = 1.0 - smaller - bigger # mask of the elements inside the quantization range
 63 |         if per_channel:
 64 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 65 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g)
 66 |             grad_alpha = grad_alpha.contiguous().view(grad_alpha.size()[0], -1).sum(dim=1)
 67 |         else:
 68 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 69 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0) #?
 70 |         # Values outside the quantization range are clamped to constants, so their gradient is 0
 71 |         grad_weight = between * grad_weight
 72 |         return grad_weight, grad_alpha, None, None, None, None
 73 | 
 74 | def grad_scale(x, scale):
 75 |     y = x
 76 |     y_grad = x * scale
 77 |     return (y - y_grad).detach() + y_grad
 78 | 
 79 | def round_pass(x):
 80 |     y = x.round()
 81 |     y_grad = x
 82 |     return (y - y_grad).detach() + y_grad
 83 | 
 84 | # Activation (feature) quantization
 85 | class LSQActivationQuantizer(nn.Module):
 86 |     def __init__(self, a_bits, all_positive=False, batch_init = 20):
 87 |         # activations have no per-channel option
 88 |         super(LSQActivationQuantizer, self).__init__()
 89 |         self.a_bits = a_bits
 90 |         self.all_positive = all_positive
 91 |         self.batch_init = batch_init
 92 |         if self.all_positive:
 93 |             # unsigned activation is quantized to [0, 2^b-1]
 94 |             self.Qn = 0
 95 |             self.Qp = 2 ** self.a_bits - 1
 96 |         else:
 97 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
 98 |             self.Qn = - 2 ** (self.a_bits - 1)
 99 |             self.Qp = 2 ** (self.a_bits - 1) - 1
100 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
101 |         # self.s = torch.nn.Parameter(torch.ones(0.01), requires_grad=True)
102 |         # self.register_parameter('Ascale', self.s)
103 |         self.init_state = 0
104 | 
105 |     # Quantize / dequantize
106 |     def forward(self, activation):
107 |         '''
108 |         For this work, each layer of weights and each layer of activations has a distinct step size, represented
109 | as an fp32 value, initialized to 2⟨|v|⟩/√Q_P, computed on either the initial weights values or the first
110 | batch of activations, respectively
111 |         '''
112 |         #V1
113 |         if self.init_state==0:
114 |             self.g = 1.0/math.sqrt(activation.numel() * self.Qp)
115 |             self.s.data = torch.mean(torch.abs(activation.detach()))*2/(math.sqrt(self.Qp))
116 |             self.init_state += 1
117 |         elif self.init_state<self.batch_init:
118 |             self.s.data = 0.9*self.s.data + 0.1*torch.mean(torch.abs(activation.detach()))*2/(math.sqrt(self.Qp))
119 |             self.init_state += 1
120 |         elif self.init_state==self.batch_init:
121 |             # self.s = torch.nn.Parameter(self.s)
122 |             self.init_state += 1
123 |         if self.a_bits == 32:
124 |             q_a = activation
125 |         elif self.a_bits == 1:
126 |             print('Binary quantization is not supported!')
127 |             assert self.a_bits != 1
128 |         else:
129 |             # print(self.s, self.g)
130 |             q_a = FunLSQ.apply(activation, self.s, self.g, self.Qn, self.Qp)
131 | 
132 |             # alpha = grad_scale(self.s, g)
133 |             # q_a = Round.apply((activation/alpha).clamp(Qn, Qp)) * alpha
134 |         return q_a
135 | 
136 | # Weight quantization
137 | class LSQWeightQuantizer(nn.Module):
138 |     def __init__(self, w_bits, all_positive=False, per_channel=False, batch_init = 20):
139 |         super(LSQWeightQuantizer, self).__init__()
140 |         self.w_bits = w_bits
141 |         self.all_positive = all_positive
142 |         self.batch_init = batch_init
143 |         if self.all_positive:
144 |             # unsigned activation is quantized to [0, 2^b-1]
145 |             self.Qn = 0
146 |             self.Qp = 2 ** w_bits - 1
147 |         else:
148 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
149 |             self.Qn = - 2 ** (w_bits - 1)
150 |             self.Qp = 2 ** (w_bits - 1) - 1
151 |         self.per_channel = per_channel
152 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
153 |         # self.register_parameter('Wscale', self.s)
154 |         self.init_state = 0
155 | 
156 |     # Quantize / dequantize
157 |     def forward(self, weight):
158 |         if self.init_state==0:
159 |             self.g = 1.0/math.sqrt(weight.numel() * self.Qp)
160 |             if self.per_channel:
161 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
162 |                 self.s.data = torch.mean(torch.abs(weight_tmp), dim=1)*2/(math.sqrt(self.Qp))
163 |             else:
164 |                 self.s.data = torch.mean(torch.abs(weight.detach()))*2/(math.sqrt(self.Qp))
165 |             self.init_state += 1
166 |         elif self.init_state<self.batch_init:
167 |             if self.per_channel:
168 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
169 |                 self.s.data = 0.9*self.s.data + 0.1*torch.mean(torch.abs(weight_tmp), dim=1)*2/(math.sqrt(self.Qp))
170 |             else:
171 |                 self.s.data = 0.9*self.s.data + 0.1*torch.mean(torch.abs(weight.detach()))*2/(math.sqrt(self.Qp))
172 |             self.init_state += 1
173 |         elif self.init_state==self.batch_init:
174 |             # self.s = torch.nn.Parameter(self.s)
175 |             self.init_state += 1
176 |         if self.w_bits == 32:
177 |             w_q = weight
178 |         elif self.w_bits == 1:
179 |             print('Binary quantization is not supported!')
180 |             assert self.w_bits != 1
181 |         else:
182 |             # print(self.s, self.g)
183 |             w_q = FunLSQ.apply(weight, self.s, self.g, self.Qn, self.Qp, self.per_channel)
184 | 
185 |             # alpha = grad_scale(self.s, g)
186 |             # w_q = Round.apply((weight/alpha).clamp(Qn, Qp)) * alpha
187 |         return w_q
188 | 
189 | class QuantConv2d(nn.Conv2d):
190 |     def __init__(self,
191 |                  in_channels,
192 |                  out_channels,
193 |                  kernel_size,
194 |                  stride=1,
195 |                  padding=0,
196 |                  dilation=1,
197 |                  groups=1,
198 |                  bias=True,
199 |                  padding_mode='zeros',
200 |                  a_bits=8,
201 |                  w_bits=8,
202 |                  quant_inference=False, 
203 |                  all_positive=False, 
204 |                  per_channel=False, 
205 |                  batch_init = 20):
206 |         super(QuantConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, groups,
207 |                                           bias, padding_mode)
208 |         self.quant_inference = quant_inference
209 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
210 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
211 | 
212 |     def forward(self, input):
213 |         quant_input = self.activation_quantizer(input)
214 |         # print('input:',input.size(),self.quant_inference)
215 |         if not self.quant_inference:
216 |             quant_weight = self.weight_quantizer(self.weight)
217 |         else:
218 |             quant_weight = self.weight
219 | 
220 |         output = F.conv2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.dilation,
221 |                           self.groups)
222 |         return output
223 | 
224 | 
225 | class QuantConvTranspose2d(nn.ConvTranspose2d):
226 |     def __init__(self,
227 |                  in_channels,
228 |                  out_channels,
229 |                  kernel_size,
230 |                  stride=1,
231 |                  padding=0,
232 |                  output_padding=0,
233 |                  dilation=1,
234 |                  groups=1,
235 |                  bias=True,
236 |                  padding_mode='zeros',
237 |                  a_bits=8,
238 |                  w_bits=8,
239 |                  quant_inference=False, 
240 |                  all_positive=False, 
241 |                  per_channel=False, 
242 |                  batch_init = 20):
243 |         super(QuantConvTranspose2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, output_padding,
244 |                                                    dilation, groups, bias, padding_mode)
245 |         self.quant_inference = quant_inference
246 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
247 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
248 | 
249 |     def forward(self, input):
250 |         quant_input = self.activation_quantizer(input)
251 |         if not self.quant_inference:
252 |             quant_weight = self.weight_quantizer(self.weight)
253 |         else:
254 |             quant_weight = self.weight
255 |         output = F.conv_transpose2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.output_padding,
256 |                                     self.groups, self.dilation)
257 |         return output
258 | 
259 | 
260 | class QuantLinear(nn.Linear):
261 |     def __init__(self,
262 |                  in_features,
263 |                  out_features,
264 |                  bias=True,
265 |                  a_bits=8,
266 |                  w_bits=8,
267 |                  quant_inference=False, 
268 |                  all_positive=False, 
269 |                  per_channel=False, 
270 |                  batch_init = 20):
271 |         super(QuantLinear, self).__init__(in_features, out_features, bias)
272 |         self.quant_inference = quant_inference
273 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
274 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
275 | 
276 |     def forward(self, input):
277 |         quant_input = self.activation_quantizer(input)
278 |         if not self.quant_inference:
279 |             quant_weight = self.weight_quantizer(self.weight)
280 |         else:
281 |             quant_weight = self.weight
282 |         output = F.linear(quant_input, quant_weight, self.bias)
283 |         return output
284 | 
285 | 
286 | def add_quant_op(module, layer_counter, a_bits=8, w_bits=8,
287 |                  quant_inference=False, all_positive=False, per_channel=False, 
288 |                  batch_init = 20):
289 |     for name, child in module.named_children():
290 |         if isinstance(child, nn.Conv2d):
291 |             layer_counter[0] += 1
292 |             if layer_counter[0] >= 1:  # also quantize the first layer
293 |                 if child.bias is not None:
294 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
295 |                                              child.kernel_size, stride=child.stride,
296 |                                              padding=child.padding, dilation=child.dilation,
297 |                                              groups=child.groups, bias=True, padding_mode=child.padding_mode,
298 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
299 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
300 |                     quant_conv.bias.data = child.bias
301 |                 else:
302 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
303 |                                              child.kernel_size, stride=child.stride,
304 |                                              padding=child.padding, dilation=child.dilation,
305 |                                              groups=child.groups, bias=False, padding_mode=child.padding_mode,
306 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
307 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
308 |                 quant_conv.weight.data = child.weight
309 |                 module._modules[name] = quant_conv
310 |         elif isinstance(child, nn.ConvTranspose2d):
311 |             layer_counter[0] += 1
312 |             if layer_counter[0] >= 1:  # also quantize the first layer
313 |                 if child.bias is not None:
314 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
315 |                                                                 child.out_channels,
316 |                                                                 child.kernel_size,
317 |                                                                 stride=child.stride,
318 |                                                                 padding=child.padding,
319 |                                                                 output_padding=child.output_padding,
320 |                                                                 dilation=child.dilation,
321 |                                                                 groups=child.groups,
322 |                                                                 bias=True,
323 |                                                                 padding_mode=child.padding_mode,
324 |                                                                 a_bits=a_bits,
325 |                                                                 w_bits=w_bits,
326 |                                                                 quant_inference=quant_inference,
327 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
328 |                     quant_conv_transpose.bias.data = child.bias
329 |                 else:
330 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
331 |                                                                 child.out_channels,
332 |                                                                 child.kernel_size,
333 |                                                                 stride=child.stride,
334 |                                                                 padding=child.padding,
335 |                                                                 output_padding=child.output_padding,
336 |                                                                 dilation=child.dilation,
337 |                                                                 groups=child.groups, bias=False,
338 |                                                                 padding_mode=child.padding_mode,
339 |                                                                 a_bits=a_bits,
340 |                                                                 w_bits=w_bits,
341 |                                                                 quant_inference=quant_inference,
342 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
343 |                 quant_conv_transpose.weight.data = child.weight
344 |                 module._modules[name] = quant_conv_transpose
345 |         elif isinstance(child, nn.Linear):
346 |             layer_counter[0] += 1
347 |             if layer_counter[0] >= 1:  # also quantize the first layer
348 |                 if child.bias is not None:
349 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
350 |                                                bias=True, a_bits=a_bits, w_bits=w_bits,
351 |                                                quant_inference=quant_inference,
352 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
353 |                     quant_linear.bias.data = child.bias
354 |                 else:
355 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
356 |                                                bias=False, a_bits=a_bits, w_bits=w_bits,
357 |                                                quant_inference=quant_inference,
358 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
359 |                 quant_linear.weight.data = child.weight
360 |                 module._modules[name] = quant_linear
361 |         else:
362 |             add_quant_op(child, layer_counter, a_bits=a_bits, w_bits=w_bits,
363 |                          quant_inference=quant_inference, all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
364 | 
365 | 
366 | def prepare(model, inplace=False, a_bits=8, w_bits=8, quant_inference=False,
367 |             all_positive=False, per_channel=False, batch_init = 20):
368 |     if not inplace:
369 |         model = copy.deepcopy(model)
370 |     layer_counter = [0]
371 |     add_quant_op(model, layer_counter, a_bits=a_bits, w_bits=w_bits,
372 |                  quant_inference=quant_inference, all_positive=all_positive, 
373 |                  per_channel=per_channel, batch_init = batch_init)
374 |     return model
375 | 
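
The recursive replacement in `add_quant_op` can be illustrated without PyTorch. The sketch below is a toy model only: `Node`, `QuantNode`, `QUANTIZABLE` and `swap_layers` are hypothetical names that do not exist in this repository, and the tree stands in for an `nn.Module` hierarchy. It shows why `layer_counter` is passed as a one-element list: increments made inside nested calls stay visible to the caller.

```python
# Toy stand-ins (hypothetical, not part of the repository) for a module tree.
class Node:
    def __init__(self, kind, **children):
        self.kind = kind
        self.children = dict(children)


class QuantNode(Node):
    """Marks a leaf as replaced by its quantized counterpart."""

    def __init__(self, original):
        super().__init__("quant_" + original.kind)
        self.original = original


QUANTIZABLE = {"conv", "linear"}


def swap_layers(node, layer_counter):
    # layer_counter is a one-element list, so nested calls share one count.
    for name, child in list(node.children.items()):
        if child.kind in QUANTIZABLE:
            layer_counter[0] += 1
            node.children[name] = QuantNode(child)  # replace in the parent
        else:
            swap_layers(child, layer_counter)  # descend into containers


model = Node(
    "net",
    stem=Node("conv"),
    block=Node("seq", conv1=Node("conv"), bn=Node("bn")),
    fc=Node("linear"),
)
counter = [0]
swap_layers(model, counter)
```

A plain integer argument would be rebound locally in each call, so the count from nested containers would be lost; the list avoids that without a global variable.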


--------------------------------------------------------------------------------
/quantization/lsqquantize_V2.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import math
  3 | import torch
  4 | import torch.nn as nn
  5 | import torch.nn.functional as F
  6 | from torch.autograd import Function
  7 | '''
  8 | self.s = torch.nn.Parameter(torch.ones(1))  #V2
  9 | The activation quantization parameter s is initialized to the constant 1
 10 | '''
 11 | 
 12 | # ********************* quantizers *********************
 13 | # rounding (straight-through estimator, STE)
 14 | class Round(Function):
 15 |     @staticmethod
 16 |     def forward(self, input):
 17 |         sign = torch.sign(input)
 18 |         output = sign * torch.floor(torch.abs(input) + 0.5)
 19 |         return output
 20 | 
 21 |     @staticmethod
 22 |     def backward(self, grad_output):
 23 |         grad_input = grad_output.clone()
 24 |         return grad_input
 25 | 
 26 | class FunLSQ(Function):
 27 |     @staticmethod
 28 |     def forward(ctx, weight, alpha, g, Qn, Qp, per_channel=False):
 29 |         # follows the formula in Section 2 of the paper "Learned Step Size Quantization"
 30 |         # assert alpha > 0, "alpha={}".format(alpha)
 31 |         ctx.save_for_backward(weight, alpha)
 32 |         ctx.other = g, Qn, Qp, per_channel
 33 |         if per_channel:
 34 |             sizes = weight.size()
 35 |             weight = weight.contiguous().view(weight.size()[0], -1)
 36 |             weight = torch.transpose(weight, 0, 1)
 37 |             alpha = torch.broadcast_to(alpha, weight.size())
 38 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 39 |             w_q = w_q * alpha
 40 |             w_q = torch.transpose(w_q, 0, 1)
 41 |             w_q = w_q.contiguous().view(sizes)
 42 |         else:
 43 |             w_q = Round.apply(torch.div(weight, alpha).clamp(Qn, Qp))
 44 |             w_q = w_q * alpha
 45 |         return w_q
 46 | 
 47 |     @staticmethod
 48 |     def backward(ctx, grad_weight):
 49 |         # follows Section 2.1 of the paper "Learned Step Size Quantization"
 50 |         # three cases: inside the quantization range, below the lower bound, above the upper bound
 51 |         weight, alpha = ctx.saved_tensors
 52 |         g, Qn, Qp, per_channel = ctx.other
 53 |         if per_channel:
 54 |             sizes = weight.size()
 55 |             weight = weight.contiguous().view(weight.size()[0], -1)
 56 |             weight = torch.transpose(weight, 0, 1)
 57 |             alpha = torch.broadcast_to(alpha, weight.size())
 58 |             q_w = weight / alpha
 59 |             q_w = torch.transpose(q_w, 0, 1)
 60 |             q_w = q_w.contiguous().view(sizes)
 61 |         else:
 62 |             q_w = weight / alpha
 63 |         smaller = (q_w < Qn).float()  # bool mask cast to float: 1.0 or 0.0
 64 |         bigger = (q_w > Qp).float()  # bool mask cast to float: 1.0 or 0.0
 65 |         between = 1.0 - smaller - bigger  # mask of elements inside the quantization range
 66 |         if per_channel:
 67 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 68 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g)
 69 |             grad_alpha = grad_alpha.contiguous().view(grad_alpha.size()[0], -1).sum(dim=1)
 70 |         else:
 71 |             grad_alpha = ((smaller * Qn + bigger * Qp + 
 72 |                 between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0) #?
 73 |         # values outside the quantization range are clamped to constants, so their gradient is 0
 74 |         grad_weight = between * grad_weight  
 75 |         return grad_weight, grad_alpha, None, None, None, None
 76 | 
 77 | def grad_scale(x, scale):
 78 |     y = x
 79 |     y_grad = x * scale
 80 |     return (y - y_grad).detach() + y_grad
 81 | 
 82 | def round_pass(x):
 83 |     y = x.round()
 84 |     y_grad = x
 85 |     return (y - y_grad).detach() + y_grad
 86 | 
 87 | # A (activation) quantization
 88 | class LSQActivationQuantizer(nn.Module):
 89 |     def __init__(self, a_bits, all_positive=False, batch_init = 20):
 90 |         # activations have no per-channel option
 91 |         super(LSQActivationQuantizer, self).__init__()
 92 |         self.a_bits = a_bits
 93 |         self.all_positive = all_positive
 94 |         self.batch_init = batch_init
 95 |         if self.all_positive:
 96 |             # unsigned activation is quantized to [0, 2^b-1]
 97 |             self.Qn = 0
 98 |             self.Qp = 2 ** self.a_bits - 1
 99 |         else:
100 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
101 |             self.Qn = - 2 ** (self.a_bits - 1)
102 |             self.Qp = 2 ** (self.a_bits - 1) - 1
103 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)  #V2
104 |         # self.register_parameter('Ascale', self.s)
105 |         self.init_state = 0
106 | 
107 |     # quantize / dequantize
108 |     def forward(self, activation):
109 |         if self.a_bits == 32:
110 |             q_a = activation  # full precision: pass through unchanged
111 |         elif self.a_bits == 1:
112 |             print('Binary quantization is not supported!')
113 |             assert self.a_bits != 1
114 |         else:
115 |             if self.init_state==0:
116 |                 self.g = 1.0/math.sqrt(activation.numel() * self.Qp)
117 |                 self.init_state += 1
118 |             # print(self.s, self.g)
119 |             q_a = FunLSQ.apply(activation, self.s, self.g, self.Qn, self.Qp)
120 | 
121 |             # alpha = grad_scale(self.s, g)
122 |             # q_a = Round.apply((activation/alpha).clamp(Qn, Qp)) * alpha
123 |         return q_a
124 | 
125 | # W (weight) quantization
126 | class LSQWeightQuantizer(nn.Module):
127 |     def __init__(self, w_bits, all_positive=False, per_channel=False, batch_init = 20):
128 |         super(LSQWeightQuantizer, self).__init__()
129 |         self.w_bits = w_bits
130 |         self.all_positive = all_positive
131 |         self.batch_init = batch_init
132 |         if self.all_positive:
133 |             # unsigned weight is quantized to [0, 2^b-1]
134 |             self.Qn = 0
135 |             self.Qp = 2 ** w_bits - 1
136 |         else:
137 |             # signed weight/activation is quantized to [-2^(b-1), 2^(b-1)-1]
138 |             self.Qn = - 2 ** (w_bits - 1)
139 |             self.Qp = 2 ** (w_bits - 1) - 1
140 |         self.per_channel = per_channel
141 |         self.s = torch.nn.Parameter(torch.ones(1), requires_grad=True)
142 |         # self.register_parameter('Wscale', self.s)
143 |         self.init_state = 0
144 | 
145 |     # quantize / dequantize
146 |     def forward(self, weight):
147 |         if self.init_state==0:
148 |             self.g = 1.0/math.sqrt(weight.numel() * self.Qp)
149 |             if self.per_channel:
150 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
151 |                 self.s.data = torch.mean(torch.abs(weight_tmp), dim=1)*2/(math.sqrt(self.Qp))
152 |             else:
153 |                 self.s.data = torch.mean(torch.abs(weight.detach()))*2/(math.sqrt(self.Qp))
154 |             self.init_state += 1
155 |         elif self.init_state<self.batch_init:
156 |             if self.per_channel:
157 |                 weight_tmp = weight.detach().contiguous().view(weight.size()[0], -1)
158 |                 self.s.data = 0.9*self.s.data + 0.1*torch.mean(torch.abs(weight_tmp), dim=1)*2/(math.sqrt(self.Qp))
159 |             else:
160 |                 self.s.data = 0.9*self.s.data + 0.1*torch.mean(torch.abs(weight.detach()))*2/(math.sqrt(self.Qp))
161 |             self.init_state += 1
162 |         elif self.init_state==self.batch_init:
163 |             # self.s = torch.nn.Parameter(self.s)
164 |             self.init_state += 1
165 |         if self.w_bits == 32:
166 |             w_q = weight  # full precision: pass through unchanged
167 |         elif self.w_bits == 1:
168 |             print('Binary quantization is not supported!')
169 |             assert self.w_bits != 1
170 |         else:
171 |             # print(self.s, self.g)
172 |             w_q = FunLSQ.apply(weight, self.s, self.g, self.Qn, self.Qp, self.per_channel)
173 | 
174 |             # alpha = grad_scale(self.s, g)
175 |             # w_q = Round.apply((weight/alpha).clamp(Qn, Qp)) * alpha
176 |         return w_q
177 | 
178 | class QuantConv2d(nn.Conv2d):
179 |     def __init__(self,
180 |                  in_channels,
181 |                  out_channels,
182 |                  kernel_size,
183 |                  stride=1,
184 |                  padding=0,
185 |                  dilation=1,
186 |                  groups=1,
187 |                  bias=True,
188 |                  padding_mode='zeros',
189 |                  a_bits=8,
190 |                  w_bits=8,
191 |                  quant_inference=False, 
192 |                  all_positive=False, 
193 |                  per_channel=False, 
194 |                  batch_init = 20):
195 |         super(QuantConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, groups,
196 |                                           bias, padding_mode)
197 |         self.quant_inference = quant_inference
198 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
199 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
200 | 
201 |     def forward(self, input):
202 |         quant_input = self.activation_quantizer(input)
203 |         # print('input:',input.size(),self.quant_inference)
204 |         if not self.quant_inference:
205 |             quant_weight = self.weight_quantizer(self.weight)
206 |         else:
207 |             quant_weight = self.weight
208 | 
209 |         output = F.conv2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.dilation,
210 |                           self.groups)
211 |         return output
212 | 
213 | 
214 | class QuantConvTranspose2d(nn.ConvTranspose2d):
215 |     def __init__(self,
216 |                  in_channels,
217 |                  out_channels,
218 |                  kernel_size,
219 |                  stride=1,
220 |                  padding=0,
221 |                  output_padding=0,
222 |                  dilation=1,
223 |                  groups=1,
224 |                  bias=True,
225 |                  padding_mode='zeros',
226 |                  a_bits=8,
227 |                  w_bits=8,
228 |                  quant_inference=False, 
229 |                  all_positive=False, 
230 |                  per_channel=False, 
231 |                  batch_init = 20):
232 |         super(QuantConvTranspose2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, output_padding,
233 |                                                    dilation, groups, bias, padding_mode)
234 |         self.quant_inference = quant_inference
235 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
236 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
237 | 
238 |     def forward(self, input):
239 |         quant_input = self.activation_quantizer(input)
240 |         if not self.quant_inference:
241 |             quant_weight = self.weight_quantizer(self.weight)
242 |         else:
243 |             quant_weight = self.weight
244 |         output = F.conv_transpose2d(quant_input, quant_weight, self.bias, self.stride, self.padding, self.output_padding,
245 |                                     self.groups, self.dilation)
246 |         return output
247 | 
248 | 
249 | class QuantLinear(nn.Linear):
250 |     def __init__(self,
251 |                  in_features,
252 |                  out_features,
253 |                  bias=True,
254 |                  a_bits=8,
255 |                  w_bits=8,
256 |                  quant_inference=False, 
257 |                  all_positive=False, 
258 |                  per_channel=False, 
259 |                  batch_init = 20):
260 |         super(QuantLinear, self).__init__(in_features, out_features, bias)
261 |         self.quant_inference = quant_inference
262 |         self.activation_quantizer = LSQActivationQuantizer(a_bits=a_bits, all_positive=all_positive,batch_init = batch_init)
263 |         self.weight_quantizer = LSQWeightQuantizer(w_bits=w_bits, all_positive=all_positive, per_channel=per_channel,batch_init = batch_init)
264 | 
265 |     def forward(self, input):
266 |         quant_input = self.activation_quantizer(input)
267 |         if not self.quant_inference:
268 |             quant_weight = self.weight_quantizer(self.weight)
269 |         else:
270 |             quant_weight = self.weight
271 |         output = F.linear(quant_input, quant_weight, self.bias)
272 |         return output
273 | 
274 | 
275 | def add_quant_op(module, layer_counter, a_bits=8, w_bits=8,
276 |                  quant_inference=False, all_positive=False, per_channel=False, 
277 |                  batch_init = 20):
278 |     for name, child in module.named_children():
279 |         if isinstance(child, nn.Conv2d):
280 |             layer_counter[0] += 1
281 |             if layer_counter[0] >= 1:  # also quantize the first layer
282 |                 if child.bias is not None:
283 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
284 |                                              child.kernel_size, stride=child.stride,
285 |                                              padding=child.padding, dilation=child.dilation,
286 |                                              groups=child.groups, bias=True, padding_mode=child.padding_mode,
287 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
288 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
289 |                     quant_conv.bias.data = child.bias
290 |                 else:
291 |                     quant_conv = QuantConv2d(child.in_channels, child.out_channels,
292 |                                              child.kernel_size, stride=child.stride,
293 |                                              padding=child.padding, dilation=child.dilation,
294 |                                              groups=child.groups, bias=False, padding_mode=child.padding_mode,
295 |                                              a_bits=a_bits, w_bits=w_bits, quant_inference=quant_inference,
296 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
297 |                 quant_conv.weight.data = child.weight
298 |                 module._modules[name] = quant_conv
299 |         elif isinstance(child, nn.ConvTranspose2d):
300 |             layer_counter[0] += 1
301 |             if layer_counter[0] >= 1:  # also quantize the first layer
302 |                 if child.bias is not None:
303 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
304 |                                                                 child.out_channels,
305 |                                                                 child.kernel_size,
306 |                                                                 stride=child.stride,
307 |                                                                 padding=child.padding,
308 |                                                                 output_padding=child.output_padding,
309 |                                                                 dilation=child.dilation,
310 |                                                                 groups=child.groups,
311 |                                                                 bias=True,
312 |                                                                 padding_mode=child.padding_mode,
313 |                                                                 a_bits=a_bits,
314 |                                                                 w_bits=w_bits,
315 |                                                                 quant_inference=quant_inference,
316 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
317 |                     quant_conv_transpose.bias.data = child.bias
318 |                 else:
319 |                     quant_conv_transpose = QuantConvTranspose2d(child.in_channels,
320 |                                                                 child.out_channels,
321 |                                                                 child.kernel_size,
322 |                                                                 stride=child.stride,
323 |                                                                 padding=child.padding,
324 |                                                                 output_padding=child.output_padding,
325 |                                                                 dilation=child.dilation,
326 |                                                                 groups=child.groups, bias=False,
327 |                                                                 padding_mode=child.padding_mode,
328 |                                                                 a_bits=a_bits,
329 |                                                                 w_bits=w_bits,
330 |                                                                 quant_inference=quant_inference,
331 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
332 |                 quant_conv_transpose.weight.data = child.weight
333 |                 module._modules[name] = quant_conv_transpose
334 |         elif isinstance(child, nn.Linear):
335 |             layer_counter[0] += 1
336 |             if layer_counter[0] >= 1:  # also quantize the first layer
337 |                 if child.bias is not None:
338 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
339 |                                                bias=True, a_bits=a_bits, w_bits=w_bits,
340 |                                                quant_inference=quant_inference,
341 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
342 |                     quant_linear.bias.data = child.bias
343 |                 else:
344 |                     quant_linear = QuantLinear(child.in_features, child.out_features,
345 |                                                bias=False, a_bits=a_bits, w_bits=w_bits,
346 |                                                quant_inference=quant_inference,
347 |                                              all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
348 |                 quant_linear.weight.data = child.weight
349 |                 module._modules[name] = quant_linear
350 |         else:
351 |             add_quant_op(child, layer_counter, a_bits=a_bits, w_bits=w_bits,
352 |                          quant_inference=quant_inference, all_positive=all_positive, per_channel=per_channel, batch_init = batch_init)
353 | 
354 | 
355 | def prepare(model, inplace=False, a_bits=8, w_bits=8, quant_inference=False,
356 |             all_positive=False, per_channel=False, batch_init = 20):
357 |     if not inplace:
358 |         model = copy.deepcopy(model)
359 |     layer_counter = [0]
360 |     add_quant_op(model, layer_counter, a_bits=a_bits, w_bits=w_bits,
361 |                  quant_inference=quant_inference, all_positive=all_positive, 
362 |                  per_channel=per_channel, batch_init = batch_init)
363 |     return model
364 | 
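
The arithmetic behind `Round`, `FunLSQ` and the step-size initialization above can be checked by hand on scalars. The following is a minimal pure-Python sketch, not the repository's implementation: the helper names (`round_half_away`, `quant_range`, `init_step`, `grad_scale_factor`, `fake_quant`, `step_grad`) are made up for illustration, and it works on single floats rather than tensors.

```python
import math


def round_half_away(x):
    # Same rule as Round.forward: sign(x) * floor(|x| + 0.5).
    return math.copysign(math.floor(abs(x) + 0.5), x)


def quant_range(bits, all_positive=False):
    # Integer range [Qn, Qp] for unsigned or signed quantization.
    if all_positive:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def init_step(weights, qp):
    # Step-size initialization used for weights: 2 * mean(|w|) / sqrt(Qp).
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return 2 * mean_abs / math.sqrt(qp)


def grad_scale_factor(numel, qp):
    # Gradient scale g = 1 / sqrt(N * Qp) applied to the step-size gradient.
    return 1.0 / math.sqrt(numel * qp)


def fake_quant(x, s, qn, qp):
    # Quantize then dequantize: round(clamp(x / s, Qn, Qp)) * s.
    clamped = min(max(x / s, qn), qp)
    return round_half_away(clamped) * s


def step_grad(x, s, qn, qp):
    # Per-element derivative of fake_quant with respect to s (before scaling by g).
    q = x / s
    if q < qn:
        return qn
    if q > qp:
        return qp
    return round_half_away(q) - q


qn, qp = quant_range(8)
inside = fake_quant(0.3, 0.5, qn, qp)       # 0.6 rounds to 1, times 0.5
clipped = fake_quant(100.0, 0.5, qn, qp)    # 200 is clamped to Qp = 127
```

Rounding half away from zero differs from Python's built-in `round`, which rounds halves to the nearest even integer, so `round_half_away(2.5)` and `round(2.5)` give different results.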


--------------------------------------------------------------------------------
/seebnparam.py:
--------------------------------------------------------------------------------
  1 | #encoding=utf-8
  2 | #Author: ZouJiu
  3 | #Time: 2021-11-13
  4 | 
  5 | import numpy as np
  6 | import torch
  7 | import os
  8 | import time
  9 | import torch
 10 | import torchvision
 11 | import torchvision.transforms as transforms
 12 | from torch.utils.data import Dataset, DataLoader
 13 | # from load_datas import TF, trainDataset, collate_fn
 14 | import models #, resnet50
 15 | from quantization.lsqquantize_V1 import prepare as lsqprepareV1
 16 | from quantization.lsqquantize_V2 import prepare as lsqprepareV2
 17 | from quantization.lsqplus_quantize_V1 import prepare as lsqplusprepareV1
 18 | from quantization.lsqplus_quantize_V2 import prepare as lsqplusprepareV2
 19 | from quantization.lsqplus_quantize_V1 import update_LSQplus_activation_Scalebeta
 20 | import torch.optim as optim
 21 | import datetime
 22 | import matplotlib.pyplot as plt
 23 | # os.environ["CUDA_VISIBLE_DEVICES"] = '0'
 24 | 
 25 | def adjust_lr(optimizer, stepiters, epoch):
 26 |     # if stepiters < 100: #2warmup start
 27 |     #     lr = stepiters*0.01/100
 28 |     # elif stepiters < 2000:
 29 |     #     lr = 0.001
 30 |     # elif stepiters < 3000:
 31 |     #     lr = 0.001
 32 |     if epoch <= 30:
 33 |         lr = 0.1
 34 |     elif epoch <= 46:
 35 |         lr = 0.01
 36 |     elif epoch <= 55:
 37 |         lr = 0.001
 38 |     else:
 39 |         lr = 0.0001
 40 |     for param_group in optimizer.param_groups:
 41 |         param_group['lr'] = lr
 42 |     return lr
 43 | 
 44 | def trainer():
 45 |     # batch_init: number of iterations (steps) used to initialize the quantization parameters from the pretrained model
 46 |     config = {'a_bit':8, 'w_bit':8, "all_positive":False, "per_channel":True, 
 47 |               "num_classes":10,"batch_init":20}
 48 |     pretrainedmodel = r'C:\Users\10696\Desktop\QAT\lsq+\log\model_108_42510_0.003_92.528_2021-11-27_17-49-47.pth'
 49 |     # Resnet_pretrain = False
 50 |     batch_size = 128
 51 |     num_epochs = 112
 52 |     Floatmodel = True    #QAT or float-32 train   False or True
 53 |     LSQplus = False       #LSQ+ or LSQ    True or False
 54 |     version = 'V1'
 55 |     scratch = False       # True trains from scratch; False finetunes a pretrained model
 56 |     showstep = 31
 57 |     # pay attention to the initial value of self.beta in LSQPlusActivationQuantizer
 58 |     plusV1_inititers = 30 # iterations used to update the activation quantization parameters s and beta
 59 |     assert showstep > 0
 60 |     assert isinstance(showstep, int)
 61 |     assert isinstance(batch_size, int)
 62 |     assert isinstance(num_epochs, int)
 63 |     if Floatmodel:
 64 |         prefix = 'float32'
 65 |     elif LSQplus and not Floatmodel and version=='V1':
 66 |         if  not config['per_channel']:
 67 |             prefix = 'LSQplus_V1'
 68 |         else:
 69 |             prefix = 'LSQplus_V1_pcl'
 70 |     elif LSQplus and not Floatmodel and version=='V2':
 71 |         if  not config['per_channel']:
 72 |             prefix = 'LSQplus_V2'
 73 |         else:
 74 |             prefix = 'LSQplus_V2_pcl'
 75 |     elif not LSQplus and not Floatmodel and version=='V1':
 76 |         if  not config['per_channel']:
 77 |             prefix = 'LSQ_V1'
 78 |         else:
 79 |             prefix = 'LSQ_V1_pcl'
 80 |     elif not LSQplus and not Floatmodel and version=='V2':
 81 |         if  not config['per_channel']:
 82 |             prefix = 'LSQ_V2'
 83 |         else:
 84 |             prefix = 'LSQ_V2_pcl'
 85 |     else:
 86 |         print('setting is wrong......, please check it')
 87 |         exit(-1)
 88 | 
 89 |     tim = datetime.datetime.strftime(datetime.datetime.now(),"%Y-%m-%d %H-%M-%S").replace(' ', '_')
 90 |     logfile = r'log'+os.sep+prefix+'_log_%s.txt'%tim
 91 |     savepath = r'log'
 92 |     flogs = open(logfile, 'w')
 93 | 
 94 |     train_transform = transforms.Compose([
 95 |         transforms.RandomCrop(32, padding=4),
 96 |         transforms.RandomHorizontalFlip(p=0.5),
 97 |         # transforms.Resize((32, 32)),
 98 |         transforms.ToTensor(),
 99 |         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.201))])
100 |     test_transform = transforms.Compose([
101 |         # transforms.Resize((32, 32)),
102 |         transforms.ToTensor(),
103 |         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.201))])
104 | 
105 |     trainset = torchvision.datasets.CIFAR10(root='datas', train=True,
106 |                                             download=True, transform=train_transform)
107 |     trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
108 |                                             shuffle=True, num_workers=2, drop_last=True)
109 | 
110 |     testset = torchvision.datasets.CIFAR10(root='datas', train=False,
111 |                                         download=True, transform=test_transform)
112 |     testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
113 |                                             shuffle=False, num_workers=2, drop_last=True)
114 | 
115 |     classes = ('plane', 'car', 'bird', 'cat',
116 |             'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
117 |     device = "cuda" if torch.cuda.is_available() else "cpu"
118 | 
119 |     model = models.resnet18(num_classes=config['num_classes'])
120 | 
121 |     #LSQ+
122 |     if LSQplus and not Floatmodel and version=='V1':
123 |         #LSQplus V1
124 |         lsqplusprepareV1(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
125 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
126 |                 batch_init = config["batch_init"])
127 |         print(model, '\npreparing lsqplus V1 models')
128 |     elif LSQplus and not Floatmodel and version=='V2':
129 |         #LSQplus V2
130 |         lsqplusprepareV2(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
131 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
132 |                 batch_init = config["batch_init"])
133 |         print(model, '\npreparing lsqplus V2 models')
134 |     elif not LSQplus and not Floatmodel and version=='V1':
135 |         #LSQ V1
136 |         lsqprepareV1(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
137 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
138 |                 batch_init = config["batch_init"])
139 |         print(model, '\npreparing lsq V1 models')
140 |     elif not LSQplus and not Floatmodel and version=='V2':
141 |         #LSQ V2
142 |         lsqprepareV2(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
143 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
144 |                 batch_init = config["batch_init"])
145 |         print(model, '\npreparing lsq V2 models')
146 |     elif Floatmodel:
147 |         print(model, '\npreparing float models')
148 |         pass
149 |     # if not Floatmodel:
150 |         # print(model)
151 |     flogs.write(str(model)+'\n')
152 |     if not os.path.exists(pretrainedmodel):
153 |         print('the pretrained model does not exist: %s'%pretrainedmodel)
154 |     if pretrainedmodel and os.path.exists(pretrainedmodel):
155 |         print('loading pretrained model: ', pretrainedmodel)
156 |         if torch.cuda.is_available():
157 |             state_dict = torch.load(pretrainedmodel, map_location='cuda')
158 |         else:
159 |             state_dict = torch.load(pretrainedmodel, map_location='cpu')
160 |         missingkeys, unexpected_keys = model.load_state_dict(state_dict['state_dict'], strict=False)
161 |         print('missingkeys: ', missingkeys)
162 |         print('unexpected_keys: ', unexpected_keys)
163 |         if not scratch:
164 |             iteration = state_dict['iteration']
165 |             alliters = state_dict['alliters']
166 |             nowepoch = state_dict['nowepoch']
167 |         else:
168 |             iteration = 0
169 |             alliters = 0
170 |             nowepoch = 0
171 |         print('loading complete')
172 |     else:
173 |         print('no pretrained model')
174 |         iteration = 0
175 |         alliters = 0
176 |         nowepoch = 0
177 |     model = model.to(device)
178 | 
179 |     weight = []
180 |     count = 0
181 |     weightsepa = []
182 |     for m in model.modules():
183 |         if isinstance(m, torch.nn.Conv2d):
184 |             w = m.weight.data.clone().detach().cpu().numpy() #out channel, in channel, h, w
185 |             out_channel = w.shape[0]
186 |             w_per_channel = np.reshape(w, (out_channel, -1))
187 |             w_per_layer = np.reshape(w, (-1))
188 |             print(w_per_channel.shape, w_per_layer.shape)
189 |             weight.append(w_per_layer)
190 |             weightsepa.extend(w_per_channel)
191 |             print(len(weightsepa[-1]))
192 |             count += 1
193 |     
194 |     print(len(weightsepa[11]))
195 |     plt.hist(weightsepa[11], bins=100)
196 |     plt.title("weights of one output channel (weightsepa[11])")
197 |     plt.ylabel('numbers')
198 |     plt.xlabel("weights")
199 |     plt.show()
200 | 
201 |     # plt.figure(figsize=(1620,1620))
202 |     fig, axs = plt.subplots(3, 3)
203 |     for i in range(3):
204 |         for j in range(3):
205 |             axs[i, j].hist(weight[i*3+j], bins=100)  # row-major index so each subplot shows a different conv layer
206 |             # axs[i, j].set_title("weights of layer %d"%(i*3+j+1))
207 | 
208 |     plt.show()
209 | 
210 |     bn = []
211 |     count = 0
212 |     bnsepa = []
213 |     for m in model.modules():
214 |         if isinstance(m, torch.nn.BatchNorm2d):
215 |             size = m.weight.data.shape[0]
216 |             gammas = list(m.weight.data.clone().detach().cpu().numpy())
217 |             bn.extend(gammas)
218 |             bnsepa.append(gammas)
219 |             print(len(bnsepa[-1]))
220 |             count += 1
221 |     
222 |     plt.hist(bn, bins=100)
223 |     plt.ylabel('numbers')
224 |     plt.xlabel("γ")
225 |     plt.show()
226 | 
227 |     # bn.sort()
228 |     plt.plot(np.arange(len(bn)), bn)
229 |     plt.title("resnet18 BN γ parameters γ*x+β")
230 |     plt.ylabel("unsorted γ")
231 |     plt.xlabel("index")
232 |     plt.show()
233 | 
234 |     bn.sort()
235 |     plt.plot(np.arange(len(bn)), bn)
236 |     plt.title("resnet18 BN γ parameters γ*x+β")
237 |     plt.ylabel("sorted γ")
238 |     plt.xlabel("index")
239 |     plt.show()
240 | 
241 |     tmp9 = bnsepa[2]
242 |     plt.plot(np.arange(len(tmp9)), tmp9)
243 |     plt.title("resnet18 BN γ parameters γ*x+β")
244 |     plt.ylabel("unsorted γ")
245 |     plt.xlabel("3rd layer")
246 |     plt.show()
247 | 
248 |     tmp9 = bnsepa[2]
249 |     tmp9.sort()
250 |     plt.plot(np.arange(len(tmp9)), tmp9)
251 |     plt.title("resnet18 BN γ parameters γ*x+β")
252 |     plt.ylabel("sorted γ")
253 |     plt.xlabel("3rd layer")
254 |     plt.show()
255 | 
256 |     tmp9 = bnsepa[17]
257 |     plt.plot(np.arange(len(tmp9)), tmp9)
258 |     plt.title("resnet18 BN γ parameters γ*x+β")
259 |     plt.ylabel("unsorted γ")
260 |     plt.xlabel("18th layer")
261 |     plt.show()
262 |     
263 |     tmp9 = bnsepa[17]
264 |     tmp9.sort()
265 |     plt.plot(np.arange(len(tmp9)), tmp9)
266 |     plt.title("resnet18 BN γ parameters γ*x+β")
267 |     plt.ylabel("sorted γ")
268 |     plt.xlabel("18th layer")
269 |     plt.show()
270 | 
271 |     plt.close()
272 |     print(len(bnsepa))
273 |     print(bn[:30])
274 |     print(bn[-30:])
275 | 
276 | if __name__ == '__main__':
277 |     trainer()
278 | 
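The script above places nine per-layer histograms on a 3x3 subplot grid, reading from a flat list such as `weight`. As a small illustration of how a grid cell maps to a position in such a list, the sketch below compares the row-major index with the plain sum of the two loop counters. `grid_index` is a hypothetical helper introduced only for this example; it is not part of the repository.

```python
def grid_index(row, col, ncols):
    """Row-major position of cell (row, col) in a flat list (hypothetical helper)."""
    return row * ncols + col


nrows, ncols = 3, 3

# Row-major indexing visits every flat position exactly once.
row_major = [grid_index(i, j, ncols) for i in range(nrows) for j in range(ncols)]

# Summing the two counters revisits positions, so fewer distinct layers appear.
summed = [i + j for i in range(nrows) for j in range(ncols)]

print(row_major)
print(sorted(set(summed)))
```

With a 3x3 grid the row-major form covers flat positions 0 through 8, while the summed form only reaches positions 0 through 4 and repeats several of them.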


--------------------------------------------------------------------------------
/trains.py:
--------------------------------------------------------------------------------
  1 | #encoding=utf-8
  2 | #Author: ZouJiu
  3 | #Time: 2021-11-13
  4 | 
  5 | import numpy as np
  6 | import torch
  7 | import os
  8 | import time
  9 | import torch
 10 | import torchvision
 11 | import torchvision.transforms as transforms
 12 | from torch.utils.data import Dataset, DataLoader
 13 | # from load_datas import TF, trainDataset, collate_fn
 14 | import models #, resnet50
 15 | from quantization.lsqquantize_V1 import prepare as lsqprepareV1
 16 | from quantization.lsqquantize_V2 import prepare as lsqprepareV2
 17 | from quantization.lsqplus_quantize_V1 import prepare as lsqplusprepareV1
 18 | from quantization.lsqplus_quantize_V2 import prepare as lsqplusprepareV2
 19 | from quantization.lsqplus_quantize_V1 import update_LSQplus_activation_Scalebeta
 20 | import torch.optim as optim
 21 | import datetime
 22 | # os.environ["CUDA_VISIBLE_DEVICES"] = '0'
 23 | 
 24 | def adjust_lr(optimizer, stepiters, epoch):
 25 |     # if stepiters < 100: #2warmup start
 26 |     #     lr = stepiters*0.01/100
 27 |     # elif stepiters < 2000:
 28 |     #     lr = 0.001
 29 |     # elif stepiters < 3000:
 30 |     #     lr = 0.001
 31 |     if epoch <= 31:
 32 |         lr = 0.1
 33 |     elif epoch <= 61:
 34 |         lr = 0.01
 35 |     elif epoch <= 81:
 36 |         lr = 0.001
 37 |     else:
 38 |         lr = 0.0001
 39 |     for param_group in optimizer.param_groups:
 40 |         param_group['lr'] = lr
 41 |     return lr
 42 | 
 43 | def trainer():
 44 |     # batch_init: number of iters/steps used to initialize the quantization parameters from the pretrained model
 45 |     config = {'a_bit':8, 'w_bit':8, "all_positive":False, "per_channel":False, 
 46 |               "num_classes":10,"batch_init":20}
 47 |     pretrainedmodel = r'C:\Users\10696\Desktop\QAT\lsq+\log\model_108_42510_0.003_92.528_2021-11-27_17-49-47.pth'
 48 |     # Resnet_pretrain = False
 49 |     batch_size = 128
 50 |     num_epochs = 112
 51 |     Floatmodel = False    #QAT or float-32 train   False or True
 52 |     LSQplus = False       #LSQ+ or LSQ    True or False
 53 |     version = 'V1'
 54 |     scratch = True       # True: train from the very beginning (not fine-tuning); False: fine-tune
 55 |     showstep = 31
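 56 |     # pay attention to the initial value of self.beta in LSQPlusActivationQuantizer
 57 |     plusV1_inititers = 30 # number of initial iters in which the activation quantization parameters s and beta are updated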
 58 |     assert showstep > 0
 59 |     assert isinstance(showstep, int)
 60 |     assert isinstance(batch_size, int)
 61 |     assert isinstance(num_epochs, int)
 62 |     if Floatmodel:
 63 |         prefix = 'float32'
 64 |     elif LSQplus and not Floatmodel and version=='V1':
 65 |         if  not config['per_channel']:
 66 |             prefix = 'LSQplus_V1'
 67 |         else:
 68 |             prefix = 'LSQplus_V1_pcl'
 69 |     elif LSQplus and not Floatmodel and version=='V2':
 70 |         if  not config['per_channel']:
 71 |             prefix = 'LSQplus_V2'
 72 |         else:
 73 |             prefix = 'LSQplus_V2_pcl'
 74 |     elif not LSQplus and not Floatmodel and version=='V1':
 75 |         if  not config['per_channel']:
 76 |             prefix = 'LSQ_V1'
 77 |         else:
 78 |             prefix = 'LSQ_V1_pcl'
 79 |     elif not LSQplus and not Floatmodel and version=='V2':
 80 |         if  not config['per_channel']:
 81 |             prefix = 'LSQ_V2'
 82 |         else:
 83 |             prefix = 'LSQ_V2_pcl'
 84 |     else:
 85 |         print('setting is wrong......, please check it')
 86 |         exit(-1)
 87 | 
 88 |     tim = datetime.datetime.strftime(datetime.datetime.now(),"%Y-%m-%d %H-%M-%S").replace(' ', '_')
 89 |     logfile = r'log'+os.sep+prefix+'_log_%s.txt'%tim
 90 |     savepath = r'log'
 91 |     flogs = open(logfile, 'w')
 92 | 
 93 |     train_transform = transforms.Compose([
 94 |         transforms.RandomCrop(32, padding=4),
 95 |         transforms.RandomHorizontalFlip(p=0.5),
 96 |         # transforms.Resize((32, 32)),
 97 |         transforms.ToTensor(),
 98 |         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.201))])
 99 |     test_transform = transforms.Compose([
100 |         # transforms.Resize((32, 32)),
101 |         transforms.ToTensor(),
102 |         transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.201))])
103 | 
104 |     trainset = torchvision.datasets.CIFAR10(root='datas', train=True,
105 |                                             download=True, transform=train_transform)
106 |     trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
107 |                                             shuffle=True, num_workers=2, drop_last=True)
108 | 
109 |     testset = torchvision.datasets.CIFAR10(root='datas', train=False,
110 |                                         download=True, transform=test_transform)
111 |     testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
112 |                                             shuffle=False, num_workers=2, drop_last=False)  # keep the last partial batch so all test images are evaluated
113 | 
114 |     classes = ('plane', 'car', 'bird', 'cat',
115 |             'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
116 |     device = "cuda" if torch.cuda.is_available() else "cpu"
117 | 
118 |     model = models.resnet18(num_classes=config['num_classes'])
119 | 
120 |     #LSQ+
121 |     if LSQplus and not Floatmodel and version=='V1':
122 |         #LSQplus V1
123 |         lsqplusprepareV1(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
124 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
125 |                 batch_init = config["batch_init"])
126 |         print(model, '\npreparing lsqplus V1 models')
127 |     elif LSQplus and not Floatmodel and version=='V2':
128 |         #LSQplus V2
129 |         lsqplusprepareV2(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
130 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
131 |                 batch_init = config["batch_init"])
132 |         print(model, '\npreparing lsqplus V2 models')
133 |     elif not LSQplus and not Floatmodel and version=='V1':
134 |         #LSQ V1
135 |         lsqprepareV1(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
136 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
137 |                 batch_init = config["batch_init"])
138 |         print(model, '\npreparing lsq V1 models')
139 |     elif not LSQplus and not Floatmodel and version=='V2':
140 |         #LSQ V2
141 |         lsqprepareV2(model, inplace=True, a_bits=config["a_bit"], w_bits=config["w_bit"],
142 |                 all_positive=config["all_positive"], per_channel=config["per_channel"],
143 |                 batch_init = config["batch_init"])
144 |         print(model, '\npreparing lsq V2 models')
145 |     elif Floatmodel:
146 |         print(model, '\npreparing float models')
147 |         pass
148 |     # if not Floatmodel:
149 |         # print(model)
150 |     flogs.write(str(model)+'\n')
151 |     if not os.path.exists(pretrainedmodel):
152 |         print('the pretrained model does not exist: %s'%pretrainedmodel)
153 |     if pretrainedmodel and os.path.exists(pretrainedmodel):
154 |         print('loading pretrained model: ', pretrainedmodel)
155 |         if torch.cuda.is_available():
156 |             state_dict = torch.load(pretrainedmodel, map_location='cuda')
157 |         else:
158 |             state_dict = torch.load(pretrainedmodel, map_location='cpu')
159 |         missingkeys, unexpected_keys = model.load_state_dict(state_dict['state_dict'], strict=False)
160 |         print('missingkeys: ', missingkeys)
161 |         print('unexpected_keys: ', unexpected_keys)
162 |         if not scratch:
163 |             iteration = state_dict['iteration']
164 |             alliters = state_dict['alliters']
165 |             nowepoch = state_dict['nowepoch']
166 |         else:
167 |             iteration = 0
168 |             alliters = 0
169 |             nowepoch = 0
170 |         print('loading complete')
171 |     else:
172 |         print('no pretrained model')
173 |         iteration = 0
174 |         alliters = 0
175 |         nowepoch = 0
176 |     model = model.to(device)
177 |     # print(torch.__version__)
178 |     time.sleep(3)
179 |     adam = False
180 |     lr = 0.001 # initial learning rate (SGD=1E-2, Adam=1E-3)
181 |     momentum=0.9
182 |     params = [p for p in model.parameters() if p.requires_grad]
183 |     # if adam:
184 |     #     optimizer = optim.Adam(params, lr=lr, betas=(momentum, 0.999))  # adjust beta1 to momentum
185 |     # else:
186 |     optimizer = optim.SGD(params, lr=lr, momentum=momentum, weight_decay=5e-4)
187 |     # and a learning rate scheduler
188 |     # lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
189 |     #                                                step_size=7,
190 |     #                                                gamma=0.1)
191 |     torch.manual_seed(999999)
192 |     start = time.time()
193 |     print('Using {} device'.format(device))
194 |     flogs.write('Using {} device'.format(device)+'\n')
195 |     stepiters = 0
196 |     criterion = torch.nn.CrossEntropyLoss()
197 |     pre = -999999
198 |     for epoch in range(num_epochs):
199 |         print('\nEpoch {}/{}'.format(epoch, num_epochs))
200 |         flogs.write('Epoch {}/{}'.format(epoch, num_epochs)+'\n')
201 |         print('-'*100)
202 |         running_loss = 0
203 |         if epoch<nowepoch:
204 |             stepiters += len(trainloader)
205 |             continue
206 |         model.train()
207 |         count = 0
208 |         print("length trainloader is: ", len(trainloader))
209 |         train_acc = 0
210 |         train_all = 0
211 |         for i, (image, label) in enumerate(trainloader):
212 |             stepiters += 1
213 |             if stepiters<alliters:
214 |                 continue
215 |             count += 1
216 |             lr = adjust_lr(optimizer, stepiters, epoch) #
217 |             optimizer.zero_grad()
218 |             image = image.to(device)
219 |             label = label.to(device)
220 |             outputs = model(image)
221 |             _, predict = torch.max(outputs, 1)
222 |             train_acc += (predict==label).sum().item()  # accumulate a Python int instead of a device tensor
223 |             train_all += len(label)
224 |             train_Acc = train_acc/train_all
225 | 
226 |             loss = criterion(outputs, label)
227 |             loss.backward()
228 | 
229 |             # LSQ+ V1, as implemented in the original paper: during the first few iters, update s and beta with the MSE formula
230 |             if LSQplus and version=='V1' and not Floatmodel and stepiters<plusV1_inititers and epoch==0:
231 |                 print(stepiters, ': update_LSQplus_activation_Scalebeta')
232 |                 model = update_LSQplus_activation_Scalebeta(model)
233 |             optimizer.step()
234 |             # statistics
235 |             running_loss += loss.item()
236 |             epoch_loss = running_loss / count
237 |             logword = 'epoch: {}, iteration: {}, alliters: {}, lr: {}, loss: {:.3f}, avgloss: {:.3f}, train_Acc: {:.3f}'.format(
238 |                 epoch, i+1, stepiters, optimizer.state_dict()['param_groups'][0]['lr'], loss.item(), epoch_loss, train_Acc)
239 |             if i%showstep==0:
240 |                 print(logword)
241 |                 flogs.write(logword+'\n')
242 |                 flogs.flush()
243 |             savestate = {'state_dict':model.state_dict(),\
244 |                         'iteration':i,\
245 |                         'alliters':stepiters,\
246 |                         "lr":lr,\
247 |                         'nowepoch':epoch}
248 |         # prepare to count predictions for each class
249 |         correct_pred = {classname: 0 for classname in classes}
250 |         total_pred = {classname: 0 for classname in classes}
251 | 
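252 |         model.eval()  # inference mode for validation (again no gradients needed); model.train() is called at the start of the next epoch
253 |         if epoch%3==0 and epoch>nowepoch:
254 |             print('validation on the test set')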
255 |             with torch.no_grad():
256 |                 count = 0
257 |                 print('length of testloader: ', len(testloader))
258 |                 for data in testloader:
259 |                     count += 1
260 |                     images, labels = data
261 |                     images = images.to(device)
262 |                     labels = labels.to(device)
263 |                     outputs = model(images)
264 |                     # if count==100:
265 |                     #     break
266 |                     _, predictions = torch.max(outputs, 1)
267 |                     # collect the correct predictions for each class
268 |                     for label, prediction in zip(labels, predictions):
269 |                         if label == prediction:
270 |                             correct_pred[classes[label]] += 1
271 |                         total_pred[classes[label]] += 1
272 | 
273 |             # print accuracy for each class
274 |             correctall = 0
275 |             alltest = 0
276 |             for classname, correct_count in correct_pred.items():
277 |                 accuracy = 100 * float(correct_count) / total_pred[classname]
278 |                 print("Validation Accuracy for class {:5s} is: {:.1f} %".format(classname,
279 |                                                             accuracy))
280 |                 correctall += correct_count
281 |                 alltest += total_pred[classname]
282 |                 flogs.write("Accuracy for class {:5s} is: {:.1f} %".format(classname, accuracy)+'\n')
283 |             flogs.flush()
284 |             Accuracy = round(100 * float(correctall)/alltest, 3)
285 |             print("Accuracy all is: {:.1f}".format(Accuracy))
286 | 
287 |             # lr_scheduler.step()
288 |             iteration=0
289 |             try:
290 |                 if epoch>nowepoch and Accuracy>pre:
291 |                     torch.save(savestate, os.path.join(savepath, prefix+'_models_{}_{}_{}_{:.3f}_{}_{}.pth'.format(
292 |                         lr, epoch, stepiters, loss.item(),Accuracy,tim)))
293 |                 pre = Accuracy
294 |             except Exception as err:
295 |                 print('saving checkpoint failed: ', err)
296 |         # evaluate(model, dataloader_test, device = device)
297 |     timeused  = time.time() - start
298 |     print('Training complete in {:.0f}m {:.0f}s'.format(timeused//60, timeused%60))
299 |     flogs.close()
300 | 
301 | if __name__ == '__main__':
302 |     trainer()
303 | 
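The `adjust_lr` function in trains.py implements a piecewise-constant learning-rate schedule keyed on the epoch. The sketch below restates that idea as a small pure function that can be checked without PyTorch. The names `lr_for_epoch` and `apply_lr` are hypothetical, and the plain list of dicts only stands in for `optimizer.param_groups`; the boundaries and values are copied from the function above.

```python
def lr_for_epoch(epoch, boundaries=(31, 61, 81), values=(0.1, 0.01, 0.001, 0.0001)):
    """Return the learning rate for `epoch`; each boundary is an inclusive upper limit."""
    for bound, value in zip(boundaries, values):
        if epoch <= bound:
            return value
    return values[-1]  # every epoch past the last boundary


def apply_lr(param_groups, lr):
    """Write `lr` into every parameter group, mirroring the loop in adjust_lr."""
    for group in param_groups:
        group['lr'] = lr
    return lr


# Stand-in for optimizer.param_groups (an assumption made for this sketch).
param_groups = [{'lr': 0.001}, {'lr': 0.001}]
schedule = [apply_lr(param_groups, lr_for_epoch(e)) for e in (0, 31, 32, 61, 62, 81, 82, 111)]
print(schedule)
```

Keeping the schedule in a pure function makes the boundary epochs easy to check in isolation before a long training run.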


--------------------------------------------------------------------------------