├── LICENSE
├── README.md
├── 人像生成相关
│   ├── InstantID.md
│   ├── PhotoMaker.md
│   ├── PuLID.md
│   └── images
│       ├── InstantID.png
│       ├── PhotoMaker.png
│       └── PuLID.png
├── 可控性相关
│   ├── ACE++.md
│   ├── ACE.md
│   ├── BrushNet.md
│   ├── Controlnet.md
│   ├── IC-Light.md
│   └── images
│       ├── ACE++介绍.png
│       ├── ACE++框架.png
│       ├── ACE架构.png
│       ├── ACE概览.png
│       ├── BrushNet.png
│       ├── IC-Light.png
│       ├── controlnet结构图.png
│       └── 零卷积.png
└── 风格迁移相关
    ├── IP-Adapter.md
    └── images
        └── IP-Adapter模型架构图.png
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 |
614 | If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 |
621 | END OF TERMS AND CONDITIONS
622 |
623 | How to Apply These Terms to Your New Programs
624 |
625 | If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 |
629 | To do so, attach the following notices to the program. It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 |
634 |
635 | Copyright (C)
636 |
637 | This program is free software: you can redistribute it and/or modify
638 | it under the terms of the GNU General Public License as published by
639 | the Free Software Foundation, either version 3 of the License, or
640 | (at your option) any later version.
641 |
642 | This program is distributed in the hope that it will be useful,
643 | but WITHOUT ANY WARRANTY; without even the implied warranty of
644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645 | GNU General Public License for more details.
646 |
647 | You should have received a copy of the GNU General Public License
648 | along with this program. If not, see .
649 |
650 | Also add information on how to contact you by electronic and paper mail.
651 |
652 | If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 |
655 | Copyright (C)
656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 | This is free software, and you are welcome to redistribute it
658 | under certain conditions; type `show c' for details.
659 |
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License. Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 |
664 | You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | .
668 |
669 | The GNU General Public License does not permit incorporating your program
670 | into proprietary programs. If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library. If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License. But first, please read
674 | .
675 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # AIGC-AlgoNotes
2 | Interview cheat-sheet notes for AIGC algorithm engineers
3 |
--------------------------------------------------------------------------------
/人像生成相关/InstantID.md:
--------------------------------------------------------------------------------
1 | ## Table of Contents
2 |
3 | - [1. How does InstantID work?](#1.InstantID原理是什么?)
4 | - [2. What does IdentityNet do?](#2.IdentityNet的作用)
5 | - [Original paper](https://arxiv.org/pdf/2401.07519)
6 |
7 |
8 | 1. How does InstantID work?
9 |
10 | 
11 |
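12 | InstantID uses lightweight modules to achieve efficient, tuning-free, identity-preserving image generation.
13 |
14 | 1. **ID embedding**:
15 | - A pretrained face encoder (e.g. antelopev2) extracts strong semantic features (identity, gender, age) from the reference face image; these embeddings serve as the core condition for generation.
16 | 2. **Modular architecture**:
17 | - **Image Adapter**: a lightweight module that accepts an image as a condition through **decoupled cross-attention**, working in parallel with the text condition.
18 | - **IdentityNet**: captures complex facial features and combines them with 5 facial keypoints (eyes, nose, mouth) for weak spatial control, focusing on identity preservation rather than redundant detail.
19 | 3. **Integration with the diffusion model**:
20 | - Built on Stable Diffusion with its pretrained weights frozen; only the added modules are trained, which keeps it compatible with community models.
21 | - During diffusion, the ID embedding and the lightweight modules guide generation, so no fine-tuning is needed.
22 |
23 | 2. What does IdentityNet do?
24 |
25 | - **High-fidelity identity preservation**: by combining the face embedding (ID embedding) with spatial control information, IdentityNet keeps facial details (features, expression) in the generated image highly consistent with the reference image.
26 | - **Weak spatial control**: facial keypoints (positions of the eyes, nose, and mouth) act as the spatial constraint. This avoids the loss of editing flexibility that an overly strong constraint would cause, while still limiting the excessive freedom the face would otherwise have during generation.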
--------------------------------------------------------------------------------
/人像生成相关/PhotoMaker.md:
--------------------------------------------------------------------------------
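1 | ## Table of Contents
2 |
3 | - [1. How does PhotoMaker work?](#1.PhotoMaker原理是什么?)
4 | - [2. How is the loss computed?](#2.损失如何计算?)
5 | - [Original paper](https://arxiv.org/pdf/2312.04461)
6 |
7 | 1. How does PhotoMaker work?
8 |
9 | 
10 |
11 | PhotoMaker drives the diffusion model with a stacked ID embedding together with the text embedding, which gives both high identity fidelity and flexibility.
12 |
13 | 1. **Stacked ID embedding**:
14 | - The feature embeddings of several input ID images are stacked into a single ID representation.
15 | - Each sub-embedding corresponds to one input image, so rich identity features are preserved.
16 | - The model's cross-attention layers adaptively fuse the ID embedding with the text prompt.
17 | 2. **Training data construction**:
18 | - PhotoMaker is trained on a dataset of diverse ID images.
19 | - Each ID in the dataset has images with multiple viewpoints, expressions, and scenes, which improves generalization.
20 | 3. **Generation process**:
21 | - The stacked ID embedding is injected into the cross-attention layers of the diffusion model.
22 | - The class word in the text embedding (e.g. "man" or "woman") is replaced by the stacked ID embedding, so the generated image matches the target ID.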
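The class-word replacement described above can be sketched as follows. This is a minimal illustration, not PhotoMaker's actual code: the function name `inject_stacked_id`, the toy 2-dimensional vectors, and the use of plain Python lists in place of tensors are all assumptions made for the example.

```python
from typing import List

Vector = List[float]


def inject_stacked_id(text_emb: List[Vector], class_index: int,
                      id_embs: List[Vector]) -> List[Vector]:
    """Replace the class-word embedding with the stacked ID embeddings.

    The output sequence grows by len(id_embs) - 1 tokens, because one
    class-word token is swapped for one sub-embedding per ID image.
    """
    if not 0 <= class_index < len(text_emb):
        raise IndexError("class_index out of range")
    return text_emb[:class_index] + id_embs + text_emb[class_index + 1:]


# Toy prompt "a man reading": three token embeddings, class word at index 1.
text_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
id_embs = [[1.0, 0.0], [0.0, 1.0]]  # one sub-embedding per input ID image
out = inject_stacked_id(text_emb, 1, id_embs)
```

The resulting sequence is what the cross-attention layers would attend to in place of the original text embedding.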
23 |
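24 | 2. How is the loss computed?
25 |
26 | 1. **ID fidelity loss**:
27 | - An ID similarity measure based on a face recognition model (e.g. ArcFace) keeps the generated image consistent with the embedding of the input ID.
28 | 2. **Text consistency loss**:
29 | - The CLIP-T metric measures the similarity between the generated image and the text prompt, improving text controllability.
30 | 3. **Diversity loss**:
31 | - A face diversity metric (e.g. LPIPS) encourages variation in expression, viewpoint, and so on.
32 | 4. **Masked diffusion loss**:
33 | - ID-irrelevant regions are randomly masked so that the diffusion model generates the ID-related region with higher quality.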
--------------------------------------------------------------------------------
/人像生成相关/PuLID.md:
--------------------------------------------------------------------------------
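1 | ## Table of Contents
2 |
3 | - [1. How does PuLID work?](#1.PuLID原理是什么?)
4 | - [2. Introduce the Lightning T2I branch](#2.介绍下Lightning-T2I分支)
5 | - [3. What role does zero convolution play?](#3.零卷积起什么作用?)
6 | - [4. What are the contrastive alignment loss and the accurate ID loss?](#4.什么是对比对齐损失和精确ID损失?)
7 | - [Original paper](https://arxiv.org/pdf/2302.05543)
8 |
9 |
10 | 1. How does PuLID work?
11 |
12 | 
13 |
14 | 1. **Core goal**:
15 | - PuLID is a tuning-free ID customization method that inserts an identity while keeping the behavior of the original model consistent.
16 | 2. **Key techniques**:
17 | - A Lightning T2I branch generates high-quality images through fast denoising.
18 | - A contrastive alignment loss and an accurate ID loss maintain ID fidelity while reducing interference with the rest of the image.
19 | 3. **Advantages**:
20 | - No heavy fine-tuning or complex dataset is required.
21 | - It handles complex prompts and keeps generation flexible.
22 |
23 | 2. Introduce the Lightning T2I branch
24 |
25 | - The Lightning T2I branch is a core part of PuLID. It uses fast denoising to generate a high-quality image from pure noise while keeping the original model's behavior consistent. Unlike conventional diffusion models, which can need hundreds of denoising steps, Lightning T2I generates an image in only 4 steps.
26 | - It works by building two contrastive paths: one conditioned on the text prompt only, the other on both the ID and the text prompt. Aligning the features of the two paths through contrastive learning ensures that inserting the ID does not change the background, lighting, or style of the image.
27 | - The advantages of this approach are:
28 | 1. faster generation and much higher efficiency;
29 | 2. precise ID insertion that does not disturb other image elements;
30 | 3. low compute requirements, which suits practical use.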
31 |
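32 | 3. What role does zero convolution play?
33 |
34 | In ControlNet, zero convolution is typically applied to the **conditional input features** to achieve the following:
35 |
36 | 1. **Lossless injection of conditional features into the backbone**:
37 | - At initialization the zero convolution outputs zero, so the conditional branch contributes nothing and does not disturb the backbone.
38 | - As training proceeds, the zero convolution learns how to merge the conditional features into the generation pipeline, so the backbone is gradually guided by the condition.
39 |
40 | 2. **Stable gradient flow**:
41 | - Because the layer does not perturb the features at initialization, gradient flow stays stable and the pretrained model is not damaged.
42 |
43 | 3. **Flexible condition fusion**:
44 | - Zero convolutions at different layers let the influence of the conditional features be adjusted per layer, which strengthens conditional control.
45 |
46 | 4. What are the contrastive alignment loss and the accurate ID loss?
47 |
48 | 1. **Contrastive alignment loss**:
49 | - Goal: after the ID is inserted, the model should still respond to the text prompt, and the ID should not disturb other parts of the image (background, style, etc.).
50 | - Computation: build two contrastive paths, one with the text prompt only and one with both the ID and the text prompt. In the UNet cross-attention layers, the features of the two paths are aligned by measuring their semantic similarity and minimizing the difference between them.
51 | - Formula: a Softmax operation compares the features of the two paths; the higher the similarity, the smaller the loss.
52 | 2. **Accurate ID loss**:
53 | - Goal: after the ID is inserted, the generated face should be visually very similar to the target ID.
54 | - Computation: the Lightning T2I branch generates a high-quality image; its face features are extracted and compared with the face features of the target ID using cosine similarity.
55 | - Formula: CosSim measures the similarity between the generated image and the target ID; the higher the value, the higher the ID fidelity.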
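The accurate ID loss can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `cosine_similarity` and `id_loss` are hypothetical, the face embeddings are toy vectors standing in for the output of a face recognition model, and the form `1 - CosSim` is one common way to turn "higher similarity is better" into a loss to minimize.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def id_loss(generated_face: Sequence[float], target_face: Sequence[float]) -> float:
    """Loss that shrinks as the generated face embedding approaches the target ID."""
    return 1.0 - cosine_similarity(generated_face, target_face)


same = id_loss([1.0, 0.0], [1.0, 0.0])        # identical direction
unrelated = id_loss([1.0, 0.0], [0.0, 1.0])   # orthogonal embeddings
opposite = id_loss([1.0, 0.0], [-1.0, 0.0])   # opposite direction
```

In this toy setup the loss ranges from 0 (same direction) to 2 (opposite direction).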
--------------------------------------------------------------------------------
/人像生成相关/images/InstantID.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/人像生成相关/images/InstantID.png
--------------------------------------------------------------------------------
/人像生成相关/images/PhotoMaker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/人像生成相关/images/PhotoMaker.png
--------------------------------------------------------------------------------
/人像生成相关/images/PuLID.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/人像生成相关/images/PuLID.png
--------------------------------------------------------------------------------
/可控性相关/ACE++.md:
--------------------------------------------------------------------------------
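1 | ## Table of Contents
2 |
3 | - [1. What is ACE++?](#ACE++是什么?)
4 | - [2. What is the core principle of ACE++?](#2.ACE++核心原理是什么?)
5 | - [3. Why does LCU++ concatenate along the channel dimension rather than the sequence dimension?](#3.为什么LUC++在通道维度上进行拼接,而不是在序列维度上拼接?)
6 | - [4. How is the loss computed?](#4.损失如何计算?)
7 | - [5. Explain velocity](#5.解释下速度(velocity))
8 | - [Original paper](https://arxiv.org/pdf/2410.00086)
9 |
10 | 1. What is ACE++?
11 |
12 | 
13 |
14 | ACE++ is an improved version of ACE that focuses on instruction-based image generation and editing and improves performance through context-aware content filling. Its core improvement is an enhanced **Long-context Condition Unit (LCU)** that supports more kinds of image generation and editing tasks.
15 |
16 | 2. What is the core principle of ACE++?
17 |
18 | 
19 |
20 | 1. **LCU++ input format**:
21 | - ACE++ extends the original LCU input format to cover image generation, editing, inpainting, and other tasks. Its main innovation, LCU++, concatenates the input image, the mask, and the noisy latent along the channel dimension instead of the sequence dimension. This lowers the cost of adapting the model to new tasks and avoids the disruption that sequence concatenation causes to the context-aware framework, which improves training efficiency.
22 |
23 | 2. **Two-stage training**:
24 | - **Stage 1**: starting from a base text-to-image model (such as FLUX.1-dev), pre-train on "0-ref" tasks. This stage teaches the model basic image generation.
25 | - **Stage 2**: fine-tune on all tasks, including both "0-ref" and "N-ref" tasks, so that the model handles the full range of generation tasks and follows instructions.
26 |
27 | 3. **Model architecture**:
28 | - **ACE++ integrates the LCU++ input format into the FLUX.1-dev architecture** and applies full attention to the multimodal inputs (image, mask, and noisy latent). The model embeds the text, images, and masks, maps those features to a token sequence, and processes it with transformer layers.
29 |
30 | 4. **Generation and editing tasks**:
31 | ACE++ can perform many image editing and generation tasks, including:
32 |
33 | - **Portrait-consistent generation**: keep the same identity across different scenes.
34 | - **Subject consistency**: keep a specific subject consistent across different scenes.
35 | - **Local editing**: modify a specific region of the image according to a mask, for example adding an object or changing details.
36 | - **Flexible instructions**: edit images dynamically from free-form natural language, for example changing the background or adding new elements.
37 |
38 | 5. **Efficient fine-tuning and deployment**:
39 | ACE++ also provides models fine-tuned with LoRA for domain-specific tasks such as portrait preservation, subject-driven generation, and local editing. This lets the model adapt quickly to a specific task with less compute.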
40 |
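41 | 3. Why does LCU++ concatenate along the channel dimension rather than the sequence dimension?
42 |
43 | In conventional text-guided image generation, conditions are usually combined by sequence concatenation: the text embedding and the image embedding are concatenated and fed to the generative model. The drawback is that compute grows sharply as more conditions are added. With multimodal inputs in particular, the concatenated sequence becomes very long, which makes both training and inference expensive.
44 |
45 | ACE++ instead concatenates the inputs along the channel dimension, which avoids this problem and improves training efficiency in the following ways:
46 | 1. **Lower computational complexity**:
47 | Sequence concatenation lengthens the sequence, and attention cost grows quadratically with sequence length (O(n²)); channel concatenation keeps the length fixed and avoids that growth.
48 |
49 | 2. **Context-aware optimization**:
50 | With sequence concatenation the model suffers contextual interference on long sequences. Channel concatenation keeps the different inputs (image, mask, noise) in separate channels, which avoids mixing information and improves context awareness.
51 |
52 | 3. **Efficient multimodal input handling**:
53 | Channel concatenation lets the model process the image, mask, and noise in parallel rather than serially, which helps both training and inference efficiency.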
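The cost argument above can be checked with simple arithmetic. The sketch below is illustrative only: `concat_shapes` and `attention_pairs` are hypothetical helpers, and the sizes (1024 tokens, 16 channels, 3 inputs for image, mask, and noisy latent) are made-up numbers, not values taken from ACE++.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (sequence length, channels per token)


def concat_shapes(num_tokens: int, channels: int, num_inputs: int) -> Tuple[Shape, Shape]:
    """Return the (sequence-concat, channel-concat) shapes for equally sized inputs."""
    sequence_concat = (num_tokens * num_inputs, channels)
    channel_concat = (num_tokens, channels * num_inputs)
    return sequence_concat, channel_concat


def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs that full self-attention evaluates."""
    return seq_len * seq_len


seq_shape, chan_shape = concat_shapes(num_tokens=1024, channels=16, num_inputs=3)
ratio = attention_pairs(seq_shape[0]) // attention_pairs(chan_shape[0])
print(seq_shape, chan_shape, ratio)  # (3072, 16) (1024, 48) 9
```

With three inputs, sequence concatenation triples the length and therefore multiplies the attention pair count by nine, while channel concatenation leaves the length unchanged and only widens each token.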
54 |
55 |
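56 | 4. How is the loss computed?
57 |
58 | During ACE++ training, the loss is based on the difference between the **predicted velocity** and the **target velocity**; the goal is to minimize the error between the generated image and the target image. It has two main aspects:
59 |
60 | 1. **Generation loss**:
61 | ACE++ generates images in a **noisy latent space**. During training the model predicts the **velocity** at the current time step, i.e. how to move from the current noisy sample (Xt) toward the target sample (X1). The noisy sample is built by **linear interpolation** between noise and the target, and the loss is computed from the model's prediction.
62 |
63 | 2. **Loss formula**:
64 | The loss has two parts:
65 | - **Lref**: the reconstruction loss, which measures the difference between the reference images (such as a given source or reference image) and their generated counterparts. For tasks without reference images (0-ref tasks) this loss is 0.
66 | - **Ltar**: the target loss, which measures the difference between the generated image and the target image.
67 |
68 | The total loss is:
69 |
70 | \[
71 | L = E_{t, x_0, x_1} \| v_t - u_t \|^2 = \sum_{i=0}^{N-1} E_{t, x_0, x_1} \| v_i - u_i \|^2 + E_{t, x_0, x_1} \| v_N - u_N \|^2
72 | \]
73 |
74 | - **v_t** is the velocity predicted by the model, describing the change from the current noisy sample to the target sample.
75 | - **u_t** is the ground-truth target velocity, describing the true change from noise to target.
76 | - **Lref** is the reconstruction loss over the N reference images, i.e. the summed term over i = 0, ..., N-1 (it is 0 in 0-ref tasks, where N = 0).
77 | - **Ltar** is the generation loss on the target image, i.e. the final term with index N.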
78 |
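79 | 5. Explain velocity
80 |
81 | In ACE++ training, "velocity" describes the rate of change from a noisy state to the target state. The term is closely tied to how diffusion models are trained.
82 |
83 | 1. **Diffusion models and the noisy latent representation**
84 | Diffusion models are a class of generative models that produce samples by simulating a gradual process from pure noise to the target data. Concretely, the reverse process recovers a clean image step by step from noise (a random latent variable). It does so by predicting a "change" at each step, namely the change from the current noisy state toward the target state.
85 |
86 | ACE++ uses a noisy latent representation (Xt), which is the state of the current image (or data) in the noise space. By removing noise step by step, the model aims to recover the clean target image.
87 |
88 | 2. **What "velocity" means**
89 | During training, the velocity at each time step, i.e. the rate of change between the noisy state Xt and the target image X1, guides learning. Velocity can be understood as how the noisy state should change: how to move from the current noisy state (Xt) toward the clean state of the target image (X1).
90 |
91 | Velocity is defined as the change (or, loosely speaking, the gradient) from the current noisy state to the target image state, as predicted by the model. With this prediction the model learns how to generate an image from noise step by step.
92 |
93 | 3. **Why velocity is needed**
94 | In a diffusion model, each generation step is effectively a prediction of how to remove noise. To generate an image, the model needs to know not only how to produce a cleaner image from the current noise but also how fast to remove noise at each time step, so that the successive changes gradually turn noise into a clean image. That is the role of velocity: at each time step it tells the model how to change the sample (remove noise) to approach the target image.
95 |
96 | 4. **Computing velocity during training**
97 | During training, the model predicts the velocity (change) between the current noisy state and the target image, and this prediction is compared with the ground-truth target velocity. The training objective is to minimize the difference between the predicted and true velocities.
98 |
99 | In the loss function, Lref and Ltar both measure the difference between predicted and true velocity, and the model parameters are updated through backpropagation.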
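The velocity target and the Lref/Ltar split can be made concrete with a small sketch. It assumes the straight-line (rectified-flow style) convention in which x0 is noise, x1 is the target, and the true velocity is the constant x1 - x0; that convention, the helper names, and the toy 2-dimensional vectors are assumptions for illustration, not code from ACE++.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def interpolate(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Noisy sample x_t on the straight line from noise x0 (t=0) to target x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Sequence[float], x1: Sequence[float]) -> Vector:
    """Ground-truth velocity u_t = x1 - x0, constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def squared_error(v: Sequence[float], u: Sequence[float]) -> float:
    """Squared L2 distance between predicted and true velocity."""
    return sum((p - q) ** 2 for p, q in zip(v, u))


def split_loss(pred: Sequence[Sequence[float]],
               true: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (L_ref, L_tar): the first N entries are references, the last is the target."""
    errors = [squared_error(v, u) for v, u in zip(pred, true)]
    return sum(errors[:-1]), errors[-1]


x0, x1 = [0.0, 0.0], [2.0, 4.0]
x_t = interpolate(x0, x1, 0.25)   # [0.5, 1.0]
u_t = target_velocity(x0, x1)     # [2.0, 4.0]
# One Euler step with the true velocity over the remaining time 0.75 lands on x1.
x_end = [x + 0.75 * u for x, u in zip(x_t, u_t)]
# 0-ref task: there are no reference images, so L_ref is an empty sum.
l_ref, l_tar = split_loss(pred=[[1.0, 4.0]], true=[u_t])
```

Following the true velocity from any point on the path reaches the target, which is why matching the predicted velocity to it is a sufficient training signal.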
--------------------------------------------------------------------------------
/可控性相关/ACE.md:
--------------------------------------------------------------------------------
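1 | ## Table of Contents
2 |
3 | - [1. What is ACE?](#ACE是什么?)
4 | - [2. What is the core principle of ACE?](#2.ACE核心原理是什么?)
5 | - [3. Explain the Condition Unit (CU)](#3.解释下条件单元CU)
6 | - [4. Explain the Long-context Condition Unit (LCU)](#4.解释下长上下文条件单元LCU)
7 | - [5. Long-context self-attention](#5.长上下文自注意力机制)
8 | - [6. How is ACE trained?](#6.ACE是如何进行训练的?)
9 | - [Original paper](https://arxiv.org/pdf/2410.00086)
10 |
11 | 1. What is ACE?
12 |
13 | 
14 |
15 | **ACE: All-round Creator and Editor** is a visual generation model that addresses an important problem in the field: different generation tasks need different input conditions, and existing foundation generative models struggle to accept multimodal conditions and cover all of these tasks. ACE proposes a unified framework that handles tasks ranging from text-guided generation to image editing and supports multi-turn interactive generation.
16 |
17 | 2. What is the core principle of ACE?
18 |
19 | 
20 |
21 | The core principle of ACE (All-round Creator and Editor) can be summarized in the following parts:
22 | 1. **Unified conditional input format (CU and LCU)**:
23 | - **Condition Unit (CU)**: ACE defines a condition unit that brings text guidance, images, and masks together as one multimodal input. Each CU contains a text instruction (T) and the corresponding visual information (V), where the visual information consists of a set of images (I) and their masks (M). This unified format lets ACE handle many generation tasks, such as text-guided generation, image editing, and region editing.
24 | - **Long-context Condition Unit (LCU)**: to improve understanding of complex tasks, ACE introduces the LCU, which incorporates information from earlier generation rounds. An LCU contains not only the current input but also the history of previous rounds, so the model understands the context better and can support multi-turn interaction and longer task sequences.
25 |
26 | 2. **Transformer-based diffusion model**:
27 | - ACE adopts a Diffusion Transformer architecture, which combines the strengths of diffusion models and Transformers. A **Condition Tokenizing** module converts the different kinds of input conditions (text and images) into a unified representation, and a **Long-context Attention Block** processes them so that history is integrated effectively across rounds and the generated content stays coherent.
28 |
29 | 3. **Novel input encoding**:
30 | - **Image Indicator Embedding**: to keep the order of images referenced in the text instruction aligned with the order of the actual images, ACE introduces an image indicator embedding. It assigns each image a textual indicator, which helps the model understand where each image appears in the text.
31 | - **Long-context attention**: in multi-turn generation, long-context self-attention integrates the information from all image and text inputs, so each round can refer to the preceding context and produce images that are more coherent and closer to the instruction.
32 |
33 | 4. **Efficient data collection and processing**:
34 | - High-quality data is critical for training generative models, so ACE proposes two data collection methods: synthesizing data and pairing images from large-scale image databases. Synthesis produces image pairs that meet specific needs, while pairing images from real datasets increases data diversity and reduces the risk of overfitting.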
35 |
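36 | 3. Explain the Condition Unit (CU)
37 |
38 | ### Structure of a CU
39 | A **Condition Unit (CU)** combines the text instruction and the visual information (images and masks) in one unified input format, which lets ACE adapt to many kinds of visual generation tasks.
40 | **A CU has two main parts**:
41 | 1. **Text instruction (T)**:
42 | - The text instruction states what the model should generate or how it should edit an image. It can be a simple descriptive text or a more complex editing instruction.
43 | - Examples: "Generate a girl wearing a red dress" or "Generate a complete flower image from the given edge map".
44 |
45 | 2. **Visual information (V)**:
46 | - The visual information holds the image-related data, specifically:
47 | - **Image (I)**: an input image, either original or preprocessed.
48 | - **Mask (M)**: a mask that marks a specific region of the image (for example, one part of the image or a target object), typically used for editing tasks. When no specific mask is given, the mask can be all ones, meaning the whole image is processed.
49 | - The visual information may contain several images with their masks. For object editing, for example, the model may need several images, each with a different mask.
50 |
51 | ### Formal definition of a CU
52 | Following the ACE definition, a **Condition Unit (CU)** can be written as:
53 | - **CU = {T, V}**
54 | - Here T is the text instruction and V is the visual information (images and masks). V can be further split into a set of image-mask pairs:
55 | - **V = {[I1; M1], [I2; M2], ..., [IN; MN]}**
56 | - Each pair `[Ii; Mi]` is the i-th input image and its mask (if any). If there is no mask, Mi is an all-ones mask.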
57 |
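58 | 4. Explain the Long-context Condition Unit (LCU)
59 |
60 |
61 | The **Long-context Condition Unit (LCU)** extends the Condition Unit (CU). In ACE, a CU carries the input of a single task: a text instruction plus visual information (images and masks). An LCU adds historical context, combining information from earlier tasks with the current input so that the model can better understand the user's intent and generate matching content.
62 |
63 | **The structure of an LCU is**:
64 |
65 | - **LCU = {{Ti−m, Ti−m+1, ..., Ti}, {Vi−m, Vi−m+1, ..., Vi}}**
66 |
67 | where:
68 | - **Ti** is the text instruction of the current round.
69 | - **Vi** is the visual information (images and masks) of the current round.
70 | - **Ti−m, Ti−m+1, ..., Ti** are the text instructions from the previous m rounds up to the current one (the text history).
71 | - **Vi−m, Vi−m+1, ..., Vi** are the visual inputs from the previous m rounds up to the current one (the image and mask history).
72 |
73 | An LCU therefore contains both the current round's input and the information from earlier generation rounds. With this history, the model can take the prior context into account in the current task and produce results that are more coherent and closer to what is expected.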
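The CU and LCU definitions map naturally onto small data structures. The sketch below is a hypothetical illustration: the class names, the `build_lcu` helper, and the use of strings as stand-ins for image and mask tensors are assumptions, not part of ACE's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VisualItem:
    image: str                  # stand-in for an image tensor (here just an id)
    mask: Optional[str] = None  # None stands for the all-ones mask


@dataclass
class ConditionUnit:
    text: str                                               # T
    visuals: List[VisualItem] = field(default_factory=list)  # V = {[I1; M1], ...}


@dataclass
class LongContextConditionUnit:
    texts: List[str]                # {T_{i-m}, ..., T_i}
    visuals: List[List[VisualItem]]  # {V_{i-m}, ..., V_i}


def build_lcu(history: List[ConditionUnit], current: ConditionUnit,
              m: int) -> LongContextConditionUnit:
    """Combine the last m historical CUs with the current CU."""
    recent = history[-m:] if m > 0 else []  # history[-0:] would return everything
    units = recent + [current]
    return LongContextConditionUnit([u.text for u in units],
                                    [u.visuals for u in units])


history = [ConditionUnit("draw a cat"), ConditionUnit("make it orange"),
           ConditionUnit("add a hat")]
current = ConditionUnit("change the background", [VisualItem("img_3")])
lcu = build_lcu(history, current, m=2)
```

Here `lcu.texts` holds the two most recent historical instructions followed by the current one, mirroring the index range i−m, ..., i in the definition.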
74 |
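75 | 5. Long-context self-attention
76 |
77 | Long-context self-attention is an extension of self-attention. The core of self-attention is computing an "attention score" for every position in the input sequence; these scores decide how much information each position exchanges with the others. In standard self-attention every position interacts with every other position, which costs O(n²) and is inefficient for long sequences.
78 |
79 | To address this, long-context self-attention adds the following:
80 |
81 | 1. **Time step embedding**:
82 | - Every input position receives a time step embedding, which helps the model tell time steps apart and handle dependencies in long sequences.
83 |
84 | 2. **3D rotary position encoding (3D RoPE)**:
85 | - To handle spatial and temporal dependencies in image and text sequences, ACE uses 3D rotary position encoding (3D RoPE), which extends position encoding to three dimensions so that information at different spatial and temporal levels can interact effectively.
86 |
87 | 3. **Long-context self-attention module**:
88 | - In this module, all input image and text embeddings interact with the historical context. The model can therefore consider both the current input and the history when generating each output, which keeps the generated content coherent.
89 |
90 | 4. **Cross-attention**:
91 | - Besides self-attention, the block also contains cross-attention. Each image and text embedding interacts only with the images and text that belong to the same Condition Unit (CU), not with the content of all condition units. This keeps the matching between text and images precise.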
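The "same CU only" rule for cross-attention amounts to a block-diagonal attention mask. A minimal sketch, assuming each token carries an integer id of the condition unit it belongs to; the helper name `same_unit_mask` and the id layout are hypothetical:

```python
from typing import List, Sequence


def same_unit_mask(unit_ids: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (same condition unit)."""
    return [[a == b for b in unit_ids] for a in unit_ids]


# Five tokens: the first three belong to CU 0, the last two to CU 1.
mask = same_unit_mask([0, 0, 0, 1, 1])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xxx..
# xxx..
# xxx..
# ...xx
# ...xx
```

In an actual attention layer, positions where the mask is False would be excluded before the softmax.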
92 |
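93 | 6. How is ACE trained?
94 |
95 | ### 1. Data collection and construction
96 | Training ACE first requires a large amount of high-quality data, in particular **image pairs** and **text instructions**. Because no directly usable training data exists, ACE proposes two main data collection methods:
97 |
98 | - **Synthesis-based data**:
99 | - Existing open-source models generate image pairs for specific tasks. These models synthesize or modify existing images to produce images that meet the target condition, for example by style transfer or by adding a specific object or background.
100 | - The advantage is speed: large numbers of pairs can be built quickly. The risk is overfitting to synthetic data.
101 |
102 | - **Pairing from real databases**:
103 | - Image pairs are extracted from large-scale image databases (such as LAION-5B and OpenImages) to ensure diversity and realism. ACE uses a **hierarchical aggregation pipeline**: it first extracts image features (for example, semantic features from SigLIP), then groups images into clusters with K-means. Next, a **Union-find algorithm** groups them further so that the image pairs are semantically related.
104 | - This yields high-quality image pairs and reduces the overfitting that synthetic data can cause.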
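The Union-find grouping step can be sketched as below. This is an illustration under assumptions: the cosine-similarity threshold rule, the helper names, and the toy 2-dimensional embeddings are hypothetical, and in a real pipeline this step would presumably run inside each K-means cluster on SigLIP-style features.

```python
import math
from typing import Dict, List, Sequence


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def group_similar(embeddings: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Merge images whose embedding similarity reaches the threshold; return index groups."""
    uf = UnionFind(len(embeddings))
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                uf.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(embeddings)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = group_similar(embeddings, threshold=0.9)
print(groups)  # [[0, 1], [2, 3]]
```

Any two images in the same group are candidates for a source/target pair, since they are linked through a chain of similar images.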
105 |
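106 | ### 2. Generating and refining instruction labels
107 | Besides image pairs, training ACE requires a **text instruction for every image pair** that describes how to turn the source image into the target image. Producing these instructions is hard, because an instruction must not only describe image content but also state the difference between the source and target images.
108 |
109 | ACE generates instructions in two ways:
110 |
111 | - **Template-based method**:
112 | - Hand-built templates generate instructions for each visual task according to its type. The templates contain generic expressions that the task needs. The drawback is that the resulting instructions lack diversity, which can lead to overfitting.
113 |
114 | - **MLLM-based method**:
115 | - To fix the lack of diversity, ACE uses a multimodal large language model (MLLM) to produce more flexible and varied instructions. With a trained MLLM, ACE generates richer and more accurate text instructions from the image content and its difference from the target image.
116 |
117 | ### 3. Training strategy
118 | ACE uses joint training: generation and editing tasks are trained together in one unified framework. The key elements are:
119 |
120 | - **Multimodal input handling**:
121 | - The inputs include text instructions, images, and their masks. The model has to learn how to combine this multimodal information to generate images that follow the instruction.
122 |
123 | - **Loss design**:
124 | - ACE uses several loss terms to optimize the generated results, typically including:
125 | - **Generation loss**: measures the difference between the generated image and the target image.
126 | - **Text consistency loss**: measures how well the generated image matches the text instruction.
127 | - **Image quality loss**: assesses the aesthetics, detail, and visual quality of the generated image.
128 |
129 | - **Multi-task learning**:
130 | - ACE supports many task types (for example, text-guided generation, semantic editing, and repainting). The model must optimize several tasks at once during training, so ACE uses multi-task learning, which lets tasks share knowledge.
131 |
132 | - **Long-context training**:
133 | - For tasks that need multiple rounds or long context, ACE feeds the generation history as extra input to help the model understand the global context. This keeps results consistent and coherent on complex tasks.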
134 |
135 |
--------------------------------------------------------------------------------
/可控性相关/BrushNet.md:
--------------------------------------------------------------------------------
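1 | ## Table of Contents
2 |
3 | - [1. How does BrushNet work?](#BrushNet原理是什么?)
4 | - [2. How are the feature layers fused?](#2.特征层如何融合?)
5 | - [3. Blurred blending strategy](#3.模糊混合策略)
6 | - [Original paper](https://arxiv.org/pdf/2403.06976)
7 |
8 | 1. How does BrushNet work?
9 |
10 | 
11 |
12 | BrushNet proposes a Dual-Branch Diffusion Architecture that explicitly separates the inpainting features from the generation features to achieve high-quality image inpainting.
13 | 1. **Additional branch (inpainting branch)**:
14 | Dedicated to extracting features of the masked region. Its inputs are:
15 | - the noisy latent;
16 | - the masked image latent;
17 | - the downsampled mask.
18 | A VAE encoder extracts the masked-image features, which keeps the feature distribution consistent with the pretrained model.
19 | 2. **Main branch (generation branch)**:
20 | The pretrained diffusion model is frozen, preserving its generative ability on the unmasked region.
21 |
22 | 2. How are the feature layers fused?
23 |
24 | - Zero convolution inserts the features extracted by the inpainting branch, layer by layer, into the feature maps of the main branch.
25 | **Implementation**: similar to ControlNet, a copy of the UNet is used.
26 | - This gives pixel-level control and improves boundary consistency between the masked and unmasked regions.
27 |
28 | 3. Blurred blending strategy
29 |
30 | - The generated result is blended with the unmasked region of the original image using a blurred mask, which further improves the boundary and keeps the result visually consistent.
31 | **Implementation**: a **Gaussian blur** smooths the mask boundary, and the generated and original images are then blended pixel by pixel with the blurred mask as the weight.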
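The blending step can be sketched on a single row of pixels. This is a simplified illustration, not BrushNet's code: it works in 1-D with plain Python lists, assumes the mask is 1 inside the inpainted region and 0 elsewhere, and the kernel radius and sigma are arbitrary choices.

```python
import math
from typing import List, Sequence


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out


def blend(generated: Sequence[float], original: Sequence[float],
          mask: Sequence[float], radius: int = 1, sigma: float = 1.0) -> List[float]:
    """Per-pixel weighted mix: soft_mask * generated + (1 - soft_mask) * original."""
    soft = blur(mask, gaussian_kernel(radius, sigma))
    return [m * g + (1.0 - m) * o for g, o, m in zip(generated, original, soft)]


generated = [1.0] * 5
original = [0.0] * 5
mask = [0.0, 0.0, 1.0, 1.0, 1.0]  # last three pixels were inpainted
blended = blend(generated, original, mask)
```

Far from the mask edge the output equals the original or the generated pixel; near the edge the blurred mask produces a gradual transition instead of a hard seam.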
--------------------------------------------------------------------------------
/可控性相关/Controlnet.md:
--------------------------------------------------------------------------------
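1 | ## Table of Contents
2 |
3 | - [1. How does ControlNet work?](#1.ControlNet原理是什么?)
4 | - [2. What is zero convolution?](#2.什么是零卷积?)
5 | - [3. What role does zero convolution play?](#3.零卷积起什么作用?)
6 | - [4. Which U-Net layers does ControlNet copy?](#4.ControlNet复制了U-Net的哪些层?)
7 | - [5. Why does ControlNet copy only the U-Net encoder layers?](#5.为什么controlnet只复制了unet的编码层?)
8 | - [Original paper](https://arxiv.org/pdf/2302.05543)
9 |
10 |
11 | 1. How does ControlNet work?
12 |
13 | 
14 | 1. **Conditional control**:
15 | ControlNet uses input conditions (edge maps, depth maps, pose information, etc.) to steer a generative model (such as Stable Diffusion) toward images that satisfy specific constraints.
16 |
17 | 2. **Dual-network structure**:
18 | - **Backbone (pre-trained model)**:
19 | A pretrained diffusion model (such as Stable Diffusion) serves as the core image generator.
20 | - **Auxiliary network (ControlNet)**:
21 | A conditional network (ControlNet) is added. Its structure mirrors the backbone, but it takes an extra control input. ControlNet adds conditional features at certain layers so that the backbone can refer to them during generation.
22 |
23 | 3. **Weight reuse**:
24 | - ControlNet starts from the pretrained diffusion model's weights and adds new parameters on top for adaptation. This design introduces control without breaking the original model's generative ability.
25 |
26 | 4. **Frozen weights**:
27 | - The pretrained model's weights are normally frozen; ControlNet steers generation by updating only its own parameters.
28 |
29 | 2. What is zero convolution?
30 |
31 | 
32 | #### Key properties of zero convolution
33 |
34 | 1. **No effect at initialization**:
35 | - The kernel weights and the bias both start at zero, so the layer outputs zero at initialization. Since that output is added to the backbone features, the backbone's data passes through unchanged.
36 |
37 | 2. **Learnable parameters**:
38 | - During training the layer learns suitable weights, gradually letting conditional information through and making the model responsive to the conditional input.
39 |
40 | 3. **Lightweight design**:
41 | - With the zero-convolution branch added, the combined model initially behaves as an identity mapping with respect to the pretrained model. This protects the pretrained model's function while keeping the layer free to learn.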
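The initialization behavior can be verified with a toy model. The sketch below is an assumption-laden illustration rather than ControlNet's implementation: a 1x1 convolution is modeled as a matrix acting on one pixel's channel vector, `double` stands in for both the frozen block and its trainable copy, and the "trained" weights are set by hand. The wiring follows the form y = F(x) + Z_out(F_copy(x + Z_in(c))).

```python
from typing import Callable, List, Sequence

Vector = List[float]


class ZeroConv1x1:
    """1x1 convolution on one pixel's channel vector; weight and bias start at zero."""

    def __init__(self, channels: int) -> None:
        self.weight = [[0.0] * channels for _ in range(channels)]
        self.bias = [0.0] * channels

    def __call__(self, x: Sequence[float]) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def controlled_block(frozen: Callable[[Sequence[float]], Vector],
                     trainable_copy: Callable[[Sequence[float]], Vector],
                     zero_in: ZeroConv1x1, zero_out: ZeroConv1x1,
                     x: Sequence[float], cond: Sequence[float]) -> Vector:
    """y = F(x) + Z_out(F_copy(x + Z_in(cond)))."""
    hidden = [a + b for a, b in zip(x, zero_in(cond))]
    residual = zero_out(trainable_copy(hidden))
    return [a + b for a, b in zip(frozen(x), residual)]


def double(x: Sequence[float]) -> Vector:
    """Toy stand-in for a network block."""
    return [2.0 * v for v in x]


x, cond = [1.0, 2.0], [5.0, -3.0]
z_in, z_out = ZeroConv1x1(2), ZeroConv1x1(2)
y_init = controlled_block(double, double, z_in, z_out, x, cond)  # same as double(x)

# Pretend training has moved the weights away from zero (hand-picked values).
z_in.weight = [[1.0, 0.0], [0.0, 1.0]]
z_out.weight = [[0.5, 0.0], [0.0, 0.5]]
y_trained = controlled_block(double, double, z_in, z_out, x, cond)  # [8.0, 3.0]
```

Before training the condition has no influence on the output; once the zero-convolution weights become non-zero, the condition starts to shift it.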
42 |
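43 | 3. What role does zero convolution play?
44 |
45 | In ControlNet, zero convolution is typically applied to the **conditional input features** to achieve the following:
46 |
47 | 1. **Lossless injection of conditional features into the backbone**:
48 | - At initialization the zero convolution outputs zero, so the conditional branch contributes nothing and does not disturb the backbone.
49 | - As training proceeds, the zero convolution learns how to merge the conditional features into the generation pipeline, so the backbone is gradually guided by the condition.
50 |
51 | 2. **Stable gradient flow**:
52 | - Because the layer does not perturb the features at initialization, gradient flow stays stable and the pretrained model is not damaged.
53 |
54 | 3. **Flexible condition fusion**:
55 | - Zero convolutions at different layers let the influence of the conditional features be adjusted per layer, which strengthens conditional control.
56 |
57 | 4. Which U-Net layers does ControlNet copy?
58 |
59 | It copies the encoder part of the U-Net; the outputs of the copied blocks are fed into the decoder through the skip connections.
60 | The copied layers comprise 12 encoder blocks and 1 middle block, covering the main encoding part of Stable Diffusion.
61 |
62 | 5. Why does ControlNet copy only the U-Net encoder layers?
63 |
64 | ControlNet copies only the UNet encoder layers, not the decoder layers, in order to:
65 | 1. **Extract conditional features efficiently**: the encoder is the key feature-extraction part and is a natural place to introduce conditional control.
66 | 2. **Leave image reconstruction undisturbed**: the decoder focuses on reconstructing the image and needs no extra conditional branch.
67 | 3. **Reduce compute cost**: fewer parameters are copied, which keeps the model lightweight.
68 | 4. **Preserve the backbone's generative ability**: the impact on the original generation pipeline is kept to a minimum.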
--------------------------------------------------------------------------------
/可控性相关/IC-Light.md:
--------------------------------------------------------------------------------
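1 | ## Table of Contents
2 |
3 | - [1. How does IC-Light work?](#IC-Light原理是什么?)
4 | - [2. Training method](#2.训练方法)
5 | - [Original paper](https://arxiv.org/pdf/2312.04461)
6 |
7 | 1. How does IC-Light work?
8 |
9 | 
10 |
11 | **IC-Light** is built on the physical principle of **light transport consistency**:
12 |
13 | 1. **Linear blending principle**:
14 | - The linear blend of an object's appearances under different lighting conditions should match its appearance under the blended lighting.
15 | - In math:
16 | \[
17 | I_{L1+L2} = I_{L1} + I_{L2}
18 | \]
19 | where \( I_{L1} \) and \( I_{L2} \) are the object's appearances under lighting conditions \( L1 \) and \( L2 \).
20 |
21 | 2. **Consistency constraint**:
22 | - During diffusion model training, a loss term constrains the model output so that only the lighting changes while intrinsic properties (material, color, etc.) stay the same.
23 | - Loss form:
24 | \[
25 | L_{\text{consistency}} = \| I_{L1+L2} - (I_{L1} + I_{L2}) \|_2^2
26 | \]
27 |
28 | 3. **Data processing and augmentation**:
29 | - Real scenes, rendered data, and synthetic lighting data are combined into a diverse training set, which improves the model's handling of complex lighting.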
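The consistency loss above is a plain squared L2 distance and can be sketched directly. The example is illustrative only: images are flattened to short lists of linear-intensity values, and `consistency_loss` is a hypothetical helper name.

```python
from typing import Sequence


def consistency_loss(appearance_merged: Sequence[float],
                     appearance_l1: Sequence[float],
                     appearance_l2: Sequence[float]) -> float:
    """Squared L2 distance between I_{L1+L2} and I_{L1} + I_{L2}."""
    return sum((m - (a + b)) ** 2
               for m, a, b in zip(appearance_merged, appearance_l1, appearance_l2))


i_l1 = [0.25, 0.5]    # appearance under light L1
i_l2 = [0.125, 0.25]  # appearance under light L2
consistent = consistency_loss([0.375, 0.75], i_l1, i_l2)  # merged appearance is the exact sum
violated = consistency_loss([0.375, 1.0], i_l1, i_l2)     # second pixel is too bright
```

A physically consistent prediction gives zero loss, and any deviation from additive light transport is penalized quadratically.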
30 |
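31 | 2. Training method
32 |
33 | 1. **Data preprocessing**:
34 | - Extract the albedo, background information, and lighting environment map of the input image.
35 | - Apply several data augmentation techniques to create training samples with random lighting changes.
36 |
37 | 2. **Loss design**:
38 | - **Vanilla loss**: guides the model to produce the basic relighting result.
39 | - **Consistency loss**: enforces the light consistency constraint so that lighting edits are accurate.
40 |
41 | 3. **Model architecture and optimization**:
42 | - Built on diffusion models (such as Stable Diffusion and Flux), with the VAE encoder frozen and training focused on lighting behavior.
43 | - A multilayer perceptron (MLP) handles the linear blending of lighting and increases the model's expressive power.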
--------------------------------------------------------------------------------
/可控性相关/images/ACE++介绍.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/ACE++介绍.png
--------------------------------------------------------------------------------
/可控性相关/images/ACE++框架.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/ACE++框架.png
--------------------------------------------------------------------------------
/可控性相关/images/ACE架构.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/ACE架构.png
--------------------------------------------------------------------------------
/可控性相关/images/ACE概览.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/ACE概览.png
--------------------------------------------------------------------------------
/可控性相关/images/BrushNet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/BrushNet.png
--------------------------------------------------------------------------------
/可控性相关/images/IC-Light.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/IC-Light.png
--------------------------------------------------------------------------------
/可控性相关/images/controlnet结构图.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/controlnet结构图.png
--------------------------------------------------------------------------------
/可控性相关/images/零卷积.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/可控性相关/images/零卷积.png
--------------------------------------------------------------------------------
/风格迁移相关/IP-Adapter.md:
--------------------------------------------------------------------------------
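1 | ## Table of Contents
2 |
3 | - [1. How does IP-Adapter work?](#1.IPAdapter原理是什么?)
4 | - [2. Can you describe the decoupled cross-attention mechanism?](#2.能介绍下解耦的交叉注意力机制吗?)
5 | - [3. How does IP-Adapter process image input?](#3.IP-Adapter是如何处理图像输入的?)
6 | - [4. What changes in the IP-Adapter FaceID model?](#4.IP-AdapterFaceID模型有何变化?)
7 | - [Original paper](https://arxiv.org/pdf/2308.06721)
8 |
9 |
10 | 1. How does IP-Adapter work?
11 |
12 | 
13 | IP-Adapter is a **lightweight image prompt adapter**. Its core is **decoupled cross-attention**, which processes image features and text features separately and injects both into the diffusion model. The design keeps the pretrained model's text-driven generation ability while adding support for image prompts and multimodal generation. By freezing the pretrained model and training only the added attention modules, IP-Adapter achieves efficient training, broad compatibility, and flexible multi-task use, which suits multimodal image generation and control tasks.
14 |
15 | 2. Can you describe the decoupled cross-attention mechanism?
16 |
17 | Decoupled cross-attention is one of IP-Adapter's core innovations. Text and image features are processed separately, each by its own cross-attention layer: text features go through the original cross-attention layer, image features go through a newly added cross-attention layer, and the two outputs are then combined. This avoids interference between the two kinds of features, lets image features enter the model at a finer granularity, and preserves text-driven generation.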
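The mechanism can be sketched for a single query vector. This is a minimal sketch under assumptions: keys and values are given directly as toy lists (a real layer would compute them with learned projections), the two branches are combined by a weighted sum, and the parameter name `image_scale` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
KeyValues = Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]]]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Sequence[float], keys: Sequence[Sequence[float]],
              values: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention for one query."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))]


def decoupled_cross_attention(query: Sequence[float], text_kv: KeyValues,
                              image_kv: KeyValues, image_scale: float = 1.0) -> Vector:
    """Shared query, separate text and image branches, outputs added together."""
    text_out = attention(query, text_kv[0], text_kv[1])     # original cross-attention
    image_out = attention(query, image_kv[0], image_kv[1])  # newly added cross-attention
    return [t + image_scale * i for t, i in zip(text_out, image_out)]


query = [1.0, 0.0]
text_kv = ([[1.0, 0.0]], [[1.0, 2.0]])     # one text token
image_kv = ([[0.0, 1.0]], [[10.0, 20.0]])  # one image token
text_only = decoupled_cross_attention(query, text_kv, image_kv, image_scale=0.0)
mixed = decoupled_cross_attention(query, text_kv, image_kv, image_scale=0.5)
```

Setting `image_scale` to 0 recovers the text-only output, which shows how a separate image branch can be added without disturbing the text branch.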
18 |
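19 | 3. How does IP-Adapter process image features?
20 |
21 | A CLIP image encoder extracts the image features. A lightweight projection network (a Linear layer followed by LayerNorm) then maps them to the feature dimension of the pretrained model.
22 |
23 | 4. What changes in the IP-Adapter FaceID model?
24 |
25 | On top of IP-Adapter, the IP-Adapter FaceID model uses an ID encoder to extract face features, combining a face ID embedding (for identity) with a CLIP image embedding (for facial structure).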
26 |
--------------------------------------------------------------------------------
/风格迁移相关/images/IP-Adapter模型架构图.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huan085128/AIGC-AlgoNotes/44029432592f56e9a549b0d1c087b05a67262bc6/风格迁移相关/images/IP-Adapter模型架构图.png
--------------------------------------------------------------------------------