├── .gitignore
├── LICENSE
├── Readme.md
├── cinemaot
│   ├── __init__.py
│   ├── benchmark.py
│   ├── cinemaot.py
│   ├── sinkhorn_knopp.py
│   └── utils.py
├── cinemaot_tutorial.ipynb
├── pyproject.toml
├── setup.cfg
└── simulation.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | __pycache__
3 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU AFFERO GENERAL PUBLIC LICENSE
2 | Version 3, 19 November 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU Affero General Public License is a free, copyleft license for
11 | software and other kinds of works, specifically designed to ensure
12 | cooperation with the community in the case of network server software.
13 |
14 | The licenses for most software and other practical works are designed
15 | to take away your freedom to share and change the works. By contrast,
16 | our General Public Licenses are intended to guarantee your freedom to
17 | share and change all versions of a program--to make sure it remains free
18 | software for all its users.
19 |
20 | When we speak of free software, we are referring to freedom, not
21 | price. Our General Public Licenses are designed to make sure that you
22 | have the freedom to distribute copies of free software (and charge for
23 | them if you wish), that you receive source code or can get it if you
24 | want it, that you can change the software or use pieces of it in new
25 | free programs, and that you know you can do these things.
26 |
27 | Developers that use our General Public Licenses protect your rights
28 | with two steps: (1) assert copyright on the software, and (2) offer
29 | you this License which gives you legal permission to copy, distribute
30 | and/or modify the software.
31 |
32 | A secondary benefit of defending all users' freedom is that
33 | improvements made in alternate versions of the program, if they
34 | receive widespread use, become available for other developers to
35 | incorporate. Many developers of free software are heartened and
36 | encouraged by the resulting cooperation. However, in the case of
37 | software used on network servers, this result may fail to come about.
38 | The GNU General Public License permits making a modified version and
39 | letting the public access it on a server without ever releasing its
40 | source code to the public.
41 |
42 | The GNU Affero General Public License is designed specifically to
43 | ensure that, in such cases, the modified source code becomes available
44 | to the community. It requires the operator of a network server to
45 | provide the source code of the modified version running there to the
46 | users of that server. Therefore, public use of a modified version, on
47 | a publicly accessible server, gives the public access to the source
48 | code of the modified version.
49 |
50 | An older license, called the Affero General Public License and
51 | published by Affero, was designed to accomplish similar goals. This is
52 | a different license, not a version of the Affero GPL, but Affero has
53 | released a new version of the Affero GPL which permits relicensing under
54 | this license.
55 |
56 | The precise terms and conditions for copying, distribution and
57 | modification follow.
58 |
59 | TERMS AND CONDITIONS
60 |
61 | 0. Definitions.
62 |
63 | "This License" refers to version 3 of the GNU Affero General Public License.
64 |
65 | "Copyright" also means copyright-like laws that apply to other kinds of
66 | works, such as semiconductor masks.
67 |
68 | "The Program" refers to any copyrightable work licensed under this
69 | License. Each licensee is addressed as "you". "Licensees" and
70 | "recipients" may be individuals or organizations.
71 |
72 | To "modify" a work means to copy from or adapt all or part of the work
73 | in a fashion requiring copyright permission, other than the making of an
74 | exact copy. The resulting work is called a "modified version" of the
75 | earlier work or a work "based on" the earlier work.
76 |
77 | A "covered work" means either the unmodified Program or a work based
78 | on the Program.
79 |
80 | To "propagate" a work means to do anything with it that, without
81 | permission, would make you directly or secondarily liable for
82 | infringement under applicable copyright law, except executing it on a
83 | computer or modifying a private copy. Propagation includes copying,
84 | distribution (with or without modification), making available to the
85 | public, and in some countries other activities as well.
86 |
87 | To "convey" a work means any kind of propagation that enables other
88 | parties to make or receive copies. Mere interaction with a user through
89 | a computer network, with no transfer of a copy, is not conveying.
90 |
91 | An interactive user interface displays "Appropriate Legal Notices"
92 | to the extent that it includes a convenient and prominently visible
93 | feature that (1) displays an appropriate copyright notice, and (2)
94 | tells the user that there is no warranty for the work (except to the
95 | extent that warranties are provided), that licensees may convey the
96 | work under this License, and how to view a copy of this License. If
97 | the interface presents a list of user commands or options, such as a
98 | menu, a prominent item in the list meets this criterion.
99 |
100 | 1. Source Code.
101 |
102 | The "source code" for a work means the preferred form of the work
103 | for making modifications to it. "Object code" means any non-source
104 | form of a work.
105 |
106 | A "Standard Interface" means an interface that either is an official
107 | standard defined by a recognized standards body, or, in the case of
108 | interfaces specified for a particular programming language, one that
109 | is widely used among developers working in that language.
110 |
111 | The "System Libraries" of an executable work include anything, other
112 | than the work as a whole, that (a) is included in the normal form of
113 | packaging a Major Component, but which is not part of that Major
114 | Component, and (b) serves only to enable use of the work with that
115 | Major Component, or to implement a Standard Interface for which an
116 | implementation is available to the public in source code form. A
117 | "Major Component", in this context, means a major essential component
118 | (kernel, window system, and so on) of the specific operating system
119 | (if any) on which the executable work runs, or a compiler used to
120 | produce the work, or an object code interpreter used to run it.
121 |
122 | The "Corresponding Source" for a work in object code form means all
123 | the source code needed to generate, install, and (for an executable
124 | work) run the object code and to modify the work, including scripts to
125 | control those activities. However, it does not include the work's
126 | System Libraries, or general-purpose tools or generally available free
127 | programs which are used unmodified in performing those activities but
128 | which are not part of the work. For example, Corresponding Source
129 | includes interface definition files associated with source files for
130 | the work, and the source code for shared libraries and dynamically
131 | linked subprograms that the work is specifically designed to require,
132 | such as by intimate data communication or control flow between those
133 | subprograms and other parts of the work.
134 |
135 | The Corresponding Source need not include anything that users
136 | can regenerate automatically from other parts of the Corresponding
137 | Source.
138 |
139 | The Corresponding Source for a work in source code form is that
140 | same work.
141 |
142 | 2. Basic Permissions.
143 |
144 | All rights granted under this License are granted for the term of
145 | copyright on the Program, and are irrevocable provided the stated
146 | conditions are met. This License explicitly affirms your unlimited
147 | permission to run the unmodified Program. The output from running a
148 | covered work is covered by this License only if the output, given its
149 | content, constitutes a covered work. This License acknowledges your
150 | rights of fair use or other equivalent, as provided by copyright law.
151 |
152 | You may make, run and propagate covered works that you do not
153 | convey, without conditions so long as your license otherwise remains
154 | in force. You may convey covered works to others for the sole purpose
155 | of having them make modifications exclusively for you, or provide you
156 | with facilities for running those works, provided that you comply with
157 | the terms of this License in conveying all material for which you do
158 | not control copyright. Those thus making or running the covered works
159 | for you must do so exclusively on your behalf, under your direction
160 | and control, on terms that prohibit them from making any copies of
161 | your copyrighted material outside their relationship with you.
162 |
163 | Conveying under any other circumstances is permitted solely under
164 | the conditions stated below. Sublicensing is not allowed; section 10
165 | makes it unnecessary.
166 |
167 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
168 |
169 | No covered work shall be deemed part of an effective technological
170 | measure under any applicable law fulfilling obligations under article
171 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
172 | similar laws prohibiting or restricting circumvention of such
173 | measures.
174 |
175 | When you convey a covered work, you waive any legal power to forbid
176 | circumvention of technological measures to the extent such circumvention
177 | is effected by exercising rights under this License with respect to
178 | the covered work, and you disclaim any intention to limit operation or
179 | modification of the work as a means of enforcing, against the work's
180 | users, your or third parties' legal rights to forbid circumvention of
181 | technological measures.
182 |
183 | 4. Conveying Verbatim Copies.
184 |
185 | You may convey verbatim copies of the Program's source code as you
186 | receive it, in any medium, provided that you conspicuously and
187 | appropriately publish on each copy an appropriate copyright notice;
188 | keep intact all notices stating that this License and any
189 | non-permissive terms added in accord with section 7 apply to the code;
190 | keep intact all notices of the absence of any warranty; and give all
191 | recipients a copy of this License along with the Program.
192 |
193 | You may charge any price or no price for each copy that you convey,
194 | and you may offer support or warranty protection for a fee.
195 |
196 | 5. Conveying Modified Source Versions.
197 |
198 | You may convey a work based on the Program, or the modifications to
199 | produce it from the Program, in the form of source code under the
200 | terms of section 4, provided that you also meet all of these conditions:
201 |
202 | a) The work must carry prominent notices stating that you modified
203 | it, and giving a relevant date.
204 |
205 | b) The work must carry prominent notices stating that it is
206 | released under this License and any conditions added under section
207 | 7. This requirement modifies the requirement in section 4 to
208 | "keep intact all notices".
209 |
210 | c) You must license the entire work, as a whole, under this
211 | License to anyone who comes into possession of a copy. This
212 | License will therefore apply, along with any applicable section 7
213 | additional terms, to the whole of the work, and all its parts,
214 | regardless of how they are packaged. This License gives no
215 | permission to license the work in any other way, but it does not
216 | invalidate such permission if you have separately received it.
217 |
218 | d) If the work has interactive user interfaces, each must display
219 | Appropriate Legal Notices; however, if the Program has interactive
220 | interfaces that do not display Appropriate Legal Notices, your
221 | work need not make them do so.
222 |
223 | A compilation of a covered work with other separate and independent
224 | works, which are not by their nature extensions of the covered work,
225 | and which are not combined with it such as to form a larger program,
226 | in or on a volume of a storage or distribution medium, is called an
227 | "aggregate" if the compilation and its resulting copyright are not
228 | used to limit the access or legal rights of the compilation's users
229 | beyond what the individual works permit. Inclusion of a covered work
230 | in an aggregate does not cause this License to apply to the other
231 | parts of the aggregate.
232 |
233 | 6. Conveying Non-Source Forms.
234 |
235 | You may convey a covered work in object code form under the terms
236 | of sections 4 and 5, provided that you also convey the
237 | machine-readable Corresponding Source under the terms of this License,
238 | in one of these ways:
239 |
240 | a) Convey the object code in, or embodied in, a physical product
241 | (including a physical distribution medium), accompanied by the
242 | Corresponding Source fixed on a durable physical medium
243 | customarily used for software interchange.
244 |
245 | b) Convey the object code in, or embodied in, a physical product
246 | (including a physical distribution medium), accompanied by a
247 | written offer, valid for at least three years and valid for as
248 | long as you offer spare parts or customer support for that product
249 | model, to give anyone who possesses the object code either (1) a
250 | copy of the Corresponding Source for all the software in the
251 | product that is covered by this License, on a durable physical
252 | medium customarily used for software interchange, for a price no
253 | more than your reasonable cost of physically performing this
254 | conveying of source, or (2) access to copy the
255 | Corresponding Source from a network server at no charge.
256 |
257 | c) Convey individual copies of the object code with a copy of the
258 | written offer to provide the Corresponding Source. This
259 | alternative is allowed only occasionally and noncommercially, and
260 | only if you received the object code with such an offer, in accord
261 | with subsection 6b.
262 |
263 | d) Convey the object code by offering access from a designated
264 | place (gratis or for a charge), and offer equivalent access to the
265 | Corresponding Source in the same way through the same place at no
266 | further charge. You need not require recipients to copy the
267 | Corresponding Source along with the object code. If the place to
268 | copy the object code is a network server, the Corresponding Source
269 | may be on a different server (operated by you or a third party)
270 | that supports equivalent copying facilities, provided you maintain
271 | clear directions next to the object code saying where to find the
272 | Corresponding Source. Regardless of what server hosts the
273 | Corresponding Source, you remain obligated to ensure that it is
274 | available for as long as needed to satisfy these requirements.
275 |
276 | e) Convey the object code using peer-to-peer transmission, provided
277 | you inform other peers where the object code and Corresponding
278 | Source of the work are being offered to the general public at no
279 | charge under subsection 6d.
280 |
281 | A separable portion of the object code, whose source code is excluded
282 | from the Corresponding Source as a System Library, need not be
283 | included in conveying the object code work.
284 |
285 | A "User Product" is either (1) a "consumer product", which means any
286 | tangible personal property which is normally used for personal, family,
287 | or household purposes, or (2) anything designed or sold for incorporation
288 | into a dwelling. In determining whether a product is a consumer product,
289 | doubtful cases shall be resolved in favor of coverage. For a particular
290 | product received by a particular user, "normally used" refers to a
291 | typical or common use of that class of product, regardless of the status
292 | of the particular user or of the way in which the particular user
293 | actually uses, or expects or is expected to use, the product. A product
294 | is a consumer product regardless of whether the product has substantial
295 | commercial, industrial or non-consumer uses, unless such uses represent
296 | the only significant mode of use of the product.
297 |
298 | "Installation Information" for a User Product means any methods,
299 | procedures, authorization keys, or other information required to install
300 | and execute modified versions of a covered work in that User Product from
301 | a modified version of its Corresponding Source. The information must
302 | suffice to ensure that the continued functioning of the modified object
303 | code is in no case prevented or interfered with solely because
304 | modification has been made.
305 |
306 | If you convey an object code work under this section in, or with, or
307 | specifically for use in, a User Product, and the conveying occurs as
308 | part of a transaction in which the right of possession and use of the
309 | User Product is transferred to the recipient in perpetuity or for a
310 | fixed term (regardless of how the transaction is characterized), the
311 | Corresponding Source conveyed under this section must be accompanied
312 | by the Installation Information. But this requirement does not apply
313 | if neither you nor any third party retains the ability to install
314 | modified object code on the User Product (for example, the work has
315 | been installed in ROM).
316 |
317 | The requirement to provide Installation Information does not include a
318 | requirement to continue to provide support service, warranty, or updates
319 | for a work that has been modified or installed by the recipient, or for
320 | the User Product in which it has been modified or installed. Access to a
321 | network may be denied when the modification itself materially and
322 | adversely affects the operation of the network or violates the rules and
323 | protocols for communication across the network.
324 |
325 | Corresponding Source conveyed, and Installation Information provided,
326 | in accord with this section must be in a format that is publicly
327 | documented (and with an implementation available to the public in
328 | source code form), and must require no special password or key for
329 | unpacking, reading or copying.
330 |
331 | 7. Additional Terms.
332 |
333 | "Additional permissions" are terms that supplement the terms of this
334 | License by making exceptions from one or more of its conditions.
335 | Additional permissions that are applicable to the entire Program shall
336 | be treated as though they were included in this License, to the extent
337 | that they are valid under applicable law. If additional permissions
338 | apply only to part of the Program, that part may be used separately
339 | under those permissions, but the entire Program remains governed by
340 | this License without regard to the additional permissions.
341 |
342 | When you convey a copy of a covered work, you may at your option
343 | remove any additional permissions from that copy, or from any part of
344 | it. (Additional permissions may be written to require their own
345 | removal in certain cases when you modify the work.) You may place
346 | additional permissions on material, added by you to a covered work,
347 | for which you have or can give appropriate copyright permission.
348 |
349 | Notwithstanding any other provision of this License, for material you
350 | add to a covered work, you may (if authorized by the copyright holders of
351 | that material) supplement the terms of this License with terms:
352 |
353 | a) Disclaiming warranty or limiting liability differently from the
354 | terms of sections 15 and 16 of this License; or
355 |
356 | b) Requiring preservation of specified reasonable legal notices or
357 | author attributions in that material or in the Appropriate Legal
358 | Notices displayed by works containing it; or
359 |
360 | c) Prohibiting misrepresentation of the origin of that material, or
361 | requiring that modified versions of such material be marked in
362 | reasonable ways as different from the original version; or
363 |
364 | d) Limiting the use for publicity purposes of names of licensors or
365 | authors of the material; or
366 |
367 | e) Declining to grant rights under trademark law for use of some
368 | trade names, trademarks, or service marks; or
369 |
370 | f) Requiring indemnification of licensors and authors of that
371 | material by anyone who conveys the material (or modified versions of
372 | it) with contractual assumptions of liability to the recipient, for
373 | any liability that these contractual assumptions directly impose on
374 | those licensors and authors.
375 |
376 | All other non-permissive additional terms are considered "further
377 | restrictions" within the meaning of section 10. If the Program as you
378 | received it, or any part of it, contains a notice stating that it is
379 | governed by this License along with a term that is a further
380 | restriction, you may remove that term. If a license document contains
381 | a further restriction but permits relicensing or conveying under this
382 | License, you may add to a covered work material governed by the terms
383 | of that license document, provided that the further restriction does
384 | not survive such relicensing or conveying.
385 |
386 | If you add terms to a covered work in accord with this section, you
387 | must place, in the relevant source files, a statement of the
388 | additional terms that apply to those files, or a notice indicating
389 | where to find the applicable terms.
390 |
391 | Additional terms, permissive or non-permissive, may be stated in the
392 | form of a separately written license, or stated as exceptions;
393 | the above requirements apply either way.
394 |
395 | 8. Termination.
396 |
397 | You may not propagate or modify a covered work except as expressly
398 | provided under this License. Any attempt otherwise to propagate or
399 | modify it is void, and will automatically terminate your rights under
400 | this License (including any patent licenses granted under the third
401 | paragraph of section 11).
402 |
403 | However, if you cease all violation of this License, then your
404 | license from a particular copyright holder is reinstated (a)
405 | provisionally, unless and until the copyright holder explicitly and
406 | finally terminates your license, and (b) permanently, if the copyright
407 | holder fails to notify you of the violation by some reasonable means
408 | prior to 60 days after the cessation.
409 |
410 | Moreover, your license from a particular copyright holder is
411 | reinstated permanently if the copyright holder notifies you of the
412 | violation by some reasonable means, this is the first time you have
413 | received notice of violation of this License (for any work) from that
414 | copyright holder, and you cure the violation prior to 30 days after
415 | your receipt of the notice.
416 |
417 | Termination of your rights under this section does not terminate the
418 | licenses of parties who have received copies or rights from you under
419 | this License. If your rights have been terminated and not permanently
420 | reinstated, you do not qualify to receive new licenses for the same
421 | material under section 10.
422 |
423 | 9. Acceptance Not Required for Having Copies.
424 |
425 | You are not required to accept this License in order to receive or
426 | run a copy of the Program. Ancillary propagation of a covered work
427 | occurring solely as a consequence of using peer-to-peer transmission
428 | to receive a copy likewise does not require acceptance. However,
429 | nothing other than this License grants you permission to propagate or
430 | modify any covered work. These actions infringe copyright if you do
431 | not accept this License. Therefore, by modifying or propagating a
432 | covered work, you indicate your acceptance of this License to do so.
433 |
434 | 10. Automatic Licensing of Downstream Recipients.
435 |
436 | Each time you convey a covered work, the recipient automatically
437 | receives a license from the original licensors, to run, modify and
438 | propagate that work, subject to this License. You are not responsible
439 | for enforcing compliance by third parties with this License.
440 |
441 | An "entity transaction" is a transaction transferring control of an
442 | organization, or substantially all assets of one, or subdividing an
443 | organization, or merging organizations. If propagation of a covered
444 | work results from an entity transaction, each party to that
445 | transaction who receives a copy of the work also receives whatever
446 | licenses to the work the party's predecessor in interest had or could
447 | give under the previous paragraph, plus a right to possession of the
448 | Corresponding Source of the work from the predecessor in interest, if
449 | the predecessor has it or can get it with reasonable efforts.
450 |
451 | You may not impose any further restrictions on the exercise of the
452 | rights granted or affirmed under this License. For example, you may
453 | not impose a license fee, royalty, or other charge for exercise of
454 | rights granted under this License, and you may not initiate litigation
455 | (including a cross-claim or counterclaim in a lawsuit) alleging that
456 | any patent claim is infringed by making, using, selling, offering for
457 | sale, or importing the Program or any portion of it.
458 |
459 | 11. Patents.
460 |
461 | A "contributor" is a copyright holder who authorizes use under this
462 | License of the Program or a work on which the Program is based. The
463 | work thus licensed is called the contributor's "contributor version".
464 |
465 | A contributor's "essential patent claims" are all patent claims
466 | owned or controlled by the contributor, whether already acquired or
467 | hereafter acquired, that would be infringed by some manner, permitted
468 | by this License, of making, using, or selling its contributor version,
469 | but do not include claims that would be infringed only as a
470 | consequence of further modification of the contributor version. For
471 | purposes of this definition, "control" includes the right to grant
472 | patent sublicenses in a manner consistent with the requirements of
473 | this License.
474 |
475 | Each contributor grants you a non-exclusive, worldwide, royalty-free
476 | patent license under the contributor's essential patent claims, to
477 | make, use, sell, offer for sale, import and otherwise run, modify and
478 | propagate the contents of its contributor version.
479 |
480 | In the following three paragraphs, a "patent license" is any express
481 | agreement or commitment, however denominated, not to enforce a patent
482 | (such as an express permission to practice a patent or covenant not to
483 | sue for patent infringement). To "grant" such a patent license to a
484 | party means to make such an agreement or commitment not to enforce a
485 | patent against the party.
486 |
487 | If you convey a covered work, knowingly relying on a patent license,
488 | and the Corresponding Source of the work is not available for anyone
489 | to copy, free of charge and under the terms of this License, through a
490 | publicly available network server or other readily accessible means,
491 | then you must either (1) cause the Corresponding Source to be so
492 | available, or (2) arrange to deprive yourself of the benefit of the
493 | patent license for this particular work, or (3) arrange, in a manner
494 | consistent with the requirements of this License, to extend the patent
495 | license to downstream recipients. "Knowingly relying" means you have
496 | actual knowledge that, but for the patent license, your conveying the
497 | covered work in a country, or your recipient's use of the covered work
498 | in a country, would infringe one or more identifiable patents in that
499 | country that you have reason to believe are valid.
500 |
501 | If, pursuant to or in connection with a single transaction or
502 | arrangement, you convey, or propagate by procuring conveyance of, a
503 | covered work, and grant a patent license to some of the parties
504 | receiving the covered work authorizing them to use, propagate, modify
505 | or convey a specific copy of the covered work, then the patent license
506 | you grant is automatically extended to all recipients of the covered
507 | work and works based on it.
508 |
509 | A patent license is "discriminatory" if it does not include within
510 | the scope of its coverage, prohibits the exercise of, or is
511 | conditioned on the non-exercise of one or more of the rights that are
512 | specifically granted under this License. You may not convey a covered
513 | work if you are a party to an arrangement with a third party that is
514 | in the business of distributing software, under which you make payment
515 | to the third party based on the extent of your activity of conveying
516 | the work, and under which the third party grants, to any of the
517 | parties who would receive the covered work from you, a discriminatory
518 | patent license (a) in connection with copies of the covered work
519 | conveyed by you (or copies made from those copies), or (b) primarily
520 | for and in connection with specific products or compilations that
521 | contain the covered work, unless you entered into that arrangement,
522 | or that patent license was granted, prior to 28 March 2007.
523 |
524 | Nothing in this License shall be construed as excluding or limiting
525 | any implied license or other defenses to infringement that may
526 | otherwise be available to you under applicable patent law.
527 |
528 | 12. No Surrender of Others' Freedom.
529 |
530 | If conditions are imposed on you (whether by court order, agreement or
531 | otherwise) that contradict the conditions of this License, they do not
532 | excuse you from the conditions of this License. If you cannot convey a
533 | covered work so as to satisfy simultaneously your obligations under this
534 | License and any other pertinent obligations, then as a consequence you may
535 | not convey it at all. For example, if you agree to terms that obligate you
536 | to collect a royalty for further conveying from those to whom you convey
537 | the Program, the only way you could satisfy both those terms and this
538 | License would be to refrain entirely from conveying the Program.
539 |
540 | 13. Remote Network Interaction; Use with the GNU General Public License.
541 |
542 | Notwithstanding any other provision of this License, if you modify the
543 | Program, your modified version must prominently offer all users
544 | interacting with it remotely through a computer network (if your version
545 | supports such interaction) an opportunity to receive the Corresponding
546 | Source of your version by providing access to the Corresponding Source
547 | from a network server at no charge, through some standard or customary
548 | means of facilitating copying of software. This Corresponding Source
549 | shall include the Corresponding Source for any work covered by version 3
550 | of the GNU General Public License that is incorporated pursuant to the
551 | following paragraph.
552 |
553 | Notwithstanding any other provision of this License, you have
554 | permission to link or combine any covered work with a work licensed
555 | under version 3 of the GNU General Public License into a single
556 | combined work, and to convey the resulting work. The terms of this
557 | License will continue to apply to the part which is the covered work,
558 | but the work with which it is combined will remain governed by version
559 | 3 of the GNU General Public License.
560 |
561 | 14. Revised Versions of this License.
562 |
563 | The Free Software Foundation may publish revised and/or new versions of
564 | the GNU Affero General Public License from time to time. Such new versions
565 | will be similar in spirit to the present version, but may differ in detail to
566 | address new problems or concerns.
567 |
568 | Each version is given a distinguishing version number. If the
569 | Program specifies that a certain numbered version of the GNU Affero General
570 | Public License "or any later version" applies to it, you have the
571 | option of following the terms and conditions either of that numbered
572 | version or of any later version published by the Free Software
573 | Foundation. If the Program does not specify a version number of the
574 | GNU Affero General Public License, you may choose any version ever published
575 | by the Free Software Foundation.
576 |
577 | If the Program specifies that a proxy can decide which future
578 | versions of the GNU Affero General Public License can be used, that proxy's
579 | public statement of acceptance of a version permanently authorizes you
580 | to choose that version for the Program.
581 |
582 | Later license versions may give you additional or different
583 | permissions. However, no additional obligations are imposed on any
584 | author or copyright holder as a result of your choosing to follow a
585 | later version.
586 |
587 | 15. Disclaimer of Warranty.
588 |
589 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
590 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
591 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
592 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
593 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
594 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
595 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
596 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
597 |
598 | 16. Limitation of Liability.
599 |
600 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
601 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
602 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
603 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
604 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
605 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
606 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
607 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
608 | SUCH DAMAGES.
609 |
610 | 17. Interpretation of Sections 15 and 16.
611 |
612 | If the disclaimer of warranty and limitation of liability provided
613 | above cannot be given local legal effect according to their terms,
614 | reviewing courts shall apply local law that most closely approximates
615 | an absolute waiver of all civil liability in connection with the
616 | Program, unless a warranty or assumption of liability accompanies a
617 | copy of the Program in return for a fee.
618 |
619 | END OF TERMS AND CONDITIONS
620 |
621 | How to Apply These Terms to Your New Programs
622 |
623 | If you develop a new program, and you want it to be of the greatest
624 | possible use to the public, the best way to achieve this is to make it
625 | free software which everyone can redistribute and change under these terms.
626 |
627 | To do so, attach the following notices to the program. It is safest
628 | to attach them to the start of each source file to most effectively
629 | state the exclusion of warranty; and each file should have at least
630 | the "copyright" line and a pointer to where the full notice is found.
631 |
632 |
633 | Copyright (C)
634 |
635 | This program is free software: you can redistribute it and/or modify
636 | it under the terms of the GNU Affero General Public License as published by
637 | the Free Software Foundation, either version 3 of the License, or
638 | (at your option) any later version.
639 |
640 | This program is distributed in the hope that it will be useful,
641 | but WITHOUT ANY WARRANTY; without even the implied warranty of
642 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
643 | GNU Affero General Public License for more details.
644 |
645 | You should have received a copy of the GNU Affero General Public License
646 | along with this program. If not, see .
647 |
648 | Also add information on how to contact you by electronic and paper mail.
649 |
650 | If your software can interact with users remotely through a computer
651 | network, you should also make sure that it provides a way for users to
652 | get its source. For example, if your program is a web application, its
653 | interface could display a "Source" link that leads users to an archive
654 | of the code. There are many ways you could offer source, and different
655 | solutions will be better for different programs; see section 13 for the
656 | specific requirements.
657 |
658 | You should also get your employer (if you work as a programmer) or school,
659 | if any, to sign a "copyright disclaimer" for the program, if necessary.
660 | For more information on this, and how to apply and follow the GNU AGPL, see
661 | .
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Causal INdependent Effect Module Attribution + Optimal Transport (CINEMA-OT)
2 |
3 | CINEMA-OT is a **causal** framework for perturbation effect analysis to identify **individual treatment effects** and **synergy** at the **single cell** level.
4 |
5 | **Note**: Newer versions of CINEMA-OT are maintained at [Pertpy](https://github.com/scverse/pertpy).
6 |
7 | ## Architecture
8 |
9 |
10 |
11 |
12 | Read our preprint on bioRxiv:
13 |
14 | - Dong, Mingze, et al. "Causal identification of single-cell experimental perturbation effects with CINEMA-OT". bioRxiv (2022).
15 | [https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3)
16 |
17 | ## System requirements
18 | ### Hardware requirements
19 | `CINEMA-OT` requires only a standard computer with enough RAM to perform in-memory computations.
20 | ### OS requirements
21 | The `CINEMA-OT` package is supported on all operating systems in principle. The package has been tested on the following systems:
22 | * macOS: Monterey (12.4)
23 | * Linux: RHEL Maipo (7.9), Ubuntu (18.04)
24 | ### Dependencies
25 | See `setup.cfg` for details.
26 |
27 | ## Installation
28 | CINEMA-OT requires `python` version 3.7+. Install directly from pip with:
29 |
30 | pip install cinemaot
31 |
32 | The installation should take no more than a few minutes on a normal desktop computer.
33 |
34 |
35 | ## Usage
36 |
37 | For detailed usage, follow our step-by-step tutorial here:
38 |
39 | - [Getting Started with CINEMA-OT](https://github.com/vandijklab/CINEMA-OT/blob/main/cinemaot_tutorial.ipynb)
40 |
41 | Download the data used for the tutorial here:
42 |
43 | - [Ex vivo stimulation of human peripheral blood mononuclear cells (PBMC) with interferon](https://drive.google.com/file/d/1A3rNdgfiXFWhCUOoUfJ-AiY7AAOU0Ie3/view?usp=sharing)
44 |
--------------------------------------------------------------------------------
/cinemaot/__init__.py:
--------------------------------------------------------------------------------
1 | """CINEMA-OT - Causal Independent Effect Module Attribution + Optimal Transport, for single-cell level treatment effect identification"""
2 | __version__ = "0.0.3"
3 | from . import cinemaot
--------------------------------------------------------------------------------
/cinemaot/benchmark.py:
--------------------------------------------------------------------------------
1 | import scib
2 | import numpy as np
3 | import pandas as pd
4 | import scanpy as sc
5 | from sklearn.neighbors import NearestNeighbors
6 | from scipy.sparse import csr_matrix
7 |
8 | # In this newer version we use the Python implementation of xicor
9 | # import rpy2.robjects as ro
10 | # import rpy2.robjects.numpy2ri
11 | # import rpy2.robjects.pandas2ri
12 | # from rpy2.robjects.packages import importr
13 | # rpy2.robjects.numpy2ri.activate()
14 | # rpy2.robjects.pandas2ri.activate()
15 | import scipy.stats as ss  # used by the Xi class below
16 | from scipy.stats import pearsonr, spearmanr
17 | from sklearn.decomposition import FastICA
18 | from sklearn.metrics import roc_curve
19 | from sklearn.metrics import auc
20 | from sklearn.metrics import pairwise_distances
21 | from . import sinkhorn_knopp as skp
22 |
23 | from sklearn.preprocessing import OneHotEncoder
24 | from scipy.stats import ttest_1samp
25 | import harmonypy as hm
26 |
27 | def mixscape(adata,obs_label, ref_label, expr_label, nn=20, return_te = True):
28 | X_pca1 = adata.obsm['X_pca'][adata.obs[obs_label]==expr_label,:]
29 | X_pca2 = adata.obsm['X_pca'][adata.obs[obs_label]==ref_label,:]
30 | nbrs = NearestNeighbors(n_neighbors=nn, algorithm='ball_tree').fit(X_pca1)
31 | mixscape_pca = adata.obsm['X_pca'].copy()
32 | mixscapematrix = nbrs.kneighbors_graph(X_pca2).toarray()
33 |     mixscape_pca[adata.obs[obs_label]==ref_label,:] = np.dot(mixscapematrix, mixscape_pca[adata.obs[obs_label]==expr_label,:])/nn
34 | if return_te:
35 | te2 = adata.X[adata.obs[obs_label]==ref_label,:] - (mixscapematrix/np.sum(mixscapematrix,axis=1)[:,None]) @ (adata.X[adata.obs[obs_label]==expr_label,:])
36 | return mixscape_pca, mixscapematrix, te2
37 | else:
38 | return mixscape_pca, mixscapematrix
39 |
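The matching step above (each control cell replaced by the average of its k nearest treated neighbors) can be illustrated with a self-contained toy sketch; it uses synthetic arrays and scipy's `cKDTree` in place of this module's sklearn `NearestNeighbors`, so all names here are illustrative, not part of the package:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
treated = rng.normal(size=(100, 10))  # toy embedding of treated cells
control = rng.normal(size=(80, 10))   # toy embedding of control cells

k = 5
tree = cKDTree(treated)
_, idx = tree.query(control, k=k)           # k nearest treated cells per control cell
counterfactual = treated[idx].mean(axis=1)  # matched-neighbor average per control cell
te = control - counterfactual               # per-cell treatment-effect estimate, as in mixscape()
```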
40 | def harmony_mixscape(adata,obs_label, ref_label, expr_label,nn=20, return_te = True):
41 | meta_data = adata.obs
42 | data_mat=adata.obsm['X_pca']
43 | vars_use=[obs_label]
44 | ho = hm.run_harmony(data_mat, meta_data,vars_use)
45 | hmdata = ho.Z_corr.T
46 | X_pca1 = hmdata[adata.obs[obs_label]==expr_label,:]
47 | X_pca2 = hmdata[adata.obs[obs_label]==ref_label,:]
48 | nbrs = NearestNeighbors(n_neighbors=nn, algorithm='ball_tree').fit(X_pca1)
49 | hmmatrix = nbrs.kneighbors_graph(X_pca2).toarray()
50 | if return_te:
51 | te2 = adata.X[adata.obs[obs_label]==ref_label,:] - np.matmul(hmmatrix/np.sum(hmmatrix,axis=1)[:,None],adata.X[adata.obs[obs_label]==expr_label,:])
52 | return hmdata, hmmatrix, te2
53 | else:
54 | return hmdata, hmmatrix
55 |
56 | def OT(adata,obs_label, ref_label, expr_label,thres=0.01, return_te = True):
57 | cf1 = adata.obsm['X_pca'][adata.obs[obs_label]==expr_label,0:20]
58 | cf2 = adata.obsm['X_pca'][adata.obs[obs_label]==ref_label,0:20]
59 | r = np.zeros([cf1.shape[0],1])
60 | c = np.zeros([cf2.shape[0],1])
61 | r[:,0] = 1/cf1.shape[0]
62 | c[:,0] = 1/cf2.shape[0]
63 | sk = skp.SinkhornKnopp(setr=r,setc=c,epsilon=1e-2)
64 | dis = pairwise_distances(cf1,cf2)
65 | e = thres * adata.obsm['X_pca'].shape[1]
66 | af = np.exp(-dis * dis / e)
67 | ot = sk.fit(af).T
68 | OT_pca = adata.obsm['X_pca'].copy()
69 | OT_pca[adata.obs[obs_label]==ref_label,:] = np.matmul(ot/np.sum(ot,axis=1)[:,None],OT_pca[adata.obs[obs_label]==expr_label,:])
70 | if return_te:
71 | te2 = adata.X[adata.obs[obs_label]==ref_label,:] - np.matmul(ot/np.sum(ot,axis=1)[:,None],adata.X[adata.obs[obs_label]==expr_label,:])
72 | return OT_pca, ot, te2
73 | else:
74 | return OT_pca, ot
75 |
76 |
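The entropy-regularized transport computed in `OT()` can be reduced to a bare-bones Sinkhorn scaling on synthetic point clouds. This is a sketch only: the uniform marginals mirror the `r` and `c` vectors built above, but the regularization `eps` here is a toy heuristic, not the module's `thres`-based choice:

```python
import numpy as np

def sinkhorn_plan(cost, eps, n_iter=200):
    # Alternately rescale rows and columns of the Gibbs kernel exp(-cost/eps)
    # until the plan's marginals match the uniform weights r and c.
    n, m = cost.shape
    r = np.full(n, 1.0 / n)
    c = np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(1)
a = rng.normal(size=(30, 5))  # toy "treated" embedding
b = rng.normal(size=(40, 5))  # toy "control" embedding
cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
P = sinkhorn_plan(cost, eps=cost.mean())
```

Each row of `P`, normalized to sum to one, plays the same role as the rows of `ot` above: a soft matching used to average treated-cell profiles into counterfactuals.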
77 | def evaluate_cinema(matrix,ite,gt,gite):
78 | #includes four statistics: knn-AUC, treatment effect pearson correlation, treatment effect spearman correlation, ttest AUC
79 | aucdata = np.zeros(gt.shape[0])
80 | corr_ = np.zeros(gt.shape[0])
81 | scorr_ = np.zeros(gt.shape[0])
82 | #genesig = np.zeros(gite.shape[1])
83 | for i in range(gt.shape[0]):
84 | fpr, tpr, thres = roc_curve(gt[i,:],matrix[i,:])
85 | aucdata[i] = auc(fpr,tpr)
86 | for i in range(ite.shape[0]):
87 |         corr_[i], pval = pearsonr(ite[i,:],gite[i,:])
88 |         scorr_[i],pval = spearmanr(ite[i,:],gite[i,:])
91 | return np.median(aucdata), np.median(corr_), np.median(scorr_)
92 |
93 | def evaluate_batch(sig, adata,obs_label, label, continuity,asw=True,silhouette=True,graph_conn=True,pcr=True,nmi=True,ari=True,diff_coefs=False):
94 | #Label is a list!!!
95 | newsig = sc.AnnData(X=sig, obs = adata.obs)
96 | sc.pp.pca(newsig,n_comps=min(15,newsig.X.shape[1]-1))
97 | #newsig.obsm['X_pca'] = newsig.X
98 | k0=15
99 | sc.pp.neighbors(newsig, n_neighbors=k0)
100 | sc.tl.diffmap(newsig, n_comps=min(15,newsig.X.shape[1]-1))
101 | eigen = newsig.obsm['X_diffmap']
102 | #newsig_nbrs = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(newsig.X)
103 | #newsig_con = newsig_nbrs.kneighbors_graph(newsig.X)
104 | #newsig.obsp['connectivities'] = newsig_con
105 | newsig_metrics = scib.metrics.metrics(adata,newsig,obs_label,label[0],
106 | isolated_labels_asw_= asw,
107 | graph_conn_= graph_conn,
108 | silhouette_ = silhouette,
109 | nmi_=nmi,
110 | ari_=ari,
111 | pcr_=pcr)
112 | if diff_coefs:
113 | for i in range(len(label)):
114 | steps = adata.obs[label[i]].values
115 | #also we test max correlation to see strong functional dependence between steps and signals, for each state_group population
116 | if continuity[i]:
117 | xi = np.zeros(eigen.shape[1])
118 | #pval = np.zeros(eigen.shape[1])
119 | j = 0
120 | for source_row in eigen.T:
121 | #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(steps), pvalue = True)
122 |                 xi_obj = Xi(source_row,steps.astype(float))
123 | xi[j] = xi_obj.correlation
124 | j = j+1
125 | maxcoef = np.max(xi)
126 | #newsig_metrics.rename(index={'trajectory':'trajectory_coef'},inplace=True)
127 | #newsig_metrics.iloc[13,0] = np.max(xi)
128 | newsig_metrics.loc[label[i]] = maxcoef
129 | else:
130 | encoder = OneHotEncoder(sparse=False)
131 | onehot = encoder.fit_transform(np.array(adata.obs[label[i]].values.tolist()).reshape(-1, 1))
132 | yi = np.zeros([onehot.shape[1],eigen.shape[1]])
133 | k = 0
134 | #ind = onehot.T[0] * 0
135 | m = onehot.T.shape[0]
136 | for indicator in onehot.T[0:m-1]:
137 | j = 0
138 | #ind = ind + indicator
139 | for source_row in eigen.T:
140 | xi_obj = Xi(source_row,indicator*1)
141 | yi[k,j] = xi_obj.correlation
142 | j = j+1
143 | k = k+1
144 |
145 | #newsig_metrics.rename(index={'hvg_overlap':'state_coef'},inplace=True)
146 | #newsig_metrics.iloc[12,0] = np.mean(np.max(yi,axis=1))
147 | newsig_metrics.loc[label[i]] = np.mean(np.max(yi,axis=1))
148 |
149 | return newsig_metrics
150 |
151 |
152 | class Xi:
153 | """
154 | x and y are the data vectors
155 | """
156 |
157 | def __init__(self, x, y):
158 |
159 | self.x = x
160 | self.y = y
161 |
162 | @property
163 | def sample_size(self):
164 | return len(self.x)
165 |
166 | @property
167 | def x_ordered_rank(self):
168 | # PI is the rank vector for x, with ties broken at random
169 | # Not mine: source (https://stackoverflow.com/a/47430384/1628971)
170 | # random shuffling of the data - reason to use random.choice is that
171 | # pd.sample(frac=1) uses the same randomizing algorithm
172 | len_x = len(self.x)
173 | randomized_indices = np.random.choice(np.arange(len_x), len_x, replace=False)
174 | randomized = [self.x[idx] for idx in randomized_indices]
175 | # same as pandas rank method 'first'
176 | rankdata = ss.rankdata(randomized, method="ordinal")
177 | # Reindexing based on pairs of indices before and after
178 | unrandomized = [
179 | rankdata[j] for i, j in sorted(zip(randomized_indices, range(len_x)))
180 | ]
181 | return unrandomized
182 |
183 | @property
184 | def y_rank_max(self):
185 | # f[i] is number of j s.t. y[j] <= y[i], divided by n.
186 | return ss.rankdata(self.y, method="max") / self.sample_size
187 |
188 | @property
189 | def g(self):
190 | # g[i] is number of j s.t. y[j] >= y[i], divided by n.
191 | return ss.rankdata([-i for i in self.y], method="max") / self.sample_size
192 |
193 | @property
194 | def x_ordered(self):
195 | # order of the x's, ties broken at random.
196 | return np.argsort(self.x_ordered_rank)
197 |
198 | @property
199 | def x_rank_max_ordered(self):
200 | x_ordered_result = self.x_ordered
201 | y_rank_max_result = self.y_rank_max
202 | # Rearrange f according to ord.
203 | return [y_rank_max_result[i] for i in x_ordered_result]
204 |
205 | @property
206 | def mean_absolute(self):
207 | x1 = self.x_rank_max_ordered[0 : (self.sample_size - 1)]
208 | x2 = self.x_rank_max_ordered[1 : self.sample_size]
209 |
210 | return (
211 | np.mean(
212 | np.abs(
213 | [
214 | x - y
215 | for x, y in zip(
216 | x1,
217 | x2,
218 | )
219 | ]
220 | )
221 | )
222 | * (self.sample_size - 1)
223 | / (2 * self.sample_size)
224 | )
225 |
226 | @property
227 | def inverse_g_mean(self):
228 | gvalue = self.g
229 | return np.mean(gvalue * (1 - gvalue))
230 |
231 | @property
232 | def correlation(self):
233 | """xi correlation"""
234 | return 1 - self.mean_absolute / self.inverse_g_mean
235 |
236 | @classmethod
237 | def xi(cls, x, y):
238 | return cls(x, y)
239 |
240 | def pval_asymptotic(self, ties=False, nperm=1000):
241 | """
242 | Returns p values of the correlation
243 | Args:
244 | ties: boolean
245 | If ties is true, the algorithm assumes that the data has ties
246 |                 and employs the more elaborate theory for calculating
247 |                 the P-value. Otherwise, it uses the simpler theory. There is
248 |                 no harm in setting ties True, even if there are no ties.
249 | nperm: int
250 | The number of permutations for the permutation test, if needed.
251 | default 1000
252 | Returns:
253 | p value
254 | """
255 | # If there are no ties, return xi and theoretical P-value:
256 |
257 |         if not ties:
258 | return 1 - ss.norm.cdf(
259 | np.sqrt(self.sample_size) * self.correlation / np.sqrt(2 / 5)
260 | )
261 |
262 | # If there are ties, and the theoretical method
263 | # is to be used for calculation P-values:
264 | # The following steps calculate the theoretical variance
265 | # in the presence of ties:
266 | sorted_ordered_x_rank = sorted(self.x_rank_max_ordered)
267 |
268 | ind = [i + 1 for i in range(self.sample_size)]
269 | ind2 = [2 * self.sample_size - 2 * ind[i - 1] + 1 for i in ind]
270 |
271 | a = (
272 | np.mean([i * j * j for i, j in zip(ind2, sorted_ordered_x_rank)])
273 | / self.sample_size
274 | )
275 |
276 | c = (
277 | np.mean([i * j for i, j in zip(ind2, sorted_ordered_x_rank)])
278 | / self.sample_size
279 | )
280 |
281 | cq = np.cumsum(sorted_ordered_x_rank)
282 |
283 | m = [
284 | (i + (self.sample_size - j) * k) / self.sample_size
285 | for i, j, k in zip(cq, ind, sorted_ordered_x_rank)
286 | ]
287 |
288 | b = np.mean([np.square(i) for i in m])
289 | v = (a - 2 * b + np.square(c)) / np.square(self.inverse_g_mean)
290 |
291 | return 1 - ss.norm.cdf(
292 | np.sqrt(self.sample_size) * self.correlation / np.sqrt(v)
293 | )
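As a compact cross-check of the statistic implemented by `Xi`, the no-ties form of Chatterjee's coefficient can be written in a few lines. This is a sketch on synthetic data (all names below are illustrative); the class above additionally handles ties and asymptotic P-values:

```python
import numpy as np
import scipy.stats as ss

def xi_no_ties(x, y):
    # Sort y by x, rank it, and measure how "jumpy" consecutive ranks are:
    # the coefficient is near 1 when y = f(x) and near 0 under independence.
    n = len(x)
    r = ss.rankdata(y[np.argsort(x, kind="stable")], method="ordinal")
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n * n - 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
xi_dep = xi_no_ties(x, x ** 2)               # strong functional dependence
xi_ind = xi_no_ties(x, rng.normal(size=500)) # near-independence
```

Note that unlike Pearson or Spearman correlation, the coefficient is high for the non-monotone relation `y = x**2`, which is exactly why it is used here to separate confounder components from treatment-associated ones.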
--------------------------------------------------------------------------------
/cinemaot/cinemaot.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import scanpy as sc
4 | from anndata import AnnData
5 | from . import sinkhorn_knopp as skp
6 | #from . import utils
7 | from scipy.sparse import issparse
8 | from sklearn.neighbors import NearestNeighbors
9 | import scipy.stats as ss
10 |
11 | # In this newer version we use the Python implementation of xicor
12 | # import rpy2.robjects as ro
13 | # import rpy2.robjects.numpy2ri
14 | # import rpy2.robjects.pandas2ri
15 | # from rpy2.robjects.packages import importr
16 | # rpy2.robjects.numpy2ri.activate()
17 | # rpy2.robjects.pandas2ri.activate()
18 |
19 |
20 | # Instead of projecting the whole count matrix, we use the PCA result of the projected ICA components to stabilize the noise
21 | # returning an anndata object
22 | # Detecting differentially expressed genes: G = A + Z + AZ + e by NB regression. A significant coefficient on AZ means a condition-specific effect
23 | # Further false positives may be removed by permutation (as in PseudotimeDE)
24 |
25 | #import ot
26 |
27 | import statsmodels.api as sm
28 | from sklearn.linear_model import LinearRegression
29 |
30 | from sklearn.decomposition import FastICA
31 | import sklearn.metrics
32 |
33 |
34 | def cinemaot_unweighted(adata,obs_label,ref_label,expr_label,dim=20,thres=0.15,smoothness=1e-4,eps=1e-3,mode='parametric',marker=None,preweight_label=None):
35 | """
36 | Parameters
37 | ----------
38 | adata: 'AnnData'
39 | An anndata object containing the whole gene count matrix and an observation index for treatments. It should be preprocessed before input.
40 | obs_label: 'str'
41 | A string for indicating the treatment column name in adata.obs.
42 | ref_label: 'str'
43 | A string for indicating the control group in adata.obs.values.
44 | expr_label: 'str'
45 | A string for indicating the experiment group in adata.obs.values.
46 | dim: 'int'
47 | The number of independent components.
48 | thres: 'float'
49 |         The threshold on the Chatterjee coefficient used for confounder separation.
50 | smoothness: 'float'
51 | The parameter for setting the smoothness of entropy-regularized optimal transport. Should be set as a small value above zero!
52 |     eps: 'float'
53 |         The tolerance used in the stopping condition for OT convergence.
54 | mode: 'str'
55 |         If mode is 'parametric', return standard differential matrices; if 'non-parametric', return the weighted quantile over experiment-condition cells.
56 | Return
57 | ----------
58 | cf: 'numpy.ndarray'
59 | Confounder components, of shape (n_cells,n_components).
60 | ot: 'numpy.ndarray'
61 | Transport map across control and experimental conditions.
62 | te2: 'numpy.ndarray'
63 | Single-cell differential expression for each cell in control condition, of shape (n_refcells, n_genes).
64 | """
65 | if dim is None:
66 | sk = skp.SinkhornKnopp()
67 | c = 0.5
68 | data=adata.X
69 | vm = (1e-3 + data + c * data * data)/(1+c)
70 | P = sk.fit(vm)
71 | wm = np.dot(np.dot(np.sqrt(sk._D1),vm),np.sqrt(sk._D2))
72 | u,s,vt = np.linalg.svd(wm)
73 |         dim = min(sum(s > (np.sqrt(data.shape[0])+np.sqrt(data.shape[1]))), adata.obsm['X_pca'].shape[1])
74 |
75 |
76 | transformer = FastICA(n_components=dim, random_state=0,whiten="arbitrary-variance")
77 | X_transformed = transformer.fit_transform(adata.obsm['X_pca'][:,:dim])
78 | #importr("XICOR")
79 | #xicor = ro.r["xicor"]
80 |     groupvec = ((adata.obs[obs_label]==ref_label)*1).values #control
81 | xi = np.zeros(dim)
82 | #pval = np.zeros(dim)
83 | j = 0
84 | for source_row in X_transformed.T:
85 | xi_obj = Xi(source_row,groupvec*1)
86 | #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(groupvec), pvalue = True)
87 | #xi[j] = np.array(rresults.rx2("xi"))[0]
88 | xi[j] = xi_obj.correlation
89 | #pval[j] = np.array(rresults.rx2("pval"))[0]
90 | j = j+1
91 |     cf = X_transformed[:,xi<thres]
199 |
200 | sk = skp.SinkhornKnopp()
201 | adata_ = adata[adata.obs[obs_label].isin([expr_label,ref_label])].copy()
202 | if use_rep is None:
203 | X_pca1 = adata_.obsm['X_pca'][adata_.obs[obs_label]==expr_label,:]
204 | X_pca2 = adata_.obsm['X_pca'][adata_.obs[obs_label]==ref_label,:]
205 | nbrs = NearestNeighbors(n_neighbors=k, algorithm='ball_tree').fit(X_pca1)
206 | mixscape_pca = adata.obsm['X_pca'].copy()
207 | mixscapematrix = nbrs.kneighbors_graph(X_pca2).toarray()
208 | mixscape_pca[adata_.obs[obs_label]==ref_label,:] = np.dot(mixscapematrix, mixscape_pca[adata_.obs[obs_label]==expr_label,:])/k
209 |
210 | adata_.obsm['X_mpca'] = mixscape_pca
211 | sc.pp.neighbors(adata_,use_rep='X_mpca')
212 |
213 | else:
214 | sc.pp.neighbors(adata_,use_rep=use_rep)
215 | sc.tl.leiden(adata_,resolution=resolution)
216 |
217 | z = np.zeros(adata_.shape[0]) + 1
218 |
219 | j = 0
220 |
221 | for i in adata_.obs['leiden'].cat.categories:
222 | if adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0] >= adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0]:
223 | z[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)] = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0] / adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0]
224 | if j == 0:
225 | idx = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0],copy=True).obs.index
226 | idx = idx.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].obs.index)
227 | j = j + 1
228 | else:
229 | idx_tmp = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0],copy=True).obs.index
230 | idx_tmp = idx_tmp.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].obs.index)
231 | idx = idx.append(idx_tmp)
232 | else:
233 | z[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)] = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0] / adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0]
234 | if j == 0:
235 | idx = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0],copy=True).obs.index
236 | idx = idx.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].obs.index)
237 | j = j + 1
238 | else:
239 | idx_tmp = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0],copy=True).obs.index
240 | idx_tmp = idx_tmp.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].obs.index)
241 | idx = idx.append(idx_tmp)
242 |
243 | transformer = FastICA(n_components=dim, random_state=0, whiten="arbitrary-variance")
244 | X_transformed = transformer.fit_transform(adata_[idx].obsm['X_pca'][:,:dim])
245 | #importr("XICOR")
246 | #xicor = ro.r["xicor"]
247 |     groupvec = ((adata_[idx].obs[obs_label]==ref_label)*1).values #control
248 | xi = np.zeros(dim)
249 | #pval = np.zeros(dim)
250 | j = 0
251 | for source_row in X_transformed.T:
252 | xi_obj = Xi(source_row,groupvec*1)
253 | #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(groupvec), pvalue = True)
254 | #xi[j] = np.array(rresults.rx2("xi"))[0]
255 | xi[j] = xi_obj.correlation
256 | #pval[j] = np.array(rresults.rx2("pval"))[0]
257 | j = j+1
258 |
259 |     cf = transformer.transform(adata_.obsm['X_pca'][:,:dim])[:,xi<thres]
368 | glm_binom = sm.GLM(adata.raw.X[:,i].toarray()[:,0], X, family=sm.families.Poisson())
369 | try:
370 | res = glm_binom.fit(tol=1e-4)
371 | pvalue[i] = np.min(res.pvalues[cf.shape[1]+2:])
372 | effectsize[i] = res.params[np.argmin(res.pvalues[cf.shape[1]+2:])]
373 | except:
374 | pvalue[i] = 0
375 | effectsize[i] = 0
376 |
377 | return effectsize, pvalue
378 |
379 |
380 | def attribution_scatter(adata,obs_label,control_label,expr_label,use_raw=True):
381 | cf = adata.obsm['cf']
382 | if use_raw:
383 | Y0 = adata.raw.X.toarray()[adata.obs[obs_label]==control_label,:]
384 | Y1 = adata.raw.X.toarray()[adata.obs[obs_label]==expr_label,:]
385 | else:
386 | Y0 = adata.X.toarray()[adata.obs[obs_label]==control_label,:]
387 | Y1 = adata.X.toarray()[adata.obs[obs_label]==expr_label,:]
388 | X0 = cf[adata.obs[obs_label]==control_label,:]
389 | X1 = cf[adata.obs[obs_label]==expr_label,:]
390 | ols0 = LinearRegression()
391 | ols0.fit(X0,Y0)
392 | ols1 = LinearRegression()
393 | ols1.fit(X1,Y1)
394 | c0 = ols0.predict(X0) - np.mean(ols0.predict(X0),axis=0)
395 | c1 = ols1.predict(X1) - np.mean(ols1.predict(X1),axis=0)
396 | e0 = Y0 - ols0.predict(X0)
397 | e1 = Y1 - ols1.predict(X1)
398 | #c_effect = np.mean(np.abs(ols1.coef_)+1e-6,axis=1) / np.mean(np.abs(ols0.coef_)+1e-6,axis=1)
399 | c_effect = (np.linalg.norm(c1,axis=0)+1e-6)/(np.linalg.norm(c0,axis=0)+1e-6)
400 | s_effect = (np.linalg.norm(e1,axis=0)+1e-6)/(np.linalg.norm(e0,axis=0)+1e-6)
401 | return c_effect, s_effect
402 |
403 |
404 | class Xi:
405 | """
406 | x and y are the data vectors
407 | """
408 |
409 | def __init__(self, x, y):
410 |
411 | self.x = x
412 | self.y = y
413 |
414 | @property
415 | def sample_size(self):
416 | return len(self.x)
417 |
418 | @property
419 | def x_ordered_rank(self):
420 | # PI is the rank vector for x, with ties broken at random
421 | # Not mine: source (https://stackoverflow.com/a/47430384/1628971)
422 | # random shuffling of the data - reason to use random.choice is that
423 | # pd.sample(frac=1) uses the same randomizing algorithm
424 | len_x = len(self.x)
425 | randomized_indices = np.random.choice(np.arange(len_x), len_x, replace=False)
426 | randomized = [self.x[idx] for idx in randomized_indices]
427 | # same as pandas rank method 'first'
428 | rankdata = ss.rankdata(randomized, method="ordinal")
429 | # Reindexing based on pairs of indices before and after
430 | unrandomized = [
431 | rankdata[j] for i, j in sorted(zip(randomized_indices, range(len_x)))
432 | ]
433 | return unrandomized
434 |
435 | @property
436 | def y_rank_max(self):
437 | # f[i] is number of j s.t. y[j] <= y[i], divided by n.
438 | return ss.rankdata(self.y, method="max") / self.sample_size
439 |
440 | @property
441 | def g(self):
442 | # g[i] is number of j s.t. y[j] >= y[i], divided by n.
443 | return ss.rankdata([-i for i in self.y], method="max") / self.sample_size
444 |
445 | @property
446 | def x_ordered(self):
447 | # order of the x's, ties broken at random.
448 | return np.argsort(self.x_ordered_rank)
449 |
450 | @property
451 | def x_rank_max_ordered(self):
452 | x_ordered_result = self.x_ordered
453 | y_rank_max_result = self.y_rank_max
454 | # Rearrange f according to ord.
455 | return [y_rank_max_result[i] for i in x_ordered_result]
456 |
457 | @property
458 | def mean_absolute(self):
459 | x1 = self.x_rank_max_ordered[0 : (self.sample_size - 1)]
460 | x2 = self.x_rank_max_ordered[1 : self.sample_size]
461 |
462 | return (
463 | np.mean(
464 | np.abs(
465 | [
466 | x - y
467 | for x, y in zip(
468 | x1,
469 | x2,
470 | )
471 | ]
472 | )
473 | )
474 | * (self.sample_size - 1)
475 | / (2 * self.sample_size)
476 | )
477 |
478 | @property
479 | def inverse_g_mean(self):
480 | gvalue = self.g
481 | return np.mean(gvalue * (1 - gvalue))
482 |
483 | @property
484 | def correlation(self):
485 | """xi correlation"""
486 | return 1 - self.mean_absolute / self.inverse_g_mean
487 |
488 | @classmethod
489 | def xi(cls, x, y):
490 | return cls(x, y)
491 |
492 | def pval_asymptotic(self, ties=False, nperm=1000):
493 | """
494 | Returns p values of the correlation
495 | Args:
496 | ties: boolean
497 | If ties is true, the algorithm assumes that the data has ties
498 |             and employs the more elaborate theory for calculating
499 |             the P-value. Otherwise, it uses the simpler theory. There is
500 |             no harm in setting ties True, even if there are no ties.
501 | nperm: int
502 | The number of permutations for the permutation test, if needed.
503 | default 1000
504 | Returns:
505 | p value
506 | """
507 | # If there are no ties, return xi and theoretical P-value:
508 |
509 |         if not ties:
510 | return 1 - ss.norm.cdf(
511 | np.sqrt(self.sample_size) * self.correlation / np.sqrt(2 / 5)
512 | )
513 |
514 | # If there are ties, and the theoretical method
515 | # is to be used for calculation P-values:
516 | # The following steps calculate the theoretical variance
517 | # in the presence of ties:
518 | sorted_ordered_x_rank = sorted(self.x_rank_max_ordered)
519 |
520 | ind = [i + 1 for i in range(self.sample_size)]
521 | ind2 = [2 * self.sample_size - 2 * ind[i - 1] + 1 for i in ind]
522 |
523 | a = (
524 | np.mean([i * j * j for i, j in zip(ind2, sorted_ordered_x_rank)])
525 | / self.sample_size
526 | )
527 |
528 | c = (
529 | np.mean([i * j for i, j in zip(ind2, sorted_ordered_x_rank)])
530 | / self.sample_size
531 | )
532 |
533 | cq = np.cumsum(sorted_ordered_x_rank)
534 |
535 | m = [
536 | (i + (self.sample_size - j) * k) / self.sample_size
537 | for i, j, k in zip(cq, ind, sorted_ordered_x_rank)
538 | ]
539 |
540 | b = np.mean([np.square(i) for i in m])
541 | v = (a - 2 * b + np.square(c)) / np.square(self.inverse_g_mean)
542 |
543 | return 1 - ss.norm.cdf(
544 | np.sqrt(self.sample_size) * self.correlation / np.sqrt(v)
545 | )
--------------------------------------------------------------------------------
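The `pval_asymptotic` method above implements the asymptotic test for Chatterjee's xi correlation. As a quick cross-check, here is a minimal standalone sketch of the no-ties statistic and its p-value; the helper name `xi_correlation` is ours, not part of this package:

```python
import numpy as np
from scipy import stats as ss

def xi_correlation(x, y):
    """Chatterjee's xi for data without ties, plus the asymptotic
    p-value using the no-ties variance 2/5 (as in pval_asymptotic)."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # Rank y in the order induced by sorting x.
    order = np.argsort(x)
    r = ss.rankdata(y[order], method="max")
    # No-ties closed form: xi = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)
    xi = 1 - 3 * np.sum(np.abs(np.diff(r))) / (n ** 2 - 1)
    pval = 1 - ss.norm.cdf(np.sqrt(n) * xi / np.sqrt(2 / 5))
    return xi, pval
```

For a strictly monotone relationship xi approaches 1 (exactly 1 - 3/(n+1) for increasing data), while for independent samples it concentrates near 0.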
/cinemaot/sinkhorn_knopp.py:
--------------------------------------------------------------------------------
1 | import warnings
2 |
3 | import numpy as np
4 |
5 |
6 | class SinkhornKnopp:
7 | """
8 | Sinkhorn Knopp Algorithm
9 |
10 | Takes a non-negative square matrix P, where P =/= 0
11 | and iterates through Sinkhorn Knopp's algorithm
12 | to convert P to a doubly stochastic matrix.
13 | Guaranteed convergence if P has total support.
14 |
15 | For reference see original paper:
16 | http://msp.org/pjm/1967/21-2/pjm-v21-n2-p14-s.pdf
17 |
18 | Parameters
19 | ----------
20 | max_iter : int, default=1000
21 | The maximum number of iterations.
22 |
23 | epsilon : float, default=1e-3
24 | Metric used to compute the stopping condition,
25 | which occurs if all the row and column sums are
26 | within epsilon of 1. This should be a very small value.
27 |         Epsilon must be between 0 and 1. setr / setc (default 0)
28 |         optionally set nonstandard target row and column sums.
28 |
29 | Attributes
30 | ----------
31 | _max_iter : int, default=1000
32 | User defined parameter. See above.
33 |
34 | _epsilon : float, default=1e-3
35 |         User defined parameter. See above.
36 |
37 | _stopping_condition: string
38 | Either "max_iter", "epsilon", or None, which is a
39 | description of why the algorithm stopped iterating.
40 |
41 | _iterations : int
42 | The number of iterations elapsed during the algorithm's
43 | run-time.
44 |
45 | _D1 : 2d-array
46 | Diagonal matrix obtained after a stopping condition was met
47 | so that _D1.dot(P).dot(_D2) is close to doubly stochastic.
48 |
49 | _D2 : 2d-array
50 | Diagonal matrix obtained after a stopping condition was met
51 | so that _D1.dot(P).dot(_D2) is close to doubly stochastic.
52 |
53 | Example
54 | -------
55 |
56 | .. code-block:: python
57 | >>> import numpy as np
58 | >>> from sinkhorn_knopp import sinkhorn_knopp as skp
59 | >>> sk = skp.SinkhornKnopp()
60 | >>> P = [[.011, .15], [1.71, .1]]
61 | >>> P_ds = sk.fit(P)
62 | >>> P_ds
63 | array([[ 0.06102561, 0.93897439],
64 | [ 0.93809928, 0.06190072]])
65 | >>> np.sum(P_ds, axis=0)
66 | array([ 0.99912489, 1.00087511])
67 | >>> np.sum(P_ds, axis=1)
68 | array([ 1., 1.])
69 |
70 | """
71 |
72 | def __init__(self, max_iter=1000, setr=0, setc=0, epsilon=1e-3):
73 | assert isinstance(max_iter, int) or isinstance(max_iter, float),\
74 | "max_iter is not of type int or float: %r" % max_iter
75 | assert max_iter > 0,\
76 | "max_iter must be greater than 0: %r" % max_iter
77 | self._max_iter = int(max_iter)
78 |
79 | assert isinstance(epsilon, int) or isinstance(epsilon, float),\
80 | "epsilon is not of type float or int: %r" % epsilon
81 | assert epsilon > 0 and epsilon < 1,\
82 | "epsilon must be between 0 and 1 exclusive: %r" % epsilon
83 | self._epsilon = epsilon
84 | self._setr = setr
85 | self._setc = setc
86 | self._stopping_condition = None
87 | self._iterations = 0
88 | self._D1 = np.ones(1)
89 | self._D2 = np.ones(1)
90 |
91 | def fit(self, P):
92 | """Fit the diagonal matrices in Sinkhorn Knopp's algorithm
93 |
94 | Parameters
95 | ----------
96 | P : 2d array-like
97 | Must be a square non-negative 2d array-like object, that
98 | is convertible to a numpy array. The matrix must not be
99 | equal to 0 and it must have total support for the algorithm
100 | to converge.
101 |
102 | Returns
103 | -------
104 |         A doubly stochastic matrix.
105 |
106 | """
107 | P = np.asarray(P)
108 | assert np.all(P >= 0)
109 | assert P.ndim == 2
110 |
111 | N = P.shape[0]
112 | if np.sum(abs(self._setr)) == 0:
113 | rsum = P.shape[1]
114 | else:
115 | rsum = self._setr
116 | if np.sum(abs(self._setc)) == 0:
117 | csum = P.shape[0]
118 | else:
119 | csum = self._setc
120 | max_threshr = rsum + self._epsilon
121 | min_threshr = rsum - self._epsilon
122 | max_threshc = csum + self._epsilon
123 | min_threshc = csum - self._epsilon
124 | # Initialize r and c, the diagonals of D1 and D2
125 | # and warn if the matrix does not have support.
126 | r = np.ones((N, 1))
127 | pdotr = P.T.dot(r)
128 | total_support_warning_str = (
129 | "Matrix P must have total support. "
130 | "See documentation"
131 | )
132 | if not np.all(pdotr != 0):
133 | warnings.warn(total_support_warning_str, UserWarning)
134 |
135 | c = 1 / pdotr
136 | pdotc = P.dot(c)
137 | if not np.all(pdotc != 0):
138 | warnings.warn(total_support_warning_str, UserWarning)
139 |
140 | r = 1 / pdotc
141 | del pdotr, pdotc
142 |
143 | P_eps = np.copy(P)
144 | while np.any(np.sum(P_eps, axis=1) < min_threshr) \
145 | or np.any(np.sum(P_eps, axis=1) > max_threshr) \
146 | or np.any(np.sum(P_eps, axis=0) < min_threshc) \
147 | or np.any(np.sum(P_eps, axis=0) > max_threshc):
148 |
149 | c = csum / P.T.dot(r)
150 | r = rsum / P.dot(c)
151 |
152 | self._D1 = np.diag(np.squeeze(r))
153 | self._D2 = np.diag(np.squeeze(c))
154 |
155 | P_eps = np.diag(self._D1)[:,None] * P * np.diag(self._D2)[None,:]
156 |
157 |
158 | self._iterations += 1
159 |
160 | if self._iterations >= self._max_iter:
161 | self._stopping_condition = "max_iter"
162 | break
163 |
164 | if not self._stopping_condition:
165 | self._stopping_condition = "epsilon"
166 |
167 | self._D1 = np.diag(np.squeeze(r))
168 | self._D2 = np.diag(np.squeeze(c))
169 | P_eps = np.diag(self._D1)[:,None] * P * np.diag(self._D2)[None,:]
170 |
171 | return P_eps
172 |
--------------------------------------------------------------------------------
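For intuition, the core of `SinkhornKnopp.fit` in the default setting is the classical alternating rescaling. A minimal sketch of our own (ignoring the `setr`/`setc` generalization and the total-support warnings):

```python
import numpy as np

def sinkhorn_balance(P, max_iter=1000, epsilon=1e-6):
    """Alternately rescale rows and columns of a positive matrix P
    until D1 @ P @ D2 is doubly stochastic (all sums within epsilon of 1)."""
    P = np.asarray(P, dtype=float)
    r = np.ones(P.shape[0])
    P_eps = P
    for _ in range(max_iter):
        c = 1.0 / (P.T @ r)          # rescale to fix column sums
        r = 1.0 / (P @ c)            # rescale to fix row sums
        P_eps = r[:, None] * P * c[None, :]
        if np.all(np.abs(P_eps.sum(axis=0) - 1) < epsilon) and \
           np.all(np.abs(P_eps.sum(axis=1) - 1) < epsilon):
            break
    return P_eps
```

On the docstring's example matrix `[[.011, .15], [1.71, .1]]` this converges to row and column sums within epsilon of 1, consistent with the `P_ds` shown in the class docstring.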
/cinemaot/utils.py:
--------------------------------------------------------------------------------
1 | import gseapy as gp
2 | import pandas as pd
3 | from scipy.stats import wilcoxon
4 | import numpy as np
5 | import scanpy as sc
6 | #import scib
7 | from sklearn.linear_model import LogisticRegression
8 | from sklearn.preprocessing import OneHotEncoder
9 | from scipy.stats import kstest
10 | import plotly.graph_objects as go
11 | import plotly.express as px
12 |
13 | # import rpy2.robjects as ro
14 | # import rpy2.robjects.numpy2ri
15 | # import rpy2.robjects.pandas2ri
16 | # from rpy2.robjects.packages import importr
17 | # rpy2.robjects.numpy2ri.activate()
18 | # rpy2.robjects.pandas2ri.activate()
19 |
20 |
21 | def dominantcluster(adata,ctobs,clobs):
22 | clustername = []
23 | clustertime = np.zeros(adata.obs[ctobs].value_counts().values.shape[0])
24 | for i in adata.obs[clobs].value_counts().sort_index().index.values:
25 | tmp = adata.obs[ctobs][adata.obs[clobs]==i].value_counts().sort_index()
26 | ind = np.argmax(tmp.values)
27 | clustername.append(tmp.index.values[ind] + str(int(clustertime[ind])))
28 | clustertime[ind] = clustertime[ind] + 1
29 | return clustername
30 |
31 | def assignleiden(adata,ctobs,clobs,label):
32 | clustername = dominantcluster(adata,ctobs,clobs)
33 | ss = adata.obs[clobs].values.tolist()
34 | for i in range(len(ss)):
35 | ss[i] = clustername[int(ss[i])]
36 | adata.obs[label] = ss
37 | return
38 |
39 | def clustertest_synergy(adata1,adata2,clobs,thres,fthres,path,genesetpath,organism):
40 | # In this simplified function, we return the gene set only. The function is only designed for synergy computation.
41 | mkup = []
42 | mkdown = []
43 | for i in list(set(adata1.obs[clobs].values.tolist())):
44 | adata = adata1
45 | clusterindex = (adata.obs[clobs].values==i)
46 | tmpte = adata.X[clusterindex,:]
47 | clustername = i
48 | pv = np.zeros(tmpte.shape[1])
49 | for k in range(tmpte.shape[1]):
50 | st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
51 | genenames = adata.var_names.values
52 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
53 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
54 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
55 | upgenes1 = genenames[upindex]
56 | downgenes1 = genenames[downindex]
57 | allgenes1 = genenames[allindex]
58 | adata = adata2
59 | clusterindex = (adata.obs[clobs].values==i)
60 | tmpte = adata.X[clusterindex,:]
61 | clustername = i
62 | pv = np.zeros(tmpte.shape[1])
63 | for k in range(tmpte.shape[1]):
64 | st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
65 | genenames = adata.var_names.values
66 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
67 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
68 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
69 | upgenes2 = genenames[upindex]
70 | downgenes2 = genenames[downindex]
71 | allgenes2 = genenames[allindex]
72 | up1syn = list(set(upgenes1.tolist()) - set(upgenes2.tolist()))
73 | up2syn = list(set(upgenes2.tolist()) - set(upgenes1.tolist()))
74 | down1syn = list(set(downgenes1.tolist()) - set(downgenes2.tolist()))
75 | down2syn = list(set(downgenes2.tolist()) - set(downgenes1.tolist()))
76 | allgenes = list(set(up1syn) | set(up2syn) | set(down1syn) | set(down2syn))
77 | enr_up1 = gp.enrichr(gene_list=up1syn, gene_sets=genesetpath,
78 | no_plot=True,organism=organism,
79 | outdir=path, format='png')
80 | enr_up2 = gp.enrichr(gene_list=up2syn, gene_sets=genesetpath,
81 | no_plot=True,organism=organism,
82 | outdir=path, format='png')
83 | enr_down1 = gp.enrichr(gene_list=down1syn, gene_sets=genesetpath,
84 | no_plot=True,organism=organism,
85 | outdir=path, format='png')
86 | enr_down2 = gp.enrichr(gene_list=down2syn, gene_sets=genesetpath,
87 | no_plot=True,organism=organism,
88 | outdir=path, format='png')
89 | if not enr_up1.results.empty:
90 | enr_up1.results.iloc[enr_up1.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Up1'+clustername+'.csv')
91 | if not enr_up2.results.empty:
92 | enr_up2.results.iloc[enr_up2.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Up2'+clustername+'.csv')
93 | if not enr_down1.results.empty:
94 | enr_down1.results.iloc[enr_down1.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Down1'+clustername+'.csv')
95 | if not enr_down2.results.empty:
96 | enr_down2.results.iloc[enr_down2.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Down2'+clustername+'.csv')
97 | upgenes1df = pd.DataFrame(index=up1syn)
98 | upgenes2df = pd.DataFrame(index=up2syn)
99 | downgenes1df = pd.DataFrame(index=down1syn)
100 | downgenes2df = pd.DataFrame(index=down2syn)
101 | allgenesdf = pd.DataFrame(index=allgenes)
102 | upgenes1df.to_csv(path+'/Upnames1'+clustername+'.csv')
103 | upgenes2df.to_csv(path+'/Upnames2'+clustername+'.csv')
104 | downgenes1df.to_csv(path+'/Downnames1'+clustername+'.csv')
105 | downgenes2df.to_csv(path+'/Downnames2'+clustername+'.csv')
106 | allgenesdf.to_csv(path+'/names'+clustername+'.csv')
107 |
108 | return
109 |
110 |
111 | def clustertest(adata,clobs,thres,fthres,label,path,genesetpath,organism,onlyup=False):
112 | # Changed from ttest to Wilcoxon test
113 | clusternum = int(np.max((np.asfarray(adata.obs[clobs].values))))
114 | genenum = np.zeros([clusternum+1])
115 | mk = []
116 | for i in range(clusternum+1):
117 | clusterindex = (np.asfarray(adata.obs[clobs].values)==i)
118 | tmpte = adata.X[clusterindex,:]
119 | clustername = adata.obs[label][clusterindex][0]
120 | pv = np.zeros(tmpte.shape[1])
121 | for k in range(tmpte.shape[1]):
122 | st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
123 | genenames = adata.var_names.values
124 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
125 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
126 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
127 | upgenes = genenames[upindex]
128 | downgenes = genenames[downindex]
129 | allgenes = genenames[allindex]
130 | mk.extend(allgenes.tolist())
131 | mk = list(set(mk))
132 |         genenum[i] = np.sum(((pv<thres)*1) * ((np.abs(np.median(tmpte,axis=0))>fthres)))
133 | enr_up = gp.enrichr(gene_list=upgenes.tolist(), gene_sets=genesetpath,
134 | no_plot=True,organism=organism,
135 | outdir=path, format='png')
136 | enr_down = gp.enrichr(gene_list=downgenes.tolist(), gene_sets=genesetpath,
137 | no_plot=True,organism=organism,
138 | outdir=path, format='png')
139 | enr = gp.enrichr(gene_list=allgenes.tolist(), gene_sets=genesetpath,
140 | no_plot=True,organism=organism,
141 | outdir=path, format='png')
142 | if not enr_up.results.empty:
143 | enr_up.results.iloc[enr_up.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/Up'+clustername+'.csv')
144 | if not enr_down.results.empty:
145 | enr_down.results.iloc[enr_down.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/Down'+clustername+'.csv')
146 | if not enr.results.empty:
147 | enr.results.iloc[enr.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/'+clustername+'.csv')
148 | upgenesdf = pd.DataFrame(index=upgenes)
149 | downgenesdf = pd.DataFrame(index=downgenes)
150 | allgenesdf = pd.DataFrame(index=allgenes)
151 | upgenesdf.to_csv(path+'/Upnames'+clustername+'.csv')
152 | downgenesdf.to_csv(path+'/Downnames'+clustername+'.csv')
153 | allgenesdf.to_csv(path+'/names'+clustername+'.csv')
154 | if onlyup:
155 | enr = enr_up
156 |
157 | if not enr.results.empty:
158 | if i == 0:
159 | df = enr.results.transpose().iloc[4:5,:]
160 | df.columns = enr.results['Term'][:]
161 | df.index.values[0] = clustername
162 | else:
163 | tmp = enr.results.transpose().iloc[4:5,:]
164 | tmp.columns = enr.results['Term'][:]
165 | tmp.index.values[0] = clustername
166 | df = pd.concat([df,tmp])
167 | #df.values = -np.log10(df.values)
168 | #DF = sc.AnnData(df.transpose())
169 | #sc.pl.clustermap(DF,cmap='viridis', col_cluster=False)
170 | return genenum, df, mk
171 |
172 |
173 | def concordance_map(confounder,response,obs_label, cl_label, condition):
174 | #deprecated
175 | cf = confounder[confounder.obs[obs_label] == condition,:]
176 | cf.obs['res_cl'] = response.obs[cl_label].values
177 | aswmatrix = np.zeros([len(list(set(cf.obs['res_cl'].values.tolist()))),len(list(set(cf.obs['res_cl'].values.tolist())))])
178 | indnummatrix = pd.DataFrame(None,list(set(cf.obs['res_cl'].values.tolist())),list(set(cf.obs['res_cl'].values.tolist())))
179 | k = 0
180 | #return aswmatrix
181 | for i in list(set(cf.obs['res_cl'].values.tolist())):
182 | l = 0
183 | for j in list(set(cf.obs['res_cl'].values.tolist())):
184 | if i != j:
185 | tmpcf = cf[cf.obs['res_cl'].isin([i,j]),:].copy()
186 | sc.pp.pca(tmpcf)
187 | encoder = OneHotEncoder(sparse=False)
188 | onehot = encoder.fit_transform(np.array(tmpcf.obs['res_cl'].values.tolist()).reshape(-1, 1))
189 | label = onehot[:,0]
190 | lc = LogisticRegression(penalty='l1',solver='liblinear',C=1)
191 | lc.fit(tmpcf.X, label)
192 | prob = lc.predict_proba(tmpcf.X)
193 | prob1 = prob[label==1,0]
194 | prob2 = prob[label==0,0]
195 | st, pv = kstest(prob1,prob2)
196 | #yi = np.zeros([onehot.shape[1],eigen.shape[1]])
197 | aswmatrix[k,l] = -np.log10(pv+1e-20)
198 | if np.sum(lc.coef_!=0)>0:
199 | indnummatrix.iloc[k,l] = str(np.argwhere(lc.coef_[0] !=0)[:,0].tolist())[1:-1]
200 | else:
201 | aswmatrix[k,l] = 0
202 | l = l + 1
203 | k = k + 1
204 | aswmatrix = pd.DataFrame(aswmatrix,list(set(cf.obs['res_cl'].values.tolist())),list(set(cf.obs['res_cl'].values.tolist())))
205 | return aswmatrix, indnummatrix
206 |
207 |
208 | def coarse_matching(de,de_label,ref,ref_label,ot,scaling=1e6,mode='mean'):
209 | coarse_ot = pd.DataFrame(index=sorted(set(de.obs[de_label].values.tolist())),columns=sorted(set(ref.obs[ref_label].values.tolist())),dtype=float)
210 | for i in set(de.obs[de_label].values.tolist()):
211 | for j in set(ref.obs[ref_label].values.tolist()):
212 | tmp_ot = ot[de.obs[de_label]==i,:]
213 | if mode=='mean':
214 | coarse_ot[j][i] = np.mean(tmp_ot[:,ref.obs[ref_label]==j]) * scaling
215 | else:
216 | coarse_ot[j][i] = np.sum(tmp_ot[:,ref.obs[ref_label]==j]) * scaling
217 | return coarse_ot
218 |
219 | def sankey_plot(coarse_ot,thres1=0.1,thres2=0.1,title_text="Sankey Diagram",width=600,height=400):
220 | new_coarse_ot = pd.DataFrame(np.zeros([coarse_ot.shape[0]*coarse_ot.shape[1],3]))
221 | k = 0
222 | for i in range(coarse_ot.shape[0]):
223 | for j in range(coarse_ot.shape[1]):
224 | thres_ = max(thres1 * np.sum(coarse_ot.values[i,:]), thres2 * np.sum(coarse_ot.values[:,j]))
225 | if coarse_ot.values[i,j] > thres_:
226 | new_coarse_ot.iloc[k,1] = 'Response: ' + coarse_ot.index[i]
227 | new_coarse_ot.iloc[k,0] = coarse_ot.columns[j]
228 | new_coarse_ot.iloc[k,2] = coarse_ot.values[i,j]
229 |
230 | k = k + 1
231 | new_coarse_ot = new_coarse_ot.loc[new_coarse_ot.iloc[:,2]>0,:]
232 | a = set(new_coarse_ot[0].values.tolist())
233 | b = set(new_coarse_ot[1].values.tolist())
234 | a0 = []
235 | for i in range(len(list(a))):
236 | a0.append(list(a)[i][:-1])
237 | a0 = list(set(a0))
238 |
239 | source = np.arange(new_coarse_ot.shape[0] + new_coarse_ot.shape[0])
240 | target = np.arange(new_coarse_ot.shape[0] + new_coarse_ot.shape[0])
241 |
242 | for i in range(new_coarse_ot.shape[0]):
243 | source[i+new_coarse_ot.shape[0]] = np.argwhere(np.array(list(a))==new_coarse_ot[0].values[i])[0][0]
244 | target[i+new_coarse_ot.shape[0]] = np.argwhere(np.array(list(b))==new_coarse_ot[1].values[i])[0][0]
245 |
246 | target = target + len(list(a))
247 |
248 | for i in range(new_coarse_ot.shape[0]):
249 | source[i] = np.argwhere(np.array(a0)==new_coarse_ot[0].values[i][:-1])[0][0]
250 | target[i] = np.argwhere(np.array(list(a))==new_coarse_ot[0].values[i])[0][0]
251 |
252 | target = target + len(a0)
253 | source[new_coarse_ot.shape[0]:] = source[new_coarse_ot.shape[0]:] + len(a0)
254 | values = np.zeros(2*new_coarse_ot.shape[0])
255 | for i in range(new_coarse_ot.shape[0]):
256 | values[i] = np.sum(new_coarse_ot.values[:,2][new_coarse_ot.values[:,0]==new_coarse_ot.values[i,0]]) / np.sum(new_coarse_ot.values[:,0]==new_coarse_ot.values[i,0])
257 |
258 | values[new_coarse_ot.shape[0]:] = new_coarse_ot.values[:,2]
259 | colorlist = px.colors.qualitative.Plotly
260 | colors = np.array(a0 + list(a) + list(b))
261 | colors[0:len(a0)] = colorlist[0:len(a0)]
262 | for i in range(len(a0),len(a0)+len(list(a))):
263 | colors[i] = colors[0:len(a0)][np.array(a0)==(list(a)[i-len(a0)][:-1])][0]
264 | for i in range(len(a0)+len(list(a)),len(a0)+len(list(a))+len(list(b))):
265 | colors[i] = colors[0:len(a0)][np.array(a0)==(list(b)[i-len(a0)-len(list(a))][10:-1])][0]
266 |
267 | fig = go.Figure(data=[go.Sankey(
268 | node = dict(
269 | pad = 15,
270 | thickness = 20,
271 | #line = dict(color = "black", width = 0.5),
272 | label = a0 + list(a) + list(b),
273 | color = colors
274 | ),
275 | link = dict(
276 | source = source, # indices correspond to labels, eg A1, A2, A1, B1, ...
277 | target = target,
278 | value = values
279 | ))])
280 |
281 |     fig.update_layout(title_text=title_text, font_family="Arial", font_size=10, width=width, height=height)
282 | fig.show()
283 | return
284 |
285 |
286 |
287 |
--------------------------------------------------------------------------------
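The aggregation performed by `coarse_matching` can be illustrated on a toy example. This standalone sketch of ours takes plain label arrays instead of AnnData objects, but mirrors the same block-wise mean/sum over the matching matrix:

```python
import numpy as np
import pandas as pd

def coarse_from_labels(ot, de_labels, ref_labels, scaling=1.0, mode='mean'):
    """Collapse a (cells x cells) matching matrix `ot` into a
    (cluster x cluster) table, as coarse_matching does, taking
    label arrays directly instead of AnnData .obs columns."""
    de_labels = np.asarray(de_labels)
    ref_labels = np.asarray(ref_labels)
    rows = sorted(set(de_labels))
    cols = sorted(set(ref_labels))
    coarse = pd.DataFrame(index=rows, columns=cols, dtype=float)
    for i in rows:
        for j in cols:
            # Sub-block of ot restricted to cluster pair (i, j).
            block = ot[np.ix_(de_labels == i, ref_labels == j)]
            agg = block.mean() if mode == 'mean' else block.sum()
            coarse.loc[i, j] = agg * scaling
    return coarse
```

With `mode='sum'` each entry is the total matched mass between the two clusters; `mode='mean'` normalizes by the block size, as in the function above.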
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=42"]
3 | build-backend = "setuptools.build_meta"
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = cinemaot
3 | version = 0.0.4
4 | author = Mingze Dong
5 | author_email = mingze.dong@yale.edu
6 | description = Causal INdependent Effect Module Attribution + Optimal Transport
7 | long_description = file: Readme.md
8 | long_description_content_type = text/markdown
9 | url = https://github.com/vandijklab/CINEMA-OT
10 | project_urls =
11 | Bug Tracker = https://github.com/vandijklab/CINEMA-OT/issues
12 | classifiers =
13 | Programming Language :: Python :: 3
14 | Development Status :: 2 - Pre-Alpha
15 | Operating System :: OS Independent
16 |
17 | [options]
18 | package_dir =
19 | packages = find:
20 | python_requires = >=3.7
21 | install_requires =
22 | numpy
23 | pandas
24 | scanpy
25 | scikit-learn
26 | scipy
27 | statsmodels
28 | anndata
29 |
30 | [options.packages.find]
31 | where =
32 |
33 |
--------------------------------------------------------------------------------
/simulation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scanpy as sc
3 | import matplotlib.pyplot as plt
4 | from scsim_master import scsim
5 | import pandas as pd
6 |
7 | import random
8 |
9 | def numbers_with_sum(n, k):
10 | """n numbers with sum k"""
11 | if n == 1:
12 | return [k]
13 | num = random.randint(0, k)
14 | return [num] + numbers_with_sum(n - 1, k - num)
15 |
16 | random.seed(0)
17 | np.random.seed(0)
18 | for i in range(15):
19 | states_num = round(i/5) + 2
20 | gp = numbers_with_sum(states_num, 10-states_num)
21 | simulator = scsim.scsim(ngenes=1000, ncells=5000, seed = i, ngroups=states_num, libloc=7.64, libscale=0.78,
22 | mean_rate=7.68,mean_shape=0.34, expoutprob=0.00286,
23 | expoutloc=6.15, expoutscale=0.49,
24 | diffexpprob=.5, diffexpdownprob=.5, diffexploc=1, diffexpscale=1,
25 | bcv_dispersion=0.448, bcv_dof=22.087, ndoublets=0,groupprob=(np.array(gp)+1)/10,proggoups=[1,2],nproggenes=500,
26 | progdeloc=1,progdescale=1,progdownprob=0.,progcellfrac = 1.)
27 |
28 | simulator.simulate()
29 | tmpobs = simulator.cellparams
30 | ## "Groups" represent the treatment variable
31 | tmpobs['Groups'] = 0
32 | tmpobs['Response_state'] = 0
33 | response_num = round(i/5) + 1
34 | attribution_matrix = np.zeros([states_num,response_num])
35 | simulator2_counts = simulator.counts.iloc[:,0:500].copy()
36 | for j in range(states_num):
37 | ncells_j = np.sum(simulator.cellparams['group']==(j+1))
38 |
39 | group = np.random.randint(0,2,size=ncells_j)
40 |
41 | gp2 = np.zeros(response_num+1) + 0.5
42 | gp2[1:] = (np.array(numbers_with_sum(response_num, 5)))/10
43 |
44 | simulator2 = scsim.scsim(ngenes=500, ncells=ncells_j, seed = 300, ngroups=response_num+1, libloc=7.64, libscale=0.78,
45 | mean_rate=7.68,mean_shape=0.34, expoutprob=0.00286,
46 | expoutloc=6.15, expoutscale=0.49,
47 | diffexpprob=.5, diffexpdownprob=.5, diffexploc=1, diffexpscale=1,
48 | bcv_dispersion=0.148, bcv_dof=22.087, ndoublets=0,groupprob=gp2,nproggenes=0,
49 | progdeloc=1,progdescale=1,progdownprob=0.,progcellfrac = 1.)
50 |
51 | attribution_matrix[j,:] = 2 * gp2[1:]
52 | simulator2.simulate()
53 | ## group==1 is assigned as control, set the rest as perturbed
54 | tmpobs['Groups'][simulator.cellparams['group']==(j+1)] = (simulator2.cellparams['group'].values > 1) * 1 + 1
55 | tmpobs['Response_state'][simulator.cellparams['group']==(j+1)] = simulator2.cellparams['group'].values
56 | simulator2_counts.loc[simulator.cellparams['group']==(j+1),:] = simulator2.counts.values
57 | ## in the final anndata, 'group' represents cell state / type, 'Groups' represents treated or not, 'Response_state' indicates response heterogeneity
58 | adata = sc.AnnData(pd.concat([simulator.counts, simulator2_counts], axis=1),obs=tmpobs)
59 | adata.uns['attribution'] = attribution_matrix
60 | adata.write('ScsimBenchmarkData/adata'+str(i)+'.h5ad')
--------------------------------------------------------------------------------
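A quick sanity check of the group-probability construction in simulation.py: `numbers_with_sum(states_num, 10 - states_num)` returns `states_num` non-negative integers summing to `10 - states_num`, so `(np.array(gp) + 1) / 10` always yields `states_num` probabilities summing to exactly 1, as scsim's `groupprob` requires:

```python
import random
import numpy as np

def numbers_with_sum(n, k):
    """n non-negative integers summing to k (same recursive helper
    as in simulation.py; not uniform over compositions)."""
    if n == 1:
        return [k]
    num = random.randint(0, k)
    return [num] + numbers_with_sum(n - 1, k - num)

# e.g. with states_num = 3: gp sums to 7, so (gp + 1) / 10 sums to 1.
random.seed(0)
gp = numbers_with_sum(3, 7)
probs = (np.array(gp) + 1) / 10
assert abs(probs.sum() - 1.0) < 1e-12
```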