├── .gitignore
├── LICENSE
├── Readme.md
├── cinemaot
    ├── __init__.py
    ├── benchmark.py
    ├── cinemaot.py
    ├── sinkhorn_knopp.py
    └── utils.py
├── cinemaot_tutorial.ipynb
├── pyproject.toml
├── setup.cfg
└── simulation.py


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | __pycache__
3 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU AFFERO GENERAL PUBLIC LICENSE
  2 |                        Version 3, 19 November 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU Affero General Public License is a free, copyleft license for
 11 | software and other kinds of works, specifically designed to ensure
 12 | cooperation with the community in the case of network server software.
 13 | 
 14 |   The licenses for most software and other practical works are designed
 15 | to take away your freedom to share and change the works.  By contrast,
 16 | our General Public Licenses are intended to guarantee your freedom to
 17 | share and change all versions of a program--to make sure it remains free
 18 | software for all its users.
 19 | 
 20 |   When we speak of free software, we are referring to freedom, not
 21 | price.  Our General Public Licenses are designed to make sure that you
 22 | have the freedom to distribute copies of free software (and charge for
 23 | them if you wish), that you receive source code or can get it if you
 24 | want it, that you can change the software or use pieces of it in new
 25 | free programs, and that you know you can do these things.
 26 | 
 27 |   Developers that use our General Public Licenses protect your rights
 28 | with two steps: (1) assert copyright on the software, and (2) offer
 29 | you this License which gives you legal permission to copy, distribute
 30 | and/or modify the software.
 31 | 
 32 |   A secondary benefit of defending all users' freedom is that
 33 | improvements made in alternate versions of the program, if they
 34 | receive widespread use, become available for other developers to
 35 | incorporate.  Many developers of free software are heartened and
 36 | encouraged by the resulting cooperation.  However, in the case of
 37 | software used on network servers, this result may fail to come about.
 38 | The GNU General Public License permits making a modified version and
 39 | letting the public access it on a server without ever releasing its
 40 | source code to the public.
 41 | 
 42 |   The GNU Affero General Public License is designed specifically to
 43 | ensure that, in such cases, the modified source code becomes available
 44 | to the community.  It requires the operator of a network server to
 45 | provide the source code of the modified version running there to the
 46 | users of that server.  Therefore, public use of a modified version, on
 47 | a publicly accessible server, gives the public access to the source
 48 | code of the modified version.
 49 | 
 50 |   An older license, called the Affero General Public License and
 51 | published by Affero, was designed to accomplish similar goals.  This is
 52 | a different license, not a version of the Affero GPL, but Affero has
 53 | released a new version of the Affero GPL which permits relicensing under
 54 | this license.
 55 | 
 56 |   The precise terms and conditions for copying, distribution and
 57 | modification follow.
 58 | 
 59 |                        TERMS AND CONDITIONS
 60 | 
 61 |   0. Definitions.
 62 | 
 63 |   "This License" refers to version 3 of the GNU Affero General Public License.
 64 | 
 65 |   "Copyright" also means copyright-like laws that apply to other kinds of
 66 | works, such as semiconductor masks.
 67 | 
 68 |   "The Program" refers to any copyrightable work licensed under this
 69 | License.  Each licensee is addressed as "you".  "Licensees" and
 70 | "recipients" may be individuals or organizations.
 71 | 
 72 |   To "modify" a work means to copy from or adapt all or part of the work
 73 | in a fashion requiring copyright permission, other than the making of an
 74 | exact copy.  The resulting work is called a "modified version" of the
 75 | earlier work or a work "based on" the earlier work.
 76 | 
 77 |   A "covered work" means either the unmodified Program or a work based
 78 | on the Program.
 79 | 
 80 |   To "propagate" a work means to do anything with it that, without
 81 | permission, would make you directly or secondarily liable for
 82 | infringement under applicable copyright law, except executing it on a
 83 | computer or modifying a private copy.  Propagation includes copying,
 84 | distribution (with or without modification), making available to the
 85 | public, and in some countries other activities as well.
 86 | 
 87 |   To "convey" a work means any kind of propagation that enables other
 88 | parties to make or receive copies.  Mere interaction with a user through
 89 | a computer network, with no transfer of a copy, is not conveying.
 90 | 
 91 |   An interactive user interface displays "Appropriate Legal Notices"
 92 | to the extent that it includes a convenient and prominently visible
 93 | feature that (1) displays an appropriate copyright notice, and (2)
 94 | tells the user that there is no warranty for the work (except to the
 95 | extent that warranties are provided), that licensees may convey the
 96 | work under this License, and how to view a copy of this License.  If
 97 | the interface presents a list of user commands or options, such as a
 98 | menu, a prominent item in the list meets this criterion.
 99 | 
100 |   1. Source Code.
101 | 
102 |   The "source code" for a work means the preferred form of the work
103 | for making modifications to it.  "Object code" means any non-source
104 | form of a work.
105 | 
106 |   A "Standard Interface" means an interface that either is an official
107 | standard defined by a recognized standards body, or, in the case of
108 | interfaces specified for a particular programming language, one that
109 | is widely used among developers working in that language.
110 | 
111 |   The "System Libraries" of an executable work include anything, other
112 | than the work as a whole, that (a) is included in the normal form of
113 | packaging a Major Component, but which is not part of that Major
114 | Component, and (b) serves only to enable use of the work with that
115 | Major Component, or to implement a Standard Interface for which an
116 | implementation is available to the public in source code form.  A
117 | "Major Component", in this context, means a major essential component
118 | (kernel, window system, and so on) of the specific operating system
119 | (if any) on which the executable work runs, or a compiler used to
120 | produce the work, or an object code interpreter used to run it.
121 | 
122 |   The "Corresponding Source" for a work in object code form means all
123 | the source code needed to generate, install, and (for an executable
124 | work) run the object code and to modify the work, including scripts to
125 | control those activities.  However, it does not include the work's
126 | System Libraries, or general-purpose tools or generally available free
127 | programs which are used unmodified in performing those activities but
128 | which are not part of the work.  For example, Corresponding Source
129 | includes interface definition files associated with source files for
130 | the work, and the source code for shared libraries and dynamically
131 | linked subprograms that the work is specifically designed to require,
132 | such as by intimate data communication or control flow between those
133 | subprograms and other parts of the work.
134 | 
135 |   The Corresponding Source need not include anything that users
136 | can regenerate automatically from other parts of the Corresponding
137 | Source.
138 | 
139 |   The Corresponding Source for a work in source code form is that
140 | same work.
141 | 
142 |   2. Basic Permissions.
143 | 
144 |   All rights granted under this License are granted for the term of
145 | copyright on the Program, and are irrevocable provided the stated
146 | conditions are met.  This License explicitly affirms your unlimited
147 | permission to run the unmodified Program.  The output from running a
148 | covered work is covered by this License only if the output, given its
149 | content, constitutes a covered work.  This License acknowledges your
150 | rights of fair use or other equivalent, as provided by copyright law.
151 | 
152 |   You may make, run and propagate covered works that you do not
153 | convey, without conditions so long as your license otherwise remains
154 | in force.  You may convey covered works to others for the sole purpose
155 | of having them make modifications exclusively for you, or provide you
156 | with facilities for running those works, provided that you comply with
157 | the terms of this License in conveying all material for which you do
158 | not control copyright.  Those thus making or running the covered works
159 | for you must do so exclusively on your behalf, under your direction
160 | and control, on terms that prohibit them from making any copies of
161 | your copyrighted material outside their relationship with you.
162 | 
163 |   Conveying under any other circumstances is permitted solely under
164 | the conditions stated below.  Sublicensing is not allowed; section 10
165 | makes it unnecessary.
166 | 
167 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
168 | 
169 |   No covered work shall be deemed part of an effective technological
170 | measure under any applicable law fulfilling obligations under article
171 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
172 | similar laws prohibiting or restricting circumvention of such
173 | measures.
174 | 
175 |   When you convey a covered work, you waive any legal power to forbid
176 | circumvention of technological measures to the extent such circumvention
177 | is effected by exercising rights under this License with respect to
178 | the covered work, and you disclaim any intention to limit operation or
179 | modification of the work as a means of enforcing, against the work's
180 | users, your or third parties' legal rights to forbid circumvention of
181 | technological measures.
182 | 
183 |   4. Conveying Verbatim Copies.
184 | 
185 |   You may convey verbatim copies of the Program's source code as you
186 | receive it, in any medium, provided that you conspicuously and
187 | appropriately publish on each copy an appropriate copyright notice;
188 | keep intact all notices stating that this License and any
189 | non-permissive terms added in accord with section 7 apply to the code;
190 | keep intact all notices of the absence of any warranty; and give all
191 | recipients a copy of this License along with the Program.
192 | 
193 |   You may charge any price or no price for each copy that you convey,
194 | and you may offer support or warranty protection for a fee.
195 | 
196 |   5. Conveying Modified Source Versions.
197 | 
198 |   You may convey a work based on the Program, or the modifications to
199 | produce it from the Program, in the form of source code under the
200 | terms of section 4, provided that you also meet all of these conditions:
201 | 
202 |     a) The work must carry prominent notices stating that you modified
203 |     it, and giving a relevant date.
204 | 
205 |     b) The work must carry prominent notices stating that it is
206 |     released under this License and any conditions added under section
207 |     7.  This requirement modifies the requirement in section 4 to
208 |     "keep intact all notices".
209 | 
210 |     c) You must license the entire work, as a whole, under this
211 |     License to anyone who comes into possession of a copy.  This
212 |     License will therefore apply, along with any applicable section 7
213 |     additional terms, to the whole of the work, and all its parts,
214 |     regardless of how they are packaged.  This License gives no
215 |     permission to license the work in any other way, but it does not
216 |     invalidate such permission if you have separately received it.
217 | 
218 |     d) If the work has interactive user interfaces, each must display
219 |     Appropriate Legal Notices; however, if the Program has interactive
220 |     interfaces that do not display Appropriate Legal Notices, your
221 |     work need not make them do so.
222 | 
223 |   A compilation of a covered work with other separate and independent
224 | works, which are not by their nature extensions of the covered work,
225 | and which are not combined with it such as to form a larger program,
226 | in or on a volume of a storage or distribution medium, is called an
227 | "aggregate" if the compilation and its resulting copyright are not
228 | used to limit the access or legal rights of the compilation's users
229 | beyond what the individual works permit.  Inclusion of a covered work
230 | in an aggregate does not cause this License to apply to the other
231 | parts of the aggregate.
232 | 
233 |   6. Conveying Non-Source Forms.
234 | 
235 |   You may convey a covered work in object code form under the terms
236 | of sections 4 and 5, provided that you also convey the
237 | machine-readable Corresponding Source under the terms of this License,
238 | in one of these ways:
239 | 
240 |     a) Convey the object code in, or embodied in, a physical product
241 |     (including a physical distribution medium), accompanied by the
242 |     Corresponding Source fixed on a durable physical medium
243 |     customarily used for software interchange.
244 | 
245 |     b) Convey the object code in, or embodied in, a physical product
246 |     (including a physical distribution medium), accompanied by a
247 |     written offer, valid for at least three years and valid for as
248 |     long as you offer spare parts or customer support for that product
249 |     model, to give anyone who possesses the object code either (1) a
250 |     copy of the Corresponding Source for all the software in the
251 |     product that is covered by this License, on a durable physical
252 |     medium customarily used for software interchange, for a price no
253 |     more than your reasonable cost of physically performing this
254 |     conveying of source, or (2) access to copy the
255 |     Corresponding Source from a network server at no charge.
256 | 
257 |     c) Convey individual copies of the object code with a copy of the
258 |     written offer to provide the Corresponding Source.  This
259 |     alternative is allowed only occasionally and noncommercially, and
260 |     only if you received the object code with such an offer, in accord
261 |     with subsection 6b.
262 | 
263 |     d) Convey the object code by offering access from a designated
264 |     place (gratis or for a charge), and offer equivalent access to the
265 |     Corresponding Source in the same way through the same place at no
266 |     further charge.  You need not require recipients to copy the
267 |     Corresponding Source along with the object code.  If the place to
268 |     copy the object code is a network server, the Corresponding Source
269 |     may be on a different server (operated by you or a third party)
270 |     that supports equivalent copying facilities, provided you maintain
271 |     clear directions next to the object code saying where to find the
272 |     Corresponding Source.  Regardless of what server hosts the
273 |     Corresponding Source, you remain obligated to ensure that it is
274 |     available for as long as needed to satisfy these requirements.
275 | 
276 |     e) Convey the object code using peer-to-peer transmission, provided
277 |     you inform other peers where the object code and Corresponding
278 |     Source of the work are being offered to the general public at no
279 |     charge under subsection 6d.
280 | 
281 |   A separable portion of the object code, whose source code is excluded
282 | from the Corresponding Source as a System Library, need not be
283 | included in conveying the object code work.
284 | 
285 |   A "User Product" is either (1) a "consumer product", which means any
286 | tangible personal property which is normally used for personal, family,
287 | or household purposes, or (2) anything designed or sold for incorporation
288 | into a dwelling.  In determining whether a product is a consumer product,
289 | doubtful cases shall be resolved in favor of coverage.  For a particular
290 | product received by a particular user, "normally used" refers to a
291 | typical or common use of that class of product, regardless of the status
292 | of the particular user or of the way in which the particular user
293 | actually uses, or expects or is expected to use, the product.  A product
294 | is a consumer product regardless of whether the product has substantial
295 | commercial, industrial or non-consumer uses, unless such uses represent
296 | the only significant mode of use of the product.
297 | 
298 |   "Installation Information" for a User Product means any methods,
299 | procedures, authorization keys, or other information required to install
300 | and execute modified versions of a covered work in that User Product from
301 | a modified version of its Corresponding Source.  The information must
302 | suffice to ensure that the continued functioning of the modified object
303 | code is in no case prevented or interfered with solely because
304 | modification has been made.
305 | 
306 |   If you convey an object code work under this section in, or with, or
307 | specifically for use in, a User Product, and the conveying occurs as
308 | part of a transaction in which the right of possession and use of the
309 | User Product is transferred to the recipient in perpetuity or for a
310 | fixed term (regardless of how the transaction is characterized), the
311 | Corresponding Source conveyed under this section must be accompanied
312 | by the Installation Information.  But this requirement does not apply
313 | if neither you nor any third party retains the ability to install
314 | modified object code on the User Product (for example, the work has
315 | been installed in ROM).
316 | 
317 |   The requirement to provide Installation Information does not include a
318 | requirement to continue to provide support service, warranty, or updates
319 | for a work that has been modified or installed by the recipient, or for
320 | the User Product in which it has been modified or installed.  Access to a
321 | network may be denied when the modification itself materially and
322 | adversely affects the operation of the network or violates the rules and
323 | protocols for communication across the network.
324 | 
325 |   Corresponding Source conveyed, and Installation Information provided,
326 | in accord with this section must be in a format that is publicly
327 | documented (and with an implementation available to the public in
328 | source code form), and must require no special password or key for
329 | unpacking, reading or copying.
330 | 
331 |   7. Additional Terms.
332 | 
333 |   "Additional permissions" are terms that supplement the terms of this
334 | License by making exceptions from one or more of its conditions.
335 | Additional permissions that are applicable to the entire Program shall
336 | be treated as though they were included in this License, to the extent
337 | that they are valid under applicable law.  If additional permissions
338 | apply only to part of the Program, that part may be used separately
339 | under those permissions, but the entire Program remains governed by
340 | this License without regard to the additional permissions.
341 | 
342 |   When you convey a copy of a covered work, you may at your option
343 | remove any additional permissions from that copy, or from any part of
344 | it.  (Additional permissions may be written to require their own
345 | removal in certain cases when you modify the work.)  You may place
346 | additional permissions on material, added by you to a covered work,
347 | for which you have or can give appropriate copyright permission.
348 | 
349 |   Notwithstanding any other provision of this License, for material you
350 | add to a covered work, you may (if authorized by the copyright holders of
351 | that material) supplement the terms of this License with terms:
352 | 
353 |     a) Disclaiming warranty or limiting liability differently from the
354 |     terms of sections 15 and 16 of this License; or
355 | 
356 |     b) Requiring preservation of specified reasonable legal notices or
357 |     author attributions in that material or in the Appropriate Legal
358 |     Notices displayed by works containing it; or
359 | 
360 |     c) Prohibiting misrepresentation of the origin of that material, or
361 |     requiring that modified versions of such material be marked in
362 |     reasonable ways as different from the original version; or
363 | 
364 |     d) Limiting the use for publicity purposes of names of licensors or
365 |     authors of the material; or
366 | 
367 |     e) Declining to grant rights under trademark law for use of some
368 |     trade names, trademarks, or service marks; or
369 | 
370 |     f) Requiring indemnification of licensors and authors of that
371 |     material by anyone who conveys the material (or modified versions of
372 |     it) with contractual assumptions of liability to the recipient, for
373 |     any liability that these contractual assumptions directly impose on
374 |     those licensors and authors.
375 | 
376 |   All other non-permissive additional terms are considered "further
377 | restrictions" within the meaning of section 10.  If the Program as you
378 | received it, or any part of it, contains a notice stating that it is
379 | governed by this License along with a term that is a further
380 | restriction, you may remove that term.  If a license document contains
381 | a further restriction but permits relicensing or conveying under this
382 | License, you may add to a covered work material governed by the terms
383 | of that license document, provided that the further restriction does
384 | not survive such relicensing or conveying.
385 | 
386 |   If you add terms to a covered work in accord with this section, you
387 | must place, in the relevant source files, a statement of the
388 | additional terms that apply to those files, or a notice indicating
389 | where to find the applicable terms.
390 | 
391 |   Additional terms, permissive or non-permissive, may be stated in the
392 | form of a separately written license, or stated as exceptions;
393 | the above requirements apply either way.
394 | 
395 |   8. Termination.
396 | 
397 |   You may not propagate or modify a covered work except as expressly
398 | provided under this License.  Any attempt otherwise to propagate or
399 | modify it is void, and will automatically terminate your rights under
400 | this License (including any patent licenses granted under the third
401 | paragraph of section 11).
402 | 
403 |   However, if you cease all violation of this License, then your
404 | license from a particular copyright holder is reinstated (a)
405 | provisionally, unless and until the copyright holder explicitly and
406 | finally terminates your license, and (b) permanently, if the copyright
407 | holder fails to notify you of the violation by some reasonable means
408 | prior to 60 days after the cessation.
409 | 
410 |   Moreover, your license from a particular copyright holder is
411 | reinstated permanently if the copyright holder notifies you of the
412 | violation by some reasonable means, this is the first time you have
413 | received notice of violation of this License (for any work) from that
414 | copyright holder, and you cure the violation prior to 30 days after
415 | your receipt of the notice.
416 | 
417 |   Termination of your rights under this section does not terminate the
418 | licenses of parties who have received copies or rights from you under
419 | this License.  If your rights have been terminated and not permanently
420 | reinstated, you do not qualify to receive new licenses for the same
421 | material under section 10.
422 | 
423 |   9. Acceptance Not Required for Having Copies.
424 | 
425 |   You are not required to accept this License in order to receive or
426 | run a copy of the Program.  Ancillary propagation of a covered work
427 | occurring solely as a consequence of using peer-to-peer transmission
428 | to receive a copy likewise does not require acceptance.  However,
429 | nothing other than this License grants you permission to propagate or
430 | modify any covered work.  These actions infringe copyright if you do
431 | not accept this License.  Therefore, by modifying or propagating a
432 | covered work, you indicate your acceptance of this License to do so.
433 | 
434 |   10. Automatic Licensing of Downstream Recipients.
435 | 
436 |   Each time you convey a covered work, the recipient automatically
437 | receives a license from the original licensors, to run, modify and
438 | propagate that work, subject to this License.  You are not responsible
439 | for enforcing compliance by third parties with this License.
440 | 
441 |   An "entity transaction" is a transaction transferring control of an
442 | organization, or substantially all assets of one, or subdividing an
443 | organization, or merging organizations.  If propagation of a covered
444 | work results from an entity transaction, each party to that
445 | transaction who receives a copy of the work also receives whatever
446 | licenses to the work the party's predecessor in interest had or could
447 | give under the previous paragraph, plus a right to possession of the
448 | Corresponding Source of the work from the predecessor in interest, if
449 | the predecessor has it or can get it with reasonable efforts.
450 | 
451 |   You may not impose any further restrictions on the exercise of the
452 | rights granted or affirmed under this License.  For example, you may
453 | not impose a license fee, royalty, or other charge for exercise of
454 | rights granted under this License, and you may not initiate litigation
455 | (including a cross-claim or counterclaim in a lawsuit) alleging that
456 | any patent claim is infringed by making, using, selling, offering for
457 | sale, or importing the Program or any portion of it.
458 | 
459 |   11. Patents.
460 | 
461 |   A "contributor" is a copyright holder who authorizes use under this
462 | License of the Program or a work on which the Program is based.  The
463 | work thus licensed is called the contributor's "contributor version".
464 | 
465 |   A contributor's "essential patent claims" are all patent claims
466 | owned or controlled by the contributor, whether already acquired or
467 | hereafter acquired, that would be infringed by some manner, permitted
468 | by this License, of making, using, or selling its contributor version,
469 | but do not include claims that would be infringed only as a
470 | consequence of further modification of the contributor version.  For
471 | purposes of this definition, "control" includes the right to grant
472 | patent sublicenses in a manner consistent with the requirements of
473 | this License.
474 | 
475 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
476 | patent license under the contributor's essential patent claims, to
477 | make, use, sell, offer for sale, import and otherwise run, modify and
478 | propagate the contents of its contributor version.
479 | 
480 |   In the following three paragraphs, a "patent license" is any express
481 | agreement or commitment, however denominated, not to enforce a patent
482 | (such as an express permission to practice a patent or covenant not to
483 | sue for patent infringement).  To "grant" such a patent license to a
484 | party means to make such an agreement or commitment not to enforce a
485 | patent against the party.
486 | 
487 |   If you convey a covered work, knowingly relying on a patent license,
488 | and the Corresponding Source of the work is not available for anyone
489 | to copy, free of charge and under the terms of this License, through a
490 | publicly available network server or other readily accessible means,
491 | then you must either (1) cause the Corresponding Source to be so
492 | available, or (2) arrange to deprive yourself of the benefit of the
493 | patent license for this particular work, or (3) arrange, in a manner
494 | consistent with the requirements of this License, to extend the patent
495 | license to downstream recipients.  "Knowingly relying" means you have
496 | actual knowledge that, but for the patent license, your conveying the
497 | covered work in a country, or your recipient's use of the covered work
498 | in a country, would infringe one or more identifiable patents in that
499 | country that you have reason to believe are valid.
500 | 
501 |   If, pursuant to or in connection with a single transaction or
502 | arrangement, you convey, or propagate by procuring conveyance of, a
503 | covered work, and grant a patent license to some of the parties
504 | receiving the covered work authorizing them to use, propagate, modify
505 | or convey a specific copy of the covered work, then the patent license
506 | you grant is automatically extended to all recipients of the covered
507 | work and works based on it.
508 | 
509 |   A patent license is "discriminatory" if it does not include within
510 | the scope of its coverage, prohibits the exercise of, or is
511 | conditioned on the non-exercise of one or more of the rights that are
512 | specifically granted under this License.  You may not convey a covered
513 | work if you are a party to an arrangement with a third party that is
514 | in the business of distributing software, under which you make payment
515 | to the third party based on the extent of your activity of conveying
516 | the work, and under which the third party grants, to any of the
517 | parties who would receive the covered work from you, a discriminatory
518 | patent license (a) in connection with copies of the covered work
519 | conveyed by you (or copies made from those copies), or (b) primarily
520 | for and in connection with specific products or compilations that
521 | contain the covered work, unless you entered into that arrangement,
522 | or that patent license was granted, prior to 28 March 2007.
523 | 
524 |   Nothing in this License shall be construed as excluding or limiting
525 | any implied license or other defenses to infringement that may
526 | otherwise be available to you under applicable patent law.
527 | 
528 |   12. No Surrender of Others' Freedom.
529 | 
530 |   If conditions are imposed on you (whether by court order, agreement or
531 | otherwise) that contradict the conditions of this License, they do not
532 | excuse you from the conditions of this License.  If you cannot convey a
533 | covered work so as to satisfy simultaneously your obligations under this
534 | License and any other pertinent obligations, then as a consequence you may
535 | not convey it at all.  For example, if you agree to terms that obligate you
536 | to collect a royalty for further conveying from those to whom you convey
537 | the Program, the only way you could satisfy both those terms and this
538 | License would be to refrain entirely from conveying the Program.
539 | 
540 |   13. Remote Network Interaction; Use with the GNU General Public License.
541 | 
542 |   Notwithstanding any other provision of this License, if you modify the
543 | Program, your modified version must prominently offer all users
544 | interacting with it remotely through a computer network (if your version
545 | supports such interaction) an opportunity to receive the Corresponding
546 | Source of your version by providing access to the Corresponding Source
547 | from a network server at no charge, through some standard or customary
548 | means of facilitating copying of software.  This Corresponding Source
549 | shall include the Corresponding Source for any work covered by version 3
550 | of the GNU General Public License that is incorporated pursuant to the
551 | following paragraph.
552 | 
553 |   Notwithstanding any other provision of this License, you have
554 | permission to link or combine any covered work with a work licensed
555 | under version 3 of the GNU General Public License into a single
556 | combined work, and to convey the resulting work.  The terms of this
557 | License will continue to apply to the part which is the covered work,
558 | but the work with which it is combined will remain governed by version
559 | 3 of the GNU General Public License.
560 | 
561 |   14. Revised Versions of this License.
562 | 
563 |   The Free Software Foundation may publish revised and/or new versions of
564 | the GNU Affero General Public License from time to time.  Such new versions
565 | will be similar in spirit to the present version, but may differ in detail to
566 | address new problems or concerns.
567 | 
568 |   Each version is given a distinguishing version number.  If the
569 | Program specifies that a certain numbered version of the GNU Affero General
570 | Public License "or any later version" applies to it, you have the
571 | option of following the terms and conditions either of that numbered
572 | version or of any later version published by the Free Software
573 | Foundation.  If the Program does not specify a version number of the
574 | GNU Affero General Public License, you may choose any version ever published
575 | by the Free Software Foundation.
576 | 
577 |   If the Program specifies that a proxy can decide which future
578 | versions of the GNU Affero General Public License can be used, that proxy's
579 | public statement of acceptance of a version permanently authorizes you
580 | to choose that version for the Program.
581 | 
582 |   Later license versions may give you additional or different
583 | permissions.  However, no additional obligations are imposed on any
584 | author or copyright holder as a result of your choosing to follow a
585 | later version.
586 | 
587 |   15. Disclaimer of Warranty.
588 | 
589 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
590 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
591 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
592 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
593 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
594 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
595 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
596 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
597 | 
598 |   16. Limitation of Liability.
599 | 
600 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
601 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
602 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
603 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
604 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
605 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
606 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
607 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
608 | SUCH DAMAGES.
609 | 
610 |   17. Interpretation of Sections 15 and 16.
611 | 
612 |   If the disclaimer of warranty and limitation of liability provided
613 | above cannot be given local legal effect according to their terms,
614 | reviewing courts shall apply local law that most closely approximates
615 | an absolute waiver of all civil liability in connection with the
616 | Program, unless a warranty or assumption of liability accompanies a
617 | copy of the Program in return for a fee.
618 | 
619 |                      END OF TERMS AND CONDITIONS
620 | 
621 |             How to Apply These Terms to Your New Programs
622 | 
623 |   If you develop a new program, and you want it to be of the greatest
624 | possible use to the public, the best way to achieve this is to make it
625 | free software which everyone can redistribute and change under these terms.
626 | 
627 |   To do so, attach the following notices to the program.  It is safest
628 | to attach them to the start of each source file to most effectively
629 | state the exclusion of warranty; and each file should have at least
630 | the "copyright" line and a pointer to where the full notice is found.
631 | 
632 |     <one line to give the program's name and a brief idea of what it does.>
633 |     Copyright (C) <year>  <name of author>
634 | 
635 |     This program is free software: you can redistribute it and/or modify
636 |     it under the terms of the GNU Affero General Public License as published by
637 |     the Free Software Foundation, either version 3 of the License, or
638 |     (at your option) any later version.
639 | 
640 |     This program is distributed in the hope that it will be useful,
641 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
642 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
643 |     GNU Affero General Public License for more details.
644 | 
645 |     You should have received a copy of the GNU Affero General Public License
646 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
647 | 
648 | Also add information on how to contact you by electronic and paper mail.
649 | 
650 |   If your software can interact with users remotely through a computer
651 | network, you should also make sure that it provides a way for users to
652 | get its source.  For example, if your program is a web application, its
653 | interface could display a "Source" link that leads users to an archive
654 | of the code.  There are many ways you could offer source, and different
655 | solutions will be better for different programs; see section 13 for the
656 | specific requirements.
657 | 
658 |   You should also get your employer (if you work as a programmer) or school,
659 | if any, to sign a "copyright disclaimer" for the program, if necessary.
660 | For more information on this, and how to apply and follow the GNU AGPL, see
661 | <https://www.gnu.org/licenses/>.


--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
 1 | # Causal INdependent Effect Module Attribution + Optimal Transport (CINEMA-OT)
 2 | 
 3 | CINEMA-OT is a **causal** framework for perturbation effect analysis to identify **individual treatment effects** and **synergy** at the **single cell** level. 
 4 | 
 5 | **Note**: Newer versions of CINEMA-OT are maintained at [Pertpy](https://github.com/scverse/pertpy).
 6 | 
 7 | ## Architecture
 8 | 
 9 | <img width="1460" alt="image" src="https://user-images.githubusercontent.com/68533876/228745549-8328ea36-25c6-4665-9c68-bab1e1a78ef9.png">
10 | 
11 | 
12 | Read our preprint on bioRxiv:
13 | 
14 | - Dong, Mingze, et al. "Causal identification of single-cell experimental perturbation effects with CINEMA-OT". bioRxiv (2022).
15 | [https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3)
16 | 
17 | ## System requirements
18 | ### Hardware requirements
19 | `CINEMA-OT` requires only a standard computer with enough RAM to perform in-memory computations.
20 | ### OS requirements
21 | The `CINEMA-OT` package should in principle run on any operating system. It has been tested on the following systems:
22 | * macOS: Monterey (12.4)
23 | * Linux: RHEL Maipo (7.9), Ubuntu (18.04)
24 | ### Dependencies
25 | See `setup.cfg` for details.
26 | 
27 | ## Installation
28 | CINEMA-OT requires `python` version 3.7+.  Install directly from pip with:
29 | 
30 |     pip install cinemaot
31 | 
32 | The installation should take no more than a few minutes on a normal desktop computer.
33 | 
34 | 
35 | ## Usage
36 | 
37 | For detailed usage, follow our step-by-step tutorial here:
38 | 
39 | - [Getting Started with CINEMA-OT](https://github.com/vandijklab/CINEMA-OT/blob/main/cinemaot_tutorial.ipynb)
40 | 
41 | Download the data used for the tutorial here:
42 | 
43 | - [Ex vivo stimulation of human peripheral blood mononuclear cells (PBMC) with interferon](https://drive.google.com/file/d/1A3rNdgfiXFWhCUOoUfJ-AiY7AAOU0Ie3/view?usp=sharing)
44 | 
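For orientation, the matching step that `cinemaot_unweighted` applies after confounder selection can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `sketch_treatment_effect`, the toy data, the `smoothness` value and the fixed iteration count are all hypothetical, and a basic Sinkhorn loop stands in for the package's `SinkhornKnopp` class.

```python
import numpy as np

def sketch_treatment_effect(cf_expr, cf_ref, x_expr, x_ref, smoothness=0.5, n_iter=200):
    """Toy version of the OT matching step (hypothetical helper, not part of cinemaot)."""
    # Squared Euclidean distances between treated and control cells in confounder space
    d2 = ((cf_expr[:, None, :] - cf_ref[None, :, :]) ** 2).sum(axis=2)
    # Gibbs kernel; the bandwidth scales with the number of confounder components
    kernel = np.exp(-d2 / (smoothness * cf_expr.shape[1]))
    # Uniform marginals over treated (r) and control (c) cells
    r = np.full(cf_expr.shape[0], 1.0 / cf_expr.shape[0])
    c = np.full(cf_ref.shape[0], 1.0 / cf_ref.shape[0])
    u = np.ones_like(r)
    v = np.ones_like(c)
    # Plain Sinkhorn scaling with a fixed number of iterations (assumption)
    for _ in range(n_iter):
        u = r / (kernel @ v)
        v = c / (kernel.T @ u)
    # Transport plan with one row per control cell, shape (n_ref, n_expr)
    plan = (u[:, None] * kernel * v[None, :]).T
    # Row-normalize so each control cell gets a weighted average of treated cells
    weights = plan / plan.sum(axis=1, keepdims=True)
    # Per-cell difference: control expression minus its matched treated counterfactual
    return x_ref - weights @ x_expr, plan

# Toy data: 30 treated cells, 40 control cells, 3 confounder components, 5 genes
rng = np.random.default_rng(0)
cf_expr = rng.normal(size=(30, 3))
cf_ref = rng.normal(size=(40, 3))
x_expr = rng.normal(size=(30, 5))
x_ref = rng.normal(size=(40, 5))

te, plan = sketch_treatment_effect(cf_expr, cf_ref, x_expr, x_ref)
print(te.shape, plan.shape)
```

The returned matrix has one row per control cell, which matches the shape documented for the single-cell differential expression output in `cinemaot/cinemaot.py`.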


--------------------------------------------------------------------------------
/cinemaot/__init__.py:
--------------------------------------------------------------------------------
1 | """CINEMA-OT - Causal Independent Effect Module Attribution + Optimal Transport, for single-cell level treatment effect identification"""
2 | __version__ = "0.0.3"
3 | from . import cinemaot


--------------------------------------------------------------------------------
/cinemaot/benchmark.py:
--------------------------------------------------------------------------------
  1 | import scib
  2 | import numpy as np
  3 | import pandas as pd
  4 | import scanpy as sc
  5 | from sklearn.neighbors import NearestNeighbors
  6 | from scipy.sparse import csr_matrix
  7 | 
  8 | # In this newer version we use the Python implementation of xicor
  9 | # import rpy2.robjects as ro
 10 | # import rpy2.robjects.numpy2ri
 11 | # import rpy2.robjects.pandas2ri
 12 | # from rpy2.robjects.packages import importr
 13 | # rpy2.robjects.numpy2ri.activate()
 14 | # rpy2.robjects.pandas2ri.activate()
 15 | import scipy.stats as ss
 16 | from scipy.stats import pearsonr, spearmanr
 17 | from sklearn.decomposition import FastICA
 18 | from sklearn.metrics import roc_curve
 19 | from sklearn.metrics import auc
 20 | from sklearn.metrics import pairwise_distances
 21 | from . import sinkhorn_knopp as skp
 22 | 
 23 | from sklearn.preprocessing import OneHotEncoder
 24 | from scipy.stats import ttest_1samp
 25 | import harmonypy as hm
 26 | 
 27 | def mixscape(adata,obs_label, ref_label, expr_label, nn=20, return_te = True):
 28 |     X_pca1 = adata.obsm['X_pca'][adata.obs[obs_label]==expr_label,:]
 29 |     X_pca2 = adata.obsm['X_pca'][adata.obs[obs_label]==ref_label,:]
 30 |     nbrs = NearestNeighbors(n_neighbors=nn, algorithm='ball_tree').fit(X_pca1)
 31 |     mixscape_pca = adata.obsm['X_pca'].copy()
 32 |     mixscapematrix = nbrs.kneighbors_graph(X_pca2).toarray()
 33 |     mixscape_pca[adata.obs[obs_label]==ref_label,:] = np.dot(mixscapematrix, mixscape_pca[adata.obs[obs_label]==expr_label,:])/nn
 34 |     if return_te:
 35 |         te2 = adata.X[adata.obs[obs_label]==ref_label,:] - (mixscapematrix/np.sum(mixscapematrix,axis=1)[:,None]) @ (adata.X[adata.obs[obs_label]==expr_label,:])
 36 |         return mixscape_pca, mixscapematrix, te2
 37 |     else:
 38 |         return mixscape_pca, mixscapematrix
 39 | 
 40 | def harmony_mixscape(adata,obs_label, ref_label, expr_label,nn=20, return_te = True):
 41 |     meta_data = adata.obs
 42 |     data_mat=adata.obsm['X_pca']
 43 |     vars_use=[obs_label]
 44 |     ho = hm.run_harmony(data_mat, meta_data,vars_use)
 45 |     hmdata = ho.Z_corr.T
 46 |     X_pca1 = hmdata[adata.obs[obs_label]==expr_label,:]
 47 |     X_pca2 = hmdata[adata.obs[obs_label]==ref_label,:]
 48 |     nbrs = NearestNeighbors(n_neighbors=nn, algorithm='ball_tree').fit(X_pca1)
 49 |     hmmatrix = nbrs.kneighbors_graph(X_pca2).toarray()
 50 |     if return_te:
 51 |         te2 = adata.X[adata.obs[obs_label]==ref_label,:] - np.matmul(hmmatrix/np.sum(hmmatrix,axis=1)[:,None],adata.X[adata.obs[obs_label]==expr_label,:])
 52 |         return hmdata, hmmatrix, te2
 53 |     else:
 54 |         return hmdata, hmmatrix
 55 | 
 56 | def OT(adata,obs_label, ref_label, expr_label,thres=0.01, return_te = True):
 57 |     cf1 = adata.obsm['X_pca'][adata.obs[obs_label]==expr_label,0:20]
 58 |     cf2 = adata.obsm['X_pca'][adata.obs[obs_label]==ref_label,0:20]
 59 |     r = np.zeros([cf1.shape[0],1])
 60 |     c = np.zeros([cf2.shape[0],1])
 61 |     r[:,0] = 1/cf1.shape[0]
 62 |     c[:,0] = 1/cf2.shape[0]
 63 |     sk = skp.SinkhornKnopp(setr=r,setc=c,epsilon=1e-2)
 64 |     dis = pairwise_distances(cf1,cf2)
 65 |     e = thres * adata.obsm['X_pca'].shape[1]
 66 |     af = np.exp(-dis * dis / e)
 67 |     ot = sk.fit(af).T
 68 |     OT_pca = adata.obsm['X_pca'].copy()
 69 |     OT_pca[adata.obs[obs_label]==ref_label,:] = np.matmul(ot/np.sum(ot,axis=1)[:,None],OT_pca[adata.obs[obs_label]==expr_label,:])
 70 |     if return_te:
 71 |         te2 = adata.X[adata.obs[obs_label]==ref_label,:] - np.matmul(ot/np.sum(ot,axis=1)[:,None],adata.X[adata.obs[obs_label]==expr_label,:])
 72 |         return OT_pca, ot, te2
 73 |     else:
 74 |         return OT_pca, ot
 75 | 
 76 | 
 77 | def evaluate_cinema(matrix,ite,gt,gite):
 78 |     # Returns the medians of three statistics: kNN AUC, treatment-effect Pearson correlation, treatment-effect Spearman correlation
 79 |     aucdata = np.zeros(gt.shape[0])
 80 |     corr_ = np.zeros(gt.shape[0])
 81 |     scorr_ = np.zeros(gt.shape[0])
 82 |     #genesig = np.zeros(gite.shape[1])
 83 |     for i in range(gt.shape[0]):
 84 |         fpr, tpr, thres = roc_curve(gt[i,:],matrix[i,:])
 85 |         aucdata[i] = auc(fpr,tpr)
 86 |     for i in range(ite.shape[0]):
 87 |         # corr_[i], pval = pearsonr(ite[i,1000:],gite[i,1000:])
 88 |         # scorr_[i],pval = spearmanr(ite[i,1000:],gite[i,1000:])
 89 |         corr_[i], pval = pearsonr(ite[i,:],gite[i,:])
 90 |         scorr_[i],pval = spearmanr(ite[i,:],gite[i,:])
 91 |     return np.median(aucdata), np.median(corr_), np.median(scorr_)
 92 | 
 93 | def evaluate_batch(sig, adata,obs_label, label, continuity,asw=True,silhouette=True,graph_conn=True,pcr=True,nmi=True,ari=True,diff_coefs=False):
 94 |     #Label is a list!!!
 95 |     newsig = sc.AnnData(X=sig, obs = adata.obs)
 96 |     sc.pp.pca(newsig,n_comps=min(15,newsig.X.shape[1]-1))
 97 |     #newsig.obsm['X_pca'] = newsig.X
 98 |     k0=15
 99 |     sc.pp.neighbors(newsig, n_neighbors=k0)
100 |     sc.tl.diffmap(newsig, n_comps=min(15,newsig.X.shape[1]-1))
101 |     eigen = newsig.obsm['X_diffmap']
102 |     #newsig_nbrs = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(newsig.X)
103 |     #newsig_con = newsig_nbrs.kneighbors_graph(newsig.X)
104 |     #newsig.obsp['connectivities'] = newsig_con
105 |     newsig_metrics = scib.metrics.metrics(adata,newsig,obs_label,label[0],
106 |         isolated_labels_asw_= asw,
107 |         graph_conn_= graph_conn,
108 |         silhouette_ = silhouette,
109 |         nmi_=nmi,
110 |         ari_=ari,                            
111 |         pcr_=pcr)
112 |     if diff_coefs:
113 |         for i in range(len(label)):
114 |             steps = adata.obs[label[i]].values
115 |             #also we test max correlation to see strong functional dependence between steps and signals, for each state_group population 
116 |             if continuity[i]:
117 |                 xi = np.zeros(eigen.shape[1])
118 |                 #pval = np.zeros(eigen.shape[1])
119 |                 j = 0
120 |                 for source_row in eigen.T:
121 |                     #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(steps), pvalue = True)
122 |                     xi_obj = Xi(source_row,steps.astype(float))
123 |                     xi[j] = xi_obj.correlation
124 |                     j = j+1
125 |                 maxcoef = np.max(xi)
126 |                 #newsig_metrics.rename(index={'trajectory':'trajectory_coef'},inplace=True)
127 |                 #newsig_metrics.iloc[13,0] = np.max(xi)
128 |                 newsig_metrics.loc[label[i]] = maxcoef
129 |             else:
130 |                 encoder = OneHotEncoder()
131 |                 onehot = encoder.fit_transform(np.array(adata.obs[label[i]].values.tolist()).reshape(-1, 1)).toarray()
132 |                 yi = np.zeros([onehot.shape[1],eigen.shape[1]])
133 |                 k = 0
134 |                 #ind = onehot.T[0] * 0
135 |                 m = onehot.T.shape[0]
136 |                 for indicator in onehot.T[0:m-1]:
137 |                     j = 0
138 |                     #ind = ind + indicator
139 |                     for source_row in eigen.T:
140 |                         xi_obj = Xi(source_row,indicator*1)
141 |                         yi[k,j] = xi_obj.correlation
142 |                         j = j+1
143 |                     k = k+1
144 |         
145 |             #newsig_metrics.rename(index={'hvg_overlap':'state_coef'},inplace=True)
146 |             #newsig_metrics.iloc[12,0] = np.mean(np.max(yi,axis=1))
147 |                 newsig_metrics.loc[label[i]] = np.mean(np.max(yi,axis=1))
148 |         
149 |     return newsig_metrics
150 | 
151 | 
152 | class Xi:
153 |     """
154 |     x and y are the data vectors
155 |     """
156 | 
157 |     def __init__(self, x, y):
158 | 
159 |         self.x = x
160 |         self.y = y
161 | 
162 |     @property
163 |     def sample_size(self):
164 |         return len(self.x)
165 | 
166 |     @property
167 |     def x_ordered_rank(self):
168 |         # PI is the rank vector for x, with ties broken at random
169 |         # Not mine: source (https://stackoverflow.com/a/47430384/1628971)
170 |         # random shuffling of the data - reason to use random.choice is that
171 |         # pd.sample(frac=1) uses the same randomizing algorithm
172 |         len_x = len(self.x)
173 |         randomized_indices = np.random.choice(np.arange(len_x), len_x, replace=False)
174 |         randomized = [self.x[idx] for idx in randomized_indices]
175 |         # same as pandas rank method 'first'
176 |         rankdata = ss.rankdata(randomized, method="ordinal")
177 |         # Reindexing based on pairs of indices before and after
178 |         unrandomized = [
179 |             rankdata[j] for i, j in sorted(zip(randomized_indices, range(len_x)))
180 |         ]
181 |         return unrandomized
182 | 
183 |     @property
184 |     def y_rank_max(self):
185 |         # f[i] is number of j s.t. y[j] <= y[i], divided by n.
186 |         return ss.rankdata(self.y, method="max") / self.sample_size
187 | 
188 |     @property
189 |     def g(self):
190 |         # g[i] is number of j s.t. y[j] >= y[i], divided by n.
191 |         return ss.rankdata([-i for i in self.y], method="max") / self.sample_size
192 | 
193 |     @property
194 |     def x_ordered(self):
195 |         # order of the x's, ties broken at random.
196 |         return np.argsort(self.x_ordered_rank)
197 | 
198 |     @property
199 |     def x_rank_max_ordered(self):
200 |         x_ordered_result = self.x_ordered
201 |         y_rank_max_result = self.y_rank_max
202 |         # Rearrange f according to ord.
203 |         return [y_rank_max_result[i] for i in x_ordered_result]
204 | 
205 |     @property
206 |     def mean_absolute(self):
207 |         x1 = self.x_rank_max_ordered[0 : (self.sample_size - 1)]
208 |         x2 = self.x_rank_max_ordered[1 : self.sample_size]
209 |         
210 |         return (
211 |             np.mean(
212 |                 np.abs(
213 |                     [
214 |                         x - y
215 |                         for x, y in zip(
216 |                             x1,
217 |                             x2,
218 |                         )
219 |                     ]
220 |                 )
221 |             )
222 |             * (self.sample_size - 1)
223 |             / (2 * self.sample_size)
224 |         )
225 | 
226 |     @property
227 |     def inverse_g_mean(self):
228 |         gvalue = self.g
229 |         return np.mean(gvalue * (1 - gvalue))
230 | 
231 |     @property
232 |     def correlation(self):
233 |         """xi correlation"""
234 |         return 1 - self.mean_absolute / self.inverse_g_mean
235 | 
236 |     @classmethod
237 |     def xi(cls, x, y):
238 |         return cls(x, y)
239 | 
240 |     def pval_asymptotic(self, ties=False, nperm=1000):
241 |         """
242 |         Returns p values of the correlation
243 |         Args:
244 |             ties: boolean
245 |                 If ties is true, the algorithm assumes that the data has ties
246 |                 and employs the more elaborate theory for calculating
247 |                 the P-value. Otherwise, it uses the simpler theory. There is
248 |                 no harm in setting ties True, even if there are no ties.
249 |             nperm: int
250 |                 The number of permutations for the permutation test, if needed.
251 |                 default 1000
252 |         Returns:
253 |             p value
254 |         """
255 |         # If there are no ties, return xi and theoretical P-value:
256 | 
257 |         if not ties:
258 |             return 1 - ss.norm.cdf(
259 |                 np.sqrt(self.sample_size) * self.correlation / np.sqrt(2 / 5)
260 |             )
261 | 
262 |         # If there are ties, and the theoretical method
263 |         # is to be used for calculation P-values:
264 |         # The following steps calculate the theoretical variance
265 |         # in the presence of ties:
266 |         sorted_ordered_x_rank = sorted(self.x_rank_max_ordered)
267 | 
268 |         ind = [i + 1 for i in range(self.sample_size)]
269 |         ind2 = [2 * self.sample_size - 2 * ind[i - 1] + 1 for i in ind]
270 | 
271 |         a = (
272 |             np.mean([i * j * j for i, j in zip(ind2, sorted_ordered_x_rank)])
273 |             / self.sample_size
274 |         )
275 | 
276 |         c = (
277 |             np.mean([i * j for i, j in zip(ind2, sorted_ordered_x_rank)])
278 |             / self.sample_size
279 |         )
280 | 
281 |         cq = np.cumsum(sorted_ordered_x_rank)
282 | 
283 |         m = [
284 |             (i + (self.sample_size - j) * k) / self.sample_size
285 |             for i, j, k in zip(cq, ind, sorted_ordered_x_rank)
286 |         ]
287 | 
288 |         b = np.mean([np.square(i) for i in m])
289 |         v = (a - 2 * b + np.square(c)) / np.square(self.inverse_g_mean)
290 | 
291 |         return 1 - ss.norm.cdf(
292 |             np.sqrt(self.sample_size) * self.correlation / np.sqrt(v)
293 |         )


--------------------------------------------------------------------------------
/cinemaot/cinemaot.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import pandas as pd
  3 | import scanpy as sc
  4 | from anndata import AnnData
  5 | from . import sinkhorn_knopp as skp
  6 | #from . import utils
  7 | from scipy.sparse import issparse
  8 | from sklearn.neighbors import NearestNeighbors
  9 | import scipy.stats as ss
 10 | 
 11 | # In this newer version we use the Python implementation of xicor
 12 | # import rpy2.robjects as ro
 13 | # import rpy2.robjects.numpy2ri
 14 | # import rpy2.robjects.pandas2ri
 15 | # from rpy2.robjects.packages import importr
 16 | # rpy2.robjects.numpy2ri.activate()
 17 | # rpy2.robjects.pandas2ri.activate()
 18 | 
 19 | 
 20 | # Instead of projecting the whole count matrix, we use the PCA result of projected ICA components to stabilize the noise
 21 | # returning an anndata object
 22 | # Detecting differentially expressed genes: G = A + Z + AZ + e by NB regression. A significant coefficient on AZ indicates a condition-specific effect
 23 | # False positives may be further excluded by permutation (as in PseudotimeDE)
 24 | 
 25 | #import ot
 26 | 
 27 | import statsmodels.api as sm
 28 | from sklearn.linear_model import LinearRegression
 29 | 
 30 | from sklearn.decomposition import FastICA
 31 | import sklearn.metrics
 32 | 
 33 | 
 34 | def cinemaot_unweighted(adata,obs_label,ref_label,expr_label,dim=20,thres=0.15,smoothness=1e-4,eps=1e-3,mode='parametric',marker=None,preweight_label=None):
 35 |     """
 36 |     Parameters
 37 |     ----------
 38 |     adata: 'AnnData'
 39 |         An anndata object containing the whole gene count matrix and an observation index for treatments. It should be preprocessed before input.
 40 |     obs_label: 'str'
 41 |         A string for indicating the treatment column name in adata.obs.
 42 |     ref_label: 'str'
 43 |         A string for indicating the control group in adata.obs.values.
 44 |     expr_label: 'str'
 45 |         A string for indicating the experiment group in adata.obs.values.
 46 |     dim: 'int'
 47 |         The number of independent components.
 48 |     thres: 'float'
 49 |         The threshold for setting the Chatterjee coefficient for confounder separation.
 50 |     smoothness: 'float'
 51 |         The parameter for setting the smoothness of entropy-regularized optimal transport. Should be set as a small value above zero!
 52 |     eps: 'float'
 53 |         The parameter for stop condition of OT convergence. 
 54 |     mode: 'str'
 55 |         If mode is 'parametric', return the standard differential expression matrix. If mode is 'non_parametric', return the weighted quantiles of the expr cells for the genes listed in `marker`.
 56 |     Return
 57 |     ----------
 58 |     cf: 'numpy.ndarray'
 59 |         Confounder components, of shape (n_cells,n_components).
 60 |     ot: 'numpy.ndarray'
 61 |         Transport map across control and experimental conditions.
 62 |     TE: 'AnnData'
 63 |         Single-cell differential expression for each cell in control condition, of shape (n_refcells, n_genes). The matched difference in ICA space is stored in TE.obsm['X_embedding'].
 64 |     """
 65 |     if dim is None:
 66 |         sk = skp.SinkhornKnopp()
 67 |         c = 0.5
 68 |         data=adata.X
 69 |         vm = (1e-3 + data + c * data * data)/(1+c)
 70 |         P = sk.fit(vm)
 71 |         wm = np.dot(np.dot(np.sqrt(sk._D1),vm),np.sqrt(sk._D2))
 72 |         u,s,vt = np.linalg.svd(wm)
 73 |         dim = min(int(np.sum(s > (np.sqrt(data.shape[0])+np.sqrt(data.shape[1])))),adata.obsm['X_pca'].shape[1])
 74 | 
 75 | 
 76 |     transformer = FastICA(n_components=dim, random_state=0,whiten="arbitrary-variance")
 77 |     X_transformed = transformer.fit_transform(adata.obsm['X_pca'][:,:dim])
 78 |     #importr("XICOR")
 79 |     #xicor = ro.r["xicor"]
 80 |     groupvec = (adata.obs[obs_label]==ref_label).values #control
 81 |     xi = np.zeros(dim)
 82 |     #pval = np.zeros(dim)
 83 |     j = 0
 84 |     for source_row in X_transformed.T:
 85 |         xi_obj = Xi(source_row,groupvec*1)
 86 |         #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(groupvec), pvalue = True)
 87 |         #xi[j] = np.array(rresults.rx2("xi"))[0]
 88 |         xi[j] = xi_obj.correlation
 89 |         #pval[j] = np.array(rresults.rx2("pval"))[0]
 90 |         j = j+1
 91 |     cf = X_transformed[:,xi<thres]
 92 |     cf1 = cf[adata.obs[obs_label]==expr_label,:] #expr
 93 |     cf2 = cf[adata.obs[obs_label]==ref_label,:] #control
 94 |     if sum(xi<thres)==1:
 95 |         dis = sklearn.metrics.pairwise_distances(cf1.reshape(-1,1),cf2.reshape(-1,1))
 96 |     elif sum(xi<thres)==0:
 97 |         raise ValueError("No confounder components identified. Please try a higher threshold.")
 98 |     else:
 99 |         dis = sklearn.metrics.pairwise_distances(cf1,cf2)
100 |     e = smoothness * sum(xi<thres)
101 |     af = np.exp(-dis * dis / e)
102 |     r = np.zeros([cf1.shape[0],1])
103 |     c = np.zeros([cf2.shape[0],1])
104 |     if preweight_label is None:
105 |         r[:,0] = 1/cf1.shape[0]
106 |         c[:,0] = 1/cf2.shape[0]
107 |     else:
108 |         #implement a simple function here, taking adata.obs, output inverse prob weight. For consistency, c is still the empirical distribution, while r is weighted.
109 |         adata1 = adata[adata.obs[obs_label]==expr_label,:]
110 |         adata2 = adata[adata.obs[obs_label]==ref_label,:]
111 |         c[:,0] = 1/cf2.shape[0]
112 |         for ct in list(set(adata1.obs[preweight_label].values.tolist())):
113 |             r[(adata1.obs[preweight_label]==ct).values,0] = np.sum((adata2.obs[preweight_label]==ct).values) / np.sum((adata1.obs[preweight_label]==ct).values)
114 |         r[:,0] = r[:,0]/np.sum(r[:,0])
115 | 
116 |     sk = skp.SinkhornKnopp(setr=r,setc=c,epsilon=eps)
117 |     ot_matrix = sk.fit(af).T
118 | 
119 |     embedding = X_transformed[adata.obs[obs_label]==ref_label,:] - np.matmul(ot_matrix/np.sum(ot_matrix,axis=1)[:,None],X_transformed[adata.obs[obs_label]==expr_label,:])
120 | 
121 |     if mode == 'parametric':
122 |         if issparse(adata.X):
123 |             te2 = adata.X.toarray()[adata.obs[obs_label]==ref_label,:] - np.matmul(ot_matrix/np.sum(ot_matrix,axis=1)[:,None],adata.X.toarray()[adata.obs[obs_label]==expr_label,:])
124 |         else:
125 |             te2 = adata.X[adata.obs[obs_label]==ref_label,:] - np.matmul(ot_matrix/np.sum(ot_matrix,axis=1)[:,None],adata.X[adata.obs[obs_label]==expr_label,:])
126 |     elif mode == 'non_parametric':
127 |         if issparse(adata.X):
128 |             ref = adata.X.toarray()[adata.obs[obs_label]==ref_label,:]
129 |             ref = ref[:,adata.var_names.isin(marker)]
130 |             expr = adata.X.toarray()[adata.obs[obs_label]==expr_label,:]
131 |             expr = expr[:,adata.var_names.isin(marker)]
132 |             te2 = ref * 0
133 |             for i in range(te2.shape[0]):
134 |                 te2[i,:] = weighted_quantile(expr,ref[i,:],sample_weight=ot_matrix[i,:])
135 |         else:
136 |             ref = adata.X[adata.obs[obs_label]==ref_label,:]
137 |             ref = ref[:,adata.var_names.isin(marker)]
138 |             expr = adata.X[adata.obs[obs_label]==expr_label,:]
139 |             expr = expr[:,adata.var_names.isin(marker)]
140 |             te2 = ref * 0
141 |             for i in range(te2.shape[0]):
142 |                 te2[i,:] = weighted_quantile(expr,ref[i,:],sample_weight=ot_matrix[i,:])            
143 |     else:
144 |         raise ValueError("We do not support other methods for DE now.")
145 | 
146 |     TE = sc.AnnData(te2,obs=adata[adata.obs[obs_label]==ref_label,:].obs.copy(),var=adata.var.copy())
147 |     TE.obsm['X_embedding'] = embedding
148 |     return cf, ot_matrix, TE
149 | 
150 | 
151 | 
152 | def cinemaot_weighted(adata,obs_label,ref_label,expr_label,use_rep=None,dim=20,thres=0.75,smoothness=1e-4,eps=1e-3,k=10,resolution=1,mode='parametric',marker=None):
153 |     """
154 |     Parameters
155 |     ----------
156 |     adata: 'AnnData'
157 |         An anndata object containing the whole gene count matrix and an observation index for treatments. It should be preprocessed before input.
158 |     obs_label: 'str'
159 |         A string for indicating the treatment column name in adata.obs.
160 |     ref_label: 'str'
161 |         A string for indicating the control group in adata.obs.values.
162 |     expr_label: 'str'
163 |         A string for indicating the experiment group in adata.obs.values.
164 |     dim: 'int'
165 |         The number of independent components.
166 |     thres: 'float'
167 |         The threshold for setting the Chatterjee coefficient for confounder separation.
168 |     smoothness: 'float'
169 |         The parameter for setting the smoothness of entropy-regularized optimal transport. Should be set as a small value above zero!
170 |     eps: 'float'
171 |         The parameter for stop condition of OT convergence. 
172 |     resolution: 'float'
173 |         The resolution parameter passed to Leiden clustering.
174 |     k: 'int'
175 |         The parameter for knn.
176 | 
177 |     Return
178 |     ----------
179 |     cf: 'numpy.ndarray'
180 |         Confounder components, of shape (n_cells,n_components).
181 |     ot: 'numpy.ndarray'
182 |         Transport map across control and experimental conditions.
183 |     te2: 'numpy.ndarray'
184 |         Single-cell differential expression for each cell in control condition, of shape (n_refcells, n_genes).
185 |     z: 'numpy.ndarray'
186 |         Subsampling weights for cells in the reference condition. Within each Leiden cluster,
187 |         the weight is the ratio of expr to ref cell counts when ref cells are at least as
188 |         numerous as expr cells, and 1 otherwise.
189 |     """
190 |     if dim is None:
191 |         sk = skp.SinkhornKnopp()
192 |         c = 0.5
193 |         data=adata.X
194 |         vm = (1e-3 + data + c * data * data)/(1+c)
195 |         P = sk.fit(vm)
196 |         wm = np.dot(np.dot(np.sqrt(sk._D1),vm),np.sqrt(sk._D2))
197 |         u,s,vt = np.linalg.svd(wm)
198 |         dim = sum(s > (np.sqrt(data.shape[0])+np.sqrt(data.shape[1])))
199 | 
200 |     sk = skp.SinkhornKnopp()
201 |     adata_ = adata[adata.obs[obs_label].isin([expr_label,ref_label])].copy()
202 |     if use_rep is None:
203 |         X_pca1 = adata_.obsm['X_pca'][adata_.obs[obs_label]==expr_label,:]
204 |         X_pca2 = adata_.obsm['X_pca'][adata_.obs[obs_label]==ref_label,:]
205 |         nbrs = NearestNeighbors(n_neighbors=k, algorithm='ball_tree').fit(X_pca1)
206 |         mixscape_pca = adata_.obsm['X_pca'].copy()
207 |         mixscapematrix = nbrs.kneighbors_graph(X_pca2).toarray()
208 |         mixscape_pca[adata_.obs[obs_label]==ref_label,:] = np.dot(mixscapematrix, mixscape_pca[adata_.obs[obs_label]==expr_label,:])/k
209 | 
210 |         adata_.obsm['X_mpca'] = mixscape_pca
211 |         sc.pp.neighbors(adata_,use_rep='X_mpca')
212 | 
213 |     else:
214 |         sc.pp.neighbors(adata_,use_rep=use_rep)
215 |     sc.tl.leiden(adata_,resolution=resolution)
216 | 
217 |     z = np.zeros(adata_.shape[0]) + 1
218 | 
219 |     j = 0
220 | 
221 |     for i in adata_.obs['leiden'].cat.categories:
222 |         if adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0] >= adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0]:
223 |             z[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)] = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0] / adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0]
224 |             if j == 0:
225 |                 idx = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0],copy=True).obs.index
226 |                 idx = idx.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].obs.index)
227 |                 j = j + 1
228 |             else:
229 |                 idx_tmp = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0],copy=True).obs.index
230 |                 idx_tmp = idx_tmp.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].obs.index)
231 |                 idx = idx.append(idx_tmp)
232 |         else:
233 |             z[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)] = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0] / adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)].shape[0]
234 |             if j == 0:
235 |                 idx = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0],copy=True).obs.index
236 |                 idx = idx.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].obs.index)
237 |                 j = j + 1
238 |             else:
239 |                 idx_tmp = sc.pp.subsample(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==expr_label)],n_obs = adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].shape[0],copy=True).obs.index
240 |                 idx_tmp = idx_tmp.append(adata_[(adata_.obs['leiden']==i) & (adata_.obs[obs_label]==ref_label)].obs.index)
241 |                 idx = idx.append(idx_tmp)
242 | 
243 |     transformer = FastICA(n_components=dim, random_state=0, whiten="arbitrary-variance")
244 |     X_transformed = transformer.fit_transform(adata_[idx].obsm['X_pca'][:,:dim])
245 |     #importr("XICOR")
246 |     #xicor = ro.r["xicor"]
247 |     groupvec = (adata_[idx].obs[obs_label]==ref_label).values #control
248 |     xi = np.zeros(dim)
249 |     #pval = np.zeros(dim)
250 |     j = 0
251 |     for source_row in X_transformed.T:
252 |         xi_obj = Xi(source_row,groupvec*1)
253 |         #rresults = xicor(ro.FloatVector(source_row), ro.FloatVector(groupvec), pvalue = True)
254 |         #xi[j] = np.array(rresults.rx2("xi"))[0]
255 |         xi[j] = xi_obj.correlation
256 |         #pval[j] = np.array(rresults.rx2("pval"))[0]
257 |         j = j+1
258 | 
259 |     cf = transformer.transform(adata_.obsm['X_pca'][:,:dim])[:,xi<thres]
260 | 
261 |     cf1 = X_transformed[adata_[idx].obs[obs_label]==expr_label,:][:,xi<thres]
262 |     cf2 = cf[adata_.obs[obs_label]==ref_label,:]
263 |     r = np.zeros([cf1.shape[0],1])
264 |     c = np.zeros([cf2.shape[0],1])
265 |     r[:,0] = 1/cf1.shape[0]
266 |     c[:,0] = 1/cf2.shape[0]
267 |     #return cf,xi,adata_[idx]
268 |     if sum(xi<thres)==1:
269 |         dis = sklearn.metrics.pairwise_distances(cf1.reshape(-1,1),cf2.reshape(-1,1))
270 |     elif sum(xi<thres)==0:
271 |         raise ValueError("No confounder components identified. Please try a higher threshold.")
272 |     else:
273 |         dis = sklearn.metrics.pairwise_distances(cf1,cf2)
274 |     e = smoothness * sum(xi<thres)
275 |     af = np.exp(-dis * dis / e)
276 |     sk = skp.SinkhornKnopp(setr=r,setc=c,epsilon=eps)
277 |     ot = sk.fit(af).T
278 |     #return (adata.X[idx][adata[idx].obs[obs_label]==expr_label,:])
279 | 
280 |     if mode == 'parametric':
281 |         cf[adata.obs[obs_label]==ref_label,:] = (ot/np.sum(ot,axis=1)[:,None]) @ (cf1)
282 |         if issparse(adata.X):
283 |             te2 = adata.X.toarray()[adata.obs[obs_label]==ref_label,:] - (ot/np.sum(ot,axis=1)[:,None]) @ (adata[idx].X.toarray()[adata[idx].obs[obs_label]==expr_label,:])
284 |         else:
285 |             te2 = adata.X[adata.obs[obs_label]==ref_label,:] - (ot/np.sum(ot,axis=1)[:,None]) @ (adata[idx].X[adata[idx].obs[obs_label]==expr_label,:])
286 |         te2 = sc.AnnData(te2,obs=adata[adata.obs[obs_label]==ref_label,:].obs.copy(),var=adata.var.copy())
287 |         embedding = transformer.transform(adata_.obsm['X_pca'][:,:dim])[adata_.obs[obs_label]==ref_label,:] - (ot/np.sum(ot,axis=1)[:,None]) @ (transformer.transform(adata_[idx].obsm['X_pca'][:,:dim])[adata_[idx].obs[obs_label]==expr_label,:])
288 |         te2.obsm['X_embedding'] = embedding
289 |     elif mode == 'non_parametric':
290 |         if issparse(adata.X):
291 |             ref = adata.X.toarray()[adata.obs[obs_label]==ref_label,:]
292 |             ref = ref[:,adata.var_names.isin(marker)]
293 |             expr = adata.X.toarray()[adata.obs[obs_label]==expr_label,:]
294 |             expr = expr[:,adata.var_names.isin(marker)]
295 |             te2 = ref * 0
296 |             for i in range(te2.shape[0]):
297 |                 te2[i,:] = weighted_quantile(expr,ref[i,:],sample_weight=ot[i,:])
298 |         else:
299 |             ref = adata.X[adata.obs[obs_label]==ref_label,:]
300 |             ref = ref[:,adata.var_names.isin(marker)]
301 |             expr = adata.X[adata.obs[obs_label]==expr_label,:]
302 |             expr = expr[:,adata.var_names.isin(marker)]
303 |             te2 = ref * 0
304 |             for i in range(te2.shape[0]):
305 |                 te2[i,:] = weighted_quantile(expr,ref[i,:],sample_weight=ot[i,:]) 
306 |     else:
307 |         raise ValueError("We do not support other methods for DE now.")
308 |     return cf, ot, te2, z[adata.obs[obs_label]==ref_label]
309 | 
310 | def synergy(adata,obs_label,base,A,B,AB,dim=20,thres=0.15,smoothness=1e-4,eps=1e-3,mode='parametric',preweight_label=None,path=None,fthres=None):
311 |     adata1 = adata[adata.obs[obs_label].isin([base,A]),:]
312 |     adata2 = adata[adata.obs[obs_label].isin([B,AB]),:]
313 |     adata_link = adata[adata.obs[obs_label].isin([base,B]),:]
314 |     cf, ot1, de1 = cinemaot_unweighted(adata1,obs_label=obs_label, ref_label=base, expr_label=A,dim=dim,thres=thres,smoothness=smoothness,eps=eps,mode='parametric',preweight_label=preweight_label)
315 |     cf, ot2, de2 = cinemaot_unweighted(adata2,obs_label=obs_label, ref_label=B, expr_label=AB,dim=dim,thres=thres,smoothness=smoothness,eps=eps,mode='parametric',preweight_label=preweight_label)
316 |     cf, ot0, de0 = cinemaot_unweighted(adata_link,obs_label=obs_label, ref_label=base, expr_label=B,dim=dim,thres=thres,smoothness=smoothness,eps=eps,mode='parametric',preweight_label=preweight_label)
317 |     if mode == 'parametric':
318 |         syn = sc.AnnData(-((ot0/np.sum(ot0,axis=1)[:,None]) @ de2.X - de1.X),obs=de1.obs.copy(),var=de1.var.copy())
319 |         syn.obsm['X_embedding'] = (ot0/np.sum(ot0,axis=1)[:,None]) @ de2.obsm['X_embedding'] - de1.obsm['X_embedding']
320 |         return syn
321 |     #elif mode == 'non_parametric':
322 |         # For data with varying batch effect across conditions, we recommend output the difference set of significant genes
323 |         #syn2 = -(ot0/np.sum(ot0,axis=1)[:,None]) @ de2
324 |         #syn1 = -de1
325 |         #subset = adata[adata.obs[obs_label].isin([base]),:]
326 |         #syn2 = sc.AnnData(syn2)
327 |         #syn2.obs[preweight_label] = subset.obs[preweight_label].values
328 |         #syn2.var_names = subset.var_names
329 |         #syn1 = sc.AnnData(syn1)
330 |         #syn1.obs[preweight_label] = subset.obs[preweight_label].values
331 |         #syn1.var_names = subset.var_names
332 |         #utils.clustertest_synergy(syn1,syn2,preweight_label,1e-5,fthres,path=path)
333 |         #return
334 |         #raise ValueError("We do not non-parametric synergy now.")
335 |     else:
336 |         raise ValueError("We do not support other methods for synergy now.")
338 | 
339 | 
340 | def weighted_quantile(values, num, sample_weight=None, 
341 |                       values_sorted=False):
342 |     """
343 |     Estimate, for each gene (column of values), the weighted quantile of num within values, i.e. the weighted fraction of cells with expression <= num, using a row of the OT map as sample_weight. The function is fully vectorized to accelerate computation.
344 |     """
345 |     values = np.array(values)
346 |     if sample_weight is None:
347 |         sample_weight = np.ones(len(values))
348 |     sorter = np.argsort(values,axis=0)
349 |     values = np.take_along_axis(values, sorter, axis=0)
350 |     sample_weight = np.tile(sample_weight/np.sum(sample_weight),(1,values.shape[1]))
351 |     sample_weight = np.take_along_axis(sample_weight,sorter,axis=1)
352 |     weighted_quantiles = np.cumsum(sample_weight,axis=0)
353 |     weighted_quantiles = np.vstack((np.zeros(values.shape[1]),weighted_quantiles))
354 |     numindex = np.sum(values <= num.reshape(1,-1),axis=0)
355 |     return np.diag(weighted_quantiles[np.ix_(numindex,np.arange(values.shape[1]))])
356 | 
357 | # For two conditions
358 | def NBregression(adata,n_cells=200):
359 |     cf = adata.obsm['cf']
360 |     z = np.zeros([adata.shape[0],2])
361 |     z[:,0] = 1
362 |     z[:,1] = (np.array(adata.obs['perturbation'].values) == adata.obs['perturbation'].values[0]) * 1
363 |     X = np.hstack((cf,z,cf*z[:,1][:,None]))
364 |     effectsize = np.zeros(adata.raw.X.shape[1])
365 |     pvalue = np.zeros(adata.raw.X.shape[1])
366 |     for i in range(adata.raw.X.shape[1]):
367 |         if np.sum(adata.raw.X[:,i].toarray()[:,0]>0)>=n_cells:
368 |             glm_poisson = sm.GLM(adata.raw.X[:,i].toarray()[:,0], X, family=sm.families.Poisson())
369 |             try:
370 |                 res = glm_poisson.fit(tol=1e-4)
371 |                 pvalue[i] = np.min(res.pvalues[cf.shape[1]+2:])
372 |                 effectsize[i] = res.params[cf.shape[1]+2:][np.argmin(res.pvalues[cf.shape[1]+2:])]
373 |             except Exception:
374 |                 pvalue[i] = 0
375 |                 effectsize[i] = 0
376 | 
377 |     return effectsize, pvalue
378 | 
379 | 
380 | def attribution_scatter(adata,obs_label,control_label,expr_label,use_raw=True):
381 |     cf = adata.obsm['cf']
382 |     if use_raw:
383 |         Y0 = adata.raw.X.toarray()[adata.obs[obs_label]==control_label,:]
384 |         Y1 = adata.raw.X.toarray()[adata.obs[obs_label]==expr_label,:]
385 |     else:
386 |         Y0 = adata.X.toarray()[adata.obs[obs_label]==control_label,:]
387 |         Y1 = adata.X.toarray()[adata.obs[obs_label]==expr_label,:]
388 |     X0 = cf[adata.obs[obs_label]==control_label,:]
389 |     X1 = cf[adata.obs[obs_label]==expr_label,:]
390 |     ols0 = LinearRegression()
391 |     ols0.fit(X0,Y0)
392 |     ols1 = LinearRegression()
393 |     ols1.fit(X1,Y1)
394 |     c0 = ols0.predict(X0) - np.mean(ols0.predict(X0),axis=0)
395 |     c1 = ols1.predict(X1) - np.mean(ols1.predict(X1),axis=0)
396 |     e0 = Y0 - ols0.predict(X0)
397 |     e1 = Y1 - ols1.predict(X1)
398 |     #c_effect = np.mean(np.abs(ols1.coef_)+1e-6,axis=1) / np.mean(np.abs(ols0.coef_)+1e-6,axis=1)
399 |     c_effect = (np.linalg.norm(c1,axis=0)+1e-6)/(np.linalg.norm(c0,axis=0)+1e-6)
400 |     s_effect = (np.linalg.norm(e1,axis=0)+1e-6)/(np.linalg.norm(e0,axis=0)+1e-6)
401 |     return c_effect, s_effect
402 | 
403 | 
404 | class Xi:
405 |     """
406 |     x and y are the data vectors
407 |     """
408 | 
409 |     def __init__(self, x, y):
410 | 
411 |         self.x = x
412 |         self.y = y
413 | 
414 |     @property
415 |     def sample_size(self):
416 |         return len(self.x)
417 | 
418 |     @property
419 |     def x_ordered_rank(self):
420 |         # PI is the rank vector for x, with ties broken at random
421 |         # Not mine: source (https://stackoverflow.com/a/47430384/1628971)
422 |         # random shuffling of the data - reason to use random.choice is that
423 |         # pd.sample(frac=1) uses the same randomizing algorithm
424 |         len_x = len(self.x)
425 |         randomized_indices = np.random.choice(np.arange(len_x), len_x, replace=False)
426 |         randomized = [self.x[idx] for idx in randomized_indices]
427 |         # same as pandas rank method 'first'
428 |         rankdata = ss.rankdata(randomized, method="ordinal")
429 |         # Reindexing based on pairs of indices before and after
430 |         unrandomized = [
431 |             rankdata[j] for i, j in sorted(zip(randomized_indices, range(len_x)))
432 |         ]
433 |         return unrandomized
434 | 
435 |     @property
436 |     def y_rank_max(self):
437 |         # f[i] is number of j s.t. y[j] <= y[i], divided by n.
438 |         return ss.rankdata(self.y, method="max") / self.sample_size
439 | 
440 |     @property
441 |     def g(self):
442 |         # g[i] is number of j s.t. y[j] >= y[i], divided by n.
443 |         return ss.rankdata([-i for i in self.y], method="max") / self.sample_size
444 | 
445 |     @property
446 |     def x_ordered(self):
447 |         # order of the x's, ties broken at random.
448 |         return np.argsort(self.x_ordered_rank)
449 | 
450 |     @property
451 |     def x_rank_max_ordered(self):
452 |         x_ordered_result = self.x_ordered
453 |         y_rank_max_result = self.y_rank_max
454 |         # Rearrange f according to ord.
455 |         return [y_rank_max_result[i] for i in x_ordered_result]
456 | 
457 |     @property
458 |     def mean_absolute(self):
459 |         x1 = self.x_rank_max_ordered[0 : (self.sample_size - 1)]
460 |         x2 = self.x_rank_max_ordered[1 : self.sample_size]
461 |         
462 |         return (
463 |             np.mean(
464 |                 np.abs(
465 |                     [
466 |                         x - y
467 |                         for x, y in zip(
468 |                             x1,
469 |                             x2,
470 |                         )
471 |                     ]
472 |                 )
473 |             )
474 |             * (self.sample_size - 1)
475 |             / (2 * self.sample_size)
476 |         )
477 | 
478 |     @property
479 |     def inverse_g_mean(self):
480 |         gvalue = self.g
481 |         return np.mean(gvalue * (1 - gvalue))
482 | 
483 |     @property
484 |     def correlation(self):
485 |         """xi correlation"""
486 |         return 1 - self.mean_absolute / self.inverse_g_mean
487 | 
488 |     @classmethod
489 |     def xi(cls, x, y):
490 |         return cls(x, y)
491 | 
492 |     def pval_asymptotic(self, ties=False, nperm=1000):
493 |         """
494 |         Returns p values of the correlation
495 |         Args:
496 |             ties: boolean
497 |                 If ties is true, the algorithm assumes that the data has ties
498 |                 and employs the more elaborated theory for calculating
499 |                 the P-value. Otherwise, it uses the simpler theory. There is
500 |                 no harm in setting ties True, even if there are no ties.
501 |             nperm: int
502 |                 The number of permutations for the permutation test, if needed.
503 |                 default 1000
504 |         Returns:
505 |             p value
506 |         """
507 |         # If there are no ties, return the theoretical P-value from the simpler theory:
508 | 
509 |         if not ties:
510 |             return 1 - ss.norm.cdf(
511 |                 np.sqrt(self.sample_size) * self.correlation / np.sqrt(2 / 5)
512 |             )
513 | 
514 |         # If there are ties, and the theoretical method
515 |         # is to be used for calculation P-values:
516 |         # The following steps calculate the theoretical variance
517 |         # in the presence of ties:
518 |         sorted_ordered_x_rank = sorted(self.x_rank_max_ordered)
519 | 
520 |         ind = [i + 1 for i in range(self.sample_size)]
521 |         ind2 = [2 * self.sample_size - 2 * ind[i - 1] + 1 for i in ind]
522 | 
523 |         a = (
524 |             np.mean([i * j * j for i, j in zip(ind2, sorted_ordered_x_rank)])
525 |             / self.sample_size
526 |         )
527 | 
528 |         c = (
529 |             np.mean([i * j for i, j in zip(ind2, sorted_ordered_x_rank)])
530 |             / self.sample_size
531 |         )
532 | 
533 |         cq = np.cumsum(sorted_ordered_x_rank)
534 | 
535 |         m = [
536 |             (i + (self.sample_size - j) * k) / self.sample_size
537 |             for i, j, k in zip(cq, ind, sorted_ordered_x_rank)
538 |         ]
539 | 
540 |         b = np.mean([np.square(i) for i in m])
541 |         v = (a - 2 * b + np.square(c)) / np.square(self.inverse_g_mean)
542 | 
543 |         return 1 - ss.norm.cdf(
544 |             np.sqrt(self.sample_size) * self.correlation / np.sqrt(v)
545 |         )
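
The `Xi` class above builds Chatterjee's xi coefficient from several rank-based properties. As a rough cross-check, the same statistic can be sketched in a few vectorized NumPy lines. This is a minimal sketch and not part of the package: the function name `xi_coefficient` is hypothetical, and it breaks ties in `x` by input order (stable sort) rather than at random as the class does.

```python
import numpy as np
from scipy import stats


def xi_coefficient(x, y):
    """Chatterjee's xi of y on x (hypothetical helper, ties in x broken by input order)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Order the observations by x (stable sort keeps input order among ties).
    order = np.argsort(x, kind="stable")
    # r[i] = number of j with y[j] <= y[i], rearranged by increasing x.
    r = stats.rankdata(y, method="max")[order]
    # l[i] = number of j with y[j] >= y[i].
    l = stats.rankdata(-y, method="max")
    numerator = n * np.abs(np.diff(r)).sum()
    denominator = 2.0 * np.sum(l * (n - l))
    return 1.0 - numerator / denominator


# Usage: a strictly increasing relation without ties gives 1 - 3 / (n + 1),
# while independent samples give a value close to zero.
x = np.arange(100, dtype=float)
xi_monotone = xi_coefficient(x, x)

rng = np.random.default_rng(0)
xi_independent = xi_coefficient(rng.normal(size=5000), rng.normal(size=5000))
```

In `cinemaot_weighted`, the second argument is the binary treatment indicator, so the general formula with ties (as used here and in the class) is the relevant one.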


--------------------------------------------------------------------------------
/cinemaot/sinkhorn_knopp.py:
--------------------------------------------------------------------------------
  1 | import warnings
  2 | 
  3 | import numpy as np
  4 | 
  5 | 
  6 | class SinkhornKnopp:
  7 |     """
  8 |     Sinkhorn Knopp Algorithm
  9 | 
 10 |     Takes a non-negative square matrix P, where P =/= 0
 11 |     and iterates through Sinkhorn Knopp's algorithm
 12 |     to convert P to a doubly stochastic matrix.
 13 |     Guaranteed convergence if P has total support.
 14 | 
 15 |     For reference see original paper:
 16 |         http://msp.org/pjm/1967/21-2/pjm-v21-n2-p14-s.pdf
 17 | 
 18 |     Parameters
 19 |     ----------
 20 |     max_iter : int, default=1000
 21 |         The maximum number of iterations.
 22 | 
 23 |     epsilon : float, default=1e-3
 24 |         Tolerance for the stopping condition, which is met
 25 |         when all row and column sums are within epsilon of
 26 |         their targets (setr and setc when given). This should be
 27 |         a very small value. Epsilon must be between 0 and 1.
 28 | 
 29 |     Attributes
 30 |     ----------
 31 |     _max_iter : int, default=1000
 32 |         User defined parameter. See above.
 33 | 
 34 |     _epsilon : float, default=1e-3
 35 |         User defined parameter. See above.
 36 | 
 37 |     _stopping_condition: string
 38 |         Either "max_iter", "epsilon", or None, which is a
 39 |         description of why the algorithm stopped iterating.
 40 | 
 41 |     _iterations : int
 42 |         The number of iterations elapsed during the algorithm's
 43 |         run-time.
 44 | 
 45 |     _D1 : 2d-array
 46 |         Diagonal matrix obtained after a stopping condition was met
 47 |         so that _D1.dot(P).dot(_D2) is close to doubly stochastic.
 48 | 
 49 |     _D2 : 2d-array
 50 |         Diagonal matrix obtained after a stopping condition was met
 51 |         so that _D1.dot(P).dot(_D2) is close to doubly stochastic.
 52 | 
 53 |     Example
 54 |     -------
 55 | 
 56 |     .. code-block:: python
 57 |         >>> import numpy as np
 58 |         >>> from sinkhorn_knopp import sinkhorn_knopp as skp
 59 |         >>> sk = skp.SinkhornKnopp()
 60 |         >>> P = [[.011, .15], [1.71, .1]]
 61 |         >>> P_ds = sk.fit(P)
 62 |         >>> P_ds
 63 |         array([[ 0.06102561,  0.93897439],
 64 |            [ 0.93809928,  0.06190072]])
 65 |         >>> np.sum(P_ds, axis=0)
 66 |         array([ 0.99912489,  1.00087511])
 67 |         >>> np.sum(P_ds, axis=1)
 68 |         array([ 1.,  1.])
 69 | 
 70 |     """
 71 | 
 72 |     def __init__(self, max_iter=1000, setr=0, setc=0, epsilon=1e-3):
 73 |         assert isinstance(max_iter, int) or isinstance(max_iter, float),\
 74 |             "max_iter is not of type int or float: %r" % max_iter
 75 |         assert max_iter > 0,\
 76 |             "max_iter must be greater than 0: %r" % max_iter
 77 |         self._max_iter = int(max_iter)
 78 | 
 79 |         assert isinstance(epsilon, int) or isinstance(epsilon, float),\
 80 |             "epsilon is not of type float or int: %r" % epsilon
 81 |         assert epsilon > 0 and epsilon < 1,\
 82 |             "epsilon must be between 0 and 1 exclusive: %r" % epsilon
 83 |         self._epsilon = epsilon
 84 |         self._setr = setr
 85 |         self._setc = setc
 86 |         self._stopping_condition = None
 87 |         self._iterations = 0
 88 |         self._D1 = np.ones(1)
 89 |         self._D2 = np.ones(1)
 90 | 
 91 |     def fit(self, P):
 92 |         """Fit the diagonal matrices in Sinkhorn Knopp's algorithm
 93 | 
 94 |         Parameters
 95 |         ----------
 96 |         P : 2d array-like
 97 |         Must be a square non-negative 2d array-like object, that
 98 |         is convertible to a numpy array. The matrix must not be
 99 |         equal to 0 and it must have total support for the algorithm
100 |         to converge.
101 | 
102 |         Returns
103 |         -------
104 |         The rescaled matrix D1.dot(P).dot(D2), whose row and column sums are within epsilon of their targets.
105 | 
106 |         """
107 |         P = np.asarray(P)
108 |         assert np.all(P >= 0)
109 |         assert P.ndim == 2
110 | 
111 |         N = P.shape[0]
112 |         if np.sum(abs(self._setr)) == 0:
113 |             rsum = P.shape[1]
114 |         else:
115 |             rsum = self._setr
116 |         if np.sum(abs(self._setc)) == 0:
117 |             csum = P.shape[0]
118 |         else:
119 |             csum = self._setc
120 |         max_threshr = rsum + self._epsilon
121 |         min_threshr = rsum - self._epsilon
122 |         max_threshc = csum + self._epsilon
123 |         min_threshc = csum - self._epsilon
124 |         # Initialize r and c, the diagonals of D1 and D2
125 |         # and warn if the matrix does not have support.
126 |         r = np.ones((N, 1))
127 |         pdotr = P.T.dot(r)
128 |         total_support_warning_str = (
129 |             "Matrix P must have total support. "
130 |             "See documentation"
131 |         )
132 |         if not np.all(pdotr != 0):
133 |             warnings.warn(total_support_warning_str, UserWarning)
134 | 
135 |         c = 1 / pdotr
136 |         pdotc = P.dot(c)
137 |         if not np.all(pdotc != 0):
138 |             warnings.warn(total_support_warning_str, UserWarning)
139 | 
140 |         r = 1 / pdotc
141 |         del pdotr, pdotc
142 | 
143 |         P_eps = np.copy(P)
144 |         while np.any(np.sum(P_eps, axis=1) < min_threshr) \
145 |                 or np.any(np.sum(P_eps, axis=1) > max_threshr) \
146 |                 or np.any(np.sum(P_eps, axis=0) < min_threshc) \
147 |                 or np.any(np.sum(P_eps, axis=0) > max_threshc):
148 | 
149 |             c = csum / P.T.dot(r)
150 |             r = rsum / P.dot(c)
151 | 
152 |             self._D1 = np.diag(np.squeeze(r))
153 |             self._D2 = np.diag(np.squeeze(c))
154 | 
155 |             P_eps = np.diag(self._D1)[:,None] * P * np.diag(self._D2)[None,:]
156 | 
157 | 
158 |             self._iterations += 1
159 | 
160 |             if self._iterations >= self._max_iter:
161 |                 self._stopping_condition = "max_iter"
162 |                 break
163 | 
164 |         if not self._stopping_condition:
165 |             self._stopping_condition = "epsilon"
166 | 
167 |         self._D1 = np.diag(np.squeeze(r))
168 |         self._D2 = np.diag(np.squeeze(c))
169 |         P_eps = np.diag(self._D1)[:,None] * P * np.diag(self._D2)[None,:]
170 | 
171 |         return P_eps
172 | 
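
The `fit` method above alternates column and row rescaling until the marginals of the scaled matrix match `setr` and `setc`. The iteration can be sketched on its own as follows. This is a minimal sketch under stated assumptions, not the class itself: the names `sinkhorn_scale`, `row_targets`, `col_targets`, and `tol` are hypothetical, the targets are 1-D arrays, and the kernel is a Gaussian kernel on random points chosen only for illustration.

```python
import numpy as np


def sinkhorn_scale(K, row_targets, col_targets, tol=1e-3, max_iter=1000):
    """Scale a positive matrix K so its row/column sums match the given targets."""
    K = np.asarray(K, dtype=float)
    r = np.ones(K.shape[0])
    c = np.ones(K.shape[1])
    for _ in range(max_iter):
        # Alternate the two diagonal updates, as in the while loop of fit().
        c = col_targets / (K.T @ r)
        r = row_targets / (K @ c)
        scaled = r[:, None] * K * c[None, :]
        row_err = np.abs(scaled.sum(axis=1) - row_targets).max()
        col_err = np.abs(scaled.sum(axis=0) - col_targets).max()
        if row_err < tol and col_err < tol:
            break
    return r[:, None] * K * c[None, :]


# Usage: an entropy-regularized transport plan between two small point clouds
# with uniform marginals, so the plan sums to one overall.
rng = np.random.default_rng(0)
a = rng.random((5, 2))
b = rng.random((7, 2))
sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / 0.5)
plan = sinkhorn_scale(K, np.full(5, 1 / 5), np.full(7, 1 / 7), tol=1e-10)
```

Measuring the error against the targets, rather than against 1, is what lets the same loop handle rectangular matrices and non-uniform marginals.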


--------------------------------------------------------------------------------
/cinemaot/utils.py:
--------------------------------------------------------------------------------
  1 | import gseapy as gp
  2 | import pandas as pd
  3 | from scipy.stats import wilcoxon
  4 | import numpy as np
  5 | import scanpy as sc
  6 | #import scib
  7 | from sklearn.linear_model import LogisticRegression
  8 | from sklearn.preprocessing import OneHotEncoder
  9 | from scipy.stats import kstest
 10 | import plotly.graph_objects as go
 11 | import plotly.express as px
 12 | 
 13 | # import rpy2.robjects as ro
 14 | # import rpy2.robjects.numpy2ri
 15 | # import rpy2.robjects.pandas2ri
 16 | # from rpy2.robjects.packages import importr
 17 | # rpy2.robjects.numpy2ri.activate()
 18 | # rpy2.robjects.pandas2ri.activate()
 19 | 
 20 | 
 21 | def dominantcluster(adata,ctobs,clobs):
 22 |     clustername = []
 23 |     clustertime = np.zeros(adata.obs[ctobs].value_counts().values.shape[0])
 24 |     for i in adata.obs[clobs].value_counts().sort_index().index.values:
 25 |         tmp = adata.obs[ctobs][adata.obs[clobs]==i].value_counts().sort_index()
 26 |         ind = np.argmax(tmp.values)
 27 |         clustername.append(tmp.index.values[ind] + str(int(clustertime[ind])))
 28 |         clustertime[ind] = clustertime[ind] + 1
 29 |     return clustername
 30 | 
 31 | def assignleiden(adata,ctobs,clobs,label):
 32 |     clustername = dominantcluster(adata,ctobs,clobs)
 33 |     ss = adata.obs[clobs].values.tolist()
 34 |     for i in range(len(ss)):
 35 |         ss[i] = clustername[int(ss[i])]
 36 |     adata.obs[label] = ss
 37 |     return
 38 | 
 39 | def clustertest_synergy(adata1,adata2,clobs,thres,fthres,path,genesetpath,organism):
 40 |     # In this simplified function, we return the gene set only. The function is only designed for synergy computation.
 41 |     mkup = []
 42 |     mkdown = []
 43 |     for i in list(set(adata1.obs[clobs].values.tolist())):
 44 |         adata = adata1
 45 |         clusterindex = (adata.obs[clobs].values==i)
 46 |         tmpte = adata.X[clusterindex,:]
 47 |         clustername = i
 48 |         pv = np.zeros(tmpte.shape[1])
 49 |         for k in range(tmpte.shape[1]):
 50 |             st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
 51 |         genenames = adata.var_names.values
 52 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
 53 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1)* (np.abs(np.median(tmpte,axis=0))>fthres))>0
 54 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
 55 |         upgenes1 = genenames[upindex]
 56 |         downgenes1 = genenames[downindex]
 57 |         allgenes1 = genenames[allindex]
 58 |         adata = adata2
 59 |         clusterindex = (adata.obs[clobs].values==i)
 60 |         tmpte = adata.X[clusterindex,:]
 61 |         clustername = i
 62 |         pv = np.zeros(tmpte.shape[1])
 63 |         for k in range(tmpte.shape[1]):
 64 |             st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
 65 |         genenames = adata.var_names.values
 66 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
 67 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1)* (np.abs(np.median(tmpte,axis=0))>fthres))>0
 68 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
 69 |         upgenes2 = genenames[upindex]
 70 |         downgenes2 = genenames[downindex]
 71 |         allgenes2 = genenames[allindex]
 72 |         up1syn = list(set(upgenes1.tolist()) - set(upgenes2.tolist()))
 73 |         up2syn = list(set(upgenes2.tolist()) - set(upgenes1.tolist()))
 74 |         down1syn = list(set(downgenes1.tolist()) - set(downgenes2.tolist()))
 75 |         down2syn = list(set(downgenes2.tolist()) - set(downgenes1.tolist()))
 76 |         allgenes = list(set(up1syn) | set(up2syn) | set(down1syn) | set(down2syn))
 77 |         enr_up1 = gp.enrichr(gene_list=up1syn, gene_sets=genesetpath,
 78 |                      no_plot=True,organism=organism,
 79 |                      outdir=path, format='png')
 80 |         enr_up2 = gp.enrichr(gene_list=up2syn, gene_sets=genesetpath,
 81 |                      no_plot=True,organism=organism,
 82 |                      outdir=path, format='png')
 83 |         enr_down1 = gp.enrichr(gene_list=down1syn, gene_sets=genesetpath,
 84 |                      no_plot=True,organism=organism,
 85 |                      outdir=path, format='png')
 86 |         enr_down2 = gp.enrichr(gene_list=down2syn, gene_sets=genesetpath,
 87 |                      no_plot=True,organism=organism,
 88 |                      outdir=path, format='png')
 89 |         if not enr_up1.results.empty:
 90 |             enr_up1.results.iloc[enr_up1.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Up1'+clustername+'.csv')
 91 |         if not enr_up2.results.empty:
 92 |             enr_up2.results.iloc[enr_up2.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Up2'+clustername+'.csv')
 93 |         if not enr_down1.results.empty:
 94 |             enr_down1.results.iloc[enr_down1.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Down1'+clustername+'.csv')
 95 |         if not enr_down2.results.empty:
 96 |             enr_down2.results.iloc[enr_down2.results['Adjusted P-value'].values<1e-2,:].to_csv(path+'/Down2'+clustername+'.csv')
 97 |         upgenes1df = pd.DataFrame(index=up1syn)
 98 |         upgenes2df = pd.DataFrame(index=up2syn)
 99 |         downgenes1df = pd.DataFrame(index=down1syn)
100 |         downgenes2df = pd.DataFrame(index=down2syn)
101 |         allgenesdf = pd.DataFrame(index=allgenes)
102 |         upgenes1df.to_csv(path+'/Upnames1'+clustername+'.csv')
103 |         upgenes2df.to_csv(path+'/Upnames2'+clustername+'.csv')
104 |         downgenes1df.to_csv(path+'/Downnames1'+clustername+'.csv')
105 |         downgenes2df.to_csv(path+'/Downnames2'+clustername+'.csv')
106 |         allgenesdf.to_csv(path+'/names'+clustername+'.csv')
107 | 
108 |     return
109 | 
110 | 
111 | def clustertest(adata,clobs,thres,fthres,label,path,genesetpath,organism,onlyup=False):
112 |     # Changed from ttest to Wilcoxon test
113 |     clusternum = int(np.max(np.asarray(adata.obs[clobs].values, dtype=float)))  # np.asfarray was removed in NumPy 2.0
114 |     genenum = np.zeros([clusternum+1])
115 |     mk = []
116 |     for i in range(clusternum+1):
117 |         clusterindex = (np.asarray(adata.obs[clobs].values, dtype=float)==i)
118 |         tmpte = adata.X[clusterindex,:]
119 |         clustername = adata.obs[label][clusterindex].iloc[0]  # positional access; obs is indexed by cell name
120 |         pv = np.zeros(tmpte.shape[1])
121 |         for k in range(tmpte.shape[1]):
122 |             st, pv[k] = wilcoxon(tmpte[:,k],zero_method='zsplit')
123 |         genenames = adata.var_names.values
124 |         upindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)>0)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
125 |         downindex = (((pv<thres)*1) * ((np.median(tmpte,axis=0)<0)*1)* (np.abs(np.median(tmpte,axis=0))>fthres))>0
126 |         allindex = (((pv<thres)*1) * (np.abs(np.median(tmpte,axis=0))>fthres))>0
127 |         upgenes = genenames[upindex]
128 |         downgenes = genenames[downindex]
129 |         allgenes = genenames[allindex]
130 |         mk.extend(allgenes.tolist())
131 |         mk = list(set(mk))
132 |         genenum[i] = np.sum(((pv<thres)*1) * ((np.abs(np.mean(tmpte,axis=0))>fthres)))
133 |         enr_up = gp.enrichr(gene_list=upgenes.tolist(), gene_sets=genesetpath,
134 |                      no_plot=True,organism=organism,
135 |                      outdir=path, format='png')
136 |         enr_down = gp.enrichr(gene_list=downgenes.tolist(), gene_sets=genesetpath,
137 |                      no_plot=True,organism=organism,
138 |                      outdir=path, format='png')
139 |         enr = gp.enrichr(gene_list=allgenes.tolist(), gene_sets=genesetpath,
140 |                      no_plot=True,organism=organism,
141 |                      outdir=path, format='png')
142 |         if not enr_up.results.empty:
143 |             enr_up.results.iloc[enr_up.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/Up'+clustername+'.csv')
144 |         if not enr_down.results.empty:
145 |             enr_down.results.iloc[enr_down.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/Down'+clustername+'.csv')
146 |         if not enr.results.empty:
147 |             enr.results.iloc[enr.results['Adjusted P-value'].values<1e-3,:].to_csv(path+'/'+clustername+'.csv')
148 |         upgenesdf = pd.DataFrame(index=upgenes)
149 |         downgenesdf = pd.DataFrame(index=downgenes)
150 |         allgenesdf = pd.DataFrame(index=allgenes)
151 |         upgenesdf.to_csv(path+'/Upnames'+clustername+'.csv')
152 |         downgenesdf.to_csv(path+'/Downnames'+clustername+'.csv')
153 |         allgenesdf.to_csv(path+'/names'+clustername+'.csv')
154 |         if onlyup:
155 |             enr = enr_up
156 | 
157 |         if not enr.results.empty:
158 |             if i == 0:
159 |                 df = enr.results.transpose().iloc[4:5,:]
160 |                 df.columns = enr.results['Term'][:]
161 |                 df.index.values[0] = clustername
162 |             else:
163 |                 tmp = enr.results.transpose().iloc[4:5,:]
164 |                 tmp.columns = enr.results['Term'][:]
165 |                 tmp.index.values[0] = clustername
166 |                 df = pd.concat([df,tmp])
167 |     #df.values = -np.log10(df.values)
168 |     #DF = sc.AnnData(df.transpose())
169 |     #sc.pl.clustermap(DF,cmap='viridis', col_cluster=False)
170 |     return genenum, df, mk
171 | 
172 | 
173 | def concordance_map(confounder,response,obs_label, cl_label, condition):
174 |     #deprecated
175 |     cf = confounder[confounder.obs[obs_label] == condition,:]
176 |     cf.obs['res_cl'] = response.obs[cl_label].values
177 |     aswmatrix = np.zeros([len(list(set(cf.obs['res_cl'].values.tolist()))),len(list(set(cf.obs['res_cl'].values.tolist())))])
178 |     indnummatrix = pd.DataFrame(None,list(set(cf.obs['res_cl'].values.tolist())),list(set(cf.obs['res_cl'].values.tolist())))
179 |     k = 0
180 |     #return aswmatrix
181 |     for i in list(set(cf.obs['res_cl'].values.tolist())):
182 |         l = 0
183 |         for j in list(set(cf.obs['res_cl'].values.tolist())):
184 |             if i != j:
185 |                 tmpcf = cf[cf.obs['res_cl'].isin([i,j]),:].copy()
186 |                 sc.pp.pca(tmpcf)
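187 |                 encoder = OneHotEncoder()  # default output is sparse; densified below so this does not depend on the `sparse`/`sparse_output` keyword
188 |                 onehot = encoder.fit_transform(np.array(tmpcf.obs['res_cl'].values.tolist()).reshape(-1, 1)).toarray()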
189 |                 label = onehot[:,0]
190 |                 lc = LogisticRegression(penalty='l1',solver='liblinear',C=1)
191 |                 lc.fit(tmpcf.X, label)
192 |                 prob = lc.predict_proba(tmpcf.X)
193 |                 prob1 = prob[label==1,0]
194 |                 prob2 = prob[label==0,0]
195 |                 st, pv = kstest(prob1,prob2)
196 |                 #yi = np.zeros([onehot.shape[1],eigen.shape[1]])
197 |                 aswmatrix[k,l] = -np.log10(pv+1e-20)
198 |                 if np.sum(lc.coef_!=0)>0:
199 |                     indnummatrix.iloc[k,l] = str(np.argwhere(lc.coef_[0] !=0)[:,0].tolist())[1:-1]
200 |             else:
201 |                 aswmatrix[k,l] = 0
202 |             l = l + 1
203 |         k = k + 1
204 |     aswmatrix = pd.DataFrame(aswmatrix,list(set(cf.obs['res_cl'].values.tolist())),list(set(cf.obs['res_cl'].values.tolist())))
205 |     return aswmatrix, indnummatrix
206 | 
207 | 
208 | def coarse_matching(de,de_label,ref,ref_label,ot,scaling=1e6,mode='mean'):
209 |     coarse_ot = pd.DataFrame(index=sorted(set(de.obs[de_label].values.tolist())),columns=sorted(set(ref.obs[ref_label].values.tolist())),dtype=float)
210 |     for i in set(de.obs[de_label].values.tolist()):
211 |         for j in set(ref.obs[ref_label].values.tolist()):
212 |             tmp_ot = ot[de.obs[de_label]==i,:]
213 |             if mode=='mean':
214 |                 coarse_ot.loc[i,j] = np.mean(tmp_ot[:,ref.obs[ref_label]==j]) * scaling  # .loc avoids chained assignment
215 |             else:
216 |                 coarse_ot.loc[i,j] = np.sum(tmp_ot[:,ref.obs[ref_label]==j]) * scaling
217 |     return coarse_ot
218 | 
219 | def sankey_plot(coarse_ot,thres1=0.1,thres2=0.1,title_text="Sankey Diagram",width=600,height=400):
220 |     new_coarse_ot = pd.DataFrame(np.zeros([coarse_ot.shape[0]*coarse_ot.shape[1],3]), dtype=object)  # object dtype: columns 0 and 1 hold string labels, column 2 holds the flow value
221 |     k = 0
222 |     for i in range(coarse_ot.shape[0]):
223 |         for j in range(coarse_ot.shape[1]):
224 |             thres_ = max(thres1 * np.sum(coarse_ot.values[i,:]), thres2 * np.sum(coarse_ot.values[:,j]))
225 |             if coarse_ot.values[i,j] > thres_:
226 |                 new_coarse_ot.iloc[k,1] = 'Response: ' + coarse_ot.index[i]
227 |                 new_coarse_ot.iloc[k,0] = coarse_ot.columns[j]
228 |                 new_coarse_ot.iloc[k,2] = coarse_ot.values[i,j]
229 |         
230 |                 k = k + 1
231 |     new_coarse_ot = new_coarse_ot.loc[new_coarse_ot.iloc[:,2]>0,:]
232 |     a = set(new_coarse_ot[0].values.tolist())
233 |     b = set(new_coarse_ot[1].values.tolist())
234 |     a0 = []
235 |     for i in range(len(list(a))):
236 |         a0.append(list(a)[i][:-1])
237 |     a0 = list(set(a0))
238 |     
239 |     source = np.arange(new_coarse_ot.shape[0] + new_coarse_ot.shape[0])
240 |     target = np.arange(new_coarse_ot.shape[0] + new_coarse_ot.shape[0])
241 |     
242 |     for i in range(new_coarse_ot.shape[0]):
243 |         source[i+new_coarse_ot.shape[0]] = np.argwhere(np.array(list(a))==new_coarse_ot[0].values[i])[0][0]
244 |         target[i+new_coarse_ot.shape[0]] = np.argwhere(np.array(list(b))==new_coarse_ot[1].values[i])[0][0]
245 |     
246 |     target = target + len(list(a))
247 |     
248 |     for i in range(new_coarse_ot.shape[0]):
249 |         source[i] = np.argwhere(np.array(a0)==new_coarse_ot[0].values[i][:-1])[0][0]
250 |         target[i] = np.argwhere(np.array(list(a))==new_coarse_ot[0].values[i])[0][0]
251 |     
252 |     target = target + len(a0)
253 |     source[new_coarse_ot.shape[0]:] = source[new_coarse_ot.shape[0]:] + len(a0)
254 |     values = np.zeros(2*new_coarse_ot.shape[0])
255 |     for i in range(new_coarse_ot.shape[0]):
256 |         values[i] = np.sum(new_coarse_ot.values[:,2][new_coarse_ot.values[:,0]==new_coarse_ot.values[i,0]]) / np.sum(new_coarse_ot.values[:,0]==new_coarse_ot.values[i,0])
257 |     
258 |     values[new_coarse_ot.shape[0]:] = new_coarse_ot.values[:,2]
259 |     colorlist = px.colors.qualitative.Plotly
260 |     colors = np.array(a0 + list(a) + list(b))
261 |     colors[0:len(a0)] = colorlist[0:len(a0)]
262 |     for i in range(len(a0),len(a0)+len(list(a))):
263 |         colors[i] = colors[0:len(a0)][np.array(a0)==(list(a)[i-len(a0)][:-1])][0]
264 |     for i in range(len(a0)+len(list(a)),len(a0)+len(list(a))+len(list(b))):
265 |         colors[i] = colors[0:len(a0)][np.array(a0)==(list(b)[i-len(a0)-len(list(a))][10:-1])][0]
266 | 
267 |     fig = go.Figure(data=[go.Sankey(
268 |     node = dict(
269 |       pad = 15,
270 |       thickness = 20,
271 |       #line = dict(color = "black", width = 0.5),
272 |       label = a0 + list(a) + list(b),
273 |       color = colors
274 |     ),
275 |     link = dict(
276 |       source = source, # indices correspond to labels, eg A1, A2, A1, B1, ...
277 |       target = target,
278 |       value = values
279 |   ))])
280 | 
281 |     fig.update_layout(title_text=title_text, font_family="Arial", font_size=10,width=width, height=height)
282 |     fig.show()
283 |     return
284 | 
285 | 
286 | 
287 | 


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=42"]
3 | build-backend = "setuptools.build_meta"


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
 1 | [metadata]
 2 | name = cinemaot
 3 | version = 0.0.4
 4 | author = Mingze Dong
 5 | author_email = mingze.dong@yale.edu
 6 | description = Causal INdependent Effect Module Attribution + Optimal Transport
 7 | long_description = file: Readme.md
 8 | long_description_content_type = text/markdown
 9 | url = https://github.com/vandijklab/CINEMA-OT
10 | project_urls =
11 |     Bug Tracker = https://github.com/vandijklab/CINEMA-OT/issues
12 | classifiers =
13 |     Programming Language :: Python :: 3
14 |     Development Status :: 2 - Pre-Alpha
15 |     Operating System :: OS Independent
16 | 
17 | [options]
18 | package_dir =
19 | packages = find:
20 | python_requires = >=3.7
21 | install_requires =
22 |     numpy
23 |     pandas
24 |     scanpy
25 |     scikit-learn
26 |     scipy
27 |     statsmodels
28 |     anndata
29 | 
30 | [options.packages.find]
31 | where = 
32 | 
33 | 


--------------------------------------------------------------------------------
/simulation.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scanpy as sc
 3 | import matplotlib.pyplot as plt
 4 | from scsim_master import scsim
 5 | import pandas as pd
 6 | 
 7 | import random
 8 | 
 9 | def numbers_with_sum(n, k):
10 |     """Return n random non-negative integers that sum to k."""
11 |     if n == 1:
12 |         return [k]
13 |     num = random.randint(0, k)
14 |     return [num] + numbers_with_sum(n - 1, k - num)
15 | 
16 | random.seed(0)
17 | np.random.seed(0)
18 | for i in range(15):
19 |     states_num = round(i/5) + 2
20 |     gp = numbers_with_sum(states_num, 10-states_num)
21 |     simulator = scsim.scsim(ngenes=1000, ncells=5000, seed = i, ngroups=states_num, libloc=7.64, libscale=0.78,
22 |                  mean_rate=7.68,mean_shape=0.34, expoutprob=0.00286,
23 |                  expoutloc=6.15, expoutscale=0.49,
24 |                  diffexpprob=.5, diffexpdownprob=.5, diffexploc=1, diffexpscale=1,
25 |                  bcv_dispersion=0.448, bcv_dof=22.087, ndoublets=0,groupprob=(np.array(gp)+1)/10,proggoups=[1,2],nproggenes=500,
26 |                  progdeloc=1,progdescale=1,progdownprob=0.,progcellfrac = 1.)
27 |     
28 |     simulator.simulate()
29 |     tmpobs = simulator.cellparams
30 |     ## "Groups" represent the treatment variable
31 |     tmpobs['Groups'] = 0
32 |     tmpobs['Response_state'] = 0
33 |     response_num = round(i/5) + 1
34 |     attribution_matrix = np.zeros([states_num,response_num])
35 |     simulator2_counts = simulator.counts.iloc[:,0:500].copy()
36 |     for j in range(states_num):
37 |         ncells_j = np.sum(simulator.cellparams['group']==(j+1))
38 |         
39 |         group = np.random.randint(0,2,size=ncells_j)
40 |     
41 |         gp2 = np.zeros(response_num+1) + 0.5
42 |         gp2[1:] = (np.array(numbers_with_sum(response_num, 5)))/10
43 |         
44 |         simulator2 = scsim.scsim(ngenes=500, ncells=ncells_j, seed = 300, ngroups=response_num+1, libloc=7.64, libscale=0.78,
45 |                      mean_rate=7.68,mean_shape=0.34, expoutprob=0.00286,
46 |                      expoutloc=6.15, expoutscale=0.49,
47 |                      diffexpprob=.5, diffexpdownprob=.5, diffexploc=1, diffexpscale=1,
48 |                      bcv_dispersion=0.148, bcv_dof=22.087, ndoublets=0,groupprob=gp2,nproggenes=0,
49 |                      progdeloc=1,progdescale=1,progdownprob=0.,progcellfrac = 1.)
50 |         
51 |         attribution_matrix[j,:] = 2 * gp2[1:]
52 |         simulator2.simulate()
53 |         ## group==1 is assigned as control, set the rest as perturbed
54 |         tmpobs.loc[simulator.cellparams['group']==(j+1), 'Groups'] = (simulator2.cellparams['group'].values > 1) * 1 + 1
55 |         tmpobs.loc[simulator.cellparams['group']==(j+1), 'Response_state'] = simulator2.cellparams['group'].values
56 |         simulator2_counts.loc[simulator.cellparams['group']==(j+1),:] = simulator2.counts.values
57 |     ## in the final anndata, 'group' represents cell state / type, 'Groups' represents treated or not, 'Response_state' indicates response heterogeneity
58 |     adata = sc.AnnData(pd.concat([simulator.counts, simulator2_counts], axis=1),obs=tmpobs)
59 |     adata.uns['attribution'] = attribution_matrix
60 |     adata.write('ScsimBenchmarkData/adata'+str(i)+'.h5ad')


--------------------------------------------------------------------------------