├── .gitignore ├── Cohorts ├── Augustine │ └── expressions.tsv.gz ├── Bogunovic │ └── expressions.tsv.gz ├── Hao │ └── expressions.tsv.gz ├── Pan_TCGA │ ├── annotation.tsv │ └── signatures.tsv ├── Raskin │ └── expressions.tsv.gz ├── TCGA_input_expressions │ └── expressions.tsv.gz └── Ulloa-Montoya │ └── expressions.tsv.gz ├── Data_processing_methods.ipynb ├── GEO_data_retrieval.ipynb ├── LICENSE ├── Notice.md ├── README.md ├── Setup.md ├── TME_Classification.ipynb ├── img ├── Abstract.svg ├── TME_workflow.jpg └── mfp_characteristics.png ├── install_r_packages.R ├── make_tme_environment.sh ├── plots ├── distribution_example.svg ├── pca_batches_example_colored.svg ├── pca_batches_example_not_colored.svg ├── pca_no_outliers.svg ├── pca_outliers_example.svg └── umap_examples.svg ├── portraits ├── __init__.py ├── classification.py ├── clustering.py ├── mapping.py ├── plotting.py └── utils.py ├── requirements.txt ├── signatures ├── gene_signatures.gmt └── gene_signatures_order.tsv └── upstream_html ├── From_cell_files.html ├── From_cell_files.pdf ├── Methods_Description_-_Batch_correction.html └── Methods_Description_-_Batch_correction.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by .ignore support plugin (hsz.mobi) 2 | ### Python template 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | downloads/ 17 | eggs/ 18 | .eggs/ 19 | lib/ 20 | lib64/ 21 | parts/ 22 | sdist/ 23 | var/ 24 | wheels/ 25 | pip-wheel-metadata/ 26 | share/python-wheels/ 27 | *.egg-info/ 28 | .installed.cfg 29 | *.egg 30 | MANIFEST 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .nox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | .hypothesis/ 53 | .pytest_cache/ 54 | 55 | # Translations 56 | *.mo 57 | *.pot 58 | 59 | # Django stuff: 60 | *.log 61 | local_settings.py 62 | db.sqlite3 63 | db.sqlite3-journal 64 | 65 | # Flask stuff: 66 | instance/ 67 | .webassets-cache 68 | 69 | # Scrapy stuff: 70 | .scrapy 71 | 72 | # Sphinx documentation 73 | docs/_build/ 74 | 75 | # PyBuilder 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | .python-version 87 | 88 | # pipenv 89 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 90 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 91 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 92 | # install all needed dependencies. 
93 | #Pipfile.lock 94 | 95 | # celery beat schedule file 96 | celerybeat-schedule 97 | 98 | # SageMath parsed files 99 | *.sage.py 100 | 101 | # Environments 102 | .env 103 | .venv 104 | env/ 105 | venv/ 106 | ENV/ 107 | env.bak/ 108 | venv.bak/ 109 | 110 | # Spyder project settings 111 | .spyderproject 112 | .spyproject 113 | 114 | # Rope project settings 115 | .ropeproject 116 | 117 | # mkdocs documentation 118 | /site 119 | 120 | # mypy 121 | .mypy_cache/ 122 | .dmypy.json 123 | dmypy.json 124 | 125 | # Pyre type checker 126 | .pyre/ 127 | 128 | # Idea 129 | .idea/ -------------------------------------------------------------------------------- /Cohorts/Augustine/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/Augustine/expressions.tsv.gz -------------------------------------------------------------------------------- /Cohorts/Bogunovic/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/Bogunovic/expressions.tsv.gz -------------------------------------------------------------------------------- /Cohorts/Hao/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/Hao/expressions.tsv.gz -------------------------------------------------------------------------------- /Cohorts/Raskin/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/Raskin/expressions.tsv.gz -------------------------------------------------------------------------------- 
/Cohorts/TCGA_input_expressions/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/TCGA_input_expressions/expressions.tsv.gz -------------------------------------------------------------------------------- /Cohorts/Ulloa-Montoya/expressions.tsv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/Cohorts/Ulloa-Montoya/expressions.tsv.gz -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Software License Agreement 2 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 3 | 4 | 1 - Definitions 5 | 6 | “BostonGene” shall mean BostonGene Corporation and its affiliates. 7 | 8 | "Derivative Software" shall mean any work, whether in Source or Object Form, that is based on (or derived from) the Software and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 6 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or any entity or entities authorized by the copyright owner that is granting the License. 13 | 14 | "Object Form” shall mean any form resulting from mechanical transformation or translation of a Source Form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 15 | 16 | "Source Form” shall mean software source code, documentation source, configuration files, and data. 
17 | 18 | “Software” - shall mean the work of authorship, whether in Source or Object Form, made available under the License. 19 | 20 | "You" (or "Your") shall mean an individual or any entity or entities exercising permissions granted by this License. 21 | 22 | 2 - Grant of Copyright License. 23 | a - Subject to the terms and conditions of this License, Licensor hereby grants to You a worldwide, non-exclusive, non-transferrable, revocable, non-sublicensable, royalty-free copyright license to: (a) use the Software solely for academic and non-profit purposes; and (b) to reproduce, prepare Derivative Software of, and distribute the Software and such Derivative Software in Source or Object Form, all solely for academic and non-profit purposes. 24 | 25 | b - Any use, reproduction, or distribution of the Software or Derivative Software for direct or indirect commercial (including strategic) gain, purpose, or advantage, including for any research and/or development purpose by a for-profit entity or on behalf of a for-profit entity, requires a separately executed written license agreement. Please direct any inquiries concerning uses under this Section 2b to askusepermission@bostongene.com. 26 | 27 | c - The Software may incorporate third party software which may be subject to additional terms and conditions. Any use, reproduction, or distribution of the Software or Derivative Software shall comply with any additional terms and conditions applicable to such third party software. 28 | 29 | d - Any publication of results obtained with the Software or Derivative Software shall acknowledge its use by an appropriate citation including attribution to BostonGene and any other Licensor. 30 | 31 | 3 - Redistribution. 
You may reproduce and distribute copies of the Software or Derivative Software thereof to third parties in any medium, with or without modifications, and in Source or Object Form, provided that You comply with the following conditions: 32 | 33 | a - Any reproduction or distribution of copies of the Software or Derivative Software must be under the terms and conditions of this License such that any third party obtains rights to the Software or Derivative Software only as a result of accepting the terms and conditions of this License. 34 | 35 | b - Any other recipient of the Software or Derivative Software must receive a copy of this License. 36 | 37 | c - Any modified files must carry prominent notices stating that You changed the files. 38 | 39 | d - The copyright of BostonGene (as “Copyright BostonGene Corporation”), copyrights of any other Licensors, and copyrights of any incorporated third party software as described in the associated documentation must be acknowledged, and any such copyrights must be part of a NOTICE file to be included with any reproduction or distribution of the Software or Derivative Software. 40 | 41 | e - You may add Your own copyright statement to Your modifications. 42 | 43 | 4 - Attribution and Marks. This License does not grant permission to use the names, trade names, trademarks, service marks, or product names of any Licensor, except as required by Section 3d and for reasonable and customary use in describing the origin of the Software and Derivative Software and reproducing the content of the NOTICE file. 44 | 45 | 5 - Termination. 46 | 47 | a - You may not use, reproduce, modify or distribute the Software or Derivative Software except as expressly provided under this License. Any attempt otherwise to use, reproduce, modify, or distribute the Software or Derivative Software is void, and will automatically, without notice, terminate all Your rights under this License, including any and all copyright licenses granted by this License. 
48 | 49 | b - You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to use, reproduce, modify, or distribute the Software or the Derivative Software. These actions are prohibited by law if you do not accept this License. Therefore, by using, reproducing, modifying, or distributing the Software or the Derivative Software, you indicate your acceptance of this License to do so, and all its terms and conditions. 50 | 51 | 6. Disclaimer of Warranty and Limitation of Liability. 52 | 53 | THE SOFTWARE IS PROVIDED BY LICENSOR "AS IS", WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. ALL SUCH WARRANTIES ARE DISCLAIMED. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE AND ASSUME ANY RISKS ASSOCIATED WITH YOUR EXERCISE OF PERMISSIONS UNDER THIS LICENSE. 54 | 55 | IN NO EVENT SHALL LICENSOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. LICENSEE AGREES TO DEFEND, INDEMNIFY AND HOLD HARMLESS LICENSOR FOR ANY CLAIMS ARISING FROM LICENSEE’S USE OF THE SOFTWARE TO THE FULLEST EXTENT PERMITTED BY LAW. 
56 |
57 |
58 | END OF TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
59 |
60 |
61 | Please direct any inquiries concerning uses not expressly permitted by this license (for example, any use prohibited by Section 2b without a separately executed written license agreement) to askusepermission@bostongene.com.
62 |
--------------------------------------------------------------------------------
/Notice.md:
--------------------------------------------------------------------------------
1 | **NOTICE OF OPEN SOURCE LICENSES, TERMS AND CONDITIONS**
2 |
3 | BostonGene’s MFP Software (“MFP”) contains Open Source Software (“OSS”). The OSS is subject to licenses, terms, and conditions imposed by third parties holding copyrights in that OSS. BostonGene has been requested to provide the following notices about that OSS. Please consult the full text of the applicable license for additional information about licenses, terms, and conditions that apply to the OSS used in MFP.
4 |
5 |
6 | MFP contains **pandas**, **scikit-learn**, **scipy**, **numpy**, **networkx**, **seaborn**, and **python-louvain** OSS, licensed under the BSD 3-Clause License, to which the following applies:
7 |
8 | For **pandas**, the following applies:
9 |
10 | Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
11 | All rights reserved.
12 |
13 | Copyright (c) 2011-2023, Open source contributors.
14 |
15 | For **scikit-learn**, the following applies:
16 |
17 | Copyright (c) 2007-2023 The scikit-learn developers.
18 | All rights reserved.
19 |
20 | For **scipy**, the following applies:
21 |
22 | Copyright (c) 2001-2002 Enthought, Inc. 2003-2023, SciPy Developers.
23 | All rights reserved.
24 |
25 | For **numpy**, the following applies:
26 |
27 | Copyright (c) 2005-2023, NumPy Developers.
28 | All rights reserved.
29 | 30 | For **networkx**, the following applies: 31 | Copyright (C) 2004-2023, NetworkX Developers 32 | Aric Hagberg 33 | Dan Schult 34 | Pieter Swart 35 | All rights reserved. 36 | 37 | For **seaborn**, the following applies: 38 | 39 | Copyright (c) 2012-2016, Michael L. Waskom 40 | All rights reserved. 41 | 42 | For **python-louvain**, the following applies: 43 | 44 | Copyright (c) 2009-2018, Thomas Aynaud 45 | All rights reserved. 46 | 47 | For **pandas**, **scikit-learn**, **scipy**, **numpy**, **networkx**, **seaborn**, and **python-louvain**, the following applies: 48 | 49 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 50 | 51 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 52 | 53 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 54 | 55 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 56 | 57 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
58 |
59 | MFP contains **matplotlib** OSS, licensed under the Matplotlib License, to which the following applies:
60 |
61 | Copyright (c) 2012- Matplotlib Development Team; All Rights Reserved
62 |
63 | 1. This LICENSE AGREEMENT is between the Matplotlib Development Team ("MDT"), and the Individual or Organization ("Licensee") accessing and otherwise using matplotlib software in source or binary form and its associated documentation.
64 |
65 | 2. Subject to the terms and conditions of this License Agreement, MDT hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use matplotlib alone or in any derivative version, provided, however, that MDT's License Agreement and MDT's notice of copyright, i.e., "Copyright (c) 2012- Matplotlib Development Team; All Rights Reserved" are retained in matplotlib alone or in any derivative version prepared by Licensee.
66 |
67 | 3. In the event Licensee prepares a derivative work that is based on or incorporates matplotlib or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to matplotlib.
68 |
69 | 4. MDT is making matplotlib available to Licensee on an "AS IS" basis. MDT MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
BY WAY OF EXAMPLE, BUT NOT LIMITATION, MDT MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 70 | 71 | 5. MDT SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING MATPLOTLIB , OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 72 | 73 | 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 74 | 75 | 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between MDT and Licensee. This License Agreement does not grant permission to use MDT trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 76 | 77 | 8. By copying, installing or otherwise using matplotlib , Licensee agrees to be bound by the terms and conditions of this License Agreement. 78 | 79 | 80 | If you have questions regarding any of the above notices or would like to obtain a copy of any OSS that BostonGene is required to provide pursuant to any of the above licenses, please contact BostonGene at askusepermission@bostongene.com. 81 | 82 | 83 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Conserved pan-cancer microenvironment subtypes. 2 | 3 | ## Introduction 4 | Tumor microenvironment (TME) plays a significant role in clinical outcomes and response to antineoplastic therapy. By exerting pro-tumorigenic and anti-tumorigenic actions, tumor-infiltrating immune cells can profoundly influence tumor progression and affect the success of anti-cancer treatments. 
Cancer-associated fibroblasts (CAFs), as well as angiogenic signals from stromal cells, have also been shown to affect outcomes in cancer patients.
5 |
6 |
7 | BostonGene has compiled a curated list of 29 functional gene expression signatures and uses [single-sample Gene Set Enrichment Analysis (ssGSEA) scores](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2783335/) of their expression levels to classify a tumor sample into one of the TME subtypes (Fig. 1). The scores are calculated with a modified ssGSEA formula developed at BostonGene, which is implemented in [utils.py](portraits/utils.py). The materials in this repository help identify the TME type of an input sample.
8 |
9 | ![image2020-11-4_18-20-47](https://user-images.githubusercontent.com/127855909/228009303-964b1147-0f42-4361-819b-bc22be9ccd97.png)
10 |
11 |

Figure 1. Types of tumor microenvironment.

12 |
13 |

A brief description of each type of TME and its graphical interpretation can be found in Fig. 2.

14 |
15 | ![image2020-11-4_18-12-26](https://user-images.githubusercontent.com/127855909/228009221-3fe09cc9-a30a-4d3f-aa4b-3641c6278f7e.png)
16 |
17 |

Figure 2. Brief description of the TME types.

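The signature scoring mentioned in the introduction can be illustrated with a simplified rank-based score. This is only a sketch of the general idea behind single-sample enrichment scores. It is not the modified formula from `portraits/utils.py`, and the function name `rank_signature_score`, the genes-in-rows orientation, and the toy gene names are assumptions made for illustration.

```python
import pandas as pd


def rank_signature_score(expr: pd.DataFrame, genes: list[str]) -> pd.Series:
    """Simplified rank-based enrichment score for one signature.

    expr  -- expression matrix, genes in rows and samples in columns (assumed)
    genes -- gene symbols that make up the signature
    Returns one score per sample: the mean within-sample rank of the
    signature genes minus the mean rank of all genes.
    """
    ranks = expr.rank(axis=0)  # rank genes inside each sample, 1 = lowest
    present = [g for g in genes if g in ranks.index]
    if not present:
        # No signature gene was measured: return a neutral score.
        return pd.Series(0.0, index=expr.columns)
    expected = (len(ranks.index) + 1) / 2  # mean rank over all genes
    return ranks.loc[present].mean(axis=0) - expected


# Toy cohort: four hypothetical genes, two samples.
toy = pd.DataFrame(
    {"S1": [10.0, 1.0, 5.0, 2.0], "S2": [1.0, 10.0, 2.0, 5.0]},
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
scores = rank_signature_score(toy, ["GENE_A", "GENE_C"])
print(scores)  # S1 is positive (signature genes rank high), S2 is negative
```

A positive score means the signature genes sit in the upper part of that sample's expression ranking. Computing one such score per signature gives each sample a vector of signature scores that can then be compared with a reference cohort.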
18 |
19 | ## Citation
20 | If the software, data, and/or website are used in your publication, please cite [Bagaev A et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell. 2021 Jun 14;39(6):845-865](https://www.cell.com/cancer-cell/fulltext/S1535-6108(21)00222-1#articleInformation) and reference this repository.
21 |
22 |
23 | For more information, visit [BostonGene’s Scientific portal](https://science.bostongene.com/tumor-portrait/).
24 |
25 |
26 | ## Setup
27 | Set up your environment according to the requirements described in [Setup.md](Setup.md).
28 |
29 |
30 | If your environment is already set up accordingly, clone the GitHub repository to start your analysis.
31 |
32 |
33 |
34 |     git clone https://github.com/BostonGene/MFP.git
35 |
36 |
37 | ## Implementation overview
38 | ***Note: The example analysis is performed for a cohort. Do not perform TME classification analyses for just one sample.***
39 |
40 |
41 | The analysis workflow is presented in the diagram below, highlighting the main steps and logical elements of the notebook.
42 |
43 | ![TME_workflow](https://github.com/BostonGene/MFP/assets/127855909/32a0c5ab-55fc-4670-a522-148899364327)
44 |
45 | The analysis comprises two notebooks: [TME_Classification.ipynb](TME_Classification.ipynb) and [GEO_data_retrieval.ipynb](GEO_data_retrieval.ipynb).
46 |
47 | TME_Classification.ipynb is the primary component of the analysis, with sections for data processing and classification. It also includes a section that lets users supply their own reference cohort for classification.
48 |
49 | For users who need to download their data from GEO before starting the analysis, the GEO_data_retrieval.ipynb notebook is available. It provides a dedicated workflow for retrieving data from GEO so that the result can be passed to the subsequent analysis pipeline.
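Because the analysis is meant for a cohort rather than a single sample, it can help to check the input matrix before running the notebooks. The following is a minimal sketch, not code from this repository. The helper name `load_cohort`, the `min_samples` threshold, and the assumed layout (tab-separated, genes in rows, samples in columns) are illustrative assumptions, so check the orientation of your own `expressions.tsv` first.

```python
import io

import pandas as pd


def load_cohort(source, min_samples: int = 2) -> pd.DataFrame:
    """Read a tab-separated expression matrix and require a cohort.

    source -- path (for example "expressions.tsv.gz") or file-like object;
              pandas infers gzip compression from a ".gz" file extension
    Assumes genes in rows and samples in columns; transpose otherwise.
    """
    expr = pd.read_csv(source, sep="\t", index_col=0)
    n_samples = expr.shape[1]
    if n_samples < min_samples:
        raise ValueError(
            f"Expected at least {min_samples} samples, found {n_samples}"
        )
    return expr


# In-memory stand-in for a real file, with made-up values.
demo_tsv = "Gene\tS1\tS2\nCD8A\t5.1\t3.2\nCD4\t2.0\t4.4\n"
cohort = load_cohort(io.StringIO(demo_tsv))
print(cohort.shape)
```

Failing early with a clear message is a design choice of this sketch: it stops a one-sample table from reaching the classification step.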
50 |
51 |
52 | The pipeline consists of several connected steps, some of which are optional (depending on the user's choice):
53 |
54 | * Data preparation
55 |     * Data retrieval (and normalization)
56 | * Quality Check (QC)
57 |     * Batch detection
58 |     * Outlier detection
59 |     * Data distribution check for data quality
60 | * Classification
61 |     * Reading the expressions.tsv file to get the gene expression matrix of the samples of interest
62 |     * Getting the reference gene signature expression matrix (the TCGA cohort is set by default and can be replaced with the path to your own reference cohort)
63 |     * Identifying the TME subtype of the sample(s) of interest by comparing their ssGSEA scores to the ssGSEA scores of the reference cohort
64 |     * Writing an output .tsv file with the TME subtype of the sample(s)
65 | * Clustering of a reference cohort (optional; we recommend using the default TCGA cohort)
66 |     * Getting a reference cohort as input
67 |     * Identifying the TME subtype for each sample based on its ssGSEA scores
68 |     * Writing an output .tsv table with the sample subtypes in the reference cohort
69 |
70 | **Note: It is recommended to use the default TCGA cohort to avoid possible problems during analysis.**
71 |
72 |
73 | You can also access the [Data_processing_methods.ipynb](Data_processing_methods.ipynb) notebook, which provides additional information on how the batch correction and outlier detection analyses were done in the article.
74 |
--------------------------------------------------------------------------------
/Setup.md:
--------------------------------------------------------------------------------
1 | # Setup
2 | All calculations and analyses are done in Jupyter notebooks. The code is written in Python and partially in R. R is used to retrieve data from GEO as CEL files and process them into RNA-seq-like expression data.
3 |
4 | We recommend installing our Python virtual environment to perform the analysis. For detailed instructions, please refer to the “Preparation of the environment” section.
5 |
6 |
7 | ## Environment Requirements
8 | * Python 3.10
9 |     * The packages are listed in requirements.txt
10 | * R 4.0.0 or higher
11 |     * The packages are listed in install_r_packages.R
12 | * Jupyter notebook with its R and Python kernels
13 | * WSL (for Windows users)
14 |
15 | ## WSL installation
16 | If you are a Windows user, install WSL on your computer. If your operating system is Unix-based, skip this step.
17 |
18 | To install WSL, follow the instructions provided on the [WSL installation webpage](https://learn.microsoft.com/en-us/windows/wsl/install).
19 |
20 | ## Preparation of the environment
21 | Follow the steps below in the given order:
22 |
23 | ***Installation of Python using apt***
24 |
25 |
26 |     sudo apt-get update
27 |     sudo apt-get install python3.10-venv python3.10-dev python3-pip
28 |
29 | ***Create the Python environment***
30 |
31 |
32 |     git clone https://github.com/BostonGene/MFP
33 |     cd MFP
34 |     bash make_tme_environment.sh
35 |
36 | make_tme_environment.sh creates a Python 3.10 virtual environment with all necessary packages and registers a Jupyter kernel (ipykernel) for the environment under the name tme_env.
37 |
38 |
39 | ***Installation of Python using conda***
40 |
41 |
42 | If you want to create a Python environment via conda, please follow this [link](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/installing-with-conda.html).
43 |
44 |
45 | ***Install the Jupyter kernel for your environment (for conda)***
46 |
47 |
48 |     python -m ipykernel install --user --name=MFP_env
49 |
50 |
51 | **If your data is in CEL format, you need to install R and all of the required packages.**
52 |
53 |
54 | ***Installation and preparation of R***
55 |
56 |
57 |     sudo apt install r-base-core
58 |     Rscript -e "install.packages('IRkernel')"
59 |     Rscript -e "IRkernel::installspec(user = FALSE)"
60 |     Rscript -e 'install.packages("httr", repos="http://cran.rstudio.com/")'
61 |     Rscript -e 'install.packages("RJSONIO", repos="http://cran.rstudio.com/")'
62 |     sudo apt-get install libxml2-dev libcurl4-openssl-dev libssl-dev
63 |     Rscript install_r_packages.R
64 |
65 |
66 |
--------------------------------------------------------------------------------
/img/TME_workflow.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/img/TME_workflow.jpg
--------------------------------------------------------------------------------
/img/mfp_characteristics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/img/mfp_characteristics.png
--------------------------------------------------------------------------------
/install_r_packages.R:
--------------------------------------------------------------------------------
1 | install.packages("BiocManager")
2 |
3 | path_to_download_packages <- .libPaths()
4 |
5 | BiocManager::install(c("affy", "annotate", "gcrma"), dependencies=TRUE, lib=path_to_download_packages[1])
--------------------------------------------------------------------------------
/make_tme_environment.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | set -e
3 |
4 | VENV_NAME="TME_env"
5 |
6 | if [ -d "$VENV_NAME" ]
7 | then
8 |     echo "Directory $VENV_NAME already exists."
9 |     echo "If you want to reinstall the environment, execute the command below and then restart the script:"
10 |     echo "rm -rf $VENV_NAME"
11 |     exit 1
12 | fi
13 |
14 | echo "Create new virtual environment with name: '$VENV_NAME'"
15 | python3.10 -m venv "$VENV_NAME"
16 |
17 | echo "Enter virtual environment"
18 | source "$VENV_NAME/bin/activate"
19 |
20 | echo "Install packages"
21 | pip install --upgrade pip wheel --no-cache-dir
22 | pip install -r requirements.txt --no-cache-dir
23 |
24 | echo "Create jupyter kernel with name $VENV_NAME" | tr '[:upper:]' '[:lower:]'
25 | python -m ipykernel install --user --name="$VENV_NAME"
--------------------------------------------------------------------------------
/plots/pca_batches_example_colored.svg:
--------------------------------------------------------------------------------
[SVG figure, Matplotlib v3.5.2, created 2023-03-14: two PCA scatter panels. Panel "With Batches": x-axis "PCA 1 component 23% variance explained", y-axis "PCA 2 component 9% variance explained". Panel "Without Batches": x-axis "PCA 1 component 14% variance explained", y-axis "PCA 2 component 10% variance explained".]
--------------------------------------------------------------------------------
/plots/pca_batches_example_not_colored.svg:
--------------------------------------------------------------------------------
[SVG figure, Matplotlib v3.5.2, created 2023-03-14: PCA scatter plot. First panel: x-axis "PCA 1 component 23% variance explained", y-axis "PCA 2 component 9% variance explained".]
| 304 | With Batches 305 | 306 | 307 | 308 | 309 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | 365 | 366 | 367 | 368 | 369 | 370 | 371 | 372 | 373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | 390 | 391 | 392 | 393 | −50 394 | 395 | 396 | 397 | 398 | 399 | 400 | 401 | 402 | 403 | 0 404 | 405 | 406 | 407 | 408 | 409 | 410 | 411 | 412 | 413 | 50 414 | 415 | 416 | 417 | 418 | 419 | 420 | 421 | 422 | 423 | 100 424 | 425 | 426 | 427 | PCA 1 component 14% variance explained 428 | 429 | 430 | 431 | 432 | 433 | 434 | 435 | 436 | 437 | 438 | −100 439 | 440 | 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | −50 449 | 450 | 451 | 452 | 453 | 454 | 455 | 456 | 457 | 458 | 0 459 | 460 | 461 | 462 | 463 | 464 | 465 | 466 | 467 | 468 | 50 469 | 470 | 471 | 472 | 473 | 474 | 475 | 476 | 477 | 478 | 100 479 | 480 | 481 | 482 | 483 | 484 | 485 | 486 | 487 | 488 | 150 489 | 490 | 491 | 492 | PCA 2 component 10% variance explained 493 | 494 | 495 | 496 | 499 | 500 | 501 | 504 | 505 | 506 | 509 | 510 | 511 | 514 | 515 | 516 | Without Batches 517 | 518 | 519 | 520 | 521 | 522 | 523 | 524 | 525 | 526 | 527 | 528 | 529 | -------------------------------------------------------------------------------- /plots/pca_no_outliers.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 2023-03-13T14:46:49.574530 10 | image/svg+xml 11 | 12 | 13 | Matplotlib v3.5.2, https://matplotlib.org/ 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 43 | 44 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 0.0 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 0.2 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 0.4 
74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 0.6 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 0.8 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 1.0 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 0.0 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 0.2 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 0.4 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 0.6 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 0.8 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 1.0 171 | 172 | 173 | 174 | 175 | 178 | 179 | 180 | 183 | 184 | 185 | 188 | 189 | 190 | 193 | 194 | 195 | 196 | 197 | -------------------------------------------------------------------------------- /plots/pca_outliers_example.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 2023-03-14T14:48:16.097026 10 | image/svg+xml 11 | 12 | 13 | Matplotlib v3.5.2, https://matplotlib.org/ 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 0 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 200 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 400 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 600 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 800 179 | 180 | 181 | 182 | PCA 1 component 49% variance explained 183 | 184 | 185 | 186 | 187 | 188 | 189 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | −75 199 | 200 | 
201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | −50 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | −25 219 | 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | 0 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 25 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | 50 249 | 250 | 251 | 252 | 253 | 254 | 255 | 256 | 257 | 258 | 75 259 | 260 | 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 100 269 | 270 | 271 | 272 | 273 | 274 | 275 | 276 | 277 | 278 | 125 279 | 280 | 281 | 282 | PCA 2 component 6% variance explained 283 | 284 | 285 | 286 | 289 | 290 | 291 | 294 | 295 | 296 | 299 | 300 | 301 | 304 | 305 | 306 | With Outliers 307 | 308 | 309 | 310 | 311 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | 365 | 366 | 367 | 368 | 369 | 370 | 371 | 372 | 373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | 390 | 391 | 392 | 393 | 394 | 395 | −50 396 | 397 | 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | 0 406 | 407 | 408 | 409 | 410 | 411 | 412 | 413 | 414 | 415 | 50 416 | 417 | 418 | 419 | 420 | 421 | 422 | 423 | 424 | 425 | 100 426 | 427 | 428 | 429 | PCA 1 component 14% variance explained 430 | 431 | 432 | 433 | 434 | 435 | 436 | 437 | 438 | 439 | 440 | −100 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | 449 | 450 | −50 451 | 452 | 453 | 454 | 455 | 456 | 457 | 458 | 459 | 460 | 0 461 | 462 | 463 | 464 | 465 | 466 | 467 | 468 | 469 | 470 | 50 471 | 472 | 473 | 474 | 475 | 476 | 477 | 478 | 479 | 480 | 100 481 | 482 | 483 | 484 | 485 | 486 | 487 | 488 | 489 | 490 | 150 491 | 492 | 493 | 494 | PCA 2 component 10% variance explained 495 | 496 | 497 | 498 | 501 | 502 | 503 | 506 | 507 | 508 | 511 | 512 | 513 | 516 | 517 | 518 | Without Outliers 519 | 
520 | 521 | 522 | 523 | 524 | 525 | 526 | 527 | 528 | 529 | 530 | 531 | -------------------------------------------------------------------------------- /plots/umap_examples.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 2023-03-14T13:29:44.593569 10 | image/svg+xml 11 | 12 | 13 | Matplotlib v3.5.2, https://matplotlib.org/ 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | −5 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 0 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 5 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 10 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 15 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 20 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 13 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | 14 244 | 245 | 246 | 247 | 248 | 249 | 250 | 251 | 252 | 253 | 15 254 | 255 | 256 | 257 | 258 | 259 | 260 | 261 | 262 | 263 | 16 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | 17 274 | 275 | 276 | 277 | 278 | 281 | 282 | 283 | 286 | 287 | 288 | 291 | 292 | 293 | 296 | 297 | 298 | With Batches 299 | 300 | 301 | 302 | 303 | 309 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 
322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | 365 | 366 | 367 | 368 | 369 | 370 | 371 | 372 | 373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | 382 | 383 | 384 | 385 | 386 | 387 | −2 388 | 389 | 390 | 391 | 392 | 393 | 394 | 395 | 396 | 397 | −1 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | 406 | 407 | 0 408 | 409 | 410 | 411 | 412 | 413 | 414 | 415 | 416 | 417 | 1 418 | 419 | 420 | 421 | 422 | 423 | 424 | 425 | 426 | 427 | 2 428 | 429 | 430 | 431 | 432 | 433 | 434 | 435 | 436 | 437 | 3 438 | 439 | 440 | 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | 449 | 4.5 450 | 451 | 452 | 453 | 454 | 455 | 456 | 457 | 458 | 459 | 5.0 460 | 461 | 462 | 463 | 464 | 465 | 466 | 467 | 468 | 469 | 5.5 470 | 471 | 472 | 473 | 474 | 475 | 476 | 477 | 478 | 479 | 6.0 480 | 481 | 482 | 483 | 484 | 485 | 486 | 487 | 488 | 489 | 6.5 490 | 491 | 492 | 493 | 494 | 495 | 496 | 497 | 498 | 499 | 7.0 500 | 501 | 502 | 503 | 504 | 505 | 506 | 507 | 508 | 509 | 7.5 510 | 511 | 512 | 513 | 514 | 515 | 516 | 517 | 518 | 519 | 8.0 520 | 521 | 522 | 523 | 524 | 527 | 528 | 529 | 532 | 533 | 534 | 537 | 538 | 539 | 542 | 543 | 544 | Without Batches 545 | 546 | 547 | 548 | 549 | 550 | 551 | 552 | 553 | 554 | 555 | 556 | 557 | -------------------------------------------------------------------------------- /portraits/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/portraits/__init__.py -------------------------------------------------------------------------------- /portraits/classification.py: -------------------------------------------------------------------------------- 1 | import copy 2 | 3 | import pandas as pd 4 | from sklearn.neighbors import 
KNeighborsClassifier 5 | 6 | from portraits.utils import median_scale 7 | 8 | 9 | class KNeighborsClusterClassifier: 10 | def __init__(self, norm=True, algorithm='auto', clip=2, scale=False, k=35): 11 | """ 12 | Classification using KNN. Fit with signature matrix and labels. Predict on the signatures. 13 | To use training cohort parameters for scaling set norm=True (If the cohorts are from the same batch) 14 | Data sets from different batches should be scaled 15 | :param norm: 16 | :param algorithm: 17 | :param clip: 18 | :param scale: 19 | :param k: 20 | """ 21 | self.norm = norm 22 | self.median = 0 23 | self.mad = 1 24 | self.X = None 25 | self.y = None 26 | self.algorithm = algorithm 27 | self.model = None 28 | self.clip = clip 29 | self.scale = scale 30 | self.k = k 31 | 32 | def check_is_fitted(self): 33 | return (self.X is not None) and (self.y is not None) and (self.model is not None) 34 | 35 | def preprocess_data(self, X): 36 | x = copy.deepcopy(X) 37 | if self.scale: 38 | x = median_scale(x) 39 | 40 | x = (x - self.median) / self.mad 41 | if self.clip is not None and self.clip > 0: 42 | x = x.clip(-1 * self.clip, self.clip) 43 | return x 44 | 45 | def check_columns(self, X): 46 | if hasattr(self.X, 'columns'): 47 | try: 48 | return X[self.X.columns] 49 | except KeyError: 50 | raise Exception('Columns do not match') 51 | return X 52 | 53 | def fit(self, X, y): 54 | """ 55 | :param X: pd.DataFrame, RNA data, columns - features, index - samples 56 | :param y: pd.Series, cluster labels 57 | """ 58 | if X.shape[0] != len(y): 59 | raise Exception('Shapes do not match') 60 | 61 | if self.norm: 62 | self.median = X.median() 63 | self.mad = X.mad() 64 | 65 | self.X = self.preprocess_data(X) 66 | self.y = copy.deepcopy(y) 67 | 68 | self.model = KNeighborsClassifier(algorithm=self.algorithm, n_neighbors=self.k).fit(self.X, self.y) 69 | 70 | return self 71 | 72 | def predict(self, X): 73 | """ 74 | Predict - return a pd.Series with the predicted cluster labels 75 | 
76 | :param X: pd.DataFrame, RNA data, columns - features, index - samples 77 | :return: pd.Series, predicted cluster labels 78 | """ 79 | if X.shape[1] != self.X.shape[1]: 80 | raise Exception('Shapes do not match') 81 | 82 | x_scaled = self.preprocess_data(self.check_columns(X)) 83 | # Here self.model.predict is used in order to mimic its' way to select the class in case of equal probabilities 84 | return pd.Series(self.model.predict(x_scaled), index=x_scaled.index) 85 | 86 | def predict_proba(self, X): 87 | """ 88 | :param X: pd.DataFrame, RNA data, columns - features, index - samples 89 | :return: pd.DataFrame, probabilities for each cluster. Index - samples, columns - clusters 90 | """ 91 | 92 | if X.shape[1] != self.X.shape[1]: 93 | raise Exception('Shapes do not match') 94 | 95 | x_scaled = self.preprocess_data(self.check_columns(X)) 96 | 97 | return pd.DataFrame(self.model.predict_proba(x_scaled).astype(float), index=x_scaled.index, 98 | columns=self.model.classes_) 99 | -------------------------------------------------------------------------------- /portraits/clustering.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import community # louvain 4 | import matplotlib.pyplot as plt 5 | import networkx as nx 6 | import numpy as np 7 | import pandas as pd 8 | from tqdm import tqdm 9 | 10 | 11 | def gen_graph(similarity_matrix, threshold=0.8): 12 | """ 13 | Generates a graph from the similarity_matrix (square dataframe). Each sample is a node, similarity - edge weight. 14 | Edges with weight lower than the threshold are ignored. 
    Only nodes with at least 1 edge with weight above the threshold will be present in the final graph
    :param similarity_matrix: pd.DataFrame, square similarity matrix
    :param threshold: float, minimum edge weight
    :return: nx.Graph
    """
    G = nx.Graph()
    for col_n in similarity_matrix:
        col = similarity_matrix[col_n].drop(col_n)
        mtr = col[col > threshold]
        for row_n, val in list(mtr.to_dict().items()):
            G.add_edge(col_n, row_n, weight=np.round(val, 2))
    return G


def louvain_community(correlation_matrix, threshold=1.4, resolution=1, random_state=100, **kwargs):
    """
    Generates a graph from a correlation_matrix with weighted edges (edges with weight below the threshold are excluded).
    Then performs Louvain community detection (community.best_partition).
    :param correlation_matrix:
    :param threshold:
    :param resolution:
    :param random_state:
    :param kwargs:
    :return: pd.Series, cluster labels starting from 1
    """
    # The graph construction and the edge-count check were lost in extraction;
    # they are reconstructed here from the surviving fragments and from gen_graph above
    G = gen_graph(correlation_matrix, threshold)
    if len(G.edges()) > 10000000:
        warnings.warn("Too many edges will result in huge computational time")
    return pd.Series(community.best_partition(G, resolution=resolution,
                                              random_state=random_state, **kwargs)) + 1


def dense_clustering(data, threshold=0.4, name='MFP', method='louvain', **kwargs):
    """
    Generates a graph from the table features(cols)*samples(rows).
    Then performs community detection using the selected method (only 'louvain' is implemented)
    :param method:
    :param data:
    :param threshold:
    :param name:
    :return:
    """
    if method == 'louvain':  # others could be implemented like Leiden
        # Correlations are shifted by 1 so that all edge weights are non-negative
        partition = louvain_community(data.T.corr() + 1, threshold + 1, **kwargs)
    else:
        raise Exception('Unknown method')

    return partition.rename(name)


def clustering_profile_metrics(data, threshold_mm=(0.3, 0.65), step=0.01, method='louvain'):
    """
    Iterates over thresholds in the threshold_mm range with the given step.
    Calculates cluster separation metrics for each threshold.
    Returns a pd.DataFrame with the metrics
    :param data:
    :param threshold_mm:
    :param step:
    :param method:
    :return:
    """
    from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
    cluster_metrics = {}

    for tr in tqdm(np.round(np.arange(threshold_mm[0], threshold_mm[1], step), 3)):
        clusters_comb = dense_clustering(data, threshold=tr, method=method)
        cluster_metrics[tr] = {
            'ch': calinski_harabasz_score(data.loc[clusters_comb.index], clusters_comb),
            'db': davies_bouldin_score(data.loc[clusters_comb.index], clusters_comb),
            'sc': silhouette_score(data.loc[clusters_comb.index], clusters_comb),
            'N': len(clusters_comb.unique()),
            'perc': clusters_comb,
        }

    return pd.DataFrame(cluster_metrics).T


def clustering_profile_metrics_plot(cluster_metrics, num_clusters_ylim_max=7):
    """
    Plots a dataframe from clustering_profile_metrics
    :param cluster_metrics:
    :param num_clusters_ylim_max:
    :return: axis of the last subplot (cluster percentages)
    """

    # necessary for correct x axis sharing
    cluster_metrics.index = [str(x) for x in cluster_metrics.index]

    plots_ratios = [3, 3, 3, 1, 2]
    fig, axs = plt.subplots(len(plots_ratios), 1, figsize=(8, np.sum(plots_ratios)),
                            gridspec_kw={'height_ratios': plots_ratios}, sharex=True)
    for ax in axs:
        ax.tick_params(axis='x', which='minor', length=0)
    af = axs.flat

    ax = cluster_metrics.db.plot(ax=next(af), label='Davies Bouldin', color='#E63D06')
    ax.legend()

    ax = cluster_metrics.ch.plot(ax=next(af), label='Calinski Harabasz', color='#E63D06')
    ax.legend()

    ax = cluster_metrics.sc.plot(ax=next(af), label='Silhouette score', color='#E63D06')
    ax.legend()

    ax = cluster_metrics.N.plot(kind='line', ax=next(af), label='# clusters', color='#000000')
    ax.set_ylim(0, num_clusters_ylim_max)
    ax.legend()

    # display percentage for 10 clusters max
    clusters_perc = pd.DataFrame([x.value_counts() for x in cluster_metrics.perc],
                                 index=cluster_metrics.index).iloc[:, :10]
    ax = next(af)
    clusters_perc.plot(kind='bar', stacked=True, ax=ax, width=0.85)
    ax.legend(loc=(1.01, -0.5))

    ax.set_xticks(ax.get_xticks() - .5)
    ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

    ax.set_ylabel('Cluster %')
    return ax


def clustering_select_best_tr(data, n_clusters=4, threshold_mm=(0.3, 0.6),
                              step=0.025, method='louvain', num_clusters_ylim_max=7, plot=True):
    """
    Selects the best threshold for n_clusters separation using dense_clustering with selected method
    from threshold_mm with a particular step
    :param data: dataframe with processes (rows - samples, columns - signatures)
    :param n_clusters: desired number of clusters
    :param threshold_mm: range of thresholds
    :param step: step to go through range of thresholds
    :param method: clustering method; dense_clustering only implements 'louvain'
    :param num_clusters_ylim_max: set y_lim for plot with number of clusters
    :param plot: whether to plot all metrics
    :return: the threshold to get n_clusters
    """
    cl_scs = clustering_profile_metrics(data, threshold_mm=threshold_mm, step=step, method=method)

    if plot:
        clustering_profile_metrics_plot(cl_scs, num_clusters_ylim_max)
        plt.show()

    # copy() so that the silhouette shift below does not modify a view of cl_scs
    cl_scs_filtered = cl_scs[cl_scs.N == n_clusters].copy()

    if not len(cl_scs_filtered):
        raise Exception('No partition with n_clusters = {}'.format(n_clusters))

    cl_scs_filtered['sc'] = cl_scs_filtered.sc + 1 - cl_scs_filtered.sc.min()
    return (cl_scs_filtered.ch / cl_scs_filtered.db / cl_scs_filtered.sc).sort_values().index[-1]

--------------------------------------------------------------------------------
/portraits/mapping.py:
--------------------------------------------------------------------------------
import logging
from collections import defaultdict

import pandas as pd


def get_gs_for_probes_from_3col(platform_file, probe_list):
    """
    Getting probe-gene symbol dictionary

    :param platform_file: str, path to the tab-separated platform mapping file
    :param probe_list: list, list with probe names

    :return: dict, probe -> list of gene symbols; an empty dict if some probes are not found,
             None if the mapping file cannot be read
    """
    try:
        platform_data = pd.read_csv(platform_file, sep='\t', header=None, index_col=0, na_values=["NONE"])
    except Exception as e:
        logging.warning(f"Failed to read mapping 3col-file: {str(e)}")
        return None

    not_found_probes_amount = len(set(probe_list).difference(platform_data.index))
    if not_found_probes_amount:
        logging.warning(f'{not_found_probes_amount} probes not found or format is not correct.')
        return dict()

    result = platform_data[1].loc[probe_list].dropna().astype(str).apply(
        lambda x: x.strip().replace(" ", "").split("///")).to_dict()

    return result


def get_expressions_list(probes_list, probes_value_table, method='max'):
    """
    Returns the expression values of the probe selected to represent a gene symbol

    :param probes_list: list, list of probes (for matching gene_symbol)
    :param probes_value_table: pd.DataFrame, matching table for probe_id and expression values (for each sample)
    :param method: str, getting expressions method (max / med)

    :return: pd.Series, expressions of the selected probe for each sample
    """

    def average_expression(expressions_list):
        return sum(expressions_list) / len(expressions_list)

    probes_avg_expr_dict = {}
    # compute the average over all samples (GSMs) for each probe
    for probe in probes_list:
        probes_avg_expr_dict[probe] = average_expression(probes_value_table.loc[probe, :])
    probe_res = ''
    if method == 'max':
        # choose probe-id with max average value
        probe_res = max(probes_avg_expr_dict, key=probes_avg_expr_dict.get)

    elif method == 'med':
        # choose the probe with the median average value;
        # for an even number of probes, choose the one of the two middle probes with the larger average

        # sort dict by values and return list of keys
        sorted_probes_list = sorted(probes_avg_expr_dict, key=probes_avg_expr_dict.get)
        # integer division: list indices must be ints
        middle = len(probes_list) // 2
        if len(probes_list) % 2 == 0:
            probe_res = max(sorted_probes_list[middle - 1],
                            sorted_probes_list[middle],
                            key=probes_avg_expr_dict.get)
        else:
            probe_res = sorted_probes_list[middle]
    return probes_value_table.loc[probe_res, :]


def get_expressions_for_gs(probes_gs_dict, probes_value_table, gs_sel_alg='max'):
    """
    Getting genes/samples expression table

    :param probes_gs_dict: dict, dictionary with probe-gene symbol key-values
    :param probes_value_table: pd.DataFrame, probes/samples expression transformed dataframe
    :param gs_sel_alg: str, getting expressions method (max / med)

    :return: gs_expr_table: pd.DataFrame, genes/samples expression table
    """

    def get_reverse_dictionary(probes_gs_dict):
        # gene symbol -> list of probes mapped to it
        gs_probes_dict = defaultdict(list)
        for probe, gs_list in probes_gs_dict.items():
            for gs in gs_list:
                gs_probes_dict[gs].append(probe)
        return gs_probes_dict

    gs_expr_table = pd.DataFrame()

    logging.info("Making list of probes for each of gene-symbols ...")
    gs_probes_dict = get_reverse_dictionary(probes_gs_dict)

    logging.info("Making expression list for each of gene-symbols ...")
    for gs, probe_list in gs_probes_dict.items():
        if len(probe_list) == 1:
            gs_expr_table[gs] = probes_value_table.loc[probe_list[0], :]
        else:
            gs_expr_table[gs] = get_expressions_list(
                probes_list=probe_list,
                probes_value_table=probes_value_table,
                method=gs_sel_alg
            )

    logging.info(f'Final expression samples/gene symbols table shape: {gs_expr_table.shape}')

    return gs_expr_table

--------------------------------------------------------------------------------
/portraits/plotting.py:
--------------------------------------------------------------------------------
import copy

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import os
import umap.umap_ as umap


from portraits.utils import item_series, to_common_samples


def axis_net(x, y, title='', x_len=4, y_len=4, title_y=1, gridspec_kw=None):
    """
    Return an axis iterator for subplots arranged in a net
    :param x: int, number of subplots in a row
    :param y: int, number of subplots in a column
    :param title: str, plot title
    :param x_len: float, width of a subplot in inches
    :param y_len: float, height of a subplot in inches
    :param gridspec_kw: is used to specify axis net with different rows/cols sizes.
24 | A dict: height_ratios -> list + width_ratios -> list 25 | :param title_y: absolute y position for suptitle 26 | :return: axs.flat, numpy.flatiter object which consists of axes (for further plots) 27 | """ 28 | if x == y == 1: 29 | fig, ax = plt.subplots(figsize=(x * x_len, y * y_len)) 30 | af = ax 31 | else: 32 | fig, axs = plt.subplots(y, x, figsize=(x * x_len, y * y_len), gridspec_kw=gridspec_kw) 33 | af = axs.flat 34 | 35 | fig.suptitle(title, y=title_y) 36 | return af 37 | 38 | 39 | def lin_colors(factors_vector, cmap='default', sort=True, min_v=0, max_v=1, linspace=True): 40 | """ 41 | Return dictionary of unique features of "factors_vector" as keys and color hexes as entries 42 | :param factors_vector: pd.Series 43 | :param cmap: matplotlib.colors.LinearSegmentedColormap, which colormap to base the returned dictionary on 44 | default - matplotlib.cmap.hsv with min_v=0, max_v=.8, lighten_color=.9 45 | :param sort: bool, whether to sort the unique features 46 | :param min_v: float, for continuous palette - minimum number to choose colors from 47 | :param max_v: float, for continuous palette - maximum number to choose colors from 48 | :param linspace: bool, whether to spread the colors from "min_v" to "max_v" 49 | linspace=False can be used only in discrete cmaps 50 | :return: dict 51 | """ 52 | 53 | unique_factors = factors_vector.dropna().unique() 54 | if sort: 55 | unique_factors = np.sort(unique_factors) 56 | 57 | if cmap == 'default': 58 | cmap = matplotlib.cm.rainbow 59 | max_v = .92 60 | 61 | if linspace: 62 | cmap_colors = cmap(np.linspace(min_v, max_v, len(unique_factors))) 63 | else: 64 | cmap_colors = np.array(cmap.colors[:len(unique_factors)]) 65 | 66 | return dict(list(zip(unique_factors, [matplotlib.colors.to_hex(x) for x in cmap_colors]))) 67 | 68 | 69 | def axis_net(x, y, title='', x_len=4, y_len=4, title_y=1, gridspec_kw=None): 70 | """ 71 | Return an axis iterative for subplots arranged in a net 72 | :param x: int, number of subplots in a 
row 73 | :param y: int, number of subplots in a column 74 | :param title: str, plot title 75 | :param x_len: float, width of a subplot in inches 76 | :param y_len: float, height of a subplot in inches 77 | :param gridspec_kw: is used to specify axis ner with different rows/cols sizes. 78 | A dict: height_ratios -> list + width_ratios -> list 79 | :param title_y: absolute y position for suptitle 80 | :return: axs.flat, numpy.flatiter object which consists of axes (for further plots) 81 | """ 82 | if x == y == 1: 83 | fig, ax = plt.subplots(figsize=(x * x_len, y * y_len)) 84 | af = ax 85 | else: 86 | fig, axs = plt.subplots(y, x, figsize=(x * x_len, y * y_len), gridspec_kw=gridspec_kw) 87 | af = axs.flat 88 | 89 | fig.suptitle(title, y=title_y) 90 | return af 91 | 92 | 93 | def pca_plot(data, grouping=None, order=(), n_components=2, ax=None, palette=None, 94 | alpha=1, random_state=42, s=20, figsize=(5, 5), title='', 95 | legend='in', **kwargs): 96 | kwargs_scatter = dict() 97 | kwargs_scatter['linewidth'] = kwargs.pop('linewidth', 0) 98 | kwargs_scatter['marker'] = kwargs.pop('marker', 'o') 99 | kwargs_scatter['edgecolor'] = kwargs.pop('edgecolor', 'black') 100 | 101 | if grouping is None: 102 | grouping = item_series('*', data) 103 | 104 | # Common samples 105 | c_data, c_grouping = to_common_samples([data, grouping]) 106 | 107 | if len(order): 108 | group_order = copy.copy(order) 109 | else: 110 | group_order = np.sort(c_grouping.unique()) 111 | 112 | if palette is None: 113 | cur_palette = lin_colors(c_grouping) 114 | else: 115 | cur_palette = copy.copy(palette) 116 | 117 | if ax is None: 118 | _, ax = plt.subplots(figsize=figsize) 119 | 120 | # Get model and transform 121 | n_components = min(n_components, len(c_data.columns)) 122 | from sklearn.decomposition import PCA 123 | model = PCA(n_components=n_components, random_state=random_state, **kwargs) 124 | 125 | data_tr = pd.DataFrame(model.fit_transform(c_data), index=c_data.index) 126 | 127 | label_1 = 'PCA 1 
component {}% variance explained'.format(int(model.explained_variance_ratio_[0] * 100)) 128 | label_2 = 'PCA 2 component {}% variance explained'.format(int(model.explained_variance_ratio_[1] * 100)) 129 | 130 | kwargs_scatter = kwargs_scatter or {} 131 | for group in group_order: 132 | samples = list(c_grouping[c_grouping == group].index) 133 | ax.scatter(data_tr[0][samples], data_tr[1][samples], color=cur_palette[group], s=s, alpha=alpha, 134 | label=str(group), **kwargs_scatter) 135 | 136 | if legend == 'out': 137 | ax.legend(scatterpoints=1, bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.1) 138 | elif legend == 'in': 139 | ax.legend(scatterpoints=1) 140 | 141 | ax.set_title(title) 142 | ax.set_xlabel(label_1) 143 | ax.set_ylabel(label_2) 144 | 145 | return ax 146 | 147 | 148 | def clustering_heatmap(ds, title='', corr='pearson', method='complete', 149 | yl=True, xl=True, 150 | cmap=matplotlib.cm.coolwarm, col_colors=None, 151 | figsize=None, **kwargs): 152 | from scipy.spatial.distance import squareform 153 | from scipy.cluster.hierarchy import linkage 154 | 155 | dissimilarity_matrix = 1 - ds.T.corr(method=corr) 156 | hclust_linkage = linkage(squareform(dissimilarity_matrix), method=method) 157 | 158 | g = sns.clustermap(1 - dissimilarity_matrix, method=method, 159 | row_linkage=hclust_linkage, col_linkage=hclust_linkage, 160 | cmap=cmap, yticklabels=yl, xticklabels=xl, 161 | col_colors=col_colors, figsize=figsize, **kwargs) 162 | 163 | g.fig.suptitle(title) 164 | 165 | return g 166 | 167 | 168 | def patch_plot(patches, ax=None, order='sort', w=0.25, h=0, legend_right=True, 169 | show_ticks=False): 170 | cur_patches = pd.Series(patches) 171 | 172 | if order == 'sort': 173 | order = list(np.sort(cur_patches.index)) 174 | 175 | data = pd.Series([1] * len(order), index=order[::-1]) 176 | if ax is None: 177 | if h == 0: 178 | h = 0.3 * len(patches) 179 | _, ax = plt.subplots(figsize=(w, h)) 180 | 181 | data.plot(kind='barh', color=[cur_patches[x] for x in 
data.index], width=1, ax=ax) 182 | ax.set_xticks([]) 183 | if legend_right: 184 | ax.yaxis.tick_right() 185 | 186 | sns.despine(offset={'left': -2}, ax=ax) 187 | 188 | ax.grid(False) 189 | for spine in ax.spines.values(): 190 | spine.set_visible(False) 191 | 192 | if not show_ticks: 193 | ax.tick_params(length=0) 194 | 195 | return ax 196 | 197 | 198 | def draw_graph(G, ax=None, title='', figsize=(12, 12), v_labels=True, e_labels=True, node_color='r', node_size=30, 199 | el_fs=5, nl_fs=8): 200 | """ 201 | Draws a graph. 202 | :param G: networkx graph 203 | :param ax: matplotlib axis, axis to plot on 204 | :param title: str, plot title 205 | :param figsize: (float, float), figure size in inches (used if ax is None) 206 | :param v_labels: bool, whether to draw node labels 207 | :param e_labels: bool, whether to draw edge labels taken from the 'weight' attribute 208 | :param node_color: node color 209 | :param node_size: node size 210 | :param el_fs: edge label font size 211 | :param nl_fs: node label font size 212 | :return: matplotlib axis 213 | """ 214 | import networkx as nx 215 | 216 | if ax is None: 217 | _, ax = plt.subplots(figsize=figsize) 218 | 219 | pos = nx.nx_pydot.graphviz_layout(G, prog="neato") 220 | nx.draw_networkx_nodes(G, pos, ax=ax, node_size=node_size, node_color=node_color) 221 | if v_labels: 222 | nx.draw_networkx_labels(G, pos, ax=ax, font_size=nl_fs, font_family='sans-serif', font_color='blue') 223 | 224 | nx.draw_networkx_edges(G, pos, ax=ax) 225 | if e_labels: 226 | labels = nx.get_edge_attributes(G, 'weight') 227 | nx.draw_networkx_edge_labels(G, pos, ax=ax, font_size=el_fs, edge_labels=labels) 228 | 229 | ax.set_title(title, fontsize=18) 230 | return ax 231 | 232 | 233 | def umap_plot(data, grouping=None, order=(), n_components=30, ax=None, palette=None, 234 | alpha=1, random_state=42, s=20, figsize=(5, 5), title='', 235 | legend='in', **kwargs): 236 | kwargs_scatter = dict() 237 | kwargs_scatter['linewidth'] = kwargs.pop('linewidth', 0) 238 | kwargs_scatter['marker'] = kwargs.pop('marker', 'o') 239 | kwargs_scatter['edgecolor'] = kwargs.pop('edgecolor', 'black') 240 | 241 | if grouping is None: 242 | grouping = item_series('*', data) 243 | 244 | # Common samples 245 | c_data, c_grouping
= to_common_samples([data, grouping]) 246 | 247 | if len(order): 248 | group_order = copy.copy(order) 249 | else: 250 | group_order = np.sort(c_grouping.unique()) 251 | 252 | if palette is None: 253 | cur_palette = lin_colors(c_grouping) 254 | else: 255 | cur_palette = copy.copy(palette) 256 | 257 | if ax is None: 258 | _, ax = plt.subplots(figsize=figsize) 259 | 260 | # Get model and transform: PCA first, then UMAP on the principal components 261 | n_components = min(n_components, len(c_data.columns)) 262 | from sklearn.decomposition import PCA 263 | model = PCA(n_components=n_components, random_state=random_state, **kwargs) 264 | reducer = umap.UMAP() 265 | 266 | data_tmp = pd.DataFrame(model.fit_transform(c_data), index=c_data.index) 267 | 268 | 269 | data_tr = pd.DataFrame(reducer.fit_transform(data_tmp), index=c_data.index) 270 | 271 | kwargs_scatter = kwargs_scatter or {} 272 | for group in group_order: 273 | samples = list(c_grouping[c_grouping == group].index) 274 | ax.scatter(data_tr[0][samples], data_tr[1][samples], color=cur_palette[group], s=s, alpha=alpha, 275 | label=str(group), **kwargs_scatter) 276 | 277 | if legend == 'out': 278 | ax.legend(scatterpoints=1, bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.1) 279 | elif legend == 'in': 280 | ax.legend(scatterpoints=1) 281 | 282 | ax.set_title(title) 283 | 284 | return ax 285 | 286 | 287 | def distplot_qc( 288 | exp_df, 289 | model_pickle='/uftp/Transformatics/Tools/Distplot_predictor/Distplot_QC.sav', 290 | stand_path='/uftp/Transformatics/Tools/Distplot_predictor/Standard_dist.tsv', 291 | log2=False, 292 | ): 293 | """ 294 | Predicting quality of gene expression distribution 295 | :param stand_path: "Perfect" distribution 296 | :param exp_df: Dataframe of gene expression; samples in rows, genes in columns.
If log2 is False, expression values 297 | are log2-transformed inside the function as np.log2(exp + 1) 298 | :param log2: True if expression values are log2-transformed already 299 | :param model_pickle: Path to saved model 300 | :return: Series of predicted quality for expression distribution 301 | 302 | Model versions: 303 | sklearn version - 0.24.1 304 | pip version - 20.3.3 (python 3.7) 305 | """ 306 | 307 | import joblib 308 | from scipy.stats import ks_2samp, mannwhitneyu 309 | import pandas as pd 310 | from portraits.utils import read_dataset 311 | 312 | if not log2: 313 | exp_df = np.log2(exp_df + 1) 314 | 315 | displot_model = joblib.load(model_pickle) 316 | 317 | exp_stats = pd.DataFrame(columns=['K-S Stat', 'M-W Stat']) 318 | 319 | stand_dist = read_dataset(stand_path).iloc[:, 0] 320 | 321 | exp_stats['K-S Stat'] = exp_df.T.apply(lambda x: ks_2samp(x, stand_dist)[0]) 322 | exp_stats['M-W Stat'] = exp_df.T.apply(lambda x: mannwhitneyu(x, stand_dist)[0]) 323 | 324 | return pd.Series(displot_model.predict(exp_stats), index=exp_df.index) 325 | 326 | def vector_pie_plot(data, ax=None, figsize=(4, 4), title='', palette=None, display_counts=False, order=None): 327 | """ 328 | Constructs pie plot by provided pd.Series 329 | :param data: pd.Series 330 | :param ax: matplotlib axis, axis to plot on 331 | :param figsize: (float, float), figure size in inches 332 | :param title: str, plot title 333 | :param palette: dict, palette for plotting.
Keys are unique values from groups, entries are color hexes 334 | :param display_counts: bool 335 | :param order: list, order to display groups 336 | :return: matplotlib axis 337 | """ 338 | if ax is None: 339 | _, ax = plt.subplots(figsize=figsize) 340 | 341 | order = order or list(data.unique()) 342 | 343 | c_data = data.value_counts() 344 | c_data = c_data[[x for x in order if x in c_data.index]] 345 | 346 | if palette is not None: 347 | c_colors = pd.Series(palette)[c_data.index] 348 | else: 349 | c_colors = None 350 | 351 | if display_counts: 352 | actopcl_rule = lambda p: '{:.0f}'.format(p * sum(c_data.values) / 100) 353 | else: 354 | actopcl_rule = '%1.1f%%' 355 | 356 | _, _, text_props = ax.pie( 357 | c_data, labels=c_data.index, autopct=actopcl_rule, startangle=0, textprops={'fontsize': 14}, colors=c_colors 358 | ) 359 | 360 | for i in text_props: 361 | i.set_color('#ffffff') 362 | ax.axis('equal') 363 | ax.set_title(title) 364 | ax.set_xlabel(data.name) 365 | return ax 366 | 367 | def line_palette_annotation_plot(val_vector, palette, ax=None, nan_color='#ffffff', 368 | hide_ticks=True, hide_borders=True, **kwargs): 369 | """ 370 | Draws line annotation plot 371 | :param val_vector: pd.Series with values 372 | :param palette: dict, palette for values 373 | :param ax: ax to plot 374 | :param nan_color: str, color for np.nan 375 | :param hide_ticks: bool, whether to plot ticks 376 | :param hide_borders: bool, whether to plot borders 377 | :return: ax with plot 378 | """ 379 | return line_annotation_plot(val_vector.map(palette), ax=ax, nan_color=nan_color, 380 | hide_ticks=hide_ticks, hide_borders=hide_borders, **kwargs) 381 | 382 | 383 | def line_annotation_plot(color_vector, ax=None, nan_color='#ffffff', offset=0, hide_ticks=True, hide_borders=True): 384 | """ 385 | 386 | :param color_vector: 387 | :param ax: 388 | :param nan_color: 389 | :param offset: 390 | :return: 391 | """ 392 | 393 | if ax is None: 394 | _, ax = 
plt.subplots(figsize=(max(len(color_vector) / 15.0, 6), 0.5)) 395 | 396 | items_amount = len(color_vector) 397 | 398 | xss = np.arange(items_amount) - offset 399 | yss = pd.Series([1] * items_amount, index=color_vector.index) 400 | 401 | with sns.axes_style("white"): 402 | ax.bar( 403 | xss, 404 | yss, 405 | color=color_vector.fillna(nan_color), 406 | width=1, 407 | align='edge', 408 | edgecolor=color_vector.fillna(nan_color), 409 | ) 410 | 411 | ax.set_ylim(0, 1) 412 | ax.set_xlim(0, items_amount) 413 | 414 | ax.set_xticklabels([]) 415 | ax.set_yticklabels([]) 416 | ax.xaxis.label.set_visible(False) 417 | ax.set_ylabel(color_vector.name, rotation=0, labelpad=10, va='center', ha='right') 418 | 419 | if hide_ticks: 420 | ax.tick_params(length=0) 421 | 422 | if hide_borders: 423 | for spine in ['bottom', 'top', 'left', 'right']: 424 | ax.spines[spine].set_visible(False) 425 | 426 | return ax 427 | 428 | 429 | 430 | def axis_matras(ys, title='', x_len=8, title_y=1, sharex=True): 431 | """ 432 | Return an iterator over axes for subplots stacked vertically 433 | :param ys: list, heights of the subplots in inches 434 | :param title: str, title for plot 435 | :param x_len: int, figure width in inches 436 | :param sharex: boolean, subplots share the x axis if True 437 | :param title_y: absolute y position for suptitle 438 | :return: axs.flat, numpy.flatiter object which consists of axes (for further plots) 439 | """ 440 | fig, axs = plt.subplots(len(ys), 1, figsize=(x_len, np.sum(ys)), gridspec_kw={'height_ratios': ys}, sharex=sharex) 441 | fig.suptitle(title, y=title_y) 442 | 443 | for ax in axs: 444 | ax.tick_params(axis='x', which='minor', length=0) 445 | 446 | return axs.flat -------------------------------------------------------------------------------- /portraits/utils.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import numpy as np 4 | import pandas as pd 5 | 6 | 7 | class GeneSet(object): 8 | def __init__(self, name, descr,
genes): 9 | self.name = name 10 | self.descr = descr 11 | self.genes = set(genes) 12 | self.genes_ordered = list(genes) 13 | 14 | def __str__(self): 15 | return '{}\t{}\t{}'.format(self.name, self.descr, '\t'.join(self.genes)) 16 | 17 | 18 | def read_gene_sets(gmt_file): 19 | """ 20 | Return dict {geneset_name : GeneSet object} 21 | 22 | :param gmt_file: str, path to .gmt file 23 | :return: dict 24 | """ 25 | gene_sets = {} 26 | with open(gmt_file) as handle: 27 | for line in handle: 28 | items = line.strip().split('\t') 29 | name = items[0].strip() 30 | description = items[1].strip() 31 | genes = set([gene.strip() for gene in items[2:]]) 32 | gene_sets[name] = GeneSet(name, description, genes) 33 | 34 | return gene_sets 35 | 36 | 37 | def ssgsea_score(ranks, genes): 38 | common_genes = list(set(genes).intersection(set(ranks.index))) 39 | if not len(common_genes): 40 | return pd.Series([0] * len(ranks.columns), index=ranks.columns) 41 | sranks = ranks.loc[common_genes] 42 | return (sranks ** 1.25).sum() / (sranks ** 0.25).sum() - (len(ranks.index) - len(common_genes) + 1) / 2 43 | 44 | 45 | def ssgsea_formula(data, gene_sets, rank_method='max'): 46 | """ 47 | Return DataFrame with ssgsea scores 48 | Only overlapping genes will be analyzed 49 | 50 | :param data: pd.DataFrame, DataFrame with samples in rows and genes in columns (it is transposed before ranking) 51 | :param gene_sets: dict, keys - processes, values - GeneSet objects 52 | :param rank_method: str, 'min' or 'max'.
53 | :return: pd.DataFrame, ssgsea scores, index - samples, columns - genesets 54 | """ 55 | 56 | ranks = data.T.rank(method=rank_method, na_option='bottom') 57 | 58 | return pd.DataFrame({gs_name: ssgsea_score(ranks, gene_sets[gs_name].genes) 59 | for gs_name in list(gene_sets.keys())}) 60 | 61 | 62 | def median_scale(data, clip=None): 63 | c_data = (data - data.median()) / data.mad() 64 | if clip is not None: 65 | return c_data.clip(-clip, clip) 66 | return c_data 67 | 68 | 69 | def read_dataset(file, sep='\t', header=0, index_col=0, comment=None): 70 | return pd.read_csv(file, sep=sep, header=header, index_col=index_col, 71 | na_values=['Na', 'NA', 'NAN'], comment=comment) 72 | 73 | 74 | def item_series(item, indexed=None): 75 | """ 76 | Creates a series filled with item with indexes from indexed (if Series-like) or numerical indexes (size=indexed) 77 | :param item: value for filling 78 | :param indexed: Series-like object whose index is reused, or a positive int giving the series length 79 | :return: pd.Series 80 | """ 81 | if indexed is not None: 82 | if hasattr(indexed, 'index'): 83 | return pd.Series([item] * len(indexed), index=indexed.index) 84 | elif type(indexed) is int and indexed > 0: 85 | return pd.Series([item] * indexed, index=np.arange(indexed)) 86 | return pd.Series() 87 | 88 | 89 | def to_common_samples(df_list=()): 90 | """ 91 | Accepts a list of dataframes. Returns all dataframes with only intersecting indexes 92 | :param df_list: list of pd.DataFrame 93 | :return: list of pd.DataFrame, restricted to the common index 94 | """ 95 | cs = set(df_list[0].index) 96 | for i in range(1, len(df_list)): 97 | cs = cs.intersection(df_list[i].index) 98 | 99 | if len(cs) < 1: 100 | warnings.warn('No common samples!') 101 | return [df_list[i].loc[list(cs)] for i in range(len(df_list))] 102 | 103 | 104 | def cut_clustermap_tree(g, n_clusters=2, by_cols=True, name='Clusters'): 105 | """ 106 | Cut clustermap into desired number of clusters. See scipy.cluster.hierarchy.cut_tree documentation.
107 | :param g: seaborn ClusterGrid, as returned by sns.clustermap 108 | :param n_clusters: int, number of clusters to cut the tree into 109 | :param by_cols: bool, cut the column dendrogram if True, the row dendrogram otherwise 110 | :param name: str, name of the returned series 111 | :return: pd.Series with cluster labels starting from 1 112 | """ 113 | from scipy.cluster.hierarchy import cut_tree 114 | if by_cols: 115 | link = g.dendrogram_col.linkage 116 | index = g.data.columns 117 | else: 118 | link = g.dendrogram_row.linkage 119 | index = g.data.index 120 | 121 | return pd.Series(cut_tree(link, n_clusters=n_clusters)[:, 0], index=index, name=name) + 1 122 | 123 | 124 | def pivot_vectors(vec1, vec2, na_label_1=None, na_label_2=None): 125 | """ 126 | Aggregates 2 vectors into a table with amount of pairs (vec1.x, vec2.y) in a cell 127 | Both series should have the same index. 128 | Otherwise, values with non-matching indexes are counted in the na_label_1/na_label_2 columns if specified, or ignored 129 | :param vec1: pd.Series 130 | :param vec2: pd.Series 131 | :param na_label_1: How to name NA column 132 | :param na_label_2: How to name NA row 133 | :return: pivot table 134 | """ 135 | 136 | name1 = str(vec1.name) 137 | if vec1.name is None: 138 | name1 = 'V1' 139 | 140 | name2 = str(vec2.name) 141 | if vec2.name is None: 142 | name2 = 'V2' 143 | 144 | if name1 == name2: 145 | name1 += '_1' 146 | name2 += '_2' 147 | 148 | sub_df = pd.DataFrame({name1: vec1, 149 | name2: vec2}) 150 | # FillNAs 151 | fill_dict = {} 152 | if na_label_1 is not None: 153 | fill_dict[name1] = na_label_1 154 | if na_label_2 is not None: 155 | fill_dict[name2] = na_label_2 156 | sub_df.fillna(value=fill_dict, inplace=True) 157 | 158 | sub_df = sub_df.assign(N=item_series(1, sub_df)) 159 | 160 | return pd.pivot_table(data=sub_df, columns=name1, 161 | index=name2, values='N', aggfunc=sum).fillna(0).astype(int) 162 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pandas==1.5.3 2 | numpy==1.24.2 3 | seaborn==0.12.2 4 | matplotlib==3.7.0 5 | matplotlib-inline==0.1.6 6 | umap-learn 7 | ipykernel==6.21.2 8 |
ipython==8.10.0 9 | ipython-genutils==0.2.0 10 | tqdm==4.64.1 11 | scikit-learn==1.2.1 12 | rpy2==3.5.9 13 | python-louvain==0.12 14 | networkx==2.8.5 15 | joblib==1.2.0 16 | scipy==1.10.1 17 | -------------------------------------------------------------------------------- /signatures/gene_signatures.gmt: -------------------------------------------------------------------------------- 1 | MHCI MHCI HLA-A HLA-B HLA-C B2M TAP1 TAP2 NLRC5 TAPBP 2 | MHCII MHCII HLA-DRA HLA-DRB1 HLA-DMA HLA-DPA1 HLA-DPB1 HLA-DMB HLA-DQB1 HLA-DQA1 CIITA 3 | Coactivation_molecules Co-stimulatory molecules CD28 CD40 TNFRSF4 ICOS TNFRSF9 CD27 CD80 CD86 CD40LG CD83 TNFSF4 ICOSLG TNFSF9 CD70 4 | Effector_cells Effector cells IFNG GZMA GZMB PRF1 GZMK ZAP70 GNLY FASLG TBX21 EOMES CD8A CD8B 5 | T_cell_traffic Effector cell traffic CXCL9 CXCL10 CXCL11 CX3CL1 CCL3 CCL4 CX3CR1 CCL5 CXCR3 6 | NK_cells NK cells NKG7 CD160 CD244 NCR1 KLRC2 KLRK1 CD226 GZMH GNLY IFNG KIR2DL4 EOMES GZMB FGFBP2 KLRF1 SH2D1B NCR3 7 | T_cells T cells TBX21 ITK CD3D CD3E CD3G TRAC TRBC1 TRBC2 CD28 CD5 TRAT1 8 | B_cells B cells CD19 MS4A1 TNFRSF13C CR2 TNFRSF17 TNFRSF13B CD22 CD79A CD79B BLK FCRL5 PAX5 STAP1 9 | M1_signatures M1 signature NOS2 TNF IL1B SOCS3 CMKLR1 IRF5 IL12A IL12B IL23A 10 | Th1_signature Th1 signature IFNG IL2 CD40LG IL21 TBX21 STAT4 IL12RB2 11 | Antitumor_cytokines Antitumor cytokines TNF IFNB1 IFNA2 CCL3 TNFSF10 IL21 12 | Checkpoint_inhibition Checkpoint molecules PDCD1 CD274 CTLA4 LAG3 PDCD1LG2 BTLA HAVCR2 TIGIT VSIR 13 | Treg Treg FOXP3 CTLA4 IL10 TNFRSF18 CCR8 IKZF4 IKZF2 14 | T_reg_traffic Treg and Th2 traffic CCL17 CCL22 CCL1 CCL28 CCR4 CCR8 CCR10 15 | Neutrophil_signature Neutrophil signature MPO ELANE PRTN3 CTSG CXCR1 CXCR2 FCGR3B CD177 FFAR2 PGLYRP1 16 | Granulocyte_traffic Granulocyte traffic CXCL8 CXCL1 CXCL2 CXCL5 CCL11 KITLG CXCR1 CXCR2 CCR3 17 | MDSC Immune Suppression by Myeloid Cells IDO1 ARG1 IL10 CYBB PTGS2 IL4I1 IL6 18 | MDSC_traffic Myeloid cells traffic CSF2 CSF3 CXCL12 CCL26 IL6 
CXCL8 CXCL5 CSF1R CSF2RA CSF3R CXCR4 IL6R CXCR2 CCL15 CSF1 19 | Macrophages Tumor-associated Macrophages IL10 MRC1 MSR1 CD163 CSF1R IL4I1 SIGLEC1 CD68 20 | Macrophage_DC_traffic Macrophage and DC traffic CCL2 CCL7 CCL8 XCL1 CCR2 XCR1 CSF1R CSF1 21 | Th2_signature Th2 signature IL4 IL5 IL13 IL10 CCR4 22 | Protumor_cytokines Protumor cytokines IL10 TGFB1 TGFB2 TGFB3 IL22 MIF IL6 23 | CAF Fibroblasts COL1A1 COL1A2 COL5A1 ACTA2 FGF2 FAP LRP1 CD248 COL6A1 COL6A2 COL6A3 CXCL12 FBLN1 LUM MFAP5 MMP3 MMP2 PDGFRB PDGFRA 24 | Matrix Matrix FN1 COL1A1 COL1A2 COL4A1 COL3A1 VTN LGALS7 LGALS9 LAMA3 LAMB3 LAMC2 TNC ELN COL5A1 COL11A1 25 | Matrix_remodeling Matrix remodeling CA9 MMP9 MMP2 MMP1 MMP3 MMP12 MMP7 MMP11 PLOD2 ADAMTS4 ADAMTS5 LOX 26 | Angiogenesis Angiogenesis VEGFA VEGFB VEGFC PDGFC CXCL8 CXCR2 FLT1 PGF CXCL5 KDR ANGPT1 ANGPT2 TEK VWF CDH5 27 | Endothelium Endothelium NOS3 KDR FLT1 VCAM1 VWF CDH5 MMRN1 ENG CLEC14A MMRN2 28 | Proliferation_rate Tumor proliferation rate MKI67 ESCO2 CETN3 CDK2 CCND1 CCNE1 AURKA AURKB E2F1 MYBL2 BUB1 PLK1 CCNB1 MCM2 MCM6 29 | EMT_signature EMT signature SNAI1 SNAI2 TWIST1 TWIST2 ZEB1 ZEB2 CDH2 30 | -------------------------------------------------------------------------------- /signatures/gene_signatures_order.tsv: -------------------------------------------------------------------------------- 1 | Angiogenesis 2 | Endothelium 3 | CAF 4 | Matrix 5 | Matrix_remodeling 6 | Protumor_cytokines 7 | Neutrophil_signature 8 | Granulocyte_traffic 9 | Macrophages 10 | Macrophage_DC_traffic 11 | MDSC_traffic 12 | MDSC 13 | Th2_signature 14 | T_reg_traffic 15 | Treg 16 | M1_signatures 17 | MHCII 18 | Antitumor_cytokines 19 | Coactivation_molecules 20 | B_cells 21 | NK_cells 22 | Checkpoint_inhibition 23 | Effector_cells 24 | T_cells 25 | Th1_signature 26 | T_cell_traffic 27 | MHCI 28 | EMT_signature 29 | Proliferation_rate -------------------------------------------------------------------------------- /upstream_html/From_cell_files.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/upstream_html/From_cell_files.pdf -------------------------------------------------------------------------------- /upstream_html/Methods_Description_-_Batch_correction.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BostonGene/MFP/4debc39bb1bae10dfc550ec9fcb1c653d722c773/upstream_html/Methods_Description_-_Batch_correction.pdf --------------------------------------------------------------------------------