├── .gitignore
├── Figure1.png
├── LICENSE
├── README.md
├── bidcell
│   ├── BIDCellModel.py
│   ├── __init__.py
│   ├── config.py
│   ├── example_params
│   │   ├── cosmx.yaml
│   │   ├── merscope.yaml
│   │   ├── small_example.yaml
│   │   ├── stereoseq.yaml
│   │   └── xenium.yaml
│   ├── model
│   │   ├── dataio
│   │   │   └── dataset_input.py
│   │   ├── model
│   │   │   ├── intialisation.py
│   │   │   ├── layers.py
│   │   │   ├── losses.py
│   │   │   └── model.py
│   │   ├── postprocess_predictions.py
│   │   ├── predict.py
│   │   ├── train.py
│   │   └── utils
│   │       └── utils.py
│   └── processing
│       ├── cell_gene_matrix.py
│       ├── nuclei_segmentation.py
│       ├── nuclei_stitch_fov.py
│       ├── preannotate.py
│       ├── transcript_patches.py
│       ├── transcripts.py
│       └── utils.py
├── data
│   ├── dataset_xenium_breast1_small
│   │   ├── morphology_mip_small.tif
│   │   └── transcripts_small.csv
│   ├── example_mousebrain_genes.txt
│   └── sc_references
│       ├── sc_breast.csv
│       ├── sc_breast_markers_neg.csv
│       └── sc_breast_markers_pos.csv
├── example_small.py
├── pdm.lock
├── pyproject.toml
├── setup.cfg
└── tests
    └── __init__.py

/.gitignore:
--------------------------------------------------------------------------------
 1 | **/__pycache__
 2 | **/cell_gene_matrices
 3 | **/model_outputs
 4 | *.tif
 5 | *.pbs
 6 | *.csv
 7 | *.gz
 8 | *.zip
 9 | **/check_cells.py
10 | **/show_cells_by_id.py
11 | **/check.py
12 | example_data/
13 | data_large/
14 | dist
15 | params_small_example.yaml
16 | *_example_config.yaml
17 | example_large.py
18 | pypi_token.txt
19 | 
20 | !morphology_mip_small.tif
21 | !transcripts_small.csv
22 | !sc_breast.csv
23 | !sc_breast_markers_neg.csv
24 | !sc_breast_markers_pos.csv
25 | .vscode/settings.json
26 | data/dataset_xenium_breast1/all_gene_names.txt
27 | .pdm-python
28 | get_markers.py
--------------------------------------------------------------------------------
/Figure1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/Figure1.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data
2 | 
3 | For more details of our method, please refer to: https://doi.org/10.1038/s41467-023-44560-w
4 | 
5 | Recent advances in subcellular imaging transcriptomics platforms have enabled spatial mapping of the expression of hundreds of genes at subcellular resolution, providing topographic context to the data. This has created a new data analytics challenge: to correctly identify cells and accurately assign transcripts, ensuring that all available data can be utilised. To this end, we introduce BIDCell, a self-supervised deep learning-based framework that incorporates cell type and morphology information via novel biologically-informed loss functions. We also introduce CellSPA, a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance. We demonstrate that BIDCell outperforms other state-of-the-art methods according to many CellSPA metrics across a variety of tissue types and technology platforms, including 10x Genomics Xenium. Taken together, we find that BIDCell can facilitate single-cell spatial expression analyses, including cell-cell interactions, offering great potential for biological discovery.
6 | 7 | ![alt text](Figure1.png) 8 | 9 | 10 | ## Installation 11 | 12 | > **Note**: A GPU with at least 12GB VRAM is strongly recommended for the deep learning component, and 32GB RAM for data processing. 13 | We ran BIDCell on a Linux system with a 12GB NVIDIA GTX Titan V GPU, Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 16 threads, and 64GB RAM. 14 | 15 | 1. Create virtual environment (Python>=3.9,<3.13): 16 | ```sh 17 | conda create --name bidcell python=3.10 18 | ``` 19 | 2. Activate virtual environment: 20 | ```sh 21 | conda activate bidcell 22 | ``` 23 | 3. Install package: 24 | ```sh 25 | python -m pip install bidcell 26 | ``` 27 | Installation of dependencies typically requires a few minutes. 28 | 29 | > **Note**: We are actively finding and fixing issues. If you encounter `[xcb] Unknown sequence number while processing queue`, try running without a GUI, e.g. through PuTTY. Please let us know any other issues you may find. Thank you. 30 | 31 | 32 | ## Demo 33 | 34 | Please download the BIDCell GitHub repository first. A small subset of Xenium breast cancer data is provided as a demo. The example .yaml file can be found in ``bidcell/example_params/small_example.yaml``. Use the following to run all the steps to verify installation: 35 | ```sh 36 | python example_small.py 37 | ``` 38 | Or: 39 | ```py 40 | from bidcell import BIDCellModel 41 | BIDCellModel.get_example_data() 42 | model = BIDCellModel("params_small_example.yaml") 43 | model.run_pipeline() 44 | ``` 45 | 46 | 47 | ## Parameters 48 | 49 | Parameters are defined in .yaml files. Examples are provided for 4 major platforms, including Xenium, CosMx, MERSCOPE, and Stereo-seq. BIDCell may also be applied to data from other technologies such as MERFISH. 50 | 51 | > **Note**: **Please modify `cpus` to suit your system. Higher `cpus` allow faster runtimes but may freeze your system.** 52 | 53 | Run the following to obtain examples: 54 | ```py 55 | from bidcell import BIDCellModel 56 | BIDCellModel.get_example_config("xenium") 57 | BIDCellModel.get_example_config("cosmx") 58 | BIDCellModel.get_example_config("merscope") 59 | BIDCellModel.get_example_config("stereoseq") 60 | ``` 61 | This will copy the .yaml for the respective vendor into your working directory, for example `xenium_example_config.yaml`. 62 | 63 | 64 | ## Example usage 65 | 66 | The full dataset (Xenium Output Bundle In Situ Replicate 1) may be downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast. The breast cancer reference data are provided with this package under `data/sc_references`, or `./example_data/sc_references` if you have run `example_small.py`. Please ensure the correct paths are provided for the parameters under `files` in `xenium_example_config.yaml`, in particular, the paths for the transcripts (`transcripts.csv.gz`) and DAPI (`morphology_mip.ome.tif`) files. 
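For reference, the relevant `files` entries might then look like the following. These paths match the provided `xenium.yaml`, which assumes the bundle was extracted to `./data_large/dataset_xenium_breast1` and that `example_small.py` has been run; adjust them to your own layout:
```yaml
files:
  data_dir: ./data_large/dataset_xenium_breast1                           # processed/output data
  fp_dapi: ./data_large/dataset_xenium_breast1/morphology_mip.ome.tif     # DAPI image
  fp_transcripts: ./data_large/dataset_xenium_breast1/transcripts.csv.gz  # transcripts file
  fp_ref: ./example_data/sc_references/sc_breast.csv                      # single-cell reference
  fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv  # positive markers
  fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv  # negative markers
```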
67 | 
68 | To run the entire pipeline (data processing, training, prediction, and extracting the cell-gene matrix):
69 | ```py
70 | from bidcell import BIDCellModel
71 | model = BIDCellModel("xenium_example_config.yaml")
72 | model.run_pipeline()
73 | ```
74 | Alternatively, the pipeline can be broken down into its 3 main stages:
75 | ```py
76 | from bidcell import BIDCellModel
77 | model = BIDCellModel("xenium_example_config.yaml")
78 | model.preprocess()
79 | model.train()
80 | model.predict()
81 | ```
82 | Or, the functions in `preprocess` can be called individually:
83 | ```py
84 | from bidcell import BIDCellModel
85 | model = BIDCellModel("xenium_example_config.yaml")
86 | # model.stitch_nuclei() # for when nuclei images are separated into FOVs (e.g., CosMx)
87 | model.segment_nuclei()
88 | model.generate_expression_maps()
89 | model.generate_patches()
90 | model.make_cell_gene_mat(is_cell=False)
91 | model.preannotate()
92 | model.train()
93 | model.predict()
94 | ```
95 | 
96 | If your machine/server has multiple GPUs, you may select the GPU to use by setting the `CUDA_VISIBLE_DEVICES` environment variable before the command, e.g. for GPU ID 3:
97 | ```sh
98 | CUDA_VISIBLE_DEVICES=3 python example_small.py
99 | ```
100 | 
101 | 
102 | ## Single-cell reference and markers
103 | 
104 | BIDCell uses single-cell reference data for improved results. These can be downloaded from public repositories such as TISCH2, Allen Brain Map, and the Human Cell Atlas.
105 | 
106 | Please see the provided breast cancer single-cell reference and positive/negative marker files (`sc_breast.csv`, `sc_breast_markers_pos.csv`, and `sc_breast_markers_neg.csv`) as a template.
107 | 
108 | The reference csv file contains the average expression of each gene in the spatial transcriptomic dataset for a range of cell types. You may choose an appropriate list of cell types to include for your data.
109 | 
110 | The positive and negative markers files contain the respective marker genes for each cell type. The positive and negative markers were those with expressions in the highest and lowest 10th percentiles for each cell type of a tissue sample. We found that removing positive markers that were common to at least a third of the cell types in a dataset was appropriate across various datasets. Using a larger number of positive markers tends to increase the size of predicted cells. Manual curation and alternative approaches to determining the marker genes can also be used (a sketch of the percentile approach is shown below).
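This sketch approximates the procedure described above with pandas; it is not the authors' exact script, and the output file names are placeholders:
```py
import pandas as pd

# Reference: one row per cell type, gene columns plus cell_type/ct_idx metadata
ref = pd.read_csv("sc_breast.csv", index_col=0)
expr = ref.drop(columns=["cell_type", "ct_idx"], errors="ignore")
expr.index = ref["cell_type"]  # marker files are indexed by cell type name

# Flag genes in the top/bottom 10th percentile of each cell type's expression profile
pos = expr.ge(expr.quantile(0.90, axis=1), axis=0).astype(int)
neg = expr.le(expr.quantile(0.10, axis=1), axis=0).astype(int)

# Remove positive markers shared by at least a third of the cell types
too_common = pos.sum(axis=0) >= len(pos) / 3
pos.loc[:, too_common] = 0

pos.to_csv("markers_pos.csv")
neg.to_csv("markers_neg.csv")
```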
111 | 
112 | Fewer than 1,000 genes are needed to perform segmentation. Specify a selection of genes in a file (see the Stereo-seq example).
113 | 
114 | Some example reference data may be found here: https://github.com/SydneyBioX/scClassify?tab=readme-ov-file#pretrained-models
115 | 
116 | 
117 | ## Segmentation architectures
118 | The default architecture is UNet3+ (https://arxiv.org/abs/2004.08790), which we have found to perform well across different technologies and tissue types.
119 | To use a different architecture, select from a list of popular backbones or define your own:
120 | - Set `model_params.name` in the .yaml file to an encoder from https://segmentation-modelspytorch.readthedocs.io/en/latest/index.html
121 | - Or, modify the `SegmentationModel` class in [`model.py`](bidcell/model/model/model.py)
122 | 
123 | 
124 | ## Additional information
125 | 
126 | If you receive the error ``pickle.UnpicklingError: pickle data was truncated``, try reducing `cpus`.
127 | 
128 | Performing segmentation at a higher resolution requires a larger patch size, and thus more GPU memory.
129 | 
130 | Expected outputs:
131 | - A .tif file of segmented cells, where each pixel value corresponds to a cell ID; its file name ends in `_connected.tif`
132 |     - e.g.: `dataset_xenium_breast1_small/model_outputs/2023_09_06_11_55_24/test_output/epoch_{test_epoch}_step_{test_step}_connected.tif`
133 | - `expr_mat.csv` containing the gene expressions of the segmented cells
134 |     - e.g.: `dataset_xenium_breast1_small/cell_gene_matrices/2023_09_06_11_55_24/expr_mat.csv`
135 | 
136 | Expected runtime (based on our system, for the Xenium breast cancer dataset):
137 | - Training: ~10 mins for 4,000 steps
138 | - Inference: ~50 mins
139 | - Postprocessing: ~30 mins
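Once a run finishes, the outputs listed above can be sanity-checked in a few lines. This is a minimal sketch, assuming `tifffile`, `numpy`, and `pandas` are installed; the timestamp directory is illustrative, and the segmentation file name follows `epoch_{test_epoch}_step_{test_step}` (the small example tests epoch 1, step 60):
```py
import numpy as np
import pandas as pd
import tifffile

data_dir = "dataset_xenium_breast1_small"  # files.data_dir from the config
ts = "2023_09_06_11_55_24"                 # replace with your run's timestamp

# Segmentation map: an integer image whose non-zero pixel values are cell IDs
cells = tifffile.imread(f"{data_dir}/model_outputs/{ts}/test_output/epoch_1_step_60_connected.tif")
print(cells.shape, np.unique(cells).size - 1)  # image dimensions, number of cells

# Cell-by-gene expression matrix of the segmented cells
expr = pd.read_csv(f"{data_dir}/cell_gene_matrices/{ts}/expr_mat.csv")
print(expr.shape)
```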
140 | 
141 | 
142 | ## Xenium Ranger and Xenium Explorer
143 | 
144 | The BIDCell output .tif segmentation can be used with Xenium Ranger and then viewed in Xenium Explorer. The .tif file first needs to be resized to the same dimensions (`h_dapi` x `w_dapi`) as the DAPI image (`morphology_mip.ome.tif`):
145 | 
146 | ```py
147 | import cv2
148 | import numpy as np
149 | cells = cv2.resize(cells.astype('float32'), (w_dapi, h_dapi), interpolation=cv2.INTER_NEAREST)
150 | cells = cells.astype(np.uint32)
151 | ```
152 | Nearest-neighbour interpolation must be used so that integer cell IDs are not blended into meaningless intermediate values.
153 | 
154 | The resized segmentation can then be used as the input file to the `--cells` argument for `xeniumranger import-segmentation`. The same applies to the nuclei from BIDCell.
155 | 
156 | 
157 | ## Contact us
158 | 
159 | If you have any enquiries, especially about using BIDCell to segment cells in your data, please contact xiaohang.fu@sydney.edu.au. We are also happy to receive any suggestions and comments.
160 | 
161 | 
162 | ## Citation
163 | 
164 | If BIDCell has assisted you with your work, please kindly cite our paper:
165 | 
166 | Fu, X., Lin, Y., Lin, D., Mechtersheimer, D., Wang, C., Ameen, F., Ghazanfar, S., Patrick, E., Kim, J., & Yang, J. Y. H. BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat Commun 15, 509 (2024). https://doi.org/10.1038/s41467-023-44560-w
--------------------------------------------------------------------------------
/bidcell/BIDCellModel.py:
--------------------------------------------------------------------------------
  1 | """BIDCellModel class module"""
  2 | import importlib.resources
  3 | import os
  4 | from pathlib import Path
  5 | from shutil import copyfile, copytree
  6 | from typing import Literal
  7 | 
  8 | from .config import load_config
  9 | from .model.postprocess_predictions import postprocess_predictions
 10 | from .model.predict import fill_grid, predict
 11 | from .model.train import train
 12 | from .model.utils.utils import get_newest_id
 13 | from .processing.cell_gene_matrix import make_cell_gene_mat
 14 | from .processing.nuclei_segmentation import segment_nuclei
 15 | from .processing.nuclei_stitch_fov import stitch_nuclei
 16 | from .processing.preannotate import preannotate
 17 | from .processing.transcript_patches import generate_patches
 18 | from .processing.transcripts import generate_expression_maps
 19 | 
 20 | 
 21 | class BIDCellModel:
 22 |     """The BIDCellModel class, which provides an interface for preprocessing, training and predicting all the cell types for a dataset."""
 23 | 
 24 |     def __init__(self, config_file: str) -> None:
 25 |         """Constructs a BIDCellModel instance using the user-supplied config file.\n
 26 |         The configuration is validated during construction.
 27 | 
 28 |         Parameters
 29 |         ----------
 30 |         config_file : str
 31 |             Path to the YAML configuration file.
 32 |         """
 33 |         self.config = load_config(config_file)
 34 | 
 35 |     def run_pipeline(self):
 36 |         """Runs the entire BIDCell pipeline using the settings defined in the configuration.
 37 |         """
 38 |         print("### Preprocessing ###")
 39 |         print()
 40 |         self.preprocess()
 41 |         print()
 42 |         print("### Training ###")
 43 |         print()
 44 |         self.train()
 45 |         print()
 46 |         print("### Predict ###")
 47 |         print()
 48 |         self.predict()
 49 |         print()
 50 |         print("### Done ###")
 51 | 
 52 |     def preprocess(self) -> None:
 53 |         """Preprocess the dataset for training.
 54 |         """
 55 |         if self.config.nuclei_fovs.stitch_nuclei_fovs:
 56 |             stitch_nuclei(self.config)
 57 |         if self.config.nuclei.crop_nuclei_to_ts:
 58 |             generate_expression_maps(self.config)
 59 |             segment_nuclei(self.config)
 60 |         else:
 61 |             segment_nuclei(self.config)
 62 |             generate_expression_maps(self.config)
 63 |         generate_patches(self.config)
 64 |         make_cell_gene_mat(self.config, is_cell=False)
 65 |         preannotate(self.config)
 66 | 
 67 |     def stitch_nuclei(self):
 68 |         """Stitch separate FOV files into a single image (e.g. CosMx data).\n
 69 |         Runs inside preprocess by default, if nuclei_fovs.stitch_nuclei_fovs is True in the configuration file.
 70 |         """
 71 |         stitch_nuclei(self.config)
 72 | 
 73 |     def segment_nuclei(self):
 74 |         """Run the nucleus segmentation algorithm. Runs inside preprocess by default.
 75 |         """
 76 |         segment_nuclei(self.config)
 77 | 
 78 |     def generate_expression_maps(self):
 79 |         """Generate the expression maps. Runs inside preprocess by default.
 80 |         """
 81 |         generate_expression_maps(self.config)
 82 | 
 83 |     def generate_patches(self):
 84 |         """Generate patches for training. Runs inside preprocess by default.
 85 |         """
 86 |         generate_patches(self.config)
 87 | 
 88 |     def make_cell_gene_mat(self, is_cell: bool, timestamp: str = "last"):
 89 |         """Make a matrix containing counts for each cell. Runs inside preprocess and predict by default.
 90 | 
 91 |         Parameters
 92 |         ----------
 93 |         is_cell : bool
 94 |             If False, uses nuclei masks for creation; otherwise it uses `timestamp` to choose a directory containing segmented cells output by BIDCell.
 95 |         timestamp : str, optional
 96 |             The timestamp corresponding to the name of a directory in the data directory under `model_outputs`, by default "last", in which case the folder with the most recent timestamp is used.
 97 |         """
 98 |         if is_cell and timestamp == "last":
 99 |             timestamp = get_newest_id(
100 |                 os.path.join(self.config.files.data_dir, "model_outputs")
101 |             )
102 |         elif is_cell:
103 |             self.__check_valid_timestamp(timestamp)
104 |         make_cell_gene_mat(self.config, is_cell, timestamp=timestamp)
105 | 
106 |     def preannotate(self):
107 |         """Preannotate the cells. Runs inside preprocess by default.
108 |         """
109 |         preannotate(self.config)
110 | 
111 |     def train(self) -> None:
112 |         """Train the model.
113 |         """
114 |         train(self.config)
115 | 
116 |     def predict(self) -> None:
117 |         """Segment and annotate the cells.
118 |         """
119 |         predict(self.config)
120 | 
121 |         if self.config.experiment_dirs.dir_id == "last":
122 |             timestamp = get_newest_id(
123 |                 os.path.join(self.config.files.data_dir, "model_outputs")
124 |             )
125 |         else:
126 |             timestamp = self.config.experiment_dirs.dir_id
127 |             self.__check_valid_timestamp(timestamp)
128 | 
129 |         fill_grid(self.config, timestamp)
130 | 
131 |         postprocess_predictions(self.config, timestamp)
132 | 
133 |         make_cell_gene_mat(self.config, is_cell=True, timestamp=timestamp)
134 | 
135 |     @staticmethod
136 |     def get_example_config(vendor: Literal["cosmx", "merscope", "stereoseq", "xenium"]) -> None:
137 |         """Gets an example configuration for a given vendor and places it in the working directory.
138 | 
139 |         Parameters
140 |         ----------
141 |         vendor : Literal["cosmx", "merscope", "stereoseq", "xenium"]
142 |             The vendor of the equipment used to produce the dataset.
143 |         """
144 |         vendors = ["cosmx", "merscope", "stereoseq", "xenium"]
145 |         if not any([vendor.lower() == x for x in vendors]):
146 |             raise ValueError(f"Unknown vendor `{vendor}`\n\tChoose one of {*vendors,}")
147 |         params_path = (
148 |             importlib.resources.files("bidcell") / "example_params" / f"{vendor}.yaml"
149 |         )
150 |         if not (dest := Path().cwd() / f"{vendor}_example_config.yaml").exists():
151 |             copyfile(params_path, dest)
152 | 
153 |     @staticmethod
154 |     def get_example_data(with_config: bool = True) -> None:
155 |         """Gets the small example data included in the package and places it in the current working directory.
156 | 
157 |         Parameters
158 |         ----------
159 |         with_config : bool, optional
160 |             Whether to get the configuration for the example data, by default True
161 |         """
162 |         root: Path = importlib.resources.files("bidcell")
163 |         data_path = (
164 |             root.parent / "data"
165 |         )
166 |         cwd = Path().cwd()
167 |         if not (cwd / "example_data").exists():
168 |             copytree(data_path, cwd / "example_data")
169 |         if with_config and not (cwd / "params_small_example.yaml").exists():
170 |             copyfile(
171 |                 root / "example_params" / "small_example.yaml",
172 |                 cwd / "params_small_example.yaml"
173 |             )
174 | 
175 |     def __check_valid_timestamp(self, timestamp: str) -> None:
176 |         outputs_path = Path(self.config.files.data_dir) / "model_outputs"
177 |         outputs = list(outputs_path.iterdir())
178 |         if len(outputs) == 0:
179 |             raise ValueError(
180 |                 f"There are no outputs yet under {str(outputs_path)}. Run BIDCell at least once with this dataset to get some."
181 |             )
182 |         if not any(
183 |             [timestamp == x.name for x in outputs if x.is_dir()]
184 |         ):
185 |             valid_dirs = "\n".join(["\t" + str(x) for x in outputs])
186 |             raise ValueError(
187 |                 f"{timestamp} is not a valid model output directory (set in configuration YAML under `experiment_dirs.dir_id`). Choose one of the following:\n{valid_dirs}"
188 |             )
--------------------------------------------------------------------------------
/bidcell/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | BIDCell - biologically-informed deep learning for cell segmentation of subcellular spatial transcriptomics data
3 | """
4 | 
5 | from bidcell.BIDCellModel import BIDCellModel
6 | 
7 | __all__ = ["BIDCellModel"]
--------------------------------------------------------------------------------
/bidcell/config.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | from pathlib import Path
 3 | from typing import Literal, Annotated
 4 | 
 5 | import yaml
 6 | from pydantic import BaseModel, computed_field, model_validator, ConfigDict
 7 | from pydantic.functional_validators import AfterValidator
 8 | 
 9 | 
10 | def validate_path(v: str | None) -> str:
11 |     if v is None:
12 |         return v
13 | 
14 |     path = Path(v)
15 | 
16 |     assert (
17 |         path.exists()
18 |     ), f"Invalid path {v}: Ensure you have the correct path in your config file."
19 | 
20 |     return str(path.resolve())
21 | 
22 | 
23 | PathString = Annotated[str, AfterValidator(validate_path)]
24 | 
25 | 
26 | class FileParams(BaseModel):
27 |     data_dir: PathString
28 |     fp_dapi: PathString | None = None
29 |     fp_transcripts: PathString
30 |     fp_ref: PathString
31 |     fp_pos_markers: PathString
32 |     fp_neg_markers: PathString
33 | 
34 |     # Internal defaults
35 |     # file of affine transformation - needed if cropping to align DAPI to transcripts
36 |     fp_affine: str = "affine.csv"
37 |     # file name of nuclei tif file
38 |     fp_nuclei: str = "nuclei.tif"
39 |     # file name of resized DAPI image
40 |     fp_rdapi: str = "dapi_resized.tif"
41 |     # directory containing processed gene expression maps
42 |     dir_out_maps: str = "expr_maps"
43 |     # filtered and xy-scaled transcripts data
44 |     fp_transcripts_processed: str = "transcripts_processed.csv"
45 |     # txt file containing list of gene names
46 |     fp_gene_names: str = "all_gene_names.txt"
47 |     # directory prefix of transcript patches
48 |     dir_patches: str = "expr_maps_input_patches_"
49 |     # directory for cell-gene expression matrices
50 |     dir_cgm: str = "cell_gene_matrices"
51 |     # file name of nuclei expression matrices
52 |     fp_expr: str = "expr_mat.csv"
53 |     # file name of nuclei annotations
54 |     fp_nuclei_anno: str = "nuclei_cell_type.h5"
55 |     # file name of text file containing selected gene names, e.g. selected_genes.txt
56 |     fp_selected_genes: str | None = None
57 | 
58 |     # Internal
59 |     # fp_stitched: str | None = None
60 | 
61 | 
62 | class NucleiFovParams(BaseModel):
63 |     stitch_nuclei_fovs: bool
64 |     dir_dapi: str | None = None
65 |     ext_dapi: str = "tif"
66 |     pattern_z: str = "Z###"
67 |     pattern_f: str = "F###"
68 |     channel_first: bool = False
69 |     channel_dapi: int = -1
70 |     n_fov: int | None = None
71 |     min_fov: int | None = None
72 |     n_fov_h: int | None = None
73 |     n_fov_w: int | None = None
74 |     start_corner: Literal["ul", "ur", "bl", "br"] = "ul"
75 |     row_major: bool = False
76 |     z_level: int = 1
77 |     mip: bool = False
78 |     flip_ud: bool = False
79 | 
80 |     @model_validator(mode="after")
81 |     def check_dapi(self):
82 |         if not self.stitch_nuclei_fovs:
83 |             return self
84 | 
85 |         if self.dir_dapi is None:
86 |             raise ValueError(
87 |                 "dir_dapi must be specified if stitch_nuclei_fovs is True."
88 |             )
89 | 
90 |         p = Path(self.dir_dapi)
91 | 
92 |         if not p.exists():
93 |             raise ValueError(
94 |                 f"Invalid value for dir_dapi ({self.dir_dapi}): Check the config file and ensure the correct directory is specified."
95 |             )
96 | 
97 |         if not p.is_dir():
98 |             raise ValueError(
99 |                 "dir_dapi is not a directory: dir_dapi must point to a directory containing the FOVs to be stitched together."
100 | ) 101 | return self 102 | 103 | 104 | class NucleiParams(BaseModel): 105 | # divide into sections if too large - maximum height to process in original resolution 106 | max_height: int = 24000 107 | # divide into sections if too large - maximum width to process in original resolution 108 | max_width: int = 32000 109 | # crop nuclei to size of transcript maps 110 | crop_nuclei_to_ts: bool = False 111 | # use CPU for Cellpose if no GPU available 112 | use_cpu: bool = False 113 | # estimated diameter of nuclei for Cellpose 114 | diameter: int | None = None 115 | 116 | 117 | class TranscriptParams(BaseModel): 118 | min_qv: int = 20 119 | # divide into sections if too large - height of patches 120 | max_height: int = 3500 121 | # divide into sections if too large - width of patches 122 | max_width: int = 4000 123 | shift_to_origin: bool = False 124 | x_col: str = "x_location" 125 | y_col: str = "y_location" 126 | gene_col: str = "feature_name" 127 | counts_col: str | None = None 128 | transcripts_to_filter: list[str] 129 | 130 | 131 | class AffineParams(BaseModel): 132 | target_pix_um: float = 1.0 133 | base_pix_x: float 134 | base_pix_y: float 135 | base_ts_x: float 136 | base_ts_y: float 137 | global_shift_x: int = 0 138 | global_shift_y: int = 0 139 | 140 | # Scaling images 141 | @computed_field 142 | @property 143 | def scale_pix_x(self) -> float: 144 | return self.base_pix_x / self.target_pix_um 145 | 146 | @computed_field 147 | @property 148 | def scale_pix_y(self) -> float: 149 | return self.base_pix_y / self.target_pix_um 150 | 151 | # Scaling transcript locations 152 | @computed_field 153 | @property 154 | def scale_ts_x(self) -> float: 155 | return self.base_ts_x / self.target_pix_um 156 | 157 | @computed_field 158 | @property 159 | def scale_ts_y(self) -> float: 160 | return self.base_ts_y / self.target_pix_um 161 | 162 | 163 | class CellGeneMatParams(BaseModel): 164 | # max h+w for resized segmentation to extract expressions from 165 | max_sum_hw: int = 30000 166 | 167 | 168 | class ModelParams(BaseModel): 169 | name: str = "custom" # TODO: Validate this field 170 | patch_size: int 171 | elongated: list[str] 172 | 173 | 174 | class TrainingParams(BaseModel): 175 | model_config = ConfigDict(protected_namespaces=()) 176 | total_epochs: int = 1 177 | total_steps: int = 4000 178 | # learning rate of DL model 179 | learning_rate: float = 0.00001 180 | # adam optimiser beta1 181 | beta1: float = 0.9 182 | # adam optimiser beta2 183 | beta2: float = 0.999 184 | # adam optimiser weight decay 185 | weight_decay: float = 0.0001 186 | # optimiser 187 | optimizer: Literal["adam", "rmsprop"] = "adam" 188 | ne_weight: float = 1.0 189 | os_weight: float = 1.0 190 | cc_weight: float = 1.0 191 | ov_weight: float = 1.0 192 | pos_weight: float = 1.0 193 | neg_weight: float = 1.0 194 | # number of training steps per model save 195 | model_freq: int = 1000 196 | # number of training steps per sample save 197 | sample_freq: int = 100 198 | 199 | 200 | class TestingParams(BaseModel): 201 | test_epoch: int = 1 202 | test_step: int = 4000 203 | 204 | 205 | class PostprocessParams(BaseModel): 206 | # size of patches to perform morphological processing 207 | patch_size_mp: int = 1024 208 | 209 | 210 | class ExperimentDirs(BaseModel): 211 | model_config = ConfigDict(protected_namespaces=()) 212 | # directory names for each experiment 213 | dir_id: str = "last" 214 | model_dir: str = "models" 215 | test_output_dir: str = "test_output" 216 | samples_dir: str = "samples" 217 | 218 | 219 | class 
Config(BaseModel):
220 |     model_config = ConfigDict(protected_namespaces=())
221 |     files: FileParams
222 |     nuclei_fovs: NucleiFovParams
223 |     nuclei: NucleiParams
224 |     transcripts: TranscriptParams
225 |     affine: AffineParams
226 |     model_params: ModelParams
227 |     training_params: TrainingParams
228 |     testing_params: TestingParams
229 |     cpus: int
230 |     postprocess: PostprocessParams = PostprocessParams()
231 |     experiment_dirs: ExperimentDirs = ExperimentDirs()
232 |     cgm_params: CellGeneMatParams = CellGeneMatParams()
233 | 
234 | 
235 | def load_config(path: str) -> Config:
236 |     if not os.path.exists(path):
237 |         raise FileNotFoundError(
238 |             f"Config file at {path} could not be found. Please check if the filepath is valid."
239 |         )
240 | 
241 |     with open(path) as config_file:
242 |         try:
243 |             config = yaml.safe_load(config_file)
244 |         except Exception:
245 |             raise ValueError(
246 |                 "The supplied YAML config was invalid; try comparing it against the example configs."
247 |             )
248 | 
249 |     if not isinstance(config, dict):
250 |         raise ValueError(
251 |             "The supplied YAML config was invalid; try comparing it against the example configs."
252 |         )
253 | 
254 |     # validate the configuration schema
255 |     config = Config(**config)
256 |     return config
--------------------------------------------------------------------------------
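For illustration, this is how the validated configuration is loaded in code — a minimal sketch, assuming the package is installed and an example config (e.g. from `BIDCellModel.get_example_config("cosmx")`) is present in the working directory:
```py
from bidcell.config import load_config

config = load_config("cosmx_example_config.yaml")  # returns a validated Config instance
print(config.cpus)                # plain field
print(config.affine.scale_pix_x)  # computed field: base_pix_x / target_pix_um
```
The example parameter files below are what this function consumes.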
/bidcell/example_params/cosmx.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_cosmx_nsclc # data directory for processed/output data
 8 |   fp_dapi: # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_cosmx_nsclc/Lung5_Rep1_tx_file.csv # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_nsclc.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_nsclc_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_nsclc_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: True # set True to stitch separate FOVs of DAPI together in 1 image
16 |   dir_dapi: ./data_large/dataset_cosmx_nsclc/Lung5_Rep1-RawMorphologyImages # name of directory containing the DAPI FOV images
17 |   ext_dapi: tif # extension of the DAPI images
18 |   pattern_z: Z### # String pattern to find in the file names for the Z number, or None for no Z component
19 |   pattern_f: F### # String pattern to find in file names for the FOV number
20 |   channel_first: True # channel axis first (e.g. [5,H,W]) or last (e.g. [H,W,5]) in image volumes
21 |   channel_dapi: -1 # channel index of the DAPI images in the image volumes
22 |   n_fov: 30 # total number of FOVs
23 |   min_fov: 1 # smallest FOV number - usually 0 or 1
24 |   n_fov_h: 6 # number of FOVs tiled along vertical axis
25 |   n_fov_w: 5 # number of FOVs tiled along horizontal axis
26 |   start_corner: ul # position of first FOV - choose from ul, ur, bl, br
27 |   row_major: True # row major ordering of FOVs
28 |   z_level: 1 # which z slice to use, or set mip to use MIP
29 |   mip: False # take the maximum intensity projection across all Z
30 |   flip_ud: True # flip images up/down before stitching
31 | 
32 | nuclei:
33 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
34 | 
35 | transcripts:
36 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
37 |   x_col: x_global_px # name of x location column in transcripts file
38 |   y_col: y_global_px # name of y location column in transcripts file
39 |   gene_col: target # name of genes column in transcripts file
40 |   transcripts_to_filter: # genes starting with these strings will be filtered out
41 |     - NegControlProbe_
42 |     - antisense_
43 |     - NegControlCodeword_
44 |     - BLANK_
45 |     - Blank-
46 |     - NegPrb
47 | 
48 | affine:
49 |   target_pix_um: 0.5 # microns per pixel to perform segmentation; default: 1.0
50 |   base_pix_x: 0.18 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
51 |   base_pix_y: 0.18 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
52 |   base_ts_x: 0.18 # convert between transcript locations and target pixels along width
53 |   base_ts_y: 0.18 # convert between transcript locations and target pixels along height
54 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
55 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
56 | 
57 | model_params:
58 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
59 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
60 |   elongated: # list of elongated cell types that are in the single-cell reference
61 |     - Adventitial fibroblasts
62 |     - Alveolar fibroblasts
63 |     - Peribronchial fibroblasts
64 |     - Subpleural fibroblasts
65 |     - Myofibroblasts
66 |     - Fibromyocytes
67 | 
68 | training_params:
69 |   total_epochs: 1 # number of training epochs; default: 1
70 |   total_steps: 4000 # number of training steps; default: 4000
71 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
72 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
73 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
74 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
75 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
76 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
77 | 
78 | testing_params:
79 |   test_epoch: 1 # epoch to test; default: 1
80 |   test_step: 4000 # step number to test; default: 4000
81 | 
82 | experiment_dirs:
83 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/merscope.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_merscope_melanoma2 # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_merscope_melanoma2/HumanMelanomaPatient2_images_mosaic_DAPI_z0.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_merscope_melanoma2/HumanMelanomaPatient2_detected_transcripts.csv # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_melanoma.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_melanoma_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_melanoma_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: global_x # name of x location column in transcripts file
23 |   y_col: global_y # name of y location column in transcripts file
24 |   gene_col: gene # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.107999132774 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.107997631125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 12 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 10 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 4000 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 4000 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/small_example.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 8 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./example_data/dataset_xenium_breast1_small # data directory for processed/output data
 8 |   fp_dapi: ./example_data/dataset_xenium_breast1_small/morphology_mip_small.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./example_data/dataset_xenium_breast1_small/transcripts_small.csv # path of transcripts file
10 |   fp_ref: ./example_data/sc_references/sc_breast.csv # file path of reference data
11 |   fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: x_location # name of x location column in transcripts file
23 |   y_col: y_location # name of y location column in transcripts file
24 |   gene_col: feature_name # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.2125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 48 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 60 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 60 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/stereoseq.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_stereoseq_mousebrain # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_stereoseq_mousebrain/Mouse_brain_Adult.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_stereoseq_mousebrain/Mouse_brain_Adult_GEM_bin1.tsv.gz # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_mousebrain.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_mousebrain_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_mousebrain_markers_neg.csv # file path of negative markers
13 |   fp_selected_genes: ./data_large/dataset_stereoseq_mousebrain/example_mousebrain_genes.txt # file containing names of genes to use
14 | 
15 | nuclei_fovs:
16 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
17 | 
18 | nuclei:
19 |   diameter: 30 # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
20 |   crop_nuclei_to_ts: True # crop nuclei to region of transcript detections - run generate_expression_maps() before segment_nuclei()
21 | 
22 | transcripts:
23 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
24 |   x_col: y # name of x location column in transcripts file
25 |   y_col: x # name of y location column in transcripts file
26 |   gene_col: geneID # name of genes column in transcripts file
27 |   counts_col: MIDCounts # name of counts column in transcripts file, eg MIDCounts, default: None
28 |   transcripts_to_filter: # genes starting with these strings will be filtered out
29 |     - NegControlProbe_
30 |     - antisense_
31 |     - NegControlCodeword_
32 |     - BLANK_
33 |     - Blank-
34 |     - NegPrb
35 | 
36 | affine:
37 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
38 |   base_pix_x: 1.0 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
39 |   base_pix_y: 1.0 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
40 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
41 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
42 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
43 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
44 | 
45 | model_params:
46 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
47 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
48 |   elongated: # list of elongated cell types that are in the single-cell reference
49 |     - Astro
50 |     - CA1
51 |     - CA2
52 |     - CA3
53 |     - CR
54 |     - CT SUB
55 |     - Car3
56 |     - DG
57 |     - Endo
58 |     - IT HATA
59 |     - L2 IT APr
60 |     - L2 IT ENTl
61 |     - L2 IT ENTm
62 |     - L2 IT RSP-ACA
63 |     - L2 IT RSPv-POST-PRE
64 |     - L2/3 IT AI
65 |     - L2/3 IT CTX
66 |     - L2/3 IT ENTl
67 |     - L2/3 IT PAR
68 |     - L2/3 IT POST-PRE
69 |     - L2/3 IT ProS
70 |     - L3 IT ENTl
71 |     - L3 IT ENTm
72 |     - L4 IT CTX
73 |     - L4 RSP-ACA
74 |     - L4/5 IT CTX
75 |     - L5 IT CTX
76 |     - L5 IT RSP-ACA
77 |     - L5 PPP
78 |     - L5 PT CTX
79 |     - L5/6 IT CTX
80 |     - L5/6 IT PFC
81 |     - L5/6 IT TPE-ENT
82 |     - L5/6 NP CT CTX
83 |     - L5/6 NP CTX
84 |     - L6 CT CTX
85 |     - L6 CT ENT
86 |     - L6 IT CTX
87 |     - L6 IT ENTl
88 |     - L6b CTX
89 |     - L6b ENT
90 |     - L6b RHP
91 |     - Lamp5
92 |     - Meis2
93 |     - Micro-PVM
94 |     - Mossy
95 |     - NP PPP
96 |     - NP SUB
97 |     - Ntng1 HPF
98 |     - Oligo
99 |     - Pax6
100 |     - ProS
101 |     - Pvalb
102 |     - SMC-Peri
103 |     - SUB
104 |     - Sncg
105 |     - Sst
106 |     - VLMC
107 |     - Vip
108 | 
109 | training_params:
110 |   total_epochs: 1 # number of training epochs; default: 1
111 |   total_steps: 4000 # number of training steps; default: 4000
112 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
113 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
114 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
115 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
116 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
117 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
118 | 
119 | testing_params:
120 |   test_epoch: 1 # epoch to test; default: 1
121 |   test_step: 4000 # step number to test; default: 4000
122 | 
123 | experiment_dirs:
124 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/xenium.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files: # NOTE: please ensure these point to the right locations
 7 |   data_dir: ./data_large/dataset_xenium_breast1 # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_xenium_breast1/morphology_mip.ome.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_xenium_breast1/transcripts.csv.gz # path of transcripts file
10 |   fp_ref: ./example_data/sc_references/sc_breast.csv # file path of reference data
11 |   fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: False # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: x_location # name of x location column in transcripts file
23 |   y_col: y_location # name of y location column in transcripts file
24 |   gene_col: feature_name # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.2125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 48 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 4000 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 4000 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/model/dataio/dataset_input.py:
--------------------------------------------------------------------------------
 1 | import glob
 2 | import os
 3 | import random
 4 | import re
 5 | import sys
 6 | import warnings
 7 | 
 8 | import cv2
 9 | import h5py
10 | import imgaug.augmenters as iaa
11 | import natsort
12 | import numpy as np
13 | import pandas as pd
14 | import tifffile
15 | import torch
16 | import torch.utils.data as data
17 | from scipy.ndimage import rotate
18 | 
19 | warnings.filterwarnings("ignore", category=np.VisibleDeprecationWarning)
20 | 
21 | 
22 | class DataProcessing(data.Dataset):
23 |     def __init__(
24 |         self,
25 |         config,
26 |         isTraining=True,
27 |         all_patches=True,
28 |         shift_patches=0,
29 |         total_steps=None,
30 |     ):
31 |         self.patch_size = config.model_params.patch_size
32 |         self.isTraining = isTraining
33 | 
34 |         self.expr_fp = (
35 |             config.files.data_dir
36 |             + "/"
37 |             + config.files.dir_out_maps
38 |             + "/"
39 |             + config.files.dir_patches
40 |             + str(self.patch_size)
41 |             + "x"
42 |             + str(self.patch_size)
43 |             + "_shift_"
44 |             + str(shift_patches)
45 |         )
46 | 
47 |         self.nuclei_fp = os.path.join(config.files.data_dir, config.files.fp_nuclei)
48 |         self.nuclei_types_fp = os.path.join(
49 |             config.files.data_dir, config.files.fp_nuclei_anno
50 |         )
51 |         
self.gene_names_fp = os.path.join( 52 | config.files.data_dir, config.files.fp_gene_names 53 | ) 54 | 55 | self.pos_markers_fp = config.files.fp_pos_markers 56 | self.neg_markers_fp = config.files.fp_neg_markers 57 | self.ref_fp = config.files.fp_ref 58 | 59 | # Check valid data directories 60 | if not os.path.exists(self.expr_fp): 61 | sys.exit("Invalid file path %s" % self.expr_fp) 62 | if not os.path.exists(self.nuclei_fp): 63 | sys.exit("Invalid file path %s" % self.nuclei_fp) 64 | if not os.path.exists(self.nuclei_types_fp): 65 | sys.exit("Invalid file path %s" % self.nuclei_types_fp) 66 | if not os.path.exists(self.pos_markers_fp): 67 | sys.exit("Invalid file path %s" % self.pos_markers_fp) 68 | if not os.path.exists(self.neg_markers_fp): 69 | sys.exit("Invalid file path %s" % self.neg_markers_fp) 70 | if not os.path.exists(self.ref_fp): 71 | sys.exit("Invalid file path %s" % self.ref_fp) 72 | if not os.path.exists(self.gene_names_fp): 73 | sys.exit("Invalid file path %s" % self.gene_names_fp) 74 | 75 | self.nuclei = tifffile.imread(self.nuclei_fp) 76 | self.nuclei = self.nuclei.astype(np.int32) 77 | print("Loaded nuclei") 78 | print(self.nuclei.shape) 79 | 80 | expr_fp_ext = ".hdf5" 81 | fp_patches_all = glob.glob(self.expr_fp + "/*" + expr_fp_ext) 82 | fp_patches_all = natsort.natsorted(fp_patches_all) 83 | 84 | # # Get coordinates of non-overlapping patches 85 | # if shift_patches == 0: 86 | # h_starts = list(np.arange(0, self.nuclei.shape[0]-self.patch_size, self.patch_size)) 87 | # w_starts = list(np.arange(0, self.nuclei.shape[1]-self.patch_size, self.patch_size)) 88 | 89 | # # Include remainder patches on 90 | # h_starts.append(self.nuclei.shape[0]-self.patch_size) 91 | # w_starts.append(self.nuclei.shape[1]-self.patch_size) 92 | # else: 93 | # h_starts = list(np.arange(shift_patches, self.nuclei.shape[0]-self.patch_size, self.patch_size)) 94 | # w_starts = list(np.arange(shift_patches, self.nuclei.shape[1]-self.patch_size, self.patch_size)) 95 | 96 | # coords_starts = [(x, y) for x in h_starts for y in w_starts] 97 | 98 | # Randomly select train/test samples 99 | random.seed(1234) 100 | n_coords = len(fp_patches_all) 101 | sample_ids = range(n_coords) 102 | sample_k = int(80 * n_coords / 100) 103 | train_ids = random.sample(sample_ids, k=sample_k) 104 | 105 | if self.isTraining: 106 | self.fp_patches = [fp_patches_all[x] for x in train_ids] 107 | if total_steps is not None: 108 | total_steps = total_steps + int(0.05 * len(train_ids)) 109 | if total_steps <= len(train_ids): 110 | self.fp_patches = self.fp_patches[:total_steps] 111 | elif all_patches: 112 | self.fp_patches = fp_patches_all 113 | else: 114 | test_ids = [x for x in sample_ids if x not in train_ids] 115 | self.fp_patches = [fp_patches_all[x] for x in test_ids] 116 | 117 | print("%d patches available" % len(self.fp_patches)) 118 | 119 | # if self.isTraining: 120 | # self.coords_starts = [coords_starts[x] for x in train_ids] 121 | # self.coords_starts = self.coords_starts 122 | # elif all_patches == True: 123 | # self.coords_starts = coords_starts 124 | # else: 125 | # test_ids = [x for x in sample_ids if x not in train_ids] 126 | # self.coords_starts = [coords_starts[x] for x in test_ids] 127 | 128 | # Nuclei IDs with cell types and elongated nuclei 129 | h5f = h5py.File(self.nuclei_types_fp, "r") 130 | self.nuclei_types_idx = list(h5f["data"][:]) 131 | self.nuclei_types_ids = list(h5f["ids"][:]) 132 | h5f.close() 133 | 134 | # Get order of cell-types from sc reference 135 | atlas_exprs = pd.read_csv(self.ref_fp, 
index_col=0) 136 | ct_idx_ref = atlas_exprs["ct_idx"].tolist() 137 | ct_ref = atlas_exprs["cell_type"].tolist() 138 | name_index_dict = {} 139 | for name, index in zip(ct_ref, ct_idx_ref): 140 | name_index_dict[index] = name 141 | type_names = [name_index_dict[index] for index in sorted(ct_idx_ref)] 142 | self.type_names = list(dict.fromkeys(type_names)) 143 | print(f"Cell types: {self.type_names}") 144 | 145 | types_elong = config.model_params.elongated 146 | idx_elong = [self.type_names.index(x) for x in types_elong] 147 | nuclei_types_elong = [1 if x in idx_elong else 0 for x in self.nuclei_types_idx] 148 | self.nuclei_ids_elong = [ 149 | x for i, x in enumerate(self.nuclei_types_ids) if nuclei_types_elong[i] == 1 150 | ] 151 | 152 | # print('%d patches available' %len(self.coords_starts)) 153 | 154 | df_pos_markers = pd.read_csv(self.pos_markers_fp, index_col=0) 155 | df_neg_markers = pd.read_csv(self.neg_markers_fp, index_col=0) 156 | 157 | # self.pos_markers = df_pos_markers.to_numpy() 158 | # self.neg_markers = df_neg_markers.to_numpy() 159 | self.pos_markers = df_pos_markers 160 | self.neg_markers = df_neg_markers 161 | 162 | with open(self.gene_names_fp) as file: 163 | self.gene_names = [line.rstrip() for line in file] 164 | 165 | def augment_data(self, batch_raw): 166 | batch_raw = np.expand_dims(batch_raw, 0) 167 | 168 | # Original, horizontal 169 | random_flip = np.random.randint(2, size=1)[0] 170 | # 0, 90, 180, 270 171 | random_rotate = np.random.randint(4, size=1)[0] 172 | 173 | # Flips 174 | if random_flip == 0: 175 | batch_flip = batch_raw * 1 176 | else: 177 | batch_flip = iaa.Flipud(1.0)(images=batch_raw) 178 | 179 | # Rotations 180 | if random_rotate == 0: 181 | batch_rotate = batch_flip * 1 182 | elif random_rotate == 1: 183 | batch_rotate = iaa.Rot90(1, keep_size=True)(images=batch_flip) 184 | elif random_rotate == 2: 185 | batch_rotate = iaa.Rot90(2, keep_size=True)(images=batch_flip) 186 | else: 187 | batch_rotate = iaa.Rot90(3, keep_size=True)(images=batch_flip) 188 | 189 | images_aug_array = np.array(batch_rotate) 190 | 191 | return images_aug_array, random_flip, random_rotate 192 | 193 | def normalise_images(self, imgs): 194 | return (imgs - self.fold_mean) / self.fold_std 195 | 196 | def __len__(self): 197 | "Denotes the total number of samples" 198 | return len(self.fp_patches) 199 | 200 | def __getitem__(self, index): 201 | "Generates one sample of data" 202 | 203 | patch_fp = self.fp_patches[index] 204 | h5f = h5py.File(patch_fp, "r") 205 | expr = h5f["data"][:].astype(np.float64) 206 | h5f.close() 207 | 208 | # Global coordinates 209 | expr_fp_g = re.findall(r"\d+", os.path.basename(patch_fp)) 210 | coords_h1 = int(expr_fp_g[0]) 211 | coords_w1 = int(expr_fp_g[1]) 212 | coords_h2 = coords_h1 + self.patch_size 213 | coords_w2 = coords_w1 + self.patch_size 214 | 215 | nucl = self.nuclei[coords_h1:coords_h2, coords_w1:coords_w2] 216 | 217 | assert expr.shape[0] == self.patch_size, print(expr.shape[0]) 218 | assert expr.shape[1] == self.patch_size, print(expr.shape[1]) 219 | 220 | img = np.concatenate((expr, np.expand_dims(nucl, -1)), -1) 221 | 222 | if self.isTraining: 223 | img, _, _ = self.augment_data(img) 224 | img = img[0, :, :, :] 225 | 226 | expr_aug = img[:, :, :-1] 227 | nucl_aug = img[:, :, -1] 228 | 229 | # Tile cells individually along channel axis 230 | cell_ids, _ = np.unique(nucl_aug, return_index=True) 231 | cell_ids = cell_ids[cell_ids != 0] 232 | n_cells = len(cell_ids) 233 | 234 | nucl_split = np.zeros((self.patch_size, self.patch_size, 
n_cells)) 235 | search_areas = np.zeros((self.patch_size, self.patch_size, n_cells)) 236 | 237 | # For non-elongated cells 238 | kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (20, 20)) 239 | 240 | # For elongated cells 241 | ksizevmin = 3 242 | ksize_total = 60 243 | ecc_scale = 0.9 244 | 245 | # For pos/neg marker loss 246 | search_pos = np.zeros((self.patch_size, self.patch_size, n_cells)) 247 | search_neg = np.zeros((self.patch_size, self.patch_size, n_cells)) 248 | kernel_posneg = np.ones((3, 3), dtype=np.uint8) 249 | 250 | for i_cell, c_id in enumerate(cell_ids): 251 | nucl_split[:, :, i_cell] = np.where(nucl_aug == c_id, 1, 0) 252 | 253 | if c_id not in self.nuclei_ids_elong: 254 | # Not elongated 255 | search_areas[:, :, i_cell] = cv2.dilate( 256 | nucl_split[:, :, i_cell], kernel, iterations=1 257 | ) 258 | else: 259 | # Elongated 260 | try: 261 | contours = cv2.findContours( 262 | nucl_split[:, :, i_cell].astype(np.uint8), 263 | cv2.RETR_LIST, 264 | cv2.CHAIN_APPROX_NONE, 265 | ) 266 | 267 | ellipse = cv2.fitEllipse(np.squeeze(contours[0])) 268 | (center, axes, orientation) = ellipse 269 | majoraxis_length = max(axes) 270 | minoraxis_length = min(axes) 271 | eccentricity = np.sqrt( 272 | 1 - (minoraxis_length / majoraxis_length) ** 2 273 | ) 274 | 275 | # Get ellipse filter based on eccentricity and majoraxis length 276 | # Rotate based on orientation 277 | # https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html 278 | ksizeh = int(round(ecc_scale * eccentricity * ksize_total)) 279 | ksizev = ( 280 | ksize_total - ksizeh 281 | if (ksize_total - ksizeh) > ksizevmin 282 | else ksizevmin 283 | ) 284 | kernel_elong = cv2.getStructuringElement( 285 | cv2.MORPH_ELLIPSE, (ksizeh, ksizev) 286 | ) 287 | kernel_elong = rotate(kernel_elong, 90 - orientation, reshape=True) 288 | 289 | search_areas[:, :, i_cell] = cv2.dilate( 290 | nucl_split[:, :, i_cell], kernel_elong, iterations=1 291 | ) 292 | except Exception: 293 | search_areas[:, :, i_cell] = cv2.dilate( 294 | nucl_split[:, :, i_cell], kernel, iterations=1 295 | ) 296 | 297 | ct_nucleus = int(self.nuclei_types_idx[self.nuclei_types_ids.index(c_id)]) 298 | ct_nucleus_name = self.type_names[ct_nucleus] 299 | 300 | # Markers with dilation 301 | # ct_pos = np.expand_dims(np.expand_dims(self.pos_markers[ct_nucleus,:], 0),0)*expr_aug 302 | pos_vals = self.pos_markers.loc[ct_nucleus_name, self.gene_names].to_numpy() 303 | ct_pos = np.expand_dims(np.expand_dims(pos_vals, 0), 0) * expr_aug 304 | ct_pos = np.sum(ct_pos, -1) 305 | ct_pos[ct_pos > 0] = 1 306 | ct_pos[ct_pos < 0] = 0 307 | search_pos[:, :, i_cell] = search_areas[:, :, i_cell] * cv2.dilate( 308 | ct_pos, kernel_posneg, iterations=1 309 | ) 310 | search_pos[search_pos > 0] = 1 311 | search_pos[search_pos < 0] = 0 312 | 313 | # ct_neg = np.expand_dims(np.expand_dims(self.neg_markers[ct_nucleus,:], 0),0)*expr_aug 314 | neg_vals = self.neg_markers.loc[ct_nucleus_name, self.gene_names].to_numpy() 315 | ct_neg = np.expand_dims(np.expand_dims(neg_vals, 0), 0) * expr_aug 316 | ct_neg = np.sum(ct_neg, -1) 317 | ct_neg[ct_neg > 0] = 1 318 | ct_neg[ct_neg < 0] = 0 319 | search_neg[:, :, i_cell] = search_areas[:, :, i_cell] * cv2.dilate( 320 | ct_neg, kernel_posneg, iterations=1 321 | ) 322 | search_neg[search_neg > 0] = 1 323 | search_neg[search_neg < 0] = 0 324 | 325 | search_areas[search_areas > 0] = 1 326 | search_areas[search_areas < 0] = 0 327 | 328 | expr_aug_sum = np.sum(expr_aug, -1) 329 | 330 | # Mask 
expressions and change channel order 331 | expr_split = np.repeat(expr_aug[:, :, :, np.newaxis], n_cells, axis=3) 332 | expr_split = expr_split * np.expand_dims(search_areas, 2) 333 | 334 | # Convert to tensor 335 | expr_torch = torch.from_numpy(expr_split).float() 336 | nucl_torch = torch.from_numpy(nucl_split).long() 337 | search_areas_torch = torch.from_numpy(search_areas).long() 338 | search_pos_torch = torch.from_numpy(search_pos).long() 339 | search_neg_torch = torch.from_numpy(search_neg).long() 340 | 341 | if self.isTraining: 342 | return ( 343 | expr_torch, 344 | nucl_torch, 345 | search_areas_torch, 346 | search_pos_torch, 347 | search_neg_torch, 348 | coords_h1, 349 | coords_w1, 350 | nucl_aug, 351 | expr_aug_sum, 352 | ) 353 | else: 354 | return ( 355 | expr_torch, 356 | nucl_torch, 357 | search_areas_torch, 358 | search_pos_torch, 359 | search_neg_torch, 360 | coords_h1, 361 | coords_w1, 362 | nucl_aug, 363 | expr_aug_sum, 364 | self.nuclei.shape[0], 365 | self.nuclei.shape[1], 366 | self.expr_fp, 367 | ) 368 | -------------------------------------------------------------------------------- /bidcell/model/model/intialisation.py: -------------------------------------------------------------------------------- 1 | from torch.nn import init 2 | 3 | 4 | def weights_init_normal(m): 5 | classname = m.__class__.__name__ 6 | if classname.find("Conv") != -1: 7 | init.normal_(m.weight.data, 0.0, 0.02) 8 | elif classname.find("Linear") != -1: 9 | init.normal_(m.weight.data, 0.0, 0.02) 10 | elif classname.find("BatchNorm") != -1: 11 | init.normal_(m.weight.data, 1.0, 0.02) 12 | init.constant_(m.bias.data, 0.0) 13 | 14 | 15 | def weights_init_xavier(m): 16 | classname = m.__class__.__name__ 17 | if classname.find("Conv") != -1: 18 | init.xavier_normal_(m.weight.data, gain=1) 19 | elif classname.find("Linear") != -1: 20 | init.xavier_normal_(m.weight.data, gain=1) 21 | elif classname.find("BatchNorm") != -1: 22 | init.normal_(m.weight.data, 1.0, 0.02) 23 | init.constant_(m.bias.data, 0.0) 24 | 25 | 26 | def weights_init_kaiming(m): 27 | classname = m.__class__.__name__ 28 | if classname.find("Conv") != -1: 29 | init.kaiming_normal_(m.weight.data, a=0, mode="fan_in") 30 | elif classname.find("Linear") != -1: 31 | init.kaiming_normal_(m.weight.data, a=0, mode="fan_in") 32 | elif classname.find("BatchNorm") != -1: 33 | init.normal_(m.weight.data, 1.0, 0.02) 34 | init.constant_(m.bias.data, 0.0) 35 | 36 | 37 | def weights_init_orthogonal(m): 38 | classname = m.__class__.__name__ 39 | if classname.find("Conv") != -1: 40 | init.orthogonal_(m.weight.data, gain=1) 41 | elif classname.find("Linear") != -1: 42 | init.orthogonal_(m.weight.data, gain=1) 43 | elif classname.find("BatchNorm") != -1: 44 | init.normal_(m.weight.data, 1.0, 0.02) 45 | init.constant_(m.bias.data, 0.0) 46 | 47 | 48 | def init_weights(net, init_type="normal"): 49 | if init_type == "normal": 50 | net.apply(weights_init_normal) 51 | elif init_type == "xavier": 52 | net.apply(weights_init_xavier) 53 | elif init_type == "kaiming": 54 | net.apply(weights_init_kaiming) 55 | elif init_type == "orthogonal": 56 | net.apply(weights_init_orthogonal) 57 | else: 58 | raise NotImplementedError( 59 | "initialization method [%s] is not implemented" % init_type 60 | ) 61 | -------------------------------------------------------------------------------- /bidcell/model/model/layers.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from .intialisation import 
init_weights 4 | 5 | 6 | class unetConv2(nn.Module): 7 | def __init__(self, in_size, out_size, is_batchnorm, n=2, ks=3, stride=1, padding=1): 8 | super(unetConv2, self).__init__() 9 | self.n = n 10 | self.ks = ks 11 | self.stride = stride 12 | self.padding = padding 13 | s = stride 14 | p = padding 15 | 16 | if is_batchnorm: 17 | for i in range(1, n + 1): 18 | conv = nn.Sequential( 19 | nn.Conv2d(in_size, out_size, ks, s, p), 20 | nn.BatchNorm2d(out_size), 21 | nn.ReLU(inplace=True), 22 | ) 23 | setattr(self, "conv%d" % i, conv) 24 | in_size = out_size 25 | else: 26 | for i in range(1, n + 1): 27 | conv = nn.Sequential( 28 | nn.Conv2d(in_size, out_size, ks, s, p), 29 | nn.ReLU(inplace=True), 30 | ) 31 | setattr(self, "conv%d" % i, conv) 32 | in_size = out_size 33 | 34 | # initialise the blocks 35 | for m in self.children(): 36 | init_weights(m, init_type="kaiming") 37 | 38 | def forward(self, inputs): 39 | x = inputs 40 | for i in range(1, self.n + 1): 41 | conv = getattr(self, "conv%d" % i) 42 | x = conv(x) 43 | return x 44 | 45 | 46 | class unetUp(nn.Module): 47 | def __init__(self, in_size, out_size, is_deconv, n_concat=2): 48 | super(unetUp, self).__init__() 49 | self.conv = unetConv2(out_size * 2, out_size, False) 50 | 51 | if is_deconv: 52 | self.up = nn.ConvTranspose2d( 53 | in_size, out_size, kernel_size=4, stride=2, padding=1 54 | ) 55 | else: 56 | self.up = nn.UpsamplingBilinear2d(scale_factor=2) 57 | 58 | # initialise the blocks 59 | for m in self.children(): 60 | if m.__class__.__name__.find("unetConv2") != -1: 61 | continue 62 | init_weights(m, init_type="kaiming") 63 | 64 | def forward(self, inputs0, *input): 65 | outputs0 = self.up(inputs0) 66 | for i in range(len(input)): 67 | outputs0 = torch.cat([outputs0, input[i]], 1) 68 | return self.conv(outputs0) 69 | 70 | 71 | class unetUp_origin(nn.Module): 72 | def __init__(self, in_size, out_size, is_deconv, n_concat=2): 73 | super(unetUp_origin, self).__init__() 74 | # self.conv = unetConv2(out_size*2, out_size, False) 75 | if is_deconv: 76 | self.conv = unetConv2(in_size + (n_concat - 2) * out_size, out_size, False) 77 | self.up = nn.ConvTranspose2d( 78 | in_size, out_size, kernel_size=4, stride=2, padding=1 79 | ) 80 | else: 81 | self.conv = unetConv2(in_size + (n_concat - 2) * out_size, out_size, False) 82 | self.up = nn.UpsamplingBilinear2d(scale_factor=2) 83 | 84 | # initialise the blocks 85 | for m in self.children(): 86 | if m.__class__.__name__.find("unetConv2") != -1: 87 | continue 88 | init_weights(m, init_type="kaiming") 89 | 90 | def forward(self, inputs0, *input): 91 | # print(self.n_concat) 92 | # print(input) 93 | outputs0 = self.up(inputs0) 94 | for i in range(len(input)): 95 | outputs0 = torch.cat([outputs0, input[i]], 1) 96 | return self.conv(outputs0) 97 | -------------------------------------------------------------------------------- /bidcell/model/model/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class NucleiEncapsulationLoss(nn.Module): 6 | """ 7 | Ensure that nuclei are fully within predicted cells 8 | """ 9 | 10 | def __init__(self, weight, device) -> None: 11 | super(NucleiEncapsulationLoss, self).__init__() 12 | self.weight = weight 13 | self.device = device 14 | 15 | def forward(self, seg_pred, batch_n): 16 | criterion_ce = torch.nn.CrossEntropyLoss(reduction="mean") 17 | loss = criterion_ce(seg_pred, batch_n[:, 0, :, :]) 18 | 19 | return self.weight * loss 20 | 21 | 22 | class 
Oversegmentation(nn.Module): 23 | """ 24 | Minimise oversegmentation 25 | """ 26 | 27 | def __init__(self, weight, device) -> None: 28 | super(Oversegmentation, self).__init__() 29 | self.weight = weight 30 | self.device = device 31 | 32 | def forward(self, seg_pred, batch_n): 33 | batch_n = batch_n[:, 0, :, :] 34 | 35 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 36 | probs_nuc = seg_probs[:, 1, :, :] * batch_n 37 | 38 | mask_cyto = torch.ones(batch_n.shape).to(self.device) - batch_n 39 | probs_cyto = seg_probs[:, 1, :, :] * mask_cyto 40 | 41 | ones = torch.ones(probs_cyto.shape).to(self.device) 42 | zeros = torch.zeros(probs_cyto.shape).to(self.device) 43 | 44 | alpha = 1.0 45 | 46 | preds_nuc = torch.sigmoid((probs_nuc - 0.5) * alpha) * (ones - zeros) + zeros 47 | count_nuc = torch.sum(preds_nuc) 48 | 49 | preds_cyto = torch.sigmoid((probs_cyto - 0.5) * alpha) * (ones - zeros) + zeros 50 | count_cyto = torch.sum(preds_cyto) 51 | 52 | extra = count_cyto - count_nuc 53 | m = torch.nn.ReLU() 54 | loss = m(extra) 55 | 56 | loss = loss / seg_pred.shape[0] 57 | 58 | return self.weight * loss 59 | 60 | 61 | class CellCallingLoss(nn.Module): 62 | """ 63 | Maximise assignment of transcripts to cells 64 | """ 65 | 66 | def __init__(self, weight, device) -> None: 67 | super(CellCallingLoss, self).__init__() 68 | self.weight = weight 69 | self.device = device 70 | 71 | def forward(self, seg_pred, batch_sa): 72 | # Limit to searchable area where there is detected expression 73 | penalisable = batch_sa * 1 74 | criterion_ce = torch.nn.CrossEntropyLoss(reduction="none") 75 | loss = criterion_ce(seg_pred, penalisable[:, 0, :, :]) 76 | 77 | loss_total = torch.sum(loss) 78 | 79 | loss_total = loss_total / seg_pred.shape[0] 80 | 81 | return self.weight * loss_total 82 | 83 | 84 | class OverlapLoss(nn.Module): 85 | """ 86 | Penalise overlaps between different cells 87 | """ 88 | 89 | def __init__(self, weight, device) -> None: 90 | super(OverlapLoss, self).__init__() 91 | self.weight = weight 92 | self.device = device 93 | 94 | def forward(self, seg_pred, batch_n): 95 | batch_n = batch_n[:, 0, :, :] 96 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 97 | 98 | all_nuclei = torch.sum(batch_n, 0) 99 | all_not_nuclei = torch.ones(batch_n.shape).to(self.device) - all_nuclei 100 | 101 | probs_cyto = seg_probs[:, 1, :, :] * all_not_nuclei 102 | 103 | ones = torch.ones(probs_cyto.shape).to(self.device) 104 | zeros = torch.zeros(probs_cyto.shape).to(self.device) 105 | 106 | # Penalise if number of cell prob > 0.5 is > 1 107 | alpha = 1.0 108 | preds_cyto = torch.sigmoid((probs_cyto - 0.5) * alpha) * (ones - zeros) + zeros 109 | count_cyto_overlap = torch.sum(preds_cyto, 0) - all_not_nuclei 110 | m = torch.nn.ReLU() 111 | count_cyto_overlap = m(count_cyto_overlap) 112 | 113 | loss = torch.sum(count_cyto_overlap) 114 | 115 | scale = seg_pred.shape[0] * seg_pred.shape[2] * seg_pred.shape[3] 116 | loss = loss / scale 117 | 118 | return self.weight * loss 119 | 120 | 121 | class PosNegMarkerLoss(nn.Module): 122 | """ 123 | Positive and negative markers of cell type 124 | """ 125 | 126 | def __init__(self, weight_pos, weight_neg, device) -> None: 127 | super(PosNegMarkerLoss, self).__init__() 128 | self.weight_pos = weight_pos 129 | self.weight_neg = weight_neg 130 | self.device = device 131 | 132 | def forward(self, seg_pred, batch_pos, batch_neg): 133 | batch_pos = batch_pos[:, 0, :, :] 134 | batch_neg = batch_neg[:, 0, :, :] 135 | 136 | # POSITIVE markers 137 | criterion_ce = 
torch.nn.CrossEntropyLoss(reduction="sum") 138 | loss_pos = criterion_ce(seg_pred, batch_pos) 139 | 140 | # NEGATIVE markers 141 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 142 | probs_cells = seg_probs[:, 1, :, :] 143 | 144 | ones = torch.ones(probs_cells.shape).to(self.device) 145 | zeros = torch.zeros(probs_cells.shape).to(self.device) 146 | 147 | alpha = 1.0 148 | 149 | preds_cells = ( 150 | torch.sigmoid((probs_cells - 0.5) * alpha) * (ones - zeros) + zeros 151 | ) 152 | 153 | loss_neg = torch.sum(preds_cells * batch_neg) 154 | 155 | loss_total = ( 156 | self.weight_pos * loss_pos + self.weight_neg * loss_neg 157 | ) / seg_pred.shape[0] 158 | 159 | return loss_total 160 | -------------------------------------------------------------------------------- /bidcell/model/model/model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Credit: https://github.com/avBuffer/UNet3plus_pth 3 | import torch 4 | import torch.nn as nn 5 | 6 | from .intialisation import init_weights 7 | from .layers import unetConv2 8 | 9 | 10 | class SegmentationModel(nn.Module): 11 | def __init__( 12 | self, 13 | n_channels=313, 14 | bilinear=True, 15 | feature_scale=4, 16 | is_deconv=True, 17 | is_batchnorm=True, 18 | ): 19 | super(SegmentationModel, self).__init__() 20 | self.n_channels = n_channels 21 | self.bilinear = bilinear 22 | self.feature_scale = feature_scale 23 | self.is_deconv = is_deconv 24 | self.is_batchnorm = is_batchnorm 25 | filters = [64, 128, 256, 512, 1024] 26 | 27 | ## -------------Encoder-------------- 28 | self.conv1 = unetConv2(self.n_channels, filters[0], self.is_batchnorm) 29 | self.maxpool1 = nn.MaxPool2d(kernel_size=2) 30 | 31 | self.conv2 = unetConv2(filters[0], filters[1], self.is_batchnorm) 32 | self.maxpool2 = nn.MaxPool2d(kernel_size=2) 33 | 34 | self.conv3 = unetConv2(filters[1], filters[2], self.is_batchnorm) 35 | self.maxpool3 = nn.MaxPool2d(kernel_size=2) 36 | 37 | self.conv4 = unetConv2(filters[2], filters[3], self.is_batchnorm) 38 | self.maxpool4 = nn.MaxPool2d(kernel_size=2) 39 | 40 | self.conv5 = unetConv2(filters[3], filters[4], self.is_batchnorm) 41 | 42 | ## -------------Decoder-------------- 43 | self.CatChannels = filters[0] 44 | self.CatBlocks = 5 45 | self.UpChannels = self.CatChannels * self.CatBlocks 46 | 47 | """stage 4d""" 48 | # h1->320*320, hd4->40*40, Pooling 8 times 49 | self.h1_PT_hd4 = nn.MaxPool2d(8, 8, ceil_mode=True) 50 | self.h1_PT_hd4_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 51 | self.h1_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 52 | self.h1_PT_hd4_relu = nn.ReLU(inplace=True) 53 | 54 | # h2->160*160, hd4->40*40, Pooling 4 times 55 | self.h2_PT_hd4 = nn.MaxPool2d(4, 4, ceil_mode=True) 56 | self.h2_PT_hd4_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 57 | self.h2_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 58 | self.h2_PT_hd4_relu = nn.ReLU(inplace=True) 59 | 60 | # h3->80*80, hd4->40*40, Pooling 2 times 61 | self.h3_PT_hd4 = nn.MaxPool2d(2, 2, ceil_mode=True) 62 | self.h3_PT_hd4_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1) 63 | self.h3_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 64 | self.h3_PT_hd4_relu = nn.ReLU(inplace=True) 65 | 66 | # h4->40*40, hd4->40*40, Concatenation 67 | self.h4_Cat_hd4_conv = nn.Conv2d(filters[3], self.CatChannels, 3, padding=1) 68 | self.h4_Cat_hd4_bn = nn.BatchNorm2d(self.CatChannels) 69 | self.h4_Cat_hd4_relu = nn.ReLU(inplace=True) 70 | 71 | # hd5->20*20, hd4->40*40, Upsample 2 times 
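Every decoder branch constructed in this block follows the same UNet 3+ full-scale skip pattern: resample a feature map to the target resolution (max-pool down, bilinear upsample, or pass through at scale), project it to `CatChannels` (64) with a 3×3 convolution, then BatchNorm and ReLU; five such maps are concatenated into `UpChannels` (5 × 64 = 320) and fused. A minimal sketch of one branch, assuming a 64×64 input patch rather than the 320×320 used in the comments:
```py
# Hedged sketch of a single full-scale skip branch (illustration, not repo code).
import torch
import torch.nn as nn

h1 = torch.randn(1, 64, 64, 64)  # encoder stage-1 features at full resolution
branch = nn.Sequential(
    nn.MaxPool2d(8, 8, ceil_mode=True),  # 64x64 -> 8x8, the hd4 resolution here
    nn.Conv2d(64, 64, 3, padding=1),     # project to CatChannels = 64
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
print(branch(h1).shape)  # torch.Size([1, 64, 8, 8])
# Five such 64-channel maps are concatenated: 5 * 64 = 320 = UpChannels.
```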
72 | self.hd5_UT_hd4 = nn.Upsample( 73 | scale_factor=2, mode="bilinear", align_corners=False 74 | ) # 14*14 75 | self.hd5_UT_hd4_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 76 | self.hd5_UT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 77 | self.hd5_UT_hd4_relu = nn.ReLU(inplace=True) 78 | 79 | # fusion(h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4) 80 | self.conv4d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 81 | self.bn4d_1 = nn.BatchNorm2d(self.UpChannels) 82 | self.relu4d_1 = nn.ReLU(inplace=True) 83 | 84 | """stage 3d""" 85 | # h1->320*320, hd3->80*80, Pooling 4 times 86 | self.h1_PT_hd3 = nn.MaxPool2d(4, 4, ceil_mode=True) 87 | self.h1_PT_hd3_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 88 | self.h1_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 89 | self.h1_PT_hd3_relu = nn.ReLU(inplace=True) 90 | 91 | # h2->160*160, hd3->80*80, Pooling 2 times 92 | self.h2_PT_hd3 = nn.MaxPool2d(2, 2, ceil_mode=True) 93 | self.h2_PT_hd3_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 94 | self.h2_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 95 | self.h2_PT_hd3_relu = nn.ReLU(inplace=True) 96 | 97 | # h3->80*80, hd3->80*80, Concatenation 98 | self.h3_Cat_hd3_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1) 99 | self.h3_Cat_hd3_bn = nn.BatchNorm2d(self.CatChannels) 100 | self.h3_Cat_hd3_relu = nn.ReLU(inplace=True) 101 | 102 | # hd4->40*40, hd4->80*80, Upsample 2 times 103 | self.hd4_UT_hd3 = nn.Upsample( 104 | scale_factor=2, mode="bilinear", align_corners=False 105 | ) # 14*14 106 | self.hd4_UT_hd3_conv = nn.Conv2d( 107 | self.UpChannels, self.CatChannels, 3, padding=1 108 | ) 109 | self.hd4_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 110 | self.hd4_UT_hd3_relu = nn.ReLU(inplace=True) 111 | 112 | # hd5->20*20, hd4->80*80, Upsample 4 times 113 | self.hd5_UT_hd3 = nn.Upsample( 114 | scale_factor=4, mode="bilinear", align_corners=False 115 | ) # 14*14 116 | self.hd5_UT_hd3_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 117 | self.hd5_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 118 | self.hd5_UT_hd3_relu = nn.ReLU(inplace=True) 119 | 120 | # fusion(h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3) 121 | self.conv3d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 122 | self.bn3d_1 = nn.BatchNorm2d(self.UpChannels) 123 | self.relu3d_1 = nn.ReLU(inplace=True) 124 | 125 | """stage 2d """ 126 | # h1->320*320, hd2->160*160, Pooling 2 times 127 | self.h1_PT_hd2 = nn.MaxPool2d(2, 2, ceil_mode=True) 128 | self.h1_PT_hd2_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 129 | self.h1_PT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 130 | self.h1_PT_hd2_relu = nn.ReLU(inplace=True) 131 | 132 | # h2->160*160, hd2->160*160, Concatenation 133 | self.h2_Cat_hd2_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 134 | self.h2_Cat_hd2_bn = nn.BatchNorm2d(self.CatChannels) 135 | self.h2_Cat_hd2_relu = nn.ReLU(inplace=True) 136 | 137 | # hd3->80*80, hd2->160*160, Upsample 2 times 138 | self.hd3_UT_hd2 = nn.Upsample( 139 | scale_factor=2, mode="bilinear", align_corners=False 140 | ) # 14*14 141 | self.hd3_UT_hd2_conv = nn.Conv2d( 142 | self.UpChannels, self.CatChannels, 3, padding=1 143 | ) 144 | self.hd3_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 145 | self.hd3_UT_hd2_relu = nn.ReLU(inplace=True) 146 | 147 | # hd4->40*40, hd2->160*160, Upsample 4 times 148 | self.hd4_UT_hd2 = nn.Upsample( 149 | scale_factor=4, mode="bilinear", align_corners=False 150 | ) # 14*14 151 | 
self.hd4_UT_hd2_conv = nn.Conv2d( 152 | self.UpChannels, self.CatChannels, 3, padding=1 153 | ) 154 | self.hd4_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 155 | self.hd4_UT_hd2_relu = nn.ReLU(inplace=True) 156 | 157 | # hd5->20*20, hd2->160*160, Upsample 8 times 158 | self.hd5_UT_hd2 = nn.Upsample( 159 | scale_factor=8, mode="bilinear", align_corners=False 160 | ) # 14*14 161 | self.hd5_UT_hd2_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 162 | self.hd5_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 163 | self.hd5_UT_hd2_relu = nn.ReLU(inplace=True) 164 | 165 | # fusion(h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2) 166 | self.conv2d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 167 | self.bn2d_1 = nn.BatchNorm2d(self.UpChannels) 168 | self.relu2d_1 = nn.ReLU(inplace=True) 169 | 170 | """stage 1d""" 171 | # h1->320*320, hd1->320*320, Concatenation 172 | self.h1_Cat_hd1_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 173 | self.h1_Cat_hd1_bn = nn.BatchNorm2d(self.CatChannels) 174 | self.h1_Cat_hd1_relu = nn.ReLU(inplace=True) 175 | 176 | # hd2->160*160, hd1->320*320, Upsample 2 times 177 | self.hd2_UT_hd1 = nn.Upsample( 178 | scale_factor=2, mode="bilinear", align_corners=False 179 | ) # 14*14 180 | self.hd2_UT_hd1_conv = nn.Conv2d( 181 | self.UpChannels, self.CatChannels, 3, padding=1 182 | ) 183 | self.hd2_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 184 | self.hd2_UT_hd1_relu = nn.ReLU(inplace=True) 185 | 186 | # hd3->80*80, hd1->320*320, Upsample 4 times 187 | self.hd3_UT_hd1 = nn.Upsample( 188 | scale_factor=4, mode="bilinear", align_corners=False 189 | ) # 14*14 190 | self.hd3_UT_hd1_conv = nn.Conv2d( 191 | self.UpChannels, self.CatChannels, 3, padding=1 192 | ) 193 | self.hd3_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 194 | self.hd3_UT_hd1_relu = nn.ReLU(inplace=True) 195 | 196 | # hd4->40*40, hd1->320*320, Upsample 8 times 197 | self.hd4_UT_hd1 = nn.Upsample( 198 | scale_factor=8, mode="bilinear", align_corners=False 199 | ) # 14*14 200 | self.hd4_UT_hd1_conv = nn.Conv2d( 201 | self.UpChannels, self.CatChannels, 3, padding=1 202 | ) 203 | self.hd4_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 204 | self.hd4_UT_hd1_relu = nn.ReLU(inplace=True) 205 | 206 | # hd5->20*20, hd1->320*320, Upsample 16 times 207 | self.hd5_UT_hd1 = nn.Upsample( 208 | scale_factor=16, mode="bilinear", align_corners=False 209 | ) # 14*14 210 | self.hd5_UT_hd1_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 211 | self.hd5_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 212 | self.hd5_UT_hd1_relu = nn.ReLU(inplace=True) 213 | 214 | # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1) 215 | self.conv1d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 216 | self.bn1d_1 = nn.BatchNorm2d(self.UpChannels) 217 | self.relu1d_1 = nn.ReLU(inplace=True) 218 | 219 | # output 220 | self.outconv_seg = nn.Conv2d(self.UpChannels, 2, 3, padding=1) 221 | 222 | # initialise weights 223 | for m in self.modules(): 224 | if isinstance(m, nn.Conv2d): 225 | init_weights(m, init_type="kaiming") 226 | elif isinstance(m, nn.BatchNorm2d): 227 | init_weights(m, init_type="kaiming") 228 | 229 | def forward(self, inputs): 230 | ## -------------Encoder------------- 231 | h1 = self.conv1(inputs) # h1->320*320*64 232 | 233 | h2 = self.maxpool1(h1) 234 | h2 = self.conv2(h2) # h2->160*160*128 235 | 236 | h3 = self.maxpool2(h2) 237 | h3 = self.conv3(h3) # h3->80*80*256 238 | 239 | h4 = self.maxpool3(h3) 240 | h4 = self.conv4(h4) # h4->40*40*512 
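With four max-pooling stages in the encoder, input patches must have height and width divisible by 16 for the skip resolutions to line up. A quick end-to-end shape check (a sketch; the small `n_channels=8` is chosen purely for illustration, the package default being 313):
```py
# Hedged sanity check: the network maps (B, C, H, W) -> (B, 2, H, W).
import torch

from bidcell.model.model.model import SegmentationModel

model = SegmentationModel(n_channels=8).eval()
x = torch.randn(2, 8, 64, 64)  # H and W divisible by 16
with torch.no_grad():
    seg = model(x)
print(seg.shape)  # expected: torch.Size([2, 2, 64, 64])
```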
241 | 242 | h5 = self.maxpool4(h4) 243 | hd5 = self.conv5(h5) # h5->20*20*1024 244 | 245 | ## -------------Decoder------------- 246 | h1_PT_hd4 = self.h1_PT_hd4_relu( 247 | self.h1_PT_hd4_bn(self.h1_PT_hd4_conv(self.h1_PT_hd4(h1))) 248 | ) 249 | h2_PT_hd4 = self.h2_PT_hd4_relu( 250 | self.h2_PT_hd4_bn(self.h2_PT_hd4_conv(self.h2_PT_hd4(h2))) 251 | ) 252 | h3_PT_hd4 = self.h3_PT_hd4_relu( 253 | self.h3_PT_hd4_bn(self.h3_PT_hd4_conv(self.h3_PT_hd4(h3))) 254 | ) 255 | h4_Cat_hd4 = self.h4_Cat_hd4_relu(self.h4_Cat_hd4_bn(self.h4_Cat_hd4_conv(h4))) 256 | hd5_UT_hd4 = self.hd5_UT_hd4_relu( 257 | self.hd5_UT_hd4_bn(self.hd5_UT_hd4_conv(self.hd5_UT_hd4(hd5))) 258 | ) 259 | hd4 = self.relu4d_1( 260 | self.bn4d_1( 261 | self.conv4d_1( 262 | torch.cat( 263 | (h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4), 1 264 | ) 265 | ) 266 | ) 267 | ) # hd4->40*40*UpChannels 268 | 269 | h1_PT_hd3 = self.h1_PT_hd3_relu( 270 | self.h1_PT_hd3_bn(self.h1_PT_hd3_conv(self.h1_PT_hd3(h1))) 271 | ) 272 | h2_PT_hd3 = self.h2_PT_hd3_relu( 273 | self.h2_PT_hd3_bn(self.h2_PT_hd3_conv(self.h2_PT_hd3(h2))) 274 | ) 275 | h3_Cat_hd3 = self.h3_Cat_hd3_relu(self.h3_Cat_hd3_bn(self.h3_Cat_hd3_conv(h3))) 276 | hd4_UT_hd3 = self.hd4_UT_hd3_relu( 277 | self.hd4_UT_hd3_bn(self.hd4_UT_hd3_conv(self.hd4_UT_hd3(hd4))) 278 | ) 279 | hd5_UT_hd3 = self.hd5_UT_hd3_relu( 280 | self.hd5_UT_hd3_bn(self.hd5_UT_hd3_conv(self.hd5_UT_hd3(hd5))) 281 | ) 282 | hd3 = self.relu3d_1( 283 | self.bn3d_1( 284 | self.conv3d_1( 285 | torch.cat( 286 | (h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3), 1 287 | ) 288 | ) 289 | ) 290 | ) # hd3->80*80*UpChannels 291 | 292 | h1_PT_hd2 = self.h1_PT_hd2_relu( 293 | self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))) 294 | ) 295 | h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2))) 296 | hd3_UT_hd2 = self.hd3_UT_hd2_relu( 297 | self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))) 298 | ) 299 | hd4_UT_hd2 = self.hd4_UT_hd2_relu( 300 | self.hd4_UT_hd2_bn(self.hd4_UT_hd2_conv(self.hd4_UT_hd2(hd4))) 301 | ) 302 | hd5_UT_hd2 = self.hd5_UT_hd2_relu( 303 | self.hd5_UT_hd2_bn(self.hd5_UT_hd2_conv(self.hd5_UT_hd2(hd5))) 304 | ) 305 | hd2 = self.relu2d_1( 306 | self.bn2d_1( 307 | self.conv2d_1( 308 | torch.cat( 309 | (h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2), 1 310 | ) 311 | ) 312 | ) 313 | ) # hd2->160*160*UpChannels 314 | 315 | h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1))) 316 | hd2_UT_hd1 = self.hd2_UT_hd1_relu( 317 | self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))) 318 | ) 319 | hd3_UT_hd1 = self.hd3_UT_hd1_relu( 320 | self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))) 321 | ) 322 | hd4_UT_hd1 = self.hd4_UT_hd1_relu( 323 | self.hd4_UT_hd1_bn(self.hd4_UT_hd1_conv(self.hd4_UT_hd1(hd4))) 324 | ) 325 | hd5_UT_hd1 = self.hd5_UT_hd1_relu( 326 | self.hd5_UT_hd1_bn(self.hd5_UT_hd1_conv(self.hd5_UT_hd1(hd5))) 327 | ) 328 | hd1 = self.relu1d_1( 329 | self.bn1d_1( 330 | self.conv1d_1( 331 | torch.cat( 332 | (h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1), 1 333 | ) 334 | ) 335 | ) 336 | ) # hd1->320*320*UpChannels 337 | 338 | seg = self.outconv_seg(hd1) # d1->320*320*2 339 | 340 | return seg 341 | -------------------------------------------------------------------------------- /bidcell/model/postprocess_predictions.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import multiprocessing as mp 4 | import os 5 
| import random 6 | import re 7 | from collections import Counter 8 | 9 | import cv2 10 | import matplotlib.pyplot as plt 11 | import numpy as np 12 | import tifffile 13 | from scipy import ndimage as ndi 14 | from ..config import load_config, Config 15 | 16 | 17 | def get_n_processes(n_processes): 18 | """Number of CPUs for multiprocessing""" 19 | if n_processes is None: 20 | return mp.cpu_count() 21 | else: 22 | return n_processes if n_processes <= mp.cpu_count() else mp.cpu_count() 23 | 24 | 25 | def sorted_alphanumeric(data): 26 | convert = lambda text: int(text) if text.isdigit() else text.lower() 27 | alphanum_key = lambda key: [convert(c) for c in re.split("([0-9]+)", key)] 28 | return sorted(data, key=alphanum_key) 29 | 30 | 31 | def get_exp_dir(config): 32 | if config.files.dir_id == "last": 33 | folders = next(os.walk("model_outputs"))[1] 34 | folders = sorted_alphanumeric(folders) 35 | folder_last = folders[-1] 36 | dir_id = folder_last.replace("\\", "/") 37 | else: 38 | dir_id = config.files.dir_id 39 | 40 | return dir_id 41 | 42 | 43 | def postprocess_connect(img, nuclei): 44 | cell_ids = np.unique(img) 45 | cell_ids = cell_ids[1:] 46 | 47 | random.shuffle(cell_ids) 48 | 49 | kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)) 50 | 51 | # Touch diagonally = same object 52 | s = ndi.generate_binary_structure(2, 2) 53 | 54 | final = np.zeros(img.shape, dtype=np.uint32) 55 | 56 | for i in cell_ids: 57 | i_mask = np.where(img == i, 1, 0).astype(np.uint8) 58 | 59 | connected_mask = cv2.dilate(i_mask, kernel, iterations=2) 60 | connected_mask = cv2.erode(connected_mask, kernel, iterations=2) 61 | 62 | # Add nucleus as predicted by cellpose 63 | nucleus_mask = np.where(nuclei == i, 1, 0).astype(np.uint8) 64 | 65 | connected_mask = connected_mask + nucleus_mask 66 | connected_mask[connected_mask > 0] = 1 67 | 68 | unique_ids, num_ids = ndi.label(connected_mask, structure=s) 69 | if num_ids > 1: 70 | # The first element is always 0 (background) 71 | unique, counts = np.unique(unique_ids, return_counts=True) 72 | 73 | # Ensure counts in descending order 74 | counts, unique = (list(t) for t in zip(*sorted(zip(counts, unique)))) 75 | counts.reverse() 76 | unique.reverse() 77 | counts = np.array(counts) 78 | unique = np.array(unique) 79 | 80 | no_overlap = False 81 | 82 | # Rarely, the nucleus is not the largest segment 83 | for i_part in range(1, len(counts)): 84 | if i_part > 1: 85 | no_overlap = True # TODO: Helen, check this! 
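`postprocess_connect` repeatedly labels masks with `generate_binary_structure(2, 2)`, i.e. full 8-connectivity, so pixels that touch only diagonally still count as one object. A standalone toy example of why that structuring element matters:
```py
# Toy example: 8-connectivity merges diagonally touching pixels into one object.
import numpy as np
from scipy import ndimage as ndi

mask = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]
)
s = ndi.generate_binary_structure(2, 2)  # rank 2, full connectivity
_, n_diag = ndi.label(mask, structure=s)
_, n_cross = ndi.label(mask)  # default structure: 4-connectivity
print(n_diag, n_cross)  # 1 3
```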
86 | largest = unique[np.argmax(counts[i_part:]) + i_part] 87 | connected_mask = np.where(unique_ids == largest, 1, 0) 88 | # Break if current largest region overlaps nucleus 89 | if np.sum(connected_mask * nucleus_mask) > 0.5: 90 | break 91 | 92 | # Close holes on largest section 93 | filled_mask = ndi.binary_fill_holes(connected_mask).astype(int) 94 | 95 | else: 96 | filled_mask = ndi.binary_fill_holes(connected_mask).astype(int) 97 | 98 | final = np.where(filled_mask > 0, i, final) 99 | 100 | final = np.where(nuclei > 0, nuclei, final) 101 | return final 102 | 103 | 104 | def remove_islands(img, nuclei): 105 | cell_ids = np.unique(img) 106 | cell_ids = cell_ids[1:] 107 | 108 | random.shuffle(cell_ids) 109 | 110 | # Touch diagonally = same object 111 | s = ndi.generate_binary_structure(2, 2) 112 | 113 | final = np.zeros(img.shape, dtype=np.uint32) 114 | 115 | for i in cell_ids: 116 | i_mask = np.where(img == i, 1, 0).astype(np.uint8) 117 | 118 | nucleus_mask = np.where(nuclei == i, 1, 0).astype(np.uint8) 119 | 120 | # Number of blobs belonging to cell 121 | unique_ids, num_blobs = ndi.label(i_mask, structure=s) 122 | if num_blobs > 1: 123 | # Keep the blob with max overlap to nucleus 124 | amount_overlap = np.zeros(num_blobs) 125 | 126 | for i_blob in range(1, num_blobs + 1): 127 | blob = np.where(unique_ids == i_blob, 1, 0) 128 | amount_overlap[i_blob - 1] = np.sum(blob * nucleus_mask) 129 | blob_keep = np.argmax(amount_overlap) + 1 130 | 131 | final_mask = np.where(unique_ids == blob_keep, 1, 0) 132 | 133 | else: 134 | blob_size = np.count_nonzero(i_mask) 135 | if blob_size > 2: 136 | final_mask = i_mask.copy() 137 | else: 138 | final_mask = i_mask * 0 139 | 140 | final_mask = ndi.binary_fill_holes(final_mask).astype(int) 141 | 142 | final = np.where(final_mask > 0, i, final) 143 | 144 | return final 145 | 146 | 147 | def process_chunk(chunk, patch_size, img_whole, nuclei_img, output_dir): 148 | for index in range(len(chunk)): 149 | coords = chunk[index] 150 | coords_x1 = coords[0] 151 | coords_y1 = coords[1] 152 | coords_x2 = coords_x1 + patch_size 153 | coords_y2 = coords_y1 + patch_size 154 | 155 | img = img_whole[coords_x1:coords_x2, coords_y1:coords_y2] 156 | 157 | nuclei = nuclei_img[coords_x1:coords_x2, coords_y1:coords_y2] 158 | 159 | output_fp = output_dir + "%d_%d.tif" % (coords_x1, coords_y1) 160 | 161 | # print('Filling holes') 162 | filled = postprocess_connect(img, nuclei) 163 | 164 | # print('Removing islands') 165 | final = remove_islands(filled, nuclei) 166 | 167 | tifffile.imwrite(output_fp, final.astype(np.uint32), photometric="minisblack") 168 | 169 | cell_ids = np.unique(final)[1:] 170 | 171 | # Visualise cells with random colours 172 | n_cells_ids = len(cell_ids) 173 | cell_ids_rand = np.arange(1, n_cells_ids + 1) 174 | random.shuffle(cell_ids_rand) 175 | 176 | keep_mask = np.isin(nuclei, cell_ids) 177 | nuclei = np.where(keep_mask, nuclei, 0) 178 | keep_mask = np.isin(img, cell_ids) 179 | img = np.where(keep_mask, img, 0) 180 | 181 | dictionary = dict(zip(cell_ids, cell_ids_rand)) 182 | dictionary[0] = 0 183 | nuclei_mapped = np.copy(nuclei) 184 | img_mapped = np.copy(img) 185 | final_mapped = np.copy(final) 186 | 187 | nuclei_mapped = np.vectorize(dictionary.get)(nuclei) 188 | img_mapped = np.vectorize(dictionary.get)(img) 189 | final_mapped = np.vectorize(dictionary.get)(final) 190 | 191 | fig, axes = plt.subplots(ncols=3, figsize=(9, 3), sharex=True, sharey=True) 192 | ax = axes.ravel() 193 | ax[0].imshow(nuclei_mapped, cmap=plt.cm.gray) 194 | 
ax[0].set_title("Nuclei") 195 | ax[1].imshow(img_mapped, cmap=plt.cm.nipy_spectral) 196 | ax[1].set_title("Original") 197 | ax[2].imshow(final_mapped, cmap=plt.cm.nipy_spectral) 198 | ax[2].set_title("Processed") 199 | for a in ax: 200 | a.set_axis_off() 201 | fig.tight_layout() 202 | # plt.show() 203 | 204 | fig.savefig(output_fp.replace("tif", "png"), dpi=300) 205 | plt.close(fig) 206 | 207 | 208 | def combine(config, dir_id, patch_size, nuclei_img): 209 | """ 210 | Combine the patches previously output by the connect function 211 | """ 212 | 213 | fp_dir = dir_id + "epoch_%d_step_%d_connected" % ( 214 | config.testing_params.test_epoch, 215 | config.testing_params.test_step, 216 | ) 217 | fp_unconnected = dir_id + "epoch_%d_step_%d.tif" % ( 218 | config.testing_params.test_epoch, 219 | config.testing_params.test_step, 220 | ) 221 | 222 | dl_pred = tifffile.imread(fp_unconnected) 223 | height = dl_pred.shape[0] 224 | width = dl_pred.shape[1] 225 | 226 | seg_final = np.zeros((height, width), dtype=np.uint32) 227 | 228 | fp_seg = glob.glob(fp_dir + "/*.tif", recursive=True) 229 | 230 | sample = tifffile.imread(fp_seg[0]) 231 | patch_h = sample.shape[0] 232 | patch_w = sample.shape[1] 233 | 234 | cell_ids = [] 235 | 236 | for fp in fp_seg: 237 | patch = tifffile.imread(fp) 238 | 239 | patch_ids = np.unique(patch) 240 | patch_ids = patch_ids[patch_ids != 0] 241 | cell_ids.extend(patch_ids) 242 | 243 | fp_coords = os.path.basename(fp).split(".")[0] 244 | fp_x = int(fp_coords.split("_")[0]) 245 | fp_y = int(fp_coords.split("_")[1]) 246 | 247 | # Place into appropriate location 248 | seg_final[fp_x : fp_x + patch_h, fp_y : fp_y + patch_w] = patch[:] 249 | 250 | # If cell is split by windowing, keep component with nucleus 251 | count_ids = Counter(cell_ids) 252 | windowed_ids = [k for k, v in count_ids.items() if v > 1] 253 | 254 | # Check along borders 255 | h_starts = list(np.arange(0, height - patch_size, patch_size)) 256 | w_starts = list(np.arange(0, width - patch_size, patch_size)) 257 | h_starts.append(height - patch_size) 258 | w_starts.append(width - patch_size) 259 | 260 | # Mask along grid 261 | h_starts_wide = [] 262 | w_starts_wide = [] 263 | for i in range(-10, 11): 264 | h_starts_wide.extend([x + i for x in h_starts]) 265 | w_starts_wide.extend([x + i for x in w_starts]) 266 | 267 | mask = np.zeros(seg_final.shape) 268 | mask[h_starts_wide, :] = 1 269 | mask[:, w_starts_wide] = 1 270 | 271 | masked = mask * seg_final 272 | masked_ids = np.unique(masked)[1:] 273 | 274 | # IDs to check for split bodies 275 | to_check_ids = list(set(masked_ids) & set(windowed_ids)) 276 | 277 | return seg_final, to_check_ids 278 | 279 | 280 | def process_check_splits(config, dir_id, nuclei_img, seg_final, chunk_ids): 281 | """ 282 | Check and fix cells split by windowing 283 | """ 284 | 285 | chunk_seg = np.zeros(seg_final.shape, dtype=np.uint32) 286 | 287 | # Touch diagonally = same object 288 | s = ndi.generate_binary_structure(2, 2) 289 | 290 | for i in chunk_ids: 291 | i_mask = np.where(seg_final == i, 1, 0).astype(np.uint8) 292 | 293 | # Number of blobs belonging to cell 294 | unique_ids, num_blobs = ndi.label(i_mask, structure=s) 295 | 296 | # Bounding box 297 | bb = np.argwhere(unique_ids) 298 | (ystart, xstart), (ystop, xstop) = bb.min(0), bb.max(0) + 1 299 | unique_ids_crop = unique_ids[ystart:ystop, xstart:xstop] 300 | 301 | nucleus_mask = np.where(nuclei_img == i, 1, 0).astype(np.uint8) 302 | nucleus_mask = nucleus_mask[ystart:ystop, xstart:xstop] 303 | 304 | if num_blobs > 1: 305 | # Keep 
the blob with max overlap to nucleus 306 | amount_overlap = np.zeros(num_blobs) 307 | 308 | for i_blob in range(1, num_blobs + 1): 309 | blob = np.where(unique_ids_crop == i_blob, 1, 0) 310 | amount_overlap[i_blob - 1] = np.sum(blob * nucleus_mask) 311 | blob_keep = np.argmax(amount_overlap) + 1 312 | 313 | # Put into final segmentation 314 | final_mask = np.where(unique_ids_crop == blob_keep, 1, 0) 315 | 316 | # seg_final = np.where(seg_final == i, 0, seg_final) 317 | 318 | # # Double check the few outliers 319 | # unique_ids_2, num_blobs_2 = ndi.label(final_mask, structure=s) 320 | # if num_blobs_2 > 1: 321 | # # Keep largest 322 | # blob_keep_2 = np.argmax(np.bincount(unique_ids_2)[1:]) + 1 323 | # final_mask = np.where(unique_ids_2 == blob_keep_2, 1, 0) 324 | 325 | chunk_seg[ystart:ystop, xstart:xstop] = np.where( 326 | final_mask == 1, i, chunk_seg[ystart:ystop, xstart:xstop] 327 | ) 328 | 329 | else: 330 | chunk_seg = np.where(i_mask == 1, i, chunk_seg) 331 | 332 | tifffile.imwrite( 333 | dir_id + "/" + str(chunk_ids[0]) + "_checked_splits.tif", 334 | chunk_seg, 335 | photometric="minisblack", 336 | ) 337 | 338 | 339 | def postprocess_predictions(config: Config, dir_id: str): 340 | dir_id = config.files.data_dir + "/model_outputs/" + dir_id + "/test_output/" 341 | 342 | pred_fp = dir_id + "epoch_%d_step_%d.tif" % ( 343 | config.testing_params.test_epoch, 344 | config.testing_params.test_step, 345 | ) 346 | output_dir = dir_id + "epoch_%d_step_%d_connected/" % ( 347 | config.testing_params.test_epoch, 348 | config.testing_params.test_step, 349 | ) 350 | 351 | nucleus_fp = os.path.join(config.files.data_dir, config.files.fp_nuclei) 352 | nuclei_img = tifffile.imread(nucleus_fp) 353 | 354 | if not os.path.exists(output_dir): 355 | os.makedirs(output_dir) 356 | 357 | img_whole = tifffile.imread(pred_fp) 358 | 359 | smallest_dim = np.min(img_whole.shape) 360 | if config.postprocess.patch_size_mp < smallest_dim: 361 | patch_size = config.postprocess.patch_size_mp 362 | else: 363 | patch_size = smallest_dim 364 | 365 | h_starts = list(np.arange(0, img_whole.shape[0] - patch_size, patch_size)) 366 | w_starts = list(np.arange(0, img_whole.shape[1] - patch_size, patch_size)) 367 | h_starts.append(img_whole.shape[0] - patch_size) 368 | w_starts.append(img_whole.shape[1] - patch_size) 369 | coords_starts = [(x, y) for x in h_starts for y in w_starts] 370 | print("%d patches available" % len(coords_starts)) 371 | 372 | # num_processes = mp.cpu_count() 373 | num_processes = get_n_processes(config.cpus) 374 | print("Num multiprocessing splits: %d" % num_processes) 375 | 376 | coords_splits = np.array_split(coords_starts, num_processes) 377 | processes = [] 378 | 379 | print("Processing...") 380 | 381 | for chunk in coords_splits: 382 | p = mp.Process( 383 | target=process_chunk, 384 | args=(chunk, patch_size, img_whole, nuclei_img, output_dir), 385 | ) 386 | processes.append(p) 387 | p.start() 388 | 389 | for p in processes: 390 | p.join() 391 | 392 | print("Combining results") 393 | seg_final, to_check_ids = combine(config, dir_id, patch_size, nuclei_img) 394 | # print(len(np.unique(seg_final)), len(to_check_ids)) 395 | 396 | ids_splits = np.array_split(to_check_ids, num_processes) 397 | processes = [] 398 | 399 | for chunk_ids in ids_splits: 400 | p = mp.Process( 401 | target=process_check_splits, 402 | args=(config, dir_id, nuclei_img, seg_final, chunk_ids), 403 | ) 404 | processes.append(p) 405 | p.start() 406 | 407 | for p in processes: 408 | p.join() 409 | 410 | check_mask = 
np.isin(seg_final, to_check_ids) 411 | seg_final = np.where(check_mask == 1, 0, seg_final) 412 | 413 | fp_checked_splits = glob.glob(dir_id + "/*_checked_splits.tif", recursive=True) 414 | for fp in fp_checked_splits: 415 | checked_split = tifffile.imread(fp) 416 | seg_final = np.where(checked_split > 0, checked_split, seg_final) 417 | os.remove(fp) 418 | 419 | seg_final = np.where(nuclei_img > 0, nuclei_img, seg_final) 420 | 421 | fp_dir = dir_id + "epoch_%d_step_%d_connected" % ( 422 | config.testing_params.test_epoch, 423 | config.testing_params.test_step, 424 | ) 425 | fp_output_seg = fp_dir + ".tif" 426 | print("Saved segmentation to %s" % fp_output_seg) 427 | tifffile.imwrite( 428 | fp_output_seg, seg_final.astype(np.uint32), photometric="minisblack" 429 | ) 430 | 431 | 432 | if __name__ == "__main__": 433 | parser = argparse.ArgumentParser() 434 | 435 | parser.add_argument("--config_dir", type=str, help="path to config") 436 | 437 | args = parser.parse_args() 438 | config = load_config(args.config_dir) 439 | 440 | postprocess_predictions(config, get_exp_dir(config)) 441 | -------------------------------------------------------------------------------- /bidcell/model/predict.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import bisect 3 | import glob 4 | import logging 5 | import os 6 | import re 7 | import sys 8 | 9 | import natsort 10 | import numpy as np 11 | import pandas as pd 12 | import segmentation_models_pytorch as smp 13 | import tifffile 14 | import torch 15 | from torch.utils.data import DataLoader 16 | 17 | from .dataio.dataset_input import DataProcessing 18 | from .model.model import SegmentationModel as Network 19 | from .utils.utils import ( 20 | get_experiment_id, 21 | get_files_list, 22 | get_seg_mask, 23 | make_dir, 24 | save_fig_outputs, 25 | sorted_alphanumeric, 26 | ) 27 | 28 | from ..config import load_config, Config 29 | 30 | 31 | def predict(config: Config) -> str: 32 | logging.basicConfig( 33 | format="%(asctime)s %(levelname)s %(message)s", 34 | level=logging.INFO, 35 | stream=sys.stdout, 36 | ) 37 | 38 | use_cuda = torch.cuda.is_available() 39 | device = torch.device("cuda" if use_cuda else "cpu") 40 | 41 | # Create experiment directories 42 | make_new = False 43 | timestamp = get_experiment_id( 44 | make_new, 45 | config.experiment_dirs.dir_id, 46 | config.files.data_dir, 47 | ) 48 | experiment_path = os.path.join(config.files.data_dir, "model_outputs", timestamp) 49 | model_dir = experiment_path + "/" + config.experiment_dirs.model_dir 50 | test_output_dir = experiment_path + "/" + config.experiment_dirs.test_output_dir 51 | make_dir(test_output_dir) 52 | 53 | # Set up the model 54 | logging.info("Initialising model") 55 | 56 | atlas_exprs = pd.read_csv(config.files.fp_ref, index_col=0) 57 | n_genes = atlas_exprs.shape[1] - 3 58 | print("Number of genes: %d" % n_genes) 59 | 60 | if config.model_params.name != "custom": 61 | model = smp.Unet( 62 | encoder_name=config.model_params.name, 63 | encoder_weights=None, 64 | in_channels=n_genes, 65 | classes=2, 66 | ) 67 | else: 68 | model = Network(n_channels=n_genes) 69 | 70 | model = model.to(device) 71 | 72 | # Get list of model files 73 | if config.testing_params.test_epoch < 0: 74 | saved_model_paths, _ = get_files_list(model_dir, [".pth"]) 75 | saved_model_paths = sorted_alphanumeric(saved_model_paths) 76 | saved_model_names = [ 77 | (os.path.basename(x)).split(".")[0] for x in saved_model_paths 78 | ] 79 | saved_model_epochs = [x.split("_")[1] for x in 
saved_model_names] 80 | saved_model_steps = [x.split("_")[-1] for x in saved_model_names] 81 | if config.testing_params.test_epoch is None: 82 | saved_model_epochs = np.array(saved_model_epochs, dtype="int") 83 | saved_model_steps = np.array(saved_model_steps, dtype="int") 84 | elif config.testing_params.test_epoch == -1: 85 | saved_model_epochs = np.array(saved_model_epochs[-1], dtype="int") 86 | saved_model_epochs = [saved_model_epochs] 87 | saved_model_steps = np.array(saved_model_steps[-1], dtype="int") 88 | saved_model_steps = [saved_model_steps] 89 | else: 90 | saved_model_epochs = [config.testing_params.test_epoch] 91 | saved_model_steps = [config.testing_params.test_step] 92 | 93 | shifts = [0, int(config.model_params.patch_size / 2)] 94 | 95 | for shift_patches in shifts: 96 | # Dataloader 97 | logging.info("Preparing data") 98 | test_dataset = DataProcessing( 99 | config, 100 | isTraining=False, 101 | shift_patches=shift_patches, 102 | ) 103 | test_loader = DataLoader( 104 | dataset=test_dataset, batch_size=1, shuffle=False, num_workers=0 105 | ) 106 | 107 | n_test_examples = len(test_loader) 108 | logging.info("Total number of patches: %d" % n_test_examples) 109 | 110 | logging.info("Begin prediction") 111 | 112 | for epoch_idx, (test_epoch, test_step) in enumerate( 113 | zip(saved_model_epochs, saved_model_steps) 114 | ): 115 | current_dir = ( 116 | test_output_dir 117 | + "/" 118 | + "epoch_" 119 | + str(test_epoch) 120 | + "_step_" 121 | + str(test_step) 122 | ) 123 | make_dir(current_dir) 124 | 125 | # Restore model 126 | load_path = model_dir + "/epoch_%d_step_%d.pth" % (test_epoch, test_step) 127 | checkpoint = torch.load(load_path) 128 | model.load_state_dict(checkpoint["model_state_dict"]) 129 | epoch = checkpoint["epoch"] 130 | assert epoch == test_epoch 131 | print("Predict using " + load_path) 132 | 133 | model = model.eval() 134 | 135 | for batch_idx, ( 136 | batch_x313, 137 | batch_n, 138 | batch_sa, 139 | batch_pos, 140 | batch_neg, 141 | coords_h1, 142 | coords_w1, 143 | nucl_aug, 144 | expr_aug_sum, 145 | whole_h, 146 | whole_w, 147 | expr_fp, 148 | ) in enumerate(test_loader): 149 | if batch_idx == 0: 150 | whole_seg = np.zeros((whole_h, whole_w), dtype=np.uint32) 151 | 152 | # Permute channels axis to batch axis 153 | batch_x313 = batch_x313[0, :, :, :, :].permute(3, 2, 0, 1) 154 | batch_sa = batch_sa.permute(3, 0, 1, 2) 155 | batch_n = batch_n.permute(3, 0, 1, 2) 156 | 157 | if batch_x313.shape[0] == 0: 158 | seg_patch = np.zeros( 159 | ( 160 | config.model_params.patch_size, 161 | config.model_params.patch_size, 162 | ), 163 | dtype=np.uint32, 164 | ) 165 | 166 | else: 167 | # Transfer to GPU 168 | batch_x313 = batch_x313.to(device) 169 | batch_sa = batch_sa.to(device) 170 | batch_n = batch_n.to(device) 171 | 172 | # Forward pass 173 | seg_pred = model(batch_x313) 174 | 175 | coords_h1 = coords_h1.detach().cpu().squeeze().numpy() 176 | coords_w1 = coords_w1.detach().cpu().squeeze().numpy() 177 | sample_seg = seg_pred.detach().cpu().numpy() 178 | sample_n = nucl_aug.detach().cpu().numpy() 179 | sample_sa = batch_sa.detach().cpu().numpy() 180 | sample_expr = expr_aug_sum.detach().cpu().numpy() 181 | patch_fp = current_dir + "/%d_%d.png" % (coords_h1, coords_w1) 182 | 183 | if (batch_idx % config.training_params.sample_freq) == 0: 184 | save_fig_outputs( 185 | sample_seg, sample_n, sample_sa, sample_expr, patch_fp 186 | ) 187 | 188 | seg_patch = get_seg_mask(sample_seg, sample_n) 189 | 190 | # seg_patch_fp = current_dir + '/' + "%d_%d.tif" %(coords_h1, 
coords_w1) 191 | # tifffile.imwrite(seg_patch_fp, seg_patch.astype(np.uint32), photometric='minisblack') 192 | 193 | whole_seg[ 194 | coords_h1 : coords_h1 + config.model_params.patch_size, 195 | coords_w1 : coords_w1 + config.model_params.patch_size, 196 | ] = seg_patch.copy() 197 | 198 | seg_fp = ( 199 | test_output_dir 200 | + "/" 201 | + "epoch_%d_step_%d_seg_shift%d.tif" 202 | % (test_epoch, test_step, shift_patches) 203 | ) 204 | 205 | tifffile.imwrite( 206 | seg_fp, whole_seg.astype(np.uint32), photometric="minisblack" 207 | ) 208 | 209 | logging.info("Finished") 210 | 211 | return test_output_dir 212 | 213 | 214 | def gap_coords(coords, patcsize): 215 | """If gap larger than patcsize -> remove all corresponding locations""" 216 | starts_diff = np.diff(coords) 217 | gap_idx = np.where(starts_diff > patcsize)[0] + 1 218 | # print(gap_idx.shape) 219 | gap_end = [coords[x] for x in gap_idx] 220 | gap_start = [coords[bisect.bisect(coords, x - 1) - 1] for x in gap_end] 221 | # print(gap_start, gap_end) 222 | gap = [] 223 | for gs, ge in zip(gap_start, gap_end): 224 | gap.extend(list(range(gs, ge))) 225 | # print(len(gap)) 226 | return gap 227 | 228 | 229 | def fill_grid(config: Config, dir_id: str): 230 | """ 231 | Combine predictions from unshifted and shifted patches to remove 232 | border effects 233 | """ 234 | 235 | print("Combining predictions") 236 | 237 | patch_size = config.model_params.patch_size 238 | shift = int(patch_size / 2) 239 | 240 | expr_fp = ( 241 | config.files.data_dir 242 | + "/" 243 | + config.files.dir_out_maps 244 | + "/" 245 | + config.files.dir_patches 246 | + str(config.model_params.patch_size) 247 | + "x" 248 | + str(config.model_params.patch_size) 249 | + "_shift_" 250 | + str(shift) 251 | ) 252 | expr_fp_ext = ".hdf5" 253 | 254 | dir_id = os.path.join( 255 | config.files.data_dir, 256 | "model_outputs", 257 | dir_id, 258 | config.experiment_dirs.test_output_dir, 259 | ) 260 | 261 | pred_fp = "%s/epoch_%d_step_%d_seg_shift0.tif" % ( 262 | dir_id, 263 | config.testing_params.test_epoch, 264 | config.testing_params.test_step, 265 | ) 266 | pred_fp_sf = "%s/epoch_%d_step_%d_seg_shift%d.tif" % ( 267 | dir_id, 268 | config.testing_params.test_epoch, 269 | config.testing_params.test_step, 270 | shift, 271 | ) 272 | 273 | output_fp = dir_id + "/" + os.path.basename(pred_fp).replace("_seg_shift0", "") 274 | 275 | pred = tifffile.imread(pred_fp) 276 | pred_sf = tifffile.imread(pred_fp_sf) 277 | 278 | fp_patches_sf = glob.glob(expr_fp + "/*" + expr_fp_ext) 279 | fp_patches_sf = natsort.natsorted(fp_patches_sf) 280 | 281 | coords_patches = [re.findall(r"\d+", os.path.basename(x)) for x in fp_patches_sf] 282 | coords_h1 = [int(x[0]) for x in coords_patches] 283 | coords_w1 = [int(x[1]) for x in coords_patches] 284 | 285 | # Fill along grid 286 | h_starts_wide = [] 287 | w_starts_wide = [] 288 | 289 | # Middle section of shifted patches 290 | for i in range(int(patch_size * 0.35), int(patch_size * 0.65)): 291 | h_starts_wide.extend([x + i for x in coords_h1]) 292 | w_starts_wide.extend([x + i for x in coords_w1]) 293 | 294 | # Gap larger than patch_size -> remove all corresponding locations 295 | h_gap = gap_coords(coords_h1, patch_size) 296 | w_gap = gap_coords(coords_w1, patch_size) 297 | 298 | fill = np.zeros(pred.shape) 299 | fill[h_starts_wide, :] = 1 300 | fill[:, w_starts_wide] = 1 301 | 302 | # Gaps 303 | fill[h_gap, :] = 0 304 | fill[:, w_gap] = 0 305 | 306 | tifffile.imwrite( 307 | dir_id + "/" + "fill.tif", fill.astype(np.uint16), photometric="minisblack" 308 
| ) 309 | 310 | result = np.zeros(pred.shape, dtype=np.uint32) 311 | result = np.where(fill > 0, pred_sf, pred) 312 | 313 | tifffile.imwrite(output_fp, result.astype(np.uint32), photometric="minisblack") 314 | 315 | 316 | if __name__ == "__main__": 317 | parser = argparse.ArgumentParser() 318 | 319 | parser.add_argument("--config_dir", type=str, help="path to config") 320 | 321 | args = parser.parse_args() 322 | config = load_config(args.config_dir) 323 | 324 | test_output_dir = predict(config) 325 | 326 | fill_grid(config, test_output_dir) 327 | -------------------------------------------------------------------------------- /bidcell/model/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import math 4 | import sys 5 | import os 6 | 7 | import matplotlib.pyplot as plt 8 | import pandas as pd 9 | import segmentation_models_pytorch as smp 10 | import torch 11 | import torch.optim.lr_scheduler as lr_scheduler 12 | from torch.utils.data import DataLoader 13 | 14 | from .dataio.dataset_input import DataProcessing 15 | from .model.losses import ( 16 | CellCallingLoss, 17 | NucleiEncapsulationLoss, 18 | OverlapLoss, 19 | Oversegmentation, 20 | PosNegMarkerLoss, 21 | ) 22 | from .model.model import SegmentationModel as Network 23 | from .utils.utils import ( 24 | get_experiment_id, 25 | make_dir, 26 | save_fig_outputs, 27 | ) 28 | from ..config import load_config, Config 29 | 30 | 31 | def train(config: Config): 32 | logging.basicConfig( 33 | format="%(asctime)s %(levelname)s %(message)s", 34 | level=logging.INFO, 35 | stream=sys.stdout, 36 | ) 37 | 38 | use_cuda = torch.cuda.is_available() 39 | device = torch.device("cuda" if use_cuda else "cpu") 40 | 41 | # Create experiment directories 42 | resume_epoch = None # could be added 43 | resume_step = 0 44 | if resume_epoch is None: 45 | make_new = True 46 | else: 47 | make_new = False 48 | 49 | timestamp = get_experiment_id( 50 | make_new, 51 | config.experiment_dirs.dir_id, 52 | config.files.data_dir, 53 | ) 54 | experiment_path = os.path.join(config.files.data_dir, "model_outputs", timestamp) 55 | make_dir(experiment_path + "/" + config.experiment_dirs.model_dir) 56 | make_dir(experiment_path + "/" + config.experiment_dirs.samples_dir) 57 | 58 | if config.training_params.model_freq <= config.testing_params.test_step: 59 | model_freq = config.training_params.model_freq 60 | else: 61 | model_freq = config.testing_params.test_step 62 | 63 | # Set up the model 64 | logging.info("Initialising model") 65 | 66 | atlas_exprs = pd.read_csv(config.files.fp_ref, index_col=0) 67 | n_genes = atlas_exprs.shape[1] - 3 68 | print("Number of genes: %d" % n_genes) 69 | 70 | if config.model_params.name != "custom": 71 | model = smp.Unet( 72 | encoder_name=config.model_params.name, 73 | encoder_weights=None, 74 | in_channels=n_genes, 75 | classes=2, 76 | ) 77 | else: 78 | model = Network(n_channels=n_genes) 79 | 80 | model = model.to(device) 81 | 82 | # Dataloader 83 | logging.info("Preparing data") 84 | 85 | train_dataset = DataProcessing( 86 | config, 87 | isTraining=True, 88 | total_steps=config.training_params.total_steps, 89 | ) 90 | train_loader = DataLoader( 91 | dataset=train_dataset, batch_size=1, shuffle=True, num_workers=0, drop_last=True 92 | ) 93 | 94 | n_train_examples = len(train_loader) 95 | logging.info("Total number of training examples: %d" % n_train_examples) 96 | 97 | # Loss functions 98 | criterion_ne = NucleiEncapsulationLoss(config.training_params.ne_weight, 
device) 99 | criterion_os = Oversegmentation(config.training_params.os_weight, device) 100 | criterion_cc = CellCallingLoss(config.training_params.cc_weight, device) 101 | criterion_ov = OverlapLoss(config.training_params.ov_weight, device) 102 | criterion_pn = PosNegMarkerLoss( 103 | config.training_params.pos_weight, 104 | config.training_params.neg_weight, 105 | device, 106 | ) 107 | 108 | # Optimiser 109 | if config.training_params.optimizer == "rmsprop": 110 | optimizer = torch.optim.RMSprop( 111 | model.parameters(), 112 | lr=config.training_params.learning_rate, 113 | weight_decay=1e-8, 114 | ) 115 | elif config.training_params.optimizer == "adam": 116 | optimizer = torch.optim.Adam( 117 | model.parameters(), 118 | lr=config.training_params.learning_rate, 119 | betas=(config.training_params.beta1, config.training_params.beta2), 120 | weight_decay=config.training_params.weight_decay, 121 | ) 122 | else: 123 | sys.exit("Select optimiser from rmsprop or adam") 124 | 125 | global_step = 0 126 | 127 | # Scheduler https://arxiv.org/pdf/1812.01187.pdf 128 | lf = ( 129 | lambda x: ( 130 | ((1 + math.cos(x * math.pi / config.training_params.total_epochs)) / 2) 131 | ** 1.0 132 | ) 133 | * 0.95 134 | + 0.05 135 | ) # cosine 136 | scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) 137 | scheduler.last_epoch = global_step 138 | 139 | # Starting epoch 140 | if resume_epoch is not None: 141 | initial_epoch = resume_epoch 142 | else: 143 | initial_epoch = 0 144 | 145 | # Restore saved model 146 | if resume_epoch is not None: 147 | load_path = ( 148 | experiment_path 149 | + "/" 150 | + config.experiment_dirs.model_dir 151 | + "/epoch_%d_step_%d.pth" % (resume_epoch, resume_step) 152 | ) 153 | checkpoint = torch.load(load_path) 154 | model.load_state_dict(checkpoint["model_state_dict"]) 155 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 156 | epoch = checkpoint["epoch"] 157 | assert epoch == resume_epoch 158 | print("Resume training, successfully loaded " + load_path) 159 | 160 | logging.info("Begin training") 161 | 162 | model = model.train() 163 | 164 | lrs = [] 165 | 166 | for epoch in range(initial_epoch, config.training_params.total_epochs): 167 | cur_lr = optimizer.param_groups[0]["lr"] 168 | print("\nEpoch =", (epoch + 1), " lr =", cur_lr) 169 | 170 | for step_epoch, ( 171 | batch_x313, 172 | batch_n, 173 | batch_sa, 174 | batch_pos, 175 | batch_neg, 176 | coords_h1, 177 | coords_w1, 178 | nucl_aug, 179 | expr_aug_sum, 180 | ) in enumerate(train_loader): 181 | # Permute channels axis to batch axis 182 | # torch.Size([1, patch_size, patch_size, 313, n_cells]) to [n_cells, 313, patch_size, patch_size] 183 | batch_x313 = batch_x313[0, :, :, :, :].permute(3, 2, 0, 1) 184 | batch_sa = batch_sa.permute(3, 0, 1, 2) 185 | batch_pos = batch_pos.permute(3, 0, 1, 2) 186 | batch_neg = batch_neg.permute(3, 0, 1, 2) 187 | batch_n = batch_n.permute(3, 0, 1, 2) 188 | 189 | if batch_x313.shape[0] == 0: 190 | if (step_epoch % model_freq) == 0: 191 | save_path = ( 192 | experiment_path 193 | + "/" 194 | + config.experiment_dirs.model_dir 195 | + "/epoch_%d_step_%d.pth" % (epoch + 1, step_epoch) 196 | ) 197 | torch.save( 198 | { 199 | "epoch": epoch + 1, 200 | "model_state_dict": model.state_dict(), 201 | "optimizer_state_dict": optimizer.state_dict(), 202 | }, 203 | save_path, 204 | ) 205 | logging.info("Model saved: %s" % save_path) 206 | continue 207 | 208 | # Transfer to GPU 209 | batch_x313 = batch_x313.to(device) 210 | batch_sa = batch_sa.to(device) 211 | batch_pos = 
batch_pos.to(device) 212 | batch_neg = batch_neg.to(device) 213 | batch_n = batch_n.to(device) 214 | 215 | optimizer.zero_grad() 216 | 217 | seg_pred = model(batch_x313) 218 | 219 | # Compute losses 220 | loss_ne = criterion_ne(seg_pred, batch_n) 221 | loss_os = criterion_os(seg_pred, batch_n) 222 | loss_cc = criterion_cc(seg_pred, batch_sa) 223 | loss_ov = criterion_ov(seg_pred, batch_n) 224 | loss_pn = criterion_pn(seg_pred, batch_pos, batch_neg) 225 | 226 | loss_ne = loss_ne.squeeze() 227 | loss_os = loss_os.squeeze() 228 | loss_cc = loss_cc.squeeze() 229 | loss_ov = loss_ov.squeeze() 230 | loss_pn = loss_pn.squeeze() 231 | 232 | loss = loss_ne + loss_os + loss_cc + loss_ov + loss_pn 233 | 234 | # Optimisation 235 | loss.backward() 236 | optimizer.step() 237 | 238 | # step_ne_loss = loss_ne.detach().cpu().numpy() # noqa 239 | # step_os_loss = loss_os.detach().cpu().numpy() # noqa 240 | # step_cc_loss = loss_cc.detach().cpu().numpy() # noqa 241 | # step_ov_loss = loss_ov.detach().cpu().numpy() # noqa 242 | # step_pn_loss = loss_pn.detach().cpu().numpy() # noqa 243 | 244 | step_train_loss = loss.detach().cpu().numpy() 245 | 246 | if (global_step % config.training_params.sample_freq) == 0: 247 | coords_h1 = coords_h1.detach().cpu().squeeze().numpy() 248 | coords_w1 = coords_w1.detach().cpu().squeeze().numpy() 249 | sample_seg = seg_pred.detach().cpu().numpy() 250 | sample_n = nucl_aug.detach().cpu().numpy() 251 | sample_sa = batch_sa.detach().cpu().numpy() 252 | sample_expr = expr_aug_sum.detach().cpu().numpy() 253 | patch_fp = ( 254 | experiment_path 255 | + "/" 256 | + config.experiment_dirs.samples_dir 257 | + "/epoch_%d_%d_%d_%d.png" 258 | % (epoch + 1, step_epoch, coords_h1, coords_w1) 259 | ) 260 | 261 | save_fig_outputs(sample_seg, sample_n, sample_sa, sample_expr, patch_fp) 262 | 263 | print( 264 | "Epoch[{}/{}], Step[{}], Loss:{:.4f}".format( 265 | epoch + 1, 266 | config.training_params.total_epochs, 267 | step_epoch, 268 | step_train_loss, 269 | ) 270 | ) 271 | # print('NE:{:.4f}, TC:{:.4f}, CC:{:.4f}, OV:{:.4f}, PN:{:.4f}'.format(step_ne_loss, 272 | # step_os_loss, 273 | # step_cc_loss, 274 | # step_ov_loss, 275 | # step_pn_loss)) 276 | 277 | # Save model 278 | if (step_epoch % model_freq) == 0: 279 | save_path = ( 280 | experiment_path 281 | + "/" 282 | + config.experiment_dirs.model_dir 283 | + "/epoch_%d_step_%d.pth" % (epoch + 1, step_epoch) 284 | ) 285 | torch.save( 286 | { 287 | "epoch": epoch + 1, 288 | "model_state_dict": model.state_dict(), 289 | "optimizer_state_dict": optimizer.state_dict(), 290 | }, 291 | save_path, 292 | ) 293 | logging.info("Model saved: %s" % save_path) 294 | 295 | global_step += 1 296 | 297 | # Update and append current LR 298 | scheduler.step() 299 | lrs.append(cur_lr) 300 | 301 | # Plot lr scheduler 302 | # plt.plot(lrs, ".-", label="LambdaLR") 303 | # plt.xlabel("epoch") 304 | # plt.ylabel("LR") 305 | # plt.tight_layout() 306 | # plt.savefig(experiment_path + "/LR.png", dpi=300) 307 | 308 | logging.info("Training finished") 309 | 310 | 311 | if __name__ == "__main__": 312 | parser = argparse.ArgumentParser() 313 | 314 | parser.add_argument("--config_dir", type=str, help="path to config") 315 | 316 | args = parser.parse_args() 317 | config = load_config(args.config_dir) 318 | 319 | train(config) 320 | -------------------------------------------------------------------------------- /bidcell/model/utils/utils.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import datetime as dt 3 | 
import json 4 | import os 5 | import random 6 | import re 7 | import sys 8 | 9 | import matplotlib.pyplot as plt 10 | import natsort 11 | import numpy as np 12 | from scipy.special import softmax 13 | 14 | 15 | def sorted_alphanumeric(data): 16 | """ 17 | Alphanumerically sort a list 18 | """ 19 | convert = lambda text: int(text) if text.isdigit() else text.lower() 20 | alphanum_key = lambda key: [convert(c) for c in re.split("([0-9]+)", key)] 21 | return sorted(data, key=alphanum_key) 22 | 23 | 24 | def make_dir(dir_path): 25 | """ 26 | Make directory if doesn't exist 27 | """ 28 | if not os.path.exists(dir_path): 29 | os.makedirs(dir_path) 30 | 31 | 32 | def delete_file(path): 33 | """ 34 | Delete file if exists 35 | """ 36 | if os.path.exists(path): 37 | os.remove(path) 38 | 39 | 40 | def get_files_list(path, ext_array=[".tif"]): 41 | """ 42 | Get all files in a directory with a specific extension 43 | """ 44 | files_list = list() 45 | dirs_list = list() 46 | 47 | for root, dirs, files in os.walk(path, topdown=True): 48 | for file in files: 49 | if any(x in file for x in ext_array): 50 | files_list.append(os.path.join(root, file)) 51 | folder = os.path.dirname(os.path.join(root, file)) 52 | if folder not in dirs_list: 53 | dirs_list.append(folder) 54 | 55 | return files_list, dirs_list 56 | 57 | 58 | def json_file_to_pyobj(filename): 59 | """ 60 | Read json config file 61 | """ 62 | 63 | def _json_object_hook(d): 64 | return collections.namedtuple("X", d.keys())(*d.values()) 65 | 66 | def json2obj(data): 67 | return json.loads(data, object_hook=_json_object_hook) 68 | 69 | return json2obj(open(filename).read()) 70 | 71 | 72 | def get_newest_id(exp_dir="model_outputs"): 73 | """Get the latest experiment ID based on its timestamp 74 | 75 | Parameters 76 | ---------- 77 | exp_dir : str, optional 78 | Name of the directory that contains all the experiment directories 79 | 80 | Returns 81 | ------- 82 | exp_id : str 83 | Name of the latest experiment directory 84 | """ 85 | folders = next(os.walk(exp_dir))[1] 86 | if len(folders) == 0: 87 | sys.exit(f"No model output folders found in {exp_dir}") 88 | folders = natsort.natsorted(folders) 89 | folder_last = folders[-1] 90 | exp_id = folder_last.replace("\\", "/") 91 | return exp_id 92 | 93 | 94 | def get_experiment_id(make_new, dir_id, data_dir): 95 | """ 96 | Get timestamp ID of current experiment 97 | """ 98 | if make_new is False: 99 | if dir_id == "last": 100 | timestamp = get_newest_id(os.path.join(data_dir, "model_outputs")) 101 | else: 102 | timestamp = dir_id 103 | else: 104 | timestamp = dt.datetime.now().strftime("%Y_%m_%d_%H_%M_%S") 105 | 106 | return timestamp 107 | 108 | 109 | def get_seg_mask(sample_seg, sample_n): 110 | """ 111 | Generate the segmentation mask with unique cell IDs 112 | """ 113 | sample_n = np.squeeze(sample_n) 114 | 115 | # Background prob is average probability of all cells EXCEPT FOR NUCLEI 116 | sample_probs = softmax(sample_seg, axis=1) 117 | bgd_probs = np.expand_dims(np.mean(sample_probs[:, 0, :, :], axis=0), 0) 118 | fgd_probs = sample_probs[:, 1, :, :] 119 | probs = np.concatenate((bgd_probs, fgd_probs), axis=0) 120 | final_seg = np.argmax(probs, axis=0) 121 | 122 | # Map predictions to original cell IDs 123 | ids_orig = np.unique(sample_n) 124 | if ids_orig[0] != 0: 125 | ids_orig = np.insert(ids_orig, 0, 0) 126 | ids_pred = np.unique(final_seg) 127 | if ids_pred[0] != 0: 128 | ids_pred = np.insert(ids_pred, 0, 0) 129 | ids_orig = ids_orig[ids_pred] 130 | 131 | dictionary = dict(zip(ids_pred, 
ids_orig)) 132 | dictionary[0] = 0 133 | final_seg_orig = np.copy(final_seg) 134 | final_seg_orig = np.vectorize(dictionary.get)(final_seg) 135 | 136 | # Add nuclei back in 137 | final_seg_orig = np.where(sample_n > 0, sample_n, final_seg_orig) 138 | 139 | return final_seg_orig 140 | 141 | 142 | def save_fig_outputs(sample_seg, sample_n, sample_sa, sample_expr, patch_fp): 143 | """ 144 | Generate figure of inputs and outputs 145 | """ 146 | sample_n = np.squeeze(sample_n) 147 | 148 | sample_expr = np.squeeze(sample_expr) 149 | sample_expr[sample_expr > 0] = 1 150 | 151 | sample_sa = np.squeeze(np.sum(sample_sa, 0)) 152 | 153 | final_seg_orig = get_seg_mask(sample_seg, sample_n) 154 | 155 | # Randomise colours for plot 156 | cells_ids_orig = np.unique(final_seg_orig) 157 | n_cells_ids = len(cells_ids_orig) 158 | cell_ids_rand = np.arange(1, n_cells_ids + 1) 159 | random.shuffle(cell_ids_rand) 160 | dictionary = dict(zip(cells_ids_orig, cell_ids_rand)) 161 | dictionary[0] = 0 162 | final_seg_mapped = np.copy(final_seg_orig) 163 | final_seg_mapped = np.vectorize(dictionary.get)(final_seg_orig) 164 | nuclei_mapped = np.copy(sample_n) 165 | nuclei_mapped = np.vectorize(dictionary.get)(sample_n) 166 | 167 | # Plot 168 | fig, axes = plt.subplots(ncols=3, figsize=(9, 3), sharex=True, sharey=True) 169 | ax = axes.ravel() 170 | 171 | ax[0].imshow(nuclei_mapped, cmap=plt.cm.nipy_spectral) 172 | ax[0].set_title("Nuclei") 173 | ax[1].imshow(final_seg_mapped, cmap=plt.cm.nipy_spectral) 174 | ax[1].set_title("Cells") 175 | ax[2].imshow(sample_expr, cmap=plt.cm.gray) 176 | ax[2].set_title("Expressions") 177 | # ax[3].imshow(sample_sa, cmap=plt.cm.gray) 178 | # ax[3].set_title("Eligible") 179 | 180 | for a in ax: 181 | a.set_axis_off() 182 | 183 | fig.tight_layout() 184 | # plt.show() 185 | 186 | fig.savefig(patch_fp) 187 | plt.close(fig) 188 | -------------------------------------------------------------------------------- /bidcell/processing/cell_gene_matrix.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import multiprocessing as mp 4 | import os 5 | import sys 6 | 7 | import cv2 8 | import numpy as np 9 | import pandas as pd 10 | import tifffile 11 | from tqdm import tqdm 12 | 13 | from .utils import get_n_processes, get_patches_coords 14 | from ..config import Config, load_config 15 | 16 | np.seterr(divide="ignore", invalid="ignore") 17 | 18 | 19 | def process_chunk( 20 | chunk, output_dir, cell_ids_unique, col_names, seg_map, x_col, y_col, gene_col 21 | ): 22 | """Extract cell expression profiles""" 23 | 24 | df_out = pd.DataFrame(0, index=cell_ids_unique, columns=col_names) 25 | df_out["cell_id"] = cell_ids_unique.copy() 26 | 27 | chunk_id = chunk.index[0] 28 | 29 | for index_row, row in chunk.iterrows(): 30 | gene = row[gene_col] 31 | w_loc = row[x_col] 32 | h_loc = row[y_col] 33 | 34 | seg_val = seg_map[h_loc, w_loc] 35 | if seg_val > 0: 36 | df_out.loc[seg_val, gene] += 1 37 | 38 | df_out.to_csv(output_dir + "/" + "chunk_%d.csv" % chunk_id) 39 | 40 | 41 | def process_chunk_meta( 42 | matrix, fp_output, seg_map_mi, col_names_coords, scale_pix_x, scale_pix_y 43 | ): 44 | """Compute cell locations and sizes""" 45 | 46 | chunk_id = matrix[0, 0] 47 | output = np.zeros((matrix.shape[0], len(col_names_coords))) 48 | output[:, 0] = matrix[:, 0].copy() 49 | output[:, 4:] = matrix[:, 1:].copy() 50 | 51 | # Convert to pixel resolution 52 | for cur_i, cell_id in enumerate(output[:, 0]): 53 | if cell_id > 0: 54 | try: 55 | # 
cell_centroid_x and cell_centroid_y 56 | coords = np.where(seg_map_mi == cell_id) 57 | x_points = coords[1] 58 | y_points = coords[0] 59 | centroid_x = sum(x_points) / len(x_points) 60 | centroid_y = sum(y_points) / len(y_points) 61 | output[cur_i, 1] = centroid_x * scale_pix_x 62 | output[cur_i, 2] = centroid_y * scale_pix_y 63 | 64 | # cell_size 65 | output[cur_i, 3] = len(coords[0]) / (scale_pix_x * scale_pix_y) 66 | except Exception: 67 | output[cur_i, 1] = -1 68 | output[cur_i, 2] = -1 69 | output[cur_i, 3] = -1 70 | 71 | # Save as csv 72 | df_split = pd.DataFrame( 73 | output, index=list(range(output.shape[0])), columns=col_names_coords 74 | ) 75 | df_split.to_csv(fp_output + "%d.csv" % chunk_id, index=False) 76 | 77 | 78 | def transform_locations(df_expr, col, scale, shift=0): 79 | """Scale transcripts to pixel resolution of the platform""" 80 | print(f"Transforming {col}") 81 | df_expr[col] = df_expr[col].div(scale).round().astype(int).sub(shift) 82 | return df_expr 83 | 84 | 85 | def read_expr_csv(fp): 86 | try: 87 | print("Reading filtered transcripts") 88 | return pd.read_csv(fp) 89 | except Exception: 90 | sys.exit(f"Cannot read {fp}") 91 | 92 | 93 | def make_cell_gene_mat(config: Config, is_cell: bool, timestamp: str | None = None): 94 | dir_dataset = config.files.data_dir 95 | dir_cgm = config.files.dir_cgm 96 | 97 | if is_cell is False: 98 | output_dir = os.path.join(dir_dataset, dir_cgm, "nuclei") 99 | else: 100 | output_dir = os.path.join(dir_dataset, dir_cgm, timestamp) 101 | 102 | fp_transcripts_processed = os.path.join( 103 | dir_dataset, config.files.fp_transcripts_processed 104 | ) 105 | 106 | fp_gene_names = os.path.join(dir_dataset, config.files.fp_gene_names) 107 | 108 | if is_cell is False: 109 | fp_seg = os.path.join(dir_dataset, config.files.fp_nuclei) 110 | else: 111 | fp_seg_name = [ 112 | "epoch_" 113 | + str(config.testing_params.test_epoch) 114 | + "_step_" 115 | + str(config.testing_params.test_step) 116 | + "_connected.tif" 117 | ] 118 | fp_seg = os.path.join( 119 | config.files.data_dir, 120 | "model_outputs", 121 | timestamp, 122 | config.experiment_dirs.test_output_dir, 123 | "".join(fp_seg_name), 124 | ) 125 | 126 | # Column names in the transcripts csv 127 | x_col = config.transcripts.x_col 128 | y_col = config.transcripts.y_col 129 | gene_col = config.transcripts.gene_col 130 | 131 | if not os.path.exists(output_dir): 132 | os.makedirs(output_dir) 133 | 134 | seg_map_mi = tifffile.imread(fp_seg) 135 | height = seg_map_mi.shape[0] 136 | width = seg_map_mi.shape[1] 137 | 138 | cell_ids_unique = np.unique(seg_map_mi.reshape(-1)) 139 | cell_ids_unique = cell_ids_unique[1:] 140 | n_cells = len(cell_ids_unique) 141 | print("Number of cells " + str(n_cells)) 142 | 143 | with open(fp_gene_names) as file: 144 | gene_names = [line.rstrip() for line in file] 145 | 146 | col_names = ["cell_id"] + gene_names 147 | 148 | # Divide the dataframe into chunks for multiprocessing 149 | n_processes = get_n_processes(config.cpus) 150 | print(f"Number of splits for multiprocessing: {n_processes}") 151 | 152 | # Scale factor to pixel resolution of platform 153 | # read in affine 154 | # extract scale_x and scale_y 155 | # divide by (scale_x*pixel resolution) (microns per pixel) 156 | # affine = pd.read_csv(fp_affine, index_col=0, header=None, sep='\t') 157 | # scale_x_tr = float(affine.loc["scale_x"].item()) 158 | # scale_y_tr = float(affine.loc["scale_y"].item()) 159 | # scale_pix_x = (scale_x_tr*config.affine.scale_pix_x) 160 | # scale_pix_y = 
(scale_y_tr*config.affine.scale_pix_y) 161 | scale_pix_x = config.affine.scale_pix_x 162 | scale_pix_y = config.affine.scale_pix_y 163 | 164 | if not os.path.exists(output_dir + "/" + config.files.fp_expr): 165 | # Rescale to pixel size 166 | height_pix = np.round(height / config.affine.scale_pix_y).astype(int) 167 | width_pix = np.round(width / config.affine.scale_pix_x).astype(int) 168 | 169 | seg_map = cv2.resize( 170 | seg_map_mi.astype(np.int32), 171 | (width_pix, height_pix), 172 | interpolation=cv2.INTER_NEAREST, 173 | ) 174 | print("Segmentation map pixel size: ", seg_map.shape) 175 | fp_rescaled_seg = output_dir + "/rescaled.tif" 176 | print("Saving temporary resized segmentation") 177 | tifffile.imwrite( 178 | fp_rescaled_seg, seg_map.astype(np.uint32), photometric="minisblack" 179 | ) 180 | 181 | df_out = pd.DataFrame(0, index=cell_ids_unique, columns=col_names) 182 | df_out["cell_id"] = cell_ids_unique.copy() 183 | 184 | # Divide into patches for large datasets that exceed memory capacity 185 | if (height_pix + width_pix) > config.cgm_params.max_sum_hw: 186 | patch_h = int(config.cgm_params.max_sum_hw / 2) 187 | patch_w = config.cgm_params.max_sum_hw - patch_h 188 | else: 189 | patch_h = height_pix 190 | patch_w = width_pix 191 | 192 | h_coords, _ = get_patches_coords(height_pix, patch_h) 193 | w_coords, _ = get_patches_coords(width_pix, patch_w) 194 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 195 | 196 | print("Extracting cell expressions") 197 | for hs, he, ws, we in tqdm(hw_coords): 198 | print(f"Patch H {hs}:{he}, W {ws}:{we}") 199 | seg_map = tifffile.imread(fp_rescaled_seg)[hs:he, ws:we] 200 | print(seg_map.shape) 201 | 202 | df_expr = read_expr_csv(fp_transcripts_processed) 203 | print( 204 | df_expr[x_col].min(), 205 | df_expr[x_col].max(), 206 | df_expr[y_col].min(), 207 | df_expr[y_col].max(), 208 | ) 209 | 210 | df_expr = transform_locations(df_expr, x_col, scale_pix_x) 211 | df_expr = transform_locations(df_expr, y_col, scale_pix_y) 212 | 213 | df_expr = df_expr[ 214 | (df_expr[x_col].between(ws, we - 1)) 215 | & (df_expr[y_col].between(hs, he - 1)) 216 | ] 217 | print( 218 | df_expr[x_col].min(), 219 | df_expr[x_col].max(), 220 | df_expr[y_col].min(), 221 | df_expr[y_col].max(), 222 | ) 223 | 224 | df_expr = transform_locations(df_expr, x_col, 1, ws) 225 | df_expr = transform_locations(df_expr, y_col, 1, hs) 226 | print( 227 | df_expr[x_col].min(), 228 | df_expr[x_col].max(), 229 | df_expr[y_col].min(), 230 | df_expr[y_col].max(), 231 | ) 232 | 233 | df_expr.reset_index(drop=True, inplace=True) 234 | 235 | df_expr_splits = np.array_split(df_expr, n_processes) 236 | processes = [] 237 | 238 | print("Extracting cell-gene matrix chunks") 239 | for chunk in df_expr_splits: 240 | p = mp.Process( 241 | target=process_chunk, 242 | args=( 243 | chunk, 244 | output_dir, 245 | cell_ids_unique, 246 | col_names, 247 | seg_map, 248 | x_col, 249 | y_col, 250 | gene_col, 251 | ), 252 | ) 253 | processes.append(p) 254 | p.start() 255 | 256 | for p in processes: 257 | p.join() 258 | 259 | print("Combining cell-gene matrix chunks") 260 | 261 | fp_chunks = glob.glob(output_dir + "/chunk_*.csv") 262 | for fpc in fp_chunks: 263 | df_i = pd.read_csv(fpc, index_col=0) 264 | df_out.iloc[:, 1:] = df_out.iloc[:, 1:].add(df_i.iloc[:, 1:]) 265 | 266 | df_out.to_csv(output_dir + "/" + config.files.fp_expr) 267 | 268 | # Clean up 269 | for fpc in fp_chunks: 270 | os.remove(fpc) 271 | 272 | print("Obtained cell-gene matrix") 273 | os.remove(fp_rescaled_seg) 
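# A brief sketch of consuming the matrix written above (illustrative only, not part
# of the pipeline; it mirrors the read-back on the else branch below, with pandas
# already imported in this module):
#
#     cgm = pd.read_csv(output_dir + "/" + config.files.fp_expr, index_col=0)
#     counts = cgm.iloc[:, 1:]  # drop the duplicated "cell_id" column; the index holds cell IDs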
274 | del seg_map 275 | del df_expr 276 | 277 | else: 278 | df_out = pd.read_csv(output_dir + "/" + config.files.fp_expr, index_col=0) 279 | 280 | # if is_cell: 281 | # print("Computing cell locations and sizes") 282 | 283 | # matrix_all = df_out.to_numpy().astype(np.float32) 284 | # matrix_all_splits = np.array_split(matrix_all, n_processes) 285 | # processes = [] 286 | 287 | # fp_output = output_dir + "/cell_outputs_" 288 | # col_names_coords = [ 289 | # "cell_id", 290 | # "cell_centroid_x", 291 | # "cell_centroid_y", 292 | # "cell_size", 293 | # ] + gene_names 294 | 295 | # for chunk in matrix_all_splits: 296 | # p = mp.Process( 297 | # target=process_chunk_meta, 298 | # args=( 299 | # chunk, 300 | # fp_output, 301 | # seg_map_mi, 302 | # col_names_coords, 303 | # scale_pix_x, 304 | # scale_pix_y, 305 | # ), 306 | # ) 307 | # processes.append(p) 308 | # p.start() 309 | 310 | # for p in processes: 311 | # p.join() 312 | 313 | print("Done") 314 | 315 | 316 | if __name__ == "__main__": 317 | parser = argparse.ArgumentParser() 318 | 319 | parser.add_argument("--config_dir", type=str, help="path to config") 320 | parser.add_argument( 321 | "--is_cell", type=lambda x: str(x).lower() in ("true", "1", "yes"), help="whether to segment cells or nuclei" 322 | ) # note: type=bool would treat any non-empty string, including "False", as True 323 | 324 | args = parser.parse_args() 325 | config = load_config(args.config_dir) 326 | 327 | make_cell_gene_mat(config, args.is_cell) 328 | -------------------------------------------------------------------------------- /bidcell/processing/nuclei_segmentation.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import pandas as pd 6 | import tifffile 7 | from cellpose import models 8 | from skimage.transform import resize 9 | from tqdm import tqdm 10 | 11 | from .utils import get_patches_coords 12 | from ..config import Config, load_config 13 | 14 | 15 | def resize_dapi(dapi, new_h, new_w): 16 | """Resize DAPI image""" 17 | resized = resize(dapi, (new_h, new_w), preserve_range=True, anti_aliasing=True) 18 | return resized 19 | 20 | 21 | def segment_dapi(img, diameter=None, use_cpu=False): 22 | """Segment nuclei in DAPI image using Cellpose""" 23 | use_gpu = not use_cpu 24 | model = models.Cellpose(gpu=use_gpu, model_type="cyto") 25 | channels = [0, 0] 26 | mask, _, _, _ = model.eval(img, diameter=diameter, channels=channels) 27 | return mask 28 | 29 | 30 | def segment_nuclei(config: Config): 31 | dir_dataset = config.files.data_dir 32 | 33 | print("Reading DAPI image") 34 | if config.files.fp_dapi is None: 35 | fp_dapi = os.path.join(dir_dataset, "dapi_stitched.tif") 36 | else: 37 | fp_dapi = config.files.fp_dapi 38 | print(fp_dapi) 39 | dapi = tifffile.imread(fp_dapi) 40 | 41 | # Crop to size of transcript map (requires getting transcript maps first) 42 | if config.nuclei.crop_nuclei_to_ts: 43 | # Get starting coordinates 44 | fp_affine = os.path.join(dir_dataset, config.files.fp_affine) 45 | 46 | affine = pd.read_csv(fp_affine, index_col=0, header=None, sep="\t") 47 | 48 | min_x = int(float(affine.loc["min_x"].item())) 49 | min_y = int(float(affine.loc["min_y"].item())) 50 | size_x = int(float(affine.loc["size_x"].item())) 51 | size_y = int(float(affine.loc["size_y"].item())) 52 | 53 | dapi = dapi[min_y : min_y + size_y, min_x : min_x + size_x] 54 | 55 | dapi_h = dapi.shape[0] 56 | dapi_w = dapi.shape[1] 57 | print(f"DAPI shape h: {dapi_h} w: {dapi_w}") 58 | 59 | # Process patch-wise if too large or if rescaling is needed; None-valued size limits skip the size checks 60 | if ( 61 | (config.nuclei.max_height is not None and dapi_h > config.nuclei.max_height) 62 | or (config.nuclei.max_width is not None and dapi_w >
config.nuclei.max_width) 63 | or config.affine.scale_pix_x != 1.0 64 | or config.affine.scale_pix_y != 1.0 65 | ): 66 | if config.nuclei.max_height is None: 67 | max_height = dapi_h 68 | else: 69 | max_height = min(config.nuclei.max_height, dapi_h) 70 | if config.nuclei.max_width is None: 71 | max_width = dapi_w 72 | else: 73 | max_width = min(config.nuclei.max_width, dapi_w) 74 | 75 | print(f"Segmenting DAPI patches h: {max_height} w: {max_width}") 76 | 77 | # Coordinates of patches 78 | h_coords, _ = get_patches_coords(dapi_h, max_height) 79 | w_coords, _ = get_patches_coords(dapi_w, max_width) 80 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 81 | 82 | # Original patch sizes 83 | h_patch_sizes = [he - hs for (hs, he) in h_coords] 84 | w_patch_sizes = [we - ws for (ws, we) in w_coords] 85 | 86 | # Determine the resized patch sizes 87 | rh_patch_sizes = [round(y * config.affine.scale_pix_y) for y in h_patch_sizes] 88 | rw_patch_sizes = [round(x * config.affine.scale_pix_x) for x in w_patch_sizes] 89 | rhw_patch_sizes = [ 90 | (hsize, wsize) for hsize in rh_patch_sizes for wsize in rw_patch_sizes 91 | ] 92 | 93 | # Determine the resized patch starting coordinates 94 | rh_coords = [sum(rh_patch_sizes[:i]) for i, x in enumerate(rh_patch_sizes)] 95 | rw_coords = [sum(rw_patch_sizes[:i]) for i, x in enumerate(rw_patch_sizes)] 96 | rhw_coords = [(h, w) for h in rh_coords for w in rw_coords] 97 | 98 | # Sum up the new sizes to get the final size of the resized DAPI 99 | rh_dapi = sum(rh_patch_sizes) 100 | rw_dapi = sum(rw_patch_sizes) 101 | rdapi = np.zeros((rh_dapi, rw_dapi), dtype=dapi.dtype) 102 | nuclei = np.zeros((rh_dapi, rw_dapi), dtype=np.uint32) 103 | print(f"Nuclei image h: {rh_dapi} w: {rw_dapi}") 104 | 105 | # Divide into patches 106 | n_patches = len(hw_coords) 107 | total_n = 0 108 | 109 | for patch_i in tqdm(range(n_patches)): 110 | (hs, he, ws, we) = hw_coords[patch_i] 111 | (hsize, wsize) = rhw_patch_sizes[patch_i] 112 | (h, w) = rhw_coords[patch_i] 113 | 114 | patch = dapi[hs:he, ws:we] 115 | 116 | patch_resized = resize_dapi(patch, hsize, wsize) 117 | rdapi[h : h + hsize, w : w + wsize] = patch_resized 118 | 119 | # Segment nuclei in each patch and place into final segmentation with unique ID 120 | patch_nuclei = segment_dapi(patch_resized, config.nuclei.diameter, config.nuclei.use_cpu) 121 | nuclei_mask = np.where(patch_nuclei > 0, 1, 0) 122 | nuclei[h : h + hsize, w : w + wsize] = patch_nuclei + total_n * nuclei_mask 123 | unique_ids = np.unique(patch_nuclei) 124 | total_n += unique_ids.max() 125 | 126 | # Save resized DAPI 127 | fp_rdapi = os.path.join(dir_dataset, config.files.fp_rdapi) 128 | tifffile.imwrite(fp_rdapi, rdapi, photometric="minisblack") 129 | 130 | else: 131 | print("Segmenting whole DAPI") 132 | nuclei = segment_dapi(dapi, config.nuclei.diameter, config.nuclei.use_cpu) 133 | 134 | print(f"Finished segmenting, found {len(np.unique(nuclei))-1} nuclei") 135 | 136 | # Save nuclei segmentation 137 | fp_nuclei = os.path.join(dir_dataset, config.files.fp_nuclei) 138 | tifffile.imwrite(fp_nuclei, nuclei.astype(np.uint32), photometric="minisblack") 139 | 140 | 141 | if __name__ == "__main__": 142 | parser = argparse.ArgumentParser() 143 | 144 | parser.add_argument( 145 | "--config_dir", type=str, help="path to config" 146 | ) 147 | 148 | args = parser.parse_args() 149 | config = load_config(args.config_dir) 150 | 151 | segment_nuclei(config) 152 |
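# Usage sketch: running this step on its own (illustrative; assumes a parameter
# .yaml accepted by load_config, e.g. a filled-in xenium_example_config.yaml):
#
#     from bidcell.config import load_config
#     from bidcell.processing.nuclei_segmentation import segment_nuclei
#
#     cfg = load_config("xenium_example_config.yaml")
#     segment_nuclei(cfg)  # writes the uint32 nuclei mask to config.files.fp_nuclei under data_dir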
-------------------------------------------------------------------------------- /bidcell/processing/nuclei_stitch_fov.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | import re 5 | import sys 6 | 7 | import natsort 8 | import numpy as np 9 | import tifffile 10 | from PIL import Image 11 | from ..config import Config, load_config 12 | 13 | 14 | def check_pattern(string, pattern): 15 | """Check whether a string contains a pattern, where '#' matches any digit""" 16 | pattern = pattern.replace("#", r"\d") # Replace "#" with "\d" to match any digit 17 | match = re.search(pattern, string) 18 | # print(match, match.group()) 19 | 20 | if match: 21 | return True # Pattern found in the string 22 | else: 23 | return False # Pattern not found in the string 24 | 25 | 26 | def check_shape_imgs(fp, target_h, target_w): 27 | """Check that an image (given its file path) has the same shape as target""" 28 | img = Image.open(fp) 29 | if img.size == (target_w, target_h): 30 | return True 31 | else: 32 | return False 33 | 34 | 35 | def check_images_meet_criteria(fp_list, boolean_list, msg): 36 | """Check if any file in a list of paths doesn't meet criteria""" 37 | wrong = [i for i, x in enumerate(boolean_list) if not x] 38 | if len(wrong) > 0: 39 | sys.exit(f"{msg}: {[fp_list[i] for i in wrong]}") 40 | 41 | 42 | def get_string_with_pattern(number, pattern): 43 | """A single '#' in the pattern gives unpadded numbers (F1..F10); multiple '#'s zero-pad to that many digits (F001..F010)""" 44 | num_hash = pattern.count("#") 45 | no_hash = pattern.replace("#", "") 46 | 47 | if num_hash == 1: 48 | return no_hash + str(number) 49 | elif num_hash >= len(str(number)): 50 | padded = str(number).zfill(num_hash) 51 | return no_hash + str(padded) 52 | else: 53 | sys.exit(f"Number requires more characters than {pattern}") 54 | 55 | 56 | def read_dapi(fp, channel_first, channel_dapi): 57 | """Reads DAPI image or channel from file""" 58 | dapi = tifffile.imread(fp) 59 | 60 | if len(dapi.shape) > 2: 61 | if channel_first: 62 | dapi = dapi[channel_dapi, :, :] 63 | else: 64 | dapi = dapi[:, :, channel_dapi] 65 | 66 | return dapi 67 | 68 | 69 | def stitch_nuclei(config: Config): 70 | dir_dataset = config.files.data_dir 71 | 72 | if not config.nuclei_fovs.dir_dapi: 73 | dir_dapi = dir_dataset 74 | else: 75 | dir_dapi = config.nuclei_fovs.dir_dapi 76 | 77 | ext_pat = "".join( 78 | "[%s%s]" % (e.lower(), e.upper()) for e in config.nuclei_fovs.ext_dapi 79 | ) 80 | fp_dapi_list = glob.glob(os.path.join(dir_dapi, "*."
+ ext_pat)) 81 | fp_dapi_list = natsort.natsorted(fp_dapi_list) 82 | 83 | sample = tifffile.imread(fp_dapi_list[0]) 84 | fov_shape = sample.shape 85 | 86 | # Determine FOV dimensions; multi-channel images use the configured DAPI channel 87 | if len(fov_shape) > 2 and config.nuclei_fovs.channel_first: 88 | print(f"Channel axis first, DAPI channel {config.nuclei_fovs.channel_dapi}") 89 | fov_h = sample.shape[1] 90 | fov_w = sample.shape[2] 91 | elif len(fov_shape) > 2: 92 | print(f"Channel axis last, DAPI channel {config.nuclei_fovs.channel_dapi}") 93 | fov_h = sample.shape[0] 94 | fov_w = sample.shape[1] 95 | else: 96 | fov_h, fov_w = sample.shape[0], sample.shape[1] 97 | fov_dtype = sample.dtype 98 | 99 | # Error if patterns in file path names not found 100 | found_f = [check_pattern(s, config.nuclei_fovs.pattern_f) for s in fp_dapi_list] 101 | check_images_meet_criteria(fp_dapi_list, found_f, "FOV string pattern not found in") 102 | 103 | if config.nuclei_fovs.pattern_z is not None: 104 | found_z = [check_pattern(s, config.nuclei_fovs.pattern_z) for s in fp_dapi_list] 105 | check_images_meet_criteria( 106 | fp_dapi_list, found_z, "Z slice string pattern not found in" 107 | ) 108 | 109 | # Check shape the same as sample 110 | match_shape = [check_shape_imgs(s, fov_h, fov_w) for s in fp_dapi_list] 111 | check_images_meet_criteria(fp_dapi_list, match_shape, "Different image shape for") 112 | 113 | # Locations of each FOV in the whole image 114 | n_fov = config.nuclei_fovs.n_fov 115 | if config.nuclei_fovs.row_major: 116 | order = np.arange( 117 | config.nuclei_fovs.n_fov_h * config.nuclei_fovs.n_fov_w 118 | ).reshape((config.nuclei_fovs.n_fov_h, config.nuclei_fovs.n_fov_w)) 119 | else: 120 | order = np.arange( 121 | config.nuclei_fovs.n_fov_h * config.nuclei_fovs.n_fov_w 122 | ).reshape((config.nuclei_fovs.n_fov_h, config.nuclei_fovs.n_fov_w), order="F") 123 | 124 | # Arrangement of the FOVs - default is ul 125 | if config.nuclei_fovs.start_corner == "ur": 126 | order = np.flip(order, 1) 127 | elif config.nuclei_fovs.start_corner == "bl": 128 | order = np.flip(order, 0) 129 | elif config.nuclei_fovs.start_corner == "br": 130 | order = np.flip(order, (0, 1)) 131 | 132 | print("FOV ordering") 133 | print(order) 134 | 135 | stitched = np.zeros( 136 | (fov_h * config.nuclei_fovs.n_fov_h, fov_w * config.nuclei_fovs.n_fov_w), 137 | dtype=fov_dtype, 138 | ) 139 | 140 | for i_fov in range(n_fov): 141 | coord = np.where(order == i_fov) 142 | h_idx = coord[0][0] 143 | w_idx = coord[1][0] 144 | h_start = h_idx * fov_h 145 | w_start = w_idx * fov_w 146 | h_end = h_start + fov_h 147 | w_end = w_start + fov_w 148 | 149 | fov_num = i_fov + config.nuclei_fovs.min_fov 150 | 151 | # All files for FOV 152 | pattern_fov = get_string_with_pattern(fov_num, config.nuclei_fovs.pattern_f) 153 | print(pattern_fov) 154 | found_fov = [check_pattern(s, pattern_fov) for s in fp_dapi_list] 155 | fp_stack_fov = [fp_dapi_list[i] for i, x in enumerate(found_fov) if x] 156 | 157 | # Take MIP - or z level 158 | if config.nuclei_fovs.mip: 159 | dapi_stack = np.zeros((len(fp_stack_fov), fov_h, fov_w), dtype=fov_dtype) 160 | for i, fp in enumerate(fp_stack_fov): 161 | dapi_stack[i, :, :] = read_dapi( 162 | fp, 163 | config.nuclei_fovs.channel_first, 164 | config.nuclei_fovs.channel_dapi, 165 | ) 166 | 167 | fov_img = np.max(dapi_stack, axis=0) 168 | 169 | else: 170 | # Find z level slice for FOV 171 | pattern_slice = get_string_with_pattern( 172 | config.nuclei_fovs.z_level, config.nuclei_fovs.pattern_z 173 | ) 174 | print(pattern_slice) 175 | found_slice = [check_pattern(s, pattern_slice) for s in fp_stack_fov]
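# found_slice flags which of this FOV's files match the requested z-level
# pattern; the code below expects exactly one match per FOV.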
176 | found_slice_idx = [i for i, x in enumerate(found_slice) if x] 177 | if len(found_slice_idx) != 1: 178 | sys.exit( 179 | f"Expected exactly 1 file matching {pattern_slice} for FOV {fov_num}, found {len(found_slice_idx)}" 180 | ) 181 | 182 | print(fp_stack_fov[found_slice_idx[0]]) 183 | fov_img = read_dapi( 184 | fp_stack_fov[found_slice_idx[0]], 185 | config.nuclei_fovs.channel_first, 186 | config.nuclei_fovs.channel_dapi, 187 | ) 188 | 189 | # Flip 190 | if config.nuclei_fovs.flip_ud: 191 | fov_img = np.flip(fov_img, 0) 192 | 193 | # Place into appropriate location in stitched image 194 | stitched[h_start:h_end, w_start:w_end] = fov_img.copy() 195 | 196 | # Save 197 | fp_output = os.path.join(dir_dataset, "dapi_stitched.tif") 198 | tifffile.imwrite(fp_output, stitched, photometric="minisblack") 199 | print(f"Saved {fp_output}") 200 | 201 | 202 | if __name__ == "__main__": 203 | parser = argparse.ArgumentParser() 204 | 205 | parser.add_argument("--config_dir", type=str, help="path to config") 206 | 207 | args = parser.parse_args() 208 | config = load_config(args.config_dir) 209 | 210 | stitch_nuclei(config) 211 | -------------------------------------------------------------------------------- /bidcell/processing/preannotate.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import collections 3 | import glob 4 | import json 5 | import multiprocessing as mp 6 | import os 7 | import sys 8 | 9 | import h5py 10 | import numpy as np 11 | import pandas as pd 12 | from scipy.stats import spearmanr 13 | 14 | from .utils import get_n_processes 15 | from ..config import Config, load_config 16 | 17 | np.seterr(divide="ignore", invalid="ignore") 18 | 19 | 20 | def json_file_to_pyobj(filename): 21 | """ 22 | Read json config file 23 | """ 24 | 25 | def _json_object_hook(d): 26 | return collections.namedtuple("X", d.keys())(*d.values()) 27 | 28 | def json2obj(data): 29 | return json.loads(data, object_hook=_json_object_hook) 30 | 31 | return json2obj(open(filename).read()) 32 | 33 | 34 | def normalise_matrix(matrix): 35 | x_sums = np.sum(matrix, axis=1) 36 | matrix = matrix / np.expand_dims(x_sums, -1) 37 | matrix = np.log1p(matrix) 38 | return matrix 39 | 40 | 41 | def process_chunk_corr(matrix, dir_output, sc_expr, sc_labels, n_atlas_types): 42 | matrix_out = np.zeros((matrix.shape[0], 4)) 43 | col_names = ["cell_id", "cell_type", "spearman", "cell_type_atlas"] 44 | 45 | # cell_type 46 | cell_genes_norm = normalise_matrix(matrix[:, 1:]) 47 | res = spearmanr(sc_expr, cell_genes_norm, axis=1) 48 | corr = res.correlation 49 | # bottom-left block of the correlation matrix: cells (rows) vs reference types (columns) 50 | corr = corr[n_atlas_types:, :n_atlas_types] 51 | 52 | corr_best = np.max(corr, 1) 53 | best_i_type = np.argmax(corr, 1) 54 | predicted_cell_type = [sc_labels[x] for x in best_i_type] 55 | 56 | nan_true = np.isnan(corr_best) 57 | corr_best = [x if not y else -1 for (x, y) in zip(corr_best, nan_true)] 58 | best_i_type = [x if not y else -1 for (x, y) in zip(best_i_type, nan_true)] 59 | predicted_cell_type = [ 60 | x if not y else -1 for (x, y) in zip(predicted_cell_type, nan_true) 61 | ] 62 | 63 | # cell ID 64 | matrix_out[:, 0] = matrix[:, 0].copy() 65 | 66 | # cell type 67 | matrix_out[:, 1] = predicted_cell_type.copy() 68 | 69 | # spearman 70 | matrix_out[:, 2] = corr_best.copy() 71 | 72 | # cell type atlas 73 | matrix_out[:, 3] = best_i_type.copy() 74 | 75 | # Save as csv 76 | df_split = pd.DataFrame( 77 | matrix_out, index=list(range(matrix_out.shape[0])), columns=col_names 78 | ) 79 | df_split.to_csv( 80 | dir_output +
"/preannotations_%d.csv" % matrix_out[0, 0], index=False 81 | ) 82 | 83 | 84 | def preannotate(config: Config): 85 | dir_dataset = config.files.data_dir 86 | expr_dir = os.path.join(dir_dataset, config.files.dir_cgm, "nuclei") 87 | 88 | # Cell expressions - order of gene names (columns) will be in same order as all_gene_names.txt 89 | df_cells = pd.read_csv(os.path.join(expr_dir, config.files.fp_expr), index_col=0) 90 | print(f"Number of cells: {df_cells.shape[0]}") 91 | 92 | # Reference data - no requirement of column orders - ensure same order as df_cells 93 | df_ref_orig = pd.read_csv(config.files.fp_ref, index_col=0) 94 | 95 | # Ensure the order of genes match 96 | genes_cells = df_cells.columns[1:].tolist() 97 | ct_columns = df_ref_orig.columns[-3:].tolist() 98 | df_ref = df_ref_orig[genes_cells + ct_columns] 99 | 100 | genes_ref = df_ref.columns[:-3] 101 | if list(genes_cells) != list(genes_ref): 102 | print( 103 | "Genes in transcripts but not reference: ", 104 | list(set(genes_cells) - set(genes_ref)), 105 | ) 106 | print( 107 | "Genes in reference but not transcripts: ", 108 | list(set(genes_ref) - set(genes_cells)), 109 | ) 110 | print("Check names of genes") 111 | sys.exit() 112 | 113 | sc_expr = df_ref.iloc[:, :-3].to_numpy() 114 | n_atlas_types = sc_expr.shape[0] 115 | sc_labels = df_ref.iloc[:, -3].to_numpy().astype(int) 116 | # sc_names = df_ref.iloc[:, -2].to_list() 117 | 118 | # Divide the data into chunks for multiprocessing 119 | n_processes = get_n_processes(config.cpus) 120 | print(f"Number of splits for multiprocessing: {n_processes}") 121 | 122 | matrix_all = df_cells.to_numpy().astype(np.float32) 123 | matrix_all_splits = np.array_split(matrix_all, n_processes) 124 | processes = [] 125 | 126 | print("Computing simple annotation") 127 | for chunk in matrix_all_splits: 128 | p = mp.Process( 129 | target=process_chunk_corr, 130 | args=(chunk, dir_dataset, sc_expr, sc_labels, n_atlas_types), 131 | ) 132 | processes.append(p) 133 | p.start() 134 | 135 | for p in processes: 136 | p.join() 137 | 138 | fp_chunks = glob.glob(dir_dataset + "/preannotations_*.csv") 139 | for fp_i, fpc in enumerate(fp_chunks): 140 | df_i = pd.read_csv(fpc) 141 | if fp_i == 0: 142 | cell_df = df_i.copy() 143 | else: 144 | cell_df = pd.concat([cell_df, df_i], axis=0) 145 | 146 | cell_type_col = cell_df["cell_type"].to_numpy() 147 | cell_id_col = cell_df["cell_id"].to_numpy() 148 | 149 | h5f = h5py.File(dir_dataset + "/" + config.files.fp_nuclei_anno, "w") 150 | h5f.create_dataset("data", data=cell_type_col) 151 | h5f.create_dataset("ids", data=cell_id_col) 152 | h5f.close() 153 | 154 | # Clean up 155 | for fpc in fp_chunks: 156 | os.remove(fpc) 157 | 158 | 159 | if __name__ == "__main__": 160 | parser = argparse.ArgumentParser() 161 | 162 | parser.add_argument("--config_dir", type=str, help="path to config") 163 | 164 | args = parser.parse_args() 165 | config = load_config(args.config_dir) 166 | 167 | preannotate(config) 168 | -------------------------------------------------------------------------------- /bidcell/processing/transcript_patches.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | import re 5 | 6 | import h5py 7 | import natsort 8 | import numpy as np 9 | from tqdm import tqdm 10 | 11 | from ..config import Config, load_config 12 | 13 | 14 | def generate_patches(config: Config): 15 | """ 16 | Divides transcriptomic maps of all genes into patches for input to the CNN 17 | 18 | """ 19 | dir_dataset 
= os.path.join(config.files.data_dir, config.files.dir_out_maps) 20 | 21 | patch_size = config.model_params.patch_size 22 | shift = [0, int(patch_size / 2)] 23 | 24 | fp_maps = glob.glob(dir_dataset + "/all_genes_*.hdf5") 25 | fp_maps = natsort.natsorted(fp_maps) 26 | 27 | for fp in fp_maps: 28 | print(f"Processing {fp}") 29 | h5f = h5py.File(fp, "r") 30 | sst = h5f["data"][:] 31 | h5f.close() 32 | print("Loaded gene expr maps from %s" % fp) 33 | 34 | # hs, he, ws, we 35 | map_h = sst.shape[0] 36 | map_w = sst.shape[1] 37 | map_coords = [ 38 | int(x) 39 | for x in re.findall(r"\d+", os.path.basename(fp).replace(".hdf5", "")) 40 | ] 41 | print(map_coords) 42 | 43 | h_lim = sst.shape[0] 44 | w_lim = sst.shape[1] 45 | 46 | # If map contains blank border 47 | if (map_coords[1] - map_coords[0]) < map_h: 48 | h_lim = map_coords[1] - map_coords[0] 49 | if (map_coords[3] - map_coords[2]) < map_w: 50 | w_lim = map_coords[3] - map_coords[2] 51 | 52 | # print(h_lim, w_lim) 53 | 54 | for shift_patches in shift: 55 | print("Shift by %d" % shift_patches) 56 | 57 | dir_output = os.path.join( 58 | dir_dataset, 59 | config.files.dir_patches 60 | + "%dx%d_shift_%d" % (patch_size, patch_size, shift_patches), 61 | ) 62 | if not os.path.exists(dir_output): 63 | os.makedirs(dir_output) 64 | 65 | # Get coordinates of non-overlapping patches 66 | if shift_patches == 0: 67 | h_starts = list(np.arange(0, h_lim - patch_size, patch_size)) 68 | w_starts = list(np.arange(0, w_lim - patch_size, patch_size)) 69 | 70 | # Include the remainder patches at the bottom/right edges 71 | h_starts.append(h_lim - patch_size) 72 | w_starts.append(w_lim - patch_size) 73 | 74 | else: 75 | h_starts = list( 76 | np.arange(shift_patches, h_lim - patch_size, patch_size) 77 | ) 78 | w_starts = list( 79 | np.arange(shift_patches, w_lim - patch_size, patch_size) 80 | ) 81 | 82 | coords_starts = [(x, y) for x in h_starts for y in w_starts] 83 | print(f"{len(coords_starts)} patches") 84 | 85 | # Get patches and save 86 | for h, w in tqdm(coords_starts): 87 | patch = sst[h : h + patch_size, w : w + patch_size, :] 88 | 89 | fp_output = f"{dir_output}/{h+map_coords[0]}_{w+map_coords[2]}.hdf5" 90 | 91 | hf = h5py.File(fp_output, "w") # distinct name: 'h' is the loop's patch coordinate 92 | _ = hf.create_dataset("data", data=patch, dtype=np.uint8) 93 | hf.close() 94 | 95 | 96 | if __name__ == "__main__": 97 | parser = argparse.ArgumentParser() 98 | 99 | parser.add_argument("--config_dir", type=str, help="path to config") 100 | 101 | args = parser.parse_args() 102 | config = load_config(args.config_dir) 103 | 104 | generate_patches(config) 105 | -------------------------------------------------------------------------------- /bidcell/processing/transcripts.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import csv 3 | import glob 4 | import multiprocessing as mp 5 | import os 6 | import pathlib 7 | import re 8 | import sys 9 | import warnings 10 | 11 | import h5py 12 | import natsort 13 | import numpy as np 14 | import pandas as pd 15 | import tifffile 16 | from tqdm import tqdm 17 | 18 | from .utils import get_n_processes, get_patches_coords 19 | from ..config import Config, load_config 20 | 21 | 22 | def process_gene_chunk( 23 | gene_chunk, 24 | df_patch, 25 | img_height, 26 | img_width, 27 | dir_output, 28 | hs, 29 | ws, 30 | gene_col, 31 | x_col, 32 | y_col, 33 | counts_col, 34 | ): 35 | # print(gene_chunk) 36 | for i_fe, fe in enumerate(gene_chunk): 37 | # print(fe) 38 | df_fe = df_patch.loc[df_patch[gene_col] == fe] 39 | map_fe = np.zeros((img_height, img_width))
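# map_fe is the single-gene transcript map for this patch: the loops below add 1
# (or the platform's per-spot count, when counts_col is set) at each transcript's
# rounded pixel location.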
40 | # print(map_fe.shape) 41 | 42 | if counts_col is None: 43 | for idx in df_fe.index: 44 | idx_x = np.round(df_patch.iloc[idx][x_col]).astype(int) 45 | idx_y = np.round(df_patch.iloc[idx][y_col]).astype(int) 46 | 47 | map_fe[idx_y, idx_x] += 1 48 | 49 | else: 50 | for idx in df_fe.index: 51 | idx_x = np.round(df_patch.iloc[idx][x_col]).astype(int) 52 | idx_y = np.round(df_patch.iloc[idx][y_col]).astype(int) 53 | idx_counts = df_patch.iloc[idx][counts_col] 54 | 55 | map_fe[idx_y, idx_x] += idx_counts 56 | 57 | # print(map_fe.shape) 58 | 59 | fp_fe_map = f"{dir_output}/{fe}_{hs}_{ws}.tif" 60 | # print(fp_fe_map) 61 | tifffile.imwrite(fp_fe_map, map_fe.astype(np.uint8), photometric="minisblack") 62 | 63 | 64 | def stitch_patches(dir_patches, fp_pattern): 65 | """Stitches together the patches of summed genes and saves as new tif""" 66 | fp_patches = glob.glob(dir_patches + "/" + fp_pattern) 67 | fp_patches = natsort.natsorted(fp_patches) 68 | 69 | coords = np.zeros((len(fp_patches), 4), dtype=int) 70 | 71 | for i, fp in enumerate(fp_patches): 72 | coords_patch = [int(x) for x in re.findall(r"\d+", os.path.basename(fp))] 73 | coords[i, :] = np.array(coords_patch) 74 | 75 | height_patch = coords[0, 1] - coords[0, 0] 76 | width_patch = coords[0, 3] - coords[0, 2] 77 | height = np.max(coords[:, 1]) + height_patch 78 | width = np.max(coords[:, 2]) + width_patch 79 | 80 | whole = np.zeros((height, width), dtype=np.uint16) 81 | 82 | for i, fp in enumerate(fp_patches): 83 | hs, _, ws, _ = coords[i, 0], coords[i, 1], coords[i, 2], coords[i, 3] 84 | whole[hs : hs + height_patch, ws : ws + width_patch] = tifffile.imread(fp) 85 | 86 | height_trim = np.max(coords[:, 1]) 87 | width_trim = np.max(coords[:, 3]) 88 | 89 | whole = whole[:height_trim, :width_trim] 90 | print(whole.shape) 91 | 92 | tifffile.imwrite( 93 | dir_patches + "/all_genes_sum.tif", whole, photometric="minisblack" 94 | ) 95 | 96 | 97 | def generate_expression_maps(config: Config): 98 | """ 99 | Generates transcript expression maps from transcripts.csv.gz, which contains transcript data with locations. 
100 | Example file for Xenium: 101 | 102 | "transcript_id","cell_id","overlaps_nucleus","feature_name","x_location","y_location","z_location","qv" 103 | 281474976710656,565,0,"SEC11C",4.395842,328.66647,12.019493,18.66248 104 | 281474976710657,540,0,"NegControlCodeword_0502",5.074415,236.96484,7.6085105,18.634956 105 | 281474976710658,562,0,"SEC11C",4.702023,322.79715,12.289083,18.66248 106 | 281474976710659,271,0,"DAPK3",4.9066014,581.42865,11.222615,20.821745 107 | 281474976710660,291,0,"TCIM",5.6606994,720.85175,9.265523,18.017488 108 | 281474976710661,297,0,"TCIM",5.899098,748.5928,9.818688,18.017488 109 | 110 | """ 111 | 112 | dir_dataset = config.files.data_dir 113 | dir_out_maps = dir_dataset + "/" + config.files.dir_out_maps 114 | if not os.path.exists(dir_out_maps): 115 | os.makedirs(dir_out_maps) 116 | 117 | fp_transcripts_processed = dir_dataset + "/" + config.files.fp_transcripts_processed 118 | 119 | # Names to filter out 120 | # fp_transcripts_to_filter = os.path.join(config.files.data_dir, config.files.fp_transcripts_to_filter) 121 | # with open(fp_transcripts_to_filter) as file: 122 | # transcripts_to_filter = [line.rstrip() for line in file] 123 | transcripts_to_filter = config.transcripts.transcripts_to_filter 124 | 125 | # Column names in the transcripts csv 126 | x_col = config.transcripts.x_col 127 | y_col = config.transcripts.y_col 128 | gene_col = config.transcripts.gene_col 129 | 130 | # if not os.path.exists(fp_transcripts_processed): 131 | print("Loading transcripts file") 132 | fp_transcripts = config.files.fp_transcripts 133 | if pathlib.Path(fp_transcripts).suffixes[-1] == ".gz": 134 | if ".tsv" in fp_transcripts: 135 | df = pd.read_csv(fp_transcripts, sep="\t", compression="gzip") 136 | else: 137 | df = pd.read_csv(fp_transcripts, compression="gzip") 138 | else: 139 | if ".tsv" in fp_transcripts: 140 | df = pd.read_csv(fp_transcripts, sep="\t") 141 | else: 142 | df = pd.read_csv(fp_transcripts) 143 | print(df.head()) 144 | 145 | print("Filtering transcripts") 146 | if "qv" in df.columns: 147 | df = df[ 148 | (df["qv"] >= config.transcripts.min_qv) 149 | & (~df[gene_col].str.startswith(tuple(transcripts_to_filter))) 150 | ] 151 | else: 152 | df = df[(~df[gene_col].str.startswith(tuple(transcripts_to_filter)))] 153 | 154 | if config.files.fp_selected_genes is not None: 155 | with open(config.files.fp_selected_genes) as file: 156 | selected_genes = [line.rstrip() for line in file] 157 | df = df[(df[gene_col].isin(selected_genes))] 158 | 159 | # Scale 160 | # print(df[x_col].min(), df[x_col].max(), df[y_col].min(), df[y_col].max()) 161 | df[x_col] = df[x_col].mul(config.affine.scale_ts_x) 162 | df[y_col] = df[y_col].mul(config.affine.scale_ts_y) 163 | # print(df[x_col].min(), df[x_col].max(), df[y_col].min(), df[y_col].max()) 164 | 165 | # Shift 166 | min_x = df[x_col].min() 167 | min_y = df[y_col].min() 168 | if config.transcripts.shift_to_origin: 169 | with pd.option_context("mode.chained_assignment", None): 170 | df.loc[:, x_col] = df[x_col] - min_x + config.affine.global_shift_x 171 | df.loc[:, y_col] = df[y_col] - min_y + config.affine.global_shift_y 172 | 173 | size_x = df[x_col].max() + 1 174 | size_y = df[y_col].max() + 1 175 | 176 | # Write transform parameters to file 177 | fp_affine = os.path.join(dir_dataset, config.files.fp_affine) 178 | params = [ 179 | "scale_ts_x", 180 | "scale_ts_y", 181 | "min_x", 182 | "min_y", 183 | "size_x", 184 | "size_y", 185 | "global_shift_x", 186 | "global_shift_y", 187 | "origin", 188 | ] 189 | vals = [ 190 | 
config.affine.scale_ts_x, 191 | config.affine.scale_ts_y, 192 | min_x, 193 | min_y, 194 | size_x, 195 | size_y, 196 | config.affine.global_shift_x, 197 | config.affine.global_shift_y, 198 | config.transcripts.shift_to_origin, 199 | ] 200 | # print(vals) 201 | with open(fp_affine, "w") as f: 202 | writer = csv.writer(f, delimiter="\t") 203 | writer.writerows(zip(params, vals)) 204 | 205 | # Delete entries with negative coordinates 206 | df = df[df[x_col] >= 0] 207 | df = df[df[y_col] >= 0] 208 | 209 | df.reset_index(inplace=True, drop=True) 210 | print("Finished filtering") 211 | print("Saving csv...") 212 | 213 | df.to_csv(fp_transcripts_processed) 214 | # else: 215 | # print("Loading filtered transcripts") 216 | # df = pd.read_csv(fp_transcripts_processed, index_col=0) 217 | 218 | # Round locations and convert to integer 219 | df[x_col] = df[x_col].round().astype(int) 220 | df[y_col] = df[y_col].round().astype(int) 221 | 222 | print(df.head()) 223 | print(df.shape) 224 | 225 | # Save list of gene names 226 | gene_names = df[gene_col].unique() 227 | print("%d unique genes" % len(gene_names)) 228 | gene_names = natsort.natsorted(gene_names) 229 | with open(dir_dataset + "/" + config.files.fp_gene_names, "w") as f: 230 | for line in gene_names: 231 | f.write(f"{line}\n") 232 | 233 | # Dimensions 234 | total_height_t = int(np.ceil(df[y_col].max())) + 1 235 | total_width_t = int(np.ceil(df[x_col].max())) + 1 236 | 237 | fp_nuclei = os.path.join(dir_dataset, config.files.fp_nuclei) 238 | if os.path.exists(fp_nuclei): 239 | nuclei_img = tifffile.imread(fp_nuclei) 240 | nuclei_h = nuclei_img.shape[0] 241 | nuclei_w = nuclei_img.shape[1] 242 | if total_height_t <= nuclei_h and total_width_t <= nuclei_w: 243 | total_height = nuclei_h 244 | total_width = nuclei_w 245 | else: 246 | sys.exit( 247 | f"Dimensions of transcript map [{total_height_t},{total_width_t}] exceed those of nuclei image [{nuclei_h},{nuclei_w}]. Check scale_ts_x and scale_ts_y values. Then consider specifying --global_shift_x and --global_shift_y, or padding nuclei" 248 | ) 249 | else: 250 | warnings.warn( 251 | "Computing dimensions from transcript locations - check dimensions are the same as nuclei image.
Unless cropping DAPI to size of transcript map, it is highly advised to provide nuclei file name via --fp_nuclei to ensure dimensions are identical" 252 | ) 253 | total_height = total_height_t 254 | total_width = total_width_t 255 | 256 | print(f"Total height {total_height}, width {total_width}") 257 | 258 | # Start and end coordinates of patches 259 | h_coords, img_height = get_patches_coords( 260 | total_height, config.transcripts.max_height 261 | ) 262 | w_coords, img_width = get_patches_coords(total_width, config.transcripts.max_width) 263 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 264 | 265 | print("Converting to maps") 266 | 267 | n_processes = get_n_processes(config.cpus) 268 | gene_names_chunks = np.array_split(gene_names, n_processes) 269 | 270 | for hs, he, ws, we in tqdm(hw_coords): 271 | print("Patch:", (hs, he, ws, we)) 272 | 273 | df_patch = df[(df[x_col].between(ws, we - 1)) & (df[y_col].between(hs, he - 1))] 274 | 275 | with pd.option_context("mode.chained_assignment", None): 276 | df_patch.loc[:, x_col] = df_patch[x_col] - ws 277 | df_patch.loc[:, y_col] = df_patch[y_col] - hs 278 | 279 | df_patch.reset_index(inplace=True, drop=True) 280 | 281 | processes = [] 282 | 283 | for gene_chunk in gene_names_chunks: 284 | p = mp.Process( 285 | target=process_gene_chunk, 286 | args=( 287 | gene_chunk, 288 | df_patch, 289 | img_height, 290 | img_width, 291 | dir_out_maps, 292 | hs, 293 | ws, 294 | gene_col, 295 | x_col, 296 | y_col, 297 | config.transcripts.counts_col, 298 | ), 299 | ) 300 | processes.append(p) 301 | p.start() 302 | 303 | for p in processes: 304 | p.join() 305 | 306 | # Combine channel-wise 307 | map_all_genes = np.zeros( 308 | (img_height, img_width, len(gene_names)), dtype=np.uint8 309 | ) 310 | 311 | for i_fe, fe in enumerate(tqdm(gene_names)): 312 | fp_fe_map = f"{dir_out_maps}/{fe}_{hs}_{ws}.tif" 313 | map_all_genes[:, :, i_fe] = tifffile.imread(fp_fe_map) 314 | os.remove(fp_fe_map) 315 | 316 | # Sum across all markers 317 | fp_out_map_sum = f"all_genes_sum_{hs}_{he}_{ws}_{we}.tif" 318 | tifffile.imwrite( 319 | dir_out_maps + "/" + fp_out_map_sum, 320 | np.sum(map_all_genes, -1).astype(np.uint8), 321 | photometric="minisblack", 322 | ) 323 | 324 | # Save to hdf5 325 | fp_out_map = f"all_genes_{hs}_{he}_{ws}_{we}.hdf5" 326 | h = h5py.File(dir_out_maps + "/" + fp_out_map, "w") 327 | _ = h.create_dataset("data", data=map_all_genes, dtype=np.uint8) 328 | h.close() 329 | print("Saved all maps") 330 | 331 | stitch_patches(dir_out_maps, "all_genes_sum_*.tif") 332 | 333 | 334 | if __name__ == "__main__": 335 | parser = argparse.ArgumentParser() 336 | 337 | parser.add_argument("--config_dir", type=str, help="path to config") 338 | 339 | args = parser.parse_args() 340 | config = load_config(args.config_dir) 341 | generate_expression_maps(config) 342 | -------------------------------------------------------------------------------- /bidcell/processing/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import multiprocessing as mp 3 | 4 | 5 | def get_patches_coords(size, size_patch): 6 | """Get start and end locations of patches in a large image given the image patch sizes""" 7 | 8 | if size <= size_patch: 9 | max_size = size 10 | coords = [(0, max_size)] 11 | else: 12 | max_size = size_patch 13 | starts = list(np.arange(0, size, size_patch)) 14 | ends = [x + size_patch if x + size_patch <= size else size for x in starts] 15 | coords = list(zip(starts, ends)) 16 | 17 | return
coords, max_size 18 | 19 | 20 | def get_n_processes(n_processes): 21 | """Number of CPUs for multiprocessing""" 22 | if n_processes is None: 23 | return mp.cpu_count() 24 | else: 25 | return n_processes if n_processes <= mp.cpu_count() else mp.cpu_count() 26 | -------------------------------------------------------------------------------- /data/dataset_xenium_breast1_small/morphology_mip_small.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/data/dataset_xenium_breast1_small/morphology_mip_small.tif -------------------------------------------------------------------------------- /data/example_mousebrain_genes.txt: -------------------------------------------------------------------------------- 1 | 2010300C02Rik 2 | Acsbg1 3 | Acta2 4 | Acvrl1 5 | Adamts2 6 | Adamtsl1 7 | Adgrl4 8 | Aldh1a2 9 | Angpt1 10 | Ano1 11 | Aqp4 12 | Arc 13 | Arhgap6 14 | Arhgap12 15 | Arhgap25 16 | Arhgef28 17 | Bcl11b 18 | Bdnf 19 | Bhlhe22 20 | Bhlhe40 21 | Btbd11 22 | Cabp7 23 | Cacna2d2 24 | Calb1 25 | Calb2 26 | Car4 27 | Carmn 28 | Cbln1 29 | Cbln4 30 | Cd24a 31 | Cd44 32 | Cd53 33 | Cd68 34 | Cd93 35 | Cd300c2 36 | Cdh4 37 | Cdh6 38 | Cdh9 39 | Cdh13 40 | Cdh20 41 | Chat 42 | Chodl 43 | Chrm2 44 | Cldn5 45 | Clmn 46 | Cntn6 47 | Cntnap4 48 | Cntnap5b 49 | Cobll1 50 | Col1a1 51 | Col6a1 52 | Col19a1 53 | Cort 54 | Cplx3 55 | Cpne4 56 | Cpne6 57 | Cpne8 58 | Crh 59 | Cspg4 60 | Ctgf 61 | Cux2 62 | Cwh43 63 | Cyp1b1 64 | Dcn 65 | Deptor 66 | Dkk3 67 | Dner 68 | Dpy19l1 69 | Dpyd 70 | Ebf3 71 | Emcn 72 | Epha4 73 | Eya4 74 | Fezf2 75 | Fgd5 76 | Fhod3 77 | Fibcd1 78 | Fign 79 | Fmod 80 | Fn1 81 | Fos 82 | Foxp2 83 | Gad1 84 | Gad2 85 | Gadd45a 86 | Galnt14 87 | Garnl3 88 | Gfap 89 | Gfra2 90 | Gjb2 91 | Gjc3 92 | Gli3 93 | Gm2115 94 | Gm19410 95 | Gng12 96 | Gpr17 97 | Grik3 98 | Gsg1l 99 | Gucy1a1 100 | Hapln1 101 | Hat1 102 | Hpcal1 103 | Hs3st2 104 | Htr1f 105 | Id2 106 | Igf1 107 | Igf2 108 | Igfbp4 109 | Igfbp5 110 | Igfbp6 111 | Igsf21 112 | Ikzf1 113 | Inpp4b 114 | Kcnh5 115 | Kcnmb2 116 | Kctd8 117 | Kctd12 118 | Kdr 119 | Lamp5 120 | Laptm5 121 | Ly6a 122 | Lypd6 123 | Lyz2 124 | Mapk4 125 | Mdga1 126 | Mecom 127 | Meis2 128 | Myl4 129 | Myo16 130 | Ndst3 131 | Ndst4 132 | Necab1 133 | Necab2 134 | Nell1 135 | Neto2 136 | Neurod6 137 | Nostrin 138 | Npnt 139 | Npy2r 140 | Nr2f2 141 | Nrep 142 | Nrn1 143 | Nrp2 144 | Nts 145 | Ntsr2 146 | Nwd2 147 | Nxph3 148 | Opalin 149 | Opn3 150 | Orai2 151 | Paqr5 152 | Parm1 153 | Pcsk5 154 | Pde7b 155 | Pde11a 156 | Pdgfra 157 | Pdyn 158 | Pdzd2 159 | Pdzrn3 160 | Pecam1 161 | Penk 162 | Pglyrp1 163 | Pip5k1b 164 | Pkib 165 | Plch1 166 | Plcxd2 167 | Plcxd3 168 | Plekha2 169 | Pln 170 | Pou3f1 171 | Ppp1r1b 172 | Prdm8 173 | Prox1 174 | Prph 175 | Prr16 176 | Prss35 177 | Pthlh 178 | Pvalb 179 | Rab3b 180 | Rasgrf2 181 | Rasl10a 182 | Rbp4 183 | Rfx4 184 | Rims3 185 | Rmst 186 | Rnf152 187 | Ror1 188 | Rorb 189 | Rprm 190 | Rspo1 191 | Rspo2 192 | Rxfp1 193 | Satb2 194 | Sdk2 195 | Sema3a 196 | Sema3d 197 | Sema3e 198 | Sema5b 199 | Sema6a 200 | Shisa6 201 | Siglech 202 | Sipa1l3 203 | Sla 204 | Slc6a3 205 | Slc13a4 206 | Slc17a6 207 | Slc17a7 208 | Slc39a12 209 | Slc44a5 210 | Slfn5 211 | Slit2 212 | Sncg 213 | Sntb1 214 | Sorcs3 215 | Sox10 216 | Sox11 217 | Sox17 218 | Spag16 219 | Spi1 220 | Spp1 221 | Sst 222 | Stard5 223 | Strip2 224 | Syndig1 225 | Syt2 226 | Syt6 227 | Syt17 228 | Tacr1 229 | Tanc1 230 | Th 231 | Thsd7a 232 | 
Tle4 233 | Tmem132d 234 | Tmem163 235 | Tmem255a 236 | Tox 237 | Trbc2 238 | Trem2 239 | Trp73 240 | Trpc4 241 | Unc13c 242 | Vat1l 243 | Vip 244 | Vwc2l 245 | Wfs1 246 | Zfp366 247 | Zfp536 248 | Zfpm2 249 | -------------------------------------------------------------------------------- /data/sc_references/sc_breast_markers_neg.csv: -------------------------------------------------------------------------------- 1 | ,ABCC11,ACTA2,ACTG2,ADAM9,ADGRE5,ADH1B,ADIPOQ,AGR3,AHSP,AIF1,AKR1C1,AKR1C3,ALDH1A3,ANGPT2,ANKRD28,ANKRD29,ANKRD30A,APOBEC3A,APOBEC3B,APOC1,AQP1,AQP3,AR,AVPR1A,BACE2,BANK1,BASP1,BTNL9,C1QA,C1QC,C2orf42,C5orf46,C6orf132,C15orf48,CAV1,CAVIN2,CCDC6,CCDC80,CCL5,CCL8,CCL20,CCND1,CCPG1,CCR7,CD1C,CD3D,CD3E,CD3G,CD4,CD8A,CD8B,CD9,CD14,CD19,CD27,CD68,CD69,CD79A,CD79B,CD80,CD83,CD86,CD93,CD163,CD247,CD274,CDC42EP1,CDH1,CEACAM6,CEACAM8,CENPF,CLCA2,CLDN4,CLDN5,CLEC9A,CLEC14A,CLECL1,CLIC6,CPA3,CRHBP,CRISPLD2,CSF3,CTH,CTLA4,CTSG,CTTN,CX3CR1,CXCL5,CXCL12,CXCL16,CXCR4,CYP1A1,CYTIP,DAPK3,DERL3,DMKN,DNAAF1,DNTTIP1,DPT,DSC2,DSP,DST,DUSP2,DUSP5,EDN1,EDNRB,EGFL7,EGFR,EIF4EBP1,ELF3,ELF5,ENAH,EPCAM,ERBB2,ERN1,ESM1,ESR1,FAM49A,FAM107B,FASN,FBLIM1,FBLN1,FCER1A,FCER1G,FCGR3A,FGL2,FLNB,FOXA1,FOXC2,FOXP3,FSTL3,GATA3,GJB2,GLIPR1,GNLY,GPR183,GZMA,GZMB,GZMK,HAVCR2,HDC,HMGA1,HOOK2,HOXD8,HOXD9,HPX,IGF1,IGSF6,IL2RA,IL2RG,IL3RA,IL7R,ITGAM,ITGAX,ITM2C,JUP,KARS,KDR,KIT,KLF5,KLRB1,KLRC1,KLRD1,KLRF1,KRT5,KRT6B,KRT7,KRT8,KRT14,KRT15,KRT16,KRT23,LAG3,LARS,LDHB,LEP,LGALSL,LIF,LILRA4,LPL,LPXN,LRRC15,LTB,LUM,LY86,LYPD3,LYZ,MAP3K8,MDM2,MEDAG,MKI67,MLPH,MMP1,MMP2,MMP12,MMRN2,MNDA,MPO,MRC1,MS4A1,MUC6,MYBPC1,MYH11,MYLK,MYO5B,MZB1,NARS,NCAM1,NDUFA4L2,NKG7,NOSTRIN,NPM3,OCIAD2,OPRPN,OXTR,PCLAF,PCOLCE,PDCD1,PDCD1LG2,PDE4A,PDGFRA,PDGFRB,PDK4,PECAM1,PELI1,PGR,PIGR,PIM1,PLD4,POLR2J3,POSTN,PPARG,PRDM1,PRF1,PTGDS,PTN,PTPRC,PTRHD1,QARS,RAB30,RAMP2,RAPGEF3,REXO4,RHOH,RORC,RTKN2,RUNX1,S100A4,S100A8,S100A14,SCD,SCGB2A1,SDC4,SEC11C,SEC24A,SELL,SERHL2,SERPINA3,SERPINB9,SFRP1,SFRP4,SH3YL1,SLAMF1,SLAMF7,SLC4A1,SLC5A6,SLC25A37,SMAP2,SMS,SNAI1,SOX17,SOX18,SPIB,SQLE,SRPK1,SSTR2,STC1,SVIL,TAC1,TACSTD2,TCEAL7,TCF4,TCF7,TCF15,TCIM,TCL1A,TENT5C,TFAP2A,THAP2,TIFA,TIGIT,TIMP4,TMEM147,TNFRSF17,TOMM7,TOP2A,TPD52,TPSAB1,TRAC,TRAF4,TRAPPC3,TRIB1,TUBA4A,TUBB2B,TYROBP,UCP1,USP53,VOPP1,VWF,WARS,ZEB1,ZEB2,ZNF562 2 | B,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 3 | 
CD4Tconv/Treg,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 4 | CD8T/CD8Tex,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 5 | 
DC,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 6 | Endothelial,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 7 | 
Epithelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 8 | Fibroblasts,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 9 | 
Malignant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 10 | Mast,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 11 | 
Mono/Macro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 12 | Myofibroblasts,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 13 | 
NK,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 14 | Neutrophils,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 15 | 
Plasma,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 16 | SMC,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 17 | -------------------------------------------------------------------------------- /data/sc_references/sc_breast_markers_pos.csv: -------------------------------------------------------------------------------- 1 | 
,ABCC11,ACTA2,ACTG2,ADAM9,ADGRE5,ADH1B,ADIPOQ,AGR3,AHSP,AIF1,AKR1C1,AKR1C3,ALDH1A3,ANGPT2,ANKRD28,ANKRD29,ANKRD30A,APOBEC3A,APOBEC3B,APOC1,AQP1,AQP3,AR,AVPR1A,BACE2,BANK1,BASP1,BTNL9,C1QA,C1QC,C2orf42,C5orf46,C6orf132,C15orf48,CAV1,CAVIN2,CCDC6,CCDC80,CCL5,CCL8,CCL20,CCND1,CCPG1,CCR7,CD1C,CD3D,CD3E,CD3G,CD4,CD8A,CD8B,CD9,CD14,CD19,CD27,CD68,CD69,CD79A,CD79B,CD80,CD83,CD86,CD93,CD163,CD247,CD274,CDC42EP1,CDH1,CEACAM6,CEACAM8,CENPF,CLCA2,CLDN4,CLDN5,CLEC9A,CLEC14A,CLECL1,CLIC6,CPA3,CRHBP,CRISPLD2,CSF3,CTH,CTLA4,CTSG,CTTN,CX3CR1,CXCL5,CXCL12,CXCL16,CXCR4,CYP1A1,CYTIP,DAPK3,DERL3,DMKN,DNAAF1,DNTTIP1,DPT,DSC2,DSP,DST,DUSP2,DUSP5,EDN1,EDNRB,EGFL7,EGFR,EIF4EBP1,ELF3,ELF5,ENAH,EPCAM,ERBB2,ERN1,ESM1,ESR1,FAM49A,FAM107B,FASN,FBLIM1,FBLN1,FCER1A,FCER1G,FCGR3A,FGL2,FLNB,FOXA1,FOXC2,FOXP3,FSTL3,GATA3,GJB2,GLIPR1,GNLY,GPR183,GZMA,GZMB,GZMK,HAVCR2,HDC,HMGA1,HOOK2,HOXD8,HOXD9,HPX,IGF1,IGSF6,IL2RA,IL2RG,IL3RA,IL7R,ITGAM,ITGAX,ITM2C,JUP,KARS,KDR,KIT,KLF5,KLRB1,KLRC1,KLRD1,KLRF1,KRT5,KRT6B,KRT7,KRT8,KRT14,KRT15,KRT16,KRT23,LAG3,LARS,LDHB,LEP,LGALSL,LIF,LILRA4,LPL,LPXN,LRRC15,LTB,LUM,LY86,LYPD3,LYZ,MAP3K8,MDM2,MEDAG,MKI67,MLPH,MMP1,MMP2,MMP12,MMRN2,MNDA,MPO,MRC1,MS4A1,MUC6,MYBPC1,MYH11,MYLK,MYO5B,MZB1,NARS,NCAM1,NDUFA4L2,NKG7,NOSTRIN,NPM3,OCIAD2,OPRPN,OXTR,PCLAF,PCOLCE,PDCD1,PDCD1LG2,PDE4A,PDGFRA,PDGFRB,PDK4,PECAM1,PELI1,PGR,PIGR,PIM1,PLD4,POLR2J3,POSTN,PPARG,PRDM1,PRF1,PTGDS,PTN,PTPRC,PTRHD1,QARS,RAB30,RAMP2,RAPGEF3,REXO4,RHOH,RORC,RTKN2,RUNX1,S100A4,S100A8,S100A14,SCD,SCGB2A1,SDC4,SEC11C,SEC24A,SELL,SERHL2,SERPINA3,SERPINB9,SFRP1,SFRP4,SH3YL1,SLAMF1,SLAMF7,SLC4A1,SLC5A6,SLC25A37,SMAP2,SMS,SNAI1,SOX17,SOX18,SPIB,SQLE,SRPK1,SSTR2,STC1,SVIL,TAC1,TACSTD2,TCEAL7,TCF4,TCF7,TCF15,TCIM,TCL1A,TENT5C,TFAP2A,THAP2,TIFA,TIGIT,TIMP4,TMEM147,TNFRSF17,TOMM7,TOP2A,TPD52,TPSAB1,TRAC,TRAF4,TRAPPC3,TRIB1,TUBA4A,TUBB2B,TYROBP,UCP1,USP53,VOPP1,VWF,WARS,ZEB1,ZEB2,ZNF562 2 | B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 3 | 
CD4Tconv/Treg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 4 | CD8T/CD8Tex,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 5 | 
DC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 6 | Endothelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 7 | 
Epithelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 8 | Fibroblasts,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0 9 | 
Malignant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 10 | Mast,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 11 | 
Mono/Macro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 12 | Myofibroblasts,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 13 | 
NK,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 14 | Neutrophils,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 15 | 
Plasma,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0 16 | SMC,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0 17 | -------------------------------------------------------------------------------- /example_small.py: -------------------------------------------------------------------------------- 1 | from bidcell import BIDCellModel 2 | 3 | BIDCellModel.get_example_data() 4 | 5 | model = BIDCellModel("params_small_example.yaml") 6 | 7 | model.run_pipeline() 8 | 9 | # Alternatively, call individual functions 10 | 11 | # model.preprocess() 12 | 13 | # or call individual functions within preprocess 14 | 15 | # # model.segment_nuclei() 16 | # # model.generate_expression_maps() 17 | # # model.generate_patches() 18 | # # model.make_cell_gene_mat(is_cell=False) 19 | # # model.preannotate() 20 | 21 | # model.train() 22 | 23 | # model.predict() 24 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.pdm.build] 2 | includes = ["bidcell", "data"] 3 | 4 | [tool.pdm.dev-dependencies] 5 | test = [ 6 | 
"pytest>=7.4.1", 7 | ] 8 | dev = [ 9 | "black>=23.7.0", 10 | "deptry>=0.12.0", 11 | "flake8>=6.1.0", 12 | "ipython>=8.15.0", 13 | ] 14 | [build-system] 15 | requires = ["pdm-backend"] 16 | build-backend = "pdm.backend" 17 | 18 | [project] 19 | name = "bidcell" 20 | version = "1.0.3" 21 | description = "Biologically-informed deep learning for cell segmentation of subcelluar spatial transcriptomics data." 22 | authors = [ 23 | {name = "Helen Fu", email = "xiaohang.fu@sydney.edu.au"}, 24 | ] 25 | dependencies = [ 26 | "pandas>=2.1.0", 27 | "numpy>=1.24.4", 28 | "tifffile>=2023.8.30", 29 | "imgaug>=0.4.0", 30 | "h5py>=3.9.0", 31 | "scipy>=1.11.2", 32 | "matplotlib>=3.7.2", 33 | "natsort>=8.4.0", 34 | "cellpose>=2.2.3", 35 | "scikit-image>=0.21.0", 36 | "segmentation-models-pytorch>=0.3.3", 37 | "opencv-python>=4.8.0.76", 38 | "pillow>=10.0.0", 39 | "pyyaml>=6.0.1", 40 | "pydantic>=2.3.0", 41 | "tqdm>=4.66.1", 42 | ] 43 | requires-python = ">=3.9,<3.13" 44 | readme = "README.md" 45 | license = {text = "MIT"} 46 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = 3 | E501 4 | E266 5 | W503 6 | E731 7 | E203 8 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/tests/__init__.py --------------------------------------------------------------------------------