├── .gitignore
├── Figure1.png
├── LICENSE
├── README.md
├── bidcell
│   ├── BIDCellModel.py
│   ├── __init__.py
│   ├── config.py
│   ├── example_params
│   │   ├── cosmx.yaml
│   │   ├── merscope.yaml
│   │   ├── small_example.yaml
│   │   ├── stereoseq.yaml
│   │   └── xenium.yaml
│   ├── model
│   │   ├── dataio
│   │   │   └── dataset_input.py
│   │   ├── model
│   │   │   ├── intialisation.py
│   │   │   ├── layers.py
│   │   │   ├── losses.py
│   │   │   └── model.py
│   │   ├── postprocess_predictions.py
│   │   ├── predict.py
│   │   ├── train.py
│   │   └── utils
│   │       └── utils.py
│   └── processing
│       ├── cell_gene_matrix.py
│       ├── nuclei_segmentation.py
│       ├── nuclei_stitch_fov.py
│       ├── preannotate.py
│       ├── transcript_patches.py
│       ├── transcripts.py
│       └── utils.py
├── data
│   ├── dataset_xenium_breast1_small
│   │   ├── morphology_mip_small.tif
│   │   └── transcripts_small.csv
│   ├── example_mousebrain_genes.txt
│   └── sc_references
│       ├── sc_breast.csv
│       ├── sc_breast_markers_neg.csv
│       └── sc_breast_markers_pos.csv
├── example_small.py
├── pdm.lock
├── pyproject.toml
├── setup.cfg
└── tests
    └── __init__.py

/.gitignore:
--------------------------------------------------------------------------------
 1 | **/__pycache__
 2 | **/cell_gene_matrices
 3 | **/model_outputs
 4 | *.tif
 5 | *.pbs
 6 | *.csv
 7 | *.gz
 8 | *.zip
 9 | **/check_cells.py
10 | **/show_cells_by_id.py
11 | **/check.py
12 | example_data/
13 | data_large/
14 | dist
15 | params_small_example.yaml
16 | *_example_config.yaml
17 | example_large.py
18 | pypi_token.txt
19 | 
20 | !morphology_mip_small.tif
21 | !transcripts_small.csv
22 | !sc_breast.csv
23 | !sc_breast_markers_neg.csv
24 | !sc_breast_markers_pos.csv
25 | .vscode/settings.json
26 | data/dataset_xenium_breast1/all_gene_names.txt
27 | .pdm-python
28 | get_markers.py
--------------------------------------------------------------------------------
/Figure1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/Figure1.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data
2 | 
3 | For more details of our method, please refer to: https://doi.org/10.1038/s41467-023-44560-w
4 | 
5 | Recent advances in subcellular imaging transcriptomics platforms have enabled spatial mapping of the expression of hundreds of genes at subcellular resolution, providing topographic context to the data. This has created a new data analytics challenge: to correctly identify cells and accurately assign transcripts, ensuring that all available data can be utilised. To this end, we introduce BIDCell, a self-supervised deep learning-based framework that incorporates cell type and morphology information via novel biologically-informed loss functions. We also introduce CellSPA, a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance. We demonstrate that BIDCell outperforms other state-of-the-art methods according to many CellSPA metrics across a variety of tissue types and technology platforms, including 10x Genomics Xenium. Taken together, we find that BIDCell can facilitate single-cell spatial expression analyses, including cell-cell interactions, offering great potential for biological discovery.
6 | 7 | ![alt text](Figure1.png) 8 | 9 | 10 | ## Installation 11 | 12 | > **Note**: A GPU with at least 12GB VRAM is strongly recommended for the deep learning component, and 32GB RAM for data processing. 13 | We ran BIDCell on a Linux system with a 12GB NVIDIA GTX Titan V GPU, Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 16 threads, and 64GB RAM. 14 | 15 | 1. Create virtual environment (Python>=3.9,<3.13): 16 | ```sh 17 | conda create --name bidcell python=3.10 18 | ``` 19 | 2. Activate virtual environment: 20 | ```sh 21 | conda activate bidcell 22 | ``` 23 | 3. Install package: 24 | ```sh 25 | python -m pip install bidcell 26 | ``` 27 | Installation of dependencies typically requires a few minutes. 28 | 29 | > **Note**: We are actively finding and fixing issues. If you encounter `[xcb] Unknown sequence number while processing queue`, try running without a GUI, e.g. through PuTTY. Please let us know any other issues you may find. Thank you. 30 | 31 | 32 | ## Demo 33 | 34 | Please download the BIDCell GitHub repository first. A small subset of Xenium breast cancer data is provided as a demo. The example .yaml file can be found in ``bidcell/example_params/small_example.yaml``. Use the following to run all the steps to verify installation: 35 | ```sh 36 | python example_small.py 37 | ``` 38 | Or: 39 | ```py 40 | from bidcell import BIDCellModel 41 | BIDCellModel.get_example_data() 42 | model = BIDCellModel("params_small_example.yaml") 43 | model.run_pipeline() 44 | ``` 45 | 46 | 47 | ## Parameters 48 | 49 | Parameters are defined in .yaml files. Examples are provided for 4 major platforms, including Xenium, CosMx, MERSCOPE, and Stereo-seq. BIDCell may also be applied to data from other technologies such as MERFISH. 50 | 51 | > **Note**: **Please modify `cpus` to suit your system. Higher `cpus` allow faster runtimes but may freeze your system.** 52 | 53 | Run the following to obtain examples: 54 | ```py 55 | from bidcell import BIDCellModel 56 | BIDCellModel.get_example_config("xenium") 57 | BIDCellModel.get_example_config("cosmx") 58 | BIDCellModel.get_example_config("merscope") 59 | BIDCellModel.get_example_config("stereoseq") 60 | ``` 61 | This will copy the .yaml for the respective vendor into your working directory, for example `xenium_example_config.yaml`. 62 | 63 | 64 | ## Example usage 65 | 66 | The full dataset (Xenium Output Bundle In Situ Replicate 1) may be downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast. The breast cancer reference data are provided with this package under `data/sc_references`, or `./example_data/sc_references` if you have run `example_small.py`. Please ensure the correct paths are provided for the parameters under `files` in `xenium_example_config.yaml`, in particular, the paths for the transcripts (`transcripts.csv.gz`) and DAPI (`morphology_mip.ome.tif`) files. 
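For reference, the relevant `files` entries might then look like the following. These paths match the provided `xenium.yaml`, which assumes the bundle was extracted to `./data_large/dataset_xenium_breast1` and that `example_small.py` has been run; adjust them to your own layout:
```yaml
files:
  data_dir: ./data_large/dataset_xenium_breast1                           # processed/output data
  fp_dapi: ./data_large/dataset_xenium_breast1/morphology_mip.ome.tif     # DAPI image
  fp_transcripts: ./data_large/dataset_xenium_breast1/transcripts.csv.gz  # transcripts file
  fp_ref: ./example_data/sc_references/sc_breast.csv                      # single-cell reference
  fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv  # positive markers
  fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv  # negative markers
```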
67 | 
68 | To run the entire pipeline (data processing, training, prediction, and extracting the cell-gene matrix):
69 | ```py
70 | from bidcell import BIDCellModel
71 | model = BIDCellModel("xenium_example_config.yaml")
72 | model.run_pipeline()
73 | ```
74 | Alternatively, the pipeline can be broken down into its 3 main stages:
75 | ```py
76 | from bidcell import BIDCellModel
77 | model = BIDCellModel("xenium_example_config.yaml")
78 | model.preprocess()
79 | model.train()
80 | model.predict()
81 | ```
82 | Or, the functions in `preprocess` can be called individually:
83 | ```py
84 | from bidcell import BIDCellModel
85 | model = BIDCellModel("xenium_example_config.yaml")
86 | # model.stitch_nuclei() # for when nuclei images are separated into FOVs (e.g., CosMx)
87 | model.segment_nuclei()
88 | model.generate_expression_maps()
89 | model.generate_patches()
90 | model.make_cell_gene_mat(is_cell=False)
91 | model.preannotate()
92 | model.train()
93 | model.predict()
94 | ```
95 | 
96 | If your machine/server has multiple GPUs, you may select the GPU to use by setting the `CUDA_VISIBLE_DEVICES` environment variable before the command, e.g. for GPU ID 3:
97 | ```sh
98 | CUDA_VISIBLE_DEVICES=3 python example_small.py
99 | ```
100 | 
101 | 
102 | ## Single-cell reference and markers
103 | 
104 | BIDCell uses single-cell reference data for improved results. These can be downloaded from public repositories such as TISCH2, Allen Brain Map, and the Human Cell Atlas.
105 | 
106 | Please see the provided breast cancer single-cell reference and positive/negative marker files (`sc_breast.csv`, `sc_breast_markers_pos.csv`, and `sc_breast_markers_neg.csv`) as a template.
107 | 
108 | The reference csv file contains the average expression of each gene in the spatial transcriptomic dataset for a range of cell types. You may choose an appropriate list of cell types to include for your data.
109 | 
110 | The positive and negative markers files contain the respective marker genes for each cell type. The positive and negative markers were those with expressions in the highest and lowest 10th percentiles for each cell type of a tissue sample. We found that removing positive markers that were common to at least a third of the cell types in a dataset was appropriate across various datasets. Using a larger number of positive markers tends to increase the size of predicted cells. Manual curation and alternative approaches to determining the marker genes can also be used (a sketch of the percentile approach is shown below).
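This sketch approximates the procedure described above with pandas; it is not the authors' exact script, and the output file names are placeholders:
```py
import pandas as pd

# Reference: one row per cell type, gene columns plus cell_type/ct_idx metadata
ref = pd.read_csv("sc_breast.csv", index_col=0)
expr = ref.drop(columns=["cell_type", "ct_idx"], errors="ignore")
expr.index = ref["cell_type"]  # marker files are indexed by cell type name

# Flag genes in the top/bottom 10th percentile of each cell type's expression profile
pos = expr.ge(expr.quantile(0.90, axis=1), axis=0).astype(int)
neg = expr.le(expr.quantile(0.10, axis=1), axis=0).astype(int)

# Remove positive markers shared by at least a third of the cell types
too_common = pos.sum(axis=0) >= len(pos) / 3
pos.loc[:, too_common] = 0

pos.to_csv("markers_pos.csv")
neg.to_csv("markers_neg.csv")
```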
111 | 
112 | Fewer than 1,000 genes are needed to perform segmentation. Specify a selection of genes in a file (see the Stereo-seq example).
113 | 
114 | Some example reference data may be found here: https://github.com/SydneyBioX/scClassify?tab=readme-ov-file#pretrained-models
115 | 
116 | 
117 | ## Segmentation architectures
118 | The default architecture is UNet3+ (https://arxiv.org/abs/2004.08790), which we have found to perform well across different technologies and tissue types.
119 | To use a different architecture, select from a list of popular backbones or define your own:
120 | - Set `model_params.name` in the .yaml file to an encoder from https://segmentation-modelspytorch.readthedocs.io/en/latest/index.html
121 | - Or, modify the `SegmentationModel` class in [`model.py`](bidcell/model/model/model.py)
122 | 
123 | 
124 | ## Additional information
125 | 
126 | If you receive the error ``pickle.UnpicklingError: pickle data was truncated``, try reducing `cpus`.
127 | 
128 | Performing segmentation at a higher resolution requires a larger patch size, and thus more GPU memory.
129 | 
130 | Expected outputs:
131 | - A .tif file of segmented cells, where each pixel value corresponds to a cell ID; its file name ends in `_connected.tif`
132 |     - e.g.: `dataset_xenium_breast1_small/model_outputs/2023_09_06_11_55_24/test_output/epoch_{test_epoch}_step_{test_step}_connected.tif`
133 | - `expr_mat.csv` containing the gene expressions of the segmented cells
134 |     - e.g.: `dataset_xenium_breast1_small/cell_gene_matrices/2023_09_06_11_55_24/expr_mat.csv`
135 | 
136 | Expected runtime (based on our system, for the Xenium breast cancer dataset):
137 | - Training: ~10 mins for 4,000 steps
138 | - Inference: ~50 mins
139 | - Postprocessing: ~30 mins
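Once a run finishes, the outputs listed above can be sanity-checked in a few lines. This is a minimal sketch, assuming `tifffile`, `numpy`, and `pandas` are installed; the timestamp directory is illustrative, and the segmentation file name follows `epoch_{test_epoch}_step_{test_step}` (the small example tests epoch 1, step 60):
```py
import numpy as np
import pandas as pd
import tifffile

data_dir = "dataset_xenium_breast1_small"  # files.data_dir from the config
ts = "2023_09_06_11_55_24"                 # replace with your run's timestamp

# Segmentation map: an integer image whose non-zero pixel values are cell IDs
cells = tifffile.imread(f"{data_dir}/model_outputs/{ts}/test_output/epoch_1_step_60_connected.tif")
print(cells.shape, np.unique(cells).size - 1)  # image dimensions, number of cells

# Cell-by-gene expression matrix of the segmented cells
expr = pd.read_csv(f"{data_dir}/cell_gene_matrices/{ts}/expr_mat.csv")
print(expr.shape)
```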
140 | 
141 | 
142 | ## Xenium Ranger and Xenium Explorer
143 | 
144 | The BIDCell output .tif segmentation can be used with Xenium Ranger and then viewed in Xenium Explorer. The .tif file first needs to be resized to the same dimensions (`h_dapi` x `w_dapi`) as the DAPI image (`morphology_mip.ome.tif`):
145 | 
146 | ```py
147 | import cv2
148 | import numpy as np
149 | cells = cv2.resize(cells.astype('float32'), (w_dapi, h_dapi), interpolation=cv2.INTER_NEAREST)
150 | cells = cells.astype(np.uint32)
151 | ```
152 | Nearest-neighbour interpolation must be used so that integer cell IDs are not blended into meaningless intermediate values.
153 | 
154 | The resized segmentation can then be used as the input file to the `--cells` argument for `xeniumranger import-segmentation`. The same applies to the nuclei from BIDCell.
155 | 
156 | 
157 | ## Contact us
158 | 
159 | If you have any enquiries, especially about using BIDCell to segment cells in your data, please contact xiaohang.fu@sydney.edu.au. We are also happy to receive any suggestions and comments.
160 | 
161 | 
162 | ## Citation
163 | 
164 | If BIDCell has assisted you with your work, please kindly cite our paper:
165 | 
166 | Fu, X., Lin, Y., Lin, D., Mechtersheimer, D., Wang, C., Ameen, F., Ghazanfar, S., Patrick, E., Kim, J., & Yang, J. Y. H. BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat Commun 15, 509 (2024). https://doi.org/10.1038/s41467-023-44560-w
--------------------------------------------------------------------------------
/bidcell/BIDCellModel.py:
--------------------------------------------------------------------------------
  1 | """BIDCellModel class module"""
  2 | import importlib.resources
  3 | import os
  4 | from pathlib import Path
  5 | from shutil import copyfile, copytree
  6 | from typing import Literal
  7 | 
  8 | from .config import load_config
  9 | from .model.postprocess_predictions import postprocess_predictions
 10 | from .model.predict import fill_grid, predict
 11 | from .model.train import train
 12 | from .model.utils.utils import get_newest_id
 13 | from .processing.cell_gene_matrix import make_cell_gene_mat
 14 | from .processing.nuclei_segmentation import segment_nuclei
 15 | from .processing.nuclei_stitch_fov import stitch_nuclei
 16 | from .processing.preannotate import preannotate
 17 | from .processing.transcript_patches import generate_patches
 18 | from .processing.transcripts import generate_expression_maps
 19 | 
 20 | 
 21 | class BIDCellModel:
 22 |     """The BIDCellModel class, which provides an interface for preprocessing, training and predicting all the cell types for a dataset."""
 23 | 
 24 |     def __init__(self, config_file: str) -> None:
 25 |         """Constructs a BIDCellModel instance using the user-supplied config file.\n
 26 |         The configuration is validated during construction.
 27 | 
 28 |         Parameters
 29 |         ----------
 30 |         config_file : str
 31 |             Path to the YAML configuration file.
 32 |         """
 33 |         self.config = load_config(config_file)
 34 | 
 35 |     def run_pipeline(self):
 36 |         """Runs the entire BIDCell pipeline using the settings defined in the configuration.
 37 |         """
 38 |         print("### Preprocessing ###")
 39 |         print()
 40 |         self.preprocess()
 41 |         print()
 42 |         print("### Training ###")
 43 |         print()
 44 |         self.train()
 45 |         print()
 46 |         print("### Predict ###")
 47 |         print()
 48 |         self.predict()
 49 |         print()
 50 |         print("### Done ###")
 51 | 
 52 |     def preprocess(self) -> None:
 53 |         """Preprocess the dataset for training.
 54 |         """
 55 |         if self.config.nuclei_fovs.stitch_nuclei_fovs:
 56 |             stitch_nuclei(self.config)
 57 |         if self.config.nuclei.crop_nuclei_to_ts:
 58 |             generate_expression_maps(self.config)
 59 |             segment_nuclei(self.config)
 60 |         else:
 61 |             segment_nuclei(self.config)
 62 |             generate_expression_maps(self.config)
 63 |         generate_patches(self.config)
 64 |         make_cell_gene_mat(self.config, is_cell=False)
 65 |         preannotate(self.config)
 66 | 
 67 |     def stitch_nuclei(self):
 68 |         """Stitch separate FOV files into a single image (e.g. CosMx data).\n
 69 |         Runs inside preprocess by default, if nuclei_fovs.stitch_nuclei_fovs is True in the configuration file.
 70 |         """
 71 |         stitch_nuclei(self.config)
 72 | 
 73 |     def segment_nuclei(self):
 74 |         """Run the nucleus segmentation algorithm. Runs inside preprocess by default.
 75 |         """
 76 |         segment_nuclei(self.config)
 77 | 
 78 |     def generate_expression_maps(self):
 79 |         """Generate the expression maps. Runs inside preprocess by default.
 80 |         """
 81 |         generate_expression_maps(self.config)
 82 | 
 83 |     def generate_patches(self):
 84 |         """Generate patches for training. Runs inside preprocess by default.
 85 |         """
 86 |         generate_patches(self.config)
 87 | 
 88 |     def make_cell_gene_mat(self, is_cell: bool, timestamp: str = "last"):
 89 |         """Make a matrix containing counts for each cell. Runs inside preprocess and predict by default.
 90 | 
 91 |         Parameters
 92 |         ----------
 93 |         is_cell : bool
 94 |             If False, uses nuclei masks for creation; otherwise it uses `timestamp` to choose a directory containing segmented cells output by BIDCell.
 95 |         timestamp : str, optional
 96 |             The timestamp corresponding to the name of a directory in the data directory under `model_outputs`, by default "last", in which case the folder with the most recent timestamp is used.
 97 |         """
 98 |         if is_cell and timestamp == "last":
 99 |             timestamp = get_newest_id(
100 |                 os.path.join(self.config.files.data_dir, "model_outputs")
101 |             )
102 |         elif is_cell:
103 |             self.__check_valid_timestamp(timestamp)
104 |         make_cell_gene_mat(self.config, is_cell, timestamp=timestamp)
105 | 
106 |     def preannotate(self):
107 |         """Preannotate the cells. Runs inside preprocess by default.
108 |         """
109 |         preannotate(self.config)
110 | 
111 |     def train(self) -> None:
112 |         """Train the model.
113 |         """
114 |         train(self.config)
115 | 
116 |     def predict(self) -> None:
117 |         """Segment and annotate the cells.
118 |         """
119 |         predict(self.config)
120 | 
121 |         if self.config.experiment_dirs.dir_id == "last":
122 |             timestamp = get_newest_id(
123 |                 os.path.join(self.config.files.data_dir, "model_outputs")
124 |             )
125 |         else:
126 |             timestamp = self.config.experiment_dirs.dir_id
127 |             self.__check_valid_timestamp(timestamp)
128 | 
129 |         fill_grid(self.config, timestamp)
130 | 
131 |         postprocess_predictions(self.config, timestamp)
132 | 
133 |         make_cell_gene_mat(self.config, is_cell=True, timestamp=timestamp)
134 | 
135 |     @staticmethod
136 |     def get_example_config(vendor: Literal["cosmx", "merscope", "stereoseq", "xenium"]) -> None:
137 |         """Gets an example configuration for a given vendor and places it in the working directory.
138 | 
139 |         Parameters
140 |         ----------
141 |         vendor : Literal["cosmx", "merscope", "stereoseq", "xenium"]
142 |             The vendor of the equipment used to produce the dataset.
143 |         """
144 |         vendors = ["cosmx", "merscope", "stereoseq", "xenium"]
145 |         if not any([vendor.lower() == x for x in vendors]):
146 |             raise ValueError(f"Unknown vendor `{vendor}`\n\tChoose one of {*vendors,}")
147 |         params_path = (
148 |             importlib.resources.files("bidcell") / "example_params" / f"{vendor}.yaml"
149 |         )
150 |         if not (dest := Path().cwd() / f"{vendor}_example_config.yaml").exists():
151 |             copyfile(params_path, dest)
152 | 
153 |     @staticmethod
154 |     def get_example_data(with_config: bool = True) -> None:
155 |         """Gets the small example data included in the package and places it in the current working directory.
156 | 
157 |         Parameters
158 |         ----------
159 |         with_config : bool, optional
160 |             Whether to get the configuration for the example data, by default True
161 |         """
162 |         root: Path = importlib.resources.files("bidcell")
163 |         data_path = (
164 |             root.parent / "data"
165 |         )
166 |         cwd = Path().cwd()
167 |         if not (cwd / "example_data").exists():
168 |             copytree(data_path, cwd / "example_data")
169 |         if with_config and not (cwd / "params_small_example.yaml").exists():
170 |             copyfile(
171 |                 root / "example_params" / "small_example.yaml",
172 |                 cwd / "params_small_example.yaml"
173 |             )
174 | 
175 |     def __check_valid_timestamp(self, timestamp: str) -> None:
176 |         outputs_path = Path(self.config.files.data_dir) / "model_outputs"
177 |         outputs = list(outputs_path.iterdir())
178 |         if len(outputs) == 0:
179 |             raise ValueError(
180 |                 f"There are no outputs yet under {str(outputs_path)}. Run BIDCell at least once with this dataset to get some."
181 |             )
182 |         if not any(
183 |             [timestamp == x.name for x in outputs if x.is_dir()]
184 |         ):
185 |             valid_dirs = "\n".join(["\t" + str(x) for x in outputs])
186 |             raise ValueError(
187 |                 f"{timestamp} is not a valid model output directory (set in configuration YAML under `experiment_dirs.dir_id`). Choose one of the following:\n{valid_dirs}"
188 |             )
--------------------------------------------------------------------------------
/bidcell/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | BIDCell - biologically-informed deep learning for cell segmentation of subcellular spatial transcriptomics data
3 | """
4 | 
5 | from bidcell.BIDCellModel import BIDCellModel
6 | 
7 | __all__ = ["BIDCellModel"]
--------------------------------------------------------------------------------
/bidcell/config.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | from pathlib import Path
 3 | from typing import Literal, Annotated
 4 | 
 5 | import yaml
 6 | from pydantic import BaseModel, computed_field, model_validator, ConfigDict
 7 | from pydantic.functional_validators import AfterValidator
 8 | 
 9 | 
10 | def validate_path(v: str | None) -> str:
11 |     if v is None:
12 |         return v
13 | 
14 |     path = Path(v)
15 | 
16 |     assert (
17 |         path.exists()
18 |     ), f"Invalid path {v}: Ensure you have the correct path in your config file."
19 | 
20 |     return str(path.resolve())
21 | 
22 | 
23 | PathString = Annotated[str, AfterValidator(validate_path)]
24 | 
25 | 
26 | class FileParams(BaseModel):
27 |     data_dir: PathString
28 |     fp_dapi: PathString | None = None
29 |     fp_transcripts: PathString
30 |     fp_ref: PathString
31 |     fp_pos_markers: PathString
32 |     fp_neg_markers: PathString
33 | 
34 |     # Internal defaults
35 |     # file of affine transformation - needed if cropping to align DAPI to transcripts
36 |     fp_affine: str = "affine.csv"
37 |     # file name of nuclei tif file
38 |     fp_nuclei: str = "nuclei.tif"
39 |     # file name of resized DAPI image
40 |     fp_rdapi: str = "dapi_resized.tif"
41 |     # directory containing processed gene expression maps
42 |     dir_out_maps: str = "expr_maps"
43 |     # filtered and xy-scaled transcripts data
44 |     fp_transcripts_processed: str = "transcripts_processed.csv"
45 |     # txt file containing list of gene names
46 |     fp_gene_names: str = "all_gene_names.txt"
47 |     # directory prefix of transcript patches
48 |     dir_patches: str = "expr_maps_input_patches_"
49 |     # directory for cell-gene expression matrices
50 |     dir_cgm: str = "cell_gene_matrices"
51 |     # file name of nuclei expression matrices
52 |     fp_expr: str = "expr_mat.csv"
53 |     # file name of nuclei annotations
54 |     fp_nuclei_anno: str = "nuclei_cell_type.h5"
55 |     # file name of text file containing selected gene names, e.g. selected_genes.txt
56 |     fp_selected_genes: str | None = None
57 | 
58 |     # Internal
59 |     # fp_stitched: str | None = None
60 | 
61 | 
62 | class NucleiFovParams(BaseModel):
63 |     stitch_nuclei_fovs: bool
64 |     dir_dapi: str | None = None
65 |     ext_dapi: str = "tif"
66 |     pattern_z: str = "Z###"
67 |     pattern_f: str = "F###"
68 |     channel_first: bool = False
69 |     channel_dapi: int = -1
70 |     n_fov: int | None = None
71 |     min_fov: int | None = None
72 |     n_fov_h: int | None = None
73 |     n_fov_w: int | None = None
74 |     start_corner: Literal["ul", "ur", "bl", "br"] = "ul"
75 |     row_major: bool = False
76 |     z_level: int = 1
77 |     mip: bool = False
78 |     flip_ud: bool = False
79 | 
80 |     @model_validator(mode="after")
81 |     def check_dapi(self):
82 |         if not self.stitch_nuclei_fovs:
83 |             return self
84 | 
85 |         if self.dir_dapi is None:
86 |             raise ValueError(
87 |                 "dir_dapi must be specified if stitch_nuclei_fovs is True."
88 |             )
89 | 
90 |         p = Path(self.dir_dapi)
91 | 
92 |         if not p.exists():
93 |             raise ValueError(
94 |                 f"Invalid value for dir_dapi ({self.dir_dapi}): Check the config file and ensure the correct directory is specified."
95 |             )
96 | 
97 |         if not p.is_dir():
98 |             raise ValueError(
99 |                 "dir_dapi is not a directory: dir_dapi must point to a directory containing the FOVs to be stitched together."
100 | ) 101 | return self 102 | 103 | 104 | class NucleiParams(BaseModel): 105 | # divide into sections if too large - maximum height to process in original resolution 106 | max_height: int = 24000 107 | # divide into sections if too large - maximum width to process in original resolution 108 | max_width: int = 32000 109 | # crop nuclei to size of transcript maps 110 | crop_nuclei_to_ts: bool = False 111 | # use CPU for Cellpose if no GPU available 112 | use_cpu: bool = False 113 | # estimated diameter of nuclei for Cellpose 114 | diameter: int | None = None 115 | 116 | 117 | class TranscriptParams(BaseModel): 118 | min_qv: int = 20 119 | # divide into sections if too large - height of patches 120 | max_height: int = 3500 121 | # divide into sections if too large - width of patches 122 | max_width: int = 4000 123 | shift_to_origin: bool = False 124 | x_col: str = "x_location" 125 | y_col: str = "y_location" 126 | gene_col: str = "feature_name" 127 | counts_col: str | None = None 128 | transcripts_to_filter: list[str] 129 | 130 | 131 | class AffineParams(BaseModel): 132 | target_pix_um: float = 1.0 133 | base_pix_x: float 134 | base_pix_y: float 135 | base_ts_x: float 136 | base_ts_y: float 137 | global_shift_x: int = 0 138 | global_shift_y: int = 0 139 | 140 | # Scaling images 141 | @computed_field 142 | @property 143 | def scale_pix_x(self) -> float: 144 | return self.base_pix_x / self.target_pix_um 145 | 146 | @computed_field 147 | @property 148 | def scale_pix_y(self) -> float: 149 | return self.base_pix_y / self.target_pix_um 150 | 151 | # Scaling transcript locations 152 | @computed_field 153 | @property 154 | def scale_ts_x(self) -> float: 155 | return self.base_ts_x / self.target_pix_um 156 | 157 | @computed_field 158 | @property 159 | def scale_ts_y(self) -> float: 160 | return self.base_ts_y / self.target_pix_um 161 | 162 | 163 | class CellGeneMatParams(BaseModel): 164 | # max h+w for resized segmentation to extract expressions from 165 | max_sum_hw: int = 30000 166 | 167 | 168 | class ModelParams(BaseModel): 169 | name: str = "custom" # TODO: Validate this field 170 | patch_size: int 171 | elongated: list[str] 172 | 173 | 174 | class TrainingParams(BaseModel): 175 | model_config = ConfigDict(protected_namespaces=()) 176 | total_epochs: int = 1 177 | total_steps: int = 4000 178 | # learning rate of DL model 179 | learning_rate: float = 0.00001 180 | # adam optimiser beta1 181 | beta1: float = 0.9 182 | # adam optimiser beta2 183 | beta2: float = 0.999 184 | # adam optimiser weight decay 185 | weight_decay: float = 0.0001 186 | # optimiser 187 | optimizer: Literal["adam", "rmsprop"] = "adam" 188 | ne_weight: float = 1.0 189 | os_weight: float = 1.0 190 | cc_weight: float = 1.0 191 | ov_weight: float = 1.0 192 | pos_weight: float = 1.0 193 | neg_weight: float = 1.0 194 | # number of training steps per model save 195 | model_freq: int = 1000 196 | # number of training steps per sample save 197 | sample_freq: int = 100 198 | 199 | 200 | class TestingParams(BaseModel): 201 | test_epoch: int = 1 202 | test_step: int = 4000 203 | 204 | 205 | class PostprocessParams(BaseModel): 206 | # size of patches to perform morphological processing 207 | patch_size_mp: int = 1024 208 | 209 | 210 | class ExperimentDirs(BaseModel): 211 | model_config = ConfigDict(protected_namespaces=()) 212 | # directory names for each experiment 213 | dir_id: str = "last" 214 | model_dir: str = "models" 215 | test_output_dir: str = "test_output" 216 | samples_dir: str = "samples" 217 | 218 | 219 | class 
Config(BaseModel):
220 |     model_config = ConfigDict(protected_namespaces=())
221 |     files: FileParams
222 |     nuclei_fovs: NucleiFovParams
223 |     nuclei: NucleiParams
224 |     transcripts: TranscriptParams
225 |     affine: AffineParams
226 |     model_params: ModelParams
227 |     training_params: TrainingParams
228 |     testing_params: TestingParams
229 |     cpus: int
230 |     postprocess: PostprocessParams = PostprocessParams()
231 |     experiment_dirs: ExperimentDirs = ExperimentDirs()
232 |     cgm_params: CellGeneMatParams = CellGeneMatParams()
233 | 
234 | 
235 | def load_config(path: str) -> Config:
236 |     if not os.path.exists(path):
237 |         raise FileNotFoundError(
238 |             f"Config file at {path} could not be found. Please check if the filepath is valid."
239 |         )
240 | 
241 |     with open(path) as config_file:
242 |         try:
243 |             config = yaml.safe_load(config_file)
244 |         except Exception:
245 |             raise ValueError(
246 |                 "The supplied YAML config was invalid; try comparing it against the example configs."
247 |             )
248 | 
249 |     if not isinstance(config, dict):
250 |         raise ValueError(
251 |             "The supplied YAML config was invalid; try comparing it against the example configs."
252 |         )
253 | 
254 |     # validate the configuration schema
255 |     config = Config(**config)
256 |     return config
--------------------------------------------------------------------------------
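For illustration, this is how the validated configuration is loaded in code — a minimal sketch, assuming the package is installed and an example config (e.g. from `BIDCellModel.get_example_config("cosmx")`) is present in the working directory:
```py
from bidcell.config import load_config

config = load_config("cosmx_example_config.yaml")  # returns a validated Config instance
print(config.cpus)                # plain field
print(config.affine.scale_pix_x)  # computed field: base_pix_x / target_pix_um
```
The example parameter files below are what this function consumes.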
/bidcell/example_params/cosmx.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_cosmx_nsclc # data directory for processed/output data
 8 |   fp_dapi: # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_cosmx_nsclc/Lung5_Rep1_tx_file.csv # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_nsclc.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_nsclc_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_nsclc_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: True # set True to stitch separate FOVs of DAPI together in 1 image
16 |   dir_dapi: ./data_large/dataset_cosmx_nsclc/Lung5_Rep1-RawMorphologyImages # name of directory containing the DAPI FOV images
17 |   ext_dapi: tif # extension of the DAPI images
18 |   pattern_z: Z### # String pattern to find in the file names for the Z number, or None for no Z component
19 |   pattern_f: F### # String pattern to find in file names for the FOV number
20 |   channel_first: True # channel axis first (e.g. [5,H,W]) or last (e.g. [H,W,5]) in image volumes
21 |   channel_dapi: -1 # channel index of the DAPI images in the image volumes
22 |   n_fov: 30 # total number of FOVs
23 |   min_fov: 1 # smallest FOV number - usually 0 or 1
24 |   n_fov_h: 6 # number of FOVs tiled along vertical axis
25 |   n_fov_w: 5 # number of FOVs tiled along horizontal axis
26 |   start_corner: ul # position of first FOV - choose from ul, ur, bl, br
27 |   row_major: True # row major ordering of FOVs
28 |   z_level: 1 # which z slice to use, or set mip to use MIP
29 |   mip: False # take the maximum intensity projection across all Z
30 |   flip_ud: True # flip images up/down before stitching
31 | 
32 | nuclei:
33 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
34 | 
35 | transcripts:
36 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
37 |   x_col: x_global_px # name of x location column in transcripts file
38 |   y_col: y_global_px # name of y location column in transcripts file
39 |   gene_col: target # name of genes column in transcripts file
40 |   transcripts_to_filter: # genes starting with these strings will be filtered out
41 |     - NegControlProbe_
42 |     - antisense_
43 |     - NegControlCodeword_
44 |     - BLANK_
45 |     - Blank-
46 |     - NegPrb
47 | 
48 | affine:
49 |   target_pix_um: 0.5 # microns per pixel to perform segmentation; default: 1.0
50 |   base_pix_x: 0.18 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
51 |   base_pix_y: 0.18 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
52 |   base_ts_x: 0.18 # convert between transcript locations and target pixels along width
53 |   base_ts_y: 0.18 # convert between transcript locations and target pixels along height
54 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
55 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
56 | 
57 | model_params:
58 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
59 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
60 |   elongated: # list of elongated cell types that are in the single-cell reference
61 |     - Adventitial fibroblasts
62 |     - Alveolar fibroblasts
63 |     - Peribronchial fibroblasts
64 |     - Subpleural fibroblasts
65 |     - Myofibroblasts
66 |     - Fibromyocytes
67 | 
68 | training_params:
69 |   total_epochs: 1 # number of training epochs; default: 1
70 |   total_steps: 4000 # number of training steps; default: 4000
71 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
72 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
73 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
74 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
75 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
76 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
77 | 
78 | testing_params:
79 |   test_epoch: 1 # epoch to test; default: 1
80 |   test_step: 4000 # step number to test; default: 4000
81 | 
82 | experiment_dirs:
83 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/merscope.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_merscope_melanoma2 # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_merscope_melanoma2/HumanMelanomaPatient2_images_mosaic_DAPI_z0.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_merscope_melanoma2/HumanMelanomaPatient2_detected_transcripts.csv # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_melanoma.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_melanoma_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_melanoma_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: global_x # name of x location column in transcripts file
23 |   y_col: global_y # name of y location column in transcripts file
24 |   gene_col: gene # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.107999132774 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.107997631125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 12 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 10 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 4000 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 4000 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/small_example.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 8 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./example_data/dataset_xenium_breast1_small # data directory for processed/output data
 8 |   fp_dapi: ./example_data/dataset_xenium_breast1_small/morphology_mip_small.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./example_data/dataset_xenium_breast1_small/transcripts_small.csv # path of transcripts file
10 |   fp_ref: ./example_data/sc_references/sc_breast.csv # file path of reference data
11 |   fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: x_location # name of x location column in transcripts file
23 |   y_col: y_location # name of y location column in transcripts file
24 |   gene_col: feature_name # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.2125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 48 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 60 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 60 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/stereoseq.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files:
 7 |   data_dir: ./data_large/dataset_stereoseq_mousebrain # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_stereoseq_mousebrain/Mouse_brain_Adult.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_stereoseq_mousebrain/Mouse_brain_Adult_GEM_bin1.tsv.gz # path of transcripts file
10 |   fp_ref: ./data_large/sc_references/sc_mousebrain.csv # file path of reference data
11 |   fp_pos_markers: ./data_large/sc_references/sc_mousebrain_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./data_large/sc_references/sc_mousebrain_markers_neg.csv # file path of negative markers
13 |   fp_selected_genes: ./data_large/dataset_stereoseq_mousebrain/example_mousebrain_genes.txt # file containing names of genes to use
14 | 
15 | nuclei_fovs:
16 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
17 | 
18 | nuclei:
19 |   diameter: 30 # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
20 |   crop_nuclei_to_ts: True # crop nuclei to region of transcript detections - run generate_expression_maps() before segment_nuclei()
21 | 
22 | transcripts:
23 |   shift_to_origin: True # shift to origin, making min(x) and min(y) (0,0)
24 |   x_col: y # name of x location column in transcripts file
25 |   y_col: x # name of y location column in transcripts file
26 |   gene_col: geneID # name of genes column in transcripts file
27 |   counts_col: MIDCounts # name of counts column in transcripts file, eg MIDCounts, default: None
28 |   transcripts_to_filter: # genes starting with these strings will be filtered out
29 |     - NegControlProbe_
30 |     - antisense_
31 |     - NegControlCodeword_
32 |     - BLANK_
33 |     - Blank-
34 |     - NegPrb
35 | 
36 | affine:
37 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
38 |   base_pix_x: 1.0 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
39 |   base_pix_y: 1.0 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
40 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
41 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
42 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
43 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
44 | 
45 | model_params:
46 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
47 |   patch_size: 64 # size of transcriptomic image patches for input to DL model
48 |   elongated: # list of elongated cell types that are in the single-cell reference
49 |     - Astro
50 |     - CA1
51 |     - CA2
52 |     - CA3
53 |     - CR
54 |     - CT SUB
55 |     - Car3
56 |     - DG
57 |     - Endo
58 |     - IT HATA
59 |     - L2 IT APr
60 |     - L2 IT ENTl
61 |     - L2 IT ENTm
62 |     - L2 IT RSP-ACA
63 |     - L2 IT RSPv-POST-PRE
64 |     - L2/3 IT AI
65 |     - L2/3 IT CTX
66 |     - L2/3 IT ENTl
67 |     - L2/3 IT PAR
68 |     - L2/3 IT POST-PRE
69 |     - L2/3 IT ProS
70 |     - L3 IT ENTl
71 |     - L3 IT ENTm
72 |     - L4 IT CTX
73 |     - L4 RSP-ACA
74 |     - L4/5 IT CTX
75 |     - L5 IT CTX
76 |     - L5 IT RSP-ACA
77 |     - L5 PPP
78 |     - L5 PT CTX
79 |     - L5/6 IT CTX
80 |     - L5/6 IT PFC
81 |     - L5/6 IT TPE-ENT
82 |     - L5/6 NP CT CTX
83 |     - L5/6 NP CTX
84 |     - L6 CT CTX
85 |     - L6 CT ENT
86 |     - L6 IT CTX
87 |     - L6 IT ENTl
88 |     - L6b CTX
89 |     - L6b ENT
90 |     - L6b RHP
91 |     - Lamp5
92 |     - Meis2
93 |     - Micro-PVM
94 |     - Mossy
95 |     - NP PPP
96 |     - NP SUB
97 |     - Ntng1 HPF
98 |     - Oligo
99 |     - Pax6
100 |     - ProS
101 |     - Pvalb
102 |     - SMC-Peri
103 |     - SUB
104 |     - Sncg
105 |     - Sst
106 |     - VLMC
107 |     - Vip
108 | 
109 | training_params:
110 |   total_epochs: 1 # number of training epochs; default: 1
111 |   total_steps: 4000 # number of training steps; default: 4000
112 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
113 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
114 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
115 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
116 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
117 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
118 | 
119 | testing_params:
120 |   test_epoch: 1 # epoch to test; default: 1
121 |   test_step: 4000 # step number to test; default: 4000
122 | 
123 | experiment_dirs:
124 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/example_params/xenium.yaml:
--------------------------------------------------------------------------------
 1 | # for functions in bidcell/processing
 2 | # NOTE: Commented options default to None
 3 | 
 4 | cpus: 16 # number of CPUs for multiprocessing
 5 | 
 6 | files: # NOTE: please ensure these point to the right locations
 7 |   data_dir: ./data_large/dataset_xenium_breast1 # data directory for processed/output data
 8 |   fp_dapi: ./data_large/dataset_xenium_breast1/morphology_mip.ome.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
 9 |   fp_transcripts: ./data_large/dataset_xenium_breast1/transcripts.csv.gz # path of transcripts file
10 |   fp_ref: ./example_data/sc_references/sc_breast.csv # file path of reference data
11 |   fp_pos_markers: ./example_data/sc_references/sc_breast_markers_pos.csv # file path of positive markers
12 |   fp_neg_markers: ./example_data/sc_references/sc_breast_markers_neg.csv # file path of negative markers
13 | 
14 | nuclei_fovs:
15 |   stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image
16 | 
17 | nuclei:
18 |   diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None
19 | 
20 | transcripts:
21 |   shift_to_origin: False # shift to origin, making min(x) and min(y) (0,0)
22 |   x_col: x_location # name of x location column in transcripts file
23 |   y_col: y_location # name of y location column in transcripts file
24 |   gene_col: feature_name # name of genes column in transcripts file
25 |   transcripts_to_filter: # genes starting with these strings will be filtered out
26 |     - NegControlProbe_
27 |     - antisense_
28 |     - NegControlCodeword_
29 |     - BLANK_
30 |     - Blank-
31 |     - NegPrb
32 | 
33 | affine:
34 |   target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
35 |   base_pix_x: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
36 |   base_pix_y: 0.2125 # convert to microns along height by multiplying the original pixels by base_pix_y microns per pixel
37 |   base_ts_x: 1.0 # convert between transcript locations and target pixels along width
38 |   base_ts_y: 1.0 # convert between transcript locations and target pixels along height
39 |   global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
40 |   global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0
41 | 
42 | model_params:
43 |   name: custom # segmentation model to use: custom for model in model.py or set to an encoder name from segmentation_models_pytorch; default: custom
44 |   patch_size: 48 # size of transcriptomic image patches for input to DL model
45 |   elongated: # list of elongated cell types that are in the single-cell reference
46 |     - Endothelial
47 |     - Fibroblasts
48 |     - Myofibroblasts
49 |     - SMC
50 | 
51 | training_params:
52 |   total_epochs: 1 # number of training epochs; default: 1
53 |   total_steps: 4000 # number of training steps; default: 4000
54 |   ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
55 |   os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
56 |   cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
57 |   ov_weight: 1.0 # weight for overlap loss; default: 1.0
58 |   pos_weight: 1.0 # weight for positive marker loss; default: 1.0
59 |   neg_weight: 1.0 # weight for negative marker loss; default: 1.0
60 | 
61 | testing_params:
62 |   test_epoch: 1 # epoch to test; default: 1
63 |   test_step: 4000 # step number to test; default: 4000
64 | 
65 | experiment_dirs:
66 |   dir_id: last # specify timestamp of output dir or leave as "last" to use the most recent dir; default: last
--------------------------------------------------------------------------------
/bidcell/model/dataio/dataset_input.py:
--------------------------------------------------------------------------------
 1 | import glob
 2 | import os
 3 | import random
 4 | import re
 5 | import sys
 6 | import warnings
 7 | 
 8 | import cv2
 9 | import h5py
10 | import imgaug.augmenters as iaa
11 | import natsort
12 | import numpy as np
13 | import pandas as pd
14 | import tifffile
15 | import torch
16 | import torch.utils.data as data
17 | from scipy.ndimage import rotate
18 | 
19 | warnings.filterwarnings("ignore", category=np.VisibleDeprecationWarning)
20 | 
21 | 
22 | class DataProcessing(data.Dataset):
23 |     def __init__(
24 |         self,
25 |         config,
26 |         isTraining=True,
27 |         all_patches=True,
28 |         shift_patches=0,
29 |         total_steps=None,
30 |     ):
31 |         self.patch_size = config.model_params.patch_size
32 |         self.isTraining = isTraining
33 | 
34 |         self.expr_fp = (
35 |             config.files.data_dir
36 |             + "/"
37 |             + config.files.dir_out_maps
38 |             + "/"
39 |             + config.files.dir_patches
40 |             + str(self.patch_size)
41 |             + "x"
42 |             + str(self.patch_size)
43 |             + "_shift_"
44 |             + str(shift_patches)
45 |         )
46 | 
47 |         self.nuclei_fp = os.path.join(config.files.data_dir, config.files.fp_nuclei)
48 |         self.nuclei_types_fp = os.path.join(
49 |             config.files.data_dir, config.files.fp_nuclei_anno
50 |         )
51 |         
self.gene_names_fp = os.path.join( 52 | config.files.data_dir, config.files.fp_gene_names 53 | ) 54 | 55 | self.pos_markers_fp = config.files.fp_pos_markers 56 | self.neg_markers_fp = config.files.fp_neg_markers 57 | self.ref_fp = config.files.fp_ref 58 | 59 | # Check valid data directories 60 | if not os.path.exists(self.expr_fp): 61 | sys.exit("Invalid file path %s" % self.expr_fp) 62 | if not os.path.exists(self.nuclei_fp): 63 | sys.exit("Invalid file path %s" % self.nuclei_fp) 64 | if not os.path.exists(self.nuclei_types_fp): 65 | sys.exit("Invalid file path %s" % self.nuclei_types_fp) 66 | if not os.path.exists(self.pos_markers_fp): 67 | sys.exit("Invalid file path %s" % self.pos_markers_fp) 68 | if not os.path.exists(self.neg_markers_fp): 69 | sys.exit("Invalid file path %s" % self.neg_markers_fp) 70 | if not os.path.exists(self.ref_fp): 71 | sys.exit("Invalid file path %s" % self.ref_fp) 72 | if not os.path.exists(self.gene_names_fp): 73 | sys.exit("Invalid file path %s" % self.gene_names_fp) 74 | 75 | self.nuclei = tifffile.imread(self.nuclei_fp) 76 | self.nuclei = self.nuclei.astype(np.int32) 77 | print("Loaded nuclei") 78 | print(self.nuclei.shape) 79 | 80 | expr_fp_ext = ".hdf5" 81 | fp_patches_all = glob.glob(self.expr_fp + "/*" + expr_fp_ext) 82 | fp_patches_all = natsort.natsorted(fp_patches_all) 83 | 84 | # # Get coordinates of non-overlapping patches 85 | # if shift_patches == 0: 86 | # h_starts = list(np.arange(0, self.nuclei.shape[0]-self.patch_size, self.patch_size)) 87 | # w_starts = list(np.arange(0, self.nuclei.shape[1]-self.patch_size, self.patch_size)) 88 | 89 | # # Include remainder patches on 90 | # h_starts.append(self.nuclei.shape[0]-self.patch_size) 91 | # w_starts.append(self.nuclei.shape[1]-self.patch_size) 92 | # else: 93 | # h_starts = list(np.arange(shift_patches, self.nuclei.shape[0]-self.patch_size, self.patch_size)) 94 | # w_starts = list(np.arange(shift_patches, self.nuclei.shape[1]-self.patch_size, self.patch_size)) 95 | 96 | # coords_starts = [(x, y) for x in h_starts for y in w_starts] 97 | 98 | # Randomly select train/test samples 99 | random.seed(1234) 100 | n_coords = len(fp_patches_all) 101 | sample_ids = range(n_coords) 102 | sample_k = int(80 * n_coords / 100) 103 | train_ids = random.sample(sample_ids, k=sample_k) 104 | 105 | if self.isTraining: 106 | self.fp_patches = [fp_patches_all[x] for x in train_ids] 107 | if total_steps is not None: 108 | total_steps = total_steps + int(0.05 * len(train_ids)) 109 | if total_steps <= len(train_ids): 110 | self.fp_patches = self.fp_patches[:total_steps] 111 | elif all_patches: 112 | self.fp_patches = fp_patches_all 113 | else: 114 | test_ids = [x for x in sample_ids if x not in train_ids] 115 | self.fp_patches = [fp_patches_all[x] for x in test_ids] 116 | 117 | print("%d patches available" % len(self.fp_patches)) 118 | 119 | # if self.isTraining: 120 | # self.coords_starts = [coords_starts[x] for x in train_ids] 121 | # self.coords_starts = self.coords_starts 122 | # elif all_patches == True: 123 | # self.coords_starts = coords_starts 124 | # else: 125 | # test_ids = [x for x in sample_ids if x not in train_ids] 126 | # self.coords_starts = [coords_starts[x] for x in test_ids] 127 | 128 | # Nuclei IDs with cell types and elongated nuclei 129 | h5f = h5py.File(self.nuclei_types_fp, "r") 130 | self.nuclei_types_idx = list(h5f["data"][:]) 131 | self.nuclei_types_ids = list(h5f["ids"][:]) 132 | h5f.close() 133 | 134 | # Get order of cell-types from sc reference 135 | atlas_exprs = pd.read_csv(self.ref_fp, 
index_col=0) 136 | ct_idx_ref = atlas_exprs["ct_idx"].tolist() 137 | ct_ref = atlas_exprs["cell_type"].tolist() 138 | name_index_dict = {} 139 | for name, index in zip(ct_ref, ct_idx_ref): 140 | name_index_dict[index] = name 141 | type_names = [name_index_dict[index] for index in sorted(ct_idx_ref)] 142 | self.type_names = list(dict.fromkeys(type_names)) 143 | print(f"Cell types: {self.type_names}") 144 | 145 | types_elong = config.model_params.elongated 146 | idx_elong = [self.type_names.index(x) for x in types_elong] 147 | nuclei_types_elong = [1 if x in idx_elong else 0 for x in self.nuclei_types_idx] 148 | self.nuclei_ids_elong = [ 149 | x for i, x in enumerate(self.nuclei_types_ids) if nuclei_types_elong[i] == 1 150 | ] 151 | 152 | # print('%d patches available' %len(self.coords_starts)) 153 | 154 | df_pos_markers = pd.read_csv(self.pos_markers_fp, index_col=0) 155 | df_neg_markers = pd.read_csv(self.neg_markers_fp, index_col=0) 156 | 157 | # self.pos_markers = df_pos_markers.to_numpy() 158 | # self.neg_markers = df_neg_markers.to_numpy() 159 | self.pos_markers = df_pos_markers 160 | self.neg_markers = df_neg_markers 161 | 162 | with open(self.gene_names_fp) as file: 163 | self.gene_names = [line.rstrip() for line in file] 164 | 165 | def augment_data(self, batch_raw): 166 | batch_raw = np.expand_dims(batch_raw, 0) 167 | 168 | # Original, horizontal 169 | random_flip = np.random.randint(2, size=1)[0] 170 | # 0, 90, 180, 270 171 | random_rotate = np.random.randint(4, size=1)[0] 172 | 173 | # Flips 174 | if random_flip == 0: 175 | batch_flip = batch_raw * 1 176 | else: 177 | batch_flip = iaa.Flipud(1.0)(images=batch_raw) 178 | 179 | # Rotations 180 | if random_rotate == 0: 181 | batch_rotate = batch_flip * 1 182 | elif random_rotate == 1: 183 | batch_rotate = iaa.Rot90(1, keep_size=True)(images=batch_flip) 184 | elif random_rotate == 2: 185 | batch_rotate = iaa.Rot90(2, keep_size=True)(images=batch_flip) 186 | else: 187 | batch_rotate = iaa.Rot90(3, keep_size=True)(images=batch_flip) 188 | 189 | images_aug_array = np.array(batch_rotate) 190 | 191 | return images_aug_array, random_flip, random_rotate 192 | 193 | def normalise_images(self, imgs): 194 | return (imgs - self.fold_mean) / self.fold_std 195 | 196 | def __len__(self): 197 | "Denotes the total number of samples" 198 | return len(self.fp_patches) 199 | 200 | def __getitem__(self, index): 201 | "Generates one sample of data" 202 | 203 | patch_fp = self.fp_patches[index] 204 | h5f = h5py.File(patch_fp, "r") 205 | expr = h5f["data"][:].astype(np.float64) 206 | h5f.close() 207 | 208 | # Global coordinates 209 | expr_fp_g = re.findall(r"\d+", os.path.basename(patch_fp)) 210 | coords_h1 = int(expr_fp_g[0]) 211 | coords_w1 = int(expr_fp_g[1]) 212 | coords_h2 = coords_h1 + self.patch_size 213 | coords_w2 = coords_w1 + self.patch_size 214 | 215 | nucl = self.nuclei[coords_h1:coords_h2, coords_w1:coords_w2] 216 | 217 | assert expr.shape[0] == self.patch_size, print(expr.shape[0]) 218 | assert expr.shape[1] == self.patch_size, print(expr.shape[1]) 219 | 220 | img = np.concatenate((expr, np.expand_dims(nucl, -1)), -1) 221 | 222 | if self.isTraining: 223 | img, _, _ = self.augment_data(img) 224 | img = img[0, :, :, :] 225 | 226 | expr_aug = img[:, :, :-1] 227 | nucl_aug = img[:, :, -1] 228 | 229 | # Tile cells individually along channel axis 230 | cell_ids, _ = np.unique(nucl_aug, return_index=True) 231 | cell_ids = cell_ids[cell_ids != 0] 232 | n_cells = len(cell_ids) 233 | 234 | nucl_split = np.zeros((self.patch_size, self.patch_size, 
n_cells)) 235 | search_areas = np.zeros((self.patch_size, self.patch_size, n_cells)) 236 | 237 | # For non-elongated cells 238 | kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (20, 20)) 239 | 240 | # For elongated cells 241 | ksizevmin = 3 242 | ksize_total = 60 243 | ecc_scale = 0.9 244 | 245 | # For pos/neg marker loss 246 | search_pos = np.zeros((self.patch_size, self.patch_size, n_cells)) 247 | search_neg = np.zeros((self.patch_size, self.patch_size, n_cells)) 248 | kernel_posneg = np.ones((3, 3), dtype=np.uint8) 249 | 250 | for i_cell, c_id in enumerate(cell_ids): 251 | nucl_split[:, :, i_cell] = np.where(nucl_aug == c_id, 1, 0) 252 | 253 | if c_id not in self.nuclei_ids_elong: 254 | # Not elongated 255 | search_areas[:, :, i_cell] = cv2.dilate( 256 | nucl_split[:, :, i_cell], kernel, iterations=1 257 | ) 258 | else: 259 | # Elongated 260 | try: 261 | contours = cv2.findContours( 262 | nucl_split[:, :, i_cell].astype(np.uint8), 263 | cv2.RETR_LIST, 264 | cv2.CHAIN_APPROX_NONE, 265 | ) 266 | 267 | ellipse = cv2.fitEllipse(np.squeeze(contours[0])) 268 | (center, axes, orientation) = ellipse 269 | majoraxis_length = max(axes) 270 | minoraxis_length = min(axes) 271 | eccentricity = np.sqrt( 272 | 1 - (minoraxis_length / majoraxis_length) ** 2 273 | ) 274 | 275 | # Get ellipse filter based on eccentricity and majoraxis length 276 | # Rotate based on orientation 277 | # https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html 278 | ksizeh = int(round(ecc_scale * eccentricity * ksize_total)) 279 | ksizev = ( 280 | ksize_total - ksizeh 281 | if (ksize_total - ksizeh) > ksizevmin 282 | else ksizevmin 283 | ) 284 | kernel_elong = cv2.getStructuringElement( 285 | cv2.MORPH_ELLIPSE, (ksizeh, ksizev) 286 | ) 287 | kernel_elong = rotate(kernel_elong, 90 - orientation, reshape=True) 288 | 289 | search_areas[:, :, i_cell] = cv2.dilate( 290 | nucl_split[:, :, i_cell], kernel_elong, iterations=1 291 | ) 292 | except Exception: 293 | search_areas[:, :, i_cell] = cv2.dilate( 294 | nucl_split[:, :, i_cell], kernel, iterations=1 295 | ) 296 | 297 | ct_nucleus = int(self.nuclei_types_idx[self.nuclei_types_ids.index(c_id)]) 298 | ct_nucleus_name = self.type_names[ct_nucleus] 299 | 300 | # Markers with dilation 301 | # ct_pos = np.expand_dims(np.expand_dims(self.pos_markers[ct_nucleus,:], 0),0)*expr_aug 302 | pos_vals = self.pos_markers.loc[ct_nucleus_name, self.gene_names].to_numpy() 303 | ct_pos = np.expand_dims(np.expand_dims(pos_vals, 0), 0) * expr_aug 304 | ct_pos = np.sum(ct_pos, -1) 305 | ct_pos[ct_pos > 0] = 1 306 | ct_pos[ct_pos < 0] = 0 307 | search_pos[:, :, i_cell] = search_areas[:, :, i_cell] * cv2.dilate( 308 | ct_pos, kernel_posneg, iterations=1 309 | ) 310 | search_pos[search_pos > 0] = 1 311 | search_pos[search_pos < 0] = 0 312 | 313 | # ct_neg = np.expand_dims(np.expand_dims(self.neg_markers[ct_nucleus,:], 0),0)*expr_aug 314 | neg_vals = self.neg_markers.loc[ct_nucleus_name, self.gene_names].to_numpy() 315 | ct_neg = np.expand_dims(np.expand_dims(neg_vals, 0), 0) * expr_aug 316 | ct_neg = np.sum(ct_neg, -1) 317 | ct_neg[ct_neg > 0] = 1 318 | ct_neg[ct_neg < 0] = 0 319 | search_neg[:, :, i_cell] = search_areas[:, :, i_cell] * cv2.dilate( 320 | ct_neg, kernel_posneg, iterations=1 321 | ) 322 | search_neg[search_neg > 0] = 1 323 | search_neg[search_neg < 0] = 0 324 | 325 | search_areas[search_areas > 0] = 1 326 | search_areas[search_areas < 0] = 0 327 | 328 | expr_aug_sum = np.sum(expr_aug, -1) 329 | 330 | # Mask 
expressions and change channel order 331 | expr_split = np.repeat(expr_aug[:, :, :, np.newaxis], n_cells, axis=3) 332 | expr_split = expr_split * np.expand_dims(search_areas, 2) 333 | 334 | # Convert to tensor 335 | expr_torch = torch.from_numpy(expr_split).float() 336 | nucl_torch = torch.from_numpy(nucl_split).long() 337 | search_areas_torch = torch.from_numpy(search_areas).long() 338 | search_pos_torch = torch.from_numpy(search_pos).long() 339 | search_neg_torch = torch.from_numpy(search_neg).long() 340 | 341 | if self.isTraining: 342 | return ( 343 | expr_torch, 344 | nucl_torch, 345 | search_areas_torch, 346 | search_pos_torch, 347 | search_neg_torch, 348 | coords_h1, 349 | coords_w1, 350 | nucl_aug, 351 | expr_aug_sum, 352 | ) 353 | else: 354 | return ( 355 | expr_torch, 356 | nucl_torch, 357 | search_areas_torch, 358 | search_pos_torch, 359 | search_neg_torch, 360 | coords_h1, 361 | coords_w1, 362 | nucl_aug, 363 | expr_aug_sum, 364 | self.nuclei.shape[0], 365 | self.nuclei.shape[1], 366 | self.expr_fp, 367 | ) 368 | -------------------------------------------------------------------------------- /bidcell/model/model/intialisation.py: -------------------------------------------------------------------------------- 1 | from torch.nn import init 2 | 3 | 4 | def weights_init_normal(m): 5 | classname = m.__class__.__name__ 6 | if classname.find("Conv") != -1: 7 | init.normal_(m.weight.data, 0.0, 0.02) 8 | elif classname.find("Linear") != -1: 9 | init.normal_(m.weight.data, 0.0, 0.02) 10 | elif classname.find("BatchNorm") != -1: 11 | init.normal_(m.weight.data, 1.0, 0.02) 12 | init.constant_(m.bias.data, 0.0) 13 | 14 | 15 | def weights_init_xavier(m): 16 | classname = m.__class__.__name__ 17 | if classname.find("Conv") != -1: 18 | init.xavier_normal_(m.weight.data, gain=1) 19 | elif classname.find("Linear") != -1: 20 | init.xavier_normal_(m.weight.data, gain=1) 21 | elif classname.find("BatchNorm") != -1: 22 | init.normal_(m.weight.data, 1.0, 0.02) 23 | init.constant_(m.bias.data, 0.0) 24 | 25 | 26 | def weights_init_kaiming(m): 27 | classname = m.__class__.__name__ 28 | if classname.find("Conv") != -1: 29 | init.kaiming_normal_(m.weight.data, a=0, mode="fan_in") 30 | elif classname.find("Linear") != -1: 31 | init.kaiming_normal_(m.weight.data, a=0, mode="fan_in") 32 | elif classname.find("BatchNorm") != -1: 33 | init.normal_(m.weight.data, 1.0, 0.02) 34 | init.constant_(m.bias.data, 0.0) 35 | 36 | 37 | def weights_init_orthogonal(m): 38 | classname = m.__class__.__name__ 39 | if classname.find("Conv") != -1: 40 | init.orthogonal_(m.weight.data, gain=1) 41 | elif classname.find("Linear") != -1: 42 | init.orthogonal_(m.weight.data, gain=1) 43 | elif classname.find("BatchNorm") != -1: 44 | init.normal_(m.weight.data, 1.0, 0.02) 45 | init.constant_(m.bias.data, 0.0) 46 | 47 | 48 | def init_weights(net, init_type="normal"): 49 | if init_type == "normal": 50 | net.apply(weights_init_normal) 51 | elif init_type == "xavier": 52 | net.apply(weights_init_xavier) 53 | elif init_type == "kaiming": 54 | net.apply(weights_init_kaiming) 55 | elif init_type == "orthogonal": 56 | net.apply(weights_init_orthogonal) 57 | else: 58 | raise NotImplementedError( 59 | "initialization method [%s] is not implemented" % init_type 60 | ) 61 | -------------------------------------------------------------------------------- /bidcell/model/model/layers.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from .intialisation import 
init_weights 4 | 5 | 6 | class unetConv2(nn.Module): 7 | def __init__(self, in_size, out_size, is_batchnorm, n=2, ks=3, stride=1, padding=1): 8 | super(unetConv2, self).__init__() 9 | self.n = n 10 | self.ks = ks 11 | self.stride = stride 12 | self.padding = padding 13 | s = stride 14 | p = padding 15 | 16 | if is_batchnorm: 17 | for i in range(1, n + 1): 18 | conv = nn.Sequential( 19 | nn.Conv2d(in_size, out_size, ks, s, p), 20 | nn.BatchNorm2d(out_size), 21 | nn.ReLU(inplace=True), 22 | ) 23 | setattr(self, "conv%d" % i, conv) 24 | in_size = out_size 25 | else: 26 | for i in range(1, n + 1): 27 | conv = nn.Sequential( 28 | nn.Conv2d(in_size, out_size, ks, s, p), 29 | nn.ReLU(inplace=True), 30 | ) 31 | setattr(self, "conv%d" % i, conv) 32 | in_size = out_size 33 | 34 | # initialise the blocks 35 | for m in self.children(): 36 | init_weights(m, init_type="kaiming") 37 | 38 | def forward(self, inputs): 39 | x = inputs 40 | for i in range(1, self.n + 1): 41 | conv = getattr(self, "conv%d" % i) 42 | x = conv(x) 43 | return x 44 | 45 | 46 | class unetUp(nn.Module): 47 | def __init__(self, in_size, out_size, is_deconv, n_concat=2): 48 | super(unetUp, self).__init__() 49 | self.conv = unetConv2(out_size * 2, out_size, False) 50 | 51 | if is_deconv: 52 | self.up = nn.ConvTranspose2d( 53 | in_size, out_size, kernel_size=4, stride=2, padding=1 54 | ) 55 | else: 56 | self.up = nn.UpsamplingBilinear2d(scale_factor=2) 57 | 58 | # initialise the blocks 59 | for m in self.children(): 60 | if m.__class__.__name__.find("unetConv2") != -1: 61 | continue 62 | init_weights(m, init_type="kaiming") 63 | 64 | def forward(self, inputs0, *input): 65 | outputs0 = self.up(inputs0) 66 | for i in range(len(input)): 67 | outputs0 = torch.cat([outputs0, input[i]], 1) 68 | return self.conv(outputs0) 69 | 70 | 71 | class unetUp_origin(nn.Module): 72 | def __init__(self, in_size, out_size, is_deconv, n_concat=2): 73 | super(unetUp_origin, self).__init__() 74 | # self.conv = unetConv2(out_size*2, out_size, False) 75 | if is_deconv: 76 | self.conv = unetConv2(in_size + (n_concat - 2) * out_size, out_size, False) 77 | self.up = nn.ConvTranspose2d( 78 | in_size, out_size, kernel_size=4, stride=2, padding=1 79 | ) 80 | else: 81 | self.conv = unetConv2(in_size + (n_concat - 2) * out_size, out_size, False) 82 | self.up = nn.UpsamplingBilinear2d(scale_factor=2) 83 | 84 | # initialise the blocks 85 | for m in self.children(): 86 | if m.__class__.__name__.find("unetConv2") != -1: 87 | continue 88 | init_weights(m, init_type="kaiming") 89 | 90 | def forward(self, inputs0, *input): 91 | # print(self.n_concat) 92 | # print(input) 93 | outputs0 = self.up(inputs0) 94 | for i in range(len(input)): 95 | outputs0 = torch.cat([outputs0, input[i]], 1) 96 | return self.conv(outputs0) 97 | -------------------------------------------------------------------------------- /bidcell/model/model/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class NucleiEncapsulationLoss(nn.Module): 6 | """ 7 | Ensure that nuclei are fully within predicted cells 8 | """ 9 | 10 | def __init__(self, weight, device) -> None: 11 | super(NucleiEncapsulationLoss, self).__init__() 12 | self.weight = weight 13 | self.device = device 14 | 15 | def forward(self, seg_pred, batch_n): 16 | criterion_ce = torch.nn.CrossEntropyLoss(reduction="mean") 17 | loss = criterion_ce(seg_pred, batch_n[:, 0, :, :]) 18 | 19 | return self.weight * loss 20 | 21 | 22 | class 
Oversegmentation(nn.Module): 23 | """ 24 | Minimise oversegmentation 25 | """ 26 | 27 | def __init__(self, weight, device) -> None: 28 | super(Oversegmentation, self).__init__() 29 | self.weight = weight 30 | self.device = device 31 | 32 | def forward(self, seg_pred, batch_n): 33 | batch_n = batch_n[:, 0, :, :] 34 | 35 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 36 | probs_nuc = seg_probs[:, 1, :, :] * batch_n 37 | 38 | mask_cyto = torch.ones(batch_n.shape).to(self.device) - batch_n 39 | probs_cyto = seg_probs[:, 1, :, :] * mask_cyto 40 | 41 | ones = torch.ones(probs_cyto.shape).to(self.device) 42 | zeros = torch.zeros(probs_cyto.shape).to(self.device) 43 | 44 | alpha = 1.0 45 | 46 | preds_nuc = torch.sigmoid((probs_nuc - 0.5) * alpha) * (ones - zeros) + zeros 47 | count_nuc = torch.sum(preds_nuc) 48 | 49 | preds_cyto = torch.sigmoid((probs_cyto - 0.5) * alpha) * (ones - zeros) + zeros 50 | count_cyto = torch.sum(preds_cyto) 51 | 52 | extra = count_cyto - count_nuc 53 | m = torch.nn.ReLU() 54 | loss = m(extra) 55 | 56 | loss = loss / seg_pred.shape[0] 57 | 58 | return self.weight * loss 59 | 60 | 61 | class CellCallingLoss(nn.Module): 62 | """ 63 | Maximise assignment of transcripts to cells 64 | """ 65 | 66 | def __init__(self, weight, device) -> None: 67 | super(CellCallingLoss, self).__init__() 68 | self.weight = weight 69 | self.device = device 70 | 71 | def forward(self, seg_pred, batch_sa): 72 | # Limit to searchable area where there is detected expression 73 | penalisable = batch_sa * 1 74 | criterion_ce = torch.nn.CrossEntropyLoss(reduction="none") 75 | loss = criterion_ce(seg_pred, penalisable[:, 0, :, :]) 76 | 77 | loss_total = torch.sum(loss) 78 | 79 | loss_total = loss_total / seg_pred.shape[0] 80 | 81 | return self.weight * loss_total 82 | 83 | 84 | class OverlapLoss(nn.Module): 85 | """ 86 | Penalise overlaps between different cells 87 | """ 88 | 89 | def __init__(self, weight, device) -> None: 90 | super(OverlapLoss, self).__init__() 91 | self.weight = weight 92 | self.device = device 93 | 94 | def forward(self, seg_pred, batch_n): 95 | batch_n = batch_n[:, 0, :, :] 96 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 97 | 98 | all_nuclei = torch.sum(batch_n, 0) 99 | all_not_nuclei = torch.ones(batch_n.shape).to(self.device) - all_nuclei 100 | 101 | probs_cyto = seg_probs[:, 1, :, :] * all_not_nuclei 102 | 103 | ones = torch.ones(probs_cyto.shape).to(self.device) 104 | zeros = torch.zeros(probs_cyto.shape).to(self.device) 105 | 106 | # Penalise if number of cell prob > 0.5 is > 1 107 | alpha = 1.0 108 | preds_cyto = torch.sigmoid((probs_cyto - 0.5) * alpha) * (ones - zeros) + zeros 109 | count_cyto_overlap = torch.sum(preds_cyto, 0) - all_not_nuclei 110 | m = torch.nn.ReLU() 111 | count_cyto_overlap = m(count_cyto_overlap) 112 | 113 | loss = torch.sum(count_cyto_overlap) 114 | 115 | scale = seg_pred.shape[0] * seg_pred.shape[2] * seg_pred.shape[3] 116 | loss = loss / scale 117 | 118 | return self.weight * loss 119 | 120 | 121 | class PosNegMarkerLoss(nn.Module): 122 | """ 123 | Positive and negative markers of cell type 124 | """ 125 | 126 | def __init__(self, weight_pos, weight_neg, device) -> None: 127 | super(PosNegMarkerLoss, self).__init__() 128 | self.weight_pos = weight_pos 129 | self.weight_neg = weight_neg 130 | self.device = device 131 | 132 | def forward(self, seg_pred, batch_pos, batch_neg): 133 | batch_pos = batch_pos[:, 0, :, :] 134 | batch_neg = batch_neg[:, 0, :, :] 135 | 136 | # POSITIVE markers 137 | criterion_ce = 
torch.nn.CrossEntropyLoss(reduction="sum") 138 | loss_pos = criterion_ce(seg_pred, batch_pos) 139 | 140 | # NEGATIVE markers 141 | seg_probs = torch.nn.functional.softmax(seg_pred, dim=1) 142 | probs_cells = seg_probs[:, 1, :, :] 143 | 144 | ones = torch.ones(probs_cells.shape).to(self.device) 145 | zeros = torch.zeros(probs_cells.shape).to(self.device) 146 | 147 | alpha = 1.0 148 | 149 | preds_cells = ( 150 | torch.sigmoid((probs_cells - 0.5) * alpha) * (ones - zeros) + zeros 151 | ) 152 | 153 | loss_neg = torch.sum(preds_cells * batch_neg) 154 | 155 | loss_total = ( 156 | self.weight_pos * loss_pos + self.weight_neg * loss_neg 157 | ) / seg_pred.shape[0] 158 | 159 | return loss_total 160 | -------------------------------------------------------------------------------- /bidcell/model/model/model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Credit: https://github.com/avBuffer/UNet3plus_pth 3 | import torch 4 | import torch.nn as nn 5 | 6 | from .intialisation import init_weights 7 | from .layers import unetConv2 8 | 9 | 10 | class SegmentationModel(nn.Module): 11 | def __init__( 12 | self, 13 | n_channels=313, 14 | bilinear=True, 15 | feature_scale=4, 16 | is_deconv=True, 17 | is_batchnorm=True, 18 | ): 19 | super(SegmentationModel, self).__init__() 20 | self.n_channels = n_channels 21 | self.bilinear = bilinear 22 | self.feature_scale = feature_scale 23 | self.is_deconv = is_deconv 24 | self.is_batchnorm = is_batchnorm 25 | filters = [64, 128, 256, 512, 1024] 26 | 27 | ## -------------Encoder-------------- 28 | self.conv1 = unetConv2(self.n_channels, filters[0], self.is_batchnorm) 29 | self.maxpool1 = nn.MaxPool2d(kernel_size=2) 30 | 31 | self.conv2 = unetConv2(filters[0], filters[1], self.is_batchnorm) 32 | self.maxpool2 = nn.MaxPool2d(kernel_size=2) 33 | 34 | self.conv3 = unetConv2(filters[1], filters[2], self.is_batchnorm) 35 | self.maxpool3 = nn.MaxPool2d(kernel_size=2) 36 | 37 | self.conv4 = unetConv2(filters[2], filters[3], self.is_batchnorm) 38 | self.maxpool4 = nn.MaxPool2d(kernel_size=2) 39 | 40 | self.conv5 = unetConv2(filters[3], filters[4], self.is_batchnorm) 41 | 42 | ## -------------Decoder-------------- 43 | self.CatChannels = filters[0] 44 | self.CatBlocks = 5 45 | self.UpChannels = self.CatChannels * self.CatBlocks 46 | 47 | """stage 4d""" 48 | # h1->320*320, hd4->40*40, Pooling 8 times 49 | self.h1_PT_hd4 = nn.MaxPool2d(8, 8, ceil_mode=True) 50 | self.h1_PT_hd4_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 51 | self.h1_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 52 | self.h1_PT_hd4_relu = nn.ReLU(inplace=True) 53 | 54 | # h2->160*160, hd4->40*40, Pooling 4 times 55 | self.h2_PT_hd4 = nn.MaxPool2d(4, 4, ceil_mode=True) 56 | self.h2_PT_hd4_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 57 | self.h2_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 58 | self.h2_PT_hd4_relu = nn.ReLU(inplace=True) 59 | 60 | # h3->80*80, hd4->40*40, Pooling 2 times 61 | self.h3_PT_hd4 = nn.MaxPool2d(2, 2, ceil_mode=True) 62 | self.h3_PT_hd4_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1) 63 | self.h3_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 64 | self.h3_PT_hd4_relu = nn.ReLU(inplace=True) 65 | 66 | # h4->40*40, hd4->40*40, Concatenation 67 | self.h4_Cat_hd4_conv = nn.Conv2d(filters[3], self.CatChannels, 3, padding=1) 68 | self.h4_Cat_hd4_bn = nn.BatchNorm2d(self.CatChannels) 69 | self.h4_Cat_hd4_relu = nn.ReLU(inplace=True) 70 | 71 | # hd5->20*20, hd4->40*40, Upsample 2 times 
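Every decoder branch constructed in this block follows the same UNet 3+ full-scale skip pattern: resample a feature map to the target resolution (max-pool down, bilinear upsample, or pass through at scale), project it to `CatChannels` (64) with a 3×3 convolution, then BatchNorm and ReLU; five such maps are concatenated into `UpChannels` (5 × 64 = 320) and fused. A minimal sketch of one branch, assuming a 64×64 input patch rather than the 320×320 used in the comments:
```py
# Hedged sketch of a single full-scale skip branch (illustration, not repo code).
import torch
import torch.nn as nn

h1 = torch.randn(1, 64, 64, 64)  # encoder stage-1 features at full resolution
branch = nn.Sequential(
    nn.MaxPool2d(8, 8, ceil_mode=True),  # 64x64 -> 8x8, the hd4 resolution here
    nn.Conv2d(64, 64, 3, padding=1),     # project to CatChannels = 64
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
print(branch(h1).shape)  # torch.Size([1, 64, 8, 8])
# Five such 64-channel maps are concatenated: 5 * 64 = 320 = UpChannels.
```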
72 | self.hd5_UT_hd4 = nn.Upsample( 73 | scale_factor=2, mode="bilinear", align_corners=False 74 | ) # 14*14 75 | self.hd5_UT_hd4_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 76 | self.hd5_UT_hd4_bn = nn.BatchNorm2d(self.CatChannels) 77 | self.hd5_UT_hd4_relu = nn.ReLU(inplace=True) 78 | 79 | # fusion(h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4) 80 | self.conv4d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 81 | self.bn4d_1 = nn.BatchNorm2d(self.UpChannels) 82 | self.relu4d_1 = nn.ReLU(inplace=True) 83 | 84 | """stage 3d""" 85 | # h1->320*320, hd3->80*80, Pooling 4 times 86 | self.h1_PT_hd3 = nn.MaxPool2d(4, 4, ceil_mode=True) 87 | self.h1_PT_hd3_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 88 | self.h1_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 89 | self.h1_PT_hd3_relu = nn.ReLU(inplace=True) 90 | 91 | # h2->160*160, hd3->80*80, Pooling 2 times 92 | self.h2_PT_hd3 = nn.MaxPool2d(2, 2, ceil_mode=True) 93 | self.h2_PT_hd3_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 94 | self.h2_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 95 | self.h2_PT_hd3_relu = nn.ReLU(inplace=True) 96 | 97 | # h3->80*80, hd3->80*80, Concatenation 98 | self.h3_Cat_hd3_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1) 99 | self.h3_Cat_hd3_bn = nn.BatchNorm2d(self.CatChannels) 100 | self.h3_Cat_hd3_relu = nn.ReLU(inplace=True) 101 | 102 | # hd4->40*40, hd4->80*80, Upsample 2 times 103 | self.hd4_UT_hd3 = nn.Upsample( 104 | scale_factor=2, mode="bilinear", align_corners=False 105 | ) # 14*14 106 | self.hd4_UT_hd3_conv = nn.Conv2d( 107 | self.UpChannels, self.CatChannels, 3, padding=1 108 | ) 109 | self.hd4_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 110 | self.hd4_UT_hd3_relu = nn.ReLU(inplace=True) 111 | 112 | # hd5->20*20, hd4->80*80, Upsample 4 times 113 | self.hd5_UT_hd3 = nn.Upsample( 114 | scale_factor=4, mode="bilinear", align_corners=False 115 | ) # 14*14 116 | self.hd5_UT_hd3_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 117 | self.hd5_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels) 118 | self.hd5_UT_hd3_relu = nn.ReLU(inplace=True) 119 | 120 | # fusion(h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3) 121 | self.conv3d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 122 | self.bn3d_1 = nn.BatchNorm2d(self.UpChannels) 123 | self.relu3d_1 = nn.ReLU(inplace=True) 124 | 125 | """stage 2d """ 126 | # h1->320*320, hd2->160*160, Pooling 2 times 127 | self.h1_PT_hd2 = nn.MaxPool2d(2, 2, ceil_mode=True) 128 | self.h1_PT_hd2_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 129 | self.h1_PT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 130 | self.h1_PT_hd2_relu = nn.ReLU(inplace=True) 131 | 132 | # h2->160*160, hd2->160*160, Concatenation 133 | self.h2_Cat_hd2_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1) 134 | self.h2_Cat_hd2_bn = nn.BatchNorm2d(self.CatChannels) 135 | self.h2_Cat_hd2_relu = nn.ReLU(inplace=True) 136 | 137 | # hd3->80*80, hd2->160*160, Upsample 2 times 138 | self.hd3_UT_hd2 = nn.Upsample( 139 | scale_factor=2, mode="bilinear", align_corners=False 140 | ) # 14*14 141 | self.hd3_UT_hd2_conv = nn.Conv2d( 142 | self.UpChannels, self.CatChannels, 3, padding=1 143 | ) 144 | self.hd3_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 145 | self.hd3_UT_hd2_relu = nn.ReLU(inplace=True) 146 | 147 | # hd4->40*40, hd2->160*160, Upsample 4 times 148 | self.hd4_UT_hd2 = nn.Upsample( 149 | scale_factor=4, mode="bilinear", align_corners=False 150 | ) # 14*14 151 | 
self.hd4_UT_hd2_conv = nn.Conv2d( 152 | self.UpChannels, self.CatChannels, 3, padding=1 153 | ) 154 | self.hd4_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 155 | self.hd4_UT_hd2_relu = nn.ReLU(inplace=True) 156 | 157 | # hd5->20*20, hd2->160*160, Upsample 8 times 158 | self.hd5_UT_hd2 = nn.Upsample( 159 | scale_factor=8, mode="bilinear", align_corners=False 160 | ) # 14*14 161 | self.hd5_UT_hd2_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 162 | self.hd5_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels) 163 | self.hd5_UT_hd2_relu = nn.ReLU(inplace=True) 164 | 165 | # fusion(h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2) 166 | self.conv2d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 167 | self.bn2d_1 = nn.BatchNorm2d(self.UpChannels) 168 | self.relu2d_1 = nn.ReLU(inplace=True) 169 | 170 | """stage 1d""" 171 | # h1->320*320, hd1->320*320, Concatenation 172 | self.h1_Cat_hd1_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1) 173 | self.h1_Cat_hd1_bn = nn.BatchNorm2d(self.CatChannels) 174 | self.h1_Cat_hd1_relu = nn.ReLU(inplace=True) 175 | 176 | # hd2->160*160, hd1->320*320, Upsample 2 times 177 | self.hd2_UT_hd1 = nn.Upsample( 178 | scale_factor=2, mode="bilinear", align_corners=False 179 | ) # 14*14 180 | self.hd2_UT_hd1_conv = nn.Conv2d( 181 | self.UpChannels, self.CatChannels, 3, padding=1 182 | ) 183 | self.hd2_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 184 | self.hd2_UT_hd1_relu = nn.ReLU(inplace=True) 185 | 186 | # hd3->80*80, hd1->320*320, Upsample 4 times 187 | self.hd3_UT_hd1 = nn.Upsample( 188 | scale_factor=4, mode="bilinear", align_corners=False 189 | ) # 14*14 190 | self.hd3_UT_hd1_conv = nn.Conv2d( 191 | self.UpChannels, self.CatChannels, 3, padding=1 192 | ) 193 | self.hd3_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 194 | self.hd3_UT_hd1_relu = nn.ReLU(inplace=True) 195 | 196 | # hd4->40*40, hd1->320*320, Upsample 8 times 197 | self.hd4_UT_hd1 = nn.Upsample( 198 | scale_factor=8, mode="bilinear", align_corners=False 199 | ) # 14*14 200 | self.hd4_UT_hd1_conv = nn.Conv2d( 201 | self.UpChannels, self.CatChannels, 3, padding=1 202 | ) 203 | self.hd4_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 204 | self.hd4_UT_hd1_relu = nn.ReLU(inplace=True) 205 | 206 | # hd5->20*20, hd1->320*320, Upsample 16 times 207 | self.hd5_UT_hd1 = nn.Upsample( 208 | scale_factor=16, mode="bilinear", align_corners=False 209 | ) # 14*14 210 | self.hd5_UT_hd1_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1) 211 | self.hd5_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels) 212 | self.hd5_UT_hd1_relu = nn.ReLU(inplace=True) 213 | 214 | # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1) 215 | self.conv1d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1) # 16 216 | self.bn1d_1 = nn.BatchNorm2d(self.UpChannels) 217 | self.relu1d_1 = nn.ReLU(inplace=True) 218 | 219 | # output 220 | self.outconv_seg = nn.Conv2d(self.UpChannels, 2, 3, padding=1) 221 | 222 | # initialise weights 223 | for m in self.modules(): 224 | if isinstance(m, nn.Conv2d): 225 | init_weights(m, init_type="kaiming") 226 | elif isinstance(m, nn.BatchNorm2d): 227 | init_weights(m, init_type="kaiming") 228 | 229 | def forward(self, inputs): 230 | ## -------------Encoder------------- 231 | h1 = self.conv1(inputs) # h1->320*320*64 232 | 233 | h2 = self.maxpool1(h1) 234 | h2 = self.conv2(h2) # h2->160*160*128 235 | 236 | h3 = self.maxpool2(h2) 237 | h3 = self.conv3(h3) # h3->80*80*256 238 | 239 | h4 = self.maxpool3(h3) 240 | h4 = self.conv4(h4) # h4->40*40*512 
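With four max-pooling stages in the encoder, input patches must have height and width divisible by 16 for the skip resolutions to line up. A quick end-to-end shape check (a sketch; the small `n_channels=8` is chosen purely for illustration, the package default being 313):
```py
# Hedged sanity check: the network maps (B, C, H, W) -> (B, 2, H, W).
import torch

from bidcell.model.model.model import SegmentationModel

model = SegmentationModel(n_channels=8).eval()
x = torch.randn(2, 8, 64, 64)  # H and W divisible by 16
with torch.no_grad():
    seg = model(x)
print(seg.shape)  # expected: torch.Size([2, 2, 64, 64])
```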
241 | 242 | h5 = self.maxpool4(h4) 243 | hd5 = self.conv5(h5) # h5->20*20*1024 244 | 245 | ## -------------Decoder------------- 246 | h1_PT_hd4 = self.h1_PT_hd4_relu( 247 | self.h1_PT_hd4_bn(self.h1_PT_hd4_conv(self.h1_PT_hd4(h1))) 248 | ) 249 | h2_PT_hd4 = self.h2_PT_hd4_relu( 250 | self.h2_PT_hd4_bn(self.h2_PT_hd4_conv(self.h2_PT_hd4(h2))) 251 | ) 252 | h3_PT_hd4 = self.h3_PT_hd4_relu( 253 | self.h3_PT_hd4_bn(self.h3_PT_hd4_conv(self.h3_PT_hd4(h3))) 254 | ) 255 | h4_Cat_hd4 = self.h4_Cat_hd4_relu(self.h4_Cat_hd4_bn(self.h4_Cat_hd4_conv(h4))) 256 | hd5_UT_hd4 = self.hd5_UT_hd4_relu( 257 | self.hd5_UT_hd4_bn(self.hd5_UT_hd4_conv(self.hd5_UT_hd4(hd5))) 258 | ) 259 | hd4 = self.relu4d_1( 260 | self.bn4d_1( 261 | self.conv4d_1( 262 | torch.cat( 263 | (h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4), 1 264 | ) 265 | ) 266 | ) 267 | ) # hd4->40*40*UpChannels 268 | 269 | h1_PT_hd3 = self.h1_PT_hd3_relu( 270 | self.h1_PT_hd3_bn(self.h1_PT_hd3_conv(self.h1_PT_hd3(h1))) 271 | ) 272 | h2_PT_hd3 = self.h2_PT_hd3_relu( 273 | self.h2_PT_hd3_bn(self.h2_PT_hd3_conv(self.h2_PT_hd3(h2))) 274 | ) 275 | h3_Cat_hd3 = self.h3_Cat_hd3_relu(self.h3_Cat_hd3_bn(self.h3_Cat_hd3_conv(h3))) 276 | hd4_UT_hd3 = self.hd4_UT_hd3_relu( 277 | self.hd4_UT_hd3_bn(self.hd4_UT_hd3_conv(self.hd4_UT_hd3(hd4))) 278 | ) 279 | hd5_UT_hd3 = self.hd5_UT_hd3_relu( 280 | self.hd5_UT_hd3_bn(self.hd5_UT_hd3_conv(self.hd5_UT_hd3(hd5))) 281 | ) 282 | hd3 = self.relu3d_1( 283 | self.bn3d_1( 284 | self.conv3d_1( 285 | torch.cat( 286 | (h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3), 1 287 | ) 288 | ) 289 | ) 290 | ) # hd3->80*80*UpChannels 291 | 292 | h1_PT_hd2 = self.h1_PT_hd2_relu( 293 | self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))) 294 | ) 295 | h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2))) 296 | hd3_UT_hd2 = self.hd3_UT_hd2_relu( 297 | self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))) 298 | ) 299 | hd4_UT_hd2 = self.hd4_UT_hd2_relu( 300 | self.hd4_UT_hd2_bn(self.hd4_UT_hd2_conv(self.hd4_UT_hd2(hd4))) 301 | ) 302 | hd5_UT_hd2 = self.hd5_UT_hd2_relu( 303 | self.hd5_UT_hd2_bn(self.hd5_UT_hd2_conv(self.hd5_UT_hd2(hd5))) 304 | ) 305 | hd2 = self.relu2d_1( 306 | self.bn2d_1( 307 | self.conv2d_1( 308 | torch.cat( 309 | (h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2), 1 310 | ) 311 | ) 312 | ) 313 | ) # hd2->160*160*UpChannels 314 | 315 | h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1))) 316 | hd2_UT_hd1 = self.hd2_UT_hd1_relu( 317 | self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))) 318 | ) 319 | hd3_UT_hd1 = self.hd3_UT_hd1_relu( 320 | self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))) 321 | ) 322 | hd4_UT_hd1 = self.hd4_UT_hd1_relu( 323 | self.hd4_UT_hd1_bn(self.hd4_UT_hd1_conv(self.hd4_UT_hd1(hd4))) 324 | ) 325 | hd5_UT_hd1 = self.hd5_UT_hd1_relu( 326 | self.hd5_UT_hd1_bn(self.hd5_UT_hd1_conv(self.hd5_UT_hd1(hd5))) 327 | ) 328 | hd1 = self.relu1d_1( 329 | self.bn1d_1( 330 | self.conv1d_1( 331 | torch.cat( 332 | (h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1), 1 333 | ) 334 | ) 335 | ) 336 | ) # hd1->320*320*UpChannels 337 | 338 | seg = self.outconv_seg(hd1) # d1->320*320*2 339 | 340 | return seg 341 | -------------------------------------------------------------------------------- /bidcell/model/postprocess_predictions.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import multiprocessing as mp 4 | import os 5 
| import random 6 | import re 7 | from collections import Counter 8 | 9 | import cv2 10 | import matplotlib.pyplot as plt 11 | import numpy as np 12 | import tifffile 13 | from scipy import ndimage as ndi 14 | from ..config import load_config, Config 15 | 16 | 17 | def get_n_processes(n_processes): 18 | """Number of CPUs for multiprocessing""" 19 | if n_processes is None: 20 | return mp.cpu_count() 21 | else: 22 | return n_processes if n_processes <= mp.cpu_count() else mp.cpu_count() 23 | 24 | 25 | def sorted_alphanumeric(data): 26 | convert = lambda text: int(text) if text.isdigit() else text.lower() 27 | alphanum_key = lambda key: [convert(c) for c in re.split("([0-9]+)", key)] 28 | return sorted(data, key=alphanum_key) 29 | 30 | 31 | def get_exp_dir(config): 32 | if config.files.dir_id == "last": 33 | folders = next(os.walk("model_outputs"))[1] 34 | folders = sorted_alphanumeric(folders) 35 | folder_last = folders[-1] 36 | dir_id = folder_last.replace("\\", "/") 37 | else: 38 | dir_id = config.files.dir_id 39 | 40 | return dir_id 41 | 42 | 43 | def postprocess_connect(img, nuclei): 44 | cell_ids = np.unique(img) 45 | cell_ids = cell_ids[1:] 46 | 47 | random.shuffle(cell_ids) 48 | 49 | kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)) 50 | 51 | # Touch diagonally = same object 52 | s = ndi.generate_binary_structure(2, 2) 53 | 54 | final = np.zeros(img.shape, dtype=np.uint32) 55 | 56 | for i in cell_ids: 57 | i_mask = np.where(img == i, 1, 0).astype(np.uint8) 58 | 59 | connected_mask = cv2.dilate(i_mask, kernel, iterations=2) 60 | connected_mask = cv2.erode(connected_mask, kernel, iterations=2) 61 | 62 | # Add nucleus as predicted by cellpose 63 | nucleus_mask = np.where(nuclei == i, 1, 0).astype(np.uint8) 64 | 65 | connected_mask = connected_mask + nucleus_mask 66 | connected_mask[connected_mask > 0] = 1 67 | 68 | unique_ids, num_ids = ndi.label(connected_mask, structure=s) 69 | if num_ids > 1: 70 | # The first element is always 0 (background) 71 | unique, counts = np.unique(unique_ids, return_counts=True) 72 | 73 | # Ensure counts in descending order 74 | counts, unique = (list(t) for t in zip(*sorted(zip(counts, unique)))) 75 | counts.reverse() 76 | unique.reverse() 77 | counts = np.array(counts) 78 | unique = np.array(unique) 79 | 80 | no_overlap = False 81 | 82 | # Rarely, the nucleus is not the largest segment 83 | for i_part in range(1, len(counts)): 84 | if i_part > 1: 85 | no_overlap = True # TODO: Helen, check this! 
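`postprocess_connect` repeatedly labels masks with `generate_binary_structure(2, 2)`, i.e. full 8-connectivity, so pixels that touch only diagonally still count as one object. A standalone toy example of why that structuring element matters:
```py
# Toy example: 8-connectivity merges diagonally touching pixels into one object.
import numpy as np
from scipy import ndimage as ndi

mask = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]
)
s = ndi.generate_binary_structure(2, 2)  # rank 2, full connectivity
_, n_diag = ndi.label(mask, structure=s)
_, n_cross = ndi.label(mask)  # default structure: 4-connectivity
print(n_diag, n_cross)  # 1 3
```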
86 | largest = unique[np.argmax(counts[i_part:]) + i_part] 87 | connected_mask = np.where(unique_ids == largest, 1, 0) 88 | # Break if current largest region overlaps nucleus 89 | if np.sum(connected_mask * nucleus_mask) > 0.5: 90 | break 91 | 92 | # Close holes on largest section 93 | filled_mask = ndi.binary_fill_holes(connected_mask).astype(int) 94 | 95 | else: 96 | filled_mask = ndi.binary_fill_holes(connected_mask).astype(int) 97 | 98 | final = np.where(filled_mask > 0, i, final) 99 | 100 | final = np.where(nuclei > 0, nuclei, final) 101 | return final 102 | 103 | 104 | def remove_islands(img, nuclei): 105 | cell_ids = np.unique(img) 106 | cell_ids = cell_ids[1:] 107 | 108 | random.shuffle(cell_ids) 109 | 110 | # Touch diagonally = same object 111 | s = ndi.generate_binary_structure(2, 2) 112 | 113 | final = np.zeros(img.shape, dtype=np.uint32) 114 | 115 | for i in cell_ids: 116 | i_mask = np.where(img == i, 1, 0).astype(np.uint8) 117 | 118 | nucleus_mask = np.where(nuclei == i, 1, 0).astype(np.uint8) 119 | 120 | # Number of blobs belonging to cell 121 | unique_ids, num_blobs = ndi.label(i_mask, structure=s) 122 | if num_blobs > 1: 123 | # Keep the blob with max overlap to nucleus 124 | amount_overlap = np.zeros(num_blobs) 125 | 126 | for i_blob in range(1, num_blobs + 1): 127 | blob = np.where(unique_ids == i_blob, 1, 0) 128 | amount_overlap[i_blob - 1] = np.sum(blob * nucleus_mask) 129 | blob_keep = np.argmax(amount_overlap) + 1 130 | 131 | final_mask = np.where(unique_ids == blob_keep, 1, 0) 132 | 133 | else: 134 | blob_size = np.count_nonzero(i_mask) 135 | if blob_size > 2: 136 | final_mask = i_mask.copy() 137 | else: 138 | final_mask = i_mask * 0 139 | 140 | final_mask = ndi.binary_fill_holes(final_mask).astype(int) 141 | 142 | final = np.where(final_mask > 0, i, final) 143 | 144 | return final 145 | 146 | 147 | def process_chunk(chunk, patch_size, img_whole, nuclei_img, output_dir): 148 | for index in range(len(chunk)): 149 | coords = chunk[index] 150 | coords_x1 = coords[0] 151 | coords_y1 = coords[1] 152 | coords_x2 = coords_x1 + patch_size 153 | coords_y2 = coords_y1 + patch_size 154 | 155 | img = img_whole[coords_x1:coords_x2, coords_y1:coords_y2] 156 | 157 | nuclei = nuclei_img[coords_x1:coords_x2, coords_y1:coords_y2] 158 | 159 | output_fp = output_dir + "%d_%d.tif" % (coords_x1, coords_y1) 160 | 161 | # print('Filling holes') 162 | filled = postprocess_connect(img, nuclei) 163 | 164 | # print('Removing islands') 165 | final = remove_islands(filled, nuclei) 166 | 167 | tifffile.imwrite(output_fp, final.astype(np.uint32), photometric="minisblack") 168 | 169 | cell_ids = np.unique(final)[1:] 170 | 171 | # Visualise cells with random colours 172 | n_cells_ids = len(cell_ids) 173 | cell_ids_rand = np.arange(1, n_cells_ids + 1) 174 | random.shuffle(cell_ids_rand) 175 | 176 | keep_mask = np.isin(nuclei, cell_ids) 177 | nuclei = np.where(keep_mask, nuclei, 0) 178 | keep_mask = np.isin(img, cell_ids) 179 | img = np.where(keep_mask, img, 0) 180 | 181 | dictionary = dict(zip(cell_ids, cell_ids_rand)) 182 | dictionary[0] = 0 183 | nuclei_mapped = np.copy(nuclei) 184 | img_mapped = np.copy(img) 185 | final_mapped = np.copy(final) 186 | 187 | nuclei_mapped = np.vectorize(dictionary.get)(nuclei) 188 | img_mapped = np.vectorize(dictionary.get)(img) 189 | final_mapped = np.vectorize(dictionary.get)(final) 190 | 191 | fig, axes = plt.subplots(ncols=3, figsize=(9, 3), sharex=True, sharey=True) 192 | ax = axes.ravel() 193 | ax[0].imshow(nuclei_mapped, cmap=plt.cm.gray) 194 | 
ax[0].set_title("Nuclei") 195 | ax[1].imshow(img_mapped, cmap=plt.cm.nipy_spectral) 196 | ax[1].set_title("Original") 197 | ax[2].imshow(final_mapped, cmap=plt.cm.nipy_spectral) 198 | ax[2].set_title("Processed") 199 | for a in ax: 200 | a.set_axis_off() 201 | fig.tight_layout() 202 | # plt.show() 203 | 204 | fig.savefig(output_fp.replace("tif", "png"), dpi=300) 205 | plt.close(fig) 206 | 207 | 208 | def combine(config, dir_id, patch_size, nuclei_img): 209 | """ 210 | Combine the patches previously output by the connect function 211 | """ 212 | 213 | fp_dir = dir_id + "epoch_%d_step_%d_connected" % ( 214 | config.testing_params.test_epoch, 215 | config.testing_params.test_step, 216 | ) 217 | fp_unconnected = dir_id + "epoch_%d_step_%d.tif" % ( 218 | config.testing_params.test_epoch, 219 | config.testing_params.test_step, 220 | ) 221 | 222 | dl_pred = tifffile.imread(fp_unconnected) 223 | height = dl_pred.shape[0] 224 | width = dl_pred.shape[1] 225 | 226 | seg_final = np.zeros((height, width), dtype=np.uint32) 227 | 228 | fp_seg = glob.glob(fp_dir + "/*.tif", recursive=True) 229 | 230 | sample = tifffile.imread(fp_seg[0]) 231 | patch_h = sample.shape[0] 232 | patch_w = sample.shape[1] 233 | 234 | cell_ids = [] 235 | 236 | for fp in fp_seg: 237 | patch = tifffile.imread(fp) 238 | 239 | patch_ids = np.unique(patch) 240 | patch_ids = patch_ids[patch_ids != 0] 241 | cell_ids.extend(patch_ids) 242 | 243 | fp_coords = os.path.basename(fp).split(".")[0] 244 | fp_x = int(fp_coords.split("_")[0]) 245 | fp_y = int(fp_coords.split("_")[1]) 246 | 247 | # Place into appropriate location 248 | seg_final[fp_x : fp_x + patch_h, fp_y : fp_y + patch_w] = patch[:] 249 | 250 | # If cell is split by windowing, keep component with nucleus 251 | count_ids = Counter(cell_ids) 252 | windowed_ids = [k for k, v in count_ids.items() if v > 1] 253 | 254 | # Check along borders 255 | h_starts = list(np.arange(0, height - patch_size, patch_size)) 256 | w_starts = list(np.arange(0, width - patch_size, patch_size)) 257 | h_starts.append(height - patch_size) 258 | w_starts.append(width - patch_size) 259 | 260 | # Mask along grid 261 | h_starts_wide = [] 262 | w_starts_wide = [] 263 | for i in range(-10, 11): 264 | h_starts_wide.extend([x + i for x in h_starts]) 265 | w_starts_wide.extend([x + i for x in w_starts]) 266 | 267 | mask = np.zeros(seg_final.shape) 268 | mask[h_starts_wide, :] = 1 269 | mask[:, w_starts_wide] = 1 270 | 271 | masked = mask * seg_final 272 | masked_ids = np.unique(masked)[1:] 273 | 274 | # IDs to check for split bodies 275 | to_check_ids = list(set(masked_ids) & set(windowed_ids)) 276 | 277 | return seg_final, to_check_ids 278 | 279 | 280 | def process_check_splits(config, dir_id, nuclei_img, seg_final, chunk_ids): 281 | """ 282 | Check and fix cells split by windowing 283 | """ 284 | 285 | chunk_seg = np.zeros(seg_final.shape, dtype=np.uint32) 286 | 287 | # Touch diagonally = same object 288 | s = ndi.generate_binary_structure(2, 2) 289 | 290 | for i in chunk_ids: 291 | i_mask = np.where(seg_final == i, 1, 0).astype(np.uint8) 292 | 293 | # Number of blobs belonging to cell 294 | unique_ids, num_blobs = ndi.label(i_mask, structure=s) 295 | 296 | # Bounding box 297 | bb = np.argwhere(unique_ids) 298 | (ystart, xstart), (ystop, xstop) = bb.min(0), bb.max(0) + 1 299 | unique_ids_crop = unique_ids[ystart:ystop, xstart:xstop] 300 | 301 | nucleus_mask = np.where(nuclei_img == i, 1, 0).astype(np.uint8) 302 | nucleus_mask = nucleus_mask[ystart:ystop, xstart:xstop] 303 | 304 | if num_blobs > 1: 305 | # Keep 
the blob with max overlap to nucleus 306 | amount_overlap = np.zeros(num_blobs) 307 | 308 | for i_blob in range(1, num_blobs + 1): 309 | blob = np.where(unique_ids_crop == i_blob, 1, 0) 310 | amount_overlap[i_blob - 1] = np.sum(blob * nucleus_mask) 311 | blob_keep = np.argmax(amount_overlap) + 1 312 | 313 | # Put into final segmentation 314 | final_mask = np.where(unique_ids_crop == blob_keep, 1, 0) 315 | 316 | # seg_final = np.where(seg_final == i, 0, seg_final) 317 | 318 | # # Double check the few outliers 319 | # unique_ids_2, num_blobs_2 = ndi.label(final_mask, structure=s) 320 | # if num_blobs_2 > 1: 321 | # # Keep largest 322 | # blob_keep_2 = np.argmax(np.bincount(unique_ids_2)[1:]) + 1 323 | # final_mask = np.where(unique_ids_2 == blob_keep_2, 1, 0) 324 | 325 | chunk_seg[ystart:ystop, xstart:xstop] = np.where( 326 | final_mask == 1, i, chunk_seg[ystart:ystop, xstart:xstop] 327 | ) 328 | 329 | else: 330 | chunk_seg = np.where(i_mask == 1, i, chunk_seg) 331 | 332 | tifffile.imwrite( 333 | dir_id + "/" + str(chunk_ids[0]) + "_checked_splits.tif", 334 | chunk_seg, 335 | photometric="minisblack", 336 | ) 337 | 338 | 339 | def postprocess_predictions(config: Config, dir_id: str): 340 | dir_id = config.files.data_dir + "/model_outputs/" + dir_id + "/test_output/" 341 | 342 | pred_fp = dir_id + "epoch_%d_step_%d.tif" % ( 343 | config.testing_params.test_epoch, 344 | config.testing_params.test_step, 345 | ) 346 | output_dir = dir_id + "epoch_%d_step_%d_connected/" % ( 347 | config.testing_params.test_epoch, 348 | config.testing_params.test_step, 349 | ) 350 | 351 | nucleus_fp = os.path.join(config.files.data_dir, config.files.fp_nuclei) 352 | nuclei_img = tifffile.imread(nucleus_fp) 353 | 354 | if not os.path.exists(output_dir): 355 | os.makedirs(output_dir) 356 | 357 | img_whole = tifffile.imread(pred_fp) 358 | 359 | smallest_dim = np.min(img_whole.shape) 360 | if config.postprocess.patch_size_mp < smallest_dim: 361 | patch_size = config.postprocess.patch_size_mp 362 | else: 363 | patch_size = smallest_dim 364 | 365 | h_starts = list(np.arange(0, img_whole.shape[0] - patch_size, patch_size)) 366 | w_starts = list(np.arange(0, img_whole.shape[1] - patch_size, patch_size)) 367 | h_starts.append(img_whole.shape[0] - patch_size) 368 | w_starts.append(img_whole.shape[1] - patch_size) 369 | coords_starts = [(x, y) for x in h_starts for y in w_starts] 370 | print("%d patches available" % len(coords_starts)) 371 | 372 | # num_processes = mp.cpu_count() 373 | num_processes = get_n_processes(config.cpus) 374 | print("Num multiprocessing splits: %d" % num_processes) 375 | 376 | coords_splits = np.array_split(coords_starts, num_processes) 377 | processes = [] 378 | 379 | print("Processing...") 380 | 381 | for chunk in coords_splits: 382 | p = mp.Process( 383 | target=process_chunk, 384 | args=(chunk, patch_size, img_whole, nuclei_img, output_dir), 385 | ) 386 | processes.append(p) 387 | p.start() 388 | 389 | for p in processes: 390 | p.join() 391 | 392 | print("Combining results") 393 | seg_final, to_check_ids = combine(config, dir_id, patch_size, nuclei_img) 394 | # print(len(np.unique(seg_final)), len(to_check_ids)) 395 | 396 | ids_splits = np.array_split(to_check_ids, num_processes) 397 | processes = [] 398 | 399 | for chunk_ids in ids_splits: 400 | p = mp.Process( 401 | target=process_check_splits, 402 | args=(config, dir_id, nuclei_img, seg_final, chunk_ids), 403 | ) 404 | processes.append(p) 405 | p.start() 406 | 407 | for p in processes: 408 | p.join() 409 | 410 | check_mask = 
np.isin(seg_final, to_check_ids) 411 | seg_final = np.where(check_mask == 1, 0, seg_final) 412 | 413 | fp_checked_splits = glob.glob(dir_id + "/*_checked_splits.tif", recursive=True) 414 | for fp in fp_checked_splits: 415 | checked_split = tifffile.imread(fp) 416 | seg_final = np.where(checked_split > 0, checked_split, seg_final) 417 | os.remove(fp) 418 | 419 | seg_final = np.where(nuclei_img > 0, nuclei_img, seg_final) 420 | 421 | fp_dir = dir_id + "epoch_%d_step_%d_connected" % ( 422 | config.testing_params.test_epoch, 423 | config.testing_params.test_step, 424 | ) 425 | fp_output_seg = fp_dir + ".tif" 426 | print("Saved segmentation to %s" % fp_output_seg) 427 | tifffile.imwrite( 428 | fp_output_seg, seg_final.astype(np.uint32), photometric="minisblack" 429 | ) 430 | 431 | 432 | if __name__ == "__main__": 433 | parser = argparse.ArgumentParser() 434 | 435 | parser.add_argument("--config_dir", type=str, help="path to config") 436 | 437 | args = parser.parse_args() 438 | config = load_config(args.config_dir) 439 | 440 | postprocess_predictions(config, get_exp_dir(config)) 441 | -------------------------------------------------------------------------------- /bidcell/model/predict.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import bisect 3 | import glob 4 | import logging 5 | import os 6 | import re 7 | import sys 8 | 9 | import natsort 10 | import numpy as np 11 | import pandas as pd 12 | import segmentation_models_pytorch as smp 13 | import tifffile 14 | import torch 15 | from torch.utils.data import DataLoader 16 | 17 | from .dataio.dataset_input import DataProcessing 18 | from .model.model import SegmentationModel as Network 19 | from .utils.utils import ( 20 | get_experiment_id, 21 | get_files_list, 22 | get_seg_mask, 23 | make_dir, 24 | save_fig_outputs, 25 | sorted_alphanumeric, 26 | ) 27 | 28 | from ..config import load_config, Config 29 | 30 | 31 | def predict(config: Config) -> str: 32 | logging.basicConfig( 33 | format="%(asctime)s %(levelname)s %(message)s", 34 | level=logging.INFO, 35 | stream=sys.stdout, 36 | ) 37 | 38 | use_cuda = torch.cuda.is_available() 39 | device = torch.device("cuda" if use_cuda else "cpu") 40 | 41 | # Create experiment directories 42 | make_new = False 43 | timestamp = get_experiment_id( 44 | make_new, 45 | config.experiment_dirs.dir_id, 46 | config.files.data_dir, 47 | ) 48 | experiment_path = os.path.join(config.files.data_dir, "model_outputs", timestamp) 49 | model_dir = experiment_path + "/" + config.experiment_dirs.model_dir 50 | test_output_dir = experiment_path + "/" + config.experiment_dirs.test_output_dir 51 | make_dir(test_output_dir) 52 | 53 | # Set up the model 54 | logging.info("Initialising model") 55 | 56 | atlas_exprs = pd.read_csv(config.files.fp_ref, index_col=0) 57 | n_genes = atlas_exprs.shape[1] - 3 58 | print("Number of genes: %d" % n_genes) 59 | 60 | if config.model_params.name != "custom": 61 | model = smp.Unet( 62 | encoder_name=config.model_params.name, 63 | encoder_weights=None, 64 | in_channels=n_genes, 65 | classes=2, 66 | ) 67 | else: 68 | model = Network(n_channels=n_genes) 69 | 70 | model = model.to(device) 71 | 72 | # Get list of model files 73 | if config.testing_params.test_epoch < 0: 74 | saved_model_paths, _ = get_files_list(model_dir, [".pth"]) 75 | saved_model_paths = sorted_alphanumeric(saved_model_paths) 76 | saved_model_names = [ 77 | (os.path.basename(x)).split(".")[0] for x in saved_model_paths 78 | ] 79 | saved_model_epochs = [x.split("_")[1] for x in 
saved_model_names] 80 | saved_model_steps = [x.split("_")[-1] for x in saved_model_names] 81 | if config.testing_params.test_epoch is None: 82 | saved_model_epochs = np.array(saved_model_epochs, dtype="int") 83 | saved_model_steps = np.array(saved_model_steps, dtype="int") 84 | elif config.testing_params.test_epoch == -1: 85 | saved_model_epochs = np.array(saved_model_epochs[-1], dtype="int") 86 | saved_model_epochs = [saved_model_epochs] 87 | saved_model_steps = np.array(saved_model_steps[-1], dtype="int") 88 | saved_model_steps = [saved_model_steps] 89 | else: 90 | saved_model_epochs = [config.testing_params.test_epoch] 91 | saved_model_steps = [config.testing_params.test_step] 92 | 93 | shifts = [0, int(config.model_params.patch_size / 2)] 94 | 95 | for shift_patches in shifts: 96 | # Dataloader 97 | logging.info("Preparing data") 98 | test_dataset = DataProcessing( 99 | config, 100 | isTraining=False, 101 | shift_patches=shift_patches, 102 | ) 103 | test_loader = DataLoader( 104 | dataset=test_dataset, batch_size=1, shuffle=False, num_workers=0 105 | ) 106 | 107 | n_test_examples = len(test_loader) 108 | logging.info("Total number of patches: %d" % n_test_examples) 109 | 110 | logging.info("Begin prediction") 111 | 112 | for epoch_idx, (test_epoch, test_step) in enumerate( 113 | zip(saved_model_epochs, saved_model_steps) 114 | ): 115 | current_dir = ( 116 | test_output_dir 117 | + "/" 118 | + "epoch_" 119 | + str(test_epoch) 120 | + "_step_" 121 | + str(test_step) 122 | ) 123 | make_dir(current_dir) 124 | 125 | # Restore model 126 | load_path = model_dir + "/epoch_%d_step_%d.pth" % (test_epoch, test_step) 127 | checkpoint = torch.load(load_path) 128 | model.load_state_dict(checkpoint["model_state_dict"]) 129 | epoch = checkpoint["epoch"] 130 | assert epoch == test_epoch 131 | print("Predict using " + load_path) 132 | 133 | model = model.eval() 134 | 135 | for batch_idx, ( 136 | batch_x313, 137 | batch_n, 138 | batch_sa, 139 | batch_pos, 140 | batch_neg, 141 | coords_h1, 142 | coords_w1, 143 | nucl_aug, 144 | expr_aug_sum, 145 | whole_h, 146 | whole_w, 147 | expr_fp, 148 | ) in enumerate(test_loader): 149 | if batch_idx == 0: 150 | whole_seg = np.zeros((whole_h, whole_w), dtype=np.uint32) 151 | 152 | # Permute channels axis to batch axis 153 | batch_x313 = batch_x313[0, :, :, :, :].permute(3, 2, 0, 1) 154 | batch_sa = batch_sa.permute(3, 0, 1, 2) 155 | batch_n = batch_n.permute(3, 0, 1, 2) 156 | 157 | if batch_x313.shape[0] == 0: 158 | seg_patch = np.zeros( 159 | ( 160 | config.model_params.patch_size, 161 | config.model_params.patch_size, 162 | ), 163 | dtype=np.uint32, 164 | ) 165 | 166 | else: 167 | # Transfer to GPU 168 | batch_x313 = batch_x313.to(device) 169 | batch_sa = batch_sa.to(device) 170 | batch_n = batch_n.to(device) 171 | 172 | # Forward pass 173 | seg_pred = model(batch_x313) 174 | 175 | coords_h1 = coords_h1.detach().cpu().squeeze().numpy() 176 | coords_w1 = coords_w1.detach().cpu().squeeze().numpy() 177 | sample_seg = seg_pred.detach().cpu().numpy() 178 | sample_n = nucl_aug.detach().cpu().numpy() 179 | sample_sa = batch_sa.detach().cpu().numpy() 180 | sample_expr = expr_aug_sum.detach().cpu().numpy() 181 | patch_fp = current_dir + "/%d_%d.png" % (coords_h1, coords_w1) 182 | 183 | if (batch_idx % config.training_params.sample_freq) == 0: 184 | save_fig_outputs( 185 | sample_seg, sample_n, sample_sa, sample_expr, patch_fp 186 | ) 187 | 188 | seg_patch = get_seg_mask(sample_seg, sample_n) 189 | 190 | # seg_patch_fp = current_dir + '/' + "%d_%d.tif" %(coords_h1, 
coords_w1) 191 | # tifffile.imwrite(seg_patch_fp, seg_patch.astype(np.uint32), photometric='minisblack') 192 | 193 | whole_seg[ 194 | coords_h1 : coords_h1 + config.model_params.patch_size, 195 | coords_w1 : coords_w1 + config.model_params.patch_size, 196 | ] = seg_patch.copy() 197 | 198 | seg_fp = ( 199 | test_output_dir 200 | + "/" 201 | + "epoch_%d_step_%d_seg_shift%d.tif" 202 | % (test_epoch, test_step, shift_patches) 203 | ) 204 | 205 | tifffile.imwrite( 206 | seg_fp, whole_seg.astype(np.uint32), photometric="minisblack" 207 | ) 208 | 209 | logging.info("Finished") 210 | 211 | return test_output_dir 212 | 213 | 214 | def gap_coords(coords, patcsize): 215 | """If gap larger than patcsize -> remove all corresponding locations""" 216 | starts_diff = np.diff(coords) 217 | gap_idx = np.where(starts_diff > patcsize)[0] + 1 218 | # print(gap_idx.shape) 219 | gap_end = [coords[x] for x in gap_idx] 220 | gap_start = [coords[bisect.bisect(coords, x - 1) - 1] for x in gap_end] 221 | # print(gap_start, gap_end) 222 | gap = [] 223 | for gs, ge in zip(gap_start, gap_end): 224 | gap.extend(list(range(gs, ge))) 225 | # print(len(gap)) 226 | return gap 227 | 228 | 229 | def fill_grid(config: Config, dir_id: str): 230 | """ 231 | Combine predictions from unshifted and shifted patches to remove 232 | border effects 233 | """ 234 | 235 | print("Combining predictions") 236 | 237 | patch_size = config.model_params.patch_size 238 | shift = int(patch_size / 2) 239 | 240 | expr_fp = ( 241 | config.files.data_dir 242 | + "/" 243 | + config.files.dir_out_maps 244 | + "/" 245 | + config.files.dir_patches 246 | + str(config.model_params.patch_size) 247 | + "x" 248 | + str(config.model_params.patch_size) 249 | + "_shift_" 250 | + str(shift) 251 | ) 252 | expr_fp_ext = ".hdf5" 253 | 254 | dir_id = os.path.join( 255 | config.files.data_dir, 256 | "model_outputs", 257 | dir_id, 258 | config.experiment_dirs.test_output_dir, 259 | ) 260 | 261 | pred_fp = "%s/epoch_%d_step_%d_seg_shift0.tif" % ( 262 | dir_id, 263 | config.testing_params.test_epoch, 264 | config.testing_params.test_step, 265 | ) 266 | pred_fp_sf = "%s/epoch_%d_step_%d_seg_shift%d.tif" % ( 267 | dir_id, 268 | config.testing_params.test_epoch, 269 | config.testing_params.test_step, 270 | shift, 271 | ) 272 | 273 | output_fp = dir_id + "/" + os.path.basename(pred_fp).replace("_seg_shift0", "") 274 | 275 | pred = tifffile.imread(pred_fp) 276 | pred_sf = tifffile.imread(pred_fp_sf) 277 | 278 | fp_patches_sf = glob.glob(expr_fp + "/*" + expr_fp_ext) 279 | fp_patches_sf = natsort.natsorted(fp_patches_sf) 280 | 281 | coords_patches = [re.findall(r"\d+", os.path.basename(x)) for x in fp_patches_sf] 282 | coords_h1 = [int(x[0]) for x in coords_patches] 283 | coords_w1 = [int(x[1]) for x in coords_patches] 284 | 285 | # Fill along grid 286 | h_starts_wide = [] 287 | w_starts_wide = [] 288 | 289 | # Middle section of shifted patches 290 | for i in range(int(patch_size * 0.35), int(patch_size * 0.65)): 291 | h_starts_wide.extend([x + i for x in coords_h1]) 292 | w_starts_wide.extend([x + i for x in coords_w1]) 293 | 294 | # Gap larger than patch_size -> remove all corresponding locations 295 | h_gap = gap_coords(coords_h1, patch_size) 296 | w_gap = gap_coords(coords_w1, patch_size) 297 | 298 | fill = np.zeros(pred.shape) 299 | fill[h_starts_wide, :] = 1 300 | fill[:, w_starts_wide] = 1 301 | 302 | # Gaps 303 | fill[h_gap, :] = 0 304 | fill[:, w_gap] = 0 305 | 306 | tifffile.imwrite( 307 | dir_id + "/" + "fill.tif", fill.astype(np.uint16), photometric="minisblack" 308 
| ) 309 | 310 | result = np.zeros(pred.shape, dtype=np.uint32) 311 | result = np.where(fill > 0, pred_sf, pred) 312 | 313 | tifffile.imwrite(output_fp, result.astype(np.uint32), photometric="minisblack") 314 | 315 | 316 | if __name__ == "__main__": 317 | parser = argparse.ArgumentParser() 318 | 319 | parser.add_argument("--config_dir", type=str, help="path to config") 320 | 321 | args = parser.parse_args() 322 | config = load_config(args.config_dir) 323 | 324 | test_output_dir = predict(config) 325 | 326 | fill_grid(config, test_output_dir) 327 | -------------------------------------------------------------------------------- /bidcell/model/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import math 4 | import sys 5 | import os 6 | 7 | import matplotlib.pyplot as plt 8 | import pandas as pd 9 | import segmentation_models_pytorch as smp 10 | import torch 11 | import torch.optim.lr_scheduler as lr_scheduler 12 | from torch.utils.data import DataLoader 13 | 14 | from .dataio.dataset_input import DataProcessing 15 | from .model.losses import ( 16 | CellCallingLoss, 17 | NucleiEncapsulationLoss, 18 | OverlapLoss, 19 | Oversegmentation, 20 | PosNegMarkerLoss, 21 | ) 22 | from .model.model import SegmentationModel as Network 23 | from .utils.utils import ( 24 | get_experiment_id, 25 | make_dir, 26 | save_fig_outputs, 27 | ) 28 | from ..config import load_config, Config 29 | 30 | 31 | def train(config: Config): 32 | logging.basicConfig( 33 | format="%(asctime)s %(levelname)s %(message)s", 34 | level=logging.INFO, 35 | stream=sys.stdout, 36 | ) 37 | 38 | use_cuda = torch.cuda.is_available() 39 | device = torch.device("cuda" if use_cuda else "cpu") 40 | 41 | # Create experiment directories 42 | resume_epoch = None # could be added 43 | resume_step = 0 44 | if resume_epoch is None: 45 | make_new = True 46 | else: 47 | make_new = False 48 | 49 | timestamp = get_experiment_id( 50 | make_new, 51 | config.experiment_dirs.dir_id, 52 | config.files.data_dir, 53 | ) 54 | experiment_path = os.path.join(config.files.data_dir, "model_outputs", timestamp) 55 | make_dir(experiment_path + "/" + config.experiment_dirs.model_dir) 56 | make_dir(experiment_path + "/" + config.experiment_dirs.samples_dir) 57 | 58 | if config.training_params.model_freq <= config.testing_params.test_step: 59 | model_freq = config.training_params.model_freq 60 | else: 61 | model_freq = config.testing_params.test_step 62 | 63 | # Set up the model 64 | logging.info("Initialising model") 65 | 66 | atlas_exprs = pd.read_csv(config.files.fp_ref, index_col=0) 67 | n_genes = atlas_exprs.shape[1] - 3 68 | print("Number of genes: %d" % n_genes) 69 | 70 | if config.model_params.name != "custom": 71 | model = smp.Unet( 72 | encoder_name=config.model_params.name, 73 | encoder_weights=None, 74 | in_channels=n_genes, 75 | classes=2, 76 | ) 77 | else: 78 | model = Network(n_channels=n_genes) 79 | 80 | model = model.to(device) 81 | 82 | # Dataloader 83 | logging.info("Preparing data") 84 | 85 | train_dataset = DataProcessing( 86 | config, 87 | isTraining=True, 88 | total_steps=config.training_params.total_steps, 89 | ) 90 | train_loader = DataLoader( 91 | dataset=train_dataset, batch_size=1, shuffle=True, num_workers=0, drop_last=True 92 | ) 93 | 94 | n_train_examples = len(train_loader) 95 | logging.info("Total number of training examples: %d" % n_train_examples) 96 | 97 | # Loss functions 98 | criterion_ne = NucleiEncapsulationLoss(config.training_params.ne_weight, 
device) 99 | criterion_os = Oversegmentation(config.training_params.os_weight, device) 100 | criterion_cc = CellCallingLoss(config.training_params.cc_weight, device) 101 | criterion_ov = OverlapLoss(config.training_params.ov_weight, device) 102 | criterion_pn = PosNegMarkerLoss( 103 | config.training_params.pos_weight, 104 | config.training_params.neg_weight, 105 | device, 106 | ) 107 | 108 | # Optimiser 109 | if config.training_params.optimizer == "rmsprop": 110 | optimizer = torch.optim.RMSprop( 111 | model.parameters(), 112 | lr=config.training_params.learning_rate, 113 | weight_decay=1e-8, 114 | ) 115 | elif config.training_params.optimizer == "adam": 116 | optimizer = torch.optim.Adam( 117 | model.parameters(), 118 | lr=config.training_params.learning_rate, 119 | betas=(config.training_params.beta1, config.training_params.beta2), 120 | weight_decay=config.training_params.weight_decay, 121 | ) 122 | else: 123 | sys.exit("Select optimiser from rmsprop or adam") 124 | 125 | global_step = 0 126 | 127 | # Scheduler https://arxiv.org/pdf/1812.01187.pdf 128 | lf = ( 129 | lambda x: ( 130 | ((1 + math.cos(x * math.pi / config.training_params.total_epochs)) / 2) 131 | ** 1.0 132 | ) 133 | * 0.95 134 | + 0.05 135 | ) # cosine 136 | scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) 137 | scheduler.last_epoch = global_step 138 | 139 | # Starting epoch 140 | if resume_epoch is not None: 141 | initial_epoch = resume_epoch 142 | else: 143 | initial_epoch = 0 144 | 145 | # Restore saved model 146 | if resume_epoch is not None: 147 | load_path = ( 148 | experiment_path 149 | + "/" 150 | + config.experiment_dirs.model_dir 151 | + "/epoch_%d_step_%d.pth" % (resume_epoch, resume_step) 152 | ) 153 | checkpoint = torch.load(load_path) 154 | model.load_state_dict(checkpoint["model_state_dict"]) 155 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 156 | epoch = checkpoint["epoch"] 157 | assert epoch == resume_epoch 158 | print("Resume training, successfully loaded " + load_path) 159 | 160 | logging.info("Begin training") 161 | 162 | model = model.train() 163 | 164 | lrs = [] 165 | 166 | for epoch in range(initial_epoch, config.training_params.total_epochs): 167 | cur_lr = optimizer.param_groups[0]["lr"] 168 | print("\nEpoch =", (epoch + 1), " lr =", cur_lr) 169 | 170 | for step_epoch, ( 171 | batch_x313, 172 | batch_n, 173 | batch_sa, 174 | batch_pos, 175 | batch_neg, 176 | coords_h1, 177 | coords_w1, 178 | nucl_aug, 179 | expr_aug_sum, 180 | ) in enumerate(train_loader): 181 | # Permute channels axis to batch axis 182 | # torch.Size([1, patch_size, patch_size, 313, n_cells]) to [n_cells, 313, patch_size, patch_size] 183 | batch_x313 = batch_x313[0, :, :, :, :].permute(3, 2, 0, 1) 184 | batch_sa = batch_sa.permute(3, 0, 1, 2) 185 | batch_pos = batch_pos.permute(3, 0, 1, 2) 186 | batch_neg = batch_neg.permute(3, 0, 1, 2) 187 | batch_n = batch_n.permute(3, 0, 1, 2) 188 | 189 | if batch_x313.shape[0] == 0: 190 | if (step_epoch % model_freq) == 0: 191 | save_path = ( 192 | experiment_path 193 | + "/" 194 | + config.experiment_dirs.model_dir 195 | + "/epoch_%d_step_%d.pth" % (epoch + 1, step_epoch) 196 | ) 197 | torch.save( 198 | { 199 | "epoch": epoch + 1, 200 | "model_state_dict": model.state_dict(), 201 | "optimizer_state_dict": optimizer.state_dict(), 202 | }, 203 | save_path, 204 | ) 205 | logging.info("Model saved: %s" % save_path) 206 | continue 207 | 208 | # Transfer to GPU 209 | batch_x313 = batch_x313.to(device) 210 | batch_sa = batch_sa.to(device) 211 | batch_pos = 
batch_pos.to(device) 212 | batch_neg = batch_neg.to(device) 213 | batch_n = batch_n.to(device) 214 | 215 | optimizer.zero_grad() 216 | 217 | seg_pred = model(batch_x313) 218 | 219 | # Compute losses 220 | loss_ne = criterion_ne(seg_pred, batch_n) 221 | loss_os = criterion_os(seg_pred, batch_n) 222 | loss_cc = criterion_cc(seg_pred, batch_sa) 223 | loss_ov = criterion_ov(seg_pred, batch_n) 224 | loss_pn = criterion_pn(seg_pred, batch_pos, batch_neg) 225 | 226 | loss_ne = loss_ne.squeeze() 227 | loss_os = loss_os.squeeze() 228 | loss_cc = loss_cc.squeeze() 229 | loss_ov = loss_ov.squeeze() 230 | loss_pn = loss_pn.squeeze() 231 | 232 | loss = loss_ne + loss_os + loss_cc + loss_ov + loss_pn 233 | 234 | # Optimisation 235 | loss.backward() 236 | optimizer.step() 237 | 238 | # step_ne_loss = loss_ne.detach().cpu().numpy() # noqa 239 | # step_os_loss = loss_os.detach().cpu().numpy() # noqa 240 | # step_cc_loss = loss_cc.detach().cpu().numpy() # noqa 241 | # step_ov_loss = loss_ov.detach().cpu().numpy() # noqa 242 | # step_pn_loss = loss_pn.detach().cpu().numpy() # noqa 243 | 244 | step_train_loss = loss.detach().cpu().numpy() 245 | 246 | if (global_step % config.training_params.sample_freq) == 0: 247 | coords_h1 = coords_h1.detach().cpu().squeeze().numpy() 248 | coords_w1 = coords_w1.detach().cpu().squeeze().numpy() 249 | sample_seg = seg_pred.detach().cpu().numpy() 250 | sample_n = nucl_aug.detach().cpu().numpy() 251 | sample_sa = batch_sa.detach().cpu().numpy() 252 | sample_expr = expr_aug_sum.detach().cpu().numpy() 253 | patch_fp = ( 254 | experiment_path 255 | + "/" 256 | + config.experiment_dirs.samples_dir 257 | + "/epoch_%d_%d_%d_%d.png" 258 | % (epoch + 1, step_epoch, coords_h1, coords_w1) 259 | ) 260 | 261 | save_fig_outputs(sample_seg, sample_n, sample_sa, sample_expr, patch_fp) 262 | 263 | print( 264 | "Epoch[{}/{}], Step[{}], Loss:{:.4f}".format( 265 | epoch + 1, 266 | config.training_params.total_epochs, 267 | step_epoch, 268 | step_train_loss, 269 | ) 270 | ) 271 | # print('NE:{:.4f}, TC:{:.4f}, CC:{:.4f}, OV:{:.4f}, PN:{:.4f}'.format(step_ne_loss, 272 | # step_os_loss, 273 | # step_cc_loss, 274 | # step_ov_loss, 275 | # step_pn_loss)) 276 | 277 | # Save model 278 | if (step_epoch % model_freq) == 0: 279 | save_path = ( 280 | experiment_path 281 | + "/" 282 | + config.experiment_dirs.model_dir 283 | + "/epoch_%d_step_%d.pth" % (epoch + 1, step_epoch) 284 | ) 285 | torch.save( 286 | { 287 | "epoch": epoch + 1, 288 | "model_state_dict": model.state_dict(), 289 | "optimizer_state_dict": optimizer.state_dict(), 290 | }, 291 | save_path, 292 | ) 293 | logging.info("Model saved: %s" % save_path) 294 | 295 | global_step += 1 296 | 297 | # Update and append current LR 298 | scheduler.step() 299 | lrs.append(cur_lr) 300 | 301 | # Plot lr scheduler 302 | # plt.plot(lrs, ".-", label="LambdaLR") 303 | # plt.xlabel("epoch") 304 | # plt.ylabel("LR") 305 | # plt.tight_layout() 306 | # plt.savefig(experiment_path + "/LR.png", dpi=300) 307 | 308 | logging.info("Training finished") 309 | 310 | 311 | if __name__ == "__main__": 312 | parser = argparse.ArgumentParser() 313 | 314 | parser.add_argument("--config_dir", type=str, help="path to config") 315 | 316 | args = parser.parse_args() 317 | config = load_config(args.config_dir) 318 | 319 | train(config) 320 | -------------------------------------------------------------------------------- /bidcell/model/utils/utils.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import datetime as dt 3 | 
import json 4 | import os 5 | import random 6 | import re 7 | import sys 8 | 9 | import matplotlib.pyplot as plt 10 | import natsort 11 | import numpy as np 12 | from scipy.special import softmax 13 | 14 | 15 | def sorted_alphanumeric(data): 16 | """ 17 | Alphanumerically sort a list 18 | """ 19 | convert = lambda text: int(text) if text.isdigit() else text.lower() 20 | alphanum_key = lambda key: [convert(c) for c in re.split("([0-9]+)", key)] 21 | return sorted(data, key=alphanum_key) 22 | 23 | 24 | def make_dir(dir_path): 25 | """ 26 | Make directory if doesn't exist 27 | """ 28 | if not os.path.exists(dir_path): 29 | os.makedirs(dir_path) 30 | 31 | 32 | def delete_file(path): 33 | """ 34 | Delete file if exists 35 | """ 36 | if os.path.exists(path): 37 | os.remove(path) 38 | 39 | 40 | def get_files_list(path, ext_array=[".tif"]): 41 | """ 42 | Get all files in a directory with a specific extension 43 | """ 44 | files_list = list() 45 | dirs_list = list() 46 | 47 | for root, dirs, files in os.walk(path, topdown=True): 48 | for file in files: 49 | if any(x in file for x in ext_array): 50 | files_list.append(os.path.join(root, file)) 51 | folder = os.path.dirname(os.path.join(root, file)) 52 | if folder not in dirs_list: 53 | dirs_list.append(folder) 54 | 55 | return files_list, dirs_list 56 | 57 | 58 | def json_file_to_pyobj(filename): 59 | """ 60 | Read json config file 61 | """ 62 | 63 | def _json_object_hook(d): 64 | return collections.namedtuple("X", d.keys())(*d.values()) 65 | 66 | def json2obj(data): 67 | return json.loads(data, object_hook=_json_object_hook) 68 | 69 | return json2obj(open(filename).read()) 70 | 71 | 72 | def get_newest_id(exp_dir="model_outputs"): 73 | """Get the latest experiment ID based on its timestamp 74 | 75 | Parameters 76 | ---------- 77 | exp_dir : str, optional 78 | Name of the directory that contains all the experiment directories 79 | 80 | Returns 81 | ------- 82 | exp_id : str 83 | Name of the latest experiment directory 84 | """ 85 | folders = next(os.walk(exp_dir))[1] 86 | if len(folders) == 0: 87 | sys.exit(f"No model output folders found in {exp_dir}") 88 | folders = natsort.natsorted(folders) 89 | folder_last = folders[-1] 90 | exp_id = folder_last.replace("\\", "/") 91 | return exp_id 92 | 93 | 94 | def get_experiment_id(make_new, dir_id, data_dir): 95 | """ 96 | Get timestamp ID of current experiment 97 | """ 98 | if make_new is False: 99 | if dir_id == "last": 100 | timestamp = get_newest_id(os.path.join(data_dir, "model_outputs")) 101 | else: 102 | timestamp = dir_id 103 | else: 104 | timestamp = dt.datetime.now().strftime("%Y_%m_%d_%H_%M_%S") 105 | 106 | return timestamp 107 | 108 | 109 | def get_seg_mask(sample_seg, sample_n): 110 | """ 111 | Generate the segmentation mask with unique cell IDs 112 | """ 113 | sample_n = np.squeeze(sample_n) 114 | 115 | # Background prob is average probability of all cells EXCEPT FOR NUCLEI 116 | sample_probs = softmax(sample_seg, axis=1) 117 | bgd_probs = np.expand_dims(np.mean(sample_probs[:, 0, :, :], axis=0), 0) 118 | fgd_probs = sample_probs[:, 1, :, :] 119 | probs = np.concatenate((bgd_probs, fgd_probs), axis=0) 120 | final_seg = np.argmax(probs, axis=0) 121 | 122 | # Map predictions to original cell IDs 123 | ids_orig = np.unique(sample_n) 124 | if ids_orig[0] != 0: 125 | ids_orig = np.insert(ids_orig, 0, 0) 126 | ids_pred = np.unique(final_seg) 127 | if ids_pred[0] != 0: 128 | ids_pred = np.insert(ids_pred, 0, 0) 129 | ids_orig = ids_orig[ids_pred] 130 | 131 | dictionary = dict(zip(ids_pred, 
ids_orig)) 132 | dictionary[0] = 0 133 | final_seg_orig = np.copy(final_seg) 134 | final_seg_orig = np.vectorize(dictionary.get)(final_seg) 135 | 136 | # Add nuclei back in 137 | final_seg_orig = np.where(sample_n > 0, sample_n, final_seg_orig) 138 | 139 | return final_seg_orig 140 | 141 | 142 | def save_fig_outputs(sample_seg, sample_n, sample_sa, sample_expr, patch_fp): 143 | """ 144 | Generate figure of inputs and outputs 145 | """ 146 | sample_n = np.squeeze(sample_n) 147 | 148 | sample_expr = np.squeeze(sample_expr) 149 | sample_expr[sample_expr > 0] = 1 150 | 151 | sample_sa = np.squeeze(np.sum(sample_sa, 0)) 152 | 153 | final_seg_orig = get_seg_mask(sample_seg, sample_n) 154 | 155 | # Randomise colours for plot 156 | cells_ids_orig = np.unique(final_seg_orig) 157 | n_cells_ids = len(cells_ids_orig) 158 | cell_ids_rand = np.arange(1, n_cells_ids + 1) 159 | random.shuffle(cell_ids_rand) 160 | dictionary = dict(zip(cells_ids_orig, cell_ids_rand)) 161 | dictionary[0] = 0 162 | final_seg_mapped = np.copy(final_seg_orig) 163 | final_seg_mapped = np.vectorize(dictionary.get)(final_seg_orig) 164 | nuclei_mapped = np.copy(sample_n) 165 | nuclei_mapped = np.vectorize(dictionary.get)(sample_n) 166 | 167 | # Plot 168 | fig, axes = plt.subplots(ncols=3, figsize=(9, 3), sharex=True, sharey=True) 169 | ax = axes.ravel() 170 | 171 | ax[0].imshow(nuclei_mapped, cmap=plt.cm.nipy_spectral) 172 | ax[0].set_title("Nuclei") 173 | ax[1].imshow(final_seg_mapped, cmap=plt.cm.nipy_spectral) 174 | ax[1].set_title("Cells") 175 | ax[2].imshow(sample_expr, cmap=plt.cm.gray) 176 | ax[2].set_title("Expressions") 177 | # ax[3].imshow(sample_sa, cmap=plt.cm.gray) 178 | # ax[3].set_title("Eligible") 179 | 180 | for a in ax: 181 | a.set_axis_off() 182 | 183 | fig.tight_layout() 184 | # plt.show() 185 | 186 | fig.savefig(patch_fp) 187 | plt.close(fig) 188 | -------------------------------------------------------------------------------- /bidcell/processing/cell_gene_matrix.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import multiprocessing as mp 4 | import os 5 | import sys 6 | 7 | import cv2 8 | import numpy as np 9 | import pandas as pd 10 | import tifffile 11 | from tqdm import tqdm 12 | 13 | from .utils import get_n_processes, get_patches_coords 14 | from ..config import Config, load_config 15 | 16 | np.seterr(divide="ignore", invalid="ignore") 17 | 18 | 19 | def process_chunk( 20 | chunk, output_dir, cell_ids_unique, col_names, seg_map, x_col, y_col, gene_col 21 | ): 22 | """Extract cell expression profiles""" 23 | 24 | df_out = pd.DataFrame(0, index=cell_ids_unique, columns=col_names) 25 | df_out["cell_id"] = cell_ids_unique.copy() 26 | 27 | chunk_id = chunk.index[0] 28 | 29 | for index_row, row in chunk.iterrows(): 30 | gene = row[gene_col] 31 | w_loc = row[x_col] 32 | h_loc = row[y_col] 33 | 34 | seg_val = seg_map[h_loc, w_loc] 35 | if seg_val > 0: 36 | df_out.loc[seg_val, gene] += 1 37 | 38 | df_out.to_csv(output_dir + "/" + "chunk_%d.csv" % chunk_id) 39 | 40 | 41 | def process_chunk_meta( 42 | matrix, fp_output, seg_map_mi, col_names_coords, scale_pix_x, scale_pix_y 43 | ): 44 | """Compute cell locations and sizes""" 45 | 46 | chunk_id = matrix[0, 0] 47 | output = np.zeros((matrix.shape[0], len(col_names_coords))) 48 | output[:, 0] = matrix[:, 0].copy() 49 | output[:, 4:] = matrix[:, 1:].copy() 50 | 51 | # Convert to pixel resolution 52 | for cur_i, cell_id in enumerate(output[:, 0]): 53 | if cell_id > 0: 54 | try: 55 | # 
cell_centroid_x and cell_centroid_y 56 | coords = np.where(seg_map_mi == cell_id) 57 | x_points = coords[1] 58 | y_points = coords[0] 59 | centroid_x = sum(x_points) / len(x_points) 60 | centroid_y = sum(y_points) / len(y_points) 61 | output[cur_i, 1] = centroid_x * scale_pix_x 62 | output[cur_i, 2] = centroid_y * scale_pix_y 63 | 64 | # cell_size 65 | output[cur_i, 3] = len(coords[0]) / (scale_pix_x * scale_pix_y) 66 | except Exception: 67 | output[cur_i, 1] = -1 68 | output[cur_i, 2] = -1 69 | output[cur_i, 3] = -1 70 | 71 | # Save as csv 72 | df_split = pd.DataFrame( 73 | output, index=list(range(output.shape[0])), columns=col_names_coords 74 | ) 75 | df_split.to_csv(fp_output + "%d.csv" % chunk_id, index=False) 76 | 77 | 78 | def transform_locations(df_expr, col, scale, shift=0): 79 | """Scale transcripts to pixel resolution of the platform""" 80 | print(f"Transforming {col}") 81 | df_expr[col] = df_expr[col].div(scale).round().astype(int).sub(shift) 82 | return df_expr 83 | 84 | 85 | def read_expr_csv(fp): 86 | try: 87 | print("Reading filtered transcripts") 88 | return pd.read_csv(fp) 89 | except Exception: 90 | sys.exit(f"Cannot read {fp}") 91 | 92 | 93 | def make_cell_gene_mat(config: Config, is_cell: bool, timestamp: str | None = None): 94 | dir_dataset = config.files.data_dir 95 | dir_cgm = config.files.dir_cgm 96 | 97 | if is_cell is False: 98 | output_dir = os.path.join(dir_dataset, dir_cgm, "nuclei") 99 | else: 100 | output_dir = os.path.join(dir_dataset, dir_cgm, timestamp) 101 | 102 | fp_transcripts_processed = os.path.join( 103 | dir_dataset, config.files.fp_transcripts_processed 104 | ) 105 | 106 | fp_gene_names = os.path.join(dir_dataset, config.files.fp_gene_names) 107 | 108 | if is_cell is False: 109 | fp_seg = os.path.join(dir_dataset, config.files.fp_nuclei) 110 | else: 111 | fp_seg_name = [ 112 | "epoch_" 113 | + str(config.testing_params.test_epoch) 114 | + "_step_" 115 | + str(config.testing_params.test_step) 116 | + "_connected.tif" 117 | ] 118 | fp_seg = os.path.join( 119 | config.files.data_dir, 120 | "model_outputs", 121 | timestamp, 122 | config.experiment_dirs.test_output_dir, 123 | "".join(fp_seg_name), 124 | ) 125 | 126 | # Column names in the transcripts csv 127 | x_col = config.transcripts.x_col 128 | y_col = config.transcripts.y_col 129 | gene_col = config.transcripts.gene_col 130 | 131 | if not os.path.exists(output_dir): 132 | os.makedirs(output_dir) 133 | 134 | seg_map_mi = tifffile.imread(fp_seg) 135 | height = seg_map_mi.shape[0] 136 | width = seg_map_mi.shape[1] 137 | 138 | cell_ids_unique = np.unique(seg_map_mi.reshape(-1)) 139 | cell_ids_unique = cell_ids_unique[1:] 140 | n_cells = len(cell_ids_unique) 141 | print("Number of cells " + str(n_cells)) 142 | 143 | with open(fp_gene_names) as file: 144 | gene_names = [line.rstrip() for line in file] 145 | 146 | col_names = ["cell_id"] + gene_names 147 | 148 | # Divide the dataframe into chunks for multiprocessing 149 | n_processes = get_n_processes(config.cpus) 150 | print(f"Number of splits for multiprocessing: {n_processes}") 151 | 152 | # Scale factor to pixel resolution of platform 153 | # read in affine 154 | # extract scale_x and scale_y 155 | # divide by (scale_x*pixel resolution) (microns per pixel) 156 | # affine = pd.read_csv(fp_affine, index_col=0, header=None, sep='\t') 157 | # scale_x_tr = float(affine.loc["scale_x"].item()) 158 | # scale_y_tr = float(affine.loc["scale_y"].item()) 159 | # scale_pix_x = (scale_x_tr*config.affine.scale_pix_x) 160 | # scale_pix_y = 
(scale_y_tr*config.affine.scale_pix_y) 161 | scale_pix_x = config.affine.scale_pix_x 162 | scale_pix_y = config.affine.scale_pix_y 163 | 164 | if not os.path.exists(output_dir + "/" + config.files.fp_expr): 165 | # Rescale to pixel size 166 | height_pix = np.round(height / config.affine.scale_pix_y).astype(int) 167 | width_pix = np.round(width / config.affine.scale_pix_x).astype(int) 168 | 169 | seg_map = cv2.resize( 170 | seg_map_mi.astype(np.int32), 171 | (width_pix, height_pix), 172 | interpolation=cv2.INTER_NEAREST, 173 | ) 174 | print("Segmentation map pixel size: ", seg_map.shape) 175 | fp_rescaled_seg = output_dir + "/rescaled.tif" 176 | print("Saving temporary resized segmentation") 177 | tifffile.imwrite( 178 | fp_rescaled_seg, seg_map.astype(np.uint32), photometric="minisblack" 179 | ) 180 | 181 | df_out = pd.DataFrame(0, index=cell_ids_unique, columns=col_names) 182 | df_out["cell_id"] = cell_ids_unique.copy() 183 | 184 | # Divide into patches for large datasets that exceed memory capacity 185 | if (height_pix + width_pix) > config.cgm_params.max_sum_hw: 186 | patch_h = int(config.cgm_params.max_sum_hw / 2) 187 | patch_w = config.cgm_params.max_sum_hw - patch_h 188 | else: 189 | patch_h = height_pix 190 | patch_w = width_pix 191 | 192 | h_coords, _ = get_patches_coords(height_pix, patch_h) 193 | w_coords, _ = get_patches_coords(width_pix, patch_w) 194 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 195 | 196 | print("Extracting cell expressions") 197 | for hs, he, ws, we in tqdm(hw_coords): 198 | print(f"Patch H {hs}:{he}, W {ws}:{we}") 199 | seg_map = tifffile.imread(fp_rescaled_seg)[hs:he, ws:we] 200 | print(seg_map.shape) 201 | 202 | df_expr = read_expr_csv(fp_transcripts_processed) 203 | print( 204 | df_expr[x_col].min(), 205 | df_expr[x_col].max(), 206 | df_expr[y_col].min(), 207 | df_expr[y_col].max(), 208 | ) 209 | 210 | df_expr = transform_locations(df_expr, x_col, scale_pix_x) 211 | df_expr = transform_locations(df_expr, y_col, scale_pix_y) 212 | 213 | df_expr = df_expr[ 214 | (df_expr[x_col].between(ws, we - 1)) 215 | & (df_expr[y_col].between(hs, he - 1)) 216 | ] 217 | print( 218 | df_expr[x_col].min(), 219 | df_expr[x_col].max(), 220 | df_expr[y_col].min(), 221 | df_expr[y_col].max(), 222 | ) 223 | 224 | df_expr = transform_locations(df_expr, x_col, 1, ws) 225 | df_expr = transform_locations(df_expr, y_col, 1, hs) 226 | print( 227 | df_expr[x_col].min(), 228 | df_expr[x_col].max(), 229 | df_expr[y_col].min(), 230 | df_expr[y_col].max(), 231 | ) 232 | 233 | df_expr.reset_index(drop=True, inplace=True) 234 | 235 | df_expr_splits = np.array_split(df_expr, n_processes) 236 | processes = [] 237 | 238 | print("Extracting cell-gene matrix chunks") 239 | for chunk in df_expr_splits: 240 | p = mp.Process( 241 | target=process_chunk, 242 | args=( 243 | chunk, 244 | output_dir, 245 | cell_ids_unique, 246 | col_names, 247 | seg_map, 248 | x_col, 249 | y_col, 250 | gene_col, 251 | ), 252 | ) 253 | processes.append(p) 254 | p.start() 255 | 256 | for p in processes: 257 | p.join() 258 | 259 | print("Combining cell-gene matrix chunks") 260 | 261 | fp_chunks = glob.glob(output_dir + "/chunk_*.csv") 262 | for fpc in fp_chunks: 263 | df_i = pd.read_csv(fpc, index_col=0) 264 | df_out.iloc[:, 1:] = df_out.iloc[:, 1:].add(df_i.iloc[:, 1:]) 265 | 266 | df_out.to_csv(output_dir + "/" + config.files.fp_expr) 267 | 268 | # Clean up 269 | for fpc in fp_chunks: 270 | os.remove(fpc) 271 | 272 | print("Obtained cell-gene matrix") 273 | os.remove(fp_rescaled_seg) 
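# A brief sketch of consuming the matrix written above (illustrative only, not part
# of the pipeline; it mirrors the read-back on the else branch below, with pandas
# already imported in this module):
#
#     cgm = pd.read_csv(output_dir + "/" + config.files.fp_expr, index_col=0)
#     counts = cgm.iloc[:, 1:]  # drop the duplicated "cell_id" column; the index holds cell IDs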
274 | del seg_map 275 | del df_expr 276 | 277 | else: 278 | df_out = pd.read_csv(output_dir + "/" + config.files.fp_expr, index_col=0) 279 | 280 | # if is_cell: 281 | # print("Computing cell locations and sizes") 282 | 283 | # matrix_all = df_out.to_numpy().astype(np.float32) 284 | # matrix_all_splits = np.array_split(matrix_all, n_processes) 285 | # processes = [] 286 | 287 | # fp_output = output_dir + "/cell_outputs_" 288 | # col_names_coords = [ 289 | # "cell_id", 290 | # "cell_centroid_x", 291 | # "cell_centroid_y", 292 | # "cell_size", 293 | # ] + gene_names 294 | 295 | # for chunk in matrix_all_splits: 296 | # p = mp.Process( 297 | # target=process_chunk_meta, 298 | # args=( 299 | # chunk, 300 | # fp_output, 301 | # seg_map_mi, 302 | # col_names_coords, 303 | # scale_pix_x, 304 | # scale_pix_y, 305 | # ), 306 | # ) 307 | # processes.append(p) 308 | # p.start() 309 | 310 | # for p in processes: 311 | # p.join() 312 | 313 | print("Done") 314 | 315 | 316 | if __name__ == "__main__": 317 | parser = argparse.ArgumentParser() 318 | 319 | parser.add_argument("--config_dir", type=str, help="path to config") 320 | parser.add_argument( 321 | "--is_cell", type=lambda x: str(x).lower() in ("true", "1", "yes"), help="whether to segment cells or nuclei" 322 | ) # note: type=bool would treat any non-empty string, including "False", as True 323 | 324 | args = parser.parse_args() 325 | config = load_config(args.config_dir) 326 | 327 | make_cell_gene_mat(config, args.is_cell) 328 | -------------------------------------------------------------------------------- /bidcell/processing/nuclei_segmentation.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import pandas as pd 6 | import tifffile 7 | from cellpose import models 8 | from skimage.transform import resize 9 | from tqdm import tqdm 10 | 11 | from .utils import get_patches_coords 12 | from ..config import Config, load_config 13 | 14 | 15 | def resize_dapi(dapi, new_h, new_w): 16 | """Resize DAPI image""" 17 | resized = resize(dapi, (new_h, new_w), preserve_range=True, anti_aliasing=True) 18 | return resized 19 | 20 | 21 | def segment_dapi(img, diameter=None, use_cpu=False): 22 | """Segment nuclei in DAPI image using Cellpose""" 23 | use_gpu = not use_cpu 24 | model = models.Cellpose(gpu=use_gpu, model_type="cyto") 25 | channels = [0, 0] 26 | mask, _, _, _ = model.eval(img, diameter=diameter, channels=channels) 27 | return mask 28 | 29 | 30 | def segment_nuclei(config: Config): 31 | dir_dataset = config.files.data_dir 32 | 33 | print("Reading DAPI image") 34 | if config.files.fp_dapi is None: 35 | fp_dapi = os.path.join(dir_dataset, "dapi_stitched.tif") 36 | else: 37 | fp_dapi = config.files.fp_dapi 38 | print(fp_dapi) 39 | dapi = tifffile.imread(fp_dapi) 40 | 41 | # Crop to size of transcript map (requires getting transcript maps first) 42 | if config.nuclei.crop_nuclei_to_ts: 43 | # Get starting coordinates 44 | fp_affine = os.path.join(dir_dataset, config.files.fp_affine) 45 | 46 | affine = pd.read_csv(fp_affine, index_col=0, header=None, sep="\t") 47 | 48 | min_x = int(float(affine.loc["min_x"].item())) 49 | min_y = int(float(affine.loc["min_y"].item())) 50 | size_x = int(float(affine.loc["size_x"].item())) 51 | size_y = int(float(affine.loc["size_y"].item())) 52 | 53 | dapi = dapi[min_y : min_y + size_y, min_x : min_x + size_x] 54 | 55 | dapi_h = dapi.shape[0] 56 | dapi_w = dapi.shape[1] 57 | print(f"DAPI shape h: {dapi_h} w: {dapi_w}") 58 | 59 | # Process patch-wise if too large or if rescaling is needed; None-valued size limits skip the size checks 60 | if ( 61 | (config.nuclei.max_height is not None and dapi_h > config.nuclei.max_height) 62 | or (config.nuclei.max_width is not None and dapi_w >
config.nuclei.max_width) 63 | or config.affine.scale_pix_x != 1.0 64 | or config.affine.scale_pix_y != 1.0 65 | ): 66 | if config.nuclei.max_height is None: 67 | max_height = dapi_h 68 | else: 69 | max_height = min(config.nuclei.max_height, dapi_h) 70 | if config.nuclei.max_width is None: 71 | max_width = dapi_w 72 | else: 73 | max_width = min(config.nuclei.max_width, dapi_w) 74 | 75 | print(f"Segmenting DAPI patches h: {max_height} w: {max_width}") 76 | 77 | # Coordinates of patches 78 | h_coords, _ = get_patches_coords(dapi_h, max_height) 79 | w_coords, _ = get_patches_coords(dapi_w, max_width) 80 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 81 | 82 | # Original patch sizes 83 | h_patch_sizes = [he - hs for (hs, he) in h_coords] 84 | w_patch_sizes = [we - ws for (ws, we) in w_coords] 85 | 86 | # Determine the resized patch sizes 87 | rh_patch_sizes = [round(y * config.affine.scale_pix_y) for y in h_patch_sizes] 88 | rw_patch_sizes = [round(x * config.affine.scale_pix_x) for x in w_patch_sizes] 89 | rhw_patch_sizes = [ 90 | (hsize, wsize) for hsize in rh_patch_sizes for wsize in rw_patch_sizes 91 | ] 92 | 93 | # Determine the resized patch starting coordinates 94 | rh_coords = [sum(rh_patch_sizes[:i]) for i, x in enumerate(rh_patch_sizes)] 95 | rw_coords = [sum(rw_patch_sizes[:i]) for i, x in enumerate(rw_patch_sizes)] 96 | rhw_coords = [(h, w) for h in rh_coords for w in rw_coords] 97 | 98 | # Sum up the new sizes to get the final size of the resized DAPI 99 | rh_dapi = sum(rh_patch_sizes) 100 | rw_dapi = sum(rw_patch_sizes) 101 | rdapi = np.zeros((rh_dapi, rw_dapi), dtype=dapi.dtype) 102 | nuclei = np.zeros((rh_dapi, rw_dapi), dtype=np.uint32) 103 | print(f"Nuclei image h: {rh_dapi} w: {rw_dapi}") 104 | 105 | # Divide into patches 106 | n_patches = len(hw_coords) 107 | total_n = 0 108 | 109 | for patch_i in tqdm(range(n_patches)): 110 | (hs, he, ws, we) = hw_coords[patch_i] 111 | (hsize, wsize) = rhw_patch_sizes[patch_i] 112 | (h, w) = rhw_coords[patch_i] 113 | 114 | patch = dapi[hs:he, ws:we] 115 | 116 | patch_resized = resize_dapi(patch, hsize, wsize) 117 | rdapi[h : h + hsize, w : w + wsize] = patch_resized 118 | 119 | # Segment nuclei in each patch and place into final segmentation with unique ID 120 | patch_nuclei = segment_dapi(patch_resized, config.nuclei.diameter, config.nuclei.use_cpu) 121 | nuclei_mask = np.where(patch_nuclei > 0, 1, 0) 122 | nuclei[h : h + hsize, w : w + wsize] = patch_nuclei + total_n * nuclei_mask 123 | unique_ids = np.unique(patch_nuclei) 124 | total_n += unique_ids.max() 125 | 126 | # Save resized DAPI 127 | fp_rdapi = os.path.join(dir_dataset, config.files.fp_rdapi) 128 | tifffile.imwrite(fp_rdapi, rdapi, photometric="minisblack") 129 | 130 | else: 131 | print("Segmenting whole DAPI") 132 | nuclei = segment_dapi(dapi, config.nuclei.diameter, config.nuclei.use_cpu) 133 | 134 | print(f"Finished segmenting, found {len(np.unique(nuclei))-1} nuclei") 135 | 136 | # Save nuclei segmentation 137 | fp_nuclei = os.path.join(dir_dataset, config.files.fp_nuclei) 138 | tifffile.imwrite(fp_nuclei, nuclei.astype(np.uint32), photometric="minisblack") 139 | 140 | 141 | if __name__ == "__main__": 142 | parser = argparse.ArgumentParser() 143 | 144 | parser.add_argument( 145 | "--config_dir", type=str, help="path to config" 146 | ) 147 | 148 | args = parser.parse_args() 149 | config = load_config(args.config_dir) 150 | 151 | segment_nuclei(config) 152 |
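# Usage sketch: running this step on its own (illustrative; assumes a parameter
# .yaml accepted by load_config, e.g. a filled-in xenium_example_config.yaml):
#
#     from bidcell.config import load_config
#     from bidcell.processing.nuclei_segmentation import segment_nuclei
#
#     cfg = load_config("xenium_example_config.yaml")
#     segment_nuclei(cfg)  # writes the uint32 nuclei mask to config.files.fp_nuclei under data_dir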
-------------------------------------------------------------------------------- /bidcell/processing/nuclei_stitch_fov.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | import re 5 | import sys 6 | 7 | import natsort 8 | import numpy as np 9 | import tifffile 10 | from PIL import Image 11 | from ..config import Config, load_config 12 | 13 | 14 | def check_pattern(string, pattern): 15 | """Check whether a string contains a pattern, where '#' matches any digit""" 16 | pattern = pattern.replace("#", r"\d") # Replace "#" with "\d" to match any digit 17 | match = re.search(pattern, string) 18 | # print(match, match.group()) 19 | 20 | if match: 21 | return True # Pattern found in the string 22 | else: 23 | return False # Pattern not found in the string 24 | 25 | 26 | def check_shape_imgs(fp, target_h, target_w): 27 | """Check that an image (given its file path) has the same shape as target""" 28 | img = Image.open(fp) 29 | if img.size == (target_w, target_h): 30 | return True 31 | else: 32 | return False 33 | 34 | 35 | def check_images_meet_criteria(fp_list, boolean_list, msg): 36 | """Check if any file in a list of paths doesn't meet criteria""" 37 | wrong = [i for i, x in enumerate(boolean_list) if not x] 38 | if len(wrong) > 0: 39 | sys.exit(f"{msg}: {[fp_list[i] for i in wrong]}") 40 | 41 | 42 | def get_string_with_pattern(number, pattern): 43 | """A single '#' in the pattern gives unpadded numbers (F1..F10); multiple '#'s zero-pad to that many digits (F001..F010)""" 44 | num_hash = pattern.count("#") 45 | no_hash = pattern.replace("#", "") 46 | 47 | if num_hash == 1: 48 | return no_hash + str(number) 49 | elif num_hash >= len(str(number)): 50 | padded = str(number).zfill(num_hash) 51 | return no_hash + str(padded) 52 | else: 53 | sys.exit(f"Number requires more characters than {pattern}") 54 | 55 | 56 | def read_dapi(fp, channel_first, channel_dapi): 57 | """Reads DAPI image or channel from file""" 58 | dapi = tifffile.imread(fp) 59 | 60 | if len(dapi.shape) > 2: 61 | if channel_first: 62 | dapi = dapi[channel_dapi, :, :] 63 | else: 64 | dapi = dapi[:, :, channel_dapi] 65 | 66 | return dapi 67 | 68 | 69 | def stitch_nuclei(config: Config): 70 | dir_dataset = config.files.data_dir 71 | 72 | if not config.nuclei_fovs.dir_dapi: 73 | dir_dapi = dir_dataset 74 | else: 75 | dir_dapi = config.nuclei_fovs.dir_dapi 76 | 77 | ext_pat = "".join( 78 | "[%s%s]" % (e.lower(), e.upper()) for e in config.nuclei_fovs.ext_dapi 79 | ) 80 | fp_dapi_list = glob.glob(os.path.join(dir_dapi, "*."
+ ext_pat)) 81 | fp_dapi_list = natsort.natsorted(fp_dapi_list) 82 | 83 | sample = tifffile.imread(fp_dapi_list[0]) 84 | fov_shape = sample.shape 85 | 86 | # Determine FOV dimensions; multi-channel images use the configured DAPI channel 87 | if len(fov_shape) > 2 and config.nuclei_fovs.channel_first: 88 | print(f"Channel axis first, DAPI channel {config.nuclei_fovs.channel_dapi}") 89 | fov_h = sample.shape[1] 90 | fov_w = sample.shape[2] 91 | elif len(fov_shape) > 2: 92 | print(f"Channel axis last, DAPI channel {config.nuclei_fovs.channel_dapi}") 93 | fov_h = sample.shape[0] 94 | fov_w = sample.shape[1] 95 | else: 96 | fov_h, fov_w = sample.shape[0], sample.shape[1] 97 | fov_dtype = sample.dtype 98 | 99 | # Error if patterns in file path names not found 100 | found_f = [check_pattern(s, config.nuclei_fovs.pattern_f) for s in fp_dapi_list] 101 | check_images_meet_criteria(fp_dapi_list, found_f, "FOV string pattern not found in") 102 | 103 | if config.nuclei_fovs.pattern_z is not None: 104 | found_z = [check_pattern(s, config.nuclei_fovs.pattern_z) for s in fp_dapi_list] 105 | check_images_meet_criteria( 106 | fp_dapi_list, found_z, "Z slice string pattern not found in" 107 | ) 108 | 109 | # Check shape the same as sample 110 | match_shape = [check_shape_imgs(s, fov_h, fov_w) for s in fp_dapi_list] 111 | check_images_meet_criteria(fp_dapi_list, match_shape, "Different image shape for") 112 | 113 | # Locations of each FOV in the whole image 114 | n_fov = config.nuclei_fovs.n_fov 115 | if config.nuclei_fovs.row_major: 116 | order = np.arange( 117 | config.nuclei_fovs.n_fov_h * config.nuclei_fovs.n_fov_w 118 | ).reshape((config.nuclei_fovs.n_fov_h, config.nuclei_fovs.n_fov_w)) 119 | else: 120 | order = np.arange( 121 | config.nuclei_fovs.n_fov_h * config.nuclei_fovs.n_fov_w 122 | ).reshape((config.nuclei_fovs.n_fov_h, config.nuclei_fovs.n_fov_w), order="F") 123 | 124 | # Arrangement of the FOVs - default is ul 125 | if config.nuclei_fovs.start_corner == "ur": 126 | order = np.flip(order, 1) 127 | elif config.nuclei_fovs.start_corner == "bl": 128 | order = np.flip(order, 0) 129 | elif config.nuclei_fovs.start_corner == "br": 130 | order = np.flip(order, (0, 1)) 131 | 132 | print("FOV ordering") 133 | print(order) 134 | 135 | stitched = np.zeros( 136 | (fov_h * config.nuclei_fovs.n_fov_h, fov_w * config.nuclei_fovs.n_fov_w), 137 | dtype=fov_dtype, 138 | ) 139 | 140 | for i_fov in range(n_fov): 141 | coord = np.where(order == i_fov) 142 | h_idx = coord[0][0] 143 | w_idx = coord[1][0] 144 | h_start = h_idx * fov_h 145 | w_start = w_idx * fov_w 146 | h_end = h_start + fov_h 147 | w_end = w_start + fov_w 148 | 149 | fov_num = i_fov + config.nuclei_fovs.min_fov 150 | 151 | # All files for FOV 152 | pattern_fov = get_string_with_pattern(fov_num, config.nuclei_fovs.pattern_f) 153 | print(pattern_fov) 154 | found_fov = [check_pattern(s, pattern_fov) for s in fp_dapi_list] 155 | fp_stack_fov = [fp_dapi_list[i] for i, x in enumerate(found_fov) if x] 156 | 157 | # Take MIP - or z level 158 | if config.nuclei_fovs.mip: 159 | dapi_stack = np.zeros((len(fp_stack_fov), fov_h, fov_w), dtype=fov_dtype) 160 | for i, fp in enumerate(fp_stack_fov): 161 | dapi_stack[i, :, :] = read_dapi( 162 | fp, 163 | config.nuclei_fovs.channel_first, 164 | config.nuclei_fovs.channel_dapi, 165 | ) 166 | 167 | fov_img = np.max(dapi_stack, axis=0) 168 | 169 | else: 170 | # Find z level slice for FOV 171 | pattern_slice = get_string_with_pattern( 172 | config.nuclei_fovs.z_level, config.nuclei_fovs.pattern_z 173 | ) 174 | print(pattern_slice) 175 | found_slice = [check_pattern(s, pattern_slice) for s in fp_stack_fov]
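# found_slice flags which of this FOV's files match the requested z-level
# pattern; the code below expects exactly one match per FOV.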
176 | found_slice_idx = [i for i, x in enumerate(found_slice) if x] 177 | if len(found_slice_idx) != 1: 178 | sys.exit( 179 | f"Expected exactly 1 file matching {pattern_slice} for FOV {fov_num}, found {len(found_slice_idx)}" 180 | ) 181 | 182 | print(fp_stack_fov[found_slice_idx[0]]) 183 | fov_img = read_dapi( 184 | fp_stack_fov[found_slice_idx[0]], 185 | config.nuclei_fovs.channel_first, 186 | config.nuclei_fovs.channel_dapi, 187 | ) 188 | 189 | # Flip 190 | if config.nuclei_fovs.flip_ud: 191 | fov_img = np.flip(fov_img, 0) 192 | 193 | # Place into appropriate location in stitched image 194 | stitched[h_start:h_end, w_start:w_end] = fov_img.copy() 195 | 196 | # Save 197 | fp_output = os.path.join(dir_dataset, "dapi_stitched.tif") 198 | tifffile.imwrite(fp_output, stitched, photometric="minisblack") 199 | print(f"Saved {fp_output}") 200 | 201 | 202 | if __name__ == "__main__": 203 | parser = argparse.ArgumentParser() 204 | 205 | parser.add_argument("--config_dir", type=str, help="path to config") 206 | 207 | args = parser.parse_args() 208 | config = load_config(args.config_dir) 209 | 210 | stitch_nuclei(config) 211 | -------------------------------------------------------------------------------- /bidcell/processing/preannotate.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import collections 3 | import glob 4 | import json 5 | import multiprocessing as mp 6 | import os 7 | import sys 8 | 9 | import h5py 10 | import numpy as np 11 | import pandas as pd 12 | from scipy.stats import spearmanr 13 | 14 | from .utils import get_n_processes 15 | from ..config import Config, load_config 16 | 17 | np.seterr(divide="ignore", invalid="ignore") 18 | 19 | 20 | def json_file_to_pyobj(filename): 21 | """ 22 | Read json config file 23 | """ 24 | 25 | def _json_object_hook(d): 26 | return collections.namedtuple("X", d.keys())(*d.values()) 27 | 28 | def json2obj(data): 29 | return json.loads(data, object_hook=_json_object_hook) 30 | 31 | return json2obj(open(filename).read()) 32 | 33 | 34 | def normalise_matrix(matrix): 35 | x_sums = np.sum(matrix, axis=1) 36 | matrix = matrix / np.expand_dims(x_sums, -1) 37 | matrix = np.log1p(matrix) 38 | return matrix 39 | 40 | 41 | def process_chunk_corr(matrix, dir_output, sc_expr, sc_labels, n_atlas_types): 42 | matrix_out = np.zeros((matrix.shape[0], 4)) 43 | col_names = ["cell_id", "cell_type", "spearman", "cell_type_atlas"] 44 | 45 | # cell_type 46 | cell_genes_norm = normalise_matrix(matrix[:, 1:]) 47 | res = spearmanr(sc_expr, cell_genes_norm, axis=1) 48 | corr = res.correlation 49 | # bottom-left block of the correlation matrix: cells (rows) vs reference types (columns) 50 | corr = corr[n_atlas_types:, :n_atlas_types] 51 | 52 | corr_best = np.max(corr, 1) 53 | best_i_type = np.argmax(corr, 1) 54 | predicted_cell_type = [sc_labels[x] for x in best_i_type] 55 | 56 | nan_true = np.isnan(corr_best) 57 | corr_best = [x if not y else -1 for (x, y) in zip(corr_best, nan_true)] 58 | best_i_type = [x if not y else -1 for (x, y) in zip(best_i_type, nan_true)] 59 | predicted_cell_type = [ 60 | x if not y else -1 for (x, y) in zip(predicted_cell_type, nan_true) 61 | ] 62 | 63 | # cell ID 64 | matrix_out[:, 0] = matrix[:, 0].copy() 65 | 66 | # cell type 67 | matrix_out[:, 1] = predicted_cell_type.copy() 68 | 69 | # spearman 70 | matrix_out[:, 2] = corr_best.copy() 71 | 72 | # cell type atlas 73 | matrix_out[:, 3] = best_i_type.copy() 74 | 75 | # Save as csv 76 | df_split = pd.DataFrame( 77 | matrix_out, index=list(range(matrix_out.shape[0])), columns=col_names 78 | ) 79 | df_split.to_csv( 80 | dir_output +
"/preannotations_%d.csv" % matrix_out[0, 0], index=False 81 | ) 82 | 83 | 84 | def preannotate(config: Config): 85 | dir_dataset = config.files.data_dir 86 | expr_dir = os.path.join(dir_dataset, config.files.dir_cgm, "nuclei") 87 | 88 | # Cell expressions - order of gene names (columns) will be in same order as all_gene_names.txt 89 | df_cells = pd.read_csv(os.path.join(expr_dir, config.files.fp_expr), index_col=0) 90 | print(f"Number of cells: {df_cells.shape[0]}") 91 | 92 | # Reference data - no requirement of column orders - ensure same order as df_cells 93 | df_ref_orig = pd.read_csv(config.files.fp_ref, index_col=0) 94 | 95 | # Ensure the order of genes match 96 | genes_cells = df_cells.columns[1:].tolist() 97 | ct_columns = df_ref_orig.columns[-3:].tolist() 98 | df_ref = df_ref_orig[genes_cells + ct_columns] 99 | 100 | genes_ref = df_ref.columns[:-3] 101 | if list(genes_cells) != list(genes_ref): 102 | print( 103 | "Genes in transcripts but not reference: ", 104 | list(set(genes_cells) - set(genes_ref)), 105 | ) 106 | print( 107 | "Genes in reference but not transcripts: ", 108 | list(set(genes_ref) - set(genes_cells)), 109 | ) 110 | print("Check names of genes") 111 | sys.exit() 112 | 113 | sc_expr = df_ref.iloc[:, :-3].to_numpy() 114 | n_atlas_types = sc_expr.shape[0] 115 | sc_labels = df_ref.iloc[:, -3].to_numpy().astype(int) 116 | # sc_names = df_ref.iloc[:, -2].to_list() 117 | 118 | # Divide the data into chunks for multiprocessing 119 | n_processes = get_n_processes(config.cpus) 120 | print(f"Number of splits for multiprocessing: {n_processes}") 121 | 122 | matrix_all = df_cells.to_numpy().astype(np.float32) 123 | matrix_all_splits = np.array_split(matrix_all, n_processes) 124 | processes = [] 125 | 126 | print("Computing simple annotation") 127 | for chunk in matrix_all_splits: 128 | p = mp.Process( 129 | target=process_chunk_corr, 130 | args=(chunk, dir_dataset, sc_expr, sc_labels, n_atlas_types), 131 | ) 132 | processes.append(p) 133 | p.start() 134 | 135 | for p in processes: 136 | p.join() 137 | 138 | fp_chunks = glob.glob(dir_dataset + "/preannotations_*.csv") 139 | for fp_i, fpc in enumerate(fp_chunks): 140 | df_i = pd.read_csv(fpc) 141 | if fp_i == 0: 142 | cell_df = df_i.copy() 143 | else: 144 | cell_df = pd.concat([cell_df, df_i], axis=0) 145 | 146 | cell_type_col = cell_df["cell_type"].to_numpy() 147 | cell_id_col = cell_df["cell_id"].to_numpy() 148 | 149 | h5f = h5py.File(dir_dataset + "/" + config.files.fp_nuclei_anno, "w") 150 | h5f.create_dataset("data", data=cell_type_col) 151 | h5f.create_dataset("ids", data=cell_id_col) 152 | h5f.close() 153 | 154 | # Clean up 155 | for fpc in fp_chunks: 156 | os.remove(fpc) 157 | 158 | 159 | if __name__ == "__main__": 160 | parser = argparse.ArgumentParser() 161 | 162 | parser.add_argument("--config_dir", type=str, help="path to config") 163 | 164 | args = parser.parse_args() 165 | config = load_config(args.config_dir) 166 | 167 | preannotate(config) 168 | -------------------------------------------------------------------------------- /bidcell/processing/transcript_patches.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | import re 5 | 6 | import h5py 7 | import natsort 8 | import numpy as np 9 | from tqdm import tqdm 10 | 11 | from ..config import Config, load_config 12 | 13 | 14 | def generate_patches(config: Config): 15 | """ 16 | Divides transcriptomic maps of all genes into patches for input to the CNN 17 | 18 | """ 19 | dir_dataset 
= os.path.join(config.files.data_dir, config.files.dir_out_maps) 20 | 21 | patch_size = config.model_params.patch_size 22 | shift = [0, int(patch_size / 2)] 23 | 24 | fp_maps = glob.glob(dir_dataset + "/all_genes_*.hdf5") 25 | fp_maps = natsort.natsorted(fp_maps) 26 | 27 | for fp in fp_maps: 28 | print(f"Processing {fp}") 29 | h5f = h5py.File(fp, "r") 30 | sst = h5f["data"][:] 31 | h5f.close() 32 | print("Loaded gene expr maps from %s" % fp) 33 | 34 | # hs, he, ws, we 35 | map_h = sst.shape[0] 36 | map_w = sst.shape[1] 37 | map_coords = [ 38 | int(x) 39 | for x in re.findall(r"\d+", os.path.basename(fp).replace(".hdf5", "")) 40 | ] 41 | print(map_coords) 42 | 43 | h_lim = sst.shape[0] 44 | w_lim = sst.shape[1] 45 | 46 | # If map contains blank border 47 | if (map_coords[1] - map_coords[0]) < map_h: 48 | h_lim = map_coords[1] - map_coords[0] 49 | if (map_coords[3] - map_coords[2]) < map_w: 50 | w_lim = map_coords[3] - map_coords[2] 51 | 52 | # print(h_lim, w_lim) 53 | 54 | for shift_patches in shift: 55 | print("Shift by %d" % shift_patches) 56 | 57 | dir_output = os.path.join( 58 | dir_dataset, 59 | config.files.dir_patches 60 | + "%dx%d_shift_%d" % (patch_size, patch_size, shift_patches), 61 | ) 62 | if not os.path.exists(dir_output): 63 | os.makedirs(dir_output) 64 | 65 | # Get coordinates of non-overlapping patches 66 | if shift_patches == 0: 67 | h_starts = list(np.arange(0, h_lim - patch_size, patch_size)) 68 | w_starts = list(np.arange(0, w_lim - patch_size, patch_size)) 69 | 70 | # Include the remainder patches at the bottom/right edges 71 | h_starts.append(h_lim - patch_size) 72 | w_starts.append(w_lim - patch_size) 73 | 74 | else: 75 | h_starts = list( 76 | np.arange(shift_patches, h_lim - patch_size, patch_size) 77 | ) 78 | w_starts = list( 79 | np.arange(shift_patches, w_lim - patch_size, patch_size) 80 | ) 81 | 82 | coords_starts = [(x, y) for x in h_starts for y in w_starts] 83 | print(f"{len(coords_starts)} patches") 84 | 85 | # Get patches and save 86 | for h, w in tqdm(coords_starts): 87 | patch = sst[h : h + patch_size, w : w + patch_size, :] 88 | 89 | fp_output = f"{dir_output}/{h+map_coords[0]}_{w+map_coords[2]}.hdf5" 90 | 91 | hf = h5py.File(fp_output, "w") # distinct name: 'h' is the loop's patch coordinate 92 | _ = hf.create_dataset("data", data=patch, dtype=np.uint8) 93 | hf.close() 94 | 95 | 96 | if __name__ == "__main__": 97 | parser = argparse.ArgumentParser() 98 | 99 | parser.add_argument("--config_dir", type=str, help="path to config") 100 | 101 | args = parser.parse_args() 102 | config = load_config(args.config_dir) 103 | 104 | generate_patches(config) 105 | -------------------------------------------------------------------------------- /bidcell/processing/transcripts.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import csv 3 | import glob 4 | import multiprocessing as mp 5 | import os 6 | import pathlib 7 | import re 8 | import sys 9 | import warnings 10 | 11 | import h5py 12 | import natsort 13 | import numpy as np 14 | import pandas as pd 15 | import tifffile 16 | from tqdm import tqdm 17 | 18 | from .utils import get_n_processes, get_patches_coords 19 | from ..config import Config, load_config 20 | 21 | 22 | def process_gene_chunk( 23 | gene_chunk, 24 | df_patch, 25 | img_height, 26 | img_width, 27 | dir_output, 28 | hs, 29 | ws, 30 | gene_col, 31 | x_col, 32 | y_col, 33 | counts_col, 34 | ): 35 | # print(gene_chunk) 36 | for i_fe, fe in enumerate(gene_chunk): 37 | # print(fe) 38 | df_fe = df_patch.loc[df_patch[gene_col] == fe] 39 | map_fe = np.zeros((img_height, img_width))
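# map_fe is the single-gene transcript map for this patch: the loops below add 1
# (or the platform's per-spot count, when counts_col is set) at each transcript's
# rounded pixel location.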
40 | # print(map_fe.shape) 41 | 42 | if counts_col is None: 43 | for idx in df_fe.index: 44 | idx_x = np.round(df_patch.iloc[idx][x_col]).astype(int) 45 | idx_y = np.round(df_patch.iloc[idx][y_col]).astype(int) 46 | 47 | map_fe[idx_y, idx_x] += 1 48 | 49 | else: 50 | for idx in df_fe.index: 51 | idx_x = np.round(df_patch.iloc[idx][x_col]).astype(int) 52 | idx_y = np.round(df_patch.iloc[idx][y_col]).astype(int) 53 | idx_counts = df_patch.iloc[idx][counts_col] 54 | 55 | map_fe[idx_y, idx_x] += idx_counts 56 | 57 | # print(map_fe.shape) 58 | 59 | fp_fe_map = f"{dir_output}/{fe}_{hs}_{ws}.tif" 60 | # print(fp_fe_map) 61 | tifffile.imwrite(fp_fe_map, map_fe.astype(np.uint8), photometric="minisblack") 62 | 63 | 64 | def stitch_patches(dir_patches, fp_pattern): 65 | """Stitches together the patches of summed genes and saves as new tif""" 66 | fp_patches = glob.glob(dir_patches + "/" + fp_pattern) 67 | fp_patches = natsort.natsorted(fp_patches) 68 | 69 | coords = np.zeros((len(fp_patches), 4), dtype=int) 70 | 71 | for i, fp in enumerate(fp_patches): 72 | coords_patch = [int(x) for x in re.findall(r"\d+", os.path.basename(fp))] 73 | coords[i, :] = np.array(coords_patch) 74 | 75 | height_patch = coords[0, 1] - coords[0, 0] 76 | width_patch = coords[0, 3] - coords[0, 2] 77 | height = np.max(coords[:, 1]) + height_patch 78 | width = np.max(coords[:, 2]) + width_patch 79 | 80 | whole = np.zeros((height, width), dtype=np.uint16) 81 | 82 | for i, fp in enumerate(fp_patches): 83 | hs, _, ws, _ = coords[i, 0], coords[i, 1], coords[i, 2], coords[i, 3] 84 | whole[hs : hs + height_patch, ws : ws + width_patch] = tifffile.imread(fp) 85 | 86 | height_trim = np.max(coords[:, 1]) 87 | width_trim = np.max(coords[:, 3]) 88 | 89 | whole = whole[:height_trim, :width_trim] 90 | print(whole.shape) 91 | 92 | tifffile.imwrite( 93 | dir_patches + "/all_genes_sum.tif", whole, photometric="minisblack" 94 | ) 95 | 96 | 97 | def generate_expression_maps(config: Config): 98 | """ 99 | Generates transcript expression maps from transcripts.csv.gz, which contains transcript data with locations. 
100 | Example file for Xenium: 101 | 102 | "transcript_id","cell_id","overlaps_nucleus","feature_name","x_location","y_location","z_location","qv" 103 | 281474976710656,565,0,"SEC11C",4.395842,328.66647,12.019493,18.66248 104 | 281474976710657,540,0,"NegControlCodeword_0502",5.074415,236.96484,7.6085105,18.634956 105 | 281474976710658,562,0,"SEC11C",4.702023,322.79715,12.289083,18.66248 106 | 281474976710659,271,0,"DAPK3",4.9066014,581.42865,11.222615,20.821745 107 | 281474976710660,291,0,"TCIM",5.6606994,720.85175,9.265523,18.017488 108 | 281474976710661,297,0,"TCIM",5.899098,748.5928,9.818688,18.017488 109 | 110 | """ 111 | 112 | dir_dataset = config.files.data_dir 113 | dir_out_maps = dir_dataset + "/" + config.files.dir_out_maps 114 | if not os.path.exists(dir_out_maps): 115 | os.makedirs(dir_out_maps) 116 | 117 | fp_transcripts_processed = dir_dataset + "/" + config.files.fp_transcripts_processed 118 | 119 | # Names to filter out 120 | # fp_transcripts_to_filter = os.path.join(config.files.data_dir, config.files.fp_transcripts_to_filter) 121 | # with open(fp_transcripts_to_filter) as file: 122 | # transcripts_to_filter = [line.rstrip() for line in file] 123 | transcripts_to_filter = config.transcripts.transcripts_to_filter 124 | 125 | # Column names in the transcripts csv 126 | x_col = config.transcripts.x_col 127 | y_col = config.transcripts.y_col 128 | gene_col = config.transcripts.gene_col 129 | 130 | # if not os.path.exists(fp_transcripts_processed): 131 | print("Loading transcripts file") 132 | fp_transcripts = config.files.fp_transcripts 133 | if pathlib.Path(fp_transcripts).suffixes[-1] == ".gz": 134 | if ".tsv" in fp_transcripts: 135 | df = pd.read_csv(fp_transcripts, sep="\t", compression="gzip") 136 | else: 137 | df = pd.read_csv(fp_transcripts, compression="gzip") 138 | else: 139 | if ".tsv" in fp_transcripts: 140 | df = pd.read_csv(fp_transcripts, sep="\t") 141 | else: 142 | df = pd.read_csv(fp_transcripts) 143 | print(df.head()) 144 | 145 | print("Filtering transcripts") 146 | if "qv" in df.columns: 147 | df = df[ 148 | (df["qv"] >= config.transcripts.min_qv) 149 | & (~df[gene_col].str.startswith(tuple(transcripts_to_filter))) 150 | ] 151 | else: 152 | df = df[(~df[gene_col].str.startswith(tuple(transcripts_to_filter)))] 153 | 154 | if config.files.fp_selected_genes is not None: 155 | with open(config.files.fp_selected_genes) as file: 156 | selected_genes = [line.rstrip() for line in file] 157 | df = df[(df[gene_col].isin(selected_genes))] 158 | 159 | # Scale 160 | # print(df[x_col].min(), df[x_col].max(), df[y_col].min(), df[y_col].max()) 161 | df[x_col] = df[x_col].mul(config.affine.scale_ts_x) 162 | df[y_col] = df[y_col].mul(config.affine.scale_ts_y) 163 | # print(df[x_col].min(), df[x_col].max(), df[y_col].min(), df[y_col].max()) 164 | 165 | # Shift 166 | min_x = df[x_col].min() 167 | min_y = df[y_col].min() 168 | if config.transcripts.shift_to_origin: 169 | with pd.option_context("mode.chained_assignment", None): 170 | df.loc[:, x_col] = df[x_col] - min_x + config.affine.global_shift_x 171 | df.loc[:, y_col] = df[y_col] - min_y + config.affine.global_shift_y 172 | 173 | size_x = df[x_col].max() + 1 174 | size_y = df[y_col].max() + 1 175 | 176 | # Write transform parameters to file 177 | fp_affine = os.path.join(dir_dataset, config.files.fp_affine) 178 | params = [ 179 | "scale_ts_x", 180 | "scale_ts_y", 181 | "min_x", 182 | "min_y", 183 | "size_x", 184 | "size_y", 185 | "global_shift_x", 186 | "global_shift_y", 187 | "origin", 188 | ] 189 | vals = [ 190 | 
config.affine.scale_ts_x, 191 | config.affine.scale_ts_y, 192 | min_x, 193 | min_y, 194 | size_x, 195 | size_y, 196 | config.affine.global_shift_x, 197 | config.affine.global_shift_y, 198 | config.transcripts.shift_to_origin, 199 | ] 200 | # print(vals) 201 | with open(fp_affine, "w") as f: 202 | writer = csv.writer(f, delimiter="\t") 203 | writer.writerows(zip(params, vals)) 204 | 205 | # Delete entries with negative coordinates 206 | df = df[df[x_col] >= 0] 207 | df = df[df[y_col] >= 0] 208 | 209 | df.reset_index(inplace=True, drop=True) 210 | print("Finished filtering") 211 | print("Saving csv...") 212 | 213 | df.to_csv(fp_transcripts_processed) 214 | # else: 215 | # print("Loading filtered transcripts") 216 | # df = pd.read_csv(fp_transcripts_processed, index_col=0) 217 | 218 | # Round locations and convert to integer 219 | df[x_col] = df[x_col].round().astype(int) 220 | df[y_col] = df[y_col].round().astype(int) 221 | 222 | print(df.head()) 223 | print(df.shape) 224 | 225 | # Save list of gene names 226 | gene_names = df[gene_col].unique() 227 | print("%d unique genes" % len(gene_names)) 228 | gene_names = natsort.natsorted(gene_names) 229 | with open(dir_dataset + "/" + config.files.fp_gene_names, "w") as f: 230 | for line in gene_names: 231 | f.write(f"{line}\n") 232 | 233 | # Dimensions 234 | total_height_t = int(np.ceil(df[y_col].max())) + 1 235 | total_width_t = int(np.ceil(df[x_col].max())) + 1 236 | 237 | fp_nuclei = os.path.join(dir_dataset, config.files.fp_nuclei) 238 | if os.path.exists(fp_nuclei): 239 | nuclei_img = tifffile.imread(fp_nuclei) 240 | nuclei_h = nuclei_img.shape[0] 241 | nuclei_w = nuclei_img.shape[1] 242 | if total_height_t <= nuclei_h and total_width_t <= nuclei_w: 243 | total_height = nuclei_h 244 | total_width = nuclei_w 245 | else: 246 | sys.exit( 247 | f"Dimensions of transcript map [{total_height_t},{total_width_t}] exceed those of nuclei image [{nuclei_h},{nuclei_w}]. Check scale_ts_x and scale_ts_y values. Then consider specifying --global_shift_x and --global_shift_y, or padding nuclei" 248 | ) 249 | else: 250 | warnings.warn( 251 | "Computing dimensions from transcript locations - check dimensions are the same as nuclei image.
Unless cropping DAPI to size of transcript map, it is highly advised to provide nuclei file name via --fp_nuclei to ensure dimensions are identical" 252 | ) 253 | total_height = total_height_t 254 | total_width = total_width_t 255 | 256 | print(f"Total height {total_height}, width {total_width}") 257 | 258 | # Start and end coordinates of patches 259 | h_coords, img_height = get_patches_coords( 260 | total_height, config.transcripts.max_height 261 | ) 262 | w_coords, img_width = get_patches_coords(total_width, config.transcripts.max_width) 263 | hw_coords = [(hs, he, ws, we) for (hs, he) in h_coords for (ws, we) in w_coords] 264 | 265 | print("Converting to maps") 266 | 267 | n_processes = get_n_processes(config.cpus) 268 | gene_names_chunks = np.array_split(gene_names, n_processes) 269 | 270 | for hs, he, ws, we in tqdm(hw_coords): 271 | print("Patch:", (hs, he, ws, we)) 272 | 273 | df_patch = df[(df[x_col].between(ws, we - 1)) & (df[y_col].between(hs, he - 1))] 274 | 275 | with pd.option_context("mode.chained_assignment", None): 276 | df_patch.loc[:, x_col] = df_patch[x_col] - ws 277 | df_patch.loc[:, y_col] = df_patch[y_col] - hs 278 | 279 | df_patch.reset_index(inplace=True, drop=True) 280 | 281 | processes = [] 282 | 283 | for gene_chunk in gene_names_chunks: 284 | p = mp.Process( 285 | target=process_gene_chunk, 286 | args=( 287 | gene_chunk, 288 | df_patch, 289 | img_height, 290 | img_width, 291 | dir_out_maps, 292 | hs, 293 | ws, 294 | gene_col, 295 | x_col, 296 | y_col, 297 | config.transcripts.counts_col, 298 | ), 299 | ) 300 | processes.append(p) 301 | p.start() 302 | 303 | for p in processes: 304 | p.join() 305 | 306 | # Combine channel-wise 307 | map_all_genes = np.zeros( 308 | (img_height, img_width, len(gene_names)), dtype=np.uint8 309 | ) 310 | 311 | for i_fe, fe in enumerate(tqdm(gene_names)): 312 | fp_fe_map = f"{dir_out_maps}/{fe}_{hs}_{ws}.tif" 313 | map_all_genes[:, :, i_fe] = tifffile.imread(fp_fe_map) 314 | os.remove(fp_fe_map) 315 | 316 | # Sum across all markers 317 | fp_out_map_sum = f"all_genes_sum_{hs}_{he}_{ws}_{we}.tif" 318 | tifffile.imwrite( 319 | dir_out_maps + "/" + fp_out_map_sum, 320 | np.sum(map_all_genes, -1).astype(np.uint8), 321 | photometric="minisblack", 322 | ) 323 | 324 | # Save to hdf5 325 | fp_out_map = f"all_genes_{hs}_{he}_{ws}_{we}.hdf5" 326 | h = h5py.File(dir_out_maps + "/" + fp_out_map, "w") 327 | _ = h.create_dataset("data", data=map_all_genes, dtype=np.uint8) 328 | h.close() 329 | print("Saved all maps") 330 | 331 | stitch_patches(dir_out_maps, "all_genes_sum_*.tif") 332 | 333 | 334 | if __name__ == "__main__": 335 | parser = argparse.ArgumentParser() 336 | 337 | parser.add_argument("--config_dir", type=str, help="path to config") 338 | 339 | args = parser.parse_args() 340 | config = load_config(args.config_dir) 341 | generate_expression_maps(config) 342 | -------------------------------------------------------------------------------- /bidcell/processing/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import multiprocessing as mp 3 | 4 | 5 | def get_patches_coords(size, size_patch): 6 | """Get start and end locations of patches in a large image given the image patch sizes""" 7 | 8 | if size <= size_patch: 9 | max_size = size 10 | coords = [(0, max_size)] 11 | else: 12 | max_size = size_patch 13 | starts = list(np.arange(0, size, size_patch)) 14 | ends = [x + size_patch if x + size_patch <= size else size for x in starts] 15 | coords = list(zip(starts, ends)) 16 | 17 | return
coords, max_size 18 | 19 | 20 | def get_n_processes(n_processes): 21 | """Number of CPUs for multiprocessing""" 22 | if n_processes is None: 23 | return mp.cpu_count() 24 | else: 25 | return n_processes if n_processes <= mp.cpu_count() else mp.cpu_count() 26 | -------------------------------------------------------------------------------- /data/dataset_xenium_breast1_small/morphology_mip_small.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/data/dataset_xenium_breast1_small/morphology_mip_small.tif -------------------------------------------------------------------------------- /data/example_mousebrain_genes.txt: -------------------------------------------------------------------------------- 1 | 2010300C02Rik 2 | Acsbg1 3 | Acta2 4 | Acvrl1 5 | Adamts2 6 | Adamtsl1 7 | Adgrl4 8 | Aldh1a2 9 | Angpt1 10 | Ano1 11 | Aqp4 12 | Arc 13 | Arhgap6 14 | Arhgap12 15 | Arhgap25 16 | Arhgef28 17 | Bcl11b 18 | Bdnf 19 | Bhlhe22 20 | Bhlhe40 21 | Btbd11 22 | Cabp7 23 | Cacna2d2 24 | Calb1 25 | Calb2 26 | Car4 27 | Carmn 28 | Cbln1 29 | Cbln4 30 | Cd24a 31 | Cd44 32 | Cd53 33 | Cd68 34 | Cd93 35 | Cd300c2 36 | Cdh4 37 | Cdh6 38 | Cdh9 39 | Cdh13 40 | Cdh20 41 | Chat 42 | Chodl 43 | Chrm2 44 | Cldn5 45 | Clmn 46 | Cntn6 47 | Cntnap4 48 | Cntnap5b 49 | Cobll1 50 | Col1a1 51 | Col6a1 52 | Col19a1 53 | Cort 54 | Cplx3 55 | Cpne4 56 | Cpne6 57 | Cpne8 58 | Crh 59 | Cspg4 60 | Ctgf 61 | Cux2 62 | Cwh43 63 | Cyp1b1 64 | Dcn 65 | Deptor 66 | Dkk3 67 | Dner 68 | Dpy19l1 69 | Dpyd 70 | Ebf3 71 | Emcn 72 | Epha4 73 | Eya4 74 | Fezf2 75 | Fgd5 76 | Fhod3 77 | Fibcd1 78 | Fign 79 | Fmod 80 | Fn1 81 | Fos 82 | Foxp2 83 | Gad1 84 | Gad2 85 | Gadd45a 86 | Galnt14 87 | Garnl3 88 | Gfap 89 | Gfra2 90 | Gjb2 91 | Gjc3 92 | Gli3 93 | Gm2115 94 | Gm19410 95 | Gng12 96 | Gpr17 97 | Grik3 98 | Gsg1l 99 | Gucy1a1 100 | Hapln1 101 | Hat1 102 | Hpcal1 103 | Hs3st2 104 | Htr1f 105 | Id2 106 | Igf1 107 | Igf2 108 | Igfbp4 109 | Igfbp5 110 | Igfbp6 111 | Igsf21 112 | Ikzf1 113 | Inpp4b 114 | Kcnh5 115 | Kcnmb2 116 | Kctd8 117 | Kctd12 118 | Kdr 119 | Lamp5 120 | Laptm5 121 | Ly6a 122 | Lypd6 123 | Lyz2 124 | Mapk4 125 | Mdga1 126 | Mecom 127 | Meis2 128 | Myl4 129 | Myo16 130 | Ndst3 131 | Ndst4 132 | Necab1 133 | Necab2 134 | Nell1 135 | Neto2 136 | Neurod6 137 | Nostrin 138 | Npnt 139 | Npy2r 140 | Nr2f2 141 | Nrep 142 | Nrn1 143 | Nrp2 144 | Nts 145 | Ntsr2 146 | Nwd2 147 | Nxph3 148 | Opalin 149 | Opn3 150 | Orai2 151 | Paqr5 152 | Parm1 153 | Pcsk5 154 | Pde7b 155 | Pde11a 156 | Pdgfra 157 | Pdyn 158 | Pdzd2 159 | Pdzrn3 160 | Pecam1 161 | Penk 162 | Pglyrp1 163 | Pip5k1b 164 | Pkib 165 | Plch1 166 | Plcxd2 167 | Plcxd3 168 | Plekha2 169 | Pln 170 | Pou3f1 171 | Ppp1r1b 172 | Prdm8 173 | Prox1 174 | Prph 175 | Prr16 176 | Prss35 177 | Pthlh 178 | Pvalb 179 | Rab3b 180 | Rasgrf2 181 | Rasl10a 182 | Rbp4 183 | Rfx4 184 | Rims3 185 | Rmst 186 | Rnf152 187 | Ror1 188 | Rorb 189 | Rprm 190 | Rspo1 191 | Rspo2 192 | Rxfp1 193 | Satb2 194 | Sdk2 195 | Sema3a 196 | Sema3d 197 | Sema3e 198 | Sema5b 199 | Sema6a 200 | Shisa6 201 | Siglech 202 | Sipa1l3 203 | Sla 204 | Slc6a3 205 | Slc13a4 206 | Slc17a6 207 | Slc17a7 208 | Slc39a12 209 | Slc44a5 210 | Slfn5 211 | Slit2 212 | Sncg 213 | Sntb1 214 | Sorcs3 215 | Sox10 216 | Sox11 217 | Sox17 218 | Spag16 219 | Spi1 220 | Spp1 221 | Sst 222 | Stard5 223 | Strip2 224 | Syndig1 225 | Syt2 226 | Syt6 227 | Syt17 228 | Tacr1 229 | Tanc1 230 | Th 231 | Thsd7a 232 | 
Tle4 233 | Tmem132d 234 | Tmem163 235 | Tmem255a 236 | Tox 237 | Trbc2 238 | Trem2 239 | Trp73 240 | Trpc4 241 | Unc13c 242 | Vat1l 243 | Vip 244 | Vwc2l 245 | Wfs1 246 | Zfp366 247 | Zfp536 248 | Zfpm2 249 | -------------------------------------------------------------------------------- /data/sc_references/sc_breast_markers_neg.csv: -------------------------------------------------------------------------------- 1 | ,ABCC11,ACTA2,ACTG2,ADAM9,ADGRE5,ADH1B,ADIPOQ,AGR3,AHSP,AIF1,AKR1C1,AKR1C3,ALDH1A3,ANGPT2,ANKRD28,ANKRD29,ANKRD30A,APOBEC3A,APOBEC3B,APOC1,AQP1,AQP3,AR,AVPR1A,BACE2,BANK1,BASP1,BTNL9,C1QA,C1QC,C2orf42,C5orf46,C6orf132,C15orf48,CAV1,CAVIN2,CCDC6,CCDC80,CCL5,CCL8,CCL20,CCND1,CCPG1,CCR7,CD1C,CD3D,CD3E,CD3G,CD4,CD8A,CD8B,CD9,CD14,CD19,CD27,CD68,CD69,CD79A,CD79B,CD80,CD83,CD86,CD93,CD163,CD247,CD274,CDC42EP1,CDH1,CEACAM6,CEACAM8,CENPF,CLCA2,CLDN4,CLDN5,CLEC9A,CLEC14A,CLECL1,CLIC6,CPA3,CRHBP,CRISPLD2,CSF3,CTH,CTLA4,CTSG,CTTN,CX3CR1,CXCL5,CXCL12,CXCL16,CXCR4,CYP1A1,CYTIP,DAPK3,DERL3,DMKN,DNAAF1,DNTTIP1,DPT,DSC2,DSP,DST,DUSP2,DUSP5,EDN1,EDNRB,EGFL7,EGFR,EIF4EBP1,ELF3,ELF5,ENAH,EPCAM,ERBB2,ERN1,ESM1,ESR1,FAM49A,FAM107B,FASN,FBLIM1,FBLN1,FCER1A,FCER1G,FCGR3A,FGL2,FLNB,FOXA1,FOXC2,FOXP3,FSTL3,GATA3,GJB2,GLIPR1,GNLY,GPR183,GZMA,GZMB,GZMK,HAVCR2,HDC,HMGA1,HOOK2,HOXD8,HOXD9,HPX,IGF1,IGSF6,IL2RA,IL2RG,IL3RA,IL7R,ITGAM,ITGAX,ITM2C,JUP,KARS,KDR,KIT,KLF5,KLRB1,KLRC1,KLRD1,KLRF1,KRT5,KRT6B,KRT7,KRT8,KRT14,KRT15,KRT16,KRT23,LAG3,LARS,LDHB,LEP,LGALSL,LIF,LILRA4,LPL,LPXN,LRRC15,LTB,LUM,LY86,LYPD3,LYZ,MAP3K8,MDM2,MEDAG,MKI67,MLPH,MMP1,MMP2,MMP12,MMRN2,MNDA,MPO,MRC1,MS4A1,MUC6,MYBPC1,MYH11,MYLK,MYO5B,MZB1,NARS,NCAM1,NDUFA4L2,NKG7,NOSTRIN,NPM3,OCIAD2,OPRPN,OXTR,PCLAF,PCOLCE,PDCD1,PDCD1LG2,PDE4A,PDGFRA,PDGFRB,PDK4,PECAM1,PELI1,PGR,PIGR,PIM1,PLD4,POLR2J3,POSTN,PPARG,PRDM1,PRF1,PTGDS,PTN,PTPRC,PTRHD1,QARS,RAB30,RAMP2,RAPGEF3,REXO4,RHOH,RORC,RTKN2,RUNX1,S100A4,S100A8,S100A14,SCD,SCGB2A1,SDC4,SEC11C,SEC24A,SELL,SERHL2,SERPINA3,SERPINB9,SFRP1,SFRP4,SH3YL1,SLAMF1,SLAMF7,SLC4A1,SLC5A6,SLC25A37,SMAP2,SMS,SNAI1,SOX17,SOX18,SPIB,SQLE,SRPK1,SSTR2,STC1,SVIL,TAC1,TACSTD2,TCEAL7,TCF4,TCF7,TCF15,TCIM,TCL1A,TENT5C,TFAP2A,THAP2,TIFA,TIGIT,TIMP4,TMEM147,TNFRSF17,TOMM7,TOP2A,TPD52,TPSAB1,TRAC,TRAF4,TRAPPC3,TRIB1,TUBA4A,TUBB2B,TYROBP,UCP1,USP53,VOPP1,VWF,WARS,ZEB1,ZEB2,ZNF562 2 | B,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 3 | 
CD4Tconv/Treg,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 4 | CD8T/CD8Tex,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 5 | 
DC,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 6 | Endothelial,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 7 | 
Epithelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 8 | Fibroblasts,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 9 | 
Malignant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 10 | Mast,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 11 | 
Mono/Macro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 12 | Myofibroblasts,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 13 | 
NK,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 14 | Neutrophils,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 15 | 
Plasma,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 16 | SMC,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 17 | -------------------------------------------------------------------------------- /data/sc_references/sc_breast_markers_pos.csv: -------------------------------------------------------------------------------- 1 | 
,ABCC11,ACTA2,ACTG2,ADAM9,ADGRE5,ADH1B,ADIPOQ,AGR3,AHSP,AIF1,AKR1C1,AKR1C3,ALDH1A3,ANGPT2,ANKRD28,ANKRD29,ANKRD30A,APOBEC3A,APOBEC3B,APOC1,AQP1,AQP3,AR,AVPR1A,BACE2,BANK1,BASP1,BTNL9,C1QA,C1QC,C2orf42,C5orf46,C6orf132,C15orf48,CAV1,CAVIN2,CCDC6,CCDC80,CCL5,CCL8,CCL20,CCND1,CCPG1,CCR7,CD1C,CD3D,CD3E,CD3G,CD4,CD8A,CD8B,CD9,CD14,CD19,CD27,CD68,CD69,CD79A,CD79B,CD80,CD83,CD86,CD93,CD163,CD247,CD274,CDC42EP1,CDH1,CEACAM6,CEACAM8,CENPF,CLCA2,CLDN4,CLDN5,CLEC9A,CLEC14A,CLECL1,CLIC6,CPA3,CRHBP,CRISPLD2,CSF3,CTH,CTLA4,CTSG,CTTN,CX3CR1,CXCL5,CXCL12,CXCL16,CXCR4,CYP1A1,CYTIP,DAPK3,DERL3,DMKN,DNAAF1,DNTTIP1,DPT,DSC2,DSP,DST,DUSP2,DUSP5,EDN1,EDNRB,EGFL7,EGFR,EIF4EBP1,ELF3,ELF5,ENAH,EPCAM,ERBB2,ERN1,ESM1,ESR1,FAM49A,FAM107B,FASN,FBLIM1,FBLN1,FCER1A,FCER1G,FCGR3A,FGL2,FLNB,FOXA1,FOXC2,FOXP3,FSTL3,GATA3,GJB2,GLIPR1,GNLY,GPR183,GZMA,GZMB,GZMK,HAVCR2,HDC,HMGA1,HOOK2,HOXD8,HOXD9,HPX,IGF1,IGSF6,IL2RA,IL2RG,IL3RA,IL7R,ITGAM,ITGAX,ITM2C,JUP,KARS,KDR,KIT,KLF5,KLRB1,KLRC1,KLRD1,KLRF1,KRT5,KRT6B,KRT7,KRT8,KRT14,KRT15,KRT16,KRT23,LAG3,LARS,LDHB,LEP,LGALSL,LIF,LILRA4,LPL,LPXN,LRRC15,LTB,LUM,LY86,LYPD3,LYZ,MAP3K8,MDM2,MEDAG,MKI67,MLPH,MMP1,MMP2,MMP12,MMRN2,MNDA,MPO,MRC1,MS4A1,MUC6,MYBPC1,MYH11,MYLK,MYO5B,MZB1,NARS,NCAM1,NDUFA4L2,NKG7,NOSTRIN,NPM3,OCIAD2,OPRPN,OXTR,PCLAF,PCOLCE,PDCD1,PDCD1LG2,PDE4A,PDGFRA,PDGFRB,PDK4,PECAM1,PELI1,PGR,PIGR,PIM1,PLD4,POLR2J3,POSTN,PPARG,PRDM1,PRF1,PTGDS,PTN,PTPRC,PTRHD1,QARS,RAB30,RAMP2,RAPGEF3,REXO4,RHOH,RORC,RTKN2,RUNX1,S100A4,S100A8,S100A14,SCD,SCGB2A1,SDC4,SEC11C,SEC24A,SELL,SERHL2,SERPINA3,SERPINB9,SFRP1,SFRP4,SH3YL1,SLAMF1,SLAMF7,SLC4A1,SLC5A6,SLC25A37,SMAP2,SMS,SNAI1,SOX17,SOX18,SPIB,SQLE,SRPK1,SSTR2,STC1,SVIL,TAC1,TACSTD2,TCEAL7,TCF4,TCF7,TCF15,TCIM,TCL1A,TENT5C,TFAP2A,THAP2,TIFA,TIGIT,TIMP4,TMEM147,TNFRSF17,TOMM7,TOP2A,TPD52,TPSAB1,TRAC,TRAF4,TRAPPC3,TRIB1,TUBA4A,TUBB2B,TYROBP,UCP1,USP53,VOPP1,VWF,WARS,ZEB1,ZEB2,ZNF562 2 | B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 3 | 
CD4Tconv/Treg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 4 | CD8T/CD8Tex,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 5 | 
DC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 6 | Endothelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0 7 | 
Epithelial,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 8 | Fibroblasts,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0 9 | 
Malignant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 10 | Mast,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 11 | 
Mono/Macro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 12 | Myofibroblasts,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 13 | 
NK,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 14 | Neutrophils,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 15 | 
Plasma,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0 16 | SMC,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0 17 | -------------------------------------------------------------------------------- /example_small.py: -------------------------------------------------------------------------------- 1 | from bidcell import BIDCellModel 2 | 3 | BIDCellModel.get_example_data() 4 | 5 | model = BIDCellModel("params_small_example.yaml") 6 | 7 | model.run_pipeline() 8 | 9 | # Alternatively, call individual functions 10 | 11 | # model.preprocess() 12 | 13 | # or call individual functions within preprocess 14 | 15 | # # model.segment_nuclei() 16 | # # model.generate_expression_maps() 17 | # # model.generate_patches() 18 | # # model.make_cell_gene_mat(is_cell=False) 19 | # # model.preannotate() 20 | 21 | # model.train() 22 | 23 | # model.predict() 24 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.pdm.build] 2 | includes = ["bidcell", "data"] 3 | 4 | [tool.pdm.dev-dependencies] 5 | test = [ 6 | 
"pytest>=7.4.1", 7 | ] 8 | dev = [ 9 | "black>=23.7.0", 10 | "deptry>=0.12.0", 11 | "flake8>=6.1.0", 12 | "ipython>=8.15.0", 13 | ] 14 | [build-system] 15 | requires = ["pdm-backend"] 16 | build-backend = "pdm.backend" 17 | 18 | [project] 19 | name = "bidcell" 20 | version = "1.0.3" 21 | description = "Biologically-informed deep learning for cell segmentation of subcelluar spatial transcriptomics data." 22 | authors = [ 23 | {name = "Helen Fu", email = "xiaohang.fu@sydney.edu.au"}, 24 | ] 25 | dependencies = [ 26 | "pandas>=2.1.0", 27 | "numpy>=1.24.4", 28 | "tifffile>=2023.8.30", 29 | "imgaug>=0.4.0", 30 | "h5py>=3.9.0", 31 | "scipy>=1.11.2", 32 | "matplotlib>=3.7.2", 33 | "natsort>=8.4.0", 34 | "cellpose>=2.2.3", 35 | "scikit-image>=0.21.0", 36 | "segmentation-models-pytorch>=0.3.3", 37 | "opencv-python>=4.8.0.76", 38 | "pillow>=10.0.0", 39 | "pyyaml>=6.0.1", 40 | "pydantic>=2.3.0", 41 | "tqdm>=4.66.1", 42 | ] 43 | requires-python = ">=3.9,<3.13" 44 | readme = "README.md" 45 | license = {text = "MIT"} 46 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = 3 | E501 4 | E266 5 | W503 6 | E731 7 | E203 8 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SydneyBioX/BIDCell/e565988cd2e78e622c68bd0a5649a1ec8b9b281f/tests/__init__.py --------------------------------------------------------------------------------