├── .gitignore
├── .readthedocs.yaml
├── GRCh38_resources
    ├── HLA_regions.bed
    ├── genetic_map_GRCh38_merged.tab.gz
    ├── hgTables_hg38_gencode.txt
    └── ig_gene_list.txt
├── LICENSE
├── README.md
├── calicost.smk
├── config.yaml
├── configuration_cna
├── configuration_cna_multi
├── configuration_purity
├── docs
    ├── _ext
    │   └── typed_returns.py
    ├── _static
    │   ├── css
    │   │   ├── custom.css
    │   │   ├── dataframe.css
    │   │   ├── nbsphinx.css
    │   │   └── sphinx_gallery.css
    │   └── img
    │   │   ├── acn_color_palette.png
    │   │   ├── overview4_combine.pdf
    │   │   └── overview4_combine.png
    ├── conf.py
    ├── index.rst
    ├── installation.rst
    ├── notebooks
    │   └── tutorials
    │   │   ├── prostate_tutorial.ipynb
    │   │   └── simulated_data_tutorial.ipynb
    ├── parameters.rst
    ├── references.rst
    └── tutorials.rst
├── environment.yml
├── examples
    ├── CalicoST_example.tar.gz
    ├── example_input_filelist
    ├── prostate_example.tar.gz
    └── simulated_example.tar.gz
├── pyproject.toml
├── setup.py
├── src
    └── calicost
    │   ├── __init__.py
    │   ├── allele_starch_generateconfig.py
    │   ├── arg_parse.py
    │   ├── calicost_main.py
    │   ├── calicost_supervised.py
    │   ├── estimate_tumor_proportion.py
    │   ├── find_integer_copynumber.py
    │   ├── hmm_NB_BB_nophasing.py
    │   ├── hmm_NB_BB_nophasing_v2.py
    │   ├── hmm_NB_BB_phaseswitch.py
    │   ├── hmm_NB_sharedstates.py
    │   ├── hmm_gaussian.py
    │   ├── hmrf.py
    │   ├── hmrf_normalmixture.py
    │   ├── joint_allele_generateconfig.py
    │   ├── oldcode.py
    │   ├── parse_input.py
    │   ├── phasing.py
    │   ├── phylogeny_startle.py
    │   ├── phylogeography.py
    │   ├── simple_sctransform.py
    │   ├── utils_IO.py
    │   ├── utils_distribution_fitting.py
    │   ├── utils_hmm.py
    │   ├── utils_hmrf.py
    │   ├── utils_phase_switch.py
    │   └── utils_plotting.py
└── utils
    ├── filter_snps_forphasing.py
    ├── get_snp_matrix.py
    ├── maya_plotter.py
    ├── merge_bamfile.py
    ├── plot_hatchet.py
    ├── process_snps.sh
    └── process_snps_merged.sh
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
--------------------------------------------------------------------------------
/.readthedocs.yaml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | build:
4 | os: ubuntu-22.04
5 | tools:
6 | python: "3.10"
7 |
8 | sphinx:
9 | builder: html
10 | configuration: docs/conf.py
11 | fail_on_warning: false
12 |
13 | python:
14 | install:
15 | - method: pip
16 | path: .
17 | extra_requirements: [docs]
18 |
19 | submodules:
20 | include: [docs/notebooks]
21 | recursive: true
--------------------------------------------------------------------------------
/GRCh38_resources/HLA_regions.bed:
--------------------------------------------------------------------------------
1 | chr6 29722775 29738528
2 | chr6 29726601 29749049
3 | chr6 29826967 29831125
4 | chr6 29941260 29945884
5 | chr6 30489509 30494194
6 | chr6 31268749 31272130
7 | chr6 31269491 31357188
8 | chr6 32439878 32445046
9 | chr6 32517353 32530287
10 | chr6 32578769 32589848
11 | chr6 32628179 32647062
12 | chr6 32659467 32668383
13 | chr6 32659880 32660729
14 | chr6 32741391 32747198
15 | chr6 32756098 32763532
16 | chr6 32812763 32820466
17 | chr6 32934629 32941028
18 | chr6 32948613 32969094
19 | chr6 33004182 33009591
20 | chr6 33064569 33080775
21 | chr6 33075990 33089696
22 |
--------------------------------------------------------------------------------
/GRCh38_resources/genetic_map_GRCh38_merged.tab.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raphael-group/CalicoST/5e4a8a1230e71505667d51390dc9c035a69d60d9/GRCh38_resources/genetic_map_GRCh38_merged.tab.gz
--------------------------------------------------------------------------------
/GRCh38_resources/ig_gene_list.txt:
--------------------------------------------------------------------------------
1 | IGKV3OR2-268
2 | IGKC
3 | IGKJ5
4 | IGKJ4
5 | IGKJ3
6 | IGKJ2
7 | IGKJ1
8 | IGKV4-1
9 | IGKV5-2
10 | IGKV1-5
11 | IGKV1-6
12 | IGKV3-7
13 | IGKV1-8
14 | IGKV1-9
15 | IGKV3-11
16 | IGKV1-12
17 | IGKV3-15
18 | IGKV1-16
19 | IGKV1-17
20 | IGKV3-20
21 | IGKV6-21
22 | IGKV2-24
23 | IGKV1-27
24 | IGKV2-28
25 | IGKV2-30
26 | IGKV1-33
27 | IGKV1-37
28 | IGKV1-39
29 | IGKV2-40
30 | IGKV2D-40
31 | IGKV1D-39
32 | IGKV1D-37
33 | IGKV1D-33
34 | IGKV2D-30
35 | IGKV2D-29
36 | IGKV2D-28
37 | IGKV2D-26
38 | IGKV2D-24
39 | IGKV6D-21
40 | IGKV3D-20
41 | IGKV6D-41
42 | IGKV1D-17
43 | IGKV1D-16
44 | IGKV3D-15
45 | IGKV1D-13
46 | IGKV1D-12
47 | IGKV3D-11
48 | IGKV1D-42
49 | IGKV1D-43
50 | IGKV1D-8
51 | IGKV3D-7
52 | IGKV1OR2-108
53 | IGHA2
54 | IGHE
55 | IGHG4
56 | IGHG2
57 | IGHA1
58 | IGHG1
59 | IGHG3
60 | IGHD
61 | IGHM
62 | IGHJ6
63 | IGHJ5
64 | IGHJ4
65 | IGHJ3
66 | IGHJ2
67 | IGHJ1
68 | IGHD7-27
69 | IGHD1-26
70 | IGHD6-25
71 | IGHD5-24
72 | IGHD4-23
73 | IGHD3-22
74 | IGHD2-21
75 | IGHD1-20
76 | IGHD6-19
77 | IGHD5-18
78 | IGHD4-17
79 | IGHD3-16
80 | IGHD2-15
81 | IGHD1-14
82 | IGHD6-13
83 | IGHD5-12
84 | IGHD4-11
85 | IGHD3-10
86 | IGHD3-9
87 | IGHD2-8
88 | IGHD1-7
89 | IGHD6-6
90 | IGHD5-5
91 | IGHD4-4
92 | IGHD3-3
93 | IGHD2-2
94 | IGHD1-1
95 | IGHV6-1
96 | IGHV1-2
97 | IGHV1-3
98 | IGHV4-4
99 | IGHV7-4-1
100 | IGHV2-5
101 | IGHV3-7
102 | IGHV3-64D
103 | IGHV5-10-1
104 | IGHV3-11
105 | IGHV3-13
106 | IGHV3-15
107 | IGHV3-16
108 | IGHV1-18
109 | IGHV3-20
110 | IGHV3-21
111 | IGHV3-23
112 | IGHV1-24
113 | IGHV2-26
114 | IGHV4-28
115 | IGHV3-30
116 | IGHV4-31
117 | IGHV3-33
118 | IGHV4-34
119 | IGHV3-35
120 | IGHV3-38
121 | IGHV4-39
122 | IGHV3-43
123 | IGHV1-45
124 | IGHV1-46
125 | IGHV3-48
126 | IGHV3-49
127 | IGHV5-51
128 | IGHV3-53
129 | IGHV1-58
130 | IGHV4-59
131 | IGHV4-61
132 | IGHV3-64
133 | IGHV3-66
134 | IGHV1-69
135 | IGHV2-70D
136 | IGHV1-69-2
137 | IGHV1-69D
138 | IGHV2-70
139 | IGHV3-72
140 | IGHV3-73
141 | IGHV3-74
142 | IGHV7-81
143 | IGHV1OR15-9
144 | IGHV3OR15-7
145 | IGHD5OR15-5A
146 | IGHD4OR15-4A
147 | IGHD3OR15-3A
148 | IGHD2OR15-2A
149 | IGHD1OR15-1A
150 | IGHD5OR15-5B
151 | IGHD4OR15-4B
152 | IGHD3OR15-3B
153 | IGHD2OR15-2B
154 | IGHD1OR15-1B
155 | AC135068.8
156 | AC135068.2
157 | IGHV1OR15-1
158 | IGHV4OR15-8
159 | IGHV3OR16-9
160 | IGHV2OR16-5
161 | IGHV3OR16-10
162 | IGHV3OR16-8
163 | IGHV3OR16-12
164 | IGHV3OR16-13
165 | IGHV1OR21-1
166 | IGLV4-69
167 | IGLV10-54
168 | IGLV8-61
169 | IGLV4-60
170 | IGLV6-57
171 | IGLV11-55
172 | IGLV5-52
173 | IGLV1-51
174 | IGLV1-50
175 | IGLV9-49
176 | IGLV5-48
177 | IGLV1-47
178 | IGLV7-46
179 | IGLV5-45
180 | IGLV1-44
181 | IGLV7-43
182 | IGLV1-40
183 | IGLV5-37
184 | IGLV1-36
185 | IGLV2-33
186 | IGLV3-32
187 | IGLV3-27
188 | IGLV3-25
189 | IGLV2-23
190 | IGLV3-22
191 | IGLV3-21
192 | IGLV3-19
193 | IGLV2-18
194 | IGLV3-16
195 | IGLV2-14
196 | IGLV3-12
197 | IGLV2-11
198 | IGLV3-10
199 | IGLV3-9
200 | IGLV2-8
201 | IGLV4-3
202 | IGLV3-1
203 | IGLJ1
204 | IGLC1
205 | IGLJ2
206 | IGLC2
207 | IGLJ3
208 | IGLC3
209 | IGLJ4
210 | IGLJ5
211 | IGLJ6
212 | IGLJ7
213 | IGLC7
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2023, Princeton University
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 |
8 | 1. Redistributions of source code must retain the above copyright notice, this
9 | list of conditions and the following disclaimer.
10 |
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 | this list of conditions and the following disclaimer in the documentation
13 | and/or other materials provided with the distribution.
14 |
15 | 3. Neither the name of the copyright holder nor the names of its
16 | contributors may be used to endorse or promote products derived from
17 | this software without specific prior written permission.
18 |
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CalicoST
2 |
3 |
4 |
5 |
6 |
7 | CalicoST is a probabilistic model that infers allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics. CalicoST has the following key features:
8 | 1. Identifies allele-specific integer copy numbers for each transcribed region, revealing events such as copy neutral loss of heterozygosity (CNLOH) and mirrored subclonal CNAs that are invisible to total copy number analysis.
9 | 2. Assigns each spot a clone label indicating whether the spot is primarily normal cells or a cancer clone with an aberrant copy number profile.
10 | 3. Infers a phylogeny relating the identified cancer clones as well as a phylogeography that combines genetic evolution and spatial dissemination of clones.
11 | 4. Handles normal cell admixture in SRT technologies that are not single-cell resolution (e.g., 10x Genomics Visium) to infer more accurate allele-specific copy numbers and cancer clones.
12 | 5. Simultaneously analyzes multiple regions or aligned SRT slices from the same tumor.
13 |
14 | # System requirements
15 | The package has been tested on the following Linux operating systems: SpringdaleOpenEnterprise 9.2 (Parma) and CentOS Linux 7 (Core).
16 |
17 | # Installation
18 | ## Minimum installation
19 | First, set up a conda environment from the `environment.yml` file:
20 | ```
21 | git clone https://github.com/raphael-group/CalicoST.git
22 | cd CalicoST
23 | conda env create -f environment.yml --name calicost_env
24 | ```
25 |
26 |
27 | Then install CalicoST with pip:
28 | ```
29 | conda activate calicost_env
30 | pip install -e .
31 | ```
32 |
33 | Setting up the conda environments takes around 15 minutes on an HPC head node.
34 |
35 | ## Additional installation for SNP parsing
36 | CalicoST requires allele count matrices for reference-phased A and B alleles to infer allele-specific CNAs, and provides a snakemake pipeline for obtaining the required matrices from a BAM file. Run the following commands in the CalicoST directory to install an additional package, [Eagle2](https://alkesgroup.broadinstitute.org/Eagle/), which the snakemake preprocessing pipeline uses.
37 |
38 | ```
39 | mkdir external
40 | wget --directory-prefix=external https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
41 | tar -xzf external/Eagle_v2.4.1.tar.gz -C external
42 | ```
43 |
44 | ## Additional installation for reconstructing phylogeny
45 | Based on the cancer clones and allele-specific CNAs inferred by CalicoST, we apply Startle to reconstruct a phylogenetic tree relating the clones. Install Startle by
46 | ```
47 | git clone --recurse-submodules https://github.com/raphael-group/startle.git
48 | cd startle
49 | mkdir build; cd build
50 | cmake -DLIBLEMON_ROOT=\
51 | -DCPLEX_INC_DIR=\
52 | -DCPLEX_LIB_DIR=\
53 | -DCONCERT_INC_DIR=\
54 | -DCONCERT_LIB_DIR=\
55 | ..
56 | make
57 | ```
58 |
59 |
60 | # Getting started
61 | ### Preprocessing: genotyping and reference-based phasing
62 | To infer allele-specific CNAs, we generate allele count matrices in this preprocessing step. We followed the recommended pipeline by [Numbat](https://kharchenkolab.github.io/numbat/), which is designed for scRNA-seq data to infer clones and CNAs: first genotyping using the BAM file by cellsnp-lite (included in the conda environment) and reference-based phasing by Eagle2. Download the following panels for genotyping and reference-based phasing.
63 | * [SNP panel](https://sourceforge.net/projects/cellsnp/files/SNPlist/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf.gz) - 0.5GB in size. You can also choose other SNP panels from [cellsnp-lite webpage](https://cellsnp-lite.readthedocs.io/en/latest/main/data.html#data-list-of-common-snps).
64 | * [Phasing panel](http://pklab.med.harvard.edu/teng/data/1000G_hg38.zip) - 9.0GB in size. Unzip the panel after downloading.
65 |
66 | Replace the following paths in `config.yaml`:
67 | * `region_vcf`: Replace with the path of downloaded SNP panel.
68 | * `phasing_panel`: Replace with the unzipped directory of the downloaded phasing panel.
69 | * `spaceranger_dir`: Replace with the spaceranger directory of your Visium data, which should contain the BAM file `possorted_genome_bam.bam`.
70 | * `output_snpinfo`: Replace with the desired output directory.
71 | * Replace `calicost_dir` and `eagledir` with the path to the cloned CalicoST directory and downloaded Eagle2 directory.
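
Before launching the pipeline, it can help to confirm that the edited paths point to something on disk. The following is a minimal sketch, not part of CalicoST: the helper name `missing_config_paths` is hypothetical, and it assumes each key listed above maps to a single path string. `output_snpinfo` is skipped because it is an output location that may not exist yet.

```python
import os

# Keys taken from the list above; the helper itself is illustrative only.
PATH_KEYS = ["region_vcf", "phasing_panel", "spaceranger_dir", "calicost_dir", "eagledir"]


def missing_config_paths(config):
    """Return (key, reason) pairs for entries that are unset or absent on disk."""
    problems = []
    for key in PATH_KEYS:
        value = config.get(key)
        if not value:
            problems.append((key, "not set"))
        elif not os.path.exists(value):
            problems.append((key, "path does not exist"))

    # The spaceranger directory should contain the BAM file named above.
    sr_dir = config.get("spaceranger_dir")
    if sr_dir and os.path.isdir(sr_dir):
        bam = os.path.join(sr_dir, "possorted_genome_bam.bam")
        if not os.path.isfile(bam):
            problems.append(("spaceranger_dir", "possorted_genome_bam.bam not found"))
    return problems
```

If PyYAML is available, the dictionary could be obtained with `yaml.safe_load(open("config.yaml"))` and passed to the helper; an empty returned list means every checked path was found.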
72 |
73 | Then run the preprocessing pipeline with
74 | ```
75 | snakemake --cores --configfile config.yaml --snakefile calicost.smk all
76 | ```
77 |
78 | ### Inferring tumor purity per spot (optional)
79 | Replace the paths in the parameter configuration file `configuration_purity` with the corresponding data/reference file paths and run
80 | ```
81 | OMP_NUM_THREADS=1 python /src/calicost/estimate_tumor_proportion.py -c configuration_purity
82 | ```
83 |
84 | ### Inferring clones and allele-specific CNAs
85 | Replace the paths in the parameter configuration file `configuration_cna` with the corresponding data/reference file paths and run
86 | ```
87 | OMP_NUM_THREADS=1 python /src/calicost/calicost_main.py -c configuration_cna
88 | ```
89 |
90 | When jointly inferring clones and CNAs across multiple SRT slices, prepare a table with the following columns (see [`examples/example_input_filelist`](https://github.com/raphael-group/CalicoST/blob/main/examples/example_input_filelist) for an example).
91 | Path to BAM file | sample ID | Path to Spaceranger outs
92 | Modify `configuration_cna_multi` with paths to the table and run
93 | ```
94 | OMP_NUM_THREADS=1 python /src/calicost/calicost_main.py -c configuration_cna_multi
95 | ```
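
A quick sanity check of the multi-slice table before a long run can catch malformed rows early. The sketch below is illustrative and not part of CalicoST: `parse_filelist` is a hypothetical helper, and it assumes the three columns are whitespace-separated with no header row and that sample IDs should be unique. Check `examples/example_input_filelist` for the actual layout before relying on it.

```python
def parse_filelist(text):
    """Parse rows of 'bam_path sample_id spaceranger_outs' and check consistency.

    Assumes whitespace-separated columns, so paths containing spaces are not supported.
    """
    rows = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        bam, sample_id, outs = fields
        if sample_id in seen_ids:
            raise ValueError(f"line {lineno}: duplicate sample ID {sample_id!r}")
        seen_ids.add(sample_id)
        rows.append({"bam": bam, "sample_id": sample_id, "spaceranger_outs": outs})
    return rows
```

Calling `parse_filelist(open(path).read())` on the table returns one dictionary per slice, or raises `ValueError` naming the offending line.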
96 |
97 | ### Reconstruct phylogeography
98 |
99 | ```
100 | python /src/calicost/phylogeny_startle.py -c -s -o