├── LICENSE ├── README.md └── docs ├── README.md ├── deepsomatic-case-study-ffpe-wes-tumor-only.md ├── deepsomatic-case-study-ffpe-wes.md ├── deepsomatic-case-study-ffpe-wgs-tumor-only.md ├── deepsomatic-case-study-ffpe-wgs.md ├── deepsomatic-case-study-ont-tumor-only.md ├── deepsomatic-case-study-ont.md ├── deepsomatic-case-study-pacbio-tumor-only.md ├── deepsomatic-case-study-pacbio.md ├── deepsomatic-case-study-wes.md ├── deepsomatic-case-study-wgs-tumor-only.md ├── deepsomatic-case-study-wgs.md ├── deepsomatic-quick-start.md └── metrics.md /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2023 Google LLC. 2 | 3 | Redistribution and use in source and binary forms, with or without modification, 4 | are permitted provided that the following conditions are met: 5 | 6 | 1. Redistributions of source code must retain the above copyright notice, this 7 | list of conditions and the following disclaimer. 8 | 9 | 2. Redistributions in binary form must reproduce the above copyright notice, 10 | this list of conditions and the following disclaimer in the documentation 11 | and/or other materials provided with the distribution. 12 | 13 | 3. Neither the name of the copyright holder nor the names of its contributors 14 | may be used to endorse or promote products derived from this software without 15 | specific prior written permission. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 18 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 19 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 20 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR 21 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 22 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 23 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 24 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 25 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 26 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 27 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic 2 | 3 | [![release](https://img.shields.io/badge/release-v1.9.0-green?logo=github)](https://github.com/google/deepvariant/releases) 4 | [![announcements](https://img.shields.io/badge/announcements-blue)](https://groups.google.com/d/forum/deepvariant-announcements) 5 | [![blog](https://img.shields.io/badge/blog-orange)](https://goo.gl/deepvariant) 6 | 7 | DeepSomatic is an extension of the deep learning-based variant caller 8 | [DeepVariant](https://github.com/google/deepvariant) that takes aligned reads 9 | (in BAM or CRAM format) from tumor and normal data, produces pileup image 10 | tensors from them, classifies each tensor using a convolutional neural network, 11 | and finally reports somatic variants in a standard VCF or gVCF file. 12 | 13 | DeepSomatic supports somatic variant calling from tumor-normal and tumor-only 14 | sequencing data. 15 | 16 | ## Code availability 17 | 18 | DeepSomatic is integrated with 19 | [DeepVariant](https://github.com/google/deepvariant) to take advantage of DeepVariant's 20 | high-quality end-to-end testing and feature development.
21 | 22 | Here are the scripts that describe the core components of DeepSomatic: 23 | 24 | * [run_deepsomatic](https://github.com/google/deepvariant/blob/r1.9/scripts/run_deepsomatic.py): 25 | The DeepSomatic runner script. 26 | 27 | * [make_examples_somatic](https://github.com/google/deepvariant/blob/r1.9/deepvariant/make_examples_somatic.py): 28 | The `make_examples` step for DeepSomatic. 29 | 30 | * [call_variants](https://github.com/google/deepvariant/blob/r1.9/deepvariant/call_variants.py): 31 | Inference script that generates the variant calls. 32 | 33 | * [postprocess_variants](https://github.com/google/deepvariant/blob/r1.9/deepvariant/postprocess_variants.py): 34 | Updated with the `process_somatic` option to process somatic variants. 35 | 36 | * [dockerfile](https://github.com/google/deepvariant/blob/r1.9/Dockerfile.deepsomatic): 37 | The Dockerfile for DeepSomatic. 38 | 39 | Integrating DeepSomatic within DeepVariant helps to maintain 40 | high-quality code health with integrated testing and feature development. 41 | 42 | ## Case studies 43 | 44 | The following case studies show example runs for supported technologies: 45 | 46 | ### Tumor-normal case studies 47 | 48 | * Illumina WGS tumor-normal [case study](docs/deepsomatic-case-study-wgs.md). 49 | 50 | * Illumina WES tumor-normal [case study](docs/deepsomatic-case-study-wes.md). 51 | 52 | * PacBio tumor-normal [case study](docs/deepsomatic-case-study-pacbio.md). 53 | 54 | * ONT tumor-normal [case study](docs/deepsomatic-case-study-ont.md). 55 | 56 | * FFPE WGS tumor-normal [case study](docs/deepsomatic-case-study-ffpe-wgs.md). 57 | 58 | * FFPE WES tumor-normal [case study](docs/deepsomatic-case-study-ffpe-wes.md). 59 | 60 | ### Tumor-only case studies 61 | 62 | * Illumina WGS tumor-only 63 | [case study](docs/deepsomatic-case-study-wgs-tumor-only.md). 64 | 65 | * PacBio tumor-only 66 | [case study](docs/deepsomatic-case-study-pacbio-tumor-only.md).
67 | 68 | * ONT tumor-only [case study](docs/deepsomatic-case-study-ont-tumor-only.md). 69 | 70 | * FFPE WGS tumor-only [case study](docs/deepsomatic-case-study-ffpe-wgs-tumor-only.md). 71 | 72 | * FFPE WES tumor-only [case study](docs/deepsomatic-case-study-ffpe-wes-tumor-only.md). 73 | 74 | For details on runtime and accuracy expectations, please see the [DeepSomatic metrics page](docs/metrics.md). 75 | 76 | ## How to Cite 77 | 78 | If you use DeepSomatic in your work, please cite: 79 | 80 | [DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies](https://doi.org/10.1101/2024.08.16.608331) 81 | 82 | ## How to run DeepSomatic 83 | 84 | ```bash 85 | DEEPSOMATIC_ARGS=( 86 | --model_type=WGS # One of WGS, WES, PACBIO, ONT, FFPE_WGS, FFPE_WES, WGS_TUMOR_ONLY, PACBIO_TUMOR_ONLY, ONT_TUMOR_ONLY, FFPE_WGS_TUMOR_ONLY, FFPE_WES_TUMOR_ONLY 87 | --ref=${INPUT_DIR}/REF.fasta # Path to the reference FASTA file. 88 | --reads_normal=${INPUT_DIR}/normal.bam # Path to the normal BAM file (omit for tumor-only models). 89 | --reads_tumor=${INPUT_DIR}/tumor.bam # Path to the tumor BAM file. 90 | --output_vcf=${OUTPUT_DIR}/OUTPUT.vcf.gz # Path to the output VCF file. 91 | --output_gvcf=${OUTPUT_DIR}/OUTPUT.g.vcf.gz # Path to the output gVCF file. 92 | --sample_name_tumor="tumor" 93 | --sample_name_normal="normal" 94 | --num_shards=$(nproc) # Number of shards to run in parallel (here, one per CPU core). 95 | --logging_dir=${OUTPUT_DIR}/logs # Log output directory. 96 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir 97 | --regions=chr1 # Region of the genome; if omitted, the whole genome is processed. 98 | --use_default_pon_filtering=false # Set to true for default PON filtering in tumor-only calling. 99 | --dry_run=false # Default is false. If true, commands are printed but not executed. 100 | ) 101 | sudo docker run \ 102 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 103 | google/deepsomatic:"${BIN_VERSION}" run_deepsomatic "${DEEPSOMATIC_ARGS[@]}"
104 | ``` 105 | 106 | Please follow the [Quick Start](docs/deepsomatic-quick-start.md) for more 107 | details on the different setups available for DeepSomatic, such as Docker and 108 | Singularity. 109 | 110 | ### Example output 111 | 112 | DeepSomatic uses the FILTER column of the VCF to label identified germline and 113 | somatic variants. The filters are described in the VCF header: 114 | 115 | ```bash 116 | ##FILTER= 117 | ##FILTER= 118 | ##FILTER= 119 | ##FILTER= 120 | ##FILTER= 121 | ``` 122 | 123 | For example, consider the variants reported below: 124 | 125 | ```bash 126 | # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE_NAME 127 | chr1 14001 . A G 3.7 GERMLINE . GT:GQ:DP:AD:VAF:PL 0/0:4:8:4,4:0.5:1,0,34 128 | chr1 14002 . T A 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:51:60:57,2:0.0333333:0,51,58 129 | chr1 14003 . C G 43.8 PASS . GT:GQ:DP:AD:VAF:PL 1/1:43:74:0,74:1:43,52,0 130 | ``` 131 | 132 | In this example: 133 | 134 | * The variant with `GERMLINE` FILTER status is identified as a germline variant. 135 | * The variant with `RefCall` FILTER status is a reference call (homozygous reference). 136 | * The variant with `PASS` FILTER status is a **somatic variant**. 137 | 138 | ### Prerequisites 139 | 140 | * Unix-like operating system (cannot run on Windows) 141 | * Python 3.10 142 | 143 | ## Contribution Guidelines 144 | 145 | Please [open a pull request](https://github.com/google/deepsomatic/compare) if 146 | you wish to contribute to DeepSomatic. Note that we have not set up the 147 | infrastructure to merge pull requests externally. If you agree, we will test and 148 | submit the changes internally and mention your contributions in our 149 | [release notes](https://github.com/google/deepsomatic/releases). We apologize 150 | for any inconvenience. 151 | 152 | If you have any difficulty using DeepSomatic, feel free to 153 | [open an issue](https://github.com/google/deepsomatic/issues/new).
If you have 154 | general questions not specific to DeepSomatic, we recommend that you post on a 155 | community discussion forum such as [BioStars](https://www.biostars.org/). 156 | 157 | ## License 158 | 159 | [BSD-3-Clause license](LICENSE) 160 | 161 | ## Disclaimer 162 | 163 | This is not an official Google product. 164 | 165 | NOTE: the content of this research code repository (i) is not intended to be a 166 | medical device; and (ii) is not intended for clinical use of any kind, including 167 | but not limited to diagnosis or prognosis. 168 | -------------------------------------------------------------------------------- /docs/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/deepsomatic/6f0866bb41ab60ce97d19d1a67ab5f5847bbc6c5/docs/README.md -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ffpe-wes-tumor-only.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic FFPE WES tumor-only case study 2 | 3 | In this case study, we show an example of running DeepSomatic on FFPE WES 4 | tumor-only data, using the HCC1395 sample. 5 | 6 | ## Data details 7 | 8 | We run the analysis on `chr1`, which was held out 9 | during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 12 | 13 | ## Allele frequency channel 14 | 15 | For accurate tumor-only calling, DeepSomatic uses an allele-frequency channel 16 | built from DeepVariant calls on 1000 Genomes samples to help filter out germline 17 | variants during inference. By default, this channel uses variant calls made 18 | against the GRCh38 reference. To use your own population VCF instead, set the 19 | `--population_vcfs` flag.
20 | 21 | ## Prepare environment 22 | 23 | ### Tools 24 | 25 | [Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic 26 | and [hap.py](https://github.com/illumina/hap.py). 27 | 28 | ### Download input data 29 | 30 | We will be using GRCh38 for this case study. 31 | 32 | 33 | ```bash 34 | BASE="${HOME}/deepsomatic-ffpe-wes-tumor-only-case-study" 35 | 36 | # Set up input and output directories 37 | INPUT_DIR="${BASE}/input/data" 38 | OUTPUT_DIR="${BASE}/output" 39 | 40 | ## Create local directory structure 41 | mkdir -p "${INPUT_DIR}" 42 | mkdir -p "${OUTPUT_DIR}" 43 | mkdir -p "${OUTPUT_DIR}/sompy_output" 44 | 45 | # Base URL for the case-study input files 46 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 47 | # Download the reference files 48 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 49 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 50 | 51 | # Download the tumor BAM file and its index 52 | curl ${HTTPDIR}/HCC1395_ffpe_wes.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam 53 | curl ${HTTPDIR}/HCC1395_ffpe_wes.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam.bai 54 | 55 | # Download the truth VCF and BED regions 56 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 58 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 60 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/seqc2_hg38.exome_regions.bed 61 | ``` 62 | 63 | ## Running DeepSomatic with one command 64 | 65 | The DeepSomatic pipeline consists of 3 steps:
`make_examples_somatic`, `call_variants`, and 66 | `postprocess_variants`. You can run DeepSomatic with one command using the 67 | `run_deepsomatic` script. 68 | 69 | ### Running on a CPU-only machine 70 | 71 | ```bash 72 | BIN_VERSION="1.9.0" 73 | 74 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 75 | 76 | sudo docker run \ 77 | -v ${INPUT_DIR}:${INPUT_DIR} \ 78 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 79 | google/deepsomatic:"${BIN_VERSION}" \ 80 | run_deepsomatic \ 81 | --model_type=FFPE_WES_TUMOR_ONLY \ 82 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 83 | --reads_tumor=${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam \ 84 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 85 | --sample_name_tumor="HCC1395Tumor" \ 86 | --num_shards=$(nproc) \ 87 | --logging_dir=${OUTPUT_DIR}/logs \ 88 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 89 | --use_default_pon_filtering=true \ 90 | --regions=chr1 91 | ``` 92 | 93 | With `--use_default_pon_filtering=true`, somatic variants are filtered using 94 | the default panel-of-normals (PON) VCF, which contains variant calls from dbSNP, 95 | gnomAD, and 1000 Genomes. If you plan to apply your own post-filtering, 96 | set this flag to `false` instead. 97 | 98 | NOTE: If you want to run each of the steps separately, add `--dry_run=true` 99 | to the command above to print the commands and flags used in each 100 | step. The flags needed in the `make_examples` step differ between model 101 | types. 102 | 103 | The `--intermediate_results_dir` flag is optional. If it is set, the 104 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
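Before evaluating, it can help to see how DeepSomatic labeled the calls. As the top-level README describes, `PASS` in the VCF FILTER column marks a somatic call, while labels such as `GERMLINE` and `RefCall` mark non-somatic sites. The Python sketch below tallies FILTER labels in the output VCF. The function name `count_filters` is our own illustration, not part of DeepSomatic; the sketch assumes a standard tab-separated VCF and relies on a bgzipped `.vcf.gz` being a valid multi-member gzip stream that Python's `gzip` module can read.

```python
import gzip
import sys
from collections import Counter


def count_filters(vcf_path):
    """Tally records per FILTER label in a plain or (b)gzipped VCF file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip meta-information (##) and column header (#CHROM) lines
            fields = line.rstrip("\n").split("\t")
            # FILTER is the 7th VCF column; it may list several labels separated by ';'.
            for label in fields[6].split(";"):
                counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for label, n in sorted(count_filters(sys.argv[1]).items()):
        print(f"{label}\t{n}")
```

For example, saved as `count_filters.py` (a hypothetical file name), it could be run as `python3 count_filters.py ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz`. The `PASS` count is roughly the number of somatic calls that som.py will compare against the truth set, before som.py applies its own region restrictions.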
105 | 106 | ```bash 107 | sudo docker pull pkrusche/hap.py:latest 108 | # Run som.py (part of hap.py) to compare the calls with the truth set 109 | sudo docker run \ 110 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 111 | pkrusche/hap.py:latest \ 112 | /opt/hap.py/bin/som.py \ 113 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 114 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 115 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 116 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 117 | --feature-table generic \ 118 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 119 | -T ${INPUT_DIR}/seqc2_hg38.exome_regions.bed \ 120 | -l chr1 121 | ``` 122 | 123 | The output: 124 | 125 | ``` 126 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 127 | 0 indels 7 13 6 7 1 0 0 0.857143 0.499203 1.000000 0.857143 0.461538 0.221123 0.717108 0 0 248956422 0.028117 128 | 1 SNVs 145 129 99 30 46 0 0 0.682759 0.603975 0.754328 0.682759 0.767442 0.689149 0.833890 0 0 248956422 0.120503 129 | 5 records 152 142 105 37 47 0 0 0.690789 0.614250 0.760141 0.690789 0.739437 0.662930 0.806296 0 0 248956422 0.148620 130 | ``` 131 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ffpe-wes.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic FFPE WES case study 2 | 3 | In this case study, we show an example of running DeepSomatic on FFPE WES 4 | tumor-normal data, using the HCC1395 sample. 5 | 6 | ## Data details 7 | 8 | We run the analysis on `chr1`, which was held out 9 | during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data.
12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We will be using GRCh38 for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-ffpe-wes-case-study" 27 | 28 | # Set up input and output directories 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | ## Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the case-study input files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference files 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the BAM files and their indexes 44 | curl ${HTTPDIR}/HCC1395_ffpe_wes.normal.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wes.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_ffpe_wes.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wes.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_ffpe_wes.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_ffpe_wes.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam.bai 48 | 49 | # Download the truth VCF and BED regions 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/seqc2_hg38.exome_regions.bed 55 | ``` 56 | 57 | ## Running DeepSomatic with one command 58 | 59 | The DeepSomatic pipeline consists of 3 steps: `make_examples_somatic`, `call_variants`, and 60 | `postprocess_variants`. You can run DeepSomatic with one command using the 61 | `run_deepsomatic` script. 62 | 63 | ### Running on a CPU-only machine 64 | 65 | ```bash 66 | BIN_VERSION="1.9.0" 67 | 68 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 69 | 70 | sudo docker run \ 71 | -v ${INPUT_DIR}:${INPUT_DIR} \ 72 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 73 | google/deepsomatic:"${BIN_VERSION}" \ 74 | run_deepsomatic \ 75 | --model_type=FFPE_WES \ 76 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 77 | --reads_normal=${INPUT_DIR}/HCC1395_ffpe_wes.normal.chr1.bam \ 78 | --reads_tumor=${INPUT_DIR}/HCC1395_ffpe_wes.tumor.chr1.bam \ 79 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 80 | --sample_name_tumor="HCC1395Tumor" \ 81 | --sample_name_normal="HCC1395Normal" \ 82 | --num_shards=$(nproc) \ 83 | --logging_dir=${OUTPUT_DIR}/logs \ 84 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 85 | --regions=chr1 86 | ``` 87 | 88 | NOTE: If you want to run each of the steps separately, add `--dry_run=true` 89 | to the command above to print the commands and flags used in each 90 | step. The flags needed in the `make_examples` step differ between model 91 | types. 92 | 93 | The `--intermediate_results_dir` flag is optional. If it is set, the 94 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
95 | 96 | ```bash 97 | sudo docker pull pkrusche/hap.py:latest 98 | # Run som.py (part of hap.py) to compare the calls with the truth set 99 | sudo docker run \ 100 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 101 | pkrusche/hap.py:latest \ 102 | /opt/hap.py/bin/som.py \ 103 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 104 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 105 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 106 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 107 | --feature-table generic \ 108 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 109 | -T ${INPUT_DIR}/seqc2_hg38.exome_regions.bed \ 110 | -l chr1 111 | ``` 112 | 113 | The output: 114 | 115 | ``` 116 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 117 | 0 indels 7 10 7 3 0 0 0 1.000000 0.590384 1.000000 1.000000 0.700000 0.394182 0.907305 0 0 248956422 0.012050 118 | 1 SNVs 145 124 121 3 24 0 0 0.834483 0.767678 0.888096 0.834483 0.975806 0.936851 0.993140 0 0 248956422 0.012050 119 | 5 records 152 134 128 6 24 0 0 0.842105 0.777939 0.893394 0.842105 0.955224 0.910050 0.981097 0 0 248956422 0.024101 120 | ``` 121 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ffpe-wgs-tumor-only.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic FFPE WGS tumor-only case study 2 | 3 | In this case study, we show an example of running DeepSomatic on FFPE WGS 4 | tumor-only data, using the HCC1395 sample. 5 | 6 | ## Data details 7 | 8 | We run the analysis on `chr1`, which was held out 9 | during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data.
12 | 13 | ## Allele frequency channel 14 | 15 | For accurate tumor-only calling, DeepSomatic uses an allele-frequency channel 16 | built from DeepVariant calls on 1000 Genomes samples to help filter out germline 17 | variants during inference. By default, this channel uses variant calls made 18 | against the GRCh38 reference. To use your own population VCF instead, set the 19 | `--population_vcfs` flag. 20 | 21 | ## Prepare environment 22 | 23 | ### Tools 24 | 25 | [Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic 26 | and [hap.py](https://github.com/illumina/hap.py). 27 | 28 | ### Download input data 29 | 30 | We will be using GRCh38 for this case study. 31 | 32 | 33 | ```bash 34 | BASE="${HOME}/deepsomatic-ffpe-wgs-tumor-only-case-study" 35 | 36 | # Set up input and output directories 37 | INPUT_DIR="${BASE}/input/data" 38 | OUTPUT_DIR="${BASE}/output" 39 | 40 | ## Create local directory structure 41 | mkdir -p "${INPUT_DIR}" 42 | mkdir -p "${OUTPUT_DIR}" 43 | mkdir -p "${OUTPUT_DIR}/sompy_output" 44 | 45 | # Base URL for the case-study input files 46 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 47 | # Download the reference files 48 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 49 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 50 | 51 | # Download the tumor BAM file and its index 52 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam 53 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam.bai 54 | 55 | # Download the truth VCF and BED regions 56 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 58 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 60 | ``` 61 | 62 | ## Running DeepSomatic with one command 63 | 64 | The DeepSomatic pipeline consists of 3 steps: `make_examples_somatic`, `call_variants`, and 65 | `postprocess_variants`. You can run DeepSomatic with one command using the 66 | `run_deepsomatic` script. 67 | 68 | ### Running on a CPU-only machine 69 | 70 | ```bash 71 | BIN_VERSION="1.9.0" 72 | 73 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 74 | 75 | sudo docker run \ 76 | -v ${INPUT_DIR}:${INPUT_DIR} \ 77 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 78 | google/deepsomatic:"${BIN_VERSION}" \ 79 | run_deepsomatic \ 80 | --model_type=FFPE_WGS_TUMOR_ONLY \ 81 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 82 | --reads_tumor=${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam \ 83 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 84 | --sample_name_tumor="HCC1395Tumor" \ 85 | --num_shards=$(nproc) \ 86 | --logging_dir=${OUTPUT_DIR}/logs \ 87 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 88 | --use_default_pon_filtering=true \ 89 | --regions=chr1 90 | ``` 91 | 92 | With `--use_default_pon_filtering=true`, somatic variants are filtered using 93 | the default panel-of-normals (PON) VCF, which contains variant calls from dbSNP, 94 | gnomAD, and 1000 Genomes. If you plan to apply your own post-filtering, 95 | set this flag to `false` instead. 96 | 97 | NOTE: If you want to run each of the steps separately, add `--dry_run=true` 98 | to the command above to print the commands and flags used in each 99 | step. The flags needed in the `make_examples` step differ between model 100 | types. 101 | 102 | The `--intermediate_results_dir` flag is optional.
If it is set, the 103 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory. 104 | 105 | ```bash 106 | sudo docker pull pkrusche/hap.py:latest 107 | # Run som.py (part of hap.py) to compare the calls with the truth set 108 | sudo docker run \ 109 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 110 | pkrusche/hap.py:latest \ 111 | /opt/hap.py/bin/som.py \ 112 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 113 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 114 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 115 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 116 | --feature-table generic \ 117 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 118 | -l chr1 119 | ``` 120 | 121 | The output: 122 | 123 | ``` 124 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 125 | 0 indels 133 150 55 95 78 0 0 0.413534 0.332440 0.498345 0.413534 0.366667 0.292686 0.445737 0 0 248956422 0.381593 126 | 1 SNVs 3440 2770 1949 821 1491 0 0 0.566570 0.549960 0.583068 0.566570 0.703610 0.686398 0.720397 0 0 248956422 3.297766 127 | 5 records 3573 2920 2004 916 1569 0 0 0.560873 0.544557 0.577091 0.560873 0.686301 0.669294 0.702940 0 0 248956422 3.679359 128 | ``` 129 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ffpe-wgs.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic FFPE WGS case study 2 | 3 | In this case study, we show an example of running DeepSomatic on FFPE WGS 4 | tumor-normal data, using the HCC1395 sample. 5 | 6 | ## Data details 7 | 8 | We run the analysis on `chr1`, which was held out 9 | during training.
10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We will be using GRCh38 for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-ffpe-wgs-case-study" 27 | 28 | # Set up input and output directories 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | ## Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the case-study input files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference files 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the BAM files and their indexes 44 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.normal.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wgs.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wgs.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_ffpe_wgs.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam.bai 48 | 49 | # Download the truth VCF and BED regions 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P
${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | ``` 55 | 56 | ## Running DeepSomatic with one command 57 | 58 | The DeepSomatic pipeline consists of 3 steps: `make_examples_somatic`, `call_variants`, and 59 | `postprocess_variants`. You can run DeepSomatic with one command using the 60 | `run_deepsomatic` script. 61 | 62 | ### Running on a CPU-only machine 63 | 64 | ```bash 65 | BIN_VERSION="1.9.0" 66 | 67 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 68 | 69 | sudo docker run \ 70 | -v ${INPUT_DIR}:${INPUT_DIR} \ 71 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 72 | google/deepsomatic:"${BIN_VERSION}" \ 73 | run_deepsomatic \ 74 | --model_type=FFPE_WGS \ 75 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 76 | --reads_normal=${INPUT_DIR}/HCC1395_ffpe_wgs.normal.chr1.bam \ 77 | --reads_tumor=${INPUT_DIR}/HCC1395_ffpe_wgs.tumor.chr1.bam \ 78 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 79 | --sample_name_tumor="HCC1395Tumor" \ 80 | --sample_name_normal="HCC1395Normal" \ 81 | --num_shards=$(nproc) \ 82 | --logging_dir=${OUTPUT_DIR}/logs \ 83 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 84 | --regions=chr1 85 | ``` 86 | 87 | NOTE: If you want to run each of the steps separately, add `--dry_run=true` 88 | to the command above to print the commands and flags used in each 89 | step. The flags needed in the `make_examples` step differ between model 90 | types. 91 | 92 | The `--intermediate_results_dir` flag is optional. If it is set, the 93 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
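som.py summarizes the comparison as counts of true positives (`tp`), false positives (`fp`), and false negatives (`fn`) against the truth set. In the som.py table for this case study, `total.truth` equals `tp + fn` and `total.query` equals `tp + fp`, so recall is `tp / (tp + fn)` and precision is `tp / (tp + fp)`. The Python sketch below recomputes both for the SNV row of the FFPE WGS tumor-normal output (tp=2844, fp=171, fn=596) and adds an F1 score, which that table does not print. The helper name `somatic_metrics` is hypothetical and not part of hap.py.

```python
def somatic_metrics(tp, fp, fn):
    """Return (recall, precision, f1) from som.py-style tp/fp/fn counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall, i.e. 2*tp / (2*tp + fp + fn).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return recall, precision, f1


# SNV row of the FFPE WGS tumor-normal som.py output: tp=2844, fp=171, fn=596.
recall, precision, f1 = somatic_metrics(tp=2844, fp=171, fn=596)
print(f"recall={recall:.6f} precision={precision:.6f} f1={f1:.6f}")
# prints: recall=0.826744 precision=0.943284 f1=0.881177
```

The recomputed recall and precision match the `recall` and `precision` columns that som.py reports for SNVs, which is a quick sanity check that the counts were read correctly.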
94 | 95 | ```bash 96 | sudo docker pull pkrusche/hap.py:latest 97 | # Run som.py (part of hap.py) to compare the calls with the truth set 98 | sudo docker run \ 99 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 100 | pkrusche/hap.py:latest \ 101 | /opt/hap.py/bin/som.py \ 102 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 103 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 104 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 105 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 106 | --feature-table generic \ 107 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 108 | -l chr1 109 | ``` 110 | 111 | The output: 112 | 113 | ``` 114 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 115 | 0 indels 133 138 107 31 26 0 0 0.804511 0.730987 0.864953 0.804511 0.775362 0.700491 0.838825 0 0 248956422 0.124520 116 | 1 SNVs 3440 3015 2844 171 596 0 0 0.826744 0.813825 0.839114 0.826744 0.943284 0.934599 0.951118 0 0 248956422 0.686867 117 | 5 records 3573 3153 2951 202 622 0 0 0.825917 0.813222 0.838083 0.825917 0.935934 0.926985 0.944084 0 0 248956422 0.811387 118 | ``` 119 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ont-tumor-only.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic ONT tumor-only case study 2 | 3 | In this case study, we show an example of running DeepSomatic on ONT 4 | tumor-only data, using the HCC1395 sample. 5 | 6 | ## Data details 7 | 8 | We run the analysis on `chr1`, which was held out 9 | during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data.
12 | 13 | ## Allele frequency channel 14 | 15 | For accurate tumor-only calling, DeepSomatic uses an allele-frequency channel 16 | based on 1000 Genomes variant calls made with DeepVariant to filter out germline 17 | variants during inference. The default population VCF contains variant calls 18 | against the GRCh38 reference. To use your own population VCF instead, pass it 19 | with the `--population_vcfs` parameter. 20 | 21 | ## Prepare environment 22 | 23 | ### Tools 24 | 25 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 26 | and [hap.py](https://github.com/illumina/hap.py). 27 | 28 | ### Download input data 29 | 30 | We use the GRCh38 reference for this case study. 31 | 32 | 33 | ```bash 34 | BASE="${HOME}/deepsomatic-ont-tumor-only-case-study" 35 | 36 | # Set up input and output directories 37 | INPUT_DIR="${BASE}/input/data" 38 | OUTPUT_DIR="${BASE}/output" 39 | 40 | # Create local directory structure 41 | mkdir -p "${INPUT_DIR}" 42 | mkdir -p "${OUTPUT_DIR}" 43 | mkdir -p "${OUTPUT_DIR}/sompy_output" 44 | 45 | # Base URL for the chr1 reference and BAM files 46 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 47 | # Download the reference FASTA and its index 48 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 49 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 50 | 51 | # Download the tumor BAM file and its index 52 | curl ${HTTPDIR}/HCC1395_ont.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam 53 | curl ${HTTPDIR}/HCC1395_ont.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam.bai 54 | 55 | # Download the truth VCF, its index, and the high-confidence regions BED 56 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 58 | wget -P
${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 60 | ``` 61 | 62 | ## Running DeepSomatic with one command 63 | 64 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 65 | `postprocess_variants`. You can run all three with a single command using the 66 | `run_deepsomatic` script. 67 | 68 | ### Running on a CPU-only machine 69 | 70 | ```bash 71 | BIN_VERSION="1.9.0" 72 | 73 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 74 | 75 | sudo docker run \ 76 | -v ${INPUT_DIR}:${INPUT_DIR} \ 77 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 78 | google/deepsomatic:"${BIN_VERSION}" \ 79 | run_deepsomatic \ 80 | --model_type=ONT_TUMOR_ONLY \ 81 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 82 | --reads_tumor=${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam \ 83 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 84 | --sample_name_tumor="HCC1395Tumor" \ 85 | --num_shards=$(nproc) \ 86 | --logging_dir=${OUTPUT_DIR}/logs \ 87 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 88 | --use_default_pon_filtering=true \ 89 | --regions=chr1 90 | ``` 91 | 92 | With `--use_default_pon_filtering=true`, somatic calls are filtered against 93 | the default panel of normals (PON) VCF, which contains variant calls from 94 | dbSNP, gnomAD, and 1000 Genomes. To apply your own post-filtering instead, 95 | set this parameter to `false`. 96 | 97 | NOTE: To run each step separately, add `--dry_run=true` to the command 98 | above; the script then prints the command for each step instead of running 99 | it. The flags passed to `make_examples_somatic` differ between model types, 100 | so check the dry-run output for the model you use. 101 | 102 | The `--intermediate_results_dir` flag is optional.
By specifying it, the 103 | intermediate outputs of `make_examples_somatic` and `call_variants` stages can be found in the directory. 104 | 105 | ```bash 106 | sudo docker pull pkrusche/hap.py:latest 107 | # Run hap.py 108 | sudo docker run \ 109 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 110 | pkrusche/hap.py:latest \ 111 | /opt/hap.py/bin/som.py \ 112 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 113 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 114 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 115 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 116 | --feature-table generic \ 117 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 118 | -l chr1 119 | ``` 120 | 121 | The output: 122 | 123 | ``` 124 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 125 | 0 indels 133 183 66 117 67 0 0 0.496241 0.412115 0.580528 0.496241 0.360656 0.293704 0.431976 0 0 248956422 0.469962 126 | 1 SNVs 3440 4389 2623 1766 817 0 0 0.762500 0.748063 0.776496 0.762500 0.597630 0.583062 0.612070 0 0 248956422 7.093611 127 | 5 records 3573 4572 2689 1883 884 0 0 0.752589 0.738239 0.766529 0.752589 0.588145 0.573827 0.602352 0 0 248956422 7.563573 128 | ``` 129 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-ont.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic ONT case study 2 | 3 | In this case study, we show an example of running DeepSomatic on ONT 4 | data. We use HCC1395 as an example for this case study. 5 | 6 | ## Data details 7 | 8 | For this case-study, we use HCC1395 as an example. We run the analysis on `chr1` 9 | that we hold out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 
12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We use the GRCh38 reference for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-ont-case-study" 27 | 28 | # Set up input and output directories 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | # Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the chr1 reference and BAM files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference FASTA and its index 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the normal and tumor BAM files and their indexes 44 | curl ${HTTPDIR}/HCC1395_ont.normal.chr1.bam > ${INPUT_DIR}/HCC1395_ont.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_ont.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ont.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_ont.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_ont.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam.bai 48 | 49 | # Download the truth VCF, its index, and the high-confidence regions BED 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | ``` 55 | 56 | ## Running
DeepSomatic with one command 57 | 58 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 59 | `postprocess_variants`. You can run all three with a single command using the 60 | `run_deepsomatic` script. 61 | 62 | ### Running on a CPU-only machine 63 | 64 | ```bash 65 | BIN_VERSION="1.9.0" 66 | 67 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 68 | 69 | sudo docker run \ 70 | -v ${INPUT_DIR}:${INPUT_DIR} \ 71 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 72 | google/deepsomatic:"${BIN_VERSION}" \ 73 | run_deepsomatic \ 74 | --model_type=ONT \ 75 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 76 | --reads_normal=${INPUT_DIR}/HCC1395_ont.normal.chr1.bam \ 77 | --reads_tumor=${INPUT_DIR}/HCC1395_ont.tumor.chr1.bam \ 78 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 79 | --sample_name_tumor="HCC1395Tumor" \ 80 | --sample_name_normal="HCC1395Normal" \ 81 | --num_shards=$(nproc) \ 82 | --logging_dir=${OUTPUT_DIR}/logs \ 83 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 84 | --regions=chr1 85 | ``` 86 | 87 | NOTE: To run each step separately, add `--dry_run=true` to the command 88 | above; the script then prints the command for each step instead of running 89 | it. The flags passed to `make_examples_somatic` differ between model types, 90 | so check the dry-run output for the model you use. 91 | 92 | The `--intermediate_results_dir` flag is optional. If set, the intermediate 93 | outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
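The som.py table in the evaluation step reports recall and precision separately. As a sketch for summarizing it, assuming som.py also writes these numbers to `<prefix>.stats.csv` (here `${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output.stats.csv`) with a header row naming the `type`, `recall`, and `precision` columns, the hypothetical `sompy_f1` function below adds an F1 column, the harmonic mean of precision and recall.

```shell
# Hypothetical helper: for each row of a som.py stats CSV, print the variant
# type, recall, precision and F1 = 2 * P * R / (P + R). Columns are found by
# header name, so an unnamed leading index column does not matter.
# Exits with status 1 if any of the three columns is missing.
sompy_f1() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "type") t = i
        if ($i == "recall") r = i
        if ($i == "precision") p = i
      }
      if (!t || !r || !p) {
        print "missing type, recall or precision column" > "/dev/stderr"
        exit 1
      }
      print "type,recall,precision,f1"
      next
    }
    {
      rec = $r + 0
      prec = $p + 0
      # Define F1 as 0 when precision and recall are both 0.
      f1 = (rec + prec > 0) ? 2 * rec * prec / (rec + prec) : 0
      printf "%s,%.4f,%.4f,%.4f\n", $t, rec, prec, f1
    }
  ' "$1"
}
```

For example, once the som.py command has finished: `sompy_f1 "${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output.stats.csv"`. Looking columns up by name rather than by position keeps the sketch working if the column order differs between hap.py versions.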
94 | 95 | ```bash 96 | sudo docker pull pkrusche/hap.py:latest 97 | # Run hap.py 98 | sudo docker run \ 99 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 100 | pkrusche/hap.py:latest \ 101 | /opt/hap.py/bin/som.py \ 102 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 103 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 104 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 105 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 106 | --feature-table generic \ 107 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 108 | -l chr1 109 | ``` 110 | 111 | The output: 112 | 113 | ``` 114 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 115 | 0 indels 133 110 86 24 47 0 0 0.646617 0.562921 0.724009 0.646617 0.781818 0.697965 0.851069 0 0 248956422 0.096402 116 | 1 SNVs 3440 2655 2611 44 829 0 0 0.759012 0.744506 0.773082 0.759012 0.983427 0.978031 0.987772 0 0 248956422 0.176738 117 | 5 records 3573 2765 2697 68 876 0 0 0.754828 0.740520 0.768723 0.754828 0.975407 0.969127 0.980693 0 0 248956422 0.273140 118 | ``` 119 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-pacbio-tumor-only.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic PacBio tumor-only case study 2 | 3 | In this case study, we show an example of running DeepSomatic on PacBio 4 | tumor-only data. We use HCC1395 as an example for this case study. 5 | 6 | ## Data details 7 | 8 | For this case-study, we use HCC1395 as an example. We run the analysis on `chr1` 9 | that we hold out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 
12 | 13 | ## Allele frequency channel 14 | 15 | For accurate tumor-only calling, DeepSomatic uses an allele-frequency channel 16 | based on 1000 Genomes variant calls made with DeepVariant to filter out germline 17 | variants during inference. The default population VCF contains variant calls 18 | against the GRCh38 reference. To use your own population VCF instead, pass it 19 | with the `--population_vcfs` parameter. 20 | 21 | ## Prepare environment 22 | 23 | ### Tools 24 | 25 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 26 | and [hap.py](https://github.com/illumina/hap.py). 27 | 28 | ### Download input data 29 | 30 | We use the GRCh38 reference for this case study. 31 | 32 | 33 | ```bash 34 | BASE="${HOME}/deepsomatic-pacbio-tumor-only-case-study" 35 | 36 | # Set up input and output directories 37 | INPUT_DIR="${BASE}/input/data" 38 | OUTPUT_DIR="${BASE}/output" 39 | 40 | # Create local directory structure 41 | mkdir -p "${INPUT_DIR}" 42 | mkdir -p "${OUTPUT_DIR}" 43 | mkdir -p "${OUTPUT_DIR}/sompy_output" 44 | 45 | # Base URL for the chr1 reference and BAM files 46 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 47 | # Download the reference FASTA and its index 48 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 49 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 50 | 51 | # Download the tumor BAM file and its index 52 | curl ${HTTPDIR}/HCC1395_pacbio.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam 53 | curl ${HTTPDIR}/HCC1395_pacbio.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam.bai 54 | 55 | # Download the truth VCF, its index, and the high-confidence regions BED 56 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 58 |
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 60 | ``` 61 | 62 | ## Running DeepSomatic with one command 63 | 64 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 65 | `postprocess_variants`. You can run all three with a single command using the 66 | `run_deepsomatic` script. 67 | 68 | ### Running on a CPU-only machine 69 | 70 | ```bash 71 | BIN_VERSION="1.9.0" 72 | 73 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 74 | 75 | sudo docker run \ 76 | -v ${INPUT_DIR}:${INPUT_DIR} \ 77 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 78 | google/deepsomatic:"${BIN_VERSION}" \ 79 | run_deepsomatic \ 80 | --model_type=PACBIO_TUMOR_ONLY \ 81 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 82 | --reads_tumor=${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam \ 83 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 84 | --sample_name_tumor="HCC1395Tumor" \ 85 | --num_shards=$(nproc) \ 86 | --logging_dir=${OUTPUT_DIR}/logs \ 87 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 88 | --use_default_pon_filtering=true \ 89 | --regions=chr1 90 | ``` 91 | 92 | With `--use_default_pon_filtering=true`, somatic calls are filtered against 93 | the default panel of normals (PON) VCF, which contains variant calls from 94 | dbSNP, gnomAD, and 1000 Genomes. To apply your own post-filtering instead, 95 | set this parameter to `false`. 96 | 97 | NOTE: To run each step separately, add `--dry_run=true` to the command 98 | above; the script then prints the command for each step instead of running 99 | it. The flags passed to `make_examples_somatic` differ between model types, 100 | so check the dry-run output for the model you use. 101 | 102 | The `--intermediate_results_dir` flag is optional.
By specifying it, the 103 | intermediate outputs of `make_examples_somatic` and `call_variants` stages can be found in the directory. 104 | 105 | ```bash 106 | sudo docker pull pkrusche/hap.py:latest 107 | # Run hap.py 108 | sudo docker run \ 109 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 110 | pkrusche/hap.py:latest \ 111 | /opt/hap.py/bin/som.py \ 112 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 113 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 114 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 115 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 116 | --feature-table generic \ 117 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 118 | -l chr1 119 | ``` 120 | 121 | The output: 122 | 123 | ``` 124 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 125 | 0 indels 133 185 84 101 49 0 0 0.631579 0.547480 0.710021 0.631579 0.454054 0.383486 0.526047 0 0 248956422 0.405693 126 | 1 SNVs 3440 4816 3212 1604 228 0 0 0.933721 0.925041 0.941671 0.933721 0.666944 0.653535 0.680151 0 0 248956422 6.442895 127 | 5 records 3573 5001 3296 1705 277 0 0 0.922474 0.913362 0.930902 0.922474 0.659068 0.645841 0.672111 0 0 248956422 6.848588 128 | ``` 129 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-pacbio.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic PacBio case study 2 | 3 | In this case study, we show an example of running DeepSomatic on PacBio 4 | data. We use HCC1395 as an example for this case study. 5 | 6 | ## Data details 7 | 8 | For this case-study, we use HCC1395 as an example. We run the analysis on `chr1` 9 | that we hold out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 
12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We use the GRCh38 reference for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-pacbio-case-study" 27 | 28 | # Set up input and output directories 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | # Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the chr1 reference and BAM files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference FASTA and its index 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the normal and tumor BAM files and their indexes 44 | curl ${HTTPDIR}/HCC1395_pacbio.normal.chr1.bam > ${INPUT_DIR}/HCC1395_pacbio.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_pacbio.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_pacbio.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_pacbio.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_pacbio.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam.bai 48 | 49 | # Download the truth VCF, its index, and the high-confidence regions BED 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | ``` 55
| 56 | ## Running DeepSomatic with one command 57 | 58 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 59 | `postprocess_variants`. You can run all three with a single command using the 60 | `run_deepsomatic` script. 61 | 62 | ### Running on a CPU-only machine 63 | 64 | ```bash 65 | BIN_VERSION="1.9.0" 66 | 67 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 68 | 69 | sudo docker run \ 70 | -v ${INPUT_DIR}:${INPUT_DIR} \ 71 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 72 | google/deepsomatic:"${BIN_VERSION}" \ 73 | run_deepsomatic \ 74 | --model_type=PACBIO \ 75 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 76 | --reads_normal=${INPUT_DIR}/HCC1395_pacbio.normal.chr1.bam \ 77 | --reads_tumor=${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam \ 78 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 79 | --sample_name_tumor="HCC1395Tumor" \ 80 | --sample_name_normal="HCC1395Normal" \ 81 | --num_shards=$(nproc) \ 82 | --logging_dir=${OUTPUT_DIR}/logs \ 83 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 84 | --regions=chr1 85 | ``` 86 | 87 | NOTE: To run each step separately, add `--dry_run=true` to the command 88 | above; the script then prints the command for each step instead of running 89 | it. The flags passed to `make_examples_somatic` differ between model types, 90 | so check the dry-run output for the model you use. 91 | 92 | The `--intermediate_results_dir` flag is optional. If set, the intermediate 93 | outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
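The download step fetches each BAM together with its `.bai` index and the reference together with its `.fai` index. Without `--fail`, `curl` exits successfully even when the server returns an HTTP error and saves the error body to the output file, so a failed download can go unnoticed until DeepSomatic runs. The hypothetical `check_inputs` function below (not part of DeepSomatic) is a minimal pre-flight sketch that confirms each input, and its index where one is expected, exists and is non-empty before you launch the `docker run` command above.

```shell
# Hypothetical pre-flight check (not part of DeepSomatic): every argument must
# be a non-empty file, and each .bam / FASTA argument must also have a
# non-empty .bai / .fai index next to it. Returns 1 if anything is missing.
check_inputs() {
  local status=0 f idx path
  for f in "$@"; do
    case "$f" in
      *.bam) idx="${f}.bai" ;;
      *.fna | *.fa | *.fasta) idx="${f}.fai" ;;
      *) idx="" ;;
    esac
    # Check the file itself, plus its index when one is expected.
    for path in "$f" ${idx:+"$idx"}; do
      if [ ! -s "$path" ]; then
        echo "missing or empty: $path" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

For this case study, a call might look like `check_inputs "${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna" "${INPUT_DIR}/HCC1395_pacbio.normal.chr1.bam" "${INPUT_DIR}/HCC1395_pacbio.tumor.chr1.bam"`. A non-empty file can still be an HTML error page, so this only catches empty or absent downloads.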
94 | 95 | ```bash 96 | sudo docker pull pkrusche/hap.py:latest 97 | # Run hap.py 98 | sudo docker run \ 99 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 100 | pkrusche/hap.py:latest \ 101 | /opt/hap.py/bin/som.py \ 102 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 103 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 104 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 105 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 106 | --feature-table generic \ 107 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 108 | -l chr1 109 | ``` 110 | 111 | The output: 112 | 113 | ``` 114 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 115 | 0 indels 133 150 114 36 19 0 0 0.857143 0.790241 0.908710 0.857143 0.76000 0.687131 0.822947 0 0 248956422 0.144604 116 | 1 SNVs 3440 3349 3228 121 212 0 0 0.938372 0.929965 0.946042 0.938372 0.96387 0.957144 0.969795 0 0 248956422 0.486029 117 | 5 records 3573 3499 3342 157 231 0 0 0.935348 0.926931 0.943061 0.935348 0.95513 0.947891 0.961617 0 0 248956422 0.630632 118 | ``` 119 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-wes.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic WES case study 2 | 3 | In this case study, we show an example of running DeepSomatic on WES 4 | data. We use HCC1395 as an example for this case study. 5 | 6 | ## Data details 7 | 8 | For this case-study, we use HCC1395 as an example. We run the analysis on `chr1` 9 | that we hold out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 
12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We use the GRCh38 reference for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-wes-case-study" 27 | 28 | # Set up input and output directories 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | # Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the chr1 reference and BAM files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference FASTA and its index 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the normal and tumor BAM files and their indexes 44 | curl ${HTTPDIR}/HCC1395_wes.normal.chr1.bam > ${INPUT_DIR}/HCC1395_wes.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_wes.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wes.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_wes.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_wes.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam.bai 48 | 49 | # Download the truth VCF, its index, the high-confidence regions BED, and the exome targets BED 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | wget -P ${INPUT_DIR}
"${DATA_HTTP_DIR}"/seqc2_hg38.exome_regions.bed 55 | ``` 56 | 57 | ## Running DeepSomatic with one command 58 | 59 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 60 | `postprocess_variants`. You can run all three with a single command using the 61 | `run_deepsomatic` script. 62 | 63 | ### Running on a CPU-only machine 64 | 65 | ```bash 66 | BIN_VERSION="1.9.0" 67 | 68 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 69 | 70 | sudo docker run \ 71 | -v ${INPUT_DIR}:${INPUT_DIR} \ 72 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 73 | google/deepsomatic:"${BIN_VERSION}" \ 74 | run_deepsomatic \ 75 | --model_type=WES \ 76 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 77 | --reads_normal=${INPUT_DIR}/HCC1395_wes.normal.chr1.bam \ 78 | --reads_tumor=${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam \ 79 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 80 | --sample_name_tumor="HCC1395Tumor" \ 81 | --sample_name_normal="HCC1395Normal" \ 82 | --num_shards=$(nproc) \ 83 | --logging_dir=${OUTPUT_DIR}/logs \ 84 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 85 | --regions=chr1 86 | ``` 87 | 88 | NOTE: To run each step separately, add `--dry_run=true` to the command 89 | above; the script then prints the command for each step instead of running 90 | it. The flags passed to `make_examples_somatic` differ between model types, 91 | so check the dry-run output for the model you use. 92 | 93 | The `--intermediate_results_dir` flag is optional. If set, the intermediate 94 | outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
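The DeepSomatic run above covers all of `chr1`, while the WES evaluation passes the exome targets to som.py with `-T ${INPUT_DIR}/seqc2_hg38.exome_regions.bed`. To get a sense of how much of `chr1` those targets span, the hypothetical `bed_span` function below sums interval lengths for one chromosome. BED intervals are 0-based and half-open, so each interval contributes `end - start` bases; the sketch assumes the intervals do not overlap.

```shell
# Hypothetical helper: total bases covered by the BED intervals on one
# chromosome. BED intervals are 0-based and half-open, so each one spans
# end - start bases. Assumes non-overlapping intervals (overlaps would be
# counted twice).
bed_span() {
  local bed="$1" chrom="$2"
  awk -v chrom="$chrom" '
    /^(#|track|browser)/ { next }   # skip comment and UCSC header lines
    $1 == chrom { total += $3 - $2 }
    END { printf "%d\n", total }
  ' "$bed"
}
```

For example, `bed_span "${INPUT_DIR}/seqc2_hg38.exome_regions.bed" chr1` prints the number of `chr1` bases covered by the exome targets. If the BED file might contain overlapping intervals, merge them first (for example with a sort-and-merge step) before summing.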
95 | 96 | ```bash 97 | sudo docker pull pkrusche/hap.py:latest 98 | # Run hap.py 99 | sudo docker run \ 100 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 101 | pkrusche/hap.py:latest \ 102 | /opt/hap.py/bin/som.py \ 103 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 104 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 105 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 106 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 107 | --feature-table generic \ 108 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 109 | -T ${INPUT_DIR}/seqc2_hg38.exome_regions.bed \ 110 | -l chr1 111 | ``` 112 | 113 | The output: 114 | 115 | ``` 116 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 117 | 0 indels 7 8 7 1 0 0 0 1.000000 0.590384 1.000000 1.000000 0.875000 0.546281 1 0 0 248956422 0.004017 118 | 1 SNVs 145 132 132 0 13 0 0 0.910345 0.855723 0.948809 0.910345 1.000000 0.972441 1 0 0 248956422 0.000000 119 | 5 records 152 140 139 1 13 0 0 0.914474 0.862163 0.951209 0.914474 0.992857 0.967106 1 0 0 248956422 0.004017 120 | ``` 121 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-wgs-tumor-only.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic WGS tumor-only case study 2 | 3 | In this case study, we show an example of running DeepSomatic on WGS 4 | tumor-only data. We use HCC1395 as an example for this case study. 5 | 6 | ## Data details 7 | 8 | For this case-study, we use HCC1395 as an example. We run the analysis on `chr1` 9 | that we hold out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data. 
12 | 13 | ## Allele frequency channel 14 | 15 | For accurate tumor-only calling, DeepSomatic uses an allele-frequency channel 16 | based on 1000 Genomes variant calls made with DeepVariant to filter out germline 17 | variants during inference. The default population VCF contains variant calls 18 | against the GRCh38 reference. To use your own population VCF instead, pass it 19 | with the `--population_vcfs` parameter. 20 | 21 | ## Prepare environment 22 | 23 | ### Tools 24 | 25 | [Docker](https://docs.docker.com/get-docker/) is used to run DeepSomatic 26 | and [hap.py](https://github.com/illumina/hap.py). 27 | 28 | ### Download input data 29 | 30 | We use the GRCh38 reference for this case study. 31 | 32 | 33 | ```bash 34 | BASE="${HOME}/deepsomatic-wgs-tumor-only-case-study" 35 | 36 | # Set up input and output directories 37 | INPUT_DIR="${BASE}/input/data" 38 | OUTPUT_DIR="${BASE}/output" 39 | 40 | # Create local directory structure 41 | mkdir -p "${INPUT_DIR}" 42 | mkdir -p "${OUTPUT_DIR}" 43 | mkdir -p "${OUTPUT_DIR}/sompy_output" 44 | 45 | # Base URL for the chr1 reference and BAM files 46 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 47 | # Download the reference FASTA and its index 48 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 49 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 50 | 51 | # Download the tumor BAM file and its index 52 | curl ${HTTPDIR}/HCC1395_wgs.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam 53 | curl ${HTTPDIR}/HCC1395_wgs.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam.bai 54 | 55 | # Download the truth VCF, its index, and the high-confidence regions BED 56 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 58 | wget -P
${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 60 | ``` 61 | 62 | ## Running DeepSomatic with one command 63 | 64 | The DeepSomatic pipeline consists of three steps: `make_examples_somatic`, `call_variants`, and 65 | `postprocess_variants`. You can run all three with a single command using the 66 | `run_deepsomatic` script. 67 | 68 | ### Running on a CPU-only machine 69 | 70 | ```bash 71 | BIN_VERSION="1.9.0" 72 | 73 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 74 | 75 | sudo docker run \ 76 | -v ${INPUT_DIR}:${INPUT_DIR} \ 77 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 78 | google/deepsomatic:"${BIN_VERSION}" \ 79 | run_deepsomatic \ 80 | --model_type=WGS_TUMOR_ONLY \ 81 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 82 | --reads_tumor=${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam \ 83 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 84 | --sample_name_tumor="HCC1395Tumor" \ 85 | --num_shards=$(nproc) \ 86 | --logging_dir=${OUTPUT_DIR}/logs \ 87 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 88 | --use_default_pon_filtering=true \ 89 | --regions=chr1 90 | ``` 91 | 92 | With `--use_default_pon_filtering=true`, somatic calls are filtered against 93 | the default panel of normals (PON) VCF, which contains variant calls from 94 | dbSNP, gnomAD, and 1000 Genomes. To apply your own post-filtering instead, 95 | set this parameter to `false`. 96 | 97 | NOTE: To run each step separately, add `--dry_run=true` to the command 98 | above; the script then prints the command for each step instead of running 99 | it. The flags passed to `make_examples_somatic` differ between model types, 100 | so check the dry-run output for the model you use. 101 | 102 | The `--intermediate_results_dir` flag is optional.
If you specify it, the 103 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory. 104 | 105 | ```bash 106 | sudo docker pull pkrusche/hap.py:latest 107 | # Run hap.py 108 | sudo docker run \ 109 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 110 | pkrusche/hap.py:latest \ 111 | /opt/hap.py/bin/som.py \ 112 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 113 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 114 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 115 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 116 | --feature-table generic \ 117 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 118 | -l chr1 119 | ``` 120 | 121 | The output: 122 | 123 | ``` 124 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 125 | 0 indels 133 287 78 209 55 0 0 0.586466 0.501655 0.667560 0.586466 0.271777 0.222782 0.325349 0 0 248956422 0.839504 126 | 1 SNVs 3440 3434 2571 863 869 0 0 0.747384 0.732660 0.761692 0.747384 0.748690 0.733976 0.762984 0 0 248956422 3.466470 127 | 5 records 3573 3721 2649 1072 924 0 0 0.741394 0.726845 0.755552 0.741394 0.711905 0.697194 0.726288 0 0 248956422 4.305974 128 | ``` 129 | -------------------------------------------------------------------------------- /docs/deepsomatic-case-study-wgs.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic WGS case study 2 | 3 | In this case study, we show an example of running DeepSomatic on WGS 4 | data. 5 | 6 | ## Data details 7 | 8 | For this case study, we use HCC1395 as an example. We run the analysis on `chr1`, 9 | which was held out during training. 10 | 11 | Please see the [metrics page](metrics.md) for details on runtime and data.
12 | 13 | ## Prepare environment 14 | 15 | ### Tools 16 | 17 | [Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic 18 | and [hap.py](https://github.com/illumina/hap.py). 19 | 20 | ### Download input data 21 | 22 | We will be using GRCh38 for this case study. 23 | 24 | 25 | ```bash 26 | BASE="${HOME}/deepsomatic-wgs-case-study" 27 | 28 | # Set up input and output directory data 29 | INPUT_DIR="${BASE}/input/data" 30 | OUTPUT_DIR="${BASE}/output" 31 | 32 | ## Create local directory structure 33 | mkdir -p "${INPUT_DIR}" 34 | mkdir -p "${OUTPUT_DIR}" 35 | mkdir -p "${OUTPUT_DIR}/sompy_output" 36 | 37 | # Base URL for the case-study input files 38 | HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies 39 | # Download the reference files 40 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna 41 | curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai 42 | 43 | # Download the bam files 44 | curl ${HTTPDIR}/HCC1395_wgs.normal.chr1.bam > ${INPUT_DIR}/HCC1395_wgs.normal.chr1.bam 45 | curl ${HTTPDIR}/HCC1395_wgs.normal.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wgs.normal.chr1.bam.bai 46 | curl ${HTTPDIR}/HCC1395_wgs.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam 47 | curl ${HTTPDIR}/HCC1395_wgs.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam.bai 48 | 49 | # Download truth VCF 50 | DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth 51 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed 52 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz 53 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi 54 | ``` 55 | 56 | ## Running
DeepSomatic with one command 57 | 58 | The DeepSomatic pipeline consists of 3 steps: `make_examples_somatic`, `call_variants`, and 59 | `postprocess_variants`. You can run all of them with one command using the 60 | `run_deepsomatic` script. 61 | 62 | ### Running on a CPU-only machine 63 | 64 | ```bash 65 | BIN_VERSION="1.9.0" 66 | 67 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 68 | 69 | sudo docker run \ 70 | -v ${INPUT_DIR}:${INPUT_DIR} \ 71 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 72 | google/deepsomatic:"${BIN_VERSION}" \ 73 | run_deepsomatic \ 74 | --model_type=WGS \ 75 | --ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 76 | --reads_normal=${INPUT_DIR}/HCC1395_wgs.normal.chr1.bam \ 77 | --reads_tumor=${INPUT_DIR}/HCC1395_wgs.tumor.chr1.bam \ 78 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 79 | --sample_name_tumor="HCC1395Tumor" \ 80 | --sample_name_normal="HCC1395Normal" \ 81 | --num_shards=$(nproc) \ 82 | --logging_dir=${OUTPUT_DIR}/logs \ 83 | --intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \ 84 | --regions=chr1 85 | ``` 86 | 87 | NOTE: If you want to run each of the steps separately, add `--dry_run=true` 88 | to the command above to see which flags each step needs. Different model 89 | types need different flags in the `make_examples_somatic` 90 | step. 91 | 92 | The `--intermediate_results_dir` flag is optional. If you specify it, the 93 | intermediate outputs of the `make_examples_somatic` and `call_variants` stages are written to that directory.
94 | 95 | ```bash 96 | sudo docker pull pkrusche/hap.py:latest 97 | # Run hap.py 98 | sudo docker run \ 99 | -v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 100 | pkrusche/hap.py:latest \ 101 | /opt/hap.py/bin/som.py \ 102 | -N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \ 103 | ${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \ 104 | -r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \ 105 | -o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \ 106 | --feature-table generic \ 107 | -R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \ 108 | -l chr1 109 | ``` 110 | 111 | The output: 112 | 113 | ``` 114 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 115 | 0 indels 133 149 127 22 6 0 0 0.954887 0.909391 0.980954 0.954887 0.852349 0.788841 0.902333 0 0 248956422 0.088369 116 | 1 SNVs 3440 3334 3302 32 138 0 0 0.959884 0.952935 0.966059 0.959884 0.990402 0.986652 0.993302 0 0 248956422 0.128537 117 | 5 records 3573 3483 3429 54 144 0 0 0.959698 0.952873 0.965778 0.959698 0.984496 0.979981 0.988207 0 0 248956422 0.216905 118 | ``` 119 | -------------------------------------------------------------------------------- /docs/deepsomatic-quick-start.md: -------------------------------------------------------------------------------- 1 | # DeepSomatic quick start 2 | 3 | This guide explains how to run DeepSomatic. 4 | 5 | ## Background 6 | 7 | To get started, you'll need the DeepSomatic programs (and some packages they 8 | depend on), some test data, and of course a place to run them. 9 | 10 | We've provided a Docker image and some test data in a bucket on Google Cloud 11 | Storage. The instructions below show how to download the data through the 12 | corresponding public URLs. 13 | 14 | This setup requires a machine with the AVX instruction set.
To see if your 15 | machine meets this requirement, you can check the `/proc/cpuinfo` file, which 16 | lists this information under "flags". If your CPU lacks these 17 | instructions, see the next section for more information on how to build your own 18 | Docker image. 19 | 20 | ### Use Docker to run DeepSomatic in one command 21 | 22 | ## Get Docker image, models, and test data 23 | 24 | ### Get Docker image 25 | 26 | ```bash 27 | BIN_VERSION="1.9.0" 28 | 29 | sudo apt -y update 30 | sudo apt-get -y install docker.io 31 | sudo docker pull google/deepsomatic:"${BIN_VERSION}" 32 | ``` 33 | 34 | ### Download test data 35 | 36 | Before you start running, you need to have the following input files: 37 | 38 | 1. A reference genome in [FASTA] format and its corresponding index file 39 | (.fai). 40 | 41 | 1. Aligned reads for the tumor sample (and, for tumor-normal calling, the normal 42 | sample) in [BAM] format, each with its corresponding index file (.bai). You get these by 43 | aligning the reads from a sequencing instrument with an aligner such as [BWA]. 44 | 45 | We've prepared a small test data bundle for use in this quick start guide that 46 | can be downloaded to your instance from the public URLs.
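Because every FASTA and BAM input needs its index file alongside it, a quick pre-flight check can catch a missing download before a long run fails. The function below is a minimal sketch; `check_indexed_inputs` is a hypothetical helper, not part of DeepSomatic, and it assumes the index naming used in this guide (`<ref>.fai` for FASTA and `<reads>.bam.bai` for BAM).

```bash
# Hypothetical pre-flight check (not part of DeepSomatic): confirm that each
# FASTA or BAM argument is non-empty and has a non-empty index next to it.
# Assumes the naming used in this guide: <ref>.fai and <reads>.bam.bai.
check_indexed_inputs() {
  local status=0 f idx
  for f in "$@"; do
    case "${f}" in
      *.bam) idx="${f}.bai" ;;
      *.fasta|*.fa|*.fna) idx="${f}.fai" ;;
      *) echo "skipping ${f}: not a FASTA or BAM file" >&2; continue ;;
    esac
    if [ ! -s "${f}" ]; then
      echo "missing or empty: ${f}" >&2
      status=1
    elif [ ! -s "${idx}" ]; then
      echo "missing index: ${idx}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

After downloading the bundle, you could run, for example, `check_indexed_inputs "${INPUT_DIR}"/*.fasta "${INPUT_DIR}"/*.bam`; the function returns non-zero and names each missing file if anything is absent.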
47 | 48 | Download the test bundle: 49 | 50 | ```bash 51 | INPUT_DIR="${PWD}/deepsomatic-quickstart-testdata" 52 | DATA_HTTP_DIR="https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/quick-start" 53 | 54 | mkdir -p ${INPUT_DIR} 55 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam 56 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam.bai 57 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam 58 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam.bai 59 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/GRCh38_no_alts_chr1.fasta 60 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/GRCh38_no_alts_chr1.fasta.fai 61 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/SEQC2_truth.chr1.quick_start.vcf.gz 62 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/SEQC2_truth.chr1.quick_start.vcf.gz.tbi 63 | wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/SEQC2_truth.chr1.quick_start.bed 64 | ``` 65 | 66 | This should create a subdirectory in the current directory containing the actual 67 | data files: 68 | 69 | ```bash 70 | ls -1 ${INPUT_DIR} 71 | ``` 72 | 73 | outputting: 74 | 75 | ``` 76 | GRCh38_no_alts_chr1.fasta 77 | GRCh38_no_alts_chr1.fasta.fai 78 | S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam 79 | S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam.bai 80 | S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam 81 | S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam.bai 82 | SEQC2_truth.chr1.quick_start.bed 83 | SEQC2_truth.chr1.quick_start.vcf.gz 84 | SEQC2_truth.chr1.quick_start.vcf.gz.tbi 85 | ``` 86 | 87 | ## Run DeepSomatic with one command 88 | 89 | DeepSomatic consists of 3 main binaries: `make_examples_somatic`, `call_variants`, and 90 | `postprocess_variants`. To make it easier to run, we provide a single entrypoint that 91 | can be run directly as a Docker command.
92 | 93 | ```bash 94 | OUTPUT_DIR="${PWD}/quickstart-output" 95 | mkdir -p "${OUTPUT_DIR}" 96 | ``` 97 | 98 | You can run everything with the following command: 99 | 100 | ```bash 101 | sudo docker run \ 102 | -v ${INPUT_DIR}:${INPUT_DIR} \ 103 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 104 | google/deepsomatic:"${BIN_VERSION}" \ 105 | run_deepsomatic \ 106 | --model_type=WGS \ 107 | --ref=${INPUT_DIR}/GRCh38_no_alts_chr1.fasta \ 108 | --reads_normal=${INPUT_DIR}/S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam \ 109 | --reads_tumor=${INPUT_DIR}/S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam \ 110 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.vcf.gz \ 111 | --output_gvcf=${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.g.vcf.gz \ 112 | --sample_name_tumor="tumor" \ 113 | --sample_name_normal="normal" \ 114 | --num_shards=1 \ 115 | --logging_dir=${OUTPUT_DIR}/logs \ 116 | --vcf_stats_report=true \ 117 | --intermediate_results_dir ${OUTPUT_DIR}/intermediate_results_dir \ 118 | --regions=chr1:10,000,000-10,100,000 119 | ``` 120 | 121 | NOTE: If you want to look at all the commands being run, you can add 122 | `--dry_run=true` to the command above, which will print out all the commands 123 | but not execute them. 124 | 125 | This will generate 5 files and 2 directories in `${OUTPUT_DIR}`: 126 | 127 | ```bash 128 | ls -1 ${OUTPUT_DIR} 129 | ``` 130 | 131 | outputting: 132 | 133 | ``` 134 | HCC1395_deepsomatic_quickstart.g.vcf.gz 135 | HCC1395_deepsomatic_quickstart.g.vcf.gz.tbi 136 | HCC1395_deepsomatic_quickstart.vcf.gz 137 | HCC1395_deepsomatic_quickstart.vcf.gz.tbi 138 | HCC1395_deepsomatic_quickstart.visual_report.html 139 | intermediate_results_dir 140 | logs 141 | ``` 142 | 143 | The `intermediate_results_dir` directory exists because 144 | `--intermediate_results_dir ${OUTPUT_DIR}/intermediate_results_dir` is specified. This 145 | directory contains the intermediate output of the `make_examples_somatic` and 146 | `call_variants` steps.
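Before benchmarking, a quick sanity check is to count how many records in the output VCF passed filtering. The sketch below uses a hypothetical helper name, `count_pass_records` (not part of DeepSomatic), and relies only on the standard VCF layout, where FILTER is the 7th tab-separated column; records that did not pass carry a different FILTER value and are not counted.

```bash
# Hypothetical helper (not part of DeepSomatic): count the data records in an
# uncompressed VCF read from stdin whose FILTER column (the 7th tab-separated
# field in the VCF format) is exactly "PASS". Header lines start with "#".
count_pass_records() {
  awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}
```

For example, `gzip -dc "${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.vcf.gz" | count_pass_records` prints the number of PASS records in the quick start output.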
147 | 148 | ## Notes on GPU image 149 | 150 | If you are using GPUs, pull the GPU version of the image and run with 151 | `--gpus 1`. `call_variants` is the only step that uses the GPU, and it can use only 152 | one GPU at a time. `make_examples_somatic` and `postprocess_variants` do not run on the 153 | GPU. 154 | 155 | ```bash 156 | sudo docker run --gpus 1 \ 157 | -v ${INPUT_DIR}:${INPUT_DIR} \ 158 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 159 | google/deepsomatic:"${BIN_VERSION}-gpu" \ 160 | run_deepsomatic \ 161 | ... 162 | ``` 163 | 164 | ## Notes on Singularity 165 | 166 | ### CPU version 167 | 168 | ```bash 169 | # Pull the image. 170 | singularity pull docker://google/deepsomatic:"${BIN_VERSION}" 171 | 172 | # Run DeepSomatic (set --num_shards to the number of threads to use). 173 | singularity run -B /usr/lib/locale/:/usr/lib/locale/ \ 174 | docker://google/deepsomatic:"${BIN_VERSION}" \ 175 | run_deepsomatic \ 176 | --model_type=WGS \ 177 | --ref=${INPUT_DIR}/GRCh38_no_alts_chr1.fasta \ 178 | --reads_normal=${INPUT_DIR}/S1395_WGS_ilm_normal.bwa.dedup.chr1.quickstart.bam \ 179 | --reads_tumor=${INPUT_DIR}/S1395_WGS_ilm_tumor.bwa.dedup.chr1.quickstart.bam \ 180 | --output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.vcf.gz \ 181 | --output_gvcf=${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.g.vcf.gz \ 182 | --sample_name_tumor="tumor" \ 183 | --sample_name_normal="normal" \ 184 | --num_shards=1 \ 185 | --logging_dir=${OUTPUT_DIR}/logs \ 186 | --intermediate_results_dir ${OUTPUT_DIR}/intermediate_results_dir \ 187 | --regions=chr1:10,000,000-10,100,000 188 | ``` 189 | 190 | ### GPU version 191 | 192 | ```bash 193 | # Pull the image. 194 | singularity pull docker://google/deepsomatic:"${BIN_VERSION}-gpu" 195 | 196 | # Run DeepSomatic. 197 | # Using "--nv" and "${BIN_VERSION}-gpu" is important. 198 | singularity run --nv -B /usr/lib/locale/:/usr/lib/locale/ \ 199 | docker://google/deepsomatic:"${BIN_VERSION}-gpu" \ 200 | run_deepsomatic \ 201 | ...
202 | ``` 203 | 204 | ## Evaluating the results 205 | 206 | Here we use `som.py` from the `hap.py` 207 | ([https://github.com/Illumina/hap.py](https://github.com/Illumina/hap.py)) 208 | package from Illumina to evaluate the resulting 100-kilobase VCF file. This 209 | serves as a quick check that the three DeepSomatic stages ran correctly. 210 | 211 | ```bash 212 | sudo docker pull pkrusche/hap.py:v0.3.9 213 | 214 | sudo docker run -it \ 215 | -v ${INPUT_DIR}:${INPUT_DIR} \ 216 | -v ${OUTPUT_DIR}:${OUTPUT_DIR} \ 217 | pkrusche/hap.py:v0.3.9 /opt/hap.py/bin/som.py \ 218 | ${INPUT_DIR}/SEQC2_truth.chr1.quick_start.vcf.gz \ 219 | ${OUTPUT_DIR}/HCC1395_deepsomatic_quickstart.vcf.gz \ 220 | --restrict-regions ${INPUT_DIR}/SEQC2_truth.chr1.quick_start.bed \ 221 | -r ${INPUT_DIR}/GRCh38_no_alts_chr1.fasta \ 222 | -o ${OUTPUT_DIR}/s1395_deepsomatic_chr1_quickstart \ 223 | --feature-table generic 224 | ``` 225 | 226 | You should see output similar to the following. 227 | 228 | ``` 229 | Benchmarking Summary: 230 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 231 | 1 SNVs 1 1 1 0 0 0 0 1 0.025 1 1 1 0.025 1 0 0 248956422 0 232 | 5 records 1 1 1 0 0 0 0 1 0.025 1 1 1 0.025 1 0 0 248956422 0 233 | ``` 234 | -------------------------------------------------------------------------------- /docs/metrics.md: -------------------------------------------------------------------------------- 1 | # Runtime and accuracy metrics for all release models 2 | 3 | ## Setup 4 | 5 | The runtime and accuracy reported on this page were generated using 6 | `n2-standard-96` GCP instances, which have the following configuration: 7 | 8 | ```bash 9 | GCP instance type: n2-standard-96 10 | CPUs: 96-core (vCPU) 11 | Memory: 384GiB 12 | GPUs: 0 13 | ``` 14 | 15 | ## WGS (Illumina) 16 | 17 | Below are the numbers from an Illumina WGS run.
18 | 19 | Dataset details: 20 | 21 | ```bash 22 | Sample: HCC1395 23 | Normal coverage: 50x 24 | Tumor coverage: 60x 25 | ``` 26 | 27 | ### Runtime 28 | 29 | Runtime is for all chromosomes. 30 | Reported runtime is an average of 5 runs. 31 | 32 | Stage | Time (wall time) 33 | -------------------------------- | ------------------ 34 | make_examples_somatic | 64m30.62s 35 | call_variants | 106m0.25s 36 | postprocess_variants (no gVCF) | 0m59.07s 37 | vcf_stats_report (optional) | 3m0.84s 38 | total | 185m49.59s (~3h5m) 39 | 40 | ### Accuracy 41 | 42 | som.py results 43 | 44 | ``` 45 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 46 | 0 indels 1626 1782 1512 270 114 0 0 0.929889 0.916710 0.941541 0.929889 0.848485 0.831278 0.864561 0 0 2875001522 0.093913 47 | 1 SNVs 39447 37921 37500 421 1947 0 0 0.950643 0.948472 0.952747 0.950643 0.988898 0.987806 0.989916 0 0 2875001522 0.146435 48 | 5 records 41073 39703 39012 691 2061 0 0 0.949821 0.947678 0.951901 0.949821 0.982596 0.981274 0.983847 0 0 2875001522 0.240348 49 | ``` 50 | 51 | ## WES (Illumina) 52 | 53 | Below are the numbers from an Illumina WES run. 54 | 55 | Dataset details: 56 | 57 | ```bash 58 | Sample: HCC1395 59 | Normal coverage: 140x 60 | Tumor coverage: 120x 61 | ``` 62 | 63 | ### Runtime 64 | 65 | Runtime is for all chromosomes. 66 | Reported runtime is an average of 5 runs.
67 | 68 | Stage | Time (wall time) 69 | -------------------------------- | ------------------ 70 | make_examples_somatic | 8m17.49s 71 | call_variants | 2m37.36s 72 | postprocess_variants (no gVCF) | 0m5.78s 73 | vcf_stats_report (optional) | 0m7.25s 74 | total | 15m33.09s 75 | 76 | ### Accuracy 77 | 78 | som.py results 79 | 80 | ``` 81 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 82 | 0 indels 48 46 43 3 5 0 0 0.895833 0.786676 0.959125 0.895833 0.934783 0.836081 0.981291 0 0 2875001522 0.001043 83 | 1 SNVs 1159 1104 1094 10 65 0 0 0.943917 0.929551 0.956071 0.943917 0.990942 0.983992 0.995334 0 0 2875001522 0.003478 84 | 5 records 1207 1150 1137 13 70 0 0 0.942005 0.927749 0.954145 0.942005 0.988696 0.981294 0.993649 0 0 2875001522 0.004522 85 | ``` 86 | 87 | ## PacBio 88 | 89 | Below are the numbers from a PacBio run. 90 | 91 | Dataset details: 92 | 93 | ```bash 94 | Sample: HCC1395 95 | Normal coverage: 45x 96 | Tumor coverage: 60x 97 | ``` 98 | 99 | ### Runtime 100 | 101 | Runtime is for all chromosomes. 102 | Reported runtime is an average of 5 runs.
103 | 104 | Stage | Time (wall time) 105 | -------------------------------- | ------------------ 106 | make_examples_somatic | 198m23.38s 107 | call_variants | 120m49.79s 108 | postprocess_variants (no gVCF) | 1m36.30s 109 | vcf_stats_report (optional) | 4m33.57s 110 | total | 334m57.98s (~5h34m) 111 | 112 | ### Accuracy 113 | 114 | som.py results 115 | 116 | ``` 117 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 118 | 0 indels 1626 1637 1315 322 311 0 0 0.808733 0.789077 0.827291 0.808733 0.803299 0.783517 0.822009 0 0 2875001522 0.112000 119 | 1 SNVs 39447 38798 37236 1562 2211 0 0 0.943950 0.941648 0.946187 0.943950 0.959740 0.957750 0.961662 0 0 2875001522 0.543304 120 | 5 records 41073 40435 38551 1884 2522 0 0 0.938597 0.936244 0.940888 0.938597 0.953407 0.951320 0.955429 0 0 2875001522 0.655304 121 | ``` 122 | 123 | ## ONT 124 | 125 | Below are the numbers from an ONT run. 126 | 127 | Dataset details: 128 | 129 | ```bash 130 | Sample: HCC1395 131 | Normal coverage: 33x 132 | Tumor coverage: 50x 133 | ``` 134 | 135 | ### Runtime 136 | 137 | Runtime is for all chromosomes. 138 | Reported runtime is an average of 5 runs.
139 | 140 | Stage | Time (wall time) 141 | -------------------------------- | ------------------ 142 | make_examples_somatic | 121m23.55s 143 | call_variants | 167m49.86s 144 | postprocess_variants (no gVCF) | 3m12.58s 145 | vcf_stats_report (optional) | 8m52.38s 146 | total | 310m30.69s (~5h10m) 147 | 148 | ### Accuracy 149 | 150 | som.py results 151 | 152 | ``` 153 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 154 | 0 indels 1626 1260 1035 225 591 0 0 0.636531 0.612926 0.659651 0.636531 0.821429 0.799557 0.841826 0 0 2875001522 0.078261 155 | 1 SNVs 39447 31433 30595 838 8852 0 0 0.775598 0.771461 0.779694 0.775598 0.973340 0.971515 0.975078 0 0 2875001522 0.291478 156 | 5 records 41073 32693 31630 1063 9443 0 0 0.770092 0.766004 0.774142 0.770092 0.967485 0.965521 0.969367 0 0 2875001522 0.369739 157 | ``` 158 | 159 | ## FFPE WGS 160 | 161 | Below are the numbers from an FFPE WGS run. 162 | 163 | Dataset details: 164 | 165 | ```bash 166 | Sample: HCC1395 167 | Normal coverage: 50x 168 | Tumor coverage: 90x 169 | ``` 170 | 171 | ### Runtime 172 | 173 | Runtime is for all chromosomes. 174 | Reported runtime is an average of 5 runs.
175 | 176 | Stage | Time (wall time) 177 | -------------------------------- | ------------------ 178 | make_examples_somatic | 116m2.26s 179 | call_variants | 252m45.09s 180 | postprocess_variants (no gVCF) | 2m10.30s 181 | vcf_stats_report (optional) | 7m8.43s 182 | total | 389m7.11s (~6h29m) 183 | 184 | ### Accuracy 185 | 186 | som.py results 187 | 188 | ``` 189 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 190 | 0 indels 1626 1645 1300 345 326 0 0 0.799508 0.779525 0.818426 0.799508 0.790274 0.770100 0.809426 0 0 2875001522 0.120000 191 | 1 SNVs 39447 34094 32226 1868 7221 0 0 0.816944 0.813105 0.820737 0.816944 0.945210 0.942757 0.947588 0 0 2875001522 0.649739 192 | 5 records 41073 35739 33526 2213 7547 0 0 0.816254 0.812486 0.819977 0.816254 0.938079 0.935545 0.940542 0 0 2875001522 0.769739 193 | ``` 194 | 195 | ## FFPE WES 196 | 197 | Below are the numbers from an FFPE WES run. 198 | 199 | Dataset details: 200 | 201 | ```bash 202 | Sample: HCC1395 203 | Normal coverage: 185x 204 | Tumor coverage: 190x 205 | ``` 206 | 207 | ### Runtime 208 | 209 | Runtime is for all chromosomes. 210 | Reported runtime is an average of 5 runs.
211 | 212 | Stage | Time (wall time) 213 | -------------------------------- | ------------------ 214 | make_examples_somatic | 14m7.55s 215 | call_variants | 3m39.45s 216 | postprocess_variants (no gVCF) | 0m6.25s 217 | vcf_stats_report (optional) | 0m8.99s 218 | total | 29m38.02s 219 | 220 | ### Accuracy 221 | 222 | som.py results 223 | 224 | ``` 225 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 226 | 0 indels 48 47 40 7 8 0 0 0.833333 0.710001 0.917913 0.833333 0.851064 0.729682 0.930824 0 0 2875001522 0.002435 227 | 1 SNVs 1159 990 956 34 203 0 0 0.824849 0.802170 0.845908 0.824849 0.965657 0.952920 0.975678 0 0 2875001522 0.011826 228 | 5 records 1207 1037 996 41 211 0 0 0.825186 0.802994 0.845822 0.825186 0.960463 0.947293 0.971069 0 0 2875001522 0.014261 229 | ``` 230 | 231 | ## WGS tumor-only 232 | 233 | Below are the numbers from a WGS tumor-only run. 234 | 235 | Dataset details: 236 | 237 | ```bash 238 | Sample: HCC1395 239 | Tumor coverage: 60x 240 | ``` 241 | 242 | ### Runtime 243 | 244 | Runtime is for all chromosomes. 245 | Reported runtime is an average of 5 runs.
246 | 247 | Stage | Time (wall time) 248 | -------------------------------- | ------------------ 249 | make_examples_somatic | 37m36.77s 250 | call_variants | 54m41.93s 251 | postprocess_variants (no gVCF) | 1m37.85s 252 | vcf_stats_report (optional) | 3m33.53s 253 | total | 108m19.06s (~1h48m) 254 | 255 | ### Accuracy 256 | 257 | som.py results 258 | 259 | ``` 260 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 261 | 0 indels 1626 3794 1381 2413 245 0 0 0.849323 0.831321 0.866084 0.849323 0.363996 0.348794 0.379405 0 0 2875001522 0.839304 262 | 1 SNVs 39447 44623 35663 8960 3784 0 0 0.904074 0.901138 0.906950 0.904074 0.799207 0.795471 0.802904 0 0 2875001522 3.116520 263 | 5 records 41073 48416 37044 11372 4029 0 0 0.901906 0.899001 0.904755 0.901906 0.765119 0.761327 0.768879 0 0 2875001522 3.955476 264 | ``` 265 | 266 | ## PacBio tumor-only 267 | 268 | Below are the numbers from a PacBio tumor-only run. 269 | 270 | Dataset details: 271 | 272 | ```bash 273 | Sample: HCC1395 274 | Tumor coverage: 60x 275 | ``` 276 | 277 | ### Runtime 278 | 279 | Runtime is for all chromosomes. 280 | Reported runtime is an average of 5 runs.
281 | 282 | Stage | Time (wall time) 283 | -------------------------------- | ------------------ 284 | make_examples_somatic | 102m3.48s 285 | call_variants | 52m56.60s 286 | postprocess_variants (no gVCF) | 2m39.18s 287 | vcf_stats_report (optional) | 6m2.34s 288 | total | 173m50.57s (~2h53m) 289 | 290 | ### Accuracy 291 | 292 | som.py results 293 | 294 | ``` 295 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 296 | 0 indels 1626 2379 1278 1101 348 0 0 0.785978 0.765545 0.805394 0.785978 0.537201 0.517129 0.557181 0 0 2875001522 0.382956 297 | 1 SNVs 39447 56874 37768 19106 1679 0 0 0.957437 0.955411 0.959395 0.957437 0.664064 0.660174 0.667938 0 0 2875001522 6.645562 298 | 5 records 41073 59253 39046 20207 2027 0 0 0.950649 0.948522 0.952712 0.950649 0.658971 0.655146 0.662780 0 0 2875001522 7.028518 299 | ``` 300 | 301 | ## ONT tumor-only 302 | 303 | Below are the numbers from an ONT tumor-only run. 304 | 305 | Dataset details: 306 | 307 | ```bash 308 | Sample: HCC1395 309 | Tumor coverage: 50x 310 | ``` 311 | 312 | ### Runtime 313 | 314 | Runtime is for all chromosomes. 315 | Reported runtime is an average of 5 runs.
316 | 317 | Stage | Time (wall time) 318 | -------------------------------- | ------------------ 319 | make_examples_somatic | 67m22.21s 320 | call_variants | 95m5.18s 321 | postprocess_variants (no gVCF) | 4m4.65s 322 | vcf_stats_report (optional) | 10m15.89s 323 | total | 186m11.42s (~3h6m) 324 | 325 | ### Accuracy 326 | 327 | som.py results 328 | 329 | ``` 330 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 331 | 0 indels 1626 2165 945 1220 681 0 0 0.581181 0.557074 0.604999 0.581181 0.436490 0.415695 0.457454 0 0 2875001522 0.424348 332 | 1 SNVs 39447 51371 30570 20801 8877 0 0 0.774964 0.770823 0.779065 0.774964 0.595083 0.590833 0.599322 0 0 2875001522 7.235127 333 | 5 records 41073 53535 31515 22020 9558 0 0 0.767292 0.763187 0.771360 0.767292 0.588680 0.584507 0.592844 0 0 2875001522 7.659126 334 | ``` 335 | 336 | ## FFPE WGS tumor-only 337 | 338 | Below are the numbers from an FFPE WGS tumor-only run. 339 | 340 | Dataset details: 341 | 342 | ```bash 343 | Sample: HCC1395 344 | Tumor coverage: 90x 345 | ``` 346 | 347 | ### Runtime 348 | 349 | Runtime is for all chromosomes. 350 | Reported runtime is an average of 5 runs.
351 | 352 | Stage | Time (wall time) 353 | -------------------------------- | ------------------ 354 | make_examples_somatic | 66m31.13s 355 | call_variants | 67m8.20s 356 | postprocess_variants (no gVCF) | 1m52.78s 357 | vcf_stats_report (optional) | 4m10.71s 358 | total | 150m42.14s (~2h30m) 359 | 360 | ### Accuracy 361 | 362 | som.py results 363 | 364 | ``` 365 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 366 | 0 indels 1626 1960 1193 767 433 0 0 0.733702 0.711816 0.754758 0.733702 0.608673 0.586920 0.630107 0 0 2875001522 0.266782 367 | 1 SNVs 39447 37785 30686 7099 8761 0 0 0.777905 0.773782 0.781986 0.777905 0.812121 0.808159 0.816036 0 0 2875001522 2.469216 368 | 5 records 41073 39744 31879 7865 9194 0 0 0.776155 0.772104 0.780166 0.776155 0.802108 0.798170 0.806003 0 0 2875001522 2.735651 369 | ``` 370 | 371 | ## FFPE WES tumor-only 372 | 373 | Below are the numbers from an FFPE WES tumor-only run. 374 | 375 | Dataset details: 376 | 377 | ```bash 378 | Sample: HCC1395 379 | Tumor coverage: 190x 380 | ``` 381 | 382 | ### Runtime 383 | 384 | Runtime is for all chromosomes. 385 | Reported runtime is an average of 5 runs.
386 | 387 | Stage | Time (wall time) 388 | -------------------------------- | ------------------ 389 | make_examples_somatic | 5m51.71s 390 | call_variants | 1m12.73s 391 | postprocess_variants (no gVCF) | 0m6.74s 392 | vcf_stats_report (optional) | 0m7.25s 393 | total | 11m3.69s 394 | 395 | ### Accuracy 396 | 397 | som.py results 398 | 399 | ``` 400 | type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate 401 | 0 indels 48 107 41 66 7 0 0 0.854167 0.734870 0.932321 0.854167 0.383178 0.295172 0.477408 0 0 2875001522 0.022957 402 | 1 SNVs 1159 1225 921 304 238 0 0 0.794651 0.770677 0.817155 0.794651 0.751837 0.727073 0.775412 0 0 2875001522 0.105739 403 | 5 records 1207 1332 962 370 245 0 0 0.797017 0.773631 0.818981 0.797017 0.722222 0.697705 0.745775 0 0 2875001522 0.128696 404 | ``` 405 | 406 | ## How to reproduce the metrics on this page 407 | 408 | For simplicity and consistency, we report runtime with a 409 | [CPU instance with 96 CPUs](https://github.com/google/deepvariant/blob/r1.9/docs/deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform). 410 | This is NOT the fastest or cheapest configuration. 411 | 412 | Use `gcloud compute ssh` to log in to the newly created instance. 413 | 414 | Download and run any of the following case study scripts: 415 | 416 | ```bash 417 | # Get the script.
418 | curl -O https://raw.githubusercontent.com/google/deepvariant/r1.9/scripts/inference_deepsomatic.sh 419 | 420 | # WGS 421 | bash inference_deepsomatic.sh --model_preset WGS 422 | 423 | # WES 424 | bash inference_deepsomatic.sh --model_preset WES 425 | 426 | # PACBIO 427 | bash inference_deepsomatic.sh --model_preset PACBIO 428 | 429 | # ONT 430 | bash inference_deepsomatic.sh --model_preset ONT 431 | 432 | # FFPE_WGS 433 | bash inference_deepsomatic.sh --model_preset FFPE_WGS 434 | 435 | # FFPE_WES 436 | bash inference_deepsomatic.sh --model_preset FFPE_WES 437 | 438 | # WGS_TUMOR_ONLY 439 | bash inference_deepsomatic.sh --model_preset WGS_TUMOR_ONLY --use_default_pon_filtering 440 | 441 | # WES_TUMOR_ONLY 442 | bash inference_deepsomatic.sh --model_preset WES_TUMOR_ONLY --use_default_pon_filtering 443 | 444 | # PACBIO_TUMOR_ONLY 445 | bash inference_deepsomatic.sh --model_preset PACBIO_TUMOR_ONLY --use_default_pon_filtering 446 | 447 | # ONT_TUMOR_ONLY 448 | bash inference_deepsomatic.sh --model_preset ONT_TUMOR_ONLY --use_default_pon_filtering 449 | 450 | # FFPE_WGS_TUMOR_ONLY 451 | bash inference_deepsomatic.sh --model_preset FFPE_WGS_TUMOR_ONLY --use_default_pon_filtering 452 | 453 | # FFPE_WES_TUMOR_ONLY 454 | bash inference_deepsomatic.sh --model_preset FFPE_WES_TUMOR_ONLY --use_default_pon_filtering 455 | ``` 456 | 457 | Runtime metrics are taken from the log written after each stage of 458 | DeepSomatic. 459 | 460 | The accuracy metrics come from som.py, the somatic extension of the hap.py program. 461 | --------------------------------------------------------------------------------
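The runtime and som.py tables in metrics.md use fixed formats, and the derived columns can be recomputed from the raw values. The functions below are a minimal sketch; `walltime_to_seconds`, `seconds_to_hm`, and `sompy_summary` are hypothetical helpers, not part of DeepSomatic or hap.py. Recall and precision use the standard definitions tp/(tp+fn) and tp/(tp+fp). The fp.rate values in these tables are consistent with false positives per megabase of fp.region.size, so that interpretation is an assumption checked against the tables, not taken from the hap.py documentation. Note that the per-stage wall times do not add up exactly to the totals (for WGS the four stages sum to about 174.5 minutes, compared with a 185.8-minute total), presumably because of overhead outside the listed stages.

```bash
# Hypothetical helpers for the tables in metrics.md; not part of DeepSomatic
# or hap.py.

# Convert a wall time such as "185m49.59s" to seconds. Returns non-zero if the
# input does not match the "<minutes>m<seconds>s" format.
walltime_to_seconds() {
  printf '%s\n' "$1" | awk '
    !/^[0-9]+m[0-9]+(\.[0-9]+)?s$/ { exit 1 }
    {
      split($0, parts, "m")
      sub(/s$/, "", parts[2])
      printf "%.2f\n", parts[1] * 60 + parts[2]
    }'
}

# Format seconds as whole hours and minutes, truncating leftover seconds
# (this matches the approximate values shown next to the totals).
seconds_to_hm() {
  awk -v s="$1" 'BEGIN { m = int(s / 60); printf "%dh%dm\n", int(m / 60), m % 60 }'
}

# Recompute summary metrics from raw som.py counts:
#   recall    = tp / (tp + fn)
#   precision = tp / (tp + fp)
#   f1        = harmonic mean of precision and recall (not in the tables)
#   fp.rate   = false positives per megabase of fp.region.size (assumed)
sompy_summary() {
  awk -v tp="$1" -v fp="$2" -v fn="$3" -v size="$4" 'BEGIN {
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    printf "recall=%.6f precision=%.6f f1=%.6f fp.rate=%.6f\n",
      recall, precision, f1, fp / (size / 1e6)
  }'
}
```

For example, `sompy_summary 37500 421 1947 2875001522` reproduces the recall (0.950643), precision (0.988898), and fp.rate (0.146435) of the WGS SNV row, and `seconds_to_hm "$(walltime_to_seconds 185m49.59s)"` prints `3h5m`, matching the approximate WGS total.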