├── docs
│   ├── install.md
│   ├── genes.md
│   ├── changelog.md
│   ├── usage.md
│   ├── usage_align-imgt.md
│   ├── usage_call-contigs.md
│   ├── usage_call-consensus.md
│   ├── usage_call-reads.md
│   └── output.md
├── LICENSE.txt
├── figures
│   └── logo_HiFiHLA.svg
└── README.md
/docs/install.md:
--------------------------------------------------------------------------------
1 | ## Install
2 |
3 | [Bioconda recipe](http://bioconda.github.io/recipes/hifihla/README.html)
4 |
5 | Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda) for information on Installation, Support, License, Copyright, and Disclaimer.
6 |
7 | `hifihla` can be installed from bioconda
8 | ```
9 | conda install -c bioconda hifihla
10 | ```
11 | Binaries are also available in the GitHub [releases](https://github.com/PacificBiosciences/HiFiHLA/releases/latest).
12 |
--------------------------------------------------------------------------------
/docs/genes.md:
--------------------------------------------------------------------------------
1 | ## Genes
2 |
3 | Genes called by `hifihla`:
4 | ```
5 | HLA-A HLA-DQA1 HLA-DRB7 HLA-P
6 | HLA-B HLA-DQA2 HLA-DRB8 HLA-S
7 | HLA-C HLA-DQB1 HLA-E HLA-T
8 | HLA-DMA HLA-DQB2 HLA-F HLA-U
9 | HLA-DMB HLA-DRA HLA-G HLA-V
10 | HLA-DOA HLA-DRB1 HLA-H HLA-W
11 | HLA-DOB HLA-DRB2 HLA-HFE HLA-Y
12 | HLA-DPA1 HLA-DRB3 HLA-J MICA
13 | HLA-DPA2 HLA-DRB4 HLA-K MICB
14 | HLA-DPB1 HLA-DRB5 HLA-L TAP1
15 | HLA-DPB2 HLA-DRB6 HLA-N
16 | ```
17 |
--------------------------------------------------------------------------------
/docs/changelog.md:
--------------------------------------------------------------------------------
1 | # v0.3.1: 04/05/24
2 | ## Changes
3 | - Add output prefix option (takes directory or directory + prefix name)
4 | - Deprecate `outdir` (maintain backwards compatibility until v1.0)
5 | - Fix bug in call-reads where a read with partial exon2 (only) coverage blows up candidate pool
6 | - Catch error from aligned inputs with wrong reference
7 |
8 | # v0.3.0: 03/21/24
9 | ## Changes
10 | - New tool `call-reads` to call from HiFi reads (limited to class I)
11 | - Extend reports for read-based calls
12 | - Update IPD-IMGT/HLA to version 3.55 (2024-01)
13 | - Add option to limit calls to genomic (full-length) records
14 | - Update README
15 |
16 | # v0.2.3: 12/22/23
17 | ## Changes
18 | - Fix reporting bug for cdna calls
19 | - Add option to require exon 2 to make a call
20 | - Improved thread control and error reporting
21 |
22 | # v0.2.2: 11/17/23
23 | ## Changes
24 | - Update database to IPD-IMGT/HLA Version: 3.54 (2023-10)
25 |
26 | # v0.2.1: 11/15/23
27 | ## Changes
28 | - Call-contigs hap2 change to optional
29 | - Improve output determinism
30 | - Update README
31 | - Fix validation bugs
32 |
33 | # v0.2.0: 9/25/23
34 | ## Changes
35 | - Update README for initial release
36 | - Latest database version v3.53
37 | - Bug fixes
38 |
39 | # v0.1.0: 5/31/23
40 | ## Changes
41 | - Report results in tsv and json format
42 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Copyright (c) 2023, Pacific Biosciences of California, Inc.
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted (subject to the limitations in the
7 | disclaimer below) provided that the following conditions are met:
8 |
9 | * Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | * Redistributions in binary form must reproduce the above
13 | copyright notice, this list of conditions and the following
14 | disclaimer in the documentation and/or other materials provided
15 | with the distribution.
16 |
17 | * Neither the name of Pacific Biosciences nor the names of its
18 | contributors may be used to endorse or promote products derived
19 | from this software without specific prior written permission.
20 |
21 | NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE
22 | GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC
23 | BIOSCIENCES AND ITS CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
24 | WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
25 | OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
26 | DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR ITS
27 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
28 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
29 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
30 | USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
31 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
32 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
33 | OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
34 | SUCH DAMAGE.
35 |
--------------------------------------------------------------------------------
/docs/usage.md:
--------------------------------------------------------------------------------
1 | ## Usage
2 |
3 | `hifihla` has four subcommands:
4 | * [call-reads](usage_call-reads.md)
5 | * [call-consensus](usage_call-consensus.md)
6 | * [call-contigs](usage_call-contigs.md)
7 | * [align-imgt](usage_align-imgt.md)
8 |
9 | ```
10 | Usage: hifihla
11 |
12 | Commands:
13 | call-reads Call HLA loci from an aligned BAM of HiFi reads
14 | call-contigs Extract HLA loci from assembled MHC contigs & call star alleles on extracted sequences
15 | call-consensus Call HLA Star (*) alleles from consensus sequences
16 | align-imgt Align queries to IMGT/HLA genomic accession sequences
17 | help Print this message or the help of the given subcommand(s)
18 |
19 | Options:
20 | -h, --help Print help
21 | -V, --version Print version
22 | ```
23 |
24 | ## Subcommand Inputs
25 | | Subcommand | Input Type | File types |Description |
26 | |----------------|-------------------------------------|-----------------|------------|
27 | | call-reads | Aligned HiFi reads | BAM | Call class I loci (HLA-A, -B, -C) from HiFi reads aligned to [GRCh38 no alts](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz) |
28 | | call-contigs | Aligned assembly; unaligned contigs | BAM, FASTA(.gz) | Extract and call HLA loci from assembled MHC contigs |
29 | | call-consensus | Amplicon/Isoseq consensus | FASTA | Call HLA alleles from consensus sequences (e.g. amplicon assays) |
30 | | align-imgt | Sequence/IMGT accessions | FASTA | Compare sequences in fasta format or database sequences to specific IMGT/HLA genomic alleles |
31 |
--------------------------------------------------------------------------------
/figures/logo_HiFiHLA.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # HiFiHLA [DEPRECATED]
2 |
6 | ***
7 | **An HLA star-calling tool for PacBio HiFi data types**
8 |
9 | This project is deprecated. Please try [StarPhase](https://github.com/PacificBiosciences/pb-StarPhase).
10 |
11 | 
12 |
13 |
14 | HiFiHLA generates high resolution (4-field) HLA allele calls from PacBio HiFi data. HiFiHLA identifies the closest matching allele(s) and any differences between a sample and the IPD-IMGT/HLA database. Acceptable data types include aligned HiFi reads, assembly contigs, and amplicon consensus.
15 |
16 | Authors: [John Harting](https://github.com/jrharting), [Zev Kronenberg](https://github.com/zeeev), [Daniel Baker](https://github.com/dnbaker), [Matt Holt](https://github.com/holtjma)
17 |
18 | ## Availability
19 | * [Latest release with binary](https://github.com/PacificBiosciences/HiFiHLA/releases/latest)
20 |
21 | [Bioconda recipe](http://bioconda.github.io/recipes/hifihla/README.html)
22 |
23 | ## Documentation
24 | 1. [Installation](docs/install.md)
25 | 2. [Genes](docs/genes.md)
26 | 3. [Usage and Examples](docs/usage.md)
27 | 4. [Output](docs/output.md)
28 | 5. [Changelog](docs/changelog.md)
29 |
30 | ## Need help?
31 | If you notice any missing features, bugs, or need assistance with analyzing the output of HiFiHLA,
32 | please don't hesitate to open a GitHub issue.
33 |
34 | ## Support information
35 | HiFiHLA is pre-release software intended for research use only and not for use in diagnostic procedures.
36 | While efforts have been made to ensure that HiFiHLA lives up to the quality that PacBio strives for, we make no warranty regarding this software.
37 |
38 | As HiFiHLA is not covered by any service level agreement or the like, please do not contact a PacBio Field Applications Scientist or PacBio Customer Service for assistance with any HiFiHLA release.
39 | Please report all issues through GitHub instead.
40 | We make no warranty that any such issue will be addressed, to any extent or within any time frame.
41 |
42 | ## References
43 | Barker DJ, Maccari G, Georgiou X, Cooper MA, Flicek P, Robinson J, Marsh SGE. _The IPD-IMGT/HLA Database_. Nucleic Acids Research (2023) 51:D1053-60.
44 |
45 | ## DISCLAIMER
46 | THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
47 |
--------------------------------------------------------------------------------
/docs/usage_align-imgt.md:
--------------------------------------------------------------------------------
1 | ### Align queries to specific alleles in IPD-IMGT/HLA
2 | `align-imgt` reports the differences between alleles in the database and any query sequence (including other database accessions). The output is in JSON format.
3 | ```
4 | Align queries to IMGT/HLA genomic accession sequences
5 |
6 | Usage: hifihla align-imgt [OPTIONS]
7 |
8 | Options:
9 | -f, --fasta Fasta with query sequence(s)
10 | -q, --qids [...] Comma-sep query IDs
11 | -t, --targets [...] Comma-sep target IDs (map refs)
12 | -n, --tnames [...] Comma-sep target Names (map refs)
13 | -e, --exact Exact target name matches only (default starts-with)
14 | -j, --threads Analysis threads [default: 1]
15 | -v, --verbose... Enable verbose output
16 | --log-level Alternative to repeated -v/--verbose: set log level via key.
17 | Equivalence to -v/--verbose:
18 | => "Warn"
19 | -v => "Info"
20 | -vv => "Debug"
21 | -vvv => "Trace" [default: Warn]
22 | -h, --help Print help
23 | -V, --version Print version
24 | ```
25 |
26 | ### Examples
27 | Compare query to specific alleles by name or accession number
28 | ```
29 | hifihla align-imgt \
30 | --fasta HG001_HLA_A_11_01_01_01.fasta \
31 | -n HLA-A*11:01:01:03 \
32 | -t HLA00043
33 | {
34 | "id": "Query HLA-A_11-01-01-01; Targets HLA00043,HLA15502",
35 | "hla_alignments": {
36 | "HLA-A_11-01-01-01": {
37 | "HLA00043": {
38 | "allele_id": "HLA00043",
39 | "star_name": "HLA-A*11:01:01:01",
40 | "length": 3503,
41 | "match_name": "*11:01:01:01",
42 | "query_start": 162,
43 | "query_end": 3301,
44 | "covered_feat": [
45 | "UTR_5",
46 | "Exon_1",
47 | "Intron_1",
48 | "Exon_2",
49 | "Intron_2",
50 | "Exon_3",
51 | "Intron_3",
52 | "Exon_4",
53 | "Intron_4",
54 | "Exon_5",
55 | "Intron_5",
56 | "Exon_6",
57 | "Intron_6",
58 | "Exon_7",
59 | "Intron_7",
60 | "Exon_8",
61 | "UTR_3"
62 | ],
63 | "not_covered": [],
64 | "coding_diffs": 0,
65 | "noncode_eddist": 0,
66 | "error_rate": null,
67 | "coverage": 1,
68 | "reads": null,
69 | "differences": []
70 | },
71 | "HLA15502": {
72 | "allele_id": "HLA15502",
73 | "star_name": "HLA-A*11:01:01:03",
74 | "length": 3503,
75 | "match_name": "*11:01:01",
76 | "query_start": 162,
77 | "query_end": 3301,
78 | "covered_feat": [
79 | "UTR_5",
80 | "Exon_1",
81 | "Intron_1",
82 | "Exon_2",
83 | "Intron_2",
84 | "Exon_3",
85 | "Intron_3",
86 | "Exon_4",
87 | "Intron_4",
88 | "Exon_5",
89 | "Intron_5",
90 | "Exon_6",
91 | "Intron_6",
92 | "Exon_7",
93 | "Intron_7",
94 | "Exon_8",
95 | "UTR_3"
96 | ],
97 | "not_covered": [],
98 | "coding_diffs": 0,
99 | "noncode_eddist": 1,
100 | "error_rate": null,
101 | "coverage": 1,
102 | "reads": null,
103 | "differences": [
104 | {
105 | "kind": "Mismatch",
106 | "pos": 849,
107 | "size": 1,
108 | "feat": "Intron_2"
109 | }
110 | ]
111 | }
112 | }
113 | }
114 | }
115 | ```
116 | Compare two IMGT alleles by accession ID:
117 | ```
118 | hifihla align-imgt -q HLA15502 -t HLA00043
119 |
120 | {
121 | "id": "Query HLA15502; Targets HLA00043",
122 | "hla_alignments": {
123 | "HLA15502": {
124 | "HLA00043": {
125 | "allele_id": "HLA00043",
126 | "star_name": "HLA-A*11:01:01:01",
127 | "length": 3503,
128 | "match_name": "*11:01:01",
129 | "query_start": 0,
130 | "query_end": 3503,
131 | "covered_feat": [
132 | "UTR_5",
133 | "Exon_1",
134 | "Intron_1",
135 | "Exon_2",
136 | "Intron_2",
137 | "Exon_3",
138 | "Intron_3",
139 | "Exon_4",
140 | "Intron_4",
141 | "Exon_5",
142 | "Intron_5",
143 | "Exon_6",
144 | "Intron_6",
145 | "Exon_7",
146 | "Intron_7",
147 | "Exon_8",
148 | "UTR_3"
149 | ],
150 | "not_covered": [],
151 | "coding_diffs": 0,
152 | "noncode_eddist": 1,
153 | "error_rate": null,
154 | "coverage": 1,
155 | "reads": null,
156 | "differences": [
157 | {
158 | "kind": "Mismatch",
159 | "pos": 849,
160 | "size": 1,
161 | "feat": "Intron_2"
162 | }
163 | ]
164 | }
165 | }
166 | }
167 | }
168 | ```
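Because the output is JSON, the reported differences are easy to tabulate in a script. The following is a minimal Python sketch and not part of `hifihla`. It assumes the report layout shown in the examples above (`hla_alignments`, then query, then target), and the helper name `summarize_differences` is hypothetical. The embedded report is a trimmed copy of the second example.

```python
import json

# Trimmed copy of the `align-imgt -q HLA15502 -t HLA00043` output shown above.
REPORT = """
{
  "id": "Query HLA15502; Targets HLA00043",
  "hla_alignments": {
    "HLA15502": {
      "HLA00043": {
        "allele_id": "HLA00043",
        "star_name": "HLA-A*11:01:01:01",
        "coding_diffs": 0,
        "noncode_eddist": 1,
        "differences": [
          {"kind": "Mismatch", "pos": 849, "size": 1, "feat": "Intron_2"}
        ]
      }
    }
  }
}
"""


def summarize_differences(report):
    """Return one (query, star_name, kind, pos, size, feat) tuple per difference."""
    rows = []
    for query, targets in report["hla_alignments"].items():
        for record in targets.values():
            for diff in record["differences"]:
                rows.append((query, record["star_name"], diff["kind"],
                             diff["pos"], diff["size"], diff["feat"]))
    return rows


rows = summarize_differences(json.loads(REPORT))
for row in rows:
    # One tab-separated line per difference.
    print("\t".join(str(field) for field in row))
```

For a real run, the string can be replaced by the captured standard output of `hifihla align-imgt`.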
169 |
--------------------------------------------------------------------------------
/docs/usage_call-contigs.md:
--------------------------------------------------------------------------------
1 | ### Type HLA from MHC assembled contigs
2 | `call-contigs` requires two or three input files and an output location:
3 | * A BAM of the assembled contigs aligned to [GRCh38 no alts](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz)
4 | * One or two assembled haplotype FASTA files (optionally gzipped)
5 |
6 | Options:
7 | * `hap2` is optional -- the fasta argument to `hap1` may contain one or two MHC haplotig sets. If `hap2` is not set, alleles will be phased by contig name.
8 | * Define the `loci` to be extracted and called (see [IMGT genes](https://hla.alleles.org/genes/index.html)) [default all loci]
9 | * Limit equivalent matches in report with `--max_matches` -- This is useful to limit reporting of common alleles with shared cDNA.
10 | * Require extracted alleles to meet a minimum length with `--min_length`.
11 | * Limit output matches to full genomic IMGT accessions with `--full_length`.
12 | ```
13 | Extract HLA loci from assembled MHC contigs & call star alleles on extracted sequences
14 |
15 | Usage: hifihla call-contigs [OPTIONS] --abam --hap1
16 |
17 | Options:
18 | -a, --abam Input assembly aligned to GRCh38
19 | -p, --hap1 Input hap1 assembly fa(.gz)
20 | -m, --hap2 Input hap2 assembly fa(.gz) (optional)
21 | -o, --out_prefix Output prefix
22 | --outdir Output directory [deprecated]
23 | -l, --loci [...] Input comma-sep loci to extract [default: all]
24 | -s, --min_length Minimum length of extracted targets [default: 1000]
25 | -f, --full_length Full length IMGT records only
26 | -x, --max_matches Maximum equivalent matches per query in report [default: 10]
27 | -j, --threads Analysis threads [default: 1]
28 | -v, --verbose... Enable verbose output
29 | --log-level Alternative to repeated -v/--verbose: set log level via key.
30 | -h, --help Print help
31 | -V, --version Print version
32 | ```
33 |
34 | ### Examples
35 | Type HG00733 HPRC [assembly](https://s3-us-west-2.amazonaws.com/human-pangenomics/working/HPRC_PLUS/HG00733/assemblies/year1_freeze_assembly_v2/):
36 |
37 | #### Align contigs to reference
38 | The following command using [minimap2](https://github.com/lh3/minimap2/) is recommended for assembly alignment:
39 | ```
40 | minimap2 -t 12 \
41 | -L --secondary=no --eqx -ax asm5 \
42 | -R '@RG\tID:HG00733\tSM:HG00733' \
43 | GRCh38_no_alts.fasta \
44 | HG00733.paternal.f1_assembly_v2.fa.gz HG00733.maternal.f1_assembly_v2.fa.gz | \
45 | samtools sort -@ 3 \
46 | -T /scratch \
47 | -m 100M \
48 | -O BAM > HG00733.asm.GRCh38_no_alts.bam
49 | samtools index HG00733.asm.GRCh38_no_alts.bam
50 | ```
51 | #### Call HLA alleles
52 | Call all available loci from separated haplotigs:
53 | ```
54 | hifihla call-contigs \
55 | --abam HG00733.asm.GRCh38_no_alts.bam \
56 | --hap1 HG00733.paternal.f1_assembly_v2.fa.gz \
57 | --hap2 HG00733.maternal.f1_assembly_v2.fa.gz \
58 | --out_prefix out_dir/my_sample
59 |
60 | head -7 out_dir/my_sample_hifihla_summary.tsv | column -t
61 |
62 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
63 | HG00733#1#h1tg000070l_29911131_29915604 4474 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*24:02:01:01
64 | HG00733#2#h2tg000008l_1854867_1859341 4475 1 HLA-A*30:02:01:01 100.0 100.0 0 HLA-A*30:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*30:02:01:01
65 | HG00733#1#h1tg000070l_31324234_31329311 5078 1 HLA-B*35:02:01:02 100.0 100.0 0 HLA-B*35:02:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*35:02:01:02
66 | HG00733#2#h2tg000008l_440231_445304 5074 1 HLA-B*18:01:01:01 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*18:01:01:01
67 | HG00733#1#h1tg000070l_31239869_31245142 5274 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*04:01:01:06
68 | HG00733#2#h2tg000008l_525397_530669 5273 1 HLA-C*05:01:01:01 100.0 100.0 0 HLA-C*05:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*05:01:01:01
69 | ```
70 | Call subset of loci from contigs of just one haplotype:
71 | ```
72 | hifihla call-contigs \
73 | --abam HG00733.asm.GRCh38_no_alts.bam \
74 | --hap1 HG00733.paternal.f1_assembly_v2.fa.gz \
75 | --loci HLA-DQA1,HLA-DPA1,HLA-DRB1 \
76 | -o out_dir/my_sample
77 |
78 | column -t out_dir/my_sample_hifihla_summary.tsv
79 |
80 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
81 | HG00733#1#h1tg000070l_33003286_33014048 10763 1 HLA-DPA1*01:03:01:02 100.0 100.0 0 HLA-DPA1*01:03:01 1,2,3,4 0 1 N/A HLA-DPA1*01:03:01:02
82 | HG00733#1#h1tg000070l_32583472_32591044 7573 1 HLA-DQA1*05:05:01:01 100.0 100.0 0 HLA-DQA1*05:05:01 1,2,3,4 0 1 N/A HLA-DQA1*05:05:01:01
83 | HG00733#1#h1tg000070l_32526722_32541633 14912 1 HLA-DRB1*11:04:01:01 100.0 100.0 0 HLA-DRB1*11:04:01 1,2,3,4,5,6 0 1 N/A HLA-DRB1*11:04:01:01
84 | HG00733#1#h1tg000070l_32462971_32477576 14606 1 HLA-DRB3*02:02:01:04 99.99 100.0 1 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01
85 | ```
86 | Call class I from un-separated contigs:
87 | ```
88 | cat HG00733.paternal.f1_assembly_v2.fa.gz HG00733.maternal.f1_assembly_v2.fa.gz > HG00733.both.fasta.gz
89 |
90 | hifihla call-contigs \
91 | --abam HG00733.asm.GRCh38_no_alts.bam \
92 | --hap1 HG00733.both.fasta.gz \
93 | --loci HLA-A,HLA-B,HLA-C \
94 | --outdir my_output_dir
95 |
96 | column -t my_output_dir/hifihla_summary.tsv
97 |
98 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
99 | HG00733#1#h1tg000070l_29911131_29915604 4474 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*24:02:01:01
100 | HG00733#2#h2tg000008l_1854867_1859341 4475 1 HLA-A*30:02:01:01 100.0 100.0 0 HLA-A*30:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*30:02:01:01
101 | HG00733#1#h1tg000070l_31324234_31329311 5078 1 HLA-B*35:02:01:02 100.0 100.0 0 HLA-B*35:02:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*35:02:01:02
102 | HG00733#2#h2tg000008l_440231_445304 5074 1 HLA-B*18:01:01:01 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*18:01:01:01
103 | HG00733#1#h1tg000070l_31239869_31245142 5274 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*04:01:01:06
104 | HG00733#2#h2tg000008l_525397_530669 5273 1 HLA-C*05:01:01:01 100.0 100.0 0 HLA-C*05:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*05:01:01:01
105 | ```
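The summary file itself is tab-separated, so it can be loaded directly without `column -t`. A minimal Python sketch follows, assuming the header names shown above and that the locus is the part of `Type` before the `*`. `calls_by_locus` is a hypothetical helper, and the two embedded rows are copied from the example output.

```python
import csv
import io
from collections import defaultdict

HEADER = ["queryId", "qLen", "nMatches", "gType", "gPctId", "gPctCov", "gEdit",
          "cdnaType", "exCovered", "exEdit", "coverage", "errRate", "Type"]
# The two HLA-A rows from the example summary above.
ROWS = [
    ["HG00733#1#h1tg000070l_29911131_29915604", "4474", "1", "HLA-A*24:02:01:01",
     "100.0", "100.0", "0", "HLA-A*24:02:01", "1,2,3,4,5,6,7,8", "0", "1", "N/A",
     "HLA-A*24:02:01:01"],
    ["HG00733#2#h2tg000008l_1854867_1859341", "4475", "1", "HLA-A*30:02:01:01",
     "100.0", "100.0", "0", "HLA-A*30:02:01", "1,2,3,4,5,6,7,8", "0", "1", "N/A",
     "HLA-A*30:02:01:01"],
]
SUMMARY_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def calls_by_locus(handle):
    """Group the final `Type` calls by locus (text before the '*')."""
    calls = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        locus = row["Type"].split("*", 1)[0]
        calls[locus].append(row["Type"])
    return dict(calls)


calls = calls_by_locus(io.StringIO(SUMMARY_TSV))
for locus, alleles in sorted(calls.items()):
    print(locus, "/".join(alleles))
```

With a real run, pass `open("out_dir/my_sample_hifihla_summary.tsv", newline="")` instead of the `io.StringIO` object.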
106 |
--------------------------------------------------------------------------------
/docs/usage_call-consensus.md:
--------------------------------------------------------------------------------
1 | ### Type consensus sequences
2 | `call-consensus` accepts a FASTA file of sequences and attempts to find the closest matching allele in the [IMGT/HLA database](https://www.ebi.ac.uk/ipd/imgt/hla/). It is assumed that each FASTA record is a single locus.
3 | Optionally call star alleles using exon sequence only.
4 | ```
5 | Call HLA Star (*) alleles from consensus sequences
6 |
7 | Usage: hifihla call-consensus [OPTIONS] --fasta
8 |
9 | Options:
10 | -f, --fasta Input fasta file of consensus sequences
11 | -o, --out_prefix Output prefix
12 | --outdir Output directory [deprecated]
13 | -c, --cdna Enable cDNA-only calling
14 | -e, --exon2 Require Exon2 in query sequence
15 | -l, --full_length Full length IMGT records only
16 | -j, --threads Analysis threads [default: 1]
17 | -x, --max_matches Maximum equivalent matches per query in report [default: 10]
18 | -v, --verbose... Enable verbose output
19 | --log-level Alternative to repeated -v/--verbose: set log level via key.
20 | -h, --help Print help
21 | -V, --version Print version
22 | ```
23 | #### Options Description
24 | * `--fasta` Fasta file of consensus query sequences. Only one allele per query sequence.
25 | * `--out_prefix` Output prefix, accepts a directory or a directory + prefix.
26 | * `--outdir` Output directory \[deprecated\].
27 | * `--cdna` Call and report only coding regions (cdna). Can be used for either DNA or RNA sequences.
28 | * `--exon2` Require exon 2 in query. This may reduce search space.
29 | * `--max_matches` Only report up to this number of matches in the json report.
30 |
31 | ### Example
32 | Type HLA consensus sequences, for example from [HiFi amplicon consensus with pbaa](https://downloads.pacbcloud.com/public/dataset/pbAmpliconAnalysis_HLA/pbaa/12878-HG001):
33 | ```
34 | hifihla call-consensus \
35 | --fasta pbaa_12878-HG001_passed_cluster_sequences.fasta \
36 | --out_prefix out_dir/my_sample
37 |
38 | column -t out_dir/my_sample_hifihla_summary.tsv
39 |
40 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
41 | sample-12878-HG001_guide-HLA-A_cluster-0_ReadCount-1023 3098 5 HLA-A*01:01:01:01 100.0 88.43 0 HLA-A*01:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*01:01:01
42 | sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999 3098 3 HLA-A*11:01:01:01 100.0 88.43 0 HLA-A*11:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*11:01:01
43 | sample-12878-HG001_guide-HLA-B_cluster-1_ReadCount-399 3354 7 HLA-B*08:01:01:01 100.0 81.69 0 HLA-B*08:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*08:01:01
44 | sample-12878-HG001_guide-HLA-B_cluster-0_ReadCount-481 3363 1 HLA-B*56:01:01:04 100.0 86.53 0 HLA-B*56:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*56:01:01:04
45 | sample-12878-HG001_guide-HLA-C_cluster-0_ReadCount-541 3384 6 HLA-C*01:02:01:01 100.0 78.62 0 HLA-C*01:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*01:02:01
46 | sample-12878-HG001_guide-HLA-C_cluster-1_ReadCount-484 3389 6 HLA-C*07:01:01:01 100.0 78.48 0 HLA-C*07:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*07:01:01
47 | sample-12878-HG001_guide-HLA-DPB1_cluster-1_ReadCount-348 5824 33 HLA-DPB1*04:01:01:01 100.0 50.52 0 HLA-DPB1*04:01:01 2,3,4,5 0 1 N/A HLA-DPB1*04:01:01
48 | sample-12878-HG001_guide-HLA-DPB1_cluster-0_ReadCount-475 5760 7 HLA-DPB1*14:01:01:01 100.0 50.23 0 HLA-DPB1*14:01:01 2,3,4,5 0 1 N/A HLA-DPB1*14:01:01
49 | sample-12878-HG001_guide-HLA-DQB1_cluster-1_ReadCount-2032 4076 12 HLA-DQB1*02:01:01:01 100.0 54.49 0 HLA-DQB1*02:01:01 2,3,4 0 1 N/A HLA-DQB1*02:01:01
50 | sample-12878-HG001_guide-HLA-DQB1_cluster-0_ReadCount-3740 3703 14 HLA-DQB1*05:01:01:02 100.0 52.22 0 HLA-DQB1*05:01:01 2,3,4 0 1 N/A HLA-DQB1*05:01:01
51 | sample-12878-HG001_guide-HLA-DRB1_cluster-1_ReadCount-1011 3900 15 HLA-DRB1*01:01:01:01 100.0 35.13 0 HLA-DRB1*01:01:01 2,3,4 0 1 N/A HLA-DRB1*01:01:01
52 | sample-12878-HG001_guide-HLA-DRB1_cluster-0_ReadCount-2022 3646 14 HLA-DRB1*03:01:01:01 100.0 26.21 0 HLA-DRB1*03:01:01 2,3,4 0 1 N/A HLA-DRB1*03:01:01
53 | ```
54 | Note that query sequences that match >1 allele are not labeled as four-field matches because the actual call is ambiguous. This can happen when differences between alleles are outside the amplified region.
55 | For example, HLA-A_cluster-1 above matches three alleles perfectly over the amplified range. All three matches are listed in the json.
56 |
57 | ```
58 | jq '.. | objects | to_entries | .[] | select(.key == "sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999")' out_dir/my_sample_hifihla_report.json
59 |
60 | {
61 | "key": "sample-12878-HG001_guide-HLA-A_cluster-1_ReadCount-999",
62 | "value": [
63 | {
64 | "allele_id": "HLA00043",
65 | "star_name": "HLA-A*11:01:01:01",
66 | "length": 3503,
67 | "match_name": "*11:01:01",
68 | "query_start": 182,
69 | "query_end": 3280,
70 | "covered_feat": [
71 | "UTR_5",
72 | "Exon_1",
73 | "Intron_1",
74 | "Exon_2",
75 | "Intron_2",
76 | "Exon_3",
77 | "Intron_3",
78 | "Exon_4",
79 | "Intron_4",
80 | "Exon_5",
81 | "Intron_5",
82 | "Exon_6",
83 | "Intron_6",
84 | "Exon_7",
85 | "Intron_7",
86 | "Exon_8",
87 | "UTR_3"
88 | ],
89 | "not_covered": [],
90 | "coding_diffs": 0,
91 | "noncode_eddist": 0,
92 | "error_rate": null,
93 | "coverage": 1,
94 | "reads": null,
95 | "differences": []
96 | },
97 | {
98 | "allele_id": "HLA32930",
99 | "star_name": "HLA-A*11:01:01:75",
100 | "length": 3503,
101 | "match_name": "*11:01:01",
102 | "query_start": 182,
103 | "query_end": 3280,
104 | "covered_feat": [
105 | "UTR_5",
106 | "Exon_1",
107 | "Intron_1",
108 | "Exon_2",
109 | "Intron_2",
110 | "Exon_3",
111 | "Intron_3",
112 | "Exon_4",
113 | "Intron_4",
114 | "Exon_5",
115 | "Intron_5",
116 | "Exon_6",
117 | "Intron_6",
118 | "Exon_7",
119 | "Intron_7",
120 | "Exon_8",
121 | "UTR_3"
122 | ],
123 | "not_covered": [],
124 | "coding_diffs": 0,
125 | "noncode_eddist": 0,
126 | "error_rate": null,
127 | "coverage": 1,
128 | "reads": null,
129 | "differences": []
130 | },
131 | {
132 | "allele_id": "HLA38435",
133 | "star_name": "HLA-A*11:01:01:88",
134 | "length": 3503,
135 | "match_name": "*11:01:01",
136 | "query_start": 182,
137 | "query_end": 3280,
138 | "covered_feat": [
139 | "UTR_5",
140 | "Exon_1",
141 | "Intron_1",
142 | "Exon_2",
143 | "Intron_2",
144 | "Exon_3",
145 | "Intron_3",
146 | "Exon_4",
147 | "Intron_4",
148 | "Exon_5",
149 | "Intron_5",
150 | "Exon_6",
151 | "Intron_6",
152 | "Exon_7",
153 | "Intron_7",
154 | "Exon_8",
155 | "UTR_3"
156 | ],
157 | "not_covered": [],
158 | "coding_diffs": 0,
159 | "noncode_eddist": 0,
160 | "error_rate": null,
161 | "coverage": 1,
162 | "reads": null,
163 | "differences": []
164 | }
165 | ]
166 | }
167 | ```
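The three-field `Type` reported for this cluster in the summary is what the three equivalent matches have in common. The Python sketch below illustrates that idea only; it is not hifihla's implementation. `shared_fields` is a hypothetical helper that keeps the leading colon-separated fields on which all star names agree.

```python
def shared_fields(star_names):
    """Longest run of leading ':'-separated fields common to all star names."""
    split = [name.split(":") for name in star_names]
    common = []
    for fields in zip(*split):
        if len(set(fields)) != 1:
            break  # first field where the alleles disagree
        common.append(fields[0])
    return ":".join(common)


# star_name values of the three matches listed in the report above.
matches = ["HLA-A*11:01:01:01", "HLA-A*11:01:01:75", "HLA-A*11:01:01:88"]
consensus = shared_fields(matches)
print(consensus)  # HLA-A*11:01:01
```

A query with a single match keeps all four fields, which is consistent with the `HLA-B*56:01:01:04` row in the summary.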
168 |
--------------------------------------------------------------------------------
/docs/usage_call-reads.md:
--------------------------------------------------------------------------------
1 | ### Type from aligned HiFi reads
2 | `call-reads` accepts aligned HiFi reads in BAM format and calls HLA alleles directly from reads, without assembly.
3 | ```
4 | Call HLA loci from an aligned BAM of HiFi reads
5 |
6 | Usage: hifihla call-reads [OPTIONS] --abam
7 |
8 | Options:
9 | -j, --threads Analysis threads [default: 1]
10 | -v, --verbose... Enable verbose output
11 | --log-level Alternative to repeated -v/--verbose: set log level via key. [default: Warn]
12 | -h, --help Print help
13 | -V, --version Print version
14 |
15 | Input Options:
16 | -a, --abam Input assembly aligned to GRCh38 (no alts)
17 | -l, --loci [...] Input comma-sep loci to extract [default: all]
18 | -d, --max_depth Maximum reads per locus [default: 50]
19 | -p, --partial Include partially-spanning reads
20 | -t, --haplotypes Haplotypes in sample [default: 2] [possible values: 1, 2]
21 | -s, --seed Random number seed for downsampling to max_depth [default: 42]
22 |
23 | Output Options:
24 | -o, --out_prefix Output prefix
25 | --outdir Output directory [deprecated]
26 | -f, --full_length Full length IMGT records only (exclude exon-only records)
27 | -x, --max_matches Maximum matches in output report [default: 10]
28 | -m, --min_allele_freq Minimum allele frequency [default: 0.1]
29 | -b, --min_cdf Minimum binomial CDF to call het/hom [default: 0.001]
30 |
31 | Presets:
32 | --preset Sequence type presets [possible values: te, wgs]
33 | ```
34 | #### Input Options Description
35 | * `--abam` HiFi reads aligned to [GRCh38 no alts](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz).
36 | * `--loci` HLA loci to call. Currently limited to HLA-A,HLA-B,HLA-C.
37 | * `--max_depth` Maximum reads to use per locus. Reads are randomly downsampled if coverage exceeds this value.
38 | * `--partial` Include HiFi reads that do not fully span locus, but still span exon 2 (minimum requirement).
39 | * `--haplotypes` Expected number of haplotypes in sample.
40 | * `--seed` Random number seed for downsampling and clustering.
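
The interaction of `--max_depth` and `--seed` can be pictured as seeded random sampling: for a given seed, the same reads are always selected. The Python sketch below only illustrates that idea and is not hifihla's code. `downsample` and the read names are made up, and the defaults mirror the documented values of `50` and `42`.

```python
import random


def downsample(read_names, max_depth=50, seed=42):
    """Keep at most max_depth reads, chosen reproducibly for a given seed."""
    if len(read_names) <= max_depth:
        return list(read_names)
    rng = random.Random(seed)  # private generator, global random state untouched
    return rng.sample(read_names, max_depth)


reads = [f"read_{i}" for i in range(200)]  # made-up read names
first = downsample(reads)
second = downsample(reads)
print(len(first), first == second)
```

Changing the seed generally yields a different subset, which is one way to check how sensitive a call is to the particular reads sampled.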
41 |
42 | #### Output Options Description
43 | * `--out_prefix` Output prefix, accepts a directory or a directory + prefix.
44 | * `--outdir` Output directory \[deprecated\].
45 | * `--full_length` Restrict allele matches to full length IMGT records (exclude exon-only accessions).
46 | * `--max_matches` Maximum number of equivalent matches to list per query sequence.
47 | * `--min_allele_freq` Minimum fraction of reads for minor allele. Clusters with lower frequency will be ignored.
48 | * `--min_cdf` Minimum binomial CDF for minor allele. Clusters with lower CDF will be ignored.
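
To give a feel for `--min_allele_freq` and `--min_cdf`, the Python sketch below applies both thresholds to a pair of read clusters. It is an illustration under stated assumptions, not hifihla's implementation: the exact test is not described here, so the sketch assumes a heterozygous locus where each read comes from either allele with probability 0.5 and evaluates the binomial CDF at the minor cluster's read count. `binom_cdf` and `keep_minor_cluster` are hypothetical names.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def keep_minor_cluster(minor, major, min_allele_freq=0.1, min_cdf=0.001):
    """Return True if the minor cluster passes both (assumed) filters."""
    total = minor + major
    freq = minor / total
    return freq >= min_allele_freq and binom_cdf(minor, total) >= min_cdf


# Clusters of 9 and 25 reads, as in the HG00733 HLA-A example in this document.
print(keep_minor_cluster(9, 25))
# A 2-read cluster next to 48 reads falls below the 0.1 frequency cutoff (2/50).
print(keep_minor_cluster(2, 48))
```

Under these assumptions, raising `min_cdf` makes it harder for a low-count cluster to be reported as a second allele.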
49 |
50 | Presets are convenience options for whole genome sequencing with long (>10kb) HiFi reads (`wgs`) or shorter (~5kb) target enrichment HiFi reads (`te`).
51 | At this time the only effect of a preset is on the `-p` flag: `wgs` sets `-p`, and `te` does not. Target enrichment datasets tend to have higher coverage, so only fully spanning reads are used.
52 |
53 | ### Examples
54 | Type Class I (HLA-A/-B/-C) alleles from [HPRC](https://github.com/human-pangenomics/HPP_Year1_Data_Freeze_v1.0) HG00733 aligned WGS HiFi reads using 8 threads:
55 | ```
56 | hifihla call-reads \
57 | --preset wgs \
58 | -j 8 \
59 | -a HG00733.GRCh38_no_alts.bam \
60 | -o out_dir/my_sample
61 |
62 | column -t out_dir/my_sample_hifihla_summary.tsv
63 |
64 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
65 | HLA-A_1 3502 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 9 0.00346 HLA-A*24:02:01:01
66 | HLA-A_0 3503 1 HLA-A*30:02:01:01 100.0 100.0 0 HLA-A*30:02:01 1,2,3,4,5,6,7,8 0 25 0.00293 HLA-A*30:02:01:01
67 | HLA-B_1 4081 1 HLA-B*18:01:01:01 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 16 0.00281 HLA-B*18:01:01:01
68 | HLA-B_0 4085 1 HLA-B*35:02:01:02 100.0 100.0 0 HLA-B*35:02:01 1,2,3,4,5,6,7 0 18 0.00250 HLA-B*35:02:01:02
69 | HLA-C_0 4264 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 20 0.00280 HLA-C*04:01:01:06
70 | HLA-C_1 4303 1 HLA-C*05:01:01:01 100.0 100.0 0 HLA-C*05:01:01 1,2,3,4,5,6,7,8 0 12 0.00335 HLA-C*05:01:01:01
71 | ```
72 | Type Class I alleles from targeted [Twist Alliance Panels](https://www.pacb.com/wp-content/uploads/Application-Brief-HiFi-Target-Enrichment-Best-Practices.pdf) sequenced with [PacBio HiFi](https://downloads.pacbcloud.com/public/dataset/HiFiTE_SqIIe/Oct_2022/TwistAllianceLongReadPGx/):
73 | ```
74 | hifihla call-reads \
75 | --preset te \
76 | -j 8 \
77 | -a HG01190.GRCh38_noalt.deepvariant.haplotagged.bam \
78 | -o my_output_dir
79 |
80 | column -t my_output_dir/hifihla_summary.tsv
81 |
82 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
83 | HLA-A_1 3517 1 HLA-A*02:01:01:01 100.0 100.0 0 HLA-A*02:01:01 1,2,3,4,5,6,7,8 0 19 0.00108 HLA-A*02:01:01:01
84 | HLA-A_0 3502 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 28 0.00213 HLA-A*24:02:01:01
85 | HLA-B_1 3327 1 HLA-B*15:20 100.0 100.0 0 HLA-B*15:20 1,2,3,4,5,6,7 0 18 0.00107 HLA-B*15:20
86 | HLA-B_0 4081 1 HLA-B*18:01:01:83 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 10 0.00059 HLA-B*18:01:01:83
87 | HLA-C_0 4304 1 HLA-C*01:02:01:01 100.0 100.0 0 HLA-C*01:02:01 1,2,3,4,5,6,7,8 0 24 0.00121 HLA-C*01:02:01:01
88 | HLA-C_1 4318 1 HLA-C*07:01:01:16 100.0 100.0 0 HLA-C*07:01:01 1,2,3,4,5,6,7,8 0 26 0.00205 HLA-C*07:01:01:16
89 | ```
90 | Force call only a single, full-length HLA-A haplotype using at most 15 reads:
91 | ```
92 | hifihla call-reads \
93 | -d 15 \
94 | -t 1 \
95 | -f \
96 | -l HLA-A \
97 | -a NA12889.GRCh38.haplotagged.bam \
98 | -o out_dir/my_sample
99 |
100 | cat out_dir/my_sample_hifihla_report.json
101 | {
102 | "sample_id": "NA12889.GRCh38.haplotagged",
103 | "version": "hifihla 0.3.0;IPD-IMGT/HLA 3.55 (2024-01)",
104 | "hla_calls": {
105 | "HLA-A": [
106 | {
107 | "HLA-A_0": [
108 | {
109 | "allele_id": "HLA00037",
110 | "star_name": "HLA-A*03:01:01:01",
111 | "length": 3502,
112 | "match_name": "*03:01:01:01",
113 | "query_start": 0,
114 | "query_end": 3502,
115 | "covered_feat": [
116 | "UTR_5",
117 | "Exon_1",
118 | "Intron_1",
119 | "Exon_2",
120 | "Intron_2",
121 | "Exon_3",
122 | "Intron_3",
123 | "Exon_4",
124 | "Intron_4",
125 | "Exon_5",
126 | "Intron_5",
127 | "Exon_6",
128 | "Intron_6",
129 | "Exon_7",
130 | "Intron_7",
131 | "Exon_8",
132 | "UTR_3"
133 | ],
134 | "not_covered": [],
135 | "coding_diffs": 0,
136 | "noncode_eddist": 0,
137 | "error_rate": 0.0023986294,
138 | "coverage": 15,
139 | "reads": [
140 | "m54329U_230312_191124/80677293/ccs",
141 | "m84046_230327_223715_s3/52499911/ccs",
142 | "m54329U_230314_044350/92145583/ccs",
143 | "m54329U_230311_094504/134349275/ccs",
144 | "m84046_230327_223715_s3/41027584/ccs",
145 | "m84046_230327_223715_s3/80090755/ccs",
146 | "m84046_230327_223715_s3/132322013/ccs",
147 | "m84046_230327_223715_s3/80871474/ccs",
148 | "m84046_230327_223715_s3/41226855/ccs",
149 | "m54329U_230312_191124/63046676/ccs",
150 | "m54329U_230311_094504/31064736/ccs",
151 | "m54329U_230312_191124/56625320/ccs",
152 | "m84046_230327_223715_s3/38012954/ccs",
153 | "m84046_230327_223715_s3/23856426/ccs",
154 | "m54329U_230314_044350/129892658/ccs"
155 | ],
156 | "differences": []
157 | }
158 | ]
159 | }
160 | ]
161 | }
162 | }
163 | ```
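
The report JSON can also be read programmatically. The following is a minimal Python sketch, not part of `hifihla`, that assumes the nesting shown in the example above (locus, then a list of objects keyed by query ID, then a list of matches). The helper name `iter_calls` is hypothetical, and the inlined report is a trimmed copy of the example so the sketch runs on its own.

```python
import json


def iter_calls(report):
    """Yield (locus, query_id, match) for every match in a parsed report.

    Assumes the layout shown in the example above:
    hla_calls -> locus -> [ {query_id: [match, ...]}, ... ]
    """
    for locus, entries in report["hla_calls"].items():
        for entry in entries:
            for query_id, matches in entry.items():
                for match in matches:
                    yield locus, query_id, match


# Trimmed copy of the example report, inlined for illustration only.
example = json.loads("""
{
  "sample_id": "NA12889.GRCh38.haplotagged",
  "hla_calls": {
    "HLA-A": [
      {"HLA-A_0": [
        {"star_name": "HLA-A*03:01:01:01",
         "coverage": 15,
         "error_rate": 0.0023986294}
      ]}
    ]
  }
}
""")

rows = list(iter_calls(example))
for locus, query_id, match in rows:
    print(locus, query_id, match["star_name"], match["coverage"], sep="\t")
```

For a real run, replace the inlined string with `json.load()` on the `hifihla_report.json` file written under the chosen output prefix.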
164 |
--------------------------------------------------------------------------------
/docs/output.md:
--------------------------------------------------------------------------------
1 | ## Output
2 | `call-reads`, `call-consensus`, and `call-contigs` each generate three reports containing HLA star-allele type calls. Additionally, `call-contigs` produces FASTA files of sequences extracted from the assembly. If `out_prefix` is given as a directory _+ prefix/samplename_, the prefix is joined to each output file name with an underscore `_`; if it is a directory only, the files are written inside that directory.
3 |
4 | | File | Description |
5 | | -------------------------------------------- | ----------- |
6 | | {out_prefix}\[_/\]hifihla_summary.tsv | Detailed file listing best call for each locus |
7 | | {out_prefix}\[_/\]hifihla_report.tsv | Simple tsv file listing calls for each locus |
8 | | {out_prefix}\[_/\]hifihla_report.json | Detailed results file, see below for example |
9 | | {out_prefix}\[_/\]asm.contigs.h[12].fasta | Extracted (full) assembly contigs aligning to MHC |
10 | | {out_prefix}\[_/\]asm.contigs.h[12].fasta.fai | FASTA index for contigs |
11 | | {out_prefix}\[_/\]extracted.targets.h[12].fasta | Extracted targets used for star-typing |
12 |
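
The naming rule in the table can be sketched in a few lines of Python. This is an illustration only: the function name `expected_reports` is hypothetical, and the caller is assumed to already know whether the prefix is a bare directory, since how `hifihla` itself makes that decision is not described here.

```python
REPORT_NAMES = (
    "hifihla_summary.tsv",
    "hifihla_report.tsv",
    "hifihla_report.json",
)


def expected_reports(out_prefix, is_directory):
    """Return the three report paths for an output prefix.

    is_directory=True  -> files are placed inside the directory ("/").
    is_directory=False -> the prefix is joined to each name with "_".
    """
    sep = "/" if is_directory else "_"
    base = out_prefix.rstrip("/")
    return [f"{base}{sep}{name}" for name in REPORT_NAMES]


for path in expected_reports("out_dir/my_sample", is_directory=False):
    print(path)
for path in expected_reports("my_output_dir", is_directory=True):
    print(path)
```

The two calls mirror the `-o out_dir/my_sample` and `-o my_output_dir` examples in the `call-reads` usage page.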
13 | ### Detailed summary tsv
14 | This file reports the single best call and statistics for each query sequence in the sample.
15 |
16 | | Column Name | Description |
17 | |--------------|------------------------------------------------------|
18 | | queryId | Query sequence name |
19 | | qLen | Length of query sequence |
20 | | nMatches | Number of equivalent matches with same edit distance |
21 | | gType | Closest matching genomic accession |
22 | | gPctId | Percent matching bases of closest genomic match |
23 | | gPctCov | Percent coverage of closest genomic match |
24 | | gEdit | Edit distance to closest genomic match |
25 | | cdnaType | Closest matching coding sequence |
26 | | exCovered | Exons included in alignment |
27 | | exEdit | Edit distance to cDNA match |
28 | | coverage | HiFi read count of allele (call-reads only) |
29 | | errRate | Total error rate of HiFi reads (call-reads only) |
30 | | Type | Final type call |
31 |
32 | The final type call is determined by edit distance and the number of exact matches to the query (nMatches). The final call is four-field if the query sequence matches exactly and is the only exact match returned.
33 |
34 | `call-reads` summary for HG002 (Class I):
35 | ```
36 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
37 | HLA-A_1 3503 1 HLA-A*01:01:01:01 100.0 100.0 0 HLA-A*01:01:01 1,2,3,4,5,6,7,8 0 15 0.00276 HLA-A*01:01:01:01
38 | HLA-A_0 3517 1 HLA-A*26:01:01:01 100.0 100.0 0 HLA-A*26:01:01 1,2,3,4,5,6,7,8 0 13 0.00249 HLA-A*26:01:01:01
39 | HLA-B_0 3855 1 HLA-B*35:08:01:01 100.0 100.0 0 HLA-B*35:08:01 1,2,3,4,5,6,7 0 18 0.00231 HLA-B*35:08:01:01
40 | HLA-B_1 4070 1 HLA-B*38:01:01:01 100.0 100.0 0 HLA-B*38:01:01 1,2,3,4,5,6,7 0 10 0.00204 HLA-B*38:01:01:01
41 | HLA-C_0 4264 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 12 0.00135 HLA-C*04:01:01:06
42 | HLA-C_1 4303 1 HLA-C*12:03:01:01 100.0 100.0 0 HLA-C*12:03:01 1,2,3,4,5,6,7,8 0 8 0.00215 HLA-C*12:03:01:01
43 | ```
44 | `call-contigs` summary for HG00733:
45 | ```
46 | queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
47 | HG00733#1#h1tg000070l_29911131_29915604 4474 1 HLA-A*24:02:01:01 100.0 100.0 0 HLA-A*24:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*24:02:01:01
48 | HG00733#2#h2tg000008l_1854867_1859341 4475 1 HLA-A*30:02:01:01 100.0 100.0 0 HLA-A*30:02:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-A*30:02:01:01
49 | HG00733#1#h1tg000070l_31324234_31329311 5078 1 HLA-B*35:02:01:02 100.0 100.0 0 HLA-B*35:02:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*35:02:01:02
50 | HG00733#2#h2tg000008l_440231_445304 5074 1 HLA-B*18:01:01:01 100.0 100.0 0 HLA-B*18:01:01 1,2,3,4,5,6,7 0 1 N/A HLA-B*18:01:01:01
51 | HG00733#1#h1tg000070l_31239869_31245142 5274 1 HLA-C*04:01:01:06 100.0 100.0 0 HLA-C*04:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*04:01:01:06
52 | HG00733#2#h2tg000008l_525397_530669 5273 1 HLA-C*05:01:01:01 100.0 100.0 0 HLA-C*05:01:01 1,2,3,4,5,6,7,8 0 1 N/A HLA-C*05:01:01:01
53 | HG00733#1#h1tg000070l_33003286_33014048 10763 1 HLA-DPA1*01:03:01:02 100.0 100.0 0 HLA-DPA1*01:03:01 1,2,3,4 0 1 N/A HLA-DPA1*01:03:01:02
54 | HG00733#2#h2tg000060l_26807522_26818289 10768 1 HLA-DPA1*02:01:01:01 100.0 100.0 0 HLA-DPA1*02:01:01 1,2,3,4 0 1 N/A HLA-DPA1*02:01:01:01
55 | HG00733#1#h1tg000070l_33014638_33027155 12518 1 HLA-DPB1*04:01:01:01 100.0 100.0 0 HLA-DPB1*04:01:01 1,2,3,4,5 0 1 N/A HLA-DPB1*04:01:01:01
56 | HG00733#2#h2tg000060l_26794480_26806932 12453 1 HLA-DPB1*11:01:01:01 100.0 100.0 0 HLA-DPB1*11:01:01 1,2,3,4,5 0 1 N/A HLA-DPB1*11:01:01:01
57 | HG00733#1#h1tg000070l_32583472_32591044 7573 1 HLA-DQA1*05:05:01:01 100.0 100.0 0 HLA-DQA1*05:05:01 1,2,3,4 0 1 N/A HLA-DQA1*05:05:01:01
58 | HG00733#2#h2tg000060l_27233443_27240904 7462 1 HLA-DQA1*05:01:01:01 100.0 100.0 0 HLA-DQA1*05:01:01 1,2,3,4 0 1 N/A HLA-DQA1*05:01:01:01
59 | HG00733#1#h1tg000070l_32600790_32608991 8202 1 HLA-DQB1*03:01:01:02 100.0 100.0 0 HLA-DQB1*03:01:01 1,2,3,4,5,6 0 1 N/A HLA-DQB1*03:01:01:02
60 | HG00733#2#h2tg000060l_27214264_27222723 8460 1 HLA-DQB1*02:01:01:01 100.0 100.0 0 HLA-DQB1*02:01:01 1,2,3,4,5,6 0 1 N/A HLA-DQB1*02:01:01:01
61 | HG00733#1#h1tg000070l_32526722_32541633 14912 1 HLA-DRB1*11:04:01:01 100.0 100.0 0 HLA-DRB1*11:04:01 1,2,3,4,5,6 0 1 N/A HLA-DRB1*11:04:01:01
62 | HG00733#2#h2tg000060l_27282742_27297652 14911 1 HLA-DRB1*03:01:01:02 100.0 100.0 0 HLA-DRB1*03:01:01 1,2,3,4,5,6 0 1 N/A HLA-DRB1*03:01:01:02
63 | HG00733#1#h1tg000070l_32462971_32477576 14606 1 HLA-DRB3*02:02:01:04 99.99 100.0 1 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01
64 | HG00733#2#h2tg000060l_27346699_27361303 14605 1 HLA-DRB3*02:02:01:01 100.0 100.0 0 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01:01
65 | ```
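
Rows whose final call has fewer than four fields, such as the first HLA-DRB3 row above, can be picked out with a short script. The sketch below is not part of `hifihla`; it assumes the summary is whitespace-delimited with no whitespace inside any value, as in the examples shown, and the helper names `parse_summary` and `field_count` are hypothetical. Two rows from the table above are inlined so it runs on its own.

```python
def field_count(star_name):
    """Count the colon-separated fields after the '*' in a star-allele name."""
    return len(star_name.split("*", 1)[1].split(":"))


def parse_summary(text):
    """Parse summary text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Two rows copied from the call-contigs summary above.
summary = """\
queryId qLen nMatches gType gPctId gPctCov gEdit cdnaType exCovered exEdit coverage errRate Type
HG00733#1#h1tg000070l_32462971_32477576 14606 1 HLA-DRB3*02:02:01:04 99.99 100.0 1 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01
HG00733#2#h2tg000060l_27346699_27361303 14605 1 HLA-DRB3*02:02:01:01 100.0 100.0 0 HLA-DRB3*02:02:01 1,2,3,4,5,6 0 1 N/A HLA-DRB3*02:02:01:01
"""

rows = parse_summary(summary)
# Query IDs whose final Type has fewer than four fields.
reduced = [r["queryId"] for r in rows if field_count(r["Type"]) < 4]
print(reduced)
```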
66 |
67 | ### Simple tsv format
68 | This file is formatted for a quick view of top-level results. It is suitable as input for downstream interpretation tools such as [PharmCAT](https://pharmcat.org/).
69 |
70 | ```
71 | $ column -t hifihla_report.tsv
72 | HLA-A *01:01:01:01/*26:01:01:01
73 | HLA-B *38:01:01:01/*35:08:01:01
74 | HLA-C *12:03:01:01/*04:01:01:06
75 | ```
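
For downstream scripting, the simple report can be loaded into a dictionary. This is a minimal Python sketch under the assumption that every line holds a locus name followed by `/`-separated alleles, as in the example above; lines that do not fit that shape are skipped, and the function name `parse_simple_report` is hypothetical.

```python
def parse_simple_report(text):
    """Map each locus to its list of allele calls."""
    calls = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        locus, alleles = parts
        calls[locus] = alleles.split("/")
    return calls


# Copied from the example above.
report = """\
HLA-A  *01:01:01:01/*26:01:01:01
HLA-B  *38:01:01:01/*35:08:01:01
HLA-C  *12:03:01:01/*04:01:01:06
"""

calls = parse_simple_report(report)
for locus, alleles in calls.items():
    print(locus, alleles)
```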
76 |
77 | ### Detailed results json
78 | Detailed results contain information on calls for each sequence typed. The json is structured in the following way:
79 |
80 | ```
81 | { sample_id: <string>,
82 | hla_calls: {
83 | <locus>: {
84 | <query_id>: [
85 | {
86 | allele_id: <string>,
87 | star_name: <string>,
88 | length: <int>,
89 | match_name: <string>,
90 | query_start: <int>,
91 | query_end: <int>,
92 | covered_feat: <list of feature names>,
93 | not_covered: <list of feature names>,
94 | coding_diffs: <int>,
95 | noncode_eddist: <int>,
96 | error_rate: <float or null>,
97 | coverage: <int>,
98 | reads: <list of read names or null>,
99 | differences: [
100 | {
101 | kind: <string>,
102 | pos: <int>,
103 | size: <int>,
104 | feat: <feature name>
105 | },
106 | ...
107 | ]
108 | }
109 | ],
110 | ...
111 | }
112 | ...
113 | }
114 | }
115 | ```
116 |
117 | Extracting detailed results for a given locus can be done with the [jq](https://jqlang.github.io/jq/) tool:
118 | ```
119 | jq '.hla_calls | with_entries(select(.key == "HLA-DRB3"))' tmp2/hifihla_report.json
120 |
121 | {
122 | "HLA-DRB3": [
123 | {
124 | "HG00733#1#h1tg000070l_32462971_32477576": [
125 | {
126 | "allele_id": "HLA22697",
127 | "star_name": "HLA-DRB3*02:02:01:04",
128 | "length": 12905,
129 | "match_name": "*02:02:01",
130 | "query_start": 0,
131 | "query_end": 12905,
132 | "covered_feat": [
133 | "UTR_5",
134 | "Exon_1",
135 | "Intron_1",
136 | "Exon_2",
137 | "Intron_2",
138 | "Exon_3",
139 | "Intron_3",
140 | "Exon_4",
141 | "Intron_4",
142 | "Exon_5",
143 | "Intron_5",
144 | "Exon_6",
145 | "UTR_3"
146 | ],
147 | "not_covered": [],
148 | "coding_diffs": 0,
149 | "noncode_eddist": 1,
150 | "error_rate": null,
151 | "coverage": 1,
152 | "reads": null,
153 | "differences": [
154 | {
155 | "kind": "Mismatch",
156 | "pos": 8158,
157 | "size": 1,
158 | "feat": "Intron_2"
159 | }
160 | ]
161 | }
162 | ],
163 | "HG00733#2#h2tg000060l_27346699_27361303": [
164 | {
165 | "allele_id": "HLA00895",
166 | "star_name": "HLA-DRB3*02:02:01:01",
167 | "length": 13588,
168 | "match_name": "*02:02:01:01",
169 | "query_start": 0,
170 | "query_end": 13588,
171 | "covered_feat": [
172 | "UTR_5",
173 | "Exon_1",
174 | "Intron_1",
175 | "Exon_2",
176 | "Intron_2",
177 | "Exon_3",
178 | "Intron_3",
179 | "Exon_4",
180 | "Intron_4",
181 | "Exon_5",
182 | "Intron_5",
183 | "Exon_6",
184 | "UTR_3"
185 | ],
186 | "not_covered": [],
187 | "coding_diffs": 0,
188 | "noncode_eddist": 0,
189 | "error_rate": null,
190 | "coverage": 1,
191 | "reads": null,
192 | "differences": []
193 | }
194 | ]
195 | }
196 | ]
197 | }
198 | ```
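
The same lookup can be done in Python when `jq` is not available. The sketch below is illustrative rather than part of `hifihla`: it assumes the nesting shown in the output above, the function name `locus_differences` is hypothetical, and the inlined report is trimmed from that output so the example runs on its own.

```python
import json


def locus_differences(report, locus):
    """Return (query_id, star_name, difference) tuples for one locus."""
    found = []
    for entry in report["hla_calls"].get(locus, []):
        for query_id, matches in entry.items():
            for match in matches:
                for diff in match["differences"]:
                    found.append((query_id, match["star_name"], diff))
    return found


# Trimmed from the HLA-DRB3 output above.
example = json.loads("""
{
  "hla_calls": {
    "HLA-DRB3": [
      {
        "HG00733#1#h1tg000070l_32462971_32477576": [
          {"star_name": "HLA-DRB3*02:02:01:04",
           "differences": [
             {"kind": "Mismatch", "pos": 8158, "size": 1, "feat": "Intron_2"}
           ]}
        ],
        "HG00733#2#h2tg000060l_27346699_27361303": [
          {"star_name": "HLA-DRB3*02:02:01:01", "differences": []}
        ]
      }
    ]
  }
}
""")

diffs = locus_differences(example, "HLA-DRB3")
for query_id, star_name, diff in diffs:
    print(query_id, star_name, diff["kind"], diff["pos"], diff["feat"], sep="\t")
```

A locus that is absent from the report yields an empty list rather than an error.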
199 |
--------------------------------------------------------------------------------