├── .github └── FUNDING.yml ├── .gitignore ├── LICENSE ├── README.md ├── example-data ├── example-multi.gb ├── example-multi.vcf ├── example-version.gb ├── example-version.vcf ├── example.gb └── example.vcf ├── requirements.txt └── vcf-annotator.py /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] 4 | patreon: # Replace with a single Patreon username 5 | open_collective: # Replace with a single Open Collective username 6 | ko_fi: rpetit3 7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel 8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry 9 | liberapay: # Replace with a single Liberapay username 10 | issuehunt: # Replace with a single IssueHunt username 11 | otechie: # Replace with a single Otechie username 12 | custom: ['https://www.buymeacoffee.com/rpetit3'] # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] 13 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.*~ 2 | *.pyc -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Robert A. 
Petit III 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/vcf-annotator/README.html) 2 | [![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf-annotator/badges/downloads.svg)](https://anaconda.org/bioconda/vcf-annotator) 3 | 4 | 5 | *vcf-annotator* uses the reference GenBank file to add more details to the variant calls in a VCF. 6 | 7 | # vcf-annotator 8 | Using a reference GenBank file, *vcf-annotator* adds biological annotations to variants in a VCF file. 
A full list of annotations is described below, but these include amino acid changes, gene information, synonymous vs nonsynonymous, locus tag information, and many more. 9 | 10 | ### Added Annotations 11 | For each mutation, if applicable, the following annotations are added to the INFO column of the VCF. 12 | 13 | | Annotation | Description | 14 | |------------|-------------| 15 | | RefCodon | Reference codon | 16 | | AltCodon | Alternate codon | 17 | | RefAminoAcid | Reference amino acid | 18 | | AltAminoAcid | Alternate amino acid | 19 | | CodonPosition | Codon position in the gene | 20 | | SNPCodonPosition | SNP position in the codon | 21 | | AminoAcidChange | Amino acid change | 22 | | IsSynonymous | 0:nonsynonymous, 1:synonymous, 9:N/A or Unknown | 23 | | IsTransition | 0:transversion, 1:transition, 9:N/A or Unknown | 24 | | IsGenic | 0:intergenic, 1:genic | 25 | | IsPseudo | 0:not pseudo, 1:pseudo gene | 26 | | LocusTag | Locus tag associated with gene | 27 | | Gene | Name of gene | 28 | | Note | Note associated with gene | 29 | | Inference | Inference of feature. | 30 | | Product | Description of gene | 31 | | ProteinID | Protein ID of gene | 32 | | Comments | Example: Negative strand: T->C | 33 | | VariantType | Indel, SNP, Ambiguous_SNP | 34 | | FeatureType | The feature type of variant. | 35 | 36 | # Installation 37 | ### Requirements 38 | * Python >= 3.4 39 | * [BioPython](http://biopython.org/) >= 1.65 40 | * [PyVCF](https://github.com/jamescasbon/PyVCF) == 0.6.8 41 | 42 | ### Bioconda 43 | *vcf-annotator* is available from BioConda 44 | ``` 45 | conda install -c bioconda vcf-annotator 46 | ``` 47 | 48 | ### From Source 49 | ``` 50 | git clone git@github.com:rpetit3/vcf-annotator.git 51 | cd vcf-annotator 52 | pip3 install -r requirements.txt 53 | python3 vcf-annotator.py YOUR_VCF.vcf REFERENCE.gb 54 | ``` 55 | 56 | Nothing much else to it, just a simple script to read in a VCF and GenBank file and output an annotated VCF. Feel free to drop it in your $PATH somewhere! 
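To make the 0/1/9 coding in the annotation table above concrete, here is a minimal sketch of how a flag such as IsTransition could be derived from the REF and ALT alleles. It is an illustration only and not the code used in `vcf-annotator.py`; the function name `is_transition` and the choice to return 9 for anything that is not a single-base A/C/G/T substitution are assumptions.

```python
# Hypothetical helper, not taken from vcf-annotator.py.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return 1 for a transition, 0 for a transversion, 9 for N/A or unknown."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    # Indels, ambiguous bases, and identical alleles are treated as N/A (assumption).
    if len(ref) != 1 or len(alt) != 1:
        return 9
    if ref not in bases or alt not in bases or ref == alt:
        return 9
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return 1 if same_class else 0


if __name__ == "__main__":
    # REF/ALT pairs in the style of the records in example-data.
    for ref, alt in [("T", "C"), ("A", "T"), ("AT", "A")]:
        print(ref, alt, is_transition(ref, alt))
```

A purine-to-purine or pyrimidine-to-pyrimidine change counts as a transition; every other single-base substitution is a transversion.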
57 | 58 | # Usage 59 | *vcf-annotator* requires an uncompressed VCF file and the corresponding reference GenBank file. It then outputs the annotated variants, by default to STDOUT, but this can be changed at runtime. 60 | 61 | ### Usage Output 62 | ``` 63 | python3 vcf-annotator.py 64 | usage: vcf-annotator.py [-h] [--output STRING] [--version] 65 | VCF_FILE GENBANK_FILE 66 | 67 | Annotate variants from a VCF file using the reference genome's GenBank file. 68 | 69 | positional arguments: 70 | VCF_FILE VCF file of variants 71 | GENBANK_FILE GenBank file of the reference genome. 72 | 73 | optional arguments: 74 | -h, --help show this help message and exit 75 | --output STRING File to write VCF output to (Default STDOUT). 76 | --version show program's version number and exit 77 | ``` 78 | 79 | ##### --version Output 80 | ``` 81 | python3 vcf-annotator.py --version 82 | vcf-annotator.py 0.5 83 | ``` 84 | 85 | ### Example Usage 86 | A VCF and GenBank file are included in the *example-data* directory. You can use these two files to verify the script is working properly. 87 | ``` 88 | python3 vcf-annotator.py example-data/example.vcf example-data/example.gb 89 | ``` 90 | 91 | ### Disclaimer 92 | This script has been developed only for microbial variant analysis. I've only tested it on VCF files output from GATK, but I would assume that other VCF files should work as well, as long as they follow the VCF format. Currently, for a ~3 Mb genome with ~20k mutations, it takes about 10s to annotate the VCF file. Based on this information, I'm not sure how well it would work on larger genomes (if it would even work at all!). 93 | 94 | -------------------------------------------------------------------------------- /example-data/example-multi.vcf: -------------------------------------------------------------------------------- 1 | ##fileformat=VCFv4.2 2 | ##GATKCommandLine.HaplotypeCaller= 3 | ##GATKCommandLine.VariantFiltration= 9 && AF >= 0.95] filterName=[Fail, .] 
genotypeFilterExpression=[GQ < 20] genotypeFilterName=[LowGQ] clusterSize=3 clusterWindowSize=10 maskExtension=0 maskName=Mask filterNotInMask=false missingValuesInExpressionsShouldEvaluateAsFailing=false invalidatePreviousFilters=false invertFilterExpression=false invertGenotypeFilterExpression=false setFilteredGtToNocall=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> 4 | ##reference=file:///tmp/nxf.4uWu5CKryc/ref.fasta 5 | ##INFO= 6 | ##INFO= 7 | ##INFO= 8 | ##INFO= 9 | ##INFO= 10 | ##INFO= 11 | ##INFO= 12 | ##INFO= 13 | ##INFO= 14 | ##INFO= 15 | ##INFO= 16 | ##INFO= 17 | ##INFO= 18 | ##INFO= 19 | ##INFO= 20 | ##INFO= 21 | ##INFO= 22 | ##INFO= 23 | ##FORMAT= 24 | ##FORMAT= 25 | ##FORMAT= 26 | ##FORMAT= 27 | ##FORMAT= 28 | ##FORMAT= 29 | ##FILTER= 30 | ##FILTER= 31 | ##FILTER= 32 | ##FILTER= 33 | ##FILTER= 9 && AF >= 0.95"> 34 | ##contig= 35 | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GATK 36 | NC_002745 89 . T C 2201 . AC=1;AF=1.0;AN=1;DP=50;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.63;SOR=1.27; GT:AD:DP:GQ:PL 1:0,50:50:99:2231,0 37 | NC_002745 93 . T C 2130 . AC=1;AF=1.0;AN=1;DP=48;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.04;SOR=1.179; GT:AD:DP:GQ:PL 1:0,48:48:99:2160,0 38 | NC_002745 137 . G A 1489 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.42;SOR=1.179; GT:AD:DP:GQ:PL 1:0,38:38:99:1519,0 39 | NC_002745 165 . T C 1329 . AC=1;AF=1.0;AN=1;DP=33;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=27.64;SOR=1.022; GT:AD:DP:GQ:PL 1:0,33:33:99:1359,0 40 | NC_002745 290 . A G 1348 . AC=1;AF=1.0;AN=1;DP=34;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.09;SOR=1.085; GT:AD:DP:GQ:PL 1:0,34:34:99:1378,0 41 | NC_002745 358 . A T 1175 . AC=1;AF=1.0;AN=1;DP=30;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.67;SOR=0.826; GT:AD:DP:GQ:PL 1:0,30:30:99:1205,0 42 | NC_002745 393 . G A 1293 . AC=1;AF=1.0;AN=1;DP=32;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=37.76;SOR=0.957; GT:AD:DP:GQ:PL 1:0,32:32:99:1323,0 43 | NC_002745 903 . 
T C 1796 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.24;SOR=1.038; GT:AD:DP:GQ:PL 1:0,45:45:99:1826,0 44 | NC_002745 927 . A G 1739 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.43;SOR=0.739; GT:AD:DP:GQ:PL 1:0,43:43:99:1769,0 45 | NC_002745 939 . A G 1636 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.09;SOR=0.843; GT:AD:DP:GQ:PL 1:0,41:41:99:1666,0 46 | NC_002745 987 . G T 1239 . AC=1;AF=1.0;AN=1;DP=35;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=31.26;SOR=0.749; GT:AD:DP:GQ:PL 1:0,35:35:99:1269,0 47 | NC_002745 1194 . T C 1701 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.93;SOR=0.914; GT:AD:DP:GQ:PL 1:0,38:38:99:1731,0 48 | NC_002745 1197 . A G 1708 . AC=1;AF=1.0;AN=1;DP=39;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.83;SOR=0.85; GT:AD:DP:GQ:PL 1:0,39:39:99:1738,0 49 | NC_002745 1641 . G A 1770 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.0;SOR=1.154; GT:AD:DP:GQ:PL 1:0,45:45:99:1800,0 50 | NC_002745 1924 . A G 2326 . AC=1;AF=1.0;AN=1;DP=51;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.19;SOR=0.813; GT:AD:DP:GQ:PL 1:0,51:51:99:2356,0 51 | NC_002745 1930 . T A 2255 . AC=1;AF=1.0;AN=1;DP=49;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.45;SOR=0.733; GT:AD:DP:GQ:PL 1:0,49:49:99:2285,0 52 | NC_002745 1931 . G A 2255 . AC=1;AF=1.0;AN=1;DP=49;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.04;SOR=0.733; GT:AD:DP:GQ:PL 1:0,49:49:99:2285,0 53 | NC_002745 1953 . A G 2546 . AC=1;AF=1.0;AN=1;DP=58;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.82;SOR=0.762; GT:AD:DP:GQ:PL 1:0,58:58:99:2576,0 54 | NC_002745 1962 . A G 2421 . AC=1;AF=1.0;AN=1;DP=54;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.74;SOR=0.767; GT:AD:DP:GQ:PL 1:0,54:54:99:2451,0 55 | NC_002745 2063 . C A 1806 . AC=1;AF=1.0;AN=1;DP=46;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.28;SOR=0.976; GT:AD:DP:GQ:PL 1:0,46:46:99:1836,0 56 | NC_002745 2288 . C T 1246 . 
AC=1;AF=1.0;AN=1;DP=31;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.93;SOR=1.044; GT:AD:DP:GQ:PL 1:0,31:31:99:1276,0 57 | NC_002745 2605 . G A 1615 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.49;SOR=0.693; GT:AD:DP:GQ:PL 1:0,40:40:99:1645,0 58 | NC_002745 3001 . A G 1684 . AC=1;AF=1.0;AN=1;DP=42;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.07;SOR=1.005; GT:AD:DP:GQ:PL 1:0,42:42:99:1714,0 59 | NC_002745 3058 . A G 1681 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.11;SOR=1.179; GT:AD:DP:GQ:PL 1:0,43:43:99:1711,0 60 | NC_002745 3363 . A G 1346 . AC=1;AF=1.0;AN=1;DP=34;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.97;SOR=1.418; GT:AD:DP:GQ:PL 1:0,34:34:99:1376,0 61 | NC_002745 3385 . A G 1650 . AC=1;AF=1.0;AN=1;DP=42;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.43;SOR=1.005; GT:AD:DP:GQ:PL 1:0,42:42:99:1680,0 62 | NC_002745 3510 . T A 1285 . AC=1;AF=1.0;AN=1;DP=32;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.86;SOR=1.284; GT:AD:DP:GQ:PL 1:0,32:32:99:1315,0 63 | NC_002745 3656 . C T 1556 . AC=1;AF=1.0;AN=1;DP=39;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.35;SOR=1.236; GT:AD:DP:GQ:PL 1:0,39:39:99:1586,0 64 | NC_002745 4580 . A G 1527 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=28.18;SOR=1.493; GT:AD:DP:GQ:PL 1:0,38:38:99:1557,0 65 | NC_002745 5229 . A C 1949 . AC=1;AF=1.0;AN=1;DP=50;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=28.32;SOR=0.693; GT:AD:DP:GQ:PL 1:0,50:50:99:1979,0 66 | NC_002745 5369 . C T 1489 . AC=1;AF=1.0;AN=1;DP=37;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.65;SOR=0.746; GT:AD:DP:GQ:PL 1:0,37:37:99:1519,0 67 | NC_002745 6272 . T C 1659 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=35.03;SOR=1.775; GT:AD:DP:GQ:PL 1:0,43:43:99:1689,0 68 | NC_002745 7255 . C T 1654 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=25.82;SOR=1.075; GT:AD:DP:GQ:PL 1:0,41:41:99:1684,0 69 | NC_002745 7505 . G A 1475 . 
AC=1;AF=1.0;AN=1;DP=37;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.17;SOR=1.27; GT:AD:DP:GQ:PL 1:0,37:37:99:1505,0 70 | NC_002745 7517 . A T 1336 . AC=1;AF=1.0;AN=1;DP=36;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.77;SOR=0.927; GT:AD:DP:GQ:PL 1:0,36:36:99:1366,0 71 | NC_002745 7529 . A C 1585 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.77;SOR=0.793; GT:AD:DP:GQ:PL 1:0,40:40:99:1615,0 72 | NC_002745 8373 . A G 1875 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=25.91;SOR=0.941; GT:AD:DP:GQ:PL 1:0,43:43:99:1905,0 73 | NC_002745 8396 . A G 2081 . AC=1;AF=1.0;AN=1;DP=44;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.83;SOR=1.105; GT:AD:DP:GQ:PL 1:0,44:44:99:2111,0 74 | NC_002745 8400 . C T 2085 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.42;SOR=0.929; GT:AD:DP:GQ:PL 1:0,45:45:99:2115,0 75 | NC_002745 8405 . A G 2040 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=24.5;SOR=0.941; GT:AD:DP:GQ:PL 1:0,43:43:99:2070,0 76 | NC_002745 8423 . A G 1905 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.65;SOR=0.793; GT:AD:DP:GQ:PL 1:0,40:40:99:1935,0 77 | NC_002745 8426 . A T 1946 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=59.43;QD=34.02;SOR=0.741; GT:AD:DP:GQ:PL 1:0,41:41:99:1976,0 78 | NC_002745 8433 . C A 1803 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=58.54;QD=28.35;SOR=0.741; GT:AD:DP:GQ:PL 1:0,41:41:99:1833,0 79 | NC_002745_2 89 . T C 2201 . AC=1;AF=1.0;AN=1;DP=50;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.63;SOR=1.27; GT:AD:DP:GQ:PL 1:0,50:50:99:2231,0 80 | NC_002745_2 93 . T C 2130 . AC=1;AF=1.0;AN=1;DP=48;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.04;SOR=1.179; GT:AD:DP:GQ:PL 1:0,48:48:99:2160,0 81 | NC_002745_2 137 . G A 1489 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.42;SOR=1.179; GT:AD:DP:GQ:PL 1:0,38:38:99:1519,0 82 | NC_002745_2 165 . T C 1329 . 
AC=1;AF=1.0;AN=1;DP=33;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=27.64;SOR=1.022; GT:AD:DP:GQ:PL 1:0,33:33:99:1359,0 83 | NC_002745_2 290 . A G 1348 . AC=1;AF=1.0;AN=1;DP=34;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.09;SOR=1.085; GT:AD:DP:GQ:PL 1:0,34:34:99:1378,0 84 | NC_002745_2 358 . A T 1175 . AC=1;AF=1.0;AN=1;DP=30;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.67;SOR=0.826; GT:AD:DP:GQ:PL 1:0,30:30:99:1205,0 85 | NC_002745_2 393 . G A 1293 . AC=1;AF=1.0;AN=1;DP=32;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=37.76;SOR=0.957; GT:AD:DP:GQ:PL 1:0,32:32:99:1323,0 86 | NC_002745_2 903 . T C 1796 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.24;SOR=1.038; GT:AD:DP:GQ:PL 1:0,45:45:99:1826,0 87 | NC_002745_2 927 . A G 1739 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.43;SOR=0.739; GT:AD:DP:GQ:PL 1:0,43:43:99:1769,0 88 | NC_002745_2 939 . A G 1636 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.09;SOR=0.843; GT:AD:DP:GQ:PL 1:0,41:41:99:1666,0 89 | NC_002745_2 987 . G T 1239 . AC=1;AF=1.0;AN=1;DP=35;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=31.26;SOR=0.749; GT:AD:DP:GQ:PL 1:0,35:35:99:1269,0 90 | NC_002745_2 1194 . T C 1701 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.93;SOR=0.914; GT:AD:DP:GQ:PL 1:0,38:38:99:1731,0 91 | NC_002745_2 1197 . A G 1708 . AC=1;AF=1.0;AN=1;DP=39;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.83;SOR=0.85; GT:AD:DP:GQ:PL 1:0,39:39:99:1738,0 92 | NC_002745_2 1641 . G A 1770 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.0;SOR=1.154; GT:AD:DP:GQ:PL 1:0,45:45:99:1800,0 93 | NC_002745_2 1924 . A G 2326 . AC=1;AF=1.0;AN=1;DP=51;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.19;SOR=0.813; GT:AD:DP:GQ:PL 1:0,51:51:99:2356,0 94 | NC_002745_2 1930 . T A 2255 . AC=1;AF=1.0;AN=1;DP=49;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.45;SOR=0.733; GT:AD:DP:GQ:PL 1:0,49:49:99:2285,0 95 | NC_002745_2 1931 . G A 2255 . 
AC=1;AF=1.0;AN=1;DP=49;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=34.04;SOR=0.733; GT:AD:DP:GQ:PL 1:0,49:49:99:2285,0 96 | NC_002745_2 1953 . A G 2546 . AC=1;AF=1.0;AN=1;DP=58;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.82;SOR=0.762; GT:AD:DP:GQ:PL 1:0,58:58:99:2576,0 97 | NC_002745_2 1962 . A G 2421 . AC=1;AF=1.0;AN=1;DP=54;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.74;SOR=0.767; GT:AD:DP:GQ:PL 1:0,54:54:99:2451,0 98 | NC_002745_2 2063 . C A 1806 . AC=1;AF=1.0;AN=1;DP=46;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.28;SOR=0.976; GT:AD:DP:GQ:PL 1:0,46:46:99:1836,0 99 | NC_002745_2 2288 . C T 1246 . AC=1;AF=1.0;AN=1;DP=31;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.93;SOR=1.044; GT:AD:DP:GQ:PL 1:0,31:31:99:1276,0 100 | NC_002745_2 2605 . G A 1615 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.49;SOR=0.693; GT:AD:DP:GQ:PL 1:0,40:40:99:1645,0 101 | NC_002745_2 3001 . A G 1684 . AC=1;AF=1.0;AN=1;DP=42;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=33.07;SOR=1.005; GT:AD:DP:GQ:PL 1:0,42:42:99:1714,0 102 | NC_002745_2 3058 . A G 1681 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.11;SOR=1.179; GT:AD:DP:GQ:PL 1:0,43:43:99:1711,0 103 | NC_002745_2 3363 . A G 1346 . AC=1;AF=1.0;AN=1;DP=34;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.97;SOR=1.418; GT:AD:DP:GQ:PL 1:0,34:34:99:1376,0 104 | NC_002745_2 3385 . A G 1650 . AC=1;AF=1.0;AN=1;DP=42;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.43;SOR=1.005; GT:AD:DP:GQ:PL 1:0,42:42:99:1680,0 105 | NC_002745_2 3510 . T A 1285 . AC=1;AF=1.0;AN=1;DP=32;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.86;SOR=1.284; GT:AD:DP:GQ:PL 1:0,32:32:99:1315,0 106 | NC_002745_2 3656 . C T 1556 . AC=1;AF=1.0;AN=1;DP=39;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.35;SOR=1.236; GT:AD:DP:GQ:PL 1:0,39:39:99:1586,0 107 | NC_002745_2 4580 . A G 1527 . AC=1;AF=1.0;AN=1;DP=38;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=28.18;SOR=1.493; GT:AD:DP:GQ:PL 1:0,38:38:99:1557,0 108 | NC_002745_2 5229 . A C 1949 . 
AC=1;AF=1.0;AN=1;DP=50;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=28.32;SOR=0.693; GT:AD:DP:GQ:PL 1:0,50:50:99:1979,0 109 | NC_002745_2 5369 . C T 1489 . AC=1;AF=1.0;AN=1;DP=37;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.65;SOR=0.746; GT:AD:DP:GQ:PL 1:0,37:37:99:1519,0 110 | NC_002745_2 6272 . T C 1659 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=35.03;SOR=1.775; GT:AD:DP:GQ:PL 1:0,43:43:99:1689,0 111 | NC_002745_2 7255 . C T 1654 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=25.82;SOR=1.075; GT:AD:DP:GQ:PL 1:0,41:41:99:1684,0 112 | NC_002745_2 7505 . G A 1475 . AC=1;AF=1.0;AN=1;DP=37;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.17;SOR=1.27; GT:AD:DP:GQ:PL 1:0,37:37:99:1505,0 113 | NC_002745_2 7517 . A T 1336 . AC=1;AF=1.0;AN=1;DP=36;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=32.77;SOR=0.927; GT:AD:DP:GQ:PL 1:0,36:36:99:1366,0 114 | NC_002745_2 7529 . A C 1585 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.77;SOR=0.793; GT:AD:DP:GQ:PL 1:0,40:40:99:1615,0 115 | NC_002745_2 8373 . A G 1875 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=25.91;SOR=0.941; GT:AD:DP:GQ:PL 1:0,43:43:99:1905,0 116 | NC_002745_2 8396 . A G 2081 . AC=1;AF=1.0;AN=1;DP=44;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=26.83;SOR=1.105; GT:AD:DP:GQ:PL 1:0,44:44:99:2111,0 117 | NC_002745_2 8400 . C T 2085 . AC=1;AF=1.0;AN=1;DP=45;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=30.42;SOR=0.929; GT:AD:DP:GQ:PL 1:0,45:45:99:2115,0 118 | NC_002745_2 8405 . A G 2040 . AC=1;AF=1.0;AN=1;DP=43;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=24.5;SOR=0.941; GT:AD:DP:GQ:PL 1:0,43:43:99:2070,0 119 | NC_002745_2 8423 . A G 1905 . AC=1;AF=1.0;AN=1;DP=40;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=60.0;QD=29.65;SOR=0.793; GT:AD:DP:GQ:PL 1:0,40:40:99:1935,0 120 | NC_002745_2 8426 . A T 1946 . AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=59.43;QD=34.02;SOR=0.741; GT:AD:DP:GQ:PL 1:0,41:41:99:1976,0 121 | NC_002745_2 8433 . C A 1803 . 
AC=1;AF=1.0;AN=1;DP=41;FS=0.0;MLEAC=1;MLEAF=1.0;MQ=58.54;QD=28.35;SOR=0.741; GT:AD:DP:GQ:PL 1:0,41:41:99:1833,0 122 | -------------------------------------------------------------------------------- /example-data/example-version.gb: -------------------------------------------------------------------------------- 1 | LOCUS NC_045512 29903 bp ss-RNA linear VRL 18-JUL-2020 2 | DEFINITION Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, 3 | complete genome. 4 | ACCESSION NC_045512 5 | VERSION NC_045512.2 6 | DBLINK BioProject: PRJNA485481 7 | KEYWORDS RefSeq. 8 | SOURCE Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 9 | ORGANISM Severe acute respiratory syndrome coronavirus 2 10 | Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes; 11 | Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae; 12 | Betacoronavirus; Sarbecovirus. 13 | REFERENCE 1 (bases 1 to 29903) 14 | AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.M., Wang,W., Song,Z.G., Hu,Y., 15 | Tao,Z.W., Tian,J.H., Pei,Y.Y., Yuan,M.L., Zhang,Y.L., Dai,F.H., 16 | Liu,Y., Wang,Q.M., Zheng,J.J., Xu,L., Holmes,E.C. and Zhang,Y.Z. 17 | TITLE A new coronavirus associated with human respiratory disease in 18 | China 19 | JOURNAL Nature 579 (7798), 265-269 (2020) 20 | PUBMED 32015508 21 | REMARK Erratum:[Nature. 2020 Apr;580(7803):E7. PMID: 32296181] 22 | REFERENCE 2 (bases 13476 to 13503) 23 | AUTHORS Baranov,P.V., Henderson,C.M., Anderson,C.B., Gesteland,R.F., 24 | Atkins,J.F. and Howard,M.T. 25 | TITLE Programmed ribosomal frameshifting in decoding the SARS-CoV genome 26 | JOURNAL Virology 332 (2), 498-510 (2005) 27 | PUBMED 15680415 28 | REFERENCE 3 (bases 29728 to 29768) 29 | AUTHORS Robertson,M.P., Igel,H., Baertsch,R., Haussler,D., Ares,M. Jr. and 30 | Scott,W.G. 31 | TITLE The structure of a rigorously conserved RNA element within the SARS 32 | virus genome 33 | JOURNAL PLoS Biol. 
3 (1), e5 (2005) 34 | PUBMED 15630477 35 | REFERENCE 4 (bases 29609 to 29657) 36 | AUTHORS Williams,G.D., Chang,R.Y. and Brian,D.A. 37 | TITLE A phylogenetically conserved hairpin-type 3' untranslated region 38 | pseudoknot functions in coronavirus RNA replication 39 | JOURNAL J. Virol. 73 (10), 8349-8355 (1999) 40 | PUBMED 10482585 41 | REFERENCE 5 (bases 1 to 29903) 42 | CONSRTM NCBI Genome Project 43 | TITLE Direct Submission 44 | JOURNAL Submitted (17-JAN-2020) National Center for Biotechnology 45 | Information, NIH, Bethesda, MD 20894, USA 46 | REFERENCE 6 (bases 1 to 29903) 47 | AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.-M., Wang,W., Hu,Y., Song,Z.-G., 48 | Tao,Z.-W., Tian,J.-H., Pei,Y.-Y., Yuan,M.L., Zhang,Y.-L., 49 | Dai,F.-H., Liu,Y., Wang,Q.-M., Zheng,J.-J., Xu,L., Holmes,E.C. and 50 | Zhang,Y.-Z. 51 | TITLE Direct Submission 52 | JOURNAL Submitted (05-JAN-2020) Shanghai Public Health Clinical Center & 53 | School of Public Health, Fudan University, Shanghai, China 54 | COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The 55 | reference sequence is identical to MN908947. 56 | On Jan 17, 2020 this sequence version replaced NC_045512.1. 57 | Annotation was added using homology to SARSr-CoV NC_004718.3. ### 58 | Formerly called 'Wuhan seafood market pneumonia virus.' If you have 59 | questions or suggestions, please email us at info@ncbi.nlm.nih.gov 60 | and include the accession number NC_045512.### Protein structures 61 | can be found at 62 | https://www.ncbi.nlm.nih.gov/structure/?term=sars-cov-2.### Find 63 | all other Severe acute respiratory syndrome coronavirus 2 64 | (SARS-CoV-2) sequences at 65 | https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/ 66 | 67 | ##Assembly-Data-START## 68 | Assembly Method :: Megahit v. V1.1.3 69 | Sequencing Technology :: Illumina 70 | ##Assembly-Data-END## 71 | COMPLETENESS: full length. 
72 | FEATURES Location/Qualifiers 73 | source 1..29903 74 | /organism="Severe acute respiratory syndrome coronavirus 75 | 2" 76 | /mol_type="genomic RNA" 77 | /isolate="Wuhan-Hu-1" 78 | /host="Homo sapiens" 79 | /db_xref="taxon:2697049" 80 | /country="China" 81 | /collection_date="Dec-2019" 82 | 5'UTR 1..265 83 | gene 266..21555 84 | /gene="ORF1ab" 85 | /locus_tag="GU280_gp01" 86 | /db_xref="GeneID:43740578" 87 | CDS join(266..13468,13468..21555) 88 | /gene="ORF1ab" 89 | /locus_tag="GU280_gp01" 90 | /ribosomal_slippage 91 | /note="pp1ab; translated by -1 ribosomal frameshift" 92 | /codon_start=1 93 | /product="ORF1ab polyprotein" 94 | /protein_id="YP_009724389.1" 95 | /db_xref="GeneID:43740578" 96 | /translation="MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQ 97 | HLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGE 98 | TLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQEN 99 | WNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQ 100 | LDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFP 101 | LNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTG 102 | DFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESG 103 | LKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNL 104 | LEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGN 105 | FKVTKGKAKKGAWNIGEQKSILSPLYAFASEAARVVRSIFSRTLETAQNSVRVLQKAA 106 | ITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKL 107 | KPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFFKLV 108 | NKFLALCADSIIIGGAKLKALNLGETFVTHSKGLYRKCVKSREETGLLMPLKAPKEII 109 | FLEGETLPTEVLTEEVVLKTGDLQPLEQPTSEAVEAPLVGTPVCINGLMLLEIKDTEK 110 | YCALAPNMMVTNNTFTLKGGAPTKVTFGDDTVIEVQGYKSVNITFELDERIDKVLNEK 111 | CSAYTVELGTEVNEFACVVADAVIKTLQPVSELLTPLGIDLDEWSMATYYLFDESGEF 112 | KLASHMYCSFYPPDEDEEEGDCEEEEFEPSTQYEYGTEDDYQGKPLEFGATSAALQPE 113 | EEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQLEMELTPVVQTIEVNSFSG 114 | YLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESD 115 | DYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLA 
116 | PLLSAGIFGADPIHSLRVCVDTVRTNVYLAVFDKNLYDKLVSSFLEMKSEKQVEQKIA 117 | EIPKEEVKPFITESKPSVEQRKQDDKKIKACVEEVTTTLEETKFLTENLLLYIDINGN 118 | LHPDSATLVSDIDITFLKKDAPYIVGDVVQEGVLTAVVIPTKKAGGTTEMLAKALRKV 119 | PTDNYITTYPGQGLNGYTVEEAKTVLKKCKSAFYILPSIISNEKQEILGTVSWNLREM 120 | LAHAEETRKLMPVCVETKAIVSTIQRKYKGIKIQEGVVDYGARFYFYTSKTTVASLIN 121 | TLNDLNETLVTMPLGYVTHGLNLEEAARYMRSLKVPATVSVSSPDAVTAYNGYLTSSS 122 | KTPEEHFIETISLAGSYKDWSYSGQSTQLGIEFLKRGDKSVYYTSNPTTFHLDGEVIT 123 | FDNLKTLLSLREVRTIKVFTTVDNINLHTQVVDMSMTYGQQFGPTYLDGADVTKIKPH 124 | NSHEGKTFYVLPNDDTLRVEAFEYYHTTDPSFLGRYMSALNHTKKWKYPQVNGLTSIK 125 | WADNNCYLATALLTLQQIELKFNPPALQDAYYRARAGEAANFCALILAYCNKTVGELG 126 | DVRETMSYLFQHANLDSCKRVLNVVCKTCGQQQTTLKGVEAVMYMGTLSYEQFKKGVQ 127 | IPCTCGKQATKYLVQQESPFVMMSAPPAQYELKHGTFTCASEYTGNYQCGHYKHITSK 128 | ETLYCIDGALLTKSSEYKGPITDVFYKENSYTTTIKPVTYKLDGVVCTEIDPKLDNYY 129 | KKDNSYFTEQPIDLVPNQPYPNASFDNFKFVCDNIKFADDLNQLTGYKKPASRELKVT 130 | FFPDLNGDVVAIDYKHYTPSFKKGAKLLHKPIVWHVNNATNKATYKPNTWCIRCLWST 131 | KPVETSNSFDVLKSEDAQGMDNLACEDLKPVSEEVVENPTIQKDVLECNVKTTEVVGD 132 | IILKPANNSLKITEEVGHTDLMAAYVDNSSLTIKKPNELSRVLGLKTLATHGLAAVNS 133 | VPWDTIANYAKPFLNKVVSTTTNIVTRCLNRVCTNYMPYFFTLLLQLCTFTRSTNSRI 134 | KASMPTTIAKNTVKSVGKFCLEASFNYLKSPNFSKLINIIIWFLLLSVCLGSLIYSTA 135 | ALGVLMSNLGMPSYCTGYREGYLNSTNVTIATYCTGSIPCSVCLSGLDSLDTYPSLET 136 | IQITISSFKWDLTAFGLVAEWFLAYILFTRFFYVLGLAAIMQLFFSYFAVHFISNSWL 137 | MWLIINLVQMAPISAMVRMYIFFASFYYVWKSYVHVVDGCNSSTCMMCYKRNRATRVE 138 | CTTIVNGVRRSFYVYANGGKGFCKLHNWNCVNCDTFCAGSTFISDEVARDLSLQFKRP 139 | INPTDQSSYIVDSVTVKNGSIHLYFDKAGQKTYERHSLSHFVNLDNLRANNTKGSLPI 140 | NVIVFDGKSKCEESSAKSASVYYSQLMCQPILLLDQALVSDVGDSAEVAVKMFDAYVN 141 | TFSSTFNVPMEKLKTLVATAEAELAKNVSLDNVLSTFISAARQGFVDSDVETKDVVEC 142 | LKLSHQSDIEVTGDSCNNYMLTYNKVENMTPRDLGACIDCSARHINAQVAKSHNIALI 143 | WNVKDFMSLSEQLRKQIRSAAKKNNLPFKLTCATTRQVVNVVTTKIALKGGKIVNNWL 144 | KQLIKVTLVFLFVAAIFYLITPVHVMSKHTDFSSEIIGYKAIDGGVTRDIASTDTCFA 145 | NKHADFDTWFSQRGGSYTNDKACPLIAAVITREVGFVVPGLPGTILRTTNGDFLHFLP 146 | 
RVFSAVGNICYTPSKLIEYTDFATSACVLAAECTIFKDASGKPVPYCYDTNVLEGSVA 147 | YESLRPDTRYVLMDGSIIQFPNTYLEGSVRVVTTFDSEYCRHGTCERSEAGVCVSTSG 148 | RWVLNNDYYRSLPGVFCGVDAVNLLTNMFTPLIQPIGALDISASIVAGGIVAIVVTCL 149 | AYYFMRFRRAFGEYSHVVAFNTLLFLMSFTVLCLTPVYSFLPGVYSVIYLYLTFYLTN 150 | DVSFLAHIQWMVMFTPLVPFWITIAYIICISTKHFYWFFSNYLKRRVVFNGVSFSTFE 151 | EAALCTFLLNKEMYLKLRSDVLLPLTQYNRYLALYNKYKYFSGAMDTTSYREAACCHL 152 | AKALNDFSNSGSDVLYQPPQTSITSAVLQSGFRKMAFPSGKVEGCMVQVTCGTTTLNG 153 | LWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVL 154 | KLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSC 155 | GSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVN 156 | VLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAV 157 | LDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQSAVKRTIKGTHHW 158 | LLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFVKHKHAFLCLFL 159 | LPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAVVLLILMTAR 160 | TVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVTTVMFLAR 161 | GIVFMCVEYCPIFFITGNTLQCIMLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYL 162 | VSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQSKMSDVKCTSVVL 163 | LSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKL 164 | CEEMLDNRATLQAIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAK 165 | SEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALN 166 | NIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDA 167 | DSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQNNELSPVALRQMSCAAGTTQTA 168 | CTDDNALAYYNTTKGGRFVLALLSDLQDLKWARFPKSDGTGTIYTELEPPCRFVTDTP 169 | KGPKVKYLYFIKGLNNLNRGMVLGSLAATVRLQAGNATEVPANSTVLSFCAFAVDAAK 170 | AYKDYLASGGQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDH 171 | PNPKGFCDLKGKYVQIPTTCANDPVGFTLKNTVCTVCGMWKGYGCSCDQLREPMLQSA 172 | DAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKD 173 | EDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQR 174 | LTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYA 175 | NLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVV 176 | 
DSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQ 177 | TYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRE 178 | LGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQ 179 | TVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDI 180 | RQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQ 181 | DALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAAT 182 | RGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLAR 183 | KHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQ 184 | AVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMM 185 | ILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCS 186 | QHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKH 187 | PNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTV 188 | LQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCD 189 | VTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDW 190 | TNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKP 191 | RPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSH 192 | TVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGK 193 | SHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKF 194 | KVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQ 195 | LPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLK 196 | AHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNA 197 | VASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSD 198 | RDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKT 199 | EGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEG 200 | CHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIP 201 | LMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCL 202 | CDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNA 203 | HVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADK 204 | FPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCL 205 | FWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQ 206 | 
LPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYN 207 | MMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNT 208 | VYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDY 209 | KRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEG 210 | SVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFK 211 | PRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESP 212 | FELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKV 213 | TIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGD 214 | SATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLP 215 | TGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEG 216 | FFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFL 217 | IGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKE 218 | GQINDMILSLLSKGRLIIRENNRVVISSDVLVNN" 219 | mat_peptide 266..805 220 | /gene="ORF1ab" 221 | /locus_tag="GU280_gp01" 222 | /product="leader protein" 223 | /note="nsp1; produced by both pp1a and pp1ab" 224 | /protein_id="YP_009725297.1" 225 | mat_peptide 806..2719 226 | /gene="ORF1ab" 227 | /locus_tag="GU280_gp01" 228 | /product="nsp2" 229 | /note="produced by both pp1a and pp1ab" 230 | /protein_id="YP_009725298.1" 231 | mat_peptide 2720..8554 232 | /gene="ORF1ab" 233 | /locus_tag="GU280_gp01" 234 | /product="nsp3" 235 | /note="former nsp1; conserved domains are: N-terminal 236 | acidic (Ac), predicted phosphoesterase, papain-like 237 | proteinase, Y-domain, transmembrane domain 1 (TM1), 238 | adenosine diphosphate-ribose 1''-phosphatase (ADRP); 239 | produced by both pp1a and pp1ab" 240 | /protein_id="YP_009725299.1" 241 | mat_peptide 8555..10054 242 | /gene="ORF1ab" 243 | /locus_tag="GU280_gp01" 244 | /product="nsp4" 245 | /note="nsp4B_TM; contains transmembrane domain 2 (TM2); 246 | produced by both pp1a and pp1ab" 247 | /protein_id="YP_009725300.1" 248 | mat_peptide 10055..10972 249 | /gene="ORF1ab" 250 | /locus_tag="GU280_gp01" 251 | /product="3C-like proteinase" 252 | /note="nsp5A_3CLpro and 
nsp5B_3CLpro; main proteinase 253 | (Mpro); mediates cleavages downstream of nsp4. 3D 254 | structure of the SARSr-CoV homolog has been determined 255 | (Yang et al., 2003); produced by both pp1a and pp1ab" 256 | /protein_id="YP_009725301.1" 257 | mat_peptide 10973..11842 258 | /gene="ORF1ab" 259 | /locus_tag="GU280_gp01" 260 | /product="nsp6" 261 | /note="nsp6_TM; putative transmembrane domain; produced by 262 | both pp1a and pp1ab" 263 | /protein_id="YP_009725302.1" 264 | mat_peptide 11843..12091 265 | /gene="ORF1ab" 266 | /locus_tag="GU280_gp01" 267 | /product="nsp7" 268 | /note="produced by both pp1a and pp1ab" 269 | /protein_id="YP_009725303.1" 270 | mat_peptide 12092..12685 271 | /gene="ORF1ab" 272 | /locus_tag="GU280_gp01" 273 | /product="nsp8" 274 | /note="produced by both pp1a and pp1ab" 275 | /protein_id="YP_009725304.1" 276 | mat_peptide 12686..13024 277 | /gene="ORF1ab" 278 | /locus_tag="GU280_gp01" 279 | /product="nsp9" 280 | /note="ssRNA-binding protein; produced by both pp1a and 281 | pp1ab" 282 | /protein_id="YP_009725305.1" 283 | mat_peptide 13025..13441 284 | /gene="ORF1ab" 285 | /locus_tag="GU280_gp01" 286 | /product="nsp10" 287 | /note="nsp10_CysHis; formerly known as growth-factor-like 288 | protein (GFL); produced by both pp1a and pp1ab" 289 | /protein_id="YP_009725306.1" 290 | mat_peptide join(13442..13468,13468..16236) 291 | /gene="ORF1ab" 292 | /locus_tag="GU280_gp01" 293 | /product="RNA-dependent RNA polymerase" 294 | /note="nsp12; NiRAN and RdRp; produced by pp1ab only" 295 | /protein_id="YP_009725307.1" 296 | mat_peptide 16237..18039 297 | /gene="ORF1ab" 298 | /locus_tag="GU280_gp01" 299 | /product="helicase" 300 | /note="nsp13_ZBD, nsp13_TB, and nsp_HEL1core; zinc-binding 301 | domain (ZD), NTPase/helicase domain (HEL), RNA 302 | 5'-triphosphatase; produced by pp1ab only" 303 | /protein_id="YP_009725308.1" 304 | mat_peptide 18040..19620 305 | /gene="ORF1ab" 306 | /locus_tag="GU280_gp01" 307 | /product="3'-to-5' exonuclease" 308 | 
/note="nsp14A2_ExoN and nsp14B_NMT; produced by pp1ab 309 | only" 310 | /protein_id="YP_009725309.1" 311 | mat_peptide 19621..20658 312 | /gene="ORF1ab" 313 | /locus_tag="GU280_gp01" 314 | /product="endoRNAse" 315 | /note="nsp15-A1 and nsp15B-NendoU; produced by pp1ab only" 316 | /protein_id="YP_009725310.1" 317 | mat_peptide 20659..21552 318 | /gene="ORF1ab" 319 | /locus_tag="GU280_gp01" 320 | /product="2'-O-ribose methyltransferase" 321 | /note="nsp16_OMT; 2'-o-MT; produced by pp1ab only" 322 | /protein_id="YP_009725311.1" 323 | CDS 266..13483 324 | /gene="ORF1ab" 325 | /locus_tag="GU280_gp01" 326 | /note="pp1a" 327 | /codon_start=1 328 | /product="ORF1a polyprotein" 329 | /protein_id="YP_009725295.1" 330 | /db_xref="GeneID:43740578" 331 | /translation="MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQ 332 | HLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGE 333 | TLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQEN 334 | WNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQ 335 | LDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFP 336 | LNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTG 337 | DFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESG 338 | LKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNL 339 | LEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGN 340 | FKVTKGKAKKGAWNIGEQKSILSPLYAFASEAARVVRSIFSRTLETAQNSVRVLQKAA 341 | ITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKL 342 | KPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFFKLV 343 | NKFLALCADSIIIGGAKLKALNLGETFVTHSKGLYRKCVKSREETGLLMPLKAPKEII 344 | FLEGETLPTEVLTEEVVLKTGDLQPLEQPTSEAVEAPLVGTPVCINGLMLLEIKDTEK 345 | YCALAPNMMVTNNTFTLKGGAPTKVTFGDDTVIEVQGYKSVNITFELDERIDKVLNEK 346 | CSAYTVELGTEVNEFACVVADAVIKTLQPVSELLTPLGIDLDEWSMATYYLFDESGEF 347 | KLASHMYCSFYPPDEDEEEGDCEEEEFEPSTQYEYGTEDDYQGKPLEFGATSAALQPE 348 | EEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQLEMELTPVVQTIEVNSFSG 349 | YLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESD 350 | 
DYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLA 351 | PLLSAGIFGADPIHSLRVCVDTVRTNVYLAVFDKNLYDKLVSSFLEMKSEKQVEQKIA 352 | EIPKEEVKPFITESKPSVEQRKQDDKKIKACVEEVTTTLEETKFLTENLLLYIDINGN 353 | LHPDSATLVSDIDITFLKKDAPYIVGDVVQEGVLTAVVIPTKKAGGTTEMLAKALRKV 354 | PTDNYITTYPGQGLNGYTVEEAKTVLKKCKSAFYILPSIISNEKQEILGTVSWNLREM 355 | LAHAEETRKLMPVCVETKAIVSTIQRKYKGIKIQEGVVDYGARFYFYTSKTTVASLIN 356 | TLNDLNETLVTMPLGYVTHGLNLEEAARYMRSLKVPATVSVSSPDAVTAYNGYLTSSS 357 | KTPEEHFIETISLAGSYKDWSYSGQSTQLGIEFLKRGDKSVYYTSNPTTFHLDGEVIT 358 | FDNLKTLLSLREVRTIKVFTTVDNINLHTQVVDMSMTYGQQFGPTYLDGADVTKIKPH 359 | NSHEGKTFYVLPNDDTLRVEAFEYYHTTDPSFLGRYMSALNHTKKWKYPQVNGLTSIK 360 | WADNNCYLATALLTLQQIELKFNPPALQDAYYRARAGEAANFCALILAYCNKTVGELG 361 | DVRETMSYLFQHANLDSCKRVLNVVCKTCGQQQTTLKGVEAVMYMGTLSYEQFKKGVQ 362 | IPCTCGKQATKYLVQQESPFVMMSAPPAQYELKHGTFTCASEYTGNYQCGHYKHITSK 363 | ETLYCIDGALLTKSSEYKGPITDVFYKENSYTTTIKPVTYKLDGVVCTEIDPKLDNYY 364 | KKDNSYFTEQPIDLVPNQPYPNASFDNFKFVCDNIKFADDLNQLTGYKKPASRELKVT 365 | FFPDLNGDVVAIDYKHYTPSFKKGAKLLHKPIVWHVNNATNKATYKPNTWCIRCLWST 366 | KPVETSNSFDVLKSEDAQGMDNLACEDLKPVSEEVVENPTIQKDVLECNVKTTEVVGD 367 | IILKPANNSLKITEEVGHTDLMAAYVDNSSLTIKKPNELSRVLGLKTLATHGLAAVNS 368 | VPWDTIANYAKPFLNKVVSTTTNIVTRCLNRVCTNYMPYFFTLLLQLCTFTRSTNSRI 369 | KASMPTTIAKNTVKSVGKFCLEASFNYLKSPNFSKLINIIIWFLLLSVCLGSLIYSTA 370 | ALGVLMSNLGMPSYCTGYREGYLNSTNVTIATYCTGSIPCSVCLSGLDSLDTYPSLET 371 | IQITISSFKWDLTAFGLVAEWFLAYILFTRFFYVLGLAAIMQLFFSYFAVHFISNSWL 372 | MWLIINLVQMAPISAMVRMYIFFASFYYVWKSYVHVVDGCNSSTCMMCYKRNRATRVE 373 | CTTIVNGVRRSFYVYANGGKGFCKLHNWNCVNCDTFCAGSTFISDEVARDLSLQFKRP 374 | INPTDQSSYIVDSVTVKNGSIHLYFDKAGQKTYERHSLSHFVNLDNLRANNTKGSLPI 375 | NVIVFDGKSKCEESSAKSASVYYSQLMCQPILLLDQALVSDVGDSAEVAVKMFDAYVN 376 | TFSSTFNVPMEKLKTLVATAEAELAKNVSLDNVLSTFISAARQGFVDSDVETKDVVEC 377 | LKLSHQSDIEVTGDSCNNYMLTYNKVENMTPRDLGACIDCSARHINAQVAKSHNIALI 378 | WNVKDFMSLSEQLRKQIRSAAKKNNLPFKLTCATTRQVVNVVTTKIALKGGKIVNNWL 379 | KQLIKVTLVFLFVAAIFYLITPVHVMSKHTDFSSEIIGYKAIDGGVTRDIASTDTCFA 380 | 
NKHADFDTWFSQRGGSYTNDKACPLIAAVITREVGFVVPGLPGTILRTTNGDFLHFLP 381 | RVFSAVGNICYTPSKLIEYTDFATSACVLAAECTIFKDASGKPVPYCYDTNVLEGSVA 382 | YESLRPDTRYVLMDGSIIQFPNTYLEGSVRVVTTFDSEYCRHGTCERSEAGVCVSTSG 383 | RWVLNNDYYRSLPGVFCGVDAVNLLTNMFTPLIQPIGALDISASIVAGGIVAIVVTCL 384 | AYYFMRFRRAFGEYSHVVAFNTLLFLMSFTVLCLTPVYSFLPGVYSVIYLYLTFYLTN 385 | DVSFLAHIQWMVMFTPLVPFWITIAYIICISTKHFYWFFSNYLKRRVVFNGVSFSTFE 386 | EAALCTFLLNKEMYLKLRSDVLLPLTQYNRYLALYNKYKYFSGAMDTTSYREAACCHL 387 | AKALNDFSNSGSDVLYQPPQTSITSAVLQSGFRKMAFPSGKVEGCMVQVTCGTTTLNG 388 | LWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVL 389 | KLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSC 390 | GSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVN 391 | VLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAV 392 | LDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQSAVKRTIKGTHHW 393 | LLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFVKHKHAFLCLFL 394 | LPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAVVLLILMTAR 395 | TVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVTTVMFLAR 396 | GIVFMCVEYCPIFFITGNTLQCIMLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYL 397 | VSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQSKMSDVKCTSVVL 398 | LSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKL 399 | CEEMLDNRATLQAIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAK 400 | SEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALN 401 | NIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDA 402 | DSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQNNELSPVALRQMSCAAGTTQTA 403 | CTDDNALAYYNTTKGGRFVLALLSDLQDLKWARFPKSDGTGTIYTELEPPCRFVTDTP 404 | KGPKVKYLYFIKGLNNLNRGMVLGSLAATVRLQAGNATEVPANSTVLSFCAFAVDAAK 405 | AYKDYLASGGQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDH 406 | PNPKGFCDLKGKYVQIPTTCANDPVGFTLKNTVCTVCGMWKGYGCSCDQLREPMLQSA 407 | DAQSFLNGFAV" 408 | mat_peptide 266..805 409 | /gene="ORF1ab" 410 | /locus_tag="GU280_gp01" 411 | /product="leader protein" 412 | /note="nsp1; produced by both pp1a and pp1ab" 413 | /protein_id="YP_009742608.1" 414 | mat_peptide 806..2719 415 | 
/gene="ORF1ab" 416 | /locus_tag="GU280_gp01" 417 | /product="nsp2" 418 | /note="produced by both pp1a and pp1ab" 419 | /protein_id="YP_009742609.1" 420 | mat_peptide 2720..8554 421 | /gene="ORF1ab" 422 | /locus_tag="GU280_gp01" 423 | /product="nsp3" 424 | /note="former nsp1; conserved domains are: N-terminal 425 | acidic (Ac), predicted phosphoesterase, papain-like 426 | proteinase, Y-domain, transmembrane domain 1 (TM1), 427 | adenosine diphosphate-ribose 1''-phosphatase (ADRP); 428 | produced by both pp1a and pp1ab" 429 | /protein_id="YP_009742610.1" 430 | mat_peptide 8555..10054 431 | /gene="ORF1ab" 432 | /locus_tag="GU280_gp01" 433 | /product="nsp4" 434 | /note="nsp4B_TM; contains transmembrane domain 2 (TM2); 435 | produced by both pp1a and pp1ab" 436 | /protein_id="YP_009742611.1" 437 | mat_peptide 10055..10972 438 | /gene="ORF1ab" 439 | /locus_tag="GU280_gp01" 440 | /product="3C-like proteinase" 441 | /note="nsp5A_3CLpro and nsp5B_3CLpro; main proteinase 442 | (Mpro); mediates cleavages downstream of nsp4. 
3D 443 | structure of the SARSr-CoV homolog has been determined 444 | (Yang et al., 2003); produced by both pp1a and pp1ab" 445 | /protein_id="YP_009742612.1" 446 | mat_peptide 10973..11842 447 | /gene="ORF1ab" 448 | /locus_tag="GU280_gp01" 449 | /product="nsp6" 450 | /note="nsp6_TM; putative transmembrane domain; produced by 451 | both pp1a and pp1ab" 452 | /protein_id="YP_009742613.1" 453 | mat_peptide 11843..12091 454 | /gene="ORF1ab" 455 | /locus_tag="GU280_gp01" 456 | /product="nsp7" 457 | /note="produced by both pp1a and pp1ab" 458 | /protein_id="YP_009742614.1" 459 | mat_peptide 12092..12685 460 | /gene="ORF1ab" 461 | /locus_tag="GU280_gp01" 462 | /product="nsp8" 463 | /note="produced by both pp1a and pp1ab" 464 | /protein_id="YP_009742615.1" 465 | mat_peptide 12686..13024 466 | /gene="ORF1ab" 467 | /locus_tag="GU280_gp01" 468 | /product="nsp9" 469 | /note="ssRNA-binding protein; produced by both pp1a and 470 | pp1ab" 471 | /protein_id="YP_009742616.1" 472 | mat_peptide 13025..13441 473 | /gene="ORF1ab" 474 | /locus_tag="GU280_gp01" 475 | /product="nsp10" 476 | /note="nsp10_CysHis; formerly known as growth-factor-like 477 | protein (GFL); produced by both pp1a and pp1ab" 478 | /protein_id="YP_009742617.1" 479 | mat_peptide 13442..13480 480 | /gene="ORF1ab" 481 | /locus_tag="GU280_gp01" 482 | /product="nsp11" 483 | /note="produced by pp1a only" 484 | /protein_id="YP_009725312.1" 485 | stem_loop 13476..13503 486 | /gene="ORF1ab" 487 | /locus_tag="GU280_gp01" 488 | /inference="COORDINATES: 489 | profile:Rfam-release-14.1:RF00507,Infernal:1.1.2" 490 | /function="Coronavirus frameshifting stimulation element 491 | stem-loop 1" 492 | stem_loop 13488..13542 493 | /gene="ORF1ab" 494 | /locus_tag="GU280_gp01" 495 | /inference="COORDINATES: 496 | profile:Rfam-release-14.1:RF00507,Infernal:1.1.2" 497 | /function="Coronavirus frameshifting stimulation element 498 | stem-loop 2" 499 | gene 21563..25384 500 | /gene="S" 501 | /locus_tag="GU280_gp02" 502 | 
/gene_synonym="spike glycoprotein" 503 | /db_xref="GeneID:43740568" 504 | CDS 21563..25384 505 | /gene="S" 506 | /locus_tag="GU280_gp02" 507 | /gene_synonym="spike glycoprotein" 508 | /note="structural protein; spike protein" 509 | /codon_start=1 510 | /product="surface glycoprotein" 511 | /protein_id="YP_009724390.1" 512 | /db_xref="GeneID:43740568" 513 | /translation="MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFR 514 | SSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIR 515 | GWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY 516 | SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ 517 | GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFL 518 | LKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITN 519 | LCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCF 520 | TNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYN 521 | YLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPY 522 | RVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFG 523 | RDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI 524 | HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPR 525 | RARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTM 526 | YICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFG 527 | GFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFN 528 | GLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQN 529 | VLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGA 530 | ISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMS 531 | ECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAH 532 | FPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELD 533 | SFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELG 534 | KYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSE 535 | PVLKGVKLHYT" 536 | gene 25393..26220 537 | /gene="ORF3a" 538 | /locus_tag="GU280_gp03" 539 | /db_xref="GeneID:43740569" 540 | CDS 25393..26220 541 | /gene="ORF3a" 542 | /locus_tag="GU280_gp03" 543 | 
/codon_start=1 544 | /product="ORF3a protein" 545 | /protein_id="YP_009724391.1" 546 | /db_xref="GeneID:43740569" 547 | /translation="MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG 548 | WLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLE 549 | APFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPY 550 | NSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQ 551 | LSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL" 552 | gene 26245..26472 553 | /gene="E" 554 | /locus_tag="GU280_gp04" 555 | /db_xref="GeneID:43740570" 556 | CDS 26245..26472 557 | /gene="E" 558 | /locus_tag="GU280_gp04" 559 | /note="ORF4; structural protein; E protein" 560 | /codon_start=1 561 | /product="envelope protein" 562 | /protein_id="YP_009724392.1" 563 | /db_xref="GeneID:43740570" 564 | /translation="MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCC 565 | NIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV" 566 | gene 26523..27191 567 | /gene="M" 568 | /locus_tag="GU280_gp05" 569 | /db_xref="GeneID:43740571" 570 | CDS 26523..27191 571 | /gene="M" 572 | /locus_tag="GU280_gp05" 573 | /note="ORF5; structural protein" 574 | /codon_start=1 575 | /product="membrane glycoprotein" 576 | /protein_id="YP_009724393.1" 577 | /db_xref="GeneID:43740571" 578 | /translation="MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNR 579 | FLYIIKLIFLWLLWPVTLACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRL 580 | FARTRSMWSFNPETNILLNVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGRCD 581 | IKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSSDNIA 582 | LLVQ" 583 | gene 27202..27387 584 | /gene="ORF6" 585 | /locus_tag="GU280_gp06" 586 | /db_xref="GeneID:43740572" 587 | CDS 27202..27387 588 | /gene="ORF6" 589 | /locus_tag="GU280_gp06" 590 | /codon_start=1 591 | /product="ORF6 protein" 592 | /protein_id="YP_009724394.1" 593 | /db_xref="GeneID:43740572" 594 | /translation="MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSL 595 | TENKYSQLDEEQPMEID" 596 | gene 27394..27759 597 | /gene="ORF7a" 598 | /locus_tag="GU280_gp07" 599 | /db_xref="GeneID:43740573" 600 | 
CDS 27394..27759 601 | /gene="ORF7a" 602 | /locus_tag="GU280_gp07" 603 | /codon_start=1 604 | /product="ORF7a protein" 605 | /protein_id="YP_009724395.1" 606 | /db_xref="GeneID:43740573" 607 | /translation="MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNS 608 | PFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFL 609 | IVAAIVFITLCFTLKRKTE" 610 | gene 27756..27887 611 | /gene="ORF7b" 612 | /locus_tag="GU280_gp08" 613 | /db_xref="GeneID:43740574" 614 | CDS 27756..27887 615 | /gene="ORF7b" 616 | /locus_tag="GU280_gp08" 617 | /codon_start=1 618 | /product="ORF7b" 619 | /protein_id="YP_009725318.1" 620 | /db_xref="GeneID:43740574" 621 | /translation="MIELSLIDFYLCFLAFLLFLVLIMLIIFWFSLELQDHNETCHA" 622 | gene 27894..28259 623 | /gene="ORF8" 624 | /locus_tag="GU280_gp09" 625 | /db_xref="GeneID:43740577" 626 | CDS 27894..28259 627 | /gene="ORF8" 628 | /locus_tag="GU280_gp09" 629 | /codon_start=1 630 | /product="ORF8 protein" 631 | /protein_id="YP_009724396.1" 632 | /db_xref="GeneID:43740577" 633 | /translation="MKFLVFLGIITTVAAFHQECSLQSCTQHQPYVVDDPCPIHFYSK 634 | WYIRVGARKSAPLIELCVDEAGSKSPIQYIDIGNYTVSCLPFTINCQEPKLGSLVVRC 635 | SFYEDFLEYHDVRVVLDFI" 636 | gene 28274..29533 637 | /gene="N" 638 | /locus_tag="GU280_gp10" 639 | /db_xref="GeneID:43740575" 640 | CDS 28274..29533 641 | /gene="N" 642 | /locus_tag="GU280_gp10" 643 | /note="ORF9; structural protein" 644 | /codon_start=1 645 | /product="nucleocapsid phosphoprotein" 646 | /protein_id="YP_009724397.2" 647 | /db_xref="GeneID:43740575" 648 | /translation="MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQG 649 | LPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMK 650 | DLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQ 651 | LPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPGSSRGTSPARMAGNGGDAA 652 | LALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGR 653 | RGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYT 654 | GAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTV 655 | TLLPAADLDDFSKQLQQSMSSADSTQA" 656 | 
gene 29558..29674 657 | /gene="ORF10" 658 | /locus_tag="GU280_gp11" 659 | /db_xref="GeneID:43740576" 660 | CDS 29558..29674 661 | /gene="ORF10" 662 | /locus_tag="GU280_gp11" 663 | /codon_start=1 664 | /product="ORF10 protein" 665 | /protein_id="YP_009725255.1" 666 | /db_xref="GeneID:43740576" 667 | /translation="MGYINVFAFPFTIYSLLLCRMNSRNYIAQVDVVNFNLT" 668 | stem_loop 29609..29644 669 | /gene="ORF10" 670 | /locus_tag="GU280_gp11" 671 | /inference="COORDINATES: 672 | profile::Rfam-release-14.1:RF00165,Infernal:1.1.2" 673 | /function="Coronavirus 3' UTR pseudoknot stem-loop 1" 674 | stem_loop 29629..29657 675 | /gene="ORF10" 676 | /locus_tag="GU280_gp11" 677 | /inference="COORDINATES: 678 | profile::Rfam-release-14.1:RF00165,Infernal:1.1.2" 679 | /function="Coronavirus 3' UTR pseudoknot stem-loop 2" 680 | 3'UTR 29675..29903 681 | stem_loop 29728..29768 682 | /inference="COORDINATES: 683 | profile:Rfam-release-14.1:RF00164,Infernal:1.1.2" 684 | /note="basepair exception: alignment to the Rfam model 685 | implies coordinates 29740:29758 form a noncanonical C:T 686 | basepair, but the homologous positions form a highly 687 | conserved C:G basepair in other viruses, including SARS 688 | (NC_004718.3)" 689 | /function="Coronavirus 3' stem-loop II-like motif (s2m)" 690 | ORIGIN 691 | 1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct 692 | 61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact 693 | 121 cacgcagtat aattaataac taattactgt cgttgacagg acacgagtaa ctcgtctatc 694 | 181 ttctgcaggc tgcttacggt ttcgtccgtg ttgcagccga tcatcagcac atctaggttt 695 | 241 cgtccgggtg tgaccgaaag gtaagatgga gagccttgtc cctggtttca acgagaaaac 696 | 301 acacgtccaa ctcagtttgc ctgttttaca ggttcgcgac gtgctcgtac gtggctttgg 697 | 361 agactccgtg gaggaggtct tatcagaggc acgtcaacat cttaaagatg gcacttgtgg 698 | 421 cttagtagaa gttgaaaaag gcgttttgcc tcaacttgaa cagccctatg tgttcatcaa 699 | 481 acgttcggat gctcgaactg cacctcatgg tcatgttatg gttgagctgg tagcagaact 700 | 541 cgaaggcatt 
cagtacggtc gtagtggtga gacacttggt gtccttgtcc ctcatgtggg 701 | 601 cgaaatacca gtggcttacc gcaaggttct tcttcgtaag aacggtaata aaggagctgg 702 | 661 tggccatagt tacggcgccg atctaaagtc atttgactta ggcgacgagc ttggcactga 703 | 721 tccttatgaa gattttcaag aaaactggaa cactaaacat agcagtggtg ttacccgtga 704 | 781 actcatgcgt gagcttaacg gaggggcata cactcgctat gtcgataaca acttctgtgg 705 | 841 ccctgatggc taccctcttg agtgcattaa agaccttcta gcacgtgctg gtaaagcttc 706 | 901 atgcactttg tccgaacaac tggactttat tgacactaag aggggtgtat actgctgccg 707 | 961 tgaacatgag catgaaattg cttggtacac ggaacgttct gaaaagagct atgaattgca 708 | 1021 gacacctttt gaaattaaat tggcaaagaa atttgacacc ttcaatgggg aatgtccaaa 709 | 1081 ttttgtattt cccttaaatt ccataatcaa gactattcaa ccaagggttg aaaagaaaaa 710 | 1141 gcttgatggc tttatgggta gaattcgatc tgtctatcca gttgcgtcac caaatgaatg 711 | 1201 caaccaaatg tgcctttcaa ctctcatgaa gtgtgatcat tgtggtgaaa cttcatggca 712 | 1261 gacgggcgat tttgttaaag ccacttgcga attttgtggc actgagaatt tgactaaaga 713 | 1321 aggtgccact acttgtggtt acttacccca aaatgctgtt gttaaaattt attgtccagc 714 | 1381 atgtcacaat tcagaagtag gacctgagca tagtcttgcc gaataccata atgaatctgg 715 | 1441 cttgaaaacc attcttcgta agggtggtcg cactattgcc tttggaggct gtgtgttctc 716 | 1501 ttatgttggt tgccataaca agtgtgccta ttgggttcca cgtgctagcg ctaacatagg 717 | 1561 ttgtaaccat acaggtgttg ttggagaagg ttccgaaggt cttaatgaca accttcttga 718 | 1621 aatactccaa aaagagaaag tcaacatcaa tattgttggt gactttaaac ttaatgaaga 719 | 1681 gatcgccatt attttggcat ctttttctgc ttccacaagt gcttttgtgg aaactgtgaa 720 | 1741 aggtttggat tataaagcat tcaaacaaat tgttgaatcc tgtggtaatt ttaaagttac 721 | 1801 aaaaggaaaa gctaaaaaag gtgcctggaa tattggtgaa cagaaatcaa tactgagtcc 722 | 1861 tctttatgca tttgcatcag aggctgctcg tgttgtacga tcaattttct cccgcactct 723 | 1921 tgaaactgct caaaattctg tgcgtgtttt acagaaggcc gctataacaa tactagatgg 724 | 1981 aatttcacag tattcactga gactcattga tgctatgatg ttcacatctg atttggctac 725 | 2041 taacaatcta gttgtaatgg cctacattac aggtggtgtt gttcagttga cttcgcagtg 726 | 2101 gctaactaac 
atctttggca ctgtttatga aaaactcaaa cccgtccttg attggcttga 727 | 2161 agagaagttt aaggaaggtg tagagtttct tagagacggt tgggaaattg ttaaatttat 728 | 2221 ctcaacctgt gcttgtgaaa ttgtcggtgg acaaattgtc acctgtgcaa aggaaattaa 729 | 2281 ggagagtgtt cagacattct ttaagcttgt aaataaattt ttggctttgt gtgctgactc 730 | 2341 tatcattatt ggtggagcta aacttaaagc cttgaattta ggtgaaacat ttgtcacgca 731 | 2401 ctcaaaggga ttgtacagaa agtgtgttaa atccagagaa gaaactggcc tactcatgcc 732 | 2461 tctaaaagcc ccaaaagaaa ttatcttctt agagggagaa acacttccca cagaagtgtt 733 | 2521 aacagaggaa gttgtcttga aaactggtga tttacaacca ttagaacaac ctactagtga 734 | 2581 agctgttgaa gctccattgg ttggtacacc agtttgtatt aacgggctta tgttgctcga 735 | 2641 aatcaaagac acagaaaagt actgtgccct tgcacctaat atgatggtaa caaacaatac 736 | 2701 cttcacactc aaaggcggtg caccaacaaa ggttactttt ggtgatgaca ctgtgataga 737 | 2761 agtgcaaggt tacaagagtg tgaatatcac ttttgaactt gatgaaagga ttgataaagt 738 | 2821 acttaatgag aagtgctctg cctatacagt tgaactcggt acagaagtaa atgagttcgc 739 | 2881 ctgtgttgtg gcagatgctg tcataaaaac tttgcaacca gtatctgaat tacttacacc 740 | 2941 actgggcatt gatttagatg agtggagtat ggctacatac tacttatttg atgagtctgg 741 | 3001 tgagtttaaa ttggcttcac atatgtattg ttctttctac cctccagatg aggatgaaga 742 | 3061 agaaggtgat tgtgaagaag aagagtttga gccatcaact caatatgagt atggtactga 743 | 3121 agatgattac caaggtaaac ctttggaatt tggtgccact tctgctgctc ttcaacctga 744 | 3181 agaagagcaa gaagaagatt ggttagatga tgatagtcaa caaactgttg gtcaacaaga 745 | 3241 cggcagtgag gacaatcaga caactactat tcaaacaatt gttgaggttc aacctcaatt 746 | 3301 agagatggaa cttacaccag ttgttcagac tattgaagtg aatagtttta gtggttattt 747 | 3361 aaaacttact gacaatgtat acattaaaaa tgcagacatt gtggaagaag ctaaaaaggt 748 | 3421 aaaaccaaca gtggttgtta atgcagccaa tgtttacctt aaacatggag gaggtgttgc 749 | 3481 aggagcctta aataaggcta ctaacaatgc catgcaagtt gaatctgatg attacatagc 750 | 3541 tactaatgga ccacttaaag tgggtggtag ttgtgtttta agcggacaca atcttgctaa 751 | 3601 acactgtctt catgttgtcg gcccaaatgt taacaaaggt gaagacattc aacttcttaa 752 | 3661 
gagtgcttat gaaaatttta atcagcacga agttctactt gcaccattat tatcagctgg 753 | 3721 tatttttggt gctgacccta tacattcttt aagagtttgt gtagatactg ttcgcacaaa 754 | 3781 tgtctactta gctgtctttg ataaaaatct ctatgacaaa cttgtttcaa gctttttgga 755 | 3841 aatgaagagt gaaaagcaag ttgaacaaaa gatcgctgag attcctaaag aggaagttaa 756 | 3901 gccatttata actgaaagta aaccttcagt tgaacagaga aaacaagatg ataagaaaat 757 | 3961 caaagcttgt gttgaagaag ttacaacaac tctggaagaa actaagttcc tcacagaaaa 758 | 4021 cttgttactt tatattgaca ttaatggcaa tcttcatcca gattctgcca ctcttgttag 759 | 4081 tgacattgac atcactttct taaagaaaga tgctccatat atagtgggtg atgttgttca 760 | 4141 agagggtgtt ttaactgctg tggttatacc tactaaaaag gctggtggca ctactgaaat 761 | 4201 gctagcgaaa gctttgagaa aagtgccaac agacaattat ataaccactt acccgggtca 762 | 4261 gggtttaaat ggttacactg tagaggaggc aaagacagtg cttaaaaagt gtaaaagtgc 763 | 4321 cttttacatt ctaccatcta ttatctctaa tgagaagcaa gaaattcttg gaactgtttc 764 | 4381 ttggaatttg cgagaaatgc ttgcacatgc agaagaaaca cgcaaattaa tgcctgtctg 765 | 4441 tgtggaaact aaagccatag tttcaactat acagcgtaaa tataagggta ttaaaataca 766 | 4501 agagggtgtg gttgattatg gtgctagatt ttacttttac accagtaaaa caactgtagc 767 | 4561 gtcacttatc aacacactta acgatctaaa tgaaactctt gttacaatgc cacttggcta 768 | 4621 tgtaacacat ggcttaaatt tggaagaagc tgctcggtat atgagatctc tcaaagtgcc 769 | 4681 agctacagtt tctgtttctt cacctgatgc tgttacagcg tataatggtt atcttacttc 770 | 4741 ttcttctaaa acacctgaag aacattttat tgaaaccatc tcacttgctg gttcctataa 771 | 4801 agattggtcc tattctggac aatctacaca actaggtata gaatttctta agagaggtga 772 | 4861 taaaagtgta tattacacta gtaatcctac cacattccac ctagatggtg aagttatcac 773 | 4921 ctttgacaat cttaagacac ttctttcttt gagagaagtg aggactatta aggtgtttac 774 | 4981 aacagtagac aacattaacc tccacacgca agttgtggac atgtcaatga catatggaca 775 | 5041 acagtttggt ccaacttatt tggatggagc tgatgttact aaaataaaac ctcataattc 776 | 5101 acatgaaggt aaaacatttt atgttttacc taatgatgac actctacgtg ttgaggcttt 777 | 5161 tgagtactac cacacaactg atcctagttt tctgggtagg tacatgtcag cattaaatca 778 | 
5221 cactaaaaag tggaaatacc cacaagttaa tggtttaact tctattaaat gggcagataa 779 | 5281 caactgttat cttgccactg cattgttaac actccaacaa atagagttga agtttaatcc 780 | 5341 acctgctcta caagatgctt attacagagc aagggctggt gaagctgcta acttttgtgc 781 | 5401 acttatctta gcctactgta ataagacagt aggtgagtta ggtgatgtta gagaaacaat 782 | 5461 gagttacttg tttcaacatg ccaatttaga ttcttgcaaa agagtcttga acgtggtgtg 783 | 5521 taaaacttgt ggacaacagc agacaaccct taagggtgta gaagctgtta tgtacatggg 784 | 5581 cacactttct tatgaacaat ttaagaaagg tgttcagata ccttgtacgt gtggtaaaca 785 | 5641 agctacaaaa tatctagtac aacaggagtc accttttgtt atgatgtcag caccacctgc 786 | 5701 tcagtatgaa cttaagcatg gtacatttac ttgtgctagt gagtacactg gtaattacca 787 | 5761 gtgtggtcac tataaacata taacttctaa agaaactttg tattgcatag acggtgcttt 788 | 5821 acttacaaag tcctcagaat acaaaggtcc tattacggat gttttctaca aagaaaacag 789 | 5881 ttacacaaca accataaaac cagttactta taaattggat ggtgttgttt gtacagaaat 790 | 5941 tgaccctaag ttggacaatt attataagaa agacaattct tatttcacag agcaaccaat 791 | 6001 tgatcttgta ccaaaccaac catatccaaa cgcaagcttc gataatttta agtttgtatg 792 | 6061 tgataatatc aaatttgctg atgatttaaa ccagttaact ggttataaga aacctgcttc 793 | 6121 aagagagctt aaagttacat ttttccctga cttaaatggt gatgtggtgg ctattgatta 794 | 6181 taaacactac acaccctctt ttaagaaagg agctaaattg ttacataaac ctattgtttg 795 | 6241 gcatgttaac aatgcaacta ataaagccac gtataaacca aatacctggt gtatacgttg 796 | 6301 tctttggagc acaaaaccag ttgaaacatc aaattcgttt gatgtactga agtcagagga 797 | 6361 cgcgcaggga atggataatc ttgcctgcga agatctaaaa ccagtctctg aagaagtagt 798 | 6421 ggaaaatcct accatacaga aagacgttct tgagtgtaat gtgaaaacta ccgaagttgt 799 | 6481 aggagacatt atacttaaac cagcaaataa tagtttaaaa attacagaag aggttggcca 800 | 6541 cacagatcta atggctgctt atgtagacaa ttctagtctt actattaaga aacctaatga 801 | 6601 attatctaga gtattaggtt tgaaaaccct tgctactcat ggtttagctg ctgttaatag 802 | 6661 tgtcccttgg gatactatag ctaattatgc taagcctttt cttaacaaag ttgttagtac 803 | 6721 aactactaac atagttacac ggtgtttaaa ccgtgtttgt actaattata tgccttattt 804 
| 6781 ctttacttta ttgctacaat tgtgtacttt tactagaagt acaaattcta gaattaaagc 805 | 6841 atctatgccg actactatag caaagaatac tgttaagagt gtcggtaaat tttgtctaga 806 | 6901 ggcttcattt aattatttga agtcacctaa tttttctaaa ctgataaata ttataatttg 807 | 6961 gtttttacta ttaagtgttt gcctaggttc tttaatctac tcaaccgctg ctttaggtgt 808 | 7021 tttaatgtct aatttaggca tgccttctta ctgtactggt tacagagaag gctatttgaa 809 | 7081 ctctactaat gtcactattg caacctactg tactggttct ataccttgta gtgtttgtct 810 | 7141 tagtggttta gattctttag acacctatcc ttctttagaa actatacaaa ttaccatttc 811 | 7201 atcttttaaa tgggatttaa ctgcttttgg cttagttgca gagtggtttt tggcatatat 812 | 7261 tcttttcact aggtttttct atgtacttgg attggctgca atcatgcaat tgtttttcag 813 | 7321 ctattttgca gtacatttta ttagtaattc ttggcttatg tggttaataa ttaatcttgt 814 | 7381 acaaatggcc ccgatttcag ctatggttag aatgtacatc ttctttgcat cattttatta 815 | 7441 tgtatggaaa agttatgtgc atgttgtaga cggttgtaat tcatcaactt gtatgatgtg 816 | 7501 ttacaaacgt aatagagcaa caagagtcga atgtacaact attgttaatg gtgttagaag 817 | 7561 gtccttttat gtctatgcta atggaggtaa aggcttttgc aaactacaca attggaattg 818 | 7621 tgttaattgt gatacattct gtgctggtag tacatttatt agtgatgaag ttgcgagaga 819 | 7681 cttgtcacta cagtttaaaa gaccaataaa tcctactgac cagtcttctt acatcgttga 820 | 7741 tagtgttaca gtgaagaatg gttccatcca tctttacttt gataaagctg gtcaaaagac 821 | 7801 ttatgaaaga cattctctct ctcattttgt taacttagac aacctgagag ctaataacac 822 | 7861 taaaggttca ttgcctatta atgttatagt ttttgatggt aaatcaaaat gtgaagaatc 823 | 7921 atctgcaaaa tcagcgtctg tttactacag tcagcttatg tgtcaaccta tactgttact 824 | 7981 agatcaggca ttagtgtctg atgttggtga tagtgcggaa gttgcagtta aaatgtttga 825 | 8041 tgcttacgtt aatacgtttt catcaacttt taacgtacca atggaaaaac tcaaaacact 826 | 8101 agttgcaact gcagaagctg aacttgcaaa gaatgtgtcc ttagacaatg tcttatctac 827 | 8161 ttttatttca gcagctcggc aagggtttgt tgattcagat gtagaaacta aagatgttgt 828 | 8221 tgaatgtctt aaattgtcac atcaatctga catagaagtt actggcgata gttgtaataa 829 | 8281 ctatatgctc acctataaca aagttgaaaa catgacaccc cgtgaccttg gtgcttgtat 
830 | 8341 tgactgtagt gcgcgtcata ttaatgcgca ggtagcaaaa agtcacaaca ttgctttgat 831 | 8401 atggaacgtt aaagatttca tgtcattgtc tgaacaacta cgaaaacaaa tacgtagtgc 832 | 8461 tgctaaaaag aataacttac cttttaagtt gacatgtgca actactagac aagttgttaa 833 | 8521 tgttgtaaca acaaagatag cacttaaggg tggtaaaatt gttaataatt ggttgaagca 834 | 8581 gttaattaaa gttacacttg tgttcctttt tgttgctgct attttctatt taataacacc 835 | 8641 tgttcatgtc atgtctaaac atactgactt ttcaagtgaa atcataggat acaaggctat 836 | 8701 tgatggtggt gtcactcgtg acatagcatc tacagatact tgttttgcta acaaacatgc 837 | 8761 tgattttgac acatggttta gccagcgtgg tggtagttat actaatgaca aagcttgccc 838 | 8821 attgattgct gcagtcataa caagagaagt gggttttgtc gtgcctggtt tgcctggcac 839 | 8881 gatattacgc acaactaatg gtgacttttt gcatttctta cctagagttt ttagtgcagt 840 | 8941 tggtaacatc tgttacacac catcaaaact tatagagtac actgactttg caacatcagc 841 | 9001 ttgtgttttg gctgctgaat gtacaatttt taaagatgct tctggtaagc cagtaccata 842 | 9061 ttgttatgat accaatgtac tagaaggttc tgttgcttat gaaagtttac gccctgacac 843 | 9121 acgttatgtg ctcatggatg gctctattat tcaatttcct aacacctacc ttgaaggttc 844 | 9181 tgttagagtg gtaacaactt ttgattctga gtactgtagg cacggcactt gtgaaagatc 845 | 9241 agaagctggt gtttgtgtat ctactagtgg tagatgggta cttaacaatg attattacag 846 | 9301 atctttacca ggagttttct gtggtgtaga tgctgtaaat ttacttacta atatgtttac 847 | 9361 accactaatt caacctattg gtgctttgga catatcagca tctatagtag ctggtggtat 848 | 9421 tgtagctatc gtagtaacat gccttgccta ctattttatg aggtttagaa gagcttttgg 849 | 9481 tgaatacagt catgtagttg cctttaatac tttactattc cttatgtcat tcactgtact 850 | 9541 ctgtttaaca ccagtttact cattcttacc tggtgtttat tctgttattt acttgtactt 851 | 9601 gacattttat cttactaatg atgtttcttt tttagcacat attcagtgga tggttatgtt 852 | 9661 cacaccttta gtacctttct ggataacaat tgcttatatc atttgtattt ccacaaagca 853 | 9721 tttctattgg ttctttagta attacctaaa gagacgtgta gtctttaatg gtgtttcctt 854 | 9781 tagtactttt gaagaagctg cgctgtgcac ctttttgtta aataaagaaa tgtatctaaa 855 | 9841 gttgcgtagt gatgtgctat tacctcttac gcaatataat agatacttag 
ctctttataa 856 | 9901 taagtacaag tattttagtg gagcaatgga tacaactagc tacagagaag ctgcttgttg 857 | 9961 tcatctcgca aaggctctca atgacttcag taactcaggt tctgatgttc tttaccaacc 858 | 10021 accacaaacc tctatcacct cagctgtttt gcagagtggt tttagaaaaa tggcattccc 859 | 10081 atctggtaaa gttgagggtt gtatggtaca agtaacttgt ggtacaacta cacttaacgg 860 | 10141 tctttggctt gatgacgtag tttactgtcc aagacatgtg atctgcacct ctgaagacat 861 | 10201 gcttaaccct aattatgaag atttactcat tcgtaagtct aatcataatt tcttggtaca 862 | 10261 ggctggtaat gttcaactca gggttattgg acattctatg caaaattgtg tacttaagct 863 | 10321 taaggttgat acagccaatc ctaagacacc taagtataag tttgttcgca ttcaaccagg 864 | 10381 acagactttt tcagtgttag cttgttacaa tggttcacca tctggtgttt accaatgtgc 865 | 10441 tatgaggccc aatttcacta ttaagggttc attccttaat ggttcatgtg gtagtgttgg 866 | 10501 ttttaacata gattatgact gtgtctcttt ttgttacatg caccatatgg aattaccaac 867 | 10561 tggagttcat gctggcacag acttagaagg taacttttat ggaccttttg ttgacaggca 868 | 10621 aacagcacaa gcagctggta cggacacaac tattacagtt aatgttttag cttggttgta 869 | 10681 cgctgctgtt ataaatggag acaggtggtt tctcaatcga tttaccacaa ctcttaatga 870 | 10741 ctttaacctt gtggctatga agtacaatta tgaacctcta acacaagacc atgttgacat 871 | 10801 actaggacct ctttctgctc aaactggaat tgccgtttta gatatgtgtg cttcattaaa 872 | 10861 agaattactg caaaatggta tgaatggacg taccatattg ggtagtgctt tattagaaga 873 | 10921 tgaatttaca ccttttgatg ttgttagaca atgctcaggt gttactttcc aaagtgcagt 874 | 10981 gaaaagaaca atcaagggta cacaccactg gttgttactc acaattttga cttcactttt 875 | 11041 agttttagtc cagagtactc aatggtcttt gttctttttt ttgtatgaaa atgccttttt 876 | 11101 accttttgct atgggtatta ttgctatgtc tgcttttgca atgatgtttg tcaaacataa 877 | 11161 gcatgcattt ctctgtttgt ttttgttacc ttctcttgcc actgtagctt attttaatat 878 | 11221 ggtctatatg cctgctagtt gggtgatgcg tattatgaca tggttggata tggttgatac 879 | 11281 tagtttgtct ggttttaagc taaaagactg tgttatgtat gcatcagctg tagtgttact 880 | 11341 aatccttatg acagcaagaa ctgtgtatga tgatggtgct aggagagtgt ggacacttat 881 | 11401 gaatgtcttg acactcgttt 
ataaagttta ttatggtaat gctttagatc aagccatttc 882 | 11461 catgtgggct cttataatct ctgttacttc taactactca ggtgtagtta caactgtcat 883 | 11521 gtttttggcc agaggtattg tttttatgtg tgttgagtat tgccctattt tcttcataac 884 | 11581 tggtaataca cttcagtgta taatgctagt ttattgtttc ttaggctatt tttgtacttg 885 | 11641 ttactttggc ctcttttgtt tactcaaccg ctactttaga ctgactcttg gtgtttatga 886 | 11701 ttacttagtt tctacacagg agtttagata tatgaattca cagggactac tcccacccaa 887 | 11761 gaatagcata gatgccttca aactcaacat taaattgttg ggtgttggtg gcaaaccttg 888 | 11821 tatcaaagta gccactgtac agtctaaaat gtcagatgta aagtgcacat cagtagtctt 889 | 11881 actctcagtt ttgcaacaac tcagagtaga atcatcatct aaattgtggg ctcaatgtgt 890 | 11941 ccagttacac aatgacattc tcttagctaa agatactact gaagcctttg aaaaaatggt 891 | 12001 ttcactactt tctgttttgc tttccatgca gggtgctgta gacataaaca agctttgtga 892 | 12061 agaaatgctg gacaacaggg caaccttaca agctatagcc tcagagttta gttcccttcc 893 | 12121 atcatatgca gcttttgcta ctgctcaaga agcttatgag caggctgttg ctaatggtga 894 | 12181 ttctgaagtt gttcttaaaa agttgaagaa gtctttgaat gtggctaaat ctgaatttga 895 | 12241 ccgtgatgca gccatgcaac gtaagttgga aaagatggct gatcaagcta tgacccaaat 896 | 12301 gtataaacag gctagatctg aggacaagag ggcaaaagtt actagtgcta tgcagacaat 897 | 12361 gcttttcact atgcttagaa agttggataa tgatgcactc aacaacatta tcaacaatgc 898 | 12421 aagagatggt tgtgttccct tgaacataat acctcttaca acagcagcca aactaatggt 899 | 12481 tgtcatacca gactataaca catataaaaa tacgtgtgat ggtacaacat ttacttatgc 900 | 12541 atcagcattg tgggaaatcc aacaggttgt agatgcagat agtaaaattg ttcaacttag 901 | 12601 tgaaattagt atggacaatt cacctaattt agcatggcct cttattgtaa cagctttaag 902 | 12661 ggccaattct gctgtcaaat tacagaataa tgagcttagt cctgttgcac tacgacagat 903 | 12721 gtcttgtgct gccggtacta cacaaactgc ttgcactgat gacaatgcgt tagcttacta 904 | 12781 caacacaaca aagggaggta ggtttgtact tgcactgtta tccgatttac aggatttgaa 905 | 12841 atgggctaga ttccctaaga gtgatggaac tggtactatc tatacagaac tggaaccacc 906 | 12901 ttgtaggttt gttacagaca cacctaaagg tcctaaagtg aagtatttat actttattaa 907 | 
12961 aggattaaac aacctaaata gaggtatggt acttggtagt ttagctgcca cagtacgtct 908 | 13021 acaagctggt aatgcaacag aagtgcctgc caattcaact gtattatctt tctgtgcttt 909 | 13081 tgctgtagat gctgctaaag cttacaaaga ttatctagct agtgggggac aaccaatcac 910 | 13141 taattgtgtt aagatgttgt gtacacacac tggtactggt caggcaataa cagttacacc 911 | 13201 ggaagccaat atggatcaag aatcctttgg tggtgcatcg tgttgtctgt actgccgttg 912 | 13261 ccacatagat catccaaatc ctaaaggatt ttgtgactta aaaggtaagt atgtacaaat 913 | 13321 acctacaact tgtgctaatg accctgtggg ttttacactt aaaaacacag tctgtaccgt 914 | 13381 ctgcggtatg tggaaaggtt atggctgtag ttgtgatcaa ctccgcgaac ccatgcttca 915 | 13441 gtcagctgat gcacaatcgt ttttaaacgg gtttgcggtg taagtgcagc ccgtcttaca 916 | 13501 ccgtgcggca caggcactag tactgatgtc gtatacaggg cttttgacat ctacaatgat 917 | 13561 aaagtagctg gttttgctaa attcctaaaa actaattgtt gtcgcttcca agaaaaggac 918 | 13621 gaagatgaca atttaattga ttcttacttt gtagttaaga gacacacttt ctctaactac 919 | 13681 caacatgaag aaacaattta taatttactt aaggattgtc cagctgttgc taaacatgac 920 | 13741 ttctttaagt ttagaataga cggtgacatg gtaccacata tatcacgtca acgtcttact 921 | 13801 aaatacacaa tggcagacct cgtctatgct ttaaggcatt ttgatgaagg taattgtgac 922 | 13861 acattaaaag aaatacttgt cacatacaat tgttgtgatg atgattattt caataaaaag 923 | 13921 gactggtatg attttgtaga aaacccagat atattacgcg tatacgccaa cttaggtgaa 924 | 13981 cgtgtacgcc aagctttgtt aaaaacagta caattctgtg atgccatgcg aaatgctggt 925 | 14041 attgttggtg tactgacatt agataatcaa gatctcaatg gtaactggta tgatttcggt 926 | 14101 gatttcatac aaaccacgcc aggtagtgga gttcctgttg tagattctta ttattcattg 927 | 14161 ttaatgccta tattaacctt gaccagggct ttaactgcag agtcacatgt tgacactgac 928 | 14221 ttaacaaagc cttacattaa gtgggatttg ttaaaatatg acttcacgga agagaggtta 929 | 14281 aaactctttg accgttattt taaatattgg gatcagacat accacccaaa ttgtgttaac 930 | 14341 tgtttggatg acagatgcat tctgcattgt gcaaacttta atgttttatt ctctacagtg 931 | 14401 ttcccaccta caagttttgg accactagtg agaaaaatat ttgttgatgg tgttccattt 932 | 14461 gtagtttcaa ctggatacca cttcagagag ctaggtgttg 
tacataatca ggatgtaaac 933 | 14521 ttacatagct ctagacttag ttttaaggaa ttacttgtgt atgctgctga ccctgctatg 934 | 14581 cacgctgctt ctggtaatct attactagat aaacgcacta cgtgcttttc agtagctgca 935 | 14641 cttactaaca atgttgcttt tcaaactgtc aaacccggta attttaacaa agacttctat 936 | 14701 gactttgctg tgtctaaggg tttctttaag gaaggaagtt ctgttgaatt aaaacacttc 937 | 14761 ttctttgctc aggatggtaa tgctgctatc agcgattatg actactatcg ttataatcta 938 | 14821 ccaacaatgt gtgatatcag acaactacta tttgtagttg aagttgttga taagtacttt 939 | 14881 gattgttacg atggtggctg tattaatgct aaccaagtca tcgtcaacaa cctagacaaa 940 | 14941 tcagctggtt ttccatttaa taaatggggt aaggctagac tttattatga ttcaatgagt 941 | 15001 tatgaggatc aagatgcact tttcgcatat acaaaacgta atgtcatccc tactataact 942 | 15061 caaatgaatc ttaagtatgc cattagtgca aagaatagag ctcgcaccgt agctggtgtc 943 | 15121 tctatctgta gtactatgac caatagacag tttcatcaaa aattattgaa atcaatagcc 944 | 15181 gccactagag gagctactgt agtaattgga acaagcaaat tctatggtgg ttggcacaac 945 | 15241 atgttaaaaa ctgtttatag tgatgtagaa aaccctcacc ttatgggttg ggattatcct 946 | 15301 aaatgtgata gagccatgcc taacatgctt agaattatgg cctcacttgt tcttgctcgc 947 | 15361 aaacatacaa cgtgttgtag cttgtcacac cgtttctata gattagctaa tgagtgtgct 948 | 15421 caagtattga gtgaaatggt catgtgtggc ggttcactat atgttaaacc aggtggaacc 949 | 15481 tcatcaggag atgccacaac tgcttatgct aatagtgttt ttaacatttg tcaagctgtc 950 | 15541 acggccaatg ttaatgcact tttatctact gatggtaaca aaattgccga taagtatgtc 951 | 15601 cgcaatttac aacacagact ttatgagtgt ctctatagaa atagagatgt tgacacagac 952 | 15661 tttgtgaatg agttttacgc atatttgcgt aaacatttct caatgatgat actctctgac 953 | 15721 gatgctgttg tgtgtttcaa tagcacttat gcatctcaag gtctagtggc tagcataaag 954 | 15781 aactttaagt cagttcttta ttatcaaaac aatgttttta tgtctgaagc aaaatgttgg 955 | 15841 actgagactg accttactaa aggacctcat gaattttgct ctcaacatac aatgctagtt 956 | 15901 aaacagggtg atgattatgt gtaccttcct tacccagatc catcaagaat cctaggggcc 957 | 15961 ggctgttttg tagatgatat cgtaaaaaca gatggtacac ttatgattga acggttcgtg 958 | 16021 tctttagcta 
tagatgctta cccacttact aaacatccta atcaggagta tgctgatgtc 959 | 16081 tttcatttgt acttacaata cataagaaag ctacatgatg agttaacagg acacatgtta 960 | 16141 gacatgtatt ctgttatgct tactaatgat aacacttcaa ggtattggga acctgagttt 961 | 16201 tatgaggcta tgtacacacc gcatacagtc ttacaggctg ttggggcttg tgttctttgc 962 | 16261 aattcacaga cttcattaag atgtggtgct tgcatacgta gaccattctt atgttgtaaa 963 | 16321 tgctgttacg accatgtcat atcaacatca cataaattag tcttgtctgt taatccgtat 964 | 16381 gtttgcaatg ctccaggttg tgatgtcaca gatgtgactc aactttactt aggaggtatg 965 | 16441 agctattatt gtaaatcaca taaaccaccc attagttttc cattgtgtgc taatggacaa 966 | 16501 gtttttggtt tatataaaaa tacatgtgtt ggtagcgata atgttactga ctttaatgca 967 | 16561 attgcaacat gtgactggac aaatgctggt gattacattt tagctaacac ctgtactgaa 968 | 16621 agactcaagc tttttgcagc agaaacgctc aaagctactg aggagacatt taaactgtct 969 | 16681 tatggtattg ctactgtacg tgaagtgctg tctgacagag aattacatct ttcatgggaa 970 | 16741 gttggtaaac ctagaccacc acttaaccga aattatgtct ttactggtta tcgtgtaact 971 | 16801 aaaaacagta aagtacaaat aggagagtac acctttgaaa aaggtgacta tggtgatgct 972 | 16861 gttgtttacc gaggtacaac aacttacaaa ttaaatgttg gtgattattt tgtgctgaca 973 | 16921 tcacatacag taatgccatt aagtgcacct acactagtgc cacaagagca ctatgttaga 974 | 16981 attactggct tatacccaac actcaatatc tcagatgagt tttctagcaa tgttgcaaat 975 | 17041 tatcaaaagg ttggtatgca aaagtattct acactccagg gaccacctgg tactggtaag 976 | 17101 agtcattttg ctattggcct agctctctac tacccttctg ctcgcatagt gtatacagct 977 | 17161 tgctctcatg ccgctgttga tgcactatgt gagaaggcat taaaatattt gcctatagat 978 | 17221 aaatgtagta gaattatacc tgcacgtgct cgtgtagagt gttttgataa attcaaagtg 979 | 17281 aattcaacat tagaacagta tgtcttttgt actgtaaatg cattgcctga gacgacagca 980 | 17341 gatatagttg tctttgatga aatttcaatg gccacaaatt atgatttgag tgttgtcaat 981 | 17401 gccagattac gtgctaagca ctatgtgtac attggcgacc ctgctcaatt acctgcacca 982 | 17461 cgcacattgc taactaaggg cacactagaa ccagaatatt tcaattcagt gtgtagactt 983 | 17521 atgaaaacta taggtccaga catgttcctc ggaacttgtc ggcgttgtcc 
tgctgaaatt 984 | 17581 gttgacactg tgagtgcttt ggtttatgat aataagctta aagcacataa agacaaatca 985 | 17641 gctcaatgct ttaaaatgtt ttataagggt gttatcacgc atgatgtttc atctgcaatt 986 | 17701 aacaggccac aaataggcgt ggtaagagaa ttccttacac gtaaccctgc ttggagaaaa 987 | 17761 gctgtcttta tttcacctta taattcacag aatgctgtag cctcaaagat tttgggacta 988 | 17821 ccaactcaaa ctgttgattc atcacagggc tcagaatatg actatgtcat attcactcaa 989 | 17881 accactgaaa cagctcactc ttgtaatgta aacagattta atgttgctat taccagagca 990 | 17941 aaagtaggca tactttgcat aatgtctgat agagaccttt atgacaagtt gcaatttaca 991 | 18001 agtcttgaaa ttccacgtag gaatgtggca actttacaag ctgaaaatgt aacaggactc 992 | 18061 tttaaagatt gtagtaaggt aatcactggg ttacatccta cacaggcacc tacacacctc 993 | 18121 agtgttgaca ctaaattcaa aactgaaggt ttatgtgttg acatacctgg catacctaag 994 | 18181 gacatgacct atagaagact catctctatg atgggtttta aaatgaatta tcaagttaat 995 | 18241 ggttacccta acatgtttat cacccgcgaa gaagctataa gacatgtacg tgcatggatt 996 | 18301 ggcttcgatg tcgaggggtg tcatgctact agagaagctg ttggtaccaa tttaccttta 997 | 18361 cagctaggtt tttctacagg tgttaaccta gttgctgtac ctacaggtta tgttgataca 998 | 18421 cctaataata cagatttttc cagagttagt gctaaaccac cgcctggaga tcaatttaaa 999 | 18481 cacctcatac cacttatgta caaaggactt ccttggaatg tagtgcgtat aaagattgta 1000 | 18541 caaatgttaa gtgacacact taaaaatctc tctgacagag tcgtatttgt cttatgggca 1001 | 18601 catggctttg agttgacatc tatgaagtat tttgtgaaaa taggacctga gcgcacctgt 1002 | 18661 tgtctatgtg atagacgtgc cacatgcttt tccactgctt cagacactta tgcctgttgg 1003 | 18721 catcattcta ttggatttga ttacgtctat aatccgttta tgattgatgt tcaacaatgg 1004 | 18781 ggttttacag gtaacctaca aagcaaccat gatctgtatt gtcaagtcca tggtaatgca 1005 | 18841 catgtagcta gttgtgatgc aatcatgact aggtgtctag ctgtccacga gtgctttgtt 1006 | 18901 aagcgtgttg actggactat tgaatatcct ataattggtg atgaactgaa gattaatgcg 1007 | 18961 gcttgtagaa aggttcaaca catggttgtt aaagctgcat tattagcaga caaattccca 1008 | 19021 gttcttcacg acattggtaa ccctaaagct attaagtgtg tacctcaagc tgatgtagaa 1009 | 19081 tggaagttct 
atgatgcaca gccttgtagt gacaaagctt ataaaataga agaattattc 1010 | 19141 tattcttatg ccacacattc tgacaaattc acagatggtg tatgcctatt ttggaattgc 1011 | 19201 aatgtcgata gatatcctgc taattccatt gtttgtagat ttgacactag agtgctatct 1012 | 19261 aaccttaact tgcctggttg tgatggtggc agtttgtatg taaataaaca tgcattccac 1013 | 19321 acaccagctt ttgataaaag tgcttttgtt aatttaaaac aattaccatt tttctattac 1014 | 19381 tctgacagtc catgtgagtc tcatggaaaa caagtagtgt cagatataga ttatgtacca 1015 | 19441 ctaaagtctg ctacgtgtat aacacgttgc aatttaggtg gtgctgtctg tagacatcat 1016 | 19501 gctaatgagt acagattgta tctcgatgct tataacatga tgatctcagc tggctttagc 1017 | 19561 ttgtgggttt acaaacaatt tgatacttat aacctctgga acacttttac aagacttcag 1018 | 19621 agtttagaaa atgtggcttt taatgttgta aataagggac actttgatgg acaacagggt 1019 | 19681 gaagtaccag tttctatcat taataacact gtttacacaa aagttgatgg tgttgatgta 1020 | 19741 gaattgtttg aaaataaaac aacattacct gttaatgtag catttgagct ttgggctaag 1021 | 19801 cgcaacatta aaccagtacc agaggtgaaa atactcaata atttgggtgt ggacattgct 1022 | 19861 gctaatactg tgatctggga ctacaaaaga gatgctccag cacatatatc tactattggt 1023 | 19921 gtttgttcta tgactgacat agccaagaaa ccaactgaaa cgatttgtgc accactcact 1024 | 19981 gtcttttttg atggtagagt tgatggtcaa gtagacttat ttagaaatgc ccgtaatggt 1025 | 20041 gttcttatta cagaaggtag tgttaaaggt ttacaaccat ctgtaggtcc caaacaagct 1026 | 20101 agtcttaatg gagtcacatt aattggagaa gccgtaaaaa cacagttcaa ttattataag 1027 | 20161 aaagttgatg gtgttgtcca acaattacct gaaacttact ttactcagag tagaaattta 1028 | 20221 caagaattta aacccaggag tcaaatggaa attgatttct tagaattagc tatggatgaa 1029 | 20281 ttcattgaac ggtataaatt agaaggctat gccttcgaac atatcgttta tggagatttt 1030 | 20341 agtcatagtc agttaggtgg tttacatcta ctgattggac tagctaaacg ttttaaggaa 1031 | 20401 tcaccttttg aattagaaga ttttattcct atggacagta cagttaaaaa ctatttcata 1032 | 20461 acagatgcgc aaacaggttc atctaagtgt gtgtgttctg ttattgattt attacttgat 1033 | 20521 gattttgttg aaataataaa atcccaagat ttatctgtag tttctaaggt tgtcaaagtg 1034 | 20581 actattgact atacagaaat ttcatttatg 
ctttggtgta aagatggcca tgtagaaaca 1035 | 20641 ttttacccaa aattacaatc tagtcaagcg tggcaaccgg gtgttgctat gcctaatctt 1036 | 20701 tacaaaatgc aaagaatgct attagaaaag tgtgaccttc aaaattatgg tgatagtgca 1037 | 20761 acattaccta aaggcataat gatgaatgtc gcaaaatata ctcaactgtg tcaatattta 1038 | 20821 aacacattaa cattagctgt accctataat atgagagtta tacattttgg tgctggttct 1039 | 20881 gataaaggag ttgcaccagg tacagctgtt ttaagacagt ggttgcctac gggtacgctg 1040 | 20941 cttgtcgatt cagatcttaa tgactttgtc tctgatgcag attcaacttt gattggtgat 1041 | 21001 tgtgcaactg tacatacagc taataaatgg gatctcatta ttagtgatat gtacgaccct 1042 | 21061 aagactaaaa atgttacaaa agaaaatgac tctaaagagg gttttttcac ttacatttgt 1043 | 21121 gggtttatac aacaaaagct agctcttgga ggttccgtgg ctataaagat aacagaacat 1044 | 21181 tcttggaatg ctgatcttta taagctcatg ggacacttcg catggtggac agcctttgtt 1045 | 21241 actaatgtga atgcgtcatc atctgaagca tttttaattg gatgtaatta tcttggcaaa 1046 | 21301 ccacgcgaac aaatagatgg ttatgtcatg catgcaaatt acatattttg gaggaataca 1047 | 21361 aatccaattc agttgtcttc ctattcttta tttgacatga gtaaatttcc ccttaaatta 1048 | 21421 aggggtactg ctgttatgtc tttaaaagaa ggtcaaatca atgatatgat tttatctctt 1049 | 21481 cttagtaaag gtagacttat aattagagaa aacaacagag ttgttatttc tagtgatgtt 1050 | 21541 cttgttaaca actaaacgaa caatgtttgt ttttcttgtt ttattgccac tagtctctag 1051 | 21601 tcagtgtgtt aatcttacaa ccagaactca attaccccct gcatacacta attctttcac 1052 | 21661 acgtggtgtt tattaccctg acaaagtttt cagatcctca gttttacatt caactcagga 1053 | 21721 cttgttctta cctttctttt ccaatgttac ttggttccat gctatacatg tctctgggac 1054 | 21781 caatggtact aagaggtttg ataaccctgt cctaccattt aatgatggtg tttattttgc 1055 | 21841 ttccactgag aagtctaaca taataagagg ctggattttt ggtactactt tagattcgaa 1056 | 21901 gacccagtcc ctacttattg ttaataacgc tactaatgtt gttattaaag tctgtgaatt 1057 | 21961 tcaattttgt aatgatccat ttttgggtgt ttattaccac aaaaacaaca aaagttggat 1058 | 22021 ggaaagtgag ttcagagttt attctagtgc gaataattgc acttttgaat atgtctctca 1059 | 22081 gccttttctt atggaccttg aaggaaaaca gggtaatttc aaaaatctta 
gggaatttgt 1060 | 22141 gtttaagaat attgatggtt attttaaaat atattctaag cacacgccta ttaatttagt 1061 | 22201 gcgtgatctc cctcagggtt tttcggcttt agaaccattg gtagatttgc caataggtat 1062 | 22261 taacatcact aggtttcaaa ctttacttgc tttacataga agttatttga ctcctggtga 1063 | 22321 ttcttcttca ggttggacag ctggtgctgc agcttattat gtgggttatc ttcaacctag 1064 | 22381 gacttttcta ttaaaatata atgaaaatgg aaccattaca gatgctgtag actgtgcact 1065 | 22441 tgaccctctc tcagaaacaa agtgtacgtt gaaatccttc actgtagaaa aaggaatcta 1066 | 22501 tcaaacttct aactttagag tccaaccaac agaatctatt gttagatttc ctaatattac 1067 | 22561 aaacttgtgc ccttttggtg aagtttttaa cgccaccaga tttgcatctg tttatgcttg 1068 | 22621 gaacaggaag agaatcagca actgtgttgc tgattattct gtcctatata attccgcatc 1069 | 22681 attttccact tttaagtgtt atggagtgtc tcctactaaa ttaaatgatc tctgctttac 1070 | 22741 taatgtctat gcagattcat ttgtaattag aggtgatgaa gtcagacaaa tcgctccagg 1071 | 22801 gcaaactgga aagattgctg attataatta taaattacca gatgatttta caggctgcgt 1072 | 22861 tatagcttgg aattctaaca atcttgattc taaggttggt ggtaattata attacctgta 1073 | 22921 tagattgttt aggaagtcta atctcaaacc ttttgagaga gatatttcaa ctgaaatcta 1074 | 22981 tcaggccggt agcacacctt gtaatggtgt tgaaggtttt aattgttact ttcctttaca 1075 | 23041 atcatatggt ttccaaccca ctaatggtgt tggttaccaa ccatacagag tagtagtact 1076 | 23101 ttcttttgaa cttctacatg caccagcaac tgtttgtgga cctaaaaagt ctactaattt 1077 | 23161 ggttaaaaac aaatgtgtca atttcaactt caatggttta acaggcacag gtgttcttac 1078 | 23221 tgagtctaac aaaaagtttc tgcctttcca acaatttggc agagacattg ctgacactac 1079 | 23281 tgatgctgtc cgtgatccac agacacttga gattcttgac attacaccat gttcttttgg 1080 | 23341 tggtgtcagt gttataacac caggaacaaa tacttctaac caggttgctg ttctttatca 1081 | 23401 ggatgttaac tgcacagaag tccctgttgc tattcatgca gatcaactta ctcctacttg 1082 | 23461 gcgtgtttat tctacaggtt ctaatgtttt tcaaacacgt gcaggctgtt taataggggc 1083 | 23521 tgaacatgtc aacaactcat atgagtgtga catacccatt ggtgcaggta tatgcgctag 1084 | 23581 ttatcagact cagactaatt ctcctcggcg ggcacgtagt gtagctagtc aatccatcat 1085 | 23641 
tgcctacact atgtcacttg gtgcagaaaa ttcagttgct tactctaata actctattgc 1086 | 23701 catacccaca aattttacta ttagtgttac cacagaaatt ctaccagtgt ctatgaccaa 1087 | 23761 gacatcagta gattgtacaa tgtacatttg tggtgattca actgaatgca gcaatctttt 1088 | 23821 gttgcaatat ggcagttttt gtacacaatt aaaccgtgct ttaactggaa tagctgttga 1089 | 23881 acaagacaaa aacacccaag aagtttttgc acaagtcaaa caaatttaca aaacaccacc 1090 | 23941 aattaaagat tttggtggtt ttaatttttc acaaatatta ccagatccat caaaaccaag 1091 | 24001 caagaggtca tttattgaag atctactttt caacaaagtg acacttgcag atgctggctt 1092 | 24061 catcaaacaa tatggtgatt gccttggtga tattgctgct agagacctca tttgtgcaca 1093 | 24121 aaagtttaac ggccttactg ttttgccacc tttgctcaca gatgaaatga ttgctcaata 1094 | 24181 cacttctgca ctgttagcgg gtacaatcac ttctggttgg acctttggtg caggtgctgc 1095 | 24241 attacaaata ccatttgcta tgcaaatggc ttataggttt aatggtattg gagttacaca 1096 | 24301 gaatgttctc tatgagaacc aaaaattgat tgccaaccaa tttaatagtg ctattggcaa 1097 | 24361 aattcaagac tcactttctt ccacagcaag tgcacttgga aaacttcaag atgtggtcaa 1098 | 24421 ccaaaatgca caagctttaa acacgcttgt taaacaactt agctccaatt ttggtgcaat 1099 | 24481 ttcaagtgtt ttaaatgata tcctttcacg tcttgacaaa gttgaggctg aagtgcaaat 1100 | 24541 tgataggttg atcacaggca gacttcaaag tttgcagaca tatgtgactc aacaattaat 1101 | 24601 tagagctgca gaaatcagag cttctgctaa tcttgctgct actaaaatgt cagagtgtgt 1102 | 24661 acttggacaa tcaaaaagag ttgatttttg tggaaagggc tatcatctta tgtccttccc 1103 | 24721 tcagtcagca cctcatggtg tagtcttctt gcatgtgact tatgtccctg cacaagaaaa 1104 | 24781 gaacttcaca actgctcctg ccatttgtca tgatggaaaa gcacactttc ctcgtgaagg 1105 | 24841 tgtctttgtt tcaaatggca cacactggtt tgtaacacaa aggaattttt atgaaccaca 1106 | 24901 aatcattact acagacaaca catttgtgtc tggtaactgt gatgttgtaa taggaattgt 1107 | 24961 caacaacaca gtttatgatc ctttgcaacc tgaattagac tcattcaagg aggagttaga 1108 | 25021 taaatatttt aagaatcata catcaccaga tgttgattta ggtgacatct ctggcattaa 1109 | 25081 tgcttcagtt gtaaacattc aaaaagaaat tgaccgcctc aatgaggttg ccaagaattt 1110 | 25141 aaatgaatct ctcatcgatc 
tccaagaact tggaaagtat gagcagtata taaaatggcc 1111 | 25201 atggtacatt tggctaggtt ttatagctgg cttgattgcc atagtaatgg tgacaattat 1112 | 25261 gctttgctgt atgaccagtt gctgtagttg tctcaagggc tgttgttctt gtggatcctg 1113 | 25321 ctgcaaattt gatgaagacg actctgagcc agtgctcaaa ggagtcaaat tacattacac 1114 | 25381 ataaacgaac ttatggattt gtttatgaga atcttcacaa ttggaactgt aactttgaag 1115 | 25441 caaggtgaaa tcaaggatgc tactccttca gattttgttc gcgctactgc aacgataccg 1116 | 25501 atacaagcct cactcccttt cggatggctt attgttggcg ttgcacttct tgctgttttt 1117 | 25561 cagagcgctt ccaaaatcat aaccctcaaa aagagatggc aactagcact ctccaagggt 1118 | 25621 gttcactttg tttgcaactt gctgttgttg tttgtaacag tttactcaca ccttttgctc 1119 | 25681 gttgctgctg gccttgaagc cccttttctc tatctttatg ctttagtcta cttcttgcag 1120 | 25741 agtataaact ttgtaagaat aataatgagg ctttggcttt gctggaaatg ccgttccaaa 1121 | 25801 aacccattac tttatgatgc caactatttt ctttgctggc atactaattg ttacgactat 1122 | 25861 tgtatacctt acaatagtgt aacttcttca attgtcatta cttcaggtga tggcacaaca 1123 | 25921 agtcctattt ctgaacatga ctaccagatt ggtggttata ctgaaaaatg ggaatctgga 1124 | 25981 gtaaaagact gtgttgtatt acacagttac ttcacttcag actattacca gctgtactca 1125 | 26041 actcaattga gtacagacac tggtgttgaa catgttacct tcttcatcta caataaaatt 1126 | 26101 gttgatgagc ctgaagaaca tgtccaaatt cacacaatcg acggttcatc cggagttgtt 1127 | 26161 aatccagtaa tggaaccaat ttatgatgaa ccgacgacga ctactagcgt gcctttgtaa 1128 | 26221 gcacaagctg atgagtacga acttatgtac tcattcgttt cggaagagac aggtacgtta 1129 | 26281 atagttaata gcgtacttct ttttcttgct ttcgtggtat tcttgctagt tacactagcc 1130 | 26341 atccttactg cgcttcgatt gtgtgcgtac tgctgcaata ttgttaacgt gagtcttgta 1131 | 26401 aaaccttctt tttacgttta ctctcgtgtt aaaaatctga attcttctag agttcctgat 1132 | 26461 cttctggtct aaacgaacta aatattatat tagtttttct gtttggaact ttaattttag 1133 | 26521 ccatggcaga ttccaacggt actattaccg ttgaagagct taaaaagctc cttgaacaat 1134 | 26581 ggaacctagt aataggtttc ctattcctta catggatttg tcttctacaa tttgcctatg 1135 | 26641 ccaacaggaa taggtttttg tatataatta agttaatttt 
cctctggctg ttatggccag 1136 | 26701 taactttagc ttgttttgtg cttgctgctg tttacagaat aaattggatc accggtggaa 1137 | 26761 ttgctatcgc aatggcttgt cttgtaggct tgatgtggct cagctacttc attgcttctt 1138 | 26821 tcagactgtt tgcgcgtacg cgttccatgt ggtcattcaa tccagaaact aacattcttc 1139 | 26881 tcaacgtgcc actccatggc actattctga ccagaccgct tctagaaagt gaactcgtaa 1140 | 26941 tcggagctgt gatccttcgt ggacatcttc gtattgctgg acaccatcta ggacgctgtg 1141 | 27001 acatcaagga cctgcctaaa gaaatcactg ttgctacatc acgaacgctt tcttattaca 1142 | 27061 aattgggagc ttcgcagcgt gtagcaggtg actcaggttt tgctgcatac agtcgctaca 1143 | 27121 ggattggcaa ctataaatta aacacagacc attccagtag cagtgacaat attgctttgc 1144 | 27181 ttgtacagta agtgacaaca gatgtttcat ctcgttgact ttcaggttac tatagcagag 1145 | 27241 atattactaa ttattatgag gacttttaaa gtttccattt ggaatcttga ttacatcata 1146 | 27301 aacctcataa ttaaaaattt atctaagtca ctaactgaga ataaatattc tcaattagat 1147 | 27361 gaagagcaac caatggagat tgattaaacg aacatgaaaa ttattctttt cttggcactg 1148 | 27421 ataacactcg ctacttgtga gctttatcac taccaagagt gtgttagagg tacaacagta 1149 | 27481 cttttaaaag aaccttgctc ttctggaaca tacgagggca attcaccatt tcatcctcta 1150 | 27541 gctgataaca aatttgcact gacttgcttt agcactcaat ttgcttttgc ttgtcctgac 1151 | 27601 ggcgtaaaac acgtctatca gttacgtgcc agatcagttt cacctaaact gttcatcaga 1152 | 27661 caagaggaag ttcaagaact ttactctcca atttttctta ttgttgcggc aatagtgttt 1153 | 27721 ataacacttt gcttcacact caaaagaaag acagaatgat tgaactttca ttaattgact 1154 | 27781 tctatttgtg ctttttagcc tttctgctat tccttgtttt aattatgctt attatctttt 1155 | 27841 ggttctcact tgaactgcaa gatcataatg aaacttgtca cgcctaaacg aacatgaaat 1156 | 27901 ttcttgtttt cttaggaatc atcacaactg tagctgcatt tcaccaagaa tgtagtttac 1157 | 27961 agtcatgtac tcaacatcaa ccatatgtag ttgatgaccc gtgtcctatt cacttctatt 1158 | 28021 ctaaatggta tattagagta ggagctagaa aatcagcacc tttaattgaa ttgtgcgtgg 1159 | 28081 atgaggctgg ttctaaatca cccattcagt acatcgatat cggtaattat acagtttcct 1160 | 28141 gtttaccttt tacaattaat tgccaggaac ctaaattggg tagtcttgta gtgcgttgtt 
1161 | 28201 cgttctatga agacttttta gagtatcatg acgttcgtgt tgttttagat ttcatctaaa 1162 | 28261 cgaacaaact aaaatgtctg ataatggacc ccaaaatcag cgaaatgcac cccgcattac 1163 | 28321 gtttggtgga ccctcagatt caactggcag taaccagaat ggagaacgca gtggggcgcg 1164 | 28381 atcaaaacaa cgtcggcccc aaggtttacc caataatact gcgtcttggt tcaccgctct 1165 | 28441 cactcaacat ggcaaggaag accttaaatt ccctcgagga caaggcgttc caattaacac 1166 | 28501 caatagcagt ccagatgacc aaattggcta ctaccgaaga gctaccagac gaattcgtgg 1167 | 28561 tggtgacggt aaaatgaaag atctcagtcc aagatggtat ttctactacc taggaactgg 1168 | 28621 gccagaagct ggacttccct atggtgctaa caaagacggc atcatatggg ttgcaactga 1169 | 28681 gggagccttg aatacaccaa aagatcacat tggcacccgc aatcctgcta acaatgctgc 1170 | 28741 aatcgtgcta caacttcctc aaggaacaac attgccaaaa ggcttctacg cagaagggag 1171 | 28801 cagaggcggc agtcaagcct cttctcgttc ctcatcacgt agtcgcaaca gttcaagaaa 1172 | 28861 ttcaactcca ggcagcagta ggggaacttc tcctgctaga atggctggca atggcggtga 1173 | 28921 tgctgctctt gctttgctgc tgcttgacag attgaaccag cttgagagca aaatgtctgg 1174 | 28981 taaaggccaa caacaacaag gccaaactgt cactaagaaa tctgctgctg aggcttctaa 1175 | 29041 gaagcctcgg caaaaacgta ctgccactaa agcatacaat gtaacacaag ctttcggcag 1176 | 29101 acgtggtcca gaacaaaccc aaggaaattt tggggaccag gaactaatca gacaaggaac 1177 | 29161 tgattacaaa cattggccgc aaattgcaca atttgccccc agcgcttcag cgttcttcgg 1178 | 29221 aatgtcgcgc attggcatgg aagtcacacc ttcgggaacg tggttgacct acacaggtgc 1179 | 29281 catcaaattg gatgacaaag atccaaattt caaagatcaa gtcattttgc tgaataagca 1180 | 29341 tattgacgca tacaaaacat tcccaccaac agagcctaaa aaggacaaaa agaagaaggc 1181 | 29401 tgatgaaact caagccttac cgcagagaca gaagaaacag caaactgtga ctcttcttcc 1182 | 29461 tgctgcagat ttggatgatt tctccaaaca attgcaacaa tccatgagca gtgctgactc 1183 | 29521 aactcaggcc taaactcatg cagaccacac aaggcagatg ggctatataa acgttttcgc 1184 | 29581 ttttccgttt acgatatata gtctactctt gtgcagaatg aattctcgta actacatagc 1185 | 29641 acaagtagat gtagttaact ttaatctcac atagcaatct ttaatcagtg tgtaacatta 1186 | 29701 gggaggactt 
gaaagagcca ccacattttc accgaggcca cgcggagtac gatcgagtgt 1187 | 29761 acagtgaaca atgctaggga gagctgccta tatggaagag ccctaatgtg taaaattaat 1188 | 29821 tttagtagtg ctatccccat gtgattttaa tagcttctta ggagaatgac aaaaaaaaaa 1189 | 29881 aaaaaaaaaa aaaaaaaaaa aaa 1190 | // 1191 | 1192 | -------------------------------------------------------------------------------- /example-data/example-version.vcf: -------------------------------------------------------------------------------- 1 | ##fileformat=VCFv4.1 2 | ##contig= 3 | ##FORMAT= 4 | #CHROM POS ID REF ALT QUAL FILTER INFO 5 | NC_045512.2 25 . T G . . . 6 | NC_045512.2 241 . C T . . . 7 | NC_045512.2 512 . C T . . . 8 | NC_045512.2 514 . T C . . . 9 | NC_045512.2 520 . G T . . . 10 | NC_045512.2 521 . G T . . . 11 | NC_045512.2 710 . C T . . . 12 | NC_045512.2 734 . T C . . . 13 | NC_045512.2 739 . A G . . . 14 | NC_045512.2 745 . C A . . . 15 | NC_045512.2 784 . C T . . . 16 | NC_045512.2 832 . C T . . . 17 | NC_045512.2 835 . C T . . . 18 | NC_045512.2 875 . C T . . . 19 | NC_045512.2 878 . C T . . . 20 | NC_045512.2 894 . A G . . . 21 | NC_045512.2 913 . C T . . . 22 | NC_045512.2 960 . G A . . . 23 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | biopython==1.72 2 | PyVCF==0.6.8 3 | -------------------------------------------------------------------------------- /vcf-annotator.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | """ 3 | vcf-annotator.py [-h] [--output STRING] [--version] VCF_FILE GENBANK_FILE 4 | 5 | Annotate variants from a VCF file using the reference genome's GenBank file. 6 | 7 | positional arguments: 8 | VCF_FILE VCF file of Variants 9 | GENBANK_FILE GenBank file of the reference genome. 
10 | 11 | optional arguments: 12 | -h, --help show this help message and exit 13 | --output STRING File to write VCF output to (Default STDOUT). 14 | --version show program's version number and exit 15 | 16 | Example Usage: 17 | ./vcf-annotator.py example-data/example.vcf example-data/example.gb 18 | """ 19 | import collections 20 | from Bio import SeqIO 21 | from Bio.Seq import Seq 22 | import vcf 23 | VERSION = 0.7 24 | 25 | 26 | class Annotator(object): 27 | """Annotate a given VCF file according to the reference GenBank.""" 28 | 29 | def __init__(self, gb_file=False, vcf_file=False): 30 | """Initialize variables.""" 31 | self.__annotated_features = ["CDS", "tRNA", "rRNA", "ncRNA", 32 | "misc_feature"] 33 | self.__gb = GenBank(gb_file) 34 | self.__vcf = VCFTools(vcf_file) 35 | self.add_annotation_info() 36 | 37 | def add_annotation_info(self): 38 | """Add custom VCF info fields.""" 39 | self.__vcf.add_information_fields([ 40 | ['RefCodon', None, 'String', 'Reference codon'], 41 | ['AltCodon', None, 'String', 'Alternate codon'], 42 | ['RefAminoAcid', None, 'String', 'Reference amino acid'], 43 | ['AltAminoAcid', None, 'String', 'Alternate amino acid'], 44 | ['CodonPosition', '1', 'Integer', 'Codon position in the gene'], 45 | ['SNPCodonPosition', '1', 'Integer', 'SNP position in the codon'], 46 | ['AminoAcidChange', None, 'String', 'Amino acid change'], 47 | ['IsSynonymous', '1', 'Integer', 48 | '0:nonsynonymous, 1:synonymous, 9:N/A or Unknown'], 49 | ['IsTransition', '1', 'Integer', 50 | '0:transversion, 1:transition, 9:N/A or Unknown'], 51 | ['IsGenic', '1', 'Integer', '0:intergenic, 1:genic'], 52 | ['IsPseudo', '1', 'Integer', '0:not pseudo, 1:pseudo gene'], 53 | ['LocusTag', None, 'String', 'Locus tag associated with gene'], 54 | ['Gene', None, 'String', 'Name of gene'], 55 | ['Note', None, 'String', 'Note associated with gene'], 56 | ['Inference', None, 'String', 'Inference of feature.'], 57 | ['Product', None, 'String', 'Description of gene'], 58 | 
['ProteinID', None, 'String', 'Protein ID of gene'], 59 | ['Comments', None, 'String', 'Example: Negative strand: T->C'], 60 | ['VariantType', None, 'String', 'Indel, SNP, Ambiguous_SNP'], 61 | ['FeatureType', None, 'String', 'The feature type of variant.'], 62 | ]) 63 | 64 | def annotate_vcf_records(self): 65 | """Annotate each record in the VCF according to the input GenBank.""" 66 | for record in self.__vcf.records: 67 | self.__gb.accession = record.CHROM 68 | self.__gb.version = record.CHROM 69 | self.__gb.index = record.POS 70 | 71 | # Set defaults 72 | record.INFO['RefCodon'] = '.' 73 | record.INFO['AltCodon'] = '.' 74 | record.INFO['RefAminoAcid'] = '.' 75 | record.INFO['AltAminoAcid'] = '.' 76 | record.INFO['CodonPosition'] = '.' 77 | record.INFO['SNPCodonPosition'] = '.' 78 | record.INFO['AminoAcidChange'] = '.' 79 | record.INFO['IsSynonymous'] = 9 80 | record.INFO['IsTransition'] = 9 81 | record.INFO['Comments'] = '.' 82 | record.INFO['IsGenic'] = '0' 83 | record.INFO['IsPseudo'] = '0' 84 | record.INFO['LocusTag'] = '.' 85 | record.INFO['Gene'] = '.' 86 | record.INFO['Note'] = '.' 87 | record.INFO['Inference'] = '.' 88 | record.INFO['Product'] = '.' 89 | record.INFO['ProteinID'] = '.'
90 | record.INFO['FeatureType'] = 'inter_genic' 91 | 92 | # Get annotation info 93 | if self.__gb.feature_exists: 94 | record.INFO['FeatureType'] = self.__gb.feature.type 95 | if self.__gb.feature.type in self.__annotated_features: 96 | feature = self.__gb.feature 97 | if feature.type == "CDS": 98 | record.INFO['IsGenic'] = '1' 99 | 100 | qualifiers = { 101 | 'Note': 'note', 'LocusTag': 'locus_tag', 102 | 'Gene': 'gene', 'Product': 'product', 103 | 'ProteinID': 'protein_id', 104 | 'Inference': 'inference' 105 | } 106 | 107 | if feature.type == "tRNA": 108 | qualifiers['Note'] = 'anticodon' 109 | for k, v in qualifiers.items(): 110 | if v in feature.qualifiers: 111 | # Spell out semi-colons, commas and spaces 112 | record.INFO[k] = feature.qualifiers[v][0].replace( 113 | ';', '[semi-colon]' 114 | ).replace( 115 | ',', '[comma]' 116 | ).replace( 117 | ' ', '[space]' 118 | ) 119 | if v == 'anticodon': 120 | record.INFO[k] = 'anticodon{0}'.format( 121 | record.INFO[k] 122 | ) 123 | 124 | if 'pseudo' in feature.qualifiers: 125 | record.INFO['IsPseudo'] = '1' 126 | 127 | # Determine variant type 128 | if record.is_indel: 129 | if record.is_deletion: 130 | record.INFO['VariantType'] = 'Deletion' 131 | else: 132 | record.INFO['VariantType'] = 'Insertion' 133 | else: 134 | if len(record.ALT) > 1: 135 | record.ALT = self.__gb.determine_iupac_base(record.ALT) 136 | record.INFO['VariantType'] = 'Ambiguous_SNP' 137 | else: 138 | if record.is_transition: 139 | record.INFO['IsTransition'] = 1 140 | else: 141 | record.INFO['IsTransition'] = 0 142 | record.INFO['VariantType'] = 'SNP' 143 | 144 | if int(record.INFO['IsGenic']): 145 | alt_base = str(record.ALT[0]) 146 | 147 | # Determine codon information 148 | codon = self.__gb.codon_by_position(record.POS) 149 | record.INFO['RefCodon'] = ''.join(list(codon[0])) 150 | record.INFO['SNPCodonPosition'] = codon[1] 151 | record.INFO['CodonPosition'] = codon[2] 152 | 153 | # Adjust for ambiguous base and negative strand. 
154 | if feature.strand == -1: 155 | alt_base = str( 156 | Seq(alt_base).complement() 157 | ) 158 | 159 | record.INFO['Comments'] = 'Negative:{0}->{1}'.format( 160 | Seq(record.REF).complement(), 161 | alt_base 162 | ) 163 | 164 | # Determine alternates 165 | record.INFO['AltCodon'] = list(record.INFO['RefCodon']) 166 | record.INFO['AltCodon'][ 167 | record.INFO['SNPCodonPosition'] 168 | ] = alt_base 169 | record.INFO['AltCodon'] = ''.join(record.INFO['AltCodon']) 170 | record.INFO['RefAminoAcid'] = Seq( 171 | record.INFO['RefCodon'] 172 | ).translate() 173 | record.INFO['AltAminoAcid'] = Seq( 174 | record.INFO['AltCodon'] 175 | ).translate() 176 | record.INFO['AminoAcidChange'] = '{0}{1}{2}'.format( 177 | str(record.INFO['RefAminoAcid']), 178 | record.INFO['CodonPosition'], 179 | str(record.INFO['AltAminoAcid']) 180 | ) 181 | 182 | if record.INFO['VariantType'] != 'Ambiguous_SNP': 183 | ref = str(record.INFO['RefAminoAcid']) 184 | alt = str(record.INFO['AltAminoAcid']) 185 | if ref == alt: 186 | record.INFO['IsSynonymous'] = 1 187 | else: 188 | record.INFO['IsSynonymous'] = 0 189 | 190 | def write_vcf(self, output='/dev/stdout'): 191 | """Write the VCF to the specified output.""" 192 | self.__vcf.write_vcf(output) 193 | 194 | 195 | class GenBank(object): 196 | """A class for parsing GenBank files.""" 197 | 198 | def __init__(self, gb=False): 199 | """Initialize variables.""" 200 | self.genbank_file = gb 201 | self.records = {} 202 | self.record_index = {} 203 | self.record_ids = {} 204 | self.__gb = None 205 | self._index = None 206 | self._accession = None 207 | self.__position_index = None 208 | self.feature = None 209 | self.features = ["CDS", "rRNA", "tRNA", "ncRNA", "repeat_region", 210 | "misc_feature"] 211 | self.gene_codons = {} 212 | self.parse_genbank() 213 | 214 | @property 215 | def accession(self): 216 | """Accession for records.""" 217 | return self._accession 218 | 219 | @accession.setter 220 | def accession(self, value): 221 | if value not in
self.records: 222 | value = self.record_ids[value] 223 | 224 | self._accession = value 225 | self.__gb = self.records[value] 226 | self.__position_index = self.record_index[value] 227 | 228 | @property 229 | def index(self): 230 | """Position index for features.""" 231 | return self._index 232 | 233 | @index.setter 234 | def index(self, value): 235 | self._index = self.__position_index[value - 1] 236 | self.__set_feature() 237 | 238 | def parse_genbank(self): 239 | with open(self.genbank_file, 'r') as gb_fh: 240 | for record in SeqIO.parse(gb_fh, 'genbank'): 241 | self.records[record.name] = record 242 | self.gene_codons[record.name] = {} 243 | self.record_index[record.name] = [None] * len(record.seq) 244 | self.record_ids[record.id] = record.name 245 | for i in range(len(record.features)): 246 | if record.features[i].type in self.features: 247 | start = int(record.features[i].location.start) 248 | end = int(record.features[i].location.end) 249 | self.record_index[record.name][start:end] = [i] * (end - start) 250 | 251 | def __set_feature(self): 252 | if self._index is None: 253 | self.feature_exists = False 254 | self.feature = None 255 | else: 256 | self.feature_exists = True 257 | self.feature = self.records[self._accession].features[self._index] 258 | 259 | def codon_by_position(self, pos): 260 | """Retrieve the codon given a position of a CDS feature.""" 261 | if self._index not in self.gene_codons[self._accession]: 262 | self.split_into_codons() 263 | gene_position = self.position_in_gene(pos) 264 | codon_position = gene_position // 3 265 | return [self.gene_codons[self._accession][self._index][codon_position], 266 | gene_position % 3, 267 | codon_position + 1] 268 | 269 | def split_into_codons(self): 270 | """Split the complete CDS feature into a list of codons.""" 271 | start = self.feature.location.start 272 | end = self.feature.location.end 273 | seq = ''.join(list(self.__gb.seq[start:end])) 274 | 275 | if self.feature.strand == -1: 276 | seq =
Seq(seq).reverse_complement() 277 | 278 | self.gene_codons[self._accession][self._index] = [ 279 | seq[i:i + 3] for i in range(0, len(seq), 3) 280 | ] 281 | 282 | def position_in_gene(self, pos): 283 | """Return a codon position in a gene.""" 284 | if self.feature.strand == 1: 285 | return pos - self.feature.location.start - 1 286 | else: 287 | return self.feature.location.end - pos 288 | 289 | def base_by_pos(self, pos): 290 | """Print the base by position.""" 291 | print(self.__gb.seq[pos - 1]) 292 | 293 | def determine_iupac_base(self, bases): 294 | """ 295 | Determine the IUPAC symbol for a list of nucleotides. 296 | 297 | Source: https://en.wikipedia.org/wiki/Nucleic_acid_notation 298 | List elements are in this order: [A,C,G,T] 299 | """ 300 | if len(bases) > 1: 301 | iupac_notation = { 302 | 'W': [True, False, False, True], 303 | 'S': [False, True, True, False], 304 | 'M': [True, True, False, False], 305 | 'K': [False, False, True, True], 306 | 'R': [True, False, True, False], 307 | 'Y': [False, True, False, True], 308 | 'B': [False, True, True, True], 309 | 'D': [True, False, True, True], 310 | 'H': [True, True, False, True], 311 | 'V': [True, True, True, False], 312 | 'N': [False, False, False, False] 313 | } 314 | 315 | base_condition = [base in bases for base in ['A', 'C', 'G', 'T']] 316 | for symbol, iupac_condition in iupac_notation.items(): 317 | if iupac_condition == base_condition: 318 | return symbol 319 | 320 | def is_transition(self, ref_base, alt_base): 321 | """ 322 | Identify SNP as being a transition or not.
323 | 324 | 1: Transition, 0:Transversion 325 | """ 326 | substitution = ref_base + alt_base 327 | transition = ['AG', 'GA', 'CT', 'TC'] 328 | 329 | if substitution in transition: 330 | return 1 331 | 332 | return 0 333 | 334 | 335 | class VCFTools(object): 336 | """A class for parsing VCF formatted files.""" 337 | 338 | def __init__(self, vcf_file): 339 | """Initialize variables.""" 340 | self.reader = vcf.Reader(open(vcf_file, 'r')) 341 | self.records = [record for record in self.reader] 342 | 343 | def add_information_fields(self, info_list): 344 | """Add a given list of information fields to the VCF.""" 345 | for i in info_list: 346 | id, num, type, desc = i 347 | self.__add_information_field(id, num, type, desc) 348 | 349 | def __add_information_field(self, id, num, type, desc): 350 | """Add a given information field to the VCF.""" 351 | _Info = collections.namedtuple('Info', ['id', 'num', 'type', 'desc']) 352 | if id: 353 | self.reader.infos[id] = _Info(id, num, type, desc) 354 | 355 | def write_vcf(self, output='/dev/stdout'): 356 | """Write the VCF to a given output file.""" 357 | vcf_writer = vcf.Writer(open(output, 'w'), self.reader) 358 | for record in self.records: 359 | vcf_writer.write_record(record) 360 | 361 | 362 | if __name__ == '__main__': 363 | import argparse as ap 364 | import os 365 | import sys 366 | 367 | parser = ap.ArgumentParser( 368 | prog='vcf-annotator.py', 369 | conflict_handler='resolve', 370 | description=("Annotate variants from a VCF file using the reference " 371 | "genome's GenBank file.") 372 | ) 373 | parser.add_argument('vcf', metavar="VCF_FILE", type=str, 374 | help='VCF file of variants') 375 | parser.add_argument('gb', metavar="GENBANK_FILE", type=str, 376 | help='GenBank file of the reference genome.') 377 | parser.add_argument('--output', metavar="STRING", type=str, 378 | default='/dev/stdout', 379 | help='File to write VCF output to (Default STDOUT).') 380 | parser.add_argument('--version', action='version', 381 | 
version='%(prog)s {0}'.format(VERSION)) 382 | 383 | if len(sys.argv) == 1: 384 | parser.print_help() 385 | sys.exit(0) 386 | 387 | args = parser.parse_args() 388 | # Verify input exists 389 | if not os.path.isfile(args.gb): 390 | print('Unable to locate GenBank file: {0}'.format(args.gb)) 391 | sys.exit(1) 392 | elif not os.path.isfile(args.vcf): 393 | print('Unable to locate VCF file: {0}'.format(args.vcf)) 394 | sys.exit(1) 395 | 396 | annotator = Annotator(gb_file=args.gb, vcf_file=args.vcf) 397 | annotator.annotate_vcf_records() 398 | annotator.write_vcf(args.output) 399 | --------------------------------------------------------------------------------