├── setup.cfg ├── pictures ├── hbv_1.png ├── hbv_C_Bj_Ba.jpg ├── hbv_slice_1.png ├── lsdv_rec_sar.png ├── short_names.png ├── hbv_df_example.png ├── hbv_matplotlib.png ├── hcv_2k_1b_rec.png ├── hiv_rec_kal153.png ├── norovirus_rec.png └── HBV_1_rec_C_B_annotated.PNG ├── paper_plots ├── lsdv_rec_sar.png ├── hcv_2k_1b_rec.png ├── hiv_rec_kal153.png └── norovirus_rec.png ├── recan ├── __pycache__ │ ├── recan.cpython-36.pyc │ ├── simgen.cpython-36.pyc │ └── __init__.cpython-36.pyc ├── __init__.py ├── .gitignore ├── rolling_window.py ├── calc_pairwise_distance.py └── simgen.py ├── test ├── __pycache__ │ └── test.cpython-310-pytest-7.1.2.pyc ├── test_nuc_freq.fasta ├── test_p_dist.fasta ├── test.py └── hbv_C_Bj_Ba.fasta ├── CONTRIBUTING.md ├── .github └── ISSUE_TEMPLATE │ └── bug_report.md ├── LICENSE ├── PULL_REQUEST_TEMPLATE.md ├── setup.py ├── .gitignore ├── paper ├── paper.md └── paper.bib ├── datasets ├── hbv_C_Bj_Ba.fasta ├── hbv.fasta ├── norovirus.fasta └── hiv.fasta └── README.md /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md 3 | -------------------------------------------------------------------------------- /pictures/hbv_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hbv_1.png -------------------------------------------------------------------------------- /pictures/hbv_C_Bj_Ba.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hbv_C_Bj_Ba.jpg -------------------------------------------------------------------------------- /pictures/hbv_slice_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hbv_slice_1.png 
-------------------------------------------------------------------------------- /pictures/lsdv_rec_sar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/lsdv_rec_sar.png -------------------------------------------------------------------------------- /pictures/short_names.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/short_names.png -------------------------------------------------------------------------------- /paper_plots/lsdv_rec_sar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/paper_plots/lsdv_rec_sar.png -------------------------------------------------------------------------------- /pictures/hbv_df_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hbv_df_example.png -------------------------------------------------------------------------------- /pictures/hbv_matplotlib.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hbv_matplotlib.png -------------------------------------------------------------------------------- /pictures/hcv_2k_1b_rec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hcv_2k_1b_rec.png -------------------------------------------------------------------------------- /pictures/hiv_rec_kal153.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/hiv_rec_kal153.png 
-------------------------------------------------------------------------------- /pictures/norovirus_rec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/norovirus_rec.png -------------------------------------------------------------------------------- /paper_plots/hcv_2k_1b_rec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/paper_plots/hcv_2k_1b_rec.png -------------------------------------------------------------------------------- /paper_plots/hiv_rec_kal153.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/paper_plots/hiv_rec_kal153.png -------------------------------------------------------------------------------- /paper_plots/norovirus_rec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/paper_plots/norovirus_rec.png -------------------------------------------------------------------------------- /pictures/HBV_1_rec_C_B_annotated.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/pictures/HBV_1_rec_C_B_annotated.PNG -------------------------------------------------------------------------------- /recan/__pycache__/recan.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/recan/__pycache__/recan.cpython-36.pyc -------------------------------------------------------------------------------- /recan/__pycache__/simgen.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/babinyurii/recan/HEAD/recan/__pycache__/simgen.cpython-36.pyc -------------------------------------------------------------------------------- /recan/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/recan/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /test/__pycache__/test.cpython-310-pytest-7.1.2.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/babinyurii/recan/HEAD/test/__pycache__/test.cpython-310-pytest-7.1.2.pyc -------------------------------------------------------------------------------- /test/test_nuc_freq.fasta: -------------------------------------------------------------------------------- 1 | >s1_A_10 2 | ATGCCTGCTT 3 | >s2_G_50 4 | AGCGTGCGTG 5 | >s3_C_90 6 | CCCCACCCCC 7 | >s4_T_30 8 | TAGCTAGCTA 9 | >s5_A_0 10 | TGCTTGCTTG 11 | >s5_G_100 12 | GGGGGGGGGG 13 | -------------------------------------------------------------------------------- /recan/__init__.py: -------------------------------------------------------------------------------- 1 | print("currently new web version of recan is available at: http://yuriyb.pythonanywhere.com") 2 | print("guide and repository for recan web version: https://github.com/babinyurii/recan_gui") 3 | -------------------------------------------------------------------------------- /test/test_p_dist.fasta: -------------------------------------------------------------------------------- 1 | >s1_ref 2 | ATGCATGCAT 3 | >s2_dist_0 4 | ATGCATGCAT 5 | >s3_dist_10 6 | TTGCATGCAT 7 | >s4_dist_50 8 | GGCGGTGCAT 9 | >s5_dist_90 10 | GGCGGGGGGG 11 | >s5_dist_100 12 | GGCGGGCGGG 13 | 14 | 15 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: 
-------------------------------------------------------------------------------- 1 | # Contributions to the software are welcome. 2 | 3 | For bugs and suggestions, the most effective way is by raising an issue on the github issue tracker. 4 | Github allows you to classify your issues so that we know if it is a bug report, feature request or feedback to the authors. 5 | 6 | If you wish to contribute some changes to the code then you should submit a [pull request](https://github.com/babinyurii/recan/pulls). 7 | 8 | [Documentation on pull requests.](https://help.github.com/en/articles/about-pull-requests) 9 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 
39 | -------------------------------------------------------------------------------- /recan/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # Distribution / packaging 6 | .Python 7 | build/ 8 | develop-eggs/ 9 | dist/ 10 | downloads/ 11 | eggs/ 12 | .eggs/ 13 | lib/ 14 | lib64/ 15 | parts/ 16 | sdist/ 17 | var/ 18 | wheels/ 19 | *.egg-info/ 20 | .installed.cfg 21 | *.egg 22 | MANIFEST 23 | 24 | # PyInstaller 25 | # Usually these files are written by a python script from a template 26 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 27 | *.manifest 28 | *.spec 29 | 30 | # Installer logs 31 | pip-log.txt 32 | pip-delete-this-directory.txt 33 | 34 | # Unit test / coverage reports 35 | htmlcov/ 36 | .tox/ 37 | .coverage 38 | .coverage.* 39 | .cache 40 | nosetests.xml 41 | coverage.xml 42 | *.cover 43 | .hypothesis/ 44 | .pytest_cache/ 45 | 46 | # Jupyter Notebook 47 | .ipynb_checkpoints 48 | 49 | # pyenv 50 | .python-version 51 | 52 | # Environments 53 | .env 54 | .venv 55 | env/ 56 | venv/ 57 | ENV/ 58 | env.bak/ 59 | venv.bak/ 60 | 61 | # Spyder project settings 62 | .spyderproject 63 | .spyproject 64 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Yuriy Babin 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright 
notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Description 2 | 3 | Please include a summary of the change and which issue is fixed. 4 | Please also include relevant motivation and context. 5 | List any dependencies that are required for this change. 6 | 7 | # Fixes (issue) 8 | 9 | ## Type of change 10 | 11 | Please delete options that are not relevant. 12 | 13 | - [ ] Bug fix (non-breaking change which fixes an issue) 14 | - [ ] New feature (non-breaking change which adds functionality) 15 | - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) 16 | - [ ] This change requires a documentation update 17 | 18 | # How Has This Been Tested? 19 | 20 | Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. 
21 | Please also list any relevant details for your test configuration 22 | 23 | 24 | # Checklist: 25 | 26 | - [ ] My code follows the style guidelines of this project 27 | - [ ] I have performed a self-review of my own code 28 | - [ ] I have commented my code, particularly in hard-to-understand areas 29 | - [ ] I have made corresponding changes to the documentation 30 | - [ ] My changes generate no new warnings 31 | - [ ] I have added tests that prove my fix is effective or that my feature works 32 | - [ ] New and existing unit tests pass locally with my changes 33 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from os import path 2 | import setuptools 3 | from setuptools import setup 4 | 5 | 6 | this_directory = path.abspath(path.dirname(__file__)) 7 | with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f: 8 | long_description = f.read() 9 | 10 | 11 | 12 | setup( 13 | name = 'recan', 14 | long_description = long_description, # added to package readme on pypi 15 | long_description_content_type = "text/markdown", # added to package readme on pypi 16 | packages = ['recan'], 17 | version = '0.5', 18 | license='MIT', 19 | description = 'recan: recombination analysis tool', 20 | author = 'Yuriy Babin', 21 | author_email = 'babin.yurii@gmail.com', 22 | url = 'https://github.com/babinyurii/recan', 23 | download_url = 'https://github.com/babinyurii/recan/archive/refs/tags/v_0.5.tar.gz', 24 | keywords = ['DNA recombination', 'bioinformatics', 'genetic distance'], 25 | install_requires=[ 26 | 'pandas', 27 | 'plotly', 28 | 'biopython', 29 | 'matplotlib' 30 | ], 31 | classifiers=[ 32 | 'Development Status :: 4 - Beta', 33 | 'Intended Audience :: Developers', 34 | 'Topic :: Software Development :: Build Tools', 35 | 'License :: OSI Approved :: MIT License', 36 | 'Programming Language :: Python :: 3', 37 | 'Programming Language 
:: Python :: 3.4', 38 | 'Programming Language :: Python :: 3.5', 39 | 'Programming Language :: Python :: 3.6', 40 | ], 41 | ) 42 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | 
.mypy_cache/ 105 | -------------------------------------------------------------------------------- /recan/rolling_window.py: -------------------------------------------------------------------------------- 1 | from Bio import AlignIO 2 | 3 | class RollingWindowOnAlignment(): 4 | """ 5 | alignment obj as the biopython multiple alignment 6 | has two sliding window methods that slice the alignment into sections 7 | """ 8 | 9 | def __init__(self, in_file): 10 | self.align = AlignIO.read(in_file, "fasta") 11 | 12 | def roll_window_along_alignment(self, window_len, window_step): 13 | 14 | window_start = 0 15 | window_end = window_len 16 | window_step = window_step 17 | 18 | window_counter = 0 19 | sliced_alignment = {} 20 | 21 | while window_start < self.align.get_alignment_length(): 22 | 23 | sliced_alignment[(window_start, window_end)] = self.align[:, window_start:window_end] 24 | window_start += window_step 25 | 26 | ############################################### 27 | # redo 28 | #new_window_end = window_end + window_step 29 | if window_end + window_step < self.align.get_alignment_length(): 30 | window_end += window_step 31 | else: 32 | window_end = self.align.get_alignment_length() 33 | # redo 34 | ############################################3 35 | window_counter += 1 36 | 37 | return sliced_alignment 38 | 39 | 40 | def roll_window_along_alignment_region(self, window_len, window_step, region): 41 | 42 | 43 | window_start = region[0] 44 | window_end = region[0] + window_len 45 | window_step = window_step 46 | 47 | window_counter = 0 48 | sliced_alignment = {} 49 | while window_start < region[1]: 50 | sliced_alignment[(window_start, window_end)] = self.align[:, window_start:window_end] 51 | window_start += window_step 52 | 53 | if window_end + window_step < region[1]: 54 | window_end += window_step 55 | else: 56 | window_end = region[1] 57 | 58 | 59 | window_counter += 1 60 | 61 | 62 | return sliced_alignment 63 | 64 | 65 | 66 | 67 | 
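The window bookkeeping in `roll_window_along_alignment` and `roll_window_along_alignment_region` above can be followed without Biopython. The sketch below is a minimal illustration, not part of the package: `window_coords` is a hypothetical helper name, and it only reproduces the `(window_start, window_end)` keys those methods build, assuming the same clamping of the last windows to the end of the region.

```python
def window_coords(region_start, region_end, window_len, window_step):
    """Return the (start, end) pairs a rolling window would visit.

    Hypothetical helper for illustration only: it mirrors the index
    arithmetic of the rolling window methods, without slicing an alignment.
    """
    coords = []
    window_start = region_start
    window_end = region_start + window_len
    while window_start < region_end:
        coords.append((window_start, window_end))
        window_start += window_step
        # The window end advances by one step but never passes the region end.
        window_end = min(window_end + window_step, region_end)
    return coords


if __name__ == "__main__":
    # Whole 3215-column alignment, non-overlapping 500-column windows.
    for start, end in window_coords(0, 3215, 500, 500):
        print(start, end)
```

Because the window end is clamped, the trailing windows are shorter than `window_len` whenever the step does not divide the region evenly, which is why the last key for a 3215-column alignment with 500-column windows is `(3000, 3215)`.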
-------------------------------------------------------------------------------- /paper/paper.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Recan: Python tool for analysis of recombination events in viral genomes' 3 | tags: 4 | - Python 5 | - virology 6 | - recombination 7 | authors: 8 | - name: Yuriy Babin 9 | orcid: 0000-0002-7524-5921 10 | affiliation: "1" 11 | affiliations: 12 | - name: National Medical Research Center for Tuberculosis and Infectious Diseases, Moscow, Russia 13 | index: 1 14 | date: 29 November 2019 15 | bibliography: paper.bib 16 | --- 17 | 18 | 19 | # Summary 20 | 21 | Recombination drives virus evolution in response to selective forces in a host environment and adaptation to new abiotic factors [@Perez-Losada2015]. 22 | Gaining insights into recombination events is important for a better understanding of viral biology. Analysis of recombination events can be performed through construction and exploration of similarity plots based on genetic distances between nucleotide sequences. 23 | The Python package "recan" (recombination analyzer) provides the means to construct genetic distance plots and explore them interactively. The package has been designed to operate with the Jupyter notebook. 24 | Compared to the previously designed desktop software [@Lole1999; @Etherington2005a; @Martin2015], recan can insert or delete sequences from the output without reconstructing the plots and recalculating the distance values. Finally, recan enables simultaneous analysis of several datasets in a single session. Recan is based on the Biopython, Pandas, and Plotly libraries. The package requires a sequence alignment in FASTA format as input. The user can adjust the sliding window size, the window shift, the method of distance calculation, the sequence of interest (a sequence where breakpoints occur), and the length of the alignment region included in the distance calculation.
The two methods of genetic distance calculation implemented in recan are the pairwise and Kimura 2-parameter models. The distance data can be saved to a CSV or Excel file, or used directly to reconstruct the plot in the Jupyter notebook using the plotting library to obtain a final report. 25 | 26 | # Testing and verification 27 | To test the package, we used four previously reported recombinant viral genomes representing different genera: human immunodeficiency virus (HIV) [@Liitsola2000a], hepatitis C virus (HCV) [@Smith2014], norovirus [@Jiang1999], and lumpy skin disease virus (LSDV) [@Sprygin2018]. 28 | Each dataset included a recombinant virus sequence, its putative parental sequences, and a set of sequences of the same virus closely related to the recombinant virus. HIV, HCV, and norovirus sequences were aligned using ClustalW [@Larkin2007], and LSDV genomes were aligned using MAFFT [@Katoh2002] as part of the Ugene software [@Okonechnikov2012]. 29 | The HIV alignment contained twenty-five 3135 bp sequences; the HCV alignment contained twenty-three 9431 bp sequences; the norovirus alignment included nineteen 3366 bp sequences, and the LSDV alignment had a total of 150511 bp sequences. The execution time of the `simgen` method with the default window size and shift parameters was as follows: 437 ms ± 7.74 ms for HIV, 579 ms ± 58.7 ms for norovirus, 648 ms ± 44.2 ms for HCV, and 3.55 s ± 239 ms for the LSDV dataset. The execution time test was performed on a desktop PC with 4 CPU cores and 4 GB of RAM. LSDV has one of the largest genomes of all viruses (about 150 000 bp). Ultimately, recan can potentially be used to identify and analyze recombination events in a large subset of sequences regardless of the length of the viral genome. The distance plots with recombination events detected by recan are shown in Figures 1-4. 30 | 31 | # Availability and implementation 32 | Recan is supported on Linux and Windows.
The package can be installed with the `pip` package manager using the command `pip install recan`. The source code, guide, and datasets are available in the GitHub repository (https://github.com/babinyurii/recan). 33 | 34 | ![](https://raw.githubusercontent.com/babinyurii/recan/master/paper_plots/hiv_rec_kal153.png) 35 | _Figure 1. HIV recombinant strain AF193276 between sequences AF193275 and AF193278._ 36 | 37 | 38 | ![](https://raw.githubusercontent.com/babinyurii/recan/master/paper_plots/hcv_2k_1b_rec.png) 39 | _Figure 2. HCV intergenotype recombinant 2k/1b._ 40 | 41 | 42 | ![](https://raw.githubusercontent.com/babinyurii/recan/master/paper_plots/norovirus_rec.png) 43 | _Figure 3. Norovirus recombinant AF190817 between parental sequences U22498 and X86557._ 44 | 45 | 46 | ![](https://raw.githubusercontent.com/babinyurii/recan/master/paper_plots/lsdv_rec_sar.png) 47 | _Figure 4. LSDV recombinant vaccine-like strain LSDV RUSSIA/Saratov/2017 between sequences AF193275 and KY829023._ 48 | 49 | # Acknowledgement 50 | The author thanks Alexander Sprygin for editing the manuscript and providing LSDV data.
51 | 52 | 53 | 54 | # References 55 | -------------------------------------------------------------------------------- /paper/paper.bib: -------------------------------------------------------------------------------- 1 | @article{Perez-Losada2015, 2 | author = {P{\'{e}}rez-Losada, Marcos and Arenas, Miguel and Gal{\'{a}}n, Juan Carlos and Palero, Ferran and Gonz{\'{a}}lez-Candelas, Fernando}, 3 | doi = {10.1016/j.meegid.2014.12.022}, 4 | journal = {Infection, Genetics and Evolution}, 5 | number = {December}, 6 | pages = {296--307}, 7 | title = {{Recombination in viruses: Mechanisms, methods of study, and evolutionary consequences}}, 8 | volume = {30}, 9 | year = {2015} 10 | } 11 | 12 | @article{Lole1999, 13 | author = {Lole, K S and Bollinger, R C and Paranjape, R S and Gadkari, D and Kulkarni, S S and Novak, N G and Ingersoll, R and Sheppard, H W and Ray, S C}, 14 | journal = {Journal of virology}, 15 | number = {1}, 16 | pages = {152--60}, 17 | pmid = {9847317}, 18 | title = {{Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination.}}, 19 | volume = {73}, 20 | year = {1999} 21 | } 22 | 23 | @article{Etherington2005a, 24 | author = {Etherington, Graham J. and Dicks, Jo and Roberts, Ian N.}, 25 | doi = {10.1093/bioinformatics/bth500}, 26 | journal = {Bioinformatics}, 27 | number = {3}, 28 | pages = {278--281}, 29 | pmid = {15333462}, 30 | title = {{Recombination Analysis Tool (RAT): A program for the high-throughput detection of recombination}}, 31 | volume = {21}, 32 | year = {2005} 33 | } 34 | 35 | @article{Martin2015, 36 | author = {Martin, Darren P. 
and Murrell, Ben and Golden, Michael and Khoosal, Arjun and Muhire, Brejnev}, 37 | doi = {10.1093/ve/vev003}, 38 | journal = {Virus Evolution}, 39 | number = {1}, 40 | pages = {1--5}, 41 | title = {{RDP4: Detection and analysis of recombination patterns in virus genomes}}, 42 | volume = {1}, 43 | year = {2015} 44 | } 45 | 46 | @article{Liitsola2000a, 47 | author = {Liitsola, Kirsi and Holm, Kirsi and Bobkov, Aleksei and Pokrovsky, Vadim and Smolskaya, Tatjana and Leinikki, Pauli and Osmanov, Saladin and Salminen, Mika}, 48 | doi = {10.1089/08892220050075309}, 49 | journal = {AIDS Research and Human Retroviruses}, 50 | number = {11}, 51 | pages = {1047--1053}, 52 | title = {{An AB recombinant and its parental HIV type 1 strains in the area of the former Soviet Union: Low requirements for sequence identity in recombination}}, 53 | volume = {16}, 54 | year = {2000} 55 | } 56 | 57 | @article{Smith2014, 58 | author = {Smith, Donald B. and Bukh, Jens and Kuiken, Carla and Muerhoff, A. Scott and Rice, Charles M. and Stapleton, Jack T. and Simmonds, Peter}, 59 | doi = {10.1002/hep.26744}, 60 | journal = {Hepatology}, 61 | number = {1}, 62 | pages = {318--327}, 63 | pmid = {24115039}, 64 | title = {{Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: Updated criteria and genotype assignment web resource}}, 65 | volume = {59}, 66 | year = {2014} 67 | } 68 | 69 | @article{Jiang1999, 70 | author = {Jiang, X. and Espul, C. and Zhong, W. M. and Cuello, H. and Matson, D. 
O.}, 71 | doi = {10.1007/s007050050651}, 72 | journal = {Archives of Virology}, 73 | number = {12}, 74 | pages = {2377--2387}, 75 | title = {{Characterization of a novel human calicivirus that may be a naturally occurring recombinant}}, 76 | volume = {144}, 77 | year = {1999} 78 | } 79 | 80 | @article{Sprygin2018, 81 | doi = {10.1371/journal.pone.0207480}, 82 | url = {https://doi.org/10.1371/journal.pone.0207480}, 83 | year = {2018}, 84 | publisher = {PLOS}, 85 | volume = {13}, 86 | number = {12}, 87 | pages = {1--19}, 88 | author = {Alexander Sprygin and Yurii Babin and Yana Pestova and Svetlana Kononova and David B. Wallace and Antoinette Van Schalkwyk and Olga Byadovskaya and Vyacheslav Diev and Dmitry Lozovoy and Alexander Kononov}, 89 | title = {Analysis and insights into recombination signals in lumpy skin disease virus recovered in the field}, 90 | journal = {PLoS ONE} 91 | } 92 | 93 | @article{Larkin2007, 94 | author = {Larkin, M. A. and Blackshields, G. and Brown, N. P. and Chenna, R. and Mcgettigan, P. A. and McWilliam, H. and Valentin, F. and Wallace, I. M. and Wilm, A. and Lopez, R. and Thompson, J. D. and Gibson, T. J. and Higgins, D. G.}, 95 | doi = {10.1093/bioinformatics/btm404}, 96 | journal = {Bioinformatics}, 97 | number = {21}, 98 | pages = {2947--2948}, 99 | pmid = {17846036}, 100 | title = {{Clustal W and Clustal X version 2.0}}, 101 | volume = {23}, 102 | year = {2007} 103 | } 104 | 105 | @article{Katoh2002, 106 | author = {Katoh, K.}, 107 | doi = {10.1093/nar/gkf436}, 108 | journal = {Nucleic Acids Research}, 109 | number = {14}, 110 | pages = {3059--3066}, 111 | pmid = {12136088}, 112 | title = {{MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform}}, 113 | volume = {30}, 114 | year = {2002} 115 | } 116 | 117 | @article{Okonechnikov2012, 118 | author = {Okonechnikov, Konstantin and Golosova, Olga and Fursov, Mikhail and Varlamov, Alexey and Vaskin, Yuri and Efremov, Ivan and {German Grehov}, O. G.
and Kandrov, Denis and Rasputin, Kirill and Syabro, Maxim and Tleukenov, Timur}, 119 | doi = {10.1093/bioinformatics/bts091}, 120 | journal = {Bioinformatics}, 121 | number = {8}, 122 | pages = {1166--1167}, 123 | title = {{Unipro UGENE: A unified bioinformatics toolkit}}, 124 | volume = {28}, 125 | year = {2012} 126 | } 127 | -------------------------------------------------------------------------------- /test/test.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | 4 | 5 | from Bio import AlignIO 6 | 7 | sys.path.append("..") 8 | #from recan.simgen import Simgen 9 | 10 | from recan.calc_pairwise_distance import p_distance, estimate_nucleotide_frequencies 11 | from recan.rolling_window import RollingWindowOnAlignment 12 | 13 | 14 | def test_p_distance(): 15 | 16 | align = AlignIO.read("test_p_dist.fasta", "fasta") 17 | """ 18 | >s1_ref 19 | ATGCATGCAT 20 | >s2_dist_0 21 | ATGCATGCAT 22 | >s3_dist_10 23 | TTGCATGCAT 24 | >s4_dist_50 25 | GGCGGTGCAT 26 | >s5_dist_90 27 | GGCGGGGGGG 28 | >s5_dist_100 29 | GGCGGGCGGG 30 | """ 31 | ref_seq = align[0] 32 | seq_0 = align[1] 33 | seq_10 = align[2] 34 | seq_50 = align[3] 35 | seq_90 = align[4] 36 | seq_100 = align[5] 37 | 38 | assert p_distance(ref_seq, seq_0) == 0.0 39 | assert p_distance(ref_seq, seq_10) == 0.1 40 | assert p_distance(ref_seq, seq_50) == 0.5 41 | assert p_distance(ref_seq, seq_90) == 0.9 42 | assert p_distance(ref_seq, seq_100) == 1.0 43 | 44 | 45 | def test_estimate_nuc_frequency(): 46 | align = AlignIO.read("test_nuc_freq.fasta", "fasta") 47 | 48 | """ sequences in the file 'test_nuc_freq.fastq": 49 | >s1_A_10 50 | ATGCCTGCTT 51 | >s2_G_50 52 | AGCGTGCGTG 53 | >s3_C_90 54 | CCCCACCCCC 55 | >s4_T_30 56 | TAGCTAGCTA 57 | >s5_A_0 58 | TGCTTGCTTG 59 | >s5_G_100 60 | GGGGGGGGGG 61 | """ 62 | # return [ x / length for x in [A, C ,G, T] ] 63 | 64 | # seq to get Seq obj out of SeqRecord obj 65 | assert estimate_nucleotide_frequencies(align[0].seq)[0] == 
0.1 66 | assert estimate_nucleotide_frequencies(align[1].seq)[2] == 0.5 67 | assert estimate_nucleotide_frequencies(align[2].seq)[1] == 0.9 68 | assert estimate_nucleotide_frequencies(align[3].seq)[3] == 0.3 69 | assert estimate_nucleotide_frequencies(align[4].seq)[0] == 0.0 70 | assert estimate_nucleotide_frequencies(align[5].seq)[2] == 1.0 71 | 72 | 73 | def test_sliced_alignment_slices_on_whole_alignment(): 74 | 75 | align = RollingWindowOnAlignment("./hbv_C_Bj_Ba.fasta") 76 | 77 | sliced_align = align.roll_window_along_alignment(window_len=500, window_step=500) 78 | assert len(sliced_align) == 7 79 | 80 | sliced_align = align.roll_window_along_alignment(window_len=1000, window_step=500) 81 | assert len(sliced_align) == 7 82 | 83 | sliced_align = align.roll_window_along_alignment(window_len=1, window_step=1) 84 | assert len(sliced_align) == 3215 85 | 86 | sliced_align = align.roll_window_along_alignment(window_len=3215, window_step=3215) 87 | assert len(sliced_align) == 1 88 | 89 | 90 | def test_sliced_alignment_slices_on_alignment_region(): 91 | 92 | align = RollingWindowOnAlignment("./hbv_C_Bj_Ba.fasta") 93 | 94 | sliced_align = align.roll_window_along_alignment_region(window_len=500, window_step=500, 95 | region=(0, 1000)) 96 | assert len(sliced_align) == 2 97 | 98 | sliced_align = align.roll_window_along_alignment_region(window_len=500, window_step=250, 99 | region=(0, 1000)) 100 | assert len(sliced_align) == 4 101 | 102 | sliced_align = align.roll_window_along_alignment_region(window_len=500, window_step=250, 103 | region=(0, 3215)) 104 | assert len(sliced_align) == 13 105 | 106 | 107 | def test_sliced_alignment_window_borders_whole_alignment(): 108 | 109 | align = RollingWindowOnAlignment("./hbv_C_Bj_Ba.fasta") 110 | 111 | sliced_align = align.roll_window_along_alignment(window_len=500, window_step=500) 112 | 113 | 114 | check_window_coords = [[0, 500], [500, 1000], [1000, 1500], [1500, 2000], 115 | [2000, 2500], [2500, 3000], [3000, 3215]] 116 | 117 | 
    # compare each window's (start, stop) borders with the expected ones, in order
    for counter, window_coords in enumerate(sliced_align.keys()):
        assert window_coords[0] == check_window_coords[counter][0]
        assert window_coords[1] == check_window_coords[counter][1]


def test_sliced_alignment_window_borders_alignment_region():

    align = RollingWindowOnAlignment("./hbv_C_Bj_Ba.fasta")

    sliced_align = align.roll_window_along_alignment_region(window_len=500, window_step=250,
                                                            region=(0, 1000))

    check_window_coords = [[0, 500], [250, 750], [500, 1000], [750, 1000]]

    for counter, window_coords in enumerate(sliced_align.keys()):
        assert window_coords[0] == check_window_coords[counter][0]
        assert window_coords[1] == check_window_coords[counter][1]

--------------------------------------------------------------------------------
/recan/calc_pairwise_distance.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-


from math import log, sqrt

"""
U   Uracil (RNA)    U
W   Weak            A/T
S   Strong          C/G
M   Amino           A/C
K   Keto            G/T
R   Purine          A/G
Y   Pyrimidine      C/T
B   Not A           C/G/T
D   Not C           A/G/T
H   Not G           A/C/T
V   Not T           A/C/G
N   Any             A/C/G/T
"""
DEGENERATE_NUCS = ("U", "W", "S", "M", "K", "R", "Y", "B", "D", "H", "V", "N")


def estimate_nucleotide_frequencies(seq):
    """returns the frequencies of A, C, G and T (in this order) after gaps are removed"""

    seq = seq.replace("-", "").upper()

    A = seq.count("A")
    C = seq.count("C")
    G = seq.count("G")
    T = seq.count("T")

    length = float(len(seq))

    return [x / length for x in [A, C, G, T]]


def p_distance(seq1, seq2):
    """calculates pairwise distance between two sequences
    distance = num of different nucleotides / total nucleotides
    """

    different_nucs = 0
    nuc_pairs = []

    for x in zip(seq1, seq2):
        # skip gaps and degenerate nucleotides
        if '-' not in x and x[0] not in DEGENERATE_NUCS and x[1] not in DEGENERATE_NUCS:
            nuc_pairs.append(x)

    for (nuc_1, nuc_2) in nuc_pairs:
        if nuc_1 != nuc_2:
            different_nucs += 1

    total_nucs = len(nuc_pairs)

    try:
        distance = float(different_nucs / total_nucs)

        return distance

    except ZeroDivisionError:
        # a window without comparable positions divides by zero (it does not raise ValueError)
        print("""no comparable positions found. the reasons may be: 1. too many gaps in some
        region of the alignment. 2. too short window span""")


def jc_distance(seq1, seq2):

    """
    Jukes-Cantor
    jc_distance = - b log(1 - p_dist / b)
    ------------
    b is a constant. b = 3/4 for nucleotides and 19/20 for proteins.
    p_dist = pairwise distance, which is uncorrected distance between seq1 and seq2
    """
    b = 0.75
    p_dist = p_distance(seq1, seq2)

    if p_dist is None:
        # p_distance found nothing to compare and has already printed a hint
        return None

    try:
        distance = - b * log(1 - p_dist / b)

        return distance

    except ValueError:
        print("""the reasons for the ValueError may be: 1. too many gaps in some
        region of the alignment. 2.
        too short window span""")


def k2p_distance(seq1, seq2):
    """
    Kimura 2-Parameter distance
    k2p_distance = - 0.5 log( (1 - 2p - q)*sqrt(1 - 2q) )
    where:
    p : transition frequency
    q : transversion frequency
    """
    nuc_pairs = []

    # collect nuc pairs without gaps and degenerate nucleotides
    for x in zip(seq1, seq2):
        if '-' not in x and x[0] not in DEGENERATE_NUCS and x[1] not in DEGENERATE_NUCS:
            nuc_pairs.append(x)

    ts_count = 0
    tv_count = 0
    total_nucs = len(nuc_pairs)

    transitions = ["AG", "GA", "CT", "TC"]
    transversions = ["AC", "CA", "AT", "TA",
                     "GC", "CG", "GT", "TG"]

    for (nuc_1, nuc_2) in nuc_pairs:
        if nuc_1 + nuc_2 in transitions:
            ts_count += 1
        elif nuc_1 + nuc_2 in transversions:
            tv_count += 1

    ts_freq = float(ts_count) / total_nucs
    tv_freq = float(tv_count) / total_nucs

    try:
        distance = -0.5 * log((1 - 2 * ts_freq - tv_freq) * sqrt(1 - 2 * tv_freq))

        return distance

    except ValueError:
        print("""the reasons for the ValueError may be: 1. too many gaps in some
        region of the alignment. 2.
        too short window span""")


def tamura_distance(seq1, seq2):
    """
    Tamura distance = -C log( 1 - P/C - Q ) - 0.5( 1 - C )log( 1 - 2Q )
    where:
    P = transition frequency
    Q = transversion frequency
    C = GC1 + GC2 - 2 * GC1 * GC2
    GC1 = GC-content of sequence 1
    GC2 = GC-content of sequence 2
    """

    nuc_pairs = []

    # collect pairs without gaps and degenerate nucleotides
    for x in zip(seq1, seq2):
        if '-' not in x and x[0] not in DEGENERATE_NUCS and x[1] not in DEGENERATE_NUCS:
            nuc_pairs.append(x)

    ts_count = 0
    tv_count = 0
    total_nucs = len(nuc_pairs)

    transitions = ["AG", "GA", "CT", "TC"]
    transversions = ["AC", "CA", "AT", "TA",
                     "GC", "CG", "GT", "TG"]

    for (nuc_1, nuc_2) in nuc_pairs:
        if nuc_1 + nuc_2 in transitions:
            ts_count += 1
        elif nuc_1 + nuc_2 in transversions:
            tv_count += 1

    ts_freq = float(ts_count) / total_nucs  # p
    tv_freq = float(tv_count) / total_nucs  # q

    # frequencies come back as [A, C, G, T], so [1:3] sums C and G
    gc1 = sum(estimate_nucleotide_frequencies(seq1)[1:3])
    gc2 = sum(estimate_nucleotide_frequencies(seq2)[1:3])

    c = gc1 + gc2 - 2 * gc1 * gc2

    try:
        distance = -c * log(1 - ts_freq / c - tv_freq) - 0.5 * (1 - c) * log(1 - 2 * tv_freq)

        return distance

    except ValueError:
        print("""the reasons for the ValueError may be: 1. too many gaps in some
        region of the alignment. 2. too short window span""")


# TODO Tajima-Nei distance isn't included yet
def tn_distance(seq1, seq2):
    """
    Tajima-Nei distance = -b log(1 - p / b)
    where:
    b = 0.5 * [ 1 - Sum i from A to T (Gi^2) + p^2/h ]
    h = Sum i from A to G( Sum j from C to T (Xij^2 / (2*Gi*Gj)) )
    p = p-distance, i.e.
    uncorrected distance between seq1 and seq2
    Xij = frequency of pair (i,j) in seq1 and seq2, with gaps removed
    Gi = frequency of base i over seq1 and seq2 """

    ns = ['A', 'C', 'G', 'T']
    G = estimate_nucleotide_frequencies(seq1 + seq2)
    p = p_distance(seq1, seq2)
    pairs = []
    h = 0

    # collect ungapped pairs
    for x in zip(seq1, seq2):
        if '-' not in x:
            pairs.append(x)

    # pair frequencies are calculated for AC, AG, AT, CG, CT, GT (and reverse order)
    for i in range(len(ns) - 1):
        for j in range(i + 1, len(ns)):
            paircount = pairs.count((ns[i], ns[j])) + pairs.count((ns[j], ns[i]))
            Xij_sq = (float(paircount) / len(pairs)) ** 2
            GiGj = G[i] * G[j]
            h += 0.5 * Xij_sq / GiGj  # h value used to calculate b

    b = 0.5 * (1 - sum([x ** 2 for x in G]) + p ** 2 / h)

    try:
        d = -b * log(1 - p / b)
        return d

    except ValueError:
        print("""the reasons for the ValueError may be: 1. too many gaps in some
        region of the alignment. 2.
        too short window span""")


def calc_pairwise_distance(seq1, seq2, dist_method):
    """returns the similarity (1 - distance) of two sequences for the chosen method"""

    if dist_method == "pdist":
        distance = p_distance(seq1, seq2)

    elif dist_method == "jcd":
        distance = jc_distance(seq1, seq2)

    elif dist_method == "k2p":
        distance = k2p_distance(seq1, seq2)

    elif dist_method == "td":
        distance = tamura_distance(seq1, seq2)

    #elif dist_method == "tnd":
    #    distance = tn_distance(seq1, seq2)

    else:
        raise ValueError("unknown distance method: " + str(dist_method))

    if distance is None:
        # the distance function could not compute a value and has printed a hint
        return None

    return 1 - distance

--------------------------------------------------------------------------------
/recan/simgen.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-

import plotly.graph_objs as go
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
from .rolling_window import RollingWindowOnAlignment
from .calc_pairwise_distance import calc_pairwise_distance


class Simgen():

    def __init__(self, in_file):

        self.alignment_roll_window = RollingWindowOnAlignment(in_file)
        self.align_sliced = None
        self.ticks_for_x_axis = []
        self.distance_data = {}
        self.pot_rec_index = None
        self.pot_rec_id = None

    def _plot_distance_by_plotly(self):
        """draws similarity plot using plotly"""
        init_notebook_mode()
        data = []
        for key in self.distance_data.keys():
            trace = go.Scatter(y=self.distance_data[key], x=self.ticks_for_x_axis, name=key)
            data.append(trace)

        layout = go.Layout(
            xaxis=dict(
                title="nucleotide position"),
            yaxis=dict(
                title="sequence identity"),
            legend=dict(x=-0.1, y=1.5, orientation="h"))
            #legend=dict(x=-0.1, y=1.5))

        fig = go.Figure(data=data, layout=layout)
        iplot(fig)
        print("potential recombinant: ", self.pot_rec_id)

    def _get_ticks_for_x_axis(self):
        """
        prepares ticks for x axis
        the ticks correspond to the alignment nucleotide positions
        the ticks for x axis are in the middle of the rolling window
        """

        self.ticks_for_x_axis.clear()

        for start_stop_nuc_index in self.align_sliced.keys():
            # we take a nucleotide position in the middle of the window
            # it'll be a point we plot
            middle_nuc = (start_stop_nuc_index[1] - start_stop_nuc_index[0]) / 2
            self.ticks_for_x_axis.append(start_stop_nuc_index[0] + middle_nuc)

    def _get_pot_rec_id(self):
        """ get id of the potential recombinant from
        the alignment slices
        """
        # every slice holds the same sequences, so the first slice is enough
        slice_0 = next(iter(self.align_sliced.values()))

        self.pot_rec_id = slice_0[self.pot_rec_index].id

    def _prepare_distance_data(self):
        """ makes dictionary for distance data collection
        keys are sequences names (ids from the alignment)
        values are lists of distances to the potential recombinant
        """
        self.distance_data = {}

        slice_0 = next(iter(self.align_sliced.values()))

        for seq in slice_0:
            if seq.id == self.pot_rec_id:
                continue
            self.distance_data[seq.id] = []

    def simgen(self, pot_rec, window, shift, region=False, dist="pdist"):
        """
        Parameters
        ----------
        pot_rec : int
            index of the potential recombinant in the alignment.
        window : int
            sliding window size.
        shift : int
            sliding window shift along the alignment.
        region : list or tuple, optional
            The default is False. start and end of the region
            to analyze, e.g. (1000, 3000)
        dist : str, optional
            the distance calculation method. default is "pdist".
            available methods:
                pdist - pairwise distance (default)
                jcd - Jukes-Cantor distance
                k2p - Kimura 2-parameter distance
                td - Tamura distance

        Returns
        -------
        None.

        """

        if region:
            assert region[0] < region[1], "start of the region must be less than the region end"
            self.align_sliced = self.alignment_roll_window.roll_window_along_alignment_region(window_len=window,
                                                                                             window_step=shift,
                                                                                             region=region)
        else:
            self.align_sliced = self.alignment_roll_window.roll_window_along_alignment(window_len=window,
                                                                                      window_step=shift)
        self.pot_rec_index = pot_rec
        self._get_pot_rec_id()
        self._get_ticks_for_x_axis()
        self._prepare_distance_data()

        for window_borders, alignment_slice in self.align_sliced.items():
            for seq in alignment_slice:
                if seq.id == self.pot_rec_id:
                    continue
                seq1 = alignment_slice[self.pot_rec_index].seq
                seq2 = seq.seq
                distance = calc_pairwise_distance(seq1=seq1, seq2=seq2,
                                                  dist_method=dist)
                self.distance_data[seq.id].append(distance)

        self._plot_distance_by_plotly()

    def save_data(self, path=False, out="csv", out_name="distance_data",
                  data_cols="plot_ticks"):
        """saves the data spreadsheet as a csv file
        Parameters
        ---------
        path: str
            output destination
        out: str
            output file format; only "csv" is handled by the code below
        out_name: str
            output file name
        data_cols: str
            denoting columns in the spreadsheet: "plot_ticks" by default,
            or "window_pos" for sliding window start and end nucleotide
            position in alignment
        """

        if data_cols == "plot_ticks":
            columns = self.ticks_for_x_axis
        elif data_cols == "window_pos":
            columns = [x for x in self.align_sliced.keys()]
        else:
            raise ValueError('data_cols must be "plot_ticks" or "window_pos"')

        df = pd.DataFrame.from_dict(self.distance_data,
                                    orient='index',
                                    columns=columns)

        if out == "csv":
            file_name = out_name + ".csv"
            # write into `path` when it is given, otherwise into the working directory
            df.to_csv(f"{path}/{file_name}" if path else file_name)
        else:
            print("invalid output file format")

    def get_data(self, df=True):
        """returns distance data
        Parameters
        ---------
        df: bool
            True: returns pandas DataFrame object
            False: returns a tuple of the x axis ticks and a dictionary
            where keys are the sequence ids and values are distance data

        """
        if df:
            return pd.DataFrame(data=self.distance_data, index=self.ticks_for_x_axis).T
        else:
            return self.ticks_for_x_axis, self.distance_data

    def get_info(self):
        """outputs information about the alignment:
        index (which is the row number),
        sequence names, and alignment length"""

        print("index:", "sequence id:", sep="\t")
        for seq_index, seq in enumerate(self.alignment_roll_window.align):
            print(seq_index, seq.id, sep="\t")
        print("alignment length: ", self.alignment_roll_window.align.get_alignment_length())

--------------------------------------------------------------------------------
/test/hbv_C_Bj_Ba.fasta:
--------------------------------------------------------------------------------
>AB048704.1_genotype_C_
TTCCACAGCATTCCACCAAGCTCTGCAGGATCCCAGAGTAAGGGGTCTGTATTTTCCTGCTGGTGGCTCCAGTTCCGGAACAGTAAACCCTGTTCCGAATACTGTCTCTCACATCTCATCAATCTTCACGAAGACTGGGGACCCTGCATCGAACATGGAGAGCACAACATCAGGATTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCTCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAGCTCCCGGGTGTATTGGCCAAAATTCGCAGTCCCAAACCTCCAATCACTCACCAACCTCTTGTCCTCCAACCTGTCCTGGCTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTACTTCCAGGATCAACGACCACCAGCACGGGACCTTGCAGAACCTGCACGATCACTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGGTTTCGCAAAATTCCTATGGGAGTGGGCCTCAGTCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGCAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATCTGGTATTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTATGTCTTTGGGTATACATTTAAATCCCCACAAAACAAAAAGATGGGGTTATTCCCTCAACTTTATGGGATATGTGATCGGGAGTTGGGGAAGCTTACCGCAAGAGCATATTGTACACAAACTCAAACACTGTTTTAGAAAACTTCCTGTTAATAGGCCTATTGATTGGAAAGTATGCCAACGAATTGTGGGTTTATTGGGCTTCGCTGCCCCTTTTACACAATGTGGTTATCCTGCCTTAATGCCTTTGTATGCGTGCATACAAGCCAAGCAAGCTTTCACTTTCTCGCCAACTTACAAGGCCTTTCTGTGTAAACAATATCTGAACCTTTACCCCGTTGCCCGGCAACGGGCTGGTCTCTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCGTGCGGGGAACCTTTGTGGCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCAGCTTGTTTCGCTCGCAGCCGGTCTGGAGCGAACATTCTCGGGACCGACAACTCTGTTGTTCTCTCTCGGAAATACACCTCCTTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTACGCGGGACGTCCTTTGTCTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCGTCTCGGGGCCGCTTGGGGATCTACCGTCCCCTTCTGCGTCTCCCGTTCCGACCATCGACAGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACACCCACATGATCTTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTCCCAGCGATGTCAACGATCGACCTTGAGGCATACTTCAAAGACTGTTTGTTTAAAGACTGGGAGGAGTTGGGGGAGGAGATTAGGCTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTACTGTTCAAGCCTCCAAGTTGTGCCTTGGGTGGCTTTAGGACATGGACATTGACCCTTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGATTTCTTTCCAAATATTCGAGATCTCCTCGACACCG
CCTCCGCCCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGCTCACCTCACCATACCGCACTCAGGCAAGCTATACTGTGTTGGGGTGAGTTAATGAATCTGGCAACCTGGGTGGGAAGTAATTTGGAAGATCCAGCATCCAGGGAATTAGTAGTCAGTTATGTCAACGTTAATATGGGCCTAAAAATTAGACAACTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAGTATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCTATCGCTTACAGACCACCAAATGCCCCTATCCTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCCCAATGTTAGTATCCCTTGGACTCATAAGGTGGGAAACTTTACTGGGCTTTATTCTTCTACTGTACCTGTCTTTAATCCTGATTGGCAAACTCCCAAGTTTCCTGATATTCATTTAAAGGAGGACATTATCAATAGGTGTCAAAATTATGTAGGCCCTCTTACAGTCAATGAAAAAAGAAGATTAAAATTAATTATGCCTGCTAGGTTCTATCCTACTCTTACCAAATATTTGCCCCTAGAGAAAGGCATAAAACCTTATTATCCTGAACATGCAGTTAATCATTACTTCAAAACTAGGCATTATTTACATACTCTGTGGAAGGCTGGCATTCTATATAAGAGAGAAACTACACGCAGCGCCTCATTTTGTGGGTCACCATATTCTTGGGAACAAGAGCTACAGCATGGGAGGTTGGTCTTCCAAACATCGGAAAGGCATGGGGACGAATCTTTCTGTTCCCAATCCTCTGGGATTCTTTCCCGATCACCAGTTGGACCCTGCGTTCGGAGCCAACTCAAACAATCCAGATTGGGACTTCAACCCCAACAAGGATCACTGGCCAGAGGCAAATCAGGTAGGAGCGGGAGCATTCGGGCCAGGGTTCACCCCACCACACGGAGGTTTTTTGGGGTGGAGCCCGCAGGCCCAGGGCATATTGACAACAGTGCCAGCAGCTCCTCCTCCTGCATCCACCAATCGGCAGTCAGGAAGACAACCCACTCCCATCTCACCACCGCTCAGAGACACTCACCCTCAGGCCATGCAGTGGAA 3 | >AB033555.1_Ba 4 | 
CTCCACCACGTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCTCTGTACTTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTAAACCCTGTTCAGAACACTGCCTCTTCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTGCCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACTTGTTGTCCTCCGATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACCACCAGCACCGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGAACCTCTTTGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTTTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTCAGAAAACAAAAAGATGGGGCTACTCCCTTAACTTCATGGGGTATGTAATTGGAAGTTGGGGCACCTTACCCCAAGAACATATTGTGTTGAAAATCAAACAATGTTTTAGAAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTGTGTCAACGAATTGTGGGTCTTTTGGGATTTGCTGCTCCTTTCACACAATGTGGTTATCCTGCTTTAATGCCTTTATATGCATGTATACAAGCTAAACAGGCTTTTACTTTTTCGCCAACATATAAGGCCTTTCTAAACAAACAATATCTGAACCTTTACCCCGTTGCTCGGCAACGGCCAGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGCTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGAGCAAACCTTATCGGGACTGACAATTCTGTCGTCCTTTCCCGCAAATATACATCGTTTCCATGGCTGCTAGGATGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGTCGCTTGGGGCTCTACCGCCCTCTTCTCCGTCTGCCGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCATCGGAACCTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGCAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTAATGAGTGGGAGGAGTTGGGGGAGGAGATCAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTATTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTGGGGCATGGACATTGACCCGTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGACTTCTTTCCTTCTATTCGAGATCTTCTCGACACCG
CCTCAGCTCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGTTCACCTCACCATACGGCACTCAGGCAAGCTATTCTGTGTTGGGGTGAGTTGATGAATCTAGCCACCTGGGTGGGAAGTAATTTGGAAGACCCAGCCTCCCGGGAATTAGTAGTCAGCTATGTCAATGTTAATATGGGCCTAAAAATCAGACAACTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACACCTCCTGCATATAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCCCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAGTATTCCTTGGACTCATAAGGTGGGAAACTTTACGGGGCTTTATTCTTCTACGGTTCCTAGCTTTAACCCTCAATGGCAAACTCCTTCCTTTCCTGACATTCATTTGCAGGAGGACATCATTAATAGATGTAACCAATTTGTGGGACCCCTTACAGTGAATGAAAACAGGAGACTAAAATTGATTATGCCTGCTAGGTTCTATCCCAATGTTACTAAATATTTGCCCTTAGATAAAGGAATTAAACCTTATTATCCAGAGCATGTAGTTAATCATTACTTCCAGACGAGACATTATTTACATACTCTTTGGAAGGCGGGTATCTTATATAAAAGAGAGACAACACGTAGCGCCTCATTTTGCGGGTCACCATATTCTTGGGAACAAGAGCTACAGCATGGGAGGTTGGTCCTCCAAACCTCGACAAGGCATGGGGACAAATCTTTCCGTCCCCAATCCGCTGGGATTCTTTCCCGATCACCAGTTGGACCCTGCATTCAAAGCCAACTCCGACAATCCCGATTGGGACCTCAACCCACACAAGGACAACTGGCCGGACTCCAACAAGGTGGGAGTGGGAGCATTCGGGCCGGGATTCACTCCACCCCATGGGGGACTGTTGGGGTGGAGCCCTCAAGCTCAGGGCATACTCACAACTGTGCCAACAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTTAGGAAGGAAGCCTACTCCCCTGTCTCCACCTCTAAGAGACACTCATCCTCAGGCAATGCAGTGGAA 5 | >AB010291.1_Bj 6 | 
CTCCACCACTTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCTCTGTACCTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTAAGCCCTGCTCAGAATACTGTCTCTGCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTGCCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTCGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAGTTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACTTGTTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACCACCAGCACGGGACCATGCAAGACCTGCACAACTCCTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAACTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTCTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCAATTTTCTTGTGTCTCTGGGTATACATGTAAACCCTCACAAAACAAAAAGATGGGGATACTCCCTTAATTTCATGGGATATGTAATTGGGAGTTGGGGCACATTGCCACAGGAACATATTAGACAAAAAATCAAACTATGTTTTAGAAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTATGTCAAAGAATTGTGGGTCTTTTGGGGTTTGCGGCCCCTTTTACACAATGTGGATATCCTGCTTTAATGCCTTTATATGCATGTATATCAGCAAAACAGGCTTTTACTTTCTCGCCAACTTACAAGGCCTTTCTAAGTCAACAGTATCTGAACCTTTACCCCGTTGCTCGGCAACGGTCTGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGAGCGAAACTCATCGGGACTGACAATTCTGTAGTGCTCTCCCGCAAGTATACATCATTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGCCGCTTGGGGCTATACCGCCCGCTTCTCCGTCTACCGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCACCGGAACTTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGTAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTACTGAGTGGGAGGAGCTGGGGGAGGAGATGAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAGTCATCTCTTGTTCATGTCCTACTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTAGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTACTACGGAGTTAATCTCTTTTTTGCCTGCTGACTTCTTTCCGTCGGTGCGAGACCTCCTAGATACCG
CTGCTGCTCTGTATCGGGAAGCCTTAGAATCTCCTGAACATTGCTCACATCACCACACAGCACTCAGGCAAGCTACTCTGTGCTGGGGGGAATTAATGACTCTAGCTACCTGGGTGGGTAATAATTTACAAGATCCAGCCTCCAGGGATCTAGTAGTCAATTATGTTAACACTAACATGGGCCTAAAGATCAGGCAATTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCCCCTGCCTACAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCCCAATGTTAGTATCCCTTGGACTCATAAGGTGGGAAACTTTACGGGGCTCTATTCTTCTACAGTACCTGTCTTTAATCCTGAATGGCAGACTCCTTCTTTTCCAGACATTCATTTGCAGGAGGACATTGTTGATAGATGTAAGCAATTTGTGGGACCCCTTACAGTAAATGAAAACAGGAGACTAAAATTAATAATGCCTGCTAGATTTTATCCTAATGTTACCAAATATTTGCCCTTAGATAAAGGGATCAAACCTTATTATCCAGAGCATGTAGTTAATCATTACTTCCAGGCGAGACATTATTTGCATACTCTTTGGAAGGCGGGCATCTTATATAAAAGAGAGTCAACACATAGCGCCTCATTTTGCGGGTCACCTTATTCTTGGGAACAAGATCTACAGCATGGGAGGTTGGTCTTCCAAACCTCGAAAAGGCATGGGGACAAATCTTTCTGTCCCCAATCCCCTGGGATTCTTCCCCGATCATCAGTTGGACCCTGCATTCAAAGCCAACTCAGAAAATCCAGATTGGGACCTCAACCCACACAAGGACAACTGGCCGGACGCCCACAAGGTGGGAGTGGGAGCATTCGGGCCAGGGTTCACCCCTCCCCATGGGGGACTGTTGGGGTGGAGCCCTCAGGCTCAGGGCATACTCACATCTGTGCCAGCAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTCAGGAAGGCAGCCTACTCCCTTATCTCCACCTCTAAGGGACACTCATCCTCAGGCCGTGCAGTGGAA 7 | -------------------------------------------------------------------------------- /datasets/hbv_C_Bj_Ba.fasta: -------------------------------------------------------------------------------- 1 | >AB048704.1_genotype_C_ 2 | 
TTCCACAGCATTCCACCAAGCTCTGCAGGATCCCAGAGTAAGGGGTCTGTATTTTCCTGCTGGTGGCTCCAGTTCCGGAACAGTAAACCCTGTTCCGAATACTGTCTCTCACATCTCATCAATCTTCACGAAGACTGGGGACCCTGCATCGAACATGGAGAGCACAACATCAGGATTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCTCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAGCTCCCGGGTGTATTGGCCAAAATTCGCAGTCCCAAACCTCCAATCACTCACCAACCTCTTGTCCTCCAACCTGTCCTGGCTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTACTTCCAGGATCAACGACCACCAGCACGGGACCTTGCAGAACCTGCACGATCACTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGGTTTCGCAAAATTCCTATGGGAGTGGGCCTCAGTCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGCAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATCTGGTATTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTATGTCTTTGGGTATACATTTAAATCCCCACAAAACAAAAAGATGGGGTTATTCCCTCAACTTTATGGGATATGTGATCGGGAGTTGGGGAAGCTTACCGCAAGAGCATATTGTACACAAACTCAAACACTGTTTTAGAAAACTTCCTGTTAATAGGCCTATTGATTGGAAAGTATGCCAACGAATTGTGGGTTTATTGGGCTTCGCTGCCCCTTTTACACAATGTGGTTATCCTGCCTTAATGCCTTTGTATGCGTGCATACAAGCCAAGCAAGCTTTCACTTTCTCGCCAACTTACAAGGCCTTTCTGTGTAAACAATATCTGAACCTTTACCCCGTTGCCCGGCAACGGGCTGGTCTCTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCGTGCGGGGAACCTTTGTGGCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCAGCTTGTTTCGCTCGCAGCCGGTCTGGAGCGAACATTCTCGGGACCGACAACTCTGTTGTTCTCTCTCGGAAATACACCTCCTTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTACGCGGGACGTCCTTTGTCTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCGTCTCGGGGCCGCTTGGGGATCTACCGTCCCCTTCTGCGTCTCCCGTTCCGACCATCGACAGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACACCCACATGATCTTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTCCCAGCGATGTCAACGATCGACCTTGAGGCATACTTCAAAGACTGTTTGTTTAAAGACTGGGAGGAGTTGGGGGAGGAGATTAGGCTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTACTGTTCAAGCCTCCAAGTTGTGCCTTGGGTGGCTTTAGGACATGGACATTGACCCTTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGATTTCTTTCCAAATATTCGAGATCTCCTCGACACCG
CCTCCGCCCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGCTCACCTCACCATACCGCACTCAGGCAAGCTATACTGTGTTGGGGTGAGTTAATGAATCTGGCAACCTGGGTGGGAAGTAATTTGGAAGATCCAGCATCCAGGGAATTAGTAGTCAGTTATGTCAACGTTAATATGGGCCTAAAAATTAGACAACTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAGTATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCTATCGCTTACAGACCACCAAATGCCCCTATCCTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCCCAATGTTAGTATCCCTTGGACTCATAAGGTGGGAAACTTTACTGGGCTTTATTCTTCTACTGTACCTGTCTTTAATCCTGATTGGCAAACTCCCAAGTTTCCTGATATTCATTTAAAGGAGGACATTATCAATAGGTGTCAAAATTATGTAGGCCCTCTTACAGTCAATGAAAAAAGAAGATTAAAATTAATTATGCCTGCTAGGTTCTATCCTACTCTTACCAAATATTTGCCCCTAGAGAAAGGCATAAAACCTTATTATCCTGAACATGCAGTTAATCATTACTTCAAAACTAGGCATTATTTACATACTCTGTGGAAGGCTGGCATTCTATATAAGAGAGAAACTACACGCAGCGCCTCATTTTGTGGGTCACCATATTCTTGGGAACAAGAGCTACAGCATGGGAGGTTGGTCTTCCAAACATCGGAAAGGCATGGGGACGAATCTTTCTGTTCCCAATCCTCTGGGATTCTTTCCCGATCACCAGTTGGACCCTGCGTTCGGAGCCAACTCAAACAATCCAGATTGGGACTTCAACCCCAACAAGGATCACTGGCCAGAGGCAAATCAGGTAGGAGCGGGAGCATTCGGGCCAGGGTTCACCCCACCACACGGAGGTTTTTTGGGGTGGAGCCCGCAGGCCCAGGGCATATTGACAACAGTGCCAGCAGCTCCTCCTCCTGCATCCACCAATCGGCAGTCAGGAAGACAACCCACTCCCATCTCACCACCGCTCAGAGACACTCACCCTCAGGCCATGCAGTGGAA 3 | >AB033555.1_Ba 4 | 
CTCCACCACGTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCTCTGTACTTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTAAACCCTGTTCAGAACACTGCCTCTTCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTGCCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACTTGTTGTCCTCCGATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACCACCAGCACCGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGAACCTCTTTGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTTTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTCAGAAAACAAAAAGATGGGGCTACTCCCTTAACTTCATGGGGTATGTAATTGGAAGTTGGGGCACCTTACCCCAAGAACATATTGTGTTGAAAATCAAACAATGTTTTAGAAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTGTGTCAACGAATTGTGGGTCTTTTGGGATTTGCTGCTCCTTTCACACAATGTGGTTATCCTGCTTTAATGCCTTTATATGCATGTATACAAGCTAAACAGGCTTTTACTTTTTCGCCAACATATAAGGCCTTTCTAAACAAACAATATCTGAACCTTTACCCCGTTGCTCGGCAACGGCCAGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGCTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGAGCAAACCTTATCGGGACTGACAATTCTGTCGTCCTTTCCCGCAAATATACATCGTTTCCATGGCTGCTAGGATGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGTCGCTTGGGGCTCTACCGCCCTCTTCTCCGTCTGCCGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCATCGGAACCTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGCAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTAATGAGTGGGAGGAGTTGGGGGAGGAGATCAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAATCATCTCATGTTCATGTCCTATTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTGGGGCATGGACATTGACCCGTATAAAGAATTTGGAGCTTCTGTGGAGTTACTCTCTTTTTTGCCTTCTGACTTCTTTCCTTCTATTCGAGATCTTCTCGACACCG
CCTCAGCTCTGTATCGGGAGGCCTTAGAGTCTCCGGAACATTGTTCACCTCACCATACGGCACTCAGGCAAGCTATTCTGTGTTGGGGTGAGTTGATGAATCTAGCCACCTGGGTGGGAAGTAATTTGGAAGACCCAGCCTCCCGGGAATTAGTAGTCAGCTATGTCAATGTTAATATGGGCCTAAAAATCAGACAACTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACACCTCCTGCATATAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCCCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAGTATTCCTTGGACTCATAAGGTGGGAAACTTTACGGGGCTTTATTCTTCTACGGTTCCTAGCTTTAACCCTCAATGGCAAACTCCTTCCTTTCCTGACATTCATTTGCAGGAGGACATCATTAATAGATGTAACCAATTTGTGGGACCCCTTACAGTGAATGAAAACAGGAGACTAAAATTGATTATGCCTGCTAGGTTCTATCCCAATGTTACTAAATATTTGCCCTTAGATAAAGGAATTAAACCTTATTATCCAGAGCATGTAGTTAATCATTACTTCCAGACGAGACATTATTTACATACTCTTTGGAAGGCGGGTATCTTATATAAAAGAGAGACAACACGTAGCGCCTCATTTTGCGGGTCACCATATTCTTGGGAACAAGAGCTACAGCATGGGAGGTTGGTCCTCCAAACCTCGACAAGGCATGGGGACAAATCTTTCCGTCCCCAATCCGCTGGGATTCTTTCCCGATCACCAGTTGGACCCTGCATTCAAAGCCAACTCCGACAATCCCGATTGGGACCTCAACCCACACAAGGACAACTGGCCGGACTCCAACAAGGTGGGAGTGGGAGCATTCGGGCCGGGATTCACTCCACCCCATGGGGGACTGTTGGGGTGGAGCCCTCAAGCTCAGGGCATACTCACAACTGTGCCAACAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTTAGGAAGGAAGCCTACTCCCCTGTCTCCACCTCTAAGAGACACTCATCCTCAGGCAATGCAGTGGAA 5 | >AB010291.1_Bj 6 | 
CTCCACCACTTTCCACCAAACTCTTCAAGATCCCAGAGTCAGGGCTCTGTACCTTCCTGCTGGTGGCTCCAGTTCAGGAACAGTAAGCCCTGCTCAGAATACTGTCTCTGCCATATCGTCAATCTTATCGAAGACTGGGGACCCTGTGCCGAACATGGAGAACATCGCATCAGGACTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTCGTTGACAAAAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAGTTTTCTAGGGGGAACACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCAAATCTCCAGTCACTCACCAACTTGTTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACCACCAGCACGGGACCATGCAAGACCTGCACAACTCCTGCTCAAGGAACCTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGACGGAAACTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTCTGGCTTTCAGTTATATGGATGATGTGGTTCTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTATGCCGCTGTTACCAATTTTCTTGTGTCTCTGGGTATACATGTAAACCCTCACAAAACAAAAAGATGGGGATACTCCCTTAATTTCATGGGATATGTAATTGGGAGTTGGGGCACATTGCCACAGGAACATATTAGACAAAAAATCAAACTATGTTTTAGAAAACTTCCTGTAAACAGGCCTATTGATTGGAAAGTATGTCAAAGAATTGTGGGTCTTTTGGGGTTTGCGGCCCCTTTTACACAATGTGGATATCCTGCTTTAATGCCTTTATATGCATGTATATCAGCAAAACAGGCTTTTACTTTCTCGCCAACTTACAAGGCCTTTCTAAGTCAACAGTATCTGAACCTTTACCCCGTTGCTCGGCAACGGTCTGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCTTGGCCATAGGCCATCAGCGCATGCGTGGAACCTTTGTGTCTCCTCTGCCGATCCATACTGCGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGAGCGAAACTCATCGGGACTGACAATTCTGTAGTGCTCTCCCGCAAGTATACATCATTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGGCGCTGAATCCCGCGGACGACCCCTCCCGGGGCCGCTTGGGGCTATACCGCCCGCTTCTCCGTCTACCGTACCGACCGACCACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCTGCACGTCGCATGGAGACCACCGTGAACGCCCACCGGAACTTGCCCAAGGTCTTGCATAAGAGGACTCTTGGACTTTCAGTAATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTGTGTTTACTGAGTGGGAGGAGCTGGGGGAGGAGATGAGGTTAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGTTCACCAGCACCATGCAACTTTTTCACCTCTGCCTAGTCATCTCTTGTTCATGTCCTACTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTAGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTACTACGGAGTTAATCTCTTTTTTGCCTGCTGACTTCTTTCCGTCGGTGCGAGACCTCCTAGATACCG
CTGCTGCTCTGTATCGGGAAGCCTTAGAATCTCCTGAACATTGCTCACATCACCACACAGCACTCAGGCAAGCTACTCTGTGCTGGGGGGAATTAATGACTCTAGCTACCTGGGTGGGTAATAATTTACAAGATCCAGCCTCCAGGGATCTAGTAGTCAATTATGTTAACACTAACATGGGCCTAAAGATCAGGCAATTATTGTGGTTTCACATTTCCTGTCTTACTTTTGGAAGAGAAACTGTTCTTGAATATTTGGTGTCTTTTGGAGTGTGGATTCGCACTCCCCCTGCCTACAGACCACCAAATGCCCCTATCTTATCAACACTTCCGGAAACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCCCAATGTTAGTATCCCTTGGACTCATAAGGTGGGAAACTTTACGGGGCTCTATTCTTCTACAGTACCTGTCTTTAATCCTGAATGGCAGACTCCTTCTTTTCCAGACATTCATTTGCAGGAGGACATTGTTGATAGATGTAAGCAATTTGTGGGACCCCTTACAGTAAATGAAAACAGGAGACTAAAATTAATAATGCCTGCTAGATTTTATCCTAATGTTACCAAATATTTGCCCTTAGATAAAGGGATCAAACCTTATTATCCAGAGCATGTAGTTAATCATTACTTCCAGGCGAGACATTATTTGCATACTCTTTGGAAGGCGGGCATCTTATATAAAAGAGAGTCAACACATAGCGCCTCATTTTGCGGGTCACCTTATTCTTGGGAACAAGATCTACAGCATGGGAGGTTGGTCTTCCAAACCTCGAAAAGGCATGGGGACAAATCTTTCTGTCCCCAATCCCCTGGGATTCTTCCCCGATCATCAGTTGGACCCTGCATTCAAAGCCAACTCAGAAAATCCAGATTGGGACCTCAACCCACACAAGGACAACTGGCCGGACGCCCACAAGGTGGGAGTGGGAGCATTCGGGCCAGGGTTCACCCCTCCCCATGGGGGACTGTTGGGGTGGAGCCCTCAGGCTCAGGGCATACTCACATCTGTGCCAGCAGCTCCTCCTCCTGCCTCCACCAATCGGCAGTCAGGAAGGCAGCCTACTCCCTTATCTCCACCTCTAAGGGACACTCATCCTCAGGCCGTGCAGTGGAA 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![DOI](https://joss.theoj.org/papers/10.21105/joss.02014/status.svg)](https://doi.org/10.21105/joss.02014) 2 | [![PyPI version](https://badge.fury.io/py/recan.svg)](https://badge.fury.io/py/recan) 3 | 4 | [Recan](#recan) 5 | 6 | [Recan web version](#recan-web-version) 7 | 8 | [Requirements](#requirements) 9 | 10 | [Intallation](#intallation) 11 | 12 | [Usage example](#usage-example) 13 | 14 | [Some notes on usage](#some-notes-on-usage) 15 | 16 | [Automated tests](#automated-tests) 17 | 18 | [Example datasets](#example-datasets) 19 | 20 | [References](#references) 21 | 22 | [recan 
citations](#recan-citations)

# Recan
`recan` [9] is a Python package for constructing genetic distance plots to explore and discover recombination events in viral genomes. This method has previously been implemented in desktop software tools: RAT [1], Simplot [2] and RDP4 [8].

## Recan web version
A Django-based web version of Recan is currently under development:
https://github.com/babinyurii/recan_gui

It is available at:
http://yuriyb.pythonanywhere.com/

## Requirements
To use `recan`, you will need:
- Python 3
- Biopython
- plotly
- pandas
- Jupyter notebook

## Intallation
To install the package via `pip`, run:

`$ pip install recan`

If you are going to use `recan` in JupyterLab, follow [the instructions to install the JupyterLab Plotly renderer](https://plot.ly/python/getting-started/#jupyterlab-support-python-35).

## Usage example
The package is intended to be used in Jupyter notebook.
Import the `Simgen` class from the recan package:
```python
from recan.simgen import Simgen
```

Create an object of the Simgen class. To initialize the object, pass your alignment in fasta format as an argument:
```python
sim_obj = Simgen("./datasets/hbv_C_Bj_Ba.fasta")
```
The input data are taken from the article by Sugauchi et al. (2002) [3]. This paper describes a recombination event observed in hepatitis B virus isolates.

The object of the Simgen class has a method `get_info()` which shows information about the alignment.
```python
sim_obj.get_info()
```
```
index:  sequence id:
0       AB048704.1_genotype_C_
1       AB033555.1_Ba
2       AB010291.1_Bj
alignment length: 3215
```

We have three sequences in our alignment. The `Simgen` class is based upon the `MultipleSequenceAlignment` class of the Biopython library.
So, we treat our alignment as an array with n_samples and n_features, where the 'samples' are the sequences themselves and the 'features' are the columns of nucleotides in the alignment. Each index corresponds to one sequence. Note that indices start at 0.

After you've created the object you can draw the similarity plot.
Call the method `simgen()` of the Simgen object to draw the plot. Pass the following parameters to the method:
- `window`: sliding window size, i.e. the number of nucleotides the sliding window spans. It is 500 by default.
- `shift`: the step by which the window slides along the alignment. Its value is set to 250 by default.
- `pot_rec`: the index of the potential recombinant. All the other sequences will be plotted as a function of their distance to that sequence. Use the method `get_info()` to get the indices, especially if your alignment has many sequences.

The isolate of genotype Ba is a recombinant between viruses of genotype C and genotype Bj. Let's plot it. We set genotype Ba as the potential recombinant:

```python
sim_obj.simgen(window=200, shift=50, pot_rec=1)
```

![hbv_1](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/HBV_1_rec_C_B_annotated.PNG)

The potential recombinant is not shown in the plot, as the distances are calculated relative to it. The higher the plotted value (i.e. the closer to 1), the more similar the sequence is to the potential recombinant, and vice versa.

We can see a typical 'crossover' of the curves, which indicates a possible recombination event. The curve of one isolate drops while that of the other stays level or even moves closer to the potential recombinant; this abrupt drop shows that recombination could have taken place.

The picture from the article is shown below. It is simply turned upside down relative to our plot, so instead of a drop in distance we see a rise.
Here Bj 'goes away' from genotype C, whereas Ba keeps the same distance.

![Ba_Bj_C](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hbv_C_Bj_Ba.jpg)

By default the `simgen()` method plots the whole alignment. After initial exploration, we can take a closer look at a particular region by passing the `region` parameter to `simgen()`, which slices the alignment. `region` must be a tuple or a list with two integers: the start and the end position of the alignment slice.
```python
region = (start, end)
```

```python
sim_obj.simgen(window=200, shift=50, pot_rec=1, region=(1000, 2700))
```

![hbv_slice_1](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hbv_slice_1.png)

To customize the plot or just to export and store the data, use the `get_data()` method. `get_data()` returns a pandas DataFrame with sequences as rows (samples) and distances at the given positions as columns (features).

```python
sim_obj.get_data()
```
![hbv_df_example](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hbv_df_example.png)

If the optional parameter `df` is set to `False`, `get_data()` returns a tuple containing a list of ticks and a dictionary of lists. Each dictionary key is a sequence id, and the list under each key contains the corresponding distances.
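
To make the roles of `window`, `shift` and `pot_rec` and the shape of this `(ticks, dictionary)` tuple more concrete, here is a minimal, self-contained sketch. It is not recan's implementation: the function `window_similarity` and the toy sequences are hypothetical, the score is a plain fraction of identical sites, and placing each tick at the window end is an assumption made only for illustration.

```python
def window_similarity(seqs, pot_rec, window, shift):
    """Toy sliding-window similarity of every sequence to the potential recombinant.

    seqs maps sequence id -> aligned sequence string (all of equal length).
    pot_rec is the 0-based index of the potential recombinant.
    Returns (positions, data): a list of ticks and a dict of per-sequence lists.
    """
    ids = list(seqs)
    ref_id = ids[pot_rec]
    ref = seqs[ref_id]
    positions = []
    # the potential recombinant itself is left out, as in the plot
    data = {seq_id: [] for seq_id in ids if seq_id != ref_id}
    for start in range(0, len(ref) - window + 1, shift):
        end = start + window
        positions.append(end)  # assumption: the tick marks the window end
        for seq_id in data:
            other = seqs[seq_id]
            matches = sum(a == b for a, b in zip(ref[start:end], other[start:end]))
            data[seq_id].append(matches / window)
    return positions, data


# hypothetical toy alignment: the recombinant follows parent_1 first, then parent_2
toy = {
    "parent_1": "AAAAAAAAGGGG",
    "recombinant": "AAAAAAAACCCC",
    "parent_2": "TTTTTTTTCCCC",
}
positions, data = window_similarity(toy, pot_rec=1, window=4, shift=4)
print(positions)  # [4, 8, 12]
print(data)       # {'parent_1': [1.0, 1.0, 0.0], 'parent_2': [0.0, 0.0, 1.0]}
```

The two lists cross between the second and third window, which is the toy analogue of the 'crossover' seen in the plots. The call on a real `Simgen` object, shown next, returns the same kind of two-part structure.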

```python
positions, data = sim_obj.get_data(df=False)
```
```
print(positions)
[1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2150, 2200, 2250, 2300, 2350, 2400, 2450, 2500, 2550, 2600, 2650, 2700]

print(data)
{'AB048704.1_genotype_C_': [0.88, 0.935, 0.925, 0.955, 0.955, 0.965, 0.95, 0.935, 0.94, 0.92, 0.9299999999999999, 0.945, 0.925, 0.945, 0.96, 0.95, 0.975, 0.9733333333333334, 0.96, 0.96], 'AB010291.1_Bj': [0.98, 0.975, 0.97, 0.97, 0.965, 0.95, 0.91, 0.88, 0.85, 0.83, 0.825, 0.865, 0.885, 0.9299999999999999, 0.98, 0.97, 0.98, 0.9733333333333334, 0.96, 0.96]}
```

Once you have the data, you can easily customize the plot using your favourite plotting library:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# DataFrame with sequence ids as the index and window positions as columns
dist_data = sim_obj.get_data()

fig_dist1 = plt.figure(figsize=(20, 8))
plt.plot(dist_data.loc["AB048704.1_genotype_C_", :], lw=7, alpha=0.7, label="AB048704.1_genotype_C_")
plt.plot(dist_data.loc["AB010291.1_Bj", :], lw=7, alpha=0.7, label="AB010291.1_Bj")

plt.ylim(0.75, 1.05)
plt.title("similarity distance plot", fontsize=25)
plt.ylabel("distance relative to Ba", fontsize=20)
plt.xlabel("nucleotide position", fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

# mark the putative recombination breakpoints
plt.axvline(1750, alpha=0.5, color="red", lw=3,
            linestyle="dashed", label="putative recombination break points")
plt.axvline(2250, alpha=0.5, color="red", lw=3,
            linestyle="dashed")

plt.legend(prop={"size": 20})
plt.show()
```

![hbv_matplotlib](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hbv_matplotlib.png)

The `simgen()` method has an optional parameter `dist` which denotes the method used to calculate
pairwise distance. By default its value is set to `pdist`, so `simgen()` calculates simple pairwise distance.

Parameters for distance calculation methods:

- `pdist` : pairwise distance (default)
- `jcd` : Jukes-Cantor distance
- `k2p` : Kimura 2-parameter distance
- `td` : Tamura distance

```python
sim_obj.simgen(window=200, shift=50, pot_rec=1, region=(1000, 2700), dist='k2p')
```

To save the distance data in csv format, use the method `save_data()`:
```python
sim_obj.save_data(out_name="hbv_distance_data")
```
If there are about 20 or 30 sequences in the input file and their names are long, the legend may hide the plot. So, to analyze many sequences at once, it's better to use short, concise sequence names instead of long ones. Like this:

![hbv_short_names](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/short_names.png)

To illustrate what typical breakpoints may look like, here are some examples of previously described recombinations in the genomes of different viruses. The fasta alignments used are available in the [datasets folder](datasets).
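
Before the examples, a brief aside on the `dist` options listed above. The sketch below shows the standard textbook formulas for the p-distance, the Jukes-Cantor distance and the Kimura 2-parameter distance; the Tamura distance is omitted for brevity. It is not recan's code: all function names are hypothetical, and skipping columns with gaps or ambiguous symbols is modelled on the usage notes in this README rather than copied from the package. Since the plotted values lie close to 1 for similar sequences, recan presumably reports a similarity rather than the raw distance, but that conversion is an assumption and is not shown.

```python
import math

VALID = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_sites(seq_a, seq_b):
    """Return (compared sites, transitions, transversions) for two aligned sequences.

    Columns with a gap or an ambiguous symbol in either sequence are skipped.
    """
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return compared, transitions, transversions


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among the compared sites."""
    n, ts, tv = count_sites(seq_a, seq_b)
    return (ts + tv) / n


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Kimura 2-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    n, ts, tv = count_sites(seq_a, seq_b)
    P, Q = ts / n, tv / n  # transition and transversion proportions
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# toy pair: 20 comparable sites, one transition (A/G), one transversion (C/A), one gap column
seq_a = "A" * 10 + "C" * 10 + "-"
seq_b = "G" + "A" * 9 + "A" + "C" * 9 + "T"
print(count_sites(seq_a, seq_b))  # (20, 1, 1)
print(p_distance(seq_a, seq_b), jukes_cantor(seq_a, seq_b), kimura_2p(seq_a, seq_b))
```

For closely related sequences the three values are nearly identical; the corrected distances grow faster than the p-distance as divergence increases, which is why the choice of `dist` matters mostly for distant sequences.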

Putative recombinations in the 145,000 bp genome of lumpy skin disease virus [4]:

![lsdv](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/lsdv_rec_sar.png)

Recombination in the HIV genome [5]:
![hiv](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hiv_rec_kal153.png)

HCV intergenotype recombinant 2k/1b [6]:
![hcv](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/hcv_2k_1b_rec.png)

Norovirus recombinant isolate [7]:
![norovirus](https://raw.githubusercontent.com/babinyurii/recan/master/pictures/norovirus_rec.png)

## Some notes on usage
- the optimal window size is about 200-250 bp; the optimal window shift is typically about 50-150 bp
- distance calculation now skips degenerate nucleotides and gaps, so they do not influence the distance values

## Automated tests
To verify the installation, go to the `recan/test/` folder and run:

`$ pytest test.py`

## Example datasets
To download the datasets use the following link:
https://drive.google.com/drive/folders/1v2lg5yUDFw_fgSiulsA1uFeuzoGz0RjH?usp=sharing

## References

1. Recombination Analysis Tool (RAT): a program for the high-throughput detection of recombination. Bioinformatics, Volume 21, Issue 3, 1 February 2005, Pages 278–281, https://doi.org/10.1093/bioinformatics/bth500
2. https://sray.med.som.jhmi.edu/SCRoftware/simplot/
3. Hepatitis B Virus of Genotype B with or without Recombination with Genotype C over the Precore Region plus the Core Gene. Fuminaka Sugauchi et al. Journal of Virology, June 2002, p. 5985–5992. 10.1128/JVI.76.12.5985-5992.2002 https://jvi.asm.org/content/76/12/5985
4. Sprygin A, Babin Y, Pestova Y, Kononova S, Wallace DB, Van Schalkwyk A, et al.
(2018) Analysis and insights into recombination signals in lumpy skin disease virus recovered in the field. PLoS ONE 13(12): e0207480. https://doi.org/10.1371/journal.pone.0207480
5. Liitsola, K., Holm, K., Bobkov, A., Pokrovsky, V., Smolskaya, T., Leinikki, P., Osmanov, S. and Salminen, M. (2000) An AB recombinant and its parental HIV type 1 strains in the area of the former Soviet Union: low requirements for sequence identity in recombination. UNAIDS Virus Isolation Network. AIDS Res. Hum. Retroviruses, 16, 1047–1053.
6. Smith, D. B., Bukh, J., Kuiken, C., Muerhoff, A. S., Rice, C. M., Stapleton, J. T., & Simmonds, P. (2014). Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: Updated criteria and genotype assignment web resource. Hepatology, 59(1), 318–327. https://doi.org/10.1002/hep.26744
7. Jiang, X., Espul, C., Zhong, W.M., Cuello, H. and Matson, D.O. (1999) Characterization of a novel human calicivirus that may be a naturally occurring recombinant. Arch. Virol., 144, 2377–2387.
8. Martin, D. P., Murrell, B., Golden, M., Khoosal, A., & Muhire, B. (2015). RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution, 1(1), 1–5. https://doi.org/10.1093/ve/vev003
9. Babin, Y. (2020). Recan: Python tool for analysis of recombination events in viral genomes. Journal of Open Source Software, 5(49), 2014. https://doi.org/10.21105/joss.02014

## recan citations

1. Zimerman RA et al. (2022). Comparative Genomics and Characterization of SARS-CoV-2 P.1 (Gamma) Variant of Concern From Amazonas, Brazil. Front. Med. 9:806611. https://doi.org/10.3389/fmed.2022.806611
2. Python Data Analysis Techniques in Administrative Information Integration Management System. In: Proceedings of the 4th International Conference on Big Data Analytics for Cyber-Physical System in Smart City - Volume 2. April 2023. https://doi.org/10.1007/978-981-99-1157-8_35
3.
Substantial viral diversity in bats and rodents from East Africa: insights into evolution, recombination, and cocirculation. Daxi Wang et al. 2024. Microbiome (2024) 12:72 https://doi.org/10.1186/s40168-024-01782-4 247 | 4. Identification of Recombinant Aichivirus D in Cattle, Italy.Pellegrini, F et al. I Animals 2024, 14, 3315. https://doi.org/10.3390/ani14223315 248 | -------------------------------------------------------------------------------- /datasets/hbv.fasta: -------------------------------------------------------------------------------- 1 | >KM519455_South_Africa 2 | TCCTAGGACCCCTTCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTGTGTCCTCTAATTCCAGGATCCTCAACCACCAGCACGGGACCATGCCGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAATAAAACAAAGAGATGGGGTTACTCTCTAAATTTTA 3 | >KM577669.1_Iran 4 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCTTGTCCTCCAACTTGTCCTGGTTATCGCTTGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAGGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGCGGGACCATGCAGAACCTGCACGACTACTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 5 | >KM606742.1_Cuba 6 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGATCACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTATTGGTTCTTCTGGATTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCAACAACAACCAGTACGGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGCAACTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGATGGAAATTGCACCTGTATTCCCATCCCATCGTCCTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGCTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCGTGAGTCCCTTTATACCGCTGTTACCAATTTTCTTTTGTCTCTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTATTCCCTAAACTTCA 7 | >KM606753_Cuba 8 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCACGGGACCCTGCAGAACATGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGCTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTACTCTTTACATTTCA 9 | >KP322600_China 10 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTACTCTTTACATTTCA 11 | >KP322601_China 12 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGCGGGACCATGCAGAACCTGCACGACTACTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 13 | >KP718093.1_Panama 14 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGATCACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTATTGGTTCTTCTGGATTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCAACAACAACCAGTACGGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGCAACTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGATGGAAATTGCACCTGTATTCCCATCCCATCGTCCTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGCTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCGTGAGTCCCTTTATACCGCTGTTACCAATTTTCTTTTGTCTCTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTATTCCCTAAACTTCA 15 | >Mur_11 16 | 
TCCTAAGACCCCTGCTCGTGTTACAGGAGGGGTTTTTCTCGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGATCACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTATTGGTTCTTCTGGATTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCAACAACAACCAGTACGGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGCAACTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGATGGAAATTGCACCTGTATTCCCATCCCATCGTCCTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGCTATATGGATGATGTGGTATTGGGGGCCAAGACTGTACAGCATCGTGAGTCCCTTTATACCGCTGTTACCAATTTTCTTTTGTCTCTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTATTCCCTAAACTTCA 17 | >Mur_18 18 | TCCTAGGACCCCTTCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTACTTCCAGGATCCTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTGCAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTATTCTCTAAATTTTA 19 | >Mur_19 20 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGTGGGACCATGCAGGACATGCACGACTACTGTTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAATCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTGAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTACATTTCA 21 | >Mur_28 22 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGAAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCATTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTCTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGTGGGACCATGCAGAACCTGCACGACTACTGTTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 23 | >Mur_29 24 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTCTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGCGGGACCATGCAGAACCTGCACGACTACTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACAAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCATTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTTA 25 | >Mur_31 26 | TCCTAGGACCCCTTCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACCACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCGACTTGTCCTGGTTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTGCGTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCATCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACAAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAACATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTATTCTTTAAATTTTA 27 | >Mur_37 28 | 
TCCTAGGACCCCTTCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCCTCAACCACCAGCACGGGACCATGCCGAACCTGCACGACTCTTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTACTCTCTACATTTTA 29 | >Mur_4 30 | TCCTAAGACCCCTTCTCGTGTTACAGGCGGGGTTTTCCTTGTTGACAAGAATCCTCAAAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCCAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCCTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACATCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTATTCTCTAAATTTTA 31 | >Mur_40 32 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTCTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGCGGGACCATGCAGAACCTGCACGACTACTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACAAAACCTTCGGACGGAAATTGCACTTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCATTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCACCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTGAACCCTCACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 33 | >Mur_44 34 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGGACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTCTGTCCTCTAATTCCAGGATCTTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTACTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACCGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 35 | >Mur_45 36 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGGACCACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTGCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACGACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGCACCTCTATGTATCCCTCATGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCATGTCTGTTCCGCACCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTGACAAAACAAAAAGATGGGGTTACTCTTTACATTTCA 37 | >Mur_52 38 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTCGGGGGATCACCCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAATTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTATTGGTTCTTCTGGATTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCAACAACAACCAGTACGGGACCCTGCAAAACCTGCACGACTCCTGCTCAAGGCAACTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGATGGAAATTGCACCTGTATTCCCATCCCATCGTCCTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCATTGTTTGGCTTTCAGCTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCGTGAGTCCCTTTATACCGCTGTTACCAATTTTCTTTTGTCTCTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTATTCCCTGAACTTCA 39 | >Mur_53 40 | 
TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCTTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCTTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACCGCATCTTGAGTCCCTTTTTACCGCTGTAACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTACTCTTTACATTTTA 41 | >Mur_59 42 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAGGGTATGTTGCCCGTCTGTCCTCTAATTCCAGGATCTTCAACCACCAGCGTGGGACCATGCAGAACCTGCACGACAACTGTTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTGTTGGGGGCCAAGACTGTGCAGCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACTAAAAGATGGGGTTACTCTTTAAATTTCA 43 | >Mur_68 44 | 
TCCTAGGACCCCTGCTCGTGTTACAGGGGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCACAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGAATCACCCGTGTGTCCTGGCCAAAATTCGCAGTCCCCAACTTGCAGTCACTCACCAACCTTCTGTCCTCAAACTTGTCGTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCATATTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTATTGGTTCTTCGGAATTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCAACAACAACCAGTACGGGACCATGCAAAACCTGCACGACTCCTGCTCAAGGCAACTCTATGTTTCCCTCATGTTGCTGTACAAAACCTACGGATGGAAATTGCACCTGTATTCCCATCCCATCGTCCTGGGCTTTCGCAAAATACCTATGGGAGTGGGCCTCAGTCCGTTTCTCTTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGCTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAACATCGTGAGTCCCTTTATACCGCTGTTACCAATTTTCTTTTGTCTCTGGGTATACATTTAAACCCTAACAAAACAAAAAGATGGGGTTATTCCCTAAACTTCA 45 | >Mur_9 46 | TCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACCACCGTGTGTCTTGGCCAAAATTCGCAGTGCCCAACCTCCAATCACTCACCAACCTCCTGTCCTCCAACTTGTCCTGGTTATCGTTGGATGTGTCTGCGGCGTTTTATCATCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCTCTAATTCCAGGATCCTCAACCACCAGCACGGGACCATGCAGAACCTGCACGACTCCTGCTCAAGGAACCTCTATGTATCCCTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCCTATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTGGTGCCCTTTGTTCAGTGGTTCGTAGGGCTTTCCCCCACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACACCATCTTGAGTCCCTTTTTACCGCTGTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTATTCTCTAAATTTTA 47 | -------------------------------------------------------------------------------- /datasets/norovirus.fasta: -------------------------------------------------------------------------------- 1 | >AF190817.1_Arg320 2 | 
TGAATGAAGATGGCCCTATCATATTTGAGAAACATTCTAGGTACCACTATCACTATGATGCAGACTACTCCCGGTGGGATTCAACACAACAGAGAGCTGTGCTAGCCGCAGCCCTAGAGATCATGGTAAAATTCTCTTCAGAACCACATCTAGCCCAGGTGGTTGCAGAGGACCTTCTTTCTCCCAGTGTGATGGACGTGGGTGATTTCAAGATATCAATCACTGAAGGGCTTCCTTCCGGTGTGCCTTGCACTTCACAGTGGAATTCCATCGGCCATTGGCTCCTCACACTCTGTGCACTCTCTGAGACTACAAATCTGTCCCCTGACATCATCCCGGCAAATTCTCTCTTCTCCTTTTATGGTGATGATGAAATCGTGAGCACAGATATCAAATTAGACCCAGGGAAGCTTACAGCCAAGTTGAAAGAGTATGGGTTGAAACCAACTCGCCCTGACAAGACTGAAGGGCCTCTGATCATCTCTGAGGACCTAGACGGTCTGACCTTCCTACGAAGGACCGTGACCCGTGACCCAGCTGGCTGGTTTGGAAAGTTGGAGCAAAGTTCAATACTTAGACAAATGTATTGGACCAGGGGCCCTAACCATGAGGACCCCTCTGAAACAATGATACCACACTCCCAGAGACCCATACAGCTAATGTCACTGTTAGGTGAAGCAGCACTGCATGGACCATCATTCTACAGTAAGATTATTAAGCTAGTTATTGCAGAGCTGAAGGAAGGTGGCATGGACTTTTACGTGCCTAGACAAGAACCAATGTTCCGGTGGATGAGGTTCTCAGACTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGCTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGACCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGTCAGCAAAACATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCCCCTAGGAATCCCCCTGGTGAAGTGCTTCTTAATTTGGAATTGGGCCCAGAAATAAACCCCTATTTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGATTTGAAGTGCAGGTAGTCCTGGCTGGGAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCTCCTAATTTTCCAATTGATAATCTGAGCGCAGCACAAATCACAATGTGCCCGCATGTGATTGTGGATGTCAGACAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAACAACTTCTTTCATTACAATCAAGGGTCTGATTCGAGATTGCGCTTAATTGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGAGATGATGTTTTTACTGTGTCCTGTAGAGTACTGACTAGGCCTAGCCCTGACTTCTCATTCAATTTCCTTGTCCCACCTACTGTGGAATCAAAGACAAAACCCTTTACCCTCCCTATTCTGACTATCTCTGAAATGTCCAATTCTAGGTTTCCAGTGCCGATTGAGTCTTTGCACACCGGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTACCGAGTCAAATTTGTG-CTTTTAGGGGCGTGCTCACCAGATCAACAAGTAGGGCCAGTGATCAGGCCGATACAGCAACCCCTAGGCTGTTTAATTATTATTGGCATGTACAATTGGATAATCTAAATGGGACACCTTATGACCCTGCAGAAGACATACCAGGCCCCCTAGGGACACCAGACTTCCGGGGCAAGGTCTTTGGCGTGGCCAGCCAGAG---AAACCCTGACAGCACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGTTTCACCCCAAAATTGGGCTCATTAGAGATATCTA
CTGACTC---CAGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACAATGAGGCAGATTTTCAACAATGGTCTTTACCCGACTATTCTGGTCAATTCACTCACAACATGAACTTGGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGATCACAGTTACCATCTTCTGGTGGGCGATCCAACGGGGTTCTAGACTGTCTGGTCCCCCAGGAATGGGCTCAACACTTCTACCAGGAATCGGCCCCGGCCCAAACACAAGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGTAGAGTGCTATTTGAGGCCAAGCTGCATAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTGGCTGGTGACATGCTCGCAAATACTGTAGGATCTCTAGTTAGTGCAGGGGCCAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTACAAAATGCATCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAAATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAAGGGGTTTTGAC-GCTGGCGGCTTCTCCCCTACTGATGCAGCCCGTGG--CAATTAATGCCCCCATGACAAAAGTTTTAGATTGGAGTGGAACAAGGTACTGGGCACCAAATGCCACCTCCACAACCT-CAATGTCGGGTGGCTT--CACAAACCAAGCTGTTCACAGAACCACGCCAAATTTTAAA---ACGAACCAGACCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGCGAGGTCAAACTCAACCCAACTCACTAGCCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTATTAGCTCA---TTACCATCTTCTAACAGGACTAGGGACTGGGTCAACCAACAGAATTTTAATTTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTTACTCCACCATCTAGTACGGCCTCTAGTTCAAGCACGGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATCATTTTAAATTTGATTTAAATTGGATTTAA 3 | >U22498.1_MX 4 | 
TGAATGAAGATGGTCCCATAATATTTGAGAAACATTCCAGATACAGATACCACTACGACGCAGATTACTCCCGCTGGGACTCCACGCAGCAGCGGGCAGTGTTGGCAGCAGCACTTGAAATCATGGTGAGGTTCTCTGCTGAACCACAGCTAGCACAAATAGTAGCTGAAGATCTGCTAGCACCAAGTGTGGTTGATGAGGGTGACTTCAAGATCACCATTAACGAAGGCCTACCTTCTGGTGTGCCTTGCACCTCACAGCGGAACTCCACCGCCCACTGGTTGCTTACTCTGTGTGCCCTTTCTGAAGTGACAGGACTAGGCCCCGACATCATACAAGCTAATTCTATGTACTCTTTCTATGGTGATGATGAGATAGTGAGTACTGACATAAAATTGGACCCAGAGAAACTGACTGCAAAACTCAAAGAGTACGGCCTCAAACCCACTCGGCCCGACAAAACCGAAGGGCCGCTGGTGATCAGTGAAGACTTGAATGGTTTAACGTTCCTCCACCGAAACGTCACCCGTGACCCAGCAGGTTGGTTTGGAAAGCTGGAGCAAAGTTCCATCCTCAGGCAGCTATACTGGACAAGAGGACCTAACCATGAAGACCCCAGTGAAACCATGATACCACATGCGCTGAGACCCGTGCAGCTCATGGCACTACTGGGAGAATCCTCCCTAAATGGACCCTCATTTTACAGCAAGGTCAGCAAGCTGGTTATATCTGAACTTAAGGAGGGAGGAATGGATTTTTATGTGCCCAGACAAGAGTCAATGTTCAGGTGGATGAGGTTCTCAGATCTAAGCACATGGGAGGGCGATCGCAATCTGGCCTCCAGTTTTGTGAATGAAGATGGCGTCGAATCGCGCTGCTCCATCTAATGATGGTGCCGCCTGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCGGGTGCAGCGATAGCAGCGCCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCACCCAGGAATTCCCCTGGTGAAGTGCTTCTTAATTTGGAATTAGGTCCAGAAATAAATCCTTATTTGGCTCATCTTGCTAGAATGTACAATGGTTATGCAGGTGGATTTGAAGTGCAAGTGGTCCTGGCTGGAAATGCGTTTACAGCAGCAAAAATTATCTTTGCAGCTATACCCCCTAACTTCCCTATTGACAATCTGAGCGCGGCACAGATCACAATGTGCCCGCATGTGATTGTGGATGTCAGGCAGTTGGAACCAATCAATCTTCCGATGCCTGATGTCCGCAACAATTTCTTTCATTATAATCAAGGTTCTGATTCAAGATTACGCTTAATTGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGAGATGATGTTTTCACTGTGTCTTGTAGGGTGTTAACTAGGCCTAGCCCTGATTTCTTATTCAATTTTCTTGTCCCACCCACTGTGGAATCAAAGACAAAACCTTTTACCCTCCCCATTTTAACCATCTCTGAAATGTCTAATTCCAGGTTTCCGGTGCCAATTGACTCTCTGCACACCAGCCCAACTGAGAATATAGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGACGGTGAGTTGATGGGCACCACCCAACTCTTACCGAGCCAAATATGTG-CTTTCAGGGGCACACTCACTGGATCAACAAGCAGGGCCAGTGACCAAGCCGACACACCAACCCCTAGGCTATTCAACCATCATTGGCACATACAATTGGATAATCTAAATGGAACTCCCTACGACCCTGCAGAGGACATACCAGCTCCTTTGGGCACACCAGACTTCCGGGGCAAGGTCTTTGGCGTAGCCGGCCAGAG---AAACCCCGACAGCACAACAAGGGCACATGAAGCAAAAGTGGACACAACATCTGGCCGCTTCACCCCAAAATTGGGCTCCTTAGAAATAACCA
CTGAATC---TGATGACCTTGACCTAAGCCAGCCAACAAAATTCACCCCAGTTGGCATT---------GGAGTTGACAATAGGGCAGAATTTCAGCAATGGTCCTTACCTGACTATTCCGGTCAGTTTACTCACAACATGAACTTGGCCCCAGCTGTCGCCCCCAATTTTCCTGGTGAACAGCTACTTTTCTTCCGATCACAGCTGCCATCCTCTGGTGGGCGGTCTAACGGGGTTCTAGACTGCCTGGTCCCCCAGGAATGGGTTCAACACTTTTACCAAGAATCAGCCCCCGCCCAAACACAGGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGTAGAGTGCTATTTGAGGCCAAGCTACACAAATTGGGTTTTATGACTGTAGCAAAGAATGGTGACTCCCCAATAACTGTCCCTCCAAATGGTTATTTTAGATTTGAATCTTGGGTTAACCCCTTTTACACACTTGCCCCCATGGGAACTGGAAACGGGCGTAGAAGGATTCAATAATGGCCGGAGCTTTTATAGCAGGATTGGCTGGGGACATGCTCACAAACACTGTGGGGTCTTTGGTTAATGCAGGGGCTAATGCTATCAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCGAAATGCTTCTTTT-AAT--------------------------------CATGATAAGGAGATGCTAAATGCACAAATTGCGGCAACAAAGAGGCTGCAGGCTGACATGATTGCAATCAAACAACCCGTCTTGAC-GCTGGCGGCTTTTCCCCTACTGATGCAGCACGTGG--CAATTAATGCCCCAATGACAAAAGTTTTAGATTGGAGTGGAACAAGGTACTGGGCACCAAACGCCACCTCCACAACTT-CAATGTCAGGTGGCTT--CACAAGCCAAGTTGTGCACAGAACCACACCAAATTTCAAA---ACGAACCAGGCCCCCGAATTCACACCCAGCAGTGGGTCTTC-AGTGAGATCAAGCTCAACCCAACTCACCAACTTGAGCTCACACTCATCTGGTCGGTCCCGATCTAGCGGGT-CTACGGTTGTCAGCTCG---CTGCCGTCCTCCAGTAGGACTAGGGATCGGGTCAATCAACAGAATCTCAATTTGGAACCATACATGCCTGGATCTCTCAGGATAGCTTTTGTCACTCCACCATCTAGCACAGCCTCTAGTTCAGGCACAGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCT-CGTTTAACAGGCGCAGACAGCCGCTGTTTGATACCCCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTATTCCTAAATTTGATTTAGAATCTTTTAC 5 | >X86557.1_Lordsdale 6 | 
TGAATGAGGATGGCCCCATCATCTTCGAGAGACACTCCAGATACAAGTATCACTATGATGCTGACTACTCTCGGTGGGATTCAACACAACAAAGGGCCGTGTTAGCAGCAGCCCTAGAAATCATGGTTAAATTCTCCCCAGAACCGCATTTGGCCCAGATAGTTGCAGAAGACCTTCTATCTCCTAGTGTGATGGATGTGGGTGACTTCAAAATATCAATCAATGAGGGCCTTCCCTCTGGTGTGCCCTGCACCTCTCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCTGAAGTTACAAACCTGTCCCCTGACATCATACAGGCTAATTCCCTCTTTTCCTTCTATGGTGATGATGAAATTGTCAGTACAGATATAAACTTAAACCCAGAGAAACTAACAGCAAAGCTCAAGGAATACGGGTTGAAACCAACCCGCCCTGACAAAACTGAGGGACCCCTTATTATCTCTGAAGACCTGAACGGCCTCACCTTCCTGCGGAGGACTGTGACCCGCGACCCAGCTGGTTGGTTTGGAAAACTGGATCAGAGCTCAATACTTAGGCAAATGTACTGGACTAGAGGCCCCAATCATGAAGACCCATCTGAAACAATGATACCACACTCCCAAAGACCCATACAACTAATGTCTTTGCTGGGCGAGGCCGCACTCCACGGCCCAGCATTCTACAGCAAAATTAGCAAGCTAGTCATTGCAGAACTGAAGGAAGGTGGCATGGATTTTTACGTGCCCAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCAGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGCTTTGTGAATGAAGATGGCGTCGAATGACGCCAACCCATCTGATGGGTCCGCAGCCAACCTCGTCCCAGAGGTCAATAATGAGGTTATGGCTCTGGAGCCCGTTGTTGGTGCCGCTATTGCGGCACCTGTGGCGGGCCAACAAAACGTAATTGACCCCTGGATTAGAAACAATTTTGTACAAGCCCCTGGTGGAGAGTTCACAGTGTCCCCTAGAAACGCTCCAGGTGAGATACTGTGGAGCGCGCCCTTGGGCCCTGATCTGAACCCCTATCTTTCTCATTTGTCCAGAATGTACAATGGTTATGCAGGTGGTTTTGAAGTGCAAGTAATCCTCGCGGGGAATGCGTTCACCGCCGGGAAAGTCATATTTGCAGCAGTCCCACCAAACTTTCCTACTGAAGGCTTAAGCCCCAGCCAGGTTACTATGTTCCCCCATATAATTGTAGATGTTAGACAATTGGAACCTGTGTTGATCCCCCTACCTGATGTTAGGAATAATTTCTATCATTACAATCAAGCAAATGATTCTACCCTCAAGTTGATAGCAATGTTGTACACACCACTCAGAGCTAATAATGCCGGGGATGATGTCTTCACAGTCTCTTGTCGAGTCCTCACGAGGCCATCCCCCGATTTTGATTTTATATTTCTGGTGCCACCCACAGTTGAATCAAGAACTAAACCATTCACTGTCCCAGTCTTGACTGTTGAGGAAATGTCTAATTCAAGATTCCCCATTCCTTTGGAAAAGCTTTACACGGGCCCTAGTAGTGCTTTTGTTGTCCAACCACAAAATGGCAGATGTACGACTGATGGCGTACTTCTAGGCACTACCCAGCTGTCAGCTGTTAATATCTGTAACTTTAGGGGGGATGTTACCCATATT----------GCGGGTAGCCATGATTACACAATGAAT--------------------TTGGCATCC-------------CAAAATTGGAGCAATTATGACCCAACAGAAGAGATCCCAGCCCCCCTAGGAACGCCAGACTTTGTGGGAAAGATCCAAGGCTTGCTCACCCAGACCACAAGAGCGGACGGCTCGACCCGTGCCCACAAAGCTACAGTGAGCACTGGGAGTGTCCACTTCACTCCAAAGCTGGGTAGTGTTCAATTCACCA
CTGACACGAACAATGATTTCCAAGCTGGCCAAAACACAAAATTCACCCCAGTTGGCGTCATCCAAGACGGTGATCACCACCAGAATGAACCCCAACAATGGTCACTCCCAAATTACTCAGGTAGAACTGGTCACAATGTGCACCTGGCCCCTGCTGTCGCCCCCACTTTCCCCGGTGAGCAGCTTCTTTTCTTTAGATCCACCATGCCAGGGTGTAGCGGGTACCCCAACATGAATTTGGATTGCTTACTCCCCCAGGAATGGGTGTTGCACTTTTACCAGGAAGCAGCCCCAGCACAATCCGATGTGGCACTGCTGAGATTTGTGAATCCAGATACAGGTAGGGTTCTGTTTGAGTGCAAGCTTCATAAATCAGGCTATATCACAGTGGCCCACACCGGCCCGTATGATTTGGTTCTCCCCCCTAATGGTTATTTCAGATTTGATTCTTGGGTCAACCAGTTCTACACACTCGCCCCCATGGGAAATGGAACGGGGCGCAGGCGTGCATTATAATGGCTGGAGCTTTCTTTGCTGGATTGGCATCTGATGTCCTCGGCTCTGGACTTGGTTCTTTAATCAATGCTGGAGCTGGAGCCATCAATCAAAAAGTTGAATTTGAAAATAATAGGAAATTACAACAAGCTTCCTTTCAATTTAGTAGCACCCTACAACAGGCTTCTTTCCAACATGATAAAGAGATGCTCCAAGCACAAATTGAGGCTACTCAAAAATTACAACAAGATCTGATGAAGGTTAAACAGGCAGTGCTCCTAGAGGGTGGGTTTTCCACAGCAGATGCGGCCCGTGGGGCAATCAACGCCCCCATGACAAAGGCTCTGGACTGGAGCGGAACAAGGTACTGGGCCCCTGATGCCAGGGTCACAACATACAATGCAGGCCACTTTTCCACCCCTCAGTCTTTGGGGGCGTTGACAGGAAGGACTAATTCTAGGGTCTCTGCTCCTGCTCGGAGCTCCCCCAGTGCACTTTCTAATGCTCCTACTGCCACTTCTTTGCATTCAAATCAAACTGTTTCTACGAGACTAGGTTC--TTCAGCTGGTTCTGGTACCGGTGTCTCGAGTCTCTCGTCAGCTGCAAGGACTAGGAGTTGGGTTGAGGACCAAAACAGAAATTTGTCACCCTTCATGAGGGGGGCTCTCAACACATCATTTGTCACCCCTCCATCTAGTAGATCCTCCAGTCAAAGCACAGTCTCAACCGTGCCTAAAGAAATTTTGGACTCCTGGACT-GGCGCTTTCAACACGCGCAGGCAGCCTCTCTT-CGCTCACATTCGCAAACGAGGGGAGTCACGGGTGTAATGTGAAAAGACAAGATTGATTATCTTTCCTTTCTTTAGTGT 7 | >AB190457.1_Norwalk 8 | 
TGAATGAAGATGGCCCCATCATCTTTGAGAAACACTCTAGGTACCACTATCACTATGATGCAGACTACTCCCGGTGGGATTCAACACAACAGAGAGCTGTGCTAGCCGCAGCCCTAGAGATCATGGTAAAATTCTCTTCAGAACCACACCTAGCCCAGGTGGTTGCAGAGGACCTTCTTTCCCCCAGTGTGATGGACGTGGGTGATTTCAAGATATCAATCACTGAAGGGCTTCCTTCCGGGGTTCCTTGCACTTCACAGTGGAACTCCATCGCCCATTGGCTCCTCACACTCTGTGCACTCTCTGAAACCACAAATCTGTCCCCTGACATCATCCAGGCAAATTCTCTCTTCTCCTTCTATGGTGATGATGAAATCGTGAGCACAGATATCAAATTAGACCCAGGGAAGCTTACAGCAAAGTTAAAAGAGTATGGGTTAAAACCAACTCGCCCTGACAAGACTGAGGGACCTCTGATCATCTCTGAGGACCTAGATGGCCTGACCTTCCTACGAAGGACCGTGACCCGTGACCCAGCTGGTTGGTATGGAAAGCTGGAGCAAAGTTCAATACTTAGACAAATGTATTGGACCAGGGGCCCTAACCATGAGGACCCCTCTGAAACAATGATACCACACTCCCAGAGACCCATACAGCTGATGTCACTGTTAGGTGAAGCAGCACTACATGGACCATCATTCTACAGCAAGATTAGTAAGCTAGTCATTGCAGAGCTGAAGGAAGGTGGCATGGACTTTTACGTGCCTAGGCAAGAACCAATGTTCCGGTGGATGAGGTTCTCAGACTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGCTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGACCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGTCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTATCCCCCAGGAATTCCCCTGGTGAAGTGCTTCTTAATTTGGAATTGGGCCCAGAAATAAACCCCTATTTGGCTCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGATTTGAAGTGCAGGTAGTCCTGGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAATTTTCCAATTGATAATCTGAGCGCAGCACAGATCACAATGTGCCCGCATGTGATTGTGGATGTCAGACAGTTGGAACCGGTCAACCTTCCGATGCCTGACGTTCGCAACAATTTCTTTCATTACAATCAAGGGTCTGATTCGAGATTGCGCTTAATCGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGGGATGATGTTTTTACTGTGTCTTGTAGAGTGTTGACTAGGCCTAGCCCTGACTTTTCATTTAATTTTCTTGTGCCACCTACTGTGGAGTCAAAGACAAAGCCCTTCACCCTCCCTATTCTAACTATCTCTGAAATGTCCAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGTCAAAATGGGCGTGTCACTCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTACCGAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACCAGATCAACAAGCAGGGCCAGTGATCAGGCCGACACAGCAACCCCTAGGTTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTAGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAACCCCGACAGCACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGTTTCACCCCAAAATTAGGCTCATTAGAAATATCCA
CTGAATC---TGGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACAATGAGGCAGACTTTCAACAATGGTCTTTACCCGACTATTCTGGTCAGTTCACCCACAACATGAACTTGGCCCCAGCTGTTGCTCCCAACTTTCCTGGTGAGCAGCTCCTTTTCTTTCGCTCACAGTTACCATCTTCTGGTGGGCGATCTAACGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCGGCCCCCGCCCAAACACAAGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGTAGAGTGTTATTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGATTCTCCAATAACTGTCCCTCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGCAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTGGCTGGTGACATACTTACAAATACTGTAGGATCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAACAATAAATATTTACAAAATGCATCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAAGTTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAAGGGGTTTTGACCGCTGGCGGCTTCTCCCCTACTGATGCAGCCCGCGGGGCAATTAACGCCCCCATGACAAAAGTCCTAGACTGGAATGGAACGAGGTACTGGGCACCAAATGCCACCTCCACAACTT-TAATGTCGGGTGGCTT--CACAAATCAAGCTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGACTCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAACTCAACCCAAATCACTAGCCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAACCAACAGAATTTTAATCTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTTACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACGGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTCAATACGCGCAGACAGCCGCTATT-CGCACACCTTCGCAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTATTTTAAATTTGATTTAAATTGGATTTGA 9 | >GU980585.1 10 | 
TGAATGAGGATGGTCCTATCATCTTTGAGAGACACTCCAGATATAAATACCATTATGATGCTGATTACTCCCGGTGGGACTCGACACAACAAAGAGCCGTGTTAGCAGCAGCCTTAGAAATCATGGTCAAGTTCTCCCCAGAGCCGCATCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATTAATGAGGGTCTCCCCTCCGGGGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCTGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCTAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTCAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACTGAAGGACCCCTTGTCATCTCTGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACCCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGTTCAATACTCAGACAAATGTATTGGACTAGGGGCCCCAACCATGAAGACCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCTGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGGTTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAGTAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCGCCCCTCACTGGTCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTATCCCCTAGGAATTCCCCTGGTGAAGTGCTTCTCAATTTGGAATTGGGCCCAGAAATAAATCCCTATTTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAATTTCCCAATTGATAATCTAAGCGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGACAGTTGGAACCGGTCAACCTTCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTAGTTGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGGGATGATGTTTTTACTGTGTCTTGTAGGGTGCTGACTAGGCCTAGCCCTGACTTTTCATTTAATTTCCTTGTGCCACCTACTGTGGAGTCAAAGACAAAACCCTTCACCCTCCCTATTCTGACTATCTCTGAAATGTCCAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAAAATGTTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTACCTAGTCAAATCTGTG-CTTTTAGGGGCGTGCTCACCAGATCAACAAGCAGGGCCAGTGACCAGGCCGACACAGCAACCCCTAGGTTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTAGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAATCCCGACAGCACAACTAGAGCACATGAAGCAAAGATAGACACAACAGCTGGTCGTTTTACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGGTGACTTTGATCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGCAGATTTCCAACAATGGTCTTTACCCGACTATTCTGGTCAGTTCACCCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAACGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCGGCCCCCGCCCAAACTCAAGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTGGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATGTTCACAAACACTGTAGGATCTCTAGTTAATGCAGGGGCTAATGCCATTAATCAAAAAGTTGATTTTGAAAACAATAAATATTTGCAAAATGCTTCCTTT-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAAATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAAGGGGTTTTGACCGCTGGCGGCTTTTCTCCTACTGATGCAGCCCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACGAGGTACTGGGCACCAAATGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ACGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACACTCAACCCAAATCACTAATCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAACCAACAAAATTTTAATTTGGAACCACACATGCCTGGATCCCTTAGGACAGCTTTTGTTACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTCAATACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTACTTTAAATTTGGTTTAAA-TTGGATTTG 11 | >AB365435.1 12 | 
TGAATGAGGATGGACCCATAATATTTGAGAAACATTCCAGATACAGGTATCATTATGATGCAGATTACTCCCGCTGGGATTCAACACAACAAAGAGCAGTGCTGGCTGCAGCTCTGGAAATAATGGTCAAATTCTCATCAGAACCTCATCTGGCCCAAGTAGTTGCAGAAGACCTCTTGTCCCCCAGTGTGATGGACGTGGGTGATTTCAAGATATCAATCAACGAGGGGTTGCCCTCTGGTGTACCTTGCACCTCACAATGGAACTCCATTGCCCACTGGCTCCTGACACTATGTGCGCTGTCTGAAGTCACTGACCTGTCCCCTGACATCATCCAGGCAAACTCCCTATTCTCCTTTTATGGTGATGATGAAATAGTAAGCACAGACATCAAATTGGACCCAGAGAAATTGACAACAAAATTGAGGGAATATGGGCTAAAACCAACCCGTCCTGATAAAACAGAAGGACCCTTAATTATCTCTGAAGATTTGGATGGCCTGACCTTCTTACGGAGAACGGTGACCCGTGATCCGGCTGGGTGGTTTGGCAAACTGGACCAAAGTTCAATACTCAGGCAGATGTACTGGACCAGGGGACCAAACCATGAGGACCCCTTTGAAACAATGATACCACACTCCCAAAGACCCATACAACTGATGTCATTACTGGGTGAAGCTGCATTGCATGGCCCATCATTCTACAGTAAAATTAGCAAATTGGTCATCTCAGAATTGAAAGAGGGTGGAATGGATTTTTACGTGCCCAGACAAGAACCAATGTTTAGGTGGATGAGATTCTCAGATTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTATCCCCTAGGAATTCCCCTGGTGAAGTGCTTCTTAATTTGGAATTGGGCCCAGAAATAAATCCCTATTTGGCACATCTCGCTAGAATGTATAATGGTTATGCAGGTGGATTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAATTTCCCAATTGATAATCTAAGCGCACGACAGATTACAATGTGCCCACATGTGATTGTGGATGTCAGACAGTTGGAACCAGTCAACCTCCCGATGCCTGACGTTCGCAACAATTTCTTTCATTATAACCAAGGATCTGATTCGAGATTGCGCTTAATTGCAATGCTGTACACACCTCTTAGGGCAAATAATTCTGGGGATGATGTTTTTACTGTGTCTTGTAGAGTGCTGACTAGGCCTAGCCCTGACTTCTCATTCAATTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTCACCCTCCCTATTCTGACTATCTCTGAAATGTCCAATTCTAGGTTTCCAGTGCCGATTGACTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGATGGTGAGCTGATGGGCACCACCCAACTCTTACCTAGTCAAATCTGTG-CTTTCATGGGCGTGCTCACCAGGTCAACAAGCAGGGCCAGTGATCAGGCCGACACAGCAACCCCTAGGTTGTTTAATTATTATTGGCATATACAATTGGATAATCCAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTAGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAACCCCGACAGCACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGTTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGATGACTTTCATCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACAATGAAGCAGACTTTCAACAGTGGTCTTTACCCGACTATTCTGGTCAGTTCACCCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGAGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCCTCTGGTGGGCGATCCAACGGGATTCTAGACTGCCTGGTCCCTCAAGAGTGGGTTCAGCACTTCTACCAAGAATCGGCCCCCTCTCAAACTCAAGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGCAGAGTATTATTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCTCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTACACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATACTCACAAATACTGTAGGATCTTTAGTTAATGCAGGGGCTAATGCCATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTACAAAATGCTTCCTTT-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAAATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAAGGGGTTTTGACCGCTGGCGGCTTCTCCCCCACTGATGCAGCCCGCGGGGCAATTAACGCCCCCATGACAAAAGTCCTAGATTGGAATGGGACGAGGTACTGGGCACCAAATGCCACCTCCACAACCT-CGATGTCGGGTGGCTT--CACAAATCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ACGAACCAGGCTTCCAAAACCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAATTCAACCCAAGTCACTAGCCTGAGCTCATACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAACCAACAAAATTTTAATTTGGAACCACACATGCCTGGGTCTCTTAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACGGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTCAATACGCGCAGACAGCCGCTATT-CGCACACCTTCGAAGAAGGGGGGAGTCAAAAGTTTA--GTGAAAAGATTATTTTAAATTTGATTTAAA-TTGGATTTG 13 | >JX846924.1 14 | 
TGAATGAGGATGGCCCCATCATCTTTGAGAAGCACTCCAGGTACAACTACCATTATGATGCAGATTACTCTCGGTGGGATTCAACACAACAGAGGGCTGTGTTAGCTGCAGCTCTAGAAATCATGGTAAAATTTTCCCCAGAACCACACCTAGCCCAGATAGTCGCAGAAGACCTTTTGTCCCCCAGTGTGATGGACGTGGGCGATTTCAAAATATCAATCACTGAAGGGCTCCCCTCTGGGGTGCCTTGCACCTCACAATGGAACTCCATCGCCCATTGGCTCCTCACACTCTGTGCACTCTCTGAGGTAACAAATTTGTCCCCTGACACCATCCAAGCAAATTCTCTTTTCTCTTTCTATGGTGATGATGAAATTGTGAGCACAGATATTAAATTGGATCCAGAAAAGCTGACAGCTAAATTGAAAGAGTATGGGCTAAAACCAACTCGCCCTGACAAGACTGAAGGACCTCTGGTCATCTCTGAGGACTTGAATGGTCTGACCTTCCTGCGGAGAACTGTAACCCGCGACCCAGCTGGTTGGTTTGGAAAATTGGAACAGAGTTCAATACTTAGACAAATGTATTGGACCAGGGGCCCCAATCATGAGGACCCCTCCGAAACAATGATACCACATTCCCAAAGACCCATACAGCTAATGTCCCTACTAGGTGAAGCTGCACTGCATGGCCCATCATTCTACAGCAAGATCAGTAAGCTAGTTATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCCAGACAAGAACCAATGTTTCGATGGATGAGGTTCTCAGACTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCTGCTCCATCTAACGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAACATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCACCTAGGAATTCCCCTGGTGAAGTGCTTCTTAATTTAGAATTAGGTCCAGAAATAAACCCCTATTTGGCTCACCTTGCTAGGATGTACAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTGGCTGGAAACGCGTTTACAGCAGGAAAGGTGATCTTTGCAGCTATACCCCCCAATTTTCCAATTGATAATCTGAGCGCAGCACAAATTACAATGTGCCCGCATGTGATTGTGGATGTCAGGCAGCTGGAACCAATTAATCTTCCGATGCCTGATGTCCGCAACAATTTCTTTCATTATAATCAAGGGTCTGATTCGAGGTTACGCTTAATTGCAATGCTGTATACACCTCTTAGGGCAAACAATTCCGGAGATGATGTTTTTACTGTGTCCTGTAGAGTATTAACTAGGCCTAGCCCTGATTTCTCATTCAATTTTCTTGTCCCACCCACTGTGGAATCAAAGACAAAACCCTTCACCCTCCCCATTCTGACTATCTCTGAAATGTCTAATTCCAGGTTTCCAGTGCCAATTGACTCTCTACACACCAGCCCGACTGAGAACATTGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGACGGTGAGTTAATGGGTACCACCCAACTCTTGCCGAGTCAGATATGTG-CTTTCAGGGGCACGCTCACCAGATCAACAAGCAGGGCCAGTGATCAAGCCGACACAGCAACCCCTAGGTTATTCAATTATTATTGGCACATACAATTGGACAATCTAAATGGAACCCCCTACGACCCTGCAGAGGACATACCAGCCCCTCTGGGAACACCAGACTTCCGGGGCAAGGTCTTTGGCGTAGCCAGCCAGAG---AAACCCTGACAGCACAACAAGAGCACATGAAGCAAAAGTGGACACAACATCTGGTCGCTTCACCCCGAAATTGGGTTCCCTAGAAATATCCA
CTGAATC---CGATGACTTTGACCCAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACAATGAGGCAGATTTTCAGCAATGGTCCTTACCTGACTATTCCGGTCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTCGCCCCCAATTTCCCTGGTGAGCAGCTTCTTTTCTTCCGCTCACAGTTGCCATCTTCTGGTGGGCGGTCTAACGGGATTCTAGACTGCCTGGTCCCCCAGGAATGGGTTCAACACTTCTACCAGGAATCAGCCCCTGCCCAAACACAGGTGGCCCTGGTTAGGTATGTCAACCCTGACACTGGTAGAGTGCTATTTGAGGCCAAGCTACATAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACCGTCCCTCCAAATGGGTACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGAAATGGGCGCAGAAGGATTCAATAATGGCTGGAGCCTTTATAGCAGGATTGGCTGGTGACATGCTCACAAGTACTGTGGGATCTTTAGTTAATGCAGGGGCTAGTGCTATCAATCAAAAAGTTGATTTTGAAAATAATAAATATTTACAAAATGCATCTTTT-AAT--------------------------------CATGATAAGGAGATGTTAAATGCACAAATTGAGGCAACAAAGAGGCTACAGGCTGACATGATTGCTATCAAACAAGGGGTCTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCCCGTGGGGCAATTAATGCCCCCATAACAAGAGTTTTGGACTGGAGTGGAACGAGGTACTGGGCACCAAACGCCACCTCCACAACCT-CAATGTCAGGTGGCTT--CACAAGCCAAACTGTACACAGAACCACACCAAATTTTAAA---ACGAACCAGGCCCCCAAGTCCACACCCAGCAGTGGGTCTTC-AGTGAGATCAAACTCAACCCAACTCACTAGCTTGAGCTCACACTCATCCGGGTCGTCTCGATCCAGCGGGT-CTACGGTTGTTAGCTCA---TTGCCATCTTCCAACAGGACTAGGGATTGGGTCAATCAACAGAATTTCAATTTGGAACCACACATGCCTGGATCTCTCAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAATTCAGACACGGTCTCAACCGTGCCCAAAAGTGTTTTGGACTCCTGGACA-TCTGCGTTTAATACGCGCAGACAGCCGCTATT-CGCACACCTTCGCAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTATCTTAAATTTAGTTT------------- 15 | >GU991355.1 16 | 
TGAATGAGGATGGTCCTATCATCTTTGAGAGACACTCCAGATATAAATATCATTATGATGCTGATTACTCCCGGTGGGACTCAACACAACAAAGAGCCGTGCTAGCAGCAGCCTTGGAAATCATGGTCAAGTTCTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCCGAGGTTACAAACCTGTCTCCTGACATCATTCAGGCCAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTCAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACTGAAGGGCCTCTTGTCATCTCCGAAGACCTGAATGGCCTGACCTTCCTGCGGAGGACCGTGACCCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCGCTCCATCTAACGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAACCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTTCTCAATTTGGAATTGGGCCCAGAAATAAATCCCTATTTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATTTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGCGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGACAGTTGGAACCGGTCAACCTTCCGATGCCTGACGTTCGCAATAACTTCTTCCATTACAACCAAGGGTCTGATTCGAGATTGCGCTTAGTTGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGGGATGATGTTTTTACTGTGTCTTGTAGAGTGCTGACTAGGCCTAGCCCTGACTTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTCACCCTCCCTATTTTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTACCTAGTCAAATCTGTG-CTTTTAGGGGCGTGCTCACCAGATCAACAAGCAGGGCCAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGACTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAATCCCGACAGTACAACTAGAGCACATGAAGCGAAGGTGGACACAACAGCTGGTCGCTTTACCCCAAAACTAGGCTCATTGGAGATATCCA
CTGAATC---TGGTGACTTTAATCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGAAGACTTCCAACAATGGTCCTTACCCGACTATTCTGGTCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCGGCCCCCACCCAAACCCAGGTGGCCCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCACGGGAACTGAGAATGGGCGTAGAAGGGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATATTCACAAATACTGTAGGATCTCTAGTTAATGCAGGGGCTAATGCCATTAATCAAAAAGTTGATTTTGAAAATAACAAATATTTGCAAAATGCTTCCTTT-AAT--------------------------------CATGATAAAGAAATGTTAAATGCACAAATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAGTGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACGAGGTACTGGGCACCAAATGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGCACAGAACCGCGCCAAATTTTAAA---ACGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAATCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAACAAAATTTCAATTTGGAACCACACATGCCTGGGTCCCTTAGGACAGCTTTTGTTACTCCACCATCTAGTACAGTCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAGAA-GTTTTGGACTCCTGGACAATCTGCGTTTAATACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTACTTTAAATTTGATTTAAA-TTGGATTTG 17 | >KY442319.1 18 | 
TGAATGAGGATGGCCCTATTATCTTTGAGAAACACTCTAGATACAAATACCATTATGATGCAGACTACTCTCGGTGGGATTCAACACAGCAGAGAGCTGTACTGGCTGCAGCCCTAGAAATCATGGTCAAATTTTCTTCAGAACCACACCTAGCCCAGATAGTCGCAGAAGACCTTCTGTCTCCCAGTGTGATGGACGTGGGCGACTTCAAAATATCAATCACTGAAGGACTCCCTTCTGGGGTGCCTTGCACCTCACAATGGAATTCTATTGCCCATTGGCTCCTCACACTCTGTGCACTCTCCGAGGTGACAAACTTATCCCCTGATATTATCCAAGCCAATTCTCTTTTCTCTTTCTATGGTGATGATGAAATTGTGAGTACAGATATTAAATTGGATCCAGAAAAACTGACAGCTAAACTGAAAGAGTATGGGCTAAAACCAACACGCCCTGATAAGACTGAAGGGCCTCTGGTCATCTCTGAGGACCTGAATGGTCTGACCTTCCTGCGGAGGACTGTGACCCGCGATCCAGCTGGTTGGTTTGGAAAATTGGAACAGAGTTCAATACTTAGACAAATGTATTGGACCAGGGGTCCAAATCATGAGGACCCCTCTGAAACAATGATACCACATTCCCAGAGACCTATACAGCTAATGTCCCTGCTAGGTGAAGCTGCACTGCATGGCCCATCATTCTACAGCAAGATCAGCAAGCTAGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCTAGACAAGAGCCAATGTTTCGGTGGATGAGGTTCTCAGACTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTCGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAACGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCAGGTGCAGCAATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCACCTAGGAACTCCCCTGGTGAAGTACTTCTTAATTTAGAATTAGGTCCAGAAATAAACCCCTATTTGGCTCACCTTGCTAGGATGTACAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTGGCTGGAAATGCGTTTACAGCAGGAAAGGTGATCTTTGCAGCTATACCCCCCAATTTTCCAATTGACAATCTGAGCGCAGCACAGATTACAATGTGCCCGCATGTGATCGTGGATGTCAGGCAATTGGAACCAATCAACCTTCCGATGCCTGATGTCCGTAATAATTTCTTTCATTATAATCAAGGGTCTGATTCGAGGTTACGCTTAATTGCAATGTTATACACACCTCTTAGGGCAAATAATTCAGGAGATGATGTTTTCACTGTGTCTTGTAGAGTATTAACTAGGCCTAGCCCTGATTTCTCATTCAATTTTCTTGTTCCACCCACTGTGGAATCAAAGACAAAGCCTTTCACCCTCCCCATTTTGACTATCTCTGAAATGTCTAATTCCAGGTTTCCAGTGCCAATTGACTCTCTGCACACCAGCCCGACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACCCTTGACGGTGAGTTAATGGGCACCACCCAACTCTTGCCGAGTCAAATATGTG-CTCTCAGGGGCACGCTCACCAGATCAACAAGCAGGGCCAGTGACCAAGCCGACACGGCAACCCCTAGGCTGTTCAATTATTATTGGCACATACAATTGGATAATCTAAATGGAACCCCCTACGACCCTGCAGAAGACATACCAGCCCCTTTGGGAACACCAGACTTCCGGGGCAAGGTCTTTGGCGTAGCTAGCCAGAG---AAACCCTGACAGCACAACAAGAGCACATGAAGCAAAAGTGGACACAACATCTGGTCGCTTCGCCCCGAAATTGGGTTCCCTAGAAATATCCA
CTGAATC---CAGTGACTTTGACTCAAATCAACCAACAAGGTTCACCCCAGTTGGCATT---------GGGGTTGACAATGAGGCAGATTTTCAGCAATGGTCCTTACCTGACTACTCCGGTCAGTTCACTCATAACATGAACTTAGCCCCAGCTGTCGCCCCCAATTTCCCTGGTGAGCAGCTTCTTTTCTTCCGCTCACAGTTGCCATCTTCTGGTGGGCGGTCTAACGGGATTCTAGACTGCCTGGTTCCCCAGGAATGGGTTCAACACTTCTACCAGGAATCAGCCCCTGCCCAAACACAGGTGGCCCTGGTTAGGTATGTTAACCCTGACACTGGTAGAGTGCTATTTGAGGCCAAGCTACACAAATTGGGCTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCTCCAAATGGGTACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGAAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTGGCTGGTGACATGCTCACAAATACTGTAGGATCTTTAGTTAATTCAGGGGCTAGTGCCATCAATCAAAAAGTTGATTTTGAAAATAATAAATATTTACAAAATGCATCTTTT-GCT--------------------------------CATGATAAGGAGATGTTAAATGCACAAATTGAGGCAACAAAGAGGCTACAGGCTGACATGATTGCTATCAAACAAGGGGTCTTGACCGCTGGCGGCTTCTCCCCCACTGATGCAGCCCGTGGGGCAATTAATGCCCCCATGACAAAAGTTTTGGATTGGAGTGGAACGAGGTACTGGGCACCAAACGCCACCTCCACAACCT-CAATGTCAGGTGGCTT--CACAAGCCAAACTGTGCACAGAACCGCACCAAATTTTAAA---ACGAACCAGGCCCCCAAGTCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAATTCAACCCAACTCACTAGCTTGAGCTCACACTCCTCCGGGTCGTCTCGATCCAGCGGGT-CTACGGTTGTCAGCTCA---TTGCCATCTTCCAACAGGACTAGGGATTGGGTCAATCAACAGAATCTCAATTTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACGGTCTCAACCGTGCCCAAAAGTGTTTTGGACTCCTGGACA-TCTGCGTTTAATACGCGCAGACAACCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTATTTTAAATTTAATTT------------- 19 | >MG601446.1 20 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACATTCCAGATATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAGATCATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAACTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCCGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCTAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTTAAGGAATACGGGTTGAAACCGACCCGTCCTGACAAAACTGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACTCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAATAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACCCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAACAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTTAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAGGTGCAGGTAGTCCTGGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAATTTCCCAATTGATAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTACACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGCTGACTAGACCTAGTCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTTACCCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACGAGCAGGGCTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCATATACAATTGGATAATCTGAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGCAGACTTCCAGCAATGGTCCTTACCCGACTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTTTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTGGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATATTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATATTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCCAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCCCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCAAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAGTCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAATGGGGGAGTCAAATGTTTA--GTGAAAAGGTTATTTTAAATTTGATTTAAA-TTGGATCTG 21 | >KY442320.1 22 | 
TGAATGAGGATGGCCCTATTATCTTTGAGAAACACTCTAGATACAAATACCATTATGATGCAGACTACTCTCGGTGGGATTCAACACAGCAGAGAGCTGTACTGGCTGCAGCCCTAGAAATCATGGTCAAATTTTCTTCAGAACCACACCTAGCCCAGATAGTCGCAGAAGACCTTCTGTCTCCCAGTGTGATGGACGTGGGCGACTTCAAAATATCAATCACTGAAGGACTCCCTTCTGGGGTGCCTTGCACCTCACAATGGAATTCTATTGCCCATTGGCTCCTCACACTCTGTGCACTCTCCGAGGTGACAAACTTATCCCCTGATATTATCCAAGCCAATTCTCTTTTCTCTTTCTATGGTGATGATGAAATTGTGAGTACAGATATTAAATTGGATCCAGAAAAACTGACAGCTAAACTGAAAGAGTATGGGCTAAAACCAACACGCCCTGATAAGACTGAAGGGCCTCTGGTCATCTCTGAGGACCTGAATGGTCTGACCTTCCTGCGGAGGACTGTGACCCGCGATCCAGCTGGTTGGTTTGGAAAATTGGAACAGAGTTCAATACTTAGACAAATGTATTGGACCAGGGGTCCAAATCATGAGGACCCCTCTGAAACAATGATACCACATTCCCAGAGACCTATACAGCTAATGTCCCTGCTAGGTGAAGCTGCACTGCATGGCCCATCATTCTACAGCAAGATCAGCAAGCTAGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCTAGACAAGAGCCAATGTTTCGGTGGATGAGGTTCTCAGACTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTCGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAACGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCAGGTGCAGCAATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTGTCAGCTAGGAACTCCCCTGGTGAAGTACTTCTTAATTTAGAATTAGGTCCAGAAATAAACCCCTATTTGGCTCACCTTGCTAGGATGTACAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTGGCTGGAAATGCGTTTACAGCAGGAAAGGTGATCTTTGCAGCTATACCCCCCAATTTTCCAATTGACAATCTGAGCGCAGCACAGATTACAATGTGCCCGCATGTGATCGTGGATGTCAGGCAATTGGAACCAATCAACCTTCCGATGCCTGATGTCCGTAATAATTTCTTTCATTATAATCAAGGGTCTGATTCGAGGTTACGCTTAATTGCAATGTTATACACACCTCTTAGGGCAAATAATTCAGGAGATGATGTTTTCACTGTGTCTTGTAGAGTATTAACTAGGCCTAGCCCTGATTTCTCATTCAATTTTCTTGTTCCACCCACTGTGGAATCAAAGACAAAGCCTTTCACCCTCCCCATTTTGACTATCTCTGAAATGTCTAATTCCAGGTTTCCAGTGCCAATTGACTCTCTGCACACCAGCCCGACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACCCTTGACGGTGAGTTAATGGGCACCACCCAACTCTTGCCGAGTCAAATATGTG-CTCTCAGGGGCACGCTCACCAGATCAACAAGCAGGGCCAGTGACCAAGCCGACACGGCAACCCCTAGGCTGTTCAATTATTATTGGCACATACAATTGGATAATCTAAATGGAACCCCCTACGACCCTGCAGAAGACATACCAGCCCCTTTGGGAACACCAGACTTCCGGGGCAAGGTCTTTGGCGTAGCTAGCCAGAG---AAACCCTGACAGCACAACAAGAGCACATGAAGCAAAAGTGGACACAACATCTGGTCGCTTCGCCCCGAAATTGGGTTCCCTAGAAATATCCA
CTGAATC---CAGTGACTTTGACTCAAATCAACCAACAAGGTTCACCCCAGTTGGCATT---------GGGGTTGACAATGAGGCAGATTTTCAGCAATGGTCCTTACCTGACTACTCCGGTCAGTTCACTCATAACATGAACTTAGCCCCAGCTGTCGCCCCCAATTTCCCTGGTGAGCAGCTTCTTTTCTTCCGCTCACAGTTGCCATCTTCTGGTGGGCGGTCTAACGGGATTCTAGACTGCCTGGTTCCCCAGGAATGGGTTCAACACTTCTACCAGGAATCAGCCCCTGCCCAAACACAGGTGGCCCTGGTTAGGTATGTTAACCCTGACACTGGTAGAGTGCTATTTGAGGCCAAGCTACACAAATTGGGCTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCTCCAAATGGGTACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGAAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTGGCTGGTGACATGCTCACAAATACTGTAGGATCTTTAGTTAATTCAGGGGCTAGTGCCATCAATCAAAAAGTTGATTTTGAAAATAATAAATATTTACAAAATGCATCTTTT-GCT--------------------------------CATGATAAGGAGATGTTAAATGCACAAATTGAGGCAACAAAGAGGCTACAGGCTGACATGATTGCTATCAAACAAGGGGTCTTGACCGCTGGCGGCTTCTCCCCCACTGATGCAGCCCGTGGGGCAATTAATGCCCCCATGACAAAAGTTTTGGATTGGAGTGGAACGAGGTACTGGGCACCAAACGCCACCTCCACAACCT-CAATGTCAGGTGGCTT--CACAAGCCAAACTGTGCACAGAACCGCACCAAATTTTAAA---ACGAACCAGGCCCCCAAGTCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAATTCAACCCAACTCACTAGCTTGAGCTCACACTCCTCCGGGTCGTCTCGATCCAGCGGGT-CTACGGTTGTCAGCTCA---TTGCCATCTTCCAACAGGACTAGGGATTGGGTCAATCAACAGAATCTCAATTTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACGGTCTCAACCGTGCCCAAAAGTGTTTTGGACTCCTGGACA-TCTGCGTTTAATACGCGCAGACAACCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATTATTTTAAATTTAATTT------------- 23 | >MN199033.1 24 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACACTCCAGGTATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATCATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGTGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCCGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCCAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTTAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACTGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACTCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACCCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCGGGTGCAGCGATAGCGGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAAATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTACACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGCTGACTAGACCTAGCCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTCACCCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGGCGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGACTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCACTAGAGATATCCA
CTGAATC---TGGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGCAGACTTCCAACAATGGTCCTTACCCGACTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTAGGCTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTTCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGAGCTTTTGTAGCAGGATTAGCTGGTGATATATTCACAAACACTGTAGGGTCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCAAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGTTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGC-CTACAGCTGCTAGCTCG---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAGCAGAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGGTTATCTTAAATTTGATTTAAA-TTGGATTTG 25 | >KY348698.1 26 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACACTCCAGATATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATCATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCCGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCCAACTCTCTCTTTTCTTTTTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTTAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACTGAAGGGCCCCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACTCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCGCCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTACACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGCTGACTAGACCTAGCCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTTACCCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGACTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACAACGAGGCAGACTTCCAACAATGGTCCTTACCCGACTACTCTGGCCAGTTCACCCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATACTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCAAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGGCCCCAAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAGCAGGACTAGGGACTGGGTCAATCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGCACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGA------------------------------------ 27 | >EU921389.2 28 | 
TGAATGAGGATGGACCCATAATATTTGAGAAACATTCCAGATACAAATACCATTATGATGCAGATTACTCCCGTTGGGACTCAACACAACAAAGAGCAGTGCTGGCTGCAGCCCTGGAAATAATGGTCAAATTCTCACCAGAACCCCATCTGGCCCAAGTGGTTGCTGAAGACCTCTTGTCCCCCAGTGTGATGGATGTGGGTGACTTCAAGATATCAATCAACGAGGGATTACCCTCTGGTGTTCCCTGCACCTCACAATGGAACTCCATTGCCCACTGGCTCCTCACACTATGTGCACTGTCTGAAGTCACAGACCTGTCCCCTGACATCATCCAGGCAAATTCCCTGTTCTCCTTTTATGGTGATGATGAAATAGTGAGCACAGATATTAAACTGGACCCAGAGAAATTAACAACAAAATTGAAGGAATACGGGCTAAAACCAACCCGTCCTGACAAAACAGAAGGACCCTTAATTATCTCTGAAGATTTGGATGGCCTGACCTTCTTACGGAGAACGGTGACCCGTGATCCGGCCGGGTGGTTTGGCAAACTGGACCAAAGTTCAATACTCAGGCAGATGTACTGGACCAGGGGACCAAACCATGAGGACCCCTTCGAAACAATGATACCACACTCCCAAAGACCCATACAACTGATGTCATTATTGGGTGAAGCTGCGTTGCATGGTCCATCATTCTACAGTAAAATCAGCAAATTGGTCATCTCAGAATTGAAAGAGGGTGGAATGGATTTTTACGTGCCCAGACAAGAACCAATGTTCAGGTGGATGAGATTCTCAGATTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCGCTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGTCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTTACAGTATCCCCTAGAAATTCCCCTGGTGAAGTTCTTCTTAATTTGGAATTGGGCCCAGAAATAAATCCCTATTTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGATTTGAAGTGCAGGTGGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATTCCCCCTAATTTTCCAATTGATAATCTAAGTGCAGCACAGATCACAATGTGTCCACATGTGATTGTGGATGTCAGACAGCTGGAACCAGTCAACCTCCCAATGCCTGACGTTCGTAACAACTTCTTTCATTACAATCAAGGGTCTGATTCGAGATTGCGCCTAATTGCAATGCTGTATACACCTCTTAGGGCAAATAATTCTGGGGATGATGTTTTTACTGTGTCTTGCAGAGTGCTAACTAGACCTAGTCCTGACTTCTCATTTAATTTCCTTGTGCCACCTACTGTGGAGTCAAAGACAAAACCCTTTTCCCTCCCTATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTCCCAGTACCAATTGATTCTCTGCACACCAGCCCTACTGAGAACATTGTTGTCCAGTGTCAGAATGGACGCGTCACCCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTACCTAGCCAAATCTGTG-CTTTCAGGGGCGTGCTCACCAGATCAACAAGCAGGGCCAGTGACCAGGCCGATACAGCAACCCCTAGATTGTTTAATTATTATTGGCATATACAGTTGGATAATCTAAATGGAACTCCTTATGACCCTGCAGAAGATATACCAGGCCCCCTAGGGACACCAGATTTTCGGGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAATCCTGATAGCACGACTAGGGCACATGAAGCAAAGATAGACACAACATCTGGCCGTTTCACCCCAAAACTAGGCTCATTAGAGATTTCCA
CTGAGTC---TGATGATTTTGATCAAAACAAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCATGAGGCAGACTTTCAACAATGGGCTCTTCCCGACTATGCTGGCCAGTTCACCCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTTCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTGCCATCTTCTGGTGGGCGATCCAACGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTACAGCACTTCTACCAAGAATCAGCCCCCTCCCAATCTCAAGTGGCCCTGGTTAGGTATATCAACCCTGACACTGGTAGAGTGTTATTTGAGGCCAAGCTGCACAAATTAGGTTTCATAACTATAGCCAAGAATGGTGACTCTCCAATAACTGTCCCTCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCCTTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCTTTTATAGCAGGATTGGCTGGTGACATGCTCACAAATACTGTAGGATCTTTAGTTAATGCAGGGGCTAATGCCATTAATCAAACAATTGATTTTGAAAATAATAAATATTTGCAAAATGCCTCTTTT-AAT--------------------------------CATGATAAGGAGATGTTGAACGCACAAATTGAGGCAACAAAGAGATTACAGGCTGACATGATTGCTATCAAACAAGGGGTTTTGACCGCTGGCGGCTTCTCCCCTACTGATGCAGCCCGCGGGGCAATCAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACGAGATACTGGGCACCAGGTGCCACCTCCACAACCT-CGATGTCGGGTGGCTT--CACAAATCAAACTGTGCACAGATCCACACCAAATTTTAAA---ACGAACCAGGCCCCCAAACCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAATTCAACCCAAATCACTAGCCTGAGTTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTCAACTCA---ATACCATCCTCTAACAGGACTAGGGACTGGGTCAACCAACAAAATTTTAATTTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGCTCAGGCACAGTCTCAACTGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAACCGCTATT-CGCACACCTTCGCAGAAGGGGGGAGTCGAATGTTTA--GTGAAAAGATTATTTTAAATTTGATTTAAA-TTGGATTTG 29 | >MH260494.1 30 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACACTCTAGATATAAATATCATTATGATGCTGACTACTCTCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATTATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTAGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGGGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACCCTCTGTGCACTCTCCGAAGTTACAAACCTGTCCCCTGACATTATTCAGGCCAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTAGACCCAGAAAAACTGACAGCAAAACTTAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACTGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACCCGTGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGCAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAACCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGTCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCGATGCTGTACACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGTTGACTAGACCTAGCCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTTACCCTCCCAATTCTGACTATTTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCCCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGGCTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTCGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGATGACTTTGATCAAAATCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGCAGACTTCCAACAATGGTCCTTACCCGACTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCAAAGCTGCACAAATTAGGCTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATATTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCGAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGTACAGAACTACACCAAATTTTAAA---ATGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGAGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGGTTATTTTAAATTTGATTTAAA-TTGGATTTG 31 | >KY348697.1 32 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACACTCCAGATATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATCATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGACTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCCGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCCAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTTAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAGACAGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACTCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTCTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACTCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTACACACCTCTTAGGGCAAACAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGCTGACTAGACCTAGCCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAAACAAAACCCTTCACTCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGGCTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCATATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGATGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGGCAGACTTCCAACAATGGTCCTTACCCGACTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATATTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCAAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAATCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGA------------------------------------ 33 | >MG601447.1 34 | 
TGAATGAGGATGGCCCTATCATCTTTGAGAGACACTCCAGATATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATCATGGTCAAGTTTTCCCCAGAGCCACACCTGGCCCAAAAGGTTGCAGAAGACCTTCTTTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACTCTCTGTGCACTCTCTGAGGTTACAAACCTGTCCCCTGACATTATTCAGGCTAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTGGACCCAGAAAAACTGACAGCAAAACTCAAGGAATACGGGTTGAAACCGACCCGCCCTGACAAAACTGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACTCGCGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTACAGCAAAATCAGTAAACTGGTCATTGCAGAGTTGAAGGAGGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACCCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGATCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACCGGCCAGCAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTGGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGCTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAATAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTATACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGCTGACTAGACCTAGTCCTGAATTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCTTTTACCCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCTCTGCACACCAGCCCAACTGAGAATATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGCGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGGCTAGTGACCACGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACATACCAGGCCCCCTAGGGACACCAGATTTCCGGGGCAAAGTCTTTGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGGGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCACGAGTCAGACTTCCAGCAATGGTCCTTACCCGATTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGATTGCCTGGTCCCCCAGGAATGGGTCCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTTTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCTAAGAATGGTGACTCTCCAATAACTGTCCCCCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTACACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGAGTTCAATAATGGCTGGTGCTTTTATAGCAGGATTAGCTGGTGATATACTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCTAATGCTATTAATCAAAAAGTTGATTTTGAAAATAATAAATACTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGATTACAGGCTGATATGATTGCTATCAAACAGGGGGTTCTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATCAACGCCCCCATGACAAAAGCCCTAGATTGGAATGGAACAAGGTACTGGGCACCAAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGTGCACAGAACCACGCCAAATTTTAAA---ATGAACCAGGCCCCCAGATCCACACCCAGCAGTGGGTCTTC-AGTAAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGACCCAGCGGGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTTAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGGTTATCTTAAATTTGATTTAAA-TTGGATCTG 35 | >KY905334.1 36 | 
TGAATGAAGATGGCCCTATCATCTTTGAGAGACACTCCAGATATAAATATCATTATGATGCTGACTACTCCCGGTGGGATTCAACACAACAAAGAGCCGTGTTGGCAGCAGCCTTAGAAATTATGGTCAAGTTTTCCCCAGAGCCGCACCTGGCCCAAAAGGTTGCAGAAGACCTTCTCTCTCCCAGCGTGATGGACGTGGGTGATTTCAAAATATCAATCAATGAGGGTCTCCCCTCCGGAGTGCCCTGCACCTCCCAATGGAATTCCATCGCCCACTGGCTCCTCACCCTCTGTGCACTCTCCGAAGTTACAAATCTGTCCCCTGACATTATTCAGGCCAACTCTCTCTTTTCTTTCTACGGTGATGATGAAATTGTGAGTACAGACATAAAATTAGACCCAGAAAAACTGACAGCAAAACTTAAGGAATATGGGTTGAAACCGACCCGCCCTGACAAAACTGAAGGGCCTCTTGTCATTTCCGAAGACCTGAATGGCCTAACCTTCCTGCGGAGGACCGTGACCCGTGACCCAGCAGGCTGGTTTGGAAAGTTGGAACAGAGCTCAATACTCAGACAAATGTACTGGACTAGGGGCCCCAACCATGAAGATCCATCTGAAACAATGATACCACACTCCCAGAGGCCCATACAATTGATGTCTTTGCTGGGTGAGGCAGCACTCCACGGCCCAGCATTCTATAGTAAAATCAGCAAACTGGTCATTGCAGAGTTGAAGGAAGGTGGCATGGATTTTTACGTGCCAAGACAAGAGCCAATGTTCAGATGGATGAGATTCTCGGATCTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCACCCCATCTAATGATGGTGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAACCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAACAAAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTATCTCCTAGGAATTCCCCTGGTGAAGTGCTCCTCAATTTGGAATTGGGCCCAGAGATAAACCCCTATCTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGGTTTGAAGTGCAGGTAGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATACCCCCTAACTTCCCAATTGACAATCTAAGTGCAGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGGCAGTTGGAACCGGTCAACCTCCCGATGCCTGACGTTCGCAACAACTTCTTCCACTACAACCAAGGGTCTGATTCGAGATTGCGCTTGGTTGCAATGCTGTACACACCTCTTAGGGCAAATAACTCTGGGGATGATGTTTTCACTGTGTCTTGTAGAGTGTTGACTAGACCTAGCCCTGACTTTTCATTTAACTTCCTTGTGCCACCCACTGTGGAGTCAAAGACAAAACCCTTCACCCTCCCAATTCTGACTATCTCTGAAATGTCTAATTCTAGGTTTCCAGTGCCGATTGATTCCCTGCACACCAGCCCAACTGAGAGTATTGTTGTCCAGTGCCAAAATGGACGCGTCACTCTTGATGGTGAGTTGATGGGCACCACTCAGCTCTTACCTAGTCAAATCTGTG-CTTTCAGGGGCGTGCTCACTAGATCAACAAGCAGGGCTAGTGACCAGGCCGACACAGCAACCCCTAGATTGTTTAATTATTATTGGCACATACAATTGGATAATCTAAATGGGACTCCTTATGATCCTGCAGAAGACACACCAGGCCCCCTGGGGACACCAGATTTCCGGGGCAAAGTCTTCGGCGTGGCCAGCCAAAG---AAACCCCGACAGTACAACTAGAGCACATGAAGCAAAGGTGGACACAACAGCTGGTCGCTTCACCCCAAAACTAGGCTCATTAGAGATATCCA
CTGAATC---TGATGACTTTGACCAAAATCAACCAACAAGATTCACCCCAGTTGGCATT---------GGGGTTGACCGCGAGGCAGACTTCCAACAATGGTCCTTACCCGACTACTCTGGCCAGTTCACTCACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACAGTTACCATCTTCTGGTGGGCGATCCAATGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTTCAGCACTTCTACCAAGAATCAGCCCCCGCCCAAACCCAGGTGGCTCTGGTTAGATATGTCAACCCTGACACTGGTAGAGTGTTGTTTGAGGCAAAGCTGCACAAATTAGGCTTCATGACTATAGCTAAGAATGGTGATTCTCCAATAACTGTCCCTCCAAATGGATACTTTAGGTTTGAATCTTGGGTGAACCCATTTTATACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGGAGAGTTCAATAATGGCTGGAGCTTTTATAGCAGGATTAGCTGGTGATATATTCACAAATACTGTAGGGTCTCTAGTTAATGCAGGGGCAAATGTTATTAACCAAAAAGTTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCCTTC-AAT--------------------------------CATGATAAGGAAATGTTAAATGCACAGATTGAGGCAACAAAGAGGTTACAGGCTGACATGATTGCTATCAAACAGGGGGTTTTGACCGCTGGCGGCTTTTCCCCCACTGATGCAGCTCGCGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACAAGGTACTGGGCACCGAGTGCCACCTCTACAACCT-CGATGTCGGGTGGCTT--CACAAACCAGGCTGCGCACAGAACCACACCAAATTTTAAA---ATGAACCAGGCCCCCAAATCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCACTCTCAACCCAAGTCACTAGTCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGAGT-CTACAGTTGCTAGCTCA---TTACCATCCTCTAACAGGACTAGGGACTGGGTCAATCAGCAAAATTTCAATTTGGAACCACACATGCCTGGGTCACTTAGGACGGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGTTCAGGCACTGTCTCAACCGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGTAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGGTTATTTTAAATTTGATTTAAA-TTGGATTTG 37 | >MH218601.1 38 | 
TGAATGAGGATGGACCCATAATATTTGAGAAACACTCCAGATACAAATACCATTATGATGCAGATTACTCCCGCTGGGACTCAACACAACAAAGAGCAGTGCTAGCCGCAGCCCTGGAAATAATGGTCAAATTCTCACCAGAACCCCACCTGGCCCAGGTGGTTGCAGAAGACCTTTTGTCCCCCAGTGTGATGGATGTGGGTGATTTTAGGATATCAATCAACGAGGGATTACCCTCTGGTGTTCCTTGCACTTCACAATGGAACTCCATTGCTCACTGGCTCCTCACACTATGTGCACTGTCTGAAGTCACAGACCTGTCCCCTGACATCATCCAGGCGAATTCCCTGTTCTCCTTTTATGGTGATGATGAAATAGTGAGCACAGACATCAAATTAGACCCAGAGAAATTGACAGCAAAGCTGAGGGAATACGGGCTTAAACCAACCCGCCCTGACAAAACAGAGGGACCCTTAATTATCTCTGAAGATTTGAATGGCCTGACCTTCTTGCGGAGAACAGTGACCCGCGACCCGGCCGGATGGTTTGGCAAACTGGACCAAAGTTCAATACTCAGACAGATGTACTGGACCAAGGGGCCAAACCATGAAGACCCCTTTGAAACAATGATACCACACTCCCAAAGACCCATACAATTGATGTCATTACTTGGTGAAGCTGCATTGCATGGTCCATCATTCTACAGTAAAATCAGCAAATTGGTCATCTCAGAACTGAAAGAGGGTGGAATGGATTTTTACGTGCCCAGACAAGAACCAATGTTCAGGTGGATGAGATTCTCAGATTTGAGCACGTGGGAGGGCGATCGCAATCTGGCTCCCAGTTTTGTGAATGAAGATGGCGTCGAATGACGCCGCTCCATCTAATGATGGGGCCGCCGGCCTCGTCCCAGAGATCAACAATGAGGCAATGGCGCTAGAGCCAGTGGCGGGTGCAGCGATAGCAGCACCCCTCACTGGCCAGCAGAATATAATTGATCCCTGGATTATGAATAATTTTGTGCAAGCACCTGGTGGTGAGTTCACAGTGTCCCCCAGAAATTCCCCTGGTGAAGTCCTTCTTAATTTGGAACTGGGCCCAGAAATAAATCCCTATTTGGCCCATCTTGCTAGAATGTATAATGGTTATGCAGGTGGATTTGAAGTGCAGGTGGTCCTAGCTGGAAATGCGTTTACAGCAGGAAAGATAATCTTTGCAGCTATTCCCCCCAATTTTCCAATTGATAATCTAAGTGCGGCACAGATCACAATGTGCCCACATGTGATTGTGGATGTCAGACAGTTGGAACCAGTCAACCTCCCGATGCCTGACGTTCGCAATAATTTCTTTCATTATAATCAAGGGTCTGATTCAAGATTACGCTTAATTGCAATGTTATATACACCTCTTAGGGCAAACAATTCTGGGGATGATGTTTTTACTGTGTCTTGTAGAGTGCTGACTAGACCTAGTCCTGATTTCTCATTCAATTTCCTTGTGCCACCTACTGTGGAGTCAAAGACAAAACCCTTTTCCCTCCCTATTCTGACTATCTCTGAAATGTCCAATTCTAGGTTCCCAGTACCAATTGATTCTCTGCACACCAGTCCTACTGAGAATATTGTTGTTCAGTGCCAAAATGGGCGCGTCACCCTTGATGGTGAGTTGATGGGCACCACCCAACTCTTGCCTAGCCAAATCTGTG-CTTTTAGGGGCGTTCTCACCAGATCAACAAGCAGGGCCAGTGACCAGGCCGATACAGCAACCCCTAGATTGTTTAATTATTATTGGCATATACAATTGGATAATCTAAATGGAACCCCTTATGATCCTGCAGAAGATATACCAGGCCCCCTAGGGACACCAGATTTCCGTGGCAAAGTCTTTGGCGTGGCCAGCCAGAG---AAACCCTGATGCCACAACTAGGGCACATGAAGCAAAGATAGACACCACATCTGGCCGCTTCACCCCAAAGCTAGGCTCATTAGAGATATCCA
CTGAATC---TAGTGACTTTGACCAAAACCAACCAACAAGATTCACCCCAGTTGGCATT---------GGAGTTGACCATGAGGCAGACTTTCAACAATGGACCCTACCCGACTACGCTGGTCAGTTCACACACAACATGAACTTAGCCCCAGCTGTTGCTCCCAACTTCCCTGGTGAGCAGCTCCTTTTCTTCCGCTCACATTTGCCATCTTCTGGTGGGCGATCCAACGGGATTCTAGACTGCCTGGTCCCCCAAGAATGGGTACAGCACTTCTACCAAGAGTCGGCCCCCTCTCAGTCTCAAGTGGCTCTGGTTAGATATGTTAACCCTGACACTGGTAGAGTGTTATTTGAGGCCAAGCTGCACAAATTAGGTTTCATGACTATAGCCAAGAATGGTGATTCTCCAATAACTGTTCCTCCAAATGGGTATTTTAGGTTTGAATCTTGGGTGAACCCCTTTTACACACTTGCCCCCATGGGAACTGGGAATGGGCGTAGAAGGATTCAATAATGGCTGGAGCGTTTATAGCAGGATTGGCTGGTGACATGCTCACAAATACTGTAGGATCTTTAGTTAATGCAGGAGCTAATGCTATTAATCAGACAATTGATTTTGAAAATAATAAATATTTGCAAAATGCTTCTTTT-AAT--------------------------------CATGATAAGGAGATGTTGAATGCACAAGTTGAGGCAACAAAGAAGTTACAGGCTGACATGATTGCTATCAAGCAAGGGGTCTTGACCGCTGGCGGCTTCTCCCCTACTGATGCAGCCCGTGGGGCAATTAATGCCCCCATGACAAAAGTCCTAGATTGGAATGGAACGAGACACTGGGCACCAGGTGCCACCTCCACAACCT-CGATGTCGGGTGGCTT--TACACATCAAACTGTGCACAGATCCACACCAAATTTTAAA---ACGAACCAGGCTCCCAAACCCACACCCAGCAGTGGGTCTTC-AGTGAGGTCAAACTCAACCCAAATCACTAGCCTGAGCTCACACTCGTCCGGGTCGTCTCGATCCAGCGGGT-CTACAGTTGTCAGCTCA---ATACCATCCTCTAACAGGACTAGGGACTGGGTCAACCAACAAAATTTTAATTTGGAACCACACATGCCTGGATCTCTTAGGACAGCTTTTGTCACTCCACCATCTAGTACAGCCTCTAGCTCAGGCACAGTCTCAACTGTGCCCAAAAATGTTTTGGACTCCTGGACA-TCTGCGTTTAACACGCGCAGACAGCCGCTATT-CGCACACCTTCGCAGAAGGGGGGAGTCAAATGTTTA--GTGAAAAGATCATTTTAAATTTGGTTTAAAATTAGGTTTA 39 | -------------------------------------------------------------------------------- /datasets/hiv.fasta: -------------------------------------------------------------------------------- 1 | >AF193276.1_KAL153 2 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAATCAGACTGATCAAGTTTCTCTACCAAAGCAGTAAGTA----GTACATGTAATGCAATCCTTAGCAATAGCAGCAATAGTAGCATTAGTAGTAGTAGGAATAATAGCAATAGTTGTGGGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAGGA---------AGCACTTAT---GGAGATGGGGCACCTTGTTCCTTGGGATGCTGATGATCTGTAGTGCTACAGAAAATTTATGGGTCACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATAGTAAGGAGGTACATAATGTTTGGGCCACATATGCCTGTGTACCCACGGACCCCAGCCCACAAGAAATACCATTGAAAAATGTGACAGAAAATTTTAACATGGGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAAGAAGGAGGTTACTAGTACCAATA---CTAGTAGC---------------ATAAAAATGATGGAAATGAAAAACTGCTCTTTCAACATCACCACAGACCTGAGAGATAAAGTGAAAAAAGAATATGCACTCTTTTATAAACTTGATGTAGTACAAATAGAT------AATGATA---------------GCTATAGGTTGATAAGTTGTAATACCTCAGTCGTTACACAAGCCTGTCCAAAGATATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCGATTCTAAAGTGTAACGATAAAAAGTTCAACGGAACAGGGCCATGTACAAATGTCAGTACAGTACAATGTACACATGGAATTAAGCCAGTAGTATCAACTCAACTGCTGTTAAATGGTAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATACTAAAACCATAATAGTACAGCTGAAAGAACCTGTGGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAGGTATTCATAT------AGGACCAGGGAGAGCATTTTATGCAACAGGAGACATAACAGGAGATATAAGACAAGCACATTGTAACATTAGTATAACAAAATGGAATAACACATTAAAACAGATAGTTATCAAATTAAGAAAACAATTTGGGAATA---AAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACAAAACTGTTTAATAGCACTTGGA------ATGGTACT------------GAAGAGTTAAATAACACTGAAGGAGAT------ATAGTCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAAGGTATGCCCCTCCCATCGCAGGACAAATTAGATGTTCATCAAATATTACAGGACTGCTATTAACAAGAGATGGTGGTAACCAGAGCAATGTTAC------CGAGATTTTTAGAACTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTTAAAATTGAACCATTAGGAGTAGCACCCACCAGGGCAAAGAGAAGAGTGGTGCAGAGAGAGAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAATAATCTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCGAGAGTCCTGGCTGTGGAGAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATACTAGTTGGAGTAATAAACCTCTAGATGAGATTTGAAATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAATAATTACACAGGTTTAATATACAATTTAATAGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAAATATTGGCATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATATCAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAATTTTTGCTGTACTTTCTATAGTGAATAGAGGTAGGCAGGGATATTCACCATTATCATTTCAGACCCGCCTCCCAGCCCAGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGATGGAGAGAGAGACAGAGACACATCCATTCGATTAGTGAACAGATTCTTAGCACTTATCTGGGACGACCTGAGGAGCCTGTGCTTCTTCATCTACCACCACTTGAGAGACTTACTCTTGATTGCAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGATTCAGGAACTAAAGAGTAGTGCTATTAATCTGATAGGTACCATAGCAATAGCAGTAGCTGGGTGGACAGATAGGGTTATAGAAATAGGACAAAGATTTTGTAGAGCTATGCGTAACATACCTAGGAGAATCAGACAGGGCGCAGAAAAGGCTTTGCAATAACATGGGGGGCAAATGGTCAAAAAGTAGCATAGTGGGATGGCC------TCAGGTTAGGGAAAGAATAAGACGAGCTCCTGCTC-------CAGCGGCAAGA------------GGAGTAGGACCAGTATCTCAAGATTTGGATAAGTATGGAGCAGTCACAAGCAGTAATACAGCAGCTAATAATGCTGATTGCGCCTGGCTGGAGGCGCAAGAGGAAGAGGAGGTAG 3 | >AF193275.1_97BL006 4 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGGCACAGACGCGGAACTTCTCACAGCAGTAAGGATCATCANATTCCTATATCAAAGCAGTAAGTACTAAATAAATGTAATGACACCTTTARAAATTTATGCAATAGTAGCATTAGTAGTAGTGTTTGTTATAGCNNTAGTTGTGTGGACTATAGTAGGTATARAATATANNNNATTGCTAAAACAAAGAAAAATAGACAGGTTARTTGAGAGAANNNNAGANAGAGCAGAAGACAGTGGCAATGAAAGCGAGGGGGATGCARAGGAATTATCAACACTTAT---GGAGGTGRGGAACTATGCTCTTTTGGATGATAATAATGTGTAAGGCTGCAGAAGACTTGTAGGTCACRGTATACTATARGGTACCTGTGTRGARAGATGCAGCGACCACCCTATTTTGTGCATCAGATGCTAAAGCAYATGATAAAGAAGTACACAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCTGACCCACAAGAAATAATTTTAGGAAATGTGACAGAAAAATTTGACATGTRGAAAAATAACATRGTAGAACAAATGCAAACAGATATAATCAGTCTCTAGGACCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCTCTCTGCGTTACTTTAAATTGTGCTGAACCCAACAGCACTAGATCTAACAACAGTAGCGTTAACAGCAACAGCAGCGATAGCTTGTTTRAAR---AAATGAAGAACTGCTCTTTCAACATGACCACAGAACTAAGAGATAAAAGGAAAACTGTACATTCACTTTTTTATAAACTTGATATAGTATCAACTAGTAAT---AATGATAGT---------RGGCAGTATAGACTAATAAATTGTAATACATCAGCCATGACACAGGCCTGTCCTAARGTAACCTTTGAGCCAATTCCTATATATTATTGTGCCCCAGCTGGTTTTGCGATTCTAAAGTGCAARGATACAAATTTTACTAGAACARGGCCATGCAAGAATGTCAGCACAGTACAATGCACACATRGAATCAAGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAGAAAGARGTAATGATTAGATCTGAAAATATCACAGACAATGTCAAAATCATAATAGTACAGCTTACTGAGCCTGTAAACATCACTTGTATCAGACCTGGCAACAATACAAGAACAAGTATACGTAT------AGGACCAGGACAAACCTTCTATGCAACAGGTGATGTAATARRGGACATAAGAAAAGCATATTGTAATGTCAGCAGAGCAGCATRGAATAGCACTTTACAAAAGATAAGTACACAATTAAGAAAATACTTTAATAACA---AAACAATAATCTTTAAGAGCTCCACAGGARAGGATTTAGAAGTTACAACACATAGTTTCAATTGTGGAGGAGAATTTTTCTATTGCAATACAACAGACCTGTTCAATAGCACTTRGG------ATGGCACT---------------GTCACAAATAGCACAAAGGCCAATGGA---ACTATAACTCTACCATGCAGAATAAAGCAAATTATAAATATGTGGCAGAGAGTAGGACAAGCAATGTATGCCCNTCCTATCAAARGAAGTATAAGGTGTGAATCAAACATTACAGGACTACTACTAACAAGAGATGGTRGAGGTRGAACTAATNGCAGCA---ATGAGACCTTCAGACCTATARGAGGAGATWTGAGGAACAATTGGAGAAGTGAACTATATAAGNATAAAGTAGTAAAAATTGAACCAATARRAGTAGCACCTACCAGGGCAAAGAGAAGAGTRGTRGAGAGAGAAAAAAGAGCAATTGGA---CTARGAGCTGCCTTCCTTARGTTCTTAGGAGCAGCAGRAAGTACTATRGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGARGGCTATAWARGCTCA
GCAGCAYCTGCTGAAACTCACGGTCTRGGGCATTAAACAGCTCCAGGCAAGARTCCTGGCTGTGGAAAGRTACCTAAAGGATCAGCAGYTCCTAAGAATTTGRGGTTGCTCTARAAAACTCATCTGCACCACTAATGTGCCCTRGAAYTCTAGTTRGAGTAATAAATYTCAGAGTGAGATATARGATAACATGACCTAGATGCAATAGGACAARGAAGTTATCAATWACACAGACATAATATATGATCTAATTRAAAAATCGCAAAACCAGCAGGAAAAGAATGARCAAGATTTATTGGCATTAGATAAGTAGGCAGGTCTGTRRAGTTRGTTWGACATATCAAATTGGTTATRGTATATARAAATATTTATAATAATAGTAGGAGGCTTAATARGATTAAGAATAATTTTTGCTGTGCTTTCTATAATAAATAGAGCCAGGCARRGATACTCACCCTTGTCATTGCAGACCCTTACCCCACACCCAGAAAGACCAGACAGGCCCRGAAGAATCAAAGAAGAAGGTRGAGAGCAAGGCAGAGACAGATCAATTCGATTAGTAAGCGGATTTTTAGCACTTGCCTRGGACGATCTACRGAGCCTGTGTCTCTTCAGCTACCACCGATTGAGAGACTTCATCTCGATTGCAGCGAGGACTGTRGAACTTCTGAAACGCAGCAGTCTCAARGGACTGAGACTGRGGTARGARGGCCTCAAATATCTRRGGAATCTTCTRGGATATTRRGGTCAGGAACTAAAGAGTAGTGCTATTAATCTGATAGATACCATAGCAATAGCAGTAGCTRGGTRGACAGATARGGTTATAGAAATAGGACAAAGATTTTGTAGAGCTATTCGTAACATACCTAGGAGAATCAGACARGGCGCAGAAAAAGCTTTGCAATAACATRGRGGGCAAATGGTCAAAAAGTAGCATAGTRRGATGGCC------TCAGGTTARRGAAAGAATAAGACGAGCTCCTGCTC-------CAGCAGCAAGA------------RGAGTAGGACCAGTATCTCAAGATTTRGATAAGCATGGAGCAGTCACAAGCAGTAATACAGCAGCTAATAATGCTGATTGCGCCTRGCTGGAGGCGCAAGARGAAGARGAGGTAG 5 | >AF193278.1_UKR1216 6 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAATCAGACTGATCAAGTTTCTTTACCAAAGCAGTAAGTA----GTACATGTAATGCAACCCTTAGCAATAGCAGCAATAGTAGCATTAGTAGTAGTAGGAATAATAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGAAAAATATTAAGACAAAGAAAAATAGACAGGCTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAGGA---------AGCACTTAT---GGAGATGGGGCACCTTGTTCCTTGGGATGCTGATGATCTGTAGTGCTACAGAAAATTCATGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATAGTAAGGAGGTACATAATGTTTGGGCCACATATGCCTGTGTACCCACGGACCCCAGCCCACAAGAAATACCATTGAAAAATGTGACAGAAAATTTTAGCATGGGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAAGAAGAATGCTACTAGTACCAATACTAGTAGCATAACCAATACTAGTAGCATAGAAAGGACGGAAATGAAAAACTGCTCTTTCAACATCACCACAGACCTGAGAGATAAAGTGAAAAAAGAATATGCACTCTTTTATAACCTTGATGTAGTACAAATAGAT------AATGATA---------------GCTATAGGTTGATAAGTTGTAATACCTCAGTCGTTACACAAGCCTGTCCAAAGATATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAACGATAAAAAGTTCAATGGAACAGGGCCATGTACAAATGTCAGTACAGTACAATGTACACATGGAATTAAGCCAGTAGTATCAACTCAACTGCTGTTAAATGGTAGTCTAGCAGAAGAAGGGGTAGTAATTAGATCTGTCAATTTCACGGACAATACTAAAACCATAATAGTACAGCTGAAAGAACCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAGGAAAAGGTATTCATAT------AGGACCAGGGAGAGCATTTTATGCAACAGGAAACATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTATAACAAAATGGAATAACACATTAAAACAGATAGTTATCAAATTAAGAGAACAATTTGAGAATA---AAACAATAGTTTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACAAAACTGTTTAATAGCACTTGGAAT---GATAGCACTTGGAATGGTACTGGAGAGGTAAATAACACTGAAGGAGAT------ATAGTCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCGCAGGACAAATTAGATGTTCATCAAATATTACAGGACTGCTATTAACAAGAGATGGTGGTAACCAGAGCAATGTCAC------CGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAGGGCAAAGAGAAGAGTGGTGCAGAGAGAGAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAACAATCTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCGAGAGTCCTGGCTGTGGAGAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATACTAGTTGGAGTAATAAATCTCTAGATAAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAATAATTACATAGATTTAATATACAATTTAATAGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGCATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATATCAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAATTTTTGCTGTACTTTCTATAGTAAATAGAGTTAGGCAGGGATATTCACCATTATCATTTCAGACCCGCCTCCCAACCCAGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACACCTCCATTCGATTAGTGAACGGATTCTTAGCACTTATCTGGGACGACCTGAGGAGCCTGTGCCTCTTCCTCTACCACCACTTGAGAGACTTACTCTTGATTGCAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGATTCAGGAACTAAAGAATAGTGCTGTCAGCTTGCTCAATGCCACAGCTGTAGCAGTAGCTGAGGGAACAGATAGGGTTATAGAAGTAGTACGAAGAGCTTTTAGAGCTTTCCTCCGCATACCTAGAAGAATAAGACAGGGCTTCGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGCTTGGTTGGATGGCC------TAAAATAAGGGAAAGAATGCAACGAGCTGAG-CCA---------GCAGCAGAA------------GGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGCAGCAACACAGCAGCTACTAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGATGAGGAGGTGG 7 | >AF193277.1_RU98001 8 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCCTCAAGACAATCAGACTGATCAAGTTTCTCTACCAAAGCAGTAAGTA----GTACATGTAATGCAATTCTTAGTAATAGCAGCAATAGTAGCATTAGTAGTAGGAGGAATAATAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAGGA---------AGCACTTAT---GGAGATGGGGCACCTTGCTCCTTGGGATGCTGATGATCTGTAGTGCTACAGAAAATTTATGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATAGTAAGGAGGTACATAATGTTTGGGCCACATATGCCTGTGTACCCACGGACCCCAGCCCACAAGAAATACCATTGGAAAATGTAACAGAGAATTTCAACATGGGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCACTCTGTGTTACTTTAAATTGCACTGAAGTGAAGACGAATGATACTAGTACCAATG---CTAGTGGC---------------ATAGAAATGATG------AAAAACTGCTCTTTCAACATCACCACAGACCTGAGAGATAAAGTGAAAAAAGAACATGCACTCTTTTATAAACTTGATGTAGTACAAATAGAT------AATGATA---------------GCTATAGGTTGATAAGTTGTAATACCTCAGTCGTTACACAAGCCTGTCCAAAGATATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCGATTCTAAAGTGTAACGATAAAAAGTTCAATGGAACAGGGCCATGTACAAATGTCAGTACAGTACAATGTACACATGGAATTAAGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGTAGCCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATACTAAAACCATAATAGTACAGCTGAAAGAACCTGTAGAAATTAATTGTACGAGACCCAACAACAATACAAGAAAAGGTATTCATAT------AGGACCAGGGAGAGCATTTTATGCAACAGGAGACATAATAGGAGATATAAGACAAGCATATTGTAACATTAGTAGAACAAAATGGAATAACACATTAGAACAGATAGTTAGCAAATTAAGAAAACAATTTAGGAATA---AAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACAAAACTGTTTAATAGCACTTGGA------ATAATACT------------GAAGAGTCAAATAACACTAAAGGAGAT------ATAGTCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCGCAGGACAAATTAGATGTTCATCAAATATTACAGGACTGCTATTAACAAGAGATGGTGGTAACCAGAACAATGTCAC------CGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAGAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAATAATCTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCGAGAGTCCTGGCTGTGGAGAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATACTAGTTGGAGTAATAAATCTCTAGATAAGATTTGGAATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAATAATTACACAGGTTTAATATACAATTTAATAGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAACTATTGGCATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATATCAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAATTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCATTTCAGACCCGCCTCCCAACCCAGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACACATCCATTCGATTAGTCAACGGATTCTTAGCACTTATCTGGGACGACCTGAGGAGCCTGTGCCTCTTCATCTACCACCACTTGAGAGACTTACTCTTGATTGCAGCGAGGACTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGATTCAGGAACTAAAGAGTAGTGCTATTAATCTGATAAATACCATAGCAATAGCAGTAGCTGGGTGGACAGATAGGGTTATAGAAATAGGACAAAGATTTTGTAGAGCTATTCGTAACATACCTAGGAGAATCAGACAGGGCGCAGAAAAGGCTTTGCAATAACATGGGGGGCAAATGGTCAAAAAGTAGCATAGTGGGATGGCC------TCAGATTAGGGAAAGAATACGACGAGCTCCTGCTC-------CAGCAGCAAGA------------GGAGTAGGACCAGTATCTCAAGATTTGGATAAGTATGGAGCAGTCACAAGCAGTAACACAGCAGCTAACAATGCTGACTGCGCCTGGCTGGAGGCGCAAAAGGAAGAGGAGGTAG 9 | >DQ207943.1_98GEMZ003 10 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGGCAATCAGACTGATCAAGTTGCTCTACCAAAGCAGCAAGTA----GTACATGTAATGCAACCCTTAGTAATAGCAGCAATAGTAGCATTAGTAGTAGTAGGAATAATAGCAATAGTTGTGTGGTCCATAGTAGGCATAGAATATAGGAAAATATTAAAACAAAGAAAAATAGACAGATTGATTGAAAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAAAGTGAAGGAGATCAGGA---------AGCACTTAT---GGAGATGGGGCACCTTGTTCCTTGGGATGCTGATGATCTGTAGTGCTACAGAAAATTTATGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATAGTAAGGAGGCACATAATGTTTGGGCCACATATGCCTGTGTACCCACGGACCCCAGCCCACAAGAAATACCATTGGAAAATGTGACAGAAAATTTTAACATGGGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTAAATAAGAATGGTACTAATACCAATAATAGT------------------AGCAAAGAAATGATGGAAATGAAAAACTGCTCTTTCAACATCACCACAGACCTGAGAGATAAAGTGAAAAAAGAATATGCACTCTTTTATAGACTTGATGTAGTACAAATAGAT------AATGATA---------------GCTATAGGTTGATAAGTTGTAATACCTCAGTCGTTACACAAGCCTGTCCAAAGATATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTATAAAGTGTAACGATAAAAAGTTCAATGGAACAGGGCCATGTACAAATATCAGTACAGTACAATGTACACATGGAATTAAGCCAGTAGTATCAACTCAACTGCTGTTAAATGGTAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATACTAAAACCATAATAGTACAGCTGAAAGAACCTGTAAAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAGGTATTCATAT------GGGACCAGGGAGAGCATTTTTTGCAACAGGAGACATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTATAACAGAATGGAATAACACATTAACACAGATAGTTATCAAATTAAGAGAACAATTTGGGAATA---AAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACACAATTGTTTAATAGCACTTGGAAT---GATA---------------CTGAAGGGTTAAATAGCACTAAAGGAGAT------ATA---ACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGCAGGAAAAGCAATGTATGCCCCTCCCATCGCAGGACAAATTAGATGTTCATCAAATATTACAGGACTGCTATTAACAAGAGATGGTGGTAACCAGAGCAATGTCACTA---CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCCCCCACCAGGGCAAAGAGAAGAGTGGTGCAGAGAGAGAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAACAATCTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAACAGCTCCAGGCGAGAGTCCTGGCTGTGGAGAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATACTAGTTGGAGTAATAAAACTCTAGAGGAGATTTGGGATAACATGACCTGGATGCAGTGGGAAAGAGAAATTGATAATTACACAGGTTTAATATACAATTTAATAGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGCATTAGATAAATGGGCAGGTTTGTGGAATTGGTTTGACATATCAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAATTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCATTTCAGACCCGCCTCCCAACCCAGAGGGGACCCGACAGGCCCGAAGGAACCGAAGAAGAAGGTGGAGAGAGAGACAGAGACACATCCATTCGATTAGTGAACGGATTCTTAGCACTTATCTGGGACGACCTGAGGAGCCTGTGCCTCTTCATCTACCACCACTTGAGAGACTTACTCTTGATTGCAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGATTCAGGAATTAAAGAATAGTGCTATCAGCTTGCTCAATGCCACAGCTATAGCAGTAGCTGAGGGAACAGATAGGGTTATAGAAGTAGTACAAAGAGCTTTTAGAGCTTTCCTCAACATACCTAGAAGAATAAGACAGGGCTTCGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAAAGTAGCTTGGTTGGATGGCC------TAAAATAAGGGAAAGAATGCAACGAGCTGAG-CCA---------GCAGCAGAA------------GGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGCAGCAATACAGCAACTACTAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGATGAGGAGGTGG 11 | >AY835754.1_5084-83 12 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAACCTATAA---TATTAGCAATAGTAGCATTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTAGCCATAGAATATAGGAAAATATTAAGCCAAAGAAAAATAGCCAGGATAATTGATAGAATAATAGAAAGACCAGAAGCCAGTGGCAATGAGAGTGAAGGAGATCAGGAAGAATTATCAGCACTGGTGGTGGAGATGGGGCACCATGCTCCTTGGGATATTAATGATCTGTAGTGCTGCAGACAAATTGTGGGTCACAGTCTATTATGGGGTGCCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTAACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---CTGAGGAATGCTACTAATACCACTA---GTGGTAGTGGGGGAG-TGAT---GGAGAAAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGGTGCAGAAAGAACATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAT------AATGATAAT---------ACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAACGATAAGAAGTTCAATGGAAAAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAATAATTAGATCTGACAATTTCACGGACAATGCTAAAACTATAATAGTACAGCTGAAAGAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTTTATACAACAGGACAAATAATAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAACAAAATGGGATAACACTTTAAAACAGATAGCTGAAAAATTAAGAGAACAATTTGGGAATAGTA---CAATAGTCTTTAATCATTCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGA------ATGGTACT---GAT---------GTGTCAAATAACACTGAAGGAAATATC---A------CACTCCCATGCAGAATAAAACAAATTGTAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCTCCTCCCATCAGAGGACAAATTAGATGCTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTGATAACCAG-----AACGAGA-CCGAGATCTTTAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGACCCATTAGGAGTAGCACCCACTAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCTTAGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTTTGGATGAGATTTGGGATAACATGACCTGGATGCAGTGGGAAAGAGAAATTAACAATTACACAAGCTTGATATACACCTTAATTGAAGAATCGCAAAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTACTATACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAGCCCAGAGAGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCGCTTAGTGGATGGATTCTTAGCAATTTTCTGGGTCGACCTACGGAGCCTGTGCCTTTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTATTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGACAGGGTTATAGAAGTATTACAAAGAGCTTTTAGAGCTATTCTCCATATACCTGTAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTTTGGGTGAATGGCA------TACTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTAACAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 13 | >AY835781.1_5157-83 14 | 
AGGCATCTCCAATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAATCTTTACAAATATTAGCACTAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTGTGGACCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAAAGCGAAGGAGACCAGGAAGAATTGTCAGCACTTGT---GGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTGCAGGAAATTTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGACAGTTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTCACTCTAAATTGCACTGATAAATTGAGGAATGATACTAATACCAATA---ATAGTAGTTGGGGAA-AGAT---GGAGAAAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAACATAAGAGACAAGGTGCCGAAAGAATATGCACTTTTTTATAAACTTGATGTAGTACCAATTGAT------AATGATAAT---------ACTAGCTATAGGTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAAAAGTTCAATGGAAAAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTGGTATCAACTCAACTGCTATTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGACAATCTCACGGACAATGCTAAAACCATAATAGTACAGCTGAAGGAACCTGTAGAAATTAATTGTACAAGACCTAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTTTATACAACAGGACAAATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTACAACAGATAGTTATAAAATTAAGAAAACAATTTGAGAATA---GAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAGTTGTAATGCACAGTTTTAATTGTGGAGGAGAATTTTTCTACTGTAATTCATCACAACTGTTTAATAGTACTTGGAAT---GATAGTACTTGGAATGATACTAAAGGGTTAAATAACACTGAAGGA---------ATTATCACACTCCCATGCAGAATAAAACAATTTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTCTTAACAAGAGATGGTGGTAATAGCGAGAACGATACCA---CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAATAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAACGCTAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACCATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTAGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATAGTAGTTGGAGTAATAAATCTCTGAATGAGATTTGGAATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAAGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTAGCATAACAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGCGAATAGAGTTAGGCAGGGATATTCACCATTATCATTACAGACCCGCCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGAATATTAGTGAACGGATTCTTAGCACTTTTCTGGGACGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGTCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAACGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAATTAGTACAAGCAGCTTGTAGAGCTATTCTCCACATACCTAGAAGAGTGAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTACGGTTGGATGGTC------TACCATAAGAGAAAGAATGAAACGAGTTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCTCGAGACCTGGAAGAACATGGAGCACTCACAAGTAGCAATACGACAGCTAATAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAAGTGG 15 | >AY835765.1_5084-84 16 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAAGCTTTAAACACATTAGCAATAGTAGCATTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTAGCCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGATAATTGATAGAATAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGCGAAGGGGATCAGGAGGAATTATCAGCACTTGT---GGAGATGGGGCACGATGCTCCTTGGGATATTGATGATCTGTACTGCTGGAGAACAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTGGAATTGAAAAATGTAACAGAAAACTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTGAATTGCACTGAT---CTGAGGAATGATACTAATACCAATA---GTAGTAGCGAGGGAG-TGAT---GGAGAAAGGA--GAAATAAAAAACTGCTCTTTCAACATCACCACAAGCATAAGAGATAAGGTGCAGAAAGAATATGCAACTTTTTATAAACTTGATATAGTACCAATAAAT------AATGATAAT---------ACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTACCCCAGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAAAAGTTCAATGGAAAAGGACCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAATAATTAGATCTGACAATCTCACGGACAATGCTAAAACTATAATAGTACAGCTGAACGAATCTGTAGAAATTAATTGTATAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTTTATACAACAGGACAAATAGTA---GATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGCTGAAAAATTAAGAGAACAATTTGAGAATAGAA---CAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGA------ATGGTACT---GAG---------GGGTCATATAACACTTCAGAAAATATC---AATATCACACTCCCATGCAGAATAAAACAAATTGTAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCTCCTCCCATCAGTGGACAAATTAAATGTTCATCAAATATTACAGGGCTGCTATTGACAAGAGATGGTGGTAATGACCGG-----ACCAAGA-CCGAGGTTTTCAGACCTGGGGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAGATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACTAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGACA---TTAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTAGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATACTAGTTGGAGTAATAGATCTTTGGATGAGATTTGGGATAACATGACCTGGATGCAGTGGGAAAGAGAAATTAACAATTACACAGGCTTAATATACACCTTAATTGAACAATCGCAAAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTACTATACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCACCTCCCAGCCCAGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCGCTTAGTGGATGGATTCTTAGCAATTATCTGGGTCGACCTACGGAGCCTGTGCCTTTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTATTAGCTTGCTCAATGCCACAACCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCTTATAGAGCTATTATCCACATACCTACAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACG---TTTGGGTGAATGGCA------TACTGTAAGGGAAAGAATGAGACAAGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCACTCACAAGTAGCAACACAGCAGCTAACAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 17 | >MN055643.1_127GC 18 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAATCTTTACAAATATTAGCACTAGTAGCATTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGGAGGGGATCAGGAAGAATTATCAGCACTTG---TGGAGATGGGGCACCATGCTCCTTGGGTTATTAATAATCTGTAGTGCTGCAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAAGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---TGGAGGAATGCTACTAATACCACTA---GTAGTAGCGGGGGAA-CGAT---GGGGAGAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGGTGCAGAAAGAATATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAC------AATGATAAT---------AGTAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAAGTATCCTTTAATCCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAGAAGTTCAATGGAACAGGACCATGCACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGACAATTTCACGGACAATGCTAAAACCATAATAGTACAGCTGAACGAATCTATAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTCTATACAACAGGAGAAATAACAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGTTAAAAAATTAAGAGAACAATTAGGGAATA---AAACAATAGTCTTTAAACAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTGATTCAACACAACTGTTTAATAGTACTTGG------AATGATACTGAAGAG---TCTGAAGGGTCAAAGAACACTGAAGAAAATACC---CCACTCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAT---AACGATAACAA---GACCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGTTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTTTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCCCTGGATAAGATTTGGAATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAAGCTTAATATACACCTTAATTGAAGAATCGCAGAATCAACAAGAAAAGAATGAACAAGAATTATTGGAATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCCGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGATGGAGAGAGAGACAGAGACAGATCCGGTCGCTTAGTGGGTGGATTCTTAGCACTTTTCTGGGTCGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGTCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTACAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCTTATAGAGCTATTCTCCACATACCTACAAGAATAAGACAGGGCTTGGAAAGGACTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTGTAGATAGATGGCC------TGCTGTAAGGGAAAGAATGAGACGGGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAAATAACAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 19 | >AY835779.1_5019-84 20 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGATCTCCTCAAGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAATCTTTACAAATATTAGCAATAGTAGCATTAGTAGTAACAATAA---TAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAATAGAAAGAACAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAGGAAGAATTATCAGCACTGGT---GGAGATGGGGCACCATGCTCCTTGGGATATTAATGATCTGTAGTGCTGCAGACAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTCTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAATATGGTAGAACAGATGCATGAGGATATAATCAGTCTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTCACTTTAAATTGCACTGAT---TTGAAGAATGATACTAATACCAATA---GTAGTAGCGGGGAAA-TGAA---GGAGAAAGGA--GAAATGAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGATAAGGTGCAGAAAGAACATGCACTTTTTTATAAATTTGATATAGTACCAATAGAT------AATGATACT---------ACTAGATATAGGTTGATAAGTTGTAATACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCTTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTCAAGTGTAACGATAAAAAGTTCAATGGAAAAGAACTATGTAAAAATGTTAGCACAGTACAATGTACACATGGAATTAGGCCAGTGGTATCAACTCAACTACTGTTAAATGGCAGTCTAGCAGAAGAAGAAGTAGTAATTAGATCTGAAAATTTCACGAATAATGCTAAAACCATAATAGTACAGCTGAATCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAAGAAAGTAAGAAGGATACATAT------AGGACCAGGGAGAGCATTTTATACAACAGGACAAATAGTAGGAAATATAAGACAAGCACATTGTAACCTTAGTAGAACAAAATGGAATGACACTTTAAAACAGATAGTTAGTAAATTAAGAGAACAATTTGGGAATAATAAAACAATAGTCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGA------ATGTTACT---GGA---------GGGACAAATGGCACTGAAGGAAGTAACCCAAATATCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAATATTACAGGACTGCTATTAACAAGAGATGGTGGTAGTAACAAG-----AGCGATA-CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGCGGGA---CTAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGGCAGTAACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTAACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAGATCTCTGGATACGATTTGGCATAACATGACCTGGATGGAGTGGGAAAGAGAAATTGATAATTACACAGGCTTAATATACTCCTTAATTGAAAAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAGTTGGTTTGACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTTTTTGCTGTACTTTCTGTAGTGTATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCACCTCCCAGCTCCGAGGGGACACGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACACATCCGGACGATTAGTGGATGGATTCTTAACACTTATCTGGGTCGATCTGAGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAATCCTCAAAATTTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGATTATAGAAGTATTACAAAGAGCTTGTAGAGCTATTCTCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAAATGGGTGGCAAGTGGTCAAAAAGTAGTGTGGTTGGATGGCC------TAAGATAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAG------------GGGGTGGGAGCAGTATCTCGAGACCTGGAAAAGCATGGAGCAATCACAAGTAGCAATACAGCAGCCAACAATCCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 21 | >AF004394.1 22 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAACTCCTCAAGACAGTCAGACTCATCAACTTTCTCTATCAAAGCAGTAAGTA----GTAAATGTAATGCAACCTTTACAAATATTAGCAATAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTGTGGACCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGGATAACAGAAAGAGCAGAAGACAGTGGCAATGAAAGTGAAGGGGATCAGGAAGAATTATCAGCACTTGT---GGAAATGGGGCATCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTGTAGAAAATTTGTGGGTCACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---TTGAGGAATGTTACTAATATCAATA---ATAGTAGTGAGGGAA-TGA---------GAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGGTGAAGAAAGACTATGCACTTTTTTATAGACTTGATGTAGTACCAATAGAT------AATGATAAT---------ACTAGCTATAGGTTGATAAATTGTAATACCTCAACCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAGAAGTTCAATGGAACAGGGCCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTAGTAATTTCACAGACAATGCAAAAAACATAATAGTACAGTTGAAAGAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGGAAAAGTATACATAT------AGGACCAGGAAGAGCATTTTATACAACAGGAGACATAATAGGAGATATAAGACAAGCACATTGCAACATTAGTAGAACAAAATGGAATAACACTTTAAATCAAATAGCTACAAAATTAAAAGAACAATTTGGGAATAATAAAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGAATTTTAATGGTACTTGGAAT---TTAACACAATCGAATGGTACTGAAGGAAATGAC---ACTATCACACTCCCATGTAGAATAAAACAAATTATAAACATGTGGCAAGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGATATTAACAAGAGATGGTGGAAATAACCACAATAATGATA---CCGAGACCTTTAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAACAATAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATAACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAACTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTAGGGATTTGGGGTTGCTCTGGAAAACTCATCTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAAACTCTGGATATGATTTGGAATAACATGACCTGGATGGAGTGGGAAAGAGAAATCGACAACTACACAGGCTTAATATACACATTAATTGAAGAATCGCAGAACCAGCAAGAAAAGAATGAACAAGAATTATTAGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTGACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTTTTTACTGTACTTTCTATAGTAAATAGAGTTAGGCAGGGATACTCACCATTGTCATTTCAGACCCACCTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGACAGAGACAGAGACAGATCCGTGCGATTAGTGGATGGATTCTTAGCACTTTTCTGGGACGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAGTATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAGGAATAGTGCTGTTAGCTTGCTTAATGCCACAGCTATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAATAGTACAAAGAATTTATAGGGCTATTCTCCACATACCTACAAGAATAAGACAGGGCTTGGAAAGGCTTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTATGGCTGGATGGCC------TACTGTAAGGGAAAGAATGACACGAGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCTCGGGACCTGGAGAAACATGGAGCACTCACAAGTAGCAATACAGCAACTAATAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAAGAGGAGGTGG 23 | >AY835755.1_5157-85 24 | 
AGGCATCTCCCATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAATCTTTACAAATATTAGCACTAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTGTGGACCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAAAGTGAAGGAGACCAGGAAGAATTGTCAGCACTTGT---GGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGTTGCAGGAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTTCATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGAAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTCACTCTAAATTGCACTGAT---TTGGGGAATGATACTAATACCAATAATAGTAGTAGTTGGGGAA-AGAT---GGAGAAAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAACATAAGAGACAAGGTGCGGAAAGAATATGCACTTCTTTATAAACTTGATGTAGTACCAATAGAT------GATAATAAT---------ACTAGCTATAGGTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAAAAGTTCAATGGAAAAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTGGTATCAACTCAACTACTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGACAATCTCACGAACAATGCTAAAACCATAATAGTACAGCTGAAGGAACCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAGGTATATATAT------AGGACCAGGGAGAGCATTTTATACAACAGAAAAAATAATAGGAGATATAAGACGAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTACAACAGATAGTTAGAAAATTAAGAGAAAAATTTGAGAATA---AAACAATAGTCTTTAATCGATCCTCAGGAGGGGACCCAGAAGTTGTAATGCACAGTTTTAATTGTGGAGGAGAATTTTTCTACTGTAACTCATCACAACTGTTTAATAGTACTTGGAAT---GATACTATTAAAGGGATAAATAA------------CCCTGAAGAA---------GTTATCACACTCCCATGTAGAATAAAACAAATTATAAACAGGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACTGGGCTGCTATTAACAAGAGATGGTGGTAATAGCGAGGACAATACCA---CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTACAAATTGAACCATTAGGAGTAGCACCCACCCAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATGCTAGGAGCTGTGTTCCTTGGGGTCTTGGGAGCAGCAGGAAGCACCATGGGCGCAGTGTCAATGACGCTGACGGTACAGACCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTAAGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATAATAGTTGGAGTAATAAATCTCTGAATGAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAAACTTAATATACACCTTAATTGCAGAATCGCAAAACCAACAAGAAAAGAATGAGCAAGAATTATTGGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTAGCATAACAAACTGGCTGTGGTATATAAGAATATTCATGATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTTGTTGCTGTGCTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCATTACAGACCCGCCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGGCAGATCCGAAATATTAGTGAACGGATTCTTAGCACTTTTCTGGGACGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGTCCTCAAATATTGGTGGAATATCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAACGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAAGAGCTGGTAGAGCTATTCTCCACATACCTAGAAGAGTGAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTACGCCAGGATGGGC------TACCATAAGAGAAAGAATGAGACGCACTGAG-CCA---------GCAACTGAT------------GGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCACTCACAAGTAGCAATACAACAGCTAATAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAAGTGG 25 | >AY835771.1_5082-86 26 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAGGTTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAACCTTTACACATAGTAGCAATAGTAGCATTAGTAGTAGCAACAATAATAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGGAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGGGATCAGGAGGAATTGTCAGCACTTG---TGGAGATGGGGCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAAACACCACTCTTTTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTCTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTCACTTTAAATTGCACTGAT---TGGAAGAATGCTACTAATACCAATA---GTAGTAGAGAGGTAA-CGAT---GGAGAGAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAGCATGAGAGATAAGATGCAGAAAGTATATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAT------AATGATAATGATAGTAATACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTATAAAGTGTAAAGATAAGAAGTTCAATGGAACAGGACCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAAATTCTCGGACAATACTAAAACCATAATAGTACAGCTGAACGAATCTGTAGAAATTAATTGTATAAGACCCAACAACAATACAAGAAAAAGTATAACTAT------AGGACCAGGGAGAGCATTTTATACAACAGGAGATATAATAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAACAGAATGGAATAACACTTTGATACAGATAGTTGAAAAATTAAGAGAACAATTTAGGAATA---AAACAATAGCCTTTAATCGATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGCTTTAATTGTGGAGGAGAATTTTTCTACTGTAATACAACACAACTGTTTAATAGTACTTGGAATGTAACTGGTACTTGGAATGTTACTGCAAGGTCAAATTACACTGGAGGAAATGAC---AATATCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTGTTAACAAGAGATGGTGGTAACGAGAGCGAGACCAC---CACCGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAGATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCGCCCACCAAGGCGAAGAGAAGAGTGGTGCAGAGAGAGAAAAGAGCAGTGGGA---ATAGGAGCTAAGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCACTGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAGCAATTTGCTGAGAGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGCTAGTTGGAGTAATAAAAATCTGAGTCAAATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTGACAATTACACAAGCTTAATATACACCTTAATTGAAGAATCGCAAAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACACAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTGCAGACCCGCCTCCCAAGCCAGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACGGATCCGTCAGATTAGTGGATGGCTTCTTAGCACTTATCTGGGACGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAATCCTCAAATATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGTTTGATGCCTTAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCTTGTAGAGCTATTCTCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAACGTAGTGTG---GGATGGTC------TACTGTAAGGGAAAGAATGAAACAAACTGAG-CCA---------GCAGCAGAGCCAGCAGCAGAGGGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGGGCAATCACAAGTAGCAATACAGCAGCTACCAATGCTACTTGTGCCTGGCTGGAAGCACAAGAGGAGGAGGAAGTGG 27 | >AY835759.1_5048-82 28 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCCTCAAGACAGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAGCCTTTACAAATAGTAGTAATAGTAGCATTAGTAGTAGCAACAATAATAGCAATAGTTGTGTGGTCCATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGGGATCAGGAAGAATTATCAGCACTTG---TGGAAATGGGGCACTCTGCTCCTCGGGATATTGATGATCTGTAATGCTACAGAACAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAAACACCACTCTATTCTGTGCATCAGATGCTAAAGCCTATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAAAACAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---TTGAGGAATGCTACTAATACCAATG---GTAGTAGCGGGAAAA-TGATGG---AGAGAGGA--GAAATGAAAAACTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGATGCAGAAAGAATATGCGCTTTTTTATAAACTTGATGTAGTACCAATAGAT------AATGATAAT---------ACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTCAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAAAGATAAGAAGTTCAATGGAACGGGACCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGAAGTCTAGCAGAAGAAGAGGTATTAATTAGATCTGCCAATTTCACGGACAATGCTAAAAACATAATAGTACAGCTGAACAAATCTGTAGAAATTAATTGTATAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTCTATACAACAGGAGACATAATAGGAAATATAAGACAAGCACATTGTAACATTAGTAGAGCAGAATGGTATAACACTTTAGAACAGATAGCTAAAAAATTAAGAGAACAATTTAGGAATA---AAACAATAGTCTTTAATCAATCCTCAGGGGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGCAGGGGAATTTTTCTACTGTAATACAACAGAACTGTTTAATAGTACTTGGAATGGTACTAGTACTTGGAATGTTACTGAAGG---AAATTACACTGAA------GAA---AATATCACACTCCAATGCAGCATAAAACAAATTGTAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGCTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTACTGACAATAA---------GACCGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATCGAACCATTAGGAATAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATAACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTAAGTGAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAGGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGCATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTGTTTGTAGTACTTTCTCTAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCACCGCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCGATTAGTGGATGGATTCTTAGCACTTATCTGGGTCGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGTCCTCAAATATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTGAGGAATAGTGCTGTTAGCTTGCTCAACGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCCTGTAGAGCTATTCTCCACATACCTACGCGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAGCGTAGTGTGGTTGGATGGTC------TACTGTAAGGGAAAGAATGAGACGAGCTGAGGCCA---------GCAGCTGTGCCAGCAGCAGAGGGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTACCAATACAGTAGCTATCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 29 | >AY970947.1_H434 30 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCCTCAGGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAATCTTTACAAATATTAGCAATAGTAGCATTAGTAGTAACAGCAATAATAGCAATAGTTGTGTGGACCATAGTAGTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAAAGCGAGGGGGACCAGGAAGAATTATCAGCACTTGT---GGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAGAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAGCCCACAAGAAGTAGTATTGGCAAATGTGACGGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAAATGCATGAGGACATGATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTAAT---TTAAGGAATGCTACTAATACCACTGCTACTAATACCACTAGTA-GTGG---ATGGGGAGAA--GAAATGACAAACTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGGTTCGGCGAGAATATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAT------AAGAATACT---------ACTAAATATAGGTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCATTTGAGCCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTGGCAGAAGGAGGGGTAGTAATTAGATCTGAAAATTTCACGAACAATGCTAAAACCATAATAGTACAGCTGAATGAATCTATAGAAATTAATTGTACAAGACCCAACAACAATACCAGAAAAAGTATACATAT------AGGACCGGGGAGAGCATTCTATGCAACAGGAGAAATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTAGAGGAAAATGGAATAACACTTTAAAACAGATAGTTACAAAATTAAAAGAACAATATGGGAATA---AAACAATAGTATTTAATTCATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGCACTTGGA------AAGATACT------------GGAGAGTTAAATAACCCTGAAGGAAATAGC---AACATCACACTCCCATGTAAAATAAAACAAATTATAAACAGGTGGCAGGGAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAACAACGTGAATAATACCA---CCGAGGTCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCTACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAACGATAGGAGCTATGTTCCTTGGATTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCTGTGGCGCTGACGGTACAGGCCAGACAATTATTGTCCGGTATAGTGCAACAGCAGAATAATTTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTAGGGATTTGGGGTTGCTCTGGAAAACTCATCTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGAATAAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTGAAAATTACACAAGCTTAATATACACCTTAATTGAAGAACCGCAGAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTGACATAACAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCATTCCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGACGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGACGATTAGTGACTGGATTCTTAGCACTCATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCTG---------------------GGTGTGGGAAGCCCTCAAATATTGGCGGAACCTCCTGCAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGTTTGCTTAATGCCACAGCTATAGCAGTAGCTGAGGGGACAGATAGGATTATAGAAGTAGTACAAAGAACTTGTAGAGCTATTCGCCACATACCTAGAAGAATAAGACAGGGCTTTGAAAGGGCTTTGCTATAAGATGGGGGGCAAGTGGTCAAAATATAGT------GGATGGTC------TGCCATAAGGGAAAGAATGAGACGAACTAAG-CCA---------GCAGCAGAG------------GGGGTAGGAGCAGTATCTCGAGACTTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACTAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAAGAAGTGG 31 | >AY835777.1_5018-83 32 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGACGAGCTCCTCAAGACAGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAATCTTTGCAAACATTAGCAATAGTAGCATTAGTAGTAGCAAGCATAATAGCAATAGTTGTGTGGGCCATAGTGTTCATAGAATACAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAGGAAGAATTGTCAGCGCTTGT---GGAAATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAACAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGACACCGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCTAACCCACAAGAAGTAGTATTGGGAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAAATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTGAAGTGCACTGAT---TTGAAGAATGATACTAATACCAATA---GTAGTAGCGGGAGAA-TGATGATGGAGGTAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAGCATAAGAAATAAGGTACAGAAAGAATATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAT------AATGATAAT---------ACAAGCTATACGTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGCATCCTTTGAACCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCGATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGGGGTAGTAATTAGATCTGAAAATTTCACGGACAATGCTAAAACCATAATAGTACAGCTGAATGAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACGAGGGAGAGCATTTTATGCAACAGGAGAAATAATAGGAGATATAAGGAAAGCACATTGTAACATTAGTGAAAAGAAATGGAATGACACTTTAAGACAGGTAGTTATAAAATTAAGAGAACAATTTGGGAATA---AAACAATAATCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACAAAGCTGTTTAATAGTACTTGGA------ATGATACT------------AAAGGGTGGAATGATACTAACAGGTCAAATAAAACTATCACACTCCAATGCAGAATAAAACAAGTTATAAACAGATGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTACATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAGTAACAAT-----AGCGAGA-CCGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAGCCATTAGGAGTAGCACCCACCAGGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAACAATAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAAGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTAGGAATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTTCTGTGCCTTGGAATAATAGTTGGAGTAATAAATCTCTGAATGACATTTGGGATAACATGACCTGGATGGAGTGGGAGAGAGAGATTGACAATTATACAAGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGACTTATTGCAATTGGATACGTGGGCAAGTTTGTGGAATTGGTTTACCATAACAAATCGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGGTTAGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGCGAATAGAGTTAGGCAGGGATACTCACCATTATCATTTCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACGGATCCAGTCCATTAGTGCATGGATTCTTAGCACTCATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGTTTGCTTAATGCCACAGCTATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAAGAATTTGTAGAGCTATTCTCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAATGGTCAAAACGTAGTGGGGGTGGATGGCC------TGCTGTAAGGGAAAGAATGAAACGAGCTGAG-CCA---------GCAGCAGTT------------GGGGTGGGAGCAGTATCTCGAGACTTGGAAAAACATGGAGCAATCACAAATAGCAATACAGCAACTACTAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGATGAGGAGGTGG 33 | >AF033819.3 34 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCATCAGAACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAACCTATACCAATAGTAGCAATAGTAGCATTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTAATCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGACTAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGAT---TTGAAGAATGATACTAATACCAATA---GTAGTAGCGGGAGAA-TGATAATGGAGAAAGGA--GAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAATAGAT------AATGATACT---------ACCAGCTATAAGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATGCTAAAACCATAATAGTACAGCTGAACACATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGAATCCGTATCCAGAGAGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATA---GGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTT---AATAGTACTTGGAGT---ACTGAAGGGTCAAATAACACTGAAGGAAGTGAC---ACAATCACCCTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGAAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAT---AGCAACAATGA---GTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCCTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTGGCACTTATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAGGAGCTTGTAGAGCTATTCGCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGATTTTGCTATAAGATGGGTGGCAAGTGGTCAAAAAGTAGTGTGATTGGATGGCC------TACTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAT------------AGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 35 | >K03455.1 36 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCATCAGAACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAACGCAACCTATACCAATAGTAGCAATAGTAGCATTAGTAGTAGCAATAATAATAGCAATAGTTGTGTGGTCCATAGTAATCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGACTAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGAT---TTGAAGAATGATACTAATACCAATA---GTAGTAGCGGGAGAA-TGATAATGGAGAAAGGA--GAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAATAGAT------AATGATACT---------ACCAGCTATAAGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATGCTAAAACCATAATAGTACAGCTGAACACATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGAATCCGTATCCAGAGAGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATA---GGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTT---AATAGTACTTGGAGT---ACTGAAGGGTCAAATAACACTGAAGGAAGTGAC---ACAATCACCCTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGAAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAT---AGCAACAATGA---GTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCCTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTGGCACTTATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAGGAGCTTGTAGAGCTATTCGCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGATTTTGCTATAAGATGGGTGGCAAGTGGTCAAAAAGTAGTGTGATTGGATGGCC------TACTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAT------------AGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 37 | >AY835760.1 38 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCCTCAAGACGGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTACGTA----GTACATGTAATGCAGCCTTTACAAATAGTAGTAATAGTAGCATTAGTAGTAGCAACAATAATAGCAATAGTTGTTTGGTCCATAGTACTCATAGAATATAGAAAAATATTAAGACAAAGGAAAATAGACAGGCTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAAGAAGAATTATCAGCGCTTG---TGGAGATGGGGCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGATAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAAACACCACTCTATTTTGTGCATCTGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGGAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTGTGGGATCAAAGCTTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---TTGAAGAATGCTACTAATACCACTA---GTAGTAGCGGGAAAA-TGATGGTGGAGAGAGGA--GAAATAAAAAACTGCTCTTTTAATATCACCACCGGCATAAGAGATAAGGTGCAGAAAGAATATGCACTTTTGTATAAATTTGATATAGTACCAATAGAT------AATGATACG---------ACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTCAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCAATGGAACAGGACCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGAAGTGTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCTCGGACAATGCTAAAACCATAATAGTACAGCTGAAAGAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAGAAGTATACATAT------AGGACCAGGGAGAGCATTCTATGCAGCAGGAGACATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTGGAGAAAAATGGCATAACACTTTAAAACAGGTAGCTAAAAAATTAGGAGAACAATTTGAGAATA---AAACAATAGCCTTTAAAAATTCCTCAGGGGGGGACCCAGAGATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTGATACAACAGGACTGTTTAATAGTATTTGGAATGATA---GTACTTGGAATGATAATAG----------TACTCGGAA-----TGAT---ACTATCATACTCCCATGCAGGATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGCTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTGTTGACAATGGTACTAATGGGACCGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAATAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTGTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATAACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTAAGTGAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAGGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGAATTATTGGCATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAACTGGCTGTGGTATATAAGAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTGTTTGTAGTACTTTCTCTAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCACCGCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCGATTAGTGGATGGATTCTTAGCACTTATCTGGGTCGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTATTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCTCTCAAATATTGGTGGAATCTCCTGCAGTATTGGATTCAGGAACTGAAGAATAGTGCTATTAGCTTGCTCAACGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAAGAGCCTGTAGAGCTATTCTCCACATACCTAGGAGAATAAGACAGGGCTTGGAAAGGGCTTTGCAATAAGATGGGTGGCAAGTGGTCAAAGCGTAGTGTGCTAGGATGGTC------TACTATAAGGGAAAGAATGAGACGAGCTGAG-------------CCAGCTGAGCCAGCAGCAGATGGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAACTAACAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 39 | >AY970946.1_H434 40 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCCTCAGGACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAATCTTTACAAATATTAGCAATAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTGTGGACCATAGTAGTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAAAGTGAGGGGGACCAGGAAGAATTATCAGCACTTGT---GGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAGAAATTGTGGGTCACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAGCCCACAAGAAGTAGTATTGGCAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAAATGCATGAGGACATGATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAGATTGCACTAAT---TTAAGGAATGCTACTAATACCACTGCTACTAATACCACTAGTA-GTGG---ATGGGGAGAA--GAAATGACAAACTGCTCTTTCAATATCACCACAAGCATAAGAAATAAGGTTCAGCAAGAATATGCACTTTTTTATAAACTTGATGTAGTACCAATAGAT------AAGAATACT---------GCTAAATATAGGTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCATTTGAGCCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTGGCAGAAGGAGGGGTAGTAATTAGATCTGAAAATTTCACGAACAATGCTAAAACCATAATAGTACAGCTGAATGAATCTATAGAAATTAATTGTACAAGACCCAACAACAATACCAGAAAAAGTATACATAT------AGGACCGGGGAGAGCATTCTATGCAACAGGAGAAATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTAGAGGAAAATGGAATAACACTTTAAAACAGATAGTTACAAAATTAAAAGAACAATATGGGAATA---AAACAATAGTATTTAAGTCATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGCACTTGGA------AAGATACT------------GGAAAGTTAAATAACCCTGAAGGAAATGGC---AACATCACACTCCCATGTAAAATAAAACAAATTATAAACAGGTGGCAGGGAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAACAACGTGAATAATACCA---CCGAGGTCTTCAGACCTGGAGGAGGAGATATGAGGGA---------------ATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCTACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAACGATAGGAGCTATGTTCCTTGGATTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAGTGGCGCTGACGGTACAGGCCAGACAATTATTGTCCGGTATAGTGCAACAGCAGAATAATTTGCTGAAAGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTAGGGATTTGGGGTTGCTCTGGAAAACTCATCTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCCCTGAATAAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTGAAAATTACCCAAGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTGACATAACAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCATTCCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGACGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGACGATTAGTGACTGGATTCTTAGCACTCATCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCTG---------------------GGTGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGTTTGCTTAATGCCACAGCTATAGCAGTAGCTGAGGGGACAGATAGGATTATAGAAGTAGTACAAAGAATTTGTAGAGCTATTCGCCACATACCTAGAAGAATAAGACAGGGCTTTGAAAGGGCTTTGCTATAAGATGGGGGGCAAGTGGTCAAAATATAGT------GGATGGTC------TGCCATAAGGGAAAGAATGAGACGAACTAAG-CCA---------GCAGCAGAG------------GGGGTAGGAGCAGTATCTCGAGACTTGGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACTAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAAAAAGTGG 41 | >AY835762.1_5160-84 42 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGAATCATCAAGTTTCTCTATCAAAGCAGTATGTA----GTACATGTAATGCAATCTTTACAAATAGTAGTAATAGTAGCATTAGTAGTAGCAACAATAATAGCAATAGTTGTTTGGTCCATAGTACTCATAGAATATAGAAAAATATTAAGACAAAGGAAAATAGACAGGTTAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAAGAAGAATTATCAGCGCTTGT---GGAGATGGGGCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGATAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCTGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGGAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTGTGGGATCAAAGCTTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---TTGAGGAATGCTACTAATACCACTA---GTAGTAGCGGGAAAA-TAAT---GGAGGGAGGA--GAAATAAAAAACTGCTCTTTTAATATCACCACAGGGATAAGAGATAAGGTGCAGAAAGAATATGCACTTTTGTATAAATTTGATATAGTACCAATAGAT------AATGATACG---------ACCAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTAACCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCAAGGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCCACTCACTTGCTGTTAAATGGCAGTCTAGCAGAAAAAGAGATAGTAATTAGATCTGAAAATTTCACAGACAATAGTAAAACCATAATAGTACATCTGAATGAATCTGTAGAAATTAATTGTACAAGGCCCAACAACAATACAAGGAAAAGTATACATAT------AGGACCAGGAAGAGCATTTTATACAACAGGAGAAATAATAGGAAATATAAGACAAGCACATTGTAACCTTAGTAGAACAAAATGGGCGAACACTTTAAAACAGATAGTTGAAAAATTAAGAGAACAATTTAA---GAATAGAACAGTAATATTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACACTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGATGTTTAATAGTACT---AAT---------AGTACTAGAGGGTTAAATGAAACTGAC---ACACTCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCAATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAGAACAA--------TGAGA-CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---TTAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCGTATGTTGCAACTTACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATGTTAGTTGGAGTAATAAAACTCTGGATGAGATTTGGAATAACATGACCTGGATGGAGTGGGAGAGAGAAATTGACAATTACACAAGCTTAATATACACCTTAATTGAGAAATCGCAAAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTGCTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCGCCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAAACAGATCCGGTCCCTTAGTGGATGGATTCTTAGCACTTTTCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTCCTGGGACGCAG---------------------GGGGGGGGAAGTCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAGGAATAGTGCTGTTAGCTTGCTCAATGTCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCTTATAGAGCTATCCTGCACATACCTAGAAGAATAAGACAGGGCTTTGAAAGGGCTTTGCAATAAGATGGGTGGCAAGTGGTCAAAAAGTAGTATAGTTGGATGGGC------TTCTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCGCGAGACCTGGAAAAACATGGAGCGATCACAAGTAGCAACACAGCAGCTACCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGATGAGGAGGTGG 43 | >AY835766.1_5073-89 44 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCATCAAGACAGTGAGACTCATCAAGTTCCTCTATCAAAGCAGTATGTA----GTAAATGTAATGCAATCTTTAGTAATATTAGCAATAGTAGCATATGTAGTAGCAATAATATTAGCAATAGTTGTGTGGTCTATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGGAGAATAGACAGGTTAATTGATAGAATAGGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGATCAAGAAGAATTATCAGCAATTGT---GGAGATGGGGCACCATGCTCCTTGGGATATTAATGATCTGTAGTGCTACAAAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAGGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGACACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGAAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTGTGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAC---TTGAAGAATGCTACTAATACCACTA---GTAGCAGCG---GAA-TAAT---AGAGGGAGGA--GAAATAAAAAACTGCTCTTTCAATGTCACCACAACAGTAAAAGATAAGGTGCAGAGAGAGTATGCACTTTTTTATAAACTTGATGTAGTACCACTAGAAGATGCTAATGATAGT---------ACCAGCTATAGGTTGATAAGTTGTAATACCTCAGTCACTACACAGGCCTGTCCAAAGGTAACCTTTGAGCCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCGCTTCTAAAGTGTAACAATAAGACGTTCAATGGAACAGGACCATGCAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGCAGAACTTAGATCTGCCAATTTCACAGACAATGCTAAAACCATAATAGTACAGCTGAATGAATCTGTAGTAATTACTTGTACAAGACCCAACAACAATACAAGAAAAAGTATACATAT------AGGACCAGGGAGAGCATTTTATGCAACAGGAGAAATAATAGGAAATATAAGACAAGCATATTGTAACCTTAATATAACAAAATGGAATGACACTTTAAGACAGATAGTTACAAAATTAAGAGAACAATTTGGAATGAATAAAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACAAAACTGTTTAATAGTACTTGGATGTTTAATGGTACTTGGACT---------GGTACAAATAGCACGGAAAGAAATGGC---ACAATCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAACATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAGAACGAGAAGAGAGCGAGAGCCGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAATTATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACTGAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTATCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGTATTTGGGGTTGCTCTGGAAAACTCATCTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAGATCTGTGGACTACATTTGGAATAACATGACCTGGATGGATTGGGAAAGAGAAATTGACAATTACACAGACTTAATATACAACTTAATTGAAGAATCGCAAAACCAACAAGAAAAAAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAACTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTACTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCCATTAGTGAACGGATTCTTAGCACTTATCTGGGTCGATCTGCGGAGCCTGTTCCTCTTCATCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGACTGTGGAACTTCTGGGGCGCAG---------------------GGGGTGGGAACTCCTCAAATATTGTTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGCGCTGTTAGCTTGCTCAATGTCACAGCCATAGCAGTAGCTGAGGGGACAGACAGGGTTATAGAAGTATTGCAAAGAGCTTATAGAGCTTTGCTCCATATACCTGTAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAAATGGGTGGCAAGTGGTCAAAAAGTAGTGTAGTTGGATGGCC------TACGGTAAGGGAAAGAATGAGACGAGCTGCG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGTATCTCGAGACCTGGAAAAACATGGAGCAATTACAAGTAGCAATACAGCAGCAAACAATGCTGACTTGTGTTGGCTAGAAGCACAAGAGGAGGAGGAGGTGG 45 | >KF384809.1_ES24-7A 46 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTAAGTA----GTACATGTAATGCAATCTTTACAAATATTAGCAATAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTATGGTCCATAGTACTCGTAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGATAATTGATAGAATAAGAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGACGGAGATCAGGAAGAATTGTCAGCACTGGT---GGAAATGGGGCATCATGCTCCTTGGGATATTAATGATCTGTAATGCTGCAAAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACTACTCTATTTTGTGCATCAGATGCTAAAGTATATGATACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATTAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTGAT---CTGAGGAATGCTACTAATACCACGA---GTAGTAGCGAGAGAA-CGAT---GGAGGGGGGA--GAAATAAAAAATTGCTCTTTCAATATCACCACAAGCATAAGAGATAAGGTGCAGAAAGAATATGCACTTTTTTATAAACTTGATGTAATACCAATAGAA------AAAGATAAT---------ACTAGCTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAACGATAAGAAGTTCAATGGAAAAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCCGCCAATCTCACGGACAATGCTAAAATCATAATAGTAAAGCTGAATAAATCTGTAGAACTGAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATACCTAT------AGGACCAGGCAGAGCATTTTATACAACAGGAGAAATAATAGGAGATATAAGACAAGCACATTGTAACCTTAGTAGAGCAAAATGGAATGACACTTTAGAAAAGATAGCTATAAAATTAAGAGAACAATTTAA---GAATAAAACAATAGTCTTTAGTCAACCCTCAGGAGGGGACCCAGAAATTGTAACGCTCAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAGTAGTACTTGGA------ATGGTACT---------------GGGTCAAATAACACTAAAGGAAATGAC---ACAATCACACTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAAAAACGAG-----AGCGAGA-CCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGA---ATAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTACTGTGCCTTGGAATACTAGTTGGAGTAATAAATCTCTGAATCAGATTTGGCAGAACATGACCTGGATGCAGTGGGAAAGAGAAATTAATAATTACACAAGCTTAATATACACCTTAATTGAAGAATCGCAAAACCAACAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTATTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTCCAGACCCGCCTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCGGTCAATTAGTGGATGGATTCTTAGCAATCATCTGGGTCGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGTCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAGTAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTATTACAAAGAGCTTGTAGAGCTATTCTCCACATACCTACAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGTAAGTGGTCAAAAAGTAGTCTGGTTGGATGGCC------TACTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGAT------------GGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCACTCACAAGTAGCAATACAGCAACTAACAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGAAGGAGGAGGTGG 47 | >AY835778.1_5018-86 48 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGACGAGCTCCTCAAGACAGTCAGACTCATCAAGTTTCTCTATCAAAGCAGTATGTA----GGACATGTAATGCAATCTTTACAAATATTACGAATAGTAGCATTAGTAGTAGTAGCAATAATAGCAATAGTTGTGTGGACCATAGTGTTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAATAGAAAGAGCAGAAGCCAGTGGCAATGAGAGTGAAGGAGATCAGGAAGAATTGTCAGCGCTTGT---GGAAATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGACACCGAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCTAACCCACAAGAAGTAGTATTGGGAAATGTGACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAAATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAAATGCACTGAT---TTGAAGAATATTACTAATACCAATA---GTAGTAATTGGGAAA-AGAT---GGAGGAAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAACATAAGAGATAAGGTACAGAAAGAATATGCACTTTTTTATAAACTTGATGTAATGCCAATAGAT------AATGATAAT---------ACAAGCTATACTTTGATAAATTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGCATCCTTTGAACCAATTCCCATACATTATTGTACCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAGAAGTTCGATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGGGGTAGTAATTAAATCTGAAAATTTCACGGACAATGCTAAAACCATAATAGTACAGCTGAATGAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTAT------AGGACTGGGGAGAAGATTTTATGCAACG---AAAATAATAGGAGATATAAGGAAAGCACATTGTAACATTAGTGAAAAGAAATGGAATGACACTTTAAGACAGGTAGTTATAAAATTAAGAGAACAATTTGGGAATA---GAACAATAATCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACACTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAAGAGAGCTGTTTAATAGTACTTGGA------ATGATACT------------GAAGGGTGGAATAATACTGACAGGTCAAATAAAACTATCACACTCCCATGCAAAATAAAACAAATTATAAACAGATGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTACATCAAATATTACAGGGCTGCTACTAACAAGAGATGGTGGTAGGAACAAT-----AGCGAAA-ACGAGACCTTCAGACCTGGAGGAGACGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGGACCATTGGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAGTAATAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAAGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAGAACAATCTGCTGAGGGCTATTGAGGCGCA
ACAGCATCTGTTGCGACTCACAGTCTGGGGCATCAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTCCTAGGAATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTTCTGTGCCTTGGAATACTAGTTGGAGTAATAAATCTCTGAATGACATTTGGAATAACATGACCTGGATGGAGTGGGAAAGAGAGATTGACAATTATACAAACTTAATATACAACTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAATGAACAAGACTTATTGCAATTGGATAAGTGGGCAAGTTTGTGGAATTGGTTTGACATATCAAACTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTACTGTACTTTCTTTAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCATTTCAGACCCGCCTCCCAGCCCAGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGAAGGTGGAGAGAGAGACAGAGACACATCCAGGCCGTTAGTGCATGGATTCTTAGCACTCATCTGGGTCGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGCGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAGCCCTCAAATATTGGGGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAACTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGGGACAGATAGCGTTATAGAAGTAGTACAAAGAATTTGTAGAGCTATTCTCAACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGGCTTTACTATAAGATGGGTGGCATATGGTCAAAACGTAGTGGGGGTGGATGGACATGGGCTGCTGTAAGGGAAAGAATGAGACGAGCTGAG-CCA---------GCAGCAGTT------------GGGGTGGGAGCAGTATCTCGAGACTTGGAAAAACATGGAGCAATCACAAATAGCAATACAGCAGCTACTAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGATGAGGAGGTGG 49 | >AY835756.1_5157-86 50 | 
AGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGAGCTCCTCAAGACAGTCAGACTCATCAGAATTCTCCATCAAAGCAGTAAGTA----GTACATGTAATGCAACCTTTACAAATATTAGCACTAGTAGCATTAGTAGTAGCAGCAATAATAGCAATAGTTGTGTGGACTATAGTATTCATAGAATATAGGAAAATATTAAGACAAAGAAAAATAGACAGGTTAATTGATAGAATAATAGAAAGAGCAGAAGACAGTGGCAATGAAAGTGAAGGAGACCAGGAAGAATTGTCAGCACTTGT---GGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTGCAGAAAACTTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGACACAGAGGTACATAATGTTTGGGCCACACATGCCTGTGTGCCCACAGACCCCAACCCACAAGAAGTAGCATTGGAAAATGTGACAGAAAAATTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGACATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTCACTCTAGATTGCACTGAT---TTAGGGAATGCTACTAATACCAATA---GTAATAGTTGGGGAG-AGAT---GGAGAAAGGA--GAAATAAAAAACTGCTCTTTCAATATCACCACAAACATAAGAGACAAGGTGACAAAAGAATATGCACTTTTTTATAACCTTGATGTAGTACCAATAGAT------AAGAATAAG---------ACTAGCTTTAGGTTGATACATTGTAACACCTCAACCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAGTGTAATGATAAAAGGTTCAATGGAAAAGAATCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTGGTATCAACTCAACTACTGTTAAATGGCAGTCTAGCAGAAGAAGAAGTAGTAATTAGATCTGACAATCTCACGGACAATGCTAAAACCATAATAGTACAGCTGAAGGAACCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAGGTATATATAT------AGGACCAGGGAGAGCATTTTATACAACAGAAAAAATAATAGGAGATATAAGACGAGCACATTGTAACATTAGTAGAGTAAAATGGAATAACACTTTACAACAGATAGTTAAAAAATTAAGAGAAAAATTTGAGAATA---AAACAATAGTCTTTAATCGATCCTCAGGAGGGGACCCAGAAGTTGTAATGCACAGTTTTAATTGTGGAGGAGAATTTTTCTACTGTAATTCATCACAACTGTTTAATAGTACTTGGAAT---AATGGTACTTGGAATGATACTAA------------CACTGAAGGA---------ACTATCACACTCCCATGTAGAATAAAACAAATTATAAACAGGTGGCAGGAAGTGGGAAAAGCAATGTATGCCCCTCCCATCAACGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAGCGAGGACAATACCA---CAGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAGAAATTGACCCATTAGGAGTAGCACCCACCAGGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATGCTAGGAGCTGTGTTCCTTGGGGTCTTGGGAGCAGCAGGAAGCACCATGGGCGCAGTGTCAATGACGCTGACGGTACAGACCAGACAATTATTGTCTGGGATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCA
ACAACATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTAGGGATTTGGGGGTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATAGTAGTTGGAGTAATAAATCTCTGAATGAGATTTGGGATAACATGACCTGGATGGAGTGGGAAAGAGAAATTAACAATTACACAAACTTAATATACACCTTAATTGCAGAATCGCAAAACCAACAAGAAAAGAATGAGCAAGAATTATTGGAATTAGATAAGTGGGCAAGTTTGTGGAATTGGTTTAGCATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGAATAGTTTTTGCTGTGCTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTGTCATTGCAGACCCGCCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATCGAAGAAGACGGTGGAGAGAGAGACAGAGACAGATCCGGAATATTAGTGAACGGATTCTTAGCACTTTTCTGGGACGACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAGCGAGGATTGTGGAACTTCTGGGACGCAG---------------------GGGGTGGGAAATCCTCAAGTATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTCAACGCCACTGCCATAGTAGTAGCTGAGGGGACAGATAGGGTTATAGAAGTAGTACAAAGAGCTGGTAGAGCTATTCTCCACATACCTAGAAGAATAAGACAGGGCTTGGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAACAGTAGTACGGGTGGATGGGC------TACCATAAGAGAAAGAATGAGACGAACTGAG-CCAACTGAGCCAGCAGCAGAT------------GGGGTGGGAGCAGCATCTAGAGACCTGGAAAAACATGGAGCACTCACAAGTAGCAATACGTCAGCTAATAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAAGTGG 51 | --------------------------------------------------------------------------------