├── README.md
├── create_pseudohaploid.sh
├── filter_seq
├── img
│   ├── AlignmentChain.png
│   ├── ChainFiltering.png
│   ├── Pseudohaploid.png
│   └── README.md
├── pseudohaploid.chains.pl
└── test
    ├── basic.fa
    ├── run_test.output
    ├── run_tests.sh
    └── simple.fa
/README.md:
--------------------------------------------------------------------------------
1 | # pseudohaploid
2 | Create pseudohaploid assemblies from a partially resolved diploid assembly
3 |
4 | [Mike Alonge](http://michaelalonge.com/), [Srividya Ramakrishnan](https://github.com/srividya22), and [Michael C. Schatz](http://schatz-lab.org)
5 |
6 | When assembling highly heterozygous genomes, the total span of the assembly is often nearly twice the expected (haploid) genome size, which is indicative of the assembler partially resolving the heterozygosity. This creates many duplicated genes and other duplicated features that can complicate annotation and comparative genomics. This repository contains code for post-processing an assembly to create a pseudo-haploid representation where pairs of contigs representing the same homologous sequence were filtered to select only one representative contig. The approach is similar to the approach used by [FALCON-unzip](https://www.nature.com/articles/nmeth.4035) for PacBio reads or [SuperNova](https://genome.cshlp.org/content/27/5/757) for 10X Genomics Linked Reads. As with those algorithms, our algorithm will not necessarily maintain the same phase throughout the assembly, and can arbitrarily alternate between homologous chromosomes at the ends of contigs. Unlike those methods, our method can be run as a stand-alone tool with any assembler.
7 |
8 | ![Pseudohaploid overview](img/Pseudohaploid.png)
9 |
10 | **Overview of Pseudo-haploid Genome Assembly** (a) The original sample has two homologous chromosomes labeled orange and blue. (b) In the de novo assembly, homologous regions containing higher rates of heterozygosity are split into distinct sequences (orange and blue), while regions with low rates or no heterozygous bases are collapsed to a single representative sequence (black). (c) Our algorithm attempts to filter out redundant contigs from the other homologous chromosome, although the phasing of the different contigs may be inconsistent. Figure derived from [8]
11 |
12 | Briefly, the algorithm begins by aligning the genome assembly to itself using the whole genome aligner `nucmer` from the [MUMmer suite](http://mummer.sourceforge.net/). We recommend the parameters `nucmer -maxmatch -l 100 -c 500` to report all alignments, unique and repetitive, at least 500bp long with a 100bp seed match. We then filter these alignments to those that are 1000bp or longer using `delta-filter` (also part of the MUMmer suite). For large genomes we also recommend the `sge_mummer` version of MUMmer, which computes the alignments in parallel in a cluster environment and produces results identical to the serial version: [sge_mummer github](https://github.com/fritzsedlazeck/sge_mummer). Finally, we recommend keeping only alignments with 90% identity or greater. This removes lower identity repetitive alignments while still accommodating the expected rate of heterozygosity between homologous chromosomes, including local regions of greater diversity.
13 |
14 | Next, the alignments are examined to identify and filter out redundant homologous contigs. We do so by linking the individual alignments into “alignment chains”, consisting of sets of alignments that are co-linear along the pair of contigs. Our method was inspired by older methods for computing synteny between distantly related genomes, although our method focuses on the problem of identifying homologous contig pairs as high identity long alignment chains. As we expect there to be structural variations between the homologous sequences, we allow for gaps in the alignments between the contigs, although true homologous contig pairs should maintain a consistent order and orientation to the alignments. Specifically, in the alignments from contig A to contig B, each aligned region of A forms a node in an alignment graph, and edges are added between nodes if they are compatible alignments, meaning they are on the same strand, and the implied gap distance on both contig A and contig B is less than 25kbp (configurable as `MAX_CHAIN_GAP`, which defaults to 20000 in `create_pseudohaploid.sh`) but not negative. Our algorithm then uses a depth first search starting at every node in the alignment graph to find the highest scoring chain of alignments, where the score is determined by the number of bases that are aligned in the chain. Notably, if a repetitive alignment is flanked by unique or repetitive alignments, such as the orange sequence in Contig B below, this approach will prefer to link alignments that are co-linear on Contig A. We find this produces better results than the filtering that MUMmer’s delta-filter can perform, which does not consider the context of the alignments when identifying a candidate set of non-redundant alignments.
15 |
16 | ![Alignment chain construction](img/AlignmentChain.png)
17 |
18 |
19 | **Alignment Chain Construction** (a) Pairwise alignments between all contigs are computed with nucmer. Here we show just the alignments between contigs A and B. (b) An alignment graph is computed where each aligned region of A forms a node, with edges between nodes that are compatible on the same strand, in the same order, and no more than 25kbp between them. (c) The final alignment chain is selected from the alignment graph as the maximal weight path in the alignment graph.
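As a rough illustration of the chain scoring described above, the following shell sketch scores the forward-strand alignments between one pair of contigs. It is not the code in `pseudohaploid.chains.pl`: it uses a dynamic program over alignments sorted by reference start in place of the depth first search, it ignores reverse-strand alignments, and the function name `best_chain_weight` and the four-column input format (`rstart rend qstart qend`) are assumptions made for this example.

```shell
# best_chain_weight MAX_GAP < alignments
# Hypothetical helper: reads "rstart rend qstart qend" lines (forward strand
# only) and prints the weight of the heaviest co-linear chain, where weight
# is the number of reference bases covered by the chained alignments.
best_chain_weight() {
  sort -k1,1n | awk -v max_gap="$1" '
    function abs(x) { return x < 0 ? -x : x }
    {
      n++
      rs[n] = $1; re[n] = $2; qs[n] = $3; qe[n] = $4
      # A chain can always consist of this alignment alone.
      best[n] = re[n] - rs[n] + 1
      for (j = 1; j < n; j++) {
        rdist = rs[n] - re[j]
        qdist = qs[n] - qe[j]
        # Compatible: gaps below max_gap and same order on both contigs.
        if (abs(rdist) < max_gap && abs(qdist) < max_gap &&
            rs[n] > rs[j] && re[n] > re[j] &&
            qs[n] > qs[j] && qe[n] > qe[j]) {
          # Only count bases past the end of alignment j, so overlapping
          # alignments are not counted twice.
          start = (rs[n] > re[j]) ? rs[n] : re[j] + 1
          w = best[j] + re[n] - start + 1
          if (w > best[n]) best[n] = w
        }
      }
      if (best[n] > top) top = best[n]
    }
    END { print top + 0 }'
}
```

For example, `best_chain_weight 20000 < pair.txt` would score a hypothetical file `pair.txt` holding the alignments between one contig pair. An alignment to a distant repeat copy fails the query gap check and is left out of the chain.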
20 |
21 |
22 | With the alignment chains identified between pairs of contigs, the last phase of the algorithm removes any contigs that are redundant with other contigs originating from the homologous chromosome. Specifically, it evaluates the contigs in order from shortest to longest, and computes the fraction of the bases of each contig that are spanned by alignment chains to other non-redundant contigs. If more than X% of the contig is spanned, it is marked as redundant (X is the `MIN_CONTAIN` setting in `create_pseudohaploid.sh`, 93 by default). This can occur in simple cases where shorter contigs are spanned by individual longer contigs as well as more complex cases where a contig is spanned by multiple shorter non-redundant contigs. We recommend you evaluate several cutoffs for the threshold of percent of the bases spanned.
23 |
24 | ![Chain filtering](img/ChainFiltering.png)
25 |
26 | **Chain Filtering** (a) In simple cases, short contigs (contig A) are filtered out by their alignment chains to longer non-redundant contigs (contig B). (b) In complex cases, a contig (contig B) is filtered out because the alignment chains to multiple non-redundant contigs (contigs A and C) together span more than X% of its bases.
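The spanned fraction can be computed with a plane sweep over the chain intervals on a contig, so that bases covered by more than one chain are counted once. The sketch below follows the arithmetic of the sweep in `pseudohaploid.chains.pl`, but it is a simplified illustration: the function name `percent_spanned` and the two-column input (`rstart rend` per chain) are assumptions, and it does not skip chains to contigs that were already marked redundant.

```shell
# percent_spanned CONTIG_LEN < intervals
# Hypothetical helper: reads "rstart rend" chain intervals for one contig and
# prints the percent of the contig covered, counting overlapping bases once.
percent_spanned() {
  sort -k1,1n | awk -v clen="$1" '
    BEGIN { lastend = -1 }
    {
      # Start counting at the chain start or at the end of the previous
      # counted chain, whichever is further along the contig.
      mstart = ($1 > lastend) ? $1 : lastend
      if ($2 > mstart) {
        mapped += $2 - mstart
        lastend = $2
      }
    }
    END { printf "%.2f\n", 100.0 * mapped / clen }'
}
```

The printed value is what gets compared against the containment threshold; a contig whose value falls below the threshold is kept.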
27 |
28 |
29 | ## Installation
30 |
31 | Make sure [MUMmer](http://mummer.sourceforge.net/) is installed and the binaries are in your path. We recommend version 3.23 although others may work.
32 |
33 | Then download the pseudohaploid code:
34 |
35 | ```
36 | $ git clone https://github.com/schatzlab/pseudohaploid.git
37 | ```
38 |
39 | There is nothing else to install.
40 |
41 | ## Usage
42 |
43 | The main script to run is `create_pseudohaploid.sh`. This is a simple bash script to simplify the steps of aligning the genome to itself, filtering the alignments, constructing and analyzing the alignment chains, and then creating the final pseudohaploid assembly. The usage is:
44 |
45 | ```
46 | $ create_pseudohaploid.sh assembly.fa outprefix
47 | ```
48 |
49 |
50 | The test directory has a small script to run this command on a small, simple example. If everything is working well you should see:
51 |
52 | ```
53 | $ cd test
54 | $ ./run_tests.sh
55 | Running the simple example
56 | Generating pseudohaploid genome sequence
57 | ----------------------------------------
58 | GENOME: simple.fa
59 | OUTPREFIX: ph.simple
60 | MIN_IDENTITY: 90
61 | MIN_LENGTH: 1000
62 | MIN_CONTAIN: 93
63 | MAX_CHAIN_GAP: 20000
64 |
65 | 1. Aligning simple.fa to itself with nucmer
66 | Original assembly has 2 contigs
67 |
68 | 2. Filter for alignments longer than 1000 bp and below 90 identity
69 |
70 | 3. Generating coords file
71 |
72 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
73 | Processing coords file (ph.simple.filter.coords)...
74 | Processed 6 alignment records [4 valid]
75 | Finding chains for 2 contigs...
76 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd]
77 | Looking for contained contigs...
78 | Found 1 joint contained contigs
79 | Printed 1 total contained contigs
80 |
81 | 5. Generating a list of redundant contig ids using min_contain: 93
82 | Identified 1 redundant contig to remove in ph.simple.contained.ids
83 |
84 | 6. Creating final pseudohaploid assembly in ph.simple.pseudohap.fa
85 | Pseudohaploid assembly has 1 contigs
86 | ```
87 |
88 | Note that `create_pseudohaploid.sh` is just a simple bash script, so it can be easily edited or incorporated into a larger pipeline. You can also swap out steps, such as replacing nucmer with sge_mummer to use a grid to compute the self alignments.
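One step that is cheap to repeat is the containment cutoff, since step 5 of the script only filters the `#` summary lines already written to the chains file. A minimal sketch of such a sweep is shown here; the function name `sweep_cutoffs` and its output format are assumptions for illustration, and it relies on the summary lines having the form `# contig [length] percent : ...` with the percent in the fourth field, as in step 5. The percentages in the file were computed with the `MIN_CONTAIN` value passed to step 4, so for a large change in cutoff it may be safer to rerun step 4 as well.

```shell
# sweep_cutoffs CHAINS_FILE CUTOFF...
# Hypothetical helper: for each cutoff, count the contigs whose spanned
# percentage (4th field of the "#" summary lines) is at or above the cutoff.
sweep_cutoffs() {
  chains=$1
  shift
  for cut in "$@"; do
    n=$(awk -v cut="$cut" '/^#/ && $4 >= cut { n++ } END { print n + 0 }' "$chains")
    echo "min_contain=$cut redundant_contigs=$n"
  done
}
```

For the test data, this could be run as `sweep_cutoffs ph.simple.chains 90 93 95` after `run_tests.sh` has produced the chains file.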
89 |
90 | ## Performance Validation
91 |
92 | To demonstrate the capabilities of our new Pseudohaploid method, we applied these techniques to a highly heterozygous sample of Arabidopsis thaliana, an F1 hybrid of Col-0 and Cvi-0 that was previously sequenced as part of the FALCON-unzip paper. For this analysis, we downloaded 116x coverage of PacBio reads (read N50 length=17,474) of the F1 genome from the SRA under accession SRX1715706. We then assembled the reads with Canu using parameters optimized for heterozygous samples. The total size of the raw Canu assembly was substantially larger than the expected haploid genome size: the total assembly size was 214.7Mbp, whereas the haploid genome size is ~135Mbp according to the latest estimates from The Arabidopsis Information Resource (TAIR) (https://www.arabidopsis.org/portals/genAnnotation/gene_structural_annotation/agicomplete.jsp).
93 |
94 | We then applied the Pseudohaploid method to the assembly. This reduced the total size of the assembly from 214.7Mbp to 143.5Mbp, and increased the contig N50 size from 350kbp to 950kbp by reducing the number of contigs from 2074 to 505. Then, using the high quality TAIR10 reference genome, we investigated the quality of both the raw and Pseudohaploid assemblies. Using BUSCO, we found the reference genome contained 1356 complete BUSCO genes, of which 1348 were single-copy and 8 were duplicated. We found the raw Canu assembly contained a large fraction of duplicated genes: overall it contained 1355 complete BUSCOs, although only 711 were single-copy and 644 were duplicated. In contrast, the Pseudohaploid assembly substantially reduced the number of duplicated genes, and contained a total of 1355 complete BUSCOs, of which 1240 were single-copy and only 115 were duplicated (an 82% reduction from 644).
95 |
96 | Furthermore, by aligning the raw Canu and Pseudohaploid assemblies to the reference TAIR10 assembly with nucmer using the parameters “-maxmatch -l 100 -c 500”, we found that 1.6Mbp (1.4%) of the TAIR10 assembly was not represented in the Canu assembly, and 4.2Mbp (3.5%) was not represented in the Pseudohaploid assembly as computed by the MUMmer tool dnadiff in the “AlignedBases” field. We also found that 19.0Mbp of the raw Canu assembly and 14.1Mbp of the Pseudohaploid assembly were unaligned to the reference genome. However, the reference TAIR10 assembly was assembled from the Col-0 accession, and the portions that do not align are chiefly due to the pseudo-haploid representation that will alternate between the Col-0 and Cvi-0 haplotypes. To assess this, we also aligned a high quality (N50 size=7.9Mbp) Cvi inbred assembly created with the FALCON assembler to the TAIR10 reference with nucmer using the same parameters as above. From this, we found that 17.3Mbp (14.5%) of the reference is also not found in the Cvi assembly, and the Cvi assembly contains 17.7Mbp not found in the reference, highlighting the widespread structural variations between the accessions. We also found that the vast majority (94.5%) of the bases from the Pseudohaploid assembly that were not aligned to the reference genome could be successfully aligned to the Cvi assembly using the same parameters.
97 |
98 | Overall, the Pseudohaploid method was highly effective: it removed 71Mbp of redundant sequences from the raw Canu output to substantially improve the fraction of unique genes while only marginally decreasing the sequences from the reference present in the pseudohaploid assembly. The datafiles for these assemblies are available here: [http://labshare.cshl.edu/shares/schatzlab/www-data/pseudohaploid/arabidopsis/](http://labshare.cshl.edu/shares/schatzlab/www-data/pseudohaploid/arabidopsis/)
99 |
--------------------------------------------------------------------------------
/create_pseudohaploid.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | set -o pipefail
4 |
5 | BIN="$( cd "$(dirname "$0")" ; pwd -P )"
6 |
7 | ## Minimum alignment Identity
8 | #MIN_IDENTITY=95
9 | MIN_IDENTITY=90
10 |
11 | ## Minimum alignment length to consider
12 | #MIN_LENGTH=10000
13 | MIN_LENGTH=1000
14 |
15 | ## Minimum containment percentage from overlap chains to filter contig
16 | #MIN_CONTAIN=95
17 | MIN_CONTAIN=93
18 |
19 | ## Maximum distance in bp allowed between alignments on the same alignment chain
20 | MAX_CHAIN_GAP=20000
21 |
22 | if [ $# -lt 2 ]
23 | then
24 | echo "create_pseudohaploid.sh assembly.fa outprefix"
25 | exit 1
26 | fi
27 |
28 | GENOME=$1
29 | PREFIX=$2
30 |
31 | echo "Generating pseudohaploid genome sequence"
32 | echo "----------------------------------------"
33 | echo "GENOME: $GENOME"
34 | echo "OUTPREFIX: $PREFIX"
35 | echo "MIN_IDENTITY: $MIN_IDENTITY"
36 | echo "MIN_LENGTH: $MIN_LENGTH"
37 | echo "MIN_CONTAIN: $MIN_CONTAIN"
38 | echo "MAX_CHAIN_GAP: $MAX_CHAIN_GAP"
39 | echo
40 |
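41 | ## You may want to replace this with sge_mummer for large genomes. Note that README.md recommends '-l 100 -c 500', whereas the -c and -l values are swapped in the command below.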
42 | ## See: https://github.com/fritzsedlazeck/sge_mummer
43 | if [ ! -r $PREFIX.delta ]
44 | then
45 | echo "1. Aligning $GENOME to itself with nucmer"
46 | (nucmer --maxmatch -c 100 -l 500 $GENOME $GENOME -p $PREFIX) >& nucmer.log
47 | numorig=`grep -c '^>' $GENOME`
48 | echo "Original assembly has $numorig contigs"
49 | echo
50 | fi
51 |
52 | ## Pre-filter for just longer high identity alignments
53 | echo "2. Filter for alignments longer than $MIN_LENGTH bp and at least $MIN_IDENTITY% identity"
54 | delta-filter -l $MIN_LENGTH -i $MIN_IDENTITY $PREFIX.delta > $PREFIX.filter.delta
55 | echo
56 |
57 | ## Create the coord file
58 | echo "3. Generating coords file"
59 | show-coords -rclH $PREFIX.filter.delta > $PREFIX.filter.coords
60 | echo
61 |
62 | ## Find and analyze the alignment chains
63 | ## Note you can rerun this step multiple times from the same coords file
64 | echo "4. Identifying alignment chains: min_id: $MIN_IDENTITY min_contain: $MIN_CONTAIN max_gap: $MAX_CHAIN_GAP"
65 | ($BIN/pseudohaploid.chains.pl $PREFIX.filter.coords \
66 | $MIN_IDENTITY $MIN_CONTAIN $MAX_CHAIN_GAP > $PREFIX.chains) >& $PREFIX.chains.log
67 | cat $PREFIX.chains.log
68 | echo
69 |
70 | ## Generate a list of contained contigs
71 | ## This can also be rerun from the same chain file but using different containment thresholds
72 | echo "5. Generating a list of redundant contig ids using min_contain: $MIN_CONTAIN"
73 | grep '^#' $PREFIX.chains | \
74 | awk -v cut=$MIN_CONTAIN '{if ($4 >= cut){print ">"$2}}' > $PREFIX.contained.ids
75 | numcontained=`wc -l $PREFIX.contained.ids | awk '{print $1}'`
76 | echo "Identified $numcontained redundant contig to remove in $PREFIX.contained.ids"
77 | echo
78 |
79 |
80 | ## Finally filter the original assembly to remove the contained contigs
81 | echo "6. Creating final pseudohaploid assembly in $PREFIX.pseudohap.fa"
82 | ($BIN/filter_seq -v $PREFIX.contained.ids $GENOME > $PREFIX.pseudohap.fa) >& $PREFIX.pseudohap.fa.log
83 | numfinal=`grep -c '^>' $PREFIX.pseudohap.fa`
84 | echo "Pseudohaploid assembly has $numfinal contigs"
85 |
--------------------------------------------------------------------------------
/filter_seq:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env perl
2 | use warnings;
3 | use strict;
4 | use FileHandle;
5 |
6 | my $USAGE = "Usage: filter_seq [-index seq.fa] [-v] subset.fa all.fa\n";
7 |
8 | my $HELPTEXT = qq~
9 | Extract specified fasta records listed in subset.fa from a master file all.fa.
10 | If available, use the index file all.fa.idx to allow random access. Create the index file
11 | by running 'filter_seq -index all.fa'.
12 |
13 | $USAGE
14 |
15 | Options
16 | -------
17 | -index Create an index file of the fasta file\n
18 | -v skip everything listed in the subset.fa file\n
19 | ~;
20 |
21 |
22 | my $createindex = 0;
23 | my $skiplisted = 0;
24 |
25 | my $good = shift @ARGV or die $USAGE;
26 |
27 | if ($good eq "-help")
28 | {
29 | die $HELPTEXT;
30 | }
31 | elsif ($good eq "-index")
32 | {
33 | $createindex = 1;
34 | $good = shift @ARGV or die $USAGE;
35 | }
36 | elsif ($good eq "-v")
37 | {
38 | $skiplisted = 1;
39 | $good = shift @ARGV or die $USAGE;
40 | print STDERR "Skipping everything listed in $good\n";
41 | }
42 |
43 |
44 | if ($createindex)
45 | {
46 | my $orig = new FileHandle "$good", "<"
47 | or die("Can't open $good ($!)");
48 |
49 | open IDX, "> $good.idx"
50 | or die("Can't open $good.idx ($!)");
51 |
52 | while (!$orig->eof())
53 | {
54 | my $pos = $orig->tell();
55 | my $line = $orig->getline();
56 |
57 | if ($line =~ /^\>(\S+)/)
58 | {
59 | print IDX "$1 $pos\n";
60 | }
61 | }
62 |
63 | close IDX;
64 | }
65 | else
66 | {
67 | my $copy = shift @ARGV or die $USAGE;
68 |
69 | my %sequencelist;
70 |
71 | ## Find the seqnames from the good list
72 | open GOOD, "< $good"
73 | or die("Couldn't open $good ($!)");
74 |
75 | while (<GOOD>)
76 | {
77 | if (/^\#(\S+)\(/ || /^\>(\S+)/)
78 | {
79 | $sequencelist{$1} = 1;
80 | }
81 | }
82 | close GOOD;
83 |
84 | if ((!$skiplisted) && (-r "$copy.idx"))
85 | {
86 | ## Create the index as: grep -b '>' tvg2.qual | tr -d ':' | tr '>' ' ' | awk '{print $2" "$1}' > tvg2.qual.idx
87 | my %offsettable;
88 |
89 | open IDX, "< $copy.idx"
90 | or die("Couldnt open $copy.idx ($!)");
91 |
92 | while (<IDX>)
93 | {
94 | my @val = split / /, $_;
95 |
96 | $offsettable{$val[0]} = $val[1]
97 | if (exists $sequencelist{$val[0]});
98 | }
99 | close IDX;
100 |
101 |
102 | my $copy = new FileHandle "$copy", "r"
103 | or die("Couldnt open $copy ($!)");
104 |
105 | foreach my $seqname (keys %sequencelist)
106 | {
107 | if (exists $offsettable{$seqname})
108 | {
109 | $sequencelist{$seqname} = 0;
110 |
111 | $copy->seek($offsettable{$seqname}, 0);
112 |
113 | ## Print the headerline for sure
114 | my $line = $copy->getline();
115 | print $line;
116 |
117 | ## loop until next record
118 | $line = $copy->getline();
119 | while ($line !~ /^>/)
120 | {
121 | print $line;
122 | last if $copy->eof();
123 | $line = $copy->getline();
124 | }
125 | }
126 | }
127 | }
128 | else
129 | {
130 | ## Pull the sequences out of the copy file
131 | my $printid = 0;
132 |
133 | open COPY, "< $copy"
134 | or die("Couldnt open $copy ($!)");
135 |
136 | while (<COPY>)
137 | {
138 | if (/^\>(\S+)/)
139 | {
140 | $printid = $sequencelist{$1};
141 | if ($skiplisted) { $printid = !$printid; }
142 | $sequencelist{$1} = 0;
143 | }
144 |
145 | print $_ if $printid;
146 | }
147 |
148 | close COPY;
149 | }
150 |
151 | ## Make sure we found each id
152 | foreach my $seqname (keys %sequencelist)
153 | {
154 | print STDERR "$seqname in $good but not in $copy\n"
155 | if ($sequencelist{$seqname});
156 | }
157 | }
158 |
--------------------------------------------------------------------------------
/img/AlignmentChain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/AlignmentChain.png
--------------------------------------------------------------------------------
/img/ChainFiltering.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/ChainFiltering.png
--------------------------------------------------------------------------------
/img/Pseudohaploid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/Pseudohaploid.png
--------------------------------------------------------------------------------
/img/README.md:
--------------------------------------------------------------------------------
1 | Image Files
2 |
--------------------------------------------------------------------------------
/pseudohaploid.chains.pl:
--------------------------------------------------------------------------------
1 | #!/usr/bin/perl -w
2 | use strict;
3 |
4 | ## Generate coords file like this:
5 | ## show-coords -rclH file.delta > file.coords
6 |
7 | my $USAGE = "pseudohaploid.chains.pl coords_file min_perc_id min_perc_cov max_chain_gap > redundant.list\n";
8 |
9 | my $coordsfile = shift @ARGV or die $USAGE;
10 | my $MIN_PERC_ID = shift @ARGV or die $USAGE;
11 | my $MIN_PERC_COV = shift @ARGV or die $USAGE;
12 | my $MAX_CHAIN_DIST = shift @ARGV or die $USAGE;
13 |
14 | my $VERBOSE = 1;
15 | my $PATHVERBOSE = 0;
16 |
17 |
18 | ## Parse the coords file for valid alignments
19 | ###############################################################################
20 |
21 | open COORDS, "$coordsfile" or die "Can't open $coordsfile\n";
22 | print STDERR "Processing coords file ($coordsfile)...\n";
23 |
24 | my $alignments = 0;
25 | my $validalignments = 0;
26 | my %contigs;
27 |
28 | while (<COORDS>)
29 | {
30 | $alignments++;
31 | if (($alignments % 10000) == 0) { print STDERR " processed $alignments alignments\n"; }
32 |
33 | chomp;
34 | $_ =~ s/^\s+//;
35 | my @vals = split /\s+/, $_;
36 |
37 | my $rstart = $vals[0];
38 | my $rend = $vals[1];
39 |
40 | my $qstart = $vals[3];
41 | my $qend = $vals[4];
42 |
43 | my $qoo = "F";
44 | if ($qstart > $qend) { $qoo = "R" };
45 |
46 | my $alenr = $vals[6];
47 | my $alenq = $vals[7];
48 |
49 | my $pid = $vals[9];
50 |
51 | my $lenr = $vals[11];
52 | my $lenq = $vals[12];
53 |
54 | my $rid = $vals[17];
55 | my $qid = $vals[18];
56 |
57 | $contigs{$rid}->{len} = $lenr;
58 | $contigs{$qid}->{len} = $lenq;
59 |
60 | #print "$_\n";
61 | #print "$rid $qid | $rstart $rend $lenr | $qstart $qend $qoo $lenq | $alenr\n";
62 |
63 | next if ($pid < $MIN_PERC_ID);
64 | next if ($rid eq $qid);
65 |
66 | $validalignments++;
67 |
68 | my $ainfo;
69 | $ainfo->{rstart} = $rstart;
70 | $ainfo->{rend} = $rend;
71 |
72 | $ainfo->{alenr} = $alenr;
73 |
74 | $ainfo->{qid} = $qid;
75 | $ainfo->{qstart} = $qstart;
76 | $ainfo->{qend} = $qend;
77 |
78 | push @{$contigs{$rid}->{align}->{$qid}->{$qoo}}, $ainfo;
79 | }
80 |
81 | print STDERR "Processed $alignments alignment records [$validalignments valid]\n";
82 |
83 |
84 |
85 | ## Find the longest alignment chain per sequence
86 | ###############################################################################
87 |
88 | my $numcontigs = scalar keys %contigs;
89 | my $totaledges = 0;
90 | my $ctgcount = 0;
91 |
92 | my $constructtime = 0;
93 | my $searchtime = 0;
94 | my $stackadd = 0;
95 | my $lasttime = 0;
96 |
97 | print STDERR "Finding chains for $numcontigs contigs...\n";
98 |
99 | ## process from smallest to biggest, so that bigger contigs are preferred to be kept
100 | foreach my $ctg (sort {$contigs{$a}->{len} <=> $contigs{$b}->{len}} keys %contigs)
101 | {
102 | $ctgcount++;
103 | if (($ctgcount % 1000) == 0) { print STDERR " processed $ctgcount contigs...\n"; }
104 |
105 | if (exists $contigs{$ctg}->{align})
106 | {
107 | my $clen = $contigs{$ctg}->{len};
108 |
109 | foreach my $qid (sort keys %{$contigs{$ctg}->{align}})
110 | {
111 | my $bestspanall = -1;
112 | my $bestpathall = undef;
113 |
114 | my %salign;
115 | $salign{F} = undef;
116 | $salign{R} = undef;
117 |
118 | foreach my $dir ('F', 'R')
119 | {
120 | if (exists $contigs{$ctg}->{align}->{$qid}->{$dir})
121 | {
122 | my @align = sort {$a->{rstart} <=> $b->{rstart}} @{$contigs{$ctg}->{align}->{$qid}->{$dir}};
123 | $salign{$dir} = \@align;
124 |
125 | if ($PATHVERBOSE)
126 | {
127 | my $qlen = $contigs{$qid}->{len};
128 | print "$ctg [$clen] $qid [$qlen] $dir\n";
129 | for (my $i = 0; $i < scalar @align; $i++)
130 | {
131 | my $rstart = $align[$i]->{rstart};
132 | my $rend = $align[$i]->{rend};
133 | my $qstart = $align[$i]->{qstart};
134 | my $qend = $align[$i]->{qend};
135 | print "\t<$i$dir>\t$rstart\t$rend\t|\t$qstart\t$qend\n";
136 | }
137 | }
138 |
139 | ## Find all of the compatible edges
140 | $lasttime = time();
141 | for (my $i = 0; $i < scalar @align; $i++)
142 | {
143 | for (my $j = 0; $j < $i; $j++)
144 | {
145 | ## sorted scan: 0 ... j ... i ... n
146 | ## check if alignment j is compatible with alignment i
147 | my $rdist = $align[$i]->{rstart} - $align[$j]->{rend};
148 | my $qdist = $align[$i]->{qstart} - $align[$j]->{qend};
149 | if ($dir eq "R") { $qdist = $align[$j]->{qend} - $align[$i]->{qstart} }
150 |
151 | my $valid = 0;
152 |
153 | ## First check the distance between the alignments and ref position
154 | if ((abs($rdist) < $MAX_CHAIN_DIST) &&
155 | (abs($qdist) < $MAX_CHAIN_DIST) &&
156 | ($align[$i]->{rstart} > $align[$j]->{rstart}) &&
157 | ($align[$i]->{rend} > $align[$j]->{rend}))
158 | {
159 | $valid = 1;
160 | }
161 |
162 | ## Now check the query positions
163 | if ($valid)
164 | {
165 | $valid = 0;
166 |
167 | if ($dir eq "F")
168 | {
169 | ## ----------------------------------------------------
170 | ## s----j------->e
171 | ## s------i------->e
172 |
173 | if (($align[$i]->{qstart} > $align[$j]->{qstart}) &&
174 | ($align[$i]->{qend} > $align[$j]->{qend}))
175 | {
176 | $valid = 1;
177 | }
178 | }
179 | else
180 | {
181 | ## ----------------------------------------------------
182 | ##        e<------j-------s
183 | ##   e<------i-------s
184 |
185 | if (($align[$j]->{qstart} > $align[$i]->{qstart}) &&
186 | ($align[$j]->{qend} > $align[$i]->{qend}))
187 | {
188 | $valid = 1;
189 | }
190 | }
191 | }
192 |
193 | if ($valid)
194 | {
195 | $totaledges++;
196 | push @{$align[$j]->{edge}}, $i;
197 | }
198 | }
199 | }
200 | $constructtime += (time() - $lasttime);
201 |
202 | if ($PATHVERBOSE)
203 | {
204 | for (my $i = 0; $i < scalar @align; $i++)
205 | {
206 | if (exists $align[$i]->{edge})
207 | {
208 | print "edges from <$i$dir>:";
209 | foreach my $j (@{$align[$i]->{edge}})
210 | {
211 | print "\t<$j$dir>";
212 | }
213 | print "\n";
214 | }
215 | }
216 | }
217 |
218 | ## find the longest chain starting at each node (if not already visited)
219 | $lasttime = time();
220 | for (my $i = 0; $i < scalar @align; $i++)
221 | {
222 | next if exists $align[$i]->{visit};
223 |
224 | ## start a DFS at node i to explore chains passing through it
225 | my $path;
226 | $path->{chainstart} = $align[$i]->{rstart};
227 | $path->{chainend} = $align[$i]->{rend};
228 | $path->{chainweight} = $align[$i]->{alenr};
229 | $path->{dir} = $dir;
230 |
231 | push @{$path->{nodes}}, $i;
232 |
233 | my $bestspani = -1;
234 | my $bestpathi = undef;
235 |
236 | my @stack;
237 | push @stack, $path;
238 | $stackadd++;
239 |
240 | while (scalar @stack > 0)
241 | {
242 | my $path = pop @stack;
243 | my $pathlen = scalar @{$path->{nodes}};
244 |
245 | my $lastnode = $path->{nodes}->[$pathlen-1];
246 | $align[$lastnode]->{visit}++;
247 |
248 | my $betterpath = 0;
249 | if ((!exists $align[$lastnode]->{chainweight}) ||
250 | ($path->{chainweight} > $align[$lastnode]->{chainweight}))
251 | {
252 | $betterpath = 1;
253 | $align[$lastnode]->{chainweight} = $path->{chainweight};
254 | }
255 |
256 | if (($betterpath) && (exists $align[$lastnode]->{edge}))
257 | {
258 | ## If I can keep extending, extend with all children
259 | foreach my $e (@{$align[$lastnode]->{edge}})
260 | {
261 | my @nodes = @{$path->{nodes}};
262 | push @nodes, $e;
263 | my $newpath;
264 | $newpath->{nodes} = \@nodes;
265 | $newpath->{dir} = $dir;
266 |
267 | $newpath->{chainstart} = $path->{chainstart};
268 | $newpath->{chainend} = $path->{chainend};
269 | if ($align[$e]->{rend} > $newpath->{chainend}) { $newpath->{chainend} = $align[$e]->{rend}; }
270 |
271 | my $newstart = $align[$e]->{rstart};
272 | if ($path->{chainend} > $newstart) { $newstart = $path->{chainend}; }
273 | my $newbases = $align[$e]->{rend} - $newstart + 1;
274 | $newpath->{chainweight} = $path->{chainweight} + $newbases;
275 |
276 | push @stack, $newpath;
277 | $stackadd++;
278 | }
279 | }
280 | else
281 | {
282 | ## no place else to go, score the path
283 | my $chainstart = $path->{chainstart};
284 | my $chainend = $path->{chainend};
285 | my $chainspan = $chainend - $chainstart + 1;
286 | my $chainweight = $path->{chainweight};
287 |
288 | ## override span with weight
289 | $chainspan = $chainweight;
290 |
291 | if ($chainspan > $bestspani)
292 | {
293 | $bestspani = $chainspan;
294 | $bestpathi = $path;
295 | }
296 | }
297 | }
298 |
299 | ## best path from node i
300 | if ($PATHVERBOSE)
301 | {
302 | print "bestspani <$i$dir>\t$bestspani";
303 |
304 | if (defined $bestpathi)
305 | {
306 | my $span = $bestpathi->{chainend} - $bestpathi->{chainstart} + 1;
307 | print "\t|\t$bestpathi->{chainstart}\t$bestpathi->{chainend}\t[$span]\t|\t";
308 | foreach my $n (@{$bestpathi->{nodes}})
309 | {
310 | print "\t<$n$dir>";
311 | }
312 | print "\n";
313 | }
314 | }
315 |
316 | ## check if this is the best overall
317 | if ($bestspani > $bestspanall)
318 | {
319 | $bestspanall = $bestspani;
320 | $bestpathall = $bestpathi;
321 | }
322 | }
323 | $searchtime += (time() - $lasttime);
324 | }
325 | }
326 |
327 | ## overall best chain between this pair of contigs
328 | if ($VERBOSE)
329 | {
330 | my $clen = $contigs{$ctg}->{len};
331 | my $qlen = $contigs{$qid}->{len};
332 |
333 | print "bestspanall $ctg [$clen] $qid [$qlen] : $bestspanall\n";
334 |
335 | if (defined $bestpathall)
336 | {
337 | my $dir = $bestpathall->{dir};
338 | my $span = $bestpathall->{chainend} - $bestpathall->{chainstart} + 1;
339 |
340 | print "\t$dir\t|\t$bestpathall->{chainstart}\t$bestpathall->{chainend}\t[$span]\t|\t";
341 | foreach my $n (@{$bestpathall->{nodes}})
342 | {
343 | print "\t<$n$dir>";
344 | }
345 | print "\n";
346 |
347 | foreach my $n (@{$bestpathall->{nodes}})
348 | {
349 | my $rstart = $salign{$dir}->[$n]->{rstart};
350 | my $rend = $salign{$dir}->[$n]->{rend};
351 | my $qstart = $salign{$dir}->[$n]->{qstart};
352 | my $qend = $salign{$dir}->[$n]->{qend};
353 | print "\t<$n$dir>\t$rstart\t$rend\t|\t$qstart\t$qend\n";
354 | }
355 |
356 | print "\n\n";
357 | }
358 | }
359 |
360 | if (defined $bestpathall)
361 | {
362 | my $chain;
363 | $chain->{rstart} = $bestpathall->{chainstart};
364 | $chain->{rend} = $bestpathall->{chainend};
365 | $chain->{qid} = $qid;
366 |
367 | push @{$contigs{$ctg}->{chain}}, $chain;
368 | }
369 | }
370 | }
371 | }
372 |
373 | my $constructtimep = sprintf("%d", $constructtime);
374 | my $searchtimep = sprintf("%d", $searchtime);
375 |
376 | print STDERR "Found $totaledges total edges [$constructtimep constructtime, $searchtimep searchtime, $stackadd stackadd]\n";
377 |
378 |
379 |
380 | ## Look for jointly contained contigs
381 | ###############################################################################
382 |
383 | print STDERR "Looking for contained contigs...\n";
384 |
385 | my $jointcontained = 0;
386 |
387 | ## process from smallest to biggest, so that bigger contigs are preferred to be kept
388 | foreach my $ctg (sort {$contigs{$a}->{len} <=> $contigs{$b}->{len}} keys %contigs)
389 | {
390 | if (exists $contigs{$ctg}->{chain})
391 | {
392 | my $clen = $contigs{$ctg}->{len};
393 |
394 | my %octgs;
395 | my $mappedbp = 0;
396 | my $lastend = -1;
397 |
398 | ## Plane sweep to find non-redundant mapped bases
399 |
400 | foreach my $ainfo (sort {$a->{rstart} <=> $b->{rstart}} @{$contigs{$ctg}->{chain}})
401 | {
402 | ## Skip chains whose query contig has already been marked as contained
403 | next if (exists $contigs{$ainfo->{qid}}->{contained});
404 |
405 | my $mstart = $ainfo->{rstart};
406 | if ($lastend >= $mstart) { $mstart = $lastend + 1; }
407 |
408 | if ($ainfo->{rend} >= $mstart)
409 | {
410 | my $newmap = $ainfo->{rend} - $mstart + 1; ## chain coordinates are inclusive
411 | $mappedbp += $newmap;
412 | $lastend = $ainfo->{rend};
413 | $octgs{$ainfo->{qid}} += $newmap;
414 | }
415 | }
416 |
417 |
418 | ## If at least $MIN_PERC_COV percent of this contig is covered by chains, mark it as contained
419 | my $pcov = sprintf("%0.02f", 100.0 * $mappedbp / $clen);
420 | print "# $ctg [$clen] $pcov :";
421 |
422 | if ($pcov >= $MIN_PERC_COV)
423 | {
424 | $jointcontained++;
425 |
426 | foreach my $oid (sort {$octgs{$b} <=> $octgs{$a}} keys %octgs)
427 | {
428 | my $olen = $contigs{$oid}->{len};
429 | my $omap = $octgs{$oid};
430 | print " $oid [$omap $olen]";
431 |
432 | push @{$contigs{$ctg}->{contained}}, $oid;
433 | }
434 | }
435 |
436 | print "\n";
437 | }
438 | }
439 |
440 | print STDERR "Found $jointcontained joint contained contigs\n";
441 |
442 |
443 |
444 | ## Print final results
445 | ###############################################################################
446 |
447 | my $cnt = 0;
448 | foreach my $ctg (sort keys %contigs)
449 | {
450 | if (exists $contigs{$ctg}->{contained})
451 | {
452 | $cnt++;
453 | my $clen = $contigs{$ctg}->{len};
454 | print "$cnt $ctg [$clen] :";
455 | foreach my $parent (@{$contigs{$ctg}->{contained}})
456 | {
457 | my $plen = $contigs{$parent}->{len};
458 | print " $parent [$plen]";
459 | }
460 |
461 | print "\n";
462 | }
463 | }
464 |
465 | print STDERR "Printed $cnt total contained contigs\n";
466 |
467 |
468 |
--------------------------------------------------------------------------------
/test/run_test.output:
--------------------------------------------------------------------------------
1 | Running the simple example
2 | ===============================================================
3 | Generating pseudohaploid genome sequence
4 | ----------------------------------------
5 | GENOME: simple.fa
6 | OUTPREFIX: ph.simple
7 | MIN_IDENTITY: 90
8 | MIN_LENGTH: 1000
9 | MIN_CONTAIN: 93
10 | MAX_CHAIN_GAP: 20000
11 |
12 | 1. Aligning simple.fa to itself with nucmer
13 | Original assembly has 2 contigs
14 |
15 | 2. Filter for alignments longer than 1000 bp and below 90 identity
16 |
17 | 3. Generating coords file
18 |
19 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
20 | Processing coords file (ph.simple.filter.coords)...
21 | Processed 6 alignment records [4 valid]
22 | Finding chains for 2 contigs...
23 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd]
24 | Looking for contained contigs...
25 | Found 1 joint contained contigs
26 | Printed 1 total contained contigs
27 |
28 | 5. Generating a list of redundant contig ids using min_contain: 93
29 | Identified 1 redundant contig to remove in ph.simple.contained.ids
30 |
31 | 6. Creating final pseudohaploid assembly in ph.simple.pseudohap.fa
32 | Pseudohaploid assembly has 1 contigs
33 |
34 | This should report: Pseudohaploid assembly has 1 contigs
35 | ===============================================================
36 |
37 |
38 | Running the basic example
39 | ===============================================================
40 | Generating pseudohaploid genome sequence
41 | ----------------------------------------
42 | GENOME: basic.fa
43 | OUTPREFIX: ph.basic
44 | MIN_IDENTITY: 90
45 | MIN_LENGTH: 1000
46 | MIN_CONTAIN: 93
47 | MAX_CHAIN_GAP: 20000
48 |
49 | 1. Aligning basic.fa to itself with nucmer
50 | Original assembly has 4 contigs
51 |
52 | 2. Filter for alignments longer than 1000 bp and below 90 identity
53 |
54 | 3. Generating coords file
55 |
56 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
57 | Processing coords file (ph.basic.filter.coords)...
58 | Processed 8 alignment records [4 valid]
59 | Finding chains for 4 contigs...
60 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd]
61 | Looking for contained contigs...
62 | Found 1 joint contained contigs
63 | Printed 1 total contained contigs
64 |
65 | 5. Generating a list of redundant contig ids using min_contain: 93
66 | Identified 1 redundant contig to remove in ph.basic.contained.ids
67 |
68 | 6. Creating final pseudohaploid assembly in ph.basic.pseudohap.fa
69 | Pseudohaploid assembly has 3 contigs
70 |
71 | This should report: Pseudohaploid assembly has 3 contigs
72 | ===============================================================
73 |
--------------------------------------------------------------------------------
/test/run_tests.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | echo "Running the simple example"
4 | echo "==============================================================="
5 | mkdir -p simple
6 | cd simple
7 | ln -sf ../simple.fa
8 | ../../create_pseudohaploid.sh simple.fa ph.simple
9 | cd ..
10 |
11 | echo
12 | echo "This should report: Pseudohaploid assembly has 1 contigs"
13 |
14 | echo "==============================================================="
15 |
16 | echo
17 | echo
18 |
19 |
20 | echo "Running the basic example"
21 | echo "==============================================================="
22 | mkdir -p basic
23 | cd basic
24 | ln -sf ../basic.fa
25 | ../../create_pseudohaploid.sh basic.fa ph.basic
26 | cd ..
27 |
28 | echo
29 | echo "This should report: Pseudohaploid assembly has 3 contigs"
30 | echo "==============================================================="
31 |
--------------------------------------------------------------------------------
/test/simple.fa:
--------------------------------------------------------------------------------
1 | >ctg1
2 | AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
3 | TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
4 | TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
5 | ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
6 | CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
7 | GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
8 | AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
9 | AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
10 | GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT
11 | GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC
12 | TGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGT
13 | TACTGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCT
14 | GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGATCACATGGTGCTGATGGCAGGTTTCACCG
15 | CCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGACTACTCTGCTGCGGTGCTGGC
16 | TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGT
17 | CAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG
18 | CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATAC
19 | CGGAAATCCTCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGC
20 | ATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGG
21 | CGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGA
22 | ATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTC
23 | TACCTGGAACTGAAAGAAGGCTTACTGGAGCCGCTGGCAGTGACGGAACGGCTGGCCATTATCTCGGTGG
24 | TAGGTGATGGTATGCGCACCTTGCGTGGGATCTCGGCGAAATTCTTTGCCGCACTGGCCCGCGCCAATAT
25 | CAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTAAATAACGATGATGCG
26 | ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTG
27 | GCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGCTGAAGAATAAACA
28 | TATCGACTTACGTGTCTGCGGTGTTGCCAACTCGAAGGCTCTGCTCACCAATGTACATGGCCTTAATCTG
29 | GAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGA
30 | AAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGCACTTCCAGCCAGGCAGTGGCGGATCAATATGC
31 | CGACTTCCTGCGCGAAGGTTTCCACGTTGTCACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTAC
32 | TACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTG
33 | GATTACCGGTTATTGAGAACCTGCAAAATCTGCTCAATGCAGGTGATGAATTGATGAAGTTCTCCGGCAT
34 | TCTTTCTGGTTCGCTTTCTTATATCTTCGGCAAGTTAGACGAAGGCATGAGTTTCTCCGAGGCGACCACG
35 | CTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTA
36 | AACTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCT
37 | GCCCGCAGAGTTTAACGCCGAGGGTGATGTTGCCGCTTTTATGGCGAATCTGTCACAACTCGACGATCTC
38 | TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG
39 | ATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGA
40 | AAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGC
41 | AATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGAC
42 | ATGGTTAAAGTTTATGCCCCGGCTTCCAGTGCCAATATGAGCGTCGGGTTTGATGTGCTCGGGGCGGCGG
43 | TGACACCTGTTGATGGTGCATTGCTCGGAGATGTAGTCACGGTTGAGGCGGCAGAGACATTCAGTCTCAA
44 | CAACCTCGGACGCTTTGCCGATAAGCTGCCGTCAGAACCACGGGAAAATATCGTTTATCAGTGCTGGGAG
45 | CGTTTTTGCCAGGAACTGGGTAAGCAAATTCCAGTGGCGATGACCCTGGAAAAGAATATGCCGATCGGTT
46 | CGGGCTTAGGCTCCAGTGCCTGTTCGGTGGTCGCGGCGCTGATGGCGATGAATGAACACTGCGGCAAGCC
47 | GCTTAATGACACTCGTTTGCTGGCTTTGATGGGCGAGCTGGAAGGCCGTATCTCCGGCAGCATTCATTAC
48 | GACAACGTGGCACCGTGTTTTCTCGGTGGTATGCAGTTGATGATCGAAGAAAACGACATCATCAGCCAGC
49 | AAGTGCCAGGGTTTGATGAGTGGCTGTGGGTGCTGGCGTATCCGGGGATTAAAGTCTCGACGGCAGAAGC
50 | CAGGGCTATTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCACGGGCGACATCTGGCAGGCTTC
51 | ATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCT
52 | ACCGTGAACGGTTACTGCCAGGCTTCCGGCAGGCGCGGCAGGCGGTCGCGGAAATCGGCGCGGTAGCGAG
53 | CGGTATCTCCGGCTCCGGCCCGACCTTGTTCGCTCTGTGTGACAAGCCGGAAACCGCCCAGCGCGTTGCC
54 | GACTGGTTGGGTAAGAACTACCTGCAAAATCAGGAAGGTTTTGTTCATATTTGCCGGCTGGATACGGCGG
55 | GCGCACGAGTACTGGAAAACTAAATGAAACTCTACAATCTGAAAGATCACAACGAGCAGGTCAGCTTTGC
56 | GCAAGCCGTAACCCAGGGGTTGGGCAAAAATCAGGGGCTGTTTTTTCCGCACGACCTGCCGGAATTCAGC
57 | CTGACTGAAATTGATGAGATGCTGAAGCTGGATTTTGTCACCCGCAGTGCGAAGATCCTCTCGGCGTTTA
58 | TTGGTGATGAAATCCCACAGGAAATCCTGGAAGAGCGCGTGCGCGCGGCGTTTGCCTTCCCGGCTCCGGT
59 | CGCCAATGTTGAAAGCGATGTCGGTTGTCTGGAATTGTTCCACGGGCCAACGCTGGCATTTAAAGATTTC
60 | GGCGGTCGCTTTATGGCACAAATGCTGACCCATATTGCGGGTGATAAGCCAGTGACCATTCTGACCGCGA
61 | CCTCCGGTGATACCGGAGCGGCAGTGGCTCATGCTTTCTACGGTTTACCGAATGTGAAAGTGGTTATCCT
62 | CTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTGTTCTGTACATTGGGCGGCAATATCGAAACT
63 | GTTGCCATCGACGGCGATTTCGATGCCTGTCAGGCGCTGGTGAAGCAGGCGTTTGATGATGAAGAACTGA
64 | AAGTGGCGCTAGGGTTAAACTCGGCTAACTCGATTAACATCAGCCGTTTGCTGGCGCAGATTTGCTACTA
65 | CTTTGAAGCTGTTGCGCAGCTGCCGCAGGAGACGCGCAACCAGCTGGTTGTCTCGGTGCCAAGCGGAAAC
66 | TTCGGCGATTTGACGGCGGGTCTGCTGGCGAAGTCACTCGGTCTGCCGGTGAAACGTTTTATTGCTGCGA
67 | CCAACGTGAACGATACCGTGCCACGTTTCCTGCACGACGGTCAGTGGTCACCCAAAGCGACTCAGGCGAC
68 | GTTATCCAACGCGATGGACGTGAGTCAGCCGAACAACTGGCCGCGTGTGGAAGAGTTGTTCCGCCGCAAA
69 | ATCTGGCAACTGAAAGAGCTGGGTTATGCAGCCGTGGATGATGAAACCACGCAACAGACAATGCGTGAGT
70 | TAAAAGAACTGGGCTACACTTCGGAGCCGCACGCTGCCGTAGCTTATCGTGCGCTGCGTGATCAGTTGAA
71 | TCCAGGCGAATATGGCTTGTTCCTCGGCACCGCGCATCCGGCGAAATTTAAAGAGAGCGTGGAAGCGATT
72 | CTCGGTGAAACGTTGGATCTGCCAAAAGAGCTGGCAGAACGTGCTGATTTACCCTTGCTTTCACATAATC
73 | TGCCCGCCGATTTTGCTGCGTTGCGTAAATTGATGATGAATCATCAGTAAAATCTATTCATTATCTCAAT
74 | CAGGCCGGGTTTGCTTTTATGCAGCCCGGCTTTTTTATGAAGAAATTATGGAGAAAAATGACAGGGAAAA
75 | AGGAGAAATTCTCAATAAATGCGGTAACTTAGAGATTAGGATTGCGGAGAATAACAACCGCCGTTCTCAT
76 | CGAGTAATCTCCGGATATCGACCCATAACGGGCAATGATAAAAGGAGTAACCTGTGAAAAAGATGCAATC
77 | TATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTC
78 | CCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCG
79 | ACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACC
80 | GCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA
81 | ATGACAAATGCCGGGTAACAATCCGGCATTCAGCGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTAC
82 | GTTAATTCTGCAATATATTGAATCTGCATGCTTTTGTAGGCAGGATAAGGCGTTCACGCCGCATCCGGCA
83 | TTGACTGCAAACTTAACGCTGCTCGTAGCGTTTAAACACCAGTTCGCCATTGCTGGAGGAATCTTCATCA
84 | AAGAAGTAACCTTCGCTATTAAAACCAGTCAGTTGCTCTGGTTTGGTCAGCCGATTTTCAATAATGAAAC
85 | GACTCATCAGACCGCGTGCTTTCTTAGCGTAGAAGCTGATGATCTTAAATTTGCCGTTCTTCTCATCGAG
86 | GAACACCGGCTTGATAATCTCGGCATTCAATTTCTTCGGCTTCACCGATTTAAAATACTCATCTGACGCC
87 | AGATTAATCACCACATTATCGCCTTGTGCTGCGAGCGCCTCGTTCAGCTTGTTGGTGATGATATCTCCCC
88 | AGAATTGATACAGATCTTTCCCTCGGGCATTCTCAAGACGGATCCCCATTTCCAGACGATAAGGCTGCAT
89 | TAAATCGAGCGGGCGGAGTACGCCATACAAGCCGGAAAGCATTCGCAAATGCTGTTGGGCAAAATCGAAA
90 | TCGTCTTCGCTGAAGGTTTCGGCCTGCAAGCCGGTGTAGACATCACCTTTAAACGCCAGAATCGCCTGGC
91 | GGGCATTCGCCGGCGTGAAATCTGGCTGCCAGTCATGAAAGCGAGCGGCGTTGATACCCGCCAGTTTGTC
92 | GCTGATGCGCATCAGCGTGCTAATCTGCGGAGGCGTCAGTTTCCGCGCCTCATGGATCAACTGCTGGGAA
93 | TTGTCTAACAGCTCCGGCAGCGTATAGCGCGTGGTGGTCAACGGGCTTTGGTAATCAAGCGTTTTCGCAG
94 | GTGAAATAAGAATCAGCATATCCAGTCCTTGCAGGAAATTTATGCCGACTTTAGCAAAAAATGAGAATGA
95 | GTTGATCGATAGTTGTGATTACTCCTGCGAAACATCATCCCACGCGTCCGGAGAAAGCTGGCGACCGATA
96 | TCCGGATAACGCAATGGATCAAACACCGGGCGCACGCCGAGTTTACGCTGGCGTAGATAATCACTGGCAA
97 | TGGTATGAACCACAGGCGAGAGCAGTAAAATGGCGGTCAAATTGGTAATAGCCATGCAGGCCATTATGAT
98 | ATCTGCCAGTTGCCACATCAGCGGAAGGCTTAGCAAGGTGCCGCCGATGACCGTTGCGAAGGTGCAGATC
99 | CGCAAACACCAGATCGCTTTAGGGTTGTTCAGGCGTAAAAAGAAGAGATTGTTTTCGGCATAAATGTAGT
100 | TGGCAACGATGGAGCTGAAGGCAAACAGAATAACCACAAGGGTAACAAACTCAGCACCCCAGGAACCCAT
101 | TAGCACCCGCATCGCCTTCTGGATAAGCTGAATACCTTCCAGCGGCATGTAGGTTGTGCCGTTACCCGCC
102 | AGTAATATCAGCATGGCGCTTGCCGTACAGATGACCAGGGTGTCGATAAAAATGCCAATCATCTGGACAA
103 | TCCCTTGCGCTGCCGGATGCGGAGGCCAGGACGCCGCTGCCGCTGCCGCGTTTGGCGTCGAACCCATTCC
104 | CGCCTCATTGGAAAACATACTGCGCTGAAAACCGTTAGTAATCGCCTGGCTTAAGGTATATCCCGCCGCG
105 | CCGCCTGCCGCTTCCTGCCAGCCAAAAGCACTCTCAAAAATAGACCAAATGACGTGGGGAAGTTGCCCGA
106 | TATTCATTACGCAAATTACCAGGCTGGTCAGTACCCAGATTATCGCCATCAACGGGACAAAGCCCTGCAT
107 | GAGCCGGGCGACGCCATGAAGACCGCGAGTGATTGCCAGCAGAGTAAAGACAGCGAGAATAATGCCTGTC
108 | ACCAGCGGGGGAAAATCAAAAGAAAAACTCAGGGCGCGGGCAACGGCGTTCGCTTGAACTCCGCTGAAAA
109 | TTATGCCATAGGCGATGAGCAAAAAGACGGCGAACAGAACGCCCATCCAGCGCATCCCCAGCCCGCGCGC
110 | CATATACCATGCCGGTCCGCCACGAAACTGCCCATTGACGTCACGTTCTTTATAAAGTTGTGCCAGAGAA
111 | CATTCGGCAAACGAGGTCGCCATGCCGATAAACGCGGCAACCCACATCCAAAAGACGGCTCCAGGTCCAC
112 | CGGCGGTAATAGCCAGCGCAACGCCGGCCAGGTTGCCGCTACCCACGCGCGCCGCAAGACTGGTACACAA
113 | TGACTGAAATGAGGTTAAACCGCCTGGCTGTGGATGAATGCTATTTTTAAGACTTTTGCCAAACTGGCGG
114 | ATGTAGCGAAACTGCACAAATCCGGTGCGAAAAGTGAACCAACAACCTGCGCCGAAGAGCAGGTAAATCA
115 | TTACCGATCCCCAAAGGACGCTGTTAATGAAGGAGAAAAAATCTGGCATGCATATCCCTCTTATTGCCGG
116 | TCGCGATGACTTTCCTGTGTAAACGTTACCAATTGTTTAAGAAGTATATACGCTACGAGGTACTTGATAA
117 | CTTCTGCGTAGCATACATGAGGTTTTGTATAAAAATGGCGGGCGATATCAACGCAGTGTCAGAAATCCGA
118 | AACAGTCTCGCCTGGCGATAACCGTCTTGTCGGCGGTTGCGCTGACGTTGCGTCGTGATATCATCAGGGC
119 | AGACCGGTTACATCCCCCTAACAAGCTGTTTAAAGAGAAATACTATCATGACGGACAAATTGACCTCCCT
120 | TCGTCAGTACACCACCGTAGTGGCCGACACTGGGGACATCGCGGCAATGAAGCTGTATCAACCGCAGGAT
121 | GCCACAACCAACCCTTCTCTCATTCTTAACGCAGCGCAGATTCCGGAATACCGTAAGTTGATTGATGATG
122 | CTGTCGCCTGGGCGAAACAGCAGAGCAACGATCGCGCGCAGCAGATCGTGGACGCGACCGACAAACTGGC
123 | AGTAAATATTGGTCTGGAAATCCTGAAACTGGTTCCGGGCCGTATCTCAACTGAAGTTGATGCGCGTCTT
124 | TCCTATGACACCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGCTGGTATTA
125 | GCAACGATCGTATTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGCAGAACAGCTGGAAAA
126 | AGAAGGCATCAACTGTAACCTGACCCTGCTGTTCTCCTTCGCTCAGGCTCGTGCTTGTGCGGAAGCGGGC
127 | GTGTTCCTGATCTCGCCGTTTGTTGGCCGTATTCTTGACTGGTACAAAGCGAATACCGATAAGAAAGAGT
128 | ACGCTCCGGCAGAAGATCCGGGCGTGGTTTCTGTATCTGAAATCTACCAGTACTACAAAGAGCACGGTTA
129 | TGAAACCGTGGTTATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAACTGGCAGGCTGCGACCGT
130 | CTGACCATCGCACCGGCACTGCTGAAAGAGCTGGCGGAGAGCGAAGGGGCTATCGAACGTAAACTGTCTT
131 | ACACCGGCGAAGTGAAAGCGCGTCCGGCGCGTATCACTGAGTCCGAGTTCCTGTGGCAGCACAACCAGGA
132 | TCCAATGGCAGTAGATAAACTGGCGGAAGGTATCCGTAAGTTTGCTATTGACCAGGAAAAACTGGAAAAA
133 | ATGATCGGCGATCTGCTGTAATCATTCTTAGCGTGACCGGGAAGTCGGTCACGCTACCTCTTCTGAAGCC
134 | TGTCTGTCACTCCCTTCGCAGTGTATCATTCTGTTTAACGAGACTGTTTAAACGGAAAAATCTTGATGAA
135 | TACTTTACGTATTGGCTTAGTTTCCATCTCTGATCGCGCATCCAGCGGCGTTTATCAGGATAAAGGCATC
136 | CCTGCGCTGGAAGAATGGCTGACATCGGCGCTAACCACGCCGTTTGAACTGGAAACCCGCTTAATCCCCG
137 | ATGAGCAGGCGATCATCGAGCAAACGTTGTGTGAGCTGGTGGATGAAATGAGTTGCCATCTGGTGCTCAC
138 | CACGGGCGGAACTGGCCCGGCGCGTCGTGACGTAACGCCCGATGCGACGCTGGCAGTAGCGGACCGCGAG
139 | ATGCCTGGCTTTGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACCAACTGCGATCCTTTCGCGTC
140 | AGGTGGGCGTGATTCGCAAACAGGCGCTGATCCTTAACTTACCCGGTCAGCCGAAGTCTATTAAAGAGAC
141 | GCTGGAAGGTGTGAAGGACGCTGAGGGTAACGTTGTGGTACACGGTATTTTTGCCAGCGTACCGTACTGC
142 | ATTCAGTTGCTGGAAGGGCCATACGTTGAAACGGCACCGGAAGTGGTTGCAGCATTCAGACCGAAGAGTG
143 | CAAGACGCGACGTTAGCGAATAAAAAAATCCCCCCGAGCGGGGGGATCTCAAAACAATTAGTGGGATTCA
144 | CCAATCGGCAGAACGGTGCGACCAAACTGCTCGTTCAGTACTTCACCCATCGCCAGATAGATTGCGCTGG
145 | CACCGCAGATCAGCCCAATCCAGCCGGCAAAGTGGATGATTGCGGCGTTACCGGCAATGTTACCGATCGC
146 | CAGCAGGGCAAACAGCACGGTCAGGCTAAAGAAAACGAATTGCAGAACGCGTGCGCCTTTCAGCGTGCCG
147 | AAGAACATAAACAGCGTAAATACGCCCCACAGACCCAGGTAGACACCAAGGAACTGTGCATTTGGCGCAT
148 | CGGTCAGACCCAGTTTCGGCATCAGCAGAATCGCAACCAGCGTCAGCCAGAAAGAACCGTAAGAGGTGAA
149 | TGCGGTTAAACCGAAAGTGTTGCCTTTTTTGTACTCCAGCAGACCAGCAAAAATTTGCGCGATGCCGCCG
150 | TAGAAAATGCCCATGGCAAGAATAATACCGTCCAGAGCGAAATAACCCACGTTGTGCAGGTTAAGCAGAA
151 | TGGTGGTCATGCCGAAGCCCATCAGGCCCAGCGGTGCCGGATTAGCCAACTTAGTGTTGCCCATAATTCC
152 | TCAAAAATCATCATCGAATGAATGGTGAAATAATTTCCCTGAATAACTGTAGTGTTTTCAGGGCGCGGCA
153 | TAATAATCAGCCAGTGGGGCAGTGTCTACGATCTTTTGAGGGGAAAATGAAAATTTTCCCCGGTTTCCGG
154 | TATCAGACCTGAGTGGCGCTAACCATCCGGCGCAGGCAGGCGATTTGCAGTACGGCTGGAATCGTCACGC
155 | GATAGGCGCTGCCGCTGACCGCTTTAACCCCATTTAGTGCCGCACCTACAGGGCCTCCCAGCCCCGCGCC
156 | GCGCAGCAAACCATGCCCAAGTACGCTCATTGCTGCGTGGGTGCGTAAAATGCGGGTCAGTTGGCTGGAA
157 | AGCAAATGCGACACACCTTTTGCCAATAATTTGTCTTTCATCAGCAGCGGCAGCAGCTCTTCCAGCTCAT
158 | TCACCCTGGCATCGACCGCGTGCAGAAACTCCTGCTTATGTTCCTCGTCCATTTTCTTCCAGGTATTACG
159 | CAGAAATTGTTCCAGTAACTGTTGCTCAATTTCAAACGTAGACATCTCTTTGTCGGCTTTCAGCTTCAAT
160 | CGCTTTGAAACATCGAGCAAAATGGCCCGATACAATTTACCGTGTCCGCGCAGTTTGTTGGCGATACTAT
161 | CGCCACCAAAATGCTGTAATTCTCCGGCAATCAGCTGCCAGTTGCGGCGATGTTGCTCGGGATGCCCTTC
162 | CATCGATTTAAACAGTTCGTTGCGCATCAGTACGCTGGAGAGGCGAGTTTTGCCTTTTTCATTATGGGTG
163 | AGCAATCGGGCGAAATTTGCCAACTGTTCCTCACTACAATGCTGAAGAAAATCCAGATCTGAATCATTCA
164 | GGTAATTAACATTCATTTTTTGTGGCTTCTATATTCTGGCGTTAGTCGTCGCCGATAATTTTCAGCGTGG
165 | CCATATCCGATGAGTTCACCGTATGACCCGAAAAGGTGATTTTTGAGACGCAGCGTTTATTGTCGTTATC
166 | GCTGTTAATGTTGATCCAGTCAGTGGTTTGCCCTTCTTTTATTTCTGAAGGAATATTCAGGCTCTGACTG
167 | GCGCTACGGGCGGCTTTGAAATAAACCGATGCACCGCTTAACTGTAAATCGCCATGGTCGGCAGAGAGTT
168 | GTATGCGTTTCACAATGCGACAAACAGGAAGTTTCAGCGCCAGATCGTTGGTTTCGTTACGCGGCATTGC
169 | AATGGCGCCGAGGAGTTTATGGTCGTTTGCCTGCGCCGTGCAGCACAGCATCAGGCTAATCGCCAGGCTG
170 | GCGGAAATCGTAAAAACGGATTTCATAAGGATTCTCTTAGTGGGAAGAGGTAGGGGGATGAATACCCACT
171 | AGTTTACTGCTGATAAAGAGAAGATTCAGGCACGTAATCTTTTCTTTTTATTACAATTTTTTGATGAATG
172 | CCTTGGCTGCGATTCATTCTTTATATGAATAAAATTGCTGTCAATTTTACGTCTTGTCCTGCCATATCGC
173 | GAAATTTCTGCGCAAAAGCACAAAAAATTTTTGCATCTCCCCCTTGATGACGTGGTTTACGACCCCATTT
174 | AGTAGTCAACCGCAGTGAGTGAGTCTGCAAAAAAATGAAATTGGGCAGTTGAAACCAGACGTTTCGCCCC
175 | TATTACAGACTCACAACCACATGATGACCGAATATATAGTGGAGACGTTTAGATGGGTAAAATAATTGGT
176 | ATCGACCTGGGTACTACCAACTCTTGTGTAGCGATTATGGATGGCACCACTCCTCGCGTGCTGGAGAACG
177 | CCGAAGGCGATCGCACCACGCCTTCTATCATTGCCTATACCCAGGATGGTGAAACTCTAGTTGGTCAGCC
178 | GGCTAAACGTCAGGCAGTGACGAACCCGCAAAACACTCTGTTTGCGATTAAACGCCTGATTGGTCGCCGC
179 | TTCCAGGACGAAGAAGTACAGCGTGATGTTTCCATCATGCCGTTCAAAATTATTGCTGCTGATAACGGCG
180 | ACGCATGGGTCGAAGTTAAAGGCCAGAAAATGGCACCGCCGCAGATTTCTGCTGAAGTGCTGAAAAAAAT
181 | GAAGAAAACCGCTGAAGATTACCTGGGTGAACCGGTAACTGAAGCTGTTATCACCGTACCGGCATACTTT
182 | AACGATGCTCAGCGTCAGGCAACCAAAGACGCAGGCCGTATCGCTGGTCTGGAAGTAAAACGTATCATCA
183 | ACGAACCGACCGCAGCTGCGCTGGCTTACGGTCTGGACAAAGGCACTGGCAACCGTACTATCGCGGTTTA
184 | TGACCTGGGTGGTGGTACTTTCGATATTTCTATTATCGAAATCGACGAAGTTGACGGCGAAAAAACCTTC
185 | GAAGTTCTGGCAACCAACGGTGATACCCACCTGGGGGGTGAAGACTTCGACAGCCGTCTGATCAACTATC
186 | TGGTTGAAGAATTCAAGAAAGATCAGGGCATTGACCTGCGCAACGATCCGCTGGCAATGCAGCGCCTGAA
187 | AGAAGCGGCAGAAAAAGCGAAAATCGAACTGTCTTCCGCTCAGCAGACCGACGTTAACCTGCCATACATC
188 | ACTGCAGACGCGACCGGTCCGAAACACATGAACATCAAAGTGACTCGTGCGAAACTGGAAAGCCTGGTTG
189 | AAGATCTGGTAAACCGTTCCATTGAGCCGCTGAAAGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGA
190 | TATCGACGACGTTATCCTCGTTGGTGGTCAGACTCGTATGCCAATGGTTCAGAAGAAAGTTGCTGAGTTC
191 | TTTGGTAAAGAGCCGCGTAAAGACGTTAACCCGGACGAAGCTGTAGCAATCGGTGCTGCTGTTCAGGGTG
192 | GTGTTCTGACTGGTGACGTAAAAGACGTACTGCTGCTGGACGTTACCCCGCTGTCTCTGGGTATCGAAAC
193 | CATGGGCGGTGTGATGACGACGCTGATCGCGAAAAACACCACTATCCCGACCAAGCACAGCCAGGTGTTC
194 | TCTACCGCTGAAGACAACCAGTCTGCGGTAACCATCCATGTGCTGCAGGGTGAACGTAAACGTGCGGCTG
195 | ATAACAAATCTCTGGGTCAGTTCAACCTAGATGGTATCAACCCGGCACCGCGCGGCATGCCGCAGATCGA
196 | AGTTACCTTCGATATCGATGCTGACGGTATCCTGCACGTTTCCGCGAAAGATAAAAACAGCGGTAAAGAG
197 | CAGAAGATCACCATCAAGGCTTCTTCTGGTCTGAACGAAGATGAAATCCAGAAAATGGTACGCGACGCAG
198 | AAGCTAACGCCGAAGCTGACCGTAAGTTTGAAGAGCTGGTACAGACTCGCAACCAGGGCGACCATCTGCT
199 | GCACAGCACCCGTAAGCAGGTTGAAGAAGCAGGCGACAAACTGCCGGCTGACGACAAAACTGCTATCGAG
200 | TCTGCGCTGACTGCACTGGAAACTGCTCTGAAAGGTGAAGACAAAGCCGCTATCGAAGCGAAAATGCAGG
201 | >ctg2
202 | CTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTA
203 | AACTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCT
204 | GCCCGCAGAGTTTAACGCCGAGGGTGATGTTGCCGCTTTTATGGCGAATCTGTCACAACTCGACGATCTC
205 | TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG
206 | ATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGA
207 | AAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGC
208 | AATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGAC
209 | ATGGTTAAAGTTTATGCCCCGGCTTCCAGTGCCAATATGAGCGTCGGGTTTGATGTGCTCGGGGCGGCGG
210 | TGACACCTGTTGATGGTGCATTGCTCGGAGATGTAGTCACGGTTGAGGCGGCAGAGACATTCAGTCTCAA
211 | CAACCTCGGACGCTTTGCCGATAAGCTGCCGTCAGAACCACGGGAAAATATCGTTTATCAGTGCTGGGAG
212 | CGTTTTTGCCAGGAACTGGGTAAGCAAATTCCAGTGGCGATGACCCTGGAAAAGAATATGCCGATCGGTT
213 | CGGGCTTAGGCTCCAGTGCCTGTTCGGTGGTCGCGGCGCTGATGGCGATGAATGAACACTGCGGCAAGCC
214 | GCTTAATGACACTCGTTTGCTGGCTTTGATGGGCGAGCTGGAAGGCCGTATCTCCGGCAGCATTCATTAC
215 | GACAACGTGGCACCGTGTTTTCTCGGTGGTATGCAGTTGATGATCGAAGAAAACGACATCATCAGCCAGC
216 | AAGTGCCAGGGTTTGATGAGTGGCTGTGGGTGCTGGCGTATCCGGGGATTAAAGTCTCGACGGCAGAAGC
217 | CAGGGCTATTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCACGGGCGACATCTGGCAGGCTTC
218 | ATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCT
219 | ACCGTGAACGGTTACTGCCAGGCTTCCGGCAGGCGCGGCAGGCGGTCGCGGAAATCGGCGCGGTAGCGAG
220 | CGGTATCTCCGGCTCCGGCCCGACCTTGTTCGCTCTGTGTGACAAGCCGGAAACCGCCCAGCGCGTTGCC
221 | GACTGGTTGGGTAAGAACTACCTGCAAAATCAGGAAGGTTTTGTTCATATTTGCCGGCTGGATACGGCGG
222 | GCGCACGAGTACTGGAAAACTAAATGAAACTCTACAATCTGAAAGATCACAACGAGCAGGTCAGCTTTGC
223 | GCAAGCCGTAACCCAGGGGTTGGGCAAAAATCAGGGGCTGTTTTTTCCGCACGACCTGCCGGAATTCAGC
224 | CTGACTGAAATTGATGAGATGCTGAAGCTGGATTTTGTCACCCGCAGTGCGAAGATCCTCTCGGCGTTTA
225 | TTGGTGATGAAATCCCACAGGAAATCCTGGAAGAGCGCGTGCGCGCGGCGTTTGCCTTCCCGGCTCCGGT
226 | CGCCAATGTTGAAAGCGATGTCGGTTGTCTGGAATTGTTCCACGGGCCAACGCTGGCATTTAAAGATTTC
227 | GGCGGTCGCTTTATGGCACAAATGCTGACCCATATTGCGGGTGATAAGCCAGTGACCATTCTGACCGCGA
228 | CCTCCGGTGATACCGGAGCGGCAGTGGCTCATGCTTTCTACGGTTTACCGAATGTGAAAGTGGTTATCCT
229 | CTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTGTTCTGTACATTGGGCGGCAATATCGAAACT
230 | GTTGCCATCGACGGCGATTTCGATGCCTGTCAGGCGCTGGTGAAGCAGGCGTTTGATGATGAAGAACTGA
231 | AAGTGGCGCTAGGGTTAAACTCGGCTAACTCGATTAACATCAGCCGTTTGCTGGCGCAGATTTGCTACTA
232 | CTTTGAAGCTGTTGCGCAGCTGCCGCAGGAGACGCGCAACCAGCTGGTTGTCTCGGTGCCAAGCGGAAAC
233 | TTCGGCGATTTGACGGCGGGTCTGCTGGCGAAGTCACTCGGTCTGCCGGTGAAACGTTTTATTGCTGCGA
234 | CCAACGTGAACGATACCGTGCCACGTTTCCTGCACGACGGTCAGTGGTCACCCAAAGCGACTCAGGCGAC
235 | GTTATCCAACGCGATGGACGTGAGTCAGCCGAACAACTGGCCGCGTGTGGAAGAGTTGTTCCGCCGCAAA
236 | ATCTGGCAACTGAAAGAGCTGGGTTATGCAGCCGTGGATGATGAAACCACGCAACAGACAATGCGTGAGT
237 | TAAAAGAACTGGGCTACACTTCGGAGCCGCACGCTGCCGTAGCTTATCGTGCGCTGCGTGATCAGTTGAA
238 | TCCAGGCGAATATGGCTTGTTCCTCGGCACCGCGCATCCGGCGAAATTTAAAGAGAGCGTGGAAGCGATT
239 | CTCGGTGAAACGTTGGATCTGCCAAAAGAGCTGGCAGAACGTGCTGATTTACCCTTGCTTTCACATAATC
240 | TGCCCGCCGATTTTGCTGCGTTGCGTAAATTGATGATGAATCATCAGTAAAATCTATTCATTATCTCAAT
241 | CAGGCCGGGTTTGCTTTTATGCAGCCCGGCTTTTTTATGAAGAAATTATGGAGAAAAATGACAGGGAAAA
242 | AGGAGAAATTCTCAATAAATGCGGTAACTTAGAGATTAGGATTGCGGAGAATAACAACCGCCGTTCTCAT
243 | CGAGTAATCTCCGGATATCGACCCATAACGGGCAATGATAAAAGGAGTAACCTGTGAAAAAGATGCAATC
244 | TATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTC
245 | CCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCG
246 | ACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACC
247 | GCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA
248 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
249 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
250 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
251 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
252 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
253 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
254 | ATGACAAATGCCGGGTAACAATCCGGCATTCAGCGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTAC
255 | GTTAATTCTGCAATATATTGAATCTGCATGCTTTTGTAGGCAGGATAAGGCGTTCACGCCGCATCCGGCA
256 | TTGACTGCAAACTTAACGCTGCTCGTAGCGTTTAAACACCAGTTCGCCATTGCTGGAGGAATCTTCATCA
257 | AAGAAGTAACCTTCGCTATTAAAACCAGTCAGTTGCTCTGGTTTGGTCAGCCGATTTTCAATAATGAAAC
258 | GACTCATCAGACCGCGTGCTTTCTTAGCGTAGAAGCTGATGATCTTAAATTTGCCGTTCTTCTCATCGAG
259 | GAACACCGGCTTGATAATCTCGGCATTCAATTTCTTCGGCTTCACCGATTTAAAATACTCATCTGACGCC
260 | AGATTAATCACCACATTATCGCCTTGTGCTGCGAGCGCCTCGTTCAGCTTGTTGGTGATGATATCTCCCC
261 | AGAATTGATACAGATCTTTCCCTCGGGCATTCTCAAGACGGATCCCCATTTCCAGACGATAAGGCTGCAT
262 | TAAATCGAGCGGGCGGAGTACGCCATACAAGCCGGAAAGCATTCGCAAATGCTGTTGGGCAAAATCGAAA
263 | TCGTCTTCGCTGAAGGTTTCGGCCTGCAAGCCGGTGTAGACATCACCTTTAAACGCCAGAATCGCCTGGC
264 | GGGCATTCGCCGGCGTGAAATCTGGCTGCCAGTCATGAAAGCGAGCGGCGTTGATACCCGCCAGTTTGTC
265 | GCTGATGCGCATCAGCGTGCTAATCTGCGGAGGCGTCAGTTTCCGCGCCTCATGGATCAACTGCTGGGAA
266 | TTGTCTAACAGCTCCGGCAGCGTATAGCGCGTGGTGGTCAACGGGCTTTGGTAATCAAGCGTTTTCGCAG
267 | GTGAAATAAGAATCAGCATATCCAGTCCTTGCAGGAAATTTATGCCGACTTTAGCAAAAAATGAGAATGA
268 | GTTGATCGATAGTTGTGATTACTCCTGCGAAACATCATCCCACGCGTCCGGAGAAAGCTGGCGACCGATA
269 | TCCGGATAACGCAATGGATCAAACACCGGGCGCACGCCGAGTTTACGCTGGCGTAGATAATCACTGGCAA
270 | TGGTATGAACCACAGGCGAGAGCAGTAAAATGGCGGTCAAATTGGTAATAGCCATGCAGGCCATTATGAT
271 | ATCTGCCAGTTGCCACATCAGCGGAAGGCTTAGCAAGGTGCCGCCGATGACCGTTGCGAAGGTGCAGATC
272 | CGCAAACACCAGATCGCTTTAGGGTTGTTCAGGCGTAAAAAGAAGAGATTGTTTTCGGCATAAATGTAGT
273 | TGGCAACGATGGAGCTGAAGGCAAACAGAATAACCACAAGGGTAACAAACTCAGCACCCCAGGAACCCAT
274 | TAGCACCCGCATCGCCTTCTGGATAAGCTGAATACCTTCCAGCGGCATGTAGGTTGTGCCGTTACCCGCC
275 | AGTAATATCAGCATGGCGCTTGCCGTACAGATGACCAGGGTGTCGATAAAAATGCCAATCATCTGGACAA
276 | TCCCTTGCGCTGCCGGATGCGGAGGCCAGGACGCCGCTGCCGCTGCCGCGTTTGGCGTCGAACCCATTCC
277 | CGCCTCATTGGAAAACATACTGCGCTGAAAACCGTTAGTAATCGCCTGGCTTAAGGTATATCCCGCCGCG
278 | CCGCCTGCCGCTTCCTGCCAGCCAAAAGCACTCTCAAAAATAGACCAAATGACGTGGGGAAGTTGCCCGA
279 | TATTCATTACGCAAATTACCAGGCTGGTCAGTACCCAGATTATCGCCATCAACGGGACAAAGCCCTGCAT
280 | GAGCCGGGCGACGCCATGAAGACCGCGAGTGATTGCCAGCAGAGTAAAGACAGCGAGAATAATGCCTGTC
281 | ACCAGCGGGGGAAAATCAAAAGAAAAACTCAGGGCGCGGGCAACGGCGTTCGCTTGAACTCCGCTGAAAA
282 | TTATGCCATAGGCGATGAGCAAAAAGACGGCGAACAGAACGCCCATCCAGCGCATCCCCAGCCCGCGCGC
283 | CATATACCATGCCGGTCCGCCACGAAACTGCCCATTGACGTCACGTTCTTTATAAAGTTGTGCCAGAGAA
284 | CATTCGGCAAACGAGGTCGCCATGCCGATAAACGCGGCAACCCACATCCAAAAGACGGCTCCAGGTCCAC
285 | CGGCGGTAATAGCCAGCGCAACGCCGGCCAGGTTGCCGCTACCCACGCGCGCCGCAAGACTGGTACACAA
286 | TGACTGAAATGAGGTTAAACCGCCTGGCTGTGGATGAATGCTATTTTTAAGACTTTTGCCAAACTGGCGG
287 | ATGTAGCGAAACTGCACAAATCCGGTGCGAAAAGTGAACCAACAACCTGCGCCGAAGAGCAGGTAAATCA
288 | TTACCGATCCCCAAAGGACGCTGTTAATGAAGGAGAAAAAATCTGGCATGCATATCCCTCTTATTGCCGG
289 | TCGCGATGACTTTCCTGTGTAAACGTTACCAATTGTTTAAGAAGTATATACGCTACGAGGTACTTGATAA
290 | CTTCTGCGTAGCATACATGAGGTTTTGTATAAAAATGGCGGGCGATATCAACGCAGTGTCAGAAATCCGA
291 | AACAGTCTCGCCTGGCGATAACCGTCTTGTCGGCGGTTGCGCTGACGTTGCGTCGTGATATCATCAGGGC
292 | AGACCGGTTACATCCCCCTAACAAGCTGTTTAAAGAGAAATACTATCATGACGGACAAATTGACCTCCCT
293 | TCGTCAGTACACCACCGTAGTGGCCGACACTGGGGACATCGCGGCAATGAAGCTGTATCAACCGCAGGAT
294 | GCCACAACCAACCCTTCTCTCATTCTTAACGCAGCGCAGATTCCGGAATACCGTAAGTTGATTGATGATG
295 | CTGTCGCCTGGGCGAAACAGCAGAGCAACGATCGCGCGCAGCAGATCGTGGACGCGACCGACAAACTGGC
296 | AGTAAATATTGGTCTGGAAATCCTGAAACTGGTTCCGGGCCGTATCTCAACTGAAGTTGATGCGCGTCTT
297 | TCCTATGACACCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGCTGGTATTA
298 | GCAACGATCGTATTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGCAGAACAGCTGGAAAA
299 | AGAAGGCATCAACTGTAACCTGACCCTGCTGTTCTCCTTCGCTCAGGCTCGTGCTTGTGCGGAAGCGGGC
300 | GTGTTCCTGATCTCGCCGTTTGTTGGCCGTATTCTTGACTGGTACAAAGCGAATACCGATAAGAAAGAGT
301 | ACGCTCCGGCAGAAGATCCGGGCGTGGTTTCTGTATCTGAAATCTACCAGTACTACAAAGAGCACGGTTA
302 | TGAAACCGTGGTTATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAACTGGCAGGCTGCGACCGT
303 | CTGACCATCGCACCGGCACTGCTGAAAGAGCTGGCGGAGAGCGAAGGGGCTATCGAACGTAAACTGTCTT
304 | ACACCGGCGAAGTGAAAGCGCGTCCGGCGCGTATCACTGAGTCCGAGTTCCTGTGGCAGCACAACCAGGA
305 | TCCAATGGCAGTAGATAAACTGGCGGAAGGTATCCGTAAGTTTGCTATTGACCAGGAAAAACTGGAAAAA
306 | ATGATCGGCGATCTGCTGTAATCATTCTTAGCGTGACCGGGAAGTCGGTCACGCTACCTCTTCTGAAGCC
307 | TGTCTGTCACTCCCTTCGCAGTGTATCATTCTGTTTAACGAGACTGTTTAAACGGAAAAATCTTGATGAA
308 | TACTTTACGTATTGGCTTAGTTTCCATCTCTGATCGCGCATCCAGCGGCGTTTATCAGGATAAAGGCATC
309 | CCTGCGCTGGAAGAATGGCTGACATCGGCGCTAACCACGCCGTTTGAACTGGAAACCCGCTTAATCCCCG
310 | ATGAGCAGGCGATCATCGAGCAAACGTTGTGTGAGCTGGTGGATGAAATGAGTTGCCATCTGGTGCTCAC
311 | CACGGGCGGAACTGGCCCGGCGCGTCGTGACGTAACGCCCGATGCGACGCTGGCAGTAGCGGACCGCGAG
312 | ATGCCTGGCTTTGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACCAACTGCGATCCTTTCGCGTC
313 | AGGTGGGCGTGATTCGCAAACAGGCGCTGATCCTTAACTTACCCGGTCAGCCGAAGTCTATTAAAGAGAC
314 | GCTGGAAGGTGTGAAGGACGCTGAGGGTAACGTTGTGGTACACGGTATTTTTGCCAGCGTACCGTACTGC
315 | ATTCAGTTGCTGGAAGGGCCATACGTTGAAACGGCACCGGAAGTGGTTGCAGCATTCAGACCGAAGAGTG
316 | CAAGACGCGACGTTAGCGAATAAAAAAATCCCCCCGAGCGGGGGGATCTCAAAACAATTAGTGGGATTCA
317 | CCAATCGGCAGAACGGTGCGACCAAACTGCTCGTTCAGTACTTCACCCATCGCCAGATAGATTGCGCTGG
318 | CACCGCAGATCAGCCCAATCCAGCCGGCAAAGTGGATGATTGCGGCGTTACCGGCAATGTTACCGATCGC
319 | CAGCAGGGCAAACAGCACGGTCAGGCTAAAGAAAACGAATTGCAGAACGCGTGCGCCTTTCAGCGTGCCG
320 | AAGAACATAAACAGCGTAAATACGCCCCACAGACCCAGGTAGACACCAAGGAACTGTGCATTTGGCGCAT
321 | CGGTCAGACCCAGTTTCGGCATCAGCAGAATCGCAACCAGCGTCAGCCAGAAAGAACCGTAAGAGGTGAA
322 | TGCGGTTAAACCGAAAGTGTTGCCTTTTTTGTACTCCAGCAGACCAGCAAAAATTTGCGCGATGCCGCCG
323 | TAGAAAATGCCCATGGCAAGAATAATACCGTCCAGAGCGAAATAACCCACGTTGTGCAGGTTAAGCAGAA
324 | TGGTGGTCATGCCGAAGCCCATCAGGCCCAGCGGTGCCGGATTAGCCAACTTAGTGTTGCCCATAATTCC
325 | TCAAAAATCATCATCGAATGAATGGTGAAATAATTTCCCTGAATAACTGTAGTGTTTTCAGGGCGCGGCA
326 | TAATAATCAGCCAGTGGGGCAGTGTCTACGATCTTTTGAGGGGAAAATGAAAATTTTCCCCGGTTTCCGG
327 | TATCAGACCTGAGTGGCGCTAACCATCCGGCGCAGGCAGGCGATTTGCAGTACGGCTGGAATCGTCACGC
328 | GATAGGCGCTGCCGCTGACCGCTTTAACCCCATTTAGTGCCGCACCTACAGGGCCTCCCAGCCCCGCGCC
329 | GCGCAGCAAACCATGCCCAAGTACGCTCATTGCTGCGTGGGTGCGTAAAATGCGGGTCAGTTGGCTGGAA
330 | AGCAAATGCGACACACCTTTTGCCAATAATTTGTCTTTCATCAGCAGCGGCAGCAGCTCTTCCAGCTCAT
331 | TCACCCTGGCATCGACCGCGTGCAGAAACTCCTGCTTATGTTCCTCGTCCATTTTCTTCCAGGTATTACG
332 | CAGAAATTGTTCCAGTAACTGTTGCTCAATTTCAAACGTAGACATCTCTTTGTCGGCTTTCAGCTTCAAT
333 | CGCTTTGAAACATCGAGCAAAATGGCCCGATACAATTTACCGTGTCCGCGCAGTTTGTTGGCGATACTAT
334 | CGCCACCAAAATGCTGTAATTCTCCGGCAATCAGCTGCCAGTTGCGGCGATGTTGCTCGGGATGCCCTTC
335 | CATCGATTTAAACAGTTCGTTGCGCATCAGTACGCTGGAGAGGCGAGTTTTGCCTTTTTCATTATGGGTG
336 | AGCAATCGGGCGAAATTTGCCAACTGTTCCTCACTACAATGCTGAAGAAAATCCAGATCTGAATCATTCA
337 | GGTAATTAACATTCATTTTTTGTGGCTTCTATATTCTGGCGTTAGTCGTCGCCGATAATTTTCAGCGTGG
338 | CCATATCCGATGAGTTCACCGTATGACCCGAAAAGGTGATTTTTGAGACGCAGCGTTTATTGTCGTTATC
339 | GCTGTTAATGTTGATCCAGTCAGTGGTTTGCCCTTCTTTTATTTCTGAAGGAATATTCAGGCTCTGACTG
340 | GCGCTACGGGCGGCTTTGAAATAAACCGATGCACCGCTTAACTGTAAATCGCCATGGTCGGCAGAGAGTT
341 | GTATGCGTTTCACAATGCGACAAACAGGAAGTTTCAGCGCCAGATCGTTGGTTTCGTTACGCGGCATTGC
342 | AATGGCGCCGAGGAGTTTATGGTCGTTTGCCTGCGCCGTGCAGCACAGCATCAGGCTAATCGCCAGGCTG
343 | GCGGAAATCGTAAAAACGGATTTCATAAGGATTCTCTTAGTGGGAAGAGGTAGGGGGATGAATACCCACT
344 | AGTTTACTGCTGATAAAGAGAAGATTCAGGCACGTAATCTTTTCTTTTTATTACAATTTTTTGATGAATG
345 | CCTTGGCTGCGATTCATTCTTTATATGAATAAAATTGCTGTCAATTTTACGTCTTGTCCTGCCATATCGC
346 | GAAATTTCTGCGCAAAAGCACAAAAAATTTTTGCATCTCCCCCTTGATGACGTGGTTTACGACCCCATTT
347 | AGTAGTCAACCGCAGTGAGTGAGTCTGCAAAAAAATGAAATTGGGCAGTTGAAACCAGACGTTTCGCCCC
348 | TATTACAGACTCACAACCACATGATGACCGAATATATAGTGGAGACGTTTAGATGGGTAAAATAATTGGT
349 | ATCGACCTGGGTACTACCAACTCTTGTGTAGCGATTATGGATGGCACCACTCCTCGCGTGCTGGAGAACG
350 | CCGAAGGCGATCGCACCACGCCTTCTATCATTGCCTATACCCAGGATGGTGAAACTCTAGTTGGTCAGCC
351 | GGCTAAACGTCAGGCAGTGACGAACCCGCAAAACACTCTGTTTGCGATTAAACGCCTGATTGGTCGCCGC
352 | TTCCAGGACGAAGAAGTACAGCGTGATGTTTCCATCATGCCGTTCAAAATTATTGCTGCTGATAACGGCG
353 | ACGCATGGGTCGAAGTTAAAGGCCAGAAAATGGCACCGCCGCAGATTTCTGCTGAAGTGCTGAAAAAAAT
354 | GAAGAAAACCGCTGAAGATTACCTGGGTGAACCGGTAACTGAAGCTGTTATCACCGTACCGGCATACTTT
355 | AACGATGCTCAGCGTCAGGCAACCAAAGACGCAGGCCGTATCGCTGGTCTGGAAGTAAAACGTATCATCA
356 | ACGAACCGACCGCAGCTGCGCTGGCTTACGGTCTGGACAAAGGCACTGGCAACCGTACTATCGCGGTTTA
357 | TGACCTGGGTGGTGGTACTTTCGATATTTCTATTATCGAAATCGACGAAGTTGACGGCGAAAAAACCTTC
358 | GAAGTTCTGGCAACCAACGGTGATACCCACCTGGGGGGTGAAGACTTCGACAGCCGTCTGATCAACTATC
359 | TGGTTGAAGAATTCAAGAAAGATCAGGGCATTGACCTGCGCAACGATCCGCTGGCAATGCAGCGCCTGAA
360 | AGAAGCGGCAGAAAAAGCGAAAATCGAACTGTCTTCCGCTCAGCAGACCGACGTTAACCTGCCATACATC
361 | ACTGCAGACGCGACCGGTCCGAAACACATGAACATCAAAGTGACTCGTGCGAAACTGGAAAGCCTGGTTG
362 | AAGATCTGGTAAACCGTTCCATTGAGCCGCTGAAAGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGA
363 | TATCGACGACGTTATCCTCGTTGGTGGTCAGACTCGTATGCCAATGGTTCAGAAGAAAGTTGCTGAGTTC
364 | TTTGGTAAAGAGCCGCGTAAAGACGTTAACCCGGACGAAGCTGTAGCAATCGGTGCTGCTGTTCAGGGTG
365 | GTGTTCTGACTGGTGACGTAAAAGACGTACTGCTGCTGGACGTTACCCCGCTGTCTCTGGGTATCGAAAC
366 | CATGGGCGGTGTGATGACGACGCTGATCGCGAAAAACACCACTATCCCGACCAAGCACAGCCAGGTGTTC
367 | TCTACCGCTGAAGACAACCAGTCTGCGGTAACCATCCATGTGCTGCAGGGTGAACGTAAACGTGCGGCTG
368 | ATAACAAATCTCTGGGTCAGTTCAACCTAGATGGTATCAACCCGGCACCGCGCGGCATGCCGCAGATCGA
369 | AGTTACCTTCGATATCGATGCTGACGGTATCCTGCACGTTTCCGCGAAAGATAAAAACAGCGGTAAAGAG
370 | CAGAAGATCACCATCAAGGCTTCTTCTGGTCTGAACGAAGATGAAATCCAGAAAATGGTACGCGACGCAG
371 | AAGCTAACGCCGAAGCTGACCGTAAGTTTGAAGAGCTGGTACAGACTCGCAACCAGGGCGACCATCTGCT
372 | GCACAGCACCCGTAAGCAGGTTGAAGAAGCAGGCGACAAACTGCCGGCTGACGACAAAACTGCTATCGAG
373 | TCTGCGCTGACTGCACTGGAAACTGCTCTGAAAGGTGAAGACAAAGCCGCTATCGAAGCGAAAATGCAGG
374 |
--------------------------------------------------------------------------------