├── README.md ├── create_pseudohaploid.sh ├── filter_seq ├── img ├── AlignmentChain.png ├── ChainFiltering.png ├── Pseudohaploid.png └── README.md ├── pseudohaploid.chains.pl └── test ├── basic.fa ├── run_test.output ├── run_tests.sh └── simple.fa /README.md: -------------------------------------------------------------------------------- 1 | # pseudohaploid 2 | Create pseudohaploid assemblies from a partially resolved diploid assembly 3 | 4 | [Mike Alonge](http://michaelalonge.com/), [Srividya Ramakrishnan](https://github.com/srividya22), and [Michael C. Schatz](http://schatz-lab.org) 5 | 6 | When assembling highly heterozygous genomes, the total span of the assembly is often nearly twice the expected (haploid) genome size, which is indicative of the assembler partially resolving the heterozygosity. This creates many duplicated genes and other duplicated features that can complicate annotation and comparative genomics. This repository contains code for post-processing an assembly to create a pseudo-haploid representation where pairs of contigs representing the same homologous sequence were filtered to select only one representative contig. The approach is similar to the approach used by [FALCON-unzip](https://www.nature.com/articles/nmeth.4035) for PacBio reads or [SuperNova](https://genome.cshlp.org/content/27/5/757) for 10X Genomics Linked Reads. As with those algorithms, our algorithm will not necessarily maintain the same phase throughout the assembly, and can arbitrarily alternate between homologous chromosomes at the ends of contigs. Unlike those methods, our method can be run as a stand-alone tool with any assembler. 7 | 8 |

9 | 10 | **Overview of Pseudo-haploid Genome Assembly** (a) The original sample has two homologous chromosomes labeled orange and blue. (b) In the de novo assembly, homologous regions containing higher rates of heterozygosity are split into distinct sequences (orange and blue), while regions with low rates or no heterozygous bases are collapsed to a single representative sequence (black). (c) Our algorithm attempts to filter out redundant contigs from the other homologous chromosome, although the phasing of the different contigs may be inconsistent. Figure derived from [8] 11 | 12 | Briefly, the algorithm begins by aligning the genome assembly to itself using the whole genome aligner `nucmer` from the [MUMmer suite](http://mummer.sourceforge.net/). We recommend the parameters `nucmer -maxmatch -l 100 -c 500` to report all alignments, unique and repetitive, at least 500bp long with a 100bp seed match. We then filter these alignments to those that are 1000bp or longer using `delta-filter` (also part of the MUMmer suite). We also recommend the `sge_mummer` version of MUMmer so the alignments can be computed in parallel in a cluster environment: [sge_mummer github](https://github.com/fritzsedlazeck/sge_mummer), although this will produce identical results to the serial version. Finally, we recommend keeping only alignments with 90% identity or greater, which removes lower identity repetitive alignments while accommodating the expected rate of heterozygosity between homologous chromosomes, including local regions of greater diversity. 13 | 14 | Next, the alignments are examined to identify and filter out redundant homologous contigs. We do so by linking the individual alignments into “alignment chains”, consisting of sets of alignments that are co-linear along the pair of contigs. 
Our method was inspired by older methods for computing synteny between distantly related genomes, although our method focuses on the problem of identifying homologous contig pairs as high identity long alignment chains. As we expect there to be structural variations between the homologous sequences, we allow for gaps in the alignments between the contigs, although true homologous contig pairs should maintain a consistent order and orientation to the alignments. Specifically, in the alignments from contig A to contig B, each aligned region of A forms a node in an alignment graph, and edges are added between nodes if they are compatible alignments, meaning they are on the same strand, and the implied gap distance on both contig A and contig B is less than 25kbp but not negative (the maximum gap is configurable; the provided `create_pseudohaploid.sh` sets `MAX_CHAIN_GAP=20000`). Our algorithm then uses a depth first search starting at every node in the alignment graph to find the highest scoring chain of alignments, where the score is determined by the number of bases that are aligned in the chain. Notably, if a repetitive alignment is flanked by unique or repetitive alignments, such as the orange sequence in Contig B below, this approach will prefer to link alignments that are co-linear on Contig A. We find this produces better results than the filtering that MUMmer’s delta-filter can perform, which does not consider the context of the alignments when identifying a candidate set of non-redundant alignments. 15 | 16 |
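The edge rule described above can be sketched in bash. This is a minimal illustration for forward-strand alignments only, following the checks that appear in `pseudohaploid.chains.pl`; the helper names `abs` and `compatible_forward` and the example coordinates are hypothetical and not part of the repository.

```bash
#!/bin/bash
# Sketch only: the forward-strand edge test between two alignments j and i,
# where j comes before i on contig A. The function names are hypothetical;
# the four checks mirror those in pseudohaploid.chains.pl.

MAX_CHAIN_GAP=20000   # default value set in create_pseudohaploid.sh

# abs N: print the absolute value of an integer
abs() {
  local n=$1
  if (( n < 0 )); then n=$(( -n )); fi
  echo "$n"
}

# compatible_forward jrs jre jqs jqe irs ire iqs iqe
#   j* = start/end of alignment j on contig A (r) and contig B (q)
#   i* = the same for alignment i
# Returns 0 (success) if an edge j -> i may be added to the alignment graph.
compatible_forward() {
  local jrs=$1 jre=$2 jqs=$3 jqe=$4
  local irs=$5 ire=$6 iqs=$7 iqe=$8

  local rdist qdist
  rdist=$(abs $(( irs - jre )))   # implied gap on contig A
  qdist=$(abs $(( iqs - jqe )))   # implied gap on contig B

  (( rdist < MAX_CHAIN_GAP )) || return 1        # gap on A is small enough
  (( qdist < MAX_CHAIN_GAP )) || return 1        # gap on B is small enough
  (( irs > jrs && ire > jre )) || return 1       # same order on A
  (( iqs > jqs && iqe > jqe )) || return 1       # same order on B
  return 0
}

# Hypothetical coordinates: a 2 kbp gap on A and a 1.5 kbp gap on B
if compatible_forward 1 5000 1 5000 7000 12000 6500 11500; then
  echo "compatible"
else
  echo "not compatible"
fi
```

Note that the script compares absolute gap values, so a small overlap between consecutive alignments is tolerated as long as start and end coordinates both increase. Reverse-strand alignments would need the query comparisons flipped.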

17 | 18 | 19 | **Alignment Chain Construction** (a) Pairwise alignments between all contigs are computed with nucmer. Here we show just the alignments between contigs A and B. (b) An alignment graph is computed where each aligned region of A forms a node, with edges between nodes that are compatible on the same strand, in the same order, and no more than 25kbp between them. (c) The final alignment chain is selected as the maximal weight path in the alignment graph. 20 | 21 | 22 | With the alignment chains identified between pairs of contigs, the last phase of the algorithm is to remove any contigs that are redundant with other contigs originating on the homologous chromosome. Specifically, it evaluates the contigs in order from shortest to longest, and computes the fraction of the bases of each contig that are spanned by alignment chains to other non-redundant contigs. If at least X% of the contig is spanned, it is marked as redundant (X is the `MIN_CONTAIN` threshold, set to 93 in the provided `create_pseudohaploid.sh`). This can occur in simple cases where shorter contigs are spanned by individual longer contigs as well as more complex cases where a contig is spanned by multiple shorter non-redundant contigs. We recommend you evaluate several cutoffs for the threshold of percent of the bases spanned. 23 | 24 |
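The spanned fraction can be computed with a simple sweep over the chains sorted by start position, counting each base of the contig at most once. The bash sketch below follows the arithmetic of the sweep in `pseudohaploid.chains.pl` but is only an illustration: the function name `percent_spanned`, the two-column input format, and the example coordinates are assumptions, and the real script additionally skips chains to contigs that were already marked redundant.

```bash
#!/bin/bash
# Sketch only: percent of a contig spanned by a set of alignment chains.
# percent_spanned is a hypothetical helper, not part of the repository.

MIN_CONTAIN=93   # default value set in create_pseudohaploid.sh

# percent_spanned CONTIG_LEN
#   stdin: one chain per line as "rstart rend" (coordinates on this contig)
#   stdout: percent of the contig spanned, with two decimals
percent_spanned() {
  local clen=$1
  sort -k1,1n | awk -v clen="$clen" '
    BEGIN { lastend = -1; mapped = 0 }
    {
      mstart = $1 + 0
      rend   = $2 + 0
      # do not count bases already covered by an earlier chain
      if (lastend > mstart) mstart = lastend
      if (rend > mstart) {
        mapped += rend - mstart
        lastend = rend
      }
    }
    END { printf "%.2f\n", 100.0 * mapped / clen }'
}

# Hypothetical 10 kbp contig with two overlapping chains to other contigs
pct=$(printf '4000 9500\n1 6000\n' | percent_spanned 10000)
echo "spanned: $pct%"

# Compare against the threshold (awk handles the decimal comparison)
if awk -v p="$pct" -v cut="$MIN_CONTAIN" 'BEGIN { exit !(p >= cut) }'; then
  echo "redundant"
else
  echo "kept"
fi
```

Because the overlap between the two chains is counted once, the example contig is about 95% spanned and would be marked redundant at the default threshold. Rerunning the comparison with different values of `MIN_CONTAIN` is a cheap way to evaluate several cutoffs.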

25 | 26 | **Chain Filtering** (a) In simple cases, short contigs (contig A) are filtered out by their alignment chains to longer non-redundant contigs (contig B). (b) In complex cases, a contig (contig B) is filtered out because the alignment chains to multiple non-redundant contigs (contigs A and C) together span more than X% of the bases. 27 | 28 | 29 | ## Installation 30 | 31 | Make sure [MUMmer](http://mummer.sourceforge.net/) is installed and the binaries are in your path. We recommend version 3.23 although others may work. 32 | 33 | Then download the pseudohaploid code: 34 | 35 | ``` 36 | $ git clone https://github.com/schatzlab/pseudohaploid.git 37 | ``` 38 | 39 | There is nothing else to install. 40 | 41 | ## Usage 42 | 43 | The main script to run is `create_pseudohaploid.sh`. This is a simple bash script that wraps the steps of aligning the genome to itself, filtering the alignments, constructing and analyzing the alignment chains, and then creating the final pseudohaploid assembly. The usage is: 44 | 45 | ``` 46 | $ create_pseudohaploid.sh assembly.fa outprefix 47 | ``` 48 | 49 | 50 | The test directory has a small script to run this command on a small simple example. If everything is working well you should see: 51 | 52 | ``` 53 | $ cd test 54 | $ ./run_tests.sh 55 | Running the simple example 56 | Generating pseudohaploid genome sequence 57 | ---------------------------------------- 58 | GENOME: simple.fa 59 | OUTPREFIX: ph.simple 60 | MIN_IDENTITY: 90 61 | MIN_LENGTH: 1000 62 | MIN_CONTAIN: 93 63 | MAX_CHAIN_GAP: 20000 64 | 65 | 1. Aligning simple.fa to itself with nucmer 66 | Original assembly has 2 contigs 67 | 68 | 2. Filter for alignments longer than 1000 bp and below 90 identity 69 | 70 | 3. Generating coords file 71 | 72 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000 73 | Processing coords file (ph.simple.filter.coords)... 74 | Processed 6 alignment records [4 valid] 75 | Finding chains for 2 contigs... 
76 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd] 77 | Looking for contained contigs... 78 | Found 1 joint contained contigs 79 | Printed 1 total contained contigs 80 | 81 | 5. Generating a list of redundant contig ids using min_contain: 93 82 | Identified 1 redundant contig to remove in ph.simple.contained.ids 83 | 84 | 6. Creating final pseudohaploid assembly in ph.simple.pseudohap.fa 85 | Pseudohaploid assembly has 1 contigs 86 | ``` 87 | 88 | Note that `create_pseudohaploid.sh` is just a simple bash script, so it can be easily edited or incorporated into a larger pipeline. You can also swap out steps, such as replacing nucmer with sge_mummer to use a grid to compute the self alignments. 89 | 90 | ## Performance Validation 91 | 92 | To demonstrate the capabilities of our new Pseudohaploid method, we applied these techniques to a highly heterozygous sample of Arabidopsis thaliana, an F1 hybrid of Col-0 and Cvi-0 that was previously sequenced as part of the FALCON-unzip paper. For this analysis, we downloaded 116x coverage of PacBio reads (read N50 length=17,474) of the F1 genome from the SRA under accession SRX1715706. We then assembled the reads with Canu using parameters optimized for heterozygous samples. The total size of the raw Canu assembly was substantially larger than the expected haploid genome size: the total assembly size was 214.7Mbp, whereas the haploid genome size is ~135Mbp according to the latest estimates from The Arabidopsis Information Resource (TAIR) (https://www.arabidopsis.org/portals/genAnnotation/gene_structural_annotation/agicomplete.jsp). 93 | 94 | We then applied the Pseudohaploid method to the assembly. This reduced the total size of the assembly from 214.7Mbp to 143.5Mbp, and increased the contig N50 size from 350kbp to 950kbp by reducing the number of contigs from 2074 to 505. Then using the high quality TAIR10 reference genome, we investigated the quality of both the raw and Pseudohaploid assemblies. 
Using BUSCO, we found the reference genome contained 1356 complete BUSCO genes, of which 1348 were single-copy, and 8 were duplicated. We found the raw Canu assembly contained a large fraction of duplicated genes, and overall it contained 1355 complete BUSCOs, although only 711 were single-copy, and 644 were duplicated. In contrast the Pseudohaploid assembly substantially reduced the number of duplicate genes, and contained a total of 1355 complete BUSCOs, of which 1240 were single-copy, and only 115 duplicated (an 82% reduction). 95 | 96 | Furthermore, by aligning the raw Canu and Pseudohaploid assemblies to the reference TAIR10 assembly with nucmer using the parameters “-maxmatch -l 100 -c 500”, we found that 1.6Mbp (1.4%) of the TAIR10 assembly was not represented in the Canu assembly, and 4.2Mbp (3.5%) was not represented in the Pseudohaploid assembly as computed by the MUMmer tool dnadiff in the “AlignedBases” field. We also found that 19.0Mbp of the raw Canu assembly and 14.1Mbp of the Pseudohaploid assembly were unaligned to the reference genome. However, the reference TAIR10 assembly was assembled from the Col-0 accession, and the portions that do not align are chiefly due to the pseudo-haploid representation that will alternate between the Col-0 and Cvi-0 haplotypes. To assess this, we also aligned a high quality (N50 size=7.9Mbp) Cvi inbred assembly created with the FALCON assembler to the TAIR10 reference with nucmer using the same parameters as above. From this, we find that 17.3Mbp (14.5%) of the reference is also not found in the Cvi assembly and the Cvi assembly contains 17.7Mbp not found in the reference, highlighting the widespread structural variations between the accessions. We also found that the vast majority (94.5%) of the bases from the Pseudohaploid assembly that were not aligned to the reference genome could be successfully aligned to the Cvi assembly using the same parameters. 
97 | 98 | Overall, the Pseudohaploid method was highly effective: it removed 71Mbp of redundant sequences from the raw Canu output to substantially improve the fraction of unique genes while only marginally decreasing the sequences from the reference present in the pseudohaploid assembly. The datafiles for these assemblies are available here: [http://labshare.cshl.edu/shares/schatzlab/www-data/pseudohaploid/arabidopsis/](http://labshare.cshl.edu/shares/schatzlab/www-data/pseudohaploid/arabidopsis/) 99 | -------------------------------------------------------------------------------- /create_pseudohaploid.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -o pipefail 4 | 5 | BIN="$( cd "$(dirname "$0")" ; pwd -P )" 6 | 7 | ## Minimum alignment Identity 8 | #MIN_IDENTITY=95 9 | MIN_IDENTITY=90 10 | 11 | ## Minimum alignment length to consider 12 | #MIN_LENGTH=10000 13 | MIN_LENGTH=1000 14 | 15 | ## Minimum containment percentage from overlap chains to filter contig 16 | #MIN_CONTAIN=95 17 | MIN_CONTAIN=93 18 | 19 | ## Maximum distance in bp allowed between alignments on the same alignment chain 20 | MAX_CHAIN_GAP=20000 21 | 22 | if [ $# -lt 2 ] 23 | then 24 | echo "create_pseudohaploid.sh assembly.fa outprefix" 25 | exit 26 | fi 27 | 28 | GENOME=$1 29 | PREFIX=$2 30 | 31 | echo "Generating pseudohaploid genome sequence" 32 | echo "----------------------------------------" 33 | echo "GENOME: $GENOME" 34 | echo "OUTPREFIX: $PREFIX" 35 | echo "MIN_IDENTITY: $MIN_IDENTITY" 36 | echo "MIN_LENGTH: $MIN_LENGTH" 37 | echo "MIN_CONTAIN: $MIN_CONTAIN" 38 | echo "MAX_CHAIN_GAP: $MAX_CHAIN_GAP" 39 | echo 40 | 41 | ## You may want to replace this with sge_mummer for large genomes 42 | ## See: https://github.com/fritzsedlazeck/sge_mummer 43 | if [ ! -r $PREFIX.delta ] 44 | then 45 | echo "1. 
Aligning $GENOME to itself with nucmer" 46 | (nucmer --maxmatch -c 100 -l 500 $GENOME $GENOME -p $PREFIX) >& nucmer.log 47 | numorig=`grep -c '^>' $GENOME` 48 | echo "Original assembly has $numorig contigs" 49 | echo 50 | fi 51 | 52 | ## Pre-filter for just longer high identity alignments 53 | echo "2. Filter for alignments longer than $MIN_LENGTH bp and below $MIN_IDENTITY identity" 54 | delta-filter -l $MIN_LENGTH -i $MIN_IDENTITY $PREFIX.delta > $PREFIX.filter.delta 55 | echo 56 | 57 | ## Create the coord file 58 | echo "3. Generating coords file" 59 | show-coords -rclH $PREFIX.filter.delta > $PREFIX.filter.coords 60 | echo 61 | 62 | ## Find and analyze the alignment chains 63 | ## Note you can rerun this step multiple times from the same coords file 64 | echo "4. Identifying alignment chains: min_id: $MIN_IDENTITY min_contain: $MIN_CONTAIN max_gap: $MAX_CHAIN_GAP" 65 | ($BIN/pseudohaploid.chains.pl $PREFIX.filter.coords \ 66 | $MIN_IDENTITY $MIN_CONTAIN $MAX_CHAIN_GAP > $PREFIX.chains) >& $PREFIX.chains.log 67 | cat $PREFIX.chains.log 68 | echo 69 | 70 | ## Generate a list of contained contigs 71 | ## This can also be rerun from the same chain file but using different containment thresholds 72 | echo "5. Generating a list of redundant contig ids using min_contain: $MIN_CONTAIN" 73 | grep '^#' $PREFIX.chains | \ 74 | awk -v cut=$MIN_CONTAIN '{if ($4 >= cut){print ">"$2}}' > $PREFIX.contained.ids 75 | numcontained=`wc -l $PREFIX.contained.ids | awk '{print $1}'` 76 | echo "Identified $numcontained redundant contig to remove in $PREFIX.contained.ids" 77 | echo 78 | 79 | 80 | ## Finally filter the original assembly to remove the contained contigs 81 | echo "6. 
Creating final pseudohaploid assembly in $PREFIX.pseudohap.fa" 82 | ($BIN/filter_seq -v $PREFIX.contained.ids $GENOME > $PREFIX.pseudohap.fa) >& $PREFIX.pseudohap.fa.log 83 | numfinal=`grep -c '^>' $PREFIX.pseudohap.fa` 84 | echo "Pseudohaploid assembly has $numfinal contigs" 85 | -------------------------------------------------------------------------------- /filter_seq: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | use warnings; 3 | use strict; 4 | use FileHandle; 5 | 6 | my $USAGE = "Usage: filter_seq [-index seq.fa] [-v] subset.fa all.fa\n"; 7 | 8 | my $HELPTEXT = qq~ 9 | Extract specified fasta records listed in subset.fa from a master file all.fa. 10 | If available, use the index file all.fa.idx to allow random access. Create the index file 11 | by running 'filter_seq -index all.fa'. 12 | 13 | $USAGE 14 | 15 | Options 16 | ------- 17 | -index Create an index file of the fasta file\n 18 | -v skip everything listed in the subset.fa file\n 19 | ~; 20 | 21 | 22 | my $createindex = 0; 23 | my $skiplisted = 0; 24 | 25 | my $good = shift @ARGV or die $USAGE; 26 | 27 | if ($good eq "-help") 28 | { 29 | die $HELPTEXT; 30 | } 31 | elsif ($good eq "-index") 32 | { 33 | $createindex = 1; 34 | $good = shift @ARGV or die $USAGE; 35 | } 36 | elsif ($good eq "-v") 37 | { 38 | $skiplisted = 1; 39 | $good = shift @ARGV or die $USAGE; 40 | print STDERR "Skipping everything listed in $good\n"; 41 | } 42 | 43 | 44 | if ($createindex) 45 | { 46 | my $orig = new FileHandle "$good", "<" 47 | or die("Can't open $good ($!)"); 48 | 49 | open IDX, "> $good.idx" 50 | or die("Can't open $good.idx ($!)"); 51 | 52 | while (!$orig->eof()) 53 | { 54 | my $pos = $orig->tell(); 55 | my $line = $orig->getline(); 56 | 57 | if ($line =~ /^\>(\S+)/) 58 | { 59 | print IDX "$1 $pos\n"; 60 | } 61 | } 62 | 63 | close IDX; 64 | } 65 | else 66 | { 67 | my $copy = shift @ARGV or die $USAGE; 68 | 69 | my %sequencelist; 70 | 71 | ## Find the 
seqnames from the good list 72 | open GOOD, "< $good" 73 | or die("Couldn't open $good ($!)"); 74 | 75 | while (<GOOD>) 76 | { 77 | if (/^\#(\S+)\(/ || /^\>(\S+)/) 78 | { 79 | $sequencelist{$1} = 1; 80 | } 81 | } 82 | close GOOD; 83 | 84 | if ((!$skiplisted) && (-r "$copy.idx")) 85 | { 86 | ## Create the index as: grep -b '>' tvg2.qual | tr -d ':' | tr '>' ' ' | awk '{print $2" "$1}' > tvg2.qual.idx 87 | my %offsettable; 88 | 89 | open IDX, "< $copy.idx" 90 | or die("Couldn't open $copy.idx ($!)"); 91 | 92 | while (<IDX>) 93 | { 94 | my @val = split / /, $_; 95 | 96 | $offsettable{$val[0]} = $val[1] 97 | if (exists $sequencelist{$val[0]}); 98 | } 99 | close IDX; 100 | 101 | 102 | my $copy = new FileHandle "$copy", "r" 103 | or die("Couldn't open $copy ($!)"); 104 | 105 | foreach my $seqname (keys %sequencelist) 106 | { 107 | if (exists $offsettable{$seqname}) 108 | { 109 | $sequencelist{$seqname} = 0; 110 | 111 | $copy->seek($offsettable{$seqname}, 0); 112 | 113 | ## Print the headerline for sure 114 | my $line = $copy->getline(); 115 | print $line; 116 | 117 | ## loop until next record 118 | $line = $copy->getline(); 119 | while ($line !~ /^>/) 120 | { 121 | print $line; 122 | last if $copy->eof(); 123 | $line = $copy->getline(); 124 | } 125 | } 126 | } 127 | } 128 | else 129 | { 130 | ## Pull the sequences out of the copy file 131 | my $printid = 0; 132 | 133 | open COPY, "< $copy" 134 | or die("Couldn't open $copy ($!)"); 135 | 136 | while (<COPY>) 137 | { 138 | if (/^\>(\S+)/) 139 | { 140 | $printid = $sequencelist{$1}; 141 | if ($skiplisted) { $printid = !$printid; } 142 | $sequencelist{$1} = 0; 143 | } 144 | 145 | print $_ if $printid; 146 | } 147 | 148 | close COPY; 149 | } 150 | 151 | ## Make sure we found each id 152 | foreach my $seqname (keys %sequencelist) 153 | { 154 | print STDERR "$seqname in $good but not in $copy\n" 155 | if ($sequencelist{$seqname}); 156 | } 157 | } 158 | -------------------------------------------------------------------------------- 
/img/AlignmentChain.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/AlignmentChain.png -------------------------------------------------------------------------------- /img/ChainFiltering.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/ChainFiltering.png -------------------------------------------------------------------------------- /img/Pseudohaploid.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/schatzlab/pseudohaploid/7c014188c5567bb8b27fcf754da79d59a1b5a425/img/Pseudohaploid.png -------------------------------------------------------------------------------- /img/README.md: -------------------------------------------------------------------------------- 1 | Image Files 2 | -------------------------------------------------------------------------------- /pseudohaploid.chains.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | use strict; 3 | 4 | ## Generate coords file like this: 5 | ## show-coords -rclH file.delta > file.coords 6 | 7 | my $USAGE = "pseudohaploid.chains.pl coords_file min_perc_id min_perc_cov max_chain_gap > redundant.list\n"; 8 | 9 | my $coordsfile = shift @ARGV or die $USAGE; 10 | my $MIN_PERC_ID = shift @ARGV or die $USAGE; 11 | my $MIN_PERC_COV = shift @ARGV or die $USAGE; 12 | my $MAX_CHAIN_DIST = shift @ARGV or die $USAGE; 13 | 14 | my $VERBOSE = 1; 15 | my $PATHVERBOSE = 0; 16 | 17 | 18 | ## Parse the coords file for valid alignments 19 | ############################################################################### 20 | 21 | open COORDS, "$coordsfile" or die "Can't open $coordsfile\n"; 22 | print STDERR "Processing coords file 
($coordsfile)...\n"; 23 | 24 | my $alignments = 0; 25 | my $validalignments = 0; 26 | my %contigs; 27 | 28 | while (<COORDS>) 29 | { 30 | $alignments++; 31 | if (($alignments % 10000) == 0) { print STDERR " processed $alignments alignments\n"; } 32 | 33 | chomp; 34 | $_ =~ s/^\s+//; 35 | my @vals = split /\s+/, $_; 36 | 37 | my $rstart = $vals[0]; 38 | my $rend = $vals[1]; 39 | 40 | my $qstart = $vals[3]; 41 | my $qend = $vals[4]; 42 | 43 | my $qoo = "F"; 44 | if ($qstart > $qend) { $qoo = "R" }; 45 | 46 | my $alenr = $vals[6]; 47 | my $alenq = $vals[7]; 48 | 49 | my $pid = $vals[9]; 50 | 51 | my $lenr = $vals[11]; 52 | my $lenq = $vals[12]; 53 | 54 | my $rid = $vals[17]; 55 | my $qid = $vals[18]; 56 | 57 | $contigs{$rid}->{len} = $lenr; 58 | $contigs{$qid}->{len} = $lenq; 59 | 60 | #print "$_\n"; 61 | #print "$rid $qid | $rstart $rend $lenr | $qstart $qend $qoo $lenq | $alenr\n"; 62 | 63 | next if ($pid < $MIN_PERC_ID); 64 | next if ($rid eq $qid); 65 | 66 | $validalignments++; 67 | 68 | my $ainfo; 69 | $ainfo->{rstart} = $rstart; 70 | $ainfo->{rend} = $rend; 71 | 72 | $ainfo->{alenr} = $alenr; 73 | 74 | $ainfo->{qid} = $qid; 75 | $ainfo->{qstart} = $qstart; 76 | $ainfo->{qend} = $qend; 77 | 78 | push @{$contigs{$rid}->{align}->{$qid}->{$qoo}}, $ainfo; 79 | } 80 | 81 | print STDERR "Processed $alignments alignment records [$validalignments valid]\n"; 82 | 83 | 84 | 85 | ## Find the longest alignment chain per sequence 86 | ############################################################################### 87 | 88 | my $numcontigs = scalar keys %contigs; 89 | my $totaledges = 0; 90 | my $ctgcount = 0; 91 | 92 | my $constructtime = 0; 93 | my $searchtime = 0; 94 | my $stackadd = 0; 95 | my $lasttime = 0; 96 | 97 | print STDERR "Finding chains for $numcontigs contigs...\n"; 98 | 99 | ## process from smallest to biggest, so that bigger contigs are preferred to be kept 100 | foreach my $ctg (sort {$contigs{$a}->{len} <=> $contigs{$b}->{len}} keys %contigs) 101 | { 102 | 
$ctgcount++; 103 | if (($ctgcount % 1000) == 0) { print STDERR " processed $ctgcount contigs...\n"; } 104 | 105 | if (exists $contigs{$ctg}->{align}) 106 | { 107 | my $clen = $contigs{$ctg}->{len}; 108 | 109 | foreach my $qid (sort keys %{$contigs{$ctg}->{align}}) 110 | { 111 | my $bestspanall = -1; 112 | my $bestpathall = undef; 113 | 114 | my %salign; 115 | $salign{F} = undef; 116 | $salign{R} = undef; 117 | 118 | foreach my $dir ('F', 'R') 119 | { 120 | if (exists $contigs{$ctg}->{align}->{$qid}->{$dir}) 121 | { 122 | my @align = sort {$a->{rstart} <=> $b->{rstart}} @{$contigs{$ctg}->{align}->{$qid}->{$dir}}; 123 | $salign{$dir} = \@align; 124 | 125 | if ($PATHVERBOSE) 126 | { 127 | my $qlen = $contigs{$qid}->{len}; 128 | print "$ctg [$clen] $qid [$qlen] $dir\n"; 129 | for (my $i = 0; $i < scalar @align; $i++) 130 | { 131 | my $rstart = $align[$i]->{rstart}; 132 | my $rend = $align[$i]->{rend}; 133 | my $qstart = $align[$i]->{qstart}; 134 | my $qend = $align[$i]->{qend}; 135 | print "\t<$i$dir>\t$rstart\t$rend\t|\t$qstart\t$qend\n"; 136 | } 137 | } 138 | 139 | ## Find all of the compatible edges 140 | $lasttime = time(); 141 | for (my $i = 0; $i < scalar @align; $i++) 142 | { 143 | for (my $j = 0; $j < $i; $j++) 144 | { 145 | ## sorted scan: 0 ... j ... i ... 
n 146 | ## check if alignment j is compatible with alignment i 147 | my $rdist = $align[$i]->{rstart} - $align[$j]->{rend}; 148 | my $qdist = $align[$i]->{qstart} - $align[$j]->{qend}; 149 | if ($dir eq "R") { $qdist = $align[$j]->{qend} - $align[$i]->{qstart} } 150 | 151 | my $valid = 0; 152 | 153 | ## First check the distance between the alignments and ref position 154 | if ((abs($rdist) < $MAX_CHAIN_DIST) && 155 | (abs($qdist) < $MAX_CHAIN_DIST) && 156 | ($align[$i]->{rstart} > $align[$j]->{rstart}) && 157 | ($align[$i]->{rend} > $align[$j]->{rend})) 158 | { 159 | $valid = 1; 160 | } 161 | 162 | ## Now check the query positions 163 | if ($valid) 164 | { 165 | $valid = 0; 166 | 167 | if ($dir eq "F") 168 | { 169 | ## ---------------------------------------------------- 170 | ## s----j------->e 171 | ## s------i------->e 172 | 173 | if (($align[$i]->{qstart} > $align[$j]->{qstart}) && 174 | ($align[$i]->{qend} > $align[$j]->{qend})) 175 | { 176 | $valid = 1; 177 | } 178 | } 179 | else 180 | { 181 | ## ---------------------------------------------------- 182 | ## <e----j-------s 183 | ## <e------i-------s 184 | 185 | if (($align[$j]->{qstart} > $align[$i]->{qstart}) && 186 | ($align[$j]->{qend} > $align[$i]->{qend})) 187 | { 188 | $valid = 1; 189 | } 190 | } 191 | } 192 | 193 | if ($valid) 194 | { 195 | $totaledges++; 196 | push @{$align[$j]->{edge}}, $i; 197 | } 198 | } 199 | } 200 | $constructtime += (time() - $lasttime); 201 | 202 | if ($PATHVERBOSE) 203 | { 204 | for (my $i = 0; $i < scalar @align; $i++) 205 | { 206 | if (exists $align[$i]->{edge}) 207 | { 208 | print "edges from <$i$dir>:"; 209 | foreach my $j (@{$align[$i]->{edge}}) 210 | { 211 | print "\t<$j$dir>"; 212 | } 213 | print "\n"; 214 | } 215 | } 216 | } 217 | 218 | ## find the longest chain starting at each node (if not already visited) 219 | $lasttime = time(); 220 | for (my $i = 0; $i < scalar @align; $i++) 221 | { 222 | next if exists $align[$i]->{visit}; 223 | 224 | ## start a DFS at node i to explore chains passing through it 225 | my $path; 226 | 
$path->{chainstart} = $align[$i]->{rstart}; 227 | $path->{chainend} = $align[$i]->{rend}; 228 | $path->{chainweight} = $align[$i]->{alenr}; 229 | $path->{dir} = $dir; 230 | 231 | push @{$path->{nodes}}, $i; 232 | 233 | my $bestspani = -1; 234 | my $bestpathi = undef; 235 | 236 | my @stack; 237 | push @stack, $path; 238 | $stackadd++; 239 | 240 | while (scalar @stack > 0) 241 | { 242 | my $path = pop @stack; 243 | my $pathlen = scalar @{$path->{nodes}}; 244 | 245 | my $lastnode = $path->{nodes}->[$pathlen-1]; 246 | $align[$lastnode]->{visit}++; 247 | 248 | my $betterpath = 0; 249 | if ((!exists $align[$lastnode]->{chainweight}) || 250 | ($path->{chainweight} > $align[$lastnode]->{chainweight})) 251 | { 252 | $betterpath = 1; 253 | $align[$lastnode]->{chainweight} = $path->{chainweight}; 254 | } 255 | 256 | if (($betterpath) && (exists $align[$lastnode]->{edge})) 257 | { 258 | ## If I can keep extending, extend with all children 259 | foreach my $e (@{$align[$lastnode]->{edge}}) 260 | { 261 | my @nodes = @{$path->{nodes}}; 262 | push @nodes, $e; 263 | my $newpath; 264 | $newpath->{nodes} = \@nodes; 265 | $newpath->{dir} = $dir; 266 | 267 | $newpath->{chainstart} = $path->{chainstart}; 268 | $newpath->{chainend} = $path->{chainend}; 269 | if ($align[$e]->{rend} > $newpath->{chainend}) { $newpath->{chainend} = $align[$e]->{rend}; } 270 | 271 | my $newstart = $align[$e]->{rstart}; 272 | if ($path->{chainend} > $newstart) { $newstart = $path->{chainend}; } 273 | my $newbases = $align[$e]->{rend} - $newstart + 1; 274 | $newpath->{chainweight} = $path->{chainweight} + $newbases; 275 | 276 | push @stack, $newpath; 277 | $stackadd++; 278 | } 279 | } 280 | else 281 | { 282 | ## no place else to go, score the path 283 | my $chainstart = $path->{chainstart}; 284 | my $chainend = $path->{chainend}; 285 | my $chainspan = $chainend - $chainstart + 1; 286 | my $chainweight = $path->{chainweight}; 287 | 288 | ## override span with weight 289 | $chainspan = $chainweight; 290 | 291 | 
                if ($chainspan > $bestspani)
292 |                 {
293 |                   $bestspani = $chainspan;
294 |                   $bestpathi = $path;
295 |                 }
296 |               }
297 |             }
298 |
299 |             ## best path from node i
300 |             if ($PATHVERBOSE)
301 |             {
302 |               print "bestspani <$i$dir>\t$bestspani";
303 |
304 |               if (defined $bestpathi)
305 |               {
306 |                 my $span = $bestpathi->{chainend} - $bestpathi->{chainstart} + 1;
307 |                 print "\t|\t$bestpathi->{chainstart}\t$bestpathi->{chainend}\t[$span]\t|\t";
308 |                 foreach my $n (@{$bestpathi->{nodes}})
309 |                 {
310 |                   print "\t<$n$dir>";
311 |                 }
312 |                 print "\n";
313 |               }
314 |             }
315 |
316 |             ## check if this is the best overall
317 |             if ($bestspani > $bestspanall)
318 |             {
319 |               $bestspanall = $bestspani;
320 |               $bestpathall = $bestpathi;
321 |             }
322 |           }
323 |           $searchtime += (time() - $lasttime);
324 |         }
325 |       }
326 |
327 |       ## overall best chain between this pair of contigs
328 |       if ($VERBOSE)
329 |       {
330 |         my $clen = $contigs{$ctg}->{len};
331 |         my $qlen = $contigs{$qid}->{len};
332 |
333 |         print "bestspanall $ctg [$clen] $qid [$qlen] : $bestspanall\n";
334 |
335 |         if (defined $bestpathall)
336 |         {
337 |           my $dir = $bestpathall->{dir};
338 |           my $span = $bestpathall->{chainend} - $bestpathall->{chainstart} + 1;
339 |
340 |           print "\t$dir\t|\t$bestpathall->{chainstart}\t$bestpathall->{chainend}\t[$span]\t|\t";
341 |           foreach my $n (@{$bestpathall->{nodes}})
342 |           {
343 |             print "\t<$n$dir>";
344 |           }
345 |           print "\n";
346 |
347 |           foreach my $n (@{$bestpathall->{nodes}})
348 |           {
349 |             my $rstart = $salign{$dir}->[$n]->{rstart};
350 |             my $rend = $salign{$dir}->[$n]->{rend};
351 |             my $qstart = $salign{$dir}->[$n]->{qstart};
352 |             my $qend = $salign{$dir}->[$n]->{qend};
353 |             print "\t<$n$dir>\t$rstart\t$rend\t|\t$qstart\t$qend\n";
354 |           }
355 |
356 |           print "\n\n";
357 |         }
358 |       }
359 |
360 |       if (defined $bestpathall)
361 |       {
362 |         my $chain;
363 |         $chain->{rstart} = $bestpathall->{chainstart};
364 |         $chain->{rend} = $bestpathall->{chainend};
365 |         $chain->{qid} = $qid;
366 |
367 |         push @{$contigs{$ctg}->{chain}}, $chain;
368 |       }
369 |     }
370 |   }
371 | }
372 |
373 | my $constructtimep = sprintf("%d", $constructtime);
374 | my $searchtimep = sprintf("%d", $searchtime);
375 |
376 | print STDERR "Found $totaledges total edges [$constructtimep constructtime, $searchtimep searchtime, $stackadd stackadd]\n";
377 |
378 |
379 |
380 | ## Look for jointly contained contigs
381 | ###############################################################################
382 |
383 | print STDERR "Looking for contained contigs...\n";
384 |
385 | my $jointcontained = 0;
386 |
387 | ## process from smallest to biggest, so that bigger contigs are preferred to be kept
388 | foreach my $ctg (sort {$contigs{$a}->{len} <=> $contigs{$b}->{len}} keys %contigs)
389 | {
390 |   if (exists $contigs{$ctg}->{chain})
391 |   {
392 |     my $clen = $contigs{$ctg}->{len};
393 |
394 |     my %octgs;
395 |     my $mappedbp = 0;
396 |     my $lastend = -1;
397 |
398 |     ## Plane sweep to find non-redundant mapped bases
399 |
400 |     foreach my $ainfo (sort {$a->{rstart} <=> $b->{rstart}} @{$contigs{$ctg}->{chain}})
401 |     {
402 |       ## skip alignments to stuff that is already contained
403 |       next if (exists $contigs{$ainfo->{qid}}->{contained});
404 |
405 |       my $mstart = $ainfo->{rstart};
406 |       if ($lastend > $mstart) { $mstart = $lastend; }
407 |
408 |       if ($ainfo->{rend} > $mstart)
409 |       {
410 |         my $newmap = $ainfo->{rend} - $mstart;
411 |         $mappedbp += $newmap;
412 |         $lastend = $ainfo->{rend};
413 |         $octgs{$ainfo->{qid}} += $newmap;
414 |       }
415 |     }
416 |
417 |
418 |     ## If a large fraction of this contig is mapped, mark it contained
419 |     my $pcov = sprintf("%0.02f", 100.0 * $mappedbp / $clen);
420 |     print "# $ctg [$clen] $pcov :";
421 |
422 |     if ($pcov >= $MIN_PERC_COV)
423 |     {
424 |       $jointcontained++;
425 |
426 |       foreach my $oid (sort {$octgs{$b} <=> $octgs{$a}} keys %octgs)
427 |       {
428 |         my $olen = $contigs{$oid}->{len};
429 |         my $omap = $octgs{$oid};
430 |         print " $oid [$omap $olen]";
431 |
432 |         push @{$contigs{$ctg}->{contained}}, $oid;
433 |       }
434 |     }
435 |
436 |     print "\n";
437 |   }
438 | }
439 |
440 | print STDERR "Found $jointcontained joint contained contigs\n";
441 |
442 |
443 |
444 | ## Print final results
445 | ###############################################################################
446 |
447 | my $cnt = 0;
448 | foreach my $ctg (sort keys %contigs)
449 | {
450 |   if (exists $contigs{$ctg}->{contained})
451 |   {
452 |     $cnt++;
453 |     my $clen = $contigs{$ctg}->{len};
454 |     print "$cnt $ctg [$clen] :";
455 |     foreach my $parent (@{$contigs{$ctg}->{contained}})
456 |     {
457 |       my $plen = $contigs{$parent}->{len};
458 |       print " $parent [$plen]";
459 |     }
460 |
461 |     print "\n";
462 |   }
463 | }
464 |
465 | print STDERR "Printed $cnt total contained contigs\n";
466 |
467 |
468 |
--------------------------------------------------------------------------------
/test/run_test.output:
--------------------------------------------------------------------------------
1 | Running the simple example
2 | ===============================================================
3 | Generating pseudohaploid genome sequence
4 | ----------------------------------------
5 | GENOME: simple.fa
6 | OUTPREFIX: ph.simple
7 | MIN_IDENTITY: 90
8 | MIN_LENGTH: 1000
9 | MIN_CONTAIN: 93
10 | MAX_CHAIN_GAP: 20000
11 |
12 | 1. Aligning simple.fa to itself with nucmer
13 | Original assembly has 2 contigs
14 |
15 | 2. Filter for alignments longer than 1000 bp and below 90 identity
16 |
17 | 3. Generating coords file
18 |
19 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
20 | Processing coords file (ph.simple.filter.coords)...
21 | Processed 6 alignment records [4 valid]
22 | Finding chains for 2 contigs...
23 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd]
24 | Looking for contained contigs...
25 | Found 1 joint contained contigs
26 | Printed 1 total contained contigs
27 |
28 | 5. Generating a list of redundant contig ids using min_contain: 93
29 | Identified 1 redundant contig to remove in ph.simple.contained.ids
30 |
31 | 6. Creating final pseudohaploid assembly in ph.simple.pseudohap.fa
32 | Pseudohaploid assembly has 1 contigs
33 |
34 | This should report: Pseudohaploid assembly has 1 contigs
35 | ===============================================================
36 |
37 |
38 | Running the basic example
39 | ===============================================================
40 | Generating pseudohaploid genome sequence
41 | ----------------------------------------
42 | GENOME: basic.fa
43 | OUTPREFIX: ph.basic
44 | MIN_IDENTITY: 90
45 | MIN_LENGTH: 1000
46 | MIN_CONTAIN: 93
47 | MAX_CHAIN_GAP: 20000
48 |
49 | 1. Aligning basic.fa to itself with nucmer
50 | Original assembly has 4 contigs
51 |
52 | 2. Filter for alignments longer than 1000 bp and below 90 identity
53 |
54 | 3. Generating coords file
55 |
56 | 4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
57 | Processing coords file (ph.basic.filter.coords)...
58 | Processed 8 alignment records [4 valid]
59 | Finding chains for 4 contigs...
60 | Found 2 total edges [0.000 constructtime, 0.000 searchtime, 4 stackadd]
61 | Looking for contained contigs...
62 | Found 1 joint contained contigs
63 | Printed 1 total contained contigs
64 |
65 | 5. Generating a list of redundant contig ids using min_contain: 93
66 | Identified 1 redundant contig to remove in ph.basic.contained.ids
67 |
68 | 6. Creating final pseudohaploid assembly in ph.basic.pseudohap.fa
69 | Pseudohaploid assembly has 3 contigs
70 |
71 | This should report: Pseudohaploid assembly has 3 contigs
72 | ===============================================================
73 |
--------------------------------------------------------------------------------
/test/run_tests.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | echo "Running the simple example"
4 | echo "==============================================================="
5 | mkdir -p simple
6 | cd simple
7 | ln -sf ../simple.fa
8 | ../../create_pseudohaploid.sh simple.fa ph.simple
9 | cd ..
10 |
11 | echo
12 | echo "This should report: Pseudohaploid assembly has 1 contigs"
13 |
14 | echo "==============================================================="
15 |
16 | echo
17 | echo
18 |
19 |
20 | echo "Running the basic example"
21 | echo "==============================================================="
22 | mkdir -p basic
23 | cd basic
24 | ln -sf ../basic.fa
25 | ../../create_pseudohaploid.sh basic.fa ph.basic
26 | cd ..
27 | 28 | echo 29 | echo "This should report: Pseudohaploid assembly has 3 contigs" 30 | echo "===============================================================" 31 | -------------------------------------------------------------------------------- /test/simple.fa: -------------------------------------------------------------------------------- 1 | >ctg1 2 | AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC 3 | TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA 4 | TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC 5 | ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG 6 | CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA 7 | GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC 8 | AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG 9 | AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT 10 | GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT 11 | GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC 12 | TGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGT 13 | TACTGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCT 14 | GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGATCACATGGTGCTGATGGCAGGTTTCACCG 15 | CCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGACTACTCTGCTGCGGTGCTGGC 16 | TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGT 17 | CAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG 18 | CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATAC 19 | CGGAAATCCTCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGC 20 | ATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGG 21 | CGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGA 22 | ATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTC 23 | 
TACCTGGAACTGAAAGAAGGCTTACTGGAGCCGCTGGCAGTGACGGAACGGCTGGCCATTATCTCGGTGG 24 | TAGGTGATGGTATGCGCACCTTGCGTGGGATCTCGGCGAAATTCTTTGCCGCACTGGCCCGCGCCAATAT 25 | CAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTAAATAACGATGATGCG 26 | ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTG 27 | GCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGCTGAAGAATAAACA 28 | TATCGACTTACGTGTCTGCGGTGTTGCCAACTCGAAGGCTCTGCTCACCAATGTACATGGCCTTAATCTG 29 | GAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGA 30 | AAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGCACTTCCAGCCAGGCAGTGGCGGATCAATATGC 31 | CGACTTCCTGCGCGAAGGTTTCCACGTTGTCACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTAC 32 | TACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTG 33 | GATTACCGGTTATTGAGAACCTGCAAAATCTGCTCAATGCAGGTGATGAATTGATGAAGTTCTCCGGCAT 34 | TCTTTCTGGTTCGCTTTCTTATATCTTCGGCAAGTTAGACGAAGGCATGAGTTTCTCCGAGGCGACCACG 35 | CTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTA 36 | AACTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCT 37 | GCCCGCAGAGTTTAACGCCGAGGGTGATGTTGCCGCTTTTATGGCGAATCTGTCACAACTCGACGATCTC 38 | TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG 39 | ATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGA 40 | AAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGC 41 | AATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGAC 42 | ATGGTTAAAGTTTATGCCCCGGCTTCCAGTGCCAATATGAGCGTCGGGTTTGATGTGCTCGGGGCGGCGG 43 | TGACACCTGTTGATGGTGCATTGCTCGGAGATGTAGTCACGGTTGAGGCGGCAGAGACATTCAGTCTCAA 44 | CAACCTCGGACGCTTTGCCGATAAGCTGCCGTCAGAACCACGGGAAAATATCGTTTATCAGTGCTGGGAG 45 | CGTTTTTGCCAGGAACTGGGTAAGCAAATTCCAGTGGCGATGACCCTGGAAAAGAATATGCCGATCGGTT 46 | CGGGCTTAGGCTCCAGTGCCTGTTCGGTGGTCGCGGCGCTGATGGCGATGAATGAACACTGCGGCAAGCC 47 | GCTTAATGACACTCGTTTGCTGGCTTTGATGGGCGAGCTGGAAGGCCGTATCTCCGGCAGCATTCATTAC 48 | GACAACGTGGCACCGTGTTTTCTCGGTGGTATGCAGTTGATGATCGAAGAAAACGACATCATCAGCCAGC 49 | 
AAGTGCCAGGGTTTGATGAGTGGCTGTGGGTGCTGGCGTATCCGGGGATTAAAGTCTCGACGGCAGAAGC 50 | CAGGGCTATTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCACGGGCGACATCTGGCAGGCTTC 51 | ATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCT 52 | ACCGTGAACGGTTACTGCCAGGCTTCCGGCAGGCGCGGCAGGCGGTCGCGGAAATCGGCGCGGTAGCGAG 53 | CGGTATCTCCGGCTCCGGCCCGACCTTGTTCGCTCTGTGTGACAAGCCGGAAACCGCCCAGCGCGTTGCC 54 | GACTGGTTGGGTAAGAACTACCTGCAAAATCAGGAAGGTTTTGTTCATATTTGCCGGCTGGATACGGCGG 55 | GCGCACGAGTACTGGAAAACTAAATGAAACTCTACAATCTGAAAGATCACAACGAGCAGGTCAGCTTTGC 56 | GCAAGCCGTAACCCAGGGGTTGGGCAAAAATCAGGGGCTGTTTTTTCCGCACGACCTGCCGGAATTCAGC 57 | CTGACTGAAATTGATGAGATGCTGAAGCTGGATTTTGTCACCCGCAGTGCGAAGATCCTCTCGGCGTTTA 58 | TTGGTGATGAAATCCCACAGGAAATCCTGGAAGAGCGCGTGCGCGCGGCGTTTGCCTTCCCGGCTCCGGT 59 | CGCCAATGTTGAAAGCGATGTCGGTTGTCTGGAATTGTTCCACGGGCCAACGCTGGCATTTAAAGATTTC 60 | GGCGGTCGCTTTATGGCACAAATGCTGACCCATATTGCGGGTGATAAGCCAGTGACCATTCTGACCGCGA 61 | CCTCCGGTGATACCGGAGCGGCAGTGGCTCATGCTTTCTACGGTTTACCGAATGTGAAAGTGGTTATCCT 62 | CTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTGTTCTGTACATTGGGCGGCAATATCGAAACT 63 | GTTGCCATCGACGGCGATTTCGATGCCTGTCAGGCGCTGGTGAAGCAGGCGTTTGATGATGAAGAACTGA 64 | AAGTGGCGCTAGGGTTAAACTCGGCTAACTCGATTAACATCAGCCGTTTGCTGGCGCAGATTTGCTACTA 65 | CTTTGAAGCTGTTGCGCAGCTGCCGCAGGAGACGCGCAACCAGCTGGTTGTCTCGGTGCCAAGCGGAAAC 66 | TTCGGCGATTTGACGGCGGGTCTGCTGGCGAAGTCACTCGGTCTGCCGGTGAAACGTTTTATTGCTGCGA 67 | CCAACGTGAACGATACCGTGCCACGTTTCCTGCACGACGGTCAGTGGTCACCCAAAGCGACTCAGGCGAC 68 | GTTATCCAACGCGATGGACGTGAGTCAGCCGAACAACTGGCCGCGTGTGGAAGAGTTGTTCCGCCGCAAA 69 | ATCTGGCAACTGAAAGAGCTGGGTTATGCAGCCGTGGATGATGAAACCACGCAACAGACAATGCGTGAGT 70 | TAAAAGAACTGGGCTACACTTCGGAGCCGCACGCTGCCGTAGCTTATCGTGCGCTGCGTGATCAGTTGAA 71 | TCCAGGCGAATATGGCTTGTTCCTCGGCACCGCGCATCCGGCGAAATTTAAAGAGAGCGTGGAAGCGATT 72 | CTCGGTGAAACGTTGGATCTGCCAAAAGAGCTGGCAGAACGTGCTGATTTACCCTTGCTTTCACATAATC 73 | TGCCCGCCGATTTTGCTGCGTTGCGTAAATTGATGATGAATCATCAGTAAAATCTATTCATTATCTCAAT 74 | CAGGCCGGGTTTGCTTTTATGCAGCCCGGCTTTTTTATGAAGAAATTATGGAGAAAAATGACAGGGAAAA 75 | 
AGGAGAAATTCTCAATAAATGCGGTAACTTAGAGATTAGGATTGCGGAGAATAACAACCGCCGTTCTCAT 76 | CGAGTAATCTCCGGATATCGACCCATAACGGGCAATGATAAAAGGAGTAACCTGTGAAAAAGATGCAATC 77 | TATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTC 78 | CCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCG 79 | ACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACC 80 | GCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA 81 | ATGACAAATGCCGGGTAACAATCCGGCATTCAGCGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTAC 82 | GTTAATTCTGCAATATATTGAATCTGCATGCTTTTGTAGGCAGGATAAGGCGTTCACGCCGCATCCGGCA 83 | TTGACTGCAAACTTAACGCTGCTCGTAGCGTTTAAACACCAGTTCGCCATTGCTGGAGGAATCTTCATCA 84 | AAGAAGTAACCTTCGCTATTAAAACCAGTCAGTTGCTCTGGTTTGGTCAGCCGATTTTCAATAATGAAAC 85 | GACTCATCAGACCGCGTGCTTTCTTAGCGTAGAAGCTGATGATCTTAAATTTGCCGTTCTTCTCATCGAG 86 | GAACACCGGCTTGATAATCTCGGCATTCAATTTCTTCGGCTTCACCGATTTAAAATACTCATCTGACGCC 87 | AGATTAATCACCACATTATCGCCTTGTGCTGCGAGCGCCTCGTTCAGCTTGTTGGTGATGATATCTCCCC 88 | AGAATTGATACAGATCTTTCCCTCGGGCATTCTCAAGACGGATCCCCATTTCCAGACGATAAGGCTGCAT 89 | TAAATCGAGCGGGCGGAGTACGCCATACAAGCCGGAAAGCATTCGCAAATGCTGTTGGGCAAAATCGAAA 90 | TCGTCTTCGCTGAAGGTTTCGGCCTGCAAGCCGGTGTAGACATCACCTTTAAACGCCAGAATCGCCTGGC 91 | GGGCATTCGCCGGCGTGAAATCTGGCTGCCAGTCATGAAAGCGAGCGGCGTTGATACCCGCCAGTTTGTC 92 | GCTGATGCGCATCAGCGTGCTAATCTGCGGAGGCGTCAGTTTCCGCGCCTCATGGATCAACTGCTGGGAA 93 | TTGTCTAACAGCTCCGGCAGCGTATAGCGCGTGGTGGTCAACGGGCTTTGGTAATCAAGCGTTTTCGCAG 94 | GTGAAATAAGAATCAGCATATCCAGTCCTTGCAGGAAATTTATGCCGACTTTAGCAAAAAATGAGAATGA 95 | GTTGATCGATAGTTGTGATTACTCCTGCGAAACATCATCCCACGCGTCCGGAGAAAGCTGGCGACCGATA 96 | TCCGGATAACGCAATGGATCAAACACCGGGCGCACGCCGAGTTTACGCTGGCGTAGATAATCACTGGCAA 97 | TGGTATGAACCACAGGCGAGAGCAGTAAAATGGCGGTCAAATTGGTAATAGCCATGCAGGCCATTATGAT 98 | ATCTGCCAGTTGCCACATCAGCGGAAGGCTTAGCAAGGTGCCGCCGATGACCGTTGCGAAGGTGCAGATC 99 | CGCAAACACCAGATCGCTTTAGGGTTGTTCAGGCGTAAAAAGAAGAGATTGTTTTCGGCATAAATGTAGT 100 | TGGCAACGATGGAGCTGAAGGCAAACAGAATAACCACAAGGGTAACAAACTCAGCACCCCAGGAACCCAT 101 | 
TAGCACCCGCATCGCCTTCTGGATAAGCTGAATACCTTCCAGCGGCATGTAGGTTGTGCCGTTACCCGCC 102 | AGTAATATCAGCATGGCGCTTGCCGTACAGATGACCAGGGTGTCGATAAAAATGCCAATCATCTGGACAA 103 | TCCCTTGCGCTGCCGGATGCGGAGGCCAGGACGCCGCTGCCGCTGCCGCGTTTGGCGTCGAACCCATTCC 104 | CGCCTCATTGGAAAACATACTGCGCTGAAAACCGTTAGTAATCGCCTGGCTTAAGGTATATCCCGCCGCG 105 | CCGCCTGCCGCTTCCTGCCAGCCAAAAGCACTCTCAAAAATAGACCAAATGACGTGGGGAAGTTGCCCGA 106 | TATTCATTACGCAAATTACCAGGCTGGTCAGTACCCAGATTATCGCCATCAACGGGACAAAGCCCTGCAT 107 | GAGCCGGGCGACGCCATGAAGACCGCGAGTGATTGCCAGCAGAGTAAAGACAGCGAGAATAATGCCTGTC 108 | ACCAGCGGGGGAAAATCAAAAGAAAAACTCAGGGCGCGGGCAACGGCGTTCGCTTGAACTCCGCTGAAAA 109 | TTATGCCATAGGCGATGAGCAAAAAGACGGCGAACAGAACGCCCATCCAGCGCATCCCCAGCCCGCGCGC 110 | CATATACCATGCCGGTCCGCCACGAAACTGCCCATTGACGTCACGTTCTTTATAAAGTTGTGCCAGAGAA 111 | CATTCGGCAAACGAGGTCGCCATGCCGATAAACGCGGCAACCCACATCCAAAAGACGGCTCCAGGTCCAC 112 | CGGCGGTAATAGCCAGCGCAACGCCGGCCAGGTTGCCGCTACCCACGCGCGCCGCAAGACTGGTACACAA 113 | TGACTGAAATGAGGTTAAACCGCCTGGCTGTGGATGAATGCTATTTTTAAGACTTTTGCCAAACTGGCGG 114 | ATGTAGCGAAACTGCACAAATCCGGTGCGAAAAGTGAACCAACAACCTGCGCCGAAGAGCAGGTAAATCA 115 | TTACCGATCCCCAAAGGACGCTGTTAATGAAGGAGAAAAAATCTGGCATGCATATCCCTCTTATTGCCGG 116 | TCGCGATGACTTTCCTGTGTAAACGTTACCAATTGTTTAAGAAGTATATACGCTACGAGGTACTTGATAA 117 | CTTCTGCGTAGCATACATGAGGTTTTGTATAAAAATGGCGGGCGATATCAACGCAGTGTCAGAAATCCGA 118 | AACAGTCTCGCCTGGCGATAACCGTCTTGTCGGCGGTTGCGCTGACGTTGCGTCGTGATATCATCAGGGC 119 | AGACCGGTTACATCCCCCTAACAAGCTGTTTAAAGAGAAATACTATCATGACGGACAAATTGACCTCCCT 120 | TCGTCAGTACACCACCGTAGTGGCCGACACTGGGGACATCGCGGCAATGAAGCTGTATCAACCGCAGGAT 121 | GCCACAACCAACCCTTCTCTCATTCTTAACGCAGCGCAGATTCCGGAATACCGTAAGTTGATTGATGATG 122 | CTGTCGCCTGGGCGAAACAGCAGAGCAACGATCGCGCGCAGCAGATCGTGGACGCGACCGACAAACTGGC 123 | AGTAAATATTGGTCTGGAAATCCTGAAACTGGTTCCGGGCCGTATCTCAACTGAAGTTGATGCGCGTCTT 124 | TCCTATGACACCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGCTGGTATTA 125 | GCAACGATCGTATTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGCAGAACAGCTGGAAAA 126 | AGAAGGCATCAACTGTAACCTGACCCTGCTGTTCTCCTTCGCTCAGGCTCGTGCTTGTGCGGAAGCGGGC 127 
| GTGTTCCTGATCTCGCCGTTTGTTGGCCGTATTCTTGACTGGTACAAAGCGAATACCGATAAGAAAGAGT 128 | ACGCTCCGGCAGAAGATCCGGGCGTGGTTTCTGTATCTGAAATCTACCAGTACTACAAAGAGCACGGTTA 129 | TGAAACCGTGGTTATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAACTGGCAGGCTGCGACCGT 130 | CTGACCATCGCACCGGCACTGCTGAAAGAGCTGGCGGAGAGCGAAGGGGCTATCGAACGTAAACTGTCTT 131 | ACACCGGCGAAGTGAAAGCGCGTCCGGCGCGTATCACTGAGTCCGAGTTCCTGTGGCAGCACAACCAGGA 132 | TCCAATGGCAGTAGATAAACTGGCGGAAGGTATCCGTAAGTTTGCTATTGACCAGGAAAAACTGGAAAAA 133 | ATGATCGGCGATCTGCTGTAATCATTCTTAGCGTGACCGGGAAGTCGGTCACGCTACCTCTTCTGAAGCC 134 | TGTCTGTCACTCCCTTCGCAGTGTATCATTCTGTTTAACGAGACTGTTTAAACGGAAAAATCTTGATGAA 135 | TACTTTACGTATTGGCTTAGTTTCCATCTCTGATCGCGCATCCAGCGGCGTTTATCAGGATAAAGGCATC 136 | CCTGCGCTGGAAGAATGGCTGACATCGGCGCTAACCACGCCGTTTGAACTGGAAACCCGCTTAATCCCCG 137 | ATGAGCAGGCGATCATCGAGCAAACGTTGTGTGAGCTGGTGGATGAAATGAGTTGCCATCTGGTGCTCAC 138 | CACGGGCGGAACTGGCCCGGCGCGTCGTGACGTAACGCCCGATGCGACGCTGGCAGTAGCGGACCGCGAG 139 | ATGCCTGGCTTTGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACCAACTGCGATCCTTTCGCGTC 140 | AGGTGGGCGTGATTCGCAAACAGGCGCTGATCCTTAACTTACCCGGTCAGCCGAAGTCTATTAAAGAGAC 141 | GCTGGAAGGTGTGAAGGACGCTGAGGGTAACGTTGTGGTACACGGTATTTTTGCCAGCGTACCGTACTGC 142 | ATTCAGTTGCTGGAAGGGCCATACGTTGAAACGGCACCGGAAGTGGTTGCAGCATTCAGACCGAAGAGTG 143 | CAAGACGCGACGTTAGCGAATAAAAAAATCCCCCCGAGCGGGGGGATCTCAAAACAATTAGTGGGATTCA 144 | CCAATCGGCAGAACGGTGCGACCAAACTGCTCGTTCAGTACTTCACCCATCGCCAGATAGATTGCGCTGG 145 | CACCGCAGATCAGCCCAATCCAGCCGGCAAAGTGGATGATTGCGGCGTTACCGGCAATGTTACCGATCGC 146 | CAGCAGGGCAAACAGCACGGTCAGGCTAAAGAAAACGAATTGCAGAACGCGTGCGCCTTTCAGCGTGCCG 147 | AAGAACATAAACAGCGTAAATACGCCCCACAGACCCAGGTAGACACCAAGGAACTGTGCATTTGGCGCAT 148 | CGGTCAGACCCAGTTTCGGCATCAGCAGAATCGCAACCAGCGTCAGCCAGAAAGAACCGTAAGAGGTGAA 149 | TGCGGTTAAACCGAAAGTGTTGCCTTTTTTGTACTCCAGCAGACCAGCAAAAATTTGCGCGATGCCGCCG 150 | TAGAAAATGCCCATGGCAAGAATAATACCGTCCAGAGCGAAATAACCCACGTTGTGCAGGTTAAGCAGAA 151 | TGGTGGTCATGCCGAAGCCCATCAGGCCCAGCGGTGCCGGATTAGCCAACTTAGTGTTGCCCATAATTCC 152 | TCAAAAATCATCATCGAATGAATGGTGAAATAATTTCCCTGAATAACTGTAGTGTTTTCAGGGCGCGGCA 
153 | TAATAATCAGCCAGTGGGGCAGTGTCTACGATCTTTTGAGGGGAAAATGAAAATTTTCCCCGGTTTCCGG 154 | TATCAGACCTGAGTGGCGCTAACCATCCGGCGCAGGCAGGCGATTTGCAGTACGGCTGGAATCGTCACGC 155 | GATAGGCGCTGCCGCTGACCGCTTTAACCCCATTTAGTGCCGCACCTACAGGGCCTCCCAGCCCCGCGCC 156 | GCGCAGCAAACCATGCCCAAGTACGCTCATTGCTGCGTGGGTGCGTAAAATGCGGGTCAGTTGGCTGGAA 157 | AGCAAATGCGACACACCTTTTGCCAATAATTTGTCTTTCATCAGCAGCGGCAGCAGCTCTTCCAGCTCAT 158 | TCACCCTGGCATCGACCGCGTGCAGAAACTCCTGCTTATGTTCCTCGTCCATTTTCTTCCAGGTATTACG 159 | CAGAAATTGTTCCAGTAACTGTTGCTCAATTTCAAACGTAGACATCTCTTTGTCGGCTTTCAGCTTCAAT 160 | CGCTTTGAAACATCGAGCAAAATGGCCCGATACAATTTACCGTGTCCGCGCAGTTTGTTGGCGATACTAT 161 | CGCCACCAAAATGCTGTAATTCTCCGGCAATCAGCTGCCAGTTGCGGCGATGTTGCTCGGGATGCCCTTC 162 | CATCGATTTAAACAGTTCGTTGCGCATCAGTACGCTGGAGAGGCGAGTTTTGCCTTTTTCATTATGGGTG 163 | AGCAATCGGGCGAAATTTGCCAACTGTTCCTCACTACAATGCTGAAGAAAATCCAGATCTGAATCATTCA 164 | GGTAATTAACATTCATTTTTTGTGGCTTCTATATTCTGGCGTTAGTCGTCGCCGATAATTTTCAGCGTGG 165 | CCATATCCGATGAGTTCACCGTATGACCCGAAAAGGTGATTTTTGAGACGCAGCGTTTATTGTCGTTATC 166 | GCTGTTAATGTTGATCCAGTCAGTGGTTTGCCCTTCTTTTATTTCTGAAGGAATATTCAGGCTCTGACTG 167 | GCGCTACGGGCGGCTTTGAAATAAACCGATGCACCGCTTAACTGTAAATCGCCATGGTCGGCAGAGAGTT 168 | GTATGCGTTTCACAATGCGACAAACAGGAAGTTTCAGCGCCAGATCGTTGGTTTCGTTACGCGGCATTGC 169 | AATGGCGCCGAGGAGTTTATGGTCGTTTGCCTGCGCCGTGCAGCACAGCATCAGGCTAATCGCCAGGCTG 170 | GCGGAAATCGTAAAAACGGATTTCATAAGGATTCTCTTAGTGGGAAGAGGTAGGGGGATGAATACCCACT 171 | AGTTTACTGCTGATAAAGAGAAGATTCAGGCACGTAATCTTTTCTTTTTATTACAATTTTTTGATGAATG 172 | CCTTGGCTGCGATTCATTCTTTATATGAATAAAATTGCTGTCAATTTTACGTCTTGTCCTGCCATATCGC 173 | GAAATTTCTGCGCAAAAGCACAAAAAATTTTTGCATCTCCCCCTTGATGACGTGGTTTACGACCCCATTT 174 | AGTAGTCAACCGCAGTGAGTGAGTCTGCAAAAAAATGAAATTGGGCAGTTGAAACCAGACGTTTCGCCCC 175 | TATTACAGACTCACAACCACATGATGACCGAATATATAGTGGAGACGTTTAGATGGGTAAAATAATTGGT 176 | ATCGACCTGGGTACTACCAACTCTTGTGTAGCGATTATGGATGGCACCACTCCTCGCGTGCTGGAGAACG 177 | CCGAAGGCGATCGCACCACGCCTTCTATCATTGCCTATACCCAGGATGGTGAAACTCTAGTTGGTCAGCC 178 | 
GGCTAAACGTCAGGCAGTGACGAACCCGCAAAACACTCTGTTTGCGATTAAACGCCTGATTGGTCGCCGC 179 | TTCCAGGACGAAGAAGTACAGCGTGATGTTTCCATCATGCCGTTCAAAATTATTGCTGCTGATAACGGCG 180 | ACGCATGGGTCGAAGTTAAAGGCCAGAAAATGGCACCGCCGCAGATTTCTGCTGAAGTGCTGAAAAAAAT 181 | GAAGAAAACCGCTGAAGATTACCTGGGTGAACCGGTAACTGAAGCTGTTATCACCGTACCGGCATACTTT 182 | AACGATGCTCAGCGTCAGGCAACCAAAGACGCAGGCCGTATCGCTGGTCTGGAAGTAAAACGTATCATCA 183 | ACGAACCGACCGCAGCTGCGCTGGCTTACGGTCTGGACAAAGGCACTGGCAACCGTACTATCGCGGTTTA 184 | TGACCTGGGTGGTGGTACTTTCGATATTTCTATTATCGAAATCGACGAAGTTGACGGCGAAAAAACCTTC 185 | GAAGTTCTGGCAACCAACGGTGATACCCACCTGGGGGGTGAAGACTTCGACAGCCGTCTGATCAACTATC 186 | TGGTTGAAGAATTCAAGAAAGATCAGGGCATTGACCTGCGCAACGATCCGCTGGCAATGCAGCGCCTGAA 187 | AGAAGCGGCAGAAAAAGCGAAAATCGAACTGTCTTCCGCTCAGCAGACCGACGTTAACCTGCCATACATC 188 | ACTGCAGACGCGACCGGTCCGAAACACATGAACATCAAAGTGACTCGTGCGAAACTGGAAAGCCTGGTTG 189 | AAGATCTGGTAAACCGTTCCATTGAGCCGCTGAAAGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGA 190 | TATCGACGACGTTATCCTCGTTGGTGGTCAGACTCGTATGCCAATGGTTCAGAAGAAAGTTGCTGAGTTC 191 | TTTGGTAAAGAGCCGCGTAAAGACGTTAACCCGGACGAAGCTGTAGCAATCGGTGCTGCTGTTCAGGGTG 192 | GTGTTCTGACTGGTGACGTAAAAGACGTACTGCTGCTGGACGTTACCCCGCTGTCTCTGGGTATCGAAAC 193 | CATGGGCGGTGTGATGACGACGCTGATCGCGAAAAACACCACTATCCCGACCAAGCACAGCCAGGTGTTC 194 | TCTACCGCTGAAGACAACCAGTCTGCGGTAACCATCCATGTGCTGCAGGGTGAACGTAAACGTGCGGCTG 195 | ATAACAAATCTCTGGGTCAGTTCAACCTAGATGGTATCAACCCGGCACCGCGCGGCATGCCGCAGATCGA 196 | AGTTACCTTCGATATCGATGCTGACGGTATCCTGCACGTTTCCGCGAAAGATAAAAACAGCGGTAAAGAG 197 | CAGAAGATCACCATCAAGGCTTCTTCTGGTCTGAACGAAGATGAAATCCAGAAAATGGTACGCGACGCAG 198 | AAGCTAACGCCGAAGCTGACCGTAAGTTTGAAGAGCTGGTACAGACTCGCAACCAGGGCGACCATCTGCT 199 | GCACAGCACCCGTAAGCAGGTTGAAGAAGCAGGCGACAAACTGCCGGCTGACGACAAAACTGCTATCGAG 200 | TCTGCGCTGACTGCACTGGAAACTGCTCTGAAAGGTGAAGACAAAGCCGCTATCGAAGCGAAAATGCAGG 201 | >ctg2 202 | CTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTA 203 | AACTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCT 204 | 
GCCCGCAGAGTTTAACGCCGAGGGTGATGTTGCCGCTTTTATGGCGAATCTGTCACAACTCGACGATCTC 205 | TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG 206 | ATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGA 207 | AAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGC 208 | AATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGAC 209 | ATGGTTAAAGTTTATGCCCCGGCTTCCAGTGCCAATATGAGCGTCGGGTTTGATGTGCTCGGGGCGGCGG 210 | TGACACCTGTTGATGGTGCATTGCTCGGAGATGTAGTCACGGTTGAGGCGGCAGAGACATTCAGTCTCAA 211 | CAACCTCGGACGCTTTGCCGATAAGCTGCCGTCAGAACCACGGGAAAATATCGTTTATCAGTGCTGGGAG 212 | CGTTTTTGCCAGGAACTGGGTAAGCAAATTCCAGTGGCGATGACCCTGGAAAAGAATATGCCGATCGGTT 213 | CGGGCTTAGGCTCCAGTGCCTGTTCGGTGGTCGCGGCGCTGATGGCGATGAATGAACACTGCGGCAAGCC 214 | GCTTAATGACACTCGTTTGCTGGCTTTGATGGGCGAGCTGGAAGGCCGTATCTCCGGCAGCATTCATTAC 215 | GACAACGTGGCACCGTGTTTTCTCGGTGGTATGCAGTTGATGATCGAAGAAAACGACATCATCAGCCAGC 216 | AAGTGCCAGGGTTTGATGAGTGGCTGTGGGTGCTGGCGTATCCGGGGATTAAAGTCTCGACGGCAGAAGC 217 | CAGGGCTATTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCACGGGCGACATCTGGCAGGCTTC 218 | ATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCT 219 | ACCGTGAACGGTTACTGCCAGGCTTCCGGCAGGCGCGGCAGGCGGTCGCGGAAATCGGCGCGGTAGCGAG 220 | CGGTATCTCCGGCTCCGGCCCGACCTTGTTCGCTCTGTGTGACAAGCCGGAAACCGCCCAGCGCGTTGCC 221 | GACTGGTTGGGTAAGAACTACCTGCAAAATCAGGAAGGTTTTGTTCATATTTGCCGGCTGGATACGGCGG 222 | GCGCACGAGTACTGGAAAACTAAATGAAACTCTACAATCTGAAAGATCACAACGAGCAGGTCAGCTTTGC 223 | GCAAGCCGTAACCCAGGGGTTGGGCAAAAATCAGGGGCTGTTTTTTCCGCACGACCTGCCGGAATTCAGC 224 | CTGACTGAAATTGATGAGATGCTGAAGCTGGATTTTGTCACCCGCAGTGCGAAGATCCTCTCGGCGTTTA 225 | TTGGTGATGAAATCCCACAGGAAATCCTGGAAGAGCGCGTGCGCGCGGCGTTTGCCTTCCCGGCTCCGGT 226 | CGCCAATGTTGAAAGCGATGTCGGTTGTCTGGAATTGTTCCACGGGCCAACGCTGGCATTTAAAGATTTC 227 | GGCGGTCGCTTTATGGCACAAATGCTGACCCATATTGCGGGTGATAAGCCAGTGACCATTCTGACCGCGA 228 | CCTCCGGTGATACCGGAGCGGCAGTGGCTCATGCTTTCTACGGTTTACCGAATGTGAAAGTGGTTATCCT 229 | CTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTGTTCTGTACATTGGGCGGCAATATCGAAACT 230 
| GTTGCCATCGACGGCGATTTCGATGCCTGTCAGGCGCTGGTGAAGCAGGCGTTTGATGATGAAGAACTGA 231 | AAGTGGCGCTAGGGTTAAACTCGGCTAACTCGATTAACATCAGCCGTTTGCTGGCGCAGATTTGCTACTA 232 | CTTTGAAGCTGTTGCGCAGCTGCCGCAGGAGACGCGCAACCAGCTGGTTGTCTCGGTGCCAAGCGGAAAC 233 | TTCGGCGATTTGACGGCGGGTCTGCTGGCGAAGTCACTCGGTCTGCCGGTGAAACGTTTTATTGCTGCGA 234 | CCAACGTGAACGATACCGTGCCACGTTTCCTGCACGACGGTCAGTGGTCACCCAAAGCGACTCAGGCGAC 235 | GTTATCCAACGCGATGGACGTGAGTCAGCCGAACAACTGGCCGCGTGTGGAAGAGTTGTTCCGCCGCAAA 236 | ATCTGGCAACTGAAAGAGCTGGGTTATGCAGCCGTGGATGATGAAACCACGCAACAGACAATGCGTGAGT 237 | TAAAAGAACTGGGCTACACTTCGGAGCCGCACGCTGCCGTAGCTTATCGTGCGCTGCGTGATCAGTTGAA 238 | TCCAGGCGAATATGGCTTGTTCCTCGGCACCGCGCATCCGGCGAAATTTAAAGAGAGCGTGGAAGCGATT 239 | CTCGGTGAAACGTTGGATCTGCCAAAAGAGCTGGCAGAACGTGCTGATTTACCCTTGCTTTCACATAATC 240 | TGCCCGCCGATTTTGCTGCGTTGCGTAAATTGATGATGAATCATCAGTAAAATCTATTCATTATCTCAAT 241 | CAGGCCGGGTTTGCTTTTATGCAGCCCGGCTTTTTTATGAAGAAATTATGGAGAAAAATGACAGGGAAAA 242 | AGGAGAAATTCTCAATAAATGCGGTAACTTAGAGATTAGGATTGCGGAGAATAACAACCGCCGTTCTCAT 243 | CGAGTAATCTCCGGATATCGACCCATAACGGGCAATGATAAAAGGAGTAACCTGTGAAAAAGATGCAATC 244 | TATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTC 245 | CCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCG 246 | ACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACC 247 | GCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA 248 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 249 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 250 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 251 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 252 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 253 | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 254 | ATGACAAATGCCGGGTAACAATCCGGCATTCAGCGCCTGATGCGACGCTGGCGCGTCTTATCAGGCCTAC 255 | GTTAATTCTGCAATATATTGAATCTGCATGCTTTTGTAGGCAGGATAAGGCGTTCACGCCGCATCCGGCA 
256 | TTGACTGCAAACTTAACGCTGCTCGTAGCGTTTAAACACCAGTTCGCCATTGCTGGAGGAATCTTCATCA 257 | AAGAAGTAACCTTCGCTATTAAAACCAGTCAGTTGCTCTGGTTTGGTCAGCCGATTTTCAATAATGAAAC 258 | GACTCATCAGACCGCGTGCTTTCTTAGCGTAGAAGCTGATGATCTTAAATTTGCCGTTCTTCTCATCGAG 259 | GAACACCGGCTTGATAATCTCGGCATTCAATTTCTTCGGCTTCACCGATTTAAAATACTCATCTGACGCC 260 | AGATTAATCACCACATTATCGCCTTGTGCTGCGAGCGCCTCGTTCAGCTTGTTGGTGATGATATCTCCCC 261 | AGAATTGATACAGATCTTTCCCTCGGGCATTCTCAAGACGGATCCCCATTTCCAGACGATAAGGCTGCAT 262 | TAAATCGAGCGGGCGGAGTACGCCATACAAGCCGGAAAGCATTCGCAAATGCTGTTGGGCAAAATCGAAA 263 | TCGTCTTCGCTGAAGGTTTCGGCCTGCAAGCCGGTGTAGACATCACCTTTAAACGCCAGAATCGCCTGGC 264 | GGGCATTCGCCGGCGTGAAATCTGGCTGCCAGTCATGAAAGCGAGCGGCGTTGATACCCGCCAGTTTGTC 265 | GCTGATGCGCATCAGCGTGCTAATCTGCGGAGGCGTCAGTTTCCGCGCCTCATGGATCAACTGCTGGGAA 266 | TTGTCTAACAGCTCCGGCAGCGTATAGCGCGTGGTGGTCAACGGGCTTTGGTAATCAAGCGTTTTCGCAG 267 | GTGAAATAAGAATCAGCATATCCAGTCCTTGCAGGAAATTTATGCCGACTTTAGCAAAAAATGAGAATGA 268 | GTTGATCGATAGTTGTGATTACTCCTGCGAAACATCATCCCACGCGTCCGGAGAAAGCTGGCGACCGATA 269 | TCCGGATAACGCAATGGATCAAACACCGGGCGCACGCCGAGTTTACGCTGGCGTAGATAATCACTGGCAA 270 | TGGTATGAACCACAGGCGAGAGCAGTAAAATGGCGGTCAAATTGGTAATAGCCATGCAGGCCATTATGAT 271 | ATCTGCCAGTTGCCACATCAGCGGAAGGCTTAGCAAGGTGCCGCCGATGACCGTTGCGAAGGTGCAGATC 272 | CGCAAACACCAGATCGCTTTAGGGTTGTTCAGGCGTAAAAAGAAGAGATTGTTTTCGGCATAAATGTAGT 273 | TGGCAACGATGGAGCTGAAGGCAAACAGAATAACCACAAGGGTAACAAACTCAGCACCCCAGGAACCCAT 274 | TAGCACCCGCATCGCCTTCTGGATAAGCTGAATACCTTCCAGCGGCATGTAGGTTGTGCCGTTACCCGCC 275 | AGTAATATCAGCATGGCGCTTGCCGTACAGATGACCAGGGTGTCGATAAAAATGCCAATCATCTGGACAA 276 | TCCCTTGCGCTGCCGGATGCGGAGGCCAGGACGCCGCTGCCGCTGCCGCGTTTGGCGTCGAACCCATTCC 277 | CGCCTCATTGGAAAACATACTGCGCTGAAAACCGTTAGTAATCGCCTGGCTTAAGGTATATCCCGCCGCG 278 | CCGCCTGCCGCTTCCTGCCAGCCAAAAGCACTCTCAAAAATAGACCAAATGACGTGGGGAAGTTGCCCGA 279 | TATTCATTACGCAAATTACCAGGCTGGTCAGTACCCAGATTATCGCCATCAACGGGACAAAGCCCTGCAT 280 | GAGCCGGGCGACGCCATGAAGACCGCGAGTGATTGCCAGCAGAGTAAAGACAGCGAGAATAATGCCTGTC 281 | 
ACCAGCGGGGGAAAATCAAAAGAAAAACTCAGGGCGCGGGCAACGGCGTTCGCTTGAACTCCGCTGAAAA 282 | TTATGCCATAGGCGATGAGCAAAAAGACGGCGAACAGAACGCCCATCCAGCGCATCCCCAGCCCGCGCGC 283 | CATATACCATGCCGGTCCGCCACGAAACTGCCCATTGACGTCACGTTCTTTATAAAGTTGTGCCAGAGAA 284 | CATTCGGCAAACGAGGTCGCCATGCCGATAAACGCGGCAACCCACATCCAAAAGACGGCTCCAGGTCCAC 285 | CGGCGGTAATAGCCAGCGCAACGCCGGCCAGGTTGCCGCTACCCACGCGCGCCGCAAGACTGGTACACAA 286 | TGACTGAAATGAGGTTAAACCGCCTGGCTGTGGATGAATGCTATTTTTAAGACTTTTGCCAAACTGGCGG 287 | ATGTAGCGAAACTGCACAAATCCGGTGCGAAAAGTGAACCAACAACCTGCGCCGAAGAGCAGGTAAATCA 288 | TTACCGATCCCCAAAGGACGCTGTTAATGAAGGAGAAAAAATCTGGCATGCATATCCCTCTTATTGCCGG 289 | TCGCGATGACTTTCCTGTGTAAACGTTACCAATTGTTTAAGAAGTATATACGCTACGAGGTACTTGATAA 290 | CTTCTGCGTAGCATACATGAGGTTTTGTATAAAAATGGCGGGCGATATCAACGCAGTGTCAGAAATCCGA 291 | AACAGTCTCGCCTGGCGATAACCGTCTTGTCGGCGGTTGCGCTGACGTTGCGTCGTGATATCATCAGGGC 292 | AGACCGGTTACATCCCCCTAACAAGCTGTTTAAAGAGAAATACTATCATGACGGACAAATTGACCTCCCT 293 | TCGTCAGTACACCACCGTAGTGGCCGACACTGGGGACATCGCGGCAATGAAGCTGTATCAACCGCAGGAT 294 | GCCACAACCAACCCTTCTCTCATTCTTAACGCAGCGCAGATTCCGGAATACCGTAAGTTGATTGATGATG 295 | CTGTCGCCTGGGCGAAACAGCAGAGCAACGATCGCGCGCAGCAGATCGTGGACGCGACCGACAAACTGGC 296 | AGTAAATATTGGTCTGGAAATCCTGAAACTGGTTCCGGGCCGTATCTCAACTGAAGTTGATGCGCGTCTT 297 | TCCTATGACACCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGCTGGTATTA 298 | GCAACGATCGTATTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGCAGAACAGCTGGAAAA 299 | AGAAGGCATCAACTGTAACCTGACCCTGCTGTTCTCCTTCGCTCAGGCTCGTGCTTGTGCGGAAGCGGGC 300 | GTGTTCCTGATCTCGCCGTTTGTTGGCCGTATTCTTGACTGGTACAAAGCGAATACCGATAAGAAAGAGT 301 | ACGCTCCGGCAGAAGATCCGGGCGTGGTTTCTGTATCTGAAATCTACCAGTACTACAAAGAGCACGGTTA 302 | TGAAACCGTGGTTATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAACTGGCAGGCTGCGACCGT 303 | CTGACCATCGCACCGGCACTGCTGAAAGAGCTGGCGGAGAGCGAAGGGGCTATCGAACGTAAACTGTCTT 304 | ACACCGGCGAAGTGAAAGCGCGTCCGGCGCGTATCACTGAGTCCGAGTTCCTGTGGCAGCACAACCAGGA 305 | TCCAATGGCAGTAGATAAACTGGCGGAAGGTATCCGTAAGTTTGCTATTGACCAGGAAAAACTGGAAAAA 306 | ATGATCGGCGATCTGCTGTAATCATTCTTAGCGTGACCGGGAAGTCGGTCACGCTACCTCTTCTGAAGCC 307 
| TGTCTGTCACTCCCTTCGCAGTGTATCATTCTGTTTAACGAGACTGTTTAAACGGAAAAATCTTGATGAA 308 | TACTTTACGTATTGGCTTAGTTTCCATCTCTGATCGCGCATCCAGCGGCGTTTATCAGGATAAAGGCATC 309 | CCTGCGCTGGAAGAATGGCTGACATCGGCGCTAACCACGCCGTTTGAACTGGAAACCCGCTTAATCCCCG 310 | ATGAGCAGGCGATCATCGAGCAAACGTTGTGTGAGCTGGTGGATGAAATGAGTTGCCATCTGGTGCTCAC 311 | CACGGGCGGAACTGGCCCGGCGCGTCGTGACGTAACGCCCGATGCGACGCTGGCAGTAGCGGACCGCGAG 312 | ATGCCTGGCTTTGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACCAACTGCGATCCTTTCGCGTC 313 | AGGTGGGCGTGATTCGCAAACAGGCGCTGATCCTTAACTTACCCGGTCAGCCGAAGTCTATTAAAGAGAC 314 | GCTGGAAGGTGTGAAGGACGCTGAGGGTAACGTTGTGGTACACGGTATTTTTGCCAGCGTACCGTACTGC 315 | ATTCAGTTGCTGGAAGGGCCATACGTTGAAACGGCACCGGAAGTGGTTGCAGCATTCAGACCGAAGAGTG 316 | CAAGACGCGACGTTAGCGAATAAAAAAATCCCCCCGAGCGGGGGGATCTCAAAACAATTAGTGGGATTCA 317 | CCAATCGGCAGAACGGTGCGACCAAACTGCTCGTTCAGTACTTCACCCATCGCCAGATAGATTGCGCTGG 318 | CACCGCAGATCAGCCCAATCCAGCCGGCAAAGTGGATGATTGCGGCGTTACCGGCAATGTTACCGATCGC 319 | CAGCAGGGCAAACAGCACGGTCAGGCTAAAGAAAACGAATTGCAGAACGCGTGCGCCTTTCAGCGTGCCG 320 | AAGAACATAAACAGCGTAAATACGCCCCACAGACCCAGGTAGACACCAAGGAACTGTGCATTTGGCGCAT 321 | CGGTCAGACCCAGTTTCGGCATCAGCAGAATCGCAACCAGCGTCAGCCAGAAAGAACCGTAAGAGGTGAA 322 | TGCGGTTAAACCGAAAGTGTTGCCTTTTTTGTACTCCAGCAGACCAGCAAAAATTTGCGCGATGCCGCCG 323 | TAGAAAATGCCCATGGCAAGAATAATACCGTCCAGAGCGAAATAACCCACGTTGTGCAGGTTAAGCAGAA 324 | TGGTGGTCATGCCGAAGCCCATCAGGCCCAGCGGTGCCGGATTAGCCAACTTAGTGTTGCCCATAATTCC 325 | TCAAAAATCATCATCGAATGAATGGTGAAATAATTTCCCTGAATAACTGTAGTGTTTTCAGGGCGCGGCA 326 | TAATAATCAGCCAGTGGGGCAGTGTCTACGATCTTTTGAGGGGAAAATGAAAATTTTCCCCGGTTTCCGG 327 | TATCAGACCTGAGTGGCGCTAACCATCCGGCGCAGGCAGGCGATTTGCAGTACGGCTGGAATCGTCACGC 328 | GATAGGCGCTGCCGCTGACCGCTTTAACCCCATTTAGTGCCGCACCTACAGGGCCTCCCAGCCCCGCGCC 329 | GCGCAGCAAACCATGCCCAAGTACGCTCATTGCTGCGTGGGTGCGTAAAATGCGGGTCAGTTGGCTGGAA 330 | AGCAAATGCGACACACCTTTTGCCAATAATTTGTCTTTCATCAGCAGCGGCAGCAGCTCTTCCAGCTCAT 331 | TCACCCTGGCATCGACCGCGTGCAGAAACTCCTGCTTATGTTCCTCGTCCATTTTCTTCCAGGTATTACG 332 | CAGAAATTGTTCCAGTAACTGTTGCTCAATTTCAAACGTAGACATCTCTTTGTCGGCTTTCAGCTTCAAT 
333 | CGCTTTGAAACATCGAGCAAAATGGCCCGATACAATTTACCGTGTCCGCGCAGTTTGTTGGCGATACTAT 334 | CGCCACCAAAATGCTGTAATTCTCCGGCAATCAGCTGCCAGTTGCGGCGATGTTGCTCGGGATGCCCTTC 335 | CATCGATTTAAACAGTTCGTTGCGCATCAGTACGCTGGAGAGGCGAGTTTTGCCTTTTTCATTATGGGTG 336 | AGCAATCGGGCGAAATTTGCCAACTGTTCCTCACTACAATGCTGAAGAAAATCCAGATCTGAATCATTCA 337 | GGTAATTAACATTCATTTTTTGTGGCTTCTATATTCTGGCGTTAGTCGTCGCCGATAATTTTCAGCGTGG 338 | CCATATCCGATGAGTTCACCGTATGACCCGAAAAGGTGATTTTTGAGACGCAGCGTTTATTGTCGTTATC 339 | GCTGTTAATGTTGATCCAGTCAGTGGTTTGCCCTTCTTTTATTTCTGAAGGAATATTCAGGCTCTGACTG 340 | GCGCTACGGGCGGCTTTGAAATAAACCGATGCACCGCTTAACTGTAAATCGCCATGGTCGGCAGAGAGTT 341 | GTATGCGTTTCACAATGCGACAAACAGGAAGTTTCAGCGCCAGATCGTTGGTTTCGTTACGCGGCATTGC 342 | AATGGCGCCGAGGAGTTTATGGTCGTTTGCCTGCGCCGTGCAGCACAGCATCAGGCTAATCGCCAGGCTG 343 | GCGGAAATCGTAAAAACGGATTTCATAAGGATTCTCTTAGTGGGAAGAGGTAGGGGGATGAATACCCACT 344 | AGTTTACTGCTGATAAAGAGAAGATTCAGGCACGTAATCTTTTCTTTTTATTACAATTTTTTGATGAATG 345 | CCTTGGCTGCGATTCATTCTTTATATGAATAAAATTGCTGTCAATTTTACGTCTTGTCCTGCCATATCGC 346 | GAAATTTCTGCGCAAAAGCACAAAAAATTTTTGCATCTCCCCCTTGATGACGTGGTTTACGACCCCATTT 347 | AGTAGTCAACCGCAGTGAGTGAGTCTGCAAAAAAATGAAATTGGGCAGTTGAAACCAGACGTTTCGCCCC 348 | TATTACAGACTCACAACCACATGATGACCGAATATATAGTGGAGACGTTTAGATGGGTAAAATAATTGGT 349 | ATCGACCTGGGTACTACCAACTCTTGTGTAGCGATTATGGATGGCACCACTCCTCGCGTGCTGGAGAACG 350 | CCGAAGGCGATCGCACCACGCCTTCTATCATTGCCTATACCCAGGATGGTGAAACTCTAGTTGGTCAGCC 351 | GGCTAAACGTCAGGCAGTGACGAACCCGCAAAACACTCTGTTTGCGATTAAACGCCTGATTGGTCGCCGC 352 | TTCCAGGACGAAGAAGTACAGCGTGATGTTTCCATCATGCCGTTCAAAATTATTGCTGCTGATAACGGCG 353 | ACGCATGGGTCGAAGTTAAAGGCCAGAAAATGGCACCGCCGCAGATTTCTGCTGAAGTGCTGAAAAAAAT 354 | GAAGAAAACCGCTGAAGATTACCTGGGTGAACCGGTAACTGAAGCTGTTATCACCGTACCGGCATACTTT 355 | AACGATGCTCAGCGTCAGGCAACCAAAGACGCAGGCCGTATCGCTGGTCTGGAAGTAAAACGTATCATCA 356 | ACGAACCGACCGCAGCTGCGCTGGCTTACGGTCTGGACAAAGGCACTGGCAACCGTACTATCGCGGTTTA 357 | TGACCTGGGTGGTGGTACTTTCGATATTTCTATTATCGAAATCGACGAAGTTGACGGCGAAAAAACCTTC 358 | 
GAAGTTCTGGCAACCAACGGTGATACCCACCTGGGGGGTGAAGACTTCGACAGCCGTCTGATCAACTATC 359 | TGGTTGAAGAATTCAAGAAAGATCAGGGCATTGACCTGCGCAACGATCCGCTGGCAATGCAGCGCCTGAA 360 | AGAAGCGGCAGAAAAAGCGAAAATCGAACTGTCTTCCGCTCAGCAGACCGACGTTAACCTGCCATACATC 361 | ACTGCAGACGCGACCGGTCCGAAACACATGAACATCAAAGTGACTCGTGCGAAACTGGAAAGCCTGGTTG 362 | AAGATCTGGTAAACCGTTCCATTGAGCCGCTGAAAGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGA 363 | TATCGACGACGTTATCCTCGTTGGTGGTCAGACTCGTATGCCAATGGTTCAGAAGAAAGTTGCTGAGTTC 364 | TTTGGTAAAGAGCCGCGTAAAGACGTTAACCCGGACGAAGCTGTAGCAATCGGTGCTGCTGTTCAGGGTG 365 | GTGTTCTGACTGGTGACGTAAAAGACGTACTGCTGCTGGACGTTACCCCGCTGTCTCTGGGTATCGAAAC 366 | CATGGGCGGTGTGATGACGACGCTGATCGCGAAAAACACCACTATCCCGACCAAGCACAGCCAGGTGTTC 367 | TCTACCGCTGAAGACAACCAGTCTGCGGTAACCATCCATGTGCTGCAGGGTGAACGTAAACGTGCGGCTG 368 | ATAACAAATCTCTGGGTCAGTTCAACCTAGATGGTATCAACCCGGCACCGCGCGGCATGCCGCAGATCGA 369 | AGTTACCTTCGATATCGATGCTGACGGTATCCTGCACGTTTCCGCGAAAGATAAAAACAGCGGTAAAGAG 370 | CAGAAGATCACCATCAAGGCTTCTTCTGGTCTGAACGAAGATGAAATCCAGAAAATGGTACGCGACGCAG 371 | AAGCTAACGCCGAAGCTGACCGTAAGTTTGAAGAGCTGGTACAGACTCGCAACCAGGGCGACCATCTGCT 372 | GCACAGCACCCGTAAGCAGGTTGAAGAAGCAGGCGACAAACTGCCGGCTGACGACAAAACTGCTATCGAG 373 | TCTGCGCTGACTGCACTGGAAACTGCTCTGAAAGGTGAAGACAAAGCCGCTATCGAAGCGAAAATGCAGG 374 | --------------------------------------------------------------------------------