├── scripts └── hello.py ├── live ├── results_with_csv.txt ├── data │ ├── seq.txt │ └── genes.txt ├── output.txt ├── gc_content_data.csv ├── new_gc_content_data.csv ├── tools.py ├── live_python_fm_3.ipynb └── live_help_python_fm_3.ipynb ├── my_first_module.py ├── programming.txt ├── data.txt ├── data ├── mydata.txt ├── genes.txt ├── glpa.fa └── sample.fa ├── img ├── python_shell.png └── python_run_code.png ├── dict_data.txt ├── install ├── Dockerfile ├── vbox_installer.sh └── 2to3_nb.py ├── solutions ├── ex2_1_b.py ├── ex2_2_a.py ├── ex3_4.py ├── ex2_2_b.py ├── ex2_1_a.py ├── ex2_1_c.py ├── ex3_3.py ├── ex4_1.py ├── ex3_1.py ├── ex3_2.py ├── ex2_3.py ├── ex4_2.py └── ex1_1.py ├── LICENSE ├── .gitignore ├── README.md ├── python_fm_4.ipynb ├── python_fm_3.ipynb ├── python_fm_1.ipynb ├── python_fm_intro.ipynb └── python_fm_2.ipynb /scripts/hello.py: -------------------------------------------------------------------------------- 1 | print("Hello world!") 2 | -------------------------------------------------------------------------------- /live/results_with_csv.txt: -------------------------------------------------------------------------------- 1 | BRCA2 84195 2 | TNFAIP3 16099 3 | TCF7 37155 4 | -------------------------------------------------------------------------------- /my_first_module.py: -------------------------------------------------------------------------------- 1 | def say_hello(user): 2 | print('Hello', user, '!') 3 | -------------------------------------------------------------------------------- /live/data/seq.txt: -------------------------------------------------------------------------------- 1 | AAAAAAAAAAAA 2 | TTTTTTTTTTTT 3 | CCCCCC 4 | GGGG 5 | ATCGAATCGTAAA 6 | -------------------------------------------------------------------------------- /live/output.txt: -------------------------------------------------------------------------------- 1 | AAAAAAAAAAAA 0 2 | TTTTTTTTTTTT 0 3 | CCCCCC 6 4 | GGGG 4 5 | ATCGAATCGTAAA 4 6 | 
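The listing of live/output.txt above pairs each sequence from live/data/seq.txt with its number of G and C bases. A minimal sketch of how such lines could be produced is given here. The counting loop follows `gc_content` from live/tools.py, while the helper name `format_gc_lines` and the tab separator are assumptions made for illustration (the separator cannot be recovered from the listing).

```python
def gc_content(seq):
    """Count the G and C bases in a sequence (a count, not a percentage)."""
    gc = 0
    for base in seq:
        if base in ('G', 'C'):
            gc += 1
    return gc


def format_gc_lines(sequences, sep='\t'):
    """Hypothetical helper: build one 'sequence<sep>count' line per sequence."""
    return [seq + sep + str(gc_content(seq)) for seq in sequences]


# Same sequences as in live/data/seq.txt, kept in memory so the sketch is self-contained.
sequences = ['AAAAAAAAAAAA', 'TTTTTTTTTTTT', 'CCCCCC', 'GGGG', 'ATCGAATCGTAAA']
lines = format_gc_lines(sequences)
for line in lines:
    print(line)
```

To work from the real file, the in-memory list could be replaced by the stripped lines read from `os.path.join('data', 'seq.txt')`.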
-------------------------------------------------------------------------------- /programming.txt: -------------------------------------------------------------------------------- 1 | I love programming in Python! 2 | I love making scripts. 3 | I love working with data. 4 | -------------------------------------------------------------------------------- /data.txt: -------------------------------------------------------------------------------- 1 | Index Organism Score 2 | 1 Human 1.076 3 | 2 Mouse 1.202 4 | 3 Frog 2.2362 5 | 4 Fly 0.9853 6 | -------------------------------------------------------------------------------- /data/mydata.txt: -------------------------------------------------------------------------------- 1 | Index Organism Score 2 | 1 Human 1.076 3 | 2 Mouse 1.202 4 | 3 Frog 2.2362 5 | 4 Fly 0.9853 6 | -------------------------------------------------------------------------------- /img/python_shell.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pycam/python-functions-and-modules/HEAD/img/python_shell.png -------------------------------------------------------------------------------- /dict_data.txt: -------------------------------------------------------------------------------- 1 | Index Score Organism 2 | 1 1.076 Human 3 | 2 1.202 Mouse 4 | 3 2.2362 Frog 5 | 4 0.9853 Fly 6 | -------------------------------------------------------------------------------- /img/python_run_code.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pycam/python-functions-and-modules/HEAD/img/python_run_code.png -------------------------------------------------------------------------------- /live/gc_content_data.csv: -------------------------------------------------------------------------------- 1 | seq,gc 2 | AAAAAAAAAAAA,0 3 | TTTTTTTTTTTT,0 4 | CCCCCC,6 5 | GGGG,4 6 | ATCGAATCGTAAA,4 7 | 
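The live/gc_content_data.csv listing above has a `seq,gc` header followed by one row per sequence. A minimal sketch of writing and re-reading data of that shape with the standard `csv` module follows. It uses an in-memory buffer instead of a file so that it is self-contained; the variable names are illustrative and not taken from the course notebooks.

```python
import csv
import io

# Rows shaped like the gc_content_data.csv listing (values copied from it).
rows = [
    {'seq': 'AAAAAAAAAAAA', 'gc': 0},
    {'seq': 'TTTTTTTTTTTT', 'gc': 0},
    {'seq': 'CCCCCC', 'gc': 6},
    {'seq': 'GGGG', 'gc': 4},
    {'seq': 'ATCGAATCGTAAA', 'gc': 4},
]

# Write the header and rows into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['seq', 'gc'], lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the data back; note that csv returns every value as a string.
records = list(csv.DictReader(io.StringIO(csv_text)))
for record in records:
    print(record['seq'], int(record['gc']))
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=''` so that line endings are handled by the writer.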
-------------------------------------------------------------------------------- /live/new_gc_content_data.csv: -------------------------------------------------------------------------------- 1 | seq,gc 2 | AAAAAAAAAAAA,0 3 | TTTTTTTTTTTT,0 4 | CCCCCC,6 5 | GGGG,4 6 | ATCGAATCGTAAA,4 7 | -------------------------------------------------------------------------------- /data/genes.txt: -------------------------------------------------------------------------------- 1 | gene chrom start end 2 | BRCA2 13 32889611 32973805 3 | TNFAIP3 6 138188351 138204449 4 | TCF7 5 133450402 133487556 5 | -------------------------------------------------------------------------------- /live/data/genes.txt: -------------------------------------------------------------------------------- 1 | gene chrom start end 2 | BRCA2 13 32889611 32973805 3 | TNFAIP3 6 138188351 138204449 4 | TCF7 5 133450402 133487556 5 | -------------------------------------------------------------------------------- /data/glpa.fa: -------------------------------------------------------------------------------- 1 | >swissprot|P02724|GLPA_HUMAN Glycophorin-A; 2 | MYGKIIFVLLLSEIVSISASSTTGVAMHTSTSSSVTKSYISSQTNDTHKRDTYAATPRAH 3 | EVSEISVRTVYPPEEETGERVQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK 4 | SPSDVKPLPSPDTDVPLSSVEIENPETSDQ 5 | 6 | -------------------------------------------------------------------------------- /live/tools.py: -------------------------------------------------------------------------------- 1 | def gc_content(seq): 2 | gc = 0 3 | for base in seq: 4 | if (base == 'G') or (base == 'C'): 5 | gc += 1 6 | return gc 7 | 8 | def extract_seq(seq, window_size): 9 | results = [] 10 | nb_windows = len(seq) - window_size + 1 11 | for i in range(nb_windows): 12 | results.append(seq[i:i+window_size]) 13 | return results 14 | -------------------------------------------------------------------------------- /install/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM 
ubuntu 2 | MAINTAINER Mark Dunning 3 | 4 | RUN apt-get update 5 | RUN apt-get install -y ipython ipython-notebook git 6 | RUN git clone https://github.com/pycam/python-intro.git 7 | 8 | EXPOSE 8888 9 | ENV USE_HTTP 0 10 | 11 | WORKDIR python-intro/ 12 | CMD ipython notebook --no-browser --port 8888 --ip=* Introduction_to_python_session_1.ipynb 13 | -------------------------------------------------------------------------------- /solutions/ex2_1_b.py: -------------------------------------------------------------------------------- 1 | def molecular_weight(sequence): 2 | """Function that takes a single DNA sequence as an argument and estimates 3 | the molecular weight of this sequence. 4 | """ 5 | sequence = sequence.upper() 6 | base_weights = {'A': 331, 'C': 307, 'G': 347, 'T': 306} 7 | total_weight = 0 8 | for base in sequence: 9 | total_weight += base_weights[base] 10 | return total_weight 11 | 12 | # Test your function using some example sequences. 13 | weight = molecular_weight("ACTTGGGCAGATAGTCGCG") 14 | print("Molecular weight:", weight, "g/mol") 15 | -------------------------------------------------------------------------------- /solutions/ex2_2_a.py: -------------------------------------------------------------------------------- 1 | def base_composition(sequence): 2 | """Write a function that counts the number of each base found 3 | in a DNA sequence. 4 | """ 5 | sequence = sequence.upper() 6 | num_As = sequence.count('A') 7 | num_Cs = sequence.count('C') 8 | num_Gs = sequence.count('G') 9 | num_Ts = sequence.count('T') 10 | # Return the result as a tuple of 4 numbers representing the counts of each base A, C, G and T. 
11 | return (num_As, num_Cs, num_Gs, num_Ts) 12 | 13 | dna = "ACAGTGTCGTACAGATCAGTCAGATACA" 14 | print('base composition', base_composition(dna)) 15 | -------------------------------------------------------------------------------- /solutions/ex3_4.py: -------------------------------------------------------------------------------- 1 | from ex3_3 import gc_content, extract_sub_sequences 2 | 3 | def gc_content_along_the_chain(dna_sequence, window_size): 4 | """Returns a list of GC along the DNA sequence 5 | given a DNA sequence and the size of the sliding window 6 | """ 7 | sub_sequences = extract_sub_sequences(dna_sequence, window_size) 8 | gc_results = [] 9 | for sub_sequence in sub_sequences: 10 | gc_results.append(gc_content(sub_sequence)) 11 | return gc_results 12 | 13 | dna = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG' 14 | print(gc_content(dna)) 15 | print(extract_sub_sequences(dna, 5)) 16 | print(gc_content_along_the_chain(dna, 5)) 17 | -------------------------------------------------------------------------------- /solutions/ex2_2_b.py: -------------------------------------------------------------------------------- 1 | def reverse_complement(sequence): 2 | """Write a function to return the reverse-complement of a nucleotide 3 | sequence. 
4 | """ 5 | reverse_base = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'} 6 | sequence = sequence.upper() 7 | sequence = reversed(sequence) 8 | result = [] 9 | for base in sequence: 10 | # check if sequence is a DNA sequence or not 11 | if base not in 'ATCG': 12 | return base + " is NOT a known DNA base" 13 | result.append(reverse_base[base]) 14 | return "".join(result) 15 | 16 | print(reverse_complement('ATCGTAGCatgcAATTGGC')) 17 | print(reverse_complement('ATCGTAGCatgcxAATTGGC')) 18 | -------------------------------------------------------------------------------- /solutions/ex2_1_a.py: -------------------------------------------------------------------------------- 1 | def simple_mean(x, y): 2 | """Function that takes 2 numerical arguments and returns their mean. 3 | """ 4 | mean = (x + y) / 2 5 | return mean 6 | 7 | 8 | def advanced_mean(values): 9 | """Function that takes a list of numbers and returns the mean of all 10 | the numbers in the list. 11 | """ 12 | total = 0 13 | for v in values: 14 | total += v 15 | mean = total / len(values) 16 | return mean 17 | 18 | print("Mean of 2 & 3:", simple_mean(2, 3)) 19 | print("Mean of 8 & 10:", simple_mean(8, 10)) 20 | print("Mean of [2, 4, 6]", advanced_mean([2, 4, 6])) 21 | print("Mean of values even numbers under 20:", advanced_mean(list(range(0, 20, 2)))) 22 | -------------------------------------------------------------------------------- /solutions/ex2_1_c.py: -------------------------------------------------------------------------------- 1 | def molecular_weight(sequence): 2 | """Function that takes a single DNA sequence as an argument and estimates 3 | the molecular weight of this sequence. 4 | If the sequence passed in above contains N bases, 5 | use the mean weight of the other bases as the weight. 
6 | """ 7 | sequence = sequence.upper() 8 | base_weights = {'A': 331, 'C': 307, 'G': 347, 'T': 306} 9 | base_weights['N'] = sum(base_weights.values()) / len(base_weights) 10 | total_weight = 0 11 | for base in sequence: 12 | total_weight += base_weights[base] 13 | return total_weight 14 | 15 | weight = molecular_weight("AAGGACTGTCNCGTNNCGTAGGATNATAGNN") 16 | print("Molecular weight:", weight, "g/mol") 17 | -------------------------------------------------------------------------------- /solutions/ex3_3.py: -------------------------------------------------------------------------------- 1 | def gc_content(sequence): 2 | """Calculate the GC content of a DNA sequence as a percentage 3 | """ 4 | gc = 0 5 | for base in sequence: 6 | if (base == 'G') or (base == 'C'): 7 | gc += 1 8 | return 100 * (gc / len(sequence)) 9 | 10 | 11 | #dna = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG' 12 | #print('GC%', gc_content(dna)) 13 | 14 | def extract_sub_sequences(sequence, window_size): 15 | """Extract a list of overlapping sub-sequences for a given window size 16 | from a given sequence. 
17 | """ 18 | if window_size <= 0: 19 | return "Window size must be a positive integer" 20 | if window_size > len(sequence): 21 | return "Window size is larger than sequence length" 22 | result = [] 23 | nr_windows = len(sequence) - window_size + 1 24 | for i in range(nr_windows): 25 | sub_sequence = sequence[i:i + window_size] 26 | result.append(sub_sequence) 27 | return result 28 | 29 | 30 | #dna = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG' 31 | #print(extract_sub_sequences(dna, 5)) 32 | -------------------------------------------------------------------------------- /solutions/ex4_1.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | # initialise variables 4 | dataset = [] 5 | 6 | # read GapMinder dataset from file and filter the years of interest using CSV 7 | with open('data/gapminder.txt') as f: 8 | reader = csv.DictReader(f, delimiter='\t') 9 | for record in reader: 10 | dataset.append(record) 11 | 12 | # define function to filter the dataset based on year and extract one column 13 | def filter_dataset(dataset, column_name, year): 14 | results = [] 15 | for data in dataset: 16 | if data['year'] == year: 17 | results.append(data[column_name]) 18 | return results 19 | 20 | 21 | # scatter plots 22 | import matplotlib.pyplot as mpyplot 23 | mpyplot.xlabel('Life Expectancy') 24 | mpyplot.ylabel('GDP per Capita') 25 | mpyplot.title('GapMinder world data over 50 years') 26 | mpyplot.scatter(filter_dataset(dataset, 'lifeExp', '1957'), filter_dataset(dataset, 'gdpPercap', '1957'), c='b', label='1957') 27 | mpyplot.scatter(filter_dataset(dataset, 'lifeExp', '2007'), filter_dataset(dataset, 'gdpPercap', '2007'), c='g', label='2007') 28 | mpyplot.legend() 29 | mpyplot.grid(True) 30 | mpyplot.show() 31 | -------------------------------------------------------------------------------- /solutions/ex3_1.py: -------------------------------------------------------------------------------- 1 | import 
os.path 2 | 3 | # Read a tab delimited file which has 4 columns: gene, chromosome, start and end coordinates. 4 | # Check if the file exists, then compute the length of each gene and store 5 | # its name and corresponding length into a dictionary. 6 | # Write the results into a new tab separated file. 7 | 8 | gene_file = os.path.join('data', 'genes.txt') 9 | output_file = "gene_lengths.txt" 10 | 11 | if os.path.exists(gene_file): 12 | results = [] 13 | with open(gene_file) as f: 14 | header = f.readline() 15 | for line in f: 16 | gene, chrom, start, end = line.strip().split("\t") 17 | row = {'gene': gene, 'length': int(end) - int(start) + 1} 18 | results.append(row) 19 | print(results) 20 | with open(output_file, "w") as out: 21 | out.write('gene' + "\t" + 'length' + "\n") # write header 22 | for record in results: 23 | out.write(record['gene'] + "\t" + str(record['length']) + "\n") 24 | else: 25 | print(gene_file, 'does not exist!') 26 | 27 | if os.path.exists(output_file): 28 | # print contents of output file 29 | with open(output_file) as f: 30 | print(f.read()) 31 | else: 32 | print(output_file, 'does not exist!') 33 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This is free and unencumbered software released into the public domain. 2 | 3 | Anyone is free to copy, modify, publish, use, compile, sell, or 4 | distribute this software, either in source code form or as a compiled 5 | binary, for any purpose, commercial or non-commercial, and by any 6 | means. 7 | 8 | In jurisdictions that recognize copyright laws, the author or authors 9 | of this software dedicate any and all copyright interest in the 10 | software to the public domain. We make this dedication for the benefit 11 | of the public at large and to the detriment of our heirs and 12 | successors. 
We intend this dedication to be an overt act of 13 | relinquishment in perpetuity of all present and future rights to this 14 | software under copyright law. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 23 | 24 | For more information, please refer to 25 | -------------------------------------------------------------------------------- /solutions/ex3_2.py: -------------------------------------------------------------------------------- 1 | import os.path 2 | import csv 3 | 4 | # Read a tab delimited file which has 4 columns: gene, chromosome, start and end coordinates. 5 | # Check if the file exists, then compute the length of each gene and store 6 | # its name and corresponding length into a dictionary. 7 | # Write the results into a new tab separated file and make use of the csv module. 
8 | 9 | gene_file = os.path.join('data', 'genes.txt') 10 | output_file = "gene_lengths_csv_module.txt" 11 | 12 | if os.path.exists(gene_file): 13 | results = [] 14 | with open(gene_file) as f: 15 | reader = csv.DictReader(f, delimiter='\t') 16 | for record in reader: 17 | record['length'] = int(record['end']) - int(record['start']) + 1 18 | results.append(record) 19 | print(results) 20 | with open(output_file, "w") as out: 21 | writer = csv.DictWriter(out, results[0].keys(), delimiter='\t') 22 | writer.writeheader() # write header 23 | for record in results: 24 | writer.writerow(record) 25 | else: 26 | print(gene_file, 'does not exist!') 27 | 28 | if os.path.exists(output_file): 29 | # print contents of output file 30 | with open(output_file) as f: 31 | print(f.read()) 32 | else: 33 | print(output_file, 'does not exist!') 34 | -------------------------------------------------------------------------------- /solutions/ex2_3.py: -------------------------------------------------------------------------------- 1 | def molecular_weight(sequence, molecule_type='DNA'): 2 | """Function that takes a single DNA or RNA sequence as an argument 3 | and estimates the molecular weight of this sequence. 4 | If the sequence passed in above contains N bases, 5 | use the mean weight of the other bases as the weight. 6 | Use an optional argument to specify the molecule type, but default to DNA. 
7 | """ 8 | sequence = sequence.upper() 9 | molecule_type = molecule_type.upper() 10 | 11 | dna_weights = {'A': 331, 'C': 307, 'G': 347, 'T': 306} 12 | rna_weights = {'A': 347, 'C': 323, 'G': 363, 'U': 324} 13 | 14 | if molecule_type == 'DNA': 15 | base_weights = dna_weights 16 | elif molecule_type == 'RNA': 17 | base_weights = rna_weights 18 | else: 19 | return "Unrecognised molecule_type " + molecule_type 20 | 21 | total_weight = 0 22 | for base in sequence: 23 | # check if base is a DNA base or not 24 | if base not in base_weights: 25 | return base + " is NOT a known DNA base" 26 | total_weight += base_weights[base] 27 | return total_weight 28 | 29 | 30 | print("RNA weight:", molecular_weight("AACGUCGAAUCCUAGCGC", molecule_type="RNA"), "g/mol") 31 | print("DNA weight:", molecular_weight("AACGTCGAATCCTAGCGC"), "g/mol") 32 | print("Other sequence weight:", molecular_weight("AACGTCGAATXXXCCTAGCGC"), "g/mol") 33 | -------------------------------------------------------------------------------- /install/vbox_installer.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # lubuntu LTS 14.04 VirtualBox installer based on lubuntu-14.04.2-desktop-i386 3 | # computer name: crukci-training-vm; user: training; password: admin123 4 | 5 | sudo su - 6 | apt-get install gedit 7 | apt-get install vim 8 | apt-get install git 9 | apt-get install python-pip 10 | apt-get install python-zmq 11 | apt-get install python-matplotlib 12 | apt-get install python-biopython 13 | apt-get install ncbi-blast+ 14 | 15 | # Install VirtualBox Additions 16 | # From the VirtualBox menu of lubuntu go to Devices > Insert Guest Additions CD image... and do 17 | cd /media/training/VBOXADDITIONS_4.3.26_98988 18 | sudo ./VBoxLinuxAdditions.run 19 | 20 | # To increase screen resolution 21 | # Start > Preferences > Additional Drivers: Using x86 virtualization solution... 
and click Apply Changes 22 | # Then Start > Preferences > Monitor Settings and select 1440x1050 and click Save and Apply 23 | 24 | pip install ipython[notebook] 25 | 26 | apt-get autoremove 27 | apt-get clean 28 | 29 | 30 | adduser pycam # password: pycam123 31 | 32 | exit 33 | 34 | # login as pycam -------------------------------------------------------------- 35 | 36 | git clone https://github.com/pycam/python-intro.git course 37 | 38 | # Add ipython at startup from lubuntu menu do to... 39 | # Preferences > Default applications for LXSession then tab Autostart and add: 40 | # /usr/local/bin/ipython notebook --no-browser --port=8888 --ip=127.0.0.1 /home/pycam/course/ 41 | 42 | # Add bookmarks into firefox: (1) pycam.github.io (2) 127.0.0.1:8888 43 | 44 | 45 | -------------------------------------------------------------------------------- /solutions/ex4_2.py: -------------------------------------------------------------------------------- 1 | from Bio import SeqIO 2 | from Bio.SeqUtils import GC 3 | 4 | # Read in a FASTA file named data/sample.fa 5 | seq_records = list(SeqIO.parse('data/sample.fa', 'fasta')) 6 | 7 | # find the number of sequences present in the file 8 | num_seq = len(seq_records) 9 | print('Total number of sequences:', num_seq) 10 | 11 | # find IDs and lengths of the longest and the shortest sequences 12 | max_len = min_len = len(seq_records[0].seq) 13 | 14 | longest_seq = shortest_seq = seq_records[0].id 15 | 16 | for i in range(1, num_seq): 17 | if len(seq_records[i].seq) > max_len: 18 | # update max_len and longest_seq 19 | max_len = len(seq_records[i].seq) 20 | longest_seq = seq_records[i].id 21 | elif len(seq_records[i].seq) < min_len: 22 | # update min_len and shortest_seq 23 | min_len = len(seq_records[i].seq) 24 | shortest_seq = seq_records[i].id 25 | 26 | print('Longest sequence is', longest_seq, 'with length', max_len, 'bp') 27 | print('Shortest sequence is', shortest_seq, 'with length', min_len, 'bp') 28 | 29 | # Creating a new 
sequence list containing sequences longer than 500bp 30 | # Calculate the average length of these sequences 31 | # calculate and print the percentage of GC contents 32 | 33 | long_seq_records = list() # empty list for sequences 34 | 35 | total_seq_length = 0 36 | for sequence in seq_records: 37 | if len(sequence) > 500: 38 | long_seq_records.append(sequence) 39 | total_seq_length += len(sequence) 40 | gc = GC(sequence.seq) 41 | print('%GC in', sequence.id, 'is {:.2f}'.format(gc)) 42 | 43 | avg_seq_length = total_seq_length / len(long_seq_records) 44 | 45 | print('Average length for sequences longer than 500bp is', avg_seq_length) 46 | 47 | # Write the sequences in long_seq_records to a file in FASTA format 48 | SeqIO.write(long_seq_records, 'long_sequences.fa', 'fasta') 49 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | .idea 7 | .ipynb_checkpoints 8 | 9 | .DS_Store 10 | 11 | venv 12 | 13 | biopython.fa 14 | 15 | csvdata.tsv 16 | 17 | csvdictdata.tsv 18 | 19 | data/mydata.csv 20 | 21 | gene_lengths_csv.tsv 22 | 23 | gene_lengths.tsv 24 | 25 | out.txt 26 | 27 | sample.long.fa 28 | 29 | mySeqFile.fa 30 | 31 | # C extensions 32 | *.so 33 | 34 | # Distribution / packaging 35 | .Python 36 | env/ 37 | build/ 38 | develop-eggs/ 39 | dist/ 40 | downloads/ 41 | eggs/ 42 | .eggs/ 43 | lib/ 44 | lib64/ 45 | parts/ 46 | sdist/ 47 | var/ 48 | *.egg-info/ 49 | .installed.cfg 50 | *.egg 51 | 52 | # PyInstaller 53 | # Usually these files are written by a python script from a template 54 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
55 | *.manifest 56 | *.spec 57 | 58 | # Installer logs 59 | pip-log.txt 60 | pip-delete-this-directory.txt 61 | 62 | # Unit test / coverage reports 63 | htmlcov/ 64 | .tox/ 65 | .coverage 66 | .coverage.* 67 | .cache 68 | nosetests.xml 69 | coverage.xml 70 | *,cover 71 | .hypothesis/ 72 | 73 | # Translations 74 | *.mo 75 | *.pot 76 | 77 | # Django stuff: 78 | *.log 79 | local_settings.py 80 | 81 | # Flask stuff: 82 | instance/ 83 | .webassets-cache 84 | 85 | # Scrapy stuff: 86 | .scrapy 87 | 88 | # Sphinx documentation 89 | docs/_build/ 90 | 91 | # PyBuilder 92 | target/ 93 | 94 | # IPython Notebook 95 | .ipynb_checkpoints 96 | 97 | # pyenv 98 | .python-version 99 | 100 | # celery beat schedule file 101 | celerybeat-schedule 102 | 103 | # dotenv 104 | .env 105 | 106 | # virtualenv 107 | venv/ 108 | ENV/ 109 | 110 | # Spyder project settings 111 | .spyderproject 112 | 113 | # Rope project settings 114 | .ropeproject 115 | 116 | gene_lengths_csv_module.txt 117 | 118 | gene_lengths.txt 119 | 120 | long_sequences.fa 121 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Working with Python: functions and modules - course materials 2 | 3 | Materials for the course run by the Graduate School of Life Sciences, University of Cambridge. 4 | 5 | - Course website: http://pycam.github.io/ 6 | - Booking website: http://www.training.cam.ac.uk/ 7 | 8 | 9 | If you wish to run the course on your personal computer, here are the steps to follow to get up and running. 10 | 11 | ## Clone this github project 12 | 13 | ```bash 14 | git clone https://github.com/pycam/python-functions-and-modules.git 15 | cd python-functions-and-modules 16 | ``` 17 | 18 | ## Dependencies 19 | 20 | Install Python 3 by downloading the latest version from https://www.python.org/. For Mac OSX, just run `brew install python3`. 
21 | 22 | Python 2.x is legacy; Python 3.x is the present and future of the language. 23 | 24 | First create a virtual environment using the [`venv` library](https://docs.python.org/3/library/venv.html). Update pip if needed, then install [jupyter](http://jupyter.org/) and [RISE](https://github.com/damianavila/RISE) to add a slideshow extension to jupyter. 25 | 26 | ***Note*** A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments. 27 | 28 | ```bash 29 | python3 -m venv venv 30 | # activate your virtual environment 31 | source venv/bin/activate 32 | # update pip if needed 33 | pip install --upgrade pip 34 | # install jupyter 35 | pip install jupyter 36 | 37 | # slideshow extension 38 | pip install rise 39 | jupyter-nbextension install rise --py --sys-prefix 40 | jupyter nbextension enable rise --py --sys-prefix 41 | 42 | # matplotlib 43 | pip install matplotlib 44 | 45 | # biopython 46 | pip install biopython 47 | 48 | # pandas 49 | pip install pandas 50 | ``` 51 | 52 | On Mac OSX you may need to run this command to accept the XCode license before installing biopython: 53 | 54 | ```bash 55 | sudo xcodebuild -license 56 | ``` 57 | 58 | ## Usage 59 | 60 | Go to the directory where you've cloned this repository, activate your virtual environment and run jupyter. 61 | 62 | Your web browser should automatically open at the URL http://localhost:8888/tree, where you will see the directory tree of the course with all the jupyter notebooks. 63 | 64 | ```bash 65 | cd python-functions-and-modules 66 | source venv/bin/activate 67 | jupyter notebook 68 | ``` 69 | 70 | To shut down jupyter, press ctrl-C in the terminal where you ran `jupyter notebook`, answer `y` and press `enter`. 
71 | 72 | You may wish to deactivate the virtual environment, by entering into the terminal: 73 | ``` 74 | deactivate 75 | ``` 76 | -------------------------------------------------------------------------------- /solutions/ex1_1.py: -------------------------------------------------------------------------------- 1 | # initialise variables 2 | data = [] 3 | data_years = [] 4 | 5 | # read GapMinder dataset from file 6 | with open('data/gapminder.txt') as f: 7 | for line in f: 8 | country, continent, year, life_expectancy, pop, gdp_per_capita = line.strip().split('\t') 9 | if country == 'country': 10 | continue 11 | data.append({'country': country, 12 | 'continent': continent, 13 | 'year': int(year), 14 | 'life_expectancy': float(life_expectancy), 15 | 'population': int(pop), 16 | 'gdp_per_capita': float(gdp_per_capita)}) 17 | data_years.append(int(year)) 18 | 19 | # find what are the oldest and youngest years in the dataset 20 | non_redundant_years = list(set(data_years)) 21 | non_redundant_years.sort() 22 | oldest_year = non_redundant_years[0] 23 | youngest_year = non_redundant_years[-1] 24 | print(oldest_year, youngest_year) 25 | 26 | # initialise more variables 27 | oldest_life_expectancy = [] 28 | youngest_life_expectancy = [] 29 | oldest_population = [] 30 | youngest_population = [] 31 | 32 | # calculate average life expectancy as well as global population increase between these two years 33 | for d in data: 34 | if d['year'] == oldest_year: 35 | oldest_life_expectancy.append(d['life_expectancy']) 36 | oldest_population.append(d['population']) 37 | elif d['year'] == youngest_year: 38 | youngest_life_expectancy.append(d['life_expectancy']) 39 | youngest_population.append(d['population']) 40 | 41 | print(oldest_year, sum(oldest_life_expectancy)/len(oldest_life_expectancy)) 42 | print(youngest_year, sum(youngest_life_expectancy)/len(youngest_life_expectancy)) 43 | life_expectancy_increased = sum(youngest_life_expectancy)/len(youngest_life_expectancy) - 
sum(oldest_life_expectancy)/len(oldest_life_expectancy) 44 | population_increased = sum(youngest_population) - sum(oldest_population) 45 | print('In', youngest_year - oldest_year, 'years, life expectancy increased by', life_expectancy_increased, 'years') 46 | print('In', youngest_year - oldest_year, 'years, global population increased by', population_increased, 'people') 47 | 48 | # find which country has the lowest life expectancy in 2002 49 | lowest_life_expectancy_in_2002 = 100 50 | country_with_lowest_life_expectancy_in_2002 = '' 51 | for d in data: 52 | if d['year'] == 2002: 53 | if d['life_expectancy'] < lowest_life_expectancy_in_2002: 54 | lowest_life_expectancy_in_2002 = d['life_expectancy'] 55 | country_with_lowest_life_expectancy_in_2002 = d['country'] 56 | print(country_with_lowest_life_expectancy_in_2002, lowest_life_expectancy_in_2002) 57 | -------------------------------------------------------------------------------- /install/2to3_nb.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | To run: python3 2to3_nb.py notebook-or-directory 4 | """ 5 | # Authors: Thomas Kluyver, Fernando Perez 6 | # See: https://gist.github.com/takluyver/c8839593c615bb2f6e80 7 | 8 | import argparse 9 | import pathlib 10 | from nbformat import read, write 11 | 12 | import lib2to3 13 | from lib2to3.refactor import RefactoringTool, get_fixers_from_package 14 | 15 | 16 | def refactor_notebook_inplace(rt, path): 17 | 18 | def refactor_cell(src): 19 | #print('\n***SRC***\n', src) 20 | try: 21 | tree = rt.refactor_string(src+'\n', str(path) + '/cell-%d' % i) 22 | except (lib2to3.pgen2.parse.ParseError, 23 | lib2to3.pgen2.tokenize.TokenError): 24 | return src 25 | else: 26 | return str(tree)[:-1] 27 | 28 | 29 | print("Refactoring:", path) 30 | nb = read(str(path), as_version=4) 31 | 32 | # Run 2to3 on code 33 | for i, cell in enumerate(nb.cells, start=1): 34 | if cell.cell_type == 'code': 35 | if 
cell.execution_count in (' ', '*'): 36 | cell.execution_count = None 37 | 38 | if cell.source.startswith('%%'): 39 | # For cell magics, try to refactor the body, in case it's 40 | # valid python 41 | head, source = cell.source.split('\n', 1) 42 | cell.source = head + '\n' + refactor_cell(source) 43 | else: 44 | cell.source = refactor_cell(cell.source) 45 | 46 | 47 | # Update notebook metadata 48 | nb.metadata.kernelspec = { 49 | 'display_name': 'Python 3', 50 | 'name': 'python3', 51 | 'language': 'python', 52 | } 53 | if 'language_info' in nb.metadata: 54 | nb.metadata.language_info.codemirror_mode = { 55 | 'name': 'ipython', 56 | 'version': 3, 57 | } 58 | nb.metadata.language_info.pygments_lexer = 'ipython3' 59 | nb.metadata.language_info.pop('version', None) 60 | 61 | write(nb, str(path)) 62 | 63 | def main(argv=None): 64 | ap = argparse.ArgumentParser() 65 | ap.add_argument('path', type=pathlib.Path, 66 | help="Notebook or directory containing notebooks") 67 | 68 | options = ap.parse_args(argv) 69 | 70 | avail_fixes = set(get_fixers_from_package('lib2to3.fixes')) 71 | rt = RefactoringTool(avail_fixes) 72 | 73 | if options.path.is_dir(): 74 | for nb_path in options.path.rglob('*.ipynb'): 75 | refactor_notebook_inplace(rt, nb_path) 76 | else: 77 | refactor_notebook_inplace(rt, options.path) 78 | 79 | if __name__ == '__main__': 80 | main() 81 | -------------------------------------------------------------------------------- /live/live_python_fm_3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true 8 | }, 9 | "source": [ 10 | "# Recap\n", 11 | "- functions" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": { 18 | "collapsed": false, 19 | "deletable": true, 20 | "editable": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "# built-in functions\n", 25 | "seq = 
'ATCCTGCTAAA'\n", 26 | "print(len(seq))" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": { 33 | "collapsed": false, 34 | "deletable": true, 35 | "editable": true 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "# your own function\n", 40 | "def gc_content(seq):\n", 41 | " gc = 0\n", 42 | " for base in seq:\n", 43 | " if (base == 'C') or (base == 'G'):\n", 44 | " gc += 1\n", 45 | " return gc\n", 46 | "\n", 47 | "print(gc_content('ATCCTGCTAAA'))\n", 48 | "print(gc_content('GGGCCCCTTTA'))" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": { 54 | "deletable": true, 55 | "editable": true 56 | }, 57 | "source": [ 58 | "# Session 3: modules\n", 59 | "\n", 60 | "- math, os.path, csv and pandas\n", 61 | "- create own module" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": { 68 | "collapsed": false, 69 | "deletable": true, 70 | "editable": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "import math\n", 75 | "print(math.pi)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": { 82 | "collapsed": false, 83 | "deletable": true, 84 | "editable": true 85 | }, 86 | "outputs": [], 87 | "source": [ 88 | "import os.path\n", 89 | "seq_filename = os.path.join('data', 'seq.txt')\n", 90 | "print(os.path.exists(seq_filename))\n", 91 | "print(os.path.dirname(seq_filename))\n", 92 | "print(os.path.basename(seq_filename))" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "deletable": true, 99 | "editable": true 100 | }, 101 | "source": [ 102 | "## Ex 3.1\n", 103 | "- read a tab delimited file data/genes.txt\n", 104 | "- check if the file exists\n", 105 | "- calculate the length of each gene\n", 106 | "- write the results into a tab separated file" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": false, 114 | "deletable": true, 115
| "editable": true 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "data_filename = os.path.join('data', 'genes.txt')\n", 120 | "if os.path.exists(data_filename):\n", 121 | " with open(data_filename) as data:\n", 122 | " header = data.readline()\n", 123 | " with open('results.txt', 'w') as out:\n", 124 | " for line in data:\n", 125 | " #gene, chrom, start, end = line.strip().split()\n", 126 | " row = line.strip().split()\n", 127 | " print(int(row[3])-int(row[2]))\n", 128 | " #out.write('{}\\t{}\\n'.format(gene, int(end)-int(start)+1))\n", 129 | "else:\n", 130 | " print('{} file does not exist'.format(data_filename))" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "collapsed": true, 138 | "deletable": true, 139 | "editable": true 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "def gc_content(seq):\n", 144 | " gc = 0\n", 145 | " for base in seq:\n", 146 | " if (base == 'C') or (base == 'G'):\n", 147 | " gc += 1\n", 148 | " return gc\n", 149 | "\n", 150 | "seq_filename = os.path.join('data', 'seq.txt')\n", 151 | "if os.path.exists(seq_filename):\n", 152 | " with open (seq_filename) as data:\n", 153 | " with open('gc_content_data.csv', 'w') as out: \n", 154 | " for line in data:\n", 155 | " seq = line.strip()\n", 156 | " out.write('{},{}\\n'.format(seq, gc_content(seq)))" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "deletable": true, 163 | "editable": true 164 | }, 165 | "source": [ 166 | "## cvs module" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": { 173 | "collapsed": false, 174 | "deletable": true, 175 | "editable": true 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "import csv\n", 180 | "data_filename = 'gc_content_data.csv'\n", 181 | "if os.path.exists(data_filename):\n", 182 | " with open(data_filename) as data:\n", 183 | " reader = csv.reader(data, delimiter=',')\n", 184 | " for row in 
reader:\n", 185 | " print(row)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false, 193 | "deletable": true, 194 | "editable": true 195 | }, 196 | "outputs": [], 197 | "source": [ 198 | "import csv\n", 199 | "# add column header to data file called seq,gc\n", 200 | "data_filename = 'gc_content_data.csv'\n", 201 | "results = []\n", 202 | "if os.path.exists(data_filename):\n", 203 | " with open(data_filename) as data:\n", 204 | " reader = csv.DictReader(data, delimiter=',')\n", 205 | " for row in reader:\n", 206 | " results.append(row)\n", 207 | "\n", 208 | "for r in results:\n", 209 | " print('{}\\t{}'.format(r['seq'], r['gc']))" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": true, 217 | "deletable": true, 218 | "editable": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "import csv\n", 223 | "with open('output.txt', 'w') as out:\n", 224 | " writer = csv.DictWriter(out, fieldnames=['seq', 'gc'], delimiter='\\t')\n", 225 | " for r in results:\n", 226 | " writer.writerow(r)" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": { 232 | "deletable": true, 233 | "editable": true 234 | }, 235 | "source": [ 236 | "## Ex 3.2\n", 237 | "- change the script you wrote for ex 3.1 to make use of the csv module" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": { 244 | "collapsed": false, 245 | "deletable": true, 246 | "editable": true 247 | }, 248 | "outputs": [], 249 | "source": [ 250 | "data_filename = os.path.join('data', 'genes.txt')\n", 251 | "results = []\n", 252 | "if os.path.exists(data_filename):\n", 253 | " with open(data_filename) as data:\n", 254 | " reader = csv.DictReader(data, delimiter='\\t')\n", 255 | " for row in reader:\n", 256 | " results.append({'gene': row['gene'], 'len': int(row['end'])-int(row['start'])+1})\n", 257 | 
"else:\n", 258 | " print('{} file does not exist'.format(data_filename))\n", 259 | " \n", 260 | "with open('results_with_csv.txt', 'w') as out:\n", 261 | " writer = csv.DictWriter(out, fieldnames=['gene', 'len'], delimiter='\\t')\n", 262 | " for r in results:\n", 263 | " writer.writerow(r)\n", 264 | "\n", 265 | "#print(results)" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": { 272 | "collapsed": false, 273 | "deletable": true, 274 | "editable": true 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "import pandas\n", 279 | "data = pandas.read_csv('results_with_csv.txt', sep='\\t')\n", 280 | "print(data)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": { 286 | "deletable": true, 287 | "editable": true 288 | }, 289 | "source": [ 290 | "## Writing your own module" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": { 297 | "collapsed": true, 298 | "deletable": true, 299 | "editable": true 300 | }, 301 | "outputs": [], 302 | "source": [ 303 | "def gc_content(seq):\n", 304 | " gc = 0\n", 305 | " for base in seq:\n", 306 | " if (base == 'C') or (base == 'G'):\n", 307 | " gc += 1\n", 308 | " return gc" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "collapsed": false, 316 | "deletable": true, 317 | "editable": true 318 | }, 319 | "outputs": [], 320 | "source": [ 321 | "import tools\n", 322 | "print(tools.gc_content('CCCTTCGCTT'))" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "collapsed": false, 330 | "deletable": true, 331 | "editable": true 332 | }, 333 | "outputs": [], 334 | "source": [ 335 | "from tools import gc_content\n", 336 | "print(gc_content('AAAAA'))" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": { 342 | "deletable": true, 343 | "editable": true 344 | }, 345 | "source": [ 346 | 
"## Ex 3.3\n", 347 | "- write a function that extract a list of overlapping sub-sequences of a given sequence for a given window size" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": { 354 | "collapsed": false, 355 | "deletable": true, 356 | "editable": true 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "def extract_seq(seq, window_size):\n", 361 | " results = []\n", 362 | " nb_windows = len(seq) - window_size + 1\n", 363 | " for i in range(nb_windows):\n", 364 | " results.append(seq[i:i+window_size])\n", 365 | " return results\n", 366 | "\n", 367 | "seq = 'ATTCCGGGCCTTAAAA'\n", 368 | "print(extract_seq(seq, 5))" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": { 374 | "deletable": true, 375 | "editable": true 376 | }, 377 | "source": [ 378 | "## Ex 3.4\n", 379 | "- calculate the gc content along the DNA sequence by combining the two functiona writen using the tools module" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "collapsed": false, 387 | "deletable": true, 388 | "editable": true 389 | }, 390 | "outputs": [], 391 | "source": [ 392 | "import tools\n", 393 | "seq = 'ATTCCGGGCCTTAAAA'\n", 394 | "for sub_seq in tools.extract_seq(seq, 5):\n", 395 | " print(tools.gc_content(sub_seq))" 396 | ] 397 | } 398 | ], 399 | "metadata": { 400 | "kernelspec": { 401 | "display_name": "Python 3", 402 | "language": "python", 403 | "name": "python3" 404 | }, 405 | "language_info": { 406 | "codemirror_mode": { 407 | "name": "ipython", 408 | "version": 3 409 | }, 410 | "file_extension": ".py", 411 | "mimetype": "text/x-python", 412 | "name": "python", 413 | "nbconvert_exporter": "python", 414 | "pygments_lexer": "ipython3", 415 | "version": "3.6.4" 416 | } 417 | }, 418 | "nbformat": 4, 419 | "nbformat_minor": 2 420 | } 421 | -------------------------------------------------------------------------------- 
/live/live_help_python_fm_3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Recap\n", 8 | "\n", 9 | "- basic python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### Python basic" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": false 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "# list\n", 28 | "my_name = 'Anne'\n", 29 | "my_list = [2, 4, 6, 8, my_name]\n", 30 | "print(my_list)\n", 31 | "print(my_list[1])" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": { 38 | "collapsed": true 39 | }, 40 | "outputs": [], 41 | "source": [ 42 | "# dictionary\n", 43 | "my_dict = {'A': 'Adenine', 'C': 'Cytosine'} \n", 44 | "print(my_dict)\n", 45 | "print(my_dict['A'])" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "collapsed": false 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "# string\n", 57 | "seq = 'ATC CTG TAC TTT'\n", 58 | "codons = seq.split()\n", 59 | "print(codons)\n", 60 | "new_seq = ','.join(codons)\n", 61 | "print(new_seq)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": { 68 | "collapsed": true 69 | }, 70 | "outputs": [], 71 | "source": [ 72 | "# loop\n", 73 | "seq = 'ATCCTGTACTT'\n", 74 | "for base in seq:\n", 75 | " print(base)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": { 82 | "collapsed": false 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "# condition\n", 87 | "base = 'A'\n", 88 | "if base == 'A':\n", 89 | " print('found base A')" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": { 96 | "collapsed": false 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "# 
loop and condition combined\n", 101 | "seq = 'ATCCTGTACTT'\n", 102 | "gc = 0\n", 103 | "for base in seq:\n", 104 | " if (base == 'G') or (base == 'C'):\n", 105 | " gc += 1\n", 106 | " print(base)\n", 107 | "print('total number of GCs in the sequence', seq, 'is', gc)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": { 114 | "collapsed": true 115 | }, 116 | "outputs": [], 117 | "source": [ 118 | "# file\n", 119 | "seq = 'ATCCTGTACTT'\n", 120 | "gc = 0\n", 121 | "for base in seq:\n", 122 | " if (base == 'G') or (base == 'C'):\n", 123 | " gc += 1\n", 124 | "\n", 125 | "with open('my_file.txt', 'w') as out:\n", 126 | " out.write('seq,gc_content\\n')\n", 127 | " out.write('{},{}'.format(seq, gc))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "deletable": true, 134 | "editable": true 135 | }, 136 | "source": [ 137 | "## Recap\n", 138 | "\n", 139 | "- functions" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Python documentation\n", 147 | "\n", 148 | "- link to python.org" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "### Functions" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "# built-in ones\n", 167 | "seq = 'ATCCTGTACTT'\n", 168 | "print(len(seq))" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": { 175 | "collapsed": false 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "# your own one\n", 180 | "def gc_content(seq):\n", 181 | " gc = 0\n", 182 | " for base in seq:\n", 183 | " if (base == 'G') or (base == 'C'):\n", 184 | " gc += 1\n", 185 | " return gc\n", 186 | "\n", 187 | "seq = 'ATCCTGTACTT'\n", 188 | "print(gc_content(seq))\n", 189 | "print(gc_content('AAATCGATTTAAGGGG')) # 
reuse multiple time\n", 190 | "\n", 191 | "with open('gc_content_data.csv', 'w') as out:\n", 192 | " with open('seq.txt') as data:\n", 193 | " for line in data:\n", 194 | " seq = line.strip()\n", 195 | " out.write('{},{}\\n'.format(seq, gc_content(seq)))" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "# Session 3: Modules\n", 203 | "\n", 204 | "- use built-in modules: math, os.path, csv and pandas\n", 205 | "- create your own" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "### math module" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "# import math\n", 224 | "import math\n", 225 | "dir(math)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "### os.path module" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": { 239 | "collapsed": false 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "# os.path module\n", 244 | "import os.path\n", 245 | "print(os.path.exists('my_file_that_does_not_exist.txt'))" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": { 252 | "collapsed": false 253 | }, 254 | "outputs": [], 255 | "source": [ 256 | "import os.path\n", 257 | "seq_filename = os.path.join('data', 'seq.txt')\n", 258 | "if (os.path.exists(seq_filename)):\n", 259 | " with open(seq_filename) as data:\n", 260 | " for line in data:\n", 261 | " print(line.strip())\n", 262 | "\n", 263 | " print(os.path.dirname(seq_filename))\n", 264 | " print(os.path.basename(seq_filename))\n", 265 | " \n", 266 | "else:\n", 267 | " print('file {} not found'.format(seq_filename))" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Ex 3.1\n", 275 
| "\n", 276 | "- Read a tab delimited file data/genes.txt \n", 277 | "- Check the file exists\n", 278 | "- Calculate the length of each gene\n", 279 | "- Write the results into a new tab separated file" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### csv module" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": { 293 | "collapsed": false 294 | }, 295 | "outputs": [], 296 | "source": [ 297 | "# csv module - reader\n", 298 | "import csv\n", 299 | "gc_content_filename = 'gc_content_data.csv'\n", 300 | "if os.path.exists(gc_content_filename):\n", 301 | " #print('file exists')\n", 302 | " with open(gc_content_filename) as data:\n", 303 | " #for line in data:\n", 304 | " # print(line)\n", 305 | " reader = csv.reader(data, delimiter = \",\")\n", 306 | " for row in reader:\n", 307 | " print(row)\n", 308 | " " 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "collapsed": false 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "import csv\n", 320 | "gc_content_filename = 'gc_content_data.csv'\n", 321 | "results = []\n", 322 | "if os.path.exists(gc_content_filename):\n", 323 | " with open(gc_content_filename) as data:\n", 324 | " reader = csv.DictReader(data, delimiter = \",\")\n", 325 | " for row in reader:\n", 326 | " results.append(row)\n", 327 | "\n", 328 | "# ordered dictionary\n", 329 | "print(results[1])\n", 330 | "\n", 331 | "for r in results:\n", 332 | " print('{}\\t{}'.format(r['seq'], r['gc']))\n" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": null, 338 | "metadata": { 339 | "collapsed": false 340 | }, 341 | "outputs": [], 342 | "source": [ 343 | "# csv module - writer\n", 344 | "with open('output.txt', 'w') as out:\n", 345 | " writer = csv.DictWriter(out, fieldnames=['seq', 'gc'], delimiter='\\t')\n", 346 | " #writer.writeheader()\n", 347 | " for r in results:\n", 
348 | " writer.writerow(r)" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "### Ex 3.2\n", 356 | "- change the script you wrote for Ex 3.1 to make use of the csv module" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": { 363 | "collapsed": false 364 | }, 365 | "outputs": [], 366 | "source": [ 367 | "# pandas module\n", 368 | "import pandas\n", 369 | "data = pandas.read_csv('gc_content_data.csv')\n", 370 | "print(data)\n", 371 | "for i, d in data.iterrows():\n", 372 | " print(d['seq'], d['gc'])" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": { 379 | "collapsed": true 380 | }, 381 | "outputs": [], 382 | "source": [ 383 | "data.to_csv('new_gc_content_data.csv', sep=',', index=False)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "### Writing your own module" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": { 397 | "collapsed": true 398 | }, 399 | "outputs": [], 400 | "source": [ 401 | "# use this function and save it into a file called tools.py\n", 402 | "def gc_content(seq):\n", 403 | " gc = 0\n", 404 | " for base in seq:\n", 405 | " if (base == 'G') or (base == 'C'):\n", 406 | " gc += 1\n", 407 | " return gc" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "metadata": { 414 | "collapsed": false 415 | }, 416 | "outputs": [], 417 | "source": [ 418 | "import tools\n", 419 | "print(tools.gc_content('AAATTTCCGG'))" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": { 426 | "collapsed": false 427 | }, 428 | "outputs": [], 429 | "source": [ 430 | "from tools import gc_content\n", 431 | "print(gc_content('AAATTTCCGG'))" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | 
"### Ex 3.3\n", 439 | "\n", 440 | "- Write a function that extracts a list of overlapping sub-sequences for a given window size for any given sequences" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "### Ex 3.4\n", 448 | "- Calculate GC content along the DNA sequence by combining the two functions writen using the tools module" 449 | ] 450 | } 451 | ], 452 | "metadata": { 453 | "kernelspec": { 454 | "display_name": "Python 3", 455 | "language": "python", 456 | "name": "python3" 457 | }, 458 | "language_info": { 459 | "codemirror_mode": { 460 | "name": "ipython", 461 | "version": 3 462 | }, 463 | "file_extension": ".py", 464 | "mimetype": "text/x-python", 465 | "name": "python", 466 | "nbconvert_exporter": "python", 467 | "pygments_lexer": "ipython3", 468 | "version": "3.6.4" 469 | } 470 | }, 471 | "nbformat": 4, 472 | "nbformat_minor": 2 473 | } 474 | -------------------------------------------------------------------------------- /python_fm_4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Working with Python: functions and modules\n", 8 | "\n", 9 | "## Session 4: Using third party libraries\n", 10 | "\n", 11 | "- [Matplotlib](#Matplotlib)\n", 12 | "- [Exercise 4.1](#Exercise-4.1)\n", 13 | "- [BioPython](#BioPython)\n", 14 | "- [Working with sequences](#Working-with-sequences)\n", 15 | "- [Connecting with biological databases](#Connecting-with-biological-databases)\n", 16 | "- [Exercise 4.2](#Exercise-4.2)" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Matplotlib\n", 24 | "\n", 25 | "[matplotlib](http://matplotlib.org/) is probably the single most used Python package for graphics. 
It provides both a very quick way to visualize data from Python and publication-quality figures in many formats.\n", 26 | "\n", 27 | "matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. \n", 28 | "\n", 29 | "Let's start with a very simple plot." 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "import matplotlib.pyplot as mpyplot\n", 39 | "mpyplot.plot([1,2,3,4])\n", 40 | "mpyplot.ylabel('some numbers')\n", 41 | "mpyplot.show()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "`plot()` is a versatile command, and will take an arbitrary number of arguments. For example, to plot x versus y, you can issue the command:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "mpyplot.plot([1,2,3,4], [1,4,9,16])" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "For every x, y pair of arguments, there is an **optional third argument** which is the format string that indicates the color and line type of the plot. The letters and symbols of the format string are from MATLAB, and you concatenate a color string with a line style string. The default format string is `b-`, which is a solid blue line. For example, to plot the above with red circles, you would choose `ro`." 
65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "import matplotlib.pyplot as mpyplot\n", 74 | "mpyplot.plot([1,2,3,4], [1,4,9,16], 'ro')\n", 75 | "mpyplot.axis([0, 6, 0, 20])\n", 76 | "mpyplot.show()" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "`matplotlib` has a few methods in the **`pyplot` module** that make creating common types of plots faster and more convenient because they automatically create a Figure and an Axes object. The most widely used are:\n", 84 | "\n", 85 | "- [mpyplot.bar](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar) – creates a bar chart.\n", 86 | "- [mpyplot.boxplot](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot) – makes a box and whisker plot.\n", 87 | "- [mpyplot.hist](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist) – makes a histogram.\n", 88 | "- [mpyplot.plot](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot) – creates a line plot.\n", 89 | "- [mpyplot.scatter](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter) – makes a scatter plot.\n", 90 | "\n", 91 | "Calling any of these methods will automatically set up `Figure` and `Axes` objects, and draw the plot. Each of these methods has different parameters that can be passed in to modify the resulting plot.\n", 92 | "\n", 93 | "The [Pyplot tutorial](http://matplotlib.org/users/pyplot_tutorial.html) is where the simple examples above come from. You can learn more from it in your own time.\n", 94 | "\n", 95 | "Let's now try to plot the GC content along the chain that we calculated during the previous session while solving Exercises 3.3 and 3.4." 
96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "seq = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG'\n", 105 | "gc = [40.0, 60.0, 80.0, 60.0, 40.0, 60.0, 40.0, 40.0, 40.0, 60.0, \n", 106 | " 40.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, \n", 107 | " 60.0, 40.0, 40.0, 40.0, 40.0, 40.0, 60.0, 60.0, 80.0, 80.0, \n", 108 | " 80.0, 60.0, 40.0, 40.0, 20.0, 40.0, 60.0, 80.0, 80.0, 80.0, \n", 109 | " 80.0, 60.0, 60.0, 60.0, 80.0, 80.0, 100.0, 80.0, 60.0, 60.0, \n", 110 | " 60.0, 40.0, 60.0]\n", 111 | "window_ids = range(len(gc))\n", 112 | "\n", 113 | "import matplotlib.pyplot as mpyplot\n", 114 | "mpyplot.plot(window_ids, gc, '--' )\n", 115 | "mpyplot.xlabel('5 bases window id along the sequence')\n", 116 | "mpyplot.ylabel('%GC')\n", 117 | "mpyplot.title('GC plot for sequence\\n' + seq)\n", 118 | "mpyplot.show()" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "## Exercise 4.1\n", 126 | "\n", 127 | "Re-use the GapMinder dataset to plot, in Jupyter using Matplotlib, from the world data the life expectancy against GDP per capita for 1957 and 2007 using a scatter plot, add title to your graph as well as a legend." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## BioPython\n", 135 | "\n", 136 | "The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...), access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules, a KD tree data structure etc. 
and documentation as well as a tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html." 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "## Working with sequences" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "We can create a sequence by defining a `Seq` object with strings. `Bio.Seq.Seq()` takes a string as input and converts it into a Seq object. We can print the sequences, individual residues, lengths and use other functions to get summary statistics. " 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "# Creating sequence\n", 160 | "from Bio.Seq import Seq\n", 161 | "my_seq = Seq(\"AGTACACTGGT\")\n", 162 | "print(my_seq)\n", 163 | "print(my_seq[10])\n", 164 | "print(my_seq[1:5])\n", 165 | "print(len(my_seq))\n", 166 | "print(my_seq.count(\"A\"))" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "We can use functions from `Bio.SeqUtils` to get an idea about a sequence " 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "# Calculate the GC percentage and the molecular weight\n", 183 | "from Bio.SeqUtils import GC, molecular_weight\n", 184 | "print(GC(my_seq))\n", 185 | "print(molecular_weight(my_seq))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "One letter code protein sequences can be converted into three letter codes using the `seq3` utility " 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [ 201 | "from Bio.SeqUtils import seq3\n", 202 | "print(seq3(my_seq))" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "An alphabet 
defines how the strings are going to be treated as sequence object. `Bio.Alphabet` module defines the available alphabets for Biopython. `Bio.Alphabet.IUPAC` provides basic definition for DNA, RNA and proteins. " 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "from Bio.Alphabet import IUPAC\n", 219 | "my_dna = Seq(\"AGTACATGACTGGTTTAG\", IUPAC.unambiguous_dna)\n", 220 | "print(my_dna)\n", 221 | "print(my_dna.alphabet)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "my_dna.complement()" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "my_dna.reverse_complement()" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "my_dna.translate()" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "### Parsing sequence file format: FASTA files" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "Sequence files can be parsed and read the same way we read other files. " 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "with open( \"data/glpa.fa\" ) as f:\n", 272 | " print(f.read())" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "Biopython provides specific functions to allow parsing/reading sequence files. 
" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "# Reading FASTA files\n", 289 | "from Bio import SeqIO\n", 290 | "\n", 291 | "with open(\"data/glpa.fa\") as f:\n", 292 | " for protein in SeqIO.parse(f, 'fasta'):\n", 293 | " print(protein.id)\n", 294 | " print(protein.seq)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "Sequence objects can be written into files using file handles with the function `SeqIO.write()`. We need to provide the name of the output sequence file and the sequence file format. " 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "# Writing FASTA files\n", 311 | "from Bio import SeqIO\n", 312 | "from Bio.SeqRecord import SeqRecord\n", 313 | "from Bio.Seq import Seq\n", 314 | "from Bio.Alphabet import IUPAC\n", 315 | "\n", 316 | "sequence = 'MYGKIIFVLLLSEIVSISASSTTGVAMHTSTSSSVTKSYISSQTNDTHKRDTYAATPRAHEVSEISVRTVYPPEEETGERVQLAHHFSEPEITLIIFG'\n", 317 | " \n", 318 | "seq = Seq(sequence, IUPAC.protein)\n", 319 | "protein = [SeqRecord(seq, id=\"THEID\", description='a description'),]\n", 320 | "\n", 321 | "with open( \"biopython.fa\", \"w\") as f:\n", 322 | " SeqIO.write(protein, f, 'fasta')\n", 323 | "\n", 324 | "with open( \"biopython.fa\" ) as f:\n", 325 | " print(f.read())" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "## Connecting with biological databases" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "Sequences can be searched and downloaded from public databases. 
" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": null, 345 | "metadata": {}, 346 | "outputs": [], 347 | "source": [ 348 | "# Read FASTA file from NCBI GenBank\n", 349 | "from Bio import Entrez\n", 350 | "\n", 351 | "Entrez.email = 'A.N.Other@example.com' # Always tell NCBI who you are\n", 352 | "handle = Entrez.efetch(db=\"nucleotide\", id=\"71066805\", rettype=\"gb\")\n", 353 | "seq_record = SeqIO.read(handle, \"gb\")\n", 354 | "handle.close()\n", 355 | "\n", 356 | "print(seq_record.id, 'with', len(seq_record.features), 'features')\n", 357 | "print(seq_record.seq)\n", 358 | "print(seq_record.format(\"fasta\"))" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "# Read SWISSPROT record\n", 368 | "from Bio import ExPASy\n", 369 | "\n", 370 | "handle = ExPASy.get_sprot_raw('HBB_HUMAN')\n", 371 | "prot_record = SeqIO.read(handle, \"swiss\")\n", 372 | "handle.close()\n", 373 | "\n", 374 | "print(prot_record.description)\n", 375 | "print(prot_record.seq)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "## Exercise 4.2" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": { 388 | "collapsed": true 389 | }, 390 | "source": [ 391 | "- Retrieve a FASTA file named `data/sample.fa` using BioPython and answer the following questions:\n", 392 | " - How many sequences are in the file?\n", 393 | " - What are the IDs and the lengths of the longest and the shortest sequences?\n", 394 | " - Select sequences longer than 500bp. What is the average length of these sequences?\n", 395 | " - Calculate and print the percentage of GC in each of the sequences.\n", 396 | " - Write the newly created sequences into a FASTA file named `long_sequences.fa` " 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "## Congratulation! 
You reached the end! " 404 | ] 405 | } 406 | ], 407 | "metadata": { 408 | "kernelspec": { 409 | "display_name": "Python 3", 410 | "language": "python", 411 | "name": "python3" 412 | }, 413 | "language_info": { 414 | "codemirror_mode": { 415 | "name": "ipython", 416 | "version": 3 417 | }, 418 | "file_extension": ".py", 419 | "mimetype": "text/x-python", 420 | "name": "python", 421 | "nbconvert_exporter": "python", 422 | "pygments_lexer": "ipython3", 423 | "version": "3.6.2" 424 | } 425 | }, 426 | "nbformat": 4, 427 | "nbformat_minor": 1 428 | } 429 | -------------------------------------------------------------------------------- /python_fm_3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Working with Python: functions and modules\n", 8 | "\n", 9 | "## Session 3: Modules\n", 10 | "\n", 11 | "- [Importing modules and libraries](#Importing-modules-and-libraries)\n", 12 | "- [Python file library](#Python-file-library)\n", 13 | "- [Exercise 3.1](#Exercise-3.1)\n", 14 | "- [Using the `csv` module](#Using-the-csv-module)\n", 15 | "- [Exercise 3.2](#Exercise-3.2)\n", 16 | "- [Create your own module](#Create-your-own-module)\n", 17 | "- [Exercise 3.3](#Exercise-3.3)\n", 18 | "- [Exercise 3.4](#Exercise-3.4)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Importing modules and libraries" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "Like other languages, Python has the ability to import external modules (or libraries) into the current program. These modules may be part of the standard library that is automatically included with the Python installation, they may be extra libraries which you install separately or they may be other Python programs you have written yourself. 
Whatever its source, a module is imported into a program via an **`import`** statement.\n", 33 | "\n", 34 | "For example, if we wish to access the mathematical constants `pi` and `e` we can use the `import` keyword to get [the module named `math`](https://docs.python.org/3/library/math.html) and access its contents with the dot notation:" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "import math\n", 44 | "print(math.pi, math.e)" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "We can also use the `as` keyword to give the module a different name in our code, which can be useful for brevity and avoiding name conflicts:" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "import math as m\n", 61 | "print(m.pi, m.e)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "Alternatively we can import the separate components using the `from … import` keyword combination:" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "from math import pi, e\n", 78 | "print(pi, e)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "We can import multiple components from a single module, either on one line as seen above or on separate lines:" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": { 92 | "collapsed": true 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "from math import pi\n", 97 | "from math import e" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### Listing module contents\n", 105 | "\n", 106 | "Using the [function 
`dir()`](https://docs.python.org/3/library/functions.html?highlight=dir#dir) and passing the module name:" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "import math\n", 116 | "dir(math)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "or directly using an instance, like with this String:" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "dir(\"mystring\")" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "or using the object type" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "dir(str)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "### Getting help from the official Python documentation" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "The most useful information is online on https://www.python.org/ website and should be used as a reference guide.\n", 163 | "\n", 164 | "- [Python3 documentation](https://docs.python.org/3/) is the starting page with links to tutorials and libraries' documentation for Python 3\n", 165 | " - [The Python Tutorial](https://docs.python.org/3/tutorial/index.html)\n", 166 | " - [Modules](https://docs.python.org/3/tutorial/modules.html)\n", 167 | " - [Brief Tour of the Standard Library: Mathematics](https://docs.python.org/3/tutorial/stdlib.html#mathematics)\n", 168 | " - [The Python Standard Library Reference](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python like:\n", 169 | " - [`math` — Mathematical 
functions](https://docs.python.org/3/library/math.html)\n", 170 | " - [`os.path` — Common pathname manipulations](https://docs.python.org/3/library/os.path.html)\n", 171 | " - [`os` — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)\n", 172 | " - [`csv` — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "## Python file library" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "### [`os.path` — Common pathname manipulations](https://docs.python.org/3/library/os.path.html)\n", 187 | "\n", 188 | "- `exists(path)` : returns whether path exists\n", 189 | "- `isfile(path)` : returns whether path is a “regular” file (as opposed to a directory)\n", 190 | "- `isdir(path)` : returns whether path is a directory\n", 191 | "- `islink(path)` : returns whether path is a symbolic link\n", 192 | "- `join(*paths)` : joins the paths together into one long path\n", 193 | "- `dirname(path)` : returns directory containing the path\n", 194 | "- `basename(path)` : returns the path minus the dirname(path) in front\n", 195 | "- `split(path)` : returns (dirname(path), basename(path))\n", 196 | "\n", 197 | "### [`os` — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)\n", 198 | "\n", 199 | "- `chdir(path)` : change the current working directory to be path\n", 200 | "- `getcwd()` : return the current working directory\n", 201 | "- `listdir(path)` : returns a list of files/directories in the directory path\n", 202 | "- `mkdir(path)` : create the directory path\n", 203 | "- `rmdir(path)` : remove the directory path\n", 204 | "- `remove(path)` : remove the file path\n", 205 | "- `rename(src, dst)` : move the file/directory from src to dst" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | 
"Building the path to your file from a list of directory and filename makes your script able to run on any platforms." 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "import os.path\n", 222 | "os.path.join(\"data\", \"mydata.txt\")\n", 223 | "# data/mydata.txt - Unix\n", 224 | "# data\\mydata.txt - Windows" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "Check if a file exists before opening it:" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "import os.path\n", 241 | "data_file = os.path.join(\"data\", \"mydata.txt\")\n", 242 | "if os.path.exists(data_file):\n", 243 | " print(\"file\", data_file, \"exists\")\n", 244 | " with open(data_file) as f:\n", 245 | " print(f.read())\n", 246 | "else:\n", 247 | " print(\"file\", data_file, \"not found!\")" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "## Exercise 3.1\n", 255 | "\n", 256 | "Write a script that reads a tab delimited file which has 4 columns: gene, chromosome, start and end coordinates. Check if the file exists, then compute the length of each gene and store its name and corresponding length into a dictionary. Write the results into a new tab separated file. You can find a data file in `data/genes.txt` directory of the course materials." 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "## Using the `csv` module\n", 264 | "\n", 265 | "The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. The csv module implements methods to read and write tabular data in CSV format.\n", 266 | "\n", 267 | "The csv module’s `reader()` and `writer()` methods read and write CSV files. 
You can also read and write data in dictionary form using the `DictReader()` and `DictWriter()` methods.\n", 268 | "\n", 269 | "For more information about this built-in Python library, see the [CSV File Reading and Writing documentation](https://docs.python.org/3/library/csv.html).\n", 270 | "\n", 271 | "Let's now read our `data/mydata.txt` space-separated file using the `csv` module." 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "import csv\n", 281 | "with open(\"data/mydata.txt\") as f:\n", 282 | " reader = csv.reader(f, delimiter = \" \") # default delimiter is \",\"\n", 283 | " for row in reader:\n", 284 | " print(row)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Replace `csv.reader()` with `csv.DictReader()` and each row is automatically returned as a dictionary keyed by the column headers." 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "with open(\"data/mydata.txt\") as f:\n", 301 | " reader = csv.DictReader(f, delimiter = \" \")\n", 302 | " for row in reader:\n", 303 | " print(row)" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "# Write a tab delimited file using the csv module\n", 313 | "import csv\n", 314 | "\n", 315 | "mydata = [\n", 316 | " ['1', 'Human', '1.076'], \n", 317 | " ['2', 'Mouse', '1.202'], \n", 318 | " ['3', 'Frog', '2.2362'], \n", 319 | " ['4', 'Fly', '0.9853']\n", 320 | "]\n", 321 | "\n", 322 | "with open(\"data.txt\", \"w\") as f:\n", 323 | " writer = csv.writer(f, delimiter='\\t' )\n", 324 | " writer.writerow( [ \"Index\", \"Organism\", \"Score\" ] ) # write header\n", 325 | " for record in mydata:\n", 326 | " writer.writerow( record )\n", 327 | "\n", 328 | "# Open the output 
file and print out its content\n", 329 | "with open(\"data.txt\") as f:\n", 330 | " print(f.read())" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "# Write a delimited file using the csv module from a list of dictionaries \n", 340 | "import csv\n", 341 | "\n", 342 | "mydata = [\n", 343 | " {'Index': '1', 'Score': '1.076', 'Organism': 'Human'}, \n", 344 | " {'Index': '2', 'Score': '1.202', 'Organism': 'Mouse'}, \n", 345 | " {'Index': '3', 'Score': '2.2362', 'Organism': 'Frog'}, \n", 346 | " {'Index': '4', 'Score': '0.9853', 'Organism': 'Fly'}\n", 347 | "]\n", 348 | "\n", 349 | "with open(\"dict_data.txt\", \"w\") as f:\n", 350 | " writer = csv.DictWriter(f, mydata[0].keys(), delimiter='\\t')\n", 351 | " writer.writeheader() # write header\n", 352 | "\n", 353 | " for record in mydata:\n", 354 | " writer.writerow( record )\n", 355 | "\n", 356 | "# Open the output file and print out its content\n", 357 | "with open(\"dict_data.txt\") as f:\n", 358 | " print(f.read())" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "## Exercise 3.2\n", 366 | "\n", 367 | "Now change the script you wrote for [Exercise 3.1](#Exercise-3.1) to make use of the `csv` module." 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "## Create your own module\n", 375 | "\n", 376 | "So far we have been writing Python code in files as executable scripts without knowing that they are also modules from which we are able to call the different functions defined in them.\n", 377 | "\n", 378 | "A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. 
Create a file called `my_first_module.py` in the current directory with the following contents:" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": { 385 | "collapsed": true 386 | }, 387 | "outputs": [], 388 | "source": [ 389 | "def say_hello(user):\n", 390 | " print('hello', user, '!')" 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": {}, 396 | "source": [ 397 | "Now start the Python interpreter from the directory where you created the `my_first_module.py` file and import the `say_hello` function from this module as follows:\n", 398 | "\n", 399 | "```bash\n", 400 | "python3\n", 401 | "Python 3.5.2 (default, Jun 30 2016, 18:10:25) \n", 402 | "[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin\n", 403 | "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n", 404 | ">>> from my_first_module import say_hello\n", 405 | ">>> say_hello('Anne')\n", 406 | "hello Anne !\n", 407 | ">>> \n", 408 | "```\n", 409 | "\n", 410 | "A module called `my_first_module.py` is already stored in the course directory; to import it into this notebook, run the cell below. If you edit this file to change the code or add another function, you will have to restart the kernel, using the restart kernel button in the menu bar, for these changes to take effect." 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": null, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [ 419 | "from my_first_module import say_hello\n", 420 | "say_hello('Anne')" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement. 
\n", 428 | "They are also run if the file is executed as a script.\n", 429 | "\n", 430 | "Do comment out these executable statements if you do not wish to have them executed when importing your module.\n", 431 | "\n", 432 | "For more information about modules, https://docs.python.org/3/tutorial/modules.html." 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "## Exercise 3.3\n", 440 | "\n", 441 | "### Calculate the GC content of a DNA sequence\n", 442 | "\n", 443 | "Write a function that calculates the GC content of a DNA sequence.\n", 444 | "\n", 445 | "### Extract the list of all overlapping sub-sequences\n", 446 | "\n", 447 | "Write a function that extracts a list of overlapping sub-sequences for a given window size from a given sequence. Do not forget to test it on a given DNA sequence." 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "## Exercise 3.4\n", 455 | "### Calculate GC content along the DNA sequence\n", 456 | "Combine the two methods written above to calculate the GC content of each overlapping sliding window along a DNA sequence from start to end. \n", 457 | "\n", 458 | "Import the two methods you wrote above at exercise 3.3, to solve this exercise.\n", 459 | "\n", 460 | "The new function should take two arguments, the DNA sequence and the size of the sliding window, and re-use the previous methods written to calculate the GC content of a DNA sequence and to extract the list of all overlapping sub-sequences. It returns a list of GC% along the DNA sequence." 
461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "## Next session\n", 468 | "\n", 469 | "Go to our next notebook: [python_functions_and_modules_4](python_fm_4.ipynb)" 470 | ] 471 | } 472 | ], 473 | "metadata": { 474 | "kernelspec": { 475 | "display_name": "Python 3", 476 | "language": "python", 477 | "name": "python3" 478 | }, 479 | "language_info": { 480 | "codemirror_mode": { 481 | "name": "ipython", 482 | "version": 3 483 | }, 484 | "file_extension": ".py", 485 | "mimetype": "text/x-python", 486 | "name": "python", 487 | "nbconvert_exporter": "python", 488 | "pygments_lexer": "ipython3", 489 | "version": "3.6.2" 490 | } 491 | }, 492 | "nbformat": 4, 493 | "nbformat_minor": 1 494 | } 495 | -------------------------------------------------------------------------------- /python_fm_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "-" 8 | } 9 | }, 10 | "source": [ 11 | "# Working with Python: functions and modules\n", 12 | "\n", 13 | "- Our course webpage: http://pycam.github.io\n", 14 | "- Python website: https://www.python.org/ \n", 15 | "\n", 16 | "## Session 1: Introduction to Python\n", 17 | "- [Printing](#Printing)\n", 18 | "- [Variables](#Variables)\n", 19 | "- [Simple data types](#Simple-data-types)\n", 20 | "- [Arithmetic operations](#Arithmetic-operations)\n", 21 | "- Collections: [Lists](#Lists) | [Dictionnaries](#Dictionnaries) | [Sets](#Sets) | [Tuples](#Tuples) | [Strings](#Strings)\n", 22 | "- [Conditional execution](#Conditional-execution) \n", 23 | "- [Comparison operations](#Comparison-operations)\n", 24 | "- [Loops](#Loops)\n", 25 | "- [Files](#Files)\n", 26 | "- [Getting help](#Getting-help)\n", 27 | "- [Exercise 1.1](#Exercise-1.1)" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Printing\n", 
35 | "\n", 36 | "You can include a ***comment*** in Python by prefixing some text with a **`#` character**. All text following the `#` on that line is ignored by the interpreter." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "print('Hello from python!') # to print some text, enclose it between quotation marks - single\n", 46 | "print(\"I'm here today!\") # or double\n", 47 | "print(34) # print an integer\n", 48 | "print(2 + 4) # print the result of an arithmetic operation\n", 49 | "print(\"The answer is\", 42) # print multiple expressions, separated by commas" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "## Variables\n", 57 | "\n", 58 | "A variable can be assigned a simple value or the outcome of a more complex expression.\n", 59 | "The **`=` operator** is used to assign a value to a variable." 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "x = 3 # assignment of a simple value\n", 69 | "print(x)\n", 70 | "y = x + 5 # assignment of a more complex expression\n", 71 | "print(y)\n", 72 | "i = 12\n", 73 | "print(i)\n", 74 | "i = i + 1 # assignment of the current value of a variable incremented by 1 to itself\n", 75 | "print(i)\n", 76 | "i += 1 # shorter version with the special += operator\n", 77 | "print(i)" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": { 83 | "slideshow": { 84 | "slide_type": "-" 85 | } 86 | }, 87 | "source": [ 88 | "## Simple data types\n", 89 | "\n", 90 | "Python has four main basic data types." 
91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": { 97 | "slideshow": { 98 | "slide_type": "-" 99 | } 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "a = 2 # integer\n", 104 | "b = 5.0 # float\n", 105 | "c = 'word' # string\n", 106 | "d = 4 > 5 # boolean True or False\n", 107 | "e = None # special built-in value to create a variable that has not been set to anything specific\n", 108 | "print(a, b, c, d, e)\n", 109 | "print(a, 'is of type', type(a)) # to check the type of a variable " 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "## Arithmetic operations" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "a = 2 # assignment\n", 126 | "a += 1 # change and assign (*=, /=)\n", 127 | "3 + 2 # addition\n", 128 | "3 - 2 # subtraction\n", 129 | "3 * 2 # multiplication\n", 130 | "3 / 2 # integer (python2) or float (python3) division\n", 131 | "\n", 132 | "3 // 2 # integer division\n", 133 | "3 % 2 # remainder\n", 134 | "3 ** 2 # exponent" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": { 140 | "slideshow": { 141 | "slide_type": "-" 142 | } 143 | }, 144 | "source": [ 145 | "## Lists\n", 146 | "\n", 147 | "A list is an ordered collection of mutable elements." 
148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": { 154 | "slideshow": { 155 | "slide_type": "-" 156 | } 157 | }, 158 | "outputs": [], 159 | "source": [ 160 | "a = ['red', 'blue', 'green'] # manual initialisation\n", 161 | "copy_of_a = a[:] # copy of a \n", 162 | "another_a = a # another name for the same list object as a\n", 163 | "b = list(range(5)) # initialise from iterable\n", 164 | "c = [1, 2, 3, 4, 5, 6] # manual initialisation\n", 165 | "len(c) # length of the list\n", 166 | "d = c[0] # access first element at index 0\n", 167 | "e = c[1:3] # access a slice of the list, \n", 168 | " # including element at index 1 up to but not including element at index 3\n", 169 | "f = c[-1] # access last element\n", 170 | "c[1] = 8 # assign new value at index position 1\n", 171 | "g = ['re', 'bl'] + ['gr'] # list concatenation\n", 172 | "['re', 'bl'].index('re') # returns index of 're'\n", 173 | "a.append('yellow') # add new element to end of list\n", 174 | "a.extend(b) # add elements from list `b` to end of list `a`\n", 175 | "a.insert(1, 'yellow') # insert element in specified position\n", 176 | "'re' in ['re', 'bl'] # true if 're' in list\n", 177 | "'fi' not in ['re', 'bl'] # true if 'fi' not in list\n", 178 | "c.sort() # sort list in place\n", 179 | "h = sorted([3, 2, 1]) # returns sorted list\n", 180 | "i = a.pop(2) # remove and return item at index (default last)\n", 181 | "print(a, b, c, d, e, f, g, h, i)\n", 182 | "print(a, copy_of_a, another_a)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## Dictionnaries\n", 190 | "\n", 191 | "A dictionary is an unordered collection of key-value pairs where keys must be unique." 
192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": { 198 | "slideshow": { 199 | "slide_type": "-" 200 | } 201 | }, 202 | "outputs": [], 203 | "source": [ 204 | "a = {'A': 'Adenine', 'C': 'Cytosine'} # dictionary\n", 205 | "b = a['A'] # look up the value stored under key 'A'\n", 206 | "c = a.get('N', 'no value found') # return default value if key is missing\n", 207 | "'A' in a # true if dictionary a contains key 'A'\n", 208 | "a['G'] = 'Guanine' # assign new key, value pair to dictionary a\n", 209 | "a['T'] = 'Thymine' # assign new key, value pair to dictionary a\n", 210 | "print(a)\n", 211 | "d = a.keys() # get a view of the keys\n", 212 | "e = a.values() # get a view of the values\n", 213 | "f = a.items() # get a view of the key-value pairs\n", 214 | "print(b, c, d, e, f)\n", 215 | "del a['A'] # delete key and associated value\n", 216 | "print(a)" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "## Sets\n", 224 | "\n", 225 | "A set is an unordered collection of unique elements. 
" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "slideshow": { 233 | "slide_type": "-" 234 | } 235 | }, 236 | "outputs": [], 237 | "source": [ 238 | "a = {1, 2, 3} # initialise manually\n", 239 | "b = set(range(5)) # initialise from iterable\n", 240 | "c = set([1,2,2,2,2,4,5,6,6,6]) # initialise from list\n", 241 | "a.add(13) # add new element to set\n", 242 | "a.remove(13) # remove element from set\n", 243 | "2 in {1, 2, 3} # true if 2 in set\n", 244 | "5 not in {1, 2, 3} # true if 5 not in set\n", 245 | "d = a.union(b) # return the union of sets as a new set\n", 246 | "e = a.intersection(b) # return the intersection of sets as a new set\n", 247 | "print(a, b, c, d, e)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "## Tuples\n", 255 | "\n", 256 | "A tuple is an ordered, immutable collection of elements. 
Tuples are similar to lists, but the elements in a tuple cannot be modified. Most of the list operations seen above can be used on tuples, except the assignment of a new value at a certain index position." 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "a = (123, 54, 92) # initialise manually\n", 266 | "b = () # empty tuple\n", 267 | "c = (\"Ala\",) # tuple of a single string (note the trailing \",\")\n", 268 | "d = (2, 3, False, \"Arg\", None) # a tuple of mixed types\n", 269 | "print(a, b, c, d)\n", 270 | "t = a, c, d # tuple packing\n", 271 | "x, y, z = t # tuple unpacking\n", 272 | "print(t, x, y, z)" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "## Strings\n", 280 | "\n", 281 | "A string is an ordered, immutable sequence of characters." 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": { 288 | "slideshow": { 289 | "slide_type": "-" 290 | } 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "a = 'red' # assignment\n", 295 | "char = a[2] # access individual characters\n", 296 | "b = 'red' + 'blue' # string concatenation\n", 297 | "c = '1, 2, three'.split(',') # split string into list\n", 298 | "d = '.'.join(['1', '2', 'three']) # concatenate list into string\n", 299 | "print(a, char, b, c, d) \n", 300 | "dna = 'ATGTCACCGTTT' # assignment\n", 301 | "seq = list(dna) # convert string into list of characters\n", 302 | "e = len(dna) # return string length\n", 303 | "f = dna[2:5] # slice string\n", 304 | "g = dna.find('TGA') # substring location, return -1 when not found\n", 305 | "print(dna, seq, e, f, g)\n", 306 | "text = ' chrom start end ' # assignment\n", 307 | "print('>', text, '<')\n", 308 | "print('>', text.strip(), '<') # remove unwanted whitespace at both ends of the string\n", 309 | "print('{:.2f}'.format(0.4567)) # formatting 
string\n", 310 | "print('{gene:s}\\t{exp:+.2f}'.format(gene='Beta-Actin', exp=1.7))" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## Conditional execution\n", 318 | "\n", 319 | "A conditional **`if/elif`** statement is used to specify that some block of code should only be executed if a conditional expression evaluates to `True`, there can be a final **`else`** statement to do something if all of the conditions are `False`.\n", 320 | "Python uses **indentation** to show which statements are in a block of code. " 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "a, b = 1, 2 # assign different values to a and b\n", 330 | "if a + b == 3:\n", 331 | " print('True')\n", 332 | "elif a + b == 1:\n", 333 | " print('False')\n", 334 | "else:\n", 335 | " print('?')" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "## Comparison operations" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": {}, 349 | "outputs": [], 350 | "source": [ 351 | "1 == 1 # equal value\n", 352 | "1 != 2 # not equal\n", 353 | "2 > 1 # larger\n", 354 | "2 < 1 # smaller\n", 355 | "\n", 356 | "1 != 2 and 2 < 3 # logical AND\n", 357 | "1 != 2 or 2 < 3 # logical OR\n", 358 | "not 1 == 2 # logical NOT\n", 359 | "\n", 360 | "a = list('ATGTCACCGTTT')\n", 361 | "b = a # same as a\n", 362 | "c = a[:] # copy of a\n", 363 | "'N' in a # test if character 'N' is in a\n", 364 | "\n", 365 | "print('a', a) # print a\n", 366 | "print('b', b) # print b\n", 367 | "print('c', c) # print c\n", 368 | "print('Is N in a?', 'N' in a)\n", 369 | "print('Are objects b and a point to the same memory address?', b is a)\n", 370 | "print('Are objects c and a point to the same memory address?', c is a)\n", 371 | "print('Are values of b and a identical?', b == a)\n", 372 | 
"print('Are values of c and a identical?', c == a)\n", 373 | "a[0] = 'N' # modify a \n", 374 | "print('a', a) # print a\n", 375 | "print('b', b) # print b\n", 376 | "print('c', c) # print c\n", 377 | "print('Is N in a?', 'N' in a)\n", 378 | "print('Are objects b and a point to the same memory address?', b is a)\n", 379 | "print('Are objects c and a point to the same memory address?', c is a)\n", 380 | "print('Are values of b and a identical?', b == a)\n", 381 | "print('Are values of c and a identical?', c == a)" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "## Loops\n", 389 | "\n", 390 | "There are two ways of creating loops in Python, the **`for` loop** and the **`while` loop**." 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "a = ['red', 'blue', 'green']\n", 400 | "for color in a:\n", 401 | " print(color)" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [ 410 | "number = 1\n", 411 | "while number < 10:\n", 412 | " print(number)\n", 413 | " number += 1" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "Python has two ways of affecting the flow of the `for` or `while` loop inside the block. The **`break`** statement immediately causes all looping to finish, and execution is resumed at the next statement after the loop. The **`continue`** statement means that the rest of the code in the block is skipped for this particular item in the collection." 
421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [ 429 | "# break\n", 430 | "sequence = ['CAG','TAC','CAA','TAG','TAC','CAG','CAA']\n", 431 | "for codon in sequence:\n", 432 | " if codon == 'TAG':\n", 433 | " break # Quit looping at this point\n", 434 | " else:\n", 435 | " print(codon)\n", 436 | "\n", 437 | "# continue\n", 438 | "values = [10, -5, 3, -1, 7]\n", 439 | "total = 0\n", 440 | "for v in values:\n", 441 | " if v < 0:\n", 442 | " continue # Skip this iteration \n", 443 | " total += v\n", 444 | "print(values, 'sum:', sum(values), 'total:', total)" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "## Files\n", 452 | "\n", 453 | "To read from a file, your program needs to open the file and then read the contents of the file. You can read the entire contents of the file at once, or read the file line by line. The **`with`** statement makes sure the file is closed properly when the program has finished accessing the file.\n", 454 | "\n", 455 | "\n", 456 | "Passing the `'w'` argument to `open()` tells Python you want to write to the file. Be careful; this will erase the contents of the file if it already exists. Passing the `'a'` argument tells Python you want to append to the end of an existing file." 
457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": {}, 463 | "outputs": [], 464 | "source": [ 465 | "# reading from file\n", 466 | "with open(\"data/genes.txt\") as f:\n", 467 | " for line in f:\n", 468 | " print(line.strip())\n", 469 | "\n", 470 | "# writing to a file\n", 471 | "with open('programming.txt', 'w') as f:\n", 472 | " f.write(\"I love programming in Python!\\n\")\n", 473 | " f.write(\"I love making scripts.\\n\")\n", 474 | " \n", 475 | "# appending to a file \n", 476 | "with open('programming.txt', 'a') as f:\n", 477 | " f.write(\"I love working with data.\\n\")" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "## Getting help\n", 485 | "\n", 486 | "[The Python 3 Standard Library](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python as well as built-in functions and data types." 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": null, 492 | "metadata": {}, 493 | "outputs": [], 494 | "source": [ 495 | "help(len) # help on built-in function\n", 496 | "help(list.extend) # help on list function" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "# help within jupyter\n", 506 | "len?" 
507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "## Exercise 1.1\n", 514 | "\n", 515 | "We are going to look at a [Gapminder](https://www.gapminder.org/) dataset, made famous by Hans Rosling in the TED talk [‘The best stats you’ve ever seen’](http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen).\n", 516 | "\n", 517 | "- Read the dataset from the file `data/gapminder.txt` \n", 518 | "- Find the earliest and latest years in the dataset programmatically \n", 519 | "- Calculate average life expectancy as well as global population increase between these two years\n", 520 | "- Find which country has the lowest life expectancy in 2002" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": { 526 | "slideshow": { 527 | "slide_type": "-" 528 | } 529 | }, 530 | "source": [ 531 | "## Next session\n", 532 | "\n", 533 | "Go to our next notebook: [python_functions_and_modules_2](python_fm_2.ipynb)" 534 | ] 535 | } 536 | ], 537 | "metadata": { 538 | "celltoolbar": "Slideshow", 539 | "kernelspec": { 540 | "display_name": "Python 3", 541 | "language": "python", 542 | "name": "python3" 543 | }, 544 | "language_info": { 545 | "codemirror_mode": { 546 | "name": "ipython", 547 | "version": 3 548 | }, 549 | "file_extension": ".py", 550 | "mimetype": "text/x-python", 551 | "name": "python", 552 | "nbconvert_exporter": "python", 553 | "pygments_lexer": "ipython3", 554 | "version": "3.6.2" 555 | } 556 | }, 557 | "nbformat": 4, 558 | "nbformat_minor": 1 559 | } 560 | -------------------------------------------------------------------------------- /python_fm_intro.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true, 8 | "nbpresent": { 9 | "id": "dc7a1635-0bbd-4bf7-a07e-7a36f58e258b" 10 | }, 11 | "slideshow": { 12 | 
"slide_type": "slide" 13 | } 14 | }, 15 | "source": [ 16 | "# Working with Python: functions and modules\n", 17 | "\n", 18 | "
" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "deletable": true, 25 | "editable": true, 26 | "slideshow": { 27 | "slide_type": "slide" 28 | } 29 | }, 30 | "source": [ 31 | "## Presenters\n", 32 | "\n", 33 | "- Anne\n", 34 | "- Mukarram" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "deletable": true, 41 | "editable": true, 42 | "slideshow": { 43 | "slide_type": "slide" 44 | } 45 | }, 46 | "source": [ 47 | "## Aims\n", 48 | "\n", 49 | "This course will cover concepts and strategies for working **more effectively** with Python with the aim of: \n", 50 | "\n", 51 | "- Writing **reusable** code, using **functions** and **libraries**\n", 52 | "- Acquiring a working knowledge of **key concepts** which are prerequisites for advanced programming in Python e.g. writing modules and classes\n", 53 | "- During this course you will learn about:\n", 54 | " - Writing functions\n", 55 | " - Best practices to write reusable code\n", 56 | " - Structuring code in a custom module\n", 57 | " - Using Python libraries\n", 58 | " - Drawing plots with Matplotlib and working with biological data using BioPython" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "deletable": true, 65 | "editable": true, 66 | "nbpresent": { 67 | "id": "21082cb9-e1b9-4fe9-80d5-9d9e8418937b" 68 | }, 69 | "slideshow": { 70 | "slide_type": "slide" 71 | } 72 | }, 73 | "source": [ 74 | "## Learning objectives\n", 75 | "- **Recall** the basic syntax, how to print and define variables\n", 76 | "- **List** the most common data structures in Python\n", 77 | "- **Explain** how to write conditions and loops in Python\n", 78 | "- **Practice** reading and writing files with Python\n", 79 | "- **Explain** how to write user-defined functions and modules in Python\n", 80 | "- **Use** existing in-built as well as third-party Python libraries\n", 81 | "- **Solve** more complex exercises using these concepts" 82 | ] 83 | }, 84 | { 85 | 
"cell_type": "markdown", 86 | "metadata": { 87 | "deletable": true, 88 | "editable": true, 89 | "nbpresent": { 90 | "id": "ceb5f5a0-a5e8-435e-ae16-23c2ba8c6ab2" 91 | }, 92 | "slideshow": { 93 | "slide_type": "slide" 94 | } 95 | }, 96 | "source": [ 97 | "## Course schedule\n", 98 | "\n", 99 | "- 09:30-10:00: [0h30] **Introduction**\n", 100 | "- 10:00-11:00: [1h00] **Session 1** - Introduction to Python\n", 101 | "- 11:00-11:15: *break*\n", 102 | "- 11:15-13:00: [1h45] **Session 2** - Functions\n", 103 | "- 13:00-14:00: *lunch break*\n", 104 | "- 14:00-15:15: [1h15] **Session 3** - Modules\n", 105 | "- 15:15-15:30: *break*\n", 106 | "- 15:30-16:30: [1h00] **Session 4** - Matplotlib & BioPython" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "deletable": true, 113 | "editable": true, 114 | "nbpresent": { 115 | "id": "8458de53-35b5-405e-a372-5db5d2e2c2c5" 116 | }, 117 | "slideshow": { 118 | "slide_type": "slide" 119 | } 120 | }, 121 | "source": [ 122 | "## Course materials\n", 123 | "\n", 124 | "- There are two course webpages with links to the materials, example solutions to the exercises etc.:\n", 125 | " - http://pycam.github.io\n", 126 | " - https://github.com/pycam\n", 127 | "- We’d like you to follow along with the example code as we go through the material, and attempt the exercises to practice what you’ve learned\n", 128 | "- Questions are welcome at any point!\n", 129 | "- If you have specific projects/problems that you think could be attempted using Python, we are happy to (try to) help during the exercises. 
Just let us know!\n" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": { 135 | "deletable": true, 136 | "editable": true, 137 | "nbpresent": { 138 | "id": "96ca5c44-2cfc-471c-8da7-39870c822e20" 139 | }, 140 | "slideshow": { 141 | "slide_type": "slide" 142 | } 143 | }, 144 | "source": [ 145 | "## What is *Python*?\n", 146 | "\n", 147 | "- Python is a *dynamic, interpreted* general purpose programming language initially created by Guido van Rossum in 1991\n", 148 | "- It is a powerful language that supports several popular programming paradigms:\n", 149 | " - procedural\n", 150 | " - object-oriented\n", 151 | " - functional\n", 152 | "- Python is widely used in bioinformatics and scientific computing, as well as many other academic areas and in industry too\n", 153 | "- Python is available on all popular operating systems\n", 154 | " - Mac\n", 155 | " - Windows\n", 156 | " - Linux" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "deletable": true, 163 | "editable": true, 164 | "nbpresent": { 165 | "id": "9110098b-9675-4d64-adf3-c947073d4c4d" 166 | }, 167 | "slideshow": { 168 | "slide_type": "slide" 169 | } 170 | }, 171 | "source": [ 172 | "## The Python programming language\n", 173 | "\n", 174 | "- Python is considered to come with \"batteries included\" and the standard library (some of which we will see in this course) provides built-in support for lots of common tasks:\n", 175 | " - numerical & mathematical functions \n", 176 | " - interacting with files and the operating system\n", 177 | " - ...\n", 178 | "\n", 179 | "- There is also a wide range of external libraries for areas not covered in the standard library, such as [Matplotlib](http://matplotlib.org/) the Python plotting library and the [BioPython](http://biopython.org/) Library which provides tools for bioinformatics - we look at this later" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": { 185 | "deletable": true, 186 | 
"editable": true, 187 | "nbpresent": { 188 | "id": "0d61b4b4-163f-47fe-80f1-092287218273" 189 | }, 190 | "slideshow": { 191 | "slide_type": "slide" 192 | } 193 | }, 194 | "source": [ 195 | "## Getting started\n", 196 | "\n", 197 | "- Python is an *interpreted* language, this means that your computer does not run Python code natively, but instead we run our code using the Python interpreter\n", 198 | "- There are three ways in which you can run Python code:\n", 199 | " - Directly typing **commands into the interpreter**: *Good for experimenting with the language, and for some interactive work*\n", 200 | " - Using a **Jupyter notebook**: *Great for experimenting with the language, and for sharing and learning*\n", 201 | " - Typing code **into a file** and then telling the interpreter to run the code from this file: *Good for larger programs, and when you want to run the same code repeatedly*\n" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": { 207 | "deletable": true, 208 | "editable": true, 209 | "nbpresent": { 210 | "id": "8a4ac456-6c4b-4249-8662-b1cabfd7cee4" 211 | }, 212 | "slideshow": { 213 | "slide_type": "slide" 214 | } 215 | }, 216 | "source": [ 217 | "## How to start the Python interpreter?\n", 218 | "\n", 219 | "On a Mac or Linux machine you should start a terminal and then just type the command `python3`.\n", 220 | "
" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "deletable": true, 227 | "editable": true, 228 | "nbpresent": { 229 | "id": "f5bcbcb5-4352-4674-a7b6-c8e576220422" 230 | }, 231 | "slideshow": { 232 | "slide_type": "slide" 233 | } 234 | }, 235 | "source": [ 236 | "## How to run Python code from a file?\n", 237 | "\n", 238 | "Running Python code is as simple as opening a Terminal window and typing the command `python3` followed by the name of the script.\n", 239 | "\n", 240 | "```bash\n", 241 | "python3 scripts/hello.py\n", 242 | "```\n", 243 | "
" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": { 249 | "deletable": true, 250 | "editable": true, 251 | "nbpresent": { 252 | "id": "9814e8d7-60e0-43e6-aee0-3c33cc2cc809" 253 | }, 254 | "slideshow": { 255 | "slide_type": "slide" 256 | } 257 | }, 258 | "source": [ 259 | "## What is a Jupyter notebook?\n", 260 | "\n", 261 | "\n", 262 | "\n", 263 | "- The [Jupyter Notebook](http://jupyter.org/) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. \n", 264 | "\n", 265 | "- Jupyter provides a rich architecture for interactive data science and scientific computing with: \n", 266 | " - Over 40 programming languages such as Python, R, Julia and Scala.\n", 267 | " - A browser-based notebook with support for code, rich text, math expressions, plots and other rich media.\n", 268 | " - Support for interactive data visualization.\n", 269 | " - Easy to use tools for parallel computing." 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "deletable": true, 276 | "editable": true, 277 | "nbpresent": { 278 | "id": "62fdd00c-a006-4f11-b9dc-e2ca072225d7" 279 | }, 280 | "slideshow": { 281 | "slide_type": "slide" 282 | } 283 | }, 284 | "source": [ 285 | "## How to install Jupyter on your own computer?\n", 286 | "\n", 287 | "\n", 288 | "\n", 289 | "- We recommend using a virtual environment after having installed [Python 3](https://www.python.org/).\n", 290 | "```bash\n", 291 | "python3 -m venv venv\n", 292 | "source venv/bin/activate # activate your virtual environment\n", 293 | "pip install jupyter\n", 294 | "```\n", 295 | "\n", 296 | "- Start the notebook server from the command line:\n", 297 | "```\n", 298 | "jupyter notebook\n", 299 | "```\n", 300 | "- You should see the notebook home page open in your web browser.\n" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": { 306 | "deletable": true, 307 | "editable": 
true, 308 | "nbpresent": { 309 | "id": "0e25dad3-add0-466e-8f71-e771d6ec4500" 310 | }, 311 | "slideshow": { 312 | "slide_type": "slide" 313 | } 314 | }, 315 | "source": [ 316 | "## How to run Python in a Jupyter notebook?\n", 317 | "\n", 318 | "\n", 319 | "\n", 320 | "- See [Jupyter Notebook Basics](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb)\n", 321 | "\n", 322 | "\n", 323 | "- Go to our notebook for the first session: [python_functions_and_modules_1](python_fm_1.ipynb)" 324 | ] 325 | } 326 | ], 327 | "metadata": { 328 | "anaconda-cloud": {}, 329 | "celltoolbar": "Slideshow", 330 | "kernelspec": { 331 | "display_name": "Python 3", 332 | "language": "python", 333 | "name": "python3" 334 | }, 335 | "language_info": { 336 | "codemirror_mode": { 337 | "name": "ipython", 338 | "version": 3 339 | }, 340 | "file_extension": ".py", 341 | "mimetype": "text/x-python", 342 | "name": "python", 343 | "nbconvert_exporter": "python", 344 | "pygments_lexer": "ipython3", 345 | "version": "3.6.4" 346 | }, 347 | "nbpresent": { 348 | "slides": { 349 | "152c5a3b-78f9-4183-bce2-379a4012baf6": { 350 | "id": "152c5a3b-78f9-4183-bce2-379a4012baf6", 351 | "layout": "grid", 352 | "prev": "5613e857-5b4e-42e4-9feb-df0440592ca2", 353 | "regions": { 354 | "20d6059c-7745-410d-a5fb-0b91cacbc2e2": { 355 | "attrs": { 356 | "height": 0.6666666666666666, 357 | "pad": 0.01, 358 | "treemap:weight": 1, 359 | "width": 0.5, 360 | "x": 0, 361 | "y": 0 362 | }, 363 | "id": "20d6059c-7745-410d-a5fb-0b91cacbc2e2" 364 | }, 365 | "300e6ccd-ecf4-425e-8574-3debe305aafb": { 366 | "attrs": { 367 | "height": 0.3333333333333333, 368 | "pad": 0.01, 369 | "treemap:weight": 1, 370 | "width": 1, 371 | "x": 0, 372 | "y": 0.6666666666666666 373 | }, 374 | "content": { 375 | "cell": "9814e8d7-60e0-43e6-aee0-3c33cc2cc809", 376 | "part": "whole" 377 | }, 378 | "id": "300e6ccd-ecf4-425e-8574-3debe305aafb" 379 | }, 380 | 
"df2dd6ff-570b-4b75-9cb7-1ff1dbdd4f55": { 381 | "attrs": { 382 | "height": 0.6666666666666666, 383 | "pad": 0.01, 384 | "treemap:weight": 1, 385 | "width": 0.5, 386 | "x": 0.5, 387 | "y": 0 388 | }, 389 | "id": "df2dd6ff-570b-4b75-9cb7-1ff1dbdd4f55" 390 | } 391 | } 392 | }, 393 | "2586ca7d-5091-40ea-b566-ccc5fbf833c6": { 394 | "id": "2586ca7d-5091-40ea-b566-ccc5fbf833c6", 395 | "prev": "f001d476-5814-4664-a722-f04f5d23cd52", 396 | "regions": { 397 | "d6011048-43db-4990-a82e-768683aa4fe5": { 398 | "attrs": { 399 | "height": 0.8, 400 | "width": 0.8, 401 | "x": 0.1, 402 | "y": 0.1 403 | }, 404 | "content": { 405 | "cell": "ceb5f5a0-a5e8-435e-ae16-23c2ba8c6ab2", 406 | "part": "whole" 407 | }, 408 | "id": "d6011048-43db-4990-a82e-768683aa4fe5" 409 | } 410 | } 411 | }, 412 | "27ee4130-d0bb-4287-b8fe-75a7b0ecf178": { 413 | "id": "27ee4130-d0bb-4287-b8fe-75a7b0ecf178", 414 | "prev": "2586ca7d-5091-40ea-b566-ccc5fbf833c6", 415 | "regions": { 416 | "7a689d66-0c9d-4492-928b-f35bfd2ffc4c": { 417 | "attrs": { 418 | "height": 0.8, 419 | "width": 0.8, 420 | "x": 0.1, 421 | "y": 0.1 422 | }, 423 | "content": { 424 | "cell": "e6c2e441-eb7b-4a4c-9c9c-b88cc9a2527f", 425 | "part": "whole" 426 | }, 427 | "id": "7a689d66-0c9d-4492-928b-f35bfd2ffc4c" 428 | } 429 | } 430 | }, 431 | "2de0c027-7a07-4f7e-8594-a98d36125372": { 432 | "id": "2de0c027-7a07-4f7e-8594-a98d36125372", 433 | "prev": "75e76bd9-24ae-4c42-b6bc-5f58a0550ba8", 434 | "regions": { 435 | "868fd842-e6fb-48b2-9ac5-95e8fe20927e": { 436 | "attrs": { 437 | "height": 0.8, 438 | "width": 0.8, 439 | "x": 0.1, 440 | "y": 0.1 441 | }, 442 | "content": { 443 | "cell": "0e25dad3-add0-466e-8f71-e771d6ec4500", 444 | "part": "whole" 445 | }, 446 | "id": "868fd842-e6fb-48b2-9ac5-95e8fe20927e" 447 | } 448 | } 449 | }, 450 | "5613e857-5b4e-42e4-9feb-df0440592ca2": { 451 | "id": "5613e857-5b4e-42e4-9feb-df0440592ca2", 452 | "prev": "564dae42-4185-46c1-b156-e503f475e25c", 453 | "regions": { 454 | "17e888b0-050b-406a-a5a3-0d5c1605b8df": { 455 | 
"attrs": { 456 | "height": 0.8, 457 | "width": 0.8, 458 | "x": 0.1, 459 | "y": 0.1 460 | }, 461 | "content": { 462 | "cell": "f5bcbcb5-4352-4674-a7b6-c8e576220422", 463 | "part": "whole" 464 | }, 465 | "id": "17e888b0-050b-406a-a5a3-0d5c1605b8df" 466 | } 467 | } 468 | }, 469 | "564dae42-4185-46c1-b156-e503f475e25c": { 470 | "id": "564dae42-4185-46c1-b156-e503f475e25c", 471 | "prev": "ba285213-f645-4314-afd5-0a656fa35631", 472 | "regions": { 473 | "328d4d72-cd9e-4e5b-aaa8-175833f5bfdb": { 474 | "attrs": { 475 | "height": 0.8, 476 | "width": 0.8, 477 | "x": 0.1, 478 | "y": 0.1 479 | }, 480 | "content": { 481 | "cell": "8a4ac456-6c4b-4249-8662-b1cabfd7cee4", 482 | "part": "whole" 483 | }, 484 | "id": "328d4d72-cd9e-4e5b-aaa8-175833f5bfdb" 485 | } 486 | } 487 | }, 488 | "6ff94ac3-8ded-442e-ae43-aa0a5c14d468": { 489 | "id": "6ff94ac3-8ded-442e-ae43-aa0a5c14d468", 490 | "prev": "27ee4130-d0bb-4287-b8fe-75a7b0ecf178", 491 | "regions": { 492 | "ad759b3a-6080-4356-a9fd-87f2b1b90bc2": { 493 | "attrs": { 494 | "height": 0.8, 495 | "width": 0.8, 496 | "x": 0.1, 497 | "y": 0.1 498 | }, 499 | "content": { 500 | "cell": "8458de53-35b5-405e-a372-5db5d2e2c2c5", 501 | "part": "whole" 502 | }, 503 | "id": "ad759b3a-6080-4356-a9fd-87f2b1b90bc2" 504 | } 505 | } 506 | }, 507 | "75e76bd9-24ae-4c42-b6bc-5f58a0550ba8": { 508 | "id": "75e76bd9-24ae-4c42-b6bc-5f58a0550ba8", 509 | "prev": "152c5a3b-78f9-4183-bce2-379a4012baf6", 510 | "regions": { 511 | "4afd3b41-071f-44eb-a8f6-9a7f780041c2": { 512 | "attrs": { 513 | "height": 0.8, 514 | "width": 0.8, 515 | "x": 0.1, 516 | "y": 0.1 517 | }, 518 | "content": { 519 | "cell": "62fdd00c-a006-4f11-b9dc-e2ca072225d7", 520 | "part": "whole" 521 | }, 522 | "id": "4afd3b41-071f-44eb-a8f6-9a7f780041c2" 523 | } 524 | } 525 | }, 526 | "8c46fa2c-d5dc-4ef7-8d99-f504e2c3a4a1": { 527 | "id": "8c46fa2c-d5dc-4ef7-8d99-f504e2c3a4a1", 528 | "prev": "e2f5626f-0d60-47cb-967f-0edababb0329", 529 | "regions": { 530 | "af33776f-ec36-45be-a627-39573a78b1d6": { 531 | 
"attrs": { 532 | "height": 0.8, 533 | "width": 0.8, 534 | "x": 0.1, 535 | "y": 0.1 536 | }, 537 | "content": { 538 | "cell": "0d61b4b4-163f-47fe-80f1-092287218273", 539 | "part": "whole" 540 | }, 541 | "id": "af33776f-ec36-45be-a627-39573a78b1d6" 542 | } 543 | } 544 | }, 545 | "ae3f4c01-80dc-4add-889a-05c74f7155a5": { 546 | "id": "ae3f4c01-80dc-4add-889a-05c74f7155a5", 547 | "prev": "6ff94ac3-8ded-442e-ae43-aa0a5c14d468", 548 | "regions": { 549 | "15f00a98-7b04-439d-996d-851b773b060a": { 550 | "attrs": { 551 | "height": 0.8, 552 | "width": 0.8, 553 | "x": 0.1, 554 | "y": 0.1 555 | }, 556 | "content": { 557 | "cell": "96ca5c44-2cfc-471c-8da7-39870c822e20", 558 | "part": "whole" 559 | }, 560 | "id": "15f00a98-7b04-439d-996d-851b773b060a" 561 | } 562 | } 563 | }, 564 | "ba285213-f645-4314-afd5-0a656fa35631": { 565 | "id": "ba285213-f645-4314-afd5-0a656fa35631", 566 | "prev": "8c46fa2c-d5dc-4ef7-8d99-f504e2c3a4a1", 567 | "regions": { 568 | "6cddb9f2-8e39-4010-8fab-3e70b3a8993f": { 569 | "attrs": { 570 | "height": 0.8, 571 | "width": 0.8, 572 | "x": 0.1, 573 | "y": 0.1 574 | }, 575 | "content": { 576 | "cell": "b878a4f9-4345-4abb-81f4-5a731c639ab8", 577 | "part": "whole" 578 | }, 579 | "id": "6cddb9f2-8e39-4010-8fab-3e70b3a8993f" 580 | } 581 | } 582 | }, 583 | "cd587236-8a19-444d-8b18-69d782dbf725": { 584 | "id": "cd587236-8a19-444d-8b18-69d782dbf725", 585 | "prev": null, 586 | "regions": { 587 | "ef377bfe-ff45-49db-b471-f79ecb10b580": { 588 | "attrs": { 589 | "height": 0.8, 590 | "width": 0.8, 591 | "x": 0.1, 592 | "y": 0.1 593 | }, 594 | "content": { 595 | "cell": "dc7a1635-0bbd-4bf7-a07e-7a36f58e258b", 596 | "part": "whole" 597 | }, 598 | "id": "ef377bfe-ff45-49db-b471-f79ecb10b580" 599 | } 600 | } 601 | }, 602 | "e2f5626f-0d60-47cb-967f-0edababb0329": { 603 | "id": "e2f5626f-0d60-47cb-967f-0edababb0329", 604 | "prev": "ae3f4c01-80dc-4add-889a-05c74f7155a5", 605 | "regions": { 606 | "eef49fa0-0f9b-4228-8fb8-79e079bf7682": { 607 | "attrs": { 608 | "height": 0.8, 609 | 
"width": 0.8, 610 | "x": 0.1, 611 | "y": 0.1 612 | }, 613 | "content": { 614 | "cell": "9110098b-9675-4d64-adf3-c947073d4c4d", 615 | "part": "whole" 616 | }, 617 | "id": "eef49fa0-0f9b-4228-8fb8-79e079bf7682" 618 | } 619 | } 620 | }, 621 | "f001d476-5814-4664-a722-f04f5d23cd52": { 622 | "id": "f001d476-5814-4664-a722-f04f5d23cd52", 623 | "prev": "cd587236-8a19-444d-8b18-69d782dbf725", 624 | "regions": { 625 | "5a176076-c5a5-4b50-ab2c-9cd0baedad45": { 626 | "attrs": { 627 | "height": 0.8, 628 | "width": 0.8, 629 | "x": 0.1, 630 | "y": 0.1 631 | }, 632 | "content": { 633 | "cell": "53eee250-b3d0-4262-ad09-e87fb2acf82e", 634 | "part": "whole" 635 | }, 636 | "id": "5a176076-c5a5-4b50-ab2c-9cd0baedad45" 637 | } 638 | } 639 | } 640 | }, 641 | "themes": { 642 | "default": "c6b5d1ad-d691-4000-9f62-de7fc0e83644", 643 | "theme": { 644 | "586a6e7a-f661-4d6c-90d0-1392715bea27": { 645 | "id": "586a6e7a-f661-4d6c-90d0-1392715bea27", 646 | "palette": { 647 | "19cc588f-0593-49c9-9f4b-e4d7cc113b1c": { 648 | "id": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", 649 | "rgb": [ 650 | 252, 651 | 252, 652 | 252 653 | ] 654 | }, 655 | "31af15d2-7e15-44c5-ab5e-e04b16a89eff": { 656 | "id": "31af15d2-7e15-44c5-ab5e-e04b16a89eff", 657 | "rgb": [ 658 | 68, 659 | 68, 660 | 68 661 | ] 662 | }, 663 | "50f92c45-a630-455b-aec3-788680ec7410": { 664 | "id": "50f92c45-a630-455b-aec3-788680ec7410", 665 | "rgb": [ 666 | 155, 667 | 177, 668 | 192 669 | ] 670 | }, 671 | "c5cc3653-2ee1-402a-aba2-7caae1da4f6c": { 672 | "id": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 673 | "rgb": [ 674 | 43, 675 | 126, 676 | 184 677 | ] 678 | }, 679 | "efa7f048-9acb-414c-8b04-a26811511a21": { 680 | "id": "efa7f048-9acb-414c-8b04-a26811511a21", 681 | "rgb": [ 682 | 25.118061674008803, 683 | 73.60176211453744, 684 | 107.4819383259912 685 | ] 686 | } 687 | }, 688 | "rules": { 689 | "blockquote": { 690 | "color": "50f92c45-a630-455b-aec3-788680ec7410" 691 | }, 692 | "code": { 693 | "font-family": "Anonymous Pro" 694 | }, 695 | "h1": { 
696 | "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 697 | "font-family": "Lato", 698 | "font-size": 8 699 | }, 700 | "h2": { 701 | "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 702 | "font-family": "Lato", 703 | "font-size": 6 704 | }, 705 | "h3": { 706 | "color": "50f92c45-a630-455b-aec3-788680ec7410", 707 | "font-family": "Lato", 708 | "font-size": 5.5 709 | }, 710 | "h4": { 711 | "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 712 | "font-family": "Lato", 713 | "font-size": 5 714 | }, 715 | "h5": { 716 | "font-family": "Lato" 717 | }, 718 | "h6": { 719 | "font-family": "Lato" 720 | }, 721 | "h7": { 722 | "font-family": "Lato" 723 | }, 724 | "pre": { 725 | "font-family": "Anonymous Pro", 726 | "font-size": 4 727 | } 728 | }, 729 | "text-base": { 730 | "font-family": "Merriweather", 731 | "font-size": 4 732 | } 733 | }, 734 | "c6b5d1ad-d691-4000-9f62-de7fc0e83644": { 735 | "backgrounds": { 736 | "dc7afa04-bf90-40b1-82a5-726e3cff5267": { 737 | "background-color": "31af15d2-7e15-44c5-ab5e-e04b16a89eff", 738 | "id": "dc7afa04-bf90-40b1-82a5-726e3cff5267" 739 | } 740 | }, 741 | "id": "c6b5d1ad-d691-4000-9f62-de7fc0e83644", 742 | "palette": { 743 | "19cc588f-0593-49c9-9f4b-e4d7cc113b1c": { 744 | "id": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", 745 | "rgb": [ 746 | 252, 747 | 252, 748 | 252 749 | ] 750 | }, 751 | "31af15d2-7e15-44c5-ab5e-e04b16a89eff": { 752 | "id": "31af15d2-7e15-44c5-ab5e-e04b16a89eff", 753 | "rgb": [ 754 | 68, 755 | 68, 756 | 68 757 | ] 758 | }, 759 | "50f92c45-a630-455b-aec3-788680ec7410": { 760 | "id": "50f92c45-a630-455b-aec3-788680ec7410", 761 | "rgb": [ 762 | 197, 763 | 226, 764 | 245 765 | ] 766 | }, 767 | "c5cc3653-2ee1-402a-aba2-7caae1da4f6c": { 768 | "id": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 769 | "rgb": [ 770 | 43, 771 | 126, 772 | 184 773 | ] 774 | }, 775 | "efa7f048-9acb-414c-8b04-a26811511a21": { 776 | "id": "efa7f048-9acb-414c-8b04-a26811511a21", 777 | "rgb": [ 778 | 25.118061674008803, 779 | 73.60176211453744, 780 | 
107.4819383259912 781 | ] 782 | } 783 | }, 784 | "rules": { 785 | "a": { 786 | "color": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c" 787 | }, 788 | "blockquote": { 789 | "color": "50f92c45-a630-455b-aec3-788680ec7410", 790 | "font-size": 3 791 | }, 792 | "code": { 793 | "font-family": "Anonymous Pro" 794 | }, 795 | "h1": { 796 | "color": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", 797 | "font-family": "Merriweather", 798 | "font-size": 8 799 | }, 800 | "h2": { 801 | "color": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", 802 | "font-family": "Merriweather", 803 | "font-size": 6 804 | }, 805 | "h3": { 806 | "color": "50f92c45-a630-455b-aec3-788680ec7410", 807 | "font-family": "Lato", 808 | "font-size": 5.5 809 | }, 810 | "h4": { 811 | "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", 812 | "font-family": "Lato", 813 | "font-size": 5 814 | }, 815 | "h5": { 816 | "font-family": "Lato" 817 | }, 818 | "h6": { 819 | "font-family": "Lato" 820 | }, 821 | "h7": { 822 | "font-family": "Lato" 823 | }, 824 | "li": { 825 | "color": "50f92c45-a630-455b-aec3-788680ec7410", 826 | "font-size": 3.25 827 | }, 828 | "pre": { 829 | "font-family": "Anonymous Pro", 830 | "font-size": 4 831 | } 832 | }, 833 | "text-base": { 834 | "color": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", 835 | "font-family": "Lato", 836 | "font-size": 4 837 | } 838 | } 839 | } 840 | } 841 | } 842 | }, 843 | "nbformat": 4, 844 | "nbformat_minor": 1 845 | } 846 | -------------------------------------------------------------------------------- /python_fm_2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Working with Python: functions and modules\n", 8 | "\n", 9 | "## Session 2: Functions\n", 10 | "\n", 11 | "- [Function definition syntax](#Function-definition-syntax)\n", 12 | "- [Exercise 2.1](#Exercise-2.1)\n", 13 | "- [Return value](#Return-value)\n", 14 | "- [Exercise 2.2](#Exercise-2.2)\n", 15 | 
"- [Function arguments](#Function-arguments)\n", 16 | "- [Variable scope](#Variable-scope)\n", 17 | "- [Exercise 2.3](#Exercise-2.3)" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Function basics" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "We have already seen a number of functions built into python that let us do useful things to strings, collections and numbers etc. For example **`print()`** or **`len()`** which is passed some kind of sequence object and returns the length of the sequence.\n", 32 | "\n", 33 | "This is the general form of a function; it takes some input _arguments_ and returns some output based on the supplied arguments.\n", 34 | "\n", 35 | "The arguments to a function, if any, are supplied in parentheses and the result of the function _call_ is the result of evaluating the function.\n" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "x = abs(-3.0)\n", 45 | "print(x)\n", 46 | "\n", 47 | "l = len(\"ACGGTGTCAA\")\n", 48 | "print(l)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "As well as using python's built in functions, you can write your own. Functions are a nice way to **encapsulate some code that you want to reuse** elsewhere in your program, rather than repeating the same bit of code multiple times. They also provide a way to name some coherent block of code and allow you to structure a complex program." 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## Function definition syntax" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "Functions are defined in Python using the **`def` keyword** followed by the name of the function. 
If your function takes some arguments (input data) then you can name these in parentheses after the function name. If your function does not take any arguments you still need some empty parentheses. Here we define a simple function named `sayHello` that prints a line of text to the screen:" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "def sayHello():\n", 79 | " print('Hello world!')" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "Note that the code block for the function (just a single print line in this case) is indented relative to the `def`. The above definition just declares the function in an abstract way and nothing will be printed when the definition is made. To actually use a function you need to invoke it (call it) by using its name and a pair of round parentheses:" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "sayHello() # Call the function to print 'Hello world!'" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "If required, a function may be written so it accepts input. Here we specify a variable called `name` in the brackets of the function definition and this variable is then used by the function. Although the input variable is referred to inside the function, it does not represent any particular value at the point of definition. It only takes a value when the function is actually called." 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "def sayHello(name):\n", 112 | " print('Hello', name)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "When we call (invoke) this function we specify a specific value for the input. 
Here we pass in the value `User`, so the name variable takes that value and uses it to print a message, as defined in the function. " 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "sayHello('User') # Prints 'Hello User'" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "When we call the function again with a different input value we naturally get a different message. Here we also illustrate that the input value can also be passed-in as a variable (text in this case)." 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "text = 'Mary'\n", 145 | "sayHello(text) # Prints 'Hello Mary'" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "A function may also generate output that is passed back or returned to the program at the point at which the function was called. For example here we define a function to do a simple calculation of the square of input (`x`) to create an output (`y`):" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "def square(x):\n", 162 | " y = x*x\n", 163 | " return y" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "Once the `return` statement is reached the operation of the function will end, and anything on the return line will be passed back as output. Here we call the function on an input number and catch the output value as result. 
Notice how the names of the variables used inside the function definition are separate from any variable names we may choose to use when calling the function.\n", 171 | " " 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "number = 7\n", 181 | "result = square(number) # Call the square() function which returns a result\n", 182 | "print(result) # Prints: 49" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "The function `square` can be used from now on anywhere in your program as many times as required on any (numeric) input values we like." 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "print(square(1.2e-3)) # Prints: 1.4399999999999998e-06" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "A function can accept multiple input values, otherwise known as arguments. These are separated by commas inside the brackets of the function definition. Here we define a function that takes two arguments and performs a calculation on both, before sending back the result.\n" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "def calcFunc(x, y):\n", 215 | " z = x*x + y*y\n", 216 | " return z\n", 217 | "\n", 218 | "\n", 219 | "result = calcFunc(1.414, 2.0)\n", 220 | "print(result) # 5.999396" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "Note that this function does not check that x and y are valid forms of input. For the function to work properly we assume they are numbers. Depending on how this function is going to be used, appropriate checks could be added." 
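
As a minimal sketch of what such checks might look like (an illustration only, not part of the course code), the hypothetical function `calcFuncChecked` below rejects anything that is not a real number before doing the calculation. The name, the use of the standard `numbers` module and the wording of the error message are all assumptions made for this example.

```python
import numbers

def calcFuncChecked(x, y):
    # Hypothetical variant of calcFunc that validates its input first
    for value in (x, y):
        # bool is a subclass of int in Python, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise TypeError("calcFuncChecked expects real numbers, got " + repr(value))
    return x*x + y*y

print(calcFuncChecked(3, 4))  # Prints: 25
```

Whether a check like this is worth adding depends on who calls the function; for a short script, letting Python raise its own error may be enough.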
228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Functions can be arbitrarily long and can perform very complex operations. However, to make a function reusable, it is often better to assign it a single responsibility and a descriptive name.\n", 235 | "Let's now define a function to calculate the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between two vectors:" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "def calcDistance(vec1, vec2): \n", 245 | " dist = 0\n", 246 | " for i in range(len(vec1)):\n", 247 | " delta = vec1[i] - vec2[i]\n", 248 | " dist += delta*delta\n", 249 | " dist = dist**(1/2) # square-root\n", 250 | " return dist" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "For the record, the [preferred way to calculate a square root](https://docs.python.org/3/library/math.html#math.sqrt) is to use the `sqrt()` function from the standard `math` module:\n", 258 | "```python\n", 259 | "import math\n", 260 | "math.sqrt(x)\n", 261 | "```\n", 262 | "\n", 263 | "Let's experiment a little with our function."
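
One possible experiment, sketched here under assumptions rather than taken from the course material, is a variant of `calcDistance` that uses `math.sqrt()` and refuses vectors of different dimensions. The name `calcDistanceChecked` is hypothetical, and the built-in `zip()` is used to walk through both vectors in step.

```python
import math

def calcDistanceChecked(vec1, vec2):
    # Hypothetical variant of calcDistance with a dimension check
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same dimension")
    total = 0
    # zip() pairs up the coordinates of the two vectors
    for a, b in zip(vec1, vec2):
        delta = a - b
        total += delta*delta
    return math.sqrt(total)

print(calcDistanceChecked((0, 0), (3, 4)))  # Prints: 5.0
```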
264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [ 272 | "w1 = ( 23.1, 17.8, -5.6 )\n", 273 | "w2 = ( 8.4, 15.9, 7.7 )\n", 274 | "calcDistance( w1, w2 )" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "Note that the function is general and handles any two vectors (irrespective of their representation) as long as their dimensions are compatible:" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": {}, 288 | "outputs": [], 289 | "source": [ 290 | "calcDistance( ( 1, 2 ), ( 3, 4 ) ) # dimension: 2" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "calcDistance( [ 1, 2 ], [ 3, 4 ] ) # vectors represented as lists" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": {}, 306 | "outputs": [], 307 | "source": [ 308 | "calcDistance( ( 1, 2 ), [ 3, 4 ] ) # mixed representation" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "## Exercise 2.1\n", 316 | "\n", 317 | "- a. Calculate the mean\n", 318 | " - Write a function that takes 2 numerical arguments and returns their mean. Test your function on some examples.\n", 319 | " - Write another function that takes a list of numbers and returns the mean of all the numbers in the list.\n", 320 | "- b. Write a function that takes a single DNA sequence as an argument and estimates the molecular weight of this sequence. Test your function using some example sequences. The following table gives the weight of each (single-stranded) nucleotide in g/mol:\n", 321 | "\n", 322 | "\n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | "
| DNA Residue | Weight |
|-------------|--------|
| A | 331 |
| C | 307 |
| G | 347 |
| T | 306 |
\n", 329 | "\n", 330 | "\n", 331 | "- c. If the sequence passed contains base `N`, use the mean weight of the other bases as the weight of base `N`." 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "## Return value" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "There can be more than one `return` statement in a function, although typically there is only one, at the bottom. Consider the following function to get some text to say whether a number is positive or negative. It has three return statements: the first two return statements pass back text strings but the last, which would be reached if the input value were zero, has no explicit return value and thus passes back the Python `None` object. Any function code after this final return is ignored. \n", 346 | "The `return` keyword immediately exits the function, and no more of the code in that function will be run once the function has returned (as program flow will be returned to the call site)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "def getSign(value):\n", 356 | " \n", 357 | " if value > 0:\n", 358 | " return \"Positive\"\n", 359 | " \n", 360 | " elif value < 0:\n", 361 | " return \"Negative\"\n", 362 | " \n", 363 | " return # implicit 'None'\n", 364 | "\n", 365 | " print(\"Hello world\") # execution does not reach this line\n", 366 | " \n", 367 | "print(\"getSign( 33.6 ):\", getSign( 33.6 ))\n", 368 | "print(\"getSign( -7 ):\", getSign( -7 ))\n", 369 | "print(\"getSign( 0 ):\", getSign( 0 ))" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "All of the examples of functions so far have returned only single values, however it is possible to pass back more than one value via the `return` statement. 
In the following example we define a function that takes two arguments and passes back three values. The return values are really passed back inside a single tuple, which can be caught as a single collection of values. " 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": {}, 383 | "outputs": [], 384 | "source": [ 385 | "def myFunction(value1, value2):\n", 386 | " \n", 387 | " total = value1 + value2\n", 388 | " difference = value1 - value2\n", 389 | " product = value1 * value2\n", 390 | " \n", 391 | " return total, difference, product\n", 392 | "\n", 393 | "values = myFunction( 3, 7 ) # Grab output as a whole tuple\n", 394 | "print(\"Results as a tuple:\", values)\n", 395 | "\n", 396 | "x, y, z = myFunction( 3, 7 ) # Unpack tuple to grab individual values\n", 397 | "print(\"x:\", x)\n", 398 | "print(\"y:\", y)\n", 399 | "print(\"z:\", z)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "## Exercise 2.2\n", 407 | "\n", 408 | "a. Write a function that counts the number of each base found in a DNA sequence. Return the result as a tuple of 4 numbers representing the counts of each base `A`, `C`, `G` and `T`.\n", 409 | "\n", 410 | "b. Write a function to return the reverse-complement of a nucleotide sequence." 
411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "## Function arguments" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "### Mandatory arguments\n", 425 | "\n", 426 | "The arguments we have passed to functions so far have all been _mandatory_: if we do not supply them, or if we supply the wrong number of arguments, Python will raise an error (also called an exception):" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": { 433 | "collapsed": true 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "def square(number):\n", 438 | " # one mandatory argument\n", 439 | " y = number*number\n", 440 | " return y" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "metadata": {}, 447 | "outputs": [], 448 | "source": [ 449 | "square(2)" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": {}, 455 | "source": [ 456 | "**Mandatory arguments are assumed to come in the same order as the arguments in the function definition**, but you can also opt to specify the arguments using the argument names as _keywords_, supplying the values corresponding to each keyword with a `=` sign." 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": {}, 463 | "outputs": [], 464 | "source": [ 465 | "square(number=3)" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "def repeat(seq, n):\n", 475 | " # two mandatory arguments\n", 476 | " result = ''\n", 477 | " for i in range(0,n):\n", 478 | " result += seq\n", 479 | " return result\n", 480 | "\n", 481 | "print(repeat(\"CTA\", 3))\n", 482 | "print(repeat(n=4, seq=\"GTT\"))" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "
**NOTE** Unnamed (positional) arguments must come before named arguments, even if they look to be in the right order.
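
The rule in this note can be tried out without stopping the notebook. The sketch below is an illustration only: `repeatSketch` is a hypothetical stand-in for `repeat`, and the built-in `compile()` is used purely so that the resulting `SyntaxError` can be caught and printed instead of halting execution.

```python
def repeatSketch(seq, n):
    # Hypothetical stand-in for repeat(): concatenate seq n times
    result = ''
    for i in range(n):
        result += seq
    return result

# Positional argument first, keyword argument second: allowed
print(repeatSketch("CTA", n=3))

# Keyword argument first, positional argument second: rejected by Python
# before the call is ever made, even though the values are in the right order
bad_call = 'repeatSketch(seq="CTA", 3)'
try:
    compile(bad_call, "<example>", "eval")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```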
" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": null, 495 | "metadata": {}, 496 | "outputs": [], 497 | "source": [ 498 | "print(repeat(seq=\"CTA\", n=3))" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "### Arguments with default values\n", 506 | "Sometimes it is useful to give some arguments a default value that the caller can override, but which will be used if the caller does not supply a value for this argument. We can do this by assigning some value to the named argument with the `=` operator in the function definition." 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "metadata": {}, 513 | "outputs": [], 514 | "source": [ 515 | "def runSimulation(nsteps=1000):\n", 516 | " print(\"Running simulation for\", nsteps, \"steps\")\n", 517 | "\n", 518 | "runSimulation(500)\n", 519 | "runSimulation()" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "
**CAVEAT**: default arguments are defined once and keep their state between calls. This can be a problem for *mutable* objects:
" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [ 535 | "def myFunction(parameters=[]):\n", 536 | " parameters.append( 100 )\n", 537 | " print(parameters)\n", 538 | " \n", 539 | "myFunction()\n", 540 | "myFunction()\n", 541 | "myFunction()\n", 542 | "myFunction([])\n", 543 | "myFunction([])\n", 544 | "myFunction([])" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "... or avoid modifying *mutable* default arguments." 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": null, 557 | "metadata": {}, 558 | "outputs": [], 559 | "source": [ 560 | "def myFunction(parameters):\n", 561 | " # one mandatory argument without default value\n", 562 | " parameters.append( 100 )\n", 563 | " print(parameters)\n", 564 | " \n", 565 | "my_list = []\n", 566 | "myFunction(my_list)\n", 567 | "myFunction(my_list)\n", 568 | "myFunction(my_list)\n", 569 | "my_new_list = []\n", 570 | "myFunction(my_new_list)" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "### Position of mandatory arguments\n", 578 | "Arrange function arguments so that *mandatory* arguments come first:" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [ 587 | "def runSimulation(initialTemperature, nsteps=1000):\n", 588 | " # one mandatory argument followed by one with default value\n", 589 | " print(\"Running simulation starting at\", initialTemperature, \"K and doing\", nsteps, \"steps\")\n", 590 | " \n", 591 | "runSimulation(300, 500)\n", 592 | "runSimulation(300)" 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": {}, 598 | "source": [ 599 | "As before, no positional argument can appear after a keyword argument, and all required arguments must still be provided." 
600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": null, 605 | "metadata": {}, 606 | "outputs": [], 607 | "source": [ 608 | "runSimulation( nsteps=100, initialTemperature=300 )" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "metadata": {}, 615 | "outputs": [], 616 | "source": [ 617 | "runSimulation( initialTemperature=300 )" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": {}, 624 | "outputs": [], 625 | "source": [ 626 | "runSimulation( nsteps=100 ) # Error: missing required argument 'initialTemperature'" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": null, 632 | "metadata": {}, 633 | "outputs": [], 634 | "source": [ 635 | "runSimulation( nsteps=100, 300 ) # Error: positional argument follows keyword argument" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": {}, 641 | "source": [ 642 | "Keyword names must match those declared in the function definition:" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": null, 648 | "metadata": {}, 649 | "outputs": [], 650 | "source": [ 651 | "runSimulation( initialTemperature=300, numSteps=100 ) # Error: unexpected keyword argument 'numSteps'" 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "metadata": {}, 657 | "source": [ 658 | "A function cannot be defined with mandatory arguments after default ones; Python reports a syntax error:" 659 | ] 660 | }, 661 | { 662 | "cell_type": "code", 663 | "execution_count": null, 664 | "metadata": {}, 665 | "outputs": [], 666 | "source": [ 667 | "def badFunction(nsteps=1000, initialTemperature):\n", 668 | " pass" 669 | ] 670 | }, 671 | { 672 | "cell_type": "markdown", 673 | "metadata": {}, 674 | "source": [ 675 | "## Variable scope" 676 | ] 677 | }, 678 | { 679 | "cell_type": "markdown", 680 | "metadata": {}, 681 | "source": [ 682 | "Every variable in Python has a _scope_ in which it is defined. 
Variables defined at the outermost level are known as _globals_ (although typically only for the current module). In contrast, variables defined within a function are local, and cannot be accessed from the outside." 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "metadata": {}, 689 | "outputs": [], 690 | "source": [ 691 | "def mathFunction(x, y):\n", 692 | " math_func_result = ( x + y ) * ( x - y )\n", 693 | " return math_func_result" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": null, 699 | "metadata": {}, 700 | "outputs": [], 701 | "source": [ 702 | "answer = mathFunction( 4, 7 )\n", 703 | "print(answer)" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": null, 709 | "metadata": {}, 710 | "outputs": [], 711 | "source": [ 712 | "answer = mathFunction( 4, 7 )\n", 713 | "print(math_func_result) # Error: 'math_func_result' is not defined outside the function" 714 | ] 715 | }, 716 | { 717 | "cell_type": "markdown", 718 | "metadata": {}, 719 | "source": [ 720 | "In the next example we have two variables with the same name but different scopes, one local to the function and the other global, so they are distinct variables." 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": null, 726 | "metadata": {}, 727 | "outputs": [], 728 | "source": [ 729 | "def increase(value):\n", 730 | " value += 1\n", 731 | " print(value)\n", 732 | " \n", 733 | "value = 4\n", 734 | "increase(value)\n", 735 | "print(value)" 736 | ] 737 | }, 738 | { 739 | "cell_type": "markdown", 740 | "metadata": {}, 741 | "source": [ 742 | "Generally, variables defined in an outer scope are also visible in functions, but you should be careful manipulating them as this can lead to confusing code. Python will actually raise an error (`UnboundLocalError`) if you try to update a global variable in place inside a function, for example with `+=`, without first declaring it `global`. Instead it is a good idea to avoid using global variables and, for example, to pass any necessary variables as parameters to your functions."
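
The difference between reading and updating a global variable can be sketched as follows. This is an illustration under assumptions, not course code: the function names `readCounter` and `brokenIncrement` are hypothetical, and `try`/`except` is used only so that the error is printed rather than stopping the notebook.

```python
counter = 4

def readCounter():
    # Reading a global variable from inside a function works
    return counter + 1

def brokenIncrement():
    # The assignment makes 'counter' local to this function,
    # so reading it before it has a local value fails
    counter += 1
    return counter

print(readCounter())  # Prints: 5

try:
    brokenIncrement()
except UnboundLocalError as error:
    print("UnboundLocalError:", error)
```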
743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "metadata": {}, 749 | "outputs": [], 750 | "source": [ 751 | "counter = 4\n", 752 | "def increment(): \n", 753 | " counter += 1\n", 754 | " return counter\n", 755 | "\n", 756 | "print(counter)\n", 757 | "print(increment())" 758 | ] 759 | }, 760 | { 761 | "cell_type": "markdown", 762 | "metadata": {}, 763 | "source": [ 764 | "Use a local variable instead" 765 | ] 766 | }, 767 | { 768 | "cell_type": "code", 769 | "execution_count": null, 770 | "metadata": {}, 771 | "outputs": [], 772 | "source": [ 773 | "def increment(): \n", 774 | " counter = 4\n", 775 | " counter += 1\n", 776 | " return counter\n", 777 | "\n", 778 | "print(increment())" 779 | ] 780 | }, 781 | { 782 | "cell_type": "markdown", 783 | "metadata": {}, 784 | "source": [ 785 | "or pass any necessary variables as parameters to your functions." 786 | ] 787 | }, 788 | { 789 | "cell_type": "code", 790 | "execution_count": null, 791 | "metadata": {}, 792 | "outputs": [], 793 | "source": [ 794 | "def increment(counter): \n", 795 | " counter += 1\n", 796 | " return counter\n", 797 | "\n", 798 | "print(increment(4))" 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": {}, 804 | "source": [ 805 | "## Exercise 2.3\n", 806 | "\n", 807 | "Extend your solution to the previous exercise estimating the weight of a DNA sequence so that it can also calculate the weight of an RNA sequence, use an optional argument to specify the molecule type, but default to DNA. The weights of RNA residues are:\n", 808 | "\n", 809 | "\n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | "
| RNA Residue | Weight |
|-------------|--------|
| A | 347 |
| C | 323 |
| G | 363 |
| U | 324 |
\n" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "## Next session\n", 823 | "\n", 824 | "Go to our next notebook: [python_functions_and_modules_3](python_fm_3.ipynb)" 825 | ] 826 | } 827 | ], 828 | "metadata": { 829 | "kernelspec": { 830 | "display_name": "Python 3", 831 | "language": "python", 832 | "name": "python3" 833 | }, 834 | "language_info": { 835 | "codemirror_mode": { 836 | "name": "ipython", 837 | "version": 3 838 | }, 839 | "file_extension": ".py", 840 | "mimetype": "text/x-python", 841 | "name": "python", 842 | "nbconvert_exporter": "python", 843 | "pygments_lexer": "ipython3", 844 | "version": "3.6.2" 845 | } 846 | }, 847 | "nbformat": 4, 848 | "nbformat_minor": 1 849 | } 850 | -------------------------------------------------------------------------------- /data/sample.fa: -------------------------------------------------------------------------------- 1 | >seq39 2 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 3 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 4 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 5 | ATACGCGGAACGTGGCACCCCACAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 6 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGCGCTTCCTTTAGGGA 7 | >seq51 8 | ATGGCCCGCATTGTTGCCTACGGAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 9 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 10 | CAGAAACTGATGGCGATGGGTTTGTGTGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 11 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCAT 12 | GCATCCGAAACCTTATGCACGCCTAGTGTAGTCCAAACGATCTAGGGGTTGAGAACCACC 13 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 14 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 15 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 16 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGTGGGTCGTGACTAAGCTCT 17 | ATACGCAGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 18 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGCGCTTCCTTTAGGG 19 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATGTCTGC 
20 | GATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCTAAATCTTG 21 | CTCTCGTCAGGGCGGTCGTCGTAAATAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 22 | ATATGATGGCTGAAATCTGGGTTCGAAAAGCGTAAGCACATGATATCATCAAGAAGAG 23 | GAGGGGGAACGGCCATACCGACCTTGTGAACGACGATCTGCGTCCTACCCAGACATA 24 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 25 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTATTTA 26 | GAGCGAACAC 27 | >seq31 28 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 29 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 30 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTATGTGGGTCGTGACTAAGCTCT 31 | ATACACGGAACGTGGCACCCCGCAATGATCTCTAATCATCGCCAGCCACGTCAG 32 | >seq18 33 | CGTAGATTCAAGTGGTAACGAA 34 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGTGGGTCGCGACTAAGCTCT 35 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGTC 36 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGC 37 | >seq32 38 | ATGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 39 | CTGCCGTCGCTGTCGCCGGATAACAGCCTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 40 | CAGAAACTGATGGCGATGGGTTTGTGCGGCCGGTTGATGTAGGTTCGGCAATTCGTCACA 41 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCAT 42 | GCATCCGAAACCTTATGCACGCCTAGTGTAGTCCAAACGATCTAGGGGTTGAGAACCACC 43 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 44 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 45 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTACATTCAAGTGGTAACAAA 46 | AACCATAAGAGGTTACGCCTAATGAGTCTACGATTCTTCTGTGGGTCGCGACTAAGCTCT 47 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 48 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGCGCTTCCTTTAGGG 49 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAGTTCGATATCTGC 50 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGATGCTAAATCTTG 51 | CTCTCGTCAAGGCGGTCGTCGTGAATAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 52 | ATATGATGGCTGAAATCTGGTTCGAAAAGCGTAAGCACATGATATCATCAAGAAGAG 53 | GAGGGCGAACGGCCATACCGACCTTGAGAACGACGATCTGCGTCCTACCCAGACATA 54 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 55 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 56 | GAGCGAACAC 57 | >seq91 58 | AAGA 59 | 
AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 60 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 61 | AACCATAAGAGGTTACTCCTAATGAGTCTACGCTTCTTCCGTGGGTCGTGACTAAGCTCT 62 | ATACGCGAAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 63 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGCGCTTCCTTTAGGG 64 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATTACTGC 65 | AATTGGCAATTGACTGT 66 | >seq37 67 | CATTGTGAAGA 68 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 69 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 70 | AACCATAAGAGGTTACTCCTAATGAGTCTACGCTTCTTCCGTGGGTCGTGACTAAGCTCT 71 | ATACGCGA 72 | >seq64 73 | ATGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 74 | CTGCCGTTGCTGTCGCCGGATAAAAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 75 | CAGAAACTGATGGCGATGGGTTTGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 76 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCAT 77 | GCATCCGAAACCTTATGCATGCCTAGTGTAGTCCAAACGACCTAGGGGTTGAGAACCACC 78 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 79 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 80 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 81 | AACCATAAGAGGTTACTCCTAATGAGTCTACGCTTCTTCCGTGGGTCGTGACTAAGCTCT 82 | ATACGCGAAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 83 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATATGCAAAGCGCTTCCTTTAGGG 84 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATAACTGC 85 | AATTGGCAATTGAATGTAATAACCGCGGCAGAGTTGCTAAATCTTG 86 | CCCTCGTCAAGGCGGTCGACGTAAACAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 87 | ATATGATGGCTGAAATCTGGGTTCGAAAAGCGTAAGCACATGATATCATCAAGAAGAG 88 | GAGGGGGAACGGCCATACCGACCTTGAGAACGACGATCTGCGTCCTCCCCAGACATA 89 | ATATTTATTGGCGCTCGAGTGAGGGACACGCTTGAGATCTA 90 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 91 | GAGCGAACAC 92 | >seq97 93 | ATAGGTTCGAGTCTGCGCTGA 94 | AACCACCGATAATGACCTCCGTTGAGCCTCCGATCCTTCTGGTCATAAGTGTCC 95 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTATGGTGAGCCACGAGGACGCAAGTGGCC 96 | TCGCCTGTCATAATCGTTGGGTTGCGTCAATTGTATGCATTA 97 | CCTGTGTCATTAGTCCTTAGACGCCGTCTATTTGGCATTGAATCTTCCTAA 98 | 
>seq100 99 | CAACGCACGTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 100 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCCTAAGTGTCC 101 | ATATGCGGTATGTGGTTCCGCGCTAAT 102 | >seq5 103 | AAGGTCTCCATTGGGGACGCTGAAGCTCACGCGTCCTCGGCTGCAGGATTACAATGGAG 104 | CTGATGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCATTTCGTCGCTGCACTAGGGA 105 | TAGGAATTGTCGGCGACGCTTAGTTGCGTCTAGTTAGACTAGGTCGGTTAATACCTCTTG 106 | CACTAGCGCCGGACTCAAGACTCCCAGTCGGACCCTTAGGGGCCTACGAAATCAGCCGG 107 | CCACCCGAGTGTCCTATCGCGTCAACCAGCGTCCAGACGACTAAGATGTTAGCGCCTTCC 108 | AACCCTCCTCACGGGTACTATAAATCTGGGGATTAACTCCTCGTAAACA 109 | ATGTAGTAGAGTAACGTATAAATGACAACCTAAAGGG 110 | GACGGACTGCGGGAAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTTTGCGCTGA 111 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTGTCT 112 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTATGTTGAGCCACGAGGACGCAAGTGGCC 113 | TCGCCTGTCATAATCGTTGGGCTGCGTCGATTGTATGCATGA 114 | CCTGTGTCATTAGTCCTTAGACGCCGTCTATTTAGCATTGGATCTTCCTAAGGCGGC 115 | GATTTCCAACGGATTTTAATTACCGGCCGTGAGTCGGACAACGTCG 116 | GTCTCGTCAGTACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTGGCGGTTTCCATCTC 117 | ATATAGTAGCTCATCCCCTAAGACACCAAATGGGAGCATATGGGGTCTGCATGAAGAG 118 | GATGGTAAACGGCCAAGATGCGCGGTGGGAATTCCCAGTTGCGTTCAACGCAACATA 119 | ATTGTTGTCGCCGATACTGGGAGTGACACGCTTGTAAACTA 120 | CAAAATTATCGCGAGCTATGTAGCAATTTAAGCAGGCTCATCTTATCACAATTTCA 121 | GGGCGGAAAG 122 | >seq96 123 | AAGGTCTCCATTGGGGACGCTGAAGCTCACGCGTCCTCGGCTGCAGGATTACAATGGAG 124 | CTGATGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCATTTCGTCGCTGCACTAGGGA 125 | TAGGAATTGTCGGCGACGCTTAGATGCGTCTAGTTAACCTAGGTCAGTTAATACCTCTTG 126 | CACTAGCGCCGGACTCAAGACCCCCAGTCGGACCCTTAGGGGCCTACGAAATCAGCCGG 127 | CCACCCGAGTGCCCTATCGCGTCAAACATCGTCCAGACGACTAAGATGTTAGCGACTTCC 128 | AACCCTCCTCGCGGGTACTATAAATCTGGGGATTAACTCCTCGTAAACA 129 | ATGTAGTAGAGTAACGTATAAATGACAACCTAAAGGG 130 | GACGGACTGCGGGAAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 131 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTGTCT 132 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTATGGTGAGCCACGAGGACGCAAGTGGCC 133 | TCGCCTGTCATAATCGTTGGGCTGCGTCGATTGTATGCATGA 134 | 
CCTGTGTCATTAGTCCTTAGACGCCGTCTATTTAGCATTAGATCTTCCTAAGGCGGC 135 | GATTTCCAACGGATTTTAATTACCGGCCGTGAGTCGGACAACGTCG 136 | GTCTCGTCAGTACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTAGCGGTTTCCATCTC 137 | ATATAGTAGCTCATCCCCTAAGACACCAAATGGGAGCATATGGGGTCTGCATGAAGAG 138 | GACGGTAAACGGCCAAGCTGCGCGGTGGGAATTCCCAGTTGCGTTCAACGCAACATA 139 | ATTGTTGTCGCCGATACTGGGAGTGACACGCTTGTAAACTA 140 | CAAAATTATCGCGAGCTATGTAGCAATTTAAGCAGGCTCATCTTATCACAATTTCA 141 | GGGCGGAAAG 142 | >seq4 143 | CGTAAACA 144 | ATGTAGTAGAGTTACGTATAAATGACAATCTAAAGGG 145 | GACGGACTGCGGGAAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 146 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTCTCT 147 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTATGGTGAGCCACGAGGACGCAAGTGGCC 148 | TCGCCTGTCATAATCGTTGGGCTGCGTCGATTGTATGCATGA 149 | CCTGTGTCATTAGTCCTTAGACGCCGTCTATTTAGCA 150 | >seq90 151 | AAGGTCTCCATTGGGGACGCTGAAGCTCACGCGTCCTCGGCTGCAGGATTACAATGGAG 152 | CTGATGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCATTTCGACGCTGCACTAGGGA 153 | TAGGAATTGTCGGCGACGCTTAGATGCGTCAAGTTAACCTAGGTCGGTTAATACCTCTTG 154 | CACCAGCGCCGGACTCAAGACCCCCAGTCGGACCCTTAGGGGCCTACGAAATCAGCCGG 155 | CCACCCGAGTGCCCTATCGCGTCAAACAGCGTCCAGACGACTAAGATGTTAGCGACTTCC 156 | AACCCTCCTCGCGGGTACTATAAATCTGGGGATTAACTCCTCGTAAACA 157 | ATGTAGTAGAGTAACGTATAAATGACAATCTAAAGGG 158 | GACGGACTGCGGGAAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 159 | AACCATCGAAAATGACCACCGTGGAGCCTCGGATCCTTCTGGTCATAAGCCTCT 160 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTATGGTGAGCCACGAGGACGCAAGTGGCC 161 | TCGCCTGTCATAATCGTTGGGCTGCGTCGATTGTATGCATGA 162 | CCTGTGTCATTAGTCCTTAGACGCCGTCTATTTAGCATTGGATCTTCCTAAGGCGGC 163 | GATTTCCAACGGATTTTAATTACCGGCCGTGAGTCGGACACCGTCG 164 | GTCTCGTCAGTACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTAGCGGTTTCCATCTC 165 | ATATAGTAGCTCATCCCCTAAGACACCAAATGGGAGCATATGGGGTCTGCATGAAGAG 166 | GACGGTAAACGGCCAAGCTGCGCGGTGGGAATTCCCAGTTGCGTTCAACGCTACATA 167 | ATTGTTGTCGCCGATACTGTGAGTGACACGCTTGTAAACTA 168 | CAAAATTATCGCGAGCTATGTAGCAATTTAAGCAGGCTCATCTTATCACAATTTCA 169 | GGGCGGAAAG 170 | >seq43 171 | AAGGTCTGAATTGGGGACGCTGAAGCTCAAGCGTATTCGGCGGCAGGATTACAATGGAG 172 | 
CTGACGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCGTTTCGTCGCTGCGCTAGGGA 173 | TAGGAATTGTCGGCAACGCTTAGATGCGTCTAGTTAACCTAGGCCGGTTAATACCTCCTG 174 | CACTAGCGCCGGACTCAAGACCCCCAATCGGACCCTAAGGGGCCTACGAAATCAGCCGG 175 | ACACCCGAGTGCCCTATCGCATCAAACAGCGTCCAGACGAATAAGATGTTAGCGACTATC 176 | AACCCTCCTCGCGGGTACTATAAATCTGGGGATTAACTCCTCGTAAACA 177 | ATGTAGTAGAGTAACGTATAAATGACAAACTAAAGGG 178 | GACGGACTGCGAGTAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 179 | AACCATCGAAAATGACCTCCGTGGAGCCGCCGATCCTTCTGGTCATAAGTGTCT 180 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTAAGGTGAGCCACGAGGACGCAAGTGGCC 181 | TCGCCTGTCATAATCGTTGGGTTGCGTCGATTGTATGCATGA 182 | CCTGTGTCATTAGCCCTTAGACGTCGTCTATTTAGCATTGGACCTTCCTAAGGCGGC 183 | GATTTCCAATGGATTTTAATTAGCGCCCGTGAGTCGGACAACGTCA 184 | GTCTCGTCAAGACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTGGCGGCTTCCATCTT 185 | ATATAGTAGCTCATCCCCTAAGACACCAAATGGGAGCATATGGGGTCTGCATGAAGAG 186 | GGCGGTAAAGGGCCAAGATGGGCGGTGGAAATTCCCAGTTGCGTTCAACGCAACATA 187 | ATTGTTGTCGCCGATACTGGGAGTGACACGCTTGTAAACTA 188 | CAAAATTATCGCGAGCTATGTAGCAATTGAAACTGGCTCATCTGATCACAATTTCA 189 | GGGCGAAAAG 190 | >seq38 191 | CCGTAAACA 192 | ATGTAGTAGAGTAACGTATAAATGACAAACTAAAGGG 193 | GACGGACTGCGGGTAACACAACGCACTTAACTCAGGCGTAGGTTCGAGTCTGCGCTGA 194 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTGTCT 195 | ATATGCGGT 196 | >seq6 197 | AAGGTCTGAGTTGGGGACGCTGAAGCTCACACGTATTCGGCGGCAGGATTACAATGGAG 198 | CTGATGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCGTTTCGTCGCTGCACTAGGGA 199 | TAGGAATTGTCGGCAACGCTTAGATGCGTCTAGTTAACCTAGGCCGGTTAATACCTCCTG 200 | CACTAGCGCCGGACTCAAGACCCCCAATCGGACCCTAAGGGGCCTACTAAATCAGCCGG 201 | ACACCCGAGTGCCCTATCGCGTCAAACAGCGTCCAGACGACTAAGATGTTAGCGACTATC 202 | AACCCTCCTCGCGGGTACTATAAATCTGGGGATTAACTCCCCGTAAACA 203 | ATGTAGTAGAGTAACGTATAAATGACAAACTAAAGGG 204 | GACGGACTGCGGGTAAGACAACGCACTTAACTCAGGCGTAGTTTCGAGTCTGCGCTGA 205 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTGTCT 206 | ATATGCGGTATGTGGTTCCGTGCTAATAGCTAAGGTGAGCCACGAGGACGCAAGTGGCC 207 | TCGCCTGTCATAATCGTTGGGTTGCGTCGATTGTATGCATGA 208 | 
CCTGTGTCATTAGTCCTTAGACGTCGTCTATTTAGCATTGGATCTTCCTAAGGCGGC 209 | GATTTCCAATGGATTTTAATTAGCGCCCGTGAGTCGGACAACATCA 210 | GTCTCGTCAAGACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTGGCGGCTTCCATCTT 211 | ATATAGTAGCGCATCCCCTAGGACACGAAATGGGAGCATATGGGGTCTGCATGAAGAG 212 | GGCGGTAAACGGCCAAGATAGGCGGTGGAAATTCCCAGTTGCGTTCAACGCAACATA 213 | ATTGGTGTCGCCGATACTGGGAGTGACACGCTTGTAAACTA 214 | CAAAATTATTGCGAGCTATGTAGCAATTGAAGCTGCCTCATCTGATCACAATTTCA 215 | GGGCGAAAAG 216 | >seq7 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | 225 | CT 226 | ATATGCGGTATGTGGTTCCGTGCTAATAGCTAAGGTGAGCCACGAGGACGCAAGTGTCC 227 | TCGCCTGTCATAATCGTTGGGTTGCGTC 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | >seq99 237 | AAGGTCTGAATTGGGGACGCTGAAGCTCACGCGTATTCGGCGGCAGGATTACAATGGAG 238 | CTGATGTCGCTGCGCTCGGGCATCAGTTCACTGGCATGCGTTTCGTCGCTGCACTAGGGA 239 | TAGGAATTGTCGGCAACGCTTAGATGCGTCTAGTTAACCTAGGCCGGTTAATACCTCCTG 240 | CACTAGCGCCGGACTCAAGACCCCCAATCGGACCCTAAGGGGCCTACGAAATCAGCCGG 241 | ACACCCGAGTGCCCTATCGCGTCAAGCAGCGTCCAGACGACTAAGATGTTAGCGACTTTC 242 | AACCCTCCTCGCGGGTACTATAAATCTGGGGATTAACTCCTCGTAAACA 243 | ATGTAGTAGATTAACGTATAAATGACAAACTAAAGGG 244 | GACGGACTGCGGGTAACACAACGCACTTAACTCAGGCGTGGGTTCGAGTCTGCGTTGA 245 | AACCATCGAAAATGACCTCCGTGGAGCCTCCGATCCTTCTGGTCATAAGTGTCT 246 | ATATGCGGTATGTGGTTCCGCGCTAATAGCTAAGGTGAGCCACGAGGACGCAAGTGGCC 247 | TCGCCTGTCATAATCGTTGGGTTGCGTCGATTGTATGCATGA 248 | CCTGTGTCATTAGTCCTTAGACGTCGTCTATTTAGCATTGGATCTTCCTAAGGCGGC 249 | GATTTCCAATGGATTTTAATTAGCGCCCGTGAGTCGGGCAACGTCG 250 | GTCTCGTCAAGACGGTCAACGTAAGTGGCTCGCTTAAGAGTGTGGCGGCTTCCATCTT 251 | ATATAGTAGCTCATCCCCTAAGACACCAAATGGGAGCATATGGGGTCTGCATGAGGAG 252 | GGCGGTAAACGGCCAAGATGGGCGGTGGAAATTCCCAGTTGCGTTCAACGCAACATA 253 | ATTGTTGTCGCCGATACTGGGAGTGACACGCTTGTAAACTA 254 | CAAAATTATCGCGAGCTATGTAGCAATTGAAGCTGGCTCATCTGATCACAATTTCA 255 | GGGCGAAAAG 256 | >seq92 257 | ATGGCCTGCCTTGTTGAGCCCGGAGCCCACAAACGATCGAAAGGGGGGTTACAACGGCG 258 | CTGAGGTCGCTGCAGACGGATACGTACTTATTGGCACTCACTGCGTCGCTTCTCGAGTAA 259 | TGCAAATTGATGGTGAGGTTCCAACGTGGCTCGTAAACCTAGGTTCGGCGATTCTTCCTG 260 | 
CATTACGCTGGATTAGGAACCGCCATTTACACTCGAGAGAGCCTACGGATTCAGCCGG 261 | AGACACGTGAACCCCAGCACGCCGAACAGAGTCTAAACGATCTAGATGTAGCGCGCTTCT 262 | ATACTTCACCACCCGTCTTATAACTACAGGGACTAGCTCATTATGTACA 263 | ATGTCCACGACCAACTTATAAATGACAAATCAAAGGG 264 | GATGGACTGCAGGAAATACATCGCACATGACGCAGACCTAGATTCGAGCCCGAGCTGG 265 | AACCTAAGAAGGTGACTGCCAAGAAGTCTCCGGCCTTTTGGGTCGTAGATTCC 266 | ATACGCAAATATGTACTTCCTCGCTGCGAGATCCTATCACCAGTCGGGCCGTAGGCGGGC 267 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCTTAA 268 | CCTATTTCCTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAATAAGGC 269 | GGCTCGCGAAGGACTTTAAAAGCCTCGTAAGGGTCGCACAAGTCCT 270 | GTCTTGTTCAGGCGATTGACTGAAATAGTTCGCTAAAGGATGTATGGGTTCTCATCTC 271 | ATACGATAGGTCAATGGCTGATTCAAAACCAGGAACCAGATTGTCTCGAAAGGGCT 272 | GCTGGGGGACGCCTATGAAACACCGTGAAAACTACGATTTGCGGCTTAGCCAACATGGG 273 | GACGTGTACGGTTGAACGTCTTTATAGGAGATCCTACGTGGGAAACGCTCATGAACTA 274 | CGAATCTGTCGCGAATTATGCACTGCTCTTGGCCGGCTCAGTCGATAACAAGTCA 275 | GGGCGACTAC 276 | >seq28 277 | 278 | 279 | 280 | 281 | 282 | 283 | 284 | 285 | GGTCGTCGATTCC 286 | AGACGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 287 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCATTA 288 | CCTGTA 289 | 290 | 291 | 292 | 293 | 294 | 295 | 296 | >seq85 297 | ATAGCCTGCCTTGTTGAGCCCGGAGCCCACAAACGATGGAAAGGGGGGTTACAACGGCG 298 | CTGAGGTCGCTGCAGACGGATAGGTACTTATTGGCACTCACTGCGTCGCTTCTCGAGGGA 299 | TGCAAATTGATGGTGAGGTTCCAACGTGGCTCGTAAACCTAGGTTCGGCGATTCTTCCTG 300 | CATTACACTGGATTAGGGACGGCCATTTACACTCGAGAGAGCCTACGGATTCAGCCGG 301 | AGACCCGTGAACCCCAGCACGCCGAACAGAGTCTAAACGATCTGGATGTAGCGCGCTTCT 302 | ACACTTCACCACCCGTCTTATAACTACAGGGAGTAGCTCATTGTGTACG 303 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 304 | GATGGACTCCGGGAATTACACCGCACATGACGTAGACCTAGATTCGAGCCCCAGCTGA 305 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCTTCGATTCC 306 | AGGCGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 307 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCATTA 308 | CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAATAAGGC 309 | GGCTCGCAAAGGGCTTTAAAAGCCTGGTAAGTGTCGCACAAGGCCC 310 | 
GTCTTGTTCAGGCGGTTAGCGGAAATAGTTCGTTAAAGAAGGTATGGGTTCTCATCTG 311 | ATACGATGGGACAATGCCTGATTCAAAACCAGGAACCAGATTGTCTCGAAAGAGCC 312 | CCTGGGGGACGCCCATGACACACCGTGAAAACTACGATTTGCGGCTTAGCCAACATGGG 313 | GTCGTTTACGGTAGAACGTCTTTATAGGAGATCCTACGTGGGAAACGCTCATGAACTA 314 | CGAATCTGGCGCGAATTATGCACTGCTCTTGGCCGGCTCAGTCGATAACAAGTCA 315 | GGGCGACTAC 316 | >seq55 317 | ATGGCCTGCCTTGTTGAGCCCGGAGCCCACAAACGATCGAAAGGGGGGTTACAACGGCG 318 | CTGAGGTCGCTGCACACGGATATGTACTTATTGGCACTCACTGCGTCGCTTCTCGAGGGA 319 | TGCAAATTGATGGTGAGGTTCCAACGTGGCTCGTAAACCTAGGTTCGGCGATTCTTCCTG 320 | TATTACACTGGATTAGGGACGGCCATTTACACTCGAGAGAGCCTACGGATTCCGCCGG 321 | AGACCGTGAACCCCAGCACGCCGAACAGATCTAAACGATCTAGATGTAGCGTGCTTCT 322 | ACACTTCACCACCCGTCTTATAACTACAGGGAGTAGCTCATTGTGTACG 323 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 324 | GATGGATTCCGGGAATTACATCACACATGACGTAGACCTTGATTCGAGCCCCAGCTGA 325 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 326 | AGACGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 327 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCATAA 328 | CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGCTATTTTTTAGTTTTAAATAAGGC 329 | GGCTCGCAAAGGGCTTTAAAAGCCTGGTAGGGGTCGCACAAGGCCC 330 | GTCTTGTTCAGGCGGTTAGCAGAAACAGTTCGTTAAAGAATGTATGGGTTCTCATCTG 331 | ATACGATGGGACAATGCCTGATTCAAAACCAGGAACCAGATTGTCTCGAAAGAGCC 332 | CCTGGGGGACGCCCATGAAACACCGTGAAAACTACGATTTGCGGCTTAGCCAACATGGG 333 | GTCGTTTACGGTTGAACGTCGTTATAGGAGATCCTACGTGGGAAACGCTCATGAACTA 334 | CGAATCTGGCGCGAATTATGCACTGCTCTTGGCCGGCTCAATCGATAACAAGTCA 335 | GGGCGACTAC 336 | >seq68 337 | ATGGCCTGCCTTGTTGAGCCCGGAGCTCACAAACGATCGAAAGGGGGGCTACAACGGCG 338 | CTGAGGTCGCTGCACACGGATATGTACTTATTGGCACTCACTGCGTCGCTTCTCGAGGGA 339 | TGCAAATTGATGGTGAGGTTCCAACGTGGCTCGTAAACCTAGGTTCGGCGATTCTTCCTG 340 | TATTACAATGGATTAGGGACGGCCATTTACACTCGAGAGAGCCTACGGATTCAGCCGG 341 | AGACCGTGAACCCCAGCACGCCGAACAGATCTAAACGATCTAGATGTAGCGTGCTTCT 342 | ACACTTCACCACCCGTCTTATAACTACAGGGAGTAGCTCATTGTGTACG 343 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 344 | GATGGATTCCGGGAATTACATCACACATGACGTAGACCTTGATTCGAGCCCCAGCTGA 345 | 
AAGCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTTGGTCGTCGATTCC 346 | AGACGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 347 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCATAA 348 | CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAATAAGGC 349 | GGCTCGCGAAGGGCTTTAAAAGCCTGGTAAGGGTCGCATAAGGCCC 350 | GTCTTGTTCGGGCGGTTAGCAGAAACTGTTCGTTAAAGAATGTATGGGTTCTCATCTG 351 | ATACGATGGGACAATGCCTGATTCAAAACCAGGAACCAGATTGTCTCGAAAGAGCC 352 | CCTGGGGGACGCCCGTGAAACACCGTGAAAACTACGATTTGCGGCTTAGCCAACATGGG 353 | GTCGTTTACGGTTGAACGTCGTTATAGGAGATCCCACGTGGGAAACGCTCATGAACTA 354 | CGAATCTGGCGCGAATTATGCACTGCTCTTGGCCGGCTCAATCGATAACAAGTCA 355 | GCGCGACTAC 356 | >seq13 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | CCTAGATTCGAGCCCCAGCTGA 365 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 366 | AGACGTGGATAAGTACTTCCTTGCTGCGAGATCCTATCACCAGCCGGGGCGTAGGCGGGC 367 | TTAACTTGTAGTAATCATTGGGTTGCCTGGCCTGTATTCATAA 368 | CCTGTATCTTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAAT 369 | 370 | 371 | 372 | 373 | 374 | 375 | 376 | >seq53 377 | 378 | 379 | 380 | 381 | 382 | 383 | GACAAATCAAAGGG 384 | GATGGATTCCGGGAATTACATCGCACATGACGTAGACCTAGATTCGAGCCCCAGCTGA 385 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 386 | AGACGTGG 387 | 388 | 389 | 390 | 391 | 392 | 393 | 394 | 395 | 396 | >seq56 397 | 398 | 399 | 400 | 401 | 402 | 403 | 404 | TACATCGCTCATGACGTAAACCTAGATTCGAGCCCCAGCTGA 405 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 406 | AGACGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 407 | TTAACTTGTAGTAATCATTGCGTTGCCTGGCCTGTATTCATAA 408 | CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTA 409 | 410 | 411 | 412 | 413 | 414 | 415 | 416 | >seq79 417 | 418 | 419 | 420 | 421 | 422 | 423 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 424 | GATGGATTCCGGGAATTACATCGCACATGACGTAGACCTAGATTCGAGCCCCAGCTGA 425 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 426 | AGACGTGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 427 | TTAACTTGTAGTAATCATTGCGTTGCCTGGCCTGTATTCATAA 428 | 
CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGC 429 | 430 | 431 | 432 | 433 | 434 | 435 | 436 | >seq35 437 | ATGGCCTGCCTTGTCGAGCCCGGAGCCCACAAACGATCGAAAGGGGGGTTACAACGGCG 438 | CTGAAGTCGCTGCAGACGGATATGTACTTATTGGCACTCACTGCGTCGCTTCTCGAGGGA 439 | TGCAAATTGATGGTGAGGTTCCAACGTGGCTCGTAAACCTAGGTTCGGCGATTCTTCCTG 440 | CATTACACTGGATTAGGGACGGCCATTTACACTCGAGAGAGCCTACGGATTCAGCCGG 441 | AGACCCGTGAACCCCAGCACGCCGAACATAGTCTAAACGATCTAGATGTAGCGTGCTTCT 442 | ACACTTCACCACCAGTCTTATAACTACAGGGAGTAGCTCATTGTGTACG 443 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 444 | GATGGATTCCGGGAATTACATCGCACAAGACGTAGACCTAGATTCGAGCCCCAGCTGA 445 | AACCTAAGGAAGTGGCTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 446 | AGACGAGGATATGTACTTCCTCGCTGCGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 447 | TTAACTTGGAGTAATCATTGGGTTGCCTGGCCTGTAGTCATAA 448 | CCTGTATTCTCAATCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAATAAGGC 449 | GGCTCGCAAAGGGCTTTAAAAGCCTGGTAAGGGTCGCACAAGGCCC 450 | GTCTTGTCAAGGCGGTTAGCAGAAACAGTTCGTTAAAGAATGTATGGGTTCTCATCTG 451 | ATACGATGGGACAATGCCTGATTCAAAACCAGGAACCAGATTGTCTCGAAAGAGCC 452 | CCTGGGGGACGCCCATGAAACACCGTGAAAACTACGATTTGCGGCTTAGCCAACATGGG 453 | GTCGTTTACGGTTGAACGTCTTTATAGGAGGTCCTACGTGGGAAACGCTCATGAACTA 454 | CGAATCTGGCGCGAATTATGCACTGCTCTTGGCCGGCAGTCGATAACAAGTCA 455 | GGGCGACTAC 456 | >seq52 457 | 458 | 459 | 460 | 461 | 462 | 463 | 464 | 465 | TGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 466 | AGACGTGGATATGTACTTCCTC 467 | 468 | 469 | 470 | 471 | 472 | 473 | 474 | 475 | 476 | >seq16 477 | 478 | 479 | 480 | 481 | 482 | 483 | 484 | 485 | CCCTTTGGGTCGTCGATTCC 486 | AGACGTGGATATGTACTTCCTCGCTACGAGATCCTATCACCAGCCGGGACGTA 487 | 488 | 489 | 490 | 491 | 492 | 493 | 494 | 495 | 496 | >seq29 497 | 498 | 499 | 500 | 501 | 502 | 503 | 504 | 505 | C 506 | AGACGTGGATATGTACTTCCTCGCTACGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 507 | TTAACTTGGAGTAATCATTGGGTTGCCTG 508 | 509 | 510 | 511 | 512 | 513 | 514 | 515 | 516 | >seq30 517 | 518 | 519 | 520 | 521 | 522 | 523 | 524 | CTGA 525 | AACCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTCGATTCC 526 | 
AGACGTGGATATGTACTTCCTCGCTACGAGATCCTATCACCAGCCGGGACGTAGGCGGGC 527 | TTAACTTGGAGTAATCATTGGGTT 528 | 529 | 530 | 531 | 532 | 533 | 534 | 535 | 536 | >seq9 537 | 538 | 539 | 540 | 541 | 542 | 543 | 544 | 545 | AGTCTCCGACCCTTTGGGTCGTGGATTCC 546 | AGACGCGGATATGTACTTCCTCGCTGCGGGGTCCTATCACCAGACGGGACTTAGGCGGGC 547 | TTGACTTGTAGTAATCATTGGGTTGTCTGACCCGTATTCATAA 548 | CCTGTGTCCTCAGTCCCTATAGGCTGCCTATTTGGCTAAT 549 | 550 | 551 | 552 | 553 | 554 | 555 | 556 | >seq93 557 | 558 | 559 | 560 | 561 | 562 | 563 | ATGTATCCCACGACCAACCTATAAATGACAAATCAAAGGG 564 | GATGGACTGCGGGAATTACATCGCACATGACGTAGACCTAGATTCGAGCCCGAGCTGA 565 | AGCCTAAGGAAGTGACTGCCAGGAAGTCTCCGACCCTTTGGGTCGTAGATTCC 566 | AGACGCGGTTATGTACTTCCTCGCTGCGAGGTCCTATGACCAGCCGGGACTTAGGCGGGC 567 | TTAACTTGTAGTAATCATTGGATTGTCTGGCCTGTATTCATAA 568 | CCTGTATCCTCAGTCCCTATTGGCTGCCTATTTGGCTAATTTTTAGTTTTAAATAAGGC 569 | GGCTCGCAAAGGGCTTTAAAAGCCTGGTAAG 570 | 571 | 572 | 573 | 574 | 575 | 576 | >seq86 577 | 578 | 579 | 580 | 581 | 582 | CATTACAAACA 583 | AAGTCCTCGACCAACGTATAAACGGCTAACCAGAGCG 584 | GGTGAACTGTGGGAAGTAAATCGAACATGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 585 | ATCCCTAGAGGATGAGTGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 586 | ATACTCGCAGATGAGGTTACCCACTGAGAGATCAAAGCACTGTTCGGGGTGTAAACGGCC 587 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 588 | GCTGAATCATTAGTCCTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCCACTCGGC 589 | GGTTTTCAAAGGCCTTTAAGAGCCGCGTATGGATCGCACGAGTCTG 590 | GTCTTATTCA 591 | 592 | 593 | 594 | 595 | 596 | >seq71 597 | CTACACTGCGTTGTTGGCCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATAATGGTG 598 | CTTAAGTAGGTACAAACGGATACGGACGTACTGACGCGACATGCCTCGCTTCACAAGGAT 599 | TGGAATCTGTTGATGACGTTTAGGTATGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 600 | CCTTACGCTGGATTAAGCACTCCCATTTACGCACGACAGAGCTTACGGAATCAGCTGC 601 | GCACCCGAGAACACCTGTACGTCTAGTACAGTCGAAACGATCTAGATGTAAAAAAGTTCT 602 | AAGCTTCATCACGCGGCTTAGAACTTCAGGCACGAGCTCATTACAAACA 603 | AAGTCCTCTACCAACCTATAAACGACTAACCAGAGCG 604 | GGTGAACTGCGGGAAGTAAATCGAACATGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 605 | 
ATCCCTAGGGGATGAGTGCCCAGCACCGTCCGACCCTTCCGGTCGTAGAATTCA 606 | ATACTCGCAGATGAGGTTCCCCACTGAGAGATCAAAGCACTGTTCGGGGTGTAAACGGCC 607 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 608 | GCGGAATCATTAGTCCTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCCACTCGGC 609 | GGTTTTCAAAGGCCTTTAAGAGCCGCGTATGGATCGCACGAGTCTG 610 | GTCTTATTCAGGCGGTCGACGAAATTAGACCGGTAGAAGATGTAATGGCTTCTCTCTC 611 | ATATGGTTACTCAATTCCTGATTCAAAAACAGGGAACATATGATGTTTCCGGGAGTCG 612 | GACGGGGAACAGCTACGACACACCTTGACAATTGCGGTTTACTGCCTGGCCAACTTGGG 613 | GCAGTTCACGGTTGGAACATCTTTATACGCGATCGCGTGAGGGAAACGTTTAGCAACTA 614 | CTAATCTTTCAAGAACTATGCATAATGGCCAGGCGGCTCGGTGAGTAACAAGCCA 615 | GGGCGAATAC 616 | >seq76 617 | CTACACTGCTTTGTTGGCCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATAGTGGTG 618 | CTTAAGTAGGTACAGACGGATACGGACGTACTGACGCGACATGCCTCGCTTCACAAGGAT 619 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 620 | CCTTACGCTGGATTAAGGACTCCCATTTACACACGACAAAGCTTACGGAATCAGGTGC 621 | GCACCCGAGAACACCTGTACGTCTAGTACAGTCGAAACGATCTAGATGTAAAAGAGTTCT 622 | AAGCTTCATCACGCGGCTTAGAACTTCAGGCACGAGCTCATTACAAACA 623 | AAGTCCTCTACCAACGTATAAACGACTAACCAGAGCG 624 | GGTGAACTGCGGGAAGTAAATCGAACATGGACCGAAGGGAGGTTCGAGTCGGTGCAGA 625 | ATCCCTAGAGGATGAGTGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 626 | ATACTCGTAGATGAGGTTCCCCACTGAGAGATCAAAGCACTGTTCGGGGTGTAAACGGCC 627 | TTTACTGTAGTAATGGTTGGGTCACGTCAACCGTATGCGTAA 628 | GCGGAATCATTAGTCCTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCCACTCGGC 629 | GGTTTTCAAAGGCCTTTAAGAGCCGCGTATGGATCGCACGAGTCTG 630 | GTCTTATTCAGGCGGTCGACGGAATTAGACCGGTAGAAGATGTAATGGCTTCTCTGTC 631 | ATATGGTTACTCAATTCCTGATTCAAAAACAGGGAACATATGATGTTTTCGGGAGTCG 632 | GACGGGGAACAGCTACGACACACCTTGACAATTGCGGTTTACTGCCTGGCCAACTTGGG 633 | GCAGTTCACGGTTGGAACATCTTTATACGCGATCGCGTGAGGGAAACGTTTAGCAACTA 634 | CTAATCTTTCAAGAACTATGCATAATGGCCAGCCGGCTCGGTGAGTAACAATCCA 635 | GGGCGAATAC 636 | >seq70 637 | CTACACTGCGTTTTTGGCCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 638 | CTTAAGTCGGTACAGACGGATACGGACGTACCGACGCGACATGCATCGCTTCACAAGGAT 639 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 640 | 
CCTTACGCTGGATTAAGGACTCCCATTTTCACATGACAGAGCTTACGGAATCAGCTGC 641 | GCACCCGAGAACCCCTGTACTTCAAGCAGAGTCGAAACGATCTAGATGTAAAAGAGTTCC 642 | AAGCTTCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAAACA 643 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 644 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 645 | AACCCTAGAGAATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 646 | ATACTCGCAGATGAGGTTCCCCACTGCGAGATCAAAGCACTGGACGGGATGTAAACGGCC 647 | TTTACTGTAGTAATGGTTGGGTTGCGTCAACCGTATGCGTAA 648 | GCTGAATCATTAGTCCTAAGAAGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 649 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 650 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGGATGTAATGGCTTCCATCTC 651 | ATATGGTTACTCAATTCCTGATTCAAAAACATGGAACATATGATGTTTCCGGGAGTCG 652 | GACGGGGAACGGCTACGACACACCTTGAGAATCGCGATTTACTGCCTGGCCAACTCGGG 653 | GCCGTTCACAGTTGGAACATCTTTATAGGCGATCGCGTGAGGGAAACACTTAGCAACTA 654 | CTAATCTTTCAAGAACTATGCATAATGGCTAGCCGGCTCGGTGGGTAACAATCCA 655 | GGGCGAATAC 656 | >seq74 657 | 658 | 659 | 660 | 661 | 662 | 663 | 664 | GTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 665 | AACCCTAGAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 666 | ATACTCGCAGATGAGGTTCCCCACTGAGAGATCAAAGC 667 | 668 | 669 | 670 | 671 | 672 | 673 | 674 | 675 | 676 | >seq42 677 | CTAAACTGTGGTGCTGGCCTCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 678 | CTTAAGTCGGTACAGACGGATACGGACGCACTGACGCGACATGCATCGCTTCACAAGGAT 679 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 680 | CCTTACGCTGGATTAAGGACTTCCATTTACACATGACAGAGCCTACGGAATTAGCCGC 681 | GCACCTGAGAACCCCTGTACGTCAAGCAGAGTCGAAACGATCTAGATGTAAAAGAGTTCC 682 | AAGCTCCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAACCA 683 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 684 | GGTGAACTGCGGGAAGTAAATCAAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 685 | AACCCTAGAGGATGAGGGACCAGTACCGCCCGATTCTTCCGGTCGTGGAATTCA 686 | ATACTCGCAGATGAGGTTCCCTACAGAGAGGGCAAAGCACTGGCCGGGATGTAAACGGCC 687 | TTTACTGTAGTAATGGTTGGGTTACGTCTACCGTATGCGTAC 688 | GCAGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTAAATTTAATGGTTCCAACTCGGC 689 | 
GGTTTGCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 690 | GTCTTATTCAGGCGGTCGACGGAAATAGACTGGTAGAGGATGTAATGGCTTCCATCTC 691 | ATATGGTTACTCAATTCCTGAGTCAAAAACAGGGAACATATGATGTTTCCGGGAGTCG 692 | GACGGGCAACGGCTACGAGACACCTTGAGAATTGCGATCTACTGCCTGGCCAACTTGGG 693 | GCCGTTCACAGTTGGAAGATCTTTATAGGCGATCGCGTGAGGGAAACACTTAACAACTA 694 | CCGATCTTTCAAGAACTATTCATAATGGCTGGCCGGCTCGGTGGGTAACAATCCA 695 | GACCGAATAC 696 | >seq10 697 | CTACACTGCGGTGCTGGCCTCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 698 | CTTAAGTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 699 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 700 | CCTTACGCTGGATTAAGGACTTCCATTTACACATGACAGAGCTTACGGAATTAGCTGC 701 | GCACCCGAGAACCCCTGTACGTCAAGCAGAGTCGAAACGATCTAGATGTAAAAGAGTTCC 702 | AAGCTCCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAACCA 703 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 704 | GGTGAACTGCGGGAAGTAAATCGAACACGGCTCGAAGGTAGGTTCGAGTCGGTGCAGA 705 | AACCCTAGAGGATGAGAGACCAACACCGCCCGATCCTTCCGGTCGTGGAATTCA 706 | ATACTCGCAGATGAGGTTCCCCACAGAGAGAGCAAAGCCCTGGCCGGGATGTAAACGGCC 707 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAC 708 | GCAGAAACATTAGTCTTAAGAGGCTGCCTATTTGGTAAATTTAATGGTTCCAACTCGGC 709 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 710 | GTCTTATTCAGGCGGTCGACGGAAATAGACTGGTAGAGGATGTAATGGCTTCCATCTC 711 | ATATGGTTACTCAATTCCTGAGTCAAAAACACGGAACATATGATGTTTCCGGGAGTCG 712 | GACGGGCAACGGCTACGACACACCTTGAGAATTGCGATCTACTGCCTGGCCAACTTGGG 713 | GCCGTTCACAGTTGGAACATCTTTATAGGCGATCGCGTGAGGGAAACACTTAACAACTA 714 | CCAATCTTTCAAGAACTATGCATAACGGCTGGCCGGCTCGGTGGGTAACAATCCA 715 | GGCCGAATAC 716 | >seq36 717 | 718 | 719 | 720 | 721 | 722 | 723 | AAGTCCTCGACCAACGTATAAACGGCTAACCAGAGCG 724 | GGTGCACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 725 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGATCCTTCCGGTCGTGGAATTCA 726 | ATACTCGCAGATGA 727 | 728 | 729 | 730 | 731 | 732 | 733 | 734 | 735 | 736 | >seq40 737 | 738 | 739 | 740 | 741 | 742 | 743 | 744 | AACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 745 | 
AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 746 | ATACTCGCAGATGAGGTTCCC 747 | 748 | 749 | 750 | 751 | 752 | 753 | 754 | 755 | 756 | >seq63 757 | 758 | 759 | 760 | 761 | 762 | CCA 763 | AAGTCCTCGACCGACGTATAAACGACTAACCAGAGCG 764 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 765 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 766 | ATACTCACAGATGAGGTTCCCCACAGAGAGAGCAAAGCACTGCCCGGGATGTAAACGGCC 767 | TTTACTG 768 | 769 | 770 | 771 | 772 | 773 | 774 | 775 | 776 | >seq27 777 | 778 | 779 | 780 | 781 | 782 | AGCTCCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAACCA 783 | AAGTCCTCGACCGACGTATAAACGACTAACCAGAGCG 784 | GGTGAGCTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 785 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 786 | ATATTCGCAGATGAGGTTCCCCACAGAGAGAGCAAAGCACTGCCCGGGATGTAAACGGCC 787 | TTTACTGTAATAATGGTTGGGTTACGTCAACCGTATGCGTAC 788 | GCAGAATCATTAGTTTTAAGAGGCTGCCTATTTGGTAAATTTAATGGTTCCAACTCGGC 789 | GGTTTTC 790 | 791 | 792 | 793 | 794 | 795 | 796 | >seq34 797 | 798 | 799 | 800 | 801 | 802 | 803 | 804 | TCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCAGA 805 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 806 | ATATTCGCAGATGAGGTTCCCCACAGAGAGAG 807 | 808 | 809 | 810 | 811 | 812 | 813 | 814 | 815 | 816 | >seq19 817 | 818 | 819 | 820 | 821 | 822 | 823 | 824 | 825 | GAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 826 | ACACTCGCAGATGAGGTTCCCCACAGAGAGAGCAAAGCACTGCCCGGGATGTAAACGGCC 827 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAC 828 | GC 829 | 830 | 831 | 832 | 833 | 834 | 835 | 836 | >seq3 837 | CTACACTGCGGTGCTGGCCTCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 838 | CTTAAGTCGGTACAGACGGGTACGGACGTACTGACGCTACATGCATCGCTTCACAAGAAT 839 | CGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 840 | CCTTACGCCGGATTAAGGACTTCCATTTACATATGACAGAGCTTACGGAATTAGCTGC 841 | GCACCCGTGAACCCCTGTACGTCAAGCAGAGTCGAAACGATCTAGATGTAAAAGAGTTCC 842 | AAGCTCCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAACCA 843 | AAGTCCTCGACCGACGTATAAACGACTAACCAGAGCG 844 | 
GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGGGTCGGTGCAGA 845 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 846 | ATACTCGCAGATGAGGTTCCCCACAGAGAGAGCAAAGCACTGCCCGGGATGTAAACGGCC 847 | TTTACTGTAGTAATGTTGGGTTACGTCAACCGTATGCGTAC 848 | GCAGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTAAATTTAATGGTTCCAACTCGGC 849 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 850 | GTCTTATTCGGGCGGTCGACGGAAATCGACTGGTAGAGGATGTAATGGCTTCCATCTC 851 | ATATGGTTTCTCAATTCCTGAGTCAAAAACACGGAACATATGATTCCGGGAGTCG 852 | GACGGTTAACGGCTACGACACACCTTGAGAATTGCGATCTACTGCCTGGCCAACTAGGG 853 | GCCGTTCACAGTTGGAACATCTTTATAGGCGATCGCGTGAGGGAAACACTTAACAACTA 854 | CCAATCTTTCAAGAACTATGCATAATGGCTGGCCGGCTCGGTGGCTAACAATCCA 855 | GGCCGAATAC 856 | >seq66 857 | CTAGACTGCGGTGCTGGCCTCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 858 | CTTAAGTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGAAT 859 | CGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 860 | CCTTACGCTGGATTAAGGACTTCCATTTACATATGACAGAGCTTACGGAATTAGTTGC 861 | GCACCCGTGAACCCCTGTACGTCAAGCAGAGTCGAAACGATCTAGATGTAAAGGAGTTCC 862 | AAGCTCCATCACGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAACCA 863 | AAGTCCTCGACCGACGTATAAACGACTAACCAGAGCG 864 | GTTGAACTGCGGGAAGTAAATCGAACACGGCCCGAATGTAGGTTCGGGTCGGTGCAGA 865 | AACCCTAGAGGATGAGGGACCAGCACCGCCCGCTCCTTCCGGTCGTGGAATTCA 866 | ATACTCGCAGATGAGGTTCCCCACAGAGAGAGCGAAGCACTGCCCGGGATGTAAACGGCC 867 | TTTACTGTAGTTATGGTTGGGTTACGTCAACCGTATGCGTAC 868 | GCAGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTAAATTTAATGGTTCCAACTCGGC 869 | GGTTTTCAAAGAACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 870 | GTCTTATTCGGGTGGTCGACGGAAATCGACTGGTAGAGGATGTAATGGCTTCCATCTC 871 | ATATGGTTACTCAATTCCTGAGTCAAAAACACGGAACATATGATTCCGGGAGTCG 872 | GACGGTCAACGGCTACGACACACCTTGAGAATTGCGATCTACTGCCTGGCCAACTAGGG 873 | GCCGTTCACAGTTGGAACATCTTTATAGGCGATCGCGTGAGGGAAACACTTAACAACTA 874 | CCAATCTTTCAAGAACTATGCATAATGGCTGGCCGGCTCGGTGGCTAACAATCCA 875 | GGCCGAATAC 876 | >seq84 877 | CTACACTGTGTTGTTGTCCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 878 | CTTAAGTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 879 | 
TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 880 | CCTAAAGCTGGATTAAGGACTTCCATTTACACATGACTGAGCTTACGGAATCGGCTGC 881 | GCACCCGGGAACCCCTGTACGTCAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 882 | AAGCTTCATCATGCGGCTTAGAACTTCAGGCATGTGCTCATTAGAAACA 883 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 884 | GGTGTACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 885 | AACCCCAAAGGATGAGGACCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 886 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 887 | TTTACTGTAGTCATGGTTGAGTTCCGTCACCCGTATGCGTAA 888 | GCTGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 889 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 890 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGGATGTAATGGCTTCCATCTC 891 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGTACATATGGTGTTTCCGGGAGTCG 892 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCGAGTTACTGCCTGGCCAACTTGGG 893 | GCCGTTCACAGTTGCAACATCTTTATTGGCGATCGCGTGAGGGAAACACTTAGCAACTA 894 | CTAATCTTTCAAGAACTATGGGTATTGGCTAGCCGGCTCGGTGGGTAATAATCCA 895 | GGGCGAATAC 896 | >seq95 897 | CTACACTGCGTTGTTGACCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 898 | CTTAACTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 899 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 900 | CCTTACGCTGGATTAAGGACTTCCATTTACACATGACAGAGCTTACGGAATCAGCTGC 901 | GCACCCGAGAACCCCTGTACGTTAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 902 | AAGCTTCATCATGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAAACA 903 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 904 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 905 | AACCCCACAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 906 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 907 | TTTACTGTAGTAATGGTTGGGTTACGTCATCCGTATGCGTAA 908 | GCTGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCAGC 909 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 910 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGGATGTAATGGCTTCCATCTC 911 | ATATGATTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTTTCCGGGAGTCG 912 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCGAGTTACTGCCTGGCCAACTTGGG 
913 | GCCGTTCACAGTTGCAACATCCTTATTGGCGATCGCGTGAGGGAAACGCTTAGCAACTA 914 | CTAATCTTTCAAGAACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 915 | GGGCGAATAC 916 | >seq44 917 | CTACACTGCGTTGTTGACCCCGAAGCCCACCCACGGTAGAAAGCGGGGTTATGATGGTG 918 | CTTAATTCGGTACAGACGGATACCGACGTACTGACGTGACATGCATCGCTTCACAAGGAT 919 | TGGAATCTGTTGATCACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 920 | CCTTACGCTGGATTAAAGACTTCCATTTACACATGACAGAGCTTACGGAATCAGCTGC 921 | GCACCCGAGAACCCCTGTACGTTAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 922 | AAGCTTCATCATGCGGCTTAGAACTTCAGGCATGAACTCATTAGAAACA 923 | AGGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 924 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 925 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 926 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 927 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 928 | GCTTAATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 929 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 930 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGGATGTAATGGCTTCCATCTC 931 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTGTCCGGGAGTCG 932 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCAATTTACTGCCTGGCCAACTTGGG 933 | GCCGTTCACAGTTGCATCATCCTTATTGGCGATCGCGTGAGGGAAACGCTTAGCAACTA 934 | CTAATCTTTCAAGGACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 935 | GGGCGAGTAC 936 | >seq2 937 | 938 | 939 | 940 | 941 | 942 | 943 | 944 | GGTGTTGA 945 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 946 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAACACTGGTCGGGATGTACACGGCC 947 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 948 | GCTTGATCATTAGTCTTAAGAGGCTGCTTATTTGGTTTATTTAACGGTTCCAACTCGGC 949 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 950 | GTCTTATTCAGGCGGTCGACGGAAA 951 | 952 | 953 | 954 | 955 | 956 | >seq17 957 | CTACACTGCGTTGTTGACCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 958 | CTTAATTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 959 | TGGAATCTGTTGGTCACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 960 | CCTTACGCTGGCTTAAAGACTTCCATTTACACATGACAGAACTTACGGAATCAGCTGC 
961 | GCACCCGGGACCCCCTGTACCTTAAGCAGAGTCGAAACGATATGGATGTAAAAGAGTTCC 962 | AAGCTTCATCATGCGGCTTAGAACTTCAGGCATGAATTCATTAGAAACA 963 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 964 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGTAGGTTCGAGTCGGTGCTGA 965 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 966 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 967 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 968 | GCTTGATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACAGTTCCAACTCGGC 969 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGGGTTTT 970 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGCATGTAATGGCTTCCATCTC 971 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTTTCCGGGAGTCG 972 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCGATTTACTGCCTGGCCAACTTGGG 973 | GCCGTTCACAGTTGCAACATCTTTATTGGCGATCGCATGAGGGAAACGCTTAGCAACTA 974 | CTAATCTTTCAAGAACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 975 | GGGCGAGTCC 976 | >seq48 977 | CTGCACTGCGTTGTTGACCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 978 | CTTAATTCAGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 979 | TGGAATCTGTTGGTCACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 980 | CCTTACGCTGGCTTAAAGACTTCCAATTACATATGACAGAACTCACGGAATCAGCTGC 981 | GCACCCGAGAACCCCTGTACCTTAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 982 | AAGCTTCATCATGCGGCTTAGCACTTCAGGCATGAATTCATTAGAAACA 983 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 984 | GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGTAGGTTCGAGTCGGTGCTGA 985 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 986 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 987 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 988 | GCTTGATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 989 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGGGTTTT 990 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGCATGTAATGGCTTCCATCTC 991 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTTTCCGGGAGTCG 992 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCGATTTACTGCCTGGCCAACTTGGG 993 | GCCGTTCACAGTTGCAACATCTTTATTGGCGATCGCGTGAGGGAAACGCTTAGCAACTA 994 | CTAATCTTTCAAGAACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 
995 | GGGCGAGTCC 996 | >seq11 997 | 998 | 999 | 1000 | 1001 | 1002 | 1003 | AAGTCCTCGGCCAACGTATAAACGACTAACCAGAGCG 1004 | GGCGAACTGCGGGAAGTAAATCGAATACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 1005 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 1006 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 1007 | TTT 1008 | 1009 | 1010 | 1011 | 1012 | 1013 | 1014 | 1015 | 1016 | >seq69 1017 | CTACACTGCGTTATTGACCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 1018 | CTTAATTCGGTCCAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 1019 | TGGAATCTGTTGGTCACGCTTAGGTACGGCGCATAAAATTAGGTCCGGCAGTCCTACATG 1020 | CCTTACGCTGGATTAAAGACTTCCATTTACACATGACAGAGCTTACGGAATCAGCTGC 1021 | GCACCCGAGAACCCCTGTACGTTAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 1022 | AAGCTTCATCATGCGGCTTAGAACTTCGGGCATGAATTCATTAGAAACA 1023 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 1024 | GGCGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 1025 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 1026 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAGAGCACTGGTCGGGATGTACACGGCC 1027 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 1028 | GCTTGATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 1029 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGGGTTTG 1030 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGCATGTAATGGCTTCCAGCTC 1031 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTTTCCGGGAGTCG 1032 | GACGGCTATGACACCCCTTGAGAATTGCGATTTACTGCCTGGCCAACTTGGG 1033 | GCCGTTCACAGTTGCAACATCTTTATTGGCGATCGCGTGAGGGAAACGCTTAGCAACTA 1034 | CTAATCTTTCAAGAACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 1035 | GGGCGAGTCC 1036 | >seq94 1037 | CTACACTGCGTTGTTGACCCCGAAGCTCACCCACGGTAGAAAGCGGGGTTATGATGGTG 1038 | CTTAAGTCGGTACAGACGGATACGGACGTACTGACGCGACATGCATCGCTTCACAAGGAT 1039 | TGGAATCTGTTGATGACGCTTAGGTACGGCTCATAAAATTAGGTCCGGCAGTCCTACATG 1040 | CCTTACGCTGGATTAAGGACTTCCATTTACACATGACAGAGCTTACGGGATCAGCTGC 1041 | GCACCCGAGAACCCCTGTACGTTAAGCAGAGTCGAAACGATCTGGATGTAAAAGAGTTCC 1042 | AAGCTTCATCATGCGGCTTAGAACTTCAGGCATGAGCTCATTAGAAACA 1043 | AAGTCCTCGACCAACGTATAAACGACTAACCAGAGCG 1044 | 
GGTGAACTGCGGGAAGTAAATCGAACACGGCCCGAAGGTAGGTTCGAGTCGGTGCTGA 1045 | AACCCCAAAGGATGAGGGCCCAGCACCGTCCGATCCTTCCGGTCGTGGAATTCA 1046 | ATACTCGCAGATGAGGTTCCCCACTGAGAGGTCAAAGCACTGGTCGGGATGTACACGGCC 1047 | TTTACTGTAGTAATGGTTGGGTTACGTCAACCGTATGCGTAA 1048 | GCTGAATCATTAGTCTTAAGAGGCTGCCTATTTGGTTAATTTAACGGTTCCAACTCGGC 1049 | GGTTTTCAAAGGACTTTAAGAGCCGCGTATGGATCGCACGAGTTTG 1050 | GTCTTATTCAGGCGGTCGACGGAAATAGACCGGTAGAGGATGTAATGGCTTCCATCTC 1051 | ATATGGTTACTCAATTCCTGAATCAAAAACAGGGAACATATGGTGTTTCCGGGAGTCG 1052 | GACGGGCAACGGCTATGACACACCTTGAGAATTGCGAGTTACTGCCTGGCCAACTTGGG 1053 | GCCGTTCACAGTTGCAACATCCTTATTGGCGATCGCGTGAGGGAAACACTTAGCAACTA 1054 | CTAATCTTTCAAGAACTATGGATAATGGCTAGCCGGCTCGGTGGGTAATAATCCA 1055 | GCGCGAATAC 1056 | >seq81 1057 | 1058 | 1059 | 1060 | 1061 | 1062 | 1063 | 1064 | AGCGGA 1065 | AACCATAAAAGGTAACTGCCAAGAAATCCACGCTCCTTCTGCGGGTCGTGATTGACTTCC 1066 | ACACGCGGTACGTGGCTCCCCTCTATGAGTTCTAAACAACGGCAGGGACGGCGGTGGCC 1067 | GCACTTGTAAAAATGCTAGGCTTGCGTCAATTGTCTGCGGAA 1068 | AATGTATCATTAGCCCTTAGTGGTCGG 1069 | 1070 | 1071 | 1072 | 1073 | 1074 | 1075 | 1076 | >seq80 1077 | CTGGCCTGCATTGTTTCCCACGAAGCCCACACATCGTCAACAGCGGGGTTACAATGGGT 1078 | CTGCCGTCGCTGCCGCCGGATAACAGCTTATTGGCCCGCTATGTGCCGCTTCTCTAGGAA 1079 | CAGAAACTGATTGTGATGGGTTGTTGTAGCCGCTTAATGTAGGATAGGTGACTGGTCACG 1080 | CCCTAGGATGGATTCAAACCCCCCAGTCACACCTGAGCGAAGCTACGGAATGAGGCAT 1081 | ACATCCGAGTACTCATGCACGTCAAGTATAGTCCAAACGACCGCGGTGAAAAGTACTCCC 1082 | AAATTTCATCACGCATACAACAAATGCAGCGAATAATTTATTGTAAAGATCTAACCCACC 1083 | GCAAAGTAAGGGAAATCGTTACGTTCTTGAATGACGTTTAAGTGACAAATTAAAGGT 1084 | GATGAACTACGAGACGTACAACGTACTGGACGAAAAAGTAGAAGCGGA 1085 | AATCACAAAAGGTAACTGCCAAGAAATCCACGCTCCTTCTTCGGGTCGTGATTGACTTGC 1086 | ATACCCGGTACGTGGCTCCCCTCTATGAGTTCTAAACAACGGCAGGGACGGCAGTGGCC 1087 | GCACTTGTAAAAATGCTAGGCTTGCGTCAATTGTCTGCGGAA 1088 | AATGTATCATTAGCCCTTAGTGGTCGGCTATTTGATGAATTAAATAGTTCTATATCGGC 1089 | AACCGGGAATTTACTGTAATAGCCGCGGAAGAGCTGCAAGATTTTG 1090 | GTCTCGTAAAGGCGGTCGATGTAGAGAACCCAGTCAAGTGTGCGATAGCTCCCATCTT 1091 | 
TTATGATGGCTGAATTCTGAGTCCGAAGCGCGAAAGCTGATGACAGCATCAAGAAGTG 1092 | GACGGGGAACAGCCATAGCGAGCTTGAAAACTACGATTTGCGTCCTACCCAACTTG 1093 | ATATTTATTGACGGTCGTGTGAGAAAAACTTTTAAGAACTA 1094 | CAAATTGGCCGCGAACCAGGTATGAACCTACGGCTGCTCAACTGCTAACTCTTTA 1095 | GCGCGAACGC 1096 | >seq82 1097 | CTGGCCTGCATTGTTTCCCACGAAGCCCACACATCGTCAACAGGGGGGTTACAATGGGT 1098 | CTGCCGTCGCTGCCGCCGGATAATAGCTTATTGGCCCGCTATGTGCCGCTTCTCTAGGAA 1099 | CAGAAACTGATTGTGATGGGTTGTTGTAGCCGCTTAATGTAGGATAGGTGACTCGTCACG 1100 | CTCTAGGATGGATTCAAACCCCCCAGTCACACCTGAGCGAAGCTACGGAATGAGGCAT 1101 | ACATCCGAGTACTCATGCACGTCAAGTATAGTCCAAACGACCGCGGTGTAAAGTACTCCC 1102 | AAATTTCATCACGCATACAACAAATGCAGCGAATAATTTATTGTAAAGATCTAACCCACC 1103 | GCAAAGTAAGGGAAATCGTTACGTTCTTGAATGACGTTTAAGTGACAAATTAAAGGT 1104 | GATGAACAACGAGACGTACAACGTACTGGACGAAAAAGTAGAAGCGGA 1105 | AATCATAAAAGGTAACTGCCAAGAAATCCACGCTCCTTCTGCGGGTCGTTATTGACTTGC 1106 | ATACCCGGTACGTGGCTCCCCTCTATGAGTTCTAAACAACGGCAGGGACGGCAGTGGCC 1107 | GCATTTGTAAAAATGCTAGGCTTGCGTCAATTGTCTGCGGAA 1108 | AATGTCTCATTAGCCCTTAGTGGTCGGCTATTTGATGAATTAAATAGTTCTATATCCGC 1109 | AACTGGGAATTTACTGTAATAGCCTCGGAAGAACTGCAAGATTTTG 1110 | GTCTCGTCAAGGCGGTCGATGTAAAGAACCCAGTCAAGTGTGCGATAGCTCCCATCTT 1111 | TTACGATGGCTGAATTCTGAGTCCGAAGCGCGAAAGCTGATGACAGCATCAAGAAGTG 1112 | GACGGGGAACAGCCATAGCGAGCTTGAAAACTACGATTTGCGTCCTACTCAACTTG 1113 | ATAATTATTGACGGTCGTGTGAGAAAAACTTTTAAGAACTA 1114 | CAAATTGGCCGCGAACCAGGTATGAACCTACGGCTGCTCAACTGCTAACTCTTTA 1115 | GCGCGAACGC 1116 | >seq46 1117 | 1118 | 1119 | 1120 | 1121 | 1122 | 1123 | GCAAATCGTTACGTTCGCGAATGACGTATAAGTGACAAATTAAAGGT 1124 | GATGAACTACGAGACGTACAACGTACTGGACGAAAAAGGAGGCTCAAGACGGAGCGGG 1125 | AACCTTAAAAGGTAACTGCCAAGAAATCCACGAGCCTTCTGCGGGTCGTAATCGACCCCC 1126 | ATACGCAGTACGTGACTCCCCTCTCCGAGTTCTAATCAACGGCAGGGACGGCGGTGACC 1127 | TCACTTGTAAAAATGCTAGGGTTGCGACAATTGTATGTG 1128 | 1129 | 1130 | 1131 | 1132 | 1133 | 1134 | 1135 | 1136 | >seq83 1137 | 1138 | 1139 | 1140 | 1141 | 1142 | 1143 | 1144 | 1145 | CGTGATCGACCCCC 1146 | ATACGCGGTACGTGACTCCCCTCTACGAGTTCTAATCAACGGCAGTGACGGCGATGACC 
1147 | TCACTTGTGAAAATGC 1148 | 1149 | 1150 | 1151 | 1152 | 1153 | 1154 | 1155 | 1156 | >seq45 1157 | 1158 | 1159 | 1160 | 1161 | 1162 | 1163 | AAAGG 1164 | GAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGAACGGA 1165 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTTTTGCGTGTCGTGACTAACCTCT 1166 | ATACACGGAACGTGGCACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCGGTGGCC 1167 | TTTTGC 1168 | 1169 | 1170 | 1171 | 1172 | 1173 | 1174 | 1175 | 1176 | >seq61 1177 | 1178 | 1179 | 1180 | 1181 | 1182 | 1183 | 1184 | GAACGGA 1185 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTTTTGCGTGTCGTGACTAACCTCT 1186 | ATACACGGAACGTGGCACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCGGTGGCC 1187 | TTTTGCAGAGAAG 1188 | 1189 | 1190 | 1191 | 1192 | 1193 | 1194 | 1195 | 1196 | >seq50 1197 | ATGGCCCGCATTGTTGCCTACGAAGCACACGCATGGTCAACCGAGGTTACAATGGTG 1198 | CTGCCGTCGCTGTGGCCGGATAACAGCTTATTGGCCCGCTATGTGTCTCTGCCTAAGGAA 1199 | CAGAAACTGATGGCGATGGGTTGGTGCAGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1200 | CCTTAGGATGTATTAAAACCCCTCACTCAAACCTGAGCGAATCTACGAAATGAGGCAT 1201 | ACATCCGAAACCTTATGCACGCCAAGGGTAGTCCTAACGATCTAGGTATTAAGAACGACC 1202 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1203 | ATGGTCTCGAACGACATATAAGGGACAAACGAAAAGG 1204 | GACGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGAACGGA 1205 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTTTTGCGTGTCGTGACTAACCTCT 1206 | ATACACGGAACGTGGCACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCGGTGGCC 1207 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1208 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1209 | AATTGGCAATAGACTGTAATAGCCGCGGCAGAGTTGCAAATTCTTG 1210 | GTCTCGTCAAGGCGGTCGACGTAATTAGCTCAGTCAAAGCTAGGATTGCTCCCGTTTC 1211 | ATGTGATGGCTGAAATCTGGGTTTGGAAAGCGGAAGTACATGTATCATCAAGAAGCG 1212 | GTCAGGGAACGGCCATAGCGACCTTGAGAACGACGATTTGCGTCCTACCCAACATA 1213 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1214 | AAAATCTGTGGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1215 | GAGCGAACGC 1216 | >seq12 1217 | 1218 | 1219 | 1220 | 1221 | 1222 | CATCACGTATACAATAAATACAGGGAAGTATTCATTGTGAAGA 1223 | 
ATGGTCTCGAACGACATATAAGGGACAAACGAGAAGG 1224 | GAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGACCGGA 1225 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTTTTGCGTGTCGTGACTAACCTCT 1226 | ATACACGGAACGTGACACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCG 1227 | 1228 | 1229 | 1230 | 1231 | 1232 | 1233 | 1234 | 1235 | 1236 | >seq54 1237 | 1238 | 1239 | 1240 | 1241 | CGACC 1242 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1243 | ATGGTCTCGAACGACATATAAGGGACAATCGAAAAGG 1244 | GAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGAACGGA 1245 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTTTTGCGTGTCGTGACTAACCTCT 1246 | ATACACGGAACGTGGCACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCGGTGGCC 1247 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1248 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGGAACATTCCGATATCTGC 1249 | AATTGGCAATAGACTGTAATAGCCGCGGCAGAGTTGCAAATTCTTG 1250 | GTCTCGTCAAGGCGGT 1251 | 1252 | 1253 | 1254 | 1255 | 1256 | >seq26 1257 | ATGGCCCGCATTGTTGCCTACGAAGCACACGCATGGTCAACCGAGGTTACAATGGTG 1258 | CTGCCGTCGCTGTGGCCGGATAACAGCTTATTGGCCCGCTATGTGTCTCTGCCTAAGGAA 1259 | CAGAAACTGATGGCGATGGGTTGGTGCAGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1260 | CCTTAGGATGTATTAAAACCCCTCACTCAAACCTGAGCGAATCTACGAAATGAGGCAT 1261 | ACATCCGAAACCTTATACACGCCAAGTGTAGTCCTAACGATCTAGGTATTAAGAACGACC 1262 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1263 | ATGGTCTCGAACGACATATAAGGGACAAACGAAAAGG 1264 | GGCAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGGACGGA 1265 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTGTTGCGTGTCGTGACTAACCTCT 1266 | ATACACGGAACGTGGCACCCCGCGACGATTTCTGATCATCGGCAGCCACGTCGGTGGCC 1267 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1268 | AGTGTGGCATTAGTCCCTAGAGGCCGTATATTTGACGAATGAAACAATCCGATATCTGC 1269 | AATTGGCAATAGACTGTAATAGCCGCGGCAGAGTTGCAAATTCTTT 1270 | GTCTCGTCAAGGCGGTCGACGTAACTAGCTCAGTCAAAGCTAGGATTGCTCCCGTTTC 1271 | ATGTGATGGCTGAAATCTGGGTTTGGAAAGCGGAAGTACATGTATCATCAAGAAGCG 1272 | GTCAGGGAACGGCCATAGCGACCTTGAGAACGACGATTTGCGTCCTACCCAACATA 1273 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1274 | 
AAAATCTGTGGCGAACCATGTATAAATCTACGCCGGCTCCCCTGCTAACTCTTTA 1275 | GAGCGAACGC 1276 | >seq47 1277 | 1278 | 1279 | 1280 | 1281 | 1282 | 1283 | AACGAAAAGC 1284 | GGCAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGAACGGA 1285 | AACCATCAGAGGTTACTCCTTATGGGTCTACCATTTTATTGCGTGTCGTGACTAACCTCT 1286 | ATACACGGAACATGGCACCCCGCGACGATTTCTGATCATCGGCAGCCACGTCGGTGGCC 1287 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1288 | AGTGTGGCATTAGTCCCTAGAGGCCGCATAATT 1289 | 1290 | 1291 | 1292 | 1293 | 1294 | 1295 | 1296 | >seq88 1297 | ATGGCCCGCATTGTTGCCTACGAAGCACACGCATGGTCAACAGCGAGGCTACAATGGTG 1298 | CTGCCGTCGCTGTGGCCGGGTAACAGCTTATTGGCCCGCTATGTGTCTCTGCCTAAGGAA 1299 | CAGAAACTGATGGCGATGGGTTGGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1300 | CCTTAGGGTGTATTAAAACCCCTCACTCAAACCTGAGCGAGTCTACGAAATGAGGCAT 1301 | GCATCCGAAACCTTATGCACGCCAAGTGTAGTCCTAACGATCTAGGTATTAAGAACTACC 1302 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1303 | ATGGCCTCGAACGACATATAAGTGACAAACGAAAAGG 1304 | GAGGATCTACGAAACGGACAACGCACTCGACGCAATCGGAAATTCAAGTGGGAACGAA 1305 | AACCATAAGAGGTTACCCCTTATGGGTCTACCATTCTTTTGCGTGTCGTGACCAACCTTT 1306 | ATACGCGGAACGTGGCACCCCGCGACGATCTCTGTTCATCGGCAGCCACGTCGGTGGCC 1307 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1308 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1309 | AATTGGCAATTGACTGTGATAGCCGCGGCAGAGTTGCAAATTCTTG 1310 | GTCTCGTCAAGGCGGTCGACGTAATTAGCTCAATCAAAGCTAGGATTGCTCCCGTTTC 1311 | ATGTGATGGCTGAAATCTGGGTTCGGAAAGCGGAAGTACATGTATCATCAAGAAGCG 1312 | GTCGGGGAACGGCCATAGCGACCTCGAGAACGACGATTTGCGTCCTACCCAACATA 1313 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1314 | AAAATCTGTGGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1315 | GAGCGAACGC 1316 | >seq89 1317 | 1318 | 1319 | 1320 | 1321 | 1322 | 1323 | 1324 | 1325 | TTTTGCGAGTCGTGACTAACCTCT 1326 | ATACGCGGAACGTGGCACCCCGCGACGATCTCTGATCATCGGCAGCCACGTCGGTGGCC 1327 | TTTTGCAGAGAAGCCAGGGTTACTTCAAGTGTATGGGCCAAGCGCTTCCTTTAGGA 1328 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1329 | 
AATTGGCAATTGACTGTAATAGCCGCGGCAGAGTTGCAAATTCTTG 1330 | 1331 | 1332 | 1333 | 1334 | 1335 | 1336 | >seq75 1337 | 1338 | 1339 | 1340 | 1341 | 1342 | 1343 | 1344 | CAAACTCAAGTGGGAACGGA 1345 | AACCATAAGAGGTTACTCCCAGAGTCTTCAATTCTTCAGCGGGTCGTGACTAAGCTCT 1346 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1347 | TTTTGCAAAGAAGCCAGGGTTACTTCAATTGTATGGGCCAAGCGCTTCCTCTAGGA 1348 | AGTGTTGCATTAGTCCCTAGAGGCCGCATACCTGACGAATGAAACAATCCGATATCTGC 1349 | AATTGGCAATTGACTGTGATAGCCGCGGCAGAGTTGCAAAATCTTG 1350 | 1351 | 1352 | 1353 | 1354 | 1355 | 1356 | >seq14 1357 | ATGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 1358 | CTGCCGTCGCTGGCGCCGGATAACAGCTTATTGGCCCGCTATGGGTCGCTGCCTGAGGAA 1359 | CAGAAAATGATGGCTATGGGTTGGTGCGGCCGGTTGATGCAGGTCCGGCAATTCGTCACA 1360 | CCTTAGGGTGTATTCAAACCCCTCATTCAAACCTGAGCGAGTCTACGAAATGAGGCAT 1361 | GCATCCGAAACCTTATGCACGCCAAGTGCAGTCCAAACGATCTAGGTGTTAAGAACTATC 1362 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1363 | ATGGTCTCGCACGACACATAAGTGACAAACGAAAAGG 1364 | GAGGATCTACGAGACGTACAACGCACTCGACGCAAGCGCAAATTCAAGCGGGAACGGA 1365 | AACCATAAGAGGTTACTCCCAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1366 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1367 | TTTTGCAAAGAAGCCAGGGTTACTTCAATTGTATGGGCCAAGCGCTTCCTTTAGGA 1368 | AGTGTGGAATTAGTCCCTAGAGGCCGAATACTTGACGAATGAAACAATCCGATATCTGC 1369 | AATTGGCAATTGACTGTGATAGCCGCGGCAGAGTTGCAAAATCTTG 1370 | GTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTACAATTGCTCCCGTTTC 1371 | ATGGGATGGCTGAAATCTGGGTTCGAAAAGCGGAAGTTCACGTATCATCAAGAAGCG 1372 | GACGGGGAACGGCCATACCGACCTTGAGAACGACGATTTGCGTCCTACCCAACATA 1373 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1374 | AAAATTTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1375 | GAGCGAACGC 1376 | >seq15 1377 | ATGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 1378 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGTGTCGCTGCCTGAGGAA 1379 | CAGAAAATGATGGCTATGGGTTGGTGCGGCCGGTTGATGCAGGTCCGGCAATTCGTCACA 1380 | CCTTAGAGTGTATTCAAACCCCTCACTCAAACCTGAGCGAGTCTACGAAATGAGGCAT 1381 | 
GCATCCGAAACCTTATGCACGCCAAGTGCAGTCCAAACGATCTAGGTGTTAAGAACTATC 1382 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1383 | ATGGTCTCGCACGACACATAAGTGACAAACGAAAAGG 1384 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGCAAATTCAAGCGGGAACGGA 1385 | AACCATAAGAGGTTCCTCCCAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1386 | ATACGCGCAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1387 | TTTTGCAAAGAAGCCAGGGTTACTTCAATTGTATGGGCCAAGCGCTTCCTTTAGGA 1388 | AGTGTGGAATTAGTCCCTAGAGGCCGCATACTTGACGAATGAAACAATCCGATATCTGC 1389 | AATTGGCAATTGACTGTGATAGCCGCGGCAGAGTTGCAAAATCTTG 1390 | GTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTACGATTGCTCCCGTTTC 1391 | ATGTGATGGCTGAAATCTGGGTTCGAAAAGCGGAAGTTCACGTATCATCAAGAAGCG 1392 | GACGGGGAACGGCCATACCGACCTTGAGAACGACGATTTGCGTCCTACCCAACATA 1393 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1394 | AAAATTTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1395 | GAGCGAACGC 1396 | >seq1 1397 | ATGGCCCGCATTGCTGCCTACGAAGCTCAAACATGGTCAACAGCCGGGTTAGAATGGCG 1398 | CTGACGTCGCTGTCGCCGGATAACAGCTTATTGGCCCACTATGCGTCGCTACCTCAGGAA 1399 | CAGAAACTGATGGCAATGGGTTGGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1400 | CCTTAGGGTGTATTGAAACCCCCCACTCAAACCTGAGCGACTCTACGAAATGAGGCAT 1401 | GCATCTGAAACCTTATGCACGCCAAGAGTAGTCCAAACGATCTAGGTGTTAAGAACTACG 1402 | AAATTTCATCACGTATACAATAAAGGCAGGGAAGTATTGATTTTGAAGA 1403 | ATGGTCTGGAACTACATATAAGTGACAAACTAAAAGC 1404 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACGGA 1405 | AGCCATAAGAGGTTATTCCCAATGAGTTTATGACTATTCTGCGGGTCGTGACTAA 1406 | ATACGCGGAACGTGACACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1407 | TTTTGCAAAGAAGCCAGGCTTACGTCAATTGTTTGTGCGAAGCGCTTCCTGTAGGA 1408 | AGTGCAGGATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAATAATCCGATATCTGC 1409 | AATTAGCAATTGACTGTAATAGCCGCTGTCGAGTTGCAAAATCTTG 1410 | GTCTCGTCCAGGCGGTCAACGTAAATAGCTCAGTCAGAGCTTCGATTGCTCCCGTTTC 1411 | ATGTGATGGCTTAAATCTGGGTTCGAAAAGCGGAAGTACATGTATCTTCAAGAGGCG 1412 | GACGAGGAACGGCTGTACCGACCTCGAGAACGACGATTCGCGTCCTACCCCACATA 1413 | AAATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCCG 1414 | 
GAAATCTGTCCCGAACCATGTATAAATCTACGCGGGTTCGTCTGCTAACTCTTTA 1415 | GAGCGGACGC 1416 | >seq41 1417 | 1418 | 1419 | 1420 | 1421 | 1422 | 1423 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1424 | GAGGATCTACGAAACGTACAGCGCACTCGACGCAAACAGAAATTCAAGTGGGAACGGA 1425 | AGCCATAAGAGGTTATTCTCAATGAGTCTATGATTATTCTGCGGGTCGTGACTAAGCTCT 1426 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCAGCAGCCACGTCAGTGGCC 1427 | TTTTGCAAAGAAGCCAGGCTTACGTCAATTGTTTGTGCGAGGCGCTTCCTGCAGGG 1428 | AGTGTAGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1429 | AATTAGCAATTGACTGTAATAGC 1430 | 1431 | 1432 | 1433 | 1434 | 1435 | 1436 | >seq98 1437 | 1438 | 1439 | 1440 | 1441 | 1442 | 1443 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1444 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACGGA 1445 | AGCCATAAGAGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGTGACTAAGCTCT 1446 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1447 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTCCTGTAGGA 1448 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTG 1449 | 1450 | 1451 | 1452 | 1453 | 1454 | 1455 | 1456 | >seq25 1457 | 1458 | 1459 | 1460 | 1461 | GAACTTCG 1462 | AAATTTCATCACGTATACAATAAAGGCAGGGAAGTATTGATTTTGAAGA 1463 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1464 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACTGA 1465 | AGCCATAAGGGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGTGACTAAGCTCT 1466 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1467 | TT 1468 | 1469 | 1470 | 1471 | 1472 | 1473 | 1474 | 1475 | 1476 | >seq77 1477 | ATGGCCCGCATTGTTGCCTACGAAGCTCACACATGGTCAACAGCCGTGTTACCATGGCG 1478 | CTGCCTTCGCTGTCGCCGGATAACAGCTTATTGGCCCACTATGCGTCGCTGGCTAAGGAA 1479 | CAGAAACTGATGGCGATGGGTTGGTGCGGCTGGTTGATGTAGGTCCGGCAATTCGTTACA 1480 | CCTTAGGGTATATTAAGACCCCTCACTCAAACCTGAGCGAGTCTACGAAATGAGGCAT 1481 | GCATGCGAAACCTTATGCACGCCAAGTGTAGTCCAAATGATCTAGGTGTTAAGAACTTCG 1482 | AAATTTCATCACGTATACAATAAAGGCAGGGAAGTATTGATTTTGAAGA 1483 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1484 | 
GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACTGA 1485 | AGCCATAAGGGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGTGACTAAGCTCT 1486 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1487 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTCCTGTAGGA 1488 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1489 | AATTGGCAATTGACTGTAATAGCCGCTGCCGTGTTGCAAAATCTTG 1490 | GTCTGGTCCAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTTCGATTGCTCCCATTTC 1491 | ATGTGATGGCTTAAATCTGGGTTCGAAAAGCGGTAATACATGTATCATCAAGAAGCG 1492 | GATGGGGAACGGCCATACCGACCTCGAGAACGACGATTTGCGTCCTACCCAACATA 1493 | AAATTTTTTGGGGATCGAGTGAGGGACAGGCTTGAGATCTA 1494 | AAAATCTGTCGCGAACCATGTATAAATCTACACGGGCACACCTGCTAACTCTTTA 1495 | GAGCGAACGC 1496 | >seq72 1497 | 1498 | 1499 | 1500 | 1501 | 1502 | 1503 | 1504 | 1505 | CGACTAAGCTCT 1506 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1507 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTCCTGTAGGA 1508 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1509 | AATTGGCAAT 1510 | 1511 | 1512 | 1513 | 1514 | 1515 | 1516 | >seq78 1517 | ATGGCCCGCATTGTTGCCTACGAAGCTCACACATGGTCAACAGCCGGGTTACAATGGCG 1518 | CTGCCTTCGCTGTCGCCGGATAACAGCTTATTGGCCCACTATGCGTCGCTGGCTAAGGAA 1519 | CAGAAACTGATGGCGATGGGTTGGTGCGGCTGGTTGATGTAGGTCCGGCAATTCGTCACA 1520 | CCTTAGGGTGTATTAAGACCCCTCACTCAAACCTGGGCGAGTCTACGAAATGAGGCAT 1521 | GCATCCGAAACCTTATGCACGCCAAGTGTAGTCCAAACGATCTTGGTGTTAAGAACTTCG 1522 | AAATTTCATCACGTATACAATAAAGGCAGGGAAGTATTGATTTTGAAGA 1523 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1524 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACTGA 1525 | AGCCATAAGGGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGTGACTAAGCTCT 1526 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTAGCC 1527 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTTCTGTAGGA 1528 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1529 | AATTGACAATTGACTGTAATAGCCGCCGCCGTGTTGCAAAATCTTG 1530 | GTCTCGTCCAGGCGGTCGACGTAAATAGCTCACTCAAAGCTTCGATTGCTCCCATTTC 1531 | 
ATGTGATGGCTTAAATCTGGGTTCGAAAAGCGGTAATACATGTATCATCAAGAAGCG 1532 | GATGGGGAACGGCCATACCGACCTCGAGAACGACGATTTGCGTCCTACCCAACATA 1533 | AAATTTTTCGGGGATCGAGTGAGGGACTGGCTTGAGATCTA 1534 | AAAATCTGTCGCGAACCATGTATAAATCTACACGGGCACACCTGCTAACTCTTTA 1535 | GAGCGAACGC 1536 | >seq8 1537 | ATGGCCCGCATTGTTGCCTACGAAGCTCACACATGGTCAACAGCCGGGTTACAATGGCG 1538 | CTGCCTTCGCTGTCGCCGGATAACAGCTTATTGGCCCACTATGCGTCGCTGGCTAAGGAA 1539 | CAGAAACTGATGGCGATGGGTTGGTGCGGCTGGTTGATGTAGGTCCGGCAATTTGTCACA 1540 | CCTTAGGGTGTATTAAGACCCCTCACTCAAACCTGAGCGAGTATACGAAATGAGGCAT 1541 | GCATCCGAAACCTTATGCACGCCAAGTGTAGTCCAAACGATCTTGGTGTTAAGAACTTCG 1542 | AAATTTCATCACGTATACAATAAAGGCAGGGAAGTATTGATTTTGTAGA 1543 | ATGGTCTGGAACGACATATAAATGACAAACTAAAAGC 1544 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACTGA 1545 | AGCCATAAGGGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGTGACTAAGCTCT 1546 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTAGCC 1547 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTTCTGTAGGA 1548 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1549 | AACTGGCAATTGACTGTAATAGCCGCCGCCGTGTTGCAAAATCTTG 1550 | GTCTCATCCAGGCGGTCGACGTAAATAGCTCACTCAAAGCTTCGATTGCTCCCATATC 1551 | ATGTGATGGCTTAAATCTGGGTTCGAAAAGCGGTAATACATGTATTATCAAGAAGCG 1552 | GATGGGGAACGGCCATACCGACCTCGAGAACGACGATTTGCGTCCTACCCAACATA 1553 | AAATTTTTTGGGGATCGAGTGAGGGACTGGCTTGAGATCTA 1554 | AAAATCTGTCGCGAACCATGTATAAATCTACACGGGCACACCTGCTAACTCTTTA 1555 | GAGCGCACGC 1556 | >seq67 1557 | ATGGCCCGCATTGTTGCCTACGAAGCTCACACATGGTCAACAGCCGGGTTACAATGGCG 1558 | CTGCCTTCGCTGTCGCCGGATAACAGCTTATTGGCCCACTAGGCGTCGCTGGCTAAGGAA 1559 | CAGAAACTGATGGCGATGGGTTGGTGCGGCTGGTTGATGTAGGTCCGGCAATTCGTCACA 1560 | CCTTAGGGTGTATTAAGACCCCTCACTCAAACCTGAGCGAGTCTACGAAATGAGGCAT 1561 | GCATCCGAAACCTTATGCACGCCAAGTGTAGTCCAAACGATCGTGGTGTTAAGAACTTCG 1562 | AAATTTCATCACGTATACAATAAAGGGAGGGAAGTATTGATTTTGTAGA 1563 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1564 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACTGA 1565 | 
AGCCATAAGGGGTTATTCCCAATGCGTCTACGACTCTTCTGCGGGTCGAGACTAAGCTCT 1566 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCGTCGGCAGCCACGTCAGTAGCC 1567 | TTTTACGAAGAAGTCAGGCTTACGTCGATTGTGTGTGCGAAGCGCTTTCTGTAGGA 1568 | AGTGTAGCAGTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATCCGATATCTGC 1569 | AATTGGCAATTGACTGTAATAGCCGCCGCCGTGTTGCAAAATCTTG 1570 | GTCTCGTCCAGGCGGTCGACGTAAATAGCTCACTCAAAGCTTCGATTGCTCCCATTTC 1571 | ATGTGATGGCTTAAATCTGGGTTCGAAAAGCGGTAATACATGTATTACCAAGAAGCG 1572 | GATGGGGAACGGCCATACCGACCTTGAGAACGACGATTTGCGTCCTACCCAACATA 1573 | AAATTTTTTGGGGATCGAGTGAGGGACTGGCTTGAGATCTA 1574 | AAAATCTGTCGCGAACCATGTATAAATCTACACGGGCACACCTGCTAACTCTTTA 1575 | GAGCGAACGC 1576 | >seq24 1577 | 1578 | 1579 | 1580 | 1581 | 1582 | TAAAGGCAGGGAAGTATTGATTTTGAAGA 1583 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1584 | GAGGATCTACGAAACGTACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACGGA 1585 | AGCCATAAGAGGGTATTCCCAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1586 | ATACGCGGAACGTGGC 1587 | 1588 | 1589 | 1590 | 1591 | 1592 | 1593 | 1594 | 1595 | 1596 | >seq73 1597 | 1598 | 1599 | 1600 | 1601 | 1602 | TGATTTTGAAGA 1603 | ATGGTCTGGAACGACATATAAGTGACAAACTAAAAGC 1604 | GAGGATCTACGAAACATACAACGCACTCGACGCAAACGGAAATTCAAGTGGGAACGGA 1605 | AGCCATAAGAGGTTATTCCCAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCC 1606 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1607 | TTTCGCAAAGAAGCCAGGCTTACGTCGATTGTTTGTGCGAAGCGCTTCCTGTAGGA 1608 | AGTGAAGCATTAGTCCCTAGAGGCCGCATATTTGACGAA 1609 | 1610 | 1611 | 1612 | 1613 | 1614 | 1615 | 1616 | >seq87 1617 | 1618 | 1619 | 1620 | 1621 | 1622 | AAATGCAGGGTAGTATTCATAGTGAAGA 1623 | AAGGTCTCGAACGACATATATGTGACAAACGAAAAGT 1624 | GAGGATCTACGGAACGTACAACGCACTGGACGCAAACGTAGATTCAAGTGGTAACGAA 1625 | AACCATATGAGGTTACTCCTAATGAGACTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1626 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1627 | TTTTGCAAAGCAGCTAGGGTTACGTCAATTTTATGTGCAAAGCGC 1628 | 1629 | 1630 | 1631 | 1632 | 1633 | 1634 | 1635 | 1636 | >seq49 1637 | 1638 | 1639 | 1640 | 1641 | 1642 | ATAAATGCAGGGAAGTATTCATAGTGAAGA 1643 | 
AAGGTCTCGAACGACATATAAGTGACAAACGAAAAGT 1644 | GAGGAACTACGGAACGTACAACGCACTGGACGCAAACGTAGATTCAAGTGGTAACGAA 1645 | AACCATAAGAGGTTACTCCTAATGAGACTACGATTCTTCTGCGGGTCGTGTCTAAGCTCT 1646 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1647 | TTTTGCAAAGAAGCTAGGGTTACGTCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1648 | AGTGTGGCATTACTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTC 1649 | 1650 | 1651 | 1652 | 1653 | 1654 | 1655 | 1656 | >seq57 1657 | ATGGCCCGCATTGTTGTCTACGAAGCACACACATGGTCAACAGCGGGGTTAAAATGGCG 1658 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 1659 | CAGAAACTGATGGCGATGGGTTGGTGCGGCCGGTTAATGTAGGTCCGGCAATTCGTCACA 1660 | CCTTGGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGATGCAT 1661 | GCCTCCGAAACCTTATGCACGACTAGTGTAGTCCAAACGATCCAGGTGTTGAGAACCACC 1662 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATAGTGAAGA 1663 | AAGGTCTCGAACGACATATAAGTGACAAACGAAAAGT 1664 | GAGGAACTACGGAACGTACTACGCACTGGACGCAAACGTAGATTCAAGTGGTAACGAA 1665 | AACCACAAGAGGTTACTCCTAATGAGACTACGATTCTTCTGCGGCTCGTGACTAAGCTCT 1666 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1667 | TTTTGCAAAGAAGCTAGGGTTACGTCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1668 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1669 | AATGGGCAATTGACTGTAATAACCGCGGCAGAGTTGCAAAATCTTG 1670 | GTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTGCGATTGCTCCTGTTTC 1671 | ATATGATGGCTGAAATCTGGGTTCGAAAAACGTAAGTACATGATATCATCAAGAAGAG 1672 | GAGGGGGAACGGCCATACCGACCTTGAGAACGACGATTTGAGTCCTACCCAACATA 1673 | ATAGTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1674 | AAAATCCGTCGCGAACCATGTGTAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1675 | GAGCGAACAC 1676 | >seq58 1677 | ATGGCCCGCATTGTTGTCTACGAAGCACACACATGGTCAACAGCGGGATTAAAATGGCG 1678 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 1679 | CAGAATCTGTTGGCGATGGGTTGGTGCGGCCGGTTGTTGTAGGTCCGGCAATTCGTCACA 1680 | CCTTAGGGTGTATTAAAACCCCTCATTCCGACCTGAGCGAGTCTACACAATGATGCAT 1681 | GCCTCCGAAACTTTATGCACGCCTAGTGTAGTCCAAACGATCCAGGTGTTGAGAACCACC 1682 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATAGTGAAGA 1683 | 
AAGGTCTCGAACGACATATAAGTGACAAACGAAAAGT 1684 | GAGCAACTACGGAACGTACAACGCACTGGACGCAAACGTAGGTTCAAGTGGTAACGAA 1685 | AACCATAAGAGGTTACTCCTAATGAGACTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1686 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1687 | TTTTGCAAAGAAGCTAGGGTTACGTCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1688 | AGTGTAGCATTAGTCCCTAGAGGCCGCATATTTGACGAATAAAACAATTCGATATCTGC 1689 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCAAAATCTTG 1690 | GTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTGCGATTGCTCCCGTTTC 1691 | ATATGATGACTGAAATCTGGGTTCGAAAAGCGTAAGTACATGATATCATCAAGAAGAG 1692 | GAGGGGGAACGGCCATACCGACCTTGAGAACGACGATTTGAGTCCTACCCAACATA 1693 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1694 | AAAATCCGTCGCGAACCATGTATAAATCTACGACGGCTCACCTGCTAACTCTTTA 1695 | GAGCGAACAC 1696 | >seq59 1697 | 1698 | 1699 | 1700 | 1701 | 1702 | 1703 | AAGGTCTCGAACGACATATAAGTGACAAACGAAAAGT 1704 | GAGGAACTACGGAACGTACAACGCACTGGACGCAAACGCAGGTTCAAGTGGTAACGAA 1705 | AACCATAAGAGGTTACTCCTAATGAGACTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1706 | ATACGCGGAACGTGGCACCCCGCTACGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1707 | TTTTGCAAAGAAGCTAGGGTTACGTCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1708 | AGTGTAGCATTAGTCCCTAGAGGCCGCATATTTGACGAATAAAACAATTCGATATCTGC 1709 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCAAA 1710 | 1711 | 1712 | 1713 | 1714 | 1715 | 1716 | >seq22 1717 | 1718 | 1719 | 1720 | 1721 | 1722 | 1723 | 1724 | A 1725 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCGGCGGGTCGTGACTAAGCTCT 1726 | ATACGGGGAGCGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTTGCC 1727 | TTTTGCAAAGGAGTTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1728 | AGTGGGGCATTAGTCCCTAGAAGC 1729 | 1730 | 1731 | 1732 | 1733 | 1734 | 1735 | 1736 | >seq65 1737 | AGGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 1738 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGTTGCCTTAGGAA 1739 | CAAAAACTGATGGCGATGGGTTGGTGCGGCCGGTTGATGTAGGTCCAGTAATTCGTCACA 1740 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCAT 1741 | GCATCCGAAACCTTATGCACGCCTAGTGTAGTCCAAACGATCTAGGGGTTGAGAACCACC 1742 | 
AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1743 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 1744 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1745 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1746 | ATACGGGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTTGCC 1747 | TTTTGCAAAGGGGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1748 | AGTGGGGCATTAGTCCCTAGAAACCGCATATTTGACGAATGAAACAATTCGATATCTGC 1749 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCTAAATCTTG 1750 | CTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 1751 | ATATGATGGCTGAAGTCTGGGTTCGAAAAGCGTAAGTACATGATATCATCAAGAAGAG 1752 | GAGGGGGAACGGCCATATCGACCTTGAGAACGACGATCTGCGTCCTACCCAGACATA 1753 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1754 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1755 | GAGCGAACAC 1756 | >seq20 1757 | ATGGCCCGCATTGTTGCCTACGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 1758 | CTGCCGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 1759 | CAGAAACTGATGGCGATGGGATGGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTTACA 1760 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCCT 1761 | GCATCCGAAACCTTATGCACGCCTAGTGTAGTCCAAACGATCTAGGGGTTGAGAACCACC 1762 | AAATTTCATCACGTATACAATAAATGCAGGAAAGTATTCATTGTGAAGA 1763 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 1764 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1765 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1766 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1767 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1768 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1769 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCTAAATCTTG 1770 | CTCTCGTCAAGGCGGTCGACGTAAATAGTTCAGTCAAAGCTGCGATTGCTTCCGTTTC 1771 | ATATGATGGCTGAAATCTGGGTTCGAAAAGCGTAAGTACATGATATCATCAAGAAGAG 1772 | GAGGGGGAACGGCCCTACCGACCTTGAGAACGACGATCTGCGTCCTACCCAGACATA 1773 | ATATTTATTGACGATCGAGTGAGGGACACGCTTGAGATCTA 1774 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1775 | GAGCGAACAC 1776 | >seq21 1777 | 1778 
| 1779 | 1780 | 1781 | 1782 | 1783 | GTGGGAAACGAAAAGG 1784 | AAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1785 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1786 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1787 | TTTTGCAAAGGAGCTAGGGTTACACCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1788 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1789 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCTAAATCTTG 1790 | CTCTCGTCAAGGCGGTCGA 1791 | 1792 | 1793 | 1794 | 1795 | 1796 | >seq33 1797 | 1798 | 1799 | 1800 | 1801 | 1802 | 1803 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 1804 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1805 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1806 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1807 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1808 | AGTGTGGCATTAGTCCCTAGA 1809 | 1810 | 1811 | 1812 | 1813 | 1814 | 1815 | 1816 | >seq62 1817 | 1818 | 1819 | 1820 | 1821 | 1822 | 1823 | 1824 | 1825 | TCGTGACTAAGCTCT 1826 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1827 | TTTTGCAGAGGAGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1828 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1829 | AATTGGCAATTGACTGTAATAACCGCGGC 1830 | 1831 | 1832 | 1833 | 1834 | 1835 | 1836 | >seq60 1837 | ATGGCCCGTGTTGTTGCCTAGAAGCACACACATGGTCAACAGCGGGGTGACAATGGCG 1838 | CTGCTGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 1839 | CAGAAACTGATGGCGATGGGATCGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1840 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCCGAGCAAGTCTACGCAATGAGGCAT 1841 | GCATGCGAAAACTTATGCACGCCTAGTGTAGTCCAAACGAGCTAGGGGTTGAGAACCACC 1842 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1843 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 1844 | GAGGATCTACGGAACGTACGACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1845 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1846 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1847 | 
TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1848 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1849 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCCAAATCTTG 1850 | CTCTCGTCAAGGCGGTCGACGTAAATAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 1851 | ATATGATGGCTGAAATCTGGGCTCGAAAAGCGTAAGTACATGATATCATCAAGAAGAG 1852 | GAGGGGGAACGGCCATACCGACCTTGAGAACGTCGATCTGCGTCCTACCCAGACATA 1853 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTA 1854 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1855 | GAGCGAACAC 1856 | >seq23 1857 | ATGGCCCGCATTGTTGCCTAGAAGCACACACATGGTCAACAGCGGGGTTACAATGGCG 1858 | CTGCTGTCGCTGTCGCCGGATAACAGCTTATTGGCCCGCTATGCGTCGCTGCCTTAGGAA 1859 | CAGAAACTGATGGCGATGGGATAGTGCGGCCGGTTGATGTAGGTCCGGCAATTCGTCACA 1860 | CCTTAGGGTGTATTAAAACCCCTCATTCAGACCTGAGCGAGTCTACGCAATGAGGCAT 1861 | GCATCCGAAACCTTATGCGCGCCTAGTGTAGTCCAAACGATCTAGGGGTTGAGAACCACC 1862 | AAATTTCATCACGTATACAATAAATGCAGGGAAGTATTCATTGTGAAGA 1863 | AAGGTCTCGAACGACATATAAGTGGGAAACGAAAAGG 1864 | GAGGATCTACGGAACGTACAACGCACTGGATGCAAACGTAGATTCAAGTGGTAACGAA 1865 | AACCATAAGAGGTTACTCCTAATGAGTCTACGATTCTTCTGCGGGTCGTGACTAAGCTCT 1866 | ATACGCGGAACGTGGCACCCCGCAATGATCTCTAATCATCGGCAGCCACGTCAGTGGCC 1867 | TTTTGCAAAGGAGCTAGGGTTACGCCAATTTTATGTGCAAAGCGCTTCCTTTAGGA 1868 | AGTGTGGCATTAGTCCCTAGAGGCCGCATATTTGACGAATGAAACAATTCGATATCTGC 1869 | AATTGGCAATTGACTGTAATAACCGCGGCAGAGTTGCTAAATCTTG 1870 | CTCTCGTCAAGGCTGTCGACGTAAATAGCTCAGTCAAAGCTGCGATTGCTTCCGTTTC 1871 | ATATGATGGCTGAAATCTGGGTTCGAAAAGCGTAAGTACATGATATCATCAAGAAGAG 1872 | GAGGGGGAACGGCCATACCGACCTTGAGAACGACGATCTGCGTCCTACCCAGACATA 1873 | ATATTTATTGGCGATCGAGTGAGGGACACGCTTGAGATCTG 1874 | AAAATCTGTCGCGAACCATGTATAAATCTACGCCGGCTCACCTGCTAACTCTTTA 1875 | GAGCGAACAC 1876 | --------------------------------------------------------------------------------