├── .gitignore ├── README.rst ├── abifpy.py ├── demo.py ├── setup.py └── tests ├── 310.ab1 ├── 3100.ab1 ├── 3730.ab1 ├── empty.ab1 ├── fake.ab1 ├── test_abifpy.py └── test_py3_abifpy.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[co] 2 | 3 | # Packages 4 | *.egg 5 | *.egg-info 6 | dist 7 | build 8 | eggs 9 | parts 10 | bin 11 | develop-eggs 12 | .installed.cfg 13 | 14 | # Installer logs 15 | pip-log.txt 16 | 17 | # Unit test / coverage reports 18 | .coverage 19 | .tox 20 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | ====================== 2 | :warning: unmaintained 3 | ====================== 4 | 5 | A modified version of this module has been merged into the `Biopython 6 | project`_, available from version 1.58 onwards. If you already have Biopython 7 | version >=1.58, there is no need to use abifpy. Despite that, I am keeping 8 | the module available as a stand-alone for personal reasons :). 9 | 10 | 11 | ====== 12 | ABIFPY 13 | ====== 14 | 15 | ----------------------------------------------------------- 16 | Python module for reading ABI Sanger sequencing trace files 17 | ----------------------------------------------------------- 18 | 19 | abifpy is a Python module that extracts sequence and various other data from 20 | Applied Biosystems, Inc. (ABI) format files. The module is Python 3-compatible 21 | and was written based on the `official spec`_ released by Applied Biosystems. 22 | 23 | abifpy provides the following items: 24 | 25 | *class* Trace(in_file) 26 | Class representing the trace file ``in_file``. 27 | 28 | Trace object attributes and methods 29 | =================================== 30 | 31 | seq 32 | String of base-called nucleotide sequence stored in the file. 
33 | 34 | qual 35 | String of phred quality characters of the base-called sequence. 36 | 37 | qual_val 38 | List of phred quality values of the base-called sequence. 39 | 40 | id 41 | String of the sequence file name. 42 | 43 | name 44 | String of the sample name entered prior to sequencing. 45 | 46 | trim(sequence[, cutoff=0.05]) 47 | Returns a trimmed sequence using Richard Mott's algorithm (used in phred) 48 | with a default probability cutoff of 0.05. Can be used on ``seq``, ``qual``, and 49 | ``qual_val``. 50 | 51 | get_data(key) 52 | Returns metadata stored in the file, accepts keys from ``tags`` (see below). 53 | 54 | export([out_file="", fmt='fasta']) 55 | Writes a fasta (``fmt='fasta'``), qual (``fmt='qual'``), or 56 | fastq (``fmt='fastq'``) file from the trace file. Default format is ``fasta``. 57 | 58 | close() 59 | Closes the Trace file object. 60 | 61 | seq_remove_ambig(seq) 62 | Replaces extra ambiguous base characters (K, Y, W, M, R, S) with 'N'. Accepts ``seq`` 63 | for input. 64 | 65 | EXTRACT 66 | Dictionary for determining which metadata are extracted. 67 | 68 | data 69 | Dictionary that contains the file metadata. The keys are values of ``EXTRACT``. 70 | 71 | tags 72 | Dictionary of tags whose values are data directory class instances. Keys are the tag name and 73 | tag number, concatenated. Use ``get_data()`` to access values in each ``tags`` entry. 74 | 75 | Usage 76 | ===== 77 | 78 | :: 79 | 80 | $ python 81 | >>> from abifpy import Trace 82 | >>> yummy = Trace('tests/3730.ab1') 83 | 84 | Or if you want to perform base trimming directly:: 85 | 86 | >>> yummy = Trace('tests/3730.ab1', trimming=True) 87 | 88 | Sequence can be accessed with the ``seq`` attribute. Other attributes of note 89 | are ``qual`` for phred quality characters, ``qual_val`` for phred quality values, 90 | ``id`` for sequencing trace file name, and ``name`` for the sample name:: 91 | 92 | >>> yummy.seq 93 | 'GGGCGAGCKYYAYATTTTGGCAAGAATTGAGCTCT... 
94 | >>> yummy.qual 95 | '5$%%%\'%%!!!\'!+5;726@>A=3824DESHSS... 96 | >>> yummy.qual_val 97 | [20, 3, 4, 4, 4, 6, 4, 4, 0, 0, 0, 6, 0, 10, 20, 26, 22, 17, 21... 98 | >>> yummy.id 99 | '3730' 100 | >>> yummy.name 101 | '226032_C-ME-18_pCAGseqF' 102 | 103 | If trimming was not performed when instantiating, you can still do it afterwards:: 104 | 105 | >>> yummy.trim(yummy.seq) 106 | 107 | The quality values themselves can be trimmed as well:: 108 | 109 | >>> yummy.trim(yummy.qual) 110 | 111 | Viewing the trace file metadata is easy. Use the values from ``EXTRACT`` 112 | as the keys in ``data``:: 113 | 114 | >>> yummy.data['well'] 115 | 'B9' 116 | >>> yummy.data['model'] 117 | '3730' 118 | >>> yummy.data['run start date'] 119 | datetime.date(2009, 12, 12) 120 | 121 | Metadata not contained in ``data`` can be viewed using ``get_data()`` 122 | with one of the keys in ``tags`` as the argument, e.g.:: 123 | 124 | >>> yummy.get_data('PTYP1') 125 | '96-well' 126 | 127 | For more info on the meaning of these tags and the file metadata, consult the `official spec`_. 128 | 129 | Installation 130 | ============ 131 | 132 | * ``pip install abifpy``, or 133 | 134 | * Add the abifpy directory to your ``$PYTHONPATH`` (in ``.bashrc`` to make it persistent) 135 | 136 | License 137 | ======= 138 | 139 | abifpy is licensed under the MIT License. 
140 | 141 | Copyright (c) 2011 by Wibowo Arindrarto 142 | 143 | Permission is hereby granted, free of charge, to any person obtaining a copy of 144 | this software and associated documentation files (the "Software"), to deal in 145 | the Software without restriction, including without limitation the rights to 146 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 147 | the Software, and to permit persons to whom the Software is furnished to do so, 148 | subject to the following conditions: 149 | 150 | The above copyright notice and this permission notice shall be included in all 151 | copies or substantial portions of the Software. 152 | 153 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 154 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 155 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 156 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 157 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 158 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 159 | 160 | .. _official spec: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf 161 | .. 
_Biopython project: http://biopython.org/wiki/Biopython 162 | -------------------------------------------------------------------------------- /abifpy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # abifpy.py 4 | # python module for reading abi trace files 5 | # http://github.com/bow/abifpy 6 | 7 | """Python module for reading .ab1 trace files.""" 8 | 9 | import datetime 10 | import struct 11 | from os.path import splitext, basename 12 | 13 | from sys import version_info 14 | 15 | RELEASE = False 16 | __version_info__ = ('1', '0', ) 17 | __version__ = '.'.join(__version_info__) 18 | __version__ += '-dev' if not RELEASE else '' 19 | 20 | 21 | __all__ = ['Trace'] 22 | 23 | # dictionary for deciding which values to extract and contain in self.data 24 | EXTRACT = { 25 | 'TUBE1': 'well', 26 | 'DySN1': 'dye', 27 | 'GTyp1': 'polymer', 28 | 'MODL1': 'model', 29 | 'RUND1': 'run start date', 30 | 'RUND2': 'run finish date', 31 | 'RUND3': 'data collection start date', 32 | 'RUND4': 'data collection finish date', 33 | 'RUNT1': 'run start time', 34 | 'RUNT2': 'run finish time', 35 | 'RUNT3': 'data collection start time', 36 | 'RUNT4': 'data collection finish time', 37 | 'DATA1': 'raw1', 38 | 'DATA2': 'raw2', 39 | 'DATA3': 'raw3', 40 | 'DATA4': 'raw4', 41 | 'PLOC2': 'tracepeaks', 42 | 'FWO_1': 'baseorder', 43 | } 44 | 45 | # dictionary for unpacking tag values 46 | _BYTEFMT = { 47 | 1: 'b', # byte 48 | 2: 's', # char 49 | 3: 'H', # word 50 | 4: 'h', # short 51 | 5: 'i', # long 52 | 6: '2i', # rational, legacy unsupported 53 | 7: 'f', # float 54 | 8: 'd', # double 55 | 10: 'h2B', # date 56 | 11: '4B', # time 57 | 12: '2i2b', # thumb 58 | 13: 'B', # bool 59 | 14: '2h', # point, legacy unsupported 60 | 15: '4h', # rect, legacy unsupported 61 | 16: '2i', # vPoint, legacy unsupported 62 | 17: '4i', # vRect, legacy unsupported 63 | 18: 's', # pString 64 | 19: 's', # cString 65 | 20: '2i', # Tag, legacy 
unsupported 66 | } 67 | 68 | # header structure 69 | _HEADFMT = '>4sH4sI2H3I' 70 | 71 | # directory data structure 72 | _DIRFMT = '>4sI2H4I' 73 | 74 | # to handle py3 IO 75 | def py3_get_string(byte): 76 | if version_info[0] < 3: 77 | return byte 78 | else: 79 | return byte.decode() 80 | 81 | def py3_get_byte(string): 82 | if version_info[0] < 3: 83 | return string 84 | else: 85 | return string.encode() 86 | 87 | class Trace(object): 88 | """Class representing trace file.""" 89 | def __init__(self, in_file, trimming=False): 90 | self._handle = open(in_file, 'rb') 91 | try: 92 | self._handle.seek(0) 93 | if not self._handle.read(4) == py3_get_byte('ABIF'): 94 | raise IOError('Input is not a valid trace file') 95 | except IOError: 96 | self._handle = None 97 | raise 98 | else: 99 | # header data structure: 100 | # file type, file, version, tag name, tag number, element type code, 101 | # element size, number of elements, data size, data offset, handle, 102 | # file type, file version 103 | # dictionary for containing file metadata 104 | self.data = {} 105 | # dictionary for containing extracted directory data 106 | self.tags = {} 107 | self.trimming = trimming 108 | # values contained in file header 109 | self._handle.seek(0) 110 | header = struct.unpack(_HEADFMT, 111 | self._handle.read(struct.calcsize(_HEADFMT))) 112 | # file format version 113 | self.version = header[1] 114 | 115 | # build dictionary of data tags and metadata 116 | for entry in self._parse_header(header): 117 | key = entry.tag_name + str(entry.tag_num) 118 | self.tags[key] = entry 119 | # only extract data from tags we care about 120 | if key in EXTRACT: 121 | # e.g. 
self.data['well'] = 'B6' 122 | self.data[EXTRACT[key]] = self.get_data(key) 123 | 124 | self.id = self._get_file_id(in_file) 125 | self.name = self.get_data('SMPL1') 126 | self.seq = self.get_data('PBAS2') 127 | self.qual = ''.join([chr(ord(value) + 33) for value in self.get_data('PCON2')]) 128 | self.qual_val = [ord(value) for value in self.get_data('PCON2')] 129 | 130 | if trimming: 131 | self.seq, self.qual, self.qual_val = map(self.trim, 132 | [self.seq, self.qual, 133 | self.qual_val]) 134 | 135 | def __repr__(self): 136 | """Represents data associated with the file.""" 137 | if len(self.seq) > 10: 138 | seq = "{0}...{1}".format(self.seq[:5], self.seq[-5:]) 139 | qual_val = "[{0}, ..., {1}]".format( 140 | repr(self.qual_val[:5])[1:-1], 141 | repr(self.qual_val[-5:])[1:-1]) 142 | else: 143 | seq = self.seq 144 | qual_val = self.qual_val 145 | 146 | return "{0}({1}, qual_val:{2}, id:{3}, name:{4})".format( 147 | self.__class__.__name__, repr(seq), qual_val, 148 | repr(self.id), repr(self.name)) 149 | 150 | def _parse_header(self, header): 151 | """Generator for directory contents.""" 152 | # header structure: 153 | # file signature, file version, tag name, tag number, 154 | # element type code, element size, number of elements 155 | # data size, data offset, handle 156 | head_elem_size = header[5] 157 | head_elem_num = header[6] 158 | head_offset = header[8] 159 | index = 0 160 | 161 | while index < head_elem_num: 162 | start = head_offset + index * head_elem_size 163 | # added directory offset to tuple 164 | # to handle directories with data size <= 4 bytes 165 | self._handle.seek(start) 166 | dir_entry = struct.unpack(_DIRFMT, 167 | self._handle.read(struct.calcsize(_DIRFMT))) + (start,) 168 | index += 1 169 | yield _TraceDir(dir_entry, self._handle) 170 | 171 | def _get_file_id(self, in_file): 172 | """Returns filename without extension.""" 173 | return splitext(basename(in_file))[0] 174 | 175 | def close(self): 176 | """Closes the Trace file object.""" 177 | 
self._handle.close() 178 | 179 | 180 | def get_data(self, key): 181 | """Returns data stored in a tag.""" 182 | return self.tags[key].tag_data 183 | 184 | def seq_remove_ambig(self, seq): 185 | """Replaces extra ambiguous bases with 'N'.""" 186 | import re 187 | # substitute in the sequence passed as the argument 188 | return re.sub("K|Y|W|M|R|S", 'N', seq) 189 | 190 | def export(self, out_file="", fmt='fasta'): 191 | """Writes the trace file sequence to a fasta file. 192 | 193 | Keyword argument: 194 | out_file -- output file name (default 'tracefile'.fa) 195 | fmt -- 'fasta': write fasta file, 'qual': write qual file, 'fastq': write fastq file 196 | 197 | """ 198 | if out_file == "": 199 | file_name = self.id 200 | if fmt == 'fasta': 201 | file_name += '.fa' 202 | elif fmt == 'qual': 203 | file_name += '.qual' 204 | elif fmt == 'fastq': 205 | file_name += '.fq' 206 | else: 207 | raise ValueError('Invalid file format: {0}.'.format(fmt)) 208 | else: 209 | file_name = out_file 210 | 211 | if fmt == 'fasta': 212 | contents = '>{0} {1}\n{2}\n'.format( 213 | self.id, 214 | self.name, 215 | self.seq) 216 | elif fmt == 'qual': 217 | contents = '>{0} {1}\n{2}\n'.format( 218 | self.id, 219 | self.name, 220 | ' '.join(map(str, self.qual_val))) 221 | elif fmt == 'fastq': 222 | contents = '@{0} {1}\n{2}\n+{0} {1}\n{3}\n'.format( 223 | self.id, 224 | self.name, 225 | self.seq, ''.join(self.qual)) 226 | 227 | with open(file_name, 'w') as out_file: 228 | out_file.write(contents) 229 | 230 | def trim(self, seq, cutoff=0.05): 231 | """Trims the sequence using Richard Mott's modified trimming algorithm. 232 | 233 | Keyword argument: 234 | seq -- sequence to be trimmed 235 | cutoff -- probability cutoff value 236 | 237 | Trimmed bases are determined from their segment score, ultimately 238 | determined from each base's quality values. 
239 | 240 | More on: 241 | http://www.phrap.org/phredphrap/phred.html 242 | http://www.clcbio.com/manual/genomics/Quality_trimming.html 243 | """ 244 | # set flag for trimming 245 | start = False 246 | # set minimum segment size 247 | segment = 20 248 | trim_start = 0 249 | 250 | if len(seq) <= segment: 251 | raise ValueError('Sequence cannot be trimmed because ' 252 | 'it is shorter than the trim segment size') 253 | else: 254 | # calculate probability back from formula used 255 | # to calculate phred qual values 256 | score_list = [cutoff - (10 ** (qual/-10.0)) for 257 | qual in self.qual_val] 258 | 259 | # calculate cumulative score_list 260 | # if cumulative value < 0, set to 0 261 | # first value is set to 0 (assumption: trim_start is always > 0) 262 | running_sum = [0] 263 | for i in range(1, len(score_list)): 264 | num = running_sum[-1] + score_list[i] 265 | if num < 0: 266 | running_sum.append(0) 267 | else: 268 | running_sum.append(num) 269 | if not start: 270 | # trim_start = index where the cumulative value first becomes >= 0 271 | trim_start = i 272 | start = True 273 | 274 | # trim_finish = index of the highest cumulative value, 275 | # marking the segment with the highest cumulative score 276 | trim_finish = running_sum.index(max(running_sum)) 277 | 278 | return seq[trim_start:trim_finish] 279 | 280 | class _TraceDir(object): 281 | """Class representing directory content.""" 282 | def __init__(self, tag_entry, handle): 283 | self.tag_name = py3_get_string(tag_entry[0]) 284 | self.tag_num = tag_entry[1] 285 | self.elem_code = tag_entry[2] 286 | self.elem_size = tag_entry[3] 287 | self.elem_num = tag_entry[4] 288 | self.data_size = tag_entry[5] 289 | self.data_offset = tag_entry[6] 290 | self.data_handle = tag_entry[7] 291 | self.tag_offset = tag_entry[8] 292 | 293 | # if data size is <= 4 bytes, data is stored inside the directory 294 | # so offset needs to be changed 295 | if self.data_size <= 4: 296 | self.data_offset = self.tag_offset + 20 297 | 298 | 
self.tag_data = self._unpack(handle) 299 | 300 | def __repr__(self): 301 | """Represents data associated with a tag.""" 302 | summary = ['tag_name: {0}'.format(repr(self.tag_name))] 303 | summary.append('tag_number: {0}'.format(repr(self.tag_num))) 304 | summary.append('elem_code: {0}'.format(repr(self.elem_code))) 305 | summary.append('elem_size: {0}'.format(repr(self.elem_size))) 306 | summary.append('elem_num: {0}'.format(repr(self.elem_num))) 307 | summary.append('data_size: {0}'.format(repr(self.data_size))) 308 | summary.append('data_offset: {0}'.format(repr(self.data_offset))) 309 | summary.append('data_handle: {0}'.format(repr(self.data_handle))) 310 | summary.append('tag_offset: {0}'.format(repr(self.tag_offset))) 311 | summary.append('tag_data: {0}'.format(repr(self.tag_data))) 312 | 313 | return '\n'.join(summary) 314 | 315 | def _unpack(self, handle): 316 | """Returns tag data""" 317 | if self.elem_code in _BYTEFMT: 318 | 319 | # because ">1s" unpacks differently from ">s" 320 | num = '' if self.elem_num == 1 else str(self.elem_num) 321 | fmt = "{0}{1}{2}".format('>', num, _BYTEFMT[self.elem_code]) 322 | start = self.data_offset 323 | 324 | handle.seek(start) 325 | data = struct.unpack(fmt, handle.read(struct.calcsize(fmt))) 326 | 327 | # no need to use tuple if len(data) == 1 328 | if self.elem_code not in [10, 11] and len(data) == 1: 329 | data = data[0] 330 | 331 | # account for different data types 332 | if self.elem_code == 2: 333 | return py3_get_string(data) 334 | elif self.elem_code == 10: 335 | return datetime.date(*data) 336 | elif self.elem_code == 11: 337 | return datetime.time(*data) 338 | elif self.elem_code == 13: 339 | return bool(data) 340 | elif self.elem_code == 18: 341 | return py3_get_string(data[1:]) 342 | elif self.elem_code == 19: 343 | return py3_get_string(data[:-1]) 344 | else: 345 | return data 346 | else: 347 | return None 348 | -------------------------------------------------------------------------------- /demo.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # demo.py 4 | # demo script for using abifpy to write fasta files from trace files 5 | 6 | import abifpy 7 | import glob 8 | 9 | counter = 0 10 | 11 | print("Working...") 12 | 13 | for trace in glob.iglob('*.ab1'): 14 | abifpy.Trace(trace, trimming=True).export() 15 | counter += 1 16 | 17 | print("Done! Processed {0} trace files.".format(counter)) 18 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup 4 | 5 | from abifpy import __version__ 6 | 7 | 8 | version = __version__ 9 | long_description = open("README.rst").read() 10 | 11 | setup( 12 | name = "abifpy", 13 | version = version, 14 | description = "abifpy is a module for reading ABI Sanger sequencing trace files.", 15 | long_description = long_description, 16 | author = "Wibowo Arindrarto", 17 | author_email = "bow@bow.web.id", 18 | py_modules = ['abifpy'], 19 | url = "http://github.com/bow/abifpy/", 20 | license = "MIT", 21 | zip_safe = False, 22 | classifiers = [ 23 | "Development Status :: 4 - Beta", 24 | "Environment :: Console", 25 | "Intended Audience :: Science/Research", 26 | "License :: OSI Approved :: MIT License", 27 | "Programming Language :: Python", 28 | "Topic :: Scientific/Engineering :: Bio-Informatics", 29 | ], 30 | ) 31 | -------------------------------------------------------------------------------- /tests/310.ab1: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/310.ab1 -------------------------------------------------------------------------------- /tests/3100.ab1: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/3100.ab1 -------------------------------------------------------------------------------- /tests/3730.ab1: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/3730.ab1 -------------------------------------------------------------------------------- /tests/empty.ab1: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/empty.ab1 -------------------------------------------------------------------------------- /tests/fake.ab1: -------------------------------------------------------------------------------- 1 | This is a fake trace file 2 | -------------------------------------------------------------------------------- /tests/test_abifpy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # test_abifpy.py 4 | # python2 unit tests for abifpy 5 | # http://github.com/bow/abifpy 6 | 7 | import unittest 8 | import datetime 9 | import abifpy 10 | 11 | class TestAbif(unittest.TestCase): 12 | 13 | def __init__(self, filename): 14 | unittest.TestCase.__init__(self, methodName='runTest') 15 | self.filename = filename 16 | self.abif = abifpy.Trace(self.filename) 17 | self.trimmed_seq = self.abif.trim(self.abif.seq) 18 | self.untrimmed_seq = self.abif.seq 19 | 20 | def shortDescription(self): 21 | return "Testing %s" % self.filename 22 | 23 | def runTest(self): 24 | self.file_type() 25 | self.tag_data() 26 | self.tag_data_len() 27 | self.tag_data_type() 28 | self.trim_is_shorter() 29 | 30 | def file_type(self): 31 | print "\nChecking file type..." 
32 | self.abif._handle.seek(0) 33 | self.assertEqual(self.abif._handle.read(4), 'ABIF') 34 | 35 | def tag_data(self): 36 | print "Checking tag data parsing..." 37 | for key in self.abif.tags: 38 | # should be assertNone(), not available in py2.6 39 | if self.abif.tags[key].elem_code != 1024: 40 | self.assertNotEqual(self.abif.get_data(key), None) 41 | 42 | def tag_data_len(self): 43 | print "Checking tag data lengths..." 44 | for key in self.abif.tags: 45 | code = self.abif.tags[key].elem_code 46 | data = self.abif.get_data(key) 47 | 48 | # 10 & 11 returns datetime, 12 is not clear, 1024 is ignored 49 | # only check for strings, arrays, and numbers 50 | if code not in [10, 11, 12, 1024]: 51 | # account for null character in pString and cString 52 | mod = 1 if code in [18, 19] else 0 53 | # if data is int/float, len is always 1 54 | obtained = len(data) if not isinstance(data, (int, float, bool)) else 1 55 | expected = self.abif.tags[key].elem_num - mod 56 | self.assertEqual(obtained, expected) 57 | 58 | def tag_data_type(self): 59 | print "Checking tag data types..." 
60 | for key in self.abif.tags: 61 | code = self.abif.tags[key].elem_code 62 | data = self.abif.get_data(key) 63 | 64 | # user data should return None 65 | if code == 1024: 66 | self.assertEqual(data, None) 67 | # check for string return type in tags 2, 18, 19 68 | elif code in [2, 18, 19]: 69 | self.assertTrue(isinstance(data, basestring)) 70 | # check for datetime return tag 71 | elif code == 10: 72 | self.assertTrue(isinstance(data, datetime.date)) 73 | # check for datetime return tag 74 | elif code == 11: 75 | self.assertTrue(isinstance(data, datetime.time)) 76 | elif code == 13: 77 | self.assertTrue(isinstance(data, bool)) 78 | # check for number return types 79 | # some tags' data are still in a tuple of numbers, so will have to 80 | # iterate over them 81 | elif isinstance(data, tuple): 82 | for item in data: 83 | self.assertTrue(isinstance(item, (int, float))) 84 | # otherwise just check for type directly 85 | else: 86 | self.assertTrue(isinstance(data, (int, float))) 87 | 88 | def trim_is_shorter(self): 89 | print "Checking trimmed sequence length..." 90 | self.assertTrue(len(self.trimmed_seq) < len(self.untrimmed_seq)) 91 | 92 | class TestAbifFake(unittest.TestCase): 93 | 94 | def __init__(self, filename): 95 | unittest.TestCase.__init__(self, methodName='runTest') 96 | self.filename = filename 97 | 98 | def shortDescription(self): 99 | return "Testing %s" % self.filename 100 | 101 | def runTest(self): 102 | self.fake_file_type() 103 | 104 | def fake_file_type(self): 105 | print "\nIOError is raised if file is not ABIF..." 
106 | self.assertRaises(IOError, abifpy.Trace, self.filename) 107 | 108 | class TestAbifEmpty(unittest.TestCase): 109 | 110 | def __init__(self, filename): 111 | unittest.TestCase.__init__(self, methodName='runTest') 112 | self.filename = filename 113 | self.abif = abifpy.Trace(self.filename) 114 | 115 | def shortDescription(self): 116 | return "Testing %s" % self.filename 117 | 118 | def runTest(self): 119 | self.short_sequence_untrimmed() 120 | 121 | def short_sequence_untrimmed(self): 122 | print "\nValueError is raised if sequence length is shorter than 20." 123 | self.assertRaises(ValueError, self.abif.trim, self.abif.seq) 124 | 125 | 126 | abif_real = ['3730.ab1', '3100.ab1', '310.ab1',] 127 | abif_fake = ['fake.ab1',] 128 | abif_empty = ['empty.ab1',] 129 | 130 | def run_suite(): 131 | suite = unittest.TestSuite([TestAbif(n) for n in abif_real]) 132 | suite.addTests(unittest.TestSuite([TestAbifFake(n) for n in abif_fake])) 133 | suite.addTests(unittest.TestSuite([TestAbifEmpty(n) for n in abif_empty])) 134 | return suite 135 | 136 | if __name__ == '__main__': 137 | runner = unittest.TextTestRunner(verbosity=2) 138 | runner.run(run_suite()) 139 | -------------------------------------------------------------------------------- /tests/test_py3_abifpy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # 3 | # test_abifpy_py3.py 4 | # python3 unit tests for abifpy 5 | # http://github.com/bow/abifpy 6 | 7 | import unittest 8 | import datetime 9 | import abifpy 10 | 11 | class TestAbif(unittest.TestCase): 12 | 13 | def __init__(self, filename): 14 | unittest.TestCase.__init__(self, methodName='runTest') 15 | self.filename = filename 16 | self.abif = abifpy.Trace(self.filename) 17 | self.trimmed_seq = self.abif.trim(self.abif.seq) 18 | self.untrimmed_seq = self.abif.seq 19 | 20 | def shortDescription(self): 21 | return "Testing %s" % self.filename 22 | 23 | def runTest(self): 24 | self.file_type() 25 | 
self.tag_data() 26 | self.tag_data_len() 27 | self.tag_data_type() 28 | self.trim_is_shorter() 29 | 30 | def file_type(self): 31 | print("\nChecking file type...") 32 | self.abif._handle.seek(0) 33 | self.assertEqual(self.abif._handle.read(4), 'ABIF'.encode()) 34 | 35 | def tag_data(self): 36 | print("Checking tag data parsing...") 37 | for key in self.abif.tags: 38 | # should be assertNone(), not available in py2.6 39 | if self.abif.tags[key].elem_code != 1024: 40 | self.assertNotEqual(self.abif.get_data(key), None) 41 | 42 | def tag_data_len(self): 43 | print("Checking tag data lengths...") 44 | for key in self.abif.tags: 45 | code = self.abif.tags[key].elem_code 46 | data = self.abif.get_data(key) 47 | 48 | # 10 & 11 returns datetime, 12 is not clear, 1024 is ignored 49 | # only check for strings, arrays, and numbers 50 | if code not in [10, 11, 12, 1024]: 51 | # account for null character in pString and cString 52 | mod = 1 if code in [18, 19] else 0 53 | # if data is int/float, len is always 1 54 | obtained = len(data) if not isinstance(data, (int, float, bool)) else 1 55 | expected = self.abif.tags[key].elem_num - mod 56 | self.assertEqual(obtained, expected) 57 | 58 | def tag_data_type(self): 59 | print("Checking tag data types...") 60 | for key in self.abif.tags: 61 | code = self.abif.tags[key].elem_code 62 | data = self.abif.get_data(key) 63 | 64 | # user data should return None 65 | if code == 1024: 66 | self.assertEqual(data, None) 67 | # check for string return type in tags 2, 18, 19 68 | elif code in [2, 18, 19]: 69 | self.assertTrue(isinstance(data, str)) 70 | # check for datetime return tag 71 | elif code == 10: 72 | self.assertTrue(isinstance(data, datetime.date)) 73 | # check for datetime return tag 74 | elif code == 11: 75 | self.assertTrue(isinstance(data, datetime.time)) 76 | elif code == 13: 77 | self.assertTrue(isinstance(data, bool)) 78 | # check for number return types 79 | # some tags' data are still in a tuple of numbers, so will have to 
80 | # iterate over them 81 | elif isinstance(data, tuple): 82 | for item in data: 83 | self.assertTrue(isinstance(item, (int, float))) 84 | # otherwise just check for type directly 85 | else: 86 | self.assertTrue(isinstance(data, (int, float))) 87 | 88 | def trim_is_shorter(self): 89 | print("Checking trimmed sequence length...") 90 | self.assertTrue(len(self.trimmed_seq) < len(self.untrimmed_seq)) 91 | 92 | class TestAbifFake(unittest.TestCase): 93 | 94 | def __init__(self, filename): 95 | unittest.TestCase.__init__(self, methodName='runTest') 96 | self.filename = filename 97 | 98 | def shortDescription(self): 99 | return "Testing %s" % self.filename 100 | 101 | def runTest(self): 102 | self.fake_file_type() 103 | 104 | def fake_file_type(self): 105 | print("\nIOError is raised if file is not ABIF...") 106 | self.assertRaises(IOError, abifpy.Trace, self.filename) 107 | 108 | class TestAbifEmpty(unittest.TestCase): 109 | 110 | def __init__(self, filename): 111 | unittest.TestCase.__init__(self, methodName='runTest') 112 | self.filename = filename 113 | self.abif = abifpy.Trace(self.filename) 114 | 115 | def shortDescription(self): 116 | return "Testing %s" % self.filename 117 | 118 | def runTest(self): 119 | self.short_sequence_untrimmed() 120 | 121 | def short_sequence_untrimmed(self): 122 | print("\nValueError is raised if sequence length is shorter than 20.") 123 | self.assertRaises(ValueError, self.abif.trim, self.abif.seq) 124 | 125 | 126 | abif_real = ['3730.ab1', '3100.ab1', '310.ab1',] 127 | abif_fake = ['fake.ab1',] 128 | abif_empty = ['empty.ab1',] 129 | 130 | def run_suite(): 131 | suite = unittest.TestSuite([TestAbif(n) for n in abif_real]) 132 | suite.addTests(unittest.TestSuite([TestAbifFake(n) for n in abif_fake])) 133 | suite.addTests(unittest.TestSuite([TestAbifEmpty(n) for n in abif_empty])) 134 | return suite 135 | 136 | if __name__ == '__main__': 137 | runner = unittest.TextTestRunner(verbosity=2) 138 | runner.run(run_suite()) 139 | 
--------------------------------------------------------------------------------
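Note on Trace.trim in abifpy.py: the method converts each phred quality value back to an error probability, subtracts it from the cutoff, and keeps the span between the first non-negative running sum and the highest running sum. The following is a minimal standalone sketch of that scoring step, assuming only a list of integer quality values as input. The function name mott_trim_bounds and its segment parameter are hypothetical and not part of abifpy; the logic mirrors the method as listed above rather than any reference implementation of phred.

```python
def mott_trim_bounds(qual_val, cutoff=0.05, segment=20):
    """Return (trim_start, trim_finish) slice bounds for a list of phred scores.

    Hypothetical helper that mirrors the scoring in Trace.trim; it is a sketch,
    not part of the abifpy API.
    """
    if len(qual_val) <= segment:
        raise ValueError('Sequence is shorter than the trim segment size')

    # per-base score: cutoff minus the error probability implied by the phred value
    scores = [cutoff - 10 ** (q / -10.0) for q in qual_val]

    # running sum that is reset to 0 whenever it would drop below 0
    running_sum = [0]
    trim_start = 0
    started = False
    for i in range(1, len(scores)):
        total = running_sum[-1] + scores[i]
        if total < 0:
            running_sum.append(0)
        else:
            running_sum.append(total)
            if not started:
                # first position where the running sum stays non-negative
                trim_start = i
                started = True

    # position of the highest running sum marks the end of the kept span
    trim_finish = running_sum.index(max(running_sum))
    return trim_start, trim_finish


# five poor bases, thirty good bases, five poor bases
quals = [2] * 5 + [40] * 30 + [2] * 5
start, finish = mott_trim_bounds(quals)
trimmed = quals[start:finish]
```

The returned pair can be used to slice seq, qual, and qual_val alike, which is what passing each of them to trim() does in the module. As in the listed method, the slice stops just before the position of the highest running sum.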