├── .gitignore
├── README.rst
├── abifpy.py
├── demo.py
├── setup.py
└── tests
    ├── 310.ab1
    ├── 3100.ab1
    ├── 3730.ab1
    ├── empty.ab1
    ├── fake.ab1
    ├── test_abifpy.py
    └── test_py3_abifpy.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | *.py[co]
 2 | 
 3 | # Packages
 4 | *.egg
 5 | *.egg-info
 6 | dist
 7 | build
 8 | eggs
 9 | parts
10 | bin
11 | develop-eggs
12 | .installed.cfg
13 | 
14 | # Installer logs
15 | pip-log.txt
16 | 
17 | # Unit test / coverage reports
18 | .coverage
19 | .tox
20 | 


--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
  1 | ======================
  2 | :warning: unmaintained
  3 | ======================
  4 | 
  5 | A modified version of this module has been merged into the `Biopython 
  6 | project`_, available from version 1.58 onwards. If you already have Biopython
  7 | version >=1.58, there is no need to use abifpy. Despite that, I am keeping 
  8 | the module available as a stand-alone for personal reasons :).
  9 | 
 10 | 
 11 | ======
 12 | ABIFPY
 13 | ======
 14 | 
 15 | -----------------------------------------------------------
 16 | Python module for reading ABI Sanger sequencing trace files
 17 | -----------------------------------------------------------
 18 | 
 19 | abifpy is a Python module that extracts the sequence and various other data from
 20 | Applied Biosystems, Inc. (ABI) format files. The module is Python 3-compatible
 21 | and was written based on the `official spec`_ released by Applied Biosystems.
 22 | 
 23 | abifpy provides the following items:
 24 | 
 25 | *class* Trace(in_file)
 26 |     Class representing the trace file ``in_file``.
 27 | 
 28 | Trace object attributes and methods
 29 | ===================================
 30 | 
 31 | seq
 32 |     String of base-called nucleotide sequence stored in the file.
 33 | 
 34 | qual
 35 |     String of phred quality characters of the base-called sequence.
 36 | 
 37 | qual_val
 38 |     List of phred quality values of the base-called sequence.
 39 | 
 40 | id
 41 |     String of the sequence file name.
 42 | 
 43 | name
 44 |     String of the sample name entered prior to sequencing.
 45 | 
 46 | trim(sequence[, cutoff=0.05])        
 47 |     Returns a trimmed sequence using Richard Mott's algorithm (used in phred).
 48 |     The default probability cutoff is 0.05. Can be used on ``seq``, ``qual``,
 49 |     and ``qual_val``.
 50 |     
 51 | get_data(key)
 52 |     Returns metadata stored in the file, accepts keys from ``tags`` (see below).
 53 | 
 54 | export([out_file="", fmt='fasta'])       
 55 |     Writes a fasta (``fmt='fasta'``), qual (``fmt='qual'``), or 
 56 |     fastq (``fmt='fastq'``) file from the trace file. Default format is ``fasta``.
 57 | 
 58 | close()
 59 |     Closes the Trace file object.
 60 | 
 61 | seq_remove_ambig(seq)
 62 |     Replaces extra ambiguous base characters (K, Y, W, M, R, S) with 'N'. Accepts ``seq``
 63 |     for input.
 64 | 
 65 | EXTRACT
 66 |     Dictionary for determining which metadata are extracted.
 67 | 
 68 | data
 69 |     Dictionary that contains the file metadata. The keys are values of ``EXTRACT``.
 70 | 
 71 | tags
 72 |     Dictionary mapping each tag to its data directory class instance. Keys are the tag name and
 73 |     tag number, concatenated (e.g. ``PTYP1``). Use ``get_data()`` to access values in each ``tags`` entry.
 74 | 
 75 | Usage
 76 | =====
 77 | 
 78 | ::
 79 | 
 80 |     $ python
 81 |     >>> from abifpy import Trace
 82 |     >>> yummy = Trace('tests/3730.ab1')
 83 | 
 84 | Or if you want to perform base trimming directly::
 85 |     
 86 |     >>> yummy = Trace('tests/3730.ab1', trimming=True)
 87 | 
 88 | Sequence can be accessed with the ``seq`` attribute. Other attributes of note
 89 | are ``qual`` for phred quality characters, ``qual_val`` for phred quality values,
 90 | ``id`` for sequencing trace file name, and ``name`` for the sample name::
 91 | 
 92 |     >>> yummy.seq
 93 |     'GGGCGAGCKYYAYATTTTGGCAAGAATTGAGCTCT...
 94 |     >>> yummy.qual
 95 |     '5$%%%\'%%!!!\'!+5;726@>A=3824DESHSS...
 96 |     >>> yummy.qual_val
 97 |     [20, 3, 4, 4, 4, 6, 4, 4, 0, 0, 0, 6, 0, 10, 20, 26, 22, 17, 21...
 98 |     >>> yummy.id
 99 |     '3730'
100 |     >>> yummy.name
101 |     '226032_C-ME-18_pCAGseqF'
102 | 
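The relationship between ``qual_val`` and ``qual`` can be sketched as below. This is a minimal illustration, assuming the offset-33 character encoding that ``abifpy.py`` applies when it builds ``qual``; the helper names ``phred_to_char`` and ``phred_to_error_prob`` are hypothetical and not part of abifpy.

```python
def phred_to_char(q):
    """Encode one phred quality value as an offset-33 character (hypothetical helper)."""
    return chr(q + 33)


def phred_to_error_prob(q):
    """Convert a phred quality value back to an error probability: 10^(-q/10)."""
    return 10 ** (q / -10.0)


# First six quality values from the 3730.ab1 example above.
qual_val = [20, 3, 4, 4, 4, 6]
qual = ''.join(phred_to_char(q) for q in qual_val)
print(qual)  # → 5$%%%'
```

The same error-probability formula is what ``trim()`` uses to score each base against the cutoff.
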
103 | If trimming was not performed when instantiating, you can still do it afterwards::
104 |     
105 |     >>> yummy.trim(yummy.seq)
106 | 
107 | The quality values themselves can be trimmed as well::
108 | 
109 |     >>> yummy.trim(yummy.qual)
110 | 
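The trimming step can be sketched as a stand-alone function. The code below mirrors the logic of ``trim()`` as it appears in ``abifpy.py`` in this repository, but the function name ``mott_trim`` and its signature are hypothetical, and the input data are synthetic.

```python
def mott_trim(seq, qual_val, cutoff=0.05, segment=20):
    """Trim seq with a Mott-style running-sum score (hypothetical helper)."""
    if len(seq) <= segment:
        raise ValueError('Sequence can not be trimmed because '
                         'it is shorter than the trim segment size')

    # Score per base: cutoff minus the error probability implied by its quality.
    scores = [cutoff - 10 ** (q / -10.0) for q in qual_val]

    # Running sum that is reset to 0 whenever it would go negative.
    running_sum = [0]
    trim_start = 0
    started = False
    for i in range(1, len(scores)):
        total = running_sum[-1] + scores[i]
        if total < 0:
            running_sum.append(0)
        else:
            running_sum.append(total)
            if not started:
                # first position where the running sum stays non-negative
                trim_start = i
                started = True

    # The highest running sum marks the end of the best-scoring segment.
    trim_finish = running_sum.index(max(running_sum))
    return seq[trim_start:trim_finish]


# Synthetic example: 5 poor bases, 20 good bases, 5 poor bases.
seq = 'N' * 5 + 'ACGT' * 5 + 'N' * 5
qual_val = [2] * 5 + [40] * 20 + [2] * 5
print(mott_trim(seq, qual_val))  # → ACGTACGTACGTACGTACG
```

Note that the slice end is exclusive, so the base at the peak of the running sum is itself dropped; this follows the behaviour of the module's own ``trim()``.
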
111 | Viewing the trace file metadata is easy. Use the values from ``EXTRACT``
112 | as the keys in ``data``::
113 | 
114 |     >>> yummy.data['well']
115 |     'B9'
116 |     >>> yummy.data['model']
117 |     '3730'
118 |     >>> yummy.data['run start date']
119 |     datetime.date(2009, 12, 12)
120 | 
121 | Metadata not contained in ``data`` can be viewed using ``get_data()``
122 | with one of the keys in ``tags`` as the argument, e.g.::
123 | 
124 |     >>> yummy.get_data('PTYP1')
125 |     '96-well'
126 | 
127 | For more info on the meaning of these tags and the file metadata, consult the `official spec`_. 
128 | 
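As a rough illustration of how the tag directory is located, the sketch below unpacks a file header with the same ``struct`` format string that ``abifpy.py`` uses (``'>4sH4sI2H3I'``). The header bytes are synthetic values built for this example rather than data read from a real trace file, and the variable names are assumptions made for readability.

```python
import struct

# Header format used by abifpy.py: big-endian, nine fields, 30 bytes in total.
HEADFMT = '>4sH4sI2H3I'

# Synthetic header for illustration only (not taken from a real .ab1 file).
raw = struct.pack(HEADFMT, b'ABIF', 101, b'tdir', 1, 1023, 28, 10, 280, 128)

(signature, version, tag_name, tag_num, elem_code,
 elem_size, elem_num, data_size, data_offset) = struct.unpack(HEADFMT, raw)

# Each directory entry starts at data_offset + index * elem_size.
dir_offsets = [data_offset + i * elem_size for i in range(elem_num)]

print(signature, version, elem_num, dir_offsets[:3])
```

Each offset in ``dir_offsets`` is where one directory entry would be read to recover a tag such as ``PTYP1``.
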
129 | Installation
130 | ============
131 | 
132 | * ``pip install abifpy``, or
133 | 
134 | * Add the abifpy directory to your ``$PYTHONPATH`` (in ``.bashrc`` to make it persistent)
135 | 
136 | License
137 | =======
138 | 
139 | abifpy is licensed under the MIT License.
140 | 
141 | Copyright (c) 2011 by Wibowo Arindrarto <bow@bow.web.id>
142 | 
143 | Permission is hereby granted, free of charge, to any person obtaining a copy of
144 | this software and associated documentation files (the "Software"), to deal in
145 | the Software without restriction, including without limitation the rights to
146 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
147 | the Software, and to permit persons to whom the Software is furnished to do so,
148 | subject to the following conditions:
149 | 
150 | The above copyright notice and this permission notice shall be included in all
151 | copies or substantial portions of the Software.
152 | 
153 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
154 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
155 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
156 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
157 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
158 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
159 | 
160 | .. _official spec: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf
161 | .. _Biopython project: http://biopython.org/wiki/Biopython
162 | 


--------------------------------------------------------------------------------
/abifpy.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | #
  3 | # abifpy.py
  4 | # python module for reading abi trace files
  5 | # http://github.com/bow/abifpy
  6 | 
  7 | """Python module for reading .ab1 trace files."""
  8 | 
  9 | import datetime
 10 | import struct
 11 | from os.path import splitext, basename
 12 | 
 13 | from sys import version_info
 14 | 
 15 | RELEASE = False
 16 | __version_info__ = ('1', '0', )
 17 | __version__ = '.'.join(__version_info__)
 18 | __version__ += '-dev' if not RELEASE else ''
 19 | 
 20 | 
 21 | __all__ = ['Trace']
 22 | 
 23 | # dictionary for deciding which values to extract and contain in self.data
 24 | EXTRACT = {
 25 |             'TUBE1': 'well',
 26 |             'DySN1': 'dye',
 27 |             'GTyp1': 'polymer',
 28 |             'MODL1': 'model', 
 29 |             'RUND1': 'run start date',
 30 |             'RUND2': 'run finish date',
 31 |             'RUND3': 'data collection start date',
 32 |             'RUND4': 'data collection finish date',
 33 |             'RUNT1': 'run start time',
 34 |             'RUNT2': 'run finish time',
 35 |             'RUNT3': 'data collection start time',
 36 |             'RUNT4': 'data collection finish time',
 37 |             'DATA1': 'raw1',
 38 |             'DATA2': 'raw2',
 39 |             'DATA3': 'raw3',
 40 |             'DATA4': 'raw4',
 41 |             'PLOC2': 'tracepeaks',
 42 |             'FWO_1': 'baseorder',
 43 |           }     
 44 | 
 45 | # dictionary for unpacking tag values
 46 | _BYTEFMT = {
 47 |             1: 'b',     # byte
 48 |             2: 's',     # char
 49 |             3: 'H',     # word
 50 |             4: 'h',     # short
 51 |             5: 'i',     # long
 52 |             6: '2i',    # rational, legacy unsupported
 53 |             7: 'f',     # float
 54 |             8: 'd',     # double
 55 |             10: 'h2B',  # date
 56 |             11: '4B',   # time
 57 |             12: '2i2b', # thumb
 58 |             13: 'B',    # bool
 59 |             14: '2h',   # point, legacy unsupported
 60 |             15: '4h',   # rect, legacy unsupported
 61 |             16: '2i',   # vPoint, legacy unsupported
 62 |             17: '4i',   # vRect, legacy unsupported
 63 |             18: 's',    # pString
 64 |             19: 's',    # cString
 65 |             20: '2i',   # Tag, legacy unsupported
 66 |            }
 67 | 
 68 | # header structure
 69 | _HEADFMT = '>4sH4sI2H3I'
 70 | 
 71 | # directory data structure
 72 | _DIRFMT = '>4sI2H4I'
 73 | 
 74 | # to handle py3 IO
 75 | def py3_get_string(byte):
 76 |     if version_info[0] < 3:
 77 |         return byte
 78 |     else:
 79 |         return byte.decode()
 80 | 
 81 | def py3_get_byte(string):
 82 |     if version_info[0] < 3:
 83 |         return string
 84 |     else:
 85 |         return string.encode()
 86 | 
 87 | class Trace(object):
 88 |     """Class representing trace file."""
 89 |     def __init__(self, in_file, trimming=False):        
 90 |         self._handle = open(in_file, 'rb')
 91 |         try:
 92 |             self._handle.seek(0)
 93 |             if not self._handle.read(4) == py3_get_byte('ABIF'):
 94 |                 raise IOError('Input is not a valid trace file')
 95 |         except IOError:
 96 |             self._handle = None
 97 |             raise
 98 |         else:
 99 |             # header data structure (fields unpacked by _HEADFMT):
100 |             # file signature, file version, tag name, tag number,
101 |             # element type code, element size, number of elements,
102 |             # data size, data offset
103 |             # dictionary for containing file metadata
104 |             self.data = {}
105 |             # dictionary for containing extracted directory data
106 |             self.tags = {}
107 |             self.trimming = trimming
108 |             # values contained in file header
109 |             self._handle.seek(0)
110 |             header = struct.unpack(_HEADFMT, 
111 |                      self._handle.read(struct.calcsize(_HEADFMT)))
112 |             # file format version
113 |             self.version = header[1]
114 | 
115 |             # build dictionary of data tags and metadata
116 |             for entry in self._parse_header(header):
117 |                 key = entry.tag_name + str(entry.tag_num)
118 |                 self.tags[key] = entry
119 |                 # only extract data from tags we care about
120 |                 if key in EXTRACT:
121 |                     # e.g. self.data['well'] = 'B6'
122 |                     self.data[EXTRACT[key]] = self.get_data(key)
123 | 
124 |             self.id = self._get_file_id(in_file)
125 |             self.name = self.get_data('SMPL1')
126 |             self.seq = self.get_data('PBAS2')
127 |             self.qual = ''.join([chr(ord(value) + 33) for value in self.get_data('PCON2')])
128 |             self.qual_val = [ord(value) for value in self.get_data('PCON2')]
129 | 
130 |             if trimming:
131 |                 self.seq, self.qual, self.qual_val = map(self.trim, 
132 |                                                         [self.seq, self.qual,
133 |                                                         self.qual_val])
134 | 
135 |     def __repr__(self):
136 |         """Represents data associated with the file."""
137 |         if len(self.seq) > 10:
138 |             seq = "{0}...{1}".format(self.seq[:5], self.seq[-5:])
139 |             qual_val = "[{0}, ..., {1}]".format(
140 |                       repr(self.qual_val[:5])[1:-1], 
141 |                       repr(self.qual_val[-5:])[1:-1])
142 |         else:
143 |             seq = self.seq
144 |             qual_val = self.qual_val
145 | 
146 |         return "{0}({1}, qual_val:{2}, id:{3}, name:{4})".format(
147 |                 self.__class__.__name__, repr(seq), qual_val,
148 |                 repr(self.id), repr(self.name))
149 |     
150 |     def _parse_header(self, header):
151 |         """Generator for directory contents."""
152 |         # header structure (fields unpacked by _HEADFMT):
153 |         # file signature, file version, tag name, tag number,
154 |         # element type code, element size, number of elements,
155 |         # data size, data offset
156 |         head_elem_size = header[5]
157 |         head_elem_num = header[6]
158 |         head_offset = header[8]
159 |         index = 0
160 |         
161 |         while index < head_elem_num:
162 |             start = head_offset + index * head_elem_size
163 |             # added directory offset to tuple
164 |             # to handle directories with data size <= 4 bytes
165 |             self._handle.seek(start)
166 |             dir_entry =  struct.unpack(_DIRFMT, 
167 |                         self._handle.read(struct.calcsize(_DIRFMT))) + (start,)
168 |             index += 1
169 |             yield _TraceDir(dir_entry, self._handle)
170 | 
171 |     def _get_file_id(self, in_file):
172 |         """Returns filename without extension."""
173 |         return splitext(basename(in_file))[0]
174 | 
175 |     def close(self):
176 |         """Closes the Trace file object."""
177 |         self._handle.close()
178 |     
179 | 
180 |     def get_data(self, key):
181 |         """Returns data stored in a tag."""
182 |         return self.tags[key].tag_data
183 | 
184 |     def seq_remove_ambig(self, seq):
185 |         """Replaces extra ambiguous bases with 'N'."""
186 |         import re
187 |         # operate on the sequence passed in rather than always using self.seq
188 |         return re.sub("K|Y|W|M|R|S", 'N', seq)
189 | 
190 |     def export(self, out_file="", fmt='fasta'):
191 |         """Writes the trace file sequence to a fasta, qual, or fastq file.
192 |         
193 |         Keyword arguments:
194 |         out_file -- output file name (default 'tracefile'.fa)
195 |         fmt -- 'fasta': write fasta file, 'qual': write qual file, 'fastq': write fastq file
196 | 
197 |         """
198 |         if out_file == "":
199 |             file_name = self.id
200 |             if fmt == 'fasta':
201 |                 file_name += '.fa'
202 |             elif fmt == 'qual':
203 |                 file_name += '.qual'
204 |             elif fmt == 'fastq':
205 |                 file_name += '.fq'
206 |             else:
207 |                 raise ValueError('Invalid file format: {0}.'.format(fmt))
208 |         else:
209 |             file_name = out_file
210 |         
211 |         if fmt == 'fasta':
212 |             contents = '>{0} {1}\n{2}\n'.format(
213 |                         self.id, 
214 |                         self.name, 
215 |                         self.seq)
216 |         elif fmt == 'qual':
217 |             contents = '>{0} {1}\n{2}\n'.format(
218 |                         self.id, 
219 |                         self.name, 
220 |                         ' '.join(map(str, self.qual_val)))
221 |         elif fmt == 'fastq':
222 |             contents = '@{0} {1}\n{2}\n+{0} {1}\n{3}\n'.format(
223 |                         self.id, 
224 |                         self.name, 
225 |                         self.seq, ''.join(self.qual))
226 | 
227 |         with open(file_name, 'w') as out_file:
228 |             out_file.write(contents)
229 | 
230 |     def trim(self, seq, cutoff=0.05):
231 |         """Trims the sequence using Richard Mott's modified trimming algorithm.
232 |         
233 |         Keyword argument:
234 |         seq -- sequence to be trimmed
235 |         cutoff -- probability cutoff value
236 | 
237 |         Trimmed bases are determined from their segment score, ultimately
238 |         determined from each base's quality values. 
239 |         
240 |         More on:
241 |         http://www.phrap.org/phredphrap/phred.html
242 |         http://www.clcbio.com/manual/genomics/Quality_trimming.html
243 |         """
244 |         # set flag for trimming
245 |         start = False
246 |         # set minimum segment size
247 |         segment = 20
248 |         trim_start = 0
249 |         
250 |         if len(seq) <= segment:
251 |             raise ValueError('Sequence can not be trimmed because '
252 |                              'it is shorter than the trim segment size')
253 |         else:
254 |             # calculate probability back from formula used
255 |             # to calculate phred qual values
256 |             score_list = [cutoff - (10 ** (qual/-10.0)) for 
257 |                          qual in self.qual_val]
258 | 
259 |             # calculate cumulative score_list
260 |             # if cumulative value < 0, set to 0
261 |             # first value is set to 0 (assumption: trim_start is always > 0)
262 |             running_sum = [0]
263 |             for i in range(1, len(score_list)):
264 |                 num = running_sum[-1] + score_list[i]
265 |                 if num < 0:
266 |                     running_sum.append(0)
267 |                 else:
268 |                     running_sum.append(num)
269 |                     if not start:
270 |                         # trim_start = index where the cumulative value first stays >= 0
271 |                         trim_start = i
272 |                         start = True
273 | 
274 |             # trim_finish = index of the highest cumulative value,
275 |             # marking the segment with the highest cumulative score 
276 |             trim_finish = running_sum.index(max(running_sum)) 
277 | 
278 |             return seq[trim_start:trim_finish]
279 | 
280 | class _TraceDir(object):
281 |     """Class representing directory content."""
282 |     def __init__(self, tag_entry, handle):
283 |         self.tag_name = py3_get_string(tag_entry[0])
284 |         self.tag_num = tag_entry[1]
285 |         self.elem_code = tag_entry[2]
286 |         self.elem_size = tag_entry[3]
287 |         self.elem_num = tag_entry[4]
288 |         self.data_size = tag_entry[5]
289 |         self.data_offset = tag_entry[6]
290 |         self.data_handle = tag_entry[7]
291 |         self.tag_offset = tag_entry[8]
292 | 
293 |         # if data size is <= 4 bytes, data is stored inside the directory
294 |         # so offset needs to be changed
295 |         if self.data_size <= 4:
296 |             self.data_offset = self.tag_offset + 20
297 | 
298 |         self.tag_data = self._unpack(handle)
299 | 
300 |     def __repr__(self):
301 |         """Represents data associated with a tag."""
302 |         summary = ['tag_name: {0}'.format(repr(self.tag_name))]
303 |         summary.append('tag_number: {0}'.format(repr(self.tag_num)))
304 |         summary.append('elem_code: {0}'.format(repr(self.elem_code)))
305 |         summary.append('elem_size: {0}'.format(repr(self.elem_size)))
306 |         summary.append('elem_num: {0}'.format(repr(self.elem_num)))
307 |         summary.append('data_size: {0}'.format(repr(self.data_size)))
308 |         summary.append('data_offset: {0}'.format(repr(self.data_offset)))
309 |         summary.append('data_handle: {0}'.format(repr(self.data_handle)))
310 |         summary.append('tag_offset: {0}'.format(repr(self.tag_offset)))
311 |         summary.append('tag_data: {0}'.format(repr(self.tag_data)))
312 |        
313 |         return '\n'.join(summary)
314 | 
315 |     def _unpack(self, handle):
316 |         """Returns tag data"""
317 |         if self.elem_code in _BYTEFMT:
318 |             
319 |             # because ">1s" unpacks differently from ">s"
320 |             num = '' if self.elem_num == 1 else str(self.elem_num)
321 |             fmt = "{0}{1}{2}".format('>', num, _BYTEFMT[self.elem_code])
322 |             start = self.data_offset
323 |     
324 |             handle.seek(start)
325 |             data = struct.unpack(fmt, handle.read(struct.calcsize(fmt)))
326 |             
327 |             # no need to use tuple if len(data) == 1
328 |             if self.elem_code not in [10, 11] and len(data) == 1:
329 |                 data = data[0]
330 | 
331 |             # account for different data types
332 |             if self.elem_code == 2:
333 |                 return py3_get_string(data)
334 |             elif self.elem_code == 10:
335 |                 return datetime.date(*data)
336 |             elif self.elem_code == 11:
337 |                 return datetime.time(*data)
338 |             elif self.elem_code == 13:
339 |                 return bool(data)
340 |             elif self.elem_code == 18:
341 |                 return py3_get_string(data[1:])
342 |             elif self.elem_code == 19:
343 |                 return py3_get_string(data[:-1])
344 |             else:
345 |                 return data
346 |         else:
347 |             return None
348 | 


--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | # demo.py
 4 | # demo script for using abifpy to write fasta files from trace files
 5 | 
 6 | import abifpy
 7 | import glob
 8 | 
 9 | counter = 0
10 | 
11 | print("Working...")
12 | 
13 | for trace in glob.iglob('*.ab1'):
14 |     abifpy.Trace(trace, trimming=True).export()
15 |     counter += 1
16 | 
17 | print("Done! Processed {0} trace files.".format(counter))
18 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | from setuptools import setup, find_packages
 4 | 
 5 | from abifpy import __version__
 6 | 
 7 | 
 8 | version = __version__
 9 | long_description = open("README.rst").read()
10 | 
11 | setup(
12 |     name = "abifpy",
13 |     version = version,
14 |     description = "abifpy is a module for reading ABI Sanger sequencing trace files.",
15 |     long_description = long_description,
16 |     author = "Wibowo Arindrarto",
17 |     author_email = "bow@bow.web.id",
18 |     py_modules = ['abifpy'],
19 |     url = "http://github.com/bow/abifpy/",
20 |     license = "MIT",
21 |     zip_safe = False,
22 |     classifiers = [
23 |         "Development Status :: 4 - Beta",
24 |         "Environment :: Console",
25 |         "Intended Audience :: Science/Research",
26 |         "License :: OSI Approved :: MIT License",
27 |         "Programming Language :: Python",
28 |         "Topic :: Scientific/Engineering :: Bio-Informatics",
29 |     ],
30 | )
31 | 


--------------------------------------------------------------------------------
/tests/310.ab1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/310.ab1


--------------------------------------------------------------------------------
/tests/3100.ab1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/3100.ab1


--------------------------------------------------------------------------------
/tests/3730.ab1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/3730.ab1


--------------------------------------------------------------------------------
/tests/empty.ab1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bow/abifpy/6a74bbeec9410a827e8c82f0b2278a70d55cb3ab/tests/empty.ab1


--------------------------------------------------------------------------------
/tests/fake.ab1:
--------------------------------------------------------------------------------
1 | This is a fake trace file
2 | 


--------------------------------------------------------------------------------
/tests/test_abifpy.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | #
  3 | # test_abifpy.py
  4 | # python2 unit tests for abifpy
  5 | # http://github.com/bow/abifpy
  6 | 
  7 | import unittest
  8 | import datetime
  9 | import abifpy
 10 | 
 11 | class TestAbif(unittest.TestCase):
 12 | 
 13 |     def __init__(self, filename):
 14 |         unittest.TestCase.__init__(self, methodName='runTest')
 15 |         self.filename = filename
 16 |         self.abif = abifpy.Trace(self.filename)
 17 |         self.trimmed_seq = self.abif.trim(self.abif.seq)
 18 |         self.untrimmed_seq = self.abif.seq
 19 | 
 20 |     def shortDescription(self):
 21 |         return "Testing %s" % self.filename
 22 | 
 23 |     def runTest(self):
 24 |         self.file_type()
 25 |         self.tag_data()
 26 |         self.tag_data_len()
 27 |         self.tag_data_type()
 28 |         self.trim_is_shorter() 
 29 | 
 30 |     def file_type(self):
 31 |         print "\nChecking file type..."
 32 |         self.abif._handle.seek(0)
 33 |         self.assertEqual(self.abif._handle.read(4), 'ABIF')
 34 | 
 35 |     def tag_data(self):
 36 |         print "Checking tag data parsing..."
 37 |         for key in self.abif.tags:
 38 |             # should be assertIsNotNone(), not available in py2.6
 39 |             if self.abif.tags[key].elem_code != 1024:
 40 |                 self.assertNotEqual(self.abif.get_data(key), None)
 41 | 
 42 |     def tag_data_len(self):
 43 |         print "Checking tag data lengths..."
 44 |         for key in self.abif.tags:
 45 |             code = self.abif.tags[key].elem_code
 46 |             data = self.abif.get_data(key)
 47 |             
 48 |             # 10 & 11 return date/time objects, 12 is not clear, 1024 is ignored
 49 |             # only check for strings, arrays, and numbers
 50 |             if code not in [10, 11, 12, 1024]:
 51 |                 # account for null character in pString and cString
 52 |                 mod = 1 if code in [18, 19] else 0
 53 |                 # if data is int/float, len is always 1
 54 |                 obtained = len(data) if not isinstance(data, (int, float, bool)) else 1
 55 |                 expected = self.abif.tags[key].elem_num - mod
 56 |                 self.assertEqual(obtained, expected)
 57 |                 
 58 |     def tag_data_type(self):
 59 |         print "Checking tag data types..."
 60 |         for key in self.abif.tags:
 61 |             code = self.abif.tags[key].elem_code
 62 |             data = self.abif.get_data(key)
 63 | 
 64 |             # user data should return None
 65 |             if code == 1024:
 66 |                 self.assertEqual(data, None)
 67 |             # check for string return type in tags 2, 18, 19
 68 |             elif code in [2, 18, 19]:
 69 |                 self.assertTrue(isinstance(data, basestring))
 70 |             # check for date return type
 71 |             elif code == 10:
 72 |                 self.assertTrue(isinstance(data, datetime.date))
 73 |             # check for time return type
 74 |             elif code == 11:
 75 |                 self.assertTrue(isinstance(data, datetime.time))
 76 |             elif code == 13:
 77 |                 self.assertTrue(isinstance(data, bool))
 78 |             # check for number return types
 79 |             # some tags' data are still in a tuple of numbers, so will have to
 80 |             # iterate over them
 81 |             elif isinstance(data, tuple):
 82 |                 for item in data:
 83 |                     self.assertTrue(isinstance(item, (int, float)))
 84 |             # otherwise just check for type directly
 85 |             else:
 86 |                 self.assertTrue(isinstance(data, (int, float)))
 87 | 
 88 |     def trim_is_shorter(self):
 89 |         print "Checking trimmed sequence length..."
 90 |         self.assertTrue(len(self.trimmed_seq) < len(self.untrimmed_seq))
 91 | 
 92 | class TestAbifFake(unittest.TestCase):
 93 | 
 94 |     def __init__(self, filename):
 95 |         unittest.TestCase.__init__(self, methodName='runTest')
 96 |         self.filename = filename
 97 | 
 98 |     def shortDescription(self):
 99 |         return "Testing %s" % self.filename
100 | 
101 |     def runTest(self):
102 |         self.fake_file_type()
103 | 
104 |     def fake_file_type(self):
105 |         print "\nIOError is raised if file is not ABIF..."
106 |         self.assertRaises(IOError, abifpy.Trace, self.filename)
107 | 
108 | class TestAbifEmpty(unittest.TestCase):
109 | 
110 |     def __init__(self, filename):
111 |         unittest.TestCase.__init__(self, methodName='runTest')
112 |         self.filename = filename
113 |         self.abif = abifpy.Trace(self.filename)
114 | 
115 |     def shortDescription(self):
116 |         return "Testing %s" % self.filename
117 | 
118 |     def runTest(self):
119 |         self.short_sequence_untrimmed()
120 | 
121 |     def short_sequence_untrimmed(self):
122 |         print "\nValueError is raised if sequence length is shorter than 20."
123 |         self.assertRaises(ValueError, self.abif.trim, self.abif.seq) 
124 | 
125 | 
126 | abif_real = ['3730.ab1', '3100.ab1', '310.ab1',]
127 | abif_fake = ['fake.ab1',]
128 | abif_empty = ['empty.ab1',]
129 | 
130 | def run_suite():
131 |     suite = unittest.TestSuite([TestAbif(n) for n in abif_real])
132 |     suite.addTests(unittest.TestSuite([TestAbifFake(n) for n in abif_fake]))
133 |     suite.addTests(unittest.TestSuite([TestAbifEmpty(n) for n in abif_empty]))
134 |     return suite
135 | 
136 | if __name__ == '__main__':
137 |     runner = unittest.TextTestRunner(verbosity=2)
138 |     runner.run(run_suite())
139 | 


--------------------------------------------------------------------------------
/tests/test_py3_abifpy.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | #
  3 | # test_py3_abifpy.py
  4 | # python3 unit tests for abifpy
  5 | # http://github.com/bow/abifpy
  6 | 
  7 | import unittest
  8 | import datetime
  9 | import abifpy
 10 | 
 11 | class TestAbif(unittest.TestCase):
 12 | 
 13 |     def __init__(self, filename):
 14 |         unittest.TestCase.__init__(self, methodName='runTest')
 15 |         self.filename = filename
 16 |         self.abif = abifpy.Trace(self.filename)
 17 |         self.trimmed_seq = self.abif.trim(self.abif.seq)
 18 |         self.untrimmed_seq = self.abif.seq
 19 | 
 20 |     def shortDescription(self):
 21 |         return "Testing %s" % self.filename
 22 | 
 23 |     def runTest(self):
 24 |         self.file_type()
 25 |         self.tag_data()
 26 |         self.tag_data_len()
 27 |         self.tag_data_type()
 28 |         self.trim_is_shorter() 
 29 | 
 30 |     def file_type(self):
 31 |         print("\nChecking file type...")
 32 |         self.abif._handle.seek(0)
 33 |         self.assertEqual(self.abif._handle.read(4), b'ABIF')
 34 | 
 35 |     def tag_data(self):
 36 |         print("Checking tag data parsing...")
 37 |         for key in self.abif.tags:
 38 |             # user data (elem_code 1024) returns None, so skip it
 39 |             if self.abif.tags[key].elem_code != 1024:
 40 |                 self.assertIsNotNone(self.abif.get_data(key))
 41 | 
 42 |     def tag_data_len(self):
 43 |         print("Checking tag data lengths...")
 44 |         for key in self.abif.tags:
 45 |             code = self.abif.tags[key].elem_code
 46 |             data = self.abif.get_data(key)
 47 |             
 48 |             # 10 & 11 return date and time objects, 12 is unclear, 1024 is ignored
 49 |             # only check for strings, arrays, and numbers
 50 |             if code not in [10, 11, 12, 1024]:
 51 |                 # account for null character in pString and cString
 52 |                 mod = 1 if code in [18, 19] else 0
 53 |                 # if data is int/float, len is always 1
 54 |                 obtained = len(data) if not isinstance(data, (int, float, bool)) else 1
 55 |                 expected = self.abif.tags[key].elem_num - mod
 56 |                 self.assertEqual(obtained, expected)
 57 |                 
 58 |     def tag_data_type(self):
 59 |         print("Checking tag data types...")
 60 |         for key in self.abif.tags:
 61 |             code = self.abif.tags[key].elem_code
 62 |             data = self.abif.get_data(key)
 63 | 
 64 |             # user data should return None
 65 |             if code == 1024:
 66 |                 self.assertIsNone(data)
 67 |             # check for string return type in element codes 2, 18, 19
 68 |             elif code in [2, 18, 19]:
 69 |                 self.assertIsInstance(data, str)
 70 |             # element code 10 should return a date
 71 |             elif code == 10:
 72 |                 self.assertIsInstance(data, datetime.date)
 73 |             # element code 11 should return a time
 74 |             elif code == 11:
 75 |                 self.assertIsInstance(data, datetime.time)
 76 |             elif code == 13:
 77 |                 self.assertIsInstance(data, bool)
 78 |             # check for number return types
 79 |             # some tags' data are still in a tuple of numbers, so will have to
 80 |             # iterate over them
 81 |             elif isinstance(data, tuple):
 82 |                 for item in data:
 83 |                     self.assertIsInstance(item, (int, float))
 84 |             # otherwise just check for type directly
 85 |             else:
 86 |                 self.assertIsInstance(data, (int, float))
 87 | 
 88 |     def trim_is_shorter(self):
 89 |         print("Checking trimmed sequence length...")
 90 |         self.assertLess(len(self.trimmed_seq), len(self.untrimmed_seq))
 91 | 
 92 | class TestAbifFake(unittest.TestCase):
 93 | 
 94 |     def __init__(self, filename):
 95 |         unittest.TestCase.__init__(self, methodName='runTest')
 96 |         self.filename = filename
 97 | 
 98 |     def shortDescription(self):
 99 |         return "Testing %s" % self.filename
100 | 
101 |     def runTest(self):
102 |         self.fake_file_type()
103 | 
104 |     def fake_file_type(self):
105 |         print("\nIOError is raised if file is not ABIF...")
106 |         self.assertRaises(IOError, abifpy.Trace, self.filename)
107 | 
108 | class TestAbifEmpty(unittest.TestCase):
109 | 
110 |     def __init__(self, filename):
111 |         unittest.TestCase.__init__(self, methodName='runTest')
112 |         self.filename = filename
113 |         self.abif = abifpy.Trace(self.filename)
114 | 
115 |     def shortDescription(self):
116 |         return "Testing %s" % self.filename
117 | 
118 |     def runTest(self):
119 |         self.short_sequence_untrimmed()
120 | 
121 |     def short_sequence_untrimmed(self):
122 |         print("\nValueError is raised if sequence length is shorter than 20.")
123 |         self.assertRaises(ValueError, self.abif.trim, self.abif.seq)
124 | 
125 | 
126 | abif_real = ['3730.ab1', '3100.ab1', '310.ab1',]
127 | abif_fake = ['fake.ab1',]
128 | abif_empty = ['empty.ab1',]
129 | 
130 | def run_suite():
131 |     suite = unittest.TestSuite([TestAbif(n) for n in abif_real])
132 |     suite.addTests(unittest.TestSuite([TestAbifFake(n) for n in abif_fake]))
133 |     suite.addTests(unittest.TestSuite([TestAbifEmpty(n) for n in abif_empty]))
134 |     return suite
135 | 
136 | if __name__ == '__main__':
137 |     runner = unittest.TextTestRunner(verbosity=2)
138 |     runner.run(run_suite())
139 | 


--------------------------------------------------------------------------------