Python phoneme alignment representation

├── .gitignore
├── CITATION.cff
├── LICENSE
├── README.md
├── pypar
    ├── __init__.py
    ├── alignment.py
    ├── compare.py
    ├── phoneme.py
    ├── textgrid.py
    └── word.py
├── setup.py
└── test
    ├── assets
        ├── float.json
        ├── test.TextGrid
        ├── test.json
        ├── test.txt
        └── test.wav
    ├── conftest.py
    ├── test_alignment.py
    └── test_compare.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 | __pycache__/
3 | .ipynb_checkpoints/
4 | .pytest_cache/
5 | .vscode/
6 | build/
7 | dist/
8 | 


--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
 1 | cff-version: 1.2.0
 2 | message: "If you use this software, please cite it using the following metadata."
 3 | authors:
 4 | - family-names: "Morrison"
 5 |   given-names: "Max"
 6 | title: "pypar"
 7 | version: 0.0.2
 8 | date-released: 2021-04-03
 9 | url: "https://github.com/maxrmorrison/pypar"
10 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Max Morrison
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <h1 align="center">Python phoneme alignment representation</h1>
  2 | <div align="center">
  3 | 
  4 | [![PyPI](https://img.shields.io/pypi/v/pypar.svg)](https://pypi.python.org/pypi/pypar)
  5 | [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
  6 | [![Downloads](https://static.pepy.tech/badge/pypar)](https://pepy.tech/project/pypar)
  7 | 
  8 | `pip install pypar`
  9 | </div>
 10 | 
 11 | Word and phoneme alignment representation for speech tasks. This repo does
 12 | not perform forced word or phoneme alignment, but provides an interface
 13 | for working with the resulting alignment of a forced aligner, such as
 14 | [`pyfoal`](https://github.com/maxrmorrison/pyfoal), or a manual alignment.
 15 | 
 16 | 
 17 | ## Table of contents
 18 | 
 19 | - [Usage](#usage)
 20 |     * [Creating alignments](#creating-alignments)
 21 |     * [Accessing words and phonemes](#accessing-words-and-phonemes)
 22 |     * [Saving alignments](#saving-alignments)
 23 | - [Application programming interface (API)](#application-programming-interface-api)
 24 |     * [`pypar.Alignment`](#pyparalignment)
 25 |         * [`pypar.Alignment.__init__`](#pyparalignment__init__)
 26 |         * [`pypar.Alignment.__add__`](#pyparalignment__add__)
 27 |         * [`pypar.Alignment.__eq__`](#pyparalignment__eq__)
 28 |         * [`pypar.Alignment.__getitem__`](#pyparalignment__getitem__)
 29 |         * [`pypar.Alignment.__len__`](#pyparalignment__len__)
 30 |         * [`pypar.Alignment.__str__`](#pyparalignment__str__)
 31 |         * [`pypar.Alignment.duration`](#pyparalignmentduration)
 32 |         * [`pypar.Alignment.end`](#pyparalignmentend)
 33 |         * [`pypar.Alignment.find`](#pyparalignmentfind)
 34 |         * [`pypar.Alignment.framewise_phoneme_indices`](#pyparalignmentframewise_phoneme_indices)
 35 |         * [`pypar.Alignment.phonemes`](#pyparalignmentphonemes)
 36 |         * [`pypar.Alignment.phoneme_at_time`](#pyparalignmentphoneme_at_time)
 37 |         * [`pypar.Alignment.phoneme_bounds`](#pyparalignmentphoneme_bounds)
 38 |         * [`pypar.Alignment.save`](#pyparalignmentsave)
 39 |         * [`pypar.Alignment.start`](#pyparalignmentstart)
 40 |         * [`pypar.Alignment.update`](#pyparalignmentupdate)
 41 |         * [`pypar.Alignment.words`](#pyparalignmentwords)
 42 |         * [`pypar.Alignment.word_bounds`](#pyparalignmentword_bounds)
 43 |     * [`pypar.Phoneme`](#pyparphoneme)
 44 |         * [`pypar.Phoneme.__init__`](#pyparphoneme__init__)
 45 |         * [`pypar.Phoneme.__eq__`](#pyparphoneme__eq__)
 46 |         * [`pypar.Phoneme.__str__`](#pyparphoneme__str__)
 47 |         * [`pypar.Phoneme.duration`](#pyparphonemeduration)
 48 |         * [`pypar.Phoneme.end`](#pyparphonemeend)
 49 |         * [`pypar.Phoneme.start`](#pyparphonemestart)
 50 |     * [`pypar.Word`](#pyparword)
 51 |         * [`pypar.Word.__init__`](#pyparword__init__)
 52 |         * [`pypar.Word.__eq__`](#pyparword__eq__)
 53 |         * [`pypar.Word.__getitem__`](#pyparword__getitem__)
 54 |         * [`pypar.Word.__len__`](#pyparword__len__)
 55 |         * [`pypar.Word.__str__`](#pyparword__str__)
 56 |         * [`pypar.Word.duration`](#pyparwordduration)
 57 |         * [`pypar.Word.end`](#pyparwordend)
 58 |         * [`pypar.Word.phoneme_at_time`](#pyparwordphoneme_at_time)
 59 |         * [`pypar.Word.start`](#pyparwordstart)
 60 | - [Tests](#tests)
 61 | 
 62 | ## Usage
 63 | 
 64 | ### Creating alignments
 65 | 
 66 | If you already have the alignment saved to a `json`, `mlf`, or `TextGrid`
 67 | file, pass the name of the file. Valid examples of each format can be found in
 68 | `test/assets/`.
 69 | 
 70 | ```python
 71 | alignment = pypar.Alignment(file)
 72 | ```
 73 | 
 74 | Alignments can be created manually from `Word` and `Phoneme` objects. Start and
 75 | end times are given in seconds.
 76 | 
 77 | ```python
 78 | # Create a word from phonemes
 79 | word = pypar.Word(
 80 |     'THE',
 81 |     [pypar.Phoneme('DH', 0., .03), pypar.Phoneme('AH0', .03, .06)])
 82 | 
 83 | # Create a silence
 84 | silence = pypar.Word(pypar.SILENCE, [pypar.Phoneme(pypar.SILENCE, .06, .16)])
 85 | 
 86 | # Make an alignment
 87 | alignment = pypar.Alignment([word, silence])
 88 | ```
 89 | 
 90 | You can create a new alignment from existing alignments via slicing and
 91 | concatenation.
 92 | 
 93 | ```python
 94 | # Slice
 95 | first_two_words = alignment[:2]
 96 | 
 97 | # Concatenate
 98 | alignment_with_repeat = first_two_words + alignment
 99 | ```
100 | 
101 | 
102 | ### Accessing words and phonemes
103 | 
104 | To retrieve a list of words in the alignment, use `alignment.words()`.
105 | To retrieve a list of phonemes, use `alignment.phonemes()`. The `Alignment`,
106 | `Word`, and `Phoneme` objects all define `.start()`, `.end()`, and
107 | `.duration()` methods, which return the start time, end time, and duration,
108 | respectively. All times are given in units of seconds. All three also define
109 | equality checks via `==` and casting to string with `str()`. `Alignment` and
110 | `Word` objects additionally support iteration, as follows.
111 | 
112 | ```python
113 | # Iterate over words
114 | for word in alignment:
115 | 
116 |     # Access start and end times
117 |     assert word.duration() == word.end() - word.start()
118 | 
119 |     # Iterate over phonemes in word
120 |     for phoneme in word:
121 | 
122 |         # Access string representation
123 |         assert isinstance(str(phoneme), str)
124 | ```
125 | 
126 | To access a word or phoneme at a specific time, pass the time in seconds to
127 | `alignment.word_at_time` or `alignment.phoneme_at_time`.
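
As a rough illustration of what a time lookup does, the sketch below mirrors the inclusive-bounds scan of `Alignment.word_at_time` using plain `(text, start, end)` tuples. The `word_at_time` helper and the tuple layout are hypothetical stand-ins for illustration, not pypar's API; with pypar installed you would call `alignment.word_at_time(time)` directly.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a pypar alignment: (text, start, end) in seconds
Span = Tuple[str, float, float]


def word_at_time(words: List[Span], time: float) -> Optional[Span]:
    """Return the first span whose [start, end] interval contains time"""
    for word in words:
        _, start, end = word
        if start <= time <= end:
            return word
    # Time falls outside the alignment
    return None


words = [('THE', 0., .06), ('<silent>', .06, .16)]
inside = word_at_time(words, .03)
boundary = word_at_time(words, .06)
outside = word_at_time(words, .5)
```

Because both bounds are inclusive, a time that lands exactly on a shared boundary resolves to the earlier word, and a time past the end yields `None`.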
128 | 
129 | To retrieve the frame indices of the start and end of a word or phoneme, pass
130 | the audio sampling rate and hopsize (in samples) to `alignment.word_bounds` or
131 | `alignment.phoneme_bounds`.
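
The conversion from seconds to frame indices can be sketched as follows. This is a minimal stand-alone illustration of the arithmetic used by the bounds methods (`int(time * sample_rate / hopsize)`); the `frame_bounds` helper and its tuple input are hypothetical and not part of pypar.

```python
from typing import List, Tuple


def frame_bounds(
    spans: List[Tuple[float, float]],
    sample_rate: int,
    hopsize: int = 1
) -> List[Tuple[int, int]]:
    """Convert (start, end) times in seconds to frame indices"""
    return [
        (int(start * sample_rate / hopsize), int(end * sample_rate / hopsize))
        for start, end in spans]


# Two spans at a 16 kHz sampling rate with a 160-sample (10 ms) hop
bounds = frame_bounds([(0., .5), (.5, 1.25)], 16000, 160)

# With the default hopsize of 1, the indices are sample positions
samples = frame_bounds([(0., .5)], 16000)
```

Note that `int()` truncates toward zero, so a boundary that falls partway through a frame is assigned to the frame that contains it.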
132 | 
133 | 
134 | ### Saving alignments
135 | 
136 | To save an alignment to disk, use `alignment.save(file)`, where `file` is the
137 | desired filename. `pypar` currently supports saving as a `json` or `TextGrid`
138 | file.
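
For orientation, the sketch below writes and reads back a dictionary in the layout that `Alignment.json()` produces (a `words` list whose entries hold `alignedWord`, `start`, `end`, and `phonemes` as `[text, start, end]` triples). It assumes the `json` file on disk is simply this dictionary serialized with the standard `json` module; that is an assumption of this example, so treat `test/assets/test.json` as the authoritative reference.

```python
import json
import os
import tempfile

# Layout produced by Alignment.json(): one entry per word, with phonemes
# stored as [text, start, end] triples (times in seconds)
alignment = {
    'words': [
        {
            'alignedWord': 'THE',
            'start': 0.,
            'end': .06,
            'phonemes': [['DH', 0., .03], ['AH0', .03, .06]]},
        {
            'alignedWord': '<silent>',
            'start': .06,
            'end': .16,
            'phonemes': [['<silent>', .06, .16]]}]}

# Assumption: the file on disk is this dict serialized with the json module
with tempfile.TemporaryDirectory() as directory:
    file = os.path.join(directory, 'alignment.json')
    with open(file, 'w') as handle:
        json.dump(alignment, handle, indent=4)
    with open(file) as handle:
        restored = json.load(handle)
```

Since `pypar.Alignment` accepts a json dict as well as a filename, a dictionary in this layout should be usable to reconstruct an alignment.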
139 | 
140 | 
141 | ## Application programming interface (API)
142 | 
143 | ### `pypar.Alignment`
144 | 
145 | #### `pypar.Alignment.__init__`
146 | 
147 | ```python
148 | def __init__(
149 |     self,
150 |     alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict]
151 | ) -> None:
152 |     """Create alignment
153 | 
154 |     Arguments
155 |         alignment
156 |             The filename, list of words, or json dict of the alignment
157 |     """
158 | ```
159 | 
160 | 
161 | #### `pypar.Alignment.__add__`
162 | 
163 | ```python
164 | def __add__(self, other):
165 |     """Add alignments by concatenation
166 | 
167 |     Arguments
168 |         other
169 |             The alignment to concatenate
170 | 
171 |     Returns
172 |         The concatenated alignment
173 |     """
174 | ```
175 | 
176 | 
177 | #### `pypar.Alignment.__eq__`
178 | 
179 | ```python
180 | def __eq__(self, other) -> bool:
181 |     """Equality comparison for alignments
182 | 
183 |     Arguments
184 |         other
185 |             The alignment to compare to
186 | 
187 |     Returns
188 |         Whether the alignments are equal
189 |     """
190 | ```
191 | 
192 | 
193 | #### `pypar.Alignment.__getitem__`
194 | 
195 | ```python
196 | def __getitem__(self, idx: Union[int, slice]) -> pypar.Word:
197 |     """Retrieve the idxth word
198 | 
199 |     Arguments
200 |         idx
201 |             The index of the word to retrieve
202 | 
203 |     Returns
204 |         The word at index idx, or a new alignment if idx is a slice
205 |     """
206 | ```
207 | 
208 | 
209 | #### `pypar.Alignment.__len__`
210 | 
211 | ```python
212 | def __len__(self) -> int:
213 |     """Retrieve the number of words
214 | 
215 |     Returns
216 |         The number of words in the alignment
217 |     """
218 | ```
219 | 
220 | 
221 | #### `pypar.Alignment.__str__`
222 | 
223 | ```python
224 | def __str__(self) -> str:
225 |     """Retrieve the text
226 | 
227 |     Returns
228 |         The words in the alignment separated by spaces
229 |     """
230 | ```
231 | 
232 | 
233 | #### `pypar.Alignment.duration`
234 | 
235 | ```python
236 | def duration(self) -> float:
237 |     """Retrieve the duration of the alignment in seconds
238 | 
239 |     Returns
240 |         The duration in seconds
241 |     """
242 | ```
243 | 
244 | 
245 | #### `pypar.Alignment.end`
246 | 
247 | ```python
248 | def end(self) -> float:
249 |     """Retrieve the end time of the alignment in seconds
250 | 
251 |     Returns
252 |         The end time in seconds
253 |     """
254 | ```
255 | 
256 | 
257 | #### `pypar.Alignment.framewise_phoneme_indices`
258 | 
259 | ```python
260 | def framewise_phoneme_indices(
261 |     self,
262 |     phoneme_map: Dict[str, int],
263 |     hopsize: float,
264 |     times: Optional[List[float]] = None
265 | ) -> List[int]:
266 |     """Convert alignment to phoneme indices at regular temporal interval
267 | 
268 |     Arguments
269 |         phoneme_map
270 |             Mapping from phonemes to indices
271 |         hopsize
272 |             Temporal interval between frames in seconds
273 |         times
274 |             Specified times in seconds to sample phonemes
275 |     """
276 | ```
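
The sampling this method performs can be sketched roughly as below, using a flat list of `(phoneme, start, end)` tuples instead of pypar objects. The `framewise_indices` helper is hypothetical, and it looks phonemes up directly rather than going through words as pypar does, so read it as an illustration of the idea rather than the library's implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical stand-in for an alignment: (phoneme, start, end) in seconds
Span = Tuple[str, float, float]


def framewise_indices(
    phonemes: List[Span],
    phoneme_map: Dict[str, int],
    hopsize: float
) -> List[int]:
    """Sample the phoneme index every hopsize seconds"""
    duration = phonemes[-1][2] - phonemes[0][1]
    times = [i * hopsize for i in range(math.ceil(duration / hopsize))]
    indices = []
    for time in times:
        # First phoneme whose inclusive [start, end] interval contains time
        text = next(p for p, start, end in phonemes if start <= time <= end)
        indices.append(phoneme_map[text])
    return indices


phonemes = [('DH', 0., .25), ('AH0', .25, .5), ('<silent>', .5, 1.)]
phoneme_map = {'DH': 0, 'AH0': 1, '<silent>': 2}
indices = framewise_indices(phonemes, phoneme_map, .125)
```

Here a one-second alignment sampled every 0.125 seconds yields eight frames, and a sample time that sits exactly on a boundary is assigned to the earlier phoneme.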
277 | 
278 | 
279 | #### `pypar.Alignment.find`
280 | 
281 | ```python
282 | def find(self, words: str) -> int:
283 |     """Find the words in the alignment
284 | 
285 |     Arguments
286 |         words
287 |             The words to find
288 | 
289 |     Returns
290 |         The index of the start of the words or -1 if not found
291 |     """
292 | ```
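
A minimal sketch of the search, operating on a list of word strings rather than a pypar alignment: silences between words are skipped, and a match never starts on a silence. The stand-alone `find` function and the lowercasing of the query are assumptions made for this illustration.

```python
from typing import List

SILENCE = '<silent>'


def find(aligned: List[str], query: str) -> int:
    """Return the index where the query words begin, or -1 if absent"""
    words = query.lower().split(' ')
    for i, first in enumerate(aligned):
        # A match cannot begin on a silence
        if first.lower() == SILENCE:
            continue
        j, k = 0, 0
        while j < len(words) and i + k < len(aligned):
            text = aligned[i + k].lower()
            k += 1
            # Silences between words do not break a match
            if text == SILENCE:
                continue
            if text != words[j]:
                break
            j += 1
        if j == len(words):
            return i
    return -1


aligned = ['THE', '<silent>', 'QUICK', 'FOX', '<silent>']
start = find(aligned, 'quick fox')
across_silence = find(aligned, 'the quick')
missing = find(aligned, 'fox the')
```

The bounds check in the inner loop matters when a match ends at the last word or is followed only by trailing silence.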
293 | 
294 | 
295 | #### `pypar.Alignment.phonemes`
296 | 
297 | ```python
298 | def phonemes(self) -> List[pypar.Phoneme]:
299 |     """Retrieve the phonemes in the alignment
300 | 
301 |     Returns
302 |         The phonemes in the alignment
303 |     """
304 | ```
305 | 
306 | 
307 | #### `pypar.Alignment.phoneme_at_time`
308 | 
309 | ```python
310 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
311 |     """Retrieve the phoneme spoken at specified time
312 | 
313 |     Arguments
314 |         time
315 |             Time in seconds
316 | 
317 |     Returns
318 |         The phoneme at the given time (or None if time is out of bounds)
319 |     """
320 | ```
321 | 
322 | 
323 | #### `pypar.Alignment.phoneme_bounds`
324 | 
325 | ```python
326 | def phoneme_bounds(
327 |     self,
328 |     sample_rate: int,
329 |     hopsize: int = 1
330 | ) -> List[Tuple[int, int]]:
331 |     """Retrieve the start and end frame index of each phoneme
332 | 
333 |     Arguments
334 |         sample_rate
335 |             The audio sampling rate
336 |         hopsize
337 |             The number of samples between successive frames
338 | 
339 |     Returns
340 |         The start and end indices of the phonemes
341 |     """
342 | ```
343 | 
344 | 
345 | #### `pypar.Alignment.save`
346 | 
347 | ```python
348 | def save(self, filename: Union[str, bytes, os.PathLike]) -> None:
349 |     """Save alignment to json or TextGrid
350 | 
351 |     Arguments
352 |         filename
353 |             The location on disk to save the phoneme alignment
354 |     """
355 | ```
356 | 
357 | 
358 | #### `pypar.Alignment.start`
359 | 
360 | ```python
361 | def start(self) -> float:
362 |     """Retrieve the start time of the alignment in seconds
363 | 
364 |     Returns
365 |         The start time in seconds
366 |     """
367 | ```
368 | 
369 | 
370 | #### `pypar.Alignment.update`
371 | 
372 | ```python
373 | def update(
374 |     self,
375 |     idx: int = 0,
376 |     durations: Optional[List[float]] = None,
377 |     start: Optional[float] = None
378 | ) -> None:
379 |     """Update alignment starting from phoneme index idx
380 | 
381 |     Arguments
382 |         idx
383 |             The index of the first phoneme whose duration is being updated
384 |         durations
385 |             The new phoneme durations, starting from idx
386 |         start
387 |             The start time of the alignment
388 |     """
389 | ```
390 | 
391 | 
392 | #### `pypar.Alignment.words`
393 | 
394 | ```python
395 | def words(self) -> List[pypar.Word]:
396 |     """Retrieve the words in the alignment
397 | 
398 |     Returns
399 |         The words in the alignment
400 |     """
401 | ```
402 | 
403 | 
404 | #### `pypar.Alignment.word_at_time`
405 | 
406 | ```python
407 | def word_at_time(self, time: float) -> Optional[pypar.Word]:
408 |     """Retrieve the word spoken at specified time
409 | 
410 |     Arguments
411 |         time
412 |             Time in seconds
413 | 
414 |     Returns
415 |         The word spoken at the specified time
416 |     """
417 | ```
418 | 
419 | 
420 | ### `pypar.Phoneme`
421 | 
422 | #### `pypar.Phoneme.__init__`
423 | 
424 | ```python
425 | def __init__(self, phoneme: str, start: float, end: float) -> None:
426 |     """Create phoneme
427 | 
428 |     Arguments
429 |         phoneme
430 |             The phoneme
431 |         start
432 |             The start time in seconds
433 |         end
434 |             The end time in seconds
435 |     """
436 | ```
437 | 
438 | 
439 | #### `pypar.Phoneme.__eq__`
440 | 
441 | ```python
442 | def __eq__(self, other) -> bool:
443 |     """Equality comparison for phonemes
444 | 
445 |     Arguments
446 |         other
447 |             The phoneme to compare to
448 | 
449 |     Returns
450 |         Whether the phonemes are equal
451 |     """
452 | ```
453 | 
454 | 
455 | #### `pypar.Phoneme.__str__`
456 | 
457 | ```python
458 | def __str__(self) -> str:
459 |     """Retrieve the phoneme text
460 | 
461 |     Returns
462 |         The phoneme
463 |     """
464 | ```
465 | 
466 | 
467 | #### `pypar.Phoneme.duration`
468 | 
469 | ```python
470 | def duration(self) -> float:
471 |     """Retrieve the phoneme duration
472 | 
473 |     Returns
474 |         The duration in seconds
475 |     """
476 | ```
477 | 
478 | 
479 | #### `pypar.Phoneme.end`
480 | 
481 | ```python
482 | def end(self) -> float:
483 |     """Retrieve the end time of the phoneme in seconds
484 | 
485 |     Returns
486 |         The end time in seconds
487 |     """
488 | ```
489 | 
490 | 
491 | #### `pypar.Phoneme.start`
492 | 
493 | ```python
494 | def start(self) -> float:
495 |     """Retrieve the start time of the phoneme in seconds
496 | 
497 |     Returns
498 |         The start time in seconds
499 |     """
500 | ```
501 | 
502 | 
503 | ### `pypar.Word`
504 | 
505 | #### `pypar.Word.__init__`
506 | 
507 | ```python
508 | def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None:
509 |     """Create word
510 | 
511 |     Arguments
512 |         word
513 |             The word
514 |         phonemes
515 |             The phonemes in the word
516 |     """
517 | ```
518 | 
519 | 
520 | #### `pypar.Word.__eq__`
521 | 
522 | ```python
523 | def __eq__(self, other) -> bool:
524 |     """Equality comparison for words
525 | 
526 |     Arguments
527 |         other
528 |             The word to compare to
529 | 
530 |     Returns
531 |         Whether the words are the same
532 |     """
533 | ```
534 | 
535 | 
536 | #### `pypar.Word.__getitem__`
537 | 
538 | ```python
539 | def __getitem__(self, idx: int) -> pypar.Phoneme:
540 |     """Retrieve the idxth phoneme
541 | 
542 |     Arguments
543 |         idx
544 |             The index of the phoneme to retrieve
545 | 
546 |     Returns
547 |         The phoneme at index idx
548 |     """
549 | ```
550 | 
551 | 
552 | #### `pypar.Word.__len__`
553 | 
554 | ```python
555 | def __len__(self) -> int:
556 |     """Retrieve the number of phonemes
557 | 
558 |     Returns
559 |         The number of phonemes
560 |     """
561 | ```
562 | 
563 | 
564 | #### `pypar.Word.__str__`
565 | 
566 | ```python
567 | def __str__(self) -> str:
568 |     """Retrieve the word text
569 | 
570 |     Returns
571 |         The word text
572 |     """
573 | ```
574 | 
575 | 
576 | #### `pypar.Word.duration`
577 | 
578 | ```python
579 | def duration(self) -> float:
580 |     """Retrieve the word duration in seconds
581 | 
582 |     Returns
583 |         The duration in seconds
584 |     """
585 | ```
586 | 
587 | 
588 | #### `pypar.Word.end`
589 | 
590 | ```python
591 | def end(self) -> float:
592 |     """Retrieve the end time of the word in seconds
593 | 
594 |     Returns
595 |         The end time in seconds
596 |     """
597 | ```
598 | 
599 | 
600 | #### `pypar.Word.phoneme_at_time`
601 | 
602 | ```python
603 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
604 |     """Retrieve the phoneme at the specified time
605 | 
606 |     Arguments
607 |         time
608 |             Time in seconds
609 | 
610 |     Returns
611 |         The phoneme at the given time (or None if time is out of bounds)
612 |     """
613 | ```
614 | 
615 | 
616 | #### `pypar.Word.start`
617 | 
618 | ```python
619 | def start(self) -> float:
620 |     """Retrieve the start time of the word in seconds
621 | 
622 |     Returns
623 |         The start time in seconds
624 |     """
625 | ```
626 | 
627 | 
628 | ## Tests
629 | 
630 | Tests can be run as follows.
631 | 
632 | ```
633 | pip install pytest
634 | pytest
635 | ```
636 | 


--------------------------------------------------------------------------------
/pypar/__init__.py:
--------------------------------------------------------------------------------
1 | from .phoneme import Phoneme
2 | from .word import Word
3 | from .alignment import Alignment
4 | from . import compare
5 | from . import textgrid
6 | 
7 | SILENCE = '<silent>'
8 | 


--------------------------------------------------------------------------------
/pypar/alignment.py:
--------------------------------------------------------------------------------
  1 | import copy
  2 | import json
  3 | import math
  4 | import os
  5 | from pathlib import Path
  6 | from typing import Dict, List, Optional, Tuple, Union
  7 | 
  8 | import pypar
  9 | 
 10 | 
 11 | ###############################################################################
 12 | # Alignment representation
 13 | ###############################################################################
 14 | 
 15 | 
 16 | class Alignment:
 17 |     """Word and phoneme alignment"""
 18 | 
 19 |     def __init__(
 20 |         self,
 21 |         alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict]
 22 |     ) -> None:
 23 |         """Create alignment
 24 | 
 25 |         Arguments
 26 |             alignment
 27 |                 The filename, list of words, or json dict of the alignment
 28 |         """
 29 |         if isinstance(alignment, str):
 30 | 
 31 |             # Load alignment from disk
 32 |             self._words = self.load(alignment)
 33 | 
 34 |         elif isinstance(alignment, Path):
 35 | 
 36 |             # Cast and load
 37 |             self._words = self.load(str(alignment))
 38 | 
 39 |         elif isinstance(alignment, list):
 40 |             self._words = alignment
 41 | 
 42 |             # Require first word to start at 0 seconds
 43 |             self.update(start=0.)
 44 | 
 45 |         elif isinstance(alignment, dict):
 46 |             self._words = self.parse_json(alignment)
 47 | 
 48 |         # Ensure there are no gaps (by filling with silence)
 49 |         self.validate()
 50 | 
 51 |     def __add__(self, other):
 52 |         """Add alignments by concatenation
 53 | 
 54 |         Arguments
 55 |             other
 56 |                 The alignment to concatenate
 57 | 
 58 |         Returns
 59 |             The concatenated alignment
 60 |         """
 61 |         # Don't change original
 62 |         other = copy.deepcopy(other)
 63 | 
 64 |         # Move start time of other to end of self
 65 |         other.update(start=self.end())
 66 | 
 67 |         # Concatenate word lists
 68 |         return Alignment(self._words + other.words())
 69 | 
 70 |     def __eq__(self, other) -> bool:
 71 |         """Equality comparison for alignments
 72 | 
 73 |         Arguments
 74 |             other
 75 |                 The alignment to compare to
 76 | 
 77 |         Returns
 78 |             Whether the alignments are equal
 79 |         """
 80 |         return \
 81 |             len(self) == len(other) and \
 82 |             all(word == other_word for word, other_word in zip(self, other))
 83 | 
 84 |     def __getitem__(self, idx: Union[int, slice]) -> pypar.Word:
 85 |         """Retrieve the idxth word
 86 | 
 87 |         Arguments
 88 |             idx
 89 |                 The index of the word to retrieve
 90 | 
 91 |         Returns
 92 |             The word at index idx, or a new alignment if idx is a slice
 93 |         """
 94 |         if isinstance(idx, slice):
 95 | 
 96 |             # Slice into word list
 97 |             return Alignment(copy.deepcopy(self._words[idx]))
 98 | 
 99 |         # Retrieve a single word
100 |         return self._words[idx]
101 | 
102 |     def __len__(self) -> int:
103 |         """Retrieve the number of words
104 | 
105 |         Returns
106 |             The number of words in the alignment
107 |         """
108 |         return len(self._words)
109 | 
110 |     def __str__(self) -> str:
111 |         """Retrieve the text
112 | 
113 |         Returns
114 |             The words in the alignment separated by spaces
115 |         """
116 |         return ' '.join([str(word) for word in self._words
117 |                          if str(word) != pypar.SILENCE])
118 | 
119 |     def duration(self) -> float:
120 |         """Retrieve the duration of the alignment in seconds
121 | 
122 |         Returns
123 |             The duration in seconds
124 |         """
125 |         return self.end() - self.start()
126 | 
127 |     def end(self) -> float:
128 |         """Retrieve the end time of the alignment in seconds
129 | 
130 |         Returns
131 |             The end time in seconds
132 |         """
133 |         return self._words[-1].end()
134 | 
135 |     def find(self, words: str) -> int:
136 |         """Find the words in the alignment
137 | 
138 |         Arguments
139 |             words
140 |                 The words to find
141 | 
142 |         Returns
143 |             The index of the start of the words or -1 if not found
144 |         """
145 |         # Split at spaces; lowercase to match the comparison below
146 |         words = words.lower().split(' ')
147 | 
148 |         for i in range(0, len(self._words) - len(words) + 1):
149 | 
150 |             # Get text
151 |             text = str(self._words[i]).lower()
152 | 
153 |             # Skip silence
154 |             if text == pypar.SILENCE:
155 |                 continue
156 | 
157 |             # j indexes the query words; k is the offset into the alignment
158 |             j, k = 0, 0
159 |             while j < len(words) and i + k < len(self._words):
160 | 
161 |                 # Get the next aligned word (bounds checked by the loop)
162 |                 text = str(self._words[i + k]).lower()
163 |                 k += 1
164 | 
165 |                 # Skip silence
166 |                 if text == pypar.SILENCE:
167 |                     continue
168 | 
169 |                 # Compare words
170 |                 if text != words[j]:
171 |                     break
172 |                 j += 1
173 | 
174 |             # Found match; return indices
175 |             if j == len(words):
176 |                 return i
177 | 
178 |         # No match
179 |         return -1
180 | 
181 |     def framewise_phoneme_indices(
182 |         self,
183 |         phoneme_map: Dict[str, int],
184 |         hopsize: float,
185 |         times: Optional[List[float]] = None
186 |     ) -> List[int]:
187 |         """Convert alignment to phoneme indices at regular temporal interval
188 | 
189 |         Arguments
190 |             phoneme_map
191 |                 Mapping from phonemes to indices
192 |             hopsize
193 |                 Temporal interval between frames in seconds
194 |             times
195 |                 Specified times in seconds to sample phonemes
196 |         """
197 |         if times is None:
198 |             times = [
199 |                 i * hopsize for i in
200 |                 range(math.ceil(self.duration() / hopsize))]
201 |         phonemes = [self.phoneme_at_time(time) for time in times]
202 |         return [phoneme_map[str(phoneme)] for phoneme in phonemes]
203 | 
204 |     def phonemes(self) -> List[pypar.Phoneme]:
205 |         """Retrieve the phonemes in the alignment
206 | 
207 |         Returns
208 |             The phonemes in the alignment
209 |         """
210 |         return [phoneme for word in self for phoneme in word]
211 | 
212 |     def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
213 |         """Retrieve the phoneme spoken at specified time
214 | 
215 |         Arguments
216 |             time
217 |                 Time in seconds
218 | 
219 |         Returns
220 |             The phoneme at the given time (or None if time is out of bounds)
221 |         """
222 |         word = self.word_at_time(time)
223 |         return word.phoneme_at_time(time) if word else None
224 | 
225 |     def phoneme_bounds(
226 |         self,
227 |         sample_rate: int,
228 |         hopsize: int = 1
229 |     ) -> List[Tuple[int, int]]:
230 |         """Retrieve the start and end frame index of each phoneme
231 | 
232 |         Arguments
233 |             sample_rate
234 |                 The audio sampling rate
235 |             hopsize
236 |                 The number of samples between successive frames
237 | 
238 |         Returns
239 |             The start and end indices of the phonemes
240 |         """
241 |         bounds = [(p.start(), p.end()) for p in self.phonemes()
242 |                   if str(p) != pypar.SILENCE]
243 |         return [(int(a * sample_rate / hopsize),
244 |                  int(b * sample_rate / hopsize))
245 |                 for a, b in bounds]
246 | 
247 |     def save(self, filename: Union[str, bytes, os.PathLike]) -> None:
248 |         """Save alignment to json or TextGrid
249 | 
250 |         Arguments
251 |             filename
252 |                 The location on disk to save the phoneme alignment
253 |         """
254 |         if os.path.dirname(filename):
255 |             os.makedirs(os.path.dirname(filename), exist_ok=True)
256 |         if isinstance(filename, Path):
257 |             filename = str(filename)
258 |         extension = filename.split('.')[-1]
259 |         if extension == 'json':
260 |             self.save_json(filename)
261 |         elif extension.lower() == 'textgrid':
262 |             self.save_textgrid(filename)
263 |         else:
264 |             raise ValueError(
265 |                 f'No save routine for files with extension {extension}')
266 | 
267 |     def start(self) -> float:
268 |         """Retrieve the start time of the alignment in seconds
269 | 
270 |         Returns
271 |             The start time in seconds
272 |         """
273 |         return self._words[0].start()
274 | 
275 |     def update(
276 |         self,
277 |         idx: int = 0,
278 |         durations: Optional[List[float]] = None,
279 |         start: Optional[float] = None
280 |     ) -> None:
281 |         """Update alignment starting from phoneme index idx
282 | 
283 |         Arguments
284 |             idx
285 |                 The index of the first phoneme whose duration is being updated
286 |             durations
287 |                 The new phoneme durations, starting from idx
288 |             start
289 |                 The start time of the alignment
290 |         """
291 |         # If durations are not given, just update phoneme start and end times
292 |         durations = [] if durations is None else durations
293 | 
294 |         # Word start time (in seconds) and phoneme start index
295 |         start = self.start() if start is None else start
296 |         start_phoneme = 0
297 | 
298 |         # Update each word
299 |         for word in self:
300 |             end_phoneme = start_phoneme + len(word)
301 | 
302 |             # Update phoneme alignment of this word
303 |             word = self.update_word(
304 |                 word, idx, durations, start, start_phoneme, end_phoneme)
305 | 
306 |             start = word.end()
307 |             start_phoneme += len(word)
308 | 
309 |     def words(self) -> List[pypar.Word]:
310 |         """Retrieve the words in the alignment
311 | 
312 |         Returns
313 |             The words in the alignment
314 |         """
315 |         return self._words
316 | 
317 |     def word_at_time(self, time: float) -> Optional[pypar.Word]:
318 |         """Retrieve the word spoken at specified time
319 | 
320 |         Arguments
321 |             time
322 |                 Time in seconds
323 | 
324 |         Returns
325 |             The word at the given time (or None if time is out of bounds)
326 |         """
327 |         for word in self:
328 |             if word.start() <= time <= word.end():
329 |                 return word
330 |         return None
331 | 
332 |     def word_bounds(
333 |         self,
334 |         sample_rate: int,
335 |         hopsize: int = 1,
336 |         silences: bool = False
337 |     ) -> List[Tuple[int, int]]:
338 |         """Retrieve the start and end frame index of each word
339 | 
340 |         Arguments
341 |             sample_rate
342 |                 The audio sampling rate
343 |             hopsize
344 |                 The number of samples between successive frames
345 |             silences
346 |                 Whether to include silences as words
347 | 
348 |         Returns
349 |             The start and end indices of the words
350 |         """
351 |         words = [
352 |             word for word in self if str(word) != pypar.SILENCE or silences]
353 |         bounds = [(word.start(), word.end()) for word in words]
354 |         return [(int(a * sample_rate / hopsize),
355 |                  int(b * sample_rate / hopsize))
356 |                 for a, b in bounds]
357 | 
358 |     ###########################################################################
359 |     # Utilities
360 |     ###########################################################################
361 | 
362 |     def json(self):
363 |         """Convert to json format"""
364 |         words = []
365 |         for word in self._words:
366 | 
367 |             # Convert phonemes to list
368 |             phonemes = [[str(phoneme), phoneme.start(), phoneme.end()]
369 |                         for phoneme in word]
370 | 
371 |             # Convert word to dict format
372 |             words.append({'alignedWord': str(word),
373 |                           'start': word.start(),
374 |                           'end': word.end(),
375 |                           'phonemes': phonemes})
376 | 
377 |         return {'words': words}
378 | 
379 |     def line_is_valid(self, line):
380 |         """Check if a line of an mlf file represents a phoneme"""
381 |         line = line.strip().split()
382 |         if not line:
383 |             return False
384 |         return len(line) in [4, 5]
385 | 
386 |     def load(self, file):
387 |         """Load alignment from file"""
388 |         extension = file.split('.')[-1]
389 |         if extension == 'mlf':
390 |             return self.load_mlf(file)
391 |         if extension == 'json':
392 |             return self.load_json(file)
393 |         if extension.lower() == 'textgrid':
394 |             return self.load_textgrid(file)
395 |         raise ValueError(
396 |             f'No alignment representation for file extension {extension}')
397 | 
398 |     def load_json(self, filename):
399 |         """Load alignment from json file"""
400 |         # Load from json file
401 |         with open(filename) as file:
402 |             return self.parse_json(json.load(file))
403 | 
404 |     def load_mlf(self, filename):
405 |         """Load from mlf file"""
406 |         # Load file from disk
407 |         with open(filename) as file:
408 |             # Read in phoneme alignment
409 |             lines = [Line(line) for line in file.readlines()
410 |                      if self.line_is_valid(line)]
411 | 
412 |             # Remove silence tokens with 0 duration
413 |             lines = [line for line in lines if line.start < line.end]
414 | 
415 |         # Extract words and phonemes
416 |         phonemes = []
417 |         words = []
418 |         for line in lines:
419 | 
420 |             # Start new word
421 |             if line.word is not None:
422 | 
423 |                 # Add word that just finished
424 |                 if phonemes:
425 |                     words.append(pypar.Word(word, phonemes))
426 |                     phonemes = []
427 | 
428 |                 word = line.word
429 | 
430 |             # Add a phoneme
431 |             phonemes.append(pypar.Phoneme(line.phoneme, line.start, line.end))
432 | 
433 |         # Handle last word
434 |         if phonemes:
435 |             words.append(pypar.Word(word, phonemes))
436 | 
437 |         return words
438 | 
439 |     def load_textgrid(self, filename):
440 |         """Load from textgrid file"""
441 |         # Load file
442 |         grid = pypar.textgrid.TextGrid.fromFile(filename)
443 | 
444 |         # Get phoneme and word representations
445 |         if 'word' in grid[0].name and 'phon' in grid[1].name:
446 |             word_tier, phon_tier = grid[0], grid[1]
447 |         elif 'phon' in grid[0].name and 'word' in grid[1].name:
448 |             phon_tier, word_tier = grid[0], grid[1]
449 |         else:
450 |             raise ValueError(
451 |                 'Cannot determine which TextGrid tiers ' +
452 |                 'correspond to words and phonemes')
453 | 
454 |         # Iterate over words
455 |         words = []
456 |         phon_idx = 0
457 |         for word in word_tier:
458 | 
459 |             # Get all phonemes for this word
460 |             phonemes = []
461 |             while (
462 |                 phon_idx < len(phon_tier) and
463 |                 phon_tier[phon_idx].maxTime <= word.maxTime
464 |             ):
465 |                 phonemes.append(
466 |                     pypar.Phoneme(
467 |                         phon_tier[phon_idx].mark,
468 |                         phon_tier[phon_idx].minTime,
469 |                         phon_tier[phon_idx].maxTime))
470 |                 phon_idx += 1
471 | 
472 |             # Add finished word
473 |             words.append(pypar.Word(word.mark, phonemes))
474 | 
475 |         return words
476 | 
477 |     def parse_json(self, alignment):
478 |         """Construct word list from json representation"""
479 |         words = []
480 |         for word in alignment['words']:
481 |             try:
482 | 
483 |                 # Add a word
484 |                 phonemes = [
485 |                     pypar.Phoneme(*phoneme) for phoneme in word['phonemes']]
486 |                 words.append(pypar.Word(word['alignedWord'], phonemes))
487 | 
488 |             except KeyError:
489 | 
490 |                 # Add a silence
491 |                 phonemes = [
492 |                     pypar.Phoneme(pypar.SILENCE, word['start'], word['end'])]
493 |                 words.append(pypar.Word(pypar.SILENCE, phonemes))
494 | 
495 |         return words
496 | 
497 |     def save_json(self, filename):
498 |         """Save alignment as json"""
499 |         with open(filename, 'w', encoding='utf-8') as file:
500 |             json.dump(self.json(), file, ensure_ascii=False, indent=4)
501 | 
502 |     def save_textgrid(self, filename):
503 |         """Save alignment as textgrid"""
504 |         # Construct phoneme tier
505 |         phon_tier = pypar.textgrid.IntervalTier('phone')
506 |         for phoneme in self.phonemes():
507 |             phon_tier.add(phoneme.start(), phoneme.end(), str(phoneme))
508 | 
509 |         # Construct word tier
510 |         word_tier = pypar.textgrid.IntervalTier('word')
511 |         for word in self:
512 |             word_tier.add(word.start(), word.end(), str(word))
513 | 
514 |         # Save textgrid
515 |         pypar.textgrid.TextGrid([phon_tier, word_tier]).write(filename)
516 | 
517 |     def update_word(
518 |         self,
519 |         word,
520 |         idx,
521 |         durations,
522 |         start,
523 |         start_phoneme,
524 |         end_phoneme):
525 |         """Update the phoneme alignment of one word"""
526 |         # All phonemes beyond (and including) idx must be updated
527 |         if end_phoneme > idx:
528 | 
529 |             # Retrieve current phoneme durations for word
530 |             word_durations = [phoneme.duration() for phoneme in word]
531 | 
532 |             # The first len(durations) phonemes use new durations
533 |             if start_phoneme - idx < len(durations) and end_phoneme - idx > 0:
534 | 
535 |                 # Get indices into durations for copy/paste operation
536 |                 src_start_idx = max(0, start_phoneme - idx)
537 |                 src_end_idx = min(len(durations), end_phoneme - idx)
538 |                 src = durations[src_start_idx:src_end_idx]
539 | 
540 |                 # Case 1: replace all phonemes in word
541 |                 if len(src) == len(word_durations):
542 |                     dst_start_idx, dst_end_idx = 0, len(word_durations)
543 | 
544 |                 # Case 2: replace right-most phonemes in word
545 |                 elif idx > start_phoneme and len(src) == end_phoneme - idx:
546 |                     dst_start_idx = len(word_durations) - len(src)
547 |                     dst_end_idx = len(word_durations)
548 | 
549 |                 # Case 3: replace left-most phonemes in word
550 |                 elif idx <= start_phoneme:
551 |                     dst_start_idx = 0
552 |                     dst_end_idx = len(src)
553 | 
554 |                 # Case 4: replace phonemes in center of word
555 |                 else:
556 |                     dst_start_idx = -(start_phoneme - idx)
557 |                     dst_end_idx = dst_start_idx + len(src)
558 | 
559 |                 # Perform copy/paste on duration vector
560 |                 word_durations[dst_start_idx:dst_end_idx] = \
561 |                     durations[src_start_idx:src_end_idx]
562 | 
563 |             # Get new durations for word
564 |             word.update(start, word_durations)
565 | 
566 |         return word
567 | 
568 |     def validate(self):
569 |         """Ensures that adjacent start/stop times are valid by adding silence"""
570 |         i = 0
571 |         start = 0.
572 |         while i < len(self):
573 | 
574 |             # Gap between the previous word's end and this word's start
575 |             end = self[i].start()
576 | 
577 |             # Patch gap with silence
578 |             if end - start > 1e-3:
579 | 
580 |                 # Extend existing silence if possible
581 |                 if str(self[i]) == pypar.SILENCE:
582 |                     self[i][0]._start = start
583 |                 else:
584 |                     word = pypar.Word(
585 |                         pypar.SILENCE,
586 |                         [pypar.Phoneme(pypar.SILENCE, start, end)])
587 |                     self._words.insert(i, word)
588 |                     i += 1
589 | 
590 |             start = self[i].end()
591 |             i += 1
592 | 
593 |         # Phoneme gap validation
594 |         for word in self:
595 |             word.validate()
596 | 
597 | 
598 | ###############################################################################
599 | # Utilities
600 | ###############################################################################
601 | 
602 | 
603 | class Line:
604 |     """One line of an HTK mlf file (times are in units of 100 ns)"""
605 | 
606 |     def __init__(self, line):
607 |         line = line.strip().split()
608 | 
609 |         if len(line) == 4:
610 |             start, end, self.phoneme, _ = line
611 |             self.word = None
612 |         else:
613 |             start, end, self.phoneme, _, self.word = line
614 | 
615 |         self.start = float(start) / 10000000.
616 |         self.end = float(end) / 10000000.
617 | 
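
The `Line` class above divides HTK label times by 10,000,000, which matches HTK's convention of storing times in units of 100 ns. The following is a minimal standalone sketch of the same parsing, not part of pypar: the names `MlfEntry`, `parse_mlf_line`, and `HTK_UNITS_PER_SECOND` are hypothetical and only illustrate the 4-field and 5-field line layouts that `line_is_valid` accepts.

```python
from typing import NamedTuple, Optional

# Assumption: HTK label times are integers in units of 100 ns
HTK_UNITS_PER_SECOND = 10_000_000


class MlfEntry(NamedTuple):
    """One parsed phoneme line (hypothetical helper type)"""
    phoneme: str
    start: float
    end: float
    word: Optional[str]


def parse_mlf_line(line: str) -> Optional[MlfEntry]:
    """Parse 'start end phoneme score [word]'; return None for other lines"""
    fields = line.strip().split()
    if len(fields) not in (4, 5):
        return None

    # The fifth field is only present on the first phoneme of a word
    word = fields[4] if len(fields) == 5 else None
    return MlfEntry(
        fields[2],
        float(fields[0]) / HTK_UNITS_PER_SECOND,
        float(fields[1]) / HTK_UNITS_PER_SECOND,
        word)


first = parse_mlf_line('0 2500000 dh -123.4 the')
second = parse_mlf_line('2500000 5000000 ax -50.0')
```

Here `first.word` is `'the'` and `second.word` is `None`, so a caller can start a new word whenever `word` is not `None`, as `load_mlf` does.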


--------------------------------------------------------------------------------
/pypar/compare.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | ###############################################################################
 5 | # Alignment comparisons
 6 | ###############################################################################
 7 | 
 8 | 
 9 | def per_frame_rate(
10 |     alignment_a,
11 |     alignment_b,
12 |     sample_rate,
13 |     hopsize,
14 |     frames=None):
15 |     """Compute the per-frame rate difference between alignments A and B
16 | 
17 |     Arguments
18 |         alignment_a : Alignment
19 |             The source alignment
20 |         alignment_b : Alignment
21 |             The target alignment
22 |         sample_rate : int
23 |             The audio sampling rate
24 |         hopsize : int
25 |             The number of samples between successive frames
 26 |         frames : int or None
27 |             The number of frames of audio. May vary based on padding.
28 | 
29 |     Returns
30 |         rates : list[float]
31 |             The frame-wise relative speed of alignment B to alignment A
32 |     """
33 |     # Create dict mapping phoneme to relative rate
34 |     rates_per_phoneme = per_phoneme_rate(alignment_a, alignment_b)
35 |     dict_keys = [phoneme_tuple(phoneme) for phoneme in alignment_a.phonemes()]
36 |     rate_map = dict(zip(dict_keys, rates_per_phoneme))
37 | 
 38 |     # Query the dict every hopsize samples
39 |     if frames is None:
40 |         frames = 1 + int(round(alignment_a.end(), 6) * sample_rate / hopsize)
41 |     return [rate_map[phoneme_tuple(alignment_a.phoneme_at_time(t))]
42 |             for t in np.linspace(0., alignment_a.end(), frames)]
43 | 
44 | 
45 | def per_phoneme_rate(alignment_a, alignment_b):
46 |     """Compute the per-phoneme rate difference between alignments A and B
47 | 
48 |     Arguments
49 |         alignment_a : Alignment
50 |             The source alignment
51 |         alignment_b : Alignment
52 |             The target alignment
53 | 
54 |     Returns
55 |         rates : list[float]
56 |             The phoneme-wise relative speed of alignment B to alignment A
57 |     """
58 |     # Error check alignments
59 |     if len(alignment_a.phonemes()) != len(alignment_b.phonemes()):
60 |         raise ValueError('Alignments must have same number of phonemes')
61 | 
62 |     iterator = zip(alignment_a.phonemes(), alignment_b.phonemes())
63 |     return [target.duration() / source.duration()
64 |             for source, target in iterator]
65 | 
66 | 
67 | ###############################################################################
 68 | # Utilities
69 | ###############################################################################
70 | 
71 | 
72 | def phoneme_tuple(phoneme):
73 |     """Convert phoneme to hashable tuple representation
74 | 
75 |     Arguments
76 |         phoneme - The phoneme to convert
77 | 
78 |     Returns
79 |         tuple(float, float, string)
80 |             The phoneme represented as a tuple
81 |     """
82 |     return (phoneme.start(), phoneme.end(), str(phoneme))
83 | 
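
As a rough illustration of what `per_phoneme_rate` computes, the sketch below applies the same target-over-source duration ratio to plain `(start, end)` tuples instead of `pypar.Phoneme` objects. The function name `duration_ratios` and the tuple representation are assumptions made for this example only.

```python
def duration_ratios(source, target):
    """Ratio of target to source duration for each pair of (start, end) spans"""
    if len(source) != len(target):
        raise ValueError('Alignments must have same number of phonemes')

    return [
        (t_end - t_start) / (s_end - s_start)
        for (s_start, s_end), (t_start, t_end) in zip(source, target)]


# Two phonemes: the first is stretched, the second is compressed
source = [(0.0, 0.5), (0.5, 1.0)]
target = [(0.0, 1.0), (1.0, 1.25)]
ratios = duration_ratios(source, target)
```

A ratio above 1 means the phoneme lasts longer in the target alignment than in the source, and a ratio below 1 means it is shorter.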


--------------------------------------------------------------------------------
/pypar/phoneme.py:
--------------------------------------------------------------------------------
 1 | ###############################################################################
 2 | # Phoneme
 3 | ###############################################################################
 4 | 
 5 | 
 6 | class Phoneme:
 7 |     """Aligned phoneme representation"""
 8 | 
 9 |     def __init__(self, phoneme: str, start: float, end: float) -> None:
10 |         """Create phoneme
11 | 
12 |         Arguments
13 |             phoneme
14 |                 The phoneme
15 |             start
16 |                 The start time in seconds
17 |             end
18 |                 The end time in seconds
19 |         """
20 |         self.phoneme = phoneme
21 |         self._start = start
22 |         self._end = end
23 | 
24 |     def __eq__(self, other) -> bool:
25 |         """Equality comparison for phonemes
26 | 
27 |         Arguments
28 |             other
29 |                 The phoneme to compare to
30 | 
31 |         Returns
32 |             Whether the phonemes are equal
33 |         """
34 |         return \
35 |             str(self) == str(other) and \
36 |             abs(self._start - other._start) < 1e-5 and \
37 |             abs(self._end - other._end) < 1e-5
38 | 
39 |     def __str__(self) -> str:
40 |         """Retrieve the phoneme text
41 | 
42 |         Returns
43 |             The phoneme
44 |         """
45 |         return self.phoneme
46 | 
47 |     def duration(self) -> float:
48 |         """Retrieve the phoneme duration
49 | 
50 |         Returns
51 |             The duration in seconds
52 |         """
53 |         return self._end - self._start
54 | 
55 |     def end(self) -> float:
56 |         """Retrieve the end time of the phoneme in seconds
57 | 
58 |         Returns
59 |             The end time in seconds
60 |         """
61 |         return self._end
62 | 
63 |     def start(self) -> float:
64 |         """Retrieve the start time of the phoneme in seconds
65 | 
66 |         Returns
67 |             The start time in seconds
68 |         """
69 |         return self._start
70 | 
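
`Phoneme.__eq__` compares start and end times with a tolerance of 1e-5 rather than exact equality. A small sketch of why that matters for times produced by floating-point arithmetic, using a hypothetical helper `times_match` that is not part of pypar:

```python
TOLERANCE = 1e-5  # same threshold as Phoneme.__eq__ above


def times_match(a: float, b: float, tolerance: float = TOLERANCE) -> bool:
    """Whether two times in seconds agree to within the tolerance"""
    return abs(a - b) < tolerance


# Accumulated durations rarely compare exactly equal
accumulated = 0.1 + 0.2
exact = accumulated == 0.3
close = times_match(accumulated, 0.3)
```

In this sketch `exact` is `False` while `close` is `True`, which is the situation that arises when phoneme times are rebuilt from summed durations.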


--------------------------------------------------------------------------------
/pypar/textgrid.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | 
  3 | 
  4 | ###############################################################################
  5 | # Textgrid
  6 | ###############################################################################
  7 | 
  8 | 
  9 | class TextGrid:
 10 | 
 11 |     def __init__(self, tiers=None):
 12 |         self.tiers = [] if tiers is None else tiers
 13 | 
 14 |     def __len__(self):
 15 |         return len(self.tiers)
 16 | 
 17 |     def __getitem__(self, i):
 18 |         return self.tiers[i]
 19 | 
 20 |     def read(self, file):
 21 |         # Open file
 22 |         with open(file) as file:
 23 | 
 24 |             # Parse header
 25 |             _, short = parse_header(file)
 26 |             first_line_beside_header = file.readline()
 27 |             try:
 28 |                 parse_line(first_line_beside_header, short)
 29 |             except Exception:
 30 |                 short = True
 31 |             parse_line(first_line_beside_header, short)
 32 |             parse_line(file.readline(), short)
 33 |             file.readline()
 34 |             if short:
 35 |                 tiers = int(file.readline().strip())
 36 |             else:
 37 |                 tiers = int(file.readline().strip().split()[2])
 38 |             if not short:
 39 |                 file.readline()
 40 | 
 41 |             # Iterate over tiers
 42 |             for _ in range(tiers):
 43 | 
 44 |                 # Skip the "item [n]:" line (long format only)
 45 |                 if not short:
 46 |                     file.readline()
 47 | 
 48 |                 # Create interval tier
 49 |                 if parse_line(file.readline(), short) == 'IntervalTier':
 50 | 
 51 |                     # Initialize
 52 |                     name = parse_line(file.readline(), short)
 53 |                     tier = IntervalTier(name)
 54 | 
 55 |                     # Flush tier min/max time
 56 |                     parse_line(file.readline(), short)
 57 |                     parse_line(file.readline(), short)
 58 | 
 59 |                     # Populate
 60 |                     for _ in range(int(parse_line(file.readline(), short))):
 61 |                         if not short:
 62 |                             file.readline()
 63 |                         minTime = parse_line(file.readline(), short)
 64 |                         maxTime = parse_line(file.readline(), short)
 65 |                         mark = parseMark(file, short)
 66 |                         if minTime < maxTime:
 67 |                             tier.add(minTime, maxTime, mark)
 68 |                     self.tiers.append(tier)
 69 | 
 70 |                 else:
 71 |                     raise ValueError('Only interval tiers are supported')
 72 | 
 73 |     def write(self, file):
 74 |         with open(file, 'w') as file:
 75 |             # Write header
 76 |             file.write('File type = "ooTextFile"\n')
 77 |             file.write('Object class = "TextGrid"\n\n')
 78 |             file.write('xmin = {0}\n'.format(self.tiers[0][0].minTime))
 79 |             file.write('xmax = {0}\n'.format(self.tiers[0][-1].maxTime))
 80 |             file.write('tiers? <exists>\n')
 81 |             file.write('size = {0}\n'.format(len(self)))
 82 |             file.write('item []:\n')
 83 | 
 84 |             # Write interval tiers
 85 |             for i, tier in enumerate(self.tiers, 1):
 86 |                 file.write('\titem [{0}]:\n'.format(i))
 87 |                 file.write('\t\tclass = "IntervalTier"\n')
 88 |                 file.write('\t\tname = "{0}"\n'.format(tier.name))
 89 |                 file.write('\t\txmin = {0}\n'.format(tier[0].minTime))
 90 |                 file.write('\t\txmax = {0}\n'.format(tier[-1].maxTime))
 91 |                 file.write(
 92 |                     '\t\tintervals: size = {0}\n'.format(len(tier.intervals)))
 93 | 
 94 |                 # Write intervals
 95 |                 for j, interval in enumerate(tier.intervals, 1):
 96 |                     file.write('\t\t\tintervals [{0}]:\n'.format(j))
 97 |                     file.write('\t\t\t\txmin = {0}\n'.format(interval.minTime))
 98 |                     file.write('\t\t\t\txmax = {0}\n'.format(interval.maxTime))
 99 |                     mark = interval.mark.replace('"', '""')
100 |                     file.write('\t\t\t\ttext = "{0}"\n'.format(mark))
101 | 
102 |     @classmethod
103 |     def fromFile(cls, file):
104 |         textgrid = cls()
105 |         textgrid.read(file)
106 |         return textgrid
107 | 
108 | 
109 | ###############################################################################
110 | # Textgrid interval
111 | ###############################################################################
112 | 
113 | 
114 | class Interval:
115 | 
116 |     def __init__(self, minTime, maxTime, mark):
117 |         if minTime >= maxTime:
118 |             raise ValueError(minTime, maxTime)
119 |         self.minTime = minTime
120 |         self.maxTime = maxTime
121 |         self.mark = mark
122 | 
123 | 
124 | class IntervalTier:
125 | 
126 |     def __init__(self, name):
127 |         self.name = name
128 |         self.intervals = []
129 | 
130 |     def __iter__(self):
131 |         return iter(self.intervals)
132 | 
133 |     def __len__(self):
134 |         return len(self.intervals)
135 | 
136 |     def __getitem__(self, i):
137 |         return self.intervals[i]
138 | 
139 |     def add(self, minTime, maxTime, mark):
140 |         self.intervals.append(Interval(minTime, maxTime, mark))
141 | 
142 | 
143 | ###############################################################################
144 | # Utilities
145 | ###############################################################################
146 | 
147 | 
148 | def parse_header(source):
149 |     header = source.readline()
150 |     m = re.match(r'File type = "([\w ]+)"', header)
151 |     short = 'short' in m.groups()[0]
152 |     file_type = parse_line(source.readline(), short)
153 |     source.readline()
154 |     return file_type, short
155 | 
156 | 
157 | def parse_line(line, short):
158 |     line = line.strip()
159 |     if short:
160 |         if '"' in line:
161 |             return line[1:-1]
162 |         return float(line)
163 |     if '"' in line:
164 |         m = re.match(r'.+? = "(.*)"', line)
165 |         return m.groups()[0]
166 |     m = re.match(r'.+? = (.*)', line)
167 |     return float(m.groups()[0])
168 | 
169 | 
170 | def parseMark(text, short):
171 |     line = text.readline()
172 | 
173 |     # read until the number of double-quotes is even
174 |     while line.count('"') % 2:
175 |         next_line = text.readline()
176 |         line += next_line
177 | 
178 |     if short:
179 |         pattern = r'^"(.*?)"\s*$'
180 |     else:
181 |         pattern = r'^\s*(text|mark) = "(.*?)"\s*$'
182 |     entry = re.match(pattern, line, re.DOTALL)
183 | 
184 |     return entry.groups()[-1].replace('""', '"')
185 | 
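
`TextGrid.write` doubles any double quotes inside a mark, and `parseMark` reverses that when reading. The sketch below reproduces that round trip for a single long-format `text = "..."` line. The helper names `format_text_line` and `parse_text_line` are hypothetical, and the regular expression is the long-format pattern used by `parseMark`.

```python
import re


def format_text_line(mark: str) -> str:
    """Write one long-format text line, doubling embedded double quotes"""
    return '\t\t\t\ttext = "{0}"\n'.format(mark.replace('"', '""'))


def parse_text_line(line: str) -> str:
    """Read one long-format text line, undoing the quote doubling"""
    entry = re.match(r'^\s*(text|mark) = "(.*?)"\s*$', line, re.DOTALL)
    return entry.groups()[-1].replace('""', '"')


original = 'say "hi"'
line = format_text_line(original)
recovered = parse_text_line(line)
```

Because the pattern is anchored at the end of the line, the non-greedy group still captures the whole escaped mark, so `recovered` equals `original`.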


--------------------------------------------------------------------------------
/pypar/word.py:
--------------------------------------------------------------------------------
  1 | from typing import List, Optional
  2 | 
  3 | import pypar
  4 | 
  5 | 
  6 | ###############################################################################
  7 | # Word representation
  8 | ###############################################################################
  9 | 
 10 | 
 11 | class Word:
 12 |     """Aligned word representation"""
 13 | 
 14 |     def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None:
 15 |         """Create word
 16 | 
 17 |         Arguments
 18 |             word
 19 |                 The word
 20 |             phonemes
 21 |                 The phonemes in the word
 22 |         """
 23 |         self.word = word
 24 |         self.phonemes = phonemes
 25 | 
 26 |     def __eq__(self, other) -> bool:
 27 |         """Equality comparison for words
 28 | 
 29 |         Arguments
 30 |             other
 31 |                 The word to compare to
 32 | 
 33 |         Returns
 34 |             Whether the words are the same
 35 |         """
 36 |         return \
 37 |             str(self) == str(other) and \
 38 |             len(self) == len(other) and \
 39 |             all(phoneme == other_phoneme
 40 |                 for phoneme, other_phoneme in zip(self, other))
 41 | 
 42 |     def __getitem__(self, idx: int) -> pypar.Phoneme:
 43 |         """Retrieve the idxth phoneme
 44 | 
 45 |         Arguments
 46 |             idx
 47 |                 The index of the phoneme to retrieve
 48 | 
 49 |         Returns
 50 |             The phoneme at index idx
 51 |         """
 52 |         return self.phonemes[idx]
 53 | 
 54 |     def __len__(self) -> int:
 55 |         """Retrieve the number of phonemes
 56 | 
 57 |         Returns
 58 |             The number of phonemes
 59 |         """
 60 |         return len(self.phonemes)
 61 | 
 62 |     def __str__(self) -> str:
 63 |         """Retrieve the word text
 64 | 
 65 |         Returns
 66 |             The word text
 67 |         """
 68 |         return self.word
 69 | 
 70 |     def duration(self) -> float:
 71 |         """Retrieve the word duration in seconds
 72 | 
 73 |         Returns
 74 |             The duration in seconds
 75 |         """
 76 |         return self.end() - self.start()
 77 | 
 78 |     def end(self) -> float:
 79 |         """Retrieve the end time of the word in seconds
 80 | 
 81 |         Returns
 82 |             The end time in seconds
 83 |         """
 84 |         return self.phonemes[-1].end()
 85 | 
 86 |     def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
 87 |         """Retrieve the phoneme at the specified time
 88 | 
 89 |         Arguments
 90 |             time
 91 |                 Time in seconds
 92 | 
 93 |         Returns
 94 |             The phoneme at the given time (or None if time is out of bounds)
 95 |         """
 96 |         for phoneme in self.phonemes:
 97 |             if phoneme.start() <= time <= phoneme.end():
 98 |                 return phoneme
 99 |         return None
100 | 
101 |     def start(self) -> float:
102 |         """Retrieve the start time of the word in seconds
103 | 
104 |         Returns
105 |             The start time in seconds
106 |         """
107 |         return self.phonemes[0].start()
108 | 
109 |     ###########################################################################
110 |     # Utilities
111 |     ###########################################################################
112 | 
113 |     def update(self, start, durations=None):
114 |         """Update the word with new start time and phoneme durations
115 | 
116 |         Arguments
117 |             start : float
118 |                 The new start time of the word
119 |             durations : list[float] or None
120 |                 The new phoneme durations
121 |         """
122 |         # Use current durations if None provided
123 |         if durations is None:
124 |             durations = [phoneme.duration() for phoneme in self.phonemes]
125 | 
126 |         # Update phonemes
127 |         phoneme_start = start
128 |         for phoneme, duration in zip(self.phonemes, durations):
129 |             phoneme._start = phoneme_start
130 |             phoneme._end = phoneme_start + duration
131 |             phoneme_start = phoneme._end
132 | 
133 |     def validate(self):
134 |         """Ensures that adjacent start/end times are valid by adding silence"""
135 |         i = 0
136 |         while i < len(self) - 1:
137 | 
138 |             # Get start and end times between phonemes
139 |             start = self[i].end()
140 |             end = self[i + 1].start()
141 | 
142 |             # Patch gap with silence
143 |             if end - start > 1e-4:
144 |                 phoneme = pypar.Phoneme(pypar.SILENCE, start, end)
145 |                 self.phonemes.insert(i + 1, phoneme)
146 |                 i += 1
147 | 
148 |             i += 1
149 | 
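
`Word.update` lays phonemes end to end: each phoneme starts where the previous one ended, beginning at the new word start time. A minimal sketch of that accumulation on plain numbers, with a hypothetical function name `retime` that is not part of pypar:

```python
def retime(start, durations):
    """Convert a start time and durations into contiguous (start, end) spans"""
    bounds = []
    for duration in durations:
        end = start + duration
        bounds.append((start, end))

        # The next phoneme begins where this one ends
        start = end
    return bounds


bounds = retime(1.0, [0.25, 0.5])
```

With these inputs the spans are `(1.0, 1.25)` and `(1.25, 1.75)`, so the word ends at the last span's end time, which is what `Word.end` reports after an update.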


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup
 2 | 
 3 | 
 4 | # Description
 5 | with open('README.md') as file:
 6 |     long_description = file.read()
 7 | 
 8 | 
 9 | setup(
10 |     name='pypar',
11 |     version='0.0.6',
12 |     description='Python phoneme alignment representation',
13 |     author='Max Morrison',
14 |     author_email='maxrmorrison@gmail.com',
15 |     url='https://github.com/maxrmorrison/pypar',
16 |     install_requires=['numpy'],
17 |     packages=['pypar'],
18 |     long_description=long_description,
19 |     long_description_content_type='text/markdown',
20 |     keywords=['align', 'duration', 'phoneme', 'speech'],
21 |     classifiers=['License :: OSI Approved :: MIT License'],
22 |     license='MIT')
23 | 


--------------------------------------------------------------------------------
/test/assets/float.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "words": [
  3 |         {
  4 |             "alignedWord": "<silent>",
  5 |             "start": 0.0,
  6 |             "end": 0.0245,
  7 |             "phonemes": [
  8 |                 [
  9 |                     "<silent>",
 10 |                     0.0,
 11 |                     0.0245
 12 |                 ]
 13 |             ]
 14 |         },
 15 |         {
 16 |             "alignedWord": "the",
 17 |             "start": 0.0245,
 18 |             "end": 0.112,
 19 |             "phonemes": [
 20 |                 [
 21 |                     "dh",
 22 |                     0.0245,
 23 |                     0.08075
 24 |                 ],
 25 |                 [
 26 |                     "ax",
 27 |                     0.08075,
 28 |                     0.112
 29 |                 ]
 30 |             ]
 31 |         },
 32 |         {
 33 |             "alignedWord": "girl",
 34 |             "start": 0.112,
 35 |             "end": 0.36825,
 36 |             "phonemes": [
 37 |                 [
 38 |                     "g",
 39 |                     0.112,
 40 |                     0.18075
 41 |                 ],
 42 |                 [
 43 |                     "er",
 44 |                     0.18075,
 45 |                     0.29325
 46 |                 ],
 47 |                 [
 48 |                     "l",
 49 |                     0.29325,
 50 |                     0.36825
 51 |                 ]
 52 |             ]
 53 |         },
 54 |         {
 55 |             "alignedWord": "faced",
 56 |             "start": 0.36825,
 57 |             "end": 0.7245,
 58 |             "phonemes": [
 59 |                 [
 60 |                     "f",
 61 |                     0.36825,
 62 |                     0.4745
 63 |                 ],
 64 |                 [
 65 |                     "ey",
 66 |                     0.4745,
 67 |                     0.60575
 68 |                 ],
 69 |                 [
 70 |                     "s",
 71 |                     0.60575,
 72 |                     0.6745
 73 |                 ],
 74 |                 [
 75 |                     "t",
 76 |                     0.6745,
 77 |                     0.7245
 78 |                 ]
 79 |             ]
 80 |         },
 81 |         {
 82 |             "alignedWord": "him",
 83 |             "start": 0.7245,
 84 |             "end": 0.91825,
 85 |             "phonemes": [
 86 |                 [
 87 |                     "hh",
 88 |                     0.7245,
 89 |                     0.7495
 90 |                 ],
 91 |                 [
 92 |                     "ih",
 93 |                     0.7495,
 94 |                     0.7995
 95 |                 ],
 96 |                 [
 97 |                     "m",
 98 |                     0.7995,
 99 |                     0.91825
100 |                 ]
101 |             ]
102 |         },
103 |         {
104 |             "alignedWord": "<silent>",
105 |             "start": 0.91825,
106 |             "end": 1.13075,
107 |             "phonemes": [
108 |                 [
109 |                     "<silent>",
110 |                     0.91825,
111 |                     1.13075
112 |                 ]
113 |             ]
114 |         },
115 |         {
116 |             "alignedWord": "her",
117 |             "start": 1.13075,
118 |             "end": 1.29325,
119 |             "phonemes": [
120 |                 [
121 |                     "hh",
122 |                     1.13075,
123 |                     1.1995
124 |                 ],
125 |                 [
126 |                     "er",
127 |                     1.1995,
128 |                     1.29325
129 |                 ]
130 |             ]
131 |         },
132 |         {
133 |             "alignedWord": "eyes",
134 |             "start": 1.29325,
135 |             "end": 1.48075,
136 |             "phonemes": [
137 |                 [
138 |                     "ay",
139 |                     1.29325,
140 |                     1.3995
141 |                 ],
142 |                 [
143 |                     "z",
144 |                     1.3995,
145 |                     1.48075
146 |                 ]
147 |             ]
148 |         },
149 |         {
150 |             "alignedWord": "shining",
151 |             "start": 1.48075,
152 |             "end": 1.90575,
153 |             "phonemes": [
154 |                 [
155 |                     "sh",
156 |                     1.48075,
157 |                     1.59325
158 |                 ],
159 |                 [
160 |                     "ay",
161 |                     1.59325,
162 |                     1.71825
163 |                 ],
164 |                 [
165 |                     "n",
166 |                     1.71825,
167 |                     1.76825
168 |                 ],
169 |                 [
170 |                     "ih",
171 |                     1.76825,
172 |                     1.83075
173 |                 ],
174 |                 [
175 |                     "ng",
176 |                     1.83075,
177 |                     1.90575
178 |                 ]
179 |             ]
180 |         },
181 |         {
182 |             "alignedWord": "with",
183 |             "start": 1.90575,
184 |             "end": 2.00575,
185 |             "phonemes": [
186 |                 [
187 |                     "w",
188 |                     1.90575,
189 |                     1.95575
190 |                 ],
191 |                 [
192 |                     "ih",
193 |                     1.95575,
194 |                     1.987
195 |                 ],
196 |                 [
197 |                     "dh",
198 |                     1.987,
199 |                     2.00575
200 |                 ]
201 |             ]
202 |         },
203 |         {
204 |             "alignedWord": "sudden",
205 |             "start": 2.00575,
206 |             "end": 2.38075,
207 |             "phonemes": [
208 |                 [
209 |                     "s",
210 |                     2.00575,
211 |                     2.13075
212 |                 ],
213 |                 [
214 |                     "ah",
215 |                     2.13075,
216 |                     2.212
217 |                 ],
218 |                 [
219 |                     "d",
220 |                     2.212,
221 |                     2.24325
222 |                 ],
223 |                 [
224 |                     "ax",
225 |                     2.24325,
226 |                     2.287
227 |                 ],
228 |                 [
229 |                     "n",
230 |                     2.287,
231 |                     2.38075
232 |                 ]
233 |             ]
234 |         },
235 |         {
236 |             "alignedWord": "fear",
237 |             "start": 2.38075,
238 |             "end": 2.80575,
239 |             "phonemes": [
240 |                 [
241 |                     "f",
242 |                     2.38075,
243 |                     2.50575
244 |                 ],
245 |                 [
246 |                     "ih",
247 |                     2.50575,
248 |                     2.59325
249 |                 ],
250 |                 [
251 |                     "r",
252 |                     2.59325,
253 |                     2.80575
254 |                 ]
255 |             ]
256 |         },
257 |         {
258 |             "alignedWord": "<silent>",
259 |             "start": 2.80575,
260 |             "end": 2.84325,
261 |             "phonemes": [
262 |                 [
263 |                     "<silent>",
264 |                     2.80575,
265 |                     2.84325
266 |                 ]
267 |             ]
268 |         }
269 |     ]
270 | }
271 | 


--------------------------------------------------------------------------------
/test/assets/test.TextGrid:
--------------------------------------------------------------------------------
  1 | File type = "ooTextFile"
  2 | Object class = "TextGrid"
  3 | 
  4 | xmin = 0.0
  5 | xmax = 5.429931972789116
  6 | tiers? <exists>
  7 | size = 2
  8 | item []:
  9 | 	item [1]:
 10 | 		class = "IntervalTier"
 11 | 		name = "phone"
 12 | 		xmin = 0.0
 13 | 		xmax = 5.429931972789116
 14 | 		intervals: size = 54
 15 | 			intervals [1]:
 16 | 				xmin = 0.0
 17 | 				xmax = 0.27188208616780046
 18 | 				text = "sil"
 19 | 			intervals [2]:
 20 | 				xmin = 0.27188208616780046
 21 | 				xmax = 0.4414965986394558
 22 | 				text = "AY1"
 23 | 			intervals [3]:
 24 | 				xmin = 0.4414965986394558
 25 | 				xmax = 0.6011337868480725
 26 | 				text = "B"
 27 | 			intervals [4]:
 28 | 				xmin = 0.6011337868480725
 29 | 				xmax = 0.6609977324263039
 30 | 				text = "EH1"
 31 | 			intervals [5]:
 32 | 				xmin = 0.6609977324263039
 33 | 				xmax = 0.7507936507936508
 34 | 				text = "G"
 35 | 			intervals [6]:
 36 | 				xmin = 0.7507936507936508
 37 | 				xmax = 0.8605442176870748
 38 | 				text = "Y"
 39 | 			intervals [7]:
 40 | 				xmin = 0.8605442176870748
 41 | 				xmax = 0.8904761904761904
 42 | 				text = "AO1"
 43 | 			intervals [8]:
 44 | 				xmin = 0.8904761904761904
 45 | 				xmax = 0.9303854875283446
 46 | 				text = "R"
 47 | 			intervals [9]:
 48 | 				xmin = 0.9303854875283446
 49 | 				xmax = 1.0501133786848071
 50 | 				text = "P"
 51 | 			intervals [10]:
 52 | 				xmin = 1.0501133786848071
 53 | 				xmax = 1.1399092970521538
 54 | 				text = "AA1"
 55 | 			intervals [11]:
 56 | 				xmin = 1.1399092970521538
 57 | 				xmax = 1.199773242630385
 58 | 				text = "R"
 59 | 			intervals [12]:
 60 | 				xmin = 1.199773242630385
 61 | 				xmax = 1.2396825396825393
 62 | 				text = "D"
 63 | 			intervals [13]:
 64 | 				xmin = 1.2396825396825393
 65 | 				xmax = 1.269614512471655
 66 | 				text = "AH0"
 67 | 			intervals [14]:
 68 | 				xmin = 1.269614512471655
 69 | 				xmax = 1.4990929705215414
 70 | 				text = "N"
 71 | 			intervals [15]:
 72 | 				xmin = 1.4990929705215414
 73 | 				xmax = 1.5888888888888886
 74 | 				text = "sil"
 75 | 			intervals [16]:
 76 | 				xmin = 1.5888888888888886
 77 | 				xmax = 1.7485260770975053
 78 | 				text = "S"
 79 | 			intervals [17]:
 80 | 				xmin = 1.7485260770975053
 81 | 				xmax = 1.818367346938775
 82 | 				text = "EH1"
 83 | 			intervals [18]:
 84 | 				xmin = 1.818367346938775
 85 | 				xmax = 1.8482993197278907
 86 | 				text = "D"
 87 | 			intervals [19]:
 88 | 				xmin = 1.8482993197278907
 89 | 				xmax = 1.8782312925170064
 90 | 				text = "DH"
 91 | 			intervals [20]:
 92 | 				xmin = 1.8782312925170064
 93 | 				xmax = 1.9081632653061218
 94 | 				text = "AH0"
 95 | 			intervals [21]:
 96 | 				xmin = 1.9081632653061218
 97 | 				xmax = 2.017913832199546
 98 | 				text = "M"
 99 | 			intervals [22]:
100 | 				xmin = 2.017913832199546
101 | 				xmax = 2.1875283446712013
102 | 				text = "AW1"
103 | 			intervals [23]:
104 | 				xmin = 2.1875283446712013
105 | 				xmax = 2.4269841269841264
106 | 				text = "S"
107 | 			intervals [24]:
108 | 				xmin = 2.4269841269841264
109 | 				xmax = 2.526757369614512
110 | 				text = "F"
111 | 			intervals [25]:
112 | 				xmin = 2.526757369614512
113 | 				xmax = 2.586621315192743
114 | 				text = "R"
115 | 			intervals [26]:
116 | 				xmin = 2.586621315192743
117 | 				xmax = 2.6863945578231285
118 | 				text = "AW1"
119 | 			intervals [27]:
120 | 				xmin = 2.6863945578231285
121 | 				xmax = 2.7263038548752827
122 | 				text = "N"
123 | 			intervals [28]:
124 | 				xmin = 2.7263038548752827
125 | 				xmax = 2.7961451247165523
126 | 				text = "IH0"
127 | 			intervals [29]:
128 | 				xmin = 2.7961451247165523
129 | 				xmax = 2.8759637188208607
130 | 				text = "NG"
131 | 			intervals [30]:
132 | 				xmin = 2.8759637188208607
133 | 				xmax = 2.935827664399092
134 | 				text = "B"
135 | 			intervals [31]:
136 | 				xmin = 2.935827664399092
137 | 				xmax = 2.995691609977323
138 | 				text = "AH1"
139 | 			intervals [32]:
140 | 				xmin = 2.995691609977323
141 | 				xmax = 3.075510204081631
142 | 				text = "T"
143 | 			intervals [33]:
144 | 				xmin = 3.075510204081631
145 | 				xmax = 3.1353741496598624
146 | 				text = "V"
147 | 			intervals [34]:
148 | 				xmin = 3.1353741496598624
149 | 				xmax = 3.1752834467120166
150 | 				text = "EH1"
151 | 			intervals [35]:
152 | 				xmin = 3.1752834467120166
153 | 				xmax = 3.2850340136054403
154 | 				text = "R"
155 | 			intervals [36]:
156 | 				xmin = 3.2850340136054403
157 | 				xmax = 3.3448979591836716
158 | 				text = "IY0"
159 | 			intervals [37]:
160 | 				xmin = 3.3448979591836716
161 | 				xmax = 3.4147392290249416
162 | 				text = "P"
163 | 			intervals [38]:
164 | 				xmin = 3.4147392290249416
165 | 				xmax = 3.444671201814057
166 | 				text = "AH0"
167 | 			intervals [39]:
168 | 				xmin = 3.444671201814057
169 | 				xmax = 3.5444444444444425
170 | 				text = "L"
171 | 			intervals [40]:
172 | 				xmin = 3.5444444444444425
173 | 				xmax = 3.644217687074828
174 | 				text = "AY1"
175 | 			intervals [41]:
176 | 				xmin = 3.644217687074828
177 | 				xmax = 3.6741496598639434
178 | 				text = "T"
179 | 			intervals [42]:
180 | 				xmin = 3.6741496598639434
181 | 				xmax = 3.7439909297052134
182 | 				text = "L"
183 | 			intervals [43]:
184 | 				xmin = 3.7439909297052134
185 | 				xmax = 3.8736961451247143
186 | 				text = "IY0"
187 | 			intervals [44]:
188 | 				xmin = 3.8736961451247143
189 | 				xmax = 4.093197278911562
190 | 				text = "sil"
191 | 			intervals [45]:
192 | 				xmin = 4.093197278911562
193 | 				xmax = 4.232879818594102
194 | 				text = "D"
195 | 			intervals [46]:
196 | 				xmin = 4.232879818594102
197 | 				xmax = 4.292743764172333
198 | 				text = "IH1"
199 | 			intervals [47]:
200 | 				xmin = 4.292743764172333
201 | 				xmax = 4.352607709750564
202 | 				text = "D"
203 | 			intervals [48]:
204 | 				xmin = 4.352607709750564
205 | 				xmax = 4.442403628117912
206 | 				text = "Y"
207 | 			intervals [49]:
208 | 				xmin = 4.442403628117912
209 | 				xmax = 4.482312925170066
210 | 				text = "UW1"
211 | 			intervals [50]:
212 | 				xmin = 4.482312925170066
213 | 				xmax = 4.641950113378682
214 | 				text = "S"
215 | 			intervals [51]:
216 | 				xmin = 4.641950113378682
217 | 				xmax = 4.681859410430836
218 | 				text = "P"
219 | 			intervals [52]:
220 | 				xmin = 4.681859410430836
221 | 				xmax = 4.851473922902492
222 | 				text = "IY1"
223 | 			intervals [53]:
224 | 				xmin = 4.851473922902492
225 | 				xmax = 5.0011337868480705
226 | 				text = "K"
227 | 			intervals [54]:
228 | 				xmin = 5.0011337868480705
229 | 				xmax = 5.429931972789116
230 | 				text = "sil"
231 | 	item [2]:
232 | 		class = "IntervalTier"
233 | 		name = "word"
234 | 		xmin = 0.0
235 | 		xmax = 5.429931972789116
236 | 		intervals: size = 18
237 | 			intervals [1]:
238 | 				xmin = 0.0
239 | 				xmax = 0.27188208616780046
240 | 				text = "sp"
241 | 			intervals [2]:
242 | 				xmin = 0.27188208616780046
243 | 				xmax = 0.4414965986394558
244 | 				text = "I"
245 | 			intervals [3]:
246 | 				xmin = 0.4414965986394558
247 | 				xmax = 0.7507936507936508
248 | 				text = "BEG"
249 | 			intervals [4]:
250 | 				xmin = 0.7507936507936508
251 | 				xmax = 0.9303854875283446
252 | 				text = "YOUR"
253 | 			intervals [5]:
254 | 				xmin = 0.9303854875283446
255 | 				xmax = 1.4990929705215414
256 | 				text = "PARDON"
257 | 			intervals [6]:
258 | 				xmin = 1.4990929705215414
259 | 				xmax = 1.5888888888888886
260 | 				text = "sp"
261 | 			intervals [7]:
262 | 				xmin = 1.5888888888888886
263 | 				xmax = 1.8482993197278907
264 | 				text = "SAID"
265 | 			intervals [8]:
266 | 				xmin = 1.8482993197278907
267 | 				xmax = 1.9081632653061218
268 | 				text = "THE"
269 | 			intervals [9]:
270 | 				xmin = 1.9081632653061218
271 | 				xmax = 2.4269841269841264
272 | 				text = "MOUSE"
273 | 			intervals [10]:
274 | 				xmin = 2.4269841269841264
275 | 				xmax = 2.8759637188208607
276 | 				text = "FROWNING"
277 | 			intervals [11]:
278 | 				xmin = 2.8759637188208607
279 | 				xmax = 3.075510204081631
280 | 				text = "BUT"
281 | 			intervals [12]:
282 | 				xmin = 3.075510204081631
283 | 				xmax = 3.3448979591836716
284 | 				text = "VERY"
285 | 			intervals [13]:
286 | 				xmin = 3.3448979591836716
287 | 				xmax = 3.8736961451247143
288 | 				text = "POLITELY"
289 | 			intervals [14]:
290 | 				xmin = 3.8736961451247143
291 | 				xmax = 4.093197278911562
292 | 				text = "sp"
293 | 			intervals [15]:
294 | 				xmin = 4.093197278911562
295 | 				xmax = 4.352607709750564
296 | 				text = "DID"
297 | 			intervals [16]:
298 | 				xmin = 4.352607709750564
299 | 				xmax = 4.482312925170066
300 | 				text = "YOU"
301 | 			intervals [17]:
302 | 				xmin = 4.482312925170066
303 | 				xmax = 5.0011337868480705
304 | 				text = "SPEAK"
305 | 			intervals [18]:
306 | 				xmin = 5.0011337868480705
307 | 				xmax = 5.429931972789116
308 | 				text = "sp"
309 | 


--------------------------------------------------------------------------------
/test/assets/test.json:
--------------------------------------------------------------------------------
  1 | {
  2 |     "words": [
  3 |         {
  4 |             "alignedWord": "<silent>",
  5 |             "start": 0.0,
  6 |             "end": 0.27188208616780046,
  7 |             "phonemes": [
  8 |                 [
  9 |                     "<silent>",
 10 |                     0.0,
 11 |                     0.27188208616780046
 12 |                 ]
 13 |             ]
 14 |         },
 15 |         {
 16 |             "alignedWord": "I",
 17 |             "start": 0.27188208616780046,
 18 |             "end": 0.4414965986394558,
 19 |             "phonemes": [
 20 |                 [
 21 |                     "AY1",
 22 |                     0.27188208616780046,
 23 |                     0.4414965986394558
 24 |                 ]
 25 |             ]
 26 |         },
 27 |         {
 28 |             "alignedWord": "BEG",
 29 |             "start": 0.4414965986394558,
 30 |             "end": 0.7507936507936508,
 31 |             "phonemes": [
 32 |                 [
 33 |                     "B",
 34 |                     0.4414965986394558,
 35 |                     0.6011337868480725
 36 |                 ],
 37 |                 [
 38 |                     "EH1",
 39 |                     0.6011337868480725,
 40 |                     0.6609977324263039
 41 |                 ],
 42 |                 [
 43 |                     "G",
 44 |                     0.6609977324263039,
 45 |                     0.7507936507936508
 46 |                 ]
 47 |             ]
 48 |         },
 49 |         {
 50 |             "alignedWord": "YOUR",
 51 |             "start": 0.7507936507936508,
 52 |             "end": 0.9303854875283446,
 53 |             "phonemes": [
 54 |                 [
 55 |                     "Y",
 56 |                     0.7507936507936508,
 57 |                     0.8605442176870748
 58 |                 ],
 59 |                 [
 60 |                     "AO1",
 61 |                     0.8605442176870748,
 62 |                     0.8904761904761904
 63 |                 ],
 64 |                 [
 65 |                     "R",
 66 |                     0.8904761904761904,
 67 |                     0.9303854875283446
 68 |                 ]
 69 |             ]
 70 |         },
 71 |         {
 72 |             "alignedWord": "PARDON",
 73 |             "start": 0.9303854875283446,
 74 |             "end": 1.4990929705215414,
 75 |             "phonemes": [
 76 |                 [
 77 |                     "P",
 78 |                     0.9303854875283446,
 79 |                     1.0501133786848071
 80 |                 ],
 81 |                 [
 82 |                     "AA1",
 83 |                     1.0501133786848071,
 84 |                     1.1399092970521538
 85 |                 ],
 86 |                 [
 87 |                     "R",
 88 |                     1.1399092970521538,
 89 |                     1.199773242630385
 90 |                 ],
 91 |                 [
 92 |                     "D",
 93 |                     1.199773242630385,
 94 |                     1.2396825396825393
 95 |                 ],
 96 |                 [
 97 |                     "AH0",
 98 |                     1.2396825396825393,
 99 |                     1.269614512471655
100 |                 ],
101 |                 [
102 |                     "N",
103 |                     1.269614512471655,
104 |                     1.4990929705215414
105 |                 ]
106 |             ]
107 |         },
108 |         {
109 |             "alignedWord": "<silent>",
110 |             "start": 1.4990929705215414,
111 |             "end": 1.5888888888888886,
112 |             "phonemes": [
113 |                 [
114 |                     "<silent>",
115 |                     1.4990929705215414,
116 |                     1.5888888888888886
117 |                 ]
118 |             ]
119 |         },
120 |         {
121 |             "alignedWord": "SAID",
122 |             "start": 1.5888888888888886,
123 |             "end": 1.8482993197278907,
124 |             "phonemes": [
125 |                 [
126 |                     "S",
127 |                     1.5888888888888886,
128 |                     1.7485260770975053
129 |                 ],
130 |                 [
131 |                     "EH1",
132 |                     1.7485260770975053,
133 |                     1.818367346938775
134 |                 ],
135 |                 [
136 |                     "D",
137 |                     1.818367346938775,
138 |                     1.8482993197278907
139 |                 ]
140 |             ]
141 |         },
142 |         {
143 |             "alignedWord": "THE",
144 |             "start": 1.8482993197278907,
145 |             "end": 1.9081632653061218,
146 |             "phonemes": [
147 |                 [
148 |                     "DH",
149 |                     1.8482993197278907,
150 |                     1.8782312925170064
151 |                 ],
152 |                 [
153 |                     "AH0",
154 |                     1.8782312925170064,
155 |                     1.9081632653061218
156 |                 ]
157 |             ]
158 |         },
159 |         {
160 |             "alignedWord": "MOUSE",
161 |             "start": 1.9081632653061218,
162 |             "end": 2.4269841269841264,
163 |             "phonemes": [
164 |                 [
165 |                     "M",
166 |                     1.9081632653061218,
167 |                     2.017913832199546
168 |                 ],
169 |                 [
170 |                     "AW1",
171 |                     2.017913832199546,
172 |                     2.1875283446712013
173 |                 ],
174 |                 [
175 |                     "S",
176 |                     2.1875283446712013,
177 |                     2.4269841269841264
178 |                 ]
179 |             ]
180 |         },
181 |         {
182 |             "alignedWord": "FROWNING",
183 |             "start": 2.4269841269841264,
184 |             "end": 2.8759637188208607,
185 |             "phonemes": [
186 |                 [
187 |                     "F",
188 |                     2.4269841269841264,
189 |                     2.526757369614512
190 |                 ],
191 |                 [
192 |                     "R",
193 |                     2.526757369614512,
194 |                     2.586621315192743
195 |                 ],
196 |                 [
197 |                     "AW1",
198 |                     2.586621315192743,
199 |                     2.6863945578231285
200 |                 ],
201 |                 [
202 |                     "N",
203 |                     2.6863945578231285,
204 |                     2.7263038548752827
205 |                 ],
206 |                 [
207 |                     "IH0",
208 |                     2.7263038548752827,
209 |                     2.7961451247165523
210 |                 ],
211 |                 [
212 |                     "NG",
213 |                     2.7961451247165523,
214 |                     2.8759637188208607
215 |                 ]
216 |             ]
217 |         },
218 |         {
219 |             "alignedWord": "BUT",
220 |             "start": 2.8759637188208607,
221 |             "end": 3.075510204081631,
222 |             "phonemes": [
223 |                 [
224 |                     "B",
225 |                     2.8759637188208607,
226 |                     2.935827664399092
227 |                 ],
228 |                 [
229 |                     "AH1",
230 |                     2.935827664399092,
231 |                     2.995691609977323
232 |                 ],
233 |                 [
234 |                     "T",
235 |                     2.995691609977323,
236 |                     3.075510204081631
237 |                 ]
238 |             ]
239 |         },
240 |         {
241 |             "alignedWord": "VERY",
242 |             "start": 3.075510204081631,
243 |             "end": 3.3448979591836716,
244 |             "phonemes": [
245 |                 [
246 |                     "V",
247 |                     3.075510204081631,
248 |                     3.1353741496598624
249 |                 ],
250 |                 [
251 |                     "EH1",
252 |                     3.1353741496598624,
253 |                     3.1752834467120166
254 |                 ],
255 |                 [
256 |                     "R",
257 |                     3.1752834467120166,
258 |                     3.2850340136054403
259 |                 ],
260 |                 [
261 |                     "IY0",
262 |                     3.2850340136054403,
263 |                     3.3448979591836716
264 |                 ]
265 |             ]
266 |         },
267 |         {
268 |             "alignedWord": "POLITELY",
269 |             "start": 3.3448979591836716,
270 |             "end": 3.8736961451247143,
271 |             "phonemes": [
272 |                 [
273 |                     "P",
274 |                     3.3448979591836716,
275 |                     3.4147392290249416
276 |                 ],
277 |                 [
278 |                     "AH0",
279 |                     3.4147392290249416,
280 |                     3.444671201814057
281 |                 ],
282 |                 [
283 |                     "L",
284 |                     3.444671201814057,
285 |                     3.5444444444444425
286 |                 ],
287 |                 [
288 |                     "AY1",
289 |                     3.5444444444444425,
290 |                     3.644217687074828
291 |                 ],
292 |                 [
293 |                     "T",
294 |                     3.644217687074828,
295 |                     3.6741496598639434
296 |                 ],
297 |                 [
298 |                     "L",
299 |                     3.6741496598639434,
300 |                     3.7439909297052134
301 |                 ],
302 |                 [
303 |                     "IY0",
304 |                     3.7439909297052134,
305 |                     3.8736961451247143
306 |                 ]
307 |             ]
308 |         },
309 |         {
310 |             "alignedWord": "<silent>",
311 |             "start": 3.8736961451247143,
312 |             "end": 4.093197278911562,
313 |             "phonemes": [
314 |                 [
315 |                     "<silent>",
316 |                     3.8736961451247143,
317 |                     4.093197278911562
318 |                 ]
319 |             ]
320 |         },
321 |         {
322 |             "alignedWord": "DID",
323 |             "start": 4.093197278911562,
324 |             "end": 4.352607709750564,
325 |             "phonemes": [
326 |                 [
327 |                     "D",
328 |                     4.093197278911562,
329 |                     4.232879818594102
330 |                 ],
331 |                 [
332 |                     "IH1",
333 |                     4.232879818594102,
334 |                     4.292743764172333
335 |                 ],
336 |                 [
337 |                     "D",
338 |                     4.292743764172333,
339 |                     4.352607709750564
340 |                 ]
341 |             ]
342 |         },
343 |         {
344 |             "alignedWord": "YOU",
345 |             "start": 4.352607709750564,
346 |             "end": 4.482312925170066,
347 |             "phonemes": [
348 |                 [
349 |                     "Y",
350 |                     4.352607709750564,
351 |                     4.442403628117912
352 |                 ],
353 |                 [
354 |                     "UW1",
355 |                     4.442403628117912,
356 |                     4.482312925170066
357 |                 ]
358 |             ]
359 |         },
360 |         {
361 |             "alignedWord": "SPEAK",
362 |             "start": 4.482312925170066,
363 |             "end": 5.0011337868480705,
364 |             "phonemes": [
365 |                 [
366 |                     "S",
367 |                     4.482312925170066,
368 |                     4.641950113378682
369 |                 ],
370 |                 [
371 |                     "P",
372 |                     4.641950113378682,
373 |                     4.681859410430836
374 |                 ],
375 |                 [
376 |                     "IY1",
377 |                     4.681859410430836,
378 |                     4.851473922902492
379 |                 ],
380 |                 [
381 |                     "K",
382 |                     4.851473922902492,
383 |                     5.0011337868480705
384 |                 ]
385 |             ]
386 |         },
387 |         {
388 |             "alignedWord": "<silent>",
389 |             "start": 5.0011337868480705,
390 |             "end": 5.429931972789116,
391 |             "phonemes": [
392 |                 [
393 |                     "<silent>",
394 |                     5.0011337868480705,
395 |                     5.429931972789116
396 |                 ]
397 |             ]
398 |         }
399 |     ]
400 | }
401 | 


--------------------------------------------------------------------------------
/test/assets/test.txt:
--------------------------------------------------------------------------------
1 | "I beg your pardon?" said the mouse, frowning, but very politely, "did you speak?"


--------------------------------------------------------------------------------
/test/assets/test.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maxrmorrison/pypar/15701c8d3325d24c9aa04919468e840655c460a6/test/assets/test.wav


--------------------------------------------------------------------------------
/test/conftest.py:
--------------------------------------------------------------------------------
 1 | from pathlib import Path
 2 | 
 3 | import pytest
 4 | 
 5 | import pypar
 6 | 
 7 | 
 8 | ###############################################################################
 9 | # Test fixtures
10 | ###############################################################################
11 | 
12 | 
13 | @pytest.fixture(scope='session')
14 | def alignment():
15 |     """Retrieve the alignment to use for testing"""
16 |     return pypar.Alignment(path('test.json'))
 17 | 
 18 | 
 19 | @pytest.fixture(scope='session')
 20 | def text():
 21 |     """Retrieve the speech transcript"""
 22 |     with open(path('test.txt')) as file:
 23 |         return file.read()
 24 | 
 25 | 
 26 | @pytest.fixture(scope='session')
 27 | def textgrid():
 28 |     """Retrieve the speech textgrid"""
 29 |     return pypar.Alignment(path('test.TextGrid'))
 30 | 
 31 | 
 32 | @pytest.fixture(scope='session')
 33 | def float_alignment():
 34 |     """Retrieve special alignment for float testing"""
 35 |     return pypar.Alignment(path('float.json'))
 36 | 
 37 | 
 38 | ###############################################################################
 39 | # Utilities
 40 | ###############################################################################
 41 | 
 42 | 
 43 | def path(file):
 44 |     """Resolve the file path of a test asset"""
 45 |     return Path(__file__).parent / 'assets' / file
 46 | 


--------------------------------------------------------------------------------
/test/test_alignment.py:
--------------------------------------------------------------------------------
 1 | import tempfile
 2 | from pathlib import Path
 3 | 
 4 | import pypar
 5 | 
 6 | 
 7 | ###############################################################################
 8 | # Test alignment
 9 | ###############################################################################
10 | 
11 | 
12 | def test_find(alignment):
13 |     """Test finding words in the alignment"""
14 |     assert alignment.find('the mouse') == 7
15 |     assert alignment.find('the dog') == -1
16 | 
17 | 
18 | def test_phoneme_at_time(alignment):
19 |     """Test queries for current phoneme given a time in seconds"""
20 |     assert alignment.phoneme_at_time(-1.) is None
21 |     assert str(alignment.phoneme_at_time(1.5)) == pypar.SILENCE
22 |     assert str(alignment.phoneme_at_time(1.9)) == 'AH0'
23 |     assert str(alignment.phoneme_at_time(4.5)) == 'S'
24 |     assert alignment.phoneme_at_time(6.) is None
25 | 
26 | 
27 | def test_phoneme_bounds(alignment):
28 |     """Test frame boundaries of phonemes"""
29 |     bounds = alignment.phoneme_bounds(10000, 100)
30 |     assert bounds[0] == (27, 44)
31 |     assert bounds[4] == (75, 86)
32 | 
33 | 
34 | def test_load(textgrid):
 35 |     """Test textgrid loading (requesting the fixture parses test.TextGrid)"""
36 |     pass
37 | 
38 | 
39 | def test_save(alignment):
40 |     """Test saving and reloading alignment"""
41 |     with tempfile.TemporaryDirectory() as directory:
42 |         # Test json
43 |         file = Path(directory) / 'alignment.json'
44 |         alignment.save(file)
45 |         assert alignment == pypar.Alignment(file)
46 | 
47 |         # Test textgrid
48 |         file = Path(directory) / 'alignment.TextGrid'
49 |         alignment.save(file)
50 |         assert alignment == pypar.Alignment(file)
51 | 
52 | 
53 | def test_string(text, alignment):
54 |     """Test the alignment string representation"""
55 |     text = text.replace('"', '')
56 |     text = text.replace('?', '')
57 |     text = text.replace(',', '')
58 |     assert text.upper() == str(alignment)
59 | 
60 | 
61 | def test_word_at_time(alignment):
62 |     """Test queries for current word given a time in seconds"""
63 |     assert alignment.word_at_time(-1.) is None
64 |     assert str(alignment.word_at_time(1.)) == 'PARDON'
65 |     assert str(alignment.word_at_time(4.1)) == 'DID'
66 |     assert alignment.word_at_time(6.) is None
67 | 
68 | 
69 | def test_word_bounds(alignment):
70 |     """Test frame boundaries of words"""
71 |     bounds = alignment.word_bounds(10000, 100)
72 |     assert bounds[0] == (27, 44)
73 |     assert bounds[3] == (93, 149)
74 | 
75 | 
76 | def test_float_update(float_alignment):
77 |     """Test that entries do not overlap before and after an update"""
78 |     for i in range(1, len(float_alignment)):
79 |         assert float_alignment[i].start() >= float_alignment[i-1].end()
80 |     float_alignment.update(start=0.)
81 |     for i in range(1, len(float_alignment)):
82 |         assert float_alignment[i].start() >= float_alignment[i-1].end()
83 | 


--------------------------------------------------------------------------------
/test/test_compare.py:
--------------------------------------------------------------------------------
 1 | import copy
 2 | 
 3 | import pypar
 4 | 
 5 | 
 6 | ###############################################################################
 7 | # Test alignment comparisons
 8 | ###############################################################################
 9 | 
10 | 
11 | def test_per_frame_rate(alignment):
12 |     """Test the per-frame speed difference between alignments"""
13 |     stretch_and_assert(alignment, .5)
14 |     stretch_and_assert(alignment, 1.)
15 |     stretch_and_assert(alignment, 2.)
16 | 
17 | 
18 | ###############################################################################
19 | # Utilities
20 | ###############################################################################
21 | 
22 | 
23 | def stretch(alignment, factor):
24 |     """Time-stretch the alignment by a constant factor"""
25 |     # Scale each phoneme duration by the stretch factor
26 |     durations = [factor * p.duration() for p in alignment.phonemes()]
27 |     alignment = copy.deepcopy(alignment)
28 |     alignment.update(durations=durations)
29 |     return alignment
30 | 
31 | 
32 | def stretch_and_assert(alignment, factor, sample_rate=10000, hopsize=100):
33 |     """Time-stretch and perform test assertions"""
34 |     # Get per-frame rate differences
35 |     result = pypar.compare.per_frame_rate(alignment,
36 |                                           stretch(alignment, factor),
37 |                                           sample_rate,
38 |                                           hopsize)
39 | 
40 |     # Perform assertions
41 |     assert len(result) == 1 + int(alignment.duration() * sample_rate / hopsize)
42 |     for item in result:
43 |         assert item == factor
44 | 


--------------------------------------------------------------------------------