├── .gitignore
├── CITATION.cff
├── LICENSE
├── README.md
├── pypar
│   ├── __init__.py
│   ├── alignment.py
│   ├── compare.py
│   ├── phoneme.py
│   ├── textgrid.py
│   └── word.py
├── setup.py
└── test
    ├── assets
    │   ├── float.json
    │   ├── test.TextGrid
    │   ├── test.json
    │   ├── test.txt
    │   └── test.wav
    ├── conftest.py
    ├── test_alignment.py
    └── test_compare.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 | __pycache__/
3 | .ipynb_checkpoints/
4 | .pytest_cache/
5 | .vscode/
6 | build/
7 | dist/
8 |
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | cff-version: 1.2.0
2 | message: "If you use this software, please cite it using the following metadata."
3 | authors:
4 |   - family-names: "Morrison"
5 |     given-names: "Max"
6 | title: "pypar"
7 | version: 0.0.2
8 | date-released: 2021-04-03
9 | url: "https://github.com/maxrmorrison/pypar"
10 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Max Morrison
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Python phoneme alignment representation
2 |
3 |
4 | [PyPI](https://pypi.python.org/pypi/pypar)
5 | [License: MIT](https://opensource.org/licenses/MIT)
6 | [Downloads](https://pepy.tech/project/pypar)
7 |
8 | `pip install pypar`
9 |
10 |
11 | Word and phoneme alignment representation for speech tasks. This repo does
12 | not perform forced word or phoneme alignment, but provides an interface
13 | for working with the resulting alignment of a forced aligner, such as
14 | [`pyfoal`](https://github.com/maxrmorrison/pyfoal), or a manual alignment.
15 |
16 |
17 | ## Table of contents
18 |
19 | - [Usage](#usage)
20 |     * [Creating alignments](#creating-alignments)
21 |     * [Accessing words and phonemes](#accessing-words-and-phonemes)
22 |     * [Saving alignments](#saving-alignments)
23 | - [Application programming interface (API)](#application-programming-interface-api)
24 |     * [`pypar.Alignment`](#pyparalignment)
25 |         * [`pypar.Alignment.__init__`](#pyparalignment__init__)
26 |         * [`pypar.Alignment.__add__`](#pyparalignment__add__)
27 |         * [`pypar.Alignment.__eq__`](#pyparalignment__eq__)
28 |         * [`pypar.Alignment.__getitem__`](#pyparalignment__getitem__)
29 |         * [`pypar.Alignment.__len__`](#pyparalignment__len__)
30 |         * [`pypar.Alignment.__str__`](#pyparalignment__str__)
31 |         * [`pypar.Alignment.duration`](#pyparalignmentduration)
32 |         * [`pypar.Alignment.end`](#pyparalignmentend)
33 |         * [`pypar.Alignment.find`](#pyparalignmentfind)
34 |         * [`pypar.Alignment.framewise_phoneme_indices`](#pyparalignmentframewise_phoneme_indices)
35 |         * [`pypar.Alignment.phonemes`](#pyparalignmentphonemes)
36 |         * [`pypar.Alignment.phoneme_at_time`](#pyparalignmentphoneme_at_time)
37 |         * [`pypar.Alignment.phoneme_bounds`](#pyparalignmentphoneme_bounds)
38 |         * [`pypar.Alignment.save`](#pyparalignmentsave)
39 |         * [`pypar.Alignment.start`](#pyparalignmentstart)
40 |         * [`pypar.Alignment.update`](#pyparalignmentupdate)
41 |         * [`pypar.Alignment.words`](#pyparalignmentwords)
42 |         * [`pypar.Alignment.word_bounds`](#pyparalignmentword_bounds)
43 |     * [`pypar.Phoneme`](#pyparphoneme)
44 |         * [`pypar.Phoneme.__init__`](#pyparphoneme__init__)
45 |         * [`pypar.Phoneme.__eq__`](#pyparphoneme__eq__)
46 |         * [`pypar.Phoneme.__str__`](#pyparphoneme__str__)
47 |         * [`pypar.Phoneme.duration`](#pyparphonemeduration)
48 |         * [`pypar.Phoneme.end`](#pyparphonemeend)
49 |         * [`pypar.Phoneme.start`](#pyparphonemestart)
50 |     * [`pypar.Word`](#pyparword)
51 |         * [`pypar.Word.__init__`](#pyparword__init__)
52 |         * [`pypar.Word.__eq__`](#pyparword__eq__)
53 |         * [`pypar.Word.__getitem__`](#pyparword__getitem__)
54 |         * [`pypar.Word.__len__`](#pyparword__len__)
55 |         * [`pypar.Word.__str__`](#pyparword__str__)
56 |         * [`pypar.Word.duration`](#pyparwordduration)
57 |         * [`pypar.Word.end`](#pyparwordend)
58 |         * [`pypar.Word.phoneme_at_time`](#pyparwordphoneme_at_time)
59 |         * [`pypar.Word.start`](#pyparwordstart)
60 | - [Tests](#tests)
61 |
62 | ## Usage
63 |
64 | ### Creating alignments
65 |
66 | If you already have the alignment saved to a `json`, `mlf`, or `TextGrid`
67 | file, pass the name of the file. Valid examples of each format can be found in
68 | `test/assets/`.
69 |
70 | ```python
71 | alignment = pypar.Alignment(file)
72 | ```
73 |
74 | Alignments can be created manually from `Word` and `Phoneme` objects. Start and
75 | end times are given in seconds.
76 |
77 | ```python
78 | # Create a word from phonemes
79 | word = pypar.Word(
80 |     'THE',
81 |     [pypar.Phoneme('DH', 0., .03), pypar.Phoneme('AH0', .03, .06)])
82 |
83 | # Create a silence (phonemes are passed as a list, as for any other word)
84 | silence = pypar.Word(pypar.SILENCE, [pypar.Phoneme(pypar.SILENCE, .06, .16)])
85 |
86 | # Make an alignment
87 | alignment = pypar.Alignment([word, silence])
88 | ```
89 |
90 | You can create a new alignment from existing alignments via slicing and
91 | concatenation.
92 |
93 | ```python
94 | # Slice
95 | first_two_words = alignment[:2]
96 |
97 | # Concatenate
98 | alignment_with_repeat = first_two_words + alignment
99 | ```
100 |
101 |
102 | ### Accessing words and phonemes
103 |
104 | To retrieve a list of words in the alignment, use `alignment.words()`.
105 | To retrieve a list of phonemes, use `alignment.phonemes()`. The `Alignment`,
106 | `Word`, and `Phoneme` objects all define `.start()`, `.end()`, and
107 | `.duration()` methods, which return the start time, end time, and duration,
108 | respectively. All times are given in units of seconds. These objects also
109 | define equality checks via `==`, casting to string with `str()`, and iteration
110 | as follows.
111 |
112 | ```python
113 | # Iterate over words
114 | for word in alignment:
115 |
116 |     # Access start and end times
117 |     assert word.duration() == word.end() - word.start()
118 |
119 |     # Iterate over phonemes in word
120 |     for phoneme in word:
121 |
122 |         # Access string representation
123 |         assert isinstance(str(phoneme), str)
124 | ```
125 |
126 | To access a word or phoneme at a specific time, pass the time in seconds to
127 | `alignment.word_at_time` or `alignment.phoneme_at_time`.
128 |
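As a rough illustration of the lookup, the sketch below mimics the inclusive `start <= time <= end` test from `pypar/alignment.py` on plain `(label, start, end)` tuples. The `label_at_time` helper and the tuple layout are assumptions made for this example; they are not part of the `pypar` API.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a list of words: (label, start, end) in seconds
Interval = Tuple[str, float, float]


def label_at_time(intervals: List[Interval], time: float) -> Optional[str]:
    """Return the first label whose interval contains time, else None"""
    for label, start, end in intervals:
        # Both bounds are inclusive, so a time on a shared boundary
        # matches the earlier interval
        if start <= time <= end:
            return label
    return None


intervals = [('THE', 0., .06), ('CAT', .06, .30)]
print(label_at_time(intervals, .03))  # THE
print(label_at_time(intervals, .06))  # THE
print(label_at_time(intervals, .50))  # None
```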
129 | To retrieve the frame indices of the start and end of a word or phoneme, pass
130 | the audio sampling rate and hopsize (in samples) to `alignment.word_bounds` or
131 | `alignment.phoneme_bounds`.
132 |
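The conversion from seconds to frame indices can be sketched on its own, assuming the same arithmetic as the bounds methods in `pypar/alignment.py` (`int(time * sample_rate / hopsize)`). The `frame_bounds` helper below is hypothetical and only illustrates that arithmetic.

```python
from typing import List, Tuple


def frame_bounds(
    bounds_in_seconds: List[Tuple[float, float]],
    sample_rate: int,
    hopsize: int = 1
) -> List[Tuple[int, int]]:
    """Convert (start, end) times in seconds to frame indices"""
    # seconds * sample_rate gives samples; dividing by hopsize gives frames.
    # int() truncates toward zero.
    return [
        (int(start * sample_rate / hopsize), int(end * sample_rate / hopsize))
        for start, end in bounds_in_seconds]


# Two intervals at 16 kHz with a hopsize of 160 samples (10 ms frames)
print(frame_bounds([(0., .5), (.5, 1.25)], 16000, 160))  # [(0, 50), (50, 125)]
```

With the default `hopsize` of 1, the returned indices are sample indices rather than frame indices.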
133 |
134 | ### Saving alignments
135 |
136 | To save an alignment to disk, use `alignment.save(file)`, where `file` is the
137 | desired filename. `pypar` currently supports saving as a `json` or `TextGrid`
138 | file.
139 |
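For orientation, the `json` layout can be sketched without `pypar` itself. The dictionary below is a hand-written example that follows the keys used by `Alignment.json()` in `pypar/alignment.py` (`words`, `alignedWord`, `start`, `end`, `phonemes`); the word and its timings are made up for illustration.

```python
import json

# Hand-written example following the layout produced by Alignment.json():
# each word stores its text, start, end, and [phoneme, start, end] triples
alignment_dict = {
    'words': [
        {
            'alignedWord': 'THE',
            'start': 0.,
            'end': .06,
            'phonemes': [['DH', 0., .03], ['AH0', .03, .06]]
        }
    ]
}

# Serialize with the same json options used by save_json
text = json.dumps(alignment_dict, ensure_ascii=False, indent=4)

# The text round-trips back to the same dictionary
assert json.loads(text) == alignment_dict
```

The format is chosen from the file extension: `json` is matched exactly, while `TextGrid` is matched case-insensitively.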
140 |
141 | ## Application programming interface (API)
142 |
143 | ### `pypar.Alignment`
144 |
145 | #### `pypar.Alignment.__init__`
146 |
147 | ```python
148 | def __init__(
149 |     self,
150 |     alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict]
151 | ) -> None:
152 |     """Create alignment
153 |
154 |     Arguments
155 |         alignment
156 |             The filename, list of words, or json dict of the alignment
157 |     """
158 | ```
159 |
160 |
161 | #### `pypar.Alignment.__add__`
162 |
163 | ```python
164 | def __add__(self, other):
165 |     """Add alignments by concatenation
166 |
167 |     Arguments
168 |         other
169 |             The alignment to concatenate
170 |
171 |     Returns
172 |         The concatenated alignment
173 |     """
174 | ```
175 |
176 |
177 | #### `pypar.Alignment.__eq__`
178 |
179 | ```python
180 | def __eq__(self, other) -> bool:
181 |     """Equality comparison for alignments
182 |
183 |     Arguments
184 |         other
185 |             The alignment to compare to
186 |
187 |     Returns
188 |         Whether the alignments are equal
189 |     """
190 | ```
191 |
192 |
193 | #### `pypar.Alignment.__getitem__`
194 |
195 | ```python
196 | def __getitem__(self, idx: Union[int, slice]) -> pypar.Word:
197 |     """Retrieve the idxth word
198 |
199 |     Arguments
200 |         idx
201 |             The index of the word to retrieve
202 |
203 |     Returns
204 |         The word at index idx
205 |     """
206 | ```
207 |
208 |
209 | #### `pypar.Alignment.__len__`
210 |
211 | ```python
212 | def __len__(self) -> int:
213 |     """Retrieve the number of words
214 |
215 |     Returns
216 |         The number of words in the alignment
217 |     """
218 | ```
219 |
220 |
221 | #### `pypar.Alignment.__str__`
222 |
223 | ```python
224 | def __str__(self) -> str:
225 |     """Retrieve the text
226 |
227 |     Returns
228 |         The words in the alignment separated by spaces
229 |     """
230 | ```
231 |
232 |
233 | #### `pypar.Alignment.duration`
234 |
235 | ```python
236 | def duration(self) -> float:
237 |     """Retrieve the duration of the alignment in seconds
238 |
239 |     Returns
240 |         The duration in seconds
241 |     """
242 | ```
243 |
244 |
245 | #### `pypar.Alignment.end`
246 |
247 | ```python
248 | def end(self) -> float:
249 |     """Retrieve the end time of the alignment in seconds
250 |
251 |     Returns
252 |         The end time in seconds
253 |     """
254 | ```
255 |
256 |
257 | #### `pypar.Alignment.framewise_phoneme_indices`
258 |
259 | ```python
260 | def framewise_phoneme_indices(
261 |     self,
262 |     phoneme_map: Dict[str, int],
263 |     hopsize: float,
264 |     times: Optional[List[float]] = None
265 | ) -> List[int]:
266 |     """Convert alignment to phoneme indices at regular temporal interval
267 |
268 |     Arguments
269 |         phoneme_map
270 |             Mapping from phonemes to indices
271 |         hopsize
272 |             Temporal interval between frames in seconds
273 |         times
274 |             Specified times in seconds to sample phonemes
275 |     """
276 | ```
277 |
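When `times` is not given, the sample times are derived from the alignment duration and `hopsize`. The standalone sketch below reproduces that default, assuming the `math.ceil(duration / hopsize)` rule from `pypar/alignment.py`; the `frame_times` helper is hypothetical.

```python
import math
from typing import List


def frame_times(duration: float, hopsize: float) -> List[float]:
    """Default sample times: one every hopsize seconds, starting at zero"""
    return [i * hopsize for i in range(math.ceil(duration / hopsize))]


# A 0.5 second alignment sampled every 0.125 seconds gives four frames
print(frame_times(.5, .125))  # [0.0, 0.125, 0.25, 0.375]
```

Each time is then looked up with `phoneme_at_time` and mapped through `phoneme_map`. Note that `hopsize` is in seconds here, whereas `phoneme_bounds` and `word_bounds` take a hopsize in samples.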
278 |
279 | #### `pypar.Alignment.find`
280 |
281 | ```python
282 | def find(self, words: str) -> int:
283 |     """Find the words in the alignment
284 |
285 |     Arguments
286 |         words
287 |             The words to find
288 |
289 |     Returns
290 |         The index of the start of the words or -1 if not found
291 |     """
292 | ```
293 |
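The matching behavior can be approximated on a plain list of word labels. This is a simplified sketch and not the library's implementation: `find_words` and the `'<sil>'` label are assumptions for the example. As in `pypar/alignment.py`, the alignment text is lowercased, silences between words are skipped, and the query is compared as given.

```python
from typing import List

SILENCE = '<sil>'  # hypothetical silence label for this sketch


def find_words(labels: List[str], query: str) -> int:
    """Index of the first match of query in labels, ignoring silences"""
    targets = query.split(' ')
    for i, label in enumerate(labels):
        # A match cannot start on a silence
        if label == SILENCE:
            continue
        # Compare against the lowercased non-silent labels from i onward
        spoken = [item.lower() for item in labels[i:] if item != SILENCE]
        if spoken[:len(targets)] == targets:
            return i
    return -1


labels = ['<sil>', 'THE', '<sil>', 'CAT']
print(find_words(labels, 'the cat'))  # 1
print(find_words(labels, 'dog'))      # -1
```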
294 |
295 | #### `pypar.Alignment.phonemes`
296 |
297 | ```python
298 | def phonemes(self) -> List[pypar.Phoneme]:
299 |     """Retrieve the phonemes in the alignment
300 |
301 |     Returns
302 |         The phonemes in the alignment
303 |     """
304 | ```
305 |
306 |
307 | #### `pypar.Alignment.phoneme_at_time`
308 |
309 | ```python
310 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
311 |     """Retrieve the phoneme spoken at specified time
312 |
313 |     Arguments
314 |         time
315 |             Time in seconds
316 |
317 |     Returns
318 |         The phoneme at the given time (or None if time is out of bounds)
319 |     """
320 | ```
321 |
322 |
323 | #### `pypar.Alignment.phoneme_bounds`
324 |
325 | ```python
326 | def phoneme_bounds(
327 |     self,
328 |     sample_rate: int,
329 |     hopsize: int = 1
330 | ) -> List[Tuple[int, int]]:
331 |     """Retrieve the start and end frame index of each phoneme
332 |
333 |     Arguments
334 |         sample_rate
335 |             The audio sampling rate
336 |         hopsize
337 |             The number of samples between successive frames
338 |
339 |     Returns
340 |         The start and end indices of the phonemes
341 |     """
342 | ```
343 |
344 |
345 | #### `pypar.Alignment.save`
346 |
347 | ```python
348 | def save(self, filename: Union[str, bytes, os.PathLike]) -> None:
349 |     """Save alignment to a json or TextGrid file
350 |
351 |     Arguments
352 |         filename
353 |             The location on disk to save the phoneme alignment
354 |     """
355 | ```
356 |
357 |
358 | #### `pypar.Alignment.start`
359 |
360 | ```python
361 | def start(self) -> float:
362 |     """Retrieve the start time of the alignment in seconds
363 |
364 |     Returns
365 |         The start time in seconds
366 |     """
367 | ```
368 |
369 |
370 | #### `pypar.Alignment.update`
371 |
372 | ```python
373 | def update(
374 |     self,
375 |     idx: int = 0,
376 |     durations: Optional[List[float]] = None,
377 |     start: Optional[float] = None
378 | ) -> None:
379 |     """Update alignment starting from phoneme index idx
380 |
381 |     Arguments
382 |         idx
383 |             The index of the first phoneme whose duration is being updated
384 |         durations
385 |             The new phoneme durations, starting from idx
386 |         start
387 |             The start time of the alignment
388 |     """
389 | ```
390 |
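Conceptually, an update rebuilds start and end times from a start time and a sequence of phoneme durations, so that consecutive phonemes stay contiguous. The sketch below shows only that accumulation; `times_from_durations` is a hypothetical helper, and the per-word bookkeeping in `pypar` is more involved.

```python
from typing import List, Tuple


def times_from_durations(
    durations: List[float],
    start: float = 0.
) -> List[Tuple[float, float]]:
    """Lay out contiguous (start, end) intervals from durations"""
    bounds = []
    for duration in durations:
        # Each interval begins where the previous one ended
        bounds.append((start, start + duration))
        start += duration
    return bounds


print(times_from_durations([.25, .5, .25]))
# [(0.0, 0.25), (0.25, 0.75), (0.75, 1.0)]
```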
391 |
392 | #### `pypar.Alignment.words`
393 |
394 | ```python
395 | def words(self) -> List[pypar.Word]:
396 |     """Retrieve the words in the alignment
397 |
398 |     Returns
399 |         The words in the alignment
400 |     """
401 | ```
402 |
403 |
404 | #### `pypar.Alignment.word_at_time`
405 |
406 | ```python
407 | def word_at_time(self, time: float) -> Optional[pypar.Word]:
408 |     """Retrieve the word spoken at specified time
409 |
410 |     Arguments
411 |         time
412 |             Time in seconds
413 |
414 |     Returns
415 |         The word spoken at the specified time (or None if time is out of bounds)
416 |     """
417 | ```
418 |
419 |
420 | ### `pypar.Phoneme`
421 |
422 | #### `pypar.Phoneme.__init__`
423 |
424 | ```python
425 | def __init__(self, phoneme: str, start: float, end: float) -> None:
426 |     """Create phoneme
427 |
428 |     Arguments
429 |         phoneme
430 |             The phoneme
431 |         start
432 |             The start time in seconds
433 |         end
434 |             The end time in seconds
435 |     """
436 | ```
437 |
438 |
439 | #### `pypar.Phoneme.__eq__`
440 |
441 | ```python
442 | def __eq__(self, other) -> bool:
443 |     """Equality comparison for phonemes
444 |
445 |     Arguments
446 |         other
447 |             The phoneme to compare to
448 |
449 |     Returns
450 |         Whether the phonemes are equal
451 |     """
452 | ```
453 |
454 |
455 | #### `pypar.Phoneme.__str__`
456 |
457 | ```python
458 | def __str__(self) -> str:
459 |     """Retrieve the phoneme text
460 |
461 |     Returns
462 |         The phoneme
463 |     """
464 | ```
465 |
466 |
467 | #### `pypar.Phoneme.duration`
468 |
469 | ```python
470 | def duration(self) -> float:
471 |     """Retrieve the phoneme duration
472 |
473 |     Returns
474 |         The duration in seconds
475 |     """
476 | ```
477 |
478 |
479 | #### `pypar.Phoneme.end`
480 |
481 | ```python
482 | def end(self) -> float:
483 |     """Retrieve the end time of the phoneme in seconds
484 |
485 |     Returns
486 |         The end time in seconds
487 |     """
488 | ```
489 |
490 |
491 | #### `pypar.Phoneme.start`
492 |
493 | ```python
494 | def start(self) -> float:
495 |     """Retrieve the start time of the phoneme in seconds
496 |
497 |     Returns
498 |         The start time in seconds
499 |     """
500 | ```
501 |
502 |
503 | ### `pypar.Word`
504 |
505 | #### `pypar.Word.__init__`
506 |
507 | ```python
508 | def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None:
509 |     """Create word
510 |
511 |     Arguments
512 |         word
513 |             The word
514 |         phonemes
515 |             The phonemes in the word
516 |     """
517 | ```
518 |
519 |
520 | #### `pypar.Word.__eq__`
521 |
522 | ```python
523 | def __eq__(self, other) -> bool:
524 |     """Equality comparison for words
525 |
526 |     Arguments
527 |         other
528 |             The word to compare to
529 |
530 |     Returns
531 |         Whether the words are the same
532 |     """
533 | ```
534 |
535 |
536 | #### `pypar.Word.__getitem__`
537 |
538 | ```python
539 | def __getitem__(self, idx: int) -> pypar.Phoneme:
540 |     """Retrieve the idxth phoneme
541 |
542 |     Arguments
543 |         idx
544 |             The index of the phoneme to retrieve
545 |
546 |     Returns
547 |         The phoneme at index idx
548 |     """
549 | ```
550 |
551 |
552 | #### `pypar.Word.__len__`
553 |
554 | ```python
555 | def __len__(self) -> int:
556 |     """Retrieve the number of phonemes
557 |
558 |     Returns
559 |         The number of phonemes
560 |     """
561 | ```
562 |
563 |
564 | #### `pypar.Word.__str__`
565 |
566 | ```python
567 | def __str__(self) -> str:
568 |     """Retrieve the word text
569 |
570 |     Returns
571 |         The word text
572 |     """
573 | ```
574 |
575 |
576 | #### `pypar.Word.duration`
577 |
578 | ```python
579 | def duration(self) -> float:
580 |     """Retrieve the word duration in seconds
581 |
582 |     Returns
583 |         The duration in seconds
584 |     """
585 | ```
586 |
587 |
588 | #### `pypar.Word.end`
589 |
590 | ```python
591 | def end(self) -> float:
592 |     """Retrieve the end time of the word in seconds
593 |
594 |     Returns
595 |         The end time in seconds
596 |     """
597 | ```
598 |
599 |
600 | #### `pypar.Word.phoneme_at_time`
601 |
602 | ```python
603 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
604 |     """Retrieve the phoneme at the specified time
605 |
606 |     Arguments
607 |         time
608 |             Time in seconds
609 |
610 |     Returns
611 |         The phoneme at the given time (or None if time is out of bounds)
612 |     """
613 | ```
614 |
615 |
616 | #### `pypar.Word.start`
617 |
618 | ```python
619 | def start(self) -> float:
620 |     """Retrieve the start time of the word in seconds
621 |
622 |     Returns
623 |         The start time in seconds
624 |     """
625 | ```
626 |
627 |
628 | ## Tests
629 |
630 | Tests can be run as follows.
631 |
632 | ```
633 | pip install pytest
634 | pytest
635 | ```
636 |
--------------------------------------------------------------------------------
/pypar/__init__.py:
--------------------------------------------------------------------------------
1 | from .phoneme import Phoneme
2 | from .word import Word
3 | from .alignment import Alignment
4 | from . import compare
5 | from . import textgrid
6 |
7 | SILENCE = ''
8 |
--------------------------------------------------------------------------------
/pypar/alignment.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import json
3 | import math
4 | import os
5 | from pathlib import Path
6 | from typing import Dict, List, Optional, Tuple, Union
7 |
8 | import pypar
9 |
10 |
11 | ###############################################################################
12 | # Alignment representation
13 | ###############################################################################
14 |
15 |
16 | class Alignment:
17 |     """Word and phoneme alignment"""
18 |
19 |     def __init__(
20 |         self,
21 |         alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict]
22 |     ) -> None:
23 |         """Create alignment
24 |
25 |         Arguments
26 |             alignment
27 |                 The filename, list of words, or json dict of the alignment
28 |         """
29 |         if isinstance(alignment, str):
30 |
31 |             # Load alignment from disk
32 |             self._words = self.load(alignment)
33 |
34 |         elif isinstance(alignment, Path):
35 |
36 |             # Cast and load
37 |             self._words = self.load(str(alignment))
38 |
39 |         elif isinstance(alignment, list):
40 |             self._words = alignment
41 |
42 |             # Require first word to start at 0 seconds
43 |             self.update(start=0.)
44 |
45 |         elif isinstance(alignment, dict):
46 |             self._words = self.parse_json(alignment)
47 |
48 |         # Ensure there are no gaps (by filling with silence)
49 |         self.validate()
50 |
51 |     def __add__(self, other):
52 |         """Add alignments by concatenation
53 |
54 |         Arguments
55 |             other
56 |                 The alignment to concatenate
57 |
58 |         Returns
59 |             The concatenated alignment
60 |         """
61 |         # Don't change original
62 |         other = copy.deepcopy(other)
63 |
64 |         # Move start time of other to end of self
65 |         other.update(start=self.end())
66 |
67 |         # Concatenate word lists (words is a method, so it must be called)
68 |         return Alignment(self._words + other.words())
69 |
70 |     def __eq__(self, other) -> bool:
71 |         """Equality comparison for alignments
72 |
73 |         Arguments
74 |             other
75 |                 The alignment to compare to
76 |
77 |         Returns
78 |             Whether the alignments are equal
79 |         """
80 |         return \
81 |             len(self) == len(other) and \
82 |             all(word == other_word for word, other_word in zip(self, other))
83 |
84 |     def __getitem__(self, idx: Union[int, slice]) -> pypar.Word:
85 |         """Retrieve the idxth word
86 |
87 |         Arguments
88 |             idx
89 |                 The index of the word to retrieve
90 |
91 |         Returns
92 |             The word at index idx
93 |         """
94 |         if isinstance(idx, slice):
95 |
96 |             # Slice into word list
97 |             return Alignment(copy.deepcopy(self._words[idx]))
98 |
99 |         # Retrieve a single word
100 |         return self._words[idx]
101 |
102 |     def __len__(self) -> int:
103 |         """Retrieve the number of words
104 |
105 |         Returns
106 |             The number of words in the alignment
107 |         """
108 |         return len(self._words)
109 |
110 |     def __str__(self) -> str:
111 |         """Retrieve the text
112 |
113 |         Returns
114 |             The words in the alignment separated by spaces
115 |         """
116 |         return ' '.join([str(word) for word in self._words
117 |                          if str(word) != pypar.SILENCE])
118 |
119 |     def duration(self) -> float:
120 |         """Retrieve the duration of the alignment in seconds
121 |
122 |         Returns
123 |             The duration in seconds
124 |         """
125 |         return self.end() - self.start()
126 |
127 |     def end(self) -> float:
128 |         """Retrieve the end time of the alignment in seconds
129 |
130 |         Returns
131 |             The end time in seconds
132 |         """
133 |         return self._words[-1].end()
134 |
135 |     def find(self, words: str) -> int:
136 |         """Find the words in the alignment
137 |
138 |         Arguments
139 |             words
140 |                 The words to find
141 |
142 |         Returns
143 |             The index of the start of the words or -1 if not found
144 |         """
145 |         # Split at spaces
146 |         words = words.split(' ')
147 |
148 |         for i in range(0, len(self._words) - len(words) + 1):
149 |
150 |             # Get text
151 |             text = str(self._words[i]).lower()
152 |
153 |             # Skip silence
154 |             if text == pypar.SILENCE:
155 |                 continue
156 |
157 |             j, k = 0, 0
158 |             while j < len(words) and i + k < len(self._words):
159 |                 # Get text of the next candidate word without indexing past the end
160 |                 text = str(self._words[i + k]).lower()
161 |                 k += 1
162 |
163 |                 # Skip silence between words
164 |                 if text == pypar.SILENCE:
165 |                     continue
166 |
167 |                 # Compare words
168 |                 if text != words[j]:
169 |                     break
170 |
171 |                 # Advance to the next query word
172 |                 j += 1
173 |
174 |             # Found match; return index
175 |             if j == len(words):
176 |                 return i
177 |
178 |         # No match
179 |         return -1
180 |
181 |     def framewise_phoneme_indices(
182 |         self,
183 |         phoneme_map: Dict[str, int],
184 |         hopsize: float,
185 |         times: Optional[List[float]] = None
186 |     ) -> List[int]:
187 |         """Convert alignment to phoneme indices at regular temporal interval
188 |
189 |         Arguments
190 |             phoneme_map
191 |                 Mapping from phonemes to indices
192 |             hopsize
193 |                 Temporal interval between frames in seconds
194 |             times
195 |                 Specified times in seconds to sample phonemes
196 |         """
197 |         if times is None:
198 |             times = [
199 |                 i * hopsize for i in
200 |                 range(math.ceil(self.duration() / hopsize))]
201 |         phonemes = [self.phoneme_at_time(time) for time in times]
202 |         return [phoneme_map[str(phoneme)] for phoneme in phonemes]
203 |
204 |     def phonemes(self) -> List[pypar.Phoneme]:
205 |         """Retrieve the phonemes in the alignment
206 |
207 |         Returns
208 |             The phonemes in the alignment
209 |         """
210 |         return [phoneme for word in self for phoneme in word]
211 |
212 |     def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
213 |         """Retrieve the phoneme spoken at specified time
214 |
215 |         Arguments
216 |             time
217 |                 Time in seconds
218 |
219 |         Returns
220 |             The phoneme at the given time (or None if time is out of bounds)
221 |         """
222 |         word = self.word_at_time(time)
223 |         return word.phoneme_at_time(time) if word else None
224 |
225 |     def phoneme_bounds(
226 |         self,
227 |         sample_rate: int,
228 |         hopsize: int = 1
229 |     ) -> List[Tuple[int, int]]:
230 |         """Retrieve the start and end frame index of each phoneme
231 |
232 |         Arguments
233 |             sample_rate
234 |                 The audio sampling rate
235 |             hopsize
236 |                 The number of samples between successive frames
237 |
238 |         Returns
239 |             The start and end indices of the phonemes
240 |         """
241 |         bounds = [(p.start(), p.end()) for p in self.phonemes()
242 |                   if str(p) != pypar.SILENCE]
243 |         return [(int(a * sample_rate / hopsize),
244 |                  int(b * sample_rate / hopsize))
245 |                 for a, b in bounds]
246 |
247 |     def save(self, filename: Union[str, bytes, os.PathLike]) -> None:
248 |         """Save alignment to a json or TextGrid file
249 |
250 |         Arguments
251 |             filename
252 |                 The location on disk to save the phoneme alignment
253 |         """
254 |         if os.path.dirname(filename):
255 |             os.makedirs(os.path.dirname(filename), exist_ok=True)
256 |         if isinstance(filename, Path):
257 |             filename = str(filename)
258 |         extension = filename.split('.')[-1]
259 |         if extension == 'json':
260 |             self.save_json(filename)
261 |         elif extension.lower() == 'textgrid':
262 |             self.save_textgrid(filename)
263 |         else:
264 |             raise ValueError(
265 |                 f'No save routine for files with extension {extension}')
266 |
267 |     def start(self) -> float:
268 |         """Retrieve the start time of the alignment in seconds
269 |
270 |         Returns
271 |             The start time in seconds
272 |         """
273 |         return self._words[0].start()
274 |
275 |     def update(
276 |         self,
277 |         idx: int = 0,
278 |         durations: Optional[List[float]] = None,
279 |         start: Optional[float] = None
280 |     ) -> None:
281 |         """Update alignment starting from phoneme index idx
282 |
283 |         Arguments
284 |             idx
285 |                 The index of the first phoneme whose duration is being updated
286 |             durations
287 |                 The new phoneme durations, starting from idx
288 |             start
289 |                 The start time of the alignment
290 |         """
291 |         # If durations are not given, just update phoneme start and end times
292 |         durations = [] if durations is None else durations
293 |
294 |         # Word start time (in seconds) and phoneme start index
295 |         start = self.start() if start is None else start
296 |         start_phoneme = 0
297 |
298 |         # Update each word
299 |         for word in self:
300 |             end_phoneme = start_phoneme + len(word)
301 |
302 |             # Update phoneme alignment of this word
303 |             word = self.update_word(
304 |                 word, idx, durations, start, start_phoneme, end_phoneme)
305 |
306 |             start = word.end()
307 |             start_phoneme += len(word)
308 |
309 |     def words(self) -> List[pypar.Word]:
310 |         """Retrieve the words in the alignment
311 |
312 |         Returns
313 |             The words in the alignment
314 |         """
315 |         return self._words
316 |
317 |     def word_at_time(self, time: float) -> Optional[pypar.Word]:
318 |         """Retrieve the word spoken at specified time
319 |
320 |         Arguments
321 |             time
322 |                 Time in seconds
323 |
324 |         Returns
325 |             The word spoken at the specified time
326 |         """
327 |         for word in self:
328 |             if word.start() <= time <= word.end():
329 |                 return word
330 |         return None
331 |
332 |     def word_bounds(
333 |         self,
334 |         sample_rate: int,
335 |         hopsize: int = 1,
336 |         silences: bool = False
337 |     ) -> List[Tuple[int, int]]:
338 |         """Retrieve the start and end frame index of each word
339 |
340 |         Arguments
341 |             sample_rate
342 |                 The audio sampling rate
343 |             hopsize
344 |                 The number of samples between successive frames
345 |             silences
346 |                 Whether to include silences as words
347 |
348 |         Returns
349 |             The start and end indices of the words
350 |         """
351 |         words = [
352 |             word for word in self if str(word) != pypar.SILENCE or silences]
353 |         bounds = [(word.start(), word.end()) for word in words]
354 |         return [(int(a * sample_rate / hopsize),
355 |                  int(b * sample_rate / hopsize))
356 |                 for a, b in bounds]
357 |
358 |     ###########################################################################
359 |     # Utilities
360 |     ###########################################################################
361 |
362 |     def json(self):
363 |         """Convert to json format"""
364 |         words = []
365 |         for word in self._words:
366 |
367 |             # Convert phonemes to list
368 |             phonemes = [[str(phoneme), phoneme.start(), phoneme.end()]
369 |                         for phoneme in word]
370 |
371 |             # Convert word to dict format
372 |             words.append({'alignedWord': str(word),
373 |                           'start': word.start(),
374 |                           'end': word.end(),
375 |                           'phonemes': phonemes})
376 |
377 |         return {'words': words}
378 |
379 |     def line_is_valid(self, line):
380 |         """Check if a line of a mlf file represents a phoneme"""
381 |         line = line.strip().split()
382 |         if not line:
383 |             return False
384 |         return len(line) in [4, 5]
385 |
386 |     def load(self, file):
387 |         """Load alignment from file"""
388 |         extension = file.split('.')[-1]
389 |         if extension == 'mlf':
390 |             return self.load_mlf(file)
391 |         if extension == 'json':
392 |             return self.load_json(file)
393 |         if extension.lower() == 'textgrid':
394 |             return self.load_textgrid(file)
395 |         raise ValueError(
396 |             f'No alignment representation for file extension {extension}')
397 |
398 |     def load_json(self, filename):
399 |         """Load alignment from json file"""
400 |         # Load from json file
401 |         with open(filename) as file:
402 |             return self.parse_json(json.load(file))
403 |
404 |     def load_mlf(self, filename):
405 |         """Load from mlf file"""
406 |         # Load file from disk
407 |         with open(filename) as file:
408 |             # Read in phoneme alignment
409 |             lines = [Line(line) for line in file.readlines()
410 |                      if self.line_is_valid(line)]
411 |
412 |         # Remove silence tokens with 0 duration
413 |         lines = [line for line in lines if line.start < line.end]
414 |
415 |         # Extract words and phonemes
416 |         phonemes = []
417 |         words = []
418 |         for line in lines:
419 |
420 |             # Start new word
421 |             if line.word is not None:
422 |
423 |                 # Add word that just finished
424 |                 if phonemes:
425 |                     words.append(pypar.Word(word, phonemes))
426 |                     phonemes = []
427 |
428 |                 word = line.word
429 |
430 |             # Add a phoneme
431 |             phonemes.append(pypar.Phoneme(line.phoneme, line.start, line.end))
432 |
433 |         # Handle last word
434 |         if phonemes:
435 |             words.append(pypar.Word(word, phonemes))
436 |
437 |         return words
438 |
439 |     def load_textgrid(self, filename):
440 |         """Load from textgrid file"""
441 |         # Load file
442 |         grid = pypar.textgrid.TextGrid.fromFile(filename)
443 |
444 |         # Get phoneme and word representations
445 |         if 'word' in grid[0].name and 'phon' in grid[1].name:
446 |             word_tier, phon_tier = grid[0], grid[1]
447 |         elif 'phon' in grid[0].name and 'word' in grid[1].name:
448 |             phon_tier, word_tier = grid[0], grid[1]
449 |         else:
450 |             raise ValueError(
451 |                 'Cannot determine which TextGrid tiers ' +
452 |                 'correspond to words and phonemes')
453 |
454 |         # Iterate over words
455 |         words = []
456 |         phon_idx = 0
457 |         for word in word_tier:
458 |
459 |             # Get all phonemes for this word
460 |             phonemes = []
461 |             while (
462 |                 phon_idx < len(phon_tier) and
463 |                 phon_tier[phon_idx].maxTime <= word.maxTime
464 |             ):
465 |                 phonemes.append(
466 |                     pypar.Phoneme(
467 |                         phon_tier[phon_idx].mark,
468 |                         phon_tier[phon_idx].minTime,
469 |                         phon_tier[phon_idx].maxTime))
470 |                 phon_idx += 1
471 |
472 |             # Add finished word
473 |             words.append(pypar.Word(word.mark, phonemes))
474 |
475 |         return words
476 |
477 |     def parse_json(self, alignment):
478 |         """Construct word list from json representation"""
479 |         words = []
480 |         for word in alignment['words']:
481 |             try:
482 |
483 |                 # Add a word
484 |                 phonemes = [
485 |                     pypar.Phoneme(*phoneme) for phoneme in word['phonemes']]
486 |                 words.append(pypar.Word(word['alignedWord'], phonemes))
487 |
488 |             except KeyError:
489 |
490 |                 # Add a silence
491 |                 phonemes = [
492 |                     pypar.Phoneme(pypar.SILENCE, word['start'], word['end'])]
493 |                 words.append(pypar.Word(pypar.SILENCE, phonemes))
494 |
495 |         return words
496 |
497 |     def save_json(self, filename):
498 |         """Save alignment as json"""
499 |         with open(filename, 'w', encoding='utf-8') as file:
500 |             json.dump(self.json(), file, ensure_ascii=False, indent=4)
501 |
502 |     def save_textgrid(self, filename):
503 |         """Save alignment as textgrid"""
504 |         # Construct phoneme tier
505 |         phon_tier = pypar.textgrid.IntervalTier('phone')
506 |         for phoneme in self.phonemes():
507 |             phon_tier.add(phoneme.start(), phoneme.end(), str(phoneme))
508 |
509 |         # Construct word tier
510 |         word_tier = pypar.textgrid.IntervalTier('word')
511 |         for word in self:
512 |             word_tier.add(word.start(), word.end(), str(word))
513 |
514 |         # Save textgrid
515 |         pypar.textgrid.TextGrid([phon_tier, word_tier]).write(filename)
516 |
517 | def update_word(
518 | self,
519 | word,
520 | idx,
521 | durations,
522 | start,
523 | start_phoneme,
524 | end_phoneme):
525 | """Update the phoneme alignment of one word"""
526 | # All phonemes beyond (and including) idx must be updated
527 | if end_phoneme > idx:
528 |
529 | # Retrieve current phoneme durations for word
530 | word_durations = [phoneme.duration() for phoneme in word]
531 |
532 | # The first len(durations) phonemes use new durations
533 | if start_phoneme - idx < len(durations) and end_phoneme - idx > 0:
534 |
535 | # Get indices into durations for copy/paste operation
536 | src_start_idx = max(0, start_phoneme - idx)
537 | src_end_idx = min(len(durations), end_phoneme - idx)
538 | src = durations[src_start_idx:src_end_idx]
539 |
540 | # Case 1: replace all phonemes in word
541 | if len(src) == len(word_durations):
542 | dst_start_idx, dst_end_idx = 0, len(word_durations)
543 |
544 | # Case 2: replace right-most phonemes in word
545 | elif idx > start_phoneme and len(src) == end_phoneme - idx:
546 | dst_start_idx = len(word_durations) - len(src)
547 | dst_end_idx = len(word_durations)
548 |
549 | # Case 3: replace left-most phonemes in word
550 | elif idx <= start_phoneme:
551 | dst_start_idx = 0
552 | dst_end_idx = len(word_durations) - len(src)
553 |
554 | # Case 4: replace phonemes in center of word
555 | else:
556 | dst_start_idx = -(start_phoneme - idx)
557 | dst_end_idx = dst_start_idx + len(src)
558 |
559 | # Perform copy/paste on duration vector
560 | word_durations[dst_start_idx:dst_end_idx] = \
561 | durations[src_start_idx:src_end_idx]
562 |
563 | # Get new durations for word
564 | word.update(start, word_durations)
565 |
566 | return word
567 |
568 | def validate(self):
569 | """Ensures that adjacent start/stop times are valid by adding silence"""
570 | i = 0
571 | start = 0.
572 | while i < len(self) - 1:
573 |
574 | # Get start and end times between words
575 | end = self[i].start()
576 |
577 | # Patch gap with silence
578 | if end - start > 1e-3:
579 |
580 | # Extend existing silence if possible
581 | if str(self[i]) == pypar.SILENCE:
582 | self[i][0]._start = start
583 | else:
584 | word = pypar.Word(
585 | pypar.SILENCE,
586 | [pypar.Phoneme(pypar.SILENCE, start, end)])
587 | self._words.insert(i, word)
588 | i += 1
589 |
590 | i += 1
591 | start = self[i].end()
592 |
593 | # Phoneme gap validation
594 | for word in self:
595 | word.validate()
596 |
597 |
598 | ###############################################################################
599 | # Utilities
600 | ###############################################################################
601 |
602 |
603 | class Line:
604 | """One line of an HTK mlf file"""
605 |
606 | def __init__(self, line):
607 | line = line.strip().split()
608 |
609 | if len(line) == 4:
610 | start, end, self.phoneme, _ = line
611 | self.word = None
612 | else:
613 | start, end, self.phoneme, _, self.word = line
614 |
615 | self.start = float(start) / 10000000.
616 | self.end = float(end) / 10000000.
617 |
--------------------------------------------------------------------------------
/pypar/compare.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | ###############################################################################
5 | # Alignment comparisons
6 | ###############################################################################
7 |
8 |
9 | def per_frame_rate(
10 | alignment_a,
11 | alignment_b,
12 | sample_rate,
13 | hopsize,
14 | frames=None):
15 | """Compute the per-frame rate difference between alignments A and B
16 |
17 | Arguments
18 | alignment_a : Alignment
19 | The source alignment
20 | alignment_b : Alignment
21 | The target alignment
22 | sample_rate : int
23 | The audio sampling rate
24 | hopsize : int
25 | The number of samples between successive frames
26 | frames : int or None
27 | The number of frames of audio. May vary based on padding. If None, inferred from the end time of alignment A.
28 |
29 | Returns
30 | rates : list[float]
31 | The frame-wise relative speed of alignment B to alignment A
32 | """
33 | # Create dict mapping phoneme to relative rate
34 | rates_per_phoneme = per_phoneme_rate(alignment_a, alignment_b)
35 | dict_keys = [phoneme_tuple(phoneme) for phoneme in alignment_a.phonemes()]
36 | rate_map = dict(zip(dict_keys, rates_per_phoneme))
37 |
38 | # Query the dict once per frame (frames are hopsize samples apart)
39 | if frames is None:
40 | frames = 1 + int(round(alignment_a.end(), 6) * sample_rate / hopsize)
41 | return [rate_map[phoneme_tuple(alignment_a.phoneme_at_time(t))]
42 | for t in np.linspace(0., alignment_a.end(), frames)]
43 |
44 |
45 | def per_phoneme_rate(alignment_a, alignment_b):
46 | """Compute the per-phoneme rate difference between alignments A and B
47 |
48 | Arguments
49 | alignment_a : Alignment
50 | The source alignment
51 | alignment_b : Alignment
52 | The target alignment
53 |
54 | Returns
55 | rates : list[float]
56 | The phoneme-wise relative speed of alignment B to alignment A
57 | """
58 | # Error check alignments
59 | if len(alignment_a.phonemes()) != len(alignment_b.phonemes()):
60 | raise ValueError('Alignments must have same number of phonemes')
61 |
62 | iterator = zip(alignment_a.phonemes(), alignment_b.phonemes())
63 | return [target.duration() / source.duration()
64 | for source, target in iterator]
65 |
66 |
67 | ###############################################################################
68 | # Utilities
69 | ###############################################################################
70 |
71 |
72 | def phoneme_tuple(phoneme):
73 | """Convert phoneme to hashable tuple representation
74 |
75 | Arguments
76 | phoneme - The phoneme to convert
77 |
78 | Returns
79 | tuple(float, float, string)
80 | The phoneme represented as a tuple
81 | """
82 | return (phoneme.start(), phoneme.end(), str(phoneme))
83 |
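
The rate computation in compare.py can be illustrated with a minimal, self-contained sketch. It does not import pypar: phonemes are modelled as plain `(label, start, end)` tuples, and the helper names `duration` and `frame_count` are hypothetical stand-ins for illustration only. The ratio and the default frame count follow the expressions in `per_phoneme_rate` and `per_frame_rate` above.

```python
# Sketch only: phonemes are (label, start, end) tuples, not pypar.Phoneme objects.

def duration(phoneme):
    """Duration in seconds of a (label, start, end) tuple (hypothetical helper)."""
    _, start, end = phoneme
    return end - start


def per_phoneme_rate(source, target):
    """Ratio of target duration to source duration for each phoneme pair."""
    if len(source) != len(target):
        raise ValueError('Alignments must have same number of phonemes')
    return [duration(t) / duration(s) for s, t in zip(source, target)]


def frame_count(end_time, sample_rate, hopsize):
    """Default number of frames, mirroring the expression in per_frame_rate."""
    return 1 + int(round(end_time, 6) * sample_rate / hopsize)


source = [('dh', 0.0, 0.5), ('ax', 0.5, 1.0)]
target = [('dh', 0.0, 1.0), ('ax', 1.0, 1.25)]

rates = per_phoneme_rate(source, target)   # [2.0, 0.5]
frames = frame_count(1.25, 16000, 160)     # 126
```

A value above 1 means the phoneme lasts longer in the target alignment than in the source. A source phoneme with zero duration would raise `ZeroDivisionError` in this sketch.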
--------------------------------------------------------------------------------
/pypar/phoneme.py:
--------------------------------------------------------------------------------
1 | ###############################################################################
2 | # Phoneme
3 | ###############################################################################
4 |
5 |
6 | class Phoneme:
7 | """Aligned phoneme representation"""
8 |
9 | def __init__(self, phoneme: str, start: float, end: float) -> None:
10 | """Create phoneme
11 |
12 | Arguments
13 | phoneme
14 | The phoneme
15 | start
16 | The start time in seconds
17 | end
18 | The end time in seconds
19 | """
20 | self.phoneme = phoneme
21 | self._start = start
22 | self._end = end
23 |
24 | def __eq__(self, other) -> bool:
25 | """Equality comparison for phonemes
26 |
27 | Arguments
28 | other
29 | The phoneme to compare to
30 |
31 | Returns
32 | Whether the phonemes are equal
33 | """
34 | return \
35 | str(self) == str(other) and \
36 | abs(self._start - other._start) < 1e-5 and \
37 | abs(self._end - other._end) < 1e-5
38 |
39 | def __str__(self) -> str:
40 | """Retrieve the phoneme text
41 |
42 | Returns
43 | The phoneme
44 | """
45 | return self.phoneme
46 |
47 | def duration(self) -> float:
48 | """Retrieve the phoneme duration
49 |
50 | Returns
51 | The duration in seconds
52 | """
53 | return self._end - self._start
54 |
55 | def end(self) -> float:
56 | """Retrieve the end time of the phoneme in seconds
57 |
58 | Returns
59 | The end time in seconds
60 | """
61 | return self._end
62 |
63 | def start(self) -> float:
64 | """Retrieve the start time of the phoneme in seconds
65 |
66 | Returns
67 | The start time in seconds
68 | """
69 | return self._start
70 |
--------------------------------------------------------------------------------
/pypar/textgrid.py:
--------------------------------------------------------------------------------
1 | import re
2 |
3 |
4 | ###############################################################################
5 | # Textgrid
6 | ###############################################################################
7 |
8 |
9 | class TextGrid:
10 |
11 | def __init__(self, tiers=None):
12 | self.tiers = [] if tiers is None else tiers
13 |
14 | def __len__(self):
15 | return len(self.tiers)
16 |
17 | def __getitem__(self, i):
18 | return self.tiers[i]
19 |
20 | def read(self, file):
21 | # Open file
22 | with open(file) as file:
23 |
24 | # Parse header
25 | _, short = parse_header(file)
26 | first_line_beside_header = file.readline()
27 | try:
28 | parse_line(first_line_beside_header, short)
29 | except Exception:
30 | short = True
31 | parse_line(first_line_beside_header, short)
32 | parse_line(file.readline(), short)
33 | file.readline()
34 | if short:
35 | tiers = int(file.readline().strip())
36 | else:
37 | tiers = int(file.readline().strip().split()[2])
38 | if not short:
39 | file.readline()
40 |
41 | # Iterate over tiers
42 | for _ in range(tiers):
43 |
44 | # Maybe flush extra line
45 | if not short:
46 | file.readline()
47 |
48 | # Create interval tier
49 | if parse_line(file.readline(), short) == 'IntervalTier':
50 |
51 | # Initialize
52 | name = parse_line(file.readline(), short)
53 | tier = IntervalTier(name)
54 |
55 | # Flush tier min/max time
56 | parse_line(file.readline(), short)
57 | parse_line(file.readline(), short)
58 |
59 | # Populate
60 | for _ in range(int(parse_line(file.readline(), short))):
61 | if not short:
62 | file.readline().rstrip().split()
63 | minTime = parse_line(file.readline(), short)
64 | maxTime = parse_line(file.readline(), short)
65 | mark = parseMark(file, short)
66 | if minTime < maxTime:
67 | tier.add(minTime, maxTime, mark)
68 | self.tiers.append(tier)
69 |
70 | else:
71 | raise ValueError('TextGrid error')
72 |
73 | def write(self, file):
74 | with open(file, 'w') as file:
75 | # Write header
76 | file.write('File type = "ooTextFile"\n')
77 | file.write('Object class = "TextGrid"\n\n')
78 | file.write('xmin = {0}\n'.format(self.tiers[0][0].minTime))
79 | file.write('xmax = {0}\n'.format(self.tiers[0][-1].maxTime))
80 | file.write('tiers? \n')
81 | file.write('size = {0}\n'.format(len(self)))
82 | file.write('item []:\n')
83 |
84 | # Write interval tiers
85 | for i, tier in enumerate(self.tiers, 1):
86 | file.write('\titem [{0}]:\n'.format(i))
87 | file.write('\t\tclass = "IntervalTier"\n')
88 | file.write('\t\tname = "{0}"\n'.format(tier.name))
89 | file.write('\t\txmin = {0}\n'.format(tier[0].minTime))
90 | file.write('\t\txmax = {0}\n'.format(tier[-1].maxTime))
91 | file.write(
92 | '\t\tintervals: size = {0}\n'.format(len(tier.intervals)))
93 |
94 | # Write intervals
95 | for j, interval in enumerate(tier.intervals, 1):
96 | file.write('\t\t\tintervals [{0}]:\n'.format(j))
97 | file.write('\t\t\t\txmin = {0}\n'.format(interval.minTime))
98 | file.write('\t\t\t\txmax = {0}\n'.format(interval.maxTime))
99 | mark = interval.mark.replace('"', '""')
100 | file.write('\t\t\t\ttext = "{0}"\n'.format(mark))
101 |
102 | @classmethod
103 | def fromFile(cls, file):
104 | textgrid = cls()
105 | textgrid.read(file)
106 | return textgrid
107 |
108 |
109 | ###############################################################################
110 | # Textgrid interval
111 | ###############################################################################
112 |
113 |
114 | class Interval:
115 |
116 | def __init__(self, minTime, maxTime, mark):
117 | if minTime >= maxTime:
118 | raise ValueError(minTime, maxTime)
119 | self.minTime = minTime
120 | self.maxTime = maxTime
121 | self.mark = mark
122 |
123 |
124 | class IntervalTier:
125 |
126 | def __init__(self, name):
127 | self.name = name
128 | self.intervals = []
129 |
130 | def __iter__(self):
131 | return iter(self.intervals)
132 |
133 | def __len__(self):
134 | return len(self.intervals)
135 |
136 | def __getitem__(self, i):
137 | return self.intervals[i]
138 |
139 | def add(self, minTime, maxTime, mark):
140 | self.intervals.append(Interval(minTime, maxTime, mark))
141 |
142 |
143 | ###############################################################################
144 | # Utilities
145 | ###############################################################################
146 |
147 |
148 | def parse_header(source):
149 | header = source.readline()
150 | m = re.match(r'File type = "([\w ]+)"', header)
151 | short = 'short' in m.groups()[0]
152 | file_type = parse_line(source.readline(), short)
153 | source.readline()
154 | return file_type, short
155 |
156 |
157 | def parse_line(line, short):
158 | line = line.strip()
159 | if short:
160 | if '"' in line:
161 | return line[1:-1]
162 | return float(line)
163 | if '"' in line:
164 | m = re.match(r'.+? = "(.*)"', line)
165 | return m.groups()[0]
166 | m = re.match(r'.+? = (.*)', line)
167 | return float(m.groups()[0])
168 |
169 |
170 | def parseMark(text, short):
171 | line = text.readline()
172 |
173 | # read until the number of double-quotes is even
174 | while line.count('"') % 2:
175 | next_line = text.readline()
176 | line += next_line
177 |
178 | if short:
179 | pattern = r'^"(.*?)"\s*$'
180 | else:
181 | pattern = r'^\s*(text|mark) = "(.*?)"\s*$'
182 | entry = re.match(pattern, line, re.DOTALL)
183 |
184 | return entry.groups()[-1].replace('""', '"')
185 |
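
The quote handling in `write` and `parseMark` can be checked with a small round-trip sketch. It copies the escaping and the long-format pattern from the code above into two hypothetical helpers, `write_mark` and `read_mark`, and uses `io.StringIO` in place of a file. This is a sketch under those assumptions, not a replacement for the module's own functions.

```python
import io
import re


def write_mark(mark):
    """Format one long-format text line, doubling embedded double quotes."""
    return '\t\t\t\ttext = "{0}"\n'.format(mark.replace('"', '""'))


def read_mark(stream):
    """Read one mark, joining lines until the double-quote count is even."""
    line = stream.readline()
    while line.count('"') % 2:
        line += stream.readline()
    entry = re.match(r'^\s*(text|mark) = "(.*?)"\s*$', line, re.DOTALL)
    # Undo the quote doubling applied when writing
    return entry.groups()[-1].replace('""', '"')


original = 'say "hi"\nthere'
recovered = read_mark(io.StringIO(write_mark(original)))
```

Here the mark contains both embedded quotes and a newline, so the first line read has an odd number of quotes and the loop pulls in the next line before matching.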
--------------------------------------------------------------------------------
/pypar/word.py:
--------------------------------------------------------------------------------
1 | from typing import List, Optional
2 |
3 | import pypar
4 |
5 |
6 | ###############################################################################
7 | # Word representation
8 | ###############################################################################
9 |
10 |
11 | class Word:
12 | """Aligned word representation"""
13 |
14 | def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None:
15 | """Create word
16 |
17 | Arguments
18 | word
19 | The word
20 | phonemes
21 | The phonemes in the word
22 | """
23 | self.word = word
24 | self.phonemes = phonemes
25 |
26 | def __eq__(self, other) -> bool:
27 | """Equality comparison for words
28 |
29 | Arguments
30 | other
31 | The word to compare to
32 |
33 | Returns
34 | Whether the words are the same
35 | """
36 | return \
37 | str(self) == str(other) and \
38 | len(self) == len(other) and \
39 | all(phoneme == other_phoneme
40 | for phoneme, other_phoneme in zip(self, other))
41 |
42 | def __getitem__(self, idx: int) -> pypar.Phoneme:
43 | """Retrieve the idxth phoneme
44 |
45 | Arguments
46 | idx
47 | The index of the phoneme to retrieve
48 |
49 | Returns
50 | The phoneme at index idx
51 | """
52 | return self.phonemes[idx]
53 |
54 | def __len__(self) -> int:
55 | """Retrieve the number of phonemes
56 |
57 | Returns
58 | The number of phonemes
59 | """
60 | return len(self.phonemes)
61 |
62 | def __str__(self) -> str:
63 | """Retrieve the word text
64 |
65 | Returns
66 | The word text
67 | """
68 | return self.word
69 |
70 | def duration(self) -> float:
71 | """Retrieve the word duration in seconds
72 |
73 | Returns
74 | The duration in seconds
75 | """
76 | return self.end() - self.start()
77 |
78 | def end(self) -> float:
79 | """Retrieve the end time of the word in seconds
80 |
81 | Returns
82 | The end time in seconds
83 | """
84 | return self.phonemes[-1].end()
85 |
86 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]:
87 | """Retrieve the phoneme at the specified time
88 |
89 | Arguments
90 | time
91 | Time in seconds
92 |
93 | Returns
94 | The phoneme at the given time (or None if time is out of bounds)
95 | """
96 | for phoneme in self.phonemes:
97 | if phoneme.start() <= time <= phoneme.end():
98 | return phoneme
99 | return None
100 |
101 | def start(self) -> float:
102 | """Retrieve the start time of the word in seconds
103 |
104 | Returns
105 | The start time in seconds
106 | """
107 | return self.phonemes[0].start()
108 |
109 | ###########################################################################
110 | # Utilities
111 | ###########################################################################
112 |
113 | def update(self, start, durations=None):
114 | """Update the word with new start time and phoneme durations
115 |
116 | Arguments
117 | start : float
118 | The new start time of the word
119 | durations : list[float] or None
120 | The new phoneme durations
121 | """
122 | # Use current durations if None provided
123 | if durations is None:
124 | durations = [phoneme.duration() for phoneme in self.phonemes]
125 |
126 | # Update phonemes
127 | phoneme_start = start
128 | for phoneme, duration in zip(self.phonemes, durations):
129 | phoneme._start = phoneme_start
130 | phoneme._end = phoneme_start + duration
131 | phoneme_start = phoneme._end
132 |
133 | def validate(self):
134 | """Ensures that adjacent start/end times are valid by adding silence"""
135 | i = 0
136 | while i < len(self) - 1:
137 |
138 | # Get start and end times between phonemes
139 | start = self[i].end()
140 | end = self[i + 1].start()
141 |
142 | # Patch gap with silence
143 | if end - start > 1e-4:
144 | phoneme = pypar.Phoneme(pypar.SILENCE, start, end)
145 | self.phonemes.insert(i + 1, phoneme)
146 | i += 1
147 |
148 | i += 1
149 |
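
The gap patching done by `Word.validate` can be sketched on plain tuples. The function name `patch_gaps`, the `(label, start, end)` tuple layout, and the `SILENCE` value below are assumptions for illustration; the real value of `pypar.SILENCE` is not shown in this section. The loop and the `1e-4` tolerance follow the method above.

```python
# Hypothetical silence label; the actual pypar.SILENCE value is not shown here.
SILENCE = '<silent>'


def patch_gaps(phonemes, tolerance=1e-4):
    """Return a copy of phonemes with silence inserted into any gap."""
    patched = list(phonemes)
    i = 0
    while i < len(patched) - 1:
        # End of the current phoneme and start of the next one
        start = patched[i][2]
        end = patched[i + 1][1]

        # Patch gap with silence, then skip over the inserted phoneme
        if end - start > tolerance:
            patched.insert(i + 1, (SILENCE, start, end))
            i += 1

        i += 1
    return patched


phonemes = [('hh', 0.0, 0.1), ('er', 0.3, 0.5)]
patched = patch_gaps(phonemes)
```

Unlike `Word.validate`, this sketch returns a new list instead of modifying the phonemes in place, which keeps the example easy to test.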
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 |
4 | # Description
5 | with open('README.md') as file:
6 | long_description = file.read()
7 |
8 |
9 | setup(
10 | name='pypar',
11 | version='0.0.6',
12 | description='Python phoneme alignment representation',
13 | author='Max Morrison',
14 | author_email='maxrmorrison@gmail.com',
15 | url='https://github.com/maxrmorrison/pypar',
16 | install_requires=['numpy'],
17 | packages=['pypar'],
18 | long_description=long_description,
19 | long_description_content_type='text/markdown',
20 | keywords=['align', 'duration', 'phoneme', 'speech'],
21 | classifiers=['License :: OSI Approved :: MIT License'],
22 | license='MIT')
23 |
--------------------------------------------------------------------------------
/test/assets/float.json:
--------------------------------------------------------------------------------
1 | {
2 | "words": [
3 | {
4 | "alignedWord": "",
5 | "start": 0.0,
6 | "end": 0.0245,
7 | "phonemes": [
8 | [
9 | "",
10 | 0.0,
11 | 0.0245
12 | ]
13 | ]
14 | },
15 | {
16 | "alignedWord": "the",
17 | "start": 0.0245,
18 | "end": 0.112,
19 | "phonemes": [
20 | [
21 | "dh",
22 | 0.0245,
23 | 0.08075
24 | ],
25 | [
26 | "ax",
27 | 0.08075,
28 | 0.112
29 | ]
30 | ]
31 | },
32 | {
33 | "alignedWord": "girl",
34 | "start": 0.112,
35 | "end": 0.36825,
36 | "phonemes": [
37 | [
38 | "g",
39 | 0.112,
40 | 0.18075
41 | ],
42 | [
43 | "er",
44 | 0.18075,
45 | 0.29325
46 | ],
47 | [
48 | "l",
49 | 0.29325,
50 | 0.36825
51 | ]
52 | ]
53 | },
54 | {
55 | "alignedWord": "faced",
56 | "start": 0.36825,
57 | "end": 0.7245,
58 | "phonemes": [
59 | [
60 | "f",
61 | 0.36825,
62 | 0.4745
63 | ],
64 | [
65 | "ey",
66 | 0.4745,
67 | 0.60575
68 | ],
69 | [
70 | "s",
71 | 0.60575,
72 | 0.6745
73 | ],
74 | [
75 | "t",
76 | 0.6745,
77 | 0.7245
78 | ]
79 | ]
80 | },
81 | {
82 | "alignedWord": "him",
83 | "start": 0.7245,
84 | "end": 0.91825,
85 | "phonemes": [
86 | [
87 | "hh",
88 | 0.7245,
89 | 0.7495
90 | ],
91 | [
92 | "ih",
93 | 0.7495,
94 | 0.7995
95 | ],
96 | [
97 | "m",
98 | 0.7995,
99 | 0.91825
100 | ]
101 | ]
102 | },
103 | {
104 | "alignedWord": "",
105 | "start": 0.91825,
106 | "end": 1.13075,
107 | "phonemes": [
108 | [
109 | "",
110 | 0.91825,
111 | 1.13075
112 | ]
113 | ]
114 | },
115 | {
116 | "alignedWord": "her",
117 | "start": 1.13075,
118 | "end": 1.29325,
119 | "phonemes": [
120 | [
121 | "hh",
122 | 1.13075,
123 | 1.1995
124 | ],
125 | [
126 | "er",
127 | 1.1995,
128 | 1.29325
129 | ]
130 | ]
131 | },
132 | {
133 | "alignedWord": "eyes",
134 | "start": 1.29325,
135 | "end": 1.48075,
136 | "phonemes": [
137 | [
138 | "ay",
139 | 1.29325,
140 | 1.3995
141 | ],
142 | [
143 | "z",
144 | 1.3995,
145 | 1.48075
146 | ]
147 | ]
148 | },
149 | {
150 | "alignedWord": "shining",
151 | "start": 1.48075,
152 | "end": 1.90575,
153 | "phonemes": [
154 | [
155 | "sh",
156 | 1.48075,
157 | 1.59325
158 | ],
159 | [
160 | "ay",
161 | 1.59325,
162 | 1.71825
163 | ],
164 | [
165 | "n",
166 | 1.71825,
167 | 1.76825
168 | ],
169 | [
170 | "ih",
171 | 1.76825,
172 | 1.83075
173 | ],
174 | [
175 | "ng",
176 | 1.83075,
177 | 1.90575
178 | ]
179 | ]
180 | },
181 | {
182 | "alignedWord": "with",
183 | "start": 1.90575,
184 | "end": 2.00575,
185 | "phonemes": [
186 | [
187 | "w",
188 | 1.90575,
189 | 1.95575
190 | ],
191 | [
192 | "ih",
193 | 1.95575,
194 | 1.987
195 | ],
196 | [
197 | "dh",
198 | 1.987,
199 | 2.00575
200 | ]
201 | ]
202 | },
203 | {
204 | "alignedWord": "sudden",
205 | "start": 2.00575,
206 | "end": 2.38075,
207 | "phonemes": [
208 | [
209 | "s",
210 | 2.00575,
211 | 2.13075
212 | ],
213 | [
214 | "ah",
215 | 2.13075,
216 | 2.212
217 | ],
218 | [
219 | "d",
220 | 2.212,
221 | 2.24325
222 | ],
223 | [
224 | "ax",
225 | 2.24325,
226 | 2.287
227 | ],
228 | [
229 | "n",
230 | 2.287,
231 | 2.38075
232 | ]
233 | ]
234 | },
235 | {
236 | "alignedWord": "fear",
237 | "start": 2.38075,
238 | "end": 2.80575,
239 | "phonemes": [
240 | [
241 | "f",
242 | 2.38075,
243 | 2.50575
244 | ],
245 | [
246 | "ih",
247 | 2.50575,
248 | 2.59325
249 | ],
250 | [
251 | "r",
252 | 2.59325,
253 | 2.80575
254 | ]
255 | ]
256 | },
257 | {
258 | "alignedWord": "",
259 | "start": 2.80575,
260 | "end": 2.84325,
261 | "phonemes": [
262 | [
263 | "",
264 | 2.80575,
265 | 2.84325
266 | ]
267 | ]
268 | }
269 | ]
270 | }
271 |
--------------------------------------------------------------------------------
/test/assets/test.TextGrid:
--------------------------------------------------------------------------------
1 | File type = "ooTextFile"
2 | Object class = "TextGrid"
3 |
4 | xmin = 0.0
5 | xmax = 5.429931972789116
6 | tiers?
7 | size = 2
8 | item []:
9 | item [1]:
10 | class = "IntervalTier"
11 | name = "phone"
12 | xmin = 0.0
13 | xmax = 5.429931972789116
14 | intervals: size = 54
15 | intervals [1]:
16 | xmin = 0.0
17 | xmax = 0.27188208616780046
18 | text = "sil"
19 | intervals [2]:
20 | xmin = 0.27188208616780046
21 | xmax = 0.4414965986394558
22 | text = "AY1"
23 | intervals [3]:
24 | xmin = 0.4414965986394558
25 | xmax = 0.6011337868480725
26 | text = "B"
27 | intervals [4]:
28 | xmin = 0.6011337868480725
29 | xmax = 0.6609977324263039
30 | text = "EH1"
31 | intervals [5]:
32 | xmin = 0.6609977324263039
33 | xmax = 0.7507936507936508
34 | text = "G"
35 | intervals [6]:
36 | xmin = 0.7507936507936508
37 | xmax = 0.8605442176870748
38 | text = "Y"
39 | intervals [7]:
40 | xmin = 0.8605442176870748
41 | xmax = 0.8904761904761904
42 | text = "AO1"
43 | intervals [8]:
44 | xmin = 0.8904761904761904
45 | xmax = 0.9303854875283446
46 | text = "R"
47 | intervals [9]:
48 | xmin = 0.9303854875283446
49 | xmax = 1.0501133786848071
50 | text = "P"
51 | intervals [10]:
52 | xmin = 1.0501133786848071
53 | xmax = 1.1399092970521538
54 | text = "AA1"
55 | intervals [11]:
56 | xmin = 1.1399092970521538
57 | xmax = 1.199773242630385
58 | text = "R"
59 | intervals [12]:
60 | xmin = 1.199773242630385
61 | xmax = 1.2396825396825393
62 | text = "D"
63 | intervals [13]:
64 | xmin = 1.2396825396825393
65 | xmax = 1.269614512471655
66 | text = "AH0"
67 | intervals [14]:
68 | xmin = 1.269614512471655
69 | xmax = 1.4990929705215414
70 | text = "N"
71 | intervals [15]:
72 | xmin = 1.4990929705215414
73 | xmax = 1.5888888888888886
74 | text = "sil"
75 | intervals [16]:
76 | xmin = 1.5888888888888886
77 | xmax = 1.7485260770975053
78 | text = "S"
79 | intervals [17]:
80 | xmin = 1.7485260770975053
81 | xmax = 1.818367346938775
82 | text = "EH1"
83 | intervals [18]:
84 | xmin = 1.818367346938775
85 | xmax = 1.8482993197278907
86 | text = "D"
87 | intervals [19]:
88 | xmin = 1.8482993197278907
89 | xmax = 1.8782312925170064
90 | text = "DH"
91 | intervals [20]:
92 | xmin = 1.8782312925170064
93 | xmax = 1.9081632653061218
94 | text = "AH0"
95 | intervals [21]:
96 | xmin = 1.9081632653061218
97 | xmax = 2.017913832199546
98 | text = "M"
99 | intervals [22]:
100 | xmin = 2.017913832199546
101 | xmax = 2.1875283446712013
102 | text = "AW1"
103 | intervals [23]:
104 | xmin = 2.1875283446712013
105 | xmax = 2.4269841269841264
106 | text = "S"
107 | intervals [24]:
108 | xmin = 2.4269841269841264
109 | xmax = 2.526757369614512
110 | text = "F"
111 | intervals [25]:
112 | xmin = 2.526757369614512
113 | xmax = 2.586621315192743
114 | text = "R"
115 | intervals [26]:
116 | xmin = 2.586621315192743
117 | xmax = 2.6863945578231285
118 | text = "AW1"
119 | intervals [27]:
120 | xmin = 2.6863945578231285
121 | xmax = 2.7263038548752827
122 | text = "N"
123 | intervals [28]:
124 | xmin = 2.7263038548752827
125 | xmax = 2.7961451247165523
126 | text = "IH0"
127 | intervals [29]:
128 | xmin = 2.7961451247165523
129 | xmax = 2.8759637188208607
130 | text = "NG"
131 | intervals [30]:
132 | xmin = 2.8759637188208607
133 | xmax = 2.935827664399092
134 | text = "B"
135 | intervals [31]:
136 | xmin = 2.935827664399092
137 | xmax = 2.995691609977323
138 | text = "AH1"
139 | intervals [32]:
140 | xmin = 2.995691609977323
141 | xmax = 3.075510204081631
142 | text = "T"
143 | intervals [33]:
144 | xmin = 3.075510204081631
145 | xmax = 3.1353741496598624
146 | text = "V"
147 | intervals [34]:
148 | xmin = 3.1353741496598624
149 | xmax = 3.1752834467120166
150 | text = "EH1"
151 | intervals [35]:
152 | xmin = 3.1752834467120166
153 | xmax = 3.2850340136054403
154 | text = "R"
155 | intervals [36]:
156 | xmin = 3.2850340136054403
157 | xmax = 3.3448979591836716
158 | text = "IY0"
159 | intervals [37]:
160 | xmin = 3.3448979591836716
161 | xmax = 3.4147392290249416
162 | text = "P"
163 | intervals [38]:
164 | xmin = 3.4147392290249416
165 | xmax = 3.444671201814057
166 | text = "AH0"
167 | intervals [39]:
168 | xmin = 3.444671201814057
169 | xmax = 3.5444444444444425
170 | text = "L"
171 | intervals [40]:
172 | xmin = 3.5444444444444425
173 | xmax = 3.644217687074828
174 | text = "AY1"
175 | intervals [41]:
176 | xmin = 3.644217687074828
177 | xmax = 3.6741496598639434
178 | text = "T"
179 | intervals [42]:
180 | xmin = 3.6741496598639434
181 | xmax = 3.7439909297052134
182 | text = "L"
183 | intervals [43]:
184 | xmin = 3.7439909297052134
185 | xmax = 3.8736961451247143
186 | text = "IY0"
187 | intervals [44]:
188 | xmin = 3.8736961451247143
189 | xmax = 4.093197278911562
190 | text = "sil"
191 | intervals [45]:
192 | xmin = 4.093197278911562
193 | xmax = 4.232879818594102
194 | text = "D"
195 | intervals [46]:
196 | xmin = 4.232879818594102
197 | xmax = 4.292743764172333
198 | text = "IH1"
199 | intervals [47]:
200 | xmin = 4.292743764172333
201 | xmax = 4.352607709750564
202 | text = "D"
203 | intervals [48]:
204 | xmin = 4.352607709750564
205 | xmax = 4.442403628117912
206 | text = "Y"
207 | intervals [49]:
208 | xmin = 4.442403628117912
209 | xmax = 4.482312925170066
210 | text = "UW1"
211 | intervals [50]:
212 | xmin = 4.482312925170066
213 | xmax = 4.641950113378682
214 | text = "S"
215 | intervals [51]:
216 | xmin = 4.641950113378682
217 | xmax = 4.681859410430836
218 | text = "P"
219 | intervals [52]:
220 | xmin = 4.681859410430836
221 | xmax = 4.851473922902492
222 | text = "IY1"
223 | intervals [53]:
224 | xmin = 4.851473922902492
225 | xmax = 5.0011337868480705
226 | text = "K"
227 | intervals [54]:
228 | xmin = 5.0011337868480705
229 | xmax = 5.429931972789116
230 | text = "sil"
231 | item [2]:
232 | class = "IntervalTier"
233 | name = "word"
234 | xmin = 0.0
235 | xmax = 5.429931972789116
236 | intervals: size = 18
237 | intervals [1]:
238 | xmin = 0.0
239 | xmax = 0.27188208616780046
240 | text = "sp"
241 | intervals [2]:
242 | xmin = 0.27188208616780046
243 | xmax = 0.4414965986394558
244 | text = "I"
245 | intervals [3]:
246 | xmin = 0.4414965986394558
247 | xmax = 0.7507936507936508
248 | text = "BEG"
249 | intervals [4]:
250 | xmin = 0.7507936507936508
251 | xmax = 0.9303854875283446
252 | text = "YOUR"
253 | intervals [5]:
254 | xmin = 0.9303854875283446
255 | xmax = 1.4990929705215414
256 | text = "PARDON"
257 | intervals [6]:
258 | xmin = 1.4990929705215414
259 | xmax = 1.5888888888888886
260 | text = "sp"
261 | intervals [7]:
262 | xmin = 1.5888888888888886
263 | xmax = 1.8482993197278907
264 | text = "SAID"
265 | intervals [8]:
266 | xmin = 1.8482993197278907
267 | xmax = 1.9081632653061218
268 | text = "THE"
269 | intervals [9]:
270 | xmin = 1.9081632653061218
271 | xmax = 2.4269841269841264
272 | text = "MOUSE"
273 | intervals [10]:
274 | xmin = 2.4269841269841264
275 | xmax = 2.8759637188208607
276 | text = "FROWNING"
277 | intervals [11]:
278 | xmin = 2.8759637188208607
279 | xmax = 3.075510204081631
280 | text = "BUT"
281 | intervals [12]:
282 | xmin = 3.075510204081631
283 | xmax = 3.3448979591836716
284 | text = "VERY"
285 | intervals [13]:
286 | xmin = 3.3448979591836716
287 | xmax = 3.8736961451247143
288 | text = "POLITELY"
289 | intervals [14]:
290 | xmin = 3.8736961451247143
291 | xmax = 4.093197278911562
292 | text = "sp"
293 | intervals [15]:
294 | xmin = 4.093197278911562
295 | xmax = 4.352607709750564
296 | text = "DID"
297 | intervals [16]:
298 | xmin = 4.352607709750564
299 | xmax = 4.482312925170066
300 | text = "YOU"
301 | intervals [17]:
302 | xmin = 4.482312925170066
303 | xmax = 5.0011337868480705
304 | text = "SPEAK"
305 | intervals [18]:
306 | xmin = 5.0011337868480705
307 | xmax = 5.429931972789116
308 | text = "sp"
309 |
--------------------------------------------------------------------------------
/test/assets/test.json:
--------------------------------------------------------------------------------
1 | {
2 | "words": [
3 | {
4 | "alignedWord": "",
5 | "start": 0.0,
6 | "end": 0.27188208616780046,
7 | "phonemes": [
8 | [
9 | "",
10 | 0.0,
11 | 0.27188208616780046
12 | ]
13 | ]
14 | },
15 | {
16 | "alignedWord": "I",
17 | "start": 0.27188208616780046,
18 | "end": 0.4414965986394558,
19 | "phonemes": [
20 | [
21 | "AY1",
22 | 0.27188208616780046,
23 | 0.4414965986394558
24 | ]
25 | ]
26 | },
27 | {
28 | "alignedWord": "BEG",
29 | "start": 0.4414965986394558,
30 | "end": 0.7507936507936508,
31 | "phonemes": [
32 | [
33 | "B",
34 | 0.4414965986394558,
35 | 0.6011337868480725
36 | ],
37 | [
38 | "EH1",
39 | 0.6011337868480725,
40 | 0.6609977324263039
41 | ],
42 | [
43 | "G",
44 | 0.6609977324263039,
45 | 0.7507936507936508
46 | ]
47 | ]
48 | },
49 | {
50 | "alignedWord": "YOUR",
51 | "start": 0.7507936507936508,
52 | "end": 0.9303854875283446,
53 | "phonemes": [
54 | [
55 | "Y",
56 | 0.7507936507936508,
57 | 0.8605442176870748
58 | ],
59 | [
60 | "AO1",
61 | 0.8605442176870748,
62 | 0.8904761904761904
63 | ],
64 | [
65 | "R",
66 | 0.8904761904761904,
67 | 0.9303854875283446
68 | ]
69 | ]
70 | },
71 | {
72 | "alignedWord": "PARDON",
73 | "start": 0.9303854875283446,
74 | "end": 1.4990929705215414,
75 | "phonemes": [
76 | [
77 | "P",
78 | 0.9303854875283446,
79 | 1.0501133786848071
80 | ],
81 | [
82 | "AA1",
83 | 1.0501133786848071,
84 | 1.1399092970521538
85 | ],
86 | [
87 | "R",
88 | 1.1399092970521538,
89 | 1.199773242630385
90 | ],
91 | [
92 | "D",
93 | 1.199773242630385,
94 | 1.2396825396825393
95 | ],
96 | [
97 | "AH0",
98 | 1.2396825396825393,
99 | 1.269614512471655
100 | ],
101 | [
102 | "N",
103 | 1.269614512471655,
104 | 1.4990929705215414
105 | ]
106 | ]
107 | },
108 | {
109 | "alignedWord": "",
110 | "start": 1.4990929705215414,
111 | "end": 1.5888888888888886,
112 | "phonemes": [
113 | [
114 | "",
115 | 1.4990929705215414,
116 | 1.5888888888888886
117 | ]
118 | ]
119 | },
120 | {
121 | "alignedWord": "SAID",
122 | "start": 1.5888888888888886,
123 | "end": 1.8482993197278907,
124 | "phonemes": [
125 | [
126 | "S",
127 | 1.5888888888888886,
128 | 1.7485260770975053
129 | ],
130 | [
131 | "EH1",
132 | 1.7485260770975053,
133 | 1.818367346938775
134 | ],
135 | [
136 | "D",
137 | 1.818367346938775,
138 | 1.8482993197278907
139 | ]
140 | ]
141 | },
142 | {
143 | "alignedWord": "THE",
144 | "start": 1.8482993197278907,
145 | "end": 1.9081632653061218,
146 | "phonemes": [
147 | [
148 | "DH",
149 | 1.8482993197278907,
150 | 1.8782312925170064
151 | ],
152 | [
153 | "AH0",
154 | 1.8782312925170064,
155 | 1.9081632653061218
156 | ]
157 | ]
158 | },
159 | {
160 | "alignedWord": "MOUSE",
161 | "start": 1.9081632653061218,
162 | "end": 2.4269841269841264,
163 | "phonemes": [
164 | [
165 | "M",
166 | 1.9081632653061218,
167 | 2.017913832199546
168 | ],
169 | [
170 | "AW1",
171 | 2.017913832199546,
172 | 2.1875283446712013
173 | ],
174 | [
175 | "S",
176 | 2.1875283446712013,
177 | 2.4269841269841264
178 | ]
179 | ]
180 | },
181 | {
182 | "alignedWord": "FROWNING",
183 | "start": 2.4269841269841264,
184 | "end": 2.8759637188208607,
185 | "phonemes": [
186 | [
187 | "F",
188 | 2.4269841269841264,
189 | 2.526757369614512
190 | ],
191 | [
192 | "R",
193 | 2.526757369614512,
194 | 2.586621315192743
195 | ],
196 | [
197 | "AW1",
198 | 2.586621315192743,
199 | 2.6863945578231285
200 | ],
201 | [
202 | "N",
203 | 2.6863945578231285,
204 | 2.7263038548752827
205 | ],
206 | [
207 | "IH0",
208 | 2.7263038548752827,
209 | 2.7961451247165523
210 | ],
211 | [
212 | "NG",
213 | 2.7961451247165523,
214 | 2.8759637188208607
215 | ]
216 | ]
217 | },
218 | {
219 | "alignedWord": "BUT",
220 | "start": 2.8759637188208607,
221 | "end": 3.075510204081631,
222 | "phonemes": [
223 | [
224 | "B",
225 | 2.8759637188208607,
226 | 2.935827664399092
227 | ],
228 | [
229 | "AH1",
230 | 2.935827664399092,
231 | 2.995691609977323
232 | ],
233 | [
234 | "T",
235 | 2.995691609977323,
236 | 3.075510204081631
237 | ]
238 | ]
239 | },
240 | {
241 | "alignedWord": "VERY",
242 | "start": 3.075510204081631,
243 | "end": 3.3448979591836716,
244 | "phonemes": [
245 | [
246 | "V",
247 | 3.075510204081631,
248 | 3.1353741496598624
249 | ],
250 | [
251 | "EH1",
252 | 3.1353741496598624,
253 | 3.1752834467120166
254 | ],
255 | [
256 | "R",
257 | 3.1752834467120166,
258 | 3.2850340136054403
259 | ],
260 | [
261 | "IY0",
262 | 3.2850340136054403,
263 | 3.3448979591836716
264 | ]
265 | ]
266 | },
267 | {
268 | "alignedWord": "POLITELY",
269 | "start": 3.3448979591836716,
270 | "end": 3.8736961451247143,
271 | "phonemes": [
272 | [
273 | "P",
274 | 3.3448979591836716,
275 | 3.4147392290249416
276 | ],
277 | [
278 | "AH0",
279 | 3.4147392290249416,
280 | 3.444671201814057
281 | ],
282 | [
283 | "L",
284 | 3.444671201814057,
285 | 3.5444444444444425
286 | ],
287 | [
288 | "AY1",
289 | 3.5444444444444425,
290 | 3.644217687074828
291 | ],
292 | [
293 | "T",
294 | 3.644217687074828,
295 | 3.6741496598639434
296 | ],
297 | [
298 | "L",
299 | 3.6741496598639434,
300 | 3.7439909297052134
301 | ],
302 | [
303 | "IY0",
304 | 3.7439909297052134,
305 | 3.8736961451247143
306 | ]
307 | ]
308 | },
309 | {
310 | "alignedWord": "",
311 | "start": 3.8736961451247143,
312 | "end": 4.093197278911562,
313 | "phonemes": [
314 | [
315 | "",
316 | 3.8736961451247143,
317 | 4.093197278911562
318 | ]
319 | ]
320 | },
321 | {
322 | "alignedWord": "DID",
323 | "start": 4.093197278911562,
324 | "end": 4.352607709750564,
325 | "phonemes": [
326 | [
327 | "D",
328 | 4.093197278911562,
329 | 4.232879818594102
330 | ],
331 | [
332 | "IH1",
333 | 4.232879818594102,
334 | 4.292743764172333
335 | ],
336 | [
337 | "D",
338 | 4.292743764172333,
339 | 4.352607709750564
340 | ]
341 | ]
342 | },
343 | {
344 | "alignedWord": "YOU",
345 | "start": 4.352607709750564,
346 | "end": 4.482312925170066,
347 | "phonemes": [
348 | [
349 | "Y",
350 | 4.352607709750564,
351 | 4.442403628117912
352 | ],
353 | [
354 | "UW1",
355 | 4.442403628117912,
356 | 4.482312925170066
357 | ]
358 | ]
359 | },
360 | {
361 | "alignedWord": "SPEAK",
362 | "start": 4.482312925170066,
363 | "end": 5.0011337868480705,
364 | "phonemes": [
365 | [
366 | "S",
367 | 4.482312925170066,
368 | 4.641950113378682
369 | ],
370 | [
371 | "P",
372 | 4.641950113378682,
373 | 4.681859410430836
374 | ],
375 | [
376 | "IY1",
377 | 4.681859410430836,
378 | 4.851473922902492
379 | ],
380 | [
381 | "K",
382 | 4.851473922902492,
383 | 5.0011337868480705
384 | ]
385 | ]
386 | },
387 | {
388 | "alignedWord": "",
389 | "start": 5.0011337868480705,
390 | "end": 5.429931972789116,
391 | "phonemes": [
392 | [
393 | "",
394 | 5.0011337868480705,
395 | 5.429931972789116
396 | ]
397 | ]
398 | }
399 | ]
400 | }
401 |
--------------------------------------------------------------------------------
/test/assets/test.txt:
--------------------------------------------------------------------------------
1 | "I beg your pardon?" said the mouse, frowning, but very politely, "did you speak?"
--------------------------------------------------------------------------------
/test/assets/test.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maxrmorrison/pypar/15701c8d3325d24c9aa04919468e840655c460a6/test/assets/test.wav
--------------------------------------------------------------------------------
/test/conftest.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 |
3 | import pytest
4 |
5 | import pypar
6 |
7 |
8 | ###############################################################################
9 | # Test fixtures
10 | ###############################################################################
11 |
12 |
13 | @pytest.fixture(scope='session')
14 | def alignment():
15 |     """Retrieve the alignment to use for testing"""
16 |     return pypar.Alignment(path('test.json'))
17 |
18 | @pytest.fixture(scope='session')
19 | def text():
20 |     """Retrieve the speech transcript"""
21 |     with open(path('test.txt')) as file:
22 |         return file.read()
23 |
24 |
25 | @pytest.fixture(scope='session')
26 | def textgrid():
27 |     """Retrieve the speech textgrid"""
28 |     return pypar.Alignment(path('test.TextGrid'))
29 |
30 |
31 | @pytest.fixture(scope='session')
32 | def float_alignment():
33 |     """Retrieve special alignment for float testing"""
34 |     return pypar.Alignment(path('float.json'))
35 |
36 |
37 | ###############################################################################
38 | # Utilities
39 | ###############################################################################
40 |
41 |
42 | def path(file):
43 |     """Resolve the file path of a test asset"""
44 |     return Path(__file__).parent / 'assets' / file
45 |
--------------------------------------------------------------------------------
/test/test_alignment.py:
--------------------------------------------------------------------------------
1 | import tempfile
2 | from pathlib import Path
3 |
4 | import pypar
5 |
6 |
7 | ###############################################################################
8 | # Test alignment
9 | ###############################################################################
10 |
11 |
12 | def test_find(alignment):
13 |     """Test finding words in the alignment"""
14 |     assert alignment.find('the mouse') == 7
15 |     assert alignment.find('the dog') == -1
16 |
17 |
18 | def test_phoneme_at_time(alignment):
19 |     """Test queries for current phoneme given a time in seconds"""
20 |     assert alignment.phoneme_at_time(-1.) is None
21 |     assert str(alignment.phoneme_at_time(1.5)) == pypar.SILENCE
22 |     assert str(alignment.phoneme_at_time(1.9)) == 'AH0'
23 |     assert str(alignment.phoneme_at_time(4.5)) == 'S'
24 |     assert alignment.phoneme_at_time(6.) is None
25 |
26 |
27 | def test_phoneme_bounds(alignment):
28 |     """Test frame boundaries of phonemes"""
29 |     bounds = alignment.phoneme_bounds(10000, 100)
30 |     assert bounds[0] == (27, 44)
31 |     assert bounds[4] == (75, 86)
32 |
33 |
34 | def test_load(textgrid):
35 |     """Test textgrid loading"""
36 |     pass
37 |
38 |
39 | def test_save(alignment):
40 |     """Test saving and reloading alignment"""
41 |     with tempfile.TemporaryDirectory() as directory:
42 |         # Test json
43 |         file = Path(directory) / 'alignment.json'
44 |         alignment.save(file)
45 |         assert alignment == pypar.Alignment(file)
46 |
47 |         # Test textgrid
48 |         file = Path(directory) / 'alignment.TextGrid'
49 |         alignment.save(file)
50 |         assert alignment == pypar.Alignment(file)
51 |
52 |
53 | def test_string(text, alignment):
54 |     """Test the alignment string representation"""
55 |     text = text.replace('"', '')
56 |     text = text.replace('?', '')
57 |     text = text.replace(',', '')
58 |     assert text.upper() == str(alignment)
59 |
60 |
61 | def test_word_at_time(alignment):
62 |     """Test queries for current word given a time in seconds"""
63 |     assert alignment.word_at_time(-1.) is None
64 |     assert str(alignment.word_at_time(1.)) == 'PARDON'
65 |     assert str(alignment.word_at_time(4.1)) == 'DID'
66 |     assert alignment.word_at_time(6.) is None
67 |
68 |
69 | def test_word_bounds(alignment):
70 |     """Test frame boundaries of words"""
71 |     bounds = alignment.word_bounds(10000, 100)
72 |     assert bounds[0] == (27, 44)
73 |     assert bounds[3] == (93, 149)
74 |
75 |
76 | def test_float_update(float_alignment):
77 |     """Test that consecutive boundaries stay ordered after an update"""
78 |     for i in range(1, len(float_alignment)):
79 |         assert float_alignment[i].start() >= float_alignment[i-1].end()
80 |     float_alignment.update(start=0.)
81 |     for i in range(1, len(float_alignment)):
82 |         assert float_alignment[i].start() >= float_alignment[i-1].end()
83 |
--------------------------------------------------------------------------------
/test/test_compare.py:
--------------------------------------------------------------------------------
1 | import copy
2 |
3 | import pypar
4 |
5 |
6 | ###############################################################################
7 | # Test alignment comparisons
8 | ###############################################################################
9 |
10 |
11 | def test_per_frame_rate(alignment):
12 |     """Test the per-frame speed difference between alignments"""
13 |     stretch_and_assert(alignment, .5)
14 |     stretch_and_assert(alignment, 1.)
15 |     stretch_and_assert(alignment, 2.)
16 |
17 |
18 | ###############################################################################
19 | # Utilities
20 | ###############################################################################
21 |
22 |
23 | def stretch(alignment, factor):
24 |     """Time-stretch the alignment by a constant factor"""
25 |     # Get phoneme durations
26 |     durations = [factor * p.duration() for p in alignment.phonemes()]
27 |     alignment = copy.deepcopy(alignment)
28 |     alignment.update(durations=durations)
29 |     return alignment
30 |
31 |
32 | def stretch_and_assert(alignment, factor, sample_rate=10000, hopsize=100):
33 |     """Time-stretch and perform test assertions"""
34 |     # Get per-frame rate differences
35 |     result = pypar.compare.per_frame_rate(alignment,
36 |                                           stretch(alignment, factor),
37 |                                           sample_rate,
38 |                                           hopsize)
39 |
40 |     # Perform assertions
41 |     assert len(result) == 1 + int(alignment.duration() * sample_rate / hopsize)
42 |     for item in result:
43 |         assert item == factor
44 |
--------------------------------------------------------------------------------