├── .gitignore ├── CITATION.cff ├── LICENSE ├── README.md ├── pypar ├── __init__.py ├── alignment.py ├── compare.py ├── phoneme.py ├── textgrid.py └── word.py ├── setup.py └── test ├── assets ├── float.json ├── test.TextGrid ├── test.json ├── test.txt └── test.wav ├── conftest.py ├── test_alignment.py └── test_compare.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | __pycache__/ 3 | .ipynb_checkpoints/ 4 | .pytest_cache/ 5 | .vscode/ 6 | build/ 7 | dist/ 8 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | cff-version: 1.2.0 2 | message: "If you use this software, please cite it using the following metadata." 3 | authors: 4 | - family-names: "Morrison" 5 | given-names: "Max" 6 | title: "pypar" 7 | version: 0.0.2 8 | date-released: 2021-04-03 9 | url: "https://github.com/maxrmorrison/pypar" 10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Max Morrison 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

Python phoneme alignment representation

2 |
3 | 4 | [![PyPI](https://img.shields.io/pypi/v/pypar.svg)](https://pypi.python.org/pypi/pypar) 5 | [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) 6 | [![Downloads](https://static.pepy.tech/badge/pypar)](https://pepy.tech/project/pypar) 7 | 8 | `pip install pypar` 9 |
10 | 11 | Word and phoneme alignment representation for speech tasks. This repo does 12 | not perform forced word or phoneme alignment, but provides an interface 13 | for working with the resulting alignment of a forced aligner, such as 14 | [`pyfoal`](https://github.com/maxrmorrison/pyfoal), or a manual alignment. 15 | 16 | 17 | ## Table of contents 18 | 19 | - [Usage](#usage) 20 | * [Creating alignments](#creating-alignments) 21 | * [Accessing words and phonemes](#accessing-words-and-phonemes) 22 | * [Saving alignments](#saving-alignments) 23 | - [Application programming interface (API)](#application-programming-interface-api) 24 | * [`pypar.Alignment`](#pyparalignment) 25 | * [`pypar.Alignment.__init__`](#pyparalignment__init__) 26 | * [`pypar.Alignment.__add__`](#pyparalignment__add__) 27 | * [`pypar.Alignment.__eq__`](#pyparalignment__eq__) 28 | * [`pypar.Alignment.__getitem__`](#pyparalignment__getitem__) 29 | * [`pypar.Alignment.__len__`](#pyparalignment__len__) 30 | * [`pypar.Alignment.__str__`](#pyparalignment__str__) 31 | * [`pypar.Alignment.duration`](#pyparalignmentduration) 32 | * [`pypar.Alignment.end`](#pyparalignmentend) 33 | * [`pypar.Alignment.find`](#pyparalignmentfind) 34 | * [`pypar.Alignment.framewise_phoneme_indices`](#pyparalignmentframewise_phoneme_indices) 35 | * [`pypar.Alignment.phonemes`](#pyparalignmentphonemes) 36 | * [`pypar.Alignment.phoneme_at_time`](#pyparalignmentphoneme_at_time) 37 | * [`pypar.Alignment.phoneme_bounds`](#pyparalignmentphoneme_bounds) 38 | * [`pypar.Alignment.save`](#pyparalignmentsave) 39 | * [`pypar.Alignment.start`](#pyparalignmentstart) 40 | * [`pypar.Alignment.update`](#pyparalignmentupdate) 41 | * [`pypar.Alignment.words`](#pyparalignmentwords) 42 | * [`pypar.Alignment.word_bounds`](#pyparalignmentword_bounds) 43 | * [`pypar.Phoneme`](#pyparphoneme) 44 | * [`pypar.Phoneme.__init__`](#pyparphoneme__init__) 45 | * [`pypar.Phoneme.__eq__`](#pyparphoneme__eq__) 46 | * 
[`pypar.Phoneme.__str__`](#pyparphoneme__str__) 47 | * [`pypar.Phoneme.duration`](#pyparphonemeduration) 48 | * [`pypar.Phoneme.end`](#pyparphonemeend) 49 | * [`pypar.Phoneme.start`](#pyparphonemestart) 50 | * [`pypar.Word`](#pyparword) 51 | * [`pypar.Word.__init__`](#pyparword__init__) 52 | * [`pypar.Word.__eq__`](#pyparword__eq__) 53 | * [`pypar.Word.__getitem__`](#pyparword__getitem__) 54 | * [`pypar.Word.__len__`](#pyparword__len__) 55 | * [`pypar.Word.__str__`](#pyparword__str__) 56 | * [`pypar.Word.duration`](#pyparwordduration) 57 | * [`pypar.Word.end`](#pyparwordend) 58 | * [`pypar.Word.phoneme_at_time`](#pyparwordphoneme_at_time) 59 | * [`pypar.Word.start`](#pyparwordstart) 60 | - [Tests](#tests) 61 | 62 | ## Usage 63 | 64 | ### Creating alignments 65 | 66 | If you already have the alignment saved to a `json`, `mlf`, or `TextGrid` 67 | file, pass the name of the file. Valid examples of each format can be found in 68 | `test/assets/`. 69 | 70 | ```python 71 | alignment = pypar.Alignment(file) 72 | ``` 73 | 74 | Alignments can be created manually from `Word` and `Phoneme` objects. Start and 75 | end times are given in seconds. 76 | 77 | ```python 78 | # Create a word from phonemes 79 | word = pypar.Word( 80 | 'THE', 81 | [pypar.Phoneme('DH', 0., .03), pypar.Phoneme('AH0', .03, .06)]) 82 | 83 | # Create a silence (phonemes are always passed as a list) 84 | silence = pypar.Word(pypar.SILENCE, [pypar.Phoneme(pypar.SILENCE, .06, .16)]) 85 | 86 | # Make an alignment 87 | alignment = pypar.Alignment([word, silence]) 88 | ``` 89 | 90 | You can create a new alignment from existing alignments via slicing and 91 | concatenation. 92 | 93 | ```python 94 | # Slice 95 | first_two_words = alignment[:2] 96 | 97 | # Concatenate 98 | alignment_with_repeat = first_two_words + alignment 99 | ``` 100 | 101 | 102 | ### Accessing words and phonemes 103 | 104 | To retrieve a list of words in the alignment, use `alignment.words()`. 105 | To retrieve a list of phonemes, use `alignment.phonemes()`. 
The `Alignment`, 106 | `Word`, and `Phoneme` objects all define `.start()`, `.end()`, and 107 | `.duration()` methods, which return the start time, end time, and duration, 108 | respectively. All times are given in units of seconds. These objects also 109 | define equality checks via `==`, casting to string with `str()`, and iteration 110 | as follows. 111 | 112 | ```python 113 | # Iterate over words 114 | for word in alignment: 115 | 116 | # Access start and end times 117 | assert word.duration() == word.end() - word.start() 118 | 119 | # Iterate over phonemes in word 120 | for phoneme in word: 121 | 122 | # Access string representation 123 | assert isinstance(str(phoneme), str) 124 | ``` 125 | 126 | To access a word or phoneme at a specific time, pass the time in seconds to 127 | `alignment.word_at_time` or `alignment.phoneme_at_time`. 128 | 129 | To retrieve the frame indices of the start and end of a word or phoneme, pass 130 | the audio sampling rate and hopsize (in samples) to `alignment.word_bounds` or 131 | `alignment.phoneme_bounds`. 132 | 133 | 134 | ### Saving alignments 135 | 136 | To save an alignment to disk, use `alignment.save(file)`, where `file` is the 137 | desired filename. `pypar` currently supports saving as a `json` or `TextGrid` 138 | file. 
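For reference, the `json` layout that `save` writes can be sketched without `pypar` itself. The sketch below mirrors the dictionary built by `Alignment.json()` in `pypar/alignment.py` (keys `words`, `alignedWord`, `start`, `end`, and `phonemes`). The helper name `to_json_dict` and the plain-tuple inputs are assumptions made for illustration and are not part of the `pypar` API; it also assumes a word starts at its first phoneme and ends at its last.

```python
import json


def to_json_dict(words):
    """Build a dict in the layout of `Alignment.json()`.

    `words` is a list of (text, phonemes) pairs, where `phonemes` is a list
    of (symbol, start, end) tuples with times in seconds. This helper is
    hypothetical; pypar builds the same layout from its own objects.
    """
    entries = []
    for text, phonemes in words:
        entries.append({
            'alignedWord': text,
            # Assumption: a word spans its first to its last phoneme
            'start': phonemes[0][1],
            'end': phonemes[-1][2],
            'phonemes': [list(phoneme) for phoneme in phonemes]})
    return {'words': entries}


# The word 'THE' from the usage example in "Creating alignments"
example = to_json_dict([('THE', [('DH', 0., .03), ('AH0', .03, .06)])])

# Same dump settings as `save_json` in pypar/alignment.py
text = json.dumps(example, ensure_ascii=False, indent=4)
```

Because the `Alignment` constructor also accepts a json dict, a dict in this layout should be loadable with `pypar.Alignment(json.loads(text))`.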
139 | 140 | 141 | ## Application programming interface (API) 142 | 143 | ### `pypar.Alignment` 144 | 145 | #### `pypar.Alignment.__init__` 146 | 147 | ```python 148 | def __init__( 149 | self, 150 | alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict] 151 | ) -> None: 152 | """Create alignment 153 | 154 | Arguments 155 | alignment 156 | The filename, list of words, or json dict of the alignment 157 | """ 158 | ``` 159 | 160 | 161 | #### `pypar.Alignment.__add__` 162 | 163 | ```python 164 | def __add__(self, other): 165 | """Add alignments by concatenation 166 | 167 | Arguments 168 | other 169 | The alignment to compare to 170 | 171 | Returns 172 | The concatenated alignment 173 | """ 174 | ``` 175 | 176 | 177 | #### `pypar.Alignment.__eq__` 178 | 179 | ```python 180 | def __eq__(self, other) -> bool: 181 | """Equality comparison for alignments 182 | 183 | Arguments 184 | other 185 | The alignment to compare to 186 | 187 | Returns 188 | Whether the alignments are equal 189 | """ 190 | ``` 191 | 192 | 193 | #### `pypar.Alignment.__getitem__` 194 | 195 | ```python 196 | def __getitem__(self, idx: Union[int, slice]) -> pypar.Word: 197 | """Retrieve the idxth word 198 | 199 | Arguments 200 | idx 201 | The index of the word to retrieve 202 | 203 | Returns 204 | The word at index idx 205 | """ 206 | ``` 207 | 208 | 209 | #### `pypar.Alignment.__len__` 210 | 211 | ```python 212 | def __len__(self) -> int: 213 | """Retrieve the number of words 214 | 215 | Returns 216 | The number of words in the alignment 217 | """ 218 | ``` 219 | 220 | 221 | #### `pypar.Alignment.__str__` 222 | 223 | ```python 224 | def __str__(self) -> str: 225 | """Retrieve the text 226 | 227 | Returns 228 | The words in the alignment separated by spaces 229 | """ 230 | ``` 231 | 232 | 233 | #### `pypar.Alignment.duration` 234 | 235 | ```python 236 | def duration(self) -> float: 237 | """Retrieve the duration of the alignment in seconds 238 | 239 | Returns 240 | The duration in seconds 241 | 
""" 242 | ``` 243 | 244 | 245 | #### `pypar.Alignment.end` 246 | 247 | ```python 248 | def end(self) -> float: 249 | """Retrieve the end time of the alignment in seconds 250 | 251 | Returns 252 | The end time in seconds 253 | """ 254 | ``` 255 | 256 | 257 | #### `pypar.Alignment.framewise_phoneme_indices` 258 | 259 | ```python 260 | def framewise_phoneme_indices( 261 | self, 262 | phoneme_map: Dict[str, int], 263 | hopsize: float, 264 | times: Optional[List[float]] = None 265 | ) -> List[int]: 266 | """Convert alignment to phoneme indices at regular temporal interval 267 | 268 | Arguments 269 | phoneme_map 270 | Mapping from phonemes to indices 271 | hopsize 272 | Temporal interval between frames in seconds 273 | times 274 | Specified times in seconds to sample phonemes 275 | """ 276 | ``` 277 | 278 | 279 | #### `pypar.Alignment.find` 280 | 281 | ```python 282 | def find(self, words: str) -> int: 283 | """Find the words in the alignment 284 | 285 | Arguments 286 | words 287 | The words to find 288 | 289 | Returns 290 | The index of the start of the words or -1 if not found 291 | """ 292 | ``` 293 | 294 | 295 | #### `pypar.Alignment.phonemes` 296 | 297 | ```python 298 | def phonemes(self) -> List[pypar.Phoneme]: 299 | """Retrieve the phonemes in the alignment 300 | 301 | Returns 302 | The phonemes in the alignment 303 | """ 304 | ``` 305 | 306 | 307 | #### `pypar.Alignment.phoneme_at_time` 308 | 309 | ```python 310 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]: 311 | """Retrieve the phoneme spoken at specified time 312 | 313 | Arguments 314 | time 315 | Time in seconds 316 | 317 | Returns 318 | The phoneme at the given time (or None if time is out of bounds) 319 | """ 320 | ``` 321 | 322 | 323 | #### `pypar.Alignment.phoneme_bounds` 324 | 325 | ```python 326 | def phoneme_bounds( 327 | self, 328 | sample_rate: int, 329 | hopsize: int = 1 330 | ) -> List[Tuple[int, int]]: 331 | """Retrieve the start and end frame index of each phoneme 332 | 333 
| Arguments 334 | sample_rate 335 | The audio sampling rate 336 | hopsize 337 | The number of samples between successive frames 338 | 339 | Returns 340 | The start and end indices of the phonemes 341 | """ 342 | ``` 343 | 344 | 345 | #### `pypar.Alignment.save` 346 | 347 | ```python 348 | def save(self, filename: Union[str, bytes, os.PathLike]) -> None: 349 | """Save alignment to a json or TextGrid file 350 | 351 | Arguments 352 | filename 353 | The location on disk to save the alignment 354 | """ 355 | ``` 356 | 357 | 358 | #### `pypar.Alignment.start` 359 | 360 | ```python 361 | def start(self) -> float: 362 | """Retrieve the start time of the alignment in seconds 363 | 364 | Returns 365 | The start time in seconds 366 | """ 367 | ``` 368 | 369 | 370 | #### `pypar.Alignment.update` 371 | 372 | ```python 373 | def update( 374 | self, 375 | idx: int = 0, 376 | durations: Optional[List[float]] = None, 377 | start: Optional[float] = None 378 | ) -> None: 379 | """Update alignment starting from phoneme index idx 380 | 381 | Arguments 382 | idx 383 | The index of the first phoneme whose duration is being updated 384 | durations 385 | The new phoneme durations, starting from idx 386 | start 387 | The start time of the alignment 388 | """ 389 | ``` 390 | 391 | 392 | #### `pypar.Alignment.words` 393 | 394 | ```python 395 | def words(self) -> List[pypar.Word]: 396 | """Retrieve the words in the alignment 397 | 398 | Returns 399 | The words in the alignment 400 | """ 401 | ``` 402 | 403 | 404 | #### `pypar.Alignment.word_at_time` 405 | 406 | ```python 407 | def word_at_time(self, time: float) -> Optional[pypar.Word]: 408 | """Retrieve the word spoken at specified time 409 | 410 | Arguments 411 | time 412 | Time in seconds 413 | 414 | Returns 415 | The word spoken at the specified time 416 | """ 417 | ``` 418 | 419 | 420 | ### `pypar.Phoneme` 421 | 422 | #### `pypar.Phoneme.__init__` 423 | 424 | ```python 425 | def __init__(self, phoneme: str, start: float, end: float) -> None: 426 
| """Create phoneme 427 | 428 | Arguments 429 | phoneme 430 | The phoneme 431 | start 432 | The start time in seconds 433 | end 434 | The end time in seconds 435 | """ 436 | ``` 437 | 438 | 439 | #### `pypar.Phoneme.__eq__` 440 | 441 | ```python 442 | def __eq__(self, other) -> bool: 443 | """Equality comparison for phonemes 444 | 445 | Arguments 446 | other 447 | The phoneme to compare to 448 | 449 | Returns 450 | Whether the phonemes are equal 451 | """ 452 | ``` 453 | 454 | 455 | #### `pypar.Phoneme.__str__` 456 | 457 | ```python 458 | def __str__(self) -> str: 459 | """Retrieve the phoneme text 460 | 461 | Returns 462 | The phoneme 463 | """ 464 | ``` 465 | 466 | 467 | #### `pypar.Phoneme.duration` 468 | 469 | ```python 470 | def duration(self) -> float: 471 | """Retrieve the phoneme duration 472 | 473 | Returns 474 | The duration in seconds 475 | """ 476 | ``` 477 | 478 | 479 | #### `pypar.Phoneme.end` 480 | 481 | ```python 482 | def end(self) -> float: 483 | """Retrieve the end time of the phoneme in seconds 484 | 485 | Returns 486 | The end time in seconds 487 | """ 488 | ``` 489 | 490 | 491 | #### `pypar.Phoneme.start` 492 | 493 | ```python 494 | def start(self) -> float: 495 | """Retrieve the start time of the phoneme in seconds 496 | 497 | Returns 498 | The start time in seconds 499 | """ 500 | ``` 501 | 502 | 503 | ### `pypar.Word` 504 | 505 | #### `pypar.Word.__init__` 506 | 507 | ```python 508 | def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None: 509 | """Create word 510 | 511 | Arguments 512 | word 513 | The word 514 | phonemes 515 | The phonemes in the word 516 | """ 517 | ``` 518 | 519 | 520 | #### `pypar.Word.__eq__` 521 | 522 | ```python 523 | def __eq__(self, other) -> bool: 524 | """Equality comparison for words 525 | 526 | Arguments 527 | other 528 | The word to compare to 529 | 530 | Returns 531 | Whether the words are the same 532 | """ 533 | ``` 534 | 535 | 536 | #### `pypar.Word.__getitem__` 537 | 538 | ```python 539 | def 
__getitem__(self, idx: int) -> pypar.Phoneme: 540 | """Retrieve the idxth phoneme 541 | 542 | Arguments 543 | idx 544 | The index of the phoneme to retrieve 545 | 546 | Returns 547 | The phoneme at index idx 548 | """ 549 | ``` 550 | 551 | 552 | #### `pypar.Word.__len__` 553 | 554 | ```python 555 | def __len__(self) -> int: 556 | """Retrieve the number of phonemes 557 | 558 | Returns 559 | The number of phonemes 560 | """ 561 | ``` 562 | 563 | 564 | #### `pypar.Word.__str__` 565 | 566 | ```python 567 | def __str__(self) -> str: 568 | """Retrieve the word text 569 | 570 | Returns 571 | The word text 572 | """ 573 | ``` 574 | 575 | 576 | #### `pypar.Word.duration` 577 | 578 | ```python 579 | def duration(self) -> float: 580 | """Retrieve the word duration in seconds 581 | 582 | Returns 583 | The duration in seconds 584 | """ 585 | ``` 586 | 587 | 588 | #### `pypar.Word.end` 589 | 590 | ```python 591 | def end(self) -> float: 592 | """Retrieve the end time of the word in seconds 593 | 594 | Returns 595 | The end time in seconds 596 | """ 597 | ``` 598 | 599 | 600 | #### `pypar.Word.phoneme_at_time` 601 | 602 | ```python 603 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]: 604 | """Retrieve the phoneme at the specified time 605 | 606 | Arguments 607 | time 608 | Time in seconds 609 | 610 | Returns 611 | The phoneme at the given time (or None if time is out of bounds) 612 | """ 613 | ``` 614 | 615 | 616 | #### `pypar.Word.start` 617 | 618 | ```python 619 | def start(self) -> float: 620 | """Retrieve the start time of the word in seconds 621 | 622 | Returns 623 | The start time in seconds 624 | """ 625 | ``` 626 | 627 | 628 | ## Tests 629 | 630 | Tests can be run as follows. 
631 | 632 | ``` 633 | pip install pytest 634 | pytest 635 | ``` 636 | -------------------------------------------------------------------------------- /pypar/__init__.py: -------------------------------------------------------------------------------- 1 | from .phoneme import Phoneme 2 | from .word import Word 3 | from .alignment import Alignment 4 | from . import compare 5 | from . import textgrid 6 | 7 | SILENCE = '' 8 | -------------------------------------------------------------------------------- /pypar/alignment.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import json 3 | import math 4 | import os 5 | from pathlib import Path 6 | from typing import Dict, List, Optional, Tuple, Union 7 | 8 | import pypar 9 | 10 | 11 | ############################################################################### 12 | # Alignment representation 13 | ############################################################################### 14 | 15 | 16 | class Alignment: 17 | """Word and phoneme alignment""" 18 | 19 | def __init__( 20 | self, 21 | alignment: Union[str, bytes, os.PathLike, List[pypar.Word], dict] 22 | ) -> None: 23 | """Create alignment 24 | 25 | Arguments 26 | alignment 27 | The filename, list of words, or json dict of the alignment 28 | """ 29 | if isinstance(alignment, str): 30 | 31 | # Load alignment from disk 32 | self._words = self.load(alignment) 33 | 34 | elif isinstance(alignment, Path): 35 | 36 | # Cast and load 37 | self._words = self.load(str(alignment)) 38 | 39 | elif isinstance(alignment, list): 40 | self._words = alignment 41 | 42 | # Require first word to start at 0 seconds 43 | self.update(start=0.) 
44 | 45 | elif isinstance(alignment, dict): 46 | self._words = self.parse_json(alignment) 47 | 48 | # Ensure there are no gaps (by filling with silence) 49 | self.validate() 50 | 51 | def __add__(self, other): 52 | """Add alignments by concatenation 53 | 54 | Arguments 55 | other 56 | The alignment to concatenate 57 | 58 | Returns 59 | The concatenated alignment 60 | """ 61 | # Don't change original 62 | other = copy.deepcopy(other) 63 | 64 | # Move start time of other to end of self 65 | other.update(start=self.end()) 66 | 67 | # Concatenate word lists (words is a method, so it must be called) 68 | return Alignment(self._words + other.words()) 69 | 70 | def __eq__(self, other) -> bool: 71 | """Equality comparison for alignments 72 | 73 | Arguments 74 | other 75 | The alignment to compare to 76 | 77 | Returns 78 | Whether the alignments are equal 79 | """ 80 | return \ 81 | len(self) == len(other) and \ 82 | all(word == other_word for word, other_word in zip(self, other)) 83 | 84 | def __getitem__(self, idx: Union[int, slice]) -> pypar.Word: 85 | """Retrieve the idxth word 86 | 87 | Arguments 88 | idx 89 | The index of the word to retrieve 90 | 91 | Returns 92 | The word at index idx 93 | """ 94 | if isinstance(idx, slice): 95 | 96 | # Slice into word list 97 | return Alignment(copy.deepcopy(self._words[idx])) 98 | 99 | # Retrieve a single word 100 | return self._words[idx] 101 | 102 | def __len__(self) -> int: 103 | """Retrieve the number of words 104 | 105 | Returns 106 | The number of words in the alignment 107 | """ 108 | return len(self._words) 109 | 110 | def __str__(self) -> str: 111 | """Retrieve the text 112 | 113 | Returns 114 | The words in the alignment separated by spaces 115 | """ 116 | return ' '.join([str(word) for word in self._words 117 | if str(word) != pypar.SILENCE]) 118 | 119 | def duration(self) -> float: 120 | """Retrieve the duration of the alignment in seconds 121 | 122 | Returns 123 | The duration in seconds 124 | """ 125 | return self.end() - self.start() 126 | 127 | def end(self) 
-> float: 128 | """Retrieve the end time of the alignment in seconds 129 | 130 | Returns 131 | The end time in seconds 132 | """ 133 | return self._words[-1].end() 134 | 135 | def find(self, words: str) -> int: 136 | """Find the words in the alignment 137 | 138 | Arguments 139 | words 140 | The words to find 141 | 142 | Returns 143 | The index of the start of the words or -1 if not found 144 | """ 145 | # Split at spaces 146 | words = words.split(' ') 147 | 148 | for i in range(0, len(self._words) - len(words) + 1): 149 | 150 | # Get text 151 | text = str(self._words[i]).lower() 152 | 153 | # Skip silence 154 | if text == pypar.SILENCE: 155 | continue 156 | 157 | j, k = 0, 0 158 | while j < len(words): 159 | 160 | # Compare words 161 | if text != words[j]: 162 | break 163 | 164 | # Increment words 165 | j += 1 166 | k += 1 167 | text = str(self._words[i + k]).lower() 168 | 169 | # skip silence 170 | while text == pypar.SILENCE: 171 | k += 1 172 | text = str(self._words[i + k]).lower() 173 | 174 | # Found match; return indices 175 | if j == len(words): 176 | return i 177 | 178 | # No match 179 | return -1 180 | 181 | def framewise_phoneme_indices( 182 | self, 183 | phoneme_map: Dict[str, int], 184 | hopsize: float, 185 | times: Optional[List[float]] = None 186 | ) -> List[int]: 187 | """Convert alignment to phoneme indices at regular temporal interval 188 | 189 | Arguments 190 | phoneme_map 191 | Mapping from phonemes to indices 192 | hopsize 193 | Temporal interval between frames in seconds 194 | times 195 | Specified times in seconds to sample phonemes 196 | """ 197 | if times is None: 198 | times = [ 199 | i * hopsize for i in 200 | range(math.ceil(self.duration() / hopsize))] 201 | phonemes = [self.phoneme_at_time(time) for time in times] 202 | return [phoneme_map[str(phoneme)] for phoneme in phonemes] 203 | 204 | def phonemes(self) -> List[pypar.Phoneme]: 205 | """Retrieve the phonemes in the alignment 206 | 207 | Returns 208 | The phonemes in the alignment 209 
| """ 210 | return [phoneme for word in self for phoneme in word] 211 | 212 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]: 213 | """Retrieve the phoneme spoken at specified time 214 | 215 | Arguments 216 | time 217 | Time in seconds 218 | 219 | Returns 220 | The phoneme at the given time (or None if time is out of bounds) 221 | """ 222 | word = self.word_at_time(time) 223 | return word.phoneme_at_time(time) if word else None 224 | 225 | def phoneme_bounds( 226 | self, 227 | sample_rate: int, 228 | hopsize: int = 1 229 | ) -> List[Tuple[int, int]]: 230 | """Retrieve the start and end frame index of each phoneme 231 | 232 | Arguments 233 | sample_rate 234 | The audio sampling rate 235 | hopsize 236 | The number of samples between successive frames 237 | 238 | Returns 239 | The start and end indices of the phonemes 240 | """ 241 | bounds = [(p.start(), p.end()) for p in self.phonemes() 242 | if str(p) != pypar.SILENCE] 243 | return [(int(a * sample_rate / hopsize), 244 | int(b * sample_rate / hopsize)) 245 | for a, b in bounds] 246 | 247 | def save(self, filename: Union[str, bytes, os.PathLike]) -> None: 248 | """Save alignment to json 249 | 250 | Arguments 251 | filename 252 | The location on disk to save the phoneme alignment json 253 | """ 254 | if os.path.dirname(filename): 255 | os.makedirs(os.path.dirname(filename), exist_ok=True) 256 | if isinstance(filename, Path): 257 | filename = str(filename) 258 | extension = filename.split('.')[-1] 259 | if extension == 'json': 260 | self.save_json(filename) 261 | elif extension.lower() == 'textgrid': 262 | self.save_textgrid(filename) 263 | else: 264 | raise ValueError( 265 | f'No save routine for files with extension {extension}') 266 | 267 | def start(self) -> float: 268 | """Retrieve the start time of the alignment in seconds 269 | 270 | Returns 271 | The start time in seconds 272 | """ 273 | return self._words[0].start() 274 | 275 | def update( 276 | self, 277 | idx: int = 0, 278 | durations: 
Optional[List[float]] = None, 279 | start: Optional[float] = None 280 | ) -> None: 281 | """Update alignment starting from phoneme index idx 282 | 283 | Arguments 284 | idx 285 | The index of the first phoneme whose duration is being updated 286 | durations 287 | The new phoneme durations, starting from idx 288 | start 289 | The start time of the alignment 290 | """ 291 | # If durations are not given, just update phoneme start and end times 292 | durations = [] if durations is None else durations 293 | 294 | # Word start time (in seconds) and phoneme start index 295 | start = self.start() if start is None else start 296 | start_phoneme = 0 297 | 298 | # Update each word 299 | for word in self: 300 | end_phoneme = start_phoneme + len(word) 301 | 302 | # Update phoneme alignment of this word 303 | word = self.update_word( 304 | word, idx, durations, start, start_phoneme, end_phoneme) 305 | 306 | start = word.end() 307 | start_phoneme += len(word) 308 | 309 | def words(self) -> List[pypar.Word]: 310 | """Retrieve the words in the alignment 311 | 312 | Returns 313 | The words in the alignment 314 | """ 315 | return self._words 316 | 317 | def word_at_time(self, time: float) -> Optional[pypar.Word]: 318 | """Retrieve the word spoken at specified time 319 | 320 | Arguments 321 | time 322 | Time in seconds 323 | 324 | Returns 325 | The word spoken at the specified time 326 | """ 327 | for word in self: 328 | if word.start() <= time <= word.end(): 329 | return word 330 | return None 331 | 332 | def word_bounds( 333 | self, 334 | sample_rate: int, 335 | hopsize: int = 1, 336 | silences: bool = False 337 | ) -> List[Tuple[int, int]]: 338 | """Retrieve the start and end frame index of each word 339 | 340 | Arguments 341 | sample_rate 342 | The audio sampling rate 343 | hopsize 344 | The number of samples between successive frames 345 | silences 346 | Whether to include silences as words 347 | 348 | Returns 349 | The start and end indices of the words 350 | """ 351 | words = [ 
352 | word for word in self if str(word) != pypar.SILENCE or silences] 353 | bounds = [(word.start(), word.end()) for word in words] 354 | return [(int(a * sample_rate / hopsize), 355 | int(b * sample_rate / hopsize)) 356 | for a, b in bounds] 357 | 358 | ########################################################################### 359 | # Utilities 360 | ########################################################################### 361 | 362 | def json(self): 363 | """Convert to json format""" 364 | words = [] 365 | for word in self._words: 366 | 367 | # Convert phonemes to list 368 | phonemes = [[str(phoneme), phoneme.start(), phoneme.end()] 369 | for phoneme in word] 370 | 371 | # Convert word to dict format 372 | words.append({'alignedWord': str(word), 373 | 'start': word.start(), 374 | 'end': word.end(), 375 | 'phonemes': phonemes}) 376 | 377 | return {'words': words} 378 | 379 | def line_is_valid(self, line): 380 | """Check if a line of a mlf file represents a phoneme""" 381 | line = line.strip().split() 382 | if not line: 383 | return False 384 | return len(line) in [4, 5] 385 | 386 | def load(self, file): 387 | """Load alignment from file""" 388 | extension = file.split('.')[-1] 389 | if extension == 'mlf': 390 | return self.load_mlf(file) 391 | if extension == 'json': 392 | return self.load_json(file) 393 | if extension.lower() == 'textgrid': 394 | return self.load_textgrid(file) 395 | raise ValueError( 396 | f'No alignment representation for file extension {extension}') 397 | 398 | def load_json(self, filename): 399 | """Load alignment from json file""" 400 | # Load from json file 401 | with open(filename) as file: 402 | return self.parse_json(json.load(file)) 403 | 404 | def load_mlf(self, filename): 405 | """Load from mlf file""" 406 | # Load file from disk 407 | with open(filename) as file: 408 | # Read in phoneme alignment 409 | lines = [Line(line) for line in file.readlines() 410 | if self.line_is_valid(line)] 411 | 412 | # Remove silence tokens with 0 
duration 413 | lines = [line for line in lines if line.start < line.end] 414 | 415 | # Extract words and phonemes 416 | phonemes = [] 417 | words = [] 418 | for line in lines: 419 | 420 | # Start new word 421 | if line.word is not None: 422 | 423 | # Add word that just finished 424 | if phonemes: 425 | words.append(pypar.Word(word, phonemes)) 426 | phonemes = [] 427 | 428 | word = line.word 429 | 430 | # Add a phoneme 431 | phonemes.append(pypar.Phoneme(line.phoneme, line.start, line.end)) 432 | 433 | # Handle last word 434 | if phonemes: 435 | words.append(pypar.Word(word, phonemes)) 436 | 437 | return words 438 | 439 | def load_textgrid(self, filename): 440 | """Load from textgrid file""" 441 | # Load file 442 | grid = pypar.textgrid.TextGrid.fromFile(filename) 443 | 444 | # Get phoneme and word representations 445 | if 'word' in grid[0].name and 'phon' in grid[1].name: 446 | word_tier, phon_tier = grid[0], grid[1] 447 | elif 'phon' in grid[0].name and 'word' in grid[1].name: 448 | phon_tier, word_tier = grid[0], grid[1] 449 | else: 450 | raise ValueError( 451 | 'Cannot determine which TextGrid tiers ' + 452 | 'correspond to words and phonemes') 453 | 454 | # Iterate over words 455 | words = [] 456 | phon_idx = 0 457 | for word in word_tier: 458 | 459 | # Get all phonemes for this word 460 | phonemes = [] 461 | while ( 462 | phon_idx < len(phon_tier) and 463 | phon_tier[phon_idx].maxTime <= word.maxTime 464 | ): 465 | phonemes.append( 466 | pypar.Phoneme( 467 | phon_tier[phon_idx].mark, 468 | phon_tier[phon_idx].minTime, 469 | phon_tier[phon_idx].maxTime)) 470 | phon_idx += 1 471 | 472 | # Add finished word 473 | words.append(pypar.Word(word.mark, phonemes)) 474 | 475 | return words 476 | 477 | def parse_json(self, alignment): 478 | """Construct word list from json representation""" 479 | words = [] 480 | for word in alignment['words']: 481 | try: 482 | 483 | # Add a word 484 | phonemes = [ 485 | pypar.Phoneme(*phoneme) for phoneme in word['phonemes']] 486 | 
words.append(pypar.Word(word['alignedWord'], phonemes)) 487 | 488 | except KeyError: 489 | 490 | # Add a silence 491 | phonemes = [ 492 | pypar.Phoneme(pypar.SILENCE, word['start'], word['end'])] 493 | words.append(pypar.Word(pypar.SILENCE, phonemes)) 494 | 495 | return words 496 | 497 | def save_json(self, filename): 498 | """Save alignment as json""" 499 | with open(filename, 'w', encoding='utf-8') as file: 500 | json.dump(self.json(), file, ensure_ascii=False, indent=4) 501 | 502 | def save_textgrid(self, filename): 503 | """Save alignment as textgrid""" 504 | # Construct phoneme tier 505 | phon_tier = pypar.textgrid.IntervalTier('phone') 506 | for phoneme in self.phonemes(): 507 | phon_tier.add(phoneme.start(), phoneme.end(), str(phoneme)) 508 | 509 | # Construct word tier 510 | word_tier = pypar.textgrid.IntervalTier('word') 511 | for word in self: 512 | word_tier.add(word.start(), word.end(), str(word)) 513 | 514 | # Save textgrid 515 | pypar.textgrid.TextGrid([phon_tier, word_tier]).write(filename) 516 | 517 | def update_word( 518 | self, 519 | word, 520 | idx, 521 | durations, 522 | start, 523 | start_phoneme, 524 | end_phoneme): 525 | """Update the phoneme alignment of one word""" 526 | # All phonemes beyond (and including) idx must be updated 527 | if end_phoneme > idx: 528 | 529 | # Retrieve current phoneme durations for word 530 | word_durations = [phoneme.duration() for phoneme in word] 531 | 532 | # The first len(durations) phonemes use new durations 533 | if start_phoneme - idx < len(durations) and end_phoneme - idx > 0: 534 | 535 | # Get indices into durations for copy/paste operation 536 | src_start_idx = max(0, start_phoneme - idx) 537 | src_end_idx = min(len(durations), end_phoneme - idx) 538 | src = durations[src_start_idx:src_end_idx] 539 | 540 | # Case 1: replace all phonemes in word 541 | if len(src) == len(word_durations): 542 | dst_start_idx, dst_end_idx = 0, len(word_durations) 543 | 544 | # Case 2: replace right-most phonemes in word 545 
| elif idx > start_phoneme and len(src) == end_phoneme - idx: 546 | dst_start_idx = len(word_durations) - len(src) 547 | dst_end_idx = len(word_durations) 548 | 549 | # Case 3: replace left-most phonemes in word 550 | elif idx <= start_phoneme: 551 | dst_start_idx = 0 552 | dst_end_idx = len(word_durations) - len(src) 553 | 554 | # Case 4: replace phonemes in center of word 555 | else: 556 | dst_start_idx = -(start_phoneme - idx) 557 | dst_end_idx = dst_start_idx + len(src) 558 | 559 | # Perform copy/paste on duration vector 560 | word_durations[dst_start_idx:dst_end_idx] = \ 561 | durations[src_start_idx:src_end_idx] 562 | 563 | # Get new durations for word 564 | word.update(start, word_durations) 565 | 566 | return word 567 | 568 | def validate(self): 569 | """Ensures that adjacent start/stop times are valid by adding silence""" 570 | i = 0 571 | start = 0. 572 | while i < len(self) - 1: 573 | 574 | # Get start and end times between words 575 | end = self[i].start() 576 | 577 | # Patch gap with silence 578 | if end - start > 1e-3: 579 | 580 | # Extend existing silence if possible 581 | if str(self[i]) == pypar.SILENCE: 582 | self[i][0]._start = start 583 | else: 584 | word = pypar.Word( 585 | pypar.SILENCE, 586 | [pypar.Phoneme(pypar.SILENCE, start, end)]) 587 | self._words.insert(i, word) 588 | i += 1 589 | 590 | i += 1 591 | start = self[i].end() 592 | 593 | # Phoneme gap validation 594 | for word in self: 595 | word.validate() 596 | 597 | 598 | ############################################################################### 599 | # Utilities 600 | ############################################################################### 601 | 602 | 603 | class Line: 604 | """One line of a HTK mlf file""" 605 | 606 | def __init__(self, line): 607 | line = line.strip().split() 608 | 609 | if len(line) == 4: 610 | start, end, self.phoneme, _ = line 611 | self.word = None 612 | else: 613 | start, end, self.phoneme, _, self.word = line 614 | 615 | self.start = float(start) / 
10000000. 616 | self.end = float(end) / 10000000. 617 | -------------------------------------------------------------------------------- /pypar/compare.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | ############################################################################### 5 | # Alignment comparisons 6 | ############################################################################### 7 | 8 | 9 | def per_frame_rate( 10 | alignment_a, 11 | alignment_b, 12 | sample_rate, 13 | hopsize, 14 | frames=None): 15 | """Compute the per-frame rate difference between alignments A and B 16 | 17 | Arguments 18 | alignment_a : Alignment 19 | The source alignment 20 | alignment_b : Alignment 21 | The target alignment 22 | sample_rate : int 23 | The audio sampling rate 24 | hopsize : int 25 | The number of samples between successive frames 26 | frames : int or None 27 | The number of frames of audio. May vary based on padding. If None, it is inferred from the duration of alignment A. 28 | 29 | Returns 30 | rates : list[float] 31 | The frame-wise relative speed of alignment B to alignment A 32 | """ 33 | # Create dict mapping phoneme to relative rate 34 | rates_per_phoneme = per_phoneme_rate(alignment_a, alignment_b) 35 | dict_keys = [phoneme_tuple(phoneme) for phoneme in alignment_a.phonemes()] 36 | rate_map = dict(zip(dict_keys, rates_per_phoneme)) 37 | 38 | # Query the dict once per frame (roughly every hopsize / sample_rate seconds) 39 | if frames is None: 40 | frames = 1 + int(round(alignment_a.end(), 6) * sample_rate / hopsize) 41 | return [rate_map[phoneme_tuple(alignment_a.phoneme_at_time(t))] 42 | for t in np.linspace(0., alignment_a.end(), frames)] 43 | 44 | 45 | def per_phoneme_rate(alignment_a, alignment_b): 46 | """Compute the per-phoneme rate difference between alignments A and B 47 | 48 | Arguments 49 | alignment_a : Alignment 50 | The source alignment 51 | alignment_b : Alignment 52 | The target alignment 53 | 54 | Returns 55 | rates : list[float] 56 | The phoneme-wise relative speed of alignment B to 
alignment A 57 | """ 58 | # Error check alignments 59 | if len(alignment_a.phonemes()) != len(alignment_b.phonemes()): 60 | raise ValueError('Alignments must have same number of phonemes') 61 | 62 | iterator = zip(alignment_a.phonemes(), alignment_b.phonemes()) 63 | return [target.duration() / source.duration() 64 | for source, target in iterator] 65 | 66 | 67 | ############################################################################### 68 | # Utilities 69 | ############################################################################### 70 | 71 | 72 | def phoneme_tuple(phoneme): 73 | """Convert phoneme to hashable tuple representation 74 | 75 | Arguments 76 | phoneme - The phoneme to convert 77 | 78 | Returns 79 | tuple(float, float, str) 80 | The phoneme represented as a tuple 81 | """ 82 | return (phoneme.start(), phoneme.end(), str(phoneme)) 83 | -------------------------------------------------------------------------------- /pypar/phoneme.py: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # Phoneme 3 | ############################################################################### 4 | 5 | 6 | class Phoneme: 7 | """Aligned phoneme representation""" 8 | 9 | def __init__(self, phoneme: str, start: float, end: float) -> None: 10 | """Create phoneme 11 | 12 | Arguments 13 | phoneme 14 | The phoneme 15 | start 16 | The start time in seconds 17 | end 18 | The end time in seconds 19 | """ 20 | self.phoneme = phoneme 21 | self._start = start 22 | self._end = end 23 | 24 | def __eq__(self, other) -> bool: 25 | """Equality comparison for phonemes 26 | 27 | Arguments 28 | other 29 | The phoneme to compare to 30 | 31 | Returns 32 | Whether the phonemes are equal 33 | """ 34 | return \ 35 | str(self) == str(other) and \ 36 | abs(self._start - other._start) < 1e-5 and \ 37 | abs(self._end - other._end) < 1e-5 38 | 39 | def __str__(self) 
-> str: 40 | """Retrieve the phoneme text 41 | 42 | Returns 43 | The phoneme 44 | """ 45 | return self.phoneme 46 | 47 | def duration(self) -> float: 48 | """Retrieve the phoneme duration 49 | 50 | Returns 51 | The duration in seconds 52 | """ 53 | return self._end - self._start 54 | 55 | def end(self) -> float: 56 | """Retrieve the end time of the phoneme in seconds 57 | 58 | Returns 59 | The end time in seconds 60 | """ 61 | return self._end 62 | 63 | def start(self) -> float: 64 | """Retrieve the start time of the phoneme in seconds 65 | 66 | Returns 67 | The start time in seconds 68 | """ 69 | return self._start 70 | -------------------------------------------------------------------------------- /pypar/textgrid.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | 4 | ############################################################################### 5 | # Textgrid 6 | ############################################################################### 7 | 8 | 9 | class TextGrid: 10 | 11 | def __init__(self, tiers=None): 12 | self.tiers = [] if tiers is None else tiers 13 | 14 | def __len__(self): 15 | return len(self.tiers) 16 | 17 | def __getitem__(self, i): 18 | return self.tiers[i] 19 | 20 | def read(self, file): 21 | # Open file 22 | with open(file) as file: 23 | 24 | # Parse header 25 | _, short = parse_header(file) 26 | first_line_beside_header = file.readline() 27 | try: 28 | parse_line(first_line_beside_header, short) 29 | except Exception: 30 | short = True 31 | parse_line(first_line_beside_header, short) 32 | parse_line(file.readline(), short) 33 | file.readline() 34 | if short: 35 | tiers = int(file.readline().strip()) 36 | else: 37 | tiers = int(file.readline().strip().split()[2]) 38 | if not short: 39 | file.readline() 40 | 41 | # Iterate over tiers 42 | for _ in range(tiers): 43 | 44 | # Maybe flush extra line 45 | if not short: 46 | file.readline() 47 | 48 | # Create interval tier 49 | if 
parse_line(file.readline(), short) == 'IntervalTier': 50 | 51 | # Initialize 52 | name = parse_line(file.readline(), short) 53 | tier = IntervalTier(name) 54 | 55 | # Flush tier min/max time 56 | parse_line(file.readline(), short) 57 | parse_line(file.readline(), short) 58 | 59 | # Populate 60 | for _ in range(int(parse_line(file.readline(), short))): 61 | if not short: 62 | file.readline().rstrip().split() 63 | minTime = parse_line(file.readline(), short) 64 | maxTime = parse_line(file.readline(), short) 65 | mark = parseMark(file, short) 66 | if minTime < maxTime: 67 | tier.add(minTime, maxTime, mark) 68 | self.tiers.append(tier) 69 | 70 | else: 71 | raise ValueError('TextGrid error') 72 | 73 | def write(self, file): 74 | with open(file, 'w') as file: 75 | # Write header 76 | file.write('File type = "ooTextFile"\n') 77 | file.write('Object class = "TextGrid"\n\n') 78 | file.write('xmin = {0}\n'.format(self.tiers[0][0].minTime)) 79 | file.write('xmax = {0}\n'.format(self.tiers[0][-1].maxTime)) 80 | file.write('tiers? <exists>
\n') 81 | file.write('size = {0}\n'.format(len(self))) 82 | file.write('item []:\n') 83 | 84 | # Write interval tiers 85 | for i, tier in enumerate(self.tiers, 1): 86 | file.write('\titem [{0}]:\n'.format(i)) 87 | file.write('\t\tclass = "IntervalTier"\n') 88 | file.write('\t\tname = "{0}"\n'.format(tier.name)) 89 | file.write('\t\txmin = {0}\n'.format(tier[0].minTime)) 90 | file.write('\t\txmax = {0}\n'.format(tier[-1].maxTime)) 91 | file.write( 92 | '\t\tintervals: size = {0}\n'.format(len(tier.intervals))) 93 | 94 | # Write intervals 95 | for j, interval in enumerate(tier.intervals, 1): 96 | file.write('\t\t\tintervals [{0}]:\n'.format(j)) 97 | file.write('\t\t\t\txmin = {0}\n'.format(interval.minTime)) 98 | file.write('\t\t\t\txmax = {0}\n'.format(interval.maxTime)) 99 | mark = interval.mark.replace('"', '""') 100 | file.write('\t\t\t\ttext = "{0}"\n'.format(mark)) 101 | 102 | @classmethod 103 | def fromFile(cls, file): 104 | textgrid = cls() 105 | textgrid.read(file) 106 | return textgrid 107 | 108 | 109 | ############################################################################### 110 | # Textgrid interval 111 | ############################################################################### 112 | 113 | 114 | class Interval: 115 | 116 | def __init__(self, minTime, maxTime, mark): 117 | if minTime >= maxTime: 118 | raise ValueError(minTime, maxTime) 119 | self.minTime = minTime 120 | self.maxTime = maxTime 121 | self.mark = mark 122 | 123 | 124 | class IntervalTier: 125 | 126 | def __init__(self, name): 127 | self.name = name 128 | self.intervals = [] 129 | 130 | def __iter__(self): 131 | return iter(self.intervals) 132 | 133 | def __len__(self): 134 | return len(self.intervals) 135 | 136 | def __getitem__(self, i): 137 | return self.intervals[i] 138 | 139 | def add(self, minTime, maxTime, mark): 140 | self.intervals.append(Interval(minTime, maxTime, mark)) 141 | 142 | 143 | ############################################################################### 144 
| # Utilities 145 | ############################################################################### 146 | 147 | 148 | def parse_header(source): 149 | header = source.readline() 150 | m = re.match(r'File type = "([\w ]+)"', header) 151 | short = 'short' in m.groups()[0] 152 | file_type = parse_line(source.readline(), short) 153 | source.readline() 154 | return file_type, short 155 | 156 | 157 | def parse_line(line, short): 158 | line = line.strip() 159 | if short: 160 | if '"' in line: 161 | return line[1:-1] 162 | return float(line) 163 | if '"' in line: 164 | m = re.match(r'.+? = "(.*)"', line) 165 | return m.groups()[0] 166 | m = re.match(r'.+? = (.*)', line) 167 | return float(m.groups()[0]) 168 | 169 | 170 | def parseMark(text, short): 171 | line = text.readline() 172 | 173 | # Read until the number of double-quotes is even 174 | while line.count('"') % 2: 175 | next_line = text.readline() 176 | line += next_line 177 | 178 | if short: 179 | pattern = r'^"(.*?)"\s*$' 180 | else: 181 | pattern = r'^\s*(text|mark) = "(.*?)"\s*$' 182 | entry = re.match(pattern, line, re.DOTALL) 183 | 184 | return entry.groups()[-1].replace('""', '"') 185 | -------------------------------------------------------------------------------- /pypar/word.py: -------------------------------------------------------------------------------- 1 | from typing import List, Optional 2 | 3 | import pypar 4 | 5 | 6 | ############################################################################### 7 | # Word representation 8 | ############################################################################### 9 | 10 | 11 | class Word: 12 | """Aligned word representation""" 13 | 14 | def __init__(self, word: str, phonemes: List[pypar.Phoneme]) -> None: 15 | """Create word 16 | 17 | Arguments 18 | word 19 | The word 20 | phonemes 21 | The phonemes in the word 22 | """ 23 | self.word = word 24 | self.phonemes = phonemes 25 | 26 | def __eq__(self, other) -> bool: 27 | """Equality comparison for words 28 | 
29 | Arguments 30 | other 31 | The word to compare to 32 | 33 | Returns 34 | Whether the words are the same 35 | """ 36 | return \ 37 | str(self) == str(other) and \ 38 | len(self) == len(other) and \ 39 | all(phoneme == other_phoneme 40 | for phoneme, other_phoneme in zip(self, other)) 41 | 42 | def __getitem__(self, idx: int) -> pypar.Phoneme: 43 | """Retrieve the idxth phoneme 44 | 45 | Arguments 46 | idx 47 | The index of the phoneme to retrieve 48 | 49 | Returns 50 | The phoneme at index idx 51 | """ 52 | return self.phonemes[idx] 53 | 54 | def __len__(self) -> int: 55 | """Retrieve the number of phonemes 56 | 57 | Returns 58 | The number of phonemes 59 | """ 60 | return len(self.phonemes) 61 | 62 | def __str__(self) -> str: 63 | """Retrieve the word text 64 | 65 | Returns 66 | The word text 67 | """ 68 | return self.word 69 | 70 | def duration(self) -> float: 71 | """Retrieve the word duration in seconds 72 | 73 | Returns 74 | The duration in seconds 75 | """ 76 | return self.end() - self.start() 77 | 78 | def end(self) -> float: 79 | """Retrieve the end time of the word in seconds 80 | 81 | Returns 82 | The end time in seconds 83 | """ 84 | return self.phonemes[-1].end() 85 | 86 | def phoneme_at_time(self, time: float) -> Optional[pypar.Phoneme]: 87 | """Retrieve the phoneme at the specified time 88 | 89 | Arguments 90 | time 91 | Time in seconds 92 | 93 | Returns 94 | The phoneme at the given time (or None if time is out of bounds) 95 | """ 96 | for phoneme in self.phonemes: 97 | if phoneme.start() <= time <= phoneme.end(): 98 | return phoneme 99 | return None 100 | 101 | def start(self) -> float: 102 | """Retrieve the start time of the word in seconds 103 | 104 | Returns 105 | The start time in seconds 106 | """ 107 | return self.phonemes[0].start() 108 | 109 | ########################################################################### 110 | # Utilities 111 | ########################################################################### 112 | 113 | def 
update(self, start, durations=None): 114 | """Update the word with new start time and phoneme durations 115 | 116 | Arguments 117 | start : float 118 | The new start time of the word 119 | durations : list[float] or None 120 | The new phoneme durations 121 | """ 122 | # Use current durations if None provided 123 | if durations is None: 124 | durations = [phoneme.duration() for phoneme in self.phonemes] 125 | 126 | # Update phonemes 127 | phoneme_start = start 128 | for phoneme, duration in zip(self.phonemes, durations): 129 | phoneme._start = phoneme_start 130 | phoneme._end = phoneme_start + duration 131 | phoneme_start = phoneme._end 132 | 133 | def validate(self): 134 | """Ensures that adjacent start/end times are valid by adding silence""" 135 | i = 0 136 | while i < len(self) - 1: 137 | 138 | # Get start and end times between phonemes 139 | start = self[i].end() 140 | end = self[i + 1].start() 141 | 142 | # Patch gap with silence 143 | if end - start > 1e-4: 144 | phoneme = pypar.Phoneme(pypar.SILENCE, start, end) 145 | self.phonemes.insert(i + 1, phoneme) 146 | i += 1 147 | 148 | i += 1 149 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | 4 | # Description 5 | with open('README.md') as file: 6 | long_description = file.read() 7 | 8 | 9 | setup( 10 | name='pypar', 11 | version='0.0.6', 12 | description='Python phoneme alignment representation', 13 | author='Max Morrison', 14 | author_email='maxrmorrison@gmail.com', 15 | url='https://github.com/maxrmorrison/pypar', 16 | install_requires=['numpy'], 17 | packages=['pypar'], 18 | long_description=long_description, 19 | long_description_content_type='text/markdown', 20 | keywords=['align', 'duration', 'phoneme', 'speech'], 21 | classifiers=['License :: OSI Approved :: MIT License'], 22 | license='MIT') 23 | 
-------------------------------------------------------------------------------- /test/assets/float.json: -------------------------------------------------------------------------------- 1 | { 2 | "words": [ 3 | { 4 | "alignedWord": "", 5 | "start": 0.0, 6 | "end": 0.0245, 7 | "phonemes": [ 8 | [ 9 | "", 10 | 0.0, 11 | 0.0245 12 | ] 13 | ] 14 | }, 15 | { 16 | "alignedWord": "the", 17 | "start": 0.0245, 18 | "end": 0.112, 19 | "phonemes": [ 20 | [ 21 | "dh", 22 | 0.0245, 23 | 0.08075 24 | ], 25 | [ 26 | "ax", 27 | 0.08075, 28 | 0.112 29 | ] 30 | ] 31 | }, 32 | { 33 | "alignedWord": "girl", 34 | "start": 0.112, 35 | "end": 0.36825, 36 | "phonemes": [ 37 | [ 38 | "g", 39 | 0.112, 40 | 0.18075 41 | ], 42 | [ 43 | "er", 44 | 0.18075, 45 | 0.29325 46 | ], 47 | [ 48 | "l", 49 | 0.29325, 50 | 0.36825 51 | ] 52 | ] 53 | }, 54 | { 55 | "alignedWord": "faced", 56 | "start": 0.36825, 57 | "end": 0.7245, 58 | "phonemes": [ 59 | [ 60 | "f", 61 | 0.36825, 62 | 0.4745 63 | ], 64 | [ 65 | "ey", 66 | 0.4745, 67 | 0.60575 68 | ], 69 | [ 70 | "s", 71 | 0.60575, 72 | 0.6745 73 | ], 74 | [ 75 | "t", 76 | 0.6745, 77 | 0.7245 78 | ] 79 | ] 80 | }, 81 | { 82 | "alignedWord": "him", 83 | "start": 0.7245, 84 | "end": 0.91825, 85 | "phonemes": [ 86 | [ 87 | "hh", 88 | 0.7245, 89 | 0.7495 90 | ], 91 | [ 92 | "ih", 93 | 0.7495, 94 | 0.7995 95 | ], 96 | [ 97 | "m", 98 | 0.7995, 99 | 0.91825 100 | ] 101 | ] 102 | }, 103 | { 104 | "alignedWord": "", 105 | "start": 0.91825, 106 | "end": 1.13075, 107 | "phonemes": [ 108 | [ 109 | "", 110 | 0.91825, 111 | 1.13075 112 | ] 113 | ] 114 | }, 115 | { 116 | "alignedWord": "her", 117 | "start": 1.13075, 118 | "end": 1.29325, 119 | "phonemes": [ 120 | [ 121 | "hh", 122 | 1.13075, 123 | 1.1995 124 | ], 125 | [ 126 | "er", 127 | 1.1995, 128 | 1.29325 129 | ] 130 | ] 131 | }, 132 | { 133 | "alignedWord": "eyes", 134 | "start": 1.29325, 135 | "end": 1.48075, 136 | "phonemes": [ 137 | [ 138 | "ay", 139 | 1.29325, 140 | 1.3995 141 | ], 142 | [ 143 | "z", 144 | 
1.3995, 145 | 1.48075 146 | ] 147 | ] 148 | }, 149 | { 150 | "alignedWord": "shining", 151 | "start": 1.48075, 152 | "end": 1.90575, 153 | "phonemes": [ 154 | [ 155 | "sh", 156 | 1.48075, 157 | 1.59325 158 | ], 159 | [ 160 | "ay", 161 | 1.59325, 162 | 1.71825 163 | ], 164 | [ 165 | "n", 166 | 1.71825, 167 | 1.76825 168 | ], 169 | [ 170 | "ih", 171 | 1.76825, 172 | 1.83075 173 | ], 174 | [ 175 | "ng", 176 | 1.83075, 177 | 1.90575 178 | ] 179 | ] 180 | }, 181 | { 182 | "alignedWord": "with", 183 | "start": 1.90575, 184 | "end": 2.00575, 185 | "phonemes": [ 186 | [ 187 | "w", 188 | 1.90575, 189 | 1.95575 190 | ], 191 | [ 192 | "ih", 193 | 1.95575, 194 | 1.987 195 | ], 196 | [ 197 | "dh", 198 | 1.987, 199 | 2.00575 200 | ] 201 | ] 202 | }, 203 | { 204 | "alignedWord": "sudden", 205 | "start": 2.00575, 206 | "end": 2.38075, 207 | "phonemes": [ 208 | [ 209 | "s", 210 | 2.00575, 211 | 2.13075 212 | ], 213 | [ 214 | "ah", 215 | 2.13075, 216 | 2.212 217 | ], 218 | [ 219 | "d", 220 | 2.212, 221 | 2.24325 222 | ], 223 | [ 224 | "ax", 225 | 2.24325, 226 | 2.287 227 | ], 228 | [ 229 | "n", 230 | 2.287, 231 | 2.38075 232 | ] 233 | ] 234 | }, 235 | { 236 | "alignedWord": "fear", 237 | "start": 2.38075, 238 | "end": 2.80575, 239 | "phonemes": [ 240 | [ 241 | "f", 242 | 2.38075, 243 | 2.50575 244 | ], 245 | [ 246 | "ih", 247 | 2.50575, 248 | 2.59325 249 | ], 250 | [ 251 | "r", 252 | 2.59325, 253 | 2.80575 254 | ] 255 | ] 256 | }, 257 | { 258 | "alignedWord": "", 259 | "start": 2.80575, 260 | "end": 2.84325, 261 | "phonemes": [ 262 | [ 263 | "", 264 | 2.80575, 265 | 2.84325 266 | ] 267 | ] 268 | } 269 | ] 270 | } 271 | -------------------------------------------------------------------------------- /test/assets/test.TextGrid: -------------------------------------------------------------------------------- 1 | File type = "ooTextFile" 2 | Object class = "TextGrid" 3 | 4 | xmin = 0.0 5 | xmax = 5.429931972789116 6 | tiers? <exists>
7 | size = 2 8 | item []: 9 | item [1]: 10 | class = "IntervalTier" 11 | name = "phone" 12 | xmin = 0.0 13 | xmax = 5.429931972789116 14 | intervals: size = 54 15 | intervals [1]: 16 | xmin = 0.0 17 | xmax = 0.27188208616780046 18 | text = "sil" 19 | intervals [2]: 20 | xmin = 0.27188208616780046 21 | xmax = 0.4414965986394558 22 | text = "AY1" 23 | intervals [3]: 24 | xmin = 0.4414965986394558 25 | xmax = 0.6011337868480725 26 | text = "B" 27 | intervals [4]: 28 | xmin = 0.6011337868480725 29 | xmax = 0.6609977324263039 30 | text = "EH1" 31 | intervals [5]: 32 | xmin = 0.6609977324263039 33 | xmax = 0.7507936507936508 34 | text = "G" 35 | intervals [6]: 36 | xmin = 0.7507936507936508 37 | xmax = 0.8605442176870748 38 | text = "Y" 39 | intervals [7]: 40 | xmin = 0.8605442176870748 41 | xmax = 0.8904761904761904 42 | text = "AO1" 43 | intervals [8]: 44 | xmin = 0.8904761904761904 45 | xmax = 0.9303854875283446 46 | text = "R" 47 | intervals [9]: 48 | xmin = 0.9303854875283446 49 | xmax = 1.0501133786848071 50 | text = "P" 51 | intervals [10]: 52 | xmin = 1.0501133786848071 53 | xmax = 1.1399092970521538 54 | text = "AA1" 55 | intervals [11]: 56 | xmin = 1.1399092970521538 57 | xmax = 1.199773242630385 58 | text = "R" 59 | intervals [12]: 60 | xmin = 1.199773242630385 61 | xmax = 1.2396825396825393 62 | text = "D" 63 | intervals [13]: 64 | xmin = 1.2396825396825393 65 | xmax = 1.269614512471655 66 | text = "AH0" 67 | intervals [14]: 68 | xmin = 1.269614512471655 69 | xmax = 1.4990929705215414 70 | text = "N" 71 | intervals [15]: 72 | xmin = 1.4990929705215414 73 | xmax = 1.5888888888888886 74 | text = "sil" 75 | intervals [16]: 76 | xmin = 1.5888888888888886 77 | xmax = 1.7485260770975053 78 | text = "S" 79 | intervals [17]: 80 | xmin = 1.7485260770975053 81 | xmax = 1.818367346938775 82 | text = "EH1" 83 | intervals [18]: 84 | xmin = 1.818367346938775 85 | xmax = 1.8482993197278907 86 | text = "D" 87 | intervals [19]: 88 | xmin = 1.8482993197278907 89 | xmax = 
1.8782312925170064 90 | text = "DH" 91 | intervals [20]: 92 | xmin = 1.8782312925170064 93 | xmax = 1.9081632653061218 94 | text = "AH0" 95 | intervals [21]: 96 | xmin = 1.9081632653061218 97 | xmax = 2.017913832199546 98 | text = "M" 99 | intervals [22]: 100 | xmin = 2.017913832199546 101 | xmax = 2.1875283446712013 102 | text = "AW1" 103 | intervals [23]: 104 | xmin = 2.1875283446712013 105 | xmax = 2.4269841269841264 106 | text = "S" 107 | intervals [24]: 108 | xmin = 2.4269841269841264 109 | xmax = 2.526757369614512 110 | text = "F" 111 | intervals [25]: 112 | xmin = 2.526757369614512 113 | xmax = 2.586621315192743 114 | text = "R" 115 | intervals [26]: 116 | xmin = 2.586621315192743 117 | xmax = 2.6863945578231285 118 | text = "AW1" 119 | intervals [27]: 120 | xmin = 2.6863945578231285 121 | xmax = 2.7263038548752827 122 | text = "N" 123 | intervals [28]: 124 | xmin = 2.7263038548752827 125 | xmax = 2.7961451247165523 126 | text = "IH0" 127 | intervals [29]: 128 | xmin = 2.7961451247165523 129 | xmax = 2.8759637188208607 130 | text = "NG" 131 | intervals [30]: 132 | xmin = 2.8759637188208607 133 | xmax = 2.935827664399092 134 | text = "B" 135 | intervals [31]: 136 | xmin = 2.935827664399092 137 | xmax = 2.995691609977323 138 | text = "AH1" 139 | intervals [32]: 140 | xmin = 2.995691609977323 141 | xmax = 3.075510204081631 142 | text = "T" 143 | intervals [33]: 144 | xmin = 3.075510204081631 145 | xmax = 3.1353741496598624 146 | text = "V" 147 | intervals [34]: 148 | xmin = 3.1353741496598624 149 | xmax = 3.1752834467120166 150 | text = "EH1" 151 | intervals [35]: 152 | xmin = 3.1752834467120166 153 | xmax = 3.2850340136054403 154 | text = "R" 155 | intervals [36]: 156 | xmin = 3.2850340136054403 157 | xmax = 3.3448979591836716 158 | text = "IY0" 159 | intervals [37]: 160 | xmin = 3.3448979591836716 161 | xmax = 3.4147392290249416 162 | text = "P" 163 | intervals [38]: 164 | xmin = 3.4147392290249416 165 | xmax = 3.444671201814057 166 | text = "AH0" 167 | 
intervals [39]: 168 | xmin = 3.444671201814057 169 | xmax = 3.5444444444444425 170 | text = "L" 171 | intervals [40]: 172 | xmin = 3.5444444444444425 173 | xmax = 3.644217687074828 174 | text = "AY1" 175 | intervals [41]: 176 | xmin = 3.644217687074828 177 | xmax = 3.6741496598639434 178 | text = "T" 179 | intervals [42]: 180 | xmin = 3.6741496598639434 181 | xmax = 3.7439909297052134 182 | text = "L" 183 | intervals [43]: 184 | xmin = 3.7439909297052134 185 | xmax = 3.8736961451247143 186 | text = "IY0" 187 | intervals [44]: 188 | xmin = 3.8736961451247143 189 | xmax = 4.093197278911562 190 | text = "sil" 191 | intervals [45]: 192 | xmin = 4.093197278911562 193 | xmax = 4.232879818594102 194 | text = "D" 195 | intervals [46]: 196 | xmin = 4.232879818594102 197 | xmax = 4.292743764172333 198 | text = "IH1" 199 | intervals [47]: 200 | xmin = 4.292743764172333 201 | xmax = 4.352607709750564 202 | text = "D" 203 | intervals [48]: 204 | xmin = 4.352607709750564 205 | xmax = 4.442403628117912 206 | text = "Y" 207 | intervals [49]: 208 | xmin = 4.442403628117912 209 | xmax = 4.482312925170066 210 | text = "UW1" 211 | intervals [50]: 212 | xmin = 4.482312925170066 213 | xmax = 4.641950113378682 214 | text = "S" 215 | intervals [51]: 216 | xmin = 4.641950113378682 217 | xmax = 4.681859410430836 218 | text = "P" 219 | intervals [52]: 220 | xmin = 4.681859410430836 221 | xmax = 4.851473922902492 222 | text = "IY1" 223 | intervals [53]: 224 | xmin = 4.851473922902492 225 | xmax = 5.0011337868480705 226 | text = "K" 227 | intervals [54]: 228 | xmin = 5.0011337868480705 229 | xmax = 5.429931972789116 230 | text = "sil" 231 | item [2]: 232 | class = "IntervalTier" 233 | name = "word" 234 | xmin = 0.0 235 | xmax = 5.429931972789116 236 | intervals: size = 18 237 | intervals [1]: 238 | xmin = 0.0 239 | xmax = 0.27188208616780046 240 | text = "sp" 241 | intervals [2]: 242 | xmin = 0.27188208616780046 243 | xmax = 0.4414965986394558 244 | text = "I" 245 | intervals [3]: 246 | xmin = 
0.4414965986394558 247 | xmax = 0.7507936507936508 248 | text = "BEG" 249 | intervals [4]: 250 | xmin = 0.7507936507936508 251 | xmax = 0.9303854875283446 252 | text = "YOUR" 253 | intervals [5]: 254 | xmin = 0.9303854875283446 255 | xmax = 1.4990929705215414 256 | text = "PARDON" 257 | intervals [6]: 258 | xmin = 1.4990929705215414 259 | xmax = 1.5888888888888886 260 | text = "sp" 261 | intervals [7]: 262 | xmin = 1.5888888888888886 263 | xmax = 1.8482993197278907 264 | text = "SAID" 265 | intervals [8]: 266 | xmin = 1.8482993197278907 267 | xmax = 1.9081632653061218 268 | text = "THE" 269 | intervals [9]: 270 | xmin = 1.9081632653061218 271 | xmax = 2.4269841269841264 272 | text = "MOUSE" 273 | intervals [10]: 274 | xmin = 2.4269841269841264 275 | xmax = 2.8759637188208607 276 | text = "FROWNING" 277 | intervals [11]: 278 | xmin = 2.8759637188208607 279 | xmax = 3.075510204081631 280 | text = "BUT" 281 | intervals [12]: 282 | xmin = 3.075510204081631 283 | xmax = 3.3448979591836716 284 | text = "VERY" 285 | intervals [13]: 286 | xmin = 3.3448979591836716 287 | xmax = 3.8736961451247143 288 | text = "POLITELY" 289 | intervals [14]: 290 | xmin = 3.8736961451247143 291 | xmax = 4.093197278911562 292 | text = "sp" 293 | intervals [15]: 294 | xmin = 4.093197278911562 295 | xmax = 4.352607709750564 296 | text = "DID" 297 | intervals [16]: 298 | xmin = 4.352607709750564 299 | xmax = 4.482312925170066 300 | text = "YOU" 301 | intervals [17]: 302 | xmin = 4.482312925170066 303 | xmax = 5.0011337868480705 304 | text = "SPEAK" 305 | intervals [18]: 306 | xmin = 5.0011337868480705 307 | xmax = 5.429931972789116 308 | text = "sp" 309 | -------------------------------------------------------------------------------- /test/assets/test.json: -------------------------------------------------------------------------------- 1 | { 2 | "words": [ 3 | { 4 | "alignedWord": "", 5 | "start": 0.0, 6 | "end": 0.27188208616780046, 7 | "phonemes": [ 8 | [ 9 | "", 10 | 0.0, 11 | 
0.27188208616780046 12 | ] 13 | ] 14 | }, 15 | { 16 | "alignedWord": "I", 17 | "start": 0.27188208616780046, 18 | "end": 0.4414965986394558, 19 | "phonemes": [ 20 | [ 21 | "AY1", 22 | 0.27188208616780046, 23 | 0.4414965986394558 24 | ] 25 | ] 26 | }, 27 | { 28 | "alignedWord": "BEG", 29 | "start": 0.4414965986394558, 30 | "end": 0.7507936507936508, 31 | "phonemes": [ 32 | [ 33 | "B", 34 | 0.4414965986394558, 35 | 0.6011337868480725 36 | ], 37 | [ 38 | "EH1", 39 | 0.6011337868480725, 40 | 0.6609977324263039 41 | ], 42 | [ 43 | "G", 44 | 0.6609977324263039, 45 | 0.7507936507936508 46 | ] 47 | ] 48 | }, 49 | { 50 | "alignedWord": "YOUR", 51 | "start": 0.7507936507936508, 52 | "end": 0.9303854875283446, 53 | "phonemes": [ 54 | [ 55 | "Y", 56 | 0.7507936507936508, 57 | 0.8605442176870748 58 | ], 59 | [ 60 | "AO1", 61 | 0.8605442176870748, 62 | 0.8904761904761904 63 | ], 64 | [ 65 | "R", 66 | 0.8904761904761904, 67 | 0.9303854875283446 68 | ] 69 | ] 70 | }, 71 | { 72 | "alignedWord": "PARDON", 73 | "start": 0.9303854875283446, 74 | "end": 1.4990929705215414, 75 | "phonemes": [ 76 | [ 77 | "P", 78 | 0.9303854875283446, 79 | 1.0501133786848071 80 | ], 81 | [ 82 | "AA1", 83 | 1.0501133786848071, 84 | 1.1399092970521538 85 | ], 86 | [ 87 | "R", 88 | 1.1399092970521538, 89 | 1.199773242630385 90 | ], 91 | [ 92 | "D", 93 | 1.199773242630385, 94 | 1.2396825396825393 95 | ], 96 | [ 97 | "AH0", 98 | 1.2396825396825393, 99 | 1.269614512471655 100 | ], 101 | [ 102 | "N", 103 | 1.269614512471655, 104 | 1.4990929705215414 105 | ] 106 | ] 107 | }, 108 | { 109 | "alignedWord": "", 110 | "start": 1.4990929705215414, 111 | "end": 1.5888888888888886, 112 | "phonemes": [ 113 | [ 114 | "", 115 | 1.4990929705215414, 116 | 1.5888888888888886 117 | ] 118 | ] 119 | }, 120 | { 121 | "alignedWord": "SAID", 122 | "start": 1.5888888888888886, 123 | "end": 1.8482993197278907, 124 | "phonemes": [ 125 | [ 126 | "S", 127 | 1.5888888888888886, 128 | 1.7485260770975053 129 | ], 130 | [ 131 | "EH1", 132 | 
1.7485260770975053, 133 | 1.818367346938775 134 | ], 135 | [ 136 | "D", 137 | 1.818367346938775, 138 | 1.8482993197278907 139 | ] 140 | ] 141 | }, 142 | { 143 | "alignedWord": "THE", 144 | "start": 1.8482993197278907, 145 | "end": 1.9081632653061218, 146 | "phonemes": [ 147 | [ 148 | "DH", 149 | 1.8482993197278907, 150 | 1.8782312925170064 151 | ], 152 | [ 153 | "AH0", 154 | 1.8782312925170064, 155 | 1.9081632653061218 156 | ] 157 | ] 158 | }, 159 | { 160 | "alignedWord": "MOUSE", 161 | "start": 1.9081632653061218, 162 | "end": 2.4269841269841264, 163 | "phonemes": [ 164 | [ 165 | "M", 166 | 1.9081632653061218, 167 | 2.017913832199546 168 | ], 169 | [ 170 | "AW1", 171 | 2.017913832199546, 172 | 2.1875283446712013 173 | ], 174 | [ 175 | "S", 176 | 2.1875283446712013, 177 | 2.4269841269841264 178 | ] 179 | ] 180 | }, 181 | { 182 | "alignedWord": "FROWNING", 183 | "start": 2.4269841269841264, 184 | "end": 2.8759637188208607, 185 | "phonemes": [ 186 | [ 187 | "F", 188 | 2.4269841269841264, 189 | 2.526757369614512 190 | ], 191 | [ 192 | "R", 193 | 2.526757369614512, 194 | 2.586621315192743 195 | ], 196 | [ 197 | "AW1", 198 | 2.586621315192743, 199 | 2.6863945578231285 200 | ], 201 | [ 202 | "N", 203 | 2.6863945578231285, 204 | 2.7263038548752827 205 | ], 206 | [ 207 | "IH0", 208 | 2.7263038548752827, 209 | 2.7961451247165523 210 | ], 211 | [ 212 | "NG", 213 | 2.7961451247165523, 214 | 2.8759637188208607 215 | ] 216 | ] 217 | }, 218 | { 219 | "alignedWord": "BUT", 220 | "start": 2.8759637188208607, 221 | "end": 3.075510204081631, 222 | "phonemes": [ 223 | [ 224 | "B", 225 | 2.8759637188208607, 226 | 2.935827664399092 227 | ], 228 | [ 229 | "AH1", 230 | 2.935827664399092, 231 | 2.995691609977323 232 | ], 233 | [ 234 | "T", 235 | 2.995691609977323, 236 | 3.075510204081631 237 | ] 238 | ] 239 | }, 240 | { 241 | "alignedWord": "VERY", 242 | "start": 3.075510204081631, 243 | "end": 3.3448979591836716, 244 | "phonemes": [ 245 | [ 246 | "V", 247 | 3.075510204081631, 248 | 
3.1353741496598624 249 | ], 250 | [ 251 | "EH1", 252 | 3.1353741496598624, 253 | 3.1752834467120166 254 | ], 255 | [ 256 | "R", 257 | 3.1752834467120166, 258 | 3.2850340136054403 259 | ], 260 | [ 261 | "IY0", 262 | 3.2850340136054403, 263 | 3.3448979591836716 264 | ] 265 | ] 266 | }, 267 | { 268 | "alignedWord": "POLITELY", 269 | "start": 3.3448979591836716, 270 | "end": 3.8736961451247143, 271 | "phonemes": [ 272 | [ 273 | "P", 274 | 3.3448979591836716, 275 | 3.4147392290249416 276 | ], 277 | [ 278 | "AH0", 279 | 3.4147392290249416, 280 | 3.444671201814057 281 | ], 282 | [ 283 | "L", 284 | 3.444671201814057, 285 | 3.5444444444444425 286 | ], 287 | [ 288 | "AY1", 289 | 3.5444444444444425, 290 | 3.644217687074828 291 | ], 292 | [ 293 | "T", 294 | 3.644217687074828, 295 | 3.6741496598639434 296 | ], 297 | [ 298 | "L", 299 | 3.6741496598639434, 300 | 3.7439909297052134 301 | ], 302 | [ 303 | "IY0", 304 | 3.7439909297052134, 305 | 3.8736961451247143 306 | ] 307 | ] 308 | }, 309 | { 310 | "alignedWord": "", 311 | "start": 3.8736961451247143, 312 | "end": 4.093197278911562, 313 | "phonemes": [ 314 | [ 315 | "", 316 | 3.8736961451247143, 317 | 4.093197278911562 318 | ] 319 | ] 320 | }, 321 | { 322 | "alignedWord": "DID", 323 | "start": 4.093197278911562, 324 | "end": 4.352607709750564, 325 | "phonemes": [ 326 | [ 327 | "D", 328 | 4.093197278911562, 329 | 4.232879818594102 330 | ], 331 | [ 332 | "IH1", 333 | 4.232879818594102, 334 | 4.292743764172333 335 | ], 336 | [ 337 | "D", 338 | 4.292743764172333, 339 | 4.352607709750564 340 | ] 341 | ] 342 | }, 343 | { 344 | "alignedWord": "YOU", 345 | "start": 4.352607709750564, 346 | "end": 4.482312925170066, 347 | "phonemes": [ 348 | [ 349 | "Y", 350 | 4.352607709750564, 351 | 4.442403628117912 352 | ], 353 | [ 354 | "UW1", 355 | 4.442403628117912, 356 | 4.482312925170066 357 | ] 358 | ] 359 | }, 360 | { 361 | "alignedWord": "SPEAK", 362 | "start": 4.482312925170066, 363 | "end": 5.0011337868480705, 364 | "phonemes": [ 365 | [ 366 
| "S", 367 | 4.482312925170066, 368 | 4.641950113378682 369 | ], 370 | [ 371 | "P", 372 | 4.641950113378682, 373 | 4.681859410430836 374 | ], 375 | [ 376 | "IY1", 377 | 4.681859410430836, 378 | 4.851473922902492 379 | ], 380 | [ 381 | "K", 382 | 4.851473922902492, 383 | 5.0011337868480705 384 | ] 385 | ] 386 | }, 387 | { 388 | "alignedWord": "", 389 | "start": 5.0011337868480705, 390 | "end": 5.429931972789116, 391 | "phonemes": [ 392 | [ 393 | "", 394 | 5.0011337868480705, 395 | 5.429931972789116 396 | ] 397 | ] 398 | } 399 | ] 400 | } 401 | -------------------------------------------------------------------------------- /test/assets/test.txt: -------------------------------------------------------------------------------- 1 | "I beg your pardon?" said the mouse, frowning, but very politely, "did you speak?" -------------------------------------------------------------------------------- /test/assets/test.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maxrmorrison/pypar/15701c8d3325d24c9aa04919468e840655c460a6/test/assets/test.wav -------------------------------------------------------------------------------- /test/conftest.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | 3 | import pytest 4 | 5 | import pypar 6 | 7 | 8 | ############################################################################### 9 | # Test fixtures 10 | ############################################################################### 11 | 12 | 13 | @pytest.fixture(scope='session') 14 | def alignment(): 15 | """Retrieve the alignment to use for testing""" 16 | return pypar.Alignment(path('test.json')) 17 | 18 | @pytest.fixture(scope='session') 19 | def text(): 20 | """Retrieve the speech transcript""" 21 | with open(path('test.txt')) as file: 22 | return file.read() 23 | 24 | 25 | @pytest.fixture(scope='session') 26 | def textgrid(): 27 | """Retrieve the 
speech textgrid""" 28 | return pypar.Alignment(path('test.TextGrid')) 29 | 30 | 31 | @pytest.fixture(scope='session') 32 | def float_alignment(): 33 | """Retrieve special alignment for float testing""" 34 | return pypar.Alignment(path('float.json')) 35 | 36 | 37 | ############################################################################### 38 | # Utilities 39 | ############################################################################### 40 | 41 | 42 | def path(file): 43 | """Resolve the file path of a test asset""" 44 | return Path(__file__).parent / 'assets' / file 45 | -------------------------------------------------------------------------------- /test/test_alignment.py: -------------------------------------------------------------------------------- 1 | import tempfile 2 | from pathlib import Path 3 | 4 | import pypar 5 | 6 | 7 | ############################################################################### 8 | # Test alignment 9 | ############################################################################### 10 | 11 | 12 | def test_find(alignment): 13 | """Test finding words in the alignment""" 14 | assert alignment.find('the mouse') == 7 15 | assert alignment.find('the dog') == -1 16 | 17 | 18 | def test_phoneme_at_time(alignment): 19 | """Test queries for current phoneme given a time in seconds""" 20 | assert alignment.phoneme_at_time(-1.) is None 21 | assert str(alignment.phoneme_at_time(1.5)) == pypar.SILENCE 22 | assert str(alignment.phoneme_at_time(1.9)) == 'AH0' 23 | assert str(alignment.phoneme_at_time(4.5)) == 'S' 24 | assert alignment.phoneme_at_time(6.) 
is None 25 | 26 | 27 | def test_phoneme_bounds(alignment): 28 | """Test frame boundaries of phonemes""" 29 | bounds = alignment.phoneme_bounds(10000, 100) 30 | assert bounds[0] == (27, 44) 31 | assert bounds[4] == (75, 86) 32 | 33 | 34 | def test_load(textgrid): 35 | """Test textgrid loading""" 36 | pass 37 | 38 | 39 | def test_save(alignment): 40 | """Test saving and reloading alignment""" 41 | with tempfile.TemporaryDirectory() as directory: 42 | # Test json 43 | file = Path(directory) / 'alignment.json' 44 | alignment.save(file) 45 | assert alignment == pypar.Alignment(file) 46 | 47 | # Test textgrid 48 | file = Path(directory) / 'alignment.TextGrid' 49 | alignment.save(file) 50 | assert alignment == pypar.Alignment(file) 51 | 52 | 53 | def test_string(text, alignment): 54 | """Test the alignment string representation""" 55 | text = text.replace('"', '') 56 | text = text.replace('?', '') 57 | text = text.replace(',', '') 58 | assert text.upper() == str(alignment) 59 | 60 | 61 | def test_word_at_time(alignment): 62 | """Test queries for current word given a time in seconds""" 63 | assert alignment.word_at_time(-1.) is None 64 | assert str(alignment.word_at_time(1.)) == 'PARDON' 65 | assert str(alignment.word_at_time(4.1)) == 'DID' 66 | assert alignment.word_at_time(6.) is None 67 | 68 | 69 | def test_word_bounds(alignment): 70 | """Test frame boundaries of words""" 71 | bounds = alignment.word_bounds(10000, 100) 72 | assert bounds[0] == (27, 44) 73 | assert bounds[3] == (93, 149) 74 | 75 | 76 | def test_float_update(float_alignment): 77 | for i in range(1, len(float_alignment)): 78 | assert float_alignment[i].start() >= float_alignment[i-1].end() 79 | float_alignment.update(start=0.) 
80 | for i in range(1, len(float_alignment)): 81 | assert float_alignment[i].start() >= float_alignment[i-1].end() 82 | -------------------------------------------------------------------------------- /test/test_compare.py: -------------------------------------------------------------------------------- 1 | import copy 2 | 3 | import pypar 4 | 5 | 6 | ############################################################################### 7 | # Test alignment comparisons 8 | ############################################################################### 9 | 10 | 11 | def test_per_frame_rate(alignment): 12 | """Test the per-frame speed difference between alignments""" 13 | stretch_and_assert(alignment, .5) 14 | stretch_and_assert(alignment, 1.) 15 | stretch_and_assert(alignment, 2.) 16 | 17 | 18 | ############################################################################### 19 | # Utilities 20 | ############################################################################### 21 | 22 | 23 | def stretch(alignment, factor): 24 | """Time-stretch the alignment by a constant factor""" 25 | # Get phoneme durations 26 | durations = [factor * p.duration() for p in alignment.phonemes()] 27 | alignment = copy.deepcopy(alignment) 28 | alignment.update(durations=durations) 29 | return alignment 30 | 31 | 32 | def stretch_and_assert(alignment, factor, sample_rate=10000, hopsize=100): 33 | """Time-stretch and perform test assertions""" 34 | # Get per-frame rate differences 35 | result = pypar.compare.per_frame_rate(alignment, 36 | stretch(alignment, factor), 37 | sample_rate, 38 | hopsize) 39 | 40 | # Perform assertions 41 | assert len(result) == 1 + int(alignment.duration() * sample_rate / hopsize) 42 | for item in result: 43 | assert item == factor 44 | --------------------------------------------------------------------------------
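Note on the frame-boundary assertions in test/test_alignment.py: the expected values in test_phoneme_bounds and test_word_bounds (called with a sample rate of 10000 and a hopsize of 100) are consistent with converting each time in seconds to a frame index by truncation, and with silent entries (empty "alignedWord") being skipped before indexing. The sketch below reproduces those numbers from the times in test/assets/test.json. The helper names `time_to_frame` and `frame_bounds` are hypothetical and are not part of pypar; the truncation rule is an assumption inferred from the test values, not a statement of how the library is implemented.

```python
def time_to_frame(seconds, sample_rate, hopsize):
    """Convert a time in seconds to a frame index.

    Assumption: the index is obtained by truncating toward zero, which
    matches the values asserted in test/test_alignment.py.
    """
    return int(seconds * sample_rate / hopsize)


def frame_bounds(intervals, sample_rate, hopsize):
    """Convert (start, end) pairs in seconds to (start, end) frame pairs."""
    return [
        (time_to_frame(start, sample_rate, hopsize),
         time_to_frame(end, sample_rate, hopsize))
        for start, end in intervals
    ]


# Times copied from test/assets/test.json
word_i = (0.27188208616780046, 0.4414965986394558)       # "I"
phoneme_y = (0.7507936507936508, 0.8605442176870748)     # "Y" in "YOUR"
word_pardon = (0.9303854875283446, 1.4990929705215414)   # "PARDON"

bounds = frame_bounds([word_i, phoneme_y, word_pardon], 10000, 100)
print(bounds)
```

Under this assumption the end of "PARDON" (about 149.9 frames) maps to frame 149 rather than 150, which is what test_word_bounds expects for bounds[3].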