No speakers loaded.
{% endif %}
--------------------------------------------------------------------------------
/templates/deepstory.js:
--------------------------------------------------------------------------------
$(document).ready(function () {
    function refresh_status() {
        $("#status").load("{{ url_for('status') }}", function () {
            $("#clearCache").click(function () {
                $.ajax({
                    url: "{{ url_for('clear') }}",
                    success: function (message) {
                        alert(message);
                        refresh_status();
                        refresh_animate();
                        refresh_video();
                    },
                    error: function (response) {
                        alert(response.responseText);
                    }
                });
            });
        });
    }
    function refresh_sent() {
        $("#sentences").load("{{ url_for('sentences') }}");
    }
    function refresh_animate() {
        $("#tab-3").load("{{ url_for('animate') }}", function () {
            $("#animate").find("select").each(function () {
                let img = $('{{ sentence | replace('\n', '<br>') | safe }}

Interactive Textarea: (generation continues from the content of the textarea)

Speaker | Sentence | Modify | {% if speaker_map %}Mapped | {% endif %}{% if 'wav' in sentences[0] %}Audio | {% endif %}
---|---|---|---|---
{{ sentence['speaker'] }} | {{ sentence['text'] }} | | {% if speaker_map %}{% if sentence['speaker'] in speaker_map %}{{ speaker_map[sentence['speaker']] }}{% else %}---{% endif %} | {% endif %}{% if 'wav' in sentence %} | {% endif %}

Synthesized: {% if synthsized %}True{% else %}False{% endif %}
Base audio: {% if combined %}True{% else %}False{% endif %}
Base video: {% if base %}True{% else %}False{% endif %}
Animated video: {% if animated %}True{% else %}False{% endif %}

No videos animated.
{% endif %}
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thetobysiu/deepstory/34eb7b1771479b996f361c291dc36b88ca25bd17/test.py
--------------------------------------------------------------------------------
/text.txt:
--------------------------------------------------------------------------------
The taller of the women suddenly swayed, legs planted widely apart, and twisted her hips. Her sabre, which no one saw her draw, hissed sharply through the air. The spotty-faced man’s head flew upwards in an arc and fell into the gaping opening to the dungeon. His body toppled stiffly and heavily, like a tree being felled, among the crushed bricks. The crowd let out a scream. The second woman, hand on her sword hilt, whirled around nimbly, protecting her partner’s back. Needlessly. The crowd, stumbling and falling over on the rubble, fled towards the town as fast as they could. The Alderman loped at the front with impressive strides, outdistancing the huge butcher by only a few yards.
‘An excellent stroke,’ the white-haired man commented coldly, shielding his eyes from the sun with a black-gloved hand. ‘An excellent stroke from a Zerrikanian sabre. I bow before the skill and beauty of the free warriors. I’m Geralt of Rivia.’
‘And I,’ the stranger in the dark brown tunic pointed at the faded coat of arms on the front of his garment, depicting three black birds sitting in a row in the centre of a uniformly gold field, ‘am Borch, also known as Three Jackdaws. And these are my girls, Téa and Véa. That’s what I call them, because you’ll twist your tongue on their right names. They are both, as you correctly surmised, Zerrikanian.’
‘Thanks to them, it appears, I still have my horse and belongings. I thank you, warriors. My thanks to you too, sir.’
‘Three Jackdaws. And you can drop the “sir”. Does anything detain you in this little town, Geralt of Rivia?’
‘Quite the opposite.’
‘Excellent. I have a proposal. Not far from here, at the crossroads on the road to the river port, is an inn. It’s called the Pensive Dragon. The vittals there have no equal in these parts. I’m heading there with food and lodging in mind. It would be my honour should you choose to keep me company.’
‘Borch.’ The white-haired man turned around from his horse and looked into the stranger’s bright eyes. ‘I wouldn’t want anything left unclear between us. I’m a witcher.’
‘I guessed as much. But you said it as you might have said “I’m a leper”.’
‘There are those,’ Geralt said slowly, ‘who prefer the company of lepers to that of a witcher.’
‘There are also those,’ Three Jackdaws laughed, ‘who prefer sheep to girls. Ah, well, one can only sympathise with the former and the latter. I repeat my proposal.’
Geralt took off his glove and shook the hand being proffered.
--------------------------------------------------------------------------------
/util.py:
--------------------------------------------------------------------------------
# SIU KING WAI SM4701 Deepstory
import re
import copy
import spacy
import librosa
import numpy as np

from unidecode import unidecode
from modules.dctts import hp
from pydub import AudioSegment, effects


def quote_boundaries(doc):
    for token in doc[:-1]:
        # if token.text == "“" or token.text == "”":
        #     doc[token.i + 1].is_sent_start = True
        if token.text == "“":
            doc[token.i + 1].is_sent_start = True
    return doc


nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(quote_boundaries, before="parser")
nlp_no_comma = copy.deepcopy(nlp)
sentencizer = nlp.create_pipe("sentencizer")
sentencizer.punct_chars.add(',')
sentencizer_no_comma = nlp_no_comma.create_pipe("sentencizer")
nlp.add_pipe(sentencizer, first=True)
nlp_no_comma.add_pipe(sentencizer_no_comma, first=True)


def normalize_text(text):
    """Normalize text so that punctuation indicating a pause is replaced with a comma."""
    replace_list = [
        [r'(\.\.\.)$|…$', '.'],
        [r'\(|\)|:|;| “|(\s*-+\s+)|(\s+-+\s*)|\s*-{2,}\s*|(\.\.\.)|…|—', ', '],
        [r'\s*,[^\w]*,\s*', ', '],  # collapse runs of commas
        [r'\s*,\s*', ', '],  # normalize spacing around commas
        [r'\.,', '.'],
        [r'[‘’“”]', '']  # strip curly quotes
    ]
    for regex, replacement in replace_list:
        text = re.sub(regex, replacement, text)
    text = unidecode(text)  # get rid of accented characters
    text = text.lower()
    text = re.sub(f"[^{hp.vocab}]", " ", text)
    text = re.sub(r' +', ' ', text).strip()
    return text


def fix_text(text):
    """Fix text pasted from a book (e.g. typographic apostrophes)."""
    replace_list = [
        [r'(\w)’(\w)', r"\1'\2"],  # fix apostrophes in content copied from books
    ]
    for regex, replacement in replace_list:
        text = re.sub(regex, replacement, text)
    text = re.sub(r' +', ' ', text)

    return text


def trim_text(generated_text, max_sentences=0, script=False):
    """Trim the unfinished trailing sentence generated by GPT-2."""
    # remove the utf-8 replacement character (a bug?)
    generated_text = generated_text.replace(b'\xef\xbf\xbd'.decode('utf-8'), '')
    if script:
        generated_text = generated_text.replace('\n', '')
    text_list = re.findall(r'.*?[.!\?…—][’”]*', generated_text, re.DOTALL)
    if script:
        text_list = ['\n' + text if text[0].isupper() else text for text in text_list]

    # if limiting the number of sentences
    if max_sentences:
        # find all sentences, parse them into a list, keep the first n items and join them back
        return ''.join(text_list[:max_sentences])
    else:
        return ''.join(text_list)


# backup...
# # select until the last punctuation using regex, and create an nlp object for counting sentences
# text_list = [*nlp_no_comma(re.findall(r'.*[.!\?’”]', generated_text, re.DOTALL)[0]).sents]
# # figure out how to select max sentence(which structure)
# text_list = re.findall(r'.*?[.!\?]|.*\w+', generated_text, re.DOTALL)
# for i in reversed(range(1, len(text_list))):
#     try:
#         while not text_list[i][0].isalpha() and text_list[i][0] != '“' and text_list[i][0] != '‘' and text_list[i][0] != ' ':
#             text_list[i - 1] = text_list[i - 1] + text_list[i][0]
#             text_list[i] = text_list[i][1:]
#             if not text_list:
#                 break
#     except IndexError:
#         print('ok')
# if not any(text_list[-1][-1] == x for x in ['.', '!', '?']):
#     del text_list[-1]
# if max_sentences:
#     text_list = [text.text for i, text in enumerate(text_list) if i < max_sentences]
# else:
#     text_list = [text.text for text in text_list]
# return ' '.join(text_list)


def separate(text, n_gram, comma, max_len=30):
    _nlp = nlp if comma else nlp_no_comma
    lines = []
    line = ''
    counter = 0
    for sent in _nlp(text).sents:
        if sent.text:
            if counter == 0:
                line = sent.text
            else:
                line = f'{line} {sent.text}'
            counter += 1

            if counter == n_gram:
                lines.append(_nlp(line))
                line = ''
                counter = 0

    # for remaining sentences
    if line:
        lines.append(_nlp(line))

    return lines


def get_duration(second):
    return int(hp.sr * second)


def normalize_audio(wav):
    # normalize the audio with pydub
    audioseg = AudioSegment(wav.tobytes(), sample_width=2, frame_rate=hp.sr, channels=1)
    # normalized = effects.normalize(audioseg, self.norm_factor)
    normalized = audioseg.apply_gain(-30 - audioseg.dBFS)
    wav = np.array(normalized.get_array_of_samples())
    return wav


# from my audio processing project
def split_audio_to_list(source, preemph=True, preemphasis=0.8, min_diff=1500, min_size=get_duration(1), db=80):
    if preemph:
        source = np.append(source[0], source[1:] - preemphasis * source[:-1])
    split_list = librosa.effects.split(source, top_db=db).tolist()
    i = len(split_list) - 1
    while i > 0:
        if split_list[i][-1] - split_list[i][0] > min_size:
            now = split_list[i][0]
            prev = split_list[i - 1][1]
            diff = now - prev
            if diff < min_diff:
                split_list[i - 1] = [split_list[i - 1][0], split_list.pop(i)[1]]
        else:
            split_list.pop(i)
        i -= 1

    # make sure nothing is trimmed away
    split_list[0][0] = 0
    split_list[-1][1] = len(source)
    for i in reversed(range(len(split_list))):
        if i != 0:
            split_list[i][0] = split_list[i - 1][1]

    return split_list
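For orientation, a minimal usage sketch (not part of the repository) of how the text helpers in util.py might be chained on GPT-2 output; the sample string and the max_sentences/n_gram/comma values are made up for illustration, and importing util pulls in the spaCy model, pydub, librosa and the DCTTS hyperparameters, so those dependencies must be available:

# Illustrative sketch only, not a file from the repository.
from util import fix_text, normalize_text, separate, trim_text

generated = "He said: “Wait for me…” — and then the story’s generated text just stops mid"
cleaned = fix_text(generated)                          # straighten typographic apostrophes
trimmed = trim_text(cleaned, max_sentences=2)          # drop the unfinished trailing sentence
for doc in separate(trimmed, n_gram=1, comma=False):   # one sentence per chunk, no comma splitting
    print(normalize_text(doc.text))                    # lowercased, pause punctuation mapped to commas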
--------------------------------------------------------------------------------
/voice.py:
--------------------------------------------------------------------------------
# SIU KING WAI SM4701 Deepstory
import numpy as np
import torch
import glob

from util import normalize_text, normalize_audio
from modules.dctts import Text2Mel, SSRN, hp, spectrogram2wav

torch.set_grad_enabled(False)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class Voice:
    def __init__(self, speaker):
        self.speaker = speaker
        self.text2mel = None
        self.ssrn = None

    def __enter__(self):
        self.load()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def load(self):
        self.text2mel = Text2Mel(hp.vocab).to(device).eval()
        self.text2mel.load_state_dict(torch.load(glob.glob(f'data/dctts/{self.speaker}/t2m*.pth')[0])['state_dict'])
        self.ssrn = SSRN().to(device).eval()
        self.ssrn.load_state_dict(torch.load(f'data/dctts/{self.speaker}/ssrn.pth')['state_dict'])

    def close(self):
        del self.text2mel
        del self.ssrn
        torch.cuda.empty_cache()

    # referenced from the original repo
    def synthesize(self, text, timeout=10000):
        with torch.no_grad():  # no grad to save memory
            normalized_text = normalize_text(text) + "E"  # text normalization, E: EOS
            L = torch.from_numpy(np.array([[hp.char2idx[char] for char in normalized_text]], np.long)).to(device)
            zeros = torch.from_numpy(np.zeros((1, hp.n_mels, 1), np.float32)).to(device)
            Y = zeros

            for i in range(timeout):
                _, Y_t, A = self.text2mel(L, Y, monotonic_attention=True)
                Y = torch.cat((zeros, Y_t), -1)
                _, attention = torch.max(A[0, :, -1], 0)
                attention = attention.item()
                if L[0, attention] == hp.vocab.index('E'):  # EOS
                    break

            _, Z = self.ssrn(Y)  # batch ssrn instead?
            Z = Z.cpu().detach().numpy()

        wav = spectrogram2wav(Z[0, :, :].T)
        wav = normalize_audio(wav)
        return wav
--------------------------------------------------------------------------------
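Finally, a minimal sketch (not part of the repository) of how Voice might be driven; the speaker name 'geralt', the output path and the use of scipy to write the wav are assumptions, and it presumes trained DCTTS checkpoints exist under data/dctts/<speaker>/ (t2m*.pth and ssrn.pth), as load() expects:

# Illustrative sketch only, not a file from the repository.
import scipy.io.wavfile as wavfile

from modules.dctts import hp
from voice import Voice

with Voice('geralt') as voice:              # __enter__ loads the Text2Mel and SSRN checkpoints
    wav = voice.synthesize("I'm Geralt of Rivia.")
wavfile.write('out.wav', hp.sr, wav.astype('int16'))  # synthesize() returns a normalized integer sample array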