├── .gitignore
├── README.md
├── download
    ├── README.md
    ├── __pycache__
    │   └── download_youtube.cpython-36.pyc
    ├── audios
    │   ├── README.md
    │   ├── __pycache__
    │   │   └── audio_chunks.cpython-36.pyc
    │   ├── audio_Spectrum.py
    │   ├── audio_chunks.py
    │   ├── audio_total_length.py
    │   ├── export_audioChunks.py
    │   └── get_chuncks_text.py
    ├── convert_wav.py
    ├── download_list.py
    ├── download_youtube.py
    └── urlList.txt
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | ## Ignore File
2 | 
3 | *.mp4
4 | *.wav
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A repository inspired by "책 읽어주는 딥러닝" (deep learning that reads books aloud)
2 | 책 읽어주는 딥러닝 looked too difficult to tackle directly, so I started this repository to see whether I could build something simpler while studying along the way.
3 | 
4 | * * *
5 | 
6 | ### 1. Data collection and audio extraction
7 | The download directory contains a program that collects YouTube videos.
8 | It also contains code that extracts the audio from the collected videos and writes it out as wav files.
9 | [Go to directory](https://github.com/clianor/voice-speaker-tensorflow/tree/master/download)
10 | * * *
11 | 
12 | ### 2. Splitting the audio
13 | There is a program that splits the extracted audio files into chunks.
14 | [Go to directory](https://github.com/clianor/voice-speaker-tensorflow/tree/master/download/audios)
15 | 
16 | * * *
--------------------------------------------------------------------------------
/download/README.md:
--------------------------------------------------------------------------------
1 | # Data download & Export wav files
2 | 
3 | ### File descriptions
4 | - download_youtube.py
5 |     - Run as ``` python download_youtube.py <URL> ``` to download a single YouTube video.
6 |     - Videos are saved in the videos folder, which is created if it does not exist.
7 | - download_list.py
8 |     - Run as ``` python download_list.py ``` to read urlList.txt and download the YouTube video for every URL listed in it.
9 | - convert_wav.py
10 |     - Run as ``` python convert_wav.py videos audios ``` to extract the audio from the videos and save it as wav files under audios/data.
11 | 
12 | * * *
13 | 
14 | ### Execution order
15 | 1. ``` python download_list.py ```
16 | 2. ``` python convert_wav.py videos audios ```
17 | 
18 | * * *
--------------------------------------------------------------------------------
/download/__pycache__/download_youtube.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/clianor/voice-speaker-tensorflow/537e80be65eaa35f997093cee2be84035c66a188/download/__pycache__/download_youtube.cpython-36.pyc
--------------------------------------------------------------------------------
/download/audios/README.md:
--------------------------------------------------------------------------------
1 | # Export Audio Chunks
2 | 
3 | ### File descriptions
4 | - audio_Spectrum.py
5 |     - Run as ``` python audio_Spectrum.py ```; edit the librosa.load call in the file to view the spectrum of a specific audio file.
6 | - audio_chunks.py
7 |     - Run as ``` python audio_chunks.py ```; as with audio_Spectrum.py, open the file and edit the function arguments before running it.
8 | - export_audioChunks.py
9 |     - Run as ``` python export_audioChunks.py ``` to split every wav file in the data directory into chunks that meet the conditions and save them in the chuncks directory.
10 | * * *
11 | 
12 | ### Execution order
13 | 1. Run ``` python audio_Spectrum.py ``` first to inspect the audio spectrum.
14 | 2. 
Run ``` python export_audioChunks.py ``` to extract the audio chunks.
15 | 3. Run ``` python audio_total_length.py ``` to compute the total length of the extracted chunks (unit: seconds).
16 | * * *
17 | 
18 | ### Speech to text
19 | - Uses the Google Speech API; see the reference below.
20 |     - https://cloud.google.com/speech-to-text/docs/reference/libraries?hl=ko
--------------------------------------------------------------------------------
/download/audios/__pycache__/audio_chunks.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/clianor/voice-speaker-tensorflow/537e80be65eaa35f997093cee2be84035c66a188/download/audios/__pycache__/audio_chunks.cpython-36.pyc
--------------------------------------------------------------------------------
/download/audios/audio_Spectrum.py:
--------------------------------------------------------------------------------
1 | import librosa
2 | import librosa.display
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | 
6 | y, sr = librosa.load("data/yVxDSnTFN6o.wav")
7 | 
8 | S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
9 | log_S = librosa.power_to_db(S, ref=np.max)  # melspectrogram returns power by default, so convert with power_to_db
10 | 
11 | plt.figure(figsize=(12, 4))
12 | librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
13 | plt.title('mel power spectrogram')
14 | plt.colorbar(format='%+02.0f dB')
15 | plt.tight_layout()
16 | plt.show()
17 | 
--------------------------------------------------------------------------------
/download/audios/audio_chunks.py:
--------------------------------------------------------------------------------
1 | import os
2 | from pydub import AudioSegment
3 | from pydub.silence import split_on_silence
4 | 
5 | def AudioChuncks(filePath, format):
6 |     if "chuncks" not in os.listdir():
7 |         os.mkdir("chuncks")
8 | 
9 |     sound = AudioSegment.from_file("data/"+filePath, format=format)
10 |     sound = sound.set_channels(1)
11 | 
12 |     chunks = split_on_silence(
13 |         sound,
14 |         # Treat silence of at least 0.5 s (500 ms) as a break between chunks.
15 |         min_silence_len = 500,
16 |         # Anything quieter than -50 dBFS is treated as silence.
17 |         silence_thresh = -50,
18 |     )
19 | 
20 |     soundCount = 0
21 |     for chunk in chunks:
22 |         # Keep only chunks longer than 0.8 seconds.
23 |         if chunk.duration_seconds > 0.8:
24 |             chunk.export("chuncks/"+os.path.splitext(filePath)[0]+"{0}.wav".format(soundCount), format="wav")
25 |             soundCount += 1
26 | 
27 | if __name__ == "__main__":
28 |     AudioChuncks("yVxDSnTFN6o&t.wav", "wav")
--------------------------------------------------------------------------------
/download/audios/audio_total_length.py:
--------------------------------------------------------------------------------
1 | import os
2 | from os.path import splitext
3 | from pydub import AudioSegment
4 | 
5 | audioFiles = [fileName for fileName in os.listdir("chuncks") if splitext(fileName)[1] == ".wav"]
6 | totalLength = 0.
7 | for fn in audioFiles:
8 |     totalLength += AudioSegment.from_file("chuncks/"+fn, format="wav").duration_seconds
9 | 
10 | print("Audio Files Total Length :", totalLength, "sec")
--------------------------------------------------------------------------------
/download/audios/export_audioChunks.py:
--------------------------------------------------------------------------------
1 | import os
2 | from os.path import splitext
3 | from audio_chunks import AudioChuncks
4 | 
5 | wavFiles = [fileName for fileName in os.listdir("data") if splitext(fileName)[1] == ".wav"]
6 | for fileName in wavFiles:
7 |     AudioChuncks(fileName, "wav")
--------------------------------------------------------------------------------
/download/audios/get_chuncks_text.py:
--------------------------------------------------------------------------------
1 | import io
2 | import os
3 | import json
4 | from progressbar import ProgressBar
5 | 
6 | from google.cloud import speech
7 | from google.cloud.speech import enums
8 | from google.cloud.speech import types
9 | 
10 | 
11 | def getAudioText(fn):
12 |     client = speech.SpeechClient()
13 | 
14 |     file_name = os.path.join('chuncks',fn)
15 |     with io.open(file_name, 'rb') as audio_file:
16 |         content=audio_file.read()
17 |         audio=types.RecognitionAudio(content=content)
18 | 
19 |     config=types.RecognitionConfig(language_code='ko-KR')
20 |     response=client.recognize(config, audio)
21 |     try:
22 |         result = response.results[0].alternatives[0].transcript
23 |     except IndexError:  # no transcript was returned for this chunk
24 |         result = ""
25 |     return result
26 | 
27 | if __name__ == "__main__":
28 |     bar=ProgressBar()
29 |     data=dict()
30 |     for fn in bar(sorted(os.listdir("chuncks"))):
31 |         data[fn]=getAudioText(fn)
32 | 
33 |     with open("alignment.json","w",encoding="utf-8") as f:
34 |         f.write(json.dumps(data,ensure_ascii=False,indent="\t"))
--------------------------------------------------------------------------------
/download/convert_wav.py:
--------------------------------------------------------------------------------
1 | import os, sys
2 | from moviepy.editor import VideoFileClip
3 | 
4 | if __name__ == "__main__":
5 |     if len(sys.argv) != 3:
6 |         print("Please provide the video directory and the audio directory.")
7 |         sys.exit()
8 | 
9 |     # Create <audio directory>/data if it does not exist yet (os.path.join keeps the path portable).
10 |     out_dir=os.path.join(sys.argv[2], "data")
11 |     os.makedirs(out_dir, exist_ok=True)
12 | 
13 |     # Convert every .mp4 file in the video directory to a .wav file.
14 |     files=list(filter(lambda x: os.path.splitext(x)[1] == ".mp4" ,os.listdir(sys.argv[1])))
15 |     for filename in files:
16 |         try:
17 |             video_clip=VideoFileClip(os.path.join(sys.argv[1], filename))
18 |             audio_clip=video_clip.audio
19 |             out_path=os.path.join(out_dir, os.path.splitext(filename)[0]+".wav")
20 |             print("\npath:", out_path)
21 |             audio_clip.write_audiofile(out_path)
22 |             audio_clip.close()
23 |         except Exception as e:
24 |             print(filename, e)
--------------------------------------------------------------------------------
/download/download_list.py:
--------------------------------------------------------------------------------
1 | import os
2 | from download_youtube import DownloadYoutube
3 | 
4 | if __name__ == "__main__":
5 |     try:
6 |         with open("urlList.txt", "r") as f:
7 | 
            urls=[line.strip() for line in f if line.strip()]
8 | 
9 |         for url in urls:
10 |             DownloadYoutube(url)
11 |     except Exception as e:
12 |         print(e)
--------------------------------------------------------------------------------
/download/download_youtube.py:
--------------------------------------------------------------------------------
1 | import os, re, sys
2 | import unicodedata
3 | import subprocess
4 | import pytube
5 | 
6 | def DownloadYoutube(url):
7 |     try:
8 |         # Save into the videos folder, creating it if needed (portable path).
9 |         parent_dir="videos"
10 |         if not os.path.isdir(parent_dir):
11 |             os.mkdir(parent_dir)
12 | 
13 |         yt=pytube.YouTube(url)
14 |         vids=yt.streams.all()
15 |         vids[0].download(parent_dir)
16 | 
17 |         # Rename the download after the text that follows "watch?v=" in the URL.
18 |         new_filename=re.findall(r"watch\?v=(\S+)", url)[0]+".mp4"
19 |         default_filename=vids[0].default_filename
20 |         try:
21 |             os.rename(os.path.join(parent_dir, default_filename), os.path.join(parent_dir, new_filename))
22 |         except FileExistsError as e:
23 |             print(new_filename, "already exists.")
24 |             os.remove(os.path.join(parent_dir, default_filename))
25 |     except Exception as e:
26 |         print(e)
27 | # end def
28 | 
29 | if __name__ == "__main__":
30 |     DownloadYoutube(sys.argv[1])
--------------------------------------------------------------------------------
/download/urlList.txt:
--------------------------------------------------------------------------------
1 | https://www.youtube.com/watch?v=cL7LXYA1uI4
2 | https://www.youtube.com/watch?v=lpJH4X6WiIY
3 | https://www.youtube.com/watch?v=YdsWnEXKLsY
4 | https://www.youtube.com/watch?v=QmOAZM4t_js
5 | https://www.youtube.com/watch?v=7_91FYi0nlI
6 | https://www.youtube.com/watch?v=yVxDSnTFN6o
7 | https://www.youtube.com/watch?v=0oJABnYXeoY
8 | https://www.youtube.com/watch?v=nRH6CedeUoo
9 | https://www.youtube.com/watch?v=tTRG60X0svo
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | audioread==2.1.6
2 | backcall==0.1.0
3 | certifi==2018.4.16
4 | chardet==3.0.4
5 | 
colorama==0.3.9
6 | cycler==0.10.0
7 | decorator==4.3.0
8 | future==0.16.0
9 | idna==2.7
10 | imageio==2.3.0
11 | ipython==6.4.0
12 | ipython-genutils==0.2.0
13 | jedi==0.12.1
14 | joblib==0.12.1
15 | kiwisolver==1.0.1
16 | librosa==0.6.1
17 | llvmlite==0.24.0
18 | matplotlib==2.2.2
19 | moviepy==0.2.3.5
20 | numba==0.39.0
21 | numpy==1.14.5
22 | parso==0.3.1
23 | pickleshare==0.7.4
24 | Pillow==5.2.0
25 | prompt-toolkit==1.0.15
26 | pydub==0.22.1
27 | Pygments==2.2.0
28 | pyparsing==2.2.0
29 | python-dateutil==2.7.3
30 | pytube==9.2.2
31 | pytz==2018.5
32 | requests==2.19.1
33 | resampy==0.2.1
34 | scikit-learn==0.19.2
35 | scipy==1.1.0
36 | simplegeneric==0.8.1
37 | six==1.11.0
38 | tqdm==4.23.4
39 | traitlets==4.3.2
40 | urllib3==1.23
41 | wcwidth==0.1.7
42 | 
--------------------------------------------------------------------------------
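A note on file naming: download_youtube.py names each video after everything that follows `watch?v=` in the URL, so a URL with extra query parameters (for example `&t=...`) carries them into the filename. The `yVxDSnTFN6o&t.wav` name used in audio_chunks.py is consistent with that behavior. Below is a minimal sketch of one alternative, assuming URLs of the form `https://www.youtube.com/watch?v=<id>`. The helper name `video_id_from_url` is hypothetical and not part of the repository.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Return the value of the `v` query parameter, or None if it is absent.

    Hypothetical helper (not in the repository); assumes a URL shaped like
    https://www.youtube.com/watch?v=<id>[&other=params].
    """
    # strip() removes the trailing newline left by reading urlList.txt line by line.
    query = parse_qs(urlparse(url.strip()).query)
    values = query.get("v")
    return values[0] if values else None


if __name__ == "__main__":
    for url in (
        "https://www.youtube.com/watch?v=yVxDSnTFN6o",
        "https://www.youtube.com/watch?v=yVxDSnTFN6o&t=30s\n",
    ):
        print(video_id_from_url(url))
```

If something like this were adopted, the filename could be built as `video_id_from_url(url) + ".mp4"`. Existing files that already carry the longer name would then need renaming, so this is a suggestion rather than a drop-in change.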