├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── docs
    └── source
    │   └── _static
    │       └── pipeline.png
└── pipeline
    ├── convert_transcribe
        ├── README.md
        └── convert_and_transcribe.py
    ├── crawler
        ├── README.md
        └── download_from_youtube_channels.sh
    ├── force_alignment
        ├── README.md
        ├── calculate_precision.py
        ├── force_align.sh
        └── force_align_from_list.sh
    ├── segmentation
        ├── README.md
        ├── filter_manifest.py
        ├── segment.sh
        ├── segment_from_list.sh
        └── segment_from_manifests.py
    └── utils
        ├── force_alignment
            ├── README.md
            ├── align.py
            ├── align_utils.py
            ├── norm_config.py
            ├── punctuations.lst
            └── text_normalization.py
        └── textgrid2jsonl.py

/.gitignore:
--------------------------------------------------------------------------------
__pycache__
.DS_Store
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/.gitmodules
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/README.md
--------------------------------------------------------------------------------
/docs/source/_static/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/docs/source/_static/pipeline.png
--------------------------------------------------------------------------------
/pipeline/convert_transcribe/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/convert_transcribe/README.md
--------------------------------------------------------------------------------
/pipeline/convert_transcribe/convert_and_transcribe.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/convert_transcribe/convert_and_transcribe.py
--------------------------------------------------------------------------------
/pipeline/crawler/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/crawler/README.md
--------------------------------------------------------------------------------
/pipeline/crawler/download_from_youtube_channels.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/crawler/download_from_youtube_channels.sh
--------------------------------------------------------------------------------
/pipeline/force_alignment/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/force_alignment/README.md
--------------------------------------------------------------------------------
/pipeline/force_alignment/calculate_precision.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/force_alignment/calculate_precision.py
--------------------------------------------------------------------------------
/pipeline/force_alignment/force_align.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/force_alignment/force_align.sh
--------------------------------------------------------------------------------
/pipeline/force_alignment/force_align_from_list.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/force_alignment/force_align_from_list.sh
--------------------------------------------------------------------------------
/pipeline/segmentation/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/segmentation/README.md
--------------------------------------------------------------------------------
/pipeline/segmentation/filter_manifest.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/segmentation/filter_manifest.py
--------------------------------------------------------------------------------
/pipeline/segmentation/segment.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/segmentation/segment.sh
--------------------------------------------------------------------------------
/pipeline/segmentation/segment_from_list.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/segmentation/segment_from_list.sh
--------------------------------------------------------------------------------
/pipeline/segmentation/segment_from_manifests.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/segmentation/segment_from_manifests.py
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/README.md
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/align.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/align.py
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/align_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/align_utils.py
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/norm_config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/norm_config.py
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/punctuations.lst:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/punctuations.lst
--------------------------------------------------------------------------------
/pipeline/utils/force_alignment/text_normalization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/force_alignment/text_normalization.py
--------------------------------------------------------------------------------
/pipeline/utils/textgrid2jsonl.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SpeechColab/GigaSpeech2/HEAD/pipeline/utils/textgrid2jsonl.py
--------------------------------------------------------------------------------