├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── imgs
│   ├── 01.overview.png
│   └── 02.framework.png
└── mtllm
    ├── __init__.py
    ├── criterions
    │   └── cross_entropy_acc.py
    ├── data
    │   ├── speechllm_dataset.py
    │   └── tokenizer.py
    ├── inference
    │   ├── generate.py
    │   └── sequence_generator.py
    ├── models
    │   ├── llama.py
    │   ├── speechllm_model.py
    │   ├── wavlm.py
    │   └── whisper_encoder.py
    ├── modules
    │   └── convolution.py
    ├── requirements.txt
    ├── scripts
    │   └── inference.sh
    ├── tasks
    │   └── speechllm_task.py
    ├── test_data
    │   ├── audio
    │   │   ├── 4198-61336-0009_common-voice-de-19162009_4077-13754-0013.wav
    │   │   ├── common-voice-de-20014556_3575-170457-0004.wav
    │   │   ├── test-clean-2mix-0728.wav
    │   │   └── test-clean-3mix-1307.wav
    │   ├── dict.txt
    │   └── for_demo.tsv
    └── tokenizer
        └── tokenizer.model

/.gitignore:
--------------------------------------------------------------------------------
__pycache__/
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/.gitmodules
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/README.md
--------------------------------------------------------------------------------
/imgs/01.overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/imgs/01.overview.png
--------------------------------------------------------------------------------
/imgs/02.framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/imgs/02.framework.png
--------------------------------------------------------------------------------
/mtllm/__init__.py:
--------------------------------------------------------------------------------
from . import criterions
--------------------------------------------------------------------------------
/mtllm/criterions/cross_entropy_acc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/criterions/cross_entropy_acc.py
--------------------------------------------------------------------------------
/mtllm/data/speechllm_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/data/speechllm_dataset.py
--------------------------------------------------------------------------------
/mtllm/data/tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/data/tokenizer.py
--------------------------------------------------------------------------------
/mtllm/inference/generate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/inference/generate.py
--------------------------------------------------------------------------------
/mtllm/inference/sequence_generator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/inference/sequence_generator.py
--------------------------------------------------------------------------------
/mtllm/models/llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/models/llama.py
--------------------------------------------------------------------------------
/mtllm/models/speechllm_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/models/speechllm_model.py
--------------------------------------------------------------------------------
/mtllm/models/wavlm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/models/wavlm.py
--------------------------------------------------------------------------------
/mtllm/models/whisper_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/models/whisper_encoder.py
--------------------------------------------------------------------------------
/mtllm/modules/convolution.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/modules/convolution.py
--------------------------------------------------------------------------------
/mtllm/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/requirements.txt
--------------------------------------------------------------------------------
/mtllm/scripts/inference.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/scripts/inference.sh
--------------------------------------------------------------------------------
/mtllm/tasks/speechllm_task.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/tasks/speechllm_task.py
--------------------------------------------------------------------------------
/mtllm/test_data/audio/4198-61336-0009_common-voice-de-19162009_4077-13754-0013.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/test_data/audio/4198-61336-0009_common-voice-de-19162009_4077-13754-0013.wav
--------------------------------------------------------------------------------
/mtllm/test_data/audio/common-voice-de-20014556_3575-170457-0004.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/test_data/audio/common-voice-de-20014556_3575-170457-0004.wav
--------------------------------------------------------------------------------
/mtllm/test_data/audio/test-clean-2mix-0728.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/test_data/audio/test-clean-2mix-0728.wav
--------------------------------------------------------------------------------
/mtllm/test_data/audio/test-clean-3mix-1307.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/test_data/audio/test-clean-3mix-1307.wav
--------------------------------------------------------------------------------
/mtllm/test_data/dict.txt:
--------------------------------------------------------------------------------
1 1
2 2
3 3
4 4
5 5
--------------------------------------------------------------------------------
/mtllm/test_data/for_demo.tsv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/test_data/for_demo.tsv
--------------------------------------------------------------------------------
/mtllm/tokenizer/tokenizer.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cuhealthybrains/MT-LLM/HEAD/mtllm/tokenizer/tokenizer.model
--------------------------------------------------------------------------------