├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── assets
    ├── architecture5.pdf
    ├── architecture5.png
    ├── arxiv.svg
    ├── default_female.wav
    ├── default_male.wav
    ├── give_me_a_brief_introduction_to_the_great_wall.wav
    ├── logo.png
    ├── mmau_test.wav
    ├── multi-turn-round1-听说荡口古镇从下个月开始取消门票了,你知道这事吗。.wav
    ├── multi-turn-round2-新闻说九月十九号就免费开放了。好像整个古镇都升级改造了,现在变成开放式街区了。.wav
    ├── music_playing_followed_by_a_woman_speaking.wav
    ├── paralinguistic_information_understanding.wav
    ├── qrcode.jpg
    ├── radar.png
    ├── radar_mini.png
    ├── search_result.txt
    ├── usage.jpg
    ├── wechat_group.png
    └── 帮我查一下今天上证指数的开盘价是多少.wav
├── cosyvoice2
    ├── flow
    │   ├── __init__.py
    │   ├── decoder_dit.py
    │   ├── flow.py
    │   └── flow_matching.py
    ├── transformer
    │   ├── __init__.py
    │   ├── attention.py
    │   ├── embedding.py
    │   ├── encoder_layer.py
    │   ├── positionwise_feed_forward.py
    │   ├── subsampling.py
    │   └── upsample_encoder_v2.py
    └── utils
    │   ├── class_utils.py
    │   ├── common.py
    │   └── mask.py
├── examples-base.py
├── examples-think.py
├── examples-vllm-stream.py
├── examples-vllm.py
├── examples.py
├── flashcosyvoice
    ├── __init__.py
    ├── cli.py
    ├── config.py
    ├── cosyvoice2.py
    ├── cosyvoice3.py
    ├── engine
    │   ├── __init__.py
    │   ├── block_manager.py
    │   ├── llm_engine.py
    │   ├── model_runner.py
    │   ├── scheduler.py
    │   └── sequence.py
    ├── modules
    │   ├── __init__.py
    │   ├── flow.py
    │   ├── flow_components
    │   │   ├── __init__.py
    │   │   ├── estimator.py
    │   │   └── upsample_encoder.py
    │   ├── hifigan.py
    │   ├── hifigan_components
    │   │   ├── __init__.py
    │   │   └── layers.py
    │   ├── qwen2.py
    │   ├── qwen2_components
    │   │   ├── __init__.py
    │   │   └── layers.py
    │   └── sampler.py
    └── utils
    │   ├── __init__.py
    │   ├── audio.py
    │   ├── context.py
    │   ├── loader.py
    │   └── memory.py
├── stepaudio2.py
├── stepaudio2vllm.py
├── token2wav.py
├── utils.py
├── web_demo.py
└── web_demo_vllm.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/.gitignore
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/Dockerfile
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/README.md
--------------------------------------------------------------------------------
/assets/architecture5.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/architecture5.pdf
--------------------------------------------------------------------------------
/assets/architecture5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/architecture5.png
--------------------------------------------------------------------------------
/assets/arxiv.svg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/arxiv.svg
--------------------------------------------------------------------------------
/assets/default_female.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/default_female.wav
--------------------------------------------------------------------------------
/assets/default_male.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/default_male.wav
--------------------------------------------------------------------------------
/assets/give_me_a_brief_introduction_to_the_great_wall.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/give_me_a_brief_introduction_to_the_great_wall.wav
--------------------------------------------------------------------------------
/assets/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/logo.png
--------------------------------------------------------------------------------
/assets/mmau_test.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/mmau_test.wav
--------------------------------------------------------------------------------
/assets/multi-turn-round1-听说荡口古镇从下个月开始取消门票了,你知道这事吗。.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/multi-turn-round1-听说荡口古镇从下个月开始取消门票了,你知道这事吗。.wav
--------------------------------------------------------------------------------
/assets/multi-turn-round2-新闻说九月十九号就免费开放了。好像整个古镇都升级改造了,现在变成开放式街区了。.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/multi-turn-round2-新闻说九月十九号就免费开放了。好像整个古镇都升级改造了,现在变成开放式街区了。.wav
--------------------------------------------------------------------------------
/assets/music_playing_followed_by_a_woman_speaking.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/music_playing_followed_by_a_woman_speaking.wav
--------------------------------------------------------------------------------
/assets/paralinguistic_information_understanding.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/paralinguistic_information_understanding.wav
--------------------------------------------------------------------------------
/assets/qrcode.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/qrcode.jpg
--------------------------------------------------------------------------------
/assets/radar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/radar.png
--------------------------------------------------------------------------------
/assets/radar_mini.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/radar_mini.png
--------------------------------------------------------------------------------
/assets/search_result.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/search_result.txt
--------------------------------------------------------------------------------
/assets/usage.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/usage.jpg
--------------------------------------------------------------------------------
/assets/wechat_group.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/wechat_group.png
--------------------------------------------------------------------------------
/assets/帮我查一下今天上证指数的开盘价是多少.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/assets/帮我查一下今天上证指数的开盘价是多少.wav
--------------------------------------------------------------------------------
/cosyvoice2/flow/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/cosyvoice2/flow/decoder_dit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/flow/decoder_dit.py
--------------------------------------------------------------------------------
/cosyvoice2/flow/flow.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/flow/flow.py
--------------------------------------------------------------------------------
/cosyvoice2/flow/flow_matching.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/flow/flow_matching.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/cosyvoice2/transformer/attention.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/attention.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/embedding.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/embedding.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/encoder_layer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/encoder_layer.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/positionwise_feed_forward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/positionwise_feed_forward.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/subsampling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/subsampling.py
--------------------------------------------------------------------------------
/cosyvoice2/transformer/upsample_encoder_v2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/transformer/upsample_encoder_v2.py
--------------------------------------------------------------------------------
/cosyvoice2/utils/class_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/utils/class_utils.py
--------------------------------------------------------------------------------
/cosyvoice2/utils/common.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/utils/common.py
--------------------------------------------------------------------------------
/cosyvoice2/utils/mask.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/cosyvoice2/utils/mask.py
--------------------------------------------------------------------------------
/examples-base.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/examples-base.py
--------------------------------------------------------------------------------
/examples-think.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/examples-think.py
--------------------------------------------------------------------------------
/examples-vllm-stream.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/examples-vllm-stream.py
--------------------------------------------------------------------------------
/examples-vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/examples-vllm.py
--------------------------------------------------------------------------------
/examples.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/examples.py
--------------------------------------------------------------------------------
/flashcosyvoice/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/cli.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/cli.py
--------------------------------------------------------------------------------
/flashcosyvoice/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/config.py
--------------------------------------------------------------------------------
/flashcosyvoice/cosyvoice2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/cosyvoice2.py
--------------------------------------------------------------------------------
/flashcosyvoice/cosyvoice3.py:
--------------------------------------------------------------------------------
# TODO(xcsong): Implement CosyVoice3 when it is released
--------------------------------------------------------------------------------
/flashcosyvoice/engine/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/engine/block_manager.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/engine/block_manager.py
--------------------------------------------------------------------------------
/flashcosyvoice/engine/llm_engine.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/engine/llm_engine.py
--------------------------------------------------------------------------------
/flashcosyvoice/engine/model_runner.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/engine/model_runner.py
--------------------------------------------------------------------------------
/flashcosyvoice/engine/scheduler.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/engine/scheduler.py
--------------------------------------------------------------------------------
/flashcosyvoice/engine/sequence.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/engine/sequence.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/modules/flow.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/flow.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/flow_components/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/modules/flow_components/estimator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/flow_components/estimator.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/flow_components/upsample_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/flow_components/upsample_encoder.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/hifigan.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/hifigan.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/hifigan_components/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/modules/hifigan_components/layers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/hifigan_components/layers.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/qwen2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/qwen2.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/qwen2_components/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/modules/qwen2_components/layers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/qwen2_components/layers.py
--------------------------------------------------------------------------------
/flashcosyvoice/modules/sampler.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/modules/sampler.py
--------------------------------------------------------------------------------
/flashcosyvoice/utils/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/flashcosyvoice/utils/audio.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/utils/audio.py
--------------------------------------------------------------------------------
/flashcosyvoice/utils/context.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/utils/context.py
--------------------------------------------------------------------------------
/flashcosyvoice/utils/loader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/utils/loader.py
--------------------------------------------------------------------------------
/flashcosyvoice/utils/memory.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/flashcosyvoice/utils/memory.py
--------------------------------------------------------------------------------
/stepaudio2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/stepaudio2.py
--------------------------------------------------------------------------------
/stepaudio2vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/stepaudio2vllm.py
--------------------------------------------------------------------------------
/token2wav.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/token2wav.py
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/utils.py
--------------------------------------------------------------------------------
/web_demo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/web_demo.py
--------------------------------------------------------------------------------
/web_demo_vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stepfun-ai/Step-Audio2/HEAD/web_demo_vllm.py
--------------------------------------------------------------------------------