CapSpeech: A Prompt-Guided Expressive Text-to-Speech Synthesizer
👋 Welcome to the 🧢CapSpeech live demo.
🔗 Learn more about this project on the 🧢CapSpeech Homepage.
📃 Licensed under CC BY-NC 4.0.
🔧 Usage Tips
- Quick Start: Enter a style caption and a transcript to generate expressive speech just the way you want.
- Model Tabs: Switch between model checkpoints using the Model tabs. Each checkpoint is tailored to a specific downstream use case.
- Speed/Duration Settings: Adjust the speed and duration if the predicted speech pace sounds unnatural.
- Flow Matching Settings: Modify the CFG scale and sampling steps to refine prompt alignment and improve generation quality.
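To make the last tip more concrete, here is a minimal sketch of how a CFG scale and a step count typically enter a flow-matching sampler. It is not CapSpeech's actual code: the function names (`guided_velocity`, `euler_sample`) and the toy velocity fields are hypothetical, and a real model would replace them with a neural network conditioned on the caption and transcript. The sketch assumes the common classifier-free guidance form and a plain Euler integrator.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def guided_velocity(v_cond: Vector, v_uncond: Vector, cfg_scale: float) -> Vector:
    """Classifier-free guidance: push the unconditional velocity toward the conditional one.

    cfg_scale = 1.0 returns the conditional velocity unchanged; larger values
    follow the prompt more strongly.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0: Vector, cond_fn: VelocityFn, uncond_fn: VelocityFn,
                 cfg_scale: float, steps: int) -> Vector:
    """Integrate the guided velocity field from t = 0 to t = 1 with Euler steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = guided_velocity(cond_fn(x, t), uncond_fn(x, t), cfg_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-ins for a trained model (assumptions for illustration only).
TARGET = [1.0, -2.0]


def toy_cond(x: Vector, t: float) -> Vector:
    # "Prompted" field: pulls the sample toward TARGET.
    return [ti - xi for ti, xi in zip(TARGET, x)]


def toy_uncond(x: Vector, t: float) -> Vector:
    # "Unprompted" field: pulls the sample toward the origin.
    return [-xi for xi in x]


coarse = euler_sample([0.0, 0.0], toy_cond, toy_uncond, cfg_scale=1.0, steps=20)
fine = euler_sample([0.0, 0.0], toy_cond, toy_uncond, cfg_scale=1.0, steps=100)
strong = euler_sample([0.0, 0.0], toy_cond, toy_uncond, cfg_scale=2.0, steps=20)
```

In this toy setup, more steps reduce the integration error at the cost of more model evaluations, and a higher CFG scale moves the result further in the direction the prompt indicates. Very high scales can overshoot, which is one reason the demo exposes the setting for tuning.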
This checkpoint offers balanced performance and supports general style control.
Enable to use a fixed audio duration.
Enable to use a fixed random seed for reproducibility.
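As a rough illustration of why a fixed seed gives reproducible output: generative samplers of this kind usually start from random noise, so fixing the seed fixes the starting point. The helper below is a hypothetical sketch using only the Python standard library; it does not reflect how the demo itself draws noise.

```python
import random
from typing import List, Optional


def sample_initial_noise(length: int, seed: Optional[int] = None) -> List[float]:
    """Draw standard-normal noise; the same seed always yields the same values."""
    rng = random.Random(seed)  # seed=None falls back to a nondeterministic source
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


first = sample_initial_noise(4, seed=42)
second = sample_initial_noise(4, seed=42)
other = sample_initial_noise(4, seed=7)
```

Here `first` and `second` are identical because they share a seed, while `other` differs. Leaving the seed unset gives a different result on each run.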
This checkpoint is specifically fine-tuned for enhanced accent control.
This checkpoint is specifically fine-tuned to enhance emotional expressiveness.