PromptSpeaker: Speaker Generation Based on Text Descriptions

The article introduces PromptSpeaker, a system that generates custom speaker voices from text descriptions. PromptSpeaker consists of three components: a prompt encoder, a Glow model, and a zero-shot VITS. The prompt encoder predicts a prior distribution from the text description and samples from it to obtain a semantic representation. The Glow model converts this semantic representation into a speaker representation, and the zero-shot VITS synthesizes the speaker’s voice from that representation. The authors verify that PromptSpeaker can generate new speakers not included in the training set and that the synthesized voice matches the speaker prompt reasonably well.
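The data flow described above can be illustrated with a small toy sketch. This is not the authors' implementation: every function name here (`toy_prompt_encoder`, `sample_semantic`, `coupling_forward`, `coupling_inverse`, `generate_speaker_representation`) is hypothetical, the "encoder" only derives pseudo-random Gaussian parameters from the text, and a single fixed affine coupling layer stands in for a trained Glow model. The zero-shot VITS synthesis step is omitted entirely.

```python
import math
import random


def toy_prompt_encoder(description, dim=4):
    """Hypothetical stand-in for the prompt encoder.

    Returns the mean and log-std of a diagonal Gaussian prior.
    A real encoder would be a trained network; here the values are
    just derived deterministically from the description string.
    """
    rng = random.Random(description)
    mean = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    log_std = [rng.uniform(-1.0, 0.0) for _ in range(dim)]
    return mean, log_std


def sample_semantic(mean, log_std, rng):
    """Draw one semantic representation from the predicted prior."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def coupling_forward(z):
    """One Glow-style affine coupling step (fixed, untrained parameters)."""
    half = len(z) // 2
    a, b = z[:half], z[half:]
    scale = [math.tanh(x) for x in a]   # assumed toy scale function
    shift = [0.5 * x for x in a]        # assumed toy shift function
    b_out = [y * math.exp(s) + t for y, s, t in zip(b, scale, shift)]
    return a + b_out


def coupling_inverse(v):
    """Exact inverse of coupling_forward, showing the map is invertible."""
    half = len(v) // 2
    a, b_out = v[:half], v[half:]
    scale = [math.tanh(x) for x in a]
    shift = [0.5 * x for x in a]
    b = [(y - t) * math.exp(-s) for y, s, t in zip(b_out, scale, shift)]
    return a + b


def generate_speaker_representation(description, seed=0):
    """Text description -> sampled semantic vector -> speaker vector."""
    mean, log_std = toy_prompt_encoder(description)
    semantic = sample_semantic(mean, log_std, random.Random(seed))
    return semantic, coupling_forward(semantic)
```

Because the semantic representation is sampled rather than computed deterministically, calling `generate_speaker_representation` with the same description and different seeds yields different speaker vectors. In this sketch, that is the mechanism by which one prompt can correspond to many distinct speakers.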

Publication date: 10 Oct 2023
Project Page: https://promptspeaker.github.io/demo/
Paper: https://arxiv.org/pdf/2310.05001
