Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

This research focuses on Speech-to-Text Translation (S2TT), comparing traditional cascade systems with direct translation systems. The authors argue that direct S2TT systems can better handle information carried by prosody rather than by the words themselves, and they support this claim by testing Korean-English translation systems on wh-phrases. The results show that direct translation systems outperform cascade models, with improvements in both overall accuracy and F1 scores. The research thus provides quantitative evidence that direct S2TT models can leverage prosody effectively.

Publication date: 2 Feb 2024
Project Page: https://github.com/GiulioZhou/contrastive_prosody
Paper: https://arxiv.org/pdf/2402.00632
