December 23, 2023

Speech Translation with Large Language Models: An Industrial Practice

This paper introduces LLM-ST, a speech translation model built on a pre-trained large language model (LLM). The model couples a speech encoder with the LLM and is trained with multi-task instruction tuning. LLM-ST produces accurate timestamped transcriptions and translations, even from long audio inputs, and Chain-of-Thought (CoT) prompting also benefits the model. According to the authors, experiments on English and Chinese datasets show that LLM-ST establishes a new benchmark for speech translation.
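To make the idea of multi-task instructions with CoT prompting more concrete, here is a minimal sketch of how one instruction template per task might be built. The task names (`asr`, `st`, `cot_st`), the function `build_instruction`, and the prompt wording are all assumptions made for illustration; they are not taken from the paper, and the audio features from the speech encoder are left out entirely.

```python
from typing import Optional


def build_instruction(
    task: str,
    src_lang: str,
    tgt_lang: Optional[str] = None,
    timestamps: bool = False,
) -> str:
    """Return a text instruction for one task (hypothetical wording)."""
    # Optional suffix asking the model to emit timestamps.
    ts = " with timestamps" if timestamps else ""

    if task == "asr":
        # Transcription only: no target language needed.
        return f"Transcribe the {src_lang} speech{ts}."

    if tgt_lang is None:
        raise ValueError("translation tasks need a target language")

    if task == "st":
        # Direct speech translation.
        return f"Translate the {src_lang} speech into {tgt_lang}{ts}."

    if task == "cot_st":
        # Chain-of-Thought style: transcribe first, then translate
        # the transcription as a second step.
        return (
            f"First transcribe the {src_lang} speech{ts}, "
            f"then translate the transcription into {tgt_lang}."
        )

    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for task in ("asr", "st", "cot_st"):
        print(build_instruction(task, "English", "Chinese", timestamps=True))
```

In a setup like this, the same model would see all three instruction types during tuning, and the CoT variant asks for the intermediate transcription before the translation.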

Publication date: 21 Dec 2023
Project Page: https://speechtranslation.github.io/llm-st/
Paper: https://arxiv.org/pdf/2312.13585

Chain-of-Thought prompting, Large Language Models, Multi-task Instruction Tuning, Speech Encoder, Speech Translation
