The authors introduce SLM, a multitask, multilingual, and dual-modal Speech and Language Model. SLM keeps pretrained foundation speech and language models frozen, preserving their capabilities, and trains only a simple adapter comprising about 1% of the foundation models' parameters. The resulting model performs well on tasks such as automatic speech recognition and speech translation, and can follow zero-shot instructions across a variety of tasks. SLM demonstrates that the gap between pretrained speech and language models can be bridged with a lightweight adaptation mechanism.
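To make the adapter idea concrete, below is a minimal PyTorch sketch of the general pattern: a small trainable module that downsamples and projects frozen speech-encoder features into the frozen language model's embedding space. The module names, dimensions, and stacking-based downsampling here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SpeechToLMAdapter(nn.Module):
    """Illustrative adapter bridging a frozen speech encoder and a frozen LM.

    Assumption for this sketch: downsample the speech feature sequence by
    stacking consecutive frames, then project into the LM embedding space.
    """

    def __init__(self, speech_dim: int = 1024, lm_dim: int = 2048, stride: int = 4):
        super().__init__()
        self.stride = stride  # temporal downsampling factor for the speech sequence
        self.proj = nn.Sequential(
            nn.Linear(speech_dim * stride, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, time, speech_dim) from the frozen speech encoder
        b, t, d = speech_feats.shape
        t = t - t % self.stride  # trim so frames group evenly into stacks
        stacked = speech_feats[:, :t].reshape(b, t // self.stride, d * self.stride)
        # Output (batch, time // stride, lm_dim) is prepended to the LM's
        # text-token embeddings; only this module's parameters are trained.
        return self.proj(stacked)

adapter = SpeechToLMAdapter()
feats = torch.randn(2, 103, 1024)  # stand-in for frozen speech-encoder outputs
lm_inputs = adapter(feats)         # -> torch.Size([2, 25, 2048])
```

Because only the adapter receives gradients while both foundation models stay frozen, the trainable share remains on the order of 1% of the combined parameter count, which is what allows the pretrained capabilities to survive adaptation.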

Publication date: 4 Oct 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2310.00230