Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
The research focuses on improving Automated Audio Captioning (AAC), the task of generating natural-language descriptions of sounds. Recent systems use sequence-to-sequence (seq2seq) models such as Transformers. This study aims to improve these models…