This research introduces CLAPP (Contrastive Language-Audio Pre-training in Passive Underwater Vessel Classification), a novel model for classifying vessels from passive underwater audio. CLAPP is a neural network trained on a wide range of paired vessel audio and vessel state text. It can learn directly from raw vessel audio and, when available, from curated labels, which improves recognition of vessel attributes. Its zero-shot capability lets it predict the most relevant vessel state description for a given audio recording without being directly optimized for that task. The proposed method achieves new state-of-the-art results on both the DeepShip and ShipsEar public datasets.
Publication date: 5 Jan 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2401.02099
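
The zero-shot prediction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an audio encoder and a text encoder have already mapped a recording and a set of candidate vessel state descriptions into a shared embedding space, and it ranks the descriptions by cosine similarity, as is common in contrastive language-audio models. The function names, the toy three-dimensional embeddings, the example descriptions, and the temperature value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(audio_embedding, text_embeddings, temperature=0.07):
    """Pick the description whose embedding best matches the audio embedding.

    `temperature` is an assumed scaling factor, not a value from the paper.
    Returns the best description and a softmax distribution over all of them.
    """
    descriptions = list(text_embeddings)
    logits = [
        cosine_similarity(audio_embedding, text_embeddings[d]) / temperature
        for d in descriptions
    ]
    # Numerically stable softmax over the scaled similarities.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    probs = {d: e / total for d, e in zip(descriptions, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical embeddings standing in for real encoder outputs.
text_embeddings = {
    "the sound of a cargo ship": [0.9, 0.1, 0.0],
    "the sound of a tugboat": [0.1, 0.9, 0.1],
    "the sound of a passenger ship": [0.0, 0.2, 0.9],
}
audio_embedding = [0.8, 0.2, 0.1]

best, probs = zero_shot_classify(audio_embedding, text_embeddings)
print(best)
```

Because the candidate descriptions are supplied only at inference time, new vessel states can be scored by adding text entries, without retraining a classification head.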