The article presents LIP-Loc, a new method for cross-modal localization in autonomous driving. Given a batch of image-LiDAR pairs, the model is trained to predict which pairings are correct: an image encoder and a LiDAR encoder are jointly trained to learn a shared multi-modal embedding space, maximizing the cosine similarity between the embeddings of true (positive) pairs while minimizing it for incorrect (negative) pairs. On standard autonomous driving benchmarks, the method outperforms the state of the art in recall accuracy by 22.4%. The authors also demonstrate the model's zero-shot capability, beating the existing state of the art by 8% without training on the target dataset.
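To make the training objective concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss over a batch of image-LiDAR embedding pairs, as the summary describes. It assumes PyTorch; the class name, the temperature value, and the dummy encoders in the usage snippet are illustrative placeholders, not the paper's exact implementation.

```python
# Sketch of a CLIP-style symmetric contrastive objective for image-LiDAR
# pairs (assumption: PyTorch; encoder outputs are stand-ins, not the
# paper's actual models).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LipLocStyleLoss(nn.Module):
    """Symmetric cross-entropy over cosine similarities of a batch of
    (image, LiDAR) embedding pairs; the diagonal entries are the positives."""

    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, image_emb: torch.Tensor, lidar_emb: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product equals cosine similarity.
        image_emb = F.normalize(image_emb, dim=-1)
        lidar_emb = F.normalize(lidar_emb, dim=-1)

        # (B, B) similarity matrix: entry (i, j) scores image i against LiDAR j.
        logits = image_emb @ lidar_emb.t() / self.temperature

        # The correct pairing for sample i is sample i (the diagonal).
        targets = torch.arange(logits.size(0), device=logits.device)

        # Cross-entropy in both directions: image-to-LiDAR and LiDAR-to-image.
        loss_i2l = F.cross_entropy(logits, targets)
        loss_l2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2l + loss_l2i) / 2

# Usage with dummy embeddings (batch of 8, 256-dim):
loss_fn = LipLocStyleLoss()
img = torch.randn(8, 256)   # hypothetical image-encoder output
pts = torch.randn(8, 256)   # hypothetical LiDAR-encoder output
loss = loss_fn(img, pts)
```

Maximizing the diagonal of the similarity matrix while the softmax suppresses the off-diagonal entries is what pulls positive pairs together and pushes negative pairs apart in the shared embedding space.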

Publication date: 29 Dec 2023
Project Page: https://shubodhs.ai/liploc
Paper: https://arxiv.org/pdf/2312.16648