CLIPSwarm is an algorithm that generates robot swarm formations from natural-language descriptions. It uses a variation of the Monte Carlo particle filter to iteratively generate new candidate formations and evaluate each one by its CLIP similarity to the input text. This similarity is computed with CLIP, a foundation model trained to encode images and text as vectors in a shared latent space. The initial proof of concept shows the potential of this approach for multi-robot systems and suggests a novel application of foundation models such as CLIP in that field. This first version builds formations using a convex-hull approach; future work aims to add more robust and generic representation and optimization steps to the process of obtaining a suitable swarm formation.
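The generate, evaluate, and resample loop described above can be sketched as a generic sampling-importance-resampling (SIR) particle search over whole formations. This is a toy under loud assumptions, not the paper's implementation: CLIP is replaced by a hypothetical stand-in in which the "image" is the formation's convex hull rasterized onto a coarse grid (`render_hull`) and the "text embedding" is a raster of a unit disk (`TARGET`), so only the cosine-similarity comparison mirrors how CLIP scores an image against a text. All names, the grid size, the noise scale `sigma`, and the resampling `temperature` are hypothetical.

```python
import math
import random

Point = tuple[float, float]
GRID = 12      # hypothetical raster resolution (GRID x GRID cells)
EXTENT = 1.5   # raster covers [-EXTENT, EXTENT]^2


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def cell_centers() -> list[Point]:
    step = 2 * EXTENT / GRID
    return [(-EXTENT + (i + 0.5) * step, -EXTENT + (j + 0.5) * step)
            for i in range(GRID) for j in range(GRID)]


def render_hull(formation: list[Point]) -> list[float]:
    """Toy 'image' (assumption): 1.0 for each cell whose center lies inside the hull."""
    hull = convex_hull(formation)
    if len(hull) < 3:
        return [0.0] * (GRID * GRID)
    n = len(hull)
    return [1.0 if all(cross(hull[k], hull[(k + 1) % n], c) >= 0 for k in range(n)) else 0.0
            for c in cell_centers()]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the measure CLIP-style models use to compare embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Hypothetical stand-in for the CLIP text embedding of "a circle": a rasterized unit disk.
TARGET = [1.0 if x * x + y * y <= 1.0 else 0.0 for x, y in cell_centers()]


def score(formation: list[Point]) -> float:
    return cosine_similarity(render_hull(formation), TARGET)


def particle_search(n_robots: int, n_particles: int = 40, iters: int = 30,
                    sigma: float = 0.1, temperature: float = 0.05, seed: int = 0):
    """SIR-style search where each particle is a whole formation of robot positions."""
    rng = random.Random(seed)
    particles = [[(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_robots)]
                 for _ in range(n_particles)]
    best, best_score = particles[0], -math.inf
    for _ in range(iters):
        scores = [score(f) for f in particles]          # evaluate each formation
        for f, s in zip(particles, scores):             # keep the best formation seen so far
            if s > best_score:
                best, best_score = f, s
        top = max(scores)
        weights = [math.exp((s - top) / temperature) for s in scores]
        parents = rng.choices(particles, weights=weights, k=n_particles)  # resample by weight
        particles = [[(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in f]
                     for f in parents]                   # perturb to generate new candidates
    return best, best_score
```

For example, `particle_search(8)` returns an eight-robot formation and its toy similarity score in `[0, 1]`. Replacing `score` with a real CLIP pipeline would mean rendering each formation to an image, encoding it and the prompt with CLIP, and taking the cosine similarity of the two embeddings; the resampling and perturbation steps would stay the same. Because the best-so-far formation is tracked, the returned score never falls below the best score of the initial population.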


Publication date: 21 Nov 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2311.11047