This paper examines why an explicit representation of syntax matters for sentence-to-layout prediction, particularly for sentences that describe unexpected situations. The study shows that models that explicitly integrate syntax predict 2D spatial layouts from text better when trained with a novel structural loss function. The loss preserves the syntactic structure of the sentence in the learned representation by aligning the syntax tree embeddings with the visual embeddings the model outputs. The result is relevant to text-to-image synthesis, where a predicted layout allows more controlled and localized image in-painting.
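
To make the idea of aligning syntax tree embeddings with visual embeddings concrete, here is a minimal sketch of one possible alignment objective. It is not the loss from the paper: the function name `alignment_loss`, the `temperature` parameter, the contrastive (InfoNCE-style) formulation, and the assumption that tree node `i` is paired with visual embedding `i` are all illustrative choices made for this example.

```python
import numpy as np


def alignment_loss(tree_emb, visual_emb, temperature=0.1):
    """Hypothetical contrastive alignment loss (not the paper's formulation).

    tree_emb:   (n, d) array, one embedding per syntax tree node.
    visual_emb: (n, d) array, one visual embedding per node; row i is
                assumed to be the match for tree node i.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    t = tree_emb / np.linalg.norm(tree_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)

    # Pairwise similarities between every tree node and every visual embedding.
    logits = (t @ v.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Matching pairs sit on the diagonal; minimise their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
tree = rng.normal(size=(4, 8))

aligned = alignment_loss(tree, tree)         # each node paired with itself
shuffled = alignment_loss(tree, tree[::-1])  # pairs deliberately mismatched
print(aligned < shuffled)
```

The loss is lower when each tree node is most similar to its own visual embedding, so minimising it pushes the two embedding spaces to share the same structure. The repository linked below is the place to look for the loss actually used in the study.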

Publication date: 26 Jan 2024
Project Page: https://github.com/rubencart/USCOCO
Paper: https://arxiv.org/pdf/2401.14212