The paper introduces WEBLINX, a large-scale benchmark for conversational web navigation, in which an agent controls a web browser and follows a user's instructions over a multi-turn dialogue. The benchmark covers a broad range of interaction patterns on over 150 real-world websites. Because Large Language Models (LLMs) cannot process entire web pages in real time, the authors designed a retrieval-inspired model that prunes HTML pages by ranking their elements by relevance. The study found that smaller finetuned decoders outperformed both the best zero-shot LLMs and larger finetuned multimodal models. However, all finetuned models struggled to generalize to unseen websites, which highlights the need for large multimodal models that can adapt to novel settings.
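
To make the pruning idea concrete, here is a minimal sketch of ranking candidate HTML elements against a user instruction and keeping only the top few. It is an illustration under stated assumptions, not the authors' model: the function names (`tokenize`, `cosine`, `prune_elements`), the sample elements, and the bag-of-words cosine score are all hypothetical stand-ins for whatever learned ranker the paper actually uses.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prune_elements(instruction: str, elements: list[str], top_k: int = 2) -> list[str]:
    """Rank elements by similarity to the instruction and keep the top_k.

    Assumption: a simple lexical score stands in for a learned ranker.
    """
    query = tokenize(instruction)
    ranked = sorted(elements, key=lambda el: cosine(query, tokenize(el)), reverse=True)
    return ranked[:top_k]


# Hypothetical page elements, invented for this example.
elements = [
    '<a href="/about">About us</a>',
    '<button id="search-btn">Search flights</button>',
    '<input name="destination" placeholder="Flying to Montreal?">',
    '<footer>Copyright 2024</footer>',
]

kept = prune_elements("Search for flights to Montreal", elements, top_k=2)
for element in kept:
    print(element)
```

Only the elements in `kept` would then be passed to the downstream model, which is what keeps the input small enough to process. A lexical score like this one misses paraphrases, so a real system would likely swap in a learned similarity model.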

Publication date: 8 Feb 2024
Project Page: https://mcgill-nlp.github.io/weblinx
Paper: https://arxiv.org/pdf/2402.05930