GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
This article presents MM-Navigator, a GPT-4V-based agent that interacts with a smartphone screen and determines its next actions from the instructions it is given. The system demonstrated high accuracy…