The article introduces Auto-UI, an autonomous user interface agent for task automation. It is a multimodal agent that interacts directly with the interface, with no need for environment parsing or application-specific APIs. The authors propose a chain-of-action technique, which conditions the agent's next action on its previous action history and a plan of future actions. Auto-UI was evaluated on a new device-control benchmark and showed promising results, achieving an action type prediction accuracy of 90% and an overall action success rate of 74%.
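
To make the chain-of-action idea concrete, here is a minimal Python sketch of how past actions and a future plan could be serialized as text around a model call. It is an illustration under stated assumptions, not the authors' implementation: the `Action` class, the function names, and the "Previous Actions / Action Plan / Action Decision" string layout are all hypothetical, and the screen image that a multimodal agent would also consume is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """Hypothetical action record; field names are illustrative only."""
    action_type: str      # e.g. "click", "type", "scroll"
    argument: str = ""    # e.g. a target description or the typed text

    def render(self) -> str:
        # Render as "type(argument)", or just "type" when there is no argument.
        if self.argument:
            return f"{self.action_type}({self.argument})"
        return self.action_type


def build_prompt(goal: str, history: List[Action], max_history: int = 8) -> str:
    """Serialize the goal and the most recent past actions into one text input."""
    # Guard against max_history <= 0, since history[-0:] would return everything.
    recent = history[-max_history:] if max_history > 0 else []
    past = ", ".join(a.render() for a in recent) or "none"
    return f"Previous Actions: {past}\nGoal: {goal}"


def format_target(plan: List[Action]) -> str:
    """Serialize the output side: the future plan, then the immediate next action."""
    if not plan:
        raise ValueError("plan must contain at least the next action")
    plan_text = ", ".join(a.action_type for a in plan)
    return f"Action Plan: [{plan_text}]\nAction Decision: {plan[0].render()}"
```

For example, `build_prompt("Check the weather", [Action("click", "search bar")])` yields a two-line string with the action history first and the goal second. Putting the plan before the decision in the target is one way to have a model lay out its upcoming steps before committing to the next one.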

Publication date: 20 Sep 2023
Project Page: https://github.com/cooelf/Auto-UI
Paper: https://arxiv.org/pdf/2309.11436