This article introduces KV Inversion, a novel method for text-conditioned action editing of real images. Unlike existing methods, KV Inversion produces results that follow the action described in the editing prompt while preserving the content of the original image. It addresses two main problems: the edited result matches the target action, and the edited object retains the texture and identity of the original image. The method requires neither training the Stable Diffusion model itself nor time-consuming training on a large-scale dataset. Potential applications include comic book production, video editing, and advertising material production.


Publication date: 28 Sep 2023
arXiv Page: https://arxiv.org/abs/2309.16608v1
Paper: https://arxiv.org/pdf/2309.16608