The article presents GenLens, a tool for evaluating Generative AI (GenAI) model outputs. It addresses a gap in the model development process, where evaluating outputs often relies on developers' subjective visual assessments. GenLens provides a quantifiable workflow for surveying and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to support collaboration. A user study with model developers shows that GenLens improves their workflow, underscoring the importance of robust early-stage evaluation tools in GenAI development.
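To make the aggregation idea concrete, below is a minimal Python sketch of how multi-user failure annotations with custom issue tags might be collected and summarized into a quantified overview. The `Annotation` record, the tag names, and the `aggregate_by_tag` helper are hypothetical illustrations under stated assumptions, not GenLens's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical annotation record: the summary does not publish
# GenLens's real data model, so all names here are illustrative.
@dataclass
class Annotation:
    output_id: str          # which generated sample was reviewed
    annotator: str          # who reviewed it
    tags: list[str] = field(default_factory=list)  # custom issue tags

def aggregate_by_tag(annotations: list[Annotation]) -> dict[str, dict]:
    """Aggregate multi-user annotations into per-tag failure counts.

    For each issue tag, report how many outputs were flagged and by
    how many distinct annotators, turning individual visual checks
    into a quantified, shareable overview of failure cases.
    """
    flagged_outputs: dict[str, set[str]] = defaultdict(set)
    annotators: dict[str, set[str]] = defaultdict(set)
    for ann in annotations:
        for tag in ann.tags:
            flagged_outputs[tag].add(ann.output_id)
            annotators[tag].add(ann.annotator)
    return {
        tag: {
            "flagged_outputs": len(flagged_outputs[tag]),
            "distinct_annotators": len(annotators[tag]),
        }
        for tag in flagged_outputs
    }

# Example: two developers reviewing the same batch of outputs.
reviews = [
    Annotation("img_001", "alice", ["artifact", "off-prompt"]),
    Annotation("img_001", "bob", ["artifact"]),
    Annotation("img_002", "alice", ["off-prompt"]),
]
print(aggregate_by_tag(reviews))
# {'artifact': {'flagged_outputs': 1, 'distinct_annotators': 2},
#  'off-prompt': {'flagged_outputs': 2, 'distinct_annotators': 1}}
```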


Publication date: 6 Feb 2024
arXiv Page: https://arxiv.org/abs/2402.03700v1
Paper: https://arxiv.org/pdf/2402.03700