Unitxt is a library for customizable textual data preparation and evaluation, tailored to generative language models. It integrates with common libraries such as HuggingFace and LM-eval-harness, and it deconstructs processing flows into modular components that are easy to customize and share. Its catalog centralizes these components, fostering collaboration and exploration in modern textual data workflows. Unitxt is more than a tool: it is a community-driven platform that lets users build, share, and advance their pipelines collaboratively.

Publication date: 26 Jan 2024
Project Page: https://github.com/IBM/unitxt
Paper: https://arxiv.org/pdf/2401.14019