Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

The paper describes the creation of the Aya Dataset, a human-curated, multilingual instruction-following dataset spanning 65 languages. The researchers worked with fluent speakers of languages around the world to collect natural instances of instructions and completions. The dataset matters because existing instruction fine-tuning (IFT) datasets are almost all in English, which leaves most of the world's languages underserved. Alongside the Aya Dataset, the authors developed and open-sourced the Aya Collection, the Aya Evaluation Suite, and the Aya Annotation Platform. They describe the Aya Collection as the most extensive multilingual IFT collection to date. The initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries.
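To make "instances of instructions and completions" concrete, here is a minimal sketch of how such instruction-completion pairs might be represented and counted by language. It is illustrative only. The `IFTExample` class, its field names, the `to_prompt` template, and the toy records are all hypothetical assumptions and do not reproduce the actual Aya Dataset schema or contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IFTExample:
    """One instruction-completion pair (hypothetical schema, not Aya's)."""
    instruction: str
    completion: str
    language: str


def language_coverage(examples):
    """Count how many examples exist for each language."""
    return Counter(ex.language for ex in examples)


def to_prompt(ex):
    """Join a pair into one training string (the template is an assumption)."""
    return f"Instruction: {ex.instruction}\nResponse: {ex.completion}"


# Toy records invented for illustration; not taken from the dataset.
examples = [
    IFTExample("What is the capital of France?", "Paris.", "English"),
    IFTExample("Mji mkuu wa Kenya ni upi?", "Nairobi.", "Swahili"),
    IFTExample("¿Cuál es la capital de Perú?", "Lima.", "Spanish"),
    IFTExample("Name a primary color.", "Red.", "English"),
]

if __name__ == "__main__":
    print(sorted(language_coverage(examples).items()))
    print(to_prompt(examples[1]))
```

Counting examples per language is a simple way to check how balanced a multilingual IFT set is, since a collection that is nominally multilingual can still be dominated by a few high-resource languages.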

Publication date: 12 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.06619
