L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

This article presents L3Cube-IndicNews, a multilingual text classification corpus of news headlines and articles covering 10 prominent Indic languages, built to provide a high-quality dataset for Indian regional languages. To handle different document lengths, the corpus is organized into three datasets: Short Headlines, Long Document, and Long Paragraph. The work expands the text classification data available for Indian regional languages and enables the development of topic classification models for them. The datasets and models are publicly available for further research.

Publication date: 5 Jan 2024
Project Page: https://github.com/l3cube-pune/indic-nlp
Paper: https://arxiv.org/pdf/2401.02254

