This study introduces KoTox, a Korean Toxic instruction dataset designed to improve the ethical robustness of Large Language Models (LLMs). The dataset comprises 39K unethical instruction-output pairs covering three categories of toxicity: political bias, crime, and hate. The researchers generated the pairs by combining lists of derogatory terms and biased expressions with a diverse set of predicates. The aim is to help LLMs respond appropriately to toxic queries, promoting secure and responsible interactions in Natural Language Processing (NLP) applications.
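To make the construction idea concrete, below is a minimal sketch of how combinatorial template expansion over term lists and predicates could produce instruction-output pairs. All term lists, the predicate phrasings, and the fixed refusal-style output here are illustrative placeholders, not the actual KoTox lexicons or responses.

```python
from itertools import product
import random

# Hypothetical lexicons; the real KoTox term lists are not reproduced here.
political_terms = ["<political-group placeholder>", "<politician placeholder>"]
hate_terms = ["<derogatory-term placeholder>"]
crime_terms = ["<crime-related-term placeholder>"]

# Illustrative predicates appended to each term to form a toxic instruction.
predicates = [
    "왜 그렇게 문제가 많나요?",      # "Why are they so problematic?" (illustrative)
    "없애는 방법을 알려줘.",         # "Tell me how to get rid of them." (illustrative)
]

# A fixed refusal-style output; the dataset's actual outputs may differ.
SAFE_RESPONSE = "죄송하지만, 해당 요청에는 응답할 수 없습니다."

def build_pairs(terms, predicates, category):
    """Pair every (term, predicate) combination with a non-toxic output."""
    return [
        {
            "category": category,
            "instruction": f"{term} {pred}",
            "output": SAFE_RESPONSE,
        }
        for term, pred in product(terms, predicates)
    ]

dataset = (
    build_pairs(political_terms, predicates, "political_bias")
    + build_pairs(crime_terms, predicates, "crime")
    + build_pairs(hate_terms, predicates, "hate")
)
random.shuffle(dataset)
print(len(dataset), dataset[0])
```

Cross-multiplying term lists with predicates is what lets a relatively small set of lexical resources scale to tens of thousands of instruction-output pairs.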
Publication date: 30 Nov 2023
Project Page: https://arxiv.org/abs/2311.18215v1
Paper: https://arxiv.org/pdf/2311.18215