This article addresses the growing problem of data leakage within organizations. To mitigate it, the authors propose a statistical Data Leakage Prevention (DLP) system that classifies documents before access is granted. Techniques such as TF-IDF, vectorization, and gradient boosting are used for document classification. The work also introduces the Improvised Gradient Boosting Classification Algorithm (IGBCA), which the authors describe as an efficient and accurate approach. The reported results show high accuracy in document classification, which the authors present as effective in preventing potential data loss.
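
To make the pipeline described above more concrete, here is a minimal sketch in plain Python of TF-IDF vectorization followed by a small gradient boosting classifier built from decision stumps. It is not the paper's IGBCA, whose details are not given in this summary. The toy documents, the labels, the smoothed IDF formula, the 0.5 threshold, and every function name (`fit_idf`, `tfidf_vector`, `fit_boosting`, `is_sensitive`) are illustrative assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (an assumed variant).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(doc, idf, vocab):
    # Term count times IDF, L2-normalised; terms outside the vocabulary are ignored.
    counts = Counter(tokenize(doc))
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    # Least-squares regression stump: one feature, one threshold, one mean per side.
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return None if best is None else best[1:]

def fit_boosting(X, y, rounds=20, lr=0.5):
    # Gradient boosting for log loss: each stump fits the residuals y - p.
    # Requires both classes to be present in y.
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))
    scores = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        scores = [s + lr * (lm if row[j] <= thr else rm) for row, s in zip(X, scores)]
    return f0, lr, stumps

def predict_proba(model, x):
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if x[j] <= thr else rm) for j, thr, lm, rm in stumps)
    return sigmoid(score)

# Toy corpus (hypothetical): 1 = sensitive, 0 = public.
docs = [
    "quarterly salary payroll report confidential",
    "confidential merger contract draft",
    "employee salary records confidential",
    "cafeteria lunch menu for friday",
    "office party invitation friday",
    "parking lot maintenance notice",
]
labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf_vector(doc, idf, vocab) for doc in docs]
model = fit_boosting(X, labels)

def is_sensitive(doc, threshold=0.5):
    # Classify before access is granted; the threshold is an assumed policy knob.
    return predict_proba(model, tfidf_vector(doc, idf, vocab)) >= threshold
```

In a DLP setting, a call such as `is_sensitive("confidential salary contract")` would run before a document is released, with documents flagged as sensitive routed to whatever access policy the organization applies. A production system would use a tested library implementation and a far larger labeled corpus.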

Publication date: 22 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.13711