The study explores the use of Large Language Models (LLMs) in the medical domain, focusing on their effectiveness and robustness in semantic search tasks. It constructs a textual dataset from the ICD-10-CM code descriptions used in US hospitals and benchmarks generalist against specialized embedding models. The results show that generalist models perform better, which suggests that specialized models may be more sensitive to slight variations in the input. This could be due to inadequate or insufficiently diverse training data. The study underscores that reliable general language understanding is needed to handle medical documents accurately.
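
To make the evaluation idea concrete, the following is a minimal sketch of a robustness check for semantic search, not the paper's actual pipeline. Everything in it is an assumption made for illustration: the four-entry corpus, the slightly varied queries, and in particular the `embed` function, which is a character-trigram stand-in rather than a neural embedding model. In a real benchmark, `embed` would be swapped for the model under test.

```python
import math
from collections import Counter

# Hypothetical toy corpus of ICD-10-CM-style descriptions (illustrative only).
CORPUS = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
    "N39.0": "Urinary tract infection, site not specified",
}

# Hypothetical slightly varied queries, each paired with the code it should retrieve.
VARIED_QUERIES = [
    ("type 2 diabetes mellitus w/o complications", "E11.9"),
    ("essential primary hypertenson", "I10"),
    ("asthma unspecified, uncomplicated", "J45.909"),
    ("urinary tract infection site unspecified", "N39.0"),
]


def embed(text, n=3):
    """Stand-in embedding: character n-gram counts (NOT a neural model)."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus=CORPUS, embed_fn=embed):
    """Return the code whose description is most similar to the query."""
    q = embed_fn(query)
    return max(corpus, key=lambda code: cosine(q, embed_fn(corpus[code])))


def top1_accuracy(queries, corpus=CORPUS, embed_fn=embed):
    """Fraction of (query, expected_code) pairs retrieved at rank 1."""
    hits = sum(search(q, corpus, embed_fn) == code for q, code in queries)
    return hits / len(queries)


if __name__ == "__main__":
    exact = [(desc, code) for code, desc in CORPUS.items()]
    print("exact queries :", top1_accuracy(exact))
    print("varied queries:", top1_accuracy(VARIED_QUERIES))
```

Comparing the two accuracy figures for several embedding functions gives a rough sense of how sensitive each one is to small changes in wording, which is the kind of gap the summary above attributes to specialized models.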

Publication date: 3 Jan 2024
Project Page: https://arxiv.org/abs/2401.01943v1
Paper: https://arxiv.org/pdf/2401.01943