Abstract
Large language models (LLMs) excel at a wide range of natural language processing tasks and are increasingly applied in specialized fields such as medicine. However, their deployment in the medical domain is hindered by limited domain-specific data and by their tendency to generate inaccurate information, known as “hallucinations.” Although domain-specific fine-tuning has improved open-source LLMs, they still underperform proprietary models such as ChatGPT and PaLM. To close this gap, retrieval-augmented generation (RAG) techniques have been explored to enhance LLMs by integrating external knowledge bases. Nevertheless, the success of RAG depends on the quality of the retrieved documents, and its application in the medical field remains at an early stage. In this paper, we introduce the “Bailicai” framework as an exploratory approach to integrating RAG with LLMs in the medical field. The framework uses fine-tuning to improve the RAG process: “falsely relevant” and “completely irrelevant” interference documents are intentionally included in the training data, which enables Bailicai to assess the quality of retrieved documents and incorporate them selectively. The framework is organized into four modules: (1) medical knowledge injection, (2) self-knowledge boundary identification, (3) directed acyclic graph task decomposition, and (4) retrieval-augmented generation. Through the synergy of these modules, Bailicai achieves superior performance on multiple medical benchmarks, outperforming existing medical-domain LLMs, RAG-based methods, and proprietary models such as GPT-3.5. Furthermore, Bailicai effectively mitigates the hallucination problem common in LLMs applied to medical tasks and makes RAG more robust to irrelevant or misleading documents, enabling more accurate information retrieval and integration.