Open Access Issue
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Big Data Mining and Analytics 2024, 7(4): 1116-1128
Published: 04 December 2024

Ensuring that medical Large Language Models (LLMs) are effective and beneficial to humans before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce “MedBench”, a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the largest evaluation dataset to date (300,901 questions), covering 43 clinical specialties, and performs multi-faceted evaluation of medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation between questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results that largely align with medical professionals’ perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.

Open Access Original Article Issue
Exploring the feasibility of integrating ultra‐high field magnetic resonance imaging neuroimaging with multimodal artificial intelligence for clinical diagnostics
iRADIOLOGY 2024, 2(5): 498-509
Published: 22 October 2024
Background

The integration of 7 Tesla (7T) magnetic resonance imaging (MRI) with advanced multimodal artificial intelligence (AI) models represents a promising frontier in neuroimaging. The superior spatial resolution of 7T MRI provides detailed visualizations of brain structure, which are crucial for understanding complex central nervous system diseases and tumors. Concurrently, the application of multimodal AI to medical images enables interactive imaging‐based diagnostic conversations.

Methods

In this paper, we systematically investigate the capacity and feasibility of applying the existing advanced multimodal AI model ChatGPT‐4V to 7T MRI in the context of brain tumors. First, we test whether ChatGPT‐4V has knowledge of 7T MRI and whether it can differentiate 7T MRI from 3T MRI. In addition, we explore whether ChatGPT‐4V can recognize different 7T MRI modalities and whether it can offer a correct tumor diagnosis based on single‐ or multiple‐modality 7T MRI.

Results

ChatGPT‐4V achieved an accuracy of 84.4% in 3T‐vs‐7T differentiation and an accuracy of 78.9% in 7T modality recognition. Meanwhile, in a human evaluation with three clinical experts, ChatGPT‐4V obtained average scores of 9.27/20 in single modality‐based diagnosis and 21.25/25 in multiple modality‐based diagnosis. Our study indicates that single‐modality diagnosis and the interpretability of diagnostic decisions in clinical practice should be enhanced when ChatGPT‐4V is applied to 7T data.

Conclusions

In general, our analysis suggests that such integration has promise as a tool to improve the workflow of diagnostics in neurology, with a potentially transformative impact in the fields of medical image analysis and patient management.
