This study presents a novel multimodal medical image zero-shot segmentation algorithm, the text-visual-prompt segment anything model (TV-SAM), which requires no manual annotations. TV-SAM integrates the large language model GPT-4, the vision-language model GLIP, and SAM to autonomously generate descriptive text prompts and visual bounding-box prompts from medical images, thereby enhancing SAM's zero-shot segmentation capability. Comprehensive evaluations on seven public datasets spanning eight imaging modalities demonstrate that TV-SAM effectively segments unseen targets across modalities without additional training. TV-SAM significantly outperforms SAM AUTO (p < 0.01) and GSAM (p < 0.05), closely matches SAM BBOX with gold-standard bounding-box prompts (p = 0.07), and surpasses state-of-the-art methods on specific datasets such as ISIC (0.853 versus 0.802) and WBC (0.968 versus 0.883). These results indicate that TV-SAM is an effective multimodal medical image zero-shot segmentation algorithm and highlight the significant contribution of GPT-4 to zero-shot segmentation. Integrating foundation models such as GPT-4, GLIP, and SAM can enhance the ability to address complex problems in specialized domains.
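The text-prompt-to-box-prompt-to-mask pipeline described above can be summarized in a minimal sketch. The function names below (gpt4_describe_target, glip_ground_text, sam_segment, tv_sam_zero_shot) are hypothetical placeholders used purely for illustration; the actual GPT-4, GLIP, and SAM interfaces are not specified in this abstract.

```python
# Minimal conceptual sketch of the TV-SAM prompting pipeline (assumed structure).
# All three model wrappers are hypothetical stubs, not real GPT-4/GLIP/SAM APIs.

from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def gpt4_describe_target(modality: str) -> str:
    """Hypothetical stand-in for GPT-4: returns a descriptive text prompt
    for the segmentation target, given only the imaging modality."""
    return f"the lesion region in this {modality} image"


def glip_ground_text(image: List[List[int]], text_prompt: str) -> List[BBox]:
    """Hypothetical stand-in for GLIP: grounds the text prompt into
    bounding-box prompts over the image."""
    return [(32, 32, 96, 96)]  # placeholder detection


def sam_segment(image: List[List[int]], boxes: List[BBox]) -> List[List[List[int]]]:
    """Hypothetical stand-in for SAM: produces one binary mask per box prompt."""
    return [[[0] * len(image[0]) for _ in image] for _ in boxes]


def tv_sam_zero_shot(image: List[List[int]], modality: str):
    """Text prompt -> visual (box) prompt -> mask, with no manual annotation."""
    text_prompt = gpt4_describe_target(modality)   # step 1: descriptive text prompt
    boxes = glip_ground_text(image, text_prompt)   # step 2: bounding-box prompts
    masks = sam_segment(image, boxes)              # step 3: zero-shot segmentation
    return text_prompt, boxes, masks


if __name__ == "__main__":
    dummy_image = [[0] * 128 for _ in range(128)]  # stand-in for pixel data
    prompt, boxes, masks = tv_sam_zero_shot(dummy_image, "dermoscopy")
    print(prompt, boxes, len(masks))
```

The point of the sketch is only the division of labor: the language model supplies the text prompt, the grounding model converts it to box prompts, and the segmentation model consumes the boxes, so no human-drawn annotation enters the loop.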