Large Language Model for Medical Images: A Survey of Taxonomy, Systematic Review, and Future Trends

School of Computer Science and Engineering, Central South University, Changsha 410083, China
Key Laboratory of Computing Power Network and Information Security Affiliated with Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China


Abstract

The advent of Large Language Models (LLMs) has sparked considerable interest in the medical image domain, as they generalize across multiple tasks and deliver outstanding performance. Although LLMs achieve promising results, the field still lacks a comprehensive summary of LLMs for medical images, making it challenging for researchers to track progress in this domain. To fill this gap, we present the first comprehensive survey of LLMs for medical images. To summarize the current progress more systematically, we further introduce a novel x-stage tuning paradigm, comprising zero-stage tuning, one-stage tuning, and multi-stage tuning, which offers a unified perspective on LLMs for medical images. Finally, we discuss challenges and future directions in this domain, aiming to spur further breakthroughs. We hope this work paves the way for the broad application of LLMs to medical images and serves as a valuable resource for the field.
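
The x-stage tuning paradigm is only named above; as a rough illustration, the following minimal Python sketch assumes that the stage count refers to the number of parameter-update phases used to adapt a general-purpose (vision-)language model to medical images. The class and the example recipes and stage labels (e.g., TuningRecipe, "image-text alignment") are hypothetical and are not taken from the paper.

from dataclasses import dataclass, field
from typing import List


@dataclass
class TuningRecipe:
    """One possible reading of how a general (V)LLM is adapted to medical images."""
    name: str
    stages: List[str] = field(default_factory=list)  # parameter-update phases

    @property
    def paradigm(self) -> str:
        n = len(self.stages)
        if n == 0:
            return "zero-stage tuning"   # prompting only, model weights stay frozen
        if n == 1:
            return "one-stage tuning"    # a single medical fine-tuning pass
        return "multi-stage tuning"      # e.g., alignment pre-training then instruction tuning


# Hypothetical examples for illustration only.
prompt_only = TuningRecipe("prompting a general VLLM", stages=[])
single_pass = TuningRecipe("single medical fine-tune", stages=["medical VQA fine-tuning"])
two_pass = TuningRecipe("align-then-instruct", stages=["image-text alignment", "medical instruction tuning"])

for recipe in (prompt_only, single_pass, two_pass):
    print(f"{recipe.name}: {recipe.paradigm}")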


Cite this article:
Wang P, Lu W, Lu C, et al. Large Language Model for Medical Images: A Survey of Taxonomy, Systematic Review, and Future Trends. Big Data Mining and Analytics, 2025, 8(2): 496-517. https://doi.org/10.26599/BDMA.2024.9020090