In medical X-ray images, multiple abnormalities frequently appear together. However, existing report generation methods cannot effectively extract all abnormal features, which leads to incomplete disease diagnoses in the generated reports. In real clinical scenarios, co-occurrence relations exist among multiple diseases; if these relations are mined and integrated into the feature extraction process, the problem of missed abnormal features can be mitigated. Inspired by this observation, we propose a novel method that improves the extraction of abnormal image features through joint probability graph reasoning. Specifically, to reveal the co-occurrence relations among multiple diseases, we conduct a statistical analysis of the dataset and encode the disease relationships into a probability graph. Subsequently, we devise a graph reasoning network that performs correlation-based reasoning over medical image features, enabling the model to capture more abnormal features. Furthermore, we introduce a gating mechanism for cross-modal feature fusion into the text generation model. This substantially improves the model's ability to learn and fuse information from the two modalities, medical images and text. Experimental results on the IU-X-Ray and MIMIC-CXR datasets demonstrate that our approach outperforms previous state-of-the-art methods and generates higher-quality medical image reports.
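As a rough illustration of the pipeline the abstract describes, the following PyTorch sketch builds a disease co-occurrence probability graph from binary label statistics, applies one step of graph reasoning over per-disease image features, and gates the fusion of visual and textual features. All module names, tensor shapes, and the 14-disease setup are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the three components named in the abstract: (1) a disease
# co-occurrence probability graph estimated from dataset label statistics,
# (2) one graph-reasoning step that propagates image features along that graph,
# and (3) a gated cross-modal fusion of visual and textual features.
import torch
import torch.nn as nn


def cooccurrence_graph(labels: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Estimate P(disease j | disease i) from a binary label matrix.

    labels: (num_samples, num_diseases), 1 if the disease is present.
    Returns a (num_diseases, num_diseases) conditional-probability graph.
    """
    counts = labels.T @ labels          # pairwise co-occurrence counts
    occurrences = labels.sum(dim=0)     # per-disease occurrence counts
    return counts / (occurrences.unsqueeze(1) + eps)


class GraphReasoning(nn.Module):
    """One GCN-style propagation step over per-disease node features."""

    def __init__(self, dim: int, adj: torch.Tensor):
        super().__init__()
        self.register_buffer("adj", adj)  # fixed co-occurrence graph
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, num_diseases, dim); propagate along the graph,
        # transform, and keep a residual connection to the input features.
        propagated = torch.einsum("ij,bjd->bid", self.adj, node_feats)
        return torch.relu(self.proj(propagated)) + node_feats


class GatedFusion(nn.Module):
    """Gate that weighs visual against textual features before decoding."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual: torch.Tensor, textual: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([visual, textual], dim=-1)))
        return g * visual + (1 - g) * textual


if __name__ == "__main__":
    labels = (torch.rand(1000, 14) > 0.8).float()  # fake 14-disease label set
    adj = cooccurrence_graph(labels)
    reason = GraphReasoning(dim=256, adj=adj)
    fuse = GatedFusion(dim=256)
    nodes = torch.randn(2, 14, 256)                # per-disease image features
    fused = fuse(reason(nodes), torch.randn(2, 14, 256))
    print(fused.shape)                             # torch.Size([2, 14, 256])
```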
L. Zhang, K. Zhang, and H. Pan, SUNet++: A deep network with channel attention for small-scale object segmentation on 3D medical images, Tsinghua Science and Technology, vol. 28, no. 4, pp. 628–638, 2023.
B. Hui, Y. Liu, J. Qiu, L. Cao, L. Ji, and Z. He, Study of texture segmentation and classification for grading small hepatocellular carcinoma based on CT images, Tsinghua Science and Technology, vol. 26, no. 2, pp. 199–207, 2021.
X. Fan, M. Dai, C. Liu, F. Wu, X. Yan, Y. Feng, Y. Feng, and B. Su, Effect of image noise on the classification of skin lesions using deep convolutional neural networks, Tsinghua Science and Technology, vol. 25, no. 3, pp. 425–434, 2020.
M. Li, R. Liu, F. Wang, X. Chang, and X. Liang, Auxiliary signal-guided knowledge encoder-decoder for medical report generation, World Wide Web, vol. 26, no. 1, pp. 253–270, 2023.
O. Alfarghaly, R. Khaled, A. Elkorany, M. Helal, and A. Fahmy, Automated radiology report generation using conditioned transformers, Inform. Med. Unlocked, vol. 24, p. 100557, 2021.
Y. Liu, X. Feng, and Z. Zhou, Multimodal video classification with stacked contractive autoencoders, Signal Process., vol. 120, pp. 761–766, 2016.
A. Habibian, T. Mensink, and C. G. M. Snoek, Video2vec embeddings recognize events when examples are scarce, IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 10, pp. 2089–2103, 2017.
Z. Z. Lan, L. Bao, S. I. Yu, W. Liu, and A. G. Hauptmann, Multimedia classification and event detection using double fusion, Multimed. Tools Appl., vol. 71, pp. 333–347, 2014.
D. Demner-Fushman, M. D. Kohli, M. B. Rosenman, S. E. Shooshan, L. Rodriguez, S. Antani, G. R. Thoma, and C. J. McDonald, Preparing a collection of radiology examinations for distribution and retrieval, J. Am. Med. Inform. Assoc., vol. 23, no. 2, pp. 304–310, 2016.
A. E. W. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C. Y. Deng, R. G. Mark, and S. Horng, MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports, Sci. Data, vol. 6, no. 1, p. 317, 2019.
Z. Lin, D. Zhang, D. Shi, R. Xu, Q. Tao, L. Wu, M. He, and Z. Ge, Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation, J. Biomed. Inform., vol. 138, p. 104281, 2023.