Generating Medical Report via Joint Probability Graph Reasoning

Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
Shandong Inspur Intelligent Medical Technology Co., Ltd., Jinan 250101, China
School of Cyber Engineering, Xidian University, Xi’an 710071, China
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Abstract

In medical X-ray images, multiple abnormalities frequently occur together. However, existing report generation methods cannot efficiently extract all abnormal features, resulting in incomplete disease diagnoses in the generated reports. In real medical scenarios, co-occurrence relations exist among multiple diseases. If such co-occurrence relations are mined and integrated into the feature extraction process, the problem of missing abnormal features can be alleviated. Motivated by this observation, we propose a novel method that improves the extraction of abnormal features in images through joint probability graph reasoning. Specifically, to reveal the co-occurrence relations among multiple diseases, we conduct statistical analyses on the dataset and encode the resulting disease relationships in a probability graph. We then devise a graph reasoning network that performs correlation-based reasoning over medical image features, which facilitates the acquisition of more abnormal features. Furthermore, we introduce a gating mechanism for cross-modal feature fusion into the text generation model. This optimization substantially improves the model's ability to learn and fuse information from the two modalities, medical images and text. Experimental results on the IU-X-Ray and MIMIC-CXR datasets demonstrate that our approach outperforms previous state-of-the-art methods and generates higher quality medical image reports.
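To make the three components described above concrete, the sketch below illustrates, under simplifying assumptions, (1) building a disease co-occurrence probability graph from binary disease labels, (2) one step of correlation-based propagation over disease-specific features, and (3) a simple gated fusion of visual and textual features. The toy label matrix, tensor sizes, and the GatedFusion module are illustrative placeholders, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# ---- 1. Disease co-occurrence probability graph ----
# labels: (num_reports, num_diseases) binary indicators, assumed to be mined
# from the training reports; this tiny matrix is purely illustrative.
labels = torch.tensor([[1, 0, 1, 0],
                       [1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [1, 0, 1, 1]], dtype=torch.float32)

co_counts = labels.T @ labels                    # joint occurrence counts of disease pairs
disease_counts = labels.sum(dim=0).clamp(min=1)  # per-disease occurrence counts
# prob_graph[i, j] approximates P(disease j | disease i).
prob_graph = co_counts / disease_counts.unsqueeze(1)

# ---- 2. One step of correlation-based reasoning over node features ----
# node_feats: (num_diseases, feat_dim) disease-specific visual features (random here).
node_feats = torch.randn(4, 64)
reasoned = torch.softmax(prob_graph, dim=1) @ node_feats  # propagate along co-occurrence edges

# ---- 3. Gated fusion of visual and textual features ----
class GatedFusion(nn.Module):
    """Learned per-dimension gate that mixes two modality features."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual, textual):
        g = torch.sigmoid(self.gate(torch.cat([visual, textual], dim=-1)))
        return g * visual + (1 - g) * textual

fusion = GatedFusion(64)
fused = fusion(reasoned.mean(dim=0, keepdim=True), torch.randn(1, 64))
print(prob_graph.shape, reasoned.shape, fused.shape)
```

In this reading, the probability graph supplies the edge weights over which image features are propagated, so features of a detected disease can strengthen the features of diseases that frequently co-occur with it, while the gate decides, dimension by dimension, how much visual versus textual information enters the report decoder.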

Tsinghua Science and Technology, vol. 30, no. 4, pp. 1685–1699, 2025
Cite this article:
Zhang J, Cheng M, Li X, et al. Generating Medical Report via Joint Probability Graph Reasoning. Tsinghua Science and Technology, 2025, 30(4): 1685–1699. https://doi.org/10.26599/TST.2024.9010058