In medical X-ray images, multiple abnormalities frequently occur together. However, existing report generation methods cannot effectively extract all abnormal features, resulting in incomplete disease diagnoses when generating diagnostic reports. In real medical scenarios, there are co-occurrence relations among multiple diseases. If such co-occurrence relations are mined and integrated into the feature extraction process, the issue of missed abnormal features can be alleviated. Inspired by this observation, we propose a novel method that improves the extraction of abnormal image features through joint probability graph reasoning. Specifically, to reveal the co-occurrence relations among multiple diseases, we conduct statistical analyses on the dataset and encode the disease relationships as a probability graph. Subsequently, we devise a graph reasoning network that performs correlation-based reasoning over medical image features, which facilitates the acquisition of more abnormal features. Furthermore, we introduce a gating mechanism for cross-modal feature fusion into the text generation model. This substantially improves the model's ability to learn and fuse information from the two distinct modalities, medical images and text. Experimental results on the IU-X-Ray and MIMIC-CXR datasets demonstrate that our approach outperforms previous state-of-the-art methods and generates higher-quality medical image reports.
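
To make the co-occurrence-based reasoning concrete, the following is a minimal sketch of a graph reasoning step over a disease co-occurrence probability graph, in the spirit of the abstract's description. The adjacency estimation, the single residual graph-convolution layer, and all names and dimensions (NUM_DISEASES, FEAT_DIM, CooccurrenceGraphReasoning) are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch only: one correlation-based graph reasoning step over a disease
# co-occurrence probability graph. All sizes and names are assumptions.
import torch
import torch.nn as nn

NUM_DISEASES = 14   # assumed number of abnormality labels
FEAT_DIM = 512      # assumed visual feature dimension

def cooccurrence_adjacency(labels: torch.Tensor) -> torch.Tensor:
    """Estimate P(disease j | disease i) from a binary label matrix of
    shape (num_samples, NUM_DISEASES), standing in for the statistical
    analysis of the dataset that the abstract mentions."""
    counts = labels.t() @ labels                # pairwise co-occurrence counts
    occ = labels.sum(dim=0).clamp(min=1.0)      # per-disease occurrence counts
    return counts / occ.unsqueeze(1)            # row-normalized probabilities

class CooccurrenceGraphReasoning(nn.Module):
    """One graph-convolution step: each disease node aggregates visual
    features from correlated diseases, weighted by the probability graph."""
    def __init__(self, adj: torch.Tensor, dim: int = FEAT_DIM):
        super().__init__()
        self.register_buffer("adj", adj)
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, NUM_DISEASES, dim) disease-specific image features
        aggregated = torch.einsum("ij,bjd->bid", self.adj, node_feats)
        return torch.relu(self.proj(aggregated)) + node_feats  # residual update

# Toy usage: random labels stand in for real dataset statistics.
labels = (torch.rand(1000, NUM_DISEASES) > 0.9).float()
reasoner = CooccurrenceGraphReasoning(cooccurrence_adjacency(labels))
feats = torch.randn(2, NUM_DISEASES, FEAT_DIM)
print(reasoner(feats).shape)  # torch.Size([2, 14, 512])
```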


The YOLOv5 algorithm is widely used for object detection in edge computing systems. However, the limited computing resources of embedded devices and the large model sizes of existing deep learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate object detection algorithm. Multi-scale feature fusion YOLO (MFF-YOLO) is built on the YOLOv5s framework but incorporates substantial improvements. First, we design the MFF module to improve the feature propagation path in the feature pyramid, further integrating semantic information from feature layers along different paths. Then, a large convolution kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, overcoming the performance limitation arising from uneven propagation in feature pyramid networks (FPNs). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by the increased model complexity. Experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of FLOPs by 11.8%, while improving mAP@0.5 by 3.7% and 5.5% and mAP@0.5:0.95 by 6.5% and 6.2% on the two datasets, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple metrics.
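
As an illustration of the kind of lightweight downsampling the abstract describes, below is a minimal sketch of a multi-branch downsampling block built from depthwise separable convolutions. The two-branch layout (a strided depthwise separable convolution plus a max-pool branch) and the names DepthwiseSeparableConv and MultiBranchDownsample are assumptions for illustration, not the paper's exact module.

```python
# Sketch only: multi-branch downsampling with depthwise separable
# convolutions. Branch layout and names are illustrative assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv, which costs
    far fewer parameters and FLOPs than a dense 3x3 convolution."""
    def __init__(self, c_in: int, c_out: int, stride: int = 1):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                            groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

class MultiBranchDownsample(nn.Module):
    """Halves spatial resolution via two cheap branches whose outputs are
    concatenated: a strided depthwise separable conv (learned downsampling)
    and max-pool + pointwise conv (detail-preserving)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        half = c_out // 2
        self.conv_branch = DepthwiseSeparableConv(c_in, half, stride=2)
        self.pool_branch = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(c_in, c_out - half, 1, bias=False),
            nn.BatchNorm2d(c_out - half),
            nn.SiLU(),
        )

    def forward(self, x):
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

# Toy usage: downsample a 64-channel feature map from 80x80 to 40x40.
block = MultiBranchDownsample(64, 128)
print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 40, 40])
```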