Scholar - SciOpen

Open Access Issue

SeaConvNeXt: A Lightweight Two-Branch Network Architecture for Efficient Prediction of Specific IHC Proteins and Antigens on Hematoxylin and Eosin (H&E) Images

Yuli Chen, Guoping Chen, Guoying Shi, Yao Zhou, Jiayang Bai, Germán Corredor, Cheng Lu, Xiujuan Lei

Big Data Mining and Analytics 2024, 7(4): 1212-1236

Published: 04 December 2024

Abstract

Immunohistochemistry (IHC) is a vital technique for detecting specific proteins and antigens in tissue sections using antibodies, aiding in the analysis of tumor growth and metastasis. However, IHC is costly and time-consuming, which makes it difficult to apply on a large scale. To address this issue, we introduce a method that enables virtual IHC staining directly on Hematoxylin and Eosin (H&E) images. First, we develop a novel registration technique, called Bi-stage Registration based on density Clustering (BiReC), to improve the efficiency of registration between H&E and IHC images. BiReC automatically generates numerous Region of Interest (ROI) labels on the H&E image for model training, with each label determined by the intensity of IHC staining. Second, we propose a novel two-branch network architecture, called SeaConvNeXt, which integrates a lightweight Squeeze-Enhanced Axial (SEA) attention mechanism to efficiently extract and fuse multi-level local and global features from H&E images for direct prediction of specific proteins and antigens. SeaConvNeXt consists of a ConvNeXt branch and a global fusion branch. The ConvNeXt branch extracts multi-level local features at four stages, while the global fusion branch, which comprises an SEA Transformer module and three global blocks, performs global feature extraction and multi-feature fusion. Our experiments demonstrate that SeaConvNeXt outperforms current state-of-the-art methods on two public datasets with corresponding IHC and H&E images, achieving an AUC of 90.7% on the HER2SC dataset and 82.5% on the CRC dataset. These results suggest that SeaConvNeXt has great potential for predicting virtual IHC staining on H&E images.

Open Access Issue

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

Zhongyin Xu, Xiujuan Lei, Mei Ma, Yi Pan

Big Data Mining and Analytics 2024, 7(1): 142-155

Published: 25 December 2023

Abstract

Generating novel molecules that satisfy specific properties is a challenging task in modern drug discovery, as it requires optimizing a specific objective while obeying chemical rules. Herein, we aim to optimize a given source molecule so that the generated molecule satisfies specified target properties. Matched Molecular Pairs (MMPs), each of which contains a source and a target molecule, are used herein, and logD and solubility are selected as the properties to optimize. The main innovation lies in the transformer-related calculations, viewed from the perspective of matrix dimensions. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12,365, 1,503, and 1,570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with baseline models with respect to their ability to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.
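
To illustrate the encoding step, the sketch below shows one way threshold intervals and state changes could be turned into property tokens. The interval edges, the solubility cutoff, the token format, and all function names are hypothetical choices made for this example; they are not taken from the paper.

```python
import bisect

# Hypothetical interval edges for the logD change (target minus source).
LOGD_EDGES = [-0.5, 0.5]
# Hypothetical cutoff separating "low" from "high" solubility.
SOL_CUTOFF = 1.5


def encode_logd_change(delta):
    """Map a logD change to the half-open interval (lo, hi] that contains it."""
    bounds = [float("-inf")] + LOGD_EDGES + [float("inf")]
    # bisect_left places a value equal to an edge in the interval ending at that edge.
    idx = bisect.bisect_left(LOGD_EDGES, delta)
    return f"logD_({bounds[idx]}, {bounds[idx + 1]}]"


def solubility_state(value):
    """Classify a solubility value as 'low' or 'high' using the assumed cutoff."""
    return "high" if value > SOL_CUTOFF else "low"


def encode_solubility_change(source_value, target_value):
    """Describe the source-to-target solubility transition as a state change."""
    return f"Sol_{solubility_state(source_value)}->{solubility_state(target_value)}"


# Property tokens that could be prepended to a source molecule's SMILES string.
tokens = [encode_logd_change(1.2), encode_solubility_change(1.0, 2.0)]
```

Discretizing the desired change into a small token vocabulary lets a sequence model condition on it in the same way it conditions on any other input token.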

Open Access Issue

Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest

Jiaojiao Tie, Xiujuan Lei, Yi Pan

Tsinghua Science and Technology 2022, 27(1): 58-67

Published: 17 August 2021

Abstract

Identifying associations between metabolites and diseases helps us understand the pathogenesis of diseases, which is of great significance for diagnosis and treatment. However, traditional biometric methods are time-consuming and expensive. Accordingly, we propose a new metabolite-disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps. First, the semantic similarity and information entropy similarity of diseases are integrated as the final disease similarity. Similarly, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated as the final metabolite similarity. Then, DeepWalk is used to extract metabolite features from the network of metabolite-gene associations. Finally, a random forest algorithm is employed to infer metabolite-disease associations. The experimental results show that DWRF performs well in terms of the area under the curve (AUC) in both leave-one-out and five-fold cross-validation. Case studies also indicate that DWRF performs reliably in metabolite-disease association prediction.
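
DeepWalk learns node features by sampling truncated random walks on a graph and feeding them to a skip-gram model. The sketch below covers only the walk-sampling step on a toy metabolite-gene graph. The node names, the parameter values, and the function name `generate_walks` are assumptions made for illustration, and the skip-gram training and random forest stages are omitted.

```python
import random


def generate_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Sample uniform random walks of fixed length starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy bipartite graph: metabolites (m*) linked to genes (g*); names are hypothetical.
graph = {
    "m1": ["g1", "g2"],
    "m2": ["g2"],
    "g1": ["m1"],
    "g2": ["m1", "m2"],
}
walks = generate_walks(graph)
```

Each walk plays the role of a sentence: nodes that co-occur within a short window end up with similar embeddings once a skip-gram model is trained on the walks.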

Regular Paper Issue

Predicting CircRNA-Disease Associations Based on Improved Weighted Biased Meta-Structure

Xiu-Juan Lei, Chen Bian, Yi Pan

Journal of Computer Science and Technology 2021, 36(2): 288-298

Published: 05 March 2021

Abstract

Circular RNAs (circRNAs) are RNAs with a special closed-loop structure that play important roles in tumors and other diseases. Because biological experiments are time-consuming, computational methods for predicting associations between circRNAs and diseases have become an attractive alternative. Taking the limited number of verified circRNA-disease associations into account, we propose a method named CDWBMS, which integrates a small number of verified circRNA-disease associations with plenty of circRNA information to discover novel circRNA-disease associations. CDWBMS adopts an improved weighted biased meta-structure search algorithm on a heterogeneous network to predict associations between circRNAs and diseases. In leave-one-out cross-validation (LOOCV), 10-fold cross-validation, and 5-fold cross-validation, CDWBMS yields area under the receiver operating characteristic curve (AUC) values of 0.9216, 0.9172, and 0.9005, respectively. Furthermore, case studies show that CDWBMS can predict unknown circRNA-disease associations. In conclusion, CDWBMS is an effective method for exploring disease-related circRNAs.
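
For readers unfamiliar with the AUC metric reported above, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The function name and the toy labels and scores are assumptions for illustration and are unrelated to the CDWBMS experiments.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy data: 1 marks a known association, 0 an unobserved pair.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = auc_score(labels, scores)
```

In cross-validation, the held-out known associations are the positives and the unobserved pairs are the negatives, so the same computation applies to each fold.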

Open Access Issue

CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization

Yuchen Zhang, Xiujuan Lei, Zengqiang Fang, Yi Pan

Big Data Mining and Analytics 2020, 3(4): 280-291

Published: 16 November 2020

Abstract

Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are being discovered using high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method for predicting circRNA-disease associations, called PCD_MVMF, which is based on metapath2vec++ and matrix factorization and integrates multiple data sources. To construct more reliable networks, various aspects are considered. First, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Second, metapath2vec++ is applied to an integrated heterogeneous network to learn the embedded features and an initial prediction score. Finally, we use matrix factorization, with similarity as a constraint, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and F-measure are adopted to evaluate the performance of PCD_MVMF. These evaluations show that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.
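
The matrix factorization step can be pictured with the minimal sketch below, which factorizes a toy association matrix into two low-rank factors using stochastic gradient descent with L2 regularization. This is plain regularized matrix factorization, not the PCD_MVMF objective: the similarity constraint described above is left out, and the matrix, hyperparameters, and function names are assumptions chosen for illustration.

```python
import random


def factorize(matrix, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Approximate matrix[i][j] by the dot product of U[i] and V[j]."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                pred = sum(U[i][k] * V[j][k] for k in range(rank))
                err = matrix[i][j] - pred
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    # Gradient step on squared error plus an L2 penalty.
                    U[i][k] += lr * (err * v - reg * u)
                    V[j][k] += lr * (err * u - reg * v)
    return U, V


def squared_error(matrix, U, V):
    """Sum of squared differences between the matrix and its reconstruction."""
    return sum(
        (matrix[i][j] - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


# Toy association matrix: rows are circRNAs, columns are diseases (hypothetical).
associations = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
U, V = factorize(associations)
error = squared_error(associations, U, V)
```

The reconstructed scores for entries that are zero in the input can then be ranked to propose candidate associations.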

Open Access Issue

Prediction of miRNA-circRNA Associations Based on k-NN Multi-Label with Random Walk Restart on a Heterogeneous Network

Zengqiang Fang, Xiujuan Lei

Big Data Mining and Analytics 2019, 2(4): 261-272

Published: 05 August 2019

Abstract

Circular RNAs (circRNAs) play important roles in various biological processes, as essential non-coding RNAs that affect transcriptional and posttranscriptional gene expression regulation. Recently, many studies have shown that circRNAs can act as microRNA (miRNA) sponges, which are known to be associated with certain diseases. Therefore, efficient computational methods are needed to explore miRNA-circRNA interactions, but very few methods for predicting associations between miRNAs and circRNAs exist. In this study, we adopt an improved random walk computational method, named KRWRMC, to capture the complex associations between miRNAs and circRNAs. Our major contributions can be summed up in two points. First, the conventional Random Walk Restart Heterogeneous (RWRH) algorithm simply converts the circRNA/miRNA similarity network into the transition probability matrix; in contrast, we take the influence of each node's neighbors in the network into account, which can suggest or stress some potential associations. Second, our proposed KRWRMC is the first computational model to calculate large numbers of miRNA-circRNA associations, which can be regarded as biomarkers for diagnosing certain diseases and can thus help us better understand complex diseases. The reliability of KRWRMC has been verified by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation, the results of which indicate that this method achieves excellent performance in predicting potential miRNA-circRNA associations.
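
As background for the random walk with restart idea, the sketch below runs the standard iteration on a small homogeneous graph: at each step the walker either follows an edge or, with probability `restart`, jumps back to the seed node. This is the generic algorithm only. It does not include the neighbor-based weighting or the heterogeneous network used by KRWRMC, and the graph, the parameter values, and the function name are assumptions for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at `seed`."""
    n = len(adj)
    # Column-normalize the adjacency matrix into a transition matrix.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:  # stop once the distribution no longer changes
            break
    return p


# Toy undirected graph on four nodes; node 3 is reachable from the seed only via node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

Nodes closer to the seed accumulate more probability mass, so ranking the non-seed nodes by score gives a proximity-based ordering of candidate associations.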