Abstract
In this work, we propose an original method, random matrix theory (RMT)-based hierarchical clustering, to identify functional gene networks from diffuse large B-cell lymphoma (DLBCL) gene co-expression networks. Compared with the topological approach, the RMT-based hierarchical clustering method effectively represents not only the strong correlations between genes within modules (the modularity and independence of networks) but also the weak correlations between different modules (the hierarchy of networks). We show that missing expression values in microarray datasets should not be neglected, and that different imputation methods yield different performance. We recommend local least squares (LLS) imputation to estimate missing values, as it gives better accuracy and stability. Using RMT, random noise is separated from the DLBCL gene expression data. We use the normalized root mean squared error (NRMSE) ratio method to identify a transition zone of the nearest neighbor spacing distributions (NNSDs); for DLBCL networks, this zone is [0.71, 0.84].