The rapid evolution of software development has exposed the limitations of existing code clone detection techniques. As modern applications grow more complex, traditional clone detection tools often fail to detect general and large-gap clones that are modified regularly. Such undetected clones threaten software integrity and underline the need for improved clone detection techniques. To address this gap, we propose code clone dive (CCDive), a clone detection technique designed to detect a wide range of clones, from direct clones to the more challenging large-gap clones, covering the very strongly Type-III, strongly Type-III, and moderately Type-III categories. CCDive combines level-by-level abstraction with a new similarity matching algorithm, which allows it to recognize clones even when nearly half of the original code in a chunk has been modified. In addition, by integrating Smith–Waterman local sequence alignment, CCDive can locate the exact positions at which code has been transformed. In a comprehensive evaluation, CCDive was compared with well-known clone detection techniques using precision, recall, F1-score, accuracy, and efficiency. CCDive consistently outperformed the other techniques in precision, recall, F1-score, and accuracy for both file-based and function-based clone detection. These results indicate that CCDive is effective, reliable, accurate, and efficient, making it well suited to practical, real-world use.
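To illustrate how local sequence alignment can point to the places where two code fragments differ, the following is a minimal sketch of Smith–Waterman alignment over token sequences. It is not CCDive's implementation: the token lists, the function names `smith_waterman` and `changed_positions`, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of token lists a and b.

    Returns (score, pairs), where pairs holds (i, j) index pairs;
    None on one side marks a gap. Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    pairs = []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((i - 1, None))  # token in a with no counterpart in b
            i -= 1
        else:
            pairs.append((None, j - 1))  # token in b with no counterpart in a
            j -= 1
    pairs.reverse()
    return best, pairs


def changed_positions(a, b, pairs):
    """Aligned positions where the two fragments differ (substitution or gap)."""
    return [(i, j) for i, j in pairs if i is None or j is None or a[i] != b[j]]


# Hypothetical token streams of two near-identical statements.
original = ["int", "x", "=", "a", "+", "b", ";"]
modified = ["int", "y", "=", "a", "+", "b", ";"]
score, pairs = smith_waterman(original, modified)
diffs = changed_positions(original, modified, pairs)  # [(1, 1)]: "x" became "y"
```

In this sketch the alignment spans all seven tokens, and the only reported difference is the renamed identifier at index 1. A real detector would typically align normalized or abstracted tokens and tune the scores to its own data.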