Open Access

Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection

Chenxu Wang¹, Mengqin Wang², Xiaoguang Wang², Luyue Zhang², Yi Long³

¹ School of Software Engineering, and also with MoE Key Lab of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China

² School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China

³ Shenzhen Finance Institute, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518026, China

Abstract

Financial statement fraud refers to the malicious manipulation of financial data in listed companies’ annual statements. Traditional machine learning approaches focus on individual companies and overlook the interactive relationships among companies that are crucial for identifying fraud patterns. Moreover, fraud detection is a typical imbalanced binary classification task in which normal samples outnumber fraudulent ones. In this paper, we propose a multi-relational graph convolutional network, named FraudGCN, for detecting financial statement fraud. A multi-relational graph is constructed to integrate industrial, supply chain, and accounting-sharing relationships, capturing the multidimensional and complex interactions among companies. We then develop a multi-relational graph convolutional network that aggregates information within each relationship and employs an attention mechanism to fuse information across relationships. The attention mechanism enables the model to weigh the importance of different relationships and thereby aggregate more useful information from the key ones. To alleviate the class imbalance problem, we present a diffusion-based under-sampling strategy that selects key nodes globally for model training. We also employ focal loss to assign greater weights to harder-to-classify minority samples. We build a real-world dataset from the annual financial statements of listed companies in China. The experimental results show that FraudGCN achieves improvements of 3.15% in Macro-recall, 3.36% in Macro-F1, and 3.86% in GMean over the second-best method. The dataset and code are publicly available at: https://github.com/XNetLab/MRG-for-Finance.
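To make the role of focal loss concrete, the following is a minimal sketch of the standard binary focal loss in plain Python. It is not taken from the FraudGCN code base: the function name `focal_loss` and the default values of `gamma` and `alpha` are illustrative assumptions, since the abstract does not state the settings used in the paper.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.75, eps=1e-12):
    """Mean binary focal loss (illustrative sketch, not the authors' code).

    probs  -- predicted probability of the positive (fraud) class per sample
    labels -- ground-truth labels, 1 for fraud and 0 for normal
    gamma  -- focusing parameter (assumed value); gamma = 0 removes the
              down-weighting of easy samples
    alpha  -- weight on the positive class (assumed value); the negative
              class receives 1 - alpha
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0) on a confidently wrong prediction.
        p_t = min(max(p_t, eps), 1.0)
        # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A fraud sample predicted with low confidence contributes far more
# loss than one that is already classified correctly.
easy = focal_loss([0.9], [1])
hard = focal_loss([0.1], [1])
print(hard > easy)
```

With `gamma = 0` the modulating factor equals 1 and the expression reduces to class-weighted cross-entropy, which is a convenient sanity check when tuning the two parameters.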

Keywords

financial statement fraud; class imbalance; Graph Neural Networks (GNN); multi-relational graphs

Big Data Mining and Analytics

Volume 7, Issue 3, September 2024

Pages 920-941

DOI: 10.26599/BDMA.2024.9020013

Cite this article:

Wang C, Wang M, Wang X, et al. Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection. Big Data Mining and Analytics, 2024, 7(3): 920-941. https://doi.org/10.26599/BDMA.2024.9020013
