Open Access

Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection

School of Software Engineering, and also with MoE Key Lab of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China
School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Shenzhen Finance Institute, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518026, China

Abstract

Financial statement fraud refers to the malicious manipulation of financial data in listed companies’ annual statements. Traditional machine learning approaches focus on individual companies and overlook the relationships among companies that are crucial for identifying fraud patterns. Moreover, fraud detection is a typical imbalanced binary classification task in which normal samples outnumber fraudulent ones. In this paper, we propose a multi-relational graph convolutional network, named FraudGCN, for detecting financial statement fraud. We construct a multi-relational graph that integrates industrial, supply chain, and accounting-sharing relationships, capturing the multidimensional and complex interactions among companies. We then develop a multi-relational graph convolutional network that aggregates information within each relationship and uses an attention mechanism to fuse information across relationships. The attention mechanism enables the model to weigh the importance of different relationships and thus aggregate more useful information from the key ones. To alleviate the class imbalance problem, we present a diffusion-based under-sampling strategy that selects key nodes globally for model training. We also employ focal loss to assign greater weights to harder-to-classify minority samples. We build a real-world dataset from the annual financial statements of listed companies in China. The experimental results show that FraudGCN achieves improvements of 3.15% in Macro-recall, 3.36% in Macro-F1, and 3.86% in GMean over the second-best method. The dataset and code are publicly available at: https://github.com/XNetLab/MRG-for-Finance.
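As a rough illustration of two ingredients named in the abstract, the sketch below implements a standard binary focal loss together with Macro-recall and GMean for a two-class problem. It is not taken from the FraudGCN code base: the function names, the `gamma` and `alpha` defaults, and the definition of GMean as the geometric mean of the two per-class recalls are assumptions made here for illustration only.

```python
import math


def focal_loss(p_fraud, label, gamma=2.0, alpha=0.75):
    """Binary focal loss for a single sample.

    p_fraud: predicted probability of the fraud (positive) class.
    label:   1 for fraud, 0 for normal.
    gamma, alpha: illustrative values (assumptions), not the paper's settings.
    """
    eps = 1e-12
    # Probability the model assigns to the true class.
    p_t = p_fraud if label == 1 else 1.0 - p_fraud
    # Class weight: alpha for fraud, 1 - alpha for normal.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified samples,
    # so hard samples dominate the total.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def per_class_recall(y_true, y_pred):
    """Recall of class 0 (normal) and class 1 (fraud)."""
    recalls = {}
    for c in (0, 1):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / total if total else 0.0
    return recalls


def macro_recall(y_true, y_pred):
    """Unweighted mean of the two per-class recalls."""
    r = per_class_recall(y_true, y_pred)
    return (r[0] + r[1]) / 2.0


def gmean(y_true, y_pred):
    """Geometric mean of the two per-class recalls (assumed definition)."""
    r = per_class_recall(y_true, y_pred)
    return math.sqrt(r[0] * r[1])


if __name__ == "__main__":
    # A confidently correct fraud prediction is penalized far less
    # than a confidently wrong one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))

    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print(macro_recall(y_true, y_pred), gmean(y_true, y_pred))
```

Both metrics treat the two classes equally regardless of their sizes, which is why they are common choices for imbalanced tasks such as this one; plain accuracy would be dominated by the normal class.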


Big Data Mining and Analytics
Pages 920-941
Cite this article:
Wang C, Wang M, Wang X, et al. Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection. Big Data Mining and Analytics, 2024, 7(3): 920-941. https://doi.org/10.26599/BDMA.2024.9020013


Received: 02 January 2024
Revised: 02 March 2024
Accepted: 04 March 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
