A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop

Zaihe Cheng; Wei Shen; Wei Fang; Jerry Chun-Wei Lin

doi:10.23919/CSMS.2022.0023

AI Chat Paper

Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.

Chat more with AI

| Sign up

Browse by Subject

Search for peer-reviewed journals with full access.

Journals A - Z

About Us

Discover the SciOpen Platform and Achieve Your Research Goals with Ease.

About Us

Publish with Us

Support

Journals A - Z

About Us

Publish with Us

Support

PDF (1.9 MB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

AI Chat Paper

Show Outline

Outline

Show full outline

Hide outline

Outline

Show full outline

Hide outline

Open Access

A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop

Zaihe Cheng^¹, Wei Shen^², Wei Fang^²(

), Jerry Chun-Wei Lin^³

1School of Internet of Things, Wuxi Institute of Technology, Wuxi 214121, China and also with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China

2Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China

3Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen 5020, Norway

Show Author Information

Abstract

High-utility itemset mining (HUIM) can consider not only the profit factor but also the profitable factor, which is an essential task in data mining. However, most HUIM algorithms are mainly developed on a single machine, which is inefficient for big data since limited memory and processing capacities are available. A parallel efficient high-utility itemset mining (P-EFIM) algorithm is proposed based on the Hadoop platform to solve this problem in this paper. In P-EFIM, the transaction-weighted utilization values are calculated and ordered for the itemsets with the MapReduce framework. Then the ordered itemsets are renumbered, and the low-utility itemsets are pruned to improve the dataset utility. In the Map phase, the P-EFIM algorithm divides the task into multiple independent subtasks. It uses the proposed S-style distribution strategy to distribute the subtasks evenly across all nodes to ensure load-balancing. Furthermore, the P-EFIM uses the EFIM algorithm to mine each subtask dataset to enhance the performance in the Reduce phase. Experiments are performed on eight datasets, and the results show that the runtime performance of P-EFIM is significantly higher than that of the PHUI-Growth, which is also HUIM algorithm based on the Hadoop framework.

Keywords

big data data mining Hadoop parallel pattern mining high-utility itemset mining

References

[1]

X. Li, S. Cao, L. Gao, and L. Wen, A threshold-control generative adversarial network method for intelligent fault diagnosis, Complex System Modeling and Simulation, vol. 1, no. 1, pp. 55–64, 2021.

Crossref Google Scholar

[2]

Q. Yu, L. Li, H. Zhao, Y. Liu, and K.-Y. Lin, Evaluation system and correlation analysis for determining the performance of a semiconductor manufacturing system, Complex System Modeling and Simulation, vol. 1, no. 3, pp. 218–231, 2021.

Crossref Google Scholar

[3]

J. Li, X. Gu, Y. Zhang, and X. Zhou, Distributed flexible job-shop scheduling problem based on hybrid chemical reaction optimization algorithm, Complex System Modeling and Simulation, vol. 2, no. 2, pp. 156–173, 2022.

Crossref Google Scholar

[4]

L. Wang, Z. Pan, and J. Wang, A review of reinforcement learning based intelligent optimization for manufacturing scheduling, Complex System Modeling and Simulation, vol. 1, no. 4, pp. 257–270, 2021.

Crossref Google Scholar

[5]

Z. -H. Deng and S. -L. Lv, PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via children–parent equivalence pruning, Expert Systems with Applications, vol. 42, no. 13, pp. 5424–5432, 2015.

Crossref Google Scholar

[6]

T. L. Dam, K. Li, P. Fournier-Viger, and Q. H. Duong, An efficient algorithm for mining top-rank-k frequent patterns, Applied Intelligence, vol. 45, no. 1, pp. 96–111, 2016.

Crossref Google Scholar

[7]

G. Grahne and J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1347–1362, 2005.

Crossref Google Scholar

[8]

Z. H. Deng, DiffNodesets: An efficient structure for fast mining frequent itemsets, Applied Soft Computing, vol. 41, pp. 214–223, 2016.

Crossref Google Scholar

[9]

T. Ryu, H. Kim, C. Lee, H. Kim, B. Vo, J. C. -W. Lin, W. Pedrycz, and U. Yun, Scalable and efficient approach for high temporal fuzzy utility pattern mining, IEEE Transactions on Cybernetics, doi: 10.1109/TCYB.2022.3198661.

Crossref

[10]

W. Song, Y. Liu, and J. Li, BAHUI: Fast and memory efficient mining of high utility itemsets based on bitmap, International Journal of Data Warehousing and Mining, vol. 10, no. 1, pp. 1–15, 2014.

Crossref Google Scholar

[11]

C. -W. Wu, P. Fournier-Viger, J. -Y. Gu, and V. S. Tseng, Mining compact high utility itemsets without candidate generation, in High-Utility Pattern Mining, P. Fournier-Viger, J. C. -W. Lin, R. Nkambou, B. Vo, and V. S. Tseng, eds. Cham, Switzerland: Springer, 2019, pp. 279–302.

Crossref

[12]

Y. C. Lin, C. -W. Wu, and V. S. Tseng, Mining high utility itemsets in big data, in Proc. 19^th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 2015, pp. 649–661.

Crossref

[13]

W. Song, Y. Liu, and J. Li, Mining high utility itemsets by dynamically pruning the tree structure, Applied Intelligence, vol. 40, pp. 29–43, 2014.

Crossref Google Scholar

[14]

S. Krishnamoorthy, Hminer: Efficiently mining high utility itemsets, Expert Systems with Applications, vol. 90, pp. 168–183, 2017.

Crossref Google Scholar

[15]

S. Zida, P. Fournier-Viger, J. C. -W. Lin, C. -W. Wu, and V. S. Tseng, EFIM: A highly efficient algorithm for high-utility itemset mining, in Proc. 14^th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, 2015, pp. 530–546.

Crossref

[16]

T. L. Dam, K. Li, P. Fournier-Viger, and Q. -H. Duong, CLS-miner: Efficient and effective closed high-utility itemset mining, Frontiers of Computer Science, vol. 13, no. 1, pp. 357–381, 2019.

Crossref Google Scholar

[17]

J. C. -W. Lin, Y. Djenouri, G. Srivastava, and J. M. -T. Wu, Large-scale closed high-utility itemset mining, in Proc. 2021 International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 2021, pp. 591–598.

[18]

W. Gan, Z. Du, W. Ding, C. Zhang, and H. -C. Chao, Explainable fuzzy utility mining on sequences, IEEE Transactions on Fuzzy Systems, vol. 29, no. 12, pp. 3620–3634, 2021.

Crossref Google Scholar

[19]

W. Gan, J. C. -W. Lin, P. Fournier-Viger, H. -C. Chao, V. S. Tseng, and P. S. Yu, A survey of utility-oriented pattern mining, IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1306–1327, 2021.

Crossref Google Scholar

[20]

J. Miao, S. Wan, W. Gan, J. Sun, and J. Chen, Targeted high-utility itemset querying, IEEE Transactions on Artificial Intelligence, doi: 10.1109/TAI.2022.3171530.

Crossref

[21]

H. Yao and H. J. Hamilton, Mining itemset utilities from transaction databases, Data and Knowledge Engineering, vol. 59, no. 3, pp. 603–626, 2006.

Crossref Google Scholar

[22]

Y. Liu, W. Liao, and A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in Proc. 9^th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Hanoi, Vietnam, 2005, pp. 689–695.

Crossref

[23]

Y. C. Li, J. S. Yeh, and C. C. Chang, Isolated items discarding strategy for discovering high utility itemsets, Data and Knowledge Engineering, vol. 64, no. 1, pp. 198–217, 2008.

Crossref Google Scholar

[24]

C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1708–1721, 2009.

Crossref Google Scholar

[25]

U. Yun, H. Ryang, and K. H. Ryu, High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert Systems with Applications, vol. 41, no. 8, pp. 3861–3878, 2014.

Crossref Google Scholar

[26]

V. S. Tseng, C. W. Wu, B. E. Shie, and P. S. Yu, Up-growth: An efficient algorithm for high utility itemset mining, in Proc. 16^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 2010, pp. 253–262.

Crossref

[27]

V. S. Tseng, B. E. Shie, C. W. Wu, and P. S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1772–1786, 2013.

Crossref Google Scholar

[28]

M. Liu and J. Qu, Mining high utility itemsets without candidate generation, in Proc. 21^st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 2012, pp. 55–64.

Crossref

[29]

P. Fournier-Viger, C. -W. Wu, S. Zida, and V. S. Tseng, FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning, in Proc. 21^st International Symposium on Methodologies for Intelligent Systems, Roskilde, Denmark, 2014, pp. 83–92.

Crossref

[30]

S. Krishnamoorthy, Pruning strategies for mining high utility itemsets, Expert Systems with Applications, vol. 42, no. 5, pp. 2371–2381, 2015.

Crossref Google Scholar

[31]

J. Sahoo, A. K. Das, and A. Goswami, An efficient fast algorithm for discovering closed⁺ high utility itemsets, Applied Intelligence, vol. 45, no. 1, pp. 44–74, 2016.

Crossref Google Scholar

[32]

J. Liu, K. Wang, and B. C. M. Fung, Direct discovery of high utility itemsets without candidate generation, in Proc. 2012 IEEE 12^th International Conference on Data Mining, Brussels, Belgium, 2012, pp. 984–989.

Crossref

[33]

C. W. Wu, P. Fournier-Viger, J. Y. Gu, and V. S. Tseng, Mining closed+ high utility itemsets without candidate generation, in Proc. 2015 Conference on Technologies and Applications of Artificial Intelligence, Tainan, China, 2015, pp. 187–194.

[34]

G. -C. Lan, T. -P. Hong, and V. S. Tseng, An efficient projection-based indexing approach for mining high utility itemsets, Knowledge and Information Systems, vol. 38, no. 1, pp. 85–107, 2014.

Crossref Google Scholar

[35]

S. Dawar, V. Goyal, and D. Bera, A hybrid framework for mining high-utility itemsets in a sparse transaction database, Applied Intelligence, vol. 47, no. 3, pp. 809–827, 2017.

Crossref Google Scholar

Complex System Modeling and Simulation

Volume 3 Issue 1,
March 2023

Pages 47-58

DOI: 10.23919/CSMS.2022.0023

Cite this article:

Cheng Z, Shen W, Fang W, et al. A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop. Complex System Modeling and Simulation, 2023, 3(1): 47-58. https://doi.org/10.23919/CSMS.2022.0023

715

Views

Downloads

Crossref

Scopus

Google Scholar
Citation

Altmetrics

Received: 13 July 2022

Revised: 20 October 2022

Accepted: 31 October 2022

Published: 09 March 2023

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).