AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (1.9 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop

Zaihe Cheng1Wei Shen2Wei Fang2( )Jerry Chun-Wei Lin3
School of Internet of Things, Wuxi Institute of Technology, Wuxi 214121, China and also with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen 5020, Norway
Show Author Information

Abstract

High-utility itemset mining (HUIM) can consider not only the profit factor but also the profitable factor, which is an essential task in data mining. However, most HUIM algorithms are mainly developed on a single machine, which is inefficient for big data since limited memory and processing capacities are available. A parallel efficient high-utility itemset mining (P-EFIM) algorithm is proposed based on the Hadoop platform to solve this problem in this paper. In P-EFIM, the transaction-weighted utilization values are calculated and ordered for the itemsets with the MapReduce framework. Then the ordered itemsets are renumbered, and the low-utility itemsets are pruned to improve the dataset utility. In the Map phase, the P-EFIM algorithm divides the task into multiple independent subtasks. It uses the proposed S-style distribution strategy to distribute the subtasks evenly across all nodes to ensure load-balancing. Furthermore, the P-EFIM uses the EFIM algorithm to mine each subtask dataset to enhance the performance in the Reduce phase. Experiments are performed on eight datasets, and the results show that the runtime performance of P-EFIM is significantly higher than that of the PHUI-Growth, which is also HUIM algorithm based on the Hadoop framework.

References

[1]

X. Li, S. Cao, L. Gao, and L. Wen, A threshold-control generative adversarial network method for intelligent fault diagnosis, Complex System Modeling and Simulation, vol. 1, no. 1, pp. 55–64, 2021.

[2]

Q. Yu, L. Li, H. Zhao, Y. Liu, and K.-Y. Lin, Evaluation system and correlation analysis for determining the performance of a semiconductor manufacturing system, Complex System Modeling and Simulation, vol. 1, no. 3, pp. 218–231, 2021.

[3]

J. Li, X. Gu, Y. Zhang, and X. Zhou, Distributed flexible job-shop scheduling problem based on hybrid chemical reaction optimization algorithm, Complex System Modeling and Simulation, vol. 2, no. 2, pp. 156–173, 2022.

[4]

L. Wang, Z. Pan, and J. Wang, A review of reinforcement learning based intelligent optimization for manufacturing scheduling, Complex System Modeling and Simulation, vol. 1, no. 4, pp. 257–270, 2021.

[5]

Z. -H. Deng and S. -L. Lv, PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via children–parent equivalence pruning, Expert Systems with Applications, vol. 42, no. 13, pp. 5424–5432, 2015.

[6]

T. L. Dam, K. Li, P. Fournier-Viger, and Q. H. Duong, An efficient algorithm for mining top-rank-k frequent patterns, Applied Intelligence, vol. 45, no. 1, pp. 96–111, 2016.

[7]

G. Grahne and J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1347–1362, 2005.

[8]

Z. H. Deng, DiffNodesets: An efficient structure for fast mining frequent itemsets, Applied Soft Computing, vol. 41, pp. 214–223, 2016.

[9]
T. Ryu, H. Kim, C. Lee, H. Kim, B. Vo, J. C. -W. Lin, W. Pedrycz, and U. Yun, Scalable and efficient approach for high temporal fuzzy utility pattern mining, IEEE Transactions on Cybernetics, doi: 10.1109/TCYB.2022.3198661.
[10]

W. Song, Y. Liu, and J. Li, BAHUI: Fast and memory efficient mining of high utility itemsets based on bitmap, International Journal of Data Warehousing and Mining, vol. 10, no. 1, pp. 1–15, 2014.

[11]
C. -W. Wu, P. Fournier-Viger, J. -Y. Gu, and V. S. Tseng, Mining compact high utility itemsets without candidate generation, in High-Utility Pattern Mining, P. Fournier-Viger, J. C. -W. Lin, R. Nkambou, B. Vo, and V. S. Tseng, eds. Cham, Switzerland: Springer, 2019, pp. 279–302.
[12]
Y. C. Lin, C. -W. Wu, and V. S. Tseng, Mining high utility itemsets in big data, in Proc. 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 2015, pp. 649–661.
[13]

W. Song, Y. Liu, and J. Li, Mining high utility itemsets by dynamically pruning the tree structure, Applied Intelligence, vol. 40, pp. 29–43, 2014.

[14]

S. Krishnamoorthy, Hminer: Efficiently mining high utility itemsets, Expert Systems with Applications, vol. 90, pp. 168–183, 2017.

[15]
S. Zida, P. Fournier-Viger, J. C. -W. Lin, C. -W. Wu, and V. S. Tseng, EFIM: A highly efficient algorithm for high-utility itemset mining, in Proc. 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, 2015, pp. 530–546.
[16]

T. L. Dam, K. Li, P. Fournier-Viger, and Q. -H. Duong, CLS-miner: Efficient and effective closed high-utility itemset mining, Frontiers of Computer Science, vol. 13, no. 1, pp. 357–381, 2019.

[17]
J. C. -W. Lin, Y. Djenouri, G. Srivastava, and J. M. -T. Wu, Large-scale closed high-utility itemset mining, in Proc. 2021 International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 2021, pp. 591–598.
[18]

W. Gan, Z. Du, W. Ding, C. Zhang, and H. -C. Chao, Explainable fuzzy utility mining on sequences, IEEE Transactions on Fuzzy Systems, vol. 29, no. 12, pp. 3620–3634, 2021.

[19]

W. Gan, J. C. -W. Lin, P. Fournier-Viger, H. -C. Chao, V. S. Tseng, and P. S. Yu, A survey of utility-oriented pattern mining, IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1306–1327, 2021.

[20]
J. Miao, S. Wan, W. Gan, J. Sun, and J. Chen, Targeted high-utility itemset querying, IEEE Transactions on Artificial Intelligence, doi: 10.1109/TAI.2022.3171530.
[21]

H. Yao and H. J. Hamilton, Mining itemset utilities from transaction databases, Data and Knowledge Engineering, vol. 59, no. 3, pp. 603–626, 2006.

[22]
Y. Liu, W. Liao, and A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in Proc. 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Hanoi, Vietnam, 2005, pp. 689–695.
[23]

Y. C. Li, J. S. Yeh, and C. C. Chang, Isolated items discarding strategy for discovering high utility itemsets, Data and Knowledge Engineering, vol. 64, no. 1, pp. 198–217, 2008.

[24]

C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1708–1721, 2009.

[25]

U. Yun, H. Ryang, and K. H. Ryu, High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert Systems with Applications, vol. 41, no. 8, pp. 3861–3878, 2014.

[26]
V. S. Tseng, C. W. Wu, B. E. Shie, and P. S. Yu, Up-growth: An efficient algorithm for high utility itemset mining, in Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 2010, pp. 253–262.
[27]

V. S. Tseng, B. E. Shie, C. W. Wu, and P. S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1772–1786, 2013.

[28]
M. Liu and J. Qu, Mining high utility itemsets without candidate generation, in Proc. 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 2012, pp. 55–64.
[29]
P. Fournier-Viger, C. -W. Wu, S. Zida, and V. S. Tseng, FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning, in Proc. 21st International Symposium on Methodologies for Intelligent Systems, Roskilde, Denmark, 2014, pp. 83–92.
[30]

S. Krishnamoorthy, Pruning strategies for mining high utility itemsets, Expert Systems with Applications, vol. 42, no. 5, pp. 2371–2381, 2015.

[31]

J. Sahoo, A. K. Das, and A. Goswami, An efficient fast algorithm for discovering closed+ high utility itemsets, Applied Intelligence, vol. 45, no. 1, pp. 44–74, 2016.

[32]
J. Liu, K. Wang, and B. C. M. Fung, Direct discovery of high utility itemsets without candidate generation, in Proc. 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 2012, pp. 984–989.
[33]
C. W. Wu, P. Fournier-Viger, J. Y. Gu, and V. S. Tseng, Mining closed+ high utility itemsets without candidate generation, in Proc. 2015 Conference on Technologies and Applications of Artificial Intelligence, Tainan, China, 2015, pp. 187–194.
[34]

G. -C. Lan, T. -P. Hong, and V. S. Tseng, An efficient projection-based indexing approach for mining high utility itemsets, Knowledge and Information Systems, vol. 38, no. 1, pp. 85–107, 2014.

[35]

S. Dawar, V. Goyal, and D. Bera, A hybrid framework for mining high-utility itemsets in a sparse transaction database, Applied Intelligence, vol. 47, no. 3, pp. 809–827, 2017.

Complex System Modeling and Simulation
Pages 47-58
Cite this article:
Cheng Z, Shen W, Fang W, et al. A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop. Complex System Modeling and Simulation, 2023, 3(1): 47-58. https://doi.org/10.23919/CSMS.2022.0023

715

Views

59

Downloads

2

Crossref

2

Scopus

Altmetrics

Received: 13 July 2022
Revised: 20 October 2022
Accepted: 31 October 2022
Published: 09 March 2023
© The author(s) 2023

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return