AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (964.3 KB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Mining Conditional Functional Dependency Rules on Big Data

Department of Computer Science, University of California, Los Angles, CA 90095, USA.
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China.
Show Author Information

Abstract

Current Conditional Functional Dependency (CFD) discovery algorithms always need a well-prepared training dataset. This condition makes them difficult to apply on large and low-quality datasets. To handle the volume issue of big data, we develop the sampling algorithms to obtain a small representative training set. We design the fault-tolerant rule discovery and conflict-resolution algorithms to address the low-quality issue of big data. We also propose parameter selection strategy to ensure the effectiveness of CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable period.

References

[1]
W. F. Fan, F. Geerts, X. B. Jia, and A. Kementsietsidis, Conditional functional dependencies for capturing data inconsistencies, ACM Trans. Database Syst., vol. 33, no. 2, p. 6, 2008.
[2]
W. G. Chen, W. F. Fan, and S. Ma, Analyses and validation of conditional dependencies with built-in predicates, in Proc. 20th Int. Conf. Database and Expert Systems Applications, Linz, Austria, 2009, pp. 576-591.
[3]
B. Bollobás, Modern Graph Theory. New York, NY, USA: Springer-Verlag, 1998.
[4]
L. Golab, H. Karloff, F. Korn, D. Srivastava, and B. Yu, On generating near-optimal tableaux for conditional functional dependencies, Proc. VLDB Endowment, vol. 1, no. 1, pp. 376-390, 2008.
[5]
G. Cormode, L. Golab, K. Flip, A. McGregor, D. Srivastava, and X. Zhang, Estimating the confidence of conditional functional dependencies, in Proc. 2009 ACM SIGMOD Int. Conf. Management of Data, Providence, RI, USA, 2009, pp. 469-482.
[6]
J. S. Vitter, Random sampling with a reservoir, ACM Trans. Math. Softw., vol. 11, no. 1, pp. 37-57, 1985.
[7]
W. F. Fan, F. Geerts, J. Z. Li, and M. Xiong, Discovering conditional functional dependencies, IEEE Trans. Knowl. Data Eng., vol. 23, no. 5, pp. 683-698, 2011.
[8]
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. New York, NY, USA: W. H. Freeman & Co., 1979.
[9]
F. Chiang and R. J. Miller, Discovering data quality rules, Proc. VLDB Endowment, vol. 1, no. 1, pp. 1166-1177, 2008.
[10]
C. Batini and M. Scannapieco, Data Quality: Concepts, Methodologies and Techniques. Berlin, Germany: Springer, 2006.
[11]
J. Chomicki, Consistent query answering: Five easy pieces, in Proc. 11th Int. Conf. Database Theory, Barcelona, Spain, 2007, pp. 1-17.
[12]
L. Bertossi, Consistent query answering in databases, ACM SIGMOD Record, vol. 35, no. 2, pp. 68-76, 2006.
[13]
W. F. Fan, Dependencies revisited for improving data quality, in Proc. 27th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems, Vancouver, Canada, 2008, pp. 159-170.
[14]
E. Rahm and H. H. Do, Data cleaning: Problems and current approaches, IEEE Data Eng. Bull., vol. 23, no. 4, pp. 3-13, 2010.
[15]
L. Bravo, W. F. Fan, and S. Ma, Extending dependencies with conditions, in Proc. 33rd Int. Conf. Very Large Data Bases, Vienna, Austria, 2007, pp. 243-254.
[16]
L. Bravo, W. F. Fan, F. Geerts, and S. Ma, Increasing the expressivity of conditional functional dependencies without extra complexity, in Proc. 24th Int. Conf. Data Engineering, Cancun, Mexico, 2008, pp. 516-525.
[17]
A. K. Kalavagattu, Mining approximate functional dependencies as condensed representations of association rules, Master dissertation, Arizona State University, Phoenix, AZ, USA, 2008.
Big Data Mining and Analytics
Pages 68-84
Cite this article:
Li M, Wang H, Li J. Mining Conditional Functional Dependency Rules on Big Data. Big Data Mining and Analytics, 2020, 3(1): 68-84. https://doi.org/10.26599/BDMA.2019.9020019

1132

Views

78

Downloads

40

Crossref

29

Web of Science

44

Scopus

0

CSCD

Altmetrics

Received: 28 September 2019
Accepted: 09 October 2019
Published: 19 December 2019
© The author(s) 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return