Mining Conditional Functional Dependency Rules on Big Data

Mingda Li; Hongzhi Wang; Jianzhong Li

doi:10.26599/BDMA.2019.9020019



∙ Department of Computer Science, University of California, Los Angeles, CA 90095, USA.

∙ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China.


Abstract

Current Conditional Functional Dependency (CFD) discovery algorithms require a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms that obtain a small, representative training set. To address its low quality, we design fault-tolerant rule discovery and conflict-resolution algorithms. We also propose a parameter selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable time.
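To make the terms in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not say which sampling scheme or confidence measure the paper uses. The sketch assumes standard reservoir sampling (Algorithm R) as a stand-in for the sampling step. For fault tolerance it assumes a common confidence notion: the largest fraction of matching tuples that can be kept so that the dependency holds. The function names, the toy relation, and the 0.7 threshold are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def reservoir_sample(rows, k, seed=0):
    """Uniformly sample up to k items from an iterable in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def cfd_confidence(rows, lhs, rhs, condition):
    """Return (confidence, support) of the CFD `lhs -> rhs` under `condition`.

    `condition` maps attributes to the constants a tuple must match.
    Confidence is the fraction of matching tuples that remain if, for each
    lhs value, only the most frequent rhs value is kept (assumed definition).
    """
    groups = defaultdict(Counter)
    for row in rows:
        if all(row[a] == v for a, v in condition.items()):
            groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    support = sum(sum(c.values()) for c in groups.values())
    if support == 0:
        return 0.0, 0
    kept = sum(max(c.values()) for c in groups.values())
    return kept / support, support


# Hypothetical toy relation; the third tuple is a deliberate error.
rows = [
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "London"},
    {"country": "UK", "zip": "W1B", "city": "London"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "Newark"},
]

sample = reservoir_sample(rows, k=10)  # k exceeds the input size, so all rows are kept
confidence, support = cfd_confidence(sample, ["zip"], "city", {"country": "UK"})
THRESHOLD = 0.7  # hypothetical tolerance for dirty tuples
rule_accepted = confidence >= THRESHOLD
```

On this toy relation, the rule "for UK tuples, zip determines city" has confidence 0.75 with support 4, so it passes the assumed threshold despite the one dirty tuple. The same rule conditioned on US tuples has confidence 0.5 and would be rejected.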

Keywords

data mining; conditional functional dependency; big data; data quality


Big Data Mining and Analytics

Volume 3, Issue 1, March 2020

Pages 68-84

DOI: 10.26599/BDMA.2019.9020019

Cite this article:

Li M, Wang H, Li J. Mining Conditional Functional Dependency Rules on Big Data. Big Data Mining and Analytics, 2020, 3(1): 68-84. https://doi.org/10.26599/BDMA.2019.9020019


Received: 28 September 2019

Accepted: 09 October 2019

Published: 19 December 2019

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).