Open Access

Efficient Currency Determination Algorithms for Dynamic Data

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.

Abstract

Data quality is an important aspect of data application and management, and currency is one of the major dimensions that determine it. In real applications, dataset timestamps are often incomplete, unavailable, or absent altogether. As the demand for real-time data updates grows, existing methods can fail to adequately determine the currency of entities. Considering the velocity of big data, we propose a series of efficient algorithms for determining the currency of dynamic datasets, organized into two steps. In the preprocessing step, to better determine data currency and accelerate dataset updating, we propose a topological graph of the processing order of the entity attributes. We then construct an Entity Query B-Tree (EQB-Tree) structure and an Entity Storage Dynamic Linked List (ES-DLL) to improve the querying and updating of both the data currency graph and the currency scores. In the currency determination step, we define the currency score and currency information for tuples referring to the same entity and use examples to discuss methods and algorithms for computing them. Experimental results on both real and synthetic data verify that our methods can efficiently update data in the correct order of currency.
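
The abstract does not spell out how the topological graph of attribute processing order is built. As a rough illustration of the general idea only, the sketch below orders attributes so that each one is processed after the attributes it depends on, using Kahn's algorithm. The attribute names and the `depends_on` mapping are hypothetical and are not taken from the paper.

```python
from collections import deque


def processing_order(attributes, depends_on):
    """Order attributes so each one comes after all attributes it depends on.

    attributes: list of attribute names (hypothetical examples below).
    depends_on: dict mapping an attribute to the attributes that must be
                processed before it. This is an illustrative assumption,
                not the paper's actual data structure.
    """
    indegree = {a: 0 for a in attributes}
    successors = {a: [] for a in attributes}
    for attr, prereqs in depends_on.items():
        for p in prereqs:
            successors[p].append(attr)
            indegree[attr] += 1

    # Start from attributes with no prerequisites (sorted for determinism).
    ready = deque(sorted(a for a in attributes if indegree[a] == 0))
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for s in successors[a]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # Any attribute left unprocessed means the dependencies form a cycle.
    if len(order) != len(attributes):
        raise ValueError("attribute dependencies contain a cycle")
    return order


# Hypothetical example: salary is judged after job, address after status.
example_order = processing_order(
    ["salary", "job", "address", "status"],
    {"salary": ["job"], "address": ["status"]},
)
```

Under these assumptions, any order returned places `job` before `salary` and `status` before `address`; a cyclic set of dependencies raises `ValueError` instead of producing a partial order.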

Tsinghua Science and Technology
Pages 227-242
Cite this article:
Ding X, Wang H, Gao Y, et al. Efficient Currency Determination Algorithms for Dynamic Data. Tsinghua Science and Technology, 2017, 22(3): 227-242. https://doi.org/10.23919/TST.2017.7914196


Received: 26 March 2017
Revised: 06 April 2017
Accepted: 11 April 2017
Published: 04 May 2017
© The authors 2017