Efficient Partitioning Method for Optimizing the Compression on Array Data

Shuai Han; Xian-Min Liu; Jian-Zhong Li

doi:10.1007/s11390-022-2371-7

Regular Paper

Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China

Abstract

Array partitioning is an important research problem in array management, since partitioning strategies strongly influence storage, query evaluation, and other components of array management systems. Meanwhile, array data urgently need compression because of their growing volume. Observing that array partitioning can significantly affect compression performance, this paper aims to design an efficient partitioning method for array data that optimizes compression performance. To the best of our knowledge, there has been little research on this problem. First, we propose the problem of array partitioning for optimizing compression performance (PPCP for short). We adopt a popular compression technique that allows queries to be processed on the compressed data without decompression. Second, because PPCP is NP-hard, we introduce two essential principles for exploring partitioning solutions, which capture the core idea of our partitioning algorithms. The first principle shows that compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy that supports the heuristic selection of partitioning positions. Based on these two principles, we design two greedy-strategy-based array partitioning algorithms, one for the independent case and one for the dependent case. Because the algorithm for the dependent case is expensive, we further optimize it with random sampling and dimension grouping, reducing its time cost to linear. Finally, experiments on both synthetic and real-life data show that the two proposed partitioning algorithms achieve better performance in both compression and query evaluation.
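
The two principles can be illustrated with a small sketch. The abstract does not state the compression technique or cost function used in the paper, so the code below assumes a hypothetical cost model and a one-dimensional array; the names `part_cost` and `greedy_partition`, the per-part overhead, and the dense-versus-(offset, value) storage choice are illustrative assumptions, not the authors' algorithms. Under this model, a part costs a fixed overhead plus the cheaper of dense storage (one unit per cell) and sparse storage (two units per non-null cell), and the array is cut greedily at whichever single position lowers the total cost most, recursing on both sides until no cut helps.

```python
from typing import List, Optional, Tuple

Cell = Optional[int]  # None marks an empty (null) cell


def part_cost(cells: List[Cell], lo: int, hi: int, overhead: int = 1) -> int:
    """Hypothetical storage cost of the part cells[lo:hi] (an assumption,
    not the cost model of the paper)."""
    non_null = sum(1 for c in cells[lo:hi] if c is not None)
    # Fixed per-part overhead, plus the cheaper of dense storage
    # (1 unit per cell) and sparse storage (offset + value = 2 units
    # per non-null cell).
    return overhead + min(hi - lo, 2 * non_null)


def greedy_partition(cells: List[Cell], lo: int = 0,
                     hi: Optional[int] = None) -> List[Tuple[int, int]]:
    """Greedily cut [lo, hi) at the cheapest single position while the cut
    strictly lowers the modeled cost; returns the parts as (lo, hi) pairs."""
    if hi is None:
        hi = len(cells)
    best_pos, best_cost = None, part_cost(cells, lo, hi)
    # Try every cut position and keep the one with the lowest total cost.
    for pos in range(lo + 1, hi):
        total = part_cost(cells, lo, pos) + part_cost(cells, pos, hi)
        if total < best_cost:
            best_pos, best_cost = pos, total
    if best_pos is None:  # no single cut helps: keep this part whole
        return [(lo, hi)]
    # Recurse on both sides of the chosen cut.
    return greedy_partition(cells, lo, best_pos) + greedy_partition(cells, best_pos, hi)


# Usage: a dense prefix followed by a mostly empty suffix.
cells = [5, 3, 7, 2, 8, None, None, None, None, 4, None, None]
parts = greedy_partition(cells)
total = sum(part_cost(cells, lo, hi) for lo, hi in parts)
print(parts)  # [(0, 5), (5, 12)]
print(part_cost(cells, 0, len(cells)), total)  # 13 9
```

On the example array, the dense prefix and the sparse suffix have very different sparsities, so a single cut at position 5 lowers the modeled cost from 13 to 9, which mirrors the first principle. The per-part overhead is a deliberate choice in this sketch: without it, splitting can never increase the modeled cost, which would favor ever finer partitions.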

Keywords

array partitioning; compression performance; greedy strategy; NP-hard

Electronic Supplementary Material

jcst-37-5-1049-Highlights.pdf (131.6 KB)
