Scientific instruments and simulation programs are generating large amounts of multidimensional array data. Scientists commonly use queries with value and dimension subsetting conditions to find useful information in large array data, so data storage and indexing methods play an important role in supporting such queries efficiently. In this paper, we propose SwiftArray, a new storage layout with indexing techniques that accelerates queries with value and dimension subsetting conditions. In SwiftArray, the multidimensional array is divided into blocks, and each block stores its values in sorted order. Blocks are placed in the order of a Hilbert space-filling curve to improve data locality for dimension subsetting queries. We propose a 2-D-Bin method that builds an index over the blocks' value ranges, which efficiently avoids accessing unnecessary blocks for value subsetting queries. Our evaluations show that SwiftArray outperforms the NetCDF-4 format and the FastBit indexing technique for queries on multidimensional arrays.
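The layout described above can be illustrated with a minimal Python sketch. It is not the SwiftArray implementation: the function names (`hilbert_index`, `build_blocks`, `value_query`) are hypothetical, the array is assumed to be a square 2-D list whose block grid has a power-of-two side, and a plain per-block minimum/maximum check stands in for the 2-D-Bin index, whose details the abstract does not give. The sketch also drops cell coordinates, which a real layout would need to keep for dimension subsetting.

```python
from bisect import bisect_left, bisect_right


def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve over an n-by-n grid.

    n must be a power of two. This is the standard iterative conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def build_blocks(array, block):
    """Tile a square 2-D list, sort each tile's values, order tiles by Hilbert index."""
    n_blocks = len(array) // block  # assumed to be a power of two
    tiles = []
    for bx in range(n_blocks):
        for by in range(n_blocks):
            values = sorted(
                array[i][j]
                for i in range(bx * block, (bx + 1) * block)
                for j in range(by * block, (by + 1) * block)
            )
            tiles.append((hilbert_index(n_blocks, bx, by), (bx, by), values))
    tiles.sort(key=lambda t: t[0])
    return [(coord, values) for _, coord, values in tiles]


def value_query(blocks, lo, hi):
    """Return values in [lo, hi] and the number of blocks actually read."""
    hits, scanned = [], 0
    for _, values in blocks:
        # Stand-in for the value-range index: skip blocks whose
        # [min, max] range cannot overlap the query range.
        if values[-1] < lo or values[0] > hi:
            continue
        scanned += 1
        # Values are sorted, so the matches form one contiguous slice.
        hits.extend(values[bisect_left(values, lo):bisect_right(values, hi)])
    return hits, scanned


# Usage: a 4x4 array split into 2x2 blocks.
data = [[4 * i + j for j in range(4)] for i in range(4)]
blocks = build_blocks(data, 2)
hits, scanned = value_query(blocks, 6, 9)
```

In this toy run only two of the four blocks overlap the query range, so the other two are never read; sorting inside each block then reduces the in-block work to two binary searches.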
The Gaussian Copula Probability Density Function (PDF) plays an important role in finance, hydrological modeling, biomedical studies, and texture retrieval. However, the existing schemes for evaluating the Gaussian Copula PDF are all computationally demanding, and the evaluation is generally the most time-consuming part of the corresponding applications. In this paper, we propose an FPGA-based design to accelerate the computation of the Gaussian Copula PDF. Specifically, the evaluation of the Gaussian Copula PDF is mapped onto a fully pipelined FPGA dataflow engine using three optimization steps: transforming the calculation pattern, eliminating constant computations from the hardware logic, and extending the calculations to multiple pipelines. In experiments on 10 typical large-scale data sets, our FPGA-based solution achieves a maximum speedup of 1870 times over a well-tuned single-core CPU-based solution, and of 610 times over a well-optimized parallel quad-core CPU-based solution, when processing two-dimensional data.
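For readers unfamiliar with the quantity being accelerated, the following minimal Python sketch evaluates the two-dimensional Gaussian copula density using the standard textbook formula c(u1, u2) = (1 - rho^2)^(-1/2) * exp(-(rho^2 (z1^2 + z2^2) - 2 rho z1 z2) / (2 (1 - rho^2))), where z_i is the standard normal quantile of u_i. It is not the paper's FPGA design. The function name `make_bivariate_copula_pdf` is hypothetical, and hoisting the terms that depend only on rho out of the per-sample path is merely a software analogue of the idea of removing constant computations from the evaluation loop.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def make_bivariate_copula_pdf(rho):
    """Return a function evaluating the 2-D Gaussian copula density for a fixed rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    # Everything below depends only on rho, so it is computed once.
    one_minus_r2 = 1.0 - rho * rho
    scale = 1.0 / sqrt(one_minus_r2)          # |R|^(-1/2)
    a = -rho * rho / (2.0 * one_minus_r2)     # coefficient of z1^2 + z2^2
    b = rho / one_minus_r2                    # coefficient of z1 * z2

    def pdf(u1, u2):
        # inv_cdf raises statistics.StatisticsError unless 0 < u < 1.
        z1 = _STD_NORMAL.inv_cdf(u1)
        z2 = _STD_NORMAL.inv_cdf(u2)
        return scale * exp(a * (z1 * z1 + z2 * z2) + b * z1 * z2)

    return pdf


# Usage: correlated and independent cases.
pdf = make_bivariate_copula_pdf(0.6)
center = pdf(0.5, 0.5)
independent = make_bivariate_copula_pdf(0.0)(0.3, 0.9)
```

At the center point both quantiles are zero, so the density reduces to the scale factor 1 / sqrt(1 - rho^2); with rho = 0 the copula is the independence copula and the density is 1 everywhere.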