Managing software packages in a scientific computing environment is a challenging task, especially on heterogeneous systems. Installing and updating software packages in a sophisticated computing environment is error prone, and on-the-fly testing and performance evaluation are troublesome on a production system. In this paper, we discuss a container-based package management scheme. The newly developed method eases maintenance complexity and reduces human mistakes: the self-containment and isolation features of container technologies help maintain software packages across intricately connected clusters. Deploying the SuperComputing application Store (SCStore) over WAN-connected clusters, including some of the world's largest, shows that the scheme greatly reduces the effort of keeping software environments consistent and helps achieve automation.
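The consistency benefit rests on containers being self-contained and content-addressed: when every cluster runs the same immutable image, divergence can be detected by comparing digests instead of auditing individual packages. A minimal sketch of that idea follows; the function names and the digest-comparison workflow are illustrative assumptions, not SCStore's actual interface.

```python
import hashlib

def image_digest(manifest: bytes) -> str:
    """Content-addressed digest of a container image manifest
    (illustrative; real registries compute this the same way)."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()

def find_drift(reference: str, cluster_digests: dict) -> list:
    """Return the clusters whose deployed image digest differs from
    the reference digest, i.e. whose software environment drifted."""
    return sorted(name for name, d in cluster_digests.items() if d != reference)

# Example: cluster "b" runs an image that no longer matches the reference.
ref = image_digest(b"app-manifest-v1")
deployed = {"cluster-a": ref,
            "cluster-b": image_digest(b"app-manifest-v0"),
            "cluster-c": ref}
print(find_drift(ref, deployed))  # ['cluster-b']
```

Because the comparison touches only digests, the check scales to many WAN-connected sites without transferring package contents.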
Scientific instruments and simulation programs are generating large amounts of multidimensional array data. Queries with value and dimension subsetting conditions are commonly used by scientists to find useful information in big array data, and storage layout and indexing methods play an important role in supporting such queries efficiently. In this paper, we propose SwiftArray, a new storage layout with indexing techniques to accelerate queries with value and dimension subsetting conditions. In SwiftArray, the multidimensional array is divided into blocks, and each block stores its values sorted. Blocks are placed in the order of a Hilbert space-filling curve to improve data locality for dimension subsetting queries. We propose a 2-D-Bin method to build an index over the blocks' value ranges, which efficiently avoids accessing unnecessary blocks for value subsetting queries. Our evaluations show that SwiftArray outperforms the NetCDF-4 format and the FastBit indexing technique for queries on multidimensional arrays.
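The layout described above combines two ideas: Hilbert-curve ordering of blocks preserves spatial locality for dimension subsetting, and a per-block value summary lets value subsetting skip blocks entirely. A toy sketch of both follows; the class and method names are illustrative, and a simple per-block min/max filter stands in for the paper's 2-D-Bin index.

```python
import bisect

def hilbert_d2xy(order, d):
    """Map a distance d along a 2-D Hilbert curve of the given order
    (side length 2**order) to (x, y) block coordinates. Standard
    iterative construction with quadrant rotation."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:          # rotate the quadrant
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

class BlockedArray:
    """Toy SwiftArray-style layout: an n-by-n array is split into
    square blocks, each block keeps its values sorted plus a
    (min, max) summary, and blocks are linearised in Hilbert order."""

    def __init__(self, grid, block):
        n = len(grid)                        # assume n x n, n = block * 2**k
        order = (n // block).bit_length() - 1
        self.blocks = []                     # (sorted values, min, max)
        for d in range((n // block) ** 2):
            bx, by = hilbert_d2xy(order, d)  # Hilbert placement of blocks
            vals = sorted(
                grid[bx * block + i][by * block + j]
                for i in range(block) for j in range(block)
            )
            self.blocks.append((vals, vals[0], vals[-1]))

    def value_subset(self, lo, hi):
        """Count elements with lo <= v <= hi, skipping any block whose
        value range cannot intersect the predicate."""
        total = 0
        for vals, bmin, bmax in self.blocks:
            if bmax < lo or bmin > hi:
                continue                     # pruned without touching data
            total += bisect.bisect_right(vals, hi) - bisect.bisect_left(vals, lo)
        return total
```

Within a surviving block, the sorted storage turns the value predicate into two binary searches, so the cost of a value subsetting query is dominated by the blocks that actually overlap the requested range.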
The Gaussian Copula Probability Density Function (PDF) plays an important role in finance, hydrological modeling, biomedical study, and texture retrieval. However, existing schemes for evaluating the Gaussian Copula PDF are all computationally demanding and generally form the most time-consuming part of the corresponding applications. In this paper, we propose an FPGA-based design to accelerate the computation of the Gaussian Copula PDF. Specifically, the evaluation of the Gaussian Copula PDF is mapped onto a fully-pipelined FPGA dataflow engine through three optimization steps: transforming the calculation pattern, eliminating constant computations from hardware logic, and extending calculations to multiple pipelines. In experiments on 10 typical large-scale data sets, our FPGA-based solution achieves up to a 1870-times speedup over a well-tuned single-core CPU-based solution, and a 610-times speedup over a well-optimized parallel quad-core CPU-based solution when processing two-dimensional data.
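For reference, the scalar computation being accelerated has a short closed form in the bivariate case: map the uniform marginals through the inverse normal CDF, then apply the copula density formula. The sketch below shows that baseline computation in plain Python; it illustrates the workload, not the paper's FPGA implementation, and the function name is ours.

```python
from math import exp, sqrt
from statistics import NormalDist

_std = NormalDist()  # standard normal, provides the inverse CDF

def gaussian_copula_pdf(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for
    0 < u, v < 1 and -1 < rho < 1."""
    x = _std.inv_cdf(u)          # Phi^{-1}(u)
    y = _std.inv_cdf(v)          # Phi^{-1}(v)
    r2 = rho * rho
    return (1.0 / sqrt(1.0 - r2)) * exp(
        -(r2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2))
    )
```

The two inverse-CDF evaluations and the exponential dominate the per-point cost, which is why evaluating this density over large data sets is a natural fit for a fully pipelined dataflow engine.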