3.1 Transforming the calculation pattern
As Eq. (
1
) shows, the computation complexity of the Gaussian Copula PDF is mainly dominated by the evaluation of the function
, which is called
times throughout the evaluation. Moreover, the number of calls will continue to increase proportionally with the increase of the amount of input data. In our design, we implement the ICDF
based on Acklam’s method shown in Algorithm 1. In comparison to other conventional algorithms for the evaluation of
, Acklam’s algorithm is more optimized and requires less arithmetic operations although it still needs more than 60 arithmetic operations to achieve an accurate result. Hence, if we can reduce the computational resource cost of
as much as possible, the global resource cost of the Gaussian Copula PDF will be reduced consequently.
As shown in Algorithm 1, this pseudo code of
performs corresponding calculations based on different regions (lower region, central region, and upper region) that
belongs to by using three IF-ELSE structures. Moreover, the calculations inside these IF-ELSE structures are similar to each other: Each structure includes two degree-5 polynomial evaluations and one division, except that the polynomial coefficients are different in each structure. The pseudo code works well for the CPU architecture and will not incur any resource cost on duplicated operations. Since each
belongs to one of these three regions, only one calculation path will be executed. However, the situation is completely different if we map the same pseudo code onto the FPGA architecture. Specifically, we use the similar polynomial evaluation and division units for each of these three IF-ELSE structures.
In order to deal with the cost of the duplicated arithmetic units, we use the multiplexer (MUX) to select the corresponding coefficients and share the two polynomial evaluation units and the one division unit among the three different regions. More specifically, looking at the numerators of these three division operations in Algorithm 1 (see lines 25, 29 and 33), if we ignore the multiplication by
in the low-region and high-region calculations, they are polynomials of degree 5. Therefore, we can multiplex the coefficients. Then, the additional multiplication by
in the central region’s numerator can be matched by a multiplication by 1 in the low-region’s and the high-region’s numerators. Next, the denominators have different degrees, but this can be easily handled by adding 0 as the top-degree coefficient for the low-region’s and the high region’s denominators. Thus, after multiplexing these coefficients, the new calculation process of Acklam’s ICDF algorithm is shown in Algorithm 2, which contains only one division and two polynomials, leading to an FPGA-friendly algorithm.
In Algorithm 2, lines 6-8 correspond to the selection of polynomial coefficients for the computation of the division’s numerator; lines 10-12 correspond to the selection of polynomial coefficients for the computation of the division’s denominator. Lines 9-13 correspond to the calculation process of the numerator and the denominator for division. Compared with Algorithm 1, the new Acklam’s ICDF algorithm (see Algorithm 2) will lead to a significant saving of the computational resources since the calculations of six polynomials and two divisions are reduced to two and one, respectively.