Regular Paper
Minimal Context-Switching Data Race Detection with Dataflow Tracking
Journal of Computer Science and Technology 2024, 39(1): 211-226
Published: 25 January 2024
Abstract

Data races are among the most important concurrency anomalies in multi-threaded programs. Emerging constraint-based techniques have been applied to race detection and can find all the races that any other sound race detector can find. However, this constraint-based approach has serious limitations in helping programmers analyze and understand data races. First, it may report a large number of false positives because it does not recognize dataflow propagation in the program. Second, it recommends a wide range of thread context switches to schedule each reported race (including the false ones) whenever that race is exposed during the constraint-solving process. This ad hoc recommendation imposes too many context switches, which complicates data race analysis. To address these two limitations of state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector that recommends each data race with minimal thread context switches. Specifically, we reduce the false positives by analyzing and tracking dataflow in the program, which allows DFTracker to avoid unnecessary analysis of false race schedules. We further propose a novel algorithm that recommends an effective race schedule with minimal thread context switches for each data race. Our experimental results on real applications demonstrate that 1) without removing any true data race, DFTracker prunes 68% of the false positives reported by the state-of-the-art constraint-based race detector, and 2) DFTracker recommends as few as 2.6–8.3 (4.7 on average) thread context switches per data race on real-world applications, 81.6% fewer per data race than the state-of-the-art constraint-based race detector. DFTracker can therefore serve as an effective tool for programmers to understand data races.
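To make the notions of a data race and a race-exposing thread schedule concrete, the following is a minimal, self-contained C example (an illustration, not code from the paper): two threads update a shared counter without synchronization, and a single context switch between one thread's load and store is enough to expose the race.

/* Minimal illustration of a data race (not from the DFTracker paper).
 * Two threads increment a shared counter without synchronization;
 * the read-modify-write is not atomic, so a context switch between
 * the load and the store of one thread can lose an update. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;          /* shared, unprotected */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        /* racy read-modify-write: load counter, add 1, store back */
        counter = counter + 1;
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but lost updates make the result smaller
     * whenever the racy interleaving occurs. */
    printf("counter = %ld\n", counter);
    return 0;
}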

Regular Paper
Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads
Journal of Computer Science and Technology 2023, 38(4): 807-820
Published: 06 December 2023
Abstract

Computer vision (CV) algorithms are now used in a myriad of applications. Because multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, which perform the same operation on multiple data items in a single instruction, are widely employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our evaluation shows that, compared with the baseline OpenCV implementations using scalar instructions, equivalent implementations using RV-V (version 0.8) reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement, measured in cycle counts, depends heavily on the specific implementation of the underlying RV-V co-processor. In our evaluation, using the vector co-processor (with eight execution lanes) of the Xuantie C906, the vector versions of the CV algorithms exhibit performance speedups of up to 2.98x on average over their scalar counterparts.
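As a concrete reference point for the kind of kernel being vectorized, below is a minimal scalar grayscale-conversion loop in C (a sketch, not the paper's OpenCV or RV-V code), using the common fixed-point BT.601 luma weights. An RV-V version would apply the same arithmetic to a whole vector of pixels per instruction, which is where the reported instruction-count reduction comes from.

/* Sketch of a scalar grayscale kernel (not the paper's implementation).
 * Converts interleaved RGB pixels to 8-bit luma using fixed-point
 * BT.601 weights (77/256 ~ 0.299, 150/256 ~ 0.587, 29/256 ~ 0.114).
 * An RV-V version would process a vector of pixels per instruction. */
#include <stdint.h>
#include <stddef.h>

void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
    }
}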

Regular Paper
FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers
Journal of Computer Science and Technology 2021, 36(5): 1051-1070
Published: 30 September 2021
Abstract

With the rapid growth of real-world graphs, whose size can easily exceed the on-chip (on-board) storage capacity of an accelerator, processing large-scale graphs on a single Field Programmable Gate Array (FPGA) becomes difficult, making multi-FPGA acceleration both necessary and important. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can easily scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modification. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib with AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph is 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, respectively, with better scalability.
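FDGLib's actual APIs and partitioning scheme are not detailed here, but the idea of a torus-friendly placement can be sketched: assign each vertex to an FPGA node on a 2D torus so that most communication stays between neighboring nodes. The hash-based mapping below is purely illustrative; the function names and the 4x8 grid shape are assumptions (the paper only reports a 32-node testbed), not FDGLib's API.

/* Illustrative sketch of placing graph vertices on a 2D torus of FPGA
 * nodes; this is NOT FDGLib's API, just a toy hash-based placement. */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

#define TORUS_ROWS 4   /* assumed 4 x 8 = 32-node torus; grid shape is an assumption */
#define TORUS_COLS 8

typedef struct { int row; int col; } node_id;

/* Map a vertex to a torus node; a real partitioner would also try to
 * keep heavily connected vertices on neighboring nodes. */
static node_id place_vertex(uint64_t vertex)
{
    node_id n;
    n.row = (int)(vertex % TORUS_ROWS);
    n.col = (int)((vertex / TORUS_ROWS) % TORUS_COLS);
    return n;
}

/* Hop distance between two nodes on the torus (wrap-around links),
 * i.e., the cost a torus-friendly partition tries to minimize. */
static int torus_hops(node_id a, node_id b)
{
    int dr = abs(a.row - b.row), dc = abs(a.col - b.col);
    if (dr > TORUS_ROWS / 2) dr = TORUS_ROWS - dr;
    if (dc > TORUS_COLS / 2) dc = TORUS_COLS - dc;
    return dr + dc;
}

int main(void)
{
    node_id u = place_vertex(12345), v = place_vertex(67890);
    printf("u -> (%d,%d), v -> (%d,%d), hops = %d\n",
           u.row, u.col, v.row, v.col, torus_hops(u, v));
    return 0;
}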
