Abstract
Object tracking remains a challenging problem in computer vision. In this paper, we present a novel method that models target appearance and combines it with structured output learning for robust online tracking within a tracking-by-detection framework. We use both convolutional features and hand-crafted features to robustly encode the target appearance. First, we extract convolutional features of the target using kernels generated from the initial annotated frame. To capture appearance variation during tracking, we propose a new strategy for updating the target and background kernel pools. Second, we employ a structured output SVM to refine the target’s location, which mitigates the uncertainty in labeling samples as positive or negative. Compared with existing state-of-the-art trackers, our method not only enhances the robustness of the feature representation, but also uses structured output prediction to avoid relying on heuristic intermediate steps to produce labeled binary samples. Extensive experimental evaluation on the challenging OTB-50 video sequences shows competitive results in terms of both success rate and precision rate, demonstrating the merits of the proposed tracking method.
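As a rough illustration of the first step, the NumPy sketch below extracts convolutional feature maps with kernels taken from the annotated target. It rests on assumptions not stated in the abstract: kernels are zero-mean, unit-norm patches sampled at random from the target region, and the function names `sample_kernels` and `conv_features`, the patch size, and the kernel count are hypothetical. The kernel pool update, the hand-crafted features, and the structured output SVM are not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_kernels(target, size, count, rng):
    """Cut `count` random size x size patches from the target and normalize them.

    Assumption: random patch sampling stands in for the kernel generation
    step, which the abstract does not specify.
    """
    h, w = target.shape
    ys = rng.integers(0, h - size + 1, count)
    xs = rng.integers(0, w - size + 1, count)
    patches = np.stack([target[y:y + size, x:x + size] for y, x in zip(ys, xs)])
    patches = patches - patches.mean(axis=(1, 2), keepdims=True)  # zero mean
    norms = np.linalg.norm(patches, axis=(1, 2), keepdims=True)
    return patches / np.maximum(norms, 1e-8)  # unit L2 norm per kernel


def conv_features(image, kernels):
    """Correlate each kernel with the image ('valid' mode): one map per kernel."""
    size = kernels.shape[-1]
    windows = sliding_window_view(image, (size, size))  # (H-s+1, W-s+1, s, s)
    return np.einsum("ijkl,nkl->nij", windows, kernels)


rng = np.random.default_rng(0)
first_frame_target = rng.random((32, 32))  # synthetic stand-in for the target patch
kernels = sample_kernels(first_frame_target, size=6, count=20, rng=rng)
features = conv_features(first_frame_target, kernels)
print(features.shape)
```

With a 32 x 32 target, 6 x 6 kernels, and 20 kernels, `features` holds 20 maps of 27 x 27 responses each. In a tracker, the same kernels would be applied to candidate windows in later frames so that their feature maps can be compared with the target's.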