We propose a novel end-to-end deep learning framework, the Joint Matting Network (JMNet), to automatically generate alpha mattes for human images. We exploit the intrinsic structure of the human body as seen in images by introducing a pose estimation module, which provides both global structural guidance and a local attention focus for the matting task. Our network model includes a pose network, a trimap network, a matting network, and a shared encoder that extracts features for all three networks. We also append a trimap refinement module and use a gradient loss to produce sharper alpha mattes. Extensive experiments show that our method outperforms state-of-the-art human matting techniques and that the shared encoder improves performance while lowering memory cost. Our model can process real images downloaded from the Internet for use in compositing applications.
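To make the idea of a gradient loss concrete, here is a minimal sketch of one common formulation: the mean L1 distance between the finite-difference gradients of the predicted and ground-truth mattes. This is an assumption for illustration. JMNet's exact loss, weighting, and gradient operator are not specified here. The names `finite_diff` and `gradient_loss` are hypothetical. Mattes are plain nested lists of alpha values in [0, 1].

```python
from typing import List, Tuple

Matte = List[List[float]]  # alpha values in [0, 1], indexed [row][col]


def finite_diff(alpha: Matte) -> Tuple[Matte, Matte]:
    """Forward differences along x (columns) and y (rows)."""
    h, w = len(alpha), len(alpha[0])
    gx = [[alpha[r][c + 1] - alpha[r][c] for c in range(w - 1)] for r in range(h)]
    gy = [[alpha[r + 1][c] - alpha[r][c] for c in range(w)] for r in range(h - 1)]
    return gx, gy


def gradient_loss(pred: Matte, gt: Matte) -> float:
    """Hypothetical gradient loss: mean L1 distance between matte gradients.

    Blurry edges in `pred` spread a sharp ground-truth step over several
    pixels, which changes the gradients and is therefore penalized.
    """
    pgx, pgy = finite_diff(pred)
    tgx, tgy = finite_diff(gt)
    diffs = [abs(a - b) for pr, tr in zip(pgx, tgx) for a, b in zip(pr, tr)]
    diffs += [abs(a - b) for pr, tr in zip(pgy, tgy) for a, b in zip(pr, tr)]
    return sum(diffs) / len(diffs) if diffs else 0.0


# A sharp ground-truth edge versus a blurred prediction of the same edge.
gt = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
blurred = [[0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0]]
print(gradient_loss(blurred, gt))  # 0.4
```

A per-pixel L1 loss on alpha alone can tolerate a soft transition. Because the gradient term compares how alpha *changes*, it adds pressure toward the crisp boundaries described above.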
It is challenging to track a target continuously in videos with long-term occlusion, or when objects leave and then re-enter a scene. Existing tracking algorithms combined with online-trained object detectors perform unreliably in complex conditions and can only provide discontinuous trajectories, with jumps in position when the object is occluded. This paper proposes a novel tracking-by-detection framework using selection and completion to solve these problems. It has two components: tracking and trajectory completion. In the tracking component, an offline-trained object detector, based on a highly accurate deep learning model, localizes objects in the same category as the tracked object. An object selector then determines which detected object should be used to re-initialize a traditional tracker. Because the object selector is trained online, the framework can adapt to the target. During completion, a predictive non-linear autoregressive neural network fills in the gaps of any discontinuous trajectory. The tracking component is an online real-time algorithm, whereas the completion component is an after-the-event mechanism. Quantitative experiments show a significant improvement in robustness over prior state-of-the-art methods.
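The completion step can be illustrated by its autoregressive structure: each missing position is predicted from the last few known or already-predicted positions, and every prediction is fed back in as history. The sketch below is not the paper's method. It substitutes a simple constant-velocity extrapolator (`constant_velocity`, hypothetical) for the trained non-linear autoregressive neural network. A trained model with the same signature could be passed as `predict` instead. `complete_trajectory` and its `order` parameter are likewise illustrative assumptions. The sketch only predicts forward through each gap and does not use the re-entry point.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Predictor = Callable[[List[Point]], Point]


def constant_velocity(history: List[Point]) -> Point:
    """Stand-in predictor: repeat the last displacement (needs >= 2 points)."""
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def complete_trajectory(
    track: List[Optional[Point]], predict: Predictor, order: int = 2
) -> List[Point]:
    """Fill None gaps autoregressively.

    Each missing position is predicted from the last `order` positions, and
    the prediction is appended so that later gap frames build on it.
    """
    filled: List[Point] = []
    for pos in track:
        if pos is None:
            if len(filled) < order:
                raise ValueError("not enough history before the gap")
            pos = predict(filled[-order:])
        filled.append(pos)
    return filled


# Frames 3 and 4 are missing (e.g. the target was occluded).
track = [(0, 0), (1, 1), (2, 2), None, None, (5, 5)]
print(complete_trajectory(track, constant_velocity))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

Because completion runs after the event, a real implementation could also use positions after the gap. This sketch keeps only the forward, feed-back-the-prediction loop that defines an autoregressive predictor.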
Current image-editing tools do not meet the demands of personalized image manipulation, one application of which is changing clothes in user-captured images. Previous work can change single-color clothes using parametric human warping methods. In this paper, we propose an image-based clothes-changing system that exploits body factor extraction and content-aware image warping. Image segmentation and mask generation are first applied to the user input. Next, we determine joint positions with a neural network. Body shape matching is then performed, and the shape of the model is warped to the user's shape. Finally, head swapping is performed to produce realistic virtual results. We also provide a supervision and labeling tool that supports refinement and assists in dataset creation.
This paper surveys the state of the art in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated in an area called “texture synthesis”, which focused on creating regular or semi-regular textures from small exemplars. More recently, however, much research has focused on synthesizing larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Recent work has also customized the synthesis process for particular problem domains, such as synthesis of artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with particular emphasis on papers published since 2009, when the last survey in this area was published. The survey can serve as a tutorial for readers who are not yet familiar with these topics. It also compares these papers and highlights open problems in this area.
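The "copying small regions from exemplar imagery" idea usually rests on a matching step. For each patch of the output (or target), we find the most similar exemplar patch to copy from. Below is a minimal brute-force sketch of that step. It builds a nearest-neighbor field under a sum-of-squared-differences (SSD) distance on grayscale images stored as nested lists. The function names and the 2×2 default patch size are illustrative assumptions. Practical systems replace this exhaustive search with faster approximate methods such as PatchMatch.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale intensities, indexed [row][col]
Coord = Tuple[int, int]    # top-left corner (row, col) of a patch


def patch(img: Image, r: int, c: int, size: int) -> List[float]:
    """Flatten the size x size patch whose top-left corner is (r, c)."""
    return [img[r + dr][c + dc] for dr in range(size) for dc in range(size)]


def ssd(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def corners(img: Image, size: int) -> List[Coord]:
    """All valid top-left corners of size x size patches in img."""
    return [(r, c) for r in range(len(img) - size + 1)
            for c in range(len(img[0]) - size + 1)]


def nearest_neighbor_field(target: Image, exemplar: Image, size: int = 2) -> Dict[Coord, Coord]:
    """Map each target patch to the exemplar patch with the lowest SSD (brute force)."""
    field: Dict[Coord, Coord] = {}
    for t in corners(target, size):
        tp = patch(target, *t, size)
        field[t] = min(corners(exemplar, size),
                       key=lambda e: ssd(tp, patch(exemplar, *e, size)))
    return field


exemplar = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
target = [[4, 5], [7, 8]]
print(nearest_neighbor_field(target, exemplar))  # {(0, 0): (1, 1)}
```

Once the field is known, a synthesis method copies (and typically blends) the matched exemplar pixels into the output. The brute-force search costs one SSD evaluation per target-exemplar patch pair, which is why faster approximate search matters for the larger imagery discussed above.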