Recently, facial-expression recognition (FER) research has shifted from laboratory images to images in the wild, which involve factors such as face occlusion and image blurring. Complex field environments have introduced new challenges to FER. To address these challenges, this study proposes a cross-fusion dual-attention network. The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism to refine local features and obtain global information; (2) a proposed
We propose a method for generating a ruled B-spline surface that fits a sequence of pre-defined ruling lines and is required to be as developable as possible; the terminal ruling lines are treated as hard constraints. Unlike existing methods, which compute a quasi-developable surface from two boundary curves and cannot achieve explicit ruling control, our method controls ruling lines in an intuitive way and serves as an effective tool for computing quasi-developable surfaces from freely designed rulings. We treat this problem from the point of view of numerical optimization and solve for surfaces meeting the distance-error tolerance allowed in applications. The performance and efficacy of the proposed method are demonstrated by experiments on a variety of models, including an application to path planning in 5-axis computer numerical control (CNC) flank milling.
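The developability requirement has a simple discrete check: a ruled surface is developable exactly when the tangent plane is constant along each ruling, and a discrete analogue is that consecutive ruling lines are coplanar. The sketch below measures this via scalar triple products; it illustrates the measure only, not the paper's optimization, and the function name is ours:

```python
import numpy as np

def ruling_planarity_error(A, B):
    """A, B: (n, 3) arrays of ruling endpoints (ruling i runs from A[i] to B[i]).
    For a developable discrete ruled surface, consecutive rulings are coplanar,
    so each scalar triple product below vanishes."""
    r = B - A  # ruling direction vectors
    err = []
    for i in range(len(A) - 1):
        # det of [r_i, r_{i+1}, base displacement]: zero iff the two rulings are coplanar
        err.append(abs(np.linalg.det(np.stack([r[i], r[i + 1], A[i + 1] - A[i]]))))
    return np.array(err)
```

A quasi-developability objective can then penalize the sum of these errors while fitting the prescribed rulings.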
Image segmentation is a basic problem in medical image analysis and useful for disease diagnosis. However, the complexity of medical images makes image segmentation difficult. In recent decades, fuzzy clustering algorithms have been preferred due to their simplicity and efficiency. However, they are sensitive to noise. To solve this problem, many algorithms using non-local information have been proposed, which perform well but are inefficient. This paper proposes an improved fuzzy clustering algorithm utilizing non-local self-similarity and a low-rank prior for image segmentation. Firstly, cluster centers are initialized based on peak detection. Then, a pixel correlation model between corresponding pixels is constructed, and similar pixel sets are retrieved. To improve efficiency and robustness, the proposed algorithm uses a novel objective function combining non-local information and a low-rank prior. Experiments on synthetic images and medical images illustrate that the algorithm can improve efficiency greatly while achieving satisfactory results.
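The baseline these improvements build on is standard fuzzy c-means, which alternates membership and center updates; the paper's non-local and low-rank terms modify the objective but not this overall iteration. A minimal sketch of plain FCM on 1-D pixel intensities (the non-local and low-rank terms are omitted; parameter names are ours):

```python
import numpy as np

def fcm(data, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means on a 1-D intensity array.
    data: (n,) pixel intensities; c: number of clusters; m: fuzzifier."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)              # random fuzzy memberships
    for _ in range(iters):
        um = u ** m
        centers = (um @ data) / um.sum(axis=1)     # weighted cluster centers
        d = np.abs(centers[:, None] - data[None, :]) + 1e-12  # |x_j - v_i|
        u = d ** (-2.0 / (m - 1))                  # standard membership update
        u /= u.sum(axis=0, keepdims=True)
    return centers, u
```

Segmentation labels are then obtained by taking the cluster of maximum membership per pixel.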
Smoothing images, especially those with rich texture, is an important problem in computer vision. Obtaining an ideal result is difficult due to the complexity, irregularity, and anisotropy of texture. Moreover, texture and structure share some properties in an image, so retaining structure while removing texture requires a hard compromise. An ideal image-smoothing algorithm faces three problems: (1) for images with rich texture, the smoothing effect should be enhanced; (2) inconsistency of smoothing results across different parts of the image must be overcome; and (3) a method to evaluate the smoothing effect is needed. We apply texture pre-removal based on global sparse decomposition with a variable smoothing parameter to solve the first two problems. A parametric surface constructed by an improved Bessel method is used to determine the smoothing parameter. Three evaluation measures, namely edge integrity rate, texture removal rate, and gradient value distribution, are proposed to cope with the third problem. We use the alternating direction method of multipliers (ADMM) to solve the whole model and obtain the results. Experiments show that our algorithm outperforms existing algorithms both visually and quantitatively. We also demonstrate our method's ability in other applications such as clip-art compression artifact removal and content-aware image manipulation.
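As a generic illustration of the solver (ADMM applied to a simple smoothing objective, not the paper's full model with texture pre-removal), here is 1-D total-variation denoising, min_x ½‖x − y‖² + λ‖Dx‖₁, where D is the forward-difference operator:

```python
import numpy as np

def tv_denoise_admm(y, lam=1.0, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||x - y||^2 + lam*||D x||_1 (1-D total variation)."""
    n = len(y)
    D = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]   # (n-1, n) forward differences
    x = y.copy()
    z = D @ x
    u = np.zeros(n - 1)                                # scaled dual variable
    A = np.eye(n) + rho * D.T @ D                      # x-update system matrix
    for _ in range(iters):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))           # quadratic x-step
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)   # soft threshold
        u += D @ x - z                                            # dual ascent
    return x
```

The same split-and-threshold pattern extends to sparse high-frequency-gradient objectives; only the operator and the proximal step change.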
This paper proposes a kernel-blending connection approximated by a neural network (KBNN) for image classification. A kernel mapping connection structure, guaranteed by the function approximation theorem, is devised to blend feature extraction and feature classification through neural network learning. First, a feature extractor learns features from the raw images. Next, an automatically constructed kernel mapping connection maps the feature vectors into a feature space. Finally, a linear classifier is used as an output layer of the neural network to provide classification results. Furthermore, a novel loss function involving a cross-entropy loss and a hinge loss is proposed to improve the generalizability of the neural network. Experimental results on three well-known image datasets illustrate that the proposed method has good classification accuracy and generalizability.
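The blended loss can be made concrete; the weighting scheme below (a convex combination with weight α, and a Crammer–Singer-style multiclass hinge) is our assumption, since the abstract does not give the exact form:

```python
import numpy as np

def blended_loss(logits, labels, alpha=0.5):
    """Convex blend of cross-entropy and multiclass hinge loss (hypothetical form).
    logits: (n, k) scores; labels: (n,) integer class indices."""
    idx = np.arange(len(labels))
    # numerically stable cross-entropy
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -logp[idx, labels].mean()
    # multiclass hinge: penalize wrong classes within margin 1 of the correct score
    correct = logits[idx, labels]
    margins = np.maximum(0.0, logits - correct[:, None] + 1.0)
    margins[idx, labels] = 0.0
    hinge = margins.max(axis=1).mean()
    return alpha * ce + (1.0 - alpha) * hinge
```

The cross-entropy term drives probability calibration while the hinge term enforces a margin, which is one plausible reading of the generalization benefit claimed.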
With the growing popularity of somatosensory interaction devices, human action recognition is becoming attractive in many application scenarios. Skeleton-based action recognition is effective because the skeleton can represent the positions and structure of key points of the human body. In this paper, we leverage spatiotemporal vectors between skeleton sequences as the input feature representation of the network, which is more sensitive to changes in the human skeleton than representations based on distance and angle features. In addition, we redesign residual blocks with different strides along the depth of the network to improve the ability of temporal convolutional networks (TCNs) to process actions with long-term temporal dependencies. We propose two-stream temporal convolutional networks (TS-TCNs) that take full advantage of the inter-frame and intra-frame vector features of skeleton sequences in the spatiotemporal representations. The framework integrates different feature representations of skeleton sequences so that the two representations compensate for each other's shortcomings. A fusion loss function is used to supervise the training of the two branch networks. Experiments on public datasets show that our network achieves superior performance, with an improvement of 1.2% over the recent GCN-based (BGC-LSTM) method on the NTU RGB+D dataset.
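The two vector features can be sketched directly: inter-frame vectors are joint displacements between consecutive frames, and intra-frame vectors relate joints within one frame (here to a reference joint, joint 0; the exact reference choice is our assumption):

```python
import numpy as np

def skeleton_vectors(seq):
    """seq: (T, J, 3) array of 3-D joint coordinates over T frames.
    Returns:
      inter: (T-1, J, 3) per-joint displacement between consecutive frames;
      intra: (T, J, 3) vector from a reference joint (joint 0, assumed) to each joint."""
    inter = seq[1:] - seq[:-1]
    intra = seq - seq[:, :1, :]
    return inter, intra
```

The two arrays would feed the two TCN branches, one capturing motion and the other per-frame pose geometry.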
Single image super-resolution aims to generate a high-resolution image from a low-resolution one and has been a research hotspot owing to its important applications. This paper proposes a novel method based entirely on the single input image itself. Firstly, a local-feature-based interpolation method that takes both edge-pixel properties and location information into consideration is presented to obtain a better initialization. Then, a dynamic lightweight database of self-examples is built with the aid of our in-depth study of self-similarity, from which adaptive linear regressions are learned to directly map a low-resolution patch to its high-resolution version. Furthermore, a gradual upscaling strategy accompanied by iterative optimization is employed to enhance consistency at each step. Even without any external information, extensive comparisons with state-of-the-art methods on standard benchmarks demonstrate the competitive performance of the proposed scheme in both visual effect and objective evaluation.
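The learned patch mapping can be sketched as a regularized least-squares regression from flattened LR patch vectors to HR patch vectors (a ridge form; the regularization and names are our assumptions, and the paper learns such regressions adaptively per neighborhood):

```python
import numpy as np

def learn_patch_mapping(lr, hr, lam=1e-3):
    """Ridge regression M minimizing ||lr @ M - hr||_F^2 + lam * ||M||_F^2.
    lr: (n, d_lr) flattened LR patches; hr: (n, d_hr) flattened HR patches."""
    A = lr.T @ lr + lam * np.eye(lr.shape[1])
    return np.linalg.solve(A, lr.T @ hr)

def apply_mapping(lr_patch, M):
    """Map one (or a batch of) LR patch vector(s) to HR patch vector(s)."""
    return lr_patch @ M
```

At test time, each LR patch selects (or collects) its self-examples, learns such an M, and is mapped directly to its HR version.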
Image smoothing is a crucial image processing topic with wide applications. For images with rich texture, most existing image smoothing methods struggle to achieve significant texture removal because texture containing obvious edges and large gradient changes is easily preserved as main edges. In this paper, we propose a novel framework (DSHFG) for image smoothing of texture images, combined with a sparse high-frequency gradient constraint. First, we decompose the image into two components: a smooth (constant) component and a non-smooth (high-frequency) component. Second, we remove the non-smooth component containing high-frequency gradients and smooth the other component under the sparse high-frequency gradient constraint. Experimental results demonstrate that the proposed method is more competitive in efficient texture removal than state-of-the-art methods. Moreover, our approach has a variety of applications, including edge detection, detail magnification, image abstraction, and image composition.
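The first step, splitting an image into smooth and high-frequency components, can be illustrated with a simple low-pass decomposition (a box filter stand-in; the paper's actual decomposition is not specified in the abstract):

```python
import numpy as np

def decompose(img, k=5):
    """Split an image into a smooth (low-frequency) component and a
    high-frequency residual using a k x k box filter with edge padding."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    smooth = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            smooth += p[di:di + img.shape[0], dj:dj + img.shape[1]]
    smooth /= k * k
    return smooth, img - smooth   # img == smooth + high by construction
```

The residual carries the high-frequency gradients that the framework suppresses; the smooth component is then processed under the sparsity constraint.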
This paper proposes a novel method for image magnification that exploits the property that image intensity varies quickly along the gradient direction. It aims to maintain sharp edges and clear details. The proposed method first calculates the gradient of the low-resolution image by fitting a surface with quadratic polynomial precision. Then, bicubic interpolation is used to obtain initial gradients of the high-resolution (HR) image. The initial gradients are readjusted to find the constrained gradients of the HR image according to spatial correlations between gradients within a local window. To generate an HR image with high precision, a linear surface weighted by the projection length in the gradient direction is constructed, and each pixel in the HR image is determined by this surface. Experimental results demonstrate that our method visually improves the quality of the magnified image; in particular, it avoids jagged edges and blurring during magnification.
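The first step, estimating gradients with quadratic polynomial precision, can be sketched on a 3×3 window: least-squares fitting of a + bx + cy + dx² + exy + fy² on the stencil decouples, giving b = Σx·f/6 and c = Σy·f/6. This is one possible discretization; the paper's exact fitting scheme may differ:

```python
import numpy as np

def quad_fit_gradient(img):
    """Gradient via least-squares quadratic surface fit on each 3x3 window.
    On the stencil x, y in {-1, 0, 1}, the linear coefficients of the fit
    reduce to b = sum(x * f) / 6 and c = sum(y * f) / 6."""
    gx = np.zeros_like(img, dtype=float)   # d/d(column)
    gy = np.zeros_like(img, dtype=float)   # d/d(row)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            w = img[i - 1:i + 2, j - 1:j + 2].astype(float)
            gx[i, j] = (w[:, 2].sum() - w[:, 0].sum()) / 6.0
            gy[i, j] = (w[2].sum() - w[0].sum()) / 6.0
    return gx, gy
```

Because the fit has quadratic precision, the gradient is exact for any quadratic intensity surface, including linear ramps.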