The popularity of online home design and floor plan customization has been steadily increasing. However, manually converting floor plan images from books or other paper materials into electronic form remains challenging, given the vast amount of historical data involved. Leveraging neural networks to recognize and parse floor plans can significantly streamline this conversion. In this paper, we present a novel learning framework for automatically parsing floor plan images. Our key insight is that room-type text is common and crucial in floor plan images, as it identifies the semantic type of the corresponding room; yet this clue is rarely considered in previous learning-based methods. We therefore propose the Row and Column network (RC-Net), which recognizes floor plan elements by integrating text features. Specifically, we add a text feature branch to the network that extracts features corresponding to the room-type text to guide room type prediction. More importantly, we formulate a Row and Column constraint module (RC constraint module) that shares and constrains features across entire rows and columns of the feature maps, encouraging a single predicted type within each room and making the segmentation boundaries between different rooms more regular and cleaner. Extensive experiments on three benchmark datasets validate that our framework substantially outperforms other state-of-the-art approaches in terms of FWIoU, mACC, and mIoU.
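To make the row-and-column idea concrete, here is a minimal sketch, assuming PyTorch, of how row-wise and column-wise feature sharing could be wired into a segmentation backbone. The module name RCConstraint and the mean-pooling plus 1×1-fusion design are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class RCConstraint(nn.Module):
    """Illustrative row/column feature-sharing block (not the authors' code).

    Each position receives the mean feature of its entire row and column,
    encouraging consistent per-room type predictions along both axes.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution fuses the original, row-pooled, and column-pooled features
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the segmentation backbone
        row = x.mean(dim=3, keepdim=True).expand_as(x)  # share along each row
        col = x.mean(dim=2, keepdim=True).expand_as(x)  # share along each column
        return self.fuse(torch.cat([x, row, col], dim=1))
```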
We address the 3D shape assembly of multiple geometric pieces without overlaps, a scenario often encountered in 3D shape design, field archeology, and robotics. Existing methods depend on strong assumptions about the number of shape pieces and on coherent geometry or semantics across pieces. Although 3D registration with complex or low-overlap patterns has attracted increasing attention, few methods consider shape assembly with rare overlaps. To address this problem, we present a novel framework inspired by puzzle solving, named PuzzleNet, which conducts multi-task learning by leveraging both 3D alignment and boundary information. Specifically, we design an end-to-end neural network based on a point cloud transformer with two branches that estimate the rigid transformation and predict boundaries simultaneously. The framework naturally extends to reassembling multiple pieces into a full shape via an iterative greedy approach based on the distance between each pair of candidate-matched pieces. To train and evaluate PuzzleNet, we construct two datasets, DublinPuzzle and ModelPuzzle, based on a real-world urban scan dataset (DublinCity) and a synthetic CAD dataset (ModelNet40), respectively. Experiments demonstrate our effectiveness in solving 3D shape assembly for multiple pieces with arbitrary geometry and inconsistent semantics. Our method surpasses state-of-the-art algorithms by more than 10 times on rotation metrics and 4 times on translation metrics.
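The iterative greedy multi-piece step can be sketched as follows, treating the pairwise alignment network and a pair distance (e.g., chamfer distance) as black boxes. All names here are hypothetical stand-ins, not PuzzleNet's actual interface.

```python
import numpy as np

def greedy_assemble(pieces, predict_transform, pair_distance):
    """Illustrative greedy multi-piece assembly (not the authors' exact code).

    pieces: list of (N_i, 3) point clouds.
    predict_transform(src, dst) -> (R, t): pairwise alignment network (black box).
    pair_distance(a, b) -> float: e.g., chamfer distance between two pieces.
    """
    pieces = [p.copy() for p in pieces]
    while len(pieces) > 1:
        # Find the candidate pair with the smallest distance after alignment.
        best = None
        for i in range(len(pieces)):
            for j in range(i + 1, len(pieces)):
                R, t = predict_transform(pieces[i], pieces[j])
                moved = pieces[i] @ R.T + t
                d = pair_distance(moved, pieces[j])
                if best is None or d < best[0]:
                    best = (d, i, j, moved)
        d, i, j, moved = best
        # Merge the best-matched pair into a single partial shape.
        merged = np.concatenate([moved, pieces[j]], axis=0)
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)]
        pieces.append(merged)
    return pieces[0]
```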
Specular highlight detection and removal is a fundamental problem in computer vision and image processing. In this paper, we present an efficient end-to-end deep learning model for automatically detecting and removing specular highlights in a single image. In particular, an encoder–decoder network detects specular highlights, and a novel Unet-Transformer network then performs highlight removal; transformer modules are appended in place of plain feature maps in the Unet architecture. We also introduce the highlight detection output as a mask that guides the removal task, so the two networks can be jointly trained in an effective manner. Thanks to the hierarchical and global properties of the transformer mechanism, our framework is able to establish relationships between successive self-attention layers, making it possible to directly model the mapping between the diffuse area and the specular highlight area and to reduce indeterminacy within areas containing strong specular reflections. Experiments on public benchmarks and real-world images demonstrate that our approach outperforms state-of-the-art methods on both highlight detection and removal.
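A minimal sketch of the joint wiring, assuming PyTorch: the detector's mask is concatenated with the input image to guide the removal network, letting both be trained end to end. The class name and the 4-channel input convention are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskGuidedRemoval(nn.Module):
    """Illustrative joint detection/removal wiring (not the authors' code)."""
    def __init__(self, detector: nn.Module, remover: nn.Module):
        super().__init__()
        self.detector = detector  # encoder-decoder, outputs 1-channel mask logits
        self.remover = remover    # Unet-Transformer-style net, takes image + mask

    def forward(self, img: torch.Tensor):
        # img: (B, 3, H, W)
        mask = torch.sigmoid(self.detector(img))  # highlight probability map
        x = torch.cat([img, mask], dim=1)         # mask guides the removal task
        diffuse = self.remover(x)                 # highlight-free prediction
        return mask, diffuse
```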
Recent learning-based approaches show promising performance improvements on the scene text removal task, but usually leave remnants of text and produce visually unpleasant results. In this work, we propose a novel end-to-end framework based on accurate text stroke detection. Specifically, we decouple the text removal problem into text stroke detection and stroke removal, and design separate networks for these two subproblems, the latter being a generative network. The two networks are combined into a processing unit, which is cascaded to obtain our final text removal model. Experimental results demonstrate that the proposed method substantially outperforms the state of the art at locating and erasing scene text. We have also constructed a new large-scale real-world dataset of 12,120 images and are making it available to facilitate research, as current publicly available datasets are mainly synthetic and cannot properly measure the performance of different methods.
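A sketch of the cascade structure, assuming PyTorch; the two-argument inpainting interface and all class names are hypothetical, and each stage's internals are left abstract.

```python
import torch.nn as nn

class TextRemovalUnit(nn.Module):
    """One detection-then-removal stage (illustrative, not the authors' code)."""
    def __init__(self, stroke_net: nn.Module, inpaint_net: nn.Module):
        super().__init__()
        self.stroke_net = stroke_net    # predicts a text-stroke mask
        self.inpaint_net = inpaint_net  # generative network erasing the strokes

    def forward(self, img):
        mask = self.stroke_net(img)
        return self.inpaint_net(img, mask)

class CascadedTextRemover(nn.Module):
    """Cascade of units; each stage refines the previous stage's output."""
    def __init__(self, units):
        super().__init__()
        self.units = nn.ModuleList(units)

    def forward(self, img):
        out = img
        for unit in self.units:
            out = unit(out)
        return out
```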
Existing physical cloth simulators suffer from expensive computation and from the difficulty of tuning mechanical parameters to obtain desired wrinkling behaviors. Data-driven methods provide an alternative solution: they typically synthesize cloth animation at a much lower computational cost, and create wrinkling effects similar to those in the training data. In this paper, we propose a deep learning based method for synthesizing cloth animation with high-resolution meshes. To do this, we first create a dataset for training: a pair of low- and high-resolution meshes are simulated and their motions are synchronized. As a result, the two meshes exhibit similar large-scale deformation but different small wrinkles. Each simulated mesh pair is then converted into a pair of low- and high-resolution "images" (2D arrays of samples), with each image pixel interpreted as one of three descriptors: the displacement, the normal, and the velocity. With these image pairs, we design a multi-feature super-resolution (MFSR) network that jointly trains an upsampling synthesizer for the three descriptors. The MFSR architecture consists of shared and task-specific layers to learn multi-level features when super-resolving the three descriptors simultaneously. Frame-to-frame consistency is well maintained thanks to the proposed kinematics-based loss function. Our method achieves realistic results at high frame rates, running 12–14 times faster than traditional physical simulation. We demonstrate the performance of our method with various experimental scenes, including a dressed character with sophisticated collisions.
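The shared/task-specific split can be sketched as follows, assuming PyTorch. The layer counts, channel widths, 3-channel-per-descriptor convention, and sub-pixel upsampling heads are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MFSR(nn.Module):
    """Illustrative multi-feature super-resolution net (not the authors' code).

    A shared trunk learns common low-level features; separate heads upsample
    the displacement, normal, and velocity "images" (each assumed 3 channels).
    """
    def __init__(self, scale: int = 4, width: int = 64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(9, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        def head():
            return nn.Sequential(
                nn.Conv2d(width, 3 * scale * scale, 3, padding=1),
                nn.PixelShuffle(scale),  # sub-pixel upsampling to high resolution
            )
        self.disp_head, self.norm_head, self.vel_head = head(), head(), head()

    def forward(self, disp, normal, vel):
        # Concatenate the three low-resolution descriptor images channel-wise.
        f = self.shared(torch.cat([disp, normal, vel], dim=1))
        return self.disp_head(f), self.norm_head(f), self.vel_head(f)
```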
A discriminative local shape descriptor plays an important role in various applications. In this paper, we present a novel deep learning framework that derives discriminative local descriptors for deformable 3D shapes. We use local "geometry images" to encode the multi-scale local features of a point, via an intrinsic parameterization method based on geodesic polar coordinates. This new parameterization provides robust geometry images even for badly shaped triangular meshes. A triplet network with shared architecture and parameters is then used to perform deep metric learning, aiming to distinguish between similar and dissimilar pairs of points. Additionally, a newly designed triplet loss function is minimized for improved, more accurate training of the triplet network. To solve the dense correspondence problem, an efficient sampling approach is utilized to achieve a good compromise between training performance and descriptor quality. During testing, given a geometry image of a point of interest, our network outputs a discriminative local descriptor for it. Extensive testing of non-rigid dense shape matching on a variety of benchmarks demonstrates the superiority of the proposed descriptors over state-of-the-art alternatives.
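For orientation, here is the standard triplet margin loss on descriptor embeddings, written in PyTorch. The paper uses a newly designed variant; this sketch shows only the baseline form it builds on, and the margin value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Standard triplet margin loss (baseline form, not the paper's variant).

    anchor/positive/negative: (B, D) descriptors from the shared network,
    where positives correspond to the same surface point and negatives do not.
    """
    d_pos = F.pairwise_distance(anchor, positive)  # distance to matching point
    d_neg = F.pairwise_distance(anchor, negative)  # distance to non-matching point
    # Push matching pairs closer than non-matching pairs by at least the margin.
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```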
The traditional space-invariant isotropic kernel utilized by a bilateral filter (BF) frequently leads to blurry edges and gradient reversal artifacts, due to the large number of outliers in the local averaging window. However, efficiently and accurately estimating space-variant kernels that adapt to image structures, and quickly realizing the corresponding space-variant bilateral filtering, are challenging problems. To address them, we present a space-variant BF (SVBF) together with a linear-time, error-bounded acceleration method. First, we accurately estimate space-variant anisotropic kernels that vary with image structures in linear time, using the structure tensor and a minimum spanning tree. Second, we perform SVBF in linear time using two error-bounded approximation methods, namely low-rank tensor approximation via higher-order singular value decomposition, and exponential sum approximation. The proposed SVBF therefore efficiently achieves good edge-preserving results. We validate the advantages of the proposed filter in applications including image denoising, image enhancement, and image focus editing. Experimental results demonstrate that our fast and error-bounded SVBF is superior to state-of-the-art methods.
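As one building block, a per-pixel structure tensor indicating local edge orientation and anisotropy can be computed as below (NumPy/SciPy). This is only the first ingredient; the minimum-spanning-tree refinement and the error-bounded acceleration are not shown, and the smoothing scale is an arbitrary assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(img: np.ndarray, sigma: float = 1.5):
    """Per-pixel 2x2 structure tensor (illustrative building block only).

    The eigenvectors/eigenvalues indicate local edge orientation and
    anisotropy, from which a space-variant kernel shape can be derived.
    """
    gy, gx = np.gradient(img.astype(np.float64))
    # Smooth the gradient outer products over a local neighborhood.
    jxx = gaussian_filter(gx * gx, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    # Stack into (H, W, 2, 2) tensors and diagonalize per pixel.
    J = np.stack([np.stack([jxx, jxy], -1), np.stack([jxy, jyy], -1)], -2)
    evals, evecs = np.linalg.eigh(J)  # ascending eigenvalues per pixel
    return evals, evecs
```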
Surface remeshing is widely required in modeling, animation, simulation, and many other computer graphics applications. Improving element quality is a challenging task in surface remeshing, and existing methods often fail to efficiently remove poor-quality elements, especially in regions with sharp features. In this paper, we propose a robust segmentation method followed by remeshing of the segmented mesh. Mesh segmentation is initiated using an existing Live-wire interaction approach and is further refined using local mesh operations. The refined segmented mesh is finally sent to the remeshing pipeline, in which each mesh segment is remeshed independently. An experimental study compares our mesh segmentation method and remeshing results with those of representative existing methods, and demonstrates that the proposed segmentation method is robust and well suited for remeshing.
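A skeleton of the per-segment stage of such a pipeline, using the trimesh library; here simple subdivision merely stands in for a real feature-aware isotropic remesher, and the boundary stitching a full pipeline needs is omitted.

```python
import numpy as np
import trimesh

def remesh_by_segments(mesh: trimesh.Trimesh, face_labels: np.ndarray):
    """Remesh each segment independently (illustrative pipeline skeleton).

    face_labels: per-face segment id from the (refined) segmentation.
    """
    parts = []
    for label in np.unique(face_labels):
        idx = np.flatnonzero(face_labels == label)
        part = mesh.submesh([idx], append=True)         # extract one segment
        # Stand-in for a quality remesher applied to this segment alone.
        v, f = trimesh.remesh.subdivide(part.vertices, part.faces)
        parts.append(trimesh.Trimesh(v, f))
    # A real pipeline would stitch segment boundaries consistently here.
    return trimesh.util.concatenate(parts)
```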
Point distributions with different characteristics have a crucial influence on graphics applications. Various analysis tools have been developed in recent years, mainly for blue-noise sampling in Euclidean domains. In this paper, we present a new method for analyzing the properties of general sampling patterns distributed on mesh surfaces. The core idea is to generalize to surfaces the pair correlation function (PCF), which has been employed successfully in sampling pattern analysis and synthesis in 2D and 3D. Experimental results demonstrate that the proposed approach can reveal correlations of point sets generated by a wide range of sampling algorithms. An acceleration technique is also proposed to speed up PCF computation.
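A minimal sketch of a kernel-based PCF estimate, assuming a precomputed pairwise distance matrix is supplied (on surfaces these should be geodesic rather than Euclidean distances). The normalization by the sampling domain's measure is omitted for brevity, so values are correct only up to a constant factor.

```python
import numpy as np

def pair_correlation(dist: np.ndarray, radii: np.ndarray, sigma: float):
    """Kernel-based PCF estimate from a pairwise distance matrix (illustrative).

    dist:  (N, N) distances between sample points (geodesic for surfaces).
    radii: (M,) radii at which to evaluate the PCF.
    sigma: bandwidth of the Gaussian smoothing kernel.
    """
    n = dist.shape[0]
    iu = np.triu_indices(n, k=1)   # each unordered pair counted once
    d = dist[iu]
    # Gaussian kernel density over pair distances, evaluated at each radius.
    k = np.exp(-((radii[:, None] - d[None, :]) ** 2) / (sigma ** 2))
    pcf = k.sum(axis=1) / (np.sqrt(np.pi) * sigma)
    return pcf / len(d)            # unnormalized w.r.t. the domain measure
```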