In this paper, we reconsider the clustering problem for image over-segmentation from a new per-spective. We propose a novel search algorithm called "active search" which explicitly considers neighbor continuity. Based on this search method, we design a back-and-forth traversal strategy and a joint assignment and update step to speed up the algorithm. Compared to earlier methods, such as simple linear iterative clustering (SLIC) and its variants, which use fixed search regions and perform the assignment and the update steps separately, our novel scheme reduces the number of iterations required for convergence, and also provides better boundaries in the over-segmentation results. Extensive evaluation using the Berkeley segmentation benchmark verifies that our method outperforms competing methods under various evaluation metrics. In particular, our method is fastest, achieving approximately 30 fps for a image on a single CPU core. To facilitate further research, our code is made publicly available.
L.Vincent,; P.Soille,Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 13, No. 6, 583-598, 1991.
O.Veksler,; Y.Boykov,; P.Mehrani,Superpixels and supervoxels in an energy optimization framework. In: Computer Vision - ECCV 2010. Lecture Notes in Computer Science, Vol. 6315.K.Daniilidis,; P.Maragos,; N.Paragios,Eds.Springer Berlin Heidelberg, 211-224, 2010.
Y.Boykov,; V.Kolmogorov,An experimental-comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 26, No. 9, 1124-1137, 2004.
V.Kolmogorov,; R.Zabin,What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 26, No. 2, 147-159, 2004.
A.Levinshtein,; A.Stere,; K. N.Kutulakos,; D. J.Fleet,; S. J.Dickinson,; K.Siddiqi,TurboPixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 31, No. 12, 2290-2297, 2009.
S.Osher,; J. A.Sethian,Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics Vol. 79, No. 1, 12-49, 1988.
G.Peyré,; M.Péchaud,; R.Keriven,; L. D.Cohen,Geodesic methods in computer vision and graphics. Foundations and Trends® in Computer Graphics and Vision Vol. 5, Nos. 3-4, 197-397, 2010.
Y.-J.Liu,; C.-C.Yu,; M.-J.Yu,; Y.He,Manifold SLIC: A fast method to compute content-sensitive superpixels. In: Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition, 651-659, 2016.
R.Mottaghi,; X.Chen,; X.Liu,; N.-G.Cho,; S.-W.Lee,; S.Fidler,; R.Urtasun,; A.Yuille,The role of contextfor object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 891-898, 2014.
Zhao J, Bo R, Hou Q, et al. FLIC: Fast linear iterative clustering with active search. Computational Visual Media, 2018, 4(4): 333-348. https://doi.org/10.1007/s41095-018-0123-y
This article is published with open access at Springerlink.com
The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
10.1007/s41095-018-0123-y.F1
(a) Search method used in SLIC. Each seed only searches over a limited region to reduce computational cost. (b) Our active search method. Each pixel decides its own label by searching its surroundings.
3 Preliminaries
Before introducing our approach that allows adaptive search regions and joint assignment and update steps, we first briefly recap the standard SLIC algorithm with fixed search regions and separate steps. This approach improves upon Lloyd’s algorithm, reducing the time complexity from to , where is the number of superpixels and is the number of pixels.
Let be a color image, where represents some pixel. Given a set of evenly distributed seeds , SLIC simplifies Lloyd’s algorithm to get the centroidal Voronoi tessellation (CVT) [
22
] that will be discussed in Section 3.4. In the assignment step, each pixel is associated with those cluster seeds whose search regions overlap the pixel’s location: see
Fig. 1(a)
. The area of a search region is given by , where . In detail, SLIC considers to lie in a five dimensional space that includes a three dimensional CIELAB color space and a two dimensional spatial space . SLIC measures the distance between two points using a weighted Euclidean distance, computed by
where is a variable that controls the weight of the spatial term, and . Variables and are respectively the spatial and color distances, given by
and
In the update step, SLIC recomputes the center of each superpixel and moves the seeds to these new centers. Overall, it obtains the over-segmentation results by iteratively performing the assignment and update steps.
Variants of SLIC use a similar approach. They improve upon the performance of SLIC by using better distance measures or more suitable transformation functions between color space and spatial space. However, in these algorithms, each search region is fixed during the assignment step within each loop, and the relationships between neighboring pixels are largely ignored when allocating pixels to superpixels. Separately performing the assignment step and the update step causes delayed incorporation of pixel label changes.
Since superpixel computation is typically used as the first step of other vision applications, the rapid generation of superpixels with good boundaries is a crucial problem. Here, unlike previous algorithms [
17
,
21
], we consider this problem from a new point of view, in which only surrounding pixels are considered for determining the label of the current pixel. Each pixel actively selects which superpixel it should belong to in a back-and-forth order to provide better determination of over-segmentation regions. Moreover, the assignment step and the update step are performed jointly. Very few iterations are required for our approach to converge. An overview of our algorithm is provided in Algorithm 3.
3.1 Problem setup
Given the desired number of superpixels and an input image , where is the number of pixels, our goal is to produce a series of disjoint small regions (or superpixels). Following most previous works [
17
], the original RGB color space is transformed to the more useful CIELAB color space. Thus, each pixel in an image can be represented in a five dimensional space:
We first divide the original image into a regular grid containing elements with step length as in Ref. [
17
], and the initial label for each pixel is assigned as
We initialize the seed in at its centroid. Therefore, can also be defined as being in the same five dimensional space:
3.2 Label decision
In most natural images adjacent pixels tend to share the same labels, i.e., neighboring pixels havenatural continuity. Thus, we propose an active search method to leverage as much of this a priori information as possible. In our method, unlike most previous approaches [
17
,
21
], the label of each pixel is only determined by its neighbors. We compute the distances between the current pixel and the seeds of its four or eight adjacent pixels—see
Fig. 1
. Specifically, for a pixel , our assignment principle is
where consists of and its four neighboring pixels, and is ’s corresponding superpixel seed. We use Eq. (
1
) to measure the distance .
Since each pixel can only be assigned to a superpixel containing at least one of its neighbors, local pixel continuity has a stronger effect in our proposed strategy, allowing each pixel to actively assign itself to one of its surrounding closely connected superpixel regions. The advantages of such a strategy are clear: firstly, the nearby assignment principle can avoid the occurrence of too many isolated regions, indirectly preserving the desired number of superpixels. Secondly, this assignment operation is not limited by a fixed range in space, resulting in better superpixel boundary compliance even when very complicated content leads to irregular superpixel shapes. Furthermore, during the assignment process, the superpixel centers are also self-adaptively modified, leading to faster convergence. Detailed demonstration and analysis are given in Section 4.5. It is worth mentioning that the neighbors of the internal pixels in a superpixel normally share the same labels, so it is unnecessary to process them further, allowing us to process each superpixel extremely quickly.
3.3 Traversal order
The traversal order plays a very important role in our approach: an appropriate scanning order may lead to a visually better segmentation. As explained in Section 3.2, the label of each pixel only depends on the seeds of its surrounding pixels. Thus, in a superpixel, the label of the current pixel is directly or indirectly related to those pixels that have already been processed. To better take advantage of this avalanche effect, we adopt a back-and-forth traversal order as in PatchMatch [
23
], in which the pixels that are processed later will benefit from updates to previously processed pixels.
Figure 2
illustrates this process. In the forward pass, the label decision for each pixel considers the information from the top surrounding pixels of the superpixel, and similarly, the backward pass will provide the information from the bottom surrounding pixels of the superpixel. With such a scanning order, all the surrounding information can be taken into consideration, yielding better segments.
10.1007/s41095-018-0123-y.F2
Scanning order for each superpixel. Gray regions enclosed by blue lines represent superpixels. Red dashed rectangles denote their corresponding bounding boxes. We first scan the bounding box from left to right and top to bottom (b) and then in the opposite direction (c). The shape of each superpixel may change, whereupon we update the bounding box (d) if necessary.
While an arbitrary superpixel may have an irregular shape instead of a simple rectangle or square, we use a simple strategy to traverse the whole superpixel. We first find a minimum bounding box within which all its pixels are enclosed, as shown in
Fig. 2
. We then perform the scanning process for all pixels in the corresponding minimum bounding box and only deal with those pixels that are within the superpixel.
3.4 Joint assignment and update step
It is common in existing methods, such as SLIC [
17
], that the assignment and update steps are performed separately, leading to delayed feedback from pixel label changes to superpixel seeds. An obvious problem of such a strategy is that many (normally more than five) iterations are required before convergence. In our approach, based on the assignment principle in Eq. (
7
), we use a joint assignment and update strategy which performs these two steps at a finer granularity. This approach is able to adjust the superpixel seed center position on the fly, significantly reducing the number of iterations needed for convergence. Since most clustering-based superpixel methods use the centroidal Voronoi tessellation (CVT), we first briefly introduce the CVT and then describe our method.
Let be the set of seeds in the image, where is the expected number of superpixels. The Voronoi cell of a seed is denoted by
where is an arbitrary distance measure from pixel to the seed . The Voronoi diagram is defined by
A CVT is then defined as a Voronoi diagram whose generator point of each Voronoi cell is also its center of mass. The CVT is usually obtained by heuristic algorithms, such as Lloyd’s algorithm, iteratively performing updates after each assignment step until convergence is reached.
In our approach, on account of our novel label decision strategy as shown in Eq. (
7
), we are able to jointly perform the update step and the assignment step instead of separately. More specifically, after pixel is processed, if its label is changed to, say, , we immediately update the current seed using the following equation:
where is the number of pixels in superpixel , and update using the following equation:
The bounding box of is also updated accordingly.
As the above updates only contain very simple arithmetic operations, they can be performed very quickly. Such an immediate update will help later pixels make a better choice during assignment, leading to better convergence.
Figure 10
given later illustrates the speed of convergence of our approach.
3.5 Superpixel processing order
In our method, superpixels are processed inde-pendently. Thus, the superpixel processing order will affect the performance of the method; label changes affect the surrounding pixels. However, we do not want current pixel changes to affect the superpixels we have already processed. Therefore, we give priority to those complex superpixels in which many pixels labels will be changed, processing them first and then simple superpixels later. Superpixels which have a uniform color are simple, and almost no pixels labels need to change.
We use color entropy to define the complexity of superpixels. We calculate it using the following formulation (
15
):
where is the number of pixels in superpixel , is a three-dimensional vector in lab color space, and is the mean of in superpixel .
After determining the entropy of all superpixels, we process them in descending order of entropy. We later demonstrate the benefits of this processing order.
10.1007/s41095-018-0123-y.F2
Scanning order for each superpixel. Gray regions enclosed by blue lines represent superpixels. Red dashed rectangles denote their corresponding bounding boxes. We first scan the bounding box from left to right and top to bottom (b) and then in the opposite direction (c). The shape of each superpixel may change, whereupon we update the bounding box (d) if necessary.
10.1007/s41095-018-0123-y.F3
Images and ground-truths for the BSDS dataset. Rows 1,3: original images. Rows 2,4: corresponding annotated edges.
10.1007/s41095-018-0123-y.F4
Images and ground-truths for the Pascal Context dataset. Rows 1,3: original images. Rows 2,4: corresponding annotated edges.
10.1007/s41095-018-0123-y.F5
Comparisons between state-of-the-art methods and our approach on the BSDS500 benchmark. In (b)-(d), is fixed to for the best trade-off between result quality and speed. In terms of boundary recall, our strategy significantly outperforms methods that take similar time. Furthermore, competitive results are also achieved compared to slower methods (e.g., the state-of-the-art ERS method [
16
]) according to all evaluation metrics, but at an order of magnitude faster speed.
10.1007/s41095-018-0123-y.F6
Visual comparison of superpixel segmentation results using various algorithms, with 100 superpixels and . Our approach follows boundaries very well and at the same time produces compact superpixels.
10.1007/s41095-018-0123-y.F7
Comparisons between state-of-the-art methods and our approach on the Pascal Context dataset. Due to the simplicity of this dataset, almost all over-segmentation methods yield good boundary recall. The proposed method achieves a competitive trade-off on this dataset.
10.1007/s41095-018-0123-y.F8
Partial sensitivity analysis for standard evaluation metrics and time taken.
10.1007/s41095-018-0123-y.F9
(a) BR- curves; is the spatial distance weight in Eq. (
1
). Our results are far better than those from SLIC for all values of . (b) BR-iteration curves. Our method converges within 2 iterations, much fewer than SLIC.
10.1007/s41095-018-0123-y.F10
Comparison of convergence rate for joint and separate assignment and update steps.
10.1007/s41095-018-0123-y.F11
Images from the Pascal Context dataset segmented by our approach with and the number of superpixels set to 1000, 400, and 200, respectively. The resulting superpixels follow region boundaries very well.
10.1007/s41095-018-0123-y.F12
Images from the BSDS dataset segmented by our proposed approach with , , and 30, respectively. When is smaller, the superpixels follow boundaries well. When is larger, the superpixels are more compact.
10.1007/s41095-018-0123-y.F13
Images from the BSDS dataset segmented by our approach with and the number of superpixels set to 1000, 400, and 200, respectively. The resulting superpixels follow region boundaries very well.
10.1007/s41095-018-0123-y.F14
Images from the Pascal Context dataset segmented by our proposed approach with , , and 30, respectively. When is smaller, the superpixels follow boundaries well. When is larger, the superpixels are more compact.
10.1007/s41095-018-0123-y.T1Boundary recall versus time for 4-neighborhoods and 8-neighborhoods with different superpixel counts: 100, 200, 300, 400
100
200
300
400
BR
Time
BR
Time
BR
Time
BR
Time
4-N
78.6
34
85.9
35
89.1
36
91.8
38
8-N
80.5
54
87.4
56
90.5
59
92.7
61
10.1007/s41095-018-0123-y.T2Boundary recall for different traversal orders. TO1 represents the default traversal order which is first from left to right and then from top to bottom. TO2 represents traversing the pixels in a superpixel first from top to bottom and then from left to right. TO3 represents traversing the pixels first from right to left and then from bottom to top. TO4 represents the combined scheme
100
200
300
400
500
600
TO1
78.6
85.9
89.1
91.8
93.1
94.6
TO2
77.6
85.1
88.5
91.4
92.8
94.3
TO3
77.9
85.3
88.8
91.4
92.9
94.4
TO4
78.1
85.3
88.7
91.5
93.2
94.4
10.1007/s41095-018-0123-y.T3Comparison of segmentation results generated by SLIC and FLIC for different numbers of superpixels. E.g., SLIC-100 indicates use of SLIC to generate 100 superpixels which were then input to a downstream algorithm to achieve the final segmentation result