CamDiff: Camouflage Image Augmentation via Diffusion

Xue-Jing Luo^¹, Shuo Wang^{¹^,²}, Zongwei Wu^¹, Christos Sakaridis^¹, Yun Cheng^¹, Deng-Ping Fan^¹

(), Luc Van Gool^¹

1Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland

2School of Systems Science, Beijing Normal University, Beijing 100875, China

Show Author Information

Abstract

The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited, as existing methods may misclassify salient objects as camouflaged ones, despite these contradictory characteristics. This limitation may stem from the lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome the scarcity of multi-pattern training images, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflaged scenes with richer characteristics. The results of user studies show that the salient objects in our synthesized scenes attract the user’s attention more; thus, such samples pose a greater challenge to the existing COD models. Our CamDiff enables flexible editing and effcient large-scale dataset generation at a low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.

Keywords

AI-generated content diffusion model camouflaged object detection salient object detection

References

[1]

H. K. Chu, W. H. Hsu, N. J. Mitra, D. Cohen-Or, T. T. Wong, and T. Y. Lee, Camouflage images, ACM Trans. Graph., vol. 29, no. 4, pp. 1–8, 2010.

Crossref Google Scholar

[2]

D. P. Fan, G. P. Ji, M. M. Cheng, and L. Shao, Concealed object detection, IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6024–6042, 2022.

Crossref Google Scholar

[3]

R. He, Q. Dong, J. Lin, and R. W. H. Lau, Weakly-supervised camouflaged object detection with scribble annotations, Proc. AAAI Conf. Artif. Intell., vol. 37, no. 1, pp. 781–789, 2023.

Crossref Google Scholar

[4]

X. Hu, S. Wang, X. Qin, H. Dai, W. Ren, D. Luo, Y. Tai, and L. Shao, High-resolution iterative feedback network for camouflaged object detection, Proc. AAAI Conf. Artif. Intell., vol. 37, no. 1, pp. 881–889, 2023.

Crossref Google Scholar

[5]

Z. Huang, H. Dai, T. Z. Xiang, S. Wang, H. X. Chen, J. Qin, and H. Xiong, Feature shrinkage pyramid for camouflaged object detection with transformers, in Proc. 2023 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 5557–5566.

Parameter	Value
$R A T I O_{M I N}$	6.25%
$R A T I O_{M A X}$	25%
$R A T I O_{M A S K}$	75%

Dataset		Type	Inception score $↑$
SOD	DUTSE-TE	Orig.	71.63
	ECSSD	Orig.	24.40
	XPIE (salient)	Orig.	96.79
	XPIE (not salient)	Orig.	13.96
COD	CAMO	Orig.	6.61
	CAMO	New	9.90
	CHAM	Orig.	4.38
	CHAM	new	5.98
	COD10K	Orig.	7.00
	COD10K	New	14.85
	NC4K	Orig.	7.00
	NC4K	New	12.87

Dataset		SINet^[13]	PFNet^[16]	C2FNet^[17]	ZoomNet^[14]
CAMO	$M ↓$	0.099	0.085	0.079	0.066
	$F_{m} ↑$	0.762	0.793	0.802	0.832
	$S_{m} ↑$	0.751	0.782	0.796	0.819
	$E_{m} ↑$	0.790	0.845	0.856	0.881
Diff-CAMO	$M ↓$	0.130	0.122	0.116	0.136
	$F_{m} ↑$	0.581	0.626	0.632	0.557
	$S_{m} ↑$	0.651	0.686	0.700	0.664
	$E_{m} ↑$	0.768	0.792	0.802	0.790
CHAM	$M ↓$	0.044	0.033	0.032	0.023
	$F_{m} ↑$	0.845	0.859	0.871	0.883
	$S_{m} ↑$	0.868	0.882	0.888	0.900
	$E_{m} ↑$	0.908	0.927	0.936	0.944
Diff-CHAM	$M ↓$	0.065	0.065	0.061	0.088
	$F_{m} ↑$	0.700	0.795	0.726	0.596
	$S_{m} ↑$	0.787	0.708	0.798	0.726
	$E_{m} ↑$	0.869	0.865	0.869	0.850
COD10K	$M ↓$	0.051	0.040	0.036	0.029
	$F_{m} ↑$	0.708	0.747	0.764	0.799
	$S_{m} ↑$	0.771	0.800	0.813	0.836
	$E_{m} ↑$	0.832	0.880	0.894	0.887
Diff-COD10K	$M ↓$	0.057	0.054	0.052	0.064
	$F_{m} ↑$	0.620	0.644	0.656	0.585
	$S_{m} ↑$	0.727	0.751	0.757	0.729
	$E_{m} ↑$	0.826	0.832	0.839	0.841
NC4K	$M ↓$	0.058	0.053	0.049	0.044
	$F_{m} ↑$	0.804	0.820	0.831	0.845
	$S_{m} ↑$	0.808	0.829	0.838	0.851
	$E_{m} ↑$	0.873	0.891	0.898	0.896
Diff-NC4K	$M ↓$	0.090	0.084	0.080	0.076
	$F_{m} ↑$	0.640	0.664	0.666	0.631
	$S_{m} ↑$	0.719	0.744	0.746	0.739
	$E_{m} ↑$	0.821	0.830	0.834	0.841

Algorithm 1 Mask generation.
Put the eight regions’ index in a list $candidates$ in order
Shuffle the index in $candidates$
for $i$ in $candidates$ do
if the area of region $i$ is higher than $R A T I O_{M I N}$ then
if the area of region $i$ is less than $R A T I O_{M A X}$ then
choose the area $m a s k$ that covers $R A T I O_{M A S K}$ of the total region area from the center
break
else
choose the area $m a s k$ that covers $R A T I O_{M A S K} \cdot R A T I O_{M A X}$ of the total region area from the center
break
end if
else
continue
end if
end for
return $m a s k$

Dataset		SINet^[13]		PFNet^[16]		C2FNet^[17]		ZoomNet^[14]
Dataset		Pre.	Tr.	Pre.	Tr.	Pre.	Tr.	Pre.	Tr.
DUTS-TE	$M ↓$	0.065	0.082	0.064	0.079	0.065	0.069	0.080	0.083
	$F_{m} ↑$	0.820	0.760	0.808	0.748	0.807	0.780	0.715	0.718
	$S_{m} ↑$	0.806	0.741	0.806	0.751	0.802	0.777	0.772	0.768
	$E_{m} ↑$	0.846	0.757	0.845	0.778	0.832	0.812	0.840	0.842
ECSSD	$M ↓$	0.106	0.135	0.105	0.130	0.116	0.115	0.129	0.134
	$F_{m} ↑$	0.844	0.784	0.822	0.762	0.802	0.790	0.744	0.751
	$S_{m} ↑$	0.766	0.692	0.766	0.703	0.748	0.734	0.722	0.715
	$E_{m} ↑$	0.786	0.688	0.784	0.702	0.750	0.740	0.834	0.841
XPIE-SAL	$M ↓$	0.090	0.119	0.093	0.115	0.099	0.101	0.115	0.123
	$F_{m} ↑$	0.822	0.763	0.804	0.739	0.786	0.762	0.720	0.703
	$S_{m} ↑$	0.770	0.691	0.762	697	0.749	0.728	0.723	0.705
	$E_{m} ↑$	0.805	0.697	0.792	0.709	0.768	0.749	0.820	0.815