Self-Sparse Generative Adversarial Networks

Wenliang Qian; Yang Xu; Wangmeng Zuo; Hui Li

doi:10.26599/AIR.2022.9150005

| Sign up

PDF (4.6 MB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

Show Outline

Figures (8)

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Tables (24)

Table 1

Table 2

Table 3

Table 4

Table 5

Open Access

Self-Sparse Generative Adversarial Networks

Wenliang Qian^{¹^,²}, Yang Xu^{¹^,²}, Wangmeng Zuo^³, Hui Li^{¹^,²}()

1 Labortoray of Artificial Intelligence, Harbin Institute of Technology, Harbin 150001, China

2 School of Civil Engineering, Harbin Institute of Technology, Harbin 150090, China

3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Show Author Information

Abstract

Generative adversarial networks (GANs) are an unsupervised generative model that learns data distribution through adversarial training. However, recent experiments indicated that GANs are difficult to train due to the requirement of optimization in the high dimensional parameter space and the zero gradient problem. In this work, we propose a self-sparse generative adversarial network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem. In the Self-Sparse GAN, we design a self-adaptive sparse transform module (SASTM) comprising the sparsity decomposition and feature-map recombination, which can be applied on multi-channel feature maps to obtain sparse feature maps. The key idea of Self-Sparse GAN is to add the SASTM following every deconvolution layer in the generator, which can adaptively reduce the parameter space by utilizing the sparsity in multi-channel feature maps. We theoretically prove that the SASTM can not only reduce the search space of the convolution kernel weight of the generator but also alleviate the zero gradient problem by maintaining meaningful features in the batch normalization layer and driving the weight of deconvolution layers away from being negative. The experimental results show that our method achieves the best Fréchet inception distance (FID) scores for image generation compared with Wasserstein GAN with gradient penalty (WGAN-GP) on MNIST, Fashion-MNIST, CIFAR-10, STL-10, mini-ImageNet, CELEBA-HQ, and LSUN bedrooms datasets, and the relative decrease of FID is 4.76%–21.84%. Meanwhile, an architectural sketch dataset (Sketch) is also used to validate the superiority of the proposed method.

Keywords

generative adversarial networks self-adaptive sparse transform module self-sparse generative adversarial network (Self-Sparse GAN)

References

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, in Proc. 28^th Annu. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672–2680.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv: 1511.06434, 2015.

L. Mescheder, A. Geiger, and S. Nowozin, Which Training Methods for GANs do actually converge, in Proc. 35^th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 3481–3490.

T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, Improved techniques for training GANs, in Proc. 30^th Annu. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 2226–2234.

M. Arjovsky and L. Bottou, Towards principled methods for training generative adversarial networks, in Proc. 5^th Int. Conf. Learning Representations, Toulon, France, 2017.

S. Jenni and P. Favaro, On stabilizing generative adversarial training with noise, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 12137–12145.

Crossref

T. Karras, T. Aila, S. Laine, and J. Lehtinen, Progressive growing of GANs for improved quality, stability, and variation, arXiv preprint arXiv: 1710.10196, 2017.

H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, Self-attention generative adversarial networks, in Proc. 36^th Int. Conf. Machine Learning, Long Beach, CA, USA, 2019, pp. 7354–7363.

M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein GAN, arXiv preprint arXiv: 1701.07875, 2017.

X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, Least squares generative adversarial networks, in Proc. 2017 IEEE Int. Conf. on Computer Vision, Venice, Italy, 2017, pp. 2813–2821.

Crossref

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved training of Wasserstein GANs, in Proc. 31^st Annu. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 5767–5777.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral normalization for generative adversarial networks, in Proc. 6^th Int. Conf. Learning Representations, Vancouver, Canada, 2018.

B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Penksy, Sparse convolutional neural networks, in Proc. 2015 IEEE Conf. Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 806–814.

C. Louizos, M. Welling, and D. P. Kingma, Learning sparse neural networks through L_0 regularization, in Proc. 6^th Int. Conf. Learning Representations, Vancouver, Canada, 2018.

L. Deng, The MNIST database of handwritten digit images for machine learning research [best of the web], IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, 2012.

Crossref Google Scholar

H. Xiao, K. Rasul, and R. Vollgraf, Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv: 1708.07747, 2017.

A. Krizhevsky, Learning multiple layers of features from tiny images, https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf, 2009.

A. Coates, A. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, in Proc. 14th Int. Conf. Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 2011, 215–223.

O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra. Matching networks for one shot learning, in Proc. 30^th Annu. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3630–3638.

F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao, LSUN: Construction of a large-scale image dataset using deep learning with humans in the Loop, arXiv preprint arXiv: 1506.03365, 2015

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in Proc. 31^st Annu. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6626–6637.

A. Brock, J. Donahue, and K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, arXiv preprint arXiv: 1809.11096, 2018.

Y. Zhou and T. L. Berg, Learning temporal transformations from time-lapse videos, in Proc. 14^th Eur. Conf. Computer Vision, Amsterdam, The Netherlands, 2016, pp. 262–277.

Crossref

P. Isola, J. Y. Zhu, T. Zhou, and Alexei A. Efros, Image-to-image translation with conditional adversarial networks, in Proc. 2017 IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 5967–5976.

Crossref

O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, Deblurgan: Blind motion deblurring using conditional adversarial networks, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 8183–8192.

Crossref

M. Huh, S. H. Sun, and N. Zhang, Feedback adversarial learning: Spatial feedback for improving generative adversarial networks, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 1476–1485.

Crossref

T. Karras, S. Laine, and T. Aila, A style-based generator architecture for generative adversarial networks, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 4396–4405.

Crossref

T. Chen, M. Lucic, N. Houlsby, and S. Gelly, On self modulation for generative adversarial networks, arXiv preprint arXiv: 1810.01365, 2018.

S. Mahdizadehaghdam, A. Panahi, and H. Krim, Sparse generative adversarial network, in Proc. 2019 IEEE/CVF Int. Conf. Computer Vision Workshop, Seoul, Republic of Korea, 2019, pp. 3063-3071.

Crossref

F. Liu, L. Jiao, and X. Tang, Task-oriented GAN for PolSAR image classification and clustering, IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 9, pp. 2707–2719, 2019.

Crossref Google Scholar

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. 26^th Annu. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1106–1114.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv: 1409.1556, 2014.

R. Shang, W. Zhang, M. Lu, L. Jiao, and Y. Li, Feature selection based on non-negative spectral feature learning and adaptive rank constraint, Knowl. -Based Syst., vol. 236, p. 107749, 2022.

Crossref Google Scholar

R. Shang, X. Zhang, J. Feng, Y. Li, and L. Jiao, Sparse and low-dimensional representation with maximum entropy adaptive graph for feature selection, Neurocomputing, vol. 485, pp. 57–73, 2022.

Crossref Google Scholar

J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang and, H. Lu, Dual attention network for scene segmentation, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 3141–3149.

Crossref

Y. Wang, Z. Chen, F. Wu, and G. Wang, Person re-identification with cascaded pairwise convolutions, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2019, pp. 1470-1478.

Crossref

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412.6980, 2014.

W. Qian, Y. Xu, and H. Li, A self-sparse generative adversarial network for autonomous early-stage design of architectural sketches, Comput. -Aided Civ. Infrastruct. Eng., vol. 37, no. 5, pp. 612–628, 2021.

Crossref Google Scholar

CAAI Artificial Intelligence Research

Volume 1 Issue 1,
September 2022

Pages 68-78

DOI: 10.26599/AIR.2022.9150005

Cite this article:

Qian W, Xu Y, Zuo W, et al. Self-Sparse Generative Adversarial Networks. CAAI Artificial Intelligence Research, 2022, 1(1): 68-78. https://doi.org/10.26599/AIR.2022.9150005

Return

Table 1Investigated pixel resolutions of generated images.

Dataset	Pixel resolution of the generated image
MNIST	32×32, 128×128
Fashion-MNIST	64×64, 128×128
CIFAR-10	128×128×3
STL-10	64×64×3
mini-ImageNet	32×32×3, 64×64×3
CELEBA-HQ	64×64×3, 128×128×3
LSUN bedrooms	64×64×3, 128×128×3

Table 2Comparison of FIDs between our proposed Self-Sparse GAN and the baseline WGAN-GP. The mean and standard deviation of the FID are calculated through three individual trainings with different random seeds.

Dataset	Pixel resolution	Model	FID	Dataset	Pixel resolution	Model	FID
MNIST (Grayscale)	32×32×1	WGAN-GP	7.43 ± 0.28	mini- ImageNet (RGB)	32×32×3	WGAN-GP	33.16 ± 0.02
	32×32×1	Self-Sparse GAN	6.26 ± 0.64		32×32×3	Self-Sparse GAN	28.88 ± 0.37
	128×128×1	WGAN-GP	10.42 ± 0.86		64×64×3	WGAN-GP	58.81 ± 3.28
	128×128×1	Self-Sparse GAN	8.32 ± 1.03		64×64×3	Self-Sparse GAN	54.78 ± 0.28
Fashion- MNIST (Grayscale)	64×64×1	WGAN-GP	20.37 ± 0.87	CELEBA- HQ (RGB)	64×64×3	WGAN-GP	15.95 ± 0.44
	64×64×1	Self-Sparse GAN	15.92 ± 1.10		64×64×3	Self-Sparse GAN	15.19 ± 0.18
	128×128×1	WGAN-GP	20.41 ± 0.70		128×128×3	WGAN-GP	32.40 ± 2.03
	128×128×1	Self-Sparse GAN	17.67 ± 0.87		128×128×3	Self-Sparse GAN	27.72 ± 1.54
CIFAR-10 (RGB)	128×128×3	WGAN-GP	43.77 ± 2.10	LSUN bedrooms (RGB)	64×64×3	WGAN-GP	59.12 ± 0.95
CIFAR-10 (RGB)	128×128×3	Self-Sparse GAN	36.69 ± 1.53		64×64×3	Self-Sparse GAN	55.06 ± 2.00
STL-10 (RGB)	64×64×3	WGAN-GP	63.88 ± 1.33		128×128×3	WGAN-GP	102.16 ± 0.85
STL-10 (RGB)	64×64×3	Self-Sparse GAN	56.23 ± 1.38		128×128×3	Self-Sparse GAN	84.78 ± 2.89

Table 3Comparisons of FIDs in ablation studies on Fashion-MNIST and STL-10.

Dataset	Pixel resolution	Method	FID
Fashion-MNIST	128×128×1	WGAN-GP	20.41 ± 0.70
		Self-Sparse GAN	17.67 ± 0.87
		Without PSM	21.51 ± 0.29
		Without CSM	163.71±3.99
STL-10	64×64×3	WGAN-GP	63.88 ± 1.33
		Self-Sparse GAN	56.23 ± 1.38
		Without PSM	60.85 ± 0.99
		Without CSM	64.87 ± 1.71

Table 4Comparisons of FID in the robustness experiments on STL-10 with different Adam hyperparameter settings.

Dataset	Betas	Method	FID
STL-10	0, 0.9	WGAN-GP	63.88 ± 1.33
	0, 0.9	Self-Sparse GAN	56.23 ± 1.38
	0.5, 0.999	WGAN-GP	67.13 ± 0.77
	0.5, 0.999	Self-Sparse GAN	56.51 ± 1.62

Table 5The SASTM Architecture on STL-10 with 128×128×3 resolution.

Layer	Operator
f	Linear: $z \in R^{128} \sim N (0, 1) \to$ 256, ReLU
g₁	Linear: 256 $\to$ M_c, ReLU
g₂	Conv or DeConv. $16 \times 16 \to M_{h} \times M_{w}$ , ReLU

Table 6ResNet Generator and Discriminator in the robustness experiment on STL-10.

Model	Layer
Generator	Firstblock	$z \in R^{128} \sim N (0, 1)$ DeConv. 4×4, stride =1, padding =0,SASTM
	ResNetblock	BN-512, ReLU, Upsample-2,Conv. 3×3, stride=1, padding=1,BN-256, ReLU,Conv. 3×3, stride=1, padding=1,SASTM
	ResNetblock	BN-256, ReLU, Upsample-2,Conv. 3×3, stride=1, padding=1,BN-128, ReLU,Conv. 3×3, stride=1, padding=1,SASTM
Discriminator	Conv. 4×4, stride=2, padding=1,IN-64, LeakyReLU,Conv. 3×3, stride=1, padding=1,IN-128, LeakyReLU,Conv. 3×3, stride=1, padding=1, Downsample-0.5,IN-128, LeakyReLU,Conv. 3×3, stride=1, padding=1,IN-256, LeakyReLU,Conv. 3×3, stride=1, padding=1,Downsample-0.5

Table 7Comparisons of FID in the robustness experiments on STL-10 with different network architectures.

Dataset	Architecture	Method	FID
STL-10	DCGAN	WGAN-GP	63.88 ± 1.33
	DCGAN	Self-Sparse GAN	56.23 ± 1.38
	ResNet	WGAN-GP	65.16 ± 5.96
	ResNet	Self-Sparse GAN	60.31 ± 4.29

Table 8Relationship between sparseness and FID with the same network architecture.

Pixel resolution	Dataset	Average position sparsity rate	FID reduction (%)
64×64	STL-10	0.44	11.97
	mini-ImageNet	0.47	6.83
	CELEBA-HQ	0.26	4.76
	LSUN bedrooms	0.39	6.86
128×128	MNIST	0.53	20.21
	CIFAR-10	0.47	16.17
	CELEBA-HQ	0.21	14.43
	LSUN bedrooms	0.31	17.02

Table 9FIDs with WGAN-GP and Self-Sparse GAN on the Sketch.

Dataset	Method	FID
Sketch	WGAN-GP	76.68 ± 1.08
Sketch	Self-SparseGAN	67.58 ± 0.89

Table 10Comparison between WGAN-GP, SAGAN and Self-Sparse GAN

Dataset	Model	FID
CELEBA-HQ	WGAN-GP	32.40 ± 2.03
	SAGAN	30.94 ± 1.01
	Self-Sparse GAN	27.72 ± 1.54
Sketch	WGAN-GP	76.68 ± 1.08
	SAGAN	76.97 ± 1.07
	Self-Sparse GAN	67.58 ± 0.89

Table A1Self-Sparse GAN generator and discriminator with 32 × 32 resolution. $z \in R^{100} \sim N (0, 1)$ on MNIST and $z \in R^{128} \sim N (0, 1)$ on CIFAR-10.

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-256, ReLU,	Conv. 4×4, stride=2, padding=1, IN-64, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM , BN-128, ReLU,	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-64, ReLU,	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv. 4×4, stride=1, padding=0

Table A2Self-Sparse GAN generator and discriminator with 64 × 64 resolution, $z \in R^{100} \sim N (0, 1)$ on MNIST and $z \in R^{128} \sim N (0, 1)$ on CIFAR-10.

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv.4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv.4×4, stride=1, padding=0

Table A3Self-Sparse GAN generator and discriminator with 128 × 128 resolution, $z \in R^{100} \sim N (0, 1)$ on MNIST and $z \in R^{128} \sim N (0, 1)$ on CIFAR-10.

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-512, ReLU	Conv. 4×4, stride=2, padding=1, IN-32, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-256, ReLU	Conv. 4×4, stride=2, padding=1, IN-64, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-64, ReLU	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-32, ReLU	Conv. 4×4, stride=2, padding=1, IN-512, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv.4×4, stride=1, padding=0

Table A4SASTM architecture, $z \in R^{100} \sim N (0, 1)$ on MNIST and $z \in$ $R^{128} \sim N (0, 1)$ on CIFAR-10.

Layers	Operators
$f$	Linear: $z \to 1024$ , ReLULinear: $1024 \to 512$ , ReLU
$g_{1}$	Linear: $512 \to M_{c}$ , ReLU
$g_{2}$	Linear: $512 \to 256$ , ReLUConv or DeConv: $16 \times 16 \to M_{h} \times M_{w}$ , ReLU

Table B1Self-Sparse GAN generator and discriminator with 64 × 64 × 1 resolution, and $z \in R^{100} \sim N (0, 1)$ .

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv.4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv.4×4, stride=1, padding=0

Table B2Self-Sparse GAN generator and discriminator with 128 × 128 × 1 resolution, and $z \in R^{100} \sim N (0, 1)$ .

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv.4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv. 4×4, stride=1, padding=0

Table B3SASTM architecture with 64 × 64 × 1 and 128 × 128 × 1 resolution.

Layers	Operators
$f$	Linear: $z \in R^{100} \sim N (0, 1) \to 1024$ , ReLULinear: $1024 \to 512$ , ReLULinear: $512 \to 256$ , ReLU
$g_{1}$	Linear: $256 \to M_{c}$ , ReLU
$g_{2}$	Conv or DeConv: $16 \times 16 \to M_{h} \times M_{w}$ , ReLU

Table C1Self-Sparse GAN generator and discriminator with 64 × 64 × 3 resolution, and $z \in R^{128} \sim N (0, 1)$ .

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-512, ReLU	Conv. 4×4, stride=2, padding=1, IN-64, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-256, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-64, ReLU	Conv. 4×4, stride=2, padding=1, IN-512, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv.4×4, stride=1, padding=0

Table C2Self-Sparse GAN generator and discriminator with 128 × 128 × 3 resolution, and $z \in R^{128} \sim N (0, 1)$ .

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-512, ReLU	Conv. 4×4, stride=2, padding=1, IN-32, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-256, ReLU	Conv. 4×4, stride=2, padding=1, IN-64, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-64, ReLU	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-32, ReLU	Conv. 4×4, stride=2, padding=1, IN-512, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv. 4×4, stride=1, padding=0

Table C3SASTM architecture with 64×64×3 and 128×128 × 3 resolutions.

Layer	Operator
$f$	Linear: $z \in R^{128} \sim N (0, 1) \to 256$ , ReLU
$g_{1}$	Linear: $256 \to M_{c}$ , ReLU
$g_{2}$	Conv or DeConv: $16 \times 16 \to M_{h} \times M_{w}$ , ReLU

Table D1Self-Sparse GAN generator and discriminator with 32 × 32 × 3 resolution.

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-512, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-256, ReLU	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-512, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv. 4×4, stride=1, padding=0

Table D2Self-Sparse GAN generator and discriminator with 64 × 64 × 3 resolution.

Generator	Discriminator
DeConv. 4×4, stride=1, padding=0, SASTM, BN-512, ReLU	Conv. 4×4, stride=2, padding=1, IN-64, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-256, ReLU	Conv. 4×4, stride=2, padding=1, IN-128, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-128, ReLU	Conv. 4×4, stride=2, padding=1, IN-256, LeakyReLU
DeConv. 4×4, stride=2, padding=1, SASTM, BN-64, ReLU	Conv. 4×4, stride=2, padding=1, IN-512, LeakyReLU
DeConv. 4×4, stride=2, padding=1, Tanh	Conv.4×4, stride=1, padding=0

Table D3SASTM architecture with 32 × 32 × 3 resolution.

Layer	Operator
$f$	Linear: $z \in R^{128} \sim N (0, 1) \to 1024$ , ReLULinear: $1024 \to 512$ , ReLULinear: $512 \to 256$ , ReLU
$g_{1}$	Linear: $512 \to M_{c}$ , ReLU
$g_{2}$	Linear: $512 \to 256$ , ReLUConv or DeConv: $16 \times 16 \to M_{h} \times M_{w}$ , ReLU

Table D4SASTM architecture with 64 × 64 × 3 resolution.

Layer	Operator
$f$	Linear: $z \in R^{128} \sim N (0, 1) \to 256$ , ReLU
$g_{1}$	Linear: $256 \to M_{c}$ , ReLU
$g_{2}$	Conv or DeConv: $16 \times 16 \to M_{h} \times M_{w}$ , ReLU