Few-shot classification models trained only on clean samples perform poorly on real-world samples corrupted by noise of various scales. To make models more robust to noisy samples, researchers typically rely on data augmentation or train with noisy samples generated by adversarial training. However, existing methods still have problems: (ⅰ) The effect of data augmentation on model robustness is limited. (ⅱ) The noise generated by adversarial training often causes overfitting and reduces the generalization ability of the model, which is particularly important in few-shot classification. (ⅲ) Most existing methods cannot adaptively generate appropriate noise. To address these three issues, this paper proposes a noise-robust few-shot classification algorithm, VADA (Variational Adversarial Data Augmentation). Unlike existing methods, VADA employs a variational noise generator that, through adversarial learning, produces an adaptive noise distribution for each sample, and the generator is optimized by minimizing the expectation of the empirical risk. Applying VADA during training makes few-shot classification more robust to noisy data while retaining generalization ability. We adopt FEAT and ProtoNet as baseline models and evaluate accuracy on several common few-shot classification datasets, including MiniImageNet, TieredImageNet, and CUB. After training with VADA, both models achieve higher classification accuracy on samples with various scales of noise.
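To make the training scheme concrete, the following is a minimal sketch of a variational adversarial data augmentation loop in a PyTorch-style setup. The names (NoiseGenerator, classifier, train_step) and the exact objectives are illustrative assumptions; the precise VADA losses may differ from this generic formulation.

```python
# Minimal sketch (hypothetical names; the exact VADA objectives may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseGenerator(nn.Module):
    """Predicts a per-sample Gaussian noise distribution (mean, log-variance)."""
    def __init__(self, feat_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * feat_dim),   # mean and log-variance
        )

    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        # Reparameterization trick: the sampled noise is conditioned on the input.
        eps = torch.randn_like(mu)
        noise = mu + torch.exp(0.5 * log_var) * eps
        return noise, mu, log_var

def train_step(classifier, generator, x, y, opt_cls, opt_gen, kl_weight=1e-3):
    # 1) Generator step: produce noise that increases the classification loss
    #    (adversarial direction), regularized toward a standard Gaussian.
    noise, mu, log_var = generator(x)
    adv_loss = F.cross_entropy(classifier(x + noise), y)
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    opt_gen.zero_grad()
    (-adv_loss + kl_weight * kl).backward()
    opt_gen.step()

    # 2) Classifier step: minimize the expected empirical risk over clean
    #    and noise-augmented samples.
    with torch.no_grad():
        noise, _, _ = generator(x)
    cls_loss = 0.5 * (F.cross_entropy(classifier(x), y)
                      + F.cross_entropy(classifier(x + noise), y))
    opt_cls.zero_grad()
    cls_loss.backward()
    opt_cls.step()
    return cls_loss.item()
```

In this sketch the generator and classifier are updated in alternation, so each sample receives its own learned noise distribution rather than a fixed, hand-tuned perturbation.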

Deep learning techniques are extensively employed in visual tasks such as autonomous driving and facial recognition. These models often incorporate data preprocessing modules to align images from various sources with their input requirements. However, recent studies indicate that such models can be vulnerable to image-scaling attacks, in which carefully crafted images change appearance completely after scaling and thereby mislead the model. Most vision devices that leverage deep learning are equipped with an image signal processing (ISP) pipeline, which converts RAW data into RGB images and integrates data preprocessing for efficient processing. Although numerous adversarial attacks have been proposed for deep learning, many fail to consider the combined impact of the ISP and data preprocessing, which undermines their effectiveness against real visual applications.
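As background, one common formulation of the image-scaling attack in the literature (not specific to this work) treats it as a constrained optimization: given a source image $S$, an attacker-chosen target image $T$, and the preprocessing function $\mathrm{scale}(\cdot)$, the attacker seeks a perturbation $\Delta$ such that

$$\min_{\Delta}\ \lVert \Delta \rVert_2^2 \quad \text{s.t.} \quad \lVert \mathrm{scale}(S+\Delta) - T \rVert_\infty \le \epsilon,$$

so the attack image $A = S + \Delta$ is visually close to $S$ at full resolution but resembles $T$ after downscaling.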
Therefore, we address this gap by considering the effects of the ISP and data preprocessing together. We construct an experimental platform for deep learning-based vision applications and introduce an image-scaling attack targeting the ISP pipeline. The attack crafts adversarial RAW data that, once processed through the ISP and scaled to specific dimensions, exhibits a completely different appearance. Our proposed attack relies on gradient information derived from the ISP process; however, obtaining gradients directly from the ISP, which is typically a closed process, is challenging. We therefore construct an equivalent model that learns the transformation of the target ISP and use its approximate gradients to launch the attack. Specifically, we devise an encoder–decoder architecture for the equivalent model to extract and reconstruct the corresponding RGB images, and we train it on a dataset of RAW-RGB image pairs generated by the target ISP pipeline. As training proceeds, the model captures the transformation process of the target ISP, providing the gradient approximation needed for the attack. We conducted extensive experiments to demonstrate the effectiveness of the proposed attack.
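The sketch below illustrates the two stages described above: training an encoder–decoder surrogate of the target ISP on RAW-RGB pairs, then crafting adversarial RAW data through the surrogate's gradients. All names (ISPSurrogate, raw_rgb_loader, target_rgb, the loss weights, and the perturbation budget) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical two-stage sketch: (1) fit a surrogate ISP, (2) attack through it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ISPSurrogate(nn.Module):
    """Encoder-decoder approximating RAW (1 channel) -> RGB (3 channels)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, raw):
        return self.decoder(self.encoder(raw))

def train_surrogate(model, raw_rgb_loader, epochs=20, lr=1e-3):
    """Stage 1: fit the surrogate on RAW-RGB pairs produced by the target ISP."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for raw, rgb in raw_rgb_loader:
            loss = F.l1_loss(model(raw), rgb)   # pixel-wise reconstruction loss
            opt.zero_grad(); loss.backward(); opt.step()
    return model

def craft_adversarial_raw(surrogate, clean_raw, target_rgb, out_size,
                          steps=500, lr=0.01, weight=10.0):
    """Stage 2: perturb the RAW so that, after the approximated ISP and
    downscaling, the RGB output resembles the attacker's target image."""
    delta = torch.zeros_like(clean_raw, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        rgb = surrogate(clean_raw + delta)                   # surrogate ISP
        scaled = F.interpolate(rgb, size=out_size,           # data preprocessing
                               mode='bilinear', align_corners=False)
        attack_loss = F.mse_loss(scaled, target_rgb)         # match the target
        stealth_loss = delta.pow(2).mean()                   # keep RAW change small
        loss = weight * attack_loss + stealth_loss
        opt.zero_grad(); loss.backward(); opt.step()
        delta.data.clamp_(-0.05, 0.05)                       # perturbation budget
    return (clean_raw + delta).detach()
```

The key point of the design is that the attack never queries the real ISP's gradients; it backpropagates through the learned surrogate and the scaling step, so the objective jointly enforces attack success after preprocessing and stealthiness in the RAW domain.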
The results are as follows: (1) Our attack against the target ISP process achieved a 100% success rate, meaning that the crafted adversarial RAW data was successfully transformed by the target ISP into an adversarial image that misled the model's predictions after scaling. (2) The peak signal-to-noise ratio (PSNR) values between the adversarial RAW and the clean RAW were comparable to those between the generated adversarial image and the source image, indicating that the perturbations introduced in the adversarial RAW are preserved in the generated attack image. Furthermore, all corresponding PSNR values exceeded 25 dB, underscoring the stealthiness of the attack.
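For reference, the PSNR between two images $x$ and $y$ with $N$ pixels and maximum pixel value $\mathrm{MAX}$ is

$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}(x, y)}, \qquad \mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - y_i)^2,$$

so higher values indicate that the two images are closer, and values above roughly 25 dB are commonly taken to mean the perturbation is difficult to notice visually.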
The construction of this attack spans several topics, including deep learning algorithms, image processing, and adversarial attack optimization. It therefore serves as a valuable educational resource for students to gain a deeper understanding of how deep learning-based visual applications process tasks, as well as of the vulnerabilities of deep learning models. This knowledge strengthens their innovative and practical abilities in addressing complex algorithmic problems.