Diffusion-Based Adaptation for Classification of Unknown Degraded Images

1Tokyo Institute of Technology
2Teikyo Heisei University
CVPR Workshops 2024
Figure 1: Architecture of the proposed method. The top block shows the overall inference process, and the bottom block shows the training process, which uses knowledge distillation from pre-trained teacher networks. The symbol © denotes concatenation of the inputs. Grey/blue and orange/blue blocks represent the pre-trained teacher and student networks, respectively.

Abstract

Classification of unknown degraded images is essential in practical applications, since image degradation models are usually unknown. Diffusion-based models provide enhanced performance for image enhancement and restoration from degraded images. In this study, we use a diffusion-based model for adaptation instead of restoration. Restoration from a degraded image aims to recover the degradation-free clean image, while adaptation transforms the degraded image toward the clean image domain. However, diffusion models struggle to perform image adaptation under certain degradations because the degradation models are unknown. To address the issue of imperfectly adapted clean images from diffusion models in the classification of degraded images, we propose a novel Diffusion-based Adaptation for Unknown Degraded images (DiffAUD) method based on robust classifiers trained on a few known degradations. Our proposed method complements diffusion models and consistently generalizes well across different types of degradations with varying severities. DiffAUD improves performance over the baseline diffusion model and clean classifier on the ImageNet-C dataset by 5.5%, 5%, and 5% with ResNet-50, Swin Transformer (Tiny), and ConvNeXt-Tiny backbones, respectively. Moreover, we show that training classifiers using known degradations provides significant performance gains for classifying degraded images.

Method

Motivation: The performance of typical classifiers drops significantly under unknown degradations. Hence, we employ a DDPM to shift degraded images toward the domain of clean images. We inherently assume that the domain of adapted images is better suited for classification than the unknown degraded images themselves. Indeed, previous studies such as DDA have shown that a DDPM helps improve the classification of unknown degraded images. The DDA method applies an ensemble of classifiers trained on clean images to both the degraded and adapted inputs to mitigate the limitations of imperfectly adapted images. In contrast, we train two separate classifiers on adapted and degraded images, which substantially improves classification performance for both. In particular, a classifier trained on adapted images from a limited set of known degradations anticipates imperfections in the adapted image, thereby contributing to the robustness of our proposed method. Similarly, a classifier trained on degraded images from a few dissimilar known degradations helps our method handle degraded images directly.

Notations: We categorize three types of training images, i.e., clean, degraded, and adapted, denoted x_c, x_d, and x_a, respectively. Clean images are natural images without degradation, degraded images are synthesized using a specific degradation model, and adapted images are sampled by applying a DDPM to degraded images. Furthermore, there are two types of classifiers in our study, i.e., simple classifiers and distilled classifiers. Both are trained using image and label pairs, where the images are clean, degraded, or adapted. We denote simple classifiers trained with clean, adapted, and degraded images as f_c, f_a, and f_d, respectively; likewise, distilled classifiers trained using clean, adapted, and degraded images are denoted g_c, g_a, and g_d, respectively. In addition, two other symbols are used in our study: D denotes the DDPM process for adaptation, such as the one described in DDA, and E denotes the ensemble, which combines the outputs of a set of distinct classifiers.
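The distilled classifiers are trained via knowledge distillation from pre-trained teacher networks (Figure 1), but the exact objective is not spelled out on this page. The sketch below therefore uses the standard distillation loss, combining hard-label cross-entropy with a temperature-softened KL term between teacher and student distributions; the function name and the `alpha`/`T` weighting are illustrative assumptions, not the paper's stated values.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, T=2.0):
    """Hard-label cross-entropy plus temperature-softened KL divergence
    to the teacher's output distribution (standard knowledge distillation)."""
    ce = -log_softmax(student_logits)[label]             # hard-label term
    log_ps = log_softmax(student_logits / T)             # softened student
    log_pt = log_softmax(teacher_logits / T)             # softened teacher
    p_t = np.exp(log_pt)
    kl = float(np.sum(p_t * (log_pt - log_ps))) * T * T  # soft-target term
    return alpha * ce + (1.0 - alpha) * kl
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy on the true label remains, so the student is still anchored to the ground-truth annotations while absorbing the teacher's softened predictions.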

Proposed Method: We propose DiffAUD, i.e., diffusion-based adaptation for unknown degraded images, as illustrated in Figure 1. The top block shows the overall process for classifying degraded images, which applies a diffusion model followed by an ensemble of the two distilled classifiers trained on adapted and degraded images to obtain the final classification prediction. To form the ensemble, we sum the logits of the two classifiers before the softmax function and take the highest-scoring class as the prediction. Our proposed method consists of three steps:
  1. Apply the DDPM to the degraded image to yield an adapted image.
  2. Feed the adapted image to a distilled classifier trained on adapted images from known degradations; in parallel, feed the degraded image directly to a distilled classifier trained on known degraded images.
  3. Apply the ensemble to the outputs of the two distilled classifiers to obtain the predicted class.
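The three steps above can be sketched as follows. Here `adapt`, `g_a`, and `g_d` are placeholders for the DDPM adaptation process and the two distilled classifiers (the names are illustrative), and the ensemble sums logits before the softmax as described:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def diffaud_predict(x_d, adapt, g_a, g_d):
    """DiffAUD inference: adapt the degraded image with the DDPM,
    run both distilled classifiers, sum their logits, and take the
    argmax of the softmax as the predicted class."""
    x_a = adapt(x_d)                        # Step 1: DDPM adaptation
    logits = g_a(x_a) + g_d(x_d)            # Step 2: classify adapted and degraded inputs
    return int(np.argmax(softmax(logits)))  # Step 3: logit-sum ensemble prediction
```

Since softmax is monotonic, summing logits before it yields the same argmax as summing the logits alone; the softmax is kept here only to mirror the description above.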

Sample training images

We show a few sample images used for training in our study. Each sample figure includes clean images together with pairs of their corresponding known degraded images and adapted images from the diffusion model.

Results

To provide an in-depth view of the ImageNet-C dataset, Figure 3 shows accuracy at different severity levels, averaged over all respective corruptions, for different backbones. As the severity level increases from 1 to 5, performance naturally drops for all methods. On the ResNet-50 backbone, the performance of the DDA method approaches that of the clean classifier toward low severity levels. Similarly, on the Swin-Tiny and ConvNeXt-Tiny backbones, we see the same pattern: the clean classifier performs comparably or, in fact, better than DDA at lower degradation levels. While DDA performs decently compared to the clean classifier at higher severity levels, severity levels are often unknown in real-world images, making the DDA method much more prone to performance reduction at lower severity levels. Next, the performance of the simple and distilled classifiers is very close; however, the distilled classifiers perform slightly better on the Swin-Tiny and ConvNeXt-Tiny backbones, showing the effectiveness of our distillation strategy for training classifiers.

On the other hand, our proposed method DiffAUD consistently performs substantially better at all severity levels than both the clean classifier and the DDA method, which shows that DiffAUD is robust to lower adaptation quality from diffusion models following the same diffusion process as DDA. This makes our work a significant step toward higher robustness and generalization, as it can work with different diffusion models and adaptation processes.

Figure 3: Performance with several backbones on the ImageNet-C dataset at different severity levels, averaged over all corruptions.

BibTeX

@InProceedings{Daultani_2024_CVPR,
    author    = {Daultani, Dinesh and Tanaka, Masayuki and Okutomi, Masatoshi and Endo, Kazuki},
    title     = {Diffusion-Based Adaptation for Classification of Unknown Degraded Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5982-5991}
}