HKU Data Repository
Semantic Shift Benchmark Corruption (SSB-C)

Dataset (14.45 GB)
posted on 2025-03-18, 02:06 authored by Hongjun Wang, Sagar Vaze, Kai Han

Generalized Category Discovery (GCD) is a challenging task in which, given a partially labelled dataset, models must categorize all unlabelled instances, regardless of whether they come from labelled categories or from new ones. In this paper, we challenge a remaining assumption in this task: that all images share the same domain. Specifically, we introduce a new task and method to handle GCD when the unlabelled data also contains images from domains different from that of the labelled set. Our proposed 'HiLo' networks extract High-level semantic and Low-level domain features, then minimize the mutual information between the two representations. Our intuition is that the clusterings based on domain information and on semantic information should be independent. We further extend our method with a specialized domain augmentation tailored for the GCD task, as well as a curriculum learning approach. Finally, we construct a benchmark from corrupted fine-grained datasets, together with a large-scale evaluation on DomainNet with real-world domain shifts, and reimplement a number of GCD baselines in this setting. We demonstrate that HiLo outperforms state-of-the-art category discovery models by a large margin on all evaluations.
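
The independence intuition above can be illustrated with a small numerical sketch. This is not the authors' implementation: it is a generic empirical mutual information estimate between two sets of cluster assignments, and the function name `mutual_information`, its arguments `p_sem` and `p_dom`, and the smoothing constant `eps` are all hypothetical names chosen for illustration. The value is close to zero when semantic and domain clusterings are independent and grows when one predicts the other.

```python
import numpy as np


def mutual_information(p_sem, p_dom, eps=1e-12):
    """Empirical mutual information (in nats) between two clusterings.

    p_sem: (N, K_s) array of per-sample semantic cluster probabilities.
    p_dom: (N, K_d) array of per-sample domain cluster probabilities.
    Both are assumed (for this sketch) to have rows summing to 1.
    """
    p_sem = np.asarray(p_sem, dtype=float)
    p_dom = np.asarray(p_dom, dtype=float)
    n = p_sem.shape[0]

    # Joint distribution over (semantic cluster, domain cluster) pairs,
    # averaged over the N samples.
    joint = p_sem.T @ p_dom / n

    # Marginals of the joint distribution.
    m_sem = joint.sum(axis=1, keepdims=True)
    m_dom = joint.sum(axis=0, keepdims=True)

    # Sum of joint * log(joint / product of marginals); eps guards
    # against division by zero and log(0) for empty cells.
    ratio = (joint + eps) / (m_sem * m_dom + eps)
    return float(np.sum(joint * np.log(ratio)))


# Toy hard assignments for four samples (hypothetical data).
semantic = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
domain_independent = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

mi_independent = mutual_information(semantic, domain_independent)
mi_dependent = mutual_information(semantic, semantic)
```

In this toy case `mi_independent` is approximately 0 and `mi_dependent` is approximately log 2, since the second pairing makes one clustering fully determine the other. A training objective in this spirit would push the first quantity toward zero; how HiLo estimates and minimizes it is described in the paper, not here.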

Funding

National Natural Science Foundation of China (Grant No. 62306251)

HKU Seed Fund for Basic Research

Hong Kong Research Grants Council - Early Career Scheme (Grant No. 27208022)
