From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
– Published Date : TBD
– Category : Weakly Supervised Semantic Segmentation
– Place of publication : IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (oral)
Abstract:
Weakly Supervised Semantic Segmentation (WSSS) aims to learn the concept of segmentation using only image-level class labels. Recent WSSS works have shown promising results by using the Segment Anything Model (SAM), a foundation model for segmentation, during the inference phase. However, we observe that these methods can still be vulnerable to noise in the class activation maps (CAMs) that serve as initial seeds. As a remedy, this paper introduces From-SAM-to-CAMs (S2C), a novel WSSS framework that directly transfers the knowledge of SAM to the classifier during the training process, enhancing the quality of the CAMs themselves. S2C comprises SAM-segment Contrasting (SSC) and a CAM-based prompting module (CPM), which exploit SAM at the feature and logit levels, respectively. SSC performs prototype-based contrasting using SAM's automatic segmentation results: it constrains each feature to be close to the prototype of its own segment and distant from the prototypes of the other segments. Meanwhile, CPM extracts prompts from the CAM of each class and uses them to generate class-specific segmentation masks through SAM. The masks are aggregated into a unified self-supervision signal based on a confidence score designed to reflect the reliability of both SAM and the CAMs. S2C achieves new state-of-the-art performance across all benchmarks, outperforming existing studies by significant margins. The code will be available soon.
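The prototype-based contrasting in SSC can be sketched as a standard prototype-contrastive loss: average the features inside each SAM segment to get a prototype, then apply a cross-entropy over cosine similarities so each feature matches its own segment's prototype. This is a minimal NumPy illustration of the general idea, not the paper's implementation; the function name `ssc_loss` and the temperature value are assumptions.

```python
import numpy as np

def ssc_loss(features, segment_ids, temperature=0.1):
    """Hypothetical sketch of prototype-based contrasting (not the paper's code).

    features    : (N, D) array of per-pixel features.
    segment_ids : (N,) array assigning each feature to a SAM segment.
    Each feature is pulled toward the prototype (mean feature) of its own
    segment and pushed away from the prototypes of all other segments.
    """
    # L2-normalize features so similarities are cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    segs = np.unique(segment_ids)
    # Prototype of each segment = normalized mean of its features.
    protos = np.stack([feats[segment_ids == s].mean(axis=0) for s in segs])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    # Similarity of every feature to every prototype, scaled by temperature.
    logits = feats @ protos.T / temperature
    # Cross-entropy with each feature's own segment as the positive class
    # (numerically stable log-softmax).
    targets = np.searchsorted(segs, segment_ids)
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(feats)), targets].mean()
```

Under this formulation, features from well-separated segments yield a low loss, while features whose segment assignment disagrees with their clustering yield a high one, which is the pressure SSC exerts on the classifier's representation.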