With the increasing application of deep neural networks (DNNs) in personal- and property-security-related scenarios, ensuring the interpretability and trustworthiness of DNN models is crucial. Concept Bottleneck Models (CBMs) improve interpretability by predicting human-understandable concepts at an intermediate layer and using them for the final task, but they face efficiency and interpretability challenges in the multi-label classification (MLC) of concepts, either ignoring concept correlations or relying on complex models with limited performance gains. To address the large parameter counts and limited interpretability of existing approaches to concept MLC, we propose a novel Visual-Projecting CBM (ViP-CBM), which reformulates the MLC of concepts as an input-dependent binary classification problem over concept embeddings, using visual features for projection. Our ViP-CBM reduces the number of trainable parameters by more than 50% compared with other embedding-based CBMs while achieving comparable or even better performance in concept and class prediction. ViP-CBM also provides a more intuitive explanation by visualizing the projected embedding space. Additionally, we propose an intervention method for ViP-CBM, which experiments show to be more efficient under joint training than intervention in other embedding-based CBMs.
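To make the reformulation concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: learnable per-concept embeddings, a projection of visual features into the embedding space, and a per-concept binary score feeding the class head. All module names, dimensions, and the similarity-based scoring are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of the ViP-CBM idea from the abstract.
# Names (ViPCBMSketch, proj, concept_emb) and the dot-product scoring
# are assumptions for illustration; the actual model may differ.
import torch
import torch.nn as nn


class ViPCBMSketch(nn.Module):
    def __init__(self, num_concepts: int, feat_dim: int,
                 emb_dim: int, num_classes: int):
        super().__init__()
        # One learnable embedding per concept.
        self.concept_emb = nn.Parameter(torch.randn(num_concepts, emb_dim))
        # Projects visual features into the concept-embedding space.
        self.proj = nn.Linear(feat_dim, emb_dim)
        # Final task head operating on predicted concept probabilities.
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, visual_feat: torch.Tensor):
        # visual_feat: (batch, feat_dim), e.g. from a vision backbone.
        z = self.proj(visual_feat)                 # (batch, emb_dim)
        # Input-dependent binary logit per concept: similarity between
        # the projected visual feature and each concept embedding.
        concept_logits = z @ self.concept_emb.t()  # (batch, num_concepts)
        class_logits = self.classifier(torch.sigmoid(concept_logits))
        return concept_logits, class_logits


# Usage sketch: feats = backbone(images); c_logits, y_logits = model(feats)
```

Because all concepts share one projection and one embedding table rather than separate per-concept networks, a design along these lines is consistent with the abstract's claim of a much smaller trainable parameter set than other embedding-based CBMs.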