
ViP-CBM: A Low-parameter and Interpretable Concept Bottleneck Model Using Visual-projected Embeddings

Ji Qi, Tsinghua University, China, qij21@mails.tsinghua.edu.cn; Huisheng Wang, Tsinghua University, China; H. Vicky Zhao, Tsinghua University, China
 
Suggested Citation
Ji Qi, Huisheng Wang and H. Vicky Zhao (2025), "ViP-CBM: A Low-parameter and Interpretable Concept Bottleneck Model Using Visual-projected Embeddings", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 4, e301. http://dx.doi.org/10.1561/116.20250015

Publication Date: 28 Oct 2025
© 2025 J. Qi, H. Wang and H. V. Zhao
 
Subjects
Deep learning,  Classification and prediction
 
Keywords
Concept bottleneck models, multi-label classification, visual projection
 


Open Access

This article is published under the terms of CC BY-NC.


In this article:
Introduction 
Related Works 
The Proposed ViP-CBM Framework 
Experimental Setup 
Experimental Results 
Conclusions 
References 

Abstract

With the increasing application of deep neural networks (DNNs) in personal and property security-related scenarios, ensuring the interpretability and trustworthiness of DNN models is crucial. Concept Bottleneck Models (CBMs) improve interpretability by predicting human-understandable concepts in a hidden layer before the final task, but they face efficiency and interpretability challenges in the multi-label classification (MLC) of concepts, such as ignoring concept correlations or relying on complex models with limited performance gains. To address the massive parameter count and limited interpretability of the concept MLC problem, we propose a novel Visual-Projecting CBM (ViP-CBM), which reformulates the MLC of concepts as an input-dependent binary classification problem over concept embeddings projected by visual features. Our ViP-CBM reduces the number of trainable parameters by more than 50% compared with other embedding-based CBMs while achieving comparable or even better performance in concept and class prediction. It also provides a more intuitive explanation by visualizing the projected embedding space. In addition, we propose an intervention method for ViP-CBM, which experiments show to be more efficient than other embedding-based CBMs under joint training.
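To make the idea in the abstract concrete, the following is a minimal, hypothetical sketch of a visual-projection concept head: learnable concept embeddings are modulated (projected) by the image's visual feature, each projected embedding is scored as a binary present/absent concept, and the concept activations form the bottleneck for the class prediction. The module name, the elementwise-product projection, the shared single scorer, and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ViPConceptHead(nn.Module):
    """Hypothetical visual-projected concept bottleneck head (illustrative only)."""

    def __init__(self, num_concepts: int, feat_dim: int, emb_dim: int, num_classes: int):
        super().__init__()
        # One learnable embedding per concept, shared across all images.
        self.concept_emb = nn.Parameter(torch.randn(num_concepts, emb_dim))
        # Projects the backbone's visual feature into the concept-embedding space.
        self.visual_proj = nn.Linear(feat_dim, emb_dim)
        # A single scorer shared by all concepts keeps the parameter count low
        # (one weight vector instead of a per-concept classifier).
        self.concept_score = nn.Linear(emb_dim, 1)
        # Task head on top of the predicted concept activations (the bottleneck).
        self.task_head = nn.Linear(num_concepts, num_classes)

    def forward(self, visual_feat: torch.Tensor):
        # visual_feat: (batch, feat_dim) from any CNN/ViT backbone.
        v = self.visual_proj(visual_feat)                             # (batch, emb_dim)
        # Input-dependent ("visual-projected") concept embeddings via broadcasting.
        projected = self.concept_emb.unsqueeze(0) * v.unsqueeze(1)    # (batch, K, emb_dim)
        concept_logits = self.concept_score(projected).squeeze(-1)    # (batch, K)
        class_logits = self.task_head(torch.sigmoid(concept_logits))  # (batch, num_classes)
        return concept_logits, class_logits

# Example usage with made-up sizes (e.g., CUB-like: 112 concepts, 200 classes).
head = ViPConceptHead(num_concepts=112, feat_dim=2048, emb_dim=64, num_classes=200)
concept_logits, class_logits = head(torch.randn(8, 2048))
```

In this reading, interventions would amount to overwriting selected concept logits (or their sigmoid activations) with ground-truth values before the task head; how ViP-CBM actually performs interventions is described in the paper itself.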

DOI:10.1561/116.20250015

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Invited Papers from APSIPA ASC 2024