Computer vision for high-stakes, real-world applications necessitates robust explanation and transparency to ensure trust, accountability, and ethical deployment. Celebrating its 5th anniversary, the Explainable AI for Computer Vision (XAI4CV) workshop provides a premier forum for the entire spectrum of XAI research, from interpretable-by-design models to challenges in multimodal foundation models. The program includes invited talks, spotlight papers, a poster session, and a tutorial. XAI4CV accepts paper and demo submissions to define the future of trustworthy visual AI.
The schedule is in MDT local time.
| Time (MDT) | Session |
| --- | --- |
| 08:15–08:30 | Opening |
| 08:30–09:00 | Invited Talk 1 – Chaofan Chen. Learning by Comparison: Case-based Reasoning for Interpretable Vision Models. Deep neural networks have achieved remarkable success in computer vision, yet their decision-making processes often remain opaque. This lack of transparency often limits their use in high-stakes domains such as healthcare. In this talk, I will present a line of work focused on interpretable vision models that make predictions using case-based reasoning, by comparing inputs with learned prototypical examples. I will also highlight how these interpretable models could be used in real-world applications, such as medical imaging, to improve transparency and user trust in AI-assisted decision-making. |
| 09:00–09:30 | Invited Talk 2 – Anh Totti Nguyen. Vision Language Models with Explainable Bottleneck Layers. By design, a VLM is a challenging type of neural network to "explain" due to (1) the parallel processing of multi-head attention; (2) dual image-text input streams; and (3) the lack of a natural interface to point users to where the evidence is in the input image. Empirically, VLMs tend to be heavily biased towards the knowledge in the language model and sometimes ignore the signals from the image itself. In this talk, I will advocate for explaining VLMs by first defining a bottleneck layer through which explanatory information is exchanged in both directions between the user and the neural network. Depending on where in the network we want to extract information and insert information back into the network (for an ablation study or further processing), we need to build a different explainability component. First, I'll talk about the phenomenon that VLMs often act like a short-sighted person (VLMs are Blind) and are heavily biased towards the text data (VLMs are Biased). Then, I'll share distinct attempts at building an explainable bottleneck at various points in the VLM: the attention layer (Transformer Attention Bottleneck); the penultimate layer (PEEB); and the output layer of VLMs (Highlighted Chain of Thoughts, PageGuide, SketchVLM). I'll also share our most recent demos of how frontier VLMs (Gemini 3 Pro) can be turned into a VLM that annotates your input image or computer screen to guide users through local-computer or web tasks. |
| 09:30–10:00 | Spotlight Session 1. (1) Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision. Gerasimos Chatzoudis, Konstantinos D. Polyzos, Zhuowei Li, Difei Gu, Gemma Elyse Moran, Hao Wang, Dimitris Metaxas. (2) Multi-Granularity Concept Whitening for Neural Network Interpretability. Russell Barton, Hung Le, Yunhong Shan, Alexander Katopodis, Jonathan Donnelly, Eric Chen, Chaofan Chen, Cynthia Rudin. (3) Activation-Based Concept Extraction for Explainability in Image Classification. Matteo Bianchi, Riccardo Campi, Antonio De Santis, Sara Merengo, Marco Brambilla. |
| 10:00–10:30 | Coffee Break |
| 10:30–11:00 | Invited Talk 3 – Elizabeth Barnes. The Final Model Is Not Enough: Training Dynamics as a Form of XAI. AI weather and climate models now rival operational forecast systems, but interpreting these models remains a major challenge. Standard explainable AI methods, while effective for simpler climate prediction networks, have proven difficult to apply to large autoregressive emulators operating on high-dimensional spatiotemporal fields. In this talk, I argue that the training trajectory itself offers a complementary and underutilized diagnostic lens. Using AI weather emulators as a testbed, I show that models actively learn and then unlearn specific extreme weather events during training, that learning dynamics differ qualitatively across forecast tasks, and that designed perturbation experiments can pinpoint when models acquire knowledge of physical relationships between variables. These results suggest that for complex AI models where post-hoc explanation is intractable, studying how a model arrives at its final state may be as revealing as explaining what that final state does. |
| 11:00–11:30 | Spotlight Session 2. (1) DINO-QPM: Adapting Visual Foundation Models for Globally Interpretable Image Classification. Robert Zimmermann, Thomas Norrenbrock, Bodo Rosenhahn. (2) FaCT: Faithful Concept Traces for Explaining Neural Networks. Amin Parchami-Araghi, Sukrut Rao, Jonas Fischer, Bernt Schiele. (3) Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning. Nhi Pham, Artur Jesslen, Bernt Schiele, Adam Kortylewski, Jonas Fischer. |
| 11:30–12:00 | Tutorial – Maximilian Dreyer. From Concepts to Control: Diagnosing and Steering Vision Foundation Models. This tutorial shows how to turn concept-based explanations into diagnosis and control for vision(-language) models. We train a sparse autoencoder (SAE) on CLIP embeddings to discover interpretable components, explore them via textual search, and compute component-level attributions for single examples to see which concepts drive predictions. We then use these concepts to reveal biases (e.g., gender associations for “nurse”) and demonstrate lightweight control strategies: suppressing or amplifying specific concepts to test and correct the model. We briefly replicate the workflow for Qwen3‑VL, noting the differences for multimodal hidden states. Participants leave with a practical recipe, pitfalls to avoid, and guidance for diagnosing and correcting models that rely on undesired concepts. The tutorial is available on GitHub. |
| 12:00–12:15 | Closing Remarks |
| 12:15–13:00 | Poster Session |
We thank our great Program Committee members who made this workshop possible!
Alina Barnett, Quentin Bouniot, Thea Brüsch, Tzoulio Chamiti, Jinwoo Choi, Joseph-Paul Cohen, Fernando Díaz-De-María, Maximilian Dreyer, Jonathan Donnelly, Miguel-Ángel Fernández-Torres, Marina Gavrilova, John Gkountouras, Maria Gonzalez-Calabuig, Iván González-Díaz, Ada Görgün, Sadaf Gulshad, Shashank Gupta, Adrian Höhl, Nils Huetten, Dahye Kim, Tobias Labarta, Lorenz Linhardt, Manxi Lin, Raphael Maser, Miguel Molina-Moreno, Elisa Nguyen, Ivica Obadic, Matthew Olson, Indu Panigrahi, Amin Parchami-Araghi, Paraskevas Pegios, Nhi Pham, Vipin Pillai, Sukrut Rao, Fawaz Sammani, Simone Schaub-Meyer, Mayank Singh, Shreya Tendulkar, Lenka Tětková, Navneet Tyagi, Kristoffer Wickstrøm, Romain Xu-Darme, Xiwei Xuan, Luna Zhang, Mengxue Zhang
We welcome paper and demo submissions:
We encourage submissions on topics including, but not limited to:
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.