D-CRISP: Explaining Object Detectors by combining Randomized and Segment-based Perturbations

Authors: Alain Andres*, Javier Del Ser

Published in the European Conference on Artificial Intelligence (ECAI), 2025

The full paper can be found here.

Abstract: Explaining the decisions issued by Machine Learning models for object detection tasks is essential in high-stakes decision-making scenarios, such as medical image processing and vehicular perception for autonomous driving. Despite the proliferation of post-hoc perturbation-based methods for generating visual explanations, most eXplainable AI (XAI) approaches rely exclusively on either random image masking or selective segmentation-based occlusion, missing the opportunity to leverage both strategies in a complementary fashion. In this paper, we address this gap by proposing D-CRISP (Detector-Combining Randomized Input and Segment Perturbations), a novel post-hoc explanation method for object detection models. D-CRISP unifies random occlusions with region-based occlusions derived from image segmentation, producing multiscale saliency maps that capture both granular (pixel-level) and semantic (region-level) cues about the objects detected by the model.

Experiments on the MS-COCO dataset show that D-CRISP significantly outperforms random-masking approaches in terms of explanation faithfulness and localization, while requiring only slightly more computational effort than these methods. At the same time, it achieves performance comparable to or better than segmentation-based methods, yet with substantially lower mask generation latencies. These results position D-CRISP as a highly effective and efficient XAI alternative for object detection models, particularly suited for time-constrained applications requiring timely, accurate, and interpretable decisions.
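To give a flavor of the general idea behind combining randomized and segment-based perturbations for detector saliency, the sketch below shows a minimal, hypothetical implementation. It is not the D-CRISP algorithm itself (the actual aggregation, mask design, and scoring are detailed in the full paper): the RISE-style random grids, the SLIC superpixel segmentation from scikit-image, the IoU-weighted scoring, and the assumed `detector(image) -> [(box, score), ...]` interface are all illustrative assumptions.

```python
# Hypothetical sketch: perturbation-based saliency for an object detector,
# mixing random (RISE-style) masks with segmentation-derived region masks.
# Not the official D-CRISP implementation; interfaces are assumptions.

import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize


def random_masks(n, image_shape, grid=8, p=0.5, rng=None):
    """RISE-style masks: small random binary grids upsampled to image size."""
    rng = rng or np.random.default_rng(0)
    h, w = image_shape[:2]
    grids = (rng.random((n, grid, grid)) < p).astype(float)
    return np.stack([resize(g, (h, w), order=1, anti_aliasing=False) for g in grids])


def segment_masks(image, n_segments=100):
    """One binary mask per superpixel region produced by SLIC segmentation."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    return np.stack([(labels == k).astype(float) for k in np.unique(labels)])


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def saliency_for_detection(image, detector, target_box, masks):
    """Weight each mask by how well the detector still finds the target
    object in the masked image, then aggregate into a saliency map."""
    h, w = image.shape[:2]
    sal = np.zeros((h, w))
    for m in masks:
        masked = (image * m[..., None]).astype(image.dtype)
        # Score: best IoU-weighted confidence over the masked image's detections.
        score = max((iou(b, target_box) * s for b, s in detector(masked)), default=0.0)
        sal += score * m
    return sal / (masks.sum(axis=0) + 1e-8)


# Usage sketch: concatenate both mask families before aggregation.
# masks = np.concatenate([random_masks(500, image.shape), segment_masks(image)])
# saliency = saliency_for_detection(image, detector, target_box, masks)
```

In this kind of setup, the random masks contribute pixel-level granularity while the segmentation masks contribute semantically coherent regions; how D-CRISP actually balances and aggregates the two is described in the paper.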