
Learning to Be a Transformer to Pinpoint Anomalies

Costanzino, Alex; Zama Ramirez, Pierluigi; Lisanti, Giuseppe; Di Stefano, Luigi
2025

Abstract

To efficiently deploy strong, often pre-trained, feature extractors, recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224×224 pixels, obtained by downsampling the original input images. However, while numerous industrial applications demand the identification of both large and small defects, downsampling the input image to a low resolution may hinder a method's ability to pinpoint tiny anomalies. We propose a novel Teacher-Student paradigm that leverages strong pre-trained features while processing high-resolution input images very efficiently. The core idea is to train two shallow MLPs (the Students) on nominal images so as to mimic the mappings between the patch embeddings induced by the self-attention layers of a frozen vision Transformer (the Teacher). Learning these mappings sets forth a challenging pretext task that small-capacity models are unlikely to accomplish on out-of-distribution data such as anomalous images. Our method can spot anomalies in high-resolution images and runs considerably faster than competitors, achieving state-of-the-art performance on MVTec AD and the best segmentation results on VisA. We also propose novel evaluation metrics that capture robustness to defect size, i.e., the ability to preserve good localisation from large anomalies down to tiny ones. Evaluated by these metrics as well, our method shows markedly superior performance.
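The abstract describes the mechanism but not the implementation. Below is a minimal PyTorch-style sketch of the stated idea under our own assumptions: the module names, the MSE objective, and the way Teacher features are captured are illustrative choices, not the authors' code. A shallow Student MLP regresses the output of one of the frozen Teacher's self-attention layers from that layer's input; since the Student is trained only on nominal images, its per-patch residual against the Teacher serves as the anomaly score at test time.

```python
# Illustrative sketch only (assumed names and objective, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentMLP(nn.Module):
    """Shallow MLP trained to mimic the patch-embedding mapping computed
    by one self-attention layer of the frozen Teacher ViT."""
    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) patch embeddings entering the attention layer.
        return self.net(tokens)

def training_step(teacher_in, teacher_out, student, optimizer):
    """Pretext task on nominal images only: regress the Teacher's
    post-attention embeddings from its pre-attention embeddings.
    teacher_in / teacher_out are assumed to be captured from a frozen
    ViT, e.g., via forward hooks on an attention block."""
    pred = student(teacher_in)
    loss = F.mse_loss(pred, teacher_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def anomaly_map(teacher_in, teacher_out, student):
    """Per-patch anomaly score: on out-of-distribution (anomalous) patches
    the small-capacity Student fails to mimic the Teacher, so the residual
    is large there. Returns a (B, N) map over patch positions."""
    return (student(teacher_in) - teacher_out).pow(2).mean(dim=-1)
```

In this reading, two such Students (one per mimicked mapping, as the abstract mentions) are cheap enough to run on high-resolution inputs, while the frozen Teacher supplies the strong pre-trained features; the (B, N) residual map is reshaped to the patch grid and upsampled to obtain the segmentation.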
Costanzino, A., Zama Ramirez, P., Lisanti, G., Di Stefano, L. (2025). Learning to Be a Transformer to Pinpoint Anomalies. IEEE Access, 13, 109624-109637. doi: 10.1109/ACCESS.2025.3582900.
Files in this item:

File: Learning_to_Be_a_Transformer_to_Pinpoint_Anomalies.pdf
Access: Open access
Type: Version of Record (publisher PDF)
License: Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 5.98 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11585/1028606
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0