Despite significant advances in text-based reasoning, the ability of multimodal models to perceive and interpret visual art remains poorly understood. This study explores how CLIP “sees” and understands artworks—focusing on European paintings from the Renaissance onward—by examining its capacity to recognize style, historical period, and cultural context. We evaluate both the strengths and limitations of CLIP’s visual perception, particularly its alignment with human judgments. Our analysis spans paintings created by both humans and AI, probing CLIP’s sensitivity to subtle yet meaningful differences between them, as well as its ability to detect characteristic artifacts and distortions introduced by generative models. The investigation draws on several recent datasets of AI-generated art, offering a critical assessment of their data acquisition methodologies, structure, and benchmarking potential. This work contributes to bridging computational perception and human artistic interpretation, offering new insights into the evolving role of AI in art analysis.

Asperti, A., Dessì, L., Wu, N. (2026). Art through CLIP’s Eyes. ACM Journal on Computing and Cultural Heritage, 19(2), 1-33 [10.1145/3812546].

Art through CLIP’s Eyes

2026

Asperti, Andrea; Dessì, Leonardo; Wu, Nico

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1069770