Asperti, A., Dessì, L., Wu, N. (2026). Art through CLIP’s Eyes. ACM Journal on Computing and Cultural Heritage, 19(2), 1-33 [10.1145/3812546].
Art through CLIP’s Eyes
Asperti, Andrea
2026
Abstract
Despite significant advances in text-based reasoning, the ability of multimodal models to perceive and interpret visual art remains poorly understood. This study explores how CLIP “sees” and understands artworks—focusing on European paintings from the Renaissance onward—by examining its capacity to recognize style, historical period, and cultural context. We evaluate both the strengths and limitations of CLIP’s visual perception, particularly its alignment with human judgments. Our analysis spans paintings created by both humans and AI, probing CLIP’s sensitivity to subtle yet meaningful differences between them, as well as its ability to detect characteristic artifacts and distortions introduced by generative models. The investigation draws on several recent datasets of AI-generated art, offering a critical assessment of their data acquisition methodologies, structure, and benchmarking potential. This work contributes to bridging computational perception and human artistic interpretation, offering new insights into the evolving role of AI in art analysis.



