Multimodal systems and Large Language Models have shown remarkable capabilities in text-based reasoning, yet their capacity to perceive and interpret visual art remains uncertain. This study examines how CLIP “sees” and understands artworks by comparing its responses to human- and AI-generated paintings in the European tradition from the Renaissance onward. The analysis focuses on its ability to identify style, period and cultural context, as well as potential biases in its perception, evaluated against human judgments.
Asperti, A., Dessi, L., Tonetti, M.C., Wu, N. (2025). Does CLIP Perceive Art the Same Way We Do? New York: IEEE [10.1109/cbmi66578.2025.11339321].
Does CLIP Perceive Art the Same Way We Do?
Asperti, Andrea; Tonetti, Maria Chiara
2025
| File | Type | License | Size | Format |
|---|---|---|---|---|
| CLIP_perception__IEEE_trans_arxiv__compressed.pdf (embargoed until 19 January 2028) | Postprint / Author's Accepted Manuscript (AAM): version accepted for publication after peer review | License for free open access | 221.4 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.



