Looking at words and points with attention: a benchmark for text-to-shape coherence

Amaduzzi A.; Lisanti G.; Salti S.; Di Stefano L.

2023

Abstract

While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. First, we employ large language models to automatically refine the textual descriptions associated with shapes. Second, we propose a quantitative metric that assesses text-to-shape coherence through cross-attention mechanisms. To validate our approach, we conduct a user study and quantitatively compare our metric with existing ones. The refined dataset, the new metric, and a set of text-shape pairs validated by the user study constitute a novel, fine-grained benchmark that we publicly release to foster research on the text-to-shape coherence of text-conditioned 3D generative models. The benchmark is available at https://cvlab-unibo.github.io/CrossCoherence-Web/.

Year: 2023
Volume title: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops - ICCVW
First page: 2860
Last page: 2869
Series: ... IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS
DOI: https://dx.doi.org/10.1109/ICCVW60793.2023.00309
Citation: Amaduzzi A., Lisanti G., Salti S., Di Stefano L. (2023). Looking at words and points with attention: a benchmark for text-to-shape coherence [10.1109/ICCVW60793.2023.00309].
All authors: Amaduzzi A.; Lisanti G.; Salti S.; Di Stefano L.
Publication type: 4.01 Contribution in conference proceedings

Files in this item:

File	Access	Type	License	Size	Format
23_ICCVW_Looking_at_Words_and_Points_with_Attention_a_Benchmark_for.pdf	Open access	Postprint	Free open-access license	42.3 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/955652
