CRIS Current Research Information System

Neural Radiance Fields (NeRFs) are neural networks -- typically multilayer perceptrons (MLPs) -- that represent the geometry and appearance of objects, with applications in vision, graphics, and robotics. Recent works propose understanding NeRFs with natural language using Multimodal Large Language Models (MLLMs) that directly process the weights of a NeRF's MLP. However, these approaches rely on a global representation of the input object, making them unsuitable for spatial reasoning and fine-grained understanding. In contrast, we propose **weights2space**, a self-supervised framework featuring a novel meta-encoder that can compute a sequence of spatial tokens directly from the weights of a NeRF. Leveraging this representation, we build **Spatial LLaNA**, a novel MLLM for NeRFs, capable of understanding details and spatial relationships in objects represented as NeRFs. We evaluate Spatial LLaNA on NeRF captioning and NeRF Q&A tasks, using both existing benchmarks and our novel **Spatial ObjaNeRF** dataset consisting of 100 manually-curated language annotations for NeRFs. This dataset features 3D models and descriptions that challenge the spatial reasoning capability of MLLMs. Spatial LLaNA outperforms existing approaches across all tasks.

Amaduzzi, A., Zama Ramirez, P., Lisanti, G., Salti, S., Di Stefano, L. (2025). Spatially-aware Weights Tokenization for NeRF-Language Models.

Spatially-aware Weights Tokenization for NeRF-Language Models

Andrea Amaduzzi;Pierluigi Zama Ramirez;Giuseppe Lisanti;Samuele Salti;Luigi Di Stefano

2025

Abstract

Neural Radiance Fields (NeRFs) are neural networks -- typically multilayer perceptrons (MLPs) -- that represent the geometry and appearance of objects, with applications in vision, graphics, and robotics. Recent works propose understanding NeRFs with natural language using Multimodal Large Language Models (MLLMs) that directly process the weights of a NeRF's MLP. However, these approaches rely on a global representation of the input object, making them unsuitable for spatial reasoning and fine-grained understanding. In contrast, we propose **weights2space**, a self-supervised framework featuring a novel meta-encoder that can compute a sequence of spatial tokens directly from the weights of a NeRF. Leveraging this representation, we build **Spatial LLaNA**, a novel MLLM for NeRFs, capable of understanding details and spatial relationships in objects represented as NeRFs. We evaluate Spatial LLaNA on NeRF captioning and NeRF Q&A tasks, using both existing benchmarks and our novel **Spatial ObjaNeRF** dataset consisting of 100 manually-curated language annotations for NeRFs. This dataset features 3D models and descriptions that challenge the spatial reasoning capability of MLLMs. Spatial LLaNA outperforms existing approaches across all tasks.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Titolo del volume
	
				39th Conference on Neural Information Processing Systems (NeurIPS 2025)
			
	Pagina iniziale
	
				1
			
	Pagina finale
	
				30
			
	Collana/Serie
	
				ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
			
	Citazione
	
				Amaduzzi, A., Zama Ramirez, P., Lisanti, G., Salti, S., Di Stefano, L. (2025). Spatially-aware Weights Tokenization for NeRF-Language Models.
			
	Tutti gli autori
	
						Amaduzzi, Andrea; Zama Ramirez, Pierluigi; Lisanti, Giuseppe; Salti, Samuele; Di Stefano, Luigi
					
	Appare nelle tipologie:
	
				4.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
output_neurips.pdf accesso aperto Tipo: Versione (PDF) editoriale / Version Of Record Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione - Non commerciale (CCBYNC) Dimensione 1.78 MB Formato Adobe PDF Visualizza/Apri	1.78 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/1040602

Citazioni

ND

ND

ND

ND

social impact