Caporali, A., Galassi, K., Palli, G. (2024). DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception. IEEE ROBOTICS AND AUTOMATION LETTERS, 9(12), 11385-11392 [10.1109/lra.2024.3491428].

DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception

Caporali, Alessio; Galassi, Kevin; Palli, Gianluca

2024

Abstract

The perception of Deformable Linear Objects (DLOs) is a challenging task due to their complex and ambiguous appearance, lack of discernible features, typically small sizes, and deformability. Despite these challenges, achieving a robust and effective segmentation of DLOs is crucial to introduce robots into environments where they are currently underrepresented, such as domestic and complex industrial settings. In this context, the integration of language-based inputs can simplify the perception task while also enabling the possibility of introducing robots as human companions. Therefore, this letter proposes a novel architecture for the perception of DLOs, wherein the input image is augmented with a text-based prompt guiding the segmentation of the target DLO. After encoding the image and text separately, a Perceiver-inspired structure is exploited to compress the concatenated data into transformer layers and generate the output mask from a latent vector representation. The method is experimentally evaluated on real-world images of DLOs like electrical cables and ropes, validating its efficacy and efficiency in real practical scenarios.
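
The data flow described in the abstract (separate encodings, concatenation, compression into a small latent array, mask generation from the latents) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the token counts, the feature width `D`, the random stand-in encodings, the projection-free `cross_attention` helper, and the mean-plus-sigmoid readout are all assumptions chosen only to make the idea concrete.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections (an assumption).

    Each query row becomes a weighted average of the rows in keys_values.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, keys_values)


def random_tokens(n, d, rng):
    """Hypothetical stand-in for the output of an image or text encoder."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def perceiver_style_mask(image_tokens, text_tokens, latents):
    # 1) Concatenate the separately encoded image and text tokens.
    tokens = image_tokens + text_tokens
    # 2) Compress the long token sequence into a short latent array.
    z = cross_attention(latents, tokens)
    # 3) One self-attention step stands in for the transformer layers.
    z = cross_attention(z, z)
    # 4) Assumed readout: every image token queries the latents, and the
    #    mean of the result is squashed to a per-token score in (0, 1).
    decoded = cross_attention(image_tokens, z)
    return [1.0 / (1.0 + math.exp(-sum(v) / len(v))) for v in decoded]


rng = random.Random(0)
D = 8                                      # assumed feature width
image_tokens = random_tokens(16, D, rng)   # e.g. a 4 x 4 grid of patches
text_tokens = random_tokens(5, D, rng)     # e.g. a five-token prompt
latents = random_tokens(4, D, rng)         # learned in practice, random here
mask = perceiver_style_mask(image_tokens, text_tokens, latents)
```

The point of the latent bottleneck is cost: attention over the 21 concatenated tokens is computed only against 4 latent rows, so the work grows linearly with the input length rather than quadratically. A trained model would replace the random arrays with learned encoders and latents and would add learned projections and an output head.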

Year: 2024
Journal: IEEE ROBOTICS AND AUTOMATION LETTERS
DOI: https://dx.doi.org/10.1109/lra.2024.3491428
Citation: Caporali, A., Galassi, K., Palli, G. (2024). DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception. IEEE ROBOTICS AND AUTOMATION LETTERS, 9(12), 11385-11392 [10.1109/lra.2024.3491428].
All authors: Caporali, Alessio; Galassi, Kevin; Palli, Gianluca
Publication type: 1.01 Journal article

Files in this record:

File	Size	Format
editoriale.pdf (open access; type: publisher's version (PDF) / Version of Record; license: open access, Creative Commons Attribution (CC BY))	3.16 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1000282