Del Moro, M., Tudosie, S.C., Vannoni, F., Galassi, A., Ruggeri, F. (2023). Inception Models for Fashion Image Captioning: An Extensive Study on Multiple Datasets. Cham : Springer [10.1007/978-3-031-42448-9_1].

Inception Models for Fashion Image Captioning: An Extensive Study on Multiple Datasets

Vannoni, Francesco (co-first author); Galassi, Andrea; Ruggeri, Federico
2023

Abstract

Fashion e-commerce platforms are becoming increasingly popular. However, scanning, rendering, and captioning fashion items is still done mostly by hand. In this work, we address the task of generating a textual description of a fashion item from an image that portrays it. We carry out an extensive study of several neural architectures based on InceptionV3. We consider two existing fashion image captioning datasets, FACAD and InFashAI, and we curate a novel dataset, Fashion-Cap, which contains more than 290,000 images and 40,000 corresponding captions. Our analysis reveals significant differences between the captions of the three datasets, with Fashion-Cap having higher-quality captions. To the best of our knowledge, this is the most extensive experimental study of fashion image captioning to date. Our experimental results show that Fashion-Cap is less challenging than FACAD but more challenging than InFashAI, which is consistent with our analysis and suggests that it could be a valuable benchmark for this domain.
Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023.
Pages: 3-14
Del Moro, Mirko; Tudosie, Serban Cristian; Vannoni, Francesco; Galassi, Andrea; Ruggeri, Federico
Files in this record:
_23_CLEF__Image_Captioning_for_Fashion.pdf
Open Access from 12 September 2024
Type: Postprint
License: Free open-access license
Size: 1.22 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/941313