Fashion e-commerce platforms are becoming increasingly popular. However, scanning, rendering, and captioning fashion items are still done mostly manually. In this work, we address the task of generating a textual description of a fashion item from an image portraying it. We carry out an extensive study with several neural architectures based on InceptionV3. We consider two existing fashion image captioning datasets, FACAD and InFashAI. We also curate a novel dataset, Fashion-Cap, that contains more than 290,000 images and 40,000 corresponding captions. In our analysis, we observe significant differences between the three datasets’ captions, with Fashion-Cap having higher quality captions. To the best of our knowledge, this is the most extensive experimental study in fashion image captioning to date. Our experimental results show that our dataset is less challenging than FACAD but more than InFashAI, which confirms our insights, suggesting that it could be a valuable benchmark for this domain.
Del Moro, M., Tudosie, S.C., Vannoni, F., Galassi, A., Ruggeri, F. (2023). Inception Models for Fashion Image Captioning: An Extensive Study on Multiple Datasets. Cham : Springer [10.1007/978-3-031-42448-9_1].
Inception Models for Fashion Image Captioning: An Extensive Study on Multiple Datasets
Vannoni, FrancescoCo-primo
;Galassi, Andrea
;Ruggeri, Federico
2023
Abstract
Fashion e-commerce platforms are becoming increasingly popular. However, scanning, rendering, and captioning fashion items are still done mostly manually. In this work, we address the task of generating a textual description of a fashion item from an image portraying it. We carry out an extensive study with several neural architectures based on InceptionV3. We consider two existing fashion image captioning datasets, FACAD and InFashAI. We also curate a novel dataset, Fashion-Cap, that contains more than 290,000 images and 40,000 corresponding captions. In our analysis, we observe significant differences between the three datasets’ captions, with Fashion-Cap having higher quality captions. To the best of our knowledge, this is the most extensive experimental study in fashion image captioning to date. Our experimental results show that our dataset is less challenging than FACAD but more than InFashAI, which confirms our insights, suggesting that it could be a valuable benchmark for this domain.File | Dimensione | Formato | |
---|---|---|---|
_23_CLEF__Image_Captioning_for_Fashion.pdf
Open Access dal 12/09/2024
Tipo:
Postprint
Licenza:
Licenza per accesso libero gratuito
Dimensione
1.22 MB
Formato
Adobe PDF
|
1.22 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.