Mazzocchetti, S., Cercenelli, L., Marcelli, E. (2025). Surgical Instrument Segmentation and Self-Supervised Monocular Depth Estimation in Minimally Invasive Surgery: A Multi-task Learning Approach. Gewerbestrasse 11, Cham, CH-6330, Switzerland: Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-95838-0_28].
Surgical Instrument Segmentation and Self-Supervised Monocular Depth Estimation in Minimally Invasive Surgery: A Multi-task Learning Approach
Mazzocchetti S.; Cercenelli L.; Marcelli E.
2025
Abstract
Surgical scene understanding in Minimally Invasive Surgery (MIS) is crucial for advancing Computer-Assisted Intervention (CAI) applications, enhancing surgical safety, and improving navigation. This work introduces a novel multi-task learning framework that jointly performs binary surgical tool segmentation and monocular depth estimation in laparoscopic surgical scenes. The framework employs a staged learning strategy: the network is first pre-trained on widely available tool segmentation datasets and then trained on both tasks, with pseudo-masks supervising segmentation and depth learned in a self-supervised manner. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves depth estimation performance competitive with state-of-the-art methods. Validation on two publicly available datasets highlights its robustness and adaptability across diverse surgical scenarios. These results emphasize the potential of multi-task learning to advance laparoscopic surgical perception. The implementation is available on GitHub.
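The abstract describes the training objective only at a high level. As a rough illustration, the following sketch shows one common way to combine the two tasks in the second training stage: a binary cross-entropy term against pseudo-masks obtained by thresholding the pre-trained segmentation network's output, plus a Monodepth2-style photometric reconstruction loss (weighted SSIM and L1) for self-supervised depth. This is not the authors' implementation. The 0.5 threshold, the loss weights `w_seg` and `w_depth`, the SSIM weight `alpha = 0.85`, the simplified single-window SSIM, and all function names are assumptions made for illustration. The view-synthesis step (warping an adjacent frame into the target view using predicted depth and camera pose) is assumed to have already produced `warped_frame`.

```python
import numpy as np


def make_pseudo_masks(tool_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarise stage-1 tool probabilities into pseudo-masks (assumed fixed threshold)."""
    return (tool_probs >= threshold).astype(np.float64)


def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted tool probabilities and pseudo-masks."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))


def ssim_dissimilarity(x: np.ndarray, y: np.ndarray) -> float:
    """(1 - SSIM) / 2 computed over the whole image as a single window.

    Simplification (assumption): Monodepth2-style losses use local 3x3 windows instead.
    """
    c1, c2 = 0.01 ** 2, 0.03 ** 2  # standard SSIM constants for intensities in [0, 1]
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    )
    return float(np.clip((1.0 - ssim) / 2.0, 0.0, 1.0))


def photometric_loss(target: np.ndarray, warped: np.ndarray, alpha: float = 0.85) -> float:
    """Weighted SSIM + L1 reconstruction error between target frame and warped source frame."""
    l1 = float(np.abs(target - warped).mean())
    return alpha * ssim_dissimilarity(target, warped) + (1.0 - alpha) * l1


def multitask_loss(seg_pred: np.ndarray, pseudo_mask: np.ndarray,
                   target_frame: np.ndarray, warped_frame: np.ndarray,
                   w_seg: float = 1.0, w_depth: float = 1.0) -> float:
    """Hypothetical weighted sum of the segmentation and self-supervised depth terms."""
    return (w_seg * bce_loss(seg_pred, pseudo_mask)
            + w_depth * photometric_loss(target_frame, warped_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stage1_probs = rng.random((8, 8))   # stand-in for the pre-trained network's tool probabilities
    masks = make_pseudo_masks(stage1_probs)
    seg_pred = np.clip(stage1_probs + 0.05 * rng.standard_normal((8, 8)), 0.0, 1.0)
    frame = rng.random((8, 8))          # grayscale target frame in [0, 1]
    warped = np.clip(frame + 0.1 * rng.standard_normal((8, 8)), 0.0, 1.0)
    print(f"total loss: {multitask_loss(seg_pred, masks, frame, warped):.4f}")
```

Running the file prints a single scalar loss for random toy inputs. In a real pipeline these terms would be computed on network outputs inside an autodiff framework so that gradients flow back into the network; NumPy is used here only to keep the arithmetic visible.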


