Trotta, A., Esposito, A., Sciullo, L., Bononi, L., Di Felice, M. (2026). Private Inference at the Extreme Edge: Joint Mixed Precision Quantization and Model Splitting in Multi-Hop IoT Networks [10.1109/ccnc65079.2026.11366617].
Private Inference at the Extreme Edge: Joint Mixed Precision Quantization and Model Splitting in Multi-Hop IoT Networks
Trotta, Angelo; Esposito, Alfonso; Sciullo, Luca; Bononi, Luciano; Di Felice, Marco
2026
Abstract
Many Internet of Things (IoT) systems rely on sensing units that offload data to remote cloud servers for analytics. While this approach provides the computational power required to execute complex Deep Learning (DL) tasks, it introduces privacy vulnerabilities and becomes infeasible when network bandwidth is constrained. In this paper, we investigate whether DL inference can be offloaded entirely to the Extreme Edge (EE) of an IoT system, which consists of a multi-hop network of microcontrollers or low-power PCs. To this end, we explore splitting DL models across the physical topology, taking into account the heterogeneity of EE devices and the characteristics of wireless links. To balance model accuracy against resource limitations, we focus on mixed-precision quantization strategies that adjust the precision of each sub-model to the hardware capabilities of its target device. In addition to formulating the optimization problem, we propose a Genetic Algorithm (GA) that determines the best model allocation and in-network inference path within the multi-hop IoT network by jointly optimizing energy efficiency and latency. Experimental results on three widely adopted DNN architectures (MobileNetV2, ResNet50, and VGG16) show that the proposed GA reduces the fitness function by up to 66% compared to the baseline greedy algorithm.

| File | Size | Format |
|---|---|---|
| CCNC_2026___Split-12.pdf (embargoed until 03/08/2027). Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review. License: free open-access license. | 1.44 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



