Fundamental limits of overparametrized shallow neural networks for supervised learning

Camilli, Francesco; Tieplova, Daria; Barbier, Jean
2025

Abstract

We carry out an information-theoretic analysis of a two-layer neural network trained on input–output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results take the form of bounds relating (i) the mutual information between the training data and the network weights, and (ii) the Bayes-optimal generalization error, to the same quantities for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, expressed in terms of the number of training samples, the input dimension and the number of hidden units, yield fundamental performance limits for learning a target function taking the form of a shallow neural network when limited data is available. The proof relies on rigorous tools from spin glasses and is guided by “Gaussian equivalence principles” lying at the core of numerous recent analyses of neural networks. Our results are information-theoretic (i.e. they are not specific to any learning algorithm) and, importantly, cover a setting where all the student network parameters are trained.
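
For orientation, the teacher–student setting referred to in the abstract can be sketched as follows. The display below is only an illustrative example of a shallow teacher model with n training samples, input dimension d, k hidden units, activation σ and noise level Δ; the notation and normalizations are assumptions made here for readability and are not taken from the paper:

\[
Y_\mu \;=\; \frac{1}{\sqrt{k}} \sum_{i=1}^{k} v_i^{*}\,
\sigma\!\Big(\frac{\mathbf{W}_i^{*}\cdot \mathbf{X}_\mu}{\sqrt{d}}\Big)
\;+\; \sqrt{\Delta}\, Z_\mu, \qquad \mu = 1,\dots,n .
\]

In such a setting the student observes the pairs \((\mathbf{X}_\mu, Y_\mu)_{\mu\le n}\) and, in the Bayes-optimal scenario, infers both layers \((\mathbf{W}^{*}, \mathbf{v}^{*})\); the bounds announced in the abstract compare the mutual information and the generalization error of this problem to those of a (generalized) linear model.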
Camilli, F., Tieplova, D., Barbier, J. (2025). Fundamental limits of overparametrized shallow neural networks for supervised learning. BOLLETTINO DELLA UNIONE MATEMATICA ITALIANA, -, 1-38 [10.1007/s40574-025-00506-2].

Use this identifier to cite or link to this item: https://hdl.handle.net/11585/1028449