Camilli, F., Tieplova, D., Bergamin, E., Barbier, J. (2025). Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime.
Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime
Francesco Camilli; Jean Barbier
2025
Abstract
We rigorously analyse fully-trained neural networks of arbitrary depth in the Bayesian optimal setting in the so-called proportional scaling regime, where the number of training samples and the widths of the input and all inner layers diverge proportionally. We prove an information-theoretic equivalence between the Bayesian deep neural network model, trained from data generated by a teacher with matching architecture, and a simpler model of optimal inference in a generalized linear model. This equivalence enables us to compute the optimal generalization error for deep neural networks in this regime. We thus prove the "deep Gaussian equivalence principle" conjectured in Cui et al. (2023) (arXiv:2302.00375). Our result highlights that, in order to escape this "trivialisation" of deep neural networks (in the sense of reduction to a linear model) occurring in the strongly overparametrized proportional regime, models trained from much more data have to be considered.
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| camilli25a.pdf (open access) | Accepted version | Publisher's version (PDF) / Version of Record | Restricted-access license | 474.95 kB | Adobe PDF |
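The core intuition behind the Gaussian equivalence invoked in the abstract can be illustrated numerically. The sketch below (our own illustrative construction, not the paper's multi-layer statement or proof) checks the single-layer version of the principle: for an odd activation sigma and two weakly correlated Gaussian pre-activations, the post-activation correlation is captured by the linear (first Hermite) component of sigma alone, mimicking a linear model plus independent noise. All dimensions, the activation choice, and the overlap value are assumptions made for the demo.

```python
import numpy as np

# Illustrative sketch of single-layer Gaussian equivalence (hypothetical setup):
# for odd sigma and pre-activations g1, g2 with E[g1 g2] = rho, one expects
#   E[sigma(g1) sigma(g2)] ≈ mu1^2 * rho,  with  mu1 = E[g sigma(g)],
# i.e. the nonlinearity acts like its linear Hermite component plus noise.
rng = np.random.default_rng(0)
d, n_hidden = 1000, 10_000   # input dim and width, both large (proportional regime)
sigma = np.tanh              # odd activation, so E[sigma(g)] = 0

# First Hermite coefficient mu1 = E[g * sigma(g)], estimated by Monte Carlo.
g = rng.standard_normal(1_000_000)
mu1 = np.mean(g * sigma(g))

# Two inputs with overlap rho = x . x' / d ≈ 0.2 (our illustrative choice).
x = rng.standard_normal(d)
z = rng.standard_normal(d)
xp = 0.2 * x + np.sqrt(1 - 0.2**2) * z
rho = x @ xp / d

# Random first layer with unit-variance pre-activations.
W = rng.standard_normal((n_hidden, d)) / np.sqrt(d)
h, hp = sigma(W @ x), sigma(W @ xp)

emp = np.mean(h * hp)   # empirical post-activation correlation
pred = mu1**2 * rho     # prediction of the equivalent linear (Gaussian) model
```

At these sizes `emp` and `pred` agree to a few times 1e-3, consistent with the reduction of the nonlinear layer to an effective linear model in this regime.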
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.