A novel Frank–Wolfe algorithm. Analysis and applications to large-scale SVM training

Ñanculef, Ricardo; Frandi, Emanuele; Sartori, Claudio; Allende, Héctor

doi:10.1016/j.ins.2014.03.059

Recently, there has been a renewed interest in the machine learning community for variants of a sparse greedy approximation procedure for concave optimization known as the Frank–Wolfe (FW) method. In particular, this procedure has been successfully applied to train large-scale instances of non-linear Support Vector Machines (SVMs). Specializing FW to SVM training has allowed to obtain efficient algorithms, but also important theoretical results, including convergence analysis of training algorithms and new characterizations of model sparsity. In this paper, we present and analyze a novel variant of the FW method based on a new way to perform away steps, a classic strategy used to accelerate the convergence of the basic FW procedure. Our formulation and analysis is focused on a general concave maximization problem on the simplex. However, the specialization of our algorithm to quadratic forms is strongly related to some classic methods in computational geometry, namely the Gilbert and MDM algorithms. On the theoretical side, we demonstrate that the method matches the guarantees in terms of convergence rate and number of iterations obtained by using classic away steps. In particular, the method enjoys a linear rate of convergence, a result that has been recently proved for MDM on quadratic forms. On the practical side, we provide experiments on several classification datasets, and evaluate the results using statistical tests. Experiments show that our method is faster than the FW method with classic away steps, and works well even in the cases in which classic away steps slow down the algorithm. Furthermore, these improvements are obtained without sacrificing the predictive accuracy of the obtained SVM model.

Ricardo Ñanculef, Emanuele Frandi, Claudio Sartori, Héctor Allende (2014). A novel Frank–Wolfe algorithm. Analysis and applications to large-scale SVM training. INFORMATION SCIENCES, 285, 66-99 [10.1016/j.ins.2014.03.059].