IACM Colloquium

Speaker:

Panayotis Mertikopoulos

Department of Mathematics, University of Athens

Title:

The convergence landscape of stochastic gradient descent: Which minima are preferred (and by how much)?

Abstract:

This talk concerns the long-run convergence properties of stochastic gradient descent (SGD) in non-convex problems, namely (i) whether the algorithm converges in the long run (and in what sense); and (ii) which regions of the problem's state space are more likely to be visited, and by how much. Starting with the case where SGD is run with a vanishing step-size / learning rate, we will discuss a range of almost sure convergence results, including some recent results concerning the avoidance of non-minimizing critical points, and the method's rate of convergence to "regular" (Hurwicz) minimizers.
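As a toy illustration of the vanishing step-size regime (not taken from the talk), the sketch below runs SGD on an assumed one-dimensional double-well objective f(x) = (x^2 - 1)^2 + 0.3x with Gaussian gradient noise and step-size 0.1 / (n + 1)^0.75. The objective, the noise model, the step-size schedule, and all function and parameter names are assumptions chosen for illustration only.

```python
import numpy as np

def grad(x):
    # Gradient of the assumed toy objective f(x) = (x**2 - 1)**2 + 0.3 * x
    return 4.0 * x * (x**2 - 1.0) + 0.3

def sgd_vanishing(n_chains=200, n_steps=20000, noise_std=1.0, seed=0):
    """Run independent SGD chains with a vanishing step-size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random initial points spread over both wells and the barrier between them
    x = rng.uniform(-1.5, 1.5, size=n_chains)
    for n in range(n_steps):
        # Step-sizes are non-summable but square-summable
        step = 0.1 / (n + 1) ** 0.75
        noisy_grad = grad(x) + noise_std * rng.standard_normal(n_chains)
        x = x - step * noisy_grad
    return x

final_vanishing = sgd_vanishing()
# The two local minimizers of f lie close to x = -1 and x = +1
dist_to_well = np.abs(np.abs(final_vanishing) - 1.0)
print("largest distance of a final iterate from +/-1:", dist_to_well.max())
print("fraction of chains ending in the left well:", np.mean(final_vanishing < 0.0))
```

In this toy setting every chain should settle near one of the two minimizers rather than at the local maximizer between them, with the selected well depending on the initial point and the noise.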

By contrast, in the constant step-size case, the landscape is considerably different, as the algorithm's trajectories typically do not converge, but are instead concentrated near "small gradient" areas. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we will see that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we will see that, in the long run:

1. The problem's critical region is visited exponentially more often than any non-critical region.

2. The iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective).

3. All other components of critical points are visited with frequency that is exponentially proportional to their energy level.

4. Every non-minimizing component is "dominated" by a minimizing component that is visited exponentially more often.
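The concentration phenomena listed above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the construction used in the talk: it runs many independent constant step-size SGD chains on an assumed double-well objective f(x) = (x^2 - 1)^2 + 0.3x with state-independent Gaussian gradient noise, starting every chain in the shallower well. All names and parameter values are hypothetical. With noise of this simple form, one would expect the energy ordering to follow the objective itself, so the sketch does not capture the case where the minimum energy state differs from the global minimum.

```python
import numpy as np

def grad(x):
    # Gradient of the assumed toy objective f(x) = (x**2 - 1)**2 + 0.3 * x
    return 4.0 * x * (x**2 - 1.0) + 0.3

def sgd_constant(n_chains=2000, n_steps=5000, step=0.05, noise_std=3.0, seed=0):
    """Run independent SGD chains with a constant step-size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start every chain in the shallower well, near x = +1
    x = np.ones(n_chains)
    for _ in range(n_steps):
        noisy_grad = grad(x) + noise_std * rng.standard_normal(n_chains)
        x = x - step * noisy_grad
    return x

final_constant = sgd_constant()
# Fraction of chains that end in the deeper well (left of the barrier near x = 0)
frac_deep = np.mean(final_constant < 0.0)
# Fraction of chains that end within 0.5 of either minimizer (both lie near +/-1)
frac_near_min = np.mean(np.abs(np.abs(final_constant) - 1.0) < 0.5)
print("fraction in the deeper well:", frac_deep)
print("fraction near a minimizer:", frac_near_min)
```

Although the chains never settle down, most of them should end up fluctuating around the deeper well, and very few should be found far from both minimizers. Lowering `step` in this sketch should sharpen both effects, at the cost of slower transitions between wells.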

Based on joint work with W. Azizian, F. Iutzeler, and J. Malick

Short Bio:

Panayotis Mertikopoulos graduated from the Department of Physics of the National and Kapodistrian University of Athens (NKUA) in 2003 and, after graduate studies in Mathematics at Brown University in the USA (MSc in 2005 and MPhil in 2006), completed his doctoral thesis at the NKUA Department of Physics in 2010 on "Stochastic Perturbations in Game Theory". In 2010-2011 he worked as a postdoctoral researcher at École Polytechnique in Paris and, since 2011, he has been a principal researcher at the French National Center for Scientific Research (CNRS) at the Grenoble Computer Science Laboratory (LIG), specializing in game theory and its applications to network science. In 2019 he completed his habilitation (habilitation à diriger des recherches) at the University of Grenoble on "Repeated Learning and Optimization in Game Theory", and in 2020 he was nominated for the CNRS médaille de bronze in the field of Computer Science and Applied Mathematics. In 2023 he was appointed professor in the Department of Mathematics of NKUA.

His research interests focus on game theory, mathematical programming, and their applications to machine learning, data science, and network science. He is principal investigator or co-investigator on a number of research projects in these areas, the most recent being the European Network for Game Theory (GAMENET). His research output comprises more than 140 papers in international journals or peer-reviewed international conference proceedings, and he was recently honored with the 2022 "Best Paper Award" of the international operations research institute INFORMS in the field of network science.

Time, Date & Location:

15:00, Friday, April 12th, 2024, Payatakes Room
