Generalization of Deep Neural Networks
Type
Master's thesis / Bachelor's thesis / guided research
Prerequisites
- Strong machine learning knowledge
- Solid knowledge of probability theory
- Proficiency with Python
- (Preferred) Proficiency with deep learning frameworks (TensorFlow or PyTorch)
Description
Deep neural networks (DNNs) have shown success in various domains (such as image classification, speech recognition, and game playing) over the past decade. However, the theoretical properties of DNNs, particularly their remarkable generalization abilities, remain largely unclear to the scientific community. DNNs used in practice are usually strongly over-parametrized, i.e., they have many more parameters than training samples. According to classical statistical learning theory, such models are prone to overfitting, i.e., they should perform much worse on test data than on training data. Surprisingly, DNNs often do generalize well to test data in practice, directly challenging classical statistical learning theory. We therefore need a new theory of generalization in DNNs, together with more thorough experiments that can guide and precede such a theory.
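The tension between over-parametrization and good test performance can already be observed in simple models. The following minimal sketch (a toy setup in the spirit of the references below, not code from any of them; the ReLU random-feature map, sample sizes, and widths are illustrative assumptions) fits a min-norm ("ridgeless") least-squares model on random features and sweeps the number of parameters p past the interpolation threshold p = n:

```python
# Toy experiment: test error of min-norm least squares on random ReLU features
# as the number of parameters p grows past the number of training samples n.
# All settings here (n, d, noise level, widths) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 1000, 20
beta = rng.normal(size=d) / np.sqrt(d)          # ground-truth linear signal

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ beta + 0.1 * rng.normal(size=n)     # noisy linear targets
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for p in [10, 50, 100, 150, 200, 250, 400, 1000, 4000]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)    # fixed random projection
    F_tr = np.maximum(X_tr @ W, 0.0)            # ReLU random features
    F_te = np.maximum(X_te @ W, 0.0)
    theta = np.linalg.pinv(F_tr) @ y_tr         # min-norm ("ridgeless") solution
    train_mse = np.mean((F_tr @ theta - y_tr) ** 2)
    test_mse = np.mean((F_te @ theta - y_te) ** 2)
    print(f"p={p:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

In this kind of setup the training error drops to (near) zero once p reaches n, while the test error typically peaks around the interpolation threshold and then decreases again as p grows further, i.e., the heavily over-parametrized models generalize better than moderately over-parametrized ones. Analyzing and explaining such behavior, in linear models and in actual DNNs, is the kind of question this topic is about.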
References
- Reconciling Modern Machine Learning Practice and the Bias-Variance Trade-off (https://arxiv.org/pdf/1812.11118.pdf)
- Deep Double Descent: Where Bigger Models and More Data Hurt (https://arxiv.org/pdf/1912.02292.pdf)
- Surprises in High-Dimensional Ridgeless Least Squares Interpolation (https://arxiv.org/pdf/1903.08560.pdf)
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks (https://arxiv.org/pdf/1806.07572.pdf)
- Disentangling Trainability and Generalization in Deep Neural Networks (https://arxiv.org/pdf/1912.13053.pdf)