
The learning rate
The learning rate of a model is a fairly intuitive notion: it simply determines how fast the model can learn. Mathematically put, the learning rate determines the exact size of the step we take at each iteration as we descend the loss landscape to converge on ideal weights. Setting the right learning rate for your problem can be challenging, especially when the loss landscape is complex and full of surprises, as can be seen in the illustration here:

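Concretely, each weight w is adjusted at every iteration by the standard gradient descent update, w ← w − η · ∂L/∂w, where η is the learning rate and ∂L/∂w is the gradient of the loss with respect to that weight; the value of η directly scales how far each step moves.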
This is quite an important notion. If we set the learning rate too small, then our model naturally learns less than it could at any given training iteration. Worse still, with a low learning rate our model may get stuck in a local minimum, mistaking it for a global minimum. A learning rate that is too high, on the other hand, can prevent our model from capturing patterns of predictive value: if our steps are too large, we may simply keep overshooting any global minimum present in our weight space, and hence never converge on ideal model weights.
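To make the effect of the step size tangible, here is a minimal, purely illustrative sketch of gradient descent on a one-dimensional quadratic loss; the loss function and the learning-rate values are assumptions chosen only for demonstration.

```python
# Illustrative example: gradient descent on the quadratic loss
# L(w) = (w - 3)^2, whose minimum sits at w = 3. The loss and the
# learning-rate values below are toy assumptions chosen to show how
# the learning rate scales each step.

def gradient(w):
    # Derivative of L(w) = (w - 3)^2 with respect to w
    return 2 * (w - 3)

def descend(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # the learning rate scales the step size
    return w

print(descend(0.001))  # too small: after 20 steps w has barely moved from 0
print(descend(0.1))    # reasonable: w ends up very close to the minimum at 3
print(descend(1.1))    # too large: every step overshoots and w diverges
```

With the tiny learning rate the weight crawls toward the minimum, while the oversized one bounces from side to side with ever-growing error, which is exactly the overshooting behavior described above.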
One solution to this problem is to use an adaptive learning rate that responds to the specific loss landscape encountered during training. We will explore various implementations of adaptive learning rates (such as Momentum, Adadelta, Adagrad, RMSProp, and more) in subsequent chapters.
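As a brief preview, and assuming a Keras-style workflow (an assumption for illustration, not something this section prescribes), such optimizers are typically configured as objects passed to a model at compile time; the learning-rate values below are placeholders rather than recommendations.

```python
# Illustrative sketch only: configuring adaptive optimizers in tf.keras.
# The learning-rate values are placeholders, and `model` is assumed to be
# an already-defined Keras model (not shown here).
from tensorflow.keras import optimizers

sgd_momentum = optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGD with Momentum
adagrad = optimizers.Adagrad(learning_rate=0.01)                 # per-parameter step scaling
adadelta = optimizers.Adadelta(learning_rate=1.0)                # Adagrad variant with a decaying window
rmsprop = optimizers.RMSprop(learning_rate=0.001)                # moving average of squared gradients

# A compiled model would simply receive one of these objects, for example:
# model.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])
```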
