Regularization: The Problem of Overfitting (CS229)

Underfitting (high bias) and overfitting (high variance) are both problems we want to avoid when fitting a hypothesis.

Overfitting

If we have too many features, the learned hypothesis may fit the training set very well, i.e. $J(\theta)=\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2\approx0$, but fail to generalize to new examples.
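
To see this concretely, here is a minimal sketch (not from the original notes) that fits polynomials of degree 2 and 9 to ten noisy samples of a quadratic target: the high-degree model drives the training cost toward zero while the test cost grows.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10
x_train = np.linspace(-1, 1, m)
y_train = x_train**2 + 0.05 * rng.standard_normal(m)  # noisy quadratic target

x_test = np.linspace(-1, 1, 100)
y_test = x_test**2

for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    # J(theta) = (1/2m) * sum of squared errors, as in the cost above
    j_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2) / 2
    j_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2) / 2
    print(f"degree {degree}: J_train = {j_train:.6f}, J_test = {j_test:.6f}")
```

With ten training points, the degree-9 polynomial interpolates them almost exactly ($J_{\text{train}}\approx0$) yet oscillates between them, so its test cost is much worse than the degree-2 fit.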

Occurrence of overfitting

If there are too many features and too little training data, overfitting is likely to occur.

Addressing overfitting

Reduce number of features

  • Manually select which features to keep.
  • Model selection algorithm (a minimal sketch follows after this subsection).

But the disadvantage of throwing away features is that we also discard the information those features carry about the data.
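
As a concrete instance of the model selection idea above, here is a minimal sketch of greedy forward selection. The arrays `X_train`, `y_train`, `X_val`, `y_val` and the budget `k` are hypothetical names, not from the original notes; the idea is to keep only the columns that most reduce the validation cost.

```python
import numpy as np

def validation_cost(cols, X_train, y_train, X_val, y_val):
    """Least-squares fit on the chosen columns; return validation J."""
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    residual = X_val[:, cols] @ theta - y_val
    return np.mean(residual ** 2) / 2

def forward_select(X_train, y_train, X_val, y_val, k):
    """Greedily add, k times, the feature that most lowers validation cost."""
    chosen, remaining = [], list(range(X_train.shape[1]))
    for _ in range(k):
        best = min(remaining,
                   key=lambda j: validation_cost(chosen + [j],
                                                 X_train, y_train, X_val, y_val))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```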

Regularization

  • Keep all the features $(x_1,x_2,…,x_n)$, but reduce magnitude/values of parameters $\theta_j$.
  • Regularization works well when we have a lot of features, each of which contributes a bit to predicting $y$.
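
Concretely, regularized linear regression modifies the cost function by adding a penalty on the parameters (by convention, $\theta_0$ is not penalized):

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right]$$

Here $\lambda$ is the regularization parameter: it trades off fitting the training data well against keeping the parameters small. Below is a minimal sketch of batch gradient descent on this cost, assuming $X$ already carries a leading column of ones for the intercept (the defaults for `alpha`, `lam`, and `iters` are illustrative, not from the notes):

```python
import numpy as np

def regularized_gradient_descent(X, y, alpha=0.1, lam=1.0, iters=1000):
    """Batch gradient descent on the regularized cost J(theta) above.

    Assumes X includes a leading column of ones; theta_0 (the intercept)
    is not penalized, following the usual convention.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of the data-fit term
        grad[1:] += (lam / m) * theta[1:]  # penalty gradient; skip theta_0
        theta -= alpha * grad
    return theta
```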