Linear Regression with multiple variables -- Gradient descent(CS229)

Feature Scaling

Get every feature into approximately $-1\le x_j \le1$

$$x_j:=\frac{x_j-\mu_j}{s_j}$$

where $x_j$ is the value of the $j$-th feature, $\mu_j$ is the mean of the $j$-th feature over the training set, and $s_j$ is either its range (i.e. $\max-\min$) or its standard deviation over the training set. (The superscript $x^{(i)}$ is reserved for the $i$-th training example, so the per-feature subscript $j$ is used here.)
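As a concrete sketch of the formula above, mean normalization can be applied per feature (column) with NumPy; the feature matrix below (house size, number of bedrooms) is hypothetical:

```python
import numpy as np

# Hypothetical training set: rows = examples, columns = features.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])

mu = X.mean(axis=0)                 # per-feature mean (mu_j)
s = X.max(axis=0) - X.min(axis=0)   # per-feature range, max - min (s_j)

# Mean-normalized features; every entry now lies in (-1, 1).
X_scaled = (X - mu) / s
```

Dividing by the range guarantees each scaled feature falls inside $(-1, 1)$; dividing by the standard deviation instead gives each feature unit variance, which is equally acceptable for gradient descent.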

Choose an appropriate learning rate $\alpha$

  • $J(\theta)$ should decrease after every iteration; if it increases or oscillates, $\alpha$ is too large.
  • Declare convergence when $J(\theta)$ decreases by less than $\epsilon=10^{-3}$ in one iteration.
  • Plot $J(\theta)$ (y-axis) against the number of iterations (x-axis) to judge whether $\alpha$ is appropriate.
  • Try values of $\alpha$ on a logarithmic scale, e.g. $\alpha = 10^n,\ n\in \mathbb{Z}$.
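The rules above can be sketched as a batch gradient-descent loop that records $J(\theta)$ every iteration and stops once the per-iteration decrease falls below $\epsilon$; the function name, defaults, and data are illustrative, not part of the course material:

```python
import numpy as np

def gradient_descent(X, y, alpha, eps=1e-3, max_iters=10_000):
    """Batch gradient descent for linear regression.

    Stops when J(theta) decreases by less than eps in one iteration
    (the convergence test described above), or after max_iters steps.
    Returns the learned theta and the history of J(theta).
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend intercept column x_0 = 1
    theta = np.zeros(n + 1)

    def cost(th):
        # J(theta) = (1 / 2m) * sum of squared errors
        return ((Xb @ th - y) ** 2).sum() / (2 * m)

    history = [cost(theta)]
    for _ in range(max_iters):
        # Simultaneous update of all theta_j
        theta -= (alpha / m) * Xb.T @ (Xb @ theta - y)
        history.append(cost(theta))
        if history[-2] - history[-1] < eps:  # decrease smaller than epsilon
            break
    return theta, history
```

Running this with several values of $\alpha$ (e.g. 0.001, 0.01, 0.1, 1) and plotting each `history` on one figure produces exactly the diagnostic diagram described above: curves that decrease quickly and flatten indicate a good $\alpha$; curves that blow up indicate $\alpha$ is too large.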

Polynomial Regression