Linear Regression with multiple variables -- Gradient descent(CS229)

Feature Scaling

Get every feature into approximately $-1\le x_j \le1$

$$x_j:=\frac{x_j-\mu_j}{s_j}$$

where $x_j$ is the value of the $j$-th feature, $\mu_j$ is the mean of the $j$-th feature over the training set, and $s_j$ is either its range (i.e. $\max-\min$) or its standard deviation over the training set. (The superscript $x^{(i)}$ is reserved for the $i$-th training example, so the per-feature subscript $j$ is used here.)
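As a concrete sketch of the formula above, mean normalization can be applied per feature (column) with NumPy; the feature matrix below (house size, number of bedrooms) is hypothetical:

```python
import numpy as np

# Hypothetical training set: rows = examples, columns = features.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])

mu = X.mean(axis=0)                 # per-feature mean (mu_j)
s = X.max(axis=0) - X.min(axis=0)   # per-feature range, max - min (s_j)

# Mean-normalized features; every entry now lies in (-1, 1).
X_scaled = (X - mu) / s
```

Dividing by the range guarantees each scaled feature falls inside $(-1, 1)$; dividing by the standard deviation instead gives each feature unit variance, which is equally acceptable for gradient descent.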

Choose an appropriate learning rate $\alpha$

  • $J(\theta)$ should decrease after every iteration; if it increases or oscillates, $\alpha$ is too large.
  • Declare convergence when $J(\theta)$ decreases by less than $\epsilon=10^{-3}$ in one iteration.
  • Plot $J(\theta)$ (y-axis) against the number of iterations (x-axis) to judge whether $\alpha$ is appropriate.
  • Try values of $\alpha$ on a logarithmic scale, e.g. $\alpha = 10^n,\ n\in \mathbb{Z}$.
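The rules above can be sketched as a batch gradient-descent loop that records $J(\theta)$ every iteration and stops once the per-iteration decrease falls below $\epsilon$; the function name, defaults, and data are illustrative, not part of the course material:

```python
import numpy as np

def gradient_descent(X, y, alpha, eps=1e-3, max_iters=10_000):
    """Batch gradient descent for linear regression.

    Stops when J(theta) decreases by less than eps in one iteration
    (the convergence test described above), or after max_iters steps.
    Returns the learned theta and the history of J(theta).
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend intercept column x_0 = 1
    theta = np.zeros(n + 1)

    def cost(th):
        # J(theta) = (1 / 2m) * sum of squared errors
        return ((Xb @ th - y) ** 2).sum() / (2 * m)

    history = [cost(theta)]
    for _ in range(max_iters):
        # Simultaneous update of all theta_j
        theta -= (alpha / m) * Xb.T @ (Xb @ theta - y)
        history.append(cost(theta))
        if history[-2] - history[-1] < eps:  # decrease smaller than epsilon
            break
    return theta, history
```

Running this with several values of $\alpha$ (e.g. 0.001, 0.01, 0.1, 1) and plotting each `history` on one figure produces exactly the diagnostic diagram described above: curves that decrease quickly and flatten indicate a good $\alpha$; curves that blow up indicate $\alpha$ is too large.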

Polynomial Regression