Gradient Descent for Linear Regression (Stanford CS229)

Derivative term with respect to $\theta_j$

$$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$
$$=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(\theta_0+\theta_1x^{(i)}-y^{(i)})^2\tag{1}$$

Applying the chain rule, for $j=0$ (the inner derivative with respect to $\theta_0$ is $1$),
$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})*1\tag{1.1}$$

for $j=1,$
$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})*x^{(i)}\tag{1.2}$$
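A quick way to convince yourself that $(1.1)$ and $(1.2)$ are right is to compare them against a finite-difference approximation of $J$. The NumPy sketch below does exactly that; the arrays `x`, `y` and the parameter values are made-up illustrative numbers, not from the lecture.

```python
import numpy as np

def J(theta0, theta1, x, y):
    """Squared-error cost: (1/2m) * sum of (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    return ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)

def analytic_grad(theta0, theta1, x, y):
    """Partial derivatives (1.1) and (1.2)."""
    m = len(x)
    r = theta0 + theta1 * x - y            # h_theta(x^(i)) - y^(i)
    return r.sum() / m, (r * x).sum() / m  # (1.1), (1.2)

# Illustrative data and parameters for the check
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 3.5])
t0, t1, eps = 0.5, 0.8, 1e-6

g0, g1 = analytic_grad(t0, t1, x, y)
fd0 = (J(t0 + eps, t1, x, y) - J(t0 - eps, t1, x, y)) / (2 * eps)
fd1 = (J(t0, t1 + eps, x, y) - J(t0, t1 - eps, x, y)) / (2 * eps)
print(g0, fd0)  # the analytic and numerical values should agree closely
print(g1, fd1)
```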

Gradient descent algorithm

Using the derivatives from $(1.1)$ and $(1.2)$, repeat until convergence, where $\alpha$ is the learning rate and $\theta_0$, $\theta_1$ are updated simultaneously:
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})$$
$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})*x^{(i)}$$
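A minimal NumPy sketch of these two updates, assuming 1-D arrays `x` and `y` of length $m$; the learning rate, iteration count, and synthetic data below are illustrative choices, not prescribed by the lecture.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.5, num_iters=5000):
    """Repeatedly apply the two update rules, starting from theta0 = theta1 = 0."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        residuals = theta0 + theta1 * x - y   # h_theta(x^(i)) - y^(i)
        grad0 = residuals.sum() / m           # from (1.1)
        grad1 = (residuals * x).sum() / m     # from (1.2)
        # Simultaneous update: both gradients are computed before either theta changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Usage: recover the line y = 2 + 3x from noisy samples
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2 + 3 * x + 0.1 * rng.standard_normal(100)
print(batch_gradient_descent(x, y))  # approximately (2, 3)
```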

Batch gradient descent

Batch means that each step of gradient descent uses all $m$ training examples as we move toward the minimum of $J(\theta_0,\theta_1)$.