Cost Function for Logistic Regression (CS229)

Setup and assumptions

Training set

$$\lbrace(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\rbrace$$

$m$ examples

$$\mathbf{x}=
\begin{bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n
\end{bmatrix}\in\mathbb{R}^{n+1}$$

Labels

$$y\in\lbrace0,1\rbrace$$

Hypothesis

$$h_{\theta}(x)=\frac{1}{1+e^{-\mathbf{\theta^\top}x}}$$
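
As a concrete illustration, here is a minimal NumPy sketch of this hypothesis (the names `sigmoid` and `h`, and the example values of $\theta$ and $x$, are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = sigmoid(theta^T x).

    theta : (n+1,) parameter vector
    x     : (n+1,) feature vector with x_0 = 1 (intercept term)
    """
    return sigmoid(theta @ x)

# Illustrative values: two features plus the intercept term x_0 = 1
theta = np.array([-1.0, 0.5, 2.0])
x = np.array([1.0, 3.0, -0.5])
print(h(theta, x))  # a number in (0, 1), interpreted as P(y = 1 | x; theta)
```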

How to choose $\mathbf\theta$ ?

Logistic Regression

Suppose we take the cost function to be the squared error used in linear regression:

$$Cost(h_{\theta}(x^{(i)}),y^{(i)})=\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2$$

Then the overall cost function becomes

$$\begin{aligned}J(\theta)&=\frac{1}{m}\sum_{i=1}^m\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2\\&=\frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}),y^{(i)})\end{aligned}$$

For brevity, we drop the superscripts and write the per-example cost as:

$$Cost(h_{\theta}(x),y)=\frac{1}{2}(h_{\theta}(x)-y)^2$$
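
A minimal NumPy sketch of this squared-error cost, averaged over the $m$ training examples as in $J(\theta)$ above (the function name `squared_error_cost` and the matrix layout are assumptions made for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_error_cost(theta, X, y):
    """J(theta) = (1/m) * sum_i (1/2) * (h_theta(x^(i)) - y^(i))^2.

    X : (m, n+1) matrix whose rows are the x^(i) (first column all ones)
    y : (m,) vector of labels in {0, 1}
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example at once
    return np.sum(0.5 * (h - y) ** 2) / m
```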

Because the hypothesis is the nonlinear sigmoid function, this squared-error cost makes $J(\theta)$ a non-convex function of $\theta$, which means gradient descent is not guaranteed to converge to the global minimum.

It can be shown that if the cost function is convex, gradient descent (with a suitable learning rate) will reach the global minimum.
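
For reference, a minimal sketch of the batch gradient descent update the argument refers to, $\theta \leftarrow \theta - \alpha\,\nabla_\theta J(\theta)$ (the learning rate `alpha` and iteration count are arbitrary illustrative defaults):

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha=0.1, iters=1000):
    """Batch gradient descent: theta <- theta - alpha * grad J(theta).

    grad_J : function returning the gradient of J at a given theta
    theta0 : initial parameter vector
    alpha  : learning rate (illustrative default)
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iters):
        theta = theta - alpha * grad_J(theta)
    return theta
```

When $J$ is convex, this update cannot get stuck in a local minimum; with the non-convex squared-error cost above, it can.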

Choosing a convex cost function for logistic regression

Assume
$$
Cost(h_{\theta}(x),y)=
\begin{cases}
-\log(h_{\theta}(x)) & \text{if } y=1 \\
-\log(1-h_{\theta}(x)) & \text{if } y=0
\end{cases}
$$

With this choice, $J(\theta)$ is convex in $\theta$, so gradient descent can reliably find the global minimum.
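
A minimal NumPy sketch of this piecewise cost, averaged over the training set as in $J(\theta)$ (the small `eps` guard against $\log(0)$ is an implementation detail, not part of the lecture definition):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Average of the piecewise cost:
         -log(h_theta(x))      if y = 1
         -log(1 - h_theta(x))  if y = 0
    """
    h = sigmoid(X @ theta)
    eps = 1e-12                      # guard against log(0); not part of the lecture
    per_example = np.where(y == 1, -np.log(h + eps), -np.log(1.0 - h + eps))
    return per_example.mean()
```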