Given and assumptions
Training set
$$\lbrace(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\rbrace$$
$m$ examples
$$\mathbf{x}=
\begin{bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n
\end{bmatrix}\in\mathbb{R}^{n+1}$$
Labels
$$y\in\lbrace0,1\rbrace$$
Hypothesis
$$h_{\theta}(x)=\frac{1}{1+e^{-\theta^\top x}}$$
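The hypothesis above can be sketched in plain Python. The function name `h` and the list-based vector representation are illustrative choices, not from the notes; `x[0]` plays the role of the bias feature $x_0 = 1$.

```python
import math

def h(theta, x):
    """Hypothesis h_theta(x) = 1 / (1 + exp(-theta^T x)).

    theta and x are lists of length n + 1; x[0] is the bias feature (always 1).
    """
    z = sum(t * xj for t, xj in zip(theta, x))  # inner product theta^T x
    return 1.0 / (1.0 + math.exp(-z))
```

With $\theta = \mathbf{0}$ the inner product is 0, so the hypothesis outputs exactly 0.5 regardless of $x$.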
How do we choose $\theta$?
Logistic Regression
If we take the cost function to be the squared error, as in linear regression:
$$Cost(h_{\theta}(x^{(i)}),y^{(i)})=\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2$$
then the overall cost function becomes
$$J(\theta)=\frac{1}{m}\sum_{i=1}^m\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2=\frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}),y^{(i)})$$
For brevity, we drop the superscripts:
$$Cost(h_{\theta}(x),y)=\frac{1}{2}(h_{\theta}(x)-y)^2$$
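The squared-error cost $J(\theta)$ above can be sketched as follows. This is an illustrative implementation, not from the notes; `J_squared` and `sigmoid` are hypothetical names, and examples are lists with `x[0] = 1` as the bias feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def J_squared(theta, X, y):
    """Mean squared-error cost: (1/m) * sum of (1/2)(h(x) - y)^2.

    This is the cost that turns out to be non-convex once h is the sigmoid.
    """
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, x_i))  # theta^T x
        total += 0.5 * (sigmoid(z) - y_i) ** 2
    return total / m
```

For example, with $\theta = \mathbf{0}$ every prediction is 0.5, so each example contributes $\frac{1}{2}(0.5 - y)^2 = 0.125$.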
However, because the hypothesis is a logistic (sigmoid) function, this squared-error cost is non-convex in $\theta$: it has many local minima, so gradient descent is not guaranteed to converge to the global minimum.
One can prove that if the cost function is convex, gradient descent will reach the global minimum.
Choosing a convex cost function for logistic regression
Define
$$
Cost(h_{\theta}(x),y)=
\begin{cases}
-\log(h_{\theta}(x)) & \text{if } y=1 \\
-\log(1-h_{\theta}(x)) & \text{if } y=0
\end{cases}
$$
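The piecewise cost above translates directly to code. A minimal sketch (the function name `cost` is an illustrative choice):

```python
import math

def cost(h_x, y):
    """Per-example logistic cost.

    h_x is the hypothesis output h_theta(x) in (0, 1]; y is the label 0 or 1.
    Returns -log(h_x) when y = 1, and -log(1 - h_x) when y = 0.
    """
    return -math.log(h_x) if y == 1 else -math.log(1.0 - h_x)
```

Note how the penalty behaves: a confident correct prediction costs nearly 0, while a confident wrong prediction (e.g. $h_\theta(x)\to 0$ when $y=1$) is penalized without bound.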
With this definition, $J(\theta)$ is convex in $\theta$, so gradient descent is guaranteed to find the global minimum.
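As a sketch of the full picture, here is batch gradient descent on this convex cost for a tiny synthetic dataset. The data, step size, and iteration count are illustrative assumptions; the update uses the standard gradient $\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_i (h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One batch gradient-descent step on the logistic cost J(theta).

    Gradient: (1/m) * sum over i of (h(x_i) - y_i) * x_i.
    """
    m = len(X)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = sigmoid(sum(t * xj for t, xj in zip(theta, x_i))) - y_i
        for j in range(len(theta)):
            grad[j] += err * x_i[j] / m
    return [t - alpha * g for t, g in zip(theta, grad)]

# Toy data (assumed for illustration): x[0] = 1 is the bias feature,
# and the label is 1 exactly when the second feature exceeds 2.5.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.5)
```

Because the cost is convex, the same minimum is reached regardless of the starting $\theta$; after training, the learned decision boundary sits between the two classes.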