Example: binary classification problem

As an example of the above, relevant also for project 2, let us consider a binary classification problem. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters $\boldsymbol{\beta}$ as

$$
\mathcal{C}(\boldsymbol{\beta}) = -\sum_{i=1}^{n} \left( y_i \log p(y_i\vert x_i,\boldsymbol{\beta}) + (1 - y_i) \log\left[1 - p(y_i\vert x_i,\boldsymbol{\beta})\right] \right),
$$

where we had defined the logistic (sigmoid) function

$$
p(y_i = 1\vert x_i,\boldsymbol{\beta}) = \frac{\exp{(\beta_0 + \beta_1 x_i)}}{1 + \exp{(\beta_0 + \beta_1 x_i)}},
$$

and

$$
p(y_i = 0\vert x_i,\boldsymbol{\beta}) = 1 - p(y_i = 1\vert x_i,\boldsymbol{\beta}).
$$

The parameters $\boldsymbol{\beta}$ were determined using a minimization method such as gradient descent or the Newton-Raphson method.
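As a minimal numerical sketch of the two expressions above (the helper names `sigmoid` and `cross_entropy_cost` and the toy data are illustrative, not part of the original notes), the cost can be evaluated as:

```python
import numpy as np

def sigmoid(t):
    """The logistic function p = exp(t)/(1 + exp(t)) = 1/(1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def cross_entropy_cost(beta, x, y):
    """C(beta) = -sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]."""
    p = sigmoid(beta[0] + beta[1] * x)  # p(y_i = 1 | x_i, beta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Illustrative data: inputs x_i and binary labels y_i
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(cross_entropy_cost(np.array([0.1, 1.5]), x, y))
```

Minimizing this function with respect to `beta`, for example by gradient descent, recovers the logistic regression fit described above.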

Now we replace $x_i$ with the activation $z_i^l$ for a given layer $l$ and the outputs as $y_i = a_i^l = f(z_i^l)$, with $z_i^l$ now being a function of the weights $w_{ij}^l$ and biases $b_i^l$. We then have

$$
a_i^l = y_i = \frac{\exp{(z_i^l)}}{1 + \exp{(z_i^l)}},
$$

with

$$
z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l,
$$

where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.
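The feed-forward step for a single layer can then be sketched as follows (a minimal illustration assuming a fully connected layer with the logistic activation; the names `W`, `b` and `a_prev` stand in for $w_{ij}^l$, $b_i^l$ and $a_j^{l-1}$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, a_prev):
    """Compute z^l = W^l a^{l-1} + b^l and a^l = f(z^l) for one layer."""
    z = W @ a_prev + b
    return sigmoid(z), z

# Illustrative shapes: 3 nodes in layer l, 4 outputs from layer l-1
rng = np.random.default_rng(0)
a_prev = rng.random(4)
W, b = rng.random((3, 4)), rng.random(3)
a, z = layer_forward(W, b, a_prev)
```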

Our cost function at the final layer $l=L$ is now

$$
\mathcal{C}(\boldsymbol{W}) = -\sum_{i=1}^{n} \left( t_i \log a_i^L + (1 - t_i) \log (1 - a_i^L) \right),
$$

where we have defined the targets $t_i$. The derivatives of the cost function with respect to the outputs $a_i^L$ are then easily calculated and we get

$$
\frac{\partial \mathcal{C}(\boldsymbol{W})}{\partial a_i^L} = \frac{a_i^L - t_i}{a_i^L(1 - a_i^L)}.
$$
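This expression is easy to verify numerically. The following sketch (with illustrative values for $a_i^L$ and $t_i$) compares the analytical derivative with a central finite-difference approximation of the cost:

```python
import numpy as np

a = np.array([0.2, 0.7, 0.9])  # outputs a_i^L (illustrative)
t = np.array([0.0, 1.0, 1.0])  # targets t_i (illustrative)

def cost(a):
    """C(W) at the output layer, as a function of the outputs a_i^L."""
    return -np.sum(t * np.log(a) + (1.0 - t) * np.log(1.0 - a))

# Analytical derivative (a_i^L - t_i) / (a_i^L (1 - a_i^L))
analytic = (a - t) / (a * (1.0 - a))

# Central finite differences, one output at a time
eps = 1e-6
numeric = np.array([(cost(a + eps * e) - cost(a - eps * e)) / (2 * eps)
                    for e in np.eye(len(a))])
print(np.allclose(analytic, numeric))  # True
```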

If we use an activation function other than the logistic one, we need to evaluate other derivatives.
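For the logistic activation itself, one short chain-rule step shows why this particular combination of cost and activation is convenient: since $f'(z_i^L) = a_i^L(1 - a_i^L)$ for the sigmoid, the derivative of the cost with respect to $z_i^L$ (the output error used in the backpropagation equations) simplifies to

$$
\frac{\partial \mathcal{C}}{\partial z_i^L} = \frac{\partial \mathcal{C}}{\partial a_i^L} f'(z_i^L) = \frac{a_i^L - t_i}{a_i^L(1 - a_i^L)}\, a_i^L(1 - a_i^L) = a_i^L - t_i,
$$

and the potentially troublesome denominator cancels. With another activation function this cancellation does not occur, and the corresponding $f'(z_i^L)$ must be evaluated explicitly.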