
[Solved]: How exactly do you calculate the hidden layer gradients in the backpropagation algorithm?

Problem Detail: 

I have been going through the description of the backpropagation algorithm found here, and I am having a bit of trouble getting my head around some of the linear algebra.

Say I have a final output layer $L$ consisting of two visible units, and layer $L-1$ consists of four hidden units. (This is just an example to illustrate my problem.)

My understanding is that the weight matrix for this final layer ($w^L$) should be a 4 x 2 matrix.

The reference says to calculate the output error $\delta^{x,L}$ given by:

$\delta^{x,L} = \nabla_a C_x \odot \sigma'(z^{x,L})$

where:

$z^{x,L} = w^La^{x,L-1} + b^L$,

$a^{x,L} = \sigma(z^{x,L})$, and

$\odot$ is the Hadamard product.

Evaluating $\delta^{x,L}$ gives a 1 x 2 vector, as it should given there are two output units.

My problem is when calculating the hidden layer gradients (e.g. for layer $L-1$), given by:

$\delta^{x,L-1} = ((w^L)^T\delta^{x,L})\odot\sigma'(z^{x,L-1})$

Now if $w^L$ is a 4x2 matrix and $\delta^{x,L}$ is a 1x2 vector, then wouldn't $(w^L)^T\delta^{x,L}$ be a multiplication of a 2x4 matrix and a 1x2 matrix, which is impossible?

I feel like I have missed something vital in my understanding, but I can't work out what it is.

Is it just as simple as making it $\delta^{x,L}(w^L)^T$? This would be a 1x2 matrix multiplied by a 2x4 matrix, which is perfectly legal, but the formula has it the other way around.

Can anyone see where my understanding is flawed? Any help would be greatly appreciated.

Asked By : guskenny83

Answered By : Kyle Jones

You've transposed the sizes of both $w^L$ and $\delta^{x,L}$: $w^L$ should be 2x4 and $\delta^{x,L}$ should be 2x1. $(w^L)^T$ is then a 4x2 matrix that is multiplied by a 2x1 matrix, yielding a 4x1 matrix suitable for the next step of backpropagation. In general for neural nets, the activation units are represented as column vectors, and the weights are matrices of dimension |L+1| x |L|, where L is the current layer and L+1 is the next layer (in the forward direction).
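The shape bookkeeping above can be checked with a small NumPy sketch using the same example sizes (4 hidden units in layer $L-1$, 2 output units in layer $L$). All variable names here are illustrative stand-ins, and the gradient of the cost is filled with random values just to exercise the shapes:

```python
import numpy as np

# Column-vector convention: w^L has shape |L| x |L-1| = 2x4,
# activations and errors are column vectors.
rng = np.random.default_rng(0)

w_L = rng.standard_normal((2, 4))     # weights into layer L
b_L = rng.standard_normal((2, 1))     # biases of layer L
a_prev = rng.standard_normal((4, 1))  # a^{x,L-1}, a 4x1 column vector

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Forward pass: z^{x,L} = w^L a^{x,L-1} + b^L -> (2x4)(4x1) + (2x1) = 2x1
z_L = w_L @ a_prev + b_L

# Output error delta^{x,L} = nabla_a C_x (Hadamard) sigma'(z^{x,L}),
# a 2x1 vector, one entry per output unit.
grad_C = rng.standard_normal((2, 1))   # stand-in for nabla_a C_x
delta_L = grad_C * sigmoid_prime(z_L)  # elementwise product, 2x1

# Backpropagated error: (w^L)^T delta^{x,L} -> (4x2)(2x1) = 4x1,
# matching the 4 hidden units of layer L-1.
z_prev = rng.standard_normal((4, 1))   # stand-in for z^{x,L-1}
delta_prev = (w_L.T @ delta_L) * sigmoid_prime(z_prev)

print(delta_L.shape)     # (2, 1)
print(delta_prev.shape)  # (4, 1)
```

With $\delta^{x,L}$ as a 2x1 column vector, $(w^L)^T\delta^{x,L}$ multiplies cleanly; the 1x2 row-vector reading is what made the formula look impossible.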


Question Source : http://cs.stackexchange.com/questions/30785
