Why are the initial weights of a neural network initialized as random numbers? I had read somewhere that this is done to "break the symmetry" and that this makes the neural network learn faster. How does breaking the symmetry make it learn faster?
Wouldn't initializing the weights to 0 be a better idea? That way, wouldn't the weights be able to find their values (whether positive or negative) faster?
Is there some other underlying philosophy behind randomizing the weights apart from hoping that they would be near their optimum values when initialized?
Asked By : Shayan RC
Answered By : Subhayan
The basic intuition behind initializing the weight layers to small (and different) values is to break the symmetry of the system, so that the individual weights can move apart and settle at different values.
More concretely, you'd want your initial weights to be distinct, with a small gap between them. Because the weights differ, each unit receives a different gradient, and those gaps widen as training goes on; the units specialize into different feature detectors, which helps the network converge faster, i.e. the learning process speeds up.
If you instead set all your weights to the same constant, every unit in a layer computes the same output and receives the same gradient, so the weights are all updated in exactly the same way and the units remain copies of one another no matter how long you train. That won't help much, especially if the initial constant is 'very far' from the final values.
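To make the symmetry argument concrete, here is a minimal NumPy sketch (not from the original answer; the tiny XOR network, hyperparameters, and constant value 0.5 are illustrative assumptions). It trains a one-hidden-layer network once from a constant initialization and once from a small random one, then prints the hidden-layer weight columns: with the constant start the two hidden units stay identical, while random initialization lets them diverge.

```python
import numpy as np

def train(W1, W2, steps=100, lr=0.1):
    """One-hidden-layer net, sigmoid hidden units, linear output, squared error."""
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # inputs
    y = np.array([[0.0], [1.0], [1.0], [0.0]])                      # XOR targets
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(steps):
        h = sigmoid(X @ W1)                    # hidden activations, shape (4, 2)
        err = h @ W2 - y                       # output error, shape (4, 1)
        grad_W2 = h.T @ err                    # gradient w.r.t. output weights
        grad_W1 = X.T @ (err @ W2.T * h * (1 - h))  # gradient w.r.t. hidden weights
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1

# Constant initialization: both hidden units start, and therefore stay, identical.
W1_const = train(np.full((2, 2), 0.5), np.full((2, 1), 0.5))
print("constant init, hidden-unit weight columns:\n", W1_const)
# -> the two columns are equal: the units received identical gradients at every
#    step and never became different feature detectors.

# Small random initialization: the symmetry is broken from the very first step.
rng = np.random.default_rng(0)
W1_rand = train(rng.normal(0.0, 0.1, (2, 2)), rng.normal(0.0, 0.1, (2, 1)))
print("random init, hidden-unit weight columns:\n", W1_rand)
# -> the columns differ, so the two units can learn distinct features.
```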
Hope that helps. Have fun learning :)
Question Source : http://cs.stackexchange.com/questions/13882