
MrFlufypants t1_iqx6bhc wrote

The activation functions are key. A linear combination of linear combinations is itself just a linear combination, so 10 linear layers collapse into a single layer, which is only capable of so much. The activation functions break that linearity, and that's the key ingredient.
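Spelling that out with two hypothetical layers $(W_1, b_1)$ and $(W_2, b_2)$ and no activation in between:

$$y = W_2(W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2),$$

which is again a single affine layer, no matter how many times you nest it.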

13

jms4607 t1_iqxdhw3 wrote

Without activation functions an MLP would just be y = sum(m·x) + b
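A quick numerical sketch of that point (hypothetical NumPy example, not from the thread): two stacked linear layers can be reproduced exactly by a single linear layer, but putting a ReLU between them breaks that equivalence.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))   # batch of 5 inputs, 4 features each

    # Two "layers" that are just linear maps (no activation)
    W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
    two_layers = (x @ W1 + b1) @ W2 + b2

    # The same function expressed as one linear layer
    W, b = W1 @ W2, b1 @ W2 + b2
    one_layer = x @ W + b
    print(np.allclose(two_layers, one_layer))   # True: depth added nothing

    # With a ReLU in between, a single linear layer can no longer match it
    relu = lambda z: np.maximum(z, 0)
    nonlinear = relu(x @ W1 + b1) @ W2 + b2
    print(np.allclose(nonlinear, one_layer))    # False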

7

ZestyData t1_iqybakm wrote

Lol thank you! Totally dropped the ball by leaving that crucial element out.

3