ZestyData t1_iqwjiua wrote

  1. Layered linear nodes can model non-linear behaviours.
  2. Computational complexity. It's more efficient to use the aforementioned layers of linear nodes than it is to use non-linear functions directly.
24

MrFlufypants t1_iqx6bhc wrote

The activation functions are key. A linear combination of linear combinations is itself a linear combination, so 10 layers would be equivalent to a single layer, which is only capable of so much. The activation functions break the linearity, though, and are the key ingredient there.
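
To make that concrete, here is a rough pure-Python sketch. The weights, biases, and function names are made up for illustration, and ReLU is just one possible choice of activation. It checks that two stacked linear layers give the same outputs as one collapsed layer, and that putting a ReLU between them breaks that equivalence.

```python
# Matrices are lists of rows; vectors are plain lists. All values are hypothetical.

def matvec(W, x):
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Element-wise ReLU, used here as an example non-linearity."""
    return [max(0, a) for a in v]

# Made-up parameters: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[1, -2], [3, 1]], [0, -1]
W2, b2 = [[2, -1]], [1]

def two_linear_layers(x):
    # No activation between the layers.
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Collapse both layers into a single weight matrix and bias.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_linear_layer(x):
    return add(matvec(W, x), b)

def two_layers_with_relu(x):
    # Same parameters, but with a ReLU after the first layer.
    return add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

inputs = [[1, 2], [-1, 3], [0, 0], [4, -2]]
same = all(two_linear_layers(x) == one_linear_layer(x) for x in inputs)
differs = any(two_layers_with_relu(x) != one_linear_layer(x) for x in inputs)
print(same, differs)
```

Integer weights are used so the equality checks are exact. With floating-point weights you would compare within a small tolerance instead.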

13

jms4607 t1_iqxdhw3 wrote

Without activation functions, an MLP would just be y = sum(m•x) + b.
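
For the two-layer case the algebra can be written out as below. The symbols W_1, W_2, b_1, b_2 are my own notation for the layer weights and biases, not something from the thread. M plays the role of m above.

```latex
\begin{aligned}
y &= W_2\,(W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2) \\
  &= M x + b, \qquad M = W_2 W_1,\quad b = W_2 b_1 + b_2 .
\end{aligned}
```

The same step repeats for any number of layers, so the whole stack reduces to a single matrix and a single bias.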

7

ZestyData t1_iqybakm wrote

Lol thank you! Totally dropped the ball by leaving that crucial element out.

3