Brudaks t1_j9k6mo0 wrote

Because being a universal function approximator is not sufficient to be useful in practice, and IMHO is not even a particularly interesting property. We don't care whether something can approximate any function; we care whether it approximates the thing needed for a particular task, and even then, being able to approximate it is a necessary but not a sufficient condition. We care about the efficiency of approximation (e.g. a perceptron with a single hidden layer is a universal approximator only if you allow an impractically large number of neurons). But even more important than how well the function can be approximated with a limited number of parameters is how well you can actually learn those parameters, and this differs a lot between models. We don't care how well a model would fit the function with optimal parameters; we care how well it fits the function with the parameter values we can realistically find with a bounded amount of computation.
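To make the efficiency point concrete, here is a minimal sketch (my own toy illustration, not a benchmark). It approximates a smooth target with a piecewise-constant function, which is roughly what a regression tree or a hidden layer of step units can represent. The target `sin(2*pi*x)`, the equal-width bins, and the helper names `piecewise_constant_fit` and `max_error` are all assumptions made up for this example.

```python
import math


def piecewise_constant_fit(f, n_pieces):
    """Approximate f on [0, 1] by a function that is constant on each of
    n_pieces equal-width bins, using f's value at the bin midpoint."""
    values = [f((i + 0.5) / n_pieces) for i in range(n_pieces)]

    def approx(x):
        # Clamp so that x == 1.0 falls into the last bin.
        i = min(int(x * n_pieces), n_pieces - 1)
        return values[i]

    return approx


def max_error(f, g, n_grid=10_000):
    """Largest absolute difference between f and g on an even grid over [0, 1]."""
    return max(abs(f(k / n_grid) - g(k / n_grid)) for k in range(n_grid + 1))


def target(x):
    # Assumed toy target; any smooth function would make the same point.
    return math.sin(2 * math.pi * x)


# Worst-case error as the number of pieces ("units") grows.
errors = {
    n: max_error(target, piecewise_constant_fit(target, n))
    for n in (4, 16, 64, 256)
}

for n, err in errors.items():
    print(f"{n:4d} pieces -> max error {err:.4f}")
```

The error does go to zero as pieces are added, so the family is "universal" in this toy sense, but it only shrinks roughly in proportion to 1/n, so a tight tolerance needs a lot of pieces. And this sketch sidesteps the harder part entirely: the values are set directly from the known target rather than learned from data with a limited compute budget.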

That being said, we do use decision trees instead of DL; for some types of tasks the former outperform the latter, and for other types of tasks it's the other way around.
