Submitted by radi-cho t3_10dqgw2 in MachineLearning
blackkettle t1_j4p8nqv wrote
It doesn’t seem to discuss the computational advantages in any detail. How interesting is this whole FF idea at this point? I’d love to hear more detailed analysis.
So far it seems like an interesting alternative, but the "brain-inspired" part is pushed in every article. In terms of accuracy it always seems to land slightly below traditional backprop. If there were a huge computational improvement, that would seriously recommend it, I guess - but is there? Or is it just too early to tell?
youregonnalovemynuts t1_j4pmvzs wrote
Hinton's original paper discusses the computational advantages this algorithm can provide. See page 14 of this PDF: https://www.cs.toronto.edu/~hinton/FFA13.pdf . Today, though, there isn't hardware readily available that can exploit these advantages, so they'll remain theoretical for now. If there seems to be enough promise that such an algorithm will converge anywhere close to backprop, we'll see some attempts there, probably starting with FPGAs and extra circuits.
What everyone should be asking is: why MNIST? It's a toy dataset at this point; it should be trivial for someone like Hinton to scale these experiments to something closer to reality. MNIST is like the mouse experiments of machine learning - maybe it's an early necessity, but it hardly says anything about actual viability.
data_wizard_1867 t1_j4qstlq wrote
I like your likening of MNIST to mouse experiments. Someone should make a hierarchy-of-evidence equivalent for ML research, since the existing one is largely focused on medical research.
blimpyway t1_j4pndcs wrote
One application I can think of is learning on the edge. There's an industry trend of embedding AI inference capabilities in newer ARM chips: the so-called NPUs, which are simplified GPUs optimized only for inference (forward passes). Such an algorithm would let them learn using forward passes alone, without requiring backpropagation.
Another possibility is the ability to train one layer at a time, which reduces GPU memory requirements.
And probably more important, it opens the door to all kinds of as-yet-unseen network architectures, topologies, and training methods that do not require fully differentiable pathways.
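To make the layer-at-a-time point concrete, here's a rough PyTorch sketch of FF-style layer-local training. The layer sizes, goodness threshold, and random stand-in data are illustrative and not taken from Hinton's paper; the point is just that gradients never cross a layer boundary, so only one layer's activations and optimizer state are ever alive at once.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class FFLayer(nn.Module):
    """One fully connected layer trained only on its own local 'goodness' objective."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.SGD(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so goodness from the previous layer can't leak through.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = mean squared activation; push it above the threshold for positive
        # data and below the threshold for negative data. Gradients never leave this layer.
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Return detached outputs so the next layer never backpropagates into this one.
        with torch.no_grad():
            return self.forward(x_pos), self.forward(x_neg)

# Train a small stack one layer at a time: only the current layer's activations and
# gradients need to sit in memory, which is where the saving comes from.
layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos = torch.randn(64, 784)   # stand-in for "positive" (real) samples
x_neg = torch.randn(64, 784)   # stand-in for "negative" (corrupted) samples
for layer in layers:
    for _ in range(10):                          # a few local updates on this layer
        h_pos, h_neg = layer.train_step(x_pos, x_neg)
    x_pos, x_neg = h_pos, h_neg                  # frozen features feed the next layer
```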
edit: regarding the brain-inspired part... well, you can dismiss it as AI's reversed cargo cult - if it imitates some properties of the brain, it should act like the brain - but I would be cautious about attributing that kind of thinking to Hinton. Brains are very different from ANNs, and trying to emulate their properties could provide insights into how they work.
currentscurrents t1_j657n2z wrote
>The so-called NPUs, which are simplified GPUs optimized only for inference (forward passes). Such an algorithm would let them learn using forward passes alone, without requiring backpropagation.
More importantly, you could build even simpler chips that physically implement a neural network out of analog circuits instead of emulating one with digital math.
This would use orders of magnitude less power, and also let you fit a larger network on the same amount of die space.
amassivek t1_j6owo9f wrote
Here is a reversed view, where ANNs provide inspiration for neuroscience to investigate the brain. Forward learning models offer a new perspective on how neurons without "feedback" or "learning" connections can still learn, which is a common scenario. We make note of this and present a conceptual framework for forward learning: https://arxiv.org/abs/2204.01723. This framework is applicable to neuroscience models, providing an investigative path forward.
amassivek t1_j6opolw wrote
As the depth and width of the network grow, the computational advantage grows. Forward-only learning algorithms, such as FF and PFF, have this advantage.
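As a back-of-envelope illustration of why the advantage grows with depth (the batch size, width, and depth below are made-up numbers, not taken from the paper): end-to-end backprop has to keep every layer's activations alive for the backward pass, while forward-only, layer-at-a-time training only ever holds one layer's worth.

```python
# Rough activation-memory comparison under assumed sizes.
batch, width, depth, bytes_per_float = 128, 4096, 48, 4

act_per_layer = batch * width * bytes_per_float   # activations of a single layer
backprop_peak = depth * act_per_layer             # backprop stores every layer's activations
forward_only_peak = act_per_layer                 # forward-only training holds one layer at a time

print(f"per-layer activations : {act_per_layer / 2**20:.1f} MiB")
print(f"backprop peak         : {backprop_peak / 2**20:.1f} MiB")
print(f"forward-only peak     : {forward_only_peak / 2**20:.1f} MiB")
```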
There is also a compatibility advantage: forward-only learning algorithms work on limited-resource devices (edge devices) and neuromorphic chips.
For an analysis of efficiency, refer to Section IV: https://arxiv.org/abs/2204.01723
We demonstrate reasonable performance on CIFAR-10 and CIFAR-100 in that same paper (Section IV), so the performance gap may decrease over time.
For a review of forward-only learning, with an explanation of why it has these efficiency and compatibility advantages: https://amassivek.github.io/sigprop