
Superschlenz t1_iyy2jx0 wrote

Normally, compute is saved by pruning away slow-changing weights that are close to zero.

And you seem to want to prune away fast-changing activations.

Don't the machine learning libraries have a dropout mechanism where you can zero out activations with a binary mask? I don't know. You would have to compute the forward activations for the first layer, then compare the activations with a threshold to set the mask bits, then activate the dropout mask for that layer before computing the next layer's activations. Sounds like a lot of overhead instead of a saving.
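
Roughly, as a minimal sketch in PyTorch, say (the threshold `tau`, the layer sizes, and the batch size are made up for illustration):

```python
import torch
import torch.nn as nn

# Minimal sketch: threshold-based activation masking between two layers.
# tau, the layer sizes, and the batch size are illustrative only.
tau = 0.1

layer1 = nn.Linear(256, 512)
layer2 = nn.Linear(512, 128)

x = torch.randn(32, 256)            # a batch of inputs
h = torch.relu(layer1(x))           # forward activations of the first layer
mask = (h.abs() >= tau).float()     # mask bit is 1 where the activation clears the threshold
h = h * mask                        # "dropout" with the data-dependent mask
y = layer2(h)                       # the next layer still does a dense matmul
```

The comparison and the elementwise multiply are extra passes over the whole activation tensor, and the next layer's matmul is still dense, which is exactly the overhead mentioned above.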

Edit: You may also manually force the activations to zero if they are low. The hardware has built-in energy-saving circuitry that skips multiplications by zero, and maybe multiplications by one and additions of zero as well. But it still needs to move the data around.
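
The manual variant would look something like this (again just a sketch, assuming PyTorch; `tau` and the tensor shape are placeholders):

```python
import torch

# Minimal sketch: hard-zero low-magnitude activations in place.
# On its own this skips nothing on dense hardware; any saving depends on
# zero-skipping circuitry or a sparse kernel downstream, and the data
# still has to be moved.
tau = 0.1
h = torch.randn(32, 512)
h[h.abs() < tau] = 0.0
```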

1

_PYRO42_ t1_izmknte wrote

I have an intuition: larger models are successful not because of the amount of computation they can take advantage of but because of the amount of knowledge they can encode. I want to try an ultra-large, ultra-deep neural network with gigabytes of neurons that would consume no more than 50 watts of power. The human brain uses 20 watts; I feel we are making a mistake when we start pushing 100-200 W of power into a single network. I want to control machines, not generate pieces of art. I want Factorio not to be a game but a reality of ours.

I will bring edge computing to this world. I will make it a thing you can wear not on your skin but as your skin.

My brother, come join me. In battle, we are stronger.

1