
suflaj t1_j13pqhe wrote

While you can run large models piecewise (layer by layer, batch by batch, dimension by dimension, or element by element), the problem is getting to the weights. No one said you need to transform your input into the output in one go; all that matters is that no single operation makes you go OOM.
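
To illustrate the layer-by-layer idea, here is a minimal sketch (not the commenter's code): a deep MLP evaluated with only one layer's weights resident in memory at a time. The file naming scheme, layer count, and use of PyTorch are assumptions for illustration.

    import torch

    NUM_LAYERS = 48  # hypothetical depth

    def forward_streaming(x, weight_dir="weights"):
        for i in range(NUM_LAYERS):
            # Load just this layer's parameters; the previous layer's were freed.
            w = torch.load(f"{weight_dir}/layer_{i}_w.pt", map_location="cpu")
            b = torch.load(f"{weight_dir}/layer_{i}_b.pt", map_location="cpu")
            x = torch.relu(x @ w.T + b)
            del w, b  # only the activation x stays resident between layers
        return x

The peak memory is roughly one layer's weights plus the current activation, at the cost of reading every weight from disk on every forward pass.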

Theoretically, there is no network where a single linear combination would exceed modern memory sizes, but this doesn't mean that such a strategy would be fast. At the base level, all you need is 3 registers (two to hold the operands of each multiply and one to keep the running sum) and enough memory to store the network weights.
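
Here is a rough sketch of that register-level view: a dot product computed element by element, with one slot for the current weight, one for the current input element, and one for the accumulator. Streaming the weights from a memory-mapped file (the file name and dtype are assumptions) keeps the working set tiny regardless of the layer width.

    import numpy as np

    def dot_streaming(x, weight_path="w_row.f32"):
        # Memory-map one row of weights so elements are fetched on demand.
        w = np.memmap(weight_path, dtype=np.float32, mode="r", shape=(len(x),))
        acc = 0.0                # "register" holding the running sum
        for i in range(len(x)):
            wi = float(w[i])     # "register" holding the current weight
            xi = float(x[i])     # "register" holding the current input element
            acc += wi * xi       # multiply-accumulate, one element at a time
        return acc

It works, but every element goes through a Python-level loop and a disk-backed read, which is exactly why the comment notes this strategy would not be fast.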
