Phoneaccount25732 t1_iz8kaym wrote

Kingma's Reparameterization Trick (a quick sketch of the idea is below, after this list).

Minsky on how single-layer perceptrons can't solve XOR, and whoever did the follow-up on how MLPs can.

Wolpert on No Free Lunch in search and optimization.
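
On the first one: a minimal NumPy sketch of the reparameterization trick, assuming a VAE-style diagonal-Gaussian latent (the variable names and values are illustrative, not from anyone's actual code). Instead of sampling z ~ N(mu, sigma^2) directly, which blocks gradients, you sample eps ~ N(0, 1) and compute z = mu + sigma * eps, so mu and sigma stay inside a differentiable expression.

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend encoder outputs for a batch of 4 samples with a 2-d latent.
    mu = np.array([[0.0, 1.0], [0.5, -0.5], [1.0, 0.0], [-1.0, 2.0]])
    log_var = np.full((4, 2), -1.0)

    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    # The randomness lives entirely in eps, so gradients w.r.t. mu and
    # log_var flow through this deterministic expression.
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_var)
    z = mu + sigma * eps

    print(z.shape)  # (4, 2)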

43

Blutorangensaft t1_iz92v1x wrote

What is people's take on Minsky? The XOR issue is easily relieved with a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limitation of neural nets. In fact, even humans have trouble with that, and I fail to see how it generalises. He may be an important figure, but not one who favoured progress in artificial intelligence.

Of course, if the question is just about portraying important junctures in AI research, regardless of whether they were helpful or not, then Minsky belongs there.

5

ThisIsMyStonerAcount t1_iz9c2oo wrote

Minsky's work was very relevant because at the time the perceptron was the state of the art, and that algorithm can't solve XOR. That's why his work started the first AI winter.
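
For reference, the standard argument: a single-layer perceptron computes y = step(w1*x1 + w2*x2 + b), and fitting XOR would require

    XOR(0,0) = 0  =>  b <= 0
    XOR(1,0) = 1  =>  w1 + b > 0
    XOR(0,1) = 1  =>  w2 + b > 0
    XOR(1,1) = 0  =>  w1 + w2 + b <= 0

Adding the two middle inequalities gives w1 + w2 + 2b > 0, while adding the first and last gives w1 + w2 + 2b <= 0, a contradiction, so no choice of weights and bias works.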

14

Blutorangensaft t1_iz9pl46 wrote

I get that, but I find it a little hard to believe that nobody immediately pointed out a nonlinearity would solve the issue. Or is that just hindsight bias, thinking it was easier because of what we know today?

0

ThisIsMyStonerAcount t1_iza1qzy wrote

What nonlinearity would solve the issue? The usual ones we use today certainly wouldn't. Are you thinking of a 2nd-order polynomial? I'm not sure that's a generally applicable function, what with it being non-monotonic and all.

(Or do you mean a hidden layer? If so: yeah, that's absolutely hindsight bias).

11

Blutorangensaft t1_iza9zrt wrote

I see. I meant the combination of a nonlinear activation function and another hidden layer. Was curious what people thought, thanks for your comment.
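
For concreteness, here's a minimal NumPy sketch of that combination: one hidden layer of two step-activated units with hand-picked weights (an OR unit and an AND unit), whose difference gives XOR. The weights are just one classic textbook choice, not claimed to be how anyone would actually train it.

    import numpy as np

    def step(x):
        return (x > 0).astype(int)

    # All four XOR inputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    # Hidden layer: unit 1 computes OR, unit 2 computes AND.
    W1 = np.array([[1, 1],
                   [1, 1]])          # rows: inputs, columns: hidden units
    b1 = np.array([-0.5, -1.5])
    h = step(X @ W1 + b1)

    # Output layer: OR minus AND is XOR.
    w2 = np.array([1, -1])
    y = step(h @ w2 - 0.5)

    print(y)  # [0 1 1 0]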

4

chaosmosis t1_izawfsz wrote

Non-monotonic activation functions can allow a single layer to solve XOR, but they take forever to converge.
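
As a toy illustration of that (my own construction, not from any particular paper): with a non-monotonic "bump" activation, a single unit with weights (1, 1) and no hidden layer already separates XOR, because the pre-activation equals 1 exactly for the two positive cases.

    import numpy as np

    def bump(s):
        # Non-monotonic activation: peaks at s = 1, falls off on both sides.
        return np.exp(-(s - 1.0) ** 2)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    w = np.array([1.0, 1.0])   # a single unit, no hidden layer
    s = X @ w                  # pre-activations: 0, 1, 1, 2
    y = (bump(s) > 0.5).astype(int)

    print(y)  # [0 1 1 0]

Whether gradient descent actually finds such solutions quickly is another matter, as the parent comment notes.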

2

twistor9 t1_iznfnm1 wrote

The reparameterization trick is overrated; my prof is a famous AI researcher and he said people had been doing that since the '90s. DeepMind were also about to publish the same idea at the same time; I spoke to one of the authors. Gumbel-Softmax for discrete reparameterization was genuinely a new idea though, as far as I know, but it also had two labs publishing on it.
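
For readers who haven't seen it, a minimal sketch of Gumbel-Softmax (Concrete) sampling, with arbitrary example logits and temperature: perturb the logits with i.i.d. Gumbel(0, 1) noise and take a temperature-controlled softmax, which gives a differentiable relaxation of drawing from a categorical distribution.

    import numpy as np

    rng = np.random.default_rng(0)

    def gumbel_softmax_sample(logits, tau=0.5):
        # Gumbel(0, 1) noise: g = -log(-log(U)), with U ~ Uniform(0, 1).
        u = rng.uniform(size=logits.shape)
        g = -np.log(-np.log(u))
        # Temperature-controlled softmax over the perturbed logits.
        z = (logits + g) / tau
        z = np.exp(z - z.max())
        return z / z.sum()

    logits = np.array([1.0, 0.5, -1.0])   # arbitrary example values
    sample = gumbel_softmax_sample(logits)
    print(sample, sample.sum())           # a soft one-hot vector summing to 1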

1