
_Arsenie_Boca_ t1_irwzk3j wrote

The point is that you cannot confirm the superiority of an architecture (or any other component) when you change multiple things at once. And yes, it does matter where an improvement comes from; isolating it is the only scientifically sound way to improve. Otherwise we might as well try random things until we find something that works.

To come back to LSTMs vs. Transformers: I'm not saying LSTMs are better or anything. I'm just saying that if LSTMs would have received the amount of engineering attention that went into making transformers better and faster, who knows if they might be similarly successful?

8

visarga t1_irzdrho wrote

> if LSTMs would have received the amount of engineering attention that went into making transformers better and faster

There was a short period when people were trying to improve LSTMs using genetic algorithms or RL.

The conclusion was that the LSTM cell is somewhat arbitrary and many other architectures work just as well, but none much better. So people stuck with classic LSTMs.

2