suflaj t1_jc7jibo wrote

> The same could have been said of Deep Learning until the Image Net breakthrough. The improvement process is evolutionary, and this may be a step in that process.

This is not comparable at all. ImageNet is a database for a competition - it is not a model, architecture or technique. When it was "beaten", it was beaten not by some philosophy or loose set of ideas, but by a proven implementation of a mathematically sound one (AlexNet, a deep CNN trained end-to-end on GPUs).

This, by contrast, is neither evaluated on a concrete dataset nor worked through deeply in the mathematical sense. It is a preprint of an idea that someone fiddled with using an LLM.

> As for reinforcement learning, it has been successfully applied in many real-world scenarios, including robotics, game playing, and autonomous driving.

My point is that so has the six-year-old DNC (Differentiable Neural Computer). The thing is, however, that neither of those is your generic reinforcement learning - they're very specifically tuned for the exact problem they are dealing with. If you actually look at what is available for deep RL, you will see that aside from very poor framework support (probably the best we have is Gym), the biggest issue is how to even get the environment set up to enable learning at all. The issue is in making the actual task you're learning easy enough for the agent to even start learning. The task of knowing how to memorize or recall is incredibly hard, and we humans don't understand memory well enough to construct problem formulations for those two.
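To make that concrete, here is a minimal sketch of what even a trivial recall task looks like as an environment (written against the classic `gym` API; the environment, names and reward scheme are my own toy construction, not from any paper). Note that the only learning signal is a single bit of reward on the very last step:

    import numpy as np
    import gym
    from gym import spaces

    class RecallEnv(gym.Env):
        """Present seq_len random symbols one at a time, then query one position.
        The agent's final action is its guess; reward is sparse (one bit, at the end)."""

        def __init__(self, n_symbols=8, seq_len=5):
            super().__init__()
            self.n_symbols, self.seq_len = n_symbols, seq_len
            self.action_space = spaces.Discrete(n_symbols)
            # observation = (symbol or end-of-sequence sentinel, query position + 1 or 0)
            self.observation_space = spaces.MultiDiscrete([n_symbols + 1, seq_len + 1])

        def reset(self):
            self.sequence = np.random.randint(self.n_symbols, size=self.seq_len)
            self.query = np.random.randint(self.seq_len)
            self.t = 0
            return np.array([self.sequence[0], 0])

        def step(self, action):
            self.t += 1
            if self.t < self.seq_len:      # presentation phase: actions are ignored
                return np.array([self.sequence[self.t], 0]), 0.0, False, {}
            if self.t == self.seq_len:     # query phase: show which position to recall
                return np.array([self.n_symbols, self.query + 1]), 0.0, False, {}
            # answer phase: the one and only reward of the episode
            reward = 1.0 if action == self.sequence[self.query] else 0.0
            return np.array([self.n_symbols, self.query + 1]), reward, True, {}

The agent gets no feedback at all during the presentation phase, so unless it already has some memory mechanism and credit assignment across the whole episode, it never gets off the ground - and this is about the easiest memory task you can formulate.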

Whatever technique you come up with, if you can't reproduce it for other problems or models, you will just end up with a specific model. I mean - look at what you are saying. You're mentioning AlphaGo. Why are you mentioning a specific model/architecture for a specific task? Why not a family of models/architectures? AlphaGo, AlphaZero and MuZero may sound similar, but they're all very, very different. And there is no real generalization of them, even though they all represent reinforcement learning.

> This is one path and other methods could be incorporated such as capsule networks, which aim to address the limitations of traditional convolutional neural networks by explicitly modeling the spatial relationships between features.

And those have long since been shown to be a scam, basically. Well, maybe not fundamentally a scam, but definitely dead. Do you know what essentially killed them? Transformers. And do you know why Transformers are responsible for almost killing the rest of the DL architectures? Because they showed actual results. The paper that is the topic of this thread fails to separate the contribution of this method from that of the massive Transformer they're using alongside it. If you are trying to show the benefits of a memory-augmented system, why not simply use a CNN or LSTM as the controller? Are the authors implying that this memory system they're proposing needs a massive Transformer to even be usable? Everything about it is just so unfinished and rough.
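The ablation I'm asking for is cheap to set up. A rough sketch (PyTorch; the wrapper, class names and addressing scheme here are mine, not the paper's) - the exact same external memory with only the controller swapped out:

    import torch
    import torch.nn as nn

    class LSTMController(nn.Module):
        """Thin wrapper so the LSTM maps (batch, seq, d) -> (batch, seq, d) like a Transformer."""
        def __init__(self, d_model):
            super().__init__()
            self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

        def forward(self, x):
            out, _ = self.lstm(x)
            return out

    class MemoryAugmented(nn.Module):
        """Controller-agnostic external memory: soft content-based reads added to the controller state."""
        def __init__(self, controller, d_model=256, slots=128):
            super().__init__()
            self.controller = controller
            self.memory = nn.Parameter(torch.randn(slots, d_model))  # trainable memory slots
            self.key_proj = nn.Linear(d_model, d_model)

        def forward(self, x):                                  # x: (batch, seq, d_model)
            h = self.controller(x)                             # controller hidden states
            attn = torch.softmax(self.key_proj(h) @ self.memory.T, dim=-1)
            return h + attn @ self.memory                      # blend in the memory reads

    x = torch.randn(4, 10, 256)
    small = MemoryAugmented(LSTMController(256))
    big = MemoryAugmented(nn.TransformerEncoder(
        nn.TransformerEncoderLayer(256, nhead=8, batch_first=True), num_layers=2))
    y_small, y_big = small(x), big(x)  # same memory mechanism, wildly different controllers

If the memory only helps with the huge Transformer controller, that is a result worth reporting on its own; without the comparison you can't attribute anything to the memory.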

> Another approach is to use memory augmented networks to store and update embeddings of entities and their relationships over time, and use capsule networks to decode and interpret these embeddings to make predictions. This approach can be especially useful for tasks that involve sequential data, such as language modeling and time-series forecasting.

Are you aware that exactly this has been done by Graves et al., where the external memory is essentially a list of embeddings that gets 1D-convolved over? The problem, like I mentioned, is that this kind of process is barely differentiable. Even if you do fuzzy search (Graves et al. use a sort of attention based on access frequency alongside the similarity-based one), your gradients are so sparse your network basically doesn't learn anything. Furthermore, the output of your model is tied to this external memory. If you do not optimize the memory, then you are limiting the performance of your model severely. If you do, then what you're doing is nothing novel - you have just arbitrarily declared that part of your monolithic network is memory, even though it's all just one thing.
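For anyone who hasn't read the NTM/DNC papers, the core of the read operation is roughly this (my own numpy simplification of content-based addressing, not Graves et al.'s full scheme with usage and temporal links). It also shows exactly where the gradient problem comes from:

    import numpy as np

    def content_read(memory, key, beta=1.0):
        """Soft content-based read: memory is (N, d) slots, key is (d,), beta is sharpness."""
        # cosine similarity between the key and every memory slot
        sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        w = np.exp(beta * sims)
        w /= w.sum()          # soft address: a distribution over all slots
        return w @ memory     # blended read vector - differentiable, but fuzzy

    memory = np.random.randn(128, 64)   # 128 slots of 64-dim embeddings
    key = np.random.randn(64)
    r_soft = content_read(memory, key, beta=1.0)   # dense weights: gradients flow, recall is mushy
    r_hard = content_read(memory, key, beta=50.0)  # near one-hot: crisp recall, gradients nearly vanish

The sharper you make the lookup (i.e., the closer it gets to an actual memory access), the closer the weights get to one-hot, and the gradient to every other slot goes to zero - that's the sparsity I'm talking about.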
