Comments


harpooooooon t1_j3ck3ri wrote

An alternative would be a very large set of 'if' statements

15

ok531441 t1_j3cm7dl wrote

For some problems, like tabular data or time series, there are plenty of alternatives, some of them better than deep learning (for a specific application). But those are very different from the cutting-edge vision/language models.

29

IntelArtiGen t1_j3cmdvl wrote

Any alternative which would be able to solve the same problems would probably require a similar architecture: a lot of parameters, deep connections.

There are many alternatives to deep learning on some specific tasks. But I'm not sure that whatever ends up outperforming the current way we do deep learning on the usual DL tasks will be something totally different (non-deep, few parameters, etc.)

The future of ML regarding tasks we do with deep learning is probably just another kind of deep learning. Perhaps without backpropagation, perhaps with a totally different way to do computations, but still deep and highly parametric.

7

CireNeikual t1_j3cmwtt wrote

My own work focuses on an alternative to deep learning, called Sparse Predictive Hierarchies (SPH). It is implemented in a library called AOgmaNeo (Python bindings also exist). It does not use backpropagation and runs fully online/incrementally/continually (non-i.i.d.). Its main advantages are the online learning, but also that it runs super fast. Recently, I was able to play Atari Pong (with learning enabled!) on a Teensy 4.1 microcontroller and still get 60 Hz.

If you would like to know more about it, here is a link to a presentation I gave a while back (Google Drive).

Other than my own work, I find the Tsetlin Machine interesting as well.

30

[deleted] t1_j3coep3 wrote

Random forests and SVMs can also provide significant improvements over simple models (e.g. logistic regression). However, for new problems/datasets, it's prudent to try many models.
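For instance, a minimal sketch of that "try many models" step with scikit-learn (the dataset and hyperparameters here are just placeholders):

```python
# Hedged sketch: compare a few classic models on a new dataset with
# cross-validation before reaching for deep learning. Dataset and
# hyperparameters are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```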

3

jloverich t1_j3cpmoi wrote

Unfortunately this is basically a different type of layer-by-layer training, which doesn't perform better than end-to-end training in any case that I'm aware of. It also seems very similar to stacking, which can be done with any type of model.

6

brucebay t1_j3d4nzz wrote

The most commonly used (and successful) models in the financial world are typically variations of regression and decision trees (with XGBoost being the leader). If you try to do anything else, the powers in place will hang you, cut your head, and urinate on your grave, not necessarily in that order.

17

QuantumEffects t1_j3d73hq wrote

I think what we can expect in the future is combinations of deep learning and new, as-of-yet-unknown methods. Just as reinforcement learning made older AI concepts new again, I bet we will see a merge. One that I'm watching right now is IBM's neurosymbolic approaches, which attempt to merge formal methods with deep learning techniques.

1

TensorDudee t1_j3d8rox wrote

Linear regression is the future, maybe? 😭

1

Matthew2229 t1_j3dajdu wrote

Well of course. They just usually aren't as good on unstructured data (pictures, video, text, etc.). If they start performing better, then people will use them. Simple as that.

1

lightofaman t1_j3dccb2 wrote

PhD candidate in AI here. Gradient boosting is the real deal where tabular data is concerned (for both regression and classification). However, thanks to the universal approximation theorem (UAT), neural nets are awesome approximators of really complex functions and are therefore the way to go for complex tasks, like the ones presented by scientific machine learning, for example. LeCun (not so) recently said that deep learning is dead and differentiable programming (another way to describe SciML) is the new kid on the block.
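As a toy illustration of that approximation point (my own sketch, not from any SciML paper; the target function and network size are arbitrary), a small MLP can fit a nonlinear function from samples:

```python
# Toy illustration of the universal-approximation idea: a small MLP fit to a
# noisy nonlinear function. Target function and sizes are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=2000)  # noisy target

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(X, y)

X_test = np.linspace(-3, 3, 9).reshape(-1, 1)
print(np.c_[np.sin(2 * X_test[:, 0]), mlp.predict(X_test)])  # true vs. approximated values
```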

5

shanereid1 t1_j3dgdq0 wrote

Do you define transformers as Deep Learning? Cause if not then transformers.

1

CactusOnFire t1_j3dhxfk wrote

I work in the financial world, and to elaborate on your comment:

Speaking strictly for myself, there are a few reasons I rarely use neural networks at my day job:

-Model explainability: Often stakeholders care about being able to explicitly label the "rules" which lead to a specific outcome. Neural networks can perform well, but it is harder to communicate why they produced a given result than with simpler models (see the sketch after this list).

-Negligible performance gains: While I am working on multi-million-row datasets, the number of features is often small. A tensorflow/pytorch model performs barely better than an sklearn model, so deep learning is overkill for most of my tasks.

-Developer Time & Speed: It is much quicker and easier to make an effective model in sklearn than it is in tensorflow/pytorch. This is another reason Neural Networks are not my default solution.

There are some out-of-the-box solutions available in tools like Sagemaker or Transformers. But even then, finding and implementing one of these is going to take longer than whipping up a random forest.

-Legacy processes: There's a mentality of "if it ain't broke, don't fix it". Even though I am considered a subject matter expert in data science, the finance people don't like me tweaking the way things work without lengthy consultations.

As a result, I am often asked to 'recreate' instead of 'innovate'. That means replacing a linear regression with another linear regression that uses slightly different hyperparameters.

-Maintainability: There are significantly more vanilla software engineers than data scientists/ML Engineers at my company. In the event I bugger off, it's going to be easier for the next person to maintain my code if the models are simple/not Neural Networks.
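To make the explainability point above concrete, here is a rough sketch of my own (not actual production code; dataset and tree depth are placeholders): a shallow decision tree's rules can be printed verbatim and handed to stakeholders, which has no clean equivalent for a neural network.

```python
# Hedged sketch for the explainability point: a shallow decision tree's
# decision rules can be exported as human-readable text.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))  # the "rules"
```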

54

rduke79 t1_j3dohr9 wrote

In an alternate universe, Numenta's Hierarchical Temporal Memory model has won the race against Deep Learning. Sometimes I wish I lived in that universe.

It's an ML paradigm deeply rooted in neuroscience. Things like HTM School on Youtube are just awesome. I believe that if the ML community had jumped on the HTM wagon at the time (and Transformers hadn't shown up) and had invested the same amount of effort into it as it did into DL, we would be at a similar point in development but with a better ML framework.

2

IntelArtiGen t1_j3dpy8q wrote

Well it doesn't really count because you can also "solve" these tasks with SVM / RandomForests, etc. MNIST, OCR and other tasks with very small images are not great benchmarks anymore to compare a random algorithm with a deep learning algorithm.

I was thinking more of getting 90% top-1 on ImageNet, or generating 512x512 images from text, or learning on billions of texts to answer questions. You either need tons of parameters to solve these or an unbelievable amount of compression. And even DL algorithms which do compression need a lot of parameters. You would need an even more powerful way to compress information; perhaps it's possible, but it has yet to be invented.

1

IntelArtiGen t1_j3dvbjr wrote

>Imo there's no reason why we can't have much smaller models

It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 states, you can't use a single binary value (0/1); you need two bits (00/01/10/11).
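The counting argument, in a couple of lines of Python (purely illustrative):

```python
# Minimum number of bits needed to distinguish n states is ceil(log2(n)).
import math

for n_states in (2, 4, 10, 1000):
    print(n_states, "states need at least", math.ceil(math.log2(n_states)), "bits")
```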

A large image of the real world contains a lot of information and detail which can be hard to process and compress. We can compress it of course, that's what current DL algorithms and compression software do, but they have limits, otherwise they lose too much information.

Usual models are far from being perfectly optimized, but when you try to optimize them too much you can quickly lose accuracy. Below 1,000,000 parameters it's hard to have anything that could compete with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing current limits.

2

yldedly t1_j3dwdv6 wrote

I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).

2

IntelArtiGen t1_j3dyhfy wrote

It's true that DL algorithms are unoptimized on this point by default, because modelers usually don't care much about optimizing the number of parameters.

For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (and is harder to train). But when you try to further optimize algorithms which were already optimized on their number of parameters, you quickly hit these limits. You would need models even more efficient than these DL models which are already optimized for their parameter count.
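Those parameter counts are easy to check; a minimal sketch with torchvision (assumes a recent torchvision release, and the exact numbers may differ slightly from the rounded figures above):

```python
# Hedged sketch: count parameters of the two architectures mentioned above.
# Assumes a recent torchvision where these constructors accept weights=None.
import torchvision.models as models

for name, ctor in [("resnet50", models.resnet50), ("efficientnet_b0", models.efficientnet_b0)]:
    model = ctor(weights=None)  # architecture only, no pretrained weights downloaded
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```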

A DL model could probably solve this handwriting problem with a very low number of parameters if you build it specifically with this goal in mind.

2

Competitive-Rub-1958 t1_j3etbs5 wrote

I think HTM was doomed to fail from the very start; even Hawkins has distanced himself from it. The problem is that HTM/TBT are all iterations toward a better model of the neocortex, and it will definitely take quite a bit of time to really unlock all the secrets of the brain.

I think Numenta quickly realized that the path forward is going to be even longer than they thought (they've been researching ~20 years now IIRC), so they want to cash out quickly: hence the whole new push towards "integrating DL" with their ideas (spoiler: it doesn't work well, check out their latest paper on that) and towards sparsifying LLMs, an area where the NeuralMagic folks already lead much of the industry (see the recent paper showing LMs can be pruned in one shot).

The argument of "If we'd put X resources into Y thing, we'd have solved ASI by now!" is quite illogical and applicable to literally every field. In the end, Numenta's work simply did not yield the results that Hawkins et al. were hoping for, and a lack of results makes it very hard to attract the interest of other researchers. If HTM/TBT is to make a comeback, it will have to be on the back of some strong emergent abilities in those architectures...

7

CrysisAverted t1_j3f3n1p wrote

So an alternative to deep learning is tree-based methods and gradient-boosted ensembles built on top of those trees (XGBoost, etc.). These aren't technically deep learning, but they have a lot in common with it.

1

Groundbreaking-Air73 t1_j3fn7o1 wrote

For small tabular datasets I found gradient boosting methods (xgboost, lightgbm, catboost) to outperform typical deep learning architectures.

5

Immarhinocerous t1_j3g5qgb wrote

What do you want to do with it?

For tabular data of a few million rows or less, you're often much better off using XGBoost or one of the other boosting libraries. Read up on boosting; it is an alternative approach to deep learning. Technically it can also be used with neural networks, including deep learning, but in practice it rarely is, because boosting relies on many weak learners, whereas deep learning spends long training times building one strong learner.

XGBoost and CatBoost have won many, many Kaggle competitions. My former employer trained all their production models using XGBoost, and they were modeling people's credit scores. There are many reasons to use XGBoost, including speed of training (much faster to train than deep neural networks) and interpretability (it's easier to interpret the model's decision-making process, because under the hood it's just decision trees).

I mostly use XGBoost, and sometimes fairly simple LSTMs. I use them primarily for financial modeling. XGBoost works well and the fast training times let me do optimization across a wide range of model parameters, without spending a bunch of money on GPUs.
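A hedged sketch of that workflow (generic synthetic data and an illustrative parameter grid, nothing from my actual financial models):

```python
# Hedged sketch: fast XGBoost training makes a broad hyperparameter search
# cheap on CPU. Data and grid are placeholders.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=0)

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.03, 0.1],
}

search = GridSearchCV(XGBRegressor(tree_method="hist"), param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```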

If you want to do image analysis though, you do need deep learning for state-of-the-art. Ditto reinforcement learning. Ditto several other types of problems.

So, it depends.

1

Immarhinocerous t1_j3g6gf5 wrote

That's really interesting, thanks for the share. Though I wonder if most decision trees still wouldn't converge on the same solutions as a neural network, even if they're capable of representing the same solutions. If trees don't converge on the same solutions, and NNs outperform trees, that would mean NNs are still needed for training; the models could then be optimized for run time by basing a tree off the NN.
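A rough sketch of that last idea (my own, purely illustrative; dataset and model sizes are placeholders): train a small neural net, then fit a tree to mimic the net's predictions, a crude form of distillation.

```python
# Hedged sketch: fit a decision tree on a neural network's predicted labels
# (simple distillation). Dataset and model sizes are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X_train, y_train)

# Fit the tree on the network's predicted labels rather than the true labels.
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_train, nn.predict(X_train))

print("NN test accuracy:   ", nn.score(X_test, y_test))
print("Tree-mimic accuracy:", tree.score(X_test, y_test))
```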

2

ForceBru t1_j3gfgvo wrote

> Read up on boosting.

What could a good reading list look like? I read the original papers which introduced functional gradient descent (the theoretical underpinning of boosting), but I can't say they shed much light on these techniques for me.

Is there more recommended reading to study boosting? Anything more recent, maybe? Any textbook treatments?

1

Immarhinocerous t1_j3gkq83 wrote

Google is your friend here. ChatGPT may even give a decent response.

Start by learning bagging, then learn boosting.

I find the following site fairly good: https://machinelearningmastery.com/essence-of-boosting-ensembles-for-machine-learning

The explanations are usually approachable, and when they aren't, the author usually has another article on the same topic at a simpler level of detail. He has good code samples, and many of his articles go quite deep, so he caters to a broad range of audiences while sticking to one level of depth per article. I've even bought some of his materials and they were fairly good, but his free articles are plenty.

There are lots of other sites you can find that will teach you. Read a few different sources. They're worth understanding well. Since you stated you've read papers on gradient descent, you might find some helpful papers by searching scholar.google.com.

This is also a good place to start: https://www.ibm.com/topics/bagging
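To give a flavour of bagging (a minimal sketch of my own, assuming scikit-learn; the dataset is just a placeholder): compare a single decision tree with a bagged ensemble of the same tree.

```python
# Minimal bagging sketch: a single decision tree vs. a bagged ensemble of
# the same tree, evaluated with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print("single tree: ", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```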

1

Immarhinocerous t1_j3km9hy wrote

My go-to model (after a linear model) is usually XGBoost. It's nice to see that the theoretical potential of a tree-based model is as high as a neural network's. Though that's not necessarily true of XGBoost's boosted trees, they do perform well, and I really like how fast the training is.

1