Comments

CireNeikual t1_j3cmwtt wrote

My own work focuses on an alternative to deep learning called Sparse Predictive Hierarchies (SPH). It is implemented in a library called AOgmaNeo (Python bindings also exist). It does not use backpropagation and runs fully online/incremental/continual (non-i.i.d.). Its main advantages are the online learning, but also that it runs super fast. Recently, I was able to play Atari Pong (with learning enabled!) on a Teensy 4.1 microcontroller and still get 60 Hz.

If you would like to know more about it, here is a link to a presentation I gave a while back (Google Drive).

Other than my own work, I find the Tsetlin Machine interesting as well.

30

sidney_lumet OP t1_j3cp3xs wrote

Great insights, my G. I'll definitely look into your work, and I might contact you in the future if I want to contribute something to your idea.

7

ok531441 t1_j3cm7dl wrote

For some problems like tabular data or time series there are plenty of alternatives, some of them better than deep learning (for a specific application). But those are very different from the cutting edge vision/language models.

29

varukimm t1_j3cxdrz wrote

Are you referring to regressions?

3

brucebay t1_j3d4nzz wrote

The most commonly used (and most successful) models in the financial world are typically variations of regression and decision trees (with XGBoost being the leader). If you try to do anything else, the powers that be will hang you, cut off your head, and urinate on your grave, not necessarily in that order.

17

CactusOnFire t1_j3dhxfk wrote

I work in the financial world, and to elaborate on your comment:

Speaking strictly for myself, there are a few reasons I rarely use neural networks at my day job:

-Model explainability: Often stakeholders care about being able to explicitly label the "rules" which lead to a specific outcome. Neural Networks can perform well, but it is harder to communicate why the results happened than with simpler models.

-Negligible performance gains: While I am working on multi-million-row datasets, the number of features I am working with is often small. The performance I get from a tensorflow/pytorch model is nearly on par with an sklearn model. As a result, deep learning is overkill for most of my tasks.

-Developer Time & Speed: It is much quicker and easier to make an effective model in sklearn than it is in tensorflow/pytorch. This is another reason Neural Networks are not my default solution.

There are some out-of-the-box solutions available in tools like Sagemaker or Transformers. But even then, finding and implementing one of those is still going to take slightly longer than whipping up a random forest (see the sklearn sketch after this list).

-Legacy processes: There's a mentality of "if it ain't broke, don't fix it". Even though I am considered a subject matter expert in data science, the finance people don't like me tweaking the way things work without lengthy consultations.

As a result, I am often asked to 'recreate' instead of 'innovate'. That means replacing a linear regression with another linear regression that uses slightly different hyperparameters.

-Maintainability: There are significantly more vanilla software engineers than data scientists/ML Engineers at my company. In the event I bugger off, it's going to be easier for the next person to maintain my code if the models are simple/not Neural Networks.
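
To make the "whipping up a random forest" point concrete, here is a minimal sklearn sketch; the dataset is synthetic and the hyperparameters are arbitrary, so treat it as an illustration of the workflow rather than a recipe.

```python
# Minimal sketch of the "quick sklearn model" workflow described above.
# The dataset is synthetic; in practice this would be a tabular feature
# matrix pulled from a warehouse.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)

print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```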

54

brucebay t1_j3djt1n wrote

Yeah. This is a very nice summary of it.

4

harpooooooon t1_j3ck3ri wrote

An alternative would be a very large set of 'if' statements.

15

sidney_lumet OP t1_j3cl51k wrote

😂😅

3

tdgros t1_j3coj8r wrote

that's what random forests are...

15

currentscurrents t1_j3fop2j wrote

You can represent any neural network as a decision tree, and I believe you can represent any decision tree as a series of if statements...

But the interesting bit about neural networks is the training process, automatically creating that decision tree.
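
As a toy illustration of that point (feature names and thresholds are invented), a depth-2 decision tree really is just nested if statements:

```python
# A depth-2 decision tree written out as nested if statements.
# Feature names and thresholds are made up for illustration.
def tiny_tree(petal_length: float, petal_width: float) -> str:
    if petal_length <= 2.45:
        return "setosa"
    if petal_width <= 1.75:
        return "versicolor"
    return "virginica"

print(tiny_tree(1.4, 0.2))  # setosa
print(tiny_tree(5.1, 2.3))  # virginica
```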

10

Immarhinocerous t1_j3g6gf5 wrote

That's really interesting, thanks for sharing. Though I wonder whether decision tree learners actually converge on the same solutions as a neural network, even if they're capable of representing the same solutions. If trees don't converge on the same solutions, and NNs outperform trees, that would mean NNs are still needed for training; the model could then be optimized for runtime by distilling a tree from the NN (rough sketch below).
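
A minimal sketch of that distillation idea, assuming a generic sklearn setup (the dataset, network size, and tree depth are arbitrary):

```python
# Train a neural net, then fit a decision tree to imitate its
# predictions so inference can run on the simpler model.
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=5_000, noise=0.2, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
nn.fit(X, y)

# The tree is trained on the NN's outputs, not the original labels.
tree = DecisionTreeClassifier(max_depth=8, random_state=0)
tree.fit(X, nn.predict(X))

print("agreement with NN:", accuracy_score(nn.predict(X), tree.predict(X)))
```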

2

currentscurrents t1_j3jst6n wrote

I know there's a whole field of decision tree learning, but I'm not super up to date on it.

I assume neural networks are better or else we'd be using trees instead.

2

Immarhinocerous t1_j3km9hy wrote

My go-to model (after a linear model) is usually XGBoost. It's nice to see that the theoretical potential of a tree-based model is as high as a neural network's. Though that's not necessarily true of XGBoost's boosted trees, they do perform well, and I really like how fast they train.
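
For anyone who hasn't tried it, a minimal sketch using XGBoost's scikit-learn-style wrapper (synthetic data, arbitrary hyperparameters):

```python
# Quick XGBoost fit via its sklearn-compatible interface.
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=50_000, n_features=30, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("R^2 on held-out data:", model.score(X_test, y_test))
```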

1

IntelArtiGen t1_j3cmdvl wrote

Any alternative that would be able to solve the same problems would probably require a similar architecture: a lot of parameters, deep connections.

There are many alternatives to deep learning for some specific tasks. But I'm not sure that if something is able to outperform the current way we're doing deep learning on usual DL tasks, it will be something totally different (non-deep, few parameters, etc.).

The future of ML regarding tasks we do with deep learning is probably just another kind of deep learning. Perhaps without backpropagation, perhaps with a totally different way to do computations, but still deep and highly parametric.

7

sidney_lumet OP t1_j3covo8 wrote

Like the forward-forward algorithm Geoffrey Hinton proposed at NeurIPS 2022?

4

IntelArtiGen t1_j3cq3bi wrote

That's an innovative training algorithm on usual architectures. We could think of innovative training algorithms on innovative architectures.

7

jloverich t1_j3cpmoi wrote

Unfortunately this is basically a different type of layer-by-layer training, which doesn't perform better than end-to-end training in any case that I'm aware of. It also seems very similar to stacking, which can be done with any type of model.

6

yldedly t1_j3dn5mb wrote

> Any alternative that would be able to solve the same problems would probably require a similar architecture: a lot of parameters, deep connections.

If handwritten character recognition (and generation) counts as one such problem, then here is a model that solves it with a handful of parameters: https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf

2

IntelArtiGen t1_j3dpy8q wrote

Well it doesn't really count because you can also "solve" these tasks with SVM / RandomForests, etc. MNIST, OCR and other tasks with very small images are not great benchmarks anymore to compare a random algorithm with a deep learning algorithm.

I was more thinking of getting 90% top-1 on ImageNet, generating 512x512 images from text, or learning on billions of texts to answer questions. You either need tons of parameters to solve these or an unbelievable amount of compression. And even DL algorithms which do compression need a lot of parameters. You would need an even stronger way to compress information; perhaps it's possible, but it has yet to be invented.

1

yldedly t1_j3ds4h2 wrote

Imo there's no reason why we can't have much smaller models that do well on these tasks, but I admit it's just a hypothesis at this point. Specifically for images, an inverse graphics approach wouldn't require nearly as many parameters: http://sunw.csail.mit.edu/2015/papers/75_Kulkarni_SUNw.pdf

2

IntelArtiGen t1_j3dvbjr wrote

>Imo there's no reason why we can't have much smaller models

It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 states, you can't use one binary value 0/1; you need two bits: 00/01/10/11.
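
Written out, that counting argument is just:

```latex
% Minimum number of bits needed to distinguish N states
b_{\min} = \lceil \log_2 N \rceil, \qquad N = 4 \implies b_{\min} = \log_2 4 = 2.
```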

A large image of the real world contains a lot of information and details which can be hard to process and compress. We can compress it of course, that's what current DL algorithms and compression software do, but they have limits, otherwise they lose too much information.

Usual models are far from being perfectly optimized, but when you try to optimize them too much you can quickly lose accuracy. Under 1,000,000 parameters it's hard to have anything that could compete with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing the current limits.

2

yldedly t1_j3dwdv6 wrote

I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).

2

IntelArtiGen t1_j3dyhfy wrote

It's true that DL algorithms are often unoptimized on this point, because by default modelers don't really care about minimizing the number of parameters.

For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (and is harder to train). But when you try to further optimize algorithms that were already optimized for their number of parameters, you quickly hit these limits. You would need models even more efficient than these DL models, which are already optimized for parameter count.
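
If anyone wants to check the rough figures, here is a quick parameter-count sketch (assuming a recent torchvision; the printed counts will differ a bit from the rounded numbers above):

```python
# Count parameters without downloading pretrained weights.
from torchvision import models

def n_params(model) -> int:
    return sum(p.numel() for p in model.parameters())

print("ResNet-50:      ", n_params(models.resnet50(weights=None)))
print("EfficientNet-B0:", n_params(models.efficientnet_b0(weights=None)))
```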

A DL model could probably solve this handwriting problem with a very low number of parameters if you build it specifically with this goal in mind.

2

lightofaman t1_j3dccb2 wrote

PhD candidate in AI here. Gradient boosting is the real deal where tabular data is concerned (for both regression and classification in ML). However, thanks to the universal approximation theorem (UAT), neural nets are awesome approximators of really complex functions and are therefore the way to go for complex tasks, like the ones presented by scientific machine learning, for example. LeCun (not so) recently said that deep learning is dead and differentiable programming (another way to describe SciML) is the new kid on the block.

5

Groundbreaking-Air73 t1_j3fn7o1 wrote

For small tabular datasets I found gradient boosting methods (xgboost, lightgbm, catboost) to outperform typical deep learning architectures.

5

[deleted] t1_j3coep3 wrote

Random forests and SVMs can also provide significant improvements over simple models (e.g. logistic regression). However, for new problems/datasets, it's prudent to try many models.
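
A quick sketch of the "try many models" advice using cross-validation (synthetic data; the candidate list and settings are arbitrary):

```python
# Compare a few baselines with 5-fold cross-validation on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM (RBF kernel)": SVC(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```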

3

TeamRocketsSecretary t1_j3fd8kf wrote

Of course SVMs work better than simple models; they are pretty heavyweight solutions, involving the solution of a convex optimization problem.

0

rduke79 t1_j3dohr9 wrote

In an alternate universe, Numenta's Hierarchical Temporal Memory model has won the race against Deep Learning. Sometimes I wish I lived in that universe.

It's an ML paradigm deeply rooted in neuroscience. Things like HTM School on YouTube are just awesome. I believe that if the ML community had jumped on the HTM wagon at the time (and Transformers hadn't shown up) and had invested the same amount of effort into it as it has into DL, we would be at a similar point in development but with a better ML framework.

2

Competitive-Rub-1958 t1_j3etbs5 wrote

I think HTM was doomed to fail from the very start; even Hawkins has distanced himself from it. The problem is that HTM/TBT are all iterations toward a better model of the neocortex, and it will definitely take quite a bit of time to really unlock all the secrets of the brain.

I think Numenta quickly realized that the path forward is going to be even longer than they thought (they've been researching for ~20 years now, IIRC), so they want to cash out quickly: hence their whole new push towards "integrating DL" with their ideas (spoiler: it doesn't work well, check out their latest paper on that) and towards sparsifying LLMs, an area in which the Neural Magic folks already lead a huge part of the industry (see the recent paper showing LMs can be pruned in one shot).

That argument of "If we'd put X resources into Y thing, we'd have solved ASI by now!" is quite illogical and applicable to literally every field. In the end, Numenta's work simply did not yield the results that Hawkins et al. were hoping for. A lack of results is very tricky ground from which to attract the interest of other researchers. If HTM/TBT is to make a comeback, it will have to be on the shoulders of some strong emergent abilities in their architectures...

7

rduke79 t1_j3rdqx1 wrote

I agree. I just enjoyed diving into HTM at the time and learning cool stuff about the brain. It was a nice break from the math/engineering heavy DL concepts.

3

QuantumEffects t1_j3d73hq wrote

I think what we can expect in the future is combinations of deep learning and new, as-yet-unknown methods. Just as reinforcement learning made older AI concepts new again, I bet we will see a merge. One that I'm watching right now is IBM's neurosymbolic approach, which attempts to merge formal symbolic methods with deep learning techniques.

1

TensorDudee t1_j3d8rox wrote

Linear regression is the future, maybe? 😭

1

junetwentyfirst2020 t1_j3dhxqt wrote

That’s a pretty big hope that all data can be modeled linearly

0

Yo_Soy_Jalapeno t1_j3fdme4 wrote

Your data doesn't need to be linear to use a linear regression; the linearity is in the parameters. For example, Y = B0 + B1*X + B2*X^2 is still a linear regression: you can apply non-linear transformations to the X.
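
To make that concrete, a small sketch (synthetic data, made-up coefficients) fitting that quadratic with ordinary linear regression:

```python
# Fit y = b0 + b1*x + b2*x^2 with plain linear regression by
# expanding x into polynomial features first.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)

print("intercept:", model.intercept_)  # roughly 1.0
print("coefficients:", model.coef_)    # roughly [2.0, -0.5]
```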

5

junetwentyfirst2020 t1_j3ftprc wrote

Can you link to an example online or from a book? I need to understand this!!!

1

jpopsong t1_j3gilzi wrote

https://youtu.be/Hwj_9wMXDVo will explain why linear regression can fit non-linear functions. Just add some squared or cubed features, etc. That allows linear regression to approximate very complicated non-linear functions.

5

junetwentyfirst2020 t1_j3hlmnb wrote

Thank you!!!

1

Matthew2229 t1_j3dajdu wrote

Well of course. They just usually aren't as good on unstructured data (pictures, video, text, etc.). If they start performing better, then people will use them. Simple as that.

1

shanereid1 t1_j3dgdq0 wrote

Do you define transformers as Deep Learning? Cause if not then transformers.

1

junetwentyfirst2020 t1_j3dhtx6 wrote

Oh wow I never considered if I should see them as different. Transformers are a pretty big conceptual change, but they’re used in a deep way. 🤔 food for thought

2

currentscurrents t1_j3epeo7 wrote

Transformers are just deep learning with attention.

And attention is just another neural network telling the first one where to look.
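
To unpack "telling the first one where to look", here's a minimal scaled dot-product attention sketch in plain numpy (shapes and values are arbitrary; real transformers add learned projections, multiple heads, positional encodings, etc.):

```python
# Scaled dot-product attention: the weight matrix says how much each
# query position should "look at" each key position.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```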

4

junetwentyfirst2020 t1_j3ftm38 wrote

That makes sense. Convolutional neural networks were just deep learning with convolution.

1

CrysisAverted t1_j3f3n1p wrote

So an alternative to deep learning is tree-based methods, and gradient-boosted methods built on top of those trees: XGBoost etc. These aren't technically deep learning, but they have a ton in common.

1

Immarhinocerous t1_j3g5qgb wrote

What do you want to do with it?

For tabular data of a few million rows or less, you're often much better off using XGBoost or one of the other boosting libraries. Read up on boosting; it is an alternative approach to deep learning. Technically it can also be used with neural networks, including deep learning, but in practice it rarely is, because boosting relies on combining many weak learners, whereas deep learning spends a long training time building one strong learner.
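
The weak-learner idea in a nutshell (a toy sketch, not how XGBoost is actually implemented): each shallow tree is fit to the residuals of the ensemble built so far.

```python
# Toy gradient boosting for squared loss: shallow trees fit residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1_000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=1_000)

learning_rate = 0.1
prediction = np.zeros_like(y)
trees = []

for _ in range(100):
    residuals = y - prediction                     # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # nudge the ensemble toward y
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```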

XGBoost and CatBoost have won many many Kaggle competitions. My former employer trained all their production models using XGBoost, and they were modeling people's credit scores. There are many reasons to use XGBoost including speed of training (much faster to train than deep neural networks) and interpretability (easier to interpret the model's decision-making process with XGBoost, because under the hood it's just decision trees).

I mostly use XGBoost, and sometimes fairly simple LSTMs. I use them primarily for financial modeling. XGBoost works well and the fast training times let me do optimization across a wide range of model parameters, without spending a bunch of money on GPUs.

If you want to do image analysis though, you do need deep learning for state-of-the-art. Ditto reinforcement learning. Ditto several other types of problems.

So, it depends.

1

ForceBru t1_j3gfgvo wrote

> Read up on boosting.

What could a good reading list look like? I read the original papers which introduced functional gradient descent (the theoretical underpinning of boosting), but I can't say they shed much light on these techniques for me.

Is there more recommended reading to study boosting? Anything more recent, maybe? Any textbook treatments?

1

Immarhinocerous t1_j3gkq83 wrote

Google is your friend here. ChatGPT may even give a decent response.

Start by learning bagging, then learn boosting.

I find the following site fairly good: https://machinelearningmastery.com/essence-of-boosting-ensembles-for-machine-learning .

The explanations are usually approachable, and the author often has another article on the same topic at a simpler level of detail. He has good code samples, and many of his articles go quite in depth, so he caters to a broad range of audiences and is good at sticking to a consistent level of depth within an article. I've even bought some of his materials and they were fairly good, but his free articles are plenty.

There are lots of other sites you can find that will teach you. Read a few different sources. They're worth understanding well. Since you stated you've read papers on gradient descent, you might find some helpful papers by searching scholar.google.com.

This is also a good place to start: https://www.ibm.com/topics/bagging
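
If it helps to see the two side by side, here is a compact sklearn comparison (synthetic data, recent scikit-learn assumed): bagging averages independently trained trees, while boosting adds trees sequentially.

```python
# Bagging vs. boosting on the same synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

models = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```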

1