Comments


harpooooooon t1_j3ck3ri wrote

An alternative would be a very large set of 'if' statements

15

ok531441 t1_j3cm7dl wrote

For some problems, like tabular data or time series, there are plenty of alternatives, some of them better than deep learning (for a specific application). But those are very different from the cutting-edge vision/language models.

29

IntelArtiGen t1_j3cmdvl wrote

Any alternative which would be able to solve the same problems would probably require a similar architecture: a lot of parameters, deep connections.

There are many alternatives to deep learning on some specific tasks. But I'm not sure that whatever ends up outperforming the current way we do deep learning on the usual DL tasks will be something totally different (non-deep, few parameters, etc.)

The future of ML regarding tasks we do with deep learning is probably just another kind of deep learning. Perhaps without backpropagation, perhaps with a totally different way to do computations, but still deep and highly parametric.

7

CireNeikual t1_j3cmwtt wrote

My own work focuses on an alternative to deep learning, called Sparse Predictive Hierarchies (SPH). It is implemented in a library called AOgmaNeo (Python bindings also exist). It does not use backpropagation and runs fully online/incrementally/continually (non-i.i.d.). Its main advantages are the online learning, but also that it runs super fast. Recently, I was able to play Atari Pong (with learning enabled!) on a Teensy 4.1 microcontroller and still get 60 Hz.

If you would like to know more about it, here is a link to a presentation I gave a while back (Google Drive).

Other than my own work, I find the Tsetlin Machine interesting as well.

30

[deleted] t1_j3coep3 wrote

Random forests and SVMs can also provide significant improvements over simple models (e.g. logistic regression). However, for new problems/datasets, it's prudent to try many models.
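For instance, a minimal sketch of that "try many models" step with scikit-learn (the dataset and hyperparameters here are just placeholders):

```python
# Hedged sketch: compare a few classic models on a new dataset with
# cross-validation before reaching for deep learning. Dataset and
# hyperparameters are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```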

3

jloverich t1_j3cpmoi wrote

Unfortunately this is basically a different type of layer-by-layer training, which doesn't perform better than end-to-end training in any case that I'm aware of. It also seems very similar to stacking, which can be done with any type of model.

6

brucebay t1_j3d4nzz wrote

The most commonly used (and successful) models in the financial world are typically variations of regression and decision trees (with XGBoost being the leader). If you try to do anything else, the powers in place will hang you, cut your head, and urinate on your grave, not necessarily in that order.

17

QuantumEffects t1_j3d73hq wrote

I think what we can expect in the future is combinations of deep learning and new, as-of-yet-unknown methods. Just as reinforcement learning made older AI concepts new again, I bet we will see a merge. One that I'm watching right now is IBM's neurosymbolic approaches, which attempt to merge formal methods with deep learning techniques.

1

TensorDudee t1_j3d8rox wrote

Linear regression is the future, maybe? 😭

1

Matthew2229 t1_j3dajdu wrote

Well of course. They just usually aren't as good on unstructured data (pictures, video, text, etc.). If they start performing better, then people will use them. Simple as that.

1

lightofaman t1_j3dccb2 wrote

PhD candidate in AI here. Gradient boosting is the real deal where tabular data is concerned (for both regression and classification). However, thanks to the universal approximation theorem (UAT), neural nets are awesome approximators of really complex functions and are therefore the way to go for complex tasks, like the ones presented by scientific machine learning, for example. LeCun (not so) recently said that deep learning is dead and differentiable programming (another way to describe SciML) is the new kid on the block.
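As a toy illustration of that approximation point (my own sketch, not from any SciML paper; the target function and network size are arbitrary), a small MLP can fit a nonlinear function from samples:

```python
# Toy illustration of the universal-approximation idea: a small MLP fit to a
# noisy nonlinear function. Target function and sizes are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=2000)  # noisy target

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(X, y)

X_test = np.linspace(-3, 3, 9).reshape(-1, 1)
print(np.c_[np.sin(2 * X_test[:, 0]), mlp.predict(X_test)])  # true vs. approximated values
```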

5

shanereid1 t1_j3dgdq0 wrote

Do you define transformers as Deep Learning? Cause if not then transformers.

1

CactusOnFire t1_j3dhxfk wrote

I work in the financial world, and to elaborate on your comment:

Speaking strictly for myself, there are a few reasons I rarely use neural networks at my day job:

-Model explainability: Often stakeholders care about being able to explicitly label the "rules" which lead to a specific outcome. Neural networks can perform well, but it is harder to communicate why they produced a given result than with simpler models (see the sketch after this list).

-Negligible performance gains: While I am working on multi-million-row datasets, the number of features is often small. A tensorflow/pytorch model performs barely better than an sklearn model, so deep learning is overkill for most of my tasks.

-Developer Time & Speed: It is much quicker and easier to make an effective model in sklearn than it is in tensorflow/pytorch. This is another reason Neural Networks are not my default solution.

There are some out-of-the-box solutions available in tools like Sagemaker or Transformers. But even then, finding and implementing one of these is going to take longer than whipping up a random forest.

-Legacy processes: There's a mentality of "if it ain't broke, don't fix it". Even though I am considered a subject matter expert in data science, the finance people don't like me tweaking the way things work without lengthy consultations.

As a result, I am often asked to 'recreate' instead of 'innovate'. That means replacing a linear regression with another linear regression that uses slightly different hyperparameters.

-Maintainability: There are significantly more vanilla software engineers than data scientists/ML Engineers at my company. In the event I bugger off, it's going to be easier for the next person to maintain my code if the models are simple/not Neural Networks.
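To make the explainability point above concrete, here is a rough sketch of my own (not actual production code; dataset and tree depth are placeholders): a shallow decision tree's rules can be printed verbatim and handed to stakeholders, which has no clean equivalent for a neural network.

```python
# Hedged sketch for the explainability point: a shallow decision tree's
# decision rules can be exported as human-readable text.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))  # the "rules"
```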

54

rduke79 t1_j3dohr9 wrote

In an alternate universe, Numenta's Hierarchical Temporal Memory model has won the race against Deep Learning. Sometimes I wish I lived in that universe.

It's an ML paradigm deeply rooted in neuroscience. Things like HTM School on Youtube are just awesome. I believe that if the ML community had jumped on the HTM wagon at the time (and Transformers hadn't shown up) and had invested the same amount of effort into it as it did into DL, we would be at a similar point in development but with a better ML framework.

2

IntelArtiGen t1_j3dpy8q wrote

Well it doesn't really count because you can also "solve" these tasks with SVM / RandomForests, etc. MNIST, OCR and other tasks with very small images are not great benchmarks anymore to compare a random algorithm with a deep learning algorithm.

I was thinking more of getting 90% top-1 on ImageNet, or generating 512x512 images from text, or learning on billions of texts to answer questions. You either need tons of parameters to solve these or an unbelievable amount of compression. And even DL algorithms which do compression need a lot of parameters. You would need an even more powerful way to compress information; perhaps it's possible, but it has yet to be invented.

1

IntelArtiGen t1_j3dvbjr wrote

>Imo there's no reason why we can't have much smaller models

It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 states, you can't use a single binary value (0/1); you need two bits (00/01/10/11).
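The counting argument, in a couple of lines of Python (purely illustrative):

```python
# Minimum number of bits needed to distinguish n states is ceil(log2(n)).
import math

for n_states in (2, 4, 10, 1000):
    print(n_states, "states need at least", math.ceil(math.log2(n_states)), "bits")
```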

A large image of the real world contains a lot of information and detail which can be hard to process and compress. We can compress it of course, that's what current DL algorithms and compression software do, but they have limits, otherwise they lose too much information.

Usual models are far from being perfectly optimized, but when you try to optimize them too much you can quickly lose accuracy. Below 1,000,000 parameters it's hard to have anything that could compete with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing current limits.

2

yldedly t1_j3dwdv6 wrote

I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).

2

IntelArtiGen t1_j3dyhfy wrote

It's true that DL algorithms are unoptimized on this point by default, because modelers usually don't care much about optimizing the number of parameters.

For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (and is harder to train). But when you try to further optimize algorithms which were already optimized on their number of parameters, you quickly hit these limits. You would need models even more efficient than these DL models which are already optimized for their parameter count.
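Those parameter counts are easy to check; a minimal sketch with torchvision (assumes a recent torchvision release, and the exact numbers may differ slightly from the rounded figures above):

```python
# Hedged sketch: count parameters of the two architectures mentioned above.
# Assumes a recent torchvision where these constructors accept weights=None.
import torchvision.models as models

for name, ctor in [("resnet50", models.resnet50), ("efficientnet_b0", models.efficientnet_b0)]:
    model = ctor(weights=None)  # architecture only, no pretrained weights downloaded
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```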

A DL model could probably solve this handwriting problem with a very low number of parameters if you build it specifically with this goal in mind.

2

Competitive-Rub-1958 t1_j3etbs5 wrote

I think HTM was doomed to fail from the very start; even Hawkins has distanced himself from it. The problem is that HTM/TBT are all iterations toward a better model of the neocortex, and it will definitely take quite a bit of time to really unlock all the secrets of the brain.

I think Numenta quickly realized that the path forward is going to be even longer than they thought (they've been researching ~20 years now IIRC), so they want to cash out quickly: hence the whole new push towards "integrating DL" with their ideas (spoiler: it doesn't work well, check out their latest paper on that) and towards sparsifying LLMs, an area where the NeuralMagic folks already lead much of the industry (see the recent paper showing LMs can be pruned in one shot).

The argument of "If we'd put X resources into Y thing, we'd have solved ASI by now!" is quite illogical and applicable to literally every field. In the end, Numenta's work simply did not yield the results that Hawkins et al. were hoping for, and a lack of results makes it very hard to attract the interest of other researchers. If HTM/TBT is to make a comeback, it will have to be on the back of some strong emergent abilities in those architectures...

7

CrysisAverted t1_j3f3n1p wrote

So an alternative to deep learning is tree-based methods and gradient-boosted ensembles built on top of those trees (XGBoost, etc.). These aren't technically deep learning, but they have a lot in common with it.

1

Groundbreaking-Air73 t1_j3fn7o1 wrote

For small tabular datasets I found gradient boosting methods (xgboost, lightgbm, catboost) to outperform typical deep learning architectures.

5

Immarhinocerous t1_j3g5qgb wrote

What do you want to do with it?

For tabular data of a few million rows or less, you're often much better off using XGBoost or one of the other boosting libraries. Read up on boosting; it is an alternative approach to deep learning. Technically it can also be used with neural networks, including deep learning, but in practice it rarely is, because boosting relies on many weak learners, whereas deep learning spends long training times building one strong learner.

XGBoost and CatBoost have won many, many Kaggle competitions. My former employer trained all their production models using XGBoost, and they were modeling people's credit scores. There are many reasons to use XGBoost, including speed of training (much faster to train than deep neural networks) and interpretability (it's easier to interpret the model's decision-making process, because under the hood it's just decision trees).

I mostly use XGBoost, and sometimes fairly simple LSTMs. I use them primarily for financial modeling. XGBoost works well and the fast training times let me do optimization across a wide range of model parameters, without spending a bunch of money on GPUs.
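A hedged sketch of that workflow (generic synthetic data and an illustrative parameter grid, nothing from my actual financial models):

```python
# Hedged sketch: fast XGBoost training makes a broad hyperparameter search
# cheap on CPU. Data and grid are placeholders.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=0)

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.03, 0.1],
}

search = GridSearchCV(XGBRegressor(tree_method="hist"), param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```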

If you want to do image analysis though, you do need deep learning for state-of-the-art. Ditto reinforcement learning. Ditto several other types of problems.

So, it depends.

1

Immarhinocerous t1_j3g6gf5 wrote

That's really interesting, thanks for the share. Though I wonder if most decision trees still wouldn't converge on the same solutions as a neural network, even if they're capable of representing the same solutions. If trees don't converge on the same solutions, and NNs outperform trees, that would mean NNs are still needed for training; the models could then be optimized for run time by basing a tree off the NN.
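A rough sketch of that last idea (my own, purely illustrative; dataset and model sizes are placeholders): train a small neural net, then fit a tree to mimic the net's predictions, a crude form of distillation.

```python
# Hedged sketch: fit a decision tree on a neural network's predicted labels
# (simple distillation). Dataset and model sizes are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X_train, y_train)

# Fit the tree on the network's predicted labels rather than the true labels.
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_train, nn.predict(X_train))

print("NN test accuracy:   ", nn.score(X_test, y_test))
print("Tree-mimic accuracy:", tree.score(X_test, y_test))
```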

2

ForceBru t1_j3gfgvo wrote

> Read up on boosting.

What could a good reading list look like? I read the original papers which introduced functional gradient descent (the theoretical underpinning of boosting), but I can't say they shed much light on these techniques for me.

Is there more recommended reading to study boosting? Anything more recent, maybe? Any textbook treatments?

1

Immarhinocerous t1_j3gkq83 wrote

Google is your friend here. ChatGPT may even give a decent response.

Start by learning bagging, then learn boosting.

I find the following site fairly good: https://machinelearningmastery.com/essence-of-boosting-ensembles-for-machine-learning

The explanations are usually approachable, and when they aren't, the author usually has another article on the same topic at a simpler level of detail. He has good code samples, and many of his articles go quite deep, so he caters to a broad range of audiences while sticking to one level of depth per article. I've even bought some of his materials and they were fairly good, but his free articles are plenty.

There are lots of other sites you can find that will teach you. Read a few different sources. They're worth understanding well. Since you stated you've read papers on gradient descent, you might find some helpful papers by searching scholar.google.com.

This is also a good place to start: https://www.ibm.com/topics/bagging
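To give a flavour of bagging (a minimal sketch of my own, assuming scikit-learn; the dataset is just a placeholder): compare a single decision tree with a bagged ensemble of the same tree.

```python
# Minimal bagging sketch: a single decision tree vs. a bagged ensemble of
# the same tree, evaluated with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print("single tree: ", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```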

1

Immarhinocerous t1_j3km9hy wrote

My go-to model (after a linear model) is usually XGBoost. It's nice to see that the theoretical potential of a tree-based model is as high as a neural network's. Though that's not necessarily true of XGBoost's boosted trees, they do perform well, and I really like how fast the training is.

1