suflaj
suflaj t1_jadb19p wrote
Reply to comment by catesnake in Elon plans to build an OpenAI competitor by pyactee
That's simply not true. Musk is not an open source or public domain advocate. He is a profit-driven entrepreneur.
His current ambition to create an OpenAI alternative is not about open source or the public domain; it's to counter the leftist agenda (coincidentally also racist and sexist) that OpenAI has been pushing with its latest products. He has proclaimed multiple times in the past that he is politically neutral, apolitical/anti-establishment, and has expressed views that could be read as conservative.
Obviously, he counts on it being profitable, as OpenAI has demonstrated. The only question is: how does he do this without entering a conflict of interest with Tesla, which is mostly an AI company itself?
It's not much different from the reason he gave for buying Twitter (other than being forced to), which was countering Twitter's control of the narrative, one that stifles conservative views and promotes liberal ones.
suflaj t1_jac0rfd wrote
Reply to comment by MyHomeworkAteMyDog in Elon plans to build an OpenAI competitor by pyactee
Yes, but he left because it was a conflict of interest for Tesla (the article says so as well). I would assume he intends to create a Tesla subsidiary now, or to get rid of his Tesla stock altogether (which would likely kill the company at this point, so it's unlikely).
suflaj t1_j9sxn6g wrote
Reply to comment by Dropkickmurph512 in Why bigger transformer models are better learners? by begooboi
You say it doesn't help, yet double descent says otherwise. You do not early stop transformer models the way you do with other models, outside of maybe finetuning on a similar task.
But pretraining - no way. Big transformers are trained by setting some hyperparameters and then checking on the run the next day. If the model learned something, you keep doing that; if it diverged, you load the last good checkpoint, change the hyperparameters and train with those.
Early stopping would imply that you're confident your hyperparameters are good and that you have a general idea of how long training will take and how much the model can learn. For big transformers, neither is the case.
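To make that loop concrete, here is a minimal sketch of the overnight routine, assuming hypothetical `build_model`, `make_optimizer` and `training_step` helpers standing in for whatever framework you use - this is not anyone's actual training code, just the roll-back-on-divergence idea:

```python
import torch

CKPT = "last_good.pt"

def run_overnight(hparams, steps=10_000):
    model = build_model(hparams)          # hypothetical model factory
    opt = make_optimizer(model, hparams)  # hypothetical optimizer factory
    try:
        state = torch.load(CKPT)          # resume from the last good checkpoint
        model.load_state_dict(state["model"])
        opt.load_state_dict(state["opt"])
    except FileNotFoundError:
        pass                              # first run, start fresh

    for _ in range(steps):
        loss = training_step(model, opt)  # hypothetical single optimization step
        if not torch.isfinite(loss):      # diverged: stop, retune hyperparameters,
            return "diverged"             # and rerun from the last good checkpoint
    torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, CKPT)
    return "keep going"
```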
suflaj t1_j9swm0z wrote
Reply to comment by junetwentyfirst2020 in Why bigger transformer models are better learners? by begooboi
I'm not sure what you mean. I'm using the usual definition of noise.
suflaj t1_j9qectx wrote
Reply to comment by levand in Why bigger transformer models are better learners? by begooboi
That is peanuts for the datasets they are trained on. We're talking datasets on the order of terabytes, and the model usually doesn't even iterate over more than 10% of that. So you can't really overfit a model unless you're dealing with duplicates, because you will never even go through the whole dataset.
Even if the model had 1 trillion parameters and iterated over the whole dataset, it would be too small for the number of relations contained within a dataset of 1 trillion+ bytes. AND THAT'S IF THEY WERE LINEAR, which they are (usually) NOT.
So there is a large overhead in needing multiple sets of parameters to define just one type of relation. Not to mention that some of these models are trained on data pairs, which means the SQUARE of that number of relations. We're talking about a physically impossible number of parameters here, which will require solutions radically different from simple matrix multiplication and nonlinear activations.
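As a rough back-of-envelope illustration of that mismatch (orders of magnitude only, with made-up round numbers):

```python
params = 1e12            # a hypothetical 1-trillion-parameter model
tokens = 1e12            # a dataset on the order of a terabyte of text
pairwise = tokens ** 2   # relations if items are considered in pairs
print(f"{pairwise / params:.0e} pairwise relations per parameter")  # ~1e+12
```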
suflaj t1_j9puhj0 wrote
Reply to comment by levand in Why bigger transformer models are better learners? by begooboi
Overfit on what? These models are too small to truly overfit on their datasets.
They overfit on noise, which seems to be one reason for their good performance, so it's actually something you want. Learning what the noise is helps generalization: once the model starts figuring out the noise, it can go beyond what the data would usually allow in terms of generalization.
EDIT: Also, a larger model is more easily internally restructured. Overparametrized models are sort of like very big rooms. It's easier to rearrange the same furniture in a larger room, than it is in a smaller room.
suflaj t1_j9khr8s wrote
Reply to comment by shawon-ashraf-93 in Bummer: nVidia stopping support for multi-gpu peer to peer access with 4090s by mosalreddit
The burden of proof is on you, since you initially claimed that there would be benefits.
suflaj t1_j9k3bqn wrote
Reply to comment by shawon-ashraf-93 in Bummer: nVidia stopping support for multi-gpu peer to peer access with 4090s by mosalreddit
The 300 GB/s, which was NVLink's theoretical upper limit in a MULTI-GPU workload, did not translate into a significant difference in benchmarks. Please do not gaslight people into believing it did.
suflaj t1_j9jwfc9 wrote
Reply to comment by shawon-ashraf-93 in Bummer: nVidia stopping support for multi-gpu peer to peer access with 4090s by mosalreddit
OK, and now an actual argument?
Or are you one of those people who unironically believe NVLink enabled memory pooling or things like that?
suflaj t1_j9jjetb wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Because it requires the least amount of human intervention.
Also because it subjectively sounds like magic to people who don't really understand it, so it both sells to management and to consumers.
At the very least, it's easier for humans to cope by calling it magic than to accept that a lot of what AI can do is just stuff that is trivial and doesn't require humanity to solve.
suflaj t1_j9iwxuj wrote
You would have to limit the power to 250W. It will overheat without an open case. PCI-E 3 x8 means you are cutting the card's bandwidth in half.
Overall a terrible idea.
suflaj t1_j9iwkp8 wrote
Reply to comment by buzzz_buzzz_buzzz in Bummer: nVidia stopping support for multi-gpu peer to peer access with 4090s by mosalreddit
They removed NVLink to cut costs on a feature that is supported by only a handful of games in existence and is otherwise useless.
suflaj t1_j91rcef wrote
Reply to comment by [deleted] in [D] Please stop by [deleted]
I mean, the solution to this is already in use: if you want to stop the flood of similar threads, you just create a pinned megathread.
suflaj t1_j919wen wrote
Reply to comment by dojoteef in [D] Please stop by [deleted]
This is also a pretty low-quality post. Although the gist of it makes sense,
> and no one with working brain will design an ai that is self aware
made the author lose pretty much any credibility. Followed by
> use common sense
makes me think OP is actually being hypocritical. For some people, the common sense view IS that ChatGPT is sentient.
Whether you design a self-aware AI is not entirely within your control, and self-awareness is not really well-defined in the first place. The only reason we do not call ChatGPT self-aware at this point is the AI effect; otherwise we would need to invent new prerequisites. The discussion of whether it is sentient, and why or why not, is an interesting topic regardless of your level of expertise - but we could create a pinned thread for that, similarly to how we have Simple Questions for the exact same purpose of preventing flooding.
Be that as it may, I do not believe mods should act aggressively on posts like this one and that one. ML has not been an exact science for a long time now. Downvote and move on; that's the only thing a redditor does anyway, and the only way you can abide by rule 1, since the alternative is excluding laymen. Ironically, if we did that, OP, as a layman himself, would be excluded.
suflaj t1_j8xv8md wrote
Reply to comment by zcwang0702 in How likely is ChatGPT to be weaponized as an information pollution tool? What are the possible implementation paths? How to prevent possible attacks? by zcwang0702
Maybe that's just a necessary step to cull those who cannot think critically and to force society to stop taking things for granted.
At the end of the day, misinformation like this was already being generated and shared before ChatGPT was released, yet the governmental response seems to have been to allow domestic sources and try to hunt down foreign ones. So if the government doesn't care, why should you?
People generally don't seem to care even when the misinformation is human-generated, ex. the Hunter Biden laptop story. I wouldn't lose sleep over this either way; it is still human-operated.
suflaj t1_j8xor46 wrote
Reply to comment by SleekEagle in How likely is ChatGPT to be weaponized as an information pollution tool? What are the possible implementation paths? How to prevent possible attacks? by zcwang0702
A detector doesn't matter unless it is roughly the same size or larger. So either ChatGPT is REALLY sparse, or such detection models won't be available to mortals. So far, it doesn't seem to be sparse, since similarly sized detectors can't reliably differentiate between its output and human text.
suflaj t1_j8vt849 wrote
Reply to How likely is ChatGPT to be weaponized as an information pollution tool? What are the possible implementation paths? How to prevent possible attacks? by zcwang0702
Likely
The possible implementation path is just using it, lol, it is already capable of doing that.
At this moment, your two bets are:
- OpenAI watermarks their generated text and you have models which can detect this watermark (see the sketch after this list)
- a bigger, better model comes out which can detect synthetic text (although then THAT model becomes the problem)
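For the watermarking route, here is a minimal sketch of how a keyed statistical watermark could be detected, assuming the generator biases sampling toward a pseudorandom "green list" keyed on the previous token - this is the idea from published watermarking papers, not OpenAI's actual (unpublished) scheme, and the key would obviously have to be shared by, or the check run by, whoever generates the text:

```python
import hashlib
import math

def watermark_score(tokens, key="shared-secret", gamma=0.5):
    """Fraction of tokens falling in the keyed 'green list', plus a z-score."""
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        h = hashlib.sha256(f"{key}:{prev}:{cur}".encode()).digest()
        if int.from_bytes(h[:4], "big") / 2**32 < gamma:  # `cur` is a "green" token
            hits += 1
    n = len(tokens) - 1
    z = (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
    return hits / n, z  # a large z-score suggests watermarked, i.e. generated, text
```

Human text lands near the `gamma` baseline, while a generator that preferentially samples green tokens pushes the fraction, and therefore the z-score, up.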
You could also counter misinformation with a fact checking model, but there are two big problems:
- we are nowhere near developing useful AI that can reason
- the truth is subjective and full of dogmas, ex. look at how most countries implement dogmas regarding the Holocaust - your model would, without a severe transformation of society itself, be biased and capable of spreading propaganda in a general sense, with misinformation as a subset of that propaganda
Therefore I believe your question should be: when can we expect to have models that only share the "truth of the victor". And that's already happening with ChatGPT now, as it seems to be spreading western liberal views.
suflaj t1_j8qxasd wrote
Reply to comment by mems_m in [P] Struggling with thesis idea and implementation by mems_m
That's more of an issue with how you're searching. You mention sentiment analysis, for example, but that is a problem that has been considered solved for years. There is no novelty you could add here besides a bigger model.
Obviously, you need to stop looking at what people have done and start looking at what, in their process of doing something, they didn't do or did poorly. One such thing is the tokenization of text. You can't tell me that's all figured out.
suflaj t1_j8qx0qv wrote
Reply to comment by mems_m in [P] Struggling with thesis idea and implementation by mems_m
As I've said, there is no reason you can't do something novel with that; you just can't redo what someone else has already done with it.
suflaj t1_j8qwt5d wrote
People usually create datasets when they work on something new. I don't know why you would think that just because a dataset exists you can't or even need to outperform anything.
suflaj t1_j890obw wrote
Reply to comment by mil24havoc in [D] Have their been any attempts to create a programming language specifically for machine learning? by throwaway957280
It's not that they are missing something, it's that they're too general purpose to be considered "specifically for Machine Learning", i.e. DSLs.
They're about as specifically for ML as Python, only Python is better at it because there's a bigger community and better support, meaning wider coverage of ML.
suflaj t1_j886fei wrote
Reply to comment by mil24havoc in [D] Have their been any attempts to create a programming language specifically for machine learning? by throwaway957280
None of those is strictly for ML. CUDA is a GPGPU platform that uses C to compile to a high-level GPU assembly, Julia is a parallel computing maths language, and Scala is your general purpose OOP/FP hybrid.
suflaj t1_j83p27m wrote
Reply to M1 MAX vs M2 MAX by markupdev
What does worth mean? The M1 GPU is roughly equivalent to a 2060. The M2 GPU is roughly equivalent to a 3060. Whether it's worth it for you depends on whether you want to pay that kind of money and endure the shortened lifespan of your device due to heat.
For me personally it wouldn't be worth it, since it's cheaper to buy a rig to ssh into and a trusty Lenovo laptop, both of which will last longer and perform better.
suflaj t1_j7yr906 wrote
Reply to comment by one_eyed_sphinx in what cpu and mother board is best for Dual RTX 4090? by one_eyed_sphinx
If GPU memory is the bottleneck, then there is nothing you can viably do about that. If your GPU can't load the memory faster, then you will need to get more rigs and GPUs if you want to speed up the loading in parallel.
Or you could try to quantize your models into something smaller that can fit in the memory, but then we're talking model surgery, not hardware.
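For the quantization route, a minimal sketch using PyTorch's post-training dynamic quantization - note that this particular API targets CPU inference, and `MyModel` here is a hypothetical stand-in for your trained model:

```python
import torch

model = MyModel().eval()  # hypothetical trained model
quantized = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},    # layer types whose weights are converted to int8
    dtype=torch.qint8,
)
torch.save(quantized.state_dict(), "model_int8.pt")  # roughly 4x smaller weights
```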
suflaj t1_jb91xiw wrote
Reply to Trying to figure out what GPU to buy... by GhostingProtocol
The 3060s are pretty gimped for DL. Currently, you either go for a 3090/4090, or used 1080Ti/2080Ti(s) if on a budget.