
suflaj t1_jadb19p wrote

That's simply not true. Musk is not an open source or public domain advocate. He is a profit-driven entrepreneur.

His current ambition to create an OpenAI alternative is not about open source or the public domain; it's to counter the leftist agenda (coincidentally also racist and sexist) that OpenAI has been pushing with their latest products. He has proclaimed multiple times in the past that he is politically neutral, antipolitical/anti-establishment, and has expressed views that could be read as conservative.

Obviously, he counts on it being profitable, as OpenAI has demonstrated. The only question is - how does he do this without entering a conflict of interest with Tesla, which is largely an AI company itself?

It's not much different from the reason he gave for buying Twitter (other than being forced to), which was countering Twitter's control of the narrative - one that, in his view, stifles conservative views and promotes liberal ones.

1

suflaj t1_jac0rfd wrote

Yes, but he left because it was a conflict of interest for Tesla (the article says so as well). I would assume he now intends to create a Tesla subsidiary or get rid of his Tesla stock altogether (which would likely kill the company at this point, so it's unlikely).

6

suflaj t1_j9sxn6g wrote

You say it doesn't help, yet double descent says otherwise. You do not early stop transformer models the way you do other models, outside of maybe finetuning on a similar task.

But for pretraining - no way. Big transformers are trained by setting some hyperparameters and then checking on the run the next day. If the model learned something, you keep going; if it diverged, you load the last good checkpoint, change the hyperparameters and train from there.

Early stopping would imply that you're confident your hyperparameters are good and that you have a general idea of how long training will take and how much the model can learn. For big transformers, neither is the case.
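
The checkpoint-and-retry loop described above can be sketched roughly like this. Everything here is a toy stand-in (the "model" is just a loss number, and the divergence rule is invented for illustration), not anyone's actual training code:

```python
import copy

def train_run(params, lr, steps=100):
    """Toy stand-in for a day of training: returns updated parameters.
    A too-high learning rate makes the run 'diverge' (loss grows)."""
    loss = params["loss"]
    for _ in range(steps):
        loss *= 1.02 if lr > 0.1 else 0.99  # diverge vs. slowly improve
    return dict(params, loss=loss)

state = {"loss": 10.0}
checkpoint = copy.deepcopy(state)  # last known-good checkpoint
lr = 0.5                           # deliberately bad initial hyperparameter

for day in range(5):
    candidate = train_run(state, lr)
    if candidate["loss"] > checkpoint["loss"]:  # the run diverged
        state = copy.deepcopy(checkpoint)       # roll back to the checkpoint
        lr /= 10                                # change the hyperparameters
    else:                                       # the model learned something
        state = candidate
        checkpoint = copy.deepcopy(state)       # new good checkpoint

print(round(state["loss"], 3))
```

The point is that there is no early-stopping criterion anywhere - just "did this run learn or diverge since the last checkpoint".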

0

suflaj t1_j9qectx wrote

That is peanuts for the datasets they are trained on. We're talking datasets on the order of terabytes, and the model usually doesn't even iterate over more than 10% of that. So you can't even overfit the model unless you're dealing with duplicates, because you will never go through the whole dataset.

Even if the model had 1 trillion parameters and iterated over the whole dataset, it would be too small for the number of relations contained within a dataset of 1 trillion+ bytes. AND THAT'S IF THEY WERE LINEAR, which they (usually) are NOT.

So there is a large overhead in needing multiple sets of parameters to define just one type of relation. Not to mention that some of these models are trained on data pairs, which means the SQUARE of that number of relations. We're talking about a physically impossible number of parameters here, which will require solutions radically different from simple matrix multiplication and nonlinear activations.
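
To put rough numbers on the "square" claim (the sizes here are illustrative, not measurements of any real dataset):

```python
# Back-of-the-envelope: training on pairs squares the number of relations.
def pair_count(n):
    """Number of distinct unordered pairs among n items: n*(n-1)/2."""
    return n * (n - 1) // 2

n_items = 10**7      # a hypothetical ten-million-item dataset
params = 10**12      # a hypothetical trillion-parameter model

pairs = pair_count(n_items)
print(pairs)           # ~5e13 pairwise relations
print(pairs > params)  # True - already dwarfs a trillion parameters
```

Even ten million items produce tens of trillions of pairs, before you account for any of the relations being nonlinear.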

5

suflaj t1_j9puhj0 wrote

Overfit on what? These models are too small to truly overfit on their datasets.

They overfit on noise, which seems to be one reason for their good performance - so it's actually something you want, not a bad thing. Learning what the noise is helps generalization: once the model starts figuring out the noise, it can go beyond what the data would usually allow in terms of generalization.

EDIT: Also, a larger model is more easily internally restructured. Overparametrized models are sort of like very big rooms - it's easier to rearrange the same furniture in a larger room than in a smaller one.

2

suflaj t1_j9jjetb wrote

Because it requires the least amount of human intervention.

Also because it subjectively sounds like magic to people who don't really understand it, so it both sells to management and to consumers.

At least it's easier for humans to cope by calling it magic than to accept that a lot of what AI can do is just stuff that is trivial and doesn't require humans to solve.

−12

suflaj t1_j91rcef wrote

Reply to comment by [deleted] in [D] Please stop by [deleted]

I mean, the solution to this is already being used, if you want to stop the flood of similar threads, then you just create a pinned megathread.

2

suflaj t1_j919wen wrote

Reply to comment by dojoteef in [D] Please stop by [deleted]

This is also a pretty low-quality post. Although the gist of it makes sense,

> and no one with working brain will design an ai that is self aware

made the author lose pretty much any credibility. Followed by

> use common sense

makes me think OP is actually being hypocritical. For some, the common sense IS that ChatGPT is sentient.


Whether you design a self-aware AI is not only out of one's control; self-awareness is not really well-defined in the first place. The only reason we do not call ChatGPT self-aware at this point is the AI effect - otherwise we would need to invent new prerequisites. The discussion of whether it is sentient, and why or why not, is an interesting topic regardless of your level of expertise - but we could create a pinned thread for that, similar to how we have Simple Questions for the exact same purpose of preventing flooding.

Be that as it may, I do not believe mods should act aggressively on posts like this one and that one. ML has not been an exact science for a long time now. Downvote and move on - that's the only thing a redditor does anyway, and the only way you can abide by rule 1, since the alternative is excluding laymen. Ironically, if we did that, OP, as a layman himself, would be excluded.

9

suflaj t1_j8xv8md wrote

Maybe that's just a necessary step to cull those who cannot think critically and to force society to stop taking things for granted.

At the end of the day, misinformation like this was already being generated and shared before ChatGPT was released, yet the governmental response seems to have been to allow domestic sources and try to hunt down foreign ones. So if the government doesn't care, why should you?

People generally don't seem to care even when the misinformation is human-generated, e.g. the Hunter Biden laptop story. I wouldn't lose sleep over this either way; it is still human-operated.

2

suflaj t1_j8xor46 wrote

It doesn't matter if the detector isn't roughly the same size or larger. Either ChatGPT is REALLY sparse, or such detection models won't be available to mortals. So far, it doesn't seem to be sparse, since similarly sized detectors can't reliably differentiate between its output and human text.

1

suflaj t1_j8vt849 wrote

Likely

The possible implementation path is just using it, lol - it is already capable of doing that.

At this moment, your two bets are:

  • OpenAI watermarks their generated text and you have models which can detect this watermark
  • a bigger, better model comes out which can detect synthetic text (although then THAT model becomes the problem)
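
As a toy illustration of the watermark idea (this is a simplified sketch of the general approach from the watermarking literature, NOT OpenAI's actual scheme): the generator biases its sampling toward a pseudo-random "green list" of tokens keyed on context, and the detector counts how many tokens land on that list.

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]

def is_green(prev, tok):
    # Toy keyed hash: roughly half the vocab is "green" given the previous token
    return hashlib.sha256(f"{prev}|{tok}".encode()).digest()[0] % 2 == 0

def generate_watermarked(length, seed=0):
    """Generator that only ever samples tokens from the current green list."""
    rng = random.Random(seed)
    out = [rng.choice(VOCAB)]
    while len(out) < length:
        greens = [t for t in VOCAB if is_green(out[-1], t)]
        out.append(rng.choice(greens))
    return out

def green_fraction(tokens):
    """Detector statistic: fraction of tokens that land on the green list."""
    flags = [is_green(p, t) for p, t in zip(tokens, tokens[1:])]
    return sum(flags) / len(flags)

rng = random.Random(1)
watermarked = generate_watermarked(50)
unwatermarked = [rng.choice(VOCAB) for _ in range(50)]

print(green_fraction(watermarked))    # 1.0 by construction
print(green_fraction(unwatermarked))  # hovers around 0.5
```

Unwatermarked text lands on the green list about half the time by chance, so an improbably high green fraction flags the text as generated.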

You could also counter misinformation with a fact checking model, but there are two big problems:

  • we are nowhere near developing useful AI that can reason
  • the truth is subjective and full of dogmas - e.g. look at how most countries enforce dogmas regarding the Holocaust - so your model would, without a severe transformation of society itself, be biased and capable of spreading propaganda in a general sense, and misinformation as a subset of propaganda

Therefore I believe your question should be: when can we expect models that only share the "truth of the victor"? And that's already happening with ChatGPT now, as it seems to be spreading western liberal views.

3

suflaj t1_j8qxasd wrote

That's more an issue with how you're searching. You mention sentiment analysis, for example, but that's a problem that has been considered solved for years. There is no novelty you could add there besides a bigger model.

Obviously you need to stop looking at what people have done, and start looking at what, in the process of doing it, they didn't do or did poorly. One such thing is tokenization of text. You can't tell me that's all figured out.
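
As a concrete example of where tokenization is still rough (the vocabulary here is a made-up toy, not any real tokenizer's): a fixed subword vocabulary handles common words cleanly but shreds rarer ones into many fragments, wasting context length.

```python
def greedy_subword(word, vocab):
    """Greedy longest-match subword tokenization over a fixed vocab;
    characters with no match fall back to '<unk>'."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append("<unk>")  # no piece matched this character
            i += 1
    return pieces

# Toy vocab tuned for common English
vocab = {"learn", "ing", "token", "ization", "un",
         "a", "d", "e", "i", "n", "o", "r", "s", "t"}

print(greedy_subword("learning", vocab))     # ['learn', 'ing'] - clean split
print(greedy_subword("detokenizer", vocab))  # ['d', 'e', 'token', 'i', '<unk>', 'e', 'r']
```

A rare word costing 7 tokens instead of 1-2 is exactly the kind of unglamorous, unsolved overhead worth working on.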

5

suflaj t1_j890obw wrote

It's not that they are missing something; it's that they're too general-purpose to be considered "specifically for Machine Learning", i.e. DSLs.

They're about as "specifically for ML" as Python, only Python is better at it because there's a bigger community and better support, meaning wider coverage of ML.

1

suflaj t1_j83p27m wrote

What does "worth it" mean? The M1 GPU is roughly equivalent to a 2060. The M2 GPU is roughly equivalent to a 3060. Whether it's worth it for you depends on whether you want to pay that kind of money and endure the shortened lifespan of your devices due to heat.

For me personally it wouldn't be worth it, since it's cheaper to buy a rig to SSH into plus a trusty Lenovo laptop, both of which will last longer and perform better.

1

suflaj t1_j7yr906 wrote

If GPU memory is the bottleneck, then there is nothing you can viably do about that. If your GPU can't load memory any faster, you will need to get more rigs and GPUs to speed up the loading in parallel.

Or you could try to quantize your models into something smaller that fits in memory, but then we're talking model surgery, not hardware.
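
A minimal sketch of what "quantize into something smaller" means - symmetric int8 weight quantization in plain Python. Real deployments would use framework tooling (e.g. PyTorch's quantization APIs) rather than this, but the idea is the same:

```python
def quantize_int8(weights):
    """Symmetric linear quantization: float weights -> int8 values + a scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.52, -1.27, 0.003, 0.98]  # pretend these are model weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, at the cost of rounding error
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)               # [52, -127, 0, 98]
print(max_err < scale) # True - error bounded by one quantization step
```

Each weight now fits in one byte instead of four, which is exactly the kind of "model surgery" that trades a little accuracy for a model that fits in GPU memory.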

2