antonivs t1_je7ws1v wrote
Reply to comment by SlowThePath in [D] FOMO on the rapid pace of LLMs by 00001746
I was referring to what the OpenAI GPT models are trained on. For GPT-3, that involved about 45 TB of compressed text before filtering, much of it from Common Crawl, a multi-petabyte corpus built up from 8 years of web crawling.
On top of that, 16% of its corpus was books, totaling about 67 billion tokens.
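For reference, the 16% / 67 billion figure can be reconstructed from the dataset mix reported in the GPT-3 paper (token counts in billions and sampling weights below are taken from that paper; the dictionary layout is just for illustration):

```python
# GPT-3 training mix: dataset -> (tokens in billions, sampling weight).
# Weights sum to roughly 100% (the paper rounds each entry).
gpt3_mix = {
    "Common Crawl (filtered)": (410, 0.60),
    "WebText2":                (19,  0.22),
    "Books1":                  (12,  0.08),
    "Books2":                  (55,  0.08),
    "Wikipedia":               (3,   0.03),
}

# The two book corpora together: 12B + 55B = 67B tokens,
# sampled at 8% + 8% = 16% of the training mix.
book_tokens = sum(t for name, (t, _) in gpt3_mix.items() if name.startswith("Books"))
book_weight = sum(w for name, (_, w) in gpt3_mix.items() if name.startswith("Books"))

print(book_tokens)  # 67
print(book_weight)  # 0.16
```

Note the sampling weights don't track raw size: the books are oversampled relative to Common Crawl, which is why 67B tokens out of ~500B can account for 16% of what the model actually sees.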
antonivs t1_je1d8o0 wrote
Reply to comment by dancingnightly in [D] FOMO on the rapid pace of LLMs by 00001746
The training corpus size here is in the multi-TB range, so it probably isn't going to work with the OpenAI API currently, from what I understand.
antonivs t1_je1cuw1 wrote
Reply to comment by Craksy in [D] FOMO on the rapid pace of LLMs by 00001746
My description may have been misleading. They did the pretraining in this case. The training corpus wasn't natural language, it was a large set of executable definitions written in a company DSL, created by customers via a web UI.
antonivs t1_je0pfza wrote
Reply to comment by happycube in [D] FOMO on the rapid pace of LLMs by 00001746
Thanks! I actually don't know exactly what this guy used, I'll have to check.
antonivs t1_je0pb85 wrote
Reply to comment by Qpylon in [D] FOMO on the rapid pace of LLMs by 00001746
Our product involves a domain-specific language, which customers typically interact with via a web UI, to control execution behavior. The first model this guy trained generated that DSL, so customers could enter a natural language request instead of going through a multi-step GUI flow.
They've tried using it for docs too, that worked well.
antonivs t1_jdz6vai wrote
Reply to comment by abnormal_human in [D] FOMO on the rapid pace of LLMs by 00001746
Exactly what I was getting at, yes.
antonivs t1_jdyp1zw wrote
Reply to comment by rshah4 in [D] FOMO on the rapid pace of LLMs by 00001746
> I wouldn't get worried about training these models from scratch. Very few people are going to need those skills.
Not sure about that, unless you also mean that there are relatively few ML developers in general.
After the ChatGPT fuss began, one of our developers trained a GPT model on a couple of different subsets of our company's data, using one of the open source GPT packages, which is obviously behind GPT 3, 3.5, or 4. He got very good results though, to the point that we're working on productizing it. Not every model needs to be trained on an internet-sized corpus.
antonivs t1_jdvqdpc wrote
Reply to comment by Cool_Abbreviations_9 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
One thing I wonder about is how it arrives at those confidence scores. They're also presumably just the output of the language model, so why should they be correlated with the actual existence of the papers in question?
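To make that concrete: the closest thing the model has to a native "confidence" is the probability it assigned to the tokens it generated, which measures how plausible the text looked to the model, not whether the cited paper exists. A toy sketch (the per-token log-probabilities are invented values, purely for illustration):

```python
import math

def sequence_confidence(token_logprobs):
    """Joint probability of a generated sequence: exp of the sum of
    per-token log-probabilities. A high value means 'this text was
    likely under the model' -- a statement about fluency, not truth."""
    return math.exp(sum(token_logprobs))

# Hypothetical logprobs for a fluent but fabricated citation: every
# token is individually likely, so the joint probability is high.
fabricated = [-0.1, -0.2, -0.05, -0.3]

# Hypothetical logprobs for a true but awkwardly phrased statement.
true_but_clunky = [-1.5, -2.0, -1.0, -1.8]

# The fabrication scores as "more confident" than the true statement.
assert sequence_confidence(fabricated) > sequence_confidence(true_but_clunky)
```

So even if the verbalized scores correlate with the model's internal token probabilities, that only tells you the citation *looks* like citations the model has seen, which is exactly what a hallucinated reference does too.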
antonivs t1_j45u0dn wrote
Reply to comment by tiptoemicrobe in Rep. George Santos says he'll resign if 142,000 people ask him to by esporx
We need a lemon law for politicians.
antonivs t1_j1oaiwm wrote
Reply to comment by genius96 in NYPD pilot stripped of badge after wife's arrest for vaccine fraud by mowotlarx
And number of kills
antonivs t1_j0tkr0n wrote
Reply to comment by Captain-Barracuda in How do X-rays “compress” a nuclear fusion pellet? by i_owe_them13
As other comments have pointed out, you can’t.
It's important to note that the term "ignition" here is misleading. It's used to suggest that some physically important threshold has been reached, but that's not true.
"Ignition" in this context is simply an arbitrary name for a symbolic point on the reaction-efficiency chart. It has no physical meaning. No actual breakthrough in fusion physics has occurred, simply an improvement in efficiency.
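Concretely, the threshold in question is target gain greater than one, i.e. fusion energy out exceeding the laser energy delivered to the target. Using the widely reported December 2022 NIF figures (assumed here for illustration):

```latex
Q_{\text{target}} = \frac{E_{\text{fusion}}}{E_{\text{laser}}}
                  = \frac{3.15\ \text{MJ}}{2.05\ \text{MJ}} \approx 1.5 > 1
```

The roughly 300 MJ drawn from the grid to fire the lasers is not counted in that ratio, which is why crossing the line is an efficiency milestone rather than a new physical regime.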
antonivs t1_j0a7svs wrote
The first rule of talking about large language models is that, as a large language model, I can't talk about large language models.
antonivs t1_je82r3j wrote
Reply to comment by Craksy in [D] FOMO on the rapid pace of LLMs by 00001746
Well, I do need to be a bit vague. The main DSL has about 50 instructions corresponding to actions to be performed. There's also a separate sub-DSL, with about 25 instructions, that represents key features of the domain model and allows particular scenarios to be defined and then recognized during execution.
Both DSLs are almost entirely linear and declarative, so there's no nested structure, and the only control flow is a conditional branch instruction in the top-level DSL, to support conditional execution and looping. The UI essentially acts as a wizard, so that users don't have to deal with low-level detail.
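Purely as an illustration (every instruction name below is invented, not the actual DSL), a flat, linear instruction list whose only control flow is a conditional branch might be interpreted roughly like this:

```python
# Toy interpreter for a flat, linear DSL: a program is a list of
# (instruction, args) pairs executed top to bottom. The only control
# flow is BRANCH_IF, which jumps back or forward to a label when a
# condition holds -- enough to express both conditionals and loops.
def run(program, env):
    # Pre-scan for labels so branches can jump anywhere in the program.
    labels = {args[0]: i for i, (op, args) in enumerate(program) if op == "LABEL"}
    pc = 0
    while pc < len(program):
        op, args = program[pc]
        if op == "SET":                 # declarative: bind a value
            env[args[0]] = args[1]
        elif op == "ADD":               # stand-in for a domain action
            env[args[0]] = env.get(args[0], 0) + args[1]
        elif op == "BRANCH_IF":         # jump to label while var < limit
            var, limit, target = args
            if env.get(var, 0) < limit:
                pc = labels[target]
                continue
        # LABEL is a no-op at runtime.
        pc += 1
    return env

# A loop expressed with nothing but the branch instruction: count to 3.
result = run([
    ("SET", ("count", 0)),
    ("LABEL", ("top",)),
    ("ADD", ("count", 1)),
    ("BRANCH_IF", ("count", 3, "top")),
], {})
print(result["count"])  # 3
```

The point of the sketch is that a grammar this flat is a much easier generation target for a language model than a general-purpose language would be: there's no nesting to balance, and each line is independently valid.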
There are various ideas for the GPT model, including suggesting instructions when creating a program, self-healing when something breaks, and finally generating programs from scratch based on data that we happen to already collect anyway.
NLP will probably end up being part of it as well - for that, we'd probably use the fine-tuning approach with an existing language model as you suggested.