adt

adt t1_jbbzba8 wrote

There are a few that 'feel' that way. Try Megatron-11B (~200:1 tokens to parameters), which is based on RoBERTa (6,198:1). Wayyyyy ahead of its time, and I've matched it with much larger models in some testing.

https://app.inferkit.com/demo

Here's the full table of Chinchilla-aligned comparisons:

https://lifearchitect.ai/models-table/

2

adt t1_j9w6x17 wrote

Leave them be.

Listen to the experts.

Connor Leahy was the first to re-create the GPT-2 model back in 2019 (by hand; he knows the tech stack, and OpenAI lined up a meeting with him and told him to back off). He co-founded EleutherAI (open-source language models), helped with the GPT-J and GPT-NeoX-20B models, advised Aleph Alpha (Europe's biggest language model lab), and is now the CEO of Conjecture.

Dude knows what he's talking about, and is also very careful about his wording (see the NeoX-20B paper s6 pp11 treading carefully around the subject of Transformative AI).

And yet, in Nov/2020, he went on record saying:

>“I think GPT-3 is artificial general intelligence, AGI. I think GPT-3 is as intelligent as a human. And I think that it is probably more intelligent than a human in a restricted way… in many ways it is more purely intelligent than humans are. I think humans are approximating what GPT-3 is doing, not vice versa.”
— Connor Leahy, co-founder of EleutherAI, creator of GPT-J (November 2020)

42

adt t1_j9nv4zj wrote

>shouldn't the onus of delineating man from machine be on the side providing the AI chatbot?

It is.

Here's a very long read, but it will explain how OpenAI is building in watermarking for use by govt + themselves + maybe academia.

https://scottaaronson.blog/?p=6823

>'to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI. That won’t make any detectable difference to the end user, assuming the end user can’t distinguish the pseudorandom numbers from truly random ones. But now you can choose a pseudorandom function that secretly biases a certain score—a sum over a certain function g evaluated at each n-gram (sequence of n consecutive tokens), for some small n—which score you can also compute if you know the key for this pseudorandom function'

And why they wouldn't just stick it in a database of logs:

>'Some might wonder: if OpenAI controls the server, then why go to all the trouble to watermark? Why not just store all of GPT’s outputs in a giant database, and then consult the database later if you want to know whether something came from GPT? Well, the latter could be done, and might even have to be done in high-stakes cases involving law enforcement or whatever. But it would raise some serious privacy concerns: how do you reveal whether GPT did or didn’t generate a given candidate text, without potentially revealing how other people have been using GPT? The database approach also has difficulties in distinguishing text that GPT uniquely generated, from text that it generated simply because it has very high probability (e.g., a list of the first hundred prime numbers).'
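To make the first quote a bit more concrete, here's a toy sketch of keyed n-gram watermarking. It is not OpenAI's implementation. Everything specific is an assumption for illustration: HMAC-SHA256 as the keyed pseudorandom function, n = 3, the made-up vocabulary and stand-in `toy_model`, and the selection rule `r ** (1 / p)` with score `ln(1 / (1 - r))`, which is my reading of the linked post rather than anything stated in the quote.

```python
import hashlib
import hmac
import math
import random

N = 3  # n-gram length; assumed, the quote only says "some small n"
KEY = b"demo-secret-key"  # hypothetical secret key
VOCAB = [f"w{i}" for i in range(50)]  # made-up vocabulary


def g(key, ngram):
    """Keyed pseudorandom function: map an n-gram to a number strictly inside (0, 1)."""
    msg = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    k = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (k + 0.5) / 2**52


def pick_token(key, context, probs):
    """Pick the candidate maximizing r ** (1 / p), where r = g(key, n-gram ending in it)."""
    prefix = tuple(context[-(N - 1):])
    return max(probs, key=lambda tok: g(key, prefix + (tok,)) ** (1.0 / probs[tok]))


def score(key, tokens):
    """Mean of ln(1 / (1 - r)) over all n-grams; close to 1.0 for unmarked text."""
    grams = [tuple(tokens[i:i + N]) for i in range(len(tokens) - N + 1)]
    return sum(-math.log(1.0 - g(key, gram)) for gram in grams) / len(grams)


def toy_model(rng):
    """Hypothetical stand-in for a language model: 5 random candidates with probabilities."""
    candidates = rng.sample(VOCAB, 5)
    weights = [rng.uniform(0.1, 1.0) for _ in candidates]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(candidates, weights)}


def generate(key, length, seed, watermark):
    """Generate `length` tokens, with or without the keyed selection rule."""
    rng = random.Random(seed)
    tokens = ["<s>"] * (N - 1)  # padding so the first token has a full prefix
    for _ in range(length):
        probs = toy_model(rng)
        if watermark:
            tokens.append(pick_token(key, tokens, probs))
        else:
            choice = rng.choices(list(probs), weights=list(probs.values()))[0]
            tokens.append(choice)
    return tokens


marked = generate(KEY, 400, seed=1, watermark=True)
unmarked = generate(KEY, 400, seed=1, watermark=False)
print("watermarked score:", round(score(KEY, marked), 2))
print("unmarked score:   ", round(score(KEY, unmarked), 2))
print("wrong key score:  ", round(score(b"wrong-key", marked), 2))
```

The watermarked text should score clearly higher than the unmarked text, and only when scored with the right key. Without the key, the marked text looks like ordinary sampling.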

7

adt t1_j9neq5w wrote

There should be quite a few models smaller than 15M params. What's your use case? A lot of the 2022-2023 optimizations mean that you can squish models onto modern GPUs now (e.g. int8 quantization).

Designed to fit onto a standard GPU, DeepMind Gato was bigger than I thought, with a starting size of 79M params.

Have you found the BERT compression paper, which compresses the models to 7MB? It lists some 1.2M-6.2M param models:

https://arxiv.org/pdf/1909.11687.pdf

My table shows...

https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=1158069878

*looks at table*

Smallest seems to be Microsoft Pact, which was ~30M params. Ignore that! Transformers are supposed to be wide and deep, I suppose, so it makes sense...

Many of the text-to-image models use smaller LLMs.

Also check HF, they now have 130,000 models of different sizes (as of Feb/2023):

https://huggingface.co/models

Includes a tiny-gpt2: https://huggingface.co/sshleifer/tiny-gpt2

And t5-efficient tiny ('has 15.58 million parameters and thus requires ca. 62.32 MB of memory in full precision (fp32) or 31.16 MB of memory in half precision (fp16 or bf16).'):

https://huggingface.co/google/t5-efficient-tiny
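The memory figures in that model card quote are just parameter count times bytes per parameter. A quick sketch of the arithmetic, assuming 1 MB = 10^6 bytes and counting weights only (no activations or optimizer state); the function name and the int8 row are mine, not from the model card:

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_mb(n_params, dtype="fp32"):
    """Approximate weight storage in MB (1 MB = 10**6 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


T5_EFFICIENT_TINY = 15.58e6  # parameter count quoted on the model card

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_mb(T5_EFFICIENT_TINY, dtype):.2f} MB")
```

That reproduces the quoted 62.32 MB (fp32) and 31.16 MB (fp16), and gives roughly 15.58 MB if the weights were stored as int8.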

Edit: I thought of Anthropic's toy models, but they were not really LLMs. They did train a 10M model during scaling research (paper), but the model hasn't been released.

25

adt t1_j93hgk2 wrote

Not since 1/Jul/2022:

DeepMind Gato. In a Lex Fridman interview, DeepMind CEO Demis Hassabis revealed that the company is already training the next embodied generalist agent, ready for AGI. The original Gato was already an unforeseen innovation.

‘Gato predicts potentially any action or any token, and it’s just the beginning really, it’s our most general agent… that itself can be scaled up massively, more than we’ve done so far, obviously we’re in the middle of doing that.’

https://youtu.be/Gfr50f6ZBvo

via my Dec/2022 AI report:

https://lifearchitect.ai/the-sky-is-infinite/

23

adt t1_j5erdiz wrote

Already in the works (Scott Aaronson is a scientist with OpenAI):

>we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT.
>
>Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.

https://scottaaronson.blog/?p=6823

177

adt t1_j1y8tmg wrote

Interesting collab between Google and DeepMind.

While the project was mainly Google's, Dr Nenad Tomasev from DeepMind was also involved and cited in the paper:

>This project was an extensive collaboration between many teams at Google Research and Deepmind... We are also grateful to... Jeff Dean for [his] support during the course of this project.

Edit: my independent report on Pathways for interest.

16

adt t1_iuatudy wrote

Hmmm.... This seems dated.

The article is from 18/Sep/2022.

The actual report is by Europol, from some time in 2022.

The report is citing a book by EU-advisor Nina Schick, from 6/Aug/2020.

So, the original source was written well before the public release of GPT-3, and years before the release of current text-to-image (DALL-E 2, Midjourney, SD) and text-to-video capabilities.

I don't know what that means relative to the percentage quoted though!

31

adt t1_irutbar wrote

Just adding Dr Alan Turing's comment here, from his original 1950 paper on AI:

>…should we not believe that He [source, the universe, life] has freedom to confer a soul on an elephant if He sees fit? We might expect that He would only exercise this power in conjunction with a mutation which provided the elephant with an appropriately improved brain to minister to the needs of this sort.
>
>An argument of exactly similar form may be made for the case of machines. It may seem different because it is more difficult to “swallow.” But this really only means that we think it would be less likely that He would consider the circumstances suitable for conferring a soul. The circumstances in question are discussed in the rest of this paper. In attempting to construct such machines we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–460. https://doi.org/10.1093/mind/LIX.236.433

2

adt t1_iqpnfta wrote

DeepMind in particular also don't seem to be following the usual 6 to 9-month incubation and review process for their papers.

The Sparrow paper was published on 20/Sep/2022, and includes a sample conversation from 9/Sep/2022.

That's an 11-day turnaround (or 7-business-day turnaround) between research and publication!
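For anyone who wants to check the date math, here's a small sketch using only the standard library. The helper name is mine, and it ignores public holidays, so treat the business-day count as approximate:

```python
from datetime import date, timedelta


def business_days_between(start, end):
    """Count Mon-Fri days after `start`, up to and including `end` (no holidays)."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


sample_conversation = date(2022, 9, 9)
paper_published = date(2022, 9, 20)

calendar_days = (paper_published - sample_conversation).days
business_days = business_days_between(sample_conversation, paper_published)
print(calendar_days, business_days)
```

This gives 11 calendar days and 7 business days, matching the figures above.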

7

adt t1_iqpn61q wrote

This is pretty big news, and probably belongs in /r/mediasynthesis for relevance. X-post from my notes in /r/mlscaling:

>Uses Chinchilla 70B with massive prompt crafting.
>
>A collection of scripts co-written with this process were produced and staged at the Edmonton International Fringe Theatre Festival in August 2022. Reflections from the creative team are presented, as are comments from reviewers, as these represent critical reflections on human-machine co-creativity.

And a media review of the Dramatron-written play, The Man At The Bar:

>RFT has enlisted Dramatron, a bot brainchild of the research scientists at DeepMind, to write scripts for theatre, including locations, stage directions, characters, dialogue. And, OH NO!, Dramatron has actually delivered. It’s just that the bot script just stops part-way through (I mean the bot doesn’t get a Canada Council grant or anything). And it’s for the RFT cast to improvise what happens and how it all ends.
>
>Plays By Bots presents one Dramatron play per Fringe performance. Friday night’s script was The Man At The Bar, set in a dive bar called The Pool Pit with (as specified in the stage directions) a dirty floor and an atmosphere full of smoke and the smell of beer.
>
>At the outset the four-member human cast each got a sealed envelope with script and their role descriptions, and a bag of props and costume pieces. Teddy (Jacob Banigan) is “an orphan and gifted lounge singer,” Gerald (Michael Johnson) is “quite wealthy.” His wife Rosie (Tyra Banda) is “a regular.” Gordie Lucius in a fetching blond wig is Lolo the road-weary bartender.
>
>And if there’s a certain flatness in the dialogue, which runs to declarations, that in itself is amusing since it turned out to be perfectly suited to the deadpan comic talents of Friday night’s improvisers. Banigan, for example, knows exactly what to do with “I’m putting down a song. A special song. I’m gonna sing the song.” He returns, as instructed, to the mic to deliver lounge-y songs extempore (“this is a helluva town…”). Rosie declares “I have a new hat…. I look beautiful in it.” Gerald says to Teddy “I want my money…. I’ll sue you.”
>
>The surprising thing (surprising to me, anyhow) is that the whole Dramatron play does hang together and create a world. About half-way through, the alert human actors start improvising, from the groundwork of the first part. They run with the characters; they reprise particularly funny laugh lines. Things happen, but Lola keeps pouring the drinks, and the human actors continue to capture the playwright bot’s tone.
>
>It’s a genuinely funny entertainment. Uh-oh.

https://12thnight.ca/2022/08/13/oh-no-bots-have-invaded-theatre-and-they-can-do-it-plays-by-bots-a-fringe-review/

9