mgostIH

mgostIH t1_j46xnpc wrote

The real bitter lesson is how Stanford got so many authors cited for introducing nothing but a less descriptive name than "large models".

37

mgostIH t1_j02vbuy wrote

All the layers are trained independently and at the same time. You can use gradients, but you don't need backprop, since each layer has an explicit local objective: maximize ||Wx||^2 for good samples and minimize it for bad ones (each layer receives a normalized version of the previous layer's output).
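A minimal sketch of that per-layer scheme, assuming a PyTorch setup and a threshold-based softplus loss in the style of Hinton's Forward-Forward paper (the layer sizes, threshold, and learning rate below are illustrative assumptions, not from the comment):

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One locally trained layer: goodness = squared activity of its output."""

    def __init__(self, d_in, d_out, lr=0.03, threshold=2.0):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.opt = torch.optim.SGD(self.parameters(), lr=lr)
        self.threshold = threshold

    def forward(self, x):
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        # Push positive goodness above the threshold, negative goodness below it.
        loss = torch.log1p(torch.exp(torch.cat(
            [self.threshold - g_pos, g_neg - self.threshold]))).mean()
        self.opt.zero_grad()
        loss.backward()  # gradients stay local: inputs are detached, so
        self.opt.step()  # nothing propagates back through earlier layers
        # Pass a normalized, detached copy forward, so the next layer only
        # sees the direction of this layer's output, not its magnitude.
        with torch.no_grad():
            norm = lambda h: h / (h.norm(dim=1, keepdim=True) + 1e-8)
            return norm(self.forward(x_pos)), norm(self.forward(x_neg))

layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos = torch.randn(32, 784)  # stand-ins for real positive/negative data
x_neg = torch.randn(32, 784)
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```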

The issue I find with this (besides generating good contrastive examples) is that I don't understand how it would lead a big network to discover interesting structure: circuits require multiple layers to do something interesting, but here each layer greedily optimizes its own objective. In some sense we are hoping that the outputs of the earlier layers will orient things in a way that isn't too hard for the later layers, which have only linear dynamics.

1

mgostIH t1_iw4dks1 wrote

As a reviewer noted, the "zero-shot" part is a bit overclaimed, given that one of the models has to already be trained with these relative encodings. But the concept of the paper is an interesting phenomenon: it points to there being a "true layout" of concepts in latent space that different types of models end up discovering.
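As a rough illustration of what such a relative encoding could look like (the anchor-based cosine construction below follows the general idea, but is an assumption for illustration, not code from the paper): each sample is re-expressed by its similarities to a fixed set of anchor samples, so two encoders whose latent spaces differ only by an angle-preserving transform agree on the relative coordinates.

```python
import torch

def relative_encoding(z, anchors_z):
    """Re-express embeddings z as cosine similarities to anchor embeddings.

    If two encoders differ only by an angle-preserving transform of their
    latent space, these relative coordinates coincide, which is what would
    let a decoder trained on one encoder be reused with the other.
    """
    z = z / z.norm(dim=1, keepdim=True)
    a = anchors_z / anchors_z.norm(dim=1, keepdim=True)
    return z @ a.T  # shape: (n_samples, n_anchors)

# Illustrative check: a rotated copy of the same latent space yields the
# same relative encoding up to numerical error.
z = torch.randn(16, 64)
anchors = torch.randn(10, 64)
q, _ = torch.linalg.qr(torch.randn(64, 64))  # random orthogonal transform
assert torch.allclose(relative_encoding(z, anchors),
                      relative_encoding(z @ q, anchors @ q), atol=1e-5)
```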

30

mgostIH t1_iv1fosb wrote

If you looked at any of the articles instead of stopping at the title, you'd understand that "AI work can't be copyrighted" means that copyright can't be attributed to the artificial intelligence itself; all of these judgements allow any human who puts even minimal effort into the generation (for example, typing the prompt) to own the copyright for the image instead.

1