
Mason-B t1_ja61gpk wrote

Reply to comment by Cryptizard in So what should we do? by googoobah

Getting this in before you try to block me again and snipe in your responses.

> You say that current AI is the same as it was 15 years ago (I am using your exact language here)

No, my exact language was:

> > All the things that were imagined as possible with the tech 15 years ago are here today.

Also (edit),

> You said Moore's law has been slowing for decades and would be the main bottleneck for the future, I show you actual evidence that it has only very slightly started to slow since 2010 and somehow now that was your argument the whole time lol.

I said Moore's law in either incarnation would be a bottleneck, but it is also slowing (in a parenthetical, no less!). Over long periods of time the slowing trend becomes obvious.

> > it would take 70 years at a minimum if the computer industry managed to double at its current speed (which has been slowing down for decades; take that into account and it's closer to 90)

It's like you are trying to find enough nitpicks to justify quitting the argument over a technicality.
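The 70-to-90-year figure quoted above is simple doubling arithmetic. A back-of-envelope sketch, assuming (as the thread does later) a roughly 7-order-of-magnitude gap and a hardware doubling period somewhere around 3 years; both numbers are the thread's estimates, not measured figures:

```python
import math

def years_to_close(gap_orders_of_magnitude: float, years_per_doubling: float) -> float:
    """Years of repeated doubling needed to close a gap of N orders of magnitude."""
    # Each order of magnitude (10x) is log2(10) ~ 3.32 doublings.
    doublings = gap_orders_of_magnitude * math.log2(10)
    return doublings * years_per_doubling

print(round(years_to_close(7, 3.0)))  # ~70 years at a 3-year doubling period
print(round(years_to_close(7, 3.9)))  # ~91 years if doubling slows toward 4 years
```

The gap between the two outputs is the "take the slowdown into account and it's closer to 90" parenthetical.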


Cryptizard t1_ja61rh1 wrote

You said “there are no new theory breakthroughs in the last decade.”


Mason-B t1_ja624u2 wrote

I already admitted I rounded up there; the paper is six years old by publication date. I should have said half a decade.

Do you have a substantive counterpoint instead of nitpicks? (In something I wrote from memory in 30 minutes?)

Because I would very much enjoy being wrong on this.


Cryptizard t1_ja648ox wrote

Yes, like I said, everything you wrote is wrong. Moore's law still has a lot of time left on it. There are a lot of new advances in ML/AI. You ignore the fact that we have seen a repeated pattern where a gigantic model comes out that can do thing X, and then in the next 6-12 months someone else comes out with a compact model 20-50x smaller that can do the same thing. It happened with DALL-E/Stable Diffusion, it happened with GPT/Chinchilla, and it happened with LLaMA. This is an additional scaling factor that provides another source of advancement.

You ignore the fact that there are plenty of models that are not LLMs making progress on different tasks. Some, like Gato, are generalist AIs that can do hundreds of different complex tasks.

I can’t find any reference that we are 7 orders of magnitude away from the complexity of a brain. We have neural networks with more parameters than there are neurons in a brain. A lot more. Biological neurons encode more than an artificial neuron, but not a million times more.

The rate of published AI research is rising literally exponentially. Another factor that accelerates progress.

I don’t care what you have written about programming, the statistics say that it can write more than 50% of code that people write TODAY. It will only get better.


Mason-B t1_ja67eyi wrote

> You ignore the fact that we have seen a repeated pattern where a gigantic model comes out that can do thing X, and then in the next 6-12 months someone else comes out with a compact model 20-50x smaller that can do the same thing. It happened with DALL-E/Stable Diffusion

DALL-E to Stable Diffusion is 3.5 billion parameters to 900 million, which is about 4x, not 20-50x. And the cost of that was training set size: millions of source images versus billions. Again, we are pushing the boundaries of what is possible in ways that cannot be repeated. For three orders of magnitude more training data, we got a 4x reduction in model size (assuming no other improvements played a role in that). I don't think we'll be able to find 5 trillion worthwhile images to train on anytime soon.
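The ratio check is trivial to verify, using the parameter counts as cited in the thread (treat both as approximate):

```python
# Parameter counts as cited above (approximate, for illustration only).
dalle_params = 3.5e9  # DALL-E: ~3.5 billion parameters
sd_params = 0.9e9     # Stable Diffusion: ~900 million parameters

ratio = dalle_params / sd_params
print(round(ratio, 2))  # ~3.89 -- about 4x, well short of 20-50x
```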

But it is a good point that I missed it; I'll be sure to include it in my rant about "reasons we are hitting the limits of easy gains".

> You ignore the fact that there are plenty of models that are not LLMs making progress on different tasks. Some, like Gato, are generalist AIs that can do hundreds of different complex tasks.

If you read the paper, they discuss the glaring limitation I mentioned above: limited attention span and limited context length, with a single image taking up a significant fraction (~40%) of the entire model's context. That's the whole ball game. They also point out that the fundamental limit of their design is the known one: quadratic scaling to increase context. Same fundamental design issues here.
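The quadratic scaling in question is easy to see in a minimal sketch of self-attention: the score matrix alone is n-by-n in the number of tokens, so doubling the context quadruples that cost. (The `d_model=512` here is an arbitrary illustrative choice, not a figure from the paper.)

```python
import numpy as np

def attention_scores_size(n_tokens: int, d_model: int = 512) -> int:
    """Number of entries in the self-attention score matrix for a given context length."""
    q = np.zeros((n_tokens, d_model))  # queries
    k = np.zeros((n_tokens, d_model))  # keys
    scores = q @ k.T                   # shape (n_tokens, n_tokens): the quadratic term
    return scores.size

print(attention_scores_size(1024))  # 1,048,576 entries
print(attention_scores_size(2048))  # 4,194,304 entries -- 4x for 2x the context
```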

I don't see your point here. I never claimed we can't make generalist AIs with these techniques.

> I can’t find any reference that we are 7 orders of magnitude away from the complexity of a brain. We have neural networks with more parameters than there are neurons in a brain. A lot more. Biological neurons encode more than an artificial neuron, but not a million times more.

Depends how you interpret it. Mostly I am basing these numbers on supercomputer efficiency (for the silicon side) and the lower bound of estimates made by CMU about what human brains operate at, which takes into account things like hormones and other brain chemicals acting as part of a neuron's behavior. And yes, that does get us to a million times more on the lower bound.

If you want to get into it, there are other issues, like network density and the delay of transmission between neurons, where we also aren't anywhere close at similar magnitudes. And there is the raw physics angle of how much waste heat the different computations generate, at a similar magnitude of difference.

To say nothing of the mutability problem.

> The rate of published AI research is rising literally exponentially. Another factor that accelerates progress.

The exact same thing happened with the boom right before the AI winter in the 80s. And also with stock market booms. In both cases, right before the hype crashes and burns.

> I don’t care what you have written about programming, the statistics say that it can write more than 50% of code that people write TODAY. It will only get better.

The GitHub statistics being put out by the for-profit company that made, and is trying to sell, the model? I'm sure they are very reliable and reproducible (/s).

Also, "can write the code" is far different from "would write the code." "My quantum computer can solve the problem on the first try" doesn't mean it will. While I'm sure it can predict a lot of what people write (I am even willing to grant 50%), the actual problem is choosing which prediction to actually write. And that says nothing of the other 50%, which is likely where the broader context is.

And that lack of context is the fundamental problem. There is a limit to how much better it can get without a radical change to our methodology or decades more of hardware advancements.


Cryptizard t1_ja69a0b wrote

It seems to come down to the fact that you think AI researchers are clowns who won't be able to fix any of these extremely obvious problems in the near future. For example, there are already methods to break the quadratic bottleneck of attention.

Just two weeks ago there was a paper that compresses GPT-3 to 1/4 the size. That's two orders of magnitude in one paper, let alone 10 years. Your pessimism just makes no sense in light of what we have seen.


Mason-B t1_ja6cwsg wrote

> It seems to come down to the fact that you think AI researchers are clowns and won’t be able to fix any of these extremely obvious problems in the near future.

No, I think they have forgotten the lessons of the last AI winter. That despite their best intentions to fix obvious problems, many of them will turn out to be intractable for decades.

Fundamentally, what DNNs are is a very useful mechanism for approximating optimization algorithms over large domains. We know how that class of algorithms responds to exponential increases in computational power (and, correspondingly, efficiency): more accurate approximations, at a sublinear rate.

> For example, there are already methods to break the quadratic bottleneck of attention.

The paper itself says it's unclear whether it works for larger datasets. But this group of techniques is fun because it trades accuracy for efficiency. Which, yeah, that's also an option. I'd even bet that if you graphed the efficiency gain against the loss of accuracy across enough models and sizes, it would match up.

> That’s two orders of magnitude in one paper, let alone 10 years.

Uh, what now? A 4x reduction is two doublings, which is only about 0.6 of one order of magnitude. Yes, they may have compressed the weights themselves by two orders of magnitude, but having to decode them eats up most of those gains. Compression is not going to get enough gains on its own, even if you get specialized hardware to remove part of the decompression cost.
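The doublings-versus-orders-of-magnitude arithmetic, spelled out:

```python
import math

compression = 4  # the paper's claimed size reduction: GPT-3 to 1/4 the size

doublings = math.log2(compression)  # 2.0 halvings of model size
orders = math.log10(compression)    # ~0.602 orders of magnitude (powers of ten)

# 4x is nowhere near two orders of magnitude, which would be 100x.
print(doublings, round(orders, 3))
```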

And left unanalyzed is how much of that comes from getting the entire model on a single device.


Fundamentally, I think you are overlooking the fact that research into these topics has been making 2x and 4x gains all the time, but a lot of those gains are being made in ways we can't repeat. We can't further compress already well-compressed data, for example. At some point soon (2-3 years) we are going to hit a wall where all we have is hardware gains.
