buggaby OP t1_jc3ifnw wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Informative. Thanks. I'm a complexity scientist with training in some ML approaches, though not in transformers or RL. I'll review this (though not as fast as an LLM can...)
buggaby OP t1_jc3dslx wrote
Reply to comment by igorhorst in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Great resources there, thanks.
I'm quite torn by the Bitter Lesson, since, in my eyes, the problems explored since the start of AI research have been, from one perspective, quite simple. Chess and Go (and more recent examples like poker and real-time video games) can be easily simulated: the game is perfectly replicated in the simulation. And speech and image recognition are easily labelled by human annotators. But I wonder if modern algorithms are now being aimed at dramatically different kinds of goals.
I quite like the take in this piece about how slowly human brains work and yet how complex they are. That describes a very different learning pattern from the one enabled by ever-increasing computational speed. Humans learn from a relatively small number of exposures to a highly complex set of data (the experienced world), whereas algorithms have always relied on huge amounts of data (even simulated data, in the case of reinforcement learning). When that data is hard to simulate and hard to label, how can simply increasing the computation lead to faster machine learning?
I would argue that much of the world is driven by dynamic complexity, which highlights that data is only so valuable without knowledge of the underlying structure. (One example is the 3-body problem: small changes in the initial conditions result in quick, dramatic changes in the future trajectory. There's a rough sketch of this sensitivity below.)
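A minimal sketch of what I mean, purely illustrative (the configuration, tolerances, and nudge size are all my own choices): integrate the classic Pythagorean three-body setup twice with SciPy, once with a one-part-in-a-billion nudge to a single coordinate, and compare where the bodies end up.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Planar three-body problem, G = 1. Pythagorean (Burrau) setup:
# masses 3, 4, 5 starting at rest at the vertices of a 3-4-5 right triangle.
M = np.array([3.0, 4.0, 5.0])

def three_body(t, s):
    p = s[:6].reshape(3, 2)   # positions (x, y) of the three bodies
    v = s[6:].reshape(3, 2)   # velocities
    a = np.zeros_like(p)
    for i in range(3):
        for j in range(3):
            if i != j:
                r = p[j] - p[i]
                a[i] += M[j] * r / np.linalg.norm(r) ** 3  # Newtonian gravity
    return np.concatenate([v.ravel(), a.ravel()])

p0 = np.array([1.0, 3.0, -2.0, -1.0, 1.0, -1.0])  # initial positions
s0 = np.concatenate([p0, np.zeros(6)])            # bodies start at rest
s1 = s0.copy()
s1[0] += 1e-9                                     # one-part-in-a-billion nudge

kw = dict(method="DOP853", rtol=1e-12, atol=1e-12)
sol0 = solve_ivp(three_body, (0, 10), s0, **kw)
sol1 = solve_ivp(three_body, (0, 10), s1, **kw)

# The tiny nudge gets amplified through the close encounters: same
# equations, same data to all visible precision, diverging trajectories.
print(np.abs(sol0.y[:6, -1] - sol1.y[:6, -1]).max())
```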
As an aside, I would argue that this is one reason that AI solutions have so rarely been used in healthcare settings: the data is so sparse compared with the complexity of the problem.
It seems to me that the value of computation depends on the volume, correctness, and appropriateness of the data. So many systems that we navigate, and that matter to us, have data that is hard to measure, noisy, and relatively sparse given the complexity of the system, and their future behaviour is incredibly sensitive to that noise.
buggaby OP t1_jc3a3zh wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Thanks for that note. This sounds like, basically, two datasets are needed for this process: one with general responses and language, and one with high-accuracy contextual knowledge (rough sketch of how I'm picturing that below).
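Here's a hypothetical sketch of how I'm picturing the two-dataset mix; the names, examples, and sampling scheme are all invented for illustration, not from any actual pipeline:

```python
import random

# Two invented datasets: one broad, one small but curated for accuracy.
general_data = [  # general responses and language coverage
    {"prompt": "Rewrite this more formally: gotta go now.",
     "response": "I must be going now."},
]
factual_data = [  # high-accuracy contextual knowledge
    {"prompt": "What is the boiling point of water at 1 atm?",
     "response": "100 degrees Celsius (212 degrees Fahrenheit)."},
]

def sample_batch(n, factual_fraction=0.3):
    """Oversample the curated set so it isn't drowned out by the general one."""
    return [random.choice(factual_data) if random.random() < factual_fraction
            else random.choice(general_data) for _ in range(n)]

batch = sample_batch(8)  # a batch like this would feed supervised fine-tuning
```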
> bigger and smarter models need to guess less and therefore hallucinate less
> The largest models were generally the least truthful.

These two findings seem to be in tension, so maybe we need even more work to keep these models truthful.
Submitted by buggaby t3_11qgasm in MachineLearning
buggaby OP t1_jbaf742 wrote
Reply to comment by Ryimax in Lonestar emerges from stealth with plans for lunar data centers by buggaby
If you have to bury it, though? Anyway, we have cold places on Earth.
buggaby OP t1_jba4r8i wrote
Put data centers on the moon?
- The moon has no atmosphere, so even small particles hit the surface at huge speeds; the data centers would need to be buried.
- These centers are "environmentally friendly", because launching them is free?
- What is this supposed to protect against? An Earth-ending meteor? Then what's the data for if we're all dead?
- The data centers will be super far from everything, so really slow to access (quick latency estimate below).
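Back-of-the-envelope only, my numbers, and just the light-speed delay:

```python
# Light-speed delay alone, ignoring all processing and routing
moon_distance_km = 384_400   # average Earth-Moon distance
c_km_per_s = 299_792         # speed of light
one_way_s = moon_distance_km / c_km_per_s
print(f"one-way: {one_way_s:.2f}s, round trip: {2 * one_way_s:.2f}s")
# ~1.28s one way, ~2.56s round trip, before the server even responds
```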
I can't see any good reason for this. But maybe that's just me.
Submitted by buggaby t3_11l2tlt in nottheonion
buggaby OP t1_jc3jw39 wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
How do you find the model size? All the models you listed appear to be based on GPT-3 or GPT-3.5, which, according to my searching, are both 175B parameters. It looks to me like they differ only in the kind and amount of fine-tuning. What am I missing?