turnip_burrito t1_jegfptf wrote
Reply to comment by genericrich in ASI Is The Ultimate Weapon, And We Are In An Arms Race by ilikeover9000turtles
But the nuke would not only cause a huge backlash; it would also likely ruin your economy.
turnip_burrito t1_jeg3lh1 wrote
Reply to comment by webernicke in Goddamn it's really happening by BreadManToast
Born just in time to become immortal, better versions of ourselves, maybe.
turnip_burrito t1_jeg0nuc wrote
Reply to Indirect democracy represented by AI by SSan_DDiego
If everybody can choose which society they live in, and can move there with that society's permission, then there is no issue.
turnip_burrito t1_jefysiz wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
My prompt:
> Suppose I have an N>>1 dimensional space, finite in extent along any given axis, in which a set of M random vectors are dispersed (each coordinate of each vector is randomly sampled from a uniform distribution spanning some bounded range of the space). What can we say about the distances in this space between the M vectors?
I left my prompt open-ended so as not to give it any ideas one way or the other.
Its response makes sense to me. The squared distance here is a sum of N independent squared coordinate differences, so by the law of large numbers its relative fluctuation shrinks as the dimension N grows. If N is large, the distribution of pairwise distances narrows until nearly all points are roughly the same distance from each other. (The random sampling is a way to build in lack of correlation, like the unrelated ideas you mentioned.)
Of course, the reverse is also true: if the dimension N is small, points that were originally "far" apart can become closer or farther (which way is unpredictable, depending on which dimensions are removed), because the averaging over random sample fluctuations disappears.
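For anyone who wants to see the concentration directly, here's a quick numpy/scipy sketch I threw together (M, the coordinate range, and the list of dimensions are arbitrary choices of mine, not from GPT's answer):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
M = 200  # number of random vectors (arbitrary choice)

for N in [2, 10, 100, 1000, 10000]:
    points = rng.uniform(-1, 1, size=(M, N))  # coords sampled uniformly on a bounded range
    d = pdist(points)                         # all M*(M-1)/2 pairwise Euclidean distances
    # Relative spread shrinks roughly like 1/sqrt(N): distances concentrate.
    print(f"N={N:>5}: mean dist={d.mean():7.2f}, std/mean={d.std() / d.mean():.4f}")
```

The mean distance grows like sqrt(N), but std/mean keeps shrinking, which is the "nearly all points are roughly the same distance apart" effect.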
turnip_burrito t1_jee0mso wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Here's what GPT4 chimed in with (it lines up with what I've heard and read before):
===========
In an N-dimensional space with N >> 1, where M random vectors are dispersed with each coordinate sampled from a uniform distribution within a bounded range, we can make a few observations about the distances between these vectors:
High-dimensional space behavior: In high-dimensional spaces, the behavior of distances between random points differs significantly from what we would expect in lower-dimensional spaces like 2D or 3D. In high-dimensional spaces, most points tend to be far away from each other, and the distances between them are often more similar than they would be in lower-dimensional spaces.
Concentration of distances: As the dimensionality N increases, the pairwise distances between the M random vectors tend to concentrate around a specific value. The distances between points in high-dimensional spaces tend to be less varied than in lower-dimensional spaces, and the majority of the distances will be relatively close to the expected value. This phenomenon is known as the "concentration of measure."
Sparse representation: In high-dimensional spaces, the points are often sparsely distributed, which means that there is a lot of empty space between them. This can lead to a counterintuitive property, where increasing the number of dimensions might actually make it easier to distinguish between the points, as they become more "isolated" in the space.
Curse of dimensionality: While the above-mentioned properties might sometimes be useful, high-dimensional spaces can also give rise to the "curse of dimensionality." This term refers to various challenges that arise when working with high-dimensional data, such as increased computational complexity, difficulty in visualizing the data, and the need for a larger number of samples to obtain statistically meaningful results.
To summarize, in an N-dimensional space with N >> 1, the M random vectors will generally be far from each other, with their pairwise distances concentrating around a specific value. High-dimensional spaces exhibit unique properties like the concentration of measure and sparse representation, but they also come with challenges like the curse of dimensionality.
turnip_burrito t1_jedy6rw wrote
Reply to comment by ItIsIThePope in ASI Is The Ultimate Weapon, And We Are In An Arms Race by ilikeover9000turtles
Intelligence has nothing to do with morals.
turnip_burrito t1_jeds3mv wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
>unrelated concepts are no longer equidistant from other unrelated concepts,
Are distances normally the same between all unrelated concepts in a very high-dimensional space? Does this have to do with unrelated concepts having low correlation between coordinates, so the distance along each axis is random, and therefore on average the distance between each pair of unrelated concepts is the same as for any other unrelated pair?
turnip_burrito t1_jedrykf wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
>In other words, a model without this property might have an embedding space that isn't large enough, but you're still teaching it things and the result is that you're cramming things into your embedding space that's too small, so unrelated concepts are no longer equidistant from other unrelated concepts, implying a relationship that doesn't really exist with the result being that the language model confuses things that shouldn't be confused.
So False Nearest Neighbors?
turnip_burrito t1_jeckpk3 wrote
Reply to comment by [deleted] in When will AI actually start taking jobs? by Weeb_Geek_7779
Be aware that GPT4 is very good at basic programming, but not at anything too specialized or difficult.
I don't know if GPT4 is allowed in government jobs or industry, but a later descendant may be.
turnip_burrito t1_je9jo7t wrote
Reply to comment by SnooWalruses8636 in The argument that a computer can't really "understand" things is stupid and completely irrelevant. by hey__bert
Ilya seems to be thinking more like a physicist than a computer scientist here, and from a physics point of view what he's saying makes sense.
turnip_burrito t1_je8lvl6 wrote
Depends on how it's built.
turnip_burrito t1_je8ichg wrote
The essence of it is this:
You have a model of some thing out there in the world. Ideally the model should be able to copy the behavior of that thing. That means it needs to produce the same data as that real thing.
So, you change parts of the model (numbers called parameters) until the model can recreate the data already collected from the real-world system. This parameter-changing process is called training.
So for example, your model can be y = mx + b, a straight line, and the process of making sure m and b are good values, so that the line fits the dataset (X, Y), is "training". AI models are not straight lines like y = mx + b, but the idea is the same: it's really advanced curve fitting, and some really interesting properties can emerge in the models as a result.
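Here's a tiny runnable sketch of that "training" loop for y = mx + b (the data, learning rate, and step count are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "real world" data from a hidden line y = 2x + 1, plus noise.
X = rng.uniform(-5, 5, size=100)
Y = 2 * X + 1 + rng.normal(0, 0.5, size=100)

# Model: y = m*x + b, with parameters m and b starting at arbitrary values.
m, b = 0.0, 0.0
lr = 0.01  # learning rate

for step in range(1000):
    pred = m * X + b
    err = pred - Y
    # Nudge m and b along the gradient of the mean squared error.
    m -= lr * 2 * np.mean(err * X)
    b -= lr * 2 * np.mean(err)

print(f"learned m={m:.3f}, b={b:.3f}")  # should land near m=2, b=1
```

A neural network does the same thing, just with millions or billions of parameters instead of two.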
turnip_burrito t1_je8i45w wrote
Reply to comment by FlyingCockAndBalls in When people refer to “training” an AI, what does that actually mean? by Not-Banksy
"Attention mechanism" makes it good at predicting new words from past ones.
The paper that introduced the attention mechanism is called Attention its All You Need.
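For the curious, here's a bare-bones numpy sketch of the scaled dot-product attention from that paper (toy sizes, no learned weight matrices, so it's an illustration of the core operation rather than a full transformer layer):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core equation from the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V               # weighted mix of the values

# Toy example: 4 tokens, embedding size 8 (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8): one attention-mixed vector per token
```

Each output row is a blend of all the input tokens, weighted by how relevant the model judges them to be, which is what lets past words inform the prediction of new ones.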
turnip_burrito t1_je6pzcu wrote
Reply to comment by TheSheepSheerer in How do i catch up with everything that is going on in A.I. Field? by Comfortable-Act9400
You must be a very quick reader.
turnip_burrito t1_je4zkmj wrote
That's the neat part.
You don't.
turnip_burrito t1_je3wkne wrote
Reply to Facing the inevitable singularity by IonceExisted
This is the first time for me that the world has felt so alien compared to the one we grew up in. This must be how it felt for other generations too, watching man land on the moon, or the Internet arrive when they were already fully grown adults with a mortgage. It's uncomfortable and anxiety-inducing. This change is at odds with our entire evolutionary and cultural history.
At the same time, humanity is growing into its limitless potential. It's also exciting. What can we accomplish now that eluded us before? Can we really make the world better now, and remove all sorts of needless suffering like illness, death, and slavery? What will we become if we make it to the other side of this singularity?
turnip_burrito t1_je3856d wrote
Reply to Single parent homes are the result of power grab by the neoliberal technocracy due the crossing of the singularity. by practical_ussy
I'm not reading that
I'm happy for you
Or sorry that happened
turnip_burrito t1_jduhcoa wrote
Reply to comment by RadioFreeAmerika in Why is maths so hard for LLMs? by RadioFreeAmerika
Yeah, for an individual it's no joke.
For a business it may be worth it, depending on the job.
turnip_burrito t1_jdsqq3f wrote
Reply to comment by Cryptizard in Why is maths so hard for LLMs? by RadioFreeAmerika
I just tried a couple times now and you're right. That's weird.
When I tried these things about a week and a half ago, it did perform the way I described. Either I got lucky or something changed.
turnip_burrito t1_jdspnv6 wrote
Reply to comment by Cryptizard in Why is maths so hard for LLMs? by RadioFreeAmerika
It's good at math; it just gives a rounded answer.
Most of the time it was actually absurdly accurate (0.0000001% error), and the 4 sig fig rounding only happened once or twice.
It is technically wrong. But so is a calculator's answer. The calculator cannot give an exact decimal representation either. So is it bad at math?
turnip_burrito t1_jdsoxo1 wrote
Reply to comment by RadioFreeAmerika in Why is maths so hard for LLMs? by RadioFreeAmerika
Yeah, we're really waiting for electricity costs to fall if we want to implement things like this in reality.
Right now, the current rate of roughly $0.10 per 1000 tokens, at about 1000 tokens per minute per LLM, works out to about $6 per hour to run a single LLM. If you have some ensemble of LLMs checking each other's work and working in parallel, say 10 LLMs, that's $60/hr, or $1440/day. Yikes, I can't afford that. And that would maybe give you performance and problem-solving ability somewhere between a single LLM and one human.
Once the cost falls by a factor of 100, that's $14.40/day. Expensive, but much more reasonable.
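If you want to redo the arithmetic with your own numbers, it's just this (the $0.10 rate is my rough estimate from above, not an official price):

```python
# Rough cost estimate for running an ensemble of LLMs around the clock.
rate_per_minute = 0.10   # dollars per ~1000 tokens per minute, per LLM (my guess)
n_llms = 10              # LLMs checking each other's work in parallel

cost_per_hour = rate_per_minute * 60 * n_llms
cost_per_day = cost_per_hour * 24
print(f"${cost_per_hour:.2f}/hr, ${cost_per_day:.2f}/day")       # $60.00/hr, $1440.00/day
print(f"after a 100x cost drop: ${cost_per_day / 100:.2f}/day")  # $14.40/day
```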
turnip_burrito t1_jdsninw wrote
Reply to comment by Cryptizard in Why is maths so hard for LLMs? by RadioFreeAmerika
Why even point this out?
If you reread my reply, you would see I said "exactly right OR right to 4 or 7 sig figs". I didn't say 4 or 7 sig figs was exactly right. I'm going to give you the benefit of the doubt and assume you just misread the reply.
turnip_burrito t1_jdse82g wrote
Reply to comment by Cryptizard in Why is maths so hard for LLMs? by RadioFreeAmerika
I've done this like 8 or 9 times with crazy things like 47t729374^3 /37462-736262636^2 /374 and it has gotten them all exactly right, or right to 4 or 7 sig figs (always due to rounding, which it acknowledges).
Maybe I just got lucky 8 or 9 times in a row.
turnip_burrito t1_jdqrxre wrote
Reply to comment by RadioFreeAmerika in Why is maths so hard for LLMs? by RadioFreeAmerika
To be fair, the model does have weaknesses. It's just that this particular one may have a workaround.
turnip_burrito t1_jegu7uk wrote
Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
Yeah I made the simplification of random vectors myself just to approximate what uncorrelated "features" in an embedding space could be like.
One thing that's relevant for embedding space size is Takens' theorem: https://en.wikipedia.org/wiki/Takens%27s_theorem?wprov=sfla1
If you have an originally D-dimensional system (measured using correlation or information dimension, for example), and you time-delay embed data from the system, you need at most 2*D+1 embedding dimensions (it can be fewer) to ensure no false nearest neighbors.
This sets an upper bound if you use time delays. For a *non*-time-delayed embedding, I don't know the answer. I asked GPT4 and it said no analytical method presently exists for determining the embedding dimension M ahead of time. An experimental method does exist that you can run before training a model: grow the number of embedding dimensions M, calculating the false-nearest-neighbor (FNN) fraction each time M grows; once FNN drops to near zero, you've found a suitable M.
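Here's a rough sketch of that FNN procedure on a time-delay embedding (my own simplified take on the standard Kennel-style test; the threshold, delay, and toy data are placeholder choices):

```python
import numpy as np

def delay_embed(x, M, tau=1):
    """Time-delay embedding of a scalar series x into M dimensions."""
    n = len(x) - (M - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(M)])

def fnn_fraction(x, M, tau=1, r_tol=10.0):
    """Fraction of nearest neighbors in dimension M that are 'false':
    they fly apart when one more embedding dimension is added."""
    emb_M = delay_embed(x, M, tau)
    emb_M1 = delay_embed(x, M + 1, tau)
    n = len(emb_M1)  # only points that exist in both embeddings
    false = 0
    for i in range(n):
        d = np.linalg.norm(emb_M[:n] - emb_M[i], axis=1)
        d[i] = np.inf
        j = np.argmin(d)  # nearest neighbor in dimension M
        # False neighbor: the new coordinate stretches the pair dramatically.
        if abs(emb_M1[i, -1] - emb_M1[j, -1]) / (d[j] + 1e-12) > r_tol:
            false += 1
    return false / n

# Toy data: a noisy sine wave, so the true attractor is low-dimensional.
t = np.linspace(0, 60, 1500)
x = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)

for M in range(1, 6):
    print(f"M={M}: FNN fraction = {fnn_fraction(x, M, tau=10):.3f}")
```

The FNN fraction should drop toward zero once M is big enough to unfold the attractor, and that's the M you'd pick.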
One neat part about all this is that if you have some complex D-dimensional manifold or distribution with features that "poke out" in different directions in the embedding space (imagine a wheel hub with spokes), then increasing the embedding space size M will also increase the distance between the spokes. If M gets large enough, all the spokes should be nearly equidistant from each other, while points along a single spoke are separated from each other only along a small subset of directions.
I don't think making it super large would actually make learning on the data any easier, though. Best to stick close to the minimum embedding dimension M. If you go larger, measurement noise in your data becomes more represented in the embedded distribution. The noise dynamics also get unfolded as you increase M, which means that if you're only trying to predict the D-dimensional system, you'll have a harder time: you're now predicting a (D+large#)-dimensional system, and the D-dimensional part of the distribution gets lost inside the larger one.