Submitted by kindofaboveaverage t3_10aqkjj in askscience
Lately I've been seeing a lot of stuff with AI and they always mention how they trained the AI to do this or that. How exactly does that work? What's the process for doing that?
[removed]
This is a wonderful, simple explanation of machine learning, which is about 98% of what passes for AI.
In the simplest case, you start with an untrained AI (some mathematical model with variable parameters) and training data for which you already know the desired output (supervised learning). Initially, the AI produces nonsense when given the training data, so you repeatedly make small changes to the parameters of the AI, making sure that the actual output gets closer and closer to the desired output. At some point, the actual output is close enough to the desired output that you stop - the AI has been trained, and when given data sufficiently similar to the training data will produce the desired output even though the AI has never encountered that specific data before.
It obviously gets more complicated, especially when you don't already know your desired output (unsupervised learning) or in more complex designs such as generative adversarial networks. Some machine learning approaches typically use specific algorithms for training, such as the Baum-Welch algorithm for Hidden Markov Models, while others may use generic optimization algorithms. In general though, the process of repeatedly making small changes and comparing the new result to the previous one is a largely universal part of training AI.
And to be clear, 'you' don't make the changes to the parameters or check the output by hand: it's all done by a computer program in a loop, and because you need lots of input data, it can take a long time.
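To make that loop concrete, here's a toy Python sketch (not from any real library; the one-parameter "model" y = w * x and the target function y = 3 * x are invented purely for illustration) of training by repeatedly making small random changes and keeping only the ones that bring the output closer to the desired output:

```python
import random

# Toy supervised setup: the "AI" is a single adjustable parameter w in the
# model y = w * x, and the desired outputs come from y = 3 * x.
training_data = [(x, 3 * x) for x in range(1, 11)]

def total_error(w):
    # How far the actual outputs are from the desired outputs.
    return sum((w * x - y) ** 2 for x, y in training_data)

w = random.uniform(-5, 5)                        # untrained model: a random parameter
for _ in range(50000):
    candidate = w + random.uniform(-0.1, 0.1)    # small change
    if total_error(candidate) < total_error(w):  # keep it only if the output got closer
        w = candidate

print(w)   # ends up very close to 3
```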
There are a number of approaches that give you different results depending on what your goal is. The more complicated the AI, the more complicated the training process, but the most common way is built on this procedure:
1. Start with a model whose parameters are set randomly.
2. Feed it a training example and record what output it produces.
3. Compare that output to the desired output and adjust the parameters slightly so that the error shrinks.
4. Repeat over many, many examples until the outputs are good enough.
Modern AI is built on neural networks, which were technically invented in 1943 as a computational analogy to neurons. However, they've been modified a lot since then, and no longer resemble neurons in many relevant ways. The real breakthrough in neural nets came when (1) it was discovered that GPUs, which were invented for graphics, could also be used for neural nets, and (2) a method was invented for training "deep" neural networks, where there are many layers of neurons between the input and output. Before (2), neural networks were limited to 5-10 layers, because we couldn't figure out how to do step (3) in the above list on deeper networks. Modern neural networks can have hundreds of layers.
If you want to dive deeper into the mechanics of what a "neural network" actually is, you can watch https://www.youtube.com/watch?v=aircAruvnKk.
The other thing unlocking modern AI, beyond having the ability to train deep neural networks on GPUs, is lots of examples. The breakthrough for training AI to solve the board game Go, for example, was figuring out a way to train the AI via letting it play itself billions of times. This is hard because you can't know if a move is good or not until the end of the game.
One thing you should always be careful of when evaluating an AI is to ask "what was it actually trained to do?" For example, consider ChatGPT. ChatGPT was not trained to "answer questions usefully", it was trained on the internet with the task of "given the first 1000 words of this website, guess the next word." It turns out if you take this "next word prediction machine" and repeatedly feed its best-guess output back into it as input, it can write paragraphs of comprehensible text. But it's not _trying_ to write comprehensible text, it's trying to predict the next word. This can make some of the weird ways it behaves make more sense. For example, once it's made a single mistake, it's more likely to make more mistakes, because it thinks (using this word as an analogy, who knows whether neural nets can really think) "huh, there are some mistakes in the previous text, so I will guess there will be more mistakes in the rest of the text".
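As a purely illustrative sketch of that "feed its best-guess output back into it" loop (the `predict_next_word` function below is a made-up placeholder that picks random words, not ChatGPT or any real model):

```python
import random

# Hypothetical stand-in for a trained next-word predictor: it just picks a
# random word, whereas a real model would return its single best guess
# given everything generated so far.
def predict_next_word(text_so_far):
    return random.choice(["the", "cat", "sat", "on", "a", "mat"])

def generate(prompt, n_words=20):
    words = prompt.split()
    for _ in range(n_words):
        # The model's own output is appended and fed back in as input.
        words.append(predict_next_word(" ".join(words)))
    return " ".join(words)

print(generate("Once upon a time"))
```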
TL;DR: Damn this was ranty. Check out things like "perceptron", "Taylor series", and "gradient descent"
Rant:
"AI" is a misnomer for a programming model where you don't describe the logic, but rather provide the data to "program" software. The method via which this data is generated, organized, and processed into logic is called "training" The steps to train will depend on the type of model and type of problem.
The "model" is the rules for how the data interacts to form logic, but is devout of problem specific logic. However models do model the possible logic by describing the data flow on how to search for a solution.D depending on the type of "training", supervised vs unsupervised, and the type of model you are using: A discriminator/classifier vs a transformer, training will entail some specific details and could even go as far as use another AI model to generate the training data itself. Ultimately it just a bunch of number crunching in order to calculate the coefficients (scalar weights) used at the various inputs of the model's internals. The task of training is typically massively parallel over a large data set, so SIMD (single instruction, multiple data) type processors like GPUs or TPUs have been tremendously successful since the relationship of weights to data can be represented as a multi dimensional vector (array, list, or whatever else)
There are two steps to training:
1 - Black magic: selecting (or generating) the right training data, handcrafting/selecting a model (data flow) suitable for the problem, determining the source of the error signal, and deciding on the cost function. The success criteria of one model vs. another are biased by these factors.
2 - Crunching numbers via math: linear algebra and calculus (processing)
The weird stuff is in 1:
A raw "model" is a combination of mathematical functions that can execute on data and a description of all possible data flow between functions. When I say function, i literally mean things like "add". For example consider a basic linear function of f(x)=ax+by+cz+k in a 3d space... so 3 inputs... the job of training is to calculate a, b, c and k such that line used to connect the 3 variables in your specific problem class.... obviously i can make a subtract out of the add by training a negative coefficient, or I can recuse the function to form an exponent... the basic building block for a deep learning neural network is the "perceptron" (or the McCulloch-Pitts neuron) which was inspired by the function of neuron dendrites (weighted inputs), soma (add them up) and axon (output to other neurons)... an overly fancy way to in fact just say 1 if there is enough input to fire, and 0 if there isn't. Enough input is defined as the dot product between the input vector x (so [x y z] ) and the coefficient vector ( for our line [a b c] ) plus k (the constant) being greater than 1
A line is not a particularly sophisticated function; however, by defining x itself as a function of some subset of other variables, you can build up a generic polynomial. Engineers will recall the usefulness of the Taylor series: give me enough data and I can approximate the right function pretty damn well. AI training is basically that, beating a polynomial into the shape of your data set without "over-fitting" (that is, hard-wiring your training data as the solution).
These perceptrons are wired up as necessary to create more complex functions. The model may also have some creative wiring, like "attention", which is message signalling back and forth between the neuron layers; this creates interesting behaviours in Transformer models that are oddly successful at processing human language. (Until a model succeeds, this is the black magic: researchers are just trying combinations of these structures until they find one that works for their problem.)
Ultimately, for training to be efficient the model must be differentiable, meaning you can find its derivative. This allows you to use gradient descent to find minima in the cost (or loss) function calculated on the error signal.
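As a minimal illustration (assuming a toy linear model f(x) = ax + k and a squared-error cost, which are my choices, not anything specific from above), gradient descent just steps each coefficient against the derivative of the cost:

```python
# Gradient descent on a differentiable cost: fit f(x) = a*x + k to a few
# points by repeatedly stepping the coefficients against the gradient of
# the squared-error cost.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # roughly y = 2x + 1

a, k = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    # Partial derivatives of cost = sum((a*x + k - y)^2) w.r.t. a and k.
    grad_a = sum(2 * (a * x + k - y) * x for x, y in data)
    grad_k = sum(2 * (a * x + k - y) for x, y in data)
    a -= learning_rate * grad_a
    k -= learning_rate * grad_k

print(a, k)   # approaches a ≈ 2, k ≈ 1
```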
Once you have defined a model, "training" it is the easy part. The process is simple: run the training data through the model, measure the error signal against the desired output using the cost function, apply gradient descent to nudge the weights in the direction that reduces the cost, and repeat.
So training requires a large amount of linear algebra and calculus processing of "training data" and "internal weights".
Once you have the model trained and you are happy with it, re-running it on new input data without adjusting the weights (coefficients) is called "inference". Since inference is basically evaluating an already-known polynomial (training discovered it; inferring with it just means you have new values for the variables), the "algorithm" can be implemented as efficiently as a physical circuit that requires very little computational resource, although that wouldn't be very upgrade-able. You are seeing "inference" engines that are basically programmable circuits, like FPGAs, for low-power, low-latency realtime inference (running the algorithm derived by the model).
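A trivial sketch of the distinction: inference is just plugging new inputs into weights that are already frozen (the numbers below are made up, standing in for whatever training produced):

```python
# Inference: the weights are fixed and we simply evaluate the learned
# function on new inputs; no weight adjustment happens here.
trained_weights = [0.4, -1.2, 0.7]   # placeholder values "found" by training
trained_bias = 0.05

def infer(inputs):
    return sum(w * v for w, v in zip(trained_weights, inputs)) + trained_bias

print(infer([1.0, 0.5, 2.0]))
```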
For what it is worth, AI is not a panacea. It is very likely that P is not equal to NP, meaning that not all problems are solvable by a polynomial, no matter how big your calculator or how badly you beat the math into submission. Get into too many "if this then that" conditions and your model is not differentiable, and you will struggle to generalize a loss function. Even in cases where you "found" a polynomial solution, you can't assume real data will always fit it. Humans can learn as they infer; AI typically cannot, for the simple reason that the computational tasks of learning and inference are different, and for a large model you'd have to pack quite the overkill of melted sand with jammed-up lightning in it just to be able to dynamically re-train.
3Blue1Brown is a YouTube channel that covers mathematics. They made a 4 part video series on neural networks and how they work, and specifically what it means to train a neural network to accomplish a task.
There is a lot of info that is well explained but would be too much to write down in a post, so I hope this helps. It helped me understand neural networks much better.
There are different ways, some of them involving a lot of math, but one way is to start with multiple randomized neural nets which turn the inputs you give them into outputs. You then test them, rank them by performance and get rid of the bottom half (it doesn't have to be the bottom half; it can be another fraction, and removal can be random with lower-scoring nets more likely to be removed). Then you make copies of the remaining ones, each with a random change, and repeat.
This is all automated, of course, except for setting parameters like how many are removed, how many are in each generation, etc.
If you want an explanation of a neural net you should look up a video because it’s a lot easier to explain with visuals.
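Here is a deliberately tiny Python sketch of that select-and-mutate loop. The "model" is just a single number and the "performance test" is distance to a target value, which is obviously nothing like a real neural net, but the rank / cull / copy-with-a-random-change / repeat structure is the same:

```python
import random

# Toy version of the select-and-mutate loop: each "model" is a single
# number, and the task is to get as close to a target value as possible.
TARGET = 42.0
POPULATION_SIZE = 20

def score(model):
    return -abs(model - TARGET)   # higher is better

population = [random.uniform(-100, 100) for _ in range(POPULATION_SIZE)]
for generation in range(200):
    # Rank by performance and keep the top half.
    population.sort(key=score, reverse=True)
    survivors = population[: POPULATION_SIZE // 2]
    # Refill the population with slightly mutated copies of the survivors.
    children = [m + random.uniform(-1, 1) for m in survivors]
    population = survivors + children

print(max(population, key=score))   # ends up close to 42
```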
As others have stated, these days "AI" typically refers to "artificial neural networks" (ANNs). As the name suggests, they are loosely modelled on the networks of neurons found in the brains of animals and people.
Like a budding, miniature brain, you need to train/educate it by giving it a variety of situations (inputs) and seeing how it responds (outputs). Then, you reward or punish it (positive/negative reinforcement) based on its behavior. Naturally, there are different methods/algorithms on how to apply that reinforcement. (I once read a paper where the authors tried training an ANN using a genetic algorithm! (Which is a different kind of AI.))
You could almost think of it as showing a toddler flash cards of cats and dogs and giving them a candy for every right answer. But, what happens if the deck does not have any fox cards and then they see a fox, will they identify it as a dog or as a cat?
The documentary on Netflix, "Coded Bias", highlights the dangers of AI and the importance of having good training data. (e.g. having a deck of flash cards that includes foxes.) My only issue with the movie is that it does not discuss any positive uses of AI, of which there are some.
If you want to experiment with AI first-hand, I found Microsoft's ML.NET surprisingly easy to use. It was as easy as giving it a text doc, and within minutes it gives you a model to play with.
The simplest algorithm used for basically every student-level task in my youth was "backward propagation of errors".
To run this model, you need three things: a big multi-layered filter (which will be our "AI": a set of matrices that the input data is multiplied by, plus an activation function to mix in some non-linearity), a sufficient set of input data, and a sufficient set of corresponding output data. "Sufficient" meaning enough for the system to pick up on the general task.
Basically you take the initial (empty or random) filter, feed it a piece of input data, subtract the actual output from the corresponding desired result (finding what we call the "error", i.e. the difference between actual and desired output), and then you go backwards through the filter, layer by layer, and with essentially simple arithmetic operations adjust the coefficients in a way that IF you fed the same data in again, the "error" would be smaller.
If you "overfeed" one and the same input to this model 10 million times, you'll end up with the system that can only generate correct result for this, specific input.
But when you randomly shift between several thousand different inputs, the filter ends up in an "imperfect but generally optimal" state.
The miracle of this algorithm is that it keeps working no matter how small the adjustments are, as long as they are made in the right direction.
One thing to keep in mind is that this particular model works best when the neuron activation function is monotonic, and the complexity of the task it can handle is limited by the number of layers.
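For the curious, here's a rough from-scratch Python sketch of that backward pass, on the classic XOR toy problem (my choice for a small runnable example; the layer sizes and learning rate are arbitrary):

```python
import math
import random

# A 2-input -> 3-hidden -> 1-output network trained by backward propagation
# of errors on XOR, with a monotonic (sigmoid) activation. With an unlucky
# random start it can occasionally get stuck; rerunning usually fixes that.
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
H, lr = 3, 0.5

w_hid = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(H)]
b_hid = [random.uniform(-1, 1) for _ in range(H)]
w_out = [random.uniform(-1, 1) for _ in range(H)]
b_out = random.uniform(-1, 1)

def forward(x):
    hid = [sigmoid(w_hid[j][0] * x[0] + w_hid[j][1] * x[1] + b_hid[j]) for j in range(H)]
    out = sigmoid(sum(w_out[j] * hid[j] for j in range(H)) + b_out)
    return hid, out

for _ in range(30000):
    x, target = random.choice(data)
    hid, out = forward(x)                      # forward pass, layer by layer
    d_out = (out - target) * out * (1 - out)   # error signal at the output
    for j in range(H):                         # walk backwards, adjust a little
        d_hid = d_out * w_out[j] * hid[j] * (1 - hid[j])
        w_out[j] -= lr * d_out * hid[j]
        w_hid[j][0] -= lr * d_hid * x[0]
        w_hid[j][1] -= lr * d_hid * x[1]
        b_hid[j] -= lr * d_hid
    b_out -= lr * d_out

for x, target in data:
    print(x, target, round(forward(x)[1], 2))  # outputs should sit near 0 or 1
```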
As a student, I made a simple demonstration program on this principle that designed isotropic beams of equal resistance in response to given forces. In the process I found that such a program requires two layers (since the task at hand is essentially a double integration).
I'm putting this response in because no-one else seems to have mentioned backward propagation of errors; modern and complex AI systems, especially those working on speech/text, actually use more sophisticated algorithms; it's just that this one is the most intuitive and easiest for humans to understand.
"AI" and "machine learning" tend to be used interchangeably, especially in mass-media articles. In theory "AI" is more "intelligent" but ... well.
Anyway in a previous job I worked on a "machine learning" project which used a "binary classifier" (a relatively simple machine-learning method) to determine whether a short sound recording was "baby" or "not baby".
To train it, we had a whole load of sound recordings (.wav files), of domestic "non-baby" sounds, like hand-washing dishes, washing machine, vacuum cleaner, TV etc. And a load of "baby" sounds, which included babies babbling as well as crying. The "training" comprised getting the program to analyse those sounds (from the two labelled categories) and "learn" how to classify them. Set the computer-program running, and wait for an hour or two...
As with much audio processing (including speech recognition), the sounds were analysed in short pieces lasting a few tens of milliseconds each, each characterised by about 20-30 parameters relating to the frequency content and its rate of change with time. In this case the "training" was essentially fitting a plane through that 20-30 dimensional space of parameters, splitting the set into "baby" on one side and "non-baby" on the other. Once trained, you could then give the algorithm new recordings that it hadn't "heard" before, and it would classify them accordingly.
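As a rough sketch of that kind of training (with invented 3-dimensional features standing in for the real 20-30 audio parameters, and a basic perceptron update rather than whatever classifier was actually used), fitting the separating plane can look like this:

```python
import random

# Hypothetical stand-in for the baby / not-baby setup: each recording chunk
# is reduced to a small feature vector (here 3 made-up numbers instead of
# 20-30 real spectral parameters), and training fits a separating plane
# with the classic perceptron update rule.
random.seed(0)

def fake_features(is_baby):
    # Invented clusters; real features would come from audio analysis.
    centre = [2.0, 1.0, -1.0] if is_baby else [-1.0, -2.0, 1.5]
    return [c + random.gauss(0, 0.5) for c in centre]

training = [(fake_features(True), 1) for _ in range(100)] + \
           [(fake_features(False), 0) for _ in range(100)]

w = [0.0, 0.0, 0.0]
b = 0.0
for _ in range(20):                      # a few passes over the labelled data
    for features, label in training:
        predicted = 1 if sum(wi * f for wi, f in zip(w, features)) + b > 0 else 0
        error = label - predicted        # -1, 0, or +1
        w = [wi + 0.1 * error * f for wi, f in zip(w, features)]
        b += 0.1 * error

new_chunk = fake_features(True)          # a recording it hasn't "heard" before
print(1 if sum(wi * f for wi, f in zip(w, new_chunk)) + b > 0 else 0)   # should print 1
```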
A problem (symptomatic of many machine learning methods) was that if you presented it with a recording of a baby but with some other sound in the background - even just a cooker-hood fan, that it hadn't been trained for - it would fail to recognise the baby.
There is an ever-present danger with AI/ML systems that if you haven't included all possible confounding factors in the training data, they may completely and unexpectedly fail to work properly when that factor pops up in the real world.
There are many, many different variations, but they more or less all work on the same basic premise.
1. Begin with an initially random model.
2. Test the model. Give it a problem and ask for its response.
3. Modify. If the system didn't behave as intended, change something.
4. Repeat steps 2 and 3 until you run out of training data.
5. Pray that the model works.
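Spelled out as toy Python (the "model" here is just a lookup table and "modify" simply overwrites a wrong answer, which is far cruder than adjusting numeric parameters, but the loop has the same shape):

```python
import random

# A literal, toy version of the loop above. The "model" is a lookup table
# with random initial answers; real systems adjust numeric parameters
# instead, but the structure of the loop is the same.
problems = list(range(10))
intended = {p: p % 2 for p in problems}               # task: is p odd?

model = {p: random.choice([0, 1]) for p in problems}  # 1. initially random
for p in problems:                                    # 4. until training data runs out
    response = model[p]                               # 2. test the model
    if response != intended[p]:                       # 3. not as intended: change something
        model[p] = intended[p]

print(model == intended)                              # 5. pray (here it prints True)
```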
The most obvious differences between AIs will be in the structure of the model (how big is it, how connected, how many layers, what kind of internal memory etc) but the real fun stuff is in how we do the modifying.
We can show that, for some problems, just tweaking the system randomly is enough to get okay solutions. But it's very far from ideal. Better is to be able to nudge the system "towards" the expected behavior. We've put a lot of focus into how to design these systems so that our modifications are more fruitful.
I don't really agree with the term "AI" here, but let's use it for now.
I'll speak for neural networks, and particularly, a method called "supervised learning", i.e. learning where you as a human know the inputs and the desired outputs of the learning data set.
A neural network is just a mathematical model that:
1. takes some input data (a bunch of numbers),
2. does a lot of math on that data using adjustable parameters, and
3. produces some output data.
Training the AI is, in simplest possible terms, adjusting the parameters of step 2.
When training the AI, you have a bunch of data where you know the inputs and outputs. Say you want to train the AI to recognize which image has a cat in it. You have a bunch of images, and you, as a human, know which ones are cats, and which aren't. Those images will be used to train the AI.
So, in the process of training, you take each training image as the input and set the desired output to 1 if it contains a cat and 0 if it doesn't.
Now with known inputs and outputs, the AI tries to figure out the middle step. Similar to how when you've got an equation 3 × X = 6 you try to figure out what X is by asking "what number do I multiply with 3 to get to 6?", the AI tries to figure out "what kind of math do I need to do to a bunch of this pixel data to get to '1'?".
This is done through mathematical algorithms, essentially by first trying certain parameters and seeing what output it gets, and then iterating with math to get closer to the solution. Through that process, the AI adjusts its own parameters so that, when you give it a bunch of pixels, it can take that pixel data, do math, and the result of that math equals 1 if there's a cat in the image and 0 if there are no cats.
So, in short (and oversimplified), you've got equations:
"image with a cat" × X = 1
"image without a cat" × X = 0
Training is using a bunch of mathematical algorithms on known data to figure out what X is.
Now that the X is known, you can give AI an unknown image so it can do math "X" on it, and figure out on its own whether the image contains a cat.
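To make that concrete with the simpler 3 × X = 6 analogy from above, an iterative solver that guesses X and nudges it based on the error looks like this (a toy sketch, not the actual algorithms used on images):

```python
# Iteratively "figuring out X" in 3 * X = 6: guess, measure how far the
# output is from the desired 6, nudge X to shrink that error, repeat.
X = 0.0
for _ in range(200):
    output = 3 * X
    error = output - 6
    X -= 0.02 * error * 3   # move X against the slope of the squared error
print(X)                    # converges to 2
```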