chazzmoney t1_iw4s9ic wrote
Reply to comment by StevenTM in Scientists Taught an AI to ‘Sleep’ So That It Doesn't Forget What It Learned, Like a Person. Researchers say counting sleep may be the best way for AIs to exhibit life-long learning. by mossadnik
Of course - thanks for the great questions!
chazzmoney t1_iw4l6c8 wrote
Reply to comment by StevenTM in Scientists Taught an AI to ‘Sleep’ So That It Doesn't Forget What It Learned, Like a Person. Researchers say counting sleep may be the best way for AIs to exhibit life-long learning. by mossadnik
If you only need to detect hotdog/pizza and dog/cat, it's a fine solution. I was using those as examples, but usually the tasks are much more drastic - “transcribe speech to text”, “identify speech from background noise”, and “identify the speaker”; or “answer a trivia question”, “fill in the blank with the correct choice”, “determine if the text has a positive or negative sentiment”, “determine the main topic of the text”, etc. Quite complicated tasks.
Thus, there are a few reasons it doesn't work in practice:
- Generality
- Efficiency (hardware memory)
- Efficiency (training computation)
- Efficiency (inference latency)
Having a general network is more interesting - a single network that can solve multiple problems is more useful, and it applies to problems that are similar or that you don't even know you have yet. It can also be easier to “fine-tune” an existing network when you don't have enough data on a given problem to train one from scratch.
Efficiency is (in my opinion) the bigger one:
- To run a network, the entire network and its parameters must be stored in memory. These days, that is on the order of gigabytes for “interesting” networks. Putting a multiplier on this (one copy per task) makes scaling quite challenging.
- Training one general network may be harder, but it is much faster than training a new network from scratch for each problem. If you have thousands of problems, you don't want to be training thousands of networks.
- The size of “interesting” models makes inference challenging as well. The bigger (more interesting) the model, the more computation it must perform on each input; some modeling techniques loop and require thousands of passes for each input. This seems fine at first, but if a single pass takes more than 10ms, a thousand loops will take more than ten seconds. Usually this means the most interesting models have to be run on high-end cloud equipment, which brings further scaling challenges.
So your answer isn't wrong; the practicality of the situation just makes it infeasible when you are working with large models (vision, video, language, etc.) - the number of parameters is often in the billions.
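To make the memory point concrete, here is a back-of-envelope sketch (the parameter count and byte sizes are my own illustrative numbers, not tied to any particular model):

```python
# Rough memory needed just to hold a model's weights at inference time.
def model_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate GB to store the parameters (fp32 = 4 bytes each)."""
    return num_params * bytes_per_param / 1e9

print(model_memory_gb(7e9))          # a 7-billion-parameter model in fp32: ~28 GB
print(model_memory_gb(7e9, 2))       # the same model in fp16: ~14 GB
print(10 * model_memory_gb(7e9, 2))  # ten separate task-specific copies: ~140 GB
```

One general network amortizes that cost across every task it serves; one network per task multiplies it.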
chazzmoney t1_iw4edfl wrote
Reply to comment by StevenTM in Scientists Taught an AI to ‘Sleep’ So That It Doesn't Forget What It Learned, Like a Person. Researchers say counting sleep may be the best way for AIs to exhibit life-long learning. by mossadnik
Great question. First, some concepts that the explanation depends on…
A neural network is made up of “weights”. Each weight is a floating point value that multiplies whatever it receives from the prior neuron.
While no single neuron (or weight) necessarily does something easily understood on its own, each one is part of an overall algorithm that solves the trained problem.
When you take any network and train it to perform a specific task, the weights of the network are updated towards solving that specific task.
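As a minimal sketch of a single neuron (toy numbers of my own, not from any real network):

```python
import numpy as np

# One toy "neuron": multiply each incoming value by its weight,
# sum, add a bias, then apply a nonlinearity (ReLU here).
inputs = np.array([0.5, -1.2, 3.0])   # values received from prior neurons
weights = np.array([0.8, 0.1, 0.4])   # the learned floating point weights
bias = 0.2

output = max(0.0, float(np.dot(weights, inputs) + bias))
print(output)

# Training nudges `weights` and `bias` so that, across the whole
# network, the combined algorithm solves the trained task.
```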
So, with that background, let's go back to your question: why does catastrophic forgetting happen?
If you were to compare the weights from a network trained to detect pizza/hot dog and the weights from a network trained to detect cat/dog: the two networks would have some weights that were similar and some that were different.
This is because some parts of the algorithm to detect food and the algorithm to detect animals are the same, and some parts of the algorithms are different.
Thus, when you start with one network and train it for a different task, the “correct” (useful) parts remain and the “incorrect” (not useful) parts are replaced with useful ones: their weights get changed enough that those portions of the original algorithm stop working.
This is why catastrophic forgetting happens.
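One way to see this for yourself is a toy comparison (the tasks, layer sizes, and training settings below are entirely my own, just for illustration): start two identical copies of a small network, train each on a different rule, and look at how far corresponding weights end up from each other.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_net():
    return nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))

def train(model, label_rule, steps=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(64, 2)
        opt.zero_grad()
        loss_fn(model(x), label_rule(x)).backward()
        opt.step()

net_food = make_net()
net_animals = make_net()
net_animals.load_state_dict(net_food.state_dict())   # identical starting weights

train(net_food, lambda x: (x[:, 0] > 0).long())       # stand-in "pizza vs. hot dog" rule
train(net_animals, lambda x: (x[:, 1] > 0).long())    # stand-in "cat vs. dog" rule

# Parameters that stay close in both nets are the "shared" part of the algorithm;
# parameters that diverge encode the task-specific parts.
for (name, w_food), (_, w_animals) in zip(net_food.named_parameters(),
                                          net_animals.named_parameters()):
    print(name, (w_food - w_animals).abs().mean().item())
```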
The cause is basically:
- Having a network of a specific size / bias to be able to successfully train for one task.
- The algorithm for two different tasks being sufficiently different for some portion of the network.
- Training on each task separately.
Thus there are some “solutions” that have been and continue to be explored:
- Having a much larger network, with a bias towards generality rather than specificity.
- Adding / removing / training specific subsections of the network for each task, while leaving the earlier part as “general” (see the sketch after this list).
- Training for multiple purposes at the same time.
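To make that second approach concrete, here is a minimal PyTorch-style sketch (the layer sizes and names such as `backbone`, `head_food`, and `head_animals` are my own, purely illustrative): keep a shared “general” trunk frozen and train a small task-specific head for each new task.

```python
import torch
import torch.nn as nn

# Shared "general" trunk plus one small head per task.
backbone = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
)
head_food = nn.Linear(256, 2)     # pizza vs. hot dog
head_animals = nn.Linear(256, 2)  # cat vs. dog

# Freeze the shared trunk so later training cannot overwrite it...
for p in backbone.parameters():
    p.requires_grad = False

# ...and only train the new head for the new task.
optimizer = torch.optim.Adam(head_animals.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 32, 32)      # dummy batch of images
y = torch.randint(0, 2, (8,))      # dummy cat/dog labels
loss = criterion(head_animals(backbone(x)), y)
loss.backward()
optimizer.step()
```

Because the trunk's weights cannot move, training the cat/dog head cannot break whatever the pizza/hot-dog head relies on - at the cost of the trunk being stuck with whatever general features it already learned.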
chazzmoney t1_iw12gmj wrote
Reply to comment by smegdawg in Scientists Taught an AI to ‘Sleep’ So That It Doesn't Forget What It Learned, Like a Person. Researchers say counting sleep may be the best way for AIs to exhibit life-long learning. by mossadnik
ML “memory” is not the same as hardware “memory”.
There are multiple methods that can be utilized, but catastrophic forgetting refers to the modification of weights within a neural network such that it loses capabilities it previously had.
Let's say you train a system to identify cats and dogs. It gets to a specific accuracy, say 80% correct (20% incorrect) when choosing between the two.
Then, you stop training it to do that task, and instead start training it to identify pizza and hot dogs. It trains faster, and gets to an accuracy of 95% on pizzas and hot dogs.
Now, you go back and see how accurate it is at identifying cats and dogs - but it turns out that it is only 52% accurate, almost the same as a coin toss.
It has essentially “forgotten” how to perform the original task. This is catastrophic forgetting (no pun intended).
Edit: Cat-astrophic
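If you want to watch this happen, here is a small self-contained sketch (the synthetic data, network size, and training settings are my own invention, not from the paper): train a tiny network on task A, then on task B, then check task-A accuracy again.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(n=512, shift=0.0):
    # Two-class toy data; `shift` changes the labeling rule so the tasks differ.
    x = torch.randn(n, 2)
    y = ((x[:, 0] + shift * x[:, 1]) > 0).long()
    return x, y

task_a = make_task(shift=0.0)    # stand-in for "cats vs. dogs"
task_b = make_task(shift=-5.0)   # stand-in for "pizza vs. hot dogs", a different rule

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()

def train(data, steps=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        criterion(model(x), y).backward()
        opt.step()

def accuracy(data):
    x, y = data
    return (model(x).argmax(dim=1) == y).float().mean().item()

train(task_a)
print("task A after training on A:", accuracy(task_a))   # high

train(task_b)
print("task B after training on B:", accuracy(task_b))   # high
print("task A after training on B:", accuracy(task_a))   # much lower: "forgotten"
```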
chazzmoney t1_ja834yo wrote
Reply to comment by Jefethevol in Belarus to Mobilize 1.5 Million Potential Soldiers Outside the Armed Forces in Case of War, Senior Belarusian Official Claims by Leshracc
I don't remember the exact details, but I think the Syria situation was more like:
The US knew it was Russia. They called up and down the chain of command for four hours while restraining their defense and avoiding a significant attack. Every person they spoke with denied it was Russian. The US then let everyone know that they would attack if the attackers did not retreat. Another hour went by, and then they destroyed all of the attackers.
Edit: https://en.m.wikipedia.org/wiki/Battle_of_Khasham https://taskandpurpose.com/news/russian-mercenaries-syria-firefight/