smegdawg t1_iw02s28 wrote on November 11, 2022 at 10:07 PM

Isn't this essentially the difference between RAM and ROM?

Or are they adding a third level RUM, Really-Unfungible-Memory?

chazzmoney t1_iw12gmj wrote on November 12, 2022 at 2:55 AM

ML “memory” is not the same as hardware “memory”.

There are multiple methods that can be utilized, but catastrophic forgetting refers to the modification of weights within a neural network such that it loses capabilities it previously had.

Let's say you train a system to identify cats and dogs. It gets to a specific accuracy, say 80% correct, 20% incorrect choosing between the two.

Then, you stop training it to do that task, and instead start training it to identify pizza and hot dogs. It trains faster, and gets to an accuracy of 95% on pizzas and hot dogs.

Now, you go back and see how accurate it is at identifying cats and dogs - but it turns out that it is only 52% accurate - almost the same as a coin toss.

It has essentially “forgotten” how to perform the original task. This is catastrophic forgetting (no pun intended).

Edit: Cat-astrophic
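
To make the cats/dogs-then-pizza/hot-dogs story concrete, here is a toy sketch. None of it comes from a real system: it is a single perceptron with two weights and a bias, the two four-point datasets are hand-made stand-ins for the two tasks, and the names (predict, train, accuracy, task_a, task_b) are invented for illustration. Task A depends on the first feature; task B depends on the second, and its examples happen to push the first weight the other way.

```python
def predict(w, b, x):
    """Return 1 if the weighted sum is positive, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train(w, b, data, epochs=20):
    """Perceptron rule: weights only change when a prediction is wrong."""
    w = list(w)  # work on a copy so the caller's list is untouched
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)
            w[0] += err * x[0]
            w[1] += err * x[1]
            b += err
    return w, b


def accuracy(w, b, data):
    """Fraction of examples classified correctly."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


# Hypothetical stand-ins for the two tasks: ((feature0, feature1), label).
task_a = [((2, 1), 1), ((3, -1), 1), ((-2, 1), 0), ((-3, -1), 0)]  # "cats vs dogs"
task_b = [((-2, 2), 1), ((2, -2), 0), ((-1, 3), 1), ((1, -3), 0)]  # "pizza vs hot dogs"

w, b = train([0, 0], 0, task_a)
acc_a_before = accuracy(w, b, task_a)

w, b = train(w, b, task_b)  # keep training the SAME weights, now on task B only
acc_b = accuracy(w, b, task_b)
acc_a_after = accuracy(w, b, task_a)

print(acc_a_before, acc_b, acc_a_after)  # 1.0 1.0 0.5
```

After the second round of training, the weight on the first feature has been pushed to zero, so the model is back to a coin toss on task A even though it is perfect on task B.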

smegdawg t1_iw15i2k wrote on November 12, 2022 at 3:23 AM

Interesting. Thanks for the ELI5

me_team t1_iw1kxcx wrote on November 12, 2022 at 6:00 AM

Deep down I know that pun was absolutely intended. Sly!

StevenTM t1_iw3mng5 wrote on November 12, 2022 at 6:28 PM

Yes but why does it happen?

chazzmoney t1_iw4edfl wrote on November 12, 2022 at 9:43 PM

Great question. First, some concepts that the explanation depends on…

A neural network is made up of “weights”. These weights are floating point values which are multiplied by whatever they receive from the prior neuron.
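
As a rough sketch of what "multiplied by whatever they receive" means for one neuron: each input is multiplied by its weight and the results are summed. The bias term and the ReLU activation below are common choices that I am assuming for the example (they are not mentioned above), and the numbers are made up.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, total)  # ReLU: negative sums become 0


inputs = [2.0, 4.0, 1.0]      # values arriving from the previous layer
weights = [0.5, 0.25, -1.0]   # one floating point weight per input
output = neuron(inputs, weights, bias=0.5)
print(output)  # 1.5
```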

While each neuron separately doesn’t necessarily do something easily understood / clear, each neuron (and weight) is part of an overall algorithm that solves the trained problem.

When you take any network and train it to perform a specific task, the weights of the network are updated towards solving that specific task.

So, with that background, let's go back to your question - why does catastrophic forgetting happen?

If you were to compare the weights from a network trained to detect pizza/hot dog and the weights from a network trained to detect cat/dog: the two networks would have some weights that were similar and some that were different.

This is because some parts of the algorithm to detect food and the algorithm to detect animals are the same, and some parts of the algorithms are different.

Thus, when you start with one network and train for a different task, the “correct” (useful) parts remain and the “incorrect” (not useful) parts are replaced with useful parts. For the parts that are not useful to the new task, the weights get changed enough that the corresponding part of the original algorithm stops working.

This is why catastrophic forgetting happens.

The cause is basically:

Having a network of specific size / bias to be able to successfully train for one task.
The algorithm for two different tasks being sufficiently different for some portion of the network.
Training on each task separately.

Thus there are some “solutions” which have been, and continue to be, explored:

Having a much larger network, with a bias towards generality rather than specificity.
Adding / removing / training specific subsections of the network for each task - and leaving the earlier part as “general”
Training for multiple purposes at the same time
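
The last option, training for multiple purposes at the same time, can be sketched with a toy perceptron. Everything here is an illustrative assumption: the two tiny datasets are hand-made stand-ins for two tasks, the function names are invented, and a real multi-task setup would look very different. The point is only that when the examples from both tasks are mixed into one training set, the weights have to settle somewhere that works for both.

```python
def predict(w, b, x):
    """Return 1 if the weighted sum is positive, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train(w, b, data, epochs=20):
    """Perceptron rule: weights only change when a prediction is wrong."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)
            w[0] += err * x[0]
            w[1] += err * x[1]
            b += err
    return w, b


def accuracy(w, b, data):
    """Fraction of examples classified correctly."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


# Hypothetical stand-ins for two tasks: ((feature0, feature1), label).
task_a = [((2, 1), 1), ((3, -1), 1), ((-2, 1), 0), ((-3, -1), 0)]
task_b = [((-2, 2), 1), ((2, -2), 0), ((-1, 3), 1), ((1, -3), 0)]

# Train on both tasks together instead of one after the other.
w, b = train([0, 0], 0, task_a + task_b)

joint_acc_a = accuracy(w, b, task_a)
joint_acc_b = accuracy(w, b, task_b)
print(joint_acc_a, joint_acc_b)  # 1.0 1.0
```

This only works here because a single set of weights that solves both toy tasks happens to exist; with a network that is too small for the combined problem, joint training would trade accuracy between the tasks instead.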

StevenTM t1_iw4gxco wrote on November 12, 2022 at 10:02 PM

Why isn't it possible to just create a copy of the weights for task 1/task 2 at various points, or even continuously?

Storage space is ridiculously cheap, and even high-powered debugging traces (like Microsoft's iDNA (TTT Debugging), which basically captures full process dumps in millisecond increments, albeit for a single running process) aren't THAT huge.

Then when you re-run task 1, it just uses the weights from the latest snapshot for task 1. I don't see why it wouldn't or what the benefit of using the (obviously mismatched) weights from task 2 would be (while running task 1).

I mean... I know it sounds like a stupidly obvious suggestion, and I'm fairly certain it isn't used as a solution, but I can't figure out why.
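
In code, the idea would look roughly like the sketch below. The weight dictionary, the task names, and the helper functions are all made up for illustration; this is just the "keep a copy per task" suggestion written out, not how any particular framework does it.

```python
import copy

# Hypothetical "live" weights of a network, with made-up values.
weights = {"layer1": [0.2, -0.7], "layer2": [1.5]}
snapshots = {}


def save_snapshot(task_name):
    """Keep an independent copy of the current weights under a task name."""
    snapshots[task_name] = copy.deepcopy(weights)


def load_snapshot(task_name):
    """Return a fresh copy of the weights saved for that task."""
    return copy.deepcopy(snapshots[task_name])


save_snapshot("cats_vs_dogs")

# Pretend that training on the second task overwrote the live weights.
weights["layer1"] = [9.9, 9.9]
save_snapshot("pizza_vs_hot_dogs")

restored = load_snapshot("cats_vs_dogs")
print(restored["layer1"])  # [0.2, -0.7]
```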

chazzmoney t1_iw4l6c8 wrote on November 12, 2022 at 10:33 PM

If you only need to detect hotdog/pizza and dog/cat, it's a fine solution. I was using those as examples, but usually it's much more drastic - “transcribe speech to text” and “identify speech from background noise” and “identify speaker”. Or “answer trivia question”, “fill in the blank with the correct choice”, “determine if the text has a positive or negative sentiment”, “determine the main topic of the text”, etc. These are quite complicated tasks.

Thus, there are a few reasons it doesn’t work in practice:

Generality
Efficiency (hardware memory)
Efficiency (training computation)
Efficiency (inference latency)

Having a general network is more interesting - a single network that can solve multiple problems is more useful and more applicable to problems that may be similar or that you don’t know you have yet. It can also be easier to “fine-tune” an existing network when you don’t have enough data on a given problem to train one from scratch.

Efficiency is (in my opinion) the bigger one:

To run a network, the entire network and parameters for it must be stored in memory. These days, they are on the order of gigabytes for “interesting” networks. Putting a multiplier (multiple networks) on this makes scaling quite challenging.
Training one general network may be harder, but it is much faster than training a new network from scratch for each problem. If you have thousands of problems, you don’t want to be training thousands of networks.
The size of “interesting” models makes inference challenging as well. The bigger (more interesting) the model, the more computation it must perform on each input; some modeling techniques are loops and require thousands of runs for each input. This seems fine at first, but if a single run takes 10ms, a thousand loops will take ten seconds. Usually this means that the most interesting models have to be used on high end cloud equipment, which brings about further scaling challenges.

So your answer isn’t wrong; the practicality of the situation just makes it infeasible when you are working with large models (vision, video, language, etc) - the number of parameters is often in the billions.
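
A back-of-the-envelope version of the memory and latency points, with numbers I am assuming purely for illustration: 1 billion parameters stored as 32-bit floats (4 bytes each), one full copy of the network per task, 1,000 tasks, and the 10 ms per run from the latency example above.

```python
# Memory: assumed 1 billion parameters at 4 bytes each (float32).
params = 1_000_000_000
bytes_per_param = 4
num_tasks = 1_000  # assumed: one separate copy of the network per task

gb_per_network = params * bytes_per_param / 1e9
tb_for_all_tasks = gb_per_network * num_tasks / 1e3

# Latency: 10 ms per run, 1,000 runs for each input.
ms_per_run = 10
runs_per_input = 1_000
seconds_per_input = ms_per_run * runs_per_input / 1_000

print(gb_per_network, tb_for_all_tasks, seconds_per_input)  # 4.0 4.0 10.0
```

So one such network is about 4 GB, a thousand per-task copies are about 4 TB, and a looped model at 10 ms per run needs about 10 seconds per input.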

StevenTM t1_iw4nmz3 wrote on November 12, 2022 at 10:52 PM

Thank you for the in-depth answer, it was very interesting!

chazzmoney t1_iw4s9ic wrote on November 12, 2022 at 11:26 PM

Of course - thanks for the great questions!

LeftWingMan t1_iw13m8n wrote on November 12, 2022 at 3:06 AM

Using REM to turn RAM into ROM.

teddyspaghetti t1_iw28aor wrote on November 12, 2022 at 11:30 AM

Hold my RUM

FibroBitch96 t1_iw12y7v wrote on November 12, 2022 at 3:00 AM

Why is the RUM always gone?

MisterVelveteen t1_iw1awb3 wrote on November 12, 2022 at 4:13 AM

Probably not, and more like a Hebbian network

herrkuchenbaecker t1_iw35nqb wrote on November 12, 2022 at 4:32 PM

more like REM amirite?!

not_old_redditor t1_ivzs649 wrote on November 11, 2022 at 8:51 PM

Obviously the AI's not pulling out a pillow and lying down for 6-9 hours. It's just a clickbaity title.

Smallrequaza t1_iw01oyq wrote on November 11, 2022 at 9:59 PM

but it's not like it's just sitting around idle for 8 hours which is what one would assume a computer “sleeping” would look like

japgolly t1_iw04kx8 wrote on November 11, 2022 at 10:20 PM

Brains aren't idle when we sleep. I guess AI sleeping would be similar in that it stops taking in new inputs (stimuli) and just does internal processing for a while. Pretty similar to how many big systems (e.g. banking) have daily offline periods when batch processes run.

usgrant7977 t1_iw0l2ma wrote on November 12, 2022 at 12:28 AM

So AI sleep is just defragging the drive?

japgolly t1_iw0tl8h wrote on November 12, 2022 at 1:38 AM

I mean, I kinda think of sleep as defragging the brain so 😆

Alex_c666 t1_iw0ipzm wrote on November 12, 2022 at 12:09 AM

When I read "taught AI to sleep" I'm reminded that not only does the vast majority not understand AI, but the writers of such articles perpetuate this ignorance.

epelle9 t1_iw02t31 wrote on November 11, 2022 at 10:07 PM

How is that different from sleep though?

It needs a certain time where the main process goes idle and it processes memories instead.

empty_string_ t1_iw04y1c wrote on November 11, 2022 at 10:23 PM

Saving one dataset to another and then wiping the active set is a common computer function. You can definitely make a comparison to how this is similar to humans and sleeping.

The part that bugs me is saying they "taught" the AI to "sleep", as if it now has some independent concept of resting and clearing its mind to be refreshed. It sounds more like they "programmed" the AI to "save data".

epelle9 t1_iw063m3 wrote on November 11, 2022 at 10:32 PM

Well, I think you are looking at both human sleep and the transfer of data too much on the surface level.

"Resting and feeling refreshed" come mostly from your brain processing data, and storing it in a different format for long term storage.

That's why you can be under anesthesia for 8 hours and not wake up feeling rested and refreshed as if you would've slept and processed those memories. REM sleep is basically the equivalent of processing human RAM for long term storage.

Also, saving one data set to another is likely not what's going on in the AI. I would bet that it is processing the data set and storing it in a different data structure that allows better long term storage and different compression, then clearing the short term memory.

Sleep science is still a pretty undeveloped field, but I wouldn't be surprised if both processes are very similar. I'm willing to bet that the research in both fields will complement each other very nicely.

I get your point though, we still don't even know enough about sleep to program a computer to do it like us, but this is definitely linked to sleep.

Aggravating_Moment78 t1_iw0lfwu wrote on November 12, 2022 at 12:31 AM

The program needs to save data, how’s that “sleeping”? And they didn’t “teach” it, they programmed it to save data

orbitaldan t1_iw0ppwc wrote on November 12, 2022 at 1:05 AM

It's not saving like a traditional computer program. Neural networks don't have compartmentalized files like that, just large blocks of numbers that control how strongly the simulated synapses react. What they have created is a sleep-like process that combines two blocks of numbers without discarding the learning contained within.

Aggravating_Moment78 t1_iw0rysc wrote on November 12, 2022 at 1:24 AM

Well yes, they programmed it to save data before it needs to discard it, hardly a new concept in the field... 😉

orbitaldan t1_iw11bii wrote on November 12, 2022 at 2:45 AM

Wow. They should hire you, since you could have saved them so much trouble years ago!

Aggravating_Moment78 t1_iw144dq wrote on November 12, 2022 at 3:10 AM

I was just making an observation on the wording of the article, if that’s not clear. It is probably still a valuable development; it’s just not something you’d describe in human terms 😉

Ykieks t1_iw1lbpv wrote on November 12, 2022 at 6:05 AM

Original paper if you are interested

empty_string_ t1_iw1t0ql wrote on November 12, 2022 at 7:46 AM

Awesome, thank you!

sshwifty t1_iw0xtd2 wrote on November 12, 2022 at 2:14 AM

So....supervised learning in a while true loop

poemmys t1_iw02v76 wrote on November 11, 2022 at 10:08 PM

So they taught an AI to move data from RAM into storage over a couple hours, something any computer does in microseconds? How useless. Why do people seem intent on forcing human features onto AI with no benefit? Part of the allure of AI in the first place is that it doesn't need to sleep or eat like humans do.

cowlinator t1_iw1gsj0 wrote on November 12, 2022 at 5:14 AM

It is analogous tho. We know that this is one of the purposes of sleep in biological brains.

Also, neural net memory is not stored as data in the same way that it is stored on a conventional computer. The memory is in the weights of the model of the network. This is not trivial to "save" or "load". I don't think such a thing has been done before.

empty_string_ t1_iw1j6oi wrote on November 12, 2022 at 5:39 AM

Data is data. It is trivial to save and load. If you can process it, you can save it.

In machine learning it's not groundbreaking to save a learning set and then start a new one. It's like saving or loading anything else. The neat thing that it sounds like these people have done (presumably, since no real detail is given) is create some sort of fancy algorithm for deciding HOW and WHEN these different data banks are saved, loaded, and used in relation to one another.
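
To show what I mean by "data is data", here is a minimal sketch that writes a set of weights to disk and reads it back. The weight values, the layer names, and the file name are made up, and JSON is just one convenient format I picked for the example; real frameworks have their own.

```python
import json
import os
import tempfile

# Made-up weights standing in for a trained model.
weights = {"layer1": [0.2, -0.7], "layer2": [1.5]}

# Save to a file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "weights.json")
with open(path, "w") as f:
    json.dump(weights, f)

# Load it back and compare with what was saved.
with open(path) as f:
    loaded = json.load(f)

print(loaded == weights)  # True
```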

bedpimp t1_iw1j7xp wrote on November 12, 2022 at 5:40 AM

Sleep? ETL? It’s the same picture!