neuralbeans

neuralbeans t1_jbjizpw wrote

It's a degenerate case, not something anyone should do. If you include Y in your input, then overfitting leads to the best generalisation, which shows that the input does affect overfitting. In fact, the more similar the input is to the output, the simpler the model can be, and thus the less it can overfit.

1

neuralbeans t1_jbiu3io wrote

Yes, if the features include the model's target output. In that case, overfitting would just make the model output that feature as-is. Of course this is a useless solution, but the more similar the features are to the output, the less of a problem overfitting becomes and the less data you need to generalise.

6
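The degenerate case above can be sketched numerically. This is an illustrative example (all names and values are made up, not from the thread): if the target y leaks into the feature matrix, ordinary least squares fits the identity mapping exactly, putting all its weight on the leaked column, and that "overfit" solution generalises perfectly to new data carrying the same leak.

```python
import numpy as np

rng = np.random.default_rng(0)
X_real = rng.normal(size=(100, 3))   # genuine features
y = rng.normal(size=100)             # target, unrelated to X_real
X = np.column_stack([X_real, y])     # degenerate case: y leaks into the input

# Least squares finds weight ~1 on the leaked column, ~0 elsewhere.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 6))

# Fresh data with the same leak is predicted perfectly:
X_new = np.column_stack([rng.normal(size=(10, 3)), (y_new := rng.normal(size=10))])
print(np.allclose(X_new @ w, y_new))
```

Since y sits exactly in the column span of X, the zero-residual solution is unique and is just "copy the leaked column", which is why this tells you nothing useful about real generalisation.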

neuralbeans t1_j7jdiqz wrote

A sequence length of 6250 is massive! It's not just 6250*6250 multiplications, since you're not multiplying one float per pair of sequence items. You're taking a dot product of the query and key vectors for every pair of sequence items, and this is done for every attention head (in parallel). I think you're seriously underestimating the problem.

What transformer is this which accepts a sequence length of 6250?

1
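A back-of-the-envelope sketch of the quadratic cost being described (the head count and head dimension below are assumed typical values, not taken from the thread): the attention score tensor alone is heads × seq_len × seq_len, and each score is a d_head-dimensional dot product.

```python
seq_len = 6250   # the sequence length from the thread
n_heads = 12     # assumed head count (illustrative)
d_head = 64      # assumed per-head dimension (illustrative)

scores = n_heads * seq_len * seq_len   # number of attention scores per layer
mults = scores * d_head                # multiplications for the QK^T dot products
gb_fp32 = scores * 4 / 1e9             # memory for the score tensor alone, fp32

print(f"{scores:,} scores, {mults:,} multiplies, {gb_fp32:.2f} GB for scores")
```

Even under these modest assumptions, a single layer's score tensor is hundreds of millions of entries, before counting the softmax, the value-weighted sum, or the backward pass.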

neuralbeans t1_j3bx6e5 wrote

AlphaGo does not use a large database of moves. If anything, the reason winning at Go was so impressive is that there are far too many board states to solve it the way you're describing. It learned to play by practising against itself using reinforcement learning.

14

neuralbeans t1_j1tlj8i wrote

Reply to AI and education by lenhoi

Is it that different from using calculators in primary school mathematics? I'd like to say that we need to come up with a higher-level task for assessing students that computers can't do, but I think we'll run out of options in a few years. Better to make the point that, even if AI can do a student's work (in the near future), it is still important for students to learn to do what the AI can also do. Unfortunately, that means using in-class tests only.

4