
Ulfgardleo t1_j1vjc6q wrote

I don't think this is one of those cases. The question we want to answer is whether the texts are good enough that humans will not pick up on them. Making the task as hard as possible for humans is not indicative of real-world performance once people are presented with such texts more regularly.

1

Ulfgardleo t1_j1afdic wrote

Other projects do not "completely dismiss" this issue; they are researching other aspects of fusion. For example, Wendelstein 7-X does not investigate breeding, not because they don't find it interesting, but because they want to show that stellarator confinement works and that we have finally understood plasma physics.

3

Ulfgardleo t1_iza5481 wrote

The Hessian is always better information than the natural gradient, because it includes the actual curvature of the objective function, while the NG only includes the curvature of the model. So any second-order TR approach with NG information will approach the Hessian.

//edit: I am assuming actual trust-region methods, like TR-Newton, and not some RL/ML approximation schemes.
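To make the difference concrete, here is a minimal numpy sketch (toy problem and names are my own): for a squared-error objective with fixed noise variance, the natural-gradient/Fisher matrix coincides with the Gauss-Newton matrix J^T J, which only sees the curvature of the model, while the full Hessian additionally carries the residual-weighted second derivatives of the model.

```python
import numpy as np

# Toy 1-parameter nonlinear least squares: model f(w, x) = exp(w*x), loss = 0.5*sum((f - y)^2)
x = np.array([0.1, 0.5, 1.0, 1.5])
y = np.array([1.0, 1.4, 2.0, 3.5])
w = 0.7

f = np.exp(w * x)                  # model outputs
r = f - y                          # residuals
J = (x * f)[:, None]               # Jacobian df/dw, shape (n, 1)
d2f = (x ** 2) * f                 # second derivative d^2 f / dw^2 per sample

# Fisher / Gauss-Newton matrix: only the curvature of the model enters
fisher = J.T @ J

# Full Hessian of the loss: adds the residual-weighted curvature of the model
hessian = J.T @ J + (r * d2f).sum()

# A trust-region Newton step uses `hessian`; a natural-gradient step only sees `fisher`.
print("Fisher/Gauss-Newton:", fisher.item())
print("Hessian:            ", hessian.item())
```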

4

Ulfgardleo t1_iz437fi wrote

I have a story about the one time I was invited as external evaluator for an MSc thesis. I agreed, opened it later, and then realized it was a comparison of 10 animal-migration algorithms.

This thesis sat on my desk for WEEKS because I did not know how to grade it. How do you grade pseudoscience?!? It is not the students' fault that they fell prey to this topic, but I also can't condone them not figuring out that it IS pseudoscience.

3

Ulfgardleo t1_iz3q7pd wrote

I did not. I did it for Hinton.

A heuristic can be useful without proof, especially for tasks that are very difficult to solve. However, you have to supply strong theoretical arguments for why it should work. A biological analogue is not enough, especially one that we do not understand either.

Otherwise you end up like that other category of nature-inspired optimization heuristics that pretend to optimize by mimicking the hunting patterns of the Harris hawk. And I wish I had made this up just now.

8

Ulfgardleo t1_ixhlloe wrote

But he does not claim that. What he does claim is that developments in his lab predate those ideas. It may be that those ideas were rediscovered independently by others, but as with so many things, who was first matters. And from a scientific point of view, not doing a literature review for one's own ideas is bad science, especially so if a lab strategically avoids citing some other lab.

It shouldn't be so difficult to write "idea X [1], rediscovered by [more prominent 2], has led to" in one's work.

17

Ulfgardleo t1_ivxodj0 wrote

  1. Okay, you completely confuse everyone in the ML community when you call inputs "labels". Let's stick with inputs/outputs.

  2. This is good, because it allows you to estimate a crude measure of the quality of the physics model.

So, label noise is a broad field. I am mostly knowledgeable about the classification setting, where label noise has different effects. Moreover, you are not in the standard noisy-label setting, because the noise is not independent of the label, so just using weights will be difficult. Similarly, if you have more than one output to predict, a single weight is difficult to compute.

The standard way to derive all of these methods is by noting that the MSE can be derived as the negative log-probability of the normal distribution p(y|f), where y is the ground truth, f is the mean, and the variance is some fixed value. For the MSE, the value of the variance does not matter as long as it stays fixed, but with fairly little effort you can show that as soon as you give samples individual variances, this amounts to weighting the MSE.
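Spelled out (my notation), the per-sample negative log-likelihood of a Gaussian with mean f_i and variance sigma_i^2 is

$$-\log p(y_i \mid f_i, \sigma_i^2) = \frac{(y_i - f_i)^2}{2\sigma_i^2} + \frac{1}{2}\log\sigma_i^2 + \text{const},$$

so a shared fixed variance reduces to the ordinary MSE up to constants, while per-sample variances weight each squared error by 1/(2 sigma_i^2); the log sigma_i^2 term is what keeps learnable variances from simply growing until the data is ignored.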

So, the cheapest approach would be to give outcomes from the different sources different variances; if you have more than one output, you will also have more than one variance. How do you guess these parameters? Well, make them learnable and train them together with your model parameters (see the sketch below).

Of course you can make it arbitrarily complicated. Since your cheap labels come from a physics simulation, the errors are likely correlated, so you could learn a full covariance matrix. And from there you can go as far as you like by making the error distribution more complex, but you will probably not have enough data to do so.
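A minimal PyTorch-style sketch of the cheap version from above (the two-source setup, class name, and all hyperparameters are my own assumptions, not a prescription): one learnable log-variance per data source, trained jointly with the network by minimizing the Gaussian negative log-likelihood.

```python
import torch
import torch.nn as nn

class HeteroscedasticLoss(nn.Module):
    """Gaussian NLL with one learnable log-variance per data source."""
    def __init__(self, n_sources=2):
        super().__init__()
        # log sigma^2 per source; learned jointly with the model parameters
        self.log_var = nn.Parameter(torch.zeros(n_sources))

    def forward(self, pred, target, source_id):
        # source_id: LongTensor of shape (batch,), e.g. 0 = simulation, 1 = measurement (assumed)
        log_var = self.log_var[source_id]
        # weighted squared error + log-variance penalty that keeps variances from blowing up
        return (0.5 * torch.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var).mean()

# usage sketch with a dummy network and random data
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
criterion = HeteroscedasticLoss(n_sources=2)
optimizer = torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()), lr=1e-3)

x = torch.randn(32, 8)
y = torch.randn(32)
source_id = torch.randint(0, 2, (32,))

loss = criterion(model(x).squeeze(-1), y, source_id)
loss.backward()
optimizer.step()
```

Parameterizing log sigma^2 instead of sigma^2 keeps the parameter unconstrained, and after training, exp(log_var) gives a rough estimate of how noisy each source is relative to the other.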

1

Ulfgardleo t1_ivvjvsd wrote

  1. You said "unreliable outputs". Did you mean inputs? If you truly meant outputs (i.e., the material properties that you want to predict from some so-far-undefined inputs), then that is what ML calls a "label".
  2. Okay, I have the same issue here. Typically, ground truth would be what we call the label, but I can see that you would distinguish between simulated and measured ground-truth data.
  3. "Model" here is the physics-based model, not the ML model, right?
  4. I don't see it answered, so let me ask explicitly: is there any experimental measurement for which you also have the physics-model output?
  5. You lost me here.

1

Ulfgardleo t1_ivvfosf wrote

Okay, there are a few questions:

  1. What is unreliable: the inputs or the labels? Is your problem even supervised?
  2. What do you want to learn?
  3. Is it possible to quantify the reliability of each source? Is it just higher variance, or also bias?
  4. Do cases exist for which you have both reliable and unreliable data?
  5. What data do you finally predict on, the reliable or the unreliable data?

1

Ulfgardleo t1_isec031 wrote

Technically, that holds true for all real-number representations of vectors we have. Your computer computes stuff in "pretend" vector spaces, and it just happens to approximately work out. Sometimes you have to care about the numerics, but most of the time you are fine.

//edit: Having said that, there are ways to make it work mathematically. A vector space is typically defined over a field, e.g., the real or complex numbers, and the scalars are limited to that field. The same construction over a ring (like the integers) is called a module. So a quantized vector (for example over the integers), with the scalars limited to integers as well, DOES work out. The only thing you have to deal with is that division in a ring might not exist.
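A small Python illustration of both halves of this (the example itself is mine): floating-point vectors only approximately obey the vector-space axioms, while integer vectors with integer scalars behave exactly, except that scalar division may leave the ring.

```python
import numpy as np

# Floats: the field axioms only hold approximately on IEEE-754 hardware
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))    # False: addition is not even associative

# Integers: Z^3 with integer scalars is a module over the ring Z, and everything is exact
v = np.array([2, 4, 6], dtype=np.int64)
print(3 * v)                                      # [ 6 12 18] stays inside Z^3
print(np.array_equal((v + v) - v, v))             # True, no rounding anywhere

# ... but there is no scalar division: 1/3 is not in Z, so floor division loses information
print(3 * (v // 3))                               # [0 3 6] != v
```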

4

Ulfgardleo t1_irv9w00 wrote

Quite easy to prove.

Take a multi-class classification problem. Now pick one class and assign it label 0, assign all other classes the same coarse label 1, and try to find the maximum-margin classifier. This problem is equivalent to finding a convex polytope that separates class 0 from class 1 with maximum margin, which is NP-hard. Logistic regression is not much better, just more difficult to prove.

This is already NP-complete when the coarse label encompasses two classes: https://proceedings.neurips.cc/paper/2018/file/22b1f2e0983160db6f7bb9f62f4dbb39-Paper.pdf
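To make the geometry concrete, here is a toy 2D illustration (the data, thresholds, and use of LinearSVC are my own choices; this shows the polytope reformulation, not the hardness proof): class 0 sits between two clusters that share the coarse label 1, so no single hyperplane separates them, but the intersection of two halfspaces, i.e. a convex polytope, does.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# class 0 sits in the middle; the coarse class 1 is the union of two original classes
X0 = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))     # fine class -> label 0
X1 = rng.normal(loc=[-3, 0], scale=0.3, size=(50, 2))    # fine class A -> coarse label 1
X2 = rng.normal(loc=[+3, 0], scale=0.3, size=(50, 2))    # fine class B -> coarse label 1

X = np.vstack([X0, X1, X2])
y = np.array([0] * 50 + [1] * 100)

# a single max-margin hyperplane cannot separate class 0 from the coarse class ...
svm = LinearSVC(C=1e3).fit(X, y)
print("linear SVM accuracy:", svm.score(X, y))            # roughly 2/3, well below 1.0

# ... but the polytope -1.5 < x_1 < 1.5 (intersection of two halfspaces) does
inside = (X[:, 0] > -1.5) & (X[:, 0] < 1.5)
print("polytope accuracy:  ", np.mean((~inside) == (y == 1)))   # 1.0
```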

3