seba07
seba07 t1_j6nxuiq wrote
I think the argument is quite often "because it works". You throw your data at a standard CNN (if you're dealing with images) and often get decent results quite fast.
seba07 t1_j69mjy3 wrote
Reply to [D] Interviewer asked to code neural network from scratch with plain python on a live call. Reasonable? by OkAssociation8879
What does that even mean? Just the matrix multiplication of a perceptron, or including backpropagation and an optimizer?
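For scale, the "matrix multiplication plus backpropagation" version of such an interview task can fit in a few dozen lines. This is a hypothetical sketch (all names and the toy AND-gate data are mine, not from the thread): a single sigmoid perceptron trained with plain-Python gradient descent.

```python
import math

def forward(w, b, x):
    # the "matrix multiplication" part: weighted sum, then sigmoid activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, y, lr=0.1):
    # the "backpropagation" part: gradient of binary cross-entropy
    p = forward(w, b, x)
    grad = p - y  # dL/dz simplifies to (p - y) for sigmoid + cross-entropy
    w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    b = b - lr * grad
    return w, b

# learn a simple AND-like rule from four samples
w, b = [0.0, 0.0], 0.0
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
for _ in range(2000):
    for x, y in data:
        w, b = train_step(w, b, x, y)
```

Anything beyond this (multiple layers, a real optimizer with momentum, batching) blows up quickly in plain Python, which is why the question's scope matters.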
seba07 t1_j46wuue wrote
Reply to comment by theDaninDanger in [R] Git is for Data (CIDR 2023) - Extending Git to Support Large-Scale Data by rajatarya
Isn't that the standard way research papers are written: you only compare your solution to methods that are worse than yours? ;)
seba07 t1_j0lg760 wrote
From my experience the loss function also plays an important part. Cross entropy forces the model to be very certain with a decision. Focal loss can produce a smoother output distribution.
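The difference is easy to see numerically. A minimal sketch (plain Python, my own function names) of binary cross-entropy versus focal loss for a correct positive prediction with probability p:

```python
import math

def cross_entropy(p):
    # standard binary cross-entropy for a positive label
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    # focal loss down-weights easy, already-confident examples
    # by the modulating factor (1 - p)^gamma
    return -((1 - p) ** gamma) * math.log(p)

# cross-entropy keeps penalising a prediction of 0.9, pushing the model
# toward extreme certainty; focal loss has mostly stopped caring
for p in (0.6, 0.9, 0.99):
    print(f"p={p}: CE={cross_entropy(p):.4f}, focal={focal_loss(p):.4f}")
```

Because the modulating factor vanishes as p approaches 1, the gradient pressure on already-confident examples drops off, which is why the output distribution comes out smoother.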
seba07 t1_ittpya9 wrote
Reply to comment by TiredOldCrow in [D] Tensorflow learning differently than Pytorch by aleguida
Or alternatively, don't use a pre-trained network for this test. PyTorch's randomness shouldn't be any better than TensorFlow's.
seba07 t1_itd3x25 wrote
A few random things I learned in practice: "Because it worked" is a valid answer to why you chose certain parameters or algorithms. Older architectures like ResNet are still state of the art for certain scenarios. Execution time is crucial; we often take the smallest models available.
seba07 t1_iqshjec wrote
Reply to [D] Types of Machine Learning Papers by Lost-Parfait568
And the "results are 0.x% better" papers are often about challenges that haven't been interesting for many years.
seba07 t1_iqltogp wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
I think the real answer for many ML problems is "because it works". Why are we using ReLU (= max(x, 0)) instead of sigmoid or tanh as layer activation nowadays? Math would discourage this since the derivative at 0 is not defined, but it's fast and it works.
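The "undefined at 0" issue is a non-problem in practice: frameworks simply pick a subgradient there. A small sketch (my own plain-Python functions, not any framework's actual implementation) of why ReLU wins anyway:

```python
import math

def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    # derivative is 1 for x > 0 and 0 for x < 0; at exactly x = 0 it is
    # undefined, so implementations just pick a subgradient (0 here)
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    # peaks at 0.25 and saturates toward 0 for large |x|, so gradients
    # shrink as they flow back through many sigmoid layers
    return s * (1.0 - s)
```

ReLU's gradient stays at 1 for all positive inputs and costs a single comparison, while sigmoid's gradient never exceeds 0.25 and requires an exp, which is most of the "it's fast and it works" story.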
seba07 t1_jcjsmxi wrote
Reply to [D] GPT-4 is really dumb by [deleted]
The model (in a nutshell) predicts the next word of the answer. It simply can't do math; that's just a known limitation.
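The failure mode is easy to reproduce in miniature. A toy sketch (my own example, using a trivial bigram "language model" rather than anything GPT-like) of pure next-word prediction with no arithmetic anywhere:

```python
from collections import Counter, defaultdict

# tiny training "corpus": one true and one false arithmetic statement
corpus = "two plus two is four . two plus three is four .".split()

# count which word follows which (a bigram model)
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def generate(word, n):
    # greedily emit the most frequent follower, n times
    out = [word]
    for _ in range(n):
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)
```

Asked to continue "three", the model produces "three is four", because "is" is always followed by "four" in its corpus. It reproduces surface statistics, not arithmetic, which is the same reason a much larger next-word predictor confidently emits wrong sums.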