seba07
seba07 t1_j6nxuiq wrote
I think the argument is quite often "because it works". You throw your data at a standard CNN (if you're dealing with images) and often get decent results quite fast.
seba07 t1_j69mjy3 wrote
Reply to [D] Interviewer asked to code neural network from scratch with plain python on a live call. Reasonable? by OkAssociation8879
What does that even mean? Just the matrix multiplication of a perceptron, or including backpropagation and an optimizer?
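For scale, the "matrix multiplication plus backpropagation" version of such an interview task can fit in a few dozen lines. This is a hypothetical sketch (all names and the toy AND-gate data are mine, not from the thread): a single sigmoid perceptron trained with plain-Python gradient descent.

```python
import math

def forward(w, b, x):
    # the "matrix multiplication" part: weighted sum, then sigmoid activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, y, lr=0.1):
    # the "backpropagation" part: gradient of binary cross-entropy
    p = forward(w, b, x)
    grad = p - y  # dL/dz simplifies to (p - y) for sigmoid + cross-entropy
    w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    b = b - lr * grad
    return w, b

# learn a simple AND-like rule from four samples
w, b = [0.0, 0.0], 0.0
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
for _ in range(2000):
    for x, y in data:
        w, b = train_step(w, b, x, y)
```

Anything beyond this (multiple layers, a real optimizer with momentum, batching) blows up quickly in plain Python, which is why the question's scope matters.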
seba07 t1_j46wuue wrote
Reply to comment by theDaninDanger in [R] Git is for Data (CIDR 2023) - Extending Git to Support Large-Scale Data by rajatarya
Isn't that the standard way research papers are written: you only compare your solution to methods that are worse than yours? ;)
seba07 t1_j0lg760 wrote
From my experience the loss function also plays an important part. Cross entropy forces the model to be very certain with a decision. Focal loss can produce a smoother output distribution.
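The difference is easy to see numerically. A minimal sketch (plain Python, my own function names) of binary cross-entropy versus focal loss for a correct positive prediction with probability p:

```python
import math

def cross_entropy(p):
    # standard binary cross-entropy for a positive label
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    # focal loss down-weights easy, already-confident examples
    # by the modulating factor (1 - p)^gamma
    return -((1 - p) ** gamma) * math.log(p)

# cross-entropy keeps penalising a prediction of 0.9, pushing the model
# toward extreme certainty; focal loss has mostly stopped caring
for p in (0.6, 0.9, 0.99):
    print(f"p={p}: CE={cross_entropy(p):.4f}, focal={focal_loss(p):.4f}")
```

Because the modulating factor vanishes as p approaches 1, the gradient pressure on already-confident examples drops off, which is why the output distribution comes out smoother.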
seba07 t1_ittpya9 wrote
Reply to comment by TiredOldCrow in [D] Tensorflow learning differently than Pytorch by aleguida
Or alternatively, don't use a pre-trained network for this test. PyTorch's randomness shouldn't be any better than TensorFlow's.
seba07 t1_itd3x25 wrote
A few random things I learned in practice: "Because it worked" is a valid answer to why you chose certain parameters or algorithms. Older architectures like ResNet are still state of the art for certain scenarios. Execution time is crucial; we often take the smallest models available.
seba07 t1_iqshjec wrote
Reply to [D] Types of Machine Learning Papers by Lost-Parfait568
And the "results are 0.x% better" papers are often about challenges that haven't been interesting for many years.
seba07 t1_iqltogp wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
I think the real answer for many ML problems is "because it works". Why are we using ReLU (= max(x, 0)) instead of sigmoid or tanh as layer activation nowadays? Math would discourage this since the derivative at 0 is not defined, but it's fast and it works.
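The "undefined at 0" issue is a non-problem in practice: frameworks simply pick a subgradient there. A small sketch (my own plain-Python functions, not any framework's actual implementation) of why ReLU wins anyway:

```python
import math

def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    # derivative is 1 for x > 0 and 0 for x < 0; at exactly x = 0 it is
    # undefined, so implementations just pick a subgradient (0 here)
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    # peaks at 0.25 and saturates toward 0 for large |x|, so gradients
    # shrink as they flow back through many sigmoid layers
    return s * (1.0 - s)
```

ReLU's gradient stays at 1 for all positive inputs and costs a single comparison, while sigmoid's gradient never exceeds 0.25 and requires an exp, which is most of the "it's fast and it works" story.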
seba07 t1_jcjsmxi wrote
Reply to [D] GPT-4 is really dumb by [deleted]
The model (in a nutshell) predicts the next word of the answer. It simply can't do math; that's just a known limitation.
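The failure mode is easy to reproduce in miniature. A toy sketch (my own example, using a trivial bigram "language model" rather than anything GPT-like) of pure next-word prediction with no arithmetic anywhere:

```python
from collections import Counter, defaultdict

# tiny training "corpus": one true and one false arithmetic statement
corpus = "two plus two is four . two plus three is four .".split()

# count which word follows which (a bigram model)
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def generate(word, n):
    # greedily emit the most frequent follower, n times
    out = [word]
    for _ in range(n):
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)
```

Asked to continue "three", the model produces "three is four", because "is" is always followed by "four" in its corpus. It reproduces surface statistics, not arithmetic, which is the same reason a much larger next-word predictor confidently emits wrong sums.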