nucLeaRStarcraft t1_jc1334g wrote
Why is this tagged [R]? This is a commercial project at best. Where's the paper, where's the code? Can we use it today on our PC like Whisper? This really isn't "research".
nucLeaRStarcraft t1_jb9289f wrote
Reply to comment by etesian_dusk in [N] tinygrad 0.5.0 released by Balance-
They claim it's fast on Apple M1 and some embedded ARM devices, but I have no idea how easy it is to use out of the box.
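For context, a minimal sketch of what "out of the box" usage looks like, modeled on tinygrad's README-style Tensor/autograd example; exact module paths and keyword arguments may differ between versions, so treat this as an assumption rather than the precise 0.5.0 interface.

```python
# Hedged sketch of tinygrad's README-style autograd API; paths/kwargs may
# differ across versions.
from tinygrad.tensor import Tensor

x = Tensor.eye(3, requires_grad=True)               # 3x3 identity, tracked for grads
y = Tensor([[2.0, 0.0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()                               # simple scalar "loss"
z.backward()                                        # reverse-mode autodiff

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```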
nucLeaRStarcraft t1_j6nunti wrote
Reply to comment by qalis in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
There's also this survey of DL vs traditional methods for tabular data: https://arxiv.org/pdf/2110.01889.pdf
nucLeaRStarcraft t1_j08cjvc wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers from scratch using TensorFlow 2.x by TensorDudee
I agree with you: if we want to test the architecture itself, we should use the same training procedure, including pre-training.

My theory is that, given the current results of GPT-like models, which use transformers under the hood, and given that these groups have the compute power and data to train non-attention-based recurrent models as well, it's quite unlikely that the architecture isn't a main contributor.
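To make the "same training procedure" point concrete, here is a minimal TensorFlow 2.x sketch where only the architecture-building function changes and everything else (data, optimizer, schedule, epochs) is held fixed. `build_vit` and `build_cnn` are hypothetical placeholders, not code from the linked implementation.

```python
import tensorflow as tf

def train_and_evaluate(build_model, train_ds, val_ds, epochs=10):
    """Hold data, optimizer, and schedule fixed; only the architecture varies."""
    model = build_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(train_ds, validation_data=val_ds, epochs=epochs, verbose=0)
    return model.evaluate(val_ds, verbose=0)

# Hypothetical builders -- a ViT and a CNN with comparable parameter counts.
# results = {name: train_and_evaluate(fn, train_ds, val_ds)
#            for name, fn in {"vit": build_vit, "cnn": build_cnn}.items()}
```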
nucLeaRStarcraft t1_j07bufu wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers from scratch using TensorFlow 2.x by TensorDudee
We're generally trying to make the most of the available labeled data. If the Transformer can ingest more data and, in the end, performs better than any non-attention-based model given the same amount of data, then it's a better architecture.

You're asking a fair question, though. I think the body of recent work shows that the Transformer does generalize better; otherwise we'd see similar results from non-transformer-based architectures, since the data and compute are already there for the groups doing this kind of research.
nucLeaRStarcraft t1_ittmx5p wrote
Reply to comment by AllowFreeSpeech in [N] OpenAI Gym and a bunch of the most used open source RL environments have been consolidated into a single new nonprofit (The Farama Foundation) by jkterry1
fărâmă in Romanian (faa-rae-mae) means a grain or a (very) small amount (i.e. a grain of salt, a crumb of bread, etc.)
nucLeaRStarcraft t1_jcoo30z wrote
Reply to comment by -xylon in [D] Unit and Integration Testing for ML Pipelines by Fender6969
More or less the same. However, the simplest way to start, at least in my experience, is to randomize a subsample of real data. Synthetic data can be too simple or fail to capture the real distribution, and that can hide bugs.

Probably using both is the ideal solution.
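For what it's worth, a minimal pytest sketch of the "randomized subsample of real data plus a synthetic case" idea; `run_pipeline` and the CSV path are hypothetical placeholders for whatever your pipeline's entry point and data source are.

```python
# Sketch: test the pipeline on a fixed-seed subsample of real data and on
# synthetic data. `run_pipeline` and the file path are hypothetical.
import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def real_subsample():
    df = pd.read_csv("data/real_training_data.csv")          # hypothetical path
    return df.sample(n=min(500, len(df)), random_state=42)   # fixed seed => reproducible

def test_pipeline_on_real_subsample(real_subsample):
    preds = run_pipeline(real_subsample)                      # hypothetical entry point
    assert len(preds) == len(real_subsample)
    assert not np.isnan(preds).any()

def test_pipeline_on_synthetic_data():
    df = pd.DataFrame({
        "feature": np.random.randn(100),
        "label": np.random.randint(0, 2, 100),
    })
    preds = run_pipeline(df)
    assert len(preds) == 100
```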