nucLeaRStarcraft t1_jc1334g wrote
Why is this tagged [R]? This is a commercial project at best. Where's the paper, where's the code? Can we use it today on our PC like Whisper? This really isn't "research".
nucLeaRStarcraft t1_jb9289f wrote
Reply to comment by etesian_dusk in [N] tinygrad 0.5.0 released by Balance-
They claim it's fast on Apple M1 and some embedded ARM devices, but I have no idea how easy it is to use out of the box.
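For context, a minimal sketch of what "out of the box" usage looks like, modeled on tinygrad's README-style Tensor/autograd example; exact module paths and keyword arguments may differ between versions, so treat this as an assumption rather than the precise 0.5.0 interface.

```python
# Hedged sketch of tinygrad's README-style autograd API; paths/kwargs may
# differ across versions.
from tinygrad.tensor import Tensor

x = Tensor.eye(3, requires_grad=True)               # 3x3 identity, tracked for grads
y = Tensor([[2.0, 0.0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()                               # simple scalar "loss"
z.backward()                                        # reverse-mode autodiff

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```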
nucLeaRStarcraft t1_j6nunti wrote
Reply to comment by qalis in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
There's also this survey of DL vs traditional methods for tabular data: https://arxiv.org/pdf/2110.01889.pdf
nucLeaRStarcraft t1_j08cjvc wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers from scratch using TensorFlow 2.x by TensorDudee
I agree with you: if we want to test the architecture itself, we should use the same training procedure, including pre-training.

My theory is that, given the current results of GPT-like models, which use transformers under the hood, and given that these groups have the compute power and data to train non-attention-based recurrent models as well, it's quite unlikely that the architecture isn't a main contributor.
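To make the "same training procedure" point concrete, here is a minimal TensorFlow 2.x sketch where only the architecture-building function changes and everything else (data, optimizer, schedule, epochs) is held fixed. `build_vit` and `build_cnn` are hypothetical placeholders, not code from the linked implementation.

```python
import tensorflow as tf

def train_and_evaluate(build_model, train_ds, val_ds, epochs=10):
    """Hold data, optimizer, and schedule fixed; only the architecture varies."""
    model = build_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(train_ds, validation_data=val_ds, epochs=epochs, verbose=0)
    return model.evaluate(val_ds, verbose=0)

# Hypothetical builders -- a ViT and a CNN with comparable parameter counts.
# results = {name: train_and_evaluate(fn, train_ds, val_ds)
#            for name, fn in {"vit": build_vit, "cnn": build_cnn}.items()}
```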
nucLeaRStarcraft t1_j07bufu wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers from scratch using TensorFlow 2.x by TensorDudee
We're generally trying to make the most of the available labeled data. If the Transformer can ingest more data and, in the end, performs better than any non-attention-based model given the same amount of data, then it's a better architecture.

You're asking a fair question, though. I think the body of recent work shows that the Transformer does generalize better; otherwise we'd see similar results from non-transformer-based architectures, since the data and compute are already there for the groups doing this kind of research.
nucLeaRStarcraft t1_ittmx5p wrote
Reply to comment by AllowFreeSpeech in [N] OpenAI Gym and a bunch of the most used open source RL environments have been consolidated into a single new nonprofit (The Farama Foundation) by jkterry1
fărâmă in Romanian (faa-rae-mae) means a grain or a (very) small amount (i.e. a grain of salt, a crumb of bread, etc.)
nucLeaRStarcraft t1_jcoo30z wrote
Reply to comment by -xylon in [D] Unit and Integration Testing for ML Pipelines by Fender6969
More or less the same. However, the simplest way to start, at least in my experience, is to randomize a subsample of real data. Synthetic data can be too simple or fail to capture the real distribution, and that can hide bugs.

Probably using both is the ideal solution.
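For what it's worth, a minimal pytest sketch of the "randomized subsample of real data plus a synthetic case" idea; `run_pipeline` and the CSV path are hypothetical placeholders for whatever your pipeline's entry point and data source are.

```python
# Sketch: test the pipeline on a fixed-seed subsample of real data and on
# synthetic data. `run_pipeline` and the file path are hypothetical.
import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def real_subsample():
    df = pd.read_csv("data/real_training_data.csv")          # hypothetical path
    return df.sample(n=min(500, len(df)), random_state=42)   # fixed seed => reproducible

def test_pipeline_on_real_subsample(real_subsample):
    preds = run_pipeline(real_subsample)                      # hypothetical entry point
    assert len(preds) == len(real_subsample)
    assert not np.isnan(preds).any()

def test_pipeline_on_synthetic_data():
    df = pd.DataFrame({
        "feature": np.random.randn(100),
        "label": np.random.randint(0, 2, 100),
    })
    preds = run_pipeline(df)
    assert len(preds) == 100
```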