Brudaks

Brudaks t1_j9k6mo0 wrote

Because being a universal function approximator is not sufficient to be useful in practice, and IMHO is not even a particularly interesting property; we don't care whether something can approximate any function, we care whether it approximates the thing needed for a particular task, and being able to approximate it at all is a necessary but not a sufficient condition. We care about the efficiency of approximation (e.g. a network with a single hidden layer is a universal approximator only if you allow an impractical number of neurons), but even more important than how well the function can be approximated with a limited number of parameters is how well you can actually learn those parameters - this differs a lot between models. We don't care how well a model would fit the function with optimal parameters; we care how well it fits the function with the parameter values we can realistically identify with a bounded amount of computation.
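A minimal sketch of the point about approximation efficiency, using piecewise-constant "units" as a stand-in for hidden neurons (an assumption for illustration, not how a real network is built): you can approximate any reasonable 1-D function this way, but the worst-case error only shrinks as you spend more units.

```python
import math

def step_approx(f, n_units, lo=0.0, hi=math.pi):
    # One "unit" per interval: output the value of f at the interval midpoint.
    width = (hi - lo) / n_units
    return lambda x: f(lo + (min(int((x - lo) / width), n_units - 1) + 0.5) * width)

def max_error(f, g, lo=0.0, hi=math.pi, samples=1000):
    # Worst-case gap between the target function and its approximation.
    pts = (lo + (hi - lo) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in pts)

# More units -> smaller worst-case error, but the cost grows with the count.
errors = {n: max_error(math.sin, step_approx(math.sin, n)) for n in (4, 16, 64, 256)}
for n, e in errors.items():
    print(n, round(e, 4))
```

The "universal" part is cheap; what matters in practice is how fast the error falls per unit of capacity, and whether you can find good parameter values at all.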

That being said, we do use decision trees instead of DL; for some types of tasks the former outperform the latter, and for other types of tasks it's the other way around.

3

Brudaks t1_j6nj9z1 wrote

For most established tasks people have a good idea (based on empirical evidence) about the limits of particular methods for this task.

There are tasks where "traditional machine learning methods" work well, and people working on these tasks use them and will use them.

And there are tasks where they don't and deep learning gets far better results that we could/can do otherwise - and for those types of tasks, yes, it would be accurate to say that we have given up on traditional machine learning; if you're given an image classification or text analysis task, you'd generally use DL even for a simple baseline without even trying any of the "traditional" methods we used in earlier years.

12

Brudaks t1_j6jqizr wrote

Availability of corpora for other languages.

If you care about much less resourced languages than English or the big ones, then you can generally get sufficient text to do interesting stuff, but working with speech becomes much more difficult due to very limited quantity of decent quality data.

2

Brudaks t1_j21x8ut wrote

Thing is, we can't really do that for text; natural language has no free variation where you could insert a sufficient number of bits of special noise unnoticeable to human eyes. Well, you might add some data with various formatting or Unicode trickery, but that would be trivially removable by anyone who cared.
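A minimal sketch of the "Unicode trickery" being dismissed here, with hypothetical helper names: hide bits in zero-width characters after words, then notice that one filtering pass removes the whole watermark.

```python
# Zero-width space and zero-width non-joiner: invisible in rendered text.
ZW0, ZW1 = "\u200b", "\u200c"

def embed(text, bits):
    # Append one invisible character per bit, one bit after each word.
    words = text.split(" ")
    out = [w + (ZW1 if b else ZW0) for w, b in zip(words, bits)]
    out.extend(words[len(bits):])
    return " ".join(out)

def extract(text):
    # Read the hidden bits back out in order.
    return [1 if c == ZW1 else 0 for c in text if c in (ZW0, ZW1)]

def strip_watermark(text):
    # Anyone who cares can remove the watermark in a single pass.
    return "".join(c for c in text if c not in (ZW0, ZW1))

msg = embed("the quick brown fox jumps", [1, 0, 1])
print(extract(msg))                      # the bits survive a round trip
print(strip_watermark(msg))              # ...until someone filters them out
```

The round trip works, but so does the one-line removal, which is exactly why this kind of text watermark is fragile.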

1

Brudaks t1_izypq27 wrote

It still works today just as before - for a new thread, if you start with the exact same prompt from the original post "I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd." then you can execute commands without any concern.

2

Brudaks t1_iz4av37 wrote

My intuitive understanding is that transformers are so "powerful"/assumption-free that they are quite data-hungry and need far more than "a couple of books" to learn the structure.

If all you have is a couple of books, then IMHO a small RNN would bring better results than a transformer (but still bad - "all the works of Shakespeare" seems to be a reasonable minimum to get decent results) and the inflection point where transformer architecture starts to shine is at much larger quantities of training data.

If you do want to do exactly that (overfitting included), try starting from a sequence length of, say, 4 or 8.
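A minimal sketch of what "a sequence length of 4" means for char-level training data preparation (the helper name is made up for illustration): slide a short window over the text and predict the next character.

```python
def make_examples(text, seq_len):
    # Sliding window: each example is seq_len characters of context
    # plus the next character as the prediction target.
    return [(text[i:i + seq_len], text[i + seq_len])
            for i in range(len(text) - seq_len)]

pairs = make_examples("to be or not to be", 4)
print(pairs[0])    # ('to b', 'e')
print(len(pairs))  # 14
```

With a tiny corpus, short contexts like this give the model many more training examples per character and a much easier prediction problem to start from.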

2

Brudaks t1_iz04r4f wrote

Thing is, it's generally more compute-efficient to do the exact opposite and replace conditional execution with something that always does the exact same operations in parallel but just multiplies them by zero or something like that if they're not needed. Parallelism and vectorization are how we get efficient execution nowadays.
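A minimal sketch of that multiply-by-zero pattern, shown with plain lists for clarity (real code would use vectorized array ops on a GPU): the branchy and branchless versions compute the same thing, but the second has no data-dependent control flow.

```python
def relu_branch(xs):
    # Per-element branch: different elements take different code paths.
    return [x if x > 0 else 0.0 for x in xs]

def relu_mask(xs):
    # Branchless: every element goes through the same operations;
    # a 0/1 mask zeroes out the unwanted values instead of skipping them.
    return [x * (x > 0) for x in xs]

xs = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(relu_branch(xs))
print(relu_mask(xs))
```

On SIMD/GPU hardware the masked form lets all lanes execute in lockstep, which is usually far cheaper than divergent branches even though it "wastes" the multiplications by zero.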

1

Brudaks t1_ixo9z32 wrote

Legally, anything without a formal license is "all rights reserved". If you don't have explicit permission, the law requires you to assume that the creator wouldn't want you to modify and republish it. If the author never says anything, you're prohibited from using it until 70 years after they die.

4

Brudaks t1_issxwcr wrote

If a company has more qualified applicants than open positions, then obviously simply being qualified cannot guarantee getting a job; and while the criteria for making the shortlist might be the qualifications, the criteria by which they choose the actual candidate can be quite arbitrary.

From their perspective, as long as the other guy/gal they got instead of you is also decent, their process has no problems.

7

Brudaks t1_islpipx wrote

> If you get a ton of citations, who cares what journal it’s in?

Citations take a significant amount of time to accumulate (and not all citations are equal).

The way academic incentives work, people generally want to evaluate published work sooner than that; they don't want to wait five or ten years to see how much a paper gets cited, and they don't want to judge the actual papers themselves (since most papers in the world, or even just in computer science, are outside any given evaluator's field of expertise). So the main proxy is the selectivity and impact factor of the publication venue. People absolutely care what journal it's in, because they will judge your publications primarily (or even solely) by where they're published, assuming that the journal's review process and selectivity say more about quality than whatever they can glean from quickly skimming your papers.

Generally there will be some sort of semi-objective criteria - e.g. they'll look at Scopus or Web of Science citation counts (ignoring citations from anything else) and check whether that journal is in the top half of all journals in the field (for example). Then in any evaluation (e.g. for fulfilling research project goals, future grant proposals, or academic job evaluations - in essence, whenever funding is involved, the funding officials defer to such criteria), the publications which fit those arbitrary criteria matter and the rest are worthless.

The second half of the incentive issue (problem?) is that since you can't publish the same research twice (at least not in respectable venues), if you publish it someplace that "doesn't count", then that research work may be wasted for whatever metrics your institution or advisor is evaluated on - worse than unpublished work, which at least still has the potential to become a "proper" publication. So my guess is simply that the "more scientific journal" fits those criteria better than the ones you mention, and your advisor is aware of the "metrics politics".

28