Brudaks t1_j6nj9z1 wrote
For most established tasks people have a good idea (based on empirical evidence) about the limits of particular methods for this task.
There are tasks where "traditional machine learning methods" work well, and people working on these tasks use them and will use them.
And there are tasks where they don't, and deep learning gets far better results than we could get otherwise - and for those types of tasks, yes, it would be accurate to say that we have given up on traditional machine learning; if you're given an image classification or text analysis task, you'd generally use DL even for a simple baseline, without even trying any of the "traditional" methods we used in earlier years.
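To make "DL as the default baseline" concrete, here's a minimal sketch assuming the Hugging Face transformers library; the sentiment task and the default model it downloads are just illustrations, not anything specific from the thread:

```python
# A quick DL baseline for a text classification task: grab a pretrained
# model via the transformers pipeline API instead of hand-building features.
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("This movie was surprisingly good."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```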
Brudaks t1_j6jqizr wrote
Availability of corpora for other languages.
If you care about much less resourced languages than English or the big ones, then you can generally get sufficient text to do interesting stuff, but working with speech becomes much more difficult due to very limited quantity of decent quality data.
Brudaks t1_j21x8ut wrote
Reply to comment by gkaykck in [Discussion] 2 discrimination mechanisms that should be provided with powerful generative models e.g. ChatGPT or DALL-E by Exnur0
Thing is, we can't really do that for text; natural language doesn't have enough free variation where you could insert a sufficient number of bits of special noise unnoticeable to human readers. Well, you might hide some data with various formatting or unicode trickery, but that would be trivially removable by anyone who cared.
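A hypothetical sketch of that "unicode trickery" (and of how trivially it's removed) in Python; the encoding scheme here is made up for illustration:

```python
# Hide a bit pattern in zero-width characters after the first word,
# then show that a one-line filter strips the watermark completely.
ZWSP, ZWNJ = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def watermark(text: str, bits: str) -> str:
    """Append one zero-width character per payload bit after the first word."""
    payload = "".join(ZWSP if b == "0" else ZWNJ for b in bits)
    head, _, tail = text.partition(" ")
    return head + payload + " " + tail

def strip_watermark(text: str) -> str:
    """Anyone who cares can remove the mark trivially."""
    return text.replace(ZWSP, "").replace(ZWNJ, "")

marked = watermark("generated model output", "1011")
assert marked != "generated model output"          # invisible but present
assert strip_watermark(marked) == "generated model output"
```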
Brudaks t1_izypq27 wrote
Reply to comment by _Arsenie_Boca_ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
It still works today just as before - for a new thread, if you start with the exact same prompt from the original post "I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd." then you can execute commands without any concern.
Brudaks t1_iz4av37 wrote
Reply to comment by gkamer8 in [D] Simple Questions Thread by AutoModerator
My intuitive understanding is that transformers are so "powerful"/assumption-free that they are quite data-hungry and need far more than "a couple books" to learn the structure.
If all you have is a couple of books, then IMHO a small RNN would bring better results than a transformer (but still bad - "all the works of Shakespeare" seems to be a reasonable minimum to get decent results) and the inflection point where transformer architecture starts to shine is at much larger quantities of training data.
If you do want to try exactly that (overfitting and all), start from a sequence length of, say, 4 or 8; a sketch of such a setup is below.
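A minimal character-level RNN sketch, assuming PyTorch (the thread doesn't name a framework) and a hypothetical corpus file shakespeare.txt:

```python
import torch
import torch.nn as nn

# Hypothetical corpus; "all the works of Shakespeare" scale, per above.
text = open("shakespeare.txt").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

seq_len = 8                                   # short sequences, as suggested
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)                    # next-char logits per position

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    # Sample random windows; targets are the inputs shifted by one character.
    starts = torch.randint(0, len(data) - seq_len - 1, (32,)).tolist()
    x = torch.stack([data[j:j + seq_len] for j in starts])
    y = torch.stack([data[j + 1:j + seq_len + 1] for j in starts])
    loss = loss_fn(model(x).reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```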
Brudaks t1_iz04r4f wrote
Reply to comment by _PYRO42_ in [D] Simple Questions Thread by AutoModerator
Thing is, it's generally more compute-efficient to do the exact opposite and replace conditional execution with something that always does the exact same operations in parallel but just multiplies them by zero or something like that when they're not needed. Parallelism and vectorization are how we get effective execution nowadays.
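A toy illustration of the idea, assuming NumPy: compute both paths for every element and use a 0/1 mask instead of branching per element.

```python
import numpy as np

x = np.random.randn(1_000_000)

# Conditional version (slow per-element branching):
# out = [v * 2 if v > 0 else 0.0 for v in x]

# Vectorized version: always do the multiply, zero out where not needed.
mask = x > 0
out = x * 2 * mask          # the boolean mask acts as a 0/1 multiplier
# equivalently: out = np.where(x > 0, x * 2, 0.0)
```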
Brudaks t1_iyzf7pc wrote
Reply to comment by TikiTDO in [D] OpenAI’s ChatGPT is unbelievable good in telling stories! by Far_Pineapple770
IDK, in my experience whenever it makes an undesirable assumption you can just do "now rewrite this, assuming that X is Y" and it will spit out an altered variation with whatever you want.
Brudaks t1_ixo9z32 wrote
Reply to comment by dualmindblade in [P] Stable Diffusion 2.0 Announcement by hardmaru
Legally, anything without a formal license is "all rights reserved". If you don't have explicit permission, the law requires you to assume that the creator wouldn't want you to modify and republish it. If the author never says anything, you're prohibited from using it until 70 years after they die.
Brudaks t1_ivy46bv wrote
Reply to comment by DreamyPen in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Sure, as long as you make a direct connection from that feature column to your loss calculation (i.e. multiply the calculated error metric by it) instead of solely using it as one of the inputs.
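A minimal sketch of that, assuming PyTorch; the reliability column scales each sample's error rather than being fed in as a feature (model and shapes are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # stand-in regression model
loss_fn = nn.MSELoss(reduction="none")        # keep per-sample errors

x = torch.randn(32, 10)
y = torch.randn(32, 1)
reliability = torch.rand(32, 1)               # e.g. 1.0 = trusted, 0.2 = noisy

per_sample = loss_fn(model(x), y)             # shape (32, 1)
loss = (per_sample * reliability).mean()      # down-weight unreliable data
loss.backward()
```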
Brudaks t1_issxwcr wrote
If a company has more qualified applicants than open positions, then obviously simply being qualified cannot guarantee getting a job; and while the criteria for filling out the 'shortlist' might be the qualifications, the criteria by which they'll choose the actual candidate can be quite arbitrary.
From their perspective, as long as the other guy/gal they got instead of you is also decent, their process has no problems.
Brudaks t1_islpipx wrote
> If you get a ton of citations, who cares what journal it’s in?
Citations take a significant amount of time to accumulate (and not all citations are equal).
The way academic incentives work, people generally want to evaluate published work sooner than that; they don't want to wait five or ten years to see how much a paper gets cited, and they don't want to judge the actual papers (since most papers in the world, or even in computer science, are outside anyone's field of expertise). So the main proxy is the selectivity and impact factor of the publication venue, and people absolutely care what journal it's in: they will judge your publications primarily (or even solely) by where they're published, assuming that the journal's review process and selectivity say more about the quality than whatever they can glean from quickly skimming your papers.
Generally there will be some sort of semi-objective criteria - e.g. they'll look at Scopus or Web of Science citation counts (ignoring citations from anything else) and check whether the journal is in the top half of all journals in that field, for example. Then, in any evaluation (for fulfilling research project goals, future grant proposals, or academic job applications - in essence, whenever funding is involved, the funding officials defer to such criteria), the publications which fit those arbitrary criteria matter, and the rest are worthless.
The second half of the incentive issue (problem?) is that since you can't publish the same research twice (at least not in respectable venues), publishing it someplace that "doesn't count" means that research work may be wasted for whatever metrics your institution or advisor is evaluated on - worse than unpublished work, which at least has the potential to become a "proper" publication. So my guess is simply that the "more scientific journal" fits those criteria better than the ones you mention, and your advisor is aware of the "metrics politics".
Brudaks t1_j9k6mo0 wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Because being a universal function approximator is not sufficient to be useful in practice, and IMHO is not even a particularly interesting property. We don't care whether something can approximate any function; we care whether it approximates the thing needed for a particular task, and in any case being able to approximate it is a necessary but not a sufficient condition. We care about the efficiency of approximation (e.g. a network with a single hidden layer is a universal approximator, but only if you allow an impractically large number of neurons). And even more important than how well the function can be approximated with a limited number of parameters is how well you can actually learn those parameters - this differs a lot between models. We don't care how well a model would fit the function with optimal parameters; we care how well it fits with the parameter values we can realistically identify with a bounded amount of computation.
That being said, we do use decision trees instead of DL; for some types of tasks the former outperform the latter, and for other types of tasks it's the other way around.
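A hedged illustration with scikit-learn (the dataset is just an example): on many tabular tasks a tree ensemble beats a small neural net out of the box.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Note: an unscaled MLP is a somewhat weak opponent (feature scaling would
# help it); the point is only that the tree ensemble needs no such tuning.
for model in (GradientBoostingClassifier(), MLPClassifier(max_iter=1000)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, round(score, 3))
```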