JustOneAvailableName
JustOneAvailableName t1_jcobgnf wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
That is a very nice surprise
JustOneAvailableName t1_jbthd11 wrote
Isn't the whole transformer revolution due to SSL which is just plain unsupervised learning?
JustOneAvailableName t1_j7v99gd wrote
If the model is sufficiently large (if not, you don't really need to wait long anyways) and no expensive CPU pre/postprocessing is done, the 3090 will be the bottleneck.
A single 3090 might not have enough memory to train GPT 2 large, but it's probably close.
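Rough back-of-envelope with my own assumed numbers (not measured):

```python
# GPT-2 large is ~774M parameters. With plain fp32 Adam you roughly need
# weights + gradients + two optimizer moments per parameter:
params = 774e6
bytes_per_param = 4 + 4 + 4 + 4            # weights, grads, Adam m, Adam v (fp32)
print(params * bytes_per_param / 1024**3)  # ~11.5 GB before activations
# Activations come on top and depend on batch size / sequence length /
# gradient checkpointing; a 3090 has 24 GB, hence "probably close".
```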
Fully training an LLM on a single 3090 is impossible, but you could finetune one.
JustOneAvailableName t1_j7oknmi wrote
Reply to comment by TaXxER in [N] Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement by Wiskkey
Copyright is about redistribution and we're talking publicly available data. I don't want/need to give consent to specific people/companies to allow them to read this comment. Nor do I think it should now be up to reddit to decide what is and isn't allowed.
JustOneAvailableName t1_j7le6dw wrote
Reply to comment by HateRedditCantQuitit in [N] Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement by Wiskkey
> but commercial use requires opt-in consent from content creators
With opt-in, you might as well directly ban it for commercial use.
JustOneAvailableName t1_j7hh2yv wrote
Reply to comment by mugbrushteeth in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
Their main source of revenue is seriously threatened by a 10-50M(?) investment. It might not be OpenAI, but something will replace Google in the coming years if Google doesn't innovate their search.
JustOneAvailableName t1_j71yj42 wrote
Reply to comment by new_name_who_dis_ in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
Understanding the what is extremely easy and rather useless; to understand a paper you need to understand at least some of the why. If you have time to go in depth, aim to also understand the what-not and the why-not.
So I would argue at least some basic knowledge of CNNs is required.
JustOneAvailableName t1_j6cfdmr wrote
Reply to comment by albertzeyer in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
I worked with Wav2vec a year ago. WER on Dutch was (noticeably) better when fine-tuned than it was with GCP or Azure, and we didn't use any of our own labeled data. I used CTC mainly because it didn't reduce WER, hugely improved CER and made inference a lot simpler. Inference cost was also a fraction (less than a cent per hour, assuming the GPU is fully utilized) of the paid services. I kinda assumed others got to the same conclusions I did back then, but they're my own conclusions, so there's plenty I could have done wrong.
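For context, the "simpler inference" part is mostly that CTC decoding can be as dumb as argmax, collapse repeats, drop blanks (a minimal sketch, naming is mine):

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, blank_id: int = 0) -> list:
    # logits: (time, vocab). Take the argmax per frame,
    # collapse repeated ids, then drop the blanks.
    ids = logits.argmax(dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out
```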
Whisper offers this performance level practically out of the box, although with a lot higher inference costs. I, sadly, haven't had the time yet to finetune it. Nor have I found the time to optimize inference costs.
> E.g. it does not work well for streaming (getting instant recognition results, usually within 100ms, or 500ms, or max 1sec)
If you're okay with intermediary results getting improved later, this is doable, although at a multiple of the cost. Offline works like a charm, though.
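Roughly what I mean (a sketch, not production code; assumes 16 kHz float32 audio chunks and the openai-whisper package):

```python
import numpy as np
import whisper

model = whisper.load_model("base")

def pseudo_stream(chunks):
    # Re-transcribe the growing buffer on every new chunk; earlier results
    # are provisional and may be corrected by a later pass. This is why the
    # cost goes up by a factor compared to a single offline pass.
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        audio = np.concatenate(buffer).astype(np.float32)
        yield model.transcribe(audio, fp16=False)["text"]
```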
> Also, I'm quite sure it has some strange failure cases, as AED models tend to have, like repeating some labels, or skipping to the end of a sequence (or just chunk) when it got confused.
True that.
JustOneAvailableName t1_j683rg1 wrote
Reply to comment by albertzeyer in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
> I think most people nowadays use RNN-T.
Isn't slightly fine-tuning Whisper the go-to?
JustOneAvailableName t1_j65eg5f wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
!RemindMe 3 days
JustOneAvailableName t1_j62zkw3 wrote
Reply to comment by angkhandelwal749 in [Discussion] Github like alternative for ML? by angkhandelwal749
Check mymlops.com
JustOneAvailableName t1_j4pqxnx wrote
Reply to comment by lostmsu in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
> (12c/kWh)
I wish. It is 6x more expensive here...
JustOneAvailableName t1_j4my6kp wrote
Reply to comment by timdettmers in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
Great article!
You say this about sparsity:
> It does not seem so. Since the granularity of the sparse matrix needs to have 2 zero-valued elements, every 4 elements, the sparse matrices need to be quite structured.
Wouldn't a slightly more structured dropout be a perfect fit?
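Something like this is what I have in mind (just a sketch of the idea, not tested against the sparse tensor cores):

```python
import torch

def dropout_2_of_4(x: torch.Tensor) -> torch.Tensor:
    # Randomly keep 2 out of every 4 consecutive elements along the last dim,
    # so the dropped tensor already matches Ampere's 2:4 sparsity pattern.
    # Assumes a float tensor whose size is a multiple of 4.
    groups = x.reshape(-1, 4)
    keep = torch.rand_like(groups).argsort(dim=1)[:, :2]   # 2 random positions per group
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return (groups * mask * 2.0).reshape(x.shape)          # rescale like inverted dropout
```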
JustOneAvailableName t1_j45wg3c wrote
Reply to [D] Bitter lesson 2.0? by Tea_Pearce
"In 70 years" feels extremely cautious. I would say it's in the next few years for regular ML, perhaps 20 years for robotics
JustOneAvailableName t1_j2nhjks wrote
Reply to comment by hollow_sets in [D] What do you do while you wait for training? by hollow_sets
Personally, I also like to eval way more often than every 5 hours. Perhaps use a smaller eval subset every hour?
JustOneAvailableName t1_j2dih9m wrote
Reply to comment by gscjj in [OC] Around 30% of countries spend more than 2% of GDP on their military by IndeterminateYogurt
Although I do agree that Europe could use a boost to its army, I think it's mainly a matter of reducing bureaucracy and investing more in the domestic military industry. The EU has 20% more military personnel and three times the US's reserves. A large reason why expenditure is so much lower is that it's aimed at defense and not overseas power projection. I.e. no aircraft carriers and nuclear subs.
JustOneAvailableName t1_j2dhnsy wrote
Reply to comment by agingmonster in [OC] Around 30% of countries spend more than 2% of GDP on their military by IndeterminateYogurt
As both axes are per capita, the diagonal line is 2% of GDP.
JustOneAvailableName t1_j1yk4ya wrote
Reply to [D] Protecting your model in a place where models are not intellectual property? by nexflatline
Encrypt the drive. Be the only one with password access to the machine. Don't expose the exact model output.
JustOneAvailableName t1_j1096lz wrote
Reply to comment by Dartagnjan in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Sounds like you need a larger batch size. What happens when you take a huge batch size on the hard examples with a plateaued model?
JustOneAvailableName t1_j107lf3 wrote
Reply to [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Perhaps something like keeping track of the harder data points and sampling half the batch from those? What happened exactly when training on the hard examples only?
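Roughly what I mean by the first part (just a sketch, names are mine):

```python
import torch

class HardExampleSampler:
    # Track a running loss per example; build each batch from half
    # "hardest seen so far" and half uniformly random indices.
    def __init__(self, num_examples: int, batch_size: int):
        self.losses = torch.zeros(num_examples)
        self.batch_size = batch_size

    def update(self, indices: torch.Tensor, per_example_loss: torch.Tensor):
        # Store the latest loss observed for each example.
        self.losses[indices] = per_example_loss.detach().cpu()

    def next_indices(self) -> torch.Tensor:
        half = self.batch_size // 2
        hard = self.losses.topk(half).indices                      # hardest so far
        rand = torch.randint(len(self.losses), (self.batch_size - half,))
        return torch.cat([hard, rand])
```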
JustOneAvailableName t1_j0uj1ee wrote
Reply to comment by retard-moron in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
I based my answer on the 2020 and 2021 NeurIPS papers by institution. Couldn't find data for 2022. Anyways, Wav2vec was a huge paper for me and basically what I worked on for a large part of the past 2 years. And they were the maintainers of PyTorch and still carry the bulk of that work.
I really don't get how you can dismiss FB as having nearly zero influence on ML.
JustOneAvailableName t1_j0tj00w wrote
Reply to [P] Generate transcripts with Whisper AI and automatically translate with LibreTranslate by meddit_app
Did you compare results with the built-in translation of Whisper?
JustOneAvailableName t1_j0titk8 wrote
Reply to comment by retard-moron in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
FAIR is only behind Google/Microsoft and is one of the bigger players in AI research.
JustOneAvailableName t1_izc2l2j wrote
Reply to comment by Beor_The_Old in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
> understanding of the concept of a state.
JustOneAvailableName t1_jea2dzf wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Software engineer perspective on attention (self quote):
> You have to think about searching. If you search, you have a query (the search term), some way to correlate the query to the actual (size unknown/indifferent) knowledge base, and the knowledge base itself. If you have to write this as a mathematical function, you need something that matches a query to how similar it is to some key, and then returns the corresponding value for that key. The transformer equation is a pretty straightforward formula from that perspective. Each layer learns what it searches for, how it can be found, and which value it wants to transfer when requested.
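In code, that search analogy is basically just scaled dot-product attention (bare-bones sketch, no masking or multiple heads):

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q asks, k says how findable each entry is, v is what gets returned.
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # query/key similarity
    weights = F.softmax(scores, dim=-1)                    # normalized relevance
    return weights @ v                                     # weighted sum of values
```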
RWKV changes this by removing the query. So data is not requested anymore, only pushed. I am frankly surprised it seems to work thus far. Pushing data (self-determining how important something is for something else) is not dependent on other states, enabling it to be an RNN.
Edit: one thing I need to mention: in RWKV importance also fades over time, so it has a recency bias.
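Very loosely, my mental model of that push mechanism (this is not the actual RWKV formula, just the idea):

```python
import torch

def push_only_mixing(values: torch.Tensor, importance: torch.Tensor, decay: float = 0.9):
    # values: (time, dim), importance: (time, 1). There is no query: each token
    # pushes its value with a self-chosen importance into a running state that
    # decays over time (recency bias). The update only needs the previous
    # state, so it can run as an RNN.
    state = torch.zeros_like(values[0])
    outputs = []
    for v, w in zip(values, importance):
        state = decay * state + w * v
        outputs.append(state)
    return torch.stack(outputs)
```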