JustOneAvailableName
JustOneAvailableName t1_jcobgnf wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
That is a very nice surprise
JustOneAvailableName t1_jbthd11 wrote
Isn't the whole transformer revolution due to SSL which is just plain unsupervised learning?
JustOneAvailableName t1_j7v99gd wrote
If the model is sufficiently large (if not, you don't really need to wait long anyways) and no expensive CPU pre/postprocessing is done, the 3090 will be the bottleneck.
A single 3090 might not have enough memory to train GPT 2 large, but it's probably close.
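Rough back-of-envelope with my own assumed numbers (not measured):

```python
# GPT-2 large is ~774M parameters. With plain fp32 Adam you roughly need
# weights + gradients + two optimizer moments per parameter:
params = 774e6
bytes_per_param = 4 + 4 + 4 + 4            # weights, grads, Adam m, Adam v (fp32)
print(params * bytes_per_param / 1024**3)  # ~11.5 GB before activations
# Activations come on top and depend on batch size / sequence length /
# gradient checkpointing; a 3090 has 24 GB, hence "probably close".
```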
Fully training an LLM on a single 3090 is impossible, but you could finetune one.
JustOneAvailableName t1_j7oknmi wrote
Reply to comment by TaXxER in [N] Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement by Wiskkey
Copyright is about redistribution and we're talking publicly available data. I don't want/need to give consent to specific people/companies to allow them to read this comment. Nor do I think it should now be up to reddit to decide what is and isn't allowed.
JustOneAvailableName t1_j7le6dw wrote
Reply to comment by HateRedditCantQuitit in [N] Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement by Wiskkey
> but commercial use requires opt-in consent from content creators
With opt-in, you might as well directly ban it for commercial use.
JustOneAvailableName t1_j7hh2yv wrote
Reply to comment by mugbrushteeth in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
Their main source of revenue is seriously threatened by a 10-50M(?) investment. It might not be OpenAI, but something will replace Google in the coming years if Google doesn't innovate their search.
JustOneAvailableName t1_j71yj42 wrote
Reply to comment by new_name_who_dis_ in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
Understanding the what is extremely easy and rather useless; to understand a paper you need to understand at least some of the why. If you have time to go in depth, aim to also understand the what-not and the why-not.
So I would argue at least some basic knowledge of CNNs is required.
JustOneAvailableName t1_j6cfdmr wrote
Reply to comment by albertzeyer in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
I worked with Wav2vec a year ago. WER on Dutch was (noticeably) better when fine-tuned than it was with GCP or Azure, and we didn't use any of our own labeled data. I used CTC mainly because it didn't reduce WER, hugely improved CER and made inference a lot simpler. Inference cost was also a fraction (less than a cent per hour, assuming the GPU is fully utilized) of the paid services. I kinda assumed others got to the same conclusions I did back then, but they're my own conclusions, so there's plenty I could have done wrong.
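For context, the "simpler inference" part is mostly that CTC decoding can be as dumb as argmax, collapse repeats, drop blanks (a minimal sketch, naming is mine):

```python
import torch

def ctc_greedy_decode(logits: torch.Tensor, blank_id: int = 0) -> list:
    # logits: (time, vocab). Take the argmax per frame,
    # collapse repeated ids, then drop the blanks.
    ids = logits.argmax(dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out
```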
Whisper offers this performance level practically out of the box, although with a lot higher inference costs. I, sadly, haven't had the time yet to finetune it. Nor have I found the time to optimize inference costs.
> E.g. it does not work well for streaming (getting instant recognition results, usually within 100ms, or 500ms, or max 1sec)
If you're okay with intermediary results getting improved later, this is doable, although at a multiple of the cost. Offline works like a charm, though.
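Roughly what I mean (a sketch, not production code; assumes 16 kHz float32 audio chunks and the openai-whisper package):

```python
import numpy as np
import whisper

model = whisper.load_model("base")

def pseudo_stream(chunks):
    # Re-transcribe the growing buffer on every new chunk; earlier results
    # are provisional and may be corrected by a later pass. This is why the
    # cost goes up by a factor compared to a single offline pass.
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        audio = np.concatenate(buffer).astype(np.float32)
        yield model.transcribe(audio, fp16=False)["text"]
```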
> Also, I'm quite sure it has some strange failure cases, as AED models tend to have, like repeating some labels, or skipping to the end of a sequence (or just chunk) when it got confused.
True that.
JustOneAvailableName t1_j683rg1 wrote
Reply to comment by albertzeyer in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
> I think most people nowadays use RNN-T.
Isn't slightly fine-tuning Whisper the go-to?
JustOneAvailableName t1_j65eg5f wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
!RemindMe 3 days
JustOneAvailableName t1_j62zkw3 wrote
Reply to comment by angkhandelwal749 in [Discussion] Github like alternative for ML? by angkhandelwal749
Check mymlops.com
JustOneAvailableName t1_j4pqxnx wrote
Reply to comment by lostmsu in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
> (12c/kWh)
I wish. It is 6x more expensive here...
JustOneAvailableName t1_j4my6kp wrote
Reply to comment by timdettmers in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
Great article!
You say this about sparsity:
> It does not seem so. Since the granularity of the sparse matrix needs to have 2 zero-valued elements, every 4 elements, the sparse matrices need to be quite structured.
Wouldn't a slightly more structured dropout be a perfect fit?
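Something like this is what I have in mind (just a sketch of the idea, not tested against the sparse tensor cores):

```python
import torch

def dropout_2_of_4(x: torch.Tensor) -> torch.Tensor:
    # Randomly keep 2 out of every 4 consecutive elements along the last dim,
    # so the dropped tensor already matches Ampere's 2:4 sparsity pattern.
    # Assumes a float tensor whose size is a multiple of 4.
    groups = x.reshape(-1, 4)
    keep = torch.rand_like(groups).argsort(dim=1)[:, :2]   # 2 random positions per group
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return (groups * mask * 2.0).reshape(x.shape)          # rescale like inverted dropout
```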
JustOneAvailableName t1_j45wg3c wrote
Reply to [D] Bitter lesson 2.0? by Tea_Pearce
"In 70 years" feels extremely cautious. I would say it's in the next few years for regular ML, perhaps 20 years for robotics
JustOneAvailableName t1_j2nhjks wrote
Reply to comment by hollow_sets in [D] What do you do while you wait for training? by hollow_sets
Personally, I also like to eval way more often than every 5 hours. Perhaps use a smaller eval subset every hour?
JustOneAvailableName t1_j2dih9m wrote
Reply to comment by gscjj in [OC] Around 30% of countries spend more than 2% of GDP on their military by IndeterminateYogurt
Although I do agree that Europe could use a boost to its army, I think it's mainly a matter of reducing bureaucracy and investing more in the domestic military industry. The EU has 20% more military personnel and three times the US's reserves. A large reason why expenditure is so much lower is that it's aimed at defense and not overseas power projection. I.e. no aircraft carriers and nuclear subs.
JustOneAvailableName t1_j2dhnsy wrote
Reply to comment by agingmonster in [OC] Around 30% of countries spend more than 2% of GDP on their military by IndeterminateYogurt
As both axes are per capita, the diagonal line is 2% of GDP.
JustOneAvailableName t1_j1yk4ya wrote
Reply to [D] Protecting your model in a place where models are not intellectual property? by nexflatline
Encrypt the drive. Be the only one with password access to the machine. Don't expose the exact model output.
JustOneAvailableName t1_j1096lz wrote
Reply to comment by Dartagnjan in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Sounds like you need a larger batch size. What happens when you take a huge batch size on the hard examples with a plateaued model?
JustOneAvailableName t1_j107lf3 wrote
Reply to [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Perhaps something like keeping track of the harder data points and sampling half the batch from those? What happened exactly when training on the hard examples only?
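Roughly what I mean by the first part (just a sketch, names are mine):

```python
import torch

class HardExampleSampler:
    # Track a running loss per example; build each batch from half
    # "hardest seen so far" and half uniformly random indices.
    def __init__(self, num_examples: int, batch_size: int):
        self.losses = torch.zeros(num_examples)
        self.batch_size = batch_size

    def update(self, indices: torch.Tensor, per_example_loss: torch.Tensor):
        # Store the latest loss observed for each example.
        self.losses[indices] = per_example_loss.detach().cpu()

    def next_indices(self) -> torch.Tensor:
        half = self.batch_size // 2
        hard = self.losses.topk(half).indices                      # hardest so far
        rand = torch.randint(len(self.losses), (self.batch_size - half,))
        return torch.cat([hard, rand])
```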
JustOneAvailableName t1_j0uj1ee wrote
Reply to comment by retard-moron in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
I based my answer on the 2020 and 2021 NeurIPS papers by institution. Couldn't find data for 2022. Anyways, Wav2vec was a huge paper for me and basically what I worked on for a large part of the past 2 years. And they were the maintainers of PyTorch and still carry the bulk of that work.
I really don't get how you can dismiss FB as having nearly zero influence on ML.
JustOneAvailableName t1_j0tj00w wrote
Reply to [P] Generate transcripts with Whisper AI and automatically translate with LibreTranslate by meddit_app
Did you compare results with the built-in translation of Whisper?
JustOneAvailableName t1_j0titk8 wrote
Reply to comment by retard-moron in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
FAIR is only behind Google/Microsoft and is one of the bigger players in AI research.
JustOneAvailableName t1_izc2l2j wrote
Reply to comment by Beor_The_Old in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
> understanding of the concept of a state.
JustOneAvailableName t1_jea2dzf wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Software engineer perspective on attention (self quote):
> You have to think about searching. If you search, you have a query (the search term), some way to correlate the query to the actual (size unknown/indifferent) knowledge base, and the knowledge base itself. If you have to write this as a mathematical function, you need something that matches a query to how similar it is to some key, and then returns the corresponding value for that key. The transformer equation is a pretty straightforward formula from that perspective. Each layer learns what it searches for, how it can be found, and which value it wants to transfer when requested.
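In code, that search analogy is basically just scaled dot-product attention (bare-bones sketch, no masking or multiple heads):

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q asks, k says how findable each entry is, v is what gets returned.
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # query/key similarity
    weights = F.softmax(scores, dim=-1)                    # normalized relevance
    return weights @ v                                     # weighted sum of values
```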
RWKV changes this by removing the query. So data is not requested anymore, only pushed. I am frankly surprised it seems to work thus far. Pushing data (self-determining how important something is for something else) is not dependent on other states, enabling it to be an RNN.
Edit: one thing I need to mention: in RWKV importance also fades over time, so it has a recency bias.
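Very loosely, my mental model of that push mechanism (this is not the actual RWKV formula, just the idea):

```python
import torch

def push_only_mixing(values: torch.Tensor, importance: torch.Tensor, decay: float = 0.9):
    # values: (time, dim), importance: (time, 1). There is no query: each token
    # pushes its value with a self-chosen importance into a running state that
    # decays over time (recency bias). The update only needs the previous
    # state, so it can run as an RNN.
    state = torch.zeros_like(values[0])
    outputs = []
    for v, w in zip(values, importance):
        state = decay * state + w * v
        outputs.append(state)
    return torch.stack(outputs)
```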