ZestyData t1_jeh12gm wrote
Reply to comment by pier4r in [News] Twitter algorithm now open source by John-The-Bomb-2
Idk man, as a fairly well-seasoned MLE I find their general architecture and the scale of their combined models fascinating in and of itself.
Twitter sucks ass - but this is a beautiful piece of ML Engineering.
ZestyData t1_jegdmzo wrote
Putting aside the political undertones behind many peoples' desire to publish "the algorithm", this is a phenomenal piece of educational content for ML professionals.
Here we have a world-class, complex recommendation & ranking system laid bare for all to read into and develop upon. This is a veritable gold mine of an educational resource.
ZestyData t1_jebmes7 wrote
Reply to comment by waxroy-finerayfool in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad
...
bruh
ZestyData t1_jeagwa8 wrote
Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis
Thank you for repeating half of what I said back to me, much like ChatGPT you catch on quick to new information:
So, let's be clear here then. Contrary to your incorrect first comment: Google Translate is an LLM, it is autoregressive, and it is pretrained. At least by the definition of pre-training given in the GPT paper, which was the parallel I first used in my own comment for OP, who came into this thread knowing the latest GPT-3+ and ChatGPT products.
>It's funny how you mention unrelated stuff, like RLHF
I did so because I had naively assumed you were also a newcomer to the field who knew nothing outside of ChatGPT, given how severely wrong your first comment was. I'll grant you that it wasn't related, except as an olive branch and a reasonable exit plan if that were the case for you. Alas.
>LLMs tend to be >>1B parameter models
Again, no. ELMo was 94 million, GPT was 120 million, GPT-2 was 1.5 billion, and BERT has ~300 million parameters. These are all Large Language Models and have been called so for years. There is no hard definition of what constitutes "large"; 2018's large is nearly today's consumer-hardware level. Google Translate (and Google Search) are among the most heavily used LLMs actually out there.
Man. Why do you keep talking about things that you don't understand, even when corrected?
>Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).
Sure! It is easier! But that's not what you said. You initially brought up P(Y|X) as a justification that translation isn't pre-trained, and those are two unrelated concepts. The ultimate modelling goal is P(Y|X), but both GPT (Generative Pre-training) and Google Translate pretrain their decoders to predict P(X|context), just like any hot new LLM of today; hence my correction. The application towards the ultimate P(Y|X) is separate from the pretraining of their decoders.
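To make that concrete, here's a toy sketch (my own illustration, not code from either paper) of the autoregressive signal both systems pretrain on: estimate the probability of each token given its left context, and minimize the negative log-likelihood.

```python
import math
from collections import Counter, defaultdict

def train_bigram_lm(tokens):
    """Count-based bigram model: estimates P(next | previous)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for prev, nxts in counts.items()
    }

def nll(model, tokens):
    """The autoregressive training signal: sum of -log P(x_t | context)."""
    return -sum(
        math.log(model[prev].get(nxt, 1e-12))
        for prev, nxt in zip(tokens, tokens[1:])
    )

corpus = "the cat sat on the mat".split()
lm = train_bigram_lm(corpus)
print(lm["the"])        # P(next | "the")
print(nll(lm, corpus))  # lower is better; this is what pretraining minimizes
```

Swap the bigram counts for a transformer or an RNN and the objective is the same shape: predict the next token from the context window.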
ZestyData t1_jea73i3 wrote
Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis
LLM simply means Large Language Model. A language model with a large number of parameters. LLMs have referred to all sorts of deep learning architectures over the past 20 years.
Google invented the Transformer architecture and, most importantly, discovered how well transformers scale in power as they scale in size. That discovery kickstarted the new arms race, and "LLM" has since mostly come to refer to transformer models with huge numbers of parameters.
Google Translate's current prod architecture is a (large) transformer to encode and an RNN to decode.[1] This falls into the category of LLMs, which weren't just invented when OpenAI published ChatGPT at the end of 2022. GPT is the same idea, but uses transformers for decoding too.
The decoding RNN in Google Translate absolutely is an autoregressive model.
I re-read the original GPT paper [2] to try to get a better understanding of the actual "pre-training" term here, and I genuinely can't see a difference between that and what Google write about in their papers & blogs [3]; they just define X & Y differently, but both predict a token based on the context window. GPT calls it pretraining because it does an additional step after learning P(X | context), but both approaches perform this fundamental autoregressive training.
[1] - https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html
[2] - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
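For the avoidance of doubt about what "autoregressive" means here, a minimal sketch of a greedy decoding loop; the `next_token` stub below is a hypothetical stand-in for the decoder (RNN or transformer alike), not Google's actual code:

```python
def next_token(context):
    """Stand-in for the decoder: returns the most likely next token
    given everything generated so far."""
    table = {
        ("<s>",): "le",
        ("<s>", "le"): "chat",
        ("<s>", "le", "chat"): "</s>",
    }
    return table.get(tuple(context), "</s>")

def decode(max_len=10):
    """Autoregressive loop: each output token is fed back as input."""
    out = ["<s>"]
    for _ in range(max_len):
        tok = next_token(out)
        out.append(tok)
        if tok == "</s>":
            break
    return out[1:-1]

print(decode())  # ['le', 'chat']
```

Each generated token is appended to the context and fed back in for the next step; that feedback loop is what makes the decoder autoregressive, regardless of whether it's an RNN or a transformer.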
ZestyData t1_je9n2f0 wrote
Reply to comment by matthkamis in [D] Can large language models be applied to language translation? by matthkamis
¯\_(ツ)_/¯
not that aggressive an assumption to have made, but ok
ZestyData t1_je9mf4n wrote
Reply to comment by matthkamis in [D] Can large language models be applied to language translation? by matthkamis
Indeed. But that's the answer to your question. Don't downvote me for it.
"Can large language models be applied to language translation". Yes, they already are.
ZestyData t1_je9ly2p wrote
.. Uh. I'm going to assume you're relatively new to the world of ML. Translation is one of the most common uses for SOTA LLMs.
It's how Google Translate works, as just the most famous example.
What the SOTA translation tools don't yet use is instruct-tuning to give them conversational interfaces (i.e. the difference between GPT and ChatGPT), so they look different from ChatGPT. But it's very largely the same Generative (technically Pretrained) Transformers under the hood.
ZestyData t1_jdyli0a wrote
Reply to [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
Mods, can we crack down on students posting basic tutorial-tier side projects to this sub? It's becoming common lately.
ZestyData t1_jdyfg0y wrote
Reply to comment by Hnriek in [D] Is French the most widely used language in ML circles after English? If not, what are some useful (natural) languages in the field of machine learning? by Subject_Ad_9680
I wasn't aware of Japan having a particularly disconnected tech sphere from the West the way China does. China has its own independent platforms, technologies, separate SOTAs, and completely disjointed research (until the recent five years, in which the two have really started converging and borrowing from each other).
While Japan has tech companies, most of their research comes out of their global offices, and it really is global even when based in Japan. Sony aren't publishing papers in Japanese; they're publishing at Western conferences in English.
Whereas China had its own parallel FAANG-equivalent tech giants developing their own versions of Amazon, Google, and Facebook's tech supremacy and its constituent ML advances.
All this is to say that Japan engaged with the Western economy a lot more, and consequently its tech companies engaged with the Western pool of talent, science, and communication a lot more. Meanwhile, China had its own bubble until very recently, and thus a lot of the world's unique & innovative ML has been conducted in Mandarin.
ZestyData t1_jdy1uqw wrote
Reply to [D] Is French the most widely used language in ML circles after English? If not, what are some useful (natural) languages in the field of machine learning? by Subject_Ad_9680
Now this is a bizarre post..
The answer is clearly Mandarin. By like 2-4 orders of magnitude in terms of ML publishing and the language used internally in ML matters.
French, and indeed any non-English western language, is functionally useless for the explicit purpose of keeping up with industry/research material, or ML-specific career progression.
ZestyData t1_jdmdjrd wrote
Reply to comment by DarkTarantino in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Well.. you can just create these graphs yourself if it's important for your current task.
There isn't a role called "Chief graph maker" who makes graphs for people when they need them.
ZestyData t1_jdaq4a5 wrote
Reply to [P] One of the best ChatGPT-like models (possibly better than OpenAssistant, Stanford Alpaca, ChatGLM and others) by [deleted]
This is /r/MachineLearning; can you actually give some quality context as to what this post is showing off?
You started from the same starting point as Stanford. What did you do differently, and why? What are your results?
Basic technical writing things here man, otherwise this post is kinda useless. I don't doubt that a bright 16 year old kid can do some great work, but I don't yet see any substance behind your claims of novelty right now.
Give us the juicy deets!
ZestyData t1_jd5299x wrote
It's pretty much all that's been posted here for the past week
ZestyData t1_j9tstqb wrote
Reply to [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
A wrapper for a big API isn't suited for posting on /r/MachineLearning
ZestyData t1_j824cbd wrote
I spend ~30-40 hours a week on ML-powered projects for work. Life is far too short to start doing ML projects in my free time too.
ZestyData t1_j45bgzi wrote
Reply to [P] Creating A Code-Generating AI Model by Syntro42
This concept already exists so there are plenty of resources (papers, etc) online to learn from.
However, current code-generation models are huge and hefty, and take a lot of time & resources to build using our current 2023 technology. So it probably isn't a great idea to build a large code-gen language model from scratch.
That said, a school project about Large Language Models (LLMs) that finetunes a pretrained model, and also builds a small model from scratch as a demonstration, would be cool!
ZestyData t1_j3vpb25 wrote
Reply to comment by LaravelWorkflow in [P] LatentWeb.ai - It's like the Internet is dreaming. by LaravelWorkflow
No, it seems I didn't misread.
You're somehow expecting investments for displaying ChatGPT generated hallucinations.
ZestyData t1_j3vn2ss wrote
So you've put a nice CSS UI around ChatGPT's outputs, with no real substance added.
And you're legitimately hoping for VC funding for that..?
ZestyData t1_j0jbs10 wrote
If you can label a dataset of sentences where the goal is highlighted, you can just train a classifier.
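As a bare-bones sketch of what that looks like (a toy perceptron over bag-of-words features, with a hypothetical hand-labelled dataset; any off-the-shelf classifier would do the same job better):

```python
from collections import defaultdict

def featurize(sentence):
    """Bag-of-words features for one sentence."""
    return set(sentence.lower().split())

def train_perceptron(labelled, epochs=10):
    """labelled: list of (sentence, 1 if it states a goal else 0)."""
    w = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for sent, y in labelled:
            pred = 1 if bias + sum(w[f] for f in featurize(sent)) > 0 else 0
            if pred != y:
                delta = y - pred  # +1 or -1
                bias += delta
                for f in featurize(sent):
                    w[f] += delta
    return w, bias

def predict(model, sentence):
    w, bias = model
    return 1 if bias + sum(w[f] for f in featurize(sentence)) > 0 else 0

# Hypothetical labelled data; in practice you'd label a few hundred sentences
data = [
    ("our goal is to double revenue", 1),
    ("we aim to ship by june", 1),
    ("the weather was nice today", 0),
    ("he ordered a coffee", 0),
]
model = train_perceptron(data)
print(predict(model, "our goal is to ship"))  # 1 on this toy data
```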
ZestyData t1_iwvmkg6 wrote
Reply to [D] NLP folks who have used AllenNLP, how do you migrate your projects to other framework(s)? by spruce5637
This is an infamous Software Engineering smell/tech-debt issue. Unfortunately, you hitched your wagon to the wrong horse, and this is why framework choice is a hugely important matter in all Engineering teams. Doing your research before committing to structuring entire projects around a framework may seem like a waste of time, but it can prevent major headaches like this.
In reality, there is no shortcut: you have to rewrite / disentangle your project from AllenNLP into more conventional NLP frameworks (Hugging Face, spaCy), which are all but guaranteed to be safe for long-term use.
It's now a matter of convincing Product / the business that you need to commit a lot of time to an Epic that won't deliver direct value.
>how to ensure that everything is reproduced
Unit tests, integration tests, logging & monitoring.
And of course comparing ML metrics for each model that was produced in AllenNLP versus the replacement models produced with Spacy/Huggingface/native-pytorch.
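A quick sketch of that last check (all names and numbers here are illustrative, not from any real migration): score the legacy model and its replacement on the same held-out set, and gate the migration on the metrics staying within tolerance.

```python
def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

# Illustrative stand-ins: replace with the AllenNLP model's predictions
# and the replacement model's predictions on the same held-out set.
gold      = ["PER", "ORG", "O", "LOC", "O", "PER"]
old_preds = ["PER", "ORG", "O", "LOC", "O", "O"]    # legacy AllenNLP model
new_preds = ["PER", "ORG", "O", "LOC", "O", "PER"]  # replacement model

old_acc = accuracy(old_preds, gold)
new_acc = accuracy(new_preds, gold)
print(old_acc, new_acc)

# Regression gate: fail the migration if the new model drops noticeably
TOLERANCE = 0.05
assert new_acc >= old_acc - TOLERANCE, "replacement model regressed"
```

Run the same gate for every model being migrated, ideally in CI, so any silent behavioural drift from the rewrite gets caught before it ships.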
> and how to not want to pull out all your hair in frustration while doing it
Sorry mate, but there's no avoiding this one. Good luck.
ZestyData t1_ivzkuet wrote
Reply to comment by pm_me_your_smth in [D] Current Job Market in ML by diffusion-xgb
What? Your own source shows that the number of companies "contributing to this wave" has been steadily decreasing month-on-month, and November's numbers look to be on par with the summer's. So yeah, not an unusual trend in layoff numbers.
And if you omit the outliers of Meta and Twitter, the overall numbers of layoffs from the industry look to be completely on trend, aside from those specific outliers.
The data itself shows that we aren't in that big layoff wave yet.
ZestyData t1_ivyuvp8 wrote
Reply to [D] Current Job Market in ML by diffusion-xgb
Twitter layoffs are because of the Elon shitshow
Meta layoffs are because their stock is crashing; their business strategy isn't as strong as it used to be.
As far as I'm aware those are the only significant outliers in terms of layoffs.
ML market is still great. And half of the exciting work in ML is coming out of startups.
ZestyData t1_ita3raa wrote
Reply to [P] Look up words by their description by phraisely
Getting real sick of people posting free LLMs behind their own API as some sort of paid product.
ZestyData t1_jeh198p wrote
Reply to comment by lordofbitterdrinks in [News] Twitter algorithm now open source by John-The-Bomb-2
This quite obviously isn't the repo used by Twitter.
It is a pretty large and well-put-together documentation epic & consolidation of multiple microservices.
Whether the content is 100% reflective of what's deployed is completely unclear. But it's not "fake", that's for sure; it's genuinely too many man-years of work not to be in essence real.