
ZestyData t1_jeh198p wrote

This quite obviously isn't the repo used by Twitter.

It is a pretty large and well-put-together documentation epic & consolidation of multiple microservices.

Whether the content is 100% reflective of what's deployed is completely unclear. But it's not "fake", that's for sure; it's genuinely too many man-years of work not to be, in essence, real.

52

ZestyData t1_jegdmzo wrote

Putting aside the political undertones behind many people's desire to publish "the algorithm", this is a phenomenal piece of educational content for ML professionals.

Here we have a world-class, complex recommendation & ranking system laid bare for everyone to read into and build upon. This is a veritable gold mine of an educational resource.

631

ZestyData t1_jeagwa8 wrote

Thank you for repeating half of what I said back to me; much like ChatGPT, you catch on quickly to new information:

So, let's be clear here then. Contrary to your incorrect first comment: Google Translate is an LLM, it is autoregressive, and it is pretrained. At least by the definition of pre-training given in the GPT paper, which was the parallel I first drew in my own comment for OP, who came into this thread knowing only the latest GPT-3+ and ChatGPT products.


>It's funny how you mention unrelated stuff, like RLHF

I did so because I had naively assumed you were also a newcomer to the field who knew nothing outside of ChatGPT, given how severely wrong your first comment was. I'll grant you that it wasn't related, except to lend an olive branch and a reasonable exit plan if that were the case for you. Alas.


>LLMs tend to be >>1B parameter models

Again, no. ELMo was ~94 million parameters, GPT-1 was ~117 million, GPT-2 was 1.5 billion, and BERT-Large is ~340 million. These have all been called Large Language Models for years. There is no hard definition of what constitutes "large"; 2018's large is nearly today's consumer-hardware level. Google Translate (and Google Search) are among the most heavily used LLMs actually in production.

Man. Why do you keep talking about things that you don't understand, even when corrected?


>Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).

Sure! It is easier! But that's not what you said. You initially brought up P(Y|X) as a justification that translation isn't pre-trained; those are two unrelated concepts. The ultimate modelling goal is P(Y|X), but both GPT (Generative Pre-Training) and Google Translate pretrain the decoder's ability to predict P(X | context), just like any hot new LLM of today, hence my correction. The application towards the ultimate P(Y|X) is not connected to the pretraining of their decoders.
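Writing the two objectives side by side makes the point concrete (the notation is mine, loosely following the GPT paper's formulation, not a quote from either source):

```latex
% GPT-style generative pre-training: next-token prediction over a context window.
L_{\mathrm{pretrain}} = \sum_t \log P(x_t \mid x_{t-k}, \ldots, x_{t-1}; \Theta)

% NMT decoder: the same next-token prediction, with the encoded source sentence
% added to the conditioning context.
L_{\mathrm{NMT}} = \sum_t \log P(y_t \mid y_{<t}, \mathrm{enc}(x); \Theta)
```

Both are teacher-forced autoregressive objectives; the only difference is what sits in the conditioning context.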

1

ZestyData t1_jea73i3 wrote

LLM simply means Large Language Model: a language model with a large number of parameters. The term has covered all sorts of deep learning architectures over the past 20 years.

Google invented the Transformer architecture and, most importantly, discovered how well transformers scale in capability as they scale in size. That invention kickstarted the current arms race, and "LLM" has come to mean transformer models with huge parameter counts.

Google Translate's current prod architecture is a (large) transformer encoder with an RNN decoder.[1] This falls into the category of LLMs, which weren't conjured into existence the moment OpenAI applied RLHF and published ChatGPT at the end of 2022. GPT is likewise a large transformer LM, but decoder-only rather than encoder-decoder.

The decoding RNN in Google Translate absolutely is an autoregressive model.
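To make that concrete, here's a toy sketch of the hybrid shape described in [1]: a transformer encoder feeding an autoregressive RNN decoder. The dimensions, the mean-pooled conditioning, and the greedy decode loop are purely illustrative and nothing like the production system:

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, BOS = 1000, 64, 1

class ToyTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(VOCAB, D_MODEL)
        enc_layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.GRU(D_MODEL, D_MODEL, batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, max_len=20):
        memory = self.encoder(self.src_emb(src))      # encode the source once
        # Crude conditioning: mean-pool the encoder states into the decoder's
        # initial hidden state (real systems use attention instead).
        hidden = memory.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
        token = torch.full((src.size(0), 1), BOS, dtype=torch.long)
        outputs = []
        for _ in range(max_len):                      # autoregressive decode loop
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            logits = self.out(step)                   # ~ P(y_t | y_<t, enc(x))
            token = logits.argmax(dim=-1)             # feed the prediction back in
            outputs.append(token)
        return torch.cat(outputs, dim=1)

translated = ToyTranslator()(torch.randint(0, VOCAB, (1, 7)))
```

The point is just the loop: each predicted token is fed back in as context for the next prediction, which is exactly what "autoregressive" means here.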

I re-read the original GPT paper [2] to try to get a better understanding of what "pre-training" actually means here, and I genuinely can't see a difference between it and what Google write about in their papers & blogs [3]; they just define X and Y differently, but both are predicting a token from the context window. GPT calls it pre-training because it adds a supervised fine-tuning step after learning P(X | context), but both approaches perform the same fundamental autoregressive training.

[1] - https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html

[2] - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

[3] - https://arxiv.org/pdf/1609.08144.pdf

4

ZestyData t1_je9ly2p wrote

Uh... I'm going to assume you're relatively new to the world of ML. Translation is one of the most common uses for SOTA LLMs.

It's how Google Translate works, to take just the most famous example.

What the SOTA translation tools don't yet use is instruct-tuning to give them conversational interfaces (i.e. the difference between GPT and ChatGPT), so they look different from ChatGPT in use. But it's very largely the same Generative (technically Pre-trained) Transformers under the hood.

30

ZestyData t1_jdyfg0y wrote

I wasn't aware of Japan having a tech sphere as disconnected from the West as China's. China has its own independent platforms, technologies, separate SOTAs, and largely disjoint research (until the last five years or so, when the two spheres have really started converging and borrowing from each other).

While Japan has tech companies, most of their research comes out of their global offices, and those offices really are global even when based in Japan. Sony aren't publishing papers in Japanese; they're submitting to Western conferences in English.

China, by contrast, had its own parallel FAANG-equivalent tech giants developing their own versions of Amazon's, Google's, and Facebook's tech supremacy and the ML advances underpinning it.

All this to say that Japan engaged with the Western economy a lot more, and consequently its tech companies engaged with the Western pool of talent, science, and communication a lot more. China, meanwhile, had its own bubble until very recently, and thus a lot of the world's unique & innovative ML has been conducted in Mandarin.

3

ZestyData t1_jdy1uqw wrote

Now this is a bizarre post...

The answer is clearly Mandarin, by something like 2-4 orders of magnitude in terms of ML publishing and use as a working language in ML.

French, and indeed any non-English Western language, is functionally useless for the explicit purpose of keeping up with industry/research material or ML-specific career progression.

26

ZestyData t1_jdaq4a5 wrote

This is /r/MachineLearning; can you actually give some quality context as to what this post is showing off?

You started from the same starting point as Stanford. What did you do differently, and why? What are your results?

Basic technical writing, man; otherwise this post is kinda useless. I don't doubt that a bright 16-year-old can do some great work, but I don't yet see any substance behind your claims of novelty.

Give us the juicy deets!

23

ZestyData t1_j45bgzi wrote

This concept already exists, so there are plenty of resources (papers, etc.) online to learn from.

However, current code-generation models are huge and hefty, and take a lot of time & resources to build with current 2023 technology, so it probably isn't a great idea to build a large code-gen language model from scratch.

That said, a school project about Large Language Models (LLMs) that includes finetuning a pretrained model, plus training a small model from scratch as a demonstration, would be cool! A rough sketch of the finetuning half is below.
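Purely as a hedged starting point, assuming the Hugging Face transformers/datasets stack; the model name, dataset, and hyperparameters are illustrative placeholders, not recommendations:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small enough to fine-tune on modest hardware
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any plain-text dataset works; wikitext-2 is just a convenient public example.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-gpt2",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Swapping in your own text via `load_dataset("text", data_files=...)` works the same way, and the from-scratch half of the project can reuse the same training loop with a randomly initialised config (e.g. `AutoModelForCausalLM.from_config`) instead of pretrained weights.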

1

ZestyData t1_iwvmkg6 wrote

This is an infamous software engineering smell / tech-debt issue. Unfortunately, you hitched your wagon to the wrong horse, and this is why framework choice is a hugely important matter in all engineering teams. Doing your research before committing to structuring entire projects around a framework may seem like a waste of time, but it can prevent major headaches like this.

In reality, there is no shortcut to rewriting / disentangling your project from AllenNLP into more conventional NLP tooling (Hugging Face, spaCy), which is all but guaranteed to be safe for long-term use.

It's now a matter of convincing Product / the business that you need to commit a lot of time to an Epic that won't deliver direct value.


>how to ensure that everything is reproduced

Unit tests, integration tests, logging & monitoring.

And of course comparing ML metrics for each model that was produced in AllenNLP versus the replacement models produced with spaCy/Hugging Face/native PyTorch; something like the sketch below.
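A minimal sketch of the kind of parity check I mean, assuming a classification task and sklearn for metrics; `old_predict` and `new_predict` are hypothetical wrappers around the AllenNLP model's and the replacement's inference, not real APIs:

```python
from sklearn.metrics import f1_score

def compare_models(old_predict, new_predict, texts, gold_labels, tolerance=0.01):
    """Score both models on the same frozen eval set and flag regressions."""
    old_f1 = f1_score(gold_labels, [old_predict(t) for t in texts], average="macro")
    new_f1 = f1_score(gold_labels, [new_predict(t) for t in texts], average="macro")
    print(f"AllenNLP model: {old_f1:.4f}  |  replacement: {new_f1:.4f}")
    assert new_f1 >= old_f1 - tolerance, "replacement model regressed beyond tolerance"
```

Run it in CI against a frozen eval set so any regression introduced during the migration surfaces immediately.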


> and how to not want to pull out all your hair in frustration while doing it

Sorry mate, but there's no avoiding this one. Good luck.

7

ZestyData t1_ivzkuet wrote

What? Your own source there shows that the number of companies "contributing to this wave" has been steadily decreasing month-on-month, and November's numbers look to be on par with the summer in terms of number of companies. So yeah... not a trend of unusual layoff numbers.

And if you omit the outliers of Meta and Twitter, the overall layoff numbers from the industry look completely on trend.

The data itself shows that we aren't in that big layoff wave yet.

1

ZestyData t1_ivyuvp8 wrote

Twitter layoffs are because of the Elon shitshow

Meta layoffs are because their stock is crashing, which in turn is because their business strategy isn't as strong as it used to be.

As far as I'm aware those are the only significant outliers in terms of layoffs.

The ML market is still great. And half of the exciting work in ML is coming out of startups.

83