There have been many significant papers on artificial intelligence (AI) published over the past 100 years. It would be difficult to narrow it down to just 10-20 papers, but some notable papers that have helped shape the field of AI include:

"A Logical Calculus of the Ideas Immanent in Nervous Activity" by Warren McCulloch and Walter Pitts (1943) - This paper introduced the concept of a "neural network" as a computational model for the workings of the brain.

"Computing Machinery and Intelligence" by Alan Turing (1950) - This paper introduced the Turing test, a benchmark for determining whether a machine can exhibit intelligent behavior.

"Perceptrons" by Frank Rosenblatt (1958) - This paper introduced the concept of the perceptron, a simple mathematical model of a neural network.

"Artificial Intelligence: A New Synthesis" by Nils Nilsson (1998) - This book provided a comprehensive overview of the state of the art in AI research at the time.

"The Logic Theorist" by Allen Newell, J. C. Shaw, and Herbert Simon (1956) - This paper described a program called the Logic Theorist, which was able to prove mathematical theorems using artificial intelligence techniques.

"Grammar Induction and Parsing with a Recursive Neural Network" by Stephen Clark and James R. Curran (2007) - This paper introduced the use of recursive neural networks for natural language processing tasks.

"A Survey of the Monte Carlo Method" by Alan Gelfand and Adrian Smith (1990) - This paper provided an overview of the Monte Carlo method, a computational technique that has been widely used in AI.

"The Elements of a Scientific Theory of Intelligence" by Judea Pearl (2000) - This paper introduced the concept of causality, which has become a key focus of AI research.

"Hierarchical Temporal Memory" by Jeff Hawkins, Dileep George, and D. S. Modha (2004) - This paper introduced the concept of hierarchical temporal memory, a computational model for the workings of the brain.

"Human-Level Control through Deep Reinforcement Learning" by Volodymyr Mnih, et al. (2015) - This paper introduced the use of deep reinforcement learning for achieving human-level performance in a range of challenging tasks.

These are just a few examples of some of the significant papers in the field of AI. There are many others that have contributed to the development of AI over the past 100 years.

xnonnymous t1_iz8kr3q wrote on December 7, 2022 at 6:57 AM

#876,763

Backprop is big and important and there're multiple potential papers to chose, but one to consider is: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

versaceblues OP t1_iz8mep7 wrote on December 7, 2022 at 7:18 AM

#876,807

Replying to huberloss (#876,762)

interesting I asked gpt the questions as well... it gave me a slightly different set

carbocation t1_iz8n4ds wrote on December 7, 2022 at 7:28 AM

#876,829

Replying to CatalyzeX_code_bot (#876,687)

Bruh

vwings t1_iz8n5m2 wrote on December 7, 2022 at 7:28 AM

#876,831

Replying to huberloss (#876,762)

i would add:

LSTMs (1997), Hochreiter & Schmidhuber

ImageNet (2012), Krishevsky et al

Deep Learning (2015), LeCun, Hinton & Bengio

Attention is all you need (2017).

[deleted] t1_iz8odeg wrote on December 7, 2022 at 7:45 AM

#876,854

[removed]

FutureIsMine t1_iz8p67d wrote on December 7, 2022 at 7:56 AM

#876,878

Replying to vwings (#876,831)

Schmidhuber has entered the room and demands he be acknowledged for chatGPT

anonymousTestPoster t1_iz8pwzx wrote on December 7, 2022 at 8:06 AM

#876,894

Replying to huberloss (#876,762)

Schmidhuber would like a word with that ChatGPT bot

91o291o t1_iz8uvlz wrote on December 7, 2022 at 9:20 AM

#876,984

Each and every book about ML and deep learning has some important references to papers.

Just skim such books and find those important references you need. For example the latest Sebastian Raschka book from 2022 about Machine Learning is very good and cites milestones paper.

ennetws t1_iz8uyra wrote on December 7, 2022 at 9:22 AM

#876,987

Kunihiko Fukushima work https://twitter.com/kateiyas/status/1354469717948305408

Noddybear t1_iz8vddc wrote on December 7, 2022 at 9:28 AM

#876,992

Replying to vwings (#876,831)

Don’t forget Cybenko’s 1989 Universal Approximation Theorem paper.

hayAbhay t1_iz8xj9u wrote on December 7, 2022 at 10:01 AM

#877,023

Word2vec - the starting point for the deep learning boom with NLP (even though w2v is shallow) - biggest mover was that it shipped with pretained vectors and off the shelf code in addition to having the google tag

[deleted] t1_iz91sty wrote on December 7, 2022 at 11:04 AM

#877,133

[deleted]

Nameless1995 t1_iz92pe9 wrote on December 7, 2022 at 11:16 AM

#877,167

Replying to huberloss (#876,762)

> "Grammar Induction and Parsing with a Recursive Neural Network" by Stephen Clark and James R. Curran (2007) - This paper introduced the use of recursive neural networks for natural language processing tasks.

Is this one hallucinated? Couldn't find it.

Some other seems hallucinated too, although semantically related to kind of things the authors do.

Blutorangensaft t1_iz92v1x wrote on December 7, 2022 at 11:18 AM

#877,175

Replying to Phoneaccount25732 (#876,750)

What is people's take on Minsky? Because the XOR issue is easily relieved with a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limtation of neural nets. In fact, even humans have trouble with that, and I fail to see how that generalises. He may be an important figure, but not in favour of progress in artificial intelligence.

Of course, if the question is just about portraying important junctions in AI research, regardless of whether they were helpful or not, then Minsky belongs there.

[deleted] t1_iz93q15 wrote on December 7, 2022 at 11:30 AM

#877,210

Replying to [deleted] (#876,854)

[removed]

93-summer-days t1_iz94puo wrote on December 7, 2022 at 11:42 AM

#877,240

Don't miss Dropout, Residual connection tricks to make training deepnets stable

m98789 t1_iz96oey wrote on December 7, 2022 at 12:06 PM

#877,314

ResNet

ThisIsMyStonerAcount t1_iz96wlt wrote on December 7, 2022 at 12:09 PM

#877,323

I think you only mean "ML", so I'll leave out symbolic approaches. I'll also mostly focus on Deep Learning as the currently strongest trend. But even then 20 papers wouldn't be enough to summarize a trajectory, but they'd be able to give a rough overview of the field.

Papers might not be the right medium for this, so I'll also use other publications. Off the top of my mind, it would be the publications that introduced (too lazy to look them up). In roughly temporal order from oldest to newest

Bayes Rule
Maximum Likelihood Estimation (this is a whole field, not a single paper, not sure where it got started)
Expectation Maximization
Perceptron
Minsky's "XOR is unsolvable" (i.e., the end of the first "Neural Network" era)
Neocognitron
Backprop
TD-Gammon
Vanishing Gradients (i.e., the end of the 2nd NN era)
LSTMs
SVM
RBMs (i.e., the start of Deep Learning and the 3nd NN era)
ImageNet
Playing Atari with Deep Reinforcement Learning
Attention is All You Need
AlphaGo
GPT-3 (arguably this could be replaced by BERT, GPT-1 or GPT-2)
CLIP

This is of course very biased to the last 10 years (because I lived through those).

acardosoj t1_iz97oqc wrote on December 7, 2022 at 12:17 PM

#877,356

Replying to ThisIsMyStonerAcount (#877,323)

You are right, MLE is the basis of everything and it's all work of Ronald Fisher, one the greatest statisticians of all time!

mfs619 t1_iz9apa8 wrote on December 7, 2022 at 12:49 PM

#877,472

I think some of the more fundamental statistics papers may have ground to stand on here in the top 20 if not top 10 in terms of total impact.

While some were not necessarily paradigm shifters they form the foundation of ML and DL inference.

Examples:

The positive false discovery rate: a Bayesian interpretation and the q-value: John Storey

Cox's Regression Model for Counting Processes: A Large Sample Study-P. K. Andersen, R. D. Gill

Nonparametric Estimation from Incomplete Observations: Kaplan and Meier

An Algorithm for Least Squares Estimation of Non-linear parameters-Marquardt

Maximum Likelihood From Incomplete Data Via the EM Algorithm-Dempster et al.

Garbage-Shoddy t1_iz9bq1v wrote on December 7, 2022 at 12:59 PM

#877,516

Replying to FutureIsMine (#876,878)

Typical Schmidhuber

ThisIsMyStonerAcount t1_iz9c2oo wrote on December 7, 2022 at 1:02 PM

#877,531

Replying to Blutorangensaft (#877,175)

Minsky's work was very relevant because at the time the perceptron was the state of the art, and that algorithm can't solve XOR. That's why his work started the first AI winter.

[deleted] t1_iz9fy2n wrote on December 7, 2022 at 1:37 PM

#877,669

[deleted]

Beor_The_Old t1_iz9kuds wrote on December 7, 2022 at 2:16 PM

#877,840

Replying to xnonnymous (#876,763)

Schmindhuber is typing in the chat

RobbinDeBank t1_iz9mrjg wrote on December 7, 2022 at 2:31 PM

#877,925

Replying to FutureIsMine (#876,878)

chatGPT is actually a specific case of the general learning algorithms introduced in Schmidhuber et al. (1990)

Tejas_Garhewal t1_iz9ormz wrote on December 7, 2022 at 2:46 PM

#878,013

Replying to anonymousTestPoster (#876,894)

Who is this Schmidhuber guy, seeing him getting mentioned quite a bit

JackandFred t1_iz9p9iz wrote on December 7, 2022 at 2:49 PM

#878,033

A similar thread was actually posted a couple weeks ago: https://www.reddit.com/r/MachineLearning/comments/ylixp5/d_what_are_the_major_general_advances_in/

pacific_plywood t1_iz9pesa wrote on December 7, 2022 at 2:50 PM

#878,045

Replying to Tejas_Garhewal (#878,013)

He is an extremely prolific researcher who believes that his lab was the first to publish on a number of significant topics

Blutorangensaft t1_iz9pl46 wrote on December 7, 2022 at 2:52 PM

#878,053

Replying to ThisIsMyStonerAcount (#877,531)

I get that, but I find it a little hard to believe that nobody immediately pointed out a nonlinearity would solve the issue. Or is that just hindsight bias, thinking it was easier because of what we know today?

AbsoluteYes t1_iz9pn4w wrote on December 7, 2022 at 2:52 PM

#878,057

So you basically want somebody to do the research for writing an article/college seminar for you..

TheWittyScreenName t1_iz9qnlv wrote on December 7, 2022 at 2:59 PM

#878,102

I don’t think anyone’s mentioned the Adam optimization paper yet. 99% of deep learning models just use it by default without even thinking about it, so I’d say it’s pretty foundational

blablanonymous t1_iz9rb39 wrote on December 7, 2022 at 3:04 PM

#878,118

That’s a good question for Galactica

robbsc t1_iza17he wrote on December 7, 2022 at 4:12 PM

#878,541

Replying to vwings (#876,831)

I don't think the deep learning paper is really significant. It just brought attention to recent advances.

ThisIsMyStonerAcount t1_iza1qzy wrote on December 7, 2022 at 4:16 PM

#878,560

Replying to Blutorangensaft (#878,053)

What nonlinearity would solve the issue? The usual ones we use today certainly wouldn't. Are you thinking a 2nd order polynomial? I'm not sure that's a generally applicable function, with being non-monotonical and all?

(Or do you mean a hidden layer? If so: yeah, that's absolutely hindsight bias).

SleekEagle t1_iza1xxj wrote on December 7, 2022 at 4:17 PM

#878,570

Replying to vzq (#876,732)

The mathematician in me will never let SVMs die!

SleekEagle t1_iza258i wrote on December 7, 2022 at 4:18 PM

#878,575

Replying to vwings (#876,831)

Agreed - the last three you listed were the first ones that came to mind for me

vzq t1_iza2gd1 wrote on December 7, 2022 at 4:21 PM

#878,596

Replying to SleekEagle (#878,570)

There are dozens of us! ;)

csreid t1_iza2yzk wrote on December 7, 2022 at 4:24 PM

#878,623

Replying to hayAbhay (#877,023)

Was it expected at the time that the embeddings would be so ... idk, semantically significant?

I feel like "king - man + woman = queen" is very unintuitive if you don't know about it already and it would've felt huge

csreid t1_iza3c96 wrote on December 7, 2022 at 4:27 PM

#878,639

Replying to [deleted] (#876,854)

Huh?

KingRandomGuy t1_iza65df wrote on December 7, 2022 at 4:45 PM

#878,758

Replying to ThisIsMyStonerAcount (#877,323)

This lines up with ImageNet, but I'd probably drop in AlexNet as well.

vwings t1_iza8lwm wrote on December 7, 2022 at 5:02 PM

#878,872

Replying to robbsc (#878,541)

Completely agree.

Blutorangensaft t1_iza9zrt wrote on December 7, 2022 at 5:11 PM

#878,932

Replying to ThisIsMyStonerAcount (#878,560)

I see. I meant the combination of a nonlinear activation function and another hidden layer. Was curious what people thought, thanks for your comment.

ispeakdatruf t1_izaajjp wrote on December 7, 2022 at 5:14 PM

#878,963

Replying to csreid (#878,623)

> king - man + woman = queen

this just blew my mind when I found out about it.

SammySamsamtam t1_izabcn9 wrote on December 7, 2022 at 5:19 PM

#878,999

Replying to AbsoluteYes (#878,057)

Autodidacts exist...

versaceblues OP t1_izad2r2 wrote on December 7, 2022 at 5:31 PM

#879,062

Replying to AbsoluteYes (#878,057)

no im not in college im just interested.

huberloss t1_izadd9p wrote on December 7, 2022 at 5:32 PM

#879,085

Replying to Nameless1995 (#877,167)

>Grammar Induction and Parsing with a Recursive Neural Network

Pretty sure it's hallucinated. It's kind of funny how plausible it sounded, though :D

MrAcurite t1_izagr44 wrote on December 7, 2022 at 5:54 PM

#879,237

Replying to Nameless1995 (#877,167)

The 'C' in 'ChatGPT' stands for "Confident Bullshitting."

The 'hat' identifies this as merely an approximation of confident bullshitting.

ThisIsMyStonerAcount t1_izakcfe wrote on December 7, 2022 at 6:18 PM

#879,389

Replying to KingRandomGuy (#878,758)

That's actually what I meant, thanks for pointing it out! Edited

FartyFingers t1_izan4i8 wrote on December 7, 2022 at 6:36 PM

#879,488

I would throw in some dead ends along the way as they very much altered the trajectory. LISP was the be all and end all in "AI" for quite some time as an example.

Many of the things being mentioned in these comments are how we got to where we are, but lots of time was spent on now dead evolutionary branches. Not a whole lot of practical progress in the 80s. I remember chatting with "AI" researchers in the 80s and they were on and on about Hebbian Synapses.

[deleted] t1_izaqc2u wrote on December 7, 2022 at 6:57 PM

#879,619

Replying to [deleted] (#877,210)

[removed]

MOSFETBJT t1_izat081 wrote on December 7, 2022 at 7:14 PM

#879,713

Alexnet due to first Gpu training?

Screye t1_izauw5w wrote on December 7, 2022 at 7:27 PM

#879,788

Replying to Tejas_Garhewal (#878,013)

He is the UIUC of Deep learning's mount rushmore.

Just as people think of Stanford, MIT, CMU, Berkley as the big CS universities and forget that UIUC is almost just as good.....people take the names of Hinton, LeCun, Bengio and forget that Schmidhuber(' lab) did a lot of important foundational work in deep learning.

Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved.....so people have kind of soured on him lately.

alfihar t1_izawe8n wrote on December 7, 2022 at 7:37 PM

#879,848

no love for the old classics?

Computing Machinery and Intelligence" (Mind, October 1950), Turing addressed the problem of artificial intelligence, and proposed an experiment that became known as the Turing test,

John Searle in his paper, "Minds, Brains, and Programs", published in Behavioral and Brain Sciences in 1980 - from which we get the Chinese room

chaosmosis t1_izawfsz wrote on December 7, 2022 at 7:37 PM

#879,850

Replying to ThisIsMyStonerAcount (#878,560)

Non-monotonic activation functions can allow for single layers to solve xor, but they take forever to converge.

belabacsijolvan t1_izb11i7 wrote on December 7, 2022 at 8:07 PM

#880,049

!remindme 3 days

RemindMeBot t1_izb157m wrote on December 7, 2022 at 8:08 PM

#880,053

Replying to belabacsijolvan (#880,049)

I will be messaging you in 3 days on 2022-12-10 20:07:44 UTC to remind you of this link

CLICK THIS LINK to send a PM to also be reminded and to reduce spam.

^(Parent commenter can ) ^(delete this message to hide from others.)

^(Info)	^(Custom)	^(Your Reminders)	^(Feedback)

thatguydr t1_izb2e6c wrote on December 7, 2022 at 8:16 PM

#880,088

Lots of recency bias here.

1948: Claude E. Shannon, "A Mathematical Theory of Communication". Bell System Technical Journal. Introduced the notions of information and entropy.

RageA333 t1_izb5exc wrote on December 7, 2022 at 8:35 PM

#880,207

You definitely don't need to go so far back in time. AI is a fairly modern concept.

versaceblues OP t1_izb7hn0 wrote on December 7, 2022 at 8:49 PM

#880,289

Replying to RageA333 (#880,207)

I was interested also in the Lineage of Thought that lead to where AI currently is.

undefdev t1_izbjvcl wrote on December 7, 2022 at 10:12 PM

#880,768

Replying to Screye (#879,788)

> Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved.....so people have kind of soured on him lately.

What did he claim that he didn't achieve? I didn't dig too deeply into it, but it always seemed to me that his complaints haven't been addressed, but nobody has an incentive to support him.

JustOneAvailableName t1_izbnfki wrote on December 7, 2022 at 10:37 PM

#880,916

Replying to undefdev (#880,768)

> What did he claim that he didn't achieve?

Connections to his work are often vague. Yes, his lab tried something in the same extremely general direction. No, his lab did not show it actually worked or what part of the broad direction they went in actually worked. So I am not gonna cite Fast Weight Programmers when I want to write about transformers. Yes, Fast Weight Programmers also argued there are more ways to handle variable sized input than using RNNs. No, I don't think the idea is special at all. The main point of Attention is all you need was that removing something of the then mainstream architecture made it faster (or larger) to train while keeping the quality. It was the timing that made it special, because it successfully went against mainstream and they made it work, not the idea itself.

LtCmdrData t1_izboikt wrote on December 7, 2022 at 10:45 PM

#880,981

To get started:

McCulloch, Warren S.; Pitts, Walter (1 December 1943). "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259
Curry, Haskell B. (1944). "The Method of Steepest Descent for Non-linear Minimization Problems". Quart. Appl. Math. 2 (3): 258–261. doi:10.1090/qam/10667
"Computing Machinery and Intelligence" (1950) by Alan Turing.
The Logic Theory Machine, a complex information processing system by Newell and Simon (1956)
Valiant, Leslie (1984). "A theory of the learnable". Communications of the ACM. 27 (11): 1134–1142. doi:10.1145/1968.1972
Cortes, C., Vapnik, V. Support-vector networks. Mach Learn 20, 273–297 (1995). https://doi.org/10.1007/BF00994018

-xylon t1_izbsuy6 wrote on December 7, 2022 at 11:15 PM

#881,191

Lemme ask chatgpt real quick

undefdev t1_izbui6y wrote on December 7, 2022 at 11:27 PM

#881,271

Replying to JustOneAvailableName (#880,916)

> So I am not gonna cite Fast Weight Programmers when I want to write about transformers.

I think you are probably refering to this paper: Linear Transformers Are Secretly Fast Weight Programmers

It seems like they showed that linear transformers are equivalent to fast weight programmers. If linear transformers are relevant to your research, why not cite fast weight programmers? Credit is cheap, right? We can still call them linear transformers.

JustOneAvailableName t1_izbzbaq wrote on December 8, 2022 at 12:02 AM

#881,470

Replying to undefdev (#881,271)

Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did that. Like here.

There are hundreds more relevant papers to cite and read about (linear scaling) transformers

undefdev t1_izc3tr1 wrote on December 8, 2022 at 12:35 AM

#881,668

Replying to JustOneAvailableName (#881,470)

> Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did that. Like here.

But why should memes be relevant in science? Not citing someone because there are memes around their person seems kind of arbitrary. If it's just memes, maybe we shouldn't take them too seriously.

trendymoniker t1_izckgr1 wrote on December 8, 2022 at 2:40 AM

#882,452

Don’t forget Latent Dirichlet Allocation by Blei, Ng, and Jordan (2003). Deep learning has far surpassed probabilistic models for its simple scalability, but they were ascendant throughout the 2000s, with LDA being probably the most impressively complex yet practical of the lot.

Plus: 45,000 citations earned mostly in the era before machine learning was everywhere and every thing.

melatonin_for_me t1_izckns7 wrote on December 8, 2022 at 2:42 AM

#882,461

Replying to thatguydr (#880,088)

I was horrified reading the comments and not seeing this one show up for a while. Thank you!

Mechanical_Number t1_izcoa6a wrote on December 8, 2022 at 3:10 AM

#882,642

Not a single mention to boosting? Harsh...

Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)

tomhamer5 t1_izcpev4 wrote on December 8, 2022 at 3:19 AM

#882,708

I think CLIP has unlocked a great deal of value both for search and generative AI https://arxiv.org/abs/2103.00020

[deleted] t1_izctghu wrote on December 8, 2022 at 3:52 AM

#882,929

[removed]

CriticalTemperature1 t1_izcwu1m wrote on December 8, 2022 at 4:22 AM

#883,083

Is it worth even reading old papers? These are meant to communicate with researchers at the time the paper was published so its probably more efficient to learn these concepts and techniques from AI textbooks and blogs

Daturnus t1_izcyqjn wrote on December 8, 2022 at 4:38 AM

#883,182

I would consider papers tackling the factoring problem in cryptography, the Turing Test and stuff based on epistemology - how do machines have knowledge. Also don't want to sound cheeky but papers on how AI/ML apply to robotics (instruments adjusting to moments).

koiRitwikHai t1_izd73ei wrote on December 8, 2022 at 6:03 AM

#883,546

I am not aware of exact papers but the first paper which introduced the following concepts (from NLP) will be my pick

HMM models for POS tagging
SVM
Backpropagation
Word2Vec
Semantic graphs (like AMR)
Deep Speech (or CTC loss in ASR)
Transformers
BERT (and variants)

Please suggest if I am missing anything

banuk_sickness_eater t1_izd7gz7 wrote on December 8, 2022 at 6:07 AM

#883,557

Not as concise as a list of 10-20, but this github repo threading together the most impactful papers leading to our modern conceptualizations of ML as a sort of almost narrative driven pseudo-textbook is immensely useful for getting fully caught up on the modern state of ML, regardless.

[deleted] t1_izdaet9 wrote on December 8, 2022 at 6:42 AM

#883,704

Replying to [deleted] (#879,619)

[removed]

versaceblues OP t1_izdgolk wrote on December 8, 2022 at 8:05 AM

#883,945

Replying to CriticalTemperature1 (#883,083)

>Is it worth even reading old papers? These are meant to communicate with researchers at the time the paper was published so its probably more efficient to learn these concepts and techniques from AI textbooks and blogs

Im just interested from a historical perspective.

ziheng0924 t1_izdsp69 wrote on December 8, 2022 at 10:56 AM

#884,291

mark

[deleted] t1_ize6qbz wrote on December 8, 2022 at 1:32 PM

#885,008

Replying to [deleted] (#883,704)

[removed]

rsandler t1_izghl8o wrote on December 8, 2022 at 10:52 PM

#889,420

Replying to huberloss (#876,762)

Alot of these paper titles are hallucinated.

For example, I couldnt find:

"Grammar Induction and Parsing with a Recursive Neural Network"

"A Survey of the Monte Carlo Method"

Also, interestingly, Pearl never wrote a book called "The Elements of a Scientific Theory of Intelligence", but in 2000 he did write his seminal "Causality: Models, Reasoning, and Inference" for which the description would apply very well to...

twistor9 t1_iznfnm1 wrote on December 10, 2022 at 12:14 PM

#899,177

Replying to Phoneaccount25732 (#876,750)

Reparameterization trick is overrated, my Prof is a famous AI researcher and he said people were doing that since the 90s. Deepmind also were about to republish the same idea at the same time, I spoke to one of the authors. Gumbel softmax for discrete reparam was genuinely a new idea though as far as I know but also had two labs publishing on it.

jayjmcfly t1_izu8jdq wrote on December 11, 2022 at 9:58 PM

#908,862

! remindme 6 days

Comments