Submitted by versaceblues t3_zetvmd in MachineLearning

You can only pick max 20 papers, and they should cover the major milestones/turning points in AI research. What would those papers be?

In terms of significance, I'm looking for papers along the lines of
"Attention is all you need" - https://arxiv.org/abs/1706.03762

That mark big shifts/breakthroughs in the field.

350

Comments


huberloss t1_iz8kqp3 wrote

I asked ChatGPT this question. This is what it came up with (I am very impressed by what it output):

There have been many significant papers on artificial intelligence (AI) published over the past 100 years. It would be difficult to narrow it down to just 10-20 papers, but some notable papers that have helped shape the field of AI include:

"A Logical Calculus of the Ideas Immanent in Nervous Activity" by Warren McCulloch and Walter Pitts (1943) - This paper introduced the concept of a "neural network" as a computational model for the workings of the brain.

"Computing Machinery and Intelligence" by Alan Turing (1950) - This paper introduced the Turing test, a benchmark for determining whether a machine can exhibit intelligent behavior.

"Perceptrons" by Frank Rosenblatt (1958) - This paper introduced the concept of the perceptron, a simple mathematical model of a neural network.

"Artificial Intelligence: A New Synthesis" by Nils Nilsson (1998) - This book provided a comprehensive overview of the state of the art in AI research at the time.

"The Logic Theorist" by Allen Newell, J. C. Shaw, and Herbert Simon (1956) - This paper described a program called the Logic Theorist, which was able to prove mathematical theorems using artificial intelligence techniques.

"Grammar Induction and Parsing with a Recursive Neural Network" by Stephen Clark and James R. Curran (2007) - This paper introduced the use of recursive neural networks for natural language processing tasks.

"A Survey of the Monte Carlo Method" by Alan Gelfand and Adrian Smith (1990) - This paper provided an overview of the Monte Carlo method, a computational technique that has been widely used in AI.

"The Elements of a Scientific Theory of Intelligence" by Judea Pearl (2000) - This paper introduced the concept of causality, which has become a key focus of AI research.

"Hierarchical Temporal Memory" by Jeff Hawkins, Dileep George, and D. S. Modha (2004) - This paper introduced the concept of hierarchical temporal memory, a computational model for the workings of the brain.

"Human-Level Control through Deep Reinforcement Learning" by Volodymyr Mnih, et al. (2015) - This paper introduced the use of deep reinforcement learning for achieving human-level performance in a range of challenging tasks.

These are just a few examples of some of the significant papers in the field of AI. There are many others that have contributed to the development of AI over the past 100 years.

162

anonymousTestPoster t1_iz8pwzx wrote

Schmidhuber would like a word with that ChatGPT bot

111

Tejas_Garhewal t1_iz9ormz wrote

Who is this Schmidhuber guy, seeing him getting mentioned quite a bit

3

pacific_plywood t1_iz9pesa wrote

He is an extremely prolific researcher who believes that his lab was the first to publish on a number of significant topics

25

Screye t1_izauw5w wrote

He is the UIUC of deep learning's Mount Rushmore.

Just as people think of Stanford, MIT, CMU, and Berkeley as the big CS universities and forget that UIUC is almost just as good.....people take the names of Hinton, LeCun, and Bengio and forget that Schmidhuber('s lab) did a lot of important foundational work in deep learning.

Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved.....so people have kind of soured on him lately.

19

undefdev t1_izbjvcl wrote

> Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved.....so people have kind of soured on him lately.

What did he claim that he didn't achieve? I didn't dig too deeply into it, but it always seemed to me that his complaints haven't been addressed, but nobody has an incentive to support him.

6

JustOneAvailableName t1_izbnfki wrote

> What did he claim that he didn't achieve?

Connections to his work are often vague. Yes, his lab tried something in the same, extremely general, direction. No, his lab did not show that it actually worked, or which part of that broad direction actually worked. So I am not gonna cite Fast Weight Programmers when I want to write about transformers. Yes, Fast Weight Programmers also argued there are more ways to handle variable-sized input than using RNNs. No, I don't think the idea is special at all. The main point of "Attention is all you need" was that removing part of the then-mainstream architecture made it faster (or larger) to train while keeping the quality. What made it special was the timing: it successfully went against the mainstream and they made it work. It was not the idea itself.

5
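For context on what the comment above is describing, the core operation of "Attention is all you need" is compact enough to sketch. This is a minimal NumPy sketch of scaled dot-product attention only, ignoring the multi-head projections, masking, and everything else in the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V -- the core transformer operation
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key axis
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Note that nothing here depends on the sequence lengths of Q and K matching, which is the "handles variable-sized input without an RNN" point being debated.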

undefdev t1_izbui6y wrote

> So I am not gonna cite Fast Weight Programmers when I want to write about transformers.

I think you are probably referring to this paper: Linear Transformers Are Secretly Fast Weight Programmers

It seems like they showed that linear transformers are equivalent to fast weight programmers. If linear transformers are relevant to your research, why not cite fast weight programmers? Credit is cheap, right? We can still call them linear transformers.

4

JustOneAvailableName t1_izbzbaq wrote

Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did that. Like here.

There are hundreds of more relevant papers to cite and read about (linear-scaling) transformers.

2

undefdev t1_izc3tr1 wrote

> Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did that. Like here.

But why should memes be relevant in science? Not citing someone because there are memes around their person seems kind of arbitrary. If it's just memes, maybe we shouldn't take them too seriously.

7

vwings t1_iz8n5m2 wrote

i would add:

LSTMs (1997), Hochreiter & Schmidhuber

ImageNet (2012), Krizhevsky et al.

Deep Learning (2015), LeCun, Bengio & Hinton

Attention is all you need (2017).

77

FutureIsMine t1_iz8p67d wrote

Schmidhuber has entered the room and demands he be acknowledged for chatGPT

41

RobbinDeBank t1_iz9mrjg wrote

chatGPT is actually a specific case of the general learning algorithms introduced in Schmidhuber et al. (1990)

16

Noddybear t1_iz8vddc wrote

Don’t forget Cybenko’s 1989 Universal Approximation Theorem paper.

12

robbsc t1_iza17he wrote

I don't think the deep learning paper is really significant. It just brought attention to recent advances.

11

SleekEagle t1_iza258i wrote

Agreed - the last three you listed were the first ones that came to mind for me

1

Nameless1995 t1_iz92pe9 wrote

> "Grammar Induction and Parsing with a Recursive Neural Network" by Stephen Clark and James R. Curran (2007) - This paper introduced the use of recursive neural networks for natural language processing tasks.

Is this one hallucinated? I couldn't find it.

Some others seem hallucinated too, although they're semantically related to the kinds of things those authors work on.

32

MrAcurite t1_izagr44 wrote

The 'C' in 'ChatGPT' stands for "Confident Bullshitting."

The 'hat' identifies this as merely an approximation of confident bullshitting.

23

huberloss t1_izadd9p wrote

>Grammar Induction and Parsing with a Recursive Neural Network

Pretty sure it's hallucinated. It's kind of funny how plausible it sounded, though :D

9

versaceblues OP t1_iz8mep7 wrote

Interesting. I asked GPT the question as well... it gave me a slightly different set.

5

rsandler t1_izghl8o wrote

A lot of these paper titles are hallucinated.

For example, I couldn't find:

"Grammar Induction and Parsing with a Recursive Neural Network"

"A Survey of the Monte Carlo Method"

Also, interestingly, Pearl never wrote a book called "The Elements of a Scientific Theory of Intelligence", but in 2000 he did write his seminal "Causality: Models, Reasoning, and Inference", to which the description would apply very well.

4

ThisIsMyStonerAcount t1_iz96wlt wrote

I think you only mean "ML", so I'll leave out symbolic approaches. I'll also mostly focus on deep learning as the currently strongest trend. Even then, 20 papers wouldn't be enough to summarize a trajectory, but they could give a rough overview of the field.

Papers might not be the right medium for this, so I'll also include other publications. Off the top of my head, it would be the publications that introduced the following (too lazy to look them up), in roughly temporal order from oldest to newest:

  • Bayes Rule
  • Maximum Likelihood Estimation (this is a whole field, not a single paper, not sure where it got started)
  • Expectation Maximization
  • Perceptron
  • Minsky's "XOR is unsolvable" (i.e., the end of the first "Neural Network" era)
  • Neocognitron
  • Backprop
  • TD-Gammon
  • Vanishing Gradients (i.e., the end of the 2nd NN era)
  • LSTMs
  • SVM
  • RBMs (i.e., the start of Deep Learning and the 3rd NN era)
  • ImageNet
  • Playing Atari with Deep Reinforcement Learning
  • Attention is All You Need
  • AlphaGo
  • GPT-3 (arguably this could be replaced by BERT, GPT-1 or GPT-2)
  • CLIP

This is of course very biased to the last 10 years (because I lived through those).

49

acardosoj t1_iz97oqc wrote

You are right: MLE is the basis of everything, and it's all the work of Ronald Fisher, one of the greatest statisticians of all time!

10
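As a tiny worked illustration of Fisher's framework (not from any paper in the thread): for Bernoulli data, maximizing the log-likelihood sum(x*log(p) + (1-x)*log(1-p)) over p gives exactly the sample mean.

```python
# Maximum likelihood estimation for a Bernoulli parameter p.
# The log-likelihood is maximized at the sample mean of the data.
flips = [1, 1, 0, 1, 0, 1, 1, 0]   # made-up coin-flip data
p_hat = sum(flips) / len(flips)     # the MLE
print(p_hat)  # 0.625
```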

Phoneaccount25732 t1_iz8kaym wrote

Kingma's Reparameterization Trick.

Minsky on how single layer Perceptrons can't solve xor, and whoever did the follow-up on how MLPs can.

Wolpert on No Free Lunch in search and optimization.

43

Blutorangensaft t1_iz92v1x wrote

What is people's take on Minsky? Because the XOR issue is easily relieved with a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limitation of neural nets. In fact, even humans have trouble with that, and I fail to see how it generalises. He may be an important figure, but not one in favour of progress in artificial intelligence.

Of course, if the question is just about portraying important junctions in AI research, regardless of whether they were helpful or not, then Minsky belongs there.

5

ThisIsMyStonerAcount t1_iz9c2oo wrote

Minsky's work was very relevant because at the time the perceptron was the state of the art, and that algorithm can't solve XOR. That's why his work started the first AI winter.

14

Blutorangensaft t1_iz9pl46 wrote

I get that, but I find it a little hard to believe that nobody immediately pointed out a nonlinearity would solve the issue. Or is that just hindsight bias, thinking it was easier because of what we know today?

0

ThisIsMyStonerAcount t1_iza1qzy wrote

What nonlinearity would solve the issue? The usual ones we use today certainly wouldn't. Are you thinking of a 2nd-order polynomial? I'm not sure that's a generally applicable function, what with being non-monotonic and all.

(Or do you mean a hidden layer? If so: yeah, that's absolutely hindsight bias.)

(Or do you mean a hidden layer? If so: yeah, that's absolutely hindsight bias).

11

Blutorangensaft t1_iza9zrt wrote

I see. I meant the combination of a nonlinear activation function and another hidden layer. Was curious what people thought, thanks for your comment.

4

chaosmosis t1_izawfsz wrote

Non-monotonic activation functions can allow for single layers to solve xor, but they take forever to converge.

2
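The hidden-layer resolution of the XOR problem discussed above can be shown in a few lines. This is a toy sketch with hand-picked weights (not learned, and not from any of the papers mentioned): no single linear-threshold unit can compute XOR, but one hidden layer computing OR and NAND, followed by an AND, can.

```python
import numpy as np

def step(x):
    # Heaviside step nonlinearity (the perceptron's activation)
    return (x > 0).astype(int)

def two_layer_xor(x):
    # Hidden units compute OR(x) and NAND(x) via hand-picked
    # weights; the output unit ANDs them together, yielding XOR.
    W1 = np.array([[1, -1],
                   [1, -1]])
    b1 = np.array([-0.5, 1.5])
    h = step(x @ W1 + b1)                    # h = [OR(x), NAND(x)]
    return step(h @ np.array([1, 1]) - 1.5)  # AND of the hidden units

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(two_layer_xor(X))  # [0 1 1 0]
```

The catch, of course, is that in Minsky's day nobody knew how to *learn* those hidden-layer weights; that had to wait for backprop.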

twistor9 t1_iznfnm1 wrote

The reparameterization trick is overrated; my prof is a famous AI researcher and he said people were doing that since the 90s. DeepMind was also about to publish the same idea at the same time; I spoke to one of the authors. Gumbel-softmax for discrete reparameterization was genuinely a new idea as far as I know, though it also had two labs publishing on it.

1

thatguydr t1_izb2e6c wrote

Lots of recency bias here.

1948: Claude E. Shannon, "A Mathematical Theory of Communication". Bell System Technical Journal. Introduced the notions of information and entropy.

31
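Shannon's entropy from that 1948 paper is simple enough to compute directly; a quick sketch of the definition in bits:

```python
import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p)),
    # with the convention that terms where p = 0 contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (fair coin: one bit per flip)
print(entropy([0.25] * 4))   # 2.0  (four equally likely outcomes)
```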

melatonin_for_me t1_izckns7 wrote

I was horrified reading the comments and not seeing this one show up for a while. Thank you!

5

hayAbhay t1_iz8xj9u wrote

Word2vec - the starting point for the deep learning boom in NLP (even though w2v is shallow). The biggest mover was that it shipped with pretrained vectors and off-the-shelf code, in addition to having the Google tag.

29

csreid t1_iza2yzk wrote

Was it expected at the time that the embeddings would be so ... idk, semantically significant?

I feel like "king - man + woman = queen" is very unintuitive if you don't know about it already and it would've felt huge

9

ispeakdatruf t1_izaajjp wrote

> king - man + woman = queen

this just blew my mind when I found out about it.

10
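The analogy mechanics can be illustrated with toy vectors. These are hand-made 3-d vectors for illustration only; real word2vec embeddings are learned and typically 100-300 dimensional:

```python
import numpy as np

# Toy "embeddings", hand-made so the analogy works out
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land nearest to queen
# (excluding the input words, as is standard for this test)
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vecs[w], target))
print(best)  # queen
```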

TheWittyScreenName t1_iz9qnlv wrote

I don’t think anyone’s mentioned the Adam optimization paper yet. 99% of deep learning models just use it by default without even thinking about it, so I’d say it’s pretty foundational

14
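The update rule from the Adam paper (Kingma & Ba) is compact enough to sketch in plain NumPy. A minimal, unoptimized sketch, here minimizing a toy quadratic:

```python
import numpy as np

def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    m = np.zeros_like(x)  # first moment (running mean of gradients)
    v = np.zeros_like(x)  # second moment (running mean of squared grads)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
x_min = adam_minimize(lambda x: 2 * (x - 3.0), np.array([0.0]))
print(x_min)  # close to [3.]
```

The per-parameter scaling by sqrt(v_hat) is what makes it so forgiving of the learning rate, which is a big part of why it became the default.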

m98789 t1_iz96oey wrote

ResNet

5

alfihar t1_izawe8n wrote

no love for the old classics?

In "Computing Machinery and Intelligence" (Mind, October 1950), Turing addressed the problem of artificial intelligence and proposed an experiment that became known as the Turing test.

John Searle's "Minds, Brains, and Programs", published in Behavioral and Brain Sciences in 1980, from which we get the Chinese room.

4

mfs619 t1_iz9apa8 wrote

I think some of the more fundamental statistics papers may have ground to stand on here in the top 20 if not top 10 in terms of total impact.

While some were not necessarily paradigm shifters they form the foundation of ML and DL inference.

Examples:

"The Positive False Discovery Rate: A Bayesian Interpretation and the q-value" - John Storey

"Cox's Regression Model for Counting Processes: A Large Sample Study" - P. K. Andersen, R. D. Gill

"Nonparametric Estimation from Incomplete Observations" - Kaplan and Meier

"An Algorithm for Least Squares Estimation of Non-linear Parameters" - Marquardt

"Maximum Likelihood from Incomplete Data via the EM Algorithm" - Dempster et al.

3

LtCmdrData t1_izboikt wrote

To get started:

  1. McCulloch, Warren S.; Pitts, Walter (1 December 1943). "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259
  2. Curry, Haskell B. (1944). "The Method of Steepest Descent for Non-linear Minimization Problems". Quart. Appl. Math. 2 (3): 258–261. doi:10.1090/qam/10667
  3. "Computing Machinery and Intelligence" (1950) by Alan Turing.
  4. The Logic Theory Machine, a complex information processing system by Newell and Simon (1956)
  5. Valiant, Leslie (1984). "A theory of the learnable". Communications of the ACM. 27 (11): 1134–1142. doi:10.1145/1968.1972
  6. Cortes, C., Vapnik, V. Support-vector networks. Mach Learn 20, 273–297 (1995). https://doi.org/10.1007/BF00994018

3

MOSFETBJT t1_izat081 wrote

AlexNet, due to being the first large-scale GPU training?

2

trendymoniker t1_izckgr1 wrote

Don’t forget Latent Dirichlet Allocation by Blei, Ng, and Jordan (2003). Deep learning has far surpassed probabilistic models thanks to its simple scalability, but they were ascendant throughout the 2000s, with LDA probably the most impressively complex yet practical of the lot.

Plus: 45,000 citations, earned mostly in the era before machine learning was everywhere and in everything.

2

93-summer-days t1_iz94puo wrote

Don't miss Dropout and residual connections, the tricks that make training deep nets stable.

1

blablanonymous t1_iz9rb39 wrote

That’s a good question for Galactica

1

-xylon t1_izbsuy6 wrote

Lemme ask chatgpt real quick

1

Daturnus t1_izcyqjn wrote

I would consider papers tackling the factoring problem in cryptography, the Turing test, and work based on epistemology: how do machines have knowledge? Also, I don't want to sound cheeky, but papers on how AI/ML apply to robotics (instruments adjusting to moments).

1

koiRitwikHai t1_izd73ei wrote

I am not aware of the exact papers, but the first papers that introduced the following concepts (from NLP) would be my picks:

  1. HMM models for POS tagging
  2. SVM
  3. Backpropagation
  4. Word2Vec
  5. Semantic graphs (like AMR)
  6. Deep Speech (or CTC loss in ASR)
  7. Transformers
  8. BERT (and variants)

Please suggest if I am missing anything

1

jayjmcfly t1_izu8jdq wrote

! remindme 6 days

1

FartyFingers t1_izan4i8 wrote

I would throw in some dead ends along the way, as they very much altered the trajectory. LISP was the be-all and end-all in "AI" for quite some time, for example.

Many of the things mentioned in these comments are how we got to where we are, but lots of time was spent on now-dead evolutionary branches. Not a whole lot of practical progress was made in the 80s. I remember chatting with "AI" researchers in the 80s, and they went on and on about Hebbian synapses.

0

CriticalTemperature1 t1_izcwu1m wrote

Is it worth even reading old papers? These were meant to communicate with researchers at the time the paper was published, so it's probably more efficient to learn these concepts and techniques from AI textbooks and blogs.

0

versaceblues OP t1_izdgolk wrote

>Is it worth even reading old papers? These were meant to communicate with researchers at the time the paper was published, so it's probably more efficient to learn these concepts and techniques from AI textbooks and blogs.

I'm just interested from a historical perspective.

1

91o291o t1_iz8uvlz wrote

Each and every book about ML and deep learning has some important references to papers.

Just skim such books and find the important references you need. For example, Sebastian Raschka's latest machine learning book from 2022 is very good and cites milestone papers.

−1

AbsoluteYes t1_iz9pn4w wrote

So you basically want somebody to do the research for writing an article/college seminar for you...

−6

RageA333 t1_izb5exc wrote

You definitely don't need to go so far back in time. AI is a fairly modern concept.

−6

versaceblues OP t1_izb7hn0 wrote

I was also interested in the lineage of thought that led to where AI currently is.

2
