
shmoculus t1_j28yfux wrote

I kind of see what you're getting at, and with exponential improvements in methods/research it could at some point be the case that we see more discoveries in one year than in all previous years combined, but I don't think we're there yet.

The progression has been linear in my view:

  1. Efficient image classification (CNNs)

  2. object detection / segmentation / pix2pix / basic img2text models (RCNNs, Unet, GANs)

  3. Deep reinforcement learning (DQN, PPO, MCTS)

  4. Attention networks (transformers and language modelling)

  5. Basic question / answer and reasoning models

  6. Low quality txt2img models (e.g. DALL-E 1)

  7. High quality txt2img models (e.g. DALL-E 2, stable diffusion)

  8. Multimodal models (image understanding, etc.) <- we are here

  9. Already happening: video2video models, text2mesh / point cloud generation

  10. Expect low-, then high-quality multimodal generation models, e.g. txt2video + music

  11. Expect improved text understanding and general chat behaviour, i.e. large step-ups in chatbot usefulness, including the ability to take actions (this part is already underway)

  12. Expect some kind of attention-based method for reading from and writing to storage (i.e. memory), and possibly online learning / continuous improvement

  13. More incrementally interesting stuff :)
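For anyone curious what step 4's "attention networks" actually compute: the core is scaled dot-product attention. Here's a minimal NumPy sketch (shapes and names are illustrative, not a full transformer):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays. Each query attends over all keys;
    # the attention weights on each row sum to 1.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))      # 3 tokens, 4-dim embeddings
out = attention(X, X, X)         # self-attention: Q = K = V
print(out.shape)                 # (3, 4)
```

In a real transformer, Q, K and V come from learned linear projections of the input, and many such heads run in parallel, but the operation above is the heart of it.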


CypherLH t1_j2b0ncx wrote

"Linear", but consider how rapidly the last half of your points progressed! It took nearly a decade to go from step 1 to step 6. It then took 18 months to go from step 6 to step 9, and it will probably take less than another 12 months to reach step 11 at current rates of progress.


shmoculus t1_j2byyzm wrote

It's going to be an interesting decade for sure :)
