master3243

master3243 t1_jdue84p wrote

> The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.

That's a very important point: adding a conditional instruction (if 'p' then 'q') to the prompt biases the model towards doing 'p' and then 'q' to fulfil the prompt, even though the condition would still be satisfied if it simply avoided doing 'p'.

For a more concrete example, here's me asking ChatGPT to write two paragraphs:

1- Write a paragraph about zoos. [figure] (Notice how no elephants are mentioned.)

2- Write a paragraph about zoos with an (if 'p' then 'q') condition. [figure] (Notice how only this answer mentions elephants; a sketch of the comparison follows below.)
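To make the bias concrete, here's a minimal sketch of the comparison in Python. The exact wording of the (if 'p' then 'q') condition isn't recoverable from the figures, so the elephant condition below is illustrative, and the call assumes the 2023-era openai chat-completion client:

```python
# Sketch of the two prompts; the conditional wording is a stand-in,
# since the original figures aren't reproduced here.
import openai  # assumes the 2023-era client with openai.ChatCompletion

prompt_plain = "Write a paragraph about zoos."
prompt_conditional = (
    "Write a paragraph about zoos. "
    # Hypothetical (if 'p' then 'q') condition:
    "If you mention elephants ('p'), describe their enclosure ('q')."
)

for prompt in (prompt_plain, prompt_conditional):
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    text = reply["choices"][0]["message"]["content"]
    # The observed bias: only the conditional prompt tends to produce 'p'
    # (mentioning elephants), even though never mentioning them would
    # satisfy the condition vacuously.
    print("mentions elephants:", "elephant" in text.lower())
```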

46

master3243 t1_jdlhb8c wrote

I have a theory that the main reason OpenAI decided to start keeping its training and architectural details private is that, through minor modifications to the training data and data augmentation, they were able to gain significant improvements in the qualitative output of GPT.

Any competitor could thus replicate the pipeline with ease and reproduce the improvements, so they decided to keep it a trade secret.

Glad more research like this is being done and shared with the rest of the community.

29

master3243 t1_jccfe2n wrote

As someone who heavily relies on both CLIP (released two years ago) and Whisper (released just six months ago) in my research, I'd disagree with the claim that "open research in AI [is] gone".

In addition, I've needed to run the usual benchmarking to compare my own work against several other models, and I was quite surprised that I could run my full benchmark on GPT-3 using only the free credit they provide.

Don't get me wrong, I criticize OpenAI for not completely sticking to the mission they were founded on (I mean, it's literally in the name FFS), but I wouldn't say they've completely closed off their research from the public.

0

master3243 t1_j9evdjy wrote

It's not research-paper worthy, IMO. You'd be writing a paper heavily dependent on the hidden prompt that Microsoft won't let you see, and also on the criteria they use to decide when to end the conversation. Neither of those is scientifically interesting.

But as always, feel free to make blog posts about these investigations, and I'd even be interested in reading them; I just don't think there's a scientific contribution in it.

90

master3243 t1_j91xkeo wrote

There's way more to computer vision than what you listed.

Long-form video understanding is still incredibly limited. Compared to the current SOTA capability of LLMs to understand very long text, and the various advancements in text summarization, video understanding has an incredibly long way to go.

Our current models can understand relatively simple actions (sitting/standing/dancing). Compared to text, however, we want to reach a level where we can understand entire scenes in a movie, or maybe even an entire movie, although that's more of a fantasy currently. Not to mention 3D input (instead of a projected 2D image), which adds extra complexity.

22

master3243 t1_j7tmpsz wrote

Exactly. The CLIP component at the front of the DALL-E pipeline is trained to take any English text and map it to an embedding space.

It's completely natural (and it would probably be surprising if it didn't happen) that CLIP maps (some) gibberish words to a part of the embedding space that is sufficiently close, in L2 distance, to the embedding of a real word.

In that case, the diffusion model decodes the gibberish word into an image similar to one generated from the real word.
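As a rough illustration, here's a minimal sketch that measures that L2 distance directly, assuming the HuggingFace transformers CLIP checkpoint below (the gibberish string is an arbitrary example, not a known DALL-E token):

```python
# Sketch: compare CLIP text embeddings of a real phrase and a gibberish one.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a photo of a bird", "a photo of a vqkrezi"]  # second string is gibberish
inputs = processor(text=texts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)

# If the L2 distance is small, the diffusion decoder sees nearly the same
# conditioning vector, so the gibberish prompt yields a similar image.
print("L2 distance:", torch.dist(emb[0], emb[1], p=2).item())
```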

2

master3243 t1_j67jwad wrote

Hinton says that it does not generalize as well on the toy problems he investigated. An algorithm not doing well even on toy problems is often not a good sign. I predict that unless someone discovers a breakthrough, it will remain worse than backprop despite running faster (due to not having the bottlenecks, as you suggested).

13

master3243 t1_j62aoln wrote

You're right that they've been around for five years (and the idea of attention even longer), but almost every major conference still has new papers coming out that give more insight into transformers (and sometimes into algorithms/methods even older than that).

I just don't want to see titles flooded with terms like "secretly", "hidden", or "mysterious"; I feel it replaces scientific terms with less scientific but more eye-catching ones.

Again, I totally understand why they chose this phrasing, and I probably would too, but in a blog post title, not a research paper title.

But once again, the actual work seems great and that's all that matters really.

12

master3243 t1_j20v8i9 wrote

> The reality is that AGI has become (always been?) an engineering problem.

I would not state this as fact. I'd say the majority of neuroscientists and cognitive scientists would disagree with it (or say we don't know yet), and a fair number of AI researchers would too.

I doubt more than a few researchers would be comfortable saying "Yes, it's definitely an engineering problem."

5

master3243 t1_j152mmj wrote

The abstract puts this project into perspective: their method is much faster but still doesn't beat the state of the art.

> While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e
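For reference, here's a sketch of sampling from the released models, adapted from my reading of the example notebook in the linked repo (it uses the text-vector-conditioned base model, which skips the intermediate image stage; treat the exact config names as assumptions):

```python
# Sketch adapted from the text2pointcloud example in the point-e repo;
# the two-stage pipeline in the paper conditions on a generated image instead.
import torch
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Base model: 1024 points conditioned on a CLIP text embedding.
base_model = model_from_config(MODEL_CONFIGS["base40M-textvec"], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint("base40M-textvec", device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["base40M-textvec"])

# Upsampler: densifies the cloud from 1024 to 4096 points.
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint("upsample", device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=("texts", ""),  # upsampler is unconditioned
)

samples = None
for x in sampler.sample_batch_progressive(
    batch_size=1, model_kwargs=dict(texts=["a red motorcycle"])
):
    samples = x  # final iterate holds the 4096-point cloud
```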

11

master3243 t1_j12nmgc wrote

There's no way for a paper to just include a table of "real-world comparison of GPT-3".

For now, there needs to be some benchmark that systematically tests for the things we care about, which is exactly why I deeply respect researchers dedicated to creating better and more useful benchmarks: their work immensely accelerates the field, while they mostly don't get the attention they (IMO) deserve.

31

master3243 t1_izv48yc wrote

This is it: they have a huge context window and they just feed it in.

I've seen discussion on whether they use some kind of summarization to fit more context into the same size model, but there's only speculation in that regard.

In either case, it's nothing we haven't seen in recent papers here and there.
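If that's the mechanism, it's simple enough to sketch. Below is my assumption of how "just feed it in" could work: keep the most recent messages that fit in a token budget (the tokenizer is tiktoken's; the budget number is illustrative, not ChatGPT's actual context size):

```python
# Sketch: naive context management by truncating the oldest messages.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_BUDGET = 4096  # tokens, illustrative

def build_context(history: list[str], new_message: str) -> list[str]:
    """Keep the most recent messages whose total token count fits the budget."""
    kept: list[str] = []
    total = 0
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(history + [new_message]):
        n = len(enc.encode(msg))
        if total + n > CONTEXT_BUDGET:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))
```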

114