master3243 t1_jdzec5r wrote
Reply to comment by hadaev in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Seeing how they made sure the bar exam and math olympiad tests were recent ones, explicitly stated not to be in the training dataset to avoid contamination, I trusted that all the other reported tests were picked just as carefully.
master3243 t1_jdue84p wrote
Reply to comment by BullockHouse in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
> The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.
That's a very important point: adding an extra condition (if 'p' then 'q') to the prompt biases the model towards doing 'p' and then 'q' to fulfil the prompt, even though the condition would still be satisfied if it simply avoided doing 'p'.
For a more concrete example, here's me asking ChatGPT to write two essays:
1- Write a paragraph about zoos. [Figure: notice how no elephants are mentioned.]
2- Write a paragraph about zoos with an (if 'p' then 'q') condition; a sketch of both prompts follows below. [Figure: notice how only this answer mentions elephants.]
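Here's a minimal sketch of the two prompts, assuming the legacy `openai` Python chat API and `gpt-3.5-turbo`; the exact wording of the conditional prompt is my reconstruction, not the text from the screenshots:

```python
import openai

def ask(prompt):
    # One-shot chat completion; the model choice is an assumption for illustration.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

# 1) Plain prompt: the model rarely brings up elephants on its own.
print(ask("Write a paragraph about zoos."))

# 2) Conditional (if 'p' then 'q') prompt: the extra clause biases the model
#    towards doing 'p' (mentioning elephants) just so it can satisfy 'q'.
print(ask("Write a paragraph about zoos. "
          "If you mention elephants, include a fun fact about them."))
```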
master3243 t1_jdudizk wrote
Reply to comment by Borrowedshorts in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Who needs statistical tests with theoretical grounding and justified/repeatable results when you've got LLMs™
master3243 t1_jdlhj77 wrote
Reply to comment by __Maximum__ in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Knowing that a lot of text from Reddit comments ends up in these huge datasets, only for the resulting models to be made completely closed source, rubs me the wrong way.
master3243 t1_jdlhb8c wrote
I have a theory that the main reason OpenAI decided to start keeping its training and architectural details private is that, through minor modifications to training data and data augmentation, they were able to gain significant improvements in the qualitative output of GPT.
Thus any competitor could replicate the pipeline with ease and reproduce the improvements, so they decided to keep it as a trade secret.
Glad more research like this is being done and shared with the rest of the community.
master3243 t1_jccfe2n wrote
Reply to [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
As a person who heavily relies on both CLIP (released 2 years ago) and Whisper (released just 6 months ago) in his research, I would disagree with the claim that "open research in AI [is] gone".
In addition, I've needed to run the usual benchmarking to compare my own work with several other models and was quite surprised when I was able to run my full benchmark on GPT-3 solely using the free credit provided by them.
Don't get me wrong, I criticize OpenAI for not completely sticking to the mission they built their foundation on (I mean, it's literally in the name, FFS), but I wouldn't say they've completely closed off research from the public.
master3243 t1_j9evdjy wrote
Reply to [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
It's not research-paper-worthy IMO. You'd be writing a paper heavily dependent on the hidden prompt that Microsoft won't let you see, and also on whatever criteria they use to decide when to end the conversation. Neither of those is scientifically interesting.
But as always, feel free to write blog posts about these investigations (I'd even be interested in reading them); I just don't think there's a scientific contribution here.
master3243 t1_j92z98d wrote
Reply to comment by ilovethrills in [D] Please stop by [deleted]
LaMDA firing back after Bing called Google "the worst and most inferior chat service in the world."
master3243 t1_j91xkeo wrote
There's way more to computer vision than what you listed.
Long-form video understanding is still incredibly limited. Compared to the current SOTA capabilities of LLMs to understand very long text, and the various advancements in text summarization, video understanding seems to have an incredibly long way to go.
Our current models can understand relatively simple actions (sitting/standing/dancing), but compared to text, we want to reach a level where we can understand entire scenes in a movie, or maybe even an entire movie, although that's more of a fantasy currently. Not to mention the 3D input (instead of a 2D projection), which adds extra complexity.
master3243 t1_j91w1my wrote
Reply to comment by Borrowedshorts in [D] Please stop by [deleted]
Not everything that involves chatgpt belongs in this sub.
master3243 t1_j918xav wrote
Reply to [D] Please stop by [deleted]
Agreed, I would prefer posts about SOTA research, big/relevant projects, or news.
master3243 t1_j8wunw4 wrote
Reply to [N] Google is increasing the price of every Colab Pro tier by 10X! Pro is 95 Euro and Pro+ is 433 Euro per month! Without notifying users! by FreePenalties
I haven't seen any evidence that they changed the price. It's still $10/month for me.
You can check for yourself right here: https://colab.research.google.com/signup
master3243 t1_j8bg2wl wrote
It would be amazing if this supported the `whisper` package directly, instead of Whisper via `transformers` from Hugging Face. (I know this isn't just for Whisper, but I really do need to speed up Whisper.)
master3243 t1_j7tmpsz wrote
Reply to comment by DigThatData in [D] Are there emergent abilities of image models? by These-Assignment-936
Exactly, the CLIP part at the beginning of the DALL·E model is trained to take any English text and map it to an embedding space.
It's completely natural (and it would probably be surprising if it didn't happen) that CLIP maps (some) gibberish words to a part of the embedding space that is sufficiently close, in L2 distance, to the projection of a real word.
In that case, the diffusion model would decode that gibberish word into an image similar to one generated by the real word.
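As a minimal sketch of that intuition, assuming the Hugging Face `transformers` CLIP implementation; the gibberish string is the oft-cited "hidden vocabulary" example, and whether it actually lands near "birds" for this particular checkpoint is an empirical question:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A real phrase vs. an alleged "hidden vocabulary" gibberish string.
texts = ["a photo of birds", "Apoploe vesrreaitais"]
inputs = processor(text=texts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)

# If this L2 distance is small, a diffusion decoder conditioned on these
# embeddings would plausibly render similar images for both strings.
print(torch.dist(emb[0], emb[1]).item())
```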
master3243 t1_j67mcbh wrote
Reply to comment by currentscurrents in [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
> A variant paper, Predictive Forward-Forward
Interesting, I'll have to read it at a more convenient time.
Do share your results if they are promising/fruitful.
master3243 t1_j67jwad wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
Hinton says that it does not generalize as well on the toy problems he investigated. An algorithm not doing well even on toy problems is usually not a good sign. I predict that unless someone discovers a breakthrough, it will remain worse than backprop despite running faster (since it avoids the bottlenecks you mentioned).
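For reference, here is a minimal sketch of what one layer-local forward-forward update looks like; the threshold, learning rate, and class names are illustrative assumptions following Hinton's 2022 paper, not any official code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    """One layer trained locally with the forward-forward 'goodness' objective."""

    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold  # goodness margin, an illustrative value
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so a layer can't inflate goodness trivially.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations; push it above the threshold
        # for positive (real) data and below it for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = F.softplus(
            torch.cat([self.threshold - g_pos, g_neg - self.threshold])
        ).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach outputs so no gradient crosses layer boundaries: each layer
        # learns from local information only, which is the whole point of FF.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```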
master3243 t1_j62aoln wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
You're right, they've been around for 5 years (and the idea of attention even longer), but almost every major conference still has new papers coming out giving more insight into transformers (and sometimes into algorithms/methods older than that).
I just don't want to see titles flooded with terms like "secretly" or "hidden" or "mysterious"; I feel it replaces scientific terms with less scientific but more eye-catching ones.
Again, I totally understand why they chose this phrasing, and I probably would too, but in a blog post title, not a research paper title.
But once again, the actual work seems great and that's all that matters really.
master3243 t1_j61wtpt wrote
Reply to [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
This is great work in collaboration with Microsoft Research. I'll have to do more than just read the abstract and quickly skim it.
My only slight annoyance is the word "secretly" in the title; I feel "implicitly" would be a better word, and less clickbait-y.
master3243 t1_j20v8i9 wrote
Reply to comment by lorepieri in [D] DeepMind has at least half a dozen prototypes for abstract/symbolic reasoning. What are their approaches? by valdanylchuk
> The reality is that AGI has become (always been?) an engineering problem.
I would not state this as fact; I'd say the majority of neuroscientists and cognitive scientists would disagree with this (or say we don't know yet), and a fair number of AI researchers would too.
I doubt any but a few researchers would be comfortable saying "Yes it's definitely an engineering problem".
master3243 t1_j153av3 wrote
Reply to comment by MangoMo3 in [N] Point-E: a new Dalle-like model that generates 3D Point Clouds from Prompts by RepresentativeCod613
There was something like that posted here a couple of months ago (although using Stable Diffusion instead of Dalle) https://www.reddit.com/r/MachineLearning/comments/xxmdbv/p_stabledreamfusion_a_working_implementation_of/irhlkct/
master3243 t1_j152mmj wrote
Reply to [N] Point-E: a new Dalle-like model that generates 3D Point Clouds from Prompts by RepresentativeCod613
The abstract puts this project into perspective: their method is much faster but still doesn't beat the state of the art.
> While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e
master3243 t1_j12nmgc wrote
Reply to comment by Purplekeyboard in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
There's no way for a paper to just have a table of "real-world comparisons with GPT-3";
there needs to be (for now) some benchmark that systematically tests for the things we care about. Which is exactly why I deeply respect researchers dedicated to creating better and more useful benchmarks: their work immensely accelerates the field, while they mostly don't get the attention they (IMO) deserve.
master3243 t1_izxkwzt wrote
Reply to comment by zzzthelastuser in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
True, using the embedding from an LLM as a summary of the past for the same LLM is a technique I've seen done before.
master3243 t1_izv48yc wrote
Reply to comment by patient_zer00 in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
This is it: they have a huge context window and they just feed the conversation in.
I've seen discussion on whether they use some kind of summarization to fit more context into the same-size model, but there's only speculation in that regard.
In either case, it's nothing we haven't seen in recent papers here and there; a sketch of the speculated summarization approach is below.
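A minimal sketch of that speculated recipe, keeping recent turns verbatim and folding older ones into a running summary; the model name, turn budget, and summarization prompt are all assumptions, not anything OpenAI has confirmed:

```python
import openai

MAX_RECENT_TURNS = 6  # illustrative budget for verbatim history

def build_context(history, summary):
    """history: list of raw dialogue turns (strings); summary: running summary so far."""
    recent, older = history[-MAX_RECENT_TURNS:], history[:-MAX_RECENT_TURNS]
    if older:
        # Fold the turns that no longer fit into the running summary.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Briefly summarize this dialogue:\n"
                           + summary + "\n" + "\n".join(older),
            }],
        )
        summary = resp["choices"][0]["message"]["content"]
    # Prepend the compressed past, then the verbatim recent turns.
    messages = [{"role": "system",
                 "content": "Summary of the conversation so far: " + summary}]
    messages += [{"role": "user", "content": turn} for turn in recent]
    return messages, summary
```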
master3243 t1_jeh48sn wrote
Reply to comment by [deleted] in [News] Twitter algorithm now open source by John-The-Bomb-2
I don't take any CEO's words at face value without considering the monetary value and incentives behind them.
A large project like this being open-sourced, even if it's a very old or heavily stripped down version, is always a great thing for the community.