Submitted by mvujas t3_zo5imc in MachineLearning
CriticalTemperature1 t1_j0l6du5 wrote
Most people aren't labelling outputs as good or bad, so how do they get any reward or training signal from these beta users?
mvujas OP t1_j0l9aht wrote
That is true, but it's a similar case with crowdsourcing: they have some clever techniques there, such as honeypots and weighted expertise scores (or whatever they are called), to make the most of the data. But I would even argue that continuing a conversation is a form of positive feedback, as is coming back to the website.
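To illustrate the crowdsourcing tricks mentioned above, here's a minimal sketch of how honeypots and expertise weighting can work together. Everything here is hypothetical (the function names, the label format, the neutral default weight of 0.5); it's not what OpenAI actually runs, just the general idea: score each annotator by accuracy on items with known answers, then weight their votes accordingly.

```python
from collections import defaultdict

def honeypot_weights(annotations, gold):
    """Score each annotator by accuracy on honeypot items, i.e. items
    whose true label is known in advance. annotations is a list of
    (worker, item, label) tuples; gold maps honeypot item -> true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, item, label in annotations:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w: correct[w] / total[w] for w in total}

def weighted_vote(annotations, weights, default=0.5):
    """Aggregate labels per item, weighting each vote by the annotator's
    honeypot accuracy. Workers with no honeypot history get a neutral
    default weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for worker, item, label in annotations:
        scores[item][label] += weights.get(worker, default)
    return {item: max(labels, key=labels.get) for item, labels in scores.items()}
```

With this scheme, a worker who fails the honeypots contributes almost nothing, so a small number of reliable annotators can outvote a larger number of careless ones.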
Nameless1995 t1_j0lri08 wrote
I just had a thought: the "try again" button itself can be used as feedback (a noisy signal that the user didn't like the earlier version). Moreover, if a user switches back to an earlier sample, that's another signal (the earlier version being preferred). They can get a lot of data from these. I expect users to use "try again" far more frequently than the upvote/downvote buttons.
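The idea above maps naturally onto the pairwise preference data used in RLHF-style reward modeling. A hypothetical sketch (the function and the event format are made up for illustration, not anything confirmed about ChatGPT's pipeline): treat whichever sample the user settled on, whether the last one generated or an earlier one they switched back to, as noisily preferred over every sample they passed over.

```python
def preference_pairs(samples, final_choice):
    """Turn one turn's regenerated samples into (preferred, rejected)
    pairs. samples: responses in generation order (each "try again"
    appends one). final_choice: index of the sample the user ended the
    turn on. Every other sample was implicitly passed over, yielding a
    noisy preference pair suitable for reward-model training."""
    chosen = samples[final_choice]
    return [(chosen, s) for i, s in enumerate(samples) if i != final_choice]
```

So three regenerations ending on the last sample yield two pairs for free, which is exactly why this signal would dwarf explicit upvotes/downvotes in volume.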
Aggravating-Act-1092 t1_j0lhqqx wrote
I’d agree. You could probably even ask ChatGPT to review the follow-up someone gives it and assign a score based on that.
Personally, if it gives me buggy code, I point it out and try to fix it, for example; that's a clear negative signal. I also sometimes write "thank you" to it when I'm happy with its answer.
fimari t1_j0rpmx0 wrote
Probably the same way Google detects good search results: people stop searching when the result is good, and people stop fiddling around once they have what they want.
mvujas OP t1_j0llkjy wrote
Oh, just reading the answer again, there is actually a feedback button in the top right corner of each answer. But I would assume that even if only a small percentage of users use this button, it still ends up costing less than paying people to do this manually.
humanbeingmusic t1_j0nc16t wrote
Was thinking the same thing; that's a reinforcement signal for sure, and there's plenty of other data to draw implicit signals from.
mettle t1_j0mrkqp wrote
Lots of implicit signals to look at based on what the user does afterwards.
30katz t1_j0m3iuj wrote
Just analyzing questions and gleaning what could be going on would be a gold mine
I’m sure Google can come up with a lot of very profitable metrics
RandomIsAMyth t1_j0s0ydc wrote
I don't think that's right. Human inputs are great training signals. Fine-tuning ChatGPT on them (basically trying to predict what the human would have said) has pretty high value.
They are running ChatGPT for something like $100k a day but getting millions of data points in return. They clearly think the data they get is worth that $100k. A new version will come soon, and they will probably be able to make better and better training data out of this crowdsourcing experiment.
If supervised learning is the way to go, make the labelling effort as large as possible: for free, on the simplest website ever. I think they nailed it.
ChuckSeven t1_j0u2grw wrote
> But I would even argue that continuing a conversation is a form of positive feedback or even coming back to the website
It is way cheaper to take real conversations and have a crowdworker label each one as a good or a bad conversation.