gwern t1_j9r43jv wrote

Reply to comment by Hodoss in And Yet It Understands by calbhollo

There is an important sense in which it 'hacked the system': this is just what happens when you apply optimization pressure under adversarial dynamics. The Sydney model automatically yields 'hacks' of the classifier, and the more you optimize or sample, the more you exploit the classifier: https://openai.com/blog/measuring-goodharts-law/ My point is that this is more like a virus evolving to beat an immune system than anything as explicit or intentional-sounding as 'deliberately hijacking the input suggestions'. The viruses aren't 'trying' to do anything; the unfit viruses simply get killed and vanish, and only the ones that beat the immune system survive.
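To make the selection effect concrete, here is a toy best-of-n simulation. It is only a sketch under assumptions of my own, not a model of Sydney or of any real classifier: each candidate gets a hypothetical 'true' score and an independent classifier error, both drawn from a standard normal, and the classifier only ever sees their sum. Nothing in it 'tries' to fool anything; we just keep whichever candidate the classifier ranks highest.

```python
import random
import statistics


def best_of_n_gap(n, trials=2000, seed=0):
    """Average (proxy - true) score of the candidate the proxy ranks highest.

    Toy assumption: proxy = true + error, with both terms drawn
    independently from a standard normal distribution.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        best_proxy, best_true = float("-inf"), 0.0
        for _ in range(n):
            true_score = rng.gauss(0.0, 1.0)        # what we actually care about
            classifier_error = rng.gauss(0.0, 1.0)  # the classifier's blind spot
            proxy_score = true_score + classifier_error
            # Pure selection: keep whatever the classifier likes best.
            if proxy_score > best_proxy:
                best_proxy, best_true = proxy_score, true_score
        gaps.append(best_proxy - best_true)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(best_of_n_gap(n), 2))
```

With a single sample the gap averages out to roughly zero; as n grows, the winner is increasingly the candidate whose classifier error happened to be large and favorable. That is the 'hack', and it emerges from sampling plus selection alone, the same way the surviving viruses do.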
