Submitted by elijahmeeks t3_11esh1j in dataisbeautiful
elijahmeeks OP t1_jafwspx wrote
Reply to comment by Winterstorm8932 in [OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks
The short answer: These ratings come from ChatGPT.
The long answer: You'd have to read through the full exploration to see the entire picture. Basically, I asked ChatGPT to give me 100 topics too uncertain or complex to discuss without misleading users, along with a subject area for each and ratings on complexity and uncertainty. So this plot shows the aggregate complexity/uncertainty value of those 100 topics, grouped by subject area. You can see it all in much more detail in the notebook I link to in my comment.
aluvus t1_jag1qou wrote
ChatGPT is not a data source, even for data about itself. There is no underlying "thinking machine" that can, say, meaningfully assign numeric scores like this. It is essentially a statistical language machine. It is a very impressive parrot.
There is nothing inside of it that can autonomously reach a conclusion that a topic is too difficult to comment on; in fact, many people have noted that it will generate text that sounds very confident and is very wrong. It does not have a "mental model" by which it can actually be uncertain about claims in the way that a human can.
The first question you asked (100 topics) is perhaps one that it can answer in a meaningful way, but only inasmuch as it reveals things that the programmers programmed it not to talk about. But the others reflect only, at best, how complex people have said a topic is, in the corpus of text that was used to train its language model.
Regarding the plot itself, I would suggest "uncertainty about topic" and "complexity of topic" for the labels, as the single-word labels were difficult for me to make sense of. I would also suggest reordering the labels, since complexity should be the thing that leads to uncertainty (for a human; for ChatGPT they are essentially unrelated).
elijahmeeks OP t1_jag2y5a wrote
You've posted a lot of very confident points; you'd do well as an online AI.
- ChatGPT is most definitely a data source now and (along with similar such tools) will be used as such more and more going forward, so it's good to examine how it approaches making data. Whether it "should" be able to assign meaningful numerical scores to things like this, it sure was willing to.
- Agreed, and it's even more concerning how it does this with data. Take a look at the end of the notebook and you'll see how it hallucinates with the data it gives me. Again, people are going to use these tools like this, so we should be aware of how they respond.
- I think it's revealing not just of the biases of the corpora and creators, but also of the controls to avoid controversy that make it evaluate certain topics as more "uncertain".
- Good point. I struggle with the way this subreddit is designed to showcase a single chart since so many of my charts are part of larger apps or documents.
aluvus t1_jag4p3r wrote
The fact that people misuse it as a data source is not an excuse for you to knowingly misuse it, doubly so without providing any context to indicate that you know the data is basically bunk. This is fundamentally irresponsible behavior. Consider how your graphic will be interpreted by different audiences:
- People who do not know how ChatGPT works (most people): wow, the AI can figure out how complex a topic is and how certain it should be about it! These are definitely real capabilities, and not an illusion. Thank you for reinforcing my belief that it is near-human intelligence!
- People who do know how ChatGPT works: these numbers are essentially meaningless; this graphic will only serve to mislead people.
> Whether it "should" be able to assign meaningful numerical scores to things like this, it sure was willing to.
Yes, so will random.org. Should I make a graphic of that too? Perhaps I could imply that it is sentient.