OutOfBananaException t1_jadycev wrote

They have pretty well stated they can't scrape English data, as it has too much Western bias for their liking. They may be able to filter it, but as we've seen with ChatGPT, it's not straightforward, and things will fall through the cracks. That makes life difficult for censors.

In domains outside of text, where they have access to large volumes of data that doesn't need heavy curation, they should be able to do fine.
