PredictorX1 t1_j5h5pb5 wrote on January 22, 2023 at 11:43 PM

Reply to comment by Loquzofaricoalaphar in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar

The biggest technical challenges I see:

Having enough reference samples from known people
The difference between how people write on Reddit and how they write elsewhere (professional articles, e-mail, etc., which would presumably serve as the reference text)
If too many Reddit users are being considered, it may all dissolve into mush (estimated probabilities would all be low)
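
To put a rough number on the "mush" point, here is a minimal sketch under strong assumptions: a uniform prior over candidates, and text evidence that favors the true author over every other candidate by the same likelihood ratio. The function name and the ratio of 100 are hypothetical, chosen only for illustration.

```python
def posterior_true_author(n_candidates: int, likelihood_ratio: float) -> float:
    """Posterior probability of the true author under a uniform prior.

    Assumes (hypothetically) that the evidence is `likelihood_ratio` times
    more likely under the true author than under each other candidate.
    """
    if n_candidates < 1:
        raise ValueError("need at least one candidate")
    # Bayes' rule with equal priors: LR / (LR + number of other candidates)
    return likelihood_ratio / (likelihood_ratio + (n_candidates - 1))


if __name__ == "__main__":
    # Same evidence strength, growing candidate pool.
    for n in (10, 1_000, 100_000, 10_000_000):
        print(n, posterior_true_author(n, likelihood_ratio=100.0))
```

With the evidence strength held fixed, the posterior for the correct person shrinks roughly like 1/N as the pool grows, so every estimated probability ends up low unless the per-author evidence gets much stronger.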

Loquzofaricoalaphar OP t1_j5h6s4z wrote on January 22, 2023 at 11:51 PM

That is interesting to think about. I’m biased toward thinking that text patterns have lots of variables and are fairly unique. Perhaps analyzing them at scale without getting mush is more of a modeling problem than a compute problem.
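
As one illustration of what "text patterns" could mean in practice, here is a minimal sketch that compares two texts by their character trigram counts using cosine similarity. This is only one simple stylometric feature picked for the example, not a method anyone in this thread proposed, and the helper names are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and
    collapsing runs of whitespace into single spaces."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    reddit_post = "Honestly I think that's a pretty interesting idea tbh."
    reference = "Honestly, I think the idea is interesting, to be honest."
    print(cosine(char_ngrams(reddit_post), char_ngrams(reference)))
```

A score like this would have to separate one author from millions of others, across different writing contexts, before it could be called "fairly unique", which is where the modeling difficulty comes in.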