
PredictorX1 t1_j5h5pb5 wrote

The biggest technical challenges I see:

  1. Having enough reference samples from known people
  2. The difference between how people write on Reddit and how they write elsewhere (professional articles, e-mail, etc., which would presumably be used as reference)
  3. If too many Reddit users are being considered, it may all dissolve into mush (estimated probabilities would all be low)
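The "mush" effect in point 3 can be made concrete with a small sketch. It assumes (this is not stated in the thread) that an attribution model gives each candidate user a raw score and normalizes the scores with a softmax. The helper `true_author_probability` is hypothetical: the true author scores a fixed `margin` above every other candidate, and only the number of candidates changes.

```python
import math

def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def true_author_probability(n_candidates, margin=2.0):
    """Hypothetical setup: the true author scores `margin` above each of the
    other n_candidates - 1 users, who all share the same score of 0."""
    scores = [margin] + [0.0] * (n_candidates - 1)
    return softmax(scores)[0]

# Same score advantage, growing candidate pool.
for n in (10, 1_000, 100_000):
    print(n, round(true_author_probability(n), 4))
```

With a fixed margin the true author's probability is e^margin / (e^margin + N - 1), which falls roughly as 1/N once the pool is large. The model can still rank the right user first while every estimated probability ends up low.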

Loquzofaricoalaphar OP t1_j5h6s4z wrote

That is interesting to think about. I’m biased toward thinking that text patterns involve lots of variables and are fairly unique to each writer. Perhaps analyzing them at scale without getting mush is more of a modeling problem than a compute problem.
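One common way to turn "text patterns" into many variables is to count character n-grams, treat each distinct n-gram as one feature, and compare writers by cosine similarity. The sketch below assumes that representation. It is not a method anyone in the thread proposed, and the helper names `char_ngram_profile` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
import math

def char_ngram_profile(text, n=3):
    """Count overlapping lowercase character n-grams as style features."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

reference = char_ngram_profile("I'm biased to think text patterns are fairly unique.")
comment = char_ngram_profile("I tend to think writing patterns are fairly unique.")
print(round(cosine_similarity(reference, comment), 3))
```

Even one sentence yields dozens of distinct trigrams, so the feature space grows quickly with more text. Whether those features separate thousands of users, rather than collapsing into similar scores, is the modeling question raised above.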
