Submitted by Loquzofaricoalaphar t3_10ixiu6 in MachineLearning
Obviously nation states can already pretty comprehensively identify people using other methods, even on Tor and such, because of user error. But if your average home user can quickly do this using text alone, what will the implications be for the web?
-
I am assuming that it is currently possible to feed a model a bunch of text written by “Bobby”, then put a specific post into the model and get a confidence stat that it was written by Bobby.
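To make that setup concrete, here is a minimal sketch of the "known text in, similarity score out" idea, assuming a very simple stylometric stand-in (character trigram counts compared with cosine similarity) rather than an actual language model. The function names are hypothetical, and the score is only a rough similarity in the range 0 to 1, not a calibrated probability of authorship.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (lowercased, whitespace collapsed)."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # nothing to compare against
    return dot / (norm_a * norm_b)


def style_similarity(known_texts: list[str], unknown_post: str, n: int = 3) -> float:
    """Build one n-gram profile from an author's known texts and compare a post to it.

    Assumption: character n-grams are used because they pick up spelling
    habits and punctuation quirks; a real system would need far more text
    and far better features than this.
    """
    profile = Counter()
    for text in known_texts:
        profile.update(char_ngrams(text, n))
    return cosine(profile, char_ngrams(unknown_post, n))


if __name__ == "__main__":
    bobby_posts = ["i definately think its fine tbh", "tbh its definately not that bad"]
    print(style_similarity(bobby_posts, "its definately a problem tbh"))
```

With only a few short posts the number is mostly noise, which is why the amount of known text per author matters so much here.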
-
Would it be possible in the future, with better models and a lot more compute, to use non-anonymous data from all of Facebook or the wider internet to quickly scan pseudo-anonymous places like Reddit, Twitter, or even something truly anonymous like the dark web, and return a list of probable authors?
I’m assuming people who are seeking true anonymity already put their text through paraphrase models or just write very blandly.
I am using the word “mask” instead of “anonymous” because Reddit seems more like obfuscation than potential true anonymity, such as you might get on some Tor forum with a sophisticated user.
It is interesting to think that all the subtle errors and invisible algorithmic choices of the human brain could be trivial for a machine to identify, given a sufficiently capable natural language model that can translate the text and incorporate pattern matching.
Edit: I mean a noisy probability stat, not an assurance that x was written by y. More like a 75% match to Bobby and a 32% match to Sally. Matching on errors, flow, and unusual word choices, so more advanced than just a plagiarism detector.
[deleted] t1_j5h4cp9 wrote
[deleted]