
Cool effort! Does seem like a lot of thinking went into this. But, a few points:

"Through trial and error (and, of course, pattern-fitting) we crafted a scoring system that could correctly identify over 65% of the spam accounts."

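65% is not actually very good for a binary classifier. Note also that it's a detection rate on known spam accounts (i.e. recall), so on its own it says nothing about how many real accounts get flagged by mistake.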
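To make the 65% point concrete, here's a rough sketch with made-up counts (nothing here comes from the article; the function name and all numbers are hypothetical). Two classifiers can both catch 65% of spam accounts and still differ a lot in how many real accounts they misflag:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Basic binary-classifier metrics from raw confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),            # share of spam accounts caught
        "precision": tp / (tp + fp),         # share of flagged accounts that are spam
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical test set: 1000 spam accounts and 1000 real accounts.
# Both classifiers catch 650 of the 1000 spam accounts (65% recall).
careful = confusion_metrics(tp=650, fp=50, fn=350, tn=950)
sloppy = confusion_metrics(tp=650, fp=400, fn=350, tn=600)

for name, m in (("careful", careful), ("sloppy", sloppy)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Same "65% identified", very different false-positive rates, which is why the number alone doesn't tell you much.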

"Applying this model to the ~44K random, recently-active accounts provided Followerwonk produces a quality score for each account, visualized below:"

Many real Twitter users are likely not "active" beyond reading stuff, so they would never show up in a sample of recently-active accounts. Spam/fake accounts, on the other hand, would all be active, so this methodology would clearly overestimate their share.
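A quick back-of-the-envelope version of that sampling bias, with invented numbers (the counts, the 40% figure, and the assumption that every fake account counts as active are all hypothetical, just for illustration):

```python
def fake_share(fake, real):
    """Fraction of accounts that are fake in a given group."""
    return fake / (fake + real)

# Hypothetical population: 100 fake accounts, 900 real ones.
fake_accounts = 100
real_accounts = 900

# Assumption: every fake account is "recently active", but only 40% of
# real users are, since the rest only read.
real_active_fraction = 0.4
real_active = real_accounts * real_active_fraction

share_all = fake_share(fake_accounts, real_accounts)
share_active_only = fake_share(fake_accounts, real_active)

print(f"fake share, all accounts:        {share_all:.1%}")
print(f"fake share, active-only sample:  {share_active_only:.1%}")
```

Under these assumptions the active-only sample roughly doubles the apparent fake share compared to the whole population.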

Also, this is an important point:

"The other potential critique is our spam/fake follower calculation methodology. Because we crafted it in 2018, based off sample sets of purchased spam accounts, it’s likely that more sophisticated spammers and fake accounts go unidentified by our system"

The features collected are certainly outdated by now.


