HTML has had metadata tags forever; it's just that search engines quickly stopped using them because they were so inaccurate and prone to abuse. Even now, a heavy presence of these kinds of tags is arguably a marker that a website is really interested in its Google ranking, and is probably fairly spammy.

Any sort of description, tagging, keywords, or genre labelling needs third-party vetting to be of any use whatsoever. It's simply too profitable to misrepresent your website for it to be any other way.



Those metadata tags were just a simple textual description and a bunch of keywords with no reference to any controlled vocabulary, which is what made them so easy to abuse. Modern schema-based structured data is vastly different, and with a bit of human supervision (that's the "third-party vetting") it's feasible to tell when a site is lying. (Of course, low-quality bulk content can also be given an accurate description. But that's good for users, who can then more easily filter out that sort of content.)
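
A rough sketch of the difference, with Python dicts standing in for the markup (the EXPECTED table is a tiny hand-picked subset of schema.org properties, purely for illustration, not how any real validator works):

    # Old-style metadata: free-form keywords with nothing to check them against.
    legacy_meta = {"keywords": "cheap flights, best prices, free, casino"}

    # Schema-based structured data: typed properties drawn from a shared
    # vocabulary (schema.org), so a crawler or reviewer can compare the claims
    # against the visible page content.
    structured = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Basic pancakes",
        "recipeIngredient": ["flour", "milk", "eggs"],
        "cookTime": "PT15M",
    }

    # Illustrative-only subset of known properties per type.
    EXPECTED = {"Recipe": {"name", "recipeIngredient", "cookTime"}}

    def plausible(doc: dict) -> bool:
        """Crude check: does the doc only use known properties of its declared type?"""
        expected = EXPECTED.get(doc.get("@type"), set())
        used = {k for k in doc if not k.startswith("@")}
        return bool(expected) and used.issubset(expected)

    print(plausible(structured))  # True: declared type and properties line up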

One could even let this vetting happen in a decentralized fashion, by extending the Web Annotation standards to allow for claims of the sort "this page/site includes accurate/inaccurate structured content."
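
A minimal sketch of what such a claim could look like, written as a Python dict shaped like a W3C Web Annotation (the "assessing" motivation is part of the standard vocabulary; the reviewer URL, target page, and the body value describing structured-data accuracy are made up, exactly the kind of extension proposed above):

    import json

    # Hypothetical third-party vetting claim, modelled on the W3C Web Annotation
    # data model. The creator, target, and body value are invented for illustration.
    claim = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "assessing",
        "creator": "https://vetting.example.org/reviewers/42",
        "target": "https://example.com/some-page",
        "body": {
            "type": "TextualBody",
            "value": "structured-data: accurate",
            "purpose": "assessing",
        },
    }

    print(json.dumps(claim, indent=2))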


The thing is, "a bit of human supervision" is difficult at the scale of ten thousand Wikipedias. It pretty much needs to be done completely automatically.



