
As someone who runs the infrastructure for a large OSS project: it's mostly Chinese AI firms. All the big name-brand AI firms play reasonably nice and respect robots.txt.

The Chinese ones are hyper-aggressive, with no rate limiting and pure greedy scraping. They'll scrape the same content hundreds of times in the same day.
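For reference, the "playing nice" behavior described above roughly amounts to checking robots.txt before fetching and throttling your own request rate. A minimal sketch in Python, assuming a hypothetical crawler name and placeholder site (the user agent, URLs, and fallback delay are illustrative, not any particular firm's actual crawler):

    import time
    import urllib.request
    from urllib import robotparser

    USER_AGENT = "ExampleResearchBot/1.0"   # hypothetical crawler name
    SITE = "https://example.com"            # placeholder site

    # Fetch and parse the site's robots.txt once up front.
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{SITE}/robots.txt")
    rp.read()

    # Honor an explicit Crawl-delay if one is declared, else default to 1 req/sec.
    delay = rp.crawl_delay(USER_AGENT) or 1.0

    def polite_fetch(path: str) -> bytes | None:
        """Fetch a path only if robots.txt allows it, then back off."""
        url = f"{SITE}{path}"
        if not rp.can_fetch(USER_AGENT, url):
            return None  # disallowed by robots.txt; skip instead of scraping anyway
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        time.sleep(delay)  # self-imposed rate limit between requests
        return body

The aggressive behavior being complained about is the opposite: ignoring the robots.txt check and the sleep entirely, and re-fetching the same URLs in tight loops from many addresses.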

The Chinese are also sloppy. They will run those scrapers until they get banned and not give a fuck.

In my experience, they don't bother putting in the effort to obfuscate their source or evade bans in the first place. They might try again later, but this particular setup was specifically engineered for resiliency.


Is this an example of that "chabuduo" we read about now and then?

Chinese AI firms have been making a large number of requests over the past few weeks.

How is this showing up for you? On a site you host, or at a bigger scale? I'm not surprised, just curious.


