
Have they ever asked the customers why they prefer scraping over the data deltas?


I would bet the answer is that it is easier to write a script that simply downloads everything it can (foreach <a href=>: download and recurse) than to look into which sites provide data dumps and how to use them.
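For concreteness, a minimal sketch of that kind of naive crawler (the same-origin restriction and URL handling here are my own illustrative choices, not from any particular scraper):

    # Naive "download everything reachable" crawler sketch.
    # Assumes requests and beautifulsoup4 are installed.
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url: str) -> dict[str, str]:
        """Fetch every same-origin page reachable from start_url."""
        origin = urlparse(start_url).netloc
        seen: set[str] = set()
        pages: dict[str, str] = {}
        stack = [start_url]
        while stack:
            url = stack.pop()
            if url in seen:
                continue
            seen.add(url)
            resp = requests.get(url, timeout=10)
            if resp.status_code != 200:
                continue
            pages[url] = resp.text
            # foreach <a href=...>: resolve, and queue same-origin links
            for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                if urlparse(link).netloc == origin and link not in seen:
                    stack.append(link)
        return pages

Ten lines of logic and no reading of anyone's docs, which is exactly the appeal.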


So the solution would be an edge-cached site, structured exactly like the full site, serving just the deltas since a periodic checkpoint?

The crawler still crawls, but can rest assured it has all the info from base+delta, as if it had recrawled everything?
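Something like this on the consumer side, perhaps. The /deltas endpoint, its "since" parameter, and the response shape are hypothetical, just to illustrate the base+delta idea:

    # Hypothetical delta-sync client. The endpoint, the "since" cursor,
    # and the JSON shape {"pages": [...], "deleted": [...], "cursor": ...}
    # are illustrative assumptions, not a real API.
    import requests

    def sync_deltas(base_url: str, mirror: dict[str, str], since: float) -> float:
        """Apply changes since `since` to an existing full mirror; return new cursor."""
        resp = requests.get(f"{base_url}/deltas", params={"since": since}, timeout=10)
        resp.raise_for_status()
        delta = resp.json()
        for page in delta["pages"]:      # pages added or modified since the cursor
            mirror[page["url"]] = page["html"]
        for url in delta["deleted"]:     # pages removed since the cursor
            mirror.pop(url, None)
        return delta["cursor"]           # pass this as `since` on the next sync

The catch is that the crawler has to trust the deltas are complete; one missed change and the mirror silently diverges until the next full crawl.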



