
Did you read the article? It references the server-side overhead for shallow clones.


The "server" cost you're referencing is the CI system running a shallow git clone and then brew update on GitHub's CI servers.

That's not how I understood it. Full clones are big but simple: the server just sends all the packfiles. A first shallow clone needs some server work, but that's cacheable, OK.
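A toy illustration of why a fresh shallow clone is cacheable, assuming a hypothetical linear history `HISTORY`: every brand-new client asking for the same tip and depth (as with `git clone --depth=N`) gets an identical answer, so memoizing on `(tip, depth)` is enough. This is a sketch of the caching idea only, not how any Git host actually implements it.

```python
from functools import lru_cache

# Hypothetical linear history, oldest first.
HISTORY = ("c1", "c2", "c3", "c4", "c5")

@lru_cache(maxsize=None)
def fresh_shallow_commits(tip: str, depth: int) -> tuple[str, ...]:
    """Commits a brand-new client receives for a depth-limited clone at `tip`.

    The answer depends only on (tip, depth), never on per-client state,
    so identical requests from many fresh clients hit the cache."""
    end = HISTORY.index(tip) + 1
    return HISTORY[max(0, end - depth):end]

if __name__ == "__main__":
    print(fresh_shallow_commits("c5", 2))  # computed on first request
    print(fresh_shallow_commits("c5", 2))  # served from the cache
    print(fresh_shallow_commits.cache_info())
```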

But on subsequent interactions between the git server and a client that made a shallow clone some time ago, it's AFAIU actually expensive for the server to compute the portion this particular client doesn't yet have.
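A toy model of the incremental case, assuming commits are flattened to a set of blob ids instead of real trees. All names are hypothetical, and this is not Git's actual negotiation or pack-objects algorithm. What it tries to show: the set of objects a shallow client already holds depends on its own haves and shallow boundary, so the server has to work it out per client and per fetch rather than reuse one cached answer.

```python
# Toy model, not Git's real algorithm: each commit maps to
# (parent ids, flattened set of blob ids standing in for its tree).
Graph = dict[str, tuple[tuple[str, ...], frozenset[str]]]

def client_commits(graph: Graph, haves, shallow) -> set[str]:
    """Commits a shallow client holds: reachable from its haves, with the
    walk stopping at (but including) its shallow boundary commits."""
    held, stack = set(), list(haves)
    while stack:
        c = stack.pop()
        if c in held:
            continue
        held.add(c)
        if c not in shallow:
            stack.extend(graph[c][0])
    return held

def objects_to_send(graph: Graph, wants, haves, shallow):
    held = client_commits(graph, haves, shallow)
    # Which blobs the client already has depends on exactly where *its*
    # shallow boundary sits, so this union is computed per client and per fetch.
    held_blobs = set().union(*(graph[c][1] for c in held))
    new_commits, stack = set(), list(wants)
    while stack:
        c = stack.pop()
        if c in new_commits or c in held:
            continue
        new_commits.add(c)
        stack.extend(graph[c][0])
    new_blobs = set().union(*(graph[c][1] for c in new_commits)) - held_blobs
    return new_commits, new_blobs

EXAMPLE: Graph = {
    "a": ((), frozenset({"readme@1"})),
    "b": (("a",), frozenset({"readme@1", "lib@1"})),
    "c": (("b",), frozenset({"readme@2", "lib@1"})),
    "d": (("c",), frozenset({"readme@2", "lib@2"})),
}

if __name__ == "__main__":
    # Client made a depth-1 clone at "c" and now fetches "d".
    print(objects_to_send(EXAMPLE, ["d"], ["c"], {"c"}))
```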

Intuitively, and very hand-wavingly, I suspect things could be improved by:

(1) Clients relaxing "exact depth" requests to "give me approximately N days of history; over-sending is OK", and servers relaxing "minimal traffic" to roughly map time ranges to whole packfiles. This trades extra traffic for less server CPU.

(2) Allowing servers to under-send too (which makes the tradeoffs in (1) easier), with the client asking for the missing parts right away and/or later. This needs on-demand fetching to be transparent to the user. With the "promisor" mechanism in partial clones, this sounds more realistic?

(3) Storing history, trees, and blobs in entirely separate packfiles(?). I suspect recent years' work on bitmaps and multi-pack indexes (MIDX) moves in this direction, only less naively?
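Idea (1) could look roughly like this, under a loudly hypothetical assumption: the server keeps its history in packfiles that each cover a contiguous time range, summarized by the oldest and newest commit time (`Pack` and `packs_for_window` are illustrative names, not Git APIs). A request for "about N days" is answered with every whole pack touching the window, accepting some over-send in exchange for no per-client object enumeration.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds

@dataclass(frozen=True)
class Pack:
    """Hypothetical per-pack summary kept by the server."""
    name: str
    oldest: int  # Unix time of the oldest commit in the pack
    newest: int  # Unix time of the newest commit in the pack

def packs_for_window(packs: list[Pack], now: int, days: int) -> list[str]:
    """Names of whole packs overlapping [now - days, now], oldest first.

    A pack that only partly overlaps the window is still sent in full:
    over-sending is accepted so the server never has to split packs
    or enumerate objects for this particular client."""
    cutoff = now - days * DAY
    hits = [p for p in packs if p.newest >= cutoff and p.oldest <= now]
    return [p.name for p in sorted(hits, key=lambda p: p.oldest)]
```

The design choice is that the answer depends only on the window and the server's pack layout, so it is the same for every client asking for a similar window.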

I'm not saying Git can scale as well as a DB, but I do feel we sat on an effectively frozen Git format and protocol for about a decade, and are now exploring more of the design space, so I hope the future will be less clear-cut...

And specifically, partial clones remove the hard "fully offline vs. centralized" dichotomy we long clung to. Assuming you stay online (necessary anyway if you consider HTTP/DB), things that used to be up-front UX decisions can now be matters of perf tuning!
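A sketch of the "stay online, fetch on demand" model, loosely inspired by promisor remotes in partial clones (e.g. `git clone --filter=blob:none`). `LazyObjectStore` and the `fetch_remote` callback are hypothetical names; real Git does this inside its object database, not through an API like this. Missing objects are fetched in one batched request and cached, so how much to download up front becomes a tuning knob rather than a fixed decision.

```python
from typing import Callable

FetchFn = Callable[[list[str]], dict[str, bytes]]

class LazyObjectStore:
    """Local object store that fetches missing objects on demand.

    Hypothetical sketch loosely modelled on promisor remotes; `fetch_remote`
    stands in for a network round trip to the server."""

    def __init__(self, local: dict[str, bytes], fetch_remote: FetchFn):
        self.local = dict(local)
        self.fetch_remote = fetch_remote
        self.round_trips = 0

    def get_many(self, oids: list[str]) -> dict[str, bytes]:
        # De-duplicate while keeping order, and keep only ids absent locally.
        missing = [o for o in dict.fromkeys(oids) if o not in self.local]
        if missing:
            # One batched request for everything absent, then cache it,
            # so later reads of the same objects need no network.
            self.round_trips += 1
            self.local.update(self.fetch_remote(missing))
        return {o: self.local[o] for o in oids}

    def get(self, oid: str) -> bytes:
        return self.get_many([oid])[oid]
```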

* The most dramatic win is if you had to fetch info from every package's separate repo, as Go did. Then a central DB/caching proxy can build global indexes, unlocking huge wins, no question. It's like the classic "N+1" query problem. However, most examples in the article other than Go involve a single Git repo that already stores a global view (still leaving room for custom indexing and querying).
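A toy count of the round trips in that scenario, assuming hypothetical per-package metadata with a `latest` field (`PerRepoRemote`, `CentralIndex`, and `resolve_per_repo` are illustrative names): resolving N packages from N separate repos costs N requests, while a proxy that has pre-built a global index answers the same question in one.

```python
class PerRepoRemote:
    """Hypothetical host where each package's metadata lives in its own repo."""

    def __init__(self, repos: dict[str, dict[str, str]]):
        self.repos = repos
        self.requests = 0

    def fetch_repo(self, name: str) -> dict[str, str]:
        self.requests += 1  # one round trip per repository
        return self.repos[name]

class CentralIndex:
    """Hypothetical proxy that pre-builds one global index over all repos."""

    def __init__(self, repos: dict[str, dict[str, str]]):
        self.index = {name: meta["latest"] for name, meta in repos.items()}
        self.requests = 0

    def lookup_many(self, names: list[str]) -> dict[str, str]:
        self.requests += 1  # a single round trip answers the whole query
        return {name: self.index[name] for name in names}

def resolve_per_repo(remote: PerRepoRemote, names: list[str]) -> dict[str, str]:
    return {name: remote.fetch_repo(name)["latest"] for name in names}
```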



