There seems to be a fundamental misunderstanding with a lot of these writeups. A...

mxmlnkn · on April 3, 2024

Even history rewrites would be visible in GitHub's new Activity tab, e.g., see the two force-pushes in llama.cpp (https://github.com/ggerganov/llama.cpp/activity). So while, yes, git history can be rewritten, commits pushed to GitHub can effectively never be deleted. Personally, I find this to be a downside (think personal information, etc.), but in this case it is helpful. Of course, the repository is suspended right now, so its Activity tab cannot be checked.

azornathogron · on April 3, 2024

While it's certainly possible to rewrite git history, it's tricky to do it without other maintainers or contributors noticing, since anyone trying to pull into an existing local repo (rather than cloning fresh) would be hit with an unexpected non-fast-forward merge.

It seems likely to me that Lasse Collin would have one or more long-standing local working copies.

So IMHO, injecting malicious changes back in time into the git history is unlikely, though not strictly impossible.

KyleSanderson · on April 3, 2024

Based on how this has gone (remember, xz has effectively been orphaned for years, and most long-standing setups were using the release archives), unless Lasse has never run any code from Jia (unlikely), I'd consider the entire machine untrusted (keys, etc.). Provided the tarballs from that date are still signed and available from another immutable source, that's really the only starting point for rebuilding.

pdw · on April 3, 2024

In any case Debian has its own archive of every xz-utils version they've used in the past.

rkta · on April 3, 2024

The attacker had access to the GH mirror of the repo. The original repo remained at https://git.tukaani.org/

fl7305 · on April 3, 2024

> Are they 100% sure history was not rewritten at any point?

With git, one way to check is if other people still have clones of the xz repository from a time when it was trusted.

If you suspect the repo history has been tampered with, you can check against those copies.

I believe it would be hard to introduce such a history rewrite, since people pulling from the xz repo would start getting git error messages when things don't match up?

To what degree could intentional SHA-1 hash collisions be used to work around that?

dist-epoch · on April 3, 2024

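You can create pairs of colliding SHA-1 inputs, but not a new input that matches a particular existing SHA-1 hash (such as a git commit ID). The latter is a second-preimage attack, and no practical one is known for SHA-1.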
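To show the gap at toy scale: a generic collision search only needs *any* two inputs with equal hashes (birthday bound, roughly 2^(n/2) tries), while forging a match for one fixed, already existing hash (a second preimage) needs roughly 2^n. The sketch below truncates SHA-1 to a hypothetical 16 bits (`BITS`) so both searches finish instantly; the helper names are made up for illustration, and real SHA-1 collision attacks use cryptanalysis rather than this brute force.

```python
import hashlib
from itertools import count

BITS = 16  # hypothetical toy size: keep only the top 16 bits of SHA-1

def toy_hash(data: bytes) -> int:
    """Top BITS bits of SHA-1(data), as an integer."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - BITS)

def find_collision() -> tuple[bytes, bytes, int]:
    """Birthday-style search: any two distinct inputs with equal toy hashes.
    Pigeonhole guarantees success within 2**BITS + 1 tries."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = f"msg-{i}".encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_second_preimage(target: bytes, limit: int = 1 << 24) -> tuple[bytes, int]:
    """Search for a different input matching one fixed, existing toy hash."""
    goal = toy_hash(target)
    for i in range(limit):
        msg = f"forged-{i}".encode()
        if msg != target and toy_hash(msg) == goal:
            return msg, i + 1
    raise RuntimeError("no second preimage found within limit")

a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"existing commit object")
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```

On average the first search needs a few hundred tries and the second tens of thousands; for full 160-bit SHA-1 the generic costs are about 2^80 and 2^160.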

AtNightWeCode · on April 3, 2024

People think git is immutable. It is not.

Lichtso · on April 3, 2024

Yes and no.

A local Git repo can be changed (including its history) however you please. But once you have shared it with others, you can't take that back. If you try to, then others will notice that the hashes mismatch and that their HEAD diffs uncleanly.

I know the term is infamous here, but Git is essentially a blockchain. Each commit's hash covers its content and the hashes of its parent commits, so it transitively depends on all earlier commits, forming a hash-linked list (plus some DAG branching).
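A minimal sketch of that hash linking with nothing but `hashlib`: git names an object by the SHA-1 of a `<type> <size>\0` header followed by the object body, and a commit body lists its tree ID and parent IDs. The fixed author identity and timestamp, the use of the empty tree for every commit, and the helper names are simplifying assumptions; real commits can carry extra headers such as signatures.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over git's object header '<type> <size>\\0' plus the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = git_object_id("tree", b"")

def commit_id(tree: str, parents: list[str], message: str,
              ident: str = "A U Thor <author@example.com> 1700000000 +0000") -> str:
    """Hypothetical helper: build an unsigned commit body and hash it."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # the link to earlier history
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return git_object_id("commit", body)

def build_chain(messages: list[str]) -> list[str]:
    """Linear history: each commit's only parent is the previous commit."""
    ids: list[str] = []
    parents: list[str] = []
    for msg in messages:
        cid = commit_id(EMPTY_TREE, parents, msg)
        ids.append(cid)
        parents = [cid]
    return ids

original = build_chain(["first", "second", "third"])
tampered = build_chain(["first (edited)", "second", "third"])
# Editing only the first commit changes every descendant ID as well.
for old, new in zip(original, tampered):
    print(old[:12], new[:12], old != new)
```

Anyone holding the old IDs can therefore spot a rewrite anywhere upstream of a commit they already have.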

The_Colonel · on April 3, 2024

> If you try to, then others will notice that the hashes mismatch and that their HEAD diffs uncleanly.

So it relies on a human noticing and acting upon it. People not noticing backdoors being merged into the project is kinda the source of this problem.

fl7305 · on April 3, 2024

You can automate checks for whether a large part of the previous git history suddenly changed.

You can't automate checks for malicious code.
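A minimal sketch of such a check, assuming the checker remembers the last commit it saw on the main branch: an update is a plain fast-forward only if that old tip is still an ancestor of the new tip. The commit graph here is a hypothetical in-memory dict of commit ID to parent IDs; against a real clone, `git merge-base --is-ancestor OLD NEW` performs the same test through its exit status.

```python
from collections import deque

def is_ancestor(old: str, new: str, parents: dict[str, list[str]]) -> bool:
    """True if `old` is reachable from `new` by following parent links (BFS)."""
    queue, seen = deque([new]), {new}
    while queue:
        commit = queue.popleft()
        if commit == old:
            return True
        for p in parents.get(commit, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return False

def check_branch_update(branch: str, old_tip: str, new_tip: str,
                        parents: dict[str, list[str]]) -> str:
    if is_ancestor(old_tip, new_tip, parents):
        return f"ok: {branch} fast-forwarded"
    return f"ALERT: {branch} rewritten, {old_tip} is no longer in its history"

# Hypothetical history: a <- b <- c <- d on main; a force-push instead
# replaced b, c with b2, c2.
graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "b2": ["a"], "c2": ["b2"]}
print(check_branch_update("main", "c", "d", graph))
# ok: main fast-forwarded
print(check_branch_update("main", "c", "c2", graph))
# ALERT: main rewritten, c is no longer in its history
```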

The_Colonel · on April 3, 2024

That relies on some heuristics which can be worked around, unless you disallow rewriting history.

But the bigger issue is that this is some theoretical system which is not present in most git repositories.

fl7305 · on April 3, 2024

The heuristic would be "sound the alarm if the main branch is rewritten". And maybe also "if a release tag that we have used for our distro is moved".

Wouldn't that catch most problems, and not generate too many false alarms?

The_Colonel · on April 4, 2024

You can rename/switch branches. You can change what branch is considered main/master. You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them, and only then execute the rewriting attack. Relying on people noticing (even with alarms) is just super fragile.

fl7305 · on April 4, 2024

> You can rename/switch branches. You can change what branch is considered main/master.

Sure, in the project repo the branches are just simple text files that contain the hashes of the commits they point to.

So they are trivial to change in the project repo. But it is also trivial for the distro project to keep copies of the branch/tag info and check against those. I guess what you mainly care about are the previous release tags. They should never change after a release.
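A rough sketch of that distro-side bookkeeping, assuming the distro records which commit each release tag pointed to when it packaged the release, then later compares against a fresh listing in the tab-separated format of `git ls-remote --tags` (where annotated tags get an extra peeled `^{}` line naming the tagged commit). The tag names and IDs are invented, and the function names are hypothetical.

```python
def parse_ls_remote(text: str) -> dict[str, str]:
    """Map tag name -> commit ID from `git ls-remote --tags` style output.
    For annotated tags the peeled `^{}` line (the tagged commit) wins."""
    tags: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        oid, ref = line.split("\t", 1)
        name = ref.removeprefix("refs/tags/")
        if name.endswith("^{}"):
            tags[name[:-3]] = oid        # commit the annotated tag points to
        else:
            tags.setdefault(name, oid)   # lightweight tag (or tag object if unpeeled)
    return tags

def check_pins(pinned: dict[str, str], current: dict[str, str]) -> list[str]:
    """Report release tags that were deleted or now point somewhere else."""
    alerts = []
    for tag, commit in sorted(pinned.items()):
        now = current.get(tag)
        if now is None:
            alerts.append(f"{tag}: deleted")
        elif now != commit:
            alerts.append(f"{tag}: moved {commit[:8]} -> {now[:8]}")
    return alerts

# Invented data: pins recorded at packaging time, then a later listing.
pinned = {"v1.0": "1" * 40, "v1.1": "2" * 40}
listing = "\n".join([
    "9" * 40 + "\trefs/tags/v1.0",      # annotated tag object
    "1" * 40 + "\trefs/tags/v1.0^{}",   # peeled: same commit as before
    "3" * 40 + "\trefs/tags/v1.1",      # lightweight tag, now moved
])
print(check_pins(pinned, parse_ls_remote(listing)))
# ['v1.1: moved 22222222 -> 33333333']
```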

> Relying on people noticing (even with alarms) is just super fragile.

I'd say there's plenty of motivation now for the major distros to put infrastructure in place to automate this (keeping track of previous releases) and to actually keep looking at the alarms.

> You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them

I'm sure the attackers would try things like that.

But let's say you have an open source application/library that is part of Debian.

How common has it been in the past that the app/lib project had a bunch of tagged releases, and then wanted to rewrite the history so that the tagged releases now point to different commits? I assume it has been very uncommon, but maybe I'm wrong?

And even if that is the case, new infrastructure tools can keep local copies of the source code for previous releases, and check against that.
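One way to keep those local copies in checkable form is a plain SHA-256 manifest per release, recorded when the release is first packaged and compared against a later re-hash of the same source tree. This is a minimal sketch with hypothetical function names, not any particular distro's tooling; it reports changed files as well as files that appeared or disappeared.

```python
import hashlib
import tempfile
from pathlib import Path

def manifest(root: Path) -> dict[str, str]:
    """Relative path -> SHA-256 hex digest for every regular file under root."""
    out = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            out[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return out

def diff_manifests(saved: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    return {
        "changed": sorted(p for p in saved.keys() & current.keys() if saved[p] != current[p]),
        "removed": sorted(saved.keys() - current.keys()),
        "added": sorted(current.keys() - saved.keys()),
    }

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "lib.c").write_text("int f(void) { return 1; }\n")
    (root / "README").write_text("release 1.0\n")
    saved = manifest(root)  # recorded when the release was first packaged
    # Later, the "same" release turns up with one file altered and one added.
    (root / "src" / "lib.c").write_text("int f(void) { return 2; }\n")
    (root / "extra.txt").write_text("new\n")
    result = diff_manifests(saved, manifest(root))

print(result)
# {'changed': ['src/lib.c'], 'removed': [], 'added': ['extra.txt']}
```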

Repo checking is not trivial, perfect, or sufficient. But I'd say it's a necessary component in guarding against attacks.

The big challenge is still that there is so much code added/changed for each new release of apps/libs that it is very difficult to check against attacks. The obfuscated C contest has proven again and again how hard it is.

craftkiller · on April 3, 2024

It's a Merkle tree. Those were invented 3 years before blockchains: https://en.wikipedia.org/wiki/Merkle_tree

Lichtso · on April 3, 2024

It also uses a Merkle tree to store the snapshot associated with each commit, so unchanged files and directories are shared between snapshots. But the actual commit structure builds on top of that. A pure Merkle tree or forest would only give you a set of overlapping snapshots, without any directionality. So I think it is fair to call it a blockchain as well.
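A minimal sketch of the snapshot side, following git's object format: a tree object is a sorted list of `<mode> <name>\0<20-byte id>` entries, so its ID depends on the ID of every blob and subtree below it, and a change deep inside propagates to the root. The nested-dict input, the helper names, and the restriction to regular files (mode 100644) and directories (mode 40000) are simplifying assumptions.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 over git's '<type> <size>\\0' header plus body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).digest()

def tree_id(entries: dict) -> bytes:
    """entries maps name -> bytes (file content) or dict (subdirectory)."""
    records = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Git sorts directories as if their name ended in '/'.
            mode, oid, sort_key = "40000", tree_id(value), name + "/"
        else:
            mode, oid, sort_key = "100644", git_object_id("blob", value), name
        records.append((sort_key.encode(), f"{mode} {name}".encode() + b"\0" + oid))
    body = b"".join(rec for _, rec in sorted(records))
    return git_object_id("tree", body)

snapshot = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
edited = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 1; }\n"}}
print(tree_id(snapshot).hex())
print(tree_id(edited).hex())  # differs: the edit in src/ changes the root ID
```

Identical subtrees hash to identical IDs, which is what lets consecutive snapshots share storage; the commit objects then add the parent links that give the history its direction.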

dboreham · on April 3, 2024

Blockchains were invented in 1982?

Lichtso · on April 3, 2024

In short, yes: https://en.wikipedia.org/wiki/Blockchain#History

People conflate blockchains, distributed networks and cryptocurrencies.

ptx · on April 3, 2024

Well, it is and it isn't: It has mutable pointers (branches and tags) to immutable nodes in a graph (commits).

fl7305 · on April 3, 2024

Can you elaborate? Are you thinking of intentional SHA-1 hash collisions? Would that work in practice?

AtNightWeCode · on April 3, 2024

The history. Every time an attack like this happens, people think they can read the complete git history in the repo.

fl7305 · on April 3, 2024

If some commits are signed by people you trust, can the chain before that still be compromised?

smartmic · on April 3, 2024

Concerning history rewrites, it is worth pointing to Fossil and its major difference from Git:

https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

There is also a link to "Is Fossil a Blockchain?", an interesting read because the term was mentioned elsewhere in this thread.