There seems to be a fundamental misunderstanding in a lot of these writeups. Are they 100% sure history was not rewritten at any point? Going back in time on the repo to before the listed involvement doesn't prove anything, as the attacker had full control. Starting from the last signed release prior to their involvement is the only way to actually move this forward (history may be fully lost at this point); the rest is posturing.
Even history rewrites would be visible with GitHub's new Activity tab; e.g., see the two force-pushes in llama.cpp: https://github.com/ggerganov/llama.cpp/activity So while git history can be rewritten, commits pushed to GitHub can effectively never be deleted. Personally, I find this to be a downside (think of personal information, etc.), but in this case it is helpful. Of course, the repository is suspended right now, so the Activity tab cannot be checked.
While it's certainly possible to rewrite git history, it's tricky to do it without other maintainers or contributors noticing, since anyone trying to pull into an existing local repo (rather than cloning fresh) would be hit with an unexpected non-fast-forward merge.
It seems likely to me that Lasse Collin would have one or more long-standing local working copies.
So IMHO injecting malicious changes back in time in the git history is unlikely, though not strictly impossible.
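To make the non-fast-forward point concrete: a pull is a fast-forward only if the commit you already have is an ancestor of the new remote head. Here is a rough sketch in Python on a toy history. The commit names and the `parents` mapping are made up for illustration; real Git exposes the same test as `git merge-base --is-ancestor`.

```python
from collections import deque

def is_ancestor(parents, old_head, new_head):
    """Return True if old_head is reachable from new_head via parent links."""
    seen = set()
    queue = deque([new_head])
    while queue:
        commit = queue.popleft()
        if commit == old_head:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False

# Hypothetical honest history: a <- b <- c <- d.
honest = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

# Hypothetical rewritten history: commit b was replaced, so every
# later commit had to be recreated under a new ID.
rewritten = {"a": [], "b2": ["a"], "c2": ["b2"], "d2": ["c2"]}

# A contributor whose local copy is at "c" pulls the new remote head.
print(is_ancestor(honest, "c", "d"))       # fast-forward is possible
print(is_ancestor(rewritten, "c", "d2"))   # history diverged: not a fast-forward
```

Anyone with an old working copy plays the role of the contributor at `c` here, which is why a long-standing local clone is useful evidence.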
Based on how this has gone (remember xz has effectively been orphaned for years, and the majority of long-standing setups were using the release archives), unless Lasse has never run any code from Jia (unlikely), I'd consider the entire machine untrusted (keys, etc). Provided signed tarballs from that date are still available from another immutable source, they are really the only starting point for rebuilding.
> Are they 100% sure history was not rewritten at any point?
With git, one way to check is if other people still have clones of the xz repository from a time when it was trusted.
If you suspect the repo history has been tampered with, you can check against those copies.
I believe it would be hard to introduce such a history rewrite, since people pulling from the xz repo would start getting git error messages when things don't match up?
I don't know to what degree intentional SHA-1 hash collisions could be used to work around that.
A local Git repo can be changed (including its history) however you please.
But once you have shared it with others you can't take that back.
If you try to, then others will notice that the hashes mismatch and that their HEAD diffs uncleanly.
I know the term is infamous here, but Git is essentially a blockchain. Each commit has a hash, which is based on the hashes of its parent commits, forming a linked list (+ some DAG branching).
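A toy illustration of that chaining, in Python. This is not Git's real commit format: the `commit_id` layout and the snapshot labels are simplified assumptions, kept only to show that changing one commit changes the ID of every commit after it.

```python
import hashlib

def commit_id(parent_ids, snapshot, message):
    """Hash a simplified commit record that embeds its parents' IDs."""
    body = "".join(f"parent {p}\n" for p in parent_ids)
    body += f"snapshot {snapshot}\n\n{message}\n"
    return hashlib.sha1(body.encode()).hexdigest()

def build_chain(changes):
    """Build a linear history and return the list of commit IDs in order."""
    ids = []
    parents = []
    for snapshot, message in changes:
        cid = commit_id(parents, snapshot, message)
        ids.append(cid)
        parents = [cid]  # the next commit points at this one
    return ids

original = build_chain([("v1", "init"), ("v2", "fix"), ("v3", "release")])
# Same history, except the second commit's content was tampered with.
tampered = build_chain([("v1", "init"), ("v2-evil", "fix"), ("v3", "release")])

for before, after in zip(original, tampered):
    print(before[:12], after[:12], "same" if before == after else "DIFFERENT")
```

The first commit keeps its ID, while the tampered commit and everything built on top of it get new IDs, even though the third commit's own content is unchanged.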
You can rename/switch branches. You can change what branch is considered main/master. You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them, and only then execute the rewriting attack. Relying on people noticing (even with alarms) is just super fragile.
> You can rename/switch branches. You can change what branch is considered main/master.
Sure, in the project repo the branches are just simple text files (or lines in a packed-refs file) that contain the hashes of the commits they point to.
So they are trivial to change in the project repo. But it is also trivial for the distro project to keep copies of the branch/tag info and check against those. I guess what you mainly care about are the previous release tags. They should never change after a release.
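A minimal sketch of what such a check could look like, in Python. It assumes the distro has saved an earlier listing in the `<object-id><TAB><ref-name>` shape that `git ls-remote --tags <url>` prints; the function names, tag names, and object IDs below are all made up for illustration.

```python
def parse_refs(listing):
    """Parse '<object-id> <ref-name>' lines into a {ref: object_id} dict."""
    refs = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        object_id, ref = line.split(None, 1)
        refs[ref.strip()] = object_id
    return refs

def changed_tags(pinned, current):
    """Return tags whose target moved or vanished since they were pinned."""
    problems = {}
    for ref, old_id in pinned.items():
        if not ref.startswith("refs/tags/"):
            continue
        new_id = current.get(ref)  # None means the tag was deleted
        if new_id != old_id:
            problems[ref] = (old_id, new_id)
    return problems

# Hypothetical listing saved when the releases were made.
pinned_listing = "\n".join([
    "1" * 40 + "\trefs/tags/v1.0",
    "2" * 40 + "\trefs/tags/v1.1",
])
# Hypothetical listing fetched today.
current_listing = "\n".join([
    "1" * 40 + "\trefs/tags/v1.0",
    "9" * 40 + "\trefs/tags/v1.1",  # existing release tag now points elsewhere
    "3" * 40 + "\trefs/tags/v1.2",  # a new tag is not an alarm by itself
])

alarms = changed_tags(parse_refs(pinned_listing), parse_refs(current_listing))
print(alarms)
```

New tags are ignored on purpose; only a previously recorded release tag that moves or disappears raises an alarm.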
> Relying on people noticing (even with alarms) is just super fragile.
I'd say there's plenty of motivation now for the major distros to put infrastructure in place to automate this (keeping track of previous releases) and to actually keep looking at the alarms.
> You can find valid reasons why you'd want to do stuff which raises the alarms so that other people become deaf to them
I'm sure the attackers would try things like that.
But let's say you have an open source application/library that is part of Debian.
How common has it been in the past that the app/lib project had a bunch of tagged releases, and then wanted to rewrite the history so that the tagged releases now point to different commits? I assume it has been very uncommon, but maybe I'm wrong?
And even if that is the case, new infrastructure tools can keep local copies of the source code for previous releases, and check against that.
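As a sketch of that idea in Python: record a digest for each release archive when it is first published, then compare whatever upstream serves later against the stored digest. The archive name, the byte strings, and the `verify_release` helper are hypothetical; a real tool would read the files from disk and would also want signature checks.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_release(name, data, known_digests):
    """Compare a downloaded archive against the digest recorded earlier."""
    expected = known_digests.get(name)
    if expected is None:
        return "unknown"  # never seen before: needs a human decision
    return "ok" if sha256_hex(data) == expected else "MISMATCH"

# Hypothetical local record, written when the release first appeared.
known_digests = {"demo-1.0.tar.gz": sha256_hex(b"original release bytes")}

print(verify_release("demo-1.0.tar.gz", b"original release bytes", known_digests))
print(verify_release("demo-1.0.tar.gz", b"silently replaced bytes", known_digests))
print(verify_release("demo-1.1.tar.gz", b"new release bytes", known_digests))
```

This only proves that an old release has not changed since it was recorded; it says nothing about whether the release was trustworthy in the first place.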
Repo checking is not trivial, perfect, or sufficient. But I'd say it's a necessary component in guarding against attacks.
The big challenge is still that there is so much code added/changed for each new release of apps/libs that it is very difficult to check against attacks. The International Obfuscated C Code Contest has shown again and again how hard it is.
Git also uses a Merkle tree to store (and deduplicate) the snapshot associated with each commit. But the actual commit structure builds on top of that. A pure Merkle tree or forest would only give you a set of overlapping snapshots, without any directionality. So, I think it is fair to call it a blockchain as well.
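For the snapshot side, here is a small Python sketch of how Git's default SHA-1 object format names blobs and trees. It only handles a flat directory of regular files (mode `100644`), which is a simplifying assumption; subdirectories, other modes, and SHA-256 repositories are left out.

```python
import hashlib

def git_object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' followed by the payload."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()

def blob_id(content: bytes) -> str:
    """Object ID of a file's content."""
    return git_object_id("blob", content)

def tree_id(files: dict) -> str:
    """Object ID of a flat directory mapping file name -> content bytes."""
    payload = b""
    for name in sorted(files, key=lambda n: n.encode()):
        # Each entry embeds the raw 20-byte ID of the object it refers to.
        payload += b"100644 " + name.encode() + b"\0"
        payload += bytes.fromhex(blob_id(files[name]))
    return git_object_id("tree", payload)

print(blob_id(b"hello\n"))  # matches `echo hello | git hash-object --stdin`
clean = tree_id({"README": b"hello\n", "main.c": b"int main(void){return 0;}\n"})
dirty = tree_id({"README": b"hello\n", "main.c": b"int main(void){return 1;}\n"})
print(clean != dirty)       # one changed file changes the tree's ID
```

Because a tree's ID covers the IDs of everything inside it, and a commit records its tree's ID along with its parents, changing any file in any snapshot propagates up into the commit hash.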