Debian is a good example here. A requirement of being a Debian Maintainer or Developer is that you must have met real existing Debian Developers in person and they must have signed your key (after checking your identity)[1].
This certainly does make it harder to become a DM or DD. But it gives us the Debian keyring, which is the distro trust implementation for Debian as explained in the article.
However, despite the difficulty in bootstrapping developers, Debian have achieved it. Thanks to them, there are many more people in the strong set[2], and you can probably find one near you[3]. Can we use this to bootstrap other communities, and end up in a situation where it's normal to already have been "introduced" into the strong set if you're a software author in any project?
[1]: http://www.debian.org/devel/join/nm-step2
[2]: http://pgp.cs.uu.nl/plot/
[3]: http://wiki.debian.org/Keysigning/Offers
The catch is that signing a key does not usually convey "trust", just "identity". In Debian, trust is assigned at the end of a successful application process to become a Debian Maintainer or Developer, and that trust is tied to a particular identity by its presence in the official Debian keyring. So in Debian, trust is centralized.
However, it doesn't have to be exactly like this. I can imagine a decentralized system built on the same tooling. For example: you could require three certifications from existing members of the set, bootstrapped from Guido, for inclusion in the trusted keyring. (You might additionally limit the maximum degrees of separation from Guido to assist with auditability.)
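As a thought experiment, here is a minimal sketch of how such a rule could be evaluated over a key-certification graph. The threshold, the depth limit, and the bootstrap rule (the root's signature alone seeds the set until enough members exist) are all assumptions of mine, not anything Debian or PyPI actually implements:

```python
from collections import deque

def compute_trusted(certifications, root, threshold=3, max_depth=6):
    """certifications maps key_id -> set of key_ids that have certified it.
    Returns the set of key_ids admitted to the hypothetical trusted keyring."""
    # Depth = shortest certification distance from the root key (BFS).
    certified_by = {}
    for key, signers in certifications.items():
        for signer in signers:
            certified_by.setdefault(signer, set()).add(key)
    depth, queue = {root: 0}, deque([root])
    while queue:
        current = queue.popleft()
        for nxt in certified_by.get(current, ()):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)

    trusted = {root}
    changed = True
    while changed:                      # grow the set to a fixed point
        changed = False
        for key, signers in certifications.items():
            if key in trusted or depth.get(key, max_depth + 1) > max_depth:
                continue
            endorsements = signers & trusted
            # Bootstrap rule (assumption): the root's certification alone
            # is enough until the set is large enough to meet the threshold.
            if len(endorsements) >= threshold or root in endorsements:
                trusted.add(key)
                changed = True
    return trusted
```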
Bitcoin offers a neat solution for public key verification. I'm going to use it in a Bitcoin-related service that I'm about to launch. Here's an explanation from the security page:
#### Public key verification
To prevent an attacker from modifying our published Bitcoin public key,
it's permanently embedded into the Bitcoin blockchain in a way that is
[nearly impossible](https://en.bitcoin.it/wiki/Weaknesses#Attacker_has_a_lot_of_computing_power)
to modify (and becomes exponentially more difficult as time goes by).
The public key can be verified with the following procedure:
1. Take the SHA256 of the domain name ("****.com")
2. Create a Bitcoin address using that hash as the private key
3. Find the first transaction with that address as its *output address*
4. The *input address* of that transaction is our public key
If it's ever required to change the public key, the announcement
will be signed with the old public key.
Software packages could use the package name instead of a domain name, or the authors can attach the public key to their usernames and use it to sign all their software.
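For the curious, steps 1 and 2 of that procedure are easy to sketch (this is my own reconstruction, not the service's actual code; it assumes the third-party `ecdsa` package and a `hashlib` build that still exposes RIPEMD-160, and uses example.com as a stand-in for the redacted domain). Steps 3 and 4 need a blockchain index, e.g. a block-explorer API:

```python
import hashlib
from ecdsa import SigningKey, SECP256k1   # third-party: pip install ecdsa

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload: bytes) -> str:
    """Standard Bitcoin Base58Check encoding (payload + 4-byte checksum)."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n, out = int.from_bytes(data, "big"), ""
    while n:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))   # leading zero bytes -> '1'
    return "1" * pad + out

def address_for_domain(domain: str) -> str:
    # Step 1: SHA-256 of the domain name is used as the private key.
    priv = hashlib.sha256(domain.encode()).digest()
    # Step 2: derive the matching public key and its P2PKH address.
    sk = SigningKey.from_string(priv, curve=SECP256k1)
    pub = b"\x04" + sk.get_verifying_key().to_string()   # uncompressed point
    h160 = hashlib.new("ripemd160", hashlib.sha256(pub).digest()).digest()
    return base58check(b"\x00" + h160)   # 0x00 = mainnet P2PKH version byte

# Steps 3-4 (find the first transaction paying this address and read its
# input address) require querying the blockchain itself.
print(address_for_domain("example.com"))
```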
Nope.
Package signing, in the ways described here, means that if you trust the dev once, you will trust him for every subsequent package, no matter where you get it from.
The conclusion is especially bad.
"we'll get a solution where the end user has the relationship with the source of trust and not the package author."
That's extremely dumb, on many levels. I'll go with possibly the most disturbing.
If you don't care about the author and only the source, the author just has to put a backdoor in the code. DONE.
That's why you have to trust the author and NOT the source. The author is the first creator of the code and thus the person with the most ability to sign the code.
Package signing ensures the package has not been modified (because, you see, replacing plain SHA hashes is trivial: `sha1sum file > thehashfile.txt`).
Yes, you'll have to trust someone, at some point. Then you'll build more trust as time goes on. That's how it is. When you get your Debian package, you trust the Debian devs, a bunch of them.
When you trust an Arch package, you trust that the devs you trust make good trust decisions, and you trust them as well (Arch uses the "web of trust").
When you trust an SSL server, you trust that your browser made the right trust decisions, that the CA made the proper checks, and that the server owner is trustworthy.
When you turn on your computer you trust that Intel didn't backdoor their microcode.
And so on. Trust is an infinite "issue" and securing the transport will never ensure the source is correct. Never.
> That's why you have to trust the author and NOT the source. The author is the first creator of the code and thus the person with the most ability to sign the code.
Source of Trust, not Source of The Thing You Downloaded. The author would still sign the package; the question is how you get from where you are to trusting that person. The way that browsers, most (all?) Linux distributions, Microsoft, etc. work is by hard-baking a list of trust roots. That is what gives us the modern CA system: because the list is hard-baked and the trust relationship is between the "author" and the source of trust, you can't reasonably distrust, say, Verisign without a significant portion of the internet breaking. It's about trust agility, not about trusting the place you downloaded the package from. It's an idea similar to http://convergence.io/.
1. Signing should be different from source integrity. For example, signing a manifest that contains a bunch of hashes should be sufficient - you don't necessarily need to sign the entire compressed package (a rough sketch follows this comment). Newer package management systems like the Illumos IPS work this way (which is not without its faults). The veneration of the "church of tarball" needs to go away - we've been living in a DVCS world for the past few years and it's awesome compared to the dark ages of before.
2. We need some sort of heuristics to help make decisions. For example - a diff between versions is much easier to audit, and having statistics on contributors might be quite useful. Red flags might go up if a previously slow-moving project suddenly had a massive number of commits from a new developer; this might just be new blood getting into a project, or a maliciously intended merge.
In the end, eyeballs are needed. We need to make it easier for those eyeballs to go further on less.
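To make point 1 concrete, here is a rough sketch of the manifest idea (my own illustration, not how IPS actually formats things): hash every file, sign only the small manifest (e.g. with a detached `gpg --detach-sign MANIFEST`), and re-derive the manifest at verification time.

```python
import hashlib
import os

def build_manifest(root: str) -> str:
    """Return 'sha256-hex  relative/path' lines for every file under root,
    in a stable order so the manifest itself can be signed and diffed."""
    lines = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            lines.append(f"{digest}  {os.path.relpath(path, root)}")
    return "\n".join(sorted(lines)) + "\n"

def verify_tree(root: str, signed_manifest_text: str) -> bool:
    """Recompute the manifest and compare it against the one whose detached
    signature you have already checked; only the manifest needs signing."""
    return build_manifest(root) == signed_manifest_text
```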
1. Yes, absolutely. I tried not to talk about specific signing implementations because this was mostly about the fact that simply signing isn't enough; you need a trust model.
2. I'm actually a PyPI admin, a pip core developer, and one of the folks working in the Python packaging space. This sort of heuristic is something I absolutely want to add. Right now I'm mostly focused on closing huge gaps security-wise (prior to 1.3.x pip downloaded everything over HTTP, PyPI did not have a trusted SSL cert, etc.). But yes! Making it easier for eyeballs is a good thing.
1. When you sign a file.. oh snap. It generates a checksum and signs the checksum. That's how signing works.
That's exactly the same as what you propose.
In fact, the way you propose, without a custom implementation of the signing algorithm, is generally going to generate a hash of the list of hashes and then sign that. Slower, more memory.
The only advantage is if you need to remotely sign a local file: you can save time by not having to transfer the file if you sign only the list of hashes (and of course, you have to trust that the hashes were transported properly, so your trust is less perfect, because it's further away from the original).
2. The heuristic is the number of signatures you trust. Having thumbs up/down just makes bots happy.
As for malicious merges, yes, "eyeballs" are the "only decently reliable solution", but not only is this mission impossible, they'll probably still miss some smart backdoors in the code. All you need is an off-by-one.. (or what not..)
You can however sign commits, and blame whoever introduced the issue whenever it's discovered/made public (and decide whether it was a legit error or not).
Note: it's still a good idea to have automatic checks so that all the "easy stuff" is filtered out, of course.
There's a difference between signing a tarball in its entirety (in which case you get only one checksum) and checksumming every file inside the tarball and then signing that checksum file.
The problem is how permissive PyPI is. I can upload a package called django2 with the description that this is the new generation of the Django framework and run arbitrary code on the machines of a whole lot of unsuspecting developers. The reason Linux distributions do not generally have this problem is that there is an air gap between the upstream developer and the repository in the form of a maintainer who reads the code and verifies that it is not malicious. Incidentally, GitHub has the same problem as PyPI. With good SEO I can create a repo called django and have it be the first thing that new developers find.
One possible solution to this is to have a core group of maintainers verify the most popular packages. I place my trust in the maintainer and the maintainer signs a specific version of the package. This will not scale to 100% of the packages, but that is fine. As long as django, flask, psycopg2, etc. are signed I can take responsibility for reviewing django-pretty-snarfs-with-smiles myself. That is essentially our trust model with GitHub already: you blindly download large official looking projects but read through the code of small stuff (right?). Perhaps after a while some developers become trusted and get to have their signing process fast-tracked.
At the end of the day if I could have one improvement it would be a big fat warning that pip gives you that says "what you are about to install is likely malicious code. Do not do it without verifying that it is not by manually downloading it and reading the source."
BTW, simply running "pip install foo" will let foo's setup.py run arbitrary code on your machine. It is such a giant security hole by design that I cringe every time I have to use it.
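To illustrate how low the bar is, here is a hypothetical sdist's `setup.py`; everything at module level executes during `pip install`, before you've read a line of it (the file it writes is obviously a stand-in for something far worse):

```python
# setup.py of a hypothetical package uploaded to PyPI.
import os
from setuptools import setup

# This runs with the installing user's privileges at install time.
# It could just as easily read ~/.ssh or phone home.
with open(os.path.expanduser("~/you-just-ran-my-code.txt"), "w") as f:
    f.write("setup.py executed before you could review anything\n")

setup(name="totally-harmless", version="1.0")
```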
Edit: another potential model is for PyPI to punt the verification onto outside entities, such as GitHub, Bitbucket, etc. Basically, instead of building my packages on my (potentially compromised) machine, I would tag a version on GitHub and instruct PyPI to build it. It is easy to establish trust between PyPI and GitHub, so the guarantee you get with PyPI is "this package came from exactly this source", where you can read the source before installing it. Like I said above, GitHub also has a trust model problem, but this would reduce the amount of work one would need to do to verify a package from two places to one.
Suppose that 'pip install foo' did not run a setup.py. Now eventually you are going to run some code from foo. If you weren't, why are you installing it? At that time, you could have 'arbitrary code' running on your machine.
My point is that you don't even get a chance to review the source before it runs. pip runs untrusted code automatically, without any verification whatsoever. At least when you run 'git clone' that does not happen.
The recently released pip 1.4 allows you to install from Wheels, which do not use setup.py and execute no code from the package during install.
Yes, except I have seen very few packages that are not sdists. Another issue is that unless the package maintainer disables it explicitly on PyPI, pip will go searching over the package's home page for a newer version. This is just broken, since even if PyPI were to implement some type of package verification, example.com/~/devfoo/ might not. Oftentimes these pages do not even have HTTPS set up.
As you said, this is a complex problem, but I think the first step is to take care of the easy things. Let me know if you'd like to chat off HN about this.
If the package repository acted as a certificate authority, and generated distributors' certificates by verifying the distributor's appropriate virtual identity (GitHub, BitBucket, DNS), then I think you can at least be pretty sure that the person making the release is someone who would have had access to commit to the project's source code anyway.
Is that at least "good enough"? If the root certificate for the package repo's CA were itself signed by a real-world CA, then I think what you end up trusting is the security of the repo, of GitHub, and of the project's developer(s). Projects with multiple committers could have several of them receive certificates, and require a quorum of signatures before a release is published, to mitigate the risk of any one core dev being compromised.
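A rough sketch of that quorum check, assuming each certified committer holds an Ed25519 key (using the `cryptography` package); key distribution, the CA step, and revocation are all waved away here:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

def quorum_ok(release_bytes, signatures, committer_keys, quorum=2):
    """signatures: dict of key_id -> detached signature over the release.
    committer_keys: dict of key_id -> ed25519.Ed25519PublicKey, i.e. the
    keys that received certificates for this project.
    Publish only if at least `quorum` distinct committers signed."""
    valid = 0
    for key_id, sig in signatures.items():
        pub = committer_keys.get(key_id)
        if pub is None:
            continue              # not a certified committer for this project
        try:
            pub.verify(sig, release_bytes)
            valid += 1
        except InvalidSignature:
            pass                  # forged or corrupted signature
    return valid >= quorum
```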
To keep package names trustable, I think the only sane scheme would be using the project's URL on the service which was used to prove its identity; Rails would be "https://github.com/rails/rails", though surely you could use aliases (github:rails/rails) to reduce typing.
As the OP says, if the same organization -- and even likely the same _server_ -- is both the certificate authority and the distributor of the signed releases...
> However in this model if someone is able to send you a malicious package they are also likely able to send you a malicious key.
Well crap. Source code reputation is exactly the problem we are trying to solve at http://rputbl.com (sorry, just a splash page for now.)
We are nowhere near close to launch yet and there's a chicken-and-egg problem of having a sufficiently large database of hashes to make an arbitrary source code file reputation check worth your time, but we have some ideas about what to do about that too.
I agree it's a problem without an easy solution. It's not just a problem with Python; it's exactly the same thing with RubyGems (where it's been getting similar discussion, especially after a RubyGems vulnerability a few months ago).
But: The elephant in the room when talking about package signing is what exactly we are trusting.
I think it's actually relatively clear. When I install "rails", I want to know that it really did come from the "rails team", and not from a third party man in the middle.
That's the most that can be expected, and that's sufficient. There's no way to technologically ensure that the rails team itself isn't intentionally including malware in their release. And of course no way to technologically ensure that the release doesn't have bugs or vulnerabilities.
The goal is just ensuring the release is really from who it says it's from. Which is of course hard enough already, for the reasons dealt with in OP, and because who is "the rails team" exactly anyway?
I think this is a good introduction to some of the depth of the problem. As with all big problems, you start by breaking them down into what you can and cannot do.
So in the field of package signing, you can, if you choose to, nominate a single signature authority. This is basically what Microsoft does; they are the signature authority for all keys that sign things that go into Microsoft products. When you choose a signature authority, that becomes the first lemma in your calculus: "I trust <foo>."[1] Now, once you have that lemma you can build on it with statements like, "I trust Microsoft, and Microsoft trusts Dell, so I will trust that the key Microsoft says is Dell's really is Dell's."
Now you've created a transitive trust relationship, in that you not only trust Microsoft's internal processes, but you trust that it will do a good job of auditing Dell's before it gives its blessing to Dell.
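A toy illustration of that transitive step with Ed25519 keys from the `cryptography` package; the parties are placeholders, and a real system would sign structured certificates (with expiry, key usage, and so on) rather than raw key bytes:

```python
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# The first lemma: a root authority you have decided to trust.
root_key = ed25519.Ed25519PrivateKey.generate()
vendor_key = ed25519.Ed25519PrivateKey.generate()       # e.g. "Dell"

# The authority signs the vendor's public key: "this key really is Dell's."
vendor_pub = vendor_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
endorsement = root_key.sign(vendor_pub)

# Knowing only the root's public key, anyone can check the endorsement...
root_key.public_key().verify(endorsement, vendor_pub)   # raises if forged

# ...and then rely on the vendor's key for whatever the vendor signs.
package = b"contents of some vendor-signed package"
pkg_sig = vendor_key.sign(package)
vendor_key.public_key().verify(pkg_sig, package)        # transitive trust
```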
Now Don makes a minor error when he says, "PyPI allows anyone to sign up and make a release which makes verifying authors an unmanageable problem." The issue is that PyPI doesn't create a durable relationship between people you trust and someone you don't know.
Because of this you have to assume there are bad actors who are making packages, and simply not install them. This isn't an "unmanageable problem", this is "inconvenient" (not being snarky, trying to be precise). You can run a grocery store where everyone puts the money they owe into the cash register before they leave: there are no checks on the cash register, everyone is trusted to do the right thing. The store going out of business isn't a "problem", it is the expected outcome. This comes up in policy debates as well, where person A wants their free speech rights but also doesn't want person B to make remarks they consider to be hateful (thus impinging on person B's free speech rights). It's the 'incompatible constraints' problem.
There are a number of interesting zero-knowledge proof techniques in cryptography, things which allow someone you know nothing about (zero knowledge) to prove they are a particular person, and starting from there you can create durable audit trails of actions (useful in digital cash systems among others). They also let you identify bad actors, sadly after the fact, because you can make statements about the identified person and the package they produced. You cannot say anything about motive (they may have built their package on a compromised machine), but you can trace the action back to where it entered the system and, with sufficient audit trails, back it out of the system.
As you can imagine these systems are challenging to get right, challenging to build, and take an expertise that is generally highly valued and so rarely available "for free."
But working on challenging problems has its own rewards, and so I recommend it highly. It is always a workable approach to start with assumptions (lay them out and keep them front and center) to gain an understanding of the more challenging aspects, and then work from there. Since you aren't building life support systems you may find that just good audit trails are enough, if they allow you to back up to any previous state. Sometimes using a centralized authority gives you enough to build a system around it. I highly recommend Schneier's "Applied Cryptography" for a pretty approachable take on these topics.
[1] People will attempt to derail this example with "but I don't trust Microsoft!" or some such. Which is fine, pick an authority you do trust and use that as a place holder.
> You can run a grocery store where everyone puts the money they owe into the cash register before they leave there are no checks on the cash register, everyone is trusted to do the right thing. The store going out of business isn't a "problem" it is the expected outcome.
Kind of not the main point here, but I've been in a store that was left unattended and there was a basket of money out for people to make their own change. As far as I know it's still there...
The drug store in Endicott was kind of like that when I was a kid: when the druggist was out for lunch, the sign asked you to put what you owed in the basket next to the register. I asked my Grandmother why people didn't just take stuff out of the store and she said, "Oh no, everyone likes the druggist, no one would want to inconvenience him like that." And so the community was small enough that the trust metric was simply that we liked the guy behind the counter.
It isn't like that today, and Endicott is quite a bit larger than it was then, and people are sadly a lot less neighborly than they were when I was a kid (could be nostalgia though).
I believe the trust issue was "resolved" by locking the store when the pharmacist was out, and eventually raising prices so that they could pay the salary of someone to be there to help.
And so it is with the PyPI community: everyone is trusting everyone not to do anything bad. And that is a perfectly reasonable strategy; you just have to accept the risk that comes along with it. Sometime in the future something bad could happen if someone chooses to violate that trust.
Now in a classic vulnerability analysis you'd ask "What is the motive or payoff?" You don't put a steel vault door on your home because chances are if somebody wants into your house that badly they are going to come through the window. You accept a certain amount of risk, perhaps you have a home safe for really valuable stuff and insurance for the rest.
Apologies for hijacking your top comment but I have some additions for Microsoft signing:
An organisation I am familiar with enabled "AllSigned" via group policy. For those who don't know about PowerShell, this forces every script that runs on the system to be signed with a trusted* certificate.
This immediately meant that developers had trouble using NuGet in Visual Studio (lots of packages run init.ps1 to do setup and move files around, but few are signed). You can't even accept the prompt for signed packages (Do you want to run this script signed by "Microsoft Corporation"? [Y] [N]) because the prompt isn't interactive while NuGet is installing packages.
That kind of setup also precludes the use of Chocolatey or any other solution that might be using PowerShell without the correct signatures.
Finally, if you can run PowerShell interactively, you can still just write code or use Invoke-Expression (eval).
Wargh.
*Trusted in this case still means you'll get a prompt the first time you run the script, and a Visual Studio extension installer still runs with enough permissions to install its own certificate into the right store, so I can still be evil if I want. That's assuming the admins haven't accidentally enabled the proxy MITM certificate for code signing (I'll just export and use that instead).
Hmm, maybe I wasn't clear. I actually meant it's unmanageable for the administrators/owners of PyPI to verify each author in the way that the owners of Debian/Red Hat verify their package maintainers. Basically, the repository can't act as a CA/root of trust because we simply don't have the manpower.
I understood that was where you were, and by accepting that constraint you define what you can and cannot do vis-a-vis package signing. As you point out, there isn't a magic bullet, some forgotten design pattern or algorithm, or some clever sleight of hand that will make your repository secure.
That leaves you with one of two choices: accept that it isn't going to be secure and plan accordingly, or change the constraints.
For example, let's say you had exactly one volunteer who was willing to sign up to be your key signing authority, and as a volunteer they were only available on weekends. You could build a central authority, but the throughput would be quite slow. That would impede your developers from getting authenticated and packages would take a long time to get out. But they would be signed and you would have some measure of trust in the packages as signed.
That might be unacceptable to your user base and they all go off and use something else (or, as is often the case in open source, they fork a new copy and decide theirs will have the rules they want). No amount of forking will create a secure system, so your constraints mean you choose risk and a wide pool of developers over less risk and fewer developers.
That a system meeting two incompatible constraints simultaneously is impossible should not necessarily stop you, but it can inform what will happen in the possible futures. It's like growing old: you can say "I never want to be older than 25" but really there isn't anything you can do about that. What you can do is say "I always want to be able to do the following things like I was 25 ..." and work on whatever sort of preventative exercises or stuff you need to make that true, but you can't stop yourself from aging.
So you can't have a package signing system with the current developer environment. Awesome; what about package signing is important going forward and what about it isn't as important? And given that, what changes can you make and what changes can't you make? And then, given that, you know how the future for that particular set of choices is going to roll forward.
Interestingly enough, I actually have a rough sketch of an idea which will probably get laid out in a future post. I started to add it to this one but it felt disjointed and crammed, so I figured I'd wait till later :)
But you're right something somewhere has to give because nothing is 100%. I hoped to show people that things aren't 100% and just slapping some crypto in there isn't going to give the results they were hoping for.
Hmm, with Ubuntu, there is not a required web-of-trust, as the uploader merely adds their key to their account using Launchpad.
If the user was able to access their Launchpad account, there is implicit trust.
There isn't an Ubuntu shared keyring with a key for each developer/maintainer, as there is with Debian - rather, a handful of shared keys whose secrets are only held by the infrastructure. These are currently: Apt Archive, CD Image, and now Ubuntu Cloud Archive.
But Launchpad accounts only get upload rights when their prior work is reviewed by the developer membership board. And the prior work is identified against that same Launchpad account. So there is a root of trust: it's the developer membership board. It just happens that the trust system isn't implemented using a PGP web of trust directly.
"This isn't an already solved problem nor is it an easy to solve one."
This isn't even a defined problem, much less solved or easily solved. A goal would help a lot. I searched the article and the comments for the word "goal" and found nothing. It is an OK laundry list of several tools and techniques and their strengths and weaknesses in general. Most of the tools listed do pretty easily solve certain problems; however, how those easily solved problems relate to the undefined problem is left unexplained. That is not the fault of the tools. Also it is not an exhaustive list of all possible techs/tools. Perhaps the topic he has not googled for yet which meets the unspecified goal is "SSL certs" or "SSH host keys in DNSSEC secured zones".
Without a defined goal, rudderless drifting is unavoidable. This applies to a lot more than computational system designs.
Reading between the lines and doing some creative writing about something I find interesting, I am guessing what he's asking for is some kind of mandatory peer review/code audit of git commits aka the dude who writes code is never allowed to merge the code, at minimum, and the dude who writes unit tests is never allowed to write code, and all devs have to be in the "big" GPG WoT off a major public keyserver not just one little project. Because I have no idea what problem he's trying to solve, this particular solution to a problem, although somewhat interesting, may not have anything to do with his actual goal.
The irony is hilarious because there is at least some disagreement on what the "Holy Grail" really is, outside this discussion. So yes, "Package signing is not the Holy Grail" because we have no idea what the Holy Grail Really Is both archeologically/historically and WRT python package distribution. Maybe both are just a coffee cup.
An entire economy (e-commerce) is tied into the premise of bundling your keys with the first download (the browser) or OS install (browser shipped with OS). So far, not a lot of problems with this method, for most people. Why not try it? Can't be worse than no signing at all.
There actually are quite a few problems with this method, even in the browser space.
For one, which key do you bundle? If it's a central root key, see the part about Linux, as that's essentially what they do. The part about not having the workforce that the for-profit CA vendors have is relevant here too.
tl;dr no, there is no problem with it in the browser space
Step 1. Red Hat creates a CA and ships key with OS.
Step 2. Red Hat creates a keystore that allows any developer to add their own key.
Step 3. Package is created.[1]
Step 4a. Package is signed by Red Hat when they create the package.[2]
Step 4b. If 4a not followed, Package is signed by the package creator using their key from the Red Hat keystore.
Step 5. User downloads package.
Step 6. User installs package. Package signature is verified.[3]
[1] As with all Linux distros, packages are either created/approved by a distribution release manager or made by 3rd parties completely independent of the distro.
[2] Just like with browsers and certs, you can install a 3rd-party CA for a 3rd-party package if you want, but the key distribution is left up to the user to do safely. Most people don't do this.
[3] Again, just like with browsers and certs, a package signature is either signed by a CA or by a developer. If it's CA-signed, the OS already has the cert, and it is verified. If it's developer-signed, the developer's public cert is shipped with the package. Some crypto math tells you whether this developer cert was created by the CA or is a phony.
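A sketch of what the verification in step 6 could look like with X.509 and the `cryptography` package, assuming RSA keys and PKCS#1 v1.5 signatures. (In reality RPM signing uses GPG keys rather than X.509 certificates, and real tooling also checks expiry, revocation, and key usage; this just models the CA analogy in the steps above.)

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

def verify_package(pkg_bytes, pkg_sig, dev_cert_pem, ca_cert_pem):
    """Raise if either check fails:
    (a) the developer certificate was issued by the distro CA, and
    (b) the package signature was made by that certificate's key."""
    ca = x509.load_pem_x509_certificate(ca_cert_pem)
    dev = x509.load_pem_x509_certificate(dev_cert_pem)

    # "Some crypto math" telling you the developer cert isn't a phony:
    ca.public_key().verify(
        dev.signature,
        dev.tbs_certificate_bytes,
        padding.PKCS1v15(),
        dev.signature_hash_algorithm,
    )
    # Then check the package itself against the developer's key.
    dev.public_key().verify(pkg_sig, pkg_bytes, padding.PKCS1v15(), hashes.SHA256())
    return True
```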
Red Hat can do this for virtually no cost; they just need to host a public web service that lets you create a signed key. They already host tons of free public services, so I don't see how this would be an issue for them. Not to mention someone could host a distro-independent service that does this exact same thing, and every distro could include its CA.
The only "problem" here is people have wacky expectations of trust. Bob creates a distro, Sally creates some software, and Frank creates a package of the software for the distro. You have to trust all three of them - which you can do by accepting all of their signed keys.
But realize that there is no "easy" way to trust Frank. Frank's essentially a stranger. We don't trust Frank in the browser world, so I don't know why we're expected to trust him with our packages.
The more I think about this, the more confused I become.
- The obvious first step is for PyPI to generate a nonce for each upload request. If that nonce is stored on the original developer's source repo, then we know that they control that repo (rough sketch after this comment). And?
- Send the nonce via a side channel - say, email. OK, the owner of the repo also has access to that email, which is slightly more helpful.
But at each turn I ask myself: what am I trying to trust? That the code is not malicious? That's a third-party testing process. Who will do that?
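For what it's worth, the first bullet is easy to sketch; the challenge path and timeout are made up, and, as noted, all a round trip proves is that whoever requested the upload can write to that repo:

```python
import secrets
import urllib.request

def issue_challenge() -> str:
    """The index generates a one-time nonce for an upload request."""
    return secrets.token_hex(16)

def uploader_controls_repo(repo_raw_url: str, nonce: str) -> bool:
    """The uploader is asked to publish the nonce at a well-known path in
    the claimed source repo; if we can fetch it back, they can at least
    write to that repo. It says nothing about the code being benign."""
    url = f"{repo_raw_url.rstrip('/')}/.package-index-challenge"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode().strip() == nonce
    except OSError:
        return False
```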