Join me in the rabbit hole of git repository verification, and how we
could improve it.
Problem statement
As part of my work on automating install procedures at Tor, I
ended up doing things like:
git clone REPO
./REPO/bootstrap.sh
... something eerily similar to the infamous curl pipe bash
method which I often decry. As a short-term workaround, I relied on
the SHA-1 checksum of the repository to make sure I have the right
code, by running this both on a "trusted" (i.e. "local") repository and
the remote, then visually comparing the output:
$ git show-ref master
9f9a9d70dd1f1e84dec69a12ebc536c1f05aed1c refs/heads/master
One problem with this approach is that SHA-1 is now considered as
flawed as MD5 so it can't be used as an authentication mechanism
anymore. It's also fundamentally difficult for humans to compare
hashes.
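As a minimal sketch, the visual comparison could at least be scripted. The function names here are hypothetical, and the input is assumed to be one line of `git show-ref` output from each side. This does nothing about the weakness of SHA-1 itself; it only takes the human eye out of the loop and refuses abbreviated hashes.

```python
import re

# A full SHA-1 object id is exactly 40 hexadecimal digits.
HEX_SHA1 = re.compile(r"[0-9a-f]{40}")


def parse_show_ref(line):
    """Extract the commit id from one line of `git show-ref` output."""
    commit, _, _ref = line.strip().partition(" ")
    commit = commit.lower()
    if not HEX_SHA1.fullmatch(commit):
        # Abbreviated or malformed ids are refused rather than compared.
        raise ValueError(f"not a full SHA-1 commit id: {commit!r}")
    return commit


def same_head(local_line, remote_line):
    """True only if both lines name exactly the same full commit id."""
    return parse_show_ref(local_line) == parse_show_ref(remote_line)
```

Feeding it the `git show-ref master` line from both the local and the remote repository gives a yes/no answer instead of a squinting exercise.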
The other flaw with comparing local and remote checksums is that we
assume we trust the local repository. But how can I trust that
repository? I can either:
- audit all the code present and all the changes done to it afterwards
- or trust someone else to do so
The first option here is not practical in most cases. In this specific
use case, I have audited the source code -- I'm the author, even --
what I need is to transfer that code over to another server.
(Note that I am replacing those procedures with Fabric, which
makes this use case moot for now as the trust path narrows to "trust
the SSH server" which I already had anyways. But it's still important
for my fellow Tor developers who worry about trusting the git server,
especially now that we're moving to GitLab.)
But anyways, in most cases, I do need to trust some other fellow
developer I collaborate with. To do this, I would need to trust the
entire chain between me and them:
- the git client
- the operating system
- the hardware
- the network (HTTPS and the CA cartel, specifically)
- then the hosting provider (and that hardware/software stack)
- and then backwards all the way back to that other person's computer
I want to shorten that chain as much as possible, make it "peer to
peer", so to speak. Concretely, that would eliminate the hosting
provider and the network as attackers.
OpenPGP verification
My first reaction is (perhaps perversely) to "use OpenPGP" for this. I
figured that if I sign every commit, then I can just check the latest
commit and see if the signature is good.
The first problem here is that this is surprisingly hard. Let's pick
some arbitrary commit I did recently:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
Author: Antoine Beaupré <anarcat@debian.org>
Date: Mon Mar 16 14:37:28 2020 -0400
fix test autoloading
pytest only looks for file names matching `test` by default. We inline
tests inside the source code directly, so hijack that.
diff --git a/fabric_tpa/pytest.ini b/fabric_tpa/pytest.ini
new file mode 100644
index 0000000..71004ea
--- /dev/null
+++ b/fabric_tpa/pytest.ini
@@ -0,0 +1,3 @@
+[pytest]
+# we inline tests directly in the source code
+python_files = *.py
That's the output of git log -p in my local repository. I signed that
commit, yet git log is not telling me anything special. To check the
signature, I need something special: --show-signature, which looks
like this:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature faite le lun 16 mar 2020 14:37:53 EDT
gpg: avec la clef RSA 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Bonne signature de « Antoine Beaupré <anarcat@orangeseeds.org> » [ultime]
gpg: alias « Antoine Beaupré <anarcat@torproject.org> » [ultime]
gpg: alias « Antoine Beaupré <anarcat@anarc.at> » [ultime]
gpg: alias « Antoine Beaupré <anarcat@koumbit.org> » [ultime]
gpg: alias « Antoine Beaupré <anarcat@debian.org> » [ultime]
Author: Antoine Beaupré <anarcat@debian.org>
Date: Mon Mar 16 14:37:28 2020 -0400
fix test autoloading
pytest only looks for file names matching `test` by default. We inline
tests inside the source code directly, so hijack that.
Can you tell if this is a valid signature? If you speak a little
French, maybe you can! But even if you do, you are unlikely to see
that output on your own computer. What you would see instead is:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg: using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
Author: Antoine Beaupré <anarcat@debian.org>
Date: Mon Mar 16 14:37:28 2020 -0400
fix test autoloading
pytest only looks for file names matching `test` by default. We inline
tests inside the source code directly, so hijack that.
Important part: Can't check signature: No public key. No public key.
Because of course you would see that. Why would you have my
key lying around, unless you're me? Or, to put it another way, why
would that server I'm installing from scratch have a copy of my
OpenPGP certificate? Because I'm a Debian developer, my key is
actually part of the 800 keys in the debian-keyring package,
signed by the APT repositories. So I have a trust path.
But that won't work for someone who is not a Debian developer. It will
also stop working when my key expires in that repository, as it
already has on Debian buster (current stable). So I can't assume I
have a trust path there either. That said, one could work with a
trusted keyring, like we do in the Tor and Debian projects, and only
work inside that project.
But I still feel uncomfortable with those commands. Both git log and
git show will happily succeed (return code 0 in the shell) even
though the signature verification failed on the commits. Same with
git pull and git merge, which will happily move your branch ahead
even if the remote has unsigned or badly signed commits.
To actually verify commits (or tags), you need the git
verify-commit (or git verify-tag) command, which seems to do
the right thing:
$ LANG=C.UTF-8 git verify-commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg: using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
[1]$
At least it fails with some error code (1, above). But it's not
flexible: I can't use it to verify that a "trusted" developer (say one
that is in a trusted keyring) signed a given commit. Also, it is not
clear what a failure means. Is a signature by an expired certificate
okay? What if the key is signed by some random key in my personal
keyring? Why should that be trusted?
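One way to make that policy explicit is to stop relying on the exit code alone and read GnuPG's machine-readable status lines, which `git verify-commit --raw` prints on standard error. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the list of trusted fingerprints stands in for a "trusted keyring", and the decision to reject expired or revoked keys is a policy choice made for this example, not something git enforces. GnuPG still needs the public key in whatever keyring it is configured to use.

```python
import subprocess

# Status keywords that make this sketch reject a signature outright.
# Treating expired and revoked keys as failures is an assumed policy.
REJECTED = {"BADSIG", "ERRSIG", "EXPSIG", "EXPKEYSIG", "REVKEYSIG", "NO_PUBKEY"}


def trusted_signer(status_text, trusted_fingerprints):
    """Return the signer's primary key fingerprint if the GnuPG status
    output shows a good signature from an allowed key, else None."""
    good = False
    primary = None
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "[GNUPG:]":
            continue
        keyword = fields[1]
        if keyword in REJECTED:
            return None
        if keyword == "GOODSIG":
            good = True
        elif keyword == "VALIDSIG" and len(fields) >= 12:
            # VALIDSIG carries ten arguments; the last one is the
            # fingerprint of the primary key.
            primary = fields[11].upper()
    allowed = {fpr.upper() for fpr in trusted_fingerprints}
    if good and primary in allowed:
        return primary
    return None


def verify_commit(commit, trusted_fingerprints, repo="."):
    """Run git verify-commit and apply the fingerprint allowlist."""
    proc = subprocess.run(
        ["git", "-C", repo, "verify-commit", "--raw", commit],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return trusted_signer(proc.stderr, trusted_fingerprints)
```

With this shape, "who is allowed to sign" lives in one reviewable list of fingerprints instead of depending on whatever happens to be marked as trusted in a personal keyring.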
Worrying about git and GnuPG
In general, I'm worried about git's implementation of OpenPGP
signatures. There have been numerous cases of interoperability problems
with GnuPG specifically that led to security issues, like EFAIL or
SigSpoof. It would be surprising if such a vulnerability did not
exist in git.
Even if git did everything "just right" (which I have myself found
impossible to do when writing code that talks with GnuPG), what does
it actually verify? The commit's SHA-1 checksum? The tree's checksum?
The entire archive as a zip file? I would bet it signs the commit's
SHA-1 sum, but I just don't know, off the top of my head, and neither
the git-commit nor the git-verify-commit manual pages say exactly what
is happening.
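As far as I understand git's object format, the signature is stored in a `gpgsig` header inside the commit object and covers the rest of that object: the `tree` and `parent` lines, author, committer and message. The tree and the history are therefore only covered indirectly, through their SHA-1 ids. The following sketch (hypothetical helper names, standard library only) shows how an object id is derived and how the signed payload can be recovered from a raw commit, for example one obtained with `git cat-file commit <id>`.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 object id: hash of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def unsigned_payload(commit_body):
    """Drop the gpgsig header and its continuation lines from a raw
    commit, leaving the bytes the signature is computed over."""
    head, separator, message = commit_body.partition(b"\n\n")
    kept = []
    skipping = False
    for line in head.split(b"\n"):
        if line.startswith(b"gpgsig "):
            skipping = True
        elif skipping and line.startswith(b" "):
            # Continuation lines of the signature start with a space.
            continue
        else:
            skipping = False
            kept.append(line)
    return b"\n".join(kept) + separator + message
```

Note that the payload names the tree only by a line like `tree <sha1>`, which is why a strong signature does not compensate for a weak object hash.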
I had an interesting conversation with a fellow Debian developer
(dkg) about this and we had to admit those limitations:
<anarcat> i'd like to integrate pgp signing into tor's coding
practices more, but so far, my approach has been "sign commits" and
the verify step was "TBD"
<dkg> that's the main reason i've been reluctant to sign git
commits. i haven't heard anyone offer a better subsequent step. if
torproject could outline something useful, then i'd be less averse
to the practice.
i'm also pretty sad that git remains stuck on sha1, esp. given the
recent demonstrations. all the fancy strong signatures you can make
in git won't matter if the underlying git repo gets changed out from
under the signature due to sha1's weakness
In other words, even if git implements the arcane GnuPG dialect just
so, and would allow us to set up the trust chain just right, and
would give us meaningful and workable error messages, it still would
fail because it's still stuck on SHA-1. There is work underway to
fix that, but in February 2020, Jonathan Corbet described that work as
being in a "relatively unstable state", which is hardly something I
would like to trust to verify code.
Also, when you clone a fresh new repository, you might get an entirely
different repository, with a different root and set of commits. The
concept of "validity" of a commit, in itself, is hard to establish in
this case, because a hostile server could put you backwards in time,
on a different branch, or even on an entirely different
repository. Git will warn you about a different repository root with
"warning: no common commits" but that's easy to miss. And complete
branch switches, rebases and resets from upstream are hardly more
noticeable: only a tiny plus sign (+) instead of a star (*) will tell
you that a reset happened, along with a warning (forced update) on the
same line. Miss those and your git history can be compromised.
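Rather than watching for a plus sign, a client could refuse any update where the previously known head is not an ancestor of the new one. Here is a minimal sketch of that check on an in-memory commit graph; the `parents` mapping (commit id to list of parent ids) and the function names are assumptions made for illustration. In a real repository, `git merge-base --is-ancestor <old> <new>` answers the same question.

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    seen = set()
    stack = [new]
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def classify_update(parents, old, new):
    """Describe how a branch moved from `old` to `new`."""
    if old == new:
        return "unchanged"
    if is_ancestor(parents, old, new):
        return "fast-forward"
    if is_ancestor(parents, new, old):
        # The server is offering a head we had already moved past.
        return "rollback"
    # Diverged history or an unrelated root: a rebase, reset or swap.
    return "rewrite"
```

Anything other than "unchanged" or "fast-forward" would then deserve a hard failure instead of a one-character hint.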
Possible ways forward
I don't consider the current implementation of OpenPGP signatures in
git to be sufficient. Maybe, eventually, it will mature away from
SHA-1 and the interface will be more reasonable, but I don't see that
happening in the short term. So what do we do?
git evtag
The git-evtag extension is a replacement for git tag -s. It's not
designed to sign commits (it only verifies tags) but at least it
uses a stronger algorithm (SHA-512) to checksum the tree, and will
include everything in that tree, including blobs. If that sounds
expensive to you, don't worry too much: it takes about 5 seconds to
tag the Linux kernel, according to the author.
Unfortunately, that checksum is then signed with GnuPG, in a manner
similar to git itself, in that it exposes GnuPG output (which can be
confusing) and is likely similarly vulnerable to mis-implementation of
the GnuPG dialect as git itself. It also does not allow you to specify
a keyring to verify against, so you need to trust GnuPG to make sense
of the garbage that lives in your personal keyring (and, trust me, it
doesn't).
And besides, git-evtag is fundamentally the same as signed git tags:
checksum everything and sign with GnuPG. The difference is it uses
SHA-512 instead of SHA-1, but that's something git will eventually fix
itself anyways.
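To make the "checksum everything" idea concrete, here is a rough sketch of a SHA-512 digest over a whole checkout. It is explicitly not git-evtag's actual format: the framing of each entry is invented for this example, and it ignores file modes and symbolic links, which a real tool would have to cover.

```python
import hashlib
import os


def tree_sha512(root):
    """SHA-512 over every regular file below `root`, in sorted order."""
    digest = hashlib.sha512()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place for a deterministic walk, and skip git metadata.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symbolic links are ignored in this sketch
            relative = os.path.relpath(path, root).replace(os.sep, "/").encode()
            with open(path, "rb") as handle:
                content = handle.read()
            # Length-prefix each entry so names and contents cannot blur.
            digest.update(b"%d %d\0" % (len(relative), len(content)))
            digest.update(relative)
            digest.update(content)
    return digest.hexdigest()
```

The resulting hex string is what would then be signed, which is exactly where the GnuPG concerns above come back in.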
kernel patch attestations
The kernel also faces this problem. Linus Torvalds signs the releases
with GnuPG, but patches fly all over mailing lists without any form of
verification apart from clear-text email. So Konstantin Ryabitsev has
proposed a new protocol to sign git patches which uses SHA-256 to
checksum the patch metadata, commit message and the patch itself, and
then sign that with GnuPG.
It's unclear to me what this solves, if anything at all. As dkg
argues, it would seem better to add OpenPGP support to git-send-email
and teach git tools to recognize that (e.g. git-am), at least if
you're going to keep using OpenPGP anyways.
And furthermore, it doesn't resolve the problems associated with
verifying a full archive either, as it only attests "patches".
jcat
Unhappy with the current state of affairs, the author of fwupd
(Richard Hughes) wrote his own protocol as well, called
jcat, which provides signed "catalog files" similar to the ones
provided in Microsoft Windows.
It consists of "gzip-compressed JSON catalog files, which can be
used to store GPG, PKCS-7 and SHA-256 checksums for each file". So
yes, it is yet another wrapper around GnuPG, probably with all the
flaws detailed above, on top of being a niche implementation,
disconnected from git.
The Update Framework
One more thing dkg correctly identified is:
<dkg> anarcat: even if you could do exactly what you describe,
there are still some interesting wrinkles that i think would be
problems for you.
the big one: "git repo's latest commits" is a loophole big enough to
drive a truck through. if your adversary controls that repo, then
they get to decide which commits to include in the repo. (since
every git repo is a view into the same git repo, just some have more
commits than others)
In other words, unless you have a repository that has frequent commits
(either because of activity or by a bot generating fake commits), you
have to rely on the central server to decide what "the latest version"
is. This is the kind of problem that binary package distribution
systems like APT and TUF solve correctly. Unfortunately, those
don't apply to source code distribution, at least not in git form: TUF
only deals with "repositories" and binary packages, and APT only deals
with binary packages and source tarballs.
That said, there's actually no reason why git could not support the
TUF specification. Maybe TUF could be the solution to ensure
end-to-end cryptographic integrity of the source code
itself. OpenPGP-signed tarballs are nice, and signed git tags can be
useful, but from my experience, a lot of OpenPGP (or, more accurately,
GnuPG) derived tools are brittle and do not offer clear guarantees,
and definitely not to the level that TUF tries to address.
This would require changes on the git servers and clients, but I think
it would be worth it.
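To give an idea of what such a client-side check buys, here is a heavily simplified sketch of the two ideas that address the "latest commits" loophole: a version counter that may never go backwards, and an expiry date that forces the publisher to keep re-signing. This is not TUF's actual metadata format; the `HeadMetadata` structure and the `accept` function are hypothetical, and the signature on the metadata is assumed to have been verified already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadMetadata:
    version: int  # increases with every publication
    expires: int  # Unix timestamp after which the metadata is stale
    head: str     # commit id the publisher vouches for


def accept(new, last_seen_version, now):
    """Return the vouched-for head, or raise if the metadata looks like
    a freeze (stale) or rollback (older version) attempt."""
    if new.expires <= now:
        raise ValueError("metadata expired: possible freeze attack")
    if new.version < last_seen_version:
        raise ValueError("version went backwards: possible rollback attack")
    return new.head
```

A server that withholds new commits can then only do so until the last signed metadata expires, instead of indefinitely.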
Other Projects
OpenBSD
There are other tools trying to do parts of what GnuPG is doing, for
example minisign and OpenBSD's signify. But they do not
integrate with git at all right now. Although I did find a
hack to use signify with git, it's kind of gross...
Golang
Unsurprisingly, this is a problem everyone is trying to solve. Golang
is planning on hosting a notary which would leverage a
"certificate-transparency-style tamper-proof log" which would be run
by Google (see the spec for details). But that doesn't resolve the
"evil server" attack, if we treat Google as an adversary (and we should).
Python
Python had OpenPGP going for a while on PyPI, but it's unclear if it
ever did anything at all. Now the plan seems to be to use TUF but
my hunch is that the complexity of the specification is keeping that
from moving ahead.
Docker
Docker and the container ecosystem have, in theory, moved to TUF in the
form of Notary, "a project that allows anyone to have trust over
arbitrary collections of data". In practice however, in my somewhat
limited experience, setting up TUF and image verification in Docker is
far from trivial.
Android and iOS
Even in what is possibly one of the strongest models (at least in
terms of user friendliness), mobile phones are surprisingly unclear
about those kinds of questions. I had to ask if Android had end-to-end
authentication and I am still not clear on the answer. I have no
idea of what iOS does.
Conclusion
One of the core problems with everything here is the common usability
aspect of cryptography, and specifically the usability of verification
procedures. We have become pretty good at encryption. The harder
part (and a requirement for proper encryption) is verification. It
seems that problem still remains unsolved, in terms of usability. Even
Signal, widely considered to be a success in terms of adoption and
usability, doesn't properly solve that problem, as users regularly
ignore "The security number has changed" warnings...
So, even though they deserve a lot of credit in other areas, it seems
unlikely that hardcore C hackers (e.g. git and kernel developers)
will be able to resolve that problem without at least a little bit of
help. And since TUF seems like the state-of-the-art specification
around here, it would seem wise to start adopting it in the git
community as well.