
Yo! Welcome to my presentation about securing the open-source supply chain. My name is Morten Linderud, and we'll talk a little bit about how we can secure the open-source supply chain: how we procure sources and, sort of, deliver them to a machine, and how the ecosystem is developing and adapting to new challenges. I go by the name Foxboron on the internet. I work as a security engineer at Defendable, where I do log pipeline stuff. I have a Master's degree from the University of Bergen, where I wrote about supply chain security in Debian specifically. I've been a F/OSS developer and maintainer since around 2013, and an Arch Linux contributor since around 2016, where I mostly do security team stuff: managing CVEs and publishing advisories, reproducible builds, which we'll talk a little bit about later, and also a lot of packaging of different software.
Quick disclaimer: this talk does not necessarily represent the views of my employer, etc. This is also, sort of, a short talk, so it might not get super detailed on some of the specifics, but it's still an attempt at wrapping up a lot of these topics.
The first thing we'll take a look at is reproducible builds. Reproducible builds is a little bit of the foundation that we'll put the rest of the building blocks on, because it's, essentially, "a set of software development practices that create an independently-verifiable path from source to binary code." What that means is that it tries to figure out how to take all of the variables out of a compilation of code and make it reproducible, so we can verify, bit for bit, the binaries a distribution ships. To illustrate how this works in practice I have made some SVGs of how packaging works. We have me, and I went through all of the different packages I maintain and I realized
the most popular one was the Go compiler. When we package a new version of Go we have to figure out where these artifacts are published. In Google's case, all of them are on storage.googleapis.com. When there's a new version, the tarballs are uploaded to that location, usually along with a signature. This is what we use to build the package. What we do fetch out-of-band is the Google package signing key. It is important that this does not arrive from the same single location, because then a compromised distribution mirror could swap both. Fetching the key out-of-band from another location and then using it to verify the source tarballs ensures that we can trust the release. Then I can publish this up to the build server
which is going to take the source code and turn it into a binary. At this point I can go and say "I made this". Now I have to give it to the world as well. I take the newly built package and I upload it to the tier 0 mirror, which is where we do all of the package distribution. The packaging mirrors are going to fetch all of the packages from the tier 0 mirror, and these packaging mirrors are where you get your software updates, whether that's on your computer, your servers, your containers and so on. Reproducible builds essentially ensures that I haven't messed with the code, that the build server has not messed with the code,
and that the tier 0 mirror has not messed with the code, because anyone can fetch the source code, compile the package that I built, and check that they get an identical package. This is sort of complicated, so you might be wondering how well it works in practice, in a real-world situation. And that is a little bit funny, a little bit amazing. In Arch we have 85% of 12 thousand packages reproducible, which is a good number. It fluctuates a little between 81% and 87% depending on the regressions at the time, because... we break stuff. I think it's at 80% now, because I haven't patched our tooling yet and there's a regression. So I have to fix that.
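The verification step itself is conceptually tiny: rebuild the package independently and compare the result bit for bit. Here's a minimal sketch of that comparison (the file paths are made up; Arch's actual tooling, makerepropkg, additionally recreates the original build environment from the package's recorded metadata before rebuilding):

```python
import hashlib
from pathlib import Path

def artifact_digest(path: Path) -> str:
    """SHA-256 of a built package, hashed in chunks to handle big files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_reproducible(official_pkg: Path, rebuilt_pkg: Path) -> bool:
    """A package is reproducible if an independent rebuild from the same
    sources is bit-for-bit identical to the package the mirror ships."""
    return artifact_digest(official_pkg) == artifact_digest(rebuilt_pkg)
```

The hard part, of course, is not this comparison but pinning down every variable in the build (timestamps, paths, locales and so on) so the rebuild actually lands on the same bits.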
How this translates to a live system is that I can ask what the reproducibility of my system is in terms of packages. You'll see that I have a 93% reproducible system. That's amazing, like, sure, yay, supply chain secured! We don't have any issues at all, because we'll soon reach 100% and everything's going to be amazing. But that's not really true, because reproducible builds only really cares about the distribution network that we created. Is there any other vector that could sneak in code? The answer is that we don't really have any clue what is pushed to the Google storage. They signed some tarball from their source code repository, but there could be a million commits in that project
we have no clue what was done to it. So the real question is: what about source code? How do we secure the source code that we depend on for these sorts of things? In 2011, kernel.org, which hosts the Linux kernel source code and a few other projects, was hacked. They didn't find any modifications, but yeah. And this was the second time, I think, that it was compromised. In 2018 the Gentoo Linux distribution got their GitHub organization hacked because they did not have 2-factor authentication enabled for all of their developers. And those people actually tried to modify packaging files. It was, obviously, stopped, and 2-factor authentication was later added. But it's a little bit worrying when, sort of, the holy grails of Linux distributions
are attacked and compromised in that way. Another good example is npm: in general there's, I don't know, a hundred different cases of tarballs being swapped around, fetching your Bitcoin wallet and sending it off somewhere, compromising your build, those sorts of compromises. But to exemplify a recent scenario, the PHP Git repository was compromised fairly recently. It's a little bit interesting how that sort of attack vector works. Quick disclaimer: I don't actually know what was done, I just did a little bit of guesswork based on experience, and it's all a bit coarse, but it covers the scenario as an example. So you have Rasmus Lerdorf, who was the main creator of PHP.
When he interacts with the Git repositories, he usually does that with the git.php.net server, which is where the bare PHP source repository is located. This is then mirrored over to the GitHub repository. These are the two locations where people fetch the source code for development builds, patches and so on. Releases of PHP are done on a separate server called php.net, and I don't know how they do releases, probably rsync or something. But these tarballs are protected by reproducible builds, because if something happens on the distribution server you can still check out the source code from the Git repository, compare it, and figure out that it does not add up at all.
That's covered. But in the PHP scenario you have a hacker, in this case Zero Cool, who did actually manage to get physical access to the server and inject malicious code into the bare repository. It was, I think we can call it, a little bit amateurish, because it was detected. It's not super obvious, but you see the commit "Revert "[skip-ci] Fix typo"", which reverts some commit. But it also adds a very big blob of code which fetches something from an HTTP header and tries to execute it. This was detected and reverted, and everything is fine. They also migrated away from the private server, which was poorly maintained, over to GitHub, which is, hopefully, more secure.
But they still didn't really figure out where this code came from. They just assumed it was physical access, and they had no traces of who put the code there. What can we do to protect against these issues? A feature that I honestly only recently learned about is that "git push" can actually produce certificates, or attestations, of what was uploaded. So "git push --signed" tells the server: I've uploaded a bunch of stuff to you, and I want to sign a statement that I did upload this stuff. And that allows us to assert "this is what I did", and you can verify that I pushed this code. The way this works in Git is that "git push --signed" initiates this connection.
You have a standard Git commit, and from this push it generates an attestation saying: this key, at this timestamp, pushed to this repository, with this nonce, because it needs to be unique. Then come the ref updates to this repository, and you see I pushed from the zero commit 0000... to this new commit, on the master branch. Following this certificate is a signature that I made, saying I did this, and that is the gist of it. It then completes the rest of the transaction. But this isn't super useful on its own, because if this was only done on PHP's side I would not necessarily know about these attestations, and I would have no idea how to verify them.
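To make that concrete, the plain-text part of a push certificate is simple enough to pick apart by hand. This is a rough sketch of a parser, assuming the layout described in git's receive-pack documentation (header lines, a blank line, then one old-oid/new-oid/ref line per updated ref); the values in the example are invented, and the detached GnuPG signature that follows a real certificate is ignored here:

```python
def parse_push_cert(cert: str):
    """Split a git push certificate into header fields and ref updates.

    Layout per git's receive-pack documentation: a few "key value"
    header lines (certificate version, pusher, pushee, nonce), a blank
    line, then one "<old-oid> <new-oid> <refname>" line per pushed ref.
    """
    header, _, body = cert.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    updates = [tuple(line.split(" ", 2)) for line in body.splitlines() if line]
    return fields, updates

# Invented example, shaped like the certificate on the slide.
example = (
    "certificate version 0.1\n"
    "pusher Morten Linderud <foxboron@example.org> 1619000000 +0200\n"
    "pushee https://git.example.org/project.git\n"
    "nonce 1619000000-f00f\n"
    "\n"
    + "0" * 40 + " " + "1a" * 20 + " refs/heads/master\n"
)
fields, updates = parse_push_cert(example)
print(fields["pushee"])   # https://git.example.org/project.git
print(updates[0][2])      # refs/heads/master
```

On the receiving side, a server opts into storing these by setting git's receive.certNonceSeed configuration.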
PHP does not do this, I don't think it has support for it either. But people hosting their own stuff have implemented it. An example of this is kernel.org which, I mentioned earlier, was compromised in 2011.
They actually push all of their "git push" operations to one Git repository, so you can go and view them and actually see what's being pushed. But I'm not sure how you review this. How do you actually look at the commits and say "eh, this looks fine", or distinguish them from a malicious commit, since it's still the same server as the rest of the source code? If you look at each file, you'll essentially just see an email which contains all the files, and good luck trying to eyeball whatever that is supposed to mean. The real question is then: how do we monitor this? At all? Do we just "git clone" it and figure out on our own how this works?
After I submitted this talk, I realized it would be a fun project to try and write a monitor for this service, so I did. This is the Kernel.org Transparency Log Monitor, which I wrote as a weekend hack earlier this month. It essentially checks out the Git repo we just saw and verifies all of the different pushes against their signatures. So you see Greg is very good at signing all his pushes; the other people, not so much. Greg is the Linux stable kernel maintainer, if you don't know. This is just an example of how it could look. There are a few features... missing. Things like: can we check that all commits in a kernel release are witnessed by this log?
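The tallying a monitor like that does can be sketched in a few lines. This assumes the signature checking has already happened, for example via git log's %G? placeholder, which prints "G" for a good signature and "N" for none; the commit hashes and author names below are made up:

```python
from collections import Counter

def signing_stats(log_lines):
    """Tally signed vs unsigned entries per author, given lines shaped
    like the output of: git log --format='%H %G? %an'

    %G? is git's signature-status placeholder: "G" means a good
    signature, "N" means no signature (other codes exist too).
    """
    stats = {}
    for line in log_lines:
        _sha, status, author = line.split(" ", 2)
        counts = stats.setdefault(author, Counter())
        counts["signed" if status == "G" else "unsigned"] += 1
    return stats

# Made-up log lines standing in for the real push log.
lines = [
    "aaaa G Greg",
    "bbbb N Random Contributor",
    "cccc G Greg",
]
print(signing_stats(lines))
# {'Greg': Counter({'signed': 2}), 'Random Contributor': Counter({'unsigned': 1})}
```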
And can we try to verify all of it? That's still a bit of an open question, but this is fun to hack around with. This was just a weekend hack to see if I could do it. But this is pretty much a thing that's unique to the kernel; nobody else is publishing a log like this. So it's a unique format, specific to the kernel community, and that monitor is probably not going to be useful for anyone except the kernel. What about the rest of us? Like, my pet projects? I'd like it if I could have some verifiability of the commits that are pushed to them. Other people might have some interesting projects that they
want to have on such a log, but they don't necessarily want to host a log like that and depend on other people monitoring it. This is where a new project from the Linux Foundation comes in, a project called sigstore, and it does essentially what you think it does. It allows you to take release artifacts or git push certificates and append them to a transparency log. A public one, at that. It's worked on by people from Google, Red Hat and Purdue University mostly, and a few others. It's a public transparency log for signing objects, and the strength of this is that there's one place to store all of it, and preferably in abstract ways
so it's reusable as well. This is similar to what you probably recognize as Certificate Transparency. It's, essentially, a way to have tamper-evident logging of things that have been signed. We can witness this log and ensure all the releases are on it, or you can fetch releases from Google and verify they're all on that log. But we still need to sign stuff, and, I mean, first of all, GnuPG is terrible. I use it a lot; it's completely terrible, it's hard to use. If we don't want to use GnuPG to sign stuff, we need other things to sign with. What sigstore is experimenting with a little bit is OpenID Connect: take identities and create certificates from them to sign release artifacts.
This possibly makes it a little bit of a silo, because sigstore does their own thing, but it enables regular maintainers to sign their objects with their own identity instead of having to rely on some extremely poor tooling like GnuPG. We then have OpenID Connect certificates and we can sign releases. And that enables us, as mentioned, to ensure that all the releases we care about are on this log. It's also, sort of, a better way for us to ensure that things are secure from A to B, because now we can have more security built around our Git repositories. And because it uses OpenID Connect identities, it's sort of a public web key infrastructure,
or Web PKI, as I saw somebody call it. It's sort of what Keybase tried to be a few years back, but that still depended on GnuPG, which is not a good idea in 2021. This is probably a better thing for some people to look at instead. But what should we be signing then? And this is sort of the fun thing about this project: it's not dependent on one thing. They implemented cosign, which is capable of signing Helm charts, containers, OCI images in general. I know people use it for Kubernetes as well, I believe. That enables you to sign things, append them to this public log, and then verify them. This is not constrained to the OCI format,
but it's one example of a format you could implement. How this is actually implemented is JSON records, which implement different types, appended to a tree which is then checksummed, and checksummed upwards. Similar to a blockchain, but without the baggage and the consensus algorithms: it's just a tree with checksums. Because the checksums are cryptographic hashes, we can always tell whether the tree is consistent or not. Not all of this is implemented yet; it's still a work in progress, and the Web PKI stuff is not properly implemented yet. I showed off my kernel transparency log monitor, and I think this is probably a good target for that as well. So that's, sort of, on my todo-list to try and implement.
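That "tree with checksums" can be sketched as a plain Merkle root computation. This is a simplified illustration, not sigstore's actual implementation (Rekor builds on Trillian and uses the domain-separated hashing from RFC 6962, the Certificate Transparency spec), and the record contents here are invented:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every record, then fold the hashes pairwise until one root
    remains. Any change to any record changes the root, which is what
    makes the log tamper-evident."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # odd node count: carry the last one up
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b'{"artifact": "a"}', b'{"artifact": "b"}', b'{"artifact": "c"}']
root = merkle_root(records)
tampered = merkle_root([records[0], b'{"artifact": "evil"}', records[2]])
assert root != tampered  # tampering with any record is evident from the root
```

A witness only needs to track the current root: if the server ever rewrites history, the roots stop being consistent with each other.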
Somewhat in conclusion: open-source is adapting, and we do actually get new tooling. Reproducible builds and sigstore are great projects which enable us to have a more secure supply chain for open-source projects. Reproducible builds ensures that the distribution network of these packages, and the build systems, can be verified. And transparency logs ensure that the releases and the Git commits of the projects are all tamper-evident. This improves the provenance of our systems, because we can then take a container, deconstruct it, verify all the packages in that container, reproduce them, and also trace all of that, hopefully, back to a transparency log and see that the commits have been verified. And that gives us the complete history of the projects.
But... there's, sort of, a dark side to the story. We need to fund open-source work. Reproducible Builds is headed by 2 or 3 people, with 10 other regulars, all of them volunteers doing this in their free time. Sigstore is a few companies putting in a few resources, but it's not like a huge team. It's back to this XKCD image, which was posted earlier this year.
It's a joke, but it's depressingly true at the same time. A lot of these infrastructures are dependent on extremely few people, and there's not a lot of money that goes into these projects. So it shouldn't come as a huge surprise when The Register writes that much of the open-source world is critically underfunded yet critically important, because that's the situation we have gotten ourselves into. So what can we do to solve this? Fund people. The problem is that in a capitalistic system, people care about their bottom line and earning money, and that does not map one-to-one onto "we should fund idealists who work on this in their own free time without necessarily earning any money on it."
What can you do? As a privileged developer, you can push your workplace to fund people working on these projects, or dedicate some developer time to these projects to help out, because it's sorely needed. It's extremely important to secure the open-source infrastructure. If you can't get your work on board, you can donate yourself. I do, so: GitHub Sponsors, lovely project. I sponsor Florian Bruhin, who writes the web browser that I use. Daniel Stenberg is the curl developer, a Swedish guy. If you don't know what curl is, it's literally running everywhere: your phone, your watch, your TV, whatever, probably runs curl. It's, essentially, what everybody uses to fetch network resources. And Jason Donenfeld, who writes WireGuard, which should be no stranger to most technical people.
This is not a lot of money from me, but collectively it's a lot. That's, sort of, the main takeaway: open-source has solutions, but there's essentially not enough people working on the solutions.
That's pretty much my talk. I hope it was somewhat insightful, with a few twists and turns. I'm on Twitter if you're interested in discussing with me. I'm also on GitHub as Foxboron. And I have email, if you ever want to write me and discuss. I also have a blog with the stuff I do in my free time in terms of open-source development.
Thank you for having me! Thank you for your time. I think there should be a Q&A session starting soon. Have a lovely conference going onwards.