
Hey, I'm Stian and I will be presenting a deep dive on dependency confusion. This is joint work with Ståle Pettersen and Alexander Kjäll in the Schibsted Product and Application Security Team. The term "dependency confusion" was coined by Alex Birsan in a blog post he published in February of this year. There he talked about how he used the technique to hack a number of companies, including Apple and Microsoft. When we became aware of the issue in Schibsted, we spent quite a bit of time mitigating it. In this presentation I'll cover how you can figure out if you are affected by dependency confusion, and the steps you can take to mitigate it. The dependencies we are talking about are, just to quickly define them, third-party libraries that you include in your projects, often called packages.
And you pull them in via package managers like npm, pip and Maven.
As in this example. Third-party libraries are usually a big part of a project: you might include a lot of them directly, and those in turn have their own dependencies. It can add up to quite a bit of attack surface.
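The slide example isn't reproduced in this transcript; as a sketch, an npm dependency declaration in package.json looks like this (the package names and versions are just illustrative):

```json
{
  "dependencies": {
    "express": "^4.17.1",
    "lodash": "4.17.21"
  }
}
```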
Of course, dependency confusion is just one of many ways to be compromised through your dependencies. These are in general referred to as supply chain issues. There are a bunch listed here, like different backdoors, typosquatting, etc. But in this talk we'll focus on dependency confusion. For there to be dependency confusion, there are two main ingredients that need to be present. The first is ambiguity in how you specify your dependencies.
For the identifier to be unique it should include the repository, the scope, the package name and the version. So an exact specification would have the URL of the repository, the scope (like the company name), the package name and an exact version. In practice, we often only find the package name
and a version, which might be a range rather than exact, like in this case with the patch version left open. The second ingredient is that the package manager is configured in such a way that the ambiguity can be exploited. So in this case we have two upstreams. When you specify the package you want, it might look like this: "MyPackage" 1.0.*. You expected to get "MyPackage 1.0.0", but you might actually get "MyPackage 1.0.999", which is controlled by the attacker. In this case the highest version wins, so the attacker chooses a bunch of nines to make sure they have the highest version.
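As a rough sketch of why the attacker's 1.0.999 wins, here is a toy "highest matching version" resolver. This is not any real package manager's algorithm, just an illustration of the resolution rule described above:

```python
# Toy model of "highest matching version wins" across two package sources.
# Versions and patterns are illustrative, not real registry data.

def parse(version: str) -> tuple:
    """Turn '1.0.999' into (1, 0, 999) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def matches(version: str, pattern: str) -> bool:
    """Match a version against a pattern like '1.0.*'."""
    for v, p in zip(version.split("."), pattern.split(".")):
        if p != "*" and p != v:
            return False
    return True

def resolve(pattern: str, *sources) -> str:
    """Pick the highest version matching the pattern across all sources."""
    candidates = [v for src in sources for v in src if matches(v, pattern)]
    return max(candidates, key=parse)

internal = ["1.0.0"]    # our repository
upstream = ["1.0.999"]  # attacker-controlled upstream
print(resolve("1.0.*", internal, upstream))  # -> 1.0.999
```

Because both sources are consulted and the highest version wins, the attacker-controlled 1.0.999 shadows the internal 1.0.0.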
This is one way there can be confusion. There could also be others, like the order in which the repositories are consulted. So this is the general gist of dependency confusion. In practice, it's often a bit more complicated. This setup is closer to what we have in Schibsted, and also to the companies mentioned in the original blog post. Instead of just having upstream repositories, we also have an internal repository manager
where we keep our own internal packages, and we might also have proxies that proxy upstream repositories.
On top of this we can have virtual repositories that can reference one or more of the other repository types.
As we see here, there are several places where there can be confusion: should we go internal or external? Which of the external repositories should we use? And which of the internal ones? This is the backdrop for the normal case where dependency confusion might be an issue. The attacker's goal is to publish a package upstream that matches the name of an internal package. You might think that guessing or discovering internal package names would be hard, but the original blog post shows that finding these names is practical. You might also think that it's a challenge to produce a package that wouldn't
produce runtime or compile-time errors.
But fortunately, for the attacker at least, some package managers, like npm, allow arbitrary code execution on package install. So that simplifies things for the attacker.
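As an illustration of such an install hook: npm runs a "preinstall" script automatically when a package is installed. The package name, version and script file here are made up for this sketch:

```json
{
  "name": "some-internal-package",
  "version": "0.1.999",
  "scripts": {
    "preinstall": "node collect.js"
  }
}
```

So the attacker doesn't even need the package to build or run correctly; the hook fires before the code is ever imported.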
And then, of course, that code would run in the context of a build server or a developer laptop, which is very useful to an attacker. Given how straightforward this attack is, it's somewhat peculiar that it only recently became an issue the industry mitigates.
Of course, there are a lot of different package managers affected by this. We had a bunch to mitigate for in Schibsted, and you can see more details in our blog post. In this presentation I'll focus mostly on Artifactory and npm. Here we have npm-virtual, which points to the other repositories: npm-local, which holds our internal packages, and npm-remote, which is a proxy for npmjs.com.
You can point to any of these three. To get dependency confusion between the internal and remote repositories, you would have to point to the virtual one, which then resolves to the two others. You would not have confusion if you only pointed to the internal one, npm-local. So here's a concrete example. Say that we've configured npm to fetch from the npm-virtual repository, and that in our package.json we have a dependency "babel-preset-internal" with a version range on the minor version.
Let's also assume that our internal repository does have "babel-preset-internal" at version 0.1.1. What the attacker would do is go to npmjs.com, register the same package name upstream, and make sure they have a higher version, say 0.1.2. When the developer then runs "npm update", the malicious package will be installed. One approach to mitigate this is to reserve your internal package names upstream. This is something we've done, but it's not in the spirit of the repositories' Terms of Use, so it might not be something you can continue to do.
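The exact range from the slide isn't shown in the transcript; assuming a patch-level range, the dependency from the example might be declared like this:

```json
{
  "dependencies": {
    "babel-preset-internal": "~0.1.1"
  }
}
```

With npm semantics, `~0.1.1` matches any 0.1.x at or above 0.1.1, so the attacker's upstream 0.1.2 would win over the internal 0.1.1.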
In case you can't quite trust the upstreams, or you want extra security in this regard, you can also maintain a denylist internally. So the package that you register upstream as a dummy is also denylisted in your internal repository manager. But then, of course, traffic needs to go through your internal repository manager, and not directly to the upstreams, for this to be effective. Only going through the internal manager is a good idea regardless, because it gives you better insight into the packages that are used:
the remotes are cached, and you can also get usage statistics.
Which is convenient. For npm specifically, because it supports scopes, rather than reserving the exact package name upstream you can reserve a scope instead. Say we have the "@schibsted" scope: we make sure the package is in that scope and that we own the scope upstream. So we go to npmjs.com and check: are we already the owner of the "schibsted" scope? If not, can we get ownership? Or is there already malicious code published under that scope?
And do the cleanup, making sure that all packages are moved into your own scopes. Another, complementary approach is allowlisting scopes per source. Here we say that the "@schibsted" scope may only come from the internal repository, while everything else can come from the virtual repository, which might resolve either internally or upstream.
You can also set allowlists for the remotes. This is useful where you don't fully trust the upstreams; an upstream might serve just a single project, which is more common for Maven. Then you only trust that repository for the prefixes belonging to that project, and you don't have to worry about other software coming in that way.
You can also set the allowlist in your build configuration. This is how we do it for npm: you say that the general registry is "npm-virtual", but for the scope "@schibsted" it's "npm-local".
This is, of course, in contrast with only setting the registry to be "npm-virtual",
which then could lead to dependency confusion.
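As a sketch, with a placeholder Artifactory hostname, the .npmrc could look like this:

```ini
; everything resolves via the virtual repository by default
registry=https://artifactory.example.com/artifactory/api/npm/npm-virtual/
; ...except the @schibsted scope, which is pinned to the internal repository
@schibsted:registry=https://artifactory.example.com/artifactory/api/npm/npm-local/
```

With the scoped registry line in place, an upstream package in the "@schibsted" scope can never shadow an internal one.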
Our preferred setup for the simple case, which applies when the package manager supports scopes, is
to make sure you only go through your repository manager and that you reserve the scope upstream. For npm it's also simpler to have only one upstream repository.
So this simplifies things. Another approach: when dependency confusion became a thing this spring, JFrog came up with a mitigation they call priority resolution. If this works for you, great, but it has some drawbacks. Basically, if priority resolution is set on an internal repository, any package in that repository will never be fetched via a remote. The downside is that if you have an internal fork, or a single-version patch, of an external project,
that will also block any other version of that project from being fetched via the remote. And if you have a project that started out internally and then became open source, it will break those projects too. One solution is to split the repositories: one internal repository for the packages that are definitely internal, where you set priority resolution, and a separate internal repository for the projects where you need to be able to fetch both internal and upstream versions.
That was JFrog Artifactory, but Sonatype Nexus has similar functionality in their Firewall product, where you can be more fine-grained: instead of saying "all the packages in this repository", you can set it per package. Microsoft proposed a different solution: don't have upstreams at all, only use internal packages. This requires you to maintain internal copies of every project, either by building from source or otherwise handling internal versions of all the projects you need.
This is a good solution for keeping an overview of all your dependencies, but it might be too resource-intensive for a lot of companies.
Here's a summary of the preferred setup for the complex case, when you don't have scopes, for instance. If you can use priority resolution, great. Otherwise you might want a mix of reserving scopes or packages upstream, using denylists and allowlists for the remotes in your repository manager, and setting allowlists for scopes per source in your project. You probably want to go only through the internal repository manager, so you get some audit capabilities. Here are some of the package systems we looked at. As you can see, some of them support scopes and others don't. Of course, it's easier when they do. pip is one of those that doesn't support scopes,
which makes it tedious to register all the package names upstream.
One thing to note here is that for most of the systems that do support scopes, anyone is free to register any scope, so you have to get there before the attacker does. For Maven Central, which is the main repository for Maven and Gradle, they use DNS: you have to prove that you own, for instance, schibsted.com to get the group ID "com.schibsted". This is useful, because not just anyone can grab your scope. If you already use proper group IDs, you're secure to a larger extent in the Maven world. It took a lot of time to go through all of this, and we registered several hundred packages upstream, in addition to scopes.
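To make the Maven scoping concrete: the group ID carries the ownership, so a dependency on an internal library would be declared under the verified group. The artifact ID and version here are made up for illustration:

```xml
<dependency>
  <groupId>com.schibsted</groupId>
  <artifactId>internal-lib</artifactId>
  <version>1.0.0</version>
</dependency>
```

Because Maven Central verifies domain ownership for "com.schibsted", an attacker cannot publish under that group ID there.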
Across the different package systems, our approach to the cleanup, if you're starting out, is this: go through all your internal packages and check if they exist upstream. If not, you can reserve the package name and/or scope, and add a denylist entry. If they do exist upstream, check whether you are the owner or could get ownership. For instance, we had some packages owned by ex-employees that we could get ownership of again. Also check whether there's already a malicious package there, and whether you're compromised. There are also some small projects that are no longer maintained and that you don't think you'll need.
Then it might still be useful to denylist them internally if you're unable to rename the package internally, to avoid the conflict. And, of course, moving and renaming things might be required to get a clean setup.
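The triage described above can be sketched as a small decision function. The inputs are illustrative booleans standing in for real registry lookups, and the action strings are my own summaries of the steps in the talk:

```python
# Sketch of the per-package cleanup triage described in the talk.
# Inputs stand in for real checks against the upstream registry.

def triage(exists_upstream: bool, we_own_it: bool, still_needed: bool) -> str:
    """Decide what to do with one internal package name."""
    if not exists_upstream:
        # Unclaimed upstream: claim it before an attacker does.
        return "reserve name/scope upstream and denylist internally"
    if we_own_it:
        # Already ours upstream: just verify nothing was tampered with.
        return "keep ownership; check published versions for tampering"
    if not still_needed:
        # Conflicting and unneeded: remove the ambiguity entirely.
        return "rename or denylist internally to remove the conflict"
    # Conflicting and needed: try to recover it, and assume the worst.
    return "try to claim ownership; audit for malicious versions"

print(triage(exists_upstream=False, we_own_it=False, still_needed=True))
# -> reserve name/scope upstream and denylist internally
```

Running this over every internal package name gives you the cleanup worklist.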
In addition to going through all the internal packages, we can also go through the remotes, make sure we don't have any remotes we shouldn't have, and restrict some of them based on scope prefix, so that they can only provide packages that are relevant for that remote.
Then, unfortunately, depending on your setup, you might have to continuously look for new internal packages or scopes and make sure you run them through the same process. Depending on the solution, this is an ongoing effort to make sure you stay covered. And if you can use priority resolution, that might be useful too. Hopefully there'll be more useful tools from the vendors in the future. So, some resources: we have a blog post with more details on this, and we have an open-source tool called Artishock that helps with some of the steps I just went through, in the setting of Artifactory. There's also a tool from Visma called Confused,
which is more centered on a single project: it can detect whether that project has packages that are not claimed upstream. With that, I'd like to mention that we are hiring a Cloud Security Engineer, and I'm happy to take questions!