
I'm happy to be back at BSides Oslo to present research I did earlier this year. It's hobby research into the Python ecosystem. First off, the Python Package Index (PyPI) is used by millions of developers to distribute Python packages. It works this way: package maintainers build and upload their packages to pypi.org, and package consumers download and install the packages using a package manager like pip or Poetry. I'm going to use requests as the example package throughout the presentation, but it's just an example, and the attacks are not specific to that
package. Shout out to Ste and others at the hackerspace hacka for inspiring this work, and also to the PyPI maintainers for how they handled the reports, and for the feedback I got on how to present this. The findings were reported to PyPI in June and July. There was one patch in June and another in August, and the work was cleared for publication in August. I published four blog posts together with this talk earlier today, because there are a lot of details I don't have time to go into.

So first, let's look at the structure of PyPI packages. You have a project, and for each project you can have zero or more
releases, and for each release you can have one or more distributions. Distributions are just files: you can have source distributions as .tar.gz or .zip, and you can have binary distributions in the wheel format.

The first issue is that it was possible to do a denial of service against other projects uploading their packages. The reason is that there's a global check to make sure that distribution filenames are unique across PyPI, and instead of checking the project name exactly, it only checked the prefix. So it was possible to create a prefix project to attack another project. For requests you can create requ and
then upload the file that the requests project might upload in the future, and it's not that hard to guess future major and minor releases. This was quickly patched by the PyPI maintainers.

Next, there's a new type of attack. At least I haven't seen it before, so please let me know if you have. I call it distribution confusion, and the issue here is that you can basically make duplicate distributions. There are the regular distributions I talked about before, and then you can make malicious variants that are equivalent. Let's look at how we can build an attack using that. First we have the
benign distribution with a regular filename. Then we're going to build a malicious variant by changing the first character to an uppercase R, and this will target Poetry. Then we're going to add a leading zero to target pip instead. This is just the most convenient way to get the lexicographical order correct, but there are other ways to manipulate the filename to get the proper ordering. Because these are equivalent as seen from the package managers, if you try to pin on the version, the platform or other tags, it will still select the malicious versions. But if you pin on the benign
hash, that will work, but it will only work until you add the malicious hashes. I call this trust on every update, because if you run something like poetry update, the next time it fetches the hashes from the index, it will include the malicious hashes, and then it's still going to select the malicious ones. So the benign hash is only sort of safe until the next update. You might of course catch that in code review, but it's not guaranteed. Okay, so let's try a demo. [Music]
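The "trust on every update" problem can be made visible by comparing the hashes pinned before an update with the hashes pinned after it. What follows is only a minimal sketch under stated assumptions: the function name `added_hashes`, the dictionary layout (release string mapped to a set of hashes) and the hash values are all hypothetical, and it does not read any real lock file format.

```python
def added_hashes(locked: dict[str, set[str]],
                 refreshed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return hashes that appear for an already pinned release only after a refresh.

    Both arguments map a release such as "requests==1.0.0" to its trusted hashes.
    The layout is an assumption for illustration, not a real lock file format.
    """
    added = {}
    for release, new_hashes in refreshed.items():
        old_hashes = locked.get(release)
        if old_hashes is None:
            continue  # a newly pinned release has no previous trust to compare with
        extra = new_hashes - old_hashes
        if extra:
            added[release] = extra
    return added


# Hypothetical example: the version stays the same, but two hashes show up.
before = {"requests==1.0.0": {"sha256:benign"}}
after = {"requests==1.0.0": {"sha256:benign", "sha256:evil-a", "sha256:evil-b"}}
suspicious = added_hashes(before, after)
```

New hashes for a release that was already pinned are not always malicious, since a maintainer can legitimately upload an additional wheel later, so a result like this is a prompt for review rather than proof of an attack.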
Let's see. When Poetry and pip are deciding which package to fetch, they go to pypi.org/simple/requests, and we can see that it gives us a list of install candidates with the filenames and the links to download the files. For the demo I'm just going to run this locally. On localhost I'm serving three variants: we have the benign one, this one, and then we have the capital R targeting Poetry and the leading zero targeting pip. If I first pin on the version and pin on the benign hash, we expect to get the
benign distribution, so let's do
that. Here we can see that the file that was actually fetched is the benign version. But if I now do a pip install without pinning on any hash, while still pinning on the version, we see that it's the leading-zero malicious version that's selected. So let's move to
Poetry. First I'm going to pin on the benign hash as before, and we can see that we are indeed getting the benign version. But if I then do poetry
update, it's fetching all the hashes again from the
index. Now we see that the two malicious hashes are added to the set of trusted hashes, in addition to the existing benign hash. And if I then do a poetry install with this new
set, I get the malicious variant with a capital R. [Applause] Thanks. So if you either don't have hashes, or you have the malicious hashes in the trusted set, it will pick the malicious variants.

Okay, so this is just one of the distribution confusion sandwiches you can make. You could also target Poetry and pip individually. This is kind of interesting, because there's no other mechanism to target a specific package manager. Of course, all other package managers are going to fall into one of those two categories as well. You could also put the malicious variant in the middle of the sandwich, so then
normal users would get the benign versions, and you could have a malicious insider pinning on the malicious version. Also, if you later delete the outer variants, which is possible, the malicious version would be chosen instead. That also means you can have time-varying sandwiches of different sizes to change the behavior over time. So if you're doing a training, or there's some other reason you want to target in time, you could do it that way.

For those who paid attention: I said you can only have one source distribution per release, and I just did a demo with more than one
source distribution per release, so we need bypasses. The first bypass is that when uploading a .tar.gz you could simply say that it is a binary distribution rather than a source distribution, and it would just work. This was patched in August. The other way to do it, which is still not patched, is that you can give a different version in the metadata than in the filename. Because the package managers only care about the filename, this will still work, and you can have more than one source distribution per release. While it's not patched, there
are two open issues to fix this.

Let's move over to a different attack called manifest confusion. The term was coined for the npm ecosystem this summer, and it was already publicly known to be an issue for PyPI as well. It's basically that there are different ways to resolve the dependencies of a package. What I've done is just to look at which tools are using which sources, and what the impact might be. A lot of tools use the dependencies defined inside the distribution file, like pip, pip-tools, deps.dev, etc. This is the way recommended by the PyPI maintainers. Then
there are other tools that use the metadata API instead to fetch the dependencies, including Poetry, Snyk and socket.dev. The issue could of course be that if you're checking the dependencies in Snyk and then installing with pip, the results might differ. It's actually more complicated than that, because you could also get different dependencies in each distribution, and there could be a lot of distributions; for NumPy, say, you get more than 30 distributions. And with distribution confusion you could have even more ways to resolve a dependency. This is just a screenshot showing that these three online tools that
index all packages show different dependencies for the same package. This is still up, and there are more details in the blog post. deps.dev sees urllib3, while Snyk and socket.dev see charset-normalizer.

Okay, so I talked about manifest confusion and distribution confusion. Resolution confusion is an attempt at an umbrella term, and again, if you've seen this before, let me know. It covers all the ways to resolve a package into something other than you expected. So of course we have dependency confusion, where the package is installed from the wrong repository; manifest confusion, where there are multiple ways to resolve
which dependencies a package has; and then distribution confusion, where you might get the wrong distribution.

Briefly, on how I approached this research: I was actually starting to implement a transparency log for PyPI, and I had some assumptions that led to some bugs, so I started reading the PyPI source code. It turned out that the allowed inputs were broader than what was needed to do a regular upload, so I wondered if this was possible to exploit. Then of course the rule that you can only have one source distribution got in the way, and I started to look for bypasses for that. The four blog posts I released
today are listed here. I've touched on most of these, but the second one, on reproducibility in PyPI, has a lot of other ways you might get unexpected results with Python packages. And with that, I can take questions. [Applause]
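To make the manifest confusion point from the talk concrete: the dependency list declared inside a distribution file and the list served by the metadata API can be compared directly. This is a rough sketch only. The function names and the requirement strings are hypothetical, the name extraction is a simplification of real requirement parsing, and fetching the two lists is left out.

```python
import re


def requirement_name(requirement: str) -> str:
    """Extract and normalize the project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Treat runs of "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def dependency_mismatch(from_distribution: list[str],
                        from_metadata_api: list[str]) -> dict[str, list[str]]:
    """Report dependency names that only one of the two sources declares."""
    inside = {requirement_name(r) for r in from_distribution}
    api = {requirement_name(r) for r in from_metadata_api}
    return {
        "only_in_distribution": sorted(inside - api),
        "only_in_metadata_api": sorted(api - inside),
    }


# Hypothetical inputs, loosely modeled on the screenshot described in the talk.
report = dependency_mismatch(
    ["urllib3>=1.21", "idna"],
    ["charset_normalizer<4,>=2", "idna"],
)
```

An empty report only says that the two sources agree on names. It says nothing about version constraints, and as the talk notes, each distribution of a release can declare its own dependencies, so a fuller check would repeat this per distribution.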
Anybody? Questions? Up top on the balcony, downstairs? Yes.

Excuse me, thank you. Now that you've seen that this possibility exists, have you had the opportunity to scan through PyPI, for example, to see if there are packages that could be vulnerable to this, that have these confusions in place, or other occurrences of this in the wild?

Yeah, so in general, because the filename rules have been so lax historically, there's a lot of garbage in the system. For the most part it looks like it might be accidental, but I haven't looked closely enough to know whether the issues that are present are accidental or not, so that might be future research. But luckily this is easy to scan for, right? It's easy to detect, much easier than detecting whether a package is
malicious.

Anybody else? Question for St? Manifest confusion, distribution confusion, I like the rhymes. Where is the hand? Keep it up.

Hello. So the prerequisite that was listed, what does that mean, and how hard is it to get?

That basically means you need to compromise a package to be able to publish as the maintainer. It could of course also be that the maintainer turns evil. But you need to have permissions to actually publish packages, which should be a high bar. Basically, when you are pulling in a project, you are trusting that project not to do malicious stuff, right? So this is just one malicious
thing you can do if you have those
permissions.

Anybody else? Questions for St about these Python package distribution attack
vectors?

Hi, I was wondering: you mentioned detection. Do you have any examples of good tools, or ways or mechanisms to implement, to detect these scenarios?

Yeah, as far as I know, nobody except the test code I have is doing that detection.
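As a rough idea of what such a detection could look like (this is not the speaker's test code, which the talk does not show), one can group the source distribution filenames of a project by normalized name and version and flag any group with more than one file. The sketch below makes several assumptions: the filenames are hypothetical, version normalization is reduced to stripping leading zeros from numeric parts, and a real scanner would more likely rely on a full implementation of the packaging specifications.

```python
import re
from collections import defaultdict

SDIST_EXTENSIONS = (".tar.gz", ".zip")


def normalize_name(name: str) -> str:
    """Lowercase the project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_version(version: str) -> str:
    """Simplified normalization: drop leading zeros from purely numeric parts."""
    parts = []
    for part in version.lower().split("."):
        parts.append(str(int(part)) if re.fullmatch(r"[0-9]+", part) else part)
    return ".".join(parts)


def sdist_key(filename: str):
    """Return (name, version) for a source distribution filename, else None."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            name, sep, version = filename[: -len(ext)].rpartition("-")
            if not sep or not name:
                return None
            return normalize_name(name), normalize_version(version)
    return None  # wheels and anything else are ignored in this sketch


def find_equivalent_sdists(filenames):
    """Group sdist filenames that normalize to the same (name, version)."""
    groups = defaultdict(list)
    for filename in filenames:
        key = sdist_key(filename)
        if key is not None:
            groups[key].append(filename)
    return {key: sorted(files) for key, files in groups.items() if len(files) > 1}


# Hypothetical listing containing a benign file and two equivalent variants.
listing = [
    "requests-1.0.0.tar.gz",
    "Requests-1.0.0.tar.gz",
    "requests-01.0.0.tar.gz",
    "requests-1.0.1.tar.gz",
    "requests-1.0.0-py3-none-any.whl",
]
duplicates = find_equivalent_sdists(listing)
```

In practice the input would be the filenames listed on a project's simple index page, and any non-empty result would be worth a manual look, since historical uploads with loose filename rules may also show up here without being malicious.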