
Abuse of Trust: Exploiting Our Relationship with Public Repositories

BSides Canberra · 2019 · 29:21 · 143 views · Published 2019-05
About this talk
Pat Hay examines supply-chain attacks against public package repositories, demonstrating how malicious actors can inject code through trusted dependencies to compromise downstream users. Using Python and PyPI as case studies, the talk covers real-world incidents like the event-stream attack, and proposes practical defenses including containerization tools.
Original YouTube description
BSides Canberra 2019 PatH - "Abuse of trust - Exploiting our relationship with public repositories"
Transcript [en]

Well, thank you very much for coming back from the break. I hope you checked out the rest of the conference; if you stayed here in the break then that's good, you'll be prepared for a little treat. We have Pat H, who is a Canberra local, who will be speaking about abuse of trust: exploiting our relationship with public repositories. So let's give him a warm welcome. [Applause] Cool. Everyone, thanks for coming. I was gonna make a joke about how there's way too many people in this room, but at least half the people have nicked off, so that's great. So thanks to them for ruining the start of my talk. So, g'day, my name is Pat. I'm a low-level programmer

during the day, but when I like to take my head above sea level I'm a Python developer. So today I'm gonna be talking about the software supply chain. I'm gonna cover basically what this is and how malicious users can exploit the concept of modern software development, and then I'm gonna go through what we can do about this. Now, a quick note: I am gonna focus on Python and PyPI, which is the Python package repository, but all of the concepts I'm talking about are systemic in modern software development as a whole. It's not specific to this particular language, and

I'll be pointing out things that happen in other languages, but as a general rule everything I'm talking about is going to be possible in basically every other software language. So first of all, what do I mean by the software supply chain? Well, modern software development is hard, and I'm not just saying that so I earn a big paycheck. It's a very difficult thing; there's lots of moving parts in modern software development. If you're building web apps you have to target multiple different browsers across multiple different operating systems, you have to deal with differences between computers and mobiles and tablets, and you often have to plug your software into other systems such as third-party

authentication servers. So there's a lot of very complex things involved when it comes to creating a modern software app, right? So say, for example, you want to create a web app, okay, and you want to create an app where you show people pictures of cute dogs and you ask your customers to guess the names of these dogs [unclear], right? So you might think that you're gonna spend all your time, when you're developing this app, working on really interesting and fun stuff. But in reality what actually happens is the majority of code that goes into your app is gonna fall into one of two buckets, okay? You're gonna

have the easy and boring bits. So these are the parts that every single web application has to do, and you end up doing basically the same thing for every app. These are things like: how do you make a web request, right? How do you draw an image on the screen? These are the sort of things that you do time and time and time again, and they're not really that interesting because they're not really the problem that you're trying to solve, okay? And then the second bucket is the really hard bits that can go really, really wrong, okay? These are things like user authentication, or how do you keep user data

private, okay? These are things that, if not written by experts and audited by experts, can go really, really badly, and this is a really bad thing, right? And so if you're not that expert and you don't want to spend your time doing the boring bits, what do you do? Well, the open source community is really, really great, and it turns out there's lots of libraries out there that take a lot of this pain away, that have already implemented a lot of these parts of the application, right? And so you can just take these libraries, you can use their code inside your web application, and therefore you can spend less time

doing the bits you don't want to do and more time doing the bits you do want to do, right? And so this is really, really great. But the library developers also have the same idea: they also have the parts of the library that they were interested in writing, and there's other bits they don't want to do, right? And so they themselves have their own dependencies, and what you end up with is this really complex chain. It's really more like a web of dependencies and co-dependencies and inter-dependencies, and without realizing it you might not even really know exactly who's written all the code that you are using inside your application, right? You

might not really know where that code has come from, how well it's been written, how well it's been tested, and you might not even know some of the names of these, because they're dependencies of dependencies of dependencies. And normally this is okay; normally you can just go and live your life. But what happens when one of these dependencies goes bad, right? Are you gonna even know about this, and what could happen if one of these dependencies goes bad? So that's what we're gonna talk about today. We're gonna go through a hypothetical scenario here of how this can go bad and what that could mean for you as

either an app developer or as the end user of a product, okay? So we're going through a hypothetical scenario. There is a company out there called Cool Corp, right, and Cool Corp's accounting department really loves using Python. They like using it so much that they really love using this Python app called Big Money, okay? And Big Money exists in the PyPI packages, and so every day the people in Cool Corp's accounting department download the latest version of this app and use it to do whatever accountants do. I know it involves money; that's about the extent of my knowledge about what happens there. So we are going to play the evil people in this scenario, okay, and what is gonna be our goal? Well,

our goal is gonna be to break into Cool Corp, and if doing CTFs has taught me anything it's that every organization out there has a secrets.txt sitting on their desktop. Inside of that there's gonna be a magical hash, and inside of that is gonna be something that is really, really important to someone, okay? So our goal is going to be to attack Cool Corp, jump onto their desktops, steal their secrets.txt and exfil that out for our own purposes, okay? So what do we do? Well, first off, we're not going to attack Cool Corp directly, and we're not even actually going to attack the app that they use, okay? What we're actually

gonna do is we're gonna look at that app and we're gonna see that there's actually a library that that app uses. It's called big-lotsa-money, okay? What we're gonna do is we're gonna take over this library, okay? And so how are we gonna do this? Well, with Python packages, and this is similar to all other open-source software, these are written by people in the community. These are people who are sharing code, who want people to use their code and want people to help them improve it, right? So in these packages they list information about how you can reach them, okay? So often in Python

packages you're gonna find these people's names, you're gonna find their email addresses, you might find links to their home pages. And some people go even further than that, sometimes without even realizing it, and they might put up links to their git repositories that contain private SSH keys to their servers, where they didn't necessarily realize that they'd made that stuff public. They also might put things there such as links to Confluence pages that they haven't locked down properly. If you look in PyPI today, these are the sorts of things that you're going to find, right? So if we wanted to target this developer with a spear-phishing campaign, if we wanted to send

them a crafted email that would trick them into handing over their credentials to PyPI, we have a lot of information here to go on. We have a lot of information with which we could craft something very, very specific to that developer in order to trick them, or potentially even convince them, into giving us their credentials, okay? Now once we do that, and once that's successful, once we've got their credentials, there's a really slim chance that developer's even gonna know that we've done that. So PyPI has no multi-factor authentication, it has no notifications telling them that you've logged into their account, and if you push up a new version of this package they're not gonna know about it. So there's

a lot of issues here where, if we gain control of this package, that developer's not really gonna know about it, let alone the developer of Big Money, let alone Cool Corp. So once we can modify this package, what are we gonna do about it? Okay, so we're gonna be extra crafty here. We're not actually going to change the library itself, okay, because maybe we think either Cool Corp or, more likely, Big Money checks what the code does whenever a new version is bumped, okay? So we're not actually going to inject any malicious code inside this library, but what we're going to do

is we're going to make our own package and just slide that into this complex dependency web, and then our own package is going to be the thing that actually contains all the malicious code, right? And so from the perspective of Big Money, they don't know why this dependency exists, because it's not their library. All they know is one of the things that they use now requires something else, and they don't know why, and they have no reason to suspect anything suspicious. And again, Cool Corp, sitting right at the top of this, they're extra not gonna have any clue; they just know that something got bumped to a new version, so
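A downstream consumer could at least notice the kind of change described here, where a release quietly gains a new dependency, by diffing the declared requirements of two versions. The following is a minimal sketch, assuming the requirement strings have already been pulled from each release's metadata (the `Requires-Dist` fields); the package names and the `new_dependencies` helper are hypothetical and only for illustration.

```python
import re


def requirement_names(requires_dist):
    """Extract normalized project names from Requires-Dist style strings."""
    names = set()
    for spec in requires_dist:
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
        if match:
            # PEP 503 style normalization: lowercase, runs of -_. become "-"
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def new_dependencies(old_requires, new_requires):
    """Return dependencies present in the new release but not the old one."""
    return sorted(requirement_names(new_requires) - requirement_names(old_requires))


# Hypothetical metadata for two releases of the same library.
old_release = ["requests>=2.0", "six"]
new_release = ["requests>=2.20", "six", "totally-legit-helper (>=0.1)"]

print(new_dependencies(old_release, new_release))
```

This only compares direct dependencies of one package; in a real dependency web the same comparison would have to be repeated for every package in the tree.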

we can do this, and what we have to do is publish a legacy version of a Python package. So modern Python packages, you can install them, but they don't run anything unless you explicitly call them. But if we were to publish a legacy version of a Python package, what we can actually do is add some extra code that says, hey, while I'm installing, go do this other thing, and we can make that other thing be whatever we want it to be. So we can do that, and we don't actually need anybody to call our code directly. As long as we're in that dependency chain

we have code execution and we can do whatever we want there. So we publish this legacy version, we add that as a dependency, and then whenever someone goes to grab the new version of Big Money we have code execution: we can run whatever we want, and we can run it on that user's machine. Now, I've called this a legacy option, but in reality there are hundreds and hundreds of legitimate packages out there that do this, that do run code as they're installing. Typically it's things like setting up shortcuts, or sometimes it's actually compiling some code on the user's system. So even though this is a legacy thing, no one has any plans to actually remove this functionality, so
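As a rough illustration of the install-time mechanism described here, and assuming the "legacy" format means a source distribution with a `setup.py`, one common hook is passing `cmdclass` to `setup()` so that a custom class replaces the `install` command. The sketch below does not run any setup script; it parses one as text and reports which commands it overrides. The sample `setup.py` content and the `overridden_commands` helper are hypothetical.

```python
import ast

# A harmless, hypothetical setup.py that hooks the install step.
SAMPLE_SETUP_PY = '''
from setuptools import setup
from setuptools.command.install import install

class PostInstall(install):
    def run(self):
        install.run(self)
        print("arbitrary code runs here during installation")

setup(name="example-legacy-package", version="0.0.1",
      cmdclass={"install": PostInstall})
'''


def overridden_commands(setup_py_source):
    """Return command names overridden via a literal setup(cmdclass={...})."""
    found = []
    for node in ast.walk(ast.parse(setup_py_source)):
        # Only direct calls written as setup(...) are considered.
        if not (isinstance(node, ast.Call) and getattr(node.func, "id", None) == "setup"):
            continue
        for keyword in node.keywords:
            if keyword.arg == "cmdclass" and isinstance(keyword.value, ast.Dict):
                found += [key.value for key in keyword.value.keys
                          if isinstance(key, ast.Constant)]
    return found


print(overridden_commands(SAMPLE_SETUP_PY))
```

A check like this is easy to evade (for example by building the `cmdclass` dictionary dynamically, or by running code at the top level of `setup.py`), so it is a way to see the mechanism, not a detector to rely on.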

this functionality is going to exist for a long time. Now, in other languages this is also possible. So if you're using Ruby or .NET, this thing is absolutely possible, and there it's not even a legacy system, it's just something that you can straight-up do, okay? And then the final thing to notice about this: if we have code execution we're now running inside Cool Corp's servers. Yes, we're not running as, like, a domain admin; we're just running as a generic boring user from accounting, sure. But if we remember our goal, if our goal is to steal secrets from the accounting department, we don't need domain admin. The information that we

want to steal is right there, and we don't really need to go any further than that. So the fact that we're just running as this user on their local machine is completely fine for what our goal is in this scenario. So we've now got the scenario where we've injected our dependency, Cool Corp has downloaded the latest version of this application, we've got code execution, and now we're running on the user's machine and we've got their secrets.txt. How do we get that out, right? So this is a sort of diagram of what we've done so far: we've uploaded the malicious packages up to PyPI, that's gone down

through to Cool Corp, they've downloaded the new version, we're running, that's really cool. But you might think, well, look, if we were to just take that file and punt it straight off to our evil server, there's a chance that either the firewall might block that connection, or your intrusion detection systems might notice: hey, this regular user from accounting is reaching out to this domain that I've never heard of, that sounds like something dodgy going on, launch an investigation, right? So if that doesn't work, well, why don't we just go back through PyPI, right? So with Python packages, the way you submit them to

PyPI is just a web request. It's a web request that really isn't that dissimilar to when you download a package, right? So unless you're looking underneath the TLS, the network flows for uploading a package to PyPI and downloading from PyPI are going to look really, really similar, and if you're not paying attention you're gonna see that network flow and just completely dismiss it as someone downloading a package, when in reality it's actually gone the other way around, right? And the PyPI domain, you know, it's one that you've already trusted, because you've already trusted your users to download this package, right? So you've already

trusted this domain, you've already trusted the certificates this domain presents. And so what we can do as a malicious package, well, we're just gonna re-upload ourselves: we bump our version and repackage ourselves as a new package, except this time we're gonna package the secrets.txt inside of us and then just push ourselves up to PyPI, right? And then from our evil server we can just connect to PyPI and pull it straight from there, right? And then if we wanted to do something different we could bump the version again, put a new sort of task in there, and then we've basically got this complete C2; this is

a complete command and control system using these legitimate servers, and again, if the Cool Corp network team is not paying attention they're gonna trust all of this traffic, right? So we've got this complete sort of flow here. Now, this is a hypothetical example, so how real is this? Okay, so actually this sort of attack has happened already in real life. In November of last year there was a Node.js library called event-stream, okay? Now this attack is actually absolutely wild, right, because there was no spear-phishing. So what event-stream is: similar to big-lotsa-money, it's a library that gets used

in a lot of applications, okay? It has millions and millions of downloads every single day, so it's a pretty popular library. But the author of this library didn't really use it that much. They'd moved on to other projects, and they were keeping it alive but they weren't really that interested in it. And what ended up happening is the author of this library got approached by email from someone that said: hey, I've noticed you don't really care about this library anymore, I use it every day, why don't you let me maintain it for you? And the library developer didn't have any reason to doubt this, you know,

they had their other projects, someone was genuinely interested in it, and they ended up just handing over the credentials to maintain this library. Now, it turned out that this was a social engineering attack, and it turned out that this was a malicious user, and what this user did is exactly what we talked about: they injected a dependency into the web, which was their own malicious JavaScript that did a bunch of wild, crazy stuff. But really, boiling it down, what ended up happening was it was targeting a specific application that managed Bitcoin wallets, and so whenever users used that application it would download

the event-stream library, therefore it would download the extra malicious package, and then that malicious package would steal all the credentials to harvest all the coins from that Bitcoin wallet, right? So these are attacks that are happening in the wild right now, okay? And when it comes to Python, although we haven't seen this specific attack, there have absolutely been malicious packages that people are clearing out from PyPI all the time. So these are things that are happening in the world. Okay, so now that I've spoken about these things, what can we do about this? So there's a couple of tools that can help us to mitigate these sorts of attacks and try and help

protect us from this. So the first ones are pyup's Safety and JFrog Xray. Safety is a free product, Xray is a paid product. What these do is basically, if you're a developer, they go and map all of your dependencies and they give you a really useful graph that shows you all the different packages that your library depends on. So apart from being pretty eye-opening, because you'll see quite a large number of packages in there, what they also do is check that against a database of known bad packages. Now, typically these aren't necessarily malicious packages; what these tools are mostly looking

for is if there was a library that has a vulnerability in it, and that vulnerability got patched and then they bumped to a new version. Well, if you're still depending on that old unpatched version, these tools are going to tell you about it. So this definitely makes it useful to see if you're relying on these bad packages, but their focus is all on this sort of already known bad and known fixed packages. This doesn't do any sort of code analysis to look for any malicious code. Okay, so then on the other side you've got PyCQA's Bandit, okay, and this is looking at code
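To give a feel for what syntax-tree-based scanning of the kind Bandit performs looks like, here is a very small sketch. It is not Bandit's implementation or rule set; the `SUSPICIOUS_CALLS` set and the `flag_calls` helper are assumptions made for illustration. It parses source text without running it and flags direct calls to a few dynamic-execution builtins.

```python
import ast

# Hypothetical rule set: builtins that execute or load code dynamically.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def flag_calls(source):
    """Return (line, name) for each direct call to a name in SUSPICIOUS_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Line 2 calls exec directly; line 3 reaches eval indirectly via getattr.
sample = "x = 1\nexec(input())\ngetattr(__builtins__, 'ev' + 'al')('2+2')\n"

print(flag_calls(sample))
```

The direct `exec` call on line 2 is flagged, while the indirect route to `eval` on line 3 is not. That gap is the general weakness of static checks in a dynamic language when the author of the code is actively trying to hide something.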

and so what this actually does is it takes the code, parses it out, breaks it all up into the abstract syntax tree, and looks for odd and bad code. But again, the point of this tool is it's not looking for badness, it's more looking for whether you as a developer have done something wrong, okay? So if you as a developer have written some code that does something unsafe, that's what this tool is designed for; it's designed to protect you in those situations. But what hasn't yet been tested, and it'd be interesting to find out, is what happens if a malicious user wanted to try and

find ways around this. You know, Python being a dynamic language, it's not designed to be protecting against an actual malicious user; it's just designed to find things where you've accidentally done something wrong. And then on top of this, lots and lots of packages get flagged by this tool, okay? I downloaded, I think it was, the top thousand packages on PyPI, and the top 1000 all threw "high" warnings, which was their second highest level of: hey, this package is doing something it should, excuse me, it shouldn't be. So even though this tool is gonna be really, really useful, you're gonna have an absolute mountain of false positives if

you're attempting to use this to look at code that isn't directly yours. So on top of all of them, really the best thing in these situations is to do basically a full source code audit, right, of all your packages. But that's really complex; that's a lot of time, and if you're a small team there might be thousands and thousands of lines of Python code there that you might have to review to check for anything dodgy. You might not have time for that. Or what if you're a student, right? You're a student that

might not necessarily know what bad and malicious code looks like, so what do you do in these sorts of situations? Also, what if you're just solving a problem once? You know, you're not making a production system. A bunch of times when we're doing CTF challenges you come across a puzzle where there's a package out there that has a solution, but once you solve that puzzle and you get that flag then you move on with your life; you don't need that again. So are you gonna do full source code reviews for something that you might only need for 15 minutes, if that,

right? And then even if you are looking at a production system, well, what if you're comparing a number of different products? Are you gonna try and do a code review on every single one before you determine if one of them is worth going with or not, right? So in these sorts of situations, the tools I've covered are useful to have but they definitely have their flaws in terms of those targeted malicious attacks, and the source code audit, you know, it's unlikely to happen. So if only there was some sort of way where we could take the code and just chuck it in a bubble, right, where we can just chuck

it in some contained environment and we can let everything, from installation to running the code, happen in that bubble, and all of it is segregated from the rest of our machine. Okay, so this is what containers are, what Docker containers are. The sort of problem up until now with Docker containers is, unless you really know how all this stuff works, it can be a bit of a pain to spin them up and test them, particularly if you're just wanting to run a project once, or you're really just trying to focus on your app development and not on systems administration. So I would like to introduce to you a

tool that I've been working on recently called dockenv, okay? What this tool does is it makes it really easy to take Python code and run it inside of containers and protect your system, while making it really, really simple and easy to run the code and actually get the information you want to get. So what dockenv does is it takes everything, from the install of the packages to the running of the code, and it builds segregated Docker containers, runs them, and then it just spits out the text output, okay? So really what happens is you can very

simply go: hey dockenv, create me an environment and install this package that my friend talked about recently, I think it's called rongorongo, one of them. So give me an environment and install ronger inside of it, okay? And dockenv is gonna go and it's gonna build you this environment that's segregated from your machine, and it's gonna install that package inside of that container. Then you can just go: hey dockenv, run this script over here, and it's gonna take that script, throw that inside the container, it's gonna run all of that and then just spit out anything that comes out. And if it needs any input from you, you know, any user options and

that sort of stuff, then it'll ask you. And so dockenv will create this bubble around all of this Python code and will completely segregate your system, and the Python code is not even necessarily gonna know that it's running inside of this system, okay? It's gonna see an operating system, it's gonna see a user, but it's not gonna see your user, it's not gonna see your passwords, it's not gonna see your sensitive information. It's only gonna have access to what it can see inside of its own bubble. So how does this protect us against all of the attacks that we were talking about, right? So it prevents the code running on your

machine; it's gonna run inside of that bubble, particularly at install time. On top of this, the containers that everything runs in, as soon as we're no longer running that script, all get shut down and blown away. So if it attempts to install any sort of persistent backdoor, well, that stuff gets blown away and no longer exists, and it certainly isn't running on your machine, okay? And then also, by default we actually completely block the ability for it to write any files inside of its own bubble, or even connect to the internet. Now, if you require them, the tool can let you do that, but so by default

the package can't do anything outside of what it is executing and has no clue what is happening outside of it, okay? And so if we imagine our secrets.txt is sitting on our system, there's no way that it can even read that file, let alone even know if that file exists. So I've made this tool and I really think it's useful, as I was talking about, for those times where you've just got a quick problem you want to solve, or you're just doing some testing. I wouldn't necessarily recommend this for a production environment that requires production solutions, because the problem with this sort of solution is if

you are putting this in a production environment, well, production data is customer data, and customer data is sensitive data, right? So at some point the tool in production has access to information that is potentially sensitive. But just for testing and for development, where you're not actually giving it any sensitive information, I really think this tool's gonna be really useful for those sorts of situations. Now, when it comes to production, nothing is going to beat that full source code audit, but for anything less than that this tool, along with the others, is going to be really useful. Okay, so what have we talked about? So we've talked about the software supply chain and

we've talked about how really complex this can be, and we've walked through the hypothetical and shown the real attack that can occur on this without any user really knowing about it. And we've really gone to show that without fully understanding what all the code depends on, you really leave yourself vulnerable to these sorts of attacks. We've also covered that there's a number of tools out there now, including dockenv, but also Safety, Xray and Bandit. We've talked about how there are tools out there that can help developers and can help users to understand exactly what code is being run, but they still don't necessarily help against

these sorts of targeted attacks. So, okay, there we are. I'm putting dockenv up on GitHub; it's already up there if anyone's keen. Besides that, does anyone have any questions? I'm short on time.
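The container defaults described in the talk (no network, no file writes, container discarded afterwards) map onto standard `docker run` options. The sketch below is not how the speaker's tool is implemented; it only assembles an equivalent command line, assuming Docker is installed and using the public `python:3` image. The `sandbox_command` helper and the `/sandbox` mount path are hypothetical.

```python
import os


def sandbox_command(script_path, image="python:3"):
    """Build a `docker run` argument list that runs one script in isolation."""
    host_path = os.path.abspath(script_path)
    return [
        "docker", "run",
        "--rm",               # delete the container as soon as the script exits
        "--network", "none",  # no network access from inside the container
        "--read-only",        # container filesystem is mounted read-only
        "-v", f"{host_path}:/sandbox/script.py:ro",  # expose only this one file
        image,
        "python", "/sandbox/script.py",
    ]


cmd = sandbox_command("untrusted_script.py")
print(" ".join(cmd))
# To actually execute it (requires Docker): subprocess.run(cmd, check=False)
```

Only the single script is mounted, read-only, so nothing else from the host (such as a `secrets.txt` on the desktop) is visible inside the container.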

yeah sure Oh

If this software is malicious and installs a back door or something in that container, would you be able to detect that fact and make some kind of indication? I mean, could you extend this to make it able to give some kind of indication whether the software was bad or not? So the question was: if the code's running inside the container, could we do some detection to tell what's going on? So at the moment, no, the tool is designed for just essentially making this quick, running the code. However, that actually is a really, really great idea, and that is something I'm definitely gonna look into, because in theory, yes, it should be quite simple to do. So what you

could easily do is we essentially could just chuck something underneath Python that just keeps track of any files that get written and any network calls. That should be something that's actually fairly straightforward to do, so I will actually think about doing that. So I think we might have to leave it there. If you've got any more questions for Pat, grab him after the talk and ask him there, but let's thank Pat for a great talk. [Applause] Yeah.
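One possible way to "chuck something underneath Python" that tracks file writes and network calls, as floated in the answer above, is the interpreter's audit hook mechanism (`sys.addaudithook`, available from Python 3.8). This is a sketch of that idea and not a feature of the speaker's tool; the `observed` list and `audit` function are hypothetical names.

```python
import os
import sys
import tempfile

observed = []


def audit(event, args):
    """Record file opens that can write, and outgoing socket connections."""
    if event == "open":
        path, mode = args[0], args[1]
        # mode is None for low-level os.open calls; only text modes are checked.
        if isinstance(mode, str) and any(flag in mode for flag in "wax+"):
            observed.append(("write", path))
    elif event == "socket.connect":
        # args is (socket object, address)
        observed.append(("connect", args[1]))


sys.addaudithook(audit)

# Stand-in for untrusted code: it writes a single file.
target = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(target, "w") as handle:
    handle.write("hello")

print(observed)
```

Audit hooks run inside the same process as the code being watched and cannot see what native extensions do directly, so this is better suited to logging behaviour inside an already isolated container than to acting as a security boundary.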