
Scaling the Security Researcher to Eliminate OSS Vulnerabilities

BSides CT · 2023 · 1:00:18 · 344 views · Published 2023-10
About this talk
Jonathan Leitschuh presents a practical framework for fixing security vulnerabilities across thousands of open-source projects at scale through automated bulk pull request generation. Drawing on his work as the first Dan Kaminsky Fellow, he demonstrates how to detect vulnerabilities like Zip Slip and HTTP dependency resolution, generate actionable fixes using tools like OpenRewrite and Moderne, and work within GitHub's rate limits to deliver patches across the Java ecosystem.
Original YouTube description
Keynote talk for 2023. Discusses an innovative solution for addressing security vulnerabilities in Open Source Software (OSS) projects at scale: automated bulk pull request generation. It emphasizes the challenge of efficiently triaging and fixing widespread vulnerabilities, showcases practical applications, and highlights the importance of providing actionable fixes to volunteer maintainers to make a real impact on OSS security. Jonathan's BSides CT 2019 Zoom Vulnerability talk - https://www.youtube.com/watch?v=ypU5IPJKlXg
Transcript [en]

Hello everybody, hi, good morning. Welcome, and thank you all for being here. It is actually not as early, but yes, a little bit of fun. So hi, thank you all for being here. My first journey of public speaking about security happened at a BSides, actually BSides Connecticut, back in 2019. I gave a talk about a vulnerability in Zoom that I found back at that time, and had a lot of fun with it. I took that to ShmooCon and was going to go other places, and then promptly COVID hit us, so conferences ended. And this was a talk

that I've come up with as a result of, and I'll talk about it, the Dan Kaminsky Fellowship. This is a talk I've given at Black Hat, DEF CON, and BSides Las Vegas, and taken all over the world, to Japan and Italy, and now I'm happy to present it to you all here today. So welcome to Scaling the Security Researcher to Eliminate Open Source Security Vulnerabilities Once and for All. Is there a way that I can get rid of the bottom bar, or is it fixed? If the answer is it's there, that's fine. I'm going to get started while you fidget with that. You click that... is that not... Zoom's not coming through on my

computer. Don't worry about it, it's okay. All right, anyways. Hi, who am I? My name is Jonathan Leitschuh. I'm a senior software security researcher for the Open Source Security Foundation, which is a sub-foundation of the Linux Foundation. I work on a project called Project Alpha-Omega. I was the first ever Dan Kaminsky Fellow. I'm a GitHub Star and a GitHub Security Ambassador. You can find me on Twitter at JLLeitschuh and on GitHub under the same handle. A little bit of a disclaimer: this talk does cover stuff that is proprietary, but everything is available for free for open source, so you can use all of this stuff for free

for open source and for your open source security research. Some of the stuff you might need a license for in a corporate setting. Okay, that jumped ahead. So this work was supported by Project Alpha-Omega and the Open Source Security Foundation, which I currently work for. It was also supported by the Dan Kaminsky Fellowship at HUMAN Security; I was lucky enough to become the first ever Dan Kaminsky Fellow. For those of you who don't know Dan (I sadly never got the opportunity to meet him), he was very well known in the security community. He was best known as the hero of the internet back in 2008 for fixing,

finding and fixing a widespread security vulnerability in DNS back in 2008. After Dan passed, the Dan Kaminsky Fellowship was created to celebrate Dan's memory and legacy by funding open source work that makes the world a better and more secure place, and I was fortunate enough to have the opportunity to take on that fellowship in its first year. Let's jump straight into the spoilers. I generated a bunch of pull requests to fix a security vulnerability called Zip Slip: 164 pull requests to fix Zip Slip across the Java ecosystem. But what is Zip Slip, and how did I do this? That is what the point of this entire

talk is. So, the journey. This entire journey started with a simple security vulnerability: the use of HTTP to resolve dependencies in my company's Gradle build. You might ask me why this is important. Well, if you're using HTTP instead of HTTPS to resolve the dependencies in your build, somebody can attacker-in-the-middle the connection, inject code into the JAR file that you're downloading, and then run code on your machine. This vulnerability doesn't just exist in Gradle build files; it can also exist in Maven. This is where it appears if you're looking at a Maven XML configuration file. This is jumping ahead faster

than I wanted to. You can see that this also appears in the upload, so this is usually with credentials, and it's going to your artifact server. Oh, quiet, Siri. So this is usually with credentials, and it's uploading stuff into your artifact server as well. This vulnerability is everywhere. It impacted organizations like Spring, Pivotal, Red Hat, the Apache Foundation, Kotlin, JetBrains, Jenkins, Gradle, Groovy, Elasticsearch, and the Eclipse Foundation. It also impacted Oracle, the NSA, LinkedIn, and Stripe: open source projects from organizations all over the world. And I reached out. Think of PyPI, or of npm for the JavaScript ecosystem; for the Java ecosystem, Maven

Central is that, and Maven Central is run by Sonatype. They looked at their traffic and saw that 25% of it in 2019 was using HTTP instead of HTTPS to resolve dependencies. And this is a major, prominent artifact server in the Java ecosystem. So how do we fix this? Well, I pushed forward an initiative that on or around January 15, 2020, all of the major artifact servers (unlike other languages, Java actually has a couple of major artifact servers at play) would stop supporting HTTP in favor of HTTPS only. And they did it. There's a bunch of

blog posts announcing that they were going to do this, giving people warnings that this was going to happen. However, when I reached out to Maven Central again, they said that in January 2020, just before we were about to pull the plug, they had only seen a 5% drop in the amount of traffic that was using HTTP instead of HTTPS. We were still at 20%. So you can imagine what might have happened on January 15th, 2020: broken software, lots of posts on Stack Overflow, people asking, "Why is my build suddenly broken?" And I'm like, that's because of me. But we stopped the bleeding. But what about the other repositories?
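To make the insecure pattern concrete: a Maven repository declared with an `http://` URL can be found and upgraded with a targeted text rewrite that leaves the rest of the file untouched. The following is a minimal sketch only. `HttpsRepoRewriter` and its regular expression are illustrative assumptions (not the expression used in the talk, which was part of a Python tool), it treats every `<url>` element alike, and it presumes the host actually serves HTTPS.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: upgrades http:// URLs inside <url> elements of POM text. */
final class HttpsRepoRewriter {

    // Captures the opening tag, the host/path, and the closing tag separately so
    // that the original tags and any whitespace around the URL are re-emitted as-is.
    private static final Pattern INSECURE_URL =
            Pattern.compile("(<url>\\s*)http://([^<\\s]+)(\\s*</url>)");

    static String rewrite(String pomText) {
        Matcher m = INSECURE_URL.matcher(pomText);
        // Only the scheme changes; indentation and line breaks are untouched.
        return m.replaceAll("$1https://$2$3");
    }

    public static void main(String[] args) {
        String pom = String.join("\n",
                "<repositories>",
                "\t<repository>",
                "\t\t<id>example</id>",
                "\t\t<url>http://repo.example.org/maven2</url>",
                "\t</repository>",
                "</repositories>");
        System.out.println(rewrite(pom));
    }
}
```

Because the rewrite works on raw text, tabs and line breaks in the input survive unchanged, which is the property a bulk fix needs if maintainers are to accept it.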

These (Maven Central, JCenter, Spring, the Gradle Plugin Portal) are only the predominantly used artifact servers in the industry. There are actually a lot more. If you look at any open source project in the JVM ecosystem, you'll see other artifact servers for random companies that host artifacts they don't publish to Central. So the Java ecosystem's build infrastructure is actually way more distributed, coming from a lot of different places, than other languages, which are all centralized on a single central artifact server. How do we fix the rest? I said, well, let's just go fix the problem. Let's generate pull requests to just give the maintainers the fixes for these

vulnerabilities. And so the question was, how do we do this? Well, the first thing: CodeQL is a simple query language, owned by GitHub, that lets you write queries to find security vulnerabilities across open source projects. Using this little simple CodeQL query, I was able to detect this vulnerability across the open source ecosystem. CodeQL scans hundreds of thousands of open source projects, and you can run queries against those projects to find vulnerabilities. And for this very simple query (it's not very much code, plus a little bit of associated documentation), GitHub awarded me a $2,300 bounty for this little chunk of

code. From that CodeQL query I was able to get a list of vulnerable projects, and I wrote this pull request generator. The first version ran on my laptop: a Python-based wrapper over GitHub's hub CLI. It had one nasty regular expression and a lot of logic for bouncing off the GitHub rate limit. This is the underlying engine, and you'll see there's this regular expression here. You say, okay, this regular expression is clearly fixing XML. Why, Jonathan, are you using regular expressions to fix XML? And the answer is really annoying: all

parsers, almost all industry-standard XML parsers: if you feed that file in and then modify the XML, when you dump it back out it comes out in the format the XML parser dumps it in, not in the format that it came in. These parsers don't preserve the formatting. So if you want to make targeted changes in files that have tabs, spaces, braces on new lines, any of these formatting choices, you have to use something like regular expressions. Really annoying. The problem with regular expressions is that when you use them to solve your problems, now you have two problems: the original problem you had and the regular

expression. But it worked. I generated a ton of pull requests to fix this security vulnerability across open source. You can see it's very targeted: no matter what the formatting was, I could fix it with this regular expression. I generated 1,596 pull requests to fix this vulnerability across the open source ecosystem. I did this back in 2020, and as of today I have about a 40% merge rate. For this work, GitHub awarded me a $44,000 bounty from the GitHub Security Lab bug bounty program. So I got hooked on this idea of bulk pull request generation and fixing security vulnerabilities at scale, and

submitted that as an idea to the Dan Kaminsky Fellowship, and I was lucky enough to have that accepted as the project that I could work on for a year, funded. This is my GitHub contribution graph for 2020, and it actually shows those two massive spikes from generating lots of pull requests to fix security vulnerabilities at scale. So I have a problem. I have ADHD, and I don't consider my ADHD to be a problem, but I do love chasing squirrels. I will do that by looking through advisory databases, github.com/advisories or something like that, and I'll be like, huh, that's an interesting security vulnerability, I wonder where else that is. And the answer is

everywhere. You can take a very simple chunk of code that you see is vulnerable and wonder where else it is, and you'll just use a CodeQL query or something like that. So the problem that I have is I'm finding too many security vulnerabilities. This is an example: the result of a CodeQL query search for a vulnerability called Zip Slip. Again, we will get to what Zip Slip is. This was on a site called lgtm.com, which used to be owned by GitHub; they've since turned it off. You could scroll through pages and pages of open source projects with this exact vulnerability, all across open source, and

how do you deal with a problem of that scale, where we just have the results? You can't reasonably report this to maintainers everywhere. I'm finding too many security vulnerabilities; I need automation. So we need automated, accurate transformation at mass scale. This is where OpenRewrite comes in. OpenRewrite is an open source project, predominantly written in Java, and it's available under Apache licensing. You can use it in your own projects and write tests. So what is OpenRewrite? Well, let's start with what an abstract syntax tree is. An abstract syntax tree is what your compiler works on when you're compiling source code. The

thing is that with the abstract syntax tree, you'll notice that when you take this chunk of code, the compiler turns it into that tree structure in order to work on it. But if you were to dump that tree back out into source code, you'll notice all our whitespace is gone, all of our tabs are gone, all of our comments are gone. That's because the compiler doesn't care about that extra information, so it throws it away. So what we need is a format-preserving abstract syntax tree that turns the code into this tree structure but still preserves the whitespace, tabs, comments, all of that

stuff. One does not simply reformat the entire source file. When you're fixing security vulnerabilities at scale, you get maintainers being like, "Thank you so much for fixing this, but your code doesn't match my style, go change it all," and that does not work. So you need to be able to target specifically the lines you want to fix. On top of that, you can generate new code that matches the surrounding formatting. So if that project uses braces on a new line, tabs, whatever it is, you can generate new code matching the surrounding project's standard formatting, because OpenRewrite, as part of parsing the source code, figures out what your project's general

formatting is and will then try to match that, no matter what it is. Additionally, it's fully type attributed. Is that Log4j, SLF4J, Logback? What logging framework a project is using might be important to know to fix a security vulnerability. I can't imagine there ever being a critical security vulnerability in a logging framework; that would never happen in our industry, no. And because it's fully type attributed, it's this very, very dense graph. Actually, there are about 6,000 nodes missing from that left graph, because it would be too dense and you would just see fuzz. And then on top of that, when you're trying to insert new code into an

abstract syntax tree, these ASTs are very complex, so you want to be able to place that new code into your source code in a very targeted way. Let's say this is the fix for Zip Slip and we want to insert it into our source code (again, we will get to this later). You can put the code that you want to insert into a string, and the templating engine will generate the AST for it and insert it into the code. It also comes with a coordinate system for stating exactly where in the AST you want to place that new chunk of code that you

should. And so that lets us take this vulnerable code and insert this fix into the source code, matching the formatting, and fix the security vulnerability. So what's possible now? What other vulnerabilities can we fix with the unlock that OpenRewrite provides? I used OpenRewrite to fix three security vulnerabilities across open source: temporary directory hijacking, partial path traversal, and Zip Slip. Let's talk about those three, starting with temporary directory hijacking. So what is temporary directory hijacking? Temporary directories on Unix-like systems are shared between all users. I was surprised when I found that out. I was like, wait, each user doesn't get their own temporary directory? No.
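Because that directory is shared, how a program creates a private scratch directory inside it matters. As a minimal sketch (the class and method names are hypothetical), the older create-file, delete, mkdir idiom can be set beside `Files.createTempDirectory`, which has been in the standard library since Java 7. The permission remark in the comments reflects OpenJDK behaviour on POSIX file systems and is an assumption for other platforms.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo contrasting two ways of creating a temporary directory. */
final class TempDirDemo {

    // Legacy idiom: create a temp file, delete it, then reuse its name for a directory.
    // Between delete() and mkdir() another local user can claim the same name.
    static File legacyTempDir() {
        try {
            File f = File.createTempFile("demo", "");
            if (!f.delete()) {
                throw new IOException("could not delete " + f);
            }
            // mkdir() reports failure by returning false, so the result must be checked.
            if (!f.mkdir()) {
                throw new IOException("could not create " + f);
            }
            // Even when this succeeds, the directory has default permissions.
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Java 7+ API: one call creates the directory; on POSIX file systems OpenJDK
    // restricts it to the owner when no attributes are passed.
    static Path safeTempDir() {
        try {
            return Files.createTempDirectory("demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("legacy: " + legacyTempDir());
        System.out.println("safe:   " + safeTempDir());
    }
}
```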

there's one temporary directory for Unix-like systems. So when you're creating a temporary directory like this, and this was actually a very common pattern in Java, what is going on here? Well, someone is trying to create a temporary directory instead of a temporary file. The problem is that prior to Java 1.7 there did not exist a way to create a temporary directory in the Java standard library. So people would create a temp file, which gave you a randomly named file (with a pseudorandom number generator or CSPRNG name), call delete on it, and then call mkdir. The problem is that, yes, if you ask Stack Overflow how to create a temp

directory you'd get this answer, but unfortunately you get a security vulnerability as a result of it. So why is this vulnerable? Well, there's a race condition here. The race condition exists because mkdir returns false if it fails; it doesn't throw an exception. I need to turn off the swipe functionality, that's what's going on. Okay, stay. So this is a fix for it, but this is actually still vulnerable, to a different vulnerability: temporary directory information disclosure, because this mkdir makes that directory using the default POSIX permissions on the file system. So that directory is actually still visible; all the contents are still visible to all of

the local users. So if you put sensitive information into this directory, other local users can see the contents of that file. So this is the real fix. This is a Java API introduced in 1.7, a very old API, and yet a lot of people have not rewritten their code to fix this vulnerability, but there is a new way to do this. I got a bunch of CVEs for this vulnerability across the open source ecosystem, and then I generated pull requests. I actually generated 64 pull requests to fix this vulnerability at scale, and this is what it looks like. You can see these very simple little changes that we can make to just fix the security

vulnerability. In addition, we can do more complex fixes. We don't need those file if-checks anymore; we can get rid of those, and this is what OpenRewrite allows us to do. Vulnerability number two: partial path traversal. So what are partial path traversal vulnerabilities? Let's assume that you have two users on a file system, /user/sam and /user/samantha, and you want to sandbox some bit of logic to only be able to access /user/sam. Partial path traversal allows an attacker to access a sibling directory with the same prefix. So again, /user/sam and /user/samantha: because /user/samantha starts with /user/sam, you can bypass this.
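The prefix confusion described above can be reproduced in a few lines. This is a sketch under stated assumptions: the class and method names are hypothetical, and `Path.normalize()` stands in for `getCanonicalPath()` so the example does not depend on what exists on disk.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical demo of a string-prefix sandbox check versus a Path-based one. */
final class PartialPathTraversalDemo {

    // Vulnerable: compares normalized paths as plain strings.
    static boolean stringCheck(String baseDir, String requested) {
        // Normalizing drops the trailing separator from the base directory.
        String base = Paths.get(baseDir).normalize().toString();
        String target = Paths.get(requested).normalize().toString();
        return target.startsWith(base);
    }

    // Safer: Path.startsWith compares whole name elements, not characters.
    static boolean pathCheck(String baseDir, String requested) {
        Path base = Paths.get(baseDir).normalize();
        Path target = Paths.get(requested).normalize();
        return target.startsWith(base);
    }

    public static void main(String[] args) {
        String base = "/user/sam/";
        String attack = "/user/sam/../samantha/baz";
        System.out.println(stringCheck(base, attack)); // true: the sandbox is bypassed
        System.out.println(pathCheck(base, attack));   // false: sibling directory rejected
    }
}
```

The string check accepts the sibling directory only because `samantha` happens to begin with `sam`; comparing path elements removes that coincidence.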

What does this vulnerability look like? Well, this is a very common pattern you'll see across software: a getCanonicalPath startsWith check, and this startsWith check is using a string comparison. getCanonicalPath, which you'll see called twice here, is used to normalize the file path, so it gets rid of the dot-dot-slash sequences, so that if an attacker is trying to use a path traversal payload, you're comparing the normalized path. So you take the file /user/sam/ and you call getCanonicalPath on it. This actually returns a string, and you'll notice that string is /user/sam. What's the problem with that?

We're missing that trailing slash that we had when we started this whole thing. So then what happens is you have this user-supplied value that comes in. For /user/sam we've had that trailing slash dropped off, and now we've got an attacker coming in with samantha/baz, and that gets normalized here. Now we've got /user/samantha/baz starts with /user/sam, and that exception does not get thrown, so this logic is very easy to bypass. So what is the fix for this? One of the fixes is to put this file separator character back in. Thank you,

Siri. The better fix is to use the Path object, which was introduced in Java 1.7 and lets you compare these things safely. So this is a good thing; this is what we actually want to see. So how do we find this vulnerability? Well, we're looking for this String startsWith call, and we're looking for the qualifier to be this getCanonicalPath call and the argument to be this path, right? But we don't want to flag places where this vulnerability is not present because somebody has put this code in where they've actually fixed the vulnerability. We don't want to fix those, because again, maintainers will be really pissed if you fix a vulnerability that's

not a vulnerability. They'll just be like, "Why? Stop bothering me, go away." But it can't be that easy, right? The problem is software people write code in a lot of different ways. So what if they pull that qualifier into a variable, or they pull that argument into a variable? Or what if they have the fix, but it's in a variable? How do we detect this? How do we know if it's vulnerable or not? This is where we need data flow analysis. Data flow analysis allows us to track where variables have been assigned throughout our code base, where things are coming from, and what

they'll be. Data flow can also let us do more complex things: if a value flows through a ternary, or through other transformations that may occur on those objects, you can determine that, and know, or try to determine, what the value would be at runtime. Data flow analysis allows us to uncover hard-to-find vulnerabilities and helps us prevent false positives when we're detecting and fixing these vulnerabilities. By show of hands, has anybody here written any CodeQL before? Hey, okay. So have you worked with their data flow API a little bit? Okay. If you are familiar with

CodeQL's data flow API, I have made OpenRewrite's data flow API try to match it, because it helped my sanity when trying to map my brain's model of data flow and control flow onto OpenRewrite. So if you are familiar with CodeQL, you can translate your knowledge between the two fairly easily. And when you have this, you can use it to fix things: you can see that we've tracked that there is a vulnerability there for that startsWith call. All we have is that String startsWith, and we're able to determine that it's vulnerable and fix it appropriately. And as with any good story, this has a bit of an example case of

vulnerability disclosure drama. I found this vulnerability in the AWS Java SDK. You can use the AWS Java SDK to download the entire contents of an S3 bucket onto your local system, and they have this bit of logic where they're trying to compare this key, which is coming from the bucket, with the directory that it's being downloaded into, to make sure that this key is not a path traversal payload attempting to escape this root. You'll see that this leavesRoot method is used as a guard that will say: cannot download key, its relative path resolves outside of the parent

directory. And so that was a vulnerability; that's partial path traversal. And as with any good story, this vulnerability disclosure had a little bit of disclosure drama. This is an email conversation I had with Amazon. Amazon said, "Hey, we'd like to award you a bug bounty; however, we need you to sign an NDA." And I said, "I don't normally agree to NDAs. Can I read it first before potentially agreeing?" And Amazon came out with this great line: we're unable to share the bug bounty program NDA since it and other contract documents are considered sensitive by the legal

team. I still have not gotten that money. But yes, they did apologize for that, and they said that's not what they intended, and I'm like, okay, sure, but I still haven't gotten the money. Anyways, vulnerability number three is Zip Slip. Show of hands, who here is familiar with path traversal payloads? I know we talked about partial path traversal, but are you familiar with path traversal payloads? Okay. With path traversal payloads, an attacker attempts to supply data like ../../ and then some file path as an attacker-supplied value, and that gets appended to your intended directory, and then

that resolves, and you are able to manipulate the file system in a way that you did not intend. So it's similar to partial path traversal, but now full path traversal. So, Zip Slip: zip files fundamentally are maps. The key is the destination directory where you want to put the file, and the value is the compressed contents of the file. You iterate through those key-value pairs, usually when you unzip or unpack. So what does this vulnerability look like? Well, it looks like this. That's a lot of code; let's narrow it down a little bit. We have zip entries, and we're iterating over those. This is the important bit.
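Since each entry name acts as a destination path, an unpacking loop has to confirm that the resolved target stays inside the destination directory before writing anything. The following is a minimal sketch of one such guard, not the fix shown on the talk's slides; `ZipSlipGuard`, `staysInside`, and `unzip` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper illustrating one possible Zip Slip guard. */
final class ZipSlipGuard {

    // True when the entry name, resolved against destDir, stays inside destDir.
    static boolean staysInside(Path destDir, String entryName) {
        Path dest = destDir.toAbsolutePath().normalize();
        Path target = dest.resolve(entryName).normalize();
        // Path.startsWith compares whole elements, so "out-evil" does not match "out".
        return target.startsWith(dest);
    }

    // Unpacks a zip stream, rejecting any entry whose name would escape destDir.
    static void unzip(InputStream in, Path destDir) throws IOException {
        Path dest = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!staysInside(dest, entry.getName())) {
                    throw new IOException("Bad zip entry: " + entry.getName());
                }
                Path target = dest.resolve(entry.getName()).normalize();
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Path parent = target.getParent();
                    if (parent != null) {
                        Files.createDirectories(parent);
                    }
                    // Copies only the current entry's bytes; the stream stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Using `Path.startsWith` rather than a string prefix means the guard also covers the partial path traversal case discussed earlier.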

We've got this untrusted user value, e.getName(), which is that key, which you can't trust because it's potentially being supplied by a hacker, and you've got this FileOutputStream that's being created from that untrusted input. So we're looking for the flow from this name to this FileOutputStream; that's the data flow we're looking for. Zip Slip is complicated, and the reason is this: this is the vulnerable code, and this is a valid fix. The problem with Zip Slip is that while this is a valid fix for this vulnerability, so is this. Both of these things adequately protect against

this vulnerability, and there can be more complicated things like this. So how do we determine whether this is vulnerable or not? We need control flow analysis. Control flow analysis allows us to differentiate between these two chunks of code. And what is control flow analysis? Control flow analysis is a graph, and you compute it by breaking your code down into two structures: basic blocks, which are sets of contiguous instructions that occur without a jump in the program, and condition nodes, where the program will branch. Using that, we can take this chunk of code and compute the control flow graph for this

chunk of code and see that there's this guard here that prevents us from reaching this code. It'll always go to this IOException and won't reach this code if this guard is present. Because of that, we can say with our logic that this is not vulnerable code. Likewise, we can determine when there is vulnerable code, and when there is vulnerable code we do need to fix it. We can differentiate between those two things and only fix the vulnerabilities in places where the vulnerability is indeed present. And we can do even more complicated things because of the power of OpenRewrite: we can clean up the code a bit more and fix these

vulnerabilities. So let's talk about pull request generation. We've talked about detecting and fixing the vulnerabilities with OpenRewrite; how do we actually generate pull requests and give these things to the maintainers? You've got a security vulnerability; everybody gets a pull request. So let's talk about the problems with pull request generation. How fast can we generate pull requests? Well, when you're trying to generate a pull request against GitHub's API, you've got three different things: file I/O, Git operations, and GitHub API calls. You've got to check out the repository locally, you've got to create the branch and commit the change, you've got to fork the repository on GitHub, you've got to rename the repository on

GitHub. You might be like, why do I need to rename the repository on GitHub? Well, if you're forking thousands of repositories across GitHub, you're very likely going to encounter projects with the same name. Let's say I've got the project rewrite from one organization and the project rewrite from another organization. When I fork both of those into my account, GitHub will say: great, you got the first one, but you already have a repository with that name, so you can't fork the second. So you've got to rename every single fork to another name. Then you've got to push the changes and create the pull request on GitHub. You'll notice there are three API calls to GitHub. Thankfully,

this has recently been merged into one API call, so we've got two API calls. However, GitHub asks that you wait at least one second between every write request. So if you're generating thousands of pull requests and you've got at least two API calls per pull request, you've got to wait at least a second for each. On top of that, GitHub has another, secondary rate limit that's just like, "You're doing stuff too fast, stop." That one's the really annoying one to work with, because it doesn't really give you a time for how long you need to wait. So you just kind of have to wait, progressively back off, and hope that you've got the next

request right before, you know, they're happening. So yeah, if GitHub could stop rate limiting their API so aggressively, it would make my life easier, and I would be able to do this faster. So we made it this far: we've detected the vulnerability, we've detected the style, we've fixed the code, and we've bypassed the rate limit, mostly by just waiting more. How do we do this for all the repositories? This is where Moderne comes in. Moderne is free for open source projects. It has over 31,000 repositories indexed, not just Java but also some other languages, like Kotlin and Python. You can run OpenRewrite transformations at scale across these 31,000 open source projects.
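The pacing described above (at least one second between write requests, then backing off when the secondary limit trips without saying how long to wait) can be sketched as a small retry loop. This is only an illustration: `PacedWriter`, the `BooleanSupplier` contract (true means accepted, false means rate limited), the doubling strategy, and the 60-second cap are all assumptions, and no real GitHub client is involved.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical pacing helper for write requests against a rate-limited API. */
final class PacedWriter {

    static final long MIN_GAP_MS = 1_000;      // one-second gap between writes, per the talk
    static final long MAX_BACKOFF_MS = 60_000; // assumed upper bound on a single wait

    /**
     * Sends one write request, doubling the wait each time it is rejected.
     * The sleeper is injected so callers decide how to wait.
     * Returns the number of attempts that were needed.
     */
    static int writeWithBackoff(BooleanSupplier request, LongConsumer sleeper, int maxAttempts) {
        long delay = MIN_GAP_MS;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            sleeper.accept(delay); // always pause before a write request
            if (request.getAsBoolean()) {
                return attempt;
            }
            // Rejected with no hint about how long to wait: back off and retry.
            delay = Math.min(delay * 2, MAX_BACKOFF_MS);
        }
        throw new IllegalStateException("still rate limited after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that is only accepted on the third try.
        int attempts = writeWithBackoff(
                () -> ++calls[0] >= 3,
                ms -> System.out.println("sleep " + ms + " ms"),
                5);
        System.out.println("accepted after " + attempts + " attempts");
    }
}
```

In real use the sleeper would call `Thread.sleep`, and the request would wrap whatever client performs the fork, push, or pull request creation.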

You can generate and update pull requests using their platform, and they have over 800 recipes. Recipes include complete framework migrations: you can migrate from Spring 1 to Spring 2, or JUnit 4 to JUnit 5. You might not think about your testing framework, JUnit, which is the testing framework for Java, as a security concern. But currently you can't run the latest version of Spring, which I presume most of you are familiar with for its security context, and making sure that's always up to date; you can't write code using the latest version of Spring without updating your testing framework. So the testing framework that you're using can be a blocker to, you know,

improving your security posture across your organization. And then on their public SaaS, public.moderne.io, you can generate pull requests, and that's actually what I did. You can see, there we go, it's started playing, great. So I have a set of results that I've generated to fix the security vulnerability. I can say I want to create the pull requests for it: set the branch name, set the commit message, set the commit title, set the organization that I'm going to create the pull requests from, set the pull request title, give it my GPG key, and suddenly I'm generating pull requests as me from their SaaS, using my GPG key, so all the commits have been

signed. Thousands of repositories, right there.

There we go. But there are more than 31,000 repositories in the world, right? At the time when I gave this talk at Black Hat, they only supported 7,000 repositories on Moderne; now they have 31,000. This is where CodeQL comes in. CodeQL scans over 100,000 open source projects, and you can run your queries at scale across them. They have 35,000 Java projects that they have indexed and are aware of, which you can run your queries against. And so once you have a CodeQL query that you've written that finds the security vulnerability at scale across open source, you can take that list of results and give

that back to Moderne and say, hey, please become aware of these repositories. They will add those to their CSV file list and go index them for you, and then you can run your recipes against those projects. Finally, let's go generate some open pull requests. That's what I did: I generated pull requests for these vulnerabilities across open source: temporary directory hijacking, Zip Slip, partial path traversal. That first attempt at pull request generation that I did back in 2019: 40%. GitHub actually did this one; sorry, this one was not me. These are the new ones: temporary directory hijacking, 64 pull requests;

partial path traversal, 50 pull requests; and Zip Slip, 164 pull requests. The funny thing about Zip Slip is that partial path traversal often occurs in the context of Zip Slip. So even though it only shows up as 50, the Zip Slip recipe actually applies the partial path traversal fix first, if relevant. Okay, there we go. So in 2022 I generated 600-plus pull requests across open source to fix security vulnerabilities at scale. I've been a part of generating north of 5,200 pull requests to fix various security vulnerabilities across open source. One unlucky project was the recipient of all three of my

requests (zip slip, partial path traversal...), and I don't think this pull request has actually been merged, which is something that I should actually go deal with at some point very soon. And also, you see that star count up there? Yeah, that's concerning. And this is my contribution graph for 2022. So let's talk, in the time we have left, about best practices for bulk pull request generation. First thing: messaging. Maintainers are people, and I know we're throwing automation at a problem. There's this great saying out there: all software problems are people problems in disguise. And we are definitely throwing software at a people problem, right? Maintainers are very familiar and kind

of gotten comfortable with the idea of having bugs reported to them, but not necessarily security vulnerabilities, so you have to be gentle. Communicating to them that you're a real person, that you'll be receptive to their responses, that you're not just going to ghost them after you've given them the pull request, is important, because it means that you're building rapport with them and engaging with them. It's important to not just do this and then leave the ecosystem with a flood of disclosures, with maintainers that are like, "What am I doing with this?" Some finer lessons on this point. Lesson one: sign off on all commits. You

might ask, why? What is this? Sign-off is a git commit flag, and this is what it looks like: it shows up in the commit. And you say, why do I need to sign off on this? Well, there was a lawsuit. TL;DR: lawyers. Otherwise your pull request gets rejected by evil dragon bureaucrats. So just sign off on your commits; it'll make your life easier, trust me. Lesson two: be a good committer and GPG sign your commits. Then it'll show up as verified, like it does over there, showing it's coming from you, and then it won't be like Linus Torvalds, who's been impersonated on GitHub multiple times. Lesson three: SECOM. I don't have time

to dive into this, but it's a commit message format for security vulnerability fixes. So if you want an auditable, traceable sort of way of determining whether there was a security fix in this commit, consider looking into SECOM. There are risks to using your personal GitHub account. By a show of hands, anyone here familiar with GitHub's angry unicorn? Nobody here is familiar with GitHub's angry unicorn? Wow, I use GitHub a lot. This is GitHub's angry unicorn. This is the "this page is taking too long to load" page. This was my GitHub profile page for most of 2020. I broke my GitHub account. So, yeah, I don't recommend using your GitHub

account, but I also do recommend using your GitHub account; it's a double-edged sword. I recommend using your personal GitHub account because you're dealing with real people, and it's easier to engage with them as a real person than it is through an automation account. But you may break your GitHub account, so, you know. Lesson five: coordinate with GitHub. This is the security email for the GitHub Security Lab. Let them know you're going to do this, ask them to review your changes, stuff like that. They're happy to collaborate with you. And then lesson six: consider the implications. I got this

issue opened against my repository shortly after beginning this work: "Is this responsible disclosure?" I don't use the term responsible disclosure; I use the more nuanced term coordinated disclosure. But to answer both questions: no, this is full disclosure, publicly, of a security vulnerability in an open source project. By doing this work, I have been 0-daying open source maintainers. That's unfortunate, and there are new things that have come out since I did this work. GitHub now has a way of privately reporting vulnerabilities, via private vulnerability reporting. It is currently opt-in; it's not something that is automatically enabled on all GitHub repositories. And so we have this problem of how do we give security vulnerability

reports to maintainers in a responsible, ethical way? And the problem is also the scale of the problem: hundreds, thousands of security vulnerabilities that need to be reported and get into the hands of maintainers, and there's only one of me. How do we do this in a way that reasonably scales? This is a problem we've been tackling as a part of working with the Open Source Security Foundation. I've been running a working group inside the Open Source Security Foundation trying to figure out the best practices for this. But the answer is, it's hard, because public pull requests are the standard. We're trying to do something privately, and that involves coordinated disclosure, timelines, disclosures, like,

it's a complicated, fickle thing to deal with, with automation. However, I believe that we need some way to get these vulnerability fixes into the hands of maintainers. So, in conclusion: as security researchers, as security professionals, I believe we have an obligation to society. We know these vulnerabilities are out there. We, as security professionals sitting in this room, understand these vulnerabilities. We have seen them in pentest reports. We can articulate them to our peers. We understand the impact these vulnerabilities can have on our organizations and the customers and the people that we support. We understand that, spoiler alert, most software developers don't watch Black Hat talks, don't watch DEF CON talks.

We do, but the software people that are writing this code don't. There's a statistic out there that GitHub has published: for every 500 developers, you have one security researcher. We are heavily outnumbered in this industry. And so how do we best leverage our knowledge of math, science, technology, and security, and scale our knowledge to get these vulnerabilities fixed across our organizations and across the industry? I believe that automation is the best way for us to do that, but automation with compassion, and automation working with maintainers in a collaborative way. With that, I want to leave you with one final quote. This is from Dan Kaminsky; it's on his Twitter profile, and it remains there to this day: We can fix it.

We have the technology. Okay, we need to create the technology. All right, the policy guys are mucking with the technology. Relax, we're on

it. I want to leave you with a couple of final notes and things to do. Learn CodeQL. Seriously, it's an incredibly powerful language you can use to find vulnerabilities at scale. Contribute to OpenRewrite, and you can deploy your security fixes at scale. Join the GitHub Security Lab and OpenRewrite Slack channels to discuss those technologies with the maintainers. Also consider joining the Open Source Security Foundation, where we're discussing this and other critical topics surrounding securing open source. It's an organization under the Linux Foundation. The meetings are public; there's a public community calendar; you can join any of the meetings that you want to join at any time. Just look up

the Open Source Security Foundation, and consider participating in the community that we've been building there. And finally, I want to thank the Open Source Security Foundation and Project Alpha-Omega for supporting this work; Moderne, for working with me on this; HUMAN Security, for the Dan Kaminsky Fellowship that enabled this work over the past year; Lidia Giuliano, the speaker coach that I worked with to come up with this talk; and Shyam Mehta, my intern last year, who worked with me to create data flow and control flow analysis for OpenRewrite and also came up with the graphics that you saw for control flow. Yeah, that's me. Thank you. [Applause] I have time for questions, I think. I

think, like, yes, I have 15 minutes for questions. Yes? Do you guys now have these program analysis techniques in OpenRewrite for everybody to use, or is it like a private codebase? So, it's broken up into a couple of different codebases. Oh yeah, so the question, as far as I understand it, is: are the program analysis components available for free, or is it commercial? I don't work for Moderne. So, I wrote OpenRewrite's data flow and control flow analysis from scratch. It did not exist; it now exists. It's not perfect, but it was all done in an open repository on top of

their codebases. It's separated into a separate repository from their core repository. The languages themselves live predominantly in openrewrite/rewrite; that's where they have Java, Groovy, and... no, Kotlin is its own repository. Then there's another repository called rewrite-analysis, and that's where control flow analysis lives. And then the application of that lives in another repository called rewrite-java-security, which is where the security recipes like zip slip, partial path traversal, and stuff like that, the ones that actually fix the vulnerability, live. But all of these are open source. They're all Gradle-based projects; you can download them and run them on your local machine, with unit tests for everything, so you can try it.
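To make the class of bug behind the zip slip and partial path traversal recipes concrete, here is a minimal Python sketch. It is not code from rewrite-java-security, and the function names (`naive_is_inside`, `is_inside`, `safe_entry_path`) are hypothetical. It assumes the flawed pattern is a bare string-prefix comparison between normalized paths, which is how partial path traversal is usually described, and shows a component-wise comparison as one possible fix.

```python
import os


def naive_is_inside(base_dir: str, candidate: str) -> bool:
    """Flawed check: a plain string-prefix test on normalized paths.

    "/srv/out-evil/x" starts with "/srv/out", so a sibling directory
    that merely shares the prefix is wrongly accepted.
    """
    base = os.path.abspath(base_dir)
    target = os.path.abspath(candidate)
    return target.startswith(base)


def is_inside(base_dir: str, candidate: str) -> bool:
    """Safer check: compare whole path components, not characters."""
    base = os.path.abspath(base_dir)
    target = os.path.abspath(candidate)
    return os.path.commonpath([base, target]) == base


def safe_entry_path(dest_dir: str, entry_name: str) -> str:
    """Resolve an archive entry name under dest_dir, or raise ValueError.

    Hypothetical helper: call it for each entry before writing the file.
    """
    target = os.path.abspath(os.path.join(dest_dir, entry_name))
    if not is_inside(dest_dir, target):
        raise ValueError(f"entry escapes destination: {entry_name!r}")
    return target
```

An entry named `../out-evil/x` extracted into `/srv/out` passes the naive check but is rejected by `is_inside`. Note that `os.path.abspath` does not resolve symlinks; a real extraction routine would likely need `os.path.realpath` as well.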

Try it out and play with it and figure it out. You don't need to get your code deployed on their SaaS to try it; you can just tinker with it locally and play with it and, yeah, develop fixes. Yes? So I think the pull requests are awesome, right? They make sense to do, because it's stuff that's out there now. Yes. How much time should we be spending doing that versus fixing the source, right? The source of, like, zip slip is not the open source user who used it; it's the API. Oh, I agree. So I've played this game. XML parsers: basically all of the

XML parsers built into the Java standard library are vulnerable to XXE by default. I reached out to Oracle and said, hey Oracle, this vulnerability exists in all of your XML parsers; can we fix this? And they said, well, it would be an API-breaking change, so we'd have to go through a formal JEP process, the JDK Enhancement Proposal process. I don't think that JEP has been opened. So I was like, okay, fine. I think this is a security vulnerability that exists in the Java standard library. We should get it fixed, but if not, we should at least publish, we should

say, we should state it publicly: this is a security vulnerability. So I went, hey Oracle, has there been a CVE issued? They said no. So I went to the CVE board and said, CVE board, the Java standard library has this security vulnerability by default; there should be a CVE for this. The CVE board's stance, to summarize, was that just because an API is vulnerable by default, that is not necessarily a vulnerability in itself. And so they were not willing to issue a CVE for that vulnerability, even though it's long-standing; like, OWASP has an entire document on how to fix that vulnerability that's very

public. So at a certain point you just say, I can't fix it at the source, right? You've tried. So I've been working on a specification that defines how to do bulk pull request generation to fix this stuff at scale in a way that's ethical and appropriate, as defined by the Open Source Security Foundation. One of the first things is: fix it upstream first. Try to fix it upstream first. If you can't fix it upstream, then this is the next best option, right? Do the best thing you can, but when you are limited, this is the fallback, right? It gives us a

solution for when the upstream maintainer won't fix the security vulnerability. Good question. Yeah? From an enterprise side, are there any tools or packages out there that include these tools, like GitHub Advanced Security or anything like that? So GitHub Advanced Security includes CodeQL. Moderne is a corporate entity; they are the primary maintainer and owner of the open source project, which is OpenRewrite. OpenRewrite itself is completely open. Well, there are some caveats, but it is an Apache 2 licensed, Java-based project, so you can try out the recipes and run the recipes against your local stuff. But if you want to run it

against your entire enterprise, you're going to have an easier time purchasing their SaaS. But if you can get all your projects checked out locally, you can just run the recipes against all of your projects locally if you want to, and have the same effect. The convenience of their SaaS is that you can run them across all your projects. And then, trying to shift this left to development time: are things like OpenRewrite being looked at to be included in IDE plugins and things like that? I think that may be something they're thinking about longer term, but it is not something that I

have seen currently. They're mostly focusing on their SaaS, and then also on being able to use these recipes inside of places like your Maven and Gradle builds, so you can say, these are the set of recipes that I always want to run as a CI check, and if they generate a diff, fail the build. Or, at build time, just run the recipes anyway so that the code is always cleaned up, so that hopefully some maintainer will always be running those fixes in general, right? So you have a couple of options there, and those are options that are available to you without a license for OpenRewrite or Moderne. You can just install it in your build.
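The "fail the build if the recipes generate a diff" idea can be sketched in a few lines of Python. This is an illustration of the concept only, not OpenRewrite's API or its build plugins: the `Recipe` type, `dry_run`, `ci_check`, and the toy `use_https` recipe are all hypothetical names, and real OpenRewrite recipes operate on syntax trees rather than raw strings.

```python
from typing import Callable, Dict, List

# Hypothetical recipe shape: takes source text, returns (possibly fixed) text.
Recipe = Callable[[str], str]


def dry_run(sources: Dict[str, str], recipes: List[Recipe]) -> Dict[str, str]:
    """Return {path: fixed_source} for every file some recipe would change."""
    changed = {}
    for path, original in sources.items():
        fixed = original
        for recipe in recipes:
            fixed = recipe(fixed)
        if fixed != original:
            changed[path] = fixed
    return changed


def ci_check(sources: Dict[str, str], recipes: List[Recipe]) -> int:
    """Exit-code style result: 1 if any recipe produces a diff, else 0."""
    changed = dry_run(sources, recipes)
    for path in sorted(changed):
        print(f"recipe would change: {path}")
    return 1 if changed else 0


def use_https(source: str) -> str:
    """Toy recipe: upgrade one hard-coded repository URL to HTTPS."""
    return source.replace("http://repo.example.org", "https://repo.example.org")
```

Running `ci_check` in CI corresponds to the first option described above (fail on a diff); writing the result of `dry_run` back to disk at build time corresponds to the second (always apply the fixes).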

So I don't think it's something they've focused on in the IDE yet. I also think that JetBrains and Eclipse both have their own AST models, and so manipulating ASTs in that way would require some level of interop between those two and the LST itself. Yeah. One of the things I often get asked is what languages it supports. It supports Java predominantly. They're working on Python currently. They started with a Java AST, and they've added Python elements on top of the Java AST, so if you write a recipe for Java, it theoretically can also be applicable to Python in the right contexts. So: Java, Python, Groovy, Kotlin,

there are other languages: XML, HCL. So, remember I was talking about fixing XML vulnerabilities? We finally have an XML parser that is format-preserving. And then one of the languages they support is a language that is absolutely everywhere; everybody relies upon it, nobody wants to write it. Any guesses? COBOL, I heard it over here. COBOL, yes. So they have COBOL support. The COBOL support is not open source, it's proprietary only, but they do have support for COBOL. Any other

questions? Yes. I saw that after a few years of doing this you have 40% merged, right? Why do you think that's not higher, and how do you get people to... So, good question. The question was why I've got a 40% merge rate on this one here, but there's like 2.3% here and 7.6% here; what causes the impact on that? Well, this one is a very simple security vulnerability. There are also a lot of them, but, as you'll find out, there are a lot of dead projects out there, a lot of projects where the maintainers just, you

know... That being said, I still get notifications. These were all generated from my personal GitHub account, and I still get GitHub notifications where I'll see, hey, this pull request was merged, from a pull request I generated in 2019. So it still happens. This one, 2.3%: JHipster is a code generator, and it was used to bootstrap your basic Java project for a Spring project. The problem was that the JHipster code generator was generating code with a security vulnerability in it, so there were actually 15,000-plus instances of the same exact security vulnerability, in the same exact file, appearing across 15,000 open source projects. You might ask me, why 2.3%? Well,

a lot of those projects, when people generated them, were one-offs: I'm trying this as a part of a class, something like that, and I'm never touching it again, right? So it didn't really matter. But some portion of those projects were legitimate projects. So that's why we have such a low rate for that. This one GitHub did, and these were just done in 2022, right? So 40% is actually pretty good. We get somewhere between 40 and 20 or 25%. Most of it's not because the maintainers say go away; most of it's because nobody's there. It's a lot of projects that are on open

source that are just... it's a really good indicator, right? This project is not maintained, because the security pull request came in and nobody's looking at it. Yeah, yes? You've engaged with GitHub on some of these vulnerabilities and fixing them en masse. Do you know if they're working on any kind of continuous scanning, like they do with Dependabot, for these sorts of vulnerabilities? So they have CodeQL, right? CodeQL is free for open source. It used to be something that you'd have to add a GitHub Action for. Now, if you go into your GitHub repository settings, there's a checkbox that you can just

check to say, hey, I want to enable the default configuration for CodeQL, and now CodeQL will scan your repository and give you the alerts. The thing that bugs me about tools like CodeQL: CodeQL and things like that fall into this category called SAST, static code analysis tooling. The problem with SAST tooling that I have struggled with for years is that you spend all of this time tracking down the AST nodes that make this vulnerability up, you get them in memory, in your hands, and then you just let them go, only to report to the maintainer, hey, you've got a

vulnerability; you go rearrange the nodes and fix the vulnerability. And that's the thing that I love about OpenRewrite: it finally gives us the ability to take these AST nodes that we have spent all this time detecting with our SAST tooling and actually rearrange them to fix the vulnerability, and give the maintainer not just an alert but "here is the fix for this vulnerability that I've detected; just merge it." That's really powerful, and that's something that no other SAST tool can do, as far as I'm aware, to this day. How do you verify changes to make sure that you're not accidentally inserting vulnerabilities into open source projects, as opposed to just fixing them? Well, how do I

verify to make sure that I'm not inserting security vulnerabilities into projects when I'm attempting to fix other ones? So one of the suggestions is: take your fix and get it vetted by another organization. I, for example, work with the GitHub Security Lab, and I will say, here's this test suite that I've run this thing against; look through these tests; does this change seem reasonable? So, getting a third party to look at it. And GitHub also encourages this too, because there have been other people that have followed up with this and tried to do this work, and they just generate code that will not

compile. So, following up on this work: Trellix is another security research organization. They were inspired by this work, and they went out and attempted to fix tar slip, which is a security vulnerability involving Python, where the tar library was vulnerable by default to what is basically zip slip, but for tar files. They generated 65,000 pull requests to fix that vulnerability across open source, and upset a couple of maintainers because they had some bugs in their code generation. All of that work actually finally got the Python core team to make a PEP to fix that vulnerability in the Python core library. So this work, making it that blatant that, hey, Python

ecosystem, library maintainers, you've got this problem everywhere, finally got them to make a change to fix the vulnerability in their standard library. It's sad that we have to play that game as security researchers, but at the end of the day it has been effective, and hopefully that'll fix this for all time moving forward. Yeah, good question. Yes? I have five more minutes, so go for it. This is kind of a statement and a question. Go for it. I know we're in a security context here, but it seems that these techniques would obviously also help with code quality. Yes. So, I mean, I get the same question

all the time: Sonar finds all these things, goes through all the effort of doing it; can we automatically fix it? I'm like, go talk to the folks and go look at their recipes. That's their predominant set of recipes. So the question was: we have code quality problems, right? This seems like it's not just security issues, it's also code quality problems. Moderne and OpenRewrite were originally designed around the idea of code quality, fixing that problem, and framework migrations. I saw their technology as a security researcher and said, that is perfect for fixing security vulnerabilities. That is not the predominant focus, unfortunately. I think it should be; I think that's a great market,

but I am one of the primary people that's been writing and generating the security fixes for these vulnerabilities. Their predominant selling point is framework migrations, code quality improvements, getting your code consistent. Everybody's got technical debt; let's go clean that up and fix it so you can deal with the things that your business needs, and the security problems you're not dealing with. Has anybody ever done a framework migration? It usually requires a senior engineer to come in and go through your entire codebase, for weeks sometimes, to get all things updated, because they know where all the

cobwebs are. If you could take that knowledge of what a framework migration is, encode it in code, and just get it deployed, it saves your organization so much time. I've done framework migrations; I'm a software engineer turned security researcher. I don't work for them, I'm not selling their product, but I see the value. So, yeah. And did you have a question? That was it? Perfect. I have two more minutes. Any lingering questions? Yes? For projects that haven't been maintained, where you've opened up a pull request with the fix, does that make you the new maintainer? No. No, because I don't have any permissions.

They didn't give me any permissions; I can't merge a pull request. So, no. I claim no ownership over the state of the world. I've been analogized as the janitor of open source: I'm sweeping up all of the stuff that's just been left behind. And someone needs to do that, right? These vulnerabilities are not new. These vulnerabilities are things that we have known about for years. There are blog posts about them; there is documentation on OWASP. This is not new; it's just that they're everywhere. They're just there because people don't fix them. So let's actually fix this stuff. Yeah? I do just want to

say that I think one of the underrated parts of this is that even though, like, 60% of the pull requests aren't merged, it's going to cause some SCA tools to fail code quality checks for new libraries, so even if it doesn't get merged, it's helping. Well, the thing is, yeah, the 60% of pull requests that are not merged, I don't think that anybody's taken that data and turned it into an alert that, like, Snyk is flagging or something like that, saying, hey, this project is not maintained or not merging this stuff. It'll get pulled in as part of some of the tools. Interesting. Yeah, one of the things that I

discussed with the CVE board is: hey, CVE board, can we put a capper CVE on the end of a project version range that says this version is no longer maintained and there's a vulnerability in it? And they're like, nah, the end of a version range's support is not necessarily itself a vulnerability. I'm like, but it is if there's a vulnerability... Come talk to me about this afterwards. I'm out of time. I'll be around; I'll be carrying the duck around, I've got the hat on. Come chat with me, I'm here all day. And thank you all for having me, I really appreciate it. [Applause] Awesome, great talk, great talk, uh,

Jonathan, thank you so much. Good stuff. Thank you. Thank you for having me. Thank you for cleaning up the world, internet janitor right here. All right, we've got our next talk coming up in just five