Scaling the Security Researcher to Eliminate OSS Vulnerabilities Once and For All

Name: Scaling the Security Researcher to Eliminate OSS Vulnerabilities Once and For All
Uploaded: 2022-09-04
Duration: 50 min 6 s

BSides Las Vegas · 2022 · 50:06 · 257 views · Published 2022-09

Speakers

Jonathan Leitschuh, Patrick Way

Tags

Category: Research Technical

Topic: Supply Chain Security Vulnerability Research

Research: Case Studies and Incidents Analysis Methodology

Style: Talk

Mentioned in this talk

Tools used

CodeQL, OpenRewrite

Platforms

GitHub

About this talk

Jonathan Leitschuh and Patrick Way present a case study in automated vulnerability discovery and remediation across the open-source ecosystem. Using code analysis and bulk pull request generation, they identified and fixed critical supply chain vulnerabilities (HTTP dependency resolution, partial path traversal, Zip Slip) affecting major projects from Spring to the NSA, detailing best practices for coordinated disclosure and maintainer communication at scale.


BG - Scaling the Security Researcher to Eliminate OSS Vulnerabilities Once and For All - Jonathan Leitschuh, Patrick Way Breaking Ground @ 14:00 - 14:55 BSidesLV 2022 - Lucky 13 - 08/09/2022


What's up? Please welcome Jonathan Leitschuh and Patrick Way.

Hello everybody, good afternoon, and welcome to Scaling the Security Researcher to Eliminate Open Source Vulnerabilities Once and for All. My name is Jonathan Leitschuh. I am a software engineer and software security researcher. I'm the first-ever Dan Kaminsky Fellow at HUMAN Security. I'm a GitHub Star and GitHub Security Ambassador, and you can find me on Twitter at @JLLeitschuh and on GitHub at JLLeitschuh as well. And Patrick? Yeah, I'm Patrick Way. I work for Moderne, and I'm on the OpenRewrite team. I've been writing software for 20 years or more, and I'm excited to be here talking to you today. So, a short disclaimer. First

off, I'm sponsored by GitHub. Second off, we're discussing a SaaS solution that is sold. However, all the tools and infrastructure discussed in this talk are available for free for open source (not free for corporate use, but free for open source), and you can use this stuff in your own security research focused on open source software. With that out of the way, I want to mention that this work and my employment are sponsored by the Dan Kaminsky Fellowship. The Dan Kaminsky Fellowship was created after Dan passed away last year. I sadly never got the opportunity to meet Dan. Dan, for those of you who don't know, was known as the

hero of the internet for his incredible 2008 DNS vulnerability that he helped silently patch, and the Dan Kaminsky Fellowship was created to celebrate Dan's memory and legacy by funding open source work that makes the world a better and more secure place. If you have a project that you want to work on that you think will help improve the security of the internet, HUMAN is currently accepting applications for the 2022 Dan Kaminsky Fellow. So this story started with a vulnerability, a simple vulnerability, and this vulnerability existed in my company's code. It was the use of HTTP to resolve dependencies in my company's Gradle build. The reason this is important is because if you're using HTTP

to resolve your dependencies in your Gradle or Maven build, that connection can get intercepted, and an attacker can maliciously inject additional code that runs in your CI/CD pipeline or on your developer machines. This vulnerability didn't just exist in Gradle builds; it can also exist in Maven builds. This is an example where your dependencies are getting resolved for compile and test dependencies, and this is artifact upload, so this is the final release artifact for a Maven build. With this, credentials are also attached, so you're also exposing credentials publicly. And this vulnerability was everywhere.

It impacted organizations like Spring, the Apache Foundation, Red Hat, Kotlin, JetBrains, Jenkins, Gradle, Groovy, Elasticsearch, Eclipse, Oracle, the NSA, LinkedIn, and Stripe. I found this vulnerability impacting the open source projects of all these different organizations across their GitHub accounts, and at this point I realized this is bigger than I thought. So I reached out to Sonatype, who run Maven Central. What pip is to the Python ecosystem and npm is to the JavaScript ecosystem, Maven Central is to the Java ecosystem. They said that they looked at their traffic logs and saw that 25% of their traffic was still using HTTP in June of

2019. So I said, all right, how do we fix this? I pushed forward an initiative that on January 15th, 2020, all the major artifact servers in the Java ecosystem would decommission support for HTTP in favor of HTTPS only. I published a blog post, and then I reached out to Sonatype again in January of 2020, and they said that 20% of their Maven Central traffic was still using HTTP instead of HTTPS, even after about three quarters of a year of trying to disclose this to people. So you can imagine what happened on January 15th, 2020: broken software, broken software

everywhere. But we stopped the bleeding. This didn't fix the whole problem, though. What about the other repositories? These are only the most commonly used repositories in the Java ecosystem: Maven Central, JCenter, Spring, and the Gradle Plugin Portal. Other companies host their own Java artifacts on their own servers, so you'll see other companies' URLs in builds across the ecosystem. So how do we fix the rest? Opening issues is one way to do this, but let's just go fix the problem, right? So I said, I'm going to try bulk pull request generation. And how did I do this? First

off, the first thing you need to do is find the projects that are vulnerable, right? You need to identify the open source projects that have this vulnerability. How I did this was I wrote a CodeQL query for it. This CodeQL query looks for Maven POM files that use HTTP in this sensitive location in that XML file and flags it. CodeQL scans hundreds of thousands of open source projects on every single commit, and they build databases for those projects that you, as an open source security researcher, can run queries against and find

vulnerabilities in open source at scale. They will also pay you for queries that you write and contribute back to GitHub as part of the GitHub Security Lab bounty program. For this simple query, GitHub awarded me a $2,300 bounty. Using the list of projects that were vulnerable from this query, I started generating pull requests. The first pull request generator that I ever wrote was Python based. It used a wrapper over GitHub's hub CLI, added one nasty regular expression, and there was a lot of logic about bouncing off of GitHub's rate

limiting. This was version one. It probably looks pretty washed out now, but this was the first pull request generator. There's an underlying engine, but this was the logic for generating pull requests to fix the security vulnerability, and this is the regular expression. The reason that I used a regular expression instead of an XML parser is because if you parse XML with an XML parser and then modify that XML, the XML that comes back out when you print it will be in the format of the XML parser's output, not the

format that came in. So if you're trying to modify tons and tons of different XML files that have tons of different formatting, you're going to end up with massive diffs. They're all going to end up looking like the same thing, and they're not going to look like the code the maintainer started with. The maintainers are not going to be happy. They're going to say, great, thank you for fixing this, but it doesn't look like the code I wrote. The other problem is that this is using a regular expression, and when you have a problem and you solve it with regular expressions, you end up with two problems, because, yeah,

anybody who's worked with regular expressions understands this. But it worked. This is my GitHub contribution feed for pull requests when I generated this, and this is an example of the diff that was generated. You can see that all the places where HTTP was being used in this sensitive location were fixed. I generated 1,596 pull requests with this Python bot that I wrote, and as of today in 2022, we've had about a 40% merge rate on those 1,596 pull requests. For this, GitHub awarded me a $4,000 bounty under the GitHub Security Lab bug bounty program, for not just writing the query that

found the vulnerability (so this is in addition to the $2,300 for writing the query) but for actually using that query to try to fix the vulnerability at scale. I got hooked on this idea of bulk pull request generation as something we could use to fix vulnerabilities at scale across the open source ecosystem. This is my GitHub contribution graph for 2020. You can see the two peaks; I actually did two different campaigns in 2020, and this tells a complicated story. So I have a problem. I have ADHD; that's not my problem. My problem is that I love chasing

squirrels, and vulnerability disclosures are perfect for that. You go read a vulnerability disclosure and you think, I wonder where else that is in open source? How many other places can I find that vulnerability? The problem for me is that if I use CodeQL or GitHub code search, I will find too many security vulnerabilities. For example, these are CodeQL results for Zip Slip. I can page through pages and pages of results for Zip Slip vulnerabilities across open source. So I'm finding too many vulnerabilities, and I need a way to fix this problem. I can't just report to all these different projects. I need a way to scale

my knowledge of security vulnerabilities to fixing these vulnerabilities in a more productive way, and so I need automation. At this point I'm going to pass it off to Patrick to discuss OpenRewrite. Thanks, Jonathan. Yeah, so that is a challenge. Jonathan really expressed a couple of challenges that he faced: hey, I want to use this regular expression, and of course that's a problem; or I'm going to use an XML parser and it's going to come back gobbledygook and not be a viable pull request. And then there's scale. How do we detect all these things?
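To illustrate the regular-expression approach described for the first pull request generator, here is a minimal sketch in Java. It is not the bot's actual code (which was Python and is not shown in the transcript); the class name, method name, and the choice of `repository`, `pluginRepository`, and `snapshotRepository` as the "sensitive locations" are assumptions made for illustration. The point is that only the URL scheme changes, so the maintainer's original formatting survives byte for byte.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class HttpsRepositoryFixer {

    // Matches "http://" only when it starts a <url> inside a repository-like
    // element. The tempered dot stops the match at that element's closing tag,
    // so an unrelated <url> elsewhere in the POM is left alone.
    private static final Pattern INSECURE_REPO_URL = Pattern.compile(
            "(<(repository|pluginRepository|snapshotRepository)>(?:(?!</\\2>).)*?<url>\\s*)http://",
            Pattern.DOTALL);

    /** Returns the POM text with insecure repository URLs switched to HTTPS. */
    static String fix(String pomXml) {
        // Group 1 is everything before the scheme; it is written back unchanged.
        return INSECURE_REPO_URL.matcher(pomXml).replaceAll("$1https://");
    }
}
```

Because the text is edited in place rather than parsed and re-serialized, the resulting diff touches only the lines that contain an insecure URL. The trade-off is the one raised in the talk: the pattern is fragile and hard to read compared with a real parser.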

[Music] All right. So he needs automated transformations at scale, and this is where OpenRewrite comes in. As software engineers we're professional automators. We automate anything that we can possibly automate; it's in our blood, it's what we do. But for some reason, in 2022 most of us are still slogging through the tedious task of updating that dependency, finding the breaking change, doing a framework upgrade, whatever it may be. It's a tedious task and we burn our fingers on the keyboard to do it. Well, finally we have an opportunity to do something different. It is 2022, and now as software engineers we can

start automating our own tedious tasks. This is one of those projects where, when I learned about it for the first time, I realized this is something new; this is game changing. We've been writing DAOs and business logic for who knows how many years, but now you get to write code for your own code. The model that you're working with is your code. It's amazing and it's incredible. So what makes it amazing, what makes it possible, and what is it really? There's a need to transform code, and it was discovered that, hey, the compiler represents source code as an abstract syntax tree. It's the compiler's

representation of source code. It's bare minimum, but if you look at that tree you can start to formulate ideas: I could transform that tree, I could write it back to a source file, and I could make a diff, I could make some change. Well, that's a nice concept, but again, the compiler's AST is bare minimum. There's no formatting. The compiler doesn't care about comments or spaces or tabs. If you modify that and write it back to a file, you're going to get something that is not a viable pull request. So one of the first challenges for OpenRewrite was to make this a fully format-preserving abstract syntax tree. So now

we have an abstract syntax tree that preserves that precious whitespace and the comments and all that contextual information that's so valuable to developers, and you can write that tree back to a file that's identical to its origin. That's a huge unlock. We're also able to auto-detect the style. We need to transform code and write it back according to the style of that particular project, so OpenRewrite will detect that this project is using spaces, this one is using tabs, this one maybe uses two spaces for an indent, and nuances like braces on a new line. All that is done for you by

the framework. But there's another challenge. We got past the format preserving, but again, a compiler's AST is single level; the type information is not fully attributed. You look at that log.info: is it Log4j, is it SLF4J, is it something custom? You need the full depth of type information on that tree, another problem that was solved by the OpenRewrite team. With that, what you end up with is a fully type-attributed, format-preserving abstract syntax tree. It's syntactically and semantically aware, and if you compare that to the compiler's tree, you can see that it's dense.
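The format-preserving idea can be shown with a toy that is far simpler than OpenRewrite's real tree. In this sketch, every token remembers the exact whitespace that preceded it, so printing the tokens reproduces the source byte for byte, and a targeted change leaves all other formatting untouched. The class and method names are hypothetical, there is no type attribution here, and splitting on whitespace is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a flat token list, not a real syntax tree.
class LosslessTokens {

    /** A token together with the exact whitespace that preceded it. */
    static final class Token {
        final String prefix;
        final String text;

        Token(String prefix, String text) {
            this.prefix = prefix;
            this.text = text;
        }
    }

    /** Splits source on whitespace, keeping each run of whitespace as a prefix. */
    static List<Token> parse(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            int prefixStart = i;
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            int textStart = i;
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            // Trailing whitespace becomes a final token with empty text.
            tokens.add(new Token(source.substring(prefixStart, textStart),
                    source.substring(textStart, i)));
        }
        return tokens;
    }

    /** Prints the tokens back; with no edits this equals the original source. */
    static String print(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token token : tokens) {
            out.append(token.prefix).append(token.text);
        }
        return out.toString();
    }

    /** Replaces matching tokens while keeping every prefix as written. */
    static List<Token> rename(List<Token> tokens, String from, String to) {
        List<Token> result = new ArrayList<>();
        for (Token token : tokens) {
            result.add(token.text.equals(from) ? new Token(token.prefix, to) : token);
        }
        return result;
    }
}
```

A compiler's tree throws the prefixes away, which is why printing it back produces a diff on nearly every line. Keeping them is what lets a transformation touch one token and nothing else.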

There's a lot of information right there that you can use to make accurate transformations at scale that match the format of their source. Even a simple bit of code is a complex AST. If Jonathan is going to do this at scale, we have to provide him with the tools to get beyond the tedium. Take this little conditional statement here. Jonathan is going to share with you a Zip Slip recipe in a little bit, and he needs to insert this conditional statement into a block of code. To do that and to

handle the formatting and everything else for all the different projects would be nearly impossible. So what we have is a templating engine. The templating engine allows Jonathan to build a tree out of code, so he can write code in Java and build an abstract syntax tree from it, and he can do it in two statements. Here's an example of building up a Java template. You can see that there's the conditional statement. It's got some handy features like parameter substitution. You'll notice that there's not a lot of formatting. And this is where he's going to apply it to the tree. It's got a coordinate system,

so you can say: take this bit of code, substitute these parameters, and place it in this spot. In the end, he can go through thousands of repositories, find a vulnerability, and insert that conditional statement exactly where it belongs with exactly the correct formatting. With that, I'll hand it back to Jonathan so he can share with you some more of the work that he's done on it. Thank you, Patrick.

So with what OpenRewrite provides, what other vulnerabilities can we fix, and what's possible now? I'm going to talk to you about three different security vulnerabilities and walk through OpenRewrite's application to fixing them. The first vulnerability we're going to tackle is temporary directory hijacking. The basis of temp directory hijacking is that on Unix-like systems the system temp directory is shared between all users. This is not applicable to Windows and macOS, but on most other Unix-like operating systems, such as Linux, the temporary directory is shared between all users. And this is the vulnerability. This is the way that you will find a lot

of Java code creating a temporary directory: they create a temporary file, call delete, and then call mkdir. The reason that this pattern exists is because prior to Java 1.7 there was no API in the Java standard library to create a temporary directory. So what people did was this pattern of creating a temp file, deleting it, and creating a directory with the same name. This gives you a randomly named file, with the name generated by a secure random number generator. Also, if you were to look up on Stack Overflow how to create a temporary directory in Java, this would have been the solution, and

unfortunately, you'd end up with this vulnerability. The vulnerability exists here: it's a race condition between the delete and the mkdir. An attacker can see the creation of the file, see that the file gets deleted, and then race the Java process to create the temporary directory before the Java process does. The reason this is a vulnerability is because mkdir, if it fails, returns false; it does not throw an exception. So one way to fix this vulnerability is to put that logic into an if block. That's one valid solution, but it's imperfect, and the reason that it's imperfect is because when you use mkdir

you're using the default umask, and that will make the directory with open permissions. It will not be editable by another user, but it will be visible to all other users. So if you're writing sensitive information into that directory, its contents can be viewed by all of the local users on the system. And so this is the fix: use the new Java Files API, which has a createTempDirectory call that is secure. I've found this vulnerability in a bunch of different places and I've received a bunch of CVEs for it. So I said, okay, there's more of this than

just these CVEs; let's try doing bulk pull request generation. And so that's what we did. So far (this number is actually not up to date; I did more last night, and I don't know exactly what it is, but it's in a later slide) it's at least 64 pull requests. This is what the pull request looks like. You can see the deletion of the delete and the mkdir, and the replacement with this new API call. But we can do more complicated transformations than that. Let's say that those mkdirs and deletes were used inside of an if block. You can also remove those because they're no longer needed and just replace it with

that single line. Vulnerability number two that I want to talk to you about is partial path traversal. For the basis of partial path traversal, let's assume that you have two local users on a system. You have /user/sam, and you want to isolate your logic to only operate within the directory /user/sam, and you have another user on that system, /user/samantha. Partial path traversal allows an attacker to access a sibling directory with the same prefix. So taking this example again, where we have /user/sam and /user/samantha, if you're sandboxing your logic to /user/sam, you can access /user/samantha, because it's a prefix of /user/sam, or sorry,

/user/sam is a prefix of /user/samantha. And this is the vulnerability. The reason this vulnerability exists is because when you call the method getCanonicalPath on a Java File, the trailing slash gets dropped from the string. So when you use this in a string comparison, you're comparing paths without the trailing slash. Going back to this vulnerability, let's take /user/sam and put it into the two locations in code where it's going to end up, and then we have a user-supplied input that comes in, which is ../samantha/baz. When that gets

appended together, you're going to end up with /user/samantha/baz, and this bypasses the guard. So what's the fix for this vulnerability? Going back to the original example, this is one valid fix, where you add the slash back in. However, you're still doing string comparisons, and that's less than ideal. A better solution is to replace getCanonicalPath with getCanonicalFile, turn that into a Path object, and do startsWith comparisons on the Path object, because Java's Path object will do this startsWith check correctly. So how do we find this vulnerability to

actually fix it? First, you need to look for calls to the String method startsWith, and then you need to look before and after and see the getCanonicalPath calls on either side of that startsWith call. But if you're going to fix this vulnerability correctly, you also want to make sure that you're filtering out cases where that separator character is being appended correctly. It can't be that easy, though, can it? Well, developers write code in a lot of different ways. What happens if the developer doesn't just write this, but

they extract one half of this to a variable, or they extract the second argument to a variable, or let's say that the correct logic is extracted to a variable? How do you identify this vulnerability still existing, even in the context of that logic being pulled out into a separate variable? We need this concept called data flow analysis. Data flow analysis lets us track values from their sources to where they end up flowing. This lets us, for example, see that these variables are getting assigned in these different locations. And it can be more complicated than this; it can also go through intermediate

variables. So data flow analysis allows us to uncover hard-to-find vulnerabilities and prevents false positives. The data flow analysis API is modeled after CodeQL's, so if you're familiar with CodeQL or you've learned OpenRewrite, you can translate that knowledge back and forth between these two frameworks to find these vulnerabilities and also fix them. Putting it all together, you can see in this example where the path that was vulnerable gets replaced. This is an example of the actual diff that was generated for partial path traversal. I have a brief aside, because it was just too funny not to tell.
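The partial path traversal pattern and the fix described above can be sketched as follows. This is an illustrative reconstruction, not the code from the slides; the class and method names are hypothetical. The vulnerable check compares canonical path strings, so /user/sam matches as a prefix of /user/samantha, while the fixed check uses `Path.startsWith`, which compares whole path elements.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical example class, for illustration only.
class PartialPathTraversal {

    /**
     * Vulnerable guard: getCanonicalPath() returns a string with no trailing
     * separator, so String.startsWith also accepts sibling directories that
     * merely share the prefix.
     */
    static boolean isInsideVulnerable(File baseDir, String userInput) throws IOException {
        File target = new File(baseDir, userInput);
        return target.getCanonicalPath().startsWith(baseDir.getCanonicalPath());
    }

    /**
     * Fixed guard: convert both sides to Path objects and let Path.startsWith
     * compare element by element instead of character by character.
     */
    static boolean isInsideFixed(File baseDir, String userInput) throws IOException {
        File target = new File(baseDir, userInput);
        return target.getCanonicalFile().toPath()
                .startsWith(baseDir.getCanonicalFile().toPath());
    }
}
```

With a base directory named `sam` and the input `../samantha/baz`, the string-based guard lets the request through and the Path-based guard rejects it. Both guards accept an ordinary input such as `docs/notes.txt`.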

There's a case study of this vulnerability: CVE-2022-31159. The vulnerability existed in the AWS Java SDK, and it was partial path traversal. They had created this guard, which was checking, while downloading the entire contents of an AWS S3 bucket, whether an S3 bucket key was traversing outside of the destination directory. You can see that this partial path traversal vulnerability was in the guard against path traversal, in this leavesRoot logic. So this got a CVE assigned to it, and everything went fine on that, but there was a little bit of drama in

another way. I had an email exchange with the AWS security team where they said, we'd like to award you a bug bounty; however, you need to sign an NDA for us to award you this amount. I said, I don't normally agree to NDAs; can I read it first before potentially agreeing? And AWS came back to me and said, we're unable to share the bug bounty program NDA because it and other contract documents are considered sensitive by the legal team. It was like Amazon used legalese and it hurt itself in its confusion. So that's a story that just could not be not

told as a part of this. All right, so the third vulnerability is Zip Slip. People who have been hackers for a very long time probably know what this is, but to summarize, Zip Slip is a path traversal vulnerability that exists while unpacking zip file entries, because zips are just a mapping from a name, where you want the file to end up, to the contents of the file. This vulnerability can exist in Java because it's very easy to unpack zip files. This is the logic that you'll see in a lot of places to unpack zip files. The Snyk team did a bunch of

research back in 2018 to eliminate a lot of this vulnerability from the open source ecosystem. They reported it to a bunch of different open source projects, not just in Java but in other languages as well. Their fix was, for the most part, correct, but there were certain cases that were actually still vulnerable to partial path traversal, but that's neither here nor there. So this is the subset of the logic that leaves you vulnerable. It's that the entry name flows to this file output stream, and you're copying the contents of the zip entry into this file output stream. So you're using

potentially malicious zip files: you're unpacking potentially malicious zip files to a location outside of the destination directory if the attacker has supplied a path traversal payload in the entry name. Zip Slip is complicated because in order to fix it you need to add this guard that protects against the vulnerability. You're checking to make sure that the file is within the destination directory. But the further problem is that even though that's a valid fix, there are other valid fixes as well. So this is one valid fix for it, but

this is another one, where you just put the logic inside of an if check. In order to determine whether the vulnerable logic is reachable, you need a new concept, and it's called control flow analysis. Control flow analysis lets us determine whether there is a guard in place that protects against this vulnerability or not. The version on the left does need to get fixed, but the version on the right does not, because there's a valid protection against this vulnerability. Control flow analysis did not exist in OpenRewrite prior to the work that we were doing, but when we added it, it enabled us

to take a chunk of Java and produce a graph that shows, at every jump, which chunks of logic are reachable. So for Zip Slip, for example, we can build a control flow graph and traverse that graph and determine that if you reach the startsWith call and it is false, then an exception will be thrown and the logic using the untrusted input will not be reached. So it's a valid guard against reaching that potentially vulnerable logic. When we put this all together, this is an example of the diffs that

you can generate. Here's another example where you can see we've not only fixed the code but also cleaned up the surrounding code as well. So, pull request generation. Let's go do some pull request generation. If you've got security vulnerabilities, you can get a pull request. One of the problems that you'll run into really quickly with pull request generation is how fast you can generate pull requests. When you're generating pull requests, there are three different types of steps that you need to make: there's file I/O, there are Git operations, and there are the GitHub API calls.
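Before the pull request mechanics, here is a minimal sketch of the Zip Slip guard discussed above, combined with the Path-based comparison from the partial path traversal fix. It is an illustration under stated assumptions, not the recipe's generated output: the class and method names are hypothetical, and throwing `IOException` is just one of the valid guard shapes the talk mentions.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical example class, for illustration only.
class SafeUnzip {

    /** Resolves an entry name and rejects it if it lands outside the destination. */
    static Path resolveEntry(File destinationDir, String entryName) throws IOException {
        Path destination = destinationDir.getCanonicalFile().toPath();
        Path target = new File(destinationDir, entryName).getCanonicalFile().toPath();
        // Path.startsWith compares path elements, so a sibling directory that
        // only shares a name prefix with the destination is rejected as well.
        if (!target.startsWith(destination)) {
            throw new IOException("Zip entry escapes destination directory: " + entryName);
        }
        return target;
    }

    /** Unpacks a zip stream, applying the guard before any file is written. */
    static void unzip(InputStream in, File destinationDir) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = resolveEntry(destinationDir, entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

The exception is thrown before the entry name reaches any write, which is the property the control flow analysis looks for: when the `startsWith` check fails, the code that uses the untrusted name is unreachable.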

File I/O is basically free because it's happening on your local machine. Git operations on GitHub are free; they're not rate limited. But then you have GitHub API calls, which are rate limited. The first step is you check out the source code and download it, which is a Git operation. Then you have file I/O, which is branching, applying the diff, and committing the change. Then you fork the repository on GitHub, which is an API call. Then you rename the repository on GitHub, and you push the changes, again a Git operation. And then you

create a pull request on GitHub. So of these steps, three are API calls that are rate limited by GitHub. GitHub recommends that you wait at least one second between requests. Additionally, there are other rate limits in place to prevent spam users or malicious users from generating too many pull requests. They don't want people spamming maintainers, right? Their users are maintainers, so they're trying to protect their users. So their

protection of users unfortunately limits our ability to do this work as security researchers. So if there's anybody from GitHub in the audience, if you could stop rate limiting your API, or at least not do it as much, that'd be greatly appreciated. So we've made it this far. Vulnerabilities have been detected, style's been detected, code has been fixed, and the rate limit's been bypassed. How do we do this for all the open source repositories that we want to fix? I'm going to pass it back to Patrick to discuss Moderne. Well, first let me make sure I've got it right side up. The OpenRewrite team, we

would really like to thank Jonathan and Shyam for their work on behalf of OpenRewrite. I'd really like to commend Jonathan and Shyam for their work to add control flow and data flow to the project. Not an easy task. These guys worked hard, they put their heads down, and they pulled it off. So, some context: Shyam is my intern. He's in the back, but he was essential to making control flow something that was possible. Nice job. Yeah, you guys rocked it. All right, so Moderne is a company that's supporting OpenRewrite, and we provide a free service for open source projects. It allows you to run our 800

plus recipes over 7,000 plus repositories. You can do your transformations at scale. You can find usages of types or method invocations across either all of the 7,000 repositories that we have ingested or whatever you have in your organization, and it'll also generate and update pull requests for you. This is really what's allowing Jonathan to scale. As I mentioned before, OpenRewrite is a suite of projects. We've got testing frameworks, logging frameworks, rewrite-spring, rewrite-kubernetes, you name it. We have more than 800 recipes, including complete framework migrations. JUnit 4 to 5 is a really tedious transformation to make.

we have a recipe for that you want to get from spring boot one to two you need to do uh junit four to five we can bundle that all together so the developer is not having to sit there and bang their head on the keyboard and just work their fingers to the bone with all these tedious changes so back to pull request generation jonathan he's finding vulnerabilities everywhere he's got a handy-dandy recipe to fix them and now he needs to get those out moderne also provides the ability to run that recipe across all 7000 repositories that are indexed and generate pull requests for them you can see they have a message that's uh

contextual and yeah lots of results so i could talk about this or we can kind of give you a quick little demo video here you can see it's run the recipe it's generated some results jonathan's putting together a message he's signing his pull request and off it goes it's running through building pull requests at scale so now as jonathan finds those vulnerabilities everywhere he's got the tool to fix them so now so there's more than seven thousand repositories in the world um how do we find the other vulnerable projects right and this circles back around to what we were talking about at the beginning of

the talk codeql so codeql uh as i said before it indexes over a hundred thousand open source projects and at least thirty five thousand open source java projects um they support python c plus plus um c sharp go uh and i think there's some other languages in there too but um it lets you write queries to find these vulnerabilities at scale and so because there's more than 7000 open source projects out there you can use codeql to identify the vulnerable projects and then you need to make open rewrite aware of them or the moderne saas aware of them and the way to do that is you just open a

pull request against this repository and add to the csv file and then open rewrite will try to go out and build that project to index it and let you generate pull requests so finally let's go generate some pull requests this is basically all of the giant pull request projects that i've either been a part of or am aware of the only one that i actually was not directly involved with is the rhostname array uh one where github used my python bot to generate those pull requests to fix uh an array overflow vulnerability um but the other ones these are all pull requests that i've generated and for new pull requests as of 2020 i've generated

over 9 5 000 sorry wow 590 pull requests um and to my name across my history as an open source security researcher i've generated over 5200 pull requests and uh funnily enough of the three vulnerabilities discussed in this talk um there was one unfortunate open source project that received all three pull requests um uh so yeah and this is my github contribution graph for 2022 uh it's not done yet but you know you can see the impact of bulk pull request generation on that and so now i want to talk to you about some of the best practices for uh bulk pull request generation first off messaging

um you're dealing with maintainers you're dealing with real people um there's a saying all software problems are people problems in disguise this is 100 percent one of those cases right you're dealing with actual people maintainers of the software um and you need to be sensitive about that you're not just disclosing a bug you're disclosing a security vulnerability and so there's a certain amount of ego that gets wrapped up in this we're saying not just you've had a bug which maintainers have been pretty normalized to but you've had a security vulnerability that could put your users at risk you're hitting them somewhere that's different and they're

not used to and you need to be careful about that um so that's you know lesson zero so lesson one about this um sign off on your commits um and this is what git sign off is this is basically uh you add this to your commit messages um and why well the reason why is uh there was a bunch of lawsuits a lot of yada yada tldr lawyers um if you don't your pull requests will be rejected by evil dragon bureaucrats uh so sign off on your commits lesson number two be a good commit citizen gpg sign your commits um so this is what signing your commits looks like it shows up as

verified um and you won't end up with somebody impersonating linus torvalds on github because you know uh so if you gpg sign your commits um it'll prevent this lesson three um there's a standard called secom which is a commit format um and if you lay it out thank you thank you so the story i'll tell it when this is over but yes there's a story i'll tell when this is over so sec thank you so much i appreciate i asked

we have a tradition where speakers are allowed to make an outrageous request when they submit their talk if we accept their talk and we like the request we'll fulfill it however we often fulfill it in sort of a slightly evil genie sort of way so thank you you are quack quack thank you i appreciate it yes there's a longer story so this is not the first duck that i've ever seen i spoke about the zoom vulnerability which i found at shmoocon and for some strange reason somebody brought a duck up on stage for me because i also used the duck in that talk so i try to put the duck in every talk

and so i just said hey if you want to bring one that'd be great and lo and behold they have all right taking a step back um secom so it's a commit message standard for putting all the information about a vulnerability into the commit message so that it's parsable um uh yes lesson number four um there are risks to using your personal github account um anybody here who's familiar with github uh this is github's angry unicorn that occurs when you hit a 500 error this was my github profile for most of 2022 sorry 2020 um uh and the only way that i was able to fix it was by reaching out to github support

and i'm also a github star so talking to them directly and asking uh for help so you can break your github account if you do this uh fair warning um coordinate with github uh there is a benefit to doing this because if you use your personal account and maintainers have issues and comments you'll get notifications and you can engage with them which is important from the communication side so there are risks but there is also an advantage to doing it this way um [Music] uh yes coordinate with github um i reached out to github before attempting this um they want to know

that you're doing this so that you're not spamming they want to make sure you're not spamming people so it's a good thing to do and then lesson number five is consider the implications um shortly after engaging in my most recent round of bulk pull request generation i received this issue on my security research uh i have a jlleitschuh slash security research uh github repository is this responsible disclosure now i use the term coordinated disclosure when i do vulnerability disclosures which is the newer more nuanced term um but whether you're going to call it coordinated disclosure or responsible disclosure the answer is no this is full disclosure of a vulnerability you're 0-daying them potentially and there

are implications there that you need to consider um i argue that given the scale of the amount of vulnerabilities in open source it actually is a better net good to fix vulnerabilities like this than it is to not report them at all because that's the alternative that i'm facing there's only so much time in a day that i have as a researcher and so you're gonna end up full disclosing this vulnerability but the net benefit to the security of the internet is positive and so i want to leave you with this conclusion as security researchers i feel we have an obligation to society we know that these vulnerabilities are out there and we know how to fix them we've

written pen test reports we've seen them in source code reviews we've had them come in from you know a variety of different places right we understand how these vulnerabilities exist there's this problem in the industry for every 500 developers you only have one security researcher this is from github in 2020. so we are vastly outnumbered right and spoiler a lot of developers don't watch bsides talks don't watch black hat talks don't watch defcon talks right it's the unfortunate nature of the world right so how are we going to be able to best scale our knowledge of math science technology security and the vulnerabilities that are out there to do the most good in the world

and i argue that security uh pull request generation in this manner is the best way to have the most positive impact in the security world or for the security of open source and with that i want to leave you with one final quote from dan um this is on dan's twitter account um it's still there to this day we can fix it we have the technology okay we need to create the technology all right the policy guys are mucking with the technology relax we're on it i want to leave you uh with some uh sound bites uh learn codeql uh it's a really powerful language um it's not easy but you'll get it like it has a

learning curve but it's worth learning um you can do a lot with it you can find really cool vulnerabilities it's really cool writing a query and just having them fall out of open source it's really that easy they just start falling out of the open source code um contribute to open rewrite and you can deploy your security fixes at scale and then join the github security lab and open rewrite slack channels where you can discuss with me and other security researchers who are trying to tackle this problem of open source security um to fix these vulnerabilities i'm also going to toss on there if you're interested in open source security in general um consider

joining the open source security foundation which is a project under the linux foundation where a lot of the discussions around the security of open source are happening in working group meetings every single week so um and then finally i want to say thank you to human my employer um moderne uh for being spectacular and enabling this work to even be possible uh lydia uh who was the black hat speaker coach that we ran this by like way too many times to get this into the state it was in and then sean my intern who uh was instrumental in allowing control flow analysis to exist um

so also the graphics for control flow were some of his work so um thank you that's us

[Applause]

yeah so your question was um what other languages does open rewrite support besides java we have uh xml let's see javascript we're working on javascript yes working on cobol yes um hcl um kotlin's in the works properties not javascript currently javascript's in the works um java uh gradle build files that's right um groovy uh yeah so it's predominantly java currently but there's a lot of other work going on it needs to support a bunch of other languages too so um yeah anybody else

going once we'll take any questions i promise going twice

yeah so yes i try to put the duck in every single one of my talks and uh yes so this is a callback to that but also to my previous zoom talk uh about the 0-day that i dropped in zoom back in 2019 so if you're curious about that vulnerability go back it's shmoocon i talked about the zoom vulnerability that i found so um yeah any last questions yes go for it yeah

so the question is is the goal of the best practices about trying to mitigate being evil and it's not just about mitigating being evil it's like you're dealing with real people but also you don't want to come across as a bad actor right because github will ban you if you're a bad actor right like i mean they will take into account that you're trying to do the right thing but like you don't want to get banned from github um so yeah it's about trying to uh do the right thing in the context of also fully disclosing a security vulnerability in public if there was a way to do this you know in a way

that was private if github supported private issues like creating a private issue or a private uh pull request i would 100 percent go down that route it's just not supported by github currently you look like you have a question no okay just checking yes go for it um [Music]

[Music]

um so are there organizational bodies around this sort of effort no so as far as i'm aware for work like this around fixing vulnerabilities at scale in this way i am not aware of any other large-scale efforts besides the ones that i've engaged in now i've seen for example the jenkins project go and generate a bunch of pull requests specifically for the jenkins project so like that's the security team from jenkins doing that specifically but there's no major project that is actively engaged in this um this has been a project that i've been passionate

about and that's why i applied to the dan kaminsky fellowship saying i do this thing i have done this thing i want to do more of it and they accepted my application to enable me to try to see what i could do in a year to scale this thing that i've been doing as a side project into a full-time job so this has been my full-time job since january working on this project um and uh so yeah that's what enabled this um i would love to chat if you want to do more of this because i want to do more of this like i think that more of this is awesome um yeah

anybody else open to any questions or you can also come chat with us afterwards i think that's everybody okay thank you all for coming i really appreciate it thank you thank you and if you want to chat with us you can reach out to me on twitter uh i'm also available there so thank you
