Stalking Known Open Source Offenders for Novel CVEs

Name: Stalking Known Open Source Offenders for Novel CVEs
Uploaded: 2020-11-18
Duration: 41 min 59 s

BSides CT · 2020 · 41:59 · 100 views · Published 2020-11

Speakers

Will Porter

Tags

Category: Research

Topic: DFIR Vulnerability Research

Research: Case Studies and Incidents Analysis Methodology

Style: Talk

Mentioned in this talk

Tools used

Git

About this talk

A case study in vulnerability research methodology that uses past CVE authorship patterns to discover novel vulnerabilities in open source software. Porter demonstrates how identifying the author of a known vulnerability and analyzing their other commits can systematically uncover additional security flaws, using OpenEMR as a real-world example to discover three previously unknown CVEs.

Original YouTube description

Will now has some tools to automate part of these processes, available here: https://github.com/mynameiswillporter/Stalking-Open-Source-Offenders. He is also on Twitter as @willporter68 and will be posting updates about the evolution of this project there.

"The best predictor of future behavior is past behavior." This case study tests that hypothesis against an open source electronic medical records (EMR) implementation. The goal is to discover novel CVEs by examining the commits of contributors who are known to have authored the code responsible for existing CVEs. CVE references sometimes contain links to Git commits that patch security vulnerabilities, and from those it is possible to determine the commits that introduced the errors. By identifying the author of the introducing commit and examining all commits produced by that individual, it is possible to discover novel CVEs. A SQLi vulnerability in OpenEMR (CVE-2018-17179) was featured in the DEF CON 27 Biohacking Village CTF. The author of this vulnerability was targeted, and their other contributions were inspected for weaknesses. As a result, CVE-2019-16862, CVE-2019-17179, and CVE-2019-17409 were discovered and patched. Other seemingly non-exploitable weaknesses were also discovered and patched.

William Porter is a lifelong hacker and developer. He has worked in all types of environments, from 3-person startups to Fortune 50 companies, and has been exposed to a wide range of codebases and developer practices. An avid supporter of remote work, he had visited 24 countries before COVID-19. Now, as a homeowner, he spends an inordinate amount of time fixing things that break in the home.

Transcript (English)

Our next talk is coming up in just a minute. It's Will Porter. He's going to look at a CVE and at the developer who contributed it to that repository. Then he'll research what other projects that developer has worked on, to find other CVEs, if I've got it right. I'm going to cut over right now.

Just one quick thing: make sure you check out the Palo Alto Discord channel as well. They have a raffle there for a one-of-a-kind branded cooler and rocker glasses. [unclear] also has a raffle in their Discord channel. So please, when you get a moment, stop in there and say hi to those guys. Coming up next is Will Porter; let me bring him in.

Hey, can everybody hear me okay?

Bueller? All right, sweet. Hey everyone, my name is Will Porter, and I'm doing the next talk, called "Stalking Known Open Source Offenders for Novel CVEs." That description was mostly right, with one difference. In this example I wasn't looking at other projects the author had contributed to. I was looking at the same project, finding other areas of the code that had been made vulnerable by the author's other commits. I have some slides; I'll share them now.

All right, how do I share my screen again? Click up top in the window that you're in. When you hover over it... oh man. It should pop up on top of your window. All right, where? On the top of my video? So, were you in the video window where you can see yourself? Oh, right, right. Okay, awesome, thank you.

All right, so like I said, this is called "Stalking Known Open Source Offenders for Novel CVEs." The talk presents a methodology for quickly identifying new vulnerabilities in open source projects, based on the vulnerabilities those projects already have. I created it because I wanted to find new vulnerabilities a little faster than a full web security assessment of an open source project would allow. So let's get into it.

A little bit about myself: my name is Will Porter, and my GitHub is up there, github.com/mynameiswillporter. I'm a hacker and a developer. I put husband and homeowner on there because I recently bought a house, and I didn't realize how much time I was going to wind up spending on it; that's most of my hobbies now. Industry-wise, I've been employed everywhere from three-person startups to Fortune 50 companies. I've either done development work on authentication systems or, most recently, worked as a senior security consultant.

Let's go over the agenda for today. First I'll give some background on how I came up with this idea; the background revolves around a specific open source vulnerability. Then I'll talk about the hypothesis I had and walk through testing it in a case study of a particular open source repository. I'll also show how that led to the discovery of novel vulnerabilities. If we have a little time at the end, I'll talk about potential next steps people could take to improve this methodology, automate it, or something like that.

Before we start, I want to go over some terms that people may or may not be familiar with, just so I don't assume anything.

The first is CVE, which stands for Common Vulnerabilities and Exposures. It's a list of records for publicly known vulnerabilities, each containing an ID, a description, and at least one reference. Generally, a vulnerability that's assigned a CVE is in a common piece of software, something many people use.

Open source is software released under a license that grants users the rights to use, modify, or distribute the source code. In this presentation we'll focus on OpenEMR, an open source electronic medical records implementation; that's where I tested this hypothesis.

A parameterized SQL query is a SQL statement where dynamic data is supplied via parameters. It's precompiled, and it's a common method of preventing SQL injection.

Cross-site scripting is an injection attack where malicious scripts are injected into otherwise trusted sites. It's a vulnerability that allows an attacker to execute arbitrary JavaScript in a victim's browser.

Finally, a weakness isn't necessarily a vulnerability. It's a coding error that could lead to a vulnerability, and that concept is pivotal to this methodology.

All right, let's start with some background on a specific vulnerability and how this whole process came to be. The vulnerability is CVE-2018-17179. At DEF CON 27, we decided to do the Biohacking Village CTF; I think some other people in this chat were at DEF CON 27 too. It was a roughly three-day CTF with a bunch of different sections. One section brought you to a website, and the instructions were something along the lines of "extract John Doe's Social Security number from the application." After poking around the website, we determined that it was OpenEMR version 5.0.1.1. We pulled up the CVE details for this version and saw a SQL injection vulnerability. That seemed like a perfect candidate for extracting John Doe's Social Security number.

There weren't any specific public exploits for this vulnerability at the time. If you're fact-checking me right now, you might find a Metasploit module for it, but I wrote that after the fact, so don't worry about that. You could also use sqlmap to exploit this, but that's not what we did. We pulled up the source code and engineered an exploit from it. The CVE description says there is a SQL injection in the make_task function in taskman_functions.php, via another file. So the CVE description itself tells you that there's a SQL injection, what function it's in, and what file it's in.

As I explained earlier, a CVE contains an identifier (here, CVE-2018-17179), a description, and references. The references are pivotal to this methodology. For open source projects, the CVE references sometimes contain a link to the commit that fixes the vulnerability in the code base. That's exactly the case here: one of the references for this CVE is the GitHub commit that fixes the vulnerability. If you load that commit in GitHub, you land on the commit "naven's bug fixes." If you scroll down and know what you're looking for, you'll see that the diff is obviously fixing a SQL injection. Red is the part of the diff that the commit removes, and green is what it adds. This commit replaces lines 97 and 98. The original code was a non-parameterized SQL statement: potentially dynamic data is inserted directly into the statement, which causes the SQL injection vulnerability. The commit replaces it with a parameterized SQL statement.

So using the CVE, we can see which commit fixes this vulnerability. Why is that important? A commit that shows a vulnerability being fixed lets you work backwards and identify who actually introduced the vulnerability into the code base. I had done the CTF, which was fun, and I'd written a Metasploit module, which was fun, and then I was thinking about what else I could do. We'd been able to exploit this vulnerability with relative ease using the CVE. So I thought: why can't I find something that doesn't already have a CVE, and file a CVE for it? I could test the entire web application, but is there a way to minimize my effort and find vulnerabilities faster?

What I came up with rests on a few facts:

1. CVEs filed for open source projects sometimes link to repository commits, specifically the commits that fix the vulnerabilities they describe.
2. From the commit that fixes a vulnerability, it's usually possible to find the commit that introduced it.
3. Finding that commit effectively identifies the person who introduced the vulnerability into the code base.

I think of human behavior in terms of "the best predictor of future behavior is past behavior." Suppose somebody wrote a non-parameterized SQL statement in this already-discovered vulnerability. That's likely because coders code in idioms. I've reviewed a lot of code in my life, and I find that people tend to write code the way they speak. I use specific slang words and have mannerisms when I talk. In the same way, people have habits in how they code, and they reuse idioms and paradigms over and over. So if this person committed this non-parameterized statement, how do we know we've found all of them? There's no reason to think we have. If we analyze this person's commits, we're likely to find other vulnerabilities of the same nature as the ones already assigned CVEs.

That's the hypothesis. I wrote it up more succinctly here, though I think I've pretty much given it away. CVE references sometimes link to Git commits that patch security vulnerabilities, and from those it is possible to determine the commits that introduced the errors. By identifying the author of the vulnerable commit and examining all commits produced by that individual, it is possible to discover novel vulnerabilities. That's the gist of this entire talk. It's pretty simple in concept, and for the rest of the talk I'll walk through a case study showing that it actually worked when I tried it.

All right, identifying the author of the CVE is central to this process, so let's see how we'd do that, using the vulnerability we were just discussing. That CVE included a link to the commit that fixed it, so we already have the fixing commit. But from there, how do we find the commit that introduced the vulnerability?
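The process above starts from a CVE reference that points at a fixing commit. Here is a minimal Python sketch, assuming you already have a CVE's references as plain URL strings; fetching them from NVD or MITRE is left out. It picks out the GitHub commit links. `CommitRef` and `fix_commits` are hypothetical names, and the pattern only recognizes `github.com/<owner>/<repo>/commit/<sha>` URLs.

```python
import re
from typing import NamedTuple

# Recognizes links of the form https://github.com/<owner>/<repo>/commit/<sha>.
# Other hosts (GitLab, cgit, mailing lists) are deliberately ignored in this sketch.
COMMIT_URL = re.compile(
    r"https?://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/commit/(?P<sha>[0-9a-fA-F]{7,40})"
)


class CommitRef(NamedTuple):
    owner: str
    repo: str
    sha: str


def fix_commits(references: list[str]) -> list[CommitRef]:
    """Return every GitHub commit linked from a CVE's reference URLs."""
    found = []
    for url in references:
        match = COMMIT_URL.search(url)
        if match:
            found.append(CommitRef(match["owner"], match["repo"], match["sha"].lower()))
    return found


if __name__ == "__main__":
    # Placeholder references: the hash below is made up, not the real fixing commit.
    refs = [
        "https://nvd.nist.gov/vuln/detail/CVE-2018-17179",
        "https://github.com/openemr/openemr/commit/0123456789abcdef0123456789abcdef01234567",
    ]
    for ref in fix_commits(refs):
        print(ref)
```

Only references that point straight at a commit are useful here. A CVE whose references are just advisories or release notes gives you no fixing commit to work backwards from.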

That's what we're going to do now. This is the process I came up with for GitHub; you could probably work out an equivalent process in Git itself just as easily.

First, go to the commit that fixes the vulnerability, and take note of the file and the line that had the vulnerability before it was fixed. Then navigate to the parent commit. For those of you who aren't familiar with Git: it stores changes to code in a tree-like data structure (strictly, a directed acyclic graph, since merge commits have more than one parent). Each node represents a change to the code base, and its children are changes from the state the parent node represented. So in Git, every commit except the first one has a parent commit. What we're saying is: go to the commit that fixes the vulnerability, then find that commit's parent. The parent is the last state of the code base in which the vulnerability was present, so by going to the parent commit, we've reverted the code base to a vulnerable state.
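Each GitHub step has a Git command-line counterparts. The "1 parent" link corresponds to `git rev-parse <sha>^`, "Browse files" at the parent to `git show <sha>^:<path>`, and "Blame" to `git blame`. Below is a minimal sketch that only builds the argument lists, so nothing here touches a real repository. The function names are hypothetical, `path` is assumed to be relative to the repository root, and `^` selects the first parent, which matters if the fix was a merge commit.

```python
def parent_rev(fix_sha: str) -> str:
    # "<sha>^" is Git's notation for the first parent of a commit: the last
    # snapshot in which the code removed by the fix was still present.
    return f"{fix_sha}^"


def show_vulnerable_file(fix_sha: str, path: str) -> list[str]:
    # `git show <rev>:<path>` prints the file exactly as it was at <rev>,
    # the command-line counterpart of GitHub's "Browse files" at the parent.
    return ["git", "show", f"{parent_rev(fix_sha)}:{path}"]


def blame_vulnerable_lines(fix_sha: str, path: str, first: int, last: int) -> list[str]:
    # Blame only the noted lines, as they were in the parent commit.
    # --porcelain emits machine-readable author/committer metadata per line.
    return [
        "git", "blame", "--porcelain", "-L", f"{first},{last}",
        parent_rev(fix_sha), "--", path,
    ]


# Example usage inside a local clone (sha and path are placeholders):
#   import subprocess
#   out = subprocess.run(blame_vulnerable_lines(sha, path, 97, 98),
#                        capture_output=True, text=True, check=True).stdout
```

Building the argument lists separately from running them keeps the sketch testable without a clone. It also avoids shell quoting issues with unusual file names.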

GitHub also lets you browse the repository as it was at any point in time. So you can browse the repository as of the parent commit and navigate to the vulnerable file. Because you've gone backwards in time, that file should still contain the vulnerable code. Then there's GitHub's blame feature. It goes through every line of code in the file and links each line to the commit that made it what it is at that point in time. If you're familiar with it, you probably know exactly what I'm talking about.
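On the command line, `git blame --porcelain` produces the same information as GitHub's blame view in a parseable form. Each entry starts with a header of the form `<sha> <orig-line> <final-line> [<count>]`, and the source line itself follows on a TAB-prefixed line. Metadata such as `author` and `committer` is printed only the first time a commit appears. The sketch below assumes that format; `BlamedLine` and `parse_porcelain` are hypothetical names. It keeps author and committer separate, which is the distinction the talk relies on.

```python
import re
from dataclasses import dataclass

# First line of every porcelain entry: <sha> <orig-line> <final-line> [<group-size>]
HEADER = re.compile(r"^([0-9a-f]{40,64}) (\d+) (\d+)(?: \d+)?$")


@dataclass
class BlamedLine:
    line_no: int    # line number in the blamed revision
    sha: str        # commit that last changed this line
    author: str     # who wrote the change
    committer: str  # who committed (for example, merged) it
    summary: str    # first line of the commit message
    text: str       # the source line itself


def parse_porcelain(output: str) -> list[BlamedLine]:
    """Parse `git blame --porcelain` output into one record per source line."""
    commits: dict[str, dict[str, str]] = {}
    lines: list[BlamedLine] = []
    sha, line_no = "", 0
    for raw in output.splitlines():
        if raw.startswith("\t"):
            # A TAB introduces the actual source line for the current entry.
            info = commits.get(sha, {})
            lines.append(BlamedLine(line_no, sha, info.get("author", ""),
                                    info.get("committer", ""), info.get("summary", ""),
                                    raw[1:]))
            continue
        header = HEADER.match(raw)
        if header:
            sha, line_no = header.group(1), int(header.group(3))
            commits.setdefault(sha, {})
            continue
        # Metadata lines ("author alice", "committer bob", ...) appear only the
        # first time a commit shows up, so they are cached per commit.
        key, _, value = raw.partition(" ")
        commits.setdefault(sha, {})[key] = value
    return lines
```

Filtering the result with `[ln for ln in parse_porcelain(out) if ln.author != ln.committer]` surfaces lines that one person wrote and someone else merged. That is the same author-versus-committer situation the talk describes.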

If you're not familiar with these features, I'll walk through what they look like. Step one was to take note of the vulnerable file and the vulnerable lines of code. If you pull up the commit that fixes the vulnerability, whatever is highlighted in red is being removed and whatever is highlighted in green is being added. In diff notation, removed lines are marked with a minus and added lines with a plus. You can do the same thing with the git command-line program. So we've taken note of the file name and the vulnerable lines: it's this file, lines 97 and 98.

Step two: navigate to the parent commit. I've already described the tree structure. When you pull up this commit in GitHub, it says "1 parent" next to it, and you can click that to load the parent commit. Going back and forth: this is the commit we were looking at, which fixed the vulnerability, and this is its parent, which starts with 0ff9. You can see we've moved back in time. This should be the last commit in which the vulnerability is still present in the repository.

Now we can click "Browse files," which shows the entire repository at this vulnerable point in time. These are some screenshots of me navigating the repository at that point in time. You can open the file just as you normally would, but you'll see that it's still in the state of that vulnerable commit. I mentioned the blame feature, which goes through every single line of code and shows which commit put that line in its current state. When we use it, we see our vulnerable lines of code: the non-parameterized SQL statement we exploited. Over here, blame attributes these two lines to a particular commit, the "iform module" commit. Clicking it brings up the next screen, which shows that commit being introduced into the code base.

It says ofthal authored and Brady Miller committed. What does it mean that one person authored a commit and another committed it? This is common with something like a merge request. In this case, ofthal is ostensibly the person who wrote the code and submitted a merge request, and Brady Miller merged it. Knowing this, I'm most interested in the commits ofthal authored, because I think he's the one introducing these non-parameterized SQL statements. It might also be worthwhile to look at merge requests Brady Miller merged, since he didn't catch this one during review. That said, ofthal authored far less code in this project than Brady Miller merged. So to take the most targeted approach, we focus on the commits ofthal authored.

To recap: ofthal is the GitHub user who introduced this vulnerability into the OpenEMR code base. It isn't part of this presentation, but I repeated this process for every OpenEMR vulnerability that linked to a fixing commit. I found that 11 users had committed vulnerabilities to the OpenEMR repository, and in many cases a user had introduced more than one. That gave me a little more confidence that looking at known offenders would turn up novel vulnerabilities. In this case, I also attributed a second vulnerability from the CVE database to ofthal. So he had two publicly disclosed vulnerabilities, and if I remember correctly, both were due to non-parameterized SQL statements. I was starting to see that this is a thing he does (or she; I don't actually know).

Next: identifying the author's commits with weaknesses. Like I said, when I began this experiment, ofthal was definitely responsible for two CVEs in OpenEMR. Both of those vulnerabilities were SQL injections. But when I looked at the commits that introduced them, I also saw other weaknesses, like unencoded dynamic data being echoed into an HTML document, which could cause a cross-site scripting vulnerability. My hypothesis would lead me to believe that ofthal may have introduced encoding errors or non-parameterized SQL statements in other commits too. So let's take a look.

We've identified an author who writes vulnerable code; now we want to look at all of their other commits and see whether other vulnerabilities are lying there. I've included two ways to do it. One is through GitHub: go to the repository's commits endpoint and supply an author in the author query parameter. The browser shows a list of that person's commits, from most recent to oldest. If you prefer the command line, git log has an --author flag that limits the output to commits from that author. As we discussed earlier, focus on the author and not the committer. The committer might have merged a vulnerable merge request without having authored the vulnerable code.

Moving on: this is what it looks like when you view the author's commits in GitHub. If you load this URL, you get a list of their commits that you can click through on the website. And this is what it looks like with git log: the git log --author= command loads the interactive git log display. What I found more useful was to dump a plain list of commit hashes, which the git binary can also do. You pass --no-pager so it doesn't load the interactive display, and you set the output format; %H means just the commit hash. So this command dumps a list of all the commit hashes by that author. I piped that list into wc and found that ofthal had authored 98 commits. I then tweaked the format parameter to output links to those commits on GitHub instead of bare hashes. I did that because I like viewing commits in the web browser: the red and green sections are quick for me to read, and I think GitHub's overall diff experience is really good. If you're following this methodology, you can do whatever you want. You could view the commits on the command line, or script and automate this much more than I did. But with 98 commits, being able to see the red and green and just sit down and chug through them made the browser the easiest option for me.

I skipped ahead a little, so that's what I just talked about. All right: I now have a way to view all 98 of ofthal's commits, 98 links in total. Let me revisit exactly what I'm going to look for. As I said before, this author is responsible for two published vulnerabilities. Let's look at those, so we know what we're looking for as we go through these 98 commits.

This is the first one, which we looked at earlier: a non-parameterized SQL statement that led to a SQL injection vulnerability. For those of you who aren't used to reading code, here's why it's a vulnerability. These are PHP variables holding possibly dynamic values, and they're simply concatenated into the SQL statement. A parameterized query would instead contain placeholders: it would look like this statement, but with question marks in those locations.
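To make the concatenated-versus-parameterized contrast above concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the PHP and MySQL code in the talk. The schema, table names, and payload are hypothetical. An INSERT INTO statement is used because the talk returns to non-parameterized INSERT INTO statements. The payload is a generic SQLite illustration, not the MySQL exploit the speaker used.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: a secret worth stealing and a table the app writes to.
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE patients (name TEXT, ssn TEXT);
        INSERT INTO patients VALUES ('John Doe', '000-00-0000');
        CREATE TABLE audit (provider_id TEXT, note TEXT);
        """
    )
    return db


def log_view_unsafe(db: sqlite3.Connection, provider_id: str) -> None:
    # Weakness: the input is concatenated into the SQL text, so a quote in it
    # can end the string literal and change the structure of the statement.
    db.execute("INSERT INTO audit (provider_id, note) VALUES ('" + provider_id + "', 'viewed')")


def log_view_safe(db: sqlite3.Connection, provider_id: str) -> None:
    # Parameterized: the ? placeholder is bound as data and never parsed as SQL.
    db.execute("INSERT INTO audit (provider_id, note) VALUES (?, 'viewed')", (provider_id,))


if __name__ == "__main__":
    # Closes the string, substitutes a subquery for the note, and comments out the rest.
    payload = "x', (SELECT ssn FROM patients)) -- "
    for log_view in (log_view_unsafe, log_view_safe):
        db = make_db()
        log_view(db, payload)
        print(log_view.__name__, db.execute("SELECT provider_id, note FROM audit").fetchall())
```

With the unsafe version, the stored note becomes the patient's SSN. With the parameterized version, the whole payload is stored verbatim as an ordinary provider ID.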

Additionally, this is the second vulnerability they were responsible for. It also turned out to be a SQL injection, and it likewise presents as a SQL statement with dynamic data simply concatenated into it, rather than properly parameterized. From the code segment I've shown you alone, this may or may not be vulnerable. If I had the whole file up, though, you'd see that the file name comes directly from a query parameter supplied by the user, which makes it clearly vulnerable.

Those are the parts of their commits that contributed to their known vulnerabilities. But I also noticed other things while looking at their commits, like this: they're echoing a possibly dynamic value straight into HTML. That's a potential vector for a cross-site scripting vulnerability. Having seen this, I'm going to look for two things in ofthal's commits. The first is non-parameterized SQL statements. The second is unencoded output dumped into an HTML document, JavaScript, or anywhere else unencoded output shouldn't go.

So now I have 98 commits to examine, and I know what I'm looking for, so I just did it: I sat down and chugged through them, which took about two hours. I only looked at the red and green sections, because those are the diffs; on the command line you could look at just the diffs. For every addition, I checked whether it introduced a non-parameterized SQL statement or an echo of dynamic data into HTML, like the one above.

i didn't do any deep comprehension of the code going on i just looked for these two pieces of things so that's why i was able to do 98 commits in two hours just because i'm not really understanding the code i'm not taking the time to do that i'm literally just looking for these two weaknesses that i know this author has done in the past so i was able to chug through it and um when i find a weakness i'm not even examining it for a vulnerability literally all i did i just chugged through it and uh if i noticed the weakness in that commit i just made a note and so at the end of

it i had identified 14 commits that had obvious weaknesses and if you remember uh earlier a weakness is just a coding error that might lead to a vulnerability um and then i just have these 14 commits that uh are much more manageable than 98 commits and so i can take now after doing this basic triaging can take a look at these 14 commits and kind of examine them further for uh whether or not they actually contain exploitable vulnerabilities so just to give you an idea of other things that that might not have been vulnerable code but things i tagged as weaknesses um i kind of have this this is actually one of the commits that i tagged as a

weakness um it might not have eventually been vulnerable for whatever reason but you'll basically see that there's a function here that um as part of the function it takes a stock id parameter and it kind of assembles this url by just merely concatenating the doc id with no no encoding it's just all of these weaknesses are kind of hinting that this author doesn't necessarily understand the value of encoding or the importance of it in terms of security um but somebody could if they could manipulate this stock id could completely change the meaning of this url something as simple as possibly adding an app per an at symbol and making this all be a username password combo and then putting

a different malicious url afterwards something like that i don't know um i don't think this turned out to be exploitable but this is something i would have tagged as a weakness um all right so now we've we've identified the author we've created a way to find all their commits um we've actually went through the commits and tagged a subset of them as containing weaknesses that were similar to what we had already seen from the author's coding style um so now let's just see how did this how did this turn out in reality um so discovering novel cves is the next part so i identified those 14 week commits now it's time to actually take a closer

look at each one um so this is the part it's kind of out of scope for this talk but uh i'm i've spent a lot of time as a developer so i'm very comfortable just like reading code but that doesn't mean you have to do it this way you could use the same methodology and when you get to this part you could just dump these files into i don't know a static analysis tool or something like that or take note of the files and just if you're comfortable doing dynamic web app testing just dynamically test that portion of the web app um so here me personally i prefer to just do the static analysis because it's

just quick for me to read code um but basically this is a this is a commit that i tagged as a weakness um and the commit basically boiled down to this section of code here uh i edited this for the presentation like these lines were interspaced but there was a bunch of random other code that wasn't pertinent to the vulnerability between some of these lines um but so for example here you have provider id which is if you're unfamiliar with php code what this line is doing is this is taking content from a query parameter in a url and just storing it in provider id so this is complete user input um it's not sanitized or

modified by the by the program it's not encoded in any way um encoding would be the better option but basically then you have the sql statement and what it does is it kind of pulls values from the database and then starts generating this long strange concatenated string with commas and parentheses around it eventually what you see is that what it's doing is dynamically generating a string containing values to insert into the database somewhere else and what do you know one of those values is actually the uh user input string that we can just input into a query parameter so just by seeing this i know there's an exploitable vulnerability i don't even really need to test it at

this point in time to know that this weakness is actually a vulnerability but it's always good to test it anyways but we already know what the what the query parameter is going to be the query parameter is going to be provider id so this is also kind of out of scope for this this talk but i was able to find this exploit string that um was able to since it since it was in an insert into statement the the method i used to exploit this was using an error-based sql injection so basically i used this update xml function to generate an error that it would then dump to the the web page so in this payload

What I'm doing in the payload is calling the version function and trying to add its result to a piece of XML, but in a way that causes an error, so the version is output as part of the error message on the screen when the query fails. So there you go: we've looked at a commit, identified a vulnerability, and now it's a CVE. It turned out to be CVE-2019-16404, so we discovered a novel vulnerability and filed it. The description was: authenticated SQL injection in this file in OpenEMR, up to what was then the latest version, allows you to extract arbitrary data from the database via a non-parameterized INSERT INTO statement, as demonstrated by the provider ID parameter. Not only were we looking for vulnerabilities, this is exactly the vulnerability we were looking for. We knew that this person had committed weaknesses with non-parameterized INSERT INTO statements, and boom, it took us right there. There wasn't much guesswork involved.

I think that was the only SQL injection that came out of these tagged weakness commits. But remember how we noticed earlier that they just echoed some values to HTML without any encoding? Well, here's another commit where they just echo this pid value, and this too is controlled by a query parameter; I believe you could set the value of pid through the URL. It's probably on the next page. Oh yeah, so that was a novel vulnerability, and it resulted in CVE-2019-16862. You can see here it's reflected XSS, and it allows you to execute arbitrary code in the context of a user session via the pid parameter. So again, exactly what we were looking for, where we were looking for it. Here's another echo of unencoded data; this resulted in another reflected XSS via a query parameter.
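The two weakness patterns found in this author's commits, user input concatenated into an INSERT INTO statement and a query parameter echoed into HTML without encoding, can be contrasted with their fixes in a short sketch. It is Python rather than OpenEMR's PHP, with SQLite standing in for MySQL; the table, columns, function names, and payloads are hypothetical. The injected value here lands in a stored column, which is a simpler channel than the error-based UpdateXML technique used in the talk.

```python
import html
import sqlite3

# Hypothetical schema; SQLite stands in for OpenEMR's MySQL database.
def new_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (provider_id TEXT, note TEXT)")
    return conn

# Weakness 1: user input concatenated into an INSERT INTO statement.
def insert_unsafe(conn: sqlite3.Connection, provider_id: str) -> None:
    # VULNERABLE: provider_id is spliced into the SQL text itself.
    conn.execute("INSERT INTO notes (provider_id, note) VALUES ('"
                 + provider_id + "', 'hello')")

def insert_safe(conn: sqlite3.Connection, provider_id: str) -> None:
    # FIX: the value is bound as a parameter and never parsed as SQL.
    conn.execute("INSERT INTO notes (provider_id, note) VALUES (?, 'hello')",
                 (provider_id,))

# Weakness 2: a query parameter echoed into HTML without encoding.
def render_unsafe(pid: str) -> str:
    # VULNERABLE: reflected XSS if pid contains markup.
    return "<div>Patient: " + pid + "</div>"

def render_safe(pid: str) -> str:
    # FIX: HTML-encode the value before writing it to the page.
    return "<div>Patient: " + html.escape(pid) + "</div>"

# The quote closes the string literal, a subquery fills the note column,
# and "--" comments out the rest of the original statement.
sqli_payload = "1', (SELECT sqlite_version())) --"
xss_payload = "<script>alert(1)</script>"

unsafe_db, safe_db = new_db(), new_db()
insert_unsafe(unsafe_db, sqli_payload)
insert_safe(safe_db, sqli_payload)

print(unsafe_db.execute("SELECT provider_id, note FROM notes").fetchall())
print(safe_db.execute("SELECT provider_id, note FROM notes").fetchall())
print(render_unsafe(xss_payload))
print(render_safe(xss_payload))
```

The unsafe insert ends up storing the SQLite version string in the note column, while the parameterized insert stores the payload verbatim as a provider ID. Likewise, only the escaped render turns the script tag into inert text.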

That is basically an overview of how we identified an author who had committed known vulnerabilities, how we examined all of their commits, and how we identified some previously undiscovered vulnerabilities ourselves. Rather than have this be a manual process, it would definitely benefit from some automation. I did actually start to automate this, but then I went down a wormhole of using machine learning to try to identify vulnerable commits. But you can kind of use some of the Git... okay, so backing up. First of all, if you know what repository you're trying to attack, you might be able to parse that out of some of the...

There's something called the National Vulnerability Database, which is basically a database of all of these CVEs, and what you can do is parse it. While parsing it, whenever a CVE had a reference pointing to one of 36 or so known open source repository hosts, I marked it as possibly being an open source project that you could use this method on. If those references were actually links to commits that fixed vulnerabilities, the method applies. Doing that, I was able to identify about 3,000 or more open source repositories that this method might work on.
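The pipeline described here, from NVD references to fix commits, then to the author of the code each fix touched, then to everything that author committed, can be sketched in Python. This is an illustration under stated assumptions, not the code from the talk's repository: it assumes the legacy NVD JSON 1.1 feed layout, recognizes only GitHub and GitLab commit URLs, and uses `git blame` on the parent of the fix commit to find who wrote the patched lines. The function names and the sample record are hypothetical.

```python
import re
import subprocess
from collections import Counter, defaultdict

# Assumption: only GitHub and GitLab commit URLs are recognised here. The
# talk's parser matched about 36 hosts, which are not listed in the talk.
COMMIT_URL = re.compile(
    r"https?://(?:www\.)?(?P<host>github\.com|gitlab\.com)/"
    r"(?P<repo>[^?#]+?)/(?:-/)?commit/(?P<sha>[0-9a-fA-F]{7,40})\b"
)

def fix_commits(nvd_feed: dict) -> dict:
    """Map 'host/owner/repo' -> [(cve_id, fix_sha), ...].

    Assumes the legacy NVD JSON 1.1 layout: CVE_Items[].cve.CVE_data_meta.ID
    and CVE_Items[].cve.references.reference_data[].url.
    """
    repos = defaultdict(list)
    for item in nvd_feed.get("CVE_Items", []):
        cve = item.get("cve", {})
        cve_id = cve.get("CVE_data_meta", {}).get("ID", "?")
        for ref in cve.get("references", {}).get("reference_data", []):
            m = COMMIT_URL.match(ref.get("url", ""))
            if m:
                repos[f"{m['host']}/{m['repo']}"].append((cve_id, m["sha"].lower()))
    return dict(repos)

def blame_authors(line_porcelain: str) -> Counter:
    """Count author emails in `git blame --line-porcelain` output."""
    authors = Counter()
    for line in line_porcelain.splitlines():
        # Source lines are prefixed with a tab, so they never match here.
        if line.startswith("author-mail "):
            authors[line[len("author-mail "):].strip("<>")] += 1
    return authors

def git(repo_dir: str, *args: str) -> str:
    return subprocess.run(["git", *args], cwd=repo_dir, check=True,
                          capture_output=True, text=True).stdout

def suspect_authors(repo_dir: str, fix_sha: str, path: str,
                    start: int, end: int) -> Counter:
    # Blame the pre-fix revision (fix_sha^); start/end are line numbers in
    # that older version of the file, i.e. the lines the fix replaced.
    return blame_authors(git(repo_dir, "blame", "--line-porcelain",
                             "-L", f"{start},{end}", f"{fix_sha}^", "--", path))

def commits_by(repo_dir: str, author_email: str) -> list:
    # Every commit by that author, to review manually or feed to a scanner.
    # --fixed-strings stops git from treating the email as a regex.
    return git(repo_dir, "log", "--fixed-strings", f"--author={author_email}",
               "--format=%H %s").splitlines()

sample = {"CVE_Items": [{"cve": {
    "CVE_data_meta": {"ID": "CVE-0000-0001"},  # placeholder, not a real CVE
    "references": {"reference_data": [
        {"url": "https://github.com/example-org/example-repo/commit/a1b2c3d4e5f6"},
        {"url": "https://example.com/advisory"},
    ]}}}]}
print(fix_commits(sample))
# {'github.com/example-org/example-repo': [('CVE-0000-0001', 'a1b2c3d4e5f6')]}
```

One limitation of the blame step: a fix that only adds lines (for example, a missing escaping call) removes nothing to blame, so in that case you would blame the surrounding context lines instead.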

Additionally, something I haven't done yet: rather than trying to use machine learning to identify vulnerable commits, you could follow this process and, instead of manually inspecting the commits, hook it up to a static analysis tool or something like that. You might be able to get away with automating it that way and make it a lot better, so maybe I'll do that in the future. But yeah, that's about it. I have some acknowledgments for my clip art. Are there any questions?

I don't see any right now, but we do have a channel for your talk in our Discord if you want to hang out there for a little bit. Maybe someone will pop in for you.

Okay, yeah, I'll just hang out in the Discord.

Thank you very much, that was a very interesting talk. It was really cool.

All right, no problem. Thank you, and thanks for having me. Enjoy the rest of the conference.

You too.
