Bugs Are Shallow: Finding Vulnerabilities In Top GitHub Projects

Name: Bugs Are Shallow: Finding Vulnerabilities In Top GitHub Projects
Uploaded: 2024-02-06
Duration: 31 min 30 s

BSides London · 31:30 · 217 views · Published 2024-02

Speakers

Laurence Tennant

Tags

Category: Research Technical

Topic: OWASP Vulnerability Research Web AppSec

Team: Red

Research: Empirical Research Methodology

Style: Talk

About this talk

Laurence Tennant explores "wide" vulnerability hunting—systematically searching for a single vulnerability class across millions of open-source projects rather than conducting deep audits on individual codebases. Using mass-assignment vulnerabilities in Node.js frameworks as a case study, he demonstrates how to scale vulnerability discovery with tools like GitHub's code search, challenges Linus's law about code quality in popular projects, and discusses the practical barriers to responsible disclosure at scale.

Transcript [en]

it was about hacking chemical plants, which looked really interesting, so thank you for coming along to this track and not that one. So first, a quick intro to myself. I first really caught the security bug when I was playing Cyber Security Challenge UK online. It had a number of different fun puzzles that you could solve, and from there I ended up meeting a lot of other people who were into the same thing, and we formed a CTF team called Crown, which for a few years was the strongest capture-the-flag team in the UK. I co-founded a website called CryptoHack, where you can play

lots of fun challenges for learning cryptography, and I currently work as an application security consultant at a company called Include Security. So, the structure of this talk. First of all, I'm going to talk a bit about deep versus wide vuln hunting and what I mean by that. We're going to look at Linus's law, which is kind of a backdrop to everything that we're going to be talking about today. Next, I'm going to compare methods for hunting down vulnerabilities across thousands or even millions of open-source projects at the same time, ranking which ones have been more effective than others. Then we're going to look into mass assignment, which is the

vulnerability class that I decided to narrow in on, and I will talk about why at that point. Then we're going to deep dive into the discovery and exploitation of a particular vulnerability that I found among the top most-starred GitHub projects, and finally there'll be takeaways for both attackers and defenders from this research. So bear with me here, this is quite abstract. Normally when I do pen testing as part of my job, I think of it as being quite a deep activity, where you look at a single project or a group of related projects and you really try and understand it very well

you look at the business logic and you try and find as many different types of vulnerabilities as you can in those projects. But I wanted to try something different, change up the gears a bit, and do something which I would call wide vulnerability hunting, where I specialize in just one particular vulnerability class and then try and search for it across as many different codebases as I can, and see if I get lucky and score some hits. So I don't need to understand any particular codebase very well. I just want to check if there are exploits out there anywhere, because they

probably are, if you look at enough different codebases. The silly metaphor that I came up with for this was the type of industrial fishing known as trawling. It's obviously very environmentally unfriendly, but basically you take a massive net and you drag it along the bottom of the ocean. You're normally looking to catch some nice big fish, but as a result you're going to get some bycatch, the wrong species or some smaller fish, and you have to chuck those back in the ocean. Those will be the false positives. But the idea is that you're going to get enough of the good

fish that you want, which would probably be uneconomical to find via other methods. And the only ocean that really makes sense to fish in is GitHub. It's absolutely enormous now, with over 200 million open-source repos. Something that's always interested me is Linus's law. This is an idea put forward by Linus Torvalds, which is that given enough eyeballs, all bugs are shallow. What he means by that is that the larger the community around a project, the higher quality the code will normally be: the fewer bugs it should have and the fewer security vulnerabilities it should have, because you'll have more experts

who are able to contribute. The corollary of that, of course, is that open-source software is better than closed-source software, because open-source software can be viewed and read by anyone. If you actually look at the evidence about this, it's quite conflicted about whether that's true or not. One paper I found couldn't find any real empirical evidence to support Linus's law, whereas another one looked specifically at Google's projects and did find that Google projects that had more stars on GitHub tended to have higher-quality code, fewer security vulnerabilities and quicker bug fixes than the

smaller projects. So how are we actually going to search for vulnerabilities on GitHub? The obvious way is to find some kind of known bad functions or bad strings which often lead to security vulnerabilities, and then find a way to search for them across every single project on GitHub. That's the obvious starting idea. I've got some payloads here. You're probably familiar with things like remote code execution: shell_exec in PHP, os.system in Python. Normally these can lead to bad things happening. Another obvious one is SQL injection: you could write regexes which will find how SQL injection exploits can occur in various

languages. But for this research I wanted to avoid things like SQL injection, because any developer worth their salt these days is aware of it and normally tries to design against it. Of course you still find them now and then, but I've had a lot of success in finding things like unsafe deserialization, XXE vulnerabilities, mass assignment, that kind of thing. These can be very high-impact vulnerabilities, but developers are less aware of them and often forget to protect against them, so they seem like the right kind of thing to look for here. So now we've got a rough idea about how we want to search for things.
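A minimal sketch of the "known bad function" search idea described above, run against a single source string rather than all of GitHub. The rule names, the regexes and the sample snippet are illustrative assumptions, not a vetted rule set and not the payloads shown on the speaker's slides.

```javascript
// Hypothetical rule list: each entry pairs a language with a regex for a
// call that often (not always) indicates command execution.
const DANGEROUS_CALLS = [
  { name: 'php-shell-exec', language: 'php', regex: /\bshell_exec\s*\(/ },
  { name: 'python-os-system', language: 'python', regex: /\bos\.system\s*\(/ },
];

// Return one finding per (line, rule) match for the given language.
function scanSource(source, language) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const rule of DANGEROUS_CALLS) {
      if (rule.language === language && rule.regex.test(text)) {
        findings.push({ rule: rule.name, line: index + 1, text: text.trim() });
      }
    }
  });
  return findings;
}

// Made-up sample file content used only to exercise the scanner.
const sample = [
  'import os',
  'def ping(host):',
  '    os.system("ping -c 1 " + host)',
].join('\n');

console.log(scanSource(sample, 'python'));
```

A match only says the call is present; whether attacker input reaches it still has to be checked by hand, which is the false-positive "bycatch" the speaker mentions.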

We need to find a tool for actually doing the searching: the search engine. The obvious one is GitHub search. This has now changed, but when I was first starting this research last year, the first port of call was obviously just to use GitHub search, and it was actually really bad for this. Not just for this; it was a really bad search in general. It didn't recognize symbols, so if you searched for code it wouldn't understand dots or brackets or equals signs or anything. It was just a keyword-based search. It didn't have any kind of regular expression support, and worst of

all, probably, there was no sorting by relevance. On GitHub you have millions and millions of boilerplate repos, "my first project", that kind of thing, and you don't want to be searching through thousands of those when trying to find security vulnerabilities in active and fairly big projects. Earlier this year GitHub massively improved their search. It now recognizes symbols, you can search for literal code strings across all of GitHub very fast, and it now has regular expression support. But it still has the big problem of not really being able to find relevant projects and relevant results, and here we're just looking for big GitHub projects,

not random stuff that people have just thrown out there. An example of this: this is using GitHub search today with a mass assignment payload. I'm going to explain a bit more how that works later, because it's not the most well-known bug class. Here it's like, great, we can find some matches, these projects are probably vulnerable. But when you actually look at them, you find this one was abandoned 7 years ago and has a small number of stars. It's not really the kind of big fish that we want to catch. So the next thing that I got really excited about was Google's Big

Query. BigQuery is a data warehouse as a service that basically allows you to run SQL queries over enormous datasets, and they actually had a 3-terabyte dataset of all of GitHub's data on it. When I saw this I thought, great, this looks like exactly what I need. And this hasn't escaped the attention of some other security researchers, such as this guy sh sh lemper; if you're into application security, his blog is fantastic. Here you can see he's doing a select on repos where the language name is PHP and where the content contains the function that he's looking for, and ordering it by the repo watch count. So great, now we can search for

particular patterns and we can sort by the popularity of the repo, and potentially find some interesting stuff in some pretty big GitHub repos. But there were two massive problems with BigQuery for this. I found that the dataset hadn't been updated since 2016, and it's also really expensive. Google give you $300 in free trial credit, which is very generous of them, but the very first query that I ran against this dataset cost me over $200, so obviously this was going to become too expensive very quickly. So the next idea I had was to write a script which would use

GitHub's API to pull down a large number of the top most-starred GitHub repos, probably in a particular language. I was thinking, how about I just download the top 10,000 repositories that are written in Python and then locally run some static analysis tools against them? I could start off with some default rule sets, but then probably write some custom rule sets of my own based on my own particular vulnerability class, and hopefully find some cool stuff that everyone else has missed. My two favorite static analysis tools are CodeQL and Semgrep, and I'll briefly talk about them here, starting with CodeQL.
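The download step described above could start from something like the following sketch, which lists the most-starred repositories for one language through GitHub's REST repository search endpoint. The function names are hypothetical and the speaker's actual script is not shown in the talk. The sketch assumes Node 18 or later for the built-in `fetch`, and it only lists repositories; cloning them and running the static analysis tools would be separate steps.

```javascript
const SEARCH_ENDPOINT = 'https://api.github.com/search/repositories';

// Build the search URL for one page of results, most-starred first.
function buildSearchUrl(language, page, perPage = 100) {
  const params = new URLSearchParams({
    q: `language:${language}`,
    sort: 'stars',
    order: 'desc',
    per_page: String(perPage),
    page: String(page),
  });
  return `${SEARCH_ENDPOINT}?${params.toString()}`;
}

// Fetch `pages` pages of results and keep only the fields needed for cloning.
// Not called here, because it needs network access (and, in practice,
// probably an access token to avoid tight rate limits).
async function fetchTopRepos(language, pages) {
  const repos = [];
  for (let page = 1; page <= pages; page += 1) {
    const response = await fetch(buildSearchUrl(language, page), {
      headers: { Accept: 'application/vnd.github+json' },
    });
    if (!response.ok) {
      throw new Error(`GitHub API returned ${response.status}`);
    }
    const body = await response.json();
    for (const item of body.items) {
      repos.push({
        name: item.full_name,
        stars: item.stargazers_count,
        cloneUrl: item.clone_url,
      });
    }
  }
  return repos;
}

console.log(buildSearchUrl('python', 1));
```

As far as I know, the search endpoint limits how many results a single query can page through, so reaching the top 10,000 would likely mean splitting the query into star-count ranges; that part is left out of the sketch.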

CodeQL basically reads a codebase and translates it into a database which can be queried in some quite powerful ways. It understands control flow between different functions, so you can actually do taint tracking analysis and source-to-sink flows: you can write queries that look for places where attacker input can flow in the back end of an application all the way to a vulnerable function. It's a brilliant tool, but it has a very steep learning curve. I spent a few days trying to get the results that I wanted and

wasn't super successful with it. It's also not open source. It's actually owned by GitHub now and it's part of their Advanced Enterprise things. You are allowed to run it against open-source projects, but you are currently not allowed to run it against closed-source ones. And it has another big limitation: if you're using it with a language like C++, you need to be able to build the codebase for it to make the database. One thing I would say as a side note, though, is that GitHub has made it very easy now, if you have an open-source project on GitHub, to add CodeQL as part

of the CI/CD. If you go into settings it's just like a two-button thing, and they will run the default rule sets, and for certain languages, particularly Python, it's really good. I don't see many open-source projects actually doing this, and I think it's a quick win which you shouldn't turn down. The other one is Semgrep. I'm a big fan of Semgrep, like a lot of security engineers and researchers, just because it's got great defaults. Straight out of the box you can run it on projects and it will figure out which rule sets to use, and

it's very easy to write custom rules as well. But unless you do some customization of it, it's likely to find quite a lot of false positives. In the end, I think if I was going to do this research again I'd probably choose this approach, this number four approach, of downloading repos locally and running static analysis tools on them. But at the time I actually just moved on from this and thought that I would try and find something a little bit easier. The thing that I eventually found, which kind of felt like hitting the jackpot, was this third-party GitHub searching tool called grep.app, and the thing that's really

great about it is it's similar to the current GitHub code search, but they've only indexed the top half a million GitHub repos. That means if you search for vulnerable code patterns here, you're very likely to find quite important repos and valid results, without having to sift through a lot of cruft and starter projects. So yeah, grep.app, perfect. We've found the search engine that we need. Now I wanted to choose the vulnerability type. I want something that's easy to search for, where I can put queries in quickly and find this type of vulnerability. I want something that developers are largely

unaware of, and I want something that often leads to high-impact vulnerabilities. The one I chose for this is mass assignment. I'm going to do a quick refresher on what mass assignment vulnerabilities are in the context of API security. You've got the OWASP definition there, and although it's correct, I find that it's one of those things that's just best explained with an example, so I'm not even going to read it out. On the right there is a kind of pseudocode, a very basic idea of what a web application vulnerable to mass assignment would look like. At the top we have the API POST route for users

register. If you make a POST request to that route, it calls the register user handler. The register user handler takes the POST request body and saves it into the database as a new user object. The user model itself has three properties on it: username, email and role, and role has a default value of user, so by default, if you register, you should be a user. The expectation from the developers of this fake application is that when a POST is made from the front end to /users/register, it will have a name

and an email in it, and then a new user will be created. But there's a huge problem with this code, which is that an attacker can add an additional property to their request body, with the role set to admin, and there's nothing in the code stopping them from doing that, because the register user handler is taking the entire request.body and persisting it to the database as a new user. So at the core of mass assignment is the idea that certain properties of objects should be allowed to be changed from inputs specified by the user and certain shouldn't, and the developers have forgotten to filter.
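The slide's pseudocode is not part of the transcript, so the following is only a reconstruction of the pattern as described: a registration handler that persists the whole request body, and a user model whose `role` defaults to `user`. The in-memory `users` array and the function names are hypothetical stand-ins for a real database and web framework.

```javascript
// In-memory stand-in for the users table (hypothetical).
const users = [];

// User model: three properties, with role defaulting to 'user'.
// Spreading `fields` last lets any supplied property override the defaults.
function createUser(fields) {
  const user = { username: '', email: '', role: 'user', ...fields };
  users.push(user);
  return user;
}

// Vulnerable handler: the entire request body becomes the new user.
function registerUser(requestBody) {
  return createUser(requestBody);
}

// What the front end is expected to send.
const normalUser = registerUser({
  username: 'alice',
  email: 'alice@example.com',
});

// What an attacker can send instead: one extra property.
const attackerUser = registerUser({
  username: 'mallory',
  email: 'mallory@example.com',
  role: 'admin',
});

console.log(normalUser.role, attackerUser.role);
```

Nothing in `registerUser` distinguishes properties the client may set from properties it may not, which is the whole bug.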

That is, they've forgotten to actually filter for the properties which you shouldn't be allowed to change. The most famous example of a mass assignment vulnerability was about 10 years ago now, with a researcher called Egor Homakov. He was trying to show the Ruby on Rails team why they were insecure by default, and they weren't really listening to him, so he actually found a mass assignment vulnerability that allowed him to commit code to any repository on GitHub. The vulnerability was against the public key model: the public key model had a user property, and he was able to take his own public key and make the user property

point to the Rails GitHub user, so that he could use his own public key to commit code to the Rails repository. Partly what I found is that mass assignment, due to incidents like that, is actually quite well known in the Ruby on Rails community. There are plenty of warnings about it, and their Brakeman static analysis scanner throws up loads of alarms if it thinks that you're doing it. So while Ruby on Rails was originally the web framework most vulnerable to mass assignment, these days it's probably one of the most secure when it comes to this particular vulnerability class. However, it's much less widely discussed in

Node.js. So again, in terms of getting the most value out of this research, I thought using this mass assignment vulnerability against the Node.js frameworks that are out there was very likely to result in some good findings. My next step was to actually look at the different frameworks in Node.js, and there's about a billion of them, and find how they could possibly express these mass assignment vulnerabilities. There are a number of different ways that they persist data and properties to user models. You've got findOneAndUpdate. You've got LoopBack, which uses updateAttributes, but then they have separate ACLs for deciding

which users can affect which properties. Then you have AdonisJS, where they have about 10 different functions that could be vulnerable. So I basically went through and listed all of these different patterns, and then just started plugging them into grep.app and seeing if they came up with anything. It didn't take too long before I had a hit with user.updateAttributes on freeCodeCamp, and I got quite excited when I saw this, because something that I'd found earlier is that freeCodeCamp is actually the most-starred repository on GitHub, with about 380,000 stars. So any finding there would

automatically be interesting, at least. So what is freeCodeCamp? It's kind of like an online boot camp for learning programming. It's a nonprofit based in San Francisco with about 40 employees. Looking a bit closer at the actual finding, it turned out to originate in this update user flag function. It's a little bit confusing to see, but this line 208 here is setting a constant, updates, to request.body. So it's really similar to the example that we were looking at earlier, where the request body is then plugged straight into this user.updateAttributes function without any kind of filtering.
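The indirection described above (a constant assigned from the request body, then passed to the update call) is the kind of thing a single literal search string can miss. One possible way to handle it is a two-step regex heuristic, sketched below. This is an assumption for illustration, not what grep.app does and not the speaker's method; the sink names come from the frameworks mentioned in the talk, and both sample handlers are made up rather than taken from freeCodeCamp.

```javascript
// Update helpers mentioned in the talk, treated here as "sinks".
const UPDATE_SINKS = ['updateAttributes', 'findOneAndUpdate'];

// Escape a string so it can be embedded literally in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function findBodyToSinkFlows(source) {
  // Step 1: collect names that hold the whole request body, including
  // variables assigned directly from req.body or request.body.
  const aliases = new Set(['req.body', 'request.body']);
  const assignment = /\b(?:const|let|var)\s+(\w+)\s*=\s*req(?:uest)?\.body(?![\w.])/g;
  for (const match of source.matchAll(assignment)) {
    aliases.add(match[1]);
  }

  // Step 2: flag sink calls whose argument list mentions one of those names
  // as a whole value (so `req.body.email` does not count).
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const sink of UPDATE_SINKS) {
      const call = new RegExp(`\\.${sink}\\s*\\(([^)]*)`).exec(text);
      if (!call) continue;
      for (const alias of aliases) {
        const mention = new RegExp(`(^|[^\\w.])${escapeRegExp(alias)}($|[^\\w.])`);
        if (mention.test(call[1])) {
          findings.push({ line: index + 1, sink, alias });
        }
      }
    }
  });
  return findings;
}

// Made-up handler with the same shape as the pattern described in the talk.
const vulnerableSample = [
  'function updateFlags(req, res, next) {',
  '  const updates = req.body;',
  '  return req.user.updateAttributes(updates, next);',
  '}',
].join('\n');

// Made-up handler that only forwards one named property.
const saferSample = [
  'function updateEmail(req, res, next) {',
  '  return req.user.updateAttributes({ email: req.body.email }, next);',
  '}',
].join('\n');

console.log(findBodyToSinkFlows(vulnerableSample));
console.log(findBodyToSinkFlows(saferSample));
```

This is far weaker than the taint tracking a tool like CodeQL performs: it ignores destructuring, reassignment and calls that span several lines, so it is only a triage aid for search results.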

Nothing restricts which properties are actually being saved to the user. And this was behind an API route, a PUT route, /update-user-flag. So this looked quite promising. The next step is to work out whether there are actually any properties of the user that are sensitive, that we probably shouldn't be able to modify. So I found where the user model is defined in the freeCodeCamp code and, yeah, there are quite a few interesting fields here. There's no concept of an admin user in freeCodeCamp. However, all the certificates which you earn from studying and doing coursework on the website are simply just stored

as part of the model. So you have isBackEndCert and isFullStackCert, for example. Ultimately, all that the exploit required was making a PUT request to this endpoint and setting all of these properties which I shouldn't be able to set. Obviously I also had to set isCheater false and isHonest true as part of that. So I sent off that request and looked at my profile, and next thing I knew I had a whole bunch of certificates signed by Quincy Larson, the head of freeCodeCamp itself. So yeah, there are some of the

other certifications I had on my profile, which I worked out amounted to over 6,000 hours' worth of studying, which I didn't do. I reported this to their team and they were really good at fixing it. The way they decided to fix it was by listing all of the keys, the properties of the user which you should be allowed to modify, and then filtering out any properties in the request which were not part of that list. So quite a simple fix, and one which works well. We're getting near the end here; I don't know if I've been a bit faster than expected. Next, the takeaways.
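The allow-list fix described above can be sketched as follows. The list of permitted keys and the sample user are hypothetical, and this is not freeCodeCamp's actual patch, only the general shape of filtering a request body against a fixed set of keys before saving it.

```javascript
// Hypothetical list of properties a user may change about themselves.
const ALLOWED_KEYS = ['name', 'location', 'theme'];

// Copy only allow-listed keys that are actually present in the body.
function pickAllowed(body, allowedKeys) {
  const updates = {};
  for (const key of allowedKeys) {
    if (Object.prototype.hasOwnProperty.call(body, key)) {
      updates[key] = body[key];
    }
  }
  return updates;
}

// Stand-in for a stored user record.
const storedUser = { name: 'mallory', theme: 'light', isBackEndCert: false };

// Request body mixing a legitimate change with a forbidden one.
const requestBody = { theme: 'dark', isBackEndCert: true };

Object.assign(storedUser, pickAllowed(requestBody, ALLOWED_KEYS));
console.log(storedUser);
```

Listing what is allowed, rather than what is forbidden, means a sensitive property added to the model later is protected by default.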

The takeaways for the defense. First of all, I just wanted to talk about how you prevent mass assignment. You saw one example just there, of how freeCodeCamp filtered with an allow list of properties, but there's actually a second way that I prefer, which I see being done by lots of developers in a more modern way, which is that you write specific API routes to modify each particular property. So you would have an API route just to update a user's email address, one to modify their username. You don't do it all in one function and then try and filter later. It's just a safer

pattern, even if it requires a little bit more repetition. The other thing that I suggested earlier was that GitHub maintainers can catch low-hanging fruit by enabling CodeQL scanning. It's now super easy to do in GitHub, and although I don't think it would have caught this particular vulnerability, they're always improving the CodeQL rules and it's likely that this kind of thing would be caught by it eventually. So, the takeaways for the attacker. This is very anecdotal evidence, but does it disprove Linus's law? This particular vulnerability had been there for 5 years in freeCodeCamp, and it makes you think how many eyeballs are actually

looking. Of course, although this particular repo has a lot of stars, most of them are probably from beginners who are using freeCodeCamp to learn, and not actual open-source contributors who want to modify freeCodeCamp itself. So it's very much anecdotal evidence for the point. But it does make me think there are loads of bugs out there. There's tons of stuff that you can find on GitHub. When I was doing those searches on grep.app, all kinds of stuff was popping out. This was just the most interesting one to talk

about today. And finally, GitHub's code search has significantly improved from where it was before, but still, if you're getting frustrated using something like GitHub search or various other platform-provided tools, it often just requires finding another tool that's out there. It took me a while to find grep.app. Even though I was specifically looking for search engines that could help me do this research, it was only quite late that I suddenly discovered it, and it was exactly what I needed and just had that kind of built-in relevance factor. So hopefully, from what you've seen in this talk, I've given an idea about how anyone could

really go out there and probably find more vulnerabilities. Just change the vulnerability class, change the language, start plugging these vulnerable code patterns into grep.app and see what results from it. It's quite fun to do, and in future I might have another go and see what else I can find. Yeah, that's pretty much it. Does anyone have any

questions? What did freeCodeCamp give you? Unfortunately they're a nonprofit, so they didn't give me loads of money, but their CEO did tweet out a little kudos for finding that, which was

nice. How many repos did you find doing this search that had the same vulnerability? That's the main advantage of this kind of search, that you find lots of places with one vulnerability. How many did you get this time? How many mass assignment vulnerabilities did I find in total? Yeah, out of this research project. So I found loads, and the problem was that most of them were in repositories which seemed like they might be used somewhere, or they were dead. There was this EU search engine that had about 400 stars on GitHub. I found mass assignment in that, but

I couldn't find anyone to talk to, and they were Spanish, and I didn't know how to tell them "you have a vulnerability" in Spanish. There were basically lots of things like that: nothing that was in the big leagues, but lots of smaller-scale projects, which probably are being used in various ways, that were vulnerable to this particular thing. And the thing I found quite quickly, following up with all these people and telling them, is that it's actually really hard to find who to contact, or where to go, or whether anyone's actually going to care or listen to you. So I generally

didn't really bother with it. But I guarantee if you go out and try that, you'll find a

lot. I was just wondering, what was the second most interesting vulnerability you found when you were doing this research? The next one down on the list that maybe you would cover if not this one. Yeah, that's a really good question, because I did start with insecure deserialization first, and I thought that would be the most promising one. Until recently, and I think the behavior has changed, simply using YAML.load in Ruby, for instance, and in some other languages, automatically exposed you to essentially remote code execution in an application security context. And I did look for that

and I don't think I had much success in finding stuff in top GitHub repos. I didn't really look much beyond mass assignment, actually. I just drilled very deeply down into that one particular vulnerability class, but I'm sure there are others as well that would yield similar

results. You mentioned Ruby and Node.js. Are there other tech stacks or languages that this appears in, or is it mainly those two? Yeah, that's a great question. In my normal work I would say probably 75% of newer web apps being made, at least, are now written in Node.js or various Node.js frameworks, so it's super common. But it's a very diverse ecosystem with so many different frameworks. It's not like Ruby on Rails, where there's one very solid and feature-rich framework. There's a whole bunch of people doing things in all kinds of different ways and building lots of

stuff from scratch. So I think it's really the ideal place for finding stuff like this, because there's not really been a systematic approach to squashing certain types of bug classes like there has been in things like Django and Ruby on Rails. I also think you might have some success doing this against Python as well, but again, the low-hanging fruit is probably going to be stuff where they're not using the really common frameworks, they're using stuff that's a little bit off the beaten track. Thank you. I think that's everything at the front. There's a question. Thanks. I was just wondering about the fact that you couldn't find large

repos that had this problem, and that many of the repos you were finding were, for one reason or another, difficult to inform that they had the problem. Do you think that might be support for Linus's law, in that in those cases where it was easy to inform, someone had informed them already? It just seems that might actually be support for it. So you think that the repositories where they made it hard to contact them have probably been found by several other people before? Personally, I've had the experience of trying to tell people "you have a vulnerability", and for various reasons eventually I just gave up. Yeah, I

can understand that for sure. I think after that happens to you a few times, you just realize it's not really worth bothering. I guess this is partly why bug bounty programs became so popular, because they created a formal process by which you could at least get listened to, even though the disclosure process there doesn't always go to plan either. But yeah, I agree with you that it could well be that several other people have tried doing similar things before, and found these too, and also hit a brick wall. It reminds me of that story of the B-52

bombers or something, not B-52s, World War Two bombers, where they were looking at bombers coming back with holes in certain places, but the question was, what about the ones that aren't coming back? We don't know where their holes are. That missing group seems, in this case, to be the ones that have already been informed on, it seems to me. Yeah, that's a very good point. So drawing any kind of conclusions from what is not particularly systematic research here would probably be too hasty. That's a very good

point. Thanks. Just one point: do you know what frequency grep.app reindexes at? Because maybe you have the same problem that you had with the BigQuery dataset. Yeah, that's a good point, actually. You could probably find that out just by searching for stuff and seeing if it has commits from the last day or so, but on the website itself they don't really give much information at all about how it works, or how they got the data, or how they run it. So I don't know the answer to the question. I hadn't considered it, and it's a good point.

Have you checked to see if the bug that you found is still showing in grep.app? Yeah, I think it shows the fix now. You can still see some of the same code, but you see the fixed version. And also, because certain people forked freeCodeCamp over time, and a few of those forks have a lot of stars, they showed up as well, and I think they still haven't been fixed. Okay, thanks. I think that's everything now. Many thanks for the talk. Thank you.
