
our presenter today is Adam Berman. Adam, I think it's time, take it away.

Awesome, great, sounds like the mic is working. Hey folks, my name is Adam, and I'm going to be talking about when a vulnerability is not a vulnerability: overcoming the inundation of supply chain security alerts. A little bit about me: my name is Adam Berman, and I lead the supply chain team at Semgrep, running the engineering and product org for our supply chain product. I started as a staff engineer here at Semgrep back when it was called r2c, and I've had the pleasure of getting to build security tools, many of which are open source. If you haven't used Semgrep before, give it a shot; it's a pretty cool tool. We build the open source static analysis tool, and we're focused on security generally.

So I want to start with this: who's ever opened up their code base or security tool and seen a tab with a big alert that looks something like this? I think the reality is that if this were true, if all these vulnerabilities were really dangerous in our applications, we'd all be fired, right? We all know this isn't real. So many of these alerts are just not that dangerous; few represent a real risk to the business.
The trick is finding those real, dangerous needles in the haystack, because those can't be ignored, and the real trick is finding those needles without overwhelming your team with a huge quantity of work. So today we're going to talk about some strategies for that. We'll talk about the rise of open source usage and what that means for security, about traditional software composition analysis tools and how they work, about the challenge of running an effective SCA program, and then we'll dive into what reachability analysis is and how it can help folks move up a level in terms of their leverage. I hope that by the end of the talk we can get to some strategies for, sorry, not ignoring alerts, but reducing alerts, so that security and developers can be partners.

We'll start with the rise of open source software. It's a tale of velocity, but also risk. Open source usage has massively increased: when I started writing software it was popular and common, but today the overwhelming percentage of the lines of code in our applications comes from open source. This is good. It means open source projects are solving problems that are generic to all of us. We're a startup; we don't have to figure out how to render HTML or how to parse a cookie, or spend our time doing that. Instead we use React and Flask and a bunch of other open source projects that implement that functionality really well, and we get to move on to other things that matter for our business. I get to spend more of my engineering time on things that are unique to us. The downside is that we also inherit all of the security problems of the open source tools we bring in, and we don't always do that with our eyes wide open. These old vulnerabilities live forever; they're out there, and hackers continuously blast the internet to see if anyone is running vulnerable versions. These are our logs from a few months ago, catching some of that spam. This is especially funny to us: we don't use WordPress anywhere, but we still pick up people probing for old WordPress versions in the wild. And looking at the rate of disclosures for open source vulnerabilities, it just keeps going up. Last year we saw something like 22,000 vulnerabilities disclosed; over the last three months we've seen 8,000 new ones, and we can expect this to keep increasing over time. That doesn't even take into account folks who are running microservices, or projects with multiple lock files, where a single unique vulnerability disclosure can compound and multiply across the many applications you care about. Think about how many services your teams are supporting; think about how many alerts you get when a disclosure happens on your support week. A single new disclosure can account for tens, hundreds, sometimes (at large companies) thousands of vulnerabilities across a large number of projects, and then the security team has to understand the context for each application in order to budget its time. The worst part is that a huge percentage of those vulnerabilities are false positives. We ran a study of 1,100 of the most popular open source packages with vulnerabilities, and our analysis showed that only about two percent of reported vulnerabilities are even theoretically reachable, never mind exploitable. We've replicated these results on customer code in enterprise environments as well, and this means SCA programs are typically really low ROI.

So before we dive into how to fix this, let's back up and figure out how we ended up here in the first place. I want to talk about traditional SCA tools and how they work. There are a lot of tools most of us might use for SCA: free tools like OWASP Dependency-Track and Dependabot, and paid solutions like Black Duck and Snyk. But fundamentally these tools are pretty similar when it comes to vulnerability management.
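Before the mechanics, a back-of-envelope sketch to make those volumes concrete. The 22,000-disclosures-per-year and roughly two percent reachability figures are the ones quoted above; the service count and per-service match rate are invented purely for illustration:

```python
# Back-of-envelope: how one year of disclosures fans out across a fleet.
# The disclosure count and ~2% reachability figure come from the talk;
# the service count and match rate below are made-up assumptions.
yearly_disclosures = 22_000
services = 50        # hypothetical mid-size engineering org
match_rate = 0.01    # assumed share of disclosures hitting any one service's dependency tree

raw_alerts = yearly_disclosures * services * match_rate
reachable_fraction = 0.02  # the ~2% "theoretically reachable" figure

print(f"raw alerts to triage: {raw_alerts:.0f}")
print(f"likely reachable:     {raw_alerts * reachable_fraction:.0f}")
```

Even under these generous assumptions, filtering out unreachable findings shrinks the triage queue by roughly fifty times.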
First, they collect a database of vulnerabilities. These are sourced from public vulnerability databases like the NVD, and some companies have their own security research teams investigating packages and finding new vulnerabilities. Each database entry looks something like this: a package, a vulnerable version or version range, the minimum patched version, and ideally some helpful accompanying information like a description, a recommendation, and a severity. This is a pretty good database entry; it helps you understand what's going on with the vulnerability. Next, they have to be able to scan your dependencies. Perhaps they scan your manifest file or your lock file, or they might query your build system for the list, but they have to be able to see which dependencies are in your system; here we're showing a lock file giving you the list of dependencies. Finally, they have to match the package versions back to the database, and then they tell you: hey, you have a vulnerable version of a package in your system. So at the end of the day you end up with a list like this of vulnerable packages in your ecosystem.

So what's the issue? The problem is that this is not a list of vulnerabilities that are exposed in your system; it's a list of packages in your system that have vulnerabilities. The vast majority of packages that might show up on this list are not dangerous, based on how they're being used in that specific application. For a package to be dangerous, the first-party code typically has to use the vulnerable part of the library in a specific way. Let's take this lodash vulnerability as an example. What is lodash? It's a JavaScript utility library with a few hundred public functions on it. (There's supposed to be a video here that scrolls through pages and pages of documentation; it's a really large utility library.) This vulnerability says that mergeWith is vulnerable to prototype pollution. But the whole library isn't vulnerable at that version, just this function and a few more that also have disclosures associated with them, like merge and defaultsDeep. If I use lodash's partition method to manipulate an array, I'm totally safe: partition is not vulnerable and doesn't rely on any of the vulnerable methods, so there's no undue risk added to my application. Now, looking back at this list, you might be thinking: why not just fix all of them anyway? And if there are only five, absolutely, that might be easy. We all want to live in a world where all
vulnerabilities are fixed all the time and everything is rainbows and unicorns. But many of us work at places where getting to inbox zero just isn't feasible; instead we're closer to here, with a huge, perhaps even unknown, number of vulnerabilities to triage. So why is it so hard to get to inbox zero? Shouldn't we be able to just upgrade our way out of this mess? In reality, upgrading is rarely that simple. Organizations are rarely only one version behind when a vulnerability is disclosed or being addressed; we might be dozens of versions behind the closest safe version. In the best-case scenario, when you have a rock-solid test suite, you still need the developer bandwidth to fix the upgrades that are breaking changes. And in the more common scenario, when you don't have a really good test suite, you also have to do the manual work to figure out whether they're breaking changes in the first place. You may need to land a fix for each vulnerability across a breadth of services at your organization, which means you rinse and repeat the testing, including the manual testing, for each of those applications. Developers have a lot of competing concerns: they have to build features, scale existing features, pay down tech debt, fix bugs, take on new work. Upgrades and other security work have to be balanced against all of these competing needs, and there's a really tough trade-off here. Security teams face similar time constraints, with asks from across the business; prioritizing triage time takes away from other important work, and since vulnerabilities keep getting disclosed, this is a forever commitment: you'll be forever pushing this boulder up the hill. So security teams are often faced with an extremely difficult choice, balancing the value of an SCA program against the resources it requires in a resource-constrained environment. For some organizations, running a security-focused SCA program just isn't worth it: too much work for too little reward. Instead the goal becomes meeting the minimum bar for compliance. Folks might turn on a tool
that can check the compliance box, ignore the rest of the results, and accept the risk. Others cannot accept the risk; the needle in the haystack is too dangerous, so folks commit to manually triaging these alerts to find the few real vulnerabilities to forward to the development teams. And this is a really terrible trade-off.

Okay, so here's the crux of the talk: reachability analysis helps you escape from that choice and move a level up to higher-leverage work. Reachability analysis helps security teams run a more scalable and pragmatic SCA program. It lets us ignore or de-prioritize less urgent vulnerabilities in the ever-growing queue of background tasks. It lets us find and immediately address the vulnerabilities that are most pressing today, so we can be strategic with our triage time. And it lets us codify acceptable and unacceptable uses of packages with known vulnerabilities, so developers can self-serve fixes to prevent real security issues without spending a lot of time on upgrades that aren't useful today.

Okay, so what is reachability analysis, and how does it work? Let's dive in. Reachability analysis is a process by which we check to see whether the vulnerable part of a library is used. We call a vulnerability reachable when the dangerous part of a vulnerable package is used by the first-party code. Let's go over an example. Here we import lodash, and let's say it's at a vulnerable version. On line five we see evidence of reachability: the program calls the dangerous function on the library with potentially untrusted user input, and on line eight we see how a user might get a malicious input into the vulnerable function. Here we see a very similar function, same user input, same library, but we're calling the zip function, which we know not to be vulnerable; it doesn't use any of the vulnerable functionality in lodash. Thus we'd call the vulnerability in this context
unreachable.

I want to make an important distinction: this is different from exploitability. A vulnerability might be reachable but still not exploitable. Exploitability requires a lot more context (the environment, the infrastructure, how the software is really being used in practice); it's a much higher bar and much more difficult to automate. If we go back to our previous example: from this code sample alone, we can't tell whether the script is exploitable. Reachability analysis is the lower bar, basically: does the code touch the vulnerable part of the library? And it has the nice property that it's much easier to automate the detection of reachability.

Okay, so how do you do reachability analysis? There are three parts. First, you need an engine of some kind that can scan the code and detect a pattern. A nice property of reachability analysis is that this can be done with static analysis; we don't need to compile the code or test it dynamically. At Semgrep we've built and maintain the open source static analysis tool Semgrep (we just changed our name, and I still don't quite know how to say it: at Semgrep, we build Semgrep); it works really well for analysis like this, and I'll show some examples in a bit. On the lightweight side, you can use something as simple as grep, or a language-specific linter like RuboCop or ESLint, and if you have access to something like CodeQL you might be able to wire that up too. Second, you need an engine that can tell what version of a library is being used in an application. There are a lot of great free tools for this: Semgrep does it, OWASP Dependency-Track does it, Dependabot does it. And finally, you need a database of vulnerabilities, and you need to know the list of vulnerable functions for a library at a particular version, and under what conditions those functions are dangerous. Here there is no silver bullet; this is going to require a
little bit of work, a decent amount of security knowledge, some elbow grease. Essentially it flips how you triage a vulnerability: rather than digging into your own code to see how you use the library, you dig into the vulnerable library itself, looking at the patch diff, reading blog posts on successful exploits, things like that, in order to figure out what makes the library dangerous at that particular version. We've also found that some open source databases are beginning to carry these lists of vulnerable functions; there's an effort in the open source community, through tools like OSV, to provide that information more readily, and we're trying to get involved with those resources as well.

I'll give an example of how this works in practice. Towards the end of 2022 we saw a disclosure in the PyTorch library. Looking at the issue on the library itself, it doesn't take very long to figure out what's dangerous about it; someone even did a great job of demonstrating the exploit in the comments on the issue, so you can see exactly the function that needs to be called for it to be dangerous, and what that looks like in the wild. The next step is actually writing a rule. In the rule you specify that this is PyTorch and that this is for Python, you write a message that helps a developer understand why they need to fix this, and then you specify the pattern: here's what needs to be called for this vulnerability to be dangerous. The second part is the engine for matching the dependency version: we can run this rule against a code base, see whether the library is used at all, and then see whether that usage is reachable or unreachable. So here we specify: are you using the library at the known dangerous versions? Now that we have some rules, we can talk about how we build
reachability into our SCA program at all. Let's zoom out for a second and think about how vulnerabilities show up in our systems. There are two main vectors of vulnerability risk: one is new disclosures, and the other is new commits. First, new disclosures. A new disclosure is going to impact old code, code that hasn't been touched in years, code that you really thought was safe; now it turns out that the Nokogiri XML parser you thought was safe wasn't safe after all, and we need to either spend some time sanitizing the input or upgrade. So how do we protect ourselves against these kinds of disclosures?
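One pragmatic answer, sketched below: once a rule exists for the new disclosure, shallow-clone each repo and scan it. The repo URLs and rule path here are hypothetical placeholders; the `git` and `semgrep scan --config` command lines are standard CLI usage, but adjust the flags for your environment.

```python
# Sketch: plan an ad hoc fleet scan when a new disclosure drops.
# Repo URLs and the rule path are illustrative placeholders.
import pathlib

REPOS = [
    "https://github.com/example/service-a.git",  # hypothetical
    "https://github.com/example/service-b.git",
]

def plan_scan(repo_url, rule_path, workdir="/tmp/scan"):
    """Return the commands to shallow-clone one repo and scan it with one rule."""
    dest = str(pathlib.Path(workdir) / pathlib.Path(repo_url).stem)
    return [
        ["git", "clone", "--depth", "1", repo_url, dest],
        ["semgrep", "scan", "--config", rule_path, dest],
    ]

for repo in REPOS:
    for cmd in plan_scan(repo, "rules/new-disclosure.yaml"):
        print(" ".join(cmd))  # feed each cmd to subprocess.run(cmd, check=True) to execute
```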
To do reachability analysis here, you want to be able to run scans ad hoc, at request time. You might build a script that pulls your repos and scans each one with the new rule once you've written it for the disclosure. If you use a standard static analysis tool, these scans are typically dominated by how long it takes to pull the code rather than by the scan itself. A lot of folks have also set up nightly or hourly cron jobs pre-scheduled in CI, so that when you check in your rule it automatically gets picked up in the next run and you get a result pretty quickly.

The second vector is new or changed code. Say a new disclosure came out, you scanned, and you confirmed that at the time of the disclosure the vulnerability didn't exist in your code base. But people don't stop changing their code: they keep adding new code, they keep changing existing code, and then someone checks in a new library, or uses a library in a new way, and all of a sudden that library's vulnerability is reachable. What we want is to put a guardrail in place to prevent developers from introducing new reachable vulnerabilities. To do this, folks typically instrument PR scans to make sure code changes don't let in newly reachable vulnerabilities. This is ideal: it lets developers self-serve fixes, especially if you can add a comment when a PR check fails, and you can make the check blocking to ensure vulnerabilities never make it to production. For some orgs, blocking PR checks aren't quite feasible; you can also orchestrate this with out-of-band scans that let security teams fast-follow with fixes without slowing developers down.

Cool. So here's what running a reachability program looks like in action. You hook up a feed to some
vulnerability database, so you get notified when new vulnerabilities are disclosed; I think a lot of folks already have something like this in place with a tool like Dependabot or a vulnerability tracker. When a vulnerability comes through, rather than jumping into your code to triage, you dig into the vulnerability itself. If there's a single function that's vulnerable, or particular kinds of inputs that make it vulnerable, you write a rule to codify what usage is acceptable and what is not. You scan your existing code base to excise the parts that might be vulnerable today, and then you check in a guardrail to make sure the vulnerability doesn't become reachable in the future.

Okay. So reachability lets folks scale and run a more effective, pragmatic SCA program, and I think it allows your security team to level up and work from a higher-leverage point of view: it takes you from manual toil to automating away your current and future work. But I want to say it is by no means bulletproof. Static analysis is imperfect; there are false positives and false negatives, as with any analysis technique. Our goal here is to run something pragmatic. There are some vulnerabilities that are so bad you want to excise them no matter what (Log4j, things like that); you just want them out of your system, and you might not trust reachability analysis for those. It's also important to remember that researching and writing your own rules for a reachability program is a non-trivial amount of work. But especially for orgs with multiple services and multiple applications, this lets you scale a single research effort rather than manually triaging each vulnerability by hand. If you don't want to run this program yourself, there are products out there that do this; Semgrep has a product that does, and a variety of other SCA tools do as well. We want to push to make this the de facto standard: SCA tools should have reachability. It's something every org has to evaluate for itself, whether to build or buy. But the key point is that security isn't a purity test. If we can only have security in a world in which every vulnerability is upgraded and we're all at inbox zero, it's going to be hard for a lot of organizations. We believe in pragmatic security, and pragmatic security should not require so much toil. Thanks for having me; come join our community Slack at Semgrep. We're also hiring quite a bit, so I'd love to chat with you all if you're interested. Thanks so much. [Applause]
Yeah, so the question is: why capture reachability early if reachability doesn't mean exploitable? The idea is that the exploitable vulnerabilities are a subset of the reachable vulnerabilities, and reachability already filters out a ton of vulnerabilities that aren't exploitable, that you don't have to look at because they're not reachable either. So if you start with a list of a thousand vulnerabilities, and we can get you down to the ten that are reachable, and then help you find the five within those that are exploitable, it really cuts down the triage time. There is still some triage to do to figure out which of those are exploitable, but it helps you get past the other 990 that aren't exploitable or reachable. Sure.
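The filtering described in that answer can be made concrete with a toy reachability check. The library and function names (`vulnlib.merge_with`, `vulnlib.zip_lists`) are invented stand-ins, and real engines such as Semgrep or CodeQL work on ASTs and dataflow rather than regexes; this only illustrates the shape of the check:

```python
# Toy reachability check: keep a dependency alert only when first-party
# code calls the specific vulnerable function. "vulnlib.merge_with" is an
# invented name standing in for something like lodash's mergeWith.
import re

VULNERABLE_CALLS = {"vulnlib.merge_with"}  # from researching the advisory

def is_reachable(source):
    """Naively extract vulnlib calls and intersect with the advisory's list."""
    calls = set(re.findall(r"vulnlib\.\w+", source))
    return bool(calls & VULNERABLE_CALLS)

reachable = "out = vulnlib.merge_with({}, user_input)   # dangerous sink"
unreachable = "out = vulnlib.zip_lists(keys, user_input)  # safe function"

print(is_reachable(reachable))    # True  -> keep the alert
print(is_reachable(unreachable))  # False -> de-prioritize it
```

The point is that the alert's fate depends on which function the first-party code actually calls, not merely on the package version.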
A question from the audience: a lot of the dependency alerts I've seen can be a dependency of a dependency of a dependency. Can the code scanning engines you highlighted trace the call stack to understand reachability even when the function is used very deep within the stack?

Yeah. Not all of them can; there are some that can and some that cannot. What we've also found is that there's a multiplication factor here: if two percent of vulnerabilities are reachable in direct dependencies, then roughly two percent of the vulnerabilities in those dependencies are reachable from there, and we're at 0.04 percent. It starts to become an incredibly difficult, very tailored attack to pull off, so in terms of risk management it's often not the top thing you'd have to worry about. But the truth of the matter is that a lot of engines cannot do this; you can't grep your way through a call stack (or maybe you can, but that's something you'd have to wire up yourself).

Due to time constraints, folks, we probably need to take the rest of the questions outside; you can have a freestyle question cypher with Adam. Okay, yep, I'll be out in the
hallway for anyone, or maybe I shouldn't be in the hallway, I'm not sure if you want me in the hallway; I'll be by the concessions area if people have questions. Thanks, folks.