Static Code Analysis, from Source to Sink

Name: Static Code Analysis, from Source to Sink
Uploaded: 2015-09-22
Duration: 56 min 32 s

BSides Manchester · 2015 · 56:32 · 626 views · Published 2015-09

Speakers

Paul Johnston

Tags

Category: Technical

Topic: Vulnerability Research, Web AppSec

Style: Talk

About this talk

Static application security testing (SAST) offers an alternative to dynamic penetration testing by analyzing source code rather than running applications. Paul Johnston presents a prototype tool that traces untrusted data from entry points to dangerous sinks, demonstrating how to find injection flaws, cross-site scripting, and other vulnerabilities across Java web applications. The talk covers practical challenges in building static analyzers, including framework-specific entry points and cross-language analysis (JSP compilation), plus lessons for secure development practices.

Original YouTube description

Static analysis is an alternative approach to penetration testing, which focuses on analysing source code, rather than attacking running applications. I developed a prototype static analysis tool, and learned all sorts about static analysis and secure coding on the way. I will explain the basic principles of static analysis, the practical problems you hit, and the lessons for secure development practices. This will be useful for people who use static analysis tools, perform code analysis, and developers interested in security.

Transcript (English)

So we are now in track one. Thanks for making the long journey over and not going off to get some coffee. I remember that I didn't do my selfie this morning, which I should have done when the room was really packed. Can I do it now? Does anyone mind? We'll get it done, just for Twitter's sake. It's become a BSides tradition now. Okay, smile everyone. It's a bit sparse compared to the other room, but it'll do. We're about 10 minutes later than the printed schedule, but we're on a break after this, so it's all good.

Coming up next we have the ridiculously talented Paul Johnston. He was telling me a story earlier about how he went to his church the other day, and his vicar was stuck with his PowerPoint presentation. Apparently in the church they put the hymns up like karaoke, with the words going across, and he spent the rest of the day helping him troubleshoot it. So if any of you have PowerPoint or karaoke issues, please go and see him. Hey, that was not in the bio. Cool. So over to Paul.

All right. Good morning, everyone. I'm one of the consultants at Pentest. I've come here today to talk to you about static code analysis, and about why it's a fantastic technique for finding security bugs. I'm going to explain a little bit about a research project we've done, so you should learn a bit about how static analysis tools work, and there are also some interesting lessons about how to design secure software.

I'm going to start by covering a little bit of terminology, because if you come across these kinds of tools, you might have heard of DAST and SAST. DAST, or dynamic application security testing, is where you're assessing a running application, and I guess that's what a lot of people here do when you're pen testing. Static application security testing, or SAST, is when the code's not running, so you're doing a code review. You'll probably see these terms around if you look at these kinds of tools.

Now, the first thing we wanted to look at was why SAST would be a useful thing to do. I've got a little example up here. I'm hoping there are a few technical people here, so I think people will be able to spot the security flaw in that bit of code. We can spot it very quickly by inspecting the source code. Now, how could we go about finding this in a black-box pen test? Unless we had some clue that that particular parameter name was going to have this effect, it would be essentially impossible to find. So here's one example where SAST can find something very quickly and DAST will struggle.

Now, it turns out that it's not always plain sailing for SAST. I've got an example here which is modeled on some real code that we audited a while ago. We were actually asked to do both a pen test and a static analysis using a commercial tool. This particular vulnerability is reflected cross-site scripting, and for a pen test it's easy peasy to find. You go to the search box, you put in the simplest cross-site scripting attack, and it is immediately vulnerable. There are no weird filters, no difficulties.

The commercial static analysis tool failed to find this. In fact, even when we looked at it manually, it's a very convoluted code path to follow. The top snippet, which is part of the controller, has a path where, if there aren't any search results, it includes an error message, and that error message includes the string you searched for. The bottom bit, which is part of the JSP view, prints out the error message, and because some of the hard-coded error messages contain HTML tags, escaping has been disabled. So for a pen tester it's a dead simple cross-site scripting, but it's pretty hard to find with a static analyzer. Both these approaches, SAST and DAST, have weak points. But when we're faced with technologies that are not perfect, using both is going to give us some benefit.

Now, there was another big advantage of static analysis that we noticed compared to dynamic, and this is the format that you get the results in. The output of a pen test typically says: this page, this parameter, is vulnerable to this. If a developer wants to do something with that, they don't really think in terms of pages and parameters. They've got to do a first step where they figure out where in their code that page and parameter are, and only then can they start to look at it. The benefit of static analysis is that it reports in terms developers understand: you've got a problem in message.java, line 123. Developers seem to find that useful. Because of this, something we've seen static analysis used for is as a tool that developers have in the development lab, not necessarily something they hand over to a security specialist. You use static analysis as part of your nightly builds and fix bugs early on, so when you come to get a pen test, all the low-hanging fruit has been dealt with.

We had a little look around the market to see what was available. There are a number of commercial tools; these are the main four up here. On the whole, the feedback on them is pretty good. The two persistent criticisms I hear are that they are expensive and that there are a lot of false positives. And I've got a little theory that there's a bit of a commercial disincentive to reduce false positives, because if you're doing a procurement exercise and you run Fortify and it finds 150 problems, and then you run Veracode and it finds 180 problems, a whole lot of people are just going to pick the one that finds more, despite the fact that it might just be finding more false positives.

There are also a number of open source tools. It turns out a lot of these are not very smart; they don't do much more than grep the source code. But there are a few up here that we thought were interesting. In particular, FindBugs is, I reckon, the best open source tool for Java. There's a plugin for it as well called Find Security Bugs, and together they give you a pretty decent analyzer. But when we looked at it in detail, it was lacking. The reporting and UI are very basic compared to the commercial tools, and we also found quite a number of fairly simple examples that it missed. So we thought there was a bit of a gap here: we should be able to create a tool that is just that bit smarter than FindBugs with the security plugin, but not necessarily competing with the commercial tools.

So when we started to look at this, we had to think about how we were going to assess source code for vulnerabilities, and we came up with three main things to look for: insecure versions of libraries, insecure configuration, and insecure coding. I should add at this point that we decided to look at Java code for our first cut. Now, if you take any non-trivial Java project, it's going to use a whole heap of libraries. This is a little screenshot from looking at Apache Roller, and this screen probably only covers about a quarter of the libraries it uses. For every one of them we've got the name of the library, and the JAR's file name contains the version number. At first look that wouldn't mean a great deal, but these numbers are pretty crucial. If we look, for example, it's got struts2-core 2.2.3.1, which is bad, because anything before Struts 2.3.14.3 is vulnerable to this remote compromise exploit. So that incomprehensible list of libraries can contain some quite scary gotchas. It turns out that OWASP has this one quite well covered: there's a tool called OWASP Dependency-Check, and you can plug it into your build system, so if you've got old versions of libraries, it can alert you. Good.
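The version check described above is easy to mechanise. The sketch below is a hypothetical helper, not OWASP Dependency-Check's API: it pulls the version out of a struts2-core-<version>.jar file name and compares it, component by component, with the 2.3.14.3 threshold quoted in the talk. It assumes purely numeric dotted versions; real tools match against CVE data rather than a single hard-coded threshold.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags struts2-core JARs older than the version that fixed a known flaw. */
final class LibraryVersionCheck {
    // Matches names like "struts2-core-2.2.3.1.jar" and captures the dotted version.
    private static final Pattern STRUTS_JAR =
            Pattern.compile("struts2-core-(\\d+(?:\\.\\d+)*)\\.jar");

    /** Compares dotted numeric versions; a missing component counts as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the JAR is struts2-core at a version strictly below fixedIn. */
    static boolean isVulnerable(String jarName, String fixedIn) {
        Matcher m = STRUTS_JAR.matcher(jarName);
        return m.matches() && compareVersions(m.group(1), fixedIn) < 0;
    }
}
```

For example, isVulnerable("struts2-core-2.2.3.1.jar", "2.3.14.3") is true, while a 2.3.16 JAR is not flagged.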

We also looked at some examples of insecure configuration, and there's a bit of a classic one here. This is Apache Axis, which is a web services framework, and this config has got the default admin password. If you leave it configured like this, anyone can come along, log into your admin panel, and mess with your site. We had a look around at open source tools, and there doesn't seem to be anything good for identifying configuration flaws, so that might be an exercise for someone who's interested.
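A configuration check of this kind can start very small. This sketch assumes the Axis 1.x convention, as we understand it, of an adminPassword parameter in server-config.wsdd whose shipped value is admin; the class name is hypothetical, and a real checker would need a rule per product and per setting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical config check: is an Axis-style adminPassword still set to a known default? */
final class DefaultPasswordCheck {
    static boolean hasDefaultAdminPassword(String wsddXml, String knownDefault) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Config files are untrusted input too: refuse DOCTYPEs to rule out XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wsddXml.getBytes(StandardCharsets.UTF_8)));
            NodeList params = doc.getElementsByTagName("parameter");
            for (int i = 0; i < params.getLength(); i++) {
                Element p = (Element) params.item(i);
                if ("adminPassword".equals(p.getAttribute("name"))
                        && knownDefault.equals(p.getAttribute("value"))) {
                    return true;
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse configuration", e);
        }
    }
}
```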

There's one complaint I have about looking at insecure libraries and insecure configuration, which is that it's kind of boring, because it's only ever going to find vulnerabilities that have already been discovered and are already well known. It ain't going to find zero days. What we really want to look at is the code, because this is where we can take off-the-shelf software libraries and find brand new exploitable vulnerabilities. That's the theory. There is one technique which is really prominent for analyzing code, called data flow analysis, and I'm going to talk through a simple example here.

So, coders: if you put this in your code, doing it like this is a bad idea. You may see some tutorials that tell you to code like this, but ignore them; it is wrong. This is the simplest example of SQL injection. What's happening here is that the code is reading a parameter from the HTTP request. This is a request that's come in over the internet to your web application. It could be from a legitimate user, and it probably is, but it might be from a malicious attacker.

So that parameter could be legitimate, or it could be malicious. We don't know, so we cannot trust it. When it gets read into the code and stored in the id variable, we say that the id variable is tainted. On the next line, we're building up an SQL string using the id parameter. Once we've built that string, it contains some tainted data, so the string is tainted too. Then on the third line, we're running the query: this SQL, which includes tainted data that we cannot trust, is passed to the Java database API, which is going to execute it. A function like that is what we call a dangerous sink. If you pass tainted data to a dangerous sink, you've got a security vulnerability. That id parameter should normally contain a number, but if it contained an SQL injection attack, the attack would be successful.

So, just a quick recap of how we got here. We decided that static analysis was a useful thing. We reckoned that the open source tools available were not quite up to scratch. And this technique, data flow analysis, seemed very promising. Considering all that, we set out on a project to build a tool that was just that bit smarter than FindBugs. The vision is a tool where you can just drop in Java code and very quickly get out nice summary graphs like this.
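The three-line SQL injection example maps directly onto source, propagation and sink. The toy model below only illustrates that vocabulary with a wrapper object; the tool in the talk works on bytecode instead, and the class and method names here are hypothetical.

```java
/**
 * Toy taint model (hypothetical names): a string plus a flag recording whether any
 * part of it came from an untrusted source such as an HTTP request parameter.
 */
final class TaintedValue {
    final String text;
    final boolean tainted;

    private TaintedValue(String text, boolean tainted) {
        this.text = text;
        this.tainted = tainted;
    }

    /** Stands in for request.getParameter(...): request data is always tainted. */
    static TaintedValue fromRequest(String text) {
        return new TaintedValue(text, true);
    }

    /** Stands in for a string literal in the code: constants are trusted. */
    static TaintedValue constant(String text) {
        return new TaintedValue(text, false);
    }

    /** Concatenation propagates taint: if either side is tainted, so is the result. */
    TaintedValue concat(TaintedValue other) {
        return new TaintedValue(text + other.text, tainted || other.tainted);
    }

    /** Stands in for a dangerous sink such as Statement.executeQuery: tainted input is a finding. */
    static boolean isVulnerableAtSink(TaintedValue argument) {
        return argument.tainted;
    }
}
```

Reading id with fromRequest, building the query with constant(...).concat(id) and checking it with isVulnerableAtSink mirrors the three lines of the slide: read, build, execute.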

To implement this, there is a very important design decision, and that is exactly what kind of code we're analyzing. If you're familiar with Java at all, you've probably heard of bytecode. We start off with Java source code. The Java compiler takes the source code and builds bytecode, and then the Java runtime takes the bytecode and actually executes it. So the question is: do we want to analyze the source code or the bytecode? FindBugs works on the bytecode. Commercial tools are a mix: some work on the bytecode, some on the source code. There are a lot of considerations for this choice, and I made the courageous decision to go with the one that was slightly easier to write, which was the bytecode. I have sometimes regretted this decision, but once you've made it, you're kind of stuck; it's a long way to go back.

Now, I'll just show you what Java bytecode looks like. We looked at this function before that was vulnerable to SQL injection, and below it I've got the beginnings of a bytecode decompile. Bytecode is effectively a little stack-based assembly language. If we look at the first line, String id = request.getParameter("id"), the way this goes into bytecode is: the first instruction loads the request object, another instruction loads the constant string "id", then there's a call to HttpServletRequest.getParameter, and then the result is stored in the id variable.

Okay, now I'm going to show you how we use that. I've just set up a scan of this simple example, so it runs quite quickly. And I can drill in from here into a... what's that... debugger. This is what I use if I'm trying to figure out why the tool's not working on something. We've got the bytecode down here, and down the side you can see the state the analyzer has recorded at each instruction. For instance, if we look at the state after the constant load, we can see we've got a value of type java.lang.String being tracked. If we look at the return from getParameter, we've recorded that it's attached to an untrusted source. So essentially what we're doing is writing a little emulator that tracks the state of the stack, the variables, fields on classes, all of that. For instance, once the SQL string is built up, you can see we've traced the value of the string, and it's still attached to the HTTP source.
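The "little emulator" can be sketched as an abstract interpreter that pushes a taint flag instead of a real value for each stack slot. Everything here is a simplification chosen for illustration, not the tool's design: a four-opcode instruction set instead of real JVM bytecode, named variables instead of numbered local slots, String.concat standing in for the compiler's string-building code, and two hard-coded knowledge-base entries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy abstract interpreter: tracks only "is this stack slot or variable tainted?". */
final class TaintEmulator {
    enum Op { LOAD, CONST, INVOKE, STORE }

    static final class Insn {
        final Op op;
        final String arg;    // variable name, constant text, or "Owner.method"
        final int argCount;  // INVOKE only: receiver plus parameters popped off the stack

        Insn(Op op, String arg, int argCount) {
            this.op = op;
            this.arg = arg;
            this.argCount = argCount;
        }
    }

    // Two hypothetical knowledge-base entries; any other call simply propagates taint.
    private static final String SOURCE = "HttpServletRequest.getParameter";
    private static final String SINK = "Statement.executeQuery";

    /** Emulates straight-line code and returns a message for each tainted call to the sink. */
    static List<String> run(List<Insn> code, Map<String, Boolean> initialLocals) {
        Deque<Boolean> stack = new ArrayDeque<>();
        Map<String, Boolean> locals = new HashMap<>(initialLocals);
        List<String> findings = new ArrayList<>();
        for (int pc = 0; pc < code.size(); pc++) {
            Insn insn = code.get(pc);
            switch (insn.op) {
                case LOAD:
                    stack.push(locals.getOrDefault(insn.arg, false));
                    break;
                case CONST:
                    stack.push(false);  // literals are trusted
                    break;
                case STORE:
                    locals.put(insn.arg, stack.pop());
                    break;
                case INVOKE:
                    boolean argsTainted = false;
                    for (int i = 0; i < insn.argCount; i++) {
                        argsTainted |= stack.pop();
                    }
                    if (insn.arg.equals(SINK) && argsTainted) {
                        findings.add("pc " + pc + ": tainted data reaches " + SINK);
                    }
                    // A source returns tainted data; other calls return taint if any input had it.
                    stack.push(insn.arg.equals(SOURCE) || argsTainted);
                    break;
                default:
                    throw new IllegalStateException("unknown op " + insn.op);
            }
        }
        return findings;
    }
}
```

Feeding it the instruction sequence for the SQL injection example (load request, load the constant "id", invoke getParameter, store id, concatenate, store sql, load stmt and sql, invoke executeQuery) yields exactly one finding, at the executeQuery call.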

Now, in order to do that analysis, one of the key things is that we had to know that getParameter is an untrusted source, and that executeQuery is a dangerous sink. To do this, we had to create a knowledge base. This is a series of XML files with basic rules for how to treat Java library calls; I've just picked out those two examples to show you. It turns out this is a rather major effort, because there are loads of obscure function calls in all the libraries. Also, these particular functions are the simple case, because we're able to emulate them using purely static XML. It turns out a lot of other libraries introduce some kind of dynamic functionality into Java, so to model them you actually have to write plugins for the analyzer.

So we can analyze bytecode. On to the other pieces. The analyzer only works on a single function at a time, so another module we needed in the tool was a scanner. You give the scanner a whole pile of code; it searches through it, finds all the functions, and passes them to the analyzer. It also maintains global state about the application, because you've got functions calling other functions, calling libraries, and so on. The scanner is like a central data store for all that.

The other thing we had to think about was how we want to report these results. We tried out a few different things, and what we ended up quite liking was this annotated source code. Because this is a simple example, it says that every line in this function is part of the vulnerability, but in real-world code it tends to be more spread out. The idea is that each vulnerability produces a trace, and we display it: we show the source code, and the lines that are affected are highlighted yellow. One of the things that we added to this was a narrative.

So if we look at the first line, we might think: why has this tool highlighted it as being involved in the vulnerability? We hover over it and get a pop-up that explains that it contains an untrusted source, and that it's storing that data in a variable.
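A trace of this kind can be modelled as a list of (line, narrative) steps laid over the source file. The sketch below is hypothetical and renders plain text, marking highlighted lines with "> " and appending the narrative as a comment, where the tool in the talk uses yellow highlighting and hover pop-ups.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical report model: a vulnerability trace is a list of (line, narrative) steps. */
final class TraceReport {
    static final class Step {
        final int line;          // 1-based line number in the original source file
        final String narrative;  // the explanation the UI would show on hover

        Step(int line, String narrative) {
            this.line = line;
            this.narrative = narrative;
        }
    }

    /** Plain-text rendering: "> " marks a highlighted line, and its narrative is appended. */
    static String render(List<String> sourceLines, List<Step> trace) {
        Map<Integer, String> notes = new HashMap<>();
        for (Step step : trace) {
            // Several steps on one line are joined rather than overwritten.
            notes.merge(step.line, step.narrative, (a, b) -> a + "; " + b);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sourceLines.size(); i++) {
            String note = notes.get(i + 1);
            out.append(note != null ? "> " : "  ").append(sourceLines.get(i));
            if (note != null) {
                out.append("    // ").append(note);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```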

The other part of the UI we need is something for setting up scans, but that turned out to be fairly simple compared to the reporting. So, just to recap, those are the four main components of a static analyzer: the analyzer, the knowledge base and plugins, the scanner, and the reporting and UI. Together, those four give you a tool that you can drop source code into and get security vulnerabilities out of at the end.
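The talk does not show the knowledge base's XML format, so the schema used here (rule elements with role, method and vuln attributes) is an assumption made for illustration. The sketch loads such a file into lookup tables that an analyzer could consult whenever it meets a library call; the FileInputStream constructor entry is the path injection sink that comes up later with Butterfly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Loads a hypothetical XML knowledge base that assigns taint roles to library methods. */
final class KnowledgeBase {
    enum Role { SOURCE, SINK, SANITIZER }

    private final Map<String, Role> roles = new HashMap<>();
    private final Map<String, String> vulnTypeBySink = new HashMap<>();

    static KnowledgeBase parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            NodeList rules = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("rule");
            KnowledgeBase kb = new KnowledgeBase();
            for (int i = 0; i < rules.getLength(); i++) {
                Element rule = (Element) rules.item(i);
                String method = rule.getAttribute("method");
                Role role = Role.valueOf(rule.getAttribute("role").toUpperCase(Locale.ROOT));
                kb.roles.put(method, role);
                if (role == Role.SINK) {
                    // The sink, not the source, decides which vulnerability class is reported.
                    kb.vulnTypeBySink.put(method, rule.getAttribute("vuln"));
                }
            }
            return kb;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse knowledge base", e);
        }
    }

    /** The role of a fully qualified method, or null when there is no rule for it. */
    Role roleOf(String method) {
        return roles.get(method);
    }

    /** The vulnerability type to report when tainted data reaches the given sink. */
    String vulnTypeOf(String sink) {
        return vulnTypeBySink.get(sink);
    }
}
```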

Now, it turns out that working with completely hypothetical examples gets pretty boring, so I picked up one of our training apps. This app, Butterfly, we've used in coding workshops for ages. It's a simple e-commerce site, and it's loaded with a few deliberate security vulnerabilities. From a tool development point of view, we've got the benefit that I already know where the vulnerabilities are. You might say that's cheating, but you have to start somewhere. People have cautioned me never to do any sort of live demos, but I'm actually going to scan this right now, and if it breaks, you can laugh at me. I have it pre-set up, and I did obviously check that this worked just before.

So this is the output from the scan. The first time I did this, it didn't find everything, but I'm going to talk you through how we extended it to find different variations of vulnerabilities. First off, we'd already put in the SQL injection rules, so it's found some examples like this. This is pretty similar to the hypothetical example we looked at before: we're reading a parameter from a request and building up an SQL string. A slight complicating factor is that the source and the sink aren't actually in the same function, because it's got this helper function, dbconnected.query. So the tool has been able to trace the taint across two method calls: we've got the untrusted source, then building up the query, then a call to this function, which takes the tainted data in as a parameter and finally passes it into the dangerous sink.
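Tracing taint across a helper function can be done with per-method summaries kept by the scanner: for each method, which of its parameters eventually reach a sink. This is a sketch of one common way to do that, not necessarily the tool's; the method names and the encoding of where each call argument comes from are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy interprocedural pass: per-method summaries of which parameters flow into a sink. */
final class SummaryScanner {
    static final int FROM_SOURCE = -1;  // argument came from an untrusted source in the caller
    static final int TRUSTED = -2;      // argument is a constant or otherwise untainted

    /** A call site: the callee and, per argument, which caller parameter it came from. */
    static final class CallSite {
        final String callee;
        final int[] argOrigins;  // caller parameter index, FROM_SOURCE or TRUSTED

        CallSite(String callee, int... argOrigins) {
            this.callee = callee;
            this.argOrigins = argOrigins;
        }
    }

    static final class MethodSummary {
        final Set<Integer> paramsToSink = new HashSet<>();  // seeded by the per-function analyzer
        final List<CallSite> calls = new ArrayList<>();
    }

    final Map<String, MethodSummary> methods = new HashMap<>();

    /** Pushes "parameter reaches a sink" facts from callees to callers until nothing changes. */
    void summarize() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (MethodSummary caller : methods.values()) {
                for (CallSite call : caller.calls) {
                    MethodSummary callee = methods.get(call.callee);
                    if (callee == null) {
                        continue;  // library call: left to the knowledge base
                    }
                    for (int i = 0; i < call.argOrigins.length; i++) {
                        int origin = call.argOrigins[i];
                        if (origin >= 0 && callee.paramsToSink.contains(i)) {
                            changed |= caller.paramsToSink.add(origin);
                        }
                    }
                }
            }
        }
    }

    /** Call sites that pass untrusted-source data into a parameter that reaches a sink. */
    List<String> findings() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, MethodSummary> entry : methods.entrySet()) {
            for (CallSite call : entry.getValue().calls) {
                MethodSummary callee = methods.get(call.callee);
                for (int i = 0; callee != null && i < call.argOrigins.length; i++) {
                    if (call.argOrigins[i] == FROM_SOURCE && callee.paramsToSink.contains(i)) {
                        out.add(entry.getKey() + " -> " + call.callee + " (argument " + i + ")");
                    }
                }
            }
        }
        return out;
    }
}
```

The fixed-point loop is what lets the fact "this parameter reaches executeQuery" travel up through any number of helper layers before a caller that reads request data is reported.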

That one we managed to find just using the rules from before. There are some slightly more involved examples too. The concept here is exactly the same; it's just a bit more complicated in the way it gets from reading a parameter to calling executeQuery.

Now, it turns out that the first cut only found some of the vulnerable code that we knew about. We can take a little shortcut now because we have the working tool, but what we had to do at the time was, whenever it missed a vulnerability, figure out the trace by manual code review and then go back and assess why the tool hadn't found it. The first example was a particular kind of SQL injection. This line, it turns out, is vulnerable, because the variable uploadedFileName is tainted. But we didn't find a call to request.getParameter. It turns out that the data actually comes in here. Because this app uses the Struts framework, this is effectively a form bean: for every parameter the framework receives, if it gets uploadedFileName, it calls setUploadedFileName on your form bean. This presents a few difficulties, because we now have to know which entry points of the application come in through Struts. There's a struts.xml file in any Struts application, or at least in the older ones from before annotations, and it looks like this. So what we now have to do is parse this and identify the Struts entry points. Then, rather than having an untrusted source, what we've got is what we call an externally tainted parameter: the function is called from outside with tainted data passed in. And that got us all the SQL injection.

The next thing we knew about was some directory traversal vulnerabilities. The first one was actually easy enough: it's exactly the same pattern, but the dangerous sink, instead of being Statement.executeQuery, is now the constructor for FileInputStream. Because we'd already designed the XML for the knowledge base, that was no big deal. It turns out there's a second one, and this again happens because of the Struts form bean. So far, we've found the SQL injection and the path injection. This was feeling pretty promising, but things started to get a little bit harder.
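Finding the Struts entry points for the form bean case comes down to two steps: read the action classes out of struts.xml, then treat each public one-argument setter on those classes as a method called from outside with tainted data. The sketch assumes Struts 2 style action elements with a class attribute and uses reflection on classes already on the classpath; UploadAction is a hypothetical stand-in for the Butterfly form bean.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch: find Struts-style entry points whose parameters are externally tainted. */
final class StrutsEntryPoints {
    /** Class names from every action element's class attribute in a struts.xml document. */
    static List<String> actionClasses(String strutsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real struts.xml files declare a DOCTYPE, so allow it but never fetch DTDs or entities.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            NodeList actions = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(strutsXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("action");
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < actions.getLength(); i++) {
                String cls = ((Element) actions.item(i)).getAttribute("class");
                if (!cls.isEmpty()) {
                    classes.add(cls);
                }
            }
            return classes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse struts.xml", e);
        }
    }

    /** Public one-argument instance setters: the framework may call these with request data. */
    static List<String> externallyTaintedSetters(Class<?> actionClass) {
        List<String> setters = new ArrayList<>();
        for (Method m : actionClass.getMethods()) {
            if (m.getName().startsWith("set") && m.getParameterCount() == 1
                    && !Modifier.isStatic(m.getModifiers())) {
                setters.add(m.getName());
            }
        }
        Collections.sort(setters);
        return setters;
    }
}

/** Hypothetical action class modelled on the upload example from Butterfly. */
class UploadAction {
    private String uploadedFileName;

    public void setUploadedFileName(String uploadedFileName) {
        this.uploadedFileName = uploadedFileName;
    }

    public String getUploadedFileName() {
        return uploadedFileName;
    }
}
```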

We wanted to look at cross-site scripting, and this application uses JSP as the view layer. When we looked into how JSP works, we found there is a compilation process. You have a tool, something like Apache Jasper, which takes a JSP file and produces a Java file. This is then compiled by the normal Java compiler to produce a class file, and that is run by the web server. So we thought: we've already got this bytecode analyzer, which was pretty tricky to get working. I don't want to start again; I am going to reuse the bytecode analyzer on the JSP files.

I've had to tweak a setting to show you what the problem was. With the first cut, we did this and identified a reflected cross-site scripting vulnerability.

But in our first cut, the reporting was not really useful to anyone, because the JSP had been compiled to this intermediate Java file, and the trace here is showing you that intermediate file. This isn't a file that any developer is ever going to work with; in fact, most people are completely unaware of it. So while we found the vulnerability, the reporting is no good. Now, it turns out that when they developed JSP, they predicted exactly this problem, not for static analyzers but for runtime debuggers. A class file can carry a section called SMAP, and SMAP lets you map line numbers in the intermediate Java back to the original JSP. So we were able to add a feature that did this.
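SMAP is defined by JSR-45 (Debugging Support for Other Languages). The sketch below reads only the *L line section, assumes a single input file and ignores file IDs, so treat it as an illustration of the mapping rule rather than a complete parser. Each line entry has the form InputStartLine[#FileID][,RepeatCount]:OutputStartLine[,OutputLineIncrement], and a generated Java line is mapped back by finding the entry whose output range contains it.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for the *L section of a JSR-45 SMAP; assumes a single input (JSP) file. */
final class SmapLines {
    private static final class Entry {
        final int inStart;
        final int repeat;
        final int outStart;
        final int outIncrement;

        Entry(int inStart, int repeat, int outStart, int outIncrement) {
            this.inStart = inStart;
            this.repeat = repeat;
            this.outStart = outStart;
            this.outIncrement = outIncrement;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    SmapLines(String smap) {
        boolean inLineSection = false;
        for (String raw : smap.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("*")) {  // section markers: *S, *F, *L, *E ...
                inLineSection = line.equals("*L");
                continue;
            }
            if (!inLineSection || line.isEmpty()) {
                continue;
            }
            // InputStartLine[#FileID][,RepeatCount]:OutputStartLine[,OutputLineIncrement]
            String[] sides = line.split(":");
            String[] in = sides[0].split(",");
            String[] out = sides[1].split(",");
            int inStart = Integer.parseInt(in[0].split("#")[0]);  // file ID ignored
            int repeat = in.length > 1 ? Integer.parseInt(in[1]) : 1;
            int outStart = Integer.parseInt(out[0]);
            int outIncrement = out.length > 1 ? Integer.parseInt(out[1]) : 1;
            entries.add(new Entry(inStart, repeat, outStart, outIncrement));
        }
    }

    /** Maps a line of the generated Java file back to a JSP line, or -1 if it is unmapped. */
    int toJspLine(int javaLine) {
        for (Entry e : entries) {
            int end = e.outStart + e.repeat * e.outIncrement;  // exclusive
            if (javaLine >= e.outStart && javaLine < end) {
                return e.inStart + (javaLine - e.outStart) / e.outIncrement;
            }
        }
        return -1;
    }
}
```

With the entries 1,5:62 and 6:67,2, Java lines 62 to 66 map back to JSP lines 1 to 5, and Java lines 67 and 68 both map to JSP line 6.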

And when we look at the trace now, it's a great textbook example of cross-site scripting. The code here is just fetching a parameter from a request and spitting it out. Just by adding the dangerous sink for cross-site scripting and the support for SMAP, we've now got a pretty decent way to find and report cross-site scripting in JSPs, or at least the first one. Because there's also a stored cross-site scripting in this application, which the tool did not detect initially.

Now, I set about trying to upgrade the tool to find it, and this turned out to be much harder. I started with a completely wild assumption, which is that anything returned from a database we're going to treat as tainted. This might give us some false positives later on, but it's a way to start. With the data flow analysis, there's no trouble tracing this through the controller. You can see the database query, and you can see that there's a bit of processing to get it into a suitable format to pass to the view. What happens is that it gets added to this list, orderDetail.

Now, the JSP file uses the Struts tag library.

So what we've had to do is add support for the Struts tag library, which turned out to have all sorts of subtle difficulties, because where it says value=, it turns out there's a whole embedded expression language. It's called OGNL, Object-Graph Navigation Language, and those strings are actually evaluated at runtime. So to add support for the Struts tag library, we created a mini OGNL parser, and now we can trace how the data flows through different JSP variables. We can see how this expression results in a call to getOrderDetail, which returns the field that we already know had tainted data added to it. It stores one of the items, and then we get to the comment field. And then the final bit is here, where we've got a Struts property tag with escape="false". Now, we could have just written a grep for escape="false", which would have been way easier. Way, way, way easier. But the crucial thing we've done here is show that tainted data gets to that property tag.
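OGNL is far richer than this, but the core of what the property tag check needs can be sketched for plain dotted property chains: turn value="orderDetail.comment" into getter calls, then flag an unescaped tag only if one of those getters is already known to return tainted data. The class name is hypothetical, anything beyond a dotted chain is rejected, and treating a tainted container as tainting everything read from it is a deliberate over-approximation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Very small stand-in for OGNL: handles only plain dotted property chains like "a.b.c". */
final class OgnlChain {
    /** "orderDetail.comment" becomes [getOrderDetail, getComment]; anything fancier is rejected. */
    static List<String> getterCalls(String expr) {
        if (!expr.matches("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*")) {
            throw new IllegalArgumentException("unsupported OGNL expression: " + expr);
        }
        List<String> calls = new ArrayList<>();
        for (String property : expr.split("\\.")) {
            calls.add("get" + Character.toUpperCase(property.charAt(0)) + property.substring(1));
        }
        return calls;
    }

    /**
     * Flags a property tag with escaping disabled only when its value expression passes
     * through a getter that earlier analysis marked as returning tainted data.
     */
    static boolean isXss(String valueExpr, boolean escape, Set<String> taintedGetters) {
        if (escape) {
            return false;  // the tag's own HTML escaping acts as the sanitizer
        }
        for (String getter : getterCalls(valueExpr)) {
            if (taintedGetters.contains(getter)) {
                return true;
            }
        }
        return false;
    }
}
```

An unescaped tag fed only by a hard-coded value, such as a hypothetical pageTitle property, is not flagged, which matches the difference from a plain grep for escape="false".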

When we look through the application, there are other tags with escaping disabled, but only hard-coded strings ever get to them, never anything from user input. So at this point, I'm still hopeful that this is a good approach. Those were the main learning points from adding support for Butterfly. There is a companion application for Butterfly, which is called Pulse Secure. It has all the same functionality as Butterfly, implemented securely. We thought this would be a good thing to scan, because we've been calling out all the other tools for false positives, and what I didn't want was to point our tool at the secure version and have it light up like a Christmas tree. It turns out there's a whole load of different ways that people attempt to fix these vulnerabilities, and ways that you can get it wrong. I'm going to pull up a few examples now, and I'd like people to vote on whether you think each one is secure or vulnerable to SQL injection. So, example one: hands up for secure.

Hands up for vulnerable. Hands up for "I don't want to put my hand up".

Okay. I'm not going to claim to be the arbiter of truth here, and someone may prove me wrong, but right now I think this is secure. This is an SQL query done with a prepared statement, and the parameter is handled correctly, in that the SQL statement is a constant string with that question mark as a placeholder, and the user-supplied data is bound as a parameter. So, prepared statements keep us secure, yeah? Take a minute to consider this example, which a real client actually landed on me. Okay, hands up for secure. Hands up for vulnerable. Very good. Yes, it is true that they are using a prepared statement. However, that's only in name, not in spirit. They are building up a dynamic SQL statement and then creating a prepared statement from it, which does not help. The first time I saw this, I hadn't even thought to make creating a prepared statement a dangerous sink, but that needed to be added.
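The two prepared-statement examples can be contrasted in plain JDBC. Table and column names are hypothetical, and the methods return the statement unexecuted so the difference lies only in how it is built: only the first keeps user data out of the SQL text, which is why Connection.prepareStatement itself has to be treated as a sink.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Contrasts the two prepared-statement patterns (hypothetical table and column names). */
final class PreparedStatementExamples {
    /** Secure: constant SQL with a ? placeholder; the user value is bound as a parameter. */
    static PreparedStatement secure(Connection conn, String id) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT * FROM products WHERE id = ?");
        ps.setString(1, id);
        return ps;  // caller executes and closes it
    }

    /** Vulnerable: the SQL text already contains the tainted value before it is "prepared". */
    static PreparedStatement preparedInNameOnly(Connection conn, String id) throws SQLException {
        return conn.prepareStatement(dynamicSql(id));  // prepareStatement(String) must be a sink
    }

    /** The dynamic string the vulnerable version builds from user input. */
    static String dynamicSql(String id) {
        return "SELECT * FROM products WHERE id = '" + id + "'";
    }
}
```

For example, dynamicSql("x' OR '1'='1") produces SELECT * FROM products WHERE id = 'x' OR '1'='1', whose WHERE clause is true for every row, and preparing that string does nothing to undo it.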

Now, everyone says prepared statements are the preferred way to fix SQL injection, but they're not the only way. So just take a minute to think about this example. If anyone's feeling pedantic, it is connecting to a backend Oracle database. So what do we reckon? Is this secure? Is this vulnerable? I reckon this is secure. What we've got is a sanitizer: this function is ESAPI's encodeForSQL. We know we can't trust the id parameter, but we can sanitize it, and once it's been through the sanitizer, it is safe for us to build a dynamic query with it. So when we hit the dangerous sink, there is no vulnerability. To support this, we had to add the concept of sanitizers to our knowledge base, and we've been through some of the key libraries, OWASP ESAPI being one of them, and put their sanitizers into the knowledge base. One thing that we identified is that the sanitizer has to match the sink. Just imagine if, instead of encodeForSQL, they'd put encodeForHTML. Great, it would have stopped cross-site scripting, but that's not much help when you pass the value to an SQL statement. That was a crucial realization about sanitizers: they have to match the sink. Okay, take a minute to think about this example. Right, hands up for secure.

Hands up for vulnerable. Very good.

What's happened here is that sanitization and decoding have been done in the wrong order. The rule is always that you decode first and then you sanitize. Let's think about what would happen if we had an SQL injection attack in the id parameter. If we have a simple attack, it comes into the id parameter, encodeForSQL escapes it, the URL decoder does nothing, and the SQL statement is safe. But what would happen if the attacker URL-encodes their attack? It comes into the id parameter, encodeForSQL looks at it, sees no single quotes, and leaves it be. Then the URL decoder comes along, decodes our encoded payload, and conveniently puts it into the dynamic SQL statement. So yes, correct, this was vulnerable. The crucial point here is that decoding wipes out any sanitization that's been done. Okay, I now have the fifth and final example. Right, so take a minute to digest this.

It looks very similar to our first example with a sanitizer. Okay, folks, hands up for secure.

Hands up for vulnerable. Excellent. The difference between this and the earlier sanitizer example is just the single quotes around the id parameter. In that example, the user-supplied data was sanitized and then put into the query within single quotes. In this one, it's put straight into the query, and encodeForSQL does not help you with that. It would normally be a number that's put in, but if it was followed by a UNION statement or something, that would cause an injection.

Now, it turns out this is a bit of a pain for an analyzer, because you need to keep track of the syntax of SQL strings even when you've got user-supplied data in them. You've got some sections of the string that you know are constant, and some that are unknown.
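The rules from the last three quiz questions (the sanitizer must match the sink, decoding after sanitizing undoes it, and SQL encoding only protects a value inside a quoted literal) can be captured in a small abstract value. The sketch is hypothetical and deliberately crude: it decides whether a value lands inside quotes by counting single quotes in the constant SQL before it, which copes with doubled '' escapes because they leave the count's parity unchanged.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of the sanitizer rules from the quiz; names are hypothetical. */
final class SanitizerModel {
    enum Encoding { SQL, HTML }

    /** Abstract value: whether it is tainted, and which sinks its current encoding protects. */
    static final class Value {
        final boolean tainted;
        final Set<Encoding> safeFor;

        Value(boolean tainted, Set<Encoding> safeFor) {
            this.tainted = tainted;
            this.safeFor = safeFor;
        }
    }

    static Value fromRequest() {
        return new Value(true, EnumSet.noneOf(Encoding.class));
    }

    /** An encoder such as encodeForSQL or encodeForHTML protects only the matching kind of sink. */
    static Value encode(Value v, Encoding encoding) {
        Set<Encoding> safe = EnumSet.noneOf(Encoding.class);
        safe.addAll(v.safeFor);
        safe.add(encoding);
        return new Value(v.tainted, safe);
    }

    /** URL decoding can turn %27 back into a quote, so it wipes out earlier sanitization. */
    static Value urlDecode(Value v) {
        return new Value(v.tainted, EnumSet.noneOf(Encoding.class));
    }

    /**
     * SQL sink check. prefix is the constant SQL text before the value; SQL encoding only
     * helps if the value lands inside a quoted literal (an odd number of quotes so far).
     */
    static boolean sqlInjectable(String prefix, Value v) {
        if (!v.tainted) {
            return false;
        }
        boolean insideQuotes = prefix.chars().filter(c -> c == '\'').count() % 2 == 1;
        return !(v.safeFor.contains(Encoding.SQL) && insideQuotes);
    }
}
```

Under this model the quoted encodeForSQL example is safe, while the encodeForHTML variant, the encode-then-decode version and the unquoted version are all reported, matching the answers given in the talk.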

OK. So, having been through all that, we thought we were ready for the next challenge, and we've had a go at scanning WebGoat. Are we all right for time, by the way?

I got away with the two demos I've done so far. This one's a bit more scary, because it's a bigger app, so it's going to run for about 30 seconds with the progress bar ticking across like this. It could be that it's all working absolutely fine, or it could be stuck in some kind of infinite loop and I'm just going to sweat more profusely while nothing happens. But it's looking promising.

Sorry, this has probably not come out great because I've had to zoom in a little bit. What's interesting here is that we've now got quite a nice selection of different types of vulnerability. There are some things this doesn't pick up, such as cross-site request forgery and authorization flaws. But anything that's an injection flaw, it's pretty good at picking up. We can have a look at some of the different findings to get a feel for how much this is helping us identify them.

We've got this XPath injection. If we look at the trace down here, there are a couple of functions in the way, because WebGoat has a mini internal library called ParameterParser. The ultimate untrusted source is here, where it's calling ServletRequest.getParameterValues. We can look at the trace of that, but it's not that exciting. This function, XPATHInjection.createContent, is where it's all happening.

And the basic setup is exactly the same as all the other vulnerabilities we've looked at. We've got an untrusted source, we are dynamically building up an expression with no sanitization, and the sink is here. Now, you see, it's the sink that always determines the type of the vulnerability. So when it's XPath.evaluate, if you get untrusted data in there, the vulnerability is XPath injection. We've got another example here, which is shell injection. The code is reading this parameter from the web request and building up a command line without doing any sanitization. So when we pass that to execSimple, a user will be able to subvert the command, put a semicolon in, and run arbitrary code.

Now, we've got a couple of examples as well that are a bit like the Struts form bean. This function, getCreditCard, is exposed using Apache Axis, so it's essentially exposed as a web service, and its parameters come from the web request. So although there's no call to an untrusted source, the parameter that's passed in is tainted.
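The pattern described above (untrusted source, unsanitized concatenation, and a sink that decides the vulnerability type) can be sketched as a tiny taint propagator over straight-line code. The `Stmt` form below is a hypothetical toy representation, not the tool's; the sink names come from the examples above plus JDBC's `Statement.executeQuery`, and the SQL query in the web-service case is made up. Note how a web-service parameter is simply modelled as a `SOURCE`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SinkClassifier {
    // Knowledge base: the sink, not the source, decides the vulnerability type.
    static final Map<String, String> SINKS = Map.of(
            "XPath.evaluate", "XPath injection",
            "Exec.execSimple", "Shell injection",
            "Statement.executeQuery", "SQL injection");

    enum Kind { SOURCE, CONCAT, CALL }

    // Toy IR: SOURCE taints `target` (a getParameter call, or a parameter of a
    // method exposed as a web service); CONCAT assigns target = join(args);
    // CALL invokes method `target` with args.
    record Stmt(Kind kind, String target, List<String> args) {}

    static List<String> analyze(List<Stmt> code) {
        Set<String> tainted = new HashSet<>();
        List<String> findings = new ArrayList<>();
        for (Stmt s : code) {
            switch (s.kind()) {
                case SOURCE -> tainted.add(s.target());
                case CONCAT -> {
                    // No sanitization step: any tainted operand taints the result,
                    // and an untainted reassignment clears earlier taint.
                    if (s.args().stream().anyMatch(tainted::contains)) {
                        tainted.add(s.target());
                    } else {
                        tainted.remove(s.target());
                    }
                }
                case CALL -> {
                    String type = SINKS.get(s.target());
                    if (type != null && s.args().stream().anyMatch(tainted::contains)) {
                        findings.add(type + " at " + s.target());
                    }
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Shell injection: request parameter concatenated into a command line.
        List<Stmt> shell = List.of(
                new Stmt(Kind.SOURCE, "param", List.of()),
                new Stmt(Kind.CONCAT, "cmd", List.of("\"ls \"", "param")),
                new Stmt(Kind.CALL, "Exec.execSimple", List.of("cmd")));
        // Web-service entry point: the method parameter itself is the source.
        List<Stmt> service = List.of(
                new Stmt(Kind.SOURCE, "id", List.of()),
                new Stmt(Kind.CONCAT, "sql", List.of("\"SELECT * FROM cards WHERE id = \"", "id")),
                new Stmt(Kind.CALL, "Statement.executeQuery", List.of("sql")));
        System.out.println(analyze(shell));   // [Shell injection at Exec.execSimple]
        System.out.println(analyze(service)); // [SQL injection at Statement.executeQuery]
    }
}
```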

So yeah, there are all sorts of examples here that all follow the same path. And I'm pretty happy with how that's worked out, in that it's able to analyze new code quite quickly. The WebGoat code is fairly complex; for example, it's got its own mini web framework in there. Each lesson is a class, and rather than using Spring or Struts, they've built their own thing called Hammerhead. And the tool has been able to trace into Hammerhead without being trained about it in advance.

Now, everything we've looked at so far has been web applications, and you could extend this to other network applications; the basic threat model is pretty much the same. There were a couple of other targets that we thought about looking at, although we haven't yet done this properly. Android apps would be a good example; mobile security seems to be pretty hip at the minute. Some of the common vulnerabilities are server side, but we also see client-side mobile vulnerabilities, things like abuse of intents. In fact, you can expand the sort of analysis we've done to client-side mobile vulnerabilities quite readily; it's just a case of identifying what the untrusted sources are.

The other thing that we've considered is analyzing the JVM sandbox itself. Most months there's a new Java update, often to fix a new problem whereby a Java app on a web page can break out of the sandbox and take control of your laptop. A lot of these vulnerabilities are quite intricate. But something I think this tool could do is, once you've identified a certain pattern of vulnerability, search very efficiently through the entire Java core library and identify whether there are other examples of it.

So, having researched this, built a tool, and had some fun doing it, I've come to the conclusion that static analysis is definitely useful, although it can be kind of tricky to do as well.
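The claim that moving to Android is "just a case of identifying what the untrusted sources are" amounts to extending a source table while the propagation and sink logic stay the same. This hypothetical sketch keys sources by platform. The method names are real servlet and Android APIs, but the lists are illustrative rather than complete, and whether an `Intent` is actually untrusted depends on whether the receiving component is exported.

```java
import java.util.Map;
import java.util.Set;

final class SourceCatalog {
    // Untrusted sources per target platform, as fully qualified method names.
    // Retargeting the analysis mostly means extending this table.
    static final Map<String, Set<String>> SOURCES = Map.of(
            "servlet", Set.of(
                    "javax.servlet.ServletRequest.getParameter",
                    "javax.servlet.ServletRequest.getParameterValues",
                    "javax.servlet.http.HttpServletRequest.getHeader"),
            "android", Set.of(
                    "android.app.Activity.getIntent",
                    "android.content.Intent.getStringExtra",
                    "android.content.Intent.getData"));

    // True if `method` is an untrusted source on any of the enabled platforms.
    static boolean isSource(Set<String> platforms, String method) {
        return platforms.stream()
                .anyMatch(p -> SOURCES.getOrDefault(p, Set.of()).contains(method));
    }

    public static void main(String[] args) {
        String extra = "android.content.Intent.getStringExtra";
        System.out.println(isSource(Set.of("servlet"), extra));            // false
        System.out.println(isSource(Set.of("servlet", "android"), extra)); // true
    }
}
```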

So I think we've time for a couple of questions. Is the tool open source? Where can we download it? No, it's not. We've not worked out whether we're going to release it like that or not. So if you want something open source now, Find Security Bugs is the best open-source option; I'd start there. Thank you. You. So, you seem to be saying that library sanitization, and I'd agree, is probably going to be okay, which is a reasonable assumption. But then how do you handle inline sanitization? Say you've reported the previous cross-site scripting, and the developer fixes exactly the string that you used. How are you handling that, when there might be other issues with a regex?

If there's a sanitizer that's not from a standard library, you can manually mark it as a sanitizer in the tool. When you do that, you, as the knowledgeable person running the tool, are asserting that the regex does indeed correctly block whatever attack it's meant to block. It's all or nothing: you're either saying it's fine or it's not. There's no middle ground where you can say it's a best effort in these situations. You'll find that in security there is no middle ground: you're either vulnerable or you're not.
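The all-or-nothing behaviour described in that answer can be sketched as a configuration that maps a user-marked method to the vulnerability types it is trusted to neutralise. This is a hypothetical sketch: keying trust by vulnerability type is an assumption (the talk only says marking is binary), and `InputFilter.stripScriptTags` is an invented method name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SanitizerConfig {
    // vulnerability type -> methods trusted to neutralise that type
    private final Map<String, Set<String>> trusted = new HashMap<>();

    // The person running the tool asserts that `method` fully blocks `vulnType`.
    void markSanitizer(String method, String vulnType) {
        trusted.computeIfAbsent(vulnType, k -> new HashSet<>()).add(method);
    }

    // All or nothing: a marked method clears taint completely for that type,
    // and an unmarked one clears nothing. There is no partial-credit score.
    boolean clearsTaint(String method, String vulnType) {
        return trusted.getOrDefault(vulnType, Set.of()).contains(method);
    }

    public static void main(String[] args) {
        SanitizerConfig config = new SanitizerConfig();
        config.markSanitizer("InputFilter.stripScriptTags", "Cross-site scripting");
        System.out.println(config.clearsTaint("InputFilter.stripScriptTags", "Cross-site scripting")); // true
        System.out.println(config.clearsTaint("InputFilter.stripScriptTags", "SQL injection"));        // false
    }
}
```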

This is related to the same question. For the sanitizer side, is it not possible to run some unit tests against some known strings to decide whether they are vulnerable or not? Because you've got a sense of what is normal. So if you know what the function is for, you might actually see that. Yeah, potentially that's possible. As I understand it, some of the commercial tools do that and some have made a decision not to. We never even investigated it. But one of the things that concerns me is actually the Struts vulnerability that I showed you.

About 10 years ago, when Struts was very young, they had no sanitization of untrusted data going into OGNL. They fixed that by adding a sanitizer, a regex. Then you fast forward another eight years or so, and this critical vulnerability comes out because the sanitizer was inadequate. I don't think an automated tool is ever going to make that kind of judgment, so I think that's quite a good argument for making the decision to trust a sanitizer a manual one.
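The questioner's idea of probing a sanitizer with known strings can be sketched as below, and the sketch also shows why the speaker is wary of it. The blacklist sanitizer and probe strings are hypothetical examples, not from the talk. A probe that survives proves the sanitizer is broken, but a clean run over a finite probe list proves very little, which is the kind of judgment the speaker argues should stay manual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class SanitizerProbe {
    // Hypothetical inline blacklist sanitizer of the kind under discussion.
    static String stripScriptTags(String s) {
        return s.replaceAll("(?i)<script>", "");
    }

    // Returns the probes whose sanitized output still contains '<' followed
    // by a letter, i.e. something that could still open an HTML tag.
    static List<String> survivors(UnaryOperator<String> sanitizer, List<String> probes) {
        List<String> bad = new ArrayList<>();
        for (String probe : probes) {
            if (sanitizer.apply(probe).matches("(?s).*<[A-Za-z].*")) {
                bad.add(probe);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> probes = List.of(
                "hello",
                "<script>alert(1)</script>",
                "<img src=x onerror=alert(1)>",
                "<scr<script>ipt>alert(1)");
        System.out.println(survivors(SanitizerProbe::stripScriptTags, probes));
        // [<img src=x onerror=alert(1)>, <scr<script>ipt>alert(1)]
    }
}
```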

Yeah, well, that would be more of a dynamic tool.

Well, by all means try it. And if you do a cool plug-in for Find Security Bugs, go for it. Put me in for 10%. Have we got any more? Right. In your WebGoat example there, it found the Windows shell injection, but it didn't seem to highlight the Unix one. No, you're right, it doesn't. What we've got there is quite aggressive duplicate removal. Something that I wanted to work on was a configurable level of verbosity. When the analyzer finds all the unique traces, it turns out there are gazillions of them, and they're all very similar. So it reduces them down to one variant. It hasn't got a way of showing you the supplementary traces for the bugs, but it may one day.
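Duplicate removal of the kind described in that answer might look like this: group traces and keep one representative per group. The grouping key (same source and same sink) and the "keep the shortest" rule are assumptions for illustration; the talk only says duplicates are removed aggressively. A verbosity setting could simply return every trace in each group instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TraceDedup {
    // Collapse traces sharing the same source and sink into one representative,
    // keeping the shortest. Insertion order of the groups is preserved.
    static Map<String, List<String>> representatives(List<List<String>> traces) {
        Map<String, List<String>> best = new LinkedHashMap<>();
        for (List<String> trace : traces) {
            String key = trace.get(0) + " -> " + trace.get(trace.size() - 1);
            best.merge(key, trace, (kept, candidate) ->
                    candidate.size() < kept.size() ? candidate : kept);
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> traces = List.of(
                List.of("getParameter", "helperA", "helperB", "Exec.execSimple"),
                List.of("getParameter", "helperA", "Exec.execSimple"),
                List.of("getParameter", "XPath.evaluate"));
        representatives(traces).forEach((k, v) -> System.out.println(k + " : " + v));
        // getParameter -> Exec.execSimple : [getParameter, helperA, Exec.execSimple]
        // getParameter -> XPath.evaluate : [getParameter, XPath.evaluate]
    }
}
```

Under a key like this, two distinct bugs that share a source and a sink would collapse into a single finding, which is the trade-off aggressive deduplication makes.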

Have we got any more? You? With a lot of these static code analyzers, I see them written in the same language that they test. So, for example, Java tools for Java, and RIPS is written in PHP. And one of the problems I find with them is efficiency on large applications.

Is there a reason you think it has to be written in the same language, or was it just a design choice to use Java? Yeah, there's no inherent reason. This tool is written in Java to analyze Java. If we were to move it to analyze, say, .NET and C#, about 50% of the tool you could keep exactly the same. But you'd need to write all the knowledge-base plugins again, and big chunks of the analyzer would be different, because you've got a different bytecode to work on. So it was really just for convenience: Java has got good libraries for parsing Java bytecode, and other languages don't. I mean, probably the ideal thing to do would be to write the core analyzer in C so it's super fast, and then write the UI and so on in a high-level language.

I was actually going to ask something related, which is: you could go for something like a compiler that supports many languages with the same kind of intermediate bytecode, and analyze at that level. That might be a way to extend it to C, Fortran, or whatever. This was one of the reasons we were attracted to bytecode, because it's not just Java on the JVM; it's Scala, Groovy, and others. And originally we thought, oh, this would be great, we'll write it once and get all that for free. It turns out it's not quite for free, and there would be significant effort in supporting each one. But yeah, it's a good idea, and if you could hook into the LLVM intermediate format or something, that would be interesting.

Hello. So how long did it take to write, and do you actually use it for real? Ages. It's been a bit on and off; I've probably spent at least 18 months on this. And yes, we have used it for real. Our favorite example is when we've got guys doing a pen test and they find some kind of server-side vulnerability which means they can nick all the source code. That's been interesting. We have used it for client engagements, too. What we've not done, though, is put in the final push of engineering effort to make it fully point-and-click, where it just does it. It's a bit like we're 80% of the way there, and that last 20% would be really difficult.

Okay, have we got any more? Or shall we wrap up? Well, I'm gonna be here all day anyway, and I'll be on the Pentest stand for bits. So if you want to ask stuff, or if you've got any cool ideas for things that a static analyzer should do, I'd love to hear them. So come and hit me up.
