BSidesTO2017 Vince Marcovecchio

BSides Toronto · 24:48 · 57 views · Published 2018-01


All right, good morning everyone. Welcome to Assessing Node.js Web Application Security. Am I clear? Cool. My name is Vince Marcovecchio. I'm a member of the BlackBerry security research group, which I've been with for about five years now. We largely do internal pen testing and vulnerability assessments. My original training is in computer engineering, so I started on the dev side of things and kind of went on to the darker side of security; had more fun. My story: three years ago a dev team approached our team and told us they had written a product in Node.js and wanted us to do an assessment of it. We kind of turned to each other, and none

of us had ever worked with Node.js before, and we had no idea what we were doing. So this is kind of the story of what I've learned over the last three years, more or less. I'm coming at this from a white-box assessment perspective, so we usually have source code when we assess apps. What I'm going to be looking at is the sorts of gotchas that I look for in Node.js code. We'll start at the JavaScript level, with things that are common to JavaScript regardless of platform, then look specifically at some Node.js APIs and modules, and finally look at some tools I'm using to automate a lot of these checks, or at

least to highlight parts of code that need more attention when I'm going through a giant codebase that's many thousands of lines. I'm going to try to avoid covering general web application vulnerabilities, so I'm not going to look at the OWASP Top 10. I'm looking at quirks in Node.js and JavaScript, things that I wouldn't expect to be there. Just a note on the adjective "troublesome" here: I'm not using troublesome to mean any of these are inherently bad, more that they're often misused or just incorrectly used, so if you see them in code that you're looking at, they're worth a second look. So, just out of

curiosity, how many people here have written or assessed a Node.js app before? Okay, so we've got a small number. Pretty good. So let's start with troublesome JavaScript features. My personal training in computer engineering was largely focused on C and Java, and a lot of the apps that I've looked at are written by developers who were originally hired to do C and Java and then kind of got thrown into JavaScript. So the first surprise that I encountered was the fact that JavaScript actually has more scopes than C and Java, and I'm embarrassed to admit that it took me several months of looking at JavaScript code before I actually pulled out a book and read about this.
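
The extra scope in question is function scope, which is what `var` declarations get. The slide itself is not reproduced in this transcript, so the following is only a minimal sketch of the difference; the function names and values are illustrative assumptions, not the speaker's code.

```javascript
// Hypothetical reconstruction; names and values are illustrative only.
function withVar() {
  var password = 'secret';
  if (true) {
    var password = '789'; // same function-scoped variable, not a shadow
  }
  return password;
}

function withLet() {
  let password = 'secret';
  if (true) {
    let password = '789'; // separate block-scoped variable, gone after the brace
  }
  return password;
}

console.log(withVar()); // 789
console.log(withLet()); // secret
```

A reader coming from C or Java would likely expect both functions to return `secret`; only the `let` version does.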

So there are three scopes in JavaScript, not two. In addition to the usual local scope, or block scope, and global scope, you've also got function scope. Up until ECMAScript 6, almost all code was written using function scope, because the var keyword was the default that everybody used. With the var keyword, the immediate curly brackets around your variable are not what actually scopes it; it's the entire function declaration. I've got a beautifully convoluted example up here. If you ran code very similar to this in C or Java, you would expect that the final line at the bottom, the console.log of admin, would print out with the

password "secret", because you've got a shadowed variable there. In JavaScript this is not the case: because they're both function scoped, both instances of var password are pointing to the exact same spot on the stack, and so your final line is actually going to print out that admin has password 789. Now you're probably going to tell me, "Vince, why would you come up with a convoluted example like that? You're never going to encounter that in production." And that is what I originally thought, but then I came across a blog post from a few years ago: a startup called MelonCard apparently ruined their launch because they forgot a var keyword in a variable

declaration. What they ended up with was a variable that was meant to be function scoped but was accidentally global scoped. They did not run into this while they were testing; it came up when they launched their web app, because they suddenly had, you know, 5200 users accessing their site at the same time, and user requests were getting mixed up. The request object from one user would accidentally get transposed onto some other user's request, so they would not get back the data they expected. So stuff like this actually does come up in production. When I'm reviewing code, I generally encourage developers to only use the let keyword everywhere,

because let actually uses the more traditional block scope. But if I'm seeing things declared with var, or missing a keyword, which makes a variable global in JavaScript, then it's worth a second look. Coming again from a C and Java background: JavaScript is weakly typed. There are usually no explicit variable types declared in JavaScript, and you've got implicit type conversions that are just messy and counterintuitive. The double-equals operator actually does silent type conversions for you, and you end up with silly, awkward behavior like what I've got quoted there on the screen. What I'm comparing, just in case the brackets aren't coming out properly, is an empty array

to an empty object. In the first line we see that the empty array and the empty object are not equal to each other, as reflected in the second line when I change the operator to not-equals. In the third line, where I take the negation of the empty object, we still get a consistent result, but then if you negate the empty array you get the opposite result. So weird stuff like that pops up. If you've looked around at JavaScript documentation online, or at people whining about JavaScript online, which I tend to do too, to be honest, this is a table that often comes up. This is an equality chart for JavaScript, and what you've got on the

left-hand side is what is considered equal when you're using double equals with implicit conversions. What you've got on the right-hand side is what is considered equal when you're using the triple-equals operator, which validates that both the type and the actual value of a variable are equal, not just what the value can be interpreted as. So going back to my perspective, when I'm doing code reviews I will try to flag, or at least look suspiciously at, anything with a double equals, because there might be implicit conversions happening that work counterintuitively and might express themselves as a vulnerability of some sort. Type conversions come up in other places

as well. We're not just talking about awkward comparisons, but also about how input is handled and how APIs work. The code I've got up here is a really naive test for what could be considered SQL injection: we're just looking for a single quote anywhere in the input. In the first two lines you've just got a string being passed to this test, and they react as you'd expect: the second line gets flagged as potential SQL injection because there's a quote there. In the third line, though, instead of passing in a string I'm passing in a basic object that just has a string in a

field, and that does not actually get tested properly. Again, because JavaScript is weakly typed, you don't get any errors thrown when this happens; you just get false coming out as a result. And I have seen code where, further down the line, that JavaScript object gets converted to a string and then the SQL injection suddenly works. So I've found that when reviewing code it's important not to make any assumptions about the types of user input. Whereas if you're writing a web application in Spring, say, or one of the other Java frameworks, you might have a lot more assurances, in JavaScript you don't have those same assurances. How many people here have heard of ReDoS attacks? Okay.
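
The naive quote check described above can be sketched roughly as follows. The slide's code is not in the transcript, so the regex-based check, the function names, and the sample inputs are all assumptions made for illustration; the second function is one possible stricter variant, not a recommendation from the talk.

```javascript
// Assumed stand-in for the slide's check: flag input containing a single quote.
function looksLikeSqlInjection(input) {
  // RegExp.prototype.test() coerces its argument to a string, so a plain
  // object becomes "[object Object]" and no error is thrown.
  return /'/.test(input);
}

// Hypothetical stricter variant: refuse anything that is not a string.
function looksLikeSqlInjectionStrict(input) {
  if (typeof input !== 'string') {
    throw new TypeError('expected a string, got ' + typeof input);
  }
  return /'/.test(input);
}

const plain = 'alice';
const quoted = "alice' OR '1'='1";
const wrapped = { name: quoted }; // e.g. what a JSON body parser could hand over

console.log(looksLikeSqlInjection(plain));   // false
console.log(looksLikeSqlInjection(quoted));  // true
console.log(looksLikeSqlInjection(wrapped)); // false: the quote hides in a field
```

The third call is the troublesome one: the check silently passes, and any later code that reads `wrapped.name` into a query sees the quote after all.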

Not very many. ReDoS is the buffer overflow of JavaScript: it comes up about as frequently as buffer overflows did when I was doing code reviews in C. What a ReDoS attack comes down to is a regular expression that is vulnerable to DoSing. There are two necessary factors for this: first, the regex itself has to be vulnerable, and then the regex has to be sent some sort of bad input. The example I've got up on the screen is a vulnerable regular expression and a variety of bad inputs, where I'm just adding a letter "a" each time, and you'll note that the execution time actually doubles with each line.
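
As a rough sketch of the two factors named above: the pattern below is a commonly cited nested-quantifier regex, assumed here as a stand-in because the slide's regex is not in the transcript. `timeMatch` is a hypothetical helper, and the input lengths are kept small so the loop finishes quickly.

```javascript
// Classic nested-quantifier pattern; an assumed stand-in for the slide's regex.
const vulnerable = /^(a+)+$/;

// Time a single match attempt and report the elapsed time in milliseconds.
function timeMatch(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// A run of "a" followed by a character that can never match makes a
// backtracking engine try every way of splitting the run between the
// inner and outer quantifiers before it can report failure.
for (let n = 16; n <= 22; n += 2) {
  const { matched, elapsedMs } = timeMatch(vulnerable, 'a'.repeat(n) + '!');
  console.log(`n=${n} matched=${matched} took ${elapsedMs.toFixed(1)} ms`);
}
```

Exact timings depend on the machine and the engine version, so none are quoted here; the point is the growth from one line of output to the next.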

So by the fourth line there, that takes 30 seconds to run on my laptop. Ordinarily, if you're coming at this from the more traditional web application stack perspective, you would think this is not really an attack, right? It's going to slow down one user's request and that is it: the request of the user who's sending the bad data themselves. In Node.js land, though, Node.js works with a single-threaded event loop that is handling all your incoming web requests. So when that last regex gets evaluated, it takes 30 seconds of server time, and that actually blocks all request handling for that server. So again, add another two a's and you've got a request that will

DoS a server if you send it every two minutes. So it is worth, when you're reviewing Node.js code, looking at the regular expressions that are in it, not just from the perspective of whether they actually validate the data you expect them to validate, but whether they are susceptible to this sort of ReDoS attack. There's an npm module called safe-regex that will actually run some heuristics and let you know whether the regular expressions you're looking at are vulnerable to ReDoS or not. Let's move on to remote code exec. Everyone, I'm assuming, has probably heard "eval is evil" or some sort of catchphrase like that. JavaScript has a delightful eval function, that's line one there, which takes a

string and runs it as JavaScript code, and obviously if you're going to pass user input into that then you've got yourself a one-line remote code exec. In Node.js there are actually nine such APIs; eval is not the only one, so if you're just grepping for eval you're probably missing most of the potential exploits. That's everything from eval itself, which still does exist in Node.js, to the Function constructor, which can take a string and run it as code. The vm module, which is a core Node.js module and is useful in other contexts, can also create executable code from a string. So I find it's useful to look at any usage of

any of these nine APIs. One piece of advice I've found shared online is to also look for setTimeout and setInterval, which are similar in that, at least in browsers, they will take a string and run that string as code. In Node.js, though, these two APIs do not take strings; they require a function, so they're actually not a vector for RCE, at least not this way. Then finally, in terms of troublesome JavaScript features, let's look at objects and arrays and the weird APIs. JavaScript has some weird APIs that let you change very fundamental properties of any object or array that you're working with. I listed the five weirdest ones, in my opinion, up there.
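
One API of this kind, which the talk demonstrates next, is `Object.defineProperty`. The slide is not reproduced in the transcript, so this is a minimal sketch with illustrative names (`makeCountingArray` is hypothetical), showing how a plain read of an array element can be given side effects.

```javascript
// Illustrative names; not the code from the slide.
function makeCountingArray() {
  const foo = [0, 0];
  // Replace element 0 with an accessor: each read bumps foo[1] and returns it.
  Object.defineProperty(foo, 0, {
    get() {
      foo[1] += 1;
      return foo[1];
    },
  });
  return foo;
}

const foo = makeCountingArray();
console.log(foo[0]); // 1
console.log(foo[0]); // 2
console.log(foo[1]); // 2
```

Nothing at the call site hints that `foo[0]` is anything other than a normal element read, which is why a reviewer needs to know such a definition exists elsewhere in the code.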

If you want a really in-depth look at this, there's a talk I'd recommend by Natalie Silvanovich from the last Black Hat, where she just rips apart JavaScript interpreters using these APIs. But I've got a really simple example here: if you call Object.defineProperty on an array, you can actually replace the get method for a single element in that array. What I'm doing here with this stupid example is just replacing the getter for foo[0] so that it increments another entry in that array and returns that, so every time you access foo[0] it'll just give you an incrementing integer. I don't know if

there's a legitimate use for these APIs or not, but in general, if I'm seeing developers using these APIs, then it's definitely worth a second look to figure out what they're actually doing with them, and to reconsider how you're doing your overall code assessment or code review. You're changing fundamental properties: you would assume that accessing an element in an array won't change it, but if there's a call anywhere in the code to Object.defineProperty, you need to revisit that assumption of how that code works. So let's move on to actual Node.js APIs, both in the core Node.js build and in the bunch of modules that ship with it, that tend to be troublesome. I'll start with

the crypto module in Node.js. Crypto in most cases is actually just wrapping OpenSSL. There are two, I would say, dangerous weaknesses, documented but not very well documented, and those are crypto.createCipher and crypto.createDecipher, which take an algorithm and a password and basically give you back a cipher function using the algorithm you asked for. They do a key derivation under the hood using the password you provide. Unfortunately, the key derivation is done using MD5 with one iteration and no salt, and there is no way to override that. I don't know why these functions aren't deprecated; the documentation is pretty clear that you should not be using them. I haven't encountered them in production

code before, so if you see those pop up, they're probably wrong. Let's move on to the DNS API. There's a dns module that is part of core Node.js. It gives you a bunch of functions for resolving hostnames to IP addresses or doing reverse lookups. They're not exactly intuitive; there are weirdnesses in how they're implemented. For example, there's both a dns.lookup and a dns.resolve API, which look very similar, and if you call them, most of the time you'll get the same result back. Under the hood, though, they're implemented completely differently: dns.lookup is actually using your operating system's APIs to do the lookup, whereas dns.resolve is directly doing a DNS query.

So if you're operating in an environment where your server is behind a firewall, you might have weird entries in your /etc/hosts to redirect things to reverse proxies or what have you, or you might have a weird configuration in one of your resolv.conf files. The call to dns.lookup will reflect that; the call to dns.resolve will not. So one of the things I look for when I'm assessing applications is that they're consistently using one family of APIs, not both. I'm sure there is some sort of weird, subtle vulnerability

there if you end up with a hostname that suddenly changes IP address halfway through your app's execution. JavaScript is a higher-level language by the common definition of higher-level, right? So you'd expect you're not dealing with things like buffer overflows anymore, and you're not, and you'd think you're not dealing with things like uninitialized memory leaks anymore, but it turns out you are. This came out a couple of years ago and did the rounds on the netsec subreddit and on Hacker News: the Buffer API that Node.js gives you actually returns uninitialized memory by default. There are two Buffer constructors I have listed there; under the hood there's actually a lot more

than two; it depends on the type of the variable x that you're sending in. But if x is an integer, it gives you back an empty buffer, and empty does not mean zeroed: it's just whatever was in that memory previously. Both of those constructors have been deprecated, and though they still exist, their behavior depends on the version of Node.js and what command-line flags you're sending to Node.js: sometimes you get uninitialized memory, sometimes you get memory that is properly zeroed. They've been replaced by a bunch of alloc constructors, but it's worth noting there are still two alloc constructors, called Buffer.allocUnsafe and Buffer.allocUnsafeSlow. Unsafe in this case refers to the fact that those constructors definitely give you uninitialized memory, so if you ever see those being called in code, it's worth double-checking that the memory is being zeroed out or completely copied over before it's returned to a user.

So v8 is another core Node.js module. I guess more importantly, V8 is the underlying JavaScript engine that Node.js runs on. It is the same JavaScript engine that Chrome runs; it's implemented and supported, I think, by Google. Node.js gives you a v8 module that lets you access underlying properties of your JavaScript interpreter.
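
As a small illustration of what accessing those underlying properties looks like, here is a sketch using `v8.getHeapStatistics()`, a read-only call in the core `v8` module. Which fields are printed is an arbitrary choice made for this example.

```javascript
const v8 = require('v8');

// Read-only introspection: returns an object of numeric heap counters.
const heap = v8.getHeapStatistics();

console.log('heap size limit (bytes):', heap.heap_size_limit);
console.log('total heap size (bytes):', heap.total_heap_size);
console.log('used heap size (bytes):', heap.used_heap_size);
```

Calls like this only read state. The one the talk singles out for review, `v8.setFlagsFromString`, changes engine behavior at runtime and is deliberately not exercised here.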

Most of the APIs it gives you access to are pretty harmless, things like getting access to your stack size or heap statistics and whatnot. However, there's a v8.setFlagsFromString that is troublesome. It's again one of those APIs where I wonder if there's ever a legitimate use for it in production code. Its own documentation says that calling v8.setFlagsFromString may result in unpredictable behavior, including crashes and data loss, or it may simply do nothing. This lets you change the flags that V8 is currently running with. Now again, most of this is for fine-tuning performance, so most of the flags that

you can change are things like optimizations: enabling or disabling what optimizations V8 is running with, or what debug support it has at the moment. There are some dangerous-looking, and actually dangerous, flags you can set, though. use_strict can enable or disable strict mode globally in your interpreter, and v8.setFlagsFromString will let you change that. There's an allow_unsafe_function_constructor flag which to this day I have not found out what it does, but it has "unsafe" right there in the name. So in general, again, this is one of the things I grep for: if devs are calling setFlagsFromString in their code, what is the reason for it? It's

worth double-checking. I'll go through a few other modules very briefly. process is another core Node.js module which, again, has very legitimate uses, but it can also be easily misused. It gives you access to APIs to change the UID or the GID that the Node.js process itself is currently running with. Obviously this is still restricted by underlying OS permissions, so you won't be able to change UID to just anyone, but if these APIs are being called, they're worth a second look. Similarly, it can send signals to other processes. child_process is another core Node.js module that lets you launch other binaries or other scripts on your system. Again, there are

underlying OS permissions that will restrict what you can or cannot do, but it's worth a second look if you see this in code. And ffi is just a random mention; I encountered it, I don't remember in what codebase, a few months ago. ffi is a third-party module that stands for foreign function interface. It basically simplifies dealing with native libraries: it lets you dynamically load a native library and talk to it without writing a line of C code. So every single vulnerability that you'd find in C can suddenly happen in JavaScript if this is misused. If you see ffi being used, it's worth additional review. Now, I've just given you a really long list of functions to look for, and

rather than just grep for all of them, there are easier ways of dealing with this, so I'm just going to go through a few of the tools that I use to make my life easier in dealing with what I just listed. ESLint is the tool that I swear by. It is a linter that's based on Esprima. Esprima is also fun if you get a chance to work with it; it's a lexer and parser that's really easy to extend and to write some really rudimentary static analysis with. But I'd recommend ESLint in two contexts: I've found it useful both to have a set of rules that I try to convince developers to use, or that we

try to promote internally, and to have a separate set of rules that I use when I'm doing vulnerability assessments and code review, which just highlights risky areas of code for me. Just going through a few quick sample rules: the no-var rule raises a warning whenever you declare a variable using the var keyword or no keyword, which means it's either globally scoped or function scoped, which are the two problematic scopes. The strict rule just enforces strict mode, so if there's a script that doesn't have a 'use strict' at the top, it'll flag that for you. The semi rule warns you whenever you're relying on a very obscure JavaScript feature called automatic semicolon

insertion, which I've never encountered as a problem in production. I've seen textbook examples of how this can go wrong, but they're very convoluted examples. It's worth having as a rule, though, just for clean code. Then, from an assessment perspective, I've got my own rule set that I run before I review code that just highlights things. no-eval is one example: it'll highlight anywhere eval is either implicitly or explicitly being called in the code. The eqeqeq rule highlights anywhere you've got a double equals being used instead of a triple equals. block-scoped-var is a really useful rule in that it highlights when you have... so, ultimately you're going to

get code where best practices, or whatever you recommended, have been ignored. I often get a chunk of code where var is the only keyword used to declare a variable, thousands of times over, so highlighting where var is used is not useful, because I'm not going to go through thousands of variable declarations. On the other hand, block-scoped-var is neat because it highlights where a function-scoped variable is used outside of the block it was declared in, which is exactly the sort of behavior that would indicate an error if that code was written by, say, someone who doesn't expect function scope to be a thing.

Moving on from those rules, ESLint is also great because you can write rules for it that interface directly with Esprima in a few lines of code. If ESLint doesn't support something, you can probably write your own rule for it and do really basic static analysis through that. I've included two sample rules here. ESLint has a built-in rule for blacklisting modules; it has no built-in rule for blacklisting functions, so I threw that together. In my ESLint configuration file I've got a list of functions that are blacklisted that I get warned on, and this just tests the function based on the function name directly, or the module dot the function

name. Moving up in complexity a bit, going back to those DNS APIs and the fact that there are two families of DNS APIs: this is a rule I've written that just logs which DNS family you're using the first time you call into a DNS API. So if you're using lookup, it'll log that, and then if it ever sees an API from the other family being used, it'll warn on those lines. That's proven useful in the past. One final topic I want to look at is dependencies. It took me about a year of assessing Node.js code before I decided to try writing Node.js code, and when I did, I wrote a really basic

web app that just had, like, two POST endpoints and wrote to a SQLite database, and that was the dependency tree I ended up with. That is over 250 modules' worth. npm, which is the Node package manager, encourages people to submit, like, one-line modules and to reuse code almost to a fault, so you often end up with insane dependency trees like this. When I'm doing a white-box assessment, I generally do not have time to go through third-party code; even if I did have time, assessing this would probably take the better part of several months. So there are some really great tools and really great vulnerability feeds that I often refer to.

The two in particular that I'd list are the Node Security Project and Snyk.io. They both publish vulnerability feeds for npm packages, and they both have command-line tools that will scan your dependency tree quickly and tell you if there are any known vulnerabilities against any packages that you currently depend on. Snyk.io has a neat feature, which I haven't really played with enough yet, that lets you handle indirect patching if there is a vulnerability found that isn't easy to address. And yeah, that's about it. So to recap, just going through that long list: there are a lot of language-specific things to look for in Node.js. They're worth keeping in mind if ever you're assessing or writing an application, and it is not

just like C or Java. I highly recommend playing with and extending ESLint, and relying on vulnerability scanners for those sorts of third-party vulnerabilities. Thank you. We have time for questions and can take a couple, if there are any. I do not, because of legal, but I will post the slides so you can copy-paste the code out of there.
