
CG - 101 Things Your Application is Doing Without Your Knowledge

BSides Las Vegas · 34:14 · 183 views · Published 2024-09
About this talk
Common Ground, Wed, Aug 7, 12:30 - Wed, Aug 7, 13:15 CDT

Every time you bring code you didn't write into your application, you're possibly introducing behavior you weren't expecting. Even using well-known and battle-tested dependency libraries, your application might be opening files and making network connections without your knowledge. Come hear about some crazy hidden things we've seen applications doing, and how you can learn what yours are doing as well.

People: Mike Larkin
Transcript [en]

uh so welcome everyone thanks for coming um good to see a nice full packed room that's always good to have it's never good when two people show up um I've given this talk before this is a slight variation of the one that I gave at BSides Seattle was anyone up in Seattle and saw this talk there oh good so it's new for everyone great so talk about

is that working um so what are we going to do in the next 45 minutes well start with a quick overview and talk about application behavior um unexpected behaviors in applications and where you might find those and what they might be coming from why it happens and how do you know it's happening in your own applications and then a quick wrap up um just real quickly quick show of hands how many people would say that as part of your job you're responsible for either developing or shipping secure applications okay good how many people are responsible for breaking applications about 50/50 some people raise their hand for both breaking applications and delivering secure applications awesome

okay so a quick um quick info about me um I'm co-founder and CTO of Deepfactor we have a booth right outside um in a nutshell we make software that helps developers find and fix vulnerabilities and prioritize them in the correct order um what should be fixed first second third fourth uh based on various factors I'm also an adjunct faculty at San Jose State University in California been teaching there for a long time um I always joke with my students that the first student I get that says my mom or my dad took your class then it's time to go fortunately that hasn't happened yet but I'm at 22 years there so it'll be

happening pretty soon I think um I teach in the computer engineering uh master's degree program uh software engineering degree teach virtualization technologies software security and sometimes the operating systems course some fairly low-level stuff I'm an active open source contributor I've been working primarily with OpenBSD since 2008 I wrote the hypervisor there some device drivers power management and whatnot so in this session I'm going to show some things that we've seen applications doing when I say we I'm talking specifically about our product our product monitors application behavior and some of the things we've seen over the years are pretty crazy and I put together a list over the years kind of jotted some things down where I said

this is this is weird and strange behavior and that's what I'm going to share with you guys today um some of these things that you'll see are very questionable security wise some of them are questionable performance-wise and some are just plain head scratchers like why is the application doing this um I'll talk about how we found these things it's really no secret um and I'll talk about how these things possibly can happen and how you can fix these things in your own environment sound good awesome so you can learn a lot about how an application is constructed and built based on monitoring what it does and we'll talk a little bit more later about how you mechanically do the monitoring

um but if you watch what an application does like what API calls it makes what system calls it uses what library calls it uses you can get some pretty interesting insights as to what applications do you can also see things that I wouldn't call them hidden behaviors but the point of this talk is to talk about things that you might not know were going on so uh what API calls are used again uh what parameters are passed what files are opened and closed and whatnot many times uh these behaviors were coded into the application by the developer in other words it's what the developer wanted so for example did they really write

code to open that file or make that network connection and so on or uh sometimes these behaviors are introduced by third-party dependencies um how many people have heard that quote in that second bullet item hey I found a library on GitHub seems to do what I want to do I'm just going to use it okay show of hands how many people have done that all right good in Seattle nobody raised their hand I said so I've got a room full of liars right so of course it stands to reason that if you import somebody else's code the behaviors that are coded into their library or dependency are now your behaviors right so what could some of these things possibly do
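A side note for Python readers: one way to watch what code does once it is running inside your process is an audit hook. The sketch below is an illustration, not the tooling used in the talk. It assumes Python 3.8 or later, and `third_party_helper` is a hypothetical stand-in for a function from an imported dependency.

```python
import os
import sys

observed = []  # (event, target) tuples collected by the hook


def record(event, args):
    """Audit hook: remember file opens and outbound socket connects."""
    if event == "open":
        observed.append((event, args[0]))  # args are (path, mode, flags)
    elif event == "socket.connect":
        observed.append((event, args[1]))  # args are (socket, address)


# Hooks stay installed for the life of the interpreter and cannot be removed.
sys.addaudithook(record)


def third_party_helper():
    """Hypothetical dependency code that quietly opens a file."""
    with open(os.devnull) as handle:
        return handle.read()


third_party_helper()
for event, target in observed:
    print(event, target)
```

The hook sees the open whether the call came from your own code or from something you imported, which is the point being made here.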

well that's what we'll talk about today the moral of the story and kind of one of the important things I want to leave you with as you leave The Talk is that most of the time developers don't know that this is happening right they import something who's going to go through and vet thousands or hundreds of thousands of lines of code so now all of a sudden an application has introduced Behavior or dependency has introduced Behavior into the application that the developer just flat out doesn't even know that's going on okay unexpected behaviors logic that you coded or your team coded is what you expect to happen when your program is executed obviously but there are many things that

happen without you knowing what are some of those things anytime you leave your own self-coded control flow paths all bets are off and that's not really a Vegas uh it's not really a Vegas call out but technically that's true all bets are off um so let's look at a quick example this is the dumbest program ever can anybody make a dumber program maybe I don't know but this is pretty stupid right what does this do well it's a bash script that prints hello it clearly does no file manipulation I mean perhaps writing to standard out or something like that it doesn't use the network it doesn't manipulate environment variables it does nothing absolutely nothing whatsoever so

show of hands how many files get opened by the operating system let's just assume it's Linux for the moment how many files get opened when you execute that script one want to take a guess could be easily a dozen okay well here is it more than five how many people say more than five people say less than five how many say more than 50 now think about everything that could get open right all I'm doing is running bash and saying Echo hello what was that more than 100 well one of the answers I always like to use with my students is my favorite answer in the world is it depends it depends based on your

environment and how it's configured but it's more than you think what things get opened when you run bash echo hello well what does the operating system do assuming you're running it from another shell what happens the process forks the process execs the new bash so when you do an exec what happens the shared library loader ld.so gets invoked and looks at the header of the executable to figure out what things need to be brought into memory so that's at least the executable itself plus any libraries let's just assume for the moment that it's just libc what happens when libc gets initialized well it looks at all kinds of other stuff right so the

fact that you just have two lines of code should not be an indication that this is a simple operation actually quite a few things get opened we'll look at it in a moment the more important thing is how was this application built I'm using bash as an example here what features were compiled in or linked in does your distribution vendor that gave you bash link in all the things um they did a blog post on the uh XZ um issue from a couple months ago and looked at what was linked into uh various system programs and it's interesting that uh sshd the SSH server uh on

different distributions had between five dependencies or dependent libraries and 47 so if you have different things linked into your executable different things will happen right so strive for thinness if your distribution vendor is not linking in all the things there'll be fewer unknown behaviors do you have environment variables set that might guide bash in this case to behave in different ways there's many different environment variables that you can set that bash will look at and do different things based on what you've got these things set to perhaps opening different files or whatnot so things like this are what can result in potentially a lot of things happening in your application

that you probably don't know about okay speaking of environment variables how many environment variables do you think are read by the system or any process when bash is executed it depends it depends how many are changed it depends now we're going to actually look at it in a moment again the answer is it depends based on how your environment was configured um the more options you have linked in the more things are going to be looked at building software with every possible thing linked in as features is very risky case in point what I just talked about distribution vendors that choose to link in every possible thing into their environment and you end up with well you

end up with the kitchen sink just for fun on your own machine sometime go run ldd against some of the system services that are running what you want to see is a really short little list you don't want to see the universe like for example why do the audio APIs need to be linked into SSH right doesn't make any sense because it's a dependency of a dependency of a dependency and something needed it so we already talked about this but process behavior is affected by the environment in which it runs so for example uh you might have LD_ options set LD_AUDIT LD_PRELOAD to influence ld.so to do

different things when the application is started but uh um yeah I think we've talked about most of that already so let's go back to bash for a second so just a second can you guys see that I just want to make sure that I'm okay good all right um so this is just a WSL VM that's running on this machine um it's Ubuntu 22 um if I just do bash -c echo hello it's fundamentally the same thing as what you just saw so anybody have an idea how we can determine which files are opened because that was the question right what files are opened when I run that anyone have an idea strace perfect example there's lots

of different ways strace is super easy for the purpose of today's demonstration so we can do strace bash -c echo hello yeah I'll come back to that okay so it shows you all the things that happen right and we can do some quick grepping for opens and whatnot and you get oops I have to do it slightly differently here okay let's look for open in temp log so it opens a bunch of stuff right and if you word counted that I think I checked it before it opens something like about 40 or 50 files most of those are just locale definitions being parsed and whatnot but depending on how your environment was built here's

the answer I mean I asked how many files get opened and it's at least now 32 okay now um somebody in the front said strace -f too of course you want to make sure that you take into consideration any child processes and whatnot the number would just grow right okay that's files now we'll talk about why that's interesting or not in a moment but what about environment variables can I just use strace to figure out what environment variables get set getenv and setenv are not system calls so they end up just reading the environment block out of the process's address space so there's no getenv that you can search for here but what can we

do to see how many times getenv is actually being called we could make an LD_PRELOAD library that overrides getenv and set that and then print out some information that's too much work for a 45-minute talk we can just do this just GDB it
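Going back to the strace step for a moment, the grep-and-count can be sketched in Python. This is a minimal sketch, not the demo from the talk: the sample log lines and paths are made up for illustration, and the regular expression only covers the common `open`/`openat` line shapes that strace prints.

```python
import re

# Path argument and return value of open()/openat() lines as strace prints
# them, with an optional "[pid N]" or bare PID prefix (seen when following
# child processes).
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?open(?:at)?\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+(-?\d+)'
)


def opened_paths(strace_text):
    """Return (path, succeeded) pairs for every open/openat call in the log."""
    results = []
    for line in strace_text.splitlines():
        match = OPEN_RE.match(line.strip())
        if match:
            results.append((match.group(1), int(match.group(2)) >= 0))
    return results


# Illustrative sample only; a real log for bash is much longer.
SAMPLE = r'''execve("/usr/bin/bash", ["bash", "-c", "echo hello"], 0x7ffc0e5d2a10 /* 20 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(1, "hello\n", 6) = 6
'''

for path, ok in opened_paths(SAMPLE):
    print("ok  " if ok else "fail", path)
```

To feed it a real log, something like `strace -f -o /tmp/log bash -c 'echo hello'` writes the trace to a file (the `/tmp/log` path is just an example), and the file contents can be passed to `opened_paths`.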

right so we'll put a breakpoint on getenv and we'll just run it so how many environment variables do you think will be queried we didn't ask that it depends let's take a look well down at the bottom you see LOCPATH that's the first thing that gets queried I'm not quite sure what's querying that but you can keep continuing this LC_ALL blah blah blah blah blah blah blah blah blah blah blah lots of things lots of locale-related stuff but eventually you get to a point where you're out of the library initialization part and you're here now this looks like some kind of built-in getenv uh how do

you print the argument the arg in the register which register holds the first argument ah I gotta the answer is what's my favorite answer it depends I didn't give you enough information it depends on the architecture RDI is correct on amd64 x86-64 this is an ARM laptop what is it on ARM r0 x0 whatever it is I think it's x0 TERMINFO HOME HOME again why would you read it twice because maybe it changed in half a millisecond okay you get the idea right lots of them we could sit here all day watching this it goes on for hundreds and hundreds of things the more important thing is not hey here's what environment variables are being read the more important thing is

what's going on in the application when they're being read does anyone remember actually it's coming up here on a couple more slides uh does anyone remember Looney Tunables Looney Tunables was a CVE from late last year and it had to do with improper parsing of an environment variable GLIBC_TUNABLES so the fact that your application is reading this and using code to parse it that you've never written yourself is an indication that something possibly could be bad okay would you guys agree with the first statement any dissenters good most development organizations don't have the time to vet every piece of code that they bring in right it's not possible it just

simply isn't so this leads to these kind of behaviors that we're talking about now I've kind of given you for the past 15 minutes or so just an overview of application behavior in general and how you can monitor stuff now we'll actually look at some of the crazy things we've seen applications do sound good all right how many Java developers do we have out in the room how many Java developers that don't want to admit to being Java developers I like it how people then raise their hand um yeah I'm a recovering Java developer done Java development in the past um anyone know what JNI is the Java Native Interface it's how you call basically C and C++

code from Java um it allows you to call and pass parameters back and forth between different functions there is a common uh names changed to protect the guilty uh JNI-based tracing framework out there real super popular it writes .so files which are again you know shared library code to /tmp the JVM then loads them with dlopen which is what you do to get an .so into memory uh when you call the Java side of the framework sounds okay right until you have this now can anybody see the problem with this you're creating executable code in /tmp with mode 777 with a predictable name what could go wrong
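To make the pattern concrete, here is a small Python sketch contrasting it with a safer alternative. It is an illustration under assumptions, not the framework's actual code: the file name `tracer-1.so` is hypothetical, and the functions only create empty files on a POSIX system.

```python
import os
import stat
import tempfile


def risky_create(directory, name="tracer-1.so"):
    """The pattern described above: predictable name, mode 0777 (do not copy)."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o777)
    os.fchmod(fd, 0o777)  # force the mode regardless of the process umask
    os.close(fd)
    return path


def safer_create(directory):
    """Random name, exclusive creation, owner-only permissions via mkstemp."""
    fd, path = tempfile.mkstemp(suffix=".so", dir=directory)
    os.close(fd)
    return path


def is_world_writable(path):
    """True if any user on the machine may modify the file."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & stat.S_IWOTH)


with tempfile.TemporaryDirectory() as workdir:
    risky = risky_create(workdir)
    safer = safer_create(workdir)
    print(risky, "world-writable:", is_world_writable(risky))
    print(safer, "world-writable:", is_world_writable(safer))
```

`tempfile.mkstemp` picks an unpredictable name and fails rather than reusing an existing file, so another local user can neither guess the path in advance nor overwrite the result.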

right well if you can simply guess the next name and it's 777 you can surreptitiously place a malicious piece of code in front of the thing that's creating the legitimate code and ta-da you've now potentially commandeered control flow of the application so oops you'll see a lot of 777 today seems to be everyone's favorite number and again it's not a Vegas thing okay so again I hope it's obvious why that's bad um the more important thing again and this is what I was talking about earlier I hope everyone takes away from this that the developer probably had no idea this is happening right it's just hey I use this framework and it created a library and

invoked some code in it okay example two speaking of creating files with mode 777 again it seems to be a popular mode we see log files being created with mode 777 all the time because everybody loves executable logs right now again as a developer I'm probably not even aware this is happening as a developer I'm probably saying you know log.message something something something um I don't pay attention to these kinds of things making this even more risky is that many logging frameworks don't do proper parsing of a log call with a parameter they just simply write whatever parameter you specified into the logs so what could you do with something like this so a

kind of a contrived example is something like this um consider an application that logs some you know user-provided input you know user foo logged in so you get a log message that looks like this now what could happen if the username or the string in here was not properly parsed by the logging framework I mean in theory I could do something like that's my name my username is that now granted you probably have to find a way to somehow execute that but there's a lot of different things you could do with this right I mean now you have an opportunity for somebody who's controlling one part

of an application writing data into a log file that could then possibly be executed by something else that was commandeered from some other vector okay so executable logs I mean those are great right everyone loves executable logs bet you guys have never heard of executable JSON or executable markdown this is actually a screen capture from uh from our product when it found some things that were being added now again I don't want to pick on permissions here because we've already talked about it but the permissions errors they might be innocuous but it indicates one of a couple things it's either lazy coding or maybe a wrong umask setting or

something similar now these are usually created again with mode 777 so rewinding one slide if I have a piece of an application that's writing like a JSON configuration file with mode 777 what does this mean for something else on the machine if I commandeer a different component I can now affect the configuration of a different thing on the system right so I don't understand why this happens but it does so why tempt fate fix it so back to environment variables again for just a moment um we've already seen that environment variables are queried more often than you might think um some could be dangerous again uh GLIBC_TUNABLES is

one example uh anyone ever used LESSOPEN and LESSCLOSE anyone even know that such environment variables exist now LESSOPEN and LESSCLOSE I don't know if they're in all versions of less but older versions of less used to look at this environment variable and if it was set when you did less on a file it would actually run the command in LESSOPEN first so if you could drop that in somebody's environment then you have control anytime they run less they get to run another thing of your choice um even if you don't use these environment variables and you probably don't make direct reference to these um code that you imported might do this or system

level things uh one thing we see often in in looking at applications that are out there with our product is uh lots of debug environment variables set and lots of Secrets and keys and passwords in environment variables how many people have ever been guilty of putting a password in an environment variable and don't want to admit it yeah we see it a lot I mean this is why vaults exist right now you may say well you know if I set it in an environment variable it's only readable by that process right anyone know the answer to that last question what is the security boundary for an environment variable on a

machine oops get out of that one oh

sorry guys see that okay okay I'm right now uh PID 366 and that is my shell right here you guys know that you can read environment variables across processes as long as you're the same user right a lot of nods some shaking their heads so yes you can the fact that I own process I don't know 401 here I can go in to /proc/401/environ and I can read its environment now that's not a new revelation that's been like that forever but people tend to think that oh if I put something in an environment variable it's secret it's not you can generally read across things
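For anyone who wants to try this on their own Linux machine, here is a short Python sketch of the same check. It assumes a Linux `/proc` filesystem, and the list of "secret-looking" name fragments is an arbitrary assumption for illustration.

```python
import os


def parse_environ(raw):
    """Split a NUL-separated environ block into a dict of name -> value."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, _, value = entry.partition(b"=")
        env[os.fsdecode(key)] = os.fsdecode(value)
    return env


def read_environ(pid):
    """Read the environment a process was started with (Linux /proc only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def secret_looking(env, markers=("PASSWORD", "SECRET", "TOKEN", "KEY")):
    """Names that look like credentials; the marker list is an assumption."""
    return sorted(name for name in env if any(m in name.upper() for m in markers))


# Made-up environment block in the same format as /proc/<pid>/environ.
example = parse_environ(b"HOME=/home/me\0DB_PASSWORD=hunter2\0EMPTY=\0")
print(secret_looking(example))
```

Calling `read_environ(401)` as the same user that owns process 401 would generally return that process's starting environment, which is why a password placed there is not private to the process.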

right okay talk about that here's a good one what do we

think all right I'll give you some hints it shouldn't but can it the answer to this one is not really it depends I mean I suppose it kind of is it depends but it definitely can happen how well we did see an environment where we've had auditing libraries that were left enabled and they were set with you know LD_AUDIT or LD_PRELOAD pointing to some particular library that had library initialization code in it library initialization code that actually did a network connection um so when you have that set /bin/cat effectively made an outbound network connection every single time it ran it's worse than that because it was set systemwide every process made an

outbound network connection to do phone home sounds bad right it gets worse what is this phone home operation doing well the way this thing was built is that it did an HTTPS connection and then used uh like a JSON parser with REST to post a request to some update server to see whether or not there was a new version of the software available so what does this mean this means you have a library that's now injected into every process in the system which contains OpenSSL a JSON parser and an HTTPS and a TLS um mechanism and that's really good right because we know that none of those things ever have bugs right yeah so again case in point

developer had no idea this was happening so you have code that just simply gets injected and does random stuff speaking of phone home connections in our internal testing we found a popular dashboarding product that makes phone home connections probably again to do update checks once a second because you really need to know within one second if there's a new version of the framework available right um it also does this check to every single IP address returned from the DNS query so it does a DNS lookup on myupdateserver.foo.com it gets back 20 things half of them IPv4 half of them IPv6 and it tries every single one once every

second now typically like a WAF or something can help stop that but that's just bad behavior right um the other problem with this particular framework is that if you looked inside the code it's open source looked to see what it was doing it basically just blindly trusted a URL that was given to it and says go download this and install it can anyone think of I guess very recently some code that got automatically installed and caused a lot of problems yeah so yeah this is uh not good stuff um speaking of network connections um our product lets us correlate uh different network events over time so basically we can say like um I saw your

code make an outbound network connection to IP address you know 1.2.3.4 but it previously had not obtained that through a DNS lookup so basically what does that mean it's a hard-coded IP address somewhere um this helped us uncover in some uh customer environment a connection in code to a fixed IP address um coming from a dependency in the application of course the developer did not do this does connecting to a hard-coded IP indicate a security issue I don't know maybe it could be command and control from some hijacked library um it could just be a lazy developer either way it should be investigated the point is how do you even know it's

happening right what about this one anyone ever guilty of doing srand(1) or srand(0) or srand(time) come on um yeah we see this being called all the time now there's nothing inherently wrong with these APIs except if they're called in certain contexts so the question at the bottom of the slide is asking when is this kind of okay and when is this really bad anyone know the characteristics of the APIs that are listed on the top there what was that uh they're global yeah um but they're deterministic right so it's a deterministic random number API so when is that okay maybe test programs or something it's probably fine when is it

bad any cryptography context and we see this on occasion um it's not good um but yeah it's like where did you come up with this code right so you either were reading a textbook from the 1970s or 80s when this was like the only way to do it uh or you went to Stack Overflow and said how do I generate a random number right and cut and pasted it into your code um ancient functions like this uh that have better versions are still being used so we talked about rand and srand um no-bounds-check string manipulation operations we see these all the time a lot of this is in legacy code
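The srand point carries over to other languages. The sketch below uses Python's `random` and `secrets` modules as an analog of the C calls discussed above, not the C API itself; the function names are made up for illustration.

```python
import random
import secrets


def seeded_sequence(seed, count=5):
    """Deterministic generator: the same seed always yields the same numbers."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(count)]


def session_token():
    """Token drawn from the operating system's cryptographic random source."""
    return secrets.token_hex(16)


# Fine for a repeatable test run, since both calls produce the same list.
print(seeded_sequence(1) == seeded_sequence(1))

# What a key, token, or nonce should be built from instead.
print(session_token())
```

A fixed seed is useful exactly when you want repeatability, such as in a test program. Anything an attacker must not be able to predict should come from a cryptographic source.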

it's getting better um but why there are better versions of all these things available so you should use them and I'm not saying you should use them I'm saying the people writing the dependencies that you're bringing in should use them so why not spend a few hours and clean up this code okay almost done so what are some of the reasons why this happens and this is just kind of you know my own personal opinion here I think we see these issues for several reasons um increasing use of third-party libraries I mean nobody writes every single line of code in their in their app and the like it or not the the rate the the velocity at which software is

developed today lends itself very well to going to GitHub and finding the library that you need um there's nothing you can do to change that um lack of time for developers and security teams to do proper vetting another issue is let's say you had infinite time and your development manager said I want you to go and personally inspect all 50,000 lines of code you're importing are you going to always have that same 50,000 lines what happens when version two of that library is released next week right you're going to have to go do the same thing again so there's just a lack of time time is money and

you can't even conclusively do this so you have transitive dependencies dependencies that bring in other dependencies and so on um it's just really difficult and then lack of knowledge by some developers just even about security risks I mean it's kind of why we're all here right um junior developers sometimes aren't trained in this mindset um so it's tough now I have some ideas on how to fix these but again they're just my own personal ideas here's some suggestions um strive for thinness whenever you can um if you're building software for Linux choose distributions that have uh thinness as a mindset um Alpine's a really good one right they strive

for very very thin software um don't link in all the things just because you can um don't say I might need smart card logins in my app someday so I'm going to link the smart card library don't do it until you absolutely need it uh use tools to periodically audit process behavior and APIs that are being used um the only way to know what's happening is to actually look and see so we just touched on some ideas today I mean there's commercial tools like what we offer but there's other open source tools out there as well to help you check these things um compare current behaviors to previous behaviors look for drift make

sure things aren't changing without your knowledge and uh coach you know coach Junior developers about the importance of this um you know there's no shame in in in saying you know we shouldn't be doing it this way we should be doing it that way and I guess you know try as best as you can to vet imported code and then short of that I think you have to write everything yourself which is not really going to work all right so that's what I had um we've got some time um I can give everyone back their 15 minutes and if you have some questions come on up at the end uh otherwise uh we're right outside at our at our booth if you have

any questions about what we do or want to talk about this more hope this was entertaining and interesting um I find it I found it writing these things down over time helped me understand what apps are doing a little bit better so hope it did for you as well thanks thank you if you have a question please raise your hand so I can bring the mic over so everyone can

hear a sound check all right this is really good stuff thank you so in my position we have like an EDR application that we use that flags a lot of this stuff for us right like packing an encrypted binary or a weird network connection or stuff like that right but sometimes it's a 5-minute fix like I can ask somebody and they say oh that's not supposed to happen or oh that's this thing right but sometimes it's just a legacy development team in another part of the country that has no idea what they've inherited from 5 years ago 10 years ago whatever and they just say I don't know but I don't want to worry

about that I don't have time so we have to make a call of do I really care enough to do like a deep dive investigation into 30,000 lines of C code to find where this is coming from or do I just block or ignore it right 'cause it's probably not from 10 years ago exactly right so I guess what words of wisdom do you have about when do you make the call about when something is worth looking into and when do you just slap the firewall on turn it off and say all right we're going to forget about it you're going to love my answer it depends yeah I don't think

there's a one-stop shop answer for that um it really does depend on the application depends on the criticalness of it is it a critical line-of-business piece of functionality in your business or is it just this you know tool that's off on the side somewhere um I don't want to kind of dodge around the question but it's tough and I wish I did have an answer um no it's a case-by-case basis unfortunately I wish I got paid to look at this stuff right like that's the fun part of the job right is finding this stuff but yeah it's interesting when you bring up

an application a lot of this stuff we find just like testing Open Source Products it's just like what are they doing right it's like what is going on here so great question yeah cool looks like that's it I I'm happy to hang around up here in the front if you want to ask a personal question or something that's off camera no worries thanks everyone [Applause]