Observability For Pentesters - Rory McCune

BSides Dublin · 28:10 · 122 views · Published 2024-06

Time to go. Thanks for coming along to this talk. So essentially, one thing you might have noticed if you've done a lot of stuff around cloud native, or a lot of other areas of IT, is that you are seeing increasing mentions of the phrase observability, and the idea of observability is becoming much, much more used in the general IT world. It's interesting that in some of the earlier talks today we've seen some of the terms I'm going to use in this one used differently, because these are general terms; they're used in different ways in different places. But this is observability in the general IT sense, and when anything new comes along in IT

(I was a pentester for a large number of years) my first thought is: how do I break this? What can I do with it? What might be useful to me when I encounter it? But with observability another thing is: how does it work? Because if we know how it works, we understand where its weaknesses lie, or if we're in defensive security, we understand how we can implement security on it. And that's basically what this talk is about: a bit of an introduction to what observability is in this general sense, how we can break into it, and also a bit about how we can use it for our own purposes as well.

Who am I, and why am I giving this talk? I have been a security person for a decent number of years now, somewhere over 20. I've done a lot of things like threat modeling in organizations, I've done a lot of pen testing, and these days I am a senior security advocate for a company called Datadog. If you've not come across Datadog, we are a large provider of observability and security systems, and whilst we're not talking about the commercial side of things at all today, this is how I got interested in it, because I started getting into observability and seeing how that works, and that made me think: well, how

does this work with security, or not with security? So that's where this talk comes from. I do various things in cloud native and security: for my sins I help audit some CIS benchmarks and implement those, and I do some stuff with Kubernetes and the CNCF. So first up, what are we going to do? We're going to talk about what observability is, because before we understand how to break something we need to understand what it is and how it works. Then we'll talk about attacking observability: how might you attack observability solutions, and how do they secure themselves, or not secure themselves, as the case may be? And then using observability in pen tests: I want to talk a bit about

how I think we can actually use these tools to improve security, as well as how we can attack them. So let's get started: what is observability? This is an industry buzzword, and as a result there are lots of people with lots of definitions; indeed, some of the talks today were using observability in a different way. So we need to pick a definition that works for the purposes of this talk, and this is one. It comes from the OpenTelemetry website. OpenTelemetry we'll talk a bit about later on, but it's a prominent project in observability, and this is how it defines it: observability lets us understand a system from the outside by letting us

ask questions about that system without knowing its inner workings. Furthermore, it allows us to easily troubleshoot and handle novel problems, or "unknown unknowns", and helps us answer the question "why is this happening?" This is not a security product; this is a general IT product. But as a security person, when I read this I thought it sounds absolutely fantastic. It sounds like what I've been looking for, because we often sit outside systems, right? We're not generally allowed inside the code; we're generally sitting outside looking in. So if this thing lets us understand how these systems operate from the outside, that's really useful. And what are attackers apart from

unknown unknowns? We don't know what they're going to do, and we don't know when they're going to turn up, because if we did we'd already have tried to stop them. So if making systems observable helps us detect unknown unknowns, that again sounds pretty useful for security. So that's interesting, and that is one way of looking at observability. So how does it work? That's a buzzword, that's the term; what actually is it? When you put observability into practice in an IT environment, what is it likely to look like? The observability world tends to have what they call the three pillars of observability. These are the kinds of telemetry you gather in order to make a system observable. The first

two of these I'm pretty sure everyone is going to be familiar with. We have log information: systems generate logs, applications generate logs, and they talk about something that's happened, an event that's happened in the application, a change in state, something like that. Lots and lots of systems generate logs; if you've ever used a system you've probably looked at logs. Then metrics: any numeric value that changes over time, so anything you can apply a numeric value to that changes. If you've opened Task Manager in Windows you've used a metric, because CPU usage is a metric. But it can also be other, more abstract things, like how many users are

there in my application, how many login sessions have I got per minute, how long is the queue in my message queue. So anything you can put in numeric format is a metric, and I think that's again pretty familiar to everyone. The last one here is distributed tracing, or traces, and here's where I think we stray into stuff that maybe in security we don't deal with that much; it's a fairly new or unknown concept. The idea of distributed tracing comes from modern microservices apps. Say I've got a cloud native app, maybe an e-commerce shop that has 20 microservices. If a user's checkout session fails, how do I know which invocation of

each of those 20 microservices was tied to that user's checkout activity? The answer is you use distributed tracing: you instrument your code at the application level and you pass an identifier that goes from microservice to microservice and lets you trace the full execution of an application call. Now, this is designed for troubleshooting and performance management purposes, but I'm sure if anyone is a pen tester or a security analyst, it would be very handy to know exactly how these things all correlate to each other, and to be able to trace execution from one microservice to another and know which ones are used. And this is something which is getting a lot more

traction generally; we've seen distributed tracing really take off over the last couple of years. And then the last part of observability: all of this would be useless without correlation. Because if I just take a massive pile of logs from all my containers, my Kubernetes clusters and my cloud systems, a massive pile of metrics and a massive pile of traces, and I don't allow them to be correlated, it's not going to be a fast resolution to problems, is it? We're going to be delving through these stacks. So all this information needs to be correlated for the system to be observable: we need to have metadata and tags applied to each thing that's generated, so that when it's presented in the back end we can

easily query it. And that is, at a high level, how you would make a system observable. Very high level. So what might that look like from an architectural standpoint? It generally looks a bit like this. We have a set of applications and hosts, all of which are generating telemetry: traces, logs, metrics. They then either push or pull that information to collectors; in some cases the information is pulled in by the collectors or agents, and in some cases it's pushed in, and that matters for security in a way I'll detail a little bit later on. So we're pushing or pulling information into collectors. If you're really, really old you'll know the phrase ETL, or extract, transform,

load; these are essentially ETL. You extract data from systems, you transform it in certain ways, and then you load it towards back ends, and the back ends will usually have some form of UI where you can query, make dashboards and run reports. So a basic architecture for an observability system can be fairly straightforward; obviously if you've got massive numbers of applications it's a bit more complex. So with that we have some idea of what observability is, what it constitutes, and a basic architecture. But the question is then: what products am I going to see? If I'm a pen tester, that's useful, but I need to know what products I'm going to see so I can go and look

at the man page and find out how it works when I encounter it. For this we can turn to a thing called the CNCF landscape. If you haven't heard of the CNCF, the Cloud Native Computing Foundation, they are part of the Linux Foundation and essentially they handle anything vaguely cloud native. They have a thing called the CNCF landscape, which basically tries to gather all of the products and services which are part of cloud native. This is not the whole CNCF landscape; this is just the part of the CNCF landscape related to observability, and even then we have this massive pile of logos. These are a combination of

commercial tools and open source tools. Obviously there's no way to cover that level of detail, because each one of these is quite a complex project. So for the purposes of this talk, and just to cut it down a little bit, we're going to focus on the open source ones. These are all open source projects that the CNCF essentially fosters, creates communities around and promotes, and here are the core ones. Of these, Cortex, OpenMetrics and Thanos all kind of roll up to Prometheus, so that only leaves us with four to talk about, which is a bit more of a manageable

number, so we can look at those and see how they all work together. The first one is Fluentd. Fluentd you'll see quite often: if you use any kind of cloud-based logging system you'll see Fluentd or Fluent Bit, two different versions of the same thing. The idea here is that it's an open source collector of logs, so it will pull in all the logs of your systems, potentially apply transformations to them, and then send them on to back ends. Actually, the new version also does metrics and traces, kind of confusingly, but traditionally it's a logging system. So it's one of the things you'll see quite frequently.
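
To make the collector idea concrete, here is a minimal sketch of the extract, transform, load flow described above, including the correlation tags mentioned earlier. It is not Fluentd's actual API or configuration format; the function names (`extract`, `transform`, `load`, `run_pipeline`), the tag keys and the sample log lines are all hypothetical and only illustrate the shape of the pipeline.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Extract step: turn raw log lines into records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield {"message": line}


def transform(records: Iterable[dict], tags: Dict[str, str]) -> Iterator[dict]:
    """Transform step: attach correlation tags and a crude severity level."""
    for record in records:
        enriched = dict(record)
        enriched["tags"] = dict(tags)
        is_error = "error" in record["message"].lower()
        enriched["level"] = "error" if is_error else "info"
        yield enriched


def load(records: Iterable[dict], sink: Callable[[str], None]) -> int:
    """Load step: serialise each record as JSON and hand it to the back end."""
    count = 0
    for record in records:
        sink(json.dumps(record, sort_keys=True))
        count += 1
    return count


def run_pipeline(lines: Iterable[str], tags: Dict[str, str],
                 sink: Callable[[str], None]) -> int:
    """Chain the three steps; returns the number of records shipped."""
    return load(transform(extract(lines), tags), sink)


if __name__ == "__main__":
    # Made-up log lines and tags; print() stands in for a real back end.
    sample = ["checkout started", "", "ERROR payment declined"]
    run_pipeline(sample, {"service": "checkout", "host": "web-1"}, print)
```

The tags are the part that matters for correlation: a back end can only tie a log line to a metric or a trace if every record carries the same service and host metadata.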

So Jaeger is a back end; in the diagram earlier on we had back ends, the UI where you go and look at things. Jaeger is a tracing back end, so if on a pen test you encounter Jaeger (we'll see one later on), essentially you can see all of that trace information I talked about. Jaeger is quite commonly encountered as well. Prometheus is all things metrics: Prometheus and the Prometheus metrics formats are all about gathering metrics data from systems. Notably, when I was talking before about some things pushing and some pulling, Prometheus pulls in metrics, so any system that wants to send metrics to Prometheus will typically expose an endpoint and put all its metrics on that endpoint, so they'll all

be visible and available, depending on how well they're secured, which we'll talk about in a second. And then the last one is OpenTelemetry, and this is an interesting project. It's currently the second biggest project in the CNCF after Kubernetes. You'll probably encounter Kubernetes in most modern architectures; OpenTelemetry is the second largest project. The idea here is that if you want to implement tracing across all the different programming languages, you need to have a common language and a common implementation that works together. OpenTelemetry provides SDKs in lots of different languages, and you can essentially drop these into your application and get tracing automatically. And I'm seeing

this turn up in a lot of places. Systems like Docker, Kubernetes, containerd and etcd all embed OpenTelemetry, and a lot of applications are now embedding OpenTelemetry, so the applications will automatically produce traces and send them on for observability. They also have a thing called the Collector, which serves the purpose of the agent, doing all the management of information. So those are our projects and what we've got there. The question, of course: we've now got a stack of projects, we've got a stack of things they do, and we understand why they're doing it. Why have IT teams implemented this stuff? It's because they want to improve the observability of their systems. But what

does that actually mean for security, and if we encounter these things as security people, what are we going to notice about them? The first thing to think about is threat models. One of the things I think is most important in both open source and commercial software security is understanding the provider's threat model. They say "we're secure", but what's their threat model? What do they consider a problem, and what do they not consider a problem? Because it could be wildly different to what you consider a problem. They might say, quite honestly, "we think we're secure", but they don't have your threats in mind. They can't, because they've got hundreds

or thousands of users. If you run an open source project, you've got no idea who's using your software; it could be anybody. So the idea that your threat model will be the same as theirs is dodgy. Prometheus went through a process where they had a security review, and after that they decided to create this page on their website which details their threat model. This is very useful for us as security people, because we know what assumptions they're making. However, as security people, note this is one of their assumptions: it is presumed that untrusted users have access to the Prometheus HTTP endpoint and logs. They have access to all time series information contained in the database, plus a variety of operational and debugging information. That's nice, but if you're in

a high-security environment, do you really want anyone who can get to a service at a network level to have access to all time series database information and configuration information? Possibly not. So it's very important, when you're implementing or when you're pen testing Prometheus, to know that that's their default assumption. If you've taken it out of the box and just implemented it without changing the configuration, that's what you get. And let's be honest, anyone who's looked at how large new systems have been implemented will find default configurations. Security defaults are the most important thing out there, because most people don't change the defaults. So if the default is that, then that's probably what you're going to

find. As for each of the other projects, some of them have threat model information published and some don't. Hopefully more will, because I think it's very, very useful: you can read one page and get all that information, rather than having to dig through the docs to find out what they've actually done. So, key controls that we might want to see here. We're talking about gathering, transforming and transferring lots of information about how systems operate, so we might want to know about encryption. We would want the data to be encrypted so that it can't be sniffed, and so that we've got some validation on endpoints. Typically, no, not by default. It's often offered in

observability tooling, but it will not be turned on by default; it's going to be clear-text HTTP or gRPC all the way, as those are the two protocols that are primarily used. If you've not encountered gRPC, it's fairly common in cloud native systems. It's a binary serialized protocol, so not as fun to pen test, but not encrypted unless they actually add it. Typically what I've seen is that open source tooling does not put on encryption by default. If they do, it will be with self-signed certificates, so again there's no validation, and you might be able to land in the middle and attack that. The other one, obviously, is authentication. For the back-end systems, the things where

we're going to do analysis and information gathering and actually look at what we've gathered, we would want authentication on them. Do we have authentication? In Prometheus it's supported but not enabled by default, so it's not on there. Jaeger, which is a back end, has no built-in authentication. Their stance is that you can implement a proxy in front of Jaeger and do the authentication using that proxy. So again, as a default, that means if you roll out Jaeger, you're rolling it out without authentication. And if you encounter it as a pen tester, in all likelihood, until after the first pen test report that points out this was accessible, there won't be an authentication proxy in front,

because authentication proxies are quite annoying to implement a lot of the time. But that is again important to know about: that's the default. So what do we get out of this? It's cool to say okay, we've got these observability systems and they are going to generate a whole lot of information, but what can we actually get out of it? The answer is essentially there are three things you're likely to get: metric information, trace information and log information. So all the different sources of information that are gathered by these systems might be accessible if you get the opportunity. On to pen testing Prometheus: how do you actually do this? Let's talk about

practicals. If you're a pen tester, what are you looking for? You're looking for two ports, 9090 and 9100. 9090 is the standard Prometheus endpoint. 9100 is an interesting one: it's a thing called the Prometheus node exporter. Prometheus can do metrics for hosts, Linux VMs, and the Prometheus node exporter handles that for you. I was doing a dry run of this talk a little while ago and someone said, "Oh, you mean like SNMP?" and I was like, yeah: it produces metrics about hosts, but instead of pushing that information, you pull it. So what does it actually look like? Let's do a demo and actually show you what it looks like. So I have

a go. I've got node exporter, and node exporter looks a bit like this. This is Prometheus node exporter; if you hit port 9100 you'll get a page that looks identical to this. Interestingly, if you search the internet for this, using that port and the string "node exporter", you get about 700,000 results. These are all unauthenticated, because there is no authentication on node exporter. So there's about 700,000 on the internet; people are using this software quite extensively, and it is out there. So what do you actually get? Well, if you go in, you get basically all of the metrics in Prometheus format. That's small; let's make it a bit bigger. And there's lots and lots and lots.
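
As a rough illustration of working with that Prometheus text format, the sketch below parses metric lines into name, labels and value, then lists the distinct values of one label. The helper names (`parse_metrics`, `label_values`, `fetch_metrics`) are hypothetical, the sample text is made up for illustration (the metric names follow node exporter conventions, the values do not come from a real host), and the parser is deliberately simplified: it leaves escape sequences inside label values as-is. `/metrics` is the node exporter's default path; a deployment may have changed it.

```python
import re
import urllib.request

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up sample output, for illustration only.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_network_receive_bytes_total{device="eth0"} 123456
node_network_receive_bytes_total{device="lo"} 789
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 5.0e+09
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # Ignore anything this simplified parser cannot read.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def label_values(samples, label):
    """Distinct values of one label across all samples, sorted."""
    return sorted({labels[label] for _, labels, _ in samples if label in labels})


def fetch_metrics(host, port=9100, timeout=5):
    """Fetch the metrics page. Only use against hosts you are authorised to test."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(label_values(parse_metrics(SAMPLE), "device"))
```

Filtering on a single label keeps the output short enough to read, which matters when an exporter exposes a very large number of metrics.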

Lots and lots, and each of these is essentially a metric that provides some information about what's going on in the system. These are all the defaults; you can add more. If I start thinking as a pen tester, I might want to know stuff that's going on in this host. I might want to know about device info, for example: what devices are in use on the host. So you can do something like search for "device", and then we start seeing things like device eth0. So now I know this host

has eth0; I know the interface name, which could be useful depending on what else I'm trying to do to the system. We can also go all the way through and look for other things. Another thing, having seen some of the talks earlier on about attacking systems and trying to be stealthy: one of the things you can get out of this is how heavily loaded the CPU is. So if you're attacking a host, say you're doing a red team, and you don't want to risk denial-of-servicing it, you can keep an eye on how the CPU is doing and find out whether you're

perhaps loading the system too heavily and want to back off your attacks. Or, if you're trying to denial-of-service it, and maybe denial of service is in scope for your review, you can watch the CPU and see it's getting higher and higher, so what I'm doing is working. You can work out whether your denial of service is actually doing what it should, which is kind of cool. Of course, this is an awful lot of information. You could go down this for a very long time, and you probably don't want to read all of it. So, in lazy pen tester fashion, I wrote a very quick piece of code that basically

extracts all the device information. So these are all the devices on the host: we've got all of the disks, the file system devices and all the network devices. And here's my obligatory 2024 mention of AI: AI helped me write this code. From what I've seen of talks this year you have to mention AI at least once, so I will say a nice website called deb. did a nice job of helping rewrite this program. I put it up on GitHub if anyone actually wants it. Basically, you know, this isn't going to be a high in a pen test report, but

this is pretty useful information. It's all the devices on the host, all the file systems, all the networks, and the exact version of the kernel running; and pretty much you can tell this is going to be a WSL system, as it actually says WSL in the uname string. So you get a decent amount of information disclosure, and that's essentially for free because this is not authenticated. And then if you hit an ordinary Prometheus on 9090, you'll get all of the information about how it was built, versions, all that sort of stuff. And remember the Prometheus threat model we talked about earlier: this is by design. They expect untrusted people to be able

to get this. So if someone puts this in, this is by design; if you want to add authentication, you need to consider that as an implementation detail, not as something you're going to get out of the software. So it's an interesting point to know. And yeah, you get all that sort of stuff. Let me see if there's anything else in here: command-line flags, configuration, all this sort of stuff, file path information. A lot of stuff that, if you're a pen tester, could be pretty useful. Again, it's not going to be the end of the test, but it could be pretty useful information. So that's Prometheus. The other one we talked about was Jaeger, and Jaeger is

interesting, and I'm going to have to do this demo with a video, because on this host the system said it didn't want to have a fun day. So let's go and get my Jaeger video. Jaeger, as I mentioned, is a web UI for looking at traces. Every application in this environment has sent its trace information into Jaeger, so that information exists in Jaeger. Jaeger, as we've said, doesn't have authentication, so if you can find the Jaeger interface you can get access to all the information. So what might that include? Well, there's some interesting stuff we can get. For starters, we have this service list here, and that gives me a list of

every single service running on the host, or rather every single service, not just those running on this host, that's sending information to this. So there's the start. This is a little e-commerce shop, so I know they've got a cart service, a checkout service and all this other good fun stuff. That's quite handy; I do like that. What else do I get? Well, we can go and look at traces. So we look at the front-end web service, which is one of our services, and we can say show me the traces, show me the trace information. These are all the traces, and this is what a trace looks like. It basically says what call was

made to the application by a user, and also what microservices it hit, so you know exactly what microservices were involved in the application call. We can expand that out and go and look at it, and you can see why it's useful for troubleshooting: if I'm trying to work out performance issues with this thing, I can see exactly how long each microservice took to process a transaction, which is kind of nice. But you can also get information like metadata out of each request. One of the things, let me see, down here; you see here, just in the middle of the screen, we've got the URL. So one of the things that's logged by

default in tracing is the URL, including all parameters set. So if someone has put something sensitive in the URL of the application (and anyone who's been a web app pen tester will have written in their reports "you have put sensitive information inside the URL"; that could be session IDs, it could even be tokens or credentials, it happens), and any application analyzed by this back end has done that, then you, as a person who can get to it on the network, could go into the trace information and get all of it: the HTTP URL, the user agent and a variety of other information. So, yeah, the HTTP URL

is the one where there's actually currently a debate ongoing inside the OpenTelemetry community about whether that should be a default or not. My opinion, obviously, is I wouldn't if I were you, because the problem is that if you put it in as a default, almost everyone will just implement it that way. Actually getting tracing into an application is really quite simple: you can literally require the library and suddenly your application starts generating traces, but it takes the defaults. So if you want to stop it recording URLs you can, but you have to know to do that, and as I've always said, defaults last. So you might be able to grab every

single URL. Now, on a pen test that could actually be quite serious; that could be a high. If I get session tokens in the thing, I get every user's session tokens; that's probably an issue and not something we want to see. So we can get that out of there. You get various other bits of information as well: errors, library versions. How bad that's going to be depends on exactly what you're doing. So that's Jaeger, another interesting one: port 16686. Not so many of those on Censys when I looked; there were I think 700 and something, so there are still some, but

nothing like as many as Prometheus node exporters; tracing, I think, is slightly less used as yet. So there's a flip side to this. I've been talking about how you break into them when you encounter them on tests, and what you can do to them on a pen test; those are some of the things you can do. But I also believe strongly that general IT observability tooling could be extremely useful for security teams, and it's something that we should be talking to our general IT colleagues about. There's a couple of basic examples I'll give for that. First up, threat modeling. If you are threat modeling an application and you are trying to work out what you need to tell people, one of

the first things you'll do is say "show me an architecture diagram; show me the diagram of how this system works so I can do the threat model." And a lot of times they'll give you documentation that hasn't been updated in however many months; it's not up to date, because documentation often lags. One of the things you can get out of tracing is an architecture diagram built on live traffic, and you get that for free as soon as you implement tracing, because it knows how each microservice talks to each other microservice. So any observability platform, obviously OpenTelemetry, but anything which implements OpenTelemetry or similar

application tracing will be able to give you this kind of architecture diagram. Now, if you're threat modeling a system this is amazingly useful information. If I'm thinking about firewalling, I now know what service talks to what service, so I know what Kubernetes network policies or network firewall rules to recommend. If I'm thinking about authentication and encryption requirements, I know what service needs to authenticate to what service, because I can see it on the architecture diagram built out of live traffic. So that's extremely useful, and I really like that idea for threat modeling. And then the next one, on pen testing: well, it's kind of basically what we talked about

before, but think about it from the other way around. If anyone has ever been a web app tester and has done black-box testing, there are few things more frustrating than sitting hammering an application, looking at 503 errors or some other error, and not being able to work out exactly what's going on. You know there's a problem, you know there's something there, and you can waste a lot of time, a lot of customer time, basically trying to work that out so you can write it up in the report. If you had access to observability tooling, you could go and look at the traces, and the traces will tell you exactly what's going on. They'll tell you what microservices you're

hitting, and they'll probably tell you errors, because typically these things will be implemented with error logging as well, so the error logs will also turn up correlated with the trace information, depending on how they've implemented it. And also, not in OpenTelemetry yet (I say yet because that might change), but right now if you're using some commercial observability tooling, some will actually tell you about security attacks: what attack string was detected or not detected, and what SQL queries were generated as a result. If you've tried to do SQL injection testing, you'd have the opportunity to actually say: hey, I know exactly what's happening. When I write

the report, I can tell you exactly what microservice has got the bug, I can tell you exactly what the problem is, and I can tell you exactly where you need to go to fix it. Some of them even integrate with code repositories, so you can go to the actual source code straight from the error message and say, hey, I know exactly what's going on. Now, in terms of report quality and giving the customer value for money, I'd say that being able to give them that, rather than "I saw a 503 and I kind of fuzzed it a bit but I couldn't really get anything more than a 503", is a pretty good outcome. So, to bring that to the

conclusion: observability tooling, if you're not seeing it in your environments yet, you will see more of it in general IT. I'm seeing a lot of places place more focus on this as an industry movement; the CNCF has its own observability day now, for example. So it's getting traction amongst developers and amongst IT. We will see more of this coming, and we need to understand it so we can give people recommendations on how to secure it and what to do with it. Security defaults are often not strong. A lot of these programs are written to get up and running fast, to work quickly and to do what they need to do. They can be secured, but you have to

know that the defaults are not going to be what you need them to be; they need to be changed. And also, if there's a call to action from this talk at all, if you do anything or remember anything from this talk: if you're in security, go and talk to your IT folk when you're back in on Monday, and ask them what observability tooling they're using and whether you can get access to it. Because I personally think that having access to the observability tooling is an amazing resource for security people, and it can also really help when you're working together with development and IT teams. So definitely, if you take one thing from this talk, it's: on

Monday morning I'm going to talk to my IT team and say "what observability are we doing, and can I get access to it?" So, great. Hopefully that was useful or interesting. My contact details are all there if you have any questions, or if you're doing any of this stuff I'd be very interested to hear from you. As I said, some of these are just ideas of things that could be, rather than things that are already being done, so if anyone's got any information, those are my main socials. You can find me on the other ones as well. And hopefully that was helpful. [Applause]