
Good morning, everyone. Can everyone hear me? Yes? Danny can, and I guess that's what matters. All right, so TJ and I are gonna talk to you about encrypted things today. I'll start by telling you a little bit about myself. My name is Patrick Perry. I'm a Technical Account Manager at Gigamon. In particular, I work on a platform there that we refer to as Insight. You'll see some screenshots of that platform a little bit later. We're not here to sell you anything, though; it's just in there to serve the talk. I'll start with this: this is my fourth time speaking at BSides Augusta, so thank you for having me back. It's my favorite conference to come to. Other stuff about me: I got a
master's degree in computer science. In particular, my thesis work was on breaking a popular cryptosystem that's used to store fingerprints. This was work that was done at Michigan State, and in particular this is called a fuzzy vault. It sort of sounds funny, but it's kind of relevant to the talk since we're talking about encryption. Key point here: when you're typing "fuzzy vaults" into a PowerPoint really late, it autocorrects to "fuzzy balls", which is other things. Is anyone here familiar with something called the Paillier cryptosystem? Anyone? No? No crypto folks here? Okay. Well, Paillier is kind of cool. It's a cool cryptosystem, something I worked on too when I
was studying. What's neat about it, if you're not familiar with cryptosystems, is that it was actually designed for electronic voting. So what happens with it is, when you have a ciphertext that you produce of a number, you can actually do mathematical operations on multiple ciphertexts. So you can add ciphertexts together and get a new ciphertext which, when you decrypt it, actually gives you the result. Sort of a cool thing. I say that because I'm really a crypto enthusiast at heart; it's really one of the things I'm most interested in in the cyber realm. If you're looking to make friends or pick up girls at a bar, though, don't tell them you're a crypto enthusiast. It doesn't win you any points.
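To make the "add ciphertexts together" property concrete, here is a minimal toy sketch of Paillier in Python (3.8 or later, for the modular inverse via `pow`). The tiny primes, the fixed generator `g = n + 1`, and all function names are assumptions chosen for illustration; nothing here comes from the speaker's own work, and it is not safe for real use.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # this shortcut is valid because we fix g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n, using g = n + 1."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n, priv, c):
    """Recover the plaintext with the private pair (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_ciphertexts(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

# Hypothetical yes/no ballots, tallied without decrypting any single ballot.
n, priv = keygen(293, 433)
tally = encrypt(n, 0)
for ballot in [1, 0, 1, 1]:
    tally = add_ciphertexts(n, tally, encrypt(n, ballot))
print(decrypt(n, priv, tally))
```

The voting angle shows up in the last few lines: whoever combines the ballots only ever handles ciphertexts, and only the holder of the private key learns the total.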
In a past life I did a lot of IR consulting. It was blood money; I don't recommend it. I also worked on the GE CERT for a number of years, so I have a lot of experience working in the SOC and doing detection. I was a federal agent for a while. Thank you for feigning surprise at that. [Laughter] Currently, though, as a TAM I get to help customers, which is something I do like to do, some customers more so than others maybe. But also in my role I get to dabble in a lot of things, so it keeps things interesting.
[Laughter] Thank you, Patrick, I think. My name is TJ Billy. I'm also a Technical Account Manager at Gigamon Insight. This is my first time speaking at BSides; thank you for having me here. So I did comp sci in school and did some parallel computing research while I was there. I went into IR consulting as well; also don't recommend it. Current stuff: like Patrick said, [inaudible], so I get to dabble in a lot of stuff, mostly hunting across network data. You'll get to see an example of some of that in our slides. Writing a lot of code, a lot of Python, and unfortunately a lot of documentation too. All right, good? Yeah, good, you just read that. So what we're going to
talk about today: basically, if you work in the SOC or you're into detection, you know that most things are encrypted now. So as an analyst, what are you going to do about that? Is metadata something we can leverage? Possibly. We're gonna cover some use cases today for you, though, and hopefully give you some ideas on how you can do some detections and basically just work in an encrypted world. Okay, Stranger Things. Stranger Things is sort of our theme here for a few reasons. One, like in Stranger Things, with encrypted traffic evil is oftentimes right below the surface; you just can't see it. Like in Stranger Things, there are some really good examples of metadata
there, where bad things start to happen and a lot of times people don't know why, but once you are aware of specific signals, then you can find the bad thing. In particular, though, the reason we did this talk is because of our boss. He's an excitable fellow, and he was really into season 3 of Stranger Things when we started this, which kind of makes sense when you think about the fact that he was, like, a professional rollerblader. Anyone want to see a picture of our boss? Yeah.
Oh yeah, it was all my idea. Before we go on with this talk, you need to know the story of this picture. So our team comes to BSides Augusta every year and we get a big house. Last year our boss was driving the rental car to the Airbnb and speaking in his rich baritone voice, much like I'm doing right now, and out of nowhere he goes, "Oh my god, kittens!" with way more excitement than was at all reasonable. So I give you Justin, kitten lover. Yeah. All right, so let's get to it. Mostly everything is encrypted. You don't have to take my word for it, so what we have here is we're looking at
the percentage of page loads over HTTPS in Chrome by platform. The real takeaway here is that if you look back just over the last five years, encryption is on the rise. One thing that I'm curious about here, and I don't know the answer to, is that if you look at that green line, that's Linux, and it seems to be trending [inaudible]. Not sure why that is. But the takeaway is that encryption is on the rise, and if you're working in a SOC I'm sure you're aware of that. So yay, privacy, right? Well, actually, being a crypto enthusiast, privacy is really important to me. I care about it greatly, but it does certainly present challenges in the detection space. One of
the issues is that attackers actually have all the same technology, right? So they can encrypt stuff too. Command and control channels can be encrypted, and if you've ever seen something like a file called 1.jpg that's actually an encrypted RAR, then you know that exfil can be encrypted as well. That presents a whole lot of challenges for us, and in a highly encrypted environment what we've done traditionally doesn't really work anymore. So there are goods and bads to all of this. Okay, you're gonna win some, you're gonna lose some. I don't love this slide. I like the picture, but this was more for the CISSPs in the room, I guess. We're going to talk about CIA stuff a little bit, but, so just
to get us level set here, get us all on the same page: some advantages of encrypted traffic, because it's not all bad. It does protect things from prying eyes; that's confidentiality. Integrity: ensure that data has not changed. Next, that's not the A as in availability; that A is authentication, where we can verify Bob's actually Bob. Non-repudiation: prove it was actually Alice who sent something. You get the idea. But there are some cons too. We can't necessarily see malicious things coming onto the network. We can't necessarily see exfil going out, or be plainly aware that something is exfil. And don't lose your keys: availability, whatever. Nice work, TJ. So as the security practitioner, what do we do about this?
Well, we can decrypt all the things, right? And there are definitely benefits to that. All of our old Snort or Suricata rules start to work again; that's helpful. And there's certainly a benefit in that, if we decrypt everything, there are a lot more data points that we have as defenders to find evil. But there are cons that come with that as well: potential loss of user privacy. We can have a separate talk about whether or not you should be doing, like, online banking at work and things, but for the most part those are things that people do, and people want that sort of data protected, right? If you do something like decrypt all the things and you start doing certificate
management, then you know it can be a huge pain, and when things break, they break badly for everyone on the network. So is metadata-based analysis something that can work here? We posit yes. So, some pros of doing metadata-based analysis. First, SSL and TLS metadata is not encrypted, right? So if we can find interesting things within the metadata, then we can use them as potential detection opportunities, hunting opportunities, or for improving the security posture of our environment. If we're focusing on metadata, then we're talking about a significantly reduced amount of data that we need to store, right? So for the same amount of capacity, that means we can store more things for less money. And then NetFlow is still a thing,
right, but for my money it's still difficult; mileage may vary with that. Some cons with metadata-based analysis: you're gonna need infrastructure to be able to work with this, right? So you're gonna need a platform like we have with Gigamon Insight, or maybe something like Security Onion, or whatever it might be. There are lots of tools out there, right? But you're gonna need a sensor to be able to get the traffic, you're gonna have to be able to strip out the metadata, and then move it to a back end somewhere you can work with it or search it, whether it be, like, AWS or maybe you're doing something with an ELK stack
and Kibana. And finally, and this is really relevant: if you're working with metadata and it's not something that you or your analysts have done before, it's something you have to train them up on, so that's a con, certainly. Okay, so we talked about decrypting all the things and we talked about looking at just metadata. Which one do I actually want to do? Well, in a perfect world, like we said, you do both; you get the benefits of all the things. But we know that there are lots of challenges that come along with that, and frankly it's too much money and too much time. People can't, in reality, generally do both of those things. So what we're going to focus on today is
just looking at the metadata. We think there are plenty of opportunities for hunting and detection just by looking at that data that's not encrypted about the encrypted data. Thank you, sir. Okay, so I'm gonna talk about the specific metadata fields that you can pull out of TLS sessions. We've been saying "metadata" a lot; it's a very vague and abstract term, so I'll try to make it a little more clear and concrete. I am gonna refer to Zeek as Bro, stubbornly; sorry to the Zeek developers in the room. So these are the fields we're gonna go through. We'll go through them one by one and talk through what we can do with them. One thing to note: TLS version 1.3 is coming, and it will take a
hit on some of these fields and make them unusable. We'll talk about that in a bit, and we'll indicate TLS 1.3 ruining something with the big scary monster in the background. All right, so we're not gonna talk in the abstract. I actually did some research on a pretty big data set to get some concrete numbers to back up my wild claims. The data source I used was roughly a hundred billion SSL/TLS sessions covering about two months' worth of traffic, and this was traffic from about 50 organizations of all sizes and all industries, but primarily healthcare, tech, and retail. So that's where all these numbers are coming from. So we're gonna
start off with something quick and easy: the TLS version string. So what is it? Well, it's the string that specifies what version you're using. Pretty straightforward. The way this appears in a TLS session is the client suggests, "here are all the versions I support, what I can do", and then the server picks the one that should be used. No one should be using SSL v2 or v3, because they're very deprecated and have been for a while now. We should all be on some version of TLS. What can you do with the version string? Now, when we say "what can you do with it", we're talking about it in the context of a security practitioner, some
of the SOC, someone on the analyst team, and we're gonna kind of break it down, simplify it, into posture, detection, and hunting operations. So with the TLS version string, it's really mostly a posture issue. Do you have servers exposed to the internet that are running old versions? Do you have something internal that's running an old version, or are you using a tool from a vendor that only supports the old version? It's really all about posture. There may be some detection opportunity, but really all roads lead back to posture: if you detect an old server, it again goes into your posture workflow. Nothing really to do for hunting here, because
again, all roads lead to posture. This is what that string will look like if you're ripping out Bro metadata. It's pretty straightforward, just dropping the period between the one and the two. So let's get a little more concrete and talk about some numbers. Of those hundred-some-odd billion TLS sessions, we saw seven unique versions of TLS or SSL. The way I'm going to break down each field and talk about numbers, for every field actually, is this: over those two months of data, I took eight different seven-day periods and computed, in this case, the average number of unique domains I saw over those seven-day periods. So what
we're seeing here is that pretty much everyone's using TLS. There really isn't all that much SSL, just a little bit of SSL v3. No SSL v2, or, sorry, there's a very small amount of SSL v2, but it wasn't registered to domains, oddly enough. One thing I learned while doing this talk that I didn't know before I started: DTLS is a thing. It's TLS for UDP traffic. Who knew? And then of course we're not seeing any version 1.3 yet, at least not in the dataset that I looked at. So main takeaways here: like I said, it's really all a posture issue. The traffic using SSL v3 and v2 is probably
a vendor that needs to be yelled at, or probably someone with just an old box that they can't get rid of, but really there's not much to do here other than posture stuff. TLS 1.3 is not going to affect the version string, because the version is used to establish the encrypted tunnel, so it can't be inside the encrypted tunnel. Moving on: the cipher suite. This basically defines the actual mechanics of the crypto when you're establishing this encrypted tunnel. Again, the client suggests all the cipher suites they support and then the server picks one. If the two hosts can't agree, then there's no connection. Also, similar to the version string, there are posture issues here. You can look for old,
deprecated crypto libraries and get rid of those. There might be some detection or hunting opportunity here; I'll get into more detail on what I mean by "might be" on the next slide. And this is what it looks like. At first it looks like kind of an obscure string, but it's not too bad to read. So this is TLS, elliptic curve Diffie-Hellman key exchange, RSA, right? It's just defining the mechanics that are actually being used. So it looks obscure, but it's not too bad. So let's get into the numbers. Over those 100 billion sessions I saw 226 unique ciphers. So what I did then is I took those 226 ciphers and I
broke them up into five groups, each group relating to the average number of unique domains I saw for that cipher. So the way to read this would be, you know, the top line: there were 27 ciphers that were being used with over a thousand domains. That could go up really high; it could be up to hundreds of thousands of unique domains over a seven-day period. So the main takeaway here is that there's a pretty small set of unique cipher suites that are responsible for an overwhelming majority of all the TLS sessions out there, which might not be surprising to you. On the opposite end, roughly two-thirds of every unique cipher I saw had fewer
than ten domains, and those ciphers map to a very small portion of the total number of sessions I saw. Breaking that down even further, roughly half the ciphers only map to one domain or none (you don't have to define a domain for TLS), and again that's an even tinier portion of the traffic. So a small set of ciphers maps to most of the traffic, and that's why I say: can you detect or hunt on the cipher suite field? Maybe. If you're looking for malware in the network and it's using one of these, you know, two-thirds,
right, one of the unique cipher suites that isn't seen all that much, you can detect it, probably with a pretty low false positive rate. If you go hunting across obscure cipher suites, it won't be that hard to find, because there won't be that much data to look through. But if that malware is using the same cipher suite that Google Chrome on a Mac uses, you're never gonna detect it, because it's way too much data, and you're never gonna find it when you're hunting. So again, going back to "what can you do with it": maybe detect and hunt. It depends on the malware or the attacker, what they're using, and whether they're blending in or not.
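The "hunt across obscure cipher suites" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: each session is a dict with `cipher` and `server_name` keys (named after the Bro/Zeek ssl.log fields, but your platform may differ), and the ten-domain cutoff simply mirrors the bucket used in the numbers above.

```python
from collections import defaultdict

def domains_per_cipher(sessions):
    """Count the unique server names seen for each cipher suite."""
    seen = defaultdict(set)
    for s in sessions:
        domains = seen[s["cipher"]]      # creates an empty set for new ciphers
        if s.get("server_name"):         # SNI is optional, so it may be missing
            domains.add(s["server_name"])
    return {cipher: len(domains) for cipher, domains in seen.items()}

def rare_ciphers(sessions, max_domains=10):
    """Return cipher suites seen with fewer than max_domains unique domains."""
    counts = domains_per_cipher(sessions)
    return sorted(c for c, n in counts.items() if n < max_domains)

# Hypothetical data: one popular suite and two obscure ones.
sessions = [
    {"cipher": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
     "server_name": f"site{i}.example"}
    for i in range(12)
]
sessions.append({"cipher": "TLS_RSA_WITH_RC4_128_MD5", "server_name": "odd.example"})
sessions.append({"cipher": "TLS_DH_anon_WITH_AES_128_CBC_SHA", "server_name": None})

for cipher in rare_ciphers(sessions):
    print(cipher)
```

The rare suites become the hunting set; anything on the popular suite is left alone, since that is where the attacker who blends in would hide anyway.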
Again, 1.3 is not going to affect the cipher suite either, because again it's used to establish the encrypted tunnel, so it can't be inside of it. Moving on to JA3 and JA3S hashes. This is a somewhat new field that's come out in the last couple of years. For those of you that aren't familiar with it, essentially it's just an MD5 hash. But a hash of what? Well, when you go to establish a TLS connection, the client sends out a packet saying hello, and it's called the client hello. The server responds with their own hello packet, called the server hello. So the JA3 hash is essentially: take that client hello packet, rip out a few
decimal values that define basically some parts of the hello packet, some of the configuration settings, whatever you want to call it, pull out those values, turn them into a string, and hash that string. That's it. It's a pretty simple process. What it essentially gives you is a fingerprint of that client hello packet. Now, the reason that's important is that different applications tend to have different hashes. Chrome tends to always have the same hash because it has the same crypto settings and libraries that it imports and uses. And similarly for the server: the same applications tend to have the same fingerprint. Well, they do have the same fingerprint most often. So you can tell
applications apart by that JA3 hash. Not really a posture issue here, because again the JA3 hash is really just a proxy for the application that's running at either end of the TLS session. So if you want to look for an old application, it's probably easier just to go look at the running applications in your network and not look at the obscure hash I described. But there's detection and hunting opportunity here; we'll get to that on the next slide. As far as what it looks like, again, it's just an MD5 hash. Nothing fancy, same thing you've always seen before. So let's get into the numbers. I'm gonna split up JA3 and JA3S into two different sets because they're
different: one describes the client, one describes the server, and these are two very different things. So for the JA3 hash, this is the client, I saw over 200,000 unique hashes total over this two-month period. What I saw here kind of maps to what we saw for the cipher suite as well, which was that a small number of unique hashes were responsible for an overwhelming majority of the traffic, and that's what this chart's trying to show you. So down here at the bottom, this is the top X hashes by total session count, and then over on the y-axis we have the percentage of SSL or TLS connections. So main takeaways here, or
how to read this, is: the top 10 JA3 hashes by session count, right, the top 10 most popular, were responsible for 36% of all the sessions. Moving down the list, using nice round numbers, the top 50 hashes are responsible for 71%, and the top 100 hashes were responsible for 83% of all the sessions. Why is this important? Well, just using a modest amount of intel work, right, going and looking up 100 hashes and saying what applications these map to, that would give you a significant amount of enrichment on your data, right? That'll let you know the application that started a TLS connection for, I mean, based on these numbers, 83% of the sessions on your
network. So, I mean, that's a pretty big gain, right? I can go look up a hundred hashes; it might take a few hours to script something up, but you can do it, and I did do it. So just to run you through the quick and dirty intel process I threw together: you can take a JA3 hash and go throw it in a public JA3 hash database, this is ja3er.com, and what that'll do is give you user agent strings that have been seen associated with that hash. You can then take that user agent string and plug it into a UA lookup database (I used useragentstring.com), and that will give you the operating system,
operating system version, application, application version, blah blah. All right, hopefully you've done this before. And now, in two API lookups, you've gone from an obscure JA3 hash to a readable application string. So I did this on the top 100 hashes and I got results for 51 of them. Now, I just did this on one JA3 database and one user agent string database, so I could go look for more JA3 databases and probably bump those numbers up without too much more effort. Now, of course, there is a trade-off here, right? The top 100 gave me 83 percent of the traffic, but you might have seen that it started to taper off a little bit. You're gonna
just draw the line somewhere, because your rate of return is gonna decrease as you increase that number of hashes. But you can start with a really reasonable number and probably call it good; that's the main takeaway here. And it's not too much Python, right? I think this was maybe, like, 20 lines of code, max. And then once I have that enrichment for 83 percent of my traffic, I can start looking at the things that I couldn't identify: what are they, why couldn't I identify them, and just figure out what they are, and maybe find some malware that's doing some sort of weird crypto thing. Or maybe I'll go looking for known JA3 hashes, but
you know, things that would be weird, like PowerShell or some LOLBin that's living off the land, just some native binary basically that's talking out to the internet that probably shouldn't be. For example, WMI: if you're not using WMI to talk to the internet and you see it, that might be something worth checking out. So that's the JA3 hash. Moving on to the JA3S; again, this is the server's JA3. I found 424 unique JA3S hashes. I broke the table here down similarly to the table that I did for the cipher suite, and for those of you who are maybe noticing, the numbers are kind of similar to the cipher suite,
almost identical actually. I broke them down the same way, right: I took all the JA3S hashes I found and split them up into five buckets based on how many unique domains they map to, and we get roughly the same numbers that we did for the cipher suite. That kind of makes sense when you think about it, because the cipher suite is just a setting that the underlying application picks, and the JA3S hash is a fingerprint of the application, so it kind of makes sense that they would correlate. And the findings are also very similar: a little over two-thirds of all these hashes map to fewer than ten domains and a tiny
portion of the sessions, right? This should look familiar now. And then 40% of the hashes map to just one domain or zero, and an even tinier portion of the sessions. So what can you do with JA3 and JA3S hashes? Well, this is why I say there's definitely detection and hunting opportunity here. The JA3S hash is basically just a more robust version of the cipher suite, where there was a "maybe" on detection and hunting, right? And for JA3 hashes we can definitely go hunting, because we can get known enrichments for the hashes behind 83% of our sessions. Take all the web browsers on all the popular platforms, figure out
what their JA3 hashes are, and whitelist them. That's gonna cut your data set down to a much more manageable size, and then you can start doing hunting activities. Also, we found more than 200,000 unique JA3 hashes, so if you find a piece of malware that has a unique JA3 hash, it's probably gonna have a really low false positive rate in the detection scenario. So there's a lot you can do here with JA3 hashes. They're not the magic bullet that I certainly thought they were when I first heard about them, but there's a lot of stuff you can do with them operationally. And TLS 1.3 is
also not a threat here, because again this is just fingerprinting the session establishment, basically, so it can't be within the session. Now, moving on, this will probably be a bit more familiar: the server name indication field, or SNI. It's just a domain. It's how the client specifies what site, what application, they actually want to talk to on the host, because we don't just host one website on one IP address these days, right? But at the end of the day it's just a domain, so anything you could do with domains in the past you can do with this, right? So not really a posture thing, but definitely detection and hunting opportunities. Again, just a
domain, nothing fancy. I don't have any numbers to show you here because, again, it's a domain. We've worked with domains before and we know what to do: look for weird ones, new ones. The same process applies. Although TLS 1.3 could come into play here and take it away, or maybe not; it kind of depends. What do I mean by that? (This thing is moving a lot slower than I remembered it moving.) So let's talk about how 1.3 will affect us. First of all, the SNI was always an optional extension. You don't have to specify a domain to establish a TLS session; you can just go straight to the IP address. If the server isn't just
serving up one application, it'll probably just ignore you, so it's really common to see it, but it is technically optional. So what 1.3 will do is give you the option to encrypt the SNI via DNS. Basically you'd say, "hey, I want to look up the IP address for this domain", and then you'll also get back the key to encrypt the SNI with. Then, when you send the encrypted string over, the server will decrypt it and figure out what application you wanted to talk to. So why do this? Well, privacy, basically. If Patrick is sitting in a coffee shop and he's gonna go to his bank website, I could see the
SNI if it wasn't encrypted and figure out, "Oh, Patrick uses Bank of America. Great, I'll go attack him on Bank of America, go try and brute-force his account on Bank of America", as opposed to not knowing what bank he uses and having to do it to all the banks, right? Also, ISPs like to snoop on us for advertising things. So it's really just about privacy and protecting what you're browsing. Related to the SNI: certificates. Now, when I say certificate attributes, in this case I'm specifically talking about the subject and issuer fields on a certificate. Generally the server cert is usually required and the client cert is not, but technically they don't need either one.
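To make the subject and issuer fields concrete, here is a minimal Python sketch. It assumes each field arrives as one comma-separated string of `KEY=value` pairs, which is roughly how Bro/Zeek logs them; the function names and sample values are hypothetical, and treating "subject equals issuer" as self-signed is only a heuristic.

```python
import re

def parse_dn(dn):
    """Split a 'CN=...,O=...,C=...' string into a dict of certificate fields."""
    fields = {}
    # Split on commas that are not escaped with a backslash.
    for part in re.split(r"(?<!\\),", dn):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.replace("\\,", ",").strip()
    return fields

def common_name(dn):
    """Return the common name (often a domain), or None if it is absent."""
    return parse_dn(dn).get("CN")

def looks_self_signed(subject, issuer):
    """Heuristic only: identical subject and issuer suggests a self-signed cert."""
    return parse_dn(subject) == parse_dn(issuer)

# Hypothetical values for illustration.
subject = "CN=www.example.com,O=Example\\, Inc.,L=Augusta,ST=Georgia,C=US"
issuer = "CN=Example Issuing CA,O=Example\\, Inc.,C=US"

print(common_name(subject))
print(looks_self_signed(subject, issuer))
```

Once the common name is pulled out, it can be handled like any other domain, and the parsed fields give the posture checks something structured to compare against policy.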
What can you do with it? Pretty much anything, honestly. There are obviously detection and hunting scenarios in play, because you can get domain names out of the certificate attributes, right? So all the same rules apply there. But there are also some posture plays here: look for old certs, self-signed certs, just things that aren't following your corporate policies, right? This is typically what those subject and issuer fields will look like when you break them out in Bro. It's just a comma-separated list of certificate fields, right? You'll commonly get at least the common name, but you can also get other descriptors like location and organizational descriptions. So again, I don't have
numbers on the certificate attributes, because it usually maps back to domains, right, at the very least. So there aren't really numbers to crunch here, but we can just kind of talk through the use cases; this will come into play when we go through the use case slides. However, 1.3 is absolutely going to come into play here, so let's talk about how. Encrypting certificates is going to be mandatory, whereas encrypting the SNI was optional. The reason why is, well, one, you want to encrypt it for privacy, but two, if you don't encrypt the certificate and you encrypt the SNI, then you've kind of undermined encrypting the SNI,
because I can just see Patrick going to Bank of America from their certificate. So the way this is gonna work is certificates are always gonna be encrypted, and then the SNI is gonna optionally be encrypted, because the SNI itself is optional, so requiring encryption on that would kind of be counterintuitive. So, TL;DR: the certificates will go away with 1.3 as far as analysis goes. All right, that's all the fields we're gonna talk through. I'm gonna pass it over to Patrick, who's gonna go through some use cases, and we'll actually put the data into action. Okay, so essentially there are three different ways, so I'm gonna talk a little bit about detection use cases and
then go from there. So, detection. A couple of things apply here. As someone that has spent a lot of time in a SOC, open-source intel for me is just all right. I like to use it more as something to decorate current information rather than necessarily relying on it so much for detection. For my money, I've found a higher false positive rate, I think, with open-source intel. I don't know, that's sort of a personal preference. But the reason I bring this up here is because when we start talking about detecting with some of these metadata fields, just like any other detection, I think really finding unique values is important. If
you're trying to be too general here in terms of detection, you're gonna blow up your console and your analysts won't be happy. One of the good things about focusing on JA3 in particular, and cipher suites, is that we're really looking at the how something has happened rather than the who. So there are trade-offs with everything. Okay, so I think it's time for a prize. TJ, what do you think? Yep. Okay, so I'll ask the group a question here. First up, can anyone tell me what we're looking at here? Generally, what kind of malware are we depicting up here? Anyone know, looking at that data? What's that? Yeah, honestly, close enough.
Yeah, it's TrickBot. Yeah, but, like, these days they're one and the same. A follow-up question: how'd you know?
Yep, yep. So I guess we have to repeat: it was port 447. SSL goes to port 443 by default, but TrickBot's known to talk to 447 and 449, for whatever reason. So this is our product. We use this to depict this not because we're trying to sell it to you but because we're lazy and it was easiest for us to do. So what we did: we're still breaking out our metadata here, and it gets presented to a user in these fields, right? So I can show you some of the fields in this that we can work with in terms of detection.
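The port observation above translates into a very small filter. This is a sketch only: the `dst_ip` and `dst_port` keys are hypothetical field names rather than any product's schema, and a port match by itself is a weak signal that still needs the other metadata fields to confirm it.

```python
# Ports mentioned in the talk as ones TrickBot is known to use for TLS.
TRICKBOT_PORTS = {447, 449}

def flag_unusual_tls_ports(sessions, watch_ports=TRICKBOT_PORTS):
    """Return the TLS sessions whose destination port is on the watch list."""
    return [s for s in sessions if s.get("dst_port") in watch_ports]

# Hypothetical TLS session metadata (documentation-range IP addresses).
sessions = [
    {"dst_ip": "203.0.113.10", "dst_port": 443},
    {"dst_ip": "198.51.100.7", "dst_port": 447},
    {"dst_ip": "198.51.100.9", "dst_port": 449},
]

for hit in flag_unusual_tls_ports(sessions):
    print(hit["dst_ip"], hit["dst_port"])
```

Only the 447 and 449 sessions come back, which gives an analyst a short list to enrich with the certificate and fingerprint fields.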
I talked about IP address; this is the dest IP address. Our prize winner noted the destination port. Other things here: before we get to the subject, server name indication here. As TJ pointed out on an earlier slide, that is not a required field, so it can be null or blank in the interface. Subject: so, a common name. You can do some interesting things with detection here. If you were at the Security Onion Conference yesterday, there was a pretty interesting talk about this in particular. One of the things you can do is, if you look at the domain that is within the subject here, you can actually do some analysis and compare that to, like, DNS requests and that sort of thing that would be
associated with this session. What happens is there are a lot of malicious things, Metasploit in particular, that when they populate this field actually just put in random data. So if you have a subject that is sort of randomized, or doesn't make sense with the other traffic that you see, it can be a thread that you can start pulling on, and that also holds true for the issuer. The other thing I'll say about that is that in some cases, with certain malware, you don't even have to do that level of analysis where you're asking, "Okay, does this make sense? Is this weird?" You can find things where, okay, we know for a fact that this
particular common name is always malicious, that sort of thing. So it can certainly be a detection point. TJ talked about JA3 and JA3S a little bit; we're just generating those and populating them here, and then we have the cipher suite here at the end. All possible fields that can be used in detection. OK, so this is kind of a trip down memory lane for me. Does anyone remember Greencat? This sort of dates me a little bit, but this is malware that utilized encryption going way back. In terms of malware back in 2010 it was pretty interesting, because it was encrypted traffic, right, which was unusual back then, and a full-featured backdoor. It was web C2, where you could
have an interactive shell, that sort of thing, and it all goes over SSL itself. So to the defender it was hard to sort of figure out what was going on. I remember working on this back in the day, and in particular there were specific user agents associated with this and specific URI patterns. Problem with that, of course, is that for detection you need to have the decrypted proxy traffic. More recently, there's MACKTRUCK, which was leveraged by North Korean actors a few years ago. MACKTRUCK was interesting: within the metadata there's a specific certificate subject and a specific issuer that could be used for detection. So you could just have a simple detection where, if you have, like I showed you on an earlier slide,
where, like, that's your metadata, and you say cert subject equals this exactly, then you will always detect that malware with zero or very, very low false positives. This is something I included here just as someone that is a crypto enthusiast, because what I thought was interesting about it is, while it did work via SSL, it was actually doing sort of asymmetric encryption. So it worked with a public X.509 certificate that was embedded in the malware that it would pass along. But within that there were detection opportunities: within that metadata there's a specific SHA-1 that you could key on, a specific serial number. So, using the metadata to leverage detection. So things are getting better. This is
something that's probably more timely and relevant to a lot of people in the room, that we see a lot more often recently: red teams a whole lot leveraging PowerShell via SSL. That's become a more common thing. So if you've got PowerShell and SSL traffic, well, you can create a JA3 for that, right? You can look at the JA3 for PowerShell, and then if you tie that to something in particular, like where the issuer contains Let's Encrypt, or perhaps another issuer that you've identified being leveraged with that PowerShell to do bad things, then what you get is what we're really looking for here. This is sort of the ideal outcome, where we're using some of
these new fields, some of this new metadata, to actually come up with a high-fidelity detection. And I'll ask real quick: anyone read about Reductor yesterday? Sort of a cool thing, and I won't get into the details now; actually I really don't have a lot of the details, but this is something that Kaspersky blogged about yesterday. So Reductor is the successor to something called COMpfun, and it's really interesting malware to me. You can read about it there, but what's so interesting about it is that it's actually watermarking the TLS handshake. So what happens is they are compromising a box, right, and then what they're doing is tampering with the PRNG on a system or in the browser. Why that
becomes an issue is because that PRNG is used to generate the keys during the TLS handshake. So what they do is they are able to set it up so there's, like, four bytes in the TLS handshake that they can always identify. So, like, this is, I think, APT-level kind of stuff, but super interesting. This kind of malware and stuff is out there, and it's scary as hell, honestly, when you read these things, right? But even with the limited data that we have about this thing, as it just was blogged about yesterday, no reason to panic: there's a very unique SHA-1, a place where you can do detection. So even with this new scary stuff, there's lots of detection. All
right, so hunting. Not the same as detection, right? Because with detection we're looking for stuff we already know about; with hunting we're going and finding the things we did not know about. Typically when most people start hunting they think, I'm looking for APT, I'm looking for the attackers or the compromised box on the network. And whether it's intentional or not, this generally breaks down into two sets of questions: who are we interacting with, and how are we interacting with them? Although most of the time when you go out hunting and looking for attackers, you're gonna find security posture issues, all right, gaps in your security posture, right, attack-surface-level things. And that's not a failure on the
hunt; that's just kind of, I mean, that's a hole to go plug, right? So anyway, let's get started with some hunting examples. So I kind of broke it down to who are we interacting with and how are we interacting with them. It's pretty easy to map the fields we talked about into those two categories. With the who, this is, like, the network entity we're talking to, so, like, the IP, the domain. Specifically for SSL traffic we're talking about the SNI and the certificate attributes, because that's where we get the domain from. And these fields are pretty easy to use, right? There's nothing new; it's just the same stuff we've always done
with domains. So it's prevalence, right: how many hosts are talking to it? First seen: when was the first time we saw this? Did we see it five years ago, or did we just see it for the first time last week? Was the domain registered yesterday? You know, these kinds of questions, or just weird string characteristics of the domain. It's the stuff we've always done. However, TLS 1.3 is gonna take this away for the most part. We might still get the SNI if, you know, vendors or companies aren't encrypting the SNI, but they might, and then we'll lose this. So we're gonna need something else. Fortunately, encrypted traffic gives us the how, and I would argue the how is
honestly a better detection metric, because we've been struggling with the who for a while, right? Tactics like fast-flux DNS, or just doing DevOps with the attack infrastructure: there's no reason I can't spin down that, you know, compromised C2 once it's been found and spin up new stuff right away, right? It's not that hard just to lift and shift. But with the how, that's a little bit more difficult. Our questions there are more limited, right? Again we're looking at prevalence: how many hosts are showing this JA3 hash? What software is related to it: is it a web browser or is it something obscure? But these are a little bit more difficult to change. If
you want to change your, you know, application's JA3 hash, you have to go and tinker with the TLS libraries that you're pulling in and the specific settings, and that's a little bit tougher to do. It just takes a lot more effort, and again, as the level of effort increases, the likelihood of it being done decreases. And then the same goes for the cipher suite: like, yeah, you can go in and manually choose a different cipher suite, but if you're, you know, just using libraries to write your malware, I mean, are you gonna go, like, custom-change the library? Maybe not. So, two different categories here. I like being able to look at how we're establishing this
session because it gets us closer to the process level. So let's do that; let's go hunt for JA3 hashes now. So since I work at Gigamon, we have, you know, access to Gigamon's data in our tool, so that's the data set that I'm using here: 30 days of Gigamon's actual network data. So what I did here is say, OK, let's go find all the records with a JA3 hash and group them together by the hash and the earliest timestamp that we saw that hash. And so I ran this query on July 10th and found two hashes that we saw for the first time on July 9th. That's interesting, so I wanted to dig in on those, and I put
these through basically the same process that I talked about earlier with that intel process, right? I took a hash, I threw it into ja3er.com, and I got some user agent strings, and at first glance that looks like Firefox version 60 on a Linux box. But just to be certain, right, because it's easy to, like, make one little change to a user agent string to try and hide it, I plugged that user agent string into a UA database, and sure enough: Firefox version 60 on Linux. Cool. Gigamon has engineers; engineers tend to like Linux, I would say. So maybe it's just, you know, a software update, a new box on the network, or who knows, right? This doesn't make me
worry too much. So I took the second hash, plugged that in: no results. That sucks. My process ends here because I don't have anything to move forward on, so I've got to pivot to something else. So what I did instead was say, OK, give me all the events, all the traffic, with that specific JA3 hash, and now let's group it by the organization name tied to the ASN. And this is what I got. TL;DR: nothing too scary. Honestly, it looks like normal user traffic. I see Google, I see social media stuff, I see CDN traffic. Now of course there's no reason why a C2 server can't be hosted behind Cloudflare, but I just don't see anything; nothing jumps out at me here.
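The two queries described in this hunt (earliest sighting per JA3 hash, then a pivot on the ASN organization for one hash) could be sketched roughly as follows. This is plain Python over a list of dicts, not the query language of the tool used in the talk; the field names `ts`, `ja3`, and `asn_org`, the one-day window, and all of the sample events are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def first_seen(events):
    """Map each JA3 hash to the earliest timestamp at which it was observed."""
    earliest = {}
    for event in events:
        ja3, ts = event["ja3"], event["ts"]
        if ja3 not in earliest or ts < earliest[ja3]:
            earliest[ja3] = ts
    return earliest


def new_hashes(events, now, window=timedelta(days=1)):
    """Hashes whose earliest sighting falls inside the window ending at `now`."""
    return sorted(h for h, ts in first_seen(events).items() if now - ts <= window)


def orgs_for_hash(events, ja3):
    """Count sessions per ASN organization for a single JA3 hash."""
    return Counter(e["asn_org"] for e in events if e["ja3"] == ja3)


# Invented sample data: hash "aaa" has history, hash "bbb" first appears July 9.
NOW = datetime(2019, 7, 10, 12, 0)
SAMPLE_EVENTS = [
    {"ts": datetime(2019, 6, 15, 9, 0), "ja3": "aaa", "asn_org": "Google"},
    {"ts": datetime(2019, 7, 9, 15, 0), "ja3": "aaa", "asn_org": "Google"},
    {"ts": datetime(2019, 7, 9, 15, 0), "ja3": "bbb", "asn_org": "Cloudflare"},
    {"ts": datetime(2019, 7, 9, 16, 0), "ja3": "bbb", "asn_org": "Google"},
    {"ts": datetime(2019, 7, 9, 17, 0), "ja3": "bbb", "asn_org": "Google"},
]

for ja3 in new_hashes(SAMPLE_EVENTS, NOW):
    print(ja3, orgs_for_hash(SAMPLE_EVENTS, ja3).most_common())
```

Note that "first seen" here only means first seen within whatever retention you have; a hash that last appeared before the start of the data set would still show up as new.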
It's all normal traffic, and I don't see anything high-volume that might say, oh, C2 beaconing. So maybe this is just a workstation web browser, because it does look like user traffic, and maybe it's just a recently updated web browser that made a little tweak to the TLS session establishment that totally changed the JA3 hash, and that hash hasn't made its way into public repositories yet. Probably something to then say, hey, intel team, here's a new hash, go figure that out, because maybe I'm just the SOC analyst looking at alerts, or I'm the hunter, right? But this is essentially where the hunt stopped, right, because I had two results, I ran through them, I didn't find anything weird,
thank goodness, or else I probably wouldn't be presenting about it. So let's recap that hunt. What did we do? I just wanted to look for new JA3 hashes, right? Now, I only covered thirty days of data, right? If one of those hashes appeared 40 days prior to that, then clearly that's not really a new hash per se. Maybe that's what that Firefox version 60 on our Linux box was. However, there is a pretty solid process here, but it can definitely be improved, right? Maybe this is something your intel team would be doing: track all your observed JA3 hashes over time, and, you know, when you see a new one, go figure out what it is,
put that in your intel DB, and then now all the analysts have something to refer to. Now of course your mileage may vary here. If you're seeing thousands of new hashes a day, that might not be tenable. You can maybe automate that to some degree, but you might not be able to keep up, potentially. It really depends on your network, because what's gonna generate new hashes? Mostly software updates. Well, not only software updates, but that can be a source. So you might be chasing new things that aren't really actually new, so you might need to bring in other data points, maybe prevalence, for example. Let's say one hash appears for the first time
today on thousands of hosts. It's probably more likely that that's a software update that is now generating a new hash for some common application than it is that malware was immediately installed on 3,000 hosts in your network. Possible, but which is more likely, right? Whereas if just one host randomly starts talking with some weird thing, it could be some, you know, PUP or PUA type thing, or it could be malware, who knows, right? Maybe it's worth a look. But the point is, it is helpful to have an intel team to track this over time and keep up so you have some context. Even if you can't hit a hundred percent, I'll take 80 percent, right? That's still
pretty good. So just kind of going to call it here: that's how you can go hunting and, you know, generate some findings and maybe feed that into other teams. Now, you can also do this for certificates and domains, right? Track new domains, track new certificate attributes, things like that, but we won't go through that process again. Moving on to the last use case, which is posture, and we'll just use the example I talked about of just looking for old versions of SSL, right, SSL v2 or v3. And we can ask the same questions I mentioned earlier: do we have any hosts, either internal in the network or externally facing, that are supporting these deprecated versions?
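The posture question above amounts to a filter on the negotiated protocol version. Here is a minimal sketch, again with hypothetical field names (`server`, `version`) and made-up domains and version labels; real sensors report the version in their own formats.

```python
# Assumption: the sensor reports the negotiated version as one of these labels.
DEPRECATED_VERSIONS = {"SSLv2", "SSLv3"}


def deprecated_servers(records):
    """Return {server name: set of deprecated versions it negotiated}."""
    found = {}
    for rec in records:
        if rec["version"] in DEPRECATED_VERSIONS:
            found.setdefault(rec["server"], set()).add(rec["version"])
    return found


# Invented sample sessions; the domains are placeholders.
SAMPLE_SESSIONS = [
    {"server": "vendor.example", "version": "SSLv3"},
    {"server": "vendor.example", "version": "TLSv1.2"},
    {"server": "intranet.example", "version": "TLSv1.2"},
    {"server": "tester.example", "version": "SSLv3"},
]

for server, versions in sorted(deprecated_servers(SAMPLE_SESSIONS).items()):
    print(server, sorted(versions))
```

The same shape of filter would work for a list of deprecated cipher suites; only the field and the set of values change.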
Do we have any vendors we need to go yell at? As a vendor, I'd say you should probably go yell at your vendors regardless; this will just be an excuse for it. And we can also look for deprecated cipher suites, right? It's all coming back to software that we shouldn't be using. But let's ask this question, right? Again, I ran this on Gigamon's data to see if Gigamon was talking out to anyone supporting SSL v2 or v3, and yes, yes we were. I found two domains that were supporting SSL v3. The top one was an actual vendor that I didn't want to shame publicly, but they were supporting SSL v3, so I shipped
that over to our SOC and said, hey, go talk to them. ssllabs.com is actually a site for, you know, testing for old versions of SSL, so that's expected. That's actually a good thing. Thank you, SSL Labs. But yeah, I'm sure if we ran this again I might find another vendor too, so this is always fun. But that's it, that's our posture use case. So I'll go ahead and wrap this up here; we're just about done. The key takeaways I want to highlight here: number one, encryption's here today. I hope that was pretty well established in those first couple of slides. The best way to handle this is to both decrypt the traffic,
right, and then extract metadata out of the encrypted and decrypted streams. But that's tough to do; if you're not doing either one, you're definitely not gonna start doing both tomorrow. But if you decide to go the TLS, or, sorry, the metadata route, there's plenty of analysis left to do. The thought that, you know, encryption kills NSM is simply not true, and I hope we've proved that out to you. Now obviously TLS version 1.3 is gonna hinder us some, right? It's gonna take away our certificates, and it might take away our domain name from the SNI, right? But there's still some analysis we can do on top of the who. But the how, I mentioned earlier; we've had, you know,
it's been tough to chase down the who for a while now, because attackers are, you know, pretty good at hiding from us, or try to be at least. But the how is harder to change, so it's arguably a better analysis path anyway. The only caveat here is, as we saw with the hunting example, you're not always going to get results for a JA3 hash when you look it up in a public database. I don't know if you noticed, but on the screenshots for ja3er.com it says 1,707 unique hashes. I found two hundred thousand... more than two thousand unique hashes over two months, so there's definitely a discrepancy there that we need to catch up on as a community. But, you
know, we'll get there. JA3 is still kind of new; not everyone is even using it at this point. So we'll get there, but it's definitely gonna require some intel work on our part as the analysts and within our orgs. But we'll get there, and there will be lots of analysis that we can do with TLS metadata. And that's it, so we'll take questions now and close that out. Thanks for coming. One, two, three... no problem. A question back there?
So the question is, with encrypting the SNI via DNS, is there anything you can do with the key you get back from that, that you're gonna use to encrypt the SNI with? Maybe. I don't know; we'll have to see. I would imagine that it's gonna work much like a certificate would, right? So maybe. Let's see. I don't know; once that's actually happening in the wild and we're extracting that data, we'll have to take a look and see what happens. But if I had to guess, and feel free to tell me I'm wrong, but if I had to guess, maybe there's bound to be some level of uniqueness to it. Any other questions? Yes,
sir. Is domain fronting still a threat? Can you expand?
Oh, so can we still... so can attackers today still use the SNI to hide, like, the true C2 server? Yeah, sure, why not? I mean, you certainly could, right? You do a DNS lookup for just a front end, like, you know, look up the first lookup.com to get an IP, talk to that IP, and that tells you in an encrypted tunnel the actual IP address you should go talk to. So certainly possible. Fast flux is still possible with SNI, for SSL traffic. Yes?
OK, so the question is about training, essentially. He's asking what are some resources available to people out there to be able to learn more about metadata, to be able to dig into it, that sort of thing. That's a really good question, and I think there's a lot of levels to this, right? So if you're looking for data, I mean, there's certain things that are available, right? So, like, if I want to know more about JA3, for instance, I can go read all about it from Salesforce. Salesforce, yeah. What I would probably refer
to is how folks in SOCs think and how they ask questions, and to me there's a lot more to gain by that: what are we getting wrong in how we think about these things? That's something that really takes some work, and I think that's really needed to be successful. And just to clarify, was the question about network data, so network forensics specifically, or just training in general?
gotcha
The callout was that Chris Sanders with Applied Network Defense has a free course available to check out; that was a callout from the audience. Yes, sir?
Yeah, I think... oh yeah, we could make that public. Oh, sorry, the question was, you know, the data we showed, the top hashes, the top JA3 strings and the source of it: is that available publicly? And yes, I can make that available through GitHub, because again, they're just hashes, they're basically anonymized, and then I can throw in the versions I got from my process. Sure, yeah, exactly. Yep. Yeah, we can close up. OK, we've got about one minute here, so we want to ask a couple of questions; we've got about two more prizes. So TJ, what's a great question? Oh,
this is a cop-out: what's the default port for SSL/TLS? I said it during the talk. Who said it first? Here, over here? Yeah, see, this is tough to do. I was gonna hand them to the guys who came with questions, actually, and I thought they were pretty good questions. All right, can we do that? Is that OK? OK, so, like, I liked the question about encrypting the SNI, sir, and then you had the question about the domain fronting. Those were the first two, so that's what I was thinking. That's it. There you go. Thank you.
And we'll hang around for questions if anyone has any. Thanks. Thank you.