
Awesome — alright, can everybody hear me? I was just checking in the back. Yes? Awesome. Thank you everyone for coming out. So, some brief intros for myself and Justin. My name is Peter Roy; I'm a founder of a network security company. Before that I worked on product security at Microsoft — some of the bug bounty programs, as well as Windows and Internet Explorer — and before that I was a technical director and subject-matter expert at the US Department of Defense. Otherwise I've given a lot of conference talks, and I think that's enough about me.
— Everybody, my name is Justin Warner, a security engineer working at ICEBRG. I previously did some time in the military doing some really fun cyber things, then transferred into consulting, where I spent three years with a team called the Adaptive Threat Division, worked on a lot of open source projects, and did a lot of red team stuff. I got tired of sending the same phishing message over and over, years at a time, and getting into the same organizations, so I decided to transfer to the blue team, solve some challenging problems, and figure out better ways to do this.
So, essentially, the whole reason we're here: one of the things we do at ICEBRG is work with a number of different customers and help them manage network monitoring, and one of the big things we've seen — I mean, the EFF has a booth out there where they'll also talk about things like Let's Encrypt — is that, realistically, more and more traffic on the wire is encrypted. That has some pretty interesting impacts, which we'll talk through during the talk, but right now we're seeing somewhere between 35 and 45 percent of traffic encrypted. And for the folks wondering whether I mean north-south or east-west traffic: it's primarily north-south. East-west workloads are a whole other talk that isn't this talk. So, since I'm not really sure where everybody's at with network security monitoring and how
everybody understands it, I'm going to do a couple of high-level slides — so for the folks thinking "yeah, I know this," please bear with me, and for the folks who are at least interested and want the background, we'll move through it pretty quickly. Effectively, if you look at all of network security monitoring as a capability, you essentially have a quadrant system. We have batch and real-time, which is about how we process data: batch is collecting data at rest and processing it at rest; real-time is something more like an IDS, where we're processing data in-stream — either on the network itself, or in real-time analytics systems with data pipelines behind them. Then we have content, which in network land basically means pcap — there are a number of other proprietary formats, but pcap is the one we've all agreed on for full-content binary data. And at the top we have metadata, which can mean basically anything you can output into a structured log format that isn't binary data. So, encryption's effect on the quadrant: effectively, everything that is content-based just disappears. What that means is we no longer have content inspection — all of our layer-7 firewalls, the fancy sandboxes we've bought to sit on the network and extract artifacts and files — all of those capabilities have essentially gone away. We also don't have signature-based capabilities, which, for those of you who have been doing network security as long as we have, is where a lot of this all started. I mean — Snort. Raise your hand if you've used Snort or seen Snort. Yeah, okay, yep — and I'm now also gauging where I am in this talk and where the audience is. But yeah —
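To make that signature point concrete, here's a toy sketch — this isn't Snort, just plain byte-pattern matching against a made-up payload, with a stand-in XOR keystream playing the role of TLS (the payload, pattern, and key are all invented for illustration):

```python
import hashlib
from itertools import count

SIGNATURE = b"cmd=download"  # made-up byte pattern standing in for an IDS content rule

def keystream(key: bytes):
    """Toy keystream (illustration only -- NOT a real cipher): SHA-256 of key+counter."""
    for i in count():
        yield from hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data against the keystream; applying it twice with the same key decrypts."""
    return bytes(b ^ k for b, k in zip(data, keystream(key)))

payload = b"GET /gate.php?cmd=download HTTP/1.1"
ciphertext = xor_cipher(payload, b"not-a-real-key")

print(SIGNATURE in payload)      # the "signature" is literally present in the plaintext
print(SIGNATURE in ciphertext)   # after encryption there is nothing left to match on
```

The point is simply that a content rule that fires reliably on plaintext has nothing to match once the bytes on the wire are ciphertext.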
I mean, with Snort and encrypted traffic, you're evaluating byte patterns, and those patterns don't exist anymore — that's the whole reason encryption exists. So realistically we get pushed up to capabilities that are either batch or real-time on top of metadata, because that's really all we have left. I guess the other piece of setting the background is: do attackers actually use encryption? Is this a real thing, or did these guys just submit this talk so they could sound cool and get a talk accepted? It's actually a real thing — there are lots of different actors that use encrypted command and control, everyone from your friendly neighborhood pentester or red team all the way up through a lot of the criminal actors. It actually ranges from self-signed certificates — which you'd think, "there's no way an attacker could do that and have it work on a real network"; it totally will — all the way up to folks establishing legitimate EV infrastructure. But the majority, and what we'll talk about a little later, is around domain validation: DV certs are pretty much the easiest area for abuse. If you're here from the red team side, it's the easiest place to get encryption and
bypass all the controls we just talked about; if you're here from the blue team side, it's the area where you're going to see attackers most often. So — and it's really funny, because there's this age-old grudge match between network- and host-based security — some of the most basic takes are, "alright, we're just going to give up on network security, encryption is making it terrible." If I had a dollar for every time someone asked me — whether as a vendor, or as a consultant before that, or just talking to folks — "oh, well, do you guys terminate encryption?"... The attitude is that if we don't terminate encryption, everything just turns into a pumpkin. I mean, the majority of traffic is still unencrypted, but we're not going to just surrender and let this be over — otherwise we wouldn't have a talk; we'd just be done and get off the stage. So effectively we end up with two options. One: we say, alright, encryption is a thing, it exists, we're going to go ahead and terminate. There are a number of appliances for that — and this is the other thing: if you think you want all of your vendors to support termination, you may want to reconsider that strategy, because — I don't know if you've dealt with certificate management; probably fewer people have had to actually deal with that. Chris Campbell, who's a great guy, gave a really fun talk about owning security appliances a few years ago at Black Hat. You definitely don't want to create your own security problems — the whole idea is to have a less complex solution for security, so you're not creating new security problems with your security solutions and capabilities. So realistically we would advocate, from a design and architecture perspective, completely agnostic of what products you use, for establishing a monitoring layer and terminating into that monitoring layer — because then all your old stuff works. It's kind of like a free vacation: we used to know how to use all these tools and do all these things, and if you do termination, for the most part you get all of that back — all the old stuff that worked now works again. The other big thing is that if you try to do termination on an appliance, the performance will tank pretty hard. So actually using network gear to do the things network gear is really good at — we're big proponents of that, and we
generally think that's a very good idea. The other option is to shift from a purely content-based strategy to more of a metadata-based strategy — we'll talk more about what that means, but realistically it means that instead of depending on all of these content-analysis engines for real-time detection and some other pieces, we essentially move to a different architecture. So, really quickly, pros and cons — this is part of the problem with running a company and the kind of team we work with: you start to look at everything in the world as "okay, what are the options, and what are the pros and cons?" For termination, the pros are, like I said, everything just works: all my old stuff — Snort, you're back, you're here to stay, I'm so happy about that. You have more or less full visibility of encrypted streams; anybody who's tried to implement termination knows this isn't exactly the panacea I'm painting it as, but it's pretty damn close, so let's just say it is for now. The cons here are really around user privacy. Again, if you're a practitioner and you've talked about SSL termination in your environment, you've probably interacted with some people who are concerned about user privacy — what about banking, what about healthcare, do our users have a right to privacy when they use encrypted websites? And I think that before, when most of the encryption was focused on particular web-based applications, that was a very powerful argument. Because, hey, I'm a security guy — naturally I want all the data to be able to do all the things — but at the same time, user privacy: I'm a user on this network too, and yes, I like privacy when I'm doing my home banking. Should I be doing home banking at work? That's a whole other
conversation. So effectively, what you also have to do when you implement termination is add a certificate to the trusted root store on all the endpoints. Like I was saying about network versus host — you actually create another fairly complex management piece, which is again why I'd advocate for a single termination layer with a single certificate; then you can install that certificate where you need it. Because essentially, if you know how SSL and TLS work, a lot of this is based around the trusted roots installed either in your browser or on your endpoint, and your web browser actually depends on those as a source of authoritative trust. This is managed by a small handful of companies — to a large extent, Google is a strong advocate and does a lot of really positive work around maintaining the integrity of that root of trust, and companies like Microsoft and some of the other very large CAs also participate there. But you're essentially breaking SSL's design when you do this. With that root of trust you're basically saying, "hey, I trust these big companies" — and by the way, if you haven't looked at the certificates in your root of trust, you may want to do that; it's a whole other entertaining diversion, good for at least five minutes of entertainment. You won't need it this week, but next week, when you're, I don't know, recovering, it might be worth checking out. Effectively you're saying: all of these entities that are much larger than I am are dedicated to maintaining integrity and trust on the internet — and I'm just going to go ahead and bypass all that and put my certificate in here. The other piece is that you're going to have some issues around mandatory certificate pinning: essentially, your certificate fingerprints aren't going to match up. The way most browsers implement this, pinning is bypassed if the cert is in your local root of trust, so everything just works and it's fairly transparent to the end user. But for those of you who have interacted with the IETF and the way they're moving — will that be that way forever? Will folks who are really passionate about privacy continue to let that be a thing, or will mandatory cert pinning become more end-user configurable? How is that going to look in the future? Is it
going to be a broken user experience? We don't really know, but we can see the trend line, and it's definitely headed toward a more broken user experience for end users. At the same time, we really like content tools. Anyway — metadata analysis, pros and cons. The pros: with metadata we don't have a lot of the same legacy issues we had with content. With content, we all kind of became experts on storage and storage appliances — anybody who has maintained a 30- or 60-day pcap store... sorry, can I do a show of hands? You know what I'm talking about — massive pcap stores in or around your organizations. Alright, sorry, I'm profiling the audience in real time, but I appreciate it. So you're an expert on storage; with metadata, all of a sudden that same storage array — the super-high-performance one you had for all the packets — now gives you a lot more look-back. You can look at not only first-order data, which is the actual data in packets, but also do things like data enrichment; there are many more possibilities there. Just the amount of look-back you can get with metadata versus full content is pretty astounding. The cons: you're still in the storage game — I don't know if you thought from the first part that I was going to say storage is no longer a thing; it still is. And you kind of need a lot more metadata to be useful. Versus, "hey, I have a 30-second pcap" — as any network analyst worth their salt will say, I can do a lot with that; a lot of analysts can do a lot with a single packet. A single metadata record — a single log — is not quite the same; 30 seconds of metadata is really not quite the same. You're not going to get the same level of insight and understanding that you would from full content. The other thing is that you're essentially getting into the data-processing-pipeline business internally as a security team: you have things that generate data; you need to make sure events from the software agent that runs on the network sensor and collects the data go into aggregation and staging on the appliance or piece of hardware; you need to make sure all of those events get written up to a centralized data store; you need to make sure those events get deduplicated and cleaned, then moved to some other store you can actually query; and if you want to do enrichment or get more out of that data — it's like Lego blocks. There's a lot more flexibility, a lot more options. Whereas with content, you literally just write pcap to a file, and when you need it — if you're really playing your A game, it's time-indexed — you just go grab the one pcap file you need, or slice it using tshark, and you're good. When you get into metadata
it's a very deep hole, and you can continue to go deeper and deeper. So — is network security doomed? No, we don't really think so. One of the really interesting things about the way SSL and TLS work is that there's actually a whole lot of new stuff we get access to. And this one — sorry, this is actually DNS traffic; we also look to other protocols to augment our ability to see and understand encrypted traffic. I don't know if you can all read this, but this is a DNS query for "somebody once told me the world is gonna roll me, I ain't the sharpest tool in the shed, she was looking kind of dumb with her finger and her thumb in the shape of an L on her forehead" dot — what is that, mdjmd? I think — this is the kind of stuff that actually happens when you start looking at metadata; the internet is a wonderful and strange place. But these techniques can be really helpful here, because when we get to encrypted traffic, we essentially get a whole new set of metadata that we never had before the traffic was encrypted. We get a protocol version — essentially the version of SSL or TLS being used — cipher suite, server subject, server issuer, client subject, client issuer, and validity dates: essentially, when the certificate is valid. And then we also have all the same metadata we had before: we have IPs, we have flow, we understand how much data changed hands. If you take anything out of this talk, hopefully one of the first things is: hey, termination may be the thing we should talk about — you now have a framework to have that
conversation internally. Second: there is interesting metadata here around encryption. The number of times self-signed certificates are actually totally viable for bypassing all the content security measures on a network is kind of astounding — and that stuff sticks out like a sore thumb here. If you wanted bonus points, you could actually validate each certificate — check that it's actually a real certificate; not everything does that. There's a lot here. And one of the big questions — we work a lot with hypotheses and whether they actually work out in the real world — is that we can look at SSL metadata as infrastructure. Normally when you talk to threat intel people, they'll say infrastructure is these three things: IP addresses, domains, and WHOIS — that's what infrastructure means. This now gives us another layer of infrastructure we can look at and analyze. Mark Parsons, who's a super smart researcher who lives on the East Coast — Justin's from the East Coast, I'm from the West Coast, so there's a good-natured rivalry; sorry, it's a good rivalry — Mark gave a great talk at BSides Charm in 2016 about pivoting on SSL infrastructure: what CAs, what certificates, what issuers. He was able to effectively increase his coverage up to over a hundred times versus just using domain WHOIS information and trying to link that together. So there's a ton more you can do here, and this goes back to what metadata means — what data do you collect, how do you process it? By creating these huge aggregate data stores we can start to look at this, and even some of the vendors are starting to expose more SSL metadata and allow you to do queries and pivot around in there. So again, these are the kinds of questions to walk away with.
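As a sketch of that kind of pivot — the records and field names below are invented, loosely modeled on the sort of TLS metadata a network sensor logs — grouping by certificate issuer surfaces servers that share infrastructure even when their IPs and domains don't link up:

```python
from collections import defaultdict

# Toy TLS metadata records; every value here is invented for illustration.
records = [
    {"server_ip": "203.0.113.10", "version": "TLSv12",
     "issuer": "CN=Fake Evil CA", "subject": "CN=update-check.example"},
    {"server_ip": "203.0.113.77", "version": "TLSv12",
     "issuer": "CN=Fake Evil CA", "subject": "CN=telemetry.example"},
    {"server_ip": "198.51.100.5", "version": "TLSv12",
     "issuer": "CN=Fake Benign CA", "subject": "CN=blog.example"},
]

def pivot(records, key):
    """Group server IPs by a certificate field -- treating certs as a layer of infrastructure."""
    groups = defaultdict(set)
    for r in records:
        groups[r[key]].add(r["server_ip"])
    return groups

by_issuer = pivot(records, "issuer")
for issuer, ips in sorted(by_issuer.items()):
    print(issuer, "->", sorted(ips))
```

The same `pivot` call works on `"subject"`, validity dates, or any other cert field, which is the gist of the coverage gains described above.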
Something new to ask whatever vendor you're talking to at Black Hat, if you're going over there. I'm going to hand it over to Justin now. — Sweet. So when I came over from the red team side, one of the first things I wanted to do — first week on the job, coming in shiny, ready to go — was break everything I did for three years as a red teamer, using our analytics platform. Essentially: how many analytics or detection techniques could I come up with, focused solely on how I was doing what I was doing, assuming I was being somewhat threat-representative and doing things like real-world bad guys do? What I ended up doing was forming some strategies around hunting through data to find abnormalities or interesting facts. Everyone defines hunting as something different, so I'm going to set a level baseline, focusing strictly on how we do it with network data. We look at proactive identification, meaning we're actively going out and performing something — we're not using an alerting technique, we're not relying strictly on detection; that's a factor of it, but it's a slightly more manual, human process. And we're focused on focused hypotheses, meaning we make an assertion about something and go test that assertion. That allows us to be kind of scientific in nature, rather than just getting a whole bunch of data and saying "show me evil" — because, as everyone in here probably knows, it's not quite that easy, and that's not how it works. Where we really focus is: what's normal, where are the outliers, where are the protocol abnormalities — meaning things not being used the way they should be — and how do bad guys leverage certain protocols? Then we take those facets and actually go look for weird things in the data. Starting with what is normal: I think very few people have a cross-organizational baseline of what is normal in terms of SSL/TLS. For us — being that we have a
huge viewpoint working with our customers, we're able to profile and create a pseudo-baseline across the bulk of the data: what we would see as normal in terms of statistics on specific feature sets. Looking at something as simple as TLS/SSL versions, you can see the breakout of what's normal — and you can also see that still-vulnerable versions of SSL are in prevalent use. In terms of data set, this is across approximately 2 billion SSL events, so it's a good data set for these statistics. I think even more interesting is if you look at cipher suites: unlike TLS versions, which have mostly settled on a modern, safe version, there are tons of vulnerable cipher suites still being negotiated today on the internet. I would also add that there are potentially differences between the cipher suites negotiated by malware versus the cipher suites negotiated by normal servers — that's something that's really good to dig in on. So we're going to dive through some analytics, looking at how we hunt through this data, and we'll poke fun at some real indicators. The first one we like to look at is kind of the
commonality of particular SSL traffic. Our hypothesis is essentially that when bad guys gain access to an environment using encrypted traffic, they're going to be on a relatively low number of isolated systems — it's not that common, unless you're coming in on an incident response, to walk in with every node compromised. Generally it's going to be a small number of systems, but those systems communicating to certain servers are going to have really high request counts, meaning a lot of traffic from a low number of systems. So it should beg the question: if this is something popular or common, why is it not on more systems? And if it's not popular or common, why is it happening 24 hours a day, all the time, on some sort of periodic pattern? When we looked at this in our data, we found it's a useful feature to enrich upon — after you start an investigation, it's a really good thing to tack on as contextual enrichment to understand and better characterize particular flows, and it can be combined with other beaconing patterns. It's also a really good way to rule out advertising. What I mean by that is that advertising somewhat follows this characteristic, but in general you're going to see advertising on a huge number of systems, because there are common advertising CDNs and common advertising domains that we see everywhere — we'll show some examples of those later — and that breaks this analytic. So it's a good way to weed out advertising, which, by the way, looks like some of the most malicious stuff on the internet; I would hypothesize that they essentially study attacker techniques to bypass ad-blocking. Yes — yeah. So the next analytic we looked at was send-to-receive ratios: essentially looking at byte-count distributions and understanding whether there are predictable patterns in how malware over SSL will leverage the byte counts.
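That hypothesis can be tested with a few lines. The byte counts below are invented, and a real version would work over per-flow sensor records and something like the kernel density estimate mentioned in a moment, rather than a single dispersion number:

```python
import statistics

def byte_count_dispersion(sizes):
    """Coefficient of variation of per-request byte counts.
    Beacon-like C2 tends toward ~0 (same-sized requests); organic traffic varies widely."""
    mean = statistics.mean(sizes)
    return statistics.pstdev(sizes) / mean if mean else 0.0

# Invented examples: bytes sent per request to two different servers.
beacony  = [412, 412, 412, 414, 412, 412]        # near-constant check-ins
browsing = [1200, 88, 54100, 3021, 420, 9900]    # organic traffic fluctuates

print(byte_count_dispersion(beacony))    # close to zero
print(byte_count_dispersion(browsing))   # much larger
```

A hunt would then sort server endpoints by this score ascending and eyeball the bottom of the list, with the off-hours caveats described next.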
And so what we saw was that there's extremely low variance in the byte-count sets. When you look at large periods of time, typical bad guys are only working a limited number of hours during the day, and outside those working hours their malware is just doing some sort of characteristic beaconing. If you're using asynchronous SSL C2, you're simply going to have beacons rolling: if you're a bad red teamer, pentester, or attacker, you have them rolling every second; if you're slightly more strategic, it might be every hour or two; if you really like pain, maybe every eight to twelve. In that case, when you look at a huge statistics set and break out the byte-count distribution, there's going to be a whole bunch of requests that are the exact same size — when you graph them over time and look at something like a kernel density estimate, it's going to be really concentrated in one particular bucket, which is not common across real SSL, where the byte count will heavily fluctuate and vary. This is a potentially good detection point both for data loss — do I see huge uploads of data? — and for profiling byte-count distributions. I would also add it's extremely tough to model in a modern enterprise: people around the world love to work all the time, especially hackers, and so they break this. One good area to focus on would be off hours — let me look only at nights for these distributions, to pare down my data — but people work all night and around the world, and they have VPNs, and everything's accessible and open, and phones can be brought into networks, so this can complicate itself and get slightly more difficult. One of my favorite concepts — as a red teamer coming to the blue side, one of the challenges I took on was: how can I build an analytic that the attacker has no control over? They have control over their beaconing times; they
have control over their byte counts — they can make their malware fluctuate byte counts to break these analytics. One thing they have little control over is when they introduce themselves to an environment: when you add something new to an established baseline, it should stand out. So, looking at newly observed certs: essentially building a concept of passive SSL — monitoring all SSL certs used in an enterprise, looking at first-observation dates, and asking why something was newly introduced. If I have years' worth of traffic collected from my environment and I've never seen this SSL cert before, what is it? Because generally your business should be doing mostly the same stuff year over year. This can be particularly interesting — I like to think it's an extremely useful characteristic for organic or internal threat intel programs, extracting and looking at data from their own enterprise to get a better understanding of the environment. So what have we seen actors using? Realistically, we have all these fancy analytics targeting bad guys doing really fancy things, but when we actually look at real samples we've seen in environments and done response work on, nine times out of ten they're free or cheap certificates — Let's Encrypt, Comodo, common providers that open this up — and generally DV certs specifically. We're going to mine you for intel: if you have
seen an example that completely counters everything I just said — you've seen attackers using [ __ ] certs — send it to us, because we're really interested and we'd make new hypotheses; that's probably pretty uncommon, not something we've seen a ton of. This naturally brings us to our favorite topic, which is Let's Encrypt. Let's Encrypt is an awesome program that's really trying to promote privacy and bring SSL and TLS everywhere. The interesting part is that when it came out, there was a lot of feedback, or criticism, from certain parts of the security community: why are you enabling encryption to be so easy, without any validation or integrity?
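Before digging into Let's Encrypt — the newly-observed-certificate hunt just described could be prototyped along these lines (the fingerprint values and store layout are assumptions for illustration, not any particular product's schema):

```python
# Toy passive-SSL "first seen" tracker. In practice this would key on something
# like the SHA-1 fingerprint of the server certificate pulled from sensor logs.
class FirstSeenStore:
    def __init__(self):
        self.first_seen = {}  # cert fingerprint -> epoch of first observation

    def observe(self, fingerprint, ts):
        """Record an observation; return True only the first time a cert is ever seen."""
        if fingerprint not in self.first_seen:
            self.first_seen[fingerprint] = ts
            return True
        return False

store = FirstSeenStore()
print(store.observe("abcd1234", 1500000000))  # first sighting -> newly observed, go look
print(store.observe("abcd1234", 1500003600))  # already baselined -> ignore
```

The attacker can randomize beacon timing and sizes, but introducing a cert the baseline has never seen is exactly the event this flags.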
— around who purchased the cert, that is. And so Let's Encrypt put out a blog post essentially stating they believe it's not the provider's job — not the issuer's job — to control or verify people over SSL. So what does this really mean, what's the big deal? I think we have to start with the difference between DV and EV certs, because that's really what they're getting at: they don't believe it's their job to validate the identities of people; they believe it's their job to validate the identity of the technical owner of something. And with DV certs — by the way, I'll be the first to admit I didn't know how to recognize this until a couple of weeks ago; as a security practitioner for years, I couldn't look at a browser and tell you the difference between DV and EV — at the bottom you can see the difference: DV says "Secure," EV shows the actual corporation name that's been validated. DV means all you've validated is that someone owns a domain. My build script for red teaming included an automatic certbot run that went and pulled my Let's Encrypt cert for the domain I'd purchased, meaning I could get SSL/TLS encryption on my C2 in seconds — no effort, no validation, no email, no info provided; all I did was prove that I own a domain. They make it very simple, and domains cost, you know, 80 cents, so it's really not too hard in this world. EV certs, where they've actually validated the enterprise — not impossible to get around, I'm sure, for really fancy organizations, but typically much more work and a lot higher cost, which is going to be a turn-off for most attackers. So, awesome — who uses DV certs? When you look at statistics across our database at the different issuers we see — and I'd point out these are the most common issuers across multiple billions of samples, so these percentages are pretty high, high request counts — you'll notice that DV certs are in
prevalence here: Comodo being one, some of the DigiCerts, and Entrust and GeoTrust are all there. Let's Encrypt is actually pretty high on the list — a little bit below, I think around 1% — but these are huge players with huge prevalence in the community. DV is not going anywhere; it's only growing in popularity across the internet. Which really brings us to the conclusion that we have to change our mindset. There's a really good exchange on Troy Hunt's blog pretty recently — a couple of days ago, very timely; I saw it and thought, yay, a new slide I can add — a really good exchange around the perceived value.
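As an aside, the DV/EV distinction mentioned earlier is machine-readable in many certificates: CAs commonly embed the CA/Browser Forum reserved policy OIDs. A minimal classifier over an already-extracted list of policy OIDs (actually pulling the extension out of a cert, e.g. with the `cryptography` package, is left out here) might look like:

```python
# CA/Browser Forum reserved certificatePolicies OIDs marking validation level.
CABF_POLICY_OIDS = {
    "2.23.140.1.2.1": "DV",   # domain validated
    "2.23.140.1.2.2": "OV",   # organization validated
    "2.23.140.1.1":   "EV",   # extended validation
}

def validation_level(policy_oids):
    """Map a cert's policy OIDs to DV/OV/EV; 'unknown' if no reserved OID is present."""
    for oid in policy_oids:
        if oid in CABF_POLICY_OIDS:
            return CABF_POLICY_OIDS[oid]
    return "unknown"

print(validation_level(["2.23.140.1.2.1"]))                        # DV
print(validation_level(["1.3.6.1.4.1.99999.1", "2.23.140.1.1"]))   # EV
```

Bucketing observed certs this way is one way to put numbers behind the issuer statistics on this slide.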
so I think when we get into this argument and growing popularity of DV search for us we really think that the Internet has to help our users change our mindsets around SSL and TLS does not mean legitimate it's like for the early years in the 90s and late 2000s it was so expensive to get an SSL TLS and it was so something that requires so much work that they were like you would see encrypted traffic and you're like oh that's that's okay it's encrypted like they they did work to get a cert nowadays that's not the case and so we have to start educating people that SSL simply means it's private not safe and really pushing us towards going and yeah
yay, I like claps, that's for me. And so I think for us this is really pushing towards the hypothesis of: let's go look here more, and let's get used to working with encrypted traffic. So if you weren't believing me this whole time, we'll go show some stats and call out some domains, this is fun. So who would actually abuse free certificates, like, oh my god, who would do this? Well, I'd say the biggest consumers of Let's Encrypt in the environments that we see are actually advertisers. Like I said, advertisers are kind of evil, and I don't know if you've ever looked for these domains, but I would challenge you to go look for lytx.io, it's everywhere, and it's basically an ad-CDN kind of capability that uses Let's Encrypt. So, the highest percentage and huge prevalence in our data sets, plus some others scooped up along the way, and all the other common sites you might see using this. There are big providers moving to Let's Encrypt, it's becoming more popular. We would also say that, due to Let's Encrypt being really easy, and due to me as a red teamer having used it all the time, I would argue that bad guys are probably using Let's Encrypt more prevalently than other providers. It's just, again, a hypothesis of mine. And so when we went looking through the results based on that hypothesis and started hunting through some of that data, here's just a snapshot of what we found. A lot of .xyz domains, which should be an analytic on its own, like, if anything's .xyz it's evil. Same thing with .top to some degree, so if you have a .top domain, I'm sorry, because I just insulted you. But we found a lot of parked name squats and phishing domains and Ukrainian WordPress sites and all sorts of just really sketchy things that were really prevalent Let's Encrypt things, things we probably wouldn't want flowing over our average corporate network and being used by workers in our enterprise. So, jumping
into phishing, really looking at Let's Encrypt phishing. Like I said, as a red teamer we literally used Let's Encrypt for all our phishing stuff and pretty much all our C2, so let's look at some samples. You'll see this is a good phishing link that was sent out, also using Let's Encrypt, a nice little credential stealer. Eric Lawrence is not the person that you want to send phishing emails to. Basically, yeah, Eric Lawrence used to be a lead developer on IE and now is a lead Chrome developer, so, I mean, you can send him our samples, let's put him on Twitter. But yeah, there's a couple really good Twitter accounts to follow for stuff like this, that's one of them, and John Lambert is another really good one. So essentially what we can show here is that phishing is coming of age using Let's Encrypt, and malvertisers, and bad guys, like we've been saying over and over: it's a good place to go hunting, it's not all evil, but it's certainly interesting. So this is where we get into the forensics part of the talk. This is a much-updated version of a talk I've given before, but the last time I gave it I really focused a lot on hunting and detection, and really
enjoyed making the point of, hey, detection is not over just because we don't have content. But some of the feedback I got was, well, we actually want to hear about forensics, that was in the title of the talk, why don't we talk more about it? So we're gonna talk about forensics. Forensics can sometimes be fairly dry, so I'm going to try and treat this at a high level; we'll try to make the slides available if you guys want to go into more detail and dig in deeper. But essentially this is the forensics process, overall how we move through it. Initially, and a lot of what we've been talking about is this, there's the initial identification of suspicious traffic. We're gonna move on from there to effectively analyzing content for artifacts. Like traditional network forensics, or even traditional host forensics, you pretty much go the same places: you start with something that you want to look at, you then go, okay, what content is relevant to this investigation, and you then analyze that content as artifacts, which could be anything from binaries to domain names or any of those pieces. After that, essentially all roads lead to Rome: you're building a timeline. I mean, all the forensic investigators in the room have probably had the experience of time servers not being in sync, and of trying to go from an Excel sheet to a PowerPoint presentation for an executive; these are the joys of being a forensic investigator. And then ultimately impact analysis and presentation. I hate to say it, but if you're doing forensics in an organization that really cares about it, I've almost always had to do a PowerPoint presentation. It's not my favorite, but typically you're building some kind of view for somebody who is maybe not as familiar with the content or the artifact analysis: okay, what happened, and was it bad? That's ultimately where we're all going. And so there's a whole lot of different pieces here that we'll put
online somewhere. So effectively, let's look at the process before we have network encryption, right, when content is a thing. Let's just say we have malicious traffic over unencrypted HTTP; this is probably the most generic example for this kind of environment. Effectively, since we can see the content, we just get to go, okay, something's bad, we had some kind of detection event that led us here. Now when you do forensics you can look at all kinds of things. We have the IPs, we have the host headers, which is HTTP protocol-level data, so it's gonna be part of that content. We also have the payload and content, so we have full content: URIs, parameters, response codes, user agents, cookie values, request and response bodies; if there's an actual file transferred over HTTP we probably also have that. And so the overall analysis in network forensics on HTTP, like bad traffic over HTTP, is fairly straightforward: if you look at that kind of flowchart, you have all the pieces, it all flows in nicely. This is a PE download from Choopa, if you guys know Choopa and Vultr; I'm guessing the red team people are familiar with them, and probably some of the blue team people in the room are also familiar with Vultr and Choopa. These are ASes where it's generally very easy to set up VPSes, and red teams and other people use them; it's not just red teams, there's actually legitimately bad [ __ ] in Vultr and Choopa, it's an interesting part of the internet. Remember I said .xyz? This is a real RAT server, totally cool name. So this is actual real traffic; in a lot of examples we'll try to call out whether or not we made it up, and the demo that we're gonna do at the end we totally did make up, because live demos are terrifying. But this is legitimately just a PE download, and we can see the timeline. The timeline's already there: we have the PE download, then we see the traffic out, and then we can see the NetFlow associated with that. So what happened? Okay, it downloaded malware and then we saw a beacon out to this IP address. Timeline. We probably are still going to go to host analysis to finalize impact, but overall this is a very straightforward process: we have content, we analyze the infrastructure, we do the timelining, and then, that's my PowerPoint graphic that somebody made for me, but yeah, that's a PowerPoint graphic in a PowerPoint presentation, it was its high point for me. So: network forensics after encryption.
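As an aside, the merge-and-sort step behind that timeline view is simple enough to sketch in a few lines of Python; the event records, sources, and field names below are hypothetical, made up for illustration rather than taken from any particular product:

```python
from datetime import datetime

# Hypothetical event records pulled from HTTP logs and NetFlow;
# the field names and values are made up for illustration.
events = [
    {"ts": datetime(2017, 7, 1, 14, 3, 22), "source": "netflow",
     "detail": "beacon to 203.0.113.7:443, 1.2 KB out / 0.3 KB in"},
    {"ts": datetime(2017, 7, 1, 14, 1, 5), "source": "http",
     "detail": "GET /update.exe (PE download) from evil.example"},
    {"ts": datetime(2017, 7, 1, 14, 2, 40), "source": "http",
     "detail": "POST /gate.php to evil.example"},
]

def build_timeline(events):
    """Merge events from any number of sources into one chronological view."""
    return sorted(events, key=lambda e: e["ts"])

for e in build_timeline(events):
    print(e["ts"].isoformat(), e["source"], e["detail"])
```

The same sort-by-timestamp merge works no matter how many log sources you fold in, which is the whole point: the timeline is just a union of events ordered by time.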
So this is a little bit different, right, because of our detection mechanisms. If I'm in an environment that hasn't really updated network security since, let's say, the early or mid 2000s, you're kind of screwed: all your content detection stuff is basically sitting there generating heat in a rack, but it's not actually generating a lot of detection for you. One of the really easy ways to see that is to look at the types of detections you have, like do group-bys on what signatures are firing, and look at that over time; you'll start to see fewer and fewer of certain types of signatures.
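That group-by is trivial to do anywhere you have alert logs; here's a minimal sketch, where the alert tuples and signature names are made up for illustration:

```python
from collections import Counter

# Hypothetical IDS alert log as (ISO week, signature name) tuples;
# the signature names here are invented, not real ruleset names.
alerts = [
    ("2017-W01", "MALWARE HTTP EXE download"),
    ("2017-W01", "MALWARE HTTP EXE download"),
    ("2017-W01", "POLICY SSL self-signed cert"),
    ("2017-W10", "MALWARE HTTP EXE download"),
    ("2017-W10", "POLICY SSL self-signed cert"),
    ("2017-W10", "POLICY SSL self-signed cert"),
]

def weekly_counts(alerts):
    """Group-by (week, signature): watch content-inspection signatures
    fire less over time as more of the traffic gets encrypted."""
    return Counter(alerts)

counts = weekly_counts(alerts)
print(counts[("2017-W01", "MALWARE HTTP EXE download")])  # 2
print(counts[("2017-W10", "MALWARE HTTP EXE download")])  # 1
```

Plotting those per-week counts per signature is usually enough to show content-based detections trending toward zero in an encrypting environment.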
Like, yeah, sorry, I'm not gonna go all the way down that rabbit hole, but essentially you live in a world where you can do things like match IPs. If you have no termination and you're not really collecting any SSL content metadata, you're basically going, okay, this IP address is bad. And anybody that's done investigations or detections, or actually tried to operationalize threat intelligence, knows that IP-based matching is not the highest-fidelity source of truth. And then we also don't have any HTTP protocol metadata. Remember all the things that showed us which file got downloaded: we had file metadata, we saw a POST request out to a malicious C2 server, we had the host header so we actually knew what website it was going to. We don't have any of that. So one of the issues here is thinking about, I don't know, malicious actors that may or may not host infrastructure on AWS. AWS EC2 IPs are fairly heavily reused, and we have things like shared hosting now. If I only have the IP address, my job becomes very difficult: I don't have a clean timeline, and how do we disambiguate between legitimate usage of EC2 endpoints and illegitimate usage? This starts to become fairly challenging. On the left-hand side, and I love that we have this in here, these are actual intel matches. For SSL we can see these are IP addresses, it's encrypted traffic, and we're pulling in the threat-intelligence metadata; I think this came off of John Bambenek's high-confidence C2 IP feed, which is freely available through a whole bunch of places. But that's pretty much all we have: a NetFlow record, an SSL record, and that's it. So filling in the timeline becomes a little more difficult. We effectively have to build out new things; a lot of this talk is about looking
at new data sources and building new techniques. So effectively, since we can't just match HTTP host data anymore to disambiguate which website was being accessed on an IP address, we can start to get more out of the SSL certificate data: we can look at the server name and subject fields where we used to use HTTP host headers. And then, since we don't have the HTTP payload data, we can actually start with an IP address and an event time, and if we have things like time-indexed passive DNS, we can pull in this concept of tight time-bounded passive DNS. If I'm watching every client in my environment and every DNS request and response, I can actually tell you what host was requested, because if Justin requested it, I can just look at the time period immediately preceding, like when the connection stands up, what did he actually make a DNS request for, and I can now start to disambiguate. So, their powers combined, in a sense: server name and subject field data, as well as tight time-bounded passive DNS, can be really useful. And then for interpreting the content, like, was it a binary download, was something uploading data, where does it fit in a timeline, what's the potential TTP, that's really where
data sent/receive ratios can come in really handy. Justin talked about how they can be problematic for detection, but for forensics they can actually be really useful: HTTP traffic uses fairly consistent data ratios, so you can actually fingerprint different types of activity. There are some really good papers out on PCR, the producer-consumer ratio, and it's a really useful data point to have, especially when you're trying to reconstruct things without actually having the things you're trying to reconstruct; it's like a black-box approach to network forensics. So effectively we start with a starting point: matching on domains. Instead of matching on domains in HTTP data, we can match on them either in DNS, which is highly problematic, or in SSL, which is a lot less problematic. Then we continue to analyze infrastructure and IPs; we can use that tight time-bounded passive DNS to confirm, okay, was this actually the host that was accessed. For timelining, we essentially pull all of the events, so all the flow data and all the certificate metadata, and look to see, is it hosted on a single IP, is it essentially a cloud or shared-hosting environment. And then we can pull the data sent/receive ratios across different time buckets to try and ascertain what type of activity this host was actually doing. And then we can put it in PowerPoint, which is pretty much the same, it also ends up in PowerPoint, sorry. So to wrap things up, and please feel free to ask questions, I think we still have a little bit of time here, we tried to leave some. So, "encryption is the end of the world": that sounds like vendor speak, and a lot of different vendors will try to influence you in different ways. We're pretty firm believers, from a research and completely product-agnostic position, that you can collect this data.
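Two of the techniques above are easy to make concrete; here's a toy Python sketch where the log records, field names, and the 30-second window are all hypothetical choices, not anything prescribed by the PCR paper or any product:

```python
def resolve_host(dns_log, client_ip, dest_ip, conn_ts, window=30.0):
    """Tight time-bounded passive DNS: find the most recent DNS answer,
    from this client, that resolved to dest_ip within `window` seconds
    before the connection stood up."""
    candidates = [r for r in dns_log
                  if r["client"] == client_ip
                  and r["answer"] == dest_ip
                  and 0 <= conn_ts - r["ts"] <= window]
    return max(candidates, key=lambda r: r["ts"])["qname"] if candidates else None

def pcr(bytes_sent, bytes_recv):
    """Producer-consumer ratio: +1.0 pure upload (exfil-ish),
    -1.0 pure download, near 0 for balanced chatter."""
    total = bytes_sent + bytes_recv
    return 0.0 if total == 0 else (bytes_sent - bytes_recv) / total

# Hypothetical records: ts is epoch seconds; field names are made up.
dns_log = [
    {"ts": 100.0, "client": "10.0.0.5", "qname": "cdn.example.net", "answer": "198.51.100.9"},
    {"ts": 117.5, "client": "10.0.0.5", "qname": "evil.xyz",        "answer": "198.51.100.9"},
]
print(resolve_host(dns_log, "10.0.0.5", "198.51.100.9", 120.0))  # evil.xyz
print(round(pcr(900_000, 45_000), 2))  # 0.9, looks like an upload
```

Note the design choice in `resolve_host`: with shared hosting, several names can resolve to the same IP, so taking the most recent in-window answer from the same client is what disambiguates which site was actually visited.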
There are many products you can use to collect this data, you should collect this data, and it will be useful to you. To recap, you essentially have two options. If you have a very old legacy environment and you don't want to have to learn a whole bunch of new tools, termination is probably your friend, but you're gonna have to fight that user-privacy battle. Otherwise you have the option to shift more towards metadata, which really means leveraging features of encryption for analytics, intelligence, and essentially the detection and forensics piece. And honestly there is a huge way forward for this; we're very early as a research community. I mean, I called out Mark Parsons, I think he's awesome, he does some really good research around detection by looking at predictable or notable features of SSL certificates. And like Justin was saying, we were really disappointed when we started looking at this, because we mostly just found malware using self-signed certificates. That was kind of a bummer for us, because we were like, oh, check out these awesome analytics, and it was like, no, we're gonna start with blocking and tackling; like many security pursuits, let's start at zero and then work our way up. So there's a lot of value here. And then, overall, forensics isn't a dead science just because the data is encrypted; please don't let it become a dead science because the data is encrypted. You lose a little fidelity, but we really still feel like you can pull out a lot of the critical information, still build a timeline, and ultimately still be able to present a PowerPoint up to your security leadership. So, I had a friend dare me to put a security maturity model in for this. This is one of those twisted things that makes sense when you're doing the company-running stuff; you're like, oh, we should have a maturity model, and I was like, touché. You should troll me for this acronym because it's silly; yeah, you can see that I maybe had a little
bit of fun with the acronym here. All jokes aside, maturity models can be really useful; this is really meant to be a measuring stick for, hey, I'm an organization, how am I doing on this? This talk covered a lot about encryption and security, some stuff about detection, some hunting stuff, and some forensic stuff: where am I on the chart? So effectively we start out with content, because everybody already has this. Like, do you have content termination in place? And it's people, process, tech: if you have no people allocated to this, you don't have a capability. It starts with having someone, like analysts, who actually make all of this work; without analysts and process, you can buy all the shiny kit in the vendor hall that you want, but you're not going to get anywhere. So you have someone assigned to this, you're actually terminating traffic, and then the technology: you can actually do this at a web gateway or a web proxy, a lot of appliances are starting to do this, you can buy those. You just want to make sure it's actually turned on. If I had a dollar for every client environment we'd walked into where they're like, oh yeah, we're terminating all the SSL, don't worry about it... go actually validate. This is something one person can do, and they may or may not like you afterwards if you're their co-worker or boss, but make sure it's actually turned on and working in all these different places. That's essentially starting out here. Level two is really about how much more you can do with content; this is where you get to that termination layer. So effectively you have a monitoring layer where you have a packet broker, which could be commercial, like a Gigamon, which, to make it super easy, is plug-and-play, or it could be an Arista, because you have legitimate amounts of traffic and you don't need all the UI stuff. But you essentially have an SSL terminator and a tap architecture or packet broker connected to it; at that point you're doing two thumbs up on content, you're ready to rock and roll. Metadata level one is really about having comprehensive visibility and logging of DNS to a central repository, and then making the DNS data accessible to analysts: can an analyst actually query that DNS data? Because I've met a lot of people in the network security realm who are like, we have all the packets, or we have all the DNS. Okay,
great, but can people get to that? And then those people, who are usually the architects or engineers, kind of look at me like, what do you mean? And I'm like, analysts, they're the ones that actually do the job for the business. So both having it and making it accessible are the key points here. As far as technology, you can really just do this with DNS metadata extraction and logging; there are a couple of open-source projects that are pretty good for this. If you just want to set up basic DNS logging, I would strongly recommend you do it at the network layer and not try to do it at the host layer. In most environments all the DNS is aggregated, especially on Active Directory domain controllers, because this is how everyone set up their domains, it's how everyone set up DNS, it's the recommended configuration from Microsoft. Don't try to install a bunch of logging facilities on domain controllers: for legacy versions you actually have to put it basically in a debug state, and then you're writing a whole lot of DNS logs to disk, and there are a lot of DNS logs, if you've never looked at DNS and the flow rates and all that. I'd be lying if I said we hadn't tried to help a customer do this and then blue-screened one of their domain controllers, which they were not happy about. So please, do it at the network layer, you will be happier, I promise. Level two here is automatically linking DNS and IP resolutions based on time period, basic data, and data-flow calculations; this is the tight time-bounded DNS that I was talking about. To really do this in real time, we're talking about big-data stuff, and I'm gonna qualify that, because I know a guy got up here and talked about big data and nobody threw anything at me, which I appreciate. But realistically, in a decent-sized enterprise, you're talking about probably terabytes of data
a day that need to be written in, and if you actually want to be able to query this in real time, you're talking about a streaming real-time processor. So you're probably talking about something Spark-ish and Hadoop-ish, like HBase-ish, Cassandra-ish, and basically having all that architecture up and running, and then having web services on top of that. So yes, I did just say big data; you can troll me about it later, it'll be fun. And then level three here is really about data-flow analytics, getting more into, okay, you have all the data, it's accessible, but now we're actually going to run a lot of these big-data crunchy analytics on it. Essentially this is about having an analytics layer that lives on top of all the other things we were just talking about that make the data collected and accessible. At that point, as far as we're concerned, you're cooking with fire and you're at the top of your game, and if you get to the highest level of maturity on both content and encrypted data, we feel like everybody will be in really good shape, and we'll all feel like we did a good job on this talk, and everybody did awesome. So one of the big things that we really
enjoy is questions, and we talked about a lot, so please feel free to ask us questions. We're also happy to answer questions after the talk, we'll be hanging out for a while. Yup, here in the front, and I know the guy with the mic is probably gonna hunt you down. You mentioned that you saw differences in the SSL certs between normal traffic and the malware; could you elaborate on that a little bit? Yeah, I mean, in the most basic sense, there's two things that happen. One is that malware uses self-signed certificates, and not a lot of other people do, because in a browser, the browser will be like, oh my god, it's a self-signed certificate. Firefox has been training us as users; like, my parents know not to browse to that website. Whereas backdoor clients don't have to validate certificates, that's a client-side thing. Some people get confused about what happens on the client versus what happens on the network. So you actually will see malware using self-signed certificates, and people will almost never see those in a legitimate browsing session, because of, you know, the my-parents principle: oh my gosh, I saw this thing and it's not signed. And yeah, my parents have actually done that before, when Kaspersky was installed, it was awesome. Sorry, did that answer your question? Yeah, okay. So I was curious if you had any thoughts on the TLS 1.3 debate for the big enterprises, and the fact that they're complaining right now about the TLS 1.3 specification because they can no longer do man-in-the-middle inside their networks. So basically with TLS 1.3, I mean, one of the big rants that I did at Microsoft is I ranted and raved and worked with the IETF to remove RC4 as one of the ciphers moving forward, because it's terrible, you should never use it; if you ever see it negotiated on your network, you
should be concerned. It's not actually encryption, it's kind of like a zipper, and if you just pull on the right side of the zipper, everything just becomes unencrypted; stream ciphers are bad, we can talk about that later. But with TLS 1.3, I think the way it's all going to come out in the wash is that the trusted root is still going to be the source of truth for every endpoint. Having worked at Microsoft, the user story of, hey, all these things are just gonna break, everywhere you have your Palo Altos or network gateways deployed doing SSL termination, that's gonna be a broken experience. I think the IETF can do a lot of things, but at the end of the day usability is probably gonna win out. But kind of like I was alluding to: is this a long-term solution? I think eventually user privacy has enough behind it that we're gonna move into a world where everything really is about web-services architecture, it's all service-oriented, and at that point clients are gonna become less and less of a thing; it's gonna be more about, I have a user agent. But I think it'll still be up to both the user-agent people, and effectively that's more and more becoming Google and the Chrome development team, who seem to be pretty privacy-leaning, and also the OS manufacturers, who really are the ones that implement the root trust stores. Although, technically speaking, some of the user agents, the browsers, implement their own root of trust, and that's a whole other thing; I'll try not to make this talk just about me ranting about TLS. But if you're bored next week, check your trusted-root contents, and you could also look at how TLS works; it's a lot more convoluted, and after you see the people whose names are in the trusted store, you'll probably be pretty curious about how that happened, like Gator certificates or something like that. There was also a funny tweet that went out really recently, where someone, I think it was Casey Smith, cleared his entire trusted store and then just rebooted his computer to see what repopulated, and basically they all just came back, poof. So managing your own trust store is not as simple as it could be. Nobody wants to break users, right, no one wants that, but at the same time
there's all these other competing pieces. So, long story short, our opinion is it's a changing environment. I don't think it's going to be this round that privacy rules the day, because it would just be so broken a user experience, but will that continue? I honestly don't think so. I think we're gonna get to a place where we really do have to deal with analytics on top of encryption instead of relying on interception. Yeah, just a quick comment for Justin: you were a little critical of advertisers using free and cheap certs, and I just wanted to comment that I'm actually applauding advertisers for finally encrypting traffic. How long did we see mixed-mode content, including random JavaScript, that I could easily malvertise my way into on a domain? So I think part of that was the browsers kind of forced their hand, because mixed mode became much more apparent to the user, but I really applaud advertisers for finally taking it up. Thanks for the comment, we appreciate and value your opinion. As former red teamers, yes, we have intercepted mixed-mode content; at the same time, it's one of those things where, since advertising basically creates a fresh hell for us on the analytics front every day, I think we can sometimes be a little harsh on them, I'll give that.
Yeah, generally speaking, advertising is, you know, I live in a world where there's good guys and bad guys, and I'm not gonna put them in either camp, but sorry if we offended the advertisers; please do continue to encrypt traffic, otherwise red teamers will likely continue to intercept your traffic and do bad things to users. Alright, I think there was one other question, or another question over here. Yeah, this is more of a comment, another thing that you can add to your SSL termination con list: a lot of API providers are now requiring client authentication, so it requires mutual authentication, and a lot of it is certificate-pinned. This is breaking, I know in my own environment, our ability to do SSL termination and inspect the traffic. Absolutely, that's a super valid point, and yes, we will absolutely add it; it is becoming more and more common. Most people don't even know you can do that. To be really honest, most people don't even understand that there is client-server authentication: they either assume that it always has client authentication and don't understand that the root of trust is really external to them, or they don't understand how the client-server authentication works. It's kind of painful, but yeah, I absolutely agree there's more of it going on, and it's definitely going to cause issues, because no matter what, if you have client authentication, the man-in-the-middle stuff is not gonna work; that's just basically going to break. And that's actually why there was an asterisk on that slide: it's not actually a panacea, because not everything can actually route over your termination gateway, and when people use client certs it doesn't all just work, and nobody wants to break users. So was there another question or comment over on that side of the room? Yeah, hi, great talk. There's
another paper, actually, at BSides DC last year on doing SSL cert analysis and looking at the fields, not just for self-signed certs but also the regular signed certs, and there's some very interesting attributes in there; and then if you apply a machine-learning model on that, you can actually extract out which are malicious or not. So you can go look that up, at BSides DC. Yeah, you can absolutely do more complicated analytics on top of it. I would say our original hypothesis was that we'd have to do that, so, totally hear you. There's a lot of interesting work happening around analysis of this metadata and trying to use machine learning to find outliers, or build models trained to recognize malicious versus non-malicious traffic. I would advocate strongly, like in the maturity model: just collect the data so you have it, and then you can have really smart machine-learning people help you do things with it. I mean, most people just don't have the data to begin with; most enterprises we interact with don't have global DNS data, let alone certificate metadata, let alone actually looking at which content they inspect. And there's actually more metadata you can get; we haven't even talked about the different SSL handshake properties, the protocol is inherently super complex, there's tons of places where you can go deeper and deeper on this. But please, just collect the data, because there is a time coming when content interception is not going to be as much of a thing anymore, and if you don't have the data you need to do network forensics, you'll probably be on the end-of-the-world wagon, and we want more people on our wagon, where it's not the end of the world. So yeah, I think it's a super valid comment, thanks. Apologies, but that's all the time we have for questions today. If you'd like to continue the conversation, make sure to follow the speakers on Peerlyst, and please give a round of applause to our speakers. Thank you.