
All right, thanks so much, folks. Can everyone hear me okay? Volume's okay? Awesome, thank you for the thumbs up. So, what do we mean when we say internet measurement, and why does it matter for security? Let's get into it. Often when people hear "security research," like when I say "oh, I work in security as a researcher," it's "oh, you must work in threat intel." No. "Pentester?" No. "Okay, but surely you're a reverse engineer?" No, I work in this other field called internet measurement, or evidence-based research. What I'm hoping is that you folks leave this talk with an understanding of what it is, some examples, and that when you hear the term "security research" you think of more than just those first three pillars.

But before I can get into some examples, I think it would behoove us to set the stage and talk about what measurement is. At a very broad level, measurement is a way to quantify the world around us with the goal of bettering it. And the thing is, the security field is actually fairly nascent; I know it's been around for two or three decades, but that's still pretty new. There are other fields that have had measurement at the core of their science and their principles for far longer. For example, I know folks who have figured out ways to measure depression and cognitive well-being in adolescents. There's been some amazing work, as many of us with our masks on are aware, on the public stage about how to measure COVID-19 spread and how to mitigate it. And you can use measurement in physics to see what the polarization of light particles is. So measurement is prevalent all around us, it's been around for a long time, and it's used in all these different fields. Internet measurement, specifically, is about quantifying and improving the many parts of the internet, including the security aspects. And the nice thing about internet measurement is that it can contain a lot of different facets.
We can measure servers and server behaviors, we can measure people and what they're doing, or we can measure other people, like the attackers who are trying to break into servers. So there are a lot of different facets when I think about internet measurement, and I'm going to concretize this a little with three examples: three ways we can actually use internet measurement to improve security. In my own work I've asked myself: how do we measure attacker behavior in the hack-for-hire market? How do we compare our results to ground truth when you are an internet-wide scanning engine? And, in the context of a SOC or security org, how do we improve vulnerability notification?

The cool thing about internet measurement is that, yes, there's science, and yes, we're trying to do really valid work, but when I think of internet measurement I think of it as a proxy for answering overarching strategy questions, to try to help better the world. So really, when I say "how do we measure attacker behavior in the hack-for-hire market," I'm actually thinking: what defenses can we build in email to better protect targeted users? When I think about how we compare our results to ground truth, what I'm really asking is: how could our scanning encompass more breadth and depth and be more useful to users? And finally, when I'm thinking about how we improve vulnerability notifications, it's not just about the notifications; it's: how do we reduce the attack surface of an IT organization by removing these vulnerabilities?

Just a quick break to answer why you should listen to me for the next 30 minutes: my name is Ariana, and I currently work as a senior security researcher at Censys. Previously I did my PhD at UCSD, where my thesis was focused on, you guessed it, internet measurement and security decisions. All of the projects I'm going to talk about are projects I've worked on pretty directly, and I chose these three because they're very different.
My goal is to give you a very broad range of what internet measurement can actually be. So I'm going to dive in with this first project: how do we measure attacker behavior in the hack-for-hire market, with the overarching goal of figuring out what defenses we can build to better protect users?

As many of you know, email accounts are super rich in information, which makes them super lucrative attack targets. Defenses have made large-scale attacks difficult: defenses like two-factor authentication, spam filtering, and, uh, security questions like "what are your hopes and dreams." But targeted attacks still remain an issue, and that's because the economics of targeted attacks is very different. Instead of sending 100,000 emails and hoping that, like, 10 people click, you as the attacker are spending a lot more time cultivating a targeted attack, a targeted message, in the hopes of a higher payout. Usually when we think of targeted attacks we think of high-profile targets: politicians, celebrities. You can tell these slides are kind of old, because the most recent news article is John Podesta. But the reality is that at the time of this study, and it still exists today, there is an underground market that provides hacking services for hire. This is an example of one of the advertisements: it's essentially a hacking group that purports to break into any Yandex, Rambler, or Gmail account you want broken into, for anywhere from $100 to $300. This hack-for-hire market had not yet been examined, and the reality is that $100 to $400 is actually pretty reasonable for a lot of us, if we've been jilted by an ex-boss, an ex-partner, etc. So my colleagues and I set out to answer three main questions: how many services can we find? How sophisticated are the methods of attack? And how widely used are these services? Again, with the overarching goal of figuring out what better defenses we can build.
I do want to make a quick note that this entire study was done with Gmail, because we were collaborating with folks who worked on the anti-abuse team at Google, but as you'll see, the results generalize pretty well. To give a quick overview before I jump into the specifics, the way this process worked is: we discovered these services that purported to break into Gmail accounts; we then created online personas as the buyer and the victim in order to engage with them, because I didn't want to give out my personal info, so I just made a bunch of online personas. We engaged with each service as the buyer persona, saying "hey, we want to hire you for this victim" (again, both the buyer and the victim are completely fake); we then monitored the attacks from a variety of different vantage points; and if they were successful, we'd deliver payment, because they did what they said they were going to do. So some people made money out of this study.

I do want to dive really quickly into the buyer and victim personas, because there are some important details here that will make sense when I talk about the results. Like I said, I didn't want to use my own email, and it would also be weird if I just kept going, "I have 20 people I'm really mad at, please hack into all of their accounts." So instead, for every service we reached out to, we made a buyer persona and a victim persona. For the buyer persona, we just made a Gmail address with some believable name, like ariana.mirian.<some numbers>@gmail.com, but not actually my name. The victim persona was a little more intricate, and that's because one of the things we hypothesized is that in these more targeted attacks, the attackers would take facets of the online digital footprint that a lot of us have and utilize them in the attack vector. But we didn't know what they would use, so we decided to just make as much of a digital footprint as we reasonably could.
So the victim persona had a Gmail address, again some generic first-last-name-plus-numbers. They had an online web page for a business they purportedly worked at or owned (I made a lot of small business owners), and on the web page we linked the Gmail address. We also made additional Gmail addresses for their associates, which were also linked on the web page. In one of the cases, I remember the victim was a man who owned a carpet-cleaning service with his wife, and there was this whole backstory; I got really good at this. We also made a Facebook page for these victims, and then we set up SMS 2FA on one of the Gmail addresses. This was specifically because, at the time (I don't know what the statistics are now), SMS 2FA was the most widely used form of two-factor authentication. So we wanted to see not only whether they could create a convincing attack, but whether they could also bypass the most widely used second form of authentication at the time of this study.
So, the first question we wanted to answer is: how many services could we find? We looked at a bunch of underground forums and reached out to a number of our contacts on anti-abuse teams at tech companies. Overall we found 27 services, and we reached out to all of them. Ten of them never responded to us, for reasons I could hypothesize about but don't really know for sure. Twelve of them responded, initially saying "yeah, we're totally going to hack into this account for you," but then made no attempt. Three of them were scams: they purported to break into the account, but we didn't see any indicators of compromise, no Gmail logs showing anyone had gotten in. My favorite scam was this one, and I regret that I don't have the GIF: I put in the email address, and then I just watched this web page for three minutes as it told me it was hacking into my Gmail account. There was also a button to speed it up; you could pay them more money to go faster. They did not break in. I lost like 250 bucks on this. It was really entertaining, though. A really cool thing about doing academic research is that you can do a lot of crazy stuff and someone pays for it, and it's like, all right, that's great.
But that means there were five of the 27 that made an attempt, so for the rest of this portion of my talk I'm going to focus on those five, because that's where we had results. How sophisticated were the methods of attack? We never observed any brute-force logins, we never observed any communication with the Facebook account, and we never observed any communication with the associates. So we set up all these facets of the digital footprint, and a lot of them just weren't utilized in the attack. All five services sent an email to the victim, and in one service's case the email contained a malware executable that wouldn't run. This was, like, the one time in my life I really wanted to get owned; we tried a bunch of different laptops, a bunch of different VMs, and it was just broken. We uploaded it to VirusTotal, which said it was probably a remote access trojan, which, for those who aren't aware, is basically a piece of malware that records what you're doing on your computer, including recording what you're typing into gmail.com. But that means four of the five services used phishing in their attacks, and they used very, very good phishing.
When I say they used phishing, these were incredibly targeted, highly crafted messages, and this next graph shows that these phishing attacks were persistent and personalized. I know there's a lot happening, so just focus on that top row for now while I walk you through what this graph is actually saying. The letter is the service, which we've anonymized for privacy concerns: so, service A, and the first time that we hired them. Each of the dots represents an email they sent our victim account, and you can see that across all the rows they sent, in most cases, multiple emails. The color denotes what sort of phishing lure, what sort of bait, they were using to get us to click. Light blue means they used personal details, and by personal details I mean details from the web page we set up for this fake person who doesn't exist. There is no way they could have crafted these emails unless they had Googled that Gmail address, found that web page, and then crafted an email spoofed to look like it came from my wife at my carpet-cleaning service, which was actually one of the emails. You also see in the legend that there were different types of lures: some purported to be a Google login, some purported to be from the government. But these attacks were persistent: when we didn't click, they just kept sending more. And they were also personalized, which was a pretty big finding for us. It wasn't a one-and-done sort of thing; they were putting in the work, trying multiple emails to get in. The X's are where we clicked on the emails, because I wanted to know how they'd get in.
The TL;DR is that these targeted attacks were able to bypass two-factor authentication in their flow. I would click on the email, whatever the lure was, and it would take me to a Gmail sign-in page that was not actually Gmail, but a domain that looked really close to it. I would put in the password for the victim account, and then most of these services accounted for the fact that 2FA was protecting the account: the next web page was again a prompt that looked like Gmail, saying "hey, we just sent a text to your phone, could you please provide us that 2FA code?" The phishing attempts that did not anticipate 2FA, where I put in my password and it just went forward, adapted: in other words, they sent more emails later that then accounted for 2FA in the phishing flow by asking for the code. In fact, one of the services doubled the price when they realized 2FA was protecting the account. They came back to me as the buyer, like, "there's more going on here, we need $500," and I was like, all right, it's fine. We paid them; we paid all of them. So that's an overview of the level of sophistication of these attacks.
The other question we were interested in answering is: how widely used are these services? We had a very narrow view: we could hire them and see what methods of attack they were using. It wasn't any sort of fancy zero-day; it was just really sophisticated phishing, spear phishing. But we wanted to know how widely used these services are, and this is where partnering with folks on the Google anti-abuse team really came in handy, because they could look at the logins from the Gmail side. They could create fingerprints, because a lot of these ended up looking like phishing kits: they were very fast in how quickly they responded to us typing in the password and the 2FA code. And then they could look at how many real Gmail accounts had been logged into with these fingerprints. This graph shows, from March to October of 2019, how many actual Gmail accounts had been logged into. It's fairly small, I mean, the y-axis only goes up to 35, and these are unique accounts, but there are still hundreds of people affected by these services. And this is a lower bound, because these are actual Gmail accounts that had a successful login with this fingerprint, not an attempted login; there could be hundreds more potential victims who saw the email and went, "that looks weird, I'm not going to respond to that."
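(To make the fingerprinting idea concrete, here's a toy sketch of the kind of timing heuristic just described. The event field names and threshold are invented for illustration; this is not Google's actual detection logic.)

```python
from statistics import median

# Automated man-in-the-middle phishing kits relay the password and 2FA code
# with machine-like speed, so flag login sessions whose password -> 2FA-submit
# delay is implausibly short and consistent. Field names are hypothetical.
def looks_like_relay(sessions, max_median_delay=2.0):
    delays = [s["otp_submitted_at"] - s["password_submitted_at"] for s in sessions]
    return len(delays) >= 3 and median(delays) < max_median_delay

sessions = [
    {"password_submitted_at": 0.0, "otp_submitted_at": 0.8},
    {"password_submitted_at": 0.0, "otp_submitted_at": 0.9},
    {"password_submitted_at": 0.0, "otp_submitted_at": 0.7},
]
print(looks_like_relay(sessions))  # True: consistent sub-second relay timing
```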
In tandem with this research study, Google introduced a couple of new defenses that help protect against this attack, which we call man-in-the-middle phishing. And that's one of the measurement studies: at the end of the day we were trying to figure out what defenses we could build to better protect targeted users, and the measurement question underneath it was how to measure attacker behavior in this really niche market. In the process, we found that the attackers are not as sophisticated as we had hypothesized, and we also found an effective defense for this attack that was then deployed on a major email provider. So that's one example of an internet measurement study helping security for good.
I'm going to change tracks a little bit now and talk about some work I did at Censys more recently. The measurement question was: how do we compare our results to ground truth? And the overarching strategy question I was attempting to answer is: how could our scanning encompass more breadth and depth, and be more useful to users? Those of you who were in my talk yesterday are probably like, "wow, this is déjà vu, you only ever talk about good data quality." We love good data quality; it's a lot of what I think about. So, just a quick primer on Censys.
It's the one place to understand everything on the internet: an internet-wide scanning engine where we do all the scanning and then you access the data, so you don't have to have your own servers or your own setup. For those who aren't as familiar with internet-wide scanning, I don't really have time to dive into it, but we essentially provide a map of the internet. You know how Santa goes to every house and drops off gifts? The analogy here is that Santa is going around to all the network devices in the world, knocking on all their doors, going "hey, what do you speak, what are you willing to tell me, okay bye," and then running off to the next device. So we do a lot of scanning to see what's on the internet: what are these devices willing to tell us, what services do they speak, and so on.
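(As a rough illustration of what one "knock on the door" can look like, here's a minimal banner-grab sketch. This is not Censys's actual probe logic; the host, port, and timeout are placeholders.)

```python
import socket

# Connect to a host and port, and record whatever banner the service
# volunteers when we "knock."
def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            # Many services (SSH, SMTP, FTP) self-identify as soon as you connect.
            return sock.recv(1024).decode(errors="replace")
        except socket.timeout:
            # A silent service would need a protocol-specific probe instead.
            return ""

print(grab_banner("192.0.2.10", 22))  # e.g. an SSH banner like "SSH-2.0-OpenSSH_9.6"
```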
And we had this interesting question: a lot of people consider Censys ground truth, but how do we quantify the accuracy of ground truth when we are considered the ground truth? Like I said, I think about data quality a lot, so we came up with this experiment: compare Censys to Nmap. Nmap is, like, the OG internet scanner. It's from the '90s, it's still around, you can still use it, I still use it, it's great. So we said, okay, we're going to take a set of hosts, scan them with Nmap, and then compare the Censys results to an in-the-moment Nmap scan to see where things differ. The scanners are different, but we figured this would at least give us a starting point for asking what, if anything, we're missing, because we want to grow and change. This diagram shows a little better what we're trying to do: take the Censys data, take the Nmap data; where's the overlap, where's the exclusion, and how can we improve? So we ran this, and between Censys and Nmap we found an 87% overlap, which is pretty good. There were 13% of hosts or services that Censys found but Nmap did not; we can call these false positives, but since this is comparing ground truth to ground truth, that's maybe a bit of a misnomer. And then we found about 5% of things where Nmap found something Censys did not.
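(The comparison itself is conceptually just set arithmetic over scan results. A hypothetical sketch, with made-up (ip, port, service) tuples:)

```python
# Hypothetical snapshot comparison; the tuples below are invented examples.
censys = {("198.51.100.4", 443, "HTTP"), ("198.51.100.4", 22, "SSH"),
          ("203.0.113.9", 5060, "SIP")}
nmap   = {("198.51.100.4", 443, "HTTP"), ("198.51.100.4", 22, "SSH"),
          ("203.0.113.7", 25, "SMTP")}

both        = censys & nmap   # the overlap (the ~87% bucket in the talk)
censys_only = censys - nmap   # "false positives"? or better scan logic, or timing
nmap_only   = nmap - censys   # the gap worth digging into

total = len(censys | nmap)
print(f"overlap {len(both)/total:.0%}, "
      f"censys-only {len(censys_only)/total:.0%}, "
      f"nmap-only {len(nmap_only)/total:.0%}")
```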
Of course, whenever I see results, I'm like, well, let me go validate and do some manual digging to see what the heck is going on. In a lot of these cases, the discrepancies were hosts that would be online when we did an in-the-moment Nmap scan, then disappear a couple of hours later, only to come back. Censys is an internet-wide scanning engine, right? We're scanning pretty consistently, but we're not scanning every 30 minutes, or exactly in the moment. And this raised a really interesting question, which I want to share because a lot of work in internet measurement is about taking a step back and asking: why is this happening? Are we measuring the right thing? Is this flapping behavior, are these differences, a facet of the differences between the scanners, or just timing? Because as an internet-wide search engine we have churn, we have databases; we're not just going to run a one-off Nmap scan. So, as with many moments in my life, I was forced to take a step back and ask: are we even measuring the right thing? We ran the experiment again, but instead of comparing the Nmap scanner to the Censys API results, which is what we were originally doing, we compared the scanners to each other.
We took the same set of hosts, scanned them at the same time from the two different boxes, and examined the differences. Again, about 86% of the time the results were the same. In this experiment we found that Censys was able to find about 10% of services that Nmap did not, and at this point, since we were doing a head-to-head comparison, we went and manually verified that 10%. It turned out these were all services that we scan better. For example, SIP over TCP is very complicated, very funky on the internet; the Nmap scanner does well, but we have additional logic to account for a lot of these weird edge cases. So when I say we manually verified, I mean we really dug in, and the majority of the 10% were instances where we had simply written code that accounts for more of the weirdness of the internet. But then we found that Nmap found 4% that our scanner did not, and that was the place for growth and improvement, a place for us to continue growing as an internet-wide scanning engine.
But like I said, one of the things we realized in this study is that some of these hosts are just going up and down. So we had this real "now what" moment, because one of the difficulties is that Censys data is not an in-the-moment scan; there's going to be a little bit of lag, and there's a lot that happens on the internet. These hosts that are very ephemeral, online for one scan and then suddenly offline for another, we're not just going to churn them out immediately. We actually want to go double-check our work and make sure they're still there, or still not there, so that we're not doing all this churn for no reason. So as a team, we asked: what can we do to convey to the public that there are these instances where we saw a thing, and now it's no longer online, so we're doing some double-checking just to make sure? In other words, how do we convey the cases where we think a specific host is no longer present, but we're doing additional checks to make sure it's actually down, and not just a network blip somewhere along the many paths of the internet? As a result, we exposed this field, and you can see it right now; it's called pending_removal_since.
This is an example of how it shows up in the data, and it literally just says: hey, we think this host is no longer online, but we're doing our sanity checks; as of 2024-04-05 (plus a timestamp), we haven't seen a positive scan, so we're doing our due diligence.
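(Here's a sketch of how this flag can be used downstream, for example when re-scoring the earlier Nmap comparison, which I'll come back to in a second. The record layout is invented, not the actual Censys schema:)

```python
# Hypothetical mismatch records from the Censys-vs-Nmap comparison above.
censys_only = [
    {"ip": "203.0.113.9", "port": 5060,
     "pending_removal_since": "2024-04-05T00:00:00Z"},  # placeholder timestamp
    {"ip": "198.51.100.7", "port": 8080, "pending_removal_since": None},
]

# Hosts already marked pending removal are ones we suspect are down;
# excluding them re-scores the apparent false positives.
suspect_fps = [r for r in censys_only if r["pending_removal_since"] is None]
print(f"{len(censys_only)} raw mismatches, "
      f"{len(suspect_fps)} after excluding hosts marked for removal")
```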
The other interesting thing is that when we went back to our original results, where we compared the Nmap scanner to the Censys API, we checked how many of those hosts had pending_removal_since set, in other words, hosts we think are probably down. How does the false positive rate change? It drops from 13% to 5.5%. So in the majority of cases where we saw something that Nmap didn't, we had actually already received a negative scan; we just hadn't churned it out of the database yet. By exposing this flag, we're providing more transparency to everyone using the data. And if we also account for the protocols we know Censys scans better, the false positive rate drops even more. So this is an example of a very different experiment where I basically just kept digging, and as a result we re-examined what metrics of comparison we should actually be using for this pretty tricky question, and also how we expose these intricacies of timing externally. This project also prompted a totally different project, which I presented on yesterday, where we scan the internet every 45 minutes and are now analyzing those trends to try to better our scanning data. So, a lot of interesting stuff here.

With my last 10 to 15 minutes, I'm yet again going to switch topics and talk about a completely different internet measurement project. First we talked about measuring attacker behavior; then we talked about comparing ground truth for an internet-wide scanning engine; and now I'm going to talk about some work I did as a security researcher in the IT org at UCSD.
There, we wanted to improve our vulnerability notifications, but really we wanted to reduce the attack surface of an IT organization, specifically UCSD's. This was an interesting partnership, because I was in this role jointly while I was doing my PhD, and it was prompted because the IT org said, "we want to do better, we want to reduce our attack surface, but things are pretty abysmal, so what should we do?" And we said, "hey, can we run a measurement with you?" They were thrilled, so that's what we ended up doing with them.

For some background, as I've done with the other two projects: a lot of organizations have moved their infrastructure to the cloud, but there are a lot of legacy organizations, like a large academic institution, with physical machines on premise that are still maintained by a multitude of different admins. In an ideal world, you have people updating the machines constantly so that they don't have vulnerabilities exposed to the public. The reality is that these physical systems do have vulnerabilities, they are not patched at consistent rates, and they can affect the security posture of an organization, because all of a sudden you have all these different vulnerabilities exposed that an attacker can use to get into the network, get into the system, and wreak havoc. Patching is not a new problem; it actually has a really rich history, and yet it persists. There have been a lot of advances that have tried to make patching an easier process, and a lot of them are optimized for the machine. We wanted to take a very different approach: what if we fine-tuned the process for the human? What if we took the process and the current technologies that are employed, and examined, holistically, how to make this process easier for the people, not only for the machines?
This is a little bit of a different mentality than a lot of related work I had seen in the space, which is like, "oh yeah, just set up PagerDuty, do this, do this, do this." Well, what if you have an org of 30,000 people, and those 30,000 people are all running different experiments, they all have different hardware setups, and they all have their own admins who aren't talking to each other?

As with any good project, the first question we had to ask is: what isn't working so far? Like I said, we teamed up with the IT security team at UCSD, and the lead security engineer in charge of this endeavor had been sending out these emails, so I asked, "hey, what are the emails you're sending to these admins that no one is responding to?" So this is an example, with all the... oh, all the private information has not been redacted. Whoops, hang on. Well, I need to show you this email; that's awkward. Let me pull out the real one quick. Yeah, this is literally recorded, my bad. Sorry, folks, live editing; I had to redo these slides from a different slide deck, and some of the things did not get copied correctly. Okay.
Great. Do the thing... no.
Sorry about this, folks; I also found out I was giving this talk yesterday, which is why I had to alter some slides real fast. Okay, we're back on track. This is the email he had been sending out to the admins within the IT org; like I said, a lot of these people work in different groups, they're just all broadly under this IT umbrella. You can see a couple of things that immediately stood out to me in this email. The first is that it was basically just a laundry list: it would say, hey, here are your severity-4 and severity-5 vulnerabilities, here are the hosts they're on, you need to go log in to Qualys to figure out what the vulnerabilities are, go patch them, and patch them ASAP, thanks. Looking at this and thinking about the human in the loop: this email didn't list the vulnerabilities; it just said go figure it out, here's the number, good luck. It didn't list any additional details, and it required sysadmins to perform extra steps to get the necessary information. They had to log in to Qualys, and as we found out later, a lot of these people didn't have access to Qualys, so they'd click the link to log in, not have access, and just give up: "oh, I got the email again, guess I'm not going to do anything." At the end of the day, this added a lot of friction to execution, because if you've done any sort of sysadmin work, you know things are constantly on fire, and then the IT security team tells you, "hey, also you've got to go fix these 10 vulnerabilities, good luck, in a system you don't have access to." So we reviewed some related work and basically just employed some pretty basic behavioral science principles: instead of asking people to do all this additional work, what if we just laid it out for them as simply as possible?
So this is the new email we started sending out. Some things I want to point out: every email focused on only one family of vulnerability, so this one focuses on Windows. We say how to patch, with instructions for different versions and different distros where applicable. We link to various resource articles. And, which you can't see in this email, there's an attached CSV that lists out all the details from Qualys for them directly, so they didn't have to log in anywhere; instead they could see the CVE, the vuln name, the IP, etc., all this metadata.
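(As a rough sketch of what generating that kind of per-family notification could look like; the field names, CSV layout, and example finding are illustrative assumptions, not UCSD's actual format:)

```python
import csv, io

# Build a one-vuln-family notification: plain-language email body plus a CSV
# attachment with all the details, so recipients never have to log in anywhere.
def build_notification(family, findings):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["cve", "vuln_name", "ip", "severity"])
    writer.writeheader()
    writer.writerows(findings)
    body = (f"Subject: Action needed: {family} vulnerabilities on your hosts\n\n"
            "How to patch: <per-version / per-distro instructions go here>\n"
            "Reference articles: <links go here>\n\n"
            "All details are in the attached CSV -- no extra logins required.")
    return body, buf.getvalue()  # email text + CSV attachment contents

body, attachment = build_notification("Windows", [
    {"cve": "CVE-2021-34527", "vuln_name": "PrintNightmare",
     "ip": "10.0.0.5", "severity": 5},
])
print(body)
print(attachment)
```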
So we started sending these, and then the question was: did the patch rate change? Did anything happen? We created an automatic pipeline to analyze the data, to see, during the old email and during the new email, what the patch rate was for different contacts, different vuln families, and different times of the month. In aggregate, the patch rate increased from 3% to 78%, which was huge. People went from basically doing nothing to actually reading the emails and doing something. But of course, I work in measurement, so my question was: why is it only at 78%? Why is this not 100%? I'm giving you instructions that I painstakingly wrote out!
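(The core metric behind that pipeline is simple; here's a toy version of the per-group and per-family breakdown I'm about to describe. The records are invented:)

```python
from collections import defaultdict

# Patch rate sliced per contact group and per vulnerability family.
findings = [
    {"contact": "group-a", "family": "Chrome",       "patched": True},
    {"contact": "group-a", "family": "Linux distro", "patched": False},
    {"contact": "group-b", "family": "Chrome",       "patched": True},
]

def patch_rate(rows, key):
    buckets = defaultdict(lambda: [0, 0])        # key -> [patched, total]
    for row in rows:
        buckets[row[key]][0] += row["patched"]   # bool counts as 0 or 1
        buckets[row[key]][1] += 1
    return {k: patched / total for k, (patched, total) in buckets.items()}

print(patch_rate(findings, "contact"))  # the 100%-group vs. lower-group split
print(patch_rate(findings, "family"))   # Chrome vs. OS-level patching rates
```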
So we analyzed the data along a couple of different facets and found some interesting trends. The first is that some of the contacts were just much better at patching than others: some groups, some admins, were at 100%, and some were quite a bit lower. The average was 78%, but there were actually some pretty disparate distributions when we looked at the groups and contacts we were sending to. Second, certain vuln families just get patched more: patching something like Chrome is a lot easier than patching something like your Linux distro, and we saw that in the data, where isolated applications were getting patched at much faster rates than something like your operating system, which makes sense; it's more of a pain. And some vuln families just take more time to patch, which was the second part of that takeaway.

At this point we did something that was a little new for me: instead of just looking at data, we went and talked to people. We conducted semi-structured interviews with many of these sysadmins, anonymized, to add a qualitative view to the quantitative data. We had all these hypotheses, and instead we just went to them and asked why: why are you not patching this over that? How do you feel about these emails?
The reason I say semi-structured is that we had a list of questions, and if you've ever done academic research, you need to get your questions approved by this thing called the IRB, but we also learned a lot of fascinating tidbits when we just let people talk and listened to what they said. In particular, when we asked about the old email, we saw three main trends. The first is that the monotony of the old email made it really easy to ignore: that email looked the same every single week, every week it just said "go log in to Qualys and check what's vulnerable," and people straight up told us it was really easy to ignore, so they just started ignoring it. We also found out that many teams have exceptions: they're like, "oh, I'm told by my manager that I'm not supposed to update these four programs, so I just ignore it," and we're like, "oh, we were not aware of that as the security team, that's interesting." And then we also found that a lot of these notifications fall outside their patch cycles: many of these teams have set times when they're going to update specific applications and programs, so they'd see the email and go, "yeah, I'm going to get to Red Hat in two weeks; why would I listen to this email right now?"
Overall, we found really positive sentiment toward the new notification, but we also found room for improvement and better integration. Like I said, the fact that a lot of these teams were saying "yeah, I have exceptions, and I've been filling out this Google Form with all of my exceptions, don't you get the exceptions?" was news to the entire security team. That was a huge disconnect that has since been remediated, and one we wouldn't have found if we had just looked at the data; we needed to go talk to people to find out there was some random Google Form they were filling out that made them think they were exempt from patching. Yeah, I know, it was wild. So we not only improved the process, we uncovered systemic differences in infrastructure, we figured out what the right metrics are, and to my knowledge these emails are continuing to be sent out, and people are continuing to patch at UCSD. So that's the third project: in the process of trying to figure out how to reduce the attack surface of a pretty spread-out IT organization, we increased the patch rate significantly, and we also found these major discrepancies in systems and organizations that could then be remediated, which was a pretty huge finding for us. Oh, it's 3:38? Okay, great.
So let's just recap. I went over three very, very different projects that, in my opinion, all fall under the umbrella of internet measurement. In the first, we looked at attacker behavior because we wanted to figure out what defenses we could build. In the second, we looked at internet-wide scanning data and compared it to other scan data to figure out how we could get better. And in the third, we did a mixed-methods study to improve vulnerability notifications, and specifically vulnerability remediation, at a pretty spread-out IT organization. What I'm hoping you folks take away is that internet measurement is a tool for security research, but you can also employ it in your own organizations and your own situations. For example, one totally different area that I don't work in is hate and harassment: how can we measure hate and harassment on online platforms to better think about defenses to protect targeted users? I think about the user a lot; I think we're all people at the end of the day. Or the project I talked about yesterday: how ephemeral are hosts on the internet, to better understand how to alter our scanning to have even more up-to-date data for users? Or you could go in a totally different direction and ask what trends exist in leaked data, such that we can better understand how to help journalists democratize knowledge about what's happening in the world via that leaked data. So there are a lot of different ways you can think about measurement: quantitatively, qualitatively. And what I hope you've learned from this talk is that internet measurement is about quantifying and improving the many parts of the internet, including security, and it can be a tool for everyone. With that, I want to thank you folks for your time. I think we have some time for questions; you can also follow up with me after on any of these handles, and I'd be happy to chat. Thank you so much.
[Applause]
I'll just be loud. First question: what's it like being a badass all the time? Oh my god, stop. Christian and I used to work together; he's just trying to hype me up.
All right. Okay, second question. You kind of mentioned this earlier when you were talking about how measurement is relatively new in the cybersecurity space, and you talked about these other pillars: threat hunting, pen testing, all these other things. How much do you think we can learn from internet measurement, collecting and quantifying that data and using it to run experiments to gain better understanding? How do we take that out of internet measurement and bring it to threat hunting and other things, to be a lot more evidence-based? In my opinion, and I don't know if you share this, we make a lot of decisions about what we should do, what defenses we should put into place, and what products we should buy, and there's no evidence that they're efficacious, or that one product works better than another, outside of a vendor deck, which you can't trust. So how do we take a lot of what you've just shown us, objective measurement, good-quality data science, and apply it to the other parts of cybersecurity?

Yeah, thanks for the question. I think that's really hard to answer. The short answer is: by taking a step back and thinking more about holistic approaches. Often when I think about threat hunting or threat intel, for example, we're thinking about very specific use cases, or a very specific actor. One way to think about bringing internet measurement into that realm is: okay, what is the spread of this actor? Who are they affecting, how much are they affecting them, and what does that look like over time? So I think one way to start merging all these fields, because I agree, I think it's very useful, is thinking more holistically: instead of just this time, this moment, this specific actor, how do I generalize, how do I take a step back and get a more global, holistic view? Did that answer your question? Thanks.
This is sort of a generic question that comes to my mind whenever I'm looking at my log files: is it an issue for you guys where people view what you're doing as hostile? Where you hit honeypots, where you hit firewalls that start blocking you, where people try to hack back or complain to your ISP? Is that just a little noise that doesn't matter, or is it part of what you have to deal with when you're doing these measurements?

Just to make sure I understand correctly (you're a little muffled, I apologize): is the question how do we deal with honeypots on the internet, people scanning back, hacking back?

Correct, along with tools that will throw up drop rules in firewalls as soon as they see you scanning, and then 15 minutes later the rule goes away. How do you measure around that? Or, again, is it just so small that it doesn't matter?

Yeah, that's a great question. So I guess the overarching question is: how do we deal with all these facets of the internet that make internet scanning hard, like honeypots, firewalls, things going up and down, etc.? The base answer is
understanding what can go wrong and then trying to quantify it. In Censys we actually do label, to the best of our knowledge, honeypots, tarpits, etc., because we know those things exist. Then we can do more fingerprinting research to ask: what are the most popular honeypots, what do they look like on the internet, what do they look like in our data set, and how do we label that so other people aren't deceived as well? So that's one facet of your question. How do we deal with firewalls that go up and down? Honestly: constantly scanning, changing our scanning, looking for new things and comparing them to older parts of our data set, and then recognizing, "oh, this thing is actually really new, let's continue to dig in further." But at the base, it's having a deep understanding of the field itself, and this maybe goes to the other question too: understanding, in networking and on the internet, what are the things that could go wrong, what are the things that could exist, and how do you account for that in your measurements, your scanning, etc.? Does that answer your question? Yeah, thank you very much. Yeah, thank you.

Hi, great presentation, nice to meet you. I'd like to say that I follow the work being done at UCSD, mostly with the network telescope that they operate over there.
So I'd like to ask if there is an intersection between the UCSD telescope and the work that you perform currently. It's mostly curiosity: is there any other work coming out of the network telescope? Is there an intersection between the style of research you conduct, the measurements you conduct, the knowledge you've acquired, and this telescope at the university?

Yeah, I think they're very highly related; I consider a network telescope to be a measurement tool in and of itself. For folks who don't know, a network telescope is essentially a block of IP addresses that isn't sending out information; it's just receiving. It's essentially a box that receives information, to see what people are sending organically on the internet with no interference. Because if we send a scan, someone might scan us back, like, "what the hell are you doing on my network?" But this block, this network telescope, doesn't do anything. So yeah, I think they're very highly related, and it's essentially just another tool. If we think about Censys: Censys is active scanning, so we're sending out probes and seeing what information comes back. A network telescope is a little more passive: it's listening and seeing what's happening on the internet, and like you said, there are a lot of really interesting trends and research that can come out of it, because you're just organically listening to what's going on.

Yeah. Finally, I'd just like to share that I have what is becoming a cloud telescope: I'm reproducing the method used in a physically bound network telescope such as UCSD's, but deploying the same approach in AWS, to see how busy Censys is at scanning devices, and other internet scanners as well. It's quite interesting to see this shifting pattern of behavior, what's currently going on on the internet. Just sharing.
Yeah, thank you so much. I love the mask! What mask? Thank you.

So, I wanted to ask if you could share a little bit more about how you were configuring Nmap when you were doing your comparisons between the Censys scanner and the Nmap scanner. I think you mentioned that one of the areas where Censys was excelling was SIP, and, you know, SIP is... a lot. [hand waves] I was wondering if you were using the NSE SIP scripts as well and comparing against that.

Yeah, we ran a number of different tests. We did bare-bones Nmap, we used the NSE scripts, and we tried dropping anything with UDP, because UDP gets a little funky with filtered results. Even when we tried the NSE scripts, we still found that Censys did better.

Do you provide any feedback to Nmap about any of that, or are they considered a little too much of a competitor?

Oh, we haven't provided any feedback, but not for any reason besides time, to be quite honest. That's definitely something I could explore separately; there are just always things to be doing. Yeah, absolutely, thank you very much. Thank you.
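(For reference, a rough sketch of the kind of head-to-head Nmap runs described in this exchange. The exact flags, the NSE script choice, and the target are my assumptions, not the study's precise methodology, and nmap must be installed for this to run.)

```python
import subprocess

host = "198.51.100.4"  # placeholder target
runs = {
    "bare-bones":  ["nmap", "-p-", host],
    "nse-scripts": ["nmap", "-sV", "--script", "sip-methods", "-p", "5060", host],
    "tcp-only":    ["nmap", "-sT", "-p-", host],  # drop UDP, which gets funky
}
for name, cmd in runs.items():
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    print(name, out.count("/tcp"), "TCP port lines reported")
```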
Great presentation. I thought going through and re-creating the patching email was very cool: taking a different view on it to get people to respond and actually fix the patching. My follow-up question is: for the people who said they had exceptions previously, did you follow that down the risk side of things, to figure out how people got exceptions, and whether they were real exceptions, tying it back into, like, a risk-register-type system? Was that something you did?

I'm sorry, I'm having a hard time hearing what you're saying; it's just hard with the masks. No, I don't think I can hear the speaker either. Oh, okay. Sorry, got it. So, for the people who came back and said "I already had an exception," did you go down that path, back to something like a risk register? Because a lot of times we find people say that, and that's how we find out that our risk register process is broken: the approvals are broken, no one follows up, and the exceptions never expire. That was more my question: did you go further in that direction?
Yeah. So the question was: for the folks who said they had an exception, did we dig into that further? We did. After the semi-structured interviews, we followed up with them, like, "hey, what are you talking about?" That's when we found out: they were like, "oh, there's this Google Form that I thought was going to the security team," and we were like, "we don't own this; where did this come from?" That's when I handed it off to the security engineers, and they chased it down and clarified some things. This actually led to a bigger takeaway for UCSD IT, which is that because there are so many different organizations that each have their own people, there was a breakdown in communication, and some organizational changes were made to try to facilitate better communication. But no, we didn't have any risk register or anything. I think what had happened in that case is that someone who had been on the team years ago set that form up, because, you understand, there are exceptions to the rule, and it had just never been handed off correctly.

Using one process to better another process! Like, we took over USB access approvals recently, and now we're seeing all the breaks in it that we didn't know existed until we started seeing the requests. It always excites me to see one process making another one better. Very cool, thank you. Yeah, thank you for sharing that.

Hi, my name is Chris. Great talk, thank you for that. You might have covered this before I walked in (I walked in a couple of minutes late, sorry), but for those of us who are aspiring researchers and would love to get more into research, what are some pathways you could share on ways we could get in?
Yeah, absolutely, great question. One of the things that is super cool about the internet today is that there are a lot of accessible public data sets. Actually, that last question, "what trends exist in leaked data," was in part motivated because I went to a talk where, oh man, I'm blanking on his name, this really famous journalist who was at The Intercept, was like, "yeah, there are all these data sets online, but I don't necessarily have the data skills to go dig into them, so I pair up with researchers, they tell me all the cool, interesting stuff, and then I write about it." So one thing you could start to do is look at some of these public data sets, even public data sets that are used in tutorials, just to get an understanding of the tools and how they're used, and then go apply them to, say, leaked privacy data sets. As for other ways: I got into research by working with a research lab in an academic org; that was my path. But I think there are a lot of different ways, and those are just two of them.

Very cool, and I didn't even think about working with journalists: when you have a discovery, maybe sharing it and getting the word out.

Yeah, I would say, you know, there was a really interesting talk... man, I can't believe I forgot his name. Micah Lee, is that right? I don't know. Micah Lee just put out a book, Hacks, Leaks, and Revelations. Yes! That talks all about the data sets and how to work with them; that's the book, Micah Lee's book. This is where I realized, wow, there's actually a lot of (thank you for that) public information, public data sets now. But pairing up with someone who might not necessarily have that background but has the interesting questions, if you can bring the data science skills to the table and pair up together, I think that's also a really good way to get your feet wet.

Very cool, thank you for the name drop. Yeah, thank you for the reminder; there was a neuron working really hard to try to remember. I'm terrible with names. I just saw he's going to be doing a book signing at DEF CON. Oh, nice, so that's why it was in my head. Gotcha, a reason to go, yeah. I was going to say, I think we're at time. You folks feel free to come find me, and thank you so much for your time. Appreciate it.
[Applause]