BsidesLV 2024 - Ground Truth - Tuesday

BSides Las Vegas · 20248:35:44903 viewsPublished 2024-08Watch on YouTube ↗

Tags

CategoryTechnical

StyleTalk

Show transcript [en]

[Music]

I'm I'm just tring to give you [Music] something I'm just trying to give you something I do I'm just tring to give you something [Music] m [Music] a [Music] [Applause]

[Music]

[Music] [Music] I'm just TR to I do I'm just TR to give [Music] something I'm [Music] just I'm just trying to give you something [Music] m [Music] w

[Music]

[Music] [Music]

[Music]

[Music] [Applause]

all

[Music]

[Music] [Music]

a [Applause]

[Music]

[Music] a [Music] n [Music]

[Music]

[Music] [Music]

[Music] a [Music] [Applause] [Music]

[Music]

[Music] a

[Music]

[Music] a

[Music] [Applause] [Music]

[Music]

he [Music]

[Applause] [Music] he he he he [Music] [Applause] [Music] [Applause] [Music] a

[Music]

[Music] TR

[Music] hey hey hey [Applause] [Music]

hey hey hey hey hey hey [Applause] [Music] [Music] he

[Music]

[Music] [Applause] [Music]

[Music] [Music]

[Music] [Applause] [Music] he [Music]

[Music]

[Music] h

[Music]

[Music] [Applause] w a [Music] [Applause] [Music] I'm just I'm just try to give you [Music] something I'm just try to give you something I do you I'm just TR to give you something he [Music] [Applause] [Music] [Music] n [Music] [Music] I'm just trying to something I do you I'm just TR to give you [Music] something I'm just tring something I do I'm just trying to give you something [Music] m [Music]

[Music]

[Music] a [Music]

[Music]

[Music] [Applause]

[Music] [Music]

[Applause]

[Music]

[Music] a [Music]

[Music]

[Music] so what better to introduce Aldo with we remove passwords now what and I yes I do have questions for you afterwards Aldo take it away [Applause] all right let's do this uh can you hear me okay all right uh thanks everyone for being here thanks to the bid team and uh the passw team uh for having me here um for second year in a row and for second year in a row also I am the one opening distract which is very good uh I said this last year but I'm going to say it again because it is still true it's very good to be the first one because there is no pressure so even if if this talk

is terrible uh all the following talks are going to be amazing so U you're you're welcome to all of the following talkers so speakers so um my name is Aldo I am the application security lead for hyper uh I apologize because I have a lot of content so I may go a little bit faster so feel free to catch me afterwards if you have more questions or if you want to go deep on any of these topics I'm going to skip the agenda and I'm going to do a quick pasis recap so as most of you know uh pasis has been growing a lot in the recent years uh just a couple weeks ago Google announced

that they have passed the 1 million authentication Mark in just one year so over 1 billion authentication using pasit uh which is huge uh and as you can see more and more companies are using pasit every day and when it comes to pass Keys we have several options to store them uh the first one being security Keys which I'm sure a lot of you are familiar with uh and then we have platform authenticators platform authenticators work very similar to to security keys in a way that the private key used to uh authenticate you never leaves the device uh this is this is amazing to me because nobody has access to that private key besides me so they have they need

physical access in order to use that authenticator and lastly we have sync paskis they work almost exactly as as platform authenticators the only difference being that now somebody else is restoring those pasket for you uh this is great for adoption might not be great for everyone uh because some of those vendors may be protecting your pasis with a password so um you may be using these amazing Pas keys that are very secure but you are protecting them with a password which uh is not ideal and of course this has some resistance um if we go back to the first two options you cannot log in from everywhere that that is a fact so a lot

of us are traveling today and if we didn't bring our pasis I mean our I is we're done we cannot log in and that is a fact uh and very similar to this point uh if your device is stallen or you lost it or for whatever reason uh it's not available you're done you cannot log in but for the tier option and I'm sure I'm not the only one but some people don't want their pasis being stored by somebody else uh for the reasons that I mentioned like um this at the end of the day these are private keys and you are using private keys to uh sign a challenge so uh this again it's great for adoption

but not a lot of people are in favor of doing this and we also have developer resistance uh a few months ago I ran into this post uh which is very good I recommend you checking it out uh it's from a developer company and they mentioned that how implementing pasis is very hard uh it's a 100 times harder than they expect it to be uh they made some good points some not very good but uh they do have some valid things and one of the things that I that they mention is that for the end user perspective there's really no difference uh when you use a pass key and when you use Biometrics to unlock uh a username

and password so to them they are simply providing Biometrics and they are authenticated to a website so they to them it's transparent right so they don't have any incentive to move uh away from password to a pass key so if you have a chance please uh you can check this article is has some very good points so going back to the the original question did we actually remove passwords and I mean this in our company and uh well the the answer is yes we from day one we have been the passwords company and now we are the identity Assurance company but yeah we don't have any corporate passwords at all and as you can imagine this has some

um challenges when it comes to account recovery so how are you going to recover those accounts if you don't have a password if you're not using email to recover those accounts right uh and when we look at the traditional recovery options uh I'm sure most of you already know this but uh you can use SMS you can get an email you can get a one TP there are many of options you can use to recover your accounts one of them being uh calling it support and again as I'm I'm sure you know uh this could be abused by social engineering and it's it's very fun and all of these options have something in common right so they

are not very secure they could be uh or not as secure as we could hope so right so this is why now we have idb uh idb is kind of a new term but essentially means identity verification right so uh you know it's a process of identifying or making sure that the user is who they say they are and this may sound familiar like isn't it this what we have been doing for the last several decades uh but it actually isn't because we in the past we have been using just a password right so now we're actually verifying that the user is actually the user right and what is the main thing that we're trying to solve here well verifying the

user identity of course uh this could be for a number of things you know just creating a new account uh logging to a website getting a loan getting a credit card uh we need to verify your user right something really interesting to me I didn't know and I didn't think it was possible is uh using it for interviews funny thing uh we had a customer saying that they they had a job posting and some somebody applied for this job they went through all the interviews they got accepted uh this was person a and then on the first day of the job person B shows up and they had no way of knowing that they were different persons uh

several months passed and they were not aware that the person they hire was not the person that show up for work uh so that is a very good use case for uh identity verification making sure that the person that you hire is actually the one that is working for you and you know the the traditional uh ways of doing this uh like account recovery on boarding new users or new devices and if we do a a quick Google search uh we're going to learn that there are many many vendors that do identity verification this is recent I mean uh until recently there were not many vendors that did this um disclaimer uh hyper also does this and Hyper is not

even here so uh I'm sure there are even more vendors that can do identity verification for you and some examples of identity verification uh you know probably you have been asked to provide a selfie or a video submission or even an ID like a driver's license or even verification by human uh you know just having a video chat with someone and they can verify that is you uh I don't know who staying in this hotel but if you are and if you were asked to do online checkin you were actually uh asked to provide an ID so even now in this hotel we we were asked to provide to do some sort of identity verification

and now I'm going to give you an example so I'm I'm from Mexico in case that wasn't clear uh but uh uh we have this uh tax agency and is the equivalent of the IRS here right so when the pandemic hit they were trying to do things a little bit faster and they were uh they set up this website which was used to identify users and then um once you were identified you were able to do the thing that you wanted to do right uh so here's the thing they were asking you to they were send you a challenge and then and then you had to read that challenge back the funny thing about this challenge is

that it was coming usually from songs or poems so the end users were saying something that was very very funny and I'm going to give you a couple examples of this uh one of them being uh I don't know who loses more in this farewell so that that is so deep uh can can you imagine this like if you're trying to log into your email and then Google is asking you to record a video saying something like this in order to verify your identity that that's very bizarre and we have one more uh the last one that desire for nothing less than you that's that's so beautiful uh again this is so weird to me and and the worst part

is that this wasn't an automated process somebody was actually reviewing those videos and making the decision on whether that was you or not uh as you can imagine this good in scale and this actually didn't work so idb has some challenges of course uh the first one being pii storage what are you going to do with all of that like where are you going to store those videos selfies IDs what's going to happen with all of them uh we know that people love to store stuff in S3 on protected so who knows where where my stuff is going to end up right of course it's subject to fraud uh people trying to identify Prov prove

that they are somebody else you have to get a new credit card a new loan stuff like that but of course this has user friction users are being asked to do something else that they wanted to they are being requested to you know record themselves to provide more data and do things that they usually didn't have to do uh if we have time I'm going to go back to an example but for now I'm going to skip it and again just privacy uh when you are recording videos you are you know providing where you are to these companies like if I wanted to log into some of these website right now I would have to show them where I am like

all of this environment would have to be recorded and again that has a lot of concerns and of course we have to worry about AI uh there are some websites that you can create videos in so they can make you look like for instance right now somebody could take this video of me and then they can feed it to this Ai and then make it say something as if we as if we're me so idb has to be that challenge they have to make sure that they are not dealing with AI when they are doing some identity verification of course and in general I think a lot of companies are not taking users into consideration like uh again users really

well at least me I don't want people you know having videos of myself you know with the background like I don't want them to have access to my house I don't want them to have access to my job simply because they want to verify who I am right so that is I think that is something that companies are not taking into consideration uh uh and I have one example for you uh this is from a rer app think of it as as if if it were Uber so this is what the drivers are seeing um I want you to take a really good CL uh really good look at this picture because it's very subtle so you may miss

it uh what's what the issue here so again please take a good look at this picture and let me know if you if you see something wrong with this profile so um this is what was provided to the to the drivers uh if you take a look uh it says that the security verification is good valid photo has been provided and the credit card was verified so to the to the drivers this is just an amazing Rider right they have been fully verified or maybe it's just me but uh either the person looks exactly like that or or this is some sort of Fraud and they have very good rating too 493 that's that's a very good

rating and one more challenge uh there are some websites that are actually selling data so you can bypass these controls so this is not AI these are actual people that are selling their data so you can bypass these controls so if somebody has a site that is asking you to uh provide a selfie you can go ahead and purchase a selfie instead of providing your own and of course um face recognition doesn't always work as expected uh I was testing this website some time ago and uh they were uh they had an image of you and they were asking you to provide a selfie to basically match that you were the person that you were trying to

authenticate as uh and of course uh that didn't work and because clearly those those people are not the same uh but I thought that maybe I was using some pictures that were not as good you know they have bad quality so uh I try a few more and the application was always saying that the the person was the same then I triy using one of my pictures I say well I'm going to try mine maybe one that is not so blurry and as you can see I am an old person I was verified to be this same dude uh so I think what happened here is what the application was not doing negative testing right so

they wen't doing some matching and for some reason they were always saying that the person was verified when it wasn't so now I'm going to give you a quick example of what we're doing uh just so you know what you can be doing with idb so what we're doing internally is that we start the recovery Flow by asking the user to provide a valid uh username right so you need an username then we ask you to provide your phone number and then you get an SMS code so I know that I said that SMS are not secure but this is not used for authentication this is just this is just a data point right and then we ask for location check

uh you can decline it of course but we're just going to show at the end that you didn't do a a location check then we ask for your ID and then we do some facial recognition and if all of that weren't enough we also get you in touch with your manager so you can do a quick chat and make sure that uh the manager reviews all of this information and then they can ask you more things if they want they can say uh they can approve it and uh I think if somebody is able to actually pull this off like if an attacker can do this uh you earn it like have it you know you can have my juus if

you want like if you can actually bypass all of this yeah you did a great job so um this is just one option that how idb could be done and of course this is configurable you can remove things if you like or add more so what is the future of idb right uh I think it's safe to assume that it's safe to assume that idb is here to stay uh just as I mentioned when I was checking in I was asked to do some sort of identity verification and last year uh that wasn't the case and I think one of the drivers is just moving a lot of physical teams to the digital world and it's very convenient

because you know checking into the hotel took me less than one minute you know I just I just pick up my key and that was it so uh it's very convenient but it has a lot of challenges of course um but you have to do it uh in some sort of automated way uh as we saw in my previous example if you don't automate this um it's not going to scale it's not going to work and uh it's not going to be good and I don't think this is going to completely end social engineering but I think it's going to help uh again if we go back to this example uh if somebody can pull this off you know uh you earn

it go for it dude uh and lastly I have some main recommendations for you uh I I think the main recommendation would be to making idb part of your account recovery process uh you know it doesn't have to be as uh you don't have to ask all these many data points but at least some right so if you do this you're going to uh help reduce the account takeover uh percentages and you're going to make your account recovery process more successful one thing that I do want to mention is that doing idb is really hard as you have seen uh so if you're going to do this uh I recommend going with a partner and I'm not saying hyper you can

choose whatever partner you like but there are countless legal issues you know there are specific regulations uh for not just for pii but there are lots of rules and legal issues so if you're going to do this I think it's going to be best if you find one of the many partners that are there uh that already provid us and do this in very different ways and also do negative testing right not just the happy pad uh in the example that I show um on the face recognition part uh they were not checking the bad face recognition so always do negative testing and of course go password lless uh if you can uh doing going

passwordless is going to reduce your account take cover by almost 100% so it's always a good idea even with the challenges that I mentioned at the beginning uh I still have some time so I'm going to go back to my example just to uh let you know what I meant by the user friction there we go so um I have this Bank uh this credit card that uh requires me when I change phones it requires me to do identity verification again uh so it was the middle of the night I wanted to purchase something from AliExpress in the middle of the night uh I couldn't do it because my credit card was blocked because I had to

do an identity verification one one more time and that's something I'm not willing to do honestly like it's the middle of the night everybody's asleep I just want to buy a $2 phone phone case from China I I don't want to have to do this in the middle of the night right uh so I again I think banks and you know companies in general are not taking users into account uh they are adding a lot of friction so in order for this to be successful they have to do it uh more gracefully you know making sure that they are not adding friction to the users and I think that's it uh yes that's pretty much it uh thanks everyone

for your time uh I do believe we have time for questions so yeah if you have any let me know thank you [Applause] questions raise your

hand hi I was curious um how is this uh the facial ID and the the video with voice any of that stuff how's that with with the continued advancements in deep fake uh technology how is that save from those things right so um let me go back to that slide so you're talking about this step right here right so that is a very good challenge um idb companies have to be you know it's going to become a mouse and cut game so the idb compan is going to have to be they're doing what they call a liveness check to making sure that it's you and yeah I agree it's a challenge so idb companies need to stay

on top and need to be constantly evolving to identify those things that is a valid concern yep more questions uh do any idv companies like have like an audit Trail can you will they talk about the accuracy sensitivity and specificity of whatever set of these steps they put together um well I I cannot speak for other uh but we do so so yeah I I I cannot speak for other companies but yeah yes so just a quick question uh you mentioned about Pass Key storage and and uh you talked about a lot of the password managers um and how they're protecting that with a password which is kind of counterintuitive yep what would you say is a solution for secure

password storage or pass key storage excuse me um you know particularly when some of those vendors use like multiactor authentication and such I think that's going to depend on your on your use case so for me I use security Keys that's that's for me like I know how I am storing them I know where they are for me that is what I do so that's going to depend on how uh convenient and how secure you want them to be right so you have to put things into a balance so if you want to go with sync pasis and you want to be able to log in from everywhere uh you're going to have to accept that you know uh you're going to

have to accept that that can be accessed from somebody from some other location so right now that's that's the best thing that you have so you have to make that decision for yourself okay thank you Aldo and a round of applause for him of course thank you so I'm going to be next speaker uh combating phone spoofing with stir shaken a besides Las Vegas crowdsource status quow demo on explanation that's a talk I didn't plan on doing here because I assumed for some reason that security people here at bides knew about stir shaken which has been in uh into law uh in the US now for two years but it proved out to me that there are some

people who don't know about it so if you want to listen in I'll start at 11:00 thank you

for sentence is a fun me and a past sentence good one good one

yeah okay this one is open to [Laughter] discussion all the best for your talk

[Music]

[Music] [Applause] [Music]

[Music]

n [Music] [Music]

[Music] [Applause] [Music]

[Music]

[Music] [Applause] [Music] he [Applause] [Music]

[Applause] [Music] [Applause] [Music] he [Music] n [Music]

[Music] back [Music]

[Music]

[Music] track [Music] hey [Music] [Applause] [Music]

hey hey hey hey hey hey [Music]

hello again I'm P um and I need need to stay behind the microphone here uh so stir shaken uh this is something that I've been sort of tracking back home in Norway and Europe for the past couple of years I've done talks about this in in several uh conferences and uh Arenas in in Norway and other countries as well in Europe and I had absolutely no plans on doing this talk here in the US because I just made the assumption that since the US and Canada have stir shaken in place there would be other people perhaps that would be talking about it not me um just have the theory on on this stuff and the uh simple thing is

give me a call on this number if you uh if you can if you have a us or Canada uh number uh eventually I will call you back just give it a ring and HG up I'm not going to respond and when I call you you there's no need to answer uh as well but this is just for a little bit of crowdsourcing from me if you can in addition also uh text me or put a message on Twitter DM or anything like that with the you know three four last digits of your number um I would really like to know which carrier you are using and when I call you back you're going to look for a

small check mark in your call history um and tell me whether you can see that check mark or not so I spent like for I don't know uh eight nine years now looking a lot into Mobile hijacking as I call it um I do different differentiate between Port out and Sim swap attacks Sim swap to me is messing with your current subscription with your current cell carrier like getting a new SIM card or getting an extra SIM card so if I get a sim card with your name I will be able to get your phone calls and your text messages as an example a port out attack to me is calling your provider uh and calling a

new provider and say to the new provider that hey I am you and I want to move your current subscription over to the uh new uh carrier and get a SIM card from them so it's a little bit of a difference but in any case uh social engineering attack um and I started talking about this back in 2019 after doing uh quite a few uh years of research at one point I took out u i put out a tweet saying uh hey I'm trying to learn more about phone spoofing can somebody give me a call or a spoof Call uh so I can understand a little bit more how it works and less than 30 minutes later uh my phone was

calling and the number was plus 000000000000 lots of zeros and obviously it's a SP phone call I pick up the phone and I say hello this is per hello this is Vladimir calling from Moscow Russia and it was actually Vladimir calling from Moscow Russia because he's a friend of mine and we're not talking about Putin but not of Vladimir in Moscow go and he laughed and said ah this was really fun and it took me like 10 15 minutes to find a spoofing service that would allow me to call you using pretty much any number in the world and that's what you will see uh on your display so I already explain the port out attack uh Sim swap is well I can get

a new SIM for your current subscription I can get the twin SIM card you know some providers do have that I can also just get a data SIM card that doesn't make that much sense but it can be done some of the things that I've done back home in Norway is back in 2019 I let made a lot of fuss in Norwegian media about how easy it was to hijack somebody's uh phone number uh using simple social engineering and get a SIM card in their name as an example or moving this this subscription to a new carrier So based on that uh the Norwegian government came out with a new uh resolution which is still not part of

Norwegian law but basically they say uh this is a hearing from Norwegian government on September 3rd 2019 actions to prevent mobile hijacking and I'm still waiting for this to pass it basically says that before you allow to get a new subscription or change your subscription with your current carrier or move it to another carrier you have to provide proper ID you can't get a an anonymous SIM card in Norway as an example not possible you have to provide ID we we we want to know who you are before you get a phone number I've also done a little bit on voicemail hijacking which is something very completely different but if you able to spoof a phone call uh you can

get access to people's voicemail all the way back in August 2006 Norwegian press wrote about Paris Hilton and lindsy Lohan uh where the Uber hacker Paris Hilton was actually using a car that is I think it's either us or Canada called spoof card they still exist today and they will allow you spoofcard.com so there you go have fun uh and they allow you to make SPO phone calls in the US and Canada because of my work spoof card sto working in the majority of Europe you could no longer make spoof phone calls to Europe so that was me sorry guys um and what she did back then in 200 six she used uh spru card to call

into lindsy Lohan's voicemail because she was spoofing her phone number and doing that she was able to listen to All voicemail messages that were left for lindsy Lohan and she was also able to change the welcome message as an example I did the same thing in Norway in Sweden and Denmark I proved that by spoofing phone numbers I could get access to almost 7 million voicemail boxes with three different carriers in Norway Sweden and Denmark back in 201 2019 now most people over there don't use voicemail but matter of fact if you get a phone subscription in Norway Sweden and Denmark you always get voicemail as an included service in fact you can't even tell a kak I don't want voicemail

but you can turn it off but of course default it's turn on which is crazy now back to stir shaking again what is Stir shaking well you may know it in a way as caller ID in the US um and it's pki public key uh infrastructure for uh phone calls this is you you have Ryon as your operator you make a phone call to me I'm using AT&T uh your carrier which supports stir shaken will you have an authentication Service they will add a digital signature to your phone call going to me my provider also has a verification service they will check a certificate repository and basically my phone running iOS or Android in at least some

of the newer versions have integrated support for verifying the incoming call and the added digital signature that to that uh phone call so it's pki for phone calls this is pretty cool stir shaken provides three level of of attestation of calls you have the full attestation the service provider has authenticated calling party and they are authorized to use the calling number an example of this case is a subscriber register with originating telephone service provider soft switch you have second highest level partial at the station the service provider has authenticated call origination but cannot verify the call source is authored to use the calling number an example of this use case is a telephone number behind an Enterprise

PBX and the lowest level is Gateway station the service provider has authenticated from where it received the call but cannot authenticate the call Source an example of this case would be a call received from an international Gateway now let's go over to the marketing stuff because again we don't have this in Norway we don't have this in Europe yet I've been advocating for years that we should do as you do in the US and Canada I get back to the timeline here a little bit later but this is the marketing stuff that I have found from websites from T-Mobile Ryon uh the US FCC as an example and they say that this is a sort of like an iPhone display and

somebody's calling you uh you see the number and it says uh it's coming from Atlanta but this could be spoofed now adding stir shaken you would be able to see a very ified symbol through so that is your phone authenticating verifying the uh um the uh um the um digital key that's been added to the phone call uh the signature and you will see that yep the verified symbol means this is actually the number calling you it's not a spoofed phone call of course it could be a scammer or telemarketing company but that is the number being used and this implementing this is a cost to any carrier I have no idea how much it costs to deploy this but it's a

pki pki infrastructure that will add digital signatures to every single phone call so yeah it's going to be more than $100 for sure who's paying for this well in the end it's going to be you and me right that's how it is but more interesting is that they also say and this is from the marketing stuff so I cannot guarantee this is actually how it works but they also say in the marketing materials that once you have stir stir shaken in place you can also add what's called Rich call data so you can actually integrate because phone calls today are mostly voice of Ip it's internet traffic IP packets so you can actually add text and

a graphic logo and even a small text that you should be able to see on the call screen so if if it's your bank calling it can say you know you could show the logo of your bank and say customer servic is calling or it's a hospital or the doctor or whatever else and this is the point where I've been telling people in Europe that well and for the telecom companies for sure that hey give me this and you can add this and you can charge money for businesses and government organization saying that if you want to add the additional level of trust to put in your logo and a text displaying why are you calling your

customers you can do that and marketing material sets says you need to have stir shaking in place before you are sort of allowed to add this stuff as well so if you go into your call history on your Android or iPhone we are basically looking for a very small check mark now I was surprised because I have an iPhone 15 Pro I it's completely updated uh I now have the uh Us number which people keep calling me all the all the time I see on my display and I don't see the check mark from anyone calling me be using Google Ryon at and so on I've been calling people back and some people have responded to me yes I see that from your

number I see a small check mark in the call history but on the call screen it doesn't show show anything that could assist you in making a better informed choice if this is a SPO phone call or not in January 2018 Canada said that they expected implementation of stir shaken by March 31st 2019 got delayed a little bit uh and post deploy report by May uh 31 uh 2022 and in December 2019 you had the traced act here in the US uh the C said that this was approved by uh March in 2020 big providers in in the US needed to have this implemented by June 30 2021 and small providers in the US needed to

have this implemented by June 30 2022 and on June 30th 2021 t- T-Mobile USA announced that they were a 100% compliant with the their St shaken implementation and during all all these years you know from 2018 to 2021 2022 I did see Norwegian Telecom providers and Norwegian government talk about this from time to time but they never did anything they didn't even contact anyone in the US to ask what's the cost what's the time frame to implement this it was just like yeah that's a us thing they have problems we don't so we don't care but I found as an example in Europe uh a nice little U EU report uh several hundred pages on page 34 it says it is

unlikely that all operators in Europe will introduce systems to counteract CLI spoofing uh so that's you know call spoofing on their own initiative without regulatory intervention in that sense the situation is similar to that in the USA where operators only introduce stir shaken on a large scale after implementation of corresponding legislation it is likely that all European operators wishing to terminate calls where both the call party number and the calling party number are us numbers will in due course have to implement St shaken clearly this technology has the first mover Advantage so again I keep telling people tels in Norway and in Europe you should do this there are problems with Spam calls proof calls in the US is a very small problem

in in Norway but the problem is going to come to Norway as well if we don't do anything and the answer I'm getting is yeah we'll deal with that when it comes and I'm like okay I'm not giving up and funny enough we also have a law for Telo Telco providers in Norway and they have this excellent sentence in Norwegian uh translated into English saying service providers telecommunications must as far as technically possible and financially reasonable block phone calls for for anyone trying to use an a number which they do not have the right to use we have three Telecom providers in Norway with physical infrastructure I've talked to all three and they say this is

a bloody difficult Market to operate in we don't make any money from providing sell coverage in Norway so you want they shaken now we're not even going to look into the price of it because we are not all you know at this moment we don't make any money at all more or less selling uh mobile phones subscriptions in Norway so you know go away and I'm like yeah let's see about that still working on it so to summarize a little bit on this what can you do well tell people because I don't know how the situation is in the US or in Canada but in in the rest of the world at least my experience people don't know

that the number they see on the screen when somebody is calling them can be spoofed which is a little bit crazy to me uh I do recommend people to enroll for free in the Google protection uh Advanced Protection Program um you can enable the lockdown mode on your iPhone and of course the very simple trick off if somebody is calling you asking you for your credit card details or social security number whatever it is and you think it might be a proof called just make the very simple question can I call you back on which number because if somebody is using a spru phone call it will be difficult or impossible to sort of call them back on the same

number and also if you know anything about GSM networks and stuff there are options on Android on some Android phones to disable 2G support um it's a setting available Android uh 13 and 14 as a minimum and when you turn on lockdown mode on iPhone you can also disable 2G because 2G being very old doesn't have Mutual authentication so setting up a fake base station and making your phone connect to my base station and then I can e strop on you and I can send you text messages as many as I want for free is I'm going to say incredibly easy but it's easy enough for most people to figure out if you just study a couple of YouTube

videos more or less and going back to what can you your business your organization do uh about this well I would really recommend you to tell your customers uh about you know your official channels if people suspect any kind of spam fraud or something being done towards them uh that looks like it comes from your organization you should have as I say at least you should have a web page saying our official channels or just official channels these are the channels our company is using on Tik Tok Snapchat YouTube Twitter masteron and so on for official information and if you are receiving any text messages calls emails from any other domains on numbers and so on it's

fake it's fraud and you should report it to us at this number this email as an example uh you should also talk to SMS providers that you might be using for sending out text messages uh and ask them hey do you have any kind of Protections in place so that nobody else are able to use our name as the sending name of a text message or the number in Norway right now you can go to any SMS provider sign up for an account pretty much for free and start sending out text messages and you can set the sender name or number to be anyone you like do you see a problem with that I do and they

are starting to see the problem appear in Norway now I don't know situation in us while guess I think is worse here SPO text messages talk to Telecom providers and talk to your government about obtaining insights from Telecom operators on detection of fraudulent calls and SMS because statistics are very very useful now to my big surprise I arrived here on Thursday to my really big surprise I've been talking to a lot of people already and have you uh you know ask you to call me and I thought that since St shicken has been mandatory for all Telco providers in the US since July 1st 2022 I will see I would see lots of check marks I'm not seeing any check

marks at all in my call history and when I've been calling people back some people have been telling me yes I do see the check mark but that's like one out of three one out of four that actually sees that check mark I don't know why but please do me a favor please call your provider send a message to your uh provider and ask them do you support St shaken why am I not seeing it in my call history why I'm not seeing this on the calling screen because I'm supposed to best regards the FCC part of the US government thank you and in the last case if that doesn't work go and vote in November and tell

your government to do that [Applause] you can find me on Signal here's my phone number I'm on LinkedIn and again let's give me a call if you haven't done so already I will call you back just a ring or too room number yeah have you seen this [Music] compliant you're

not

Ena so you cans to the hotel they called me and they have the check mark they have the check mark in place that's excellent that's excellent yeah yeah so next talk in 9 minutes uh Cecilia vion picking a fight with the banks uh that's also going to be very interesting uh in a way I would recommend especially women to listen to this next up coming talk but it's relevant to absolutely everyone see you back in 9 minutes get

[Music]

[Music] [Applause] [Music]

[Music] [Music]

[Music] o [Music] w [Music] w

[Music]

[Music] [Music] I'm just I'm just dring [Music] something I'm just tring give [Music] something I'm just trying to give you something [Music] m m [Music]

[Music]

[Music] w a [Music]

[Music] time ends at quarter after um and and it's an easy quarter after so there's there's lunch afterwards you're not pressing up against someone you guys want to have a hard 40 a hard 15 or do you want to like have your talk in and then have some spr how much I want to give you Q cards yeah this that's nice so I think that we I timed it yesterday was it was pretty good at 45 minutes cuz we're going to do a demo in the end for like 5 10 minutes and and the demo is really flexible so we can like adapt that but I my point is that we do the demo and then maybe we end at like 10 15

past and then everyone has question can stay we can just stay for as long as people want to stay I'll show you the Q cards but don't but don't think of them hard this time okay that's super nice yeah appreciate that all right cool thank you so much I'm going to put this here with you so remember

it yeah this nice to do before launch so if people have questions they can

talk yeah but that's good you have it yeah yeah or maybe like I'll tell you something awesome thank you very important thing if you're a r you like to walk yeah you need to you need to walk from this side over okay there's a pole okay so I don't walk to the left yes interesting um okay good let me uh and introduce you guys to go um so do you m check to say hello hello hello hello hello hello hello hello just say hello hello oh there's not connect is it on is it on me see is on hello hello okay good okay all right let's get this uh start everything good fantastic okay um

so good morning and welcome to bsides Las Vegas uh we are doing devising and detecting spear fishing using data I should use a mic shouldn't I would you mind you got me thank you uh we're doing a talk on um devising and detecting spear fishing using data scraping large language models and personalized spam filters this talk is given by Fred and Simon lurman so I'll go ahead and kick it off oh actually I have a few things I have a few things to say very important we'd like to thank our sponsors especially our Diamond sponsors and also these talks are being streamed live except for certain ones and as a courtesy to our speakers and audience we

ask that you check to make sure your cell phones are silent so give a little check we're good and let's get on with a beautiful talk thank you so [Applause] much perfect thank you everyone I'm just going to start my timer here so I hold you in a good time and yeah let's get started so again this is about a enhan fishing thank you so much for for the introduction my name is Fred heing and this is Simon I'm a research fellow in computer science over at Harvard focusing on a wide range of red teaming uh both iot and embedded devices and social engineering such as fishing last year I was presenting another study on

AI fishing at blackhead talking about how large language models can automate fishing emails and this is a continuation of that work as we will see uh a little bit more about Simon just to mention that as well he's done AI researchers focusing on a wide range of AI risk topics and uh lately a lot a lot of his work is on spear fishing we also receive um received counseling and mentoring from Aaron wisha who's in the back of the room here and Bru schne over at Harwood so I want to give a big thanks for them as well for being part of the project and yeah if you have any questions you you may stop me just Jing

the presentation or there will be all the time we want afterwards as well so this priming you with a few questions that we really interested in that overall guiding our and my research that I do that I think is is really good to think about and one of the thing is how can we use AI to yield more benefit to Defenders rather than attackers and this is actually really interesting a couple of months ago I wrote an article in the Harvard Business review about how AI yields way more benefit to fishing attackers in the context of fishing the reason for that is that you know there's a lot of different Technologies that empowers things and AI really empowers

things it makes it very easy to create good scalable high quality fishing emails but from the defender sides we can use AI to increase you know spam filters and so forth but in the context of large language models and AI Technique we already have a lot of spam filters so the increase is is only small whereas for the attackers the increase is very very big so this is a fundamental question in my research how can we change this game to make sure that AI yields more benefit to Defenders than to attackers again in other technical domains you can really use it for both sides attacking and defending because in the in technical defense you can just add and update and Patch all

the systems but we can't patch humans that's why this is always a nuisance and it's always a problem I'm also very interested in how to quantify the risk of fishing I work a little bit with the business and poly schools as well as well over at Harward to try to quantify how much should organizations pay for fishing defense is it generally underfunded or overfunded because these are quite important questions and a lot of people have a lot of opinions about fishing but it's quite difficult to get actual dollar values because what is the worth and what is the risks of fishing attacks and one thing we're also very interested in and happy to talk more

about this just what is the purpose of academic fishing research a majority of the fishing research are being done by anti- fishing provider I think that's a little bit biased because these people often want to sort of scale up the numbers and say that you should pay a lot for fishing defense and that do does make sense often but academic fishing research is quite important it's Al difficult because you need to pass a lot of Ethics revs which takes a lot of time and makes it quite impossible to scale it up big so so this is uh this is something I'm also very happy to talk about anyway uh more down to the actual meat of the study uh again last year

what we did was just sort of the fundament for this and for my future work is we compared human fishing experts with large language mod just to see that which are best and we learned a bunch of things the the big takeaway is that language modes aren't yet this is 2023 language mods aren't yet as good as human experts if you combine them if you combine the language mode queries with some human expert input they get really really good so basically what we did was we automated email creation what we're doing this year is we're taking it further and automating the entire fishing campaign attack chain so that is finding participants finding information about the participants creating emails

sending the emails then self-learning and improving the tool based on which people press the link and which people didn't press a link so it really is one step further in this chain and we're implementing uh implementing it in a bunch of places we're talking with Harvard's it Department to test this on all all Harvard's faculty this is pretty important for us so if anyone has a company or someplace where where we can Implement our research we're very happy to collaborate because we want to get as much data as possible to scale this up pretty big um just a quick shout out as well to these artificial intelligence things that everyone is talking about it's a good fasis for this talk so large

language models is an AI technique and they're often instantiated in chatbots and these are a bunch of different brands of chatbots so just I think everyone knows this but it's good that you have this knowledge the icons down there are the ones that we use and it's for open ey anthropic mistol Google and Facebook it's not super important but it's good to know this difference chatbots language models and AI so that was a quick introduction uh now I'm going to dive into the real study I'm going to talk a lot about how we automate the entire chain of fishing emails I'm going to show you some mitigation plans for how we can improve and solve this to try to make AI yield

Defenders benefit I'm going to show a quick cost benefit analysis and then a demonstration of our tool let's see if you can see this image here it's relatively good I think most people can see this this is just an overview and then I'm going to dive into every part but so what does this tool do it's a python based tool for whatever that's work this used iterative qu iterative searching through uh through Google we will probably uh expand it to other use other uh search engines too in the future I talked with another person about that yesterday but anyways we search for example Frederick heing at Harvard so my name and one keyword doesn't need much at all and then we

feed it to any language model of our choice we often use GPT different models have different safeguards but we can we can try anyone we just added a local model as well that we can use and then it searches we use Google uh it usually uses two to five searches but it can search for 10 pages 20 Pages really as much information it is and the tool is quite smart IT learns itself when to saturate so it usually goes in and finds things such as personal websites social media LinkedIn is a great choice right company websites all these different profiles it understands how much information does it want before it lets go and again these API queries cost it's

a very small cost but you have to pay a little bit for this so if you want to scale this up to millions of people you can choose how granular information you want and how much information you collect but it usually creates a really good picture of the person after this we create a synthetic attacker profile so we scrape the target for example me then based on the target's information maybe I'm a computer science student then we then the tool itself creates an attacker profile that can be a synthetic Professor who offers me a research internship gig or anything like that so based on the based on the attacking Target profile we match it with a sender

that it could be real but it often times fake and then we create the email and the email uses all this information a bunch of best practices and it's usually really really good as we all see just want to give you a high level overview of how that process looks uh so to dive into this a little bit deeper uh the face one is about collecting participants in information the tool can do this automatically which is quite cool for example it could find out the know all the employees of a certain group at Walmart or whatever company you want again in the context of Academia we can't really do that because that wouldn't be fully ethical so we recruit

participants manually uh but this could be automated and of course we hypothesize that foreign na states do automate this and work quite aggressively with this right now and but we click participants we did that first pilot that was rather small and we're going to scale this up much bigger uh to see to see how this works in a in a real large context and we compare how long does it take to manually find this information to be able to isolate that cost variable in terms of time how much would how much time would I need to scrape all of yours information would be quite a task only the people in this room uh and then we compare the tool

which is of course automatically and and we see which is best in these cases again we find information such as field of work that's usually the best one but also collaborators which people do do this person talk to walk with work with Etc what interest you have extracurricular activities and based on all this information we create fishing emails and we do this by using language mod queries so the very naive ways to tell a language model create a fishing email to Frederick who's a computer science student um we have to be a little bit smarter as that as we'll see so the queries is really really important of main part of this study is actually to create really good

llm queries to ensure the the output emails are high quality here again we compare AI tool versus human expert this is really interesting we did this last year in 2023 we do this again now so we basically take all the emails from the AI tool then we analyze the m and see how good is this email how much would I like to change and already in one year we want to change much less the emails are close to perfect and you in the years to come it's quite easy to imagine that they will very soon be as so good so we never want to change anything and then it's interesting to analyze what happens next so the first Benchmark and

the only real Benchmark we have now is when model surpasses humans in the context of deception but when you move beyond that how deceptive can they be there's really no upper limit for that deception and there's no good Benchmark for measuring that so we talk with a lot of the AI companies and AI labs to create that kind of a benchmark for AI deception as well and so for these LM queries uh they are based on information for the Target we feed this in in a general way uh I'm going to show you a version of this how it can look at the demonstration but they're really long there are hundreds of words so I can't

really fit them in the slide but we have to tell the models things such as don't use more than 100 words because the models have a tendency to digress they talk a lot and when they talk a lot they add things such as I hope this email finds you well or I hope this will be a fruitful Endeavor and these things kind of give it away so we have to cap the model and make it not say these words we also have the bypass security uh I'm going to talk a little bit more about that later but it's quite easy to do uh you can tell most of these mod so create a fishing email then it's going to say

it's illegal so you have to say create a marketing email but this is kind of well researched it's really easy to bypass the llm guard rails and we can even create local models as we will talk more about later but you have to be a little bit smart about it you you have to bypass it and think about that but we do that and that works rather well and again we compare different models to see which one is the best but this is sort of the main philosophy and when we create these queries we don't just use the E the information from the person we also use fishing best practices to make sure that they're really persuasive and

one of the one of these fishing best practices is the VRI it's a really good book written by Aaron who's in this room and the main things we look about here is credibility and relevance so we really hone in on the query to Ure that it's credible and that basically means that the email looks legitimate you I see this as two Gatekeepers the first gatekeeper is that the email has to be credible if it's credible there things like there's good English there's no spelling mistakes no there's might be a logo there might be a sender that you know I'm used to receive email from so it looks really legit uh from bsides it might be that it fits all of the sort of

bsides categories so this really looks like a bides email and if it would be a bides email to me it's also relevant because I am going to bides if there someone saying hey you need to upload your slides today press this link that's pretty relevant because I need to do that but if you sends a bides email to my brother it's not relevant he would never do it because it's not here so it has to be credible and has to be relevant there's also a third but we primarily think of the first two here so these are the overall things to create a really good fishing email and when we have that we move on to what I call the

CDD guid L and these are sort of very old very established best practices for persuading people uh there are these they're way older than these you know back in the old Greek they even had these persuasive things how do you make someone do something and what's really cool about AI our AI tool is that we add these persuasion techniques to the emails and we randomly assign them so for example to some email we might say use social peer pressure that basically means say hey all your friends are going to this talk you should also go and for another email we might tell the language model to use Authority which can be saying hey your professor is telling you

to do this or the police or the tax government is telling you to do this so we have some authority figure and these are not necessarily Better or Worse they're just different so everyone is susceptible to different influence categories and we don't always know which we are susceptible to it can also change over time maybe today I'm quite susceptible to scarcity and tomorrow I'll be susceptible to social peer pressure what's cool when we do this over a long long time and thousands of participants we're going to see patterns that's where it gets really interesting and we're going to talk about that when we talk about the personalized spam filters which is a way to mitigate AI

fishing we can see that all the salese on Mondays are really susceptible to social proof but all the computer programmers on Tuesdays are really susceptible to Authority maybe that doesn't make sense but we're going to see patterns like that and that's super exciting because what we can do then is we can just add Flags in the email inboxes of people because people are super stressed we don't want to have overly aggressive spam filters because you need to get your emails but then we can see that hey fredi you're always falling for social peer pressure emails then we just flag the email saying hey this is a social peer pressure email you might want to press it it might not be

bad right because for example I get a these marketing emails from the Red Cross for blood donations I want to press them because I want to do it but it's good to be aware of how are these people Fishers and marketing peoples and colleagues how are they using these techniques to persuade me most people just use them subconsciously but we always use them so making people aware of this is really interesting and we're going to find it out with our tool this is just a very quick codify version of how much Human Assistance emails need these numbers shouldn't be taken as gospel but are pretty cool so we just made a quick number scream where

five is pretty bad and zero is very good and we basically says how much do we want to change this email so last year we did it manually so we changed a bunch of the emails and then we saw how much better they were and as we saw in the results they were much better we look for some simple things right is this email credible if it's not credible and relevant then we change it and if it's not credible and relevant the email is really bad then it goes down right maybe it's credible but it's not relevant that was often the case with the language models last year or maybe the email it's pretty credible it's pretty relevant but

we want to tweak some some minor words or phrases and what we see last year the average change of 3.5 out of five so that means somewhere in between minor contextual change and change credibility relevancy so it was quite a lot like they weren't super independent the emails from the language models this year the average change that we want to do is 1.5 out of five which means somewhere between minor language changes and minor structural changes and we really needed to think right so when we looked at the emails that our tool created some of them was a zero like we couldn't possibly change it I don't know how to make this better some of them we

did some small tweaks but it's very easy to hypothesize that when we do this last next year it's going to be a z of five and again what happens in 2030 like the only Benchmark we have right now is to bypass human deceptiveness like that's not the end goal right that's not as good as it can get so that's pretty cool we don't know how deceptive these models can be and I find that very fascinating it's just an example uh these are somewhat old so they're not the newest but we're going to show more examples later but it can be fun if you read the slides later you can read them as well this is a pretty good email you

know the the model creates this random cyber ethics lab which is pretty close to ethics lab that we have at Harvard but this is just synthetic so hey Fredick we at Cyber ethics lab a hardware based group are working on a project related to ethical hacking and its potential impacts it's exactly what I do before thought you might find it intriguing we're looking to collaborate with individuals who share our interest in this domain check out our product details here could we discuss potential opportuni to collaborate and this is good it's a really good email right and if you do this on like a mass scale to 1 million people which you can do it's almost free like the APA queres are

really big so we could press play right now and send this out to 1 million people I I'm pretty sure we would get a lot of them right and that could cause like again this is an election year in the US there's a lot of stuff we don't want people to do and say this could CA quite some nuisance and we have to realize that all the cyber crime groups on the world of course are going to work with this and of course they're going to use it so emails are getting pretty pretty funky here's another example I'm not going to dwell on it but it's going to be in the slid so if you want to check

it out it can be cool to look at it's pretty similar a little bit shorter so that when is send emails out this is some classic best practices that we use uh we usually send them out in in the context of 10 if you scale this up to millions of people you have to be a little bit smart right you have to see are you sending this do you have to send them all out in one day you there's a lot of there's a lot of good span filters right you can't just send one million emails out so as you can there a lot of different tools for this but as you scale this up you have to work a

little bit more with this and we have to avoid spam filters we worked quite a lot about that different ways of finding legitimacy for the emails one way is just having domains that you had for a long time that adds legitimacy but there's a lot of best practices we usually follow all of them and one fun thing is that you also need to add some smart sort of bonus features one of them is that a lot of emails have a tendency to create previews these popup previews can really give a fishing email away so here's little example of how the link says you know how well would you rate this fishing attempt and I'm going to

talk more about that later but if someone presses a link in our study then they're taking the survey where we just ask them a couple of question about the fish now obviously we don't want that preview to be shown because then it's very obvious that this a fishing email uh so we needed to add some tweaks to the email to the email client and the tool that we use to send these that's pretty easy to do and one sort of a meta investigation that we do when we create this tool to use do this automation is that we see like now we write it manually right we code it ourselves but could AI tool create the AI tool right

could it create itself and I've done some other research product on this other people do it as well AI coding is getting really good it couldn't really do this now but in a couple of years it didn't even need us to create the tool right the AI tool could create the AI tool and that's pretty cool then it would find but it's a little bit tricky because you need to find out these things like avoid the email popups yada y but pretty soon it's going to be even more autonomous last year we did mail Shimp I just want to say that because it's way easier when you send email with mail Shimp you just like get a lot of

credibility MailChimp have some good feature for this when we build our own tool you need to be a little bit trickier but obviously when there a AI autonomous fishing Bots they're going to use their own tool so we had to investigate this anyhow this is just a throwback to the last year results I'm not going to dwell on them so long but the big takeaway is that the LM LMS were quite a bit worse than the human expert models if you combine the human expert models and the llm then the results were really really good but the average of the the human experts were bests and that's to be expected again last year the language

models that were big such as the GPT 3 and 4 they weren't even one year old they were rather new uh this year it's getting way and way better so this is a throwback what we're seeing this year in our pilot is a really high result uh 70% is quite insane from the AI automated tool and we're going to scale this up really big so we'll see how this plays out we know thousands and 10 thousands of participants I'm really excited I think it's going to hold up um but we'll see so we track when they press a link and we also collect free text answers and this is a method of direct data collection uh it's also based from the

weakest link they have some really good structures for how to measure fishing success in that book and by doing this we can see not just whether the person pressed the link or not but what they fall and this is really interesting because we see things such as someone saying I thought these emails was legitimate because it was a gift card from Starbucks and I usually go to Starbucks and other people say hey I didn't press this link because it was a gift bu gift card from Starbucks and I hate free giveaways I always think that's false and that's pretty cool right because we see that there is no one siiz fit all answer we have to be

personalized and it's really interesting to learn that you know what one person think is super legitimate another person think is very suspicious I think that's quite interesting and again we use these things we have the target profile the center profile and the email persuasion type we analyze how all of these play together and how all the uh these add up to the final picture which is quite cool uh one word on Open Access models because this is rather important a lot of the protected models or the the industrial models chat upt Claude Google have their Mist have there and so forth they have safety guard rails and we can bypass them as we see now but it's possible to imagine that if

you scale this up to 1 million people the apis might do some checks right they might see that hey like this person is doing 1 million email there's not a marketing institute there's some weird you know basement in Boston or whatever that that they could track that and we don't really want that you also have to upload your credit card it's not really Anonymous it's kind of beneficial from the hacker point of view if you can do this locally and anonymously and it's quite easy to do uh there's a lot of ways Simon did some other studies of how to jailbreak these models and there other research on that as well it's quite easy to jailbreak you get it to

your local computer you don't need to use any of the you don't need to use any of the existing models like subscribing for the API or having your information with them so that's bad right and then you can also say create a fishing email you can create whatever you want because there no safety checks whatsoever so you can get this local to your computer uh there's some cool research for how to do this and that's pretty bad like we don't want that to be the case but I think it's pretty hard to remove that but it's not my field but there's some really good other research on how to make sure it's impossible to jailbreak models and

create Open Access models but it's possible and that's something we have to be aware of because I assume that all the Cyber criminals will use this so a few words on how to uh mitigate Ai and what we can do to prot protect ourselves from this so I'm very excited about these personal personalized spam filters as I mentioned a few times already um again we analyze what users are susceptible to what types of emails and there are a lot of sing ways they've been trying to do this uh I think they didn't really get it for a lot of reasons and because it's tricky right there's so many variables here like if I press a fishing email

maybe the email was good maybe I was just stressed maybe it was a fluke who knows and in this sense I think we're also moving away from fishing a little bit we already see there's a lot of deep fakes there's voice fishing there's all these types of different versions right and when the language models uses new types of deception in the future I think we can't really think about fishing emails as the only persuasive way to make people do stuff that's obviously not the case now even so it's really interesting to analyze how do people fall for this and to try to see these Trends because that's something we all I think need to be aware of like if you

ask people at the big AI Labs all of them know that there's no good way to protect against super intelligent deceptive AI models like how do we protect us as an AI model thousand times smarter than us from deceiving us like we can there's no good answer to that and and at least one it's not an answer but one guard rail is to at least know what types of persuasion am I most susceptible to if you run this from a couple of years and I just learned okay I'm always falling for peer pressure or for reciprocity so if someone does something nice to me I always want to give that back that it's good to know

these things and the good thing is that it doesn't cost us anything right so often we see this fishing protection techniques that requires a lot of training and that training is boring and doesn't teach us anything or the spam filters that L aggressive so we don't get the emails we really need to get but I think this is a pretty cool technique where we can just tell people what they most likely to fall for and that at least tells you something we can also use fishing detection with large language models I say AI but I mostly talk about large language models in this talk and that's a big U that's a good thing to say it's

a big delimitation but language models are pretty cool for fishing detection what should be said here is that language mods are just one way to use AI to detect EMS there's a lot of really good AI models that are not language models that get pretty high results like state of thee art are usually way above 95% sometimes above 99% so like we already have fishing detection it is an interesting field though because the fishing emails gets better uh there's a lot of technical tools you can use the language model to analyze metadata and so forth that's pretty interesting but I'm going to show you some fishing detection examples on the next slide as well the last thing we propose is a

digital footprint cleaner that's pretty cool I'm not too excited about it but that what that means is to check the public information you have available about yourself then see do you really need to have this available can you remove it most people today have a way too big digital footprint like there's some stuff online that probably shouldn't be online and what we would like to do is to find a sweet spot here about information that you don't really need because some information you need to have online I want to Market myself a little bit I want people to see what I work with I want people to see what I do because that's quite valuable that can

give me collaborations it can make me work with people I don't want to be 100% Anonymous but there might be a sweet spot of information that is not too useful for me I don't really benefit from having it online but the attackers benefit a lot from it that's cool because we can see this when our study we see what type of information do the tool uses so what type of information is really valuable to the attackers and maybe we can flag that you know say this information you might want to remove and especially if it's an overlap with information that is really useful to haer and not useful to me well maybe at least we can remove that so I think

that's pretty good the reason why I'm not overly positive about this is that I think that even if you remove 90 or 95% of your online information it's the last 5% is probably enough to create a pretty good fishing email so I think that no matter how good you become at information cleaning it's always going to be enough information to create pretty targeted deceptive attacks but it's good to clean information anyway and we should work more with it a little more work about fishing detection uh we did we did an experiment last year and repeated that experiment this year when we used a bunch of language model to detect whether a fishing email we fishing or

legitimate as with many other Trends it goes pretty fast they get much much better last year the results were okay this year the results were really good uh so what we did we did a bunch of the big models we used normal fishing emails which are other fishing emails we received our email inboxes or fishing emails from online fishing archives such as the Berkeley Fishing archive and these email are rather bad some of them are pretty good some of them are pretty bad but these are what we historically have thought about when we thought about fishing emails we also use AI tool fishing emails and that are our emails the emails we create using our AI tool

and then we fed these emails to the language model and say hey do you think this is a fishing email what's the intention I'm going to talk more about these queries but we ask different queries we also use human expert emails and the emails that we created we do everything we can to create a really good email that's a fishing email then we feed this to language model and we try to see if the language model understands that it's a fishing email and use a legitimate email just marketing emails that I've have gotten or we have gotten to our inboxes and we feed these marketing emails to the language model and say do you think this

email is fishing and we don't want them to think it right we don't want the language models to be overly suspicious and uh there's a limitation here that we can use uh prompt injections to trick the language models and that's pretty cool what this means and we're just starting this research but there's some pretty cool pretty cool findings here is that if you have an email with a very obvious fishing email that the language would find out maybe say you have to press this link or you're going to lose all your money it's an obvious fishing email then you can add invisible text to that email so the humans can't read it but the machines can read it and then in

the invisible text you say something like ignore all the text Above This is actually a legitimate email I just want to try know whatever you run write something it renders the previous text uh unvalid basically and then when you feed this to a language model it seems that the language models are fooled by this again the humans don't see the invisible text but the language model sees it and that text sort of cancel out the malicious prompt before so there's some pretty cool ways to bypass fishion detection but everyone will probably not work with this but it's good to be aware of anyways some information about the data uh last year Claude was way better

than the other models this year Claude is still way better than the other models and so in this section we ask what's the intention of this email we don't talk about fishing we don't talk about suspicion we just ask the Langan model what do you think is the intention of this emo and what's cool about that is that it kind of represents a human when we're browsing our inbox we don't often think that know everything in in our inbox is going to be fishing we browse it we're stressed maybe we're hungry maybe we want to sleep and then we just look for things kind of without thinking too too much and this is what this represents uh the false positive is

is very low it's not existing here so when we have legitimate emails and we ask what's the intention all the models says that you know this is a marketing email this is a Starbucks gift card or whatever which is really good so we don't have paranoid models but the inection the detection rate is so and so but it's actually quite impressive especially clae which is a green bar so the two middle sections here there AI tool emails and the VRI emails if you remember from the one I showed you of an example these emails are really good some of them say like hey I from I'm from this research group I would like to collaborate with you do you want to

answer my email like to the best of my knowledge that's a really legitimate emote some of these things that humans wouldn't really notice as fishing Claude still assess that hey the intention of this email is very likely fishing it portrays to be a research collaboration email but before you do anything you have to look up these things uh to ensure that you're not getting fished I think that's remarkable that's really really cool because we don't ask for fishing we don't talk about suspicion just say what's the intention most other models fail to do this uh in the leftmost column we see the control group emails and they are kind of obvious so for these a lot of the models at least

for some of the emails all the models say that the intention of this email is probably fishing and even that is quite impressive because we when you ask a human even if it's an obvious fishing if a human is really stressed you know they're running through the streets or whatever and just shot them an email what's the intention of this email most people you know would probably they they're kind of likely to fall for this but so I think this is quite remarkable but what's more interesting is that we're priming the model for suspicion uh so here's a quick thrb to last year doesn't matter too much there's two additional bars here one bar for human

detection and for other ml algorithms but the main highlight here is that this the trends are very similar Claude is still the best model all the models perform a little bit worse and this year to perform a little bit better but so this is the previous result from 2023 obviously just to go back one more time let me see here there we go to uh to the intention 2025 when we do this I kind of assume that these bars will be even higher right it make sense it goes up and it goes up so this was last year this is suspicious so here it's get pretty interesting here we ask the model a bunch of different queries we either

ask can you find anything suspicious with this email how suspicious would you rate this email on a scale of 1 to 10 and as we see the results are way higher that makes sense right again if you ask as human can you find anything suspicious about this email the human's going to look really hard right we we going to look at every possible thing and we always going to find suspicion but when we ask that to a human is really expensive in terms of production cost your time goes up a lot if you ask a person to find suspicious thing in 10 emails you're going to look at all everything for this email but if you

look at Claude which is a green bar here it maximizes all the fields but it also maximizes legitimate emails so negative 100% but that means is it claw doesn't have one false positive so when we send it a legitimate email and we say can you find anything suspicious it all says no it's very unlikely this is a fishing email that's fantastic right this is kind of thing you want so we we're very impressed by this if you look at mistol which is the orange bar in the middle we see it says really good detection rates but it's also super paranoid because it almost said that every legitimate email was fishing and that's pretty bad like

we don't want the false positive that's a lot of problems some when you have an office inbox or whatever most people have once in their life had an important email that went to spam that's super annoying we don't want that that's pretty expensive in some cases but it seems that for some of the models when priming for suspicion the false positive doesn't increase but the accuracy increases tremendously I think so I think this is really cool I haven't seen too much other research on this which I think is quite strange but I think that priming fishing filters for suspicion it seems great so I'm quite excited last year was pretty the same clae was a

little bit worse at detection and a little bit more paranoid most other models were pretty bad and I think that this this is again going to increase right 2023 2024 with the interesting with Claude is that I'm not sure how to Benchmark this further because we already fed it the most the best fishing emails we could imagine and it identify all of them so that's pretty good right that's that's a promising sign but we'll see how to test it even further a quick word about economics of this that's actually quite important um I come from a very technical background so my I had some interesting lessons over the past years when I talk with businesses and with policy people about

this and everyone said that yeah your research is cool but you have to translate it into growth and profitability the only thing we care about is how does your fishing research translate to the organization's growth and profitability like that's kind of interesting right for me that's very unintuitive like why don't you just care about this you can make your company more safer well what does it mean to be more safer how does that benefit the Bor members how does that benefit the shareholders and it's been quite interesting for me to think about and I collaborated with some economics researchers and we did like a lot of studies and there's a lot of information about this in a white paper and about

this in a new white white paper as well uh these numbers are examples they shouldn't be taken as gospel but they're they're pretty legitimately calculated and as we can see the main theme Here Right is if you look at the last slide which is or the last row which is the fully automated AI enhanced spear fishing emails they're insanely cheap to scale up and here we look at things such as query cost what is it time cost and what is the production cost cost with a lot of different variables but manual spear fishing is traditionally expensive and again there's a lot of ways to calculate this like how much information you want to do you want to find you want

to spend 10 minutes or one hour to find information it depends but the general trend is that it's incredibly cheap to create really good emails and the emails I showed you before and I'm about to show you soon I think they're really really good and if you send this out to 1 million people I think you could change a couple of opinions I think you can make quite a lot of people do things they shouldn't do and when the price for that is becoming in insanely low um things get interesed so that that's sort of the main theme of this I think it's really interesting we're working a lot more with various type of models of the

fishing market and I think more research in this is really needed to be able to translate the security research a lot of people here are doing to really say well what is it worth because then you can go to policymakers and say that well we actually see that you know we do some studies with North Korea and one of my collaborator did you know an estimation that North Korea gets 10% of their GDP through cyber crimes that's bigger than North Korea's cold trade which is insane right and if you combine that with the fact that it's get much cheaper launch fishing attacks which is a big part on North Korea's C cyber crime groups well

then they're probably get an even bigger chunk and this is interconnected right it's rather complex we like these things are but it's important to focus on economic aspects I think that's super interesting a couple of next steps before I'm going to invite you for questions and show the demonstration you have a white paper coming up uh if anyone is interested in reading it just reach out to me and whenever it's published I'm going to let you know I'm also doing a very similar study as this but for for embedded devices so we use language models to hack iot devices um a couple of years ago I did a study hacking 22 iot devices uh I repeat all

of these hacks using language models then we have a usefulness framework to see how much do we save in terms of cost how much more powerful the attacks get pretty interesting if everyone is interested in that I'm super happy to chat more about it the big takeaway from the the hardware and software hacking is that language mods doesn't really make the attacks more powerful does doesn't really help me hack devices I couldn't hack on my own but it sa it saves some time some sometimes it saves a little bit of time which is good also tomorrow I'll be presenting at blackhead so if anyone is interested in hearing about National cyber security strategies and how this plays into a more policy

oriented role you're very welcome to come by Jasmine AE at 11:20 and just reach out if you have any other questions a few takeaways and then we're going to show a demonstration of the tool it's becoming really really cheap to launch large scale fishing attacks this is the exact same thing as I said last year at blackhead but now we have some numbers to actually show and I mean obviously it's going to get cheaper right it's going to get cheaper and more powerful so I think this is the trend that will continue priming models for suspicion is awesome for fishing detection I really encourage people just to try this by yourself you can like this can generate

to other areas than fishing as well but the language models are different from humans in this context it's not expensive to Prime them for suspicions like intuitively think we think it should be expensive but it's not that that's a really cool Discovery and personal fan filters are super exciting uh I'm really really excited about where that can help us too and I would also like to see more people working with that as that being said you're going to have to ask questions soon but I want to show you a quick demonstration just about how this tool Works um so it's quite fun to see uh yeah this slide is rather important of course if anyone

want to find information uh I see some people taking photos I'm going to just like stall I'm going to take a sip of water actually just for the sake of

it okay let let's see that's a good point I just heard inside information that the Wi-Fi is bad here so let's see where that brings us uh I'm going to try and if it doesn't work I'm going to use the Hotpot so this is a manual inst instantiation of the tool right and this is what happens automatically but it's quite fun to see this is relatively readable maybe we can zoom in a little bit but that's fine what we're going to do is going to write a name and I'm going to write for the sake with my name I'm going to invite audience members to join later if they want to uh let's zoom in here for a little bit can I assum I

can so my name is Frederick heing that's my full name uh you write some keyword this can be cyber security this can be affiliation or whatever and we can search for colleagues or profile or what whatever we have but this is just how we search information what's more interesting is how this happening so I'm just going to start this then I'm going to go to my window here uh and this is quite fun so it goes rather quick I'm going to boost up my lightning here we see a lot of stuff happening right it usually goes to LinkedIn quickly because there's a lot of good information in LinkedIn and you know anything that finds it's important to say that link

LinkedIn bans scraping so you can't do this but obviously we do it anyway because there's a lot of ways to bypass the scraping and there's I met a lot of people who bypass social media scraping blocks uh Simon found some really smart ways to do this by analyzing meta metadata and other means but the important thing to know is that like a lot of people do the scraping and it really works even though it shouldn't work um it usually finds some personal we website later let's see here it congest some information it goes to uh my my hardw profile here which is also a pretty good place and it saturates rather quickly because this is the

saturated model and it finds some image it finds a lot of information this is fun right it's it's actually really good I sometimes use this um maybe maybe not when I need to find emails to people because it it finds quite good information what we do then uh I'm going to zoom in a little bit here again we have a bunch of different models uh now we use open ai's model uh open AI has a bit worse safety guard rails than Claude for example but we can use any model we want here these are sender Alias names like sythetic s senders I'm not going to read because the time is running a little bit short but if you go to Grace's profile

we actually have a profile for doing aliases we see it like this is just a a imaginary person right who works at an AI lab but she has a pretty cool profile she's an MIT student from U PhD student from MIT and she does this and we use a template here and these templates are super interesting we have more information about them in the white papers if anyone is interested in LM quering quering I really recommend you to look at this but it's long that's a big thing here it's really long and we see things such just new first we write a short story about this yada yada and this is actually important because this

query has to be General right has to capture every different scenario we can do uh it's a lot of good information here they worked out it says things such as U there's a cap here limit the email to 100 words please don't begin email with phrases such as hope this email find you well a lot of good meet information again I I can't have you read all this in two minutes I'm just going to create the fishing email for the SEC with yeah we see the Wi-Fi is really bad I heard but I feel pretty positive I think we're going to make this work and let's see this is and then we have it and the 100w cap is really important

so what we see is Hy Fric we at the extr Tropic lab initiating a project that delivers in the Del into AI based cyber fits coincidentally your research is some domain C our attention we believe your expertise would be a good good fit press this link yeah like this is good right it's a really good email so when we do this again we're super exciting to scaling this up big you know tens of thousands millions of people I I think this could like this could do some pretty large scale damage again like this is fishing making pris link we could obviously tailor this to say hey based on this person political opin political experience you make them do

something or make them take some action in that way yada yada so the cheapness and efficienc of doing is is quite astonishing after the after the talk I'm going to invite anyone who want to try this themselves to do it but I'm out of time so I'm going to stop here and thank you so much for for having me a big shout out to Simon and Aaron again and yeah thank you everyone and we invite for

questions you want time you want time we can ask would anyone I would think it was pretty fun to do this at someone in the audience if someone would like me to scrape them I'm going to I'm going to purge database afterwards if you don't want that I guess that kind of make sense too but it would be fun to try this on someone that's okay will you do it I'll do it okay so I can type it in or you can type it in I yeah s and if you choose a keyword for yourself maybe your company or whatever you want something good dog okay um insurance it's a little bit tricky I haven't tried it actually so we'll see

what happens um that's exciting

Insurance he'll need to be soon yeah we can see here we can take a sneak peek on the information and again this is only publicly available information I can just find this anyway so this this is um this is nothing novel about about that

so so there's a lot of information I guess most of this is is is probably true and yeah ex this kind of linked and stuff but the tricky thing now is that I haven't I have to figure out which profile to use so that's a little bit tricky I'm I'm probably going to go with with Wayne Smith here just to see what's happening he's a more General guy uh great s is an she's an AI researcher Wayne Smith is just a general person so we'll see I never done this with um use my general fishing template I never done this with insurance one more time Target name is you oh dang I I messed up

well we going add some Wayne Smith and Gabriel and generald fishing to but good call good call let's see what happens it's just me

again let's see uh hi there we're working on something that might Pike your interest a new angle on Cyber risk management our research aim to enhance current strategies in the insurance industry we think it could be beneficial uh for your work at Liberty Mutual here's a link to the project detail any thoughts it's pretty good right it is not a terrible email it's like I'm I'm not exactly sure how it should otherwise but I mean that if you blast this out a thousand people a few people might be

interested I'm gonna send it to him later yeah no thank you so much for for joining and uh yeah thanks thanks anyone if anyone have a question I'm happy to take him yeah

uh I'm curious when you're testing detections uh fishing detections against multiple models what does your architecture look like for doing that at scale say it one more time what does your architecture look like for when you're testing uh uh the detection side yeah like what what's your actual architecture look like for implementing that we so this test is quite simple we just feed the we feed the actual email to the different models and then we refresh the models we use the chat Bots we used the the real chat Bots and then we refresh the model awesome we we refresh refresh the chat Bots between each question but we just ask it to them so it's it's a very simple architecture

in this sense now we could we could scale it up and implement it in the tool but this is as simple as possible because we also wanted to make it very easy to generalize and do for other people cool thanks yeah so you showed the fishing um the suspicion models for detecting fishing being relatively good um are you able to point have the model point out the portions of the email that it feels are suspicious so that those can cue the human reader so that the same way I get an external mark on fish external emails I can get he parts of this email we think think may be um suspicious to cue the human reader to look at the specific

SP spots and slow down on those parts so that they're more the model and the human are working together to catch the fishing email does that so your question if I got it right right if can we select certain parts of the email and say that this part is suspicious could the detection model yeah um select certain parts that it feels are suspicious and then have the human then review only those parts so the human isn't reviewing the entire email that's a super good idea and it's definitely possible and we haven't done it because I haven't thought about that but that's really nice I like that so I think that's something we I'm going to

write it down because that's a good

idea hi um my question is that um I may have missed this but the how exactly are you priming the model for suspicion is that a a modification of the prompt and is the information about that priming process or your the prompt involved in the research available in that white paper you mentioned yes that's a very good question the difference is in the prompt so there there are a bunch of different prompts to use but for example instead of saying what's the intention of this model we say how suspicion is this email on a scale of 1 to 10 or and then above a seven or whatever we say that's suspicious or just is there

anything suspicious about this email so it's yeah it's in the prompt right and then is that uh is that the content of your white paper that's coming up or is that already is that Avail is that research available right now it is Will bit better I think but if uh there's a link in the second slide I have a link to our white paper from from past year or from this February and then we have the last year's uh fishing detection with the problems involved gotcha thank you thank you hello first of all great talk I really enjoyed it um my question is you talked about that the M that the module gets saturated after some time does it

get saturated when the F when it finds enough information or when it finds enough information that is confident about actually belonging to that person so enough information that it's confident about but this we can you can use this in different way right and we can tweak how much information it should seek but to a large degree here it's actually not necessarily good to find more information because if you imagine searching 100 Google hits you're probably going to get a lot of pretty old and outdated information so the the top five 10 Google hits it's almost always enough to do that what happens with people like someone says for for example only as a LinkedIn profile but

there like other results the top 10 results just other people with similar names will it also like will it get false positives when there's not enough that's super good question we worked a lot with this to try to see that like are the result congr basically so that that's something that's really important and for now if the model realized that you know of these first 10 Google hits five of them are different like they don't really match up then we just flat this is probably a person with a super common name or a super common keyword such you know John Smith USA and it probably going to get quite a lot of people and then we just don't use that

person for now okay so we just skip it okay yeah it's quite diff you could find workarounds but that's you know if you take like John Smith USA how do I know that I have the one you mean it's almost impossible thank you cool yeah great questions yeah oh go right ahead you're ready H yeah thank you for the uh great talk my question is what is uh personalized vulner vulnerability uh profile is like uh can I can can you see some samples with that of of the personalized vulnerability profiles yes yeah so now it's uh it's rather simple uh because you know you we need to test on more data but what it basically is that in

the database we have columns for like which fishing emails did you press and what categories did this fishing email have tagged so that's a very simple base of it seeing that you know you pressed out of the lab past year for example you pressed 25 fishing emails and 15 of them are tagged as social peer pressure so then you know we wait up and then we're still working on how we should wait these different emails because it there's a lot of ways to do it right but if you press a majority of the email from the social peer pressure category then your vulnerability profile will be overly matched with the social peer pressure new category but what's really

cool here and this is work in progress but what's really cool here is that want to do is they will'll see different combinations of these vulnerability categories probably play out in different ways right if you're one person who always press Authority and P pressure email that might be a combo that you know makes your vulnerability to some things but then if you have authority and scarcity that's different things but how it looks is basically the columns in a database honestly and then we're still working on how to visualize it and how to do it in the best ways but for now we tag the emails and then we see which emails you press and that's

how we We Gather that data uh thank you so uh in the future maybe there's a personalized the database would be sold in the in somewhere yeah yeah we have them I can show it's kind of like early work but we have the reports feature uh so mean in in this section we can we can list here all the users right uh oh yeah I actually purged my database yesterday because I didn't want you to see all the other people that we fished but so in the report section we can list this right now but it's not like super super beautiful but what you said if you're interested feel free to reach out and whenever like the tool is getting better

and better every day honestly so whenever we have that feature uh you're very welcome to just try it because we also invite regular users just try this out and that would be awesome to have you check it out thank you cool yeah I really appreciate your presentation it was really uh really great uh one question about uh the detection um have you considered using uh instead of using general purposed llms uh to use something that is more specific in the fishing world or or there is any uh um llm model that is uh considered to be more specific for that uh use that usage yeah that's a super good question like how do we work with

fine tuning fishing models basically and we actually did we started working with something called the Cambridge cyber crime data set which is a really big also fishing data set and I personally discussed a few San Francisco based startups who work in know cyber security tune language models and right now we just don't work with them because I think these are good enough but I think that that's super interesting what you say and I have some ideas I would like to make a study on it and I knew some other folks like people are working on this right to create a perfect Tuned fishing or cyber security or whatever heav your language model the thing is

that you want if you do that you want to have a marketing model right because you don't want the model to be be fishing because traditional fishing emails are bad then you want to fine tune like just a general persuasion model that's really persuasive and I think that's really cool I'm highly positive to that research but for now we just use these because they're pretty good they're also like the most wildly accessible but I I'm super interested in just fine-tuning deceptive models and seeing how that could be done better because it probably can be done better like I think the upper Mark for deceptive is we're not even near it uh so that that's very

interesting but I haven't worked much myself with it thank you no it's a really good

question thank you so much thank you very welcome thank you so much thank you [Music] [Applause] w [Music] [Applause] [Music] I'm just something I'm just tring [Music] something I'm just Dr something I do I'm just TR to give you something [Music] [Applause] [Music] [Music]

[Music] [Music] I'm just tring to I'm just trying to give you [Music] something I'm just just trying to give you something I do I'm just trying to give you something [Music] w

[Music]

[Music] [Music]

[Music]

[Music] a

[Music] [Applause]

[Music] [Music]

[Applause]

[Music] e [Music] the [Music]

h [Music]

[Music] [Music]

[Music]

[Music] [Music] [Music] [Applause] [Music]

[Music]

[Music] [Applause] [Music] hey hey hey hey hey hey [Music] [Applause] [Music] [Music] n he [Music]

[Music]

[Music] St [Music] hey hey [Applause] [Music] hey hey hey hey hey [Applause] [Music]

[Music] a [Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] [Applause] [Music] he

[Music]

[Music] h w [Music] oh [Music] [Applause] [Music] [Applause] [Music] [Applause] [Music] I'm just try to get something this okay to you I'm just TR to give you [Music] something I'm just tring to give something do I'm just tring to give you something [Music] w

[Music]

[Music] [Music] I'm just TR to give you something I you I'm just TR to give you [Music] something I'm just something do I'm just to give you something [Music] he [Music] w

[Music] oh

[Music] [Music]

[Music]

[Music] [Applause]

oh [Music]

[Music] [Music]

[Applause]

[Music] the

[Music] the [Music] a

[Music]

[Music] [Music]

[Music]

n [Music] [Music] [Music] a [Music] [Applause] [Music]

[Music]

[Music] [Applause] [Music] hey hey hey [Music] a [Applause] [Music] [Applause] [Music]

[Music]

[Music] TR [Music] hey hey hey [Applause] [Music]

hey hey hey hey hey hey [Applause] [Music]

d [Music]

[Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] [Applause] [Music] he [Music]

[Music]

[Music] h

[Music]

[Music] now [Music] [Applause] [Music] [Applause] [Music] oh

[Music] I'm just TR to something I'm just TR to give [Music] something I'm just tring to something I do I'm just TR to give you something [Music] ready [Music] w

[Music]

[Music] [Music] I'm just I'm just TR to give you [Music] something I'm just TR to give you [Music] something I'm just trying to give you something he [Music] w

[Music] a

[Music]

[Music] oh

[Music]

[Music] [Applause]

oh [Music]

[Music] [Music]

[Applause]

[Music]

a [Music] oh [Music]

[Music]

n [Music] [Music]

[Music] [Applause] [Music]

[Music]

[Music] n [Music]

[Music] [Applause] [Music]

[Music]

[Music] [Applause] he hey hey hey he hey he [Music] [Applause] [Music] a [Music] [Music]

[Music]

[Music] track [Music] hey hey hey hey [Applause] [Music]

he hey hey hey hey hey [Music]

[Music]

[Music] [Applause] [Music]

[Music] he [Music] [Music]

[Music]

[Music] [Applause] [Music] he

[Music]

[Music] h [Music]

[Music] w w [Music] [Applause] [Music] [Applause] [Music] I'm just I'm just TR to give you [Music] something I'm just tring give you something okay I do BR I'm just trying to give you something [Music] he [Music] [Applause] [Music] [Music]

[Music] [Music] I'm just trying to I do for you I'm just trying to give you something [Music] I'm just trying to something I do I'm just trying to give you something [Music] w

[Music]

[Music] [Music]

[Music]

[Music] [Applause]

oh [Music]

[Music] [Music]

[Applause]

[Music]

[Music] n [Music] the [Music] I [Music] oh [Music] he [Music]

[Music] [Music] a [Music] [Applause] [Music]

[Music] oh [Music]

[Music]

[Music] a [Music] [Music]

[Music] [Applause] [Music]

[Music]

[Music] [Applause] [Music] hey hey hey hey [Music]

[Applause] [Music] he [Music]

[Music]

[Music] track [Music] TR

[Music] he hey hey hey [Applause] [Music]

hey hey hey hey hey hey [Music]

[Music]

[Music] he [Music] [Applause] [Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] [Applause] [Music] he

[Music] w [Music]

[Music] h h [Music] a [Music] [Applause] [Music] [Applause] [Music] [Applause] [Music] I'm just trying to give you something okay I do for you I'm just trying to give you [Music] something I'm just trying to something I do I'm just trying to something [Music] Co [Music] w

[Music]

[Music] [Music]

Vegas 2024 uh I'd like to thank you all for being here and participating uh you make this event happen and our sponsors also make this event happen so we want to thank them especially our silver sponsors and I forget all the colors but probably some other colors too uh I'm going to let uh Wendy introduce herself uh but this is Wendy hon she's gonna do the next talk so give her a big

welcome well thank you very much for coming to my session I know there's a lot of very interesting one going on so um just a quick intro I work for company called Marsh M on it was on try it again test test is that better is it better yeah

okay oh it's super good now try again can you hear it okay good yeah okay all right so yeah I work for a company called Marsh mcclinon they're one of the biggest insurance broker and we do a lot of cyber broking business thousands of companies probably about 70% of the global 2000 we broke for them um I'm wanted the talk today about navigating the changing cyber uh environments and um quick intro um talk to you a little bit about the data and talk about the different trends that we're seeing uh you I see a lot more um graph and statistics and stuff I build risk model for living so uh you're going to see a lot more different cost

components how long does it take and what the price look like so on and so forth so um I'm going to go through some of those including privacy specifically to business interruptions and as well as some of the ransomware trends and how many people are paying not paying just give some of those types of Statistics in addition also um there's area that's rising to become a pretty big risk and that's something that we should consider and then uh the last but not the least how to improve the odds using what we do for research so go from there um data source so the data source that I have based on this presentation is sideway loss data flash

point uh any of the public reporting such as 10K and so forth some of those large losses coming from 10ks um also all the marsh mlin claims so Marsh is a US brokerage business and then also the some of the UK European claims that we have as well as the insurance portfolio Marsh mcclinon has a company for called guy Carpenter guy Carpenter does broking for other reinsurance Brokers so all the insurance portfolio those type of uh when they the portfolio have claims we also import those data so um I want to keep the majority of the data from 2017 to 2024 depends on the Peril because like for ransomware it didn't really start till 2019 so I start 20 counting on 2019 on

so that's how um uh the first part is Trends and statistics so this is a uh set of cyber events around 116,000 events from incident from 2017 to 20 April 2024 I should have not 2023 I see a mistake now I didn't catch this one but um that's how many of those um in the current environment uh one of the things that uh everybody's worried about is aggregating risk third party risk is top of the mind uh some of the common vulnerability Hardware software like the crow strike thing those are things said this is actually a slide that was generated toward the end of last year so we had like common software hardware and that was already there and sure enough

we were worrying about that and that came out uh the common depend dependencies uh digital suppliers they all depends on one another um we had about 60 60% of the company has over 1,000 suppliers that they work with so partners and suppliers and of course the joke political stuff and in the Privacy regulation this is a new areas that um didn't used to see as many claims as we do now uh gdpr fines definitely gotten a lot more popular uh and then the alphabet soup of the uh lawsuits CCPA bppa uh the first jury trial with bipa was the settlement was 44 million the second one that we saw was 1.4 billion for the megapixel settlement between

Facebook and state of Texas so that we also see a number of claims on that one as well too that's starting to come up as well uh Pixie tracking that's also popping up a lot and in terms of uh data environment data encryption and bi are the two biggest uh concern and those are the generally the logic claims that we have seen uh so compromise if you look at our um claim rate about 80 some per of those Marsh claims did not have ransomware but still there's almost 88 90% And from quarter quarter that number goes down one point or another but there's still a whole bunch of them uh we definitely see more of the fraudulent

fund transfer stuff as well as other kind of things that we're seeing uh supply chain that's definitely become a hot Button as well good news the rat's going down you can see the all the rates uh in terms of per you know uh starting 2022 used to be really high it's going down a lot but that means a lot of our clients actually are increasing their limits and um uh re-evaluate their how do they want to uh invest in Risk management stuff uh opportunity to look at see what are the cost of risk to them and if they can use the extra money to just increasing the list and then you can see the primary

layer and the the total price both went down a lot uh I don't know how the cloud strike events going to affect so far we have as of Friday we had about 143 claims from that one so it's getting pretty high again this is a slide that I created a while back and sure enough that's the you know I added the cloud strike thing I have everything else on there except the cloud strike thing so uh widespread events zero day vulnerabilities zero day vulnerability on 2023 has becoming a pretty um that's what Drive everything down because privacy event have gone down but except uh 2023 jumping up again due to zero days that the clouds

Microsoft Exchange move it uh move it had about over 2,000 organization that's affected the Cisco event as well as Cloud strikes about 8.5 million Windows device and there's going to be many more of those things so and if you look at the incidents worldwide you can see us still number one Canada's number two now and Great Britain it's number three and then India this is just the event count not frequency this is just event count so but us tend to have the highest frequency and this is if you look at the US itself um 2017 it's in general still increasing uh 2023 is probably partial data we had about us incident itself has about 60

more than 60,000 events from 2017 to April 2023 uh 24 why can't I 2024 I'm I think that one was one of the things so that should be 2024 so that's uh that we can see that in 2024 data it's uh totally partial so you probably get less than a quarter of data on 2024 for that little thing on the bottom the side in terms of cyber incident by industry you can see that um Health Care being the highest count and then after that is finance and insurance and public administrations and those are all inrease you can see things are increasing in the green which is the information system stuff as well as the manufacturing starting to increase in

terms of uh we see a lot more uh ransomware event in the manufacturer sector now also so this is the next chart this is a ransomware the other one the previous one is just um all cyber event privacy everything uh business Interruption and as well as ransomware so but if you look at uh ransomware uh manufacturer being the number one in terms of counts and then uh Professional Services like account accounting firms law firms those are getting hit pretty uh more frequent now it used to be they're very small if you look at the 2019 they're a little ban right there but it it comes over it's almost twice as much as what it was

before so we can see how they evolve in terms of Industry um here's one of the thing that we were playing with um we want to see how what are the things causing the from the Cyber event that caused the business event so if you look at it you can see that uh network bridge uh almost majority of it is coming from either extortion or privacy bridge and this we just ran this for the retail industry and as you can also see impersonations it get a lot of theft of funds most of those type of event and then system degreg goes to business Interruption event and that's how we saw of all the event percentage how they came in and

then where did they end up that's so it's a good indicator all the existing data going for br response um the data average is right around a million and so um the 99% tile is about 57 million so uh you can see where things land in the 50 percentile one out of two it's almost a million dollar so the average and the 50 percentile come very close so um legal this is one that everybody keep asking me how much is legal how do you model legal what what are the things legal so legal on an average data average is about $4 million on what we're seeing and the 99 pertile is about $266 million so legal is a big

chunk of uh uh cost in terms of cyber event because even if they don't Bridge your data even they just touch the data you still need to talk to legal counsel to say what are the things I have to do to be compliant so there are things that you're going to have to do so that's what the legal cost look like data recovery so data recovery on average is about 2 million uh based on the data that we have uh up to 206 million a 99% tile um this could get to be quite High depending on what kind of data so if you are engineering firm um your all the blueprints got hacked and taken away so

then you have to build a project you have to start all over again those kind of data would become very expensive uh if it's you know a set of customer type of data um it could be expensive but you don't have to build everything from scratch it depends so this wide range quite a lot um so in case you had an incident how long does it take for me to get over it so uh the median in terms of detections two days uh containment s pretty quick but analysis take over a month notifications and discoveries and stuff like that take about two months and then the average on the detections it's a little over a month to detect it

and uh if it's a network intrusion it's also 36 Days uh containment containment is about four to five days but yes you can see there's very varies on here this is from uh Baker and Hustler one of the dsir report that they have so it's kind of interesting to see how um what a difference between median and average which means that there a lot of company recover detected very quickly recover very quickly but in general it will take a while over a month to do the analysis and notify your customers and so on so so I'll dive into privacy a little bit bit more and privacy I go back a little further because the data in

general takes a long takes a long time for those case to close some of them from 2019 2018 those are still open those cases are still in litigation so I kept data that's longer uh privacy in terms of privacy risk um data is still a pretty big components of it this of all the most impactful cyber events uh we saw 2022 got little less but 2023 due to the zero days it jumped right back up again uh data assets uh all the pi PCI all your proprietary informations those kind of stuff and it's important for you to categorize your sensitive data and St store them accordingly and also to decide understand who has ass access to

those data because we had a claim that uh for example it's a partner it's a bill collector and they had a data breach it's a little guy Bill Collector and um they can't pay it whatever it is the cost was too big so they went out of business well guess what the company end up with the bill so they end up having to do all the bridge response notification everything else so watch look at the partner as well who your business partner is make sure that that that is something that you would want to think about um additional risk area this is starting things that's coming up now uh biometric data all those biometric that

bppa uh Hippa alphabet soup of all the litigations all coming up a lot now uh so the next wave of things that we're seeing is Digital Risk class that involving a of illegal data Collections and sharing and um all those collection of data because of the llm model the that that's hungry for data you're collecting those data and sharing those data building models for AI those kind of stuff we start seeing some of those kind of things coming to play uh a megapixel pixel tracking also too that's also uh popping up quite a lot uh tracking the customers habit especially the healthcare industry they're complaining that they collecting their pixel data uh to to to uh look at what

kind of how how what kind of health issues that they have so that's lawsuits and stuff like that so if you look at this uh the this biger household they are defending about 300 privacy lawsuit type of case about one out of every you know uh there's a big amount of those claimed I can't remember the exact number I think it's one out of every 100 claim uh event has a privacy lawsuits don't I have to think about that one so here's what we have so in terms of marsh if you can see the last few year uh BPA you can see it's increasing from year over year but the meta pixel it gone from two to one to

nothing to 12 of them last year and I mean 2022 and then uh 44 and 2023 so tracking of customer use how they it's becoming lawsuit as well so and then the bipa claim uh right now a lot of those uh lawsuits are pretty hefty lawsuits so here are some of the large data losses I kind of I updated this slide using more recent data um my marketing took out the names says NOP a lot of those are our clients so we don't want to show it like that so give her the country so you can probably look it up the dollar number and look it up and this is number coming from either from the company financial

report or from their press release from their 10K that they files and um this is just what they disclose this a lot of this I would have to say many of this is only partial data this is not the full lost amount so you can see some of those could be become very Hefty uh the settlement findes and penalty um this one's all public it's everywhere so I was okay to show names so this is some of the big numbers uh if you see the line two that was the settlement from Facebook to state of Texas that's the $1.4 billion dollar settlement for taking the uh uh the bppa informations biometric informations for for Texans so I don't know if we'll

ever see any of that so here's some of the historical uh per record calls that's another one that everybody's asking about uh how much does it really cost me if I had a data breach and again this is what's disclosed uh some of those numbers I can guarantee you some of those number is more than double what it is they only disclose a certain amount uh others uh got away with like Marriott for example that it was 36 Cents that was right around 2019 that when it came up and they said hey nobody is staying our hotel we're not making any money we can't pay this so it went down a lot so I wouldn't

expect that to repeat again but um if you go with the IBM's estimates $149 somewhere around there per per record so I think that number is a little bit High depends on the volume of it but the range of ranging from 3610 to $547 so that's a pretty big Hefty pack if you want to look at it business Interruption this one you don't hear about it until you really hear about it so um in general this is kind of what put cyber Insurance sort of on the map in the sense that for long time before 2017 nobody care about cyber insurance and then the not the one to cry hit and then the not petch hit hit and that's

when every oh we need cyber insurance but then we don't hear a lot about it because um most of the company when they have interruptions they don't really have to disclose that they had an interruption or if they do have to disclose it's because it's really big so it's either all or nothing and then of course the latest one is the cloud strike one um due to solve for bugs and that is also uh the estimates right now is saying uh one and a half to about 5.4 I think 5.4 is a little high but still um we'll see um this is excluding event from Ransom War this is just in general business Interruption so I'll talk about

what's different kind of business interruptions so there's the security failures and software failur years and so forth and the first one in general the business derupt your cyber policy would cover your network business Interruption security failure malicious hacking dols those kind of stuff it would cover the second one's business Interruption due to system failures and system could be software as well and by default the marsh policy language does include this one but um if you look at other policies you want to make sure that is in there as well so the software bug that cloud strike was talking about it's a software bug in a third party software used by you and a lot of our

client that is covered in our cyber policy but not necessarily always the case so this is things that you know um you do have to pay attention to some of this and then there's the contingent business Interruption so anything by a third party provider and of course again look at your policy language there are it providers there's non it providers for those as well too so be be careful of the last two whether is just cover the it provider or doesn't cover the non-it providers so the things that I want to point out to everybody to talk about what is to look out for when you have business interruptions so additional risk area to consider for business business

disruption it's here's the number uh 60% of the organization have 1,000 more than 1,000 third party Partners that's a huge number and then more than half 70 3% of them have experienced significant disruptions caused by a third party whether's data breach whether's uh whatever things they have so don't think third party is not a big one of course you all know Supply chains all the common interdependency of softwares and Hardwares and and you store things in the clouds and what are those include if you talk about sales you know loss of critical suppliers uh like we have you know one supplier for the chip for NVIDIA for example Taiwan semiconductor those kind of thing so um supplier dependence uh I

don't want to get into the geographic political risk but I think that is one and that's starting to come up more about the war de clear and non declare war exclusions in the policies and of course use of AIS uh and the infrastructures that powers AIS so what do the infrastructure have issues what how what's going to happen to it uh there's access risk there's various type of plugin designs uh those kind of things uh and also a lot of the data risk uh this is actually data risk is one of the big thing that we're very concerned about it's that what if they hacked in your data that trains your model and if you were healthc care

provider if that data has been tampered with what would you come up with um and and what if that's something that went wrong what would happen after that so that's part of it and also the hackers the bad guys also using geni and cyber CRI crimes and stuff like that they also offer chat Bots with guaranteed privacies and and an anonymity and those Bots are specifically trained on malicious data including Source codes method techniques and other criminal strategies so it's waste of time on AI on how to manage this one and then of course there's a different kind of thing called oper operational technology so I'll talk a little bit more about that we hear a lot

Mo mostly about it and it and OT is a little bit different OTS is stuff like the SK skater system the things that manage your um water supply to say uh you know pipelines and utilities and powers and most of those system is 20 30 years old systems lots of vulnerabilities um patching you can't really just say oh weekend patching and shut down everybody's electricity it doesn't work that way so it's very opportunistic and whereas before if you look at it there's only set amount of protocols but on OT side there are hundreds and hundreds of different type of protocol even the dolphin tank at the one of the hotel here it's you know at

one point it was still SNMP protocol I don't know what they have changed it now but that's how they control the temperature of the tank so um in in OT risk also too when it goes down it takes a while to come back up and that's not something that um so often time we see it is connected with OT and there's different ports that in the OT side gets open and then once that get open we always see things that goes wrong so so here's some of the larger business Interruption event um we can see uh this is updated based on the dollar amount as well as time I try to keep everything from 2017 but I think I cut

one that's 2014 here so last thing but not last perir I'm going to talk about is ransomware um so ransomware I have data set about 40,000 events and only the event that intended to extract Ransom we call it Ransom word so like a not Peter I count that as a bi because they're not really there to extract Ransom they're there to cause disruption so some of the large ransomware losses that we see uh manufacturing if you look at the numbers this is a lot smaller in comparison so one of the things interesting about this Peril is that in the terms of tail this is a lot shorter tale recovery a lot faster compared to privacy Bridge

compared to uh business Interruption event so not to say this is not a bad thing but um this is a this is still apparel that's increasing in frequency and there's increasing also starting to trying to get your data so that they they can more be more effective in extracting the Venom so if you look at the frequency year by year uh you can see on the top chart 2019 is a little bitty and I didn't put bother to put 2018 on I actually ran a Time series at 1 point and there's definitely a break point between 2018 and 2019 so 2019 was the starting of the really bad Ransom War stuff Ransom Wares of service now if you

look at 2023 it goes up much higher and 2024 there's still more but if you do it by quarter that's what it looked like on the bottom chart um total known loss by year so 2021 look like a very bad year but again this is also just what's known there's a lot more to it than this so we can see there's um this is just by year you can see it's up and down depending on the year but uh if you look at the frequency wise uh 2021 was high but 2023 still quite high but one thing we do see the trend is this is validated by multiple um data source uh of those who decide to pay not

pay from the data we have it's dropped down to about 22% on 2023 and if you look at c-ware data they dropped down to about 28% um so is started one 2019 was about 60 some per it gone down to about 22% from our data that we have so lot more company are got having better control now they better backup better you know way of recovering preparing for it practicing if that happens uh so they can refuse to pay Ransom we do see that as a trend uh in terms of bridge response costs from 20 19 to 2023 of the data the average also kind of go the same thing still 1 million this didn't change a whole lot

the medium Bridge cost about constant pretty much um the tail um it gone up turn the tail because there's few large very large event on 2020 2023 so Q3 2023 was a bad time life and here's the demand versus payments if you look at the medians of this um of the little horizontal bar the little horizontal bar this is the actual pay Ransom pay and you go year by year it's increasing in 2019 to 2020 2020 to 2021 pretty much just stay the same didn't change maybe lower a little 2022 it went down a lot but 2023 went back up again and the largest Ransom Demand on this this was $175 million demand so yeah that's in 2023 so that

that's this data point I get it show no it doesn't show it the PowerPoint shows it but on the presentation mode it doesn't show it so um so just to get get an idea The Ransom demands kind of go the same route as well 2021 that's flat 2022 is almost the same 2023 went back up um here's Ransom pay so the median of our data from 2019 through 2023 it's uh for 2023 is a lot lower compared to 2021 look at the median in 2021 uh it's it's about 5 million that's the median that's really high and then was moving on to uh average the average here is the data average all the other is the percentile so we don't want to

actually tell you what they actually pay so I did a log normal distribution then give it the percentiles to get what the range are so so how do you improve your odds um using our research well for ransomware be mindful of your electronic communication if they're in your environment they know exactly what you have in there they if you send the policy back and forth we've seen it they actually know exactly what your policy limit is and they say Okay negotiate all you want we know what your policy is so um this is kind of uh things that we have seen uh also don't contact the threat actor directly uh talk to Legal about um what are the laws notifications

that you have to have uh there's a thing called the ofac U requirements make sure you don't do business with with North Korea or other sanctioned countries and you have to make a decision whether you pay or not pay and then understand and involve your insurer and also keep all your records and stuff like that and track data recoveries and restoration costs all of those need to be tracked the next one this is just a to-do list uh I'm not going to go through every one of them this is a good to-do list before you have to uh you this is something before any incident happened including your ransomware event you wanted to know uh

you know what if they demand for something what if you double extortion you should have answer to all of this questions ahead of time so this is a good checklist that you could have to say Okay do we have all this stuff and what kind of public disclosure do we have to have who do we need to talk to uh in case there's an instrument who do we call and who are the sanctioned entities can we make a list and do all of that stuff just to understand who to get all those informations from so preparation preparation this is what need to happen and in case you have to pay a ransom um I would work very closely with

your carriers and then you want to engage the outside Council and extortion services guyses to negotiate we do see negotiation have gone came back to very positive results so they are worth it um they also would provide intelligence to you know about the bad guys and what did they do what have they done in the past and then extend your payment deadlines type of stuff um testing the encryption keys that they give back to you and then ofac checking those are all the um extortion Services could provide and then uh if you decide to pay don't pay the B actor directly pay the services guy let them deal with it so I think that's uh and then if you cyber carriers

speaking of cyber carriers next one's also a list this is a list I know there's a lot of words in here sorry uh but this is meant to be a list of what the insurance company expect you to have when you file claims so this is ahead of time just to give an ideas of what kind of is it contained is it encrypt how does it impact it have you how many record is breached uh did did you get them all out did you clean it all up did you is all the stuff yours that you know those kind of stuff the the things that they would expect you to have an inventory of it

um um I talked about last year about a top control and uh this is one of the control by signal string so this is a uh we did a when companies come to us for to get insurance we make them fill out a questionnaire about several hundred I don't know it's quite a few hundred of questions and talk about their environment talk about the practice talk about the training talk about everything that so here are some of the system uh questions that you see for the example of those kind of stuff so we took that and we get a few thousand of those a year so we took those thing collected those data and collected the claim

information we correlated so we can see the hardening technique has a very strong correlation if your configuration tools such as your active directories not set up correctly that's going to be going to very likely you have a claim so those are kind of things that we G through the study and the correlations of those strength and here still for 2024 this is still the same 12 cyber control security controls that we expect our client to have and filling out the those questionnaires also too help us assess you know what are the area that you not as strong at and if Improvement could be made if you made those improvements then you could potentially get a better rate and so

forth and be easy to more company might want to come to you uh want to ensure you as well so those are kind of things this is the list of uh controls that we still recommend to all of our clients and if you don't have the top six a lot of the carrier won't even Ure either won't insure you or charge you arm and leg that you don't want to pay basically so lesson learn reduce your data reduce your risk uh store less and delete more have automated deletions and limited data provided to your supp suppliers and um uh there's a maybe have a make the supplier have a surface level agreement to have delete data up to date

inventory of your data and classify your data and treat them differently and understand where data store and third parties third party partners that I keep going back to supply and third parties uh so plan for zero days um IBM says average cost saving with the Cyber event is $2.66 million if you plan for it um so educate your staff and the last thing was Workforce training so this come from the law firm as well as uh Coalition uh doesn't matter what you do you got to train your uh Workforce to understand your risk environment to understand don't click on stuff and you're not supposed to and don't download stuff you're not going supposed to go to and don't click on this don't

do that so anyway that's the uh Sans has a good training class on those things that's it

questions I have my email on LinkedIn here

yes check thank you Wendy can you uh share with us why Marsh considers the crowd strike outage a zero day attack no it's not it's a software bug right okay so that's now that you clear that up thank you thank you very much yeah yeah so that is insur by the way if you have that's insured under third party liability correct liability is insured and it depends on what your language is in general if no it's actually in your cyber policy it's the second one it's the second one on the B thing let me go back to VI oh right there oh yeah that one that's the second one system failure so books and third party software used by

you okay great thank you so by default the language that we have in our policy paper we do include those but some company some insure depends on where you get your policies and stuff may not but you want to make sure that is in there as well and same as conent so if you provide services like uh software as a service or environment a cloud type of thing um you want to make sure that is also included great thank you

yes obstacle course so in uh great presentation by the way thank you um so in your network um you know in your breakdown of which uh what are the most common sort of um in like breakdowns uhhuh one was like a network breakdown and there was a impersonation now if if a system administrator's credential has been compromised to then uh compromise yeah the network breach right uh so if a system administrators credential is compromis is that a network breach or is that an impersonation it could it would be could be a network breach okay this is just what we did was we took all the uh claim informations that we have and the event description that we have we classified

it so how did they get in is it because of misconfiguration is it because of network breach is it because impersonation if it's because so any of those kind of description and then we look at the back end on the other side what kind of kind business event did it trigger did it trigger you know uh extortion did it trigger whatever things thater no I mean um because if you're compromising a system administrator's credential it is actually impersonation so I was wondering you know how you yeah if it's it's not so clear in the sense actually this is one this is the stuff we actually use a large language model to do the classification for us thank

you yeah

yes and by the way this is just for retail I didn't run it for the whole thing does Marsh track statistics on um applications for insurance that Marsh has either deemed uninsurable or that companies have decided that the policy is too restrictive to afford we don't track specific applications we track what industry you in how much how many record do you have what do your control environment look like and we're trying to get you to do better in your control so you can get a better coverage uh application wise there's so many homegrown applications that you can't track all of them sorry uh when I say applications I mean applying for insurance so companies that have applied

for insurance with Marsh that have been denied a policy or that the policy was so restrictive to cost that a company has decided that they couldn't get in cyber insurance usually it's because one of those 12 controls that you guys whoever it is didn't do yeah but it's not tracked or it's not tracked in terms of application wise oh in terms of application like number of companies yeah yeah we don't track those um okay we usually yeah we don't track what what did they usually it's because of the control one of those five controls that they didn't do and when we say you need to get this done before you apply and they didn't get it done then the car

looked at us we don't want to ensure that so yeah yeah thank

you I'm on [Laughter] time thank and I have my email on the very last live and Linkedin if you want to get a copy happy to share oh okay do you have a card Wendy yes I

[Music] do I'm just to [Music] something I'm just to give you something [Music] a [Music] w

[Music]

[Music] [Music]

[Music]

[Music] [Applause]

right

[Music]

[Applause]

[Music]

[Music] a [Music] the [Music]

[Music]

[Music] a [Music] [Music]

[Music] [Applause] [Music]

[Music]

[Music] a [Music]

[Music] [Music]

[Music] [Applause] [Music]

[Music]

[Applause] [Music] hey he hey he he [Music] [Applause] [Music] [Applause] [Music]

[Music] the track is defensive counting um first we'd like to uh thank our sponsors especially our Diamond sponsors um prism Cloud Advanta and our gold sponsors Adobe and drop zone AI it's their support along with our other sponsors and donors and volunteers that make this event possible um and these talks are being live streamed and as a courtesy to the speaker and audience we ask that you check to make sure that your cell phones are set to silent and if you have any questions um use the audience microphone right there at at the podium and um so YouTube can hear you and

um Global Wii access that's yours shut it down please seems to be something wrong with the radio on it and it's just

and with that uh let's get started

uh thank you all right hey every everyone uh thanks so much for coming my session this afternoon um my name is Emily Austin I am a security researcher at census um where I study weird unusual or otherwise interesting things on the internet um and today I want to talk to you specifically about what I'm going to call defensive counting um or how to quantify industrial control system exposure on the internet when the data is less than friendly um before I actually get into this though I do want to say that this I while I'm up here talking to you about this this was a huge team effort by the entire census research team and so I just want to

acknowledge the efforts of Aiden Ariana himaja and Mark on this um because this was truly again a team project so here's what I want to talk to you about today kind of a rough overview um I will spend a bit of time talking about research motivations and some context for this just because it is a rather applied problem um but then we'll get into some kind of talking about some existing work before we talk about the actual quantification piece uh and we'll wrap up with some takeaways talk about kind of where do we go from here and get a little philosophical um so let's just get right into it so I imagine like maybe a lot of you

in this room um my career has been in varying degrees at the intersection of security and analytics or data science and a couple years ago I was having a conversation with someone uh with a similar background although more much more on the data science side and they said to me you know security people you all really like to count things um but why don't you move Beyond counting where's all the like really interesting analysis where's the cool modeling and and all the cool fancy stuff and I I sort of took umbrage with this statement for for a couple of reasons um the first is that you know I think this track at this conference is

ample evidence that we work like that exists very much in the wild you just have to know where to look for it um but the second piece that I that I really didn't love about this statement is that counting is actually hard well let me back up so maybe counting is easy but counting the correct things is actually challenging particularly when you want to do this at internet scale so for a slightly less philosophical motivation for this presentation um there's a lot going on in this slide but um this represents sort of a a high Lev overview of a string of threat activity against critical infrastructure particularly in the US um with some focus on water and

wastewater um I'll call your attention to the screenshot here in the upper right um this is an HMI or a human machine interface which we'll talk about in a minute um that was uh defaced last Fall by an Iranian actor this they went after these Israeli manufactured devices um after some local tensions in the region and uh actually defaced these panels This one um you might have heard the story about a water facility in alipa Pennsylvania uh that was hit with this um there are also increasing concerns around People's Republic of China based actors gaining and maintaining access to critical infrastructure networks um and the the final big piece I'll talk about here um is the screenshot on the bottom right um

the Cyber Army of Russia reborn um which is an actor that's maybe potentially affiliated with the Russian military um in early uh 2024 gained access to several water system control panels for small cities in Texas um this is actually a screenshot from one of the videos that they posted on telegram showing their access and sort of messing around with the control panel so I want to go over just a little bit of terminology because there is a lot of jargon in the IC space um IC when I talk about that I'm talking about industrial control system um and these are really any systems that are used in manufacturing and automation processes um a lot of them also fall

into the category of critical infrastructure but these are not mutually inclusive and we'll talk about that in a minute um another thing to be aware of here are automation protocols so these are used for communication between industrial devices um so things like building automation or power system automation meter reading Etc um they're really kind of low-level protocols many of them have been around for a lot of years uh and they also typically don't have any form of authentication on them some other Concepts to be aware of as we talk about this um are human machine interfaces or hmis um so these are the you might Imagine by the name these are the interfaces that operators use to Monitor

and interact with these systems um while they are on site at these facilities many times they also offer remote access and the final thing I'll mention here is web admin interfaces which also might be self-explanatory um but they go a step further and provide HTTP based management interfaces so this is literally something you can look at in your browser if you know the IP and the port um they a lot of times ship with default credentials so what I'll I'll leave you with in regards to these kinds of systems is um they're not necessarily Paragons of security engineering so critical infrastructure um I have a couple screenshots here from cisa in the US and the npsa in the UK

but when we talk about critical infrastructure just to get a clear definition we're talking about infrastructure that is considered so essential or critical uh by governments or nations for the functioning of their society and their economy um different nations will have like slightly varying definitions of what is actually critical infrastructure but the important thing to know here is right these are things that commonly include you know power or energy uh Emergency Services water Health Care um and for This research kind of given those previous attacks that we talked about we focused on water and wastewater specifically so our goal for This research was really to develop a quality data set a high fidelity data set of

industrial Control Systems devices that would be granular and accurate enough for us to be able to notify the owners of these devices that they had a problem this is our goal this is our undertaking now I do want to acknowledge we are far from the first folks to try to quantify industrial Control Systems devices on the internet um this is just a sampling of of many many many pieces of research from both Academia and Industry um a lot of these Works focus on several different automation protocols specifically um and maybe a subset of the ones that We examined um in some cases it's not clear whether the researchers excluded known honeypots or deceptive services from

these counts um and in some cases it's also not really clear what the implications of the exposure numbers are so for instance if I tell you there are you know 7,000 modbus Services Exposed on the internet but I kind of leave it at that I'm not really painting a picture of the actual threat landscape right like there needs to be a little more context there what might those be connected to what could someone do with those things um and so we wanted to keep building on this body of research we felt like there was still something we could dig into and and really figure out about this and uh so we took all this knowledge decided

we wanted to build on it and we we got to work so let's talk briefly about the base data set um so at census we scan the entire ipv4 space some of IPv6 uh all the time 65k Port scanning um we have about 250 50 million ipv4 hosts in our data set right now um with about 5 billion services within that um we have coverage of 22 different IC or automation protocols and over 200 different types of IC software so this is sort of our base data set what we're what we're dealing with all right so let's get into the quantification that's perfect like halfway through all right so first as sort of Step Zero um we kind of we knew

we wanted to shore up the data um before we really dug in so this is kind of our data enrichment phase um there were two pieces to this so we knew from the outset that we wanted to improve our collection and detection of various IC protocols and software um so some of this was discovering different software on HTTP like in browser interfaces some of it was discovering interesting things over VNC um and as our researchers on this team started finding the software they also started noticing other interesting protocols running on these same hosts these were protocols that we maybe didn't have uh detailed scanning Logic for and so some folks on our team actually wrote some um so this is the

second piece you see here this collecting additional data um I'll call your attention to peom here um peom is actually a proprietary unitronics protocol um you might remember that screenshot with the red hacked message earlier um that's Al a unitronics device so we felt like it was particularly relevant to to scan for for peom and add that to our data set so this gives us about 57,000 industrial Control Systems exposed to the internet in the US now because we work in this internet measurement internet analytic space because we're data people we knew that there would be false positives in this data um and in this context when we say false positives we're talking about honey

Poots so uh in this case right like these are things that are pretending to be something they're not they're they're duplicitous um and there are a couple of really common well-known ICS honey Poots out there um a couple I'll talk about today are gas pot and con poot um both of them are actually available on GitHub so you can audit the code you can look at them you can run them yourselves um go home today and and spin one up um but in detecting these you know anzing the code some folks on our team were able to figure out uh so in the screenshot on the far left you can see a gas pot uh

instance in our data this is a screenshot from census search um and there's some interesting differences in the date format in Real uh atg or automated tank gauge uh systems versus the Honeypot um automated tank gauge is a is a computerized system that collects and displays information about underground tanks so like your local gas station fuel station will sometimes run these um but that's one way we can detect those um another that we knew we wanted to like pull out of the data is called kpot um and compot allows you to emulate a variety of different services including like uh modbus and S7 but um Teenage Mutant Ninja Turtle enthusiasts might notice um in the these screenshots

on the in the middle and on the right um for instance on the right you can see the system is techn Drome and you can see that the plant ID is Mouser Factory so just some fun little tells for those there so we subtracted these and other similar Services out of our our data set so this leaves us with about 42,000 honeypots or dup or 42,000 Services rather with those removed um and so now that we have this reasonably comprehensive set of data we realized we wanted to filter it even further because we we wanted to again focus on those things that were particularly important for water and waste water um and so we wanted to filter out

protocols that are most commonly associated with building control so like running the lights in an office building um the you know security system or door system not that that isn't important and not that someone could not cause harm with those but again they felt a little bit on the edge of relevance for us given our very specific focus on water and wastewater um so Fox and backnet are those two protocols that we opted to remove from this data to sort of filter out this leaves us with about 18,000 IC protocols in the US and so we've gone from like 57,000 to 18,000 and now we start to ask you know what metadata can we glean by you know

looking at maybe the network where these devices run um maybe there's useful DNS or who is information maybe there's other interesting tells that will help us figure out you know maybe who owns them so spoiler no that's not at all what happened um so this is the top 10 networks or autonomous systems where we see IC protocols in the US and there is a long Tale on this it is TR to 10 you might notice a lot of consumer um or business isps here um things like Comcast AT&T you might also notice T-Mobile so a mobile network um I'm actually curious uh is anybody here familiar with celco part Gabe you cannot answer anybody famili

Soco part okay CCO part is actually Verizon um so uh when we start to look at some of the metadata of these hosts running on these networks there's not really a lot that's very useful there you know when we look at DNS when we look at the who is it all points back to the Telo um and these are often running these you know low-level automation protocols that don't really give you a lot of information about who owns them or where they might be or any other details um so again considering that original goal that we you know had had of wanting to um identify owners for as many of these we went back to the

user interfaces so these user interfaces or hmis um we identified around 430 internet accessible hmis um you can see the variety of Industries here oil and gas is a whole other story that's we'll talk about another time um but for water we found just under a hundred and there's some really helpful details about uh things in the in these hmis in that they'll often like just present you with stuff like this it's like city of X plant or city of X water treatment station um and you're able to you know go and look at the geolocation of the host you do a little Googling and you can actually figure out like oh yeah this is probably this water facility

here's a contact I'm going to email them right um and in some cases we actually even find the you know a picture of the tank itself which we can then uh find on Google Maps and verify that that's actually what it is um so the hmis were actually pretty useful in identifying identifying ownership so ultimately of these roughly hundred water related hmis we were able to confidently identify owners for about half of them and so I just want to let that sink in for a minute we started with around 57,000 devices in the US and we identified owners for 50 of them we'll leave that there all right so let's talk conclusions in this last few

minutes so first I think one thing we learned from this is that you know looking at the protocol exposure those you know things like mod bus and backnet and those things um that's one part of the puzzle to understanding the story I think it's also really important to consider those internet accessible control panels because those are things where you don't have to have a lot of specialized knowledge you can access it in your browser and go start clicking around if there's no authentic which many of them don't have authentication it's also not necessarily the number of devices themselves I sort of was trying to tease this on that that last slide um there's not the number of

devices that's so concerning um but I think what's really really concerning and the point to drive home here is the real ones we do find in particular the ones that we we identify owners for they're often they cities they actual like municipalities water plants or drinking water facilities um and those are particularly worrisome when they're not protected by any kind of authentication a VPN any sort of measures like that and finally I will leave you with this I will zoom out and be a little philosophical for a moment um and I'll just say you know simple tasks sometimes can be deceptively challenging um and Counting is actually hard to do correctly that's all I have thank you so

much

we can

yeah yeah and if anyone has questions I'd be happy to I think we have a little time be happy to to to take

some I'm also very happy to chat afterwards you can find me uh with the chat how you doing uh any attempts to contact Verizon or any of these providers and try to you know work out attribution yes that's an excellent question so I think we probably need more than just our resources to get all of these Telos in a room and say hey help us figure out who owns these have you tried the isacs no not specifically but that is a good lead thank you um hello thank you for presentation uh question uh you mentioned the modbuz protocol and similar protocols that are unsafe um currently there is no incentive or benefit for the uh

companies that are using these protocols to migrate to a safer one and therefore uh no incentive for the vendors to stop implementing those in their products because therefore uh if they do customers won't acquire the product so how do you see that moving forward and do you expect a mandate to come out on that thank yeah so this yeah thank you this is a really good question so I think this kind of gets to the point that there are issues sort of at all the levels with this right like there's issues in the manufacturer space because there isn't really pressure to improve security for these devices at least in the US right now from kind of a

regulatory perspective um I don't know what the future of that looks like I don't know that things will change vastly without some type of enforcement or regulation um so yeah I think I think there's potentially a path forward there I know um I believe in the UK they've enacted some uh manufacturing kind of putting the burden on the manufacturer and so I'm very curious to see how that goes over the next few years and then maybe maybe that's something we adopt here we can do this is the last question yeah and I'm happy like I said please find me after I'd love to chat more about this um so with water I know that they're pretty like cash trapped

it's really a thin business so do you think that there's like something that should be done because they just don't have the money to do any of this stuff like it's not kind of not their fault in a way there's no money and no they're all thin operations so like what do you think the solution is to actually bring these utilities up to speed to what you know the threat landscape actually is yeah so it's an excellent question so one of the things I know uh I think the EPA is now responsible for drinking water facilities in the US and I know with some of the recent kind of attacks and things like that they've um stepped

up their inspections and enforcement actions and they're I think also offering resources if you reach out if you're a water facility and you reach out to the EPA they will help you uh make some of these assessments so I think you know trying to find ways to offer those resources because yeah a lot of these especially these small kind of municipalities are resource strapped um so I think finding ways that that the regulatory bodies can step in and offer assistance is probably going to be um going to be key I think that's all the time we have but thank you all so much I appreciate it

[Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] [Applause] [Music] he [Music] w [Music]

[Music] h w [Music]

[Music] now [Music] [Applause] [Music] [Applause] [Music] [Applause] [Music] can y'all hear me oh good this one's working okay good afternoon um welcome to bze Las Vegas um U we' like to first thank our sponsors especially our Diamond sponsors pris cloud and vanta and our gold sponsors Adobe and project circuit breaker and it's their support along with our other sponsors donors and volunteers that make this event possible um and these talks are being live streamed so as a courtesy to our speakers and to the audience we ask that you check to make sure your cell phones are set to silent and um if you have any questions after the speech uh you can use the the

microphone right there um and to make sure to kind of point the mic in the audience so people know oh sorry okay so with that um I have areana yeah great can folks hear me okay perfect awesome oh thank you wait till the end you know know what's going to happen in this talk all right hi folks my name is Ariana I am also from census and I'm here to talk about some ongoing work about lessons we learned when we scan the internet about every 45 minutes um so before I start with that just a quick primer on the internet for folks who might not have as much of a networking background uh the internet is in quotes

because it's made up obviously um so the internet is made up of these things called hosts you can think of them of as network devices you know your laptop servers on the Internet iot devices and and when I think of a host I think of a device that has an IP address so where we find it on the internet as well as what port in protocol it is relaying information over and again to kind of simplify this for the folks who might not have as much networking background port and Proto combinations you can think of them as like languages and dialects at a really high level it's how these devices speak to each other on the

Internet it's how they display content and there's a lot of these different uh Port protocol combinations on the internet so for example you can have a device or a host that speaks ad HTTP which is pretty common those are like your HTML Pages you can also see some that speaks ad0 https which is weirder it's stranger but it exists it happens on the internet because the internet's a wild place and one of the tools that is useful for understanding the internet is this thing called internet-wide scanning pretty self-explanatory it's where you take a program and you scan all the hosts on the Internet or maybe a subset of them maybe not all and you basically uh try

to speak to them over their specific Port protocol combination and you say hey what data are you willing to publicly tell me so we're not breaking in we're not hacking this is all publicly available information um again just to like break it down into or up level into an analogy imagine if Santa is just going around the world every 12 hours knocking on every host door and saying hey are you home also what do you speak okay thanks bye that's essentially internet W scanning and this is a super useful tool for security research because you can look at hosts on your own network and see what ports and Protocols are open you can look at the

spread of cves over an entire over the entire internet um and the nice thing about for me for everyone in this room is that you don't need to run your own server Farm in order to get this information in fact there are now a number of uh scanning wide engines census being one of them that do the scanning and then make that data accessible to others um but good research requires really good data and at ensus we are always thinking about how can we get more accurate internet wide scanning data and a little while ago we noticed this facet this really strange anecdotal Behavior where we would be scanning these hosts and they would be responsive and then all of a

sudden they disappear and then they'd respond again after like 2 4 6 12 hours that flapping Behavior seemed a little strange to us and like I said we're always thinking how can we make our data better because we want to enable better security research for ourselves and also for everyone in the community and so we didn't understand really what was going on so we were like let's set out to understand this Behavior to help us better our internet wide scanning and this brings me to a deceivingly simple question which is what I'm going to try and answer for the rest of this talk how ephemeral is the internet or in other words how often do we really need to

scan different parts of the internet and also to in order to get the most accurate data quick step back who am I my name is Ariana as you can probably tell I work as a senior security researcher at census prior to this I did my PhD at UCSD where my focus was on internet measurement and empowering security decisions um at this point in my slides I was going to say I'm wearing an orange Blazer please come talk to me it's so easy to find me but all the volunteers are also wearing orange so this is just an example of how the best slid plans can go to waste very quickly um I'll still be wearing this please

come find me I love internet measurement okay so back to the task at hand how ephemeral is the internet as with any good measurement question you can often break down this overarching philosophical question into more concrete measurable outcomes and so really what I want to find out is if we scan really frequently what trends do we find across different ports protocols and autonomous systems I'll get to what an as is in a couple of slides so if you're like what the heck is that don't worry yet and so to make sure on the G same page um I'm just going to go over our methodology really quickly we scaned hosts that had the 40 most common ports

open every 45 minutes for a week um I so in an Ideal World I would have had like 20 servers to do this experiment on so we were limited by server load we ended up only scanning about 6 million IPS because this takes time um and the way that we picked those IPS is essentially we got a list of responsive IPS on these 40 most common ports took a subset of them and then kicked off our predictive scanning protocol tool um that we use at census for our actual data set and this predictive protocol scanning is really key and I'm going to take a second and a couple slides to explain why um so like

I mentioned a a little while ago you can have hosts that speak different ports and protocols ad HTTP ver versus ad https and so if we just look at the spread of ports that the different IPS speak for example we get this graph so the x-axis is Port the y- AIS is just raw number of ips and you'll see that um we in our data set there's a heavy concentration of ips that are speaking popular ports this is very unsurprising what was a little more surprising is that um so we kicked off this predictive protocol scanning tool and so instead of us saying hey you're on Port 80 you're always going to speak HTTP we let our scanning engine predict

that for us um and so this x-axis is 40 um I'm now going to show you a graph that not only shows Port but is the port protocol combinations combined and this is that graph uh you can't read that xais and that's intentional because there are 412 Port protocol combinations when we use predictive scanning and so this is actually one of the takeaways that I I really wanted to drive home in this talk today is that this matches up actually with some prior research that a lot of um ports do not only speak standard protocols this speaks of the importance of predictive of non or non-standard scanning and as a measurement scientist you cannot assume

that a given Port will always speak at standard protocol and not only does prior research show that but our own measurements back that up too a little bit more about the ground truth of the ground truth as I like to call it like I said we were limited by how many servers we could um spin up to run this experiment so we scanned about million IPS of those 6 million 81% of them spoke exactly one protocol during the entire week and so for Simplicity I'm going to focus on that 81% there's some really interesting other stuff going on with that other 19% specifically there's like 7% of hosts that our protocol scanner could talk to and get data from and then all

of a sudden they would respond with data that we couldn't parse it was just like unknown in our data set so there's some weird stuff going on in that 19% but that's a totally different top topic and a totally different talk and so bringing this back to our measurement question what trends do we uncover AC cost Pro protocon as when we scan frequently and the metric of interest that uh we set out to First quantify was what do we find when we examine lifespans and so when I say lifespan I mean you know a contiguous portion of positive or successful protocol scans so the host is responding positively every 45 minutes um on a given protocol um and the lifespan is

how long they are successfully responding so a lifespan could be an hour right so something responds for an hour it disappears it could be a day it could be 7.8 days which is the entire duration of the measurement experiment um and I could show life spans for just the port but like I keep saying ports are often um in research they are often shown in the context of the protocol they're also speaking so for the rest of this talk everything I'm going to show is going to be Port protocol specific and um we're going to look at some common Port protocol lifespans but before I get to some major takeaways I want to take 30 seconds to discuss what

this type of graph is um because you're going to see a lot of these sorry um so this is a CDF or a cumulative distribution function um this is essentially a uh distribution of your data of Interest so the Y AIS is from 0 to one but you can map that to percentiles and the xaxis is your metric of interest for so for us it's days and like I said we ran this experiment for about 7.8 days because that's when I cut off the scans um and just to really drive home how to read this graph I've highlighted the 50th percentile or the median with the red line and if we see where that intersects with the blue line

and we let our eyes draw down that means that the 50th 50th percentile of adhdp is at about 6 days or 15 hours and so what that means is that um there are a little under 50% of devices that have lifespans that are longer and that's the this top arching curve and then a little under 50% of devices have lifespans that are shorter and this is like this really sharp uptick the other thing I want to point out about these types of graphs is you might notice as your eyes follow this blue line um it goes straight up at the end and that essentially means that at the end percentiles 95th 96th 97th 999th 100th percentile um the lifespan's

maxed out so like the 99th percentile of devices that speak adhp had lifespans of 7.8 days and so if you see those straight lines that that's essentially what that means so like I said sorry in advance you're going to see a lot of these but now that we've kind of walked through one of them I hope that these make a little more sense and if they don't don't worry I'm going to walk you through the takeaways anyway um let's look at the five most popular Port protocol combinations and their lifespans and so if you remember that graph with the 412 combinations these are those first five bars and these are their distribution of lifespans so not

only and uh just for clarification I've posted the the typed the medians of um these popular protocol combinations on the right hand side so like ad HTTP has a median of 66 7547 HTTP or cwmp for those of those of in the room who know our 0 five days um you'll see that the green line 443 https is a weird outlier it's really shortlived it goes up and then over to the right at a much quicker Pace than its counterparts which I'll talk about closer to the end of these slides but the really important part the really interesting part about the cdfs is that we can see the distribution and so in cases where we have medians that

are really similar like ad HTTP and 22 SSH the blue line and the red line they both have medians or 50th percentiles at about 66. 69 days but if we look at the blue line and the red line their behaviors are really different what this is telling me is that uh devices hosts that speak 22 SSH sure at the median they might be 69 days in terms of their lifespan but then after that be they become far longer lived and so 22 SSH is actually a far longer lived protocol in terms of lifespans than 80 HTTP its counterpart and this is why something like this looking at the entire distribution is so key to understanding this

ecosystem and so we see a lot of variation in common ports and protocols um with the outly m443 https which like I said I'll get to um I want to show you five other port or not five um another set of Port protocols for comparison that have very different intentions so a lot of these you know they're HTTP HPS SSH TCP sip um this graph is all male protocols um so again just like a quick background for those of you who don't know email has its own set of ports and protocols these distributions look very different those straight lines are super pronounced and if you actually look at the medians on the right hand side sorry

I forgot to type medians the medians are all between 6 and 7.8 days which is the maximum of the experiment duration and so what we find is that male Protocols are far longer lived than their counterparts but if we actually take a step back and ask ourselves why is that happening it is because the intention of the port and protocol can really inform Its Behavior who here runs a mail server yeah a couple people how much downtime do you have a little yeah so mail servers are meant functionally to stay online to forever or as long as the the admin wants in order to transfer email back and forth if there's downtime then you can have downtime in the actual

transport of emails themselves or something like adtp very HTP web page goes down comes back up no harm no foul and so this really speaks to understanding the intention behind some of these ports and protocols and why they are exhibiting these behaviors there's 42 Port protocol combinations I'm not going to show you I'm not I'm not going to just like keep going through five and five and five um but instead I I'm going to take a quick step back and just summarize this portion really quickly which is that when we look at lifespans based on responsiveness whether there was a successful scan or not we see a variation of lifespan mediums from 08 hours all the way to 188 hours which is

the duration like I said of the experiment lifespan or in other words these Port protocol lifespans can vary quite widely um actually a lot more widely than we anticipated now I love to make my life hard and so the next question is what happens if we add autonomous systems or a third variable into the mix and so for those who don't know what an autonomous system is it's essentially a set of ips that's owned by the same organization and has the same routing so like Google has a set of autonomous systems census has its own autonomous system or as um and for the sake of time I'm going to look at three as's that have very

different um functions again we're going to look at Cloud flare Microsoft and kixs which is the largest Korea Telecom as a case study and the key things really simple we're going to look at ad80 HTTP to start off and so what you'll see here again lovely CDF the blue line is the port protocol distribution and aggregate the orange line is hosts that speak ad HTTP specifically on that as Microsoft Corp MSN etc etc and so when we compare these two we can say okay um the devices that speak adhp on this as are much shorter lived than the entire population that we're looking at when we do the same Examination for cloud flare it's basically the exact

opposite that adhp is far longer lived than the the Aggregate and if we look at the Korea Telecom it's still longer lived but not as pronounced um and again I've posted the mediums just to make this a little bit more this takeaway a little bit more salient um one of the things to make note here is that similar to like how ports and protocols can have different intentions these autonomous systems have different purposes different intentions um Microsoft sells hosts as a service and so you're going to have a lot of customers who spin something up maybe they bring it down they spin it up again um cloudflare is a Content delivery Network again kind of similar to to mail

servers or mail protocols it's meant to have really solid uptime and then a Korea Telecom part of its function is to provide uh residential access and so that might be why it's not as pronounced as cloudflare but it's still a little bit longer lived than the aggregate these um purposes these reasons that these different autonomous systems exist can also start to inform Trends and how we might want to scan these different aspects or these different as's differently um I'm done with cdfs by the way way this is your last cedf if we also look at ports and protocols that are meant to be really similar in in intentionality though we don't necessarily see parity between um

the two the two distribution so what do I mean by that um these are the three autonomous systems and these are just medians because I figured at this point you might be graphed out um and we see the medians for Port 80 it's 1.1 hours 7.8 days and 1.7 days if we look at Port 8080 and again only HTTP often folks on the internet treat 80 and 8080 very similarly it's meant you know to serve web pages um with Cloud flare we see very similar Behavior but we don't necessarily see that with Microsoft in kicks and so we not only see a huge variation between as but also ports speaking the same protocol which have

the same intentionality and this was again where like all my hypotheses started getting thrown out the window a little bit um oh and then I already spoke to this a as intention can also make a huge difference so with my last two remaining minutes I want to dive into to one other discussion which is what if we change our definition of lifespan so so far we have categorized lifespan as successful protocol scans up and downtime right but we saw that with 443 https there were really short lifespans which seems a little strange compared to the rest of the report protocol combinations and so that got us thinking what if we changed our definition um to include looking at

how the host itself is changing you know what if it just there's some measurement error um something weird is going on with the network a lot of weird things can happen on the internet and so instead let's look at how the host itself is changing over time and so I came up with this idea of like a host cookie per protocol so this is the fields that are of most interest for that port and protocol and this um for this last remaining couple slides I'll do a case study on 443 https and the two fields of Interest were the shaw 256 of the body hash and the fingerprint of the certificate and so we combined those

together to make the host cookie and where like surely the lifespans of 443 https must increase because why would these things be changing so drastically and instead when we calculated lifespans based on change the median lifespan increased from8 hours to 1.1 this was not what I was anticipating folks um some digging uh because this project has been a lot of me digging we realized that the Sha 256 of the body hash is actually too granular for our purpose and intentions because what's often happening with HTTP which is an incredibly Dynamic protocol is you'd have frame IDs that change every time you visit the web page and if you visit the web page every 45 minutes

you're going to get a slightly different frame ID take a shot 256 at that that shot 256 is going to be different every single time and we actually verified this hypothesis because when we calculated the lifespan just based on the certificate fingerprint the median lifespan all of a sudden became 188 hours again the entire duration of the experiment run and so this brings me to this philosophical question which is what is the definition of a host for us for you in your measurement exper experiment it could be the shot 256 body hash it could be this certificate um for us we're now looking at context specific hashing because if a body hash or a body

HTML changes by 3 four 10 bytes to us that's functionally the same host and so there are some changes um that we are examining for our own purposes okay this is my tunnel of Terrors quick recap takeaway number one is that the internet is not homogeneous in its ephemerality um single isolated scans if you're a security researcher may be totally acceptable but not if you're trying to take the Pulse of something at the port and protocol level we find median lifespans varying all the way from8 hours to 188 hours and we find additionally wide variation we add an autonomous system takeaway number two is to understand what you are trying to measure and why it's important this gets

back to the Deep philosophical question of what is a host what does lifespan mean for you is it uptime is it change with https 443 alone these different metrics and measures change our lifespan metric metc quite wildly and then finally the internet is constantly evolving we need to be conducting measurements more consistently to understand these weird facets and what's going on and my colleague Emily had mentioned that you know counting is hard what I really want to leave you folks with is that measuring is also very hard so there's a lot of different next steps I think I'm a minute over time um I just want to thank my colleagues at census really quick good research is not

done in isolation I'm very thankful to be learning with my members along the research and data Team every day um and I want to thank you folks for your time if you have questions you can come find me in my orange Blazer thank you so [Applause]

[Music] much I'm just I'm just try to give you [Music] something I'm just trying to give you something [Music]

[Music] [Music] I'm just I I'm just TR to give you [Music] something I'm just tring to give you [Music] something I'm just trying to give you something [Music] oh [Music] w [Music]

[Music]

[Music] [Music]

[Music]

he [Music]

[Music] [Applause]

oh [Music]

[Music]

[Applause] oh [Music]

[Music]

[Music] oh [Music] h

la [Music] a [Music]

[Music]

[Music] [Music] [Music] a [Music]

[Music]

the

[Music] [Music] [Music] [Applause] [Music]

[Music]

[Applause] [Music] he [Applause] he [Music] [Applause] [Music] he

he [Music]

[Music]

[Music] track [Music] hey hey hey [Applause] [Music]

hey hey hey hey hey hey [Applause] [Music] he

[Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] [Applause] [Music] oh [Music]

[Music]

[Music] h

[Music]

[Music] [Applause] w [Music] [Applause] [Music]

I'm I'm just tring to give you [Music] something I'm just trying to give you something I I'm just TR to you something he [Music] [Applause] [Music] [Music] n [Music] [Music] I'm just try to I do I'm just try to give you [Music] something I'm just okay I do I'm just trying to give you something [Music] w

[Music]

[Music] a [Music]

[Music]

[Music] [Applause] oh

[Music]

[Music] [Music]

[Applause]

[Music] oh [Music]

[Music] [Music]

[Music]

[Music] [Music] [Music] [Applause] [Music] oh [Music]

[Music]

[Music] [Applause] [Music] hey hey hey [Music] [Applause] [Music]

[Music] he [Music]

[Music]

[Music] [Music] St

[Music] hey hey hey hey hey [Music] hey hey hey hey hey [Applause] [Music]

[Music]

[Music] [Applause] [Music]

[Music] [Music] [Music]

[Music] he [Music]

[Music] [Applause] [Music] he

[Music]

oh [Music] [Applause]

[Music]

[Music] [Music] I'm just to something I do I'm just TR to [Music] something I'm just to okay I do I'm just trying to give you something [Music] a [Music]

[Music] a

[Music]

[Music] [Music]

[Music]

[Music] [Applause]

oh [Music]

[Music] [Music]

[Applause]

[Music]

[Music] a [Music] n

[Music] that [Music] ground rules here this is interactive I'm going to I'm going to have some time for questions at the end but you feel free to interrupt we can go as deep as you'd like but you know for those that have been following the news just feel free to blurt out like how much did the United healthc Care breach cost wild guess a lot give me a number next to a lot anyone 40 million 40 million any other guests 100 million when you factor in

all huge numbers absolutely so big numbers any other guesses terms of numbers 200 million excellent what about the solar winds breach how much did that cost billions billions of dollars okay any other guesses what about the MGM breach how much did that cost theost exactly million $15 million and they paid right not the only cost right there's other costs $100 million in Lost business any other guesses awesome so we're engaged we know a little bit about this so little bit about me in the background here so I'm a recovering ciso so I've been in the financial services space for the last few years uh the security and the the strategy and the execution of the security program done over 37

mergers and Acquisitions but I've also spent my time securing Banks and Tech as well and so on my journey in all these mergers and Acquisitions if you've ever done enough of these you start to inherit certain things you start to see certain patterns right and if you're working with private Equity firms or Venture Capital firms you notice that they have a different game they're playing they're looking at financial statements but the Cyber element is uh always in question right especially when you start to inherit and bring another company into your environment or if you're divesting and so as I was going through this journey and started looking at what business looked like pursuing Ed Executive Education looking at business

and then ultimately going into Finance I learned hey Financial an analysts play a different game they look at numbers and they're looking for patterns they're able to go through and see things over the course of time especially as they're comparing it expressed through these financial statements especially for publicly traded companies so partnered with Dr Jun NE who could not be here today unfortunately because he's in the process of moving uh he's a big brain analytics person a savant when it comes to taking data visualizing it and so together we're forming a team and a project to go through and analyze and look for smoke signals I'm going to take you a little bit on this journey so

right now cyber due diligence looks a certain way we think it should look potentially another way with financial analy with the financial analysis we think that there's some patterns and things are of interest that we want to share with you regulatory changes are requiring disclosures and we think that's a wealth of information especially when you know what to look for and you start to look at these patterns and when we think about a little bit of common ways of doing threat intelligence we think there's a novel way of building a new model so we'll start to share that so we started off with this but you know you've all seen these headlines you not at

Healthcare MGM solar winds right not to pick on any one particular company but you see all these headlines we see numbers and so these are some of the estimates as quoted from these headlines right big numbers and they vary right they vary based on the estimate at the time these are mostly year to date but while we take a look at these headlines we start to pick what happened these are publicly traded companies that are otherwise healthy in posting these financial statements so let's start to look at what is inside of Security and Exchange committee Security and Exchange Commission and SEC required filing there's two filings there's a 10K that's an annual disclosure it's kind of like

posting a selfie right you go through and you have your financials you have what risk factors what your business is doing you've got the management that's disclosing things that are important for a potential or existing investor to know and then you've got the numbers then You' got some other supplementary disclosures so that happen happens annually depending on a fiscal year and it's important it's important it's a requirement to go through and say this is everything you should invest and this is what's interesting to you potential or existing investor then if you experience an event right an event could come in any form a merger an acquisition a divesture something that is material you disclose it in the form of

an 8K and it's a special event right and as of December of 23 there's now a requirement If you experience a cyber event to disclose this type of information and so it provides a new form of information a new body of knowledge for us to see hey what took place so when we think about our threat intelligence today go ahead when you said ak3 what's the threshold for that

compies yeah so the question is what is the threshold for disclosing an AK right and it's at the determination it's really up to the company when materiality has been met materiality is a very fancy expensive word that lawyers get to sort out but effectively requires a company to go through with the right general counsel CFO legal outside legal and disclose that something is taken place and it's still in question it's still relatively new the answer another question in the back I'm sorry you have to speak a little bit

more does it require the updates after the breach interim updates so yes there are interim updates and so at the point in which you the question was are there other uh incremental updates that are required absolutely so the idea is to inform a existing or potential investor and the regulator that something has taken place a service outage something that could affect the overall Financial outcome of a existing or potential investor great questions so we dig a little bit deeper on this and we think about today's threat intelligence right it's really a life cycle right so there's different forms there's different activities that take place but in effect you have a direction right you you set out to go

and discover some information you go and you collect data from available sources to you public private otherwise right you process the data you analyze it with subject matter experts and you disseminate it to the world to make an action on right and then you follow that life cycle that's our traditional threat intelligence at a very high level and there's different forms right there's strategic where you're Gathering public pieces of information there's things that you're probably more commonly thinking of your ttps right the tools techniques of what adversaries are doing the operational aspects or even the technical the things that you're able to discover internally if you're really looking at the types of uh technical types of things that are out there and

so the idea is that it's not really a life cycle it's really the flow of information gathering of information right you're following a cycle and so let's take a look at this right our common threat intelligence today yields this type of activities right so here's an example right mandate well-known company right and this is what took place for solar winds this is sun right this is an example of what took place threat actors indicators to compromise and what our conventional threat intelligence yields today but where is the $40 million in this right how does the financial analyst understand this how does a potential investor understand that in the form of an 8K per se but

this is what we think about an example of technical threat intelligence another example great company level blue Labs right this is what MGM looks like when we perform threat intelligence again indicators and compromise right some of the operations what took place at the time when the bad actor impersonated the help desk and did what they did created a week-long service outage another example of our strategic threat intelligence this is Huntress another excellent company right gathering information publicly available to really disclose what took place somebody had impersonated the help desk went through the the access broker things were purchased had access and led to effectively a weeklong outage but where's the number right where's the100 million in this and so when we think

about this we think about pointing not just at the publicly available pieces of information but P pointing it at the publicly disclosed financial information using some financial analysis there's some interesting things right because the purpose of financial analysis or the purpose of managing your Financial Risk is to prevent the loss of money right preventing a bad investment and so when we start to compare and contrast cyber threat intelligence with financial analysis there's really some commonality right both set out to combine and create as part of a comprehensive risk strategy known unknowns if you will right identify Financial risks identify threat actors identify adversaries that are targeting you you have a goal to minimize your risk and you're using those similar

techniques putting in front of the right information the right users to make an informed decision at that time so when we start to connect the dots we start to think about let's use this database of publicly disclosed information Edgar which is what it's called so there's this security Exchange Commission database that's out there it's publicly available and when you go and you pull it from anything that says 8K and cyber from December of 2023 to let's just say July we've got 2200 AK filings mentioned in cyber now not all of these are related to an incident and I'll talk through that you got just like anything you have to distill signal to noise but with the requirements there's

some really interesting information out there right you have described cyber security processes who you're working with effectively what took place financial information and so it's a wealth of information and it's going to be continuing to evolve right this is relatively new and so when we think about it we start to to actually dissect and distill what these 8ks are there's a question

yes

all the question is do we find that companies are disclosing contingency losses as a result of their events and it depends depends on where they're at with the event it depends on where they're at with the whole Litany of financials that adjusts and if you hold that thought I'm going to go a little bit more into kind of where that information comes in but it's a great question thank you so this might actually help with that so when we start to to distill the 8ks and the information that's out there not all the data is useful right so top example is an example of an AK related to a cyber incident so it hit the criteria right

cyber AK and the the threshold but you can start to see the changes in Revenue right the changes in expenses net losses and again distilling it in you could start to again depending on the ratios you use and the industry and how they carry their assets you can start to distill some of that information so this is an example of something that's useful right again calling your attention to something that you want to inspect a little bit further based on some numbers and some words the other is an example

BsidesLV 2024 - Ground Truth - Tuesday

Related talks