BSidesSLC 2020 - Will Pearce - It Is The Year 200, We Are Robots

Name: BSidesSLC 2020 - Will Pearce - It Is The Year 200, We Are Robots
Uploaded: 2020-03-23
Duration: 31 min 30 s
Description: Title: It Is The Year 200, We Are Robots Presenter: Will Pearce

BSides SLC · 2020 · 31:30 · 68 views · Published 2020-03

Speakers

Will Pearce


Transcript


Um, can you see the screen? Yep, we can see it. Awesome. Okay, so thanks, Bryce, for the intro. I'm Will Pearce, and I work at Silent Break Security. As I was building the talk (you submit an abstract, and then when you're building the slides it can take you in different directions), it turned into this: we're going to talk about phishing, the applications of AI to phishing, where that's at, and the work that we're doing. As for previous work, we've given talks at BSides Las Vegas as well, which has been one of my favorite venues to give talks at.

[inaudible] We gave a talk at DerbyCon about similar things, and we have some public projects already out there. OpBot uses reinforcement learning to find administrative privileges. Proof Pudding was our adversarial attack against Proofpoint; we get to drive by them on I-15 every day, so they were kind of on our hit list. There's also command recommendations with RNNs, and Deep Drop, which I just rewrote. Deep Drop is a sandbox classification model. It used to do the whole dropper thing, but I tore out all of the dropper bits, and now it's just an API that gives back a score. We were scheduled to give some talks, but thanks to the coronavirus, here we are, so we'll have to save them for our course at Black Hat.
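Deep Drop is only described here at a high level: a sandbox classification model that now just returns a score. As a hedged illustration of what "binary classification that gives back a score" means (and of the binary-classification starting point recommended in the Q&A), here is plain logistic regression trained with batch gradient descent, using only the Python standard library. The two features and all data values are made up; this is not Deep Drop's model, features, or API.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x) -> float:
    """Probability-like score in (0, 1) that x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            error = score(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the mean gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: two numeric features per sample; label 1 = positive class.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.95], [0.85, 0.7]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print(round(score(w, b, [0.15, 0.15]), 3))  # below 0.5: resembles the negatives
print(round(score(w, b, [0.85, 0.85]), 3))  # above 0.5: resembles the positives
```

The score is the sigmoid of a weighted sum, so it can be thresholded (0.5 here) or returned directly as a confidence, which is the kind of output a scoring API could hand back.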

So the first question when you look at machine learning is: is machine learning right for you? There's a lot of hype around machine learning, and that's with good reason; it can be really effective. But ultimately you're just trying to model a problem mathematically, and there's not much magic to it. You can use it to make predictions without explicit programming, but you yourself are still going to need to know how to program. You can't just throw data in and expect something useful out. There's going to be a very heavy workload up front in terms of collecting data, parsing data, and transforming it into a format that's going to be useful, and that's before you even get to feature engineering, where you're using your own knowledge to lead the model in a particular direction.

Ultimately, though, that allows us to be more productive. We can do things like automate decisions: we want to offload the more manual work to an algorithm. If there are simple decisions you can automate, like with Deep Drop, you can be more productive and focus on other things. The industry is growing super fast; there's some stat that goes around that computing power doubles every 50 days or something. There's a lot out there and a lot to be done, and I think some basic knowledge of machine learning will be required going into the future. Every time I go through and explain a neural network to one of my colleagues, it's still a little bit magic, but it's mostly math. In fact, it's only math, and I wouldn't be afraid of the math piece: the math is already done for us, so we don't have to struggle with it.

On the right is a little excerpt from Talk to Transformer. I went to talktotransformer.com, which is just the GPT-2 model (a big language model) hooked up to a website, and simply asked it, "Is machine learning right for you?" This is what it came up with. In some ways it can be coherent, and in other ways it kind of talks around the subject. So, offensive machine learning.

I've stopped doing ops at Silent Break for now, and I focus mainly on building tools and research that support our ops team. Offensive machine learning is simply the application of machine learning to offensive security problems, whatever those may be: generating phishing emails, finding administrative access faster, and so on. We use it as a blanket term to separate that work out. It helps us reduce cost: hunting for admin access, or for information in really large networks, is very costly and can take quite a long time, especially if you're going through, say, a thousand file shares. It'd be really nice to have some sort of intelligent system to go through them and automate those simple decisions. It also lets us scale operations. On the defensive side, machine learning isn't being looked at as a replacement for human interaction; at least at first, it's about scale. And we can dig through our data and create advantages. Red teams traditionally haven't collected or even cared about the data they're looking at, but by digging through that data we can create our own advantages, especially as networks get tighter and more products land on the endpoint.

Then there's the adversarial piece. We count adversarial machine learning under offensive machine learning because it helps us further our more nefarious goals: if we can bypass Proofpoint's model with an adversarial model, that helps us further our offensive objectives. Machine learning is awesome to dig into if you're into data, or even if you just have that little spark, that love for numbers or data. Even if you're not into math, it's not a math problem anymore; the math is done for us. It's an engineering problem now. Some people make the analogy that data is oil; if that analogy is true, then most of my work is building the drill. We have a lot of data that we can't hold on to, so I have to build some sort of drill so that we can transform it before it gets deleted. The math is taken care of, so if you're interested, I highly recommend digging into it, because it's not as hard as you would think.

We're able to model complex relationships. If we're looking through Active Directory, I can find a little nugget of information way faster with some sort of similarity algorithm than I could by scrolling up and down in a text file. With computing, you can crunch huge amounts of data almost without limit, although the data we have isn't billions of data points; we have maybe thousands, so it's not a ton. You can make it as complex or as simple as you want, anything from binary classification to enormous language models to reinforcement learning to combining all of those together into one coherent model. Really, it's about bringing out the operators' sixth sense. We obviously have a lot of experience, so it's about modeling those operator decisions: Why did you look in that file share? Why did you run this particular sequence of commands in this particular way? We're trying to encode our operators' experience and knowledge into these algorithms in some coherent fashion.

And it's not only us looking at this to support operations; a lot of other companies are doing the same, so you're going to need some knowledge of it. Otherwise you'll be operating on a network, something's going to break, and you're not really going to be sure why. You won't be able to explain it, and you won't have any real recourse for troubleshooting. As networks get tighter, those opportunities are getting much [inaudible]; we're losing access more often than we were in the past. Unless we can explain why, we can't go back and fix our tools, so we need to have some knowledge of machine learning.

That way we can go back, reintegrate, do some research, and figure out what's going on. Everybody's jumping on the bandwagon. A lot of you will remember application whitelisting: three years ago at Black Hat, everything was application whitelisting. Then you had the LOLBins (living-off-the-land binaries) project, and as it turns out, there's actually a lot of stuff that can execute things, so people kind of stopped talking about that. Now machine learning is kind of the new thing.

So, phishing. This talk is really about phishing, and over the course of maybe three years we've seen a significant increase in the amount of effort required to phish. This is just a generic chart with numbers that I made up, but what it shows is that the number of interactions we're having to do is going up. So is the number of platforms we have to use, whether it's hosting documents on S3, Azure, Dropbox, or anything else, and so is the number of techniques. We used to be able to just send in a simple macro and it would work, but now we're sending a macro, an HTA, an LNK file, an LNK file wrapped in an ISO wrapped in a zip file. So the number of emails is going up, along with the number of interactions with targets and the number of techniques we have to send them.

A hundred percent of emails are being inspected, whether by a spam filter, a third party, or on site; everything's being inspected. I'd say probably 10% of our payloads are being sandboxed, so that's still not a huge number. 80% of our emails are received, so the payloads do get delivered, and we do get a 60% click rate. The issue is the endpoint protection we're generally up against: there are several products, not any one, and you're having to balance bypassing two or three different endpoint products at once. So our effort is going up.

Through ops there's this process. We have a comfort level: ops are running smoothly, everything's working fine, work's getting done, project managers are happy, everyone's happy. Then there's some sort of change; people start implementing new technology or whatever it might be, and there's discomfort. Ops take longer to complete. Tools work, but there are challenges: maybe some techniques have fallen off in their usefulness, techniques are getting caught, or we can't use one of our favorite techniques for whatever reason. Ultimately that just leads to work taking longer, so we need to do something. We adjust, and work is on hold as we rewrite our tools. We have to test and make sure everything works, and then we can start pushing things out, but there's definitely a delay in work. In terms of phishing, the initial access piece, I'd say we're definitely at that point of discomfort and adjustment. This talk is a representation of that, and the research I do is a representation of the adjustment we're having to make because of that discomfort around phishing.

But red teams, or attackers, have had it easy for a very long time, so a lot of people maybe complain that it's difficult. All of the teams we talk to say it's getting more difficult, and you're seeing a rise in assumed-breach testing. But I think we're just in a balancing phase where defenses are getting better and attackers are going to have to raise their game a little bit.

So we're going to go through a little exercise and talk about how we do things. First, when we're phishing, we start with a persona. This is a fake person: they have work experience, they went to school, they're just a regular person, and they represent our team on the Internet. I would say each of our operators is responsible for maybe four to six personas. Thanks to machine learning, the faces you see on the left are all fake; they're not real people. You used to go on Google, search for a person, go to about the fifteenth page, and steal a profile picture. You don't have to do that anymore; you can just generate a fake person. So maybe deepfakes are actually good for the general population in this way. You can go to thispersondoesnotexist.com, generate a face, throw it up on LinkedIn, and fill out all the requisite information. Then you have some sort of presence: you have an email address, you can build out all the social media accounts you need, and so on and so forth.

But really, when you're looking at phishing, you want to get the right combination of things: you need a persona, you obviously need a pretext, and you need some sort of target. Our most successful phishes probably come from our personas that are young women. My two favorite pretexts are an executive recruiter or a new college graduate, and my targets are men aged 45 to 60 (vice presidents, directors, C-level, those kinds of people) or similar-age women in similar positions. I tend to find that if I'm phishing from a young woman's persona, older women are less interested in helping me out, whereas men aged 45 to 60 (and this is anecdotal) are generally extremely responsive. They're more ambitious, and they're generally more aggressive about following up on my emails, so they make a very nice target. They're more involved in their careers, or they're always looking for the next best thing. Executives at the very largest companies on the Fortune list are different, though; the recruiting process is a little different there, so we tend to stay away from executives at really, really big companies.

Posing as a young man, I like the job advice pretext: "Hey, here's my resume, could you take a look at it?" or "It looks like you took a similar career path to me; I'm just wondering how it worked out for you. Would you be interested in looking at my resume or having a talk?" The target can be men of really any age in any position. A lot of people want to share their opinion and want to interact with you, so if you're asking someone for their opinion, they're more likely to come back to you. As time goes on, you say, "Do you want to get on the phone? Here's my calendar," and the calendar is just an HTA sent through a link on S3, so you get execution that way. So for the young persona, the pretexts are job advice or life advice, which are similar, and the targets can also be older women in any position: if you're asking for advice rather than offering a position, I tend to find that older women are more inclined to help you out or take a look, if you meet them in a business fashion.

Some techniques for targets: be professional up front and build a relationship. Make sure you're following up with your targets; don't just shoot off a phishing email and never come back. If you've found someone who's going to execute your phish, send them multiple payloads and use them to troubleshoot on the network. If you're recruiting, salaries should be competitive but not egregious; they shouldn't be ridiculously out of range. In lieu of that, you can use stock options: IPOs, anything tech, and fintechs are very popular at the moment. Something like, "Hey, we're a startup," and if you're phishing an accountant, "Do you have IPO experience?" They're going to know exactly what that means for their paycheck. Ultimately you want to play to your strengths and to your targets' wants, needs, and expectations of your persona. If you're phishing someone in Australia, for example, make sure you put the u's in the right places for the different spellings.

Then targets. We like going after executives, but they're high risk, high reward sometimes, and the reward's not always there because they don't always have access to the information you want. We love phishing interns: they're new, they're fresh, and they're looking for a new job that pays more, so they're always good. Marketing is pretty hit or miss. Sales is really good; they'll click on anything that says RFP or "here's the invoice" and so on. HR used to be our favorite, but not so much anymore: they're used to dealing with people, and they have processes for shipping your document off somewhere, so they're not a favorite of mine anymore. Project managers are also a favorite; they're used to receiving documents from externals. IT folks are high risk, high reward.

I recommend LinkedIn Premium; it pays for itself. You can Google dork for some document on their site, pull the doc, poison it, and send it back in. Then you start chatting with people. I pulled these messages out of our LinkedIn chats, and some of them are kind of funny. "I think you're beautiful" is not appropriate for LinkedIn, but you should use that to your advantage if you're trying to phish this person.

Unstructured text is really painful. We have two scenarios when we're chatting with targets. In one, you ask, "Are you interested?" and they say, "No, I'm not interested, I'm happy where I am." Cool, thank you, move on to the next one. In the other, they are interested, and you can start to converse with the target. When you're conversing, you want to give yourself options: "Hey, that link didn't work for me." "Apologies, we're running a new system; here's another one." "The doc couldn't open." "Here's an HTA." Oftentimes, if you find a target who is willing to give you feedback, you should definitely push the envelope until you get caught.

Now we want to turn this into a machine learning problem. How can we do this? We can use word embeddings, which are just vector representations of words that machine learning models can work with. Given input Y, what's the probability of X? Given a conversation where this person is interested in a job, what's the probability that the model should output this text versus some other text? It's really a joint probability problem. For training, you gather your email logs, recreate the conversations as input/output pairs, and create a vocabulary.
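The data-preparation steps described here (recreate the conversations from email or chat logs as input/output pairs, build a vocabulary, turn the words into numbers) could look roughly like the sketch below. The `(sender, text)` message format, the whitespace tokenizer, and the reserved `<pad>`/`<unk>` ids are illustrative assumptions, not the speaker's pipeline; a real setup would use a proper tokenizer before feeding the ids to an RNN or LSTM.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # reserved tokens (an illustrative choice)


def tokenize(text: str) -> list:
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def make_pairs(messages):
    """Turn an ordered list of (sender, text) messages into (input, response) pairs.

    A pair is recorded whenever the sender changes, i.e. a message followed by a reply.
    """
    return [
        (prev_text, next_text)
        for (prev_sender, prev_text), (next_sender, next_text) in zip(messages, messages[1:])
        if prev_sender != next_sender
    ]


def build_vocab(texts, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {PAD: 0, UNK: 1}
    for tok in sorted(counts):  # sorted so the ids are deterministic
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace each token with its id, falling back to the <unk> id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]


# Hypothetical log: two sides of a short exchange.
messages = [
    ("us", "Hi, are you open to new roles?"),
    ("them", "Maybe, what is the role?"),
    ("us", "A director position, happy to share details."),
]
pairs = make_pairs(messages)
vocab = build_vocab(text for _, text in messages)
encoded = [(encode(src, vocab), encode(tgt, vocab)) for src, tgt in pairs]
print(len(pairs))  # 2
print(encoded[0])
```

Sorting the tokens before assigning ids keeps the mapping stable between runs, which matters once a trained model depends on those ids.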

Turn all those words into numbers, and then throw them into a recurrent neural network, an LSTM, or even something larger like GPT-2, which is a really large language model. I'm sure a lot of you have heard of AI Dungeon, which I think came out of a student at BYU here in Utah. It's looking at the probabilities of what the next word is in a given context. When you run through this, you can see what comes out. This is one of our emails: we gave it the context "Hi, [inaudible] graduate looking for a job," and the model just finished it for us, including the name. It's not great, but emails can be corrected, and the emails are guaranteed to be different. Here's another email. This is useful where a blue team is looking for similarity across our emails. Say we fire off five templated emails and one of them gets caught; these language models ensure that all of our emails are different, so they're not as likely to get found. From a cost perspective, writing five fresh emails before phishing is costly and time-consuming.
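The talk describes language models as estimating the probability of the next word given its context. A count-based bigram model is the smallest runnable version of that idea; it is a deliberate simplification, not the RNN, LSTM, or GPT-2 models mentioned, and the three-sentence corpus is made up. Sampling from the distribution (rather than always taking the most likely word) is also what makes each generated text come out different.

```python
import random
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """P(next | prev) estimated by relative frequency (no smoothing)."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}


def generate(counts, prompt, max_words=20, rng=None):
    """Extend the prompt by sampling one next word at a time."""
    rng = rng or random.Random()
    words = prompt.lower().split()
    prev = words[-1] if words else "<s>"
    for _ in range(max_words):
        probs = next_word_probs(counts, prev)
        if not probs:
            break  # unseen context: nothing to sample
        nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == "</s>":
            break  # end-of-sentence token sampled
        words.append(nxt)
        prev = nxt
    return " ".join(words)


corpus = [
    "thanks for reaching out about the role",
    "thanks for the update about the project",
    "happy to chat about the role next week",
]
model = train_bigram(corpus)
print(next_word_probs(model, "about"))  # {'the': 1.0}
print(generate(model, "thanks for", rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible; leaving it out gives a different continuation each time.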

We want to scale operations, so if we can generate emails, even if they have to be corrected, that's awesome. You can generate text almost infinitely. In my testing, generally about thirty percent of them are useful, and a hundred percent of them need some sort of tweaking, and you can go from there. It's coherent, but it still sounds kind of odd.

We can do the same with chatbots. When I first started researching this, I went looking for a chatbot that does recruiting, or some example code, and as it turns out there are legitimate companies that do this, which I was surprised to find. It's the same process: gather chat logs, recreate the conversations, train your model, and see what comes out. Our issue, generally, is that we don't have a ton of these conversations. A chatbot at a recruiting company might have millions of examples; we probably have maybe 2,500, so comparatively our models are going to be worse. With the link at the bottom, you can go to LinkedIn and pull out your messages. This is an example of a conversation from our chatbot: "Hi, are you interested in a job?" "I am interested, what is it?" And then the response doesn't make sense in the context, so these are kind of difficult to get away with.

I would say the main issue we have is that we're hindered by the amount of data we have. We just can't keep data for the sake of machine learning; it's unpopular with our clients for us to keep data, even though clients will ship all their logs off to Carbon Black in the cloud. It's obviously a different kind of information, but I'm wondering if that will change in the future. Adversary simulation products, for example, are going to start to implement AI at some point, and they're going to have to keep data.

Templates, I would say, are generally easier to write and easier to send; you're in control of everything, and they're still not getting caught. But when they do, they will get you caught: if a blue team finds one email, they will likely find the rest of them. Chatbots and AI language models are just going to get better from here. We're kind of at the beginning of it, which is exciting but also scary.
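On the defensive side of the template point (if a blue team finds one templated email, it will likely find the rest), one simple way to flag near-duplicate messages is Jaccard similarity over word shingles. This is a minimal sketch: the shingle size `k=3`, the `0.5` threshold, and the sample messages are arbitrary assumptions, and comparing every pair is quadratic, so large mail volumes would call for something like MinHash instead.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles (contiguous word k-grams) from lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(emails: dict, k: int = 3, threshold: float = 0.5):
    """Return (id1, id2, score) for every pair of emails at or above the threshold."""
    sh = {eid: shingles(body, k) for eid, body in emails.items()}
    hits = []
    for a, b in combinations(sorted(sh), 2):
        sim = jaccard(sh[a], sh[b])
        if sim >= threshold:
            hits.append((a, b, sim))
    return hits


emails = {
    "msg1": "please review the attached invoice for march and confirm the amount with accounting by friday",
    "msg2": "please review the attached invoice for april and confirm the amount with accounting by friday",
    "msg3": "lunch is moved to noon in the large conference room",
}
print(near_duplicates(emails))  # [('msg1', 'msg2', 0.625)]
```

Swapping a single word changes only the three shingles that contain it, which is why lightly edited templates still score high while genuinely different text scores near zero.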

There's a ton of work to be done in organizations everywhere, and there are new risks presented; the whole adversarial piece is another one. I'll be on the BSides SLC Slack and on Twitter, and I'll put these slides up on GitHub; the link is here. Okay, so we have a couple of minutes for questions. Are there any questions?


Can people hear me? Yeah, we can hear you. I don't see any questions in the Q&A or in the chat yet, if anybody has anything... but we just got one. It says, "I have no experience in ML. Where should I start?"

I would start with binary classification; definitely start there. There's a great book called Make Your Own Neural Network by Tariq Rashid that breaks a neural network down into its simplest components, and it's dead simple to understand.

"Do you have any AI tool or research references?"

Yeah. There's a DEF CON AI Slack, and if you're interested I would definitely recommend joining it, in addition to Googling. I think there's a lot of research done in labs that isn't very practical for our use cases, so there's going to be a lot of effort needed to go through that academic research, pick things out, and apply them to the offensive use case.

Another question: "If attacks are moving this way, from a blue team perspective, how do we go about combating this?"

I think a lot of the tools don't really exist yet, or they're starting to be built, but the maturity isn't there. Even though we're doing this research, on networks we're generally not having issues with machine learning products that we know of. I'd say Proofpoint is an exception to that; they're pretty good at what they do. So I would just say hang on: it's going to be a year, two, three before these products are really useful for you. In the meantime, if you have your own data, you can go through it and build your own models, for example looking at cosine similarity scores for malicious events, things like that.

So we're out of time, but there's one project I wanted to highlight for the defensive side. It's a project called cyBERT (C-Y-B-E-R-T); I'm not sure who put it out, but they're looking at this kind of stuff, so there's definitely a goldmine there for the defensive side. Excellent. That's it for me. If you have questions, I'll be in the BSides Slack, and if you just want to talk machine learning, I'll be there as well. Thank you, guys.
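The suggestion in the Q&A to compute cosine similarity scores for malicious events can be sketched with bag-of-token count vectors. Everything specific here is hypothetical: the event strings, the whitespace featurization, and scoring against a single known-bad reference event.

```python
import math
from collections import Counter


def vectorize(event: str) -> Counter:
    """Bag-of-tokens counts for one event string (a hypothetical featurization)."""
    return Counter(event.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference event and incoming events.
known_bad = vectorize("winword.exe spawned mshta.exe with remote url")
events = {
    "e1": "outlook.exe spawned mshta.exe with remote url",
    "e2": "explorer.exe spawned notepad.exe",
}
ranked = sorted(
    ((eid, cosine(vectorize(text), known_bad)) for eid, text in events.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for eid, sim in ranked:
    print(eid, round(sim, 3))  # e1 0.833, then e2 0.236
```

Ranking by similarity to known-bad activity surfaces e1 (five of its six tokens match the reference) well ahead of e2; in practice, choosing the features matters far more than the similarity formula.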
