
So thanks for coming. This talk is about detecting social engineering attacks using natural language processing techniques: we process the sentences and see if they're suspicious as they come in. That's basically what we're going to do. But I just want to first acknowledge my co-authors. You can see toward the end of this list is me, and then Marcel, who's standing right here, but the first people there, who did all the coding, are my students from last year. They need credit for this because they actually coded the bulk of this, although they are not here giving the talk. So, some introductions. This is Ian; he's a professor in computer science at the University of
California, Irvine, and he does research in hardware verification and security, and a lot of natural language processing stuff. And this is Marcel, Marcel Carlsson. He's principal consultant at Lootcore; he does red teaming, consulting, and security research in general, and he also likes to do hardware hacking, and social engineering is his favorite thing. Did some fun stuff with some card terminals today. Yes. So quickly, we'll do a little primer on social engineering. I'm sure you guys are familiar with it, but we'll quickly go through it. We have a definition here that's pretty good: social engineering you can sort of explain as any act that may influence a person to take any action
that may be in their best interest or not, right? Because remember, not all social engineering is evil: you might have a doctor or someone who will actually social engineer you into doing something that's good for you, even though you might not know that's the case. So again, social engineering is nothing new; it's been around for a very long time. You have con men, con artists, and your kids are very good social engineers: they want ice cream or something, they know how to do that very well. It's a very fascinating but also complex topic. You'd need a lot of time to go into all the details, but you have things like body language, micro-expressions, framing,
pretexting, elicitation, things like that, right? So there's a lot that goes into social engineering. If you have more questions we can talk about that afterwards, but we don't have time to go into details now. Let's talk a bit about organizations and companies; they're really, I would say, exposed to social engineering. If you think about a modern business or organization, it's a complex ecosystem of technology, processes, and of course humans working together, and usually, I would say, the threat of a social engineering attack is underestimated by most companies, and awareness is usually quite low. I would say it's become better the last few years; some companies do some
pen testing with spear phishing and stuff like that, but normally it's pretty bad. So if you do any sort of spear phishing or social engineering attack, well, it will usually work, and that's why it's still being used today in most attacks, right? Another contributing factor, of course: if you think about the solutions in a company, even business processes, often there's a burden on the individual user to make security decisions, and that will of course be exploited by social engineers. And that leads us to trust relationships. If you think about your typical organization, they will of course engage service providers and trusted business partners, and often you
will have someone running your physical security, like facility management and stuff like that, and of course this will expose a lot of trust relationships that, again, a social engineer will target, attack, and exploit. And to identify these trust relationships, we, or social engineers, will of course use open source intelligence gathering. Most companies and their employees will post all kinds of information on social media platforms, so what you can do is harvest all this information, it's really metadata, and we automate that so we can harvest a lot of stuff quickly. And of course there are data dumps and data leaks out there with passwords and usernames, so you can find out the username formats and stuff like
that when we need to do phishing, and all the technology infrastructure information like IP addresses, all sorts of stuff. We'll gather all that information to build up profiles of companies and people that we really use for our attacks. And there are some pretty cool new tools coming out. If you look at the topics of this conference and Black Hat and so on, you'll see a lot of machine learning and cognitive science stuff. This is something called Microsoft Video Indexer, and it's free: you can just sign up and then you can actually upload videos. Here's the fake Obama video from BuzzFeed. You upload that and it will do some pretty cool analysis, so it'll
extract metadata: it will actually do facial recognition and find all the people in the video, get the timeline, do a transcript of all the spoken words, and extract keywords. And you can see here it will even translate to different languages, and it will do OCR to take out text that's in the video. Pretty cool stuff, right? So you can see it's moving in the direction that we can even use video now to extract metadata for our profiling. A quick word about methodology. Like we talked about with trust relationships, it's all about gaining trust: we'll extract information, influence the target, keep doing that
a couple of rounds, and then eventually we will have a compromise. And talking about social engineering attacks, it's nice to blend them: you can do it remotely, via email, messaging, SMS, voice, these sorts of remote attack vectors, or locally, where you show up in person, and then you can of course mix and match these things. Attackers are lazy and they like to work effectively, so of course: path of least resistance. Basic stuff works, and you don't really need zero-days to do these sorts of attacks. Here's an example of a typical phishing email. This is from the DNC attack; there's a guy called Podesta, John
Podesta, right? So someone phished him pretty good. They used some basic obfuscation to bypass the protections in the email solution and such, hiding the target URL there when you click on "change password". You can see here, this shouldn't work, right? But it works every time, pretty much, because people don't pay attention. You can see here it's not a secure URL; it's actually a .tk domain, as you can see there, so you shouldn't click on it, but people do. Here they've used his own picture from his Google+ page to make it look nice, so he would actually put his credentials in there and be phished. As you
can see, it's not very sophisticated stuff, but this actually works; that's the scary thing, right? So, talking about what's coming soon, there's something called deepfakes; I'm not sure you guys are familiar with this. You can take a video now and use some pretty basic tools on your laptop and take, for instance, a set of videos or pictures, so if you take a porn video you can put some celebrity's face on there, or maybe your girlfriend that you didn't like. That's actually moving really fast in terms of development. It doesn't have to be porn, it could be... yeah, I'm saying it's always porn that sort of drives the evolution,
right? So that's kind of funny, that's just where it always starts, but I'm saying we will see this for social engineering attacks as well. OK, so of course you can alter video, like the Obama video; you can put someone else's face on a video, and that's pretty scary stuff. Which brings us to stuff like adversarial games, right? That's a concept where we have two counterparts: one will produce fake stuff and submit it to someone who will judge whether it's real or not. Here the example is dollar bills and the cat-and-mouse game. Of course the faker will become better at tricking the
investigator who will look at the fake stuff, and the investigator will learn how to spot the fakes. And if we introduce machine learning and AI into the mix, we have something called generative adversarial networks, or GANs. This is sort of the same idea, but here we introduce something called backpropagation. So again we have a generator creating fake content based on samples and submitting it to a discriminator that will look at it to see whether it's real or not, comparing to real samples, and then the determination data will actually be fed back to the discriminator and the generator so they actually become
better, so that eventually the generator will produce content where the discriminator will not be able to tell whether it's a fake or not. And that's actually interesting from a social engineering perspective, because you could do this to audio: if you have a bunch of audio snippets, we can create a model where we generate fake audio that will be very close to the real thing. And again, there's research being done in this area and it's moving pretty fast, so it's a very interesting area for social engineers to look into. And yeah, so it's looking pretty bad, right? A lot of exposure, a lot of attacks working nicely. So what can we do
to defend ourselves? Thank you. So I'll start talking about the tool. What it is, is that yellow box right there; it says "social engineering detection". That's what we provide. It takes text that some potential attacker has spoken and it analyzes the text to see if it is suspicious or not, and if it is, then it reports it back. So you could connect it to an email client or a texting app or something like that pretty easily, since that's already in text form, or, if you had speech-to-text, you could use any audio, video, voice; you could put it on a phone, right, if you did
speech-to-text first and just fed it the text. And then it would output some kind of alert message. Of course, right now all it's going to do is print text to the screen, "scam detected" or something like that, but you could easily put in any kind of warning. So yeah, we're analyzing each one of these sentences as it's spoken or sent. And just to differentiate this from a lot of previous work in social engineering: most of it is just phishing emails, you know, but what about if you do it in person? What about if you do it over the phone? Then you only have the content to look at; you don't have
metadata and, you know, URLs and all this. So what we're talking about is effective as long as you have content; it doesn't matter what the vector was, you could use our approach. OK, so the basic idea for detecting social engineering attacks: social engineering attacks are complicated, but somewhere in that attack the attacker has to ask some inappropriate question, meaning a question whose answer is private, like, you know, "what's your social security number?" or whatever, or they've got to give you an inappropriate command, tell you to do something you shouldn't do, like click on the link or something. So we're assuming
that somewhere in the social engineering attack they've got to do one of these two things. I'll call it the punchline. Maybe they're warming you up, flattering you, whatever, making you feel good, but at the end of their social engineering attack they've got to be like, "OK, what's your social security number?" They have to do one of these two things: ask a question or issue a command that's inappropriate. So that's what we're trying to detect. Now, we have a demo, right, and in case that fails I have a little mini video of a demo right now that I will run
through, and I'll just sort of talk through it. It worked five minutes ago, but you know... like ten minutes ago. OK, so let me start this guy. Basically all I'm going to do is just type in sentences and it'll tell me if it's a scam or not. OK, so it is slowly starting. Now I'm going to start running it. It's called temp; my students named it, they're lame, but don't worry about that. OK, "enter a sentence", so you've got to type a sentence in. I'm just going to type some random sentence, "nice weather we're having", right, and it's going to say that's fine. So hit
enter, and it's going to say... the first one takes a long time; it's usually quicker. "Normal sentence." I'll highlight that. OK, so that worked. Now I'm going to type in something that's more suspicious, I forget what it is, oh yeah, a command: "give me money". That's suspicious if somebody's like "give me money", so we detect that: "scam detected". Now notice "give" and "money": we find the verb and the direct object, and that's the idea with the commands, that verb-direct-object pair is suspicious. Now, we're not just searching for the words "give" and "money"; we look at how they're used in the sentence. So here, this is "give me a poodle for money". That's a
silly sentence, but I bring it up because it's innocent and it has "give" and "money" in it. But we will not detect this as being a scam, because we can see that "give" and "money" are not used that way: it's "give" and "poodle"; "poodle" is the direct object. So this is an innocent sentence. Just because it has the word "give" in it and the word "money" in it doesn't mean it's malicious. We're not just looking for words; we look at how they're used in the sentence. Then I've got some questions. Those two were commands; here's a question: "what is your password". That's clearly bad, even if I spelled it wrong. So hit enter, and it's going to
tell me it detects it as a question, and in a second it's going to say "scam detected". OK. And then you can ask some question whose answer is not suspicious, and it won't detect it. So, "what is your age". I didn't consider that to be suspicious, so I didn't put that in there; we have a database of things whose answers are suspicious, and that's not in there, so it's a normal sentence. So that's basically what it's going to do. If I have time, and I should have time, I'll show a longer version of that demo. OK, so, system structure. Now this is big and
I'm not even going to talk about the whole thing, but the idea is it starts with text that's spoken by some attacker, and in the end it gives you malicious sentences, over on the right, meaning sentences that look suspicious. Say you're scanning an email and one or more sentences look suspicious: then we say this is a phishing email. Or if you're talking and one of the sentences this person says is malicious, then we say, OK, they're scamming you, warning. So the main thing I'm going to talk about is over here. OK, so first thing, we have
the sentence preprocessing: we put periods in the sentences, because people don't necessarily use proper punctuation, so we have to put that in automatically sometimes. But say you've got these sentences: we've got to determine, is it a question or is it a command? Maybe it's neither, in which case we ignore it. But if it's a command, we do command analysis to see if it's a malicious command; if it's a question, we do question analysis to see if it's a malicious question, you know, a private question. So these are the things I'm going to talk about: how we tell if it's a question or
command, and then once we know it's a question or command, how do we analyze questions, how do we analyze commands. OK, so how do you tell if it's a question or a command? This, and a bunch of the things we do, are based on parsing. We take the sentence and we use a syntactic parser, which tells you the grammatical structure of the sentence: the part of speech of each word, noun phrase, verb phrase, all that stuff that you've forgotten since sixth grade, which I forgot and had to learn again. There are parsers that automatically do this, and we use the Stanford Parser, a popular one; it works really well. So it gives you a
parse tree of the sentence that shows you the different parts of speech, and we look at the parse tree to look at the structure of the sentence, and we basically find patterns in there that we think are suspicious. So here's an example. Here's a sentence, "can I eat", and that's its parse tree over there. Just to say what these tags are: S is for sentence, noun phrase, verb phrase, verb, modal; anyway, it's a bunch of tags, and you get a tree like this that shows you the internal structure of the sentence. Now this is a question, and notice, compare "can I eat" with "I can eat", the way we made it a question. Like,
one way to make questions in English is you swap the subject and the modal verb, right? So "can I" instead of "I can". Actually it's the same in French and a bunch of other languages. But that's one way you can detect the question. Actually the parser does this for us: this SINV tag stands for subject inversion. If you see it in the sentence, you know that this is a yes/no question where they inverted the subject and the modal. So this is given to us for free: we just get the parse tree using another tool and we say, is that SINV tag in there? Yes? It's a question. And also
the SQ tag, because there are other types of questions. This is a closed question, a yes/no; there are also open questions, "what is this", "what is that", right? And we can detect those too. Actually, there's a series of slides I left out just because there's a lot of detail in it, but I have a white paper that gives all the detail, and I'll be happy to give it to anybody who wants to read it. There are a bunch of tags we look for, and we say, look, if we see these tags in there, we know it's a question.
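As a rough illustration of the tag check just described (a minimal sketch, not the authors' code): assuming the parser hands back a bracketed, Penn Treebank-style tree as a string, testing for a question can be as simple as collecting the node labels. The tree strings below are hand-written for illustration rather than real parser output, SBARQ is included only on the assumption that it is one of the open-question tags alluded to, and `QUESTION_TAGS`, `node_labels`, and `is_question` are hypothetical names.

```python
import re

# Clause-level tags treated as question markers. SINV and SQ are the tags named
# in the talk; SBARQ (wh-question) is an assumption based on Penn Treebank usage.
QUESTION_TAGS = {"SINV", "SQ", "SBARQ"}


def node_labels(tree: str) -> set[str]:
    """Return every node label that follows an opening bracket in the tree string."""
    return set(re.findall(r"\(([^\s()]+)", tree))


def is_question(tree: str) -> bool:
    """True if the parse tree contains at least one question-marking tag."""
    return bool(node_labels(tree) & QUESTION_TAGS)


# Hand-written trees for "Can I eat?" and "I can eat." (illustrative only).
closed_q = "(ROOT (SQ (MD Can) (NP (PRP I)) (VP (VB eat)) (. ?)))"
statement = "(ROOT (S (NP (PRP I)) (VP (MD can) (VP (VB eat))) (. .)))"

print(is_question(closed_q), is_question(statement))
```

The check works on labels only, so it does not depend on punctuation: a spoken or unpunctuated question is still caught as long as the parser assigns one of the question tags.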
We have to check if it's a question, and if it's not a question, then we have to check if it's a command. So the way we check if it's a command: there are several different ways to write commands. The common way is just to have a verb without a subject in front of it. So, "go home", right? I didn't say "he goes home", I said "go home", with no subject to the left of it. Or "stop right there": you just put the verb right in the front, and there's no subject in the sentence at all. So if you see a sentence like that, with no
noun phrase to the left of the verb, then you say, OK, that's a command. It's one type of command, a common type. There are other ones, like more polite ones: often you'll say "please go home", right? Social engineers often do that; they want to soften it, they don't want it to seem like an order, so they say it in a nicer way. Another thing they do is suggest: "you could go home", "you should go home", "it would be good if you went home". These types of soft suggestions, we detect those too. "Could" and "should" are modal verbs, and we basically look for a pattern in the tree where you see a "you", then a modal verb, and then the
rest of the sentence, you know, the verb and the rest of the sentence, and then we can say that's a command too. So there are several different types of commands that we detect by looking at patterns in the parse tree. So we can detect if it's a question or detect if it's a command. Now, once we know it's a question or a command, we've got to analyze the question or command to see if it's malicious. So, question analysis: how do we analyze a question? The basic idea is to determine if the answer to the question is private. If the answer is private, then sound the alarm; if it's not private, then no problem. So if you say "where is the bathroom",
the answer is not private, so no problem. But if you say "what's your social security number", the answer is private, so sound the alarm. So we want to tell if the answer to the question is private. I'm not going too fast, am I? No? OK, cool. So we want to tell if it's private; what do we do? We basically use research in question answering systems. Question answering is an active area of research; a lot of people do it, and there's a lot of industrial work in it too, where the goal is you ask it a question and it gives you an answer, and it looks through a massive database of facts to find the
answer, or it looks through the internet itself to find the answer. So it's like what Google does, except better: it understands the question well enough to be able to find the answer. I don't develop question answering tools, but they exist. They're pretty bad, because it's a hard problem: taking a natural language question and providing an answer is really hard. When I say they're bad, I mean they are not correct most of the time. But that's OK; they're state of the art in the field. For instance, the tool that we're using is correct 44% of the time
on their test examples, their test questions, which is low, right? But we modify it so that it's good for us, as you'll see in a second. Anyway, you look through some massive database and find the answer. So we're using the type where you look through a massive database, and the idea is you take a question in English and you've got to generate a query. Say I'm using a SQL database: I make some query, and then I use the query to search the database and I find the answer. That's how a lot of question answering tools work. So what we
do is, we make one change to an existing question answering tool: we say, look, instead of having this database of like 15 million facts, we only put in the private facts. Only the private facts. Then if you find it in there, you know it's private and this is a malicious question; if you don't find it, then it's innocent. Because, see, we don't need the answer to the question, right? All we need to know is, is it private or not. So instead of 15 million facts we'll have like a hundred facts that are the private facts, you know, whatever facts
you want in there, and we'll put them in the database. Actually, I should mention, we put in the information about the fact but not the fact itself. So say I said President Obama's age was private: I'd put that in there with a private tag, and I wouldn't put the age in there, because that would be stupid, if you just have a database full of private facts and somebody got hold of it. But we don't need the actual fact; we just need to be able to say, if you did a query search on it, it would find this entry, and that's enough, right? We don't need the actual fact, we
don't; we just say, look, we found it in there, it means it's private. OK, so the question answering system we use is called Paralex; here's their paper reference. It's open source, and we just downloaded it and use it. And these are three entries out of their table. They have a massive SQL table of over 15 million things that they culled from Wikipedia automatically, and everything's a triple: a relation and two arguments. So "official language of", Hong Kong, Cantonese; "plural for", bacteria, bacterium; and so on. So it's a massive database of facts. OK, so that's what they have, and what they
do is they take the English question, like this, they generate a query, and then they search the database to find the answer, and they're correct 44% of the time. You had a question? Go back? I was too fast, huh? Yeah, OK, sure. If you want any more information, I'm happy to give you the whole paper anytime, like after this, if you have a thumb drive. Anyway, so this is the idea. And we didn't do this, they do this, right, they make the query. Now one thing is, since this problem of question answering is really hard,
they don't generally make one query. They take the English and they make multiple queries, in fact like 30 or 40 queries, because it's basically guessing at what the right query is. It doesn't know what the right query is, so it basically tries lots of different possibilities. So it doesn't generate one query; it generates like 30 or 40 for one question. Then it ranks them, they have a ranking algorithm, and the one at the top of the ranking is the one they use. Then they search, and if they find the answer in there, that's what they give. Now this is an example, actually an actual example using their
tool, where it comes up with these two queries. It comes up with a lot more, but I'm going to show the top two. The top query, its answer is Steve Jobs, because it's misunderstanding the question. The second one is actually giving the date, giving the right answer. So for this question they answer incorrectly, because they come up with all these queries and choose the top one, but their ranking was messed up and they got the wrong one. But note that usually the correct query is somewhere in the top group of queries; they're just only choosing the top one. OK, so we change that. So one
thing we do is, like I said, we strip the database down to just information about the private facts. The other thing is that we use the top 15 queries. So we don't just take the top query and find the answer; we take the top 15 queries, search them all in the database, and if the answer is found in there, then we say it is private. Now what that'll do is increase the rate of true positives, because if the right query is ranked second or third, we'll find it. So we end up being correct 99% of the time. But it may increase the rate
of false positives, because maybe this thing isn't actually private, but one of the other 14 has an answer which is private. That's possible but extremely rare. Because remember, there's a database of facts in the world, this tool uses like 15 million facts, and we have like a hundred private ones. The odds that, if you pull fifteen random facts out of there, one of them is just accidentally in the private set are really slim, because you're talking like 100 out of 15 million. So the chance that this increases false positives is really slim.
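To make the two ideas above concrete, here is a minimal sketch under stated assumptions; it is not Paralex's API or the authors' code. Candidate queries are modeled as (relation, argument) pairs, the private store holds only those keys and never the answers, and the false-positive estimate uses the talk's simplifying assumption that the other candidate queries land on facts uniformly at random among roughly 15 million. All names and example entries are hypothetical.

```python
# Keys describing private facts; the answers themselves are deliberately not stored.
PRIVATE_KEYS = {
    ("social security number of", "user"),
    ("password of", "user"),
}


def is_private_question(ranked_queries, private_keys=PRIVATE_KEYS, k=15):
    """Flag the question if any of the top-k candidate queries hits a private key."""
    return any(query in private_keys for query in ranked_queries[:k])


def chance_of_accidental_hit(n_private=100, n_facts=15_000_000, n_other=14):
    """P(at least one of n_other uniformly random facts falls in the private set)."""
    return 1 - (1 - n_private / n_facts) ** n_other


# The correct query is ranked second here, so a top-1 lookup would miss it.
candidates = [("founder of", "password"), ("password of", "user")]

print(is_private_question(candidates, k=1))   # top-1 only
print(is_private_question(candidates, k=15))  # top-15
print(f"{chance_of_accidental_hit():.2e}")
```

With the numbers quoted in the talk, the estimate comes out a little under one in ten thousand, which is the sense in which looking at 15 queries instead of one costs almost nothing in false positives, provided the uniform-random assumption is roughly right.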
It hardly ever happens. So even though we look at all 15, the other 14 are probably not going to be private, because they're just sort of randomly spread out across the database. So that's basically how we analyze a question. OK, so in addition to analyzing questions, we have to analyze the commands too, which by the way are much more plentiful: in the data that we were looking at, you see a lot more malicious commands, like "click this link" or whatever, than malicious questions. Now, we're using phishing emails; if you did it in person, I would expect
a different answer, I would expect something different. But anyway, command analysis. The idea here is, if you get a command that's just telling you to do something innocent, fine, it's no problem. But if they're telling you to do something you shouldn't do, like "please tell me your social security number", then it should be setting off an alarm. So how do we distinguish which commands are suspicious and which are not? We basically summarize the meaning of the command as the verb and its direct object in the sentence, and then we have a blacklist of those verb-direct-object pairs. So for instance, take a left at the
corner: "take" is the verb, "left" is the direct object. "Please give me your password": "give" is the verb, "password" is the direct object. So we take the command, find the verb-direct-object pair, and then we have a blacklist of bad verb-direct-object pairs. If somebody says "give password", that is bad, and we flag it. That's basically all we do: we look it up in the blacklist, and if we see it in there, then we say it's bad. Now there's other stuff that I'm not going to talk about, for instance synonyms: my little demo version doesn't handle synonyms, but the real version does. It also does lemmatization.
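A minimal sketch of the blacklist lookup just described, assuming the dependency parse is already available as (relation, governor, dependent) triples with lemmatized words, where the relation label `dobj` marks a direct object. The triples below are hand-written to mirror the talk's examples rather than produced by a parser, the prepositional relation label is illustrative, and the blacklist entries and function names are hypothetical.

```python
# Hypothetical blacklist of (verb, direct object) lemma pairs.
BLACKLIST = {("give", "password"), ("give", "money")}


def verb_object_pairs(dependencies):
    """Extract (verb, direct object) pairs from (relation, governor, dependent) triples."""
    return {(gov, dep) for rel, gov, dep in dependencies if rel == "dobj"}


def is_malicious_command(dependencies, blacklist=BLACKLIST):
    """True if any verb-direct-object pair of the command is on the blacklist."""
    return bool(verb_object_pairs(dependencies) & blacklist)


# "Give me money": money is the direct object of give.
give_money = [("iobj", "give", "me"), ("dobj", "give", "money")]

# "Give me a poodle for money": poodle is the direct object,
# money only hangs off a preposition, so the pair is (give, poodle).
give_poodle = [
    ("iobj", "give", "me"),
    ("dobj", "give", "poodle"),
    ("prep_for", "give", "money"),
]

print(is_malicious_command(give_money), is_malicious_command(give_poodle))
```

Matching on the pair rather than on the individual words is what keeps the poodle sentence from being flagged even though it contains both "give" and "money".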
For instance, this verb might be in a different tense, "gave" or something like that, so it normalizes to find one lemma for every word; plural versus singular for nouns, it normalizes all that. But right now I'm not really going to talk about that. And when I say we did that, again, this is all built into the Stanford parser; they have tools for that already built in. So we have this topic blacklist; when I say topic I mean a verb-direct-object pair. You have to make that, and then you just look up whether the verb and direct object are in there. So how do we find the verb and direct object in
a sentence? We parse the sentence, and again we use the Stanford parser; they have a typed dependency parser, and that thing is very cool, actually. It basically tells you the grammatical roles of the words in the sentence. Specifically of interest to us is dobj. So if I said "please give me your password", in this parse it would tell us dobj, direct object: give, password. So I know "give" is the verb and "password" is the direct object. And this is given to us; the Stanford typed dependency parser already provides that, I don't even write that. There's another way
to find a direct object too: there are two tags, dobj and another one, nsubjpass, for passive sentences, but either way it gives you the verb and direct object. So that's how we find the verb and direct object of the sentence that we take in, and then we just look it up in the blacklist. Now, this blacklist is basically verb-direct-object pairs. You could compile these pairs manually; in fact that's a good idea, and we're going to explore that next week when I get back. But what we did was, we
have a pile of phishing emails we got from various sources. We took a hundred thousand of these phishing emails and looked at the verb-direct-object pairs in there, and what we wanted was verb-direct-object pairs that occur in phishing emails and don't occur in non-phishing emails. So to do that we took a hundred thousand phishing emails and one hundred thousand Enron emails, which were our non-phishing emails, and we computed this term frequency-inverse document frequency; I think it was mentioned in the last talk too. It's a metric where, if something appears in one set a lot and not in the other set, then it
gives a high value. So if you see it a lot in phishing and not in non-phishing, then it gets ranked highly. So we did that, and we just had some relatively arbitrary cutoff point, and everything above that we said, OK, that's our blacklist. That's what we did. There are a lot of cases where that's not the best thing to do, where you would want to do this manually, and you can do it manually too. OK, so anyway, that's how we got the topic blacklist. Demo. OK, I've got the demo, really cool; now let's see if it actually works. You saw the video, though, in case it fails. OK, so it's a little
hard to read, but this is a virtual machine, it's an Ubuntu virtual machine instance. So there's four windows in here. This is just my directories that I'm gonna need later; I might not even need this open. This is where the action's gonna go on: this is where I'm gonna type in the sentence and it's gonna tell you if it's a scam or not. Down here, these two are related to the question answering. So this is the question answering tool, it has its own little server there and that's running in that window, and then this guy is the parser, the Stanford CoreNLP parser, and this guy talks to this guy. So
when this guy needs to answer a question, it sends it to this guy, which parses it and sends a parse tree back, and this does the analysis. But most of the action's going on over here. Okay, so let's try this. Let me go over here. Let's say something innocent, make a sentence: "hello there". Okay, so that's not anything, right? And it says "normal sentence". And the dobj, the verb-direct-object pair, it didn't find anything, because it's not even a command; the commands are where we need that. So let's give it a command that's malicious. What's a good one? Oh, now I'm blanking on the example I was gonna use. Give me
money? No, I don't know if that was in there. See, basically I have a limited blacklist, I'll show it to you in a second. I've got to use something out of the blacklist, so let me stick with my plan, right? You could easily add that to the blacklist, but for this demo I have like eight pairs or something. Anyway, so let's say "give me money". And it's slow, I don't know why, but it will... okay, that's right: "scam detected". Yeah, good. Now if I say "give me a car", it shouldn't; it's just a normal sentence. Okay, so notice it says, you know... oh, that might
be bad too, but I can make it bad, hold on. So "give me money" it flags, and "give car", it said okay; give car, that's not in my blacklist. So what I'm gonna do is modify my blacklist, which I have open right here. It's this JSON file, blacklist.json, that's it. So I will open it up. Of course you customize your blacklist, right, depending on what's valuable to you, so you can put whatever you want there. So yeah, that's exactly right: the idea is that if you were using this at your company, you would put in verb-direct-object pairs that are relevant to what
you think people would want to get out of your employees, you know what I mean? So you would make it by hand. But here's a sample one that I've got, and you can see: give money, give resume, oh, I actually had send money in there, what do you know, borrow money, whatever. There's all those zeros; ignore the zeros, those are options that we haven't really implemented yet. So let me put give car in there. Okay, so I'll just add give car, and then I've gotta put the zeros in there. They're just options, yeah, like tf-idf is supposed to be
there; there's a bunch of things that we're gonna put in there soon that are not there right now, a bunch of numbers we're gonna put in. Okay, so I think I put that in there, so let me save it. Okay, now close it. Now I'm gonna quit the program and start it again so it reads that blacklist again. Now I should actually be able to say "give me a car", and, slowly, it will detect that as a scam. Yes, there it goes: "scam detected". Okay, so it's as easy as that. You can just add these verb-direct-object pairs in there to cater to your own needs, you know. Question? If it could
I do most of the time, and the car... So give car, car would still have to be the direct object of give. Can you give me a sentence? I'll be willing to try it; it may fail. Nobody? That's malicious... Okay, so just so you know, this demo is fragile, unlike the real thing. If you said cars, with an S, it will lose it just 'cause of that. The real thing that is on the GitLab handles that, but the demo that we scraped together quick doesn't do it. Oh, I'm sorry, I think he was next, and then, I'm sorry, I think you're next. So, I don't want to screw you over, but I
work, I used to work for a German company, and the Germans used to really hate English-speaking people because of the way we, the semantics in which we talk. And I've written a few sentences down here, right: "would you by chance know the...", "are you able to tell me the...". And I don't want to screw you over, because this is really awesome and you're trying to squeeze this into a really short amount of time, and they're quite complicated sentences, so, you know. This is guaranteed to fail, I'm sorry, dude. Sorry, what was the first one, just out of curiosity? Would you by
chance know the, you know, the password associated with the account? Okay, thanks. This demo will not do that, but the actual one would. So "would you", it will detect that as a modal and it would catch it. But most of the ones you're gonna say, it won't. Okay, but let me go on, that's alright. You're next, and then you. So you go ahead. Straight behind you, straight behind you, like way back, straight back there, you know? Yes, at the wall, that wall. I'm a ghost, that's exactly how I am alive, which is how I would like this next question to be treated, if we can. Here is an example sentence from a real-world
case: "I am in need of your [company name] API keys, can you please provide?" "I need": here it would say need and then keys, I mean... oh, you said "could you please provide it". Okay, so one thing it doesn't handle right now is pronouns. So "could you provide it", that could be caught if we handled pronouns. Now, pronoun resolution is a known thing in language processing. I'm not doing it, but if I were doing it, I wouldn't make it up; I could find an off-the-shelf technique and just apply it, and then it could handle that. Yes? Yeah, no, he said "it". Yeah, so the catch in the context there: actually the
word "it" wasn't even there. Oh, really? Yeah, I think it was "can you please provide". Oh yeah, then it wouldn't, yeah. So basically this is limited to one sentence at a time; to understand that, you've got to connect two sentences, and we don't do that. Could it simply handle the phrase "I am in need of API keys"? Well, it would get need, API keys, yeah, and then if that was in your blacklist it would catch it. Got it. Yeah, question? First I just want to say this is really cool, and I feel like the
audience is collectively trying to figure out ways to break this. Anyway, go ahead. I can think of lots of really cool ways to use this, not just for detecting phishing but also for catching bad user behaviors. I'm thinking you take your chat feeds internally, through Slack or whatever, and you want to stop people from sharing their passwords with each other, sharing API keys with each other, in a chat forum like Slack. This would be a cool way to see employees behaving badly. Yeah, you know, there's all sorts of evil uses too, but I don't go into that. I'm like, why isn't the US
government already doing this? Which maybe they are, right. So my question: how are we handling, not in the demo version but in the real version, real social engineering, like targeted social engineering in which we are creating a scenario that seems realistic, where I'm like, "hey, I'm the system admin, blah blah, please update your password"? We can catch that for sure: update, password. Yeah, meaning the targeted aspect of it. One of the beauties of this is that all that complicated stuff that you do, the pretexting, elicitation, all that, it doesn't matter. We're not even looking at that. We're just looking at the punchline, the end, where you're like, look, give me
the data, or do this. So does this flag on things like actual security communications, where they're like, "hey, you guys need to update your passwords"? Yes. So, okay, let me just, before I get to your question: basically the problem is this has no idea who is speaking, okay? So it assumes this is a foreign person speaking. If your mother asks you for a hundred dollars, you should give her the money, maybe, but if a random person asks you, then that's suspicious, right? So we would have to be able to identify the speaker, and that's part of what you're saying, and we can't do that. Meaning, there are other
techniques to do that type of thing that we will investigate. Yeah, you have to add that on top. Go ahead, you've got a question, right there, in blue. So it's kind of similar to the last two questions, kind of a combination. I know that you originally said that you only look at one sentence, and my question's going back to kind of real-life social engineering. "Give me your password", "can you give me your password", okay, suspicious. But say I bury the subject, the noun, or the object, in multiple exchanges. Let's just say I say, "hey, what do you think is a secure password?", and then that's a
question, like, symbols and stuff, and I say, "well, do you think yours is secure? How do you set yours? What is it?" So, okay, "what is it", right, there's the punchline. And if I could resolve that "it" back to the noun, which is easy, then I could detect that: "what is your password". I could substitute "your password" for the "it", and then I'd detect it, right? And that's an example, but say you were to keep this memory; if you're just having a natural conversation, it would be fairly difficult to trace back what the "it" is, especially if you're talking about
multiples. So, okay, I disagree with that. This problem of tracing back what the "it" is, they call it pronominal resolution, and natural language processing people have been trying to solve it for decades, you know what I'm saying? They do a pretty good job of it. I mean, I didn't implement one of their techniques, but I don't think it's as hard as you're saying; I think people have looked at that problem a lot. Is this something that you guys are looking at? Yeah, I'm gonna add it. I just got a nice fat grant to do this and I'll be doing all that, you know. Okay, thank you. To say, because you're do
is, I understand, what you've built, what you're talking about here, is a simple solution. Because it's simple, it's not baffled by all the pretexting that people are talking about. And, you know, complex, awkward sentences also have the problem that a real human being misunderstands them. And so if a social engineer really wants to know for sure that the password they've just been given is in fact the password, as opposed to something else, they have to ultimately be clear to the human being, and at the point that they're being clear, that's your punchline, and that's what you're triggering on. Mm-hmm. And, you know, let me just say
something to comment on that, right? So this guy Chris Hadnagy, you may know him, he runs the Social Engineering Village at DEF CON, right? I published with him earlier, and we've talked about this a lot. And he wants to use this to train social engineers to be better, okay? He's like, yeah, you try things out; if I detect it, then they've got to reword it in a new way so I miss it, right? And they could do that, right, and there can be this back and forth, just like malware detection. I can change this to be better and they can change theirs to be better. But mine has a finite
limit, meaning you can't just write random English sentences; it's got to be something understandable by a human. You can't write arbitrarily complex, bizarro-sounding sentences. So there's like a limit to how much that can go on before I win and I can detect every way a human could express something, or this is my claim anyway. Whereas with malware detection, that limit, I don't see it, you know what I mean? Question? Oh yeah, I think, yeah, okay. Oh sorry, he's next, after you, I'm sorry. Yeah, you go ahead. Okay, yeah, so as a social engineer, I work in Brazil, and security awareness there is very much non-existent, actually, so
actually this works a lot there, because most users are really just way more stupid than we can possibly think. They hand over passwords without you asking for them. It's awesome; doing a social engineering job there is like the easiest thing ever, because the security guard gives you the password to the door. It's ridiculous. And it's mostly not because people are stupid but because they never had contact with security awareness programs and such. So this could be very helpful for stupid users, so to say, because it generates an alarm, it gives you the alert that, you know, this phrase has
password and give together, you should really think about that, you know, because you're kind of being stupid, so you should consider that. So it helps by bringing awareness more present into the days of everybody who works with confidential info like passwords. I think this problem is not only Brazil. I mean, I'm from Scandinavia, it's over there too. It's like you say, it's everywhere, you know. And yeah, so whatever we can do to sort of help the situation, right? So yeah, but he has to go next, because we skipped him. Go ahead. I would like to give you a
car. "I would like to..." I don't know, I'll try it real quick.
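The lookup the demo performs can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the tool's actual code: it assumes a parser has already produced the typed dependencies as hypothetical `(relation, governor, dependent)` triples, that a separate step has decided whether the sentence is a question or command (passed in as a flag), and that the blacklist is a plain JSON list of pairs. The option fields in the real blacklist file are not modeled, and the names `load_blacklist`, `verb_object_pairs`, and `check_sentence` are made up for illustration.

```python
import json

# Hypothetical blacklist format: a JSON list of [verb, object] pairs.
BLACKLIST_JSON = '[["give", "money"], ["send", "money"], ["give", "car"]]'


def load_blacklist(text):
    """Parse the JSON text into a set of (verb, object) tuples."""
    return {(verb, obj) for verb, obj in json.loads(text)}


def verb_object_pairs(dependencies):
    """Collect (verb, object) pairs from dobj and nsubjpass relations.

    dependencies is an iterable of (relation, governor, dependent)
    triples, assumed to come from a typed dependency parser.
    """
    pairs = []
    for relation, governor, dependent in dependencies:
        if relation in ("dobj", "nsubjpass"):
            pairs.append((governor.lower(), dependent.lower()))
    return pairs


def check_sentence(dependencies, is_command_or_question, blacklist):
    """Flag a sentence only if it is a command or question AND it
    contains a blacklisted verb-object pair."""
    if not is_command_or_question:
        return "normal sentence"
    for pair in verb_object_pairs(dependencies):
        if pair in blacklist:
            return "scam detected"
    return "normal sentence"


if __name__ == "__main__":
    blacklist = load_blacklist(BLACKLIST_JSON)
    # "give me a car" (a command)
    command = [("iobj", "give", "me"), ("dobj", "give", "car")]
    print(check_sentence(command, True, blacklist))
    # "I would like to give you a car" (a statement)
    statement = [("iobj", "give", "you"), ("dobj", "give", "car")]
    print(check_sentence(statement, False, blacklist))
```

With give/car in the blacklist, the command is flagged while the statement is not, presumably the same reason the demo reports a normal sentence for an offer. The sketch does no lemmatization, so "cars" would miss, as in the demo.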
Normal sentence; it didn't detect that. Uh, yeah, wait, he's next though, and then I've gotta go to... go ahead. I'm back. Alright, so one other tactic that I'll use as part of social engineering campaigns that I've run is I will generalize a question as much as I possibly can. What I often see happen is, I would feed some information that I know, that makes it appear as if I have some insight into some secret, and then from there I will ask a generic follow-up, which is, "what other information do I need to know?", or, just more blatantly, "what else might I need to do to do X?" I might not even say to do
whatever; I might just say, "what else might I need?" And then, once I've confirmed something that might appear semi-secret but isn't really, that's the point at which the person, the mark, may start to vomit information right back at me, because they feel there's some sort of established bond of trust. I feel like this isn't necessarily designed to speak to that, but I don't think that's necessarily the point. If I'm understanding correctly, this is definitely intended to sift out the broad spam phishing attempts as well as some of the modestly more intelligent ones. I'm thinking that somebody that's exceedingly motivated, or somebody that's on a particularly motivated whaling mission, might still be able to pass this.
Thoughts? Yeah, I mean, there's a lot of ways you can sort of bypass this. Remember, it's kind of a simple model, right? Like you said, you can even say something incorrect on purpose and they will, you know, say the correct thing back, these sort of things, right? So this wasn't designed to catch these very sophisticated sort of examples, right? Yeah, I know there's another question, but for instance, I've got it in one of these slides. So along the lines of what you're saying: we're detecting questions and commands, so just don't ask one, suggest it. So you could say, "what is your password?", or you could
say, "I can reset your account, but I'll need the password first." You didn't ask a question, but they're gonna give you the password, you know? So that's along the lines of what you're saying. That is more sophisticated than I can deal with, because you have to understand the mental state of the person, and I can't do that yet in the model. You know, he's behind you, right there. So I just wanted to... I also run an offensive security team of some sort, I live in this world as well, and I just wanted to make a comment and kind of respond to what you're hitting on. I think the point here isn't that we've designed the perfect system to solve
social engineering, right? It's that we now have, perhaps, a tool. Without this, what is our defense? It is relying on the human to catch even the most basic dumb stuff: someone going, "give me your password", and I'm going, "okay". That is not a solved problem right now, and we try to use awareness and things like that to push that up. But if we could technically solve it, which this does, we can at least raise the bar so that the dummies don't get our users' passwords and they at least have to freaking try hard, right? Yeah, which is the idea of security: make them try harder, right? So, and I think this, I think
this is a really cool thing. So was there another question? Thank you. I think, yeah, we've got another five minutes left. That's fine, I don't have to finish the rest of the talk; this is why I had to get to the demo, the rest is just fringe stuff. Go ahead. I'll be quick. So you had the slide that said, you know, you input something, and then you test it, and then you get the result, and then it goes back. So yeah, we put this in... oh, it's not working. So yeah, I had that slide. It doesn't do that so much. What I mean is, all we do is exactly what it says,
shows, right? It takes text and it prints out text. But instead of putting out text, you could easily hook it up to a screen and put a warning in front of you on the screen, or put a message in your ear on a cell phone, something like that. I haven't closed that loop, but that's sort of an easy thing to do, I think. Okay, I'm gonna go on. So basically, just to show results: we tried it on a lot of phishing emails we found from these different sites, and here are the results. Now, they're not that good,
actually, but this is a start, right? So this column is for phishing emails and this column's for Enron emails, okay? And this is how many we classified as detected, as phishing or, you know, malicious, or not detected. And what we want... this is 87,000. So we have like one hundred, eighty-seven thousand mails of each type; a hundred we used to train, so eighty-seven thousand is what you get left. So we would like it if all the phishing were detected, so this is eighty-seven thousand, and all the Enrons were not, so this is eighty-seven thousand. So these are bad, right, and they're not zero. Okay, so I'm gonna try to explain some of
that. I'll start with the false negatives, the phishing emails. So thirty-five percent of the phishing emails were not detected as phishing. So why did it fail so badly in that sense? So what I did was I had the students look at a hundred of these false negatives and manually figure out why they weren't detected. And what happened is, basically, all right, we just detect this punchline, okay, the question or command. But a lot of these phishing emails in the datasets we looked at, they don't have that. They're just the start of a conversation; they just say, look, hit me back. Okay, that's like the
most suspicious thing they say. So we don't catch them, right? And that's not 35 percent, that's like 80 percent: 79 percent of the mails, of the ones we didn't catch, were just like that. In fact, here's one. This is a subset of it, right, that's a dot dot dot, but in the end it says "please, you can call us or reply us", whatever, right? We didn't catch that as being malicious. Now, I would argue that eventually in this phishing email conversation they would have asked a question or made a command, and we would have caught it then. Okay, so I don't feel too badly about the fact that we missed all these false negatives,
because I think it was just a bad dataset, in my opinion. Oh, and I'll mention too, as an aside before I run out of time... is there something you want to say? Okay, no, sorry. So one thing we're doing: one problem I have had with this work is people say, look, you're saying you can detect in-person attacks or attacks over the phone, but you haven't evaluated on any. That's because they don't exist publicly, okay? You can find phishing email datasets, but you can't find conversations that are phishing, that are social engineering. So what we're doing with my wonderful grant that I just got is we're
gonna run an attack, you know, like the Social Engineering Village, you know? So we're gonna do that, right? I got it approved. I'm gonna pay students 15 bucks and say, look, sometime in the next three months you're gonna get an attack, right? We're gonna call you up and try to scam you for this data, and we're gonna tell them explicitly, this data or this date or whatever. But they're gonna forget, right? And two months later we'll call them, they will have no recollection of what I told them, and then they'll get fooled. And we're gonna run a bunch of attacks and see, so we can get a dataset; we can say, look, these attacks work and these
don't, you know? So we're gonna do that, actually, next quarter. Question? I
mean, I, you know... oh, stop, I've got to stop, I'm sorry. Oh wait, can we ask a question? I have questions. Well, you say we're running out of time, but I think we can squeeze one question in. Okay, go.
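The blacklist-building step described earlier in the talk, ranking verb-direct-object pairs that are common in phishing mail and rare in ordinary mail and keeping everything above a cutoff, could look roughly like this. The exact weighting the authors used is not given in the talk; the score below (count in the phishing set times a smoothed log inverse document frequency over the non-phishing set) is one assumed tf-idf variant, and `rank_pairs`, `build_blacklist`, and the cutoff value are hypothetical.

```python
import math
from collections import Counter


def rank_pairs(phishing_docs, normal_docs):
    """Rank (verb, object) pairs by an assumed tf-idf-style score.

    Each document is a list of (verb, object) pairs already extracted
    from one email. Pairs frequent in phishing_docs and rare in
    normal_docs score highest.
    """
    # Term frequency: total occurrences across the phishing set.
    tf = Counter(pair for doc in phishing_docs for pair in doc)
    # Document frequency: number of non-phishing emails containing the pair.
    df = Counter(pair for doc in normal_docs for pair in set(doc))
    n = len(normal_docs)
    scores = {
        pair: count * math.log((1 + n) / (1 + df[pair]))
        for pair, count in tf.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_blacklist(phishing_docs, normal_docs, cutoff):
    """Keep every pair scoring above a (fairly arbitrary) cutoff."""
    ranked = rank_pairs(phishing_docs, normal_docs)
    return [pair for pair, score in ranked if score > cutoff]


if __name__ == "__main__":
    phishing = [
        [("send", "money")],
        [("send", "money"), ("read", "email")],
    ]
    normal = [
        [("read", "email")],
        [("read", "email")],
    ]
    print(build_blacklist(phishing, normal, cutoff=1.0))
```

In the toy data, send/money appears only in the phishing set and survives the cutoff, while read/email appears in every ordinary email and scores zero. As the talk notes, a hand-built list may be the better choice in many settings.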
No, no. So do you have any kind of concern that students these days don't actually get calls anymore, so the apparent fact that you're calling them may already be suspicious? Mm-hmm. I mean, I think it's gonna work. Yeah, some people are still getting calls; I don't think it's that bad with kids today, you know? Maybe. Look, we're gonna try it. I mean, look, what I got approved is what I said: we'll spoof caller IDs, you know, and we'll make calls like that. Okay, I guess that's it there. So yeah, since this is the last talk, let's take another five minutes for questions. Okay, any questions? We might be done with
I'm curious if you've tried combining this with any kind of sentiment analysis and seeing how that might play. I have not. So, sentiment analysis, I looked at sentiment analysis, but for what I'm trying to do I don't think I need it. It's 'cause, you know, sentiment is basically plus/minus, as far as I understand; it's like good, bad, right? But some of these things, like "give me money", is that good, is it bad? You can't even classify it that way, so I don't see how it's useful for this. Yeah, do you have a question? Okay, she has a question right there. Okay, actually it's not a question, it's more like a story that I have that you may consider. In Brazil
actually calls are very common, and are so common that people don't think twice about believing them. So things like, "hey, you just won a car, give me your credit card number so I can credit it to you", and they do give the credit card number. Are you saying Brazil should be a target of our attack? And we need that very badly, because months ago my coworker came to me crying, with money in her hand, very desperate: "Oh Marina, the doctor just called me, my husband is in the hospital and I need to pay for the transfer, and I'm very desperate, can you take me there? I have to take the money to the doctor." And I'm like, wait, take
the money to the doctor? What the [ __ ]? Call the doctor. And she called the number that had called her, and some guy answered, and I'm like, who is talking? "Oh, you know, I was working with a guy and he had an accident, and I'm in the hospital with him, and the doctor is very desperate, they need to transfer him." And I'm like, okay, let me talk to the doctor, please. And then this other guy answered, and I said, "sir, how is the husband doing? You know, what happened?" And the guy answered me, you know, his leg is kind of hanging, it's very [ __ ] bad,
you know, like, just very [ __ ] bad. Yeah, which was, okay, I see what's happening here, but that was a good try, man. And that happens, so it's gonna work very well. Okay, so I guess we're done, right? Thanks, thank you so much. [Applause]