
PG - Cats and Mice - Ever Evolving Attackers and Other Game Changers - Eric Kmetz

BSides Las Vegas · 25:21 · 41 views · Published 2016-12
About this talk
PG - Cats and Mice - Ever Evolving Attackers and Other Game Changers - Eric Kmetz Proving Ground BSidesLV 2015 - Tuscany Hotel - August 05, 2015

BSides Proving Ground. This is Eric Kmetz, and he's talking about cats and mice: ever-evolving attackers and other game changers. Hey folks, how are you guys doing this afternoon? Yeah, BSides! All right, sweet, good stuff. You guys here for the long week like I am? I see a couple of hungover people, maybe a fed or two. Definitely a fed in the back. You guys definitely want to pace yourselves, because you have a couple of days in store. All right, let's just move on. Cats and mice: let's talk about social network bad actors. Who am I? Just to kind of start into things, I'm a long-running

information security enthusiast. I've been into the BBS scene, if you think HPAC, if anyone's familiar with the old BBS nomenclature for hacking, phreaking, carding. I got into the internet, got into rootkitting, TCP hijacking, sequencing, all the fun stuff that you could do around that time. I've been going to DEF CON for quite a while. My first BSides was last year, and I thought it would be an awesome venue to get my first talk. Moving to the present tense, I spent a couple of years working on a social network, and in that time I saw lots and lots of abuse cases, and I kind of got outside the box and I tried to solve

them. So that's where we're at right now. So what's my talk about? In the interest of your guys' time, I want to talk about what it's not about. Despite the title and whatnot, we're not talking about CVEs; we're not talking about host-level, network-level, or application-level security per se. What this talk gets into is more of your eighth layer, if you will. It gets into the user space or the political space; it's called both of these things. So once everything else is secure, what do people do on your services, specifically social sites for the purpose of my talk? That covers spam, fraud, persona non

grata. Maybe some people just don't play nice with others as per your terms of service, and they're using VPNs and Tor and all these things to access your network. I'm going to talk about how to transcend that a little bit and fingerprint their behavior. The way I'm going to do this is to go through a deconstructive process of what you observe, as a means of actually putting these systems together. But before I do, let's talk about why we care about this. You know, okay, it's users on a site; people are going to troll, people

are going to do stuff. Well, there are a few reasons you might care. First and foremost, just as a researcher, in an academic sense, human behavior is very much a part of it. You have your security chain, and people are definitely a piece of that, so social behavior on social sites is going to transcend any other kind of service, really. Maybe you're somebody with a vested business interest. You could just be a volunteer, you might be a staffer of a site, and what you do and what you learn about it, and maybe the systems you implement, could affect the bottom line, or keep it a

cool place for people to hang out, at least. Maybe you're just a user and you want to know what admins deal with on a regular basis, or you could just be interested. As I give this talk, I would be really curious, towards the end, what you guys think in terms of who might be interested in this sort of thing. As I present the concepts, I'd like you guys to think about that, and when we get to questions I'd certainly like to entertain sharing that. So people are going to abuse social sites. What can we do about it? A couple of things. Manual

intervention is pretty common. You have your admins on the site, you have your moderators, and they're going to go find trouble and deal with it on a one-by-one basis. It's pretty much the default; any kind of new site, any kind of small business usually starts here. By deconstruction, we're going to talk about the pros and cons of that approach, and we're going to talk about why it might be necessary to do something a little bit better. So by doing this we're going to deconstruct abuse, we're going to quantify it, and therefore turn it into something that can be tracked, measured, and compared to

other methods for efficiency, to see what approach is best. So after we deconstruct, we're going to identify our problems and we're going to do research. We're going to see who has solved these problems, and we're going to try to apply other people's work. We're going to harness that, because a lot of people have worked on these things before, for the most part. And finally, we're going to take that research, see what's applicable, and try to make it work. In my case, in some of the research I did, I found a use for things that

are primarily done with strings, and I found a way to take an algorithm and harness it for behavioral fingerprinting, and that is definitely a pivotal part of this talk. So let's talk a little bit about manual intervention. For our cases, that might be an administrator page where you're looking at new accounts before they're activated, before they confirm their email. Maybe you're looking at photos on a social network before anyone else can see them; you get to pre-screen a little bit. It all comes back to that default behavior. It's common, it's preemptive. Maybe you even have a user-submitted abuse flagging system. You see

that on a lot of social sites: somebody sends you a message, and you have a link to say whether this is spam, whether you want to block the person, and so forth. The benefit of manual intervention is that you have a human to deal with. That's actually pretty cool, because in the days of everything being a robot response, if you call a bank and you want to get customer service and you're on a robo-dialer and you just don't even know how to get to a person, that can be frustrating as an end user. So the cool thing about manual intervention is that usually, you know, someone's going to lock your account down or take some kind of action; maybe

they'll be nice enough to talk to you first, but you have that personal rapport. At least you're going to get an explanation of some sort, though not everyone does. So that's definitely the upside of it; people are going to like that. Favorable outcomes happen a lot more easily when you have somebody with a little diplomacy telling you what's going on. On the downside, it's resource intensive. You have a site and your users are multiplying, and you only have so much staff. You can only scale to so many moderators. At a certain point you're not going to be able to give someone a human response.

So that's definitely a downside, because it doesn't scale. In thinking about that, one of the notions of this is that with these report systems, these flagging systems, we're effectively crowdsourcing abuse flagging. So I'm going to talk about the pros and cons of that. On the downside, this is one of the most resource-intensive uses of person time. It's a lot of resources when you have people doing stuff. Furthermore, manual intervention can lead to very high-maintenance users. One of the things that happens is

you have users that are happy that they got a human to handle the case, and so from then on they're going to be your best friend. Anytime anything on the site goes wrong, you're going to hear from them, and they're going to want to talk to you for like an hour. Also, there's malicious flagging. You're going to have people who are just mad at someone else. Some drama happens on these social properties, and it's going to be like, "I don't like this person, I'm going to flag every one of their photos," and that clogs up your photo queue, and it's a problem. So how do we move past

this? What do we do? We deconstruct the problem. Part of what we do is take social criteria: what aspects of a social profile are commonly manipulated for unintended uses? You join a site, and you can set a profile photo, you can set a short biography about yourself, you can leave comments for other people, you can send private messages to other people, you can do video chat, you can do regular chat. All these things can be abused. People can take pictures and put IM handles or URLs across them. There are just so many ways

for spammers and people to try to lure your users to sites, or even to malware or phishing. So you look at these criteria and you pick a handful of them. This deconstruction approach means you have some live data to work with. This is much less like NASA planning a space landing, where you can't actively work with the material. This is taking live data, analyzing it, getting intelligence from it, and making a solution. So you take some criteria and you classify them. We're talking about strings as kind of the predominant thing. There are other

things, like photos, but let's just stick to strings for the progression of this. What does that do? That tells us to examine strings a little closer: what happens when people enter stuff into sites, and how can we process that? Well, let's stand on the shoulders of others. Let's go see who did academic papers and try to see if they've solved our problems. I find that in doing this it helps to have your problem chunked down. What are the abstract pieces of what you're trying to do? Don't bring a

convoluted problem to Google and hope to find exactly that. Go for something succinct: what are the pieces of your problem, and how can you make them more efficient? I'll get into some of my findings in the following slides. Finally, once we apply what we've learned, we want to measure effectiveness and accuracy, and we want to iterate on that to make it better. This is kind of like a system that you grow and learn into something better. Research: in my case, I set out to learn things about the landscape of string matching. In C and PHP you have very basic functions like "do these strings match each other," string compare. It's

very much a yes-or-no kind of thing, for the most part. Another one that's a little more involved is the Levenshtein distance. The Levenshtein distance is essentially an algorithm that tells you, between two strings, how many single-character edits would be required to make them identical. That's useful; sites use it, and I've seen it in practice. But when we're dealing with matching strings that don't exactly match, Levenshtein leaves a little to be desired. It's got a threshold, so you can decide how similar you want to trigger on, and if you make it too open, it's going to start pulling in words that just aren't even

close to what you're looking for. So Levenshtein's cool, but it doesn't really cut it when it comes to humans who are going to adapt and try to obfuscate strings and stuff like that. After a lot of general searching, I found a paper entitled "A Comparison of Personal Name Matching: Techniques and Practical Issues." I only have time to present just a little bit of it, but the gist of it is that somebody set out to try to normalize and consolidate patient records in the medical space. You might have Robert M. Smith or just Bob Smith, and how do you

decide that that's a single patient based off of a name? The paper goes on about several different string algorithms. There are some pretty cool ones out there, including Levenshtein edit distance. But what they do is they don't just take any one string algorithm. They take Levenshtein edit distance, longest common subsequence, Jaro-Winkler, composite Dice coefficient, and a couple of others, and they even do a phonetic component with Soundex. They take the root mean square across all those algorithms, and then they decide against the master composite threshold: are these the same name? And that's effective. That's a multi-layered

approach. It works a lot better than any one alone. After reading this, I was inspired in two ways. First of all, the multi-layered thing really hit me hard. I was like, why don't we build multiple systems for everything, systems that look at a lot of stuff? It's fault tolerant and handles variations better. Sometimes no one or two ways of looking at something are going to solve what you want; they're going to solve what you want and also bring in too many things that you don't want. The other inspiration I had was that longest common subsequence in particular was a very interesting string algorithm for me. I thought it was just like

perfect for a lot of applications we have in strings, and I found out a little more about it. Spammers love to obfuscate strings. It makes our problem of matching them harder, and again, that comes back to why I like longest common subsequence. Luckily for us, there are a lot of works and papers, so in this particular deconstruction scenario we're lucky. With whatever problems you're trying to solve, you might not be that lucky, but again, abstracting your research out and putting it in small pieces will help with that. All right, so I have a slide here, and this is just a couple of examples, in collage form, of longest common

subsequences. If you can see the lower left and upper right corners, that's just your basic source code and file diffs. On the lower left you have some PHP code; on the upper right you just have some basic text files. The point is, if you've ever done a diff on a file, you've used longest common subsequences. In the lower right-hand corner you can see some genetic sequencing; it's used for that. And the upper leftmost corner is my favorite part, and it illustrates an important point. You take two sequences, and longest common subsequence will advance through both of them, but it will also not care if

there's noise in the sequence. So if you're looking for a particular set of actions that occur in order, a set of strings, whatever, it's really great for matching stuff that has noise in it. It's kind of fault tolerant that way. It's a good thing for dealing with string parsing and other things. So we already know it works well for strings. But wait, there's more. I had an aha moment where I thought long and hard about longest common subsequences, and I said to myself, okay, that's cool, but it could actually be used for other things. It's already used for letters and words. Well, what if you took

behavior and you just represented it by a letter or a word? I just signed up for a site: that's S. I just filled out my profile: that's P. I uploaded a photo: that's U. I just wrote someone: that's M. So you can store in a database or in memory these words or letters about people's actions, and then suddenly you're not matching spam words against written text; you're actually matching behaviors against signatures of behaviors that might be of interest to you. Furthermore, it might not even be a behavior; it could be an attribute. Maybe if there's an N in there, we know that

they came from a certain netblock. So it sets up the field for signatures of behaviors and attributes of users on your sites. Most of you are probably familiar with the saying "actions have consequences." I propose to you that actions have sequences: multiple actions in a behavior have common subsequences. This isn't an end-all, be-all solution. Drawing from that paper again, it's best deployed in layers. Just like the string matching layers, this is very well deployed in a combination approach. String algorithms by themselves are not going to do everything you need. Behavioral fingerprinting by itself is not going to do everything you need.
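The idea of encoding each user action as a letter and matching the resulting string against a behavioral signature with longest common subsequence can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the action codes follow the examples given in the talk (S, P, U, M), while `SPAM_SIGNATURE`, `signature_score`, and the sample action strings are hypothetical names and data made up for the example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def signature_score(actions: str, signature: str) -> float:
    """Fraction of the signature that appears, in order, within the actions."""
    if not signature:
        return 0.0
    return lcs_length(actions, signature) / len(signature)


# Action codes from the talk: S = signed up, P = filled out profile,
# U = uploaded a photo, M = sent a message.
# Hypothetical signature: sign up, upload a photo, then a burst of messages.
SPAM_SIGNATURE = "SUMMMM"

# Hypothetical action logs. The first contains the signature in order,
# with unrelated actions (noise) interleaved; the second does not.
noisy_account = "SPUMPMUMM"
ordinary_account = "SPUPM"

if __name__ == "__main__":
    print(signature_score(noisy_account, SPAM_SIGNATURE))
    print(signature_score(ordinary_account, SPAM_SIGNATURE))
```

Because the subsequence does not have to be contiguous, the extra P and U actions in the first log do not prevent a full match. Any threshold applied to the score would be a tuning decision, and, as the talk stresses, a check like this is meant to be one layer among several rather than a verdict on its own.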

They're better when combined, but legitimate users can still trigger false positives, I mean, by doing any of these behaviors that people normally do on a social site. That's really the challenge: how do you differentiate legitimate users from people that are not legitimate? How you do it, in my experience, is synergies of layers. You might have a criterion that by itself is not at all useful in differentiating a legitimate user from a fake user. This person's from the US, this person's from the Philippines. What does that tell us? There are legitimate users from both countries. Well, this user signs up as a female and they

message 20 people within five minutes of signing up, and they're also from the Philippines, and none of our other users from the Philippines do that. There are better examples, but basically, in the synergies of these examples, in the mutually inclusive behavioral patterns, you'll find the differentiator as to whether these are accurate checks or not. And in the mutually exclusive joining of them, you can clear people and avoid even scrutinizing their account that much, because you know that they for sure don't match whatever pattern you're trying to apply to them. That all gets into inliers and outliers. Again, this is a close examination on a

per-criterion basis: what do common abusers exhibit, and what do common legitimate users exhibit? You take that approach, and then you start applying them in pairs, and you see what pairs of two and three things, maybe even more, work to your advantage. That will really help you with accuracy, system integrity, and efficiency. Some of these tests, once you put them in code, are going to loop in the wrong users at first. They just will; that's part of the process. As long as your system isn't automatically processing

these people, it's not really a big deal. Actually, it's kind of good, because everyone that loops in is a learning case for you to examine and create more rules and distinctions about whether these users are legitimate or not. Then you iterate on that, and every version of your system is supposed to be better than the last. But why settle for just a really refined system that can match most attackers? Why don't we confuse and frustrate them too? First of all, we have to understand that attackers evolve. What do all of us do, especially with a hacker mindset, when we

encounter a barrier online or on our computer? We try to get around it. How do we do that? Well, we look at what stopped us, and we adapt, and we try again. It's a pretty common thing for all computer users. Well, attackers do the same thing. So what we want to do is make our system low feedback. You don't want them to understand why or how we're triggering on them. Social sites need to understand that and evolve in much the same way. Your current attackers already have intelligence on your systems. Your past ones most definitely have intelligence. Your new

attackers might benefit from that intelligence, or they might bring a whole new bag of tools to the table, but if you provide very low feedback, that's going to help you in all those cases. What you really want to do, the point here, is to use pre-existing intelligence in whatever you make, and take examples from what you've seen. So, countermeasures. I have had great success with asynchronous response. On top of that, it makes it very frustrating for the attacker to adapt. It enables the sense of a permeable barrier. These people get caught on your checks, but you don't immediately act on

them. What does that do for us? It helps us collect intelligence on them while they're still in our system. At some point they're going to get booted out or processed, whatever you do with these users, but before they are, we get to understand them better. And what does that do for us? Well, it makes it so we can take that intelligence, go to the checks that caught them, look at more and new criteria that we might not have thought of in the first place, and write checks for those too. So what I find is, when you

can catch the same type of abuse three or four times over, then you're pretty well calibrated. Especially with asynchronous response, it's going to be enormously frustrating to attack your system. It's not going to pay off very well, it's going to be difficult, and they'll probably look for an easier site. What would a system like this look like? An example might be a system that recognizes a phrase, a photo, a netblock combination, a behavioral signature, time between actions, time from action to action. The point of that is that we get to scrutinize behavior further, and I touched on that. So basically,

iteration. Cool. The takeaways on this: hopefully this helps you guys understand a framework for dealing with this type of problem in a deconstructive way, and leveraging intelligence to help. Another thing is to think outside the box; I can guarantee whoever's messing with your sites is doing just that. It's a war, not a battle. Don't get discouraged if your site's getting just pummeled. You're going to learn something from it, and you can harness that for other stuff. And finally, have fun with this. If you work on a site that has a lot of traffic, you're going to have a lot of chances to collect a

lot of data, but don't let that data go to waste. Don't just handle users and move on to whatever you're doing, or move on to the next one. Store that. Remember who your people are; remember their techniques, tactics, and practices, because that's how you'll recognize them the next time around. And that's pretty much it. I'd like to entertain questions, comments, any thoughts that this might have provoked. If not, I'll be around; feel free to contact me. I'm happy to talk about this stuff. I'd like nothing better than for some of the ideas I laid out here to be taken and applied to some of your

industries, and maybe even pivoted to something completely different. Thanks for your time.
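As a companion to the string-matching portion of the talk, the sketch below compares Levenshtein distance with longest common subsequence on an obfuscated string, then combines the two with a root mean square in the spirit of the layered approach described. It is an illustration under stated assumptions, not the method from the cited paper: only two measures are combined, the normalizations in `levenshtein_similarity` and `lcs_similarity` are choices made for this example, and the sample strings are invented.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling toward 0.0 as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def lcs_similarity(a: str, b: str) -> float:
    """Share of the shorter string found in order inside the longer one.
    Assumption for this sketch: normalizing by the shorter length tolerates
    padding noise, but it also over-matches very short strings."""
    shortest = min(len(a), len(b))
    return 0.0 if shortest == 0 else lcs_length(a, b) / shortest


def composite_similarity(a: str, b: str) -> float:
    """Root mean square of the individual similarity scores."""
    scores = [levenshtein_similarity(a, b), lcs_similarity(a, b)]
    return math.sqrt(sum(s * s for s in scores) / len(scores))


if __name__ == "__main__":
    target = "freemoney"             # invented phrase to match
    obfuscated = "f-r-e-e-m-o-n-e-y"  # same phrase with padding characters
    print(levenshtein_similarity(target, obfuscated))
    print(lcs_similarity(target, obfuscated))
    print(composite_similarity(target, obfuscated))
```

The padding characters count as eight separate edits for Levenshtein, so its score drops sharply, while the subsequence measure still finds every letter of the phrase in order. A composite threshold applied to the combined score would need to be calibrated against real data.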