
Friend or Replicant: How Attackers Automate and Disguise Themselves

BSidesSF · 2019 · 32:06 · Published 2019-03
About this talk
Is this "real"? This is the story of how attackers today leverage a variety of tools and tricks to impact the influence landscape at scale. Many have heard of "fake news" and know that those "friends," "matches," or "followers" might not all be real; the information we consume is inflated with likes and ratings generated by coordinated attackers utilizing anything from users' browsers to IoT devices. How are these fake accounts, likes, and clicks created? To what extent are they "real"? This session will explore the fake account ecosystem, with specific focus on the lifecycle of a fake account and how specific tools and attacks are used to create likes and clicks, sometimes through automation and emulators, sometimes using real people through phone farms, mechanical turks, and sweatshops. We'll dissect the main attack vectors and how they are being exploited: Content, repurposed to fit a different context; Access & Authentication, gained through account takeovers and credential cracking; Fake Accounts, created strategically to build trust; and Usage, to emulate "real" users and not get caught. Together, we'll workshop practical steps to building an army of influencers (on a budget) using off-the-shelf tools and show some more advanced techniques seen in attacks today.
Transcript

Awesome, thank you so much everybody for joining us. The next talk is by Anna Westelius on Friend or Replicant. If you have any questions during the talk, please go to sli.do or the Slido app in your browser; the code to get in is BSidesSF2019. Once you put in the code, just choose the track, and this is Theater 14. Once you do that you can put in your question, and I will take those at the end of the talk. Thank you so much, you can take it away. Yeah, thank you. And just to start off in a bad way, I thought I had 40 minutes, which I don't, so questions I'll actually take

later, after the talk. But thank you all for coming. I'm Anna Westelius, and I am super honored to be here. This is the 10th BSidesSF, so it feels good to be here for an anniversary. Who am I? I'm a former security researcher and analyst. I have been in the security industry for a while now. I started in systems and network security; I did everything from pen testing to IDS monitoring, and I ran a SOC for a while. The last decade I've spent focusing on solving fraud and abuse problems at scale, working with and for various companies to deal with fake information, fake accounts, distributed bot attacks, stuff like that. So, quick disclaimer: my favorite

quote about ML is that if it's written in Python it's probably machine learning, and if it's written in PowerPoint it's probably AI. Because this is a PowerPoint presentation, I have taken the courtesy of using the terminology AI for things that some people might refer to as ML and others refer to as AI, so don't get pissed at me. So why am I talking about this? Throughout the history of humans we have always tried to game the system. Whenever we assign value to something, people find a way to abuse it, and with the rise of social media platforms, likes, users, and user activity have become the number one value commodity, right? So not only do we assign

a dollar value to these things, you can trade likes, followers, ratings, upvotes, and activity for sums of money, right? You can get a thousand Twitter followers for $12; actually real followers, supposedly. But it's interesting to me that we've never really put a lot of security, or security thought, into these problems. We assign a value to it, but we don't hold it to the same level of scrutiny from a security perspective as you would transactions, right? If you log into your bank, you expect there to be a certain level of security on a credit card transaction or credit card information, but with likes, followers, and similar things we don't actually put much security into it. So what is the real value of activity?

Activity is influence: influence and credibility for businesses and companies and thoughts and items. You can actually value a company today based on their ratings online if they're consumer facing, right? If you have a business that's exposed to consumers, three or four stars actually changes the value of that company in real time, right? And again, we don't put a lot of security into it, so that's something. Additionally, these fake accounts that are being used are used for other malicious purposes. So we can influence someone's ratings, we can influence someone's value as a company, but we can also spread things like malware, phishing, scams, etc. And again, if there is no security in these steps of things, then why wouldn't I as a person

go in and abuse those resources for my own benefit? What content is real? So if you go online and you're on social media, or you are on a website that has ratings or comments, etc., how do you spot a fake account? Can someone please tell me how you would identify a fake user? Someone tries to friend you on social media; what do you look for? Anyone? Then you reverse search their profile picture; that's pretty advanced. [inaudible] What? Lack of activity, that's one. Anyone else? No common friends: you don't have any friends, then you are very suspicious, don't friend this guy. Okay. Age of account, okay. All these are great, thank you. So as these things have become more, have

gotten more attention from the media, users have been taught rudimentary means to identify things that are real or fake, right? So you know to be suspicious if you have no friends in common, if there is a lack of other friends, if there are no photos, if there is tons of activity. So Sophia here, our marvelous robot (by the way, this is the most advanced social robot that exists), she has 18,000 or 1,800 reviews, no comments, no photos, no friends, right? And she might come from a location that is unexpected for the place that she is reviewing, right? These are things that are very, very easy for people to emulate and to pretend to be, because we set the framework for

what is real, we give attackers the ability to emulate that behavior, the way that they look, the same things. This does not raise the bar for attack at all; it's actually very easy to emulate. It's very similar to the way that we introduced more secure password practices. If you all remember the eight-character password with, you know, a number, a special character, a capital letter: it actually gives people the format of your passwords rather than, you know, making it more secure. Right? I think we're all in agreement. So, if we were to create our own fake information campaign online: me being Swedish, I'm fairly tired of seeing other Swedish people in the workplace,

I'm tired of having to listen to comments about ABBA, and really nobody wants an IKEA maze in their neighborhood, right? So we're going to start spreading this false information. How do we do that? Well, okay, so how do we fake it? What do we need in order to fake it? Another anecdote about my native Sweden: my country of Sweden just recently got voted the most unfriendly place to move to. This might not surprise many of you if you've ever met Swedes, but apparently if you as a foreigner move to Sweden, you can't make new friends, because what Swedes do is that we don't make friends or interact with people who don't know a person we already know,

right? To be invited to a party you need to know somebody there. To talk to somebody in public... we never talk to strangers. I mean, engineers love Sweden because of this, right? We are fairly unfriendly. But what's important with this is that it gives a fast track to closer friendship, because you already know somebody who is in the clique: you've been verified, you've been validated, and therefore you have access. We think the same way when we think about fake accounts online, right? If we have friends in common, we're much more likely to accept that person as real, because it's been verified, it's been confirmed, it has credibility, right? So when we generate fake information we

need to do two things. One, generate new accounts, which will create all the activity, generate all the content that we need, and distribute the fake information that we want to spread. But secondly, we need to access pre-verified accounts; we need to gain credibility faster through already verified, pre-existing, old accounts, usually accounts used by real users. So, two parts. Then we need to start generating content, and normally in fake information campaigns you take pre-existing content and you repurpose it in other contexts, so information that has been published is being reused to serve our purpose. So for example, if you want to spread information about Swedes, we might find articles or pictures with Swedish Fish and repurpose them in a way that helps

our campaign: Swedish Fish causes food poisoning, or other things, I don't know, anti-Swede propaganda. But what we also need is content that makes us look real, so, you know, the human inspection that I just talked about, right? Real profile pictures, age, location, name, username: things that pass a normal user inspection as well as a human one. So we need to gather that. Lastly, we need to start generating some activity to make us seem real over time. We need to not overdo it, and we need to do the things that we built this army to do, right? We want them to generate activity for us and spread our anti-Swede propaganda, but in a way that seems real and normal.

So now we know what we need, but how do we get those things? It's important to understand that, with a lot of these companies that one might target in a fake information campaign, you want to target many different kinds of websites, right? You want to spread your information in as many locations as possible. The ways that people protect against these fall into three major categories, and these have developed over time, right? As people started stopping bot attacks and other types of automation, they started blocking things by IP address, which is the first: IP reputation and IP rate limits. Then they use something called browser or device fingerprinting, and for those who are not familiar, browser and

device fingerprinting is essentially gathering client-side information about a device or a browser in combination with HTTP headers to create an identity. So you track a user through some client-side information like screen resolution, plugins, fonts, et cetera, and you combine it with the user agent string or other rudimentary HTTP features. What's interesting with this is that, one, there's feature parity in HTTP, so you don't have that many things that you can gather. There is fairly high entropy in fingerprints, but they are not trusted by the people who use them. Lastly, and this is kind of new, people have started gathering things they call behavioral fingerprinting or biometrics, which sadly is just a fancier way of saying mouse pointer

movement and click locations, which is not very reliable. So we're just going to very quickly see how we bypass these things. IPs: IPs are fairly easy today. If you want access to real IPs, and in this sense real being high-quality user-access IPs, you can go online and go to any service that offers them. Hola VPN, for example, is a peer-to-peer VPN proxy that charges you almost nothing for direct access to other people's connections, right? They have a business service version that you can subscribe to that gives you access to other people's home IPs and connections continuously. And what's interesting with this is that the pricing model for these services is

the same as the value that we assign to online accounts. So a real IP, meaning a user-activity IP validated as real, is priced higher than an IP address that comes from a data center or somewhere where you expect that traffic. So again, things that are validated are priced higher. If we don't want to go the more legitimate route, we can go down the rent-a-botnet route, and lucky for us the botnet industry has gone down the path of software as a service as a business model, so you can actually get a diamond-level subscription, month to month, to get access to residential IP addresses for only $40 a month. Diamond level; that's fairly nice, I think.
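
As a rough defender-side illustration of the first two protections described above (IP rate limits and browser fingerprinting), here is a minimal Python sketch. It is not from the talk: the names `fingerprint` and `SlidingWindowLimiter`, the attributes that get hashed, and the limits are all assumptions made for illustration. It also shows why a limiter keyed on IP address or fingerprint has little to count once every request arrives with a fresh key.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional


def fingerprint(headers: dict, client_attrs: dict) -> str:
    """Hash HTTP headers plus client-reported attributes into one identity.

    Hypothetical attribute choice: everything in `client_attrs` is supplied
    by the client, so the client can change it at will.
    """
    parts = [
        headers.get("User-Agent", ""),
        headers.get("Accept-Language", ""),
        str(client_attrs.get("screen", "")),
        ",".join(sorted(client_attrs.get("plugins", []))),
        ",".join(sorted(client_attrs.get("fonts", []))),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=3, window=60.0)
    # One address sending five requests is cut off after three.
    print([limiter.allow("203.0.113.7", now=float(i)) for i in range(5)])
    # Five requests from five different addresses are all allowed.
    print([limiter.allow(f"198.51.100.{i}", now=0.0) for i in range(5)])
```

The same limiter could be keyed on the fingerprint string instead of the IP address; the weakness is identical, because the key is only as stable as the client chooses to make it.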

So we've got our IP addresses. Then we need to bypass fingerprinting techniques, and I find this hugely interesting because all of this is client-side information, and anyone who knows anything about security or how information transfer works knows that if you give something to the client, they can send anything back to you; they own the information that is then sent back to you. So if you're basing your identification on client-side data, well, yeah, you're pretty much [expletive], because, you know, they own that, right? So when attacking any of the targets that you might go after in these types of fake information campaigns, either you own the information and therefore can send anything back, so you can pretend to be any browser, or you can just not load the

JavaScript, because they don't expect all users to load JavaScript. So with a JavaScript-based fingerprinting solution you can just not load it, because that's not what they expect of their users, right? Usability versus security: in this case usability wins and you get access, right? Lastly, we have biometrics, again one of my favorites, and this is because the people who build these things, it's like they've never talked to an attacker before. They're like, oh, you're going to go home and you're going to program mouse movements so that they move like a human, and therefore we're going to catch you, because it's programmatically calculable. And I'm like, oh, I'm just going to go home and record my own mouse movements and replay

them back. Like, those are human movements; they cannot be identified as programmatic, because they are mine. And there is actually so much software out there today that just adds a layer of randomness to whatever movements you record, so it's super easy. As you will see in the rest of this talk, most of these things used to be difficult, but we've built tools for them now, so it's super, super easy to just bypass all of these protections, right? So what I just explained is referred to as a single request attack. It means that we ideally record our own biometrics, we add a little layer of randomness to them, we distribute ourselves so that we access

it through different IP addresses for each request, and we either generate a new fingerprint or don't generate a fingerprint at all, meaning we always look like a new user, right? So we want to generate information, we want to generate new accounts, we want to not get caught by IP limits, and we want to not get caught by biometrics or fingerprints. If we just distribute ourselves with those features for each request, we look like a completely new user that they cannot in any way block or stop, right? So it is time to assemble the robot army. We know what we need, we know what we need to avoid to get it, so let's start collecting those things. Just as a quick recap, we need

profile content and content to redistribute, we need new accounts that will generate our activity, we need to start generating that activity, and we need some old accounts to give us credibility in our activity, right, so we don't get caught. Before we start looking at creating accounts, I wanted to talk a little bit about content. It's important to understand that the most common means of stopping or identifying fake content is advanced reverse image search. Dan mentioned this very briefly when I was asking about fake profile content. The way that most companies today look for fake content is that whenever you post an image, very, very advanced, unscalable, expensive models look for similarity in older pictures. So you

post a picture, the models look for things that look kind of similar to it, and they may look for the original context of that image: the first time it was posted, in what context was this image posted? And then it compares that to the current context. This is hugely expensive and very advanced, and a lot of companies have invested so much money into trying to do this at scale. So I was going to talk about image scraping, I was going to talk about very advanced, you know, image manipulation techniques that we can add to our automated process once we've downloaded loads of other people's profile pictures, but I don't have to,

because of recent developments in the AI department. Are any of you familiar with this website? Yeah? A quick show of hands, seen this before? Yeah. So thispersondoesnotexist.com generates new pictures and photos of humans on every reload. None of these people exist; they have never existed. There is no original context for any of these people, right? There is no way you can put these into a reverse image Google search and say: who is this person originally, has this been repurposed, is this fake? So this is great if we want to start generating fake profiles, right? There is no security on this website; it is made by scientists, and they don't know security. So I

can download hundreds of thousands of these images and build up a large inventory of fake people. Thanks, AI. So we can be this lady, we can be this lady, or whomever we want. And I also hear from a marketing perspective that being AI-powered is, like, the bee's knees, so our robot army is now AI-powered. Isn't that a little bit scary? Tiny bit. But this is not the only part of this that is AI-powered. All the other stuff that we needed for our profiles to seem real, to pass that manual user inspection: well, enter training data for user profiles. Training data is used by machine learning models to understand

the context better, so there are large sets of pre-existing data available online for the purposes of science. There's a Wikipedia article with listings of just this, specifically. So if I need a hundred thousand user profiles, I can download a hundred thousand user profiles and I can use that to my benefit. No longer do I need to scrape the [expletive] out of the internet to gain this information to seem real; I can just generate it on demand. Thanks, AI. So we've got our content, yay. When we start looking at generating new profiles, we're going to use the same tools for generating new profiles as for generating the activity that we need. It always baffles me how little code

you need. First of all, what you could do is go online and buy a purpose-built tool for less than ten dollars that can generate accounts on any website. But if you want to do it yourself, you can pop up something like a headless browser, like Selenium or Chromium, or utilize PhantomJS, to build a bot that registers, that logs in, and then starts generating activity like following and liking people. Again, the amount of code necessary to do this is baffling; it's almost nothing. Also, this is not mine; this I googled, right? Like, how easy is it to create user profiles? Okay, here is a social media bot for you. And again, it requires almost no work to

start generating the activity, and they self-replicate. So, as with friends validating friends as a concept, they generate a new account and then they go back to the previous bot that they made and they friend it, so it replicates itself and the network validates itself. Pretty easy. The important thing to remember here, though, in order to not get detected, is to blend in like a Swede. My country's motto, which is called the Law of Jante, is essentially: don't stand out, and don't think that you're better than anybody else. This is not fake news, by the way, this is absolutely true, and it very much resonates with how we need to act as we generate fake accounts online. Don't overdo it, right? We get caught if

we jump in and generate a hundred thousand accounts and they all start liking one thing. They need to do it slowly and consistently. Just like, if you ever find yourself in my native country of Sweden and you have to stand in line, always ensure that you have nine feet between you and the person next to you, or you will, you know... it's the biggest social faux pas. So we have the accounts, we have the activity, we have the content. Now we need to gain access to other people's accounts, and this we'll do with something called credential stuffing, or account takeover. Credential stuffing was the second most common cyberattack last year after DDoS,

which we haven't solved yet, by the way; come on, guys. What a credential stuffing attack is, is that you have a set of credentials, which is email addresses or usernames and passwords. These are leaked constantly, right? We see leaks in the media all the time; I think there was one yesterday with 7 billion user accounts. And because our users have not started using separate passwords on separate websites, this is the most effective way of gaining access to pre-existing accounts. So what we'll do is go to Pastebin, one of my favorite websites, and download a set of credentials, the biggest one you can find, because size matters, and then we'll plug it into

one of the most common off-the-shelf ATO tools. Like I mentioned before, this is very easy; it doesn't require any coding. Some of the most common ATO tools out there are Sentry MBA, Account Hitman, AntiPublic, and also something called SNIPR. Most of these are made by Russians and are normally in Russian, but because of a growing community of crackers online they have been translated into English and made available for anybody who does not speak Russian. Um, yes? [An audience member] just asked how much malware you get by installing this. You do it in a VM, right? Don't you VM, bro? Yes, yes, that too. But without going too much off topic, the way

these work (this is a great overview, by the way, I promise) is that you have three different resources that you put into them. You have a combo list, which is the set of credentials that you got from Pastebin. You have a list of proxies, so the IPs that we gathered originally, these residential IPs that we want to come from; we just put them into the tool. Then you have a target list. You can target multiple different websites at the same time; this is awesome because that makes you not seem so aggressive, right? And if we want to start generating accounts in multiple places to spread information everywhere, it's great to be able to do that at the same time. And then they're highly

configurable, so the fingerprints that we identified previously that we might need, we can pull them into a configuration and have them change for every request. We get that single request attack without having to use any higher-level technological tools or really understanding the environment in which we exist, right? This is plug and play. It's also free, generally. So what normally happens, and the way that Account Hitman works, is that you actually get a web interface in the tool: it opens up and loads the website in an iframe in the tool for you, and then you actually get to choose if you want to be Chrome or Firefox, like, whatever browser you want to pretend to

be. It loads it, and you actually go in and do a successful login yourself, manually, and it records that. So it sends that first manual request, it validates success, gets that back, and then launches all the credentials in your combo list as individual requests with new fingerprints, using the biometrics that were recorded when you went through the tool, with a certain level of randomness. And you have attacked all of your target sites at once, with new individual fingerprints for each request, and bypassed all identification, because you seem like a new user for each request. What you get back is a list of validated credentials that we can use to validate our other accounts. So we build

credibility through our friends of friends. What's also very common is that people do this and they don't actually overuse the accounts they get access to. So with a lot of famous people with verified accounts, they get people going in and doing small amounts of activity, and they come back from time to time, but they don't overdo it, so they stay under the radar. Because, let's face it, most users don't check their email for, like, security alerts from Twitter, right? Like, "someone's logged in to your account." We care, but most normal users don't. So we have all the things that we need in order to start generating some activity, thanks to AI. Whilst this

entire economy and ecosystem is completely fueled by the way that we value user data and user activity, I think that the recent developments in AI, machine learning, and data science are really fueling the new fake account economy, right? If we can't validate that information that's been put on the internet has a real context, if we can't say where a picture comes from, say an article that features a picture of a beaten woman, if we can never validate the original context, it's very, very difficult to battle fake information and fake news, especially seeing that all the investments that have been made recently go into reverse image search capabilities and similar types of technologies. So I don't know if you guys

remember thispersondoesnotexist.com; this is my other favorite website, thisairbnbdoesnotexist.com. These are actually four different individual machine learning models that each generate a new room in the same house, so they work independently but together to generate similar images of different rooms in a house. It also automatically generates a description and a fake account picture. This could be used in real time to scam people on Airbnb; I mean, I don't really see another use case for this, except for science. And my point here is that making this available for anybody to use only fuels the fake account and fake information society, right? The economy of spreading false information is

completely fueled by AI. If we cannot validate, again, original content, how is it that we should be able to believe in anything, or have any faith in the content that we look at online? And if someone can build a purpose-built Airbnb ad generator, right, like, there is no business context for this, like, why would you do this, then you can create content for other purposes, right? If I want to create fake anti-Swede propaganda, I can feed a model tons of pictures of Swedish Fish and sad IKEA maze situations where people get lost, and it's easy for me to send them online, and no one can prove to me that these are

not real, right? There's no way of proving it. So, I managed to get it all done in time. In conclusion, it has become increasingly easy for attackers to execute these attacks. I've been doing this for over a decade, and it used to be hard. When you used to do distributed bot attacks, you had to write, like, your own little botnet, and you had to, like, inject yourself into people's browsers. I remember a few years ago there was a guy from Wix who did an awesome talk on extension-based botnets at DEF CON, right? Like, it used to be difficult, but now you can just plug and play tools and get your access,

right, get your fake information out there. You can generate new fake content easily by just building a model, right, or find a bunch of scientists who get really super excited about a research project generating new fake Airbnb ads. So, conclusion: is this an impossible problem? I don't think so. I mean, it sounds like these attacks are so sophisticated, but you don't need any understanding, and the technology is readily available. My take on this is that the security industry, all of us, we've been neglecting these problems for the past few years, right? We haven't actually cared about this, ever. When I moved from network and system security into web, almost all conversations seemed to be that this

wasn't real security and that we didn't care. We need to care, because again, credential stuffing was the second largest attack of last year, and if we can't solve this, I don't think anybody can. In my experience, having worked with large companies, I mean, I was really just interfacing with the identity team or the fraud team or the abuse team, who by the way never seem to interact with each other, I don't know if any of you have any experience with that, but very rarely with the security team. And in my opinion what we need is a security mindset; we need a hacker mindset in order to understand these attacks, the environment in which they exist, and the technology used. So

what I'm asking of all of you is to think about how this might impact your business and whether you can help fix these problems, because there is no one silver bullet in security, and I believe that we need multiple different efforts in order to try and solve this, and we need to do it at scale. So that was... [Applause]
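
As a footnote to the reverse image search discussion earlier in the talk, the idea of looking for similarity in older pictures can be sketched with a much simpler stand-in than the expensive models the speaker describes. The Python below uses an average hash, a basic perceptual hash; this is a swapped-in technique chosen for illustration, not what any particular company runs, and the function names and the tiny 4x4 "images" are assumptions. It only shows the principle: a reposted or lightly edited image lands close to its original, whereas a freshly generated face has no earlier original to land near.

```python
def average_hash(pixels):
    """Return an integer hash of a small grayscale image.

    `pixels` is a 2D list of brightness values (0-255), assumed to be
    already downscaled (for example to 8x8). Each bit records whether a
    pixel is brighter than the image's mean brightness.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical 4x4 "image": a simple top-to-bottom gradient.
    original = [[r * 16 + c * 4 for c in range(4)] for r in range(4)]
    # A uniformly brightened copy, standing in for a lightly edited repost.
    brighter = [[p + 10 for p in row] for row in original]
    # An inverted image, standing in for an unrelated picture.
    inverted = [[255 - p for p in row] for row in original]

    print(hamming(average_hash(original), average_hash(brighter)))  # 0
    print(hamming(average_hash(original), average_hash(inverted)))  # 16
```

A small Hamming distance suggests the same underlying picture; a lookup like this can only succeed when an earlier copy exists in the index, which is exactly the gap the speaker points out for generated images.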