Proactive Password Leak Processing

Name: Proactive Password Leak Processing
Uploaded: 2016-08-29
Duration: 1 h 43 s
Description: Bruce Marshall examines how organizations can proactively detect and respond to password compromises by monitoring public data breaches. The talk covers techniques for identifying leaked credentials, processing password dumps in various formats, comparing hashes across breaches, and implementing mitigation strategies, balancing thoroughness against resource constraints and privacy considerations.

BSides Las Vegas · 2016 · 1:00:43 · 265 views · Published 2016-08

Speakers

Bruce Marshall

Tags

Category: Technical

Topic: Threat Intel

Style: Talk

About this talk

Bruce Marshall examines how organizations can proactively detect and respond to password compromises by monitoring public data breaches. The talk covers techniques for identifying leaked credentials, processing password dumps in various formats, comparing hashes across breaches, and implementing mitigation strategies—balancing thoroughness against resource constraints and privacy considerations.


Proactive Password Leak Processing - Bruce Marshall Passwords BSidesLV 2016 - Tuscany Hotel - Aug 03, 2016

Transcript

Okay, so this upcoming talk now: Proactive Password Leak Processing with Bruce Marshall. I know the schedule says 25 minutes; it's going to be a bit more than that, up to maybe 50 minutes, I don't know, we'll see. And as I said before, you can very easily go without food for several days, so even though there's lunch afterwards, I highly recommend you stay here and listen to the entire talk, because this is going to be a good one as well. Bruce is one of those who have been with us ever since we started doing PasswordsCon in Las Vegas four years ago, so I will just leave the stage to him. Go ahead, Bruce.

All right, thanks, Per. As he mentioned, I run the PasswordResearch.com website, which I started 10 to 15 years ago, well, not quite 15 years ago, to try to gather some of the password research, starting specifically with the academic community. I have since added more from events like this, which I would consider non-academic for the most part, to try to share that information, make it more accessible, and essentially provide an index for people like us who work in private industry, government, and similar fields, so we can benefit from some of that knowledge.

In my last couple of presentations, as Per mentioned, I've presented on security questions, on passphrases like the XKCD-style passphrases, and on password expiration. Some of those were driven by data I had found: the security questions and password expiration talks were based on password data that I had and on some Internet dumps that contained security questions and answers. This one is a little bit different, because I started out just hearing about companies looking at leaks. Microsoft, for example, announced a couple of months ago that they were going to start, or maybe they had already been doing it for a while, but they at least announced that they were going to be looking at password leaks to try to protect the accounts of their users or customers.

As I was collecting this information, I started hearing more and more about it, and I decided that I wanted to gather what we know right now about what companies are doing, talk about the techniques they're using, and talk about the alternatives you have if you're considering this, or if you're debating whether it's even worth your time. One of the reasons I want to do that now is that in some areas of the industry we get broadsided, either by auditors or by standards: OWASP or SANS or somebody comes out with a new guideline on something, and then we have to figure out how to do it. So this is my way of helping start that conversation before we're too far along, while ninety-nine percent of the industry isn't doing anything related to this.

So let's get right into it. As Per said in the setup, there was some confusion: I had thought I had a 55-minute slot, so I have about that much material prepared, but I'm going to try to pare it down to closer to 30 or 35 minutes to be respectful. We'll see; no guarantees. Just raise your hand and Per will bring you a little cup of juice.

All right, so password reuse: if you're

not sure what that term means, it's basically a person using the same password on multiple sites or applications. In this case we're primarily talking about the Internet, but the same applies to a corporate environment.

To give a sense of the extent of it, I'm going to give you some stats. I won't spend a lot of time here; there are several different ways we can measure it, and there are additional stats I'm not including just because there are more than we really need. When people are actually asked what they do, which can be somewhat problematic because people either idealize or underestimate what they do, roughly forty-six to sixty percent say that they use passwords on several sites, or that they reuse passwords for different places they go on the Internet.

We can also see it in research that has looked specifically at password leaks. A password leak is essentially what happens when a company gets hacked and the data gets dumped out, like in Michael's last presentation. Researchers can compare the passwords of users who appear in dumps from two different hacked organizations and see whether they match. One of the academic research papers looked at how many had exact matches versus slight modifications: maybe the password policy differs between sites, so a user had to add a capital letter on one site where they otherwise wouldn't have one, or add a number; or some people like to add little prefixes, like the first three letters of the site, "fac" for Facebook. The researchers felt those transformations were reasonably predictable.

I guess I kept it on the first slide, but about these numbers in brackets: I'm big on references, since the point of starting my website was to point people back to the original sources of data. I'll have the references section at the end, so if you see something you want to dig into further, just write down the number, and you can see where it comes from and do some research yourself.

Troy Hunt also did a comparison between the Yahoo! Voices and Sony Pictures leaks, the same type of thing. He saw that a little bit under sixty percent used the exact same password within that sample, and two percent had slight capitalization differences between their passwords.

Finally, probably the most accurate, or at least the most insightful, approach is monitoring what people do within their web browsers, and two studies, one academic and one industry study, have looked at that. The Trusteer study is getting a little bit old now, but I don't imagine things have changed too terribly much. Trusteer basically had a browser extension that monitored where their customers were using their passwords; I want to say about four million people had that extension installed. Focusing specifically on financial sites, they saw that seventy-three percent of people used their bank or credit union password to log in to at least one other site. You would think that's very bad, and you would be right. And that's even when they have a different ID on the other sites; they still use the same password.

Then there's a more recent but smaller review of university students. They did a similar thing: they installed a browser extension and looked at how many different sites people had versus how many passwords. There are a lot more details in the studies, and I'd encourage you to dig into them if you're really interested, but eighty-five percent had fewer passwords than websites they went to.

So why is this a problem? Because of account takeover, or ATO, as that type of attack has been labeled, where we as site owners start getting attacked because our users have made choices to reuse their passwords.
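The leak-comparison studies described above come down to joining two dumps on a shared identifier and classifying how each overlapping account's passwords relate. The Python below is a minimal sketch of that idea, assuming both dumps are already parsed into email-to-plaintext-password mappings. The function names, the category labels, the sample data, and the use of case-insensitive substring containment to approximate "slight modifications" (an appended digit, a site prefix such as "fac") are all illustrative assumptions, not the methodology of any study cited in the talk.

```python
from typing import Dict


def normalize(dump: Dict[str, str]) -> Dict[str, str]:
    """Lower-case and trim email keys so one account matches across dumps."""
    return {email.strip().lower(): pw for email, pw in dump.items()}


def classify_reuse(dump_a: Dict[str, str], dump_b: Dict[str, str]) -> Dict[str, int]:
    """Count how passwords relate for accounts present in both dumps.

    Hypothetical categories, chosen for illustration only:
      exact      - identical passwords
      case_only  - identical apart from upper/lower case
      contains   - one password contains the other, ignoring case
                   (e.g. an appended digit or a site prefix like "fac")
      different  - none of the above
    """
    a, b = normalize(dump_a), normalize(dump_b)
    counts = {"exact": 0, "case_only": 0, "contains": 0, "different": 0}
    for email in a.keys() & b.keys():  # accounts that appear in both leaks
        pa, pb = a[email], b[email]
        la, lb = pa.lower(), pb.lower()
        if pa == pb:
            counts["exact"] += 1
        elif la == lb:
            counts["case_only"] += 1
        elif la and lb and (la in lb or lb in la):  # skip empty passwords
            counts["contains"] += 1
        else:
            counts["different"] += 1
    return counts


def exact_reuse_rate(counts: Dict[str, int]) -> float:
    """Share of overlapping accounts that used the identical password."""
    total = sum(counts.values())
    return counts["exact"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real leak.
    dump_a = {
        "alice@example.com": "Sunshine1",
        "Bob@Example.com": "hunter2",
        "carol@example.com": "fluffy",
        "dave@example.com": "Tr0ub4dor",
    }
    dump_b = {
        "alice@example.com": "Sunshine1",
        "bob@example.com": "Hunter2",
        "carol@example.com": "facfluffy",
        "dave@example.com": "correcthorse",
        "erin@example.com": "letmein",
    }
    counts = classify_reuse(dump_a, dump_b)
    print(counts)  # {'exact': 1, 'case_only': 1, 'contains': 1, 'different': 1}
    print(exact_reuse_rate(counts))  # 0.25
```

Treating substring containment as a "slight modification" is a crude heuristic: it catches prefixes and appended digits but misses substitutions such as changing a single character in the middle, which would need something like an edit-distance threshold to capture.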

Now, account takeover is not just a result of password reuse. It can also result from poor password choice, or from someone's computer having a Trojan installed on it; there are different reasons for it, but password reuse is one of the causes of account takeover. Credential stuffing has become the name for this: I think Shape Security introduced that term, and it's now on the OWASP wiki page, so it must be a standard. It refers to trying leaked credentials across multiple sites, so if you hear people say credential stuffing, that's what they're talking about.

Account checkers, then, are the actual tools. Anybody can write scripts to do password guessing against websites; we had a talk yesterday about a brute-force framework being used against specific login systems. There are lots of custom tools: if you go out and Google "Facebook account checker" or "Instagram account checker", you'll see different standalone programs. But there are some that are more versatile. Sentry MBA seems to be one of the most professional-grade tools: you can set it up to run automatically through proxies, and it can even do some OCR, so if you start hitting an account lockout threshold where CAPTCHA prompts are being provided, it can OCR those and essentially try to bypass the are-you-a-human check.

Shard and CredMap, and I think Shard came out this year, are specifically designed to try credentials across multiple websites. You provide a credential list, and it tries Facebook, Twitter, MySpace, whatever else is programmed in there. These tools are out there and available, and they can be customized: I think there's essentially a template you can fill in if you want to add a site that's not in one of these existing tools. You get the parameters for the username and password, and maybe the failure and success messages, and go out and do this. So the tools are there, and this is, in part, what people are using in these account takeover attacks when people are sharing passwords.

So is this real? How often is this happening? All the way back in 2003, Google talked about seeing these types of attacks against their sites, hitting a million different Google accounts every day for weeks at a time, which they were trying to combat. We talked about the limitations of online guessing: it's not like offline password guessing, where you can do billions of attempts a second depending on the password hash. Online, you're typically thinking they're going to have account lockout, or IP blocking, or something like that, and we'll talk in a minute about ways of getting around IP blocking. But even back then they saw something like 100 accounts per second, which is 6,000 account tries per minute, a pretty substantial rate if the attackers are having any success. Most recently, Microsoft said that every day they see 10 million, since updated to 12 million or more, credential attacks against the systems they manage, their Microsoft account systems. So it's definitely a

growing problem. Yes? Oh, yes, yes, thank you, I appreciate it. That was part of my rider; you never know if people are going to read those, so apparently they do. All right.

Akamai monitors a lot of sites on the Internet, and they're able to see insights across their customers, beyond a single-customer situation. They released a report this year describing nearly a million IPs being used in one single attack against a financial customer, with 427 million accounts checked during that period. A different customer, which I think they said was in the entertainment industry, saw 817,000 different IPs. Because they've got insight into both of those, they were able to see that there was about a seventy percent overlap in IPs between them. So either the attackers are using the same botnets, or it's the same gang of people running these different attacks; there's definitely some correlation. These attackers are out there trying, whether it's your site yet or not; probably, or hopefully, your logs will tell you.

One of the interesting things Akamai also said in that same report was that when new password leaks come out, like LinkedIn and MySpace and some of the others we've seen this year, they see spikes in account takeover and credential reuse activity; they're specifically monitoring that type of thing. One of the biggest examples happened last year with Taobao, the Chinese eBay-like reseller site. They were hit in the middle of October with what authorities later said was a collection of 99 million credentials that matched about 20.5 million active users on the Taobao site. Taobao said the attackers didn't actually get in, that they got blocked by some contextual authentication because they came from the wrong IPs or had suspicious browser strings or something like that. But regardless of how many they actually blocked, the crime resulting from getting into those accounts was around 1 million dollars' worth of fraud that they detected and then had to deal with on their site.

As for how widespread it is, there's been a lot. TechCrunch, just in the last couple of weeks, got hacked due to a shared password on their content management system, and someone briefly posted a fake article. GitHub reported the same type of problem: after the most recent LinkedIn leak came out, their users were being attacked with reused credentials, and they had to respond to that. And shat bow was another one that said someone got in because of a third-party breach, with the credentials being the same as those of one of their administrators.

It's hard to quantify how many people have suffered account takeover due to password reuse, because we often don't know why an account was taken over unless the user says specifically, "Oh, I had the same password on eBay and on my Facebook account." But we know that for account compromises in general, roughly twenty-five percent of the population has experienced one in the past year and had to deal with the outcome, based on this survey. So why does that happen? I don't

really want to spend too much time on this. There are lots of different reasons for people to want to get into accounts. Often you think, why does someone want to get into my Starbucks account? I could understand eBay or my bank, but why my Starbucks account? Typically there's some way they can get money out of it, or get social proof, or do other things that add value for them, or that they can sell to someone else who is interested in doing those things.

So I guess one of the questions we have to answer is: is it our responsibility to care if someone has chosen to reuse a password? There's been research that says people reuse passwords in part as a coping mechanism for password overload. They don't try to choose something super complex, so they can remember it, and they reuse it in certain situations because they want to relieve their memory burden; if you had to memorize a password for every single service or account you have, that could be overwhelming. So they made the decision to reuse that password. Maybe they weren't as informed as they could have been, but they did make a decision about it.

So we do have an option, and most of us are in that default option right now, which is that we're not doing anything: we wait until an account gets hacked and we respond to it, and we can continue to do that if we want to.

From the perspective of what users know, there's excellent research from Dr. Cranor's team at Carnegie Mellon (I don't think she's in here); they've done great work over the last few years. One of the papers I recommend to pretty much anybody dealing with password policy or authentication decisions is this one, called "I Added '!' at the End to Make It Secure." They sat people down in the lab and asked them to create passwords for a newspaper site, a bank site, and an email account, looked at what they did, and had them talk through the process: okay, this is my bank account, so I want a stronger password; or this is my email, so I want it to be something I can type in quickly. They got that feedback from the participants, and then they also talked to them after the fact: why did you choose to make your bank password the same as your email password? So they specifically got feedback on password reuse, and a lot of people said, well, we know it's a bad thing to do and I probably shouldn't be doing it, but I'm not as concerned, because it seems to not have any consequences for me.

They're part of the maybe seventy-five percent of people who haven't had an account taken over, or at least not one they can trace back to password reuse. You can't argue with experience: in some cases somebody can reuse a password and never have problems with it, depending on other factors, like whether that password is ever disclosed through an attack or a breach somewhere. But part of the problem is that they don't have the same education some of us have about what constitutes good and bad password decisions, and what constitutes risky situations that may expose their password to compromise.

Three of the people they talked to said things like: hey, if I've got a good password, I reuse it, I don't see a problem with that; my reused password isn't easily guessed; no one can guess my reused password. Well, the researchers took the passwords generated as part of this lab experiment and had someone who didn't know what the values were run Hashcat blindly to attempt to crack them, and two of those three people had their passwords cracked. So maybe they're not the best judges of whether someone can guess their password.

So that's my perspective. I guess one more here: this was a guy who was recently contacted by a news agency because his password had been breached as part of an O2 compromise over in England. He said, well, I reused that on O2 and eBay and Gumtree, and up to that point he'd considered himself secure online and Internet-savvy. From his perspective, maybe he thought his password was good enough. The point being, sometimes we have to provide that guidance back to users. We establish minimums or standards, and users say, well, if you're saying I only have to do six-character passwords, that must be sufficient. The Office Space flair scene came to mind for that argument: well, why did you set the standard at this if it wasn't sufficient? With password reuse, it's a little bit harder for us to set standards, but our actions do speak to that same question of whether this is acceptable behavior or not.

Users also have mixed feelings about who's responsible. In one survey, 56% said the sites they visit have ultimate responsibility for their account protection, and another thirty-nine percent said that websites are to blame if they have account compromises because they didn't offer the right security features, whether it's multi-factor authentication or stronger passwords or whatever it may be. So they're placing some of that blame on sites. Given this, I would say we have a shared responsibility.

Alex Stamos, who is now the Facebook CSO, spoke at an AppSec conference last year, where he was asked in the Q&A session what the biggest challenge was for Yahoo. He said user security, and then broke that down further, talking about how they deal with password leaks and password compromises, and saying that in theory there's nothing we can do about that, right? The user is choosing to share their password,

they're making a choice, and we don't really have any role in that choice. But he said that in practice it means they need to redress how they're dealing with passwords and with their users, and how to limit the risk of those compromises, because the user is not capable of, or not willing to, do that for themselves. I thought that was a very pertinent quote for this discussion.

So there are different things you can do. We're going to focus on password leak processing, but I did want to talk through these. I'm not going to spend much time on most of them; I'll spend a little bit more on blacklisting and on contextual, risk-based authentication.

One of the bad things you can do, I would say, is to enforce regular password expiration, and that's been talked about in a couple of other sessions at this conference. If people are changing their passwords every 60 to 90 days, they probably won't have an exact match to their other accounts, because they'll be incrementing a number on the password they've been forced to change. But that's only a slight difference, and as we talked about earlier with leak processing, people are able to guess those transformations fairly easily. So it's not a great approach.

Incident-driven resets would be like Citrix GoToMyPC and, not Pandora, Carbonite; Carbonite was the other one. Whoops, there we go. So GoToMyPC and Carbonite detected password guessing attacks on their sites and just reset everybody's password, forcing everybody to change their passwords. That's a fairly scorched-earth type of policy. It can work, but how often are you able to carry that out and not upset your users? A lot of people said, I had a good password on your site. Especially with Carbonite, which people use for backups: now you've got to change the service passwords and so on to make sure it continues to back up. That can cause a lot of disruption your users may not want to deal with.

You can design unusual password policy requirements: you make sure people can't use spaces or symbols in their password, or that they have to have two symbols in the middle of their password. Mark Burnett has a great Twitter feed, @PWTooStrong, where he retweets all the different terrible policies that sites have, and there are tons of them. I don't think they're typically designed to do this, but that is one way someone could approach the problem: come up with some crazy requirement so that users probably won't have the same password on your site as on others.

You can assign random passwords to users, or at least semi-random ones. Linux Mint was hacked earlier this year: the databases for their forums and their partner or community sites were hacked, and as a result one of the choices they made was to randomly assign all of their users new passwords. That lasted about a week; after that, they realized people weren't really happy with that choice and wanted to be able to choose their own passwords. For some of us, it's not as big a deal; we just plug it into our password manager and go on our way. But others either have to memorize it or write it down, or they're just not pleased with having to deal with that. On very high-security sites you might get away with it, if you can justify it and your users won't react too strongly.

You can eliminate passwords altogether, and this is what Yahoo, with Yahoo Account Key, and some other sites like Medium have adopted. They basically say: if you want to log in to the site, plug in your username, and we'll send you an email. You click the link in the email, which carries essentially a session key that logs you into the site, and you don't need a password anymore; everything is done through your email. You need a password for your email, of course, but that's not the site's responsibility. For some lower-security sites, I think that may be an option worth considering, because then users only have to worry about their email password, and from your perspective there's nothing else you really have to worry too much about.

Two-factor or multi-factor authentication, two-step verification: we've been encouraging that for years. So regardless

of password reuse, because of all the other password and account takeover threats, it's a good idea. That probably goes without saying, but it's nice if your password isn't the only line of defense, so that if someone guesses your password, they still can't get into your account. That's ideal.

Blacklisting leaked passwords: I'll talk briefly about this, and Jim mentioned it yesterday in his talk about the new NIST standard. There's some push toward this instead of just reacting to people's bad choices after the fact, where you see a password in a leak and tell the user that was a bad password. Not all leaked passwords are bad, but presumably some of them are, and you would be telling users not to use them. You could create that list from scratch, or build it from leaked passwords. There are services out there: PasswordRBL is one; I've talked to the person behind it, and he's got millions of passwords compiled from leaks, from the username and password combo lists used by attack tools, and from other places like that, and you can subscribe to it. Or you can generate your own blacklist. Either way, you essentially try to prevent those passwords from the start, for all users, rather than just saying this one user can't use this one leaked password that we found they were using somewhere else.

Contextual, risk-based authentication goes by lots of different names, but it's essentially looking at other factors that you monitor more passively and that are associated with the user's login experience. You look at the IP and geolocate it against their normal location, so if I log in from the United States and suddenly I'm coming from Europe, you can flag that as a higher-risk authentication transaction. LinkedIn is doing this. David Freeman from LinkedIn did a great talk at the Enigma conference this year, I think it was called "Server-Side Second Factors," where he talked about the formula they use to determine risk: browser agent, time of day, and all the different IP factors. He also talked about their success at combating things like account takeover fraud: just by looking at country, they could eliminate ninety-some percent of the automated, large-scale bot attacks, like the ones in the Akamai data, because they were coming from other countries. So regardless of whether you're doing this specifically to combat password reuse, it's something you should be looking at if you don't already have it implemented.

All right, so finally, the approach we're going to talk about in more detail is looking for password leaks on the Internet, from other sites or possibly from people claiming they come from your site, and then comparing those to your own users. Here's my obligatory kitten picture.

Your goal in doing this is to reduce account takeover; you probably won't eliminate it, but you can reduce it based on risks that you know to be there. If a password leak is out there and other people have access to it, those accounts, by logic, have a higher chance of being attacked, so you're trying to get ahead of those attacks for what are presumably the riskier accounts. I haven't seen or heard feedback on this from the companies that are doing it, but I would assume there are some cost savings from dealing proactively with account takeover threats before the takeover has happened: you're not engaging customer service, admins, or whoever else has to deal with compromised accounts, and you avoid the possible loss of business that goes along with customers being frustrated that you didn't protect them, even if it was their own bad choice that led to the account takeover. And, as we talked about, it demonstrates a security commitment to your users, your investors, your auditors, your management team, whoever is in the position of needing some reassurance.

If you've never seen leaks before, or if you're not quite sure what they are, I'll give a brief explanation. Typically it's going to be just

data that's posted on the internet my experience and i'll talk a little bit about this the data that I collected a couple years ago was most of the at least the small-time compromises come from things like sequel injection where cites private predominant the running PHP would either not program properly to resist sequel injection so the attacker can then dump their entire user database get the passwords you know user names and things like that sometimes in bigger cases it comes from just outright server compromises where they compromise an application server and then have connections back into the database server that they can pull the data out of but there's also cases where trojans and malware are collecting it I

think Trustwave did an analysis about the pony botnet and they had several hundred thousand passwords that had been collected by the the pony botnet over I think a six-month period or something like that fishing a course is tends to be somewhat lower scale may on the mill attacks and then there's compilations where people may just collect data over time and then will it down and say okay I've grabbed gmail addresses from seven different sources and now I'm going to put them all in one file and call it G you know a gmail dump password dump pass even though it didn't technically come from a hack of gmail someone be duplicates people kind of use this as a

bragging right in some cases they put their name on the top and say hey we hacked the FBI or we hacked you know Gmail and some leaks of course as you probably heard don't just contain password data you know the Ashley Madison compromise contained all sorts of personal data the health care breach that happened that was a Nargis was a leak that was kind of given publicity this week has all sorts of health care information so they're not just limited to you know username password email there may be a lots of other data within them so you kind of if you're deciding to do this you kind of have to make some decisions and you can be somewhat

flexible on this; it's not a binary "yes, we're doing everything" or "no, we're not doing anything" type of decision. Probably the most important one is to decide whether you're going to be looking for data that comes from your own site, looking for signs of compromise, or looking for signs that someone is sharing data that supposedly comes from your users or your employees. After that, you're probably going to be looking at easy leaks to process: the larger leaks that have plaintext passwords, so you don't have to worry about cracking anything or going through much effort to parse the data. Having to go out there and find the data is another, I guess, easy hurdle to

overcome. Like I said, there are the larger leaks, and then all the leaks you can find. In talking with Michael Coates, who's Twitter's privacy and security officer, he said that's kind of their approach: they're trying to find everything that's out there that they can deal with, and process it to protect their users. But as you might guess, that also requires a greater commitment of time and resources. So if you haven't seen password leaks before, they come in all sorts of shapes and sizes. These are some from Pastebin-type sites, where they're just text-file-type formats: usernames, passwords, emails. Some are

going to have them in different orders; some will be tab-delimited, some space-delimited, some comma-delimited; some will have hashed passwords, some will have plaintext passwords; some will have passwords in a different table than the usernames, so you'll have to try to correlate them. My point being that there's not a one-size-fits-all approach to sucking in that data and processing it. In this case you can also see their passwords over here in parentheses, which they'd already cracked before they published the dump. This one's more like a SQL-type statement with all the different fields in it. So I looked at this back

it's been, you know, three or four years now, to try to get an idea of the scope of password leaks. I wrote a blog post about it, which you can see here in the reference, but essentially, over a two-month period, I looked at how many dumps I could find that had, I think my criterion was, more than ten usernames and passwords in them, because some might just have two or three, and some might be one single person whose information they dumped, kind of like a doxing type of attack. These were the different results: in December, 154 different password dumps; 225 they

had specific organizations or sites named as the source of that dump, whether that was accurate or not; I didn't verify it. For plaintext passwords there were 66 dumps and 40 dumps, resulting in the respective password totals on the slide, and then similar with hashed passwords. So you can see there's some variation, and there's also variation in size. There was a pretty good number of dumps with fewer than a thousand passwords, and a lot of these are just smaller, less secure sites: a university in India throws up a CMS for its students to log in and get coursework, or a small retailer

or something like that throws up a site. I didn't count up the emails in the December dumps, though there were emails included, but for the January dumps I did note what percent of the dumps had emails in them. The rest either had just usernames, or some of them may have just been passwords by themselves. That gives you an idea of whether you're going to have access to those emails to know if they're your users or not. This was a decent amount of work for me to parse through; there were dozens and dozens more dumps that I saw that didn't have password

information; they were just config files or other data that wasn't passwords, which I didn't want to have to deal with. And that was even with automation: like I said, I had to manually review them, but I could at least automate the scanning for them. But when you were looking for emails in there, since emails are being used as usernames, did you at any point see any leaks, or did you ever think about also looking for usernames or something other than just an email address? Like Twitter, as an example: you can log in using your phone number if you've given them that, you can log in on

Twitter using your handle, or you can log in using your email address. No, I didn't do any counting of usernames specifically. You can see in some of these dumps, like this first one here on the left side, it's got a screen name; maybe that's their username, and it also has their email. I couldn't tell you what that site allowed or used as a login. We know that a lot of sites on the internet of course use emails, but yeah, there may be cases where they accept either. We'll talk more about whether you want to match usernames against your own users or not, but that can be

problematic. So that's kind of the kitchen sink: you're looking for as much as you can and sucking it in, and I'll talk about some tools in a minute that may make that easier for you. But you could also just look at the larger scale. This is a sampling of what's come out and what I would say has been generally available in the last few months of this year. Some of these didn't originate this year; LinkedIn, we know, came from 2012, and MySpace is from around that same time. And the Twitter one wasn't really Twitter; it was from some other site, and they had 400 million

entries, but only 32 million of them were unique entries. You may still find that data pertinent; we'll talk about LinkedIn in a little bit, but that data may still be relevant even if it's older. As far as general availability, these are the kind of large numbers you're looking at, clearly much larger than the few-ten-thousand-password dumps we see just crawling Pastebin. Most of these you can't find on sites like Pastebin; you have to look for people tweeting about them, or there are torrents, or maybe some

other file-sharing sources, and the underground. The nice thing, I guess, is that once more people get them, they tend to share them more. LinkedIn at first was harder to get a copy of, and now it's fairly easy if you know where to look. All right, so some tools. Netflix is one of the organizations that does process password leaks, and a few years ago they came up with a tool, a Ruby on Rails application, which they open-sourced, called Scumblr. It's not really a password leak processing tool so much as an intelligence-gathering tool that looks for password leaks. It's a tool that they have for data sources like the

Pastebins, but also Facebook and Twitter; they can scan Twitter, and I think probably Google searches and some other stuff like that. Essentially, if they want to put in "Netflix password" as a term, it gives them a workflow, kind of a checklist, to go through all the different sources the tool has found since the last time it ran. You could schedule it to run daily or whatever, retrieve that information, decide whether it's a threat you need to deal with, whether it's actual leaked passwords or not, and then process it. Yes? Okay, so he mentioned that Scumblr's output, as far as

alerting you that you need to process data, was originally email, and there were some quirks if you're using an older version of it, so make sure you're using a current version. But that's kind of their solution, and it works with some of their other tools for the workflow behind how they process those leaks and decide, because they're looking for other stuff too: if someone says "hey, I hacked Netflix" but isn't dumping passwords, they're also interested in that type of data. So it's more general intelligence gathering, but it can be used specifically for password leaks. Dumpmon is a Twitter account, but it's also an open source project, where

they're crawling Pastebin for you, and I believe that was one of the primary sources I used when I was doing my research back in 2012 and 2013. Since you've got the source code, you could customize it so that instead of posting to Twitter, it writes into one of your ticketing systems or sends you alerts when it finds specific strings that seem to match password dumps you want to deal with. As for sites, Hashes.org is pretty much the best site that I'm aware of for current hash dumps. They seem to be good, and they kind of

disguise the names, like LinkedIn is spelled with a 1 instead of an I, because I guess they're trying to make it harder to search for. They don't only have the raw hashes; they also have cracked hashes, so that can save you a lot of time if you just want to take advantage of the work they've already done. InsidePro is, I think, a Russian password cracking site that has forums where people share hashes and stuff like that. There are all sorts of other occasional sources that may have older data or may leak something every once in a while, but it's kind of hard to find one good source on the internet that has

everything you need in one place. Yep? InsidePro's a little bit more nefarious than that. It's basically frequented by a large number of people with excess GPU capacity, and you can post large dumps of uncracked hashes. I've been watching specific actors responsible for some of the large cloud service provider breaches, and when they pick a new target, they'll take the hashes for maybe 50 or 60 employees of that particular cloud provider and pump them into InsidePro with anywhere from a five-dollar to a hundred-dollar bounty per hash, and then they usually get around twenty to twenty-five percent of the accounts within about three or four hours. Yeah.

Yep. So he mentioned that InsidePro basically does a kind of crack-for-pay in their forums, where you can say, hey, I want these cracked, and offer a bounty for it. So it's not always just good-natured password sharing; sometimes there's more of a criminal element involved. Of course, a lot of the dumps we've seen this year were initially offered for sale, like the LinkedIn and MySpace data, on underground sites where they sell everything from drugs to whatever else you wouldn't be able to sell on a normal site. Those can be a little bit harder to track down, just

because you need to get access to them; sometimes they're a little bit more careful about that and sometimes they're not, depending on which sites you're talking about. I'm not aware of any underground site, like a carding-type site, that's specifically focused on credentials, where their only interest is credentials. So it tends to be mixed in with other data sales, credit card sales, and other types of data dumps that are out there. Law enforcement may also tip you off; I've heard of cases, I think with Time Warner Cable, where the FBI said, hey, we found this data that we think may be associated with you, and you might want

to take a look at it. I wouldn't count on them doing that for you; if they do, great, but it's more a matter of knowing who your local cybersecurity contact is in the Secret Service and FBI, or your national law enforcement agency of choice. It's nice if you get it, but I wouldn't count on it. There are a couple of companies that are specifically focused on either providing you with password leaks or allowing you to check your users against the leaked password data they have collected. Hold Security is run by Alex Holden; you've probably heard of them in the news in the last few years, but

they have a pretty mature program, I guess, for going out there and getting large data leaks from their underground connections. They do crack some of the passwords themselves; they said they put in about a week's worth of effort, and they do find a lot of plaintext passwords. Their idea is that they sell you essentially an API service to query users and find out whether your users, employees or customers, have passwords in their database, and if you need to, you can do further cracking on your own. LeakedSource is a little bit newer; they've focused more on the

end user, where you can go out there, plug in your email, and it'll show you which sites you've been compromised on, or which your credentials have been leaked through. But they're also offering a business option; again, they do some password cracking themselves, they've got the API, and you would have to subscribe to that. Threat intelligence service providers I'm not as familiar with. I would imagine that if they found a leak that had your name on it, they'd probably tell you about it, but they may not be willing to share, say, a Netflix dump with you if they found something like that. Like I said, if you've got feedback on that, certainly

feel free to pitch in. But I know they'd at least alert you, "hey, this leaked and had this big credential dump," so it may be something you're going to be aware of, but they may not provide you with that data directly.
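Whatever the source, whether Dumpmon-style crawling of Pastebin or a feed from one of these providers, the first step is spotting credentials in free-form text and pulling them out. Below is a minimal sketch, assuming one record per line: it uses regular expressions to find an email address and an MD5- or SHA-1-length hex hash, splits the rest of the line on common delimiters, and flags a paste using the same rough criterion mentioned earlier (more than ten credentials). The patterns, the `extract_credentials`/`looks_like_dump` names, and the threshold are illustrative assumptions, not Dumpmon's actual rules; SQL-statement dumps like the one shown earlier would need a different parser.

```python
import re

# Illustrative patterns only; real dumps vary widely in layout.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hex strings of SHA-1 length (40) or MD5 length (32).
HASH_RE = re.compile(r"[0-9a-fA-F]{40}|[0-9a-fA-F]{32}")
# Delimiters commonly seen between fields: colon, comma, tab, pipe, semicolon, space.
SPLIT_RE = re.compile(r"[:,\t|; ]+")


def extract_credentials(paste: str) -> list[tuple[str, str]]:
    """Return (email, secret) pairs from lines holding an email plus another field."""
    pairs = []
    for line in paste.splitlines():
        match = EMAIL_RE.search(line)
        if not match:
            continue
        email = match.group(0)
        # Remove the email, then split whatever is left on common delimiters.
        rest = line.replace(email, " ", 1)
        fields = [f for f in SPLIT_RE.split(rest) if f]
        if not fields:
            continue
        # Prefer a hash-shaped field; otherwise assume the last field is a plaintext password.
        hashes = [f for f in fields if HASH_RE.fullmatch(f)]
        pairs.append((email.lower(), hashes[0] if hashes else fields[-1]))
    return pairs


def looks_like_dump(paste: str, threshold: int = 10) -> bool:
    """Flag a paste containing more than `threshold` credential-looking lines."""
    return len(extract_credentials(paste)) > threshold
```

A flagged paste could then be written into a ticketing system instead of being posted to Twitter, as suggested for a customized Dumpmon.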

Right, and he mentioned that LeakedSource doesn't restrict you to searching just for your own users' data; you could essentially search for any matching emails within the dump. I'd imagine Hold Security is probably the same way, but I don't know that for sure. So you've got these leaks, whether you're pulling them from large sources or single sources, and you've got to make some choices on how to process them. The first one, like I mentioned, is duplicates, so it would probably be good to have an indexing-type history where you're saying, okay, we've already done the LinkedIn leak from 2012, we've already done blah blah blah, so you don't waste time going back over the

same data, especially if you're parsing a lot of small dumps, where you're probably getting duplicates every month. Then there's the cleanup and conversion of data. Some of this can be automated, but like I said, I showed you the different formats: there are headers and footers and different columns. You can sometimes use regular expressions to pull out hashes and email addresses, but getting the other data may be problematic. And then, presumably, you're going to filter that and focus on your users; you just care about the email addresses that match your user accounts. There's another decision about which users you're worried about. Pandora was

the one that got the leaked data, and any user with an email in that dump had their password reset, or got flagged for a password reset. They didn't try to crack the passwords or make sure they were the same, which, as you might guess, angered some people who didn't use the same passwords, and probably frustrated some others who weren't sure why it was going on. Again, it's kind of the nuclear option: we're not really willing to put much effort into this, but we want you to be secure, so they're pushing it back onto the users' responsibility. Then there's any user with a

username appearing in the list. And kind of back to your question about whether you should look at usernames: that can be problematic, just because usernames aren't necessarily globally unique. Email addresses are a little bit better about that, although they can be reassigned over time. But take jsmith as a username: there are probably a lot of sites out there with a jsmith that isn't your jsmith. Alex Stamos also mentioned, in that same presentation from AppSec California, that they strip the domain off email addresses and check the recipient part against matching usernames within Yahoo, at least they did at the time.
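The matching choices on the slide (any user whose email appears in the list, any user whose username appears, or only users where username and email both match) can be sketched as a single function, including the Yahoo-style variant of stripping the domain and treating the recipient part of each leaked email as a candidate username. The `Account` type, the policy names, and the `(username, email)` row shape are hypothetical; the point is only how much each policy widens or narrows the set of flagged users.

```python
from dataclasses import dataclass

POLICIES = {"email", "username", "username_and_email"}


@dataclass(frozen=True)
class Account:
    # Hypothetical minimal view of one of your user records.
    username: str
    email: str


def local_part(email: str) -> str:
    """Strip the domain, keeping the recipient part: 'jsmith@example.com' -> 'jsmith'."""
    return email.rsplit("@", 1)[0].lower()


def flag_accounts(leak: list[tuple[str, str]], accounts: list[Account], policy: str) -> set[str]:
    """Return usernames of accounts matching leaked (username, email) rows under `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    leaked_emails = {email.lower() for _, email in leak if email}
    leaked_usernames = {user.lower() for user, _ in leak if user}
    # Yahoo-style widening: the recipient part of each leaked email is also a candidate username.
    leaked_usernames |= {local_part(email) for email in leaked_emails}
    leaked_pairs = {(user.lower(), email.lower()) for user, email in leak if user and email}

    flagged = set()
    for acct in accounts:
        user, email = acct.username.lower(), acct.email.lower()
        if policy == "email":
            hit = email in leaked_emails
        elif policy == "username":
            hit = user in leaked_usernames
        else:  # "username_and_email": the most precise option
            hit = (user, email) in leaked_pairs
        if hit:
            flagged.add(acct.username)
    return flagged
```

With a leaked row for `jsmith@elsewhere.net`, the username policy flags your own `jsmith` even though that may be a different person, which is exactly the false-positive risk of matching on usernames alone.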

But they were also specifically looking for password matches, so if by coincidence jsmith one and jsmith two had the same password, jsmith would get stuck in a password reset even if he wasn't necessarily the same guy. And of course the more precise option is to say that if the username and email both match, that's who we're going to worry about putting through the reset process. Then it depends on what type of data you're dealing with; like I mentioned, sometimes you're getting hashes, sometimes you're getting plaintext. With plaintext, you're normally going to take it and put it through your normal hashing process, and you may have to pull down the user's salt.
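For a plaintext leak, "put it through your normal hashing process" amounts to recomputing each leaked password with the matching user's stored salt and comparing against the stored hash. A minimal sketch, assuming purely for illustration that the site stores PBKDF2-HMAC-SHA256 hashes with a per-user salt and a fixed iteration count; `hash_password`, `leaked_password_matches`, and `ITERATIONS` are hypothetical names, and the real scheme should be whatever your login path already uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; use whatever your site actually stores


def hash_password(password: str, salt: bytes) -> bytes:
    """Assumed site scheme: PBKDF2-HMAC-SHA256 with a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def leaked_password_matches(leaked_plaintext: str, salt: bytes, stored_hash: bytes) -> bool:
    """True if a leaked plaintext password is this user's current password."""
    candidate = hash_password(leaked_plaintext, salt)
    # Constant-time comparison, the same as a login check would use.
    return hmac.compare_digest(candidate, stored_hash)


if __name__ == "__main__":
    salt = os.urandom(16)  # in practice, pulled from the user's record
    stored = hash_password("correct horse", salt)
    print(leaked_password_matches("correct horse", salt, stored))  # True
    print(leaked_password_matches("hunter2", salt, stored))        # False
```

Because the work factor applies to every leaked entry, processing a large dump this way may be slow enough to justify a separate batch path rather than reusing the interactive login API.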

Hopefully, please, you're salting your passwords, so you would have to retrieve some of that data. If you have an existing API that works well with this, it may be fine; otherwise you may have to design something specifically for password leak processing, either because you're dealing with different performance issues, or because you do things in your normal login process that don't make it the right approach to take for this. If they're hashed, you're going to identify the hash, whether it's MD5 or SHA-1 or something else, and then decide how much effort you're going to put into cracking it. My recommendation would be

to say we want to prevent people's passwords from being cracked with an amount of effort similar to what an average attacker is going to put in. So maybe that's a week, maybe that's two or three days; you're going to have to make that decision yourself. As for the different approaches, we don't really need to get too much into that; you're going to try different approaches to crack the passwords. You can certainly customize that: if your policy is eight characters and higher, you don't need to try to crack passwords shorter than eight characters, though that can be problematic. As far as the different hashes, this is data from a presentation

that Rick Redman from KoreLogic did a few years ago, where over a six-month period they gathered data on what types of hashes they saw in the dumps they were capturing. MD5 was about 46 percent of all the dumps they saw; SHA-1 was, I think, four percent, much lower. You can see you're dealing with a lot of different options, and this is just the top ones; they had a list of 30 or something, and you could look at that source for more data. The idea is just that it's not going to be as easy as everything being MD5 or SHA-1; you may have to

deal with more options than that. One alternative to going through the cracking process is something we learned about in the Facebook presentation that Alec Muffett did a couple of years ago at the Passwords conference in Oslo, or no, Trondheim. He talked about how Facebook deals with passwords, and one of the things he mentioned was that they use, I think, scrypt, HMAC, and some others, so they're doing better security, but they feed the normal user passwords in the login process through MD5 first, and then into the more complicated, more secure systems. So if

you find an MD5 dump, and your users' passwords have already been hashed with MD5 first and then with your more secure alternatives, you can just skip that MD5 hashing step and put the leaked hashes through your more secure process, again maybe having to pull down salt values. That way you can make the comparison without having to crack the original password. That's assuming MD5 to MD5, or SHA-1 to SHA-1, where you've got comparable types of hashes. If you're going to do that for more than one, like MD5 and SHA-1 and who knows what else, you may have to have additional records stored within your account database.
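The layering described above can be sketched like this: the site stores `scrypt(md5(password))` rather than `scrypt(password)`, so a leaked unsalted MD5 hash can be fed straight into the outer layer and compared without cracking it. This is a simplification under stated assumptions, not Facebook's implementation (the scheme described in the talk also involves HMAC and other steps): it assumes the leaked hashes are plain unsalted MD5 of the password, and it uses Python's `hashlib.scrypt` with illustrative cost parameters; `outer_hash`, `store_password`, and `leaked_md5_matches` are hypothetical names.

```python
import hashlib
import hmac
import re

MD5_HEX_RE = re.compile(r"[0-9a-f]{32}")


def outer_hash(inner_hex: str, salt: bytes) -> bytes:
    """Strong outer layer over the inner MD5 hex digest (illustrative scrypt costs)."""
    return hashlib.scrypt(inner_hex.encode("ascii"), salt=salt, n=2**14, r=8, p=1, dklen=32)


def store_password(password: str, salt: bytes) -> bytes:
    """What the site keeps: scrypt(md5(password)). The bare MD5 is never stored."""
    inner = hashlib.md5(password.encode("utf-8")).hexdigest()
    return outer_hash(inner, salt)


def leaked_md5_matches(leaked_md5: str, salt: bytes, stored: bytes) -> bool:
    """Compare a leaked unsalted MD5 against the stored record without cracking it."""
    leaked_md5 = leaked_md5.strip().lower()
    if not MD5_HEX_RE.fullmatch(leaked_md5):
        return False  # not MD5-shaped; a SHA-1 leak would need its own inner layer and record
    return hmac.compare_digest(outer_hash(leaked_md5, salt), stored)
```

Only the final scrypt output is persisted, so a breach of your own database does not hand out bare MD5 hashes; supporting SHA-1 leaks as well would mean a second stored record built the same way.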

You may not want to go through that much trouble; it kind of depends on how seriously you want to take it. To save yourself some trouble, one approach I thought of, which I'm not sure is good or bad because it's got some drawbacks, is that instead of trying to crack those passwords, you just compare when users log in: take the plaintext password they provide to you, hash it the way you've already determined those leaked passwords were hashed, and if they have a matching record in the different dumps, make the comparison then, without having to do any cracking at all. So that covers even the more secure passwords. The problem is you kind

of have to keep that database around for an indefinite amount of time; maybe some users don't log in except every couple of months. It's definitely more overhead, and you may not want to deal with that, but it would save you some cracking work. So, what to do with your users? You could just notify users; you don't necessarily have to force them to reset their password. That's kind of what LinkedIn did. LinkedIn's initial dump had some six million passwords in it, and as I understand it, they made those users reset their passwords if they had a matching password, but the most recent dump, with all 117 million accounts in it, had not been reset for

the most part. In fact, LinkedIn reported that of those 117 million users, more than a hundred million had not reset their password at all, and that was after all the media attention; LinkedIn was in the news a lot about being breached and passwords being compromised. But they didn't tell users to change it, and some users probably said, well, LinkedIn's not telling me to do it, so I don't need to worry about it. So that puts it back in the users' hands, but they may not make the decision you want them to make. If that's concerning to you, don't let them make the decision: you can lock the account. Then you

need to do a custom unlock workflow, where you say you have to go to this page to recover from this, which some sites have done. Most of the larger ones seem to push people through the normal forgotten-password workflow, because you've already got that in place, and it's better to reuse that same code and functionality. The problem can be, like in the Tumblr case, where they had to reset some user passwords, that this was literally the only message they gave their users once they tried to log in: "It's time to reset your password." They didn't know why; they didn't have any context for why this was happening to them. So if you're

going to put them through your normal password workflow, you may want some sort of flag you can set where it says "click here for more information," pointing them to a URL where you explain why that's happening. Or you can do what Microsoft and Facebook and some others do, which is to not necessarily lock the account: users can still log into the account, but once they log in, they're required to go through a password reset workflow, and they're also subjected to secondary authentication during that process. They're looking at: are you coming from a country I know, are you coming from a browser I know, and things like that.
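The response options just described (notify only, lock the account behind a custom unlock flow, or let the user log in but force a password reset, with secondary authentication when the country or browser is unfamiliar) can be expressed as a small decision function that also returns the "more information" link. Everything here is hypothetical: the `Action` names, the `LoginContext` fields, the placeholder URL, and in particular the assumed rule that a leaked email whose password no longer matches only gets a notification.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NOTIFY = "notify"                     # tell the user, leave the password alone
    LOCK = "lock"                         # block login until a custom unlock/recovery flow
    FORCE_RESET = "force_reset"           # allow login, then require a password reset
    RESET_WITH_STEP_UP = "reset_step_up"  # reset plus secondary authentication


@dataclass
class LoginContext:
    # Hypothetical risk signals available at login time.
    known_country: bool
    known_browser: bool


# Placeholder page explaining why the user is being asked to act.
INFO_URL = "https://example.com/help/password-leak"


def respond_to_leak_match(password_matches: bool, ctx: LoginContext,
                          allow_login: bool = True) -> tuple[Action, str]:
    """Choose a response for an account found in a leak, plus an explanation link."""
    if not password_matches:
        # Assumed policy: the account appeared, but the leaked password is not current.
        return Action.NOTIFY, INFO_URL
    if not allow_login:
        # Site chose not to leave the decision to users: lock and route to a custom unlock flow.
        return Action.LOCK, INFO_URL
    if ctx.known_country and ctx.known_browser:
        return Action.FORCE_RESET, INFO_URL
    # Unfamiliar country or browser: require secondary authentication during the reset.
    return Action.RESET_WITH_STEP_UP, INFO_URL
```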

Of course, invalidating session tokens is also important if you've got mobile apps or persistent cookies or something like that, where users may not be logging in, so you can't necessarily force them through your login workflow. I've kind of got to rush through this, but there are some important things. Tell your users why it's happening; there's going to be confusion on their part. There's a question of whether to give the name of the third party; I would say typically you don't want to tell them where the leak came from. That may cause some confusion, but as we'll talk about, from a privacy standpoint naming it could be problematic as well. Make sure you emphasize

that your site wasn't compromised, that this is from a third party, and whether or not unauthorized access was detected on their specific account. It's also helpful for them to understand that you're protecting them. Then there's education, media, stuff like that. Nuisance leaks: I'm going to skip through this, but basically you may have to process stuff that didn't come from you. There was a release a few months ago involving large email providers, and here's the actual headline that came with it, a "breach" of these big major email providers. Well, it wasn't a breach; that's just the way Reuters decided to spin it. But they had to process the data, and they found that a very, you know, as far as a

percentage of the accounts, a very minimal amount were actually valid data. The vast majority was either old data or fake data. Regardless, they had to go through that process. Well, two percent of what was in there, if they mean what they're claiming, clearly two percent, meaning 476,000 accounts, that's pretty significant. But as far as the percentage...

Yeah, my point is, like I said, not that you wouldn't care to reset those accounts, just that the data itself was for the most part bad data; it wasn't something you would normally have to worry about, except for that small percent that you do. It was really bad data, mischaracterized, like you were saying before, where somebody accumulates all these breaches, sucks out just the Gmail addresses, and says, hey, Gmail. Yeah. And a lot of times we don't know, like I mentioned. These are some headlines from it, and it adds to the confusion. I guess I don't have it in here, but someone basically said the Amazon Kindle was hacked; they had

eighty thousand records, and that turned out to be entirely fake data. So sometimes it's completely made up, and sometimes it's just old data from collections that are years old and no longer valid. Regardless, quickly going over this: there are some risks involved with processing this. Like I said, users get confused: how did you know my password was in this leak? Are you keeping my password in plaintext? Are you encrypting it? When Facebook announced their policy, there were a lot of comments on there, as well as on other news articles about it, from users who were confused about that. For some people, if you lock the accounts, they're not going to get back into their

accounts: they've changed email addresses, they don't have access to their email, their phone number changed, so whatever recovery options you have for them aren't going to work. Just be aware you're going to have to deal with that in some cases. Privacy concerns: if you tell me "we found your password in Ashley Madison" or "we found your password in FurAffinity," I'm going to say, I'm not sure I want you to know that I was using that site. I haven't heard of people specifically being concerned about this, but I would not be surprised to hear users voice concerns. And notification fatigue: if this is happening every month, I could certainly see that

being a problem. Legal risks: the short answer is talk to your legal team, because you're dealing with stolen data. From a US federal law perspective, if you're not actively using it to compromise sites, you might be okay; from a trade secret or intellectual property standpoint, it kind of depends on what data you have. And of course, avoid actually testing those credentials. So, successes: WordPress went out there, looked at the Gmail data that was compromised, found a hundred thousand users who used the same credentials, and was able to reset those people's passwords before they got compromised. Yahoo, as Stamos again mentioned, found that in some of the bigger password dumps they deal with,

ten to twenty percent of the entries match their users, and they were able to get those passwords reset before it caused much trouble. And Twitter also felt like it was successful. I like to feature my niece in every year's slides, and this is her plugging the password leak. Of course, this isn't going to solve all your problems when it comes to account takeover, but it may help you and demonstrate your commitment. So I'll open it up for questions and answers. I'll just slowly tab through the references so we can get them recorded in the video, but I also have a link to my slides at the end, so you don't have to worry about writing these down.

Any questions? This is more of a comment, maybe to help: I recently did an analysis of the LinkedIn data against a large e-commerce site, and just against the subset of commercial customers there was about a 16 percent overlap between the LinkedIn data and their active customers, and of those, approximately 300,000 users, only ten percent had changed their password since the LinkedIn breach. So that gives you some sense of what the potential exposure is from these breaches, and gives you an incentive to go down this road. Yeah. You have the customer angle, I guess. I agree on the ten percent. We tried resetting passwords and sent the email to

about 4,000 people, and about 200 people changed their passwords. So it's a tricky issue. Yeah, you can easily put customers off your website by doing that. Also an interesting observation: with all the leaks that have recently been for sale for one dollar and all that stuff, it seems like the information being sold now is probably the same information that maybe Alex Holden has had for a number of years, and it only now got into somebody else's hands. Yeah, a lot of them seem like they're people who maybe weren't even directly involved with the hacking when it originally happened,

but they were part of the inner circle of people who happened to have that private data, and they've decided now, well, I might as well sell it and try to get some money out of it, or share it, or whatever the case may be. So, you briefly touched on the possibility of storing user passwords with a weaker hash algorithm in order to compare against leaked data. Do you have any advice for making your system robust, so that if you follow that strategy, you don't make your own system more risk-prone in a dump? Sure, and let me be clear on that: my advice is not

that you store an MD5 password hash record for your users. It's that you hash with MD5 and then use your scrypt or bcrypt or PBKDF2 on the resulting hash. The idea is that you're starting from a known place, the MD5 hash, so if you get more MD5 hashes in from a leak, you can start at that step, put them through your stronger password hashing process, and then compare the results without having to crack those original MD5 passwords. But yeah, don't store an extra record with MD5 or SHA-1 or anything weaker than that. Okay, so I'm

going to stop there, and I will actually give you all the opportunity to have lunch. When we get back, the next two talks are really taking us into the psychology and linguistics area of passwords, and I'm really looking forward to those talks as well. So be back here at two o'clock. Okay, thank you.
